ENGINEERING IS THE BOTTLENECK IN (DEEP LEARNING) RESEARCH

From: http://blog.dennybritz.com/2017/01/17/engineering-is-the-bottleneck-in-deep-learning-research/

Warning: This is a rant post containing a bunch of unorganized thoughts.

When I was in graduate school working on NLP and information extraction I spent most of my time coding up research ideas. That’s what grad students tend to do when their advisors don’t like to touch code, which is probably 95% of all advisors. When I raised concerns about problems I would often hear the phrase “that’s just an engineering problem; let’s move on”. I later realized that’s code for “I don’t think a paper mentioning this would get through the peer review process”. This mindset seems pervasive among people in academia. But as an engineer I can’t help but notice how the lack of engineering practices is holding us back.

I will use the Deep Learning community as an example, because that’s what I’m familiar with, but this probably applies to other communities as well. As a community of researchers we all share a common goal: Move the field forward. Push the state of the art. There are various ways to do this, but the most common one is to publish research papers. The vast majority of published papers are incremental, and I don’t mean this in a degrading fashion. I believe that research is incremental by definition, which is just another way of saying that new work builds upon what others have done in the past. And that’s how it should be. To make this concrete, the majority of the papers I come across consist of more than 90% existing work, which includes datasets, preprocessing techniques, evaluation metrics, baseline model architectures, and so on. The authors then typically add a bit of novelty and show improvement over well-established baselines.

So far nothing is wrong with this. The problem is not the process itself, but how it is implemented. There are two issues that stand out to me, both of which can be solved with “just engineering”: (1) a waste of research time and (2) a lack of rigor and reproducibility. Let’s look at each of them.

WASTE OF RESEARCH TIME (DIFFICULTY OF BUILDING ON OTHERS’ WORK)

Researchers are highly trained professionals. Many have spent years or decades getting PhDs and becoming experts in their respective fields. It only makes sense that those people should spend the majority of their time doing what they’re good at – innovating by coming up with novel techniques. Just like you wouldn’t want a highly trained surgeon spending several hours a day inputting patient data from paper forms. But that’s pretty much what’s happening.

In an ideal world, a researcher with an idea could easily build on top of what has already been done (the 90% I mention above) and have 10% of work left to do in order to test his or her hypothesis. (I realize there are exceptions to this if you’re doing something truly novel, but the majority of published research falls into this category). In practice, quite the opposite is happening. Researchers spend weeks re-doing data pre- and post-processing and re-implementing and debugging baseline models. This often includes tracking down authors of related papers to figure out what tricks were used to make it work at all. Papers tend to not mention the fine print because that would make the results look less impressive. In the process of doing this, researchers introduce dozens of confounding variables, which essentially make the comparisons meaningless. But more on that later.

What I realized is that the difficulty of building upon others’ work is a major factor in determining what research is being done. The majority of researchers build on top of their own research, over and over again. Of course, one may argue that this is a result of becoming an expert in some very specific subfield, so it only makes sense to continue focusing on similar problems. While not completely untrue, I don’t think that’s what’s happening (especially in Deep Learning, where many subfields are so closely related that knowledge transfers over pretty nicely). I believe the main reason for this is that it’s easiest, from an experimental perspective, to build upon one’s own work. It leads to more publications and a faster turnaround time. Baselines are already implemented in familiar code, evaluation is set up, related work is written up, and so on. It also leads to less competition – nobody else has access to your experimental setup and can easily compete with you. If it were just as easy to build upon somebody else’s work we would probably see more diversity in published research.

It’s not all bad news though. There certainly are a few trends going in the right direction. Publishing code is becoming more common. Software packages like OpenAI’s gym (and Universe) ensure that at least evaluation and datasets are streamlined. Tensorflow and other Deep Learning frameworks remove a lot of potential confounding variables by implementing low-level primitives. With that being said, we’re still a far cry from where we could be. Just imagine how efficient research could be if we had standardized frameworks, standard repositories of data, well-documented standard code bases and coding styles to build upon, and strict automated evaluation frameworks and entities operating on exactly the same datasets. From an engineering perspective all of these are simple things – but they could have a huge impact.
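
As a rough illustration of the “streamlined evaluation” point, here is a minimal sketch of what a shared interface like gym’s buys you, assuming the classic gym API of the time; the environment name, the random policy, and the episode count are arbitrary choices for illustration.

```python
import gym

# Shared environment: the task, the reset/step loop, and the reward
# accounting are fixed, so only the agent differs between papers.
env = gym.make("CartPole-v0")

total_reward = 0.0
n_episodes = 10  # arbitrary choice for illustration
for _ in range(n_episodes):
    observation = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()  # plug in your agent's policy here
        observation, reward, done, info = env.step(action)
        total_reward += reward

print(f"mean reward per episode: {total_reward / n_episodes:.1f}")
```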

I think we’re under-appreciating the fact that we’re dealing with pure software. That sounds obvious, but it’s actually a big deal. Setting up tightly controlled experiments in fields like medicine or psychology is almost impossible and involves an extraordinary amount of work. With software it’s essentially free. Our situation is more unusual than most of us realize. But we’re just not taking advantage of it. I believe one reason why these changes (and many others) are not happening is a misalignment of incentives. Truth be told, most researchers care more about their publications, citations, and tenure tracks than about actually driving the field forward. They are happy with a status quo that favors them.

LACK OF RIGOR

The second problem is closely related to the first, and I hinted at it above: a lack of rigor and reproducibility. In an ideal world, a researcher could hold constant all irrelevant variables, implement a new technique, and then show some improvement over a range of baselines within some margin of significance. Sounds obvious? Well, if you happen to read a lot of Deep Learning papers this sounds like it’s coming straight from a sci-fi movie.
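
To make “within some margin of significance” concrete, here is a minimal sketch that compares hypothetical per-seed scores of a baseline and a proposed model with a Welch t-test; the accuracy numbers and the choice of five runs per model are made up for illustration.

```python
from statistics import mean, stdev

from scipy.stats import ttest_ind

# Hypothetical per-seed accuracies: five training runs per model.
baseline = [0.842, 0.851, 0.847, 0.839, 0.845]
proposed = [0.848, 0.853, 0.850, 0.846, 0.855]

print(f"baseline: {mean(baseline):.3f} +/- {stdev(baseline):.3f}")
print(f"proposed: {mean(proposed):.3f} +/- {stdev(proposed):.3f}")

# Welch's t-test: does the improvement exceed run-to-run noise?
t_stat, p_value = ttest_ind(proposed, baseline, equal_var=False)
print(f"p-value: {p_value:.3f}")
```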

In practice, as everyone re-implements techniques using different frameworks and pipelines, comparisons become meaningless. In almost every Deep Learning model implementation there exists a huge number of “hidden variables” that can affect results. These include non-obvious model hyperparameters baked into the code, data shuffle seeds, variable initializers, and other things that are typically not mentioned in papers, but clearly affect final measurements. As you re-implement your LSTM, use a different framework, pre-process your own data, and write thousands of lines of code, how many confounding variables will you have created? My guess is that it’s in the hundreds to thousands. If you then show a 0.5% marginal improvement over some baseline models (with numbers usually taken from past papers and not even averaged across multiple runs) how can you ever prove causality? How do you know it’s not a result of some combination of confounding variables?
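
To see how seeds alone can move the numbers, here is a minimal toy sketch (everything in it, from the data to the model to the hyperparameters, is made up for illustration): the same model trained on the same fixed dataset, repeated while varying only the initialization and shuffle seed, then summarized across runs the way single-number paper results usually aren’t.

```python
import numpy as np

# Fixed synthetic dataset, so only the training-time "hidden variables"
# (weight initialization and shuffle order) change between runs.
data_rng = np.random.RandomState(0)
X = data_rng.randn(200, 10)
y = (X @ data_rng.randn(10) > 0).astype(float)

def train_and_score(seed):
    rng = np.random.RandomState(seed)  # controls init and shuffling
    w = rng.randn(10) * 0.01           # initializer: one hidden variable
    for _ in range(2):                 # deliberately short training run
        for i in rng.permutation(len(X)):  # shuffle order: another one
            pred = 1.0 / (1.0 + np.exp(-X[i] @ w))
            w -= 0.1 * (pred - y[i]) * X[i]  # plain SGD on logistic loss
    return float((((1.0 / (1.0 + np.exp(-X @ w))) > 0.5) == y).mean())

scores = [train_and_score(seed) for seed in range(10)]
print(f"mean={np.mean(scores):.3f} std={np.std(scores):.3f} "
      f"spread={max(scores) - min(scores):.3f}")
```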

Personally, I do not trust paper results at all. I tend to read papers for inspiration – I look at the ideas, not at the results. This isn’t how it should be. What if all researchers published code? Wouldn’t that solve the problem? Actually, no. Putting your 10,000 lines of undocumented code on Github and saying “here, run this command to reproduce my number” is not the same as producing code that people will read, understand, verify, and build upon. It’s like Shinichi Mochizuki’s proof of the ABC Conjecture: producing something that nobody except you understands.

Again, “just engineering” has the potential to solve this. The solution is pretty much the same as for problem #1 (standard code, datasets, evaluation entities, etc.), and so are the obstacles. In fact, it may not even be in the best interest of researchers to publish readable code. What if people found bugs in it and you needed to retract your paper? Publishing code is risky, without a clear upside other than PR for whatever entity you work for.

/ END OF RANT
