Andrej Karpathy
- Research Scientist at OpenAI.
- Previously ML/CV PhD student at Stanford under Prof. Fei-Fei Li.
- Course instructor for the famous Stanford CS231n course in computer vision.
Writing papers
Writing good papers is an essential survival skill of an academic (kind of like making fire for a caveman). In particular, it is very important to realize that papers are a specific thing: they look a certain way, they flow a certain way, they have a certain structure, language, and statistics that the other academics expect. It’s usually a painful exercise for me to look through some of my early PhD paper drafts because they are quite terrible. There is a lot to learn here.
Review papers. If you’re trying to learn to write better papers it can feel like a sensible strategy to look at many good papers and try to distill patterns. This turns out to not be the best strategy; it’s analogous to only receiving positive examples for a binary classification problem. What you really want is to also have exposure to a large number of bad papers and one way to get this is by reviewing papers. Most good conferences have an acceptance rate of about 25% so most papers you’ll review are bad, which will allow you to build a powerful binary classifier. You’ll read through a bad paper and realize how unclear it is, or how it doesn’t define its variables, how vague and abstract its intro is, or how it dives into the details too quickly, and you’ll learn to avoid the same pitfalls in your own papers. Another related valuable experience is to attend (or form) journal clubs - you’ll see experienced researchers critique papers and get an impression for how your own papers will be analyzed by others.
Get the gestalt right. I remember being impressed with Fei-Fei (my adviser) once during a reviewing session. I had a stack of 4 papers I had reviewed over the last several hours and she picked them up, flipped through each one for 10 seconds, and said one of them was good and the other three bad. Indeed, I was accepting the one and rejecting the other three, but something that took me several hours took her seconds. Fei-Fei was relying on the gestalt of the papers as a powerful heuristic. Your papers, as you become a more senior researcher, take on a characteristic look. An introduction of ~1 page. A ~1 page related work section with a good density of citations - not too sparse but not too crowded. A well-designed pull figure (on page 1 or 2) and system figure (on page 3) that were not made in MS Paint. A technical section with some math symbols somewhere, results tables with lots of numbers and some of them bold, one additional cute analysis experiment, and the paper has exactly 8 pages (the page limit) and not a single line less. You’ll have to learn how to endow your papers with the same gestalt because many researchers rely on it as a cognitive shortcut when they judge your work.
Identify the core contribution. Before you start writing anything it’s important to identify the single core contribution that your paper makes to the field. I would especially highlight the word single. A paper is not a random collection of some experiments you ran that you report on. The paper sells a single thing that was not obvious or present before. You have to argue that the thing is important, that it hasn’t been done before, and then you support its merit experimentally in controlled experiments. The entire paper is organized around this core contribution with surgical precision. In particular it doesn’t have any additional fluff and it doesn’t try to pack anything else on the side. As a concrete example, I made a mistake in one of my earlier papers on video classification (Large-scale Video Classification with Convolutional Neural Networks) where I tried to pack in two contributions: 1) a set of architectural layouts for video convnets and an unrelated 2) multi-resolution architecture which gave small improvements. I added it because I reasoned first that maybe someone could find it interesting and follow up on it later and second because I thought that contributions in a paper are additive: two contributions are better than one. Unfortunately, this is false and very wrong. The second contribution was minor/dubious and it diluted the paper, it was distracting, and no one cared. I’ve made a similar mistake again in my CVPR 2014 paper (Deep Visual-Semantic Alignments for Generating Image Descriptions) which presented two separate models: a ranking model and a generation model. Several good in-retrospect arguments could be made that I should have submitted two separate papers; the reason it was one is more historical than rational.
The structure. Once you’ve identified your core contribution there is a default recipe for writing a paper about it. The upper level structure is by default Intro, Related Work, Model, Experiments, Conclusions. When I write my intro I find that it helps to put down a coherent top-level narrative in LaTeX comments and then fill in the text below. I like to organize each of my paragraphs around a single concrete point stated in the first sentence that is then supported in the rest of the paragraph. This structure makes it easy for a reader to skim the paper. A good flow of ideas is then along the lines of 1) X (+define X if not obvious) is an important problem. 2) The core challenges are this and that. 3) Previous work on X has addressed these with Y, but the problems with this are Z. 4) In this work we do W (?). 5) This has the following appealing properties and our experiments show this and that. You can play with this structure a bit but these core points should be clearly made. Note again that the paper is surgically organized around your exact contribution. For example, when you list the challenges you want to list exactly the things that you address later; you don’t go meandering about things unrelated to what you have done (you can speculate a bit more later in the conclusion). It is important to keep a sensible structure throughout your paper, not just in the intro. For example, when you explain the model each section should: 1) explain clearly what is being done in the section, 2) explain what the core challenges are, 3) explain what a baseline approach is or what others have done before, 4) motivate and explain what you do, 5) describe it.
Break the structure. You should also feel free (and you’re encouraged!) to play with these formulas to some extent and add some spice to your papers. For example, see this amusing paper from Razavian et al. in 2014 (CNN Features off-the-shelf: an Astounding Baseline for Recognition) that structures the introduction as a dialog between a student and a professor. It’s clever and I like it. As another example, a lot of papers from Alyosha Efros have a playful tone and make great case studies in writing fun papers. As only one of many examples, see this paper he wrote with Antonio Torralba: Unbiased look at dataset bias. Another possibility I’ve seen work well is to include an FAQ section, possibly in the appendix.
Common mistake: the laundry list. One very common mistake to avoid is the “laundry list”, which looks as follows: “Here is the problem. Okay now to solve this problem first we do X, then we do Y, then we do Z, and now we do W, and here is what we get”. You should try very hard to avoid this structure. Each point should be justified, motivated, explained. Why do you do X or Y? What are the alternatives? What have others done? It’s okay to say things like “this is common” (add citation if possible). Your paper is not a report, an enumeration of what you’ve done, or some kind of a translation of your chronological notes and experiments into LaTeX. It is a highly processed and very focused discussion of a problem, your approach and its context. It is supposed to teach your colleagues something and you have to justify your steps, not just describe what you did.
The language. Over time you’ll develop a vocabulary of good words and bad words to use when writing papers. Speaking about machine learning or computer vision papers specifically as concrete examples, in your papers you never “study” or “investigate” (these are boring, passive, bad words); instead you “develop” or even better you “propose”. And you don’t present a “system” or, shudder, a “pipeline”; instead, you develop a “model”. You don’t learn “features”, you learn “representations”. And god forbid, you never “combine”, “modify” or “expand”. These are incremental, gross terms that will certainly get your paper rejected :).
An internal deadline 2 weeks prior. Not many labs do this, but luckily Fei-Fei is quite adamant about an internal deadline 2 weeks before the due date in which you must submit at least a 5-page draft with all the final experiments (even if not with final numbers) that goes through an internal review process identical to the external one (with the same review forms filled out, etc). I found this practice to be extremely useful because forcing yourself to lay out the full paper almost always reveals some number of critical experiments you must run for the paper to flow and for its argument to be coherent, consistent and convincing.
Another great resource on this topic is Tips for Writing Technical Papers from Jennifer Widom (https://cs.stanford.edu/people/widom/paper-writing.html).
Writing code
A lot of your time will of course be taken up with the execution of your ideas, which likely involves a lot of coding. I won’t dwell on this too much because it’s not uniquely academic, but I would like to bring up a few points.
Release your code. It’s a somewhat surprising fact but you can get away with publishing papers and not releasing your code. You will also feel a lot of incentive to not release your code: it can be a lot of work (research code can look like spaghetti since you iterate very quickly, you have to clean up a lot), it can be intimidating to think that others might judge you on your at most decent coding abilities, it is painful to maintain code and answer questions from other people about it (forever), and you might also be concerned that people could spot bugs that invalidate your results. However, it is precisely for some of these reasons that you should commit to releasing your code: it will force you to adopt better coding habits due to fear of public shaming (which will end up saving you time!), it will force you to learn better engineering practices, it will force you to be more thorough with your code (e.g. writing unit tests to make bugs much less likely), it will make others much more likely to follow up on your work (and hence lead to more citations of your papers) and of course it will be much more useful to everyone as a record of exactly what was done for posterity. When you do release your code I recommend taking advantage of Docker containers (https://www.docker.com/); this will reduce the number of headaches people email you about when they can’t get all the dependencies (and their precise versions) installed.
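To make the unit-testing point concrete, here is a minimal sketch of what such a test might look like for research code. Everything in it is an assumption made for illustration: `top_k_accuracy` is a hypothetical metric helper, not something taken from any released code, and the example uses only Python’s standard `unittest` module.

```python
import unittest


def top_k_accuracy(scores, labels, k=1):
    """Fraction of examples whose true label is among the k highest-scoring classes.

    Hypothetical helper for illustration only: `scores` is a list of per-class
    score lists and `labels` is a list of integer class indices.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one example")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


class TopKAccuracyTest(unittest.TestCase):
    def test_top1_counts_only_the_best_class(self):
        scores = [[0.1, 0.9], [0.8, 0.2]]
        self.assertEqual(top_k_accuracy(scores, [1, 1], k=1), 0.5)

    def test_top2_always_hits_with_two_classes(self):
        scores = [[0.1, 0.9], [0.8, 0.2]]
        self.assertEqual(top_k_accuracy(scores, [1, 1], k=2), 1.0)

    def test_length_mismatch_is_rejected(self):
        with self.assertRaises(ValueError):
            top_k_accuracy([[0.5, 0.5]], [0, 1])
```

Saved in a file, the tests can be run with `python -m unittest <file>`. A metric function is a reasonable first thing to test because a silent bug there changes every number in the results table.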
Think of the future you. Make sure to document all your code very well for yourself. I guarantee you that you will come back to your code base a few months later (e.g. to do a few more experiments for the camera ready version of the paper), and you will feel completely lost in it. I got into the habit of creating very thorough readme.txt files in all my repos (for my personal use) as notes to future self on how the code works, how to run it, etc.
Giving talks
So, you published a paper and it’s an oral! Now you get to give a few minute talk to a large audience of people - what should it look like?
The goal of a talk. First, there’s a common misconception that the goal of your talk is to tell your audience about what you did in your paper. This is incorrect, and should only be a second or third degree design criterion. The goal of your talk is to 1) get the audience really excited about the problem you worked on (they must appreciate it, or they will not care about your solution!), 2) teach the audience something (ideally while giving them a taste of your insight/solution; don’t be afraid to spend time on others’ related work), and 3) entertain (they will start checking their Facebook otherwise). Ideally, by the end of the talk the people in your audience are thinking some mixture of “wow, I’m working in the wrong area”, “I have to read this paper”, and “This person has an impressive understanding of the whole area”.
A few do’s: There are several properties that make talks better. For instance, Do: lots of pictures. People love pictures. Videos and animations should be used more sparingly because they distract. Do: make the talk actionable - talk about something someone can do after your talk. Do: give a live demo if possible, it can make your talk more memorable. Do: develop a broader intellectual arc that your work is part of. Do: develop it into a story (people love stories). Do: cite, cite, cite - a lot! It takes very little slide space to pay credit to your colleagues. It pleases them and always reflects well on you because it shows that you’re humble about your own contribution, and aware that it builds on a lot of what has come before and what is happening in parallel. You can even cite related work published at the same conference and briefly advertise it. Do: practice the talk! First for yourself in isolation and later to your lab/friends. This almost always reveals very insightful flaws in your narrative and flow.
Don’t: texttexttext. Don’t crowd your slides with text. There should be very few or no bullet points - speakers sometimes try to use these as a crutch to remind themselves what they should be talking about, but the slides are not for you, they are for the audience. Those reminders belong in your speaker notes. On the topic of crowding the slides, also avoid complex diagrams as much as you can - your audience has a fixed bit bandwidth and I guarantee that your own very familiar and “simple” diagram is not as simple or interpretable to someone seeing it for the first time.
Careful with: result tables: Don’t include dense tables of results showing that your method works better. You got a paper, I’m sure your results were decent. I always find these parts boring and unnecessary unless the numbers show something interesting (other than your method works better), or of course unless there is a large gap that you’re very proud of. If you do include results or graphs build them up slowly with transitions, don’t post them all at once and spend 3 minutes on one slide.
Pitfall: the thin band between bored/confused. It’s actually quite tricky to design talks where a good portion of your audience learns something. A common failure case (as an audience member) is to see talks where I’m painfully bored during the first half and completely confused during the second half, learning nothing by the end. This can occur in talks that have a very general (too general) overview followed by a technical (too technical) second portion. Try to identify when your talk is in danger of having this property.
Pitfall: running out of time. Many speakers spend too much time on the early intro parts (that can often be somewhat boring) and then frantically speed through all the last few slides that contain the most interesting results, analysis or demos. Don’t be that person.
Pitfall: formulaic talks. I might be a special case but I’m always a fan of non-formulaic talks that challenge conventions. For instance, I despise the outline slide. It makes the talk so boring, it’s like saying: “This movie is about a ring of power. In the first chapter we’ll see a hobbit come into possession of the ring. In the second we’ll see him travel to Mordor. In the third he’ll cast the ring into Mount Doom and destroy it. I will start with chapter 1” - Come on! I use outline slides for much longer talks to keep the audience anchored if they zone out (at 30min+ they inevitably will a few times), but it should be used sparingly.
Observe and learn. Ultimately, the best way to become better at giving talks (as it is with writing papers too) is to make a conscious effort to pay attention to what great (and not so great) speakers do and build a binary classifier in your mind. Don’t just enjoy talks; analyze them, break them down, learn from them. Additionally, pay close attention to the audience and their reactions. Sometimes a speaker will put up a complex table with many numbers and you will notice half of the audience immediately look down at their phones and open Facebook. Build an internal classifier of the events that cause this to happen and avoid them in your talks.
Attending conferences
On the subject of conferences:
Go. It’s very important that you go to conferences, especially the 1-2 top conferences in your area. If your adviser lacks funds and does not want to pay for your travel expenses (e.g. if you don’t have a paper) then you should be willing to pay for yourself (usually about $2000 for travel, accommodation, registration and food). This is important because you want to become part of the academic community and get a chance to meet more people in the area and gossip about research topics. Science might have this image of a few brilliant lone wolves working in isolation, but the truth is that research is predominantly a highly social endeavor - you stand on the shoulders of many people, you’re working on problems in parallel with other people, and it is these people that you’re also writing papers for. Additionally, it’s unfortunate but each field has knowledge that doesn’t get serialized into papers but is instead spread across a shared understanding of the community; things such as what are the next important topics to work on, what papers are most interesting, what is the inside scoop on papers, how they developed historically, what methods work (not just on paper, in reality), etc. It is very valuable (and fun!) to become part of the community and get direct access to the hivemind - to learn from it first, and to hopefully influence it later.
Talks: choose by speaker. One conference trick I’ve developed is that if you’re choosing which talks to attend it can be better to look at the speakers instead of the topics. Some people give better talks than others (it’s a skill, and you’ll discover these people in time) and in my experience I find that it often pays off to see them speak even if it is on a topic that isn’t exactly connected to your area of research.
The real action is in the hallways. The speed of innovation (especially in Machine Learning) now works at timescales much faster than conferences so most of the relevant papers you’ll see at the conference are in fact old news. Therefore, conferences are primarily a social event. Instead of attending a talk I encourage you to view the hallway as one of the main events that doesn’t appear on the schedule. It can also be valuable to stroll the poster session and discover some interesting papers and ideas that you may have missed.
It is said that there are three stages to a PhD. In the first stage you look at a related paper’s reference section and you haven’t read most of the papers. In the second stage you recognize all the papers. In the third stage you’ve shared a beer with all the first authors of all the papers.
Closing thoughts
I can’t find the quote anymore but I heard Sam Altman of YC say that there are no shortcuts or cheats when it comes to building a startup. You can’t expect to win in the long run by somehow gaming the system or putting up false appearances. I think that the same applies in academia. Ultimately you’re trying to do good research and push the field forward and if you try to game any of the proxy metrics you won’t be successful in the long run. This is especially so because academia is in fact surprisingly small and highly interconnected, so anything shady you try to do to pad your academic resume (e.g. self-citing a lot, publishing the same idea multiple times with small remixes, resubmitting the same rejected paper over and over again with no changes, conveniently trying to leave out some baselines etc.) will eventually catch up with you and you will not be successful.
So at the end of the day it’s quite simple. Do good work, communicate it properly, people will notice and good things will happen. Have a fun ride!
EDIT: HN discussion link.
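Appendix: PhD thesis
Thesis: Connecting Images and Natural Language
Abstract: A long-standing goal of artificial intelligence is to develop agents that can perceive and understand the rich visual world around us and communicate with us about it in natural language. Thanks to advances in computing infrastructure, data collection, and algorithms over recent years, significant progress has been made toward this goal. Progress has been especially rapid in visual recognition: computers can now classify images with performance comparable to that of humans, and even surpass humans in some cases, such as identifying dog breeds. Yet despite many exciting developments, most advances in visual recognition still concern assigning one or more discrete labels (e.g. person, boat, keyboard) to an image.
In this dissertation we develop models and techniques that connect the domain of visual data with the domain of natural language utterances, enabling translation between elements of the two domains. Specifically, we first introduce a model that embeds both images and sentences into a common multi-modal embedding space. This space then lets us identify images that depict an arbitrary sentence description and, conversely, find sentences that describe an arbitrary image. Second, we develop an image captioning model that generates a sentence description directly from an input image, one that is not restricted to a limited set of human-written choices. Finally, we describe a model that can localize and describe all salient parts of an image. We show that this model can also be used in reverse: given an arbitrary description (e.g. white tennis shoes) as input, it efficiently localizes the described concept in a large collection of images. We argue that these models, the techniques they use internally, and the interactions they enable are a stepping stone on the path toward artificial intelligence, and that connecting images and natural language also brings many practical benefits and immediately valuable applications.
From a modeling perspective, our contribution lies not in designing and presenting explicit algorithms that process images and sentences through complex processing pipelines, but in hybrid designs of convolutional and recurrent neural network architectures that connect visual data and natural language utterances within a single network. As a result, the computational processing of images, sentences, and the multi-modal embedding structures that relate them emerges automatically while optimizing a loss function with respect to the network’s parameters on a training dataset of images and their descriptions. This approach enjoys many of the advantages of neural networks, including the use of simple, homogeneous computations that are easy to parallelize in hardware, and strong performance, since end-to-end training lets the problem be expressed as a single optimization problem in which all components of the model share the same final objective. We show that our models advance the state of the art on tasks that require joint processing of images and natural language, and that the architecture can be designed in a way that facilitates interpretable visual inspection of the network’s predictions.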