Why a Mathematician, Statistician, & Machine Learner Solve the Same Problem Differently

Why Machine Learning and Statistics Approach the Same Task Differently

At a glance, machine learning and statistics seem very similar, and the differences between the two disciplines are often underemphasized. They share the same goal (both focus on data modeling), but their methods are shaped by cultural differences. To enable collaboration and knowledge creation, it is important to understand the fundamental differences reflected in each discipline's culture. To gain a deeper understanding of these differences, we need to take a step back and look at their historical roots.

A Brief History of Machine Learning and Statistics

In 1946, ENIAC, one of the first general-purpose electronic computers, was unveiled with the vision of performing numerical computation by machine instead of by hand with pencil and paper. The underlying idea at that time was that human thinking and learning could be replicated in a logical format in a machine.

In 1950, Alan Turing, widely regarded as a father of artificial intelligence (AI), proposed a test to measure to what extent a machine can learn and perform like a human. Later that decade, Frank Rosenblatt invented the perceptron at the Cornell Aeronautical Laboratory. The idea behind this invention was that a perceptron behaves much like a linear classifier. He showed that by combining a large number of perceptrons, we can create a powerful network model, which we now know as a neural network.
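To make the link between a perceptron and a linear classifier concrete, here is a minimal sketch in Python. It is an illustration only and is not taken from Rosenblatt's work: the function names, the learning rate, the number of epochs, and the toy dataset (the logical AND of two inputs) are all assumptions chosen for brevity.

```python
from typing import List, Tuple


def predict(weights: List[float], bias: float, x: List[float]) -> int:
    """Linear decision rule: output 1 if w.x + b is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(
    samples: List[List[float]],
    labels: List[int],
    epochs: int = 20,        # hypothetical setting for this toy example
    learning_rate: float = 1.0,
) -> Tuple[List[float], float]:
    """Classic perceptron update: nudge the weights toward misclassified points."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Toy, linearly separable data (assumed for illustration): logical AND.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)
```

The learned weights and bias define a straight line in the input plane, which is why a single perceptron can only separate classes that are linearly separable.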

The study of machine learning grew out of the efforts of a handful of computer engineers exploring whether computers could learn and mimic the human brain. Today, machine learning plays a crucial role in the discovery of knowledge from data and has an enormous number of applications.

The field of statistics began in the second half of the seventeenth century. The idea behind the development of this discipline was to measure uncertainty in experimental and observational science, as the basis for probability theory. Statistics, from its outset, was meant to provide tools to not only “describe” but, more importantly, “explain” phenomena.

Interestingly enough, beer has had a large influence on the development of statistics. One of the foundational concepts of the field, the t-statistic, was introduced by a chemist as a way to account for differences between batches at the Guinness brewery in Dublin, Ireland. This and other concepts led to the development of a structured mathematical theory with well-defined definitions and principles. Statistics developed instruments that people could use to increase their power of observation, permutation, prediction, and sampling.
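As a rough illustration of how a t-statistic compares two batches, the sketch below computes the two-sample (Welch) form using only the Python standard library. The choice of the Welch form is an assumption, since the text does not say which variant was used at the brewery, and the batch measurements are invented purely for the example.

```python
import math
import statistics
from typing import Sequence


def welch_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Difference in sample means divided by its estimated standard error."""
    mean_diff = statistics.mean(a) - statistics.mean(b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return mean_diff / standard_error


# Hypothetical measurements from two batches (not real brewery data).
batch_a = [4.0, 5.0, 6.0]
batch_b = [1.0, 2.0, 3.0]

t_value = welch_t_statistic(batch_a, batch_b)
print(round(t_value, 4))
```

A large absolute value suggests that the gap between batch means is big relative to the batch-to-batch noise; turning that into a p-value would additionally require the t-distribution, which is left out of this sketch.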

The Difference is Cultural

Capturing real-world phenomena is an exercise in dealing with uncertainty. To do so, statisticians must understand the underlying distribution of the population under study and come up with parameters that provide predictive power. The goal for a statistician is to predict an interaction between variables with some degree of certainty (we are never 100% certain about anything). Machine learners, on the other hand, want to build algorithms that predict, classify, and cluster as accurately as possible. They typically make fewer explicit assumptions about the underlying distribution and put less emphasis on quantifying uncertainty, learning continuously in order to improve their accuracy score.
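The contrast can be sketched on a single synthetic dataset. In the hedged example below, the "statistician" step estimates a slope together with an approximate 95% confidence interval, while the "machine learner" step holds out part of the data and reports prediction error. Everything here is an assumption made for illustration: the data-generating line, the noise level, the 150/50 split, and the use of the normal-approximation multiplier 1.96.

```python
import math
import random
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slope_standard_error(xs: List[float], ys: List[float],
                         slope: float, intercept: float) -> float:
    """Standard error of the OLS slope under the usual linear-model assumptions."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(rss / (n - 2) / sxx)


# Synthetic data (assumed): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]

# Statistician's view: estimate the parameter and quantify its uncertainty.
slope, intercept = fit_line(xs, ys)
se = slope_standard_error(xs, ys, slope, intercept)
ci = (slope - 1.96 * se, slope + 1.96 * se)  # approximate 95% interval
print(f"slope = {slope:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")

# Machine learner's view: fit on training data, score on held-out data.
indices = list(range(len(xs)))
rng.shuffle(indices)
train, test = indices[:150], indices[150:]
m_slope, m_intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
test_mse = sum(
    (ys[i] - (m_intercept + m_slope * xs[i])) ** 2 for i in test
) / len(test)
print(f"held-out mean squared error = {test_mse:.3f}")
```

Both views fit the same model here; what differs is the question asked of it, which is the cultural difference the paragraph above describes.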

Here’s a snapshot that captures the cultural differences between machine learners’ and statisticians’ approaches:

[Figure: comparison of machine learners’ and statisticians’ approaches]

So why should we care?

Better, More Informed Decisions

A thorough understanding of the differences in culture and jargon between the two disciplines will result in more productive communication. Better communication, in turn, should lead to better collaboration and improved decision-making among teams.

Many times, professionals in statistics or machine learning make assumptions about how others might approach a problem. Peter Norvig, director of research at Google, once told a story that’s a great example of how this can backfire.

Norvig teamed up with a Stanford statistician to prove that statisticians, data scientists, and mathematicians think the same way. They hypothesized that if they all received the same dataset, worked on it, and came back together, they would find they had all independently used the same techniques. So they got a very large dataset and shared it among them.

Norvig used the whole dataset and built a complex predictive model. The statistician took a 1% sample of the dataset, discarded the rest, and showed that the data met certain assumptions.

The mathematician, believe it or not, didn’t even look at the dataset. Rather, he proved the characteristics of various formulas that could (in theory) be applied to the data.

Instead of showing that people in these fields work the same way, Norvig’s experiment demonstrated that communication is essential if people in these disciplines want to tackle tough problems together.

Narrowing the Gap

Understanding one's counterpart and their cultural background enables machine learners and statisticians to expand their knowledge and even apply methods outside their domain of expertise. This is the notion of “data science” itself, which aims to bridge the gap. Collaboration and communication between these two fascinating data-driven disciplines, machine learning and statistics, allow us to make better decisions that will ultimately improve the way we live.

About the Authors:

Nir Kaldero is the Director of Data Science and the Head of Galvanize Experts at Galvanize, Inc. Nir also serves on the Faculty of the Master’s of Science in Data Science, powered by the University of New Haven.

Dr. Donatella Taurasi is a lecturer and scholar at Haas School of Business and the Fung Institute for Engineering Leadership in Berkeley, and at Hult International Business School in San Francisco.
