Machine Learning in a Week: A Hands-On Introduction


Getting into machine learning (ML) can seem like an unachievable task from the outside.

And it definitely can be, if you attack it from the wrong end.

However, after dedicating one week to learning the basics of the subject, I found it to be much more accessible than I had anticipated.

This article is intended to give others who are interested in getting into ML a roadmap for getting started, drawing on the experience I gained during my intro week.


Background

Before my machine learning week, I had been reading about the subject for a while, and had gone through half of Andrew Ng's course on Coursera and a few other theoretical courses. So I had a tiny bit of conceptual understanding of ML, though I was completely unable to transfer any of that knowledge into code. This is what I wanted to change.

I wanted to be able to solve problems with ML by the end of the week, even though this meant skipping a lot of fundamentals and going for a top-down approach instead of a bottom-up one.

After asking for advice on Hacker News, I came to the conclusion that Python's Scikit Learn module was the best starting point. This module gives you a wealth of algorithms to choose from, reducing the actual machine learning to a few lines of code.
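To give a feel for what "a few lines of code" can look like, here is a minimal sketch. The dataset (the built-in iris data) and the choice of a decision tree are assumptions made purely for illustration; the article does not say which data or algorithm were used at this stage.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset (an illustrative choice, not from the article).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The "actual machine learning": create a model, fit it, score it.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Most Scikit Learn estimators follow the same create, `fit`, `score` pattern, so swapping in another algorithm usually means changing only the line that builds `model`.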


Monday: Learning some practicalities

I started off the week by looking for video tutorials that involved Scikit Learn. I finally landed on Sentdex's tutorial on how to use ML for investing in stocks, which gave me the knowledge I needed to move on to the next step.

The good thing about the Sentdex tutorial is that the instructor takes you through all the steps of gathering the data. As you go along, you realize that fetching and cleaning up the data can be much more time-consuming than doing the actual machine learning. So the ability to write scripts that scrape data from files or crawl the web is an essential skill for aspiring machine learning geeks.
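As a small illustration of the kind of cleanup script this involves, the sketch below parses a messy CSV export using only the standard library. The column names and the sample rows are hypothetical and not taken from the tutorial; dropping unparseable rows is just one possible policy.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, a missing value, a non-numeric value.
RAW = """age,income
34, 42000
,38000
51,n/a
 29 ,51000
"""

def clean_rows(text):
    """Return rows as dicts of ints, skipping rows that cannot be parsed."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            cleaned.append({key: int(value.strip()) for key, value in row.items()})
        except ValueError:
            # Empty or non-numeric field: drop the row rather than guess a value.
            continue
    return cleaned

rows = clean_rows(RAW)
print(rows)
```

Here two of the four rows survive. In a real project you would also want to count the discarded rows, since silently losing a large share of the data changes what the model learns.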

I re-watched several of the videos later on to help me whenever I got stuck on a problem, so I'd recommend you do the same.

However, if you already know how to scrape data from websites, this tutorial might not be the perfect fit, as a lot of the videos revolve around data fetching. In that case, Udacity's Intro to Machine Learning might be a better place to start.


Tuesday: Applying it to a real problem

On Tuesday I wanted to see if I could use what I had learned to solve an actual problem. As another developer in my coding cooperative was working on the Bank of England's data visualization competition, I teamed up with him to check out the datasets the bank had released. The most interesting data was their household survey. This is an annual survey the bank performs on a few thousand households regarding money-related subjects.

The problem we decided to solve was the following:

Given a person's education level, age and income, can the computer predict their gender?

I played around with the dataset, spent a few hours cleaning up the data, and used the Scikit Learn map to find a suitable algorithm for the problem.

We ended up with a success rate of around 63%, which isn't impressive at all. But the machine did at least manage to guess a little better than flipping a coin, which would have given a success rate of 50%.
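The comparison against a coin flip can be sketched as follows. Everything here is an assumption for illustration: the data is synthetic (it is not the Bank of England survey), the label is an arbitrary binary target weakly tied to income, and logistic regression stands in for whichever algorithm was actually picked.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for the survey columns (NOT the real survey data).
education = rng.integers(0, 5, size=n)          # ordinal level 0..4
age = rng.integers(18, 80, size=n)
income = rng.normal(30_000, 10_000, size=n)

# Synthetic binary label, weakly tied to income so there is some signal to find.
noise = rng.normal(0, 15_000, size=n)
label = (income + noise > 30_000).astype(int)

X = np.column_stack([education, age, income])
X_train, X_test, y_train, y_test = train_test_split(
    X, label, test_size=0.25, random_state=0
)

# Baseline that always predicts the most common class in the training data.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Scale the features, then fit a simple linear classifier.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
model_acc = model.score(X_test, y_test)

print(f"baseline: {baseline_acc:.2f}, model: {model_acc:.2f}")
```

Reporting the baseline next to the model's score is what makes a number like 63% interpretable: on roughly balanced classes the baseline sits near 50%, so the gap between the two is the part the model actually learned.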

Seeing results is like fuel for your motivation, so I'd recommend doing this yourself once you have a basic grasp of how to use Scikit Learn.

It's a pivotal moment when you realize that you can start using ML to solve real-life problems.


Wednesday: From the ground up

After playing around with various Scikit Learn modules, I decided to try writing a linear regression algorithm from the ground up.

I wanted to do this because I felt (and still feel) that I don't really understand what's happening under the hood.

Luckily, the Coursera course goes into detail on how a few of the algorithms work, which came in very handy at this point. More specifically, it describes the underlying concepts of linear regression with gradient descent.
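For readers who want to try the same exercise, here is a minimal sketch of linear regression fitted by batch gradient descent in plain Python. It is not the author's implementation (which is not shown in the article); the function name `fit_line`, the learning rate, the step count and the toy data are all assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors with the current parameters.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of (1 / (2n)) * sum(error^2) with respect to w and b.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Step downhill; both parameters are updated from the same errors.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data that lies exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

On this toy data the parameters should settle very close to w = 2 and b = 1. A learning rate that is too large makes the updates diverge instead, which is worth provoking once on purpose to see what it looks like.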

This has definitely been the most effective learning technique, as it forces you to understand the steps that are going on "under the hood". I strongly recommend you do this at some point.

I plan to write my own implementations of more complex algorithms as I go along, but I prefer doing this after I've played around with the respective algorithms in Scikit Learn.


Thursday: Start competing

On Thursday, I started doing Kaggle's introductory tutorials. Kaggle is a platform for machine learning competitions, where you can submit solutions to problems released by companies or organizations.

I recommend trying out Kaggle once you have a little theoretical and practical understanding of machine learning. You'll need this in order to start using Kaggle. Otherwise, it will be more frustrating than rewarding.

The Bag of Words tutorial guides you through every step you need to take to enter a submission to a competition, and gives you a brief and exciting introduction to natural language processing (NLP). I ended the tutorial with a much higher interest in NLP than I had when I started it.
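The core idea behind a bag of words can be sketched in a few lines of plain Python: each document becomes a vector of word counts over a shared vocabulary, ignoring word order. This is a simplified illustration, not the code from the Kaggle tutorial; the tokenizer (lowercase runs of letters) and the sample reviews are assumptions.

```python
import re
from collections import Counter

def bag_of_words(documents):
    """Return (vocabulary, count vectors) for a list of text documents."""
    # Naive tokenizer: lowercase the text and keep runs of letters only.
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    # Shared, alphabetically sorted vocabulary across all documents.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One count per vocabulary word; words absent from the document get 0.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

reviews = ["The movie was great", "The movie was terrible, just terrible"]
vocabulary, vectors = bag_of_words(reviews)
print(vocabulary)
for vector in vectors:
    print(vector)
```

The resulting count vectors are plain numeric features, so they can be fed to an ordinary classifier in the same way as columns such as age or income.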


Friday: Back to school

On Friday, I continued working on the Kaggle tutorials, and also started Udacity's Intro to Machine Learning. I'm currently halfway through and find it quite enjoyable.

It's a lot easier than the Coursera course, as it doesn't go into depth on the algorithms. But it's also more practical, as it teaches you Scikit Learn, which is a whole lot easier to apply to the real world than writing algorithms from the ground up in Octave, as you do in the Coursera course.


The road ahead

Doing it for a week hasn't just been great fun; it has also made me more aware of how useful machine learning is in society. The more I learn about it, the more areas I see where it can be used to solve problems.

If you're interested in getting into machine learning, I strongly recommend setting aside a few days or evenings and simply diving into it.

Choose a top-down approach if you're not ready for the heavy stuff, and get into problem solving as quickly as possible.

Good luck!
