Machine Learning in a Week
Getting into machine learning (ML) can seem like an unachievable task from the outside.
And it definitely can be, if you attack it from the wrong end.
However, after dedicating one week to learning the basics of the subject, I found it to be much more accessible than I anticipated.
This article is intended to give others who’re interested in getting into ML a roadmap for getting started, drawing on the experience I gained in my intro week.
Background
Before my machine learning week, I had been reading about the subject for a while, and had gone through half of Andrew Ng’s course on Coursera and a few other theoretical courses. So I had a tiny bit of conceptual understanding of ML, though I was completely unable to transfer any of my knowledge into code. This is what I wanted to change.
I wanted to be able to solve problems with ML by the end of the week, even though this meant skipping a lot of fundamentals and going for a top-down approach instead of a bottom-up one.
After asking for advice on Hacker News, I came to the conclusion that Python’s Scikit Learn module was the best starting point. This module gives you a wealth of algorithms to choose from, reducing the actual machine learning to a few lines of code.
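To give a feel for what "a few lines of code" can look like, here is a minimal sketch. The built-in iris dataset and the k-nearest-neighbors classifier are my own picks for illustration and are not taken from the article; any other Scikit Learn estimator follows the same fit-then-score pattern.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset (assumed here purely for illustration).
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the samples so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The actual machine learning is these calls: create, fit, score.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Swapping in a different algorithm usually means changing only the import and the line that creates `model`.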
Monday: Learning some practicalities
I started off the week by looking for video tutorials that involved Scikit Learn. I finally landed on Sentdex’s tutorial on how to use ML for investing in stocks, which gave me the necessary knowledge to move on to the next step.
The good thing about the Sentdex tutorials is that the instructor takes you through all the steps of gathering the data. As you go along, you realize that fetching and cleaning up the data can be much more time-consuming than doing the actual machine learning. So the ability to write scripts that scrape data from files or crawl the web is an essential skill for aspiring machine learning geeks.
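As a rough idea of what the cleaning step can involve, here is a small sketch using only the Python standard library. The CSV text, the column names `date` and `close`, and the helper `clean_rows` are all made up for illustration; they are not from the Sentdex tutorial.

```python
import csv
import io

# Hypothetical raw export: stray spaces, a missing value, a non-numeric value.
raw = """date,close
2015-01-02, 101.5
2015-01-05,
2015-01-06,n/a
2015-01-07,103.25
"""

def clean_rows(text):
    """Return (date, close) tuples, skipping rows whose price is missing or not numeric."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        value = (record["close"] or "").strip()
        try:
            rows.append((record["date"], float(value)))
        except ValueError:
            continue  # drop rows that cannot be parsed as a number
    return rows

cleaned = clean_rows(raw)
print(cleaned)
```

Real datasets tend to need many more rules than this, which is a big part of why the cleaning can take longer than the modelling.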
I re-watched several of the videos later on when I got stuck on a problem, so I’d recommend you do the same.
However, if you already know how to scrape data from websites, this tutorial might not be the perfect fit, as a lot of the videos revolve around data fetching. In that case, Udacity’s Intro to Machine Learning might be a better place to start.
Tuesday: Applying it to a real problem
On Tuesday I wanted to see if I could use what I had learned to solve an actual problem. As another developer in my coding cooperative was working on the Bank of England’s data visualization competition, I teamed up with him to check out the datasets the bank has released. The most interesting data was their household survey. This is an annual survey the bank performs on a few thousand households, regarding money-related subjects.
The problem we decided to solve was the following:
Given a person’s education level, age and income, can the computer predict their gender?
I played around with the dataset, spent a few hours cleaning up the data, and used the Scikit Learn map to find a suitable algorithm for the problem.
We ended up with a success rate of around 63%, which isn’t impressive at all. But the machine did at least manage to guess a little better than flipping a coin, which would have given a success rate of 50%.
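Comparing a model against a trivial baseline, as with the coin flip above, is easy to sketch in Scikit Learn. The data below is randomly generated and says nothing about the real household survey; the feature ranges, the logistic regression model and the way the label is tied to income are all assumptions chosen only so the example runs.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for the three features (not real survey data).
education = rng.integers(0, 5, size=n)      # hypothetical levels 0..4
age = rng.integers(18, 80, size=n)
income = rng.normal(30000, 8000, size=n)

# Synthetic binary label, weakly tied to income so there is some signal.
label = (income + rng.normal(0, 12000, size=n) > 30000).astype(int)

X = np.column_stack([education, age, income])
X_train, X_test, y_train, y_test = train_test_split(
    X, label, test_size=0.25, random_state=0
)

# Baseline: always predict the most common class in the training set.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Model: scale the features, then fit a logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
model_acc = model.score(X_test, y_test)

print(f"baseline: {baseline_acc:.2f}, model: {model_acc:.2f}")
```

A baseline like this makes it clear how much of a score is the model's doing and how much would have come from guessing.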
Seeing results is like fuel for your motivation, so I’d recommend doing this yourself once you have a basic grasp of how to use Scikit Learn.
It’s a pivotal moment when you realize that you can start using ML to solve real-life problems.
Wednesday: From the ground up
After playing around with various Scikit Learn modules, I decided to try to write a linear regression algorithm from the ground up.
I wanted to do this because I felt (and still feel) that I really don’t understand what’s happening under the hood.
Luckily, the Coursera course goes into detail on how a few of the algorithms work, which came in very handy at this point. More specifically, it describes the underlying concepts of using linear regression with gradient descent.
This has definitely been the most effective learning technique, as it forces you to understand the steps that are going on ‘under the hood’. I strongly recommend you do this at some point.
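For reference, a from-scratch version of linear regression with gradient descent can be quite short. This is a minimal sketch with NumPy for a single feature, not the code I wrote that week; the function name, learning rate, iteration count and toy data are all assumptions.

```python
import numpy as np

def fit_linear_regression(x, y, learning_rate=0.5, n_iters=2000):
    """Fit y ≈ w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        error = (w * x + b) - y           # prediction minus target
        # Gradients of (1 / (2n)) * sum(error ** 2) with respect to w and b.
        grad_w = (error @ x) / n
        grad_b = error.sum() / n
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data lying exactly on the line y = 2x + 1.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0

w, b = fit_linear_regression(x, y)
print(f"w = {w:.3f}, b = {b:.3f}")
```

The learning rate matters: too large and the updates diverge, too small and the loop needs far more iterations to get close to the true slope and intercept.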
I plan to write my own implementations of more complex algorithms as I go along, but I prefer to do this after I’ve played around with the respective algorithms in Scikit Learn.
Thursday: Start competing
On Thursday, I started doing Kaggle’s introductory tutorials. Kaggle is a platform for machine learning competitions, where you can submit solutions to problems released by companies or organizations.
I recommend trying out Kaggle once you have a little bit of theoretical and practical understanding of machine learning. You’ll need this in order to start using Kaggle. Otherwise, it will be more frustrating than rewarding.
The Bag of Words tutorial guides you through every step you need to take in order to enter a submission to a competition, and also gives you a brief and exciting introduction to natural language processing (NLP). I ended the tutorial with much higher interest in NLP than I had when I started it.
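The bag-of-words idea itself is simple: each text becomes a vector of word counts, ignoring word order. Here is a minimal sketch using Scikit Learn's `CountVectorizer`; the two example sentences are made up, and this is not the tutorial's own code.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two made-up review snippets.
reviews = [
    "a great movie with a great cast",
    "a dull movie",
]

# With default settings, tokens shorter than two characters (such as "a") are dropped.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(reviews)

# vocabulary_ maps each word to its column index; sort the words by that index.
vocabulary = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

print(vocabulary)
print(counts.toarray())
```

Each row of the resulting matrix can then be fed to a classifier like any other numeric feature vector.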
Friday: Back to school
On Friday, I continued working on the Kaggle tutorials, and also started Udacity’s Intro to Machine Learning. I’m currently halfway through, and find it quite enjoyable.
It’s a lot easier than the Coursera course, as it doesn’t go into depth on the algorithms. But it’s also more practical, as it teaches you Scikit Learn, which is a whole lot easier to apply to the real world than writing algorithms from the ground up in Octave, as you do in the Coursera course.
The road ahead
Doing this for a week hasn’t just been great fun; it has also made me more aware of how useful machine learning is in society. The more I learn about it, the more areas I see where it can be used to solve problems.
If you’re interested in getting into machine learning, I strongly recommend you set aside a few days or evenings and simply dive into it.
Choose a top-down approach if you’re not ready for the heavy stuff, and get into problem solving as quickly as possible.