
scikit-learn in Python – the most important Machine Learning tool I learnt last year!

This article went through a series of changes!

I was initially writing on a different topic (related to analytics), and I had almost finished it. I had put in about 2 hours and written an average article. Had I published it, it would have done OK! But something in me stopped me from publishing it. I was just not satisfied with the output: the article didn’t convey how I feel about 2015 and how useful Analytics Vidhya could become for your analytics learning this year.

So, I put that article in the trash and started rethinking which topic would do justice to that feeling. This is what I ended up with: let me write awesome articles and guides about my biggest learning of 2014, the scikit-learn library in Python. It was my biggest learning because it is now the tool I use for every machine learning project I work on.

Creating these articles would not only be immensely useful for readers of the blog, but would also challenge me to write about something I am still relatively new to. I would also love to hear from you: what was your biggest learning in 2014, and would you like to share it with readers of this blog?

What is scikit-learn?

Scikit-learn is probably the most useful library for machine learning in Python. Built on NumPy, SciPy and matplotlib, it contains a lot of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction.
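Because scikit-learn is built on NumPy, its estimators take plain NumPy arrays as input and return NumPy arrays. Here is a minimal sketch, assuming a toy dataset we generate ourselves (10 points on the line y = 2x + 1, purely illustrative), fitting a linear regression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (illustrative): 10 points lying exactly on y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y = 2 * X.ravel() + 1

model = LinearRegression()
model.fit(X, y)

print(model.coef_, model.intercept_)  # approximately [2.] and 1.0
print(type(model.predict(X)))         # <class 'numpy.ndarray'>
```

The same `fit` / `predict` pattern applies to the classification, clustering and dimensionality-reduction tools as well.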

Please note that scikit-learn is used to build models. It is not meant for reading, manipulating or summarizing data; there are better libraries for that (e.g. NumPy, pandas).

Components of scikit-learn:

Scikit-learn comes loaded with a lot of features. Here are a few of them to help you understand its breadth:

Supervised learning algorithms: Think of any supervised learning algorithm you might have heard about and there is a very high chance that it is part of scikit-learn. From generalized linear models (e.g. linear regression) through Support Vector Machines (SVM) and decision trees to Bayesian methods, all of them are part of the scikit-learn toolbox. This breadth of algorithms is one of the big reasons for scikit-learn's wide adoption. I started by using scikit-learn to solve supervised learning problems and would recommend the same to people new to scikit-learn or machine learning.
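What makes this breadth practical is that the estimators share the same `fit` / `predict` / `score` methods, so swapping one algorithm for another is usually a one-line change. A rough sketch on the Iris data; the four models and their settings are our own choices for illustration, and the printed scores are accuracy on the training data, not an estimate of how well each model generalizes:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A generalized linear model, an SVM, a decision tree and a Bayesian method
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "support vector machine": SVC(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "gaussian naive bayes": GaussianNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X, y)                   # identical call for every algorithm
    scores[name] = model.score(X, y)  # mean accuracy on the training data
    print(f"{name}: {scores[name]:.3f}")
```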

Cross-validation: There are various methods to check the accuracy of supervised models on unseen data.
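One such method is k-fold cross-validation: the data is split into k folds, the model is trained on k-1 of them and scored on the fold it did not see, and this is repeated k times. A minimal sketch using `cross_val_score`, which in current scikit-learn releases lives in `sklearn.model_selection`; the choice of 5 folds and a logistic regression on Iris is ours, for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Each of the 5 scores is accuracy on a fold held out from training
scores = cross_val_score(model, X, y, cv=5)
print(scores)
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Averaging over the folds gives a steadier estimate than a single train/test split.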

Unsupervised learning algorithms: Again, there is a large range of algorithms on offer, from clustering, factor analysis and principal component analysis to unsupervised neural networks.
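A short sketch of two of these on the Iris measurements, ignoring the labels: principal component analysis to compress the 4 features into 2, and k-means clustering into 3 groups. The number of components, the number of clusters and the random seed are assumptions chosen for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # labels are discarded: this is unsupervised

# Principal component analysis: project the 4 features onto 2 components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # share of variance kept by each component

# Clustering: group the samples into 3 clusters without looking at the labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
print(cluster_ids[:10])               # cluster index assigned to the first 10 samples
```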

Various toy datasets: These came in handy while learning scikit-learn. I had learnt SAS using various academic datasets (e.g. the Iris dataset and the Boston house prices dataset), so having them at hand while learning a new library helped a lot. (The Boston dataset has since been removed from scikit-learn, as of version 1.2.)
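Loading one of these bundled datasets takes a single call and needs no download. A quick look at the Iris dataset, whose attribute names below are those of the object `load_iris()` returns:

```python
from sklearn import datasets

iris = datasets.load_iris()

print(iris.data.shape)     # (150, 4): 150 flowers, 4 measurements each
print(iris.feature_names)  # sepal/petal length and width, in cm
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(iris.target[:5])     # integer class labels: [0 0 0 0 0]
```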

Feature extraction: Useful for extracting features from images and text (e.g. bag of words).

Community / Organizations using scikit-learn:

One of the main reasons for using open source tools is the huge community behind them, and the same is true for scikit-learn. There are about 35 contributors to scikit-learn to date, the most notable being Andreas Mueller (P.S. Andy’s machine learning cheat sheet is one of the best visualizations for understanding the spectrum of machine learning algorithms).

Organizations such as Evernote, Inria and AWeber are listed as users on the scikit-learn home page, but I truly believe that actual usage is far wider.

In addition to these communities, there are various meetups across the globe. There was also a Kaggle knowledge contest, which finished recently but might still be one of the best places to start playing around with the library.

Machine Learning cheat sheet

Quick Example:

Now that you understand the ecosystem at a high level, let me illustrate the use of scikit-learn with an example. The idea is just to show how simple scikit-learn is to use. We will look at various algorithms and the best ways to use them in the articles that follow.

We will build a logistic regression model on the Iris dataset:

Step 1: Import the relevant libraries and read the dataset

import numpy as np
import matplotlib.pyplot as plt  # for plots during exploratory analysis (Step 2)
from sklearn import datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

We have imported all the libraries. Next, we read the dataset:

dataset = datasets.load_iris()

Step 2: Understand the dataset by looking at distributions and plots

I am skipping these steps for now. You can read this article if you want to learn exploratory analysis.
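If you want at least a quick numeric look before modelling, NumPy alone can summarize the data. A minimal sketch (no substitute for proper exploratory analysis or plots) that checks the class balance and the average measurements per species; it reloads the dataset so it runs on its own:

```python
import numpy as np
from sklearn import datasets

dataset = datasets.load_iris()

# How many samples belong to each class?
print(np.bincount(dataset.target))  # [50 50 50]

# Mean of each of the 4 features within each class
for label, name in enumerate(dataset.target_names):
    class_means = dataset.data[dataset.target == label].mean(axis=0)
    print(name, np.round(class_means, 2))
```

Balanced classes, as here, mean plain accuracy is a reasonable first metric in Step 4.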

Step 3: Build a logistic regression model on the dataset and make predictions

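model = LogisticRegression(max_iter=1000)  # a higher iteration cap helps the default solver converge on unscaled features
model.fit(dataset.data, dataset.target)  # learn from the 4 measurements and the species labels
expected = dataset.target
predicted = model.predict(dataset.data)  # note: predicting on the same data used for training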

Step 4: Print the classification report and confusion matrix

print(metrics.classification_report(expected, predicted))  # precision, recall and F1 score for each class
print(metrics.confusion_matrix(expected, predicted))  # rows: actual classes, columns: predicted classes
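Steps 3 and 4 score the model on the same data it was trained on, which tends to overstate accuracy. A minimal sketch of a held-out evaluation, assuming a 70/30 stratified split with a fixed random seed (these choices are ours, not part of the example above):

```python
from sklearn import datasets, metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

dataset = datasets.load_iris()

# Keep 30% of the samples aside; stratify so each class keeps its share in both parts
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.3, stratify=dataset.target, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)        # train only on the training split

predicted = model.predict(X_test)  # predict samples the model has never seen
print(metrics.classification_report(y_test, predicted))
print(metrics.confusion_matrix(y_test, predicted))
```

The report on the 45 held-out flowers is a fairer picture of how the model would do on new data.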

End Notes:

This was an overview of one of the most powerful and versatile machine learning libraries in Python. It was also my biggest learning of 2014. What was yours? Please share it with the group through the comments below.

Are you excited about learning and using scikit-learn? If yes, stay tuned for the remaining articles in this series.
