[Translation] A Four-Step Resource Roundup for Python Machine Learning Beginners

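Translation of: Four steps to master machine learning with python (including free books & resources), from LernPython. The article itself is poor, but its collection of resources is decent, so treat this post as a bookmark for now. Later I plan to translate the better books and papers, and more people are welcome to join in.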

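To understand and study machine learning, you should first master Python or R, both of which are languages similar to C, Java, and PHP (translator: they are very different, actually). Python and R are both relatively young (translator: I don't follow; Python is hardly young), and they are higher-level languages that require no understanding of low-level details (translator: ?), so both are easy to learn. Python's real advantage is that it can handle more kinds of problems, such as machine learning, algorithms, and image processing, whereas R is limited to data processing and analysis. Python also has a wider range of applications, such as the backend framework Django (translator: the original reads 'Hosting websites: Jango'), natural language processing (translator: the original reads 'natural language proecssing'; the author was careless, this is NLP), website access, and so on. Python is also more like C (translator: nonsense), which is why it is so popular now.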

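The Russian author's original contains quite a few errors, which I have corrected according to my own understanding; treat it as reference only. I fixed grammar and wording errors directly. Where the original author's content itself is wrong, I left it as written and added a note in parentheses.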

Four steps for beginners to do machine learning with Python

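  • Learn the Python basics, using books, MOOCs, and videos.
  • To work with data, you need to know some modules, such as Pandas, NumPy, Matplotlib, and Natural Language Processing.
  • Next you need to collect data, either through an API or by scraping websites directly. Web scraping module: BeautifulSoup (translator: this should be Scrapy; BS is an HTML/XML parser). The collected data is then used to train algorithms.
  • The last step is to learn the relevant ML algorithms and the tool Scikit-learn.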

1. Learning Python

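The quickest, no-frills way to learn Python is to sign up for an account at Codecademy and work through the basics. LearnPythonTheHardWay is a classic site recommended by many programmers. Byte of Python is well worth studying. The Python community also maintains a list of Python learning resources for beginners. Think Python, a book published by O'Reilly, is available as a free download. Finally, Introduction to Python for Econometrics, Statistics and Data Analysis also covers a lot of Python basics.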

2. Importing modules

Several modules and tools matter a great deal for machine learning: NumPy, Pandas, Matplotlib, and IPython. The book Data Analysis with Open Source Tools touches on all of them. Introduction to Python for Econometrics, Statistics and Data Analysis, mentioned above, covers them as well. Another book is Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. Some free resources follow:

3. Scraping and mining data

Once you have the Python basics down, the next step is learning how to collect data, that is, web scraping. Sites such as Twitter and LinkedIn provide APIs for retrieving text data. A few good books on the subject: Mining the Social Web (free) and Web Scraping with Python: Collecting Data from the Modern Web.
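To make the scraping step a little more concrete, here is a minimal sketch that pulls links out of an HTML document using only Python's standard-library `html.parser`. It is not taken from the original article: the sample HTML string and the `LinkCollector` class name are made up for illustration, and a real scraper would normally fetch the page over HTTP and might well use BeautifulSoup or Scrapy instead.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical sample page; in practice this string would come from an HTTP response.
SAMPLE_HTML = """
<html><body>
  <h1>Reading list</h1>
  <a href="/books/think-python">Think Python</a>
  <a href="/books/byte-of-python">A Byte of <b>Python</b></a>
</body></html>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
for href, text in collector.links:
    print(href, "->", text)
```

The parser keeps a small amount of state (the currently open link) so that text nested in child tags such as `<b>` still ends up in the link's label.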

Finally, this text data has to be converted into numeric data with NLP techniques: Natural Language Processing with Python. Images and video call for computer vision (CV) techniques. Some good resources: Programming Computer Vision with Python (free), Programming Computer Vision with Python: Tools and algorithms for analyzing images, and Practical Python and OpenCV.
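As one illustration of what "converting text into numeric data" can mean, the sketch below builds a bag-of-words count matrix with only the standard library. The two sample sentences, the helper names `tokenize` and `bag_of_words`, and the simple lowercase word-splitting rule are all assumptions made for this example; NLP libraries such as NLTK offer far more robust tokenization.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts word j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for words that a document does not contain.
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


# Made-up sample documents.
docs = [
    "Python is easy to learn",
    "Machine learning with Python is fun",
]
vocab, matrix = bag_of_words(docs)
print(vocab)
for row in matrix:
    print(row)
```

Each document becomes a fixed-length vector of word counts, which is the kind of numeric input a learning algorithm can consume.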

Some examples of Python scrapers:

4. Machine learning

Machine learning can be divided into four parts: classification, clustering, regression, and dimensionality reduction.
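To give a feel for how two of these task types differ, here is a toy sketch of regression (predicting a continuous value) and classification (predicting a discrete label). It deliberately uses only the standard library rather than the Scikit-learn API, the data is made up, and the function names `fit_line`, `fit_centroids`, and `predict_label` are hypothetical. The nearest-centroid rule is just one simple classifier chosen for brevity; clustering and dimensionality reduction are not shown.

```python
from statistics import mean


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def fit_centroids(points, labels):
    """Classification: compute one mean point (centroid) per class label."""
    centroids = {}
    for label in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == label]
        centroids[label] = tuple(mean(coord) for coord in zip(*members))
    return centroids


def predict_label(centroids, point):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], point))
    return min(centroids, key=squared_distance)


# Made-up toy data: the regression target is continuous, the class label is discrete.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print("regression:", slope, intercept)

centroids = fit_centroids(
    [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)],
    ["small", "small", "large", "large"],
)
print("classification:", predict_label(centroids, (2.0, 1.5)))
```

Both functions learn from labeled examples; clustering and dimensionality reduction differ in that they work on data without labels.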


Machine learning in Python

The Scikit-learn website has many guides; here are some others:

Books:

Machine learning blogs and courses

Online courses: Collection of links. MOOCs: machine learning, Data Analyst Nanodegree.
Here are some blogs.

Machine learning theory

Books:

Also: Watch 15 hours theory of machine learning!

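The further I read, the less I felt like translating. There is really not much substance here, so I will simply list the resources. Below is the reading list recommended by Lin Dahua (林達華), an MIT PhD and a leading ML researcher.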

Machine Learning

Pattern Recognition and Machine Learning

By Christopher M. Bishop
A new treatment of classic machine learning topics, such as classification, regression, and time series analysis from a Bayesian perspective. It is a must read for people who intend to perform research on Bayesian learning and probabilistic inference.

Graphical Models, Exponential Families, and Variational Inference

By Martin J. Wainwright and Michael I. Jordan
It is a comprehensive and brilliant presentation of three closely related subjects: graphical models, exponential families, and variational inference. This is the best manuscript that I have ever read on this subject. Strongly recommended to everyone interested in graphical models. The connections between various inference algorithms and convex optimization are clearly explained. Note: a pdf version of this book is freely available online.

Big Data: A Revolution That Will Transform How We Live, Work, and Think

By Viktor Mayer-Schönberger and Kenneth Cukier
A short but insightful manuscript that will motivate you to rethink how we should face the explosive growth of data in the new century.

Statistical Pattern Recognition (2nd/3rd Edition)

By Andrew R. Webb, and Keith D. Copsey
A well written book on pattern recognition for beginners. It covers basic topics in this field, including discriminant analysis, decision trees, feature selection, and clustering -- all are basic knowledge that researchers in machine learning or pattern recognition should understand.

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

By Bernhard Schölkopf and Alexander J. Smola
A comprehensive and in-depth treatment of kernel methods and support vector machines. It not only clearly develops the mathematical foundation, namely the reproducing kernel Hilbert space, but also gives a lot of practical guidance (e.g. how to choose or design kernels).

Mathematics

Topology (2nd Edition)

By James Munkres
A classic on topology for beginners. It provides a clear introduction to important concepts in general topology, such as continuity, connectedness, compactness, and metric spaces, which are the fundamentals that you have to grasp before embarking on more advanced subjects such as real analysis.

Introductory Functional Analysis with Applications

By Erwin Kreyszig
It is a very well written book on functional analysis that I would like to recommend to everyone who would like to study this subject for the first time. Starting from simple notions such as metrics and norms, the book gradually unfolds the beauty of functional analysis, exposing important topics including Banach spaces, Hilbert spaces, and spectral theory with reasonable depth and breadth. Most of the important concepts needed in machine learning are covered by this book. The exercises are of great help in reinforcing your understanding.

Real Analysis and Probability (Cambridge Studies in Advanced Mathematics)

By R. M. Dudley
This is a dense text that combines real analysis and modern probability theory in 500+ pages. What I like about this book is its treatment that emphasizes the interplay between real analysis and probability theory. Also, the exposition of measure theory based on semi-rings gives deep insight into the algebraic structure of measures.

Convex Optimization

By Stephen Boyd, and Lieven Vandenberghe
A classic on convex optimization. Everyone I know who has read this book liked it. The presentation style is very comfortable and inspiring, and it assumes only minimal prerequisites in linear algebra and calculus. Strongly recommended for any beginner in optimization. Note: the pdf of this book is freely available on Prof. Boyd's website.

Nonlinear Programming (2nd Edition)

By Dimitri P. Bertsekas
A thorough treatment of nonlinear optimization. It covers gradient-based techniques, Lagrange multiplier theory, and convex programming. Part of this book overlaps with Boyd's. Overall, it goes deeper and takes more effort to read.

Introduction to Smooth Manifolds

By John M. Lee
This is the book that I used to learn differential geometry and Lie group theory. It provides a detailed introduction to the basics of modern differential geometry -- manifolds, tangent spaces, and vector bundles. The connections between manifold theory and Lie group theory are also clearly explained. It also covers De Rham cohomology and Lie algebras, where the reader is invited to discover the beauty of linking geometry with algebra.

Modern Graph Theory

By Bela Bollobas
It is a modern treatment of this classical theory, which emphasizes the connections with other mathematical subjects -- for example, random walks and electrical networks. I found some of the messages conveyed by this book enlightening for my research on machine learning methods.

Probability Theory: A Comprehensive Course (Universitext)

By Achim Klenke
This book gives complete coverage of modern probability theory -- not only including traditional topics, such as measure theory, independence, and convergence theorems, but also introducing topics that are typically found in textbooks on stochastic processes, such as martingales, Markov chains, Brownian motion, Poisson processes, and stochastic differential equations. It is recommended as the main textbook on probability theory.

A First Course in Stochastic Processes (2nd Edition)

By Samuel Karlin, and Howard M. Taylor
A classic textbook on stochastic processes which I think is particularly suitable for beginners without much background in measure theory. It provides complete coverage of many important stochastic processes in an intuitive way. Its development of Markov processes and renewal processes is enlightening.

Poisson Processes (Oxford Studies in Probability)

By J. F. C. Kingman
If you are interested in Bayesian nonparametrics, this is the book that you should definitely check out. This manuscript provides an unparalleled introduction to random point processes, including Poisson and Cox processes, and their deep theoretical connections with complete randomness.

Programming

Structure and Interpretation of Computer Programs (2nd Edition)

By Harold Abelson, Gerald Jay Sussman, and Julie Sussman
A timeless classic that must be read by all computer science majors. While some topics and the use of Scheme as the teaching language seem odd at first glance, the presentation of fundamental concepts such as abstraction, recursion, and modularity is so beautiful and insightful that you will never experience anything like it elsewhere.

Thinking in C++: Introduction to Standard C++ (2nd Edition)

By Bruce Eckel
While it is kind of old (written in 2000), I still recommend this book to all beginners learning C++. The thoughts underlying object-oriented programming are very clearly explained. It also provides comprehensive coverage of C++ at a well-tuned pace.

Effective C++: 55 Specific Ways to Improve Your Programs and Designs (3rd Edition)

By Scott Meyers
The Effective C++ series by Scott Meyers is a must for anyone who is serious about C++ programming. The items (rules) listed in this book convey the author's deep understanding of both C++ itself and modern software engineering principles. This edition reflects the latest updates in C++ development, including generic programming and the use of the TR1 library.

Advanced C++ Metaprogramming

By Davide Di Gennaro
Like it or hate it, meta-programming has played an increasingly important role in modern C++ development. If you asked what key aspect distinguishes C++ from all other languages, I would say it is the unparalleled generic programming capability based on C++ templates. This book summarizes the latest advances in metaprogramming over the past decade. I believe it will take the place of Loki's "Modern C++ Design" to become the bible for C++ meta-programming.

Introduction to Algorithms (2nd/3rd Edition)

By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein
If you know nothing about algorithms, you will never understand computer science. This book is definitely a classic on algorithms and data structures that everyone who is serious about computer science must read. The contents of this book range from elementary topics such as classic sorting algorithms and hash tables to advanced topics such as maximum flow, linear programming, and computational geometry. It is a book for everyone. Every time I read it, I learn something new.

Design Patterns: Elements of Reusable Object-Oriented Software

By Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides
Textbooks on C++, Java, or other languages typically use toy examples (animals, students, etc) to illustrate the concept of OOP. This way, however, does not reflect the full strength of object oriented programming. This book, which has been widely acknowledged as a classic in software engineering, shows you, via compelling examples distilled from real world projects, how specific OOP patterns can vastly improve your code's reusability and extensibility.

Structured Parallel Programming: Patterns for Efficient Computation

By Michael McCool, James Reinders, and Arch Robison
Recent trends in hardware advancement have switched from increasing CPU frequencies to increasing the number of cores. A significant implication of this change is that the "free lunch has come to an end" -- you have to explicitly parallelize your code in order to benefit from the latest progress in CPUs/GPUs. This book summarizes common patterns used in parallel programming, such as mapping, reduction, and pipelining -- all are very useful in writing parallel code.

Introduction to High Performance Computing for Scientists and Engineers

By Georg Hager and Gerhard Wellein
This book covers important topics that you should know when developing high performance computing programs. In particular, it introduces SIMD, memory hierarchies, OpenMP, and MPI. With this knowledge in mind, you will understand which factors might influence the run-time performance of your code.

CUDA Programming: A Developer's Guide to Parallel Computing with GPUs

By Shane Cook
This book provides in-depth coverage of important aspects of CUDA programming -- a programming technique that can unleash the unparalleled power of GPU computation. With CUDA and an affordable GPU card, you can run your data analysis program in a matter of minutes, where it might otherwise require multiple servers running for hours.
