Anaconda Python 是完全免費的企業(yè)級的Python發(fā)行大規(guī)模數(shù)據(jù)處理、預測分析和科學計算工具弄喘。
Anaconda 是 Python 科學技術(shù)包的合集,功能和 Python(x,y) 類似。它是新起之秀渐裂,已更新多次了炉抒。包管理使用 conda球拦,GUI基于PySide置森,容量適中,但該有的科學計算包都有端衰。Anaconda 支持所有操作系統(tǒng)平臺叠洗,它的安裝、更新和刪除都很方便,且所有的東西都只安裝在一個目錄中旅东。Anaconda目前提供Python 2.6.X,Python 2.7.X,Python 3.3.X和Python 3.4.X四個系列發(fā)行包灭抑,這也是其他發(fā)行版所望塵莫及的。
1.ipython
IPython provides a rich architecture for interactive computing with:
1)Powerful interactive shells (terminal and Qt-based).
2)A browser-based notebook with support for code, text, mathematical expressions, inline plots and other rich media.
3)Support for interactive data visualization and use of GUI toolkits.
4)Flexible, embeddable interpreters to load into your own projects.
5)Easy to use, high performance tools for parallel computing.
“iPython 是一個Python 的交互式Shell抵代,比默認的Python Shell 好用得多腾节,功能也更強大。 她支持語法高亮荤牍、自動完成案腺、代碼調(diào)試、對象自省康吵,支持 Bash Shell 命令劈榨,內(nèi)置了許多很有用的功能和函式等,非常容易使用晦嵌。 ” 啟動iPython的時候用這個命令“ipython –pylab”鞋既,默認開啟了matploblib的繪圖交互,用起來很方便耍铜。
NumPy is the fundamental package for scientific computing with Python. It contains among other things:
1)a powerful N-dimensional array object
2)sophisticated (broadcasting) functions
3)tools for integrating C/C++ and Fortran code
4) useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
NumPy幾乎是一個無法回避的科學計算工具包,最常用的也許是它的N維數(shù)組對象跌前,其他還包括一些成熟的函數(shù)庫棕兼,用于整合C/C++和Fortran代碼的工具包,線性代數(shù)抵乓、傅里葉變換和隨機數(shù)生成函數(shù)等伴挚。NumPy提供了兩種基本的對象:ndarray(N-dimensional array object)和 ufunc(universal function object)靶衍。ndarray是存儲單一數(shù)據(jù)類型的多維數(shù)組,而ufunc則是能夠?qū)?shù)組進行處理的函數(shù)茎芋。
3.scipy: Python Data Analysis Library
SciPy refers to several related but distinct entities:
1)The SciPy Stack, a collection of open source software for scientific computing in Python, and particularly a specified set of core packages.
2)The community of people who use and develop this stack.
3)Several conferences dedicated to scientific computing in Python – SciPy, EuroSciPy and SciPy.in.
4)The SciPy library, one component of the SciPy stack, providing many numerical routines.
matplotlib 是python最著名的繪圖庫颅眶,它提供了一整套和matlab相似的命令API,十分適合交互式地進行制圖田弥。而且也可以方便地將它作為繪圖控件涛酗,嵌入GUI應用程序中。Matplotlib可以配合ipython shell使用偷厦,提供不亞于Matlab的繪圖體驗商叹。
matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in python scripts, the python and ipython shell (ala MATLAB?* or Mathematica??), web application servers, and six graphical user interface toolkits.
matplotlib 是python最著名的繪圖庫,它提供了一整套和matlab相似的命令API只泼,十分適合交互式地進行制圖剖笙。而且也可以方便地將它作為繪圖控件,嵌入GUI應用程序中请唱。Matplotlib可以配合ipython shell使用弥咪,提供不亞于Matlab的繪圖體驗。
第二部分 機器學習十绑、數(shù)據(jù)挖掘相關(guān)工具包
You didn’t write that awful page. You’re just trying to get some data out of it. Beautiful Soup is here to help. Since 2004, it’s been saving programmers hours or days of work on quick-turnaround screen scraping projects.
爬蟲工具
2.pandas: Python Data Analysis Library
Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
Pandas也是基于NumPy和Matplotlib開發(fā)的聚至,主要用于數(shù)據(jù)分析和數(shù)據(jù)可視化,它的數(shù)據(jù)結(jié)構(gòu)DataFrame和R語言里的data.frame很像孽惰,特別是對于時間序列數(shù)據(jù)有自己的一套分析機制晚岭,非常不錯。
3.scikit-learn: Machine Learning in Python
scikit-learn (formerly scikits.learn) is an open source machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
scikit-learn是一個基于NumPy, SciPy, Matplotlib的開源機器學習工具包勋功,主要涵蓋分類坦报,回歸和聚類算法,例如SVM狂鞋, 邏輯回歸片择,樸素貝葉斯,隨機森林骚揍,k-means等算法字管,代碼和文檔都非常不錯,在許多Python項目中都有應用信不。例如在我們熟悉的NLTK中嘲叔,分類器方面就有專門針對scikit-learn的接口,可以調(diào)用scikit-learn的分類算法以及訓練數(shù)據(jù)來訓練分類器模型抽活。
4.nltk:Natural Language Toolkit
NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and an active discussion forum.
自然語言處理包
Conda is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, OS X and Windows, and was created for Python programs but can package and distribute any software.
conda是一個開源的包管理和環(huán)境管理系統(tǒng)硫戈。包管理功能能讓你非常容易的安裝和卸載各種Python庫,并且很好的管理Anaconda的各個組件下硕。環(huán)境管理功能支持在不同的python版本和插件換將下進行切換丁逝,方便不同的開發(fā)需求汁胆。
相關(guān)功能在test-drive文檔介紹的非常清楚,在此不再贅述霜幼。
使用一種基于Web技術(shù)的交互式計算文檔格式嫩码。為什么說它是文檔格式,而非計算工具呢罪既?實際上它兩者都是铸题。Notebook 在交互上使用了 C/S 結(jié)構(gòu),它通過 Tornado 建立一個 shell 服務器萝衩,并使用瀏覽器作為客戶端回挽。另外 notebook 頁面都被保存為 .ipynb 的類 JSON 文件格式。這種文件格式也是 Notebook 最吸引人的地方猩谊。IPython Notebook使用瀏覽器作為界面千劈,向后臺的IPython服務器發(fā)送請求,并顯示結(jié)果牌捷。在瀏覽器的界面中使用單元(Cell)保存各種信息墙牌。Cell有多種類型,經(jīng)常使用的有表示格式化文本的Markdown單元暗甥,和表示代碼的Code單元喜滨。
Spyder是Python(x,y)的作者為它開發(fā)的一個簡單的集成開發(fā)環(huán)境。和其他的Python開發(fā)環(huán)境相比撤防,它最大的優(yōu)點就是模仿MATLAB的“工作空間”的功能虽风,可以很方便地觀察和修改數(shù)組的值。
PyQt是一個創(chuàng)建GUI應用程序的工具包寄月。它是Python編程語言和Qt庫的成功融合辜膝。Qt庫是目前最強大的庫之一。 PyQt實現(xiàn)了一個Python模塊集漾肮。它有超過300類厂抖,將近6000個函數(shù)和方法。它是一個多平臺的工具包克懊,可以運行在所有主要操作系統(tǒng)上忱辅,包括UNIX谭溉,Windows和Mac。 PyQt采用雙許可證扮念,開發(fā)人員可以選擇GPL和商業(yè)許可。在此之前,GPL的版本只能用在Unix上场躯,從PyQt的版本4開始,GPL許可證可用于所有支持的平臺旅挤。
用C語言實現(xiàn)Python及其解釋器