[Translation] Tips for Writing High-Quality Python Code

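Developing in Python is quite different from developing in many other languages. Like Ruby and Perl, Python is an interpreted language, so developers can use an interactive environment (a REPL) to execute and test code in real time. Because nothing has to be compiled, Python is well suited to rapid prototyping and debugging. Like Scala and JavaScript, it comes with many practical tools that support script-style development. At the same time, like Java and C++, Python is a highly extensible, modular, object-oriented language, not merely a way to run simple scripts.

Python is commonly used for quick one-off scripts, for web applications built on large, scalable frameworks such as Django, for data processing with Celery, and for scientific computing, which accounts for a large share of Python use. Python is lightweight and efficient and comes preinstalled on most systems, which makes it a natural first choice for data analysis, data processing, and similar analytical tasks.

However, one major drawback of Python is that it has no single, complete development workflow, and consequently no standard IDE or development framework. Most Python references teach you how to use it as a scripting language and overlook a key point: how to structure a large Python project. This article describes a workflow for building larger, data-oriented projects in Python.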

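Development Environment

So what do you need to develop data projects successfully? Just two things:

  1. A text editor: Notepad++, Vim, Emacs, or Text Wrangler will all do. (Translator's note: or Sublime Text.)
  2. A terminal, with your environment variables set up correctly. (Translator's note: add the Python install directory to the PATH environment variable.)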

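Yes, that's all you need! There are of course many development environments that add debugging, code completion, and syntax highlighting. In the end, though, they all just combine a text editor and a terminal and add some useful features on top. If you insist on using an IDE, here are some I would recommend:

  • IDLE - Windows users may know this one well, since their first Python program was usually written in it. It is very simple, but it ships with Python and works reasonably well.
  • Komodo Edit - A free IDE from ActiveState that offers many tools and useful features.
  • PyCharm - A paid IDE, but well worth the price; it works much like IntelliJ.
  • Aptana Studio - Primarily aimed at web development, but it has built-in Python support.
  • Spyder - Focused on scientific computing.
  • iPython - An interactive development environment that can save the Python code you run along with its data.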

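However, even if you use one of these tools, you will still come back to the basic development workflow described below. Sublime Text 3 has subtle yet powerful features, including good syntax highlighting; combined with pdb and the command line, it is the primary tool of many independent developers.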

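As your projects grow, you will also come to rely on a few more tools. There are many other helpful development tools, but three are especially important and commonly used in Python development today; they are covered in more detail below.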

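Third-Party Libraries

During development you will inevitably use third-party libraries to some degree, especially for data processing, where you need tools such as Numpy and Pandas. Installing these libraries on your system usually only requires pip, Python's package manager. Using pip will save you a lot of time and trouble, though of course you need to have it installed on your machine first!

requests is a simple HTTP library that makes it easy to fetch data from the web. To install it, just run the following command: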

$ pip install requests

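Installing, uninstalling, and upgrading are all done with the pip command. pip freeze lists the Python libraries installed on your system. To search for available libraries, see the Python Package Index (PyPI).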
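
To make the idea of listing installed libraries concrete, here is a minimal Python sketch that prints name==version pins similar to what pip freeze reports. It is not pip's implementation: it assumes Python 3.8 or later, relies on the standard library's importlib.metadata, and the function name installed_packages is made up for illustration.

```python
from importlib import metadata


def installed_packages():
    """Return sorted 'name==version' strings for installed distributions."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs with incomplete metadata
            pins.add("{}=={}".format(name, version))
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    for pin in installed_packages():
        print(pin)
```

Run it with the interpreter whose packages you want to inspect; inside a virtual environment it should list only what that environment can see.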

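Virtual Environments

As you develop more and more, you will find that certain versions of tools or libraries are hard to get running: a particular project requires particular versions of libraries or tools, and these sometimes conflict with the libraries used by other projects. When you develop for both Python 2 and Python 3, even Python itself can become a problem, and there is a (small) chance of breaking your system while developing.

The solution is to give each project its own dedicated virtual environment and develop inside it. virtualenv creates a directory containing a specific version of Python, pip, and the third-party packages. The environment is activated and deactivated from the command line, and users can create as many environments of their own as they need. It also lets you match a specific production environment (usually Linux).

Virtualenvwrapper is another library that lets you manage multiple virtual environments and associate each of them with a specific project. It is just as indispensable. Install both tools with the following command: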

$ pip install virtualenv virtualenvwrapper

Then edit the .profile file in your home directory and add the following lines to the end:

export WORKON_HOME=$HOME/.virtualenvs
export PROJECT_HOME=$HOME/Projects
source /usr/local/bin/virtualenvwrapper.sh

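All of your virtual environments will be stored in a hidden directory called .virtualenvs, and your Projects directory is where your code lives; I discuss this further below. For convenience I have also created many aliases for the virtualenv scripts, which you can find, along with other extensions, in Ben's VirtualEnv Cheat Sheet.

Note: The steps for Windows users may differ depending on the system.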
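
When several environments are in play, it helps to confirm from Python itself which one is active. The sketch below is one common way to check, not part of virtualenv or virtualenvwrapper; the function name in_virtualenv is chosen here for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside an environment, sys.prefix points at the environment directory
    # while sys.base_prefix points at the Python installation it came from.
    # Older virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


print("virtualenv active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

With an environment activated, the printed prefix should point at a directory under the location configured in WORKON_HOME.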

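Code Construction Workflow

Code is generally constructed and executed in one of two ways:

  • Write code in a text file and execute it with python.
  • Write code in a text file and import it into an interactive environment (the REPL).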

Generally speaking, developers do both. Python programs are intended to be executed on the command line via the python binary, and the thing that is executed is usually an entry point to a much larger library of code that is imported. The difference between importing and execution is subtle, but as you do more Python it becomes more important.

With either of these workflows, you create your code in as modular a fashion as possible and, during the creation process, you execute it in one of the methods described above to check it's working. Most Python developers are back and forth between their terminal and the editor, and can do fine grained testing of every single line of code as they're writing it. This is the rapid prototyping aspect of Python.
So let's start with a simple example.
Open a terminal window (see your specific operating system for instructions on how to do this).
NOTE: Commands are in bash (Linux/Mac) or Windows Powershell

Create a workspace for yourself. A workspace, in this sense, is just an empty directory where you can get ready to start doing development work. You should probably also keep your various projects (here, a synonym for workspace) in their own directory as well, for now we'll just call it "Projects" and assume it is in your home directory. Our first project will be called "myproject", but you'd just name this whatever you'd like.
$ cd ~/Projects
$ mkdir myproject
$ cd myproject

Let's create our first Python script. You can either open your favorite editor and save the file into your workspace (the ~/Projects/myproject directory), or you can touch it and then open that file with your editor.
$ touch foo.py

PRO TIP: If you're using Sublime Text 3 and have the subl command line tool installed (see the Sublime Text installation instructions), you can use the following command to open up the current directory in the editor:
$ subl . &

I use this so much that I've aliased the command to e.

So here's where you should be: You should have a text editor open and editing the file at ~/Projects/myproject/foo.py, and you should have a terminal window open whose current working directory is ~/Projects/myproject. You're now ready to develop. Add the following code to foo.py:

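#!/usr/bin/env python

import csv

def dataset(path):
    # Yield each row of the CSV file, converting the third column to an int.
    # newline='' lets the csv module handle line endings itself (Python 3).
    with open(path, newline='') as data:
        reader = csv.reader(data)
        for row in reader:
            row[2] = int(row[2])
            yield row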

This code is very simple. It just implements a function that accepts a path and returns an iterator so that you can access every row of a CSV file, while also converting the third item in every row to an integer.
PRO TIP: The #! (pronounced "shebang") line must appear at the very beginning of an executable Python script with nothing before it. It tells the operating system which interpreter to run the file with, so the script executes correctly when run from the command line as a standalone app. This line doesn't need to appear in library modules, that is, Python code that you plan to import rather than execute.
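
To see what "returns an iterator" means for the dataset function, the following self-contained sketch writes two sample rows to a temporary file and reads them back one step at a time. It assumes Python 3; the temporary file stands in for fixtures/calories.csv and is deleted at the end.

```python
import csv
import os
import tempfile


def dataset(path):
    """Yield each CSV row with its third column converted to an int."""
    with open(path, newline="") as data:
        for row in csv.reader(data):
            row[2] = int(row[2])
            yield row


# Write sample rows to a throwaway file (stand-in for fixtures/calories.csv).
sample = "butter,tbsp,102\ncheddar cheese,slice,113\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write(sample)

rows = dataset(handle.name)   # nothing has been read from the file yet
first = next(rows)            # reads and converts only the first row
remaining = list(rows)        # consumes the rest and closes the file
os.remove(handle.name)

print(first)       # ['butter', 'tbsp', 102]
print(remaining)   # [['cheddar cheese', 'slice', 113]]
```

Because the function is a generator, rows are read only as they are requested, which keeps memory use flat even for large files.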

Create some data so that we can use our function. Let's keep all of our data in a fixtures directory in our project.
$ mkdir fixtures
$ touch fixtures/calories.csv

Using your editor, add this data to the calories.csv file:
butter,tbsp,102
cheddar cheese,slice,113
whole milk,cup,148
hamburger,item,254

Ok, now it's time to use our code. First, let's try to execute the code in the interpreter. Open up the REPL as follows:
$ python
>>>

You should now be presented with the Python prompt (>>>). Anything you type in now should be in Python, not bash. Always note the prompts in the instructions. A prompt with $ means type in command line instructions (bash), a prompt that says >>> means type in Python on the REPL, and if there is no prompt, you're probably editing a file. Import your code:

>>> from foo import dataset
>>> for row in dataset('fixtures/calories.csv'):
...     print(row[0])
...
butter
cheddar cheese
whole milk
hamburger
>>>

A lot happened here, so let's inspect it. First, when you imported the dataset function from foo, Python looked in your current working directory and found the foo.py file, and that's where it imported it from. Where you are on the command line and what your Python path is matters!
When you import the dataset function the way we did, the module is loaded and executed all at once and provided to the interpreter's namespace. You can now use it by writing a for loop to go through every row and print the first item. Note the ... prompt. This means that Python is expecting an indented block. To exit the block, hit enter twice. The print results appear right in the screen, and then you're returned to the prompt.
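
The point about the Python path can be checked directly. This sketch, assuming Python 3, creates a tiny module in a temporary directory and shows that it only becomes importable once that directory is on sys.path; the module name pathdemo is made up for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding a tiny module (hypothetical name).
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "pathdemo.py"), "w") as handle:
    handle.write("GREETING = 'hello'\n")

# The directory is not on the search path yet, so the import fails.
try:
    importlib.import_module("pathdemo")
    found_before = True
except ImportError:
    found_before = False

# Adding the directory to sys.path makes the module importable.
sys.path.insert(0, workdir)
importlib.invalidate_caches()
pathdemo = importlib.import_module("pathdemo")

print(found_before)        # False
print(pathdemo.GREETING)   # hello
print(pathdemo.__file__)   # location the module was loaded from
```

When you start the REPL, the current working directory is placed on sys.path for you, which is why from foo import dataset works from inside ~/Projects/myproject.
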
But what if you make a change in the code, for example, capitalizing the first letter of the first item in each row? The changes you write in your file won't show up in the REPL. This is because Python has already loaded the code once. To get the changes, you either have to exit the REPL and restart or you have to import in a different way:

>>> import foo
>>> for row in foo.dataset('fixtures/calories.csv'):
...

Now you can reload the foo module and get your code changes:

>>> from importlib import reload  # Python 3; in Python 2, reload is a builtin
>>> reload(foo)
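
The effect of reloading is easiest to see end to end. The sketch below, assuming Python 3 and using importlib.reload, writes a small module to a temporary directory, edits it on disk, and shows that the change only appears after the reload. The module name reloaddemo and its label function are hypothetical.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True   # keep the demo free of cached .pyc files

workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, "reloaddemo.py")   # hypothetical module
sys.path.insert(0, workdir)


def write_module(source):
    """Overwrite the demo module's source file on disk."""
    with open(module_path, "w") as handle:
        handle.write(source)
    importlib.invalidate_caches()


write_module("def label(word):\n    return word\n")
import reloaddemo
before = reloaddemo.label("butter")

# Edit the file on disk: the already-imported module does not change...
write_module("def label(word):\n    return word.capitalize()\n")
stale = reloaddemo.label("butter")

# ...until it is explicitly reloaded.
importlib.reload(reloaddemo)
after = reloaddemo.label("butter")

print(before, stale, after)   # butter butter Butter
```

Note that names bound with from module import name keep pointing at the old objects after a reload, which is why the import foo form is the one to use here.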

This can get pretty unwieldy as code gets larger and more changes happen, so let's shift our development strategy over to executing Python files. Inside foo.py, add the following to the end of the file:
if __name__ == '__main__':
    for row in dataset('fixtures/calories.csv'):
        print(row[0])

To execute this code, you simply type the following on the command line:
$ python foo.py
butter
cheddar cheese
whole milk
hamburger

The if __name__ == '__main__': statement means that the code will only get executed if the code is run directly, not imported. In fact, if you open up the REPL and type in import foo, nothing will be printed to your screen. This is incredibly useful. It means that you can put test code inside your script as you're developing it without worrying that it will interfere with your project. Not only that, it documents to other developers how the code in that file should be used and provides a simple test to check to make sure that you're not creating errors.
In larger projects, you'll see that most developers put test and debugging code under so called "ifmain" statements at the bottom of their files. You should do this too!
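
As a small illustration of the "ifmain" pattern, here is a sketch of a module that keeps its quick check inside a main function. The capitalize_first helper is hypothetical, echoing the capitalization change mentioned earlier.

```python
def capitalize_first(row):
    """Return a copy of the row with its first item capitalized."""
    return [row[0].capitalize()] + list(row[1:])


def main():
    # Quick smoke test that doubles as a usage example.
    sample = ["butter", "tbsp", 102]
    print(capitalize_first(sample))


if __name__ == "__main__":
    # Runs when the file is executed directly, but not when it is imported.
    main()
```

Wrapping the check in main keeps module-level names tidy, and other code can still import capitalize_first without triggering any output.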

With this example, hopefully you have learned the workflow for developing Python programs both by executing scripts and using "ifmain" as well as importing and reloading scripts in the REPL. Most developers use both methods interchangeably, using whatever is needed at the time.
Structuring Larger Projects
Ok, so how do you write an actual Python program and move from experimenting with short snippets of code to larger programs? The first thing you have to do is organize your code into a project. Unfortunately there is really nothing to do this for you automatically, but most developers follow a well known pattern that was introduced by Zed Shaw in his book Learn Python the Hard Way.
In order to create a new project, you'll implement the "Python project skeleton," a set of directories and files that belong in every single project you create. The project skeleton is very familiar to Python developers, and you'll quickly start to recognize it as you investigate the code of other Python developers (which you should be doing). The basic skeleton is implemented inside of a project directory, which is stored in your workspace as described above. The directory structure is as follows (for an example project called myproject):
$ myproject
.
├── README.md
├── LICENSE.txt
├── requirements.txt
├── setup.py
├── bin
|   └── myapp.py
├── docs
|   ├── _build
|   ├── conf.py
|   ├── index.rst
|   └── Makefile
├── fixtures
├── foo
|   └── __init__.py
└── tests
    └── __init__.py

This is a lot, but don't be intimidated. This structure implements many tools including packaging for distribution, documentation with Sphinx, testing, and more.
Let's go through the pieces one by one. Project documentation is the first part, implemented as the README.md and LICENSE.txt files. The README file is a Markdown document where you can add developer-specific documentation for your project. The LICENSE can be any open source license, or a copyright statement in the case of proprietary code. Both of these files are typically generated for you if you create your project in Github. If you do create your project in Github, you should also use the Python .gitignore that Github provides, which helps keep your repositories clean.
The setup.py script is a Python setuptools or distutils installation script and will allow you to configure your project for deployment. It will use requirements.txt to specify the third party dependencies required to implement your project. Other developers will also use these files to create their development environments.
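
One way a setup.py script might consume requirements.txt is to read it into a list that could then be passed to the install_requires argument of setuptools. The sketch below shows only the parsing step. The parse_requirements helper is hypothetical, the pinned versions are illustrative, and the comment handling is deliberately simple (it would mangle URLs that contain a # fragment).

```python
def parse_requirements(text):
    """Return the requirement lines from requirements.txt-style text."""
    requirements = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and whitespace
        if line and not line.startswith("-"):   # skip blanks and pip options
            requirements.append(line)
    return requirements


# Illustrative content only; real files come from `pip freeze`.
example = """
# pinned by pip freeze
requests==2.31.0
nose==1.3.7  # test runner
"""
print(parse_requirements(example))   # ['requests==2.31.0', 'nose==1.3.7']
```
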
The docs directory contains the Sphinx documentation generator. Python documentation is written in reStructuredText, a markup language similar to Markdown and others. This documentation should be more extensive and should be for both users and developers. The bin directory will contain any executable scripts you intend to build. Data scientists also typically have a fixtures directory in which to store data files.
The foo and tests directories are actually Python packages (importable modules), since they contain the __init__.py file. You'll put your code in foo and your tests in tests. Once you start developing inside your foo directory, note that when you open up the REPL, you have to import everything from the 'foo' namespace. You can put import statements in your __init__.py files to make things easier to import as well. You can still also execute your scripts in the foo directory using the "ifmain" method.
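
To illustrate how import statements in __init__.py make things easier to import, the sketch below builds a throwaway package in a temporary directory and re-exports a function from a submodule. It assumes Python 3; the package name foodemo, the reader submodule, and the stub dataset function are all made up for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk (names here are hypothetical).
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "foodemo")
os.mkdir(package_dir)

with open(os.path.join(package_dir, "reader.py"), "w") as handle:
    handle.write("def dataset(path):\n    return 'rows from ' + path\n")

# Re-export dataset so callers can import it from the package itself.
with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write("from .reader import dataset\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

from foodemo import dataset                        # short form via __init__.py
from foodemo.reader import dataset as same_function

print(dataset("fixtures/calories.csv"))   # rows from fixtures/calories.csv
print(dataset is same_function)           # True
```

Both import forms resolve to the same function object; the __init__.py line simply offers a shorter path to it.
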
Setting Up Your First Project
You don't have to manually create the structure above; many tools will help you build this environment. For example, the Cookiecutter project will help you manage project templates and quickly build them. The sphinx-quickstart command will generate your documentation directory. Github will add the README.md and LICENSE.txt stubs. Finally, pip freeze will generate the requirements.txt file.
Starting a Python project is a ritual, however, so I will take you through my process for starting one. Light a candle, roll up your sleeves, and get a coffee. It's time.
Inside of your Projects directory, create a directory for your workspace (project). Let's pretend that we're building a project that will generate a social network from emails, we'll call it "emailgraph."
$ mkdir ~/Projects/emailgraph
$ cd ~/Projects/emailgraph

Initialize your repository with Git.
$ git init

Initialize your virtualenv with virtualenvwrapper.
$ mkvirtualenv -a $(pwd) emailgraph

This will create the virtual environment in ~/.virtualenvs/emailgraph and automatically activate it for you. At any time and at any place on the command line, you can issue the workon emailgraph command and you'll be taken to your project directory (the -a flag specifies that this is the project directory for this virtualenv).

Create the various directories that you'll require:
(emailgraph)$ mkdir bin tests emailgraph docs fixtures

And then create the various files that are needed:
(emailgraph)$ touch tests/__init__.py
(emailgraph)$ touch emailgraph/__init__.py
(emailgraph)$ touch setup.py README.md LICENSE.txt .gitignore
(emailgraph)$ touch bin/emailgraph-admin.py
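
If you set up projects often, the mkdir and touch steps can also be scripted. The following is a minimal sketch using pathlib, assuming Python 3; the create_skeleton function is hypothetical, and the demonstration runs in a temporary directory rather than ~/Projects.

```python
import tempfile
from pathlib import Path

DIRECTORIES = ["bin", "tests", "emailgraph", "docs", "fixtures"]
FILES = [
    "tests/__init__.py",
    "emailgraph/__init__.py",
    "setup.py",
    "README.md",
    "LICENSE.txt",
    ".gitignore",
    "bin/emailgraph-admin.py",
]


def create_skeleton(root):
    """Create the empty project skeleton under root and return its path."""
    root = Path(root)
    for name in DIRECTORIES:
        (root / name).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (root / name).touch(exist_ok=True)   # like touch: keeps existing content
    return root


# Demonstrate in a temporary directory rather than ~/Projects.
project = create_skeleton(Path(tempfile.mkdtemp()) / "emailgraph")
print(sorted(p.relative_to(project).as_posix() for p in project.rglob("*")))
```

Because both calls pass exist_ok=True, running the function again on an existing project leaves the files you have already edited untouched.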

Generate the documentation using sphinx-quickstart:
(emailgraph)$ sphinx-quickstart

You can safely use the defaults, but make sure that you do accept the Makefile at the end to quickly and easily generate the documentation. This should create an index.rst and conf.py file in your docs directory.

Install nose and coverage to begin your test harness:
(emailgraph)$ pip install nose coverage

Open up the tests/__init__.py file with your favorite editor, and add the following initialization tests:
import unittest

class InitializationTests(unittest.TestCase):

    def test_initialization(self):
        """
        Check the test suite runs by affirming 2+2=4
        """
        self.assertEqual(2+2, 4)

    def test_import(self):
        """
        Ensure the test suite can import our module
        """
        try:
            import emailgraph
        except ImportError:
            self.fail("Was not able to import the emailgraph")

From your project directory, you can now run the test suite, with coverage as follows:
(emailgraph)$ nosetests -v --with-coverage --cover-package=emailgraph \
    --cover-inclusive --cover-erase tests

You should see two tests passing along with a 100% test coverage report.
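
If nose is not available in your environment, tests written with unittest can also be run by the standard library's own runner. The sketch below is an alternative to the nosetests command, not a replacement for its coverage report; it repeats one of the initialization tests so that it is self-contained, and the run_tests helper is hypothetical.

```python
import unittest


class InitializationTests(unittest.TestCase):

    def test_initialization(self):
        """Check the test suite runs by affirming 2+2=4."""
        self.assertEqual(2 + 2, 4)


def run_tests():
    """Load and run the test case, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(InitializationTests)
    runner = unittest.TextTestRunner(verbosity=2)
    return runner.run(suite)


result = run_tests()
print(result.testsRun, result.wasSuccessful())   # 1 True
```

From the project directory, running python -m unittest -v tests should pick up the tests defined in tests/__init__.py in the same way.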

Open up the setup.py file and add the following lines:

#!/usr/bin/env python
raise NotImplementedError("Setup not implemented yet.")

Setting up your app for deployment is the topic of another post, but this will alert other developers to the fact that you haven't gotten around to it yet.

Create the requirements.txt file using pip freeze:
(emailgraph)$ pip freeze > requirements.txt

Finally, commit all the work you've done on emailgraph to the repository.
(emailgraph)$ git add --all
(emailgraph)$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   LICENSE.txt
    new file:   README.md
    new file:   bin/emailgraph-admin.py
    new file:   docs/Makefile
    new file:   docs/conf.py
    new file:   docs/index.rst
    new file:   emailgraph/__init__.py
    new file:   requirements.txt
    new file:   setup.py
    new file:   tests/__init__.py

(emailgraph)$ git commit -m "Initial repository setup"

With that you should have your project all set up and ready to go. Get some more coffee, it's time to start work!
Conclusion
With this post, hopefully you've discovered some best practices and workflows for Python development. Structuring both your code and projects this way will help keep you organized and will also help others quickly understand what you've built, which is critical when working on projects involving more than one person. More importantly, this project structure is the preparation for deployment and the base for larger applications and professional, production grade software. Whether you're scripting or writing apps, I hope that these workflows will be useful.
If you'd like to explore further how to include professional grade tools into your Python development, check out some of the following tools:
Travis-CI is a continuous integration service that will automatically run your test harness when you commit to Github. It will make sure that all of your tests are passing before you push to production!
Waffle.io will turn your Github issues into a full Agile board allowing you to track milestones and sprints, and better coordinate your team.
Pylint will automatically check for good coding standards, error detection, and even draw UML diagrams for your code!

If you're having trouble with anything we've covered or you find any errors, please leave us a comment! Also, all developers are as different as they are the same, so if you have a workflow that you think others would benefit from, please let us know in the comments!
If you liked this post and found it helpful, go to the blog home page and click the Subscribe button so that you don't miss any of the awesome posts we have coming up.
