Chapter 4: APIs
- What an API is
- A few API examples
- Parsing JSON in Python
It is straightforward: a JSON array (items at the same level) becomes a Python list, and each JSON object becomes a dict.
import json

# Adjacent string literals are joined, so the JSON text can span several lines
jsonString = ('{"arrayOfNums":[{"number":0},{"number":1},{"number":2}],'
              '"arrayOfFruits":[{"fruit":"apple"},{"fruit":"banana"},'
              '{"fruit":"pear"}]}')
jsonObj = json.loads(jsonString)
print(jsonObj.get("arrayOfNums"))
print(jsonObj.get("arrayOfNums")[1])
print(jsonObj.get("arrayOfNums")[1].get("number") +
      jsonObj.get("arrayOfNums")[2].get("number"))
print(jsonObj.get("arrayOfFruits")[2].get("fruit"))
Result:
[{'number': 0}, {'number': 1}, {'number': 2}]
{'number': 1}
3
pear
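As a small supplement to the mapping described above, the sketch below prints the Python type that json.loads produces for each kind of JSON value. The sample document and its keys are made up for illustration and do not come from any real API.

```python
import json

# Hypothetical sample document, invented for this illustration
sample = ('{"name": "pear", "tags": ["fruit", "green"], '
          '"price": 1.5, "in_stock": true, "note": null}')
parsed = json.loads(sample)

# JSON object -> dict, array -> list, string -> str,
# number -> int or float, true/false -> bool, null -> None
kinds = {key: type(value).__name__ for key, value in parsed.items()}
print(type(parsed).__name__)
print(kinds)

# .get() returns a default instead of raising KeyError when a key is absent
color = parsed.get("color", "unknown")
print(color)
```

Using .get() with a default, as the notes do, is convenient when an API response may omit a field.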
Data storage
Databases, files, email
Pros and cons of storing links
Advantages: fast, takes little space, and easy to code. To download the resource itself, use the urlretrieve function in urllib.request.
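A minimal sketch of urlretrieve follows. So that it runs without network access, it assumes a local file addressed by a file:// URL as a stand-in for the remote resource; the file names are hypothetical, and in a scraper the URL would be the link you stored.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

# Stand-in for a remote resource: a local file exposed through a file:// URL.
# In real use, replace resource_url with the http(s) link you scraped.
workdir = Path(tempfile.mkdtemp())
source = workdir / "logo.txt"              # hypothetical resource
source.write_text("pretend this is an image")
resource_url = source.as_uri()

# urlretrieve(url, filename) copies the resource to filename and
# returns a (filename, headers) tuple
target = workdir / "downloaded_logo.txt"   # hypothetical destination
saved_path, headers = urlretrieve(resource_url, str(target))
print(saved_path)
```

Storing only the link and downloading on demand keeps the scraper's own storage small, which is the trade-off listed above.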
Saving data to CSV
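import csv

# newline='' keeps the csv module from writing blank lines between rows on Windows
csvFile = open("../files/test.csv", 'w+', newline='')
try:
    writer = csv.writer(csvFile)
    writer.writerow(('number', 'number plus 2', 'number times 2'))
    for i in range(10):
        writer.writerow((i, i+2, i*2))
finally:
    # Always close the file, even if writing fails
    csvFile.close()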
Result:
number,number plus 2,number times 2
0,2,0
1,3,2
2,4,4
...
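The same file can also be written with a with block and then read back to check it. This is only a sketch: it assumes a temporary directory instead of ../files/ so that it runs anywhere, and csv.DictReader is one reasonable way to read the rows, not the only one.

```python
import csv
import tempfile
from pathlib import Path

csv_path = Path(tempfile.mkdtemp()) / "test.csv"  # hypothetical location

# The with block closes the file automatically, replacing try/finally
with open(csv_path, "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(("number", "number plus 2", "number times 2"))
    for i in range(10):
        writer.writerow((i, i + 2, i * 2))

# DictReader uses the header row as keys; every value comes back as a string
with open(csv_path, newline="") as csv_file:
    rows = list(csv.DictReader(csv_file))

print(len(rows))
print(rows[3])
```

Note that values read from a CSV file are strings, so convert them with int() or float() before doing arithmetic.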
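Introduction to databases
Sending messages by email
Reading files
Cleaning dirty data
Natural language processing
Forms and logins
We will use the requests library.
- Submitting a simple form
First, look at the form page:
Note the name of each field.
The page where you fill in the form and the page that actually processes it are not necessarily the same.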
import requests

def SubForm():
    params = {'firstname': 'yuecheng', 'lastname': 'li'}
    r = requests.post("http://pythonscraping.com/files/processing.php", data=params)
    print(r.text)

SubForm()
Output:
Hello there, yuecheng li!
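To see what such a POST actually carries, the sketch below builds the same request with the standard library and inspects it without sending anything. It assumes the form uses the default URL-encoded encoding, which is also how requests encodes a dict passed as data.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {'firstname': 'yuecheng', 'lastname': 'li'}

# A plain form POST sends its fields as a URL-encoded body: name=value&name=value
body = urlencode(params).encode("ascii")
request = Request(
    "http://pythonscraping.com/files/processing.php",
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

# Nothing is sent here; we only look at what would go over the wire
method = request.get_method()
print(method)
print(request.data)
```

The keys in params must match the name attributes of the form's inputs, and the URL is the page that processes the form, not the page that displays it.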
The form on a sign-up page:
<form action="http://post.oreilly.com/client/o/oreilly/forms/quicksignup.cgi" id="example_form2" method="POST">
    <input name="client_token" type="hidden" value="oreilly" />
    <input name="subscribe" type="hidden" value="optin" />
    <input name="success_url" type="hidden" value="http://oreilly.com/store/newsletter-thankyou.html" />
    <input name="error_url" type="hidden" value="http://oreilly.com/store/newsletter-signup-error.html" />
    <input name="topic_or_dod" type="hidden" value="1" />
    <input name="source" type="hidden" value="orm-home-t1-dotd" />
    <fieldset>
        <input class="email_address long" maxlength="200" name="email_addr" size="25" type="text" value="Enter your email here" />
        <button alt="Join" class="skinny" name="submit" onclick="return addClickTracking('orm','ebook','rightrail','dod');" value="submit">Join</button>
    </fieldset>
</form>
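It looks intimidating, but you only need to get two things right:
1. Note the names of the fields you need to submit.
2. Find out where the form data is actually submitted (the form's action URL).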
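Both points can be read off the HTML automatically. The sketch below uses the standard library's html.parser to collect the action URL and the named fields; the FormFieldCollector class and the shortened example.com form are hypothetical and only modelled on the sign-up form above.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collects the action URL and the named fields of a form."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # Point 2: where the data is really submitted
            self.action = attrs.get("action")
        elif tag in ("input", "button", "select", "textarea") and "name" in attrs:
            # Point 1: the field names (with their default values)
            self.fields[attrs["name"]] = attrs.get("value", "")


# Shortened, hypothetical form for illustration
html = """
<form action="http://example.com/signup.cgi" method="POST">
  <input name="client_token" type="hidden" value="oreilly" />
  <input name="email_addr" type="text" value="Enter your email here" />
  <button name="submit" value="submit">Join</button>
</form>
"""

collector = FormFieldCollector()
collector.feed(html)
print(collector.action)
print(collector.fields)
```

The collected dict can serve as a starting point for the data argument of a POST: overwrite the fields a user would fill in (here email_addr) and leave the hidden fields as they are.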
Submitting data from other input types
Same as above, or submit the form in a browser and inspect the data that is sent.
Uploading files
Handling logins and cookies
import requests
params = {'username': 'Ryan', 'password': 'password'}
r = requests.post("http://pythonscraping.com/pages/cookies/welcome.php", params)
print("Cookie is set to:")
print(r.cookies.get_dict())
print("-----------")
print("Going to profile page...")
r = requests.get("http://pythonscraping.com/pages/cookies/profile.php",
                 cookies=r.cookies)
print(r.text)
Or use a session, which keeps track of cookies for you:
import requests
session = requests.Session()
params = {'username': 'username', 'password': 'password'}
s = session.post("http://pythonscraping.com/pages/cookies/welcome.php", params)
print("Cookie is set to:")
print(s.cookies.get_dict())
print("-----------")
print("Going to profile page...")
s = session.get("http://pythonscraping.com/pages/cookies/profile.php")
print(s.text)
HTTP Basic Access Authentication
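With HTTP Basic access authentication, the client sends the username and password joined by a colon and Base64-encoded in the Authorization header. The sketch below builds that header value by hand to show what is transmitted; the helper name basic_auth_header is hypothetical, and the credentials are the well-known example pair from the HTTP Basic specification.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Example credentials, not a real account
header = basic_auth_header("Aladdin", "open sesame")
print(header)

# Base64 is an encoding, not encryption: anyone can reverse it
decoded = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
print(decoded)
```

In practice you rarely build the header yourself: requests accepts auth=(username, password) on its request functions and does the same encoding. Because the credentials are trivially decodable, Basic authentication should only be used over HTTPS.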
Other form problems
See later chapters.