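Motivation
Tang Jia San Shao's 《龍王傳說》 has a fair number of chapters out by now. I happened to see it recommended in my browser, so I started reading a little. The pages carried so many ads, though, and the reading experience was so poor, that I decided to just pull the text together with Python.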
Environment
Windows, Python 2.x, requests, lxml
Code
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
reload(sys)
# Python 2 hack: make implicit unicode -> str conversions use UTF-8
sys.setdefaultencoding("utf-8")
import requests
from lxml import etree


def getHtml(url, headers=None):
    # Download a page and return its raw bytes
    r = requests.get(url, headers=headers)
    return r.content


def useXpath(html):
    html = etree.HTML(html)
    # print type(html)
    # Chapter titles and relative links from the table of contents
    urls_text = html.xpath('//*[@id="list"]/dl/dd/a/text()')
    urls = html.xpath('//*[@id="list"]/dl/dd/a/@href')
    headers = {
        'Referer': 'http://www.aiquxs.com/read/41/41742/index.html',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36'
    }
    with open('d://xiaoshu8888.txt', 'a') as f:
        for x in range(len(urls)):
            url = 'http://www.aiquxs.com/read/41/41742/' + urls[x]
            print u'Fetching ', urls_text[x], u' URL: ' + url
            f.write(urls_text[x] + '\n')  # write the chapter title to the file
            html = getHtml(url, headers)  # fetch the chapter page source
            html = etree.HTML(html)
            text = html.xpath('//*[@id="booktext"]/text()')
            for item in text:
                f.write(item + '\n')


if __name__ == '__main__':
    # Table-of-contents URL
    url = 'http://www.aiquxs.com/read/41/41742/index.html'
    html = getHtml(url)
    useXpath(html)
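
The script above targets Python 2.x. For anyone on Python 3, here is a rough sketch of the same flow. It is not the original author's code. It assumes the site's markup still matches the two XPath expressions above (I have not verified this) and that chapter links are either relative filenames or site-absolute paths. The names chapter_url, fetch_tree, save_book, INDEX_URL and HEADERS are my own, and the timeout value is an arbitrary choice.

```python
from urllib.parse import urljoin

INDEX_URL = 'http://www.aiquxs.com/read/41/41742/index.html'
HEADERS = {
    'Referer': INDEX_URL,
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36',
}


def chapter_url(index_url, href):
    """Resolve a chapter link from the index page against the index URL."""
    return urljoin(index_url, href)


def fetch_tree(url, headers=None):
    """Download a page and parse it into an lxml element tree."""
    # Third-party imports are local so chapter_url() works without them.
    import requests
    from lxml import etree
    r = requests.get(url, headers=headers, timeout=10)  # timeout is an assumption
    r.raise_for_status()
    return etree.HTML(r.content)


def save_book(index_url, out_path):
    """Append each chapter title and body listed on the index page to out_path."""
    index = fetch_tree(index_url)
    # Same container as the original XPath, restricted to links that have an href
    links = index.xpath('//*[@id="list"]/dl/dd/a[@href]')
    with open(out_path, 'a', encoding='utf-8') as f:
        for a in links:
            title = a.text or ''
            url = chapter_url(index_url, a.get('href'))
            print('Fetching', title, url)
            f.write(title + '\n')
            page = fetch_tree(url, HEADERS)
            for line in page.xpath('//*[@id="booktext"]/text()'):
                f.write(line + '\n')
```

Calling save_book(INDEX_URL, 'd://xiaoshu8888.txt') would then mirror the original run. Opening the file with encoding='utf-8' replaces the reload(sys) / setdefaultencoding workaround, which does not exist in Python 3, and urljoin replaces the hard-coded URL prefix.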
Screenshot of a run

Closing remarks
If you enjoyed this, feel free to follow, tip, or bookmark. Thanks!