Before we begin
Environment: PyCharm
Libraries used: re, requests
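Process
Finding the URL
Type a keyword into the search box and you will see the URL change. Remove the parameters that look unnecessary and check whether the page still loads correctly (don't ask me how I know which ones are needed and which are not).
After tidying up, the final URL looks like this (the same form that main() builds further down): https://s.taobao.com/search?q=<keyword>&s=<offset>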
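The parameter-trimming step can also be done programmatically. The following is only a minimal sketch using the standard library: the helper name strip_params, the KEEP tuple, and the extra parameters in the sample URL are all hypothetical, and it assumes that q (keyword) and s (offset) are the only parameters worth keeping, as in the code in this post.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: only the keyword (q) and the result offset (s) are needed.
KEEP = ("q", "s")

def strip_params(url, keep=KEEP):
    """Return url with every query parameter dropped except those in keep."""
    parts = urlsplit(url)
    kept = [(k, v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k in keep]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

# Hypothetical address-bar URL; foo and bar are made-up extra parameters.
long_url = "https://s.taobao.com/search?q=phone&foo=1&bar=abc&s=44"
print(strip_params(long_url))  # https://s.taobao.com/search?q=phone&s=44
```

Whether the trimmed URL still returns a usable page has to be checked by hand in the browser, exactly as described above.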
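Analyzing the page source
View the page source and search for the name of any item; it turns out to be stored in the raw_title field.
In the same way, we can find that the price is stored in view_price, so we can now extract both the name and the price.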
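To see how these two fields can be pulled out with re, here is a small sketch that runs on a made-up fragment instead of a live page. The sample string and its values are hypothetical; only the field names raw_title and view_price come from the code in this post.

```python
import re

# Hypothetical fragment shaped like the key/value data embedded in the page.
sample = ('{"raw_title":"Example phone case","view_price":"19.90"},'
          '{"raw_title":"Example charger","view_price":"35.00"}')

# Non-greedy groups stop at the first closing quote after each key.
titles = re.findall(r'"raw_title":"(.*?)"', sample)
prices = re.findall(r'"view_price":"(.*?)"', sample)

# Pair the two lists up position by position.
pairs = list(zip(titles, prices))
print(pairs)
```

Note that zip pairs the lists by position, so this relies on every item having both fields.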
Implementation
First, import the libraries we need
import re
import requests
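Next, fetch the page source
def getHTMLText(url):
    """Fetch url and return the page text, or '' if the request fails."""
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # raise on 4xx/5xx status codes
        response.encoding = 'utf-8'
        return response.text
    except requests.RequestException:
        return ''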
Then parse the page and extract each item's price and title
def parseHtml(html):
    """Print the title and price of every item found in html."""
    # findall returns an empty list when nothing matches, so no try/except is needed.
    re_title = re.compile(r'"raw_title":"(.*?)"', re.S)
    re_price = re.compile(r'"view_price":"(.*?)"', re.S)
    raw_title = re_title.findall(html)
    view_price = re_price.findall(html)
    for title, price in zip(raw_title, view_price):
        print(title, price)
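That's basically it. Now add a few more features, such as pagination (written in the main function)
def main():
    url = 'https://s.taobao.com/search?q='
    goods = input('Item to search for: ')
    deeps = int(input('Number of pages: '))
    print('-' * 30)
    for i in range(deeps):
        # The s parameter is the result offset; it advances by 44 per page.
        html = getHTMLText(url + goods + "&s=" + str(44 * i))
        parseHtml(html)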
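The pagination URLs can also be built with urllib.parse.urlencode, which percent-encodes the keyword so that spaces and non-ASCII characters are safe in the query string. This is a sketch, not the code used in this post: page_urls, BASE_URL and PAGE_SIZE are hypothetical names, and the step of 44 is taken from main() above.

```python
from urllib.parse import urlencode

BASE_URL = "https://s.taobao.com/search"
PAGE_SIZE = 44  # offset step used in main(); assumed to be the items per page

def page_urls(keyword, pages):
    """Yield one search URL per page, with the keyword percent-encoded."""
    for i in range(pages):
        yield BASE_URL + "?" + urlencode({"q": keyword, "s": PAGE_SIZE * i})

for u in page_urls("phone case", 3):
    print(u)
# https://s.taobao.com/search?q=phone+case&s=0
# https://s.taobao.com/search?q=phone+case&s=44
# https://s.taobao.com/search?q=phone+case&s=88
```

Each yielded URL could then be passed to getHTMLText in place of the string concatenation.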
Nice, here is the complete code
import requests
import re

# Fetch the page source
def getHTMLText(url):
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # raise on 4xx/5xx status codes
        response.encoding = 'utf-8'
        return response.text
    except requests.RequestException:
        return ''

# Parse the page and extract each item's price and title
def parseHtml(html):
    # findall returns an empty list when nothing matches, so no try/except is needed.
    re_title = re.compile(r'"raw_title":"(.*?)"', re.S)
    re_price = re.compile(r'"view_price":"(.*?)"', re.S)
    raw_title = re_title.findall(html)
    view_price = re_price.findall(html)
    for title, price in zip(raw_title, view_price):
        print(title, price)

def main():
    url = 'https://s.taobao.com/search?q='
    goods = input('Item to search for: ')
    deeps = int(input('Number of pages: '))
    print('-' * 30)
    for i in range(deeps):
        # The s parameter is the result offset; it advances by 44 per page.
        html = getHTMLText(url + goods + "&s=" + str(44 * i))
        parseHtml(html)

if __name__ == '__main__':
    main()
Done
A screenshot of the result is attached