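Watching a movie these days keeps getting more troublesome: streaming it online requires all kinds of memberships, and downloading it means first sitting through a barrage of junk ads. So let's write a crawler that grabs the download links for movie resources.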
1. This walkthrough uses 比特兔 (btrabbit) as the example (in practice, BT sites are all much alike).
[Site](http://www.btrabbit.cc/)
2. Search for a movie, for example 守法公民, and the URL becomes "http://www.btrabbit.cc/search/守法公民.html".
3. Right-click the result and choose Inspect (Chrome), then use Copy XPath to get the path to the download link directly.
4. Source code:
# -*- coding: utf-8 -*-
# Requires Python 3, plus the third-party packages requests and lxml.
import requests
from lxml import html


def analyUrl(name):
    """Return [title + link text, download URL] for the first search result."""
    url = 'http://www.btrabbit.cc/search/%s.html' % name
    response = requests.get(url, timeout=10).content
    selector = html.fromstring(response)
    # Each search result sits in its own container div.
    items = selector.xpath('//div[@class="search-item detail-width"]')
    sourcelist = []
    for item in items:
        title = item.xpath('div[@class="item-title"]/h3/a/@title')
        downUrl = item.xpath('div[@class="item-bar"]/a/@href')
        if not title or not downUrl:
            continue  # skip entries without a title or a download link
        nameStr = title[0]
        # Append the link text of the current result, if there is any.
        detail = item.xpath('div[@class="item-bar"]/a/text()')
        if detail:
            nameStr = nameStr + detail[0]
        sourcelist.append(nameStr)
        sourcelist.append(downUrl[0])
        break  # only the first usable result is needed
    return sourcelist


def searchFH(name):
    """Join the title and the download URL into one printable string."""
    seedstr = '\n'.join(analyUrl(name))
    return seedstr


if __name__ == '__main__':
    print(searchFH('守法公民'))
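
The XPath expressions used in the source code can be tried offline before pointing them at the live site. The sketch below runs the same three expressions against a small hand-written HTML string. `SAMPLE_HTML` and `extract_results` are hypothetical names made up for this illustration, and the markup is only an assumption modeled on the class names the crawler queries; the real page layout may differ.

```python
from lxml import html

# Hypothetical markup modeled on the class names the crawler queries;
# the structure of the real search page is an assumption.
SAMPLE_HTML = """
<html><body>
<div class="search-item detail-width">
  <div class="item-title"><h3><a title="Movie A" href="/detail/1.html">Movie A</a></h3></div>
  <div class="item-bar"><a href="magnet:?xt=urn:btih:AAAA">Magnet link</a></div>
</div>
<div class="search-item detail-width">
  <div class="item-title"><h3><a title="Movie B" href="/detail/2.html">Movie B</a></h3></div>
  <div class="item-bar"><a href="magnet:?xt=urn:btih:BBBB">Magnet link</a></div>
</div>
</body></html>
"""


def extract_results(page_source):
    """Return a (title, link) pair for every search result in the page."""
    tree = html.fromstring(page_source)
    results = []
    # The absolute path finds every result container in the document.
    for item in tree.xpath('//div[@class="search-item detail-width"]'):
        # Relative paths are evaluated from the current container only.
        titles = item.xpath('div[@class="item-title"]/h3/a/@title')
        links = item.xpath('div[@class="item-bar"]/a/@href')
        if titles and links:
            results.append((titles[0], links[0]))
    return results


if __name__ == '__main__':
    for title, link in extract_results(SAMPLE_HTML):
        print(title, link)
```

Evaluating the inner expressions relative to each container keeps every title paired with the link from the same result, which is easy to get wrong when the queries run against the whole document.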
5. Done.
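
The search URL from step 2 contains the movie title as raw Chinese characters. A browser percent-encodes those characters before sending the request, and a crawler can do the same explicitly. The following is a minimal sketch using the standard library; `build_search_url` and `BASE_URL` are hypothetical names introduced here, and it is an assumption that the site treats the encoded URL the same way as the one typed into the address bar.

```python
from urllib.parse import quote

# URL pattern taken from step 2; %s is replaced by the encoded title.
BASE_URL = 'http://www.btrabbit.cc/search/%s.html'


def build_search_url(name):
    """Return the search URL for a title, percent-encoded as UTF-8."""
    # safe='' also escapes "/" so a title cannot alter the URL path.
    return BASE_URL % quote(name, safe='')


if __name__ == '__main__':
    print(build_search_url('守法公民'))
```

Building the URL in one small function also gives a single place to change if the site alters its search path.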