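This was my first time learning how to find the real URL behind an asynchronously loaded page. I spent a whole afternoon on it, and it really was difficult. But I have this habit: the harder something is to find, the more I want to find it.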
I have now finally found the real URL I was after. Tears of joy...
Let's take Huangshan as the example: after searching for Huangshan, we land on its reviews page.
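What is asynchronous loading? When I switch the review language, the URL in the address bar does not change even though the reviews do, which means the content is being fetched behind the scenes.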
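Once I understood what packet capture is and how to do it, I set off on a long hunt for the right request. I won't go through the whole process here.
The first thing I found was that adding browser information (request headers) to the request for the start page does return the English-language page. But there is a catch!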
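Each review here has a "More" link, and that is yet another asynchronous load! So the search continues.
In the developer tools, click clear to empty the list of captured requests.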
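After clicking "More" several times, a new request showed up in the capture: the ExpandedUserReviews URL used in the code at the end of this post.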
Is that the end of it?
Definitely not. Where do those long strings of digits in the URL come from? I'll cover that in the next post. To be continued...
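Judging by the parameter name `reviews=`, the long numbers in the captured URL appear to be review IDs, and `target` matches the first of them. As a rough sketch only, assuming the IDs are already known (how to obtain them is left to the next post), the URL could be assembled as follows. `build_expanded_reviews_url` and its parameter names are hypothetical and made up for illustration.

```python
# Sketch: assemble an ExpandedUserReviews URL from a list of review IDs.
# build_expanded_reviews_url is a hypothetical helper, not part of any library.

BASE = 'http://www.tripadvisor.cn/ExpandedUserReviews-{geo}-{detail}'


def build_expanded_reviews_url(geo, detail, review_ids):
    """Return the expanded-reviews URL for the given review IDs."""
    if not review_ids:
        raise ValueError('review_ids must not be empty')
    ids = ','.join(str(i) for i in review_ids)
    # In the captured URL, target equals the first ID in the reviews list
    return (
        BASE.format(geo=geo, detail=detail)
        + '?target={}&context=1&reviews={}'.format(review_ids[0], ids)
        + '&servlet=Attraction_Review&expand=1'
    )


# The ten IDs taken from the URL captured in this post
review_ids = [410115359, 409344604, 407255372, 401140048, 400179383,
              398229741, 396111020, 395334568, 394200191, 393782571]
url = build_expanded_reviews_url('g303685', 'd550738', review_ids)
print(url)
```

With the IDs from this post, the helper reproduces the captured URL character for character, which at least suggests the pattern is right for this one page.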
As usual, here is the standalone parsing code:
import requests
from lxml import etree
url='http://www.tripadvisor.cn/ExpandedUserReviews-g303685-d550738?target=410115359&context=1&reviews=410115359,409344604,407255372,401140048,400179383,398229741,396111020,395334568,394200191,393782571&servlet=Attraction_Review&expand=1'
headers = {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate, sdch',
'Accept-Language': 'zh-CN,zh;q=0.8',
'Connection': 'keep-alive',
# Cookie copied from the browser session; it is session-specific and has likely expired, so replace it with your own.
'Cookie': ('ServerPool=X; TATravelInfo=V2*A.2*MG.-1*HP.2*FL.3*RVL.550738_100*RS.1; TASSK=enc%3AAGMMZ%2Bwe98u9po0Y%2FIY8pNbyuAGi9fbnqnNLKXa4%2BK5cWP0RMuCHTRZhu0uFf1yydRIPPAQ%2FpF7EdW0NLOpBZZId19ek1a9GHWZKvnuTIJ0QcXx1ULQXtiMx%2F%2BHhNCUrIg%3D%3D; TAUnique=%1%enc%3AjrXWw0qqncCEQMzfl5keG315t9yL8iOg6jLwcPiP6q8%3D; _jzqckmp=1; bdshare_firstime=1491815789350; __gads=ID=e5060e1a6b1ed08f:T=1491815796:S=ALNI_MbFkpxx2-zq7ubsIoe4wvdJnbQWoA; TALanguage=en; TAReturnTo=%1%%2FAttraction_Review-g303685-d550738-Reviews-Mt_Huangshan_Yellow_Mountain-Huangshan_Anhui.html; TASession=%1%V2ID.DA0C735ECBB05FFBD2F31EA11943410C*SQ.15*LP.%2FAttraction_Review-g303685-d550738-Reviews-Mt_Huangshan_Yellow_Mountain-Huangshan_Anhui%5C.html*LS.Attraction_Review*GR.70*TCPAR.53*TBR.19*EXEX.62*ABTR.65*PHTB.78*FS.82*CPU.26*HS.popularity*ES.popularity*AS.popularity*DS.5*SAS.popularity*FPS.oldFirst*LF.en*FA.1*DF.0*MS.-1*RMS.-1*FLO.550738*TRA.false*LD.550738; CM=%1%HanaPersist%2C%2C-1%7CPremiumMobSess%2C%2C-1%7Ct4b-pc%2C%2C-1%7CHanaSession%2C%2C-1%7CRCPers%2C%2C-1%7CWShadeSeen%2C%2C-1%7CFtrPers%2C%2C-1%7CTheForkMCCPers%2C%2C-1%7CHomeASess%2C%2C-1%7CPremiumSURPers%2C%2C-1%7CPremiumMCSess%2C%2C-1%7Csesscoestorem%2C%2C-1%7CCpmPopunder_1%2C1%2C1491902222%7CCCSess%2C%2C-1%7CCpmPopunder_2%2C1%2C-1%7CViatorMCPers%2C%2C-1%7Csesssticker%2C%2C-1%7C%24%2C%2C-1%7CPremiumORSess%2C%2C-1%7Ct4b-sc%2C%2C-1%7CMC_IB_UPSELL_IB_LOGOS2%2C%2C-1%7Cb2bmcpers%2C%2C-1%7CMC_IB_UPSELL_IB_LOGOS%2C%2C-1%7CPremMCBtmSess%2C%2C-1%7CPremiumSURSess%2C%2C-1%7CLaFourchette+Banners%2C%2C-1%7Csess_rev%2C%2C-1%7Csessamex%2C%2C-1%7Cperscoestorem%2C%2C-1%7CPremiumRRSess%2C%2C-1%7CSaveFtrPers%2C%2C-1%7CTheForkRRSess%2C%2C-1%7Cpers_rev%2C%2C-1%7CMetaFtrSess%2C%2C-1%7CRBAPers%2C%2C-1%7CWAR_RESTAURANT_FOOTER_PERSISTANT%2C%2C-1%7CFtrSess%2C%2C-1%7CHomeAPers%2C%2C-1%7CPremiumMobPers%2C%2C-1%7CRCSess%2C%2C-1%7CLaFourchette+MC+Banners%2C%2C-1%7Cbookstickcook%2C%2C-1%7Csh%2C%2C-1%7CLastPopunderId%2C137-1859-null%2C-1%7Cpssamex%2C%2C-1%7CTheForkMCCSess%2C%2C-1'
'%7C2016sticksess%2C%2C-1%7CCCPers%2C%2C-1%7CWAR_RESTAURANT_FOOTER_SESSION%2C%2C-1%7Cb2bmcsess%2C%2C-1%7C2016stickpers%2C%2C-1%7CViatorMCSess%2C%2C-1%7CPremiumMCPers%2C%2C-1%7CPremiumRRPers%2C%2C-1%7CPremMCBtmPers%2C%2C-1%7CTheForkRRPers%2C%2C-1%7CSaveFtrSess%2C%2C-1%7CPremiumORPers%2C%2C-1%7CRBASess%2C%2C-1%7Cbookstickpers%2C%2C-1%7Cperssticker%2C%2C-1%7CMetaFtrPers%2C%2C-1%7C; TAUD=LA-1491815815299-1*LG-14277644-2.1.F.*LD-14277645-.....; roybatty=TNI1625!AP9YRq1oHIHfPtXcJCINRrDe7hLPCe8L8uurjbOYo996M1NrdEF3UC8F2w%2BA%2FvgIK20Ptfm2qFK2Y7gBNq3fPyswrYVGd%2BwBp%2FhQTse54C7MDQU3%2FCl9pe%2FrrYw8WiSNYgQ6pewgJ'),
'Host': 'www.tripadvisor.cn',
'Referer': 'http://www.tripadvisor.cn/Attraction_Review-g303685-d550738-Reviews-Mt_Huangshan_Yellow_Mountain-Huangshan_Anhui.html',
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
}
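# Request the expanded reviews using the captured headers
html = requests.post(url, headers=headers).content
selector = etree.HTML(html)
# Each expanded review body sits in a <div class="entry">
infos = selector.xpath('//div[@class="entry"]')
print(len(infos))
for info in infos:
    # Guard against entries with no direct <p> text, which would otherwise raise IndexError
    texts = info.xpath('p/text()')
    if texts:
        comment = texts[0]
        print(comment)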
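To try the extraction step without hitting the site, here is a minimal offline sketch. It swaps in the standard library's `xml.etree.ElementTree` for lxml so it runs with no third-party packages, and the sample markup is invented for illustration; it is not real output from the site. ElementTree only accepts well-formed markup, so for real pages lxml's `etree.HTML` as used above remains the practical choice. `extract_comments` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed stand-in for the response body (not real site output)
SAMPLE = """
<html><body>
  <div class="entry"><p>Great sunrise from the summit.</p></div>
  <div class="entry"><p>Crowded on weekends, go early.</p></div>
  <div class="other"><p>Not a review.</p></div>
</body></html>
"""


def extract_comments(markup):
    """Return the text of the first <p> in every div with class "entry"."""
    root = ET.fromstring(markup.strip())
    comments = []
    # Same selection idea as the XPath //div[@class="entry"] used with lxml
    for entry in root.findall(".//div[@class='entry']"):
        p = entry.find('p')
        if p is not None and p.text:
            comments.append(p.text.strip())
    return comments


comments = extract_comments(SAMPLE)
print(len(comments))
for c in comments:
    print(c)
```

The div with class "other" is skipped, which mirrors how the real script ignores everything outside the review entries.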