response.follow() accepts relative URLs, so there is no need to join the domain onto the URL yourself:
yield response.follow(url, callback=self.parse_mate)
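For comparison, here is a minimal sketch of the manual joining that response.follow() makes unnecessary, using only the standard library's urljoin. The page URL and the href values are hypothetical and exist only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page currently being parsed.
page_url = "https://example.com/list/page1.html"

# Hypothetical href values as they might appear in the page.
hrefs = ["/detail/42", "item.html", "https://other.example.org/a"]

# Resolve each href against the page URL, the step response.follow() does for you.
absolute = [urljoin(page_url, href) for href in hrefs]

for url in absolute:
    print(url)
```

Root-relative, page-relative, and already absolute links all resolve correctly against the current page, which is why passing the raw href to response.follow() is enough.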
Example of using .xpath('string(.)') to select the text of a node together with all of its descendants:
node_list = response.xpath('//h3[@class="c-title"]/a').xpath('string(.)').extract_first()
XPath for getting the child content with its HTML tags included:
''.join(node.xpath('./h3[@class="c-title"]/a/node()').extract())
Getting only the text of the child tags:
node.xpath('./h3[@class="c-title"]/a').xpath('string(.)').extract_first()
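Parsing HTML content with lxml, much like BeautifulSoup:
from lxml import etree
response = etree.HTML(content)
etree.tostring(response)  # tostring() is a function in etree, not a method on the element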
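A small sketch of the parse and serialize round trip, assuming a made-up content string. It shows that etree.HTML() wraps a fragment in html and body elements, and that passing encoding="unicode" makes etree.tostring() return str instead of bytes.

```python
from lxml import etree

# Hypothetical HTML fragment standing in for a downloaded page body.
content = "<p>Hi <b>there</b></p>"

# etree.HTML() parses leniently and returns the root <html> element.
tree = etree.HTML(content)

# Serialize back to text; without encoding="unicode" the result is bytes.
serialized = etree.tostring(tree, encoding="unicode", method="html")

print(tree.tag)
print(serialized)
```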
A site for downloading library packages for offline installation:
https://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml