Install the matching versions
selenium==2.48.0
beautifulsoup4==4.7.1
Install with pip
pip3 install selenium==2.48.0
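pip3 install beautifulsoup4==4.7.1
(sqlite3 ships with the Python standard library, so it is not installed with pip)
pip3 install lxml (or, as an alternative parser: pip3 install html5lib)
pip3 install PyExecJS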
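To confirm that the pinned versions above are the ones actually installed, a small check along these lines may help. This is only a sketch: it assumes Python 3.8+ (for importlib.metadata), and the PINS dict and check_pins helper are names made up for this note.

```python
from importlib import metadata  # standard library, Python 3.8+

# Pins copied from the list above.
PINS = {"selenium": "2.48.0", "beautifulsoup4": "4.7.1"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling check_pins(PINS) returns an empty dict when both packages match their pins. The get_version parameter exists only so the lookup can be swapped out when testing.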
Get chromedriver
Method 1 (Mac):
brew install chromedriver
Method 2:
Download page: https://npm.taobao.org/mirrors/chromedriver/
Find the build that matches your Chrome version, download the chromedriver package, unzip it, and move the binary into /usr/local/bin
Grant execute permission:
sudo chmod u+x,o+x /usr/local/bin/chromedriver
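To verify that the chmod took effect, the permission bits can be read back from Python. A minimal sketch using only the standard library; has_exec_bits is a helper name invented for this note.

```python
import os
import stat


def has_exec_bits(path):
    """True if the file has both the owner (u+x) and other (o+x) execute bits set."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IXUSR) and bool(mode & stat.S_IXOTH)
```

For example, has_exec_bits("/usr/local/bin/chromedriver") should return True after the chmod command above has been run.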
Install phantomjs (choose either this or chromedriver)
Method 1:
Download from the official site: http://phantomjs.org/download.html
Method 2:
sudo npm install -g phantomjs-prebuilt
Method 3:
brew update && brew install phantomjs
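Since only one of the two drivers is needed, a script can look at PATH and decide which one to use. The following is a sketch under that assumption; pick_driver is a hypothetical helper, and the which parameter is there only so the lookup can be replaced in tests.

```python
import shutil


def pick_driver(which=shutil.which):
    """Return "chrome", "phantomjs", or None, depending on which binary is on PATH."""
    if which("chromedriver"):
        return "chrome"
    if which("phantomjs"):
        return "phantomjs"
    return None  # neither driver binary was found
```

The result can then decide between webdriver.Chrome() and webdriver.PhantomJS(). Preferring chromedriver when both are present is an arbitrary choice made for this sketch.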
selenium usage notes
Eight singular forms (each returns the first matching element)
1. Locate by id: find_element_by_id(self, id_)
2. Locate by name: find_element_by_name(self, name)
3. Locate by class: find_element_by_class_name(self, name)
4. Locate by tag: find_element_by_tag_name(self, name)
5. Locate by link text: find_element_by_link_text(self, link_text)
6. Locate by partial link text: find_element_by_partial_link_text(self, link_text)
7. Locate by xpath: find_element_by_xpath(self, xpath)
8. Locate by css selector: find_element_by_css_selector(self, css_selector)
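As an illustration of the singular forms, the sketch below types a query into a search box and clicks the submit button, both located by id. The default ids "kw" and "su" are assumptions about Baidu's home page, and submit_search is a helper name made up for this note; it works with any object that offers find_element_by_id, such as the browser created by selenium 2.48.0.

```python
def submit_search(browser, query, box_id="kw", button_id="su"):
    """Type query into the search box and click the submit button.

    box_id and button_id default to ids assumed for Baidu's home page.
    """
    box = browser.find_element_by_id(box_id)  # singular form: one element
    box.clear()
    box.send_keys(query)
    browser.find_element_by_id(button_id).click()
```

Typical use would be submit_search(browser, "selenium") after browser.get(link).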
Eight plural forms (each returns a list of all matching elements)
9. Locate all by id: find_elements_by_id(self, id_)
10. Locate all by name: find_elements_by_name(self, name)
11. Locate all by class: find_elements_by_class_name(self, name)
12. Locate all by tag: find_elements_by_tag_name(self, name)
13. Locate all by link text: find_elements_by_link_text(self, text)
14. Locate all by partial link text: find_elements_by_partial_link_text(self, link_text)
15. Locate all by xpath: find_elements_by_xpath(self, xpath)
16. Locate all by css selector: find_elements_by_css_selector(self, css_selector)
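The plural forms return a list, which is empty when nothing matches, so the result can be looped over directly. A minimal sketch that gathers the href of every link on the current page; collect_links is a hypothetical helper name, and browser is assumed to be a selenium 2.48.0 driver that has already loaded a page.

```python
def collect_links(browser):
    """Return the href of every <a> element on the current page that has one."""
    hrefs = []
    for anchor in browser.find_elements_by_tag_name("a"):  # plural form: a list
        href = anchor.get_attribute("href")
        if href:  # skip anchors without an href attribute
            hrefs.append(href)
    return hrefs
```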
Combined example
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
link = 'http://www.baidu.com'
browser = webdriver.PhantomJS()  # use PhantomJS
# browser = webdriver.Chrome()  # use chromedriver instead
browser.get(link)
browser.encoding = "utf-8"  # note: WebDriver does not read this attribute; page_source is already decoded text
html_doc = browser.page_source
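soup = BeautifulSoup(html_doc, 'lxml')
# elements whose style attribute is exactly "text-decoration:none;"
soupArr = soup.select('[style="text-decoration:none;"]')
# first <div class="contson">; raises IndexError if the page has none
yuanwen_list = soup.find_all("div", "contson")[0]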
......
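Indexing the result of find_all with [0], as in the example above, raises IndexError when the page contains no matching element. A guarded variant might look like the sketch below; first_match is a helper name invented for this note, and soup is assumed to be the BeautifulSoup object built from page_source.

```python
def first_match(soup, tag, css_class):
    """Return the first <tag class="css_class"> element, or None if there is none."""
    matches = soup.find_all(tag, css_class)  # a string second argument filters by class
    return matches[0] if matches else None
```

With this helper, first_match(soup, "div", "contson") returns None instead of raising when nothing is found, so the caller can decide how to handle a missing element.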