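Problem: scraping a certain website required chromedriver + mitmproxy, but there was one fatal problem: once the headless and proxy arguments were both added, the proxy kept failing. The cause was in fact the certificate.
Solution: I did a lot of searching and testing on this problem. Many of the answers circulating in various versions do not actually work, and the problem remained unsolved. In the end I found a canonical answer on the Chromium issue tracker. The code below is taken from: https://bugs.chromium.org/p/chromium/issues/detail?id=721739#c60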
Author: 黑螞蟻
Source: CSDN
Original: https://blog.csdn.net/weixin_39847926/article/details/82190341
Copyright notice: this is an original article by the blogger; please include a link to the post when reposting.
from selenium import webdriver
from pyvirtualdisplay import Display
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
#_____________________Basic settings___________________________
CHROME_DRIVER_PATH = r'/usr/bin/chromedriver'
PROXY = "http://127.0.0.1:8080"
#_____________________Launch arguments___________________________
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument("--window-size=1024,768")
options.add_argument("--no-sandbox")
#_____________________Proxy settings___________________________
desired_capabilities = options.to_capabilities()
# Accept the untrusted certificate that mitmproxy presents for HTTPS sites
desired_capabilities['acceptSslCerts'] = True
desired_capabilities['acceptInsecureCerts'] = True
desired_capabilities['proxy'] = {
    "httpProxy": PROXY,
    "ftpProxy": PROXY,
    "sslProxy": PROXY,
    "noProxy": None,
    "proxyType": "MANUAL",
    "class": "org.openqa.selenium.Proxy",
    "autodetect": False,
}
#_____________________Launch the browser___________________________
driver = webdriver.Chrome(
    chrome_options=options,
    executable_path=CHROME_DRIVER_PATH,
    desired_capabilities=desired_capabilities,
)
for i in range(1):
    driver.get('https://www.iplocation.net')
    content = driver.page_source
    driver.save_screenshot('hello.png')
    print(content)
driver.close()
driver.quit()
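The listing above uses the Selenium 3 keyword arguments (`chrome_options`, `executable_path`, `desired_capabilities`), which were deprecated in Selenium 4 and dropped in later 4.x releases. Below is a minimal sketch of a comparable setup for Selenium 4. It is not the method from the Chromium issue: it passes the proxy with Chrome's `--proxy-server` switch instead of the `proxy` capability. The helper names `build_chrome_args` and `make_driver` are made up for illustration, and the sketch has not been tested against the site in this post.

```python
CHROME_DRIVER_PATH = '/usr/bin/chromedriver'  # same path as in the post
PROXY = 'http://127.0.0.1:8080'               # mitmdump listening locally


def build_chrome_args(proxy, headless=True):
    """Return the Chrome command-line switches for a proxied session."""
    args = [
        '--disable-gpu',
        '--window-size=1024,768',
        '--no-sandbox',
        '--proxy-server=' + proxy,  # send browser traffic through the proxy
    ]
    if headless:
        args.insert(0, '--headless')
    return args


def make_driver(proxy=PROXY, driver_path=CHROME_DRIVER_PATH):
    """Start Chrome via Selenium 4.

    Selenium is imported inside the function so that build_chrome_args
    can be used (and checked) without Selenium installed.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(proxy):
        options.add_argument(arg)
    # Same capability the original listing sets: accept the untrusted
    # certificate that mitmproxy presents.
    options.set_capability('acceptInsecureCerts', True)
    service = Service(executable_path=driver_path)
    return webdriver.Chrome(service=service, options=options)


# Usage (assumes Selenium 4, Chrome, chromedriver and a running proxy):
#   driver = make_driver()
#   try:
#       driver.get('https://www.iplocation.net')
#       print(driver.page_source)
#   finally:
#       driver.quit()
```

Keeping the switch list in its own function makes it easy to print and compare against what Chrome was actually started with when the proxy seems to be ignored.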
Start mitmproxy on port 8080 (the port used in PROXY) before running the script:
mitmdump -p 8080
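A proxy error in the browser can also simply mean that mitmdump is not running yet. As a small sanity check, the sketch below tries to open a TCP connection to the proxy address before the browser is launched. The function name `proxy_is_listening` is hypothetical, and the defaults assume the 127.0.0.1:8080 address used in this post.

```python
import socket


def proxy_is_listening(host='127.0.0.1', port=8080, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


# Usage: call this before creating the driver and stop early if it is False.
#   if not proxy_is_listening():
#       raise SystemExit('start "mitmdump -p 8080" first')
```

This only shows that something accepts connections on the port. It does not verify that the listener is mitmproxy or that its certificate is accepted.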
Copied from https://blog.csdn.net/weixin_39847926/article/details/82190341; will be removed on request if it infringes.
It worked.
Mine returns <html xmlns="http://www.w3.org/1999/xhtml"><head></head><body></body></html>
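An empty document like the one in the comment above means the page source has no body content, which may indicate that the request never got through the proxy. A script can detect this case instead of silently saving a blank result. The following is a rough sketch using only the standard library; `is_blank_page` and `_BodyProbe` are illustrative names, not part of Selenium.

```python
from html.parser import HTMLParser


class _BodyProbe(HTMLParser):
    """Records whether <body> contains any tag or non-whitespace text."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.has_content = False

    def handle_starttag(self, tag, attrs):
        if tag == 'body':
            self.in_body = True
        elif self.in_body:
            self.has_content = True

    def handle_endtag(self, tag):
        if tag == 'body':
            self.in_body = False

    def handle_data(self, data):
        if self.in_body and data.strip():
            self.has_content = True


def is_blank_page(html):
    """Return True if the document has no tags or text inside <body>."""
    probe = _BodyProbe()
    probe.feed(html)
    probe.close()
    return not probe.has_content


# Usage with the listing in this post:
#   if is_blank_page(driver.page_source):
#       print('blank page: check that mitmdump is running and the proxy is applied')
```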