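@Preface:
I have been teaching myself Python in my spare time at work and wanted a project to practice on, so I set my sights on the website our client Alibaba uses to assign tasks to us. My colleagues have always logged in to the site by hand and copy-pasted the Case content into Excel. Kanshan was shocked: it's 2019 already, why are we still doing such inefficient (mindless) work? So I learned some Python and tried to fetch the case content automatically and save it locally. Come on, how hard could it be???
However: the first web page I ever seriously scraped came with one surprise after another...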
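@Problems and approaches
- Surprise ①: Not every website lets you wander in freely. What do you do when you hit a roadblock like this? Take it on. AVMS website
Approach ①: Log in to the site first, grab the cookies, and put them in the headers when requesting the page. It turned out the page is rendered via Ajax and the requests are submitted with POST, so this route was a dead end.
Approach ②: Use selenium to simulate the login, then get the cookies, save them locally, and load them on each run. First, the simulated-login code: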
@Simulated login
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

def login():
    driver.get(url)  # load the page
    # locate the username input field
    username = WAIT.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#exampleInputUser")))
    # locate the password input field
    password = WAIT.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#exampleInputPassword")))
    # locate the login button
    submit = WAIT.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="login-button"]')))
    username.send_keys("XXXX")  # the username and password go inside the quotes
    password.send_keys("XXXX")
    submit.click()  # simulate a mouse click
    driver.refresh()  # refresh the page
    # the caller unpacks two values, so return them (same lookups as in the full code)
    task_name = WAIT.until(EC.presence_of_element_located((By.XPATH, '/html/body/div[2]/div/div/div[4]/div[2]/div/div[2]/div/table/tbody/tr/td[2]/span'))).text
    case_name = WAIT.until(EC.presence_of_element_located((By.XPATH, '//*[@id="app"]/div/div/div[2]/table/tr[1]/td[2]/span'))).text
    return task_name, case_name

if __name__ == '__main__':
    task_id = input("Enter the task_id to scrape: ")
    url = 'http://www.aliavms.cn:7001/tsmanager/index.html#/detail?task_id=' + task_id
    pages_string = input("Enter the number of pages to scrape: ")
    pages = int(pages_string)
    # options = webdriver.ChromeOptions()  # options needed for headless Chrome
    # options.add_argument('headless')
    # options.add_argument('disable-gpu')
    # driver = webdriver.Chrome(options=options)
    driver = webdriver.Firefox()
    WAIT = WebDriverWait(driver, 10)
    task_name, case_name = login()  # used to name the Excel file and the sheet
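The login code above types the username and password in as string literals. A minimal sketch of one alternative, assuming the credentials are kept in environment variables; the names AVMS_USER and AVMS_PASSWORD and the helper load_credentials are made up for this example and are not part of the original script:

```python
import os


def load_credentials(env=None):
    """Return (username, password) read from environment variables.

    AVMS_USER and AVMS_PASSWORD are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    try:
        return env["AVMS_USER"], env["AVMS_PASSWORD"]
    except KeyError as missing:
        # Report which variable is absent instead of failing later inside send_keys()
        raise RuntimeError("missing environment variable: %s" % missing.args[0])
```

Inside login() the two send_keys("XXXX") calls could then take the values returned by load_credentials(), which keeps the password out of the source file.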
(The code below for getting, saving, and reading cookies ended up not being used.)
import os
import json

def get_cookies():
    return driver.get_cookies()  # webdriver returns the cookies directly

def save_cookies(cookies):
    with open("cookies.txt", "w") as fp:
        json.dump(cookies, fp)

def read_cookie():
    if os.path.exists('cookies.txt'):
        cookies_dict = dict()
        with open("cookies.txt", "r") as fp:
            cookies = json.load(fp)
        for cookie in cookies:
            cookies_dict[cookie['name']] = cookie['value']
        return cookies_dict
    else:
        # no saved file yet: fetch from the browser, save, then read it back
        save_cookies(get_cookies())
        return read_cookie()
- If it were a static page, things would be very simple:
import requests
headers = {
# Pretend to be a browser
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/73.0.3683.75 Chrome/73.0.3683.75 Safari/537.36',
# Paste in the Cookie you just obtained
'Cookie': 'eda38d470a662ef3606390ac3b84b86f9; Hm_lvt_f1d3b035c559e31c390733e79e080736=1553503899; biihu__user_login=omvZVatKKSlcXbJGmXXew9BmqediJ4lzNoYGzLQjTR%2Fjw1wOz3o4lIacanmcNncX1PsRne5tXpE9r1sqrkdhAYQrugGVfaBICYp8BAQ7yBKnMpAwicq7pZgQ2pg38ZzFyEZVUvOvFHYj3cChZFEWqQ%3D%3D; Hm_lpvt_f1d3b035c559e31c390733e79e080736=1553505597',
}
session = requests.Session()
url = "https://......."
response = session.get(url, headers=headers)  # the headers have to be passed in, otherwise they are never sent
print(response.text)
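The Cookie header above is pasted in by hand. As a rough sketch of how the cookies saved by the selenium code could be turned into that header instead: the cookie list below is invented sample data in the same shape the earlier code reads (a list of dicts with 'name' and 'value' keys), and cookies_to_header is a helper name made up for this example.

```python
def cookies_to_header(cookies):
    """Build a Cookie header value from a list of {'name': ..., 'value': ...} dicts."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in cookies)


# Invented sample data, shaped like the list saved to cookies.txt
saved = [
    {"name": "sessionid", "value": "abc123", "domain": "example.invalid"},
    {"name": "lang", "value": "en", "domain": "example.invalid"},
]

headers = {"Cookie": cookies_to_header(saved)}
print(headers["Cookie"])  # sessionid=abc123; lang=en
```

This only helps for pages that can be fetched with a plain GET; as noted earlier, the AVMS pages are rendered through Ajax, which is why the rest of the post drives a real browser.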
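- Surprise ②: The HTML returned by driver.page_source contains only part of the page, because the page is rendered via Ajax (kanshan may be a noob, but kanshan does not give up that easily...). In the end I used selenium + xpath locators to get the number of items to scrape on each page.
# find_element(By...) replaces the find_element_by_* helpers, which were removed in Selenium 4.3
data = driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div[8]/div[2]/div/table/tbody').find_elements(By.TAG_NAME, 'tr')
length = len(data) - 1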
- Next, scrape the content of each item. Every item is a separate page of its own, so the rule is: click the item, jump to the new page. This was my first time playing with a crawler, so I fell into a pit here too: the driver stays focused on the main page, so even though a new tab has been loaded, what you get back is still the main page's content. That is why the following is needed:
import time

def new_page(button1):  # button1 is the xpath of the item to scrape
    page_detail = WAIT.until(EC.element_to_be_clickable((By.XPATH, button1)))
    page_detail.click()
    time.sleep(2)  # give the page enough time to load
    # driver.window_handles returns all window handles
    child_handle = driver.window_handles[-1]  # handle of the new tab (child page)
    main_handle = driver.window_handles[0]  # handle of the main page
    driver.switch_to.window(child_handle)  # switch to the child page
    save_to_excel()
    time.sleep(1)
    driver.close()  # close the child page once it has been scraped
    driver.switch_to.window(main_handle)  # switch back to the main page
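The switch-scrape-close-switch-back sequence above can also be written as a context manager, so the child tab is closed and focus returns to the main page even if the scraping step raises. This is only a sketch: child_window is a name made up here, and it assumes the caller records driver.window_handles before clicking so the new tab can be found by comparison rather than by position.

```python
from contextlib import contextmanager


@contextmanager
def child_window(driver, handles_before):
    """Switch to the window that is not in handles_before; close it and switch back on exit."""
    main_handle = driver.current_window_handle
    opened = [h for h in driver.window_handles if h not in handles_before]
    if not opened:
        raise RuntimeError("clicking did not open a new window")
    driver.switch_to.window(opened[0])
    try:
        yield opened[0]
    finally:
        driver.close()  # closes the child window, which currently has focus
        driver.switch_to.window(main_handle)


# Possible use inside new_page() (still needs the wait after the click):
#   handles_before = driver.window_handles
#   page_detail.click()
#   time.sleep(2)
#   with child_window(driver, handles_before):
#       save_to_excel()
```

The fixed time.sleep(2) is still needed here, or some explicit wait for the new tab; selenium's expected_conditions module has a new_window_is_opened condition that may be a fit, but that has not been tried against this site.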
- Surprise ③: Right after that came a new problem: the child page has nested iframes...
The eventual fix was to switch into one iframe, switch back out, enter the next iframe, switch back out again, so Kanshan wrote a loop for it.
tc_data = []  # one text entry per iframe
for i in range(1, 4):
    # xpath of the iframe
    button2 = "/html/body/div/div[2]/div/div[4]/div[%d]/div[2]/div/div/div/iframe" % i
    iframe = WAIT.until(EC.presence_of_element_located((By.XPATH, button2)))
    driver.switch_to.frame(iframe)  # switch into the given iframe
    try:
        data = WAIT.until(EC.presence_of_all_elements_located((By.TAG_NAME, 'p')))
    except Exception:
        # some iframes have no data; WAIT.until times out rather than returning an empty list
        data = []
    text = ""  # stays empty when there is no data, so an empty cell is written to Excel
    for item in data:
        text = text + item.text + '\n'  # each <p> element's content goes on its own line
    tc_data.append(text)  # the list's append() method adds the entry
    driver.switch_to.default_content()  # back to the top-level document, to leave this iframe before entering the next one
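The enter-read-leave pattern in the loop above can be pulled into a small helper. The sketch below is an illustration, not the script's actual code: collect_iframe_text is a made-up name, it takes already-located iframe elements, and it uses find_elements directly (no explicit wait) so that an iframe without any p elements simply produces an empty string.

```python
def collect_iframe_text(driver, iframes):
    """Return one string per iframe; an iframe without <p> elements yields ""."""
    texts = []
    for frame in iframes:
        driver.switch_to.frame(frame)
        try:
            # "tag name" is the string value of selenium's By.TAG_NAME
            paragraphs = driver.find_elements("tag name", "p")
            texts.append("".join(p.text + "\n" for p in paragraphs))
        finally:
            # Always return to the top-level document before the next iframe
            driver.switch_to.default_content()
    return texts
```

Because every iframe contributes exactly one entry, the result can be indexed as [0], [1], [2] for description, steps, and criteria without a separate emptiness check. The trade-off is that without a wait, content that loads late inside an iframe would be missed.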
- Surprise ④: On top of that, some fields are simply empty, which makes the program raise an error. The fix is as follows:
try:
    tc_class = WAIT.until(EC.presence_of_element_located((By.XPATH, '/html/body/div/div[2]/div/div[2]/table/tr[1]/td[4]/span'))).text
except Exception as e:  # keep going even when the lookup fails
    tc_class = ""
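Since the same try/except shape is repeated for every optional field, it can be wrapped once. A minimal sketch, where text_or_default is a helper name invented for this example:

```python
def text_or_default(fetch, default=""):
    """Call fetch() and return its result; return default if it raises."""
    try:
        return fetch()
    except Exception:
        return default


# Possible use with the lookup above (locator is the (By.XPATH, '...') tuple):
#   tc_class = text_or_default(
#       lambda: WAIT.until(EC.presence_of_element_located(locator)).text)
```

Passing a lambda delays the lookup until it is inside the try block, which is what lets the helper catch the timeout.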
- Surprise ⑤: Next comes the save-to-Excel part. I wrote an if/else check here so that a new sheet is appended to an existing Excel file instead of overwriting it.
import os
import xlwt
import xlrd
from xlutils.copy import copy as xl_copy

if os.path.exists(u'%s.xls' % task_name):
    # read the existing Excel workbook
    read_book = xlrd.open_workbook((u'%s.xls' % task_name), formatting_info=True)
    write_book = xl_copy(read_book)  # copy it into a writable workbook
    # add a new sheet
    sheet = write_book.add_sheet(case_name, cell_overwrite_ok=True)
else:
    # create a new Excel workbook
    write_book = xlwt.Workbook(encoding='utf-8', style_compression=0)
    # create a new sheet
    sheet = write_book.add_sheet(case_name, cell_overwrite_ok=True)
# header row (written in both cases, because the sheet is new either way)
sheet.write(0, 0, 'Case name')
sheet.write(0, 1, 'Case description')
sheet.write(0, 2, 'Case steps')
sheet.write(0, 3, 'Pass/Fail criteria')
sheet.write(0, 4, 'Case category')
sheet.write(0, 5, 'Remarks')
sheet.write(0, 6, 'Result')
n = 1
tc_num = 1
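One thing to watch when appending sheets: as far as I know, xlwt refuses to add a sheet whose name already exists in the workbook, and Excel caps sheet names at 31 characters, so scraping the same case twice could fail at add_sheet. The sketch below shows one way to pick a free name first; unique_sheet_name is a made-up helper, and the case-insensitive comparison is a cautious assumption.

```python
def unique_sheet_name(name, existing, max_len=31):
    """Return name, or name with a numeric suffix, so it does not clash with existing."""
    taken = {s.lower() for s in existing}
    candidate = name[:max_len]
    counter = 2
    while candidate.lower() in taken:
        suffix = "_%d" % counter
        # Trim the base name so the suffixed result still fits in max_len
        candidate = name[:max_len - len(suffix)] + suffix
        counter += 1
    return candidate


print(unique_sheet_name("case", ["case"]))  # case_2
```

In the existing-file branch, the list of current names could come from read_book.sheet_names() before calling add_sheet.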
- Continuing from the part above:
def save_to_excel():
    global n  # important: n is the module-level row counter
    tc_name = WAIT.until(EC.presence_of_element_located((By.XPATH, '/html/body/div/div[2]/div/div[2]/table/tr[1]/td[2]/span'))).text
    print("Scraping item %d tc_name: %s" % (n, tc_name))
    try:
        tc_class = WAIT.until(EC.presence_of_element_located((By.XPATH, '/html/body/div/div[2]/div/div[2]/table/tr[1]/td[4]/span'))).text
    except Exception as e:
        tc_class = ""
    try:
        tc_comment = WAIT.until(EC.presence_of_element_located((By.XPATH, '/html/body/div/div[2]/div/div[2]/div[2]/div[2]'))).text
    except Exception as e:
        tc_comment = ""
    tc_data = []
    for i in range(1, 4):
        button2 = "/html/body/div/div[2]/div/div[4]/div[%d]/div[2]/div/div/div/iframe" % i
        iframe = WAIT.until(EC.presence_of_element_located((By.XPATH, button2)))
        driver.switch_to.frame(iframe)
        try:
            data = WAIT.until(EC.presence_of_all_elements_located((By.TAG_NAME, 'p')))
        except Exception:
            data = []  # an iframe without <p> elements times out; treat it as empty
        text = ""
        for item in data:
            text = text + item.text + '\n'
        tc_data.append(text)  # always one entry per iframe, so the indexes below stay valid
        driver.switch_to.default_content()
    tc_description = tc_data[0]
    tc_step = tc_data[1]
    tc_criteria = tc_data[2]
    sheet.write(n, 0, tc_name)
    sheet.write(n, 1, tc_description)
    sheet.write(n, 2, tc_step)
    sheet.write(n, 3, tc_criteria)
    sheet.write(n, 4, tc_class)
    sheet.write(n, 5, tc_comment)
    n += 1
- Full code:
# -*- coding:utf-8 -*-
# Copyright (c) 2019, KanShan, All rights reserved
# Author: KanShan
# Description: Enter an Ali AVMS task_id and a page count; automatically scrape Case_info and save...
import time
import xlwt
import xlrd
import os
from xlutils.copy import copy as xl_copy
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

def login():
    driver.get(url)
    username = WAIT.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#exampleInputUser")))
    password = WAIT.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#exampleInputPassword")))
    submit = WAIT.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="login-button"]')))
    username.send_keys("XXXX")
    password.send_keys("XXXX")
    submit.click()
    driver.refresh()
    # task name, used for the Excel file name
    task_name = WAIT.until(EC.presence_of_element_located((By.XPATH, '/html/body/div[2]/div/div/div[4]/div[2]/div/div[2]/div/table/tbody/tr/td[2]/span'))).text
    # the case inside the task, used for the sheet name
    case_name = WAIT.until(EC.presence_of_element_located((By.XPATH, '//*[@id="app"]/div/div/div[2]/table/tr[1]/td[2]/span'))).text
    return task_name, case_name

def save_to_excel():
    global n
    tc_name = WAIT.until(EC.presence_of_element_located((By.XPATH, '/html/body/div/div[2]/div/div[2]/table/tr[1]/td[2]/span'))).text
    print("Scraping item %d tc_name: %s" % (n, tc_name))
    try:
        tc_class = WAIT.until(EC.presence_of_element_located((By.XPATH, '/html/body/div/div[2]/div/div[2]/table/tr[1]/td[4]/span'))).text
    except Exception as e:
        tc_class = ""
    try:
        tc_comment = WAIT.until(EC.presence_of_element_located((By.XPATH, '/html/body/div/div[2]/div/div[2]/div[2]/div[2]'))).text
    except Exception as e:
        tc_comment = ""
    tc_data = []
    for i in range(1, 4):
        button2 = "/html/body/div/div[2]/div/div[4]/div[%d]/div[2]/div/div/div/iframe" % i
        iframe = WAIT.until(EC.presence_of_element_located((By.XPATH, button2)))
        driver.switch_to.frame(iframe)
        try:
            data = WAIT.until(EC.presence_of_all_elements_located((By.TAG_NAME, 'p')))
        except Exception:
            data = []  # an iframe without <p> elements times out; treat it as empty
        text = ""
        for item in data:
            text = text + item.text + '\n'
        tc_data.append(text)  # always one entry per iframe, so the indexes below stay valid
        driver.switch_to.default_content()
    tc_description = tc_data[0]
    tc_step = tc_data[1]
    tc_criteria = tc_data[2]
    sheet.write(n, 0, tc_name)
    sheet.write(n, 1, tc_description)
    sheet.write(n, 2, tc_step)
    sheet.write(n, 3, tc_criteria)
    sheet.write(n, 4, tc_class)
    sheet.write(n, 5, tc_comment)
    n += 1

def new_page(button1):
    page_detail = WAIT.until(EC.element_to_be_clickable((By.XPATH, button1)))
    page_detail.click()
    time.sleep(2)
    child_handle = driver.window_handles[-1]  # the newly opened tab
    main_handle = driver.window_handles[0]  # the main page
    driver.switch_to.window(child_handle)
    save_to_excel()
    time.sleep(1)
    driver.close()
    driver.switch_to.window(main_handle)

def page_detail():
    global n
    # find_element(By...) replaces the find_element_by_* helpers, which were removed in Selenium 4.3
    data = driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div[8]/div[2]/div/table/tbody').find_elements(By.TAG_NAME, 'tr')
    length = len(data) - 1
    indexs = length
    for index in range(2, indexs + 2):
        if length <= 0:
            break
        else:
            button1 = ('//*[@id="app"]/div/div/div[8]/div[2]/div/table/tbody/tr[%d]/td[3]/div/div/a' % index)
            try:
                new_page(button1)
                length -= 1
            except Exception as e:
                pass
            button2 = (
                '/html/body/div[2]/div/div/div[8]/div[2]/div/table/tbody/tr[%d]/td[2]/table/tr/td[2]/div/div/span' % index)
            result = WAIT.until(EC.presence_of_element_located((By.XPATH, button2))).text
            print('Scraped test result: %s' % result)
            n -= 1  # step back to the row save_to_excel() just wrote
            sheet.write(n, 6, result)
            n += 1
            length -= 3

def main():
    print("Scraping Task_name: %s" % task_name)
    print("Scraping Case_name: %s" % case_name)
    if pages == 1:
        print("Scraping page 1")
        page_detail()
        print("Scraping finished: 1 page in total, saving")
        driver.close()
    elif pages >= 2:
        try:
            page_detail()
            print("Scraping finished: page 1")
            for page in range(2, pages + 1):
                print("Scraping page %d" % page)
                if page > 6:
                    next_page = WAIT.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="app"]/div/div/div[8]/div[2]/div/div[2]/div/div/ul/li[7]')))
                else:
                    next_page = WAIT.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="app"]/div/div/div[8]/div[2]/div/div[2]/div/div/ul/li[%d]' % (page + 1))))
                next_page.click()
                time.sleep(3)
                page_detail()
                print("Scraping finished: page %d" % page)
        finally:
            driver.close()
            print("Scraping finished: %d pages in total, saving" % pages)
    else:
        print("Invalid page count, please enter an integer greater than or equal to 1")
        exit()

if __name__ == '__main__':
    task_id = input("Enter the task_id to scrape: ")
    url = 'http://www.aliavms.cn:7001/tsmanager/index.html#/detail?task_id=' + task_id
    pages_string = input("Enter the number of pages to scrape: ")
    pages = int(pages_string)
    # chrome_options = webdriver.ChromeOptions()
    # chrome_options.add_argument('headless')
    # chrome_options.add_argument('disable-gpu')
    # driver = webdriver.Chrome(options=chrome_options)
    driver = webdriver.Firefox()
    WAIT = WebDriverWait(driver, 10)
    task_name, case_name = login()
    if os.path.exists(u'%s.xls' % task_name):
        read_book = xlrd.open_workbook((u'%s.xls' % task_name), formatting_info=True)
        write_book = xl_copy(read_book)
        sheet = write_book.add_sheet(case_name, cell_overwrite_ok=True)
    else:
        write_book = xlwt.Workbook(encoding='utf-8', style_compression=0)
        sheet = write_book.add_sheet(case_name, cell_overwrite_ok=True)
    # header row (the sheet is new in both branches)
    sheet.write(0, 0, 'Test case name')
    sheet.write(0, 1, 'Test case description')
    sheet.write(0, 2, 'Test case steps')
    sheet.write(0, 3, 'Test Pass/Fail criteria')
    sheet.write(0, 4, 'Test case category')
    sheet.write(0, 5, 'Remarks')
    sheet.write(0, 6, 'Test result')
    n = 1
    tc_num = 1
    main()
    # save as an Excel file
    write_book.save(u'%s.xls' % task_name)
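The pagination rule buried in main() is easy to miss: for pages 2 through 6 the script clicks pager item li[page + 1], and from page 7 onward it always clicks li[7]. Here is that rule as a small standalone function; pager_li_index and PAGER_XPATH are names made up for this illustration, and the rule itself is only what the code above does, not a documented behaviour of the site's pager.

```python
PAGER_XPATH = '//*[@id="app"]/div/div/div[8]/div[2]/div/div[2]/div/div/ul/li[%d]'


def pager_li_index(page):
    """Return which <li> of the pager to click in order to reach the given page."""
    if page < 2:
        raise ValueError("page 1 is already displayed; only pages >= 2 need a click")
    # Pages 2..6 map to li[3]..li[7]; later pages reuse li[7], as in main()
    return 7 if page > 6 else page + 1


print(PAGER_XPATH % pager_li_index(3))
```

Keeping the rule in one place makes it easier to adjust if the pager layout turns out to differ for tasks with many pages.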