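Purpose
Civil-service recruitment was postponed throughout the pandemic. The Shaanxi Province recruitment website is complicated to navigate, so a small crawler of my own is a convenient way to fetch the latest announcements as they are posted.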
URL
http://www.sxrsks.cn/website/bm_index.aspx
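Steps
1. Send the request with the requests library
2. Extract the information with lxml, bs4, or regular expressions
3. Assemble the pieces into a program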
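Step 2 names regular expressions as one of the extraction options, although the program in this post uses lxml. As a minimal sketch of the regex route, the snippet below runs on a made-up HTML fragment. The markup, the dates and titles in it, and the names `SAMPLE_HTML`, `ITEM_PATTERN`, and `extract_with_regex` are all assumptions for illustration, not the structure of the real page.

```python
import re

# Hypothetical fragment; the real page markup may differ.
SAMPLE_HTML = """
<ul>
  <li><span>2020-05-01</span><a href="/notice/1.html">Notice A</a></li>
  <li><span>2020-05-08</span><a href="/notice/2.html">Notice B</a></li>
</ul>
"""

# One match per <li>: capture the span text, the link target, and the link text.
ITEM_PATTERN = re.compile(
    r'<li>\s*<span>(?P<date>[^<]*)</span>\s*'
    r'<a\s+href="(?P<href>[^"]*)"[^>]*>(?P<title>[^<]*)</a>',
    re.S,
)


def extract_with_regex(html_text):
    """Return (date, href, title) tuples for every list item that matches."""
    return [
        (m.group("date").strip(), m.group("href"), m.group("title").strip())
        for m in ITEM_PATTERN.finditer(html_text)
    ]


for item in extract_with_regex(SAMPLE_HTML):
    print(item)
```

A pattern like this tends to break as soon as the markup changes (extra attributes, nested tags), which is one reason to prefer an HTML parser for anything beyond a quick check.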
Code:
- First, import the requests, lxml, and bs4 libraries
import requests
from lxml import etree
from bs4 import BeautifulSoup
- Send the request with custom headers
url = 'http://www.sxrsks.cn/website/bm_index.aspx'
headers = {
    # Adjacent string literals are concatenated, so no '+' belongs inside them.
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/80.0.3987.163 '
                  'Safari/537.36 Edg/80.0.361.111'
}
html = requests.get(url, headers=headers)
res = etree.HTML(html.text)
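The request above does not check whether the server answered successfully, and requests applies no timeout unless one is passed. A minimal sketch of a more defensive fetch follows. `fetch_html` and its `get` parameter are hypothetical names introduced here, and the 10-second timeout is an arbitrary assumption.

```python
import requests


def fetch_html(url, headers=None, timeout=10, get=requests.get):
    """Fetch a page and return its decoded text, raising on HTTP errors.

    `get` defaults to requests.get; it is a parameter only so the function
    can be exercised without network access.
    """
    response = get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    # Let requests guess the encoding from the body; this can help when
    # Chinese text comes back garbled because the charset was misdeclared.
    response.encoding = response.apparent_encoding
    return response.text
```

With this helper, the parsing step would become `etree.HTML(fetch_html(url, headers=headers))`.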
- Filter the results with lxml
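# Each <li> under the announcement list is one notice.
for result in res.xpath('//div[@class="er_right right"]/div[@class="ksgg_list list_box"]/ul/li'):
    # Text of any <span> inside the item
    print(result.xpath('.//span/text()'))
    # Link text of the item
    print(result.xpath('.//a/text()'))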
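BeautifulSoup is imported at the top but never used. For comparison, here is a rough bs4 counterpart of the XPath loop, run on a made-up fragment. The sample markup is an assumption modeled on the class names in the XPath, and `extract_with_bs4` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the class names used in the XPath.
SAMPLE_HTML = """
<div class="er_right right">
  <div class="ksgg_list list_box">
    <ul>
      <li><span>2020-05-01</span><a href="/notice/1.html">Notice A</a></li>
      <li><span>2020-05-08</span><a href="/notice/2.html">Notice B</a></li>
    </ul>
  </div>
</div>
"""


def extract_with_bs4(html_text):
    """Return one (span_texts, link_texts) pair per list item."""
    soup = BeautifulSoup(html_text, "html.parser")
    # CSS class selectors match each class on its own, so this is slightly
    # more lenient than the exact-string @class comparison in the XPath.
    items = soup.select("div.er_right.right > div.ksgg_list.list_box > ul > li")
    results = []
    for li in items:
        spans = [s.get_text(strip=True) for s in li.find_all("span")]
        links = [a.get_text(strip=True) for a in li.find_all("a")]
        results.append((spans, links))
    return results


for spans, links in extract_with_bs4(SAMPLE_HTML):
    print(spans, links)
```

Either parser works for a page of this size; the choice mostly comes down to whether XPath or CSS selectors feel more natural.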
Output