Today we'll build a crawler that scrapes Bilibili video data and stores it in a MySQL database. Without further ado, let's walk through it step by step.
Required libraries
import requests
from lxml import etree
import time
from fake_useragent import UserAgent
import pymysql
Main scraping function
The search results come back as an ordinary HTML page, so we can extract the data we need directly with XPath.
Code:
ua = UserAgent()  # note: the use_cache_server argument was removed in recent fake_useragent versions
infos = []

def spider(url):
    # Fetch the page with a random User-Agent; retry once on a non-200 response
    try:
        response = requests.get(url, headers={"User-Agent": ua.random}, timeout=10)
        if response.status_code != 200:
            response = requests.get(url, headers={"User-Agent": ua.random}, timeout=10)
    except requests.RequestException:
        return
    # Parse the HTML and pull out each video's title, link, and duration
    html = etree.HTML(response.text)
    lis = html.xpath('//ul[@class="video-contain clearfix"]/li')
    for li in lis:
        info = {
            'title': li.xpath('./a/@title')[0],
            'href': "https:" + li.xpath('./a/@href')[0],
            'time': li.xpath('.//span/text()')[0].strip(),
        }
        infos.append(info)
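To see the XPath selectors in isolation, here is a minimal sketch run against a static HTML snippet. The markup below is a made-up stand-in that mirrors the structure spider() expects; the real Bilibili page is larger and may change over time.

```python
from lxml import etree

# Hypothetical snippet shaped like one search-result entry
html_text = '''
<ul class="video-contain clearfix">
  <li><a title="Demo video" href="//www.bilibili.com/video/BV1xx"></a>
      <span> 03:21 </span></li>
</ul>
'''

html = etree.HTML(html_text)
lis = html.xpath('//ul[@class="video-contain clearfix"]/li')
info = {
    'title': lis[0].xpath('./a/@title')[0],            # attribute lookup
    'href': "https:" + lis[0].xpath('./a/@href')[0],   # protocol-relative link
    'time': lis[0].xpath('.//span/text()')[0].strip(), # text node, whitespace trimmed
}
print(info)
```

Each XPath call returns a list, which is why the code indexes with [0] before using the value.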
Saving to MySQL:
To store the data we first connect to the database, filling in the connection parameters correctly. Then we create a cursor and execute SQL statements to perform the operations we need.
Code:
def save_to_mysql(key, infos):
    # key is interpolated as the table name, so it must be a valid MySQL identifier
    conn = pymysql.connect(host='localhost', user='root', password='0000',
                           database='pymysql_demo', port=3306)
    cursor = conn.cursor()
    sql_create = """CREATE TABLE IF NOT EXISTS {}(
        id INT NOT NULL AUTO_INCREMENT,
        title VARCHAR(500),
        href CHAR(80),
        time CHAR(80),
        PRIMARY KEY(id))
    """.format(key)
    cursor.execute(sql_create)
    # Values are passed as parameters so pymysql escapes them for us
    sql_insert = "INSERT INTO {}(title, href, time) VALUES (%s, %s, %s)".format(key)
    for info in infos:
        cursor.execute(sql_insert, (info['title'], info['href'], info['time']))
    conn.commit()
    cursor.close()
    conn.close()
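The create-table / insert / commit flow above follows the standard Python DB-API. As a sketch that runs without a MySQL server, here is the same pattern with the standard-library sqlite3 module (sqlite3 uses "?" as the parameter placeholder where pymysql uses "%s"; the table and row below are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cursor = conn.cursor()
cursor.execute("""CREATE TABLE IF NOT EXISTS videos(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT, href TEXT, time TEXT)""")
rows = [("Demo video", "https://www.bilibili.com/video/BV1xx", "03:21")]
# executemany inserts every tuple in the list with one call
cursor.executemany("INSERT INTO videos(title, href, time) VALUES (?, ?, ?)", rows)
conn.commit()
cursor.execute("SELECT title, time FROM videos")
result = cursor.fetchall()
print(result)
conn.close()
```

Committing once after all inserts, rather than inside the loop, keeps the whole batch in a single transaction.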
Main function
We use a main function to take the user's input and drive the scraping function and the saving function in turn.
Code:
def main():
    key = input("Enter a search keyword: ")
    pages = int(input("Number of pages to scrape: "))
    for page in range(1, pages + 1):
        print("Page " + str(page))
        url = "https://search.bilibili.com/all?keyword=" + str(key) + "&page=" + str(page)
        spider(url)
        print(infos)
        save_to_mysql(key, infos)
        infos.clear()  # clear the global list so earlier pages aren't inserted again

if __name__ == '__main__':
    main()
The run looks like this:
Python:
MySQL:
And that's it: the data has been successfully stored in MySQL. What are you waiting for? Go give it a try!