Today we'll build a crawler that scrapes Bilibili video data and stores it in a MySQL database. Without further ado, let's walk through it step by step.
Required libraries
import requests
from lxml import etree
import time
from fake_useragent import UserAgent
import pymysql
Main scraping function
The search results come back as an ordinary HTML page, so we can extract the data we need directly with XPath.
Code:
ua = UserAgent()  # note: the use_cache_server argument was removed in recent fake_useragent versions
infos = []

def spider(url):
    # Fetch the page with a random User-Agent; retry once on a non-200 response
    try:
        response = requests.get(url, headers={"User-Agent": ua.random}, timeout=10)
        if response.status_code != 200:
            response = requests.get(url, headers={"User-Agent": ua.random}, timeout=10)
    except requests.RequestException:
        return
    # Parse the HTML and pull out each video's title, link, and duration
    html = etree.HTML(response.text)
    lis = html.xpath('//ul[@class="video-contain clearfix"]/li')
    for li in lis:
        info = {
            'title': li.xpath('./a/@title')[0],
            'href': "https:" + li.xpath('./a/@href')[0],
            'time': li.xpath('.//span/text()')[0].strip(),
        }
        infos.append(info)
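To see the XPath selectors in isolation, here is a minimal sketch run against a static HTML snippet. The markup below is a made-up stand-in that mirrors the structure spider() expects; the real Bilibili page is larger and may change over time.

```python
from lxml import etree

# Hypothetical snippet shaped like one search-result entry
html_text = '''
<ul class="video-contain clearfix">
  <li><a title="Demo video" href="//www.bilibili.com/video/BV1xx"></a>
      <span> 03:21 </span></li>
</ul>
'''

html = etree.HTML(html_text)
lis = html.xpath('//ul[@class="video-contain clearfix"]/li')
info = {
    'title': lis[0].xpath('./a/@title')[0],            # attribute lookup
    'href': "https:" + lis[0].xpath('./a/@href')[0],   # protocol-relative link
    'time': lis[0].xpath('.//span/text()')[0].strip(), # text node, whitespace trimmed
}
print(info)
```

Each XPath call returns a list, which is why the code indexes with [0] before using the value.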
Saving to MySQL:
To store the data we first connect to the database, filling in the connection parameters correctly. Then we create a cursor and execute SQL statements to perform the operations we need.
Code:
def save_to_mysql(key, infos):
    # key is interpolated as the table name, so it must be a valid MySQL identifier
    conn = pymysql.connect(host='localhost', user='root', password='0000',
                           database='pymysql_demo', port=3306)
    cursor = conn.cursor()
    sql_create = """CREATE TABLE IF NOT EXISTS {}(
        id INT NOT NULL AUTO_INCREMENT,
        title VARCHAR(500),
        href CHAR(80),
        time CHAR(80),
        PRIMARY KEY(id))
    """.format(key)
    cursor.execute(sql_create)
    # Values are passed as parameters so pymysql escapes them for us
    sql_insert = "INSERT INTO {}(title, href, time) VALUES (%s, %s, %s)".format(key)
    for info in infos:
        cursor.execute(sql_insert, (info['title'], info['href'], info['time']))
    conn.commit()
    cursor.close()
    conn.close()
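The create-table / insert / commit flow above follows the standard Python DB-API. As a sketch that runs without a MySQL server, here is the same pattern with the standard-library sqlite3 module (sqlite3 uses "?" as the parameter placeholder where pymysql uses "%s"; the table and row below are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cursor = conn.cursor()
cursor.execute("""CREATE TABLE IF NOT EXISTS videos(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT, href TEXT, time TEXT)""")
rows = [("Demo video", "https://www.bilibili.com/video/BV1xx", "03:21")]
# executemany inserts every tuple in the list with one call
cursor.executemany("INSERT INTO videos(title, href, time) VALUES (?, ?, ?)", rows)
conn.commit()
cursor.execute("SELECT title, time FROM videos")
result = cursor.fetchall()
print(result)
conn.close()
```

Committing once after all inserts, rather than inside the loop, keeps the whole batch in a single transaction.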
Main function
We use a main function to take the user's input and drive the scraping function and the saving function in turn.
Code:
def main():
    key = input("Enter a search keyword: ")
    pages = int(input("Number of pages to scrape: "))
    for page in range(1, pages + 1):
        print("Page " + str(page))
        url = "https://search.bilibili.com/all?keyword=" + str(key) + "&page=" + str(page)
        spider(url)
        print(infos)
        save_to_mysql(key, infos)
        infos.clear()  # clear the global list so earlier pages aren't inserted again

if __name__ == '__main__':
    main()
The run looks like this:
Python:
MySQL:
And that's it: the data has been successfully stored in MySQL. What are you waiting for? Go give it a try!