Bilibili short-video page: http://vc.bilibili.com/p/eden/rank#/?tab=%E5%85%A8%E9%83%A8
Extracting the API
Press F12 to open the developer tools, then find this link in the Name column of the Network tab:
http://api.vc.bilibili.com/board/v1/ranking/top?page_size=10&next_offset=&tag=%E4%BB%8A%E6%97%A5%E7%83%AD%E9%97%A8&platform=pc
Inspect the Headers tab
Look at the Request URL value. While scrolling down to load more videos, only this part of the URL stays the same:
http://api.vc.bilibili.com/board/v1/ranking/top?
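next_offset keeps changing. A reasonable guess is that it is the index of the next video to fetch. So we only need to pull the query parameters out of the URL, turn next_offset into a variable, and read the response as JSON.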
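As a minimal sketch of that idea, the query string can be rebuilt from a dictionary with next_offset as the only varying value. The helper name build_ranking_url is hypothetical and is not part of the original script; the parameter names and the base URL are taken from the captured request.

```python
from urllib.parse import urlencode

# Fixed part of the request URL observed in the Network tab.
BASE_URL = 'http://api.vc.bilibili.com/board/v1/ranking/top'


def build_ranking_url(next_offset, page_size=10, tag='今日热门', platform='pc'):
    """Rebuild the ranking URL with next_offset as a variable (hypothetical helper)."""
    params = {
        'page_size': page_size,
        'next_offset': next_offset,
        'tag': tag,            # urlencode percent-encodes the UTF-8 bytes
        'platform': platform,
    }
    return BASE_URL + '?' + urlencode(params)


# Offsets for the first three pages, assuming pages start at 1 and step by page_size.
for page in range(3):
    print(build_ranking_url(page * 10 + 1))
```

Calling build_ranking_url('') reproduces the captured URL, which is a quick way to check that no parameter was lost.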
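Code implementation
A first attempt based on the steps above showed that Bilibili applies some anti-scraping measures, so the request must carry a headers dictionary; otherwise the downloaded video is empty. We then define params to hold the query parameters, pass both to requests.get, and return the response parsed as JSON. The code is as follows: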
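import requests


def get_json(url, num):
    # A browser-like User-Agent; without it the server returns empty content.
    headers = {
        'User-Agent':
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
    }
    params = {
        'page_size': 10,
        'next_offset': str(num),  # index of the next video to fetch
        'tag': '今日热门',  # same tag as the URL-encoded value in the captured request
        'platform': 'pc'
    }
    try:
        html = requests.get(url, params=params, headers=headers)
        return html.json()
    except (requests.RequestException, ValueError):
        # Network failure or a body that is not JSON; the caller receives None.
        print('request error')
        return None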
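The parsed response is a nested dictionary. The sketch below shows one defensive way to pull the title and the download link out of it, assuming the data -> items -> item layout with the description and video_playurl keys that the final script reads. The function name extract_videos and the sample payload are made up for illustration; real responses carry many more fields.

```python
def extract_videos(payload):
    """Return (title, video_url) pairs from a ranking response dictionary."""
    if not isinstance(payload, dict):
        return []  # e.g. None after a failed request
    items = (payload.get('data') or {}).get('items') or []
    videos = []
    for entry in items:
        item = entry.get('item') or {}
        title = item.get('description')
        video_url = item.get('video_playurl')
        if title and video_url:  # skip entries without a download link
            videos.append((title, video_url))
    return videos


# Hypothetical sample payload, reduced to the keys used above.
sample = {
    'data': {
        'items': [
            {'item': {'description': 'first clip', 'video_playurl': 'http://example.com/1.mp4'}},
            {'item': {'description': 'no link', 'video_playurl': ''}},
        ]
    }
}
print(extract_videos(sample))
```

Returning an empty list for a missing or malformed payload lets the calling loop continue without a separate error branch.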
To make the download status visible, we add a downloader function. The code is as follows:
import time

import requests


def download(url, path):
    start = time.time()  # start time
    size = 0
    headers = {
        'User-Agent':
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
    }
    # stream=True is required so the body is fetched chunk by chunk
    response = requests.get(url, headers=headers, stream=True)
    chunk_size = 1024  # bytes read per iteration
    if response.status_code == 200:
        content_size = int(response.headers['content-length'])  # total size in bytes
        print('[File size]: %0.2f MB' % (content_size / chunk_size / 1024))  # bytes -> MB
        with open(path, 'wb') as file:
            for data in response.iter_content(chunk_size=chunk_size):
                file.write(data)
                size += len(data)  # bytes downloaded so far
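The downloader tracks size and start, but the listing never prints them. One possible way to turn the running byte count into a progress line is sketched below; format_progress is a hypothetical helper, not part of the original code, and the bar style is an arbitrary choice.

```python
def format_progress(downloaded, total, width=50):
    """Build a one-line progress string from byte counts (hypothetical helper)."""
    if total <= 0:
        # Total size unknown: report only the amount downloaded so far.
        return '[progress] %.2f MB' % (downloaded / 1024 / 1024)
    fraction = min(downloaded / total, 1.0)
    bar = '>' * int(fraction * width)
    return '[progress] %s %.2f%%' % (bar, fraction * 100)


# Simulate a 4 KB download arriving in 1 KB chunks.
total_bytes = 4096
received = 0
for _ in range(4):
    received += 1024
    print('\r' + format_progress(received, total_bytes, width=20), end='')
print()
```

Inside download, the same call could follow `size += len(data)`, with `time.time() - start` printed once after the loop to report the elapsed time.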
The result looks like this: [image]
Putting the code above together, the complete implementation is as follows:
# -*- coding: utf-8 -*-
import random
import time

import requests


def get_json(url, num):
    headers = {
        'User-Agent':
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
    }
    params = {
        'page_size': 10,
        'next_offset': str(num),  # index of the next video to fetch
        'tag': '今日热门',  # same tag as the URL-encoded value in the captured request
        'platform': 'pc'
    }
    try:
        html = requests.get(url, params=params, headers=headers)
        return html.json()
    except (requests.RequestException, ValueError):
        # Network failure or a body that is not JSON; the caller receives None.
        print('request error')
        return None


def download(url, path):
    start = time.time()  # start time
    size = 0
    headers = {
        'User-Agent':
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
    }
    # stream=True is required so the body is fetched chunk by chunk
    response = requests.get(url, headers=headers, stream=True)
    chunk_size = 1024  # bytes read per iteration
    if response.status_code == 200:
        content_size = int(response.headers['content-length'])  # total size in bytes
        print('[File size]: %0.2f MB' % (content_size / chunk_size / 1024))  # bytes -> MB
        with open(path, 'wb') as file:
            for data in response.iter_content(chunk_size=chunk_size):
                file.write(data)
                size += len(data)  # bytes downloaded so far


if __name__ == '__main__':
    for i in range(10):
        url = 'http://api.vc.bilibili.com/board/v1/ranking/top?'
        num = i * 10 + 1
        html = get_json(url, num)
        if html is None:  # the request failed; move on to the next page
            continue
        infos = html['data']['items']
        for info in infos:
            title = info['item']['description']  # title of the short video
            video_url = info['item']['video_playurl']  # download link of the short video
            print(title)
            # Guard against videos that do not provide a download link
            try:
                download(video_url, path='%s.mp4' % title)
                print('Downloaded one video!')
            except Exception:
                print('Download failed')
            time.sleep(random.randint(2, 8))  # wait a random 2 to 8 seconds
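The script uses the video description directly as the file name, so a title containing a slash, a question mark, or a line break can make open fail and the download is reported as failed. A small sanitizing step is one way around this. The helper safe_filename below is an assumption added for illustration, not part of the original script, and the replaced character set is a conservative guess at what common file systems reject.

```python
import re


def safe_filename(title, max_length=80, fallback='untitled'):
    """Turn a video title into a file name that is safe on common file systems."""
    # Replace path separators, characters reserved on Windows, and line breaks.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', '_', title)
    # Limit the length, then drop leading/trailing spaces and dots.
    cleaned = cleaned[:max_length].strip(' .')
    return (cleaned or fallback) + '.mp4'


print(safe_filename('cats / dogs: who wins?'))
print(safe_filename(''))
```

In the main loop, the download call could then use `path=safe_filename(title)` instead of `'%s.mp4' % title`.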
The scraping result looks like this: [image]