Open any Xigua Video URL, for example: https://www.ixigua.com/6903716672067076612
View the page source.
You can see that almost all of the video's information and parameters are contained in the source.
url='https://www.ixigua.com/6903716672067076612'
response = requests.get(url, verify=False, headers=headers).text
pattern = re.compile('(?<=window._SSR_HYDRATED_DATA=).*?(?=</script>)')
jsonResult = pattern.findall(response)[0]
Here we look for the _SSR_HYDRATED_DATA variable and use a regular expression to match the content of its script tag.
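The extraction step can be tried offline on a small hand-written HTML string. The following is a minimal sketch: the helper name `extract_hydrated_data` and the sample HTML are assumptions made for illustration, not part of the original script. It uses `re.search` with `re.DOTALL` so that a payload spanning several lines would still match, and it raises a `ValueError` instead of an `IndexError` when the marker is absent.

```python
import re

# Same lookbehind/lookahead idea as above, compiled once.
# The dot in "window." is escaped so it only matches a literal dot.
PATTERN = re.compile(
    r'(?<=window\._SSR_HYDRATED_DATA=).*?(?=</script>)',
    re.DOTALL,
)


def extract_hydrated_data(html):
    """Return the raw text assigned to window._SSR_HYDRATED_DATA (hypothetical helper)."""
    match = PATTERN.search(html)
    if match is None:
        raise ValueError('window._SSR_HYDRATED_DATA not found in page source')
    return match.group(0)


# Made-up sample page, for illustration only.
sample_html = '<script>window._SSR_HYDRATED_DATA={"anyVideo":{"id":1}}</script>'
print(extract_hydrated_data(sample_html))
```

If the page layout changes and the variable is no longer present, the explicit error makes the failure easier to diagnose than indexing into an empty list.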
The result is a block of JSON data, but it has a small problem.
Some of the values are undefined, which is not valid JSON.
So we replace them by simply wrapping them in double quotes.
jsonResult = jsonResult.replace(':undefined', ':"undefined"')
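Quoting `undefined` turns it into the string "undefined" after parsing. One alternative is to map it to JSON `null`, which `json.loads` converts to `None`. This is a minimal sketch, assuming `undefined` only appears as a bare value directly after `:`, `[` or `,`. The helper name `sanitize_undefined` is hypothetical, and a string value that happens to contain text such as `,undefined,` would also be rewritten.

```python
import json
import re

# A bare `undefined` directly after ':', '[' or ',' and
# directly before ',', '}' or ']' (optional whitespace allowed).
UNDEFINED = re.compile(r'(?<=[:\[,])\s*undefined(?=\s*[,}\]])')


def sanitize_undefined(text):
    """Replace bare JavaScript `undefined` values with JSON null (hypothetical helper)."""
    return UNDEFINED.sub('null', text)


# Made-up sample payload, for illustration only.
raw = '{"a":undefined,"b":[undefined,1],"c":"ok"}'
data = json.loads(sanitize_undefined(raw))
print(data)
```

Unlike the plain `replace(':undefined', ...)` call, this variant also covers `undefined` inside arrays, at the cost of a slightly harder-to-read pattern.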
The information we need is all in here.
# Parse the cleaned-up string into a dict first
jsonData = json.loads(jsonResult)
infor = jsonData['anyVideo']['gidInformation']['packerData']['video']
dash = infor['videoResource']['dash']
if 'dynamic_video' in dash.keys():
    audioUrl = dash['dynamic_video']['dynamic_audio_list'][0]['main_url']
    videoUrl = dash['dynamic_video']['dynamic_video_list'][0]['main_url']
else:
    print('Source URL not found')
Here we read the audio and video source URLs directly.
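Chained indexing like this raises `KeyError` or `IndexError` as soon as one level is missing. A minimal sketch of a more tolerant lookup follows. The helper `dig` and the sample dictionary are assumptions for illustration: the sample only mimics the key names used above, and its `main_url` values are placeholders.

```python
def dig(data, *keys):
    """Follow dict keys / list indexes in order; return None if any step is missing."""
    for key in keys:
        try:
            data = data[key]
        except (KeyError, IndexError, TypeError):
            return None
    return data


# Made-up structure that only mimics the key names used above.
sample = {'videoResource': {'dash': {'dynamic_video': {
    'dynamic_audio_list': [{'main_url': 'QQ=='}],
    'dynamic_video_list': [{'main_url': 'Qg=='}],
}}}}

audio = dig(sample, 'videoResource', 'dash', 'dynamic_video',
            'dynamic_audio_list', 0, 'main_url')
# 'no_such_key' is deliberately absent to show the fallback.
missing = dig(sample, 'videoResource', 'no_such_key', 0, 'main_url')
print(audio, missing)
```

The caller can then check for `None` once instead of testing each level separately.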
However, main_url is still encoded (it is a Base64 string rather than a plain URL).
audio_url = base64.b64decode(audioUrl).decode("utf-8")
video_url = base64.b64decode(videoUrl).decode("utf-8")
Decoding it with Base64 gives the actual audio and video source URLs.
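`base64.b64decode` raises an error when the input is missing its trailing `=` padding. The post does not say whether that ever happens for `main_url`, so the following is only a defensive sketch. The helper name `decode_main_url` and the example URL are assumptions for illustration.

```python
import base64


def decode_main_url(encoded):
    """Base64-decode a URL string, restoring '=' padding if it was stripped (hypothetical helper)."""
    # Base64 text length must be a multiple of 4; add the missing '=' characters.
    padded = encoded + '=' * (-len(encoded) % 4)
    return base64.b64decode(padded).decode('utf-8')


# Build an unpadded sample from a placeholder URL, then decode it again.
sample = base64.b64encode(b'https://example.com/video.mp4').decode('ascii').rstrip('=')
print(decode_main_url(sample))
```

For input that already carries correct padding, the helper adds nothing and behaves like the plain `b64decode` call above.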
Complete code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2021/2/23 12:18
# @Author : pp
# @Software: PyCharm
import requests
import urllib3
urllib3.disable_warnings()
import re
import json
import base64
cookie = 'your cookie'  # replace with the cookie from your own browser session
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
    "cookie": cookie
}
def getRealUrl(url):
    # Fetch the page source (certificate verification is disabled here)
    response = requests.get(url, verify=False, headers=headers).text
    # Extract the JSON assigned to window._SSR_HYDRATED_DATA
    pattern = re.compile('(?<=window._SSR_HYDRATED_DATA=).*?(?=</script>)')
    jsonResult = pattern.findall(response)[0]
    print(jsonResult)
    # undefined is not valid JSON, so quote it before parsing
    jsonResult = jsonResult.replace(':undefined', ':"undefined"')
    jsonData = json.loads(jsonResult)
    print(jsonResult)
    infor = jsonData['anyVideo']['gidInformation']['packerData']['video']
    dash = infor['videoResource']['dash']
    if 'dynamic_video' in dash.keys():
        audioUrl = dash['dynamic_video']['dynamic_audio_list'][0]['main_url']
        videoUrl = dash['dynamic_video']['dynamic_video_list'][0]['main_url']
    else:
        print('Source URL not found')
        # Return early: audioUrl and videoUrl are not defined on this path
        return None, None
    # main_url is Base64-encoded, so decode it to get the real URL
    audio_url = base64.b64decode(audioUrl).decode("utf-8")
    video_url = base64.b64decode(videoUrl).decode("utf-8")
    return audio_url, video_url
baseUrl='https://www.ixigua.com/6903716672067076612'
audio_url,video_url=getRealUrl(baseUrl)
print(audio_url)
print(video_url)