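Taking the Wasu TV (華數TV) playback page http://www.wasu.cn/Play/show/id/7882670 as an example, this post explains how to obtain the real address of a video.
Open the browser's developer tools and watch the network requests made while the playback page loads. From page load until the video starts playing, the following relevant requests appear in order: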
http://www.wasu.cn/Api/getPlayInfoById/id/7882670/datatype/xml
http://apiontime.wasu.cn/Auth/getVideoUrl?id=7882670&key=11ac882a1f434800cf661ae5dbd81ca4&url=aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzE5MTYwOTQ5NzQ1NmQxNi5tcDQ=
http://vodpc-al.wasu.cn/pcsan12/mams/vod/201610/27/17/2016102717191609497456d16.mp4?auth_key=97f712597251633ab91f611e75b058ff-1477935556-f8dc297b46735af55223e73d3e3af535-&vid=7882670&cid=4&start=3165&end=3170&version=P2PPlayer_V.4.1.0
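Judging by their names, the first request fetches the playback information, the second fetches the video address, and the third is the real address of the video.
Getting the playback information
In the playback information endpoint http://www.wasu.cn/Api/getPlayInfoById/id/7882670/datatype/xml, 7882670 is the video ID, which can be extracted from the playback page URL.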
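As a small illustration of that extraction, the sketch below pulls the numeric ID out of a playback page URL with a regular expression. The helper name extract_video_id is made up for this example, and it assumes that every playback URL carries the ID after an /id/ path segment, as the example URL does.

```python
import re


def extract_video_id(play_url):
    """Return the digits that follow '/id/' in a playback URL, or None."""
    match = re.search(r'/id/(\d+)', play_url)
    return match.group(1) if match else None


print(extract_video_id('http://www.wasu.cn/Play/show/id/7882670'))
```

Returning None instead of raising keeps the caller in charge of what to do with a URL that does not follow this pattern.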
Requesting this endpoint returns the following useful information:
<mp4>
<hd1>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzE5MTYwOTQ5NzQ1NmQxNi5tcDQ=</hd1>
<hd4>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MDkvMTgvMDcvMjAxNjA5MTgwNzA0MjQ0NDJjMzU2OGJmMV9mN2Q2YjNhOC5tcDQ=</hd4>
<hd3>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzIzMTY1MTEzODRlNzliNy5tcDQ=</hd3>
<hd2>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzIwMDUzNDFjNDI4ZWRhOS5tcDQ=</hd2>
<hd0>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzI1NTk5ODhlOGZlMmFlZi5tcDQ=</hd0>
</mp4>
hd0 through hd4 are the video quality levels. What each encoded string means is not yet clear at this point.
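One observation that may help: every string starts with aHR0cDovL, which is what Base64 produces for the text http://. The sketch below simply tries a Base64 decode on the hd1 value copied from the response above. The variable names are chosen for this example only.

```python
import base64

# The hd1 value copied from the getPlayInfoById response above.
hd1 = ('aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAv'
       'MjcvMTcvMjAxNjEwMjcxNzE5MTYwOTQ5NzQ1NmQxNi5tcDQ=')

# Decode the Base64 text and interpret the resulting bytes as ASCII.
decoded = base64.b64decode(hd1).decode('ascii')
print(decoded)
```

The decoded value is an .mp4 address with the same host and path as the third captured request, but without its query string (auth_key and the other parameters), which suggests that the second request is what supplies the authorised form of the address.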
Getting the video address
Next, look at how the request for the video address is constructed.
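- The id parameter is the video ID.
- It is not yet known where the key parameter comes from.
- The url parameter is the encoded string for one of the quality levels returned by the first request.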
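Putting the three parameters together, the request URL can be assembled with plain string formatting. This is a sketch only: the helper name build_get_video_url is invented here, and the key value is the one seen in the captured request, taken as given for now.

```python
def build_get_video_url(vid, key, encoded_url):
    """Assemble the getVideoUrl request from its three query parameters."""
    return ('http://apiontime.wasu.cn/Auth/getVideoUrl'
            '?id=%s&key=%s&url=%s' % (vid, key, encoded_url))


request_url = build_get_video_url(
    '7882670',
    '11ac882a1f434800cf661ae5dbd81ca4',
    'aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAv'
    'MjcvMTcvMjAxNjEwMjcxNzE5MTYwOTQ5NzQ1NmQxNi5tcDQ=',
)
print(request_url)
```

Plain formatting is used instead of a URL-encoding helper so that the trailing = of the encoded string is sent unescaped, exactly as in the captured request.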
The remaining question is where the key parameter comes from. It turns out that the key can be found in the source of the playback page:
_playKey = '11ac882a1f434800cf661ae5dbd81ca4'
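A regular expression is enough to pick that value out of the page source. In the sketch below, the helper name extract_play_key and the script markup wrapped around the assignment are hypothetical. Only the _playKey assignment itself comes from the real page.

```python
import re


def extract_play_key(page_source):
    """Return the value assigned to _playKey in the page source, or None."""
    match = re.search(r"_playKey\s*=\s*'(\w+)'", page_source)
    return match.group(1) if match else None


# Hypothetical fragment of page source built around the real assignment.
sample_source = "<script>var _playKey = '11ac882a1f434800cf661ae5dbd81ca4';</script>"
print(extract_play_key(sample_source))
```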
That settles the parameters of the second request. Here is the data it returns:
<?xml version="1.0" encoding="utf-8"?>
<root>
<id></id>
<title></title>
<video>
<![CDATA[8ec7ZEZEWDowIRsyTmMjGXQKUiQYai5SPn8VCUZ+ciYNEDdrejlBEQMud3EBJG8Caz5mOylbWFcTXhQOZCUJbi5Ybj9kSy1BHkgdHCkzRRMYMwomAVQAW0UyXFlzdFF0UxBPB0gbCRZTUhYrfi9fNgYgTypyCGhvFn0VXB40IWdla0dPdwEreFUlL2J+CUwhVRR9GQhQbT1cWX0lN0p4CDdoODEwO2sDTFx2AT1qJzJkHl0POlsjfRkSEGsQMDE0D3smYU1fVw==]]>
</video>
<page></page>
</root>
The response is XML. The video tag presumably holds the video address, but in encrypted form, so the next question is how to decrypt it.
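Before any decryption, the encrypted text has to be lifted out of the video element. A minimal sketch using the standard library's ElementTree follows. The function name and the placeholder payload are made up for illustration, and the sample only mirrors the structure of the response shown above.

```python
import xml.etree.ElementTree as ET


def extract_encoded_address(xml_bytes):
    """Return the stripped text of the <video> element, or None if absent."""
    root = ET.fromstring(xml_bytes)
    video = root.find('video')
    if video is None or video.text is None:
        return None
    # The CDATA section is surrounded by line breaks, so strip them off.
    return video.text.strip()


# Same structure as the real response, with a placeholder payload.
sample = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<root>\n<id></id>\n<title></title>\n'
    b'<video>\n<![CDATA[PLACEHOLDER_ENCODED_ADDRESS]]>\n</video>\n'
    b'<page></page>\n</root>'
)
print(extract_encoded_address(sample))
```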
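The browser's developer tools show that the second request is issued by the Flash player, so the Flash player most likely decrypts the address. Decompiling Wasu TV's Flash player file WsPlayer.swf reveals the decryption routine, which translates to Python as follows: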
def url_decode(param1):
    # md5_hex returns the hexadecimal MD5 digest of a string
    param2 = md5_hex('wasu!@#48217#$@#1')
    loc7 = md5_hex(param2[0:16])
    loc8 = md5_hex(param2[16:32])  # computed by the player but not used below
    # The first 4 characters of the input are mixed into the cipher key
    loc11 = loc7 + md5_hex(loc7 + param1[0:4])
    loc12 = len(loc11)
    # bytearray yields integers on both Python 2 and Python 3
    param1 = bytearray(base64.b64decode(param1[4:]))
    loc13 = len(param1)
    loc14 = []
    loc15 = []
    loc16 = 0
    while loc16 < 128:
        loc14.append(loc16)
        loc15.append(ord(loc11[loc16 % loc12]) & 255)
        loc16 += 1
    loc16 = 0
    loc17 = 0
    loc19 = 0
    # Shuffle the 128-entry table according to the key
    while loc16 < 128:
        loc17 = (loc17 + loc14[loc16] + loc15[loc16]) % 128
        loc19 = loc14[loc16]
        loc14[loc16] = loc14[loc17]
        loc14[loc17] = loc19
        loc16 += 1
    loc17 = 0
    loc16 = 0
    loc18 = 0
    loc20 = []
    # XOR every input byte with a value drawn from the table
    while loc16 < loc13:
        loc18 = (loc18 + 1) % 128
        loc17 = (loc17 + loc14[loc18]) % 128
        loc19 = loc14[loc18]
        loc14[loc18] = loc14[loc17]
        loc14[loc17] = loc19
        loc20.append(chr(param1[loc16] & 255 ^ loc14[(loc14[loc18] + loc14[loc17]) % 128]))
        loc16 += 1
    # Skip the first 26 characters of the decrypted text
    return (''.join(loc20))[26:]
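The two loops in url_decode resemble the key-scheduling and keystream steps of RC4, except that the table has 128 entries rather than 256. The sketch below isolates that part under a made-up name, keystream_xor, with a made-up key and address. It is not taken from the player. Because the keystream depends only on the key and the byte position, applying the function twice with the same key gives back the original bytes, which would explain why the player can undo the server's encryption with the same routine.

```python
def keystream_xor(key, data):
    """XOR data with an RC4-style keystream built on a 128-entry table."""
    table = list(range(128))
    key_bytes = [ord(key[i % len(key)]) & 255 for i in range(128)]
    # Key scheduling: shuffle the table according to the key.
    j = 0
    for i in range(128):
        j = (j + table[i] + key_bytes[i]) % 128
        table[i], table[j] = table[j], table[i]
    # Keystream generation: one table value per input byte.
    i = j = 0
    out = bytearray()
    for byte in bytearray(data):
        i = (i + 1) % 128
        j = (j + table[i]) % 128
        table[i], table[j] = table[j], table[i]
        out.append(byte ^ table[(table[i] + table[j]) % 128])
    return bytes(out)


plain = b'http://example.com/video.mp4'
cipher = keystream_xor('demo-key', plain)
print(keystream_xor('demo-key', cipher) == plain)
```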
With that, the job is done.
Python code example
import re
import requests
import base64
import hashlib
from pyquery import PyQuery as pq

def md5_hex(data):
    # Encode to bytes first so that hashlib also accepts the input on Python 3
    m = hashlib.md5()
    m.update(data.encode('utf-8'))
    return m.hexdigest()

# url_decode() is the function listed in the previous section

url = 'http://www.wasu.cn/Play/show/id/7882670'
# get vid
vid = re.search(r'/id/(\d+)', url).group(1)
# get key
r = requests.get(url)
key = re.search(r'_playKey\s*=\s*\'(\w+)\'', r.text).group(1)
# get the encoded address of every definition
r = requests.get('http://www.wasu.cn/Api/getPlayInfoById/id/%s/datatype/xml' % vid)
d = pq(r.content)
for definition in ('hd3', 'hd2', 'hd1', 'hd0'):
    element = d('mp4')(definition)
    r = requests.get('http://apiontime.wasu.cn/Auth/getVideoUrl?id=%s&key=%s&url=%s' % (vid, key, element.text()))
    tmp_d = pq(r.content)
    encoded_url = tmp_d('video').text()
    print('%s %s' % (definition, url_decode(encoded_url)))