Background
Python 3.8
Windows 10
Chrome
Charles
Goal
- Site: https://weixin.sogou.com/
- Fetch the content of WeChat Official Account articles
Process
Sogou WeChat search results list
- Get the detail-page link from each result:
href="/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toT_Tn2j42MFRVGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nkfffpKeIKEhRBeQ7G0thG5XpEcc4HfRvLd1KUjSkao4yxsOMKOcnvZDSDZWVslU657CZyNbec73WkNT_QICA71r1dbZWbKUeQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A971863740828180BABF63558B7CED06BA05DE98622EA7A3"
Join it with the list page URL to get the full link:
import urllib.parse
# news: Scrapy selector for one search result; its href is relative
url = news.css('div.txt-box > h3 > a::attr(href)').extract_first()
url = urllib.parse.urljoin(response.url, url)
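Outside Scrapy, the same extraction can be sketched with the standard library alone. This assumes the list page marks each result as `<div class="txt-box"><h3><a href="/link?...">`, which is what the CSS selector above targets; the parser accepts an `a` anywhere inside that `h3`, so it is a little looser than the selector's direct-child `>`. `ResultLinkParser` is a hypothetical name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResultLinkParser(HTMLParser):
    """Collect absolute hrefs of <a> tags inside <div class="txt-box"><h3>.

    Hypothetical helper; assumes the list-page markup described above.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.div_depth = 0          # current <div> nesting depth
        self.txt_box_depth = None   # depth at which div.txt-box was opened
        self.in_h3 = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            self.div_depth += 1
            classes = (attrs.get("class") or "").split()
            if self.txt_box_depth is None and "txt-box" in classes:
                self.txt_box_depth = self.div_depth
        elif tag == "h3" and self.txt_box_depth is not None:
            self.in_h3 = True
        elif tag == "a" and self.in_h3 and attrs.get("href"):
            # HTMLParser already unescapes &amp; in attribute values
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "div":
            if self.txt_box_depth == self.div_depth:
                self.txt_box_depth = None
            self.div_depth -= 1
        elif tag == "h3":
            self.in_h3 = False
```

Usage: `p = ResultLinkParser(list_page_url)` then `p.feed(list_page_html)`; afterwards `p.links` holds absolute `https://weixin.sogou.com/link?url=...` URLs, joined the same way as the `urljoin(response.url, url)` call above.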
Sogou WeChat detail page
- Detail-page link:
https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toT_Tn2j42MFRVGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nkfffpKeIKEhRBeQ7G0thG5XpEcc4HfRvLd1KUjSkao4yxsOMKOcnvZDSDZWVslU657CZyNbec73WkNT_QICA71r1dbZWbKUeQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A971863740828180BABF63558B7CED06BA05DE98622EA7A3
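- Opening the detail link directly in the browser (with cookies cleared)
- Sending the same request with Postman
- Refreshing the list page in the browser (this triggers Set-Cookie, caching, and the generation of various parameters)
- Then opening the detail page directly in the browser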
Summary:
- The detail page only loads when the request carries the right parameters
- The page redirects (301)
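Problems
- IP restrictions:
- Requesting the list page too fast gets the IP blocked (verified)
- There may be a cap on total requests per day (not verified)
- How to get the real link behind the redirect
- WeChat links expire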
Solutions
The IP problem
Use proxy IPs.
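The post only names proxy IPs as the fix. As a minimal sketch, assuming the spider runs under Scrapy (as the selector code above suggests), a downloader middleware can assign a proxy per request through `request.meta["proxy"]`, which Scrapy's built-in `HttpProxyMiddleware` honours. `RandomProxyMiddleware` and the `PROXY_LIST` setting are hypothetical names.

```python
import random


class RandomProxyMiddleware:
    """Hypothetical Scrapy downloader middleware: one random proxy per request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("need at least one proxy URL")
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is an assumed custom setting, e.g. ["http://1.2.3.4:8080"]
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Keep a proxy that was already set explicitly on the request
        request.meta.setdefault("proxy", random.choice(self.proxies))
        return None  # let Scrapy continue processing the request
```

To enable it, register it in `DOWNLOADER_MIDDLEWARES` with a number below 750 (the default position of `HttpProxyMiddleware`), for example `{"myproject.middlewares.RandomProxyMiddleware": 543}`, where the module path is hypothetical. Since requesting the list page too fast is what got blocked, Scrapy's `DOWNLOAD_DELAY` or `AUTOTHROTTLE_ENABLED` settings are worth combining with it.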
Getting the real link from Sogou WeChat
- A 301 redirect takes place. The redirecting request, as a curl command:
curl -H "Host: mp.weixin.qq.com" -H "Upgrade-Insecure-Requests: 1" -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9" -H "Referer: https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toa5ZWgFUPIu5VGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nrFzx6-3M-9Ud3TUgO8nEfttSG3lUTs25OkkcsTUz87GouPLeK06uSd2rPlJlESjn-37p7qSMapfYNakzj9lgtwYJSSFPgfogQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A9871CD7955756556E6BB54BA8C36F206F25D310622EAE02&k=48&h=F" -H "Accept-Language: zh-CN,zh;q=0.9" -H "Cookie: rewardsn=; wxtokenkey=777" --compressed "http://mp.weixin.qq.com/s?src=11&timestamp=1647226370&ver=3675&signature=ykBEMHUd-2F9qhxMiD*XJI9QZi4qDEaneJX2DGYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1"
Requested URL:
http://mp.weixin.qq.com/s?src=11&timestamp=1647226370&ver=3675&signature=ykBEMHUd-2F9qhxMiD*XJI9QZi4qDEaneJX2DGYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1
Response body:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html;charset=utf8">
<meta id="viewport" name="viewport" content="width=device-width,initial-scale=1.0,maximum-scale=1.0,user-scalable=0" />
<title>未知錯誤</title>
<style>
html,body{
height:100%;
padding:0px;
margin:0px;
}
body{
background-color: #F4F4F4;
}
.panel {
padding: 18px 22px 10px;
}
.mesg-block{
margin-bottom:20px;
}
.mesg-block p{
font-size: 16px;
line-height: 1.3em;
color: #858585;
text-shadow: 0px 1px 0px #FFF;
text-align:center;
}
</style>
</head>
<body>
<div class="panel">
<div class="mesg-block">
<p>未知錯誤,請稍后再試</p>
</div>
</div>
<script>
(function(){
document.addEventListener('WeixinJSBridgeReady', function onBridgeReady() {
var appId = '',
imgUrl = ''
link = 'http://mp.weixin.qq.com/mp/conference/default/share',
title = '失效的驗證頁面'
desc = '你暫無權限查看此頁面內容。',
content = '#微信分享#,你暫無權限查看此頁面內容。';
WeixinJSBridge.on('menu:share:appmessage', function(argv){
WeixinJSBridge.invoke('sendAppMessage',{
"appid":appId,
"img_url":imgUrl,
"img_width":"640",
"img_height":"640",
"link":link,
"desc":desc,
"title":title
}, function(res) {})
});
WeixinJSBridge.on('menu:share:timeline', function(argv){
WeixinJSBridge.invoke('shareTimeline',{
"img_url":imgUrl,
"img_width":"640",
"img_height":"640",
"link":link,
"desc": desc,
"title":title
}, function(res) {
});
});
var weiboContent = '';
WeixinJSBridge.on('menu:share:weibo', function(argv){
WeixinJSBridge.invoke('shareWeibo',{
"content":content,
"url":link,
}, function(res) {
});
});
WeixinJSBridge.call('hideOptionMenu');
}, false);
})();
</script>
</body>
</head>
</html>
So this is not the real link: the body is only an error stub ("未知錯誤", i.e. "unknown error, please try again later").
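A crawler that saves whatever HTML comes back would store this stub as if it were an article. A minimal check, assuming the markers in the body above stay stable: the share link `mp/conference/default/share`, plus the 未知錯誤 title matched in both simplified and traditional spelling. `is_weixin_error_page` is a hypothetical helper.

```python
# Substrings taken from the error page shown above (assumed to stay stable).
ERROR_PAGE_MARKERS = (
    "mp/conference/default/share",  # share link of the "no permission" stub
    "<title>未知错误</title>",        # title, simplified spelling
    "<title>未知錯誤</title>",        # title, traditional spelling
)


def is_weixin_error_page(html: str) -> bool:
    """Return True if the body looks like WeChat's 'unknown error' stub page."""
    return any(marker in html for marker in ERROR_PAGE_MARKERS)
```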
- The target of the redirect can be read from the Location response header:
Location: https://mp.weixin.qq.com/s?src=11&timestamp=1647226370&ver=3675&signature=ykBEMHUd-2F9qhxMiD*XJI9QZi4qDEaneJX2DGYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1
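Most HTTP clients follow redirects automatically, which hides this header. A minimal sketch using the standard library's `http.client`, which never follows redirects, so the status code and `Location` header of the 301 can be read directly. `get_redirect_location` is a hypothetical helper; any headers the server expects (User-Agent, Referer, cookies, as in the curl command above) must be passed in by the caller.

```python
import http.client
from urllib.parse import urlsplit


def get_redirect_location(url, headers=None, timeout=10):
    """Send a single GET without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path, headers=headers or {})
        resp = conn.getresponse()
        resp.read()  # drain the body (for a 301 this may be an error stub)
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

With the `requests` library the equivalent is `requests.get(url, allow_redirects=False)`, then `resp.status_code` and `resp.headers["Location"]`.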
- The real, valid request
After the redirect, one more request is made, and that one is the real link.
Request URL:
https://mp.weixin.qq.com/s?src=11&timestamp=1647226370&ver=3675&signature=ykBEMHUd-2F9qhxMiD*XJI9QZi4qDEaneJX2DGYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1
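Analysis:
- The URL of the real, valid request comes from the Location header of the redirect request.
- So once we have the URL of the request that returns the 301, we have the real address.
Next
- Let's try searching for a fragment of the real link (after trying all sorts of things...).
Sure enough, that turns up one suspicious request (let's call it the stitching request).
Response body: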
<meta content="always" name="referrer">
<script>
(new Image()).src = 'https://weixin.sogou.com/approve?uuid=' + '7bc50a8b-2449-43a3-b852-ae83b83ee01c' + '&token=' + 'A9874F20955756556E6BB54BA8C36F206F25D310622EAE11' + '&from=inner';
setTimeout(function () {
var url = '';
url += 'http://mp.w';
url += 'eixin.qq.co';
url += 'm/s?src=11&';
url += 'timestamp=1';
url += '647226370&v';
url += 'er=3675&sig';
url += 'nature=ykBE';
url += 'MHUd-2F9qhx';
url += 'MiD*XJI9QZi';
url += '4qDEaneJX2D';
url += 'GYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1';
url.replace("@", "");
window.location.replace(url)
},100);
</script>
curl -H "Host: weixin.sogou.com" -H "sec-ch-ua: \" Not A;Brand\";v=\"99\", \"Chromium\";v=\"98\", \"Google Chrome\";v=\"98\"" -H "sec-ch-ua-mobile: ?0" -H "sec-ch-ua-platform: \"Windows\"" -H "Upgrade-Insecure-Requests: 1" -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9" -H "Sec-Fetch-Site: same-origin" -H "Sec-Fetch-Mode: navigate" -H "Sec-Fetch-User: ?1" -H "Sec-Fetch-Dest: document" -H "Referer: https://weixin.sogou.com/weixin?type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&ie=utf8&s_from=input&_sug_=n&_sug_type_=1&w=01015002&oq=&ri=6&sourceid=sugg&sut=0&sst0=1647224739319&lkt=0%2C0%2C0&p=40040108" -H "Accept-Language: zh-CN,zh;q=0.9" -H "Cookie: ABTEST=0|1647226370|v1; SNUID=955756556E6BB54BA8C36F206F25D310; IPLOC=CN3701; SUID=FA38383A1B0DA00A00000000622EAE02; SUID=FA38383A6555A00A00000000622EAE02; JSESSIONID=aaaPefwhkqDB384EsWe9x; SUV=00B5600B3A3838FA622EAE0242604246; ariaDefaultTheme=undefined" --compressed "https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toa5ZWgFUPIu5VGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nrFzx6-3M-9Ud3TUgO8nEfttSG3lUTs25OkkcsTUz87GouPLeK06uSd2rPlJlESjn-37p7qSMapfYNakzj9lgtwYJSSFPgfogQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A9871CD7955756556E6BB54BA8C36F206F25D310622EAE02&k=48&h=F"
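- So the stitching request's response assembles, piece by piece, the URL of the request that returns the 301.
- If we can reproduce this request, we're done.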
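The curl command above shows that the stitching request is sent with the list page as `Referer` and with cookies (SNUID, SUID, SUV and others) that earlier Sogou responses set. A minimal sketch of replaying that order with the standard library, assuming that loading the list page first is enough to obtain the cookies; whether Sogou checks more than this is not verified here, and `fetch_stitching_page` is a hypothetical name.

```python
import http.cookiejar
import urllib.request

# Desktop Chrome UA string copied from the curl command above
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36")


def fetch_stitching_page(list_url, link_url, timeout=10):
    """Load the list page (to collect cookies), then the /link URL with a Referer."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", USER_AGENT)]
    # Step 1: the list page response is where the Set-Cookie headers arrive
    with opener.open(list_url, timeout=timeout) as resp:
        resp.read()
    # Step 2: request the link page, with the list page as Referer;
    # the cookie jar attaches the stored cookies automatically
    req = urllib.request.Request(link_url, headers={"Referer": list_url})
    with opener.open(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The returned text is the page whose script holds the `url += '...'` fragments. The `link_url` presumably has to carry the `&k=...&h=...` suffix as well.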
Analysis
- URL of the stitching request:
https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toa5ZWgFUPIu5VGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nrFzx6-3M-9Ud3TUgO8nEfttSG3lUTs25OkkcsTUz87GouPLeK06uSd2rPlJlESjn-37p7qSMapfYNakzj9lgtwYJSSFPgfogQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A9871CD7955756556E6BB54BA8C36F206F25D310622EAE02&k=48&h=F
- It looks a lot like the link we got from the Sogou list page. Side by side:
https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toa5ZWgFUPIu5VGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nrFzx6-3M-9Ud3TUgO8nEfttSG3lUTs25OkkcsTUz87GouPLeK06uSd2rPlJlESjn-37p7qSMapfYNakzj9lgtwYJSSFPgfogQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A9871CD7955756556E6BB54BA8C36F206F25D310622EAE02&k=48&h=F
https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toT_Tn2j42MFRVGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nkfffpKeIKEhRBeQ7G0thG5XpEcc4HfRvLd1KUjSkao4yxsOMKOcnvZDSDZWVslU657CZyNbec73WkNT_QICA71r1dbZWbKUeQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A971863740828180BABF63558B7CED06BA05DE98622EA7A3
- The stitching request carries extra parameters: &k=48&h=F
- Once we find out how these are generated, we're done.
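Comparing the two URLs by eye is error-prone, because the `url` and `token` values also differ between the two captures. Comparing parameter names isolates what the stitching request adds. A minimal sketch with `urllib.parse`; `query_dict` and `extra_params` are hypothetical helpers.

```python
from urllib.parse import parse_qsl, urlsplit


def query_dict(url):
    """Map each query parameter name to its percent-decoded value."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def extra_params(url_a, url_b):
    """Parameters present in url_a but absent (by name) from url_b."""
    a, b = query_dict(url_a), query_dict(url_b)
    return {name: value for name, value in a.items() if name not in b}
```

Applied to the two URLs above, `extra_params` yields `{"k": "48", "h": "F"}`. Decoding also shows that `query` is simply the search term 济南天气 ("Jinan weather").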
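Next
- Where do the &k=48&h=F parameters come from? Let's search again.
- No luck: searching in Charles turns up nothing useful, so let's try the browser instead.
- Nothing there either, so shorten the search terms: k=, h=, &k, &h.
This piece of JS looks suspicious: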
<script>
(function() {
$("a").on("mousedown click contextmenu", function() {
var b = Math.floor(100 * Math.random()) + 1
, a = this.href.indexOf("url=")
, c = this.href.indexOf("&k=");
-1 !== a && -1 === c && (a = this.href.substr(a + 4 + parseInt("21") + b, 1),
this.href += "&k=" + b + "&h=" + a)
})
}
)();
</script>
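- My reading of it: whenever any <a> on the page receives mousedown, click, or contextmenu (right-click), and its href contains url= but not yet &k=, the handler appends &k=<random integer from 1 to 100> and &h=<the single character at offset 21 + k inside the url= value>.
- Which links does this hit? My guess was the result links on the list page.
- Go back to the list page and check whether its links change:
- They really do (compared with the href captured under Process > Sogou WeChat search results list).
- So this is the JS we were looking for.
- Port it to Python: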
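import random
def get_k_h(self, url):
    # Port of the list page's JS: append &k=<1..100> and &h=<one char of the url= value>
    a = url.find("url=")
    if a == -1 or "&k=" in url:  # same guard as the JS
        return url
    b = int(random.random() * 100) + 1  # JS: Math.floor(100 * Math.random()) + 1
    pos = a + 4 + 21 + b  # JS: substr(a + 4 + parseInt("21") + b, 1)
    return url + "&k=" + str(b) + "&h=" + url[pos:pos + 1]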
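Because `k` is random, the port is hard to check directly. A deterministic variant that takes `k` as an argument can be tested against the captured stitching request: with k=48 the character at offset 21 + 48 = 69 of the `url=` value should be `F`. `add_k_h` is a hypothetical name; the logic is the same as the port above.

```python
import random


def add_k_h(url, k=None):
    """Append &k=...&h=... the way the list page's click handler does.

    k defaults to a random integer in 1..100, like Math.floor(100 * Math.random()) + 1.
    """
    start = url.find("url=")
    if start == -1 or "&k=" in url:
        return url  # the JS leaves such links untouched
    if k is None:
        k = random.randint(1, 100)
    pos = start + 4 + 21 + k  # 4 == len("url="); 21 is the JS's parseInt("21")
    return url + "&k=%d&h=%s" % (k, url[pos:pos + 1])
```

For example, calling `add_k_h(link, 48)` on the stitching request's URL with `&k=48&h=F` removed reproduces exactly that suffix.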
- Simulate the stitching request
- Pull the real link out of the response by joining its pieces:
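import re
def get_real_url(self, content):
    # Collect every '...' literal assigned with = / += in the stitching page's script
    # (e.g. url += 'http://mp.w';) and join them into the real link
    url_text = re.findall(r"= '(\S+?)';", content)
    best_url = ''.join(url_text)
    return best_url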
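The regex above joins every `= '...';` literal in the page, which works for the response shown earlier but would also pick up any other single-quoted assignment the page might gain. A slightly stricter sketch, assuming the fragments are always written as `url += '...';`, matches only those lines. It leaves any `@` in place on purpose: the page's own `url.replace("@", "")` discards its return value, so the browser navigates to the unmodified string. `extract_stitched_url` is a hypothetical name.

```python
import re

# Matches lines such as:  url += 'http://mp.w';
FRAGMENT_RE = re.compile(r"url \+= '([^']*)';")


def extract_stitched_url(content):
    """Join the url += '...' fragments from the stitching page's script, in order."""
    return "".join(FRAGMENT_RE.findall(content))
```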
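- And with that, we have the real link.
WeChat links expire
- Third-party services that convert temporary links into permanent ones
- I used to use Shenjianshou (神箭手), but it no longer exists
- Other platforms offer paid APIs; I won't name them, to avoid this looking like an ad
- Getting the permanent link through the WeChat Official Accounts Platform
Closing notes
- If you found this useful, please give it a like
- If anything is unclear, feel free to ask me
- If you need help, feel free to contact me
- If you spot a mistake, please point it out
- If you know a better way, please let me know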