I recently wanted to get lossless music from a certain website, using my own Xiami Music playlists to look up the songs I like.
Preparing to log in
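As shown in the screenshot below, first find the login POST request (marker 2; marker 1 is the QR-code login). Marker 3 is the URL of the POST request, marker 4 shows the parameters, and marker 5 the response.
[Screenshot]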
Next, open the Params tab to see which parameters the POST request requires.
[Screenshot]
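Markers 1 and 2 are the Xiami account and password, and marker 3 is a random-looking parameter:
[Screenshot]
Using the URL at marker 1 in this screenshot, the _xiamitoken parameter can be read from the cookies that URL sets.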
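The token arrives as an ordinary cookie set by the server. As a minimal standard-library illustration of where that value lives, the sketch below parses a Set-Cookie header with http.cookies.SimpleCookie. The header string and token value are made up; in the script itself requests.Session parses cookies automatically.

```python
from http.cookies import SimpleCookie
from typing import Optional


def extract_cookie(set_cookie_header: str, name: str) -> Optional[str]:
    """Return the value of cookie `name` from a Set-Cookie header string, or None."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)  # parses "name=value; Path=...; Domain=..."
    morsel = jar.get(name)
    return morsel.value if morsel is not None else None


# Hypothetical header; the real token is issued by login.xiami.com.
header = "_xiamitoken=0123456789abcdef; Path=/; Domain=.xiami.com"
print(extract_cookie(header, "_xiamitoken"))  # 0123456789abcdef
```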
Pre-login code
Based on the analysis above, the _xiamitoken parameter has to be obtained before logging in. The code is as follows:
def login_pre():
    print('Fetching _xiamitoken, a required login parameter')
    url = 'https://login.xiami.com/member/qrcodelogin'
    headers = {
        'Host': 'login.xiami.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'zh-CN,en-US;q=0.7,en;q=0.3',
        'Accept-Encoding': 'gzip, deflate',  # no 'br': requests decodes Brotli only if the brotli package is installed
        'Referer': 'https://login.xiami.com/member/login',
        'X-Requested-With': 'XMLHttpRequest',
        'Connection': 'keep-alive'
    }
    payload = {
        'lgToken': '7aac8d8dee47f354776a27d7af7cdeb2',
        'defaulturl': 'https://www.xiami.com/',  # plain URL: requests percent-encodes params itself
        't': str(int(time() * 1000))  # millisecond timestamp used as a random value
    }
    # The headers dict is now actually sent with the request
    req = session.get(url=url, params=payload, headers=headers)
    print(req.text)
    cookies = req.cookies
    _xiamitoken = cookies['_xiamitoken']  # KeyError here means the token cookie was not set
    return _xiamitoken
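Two details of the payload above are worth seeing in isolation. requests turns a params (or data) dict into a query string in essentially the same way as urllib.parse.urlencode, so a value that is already percent-encoded gets encoded a second time ("%" becomes "%25"). The t parameter is simply the current Unix time in milliseconds. A small standard-library sketch:

```python
from time import time
from urllib.parse import urlencode

# Pass URLs raw and let the encoder escape them exactly once.
raw = {"defaulturl": "https://www.xiami.com/"}
pre_encoded = {"defaulturl": "https%3A%2F%2Fwww.xiami.com%2F"}

print(urlencode(raw))          # defaulturl=https%3A%2F%2Fwww.xiami.com%2F
print(urlencode(pre_encoded))  # defaulturl=https%253A%252F%252Fwww.xiami.com%252F

# The "t" value: milliseconds since the Unix epoch, as a string.
t = str(int(time() * 1000))
print(len(t))  # 13 (true for any date between 2001 and 2286)
```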
Login code
Once _xiamitoken has been obtained, we can log in:
def login(_xiamitoken):
    print('Logging in to Xiami')
    url = 'https://login.xiami.com/passport/login'
    headers = {
        'Host': 'login.xiami.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'zh-CN,en-US;q=0.7,en;q=0.3',
        'Accept-Encoding': 'gzip, deflate',  # no 'br': requests decodes Brotli only if the brotli package is installed
        'Referer': 'https://login.xiami.com/member/login',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'X-Requested-With': 'XMLHttpRequest',
        # No hard-coded Content-Length: requests computes it from the actual body
        'Connection': 'keep-alive'
    }
    account = input('Xiami account: ')
    pw = input('Xiami password: ')
    payload = {
        '_xiamitoken': _xiamitoken,
        'done': 'https://www.xiami.com',  # plain URL: requests form-encodes data itself
        'verifycode': '',
        'account': account,
        'pw': pw,
        'submit': '登+錄'
    }
    req = session.post(url=url, data=payload, headers=headers)
    print(req.text)
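The original request hard-coded 'Content-Length':'164'. That figure can only be right for one particular combination of token, account and password, because the length of the form body depends on them; requests computes the header from the body it actually sends. A quick standard-library check, using a payload shaped like the one above with made-up credentials (form_body is a hypothetical helper):

```python
from urllib.parse import urlencode


def form_body(token: str, account: str, pw: str) -> bytes:
    """Encode a login form shaped like the payload in login(); values are illustrative."""
    payload = {
        "_xiamitoken": token,
        "done": "https://www.xiami.com",
        "verifycode": "",
        "account": account,
        "pw": pw,
        "submit": "登+錄",
    }
    # application/x-www-form-urlencoded, which is how requests encodes a data dict
    return urlencode(payload).encode("ascii")


token = "0123456789abcdef"  # made-up token
a = form_body(token, "me@example.com", "short")
b = form_body(token, "someone.else@example.com", "a-much-longer-password")
print(len(a), len(b))  # two different lengths, so no single fixed Content-Length fits both
```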
Verifying the login
To check whether the login succeeded, fetch the user's profile information:
def login_after():
    print('Profile after login')
    url = 'https://www.xiami.com/index/home'
    headers = {
        'Host': 'www.xiami.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'zh-CN,en-US;q=0.7,en;q=0.3',
        'Accept-Encoding': 'gzip, deflate',  # no 'br': requests decodes Brotli only if the brotli package is installed
        'Referer': 'https://www.xiami.com/',
        'X-Requested-With': 'XMLHttpRequest',
        'Connection': 'keep-alive'
    }
    req = session.get(url=url, headers=headers)
    info = req.json()  # parse the JSON response body
    nick_name = info['data']['userInfo']['nick_name']
    print('nick_name:', nick_name)
If your nickname is printed correctly here, the login succeeded and we can move on to the "My Music" list.
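Indexing ['data']['userInfo']['nick_name'] directly would raise an exception if the login failed and the response lacks those keys, or is not JSON at all. A more forgiving check is sketched below. The response shape is inferred only from the keys used in login_after(), and the sample bodies are hypothetical.

```python
import json
from typing import Optional


def nick_name_from(body: str) -> Optional[str]:
    """Return data.userInfo.nick_name from a JSON body, or None if anything is missing."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return None  # e.g. an HTML page instead of JSON
    node = doc
    for key in ("data", "userInfo", "nick_name"):
        if not isinstance(node, dict):
            return None  # a level is missing, null, or not an object
        node = node.get(key)
    return node if isinstance(node, str) else None


print(nick_name_from('{"data": {"userInfo": {"nick_name": "demo_user"}}}'))  # demo_user
print(nick_name_from('{"data": {"userInfo": null}}'))                       # None
print(nick_name_from('<html>login page</html>'))                             # None
```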
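Downloading the song list
Click "My Music" to open your personal music list.
The steps are:
- First, open any page of the list (marker 1) and note the total number of pages (marker 2).
- Then locate the URL of the music list (marker 3) by searching and filtering the requests (markers 4 to 6).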
[Screenshot]
As the screenshot shows, the page number is part of the URL, so we need a loop variable (pg_num) for it.
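Instead of hard-coding the number of pages, one option is to keep requesting pages until one comes back with no songs. This is a sketch of that idea, not the original approach, and it assumes the site returns an empty list past the end (rather than an error or a repeat of the last page). fetch_page is a hypothetical callback standing in for "download https://www.xiami.com/space/lib-song/u/<user id>/page/<pg_num> and parse it".

```python
from typing import Callable, Dict, Iterator, List

SongList = List[Dict[str, str]]


def iter_pages(fetch_page: Callable[[int], SongList], max_pages: int = 100) -> Iterator[Dict[str, str]]:
    """Yield songs page by page from page 1 until a page has no songs (or max_pages is hit)."""
    for pg_num in range(1, max_pages + 1):
        songs = fetch_page(pg_num)
        if not songs:
            break  # assumed to mean we went past the last page
        yield from songs


# Hypothetical site with 3 pages of 2 songs each; page 4 onwards is empty.
fake_site = {
    1: [{"name": "a"}, {"name": "b"}],
    2: [{"name": "c"}, {"name": "d"}],
    3: [{"name": "e"}, {"name": "f"}],
}
calls = []


def fake_fetch(pg_num: int) -> SongList:
    calls.append(pg_num)  # record which pages were requested
    return fake_site.get(pg_num, [])


names = [song["name"] for song in iter_pages(fake_fetch)]
print(names)  # ['a', 'b', 'c', 'd', 'e', 'f']
print(calls)  # [1, 2, 3, 4]
```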
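Next, we need to extract each song's title (and, while we're at it, its URL) and artist from the list.
Method 1: via HTML tags
[Screenshot]
Inspecting the page shows that the song entries sit under <tbody> -> <tr> -> <td class="song_name">.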
[Screenshot]
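The original does not show code for method 1. Below is a minimal sketch with the standard library's html.parser that follows the tbody -> tr -> td.song_name path described above. The sample markup is a hypothetical reconstruction based on that path and on the regex used in method 2, so real Xiami pages may differ; the class name SongTableParser is likewise made up.

```python
from html.parser import HTMLParser


class SongTableParser(HTMLParser):
    """Collect title, href and artist text from <td class="song_name"> cells."""

    def __init__(self):
        super().__init__()
        self.in_song_td = False
        self.in_artist_a = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "td" and "song_name" in classes:
            self.in_song_td = True
            self.rows.append({"name": None, "href": None, "artists": []})
        elif tag == "a" and self.in_song_td:
            row = self.rows[-1]
            if "artist_name" in classes:
                self.in_artist_a = True  # artist link: its text is the artist name
            elif row["name"] is None:
                row["name"] = attrs.get("title")  # first other link: the song itself
                row["href"] = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_artist_a = False
        elif tag == "td":
            self.in_song_td = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_artist_a and text:
            self.rows[-1]["artists"].append(text)


sample = """
<table><tbody>
<tr><td class="song_name"><span class="tag"></span><a title="Song A" href="/song/1">Song A</a>
 <a class="artist_name" href="/artist/9">Artist X</a></td></tr>
<tr><td class="song_name"><span></span><a title="Song B" href="/song/2">Song B</a>
 <a class="artist_name" href="/artist/8">Artist Y</a></td></tr>
</tbody></table>
"""
parser = SongTableParser()
parser.feed(sample)
parser.close()
for row in parser.rows:
    print(row)
```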
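Method 2: via regular expressions
[Screenshot]
The screenshot shows that every title/artist entry has the same structure, so a regular expression can capture the content in the red box: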
pattern = re.compile('</span><a title=".*?".*?<a class="artist_name".*?</a>.*?</td>',re.S)
items = re.findall(pattern,text)
items now holds every song entry on this page; a for loop then extracts the title, link and artist from each one:
for item in items:
    yield {
        'name': re.match('(.*)<a title="(.*?)"(.*?)', item).group(2),
        'href': re.match('(.*)href="(.*?)"(.*?)', item).group(2),
        'artist_name': re.search('(.*?)<a class="artist_name"(.*?)">(.*?)</a>.*>', item, re.S).group(3)
    }
Because of yield, lib_song() is a generator, so the main function can read its return value as an iterable. The complete function:
def lib_song():
    print('User song list')
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0'
    }
    # One match per song: from the title link to the closing </td>
    pattern = re.compile('</span><a title=".*?".*?<a class="artist_name".*?</a>.*?</td>', re.S)
    # 13963315 is presumably the author's user ID, and range(1, 8) covers pages 1-7
    # (the page count shown at marker 2); change both for your own account
    for pg_num in range(1, 8):
        url = 'https://www.xiami.com/space/lib-song/u/13963315/page/{page_num}'.format(page_num=pg_num)
        req = session.get(url=url, headers=headers)
        print(req)  # e.g. <Response [200]>
        items = re.findall(pattern, req.text)
        for item in items:
            yield {
                'name': re.match('(.*)<a title="(.*?)"(.*?)', item).group(2),
                'href': re.match('(.*)href="(.*?)"(.*?)', item).group(2),
                'artist_name': re.search('(.*?)<a class="artist_name"(.*?)">(.*?)</a>.*>', item, re.S).group(3)
            }
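One caveat about the field regexes: re.match with a greedy (.*) and no re.S flag only looks at the first line of the item, then backtracks to the last match on that line. That works when the song link and the artist link sit on different lines of the HTML, but if they share a line, the href regex returns the artist's link. The sketch below demonstrates this on a hypothetical one-line row and shows first-match alternatives with re.search; these are a suggested variant, not the original code.

```python
import re

# Record pattern copied from lib_song().
pattern = re.compile('</span><a title=".*?".*?<a class="artist_name".*?</a>.*?</td>', re.S)

# Hypothetical row in which the song link and the artist link share one line.
text = ('<tr><td class="song_name"><span></span><a title="Song A" href="/song/1">Song A</a> '
        '<a class="artist_name" href="/artist/9">Artist X</a></td></tr>')

item = re.findall(pattern, text)[0]

# Greedy (.*) backtracks from the end of the line, so this finds the LAST href.
original_href = re.match('(.*)href="(.*?)"(.*?)', item).group(2)
print(original_href)  # /artist/9

# Suggested first-match variants: re.search stops at the first occurrence.
name = re.search(r'<a title="(.*?)"', item).group(1)
href = re.search(r'href="(.*?)"', item).group(1)
artist = re.search(r'<a class="artist_name".*?>(.*?)</a>', item, re.S).group(1)
print(name, href, artist)  # Song A /song/1 Artist X
```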
Main function
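if __name__ == '__main__':
    session = requests.Session()  # shared by all functions, so login cookies persist across requests
    _xiamitoken = login_pre()
    login(_xiamitoken)
    login_after()
    items = lib_song()  # a generator: it can be iterated only once
    # for item in items:  # re-enabling this loop would exhaust items and leave df empty
    #     print(item)
    df = pd.DataFrame(items)
    df.to_csv('XiamiMusic.csv', encoding='utf-8-sig')  # the BOM written by utf-8-sig keeps Chinese text from showing up garbled (e.g. in Excel)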
lib_song() returns an iterable (a generator), so its items can be read one at a time with a for loop.
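Because lib_song() is a generator, it can be consumed only once. If the commented-out for loop in the main function is re-enabled, it uses up every item before pd.DataFrame(items) runs, and the DataFrame would come out empty. A stand-in generator (fake_lib_song is hypothetical) shows the effect:

```python
def fake_lib_song():
    """Hypothetical stand-in for lib_song(): yields one dict per song."""
    for name in ("Song A", "Song B"):
        yield {"name": name}


items = fake_lib_song()
for item in items:          # the first pass consumes the generator...
    print(item)
leftover = list(items)      # ...so nothing is left for a second pass
print(leftover)             # []

# If the results are needed more than once, materialise them first.
songs = list(fake_lib_song())
print(len(songs))           # 2
```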
Complete code
#!/usr/bin/env python3.6
# -*- coding: utf-8 -*-
'''
@author: Haffner2010
@contact: myprojtest@163.com
@Software: Pycharm + Python3.6
@OS:Windows 7 64 bit
@Site:http://www.reibang.com/u/e031670b216b
@file: XiamiMusic.py
@time: 2018/6/5 20:14
@desc:
'''
import requests
import re
import pandas as pd
from time import time


def login_pre():
    print('Fetching _xiamitoken, a required login parameter')
    url = 'https://login.xiami.com/member/qrcodelogin'
    headers = {
        'Host': 'login.xiami.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'zh-CN,en-US;q=0.7,en;q=0.3',
        'Accept-Encoding': 'gzip, deflate',  # no 'br': requests decodes Brotli only if the brotli package is installed
        'Referer': 'https://login.xiami.com/member/login',
        'X-Requested-With': 'XMLHttpRequest',
        'Connection': 'keep-alive'
    }
    payload = {
        'lgToken': '7aac8d8dee47f354776a27d7af7cdeb2',
        'defaulturl': 'https://www.xiami.com/',  # plain URL: requests percent-encodes params itself
        't': str(int(time() * 1000))  # millisecond timestamp used as a random value
    }
    # The headers dict is now actually sent with the request
    req = session.get(url=url, params=payload, headers=headers)
    print(req.text)
    cookies = req.cookies
    _xiamitoken = cookies['_xiamitoken']  # KeyError here means the token cookie was not set
    return _xiamitoken


def login(_xiamitoken):
    print('Logging in to Xiami')
    url = 'https://login.xiami.com/passport/login'
    headers = {
        'Host': 'login.xiami.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'zh-CN,en-US;q=0.7,en;q=0.3',
        'Accept-Encoding': 'gzip, deflate',
        'Referer': 'https://login.xiami.com/member/login',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'X-Requested-With': 'XMLHttpRequest',
        # No hard-coded Content-Length: requests computes it from the actual body
        'Connection': 'keep-alive'
    }
    account = input('Xiami account: ')
    pw = input('Xiami password: ')
    payload = {
        '_xiamitoken': _xiamitoken,
        'done': 'https://www.xiami.com',  # plain URL: requests form-encodes data itself
        'verifycode': '',
        'account': account,
        'pw': pw,
        'submit': '登+錄'
    }
    req = session.post(url=url, data=payload, headers=headers)
    print(req.text)


def login_after():
    print('Profile after login')
    url = 'https://www.xiami.com/index/home'
    headers = {
        'Host': 'www.xiami.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'zh-CN,en-US;q=0.7,en;q=0.3',
        'Accept-Encoding': 'gzip, deflate',
        'Referer': 'https://www.xiami.com/',
        'X-Requested-With': 'XMLHttpRequest',
        'Connection': 'keep-alive'
    }
    req = session.get(url=url, headers=headers)
    info = req.json()  # parse the JSON response body
    nick_name = info['data']['userInfo']['nick_name']
    print('nick_name:', nick_name)


def lib_song():
    print('User song list')
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0'
    }
    # One match per song: from the title link to the closing </td>
    pattern = re.compile('</span><a title=".*?".*?<a class="artist_name".*?</a>.*?</td>', re.S)
    # 13963315 is presumably the author's user ID, and range(1, 8) covers pages 1-7;
    # change both for your own account
    for pg_num in range(1, 8):
        url = 'https://www.xiami.com/space/lib-song/u/13963315/page/{page_num}'.format(page_num=pg_num)
        req = session.get(url=url, headers=headers)
        print(req)  # e.g. <Response [200]>
        items = re.findall(pattern, req.text)
        for item in items:
            yield {
                'name': re.match('(.*)<a title="(.*?)"(.*?)', item).group(2),
                'href': re.match('(.*)href="(.*?)"(.*?)', item).group(2),
                'artist_name': re.search('(.*?)<a class="artist_name"(.*?)">(.*?)</a>.*>', item, re.S).group(3)
            }


if __name__ == '__main__':
    session = requests.Session()  # shared by all functions, so login cookies persist across requests
    _xiamitoken = login_pre()
    login(_xiamitoken)
    login_after()
    items = lib_song()  # a generator: it can be iterated only once
    # for item in items:  # re-enabling this loop would exhaust items and leave df empty
    #     print(item)
    df = pd.DataFrame(items)
    df.to_csv('XiamiMusic.csv', encoding='utf-8-sig')  # the BOM written by utf-8-sig keeps Chinese text from showing up garbled (e.g. in Excel)
The song list is saved to a local CSV file; the next step will use that file to search for FLAC versions of the songs.
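Since the next step reads this CSV back, it helps to open it with the same utf-8-sig codec: that strips the byte-order mark, which would otherwise stay glued to the first column header as U+FEFF. The sketch below uses the standard csv module instead of pandas so it runs on its own; the rows are placeholder data and the file is written to a temporary directory.

```python
import csv
import os
import tempfile

# Placeholder rows shaped like lib_song() output.
rows = [
    {"name": "歌曲一", "artist_name": "歌手一"},
    {"name": "Song B", "artist_name": "Artist B"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "XiamiMusic_demo.csv")

    # utf-8-sig writes the byte-order mark EF BB BF at the very start of the file.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "artist_name"])
        writer.writeheader()
        writer.writerows(rows)

    with open(path, "rb") as f:
        bom = f.read(3)

    # Reading with the same codec drops the BOM again.
    with open(path, encoding="utf-8-sig", newline="") as f:
        loaded = list(csv.DictReader(f))

    # Reading with plain utf-8 keeps it attached to the first header.
    with open(path, encoding="utf-8", newline="") as f:
        first_header_plain = next(csv.reader(f))[0]

print(bom)                       # b'\xef\xbb\xbf'
print(loaded[0]["name"])         # 歌曲一
print(repr(first_header_plain))  # '\ufeffname'
```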
Notes
Some requests in this article need the full set of headers, some need only a User-Agent, and some need no headers at all; which is which has to be worked out by trial and error.
To be continued
More will be added if problems come up.