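Annoyingly, the web version of Kugou only lets you view the first page in the browser; you have to download the player to browse the rest.
This code scrapes every song title and its link.
The first page looks like this; on inspection, its URL is https://www.kugou.com/yy/rank/home/1-8888.html?from=rank
Change the 1 to 2:
https://www.kugou.com/yy/rank/home/2-8888.html?from=rank
and you get the second page. Multiple pages can be scraped as follows.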
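The URL pattern observed above can be sketched on its own, without any network access. This is a minimal illustration that assumes only the leading number changes between pages while the `8888` ID stays fixed; the names `URL_TEMPLATE`, `page_url`, and `page_urls` are hypothetical helpers and not part of the original script.

```python
# Assumed pattern: only the number before "-8888" changes from page to page.
URL_TEMPLATE = "https://www.kugou.com/yy/rank/home/{}-8888.html?from=rank"


def page_url(page):
    """Return the URL for a single page number (hypothetical helper)."""
    return URL_TEMPLATE.format(page)


def page_urls(first, last):
    """Return the URLs for pages first..last, inclusive (hypothetical helper)."""
    return [page_url(page) for page in range(first, last + 1)]


if __name__ == "__main__":
    # Show the first two page URLs discussed in the text.
    for url in page_urls(1, 2):
        print(url)
```

Keeping the template in one place means the page range can be changed without touching the string formatting.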
'''
import lxml  # not used directly; BeautifulSoup needs it installed for the "lxml" parser
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36"
}


def get_informations(url):
    # Pass headers by keyword: the second positional argument of requests.get is params.
    web_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(web_data.text, "lxml")
    # Each song is an <a> element with the CSS class "pc_temp_songname".
    informations = soup.find_all("a", "pc_temp_songname")
    for information in informations:
        data = {
            "song": information.get("title"),
            "url": information.get("href"),
        }
        print(data)


# range(1, 24) covers pages 1 through 23.
urls = ["https://www.kugou.com/yy/rank/home/{}-8888.html?from=rank".format(i) for i in range(1, 24)]
for url in urls:
    get_informations(url)
'''