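The CBA season is in full swing, but how deep is each of the 20 CBA teams' rosters?
Using the Scrapy framework with MongoDB, we collect basic information on every player of every CBA (Chinese men's basketball league) team, for use in later data analysis.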
Development environment
- Python 3.7
- Scrapy framework and its components
- json module
- pymongo module
Scraping analysis:
1. Get the team links
The page that lists the team links is loaded asynchronously via Ajax.
Inspecting the network traffic shows that the data we need is returned as JSON, which is the ideal case.
# Parse the team links:
def parse(self, response):
    # The club-list endpoint returns JSON; the clubs sit under the 'data' key.
    club0 = json.loads(response.text)
    clubs = club0['data']
    baseurl = "https://api-all.9h-sports.com/cba-data/api/cba/v1/league/player-history?clubId={}"
    for oneclub in clubs:
        clubname = oneclub['name']
        clubid = oneclub['club_id']
        cluburl = baseurl.format(clubid)
        print(clubname + cluburl)
        # Pass the club name on to the next callback through meta.
        yield scrapy.Request(url=cluburl, callback=self.parsecluburl, dont_filter=True, meta={'clubname': clubname})
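The URL-building step can be tried outside Scrapy with the standard library alone. The sketch below is only an illustration. The sample payload is hypothetical (only the `data`, `name` and `club_id` keys come from the spider, and the values are invented), and `build_club_urls` is a helper name introduced here, not part of the original project.

```python
import json

# URL template copied from the spider's parse() method.
BASE_URL = "https://api-all.9h-sports.com/cba-data/api/cba/v1/league/player-history?clubId={}"

# Hypothetical sample payload: the key names match the spider,
# the values are made up for illustration.
SAMPLE_RESPONSE = (
    '{"data": [{"name": "Team A", "club_id": 101},'
    ' {"name": "Team B", "club_id": 102}]}'
)


def build_club_urls(response_text):
    """Return (club name, player-list URL) pairs from the club-list JSON text."""
    payload = json.loads(response_text)
    return [
        (club["name"], BASE_URL.format(club["club_id"]))
        for club in payload["data"]
    ]


for name, url in build_club_urls(SAMPLE_RESPONSE):
    print(name, url)
```

Keeping the JSON handling in a plain function like this makes it easy to check against a saved response before wiring it into the spider.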
2. Visit each team's URL and get the data for every player on the team roster.
Network inspection shows that the player data is likewise loaded asynchronously via Ajax, and in the same format as the previous page: JSON.
# Parse the player data
def parsecluburl(self, response0):
    players0 = json.loads(response0.text)
    players = players0['data']
    for player in players:
        ...
        allitem = items.CbaplayerItem(
            playername=playername,
            playernumber=playernumber,
            playercountry=playercountry,
            playerposition=playerposition,
            playerheight=playerheight,
            playerweight=playerweight,
            playerbirth=playerbirth,
            playerclub=playerclub
        )
        yield allitem
Finally, all of the scraped player data is stored in a MongoDB database through the pipelines component.
# Pipeline: store items in the database
import pymongo

class CbaplayerPipeline(object):
    def __init__(self):
        self.conn = pymongo.MongoClient(host='127.0.0.1', port=27017)  # connect to MongoDB
        self.dbb = self.conn.cbaplayers  # select the database (created on first write)
        self.dbbcc = self.dbb.cbaplayer0  # select the collection (created on first write)

    def process_item(self, item, spider):
        item = dict(item)
        # insert_one replaces Collection.insert, which was deprecated in PyMongo 3.0 and removed in 4.0
        self.dbbcc.insert_one(item)
        return item
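For Scrapy to call the pipeline, it normally has to be enabled in the project's settings. A minimal sketch of that fragment follows. The package name `cbaplayer` is an assumption, since the article does not give the project name. Replace it with the package that actually contains `pipelines.py`.

```python
# settings.py (fragment)
# "cbaplayer" is an assumed project/package name, not taken from the article.
ITEM_PIPELINES = {
    # Values are conventionally in the 0-1000 range; pipelines with
    # lower numbers run first.
    "cbaplayer.pipelines.CbaplayerPipeline": 300,
}
```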
Results:
Guangdong, assemble!