The previous post took a first look at the basic data for this competition and ran some exploratory visual analysis.
This post continues with the detailed game data and draws out some more interesting conclusions.
Game detail information visualization
- WFGM: field goals made.
- WFGA: field goals attempted.
- WFGM3: three-pointers made; note that these are included in WFGM.
- WFGA3: three-pointers attempted.
- WFTM: free throws made.
- WFTA: free throws attempted.
- WOR: offensive rebounds.
- WDR: defensive rebounds.
- WAst: assists.
- WTO: turnovers.
- WStl: steals.
- WBlk: blocks.
- WPF: personal fouls.
Compute the mean of these statistics separately for the regular season, the NCAA tournament (excluding the final) and the NCAA final, then visualize them.
cols = ['TeamID','Score','FGM','FGA','FGM3','FGA3','FTM','FTA','OR','DR','Ast','TO','Stl','Blk','PF']
cols = ['W'+col for col in cols] + ['L'+col for col in cols]
regular_data = section1_detail.query('is_regular == 1')[cols].mean().rename('value')
tournament_data = section1_detail.query('is_regular == 0 and DayNum < 154')[cols].mean().rename('value')
final_data = section1_detail.query('DayNum == 154')[cols].mean().rename('value')
# Build one row per match type (DataFrame.append was removed in pandas 2.0)
tmp = pd.DataFrame([regular_data, tournament_data, final_data]).reset_index(drop=True)
tmp.insert(0, 'match_type', ['regular', 'tournament', 'final'])
tmp
# One panel per statistic: mean winner-minus-loser difference by match type
plt.figure(figsize=(20, 3*5))
plt.subplots_adjust(wspace=0.2, hspace=0.4)
for i, col in enumerate(['FGM','FGA','FGM3','FGA3','FTM','FTA','OR','DR','Ast','TO','Stl','Blk','PF']):
    # Earlier variant (W, L and W-L side by side), kept for reference:
    # plt.subplot(5,3,i*3+1)
    # sns.barplot(x="match_type", y='W'+col, data=tmp)
    # plt.subplot(5,3,i*3+2)
    # sns.barplot(x="match_type", y='L'+col, data=tmp)
    # plt.subplot(5,3,i*3+3)
    # tmp['W'+col+'_L'+col] = tmp['W'+col] - tmp['L'+col]
    # sns.barplot(x="match_type", y='W'+col+'_L'+col, data=tmp)
    plt.subplot(5, 3, i+1)
    tmp['W'+col+'_L'+col] = tmp['W'+col] - tmp['L'+col]
    sns.barplot(x="match_type", y='W'+col+'_L'+col, data=tmp)
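- FGM & FGA: on makes, the gap between the two teams is smallest in the final. On attempts, the winning side in the final appears to shoot more carefully, taking fewer shots than the loser.
- FGM3 & FGA3: in the final, the winning side actually makes fewer three-pointers than the losing side. This suggests that in a final, because of defensive schemes, officiating, player mentality and other factors, teams tend to choose more conservative ways to score and rely less on the volatile three-pointer.
- FTM & FTA: there is little difference in free throws; the winning side has the higher free-throw percentage.
- Rebounds: Slam Dunk says that whoever controls the rebounds controls the game, so let's see whether that holds. For offensive rebounds, the winning team consistently has fewer, especially in the final. Fewer offensive rebounds usually point to a more conservative approach: getting back on defense quickly rather than crashing the offensive glass and giving the opponent fast-break chances. On defensive rebounds, the winning side is clearly better.
- Assists and turnovers: both are lower in the final. With tougher defense and a different officiating standard, teams usually lean more on individual offense, which is where star players matter, and that style tends to produce fewer assists and fewer turnovers.
- Steals and blocks: these are two different ways of defending. Going for a steal carries some risk, because a failed attempt usually means losing defensive position, so steals are rarer in the final. More blocks indicate more attacks into the paint, which also follows from the offensive choices teams make in the final.
- Personal fouls: the gap between winner and loser is smaller in the final. To some extent this is because officials prefer not to decide the game themselves and let the players play, which is of course less friendly to players who like to draw fouls.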
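The bullets above compare raw counts, while several of the claims (for example the free-throw one) are really about percentages. A minimal sketch of how makes and attempts could be combined into shooting percentages, assuming the `W`/`L` prefixed column names listed earlier. The numbers in `toy` are made up for illustration and are not taken from the competition data, and `shooting_pct` is a hypothetical helper.

```python
import pandas as pd

# Toy box scores (made-up numbers, NOT competition data).
toy = pd.DataFrame({
    "WFGM": [30, 28], "WFGA": [60, 56],
    "LFGM": [25, 24], "LFGA": [62, 60],
    "WFTM": [15, 18], "WFTA": [20, 24],
    "LFTM": [10, 12], "LFTA": [16, 20],
})


def shooting_pct(df, side, made, attempted):
    """Overall percentage: total makes divided by total attempts."""
    return df[side + made].sum() / df[side + attempted].sum()


pct = {
    "W_FG%": shooting_pct(toy, "W", "FGM", "FGA"),
    "L_FG%": shooting_pct(toy, "L", "FGM", "FGA"),
    "W_FT%": shooting_pct(toy, "W", "FTM", "FTA"),
    "L_FT%": shooting_pct(toy, "L", "FTM", "FTA"),
}
print(pct)
```

Summing makes and attempts before dividing weights every shot equally; averaging per-game percentages instead would weight every game equally, which can give a different answer.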
Event Data
Each MEvents & WEvents file lists the play-by-play event logs for more than 99.5% of games from that season. Each event is assigned to either a team or a single one of the team's players. Thus if a basket is made by one player and an assist is credited to a second player, that would show up as two separate records. The players are listed by PlayerID within the xPlayers.csv file.
Mens Event Files: MEvents2015.csv, MEvents2016.csv, MEvents2017.csv, MEvents2018.csv, MEvents2019.csv
Womens Event Files: WEvents2015.csv, WEvents2016.csv, WEvents2017.csv, WEvents2018.csv, WEvents2019.csv
We can read in all files and combine into one huge dataframe, one for womens and one for mens.
EventID - this is a unique ID for each logged event. The EventID's are different within each year and uniquely identify each play-by-play event. They ought to be listed in chronological order for the events within their game.
Season, DayNum, WTeamID, LTeamID - these four columns are sufficient to uniquely identify each game. The games are a mix of Regular Season, NCAA® Tourney, and Secondary Tourney games.
WFinalScore, LFinalScore
WCurrentScore, LCurrentScore
ElapsedSeconds - this is the number of seconds that have elapsed from the start of the game until the event occurred. With a 20-minute half, that means that an ElapsedSeconds value from 0 to 1200 represents an event in the first half, a value from 1200 to 2400 represents an event in the second half, and a value above 2400 represents an event in overtime. For example, since overtime periods are five minutes long (that's 300 seconds), a value of 2699 would represent one second left in the first overtime.
EventTeamID - this is the ID of the team that the event is logged for, which will either be the WTeamID or the LTeamID.
EventPlayerID - this is the ID of the player that the event is logged for, as described in the MPlayers.csv file.
EventType, EventSubType - these indicate the type of the event that was logged (see listing below).
assist - an assist
block - a blocked shot
steal - a steal
sub - a substitution
timeout - a timeout: unk=unknown type of timeout; comm=commercial timeout; full=full timeout; short=short timeout
turnover - a turnover: unk=unknown type of turnover; 10sec=10 second violation; 3sec=3 second violation; 5sec=5 second violation; bpass=bad pass turnover; dribb=dribbling turnover; lanev=lane violation; lostb=lost ball; offen=offensive turnover (?); offgt=offensive goaltending; other=other type of turnover; shotc=shot clock violation; trav=travelling
foul - a foul: unk=unknown type of foul; admT=administrative technical; benT=bench technical; coaT=coach technical; off=offensive foul; pers=personal foul; tech=technical foul
fouled - a player was fouled
reb - a rebound: deadb=a deadball rebound; def=a defensive rebound; defdb=a defensive deadball rebound; off=an offensive rebound; offdb=an offensive deadball rebound
made1, miss1 - a one-point free throw was made or missed, with one of the following subtypes: 1of1=the only free throw of the trip to the line; 1of2=the first of two free throw attempts; 2of2=the second of two free throw attempts; 1of3=the first of three free throw attempts; 2of3=the second of three free throw attempts; 3of3=the third of three free throw attempts; unk=unknown what the free throw sequence is
made2, miss2 - a two-point field goal was made or missed, with one of the following subtypes: unk=unknown type of two-point shot; dunk=dunk; lay=layup; tip=tip-in; jump=jump shot; alley=alley-oop; drive=driving layup; hook=hook shot; stepb=step-back jump shot; pullu=pull-up jump shot; turna=turn-around jump shot; wrong=wrong basket
made3, miss3 - a three-point field goal was made or missed, with one of the following subtypes: unk=unknown type of three-point shot; jump=jump shot; stepb=step-back jump shot; pullu=pull-up jump shot; turna=turn-around jump shot; wrong=wrong basket
jumpb - a jump ball: start=start period; block=block tie-up; heldb=held ball; lodge=lodged ball; lost=jump ball lost; outof=out of bounds; outrb=out of bounds rebound; won=jump ball won
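The ElapsedSeconds rule described above (two 20-minute halves, then 5-minute overtimes) can be sketched as a small helper. `describe_elapsed` is a hypothetical name, not part of the dataset or any library, and the handling of exact boundary values (1200, 2400, ...) is an assumption: they are assigned to the period that is ending.

```python
HALF_SECONDS = 20 * 60       # 1200 seconds per half
OVERTIME_SECONDS = 5 * 60    # 300 seconds per overtime period
REGULATION_SECONDS = 2 * HALF_SECONDS


def describe_elapsed(elapsed_seconds):
    """Return (period_label, seconds_left_in_period) for an ElapsedSeconds value.

    Assumption: a value exactly on a boundary belongs to the period that ends there.
    """
    if elapsed_seconds <= HALF_SECONDS:
        return ("1st half", HALF_SECONDS - elapsed_seconds)
    if elapsed_seconds <= REGULATION_SECONDS:
        return ("2nd half", REGULATION_SECONDS - elapsed_seconds)
    into_overtime = elapsed_seconds - REGULATION_SECONDS
    # Ceiling division: 1..300 -> OT1, 301..600 -> OT2, ...
    ot_number = (into_overtime - 1) // OVERTIME_SECONDS + 1
    period_end = REGULATION_SECONDS + ot_number * OVERTIME_SECONDS
    return (f"OT{ot_number}", period_end - elapsed_seconds)


# The example from the field description: 2699 is one second left in the first overtime.
print(describe_elapsed(2699))
```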
Reading the Event Data
mens_events = []
for year in [2015, 2016, 2017, 2018, 2019]:
    mens_events.append(pd.read_csv(Mfolder_path+f'MEvents{year}.csv'))
MEvents = pd.concat(mens_events)
print(MEvents.shape)
MEvents.head()
(13149684, 17)
womens_events = []
for year in [2015, 2016, 2017, 2018, 2019]:
    womens_events.append(pd.read_csv(Wfolder_path+f'WEvents{year}.csv'))
WEvents = pd.concat(womens_events)
print(WEvents.shape)
WEvents.head()
(12744264, 17)
del mens_events
del womens_events
gc.collect()
Count the EventType values and visualize them
EventType = pd.DataFrame({'MEvents': MEvents['EventType'].value_counts(),
                          'WEvents': WEvents['EventType'].value_counts()})
# Sorting by MEvents makes the chart easier to read; naming the index explicitly
# gives the same 'EventType' column in every pandas version
EventType = EventType.sort_values('MEvents').rename_axis('EventType').reset_index()
# Two stacked bar charts in a single figure
plt.figure(figsize=(20, 10))
plt.subplot(2, 1, 1)
sns.barplot(x='EventType', y='MEvents', data=EventType)
plt.subplot(2, 1, 2)
sns.barplot(x='EventType', y='WEvents', data=EventType)
Substitutions are the most frequent event on the court, followed by rebounds. Fouls even outnumber both made and missed two-point shots.
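Because the men's and women's files contain different numbers of events, comparing shares rather than raw counts can make the two bar charts easier to read side by side. A minimal sketch, assuming toy `EventType` values (the two short Series below are invented for illustration and stand in for `MEvents['EventType']` and `WEvents['EventType']`):

```python
import pandas as pd

# Toy stand-ins for the real EventType columns (invented values).
men = pd.Series(["sub", "sub", "reb", "foul", "made2"], name="EventType")
women = pd.Series(["sub", "reb", "reb", "miss2"], name="EventType")

# normalize=True turns counts into shares that sum to 1 per column;
# event types missing on one side become 0 instead of NaN.
share = pd.DataFrame({
    "MEvents": men.value_counts(normalize=True),
    "WEvents": women.value_counts(normalize=True),
}).fillna(0.0)

print(share.sort_values("MEvents", ascending=False))
```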
Worth noting:
The data also provides X/Y coordinates and the name of the court area each coordinate falls in. The lower-left corner of the court is (0,0), the upper-right corner is (100,100) and the center is (50,50). Rescaling these coordinates to a standard NCAA court and plotting the events on top of it makes them much easier to read.
- X, Y - for games where it is available, this describes an X/Y position on the court where the lower-left corner of the full court is (0,0), the upper-right corner of the full court is (100,100), the exact middle of the full court (where the initial jump ball happens) is (50,50), and so on. The X/Y position is provided for fouls, turnovers, and field-goal attempts (either 2-point or 3-point).
- Area - for events where an X/Y position is provided, this position is more generally categorized into one of 13 "areas" of the court, as follows: 1=under basket; 2=in the paint; 3=inside right wing; 4=inside right; 5=inside center; 6=inside left; 7=inside left wing; 8=outside right wing; 9=outside right; 10=outside center; 11=outside left; 12=outside left wing; 13=backcourt
Coordinate diagram
Area diagram
The area is given as a number, so build a mapping and use map to get the area name, then group by area and plot.
#MEvents['Area'].value_counts()
area_mapping = {0: np.nan,
1: 'under basket',
2: 'in the paint',
3: 'inside right wing',
4: 'inside right',
5: 'inside center',
6: 'inside left',
7: 'inside left wing',
8: 'outside right wing',
9: 'outside right',
10: 'outside center',
11: 'outside left',
12: 'outside left wing',
13: 'backcourt'}
MEvents['Area_Name'] = MEvents['Area'].map(area_mapping)
fig, ax = plt.subplots(figsize=(15, 8))
MEvents_X_Y = MEvents.loc[~MEvents['Area_Name'].isna()].groupby('Area_Name')
for i, d in MEvents_X_Y:
    sns.scatterplot(x='X', y='Y', data=d, label=i, alpha=0.3)
# Place the legend outside the plot
plt.legend(loc=[1, 0])
#plt.legend(bbox_to_anchor=(1.04,1), loc="upper left")
ax.set_xticks([])
ax.set_yticks([])
ax.set_xlabel('')
ax.set_ylabel('')
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
plt.show()
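Visualizing the court
When I first saw this figure I was completely amazed, but after skimming the code it turned out to be simple: the coordinates are brought onto a common scale and the court is built by layering basic geometric shapes. That said, the coordinate conversion is less trivial than it looks, and choosing the colors is a craft of its own. The code is reproduced below. If I find time later I will break it down piece by piece, which should reveal more about plotting with matplotlib in Python.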
from matplotlib.patches import Arc, Circle, Rectangle


def create_ncaa_full_court(ax=None, three_line='mens', court_color='#dfbb85',
                           lw=3, lines_color='black', lines_alpha=0.5,
                           paint_fill='blue', paint_alpha=0.4,
                           inner_arc=False):
    """
    Creates NCAA Basketball Court
    Dimensions are in feet (Court is 94x50 ft)
    Created by: Rob Mulla / https://github.com/RobMulla

    * Note that this function uses "feet" as the unit of measure.
    * NCAA Data is provided on a x range: 0, 100 and y-range 0 to 100
    * To plot X/Y positions first convert to feet like this:
    ```
    Events['X_'] = (Events['X'] * (94/100))
    Events['Y_'] = (Events['Y'] * (50/100))
    ```
    ax: matplotlib axes if None gets current axes using `plt.gca`
    three_line: 'mens', 'womens' or 'both' defines 3 point line plotted
    court_color : (hex) Color of the court
    lw : line width
    lines_color : Color of the lines
    lines_alpha : transparency of lines
    paint_fill : Color inside the paint
    paint_alpha : transparency of the "paint"
    inner_arc : paint the dotted inner arc
    """
    if ax is None:
        ax = plt.gca()

    # Shared style for every court line
    line_kw = dict(color=lines_color, lw=lw, alpha=lines_alpha)

    # Create Patches for Court Lines
    center_circle = Circle((94/2, 50/2), 6, fill=False, **line_kw)
    hoop_left = Circle((5.25, 50/2), 1.5 / 2, fill=False, **line_kw)
    hoop_right = Circle((94-5.25, 50/2), 1.5 / 2, fill=False, **line_kw)

    # Paint - 18 feet 10 inches, which converts to 18.833333 feet
    paint_len = 18.833333
    # facecolor=paint_fill so that the paint_fill argument actually sets the color
    paint_kw = dict(fill=True, facecolor=paint_fill, alpha=paint_alpha,
                    lw=lw, edgecolor=None)
    border_kw = dict(fill=False, alpha=lines_alpha, lw=lw, edgecolor=lines_color)
    left_paint = Rectangle((0, (50/2)-6), paint_len, 12, **paint_kw)
    right_paint = Rectangle((94-paint_len, (50/2)-6), paint_len, 12, **paint_kw)
    left_paint_border = Rectangle((0, (50/2)-6), paint_len, 12, **border_kw)
    right_paint_border = Rectangle((94-paint_len, (50/2)-6), paint_len, 12, **border_kw)

    left_arc = Arc((paint_len, 50/2), 12, 12, theta1=-90, theta2=90, **line_kw)
    right_arc = Arc((94-paint_len, 50/2), 12, 12, theta1=90, theta2=-90, **line_kw)

    # Lane blocks and hash marks on both sides of each lane
    mark_kw = dict(fill=True, alpha=lines_alpha, lw=0,
                   edgecolor=lines_color, facecolor=lines_color)
    for y0 in ((50/2)-6-0.666, (50/2)+6):
        # Blocks: 1 ft wide, starting 7 ft from each baseline
        ax.add_patch(Rectangle((7, y0), 1, 0.666, **mark_kw))
        ax.add_patch(Rectangle((94-7-1, y0), 1, 0.666, **mark_kw))
        # Hash marks: 0.166 ft wide at 11, 14 and 17 ft from each baseline
        for d in (11, 14, 17):
            ax.add_patch(Rectangle((d, y0), 0.166, 0.666, **mark_kw))
            ax.add_patch(Rectangle((94-d, y0), 0.166, 0.666, **mark_kw))

    # 3 Point Line
    if three_line in ('mens', 'both'):
        # 22' 1.75" distance to center of hoop
        three_pt_left = Arc((6.25, 50/2), 44.291, 44.291,
                            theta1=-78, theta2=78, **line_kw)
        three_pt_right = Arc((94-6.25, 50/2), 44.291, 44.291,
                             theta1=180-78, theta2=180+78, **line_kw)
        # Straight corner segments, drawn 3.34 ft from each sideline
        for y0 in (3.34, 50-3.34):
            ax.plot((0, 11.25), (y0, y0), **line_kw)
            ax.plot((94-11.25, 94), (y0, y0), **line_kw)
        ax.add_patch(three_pt_left)
        ax.add_patch(three_pt_right)

    if three_line in ('womens', 'both'):
        # womens 3: 20.75 ft radius
        three_pt_left_w = Arc((6.25, 50/2), 20.75 * 2, 20.75 * 2,
                              theta1=-85, theta2=85, **line_kw)
        three_pt_right_w = Arc((94-6.25, 50/2), 20.75 * 2, 20.75 * 2,
                               theta1=180-85, theta2=180+85, **line_kw)
        # Straight corner segments, drawn 4.25 ft from each sideline
        for y0 in (4.25, 50-4.25):
            ax.plot((0, 8.3), (y0, y0), **line_kw)
            ax.plot((94-8.3, 94), (y0, y0), **line_kw)
        ax.add_patch(three_pt_left_w)
        ax.add_patch(three_pt_right_w)

    # Add Patches
    for patch in (left_paint, left_paint_border, right_paint, right_paint_border,
                  center_circle, hoop_left, hoop_right, left_arc, right_arc):
        ax.add_patch(patch)

    if inner_arc:
        left_inner_arc = Arc((paint_len, 50/2), 12, 12, theta1=90,
                             theta2=-90, ls='--', **line_kw)
        right_inner_arc = Arc((94-paint_len, 50/2), 12, 12, theta1=-90,
                              theta2=90, ls='--', **line_kw)
        ax.add_patch(left_inner_arc)
        ax.add_patch(right_inner_arc)

    # Restricted Area Marker
    restricted_left = Arc((6.25, 50/2), 8, 8, theta1=-90, theta2=90, **line_kw)
    restricted_right = Arc((94-6.25, 50/2), 8, 8,
                           theta1=180-90, theta2=180+90, **line_kw)
    ax.add_patch(restricted_left)
    ax.add_patch(restricted_right)

    # Backboards
    ax.plot((4, 4), ((50/2) - 3, (50/2) + 3),
            color=lines_color, lw=lw*1.5, alpha=lines_alpha)
    ax.plot((94-4, 94-4), ((50/2) - 3, (50/2) + 3),
            color=lines_color, lw=lw*1.5, alpha=lines_alpha)
    ax.plot((4, 4.6), (50/2, 50/2), **line_kw)
    ax.plot((94-4, 94-4.6), (50/2, 50/2), **line_kw)

    # Half Court Line
    ax.axvline(94/2, **line_kw)

    # Border
    border = Rectangle((0.3, 0.3), 94-0.4, 50-0.4, fill=False, lw=3,
                       color='black', alpha=lines_alpha)
    ax.add_patch(border)

    # Plot Limit
    ax.set_xlim(0, 94)
    ax.set_ylim(0, 50)
    ax.set_facecolor(court_color)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_xlabel('')
    return ax
fig, ax = plt.subplots(figsize=(15, 8.5))
create_ncaa_full_court(ax, three_line='both', paint_alpha=0.4)
plt.show()
With the court above as a base, plotting the X/Y points on top of it becomes very intuitive.
Plotting X, Y Data
The plots that follow are indeed quite intuitive and fun, so read on.
X, Y points are not available for all games, so this is not a complete sample. Some are simply missing: for example, I wanted to plot the MVPs discussed below, but there is no data for them.
- X/Y coordinates are provided only for fouls, turnovers, and field-goal attempts (either 2-point or 3-point); other events have no coordinates.
A standard NCAA court measures 94 x 50 feet, so rescale the coordinates to match.
# Normalize X, Y positions for court dimensions
# Court is 50 feet wide and 94 feet end to end.
MEvents['X_'] = (MEvents['X'] * (94/100))
MEvents['Y_'] = (MEvents['Y'] * (50/100))
WEvents['X_'] = (WEvents['X'] * (94/100))
WEvents['Y_'] = (WEvents['Y'] * (50/100))
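The rescaling above can be wrapped in a helper, and shots from both ends can optionally be folded onto one half-court so that the two baskets overlap. This is a sketch under assumptions: `to_feet` and `fold_to_left_half` are hypothetical names, and the folding step (a 180-degree rotation about center court) is an addition that the original analysis does not perform.

```python
import numpy as np
import pandas as pd

COURT_LENGTH_FT = 94.0
COURT_WIDTH_FT = 50.0


def to_feet(events):
    """Rescale the 0-100 X/Y grid to feet, as in the X_/Y_ columns above."""
    out = events.copy()
    out["X_"] = out["X"] * (COURT_LENGTH_FT / 100)
    out["Y_"] = out["Y"] * (COURT_WIDTH_FT / 100)
    return out


def fold_to_left_half(events_ft):
    """Rotate right-half points by 180 degrees about center court (assumption)."""
    out = events_ft.copy()
    right = out["X_"] > COURT_LENGTH_FT / 2
    out.loc[right, "X_"] = COURT_LENGTH_FT - out.loc[right, "X_"]
    out.loc[right, "Y_"] = COURT_WIDTH_FT - out.loc[right, "Y_"]
    return out


# Toy points: center court, one point near the left basket, and its mirror image.
toy = pd.DataFrame({"X": [50, 10, 90], "Y": [50, 20, 80]})
folded = fold_to_left_half(to_feet(toy))
print(folded)
```

A rotation is used instead of a plain left-right flip so that a shot from a shooter's right wing lands on the same side of the folded court regardless of which basket was attacked.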
Plot the locations of turnovers for men and women separately
#fouls, turnovers, and field-goal attempts (either 2-point or 3-point). No X/Y data for other events.
fig, ax = plt.subplots(figsize=(15, 7.8))
ms = 10
ax = create_ncaa_full_court(ax, paint_alpha=0.1)
MEvents.query('EventType == "turnover"') \
.plot(x='X_', y='Y_', style='X',
title='Turnover Locations (Mens)',
c='red',
alpha=0.3,
figsize=(15, 9),
label='Turnovers',
ms=ms,  # marker size
ax=ax)
ax.set_xlabel('')
ax.get_legend().remove()
plt.show()
fig, ax = plt.subplots(figsize = (15,7.8))
ms = 10
ax = create_ncaa_full_court(ax, paint_alpha=0.2)
WEvents[WEvents['EventType'] == 'turnover']\
.plot(x = 'X_', y = 'Y_', style = 'o',
title='Turnover Locations (Womens)',
alpha = 0.2,
figsize=(15, 9),
label='Turnovers',
ms=ms,
ax = ax)
ax.set_xlabel('')
ax.get_legend().remove()
plt.show()
Next, use subplots to plot where women's two-pointers and three-pointers were made and missed
COURT_COLOR = '#dfbb85'
fig, ax = plt.subplots(2, 2, figsize=(20, 10))
ax1 = ax[0,0]
ax2 = ax[0,1]
ax3 = ax[1,0]
ax4 = ax[1,1]
# Where are 3 pointers made from? (This is really cool)
WEvents.query('EventType == "made3"') \
.plot(x='X_', y='Y_', style='.',
color='blue',
title='3 Pointers Made (Womens)',
alpha=0.01, ax=ax1)
ax1 = create_ncaa_full_court(ax1, lw=0.5, three_line='womens', paint_alpha=0.1)
ax1.set_facecolor(COURT_COLOR)
WEvents.query('EventType == "miss3"') \
.plot(x='X_', y='Y_', style='.',
title='3 Pointers Missed (Womens)',
color='red',
alpha=0.01, ax=ax2)
ax2.set_facecolor(COURT_COLOR)
ax2 = create_ncaa_full_court(ax2, lw=0.5, three_line='womens', paint_alpha=0.1)
WEvents.query('EventType == "made2"') \
.plot(x='X_', y='Y_', style='.',
color='blue',
title='2 Pointers Made (Womens)',
alpha=0.01, ax=ax3)
ax3.set_facecolor(COURT_COLOR)
ax3 = create_ncaa_full_court(ax3, lw=0.5, three_line='womens', paint_alpha=0.1)
WEvents.query('EventType == "miss2"') \
.plot(x='X_', y='Y_', style='.',
title='2 Pointers Missed (Womens)',
color='red',
alpha=0.01, ax=ax4)
ax4.set_facecolor(COURT_COLOR)
ax4 = create_ncaa_full_court(ax4, lw=0.5, three_line='womens', paint_alpha=0.1)
ax1.get_legend().remove()
ax2.get_legend().remove()
ax1.set_xticks([])
ax1.set_yticks([])
ax2.set_xticks([])
ax2.set_yticks([])
ax1.set_xlabel('')
ax2.set_xlabel('')
ax3.get_legend().remove()
ax4.get_legend().remove()
ax3.set_xticks([])
ax3.set_yticks([])
ax4.set_xticks([])
ax4.set_yticks([])
ax3.set_xlabel('')
ax4.set_xlabel('')
plt.show()
The plots above show the X/Y coordinates for the data as a whole. Next, combine them with Players.csv to plot data for individual players.
Visualizing individual players' data
The Players data contains the player ID, last name, first name and team ID.
# A few rows in the men's file are malformed; skip them with on_bad_lines='skip'
# (this replaces error_bad_lines, which was removed in pandas 2.0)
MPlayers = pd.read_csv(Mfolder_path+'MPlayers.csv', on_bad_lines='skip')
WPlayers = pd.read_csv(Wfolder_path+'WPlayers.csv')
MPlayers.head()
Merge with the event data
# Merge Player name onto events
MEvents = MEvents.merge(MPlayers,
how='left',
left_on='EventPlayerID',
right_on='PlayerID')
WEvents = WEvents.merge(WPlayers,
how='left',
left_on='EventPlayerID',
right_on='PlayerID')
Look up the 2019 and 2018 champions and the IDs of their MVPs
# 2019: Virginia Cavaliers, MVP Kyle Guy
MPlayers.query('FirstName == "Kyle" and LastName == "Guy"')
# 2018: Villanova Wildcats, MVP Donte DiVincenzo (no coordinates available, so no plot)
MPlayers.query('FirstName == "Donte" and LastName == "DiVincenzo"')
It turns out the X/Y data happens to be missing for both MVPs, so I picked another player who does have data: Ty Jerome (ID 12410), born July 8, 1997 in New Rochelle, New York, an American professional basketball player and point guard with the NBA's Phoenix Suns at the time of writing.
MEvents.query('EventPlayerID == 12410')['EventType'].value_counts()
sub 469
assist 376
reb 311
miss3 254
miss2 208
foul 201
made2 193
made3 164
turnover 144
steal 126
made1 117
fouled 63
miss1 32
block 4
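The counts above can be turned into shooting percentages with a little arithmetic. A minimal sketch using only the numbers printed above; note that they cover logged play-by-play events only, so they are not official season or career totals. `pct` is a hypothetical helper.

```python
# Event counts copied from the value_counts() output above (player ID 12410).
counts = {
    "made1": 117, "miss1": 32,
    "made2": 193, "miss2": 208,
    "made3": 164, "miss3": 254,
}


def pct(made, missed):
    """Shooting percentage as makes / (makes + misses)."""
    return made / (made + missed)


ft_pct = pct(counts["made1"], counts["miss1"])
two_pct = pct(counts["made2"], counts["miss2"])
three_pct = pct(counts["made3"], counts["miss3"])
# Points from logged events: 1 per free throw, 2 and 3 per field goal.
points = counts["made1"] + 2 * counts["made2"] + 3 * counts["made3"]

print(f"FT% {ft_pct:.3f}  2P% {two_pct:.3f}  3P% {three_pct:.3f}  points {points}")
```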
Visualize his data
ms = 10 # Marker Size
fig, ax = plt.subplots(figsize=(15, 8))
ax = create_ncaa_full_court(ax)
MEvents.query('EventPlayerID == 12410 and EventType == "made2"') \
.plot(x='X_', y='Y_', style='o',
title='Shots (Ty Jerome)',
alpha=0.5,
figsize=(15, 8),
label='Made 2',
ms=ms,
ax=ax)
plt.legend()
MEvents.query('EventPlayerID == 12410 and EventType == "miss2"') \
.plot(x='X_', y='Y_', style='X',
alpha=0.5, ax=ax,
label='Missed 2',
ms=ms)
plt.legend()
MEvents.query('EventPlayerID == 12410 and EventType == "made3"') \
.plot(x='X_', y='Y_', style='o',
c='brown',
alpha=0.5,
figsize=(15, 8),
label='Made 3', ax=ax,
ms=ms)
plt.legend()
MEvents.query('EventPlayerID == 12410 and EventType == "miss3"') \
.plot(x='X_', y='Y_', style='X',
c='green',
alpha=0.5, ax=ax,
label='Missed 3',
ms=ms)
ax.set_xlabel('')
plt.legend()
plt.show()
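Now look at the data for Zion Williamson, ID 2825
- Zion Williamson, born July 6, 2000 in Salisbury, North Carolina and raised in Spartanburg, South Carolina, is an American professional basketball player, a power forward with the NBA's New Orleans Pelicans. He entered the NBA in 2019 as the first overall draft pick.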
MPlayers.query('FirstName == "Zion" and LastName == "Williamson"')
ms = 10 # Marker Size
FirstName = 'Zion'
LastName = 'Williamson'
fig, ax = plt.subplots(figsize=(15, 8))
ax = create_ncaa_full_court(ax)
MEvents.query('EventPlayerID == 2825 and EventType == "made2"') \
.plot(x='X_', y='Y_', style='o',
title='Shots (Zion Williamson)',
alpha=0.5,
figsize=(15, 8),
label='Made 2',
ms=ms,
ax=ax)
plt.legend()
MEvents.query('EventPlayerID == 2825 and EventType == "miss2"') \
.plot(x='X_', y='Y_', style='X',
alpha=0.5, ax=ax,
label='Missed 2',
ms=ms)
plt.legend()
MEvents.query('EventPlayerID == 2825 and EventType == "made3"') \
.plot(x='X_', y='Y_', style='o',
c='brown',
alpha=0.5,
figsize=(15, 8),
label='Made 3', ax=ax,
ms=ms)
plt.legend()
MEvents.query('EventPlayerID == 2825 and EventType == "miss3"') \
.plot(x='X_', y='Y_', style='X',
c='green',
alpha=0.5, ax=ax,
label='Missed 3',
ms=ms)
ax.set_xlabel('')
plt.legend()
plt.show()
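Next, look at Katie Lou Samuelson (ID 3163) from the women's game. She is known as a three-point shooter and, like Curry, is very good at deep threes from well beyond the line. Let's plot her shots and see whether that is really the case.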
WPlayers.query('FirstName == "Katie Lou" and LastName == "Samuelson"')
fig, ax = plt.subplots(figsize=(15, 8))
ax = create_ncaa_full_court(ax, three_line='womens')
WEvents.query('EventPlayerID == 1821 and EventType == "made2"') \
.plot(x='X_', y='Y_', style='o',
title='Shots (Katie Lou Samuelson)',
alpha=0.5,
figsize=(15, 8),
label='Made 2',
ms=ms,
ax=ax)
plt.legend()
WEvents.query('EventPlayerID == 1821 and EventType == "miss2"') \
.plot(x='X_', y='Y_', style='X',
alpha=0.5, ax=ax,
label='Missed 2',
ms=ms)
plt.legend()
WEvents.query('EventPlayerID == 1821 and EventType == "made3"') \
.plot(x='X_', y='Y_', style='o',
c='brown',
alpha=0.5,
figsize=(15, 8),
label='Made 3', ax=ax,
ms=ms)
plt.legend()
WEvents.query('EventPlayerID == 1821 and EventType == "miss3"') \
.plot(x='X_', y='Y_', style='X',
c='green',
alpha=0.5, ax=ax,
label='Missed 3',
ms=ms)
ax.set_xlabel('')
plt.legend()
plt.show()
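The plot shows clearly that it is.
Shot heatmaps
Regardless of make or miss, count the locations of all two-point and three-point attempts and draw a heatmap, then compare the shooting preferences of men's and women's players.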
import matplotlib as mpl  # needed for mpl.colors.LogNorm below
N_bins = 100
# Keep only shot events with a non-zero X coordinate
shot_events = MEvents.loc[MEvents['EventType'].isin(['miss3','made3','miss2','made2']) & (MEvents['X_'] != 0)]
fig, ax = plt.subplots(figsize=(15, 7))
ax = create_ncaa_full_court(ax,
paint_alpha=0.0,
three_line='mens',
court_color='black',
lines_color='white')
plt.hist2d(shot_events['X_'].values + np.random.normal(0, 0.1, shot_events['X_'].shape), # Add Jitter to values for plotting
shot_events['Y_'].values + np.random.normal(0, 0.1, shot_events['Y_'].shape),
bins=N_bins, norm=mpl.colors.LogNorm(),
cmap='plasma')
# Plot a colorbar with label.
cb = plt.colorbar()
cb.set_label('Number of shots')
ax.set_title('Shot Heatmap (Mens)')
plt.show()
N_bins = 100
shot_events = WEvents.loc[WEvents['EventType'].isin(['miss3','made3','miss2','made2']) & (WEvents['X_'] != 0)]
fig, ax = plt.subplots(figsize=(15, 7))
ax = create_ncaa_full_court(ax, three_line='womens', paint_alpha=0.0,
court_color='black',
lines_color='white')
plt.hist2d(shot_events['X_'].values + np.random.normal(0, 0.2, shot_events['X_'].shape),
shot_events['Y_'].values + np.random.normal(0, 0.2, shot_events['Y_'].shape),
bins=N_bins, norm=mpl.colors.LogNorm(),
cmap='plasma')
# Plot a colorbar with label.
cb = plt.colorbar()
cb.set_label('Number of shots')
ax.set_title('Shot Heatmap (Womens)')
plt.show()
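An interesting observation when comparing the men's and women's games: many of the men's shots come from directly under the basket, whereas the hot spots for women's shots appear more to the left and right of the basket.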
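The observation above is read off the heatmaps by eye. One way to put a number on it is to measure each shot's distance to the nearest hoop and compare the share of very close shots between the two datasets. A sketch under assumptions: the hoop centers are taken from `create_ncaa_full_court` (5.25 ft from each baseline), the 4 ft threshold is an arbitrary illustrative choice, and the sample points are invented.

```python
import numpy as np

# Hoop centers in feet, as drawn by create_ncaa_full_court (assumption).
HOOPS_FT = np.array([[5.25, 25.0], [94 - 5.25, 25.0]])


def distance_to_nearest_hoop(x_ft, y_ft):
    """Euclidean distance (feet) from each point to the closer of the two hoops."""
    points = np.column_stack([x_ft, y_ft])                   # shape (n, 2)
    diffs = points[:, None, :] - HOOPS_FT[None, :, :]        # shape (n, 2, 2)
    return np.linalg.norm(diffs, axis=2).min(axis=1)         # shape (n,)


# Invented sample: two shots at the rim, one 5 ft away, one from center court.
x = np.array([5.25, 8.25, 88.75, 47.0])
y = np.array([25.0, 29.0, 25.0, 25.0])

dist = distance_to_nearest_hoop(x, y)
share_close = (dist <= 4.0).mean()   # share of shots within 4 ft of a hoop
print(dist, share_close)
```

Applied to the real data, the same function would take the `X_` and `Y_` columns of the filtered shot events for each dataset.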
Next, look at the average points scored from each coordinate
# Points per event: 2 or 3 for made field goals, 0 otherwise.
# Misses ('miss2'/'miss3') already keep the default of 0.
MEvents['PointsScored'] = 0
MEvents.loc[MEvents['EventType'] == 'made2', 'PointsScored'] = 2
MEvents.loc[MEvents['EventType'] == 'made3', 'PointsScored'] = 3
avg_pnt_xy = MEvents.loc[MEvents['EventType'].isin(['miss3','made3','miss2','made2']) & (MEvents['X_'] != 0)] \
.groupby(['X_','Y_'])['PointsScored'].mean().reset_index()
#avg_pnt_xy.plot(x='X_',y='Y_', style='.')
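bins = [0,0.5,1,1.33,1.67,2,2.5,3]
# include_lowest=True keeps locations whose average is exactly 0 in the first bin
avg_pnt_xy['PointsScored'] = pd.cut(avg_pnt_xy['PointsScored'], bins, include_lowest=True)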
fig, ax = plt.subplots(figsize=(15, 8))
ax = sns.scatterplot(data=avg_pnt_xy, x='X_', y='Y_', hue='PointsScored')
ax = create_ncaa_full_court(ax)
plt.legend(loc=[1,0])
plt.show()
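Averaging per exact coordinate can be noisy where only a handful of shots were taken. A coarser alternative is to average by the `Area` code described earlier and report the number of attempts alongside. A minimal sketch on invented rows (the `toy` frame is not competition data; `points_map` is a hypothetical name):

```python
import pandas as pd

# Invented shot events: EventType plus the numeric Area code (1..13).
toy = pd.DataFrame({
    "EventType": ["made2", "miss2", "made3", "miss3", "miss3", "made2"],
    "Area":      [1,       1,       10,      10,      10,      2],
})

# Points produced by each shot event; misses are worth 0.
points_map = {"made2": 2, "made3": 3, "miss2": 0, "miss3": 0}
toy["PointsScored"] = toy["EventType"].map(points_map)

# Attempts and average points per shot for each area.
summary = toy.groupby("Area")["PointsScored"].agg(
    attempts="count", points_per_shot="mean"
)
print(summary)
```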
Next, count the number of shots made at each coordinate
# Flag makes and misses (the miss event types are 'miss2' and 'miss3')
MEvents['Made'] = MEvents['EventType'].isin(['made2', 'made3'])
MEvents['Missed'] = MEvents['EventType'].isin(['miss2', 'miss3'])
avg_made_xy = MEvents.loc[MEvents['EventType'].isin(['miss3','made3','miss2','made2']) & (MEvents['X_'] != 0)] \
.groupby(['X_','Y_'])[['Made','Missed']].sum().reset_index()
bins = [0,25,50,100,200,1000,2000]
# include_lowest=True keeps locations with zero makes in the first bin
avg_made_xy['Made'] = pd.cut(avg_made_xy['Made'], bins, include_lowest=True)
fig, ax = plt.subplots(figsize=(15, 8))
ax = sns.scatterplot(data=avg_made_xy, x='X_', y='Y_', palette='plasma', hue='Made')
ax = create_ncaa_full_court(ax, paint_alpha=0)
ax.set_title('Number of Shots Made')
plt.legend(loc=[1, 0])
plt.show()
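Made counts alone favor locations where many shots are taken. Dividing makes by attempts gives a field-goal percentage per location, and a minimum-attempts filter keeps one-off shots from dominating. A sketch on invented rows; `MIN_ATTEMPTS` and the `toy` frame are assumptions for illustration only.

```python
import pandas as pd

# Invented shot events in feet coordinates.
toy = pd.DataFrame({
    "X_": [5.0, 5.0, 5.0, 20.0, 20.0, 30.0],
    "Y_": [25.0, 25.0, 25.0, 10.0, 10.0, 40.0],
    "EventType": ["made2", "made2", "miss2", "made3", "miss3", "miss3"],
})
toy["Made"] = toy["EventType"].isin(["made2", "made3"])

# Makes and attempts per location, then the make rate.
per_xy = (toy.groupby(["X_", "Y_"])["Made"]
             .agg(made="sum", attempts="count")
             .reset_index())
per_xy["fg_pct"] = per_xy["made"] / per_xy["attempts"]

# Hypothetical threshold: ignore locations with too few attempts.
MIN_ATTEMPTS = 2
reliable = per_xy[per_xy["attempts"] >= MIN_ATTEMPTS]
print(reliable)
```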
End of this section
Next up: how to gauge each team's strength and produce a ranking from matchup information alone