Visualization: matplotlib

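After working through this handbook, you will know:

How to install and configure matplotlib
The components of a data plot and the matplotlib names for them
The common plot types and how to draw them

You may need the following preparation and prerequisites:

A Python development environment with the matplotlib package
Basic Python syntax
Basic use of the Python numpy package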

1. Installing and configuring matplotlib

On Linux, matplotlib can be installed as follows:
sudo pip install numpy
sudo pip install scipy
sudo pip install matplotlib
On Windows, Anaconda is strongly recommended.

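2. The basic structure of a plot

Typically, numpy is used to organize the data and the matplotlib API is used to draw it. A data plot basically consists of the following parts:

Data: the data area, including the data points and the shapes drawn from them
Axis: the axes, including the X and Y axes with their labels, and the ticks with their labels
Title: the title, a description of the plot
Legend: the legend, which distinguishes multiple curves or different categories of data in the plot
Other descriptive elements include figure text (Text) and annotations (Annotate)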
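The parts listed above can be seen together in one small figure. This is only a minimal sketch using the object-oriented API; the data, the label strings, and the annotation position are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

fig, ax = plt.subplots()
# Data: the plotted points and the line drawn through them
ax.plot(x, np.sin(x), marker=".", label="sin(x)")
# Axis: axis labels and tick positions
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_xticks([0, np.pi, 2 * np.pi])
# Title: a description of the plot
ax.set_title("Parts of a plot")
# Legend: tells curves apart, using the label passed to plot
legend = ax.legend(loc="upper right")
# Text and Annotate: free text, and a note attached to a data point
ax.text(0.2, -0.8, "free text")
ax.annotate("peak", xy=(np.pi / 2, 1.0), xytext=(3.0, 0.8),
            arrowprops=dict(arrowstyle="->"))
```

Each call maps to one item of the list, so the same skeleton can serve as a checklist when a plot is missing one of its parts.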

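3. How to draw

The following uses an ordinary line plot as an example and records the workflow and tips in detail. Following the structure of a plot, drawing can be split into these steps:

Import matplotlib and the related packages
Prepare the data, stored in numpy arrays
Draw the raw curves
Configure the title, axes, ticks, and legend
Add text and annotations
Display and save the result

3.1 Imports

matplotlib.pyplot, pylab, and numpy are used. Note that the matplotlib documentation now discourages the pylab star import; pyplot and numpy alone cover everything below.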

#coding:utf-8
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from pylab import *

3.2 Preparing the data

numpy is commonly used to organize the source data:

# Define the data
x = np.arange(0., 10, 0.2)
y1 = np.cos(x)
y2 = np.sin(x)
y3 = np.sqrt(x)

#x = all_df['house_age']
#y = all_df['house_price']

3.3 Drawing the basic curves

Use the plot function to draw the curves above directly. Its parameters control the line style, width, color, markers, and so on:

# Draw the 3 function curves
# $y=\sqrt{x}$
plt.rcParams["figure.figsize"] = (12,8)
plt.plot(x, y1, color='blue', linewidth=1.5, linestyle='-', marker='.', label=r'$y = \cos{x}$')
plt.plot(x, y2, color='green', linewidth=1.5, linestyle='-', marker='*', label=r'$y = \sin{x}$')
plt.plot(x, y3, color='m', linewidth=1.5, linestyle='-', marker='x', label=r'$y = \sqrt{x}$')

3.3.1 More on colors

Colors are set mainly through the color parameter:

r red
g green
b blue
c cyan
m magenta
y yellow
k black
w white

3.3.2 The linestyle parameter

The linestyle parameter mainly accepts the following values:

'-' solid line
'--' dashed line
'-.' dash-dot line
':' dotted line
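To compare the line styles side by side, a short sketch like the following can help. It assumes the four short style strings that matplotlib accepts for linestyle; the data and the vertical offsets are made up so that the curves do not overlap.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
styles = ["-", "--", "-.", ":"]  # solid, dashed, dash-dot, dotted

fig, ax = plt.subplots()
lines = []
for offset, style in enumerate(styles):
    # Shift each curve upward so the four styles are drawn on separate rows
    (line,) = ax.plot(x, np.sin(x) + 3 * offset, linestyle=style, label=repr(style))
    lines.append(line)
ax.legend(loc="upper right")
```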

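3.3.3 The marker parameter

The marker parameter sets the symbol drawn at each data point on a curve, which helps tell different lines apart. Common choices include '.' (point), 'o' (circle), '*' (star), 'x' (x), '+' (plus), 's' (square), and '^' (triangle up).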
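A quick way to see what each marker looks like is to draw one row of points per symbol. The sketch below covers only a handful of the marker codes matplotlib supports; the names in the dictionary are informal descriptions, not official identifiers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0.0, 10.0, 1.0)
# marker code -> informal description (a small, hand-picked subset)
markers = {".": "point", "*": "star", "x": "x", "o": "circle",
           "s": "square", "^": "triangle up"}

fig, ax = plt.subplots()
lines = []
for row, (symbol, name) in enumerate(markers.items()):
    # One horizontal row of points per marker symbol
    (line,) = ax.plot(x, np.full_like(x, row), marker=symbol,
                      linestyle="-", label=f"{symbol!r}: {name}")
    lines.append(line)
ax.legend(loc="center left", bbox_to_anchor=(1.0, 0.5))
```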

3.4 Configuring the axes

The following code moves the axis spines:

# Move the axes
ax = plt.subplot(111)
#ax = plt.subplot(2,2,1)
ax.spines['right'].set_color('none')     # hide the right border line
ax.spines['top'].set_color('none')       # hide the top border line
# Move the bottom border line, which is equivalent to moving the X axis
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data', 0))
# Move the left border line, which is equivalent to moving the Y axis
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))

The following code sets the axis ranges (lim) and the tick labels (ticks):

# Set the value ranges of the x and y axes
plt.xlim(x.min()*1.1, x.max()*1.1)
plt.ylim(-1.5, 4.0)
# Set the tick positions and tick labels of the x and y axes
plt.xticks([2, 4, 6, 8, 10], [r'two', r'four', r'6', r'8', r'10'])
plt.yticks([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0],
    [r'-1.0', r'0.0', r'1.0', r'2.0', r'3.0', r'4.0'])

The following code sets the X and Y axis labels and the title:

# Set the title, x-axis label, and y-axis label
plt.title(r'$the \ function \ figure \ of \ cos(), \ sin() \ and \ sqrt()$', fontsize=19)
plt.xlabel(r'$the \ input \ value \ of \ x$', fontsize=18, labelpad=88.8)
plt.ylabel(r'$y = f(x)$', fontsize=18, labelpad=12.5)

3.5 Adding text and annotations

The following code adds descriptive text to the plot with text:

plt.text(0.8, 0.9, r'$x \in [0.0, \ 10.0]$', color='k', fontsize=15)
plt.text(0.8, 0.8, r'$y \in [-1.0, \ 4.0]$', color='k', fontsize=15)

The following code annotates a particular point with annotate:

# Annotate a particular point
plt.scatter([8,],[np.sqrt(8),], 50, color ='m')  # use a scatter point to enlarge this point
plt.annotate(r'$2\sqrt{2}$', xy=(8, np.sqrt(8)), xytext=(8.5, 2.2), fontsize=16, color='#090909', arrowprops=dict(arrowstyle='->', connectionstyle='arc3, rad=0.1', color='#090909'))

3.6 Adding a legend

A legend can be added in either of two ways:

1: Pass a label argument to plt.plot, then call plt.legend(loc='upper right')
2: Without the label argument, pass the names directly:

plt.legend(['cos(x)', 'sin(x)', 'sqrt(x)'], loc='upper right')

3.7 Toggling grid lines

The following code adds grid lines to the plot:

# Show grid lines
plt.grid(True)

3.8 Displaying and saving the figure

plt.show()    # display
#savefig('../figures/plot3d_ex.png',dpi=48)    # save; the target directory must already exist
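Saving is easiest to get right when it happens before the figure is shown, and when the output directory is created first. The following is a minimal sketch; the temporary "figures" folder and the file name are hypothetical stand-ins for a real output path.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0.0, 10.0, 0.2)
fig, ax = plt.subplots()
ax.plot(x, np.sqrt(x))

# Hypothetical output folder inside a fresh temporary directory
out_dir = os.path.join(tempfile.mkdtemp(), "figures")
os.makedirs(out_dir, exist_ok=True)  # savefig does not create missing directories
out_path = os.path.join(out_dir, "sqrt.png")

# Save through the Figure object; call plt.show() only after this line
fig.savefig(out_path, dpi=48)
```

Calling savefig on the Figure object avoids depending on which figure happens to be current.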

4. The complete plotting program

#coding:utf-8

import numpy as np
import matplotlib.pyplot as plt
from pylab import *

# Define the data
x = np.arange(0., 10, 0.2)
y1 = np.cos(x)
y2 = np.sin(x)
y3 = np.sqrt(x)

# Draw the 3 function curves
plt.plot(x, y1, color='blue', linewidth=1.5, linestyle='-', marker='.', label=r'$y = \cos{x}$')
plt.plot(x, y2, color='green', linewidth=1.5, linestyle='-', marker='*', label=r'$y = \sin{x}$')
plt.plot(x, y3, color='m', linewidth=1.5, linestyle='-', marker='x', label=r'$y = \sqrt{x}$')

# Move the axes
ax = plt.subplot(111)
ax.spines['right'].set_color('none')     # hide the right border line
ax.spines['top'].set_color('none')       # hide the top border line

# Move the bottom border line, which is equivalent to moving the X axis
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data', 0))

# Move the left border line, which is equivalent to moving the Y axis
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))

# Set the value ranges of the x and y axes
plt.xlim(x.min()*1.1, x.max()*1.1)
plt.ylim(-1.5, 4.0)

# Set the tick values of the x and y axes
plt.xticks([2, 4, 6, 8, 10], [r'2', r'4', r'6', r'8', r'10'])
plt.yticks([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0],
    [r'-1.0', r'0.0', r'1.0', r'2.0', r'3.0', r'4.0'])

# Add text
plt.text(0.8, 0.8, r'$x \in [0.0, \ 10.0]$', color='k', fontsize=15)
plt.text(0.8, 0.9, r'$y \in [-1.0, \ 4.0]$', color='k', fontsize=15)

# Annotate a particular point
plt.scatter([8,],[np.sqrt(8),], 50, color ='m')  # use a scatter point to enlarge this point
plt.annotate(r'$2\sqrt{2}$', xy=(8, np.sqrt(8)), xytext=(8.5, 2.2), fontsize=16, color='#090909', arrowprops=dict(arrowstyle='->', connectionstyle='arc3, rad=0.1', color='#090909'))

# Set the title, x-axis label, and y-axis label
plt.title(r'$the \ function \ figure \ of \ cos(), \ sin() \ and \ sqrt()$', fontsize=19)
plt.xlabel(r'$the \ input \ value \ of \ x$', fontsize=18, labelpad=88.8)
plt.ylabel(r'$y = f(x)$', fontsize=18, labelpad=12.5)

# Add the legend and set its position
plt.legend(loc='upper right')
# plt.legend(['cos(x)', 'sin(x)', 'sqrt(x)'], loc='upper right')

# Show grid lines
plt.grid(True)

# Show the figure
plt.show()

5. Common plot types

For details, and for a more complete reference, see the official matplotlib documentation and its example gallery.

Line plot: matplotlib.pyplot.plot(data)
Histogram: matplotlib.pyplot.hist(data)
Scatter plot: matplotlib.pyplot.scatter(x, y)
Box plot: matplotlib.pyplot.boxplot(data)

x = np.arange(-5,5,0.1)
y = x ** 2
plt.plot(x,y)

x = np.random.normal(size=1000)
plt.hist(x, bins=10)

plt.rcParams["figure.figsize"] = (8,8)
x = np.random.normal(size=1000)
y = np.random.normal(size=1000)
plt.scatter(x,y)

plt.boxplot(x)

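A box plot primer

Upper fence (Q3 + 1.5 IQR) and lower fence (Q1 - 1.5 IQR), where IQR = Q3 - Q1
Upper quartile (Q3) and lower quartile (Q1)
Median
Outliers: points that fall outside the fences
Compared with the standard approach to handling outliers, the differences are whether the statistical bounds are themselves affected by outliers, and how much tolerance is allowed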
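The quantities above can be computed by hand, which makes it easier to see why a single extreme value does not move the fences much. This is a small sketch with made-up numbers; box_stats is a hypothetical helper, not a matplotlib function, and it uses numpy's default linear interpolation for percentiles.

```python
import numpy as np

def box_stats(values):
    """Return the quartiles, IQR, fences, and outliers of a 1-D sample."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Outliers are the points strictly outside the two fences
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "lower_fence": lower_fence, "upper_fence": upper_fence,
            "outliers": outliers.tolist()}

# Nine ordinary values and one extreme value (made-up sample)
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

Note that the whiskers matplotlib draws end at the most extreme data points that still lie inside the fences, so they are usually a little shorter than the fence values themselves.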

6. Case study: analyzing and visualizing bike-rental data

step1. Load the data and do some simple processing

import pandas as pd # read the data into a DataFrame
import urllib.request # fetch data over the network
import tempfile # create temporary files and directories
import shutil # file operations
import zipfile # compress and extract archives

temp_dir = tempfile.mkdtemp() # create a temporary directory
data_source = 'http://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip' # URL of the data
zipname = temp_dir + '/Bike-Sharing-Dataset.zip' # build the local file path
urllib.request.urlretrieve(data_source, zipname) # download the data

zip_ref = zipfile.ZipFile(zipname, 'r') # create a ZipFile object to handle the archive
zip_ref.extractall(temp_dir) # extract
zip_ref.close()

daily_path = temp_dir + '/day.csv' # read the extracted file (assumes day.csv sits at the top level of the archive)
daily_data = pd.read_csv(daily_path) # read the csv file
daily_data['dteday'] = pd.to_datetime(daily_data['dteday']) # convert the date strings to datetime values
drop_list = ['instant', 'season', 'yr', 'mnth', 'holiday', 'workingday', 'weathersit', 'atemp', 'hum'] # columns we do not need
daily_data.drop(drop_list, inplace = True, axis = 1) # inplace=True modifies the object directly

shutil.rmtree(temp_dir) # delete the temporary directory

daily_data.head() # take a look at the data

step2. Configuration

from __future__ import division, print_function # Python 3 division and print (only needed on Python 2)
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
# Show plots inline in the notebook
%matplotlib inline

# Set some global parameters; adjust them to taste
import matplotlib
# Figure size 14" x 7"
# rc: runtime configuration
matplotlib.rc('figure', figsize = (14, 7))
# Font size 14
matplotlib.rc('font', size = 14)
# Hide the top and right spines
matplotlib.rc('axes.spines', top = False, right = False)
# No grid
matplotlib.rc('axes', grid = False)
# White background
matplotlib.rc('axes', facecolor = 'white')

step3. Correlation analysis

Scatter plot

Analyze the relationship between variables

from matplotlib import font_manager
fontP = font_manager.FontProperties()
fontP.set_family('SimHei')
fontP.set_size(14)

# Wrap the scatter plot in a function for reuse
def scatterplot(x_data, y_data, x_label, y_label, title):

    # Create a figure and axes
    fig, ax = plt.subplots()

    # Set the data, point size, point color, and transparency
    ax.scatter(x_data, y_data, s = 10, color = '#539caf', alpha = 0.75) # http://www.114la.com/other/rgb.htm

    # Add the title and axis labels
    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)

# Draw the scatter plot
scatterplot(x_data = daily_data['temp']
            , y_data = daily_data['cnt']
            , x_label = 'Normalized temperature (C)'
            , y_label = 'Check outs'
            , title = 'Number of Check Outs vs Temperature')

Line plot

Fit the relationship between variables

# Linear regression
import statsmodels.api as sm # least squares
from statsmodels.stats.outliers_influence import summary_table # summary information
x = sm.add_constant(daily_data['temp']) # add the constant term for the linear model y=kx+b
y = daily_data['cnt']
regr = sm.OLS(y, x) # ordinary least squares model
res = regr.fit()
# Get the fitted data from the model
st, data, ss2 = summary_table(res, alpha=0.05) # alpha=5% gives 95% intervals; st: summary table, data: the underlying values, ss2: column names
fitted_values = data[:,2]

# Wrap the line plot in a function
def lineplot(x_data, y_data, x_label, y_label, title):
    # Create a figure and axes
    _, ax = plt.subplots()

    # Draw the fitted line; lw = linewidth, alpha = transparency
    ax.plot(x_data, y_data, lw = 2, color = '#539caf', alpha = 1)

    # Add the title and axis labels
    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)

# Call the plotting function
lineplot(x_data = daily_data['temp']
         , y_data = fitted_values
         , x_label = 'Normalized temperature (C)'
         , y_label = 'Check outs'
         , title = 'Line of Best Fit for Number of Check Outs vs Temperature')

x.head()
type(regr)
st
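To see what the least-squares fit of y = kx + b does without the bike data, the same model can be solved with plain numpy. This sketch swaps statsmodels for numpy.linalg.lstsq and uses synthetic data; the "true" intercept 1200 and slope 6600 are invented for the example and are not results from the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for daily_data['temp'] and daily_data['cnt']
temp = rng.uniform(0.0, 1.0, size=200)
cnt = 1200.0 + 6600.0 * temp + rng.normal(0.0, 50.0, size=200)

# Design matrix with a column of ones, the role sm.add_constant plays
X = np.column_stack([np.ones_like(temp), temp])

# Solve min ||X @ coef - cnt||^2 for coef = (b, k)
coef, *_ = np.linalg.lstsq(X, cnt, rcond=None)
intercept, slope = coef
fitted = X @ coef

print(f"intercept b = {intercept:.1f}, slope k = {slope:.1f}")
```

The residuals of a least-squares fit are orthogonal to every column of the design matrix, which is a convenient sanity check on any fit of this kind.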

Line plot with a confidence interval

Evaluate the quality of the fit

# Get the lower and upper bounds of the 95% confidence interval
predict_mean_ci_low, predict_mean_ci_upp = data[:,4:6].T

# Build a DataFrame holding the confidence interval bounds
CI_df = pd.DataFrame(columns = ['x_data', 'low_CI', 'upper_CI'])
CI_df['x_data'] = daily_data['temp']
CI_df['low_CI'] = predict_mean_ci_low
CI_df['upper_CI'] = predict_mean_ci_upp
CI_df.sort_values('x_data', inplace = True) # sort by x_data

# Plot the confidence interval
def lineplotCI(x_data, y_data, sorted_x, low_CI, upper_CI, x_label, y_label, title):
    # Create a figure and axes
    _, ax = plt.subplots()

    # Draw the fitted line
    ax.plot(x_data, y_data, lw = 1, color = '#539caf', alpha = 1, label = 'Fit')
    # Draw the confidence band; fill_between needs x in sorted order
    ax.fill_between(sorted_x, low_CI, upper_CI, color = '#539caf', alpha = 0.4, label = '95% CI')
    # Add the title and axis labels
    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)

    # Show the legend, using the label arguments; loc='best' picks the position automatically
    ax.legend(loc = 'best')

# Call the function to create plot
lineplotCI(x_data = daily_data['temp']
           , y_data = fitted_values
           , sorted_x = CI_df['x_data']
           , low_CI = CI_df['low_CI']
           , upper_CI = CI_df['upper_CI']
           , x_label = 'Normalized temperature (C)'
           , y_label = 'Check outs'
           , title = 'Line of Best Fit for Number of Check Outs vs Temperature')
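For a simple linear regression, the confidence band for the mean prediction can also be computed from the textbook formula, which shows why the band is narrowest near the mean of x. The sketch below is an independent illustration on synthetic data, not a reimplementation of summary_table; mean_ci is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

def mean_ci(x, y, alpha=0.05):
    """Fitted line and (1 - alpha) confidence band for the mean response."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = intercept + slope * x
    resid = y - fitted
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))        # residual standard error
    sxx = np.sum((x - x.mean()) ** 2)
    # Standard error of the mean prediction at each x
    se_mean = s * np.sqrt(1.0 / n + (x - x.mean()) ** 2 / sxx)
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n - 2)
    return fitted, fitted - t_crit * se_mean, fitted + t_crit * se_mean

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 100))              # sorted, as fill_between expects
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5, 100)        # made-up linear relation
fitted, low, upp = mean_ci(x, y)
width = upp - low
```

The three returned arrays can be passed to ax.plot and ax.fill_between in the same way as the values taken from summary_table.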

Dual-axis line plot

When the fit does not meet the confidence threshold, consider adding independent variables
*Analyze the relationship between several variables on different scales

# Plotting function with two y axes
def lineplot2y(x_data, x_label, y1_data, y1_color, y1_label, y2_data, y2_color, y2_label, title):
    _, ax1 = plt.subplots()
    ax1.plot(x_data, y1_data, color = y1_color)
    # Add the title and axis labels
    ax1.set_ylabel(y1_label, color = y1_color)
    ax1.set_xlabel(x_label)
    ax1.set_title(title)

    ax2 = ax1.twinx() # the two axes objects share the x axis
    ax2.plot(x_data, y2_data, color = y2_color)
    ax2.set_ylabel(y2_label, color = y2_color)
    # Make the right spine visible
    ax2.spines['right'].set_visible(True)

# Call the plotting function
lineplot2y(x_data = daily_data['dteday']
           , x_label = 'Day'
           , y1_data = daily_data['cnt']
           , y1_color = '#539caf'
           , y1_label = 'Check outs'
           , y2_data = daily_data['windspeed']
           , y2_color = '#7663b0'
           , y2_label = 'Normalized windspeed'
           , title = 'Check Outs and Windspeed Over Time')

step4. Distribution analysis

Histogram

Coarse counts per interval

# Function that draws a histogram
def histogram(data, x_label, y_label, title):
    _, ax = plt.subplots()
    res = ax.hist(data, color = '#539caf', bins=10) # set the number of bins
    ax.set_ylabel(y_label)
    ax.set_xlabel(x_label)
    ax.set_title(title)
    return res

# Call the plotting function
res = histogram(data = daily_data['registered']
           , x_label = 'Check outs'
           , y_label = 'Frequency'
           , title = 'Distribution of Registered Check Outs')
res[0] # counts in each bin
res[1] # bin edges
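The counts and bin edges that the histogram call returns can be reproduced without drawing anything, which is handy for checking the binning. This sketch uses numpy.histogram on synthetic data; the mean, spread, and sample size are made up.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-in for daily_data['registered']
data = rng.normal(loc=3500.0, scale=1500.0, size=500)

# Ten equal-width bins spanning the data range, as with bins=10 above
counts, edges = np.histogram(data, bins=10)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:8.1f}, {right:8.1f}): {count}")
```

Ten bins always come with eleven edges, and the counts add up to the sample size because the outermost edges are the minimum and maximum of the data.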

Overlaid histogram

Compare two distributions

# Draw two overlaid histograms
def overlaid_histogram(data1, data1_name, data1_color, data2, data2_name, data2_color, x_label, y_label, title):
    # Use one common data range so that both histograms share the same bins
    max_nbins = 10
    data_range = [min(min(data1), min(data2)), max(max(data1), max(data2))]
    binwidth = (data_range[1] - data_range[0]) / max_nbins
    bins = np.arange(data_range[0], data_range[1] + binwidth, binwidth) # build the bin edges

    # Create the plot
    _, ax = plt.subplots()
    ax.hist(data1, bins = bins, color = data1_color, alpha = 1, label = data1_name)
    ax.hist(data2, bins = bins, color = data2_color, alpha = 0.75, label = data2_name)
    ax.set_ylabel(y_label)
    ax.set_xlabel(x_label)
    ax.set_title(title)
    ax.legend(loc = 'best')

# Call the function to create plot
overlaid_histogram(data1 = daily_data['registered']
                   , data1_name = 'Registered'
                   , data1_color = '#539caf'
                   , data2 = daily_data['casual']
                   , data2_name = 'Casual'
                   , data2_color = '#7663b0'
                   , x_label = 'Check outs'
                   , y_label = 'Frequency'
                   , title = 'Distribution of Check Outs By Type')

registered: the distribution for registered users looks roughly normal. Why?
casual: the distribution for casual users looks roughly exponential. Why?

Density plot

A finer description of the probability distribution
KDE: kernel density estimate

# Compute the probability density
from scipy.stats import gaussian_kde
data = daily_data['registered']
density_est = gaussian_kde(data) # kernel density estimate: https://en.wikipedia.org/wiki/Kernel_density_estimation
# Controls the amount of smoothing; larger values give a smoother curve
density_est.covariance_factor = lambda : .3
density_est._compute_covariance()
x_data = np.arange(min(data), max(data), 200)

# Draw the density estimate curve
def densityplot(x_data, density_est, x_label, y_label, title):
    _, ax = plt.subplots()
    ax.plot(x_data, density_est(x_data), color = '#539caf', lw = 2)
    ax.set_ylabel(y_label)
    ax.set_xlabel(x_label)
    ax.set_title(title)

# Call the plotting function
densityplot(x_data = x_data
            , density_est = density_est
            , x_label = 'Check outs'
            , y_label = 'Frequency'
            , title = 'Distribution of Registered Check Outs')

type(density_est)
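The smoothing factor can also be passed through the public bw_method argument of gaussian_kde instead of overriding covariance_factor. The sketch below compares a small and a large factor on synthetic data; the sample and the two factor values are made up for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic stand-in for daily_data['registered']
data = rng.normal(3500.0, 1500.0, size=500)

smooth = gaussian_kde(data, bw_method=0.6)   # larger factor: smoother curve
rough = gaussian_kde(data, bw_method=0.1)    # smaller factor: follows the sample closely

# Evaluate on a grid that extends well past the data on both sides
grid = np.linspace(data.min() - 3000.0, data.max() + 3000.0, 2000)
dx = grid[1] - grid[0]
smooth_y = smooth(grid)
rough_y = rough(grid)

# A density should integrate to about 1 (simple Riemann sum)
area_smooth = float(np.sum(smooth_y) * dx)
area_rough = float(np.sum(rough_y) * dx)

# Total up-and-down movement of each curve: a rough measure of wiggliness
wiggle_smooth = float(np.abs(np.diff(smooth_y)).sum())
wiggle_rough = float(np.abs(np.diff(rough_y)).sum())
```

Because the y values are densities rather than counts, the area under each curve stays near 1 whatever the smoothing factor is.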

step5. Between-group analysis

Quantitative comparison between groups
Grouping granularity
Clustering between groups

Bar chart

Compare the mean and spread across first-level categories

# Summary statistics per day of the week
mean_total_co_day = daily_data[['weekday', 'cnt']].groupby('weekday').agg(['mean', 'std'])
mean_total_co_day.columns = mean_total_co_day.columns.droplevel()

# Function that draws a bar chart
def barplot(x_data, y_data, error_data, x_label, y_label, title):
    _, ax = plt.subplots()
    # Bars
    ax.bar(x_data, y_data, color = '#539caf', align = 'center')
    # Draw the standard deviation as error bars
    # ls='none' removes the line that would connect the error bars
    ax.errorbar(x_data, y_data, yerr = error_data, color = '#297083', ls = 'none', lw = 5)
    ax.set_ylabel(y_label)
    ax.set_xlabel(x_label)
    ax.set_title(title)

# Call the plotting function
barplot(x_data = mean_total_co_day.index.values
        , y_data = mean_total_co_day['mean']
        , error_data = mean_total_co_day['std']
        , x_label = 'Day of week'
        , y_label = 'Check outs'
        , title = 'Total Check Outs By Day of Week (0 = Sunday)')

mean_total_co_day.columns
daily_data[['weekday', 'cnt']].groupby('weekday').agg(['mean', 'std'])

Stacked bar chart

Compare relative proportions across multi-level categories

mean_by_reg_co_day = daily_data[['weekday', 'registered', 'casual']].groupby('weekday').mean()
mean_by_reg_co_day

# Registered and casual use per day of the week
mean_by_reg_co_day = daily_data[['weekday', 'registered', 'casual']].groupby('weekday').mean()
# Share of registered and casual use per day of the week
mean_by_reg_co_day['total'] = mean_by_reg_co_day['registered'] + mean_by_reg_co_day['casual']
mean_by_reg_co_day['reg_prop'] = mean_by_reg_co_day['registered'] / mean_by_reg_co_day['total']
mean_by_reg_co_day['casual_prop'] = mean_by_reg_co_day['casual'] / mean_by_reg_co_day['total']


# Draw a stacked bar chart
def stackedbarplot(x_data, y_data_list, y_data_names, colors, x_label, y_label, title):
    _, ax = plt.subplots()
    # Running height of the bars drawn so far
    bottom = np.zeros(len(x_data))
    # Draw the stacked bars in a loop
    for i in range(0, len(y_data_list)):
        # Every series after the first starts where the previous ones ended.
        # The inputs are proportions, so the final stacked height is 1.
        ax.bar(x_data, y_data_list[i], color = colors[i], bottom = bottom, align = 'center', label = y_data_names[i])
        bottom = bottom + np.asarray(y_data_list[i])
    ax.set_ylabel(y_label)
    ax.set_xlabel(x_label)
    ax.set_title(title)
    ax.legend(loc = 'upper right') # set the legend position

# Call the plotting function
stackedbarplot(x_data = mean_by_reg_co_day.index.values
               , y_data_list = [mean_by_reg_co_day['reg_prop'], mean_by_reg_co_day['casual_prop']]
               , y_data_names = ['Registered', 'Casual']
               , colors = ['#539caf', '#7663b0']
               , x_label = 'Day of week'
               , y_label = 'Proportion of check outs'
               , title = 'Check Outs By Registration Status and Day of Week (0 = Sunday)')
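With more than two series, each stacked bar has to start at the running total of all the series below it, not just at the height of the previous one. The following sketch shows that bookkeeping with numpy; the three proportion arrays, including the third "other" category, are invented for the example.

```python
import numpy as np

# Made-up proportions for three days; each column sums to 1
registered = np.array([0.70, 0.85, 0.88])
casual = np.array([0.20, 0.10, 0.08])
other = np.array([0.10, 0.05, 0.04])     # hypothetical third category
series = [registered, casual, other]

heights = np.vstack(series)
# bottoms[i] is the sum of every series drawn before series i
bottoms = np.vstack([np.zeros(heights.shape[1]),
                     np.cumsum(heights, axis=0)[:-1]])
tops = bottoms + heights

for name, b, t in zip(["registered", "casual", "other"], bottoms, tops):
    print(name, "from", b, "to", t)
```

Each row of bottoms is what would be passed as the bottom argument of ax.bar for the corresponding series.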

Grouped bar chart

Compare absolute values across multi-level categories

# Function that draws a grouped bar chart
def groupedbarplot(x_data, y_data_list, y_data_names, colors, x_label, y_label, title):
    _, ax = plt.subplots()
    # Total width of each group of bars
    total_width = 0.8
    # Width of each individual bar
    ind_width = total_width / len(y_data_list)
    # Offset of each bar's center from the center of its group
    alteration = np.arange(-total_width/2+ind_width/2, total_width/2+ind_width/2, ind_width)

    # Draw each series of bars
    for i in range(0, len(y_data_list)):
        # Spread the bars out horizontally
        ax.bar(x_data + alteration[i], y_data_list[i], color = colors[i], label = y_data_names[i], width = ind_width)
    ax.set_ylabel(y_label)
    ax.set_xlabel(x_label)
    ax.set_title(title)
    ax.legend(loc = 'upper right')

# Call the plotting function
groupedbarplot(x_data = mean_by_reg_co_day.index.values
               , y_data_list = [mean_by_reg_co_day['registered'], mean_by_reg_co_day['casual']]
               , y_data_names = ['Registered', 'Casual']
               , colors = ['#539caf', '#7663b0']
               , x_label = 'Day of week'
               , y_label = 'Check outs'
               , title = 'Check Outs By Registration Status and Day of Week (0 = Sunday)')

Half-width of a single bar (before the offset): ind_width/2
Half-width of the whole group (after the offset): total_width/2
Offset of the outermost bar's center from the group center: total_width/2 - ind_width/2
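The offset arithmetic is easy to check in isolation. This sketch computes the bar centers by multiplying an integer index by the bar width, which avoids the floating-point end-point surprises that np.arange can have with fractional steps; bar_offsets is a hypothetical helper name.

```python
import numpy as np

def bar_offsets(n_series, total_width=0.8):
    """Center offsets of n_series bars that together span total_width."""
    ind_width = total_width / n_series
    # The first bar's center sits half a bar in from the left edge of the group
    first_center = -total_width / 2 + ind_width / 2
    return first_center + ind_width * np.arange(n_series), ind_width

offsets, width = bar_offsets(2)
print(offsets, width)

offsets4, width4 = bar_offsets(4)
print(offsets4, width4)
```

The offsets are symmetric around zero, so each group of bars stays centered on its x position however many series there are.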

Box plot

Compare data distributions across multi-level categories
Combines what a bar chart and an overlaid histogram show

# Given only the grouping key, the box plot is drawn automatically
days = np.unique(daily_data['weekday'])
bp_data = []
for day in days:
    bp_data.append(daily_data[daily_data['weekday'] == day]['cnt'].values)

# Define the plotting function
def boxplot(x_data, y_data, base_color, median_color, x_label, y_label, title):
    _, ax = plt.subplots()

    # Set the style
    ax.boxplot(y_data
               # whether the boxes are filled with color
               , patch_artist = True
               # color of the median line
               , medianprops = {'color': base_color}
               # box colors; color: edge color, facecolor: fill color
               , boxprops = {'color': base_color, 'facecolor': median_color}
               # whisker color
               , whiskerprops = {'color': median_color}
               # whisker cap color
               , capprops = {'color': base_color})

    # Label the boxes with the values in x_data
    ax.set_xticklabels(x_data)
    ax.set_ylabel(y_label)
    ax.set_xlabel(x_label)
    ax.set_title(title)

# Call the plotting function
boxplot(x_data = days
        , y_data = bp_data
        , base_color = 'b'
        , median_color = 'r'
        , x_label = 'Day of week'
        , y_label = 'Check outs'
        , title = 'Total Check Outs By Day of Week (0 = Sunday)')

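7. Brief summary

Correlation analysis and value comparison: scatter plots, line plots
Distribution analysis: histograms, density plots
Analysis involving categories: bar charts, box plots

8. Case study: analyzing the 2014 World Cup final

step1. Preprocessing

Prepare the data and import the packages we need.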

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from footyscripts.footyviz import draw_events, draw_pitch, type_names

#plotting settings
%matplotlib inline
plt.style.use('default') # replaces pd.options.display.mpl_style, which current pandas no longer provides

df = pd.read_csv("../datasets/germany-vs-argentina-731830.csv", encoding='utf-8', index_col=0)
df.head()

df.index = range(1,len(df) + 1)
df.head()

#standard dimensions
x_size = 105.0
y_size = 68.0
box_height = 16.5*2 + 7.32
box_width = 16.5
y_box_start = y_size/2-box_height/2
y_box_end = y_size/2+box_height/2

#scale of dataset is 100 by 100. Normalizing for a standard soccer pitch size
df['x']=df['x']/100*x_size 
df['y']=df['y']/100*y_size
df['to_x']=df['to_x']/100*x_size
df['to_y']=df['to_y']/100*y_size

#creating some measures and classifiers from the original 
df['count'] = 1
df['dx'] = df['to_x'] - df['x']
df['dy'] = df['to_y'] - df['y']
df['distance'] = np.sqrt(df['dx']**2+df['dy']**2)
df['fivemin'] = np.floor(df['min']/5)*5
df['type_name'] = df['type'].map(type_names.get)
df['to_box'] = (df['to_x'] > x_size - box_width) & (y_box_start < df['to_y']) & (df['to_y'] < y_box_end)
df['from_box'] = (df['x'] > x_size - box_width) & (y_box_start < df['y']) & (df['y'] < y_box_end)
df['on_offense'] = df['x']>x_size/2
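The rescaling and the penalty-box test can be tried on a few hand-made events before running them on the full match data. This sketch reuses the pitch constants from above; the three events are invented and only carry the two columns the test needs.

```python
import pandas as pd

# Pitch and penalty-box dimensions in meters, as defined above
x_size, y_size = 105.0, 68.0
box_width = 16.5
box_height = 16.5 * 2 + 7.32
y_box_start = y_size / 2 - box_height / 2
y_box_end = y_size / 2 + box_height / 2

# Made-up pass destinations on the dataset's 0-100 scale
events = pd.DataFrame({"to_x": [90.0, 90.0, 50.0],
                       "to_y": [50.0, 5.0, 50.0]})

# Rescale to meters
events["to_x"] = events["to_x"] / 100 * x_size
events["to_y"] = events["to_y"] / 100 * y_size

# A destination is in the box if it is deep enough and between the box's side lines
events["to_box"] = ((events["to_x"] > x_size - box_width)
                    & (events["to_y"] > y_box_start)
                    & (events["to_y"] < y_box_end))
print(events)
```

Only the first event ends inside the box: the second is deep enough but too close to the touchline, and the third is at midfield.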

Add the team names and player names to make the later statistics and evaluation easier

df['team_name'] = np.where(df['team']==357, 'Germany', 'Argentina')

player_dic = {15207:"Philipp Lahm",44989:"Toni Kroos",15208:"Bastian Schweinsteiger",40691:"Jerome Boateng",37605:"Mesut Özil",32644:"Javier Mascherano",66842:"André Schürrle",41316:"Benedikt Höwedes",38392:"Mats Hummels",55634:"Thomas Müller",39462:"Lucas Biglia",28525:"Ezequiel Garay",15312:"Martín Demichelis",20658:"Pablo Zabaleta",19054:"Lionel Messi",58893:"Marcos Rojo",20388:"Manuel Neuer",55661:"Enzo Pérez",42899:"Sergio Agüero",37572:"Sergio Romero",5155:"Miroslav Klose",69600:"Fernando Gago",19975:"Mario Götze",40232:"Gonzalo Higuaín",45154:"Ezequiel Lavezzi",20153:"Rodrigo Palacio",100927:"Christoph Kramer",17127:"Per Mertesacker"}

def get_player_name(player_id):
    return player_dic[player_id]

df['player_name'] = df['player_id'].apply(get_player_name)

#preslicing of the main DataFrame in smaller DFs that will be reused along the notebook
dfPeriod1 = df[df['period']==1]
dfP1Shots = dfPeriod1[dfPeriod1['type'].isin([13, 14, 15, 16])]
dfPeriod2 = df[df['period']==2]
dfP2Shots = dfPeriod2[dfPeriod2['type'].isin([13, 14, 15, 16])]
dfExtraTime = df[df['period']>2]
dfETShots = dfExtraTime[dfExtraTime['type'].isin([13, 14, 15, 16])]

step2. First half

Let's go through the first half quickly. The chart below shows the balance of attack and defense (values above 0 mean Germany attacking, values below 0 mean Germany defending), with the shots marked on it.

fig = plt.figure(figsize=(12,4))

avg_x = (dfPeriod1[dfPeriod1['team_name']=='Germany'].groupby('min')['x'].mean() - 
         dfPeriod1[dfPeriod1['team_name']=='Argentina'].groupby('min')['x'].mean())

plt.stackplot(list(avg_x.index.values), list([x if x>0 else 0 for x in avg_x]))
plt.stackplot(list(avg_x.index.values), list([x if x<0 else 0 for x in avg_x]))

for i, shot in dfP1Shots.iterrows():
    x = shot['min']
    y = avg_x.loc[shot['min']]
    signal = 1 if shot['team_name']=='Germany' else -1
    plt.annotate(shot['type_name']+' ('+shot['team_name'][0]+")", xy=(x, y), xytext=(x-5,y+30*signal), arrowprops=dict(facecolor='black'))

plt.gca().set_xlabel('minute')
plt.title("First Half Profile")
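The quantity plotted in this profile is the per-minute difference between the two teams' average x position, split into its positive and negative parts. The sketch below reproduces that calculation on a tiny invented event table so the numbers can be checked by hand.

```python
import pandas as pd

# Made-up events: minute, team, and x position on the pitch
events = pd.DataFrame({
    "min": [1, 1, 1, 2, 2, 3, 3],
    "team_name": ["Germany", "Germany", "Argentina",
                  "Germany", "Argentina", "Germany", "Argentina"],
    "x": [60.0, 80.0, 40.0, 30.0, 50.0, 55.0, 55.0],
})

# Average x per minute for each team
germany = events[events["team_name"] == "Germany"].groupby("min")["x"].mean()
argentina = events[events["team_name"] == "Argentina"].groupby("min")["x"].mean()

# Positive where Germany's average position is higher, negative otherwise
avg_x = germany - argentina

# The two layers that are drawn separately in the chart
above = avg_x.clip(lower=0)
below = avg_x.clip(upper=0)
print(pd.DataFrame({"avg_x": avg_x, "above": above, "below": below}))
```

Splitting the series this way lets the two halves be drawn in different colors while their sum still equals the original difference.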

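What is interesting about the first half is that Germany largely dominated the game, so Argentina spent most of the time passing in its own half. A visualization may make this clearer, so let's look at Argentina's passing paths in the first half.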

draw_pitch()
draw_events(dfPeriod1[(dfPeriod1['type']==1) & (dfPeriod1['outcome']==1) & (dfPeriod1['team_name']=='Argentina')], mirror_away=True)
plt.text(x_size/4, -3, "Germany's defense", color='black', bbox=dict(facecolor='white', alpha=0.5), horizontalalignment='center')
plt.text(x_size*3/4, -3, "Argentina's defense", color='black', bbox=dict(facecolor='white', alpha=0.5), horizontalalignment='center')
plt.title("Argentina's passes during the first half")


dfPeriod1.groupby('team_name').agg({'x': np.mean, 'on_offense': np.mean})

dfPeriod1[dfPeriod1.type==1].groupby('team_name').agg({'outcome': np.mean})

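The analysis above shows that only about 28% of Argentina's passes were made while attacking, against 61% for Germany. Even while attacking, Germany kept a higher passing accuracy.
In terms of getting into the box and shooting, though, Germany did not have it that easy. As the chart produced below shows, few of Germany's many attempts to get into the box and shoot were effective.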

draw_pitch()
draw_events(df[(df['to_box']==True) & (df['type']==1) & (df['from_box']==False) & (df['period']==1) & (df['outcome']==1)], mirror_away=True)
draw_events(df[(df['to_box']==True) & (df['type']==1) & (df['from_box']==False) & (df['period']==1) & (df['outcome']==0)], mirror_away=True, alpha=0.2)
draw_events(dfP1Shots, mirror_away=True, base_color='#a93e3e')
plt.text(x_size/4, -3, "Germany's defense", color='black', bbox=dict(facecolor='white', alpha=0.5), horizontalalignment='center')
plt.text(x_size*3/4, -3, "Argentina's defense", color='black', bbox=dict(facecolor='white', alpha=0.5), horizontalalignment='center')


dfPeriod1[(dfPeriod1['to_box']==True) & (dfPeriod1['from_box']==False) & (dfPeriod1['type']==1)].groupby(['team_name']).agg({'outcome': np.mean,  'count': np.sum})

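step3. Analysis of Kramer

Around the 19th minute Kramer was injured, but the substitute only came on 12 minutes later. This stretch turned out to be Germany's worst spell of the first half, which can also be seen in the earlier chart.
Reports say that he acted confused. The data show that from the injury until the substitution he was essentially "non-functional": all he did was receive the ball once, make one pass, and lose the ball once.

dfKramer = df[df['player_name']=='Christoph Kramer'].copy() # copy so that the columns added below go to an independent DataFrame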
pd.pivot_table(dfKramer, values='count', index='type_name', columns='min', aggfunc=sum, fill_value=0)

dfKramer['action']=dfKramer['outcome'].map(str) + '-' + dfKramer['type_name']
dfKramer['action'].unique()

score = {'1-LINEUP': 0, '1-RUN WITH BALL': 0.5, '1-RECEPTION': 0, '1-PASS': 1, '0-PASS': -1,
       '0-TACKLE (NO CONTROL)': 0, '1-CLEAR BALL (OUT OF PITCH)': 0.5,
       '0-LOST CONTROL OF BALL': -1, '1-SUBSTITUTION (OFF)': 0}

dfKramer['score'] = dfKramer['action'].map(score.get)

dfKramer.groupby('min')['score'].sum().reindex(range(32), fill_value=0).plot(kind='bar')
plt.annotate('Injury', (19,0.5), (14,1.1), arrowprops=dict(facecolor='black'))
plt.annotate('Substitution', (31,0), (22,1.6), arrowprops=dict(facecolor='black'))
plt.gca().set_xlabel('minute')
plt.gca().set_ylabel('no. events')

step4. Second half

By contrast, the second half was much more evenly matched. Drawing the same chart as for the first half shows that the two sides' possession was indeed comparable.

fig = plt.figure(figsize=(12,4))

avg_x = (dfPeriod2[dfPeriod2['team_name']=='Germany'].groupby('min')['x'].mean() - 
         dfPeriod2[dfPeriod2['team_name']=='Argentina'].groupby('min')['x'].mean())

plt.stackplot(list(avg_x.index.values), list([x if x>0 else 0 for x in avg_x]))
plt.stackplot(list(avg_x.index.values), list([x if x<0 else 0 for x in avg_x]))

for i, shot in dfP2Shots.iterrows():
    x = shot['min']
    y = avg_x.loc[shot['min']]
    signal = 1 if shot['team_name']=='Germany' else -1
    plt.annotate(shot['type_name']+' ('+shot['team_name'][0]+")", xy=(x, y), xytext=(x-5,y+30*signal), arrowprops=dict(facecolor='black'))

plt.gca().set_xlabel('minute')
plt.title("Second Half Profile")


dfPeriod2.groupby('team_name').agg({'x': np.mean, 'on_offense': np.mean})

dfPeriod2[dfPeriod2['type']==1].groupby('team_name').agg({'outcome': np.mean})

draw_pitch()
draw_events(df[(df['to_box']==True) & (df['type']==1) & (df['from_box']==False) & (df['period']==2) & (df['outcome']==1)], mirror_away=True)
draw_events(df[(df['to_box']==True) & (df['type']==1) & (df['from_box']==False) & (df['period']==2) & (df['outcome']==0)], mirror_away=True, alpha=0.2)
draw_events(dfP2Shots, mirror_away=True, base_color='#a93e3e')
plt.text(x_size/4, -3, "Germany's defense", color='black', bbox=dict(facecolor='white', alpha=0.5), horizontalalignment='center')
plt.text(x_size*3/4, -3, "Argentina's defense", color='black', bbox=dict(facecolor='white', alpha=0.5), horizontalalignment='center')


dfPeriod2[(dfPeriod2['to_box']==True) & (dfPeriod2['from_box']==False) & (dfPeriod2['type']==1)].groupby(['team_name']).agg({'outcome': np.mean,  'count': np.sum})

step5. Extra time

fig = plt.figure(figsize=(12,4))

avg_x = (dfExtraTime[dfExtraTime['team_name']=='Germany'].groupby('min')['x'].mean() - 
         dfExtraTime[dfExtraTime['team_name']=='Argentina'].groupby('min')['x'].mean().reindex(dfExtraTime['min'].unique(), fill_value=0))

plt.stackplot(list(avg_x.index.values), list([x if x>0 else 0 for x in avg_x]))
plt.stackplot(list(avg_x.index.values), list([x if x<0 else 0 for x in avg_x]))

for i, shot in dfETShots.iterrows():
    x = shot['min']
    y = avg_x.loc[shot['min']]
    signal = 1 if shot['team_name']=='Germany' else -1
    plt.annotate(shot['type_name']+' ('+shot['team_name'][0]+")", xy=(x, y), xytext=(x-5,y+20*signal), arrowprops=dict(facecolor='black'))

plt.gca().set_xlabel('minute')
plt.title("Extra Time Profile")

df.groupby(['team_name', 'period']).agg({'count': np.sum, 'x': np.mean, 'on_offense': np.mean})

We find that Germany's period 4 is very different from the other periods: Germany clearly made fewer passes, trying to control the game and slow the tempo (a hint of running down the clock?). The data after Germany's goal shows this even more clearly.

goal_ix = df[df['type']==16].index[0]
df_after_shot = df.loc[goal_ix+1:]
df_after_shot.groupby(['team_name', 'period']).agg({'count': np.sum, 'x': np.mean, 'on_offense': np.mean})

draw_pitch()
draw_events(df_after_shot[(df_after_shot['to_box']==True) & (df_after_shot['type']==1) & (df_after_shot['from_box']==False) & (df_after_shot['outcome']==1)], mirror_away=True)
draw_events(df_after_shot[(df_after_shot['to_box']==True) & (df_after_shot['type']==1) & (df_after_shot['from_box']==False) & (df_after_shot['outcome']==0)], mirror_away=True, alpha=0.2)
draw_events(df_after_shot[df_after_shot['type'].isin([13,14,15,16])], mirror_away=True, base_color='#a93e3e')
plt.text(x_size/4, -3, "Germany's defense", color='black', bbox=dict(facecolor='white', alpha=0.5), horizontalalignment='center')
plt.text(x_size*3/4, -3, "Argentina's defense", color='black', bbox=dict(facecolor='white', alpha=0.5), horizontalalignment='center')


df_after_shot[df_after_shot['type'].isin([13,14,15,16])][['min', 'player_name', 'team_name', 'type_name']]

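Germany essentially stopped trying to shoot, and tried to pass the ball into the box only once. Their defensive strategy was very successful, however, to the point that Argentina could hardly get into their box. Both shots came from outside the box, and both were taken by Messi, who by then was probably feeling desperate.

step6. The goal

goal = int(df[df['type']==16].index[0])
dfGoal = df.loc[goal-30:goal].copy() # copy so that the progression column can be added below
#goal = np.where(df.type==16)[0][0]
#dfGoal = df.iloc[goal-30:goal+1]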
draw_pitch()
draw_events(dfGoal[dfGoal.team_name=='Germany'], base_color='white')
draw_events(dfGoal[dfGoal.team_name=='Argentina'], base_color='cyan')


#Germany's players involved in the play
dfGoal['progression']=dfGoal['to_x']-dfGoal['x']
dfGoal[dfGoal['type'].isin([1, 101, 16])][['player_name', 'type_name', 'progression']]

step7. Some basic statistics

#passing accuracy
df.groupby(['player_name', 'team_name']).agg({'count': np.sum, 'outcome': np.mean}).sort_values('count', ascending=False)

#shots
pd.pivot_table(df[df['type'].isin([13,14,15,16])],
               values='count',
               aggfunc=sum,
               index=['player_name', 'team_name'], 
               columns='type_name',
               fill_value=0,
               margins=True).sort_values('All', ascending=False)

#defensive play
pd.pivot_table(df[df['type'].isin([7, 8, 49])],
               values='count',
               aggfunc=np.sum,
               index=['player_name', 'team_name'], 
               columns='type_name',
               fill_value=0,
               margins=True).sort_values('All', ascending=False)
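How pivot_table with margins=True and a sort on the 'All' column fit together is easier to see on a toy table. In this sketch the players and the event type labels are hypothetical and do not come from the match data.

```python
import pandas as pd

# Made-up shot events; every row counts as one event
shots = pd.DataFrame({
    "player_name": ["A", "A", "B", "C", "C", "C"],           # hypothetical players
    "type_name": ["GOAL", "SHOT WIDE", "SHOT WIDE",
                  "SHOT SAVED", "SHOT WIDE", "SHOT SAVED"],   # hypothetical labels
    "count": 1,
})

# One row per player, one column per event type, plus 'All' totals
table = pd.pivot_table(shots, values="count", index="player_name",
                       columns="type_name", aggfunc="sum",
                       fill_value=0, margins=True)

# Rank by the row totals; the grand-total row 'All' comes first
ranked = table.sort_values("All", ascending=False)
print(ranked)
```

The margins add both a totals column and a totals row, so the totals row always sorts to the top when ranking by 'All' in descending order.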

Last edited: 2017.12.09 21:50:01

