大師兄's Python Machine Learning Notes: Statistics Basics with Low-Level Code Implementations (Part 1)


I. Measures of Central Tendency

1. Mode

Symbol: $M_0$
The value that occurs most often in a data set.
Pure Python implementation:

>>>def calculate_mode(data):
>>>    # Return the mode(s) as a list
>>>    data_set = set(data)
>>>    frequency_of_data = {}
>>>    for item in data_set:
>>>        frequency_of_data[item] = data.count(item)
>>>    # Take the highest count (the dict values), not the largest key
>>>    max_frequency = max(frequency_of_data.values())
>>>    result = [k for k,v in frequency_of_data.items() if v == max_frequency]
>>>    return result
>>>if __name__ == '__main__':
>>>    test_data = [1,2,3,4,1,2,3,1,2,1,5,6,1,1,2]
>>>    print('Mode:',calculate_mode(test_data))
Mode: [1]

Using NumPy:

>>>import numpy as np
>>>def descriptive_mode_numpy(data):
>>>    # Step 1: count how often each non-negative integer value occurs
>>>    frequency_counts = np.bincount(data)
>>>    # Step 2: return the value with the highest count
>>>    return np.argmax(frequency_counts)
>>>if __name__ == '__main__':
>>>    test_data = [1,2,3,4,1,2,3,1,2,1,5,6,1,1,2]
>>>    print('Mode:',descriptive_mode_numpy(test_data))
Mode: 1
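
The NumPy version above only accepts non-negative integers (a restriction of `bincount`), and `argmax` returns a single value even when several values tie for the highest count. As a minimal sketch, assuming the standard-library `collections.Counter` is acceptable here, the hypothetical helper `mode_with_counter` returns every tied mode and also works for strings or floats:

```python
from collections import Counter


def mode_with_counter(data):
    """Return all values that share the highest frequency.

    Hypothetical helper for illustration; not part of the original post.
    """
    counts = Counter(data)                  # value -> number of occurrences
    if not counts:
        return []                           # an empty data set has no mode
    max_frequency = max(counts.values())    # highest count, not highest key
    return [value for value, freq in counts.items() if freq == max_frequency]


if __name__ == '__main__':
    print(mode_with_counter([1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]))  # [1]
    print(mode_with_counter(['a', 'b', 'b', 'a', 'c']))                      # ['a', 'b']
```

Returning a list keeps the result consistent with `calculate_mode`, at the cost of the caller having to handle more than one mode.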

2. Median

The value in the middle position of a data set sorted in order.
Formula: $Q_\frac{1}{2}(x)=\begin{cases}x'_{\frac{n+1}{2}}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x'_{\frac{n}{2}}+x'_{\frac{n}{2}+1}\right), & \text{if } n \text{ is even}\end{cases}$
Pure Python implementation:

>>>def calculate_median(data):
>>>    # Return the median
>>>    data = sorted(data)  # sort a copy so the caller's list is not modified
>>>    length_of_data = len(data)
>>>    half_of_length = length_of_data // 2
>>>    if (length_of_data % 2) == 1:
>>>        result = data[half_of_length]
>>>    else:
>>>        result = (data[half_of_length] + data[half_of_length-1])/2
>>>    return result
>>>if __name__ == '__main__':
>>>    test_data = [1,2,3,4,1,2,3,1,2,1,5,6,1,1,2]
>>>    print('Median:',calculate_median(test_data))
Median: 2

Using NumPy:

>>>import numpy as np
>>>test_data = [1,2,3,4,1,2,3,1,2,1,5,6,1,1,2]
>>>median = np.median(test_data)
>>>print('Median:',median)
Median: 2.0

3. Quantile

The value at position $Q_n$ in a sorted data set.
Pure Python implementation:

>>>def calculate_quantile(data):
>>>    # Split the sorted data into lower half, upper half and median;
>>>    # the quartiles are then the medians of the two halves
>>>    data = sorted(data)
>>>    length_of_data = len(data)
>>>    quantile_of_length,rem = divmod(length_of_data,2) # quotient and remainder
>>>    if rem:
>>>        result = data[:quantile_of_length],data[quantile_of_length+1:],data[quantile_of_length]
>>>    else:
>>>        result = data[:quantile_of_length],data[quantile_of_length:],(data[quantile_of_length-1]+data[quantile_of_length])/2
>>>    return result
>>>if __name__ == '__main__':
>>>    test_data = [1,2,3,4,1,2,3,1,2,1,5,6,1,1,2]
>>>    l_half,r_half,q2 = calculate_quantile(test_data)
>>>    quantile_l = calculate_quantile(l_half)[2]
>>>    quantile_h = calculate_quantile(r_half)[2]
>>>    print('Lower quartile:',quantile_l)
>>>    print('Upper quartile:',quantile_h)
Lower quartile: 1
Upper quartile: 3

Using NumPy:

>>>import numpy as np
>>>test_data = [1,2,3,4,1,2,3,1,2,1,5,6,1,1,2]
>>># NumPy >= 1.22 names this argument method=; older versions call it interpolation=
>>>quantile = np.percentile(test_data,(25,75),method='midpoint')
>>>print('Lower quartile:', quantile[0])
>>>print('Upper quartile:', quantile[1])
Lower quartile: 1.0
Upper quartile: 3.0
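
Quartiles are only one case of quantiles, and different interpolation rules can give different answers on small samples. As a rough sketch, the hypothetical helper `percentile_linear` below computes an arbitrary percentile by interpolating linearly between the two nearest sorted values, which to my understanding is the rule NumPy applies by default; the function name and the convention that `p` is a fraction in [0, 1] are assumptions made for this example:

```python
def percentile_linear(data, p):
    """Percentile with linear interpolation between the two nearest ranks.

    p is a fraction in [0, 1]. Hypothetical helper, not from the original post.
    """
    if not data:
        raise ValueError('data must not be empty')
    if not 0 <= p <= 1:
        raise ValueError('p must be between 0 and 1')
    ordered = sorted(data)
    position = (len(ordered) - 1) * p       # fractional index into the sorted data
    lower = int(position)                   # index of the value at or just below
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower             # distance travelled towards the upper value
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


if __name__ == '__main__':
    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
    print('Lower quartile:', percentile_linear(test_data, 0.25))
    print('Upper quartile:', percentile_linear(test_data, 0.75))
```

On this data set the linear rule gives the same quartiles as the midpoint rule, because the neighbouring sorted values are equal at both positions.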

4. Mean

Symbols: $\mu$ (population mean), $\overline{x}$ (sample mean)

1) Arithmetic mean

Formula: $\overline{x}=\frac{\sum_{i=1}^nx_i}{n}$
Pure Python implementation:

>>>def calculate_mean(data):
>>>    # Compute the arithmetic mean
>>>    total = 0  # avoid shadowing the built-in sum()
>>>    for item in data:
>>>        total += float(item)
>>>    result = total/(len(data))
>>>    return result
>>>if __name__ == '__main__':
>>>    test_data = [1,2,3,4,1,2,3,1,2,1,5,6,1,1,2]
>>>    print('Mean:',calculate_mean(test_data))
Mean: 2.3333333333333335

Using NumPy:

>>>import numpy as np
>>>test_data = [1,2,3,4,1,2,3,1,2,1,5,6,1,1,2]
>>>mean = np.mean(test_data)
>>>print('Mean:', mean)
Mean: 2.3333333333333335

2) Weighted mean

In a weighted mean the data points do not contribute equally to the average: some points count more than others.
Formula: $\overline{x}=\frac{\sum_{i=1}^nw_ix_i}{\sum_{i=1}^nw_i}$
$w$: weight
Pure Python implementation:

>>>def calculate_weighted_mean(data):
>>>    # In this example the weight of each value is simply its index: 0, 1, ..., n-1
>>>    t = list(range(len(data)))
>>>    t_sum = sum(t)
>>>    result = 0
>>>    for i in range(len(data)):
>>>        result += data[i]*t[i]/t_sum
>>>    return result
>>>if __name__ == '__main__':
>>>    test_data = [1,2,3,4,1,2,3,1,2,1,5,6,1,1,2]
>>>    print('Weighted mean:',calculate_weighted_mean(test_data))
Weighted mean: 2.4095238095238094

Using NumPy:

>>>import numpy as np
>>>def calculate_weighted_mean_np(data):
>>>    # Same weights as above: the index of each value
>>>    t = np.arange(len(data))
>>>    result = np.average(data,weights=t)
>>>    return result
>>>if __name__ == '__main__':
>>>    test_data = [1,2,3,4,1,2,3,1,2,1,5,6,1,1,2]
>>>    print('Weighted mean:', calculate_weighted_mean_np(test_data))
Weighted mean: 2.4095238095238094
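
Both versions above derive the weights from each value's position in the list. In practice the weights usually come with the data, so here is a small sketch that takes them as a separate argument; the function name `weighted_mean` and the scores and credits used in the example are made up for illustration:

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i).

    Hypothetical helper; values and weights are parallel sequences.
    """
    if len(values) != len(weights):
        raise ValueError('values and weights must have the same length')
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError('weights must not sum to zero')
    return sum(w * x for x, w in zip(values, weights)) / total_weight


if __name__ == '__main__':
    scores = [90, 80, 70]    # made-up course scores
    credits = [3, 2, 1]      # made-up credit weights
    print('Weighted mean:', weighted_mean(scores, credits))  # (270 + 160 + 70) / 6
```

With equal weights the function reduces to the arithmetic mean, which is a quick way to sanity-check it.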

3) Geometric mean

The geometric mean indicates the central tendency or typical value of a set of numbers by using the product of their values.
Formula: $\overline{G}=\sqrt[n]{x_1x_2...x_n}$
Pure Python implementation:

>>>def calculate_geometric_mean(data):
>>>    # Multiply all values, then take the n-th root
>>>    product = 1
>>>    for item in data:
>>>        product = product*float(item)
>>>    result = product**(1/len(data))
>>>    return result
>>>if __name__ == '__main__':
>>>    test_data = [1,2,3,4,1,2,3,1,2,1,5,6,1,1,2]
>>>    print('Geometric mean:',calculate_geometric_mean(test_data))
Geometric mean: 1.916473929999829

Using NumPy:

>>>import numpy as np
>>>test_data = [1,2,3,4,1,2,3,1,2,1,5,6,1,1,2]
>>>geometric_mean = np.power(np.prod(test_data),1/len(test_data))
>>>print('Geometric mean:', geometric_mean)
Geometric mean: 1.916473929999829
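
Multiplying all values first can overflow (or underflow) when the data set is long or the values are large. A common workaround, sketched here under the assumption that every value is strictly positive, is to average the logarithms and exponentiate the result; `geometric_mean_log` is a hypothetical name chosen for this example:

```python
import math


def geometric_mean_log(data):
    """Geometric mean computed as exp(mean(log(x))).

    Hypothetical helper; requires every value to be strictly positive.
    """
    if not data:
        raise ValueError('data must not be empty')
    if any(x <= 0 for x in data):
        raise ValueError('all values must be positive')
    log_sum = math.fsum(math.log(x) for x in data)   # sum of logs replaces the product
    return math.exp(log_sum / len(data))


if __name__ == '__main__':
    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
    print('Geometric mean:', geometric_mean_log(test_data))
    # The plain product of these values would overflow a float
    print('Geometric mean:', geometric_mean_log([1e200] * 10))
```

The result may differ from the product-based version in the last few digits because the rounding errors accumulate differently.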

II. Measures of Dispersion

1. Variation ratio

Measures how well the mode represents a data set: the smaller the ratio, the more representative the mode.
Formula: $V_r = 1- \frac{f_m}{\sum{f_i}}$
$f_m$: frequency of the modal class
$\sum{f_i}$: total frequency
Implementation:

>>>import numpy as np
>>>def calculate_frequency_of_mode(data):
>>>    # Count each value and return the highest count
>>>    frequency_counts = np.bincount(data)
>>>    return frequency_counts[np.argmax(frequency_counts)]
>>>
>>>def calculate_variation_ratio(data):
>>>    # Frequency of the mode
>>>    frequency_of_mode = calculate_frequency_of_mode(data)
>>>    # Variation ratio
>>>    result = 1-(frequency_of_mode)/len(data)
>>>    return result
>>>if __name__ == '__main__':
>>>    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
>>>    print('Variation ratio:',calculate_variation_ratio(test_data))
Variation ratio: 0.6

2. Mean absolute deviation

The arithmetic mean of the absolute deviations of each value from the mean.
Formula: $M_d = \frac{\sum{\mid{x_i-\overline{x}\mid}}}{n}$
Implementation:

>>>import numpy as np
>>>def calculate_mean_absolute_deviation(data):
>>>    # Mean
>>>    mean = np.mean(data)
>>>    # Mean absolute deviation
>>>    result = sum([abs(x - mean) for x in data])/len(data)
>>>    return result
>>>if __name__ == '__main__':
>>>    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
>>>    print('Mean absolute deviation:',calculate_mean_absolute_deviation(test_data))
Mean absolute deviation: 1.2444444444444442

3. Variance

Describes how spread out the data are: the mean squared distance of the values from their expected value.
Population variance: $\sigma^2 =\frac{\sum{(x_i-\mu)^2}}{n}$
Sample variance: $s^2 = \frac{\sum{(x_i-\overline{x})^2}}{n-1}$
Manual implementation of the population variance (NumPy is used only for the mean):

>>>import numpy as np
>>>def calculate_variance(data):
>>>    # Mean
>>>    mean = np.mean(data)
>>>    # Population variance (divides by n)
>>>    result = sum([(x - mean)**2 for x in data])/len(data)
>>>    return result
>>>if __name__ == '__main__':
>>>    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
>>>    print('Variance:',calculate_variance(test_data))
Variance: 2.3555555555555556

Using NumPy:

>>>import numpy as np
>>>test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
>>>variance = np.var(test_data)  # population variance by default
>>>print('Variance:',variance)
Variance: 2.3555555555555556
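
The listings above compute the population variance only, while the section also gives the sample formula with $n-1$ in the denominator. As a minimal sketch, the hypothetical helper `variance_manual` below supports both and cross-checks the result against the standard-library `statistics` module; with NumPy the sample version would be `np.var(data, ddof=1)`:

```python
import statistics


def variance_manual(data, sample=False):
    """Population variance (divide by n) or sample variance (divide by n - 1).

    Hypothetical helper written for this example.
    """
    n = len(data)
    if n == 0 or (sample and n < 2):
        raise ValueError('not enough data points')
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return squared_deviations / (n - 1 if sample else n)


if __name__ == '__main__':
    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
    print('Population variance:', variance_manual(test_data))
    print('Sample variance:', variance_manual(test_data, sample=True))
    # Cross-check with the standard library
    print('statistics.pvariance:', statistics.pvariance(test_data))
    print('statistics.variance:', statistics.variance(test_data))
```

The sample variance is always slightly larger than the population variance for the same data, and the gap shrinks as $n$ grows.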

4. Standard deviation

The square root of the variance.
Formula: $\sigma = \sqrt{\sigma^2}$
Manual implementation (NumPy is used only for the mean):

>>>import numpy as np
>>>def calculate_variance(data):
>>>    # Mean
>>>    mean = np.mean(data)
>>>    # Population variance
>>>    result = sum([(x - mean)**2 for x in data])/len(data)
>>>    return result
>>>
>>>def calculate_standard_deviation(data):
>>>    # The standard deviation is the square root of the variance
>>>    result = calculate_variance(data)**(1/2)
>>>    return result
>>>
>>>if __name__ == '__main__':
>>>    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
>>>    print('Standard deviation:',calculate_standard_deviation(test_data))
Standard deviation: 1.5347819244295118

Using NumPy:

>>>import numpy as np
>>>test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
>>>standard_deviation = np.std(test_data)
>>>print('Standard deviation:',standard_deviation)
Standard deviation: 1.5347819244295118

5. Standard score (z-score)

Tells how many standard deviations a raw score lies from the population mean.
z is negative when the raw score is below the mean and positive when it is above.
Formula: $z = \frac{x_i-\mu}{\sigma}$
Implementation:

>>>import numpy as np
>>>def calculate_zscore(x,data):
>>>    # Mean
>>>    mean = np.mean(data)
>>>    # Standard deviation
>>>    std = np.std(data)
>>>    # z-score
>>>    result = (x-mean)/std
>>>    return result
>>>if __name__ == '__main__':
>>>    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
>>>    print('z-score:',calculate_zscore(test_data[0],test_data))
z-score: -0.8687444855261388
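
The function above scores a single value. A short sketch that standardizes the whole data set at once makes the defining property visible: the resulting z-scores have mean 0 and standard deviation 1. The helper name `standardize` is an assumption made for this example, and it uses the population standard deviation, as `np.std` does by default:

```python
import statistics


def standardize(data):
    """Return the z-score of every value in data.

    Hypothetical helper; uses the population standard deviation.
    """
    mean = sum(data) / len(data)
    std = statistics.pstdev(data)
    if std == 0:
        raise ValueError('z-scores are undefined when all values are equal')
    return [(x - mean) / std for x in data]


if __name__ == '__main__':
    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
    z_scores = standardize(test_data)
    print('First z-score:', z_scores[0])
    print('Mean of z-scores:', sum(z_scores) / len(z_scores))       # approximately 0
    print('Std of z-scores:', statistics.pstdev(z_scores))          # approximately 1
```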

6. Interquartile range

Like the variance and the standard deviation, it measures the spread of the data, but the interquartile range is a more robust statistic.
Formula: $Q_d = Q_U - Q_L = Q_3 - Q_1$
Implementation:

>>>import numpy as np
>>>def calculate_iqr(data):
>>>    # Lower and upper quartiles
>>>    # NumPy >= 1.22 names this argument method=; older versions call it interpolation=
>>>    Q_L = np.quantile(data,0.25,method='lower')
>>>    Q_U = np.quantile(data,0.75,method='higher')
>>>    result = Q_U - Q_L
>>>    return result
>>>if __name__ == '__main__':
>>>    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
>>>    print('Interquartile range:',calculate_iqr(test_data))
Interquartile range: 2

7. Coefficient of variation

A normalized measure of the dispersion of a probability distribution.
It is defined only when the mean is non-zero, and is generally used when the mean is positive.
Formula: $c_v = \frac{\sigma}{\mid\mu\mid}$

>>>import numpy as np
>>>def calculate_coefficient_of_variation(data):
>>>    # Standard deviation
>>>    std = np.std(data)
>>>    # Mean
>>>    mean = np.mean(data)
>>>    # Coefficient of variation
>>>    result = std/abs(mean)
>>>    return result
>>>if __name__ == '__main__':
>>>    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
>>>    print('Coefficient of variation:',calculate_coefficient_of_variation(test_data))
Coefficient of variation: 0.6577636818983621

III. Shape of the Distribution

1. Skewness

A measure of the symmetry of a distribution.
Skewness formula: $sk = \frac{\mu_3}{\sigma^3} = \frac{E[(X - \mu)^3]}{(E[(X - \mu)^2])^\frac{3}{2}}$
$0.5<\mid sk\mid<1$: moderately skewed
$sk=0$: symmetric, no skew
$sk>0$: right-skewed; $sk<0$: left-skewed
Python implementation (the result differs from pandas, which applies a sample bias correction):

>>>import numpy as np
>>>def calculate_skewness(data):
>>>    l = len(data)
>>>    # Mean
>>>    mean = np.mean(data)
>>>    # Third raw moment E[X^3]
>>>    raw_moment_3 = sum([x**3 for x in data])/l
>>>    # Standard deviation
>>>    std = np.std(data)
>>>    # Skewness, using E[(X-mu)^3] = E[X^3] - 3*mu*sigma^2 - mu^3
>>>    result = (raw_moment_3 - 3*mean*std**2-mean**3)/std**3
>>>    return result
>>>if __name__ == '__main__':
>>>    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
>>>    print('Skewness:',calculate_skewness(test_data))
Skewness: 1.0900284582544935

Using pandas:

>>>import pandas as pd
>>>test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
>>>test_data = pd.Series(test_data)
>>>skewness = test_data.skew()
>>>print('Skewness:',skewness)
Skewness: 1.2150779271256849
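
The gap between the two skewness values above can be explained by a sample-size correction. The sketch below assumes that pandas reports the adjusted Fisher-Pearson coefficient $G_1 = g_1\frac{\sqrt{n(n-1)}}{n-2}$, where $g_1$ is the population skewness from the formula above; that assumption is consistent with the numbers, since applying the factor to 1.0900 gives 1.2151. Both function names are hypothetical:

```python
import math


def population_skewness(data):
    """g1 = m3 / m2**1.5, with central moments divided by n."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5


def adjusted_skewness(data):
    """Bias-corrected G1 = g1 * sqrt(n * (n - 1)) / (n - 2); needs n >= 3."""
    n = len(data)
    if n < 3:
        raise ValueError('at least three data points are required')
    return population_skewness(data) * math.sqrt(n * (n - 1)) / (n - 2)


if __name__ == '__main__':
    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
    print('Population skewness:', population_skewness(test_data))
    print('Adjusted skewness:', adjusted_skewness(test_data))
```

The correction factor approaches 1 as $n$ grows, so the two definitions only disagree noticeably on small samples such as this one.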

2. Kurtosis

A measure of how peaked or flat a distribution is.
Formula: $K=\frac{V_4}{\sigma^4} = \frac{E[(X-\mu)^4]}{(E[(X-\mu)^2])^2}$
With this formula a normal distribution has $K=3$, so its excess kurtosis $K-3$ is 0.
$K-3>0$: peaked (leptokurtic) distribution.
$K-3<0$: flat (platykurtic) distribution.
Python implementation:

>>>import numpy as np
>>>def calculate_kurtosis(data):
>>>    l = len(data)
>>>    # Mean
>>>    mean = np.mean(data)
>>>    # Standard deviation
>>>    std = np.std(data)
>>>    # Fourth central moment
>>>    v4 = sum([((x - mean) ** 4)/l for x in data])
>>>    # Kurtosis (not the excess kurtosis: 3 is not subtracted)
>>>    result = v4/(std**4)
>>>    return result
>>>
>>>if __name__ == '__main__':
>>>    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
>>>    print('Kurtosis:',calculate_kurtosis(test_data))
Kurtosis: 3.105197579209683

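Using pandas (the value differs substantially from the one above because `kurt()` returns the excess kurtosis, for which a normal distribution scores 0, and applies a sample bias correction):

>>>import pandas as pd
>>>test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
>>>test_data = pd.Series(test_data)
>>>kurtosis = test_data.kurt()
>>>print(kurtosis)
0.6895144727113385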
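
The two kurtosis values above can be reconciled numerically. The sketch below assumes that pandas reports the bias-corrected excess kurtosis $G_2 = \frac{n-1}{(n-2)(n-3)}\left[(n+1)(K-3)+6\right]$, where $K$ is the ratio computed by `calculate_kurtosis`; the assumption fits the numbers, since $K = 3.1052$ and $n = 15$ give 0.6895. Both function names are hypothetical:

```python
def population_kurtosis(data):
    """K = m4 / m2**2 with central moments divided by n (normal distribution: 3)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2


def adjusted_excess_kurtosis(data):
    """Bias-corrected excess kurtosis G2; needs n >= 4 (normal distribution: 0)."""
    n = len(data)
    if n < 4:
        raise ValueError('at least four data points are required')
    excess = population_kurtosis(data) - 3        # subtract the normal baseline
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * excess + 6)


if __name__ == '__main__':
    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
    print('Population kurtosis:', population_kurtosis(test_data))
    print('Adjusted excess kurtosis:', adjusted_excess_kurtosis(test_data))
```

Most of the difference comes from subtracting 3; the bias correction accounts for the rest.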

IV. Characteristics of the Data Distribution

1. Chebyshev's inequality

Possibly very few measurements fall within 1 standard deviation of the mean (the inequality gives no guarantee for $k=1$).
At least 3/4 (75%) of the data lie within 2 standard deviations of the mean.
At least 8/9 (88.9%) of the data lie within 3 standard deviations of the mean.
At least 24/25 (96%) of the data lie within 5 standard deviations of the mean.
In general, for any $k>1$, at least $(1-1/k^2)$ of the measurements fall within $k$ standard deviations of the mean.
Formula: $P(\mid X - E(X)\mid \geq b) \leq \frac{Var(X)}{b^2}$
Implementation:

>>>import numpy as np
>>>def calculate_range(mean,std,k):
>>>    # Closed interval [mean - k*std, mean + k*std], kept as floats
>>>    return mean - k*std, mean + k*std
>>>
>>>def calculate_data_distribution_characteristics(data):
>>>    std = np.std(data)
>>>    mean = np.mean(data)
>>>    # d1[k]: number of items whose smallest enclosing interval is k standard deviations
>>>    d1 = dict()
>>>    for item in data:
>>>        k = 1
>>>        while True:
>>>            low,high = calculate_range(mean, std, k)
>>>            if low <= item <= high:
>>>                d1[k] = d1.get(k, 0) + 1
>>>                break
>>>            k += 1
>>>    # Accumulate the counts so that result[k] covers everything within k standard deviations
>>>    result = {}
>>>    cumulative = 0
>>>    for k in sorted(d1):
>>>        cumulative += d1[k]
>>>        result[k] = 'within {} std: {:.0%}'.format(k, cumulative / len(data))
>>>    return result
>>>
>>>if __name__ == '__main__':
>>>    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
>>>    print(calculate_data_distribution_characteristics(test_data))
{1: 'within 1 std: 80%', 2: 'within 2 std: 93%', 3: 'within 3 std: 100%'}
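
Chebyshev's inequality is a lower bound, so it is instructive to put the bound next to the share actually observed in a sample. The following sketch does that with the standard library only; the function name `chebyshev_check` and the default choice of $k$ values are assumptions made for this example:

```python
import statistics


def chebyshev_check(data, ks=(2, 3, 5)):
    """Map each k to (observed share within k std, Chebyshev lower bound).

    Hypothetical helper; uses the population standard deviation.
    """
    mean = sum(data) / len(data)
    std = statistics.pstdev(data)
    report = {}
    for k in ks:
        # Count the values strictly closer than k standard deviations to the mean
        within = sum(1 for x in data if abs(x - mean) < k * std)
        report[k] = (within / len(data), 1 - 1 / k ** 2)
    return report


if __name__ == '__main__':
    test_data = [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 5, 6, 1, 1, 2]
    for k, (observed, bound) in chebyshev_check(test_data).items():
        print('k={}: observed {:.1%}, guaranteed at least {:.1%}'.format(k, observed, bound))
```

The observed share can never fall below the bound, whatever the shape of the data; it is usually well above it.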

2. Empirical rule

Condition: the data are symmetrically distributed (approximately bell-shaped).
About 68% of the measurements lie within 1 standard deviation of the mean.
About 95% of the measurements lie within 2 standard deviations of the mean.
Almost all measurements lie within 3 standard deviations of the mean.

>>>import numpy as np
>>>def calculate_range(mean,std,k):
>>>    # Closed interval [mean - k*std, mean + k*std], kept as floats
>>>    return mean - k*std, mean + k*std
>>>
>>>def calculate_data_distribution_characteristics(data):
>>>    std = np.std(data)
>>>    mean = np.mean(data)
>>>    # d1[k]: number of items whose smallest enclosing interval is k standard deviations
>>>    d1 = dict()
>>>    for item in data:
>>>        k = 1
>>>        while True:
>>>            low,high = calculate_range(mean, std, k)
>>>            if low <= item <= high:
>>>                d1[k] = d1.get(k, 0) + 1
>>>                break
>>>            k += 1
>>>    # Accumulate the counts so that result[k] covers everything within k standard deviations
>>>    result = {}
>>>    cumulative = 0
>>>    for k in sorted(d1):
>>>        cumulative += d1[k]
>>>        result[k] = 'within {} std: {:.0%}'.format(k, cumulative / len(data))
>>>    return result
>>>
>>>if __name__ == '__main__':
>>>    # This sample is right-skewed, so the shares deviate from 68% / 95%
>>>    test_data_e = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,5]
>>>    print(calculate_data_distribution_characteristics(test_data_e))
{1: 'within 1 std: 47%', 2: 'within 2 std: 93%', 3: 'within 3 std: 100%'}
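
The percentages quoted by the empirical rule are those of a normal distribution. As a small sketch, they can be reproduced from the error function in the standard library, since for a normal variable $P(\mid X-\mu\mid < k\sigma) = \mathrm{erf}(k/\sqrt{2})$; the function name `normal_coverage` is an assumption made for this example:

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean.

    Hypothetical helper written for this example.
    """
    return math.erf(k / math.sqrt(2))


if __name__ == '__main__':
    for k in (1, 2, 3):
        # Compare the normal-distribution share with the distribution-free Chebyshev bound
        chebyshev_bound = max(0.0, 1 - 1 / k ** 2)
        print('k={}: normal {:.2%}, Chebyshev bound {:.2%}'.format(
            k, normal_coverage(k), chebyshev_bound))
```

The normal-distribution shares are much higher than the Chebyshev bounds because the empirical rule assumes a specific shape, whereas Chebyshev's inequality holds for any distribution.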


Author: 大師兄 (superkmi)

Last edited: 2020.02.27 19:19:27
