We need the numpy library (and scipy for the statistical tests)
import numpy as np
import scipy.stats as stats
import scipy.optimize as opt
First, let's create two arrays to use as test data
n = 200
norm_dist = stats.norm(loc=0.5, scale=10) # build a normal distribution with mean 0.5 and standard deviation 10 (the standard deviation is the square root of the variance)
dat = norm_dist.rvs(size=n) # draw 200 random samples
print ("mean of data is: " + str(np.mean(dat)))
print ("median of data is: " + str(np.median(dat)))
print ("standard deviation of data is: " + str(np.std(dat))) # the 200 points are drawn at random, so the sample statistics may differ somewhat from the parameters of the underlying normal distribution
norm_dist2 = stats.norm(loc=0.2, scale=1)
dat2 = norm_dist2.rvs(size=n//2) # draw 100 random samples (integer division, since size must be an int in Python 3)
print ("mean of data is: " + str(np.mean(dat2)))
print ("median of data is: " + str(np.median(dat2)))
print ("standard deviation of data is: " + str(np.std(dat2)))
Analyzing the difference between the two arrays: the two-sample t-test
stat_val, p_val = stats.ttest_ind(dat, dat2, equal_var=False)
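# check whether the means of the two distributions differ significantly
# note: the second sample differs from the first in both size and variance, so the t-test here should be Welch's t-test,
# which is selected by passing equal_var=False to ttest_ind.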
print ('Two-sample t-statistic t = %6.3f, p-value = %6.4f' % (stat_val, p_val))
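To see what equal_var=False actually does, here is a minimal sketch that computes Welch's t statistic by hand and compares it with ttest_ind. The helper name welch_t, the fixed random seed, and the sample arrays a and b are assumptions made for illustration; they are not part of the original example.

```python
import numpy as np
import scipy.stats as stats

# Fixed seed so the sketch is reproducible (an assumption, not in the original).
rng = np.random.default_rng(0)
a = rng.normal(loc=0.5, scale=10, size=200)
b = rng.normal(loc=0.2, scale=1, size=100)

def welch_t(a, b):
    """Hypothetical helper: return (t, df, p) for Welch's unequal-variance t-test."""
    # Squared standard error of each sample mean, using the unbiased variance.
    va = np.var(a, ddof=1) / len(a)
    vb = np.var(b, ddof=1) / len(b)
    t = (np.mean(a) - np.mean(b)) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Two-sided p-value from the t distribution.
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p

t_manual, df, p_manual = welch_t(a, b)
t_scipy, p_scipy = stats.ttest_ind(a, b, equal_var=False)

print('manual: t = %6.3f, df = %6.1f, p = %6.4f' % (t_manual, df, p_manual))
print('scipy:  t = %6.3f, p = %6.4f' % (t_scipy, p_scipy))
```

The two lines should agree. Note that the degrees of freedom come out well below n1 + n2 - 2 when the variances are this different, which is the reason the pooled-variance test is not appropriate here.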
Computing the correlation between two sequences and testing its significance
import scipy.stats as stats
x = [76,81,78,76,76,78,76,78,98,88,76,66,44,67,65,59,87,77,79,85,68,76,77,98,99,98,87,67,78]
y = [43,33,23,34,31,51,56,43,44,45,32,33,28,39,31,38,21,27,43,46,41,41,48,56,55,45,68,54,33]
r, p=stats.pearsonr(x,y)
[out]:(0.39341862097439129, 0.034735931329532836)
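The correlation coefficient is 0.39, which indicates a moderate positive correlation between the two sequences
The p-value is 0.035, which is below 0.05, so the correlation is statistically significant at the 5% level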
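As a cross-check on what pearsonr reports, the following sketch recomputes the coefficient directly from its definition and derives the p-value from the t distribution with n - 2 degrees of freedom. The variable names r_manual, t_stat and p_manual are assumptions for illustration, and the t-based p-value assumes the usual null hypothesis of no correlation between normally distributed data.

```python
import numpy as np
import scipy.stats as stats

x = [76,81,78,76,76,78,76,78,98,88,76,66,44,67,65,59,87,77,79,85,68,76,77,98,99,98,87,67,78]
y = [43,33,23,34,31,51,56,43,44,45,32,33,28,39,31,38,21,27,43,46,41,41,48,56,55,45,68,54,33]

xa = np.asarray(x, dtype=float)
ya = np.asarray(y, dtype=float)

# Pearson r: covariance of the centered data divided by the product of their norms.
xc = xa - xa.mean()
yc = ya - ya.mean()
r_manual = np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))

# Under the null hypothesis, t = r * sqrt((n - 2) / (1 - r^2)) follows
# a t distribution with n - 2 degrees of freedom.
n = len(xa)
t_stat = r_manual * np.sqrt((n - 2) / (1 - r_manual ** 2))
p_manual = 2 * stats.t.sf(abs(t_stat), n - 2)

r_scipy, p_scipy = stats.pearsonr(x, y)
print('manual: r = %6.4f, p = %6.4f' % (r_manual, p_manual))
print('scipy:  r = %6.4f, p = %6.4f' % (r_scipy, p_scipy))
```

Both approaches should give the same numbers, which makes it clear that the significance test depends only on r and the sample size.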