The use case is very simple: you have paired data and need to test whether the two groups differ.
It takes two steps:
1. Test for normality
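from scipy import stats

# Check whether the data are normally distributed (Shapiro-Wilk test)
def norm_test(data):
    t, p = stats.shapiro(data)
    # print(t, p)
    # p >= 0.05 means normality is not rejected
    if p >= 0.05:
        return True
    else:
        return False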
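To see how the check behaves, here is a small sketch that runs it on two toy samples. The samples are made up for illustration (evenly spaced quantiles of a normal and of an exponential distribution), and the `alpha` parameter is an addition that the original function does not have.

```python
import numpy as np
from scipy import stats

def norm_test(data, alpha=0.05):
    """Return True when Shapiro-Wilk does not reject normality at level alpha."""
    _, p = stats.shapiro(data)
    return bool(p >= alpha)

# Hypothetical, deterministic samples: evenly spaced quantiles of a normal
# distribution and of an exponential (strongly right-skewed) distribution.
probs = np.linspace(0.01, 0.99, 50)
bell_shaped = stats.norm.ppf(probs)
skewed = stats.expon.ppf(probs)

print("bell-shaped sample passes:", norm_test(bell_shaped))
print("skewed sample passes:", norm_test(skewed))
```

The bell-shaped sample should pass the check and the skewed one should fail it, which is exactly the split that decides which test gets used in the next step.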
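2. Depending on the result of the normality test, choose either the paired-samples t-test or the Wilcoxon test. The goal is to get the test statistic and the p-value. For how to choose a method, see https://segmentfault.com/a/1190000007626742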
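from scipy.stats import ttest_rel, wilcoxon

# data_b and data_p are the two paired samples (same length)
if norm_test(data_b) and norm_test(data_p):
    print('yes')  # both samples look normal: paired-samples t-test
    t, p = ttest_rel(list(data_b), list(data_p))
else:
    print('no')   # otherwise: Wilcoxon signed-rank test
    t, p = wilcoxon(list(data_b), list(data_p), zero_method='wilcox', correction=False)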
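The snippet above relies on `data_b` and `data_p` already existing. A self-contained sketch of the same selection logic could look like this; the function name `paired_test`, the `alpha` parameter and the toy data are assumptions made for illustration, not part of the original code.

```python
import numpy as np
from scipy import stats

def norm_test(data, alpha=0.05):
    """Return True when Shapiro-Wilk does not reject normality at level alpha."""
    _, p = stats.shapiro(data)
    return bool(p >= alpha)

def paired_test(data_b, data_p, alpha=0.05):
    """Choose the paired test from the normality check.

    Returns (test_name, statistic, p_value).
    """
    b = np.asarray(data_b, dtype=float)
    a = np.asarray(data_p, dtype=float)
    if len(b) != len(a):
        raise ValueError("paired samples must have the same length")
    if norm_test(b, alpha) and norm_test(a, alpha):
        stat, p = stats.ttest_rel(b, a)
        return "ttest_rel", float(stat), float(p)
    stat, p = stats.wilcoxon(b, a, zero_method="wilcox", correction=False)
    return "wilcoxon", float(stat), float(p)

# Hypothetical paired samples built from distribution quantiles.
probs = np.linspace(0.01, 0.99, 50)
normal_b = stats.norm.ppf(probs)
normal_p = 1.1 * normal_b + 0.5      # shifted and rescaled, still bell-shaped
skewed_b = stats.expon.ppf(probs)
skewed_p = 2.0 * skewed_b            # skewed in both groups

name_1, stat_1, p_1 = paired_test(normal_b, normal_p)
name_2, stat_2, p_2 = paired_test(skewed_b, skewed_p)
print(name_1, stat_1, p_1)
print(name_2, stat_2, p_2)
```

Returning the name of the test along with the numbers makes it easy to record which branch was taken, instead of printing 'yes' or 'no'.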
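There is one pitfall to watch out for here.
The wilcoxon function that ships with scipy does not return the z statistic and the p-value. It returns a rank sum (the smaller of the positive and negative rank sums) and the p-value. So you need to find the source of wilcoxon, which is at: Lib\site-packages\scipy\stats\morestats.py
Open the morestats file and change what the function returns to z and the p-value, like this: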
def wilcoxon(x, y=None, zero_method="wilcox", correction=False):
    """
    Calculate the Wilcoxon signed-rank test.

    The Wilcoxon signed-rank test tests the null hypothesis that two
    related paired samples come from the same distribution. In particular,
    it tests whether the distribution of the differences x - y is symmetric
    about zero. It is a non-parametric version of the paired T-test.

    Parameters
    ----------
    x : array_like
        The first set of measurements.
    y : array_like, optional
        The second set of measurements. If `y` is not given, then the `x`
        array is considered to be the differences between the two sets of
        measurements.
    zero_method : string, {"pratt", "wilcox", "zsplit"}, optional
        "pratt":
            Pratt treatment: includes zero-differences in the ranking process
            (more conservative)
        "wilcox":
            Wilcox treatment: discards all zero-differences
        "zsplit":
            Zero rank split: just like Pratt, but splitting the zero rank
            between positive and negative ones
    correction : bool, optional
        If True, apply continuity correction by adjusting the Wilcoxon rank
        statistic by 0.5 towards the mean value when computing the
        z-statistic. Default is False.

    Returns
    -------
    statistic : float
        The sum of the ranks of the differences above or below zero, whichever
        is smaller.
    pvalue : float
        The two-sided p-value for the test.

    Notes
    -----
    Because the normal approximation is used for the calculations, the
    samples used should be large. A typical rule is to require that
    n > 20.

    References
    ----------
    .. [1] http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test
    """
    if zero_method not in ["wilcox", "pratt", "zsplit"]:
        raise ValueError("Zero method should be either 'wilcox' "
                         "or 'pratt' or 'zsplit'")

    if y is None:
        d = asarray(x)
    else:
        x, y = map(asarray, (x, y))
        if len(x) != len(y):
            raise ValueError('Unequal N in wilcoxon. Aborting.')
        d = x - y

    if zero_method == "wilcox":
        # Keep all non-zero differences
        d = compress(np.not_equal(d, 0), d, axis=-1)

    count = len(d)
    if count < 10:
        warnings.warn("Warning: sample size too small for normal approximation.")

    r = stats.rankdata(abs(d))
    r_plus = np.sum((d > 0) * r, axis=0)
    r_minus = np.sum((d < 0) * r, axis=0)

    if zero_method == "zsplit":
        r_zero = np.sum((d == 0) * r, axis=0)
        r_plus += r_zero / 2.
        r_minus += r_zero / 2.

    T = min(r_plus, r_minus)
    mn = count * (count + 1.) * 0.25
    se = count * (count + 1.) * (2. * count + 1.)

    if zero_method == "pratt":
        r = r[d != 0]

    replist, repnum = find_repeats(r)
    if repnum.size != 0:
        # Correction for repeated elements.
        se -= 0.5 * (repnum * (repnum * repnum - 1)).sum()

    se = sqrt(se / 24)
    correction = 0.5 * int(bool(correction)) * np.sign(T - mn)
    z = (T - mn - correction) / se
    prob = 2. * distributions.norm.sf(abs(z))

    # Modified here: return z instead of the rank sum T
    return WilcoxonResult(z, prob)
After that you can happily use this tool~