Paired-sample difference analysis with Python

The use case is simple: given paired data, test whether the two groups differ.
The analysis takes two steps:
1. Test for normality

from scipy import stats
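## Check whether the data are normally distributed (Shapiro-Wilk test)
def norm_test(data):
    # stats.shapiro returns the W statistic and the p-value
    w, p = stats.shapiro(data)
    # p >= 0.05: normality is not rejected at the 5% level
    return p >= 0.05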
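To see how the normality check behaves, here is a minimal sketch that runs it on synthetic data. The `alpha` parameter, the random seed, and the two generated samples are assumptions made for illustration only; they are not part of the original analysis.

```python
import numpy as np
from scipy import stats


def norm_test(data, alpha=0.05):
    """Return True when the Shapiro-Wilk test does not reject normality."""
    _, p = stats.shapiro(data)
    return bool(p >= alpha)


# Synthetic samples (assumed for illustration): one normal, one strongly skewed
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=10.0, scale=2.0, size=200)
skewed_sample = rng.exponential(scale=2.0, size=200)

print("normal sample passes:", norm_test(normal_sample))
print("skewed sample passes:", norm_test(skewed_sample))
```

A passing result only means normality was not rejected at the chosen level; it does not prove the data are normal, especially for small samples.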

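2. Depending on the result of the normality test, choose either the paired-sample t-test or the Wilcoxon signed-rank test. The goal is to obtain the test statistic and the p-value. For guidance on choosing the method, see https://segmentfault.com/a/1190000007626742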

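# data_b and data_p are the two paired samples (equal length, same order)
if norm_test(data_b) and norm_test(data_p):
    print('yes')  # both samples look normal: paired t-test
    t, p = stats.ttest_rel(list(data_b), list(data_p))
else:
    print('no')   # otherwise: Wilcoxon signed-rank test
    t, p = stats.wilcoxon(list(data_b), list(data_p), zero_method='wilcox', correction=False)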
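The two steps can be combined into a single helper. The following is a sketch under stated assumptions: the function name `paired_difference_test`, its `alpha` parameter, and the synthetic `data_b` / `data_p` arrays are hypothetical and only stand in for real paired measurements.

```python
import numpy as np
from scipy import stats


def paired_difference_test(before, after, alpha=0.05):
    """Choose between the paired t-test and the Wilcoxon signed-rank test.

    Returns (method_name, statistic, p_value).
    """
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    if before.shape != after.shape:
        raise ValueError("paired samples must have the same length")

    # Step 1: Shapiro-Wilk on each sample, as in the text above
    both_normal = all(stats.shapiro(x)[1] >= alpha for x in (before, after))

    # Step 2: pick the test according to the normality result
    if both_normal:
        stat, p = stats.ttest_rel(before, after)
        return "ttest_rel", float(stat), float(p)
    stat, p = stats.wilcoxon(before, after, zero_method="wilcox", correction=False)
    return "wilcoxon", float(stat), float(p)


# Synthetic paired data (assumed): the second measurement is shifted by about 2
rng = np.random.default_rng(1)
data_b = rng.normal(50.0, 5.0, size=30)
data_p = data_b + rng.normal(2.0, 1.0, size=30)

method, stat, p = paired_difference_test(data_b, data_p)
print(method, stat, p)
```

Returning the method name alongside the statistic makes it clear which test produced the numbers, since a t statistic and a Wilcoxon rank sum are not comparable.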

There is one pitfall to watch out for here.

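The wilcoxon function that ships with scipy does not return the z statistic and the p-value. It returns a rank sum (per its docstring, the smaller of the sums of ranks of the differences above or below zero) together with the p-value. To get z, locate the source of wilcoxon at: Lib\site-packages\scipy\stats\morestats.py
Open morestats.py and change the function's return value to z and the p-value, as follows: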

def wilcoxon(x, y=None, zero_method="wilcox", correction=False):
    """
    Calculate the Wilcoxon signed-rank test.

    The Wilcoxon signed-rank test tests the null hypothesis that two
    related paired samples come from the same distribution. In particular,
    it tests whether the distribution of the differences x - y is symmetric
    about zero. It is a non-parametric version of the paired T-test.

    Parameters
    ----------
    x : array_like
        The first set of measurements.
    y : array_like, optional
        The second set of measurements.  If `y` is not given, then the `x`
        array is considered to be the differences between the two sets of
        measurements.
    zero_method : string, {"pratt", "wilcox", "zsplit"}, optional
        "pratt":
            Pratt treatment: includes zero-differences in the ranking process
            (more conservative)
        "wilcox":
            Wilcox treatment: discards all zero-differences
        "zsplit":
            Zero rank split: just like Pratt, but spliting the zero rank
            between positive and negative ones
    correction : bool, optional
        If True, apply continuity correction by adjusting the Wilcoxon rank
        statistic by 0.5 towards the mean value when computing the
        z-statistic.  Default is False.

    Returns
    -------
    statistic : float
        The z-statistic of the normal approximation (modified here; the
        stock function returns the sum of the ranks of the differences
        above or below zero, whichever is smaller).
    pvalue : float
        The two-sided p-value for the test.

    Notes
    -----
    Because the normal approximation is used for the calculations, the
    samples used should be large.  A typical rule is to require that
    n > 20.

    References
    ----------
    .. [1] http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test

    """

    if zero_method not in ["wilcox", "pratt", "zsplit"]:
        raise ValueError("Zero method should be either 'wilcox' "
                         "or 'pratt' or 'zsplit'")

    if y is None:
        d = asarray(x)
    else:
        x, y = map(asarray, (x, y))
        if len(x) != len(y):
            raise ValueError('Unequal N in wilcoxon.  Aborting.')
        d = x - y

    if zero_method == "wilcox":
        # Keep all non-zero differences
        d = compress(np.not_equal(d, 0), d, axis=-1)

    count = len(d)
    if count < 10:
        warnings.warn("Warning: sample size too small for normal approximation.")

    r = stats.rankdata(abs(d))
    r_plus = np.sum((d > 0) * r, axis=0)
    r_minus = np.sum((d < 0) * r, axis=0)

    if zero_method == "zsplit":
        r_zero = np.sum((d == 0) * r, axis=0)
        r_plus += r_zero / 2.
        r_minus += r_zero / 2.

    T = min(r_plus, r_minus)
    mn = count * (count + 1.) * 0.25
    se = count * (count + 1.) * (2. * count + 1.)

    if zero_method == "pratt":
        r = r[d != 0]

    replist, repnum = find_repeats(r)
    if repnum.size != 0:
        # Correction for repeated elements.
        se -= 0.5 * (repnum * (repnum * repnum - 1)).sum()

    se = sqrt(se / 24)
    correction = 0.5 * int(bool(correction)) * np.sign(T - mn)
    z = (T - mn - correction) / se
    prob = 2. * distributions.norm.sf(abs(z))
    # Modified: return the z-statistic instead of the rank sum T
    return WilcoxonResult(z, prob)

After that, you can happily use this tool.
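If editing the installed package is not an option, the same z value can be computed in user code. The sketch below is an assumption-laden alternative, not the author's method: the helper name `wilcoxon_z` is hypothetical, it covers only the "wilcox" zero handling without continuity correction, and it mirrors the arithmetic of the listing above using public NumPy and SciPy calls.

```python
import numpy as np
from scipy import stats


def wilcoxon_z(x, y):
    """Normal-approximation z and two-sided p for the Wilcoxon signed-rank test."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = d[d != 0]                      # "wilcox": discard zero differences
    n = d.size
    if n == 0:
        raise ValueError("all paired differences are zero")

    r = stats.rankdata(np.abs(d))      # ranks of the absolute differences
    r_plus = r[d > 0].sum()
    r_minus = r[d < 0].sum()
    T = min(r_plus, r_minus)           # the statistic stock SciPy reports

    mn = n * (n + 1) / 4.0             # mean of T under the null hypothesis
    se = n * (n + 1) * (2 * n + 1)
    # Tie correction, matching the find_repeats step in the listing
    _, counts = np.unique(r, return_counts=True)
    ties = counts[counts > 1]
    se -= 0.5 * (ties * (ties ** 2 - 1)).sum()
    se = np.sqrt(se / 24.0)

    z = (T - mn) / se
    p = 2.0 * stats.norm.sf(abs(z))
    return float(z), float(p)


# Hypothetical paired data: ten distinct positive differences
z, p = wilcoxon_z(np.arange(1, 11), np.zeros(10))
print(z, p)
```

Because T is the smaller of the two rank sums, the z computed this way is never positive; its sign does not indicate the direction of the difference.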

Last edited: 2018.08.09 14:21:20
