前面探索了如果對數據進行了數據分布檢驗勋拟，發(fā)現符合正態(tài)分布勋磕，并且數據方法齊性，那么就可以采用參數檢驗的方法進行統(tǒng)計檢驗敢靡。如果數據的分布不符合正態(tài)分布或者方差齊性挂滓，那么就要采用非參數檢驗。

學了好久啸胧，參數檢驗和非參數檢驗當中赶站，參數是什么意思，參數（parameters）指什么吓揪，而非參數檢驗又是怎么把參數去掉亲怠。
參數檢驗主要就是指通過對樣本數據的統(tǒng)計參數，比如均值柠辞、方差、中位數等這些數值的分析主胧，來估計總體參數叭首，然后比較總體之間是否有差異习勤。而非參數檢驗就是不去管這些參數，只是通過了解數據的分布狀況焙格，來判斷總體是否具有差異性图毕。

3.非參數檢驗

還是貫徹以實例促進認知的個人學習理論，直接用例子來開始眷唉。
這次使用使用survminer包中的數據集myeloma先加載包

library(survminer)
data("myeloma")
head(myeloma)[1:3,1:11]

查看這個數據結果

 colnames(myeloma)
 [1] "molecular_group" "chr1q21_status"  "treatment"       "event"           "time"           
 [6] "CCND1"           "CRIM1"           "DEPDC1"          "IRF4"            "TP53"           
[11] "WHSC1"

可以看到行是樣本予颤，11列中6到11列是基因名稱，前面5列是其他信息冬阳，我們想要看看DEPDC1基因在幾個分組中的表達情況蛤虐。

3.1 正態(tài)性和方差齊性檢驗

在分析數據之前，尤其在比較差異之前肝陪，先對數據的分布進行檢查驳庭。在統(tǒng)計方法選擇（1）中寫到的正態(tài)分布采用shapiro.test()函數，方差齊性檢驗使用 bartlett.test()函數

> shapiro.test(myeloma$DEPDC1)    

    Shapiro-Wilk normality test

data:  myeloma$DEPDC1
W = 0.84077, p-value = 1.606e-15

> #方差齊性檢驗：用bartlett.test()或者leveneTest()
> bartlett.test(DEPDC1~molecular_group,data = myeloma)   #巴雷特檢驗

    Bartlett test of homogeneity of variances

data:  DEPDC1 by molecular_group
Bartlett's K-squared = 89.032, df = 6, p-value < 2.2e-16

> library(car)
> leveneTest(DEPDC1~molecular_group,data = myeloma)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value    Pr(>F)    
group   6  6.9894 6.954e-07 ***
      249                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>

正態(tài)性檢驗和方差齊性檢驗氯窍，只要數據正態(tài)分布不齊或者方差不齊那么就應該選擇非參數檢驗的方法饲常。

3.2 兩組之間非參數檢驗

兩組之間非參數檢驗通常選擇wilcox.test方法，在統(tǒng)計方法的選擇（2）--參數檢驗使用stat_compare_means()和compare_means()兩個函數時發(fā)現其默認的方法就是非參數檢驗的方法狼讨，兩者比較使用wilcox.test贝淤，多組全局比較使用kruskal.test 。

針對上面的數據政供，我們先進性兩兩比較

table(myeloma$molecular_group)
compare_means(DEPDC1~molecular_group, data = myeloma, method = "wilcox.test")

結果如下

   Cyclin D-1       Cyclin D-2     Hyperdiploid Low bone disease              MAF            MMSET 
              22               43               66               31               21               44 
   Proliferation 
              29 

# A tibble: 21 x 8
   .y.    group1     group2                 p p.adj p.format p.signif method  
   <chr>  <chr>      <chr>              <dbl> <dbl> <chr>    <chr>    <chr>   
 1 DEPDC1 Cyclin D-1 Cyclin D-2       0.267   1     0.2667   ns       Wilcoxon
 2 DEPDC1 Cyclin D-1 Hyperdiploid     0.00356 0.053 0.0036   **       Wilcoxon
 3 DEPDC1 Cyclin D-1 Low bone disease 0.00521 0.065 0.0052   **       Wilcoxon
 4 DEPDC1 Cyclin D-1 MAF              0.819   1     0.8193   ns       Wilcoxon
 5 DEPDC1 Cyclin D-1 MMSET            0.681   1     0.6805   ns       Wilcoxon
 6 DEPDC1 Cyclin D-1 Proliferation    0.00465 0.065 0.0046   **       Wilcoxon
 7 DEPDC1 Cyclin D-2 Hyperdiploid     0.00219 0.035 0.0022   **       Wilcoxon
 8 DEPDC1 Cyclin D-2 Low bone disease 0.00474 0.065 0.0047   **       Wilcoxon
 9 DEPDC1 Cyclin D-2 MAF              0.563   1     0.5625   ns       Wilcoxon
10 DEPDC1 Cyclin D-2 MMSET            0.502   1     0.5016   ns       Wilcoxon
# ... with 11 more rows

因為總共分了7組播聪，所以兩兩組合，就會很多鲫骗，這個時候犬耻，其實可以將這幾個組和DEPDC1的平均值進行比較，函數只需要加入一個參數执泰，在統(tǒng)計方法的選擇（2）--參數檢驗最后也談及到枕磁。
方法如下：

compare_means(DEPDC1~molecular_group, data = myeloma, 
              ref.group = ".all.", 
              method = "wilcox.test")

結果：

# A tibble: 7 x 8
  .y.    group1 group2                     p     p.adj p.format p.signif method  
  <chr>  <chr>  <chr>                  <dbl>     <dbl> <chr>    <chr>    <chr>   
1 DEPDC1 .all.  Cyclin D-1       0.234       0.94      0.23417  ns       Wilcoxon
2 DEPDC1 .all.  Cyclin D-2       0.790       1         0.78957  ns       Wilcoxon
3 DEPDC1 .all.  Hyperdiploid     0.000639    0.0038    0.00064  ***      Wilcoxon
4 DEPDC1 .all.  Low bone disease 0.00457     0.023     0.00457  **       Wilcoxon
5 DEPDC1 .all.  MAF              0.399       1         0.39923  ns       Wilcoxon
6 DEPDC1 .all.  MMSET            0.383       1         0.38321  ns       Wilcoxon
7 DEPDC1 .all.  Proliferation    0.000000155 0.0000011 1.5e-07  ****     Wilcoxon

可以看到兩兩比較的差異的結果及各組與平均值之間的比較。

3.2 多組之間的差異比較

其實按照上面的思路术吝，結合上一篇计济，多組之間只是在函數中選擇相應的方法。將上面的進行可視化排苍，并選用kruskal.test方法對多組進行比較沦寂，代碼如下

ggboxplot(myeloma, x="molecular_group", y="DEPDC1", 
          color = "molecular_group", add = "jitter", legend="none")+ 
  rotate_x_text(angle = 45)+ 
  geom_hline(yintercept = mean(myeloma$DEPDC1,na.rm = T), linetype=2)+# Add horizontal line at base mean 
  stat_compare_means(method = "kruskal.test",
                     label.y = 2000
                     )+ # Add global kruskal.test p-value 
  stat_compare_means(label = "p.signif", method = "wilcox.test", ref.group = ".all.")# Pairwise comparison against all

根據統(tǒng)計方法選擇的那幅圖，只剩下一個問題淘衙，就是多組比較之后传藏，如果有差異，那么兩兩比較如何進行。還是上一篇中提到的毯侦，compare_means()和stat_compare_means()這兩個函數默認的會對兩兩之間進行差異比較哭靖，更嚴格的來講，這樣其實會增加犯Ⅰ類錯誤的概率侈离，所以试幽，統(tǒng)計方面的專家又進行了事后兩兩比較方法的研究。

寫在后面的話：這一篇其實就是上一篇很簡單的延續(xù)卦碾，之所以獨立出來铺坞，就是為了區(qū)分參數檢驗和非參數檢驗，而這一篇當中用到的函數以及作圖所用的函數都是compare_means()和stat_compare_means()這兩個函數洲胖。其實針對非參數檢驗济榨，有專門的兩個函數kruskal.test()和wilcox.test(),只不過整合到這個stat_compare_means()函數中，在函數參數選擇上選擇相應的函數就好宾濒。

參考文章及書籍

R語言統(tǒng)計分析與機器學習-薛震
R語言統(tǒng)計分析與應用-汪海波
白話統(tǒng)計-馮國雙
一張圖說明統(tǒng)計方法的選擇
 R語言中方差齊性檢驗丨數析學院
 方差分析與R實現
 統(tǒng)計方法如何選以及全代碼作圖實現

統(tǒng)計方法的選擇（3）--非參數檢驗