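After a GWAS analysis, for a significant SNP we often want to visualize the phenotype of each genotype class in the population. The most popular approach today is a box plot plus scatter points plus significance marks, as in a figure like the following:
In the figure above, A, B, and C are three genotypes, for example AA, AT, and TT. Genotype A is significantly lower than B and C, while B and C do not differ significantly. The figure combines a box plot, scatter points, and significance marks, which makes it very intuitive: a picture is worth a thousand words.
Beyond the need above, there are other plots we can draw. They are summarized below:
This section produces the following plots:
「Single-factor, two-level t-test box plot」
「Single-factor, three-level t-test box plot」
「Single-factor, three-level bar plot」
「Single-factor, three-level line plot」
「Two-factor bar plot」
「Two-factor line plot」
- Single factor, two levels
An example of this kind of trial: comparing plant height between two varieties, with 10 plants measured per variety.
「Simulated data:」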
set.seed(123)
y1 = rnorm(10) + 5
y2 = rnorm(10) + 15
dd = data.frame(Group = rep(c("A","B"),each=10),y = c(y1,y2))
dd
str(dd)
dd$Group = as.factor(dd$Group)
「Data:」
> dd
Group y
1 A 4.439524
2 A 4.769823
3 A 6.558708
4 A 5.070508
5 A 5.129288
6 A 6.715065
7 A 5.460916
8 A 3.734939
9 A 4.313147
10 A 4.554338
11 B 16.224082
12 B 15.359814
13 B 15.400771
14 B 15.110683
15 B 14.444159
16 B 16.786913
17 B 15.497850
18 B 13.033383
19 B 15.701356
20 B 14.527209
Here we use the ggpubr package for plotting:
1.1 Draw a box plot
library(ggplot2)
library(ggpubr)
ggboxplot(dd,x = "Group",y = "y")
1.2 Color the boxes by group
ggboxplot(dd,x = "Group",y = "y",color = "Group")
1.3 Add scatter points to the box plot
ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter")
1.4 Box plot + scatter points + significance level
With two groups, stat_compare_means() defaults to the non-parametric Wilcoxon test. To use t.test instead, see the next step.
ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter") +
stat_compare_means()
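The p-value printed on the plot can be cross-checked in base R. The following is a minimal sketch, assuming the same simulated data as above and assuming that the two-group default corresponds to wilcox.test() with its default arguments:

```r
# Rebuild the two-level data from the simulation above
set.seed(123)
y1 <- rnorm(10) + 5
y2 <- rnorm(10) + 15
dd <- data.frame(Group = factor(rep(c("A", "B"), each = 10)), y = c(y1, y2))

# Wilcoxon rank-sum test; the p-value is exact here because the
# samples are small and contain no ties
res <- wilcox.test(y ~ Group, data = dd)
res$p.value

# Every A value lies below every B value, so the exact two-sided p-value
# is the smallest one possible for 10 vs 10 observations
2 / choose(20, 10)
```

Because the two groups do not overlap at all, the rank test cannot report a smaller p-value no matter how far apart the groups are. This is one reason to consider t.test for data like these.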
1.5 Use t.test as the test method
ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter") +
stat_compare_means(method = "t.test")
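For reference, here is a hedged base-R sketch of the same comparison. It assumes that method = "t.test" uses t.test() with its defaults, in which case the result is Welch's unequal-variance test rather than the classical pooled-variance test:

```r
# Same simulated two-level data as above
set.seed(123)
y1 <- rnorm(10) + 5
y2 <- rnorm(10) + 15
dd <- data.frame(Group = factor(rep(c("A", "B"), each = 10)), y = c(y1, y2))

# Welch's t-test: var.equal = FALSE is the default of t.test()
welch <- t.test(y ~ Group, data = dd)

# Classical Student's t-test with a pooled variance
pooled <- t.test(y ~ Group, data = dd, var.equal = TRUE)

# Compare p-values and degrees of freedom of the two variants
c(welch = welch$p.value, pooled = pooled$p.value)
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))
```

If the pooled-variance version is wanted on the plot, stat_compare_means() accepts a method.args list, for example method.args = list(var.equal = TRUE).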
1.6 Show significance symbols directly
ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter") +
stat_compare_means(method = "t.test",label = "p.signif")
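The symbols drawn by label = "p.signif" follow fixed p-value cutpoints. The sketch below reproduces the defaults documented for the symnum.args argument of stat_compare_means(). The helper name p_to_symbol is hypothetical and not part of ggpubr, and its handling of values exactly on a cutpoint may differ from the package:

```r
# Hypothetical helper (not a ggpubr function) mapping p-values to the
# symbols documented as the stat_compare_means() defaults
p_to_symbol <- function(p) {
  cutpoints <- c(0, 1e-04, 0.001, 0.01, 0.05, Inf)
  symbols <- c("****", "***", "**", "*", "ns")
  as.character(cut(p, breaks = cutpoints, labels = symbols,
                   include.lowest = TRUE))
}

# One example p-value per significance class
p_to_symbol(c(0.00001, 0.0005, 0.005, 0.03, 0.2))
```

So "ns" means p > 0.05, and each additional star marks the next stricter threshold.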
- Single factor, three levels
Two levels can be compared with a t-test. How should data with three or more levels be tested?
「Simulated data:」
# Build three levels for ANOVA
set.seed(123)
y1 = rnorm(10) + 5
y2 = rnorm(10) + 15
y3 = rnorm(10) + 15
dd = data.frame(Group = rep(c("A","B","C"),each=10),y = c(y1,y2,y3))
dd
str(dd)
dd$Group = as.factor(dd$Group)
「The data:」
> dd
Group y
1 A 4.439524
2 A 4.769823
3 A 6.558708
4 A 5.070508
5 A 5.129288
6 A 6.715065
7 A 5.460916
8 A 3.734939
9 A 4.313147
10 A 4.554338
11 B 16.224082
12 B 15.359814
13 B 15.400771
14 B 15.110683
15 B 14.444159
16 B 16.786913
17 B 15.497850
18 B 13.033383
19 B 15.701356
20 B 14.527209
21 C 13.932176
22 C 14.782025
23 C 13.973996
24 C 14.271109
25 C 14.374961
26 C 13.313307
27 C 15.837787
28 C 15.153373
29 C 13.861863
30 C 16.253815
2.1 Box plot + scatter points
p = ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter")
p
2.2 Box plot + scatter points + significance
p + stat_compare_means(method = "anova")
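The global p-value shown by method = "anova" is that of a one-way ANOVA. As a rough cross-check, assuming the plot value corresponds to a classical one-way ANOVA, the same model can be fitted in base R with aov():

```r
# Same simulated three-level data as above
set.seed(123)
y1 <- rnorm(10) + 5
y2 <- rnorm(10) + 15
y3 <- rnorm(10) + 15
dd <- data.frame(Group = factor(rep(c("A", "B", "C"), each = 10)),
                 y = c(y1, y2, y3))

# One-way ANOVA of y on Group
fit <- aov(y ~ Group, data = dd)
tab <- summary(fit)[[1]]
tab

# Global p-value for the Group effect (first row of the ANOVA table)
p_anova <- tab[["Pr(>F)"]][1]
p_anova
```

A significant global test only says that at least one group mean differs. It does not say which one, which is what the pairwise comparisons in the next step address.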
2.3 Pairwise significance
my_comparisons = list( c("A", "B"), c("A", "C"), c("B", "C") )
p + stat_compare_means(comparisons = my_comparisons,
# label = "p.signif",
method = "t.test")
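With more levels, writing the comparison list by hand gets tedious, and the number of tests grows quickly. The sketch below builds the same list with combn() and computes Holm-adjusted pairwise Welch t-tests in base R. Holm is used purely as an illustration, and it is an assumption here that the bracket p-values on the plot are not adjusted for multiple testing:

```r
# Same simulated three-level data as above
set.seed(123)
y1 <- rnorm(10) + 5
y2 <- rnorm(10) + 15
y3 <- rnorm(10) + 15
dd <- data.frame(Group = factor(rep(c("A", "B", "C"), each = 10)),
                 y = c(y1, y2, y3))

# All pairs of levels, as a list of length-2 character vectors
my_comparisons <- combn(levels(dd$Group), 2, simplify = FALSE)
my_comparisons

# Pairwise Welch t-tests with Holm adjustment for multiple testing
pw <- pairwise.t.test(dd$y, dd$Group, pool.sd = FALSE,
                      p.adjust.method = "holm")
pw$p.value
```

For these simulated data, A differs clearly from B and C, while B and C (both simulated around 15) are not significantly different.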
2.4 Show significance symbols
p + stat_compare_means(comparisons = my_comparisons,
label = "p.signif",
method = "t.test")
- Two-factor data
「Simulated data:」
# Two-factor data
set.seed(123)
y1 = rnorm(10) + 5
y2 = rnorm(10) + 8
y3 = rnorm(10) + 7
y4 = rnorm(10) + 15
y5 = rnorm(10) + 18
y6 = rnorm(10) + 17
dd = data.frame(Group1 = rep(c("A","B","C"),each=10),
Group2 = rep(c("X","Y"),each=30),
y = c(y1,y2,y3,y4,y5,y6))
dd
str(dd)
dd$Group1 = as.factor(dd$Group1)
dd$Group2 = as.factor(dd$Group2)
str(dd)
「Data preview:」
> dd
Group1 Group2 y
1 A X 4.439524
2 A X 4.769823
3 A X 6.558708
4 A X 5.070508
5 A X 5.129288
6 A X 6.715065
7 A X 5.460916
8 A X 3.734939
9 A X 4.313147
10 A X 4.554338
11 B X 9.224082
12 B X 8.359814
13 B X 8.400771
14 B X 8.110683
15 B X 7.444159
16 B X 9.786913
17 B X 8.497850
18 B X 6.033383
19 B X 8.701356
20 B X 7.527209
21 C X 5.932176
22 C X 6.782025
23 C X 5.973996
24 C X 6.271109
25 C X 6.374961
26 C X 5.313307
27 C X 7.837787
28 C X 7.153373
29 C X 5.861863
30 C X 8.253815
31 A Y 15.426464
32 A Y 14.704929
33 A Y 15.895126
34 A Y 15.878133
35 A Y 15.821581
36 A Y 15.688640
37 A Y 15.553918
38 A Y 14.938088
39 A Y 14.694037
40 A Y 14.619529
41 B Y 17.305293
42 B Y 17.792083
43 B Y 16.734604
44 B Y 20.168956
45 B Y 19.207962
46 B Y 16.876891
47 B Y 17.597115
48 B Y 17.533345
49 B Y 18.779965
50 B Y 17.916631
51 C Y 17.253319
52 C Y 16.971453
53 C Y 16.957130
54 C Y 18.368602
55 C Y 16.774229
56 C Y 18.516471
57 C Y 15.451247
58 C Y 17.584614
59 C Y 17.123854
60 C Y 17.215942
3.1 Draw a grouped box plot
p = ggboxplot(dd,x = "Group1",y="y",color = "Group2",
add = "jitter")
p
3.2 Add p-values
p + stat_compare_means(aes(group = Group2),method = "t.test")
3.3 Switch to significance symbols
p + stat_compare_means(aes(group = Group2),method = "t.test",label = "p.signif")
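With aes(group = Group2), one test is run at each level of Group1, comparing X against Y. A minimal base-R sketch of that idea, assuming default Welch t-tests, splits the data by Group1 and tests within each piece:

```r
# Same simulated two-factor data as above
set.seed(123)
y <- c(rnorm(10) + 5, rnorm(10) + 8, rnorm(10) + 7,
       rnorm(10) + 15, rnorm(10) + 18, rnorm(10) + 17)
dd <- data.frame(Group1 = factor(rep(c("A", "B", "C"), each = 10, times = 2)),
                 Group2 = factor(rep(c("X", "Y"), each = 30)),
                 y = y)

# One X-vs-Y t-test per level of Group1
p_by_level <- sapply(split(dd, dd$Group1),
                     function(d) t.test(y ~ Group2, data = d)$p.value)
p_by_level
```

Each element of p_by_level corresponds to one label above a pair of boxes in the grouped plot.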
3.4 Plot the groups in separate panels
p = ggboxplot(dd,x = "Group2",y="y",color = "Group1",
add = "jitter",facet.by = "Group1")
p
3.5 Show the test result in each panel
p + stat_compare_means(method = "t.test")
3.6 Show significance symbols in each panel
p + stat_compare_means(method = "t.test",label = "p.signif",label.y = 17)
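- Single-factor bar plot
A bar plot with standard-error bars used to take a lot of ggplot2 code. ggpubr offers a shorter route. The examples from here on reuse the two-factor dd from section 3 and my_comparisons from 2.3, so plotting against Group1 alone pools the X and Y levels of Group2.
4.1 Bar plot + standard error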
p = ggbarplot(dd,x = "Group1",y = "y",add = "mean_se",color = "Group1")
p
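To see what add = "mean_se" draws, the bar heights and error bars can be computed by hand. This sketch assumes the standard error is sd / sqrt(n) within each Group1 level, with the X and Y levels of Group2 pooled as in the plot call above:

```r
# Same simulated two-factor data as above
set.seed(123)
y <- c(rnorm(10) + 5, rnorm(10) + 8, rnorm(10) + 7,
       rnorm(10) + 15, rnorm(10) + 18, rnorm(10) + 17)
dd <- data.frame(Group1 = factor(rep(c("A", "B", "C"), each = 10, times = 2)),
                 Group2 = factor(rep(c("X", "Y"), each = 30)),
                 y = y)

# Per-group mean, sample size, and standard error of the mean
grp_mean <- tapply(dd$y, dd$Group1, mean)
grp_n <- tapply(dd$y, dd$Group1, length)
grp_se <- tapply(dd$y, dd$Group1, sd) / sqrt(grp_n)

data.frame(Group1 = names(grp_mean),
           mean = as.vector(grp_mean),
           se = as.vector(grp_se),
           n = as.vector(grp_n))
```

Each Group1 level holds 20 values here (10 from X and 10 from Y), so the error bars are wide: they reflect the large X-versus-Y difference as well as the within-cell noise.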
4.2 Bar plot + standard error + significance
p + stat_compare_means(method = "anova", label.y = 15) +
  stat_compare_means(comparisons = my_comparisons)
- Single-factor line plot
5.1 Line plot + standard error
p = ggline(dd,x = "Group1",y = "y",add = "mean_se")
p
5.2 Line plot + standard error + significance
p + stat_compare_means(method = "anova", label.y = 15) +
  stat_compare_means(comparisons = my_comparisons)
- Two-factor bar plot
6.1 Bar plot + standard error
p = ggbarplot(dd,x = "Group1",y = "y",add = "mean_se",color = "Group2", position = position_dodge(0.8))
p
6.2 Bar plot + standard error + significance
p + stat_compare_means(aes(group=Group2), label = "p.signif")
- Two-factor line plot
7.1 Line plot + standard error
p = ggline(dd,x = "Group1",y = "y",add = "mean_se",color = "Group2", position = position_dodge(0.8))
p
7.2 Line plot + standard error + significance
p + stat_compare_means(aes(group=Group2), label = "p.signif")
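- Full code
The code below collects everything above: it generates the data and draws each type of plot. Once your own data are arranged in the same long format (one grouping column per factor plus one value column), the plots follow directly, which makes this the simplest and quickest route for beginners. Show you the code!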
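Field data often arrive in wide format, with one column per group. The sketch below shows one possible way to reshape such a table into the Group/y layout used throughout, using base R's stack(). The wide table is hypothetical and simply reuses the first three simulated values of groups A and B:

```r
# Hypothetical wide table: one column per group
wide <- data.frame(A = c(4.439524, 4.769823, 6.558708),
                   B = c(16.224082, 15.359814, 15.400771))

# stack() returns the columns "values" and "ind" (a factor of column names)
long <- stack(wide)
names(long) <- c("y", "Group")

# Reorder the columns to match the layout used in this post
long <- long[, c("Group", "y")]
long
str(long)
```

The resulting data frame can be passed to ggboxplot(long, x = "Group", y = "y") in the same way as the simulated dd.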
# Build two levels for the t-test
set.seed(123)
y1 = rnorm(10) + 5
y2 = rnorm(10) + 15
dd = data.frame(Group = rep(c("A","B"),each=10),y = c(y1,y2))
dd
str(dd)
dd$Group = as.factor(dd$Group)
library(ggplot2)
library(ggpubr)
ggboxplot(dd,x = "Group",y = "y")
ggboxplot(dd,x = "Group",y = "y",color = "Group")
ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter")
ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter") +
stat_compare_means()
ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter") +
stat_compare_means(method = "t.test")
ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter") +
stat_compare_means(method = "t.test",label = "p.signif")
# Build three levels for ANOVA
set.seed(123)
y1 = rnorm(10) + 5
y2 = rnorm(10) + 15
y3 = rnorm(10) + 15
dd = data.frame(Group = rep(c("A","B","C"),each=10),y = c(y1,y2,y3))
dd
str(dd)
dd$Group = as.factor(dd$Group)
p = ggboxplot(dd,x = "Group",y = "y",color = "Group",add = "jitter")
p
p + stat_compare_means(method = "anova")
# Perform pairwise comparisons
# compare_means(y ~ Group, data = dd,method = "anova")
my_comparisons = list( c("A", "B"), c("A", "C"), c("B", "C") )
p + stat_compare_means(comparisons = my_comparisons,
# label = "p.signif",
method = "t.test")
p + stat_compare_means(comparisons = my_comparisons,
label = "p.signif",
method = "t.test")
# Two-factor data
set.seed(123)
y1 = rnorm(10) + 5
y2 = rnorm(10) + 8
y3 = rnorm(10) + 7
y4 = rnorm(10) + 15
y5 = rnorm(10) + 18
y6 = rnorm(10) + 17
dd = data.frame(Group1 = rep(c("A","B","C"),each=10),
Group2 = rep(c("X","Y"),each=30),
y = c(y1,y2,y3,y4,y5,y6))
dd
str(dd)
dd$Group1 = as.factor(dd$Group1)
dd$Group2 = as.factor(dd$Group2)
str(dd)
## Grouped view: Group2 within each Group1 level
p = ggboxplot(dd,x = "Group1",y="y",color = "Group2",
add = "jitter")
p
p + stat_compare_means(aes(group = Group2),method = "t.test")
p + stat_compare_means(aes(group = Group2),method = "t.test",label = "p.signif")
## Faceted view: one panel per Group1 level
p = ggboxplot(dd,x = "Group2",y="y",color = "Group1",
add = "jitter",facet.by = "Group1")
p
p + stat_compare_means(method = "t.test")
p + stat_compare_means(method = "t.test",label = "p.signif",label.y = 17)
# Single grouping factor
# Three-level bar plot
p = ggbarplot(dd,x = "Group1",y = "y",add = "mean_se",color = "Group1")
p
p + stat_compare_means(method = "anova", label.y = 15) +
  stat_compare_means(comparisons = my_comparisons)
# Line plot with error bars
p = ggline(dd,x = "Group1",y = "y",add = "mean_se")
p
p + stat_compare_means(method = "anova", label.y = 15) +
  stat_compare_means(comparisons = my_comparisons)
# Two grouping factors
p = ggbarplot(dd,x = "Group1",y = "y",add = "mean_se",color = "Group2", position = position_dodge(0.8))
p
p + stat_compare_means(aes(group=Group2), label = "p.signif")
# Line plot with error bars
p = ggline(dd,x = "Group1",y = "y",add = "mean_se",color = "Group2", position = position_dodge(0.8))
p
p + stat_compare_means(aes(group=Group2), label = "p.signif")
This article is quoted from 育种数据分析之放飞自我.