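Pophelper: a basic introduction
A routine step in downstream population-genetic analysis is the display of population structure (STRUCTURE analysis). Unlike a phylogenetic tree or a PCA, a structure analysis can infer the number of subpopulations, the extent of gene flow between them, and even the ancestry composition of each subpopulation or individual.
Commonly used programs for population structure are STRUCTURE, ADMIXTURE and fastStructure. STRUCTURE is the classic program for this analysis, but it runs slowly. ADMIXTURE and fastStructure are more recent and, being considerably faster, have already been widely cited.
Population structure is usually visualized as stacked bar plots, and Pophelper is a powerful R package built for exactly that purpose.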
- Homepage: [http://www.royfrancis.com/pophelper/]
- Web app (written with Shiny): [https://roymf.shinyapps.io/structure/]
[Figure: basic workflow of the software]
Usage
1. Installation
R version >3.5
# install the dependency packages
install.packages(c("devtools","ggplot2","gridExtra","gtable","label.switching","tidyr"),dependencies=T)
# install pophelper package from GitHub
devtools::install_github('royfrancis/pophelper')
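A quick way to confirm that the installation worked is to ask R whether the package namespace can be loaded. This is a minimal sketch using only base R functions; the variable name has_pophelper is just an illustration.

```r
# Check whether pophelper can be loaded, without attaching it.
has_pophelper <- requireNamespace("pophelper", quietly = TRUE)

if (has_pophelper) {
  message("pophelper version: ",
          as.character(utils::packageVersion("pophelper")))
} else {
  message("pophelper is not installed yet; rerun the install step above.")
}
```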
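2. Reading files
Pophelper accepts output files from STRUCTURE, ADMIXTURE, fastStructure, TESS and other programs. I am most familiar with ADMIXTURE and fastStructure, both of which write their results as Q and P matrix files (.Q/.P for ADMIXTURE, .meanQ/.meanP for fastStructure).
- Example ADMIXTURE result files: [https://github.com/royfrancis/pophelper/tree/master/inst/files/admixture]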
library(pophelper)
options(stringsAsFactors = F)
dir.create("pophelper_learning")
setwd("pophelper_learning/")
### INPUT STRUCTURE RESULT FILES
sfiles <- list.files(path=system.file("files/structure",package="pophelper"), full.names=T)
slist <- readQ(files=sfiles)
### INPUT ADMIXTURE RESULT FILES
alist <- readQ(list.files(path=system.file("files/admixture",package="pophelper"), full.names=T))
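To make the file format concrete, here is a minimal base-R sketch (not pophelper's own code) of what an ADMIXTURE-style Q file looks like: one row per individual, one whitespace-separated column per cluster, and each row summing to 1. The values and the file name toy.2.Q are made up for illustration.

```r
# Toy ADMIXTURE-style Q file: 4 individuals, K = 2 (made-up values).
q_lines <- c("0.90 0.10",
             "0.75 0.25",
             "0.20 0.80",
             "0.05 0.95")
q_file <- file.path(tempdir(), "toy.2.Q")   # hypothetical file name
writeLines(q_lines, q_file)

# Read it back as a plain numeric table and name the columns by cluster.
q <- read.table(q_file, header = FALSE)
colnames(q) <- paste0("Cluster", seq_len(ncol(q)))

k <- ncol(q)                                # number of clusters (K)
rows_ok <- all(abs(rowSums(q) - 1) < 1e-6)  # ancestry proportions sum to 1
print(q)
```

With pophelper installed, readQ(files=q_file) should be able to read the same file, returning a list with one data frame per input file.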
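3. Plotting the best-K curve
In Pophelper, evannoMethodStructure() can plot the best-K curve for STRUCTURE results only. The procedure has three steps:
- tabulateQ(): takes the list of STRUCTURE runs read in by readQ()
- summariseQ(): takes the result returned by tabulateQ()
- evannoMethodStructure(): takes the result returned by summariseQ() and plots the best-K curve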
tbq <- tabulateQ(slist)
smq <- summariseQ(tbq)
# pass the summariseQ() result; recent pophelper versions may also need exportpath=getwd()
evannoMethodStructure(data=smq,exportplot=T,returnplot=T,returndata=F,basesize=12,linesize=0.7,height = 10,width = 12,outputfilename = "test")
[Figure: resulting best-K plot]
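The quantity behind the best-K plot is Evanno's delta K. The base-R sketch below computes it from per-K summary statistics using the usual definitions: the first difference L'(K), the absolute second difference |L''(K)|, and delta K = |L''(K)| / sd(L(K)). It is not taken from pophelper's source; the log-likelihood values are toy numbers and the helper name evanno_delta_k is hypothetical.

```r
# Toy summary of STRUCTURE runs: mean and sd of the estimated
# log-likelihood over replicate runs at each K (made-up numbers).
runs <- data.frame(
  k       = 1:5,
  ll_mean = c(-5000, -4200, -3900, -3850, -3830),
  ll_sd   = c(5, 8, 10, 40, 60)
)

# Hypothetical helper: Evanno's delta K from per-K means and sds.
# Needs at least three consecutive K values.
evanno_delta_k <- function(k, ll_mean, ll_sd) {
  n  <- length(k)
  d1 <- c(NA, diff(ll_mean))            # L'(K) = L(K) - L(K-1)
  d2 <- rep(NA_real_, n)
  # |L''(K)| = |L'(K+1) - L'(K)|, defined for interior K only
  d2[2:(n - 1)] <- abs(d1[3:n] - d1[2:(n - 1)])
  data.frame(k = k, d1 = d1, d2 = d2, delta_k = d2 / ll_sd)
}

res    <- evanno_delta_k(runs$k, runs$ll_mean, runs$ll_sd)
best_k <- res$k[which.max(res$delta_k)]  # which.max() skips the NA ends
print(res)
```

Delta K is undefined for the smallest and largest K tested, which is why the range of K values run in STRUCTURE should extend beyond the value you expect.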
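4. Plotting stacked bar plots with plotQ()
plotQ() takes a large number of parameters that control the stacked bar plot. Some commonly used ones:
- imgoutput="sep"/"join": the default "sep" draws one figure per run (K value); "join" combines the runs into a single figure
- showsp=T: show the strip panel, i.e. the label of each K plot; its position is set with sppos="left"
  - splab="nameK1"
  - splab=paste0("K=",sapply(slist[c(1,4:8)],ncol)): show only "K=num" as the label
  - spbgcol= ..
- clustercol=c("#A6CEE3", "#3F8EAA", "#79C360".....): the colors of the stacked bars
- showlegend=T: show the legend
- useindlab=T: label each bar with the individual's name, taken from the rownames() of the Q matrix (the labels are displayed when showindlab=T)
  - indlabsize, indlabcol
- sortind="all"/"Cluster1": when unset, bars are drawn in the rownames() order of the samples; set it to sort by all clusters or by one cluster, or reorder the rows of the matrix by hand
- grplab=onelabset1: group labels, displayed along the bottom of the plot. If group labels are supplied and sorting is set, both take effect. onelabset1 is a data frame whose rows follow the rownames() order of the meanQ matrix
  - grplabsize
  - grplabangle=90: draw the group labels vertically
- panelspacer=0.3: the gap between the stacked bar plots in a joined ("join") figure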
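As a hedged illustration of how several of these parameters combine, the sketch below collects them in a list and passes them to plotQ() for one of the STRUCTURE runs bundled with the package. The values are illustrative only; the explicit exportpath is an assumption about recent pophelper releases, and the block does nothing beyond building the list if pophelper is not installed.

```r
# Arguments taken from the list above; the values are illustrative.
plot_args <- list(
  imgoutput  = "sep",       # one figure per run
  showsp     = TRUE,        # show the strip panel
  sppos      = "left",
  showlegend = TRUE,
  sortind    = "Cluster1",  # order individuals by their Cluster1 membership
  showindlab = TRUE,        # display individual labels ...
  useindlab  = TRUE         # ... taken from the row names of the Q matrix
)

if (requireNamespace("pophelper", quietly = TRUE)) {
  sfiles <- list.files(system.file("files/structure", package = "pophelper"),
                       full.names = TRUE)
  slist <- pophelper::readQ(files = sfiles, indlabfromfile = TRUE)
  run <- slist[2]                                  # a list holding one run
  n_k <- ncol(run[[1]])                            # K of that run
  plot_args$splab      <- paste0("K=", n_k)        # strip label, e.g. "K=2"
  plot_args$clustercol <- rainbow(n_k)             # one color per cluster
  # Assumption: recent pophelper releases require an explicit exportpath.
  plot_args$exportpath <- tempdir()
  do.call(pophelper::plotQ, c(list(qlist = run), plot_args))
}
```

Keeping the arguments in a list makes it easy to reuse the same styling across several calls while changing only the runs being plotted.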
5. Some example population-structure plots
sfiles <- list.files(path=system.file("files/structure",package="pophelper"), full.names=T)
slist <- readQ(files=sfiles,indlabfromfile=T)
threelabset <- read.delim(system.file("files/metadata.txt", package="pophelper"), header=T,stringsAsFactors=F)
twolabset <- threelabset[,2:3] ### group label
## plot
plotQ(slist[2:3],imgoutput="join",showindlab=T,grplab=twolabset,
subsetgrp=c("Brazil","Greece"),selgrp="loc",ordergrp=T,showlegend=T,
showtitle=T,showsubtitle=T,titlelab="The Great Structure",
subtitlelab="The amazing population structure of your favourite organism.",
height=1.6,indlabsize=2.3,indlabheight=0.08,indlabspacer=-1,
barbordercolour="white",barbordersize=0,outputfilename="plotq",imgtype="png")
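Because group labels are paired with individuals by row position, it is worth checking that the label table lines up with the Q matrix before plotting. The base-R sketch below does that on toy data; the helper name check_grplab is hypothetical, and the requirement that label columns be character vectors without missing values reflects my understanding of plotQ() rather than anything stated above.

```r
# Toy Q matrix (3 individuals, K = 2) and matching group labels (made-up).
q <- data.frame(Cluster1 = c(0.9, 0.2, 0.6),
                Cluster2 = c(0.1, 0.8, 0.4),
                row.names = c("ind1", "ind2", "ind3"))
labs <- data.frame(loc = c("Brazil", "Greece", "Brazil"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: TRUE when labs can be paired row-by-row with q.
check_grplab <- function(q, labs) {
  is.data.frame(labs) &&
    nrow(labs) == nrow(q) &&                        # one label row per individual
    all(vapply(labs, is.character, logical(1))) &&  # character columns only
    !anyNA(labs)                                    # no missing labels
}

ok  <- check_grplab(q, labs)                        # rows line up
bad <- check_grplab(q, labs[1:2, , drop = FALSE])   # one label row missing
print(c(ok = ok, bad = bad))
```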
Reference
[http://www.royfrancis.com/pophelper/articles/index.html#plotq]