上次已經(jīng)介紹過了使用R繪制系統(tǒng)發(fā)育樹的基本用法,也埋下一個小坑:復(fù)現(xiàn)文章里好看的樹,現(xiàn)在過來填坑了,哈哈哈。
我準(zhǔn)備了一些文章里自己覺得很不錯的樹(當(dāng)然盡可能風(fēng)格不同),然后自己生成隨機的樹和一些隨機的無科學(xué)意義的注釋(僅供畫圖參考!!!)躬窜,主要是用ggtree和ggtreeExtra進行代碼復(fù)現(xiàn),爭取把樹的主體都用代碼完成炕置,當(dāng)然一些小細(xì)節(jié)就還是不得不使用AI等pdf編輯軟件進行添加了荣挨。
Preparation
首先還是要把我們要用的一些包都安裝好并導(dǎo)入進來。
#tree
if (!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
BiocManager::install("YuLab-SMU/treedataverse")
BiocManager::install("ggtree")
BiocManager::install("ggtreeExtra")
library(treedataverse)
library(ggtree)
library(ggtreeExtra)
library(treeio)
#plot
library(dplyr)
library(ggplot2)
library(ggnewscale)
library(reshape2)
library(ggrepel)
然后需要準(zhǔn)備一些函數(shù):
#寫個函數(shù)類似child獲取子節(jié)點朴摊,但是可以指定層數(shù)
child2=function(tree,node,depth=1){
if(depth==1){return(child(tree,node))}
else {
return(child2(tree,child(tree,node)%>%unlist()%>%unname(),depth-1)%>%unlist()%>%unname())
}
}
Example1
第一個例子來自Nature Communication的一篇文章 (1)默垄,這是一個相對簡單的樹。
按照ggplot搭積木的邏輯甚纲,我們看看有哪些需要畫的:
- 樹的主體口锭,圓形布局,并打開一個小角度介杆,方便展示注釋信息的x軸label
- 外圈注釋1鹃操,熱圖形式(tile),顏色代表每一個tip的Phylum春哨,透明度代表相對豐度
- 外圈注釋2荆隘,柱形圖形式(col或bar),顏色代表每一個tip的Phylum赴背,高度代表SVM系數(shù)
相應(yīng)的我們生成數(shù)據(jù):
#生成一個2000tip的樹
ntips=2000
set.seed(123)
tree=rtree(ntips,rooted = T)
#變成tbl_tree對象椰拒,方便操作
ggtree::fortify(tree)->tree_df
#生成一些假的taxonomy信息
phylums=c("Arthropoda","Streptophyta","Cyanobacteria","Acidobacteriota","Bacteroidetes","Firmicutes","Actinobacteria","Proteobacteria","Others")
phy_nodes=child2(tree,ntips+1,depth = 4)
set.seed(123)
phy_nodes=setNames(phy_nodes,sample(phylums,length(phy_nodes),replace = T))
tree_df=groupClade(tree_df,phy_nodes,group_name = "Phylum")
anno=filter(tree_df,isTip=="TRUE")%>%select(label,Phylum)
#生成隨機數(shù)作為豐度和SVM
anno=cbind(anno,sapply(1:4, \(i)rexp(ntips,3))%>%vegan::decostand(.,"total",2))
anno=cbind(anno,rexp(ntips)/10)
colnames(anno)=c("node","Phylum","Ice/Snow","Terrestrial","Marine","Freshwater","SVM")
head(anno)
## node Phylum Ice/Snow Terrestrial Marine Freshwater
## 1 t339 Cyanobacteria 0.0007693819 0.000448952905 0.00003100616 0.00034017986
## 2 t1180 Cyanobacteria 0.0002356377 0.000483643521 0.00124672220 0.00009722752
## 3 t1807 Cyanobacteria 0.0002908480 0.000035084895 0.00074445757 0.00011096549
## 4 t572 Cyanobacteria 0.0019889166 0.000108407239 0.00098076293 0.00014436096
## 5 t1739 Cyanobacteria 0.0004149838 0.000004413762 0.00006021236 0.00006270886
## 6 t1245 Cyanobacteria 0.0004753852 0.000004451763 0.00015540813 0.00011400371
## SVM
## 1 0.0001332960
## 2 0.0474528998
## 3 0.0222924708
## 4 0.0001208082
## 5 0.0169753369
## 6 0.0277805382
有了樹和注釋數(shù)據(jù),我們開始繪圖:
# 1. 樹的主體,樹枝太多把size調(diào)小癞尚,圓形布局耸三,并打開一個小角度
p=ggtree(tree,size=0.1,layout = "fan",open.angle = 10)
#(p=ggtree(tree_df,aes(color=Phylum),size=0.1,layout = "fan",open.angle = 20))
p
# 2. 外圈注釋1,熱圖形式(tile)浇揩,顏色代表每一個tip的Phylum仪壮,透明度代表相對豐度
anno1=melt(anno[,1:6],id.vars =1:2,variable.name = "Env",value.name = "Abundance")
head(anno1)
## node Phylum Env Abundance
## 1 t339 Cyanobacteria Ice/Snow 0.0007693819
## 2 t1180 Cyanobacteria Ice/Snow 0.0002356377
## 3 t1807 Cyanobacteria Ice/Snow 0.0002908480
## 4 t572 Cyanobacteria Ice/Snow 0.0019889166
## 5 t1739 Cyanobacteria Ice/Snow 0.0004149838
## 6 t1245 Cyanobacteria Ice/Snow 0.0004753852
p1=p+geom_fruit(
data=anno1,
geom = geom_tile,
mapping = aes(y=node,x=Env,fill=Phylum,alpha=Abundance),
pwidth = 0.2,
axis.params=list(axis="x",text.size = 2,text.angle=270)
)+scale_alpha(range = c(0,1),guide=guide_none())+
ggsci::scale_fill_npg()
p1
# 3. 外圈注釋2,柱形圖形式(col或bar)胳徽,顏色代表每一個tip的Phylum积锅,高度代表SVM系數(shù)
p2=p1+geom_fruit(
data=anno,
geom = geom_col,
mapping = aes(y=node,x=SVM,fill=Phylum),
pwidth = 0.3,
axis.params=list(axis="x",text.size = 2),
grid.params = list()
)+theme(legend.position = c(0,0.3))
p2
Example2
第二個例子來自Nature Microbiology的一篇文章 (2)。
我們看看有哪些需要畫的:
- 樹的主體养盗,比較特別的布局(equal_angle)缚陷,并且樹枝要加上一些Form的分類顏色信息,再加上一個scale標(biāo)尺
- 外圈注釋1,標(biāo)簽往核,在每類分支附近箫爷,背景顏色是Form的分類
- 外圈注釋2,點和文字,應(yīng)該是手動挑選的一些節(jié)點虎锚,在樹枝頂端加上了灰點以及黑色文字
相應(yīng)的我們生成數(shù)據(jù):
#生成一個400tip的樹
ntips=400
set.seed(123)
tree=rtree(ntips,rooted = T)
#變成tbl_tree對象硫痰,方便操作
ggtree::fortify(tree,layout ="equal_angle" )->tree_df
#生成一些假的Form信息
forms=paste0("Form ",c("I","II","II/III","III-a","III-b","III-c","III-like","IV"))
phy_nodes=child2(tree,ntips+1,depth = 3)
set.seed(123)
phy_nodes=setNames(phy_nodes,sample(forms,length(phy_nodes),replace = F))
tree_df=groupClade(tree_df,phy_nodes,group_name = "Form")
#指定顏色
colors=c("#1F77B4FF","#FF7F0EFF","#2CA02CFF","#D62728FF","#9467BDFF","#8C564BFF","#E377C2FF","#BCBD22FF","#17BECFFF")
#挑選一些nodes
set.seed(123)
label_node=sample(seq_len(ntips),20)
有了樹和注釋數(shù)據(jù),我們開始繪圖:
# 1. 樹的主體窜护,比較特別的布局(equal_angle)效斑,并且樹枝要加上一些Form的分類顏色信息
p=ggtree(tree_df,aes(color=Form),layout = "equal_angle")+
geom_treescale(-5,7,fontsize=3, linesize=0.5,width=1)+
scale_color_manual(values = c("black",colors))+
coord_flip()+theme(legend.position = "none")
p
# 2. 外圈注釋1,標(biāo)簽柱徙,在每類分支附近缓屠,背景顏色是Form的分類
p1=p+geom_label_repel(data = subset(tree_df,node%in%phy_nodes),
mapping = aes(x=x,y=y,label=Form,fill=Form),color="black",alpha=0.7)+
scale_fill_manual(values = colors)
p1
# 3. 外圈注釋2,點和文字护侮,應(yīng)該是手動挑選的一些節(jié)點敌完,在樹枝頂端加上了灰點以及黑色文字
p2=p1+geom_point(data = subset(tree_df,node%in%label_node),
mapping = aes(x=x*1.03,y=y*1.03),color="grey50")+
geom_text_repel(data = subset(tree_df,node%in%label_node),
mapping = aes(x=x*1.05,y=y*1.05,label=label),color="black")
p2
當(dāng)然文字和標(biāo)簽的位置有點不太好,需要導(dǎo)出pdf再稍微調(diào)整一下概行。
Example3
第三個例子來自Nature Biotechnology的一篇文章 (3) 蠢挡。
我們看看有哪些需要畫的:
- 樹的主體弧岳,層級樹的感覺(把branch.length忽略了凳忙,所有的tip在一個位置),打開角度為180禽炬,灰色樹枝
- 內(nèi)圈注釋涧卵,給部分clade加上不同Phylum的背景顏色
- 外圈注釋1,3圈熱圖腹尖,用的是有無數(shù)據(jù)
- 外圈注釋2柳恐,2圈柱形圖,Size和GC含量
相應(yīng)的我們生成數(shù)據(jù):
#生成一個1000tip的樹
ntips=1000
set.seed(123)
tree=rtree(ntips,rooted = T)
#變成tbl_tree對象热幔,方便操作
ggtree::fortify(tree)->tree_df
#生成一些假的taxonomy信息
phylums=rev(c("Arthropoda","Streptophyta","Cyanobacteria","Acidobacteriota","Bacteroidetes","Firmicutes","Actinobacteria","Proteobacteria"))
#把8個phylum賦給最多tip的幾個clade乐设,其他的是others
phy_nodes=child2(tree,ntips+1,depth = 4)
phy_nodes_tips=sapply(phy_nodes, \(i)nrow(offspring(tree_df,i)))
names(phy_nodes)=rep("Others",length(phy_nodes))
names(phy_nodes)[which(phy_nodes_tips%in%tail(sort(phy_nodes_tips),8))]=phylums
tree_df=groupClade(tree_df,phy_nodes,group_name = "Phylum1")
anno=filter(tree_df,isTip=="TRUE")%>%select(label,Phylum1)
#添加每個phylum的個數(shù)和百分比
anno%>%count(Phylum1)%>%mutate(per=100*n/sum(n))%>%
mutate(Phylum=paste0(Phylum1," (",n,", ",per,"%)"))%>%select(Phylum1,Phylum)->in_anno
in_anno=right_join(in_anno,data.frame(node=phy_nodes,Phylum1=names(phy_nodes)))
set.seed(123)
#生成隨機變量作為Source,16S rRNA presence,Newly identified
anno$Source=sample(c("Cultivated","MAG"),ntips,T,c(0.1,0.9))
anno$`16S rRNA presence`=sample(c("Yes","No"),ntips,T,c(0.3,0.7))
anno$`Newly identified`=sample(c("Yes","No"),ntips,T,c(0.9,0.1))
#生成隨機數(shù)作為Size和GC含量
anno$`Size (Mb)`=10-rpois(ntips,2)
anno$`GC (%)`=runif(ntips,30,80)
colnames(anno)[1]="node"
head(anno)
## # A tibble: 6 × 7
## node Phylum1 Source `16S rRNA presence` Newly identifie…1 Size …2 GC (%…3
## <chr> <fct> <chr> <chr> <chr> <dbl> <dbl>
## 1 t876 Others MAG No Yes 9 45.2
## 2 t896 Others MAG No Yes 6 71.6
## 3 t437 Others MAG No Yes 9 59.7
## 4 t750 Others MAG Yes Yes 8 70.4
## 5 t270 Others Cultivated Yes Yes 9 44.7
## 6 t412 Others MAG No Yes 8 37.1
## # … with abbreviated variable names 1`Newly identified`, 2`Size (Mb)`,
## # 3`GC (%)`
# 1. 樹的主體,層級樹的感覺(把branch.length忽略了绎巨,所有的tip在一個位置)近尚,打開角度為180,灰色樹枝
p=ggtree(tree,layout = "fan",open.angle = 180,branch.length = "none",size=0.2,color="grey")
p
# 2. 內(nèi)圈注釋场勤,給部分clade加上不同Phylum的背景顏色
p1=p+geom_highlight(data = in_anno,
mapping = aes(node=node,fill=Phylum),to.bottom = T,alpha=1)+
ggsci::scale_fill_rickandmorty(guide=guide_legend(ncol = 2,title.position = "top",title = "Clade: Phylum",order = 4))
p1
# 3. 外圈注釋1戈锻,3圈熱圖,用的是有無數(shù)據(jù)
p2=p1+ggnewscale::new_scale_fill()+
geom_fruit(
data=anno,
geom = geom_tile,
mapping = aes(y=node,fill=`Newly identified`),
pwidth = 0.05,offset = 0.05,color=NA
)+scale_fill_manual(values = c("white","red"),
guide=guide_legend(ncol = 2,title.position = "top",override.aes = list(color="red",size=3),order = 3))+
ggnewscale::new_scale_fill()+
geom_fruit(
data=anno,
geom = geom_tile,
mapping = aes(y=node,fill=`16S rRNA presence`),
pwidth = 0.05,offset = 0.1,color=NA
)+scale_fill_manual(values = c("white","blue4"),
guide=guide_legend(ncol = 2,title.position = "top",override.aes = list(color="blue4",size=3),order = 2))+
ggnewscale::new_scale_fill()+
geom_fruit(
data=anno,
geom = geom_tile,
mapping = aes(y=node,fill=Source),
pwidth = 0.05,offset = 0.1,color=NA
)+scale_fill_manual(values = c("green4","white"),
guide=guide_legend(ncol = 2,title.position = "top",override.aes = list(color="green4",size=3),order = 1))
p2
# 4. 外圈注釋2和媳,2圈柱形圖格遭,Size和GC含量
p3=p2+geom_fruit(
data=anno,
geom = geom_col,
mapping = aes(y=node,x=`Size (Mb)`),fill="purple3",
pwidth = 0.15,offset = 0.1,
axis.params=list(axis="x",text.size = 2,text.angle=90,hjust=1,nbreak=3,line.color="black"),
grid.params = NULL
)+geom_fruit(
data=anno,
geom = geom_col,
mapping = aes(y=node,x=`GC (%)`),fill="#F7C194",
pwidth = 0.2,offset = 0.05,
axis.params=list(axis="x",text.size = 2,text.angle=90,hjust=1,nbreak=2,line.color="black"),
grid.params = NULL
)+theme(legend.position = c(0.5,0.3),legend.box = "horizontal",
legend.text = element_text(size=8))
#最后再加上幾個標(biāo)簽
p3+geom_text(data = data.frame(x=c(20,22,24,27,31),y=c(10),
label=c("Source ","16S rRNA presence ","Newly identified ","Size (Mb) ","GC (%) ")),
aes(x,y,label=label),angle=90,hjust=1,size=3)
Example4
第四個例子來自Nature的一篇文章 (4)。這個圖是用iTOL做的留瞳,因為iTOL支持直接畫tip到圓等半徑的空間顏色填充拒迅。但是我覺得用R還是一樣能畫。
我們看看有哪些需要畫的:
- 樹的主體,很正常璧微,打開小角度骤竹,開口在左上角
- 內(nèi)圈注釋,給部分clade加上不同Phylum的顏色往毡,但是這個色塊是加在tip到圓等半徑的空間(這個很有意思蒙揣,還沒有看到過別人用R實現(xiàn)過)
- 外圈注釋1,方塊代表phage开瞭,顏色代表family
- 外圈注釋2懒震,灰色五角星代表Genome contains Thoeris
- 外圈注釋3,綠色菱形加上文字
相應(yīng)的我們生成數(shù)據(jù):
#生成一個200tip的樹
ntips=200
set.seed(123)
tree=rtree(ntips,rooted = T)
#變成tbl_tree對象嗤详,方便操作
ggtree::fortify(tree)->tree_df
#生成一些假的taxonomy信息
phylums=rev(c("Arthropoda","Streptophyta","Cyanobacteria","Acidobacteriota","Bacteroidetes","Firmicutes","Actinobacteria","Proteobacteria"))
#把8個phylum賦給最多tip的幾個clade个扰,其他的是others
phy_nodes=child2(tree,ntips+1,depth = 4)
phy_nodes_tips=sapply(phy_nodes, \(i)nrow(offspring(tree_df,i)))
names(phy_nodes)=rep("Unknown",length(phy_nodes))
names(phy_nodes)[which(phy_nodes_tips%in%tail(sort(phy_nodes_tips),8))]=phylums
tree_df=groupClade(tree_df,phy_nodes,group_name = "Phylum")
anno=filter(tree_df,isTip=="TRUE")%>%select(label,Phylum)%>%rename(node="label")
colors=c("Firmicutes"="#d3edeb","Actinobacteria"="#019a99","Bacteroidetes"="#0077b0",
"Proteobacteria"="#ffba4d","Acidobacteriota"="#282152","Cyanobacteria"="#caa59a",
"Streptophyta"="#ff7880","Arthropoda"="#aac8eb",Unknown="white")
set.seed(123)
#生成隨機變量作為Source,16S rRNA presence,Newly identified
anno$`Phage family`=anno$`Thoeris`=anno$`bac`=NA
anno$`Phage family`[sample(seq_len(ntips),30)]=sample(c("Myoviridae","Podoviridae","Siphoviridae"),30,replace = T,prob = c(0.2,0.1,0.7))
anno$`Thoeris`[sample(seq_len(ntips),5)]="Genome contains Thoeris"
anno$bac[sample(seq_len(ntips),10)]="Bac"
# 1. 樹的主體,很正常葱色,打開小角度递宅,開口在左上角
p=ggtree(tree,layout = "fan",open.angle = 5)
# 2. 內(nèi)圈注釋,給部分clade加上不同Phylum的顏色苍狰,但是這個色塊是加在tip到圓等半徑的空間(這個很有意思办龄,還沒有看到過別人用R實現(xiàn)過)
p1=p+geom_tiplab(data=tree_df,mapping = aes(color=Phylum),align = T,linetype = 1,linesize = 3.5,size=0)+
scale_color_manual(values = colors,guide=guide_legend(title = "Host phylum",nrow = 3))+geom_tree(layout = "fan")
p1
# 3. 外圈注釋1,方塊代表phage淋昭,顏色代表family
library(ggstar)
p2=p1+geom_fruit(
geom = geom_star,
data = anno,
mapping = aes(
y=node,fill=`Phage family`
),
starshape=13,
starstroke=0,size=4
)+scale_fill_manual(values = c("#de255c","#496db6","#c4c64f"),na.translate=FALSE,guide=guide_legend(ncol = 1))
p2
# 4. 外圈注釋2俐填,灰色五角星代表Genome contains Thoeris
p3=p2+new_scale_fill()+
geom_fruit(
geom = geom_star,
data = anno,
mapping = aes(
y=node,fill=`Thoeris`
),
starshape=1,offset = 0,
starstroke=0,size=5
)+scale_fill_manual(name="",values = c("grey"),na.translate=FALSE)
# 5. 外圈注釋3,綠色菱形加上文字
p4=p3+new_scale_fill()+
geom_fruit(
geom = geom_star,
data = anno,
mapping = aes(
y=node,fill=`bac`
),
starshape=12,offset = 0.1,
starstroke=0,size=3
)+scale_fill_manual(values = c("#74cfd2"),na.translate=FALSE,guide=guide_none())+
geom_tiplab(data = tree_df%>%filter(label%in%(anno$node[which(!is.na(anno$bac))])),
mapping = aes(label=label),angle=0,align = T,linetype = 0,offset = 2.5)
p4+theme(legend.position = "bottom")
呼~翔忽,暫時先做這幾個圖吧英融,再次強調(diào),這是生成隨機的樹和一些隨機的無科學(xué)意義的注釋(僅供畫圖參考P健J晃颉!)材失。
如果你有好看的圖需要復(fù)現(xiàn)或者有什么繪圖上的問題痕鳍,歡迎聯(lián)系。
Reference
- M. Bourquin, S. B. Busi, S. Fodelianakis, H. Peter, et al., The microbiome of cryospheric ecosystems. Nature Communications. 13, 3087 (2022).
- M. Royo-Llonch, P. Sánchez, C. Ruiz-González, G. Salazar, et al., Compendium of 530 metagenome-assembled bacterial and archaeal genomes from the polar Arctic Ocean. Nature Microbiology. 6, 1561–1574 (2021).
- Y. Liu, M. Ji, T. Yu, J. Zaugg, et al., A genome and gene catalog of glacier microbiomes. Nature Biotechnology. 40, 1341–1348 (2022).
- A. Leavitt, E. Yirmiya, G. Amitai, A. Lu, et al., Viruses inhibit TIR gcADPR signalling to overcome bacterial defence. Nature. 611, 326–331 (2022).