今天分享一篇關(guān)于分析細(xì)胞通訊的,發(fā)表于2020年nature protocols的相關(guān)文章地回。
CellPhoneDB的配體-受體數(shù)據(jù)庫集成于UniProt扁远、Ensembl、PDB刻像、IUPHAR等畅买,共存儲(chǔ)978種蛋白質(zhì):其中501種是分泌蛋白,而585種是膜蛋白细睡,這些蛋白質(zhì)參與1,396種相互作用谷羞。CellPhoneDB數(shù)據(jù)庫還考慮了配體和受體的亞基結(jié)構(gòu),準(zhǔn)確地表示了異聚體復(fù)合物纹冤,有466個(gè)是異聚體洒宝,對(duì)于準(zhǔn)確研究多亞基復(fù)合物介導(dǎo)的細(xì)胞通訊很關(guān)鍵。CellPhoneDB有474種相互作用涉及分泌蛋白萌京,而490種相互作用涉及膜蛋白,總共有250個(gè)涉及整合素的相互作用宏浩。相比于早期的方法知残,改方法的優(yōu)點(diǎn)就是考慮到多聚體受體的情況,而非簡單地把受體限定為單個(gè)分子比庄。
當(dāng)然目前只支持人的數(shù)據(jù)求妹。
依據(jù)構(gòu)建的配體-受體庫乏盐,對(duì)于用戶輸入的單細(xì)胞數(shù)據(jù),基于一種細(xì)胞類型的受體和另一種細(xì)胞類型的配體的表達(dá)制恍,得出兩種細(xì)胞群之間豐富的受體-配體相互作用父能。對(duì)于細(xì)胞群所表達(dá)的基因,計(jì)算表達(dá)該基因的細(xì)胞百分比和基因表達(dá)平均值净神。若該基因只在該群中10%及以下的細(xì)胞中表達(dá)(默認(rèn)值為0.1)何吝,則被移除。
將所有細(xì)胞的簇標(biāo)簽隨機(jī)排列1000次(默認(rèn)值)鹃唯,并確定簇中受體平均表達(dá)水平和相互作用簇中配體平均表達(dá)水平的平均值爱榕。對(duì)于兩種細(xì)胞類型之間每對(duì)比較中的每一個(gè)受體-配體對(duì),這產(chǎn)生了一個(gè)零分布(null distribution坡慌,亦稱伯努利分布黔酥、兩點(diǎn)分布,指的是對(duì)于隨機(jī)變量X有, 參數(shù)為p(0<p<1)洪橘,如果它分別以概率p和1-p取1和0為值跪者。EX= p,DX=p(1-p)。伯努利試驗(yàn)成功的次數(shù)服從伯努利分布,參數(shù)p是試驗(yàn)成功的概率)熄求。
通過計(jì)算等于或高于實(shí)際平均值的平均值的比例渣玲,可以得到了一個(gè)P值,表示給定受體-配體復(fù)合物細(xì)胞類型特異性的可能性抡四。換句話說柜蜈,如果觀察到的平均值在前5%,則相互作用的P值為0.05指巡。
根據(jù)兩種細(xì)胞類型中富集到的顯著的受體-配體對(duì)的數(shù)量淑履,對(duì)細(xì)胞類型之間高度特異性的相互作用進(jìn)行排序,以便手動(dòng)篩選出生物學(xué)相關(guān)的相互作用關(guān)系藻雪。
===CellPhoneDB的使用測試===
安裝:
官網(wǎng)地址:https://github.com/Teichlab/cellphonedb
下面是官網(wǎng)的推薦安裝方式:
1. Create python=>3.6 environment
Using conda:?conda create -n cpdb python=3.7
Using virtualenv:?python -m venv cpdb
Activate environment
Using conda:?source activate cpdb
Using virtualenv:?source cpdb/bin/activate
Install CellPhoneDB?pip install cellphonedb
我是用的conda安裝的:
conda create -n cpdb python=3.7
source activate cpdb
pip install cellphonedb
因?yàn)楹竺娴漠媹D是基于R的秘噪,所以還需要安裝R以及需要的幾個(gè)畫圖相關(guān)的包:ggplot2,pheatmap勉耀。
測試數(shù)據(jù):
用的也是官網(wǎng)的測試數(shù)據(jù):
https://raw.githubusercontent.com/Teichlab/cellphonedb/master/in/example_data/test_counts.txt
https://raw.githubusercontent.com/Teichlab/cellphonedb/master/in/example_data/test_meta.txt
counts里面是表達(dá)譜文件:
meta是細(xì)胞類型和分組信息
CellPhoneDB主要分成method指煎、plot、query和database4個(gè)模塊便斥。
我們主要進(jìn)行分析的是method(分析模塊)和plot模塊(畫圖模塊)至壤。
query是進(jìn)行數(shù)據(jù)查詢的模塊,查詢基因有哪些互作結(jié)果(get_interaction_gene結(jié)果為數(shù)據(jù)庫涉及到的基因枢纠,find_interactions_by_element可以找到特定基因的受體配體作用數(shù)據(jù)對(duì))像街。
database是輸入數(shù)據(jù)庫,一般默認(rèn)可以不輸入,直接使用即可镰绎。
運(yùn)行程序:
cellphonedb method analysis test_meta.txt test_counts.txt
當(dāng)然還有一些其它可供選擇的參數(shù):例如threads設(shè)置線程的脓斩,output-path設(shè)置輸出文件夾的等。
運(yùn)行完之后會(huì)有一個(gè)叫做默認(rèn)out的輸出文件夾畴栖,里面會(huì)有相應(yīng)的pvalues.txt means.txt significant_means.txt等輸出文件随静。
繪制氣泡圖:
cellphonedb plot dot_plot
當(dāng)然可以通過參數(shù):--pvalues-path --output-path --output-name來設(shè)置輸出和輸入的文件路徑和名字。
繪制熱圖:
cellphonedb plot heatmap_plot test_meta.txt
當(dāng)然如果覺得官方畫的不好看吗讶,也可以自己提取輸出文件的信息來美化圖片燎猛。
library(tidyverse)
library(DT)
create_dt <- function(x)
{
DT::datatable(x,extensions='Buttons',options=list(dom='Blfrtip',buttons=c('csv','excel','pdf'),lengthMenu=list(c(10,25,50,-1),c(10,25,50,"All"))))
}
//讀取文件
mypvals <- read.delim("pvalues.txt",check.names=FALSE)
mymeans <- read.delim("means.txt",check.names=FALSE)
mysigmeans <- read.delim("significant_means.txt",check.names=FALSE)
dim(mysigmeans)
mymeans[1:4,1:4]
//調(diào)整配體-受體順序,配體在前关翎,受體在后
library(dplyr)
order_sequence <- function(df)
{
? ?da<-data.frame()
? ?for(i in 1:length(df$gene_a))
? ?{
? ? ?sub_data <- df[i,]
? ? ?if(sub_data$receptor_b=="False")
{
? ?if(sub_data$receptor_a=="True")
? ?{
? ? ?old_names<- colnames(sub_data)
?
my_list<-strsplit(old_names[-c(1:11)],split="\\|")
my_character <- paste(sapply(my_list,'[[',2L),sapply(my_list,'[[',1L),sep='|')
new_names<-c(names(sub_data)[1:4],"gene_b","gene_a","secreted","receptor_b","receptor_a","annotation_strategy","is_integrin",my_character)
sub_data=dplyr::select(sub_data,new_names)
names(sub_data)<-old_names
da=rbind(da,sub_data)
? ?}
}
else
{
? ?da=rbind(da,sub_data)
}
? ?}
? ?return(da)
}
df<-subset(mymeans,receptor_a=="True"&receptor_b=="False"|receptor_a=="False"&receptor_b=="True")
df <- df %>% dplyr::mutate(na_count=rowSums(is.na(df)|df=="")) %>% subset(na_count==0) %>% dplyr::select(-na_count)
dim(df)
means_order <- order_sequence(df) %>% tidyr::unite(Pairs,gene_a,gene_b)
pvals_order <- order_sequence(mypvals) %>% tidyr::unite(Pairs,gene_a,gene_b)
//合并表達(dá)量文件和pvalue文件
means_sub <- means_order[,c('Pairs',colnames(mymeans)[-c(1:11)])]
pvals_sub <- pvals_order[,c('Pairs',colnames(mymeans)[-c(1:11)])]
means_gather <- tidyr::gather(means_sub,celltype,mean_exprssion,names(means_sub)[-1])
pvals_gather <- tidyr::gather(pvals_sub,celltype,pval,names(pvals_sub)[-1])
mean_pval <- dplyr::left_join(means_gather,pvals_gather,by=c('Pairs','celltype'))
create_dt(mean_pval)
//提取顯著性表達(dá)的受體配體對(duì)
a <- mean_pval %>% dplyr::select(Pairs,celltype,pval) %>% tidyr::spread(key=celltype,value=pval)
sig_pairs <- a[which(rowSums(a<=0.05)!=0),]
dim(sig_pairs)
mean_pval_sub <- subset(mean_pval,Pairs %in% sig_pairs$Pairs)
//可視化顯著性表達(dá)的受體配體對(duì)
library(RColorBrewer)
library(scales)
library(ggplot2)
library(cowplot)
p1 <- mean_pval_sub %>% ggplot(aes(x=mean_exprssion)) +geom_density(alpha=0.2,color='red')
p2 <- mean_pval_sub %>% ggplot(aes(x=log2(mean_exprssion))) +geom_density(alpha=0.2,color='red')
mean_distribution <- plot_grid(p1,p2,labels="AUTO",nrow=1)
p1 <- mean_pval_sub %>% ggplot(aes(x=pval)) +geom_density(alpha=0.2,color='red')
p2 <- mean_pval_sub %>% ggplot(aes(x=-log10(pval+1*10^-3))) +geom_density(alpha=0.2,color='red')
pval_distribution <- plot_grid(p1,p2,labels="AUTO",nrow=1)
plot_grid(mean_distribution,pval_distribution,labels="AUTO",ncol=1)
p <- mean_pval_sub %>% dplyr::arrange(Pairs) %>% head(16*10) %>% ggplot(aes(x=Pairs,y=celltype)) +
geom_point(aes(color=log2(mean_exprssion),size=-log10(pval+1*10^-3))) +
guides(colour=guide_colourbar(order=1),size=guide_legend(order=2)) +
labs(x='',y='') +
scale_color_gradientn(name='Expression level',colours=terrain.colors(100)) +
theme(axis.text.x=element_text(angle=45,hjust=1)) +
theme(panel.border=element_rect(color='black',fill=NA),panel.grid.major.x=element_blank(),panel.grid.major.y=element_blank(),panel.grid.minor.x=element_blank(),panel.grid.minor.y=element_blank(),panel.background=element_blank(),axis.title.x=element_blank(),axis.title.y=element_blank(),axis.ticks=element_blank())+
theme(legend.key.size=unit(0.4,'cm'),legend.title=element_text(size=9),legend.text=element_text(size=8))
p
save_plot("ligands_Receptor_dotplot.png",p,base_height=4,base_aspect_ratio=3,base_width=NULL,dpi=600)
save_plot("ligands_Receptor_dotplot.pdf",p,base_height=4,base_aspect_ratio=3,base_width=NULL)
====真實(shí)例子測試====
還用最常用的pbmc3k扛门,上次做軌跡分析的例子。因?yàn)槲覀円呀?jīng)保存了纵寝,所以直接load進(jìn)來论寨。
pbmc3k <- readRDS("pbmc.rds")
//準(zhǔn)備cellphone的輸入文件
write.table(as.matrix(pbmc3k@assays$RNA@data), 'cellphonedb_count.txt', sep='\t', quote=F)
meta_data <- cbind(rownames(pbmc3k@meta.data), pbmc3k@meta.data[,'cell_type', drop=F])??
meta_data <- as.matrix(meta_data)
meta_data[is.na(meta_data)] = "Unkown"?
write.table(meta_data, 'cellphonedb_meta.txt', sep='\t', quote=F, row.names=F)
//運(yùn)行cellphone
cellphonedb method statistical_analysis cellphonedb_meta.txt cellphonedb_count.txt --counts-data=gene_name ?--threads 20
cellphonedb plot dot_plot?
cellphonedb plot heatmap_plot cellphonedb_meta.txt??
//讀入網(wǎng)絡(luò)數(shù)據(jù),構(gòu)建網(wǎng)絡(luò)爽茴,并繪制最簡單的一張網(wǎng)絡(luò)圖:
pbmc='out/'?
library(psych)
library(qgraph)
library(igraph)
library(tidyverse)
netf<- "count_network.txt"
mynet <- read.delim(paste0(pbmc,"count_network.txt"), check.names = FALSE)
table(mynet$count)
mynet %>% filter(count>0) -> mynet?
head(mynet)
net<- graph_from_data_frame(mynet)
plot(net)
//為了給這個(gè)網(wǎng)絡(luò)圖的邊點(diǎn)mapping上不同的屬性我們引入一串顏色葬凳。
allcolour=c("#DC143C","#0000FF","#20B2AA","#FFA500","#9370DB",
? ? ? ? ? ? "#98FB98","#F08080","#1E90FF","#7CFC00","#FFFF00",
? ? ? ? ? ? "#808000","#FF00FF","#FA8072","#7B68EE","#9400D3",
? ? ? ? ? ? "#800080","#A0522D","#D2B48C","#D2691E","#87CEEB",
? ? ? ? ? ? "#40E0D0","#5F9EA0","#FF1493",
? ? ? ? ? ? "#FFE4B5","#8A2BE2","#228B22","#E9967A","#4682B4",
? ? ? ? ? ? "#32CD32","#F0E68C","#FFFFE0","#EE82EE","#FF6347",
? ? ? ? ? ? "#6A5ACD","#9932CC","#8B008B","#8B4513","#DEB887")
karate_groups <- cluster_optimal(net)
coords <- layout_in_circle(net, order =order(membership(karate_groups)))? ? ? ?#設(shè)置網(wǎng)絡(luò)布局
E(net)$width? <- E(net)$count/10? ? ? ? # 邊點(diǎn)權(quán)重(粗細(xì))
plot(net, edge.arrow.size=.1,?
? ? ?edge.curved=0,
? ? ?vertex.color=allcolour,
? ? ?vertex.frame.color="#555555",
? ? ?vertex.label.color="black",
? ? ?layout = coords,
? ? ?vertex.label.cex=.7)?
經(jīng)過以上一番修飾后,我們得到的網(wǎng)絡(luò)圖如下:
但是邊的顏色和點(diǎn)的顏色還是對(duì)應(yīng)不上室奏,我們來修改一番邊的屬性火焰。
net2 <- net? # 復(fù)制一份備用
for (i in 1: length(unique(mynet$SOURCE)) ){
? E(net)[map(unique(mynet$SOURCE),function(x) {
? ? get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
? })%>% unlist()]$color <- allcolour[i]
}??
plot(net, edge.arrow.size=.1,?
? ? ?edge.curved=0,
? ? ?vertex.color=allcolour,
? ? ?vertex.frame.color="#555555",
? ? ?vertex.label.color="black",
? ? ?layout = coords,
? ? ?vertex.label.cex=.7)?
//把線條換成曲線
plot(net, edge.arrow.size=.1,?
? ? ?edge.curved=0.2, # 只是調(diào)了這個(gè)參數(shù)
? ? ?vertex.color=allcolour,
? ? ?vertex.frame.color="#555555",
? ? ?vertex.label.color="black",
? ? ?layout = coords,
? ? ?vertex.label.cex=.7)?
//接下來繪制,另一種貝殼一樣的圖胧沫,相當(dāng)于查看每個(gè)單獨(dú)的source的網(wǎng)絡(luò)圖
length(unique(mynet$SOURCE)) # 查看需要繪制多少張圖昌简,以方便布局
par(mfrow=c(2,5), mar=c(.3,.3,.3,.3))
for (i in 1: length(unique(mynet$SOURCE)) ){
? net1<-net2
? E(net1)[map(unique(mynet$SOURCE),function(x) {
? ? get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
? })%>% unlist()]$color <- allcolour[i]
??
? plot(net1, edge.arrow.size=.1,?
? ? ? ?edge.curved=0.4,
? ? ? ?vertex.color=allcolour,
? ? ? ?vertex.frame.color="#555555",
? ? ? ?vertex.label.color="black",
? ? ? ?layout = coords,
? ? ? ?vertex.label.cex=1)?
}
//可以加上頻數(shù)
length(unique(mynet$SOURCE))
par(mfrow=c(2,5), mar=c(.3,.3,.3,.3))
for (i in 1: length(unique(mynet$SOURCE)) ){
? net1<-net2
??
? E(net1)$count <- ""
? E(net1)[map(unique(mynet$SOURCE),function(x) {
? ? get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
? })%>% unlist()]$count? <- E(net2)[map(unique(mynet$SOURCE),function(x) {
? ? get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
? })%>% unlist()]$count? # 故技重施
??
? E(net1)[map(unique(mynet$SOURCE),function(x) {
? ? get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
? })%>% unlist()]$color <- allcolour[i]
??
? plot(net1, edge.arrow.size=.1,?
? ? ? ?edge.curved=0.4,
? ? ? ?edge.label = E(net1)$count, # 繪制邊的權(quán)重
? ? ? ?vertex.color=allcolour,
? ? ? ?vertex.frame.color="#555555",
? ? ? ?vertex.label.color="black",
? ? ? ?layout = coords,
? ? ? ?vertex.label.cex=1
? )?
??
}
//也可以只挑自己感興趣的配體受體等進(jìn)行點(diǎn)圖的展示
chemokines <- grep("^CXC|CCL|CCR|CX3|XCL|XCR", mymeans$interacting_pair,value = T)
th1 <- grep("IL2|IL12|IL18|IL27|IFNG|IL10|TNF$|TNF |LTA|LTB|STAT1|CCR5|CXCR3|IL12RB1|IFNGR1|TBX21|STAT4",?
? ? ? ? ? ? mymeans$interacting_pair,value = T)
th2 <- grep("IL4|IL5|IL25|IL10|IL13|AREG|STAT6|GATA3|IL4R",?
? ? ? ? ? ? mymeans$interacting_pair,value = T)
th17 <- grep("IL21|IL22|IL24|IL26|IL17A|IL17A|IL17F|IL17RA|IL10|RORC|RORA|STAT3|CCR4|CCR6|IL23RA|TGFB",?
? ? ? ? ? ? ?mymeans$interacting_pair,value = T)
treg <- grep("IL35|IL10|FOXP3|IL2RA|TGFB", mymeans$interacting_pair,value = T)
costimulatory <- grep("CD86|CD80|CD48|LILRB2|LILRB4|TNF|CD2|ICAM|SLAM|LT[AB]|NECTIN2|CD40|CD70|CD27|CD28|CD58|TSLP|PVR|CD44|CD55|CD[1-9]",?
? ? ? ? ? ? ? ? ? ? ? mymeans$interacting_pair,value = T)
coinhibitory <- grep("SIRP|CD47|ICOS|TIGIT|CTLA4|PDCD1|CD274|LAG3|HAVCR|VSIR",?
? ? ? ? ? ? ? ? ? ? ?mymeans$interacting_pair,value = T)
niche <- grep("CSF", mymeans$interacting_pair,value = T)
mymeans %>% dplyr::filter(interacting_pair %in% costimulatory)%>%
? dplyr::select("interacting_pair",starts_with("NK"),ends_with("NK"))? %>%??
? reshape2::melt() -> meansdf
colnames(meansdf)<- c("interacting_pair","CC","means")
mypvals %>% dplyr::filter(interacting_pair %in% costimulatory)%>%
? dplyr::select("interacting_pair",starts_with("NK"),ends_with("NK"))%>%??
? reshape2::melt()-> pvalsdf
colnames(pvalsdf)<- c("interacting_pair","CC","pvals")
pvalsdf$joinlab<- paste0(pvalsdf$interacting_pair,"_",pvalsdf$CC)
meansdf$joinlab<- paste0(meansdf$interacting_pair,"_",meansdf$CC)
pldf <- merge(pvalsdf,meansdf,by = "joinlab")
summary((filter(pldf,means >1))$means)
pldf%>% filter(means >1) %>%?
? ggplot(aes(CC.x,interacting_pair.x) )+?
? geom_point(aes(color=means,size=-log10(pvals+0.0001)) ) +
? scale_size_continuous(range = c(1,3))+
? scale_color_gradient2(high="red",mid = "yellow",low ="darkblue",midpoint = 25? )+ theme_bw()+?
? theme(axis.text.x = element_text(angle = -45,hjust = -0.1,vjust = 0.8))?
本文使用 文章同步助手 同步