適用背景
之前寫(xiě)了一篇文章介紹四步完成單細(xì)胞數(shù)據(jù)調(diào)控網(wǎng)絡(luò)流程分析踪区,但生成的結(jié)果沒(méi)有進(jìn)行可視化,所以本篇文章將介紹可視化的內(nèi)容莫杈。
本篇可視化主要分成三個(gè)部分:
- 1亏娜、常規(guī)熱圖
- 2、基于Seurat對(duì)象作圖
- 3菌赖、網(wǎng)絡(luò)圖
- 4梅肤、plotRSS氣泡圖
SCENIC/pySCENIC結(jié)果具體展示什么呢司蔬?
據(jù)我初步了解,主要展示regulons的活性值姨蝴,所謂regulons其實(shí)應(yīng)該就是TF俊啼,另外可以展示這些regulons調(diào)控的基因網(wǎng)絡(luò)圖。
另外左医,SCENIC提供了一個(gè)plotRSS函數(shù)能根據(jù)RSS得分篩選細(xì)胞類(lèi)型特異的regulons授帕,而且可以對(duì)RSS得分進(jìn)行可視化。注意RSS分值和活性值不能搞混浮梢,RSS分值衡量的是regulons對(duì)某種細(xì)胞類(lèi)型的特異性跛十。
可視化舉例
做熱圖主要分為兩種,一種是把細(xì)胞分組求regulons的活性平均值/中位數(shù)秕硝,另一種是展示所有細(xì)胞regulons的活性平均值芥映。
-
1、細(xì)胞分組求regulons的活性平均值/中位數(shù)
-
2、所有細(xì)胞的活性值
-
3奈偏、基于Seurat的氣泡圖
-
4坞嘀、基于Seurat的峰圖
5、基于Seurat的小提琴圖
6惊来、基于Seurat的點(diǎn)圖
7丽涩、regulon網(wǎng)絡(luò)調(diào)控圖
8、RSS得分熱圖和氣泡圖
數(shù)據(jù)準(zhǔn)備
以經(jīng)典的pbmc3k數(shù)據(jù)集為例裁蚁,獲取代碼如下:
library(Seurat)
library(SeuratData)
data("pbmc3k")
sce <- pbmc3k.final
saveRDS(sce,'pbmc3k.rds')
這個(gè)數(shù)據(jù)集已做好降維聚類(lèi)和注釋?zhuān)⑨屃袨閟eurat_annotations:
> head(sce)
orig.ident nCount_RNA nFeature_RNA seurat_annotations percent.mt
AAACATACAACCAC pbmc3k 2419 779 Memory CD4 T 3.0177759
AAACATTGAGCTAC pbmc3k 4903 1352 B 3.7935958
AAACATTGATCAGC pbmc3k 3147 1129 Memory CD4 T 0.8897363
AAACCGTGTATGCG pbmc3k 980 521 NK 1.2244898
AAACGCACTGGTAC pbmc3k 2163 781 Memory CD4 T 1.6643551
AAACGCTGACCAGT pbmc3k 2175 782 CD8 T 3.8160920
AAACGCTGGTTCTT pbmc3k 2260 790 CD8 T 3.0973451
AAACGCTGTAGCCA pbmc3k 1275 532 Naive CD4 T 1.1764706
AAACGCTGTTTCTG pbmc3k 1103 550 FCGR3A+ Mono 2.9011786
AAACTTGAAAAACG pbmc3k 3914 1112 B 2.6315789
RNA_snn_res.0.5 seurat_clusters group1 sub_celltype
AAACATACAACCAC 1 1 Memory CD4 T Memory CD4 T
AAACATTGAGCTAC 3 3 B B
AAACATTGATCAGC 1 1 Memory CD4 T Memory CD4 T
AAACCGTGTATGCG 6 6 NK NK
AAACGCACTGGTAC 1 1 Memory CD4 T Memory CD4 T
AAACGCTGACCAGT 4 4 CD8 T CD8 T
AAACGCTGGTTCTT 4 4 CD8 T CD8 T
AAACGCTGTAGCCA 0 0 Naive CD4 T Naive CD4 T
AAACGCTGTTTCTG 5 5 FCGR3A+ Mono FCGR3A+ Mono
AAACTTGAAAAACG 3 3 B B
然后按照之前的文章跑一下流程:
Rscript get_count_from_seurat.R -i pbmc3k.rds -d seurat_annotations -s 200 -l out
python create_loom_file_from_scanpy.py -i out.csv
sh pyscenic_from_loom.sh -i out.loom -n 10
Rscript calcRSS_by_scenic.R -l aucell.loom -m metadata_subset.xls -c seurat_annotations
生成文件有:
aucell.loom
ctx.csv
grn.tsv
metadata_subset.xls
out.csv
out.loom
regulon_RSS.Rdata
seurat_annotations_rss.rds
subset.rds
加載需要的包:
library(Seurat)
library(SCopeLoomR)
library(AUCell)
library(SCENIC)
library(dplyr)
library(KernSmooth)
library(RColorBrewer)
library(plotly)
library(BiocParallel)
library(pheatmap)
library(cowplot)
library(ggpubr)
library(ggsci)
library(ggplot2)
library(tidygraph)
library(ggraph)
library(stringr)
colpalettes<-unique(c(pal_npg("nrc")(10),pal_aaas("default")(10),pal_nejm("default")(8),pal_lancet("lanonc")(9),
pal_jama("default")(7),pal_jco("default")(10),pal_ucscgb("default")(26),pal_d3("category10")(10),
pal_locuszoom("default")(7),pal_igv("default")(51),
pal_uchicago("default")(9),pal_startrek("uniform")(7),
pal_tron("legacy")(7),pal_futurama("planetexpress")(12),pal_rickandmorty("schwifty")(12),
pal_simpsons("springfield")(16),pal_gsea("default")(12)))
len <- 100
incolor<-c(brewer.pal(8, "Dark2"),brewer.pal(12, "Paired"),brewer.pal(8, "Set2"),brewer.pal(9, "Set1"),colpalettes,rainbow(len))
主函數(shù)plot_pyscenic
plot_pyscenic函數(shù)參數(shù)介紹:
- inloom='aucell.loom'内狸,pyscenic得到的aucell.loom文件,可更改厘擂。
- incolor=incolor,調(diào)色板锰瘸,可自定義
- inrss='seurat_annotations_rss.rds'刽严,計(jì)算細(xì)胞類(lèi)型特異性regulons得到的_rss.rds后綴文件
- inrds='subset.rds',用于分析pyscenic的rds避凝,這里分組隨機(jī)取了200個(gè)細(xì)胞
- infun='median'舞萄,展示的熱圖用中位數(shù)還是平均值,默認(rèn)用中位數(shù)管削,平均值改成mean倒脓。
- ct.col='seurat_annotations',分組的列含思,這里用seurat_annotations
- inregulons=NULL崎弃,可以指定感興趣的regulons,如果沒(méi)有指定含潘,默認(rèn)使用前5的
- ingrn='grn.tsv'
- ntop1=5饲做,熱圖展示的活性值top regulon,默認(rèn)為前5.
- ntop2=50遏弱,網(wǎng)絡(luò)圖展示regulons調(diào)控的權(quán)重top基因盆均,默認(rèn)為前50。
plot_pyscenic <- function(inloom='aucell.loom',incolor=incolor,inrss='seurat_annotations_rss.rds',inrds='subset.rds',infun='median', ct.col='seurat_annotations',inregulons=NULL,ingrn='grn.tsv',ntop1=5,ntop2=50){
###load data
loom <- open_loom(inloom)
regulons_incidMat <- get_regulons(loom, column.attr.name="Regulons")
regulons <- regulonsToGeneLists(regulons_incidMat)
regulonAUC <- get_regulons_AUC(loom,column.attr.name='RegulonsAUC')
regulonAucThresholds <- get_regulon_thresholds(loom)
embeddings <- get_embeddings(loom)
close_loom(loom)
rss <- readRDS(inrss)
sce <- readRDS(inrds)
##calculate RSS fc
df = do.call(rbind,
lapply(1:ncol(rss), function(i){
dat= data.frame(
regulon = rownames(rss),
cluster = colnames(rss)[i],
sd.1 = rss[,i],
sd.2 = apply(rss[,-i], 1, get(infun))
)
}))
df$fc = df$sd.1 - df$sd.2
#select top regulon
ntopg <- df %>% group_by(cluster) %>% top_n(ntop1, fc)
ntopgene <- unique(ntopg$regulon)
write.table(ntopgene,'sd_regulon_RSS.list',sep='\t',quote=F,row.names=F,col.names=F)
#plot rss by cluster
#using plotRSS
rssPlot <- plotRSS(rss)
regulonsToPlot <- rssPlot$rowOrder
rp_df <- rssPlot$df
write.table(regulonsToPlot,'rss_regulon.list',sep='\t',quote=F,row.names=F,col.names=F)
write.table(rp_df,'rssPlot_data.xls',sep='\t',quote=F)
nlen <- length(regulonsToPlot)
hei <- ceiling(nlen)*0.4
blu<-colorRampPalette(brewer.pal(9,"Blues"))(100)
lgroup <- levels(rssPlot$df$cellType)
nlen2 <- length(lgroup)
wei <- nlen2*2
pdf(paste0('regulons_RSS_',ct.col,'_in_dotplot.pdf'),wei,hei)
print(rssPlot$plot)
dev.off()
# sd top gene
anrow = data.frame( group = ntopg$cluster)
lcolor <- incolor[1:length(unique(ntopg$cluster))]
names(lcolor) <- unique(anrow$group)
annotation_colors <- list(group=lcolor)
pn1 = rss[ntopg$regulon,]
pn2 = rss[unique(ntopg$regulon),]
rownames(pn1) <- make.unique(rownames(pn1))
rownames(anrow) <- rownames(pn1)
scale='row'
hei <- ceiling(length(ntopg$regulon)*0.4)
pdf(paste0('regulon_RSS_in_sd_topgene_',ct.col,'.pdf'),wei,hei)
print(
pheatmap(pn1,annotation_row = anrow,scale=scale,annotation_colors=annotation_colors,show_rownames = T,main='sd top regulons')
)
print(
pheatmap(pn2,scale=scale,show_rownames = T, main='sd top unique regulons')
)
dev.off()
#plotRSS gene
pn2 = rss[unique(rp_df$Topic),]
scale='row'
hei <- ceiling(length(unique(rp_df$Topic))*0.4)
pdf(paste0('regulon_RSS_in_plotRSS_',ct.col,'.pdf'),wei,hei)
print(
pheatmap(pn2,scale=scale,show_rownames = T, main='plotRSS unique regulons')
)
dev.off()
#all regulons
hei <- ceiling(length(rownames(rss))*0.2)
pdf(paste0('all_regulons_RSS_in_',ct.col,'.pdf'),wei,hei)
print(
pheatmap(rss,scale=scale,show_rownames = T,main='all regulons RSS')
)
dev.off()
#plot rss by all cells
if (is.null(inregulons)){
inregulons <- regulonsToPlot
}else{
inregulons <- intersect(inregulons,rownames(rss))
regulonsToPlot <- inregulons
}
pn3=as.matrix(regulonAUC@assays@data$AUC)
regulon <- rownames(pn3)
#regulon <- inregulons
pn3 <- pn3[regulon,]
#pn3 <- pn3[,sample(1:dim(pn3)[2],500)]
sce$group1=sce@meta.data[,ct.col]
meta <- sce@meta.data
meta <- meta[order(meta$group1),]
#meta <- meta[colnames(pn3),]
ancol = data.frame(meta[,c('group1')])
colnames(ancol) <- c('group1')
rownames(ancol) <- rownames(meta)
lcolor <- incolor[1:length(unique(ntopg$cluster))]
names(lcolor) <- unique(ntopg$cluster)
annotation_colors <- list(group1 =lcolor)
df1 <- ancol
df1$cell <- rownames(df1)
df1 <- df1[order(df1$group1),]
pn3 <- pn3[,rownames(df1)]
torange=c(-2,2)
pn3 <- scales::rescale(pn3,to=torange)
pn3 <- pn3[,rownames(ancol)]
scale='none'
hei <- ceiling(length(unique(regulon))*0.2)
pdf(paste0('all_regulon_activity_in_allcells.pdf'),10,hei)
print(
pheatmap(pn3,annotation_col = ancol,scale=scale,annotation_colors=annotation_colors,show_rownames = T,show_colnames = F,cluster_cols=F)
)
#pheatmap(pn3,scale=scale,show_rownames = T, show_colnames = F,cluster_cols=F)
dev.off()
#plot in seurat
regulonsToPlot = inregulons
sce$sub_celltype <- sce@meta.data[,ct.col]
sub_regulonAUC <- regulonAUC[,match(colnames(sce),colnames(regulonAUC))]
cellClusters <- data.frame(row.names = colnames(sce),
seurat_clusters = as.character(sce$seurat_clusters))
cellTypes <- data.frame(row.names = colnames(sce),
celltype = sce$sub_celltype)
sce@meta.data = cbind(sce@meta.data ,t(sub_regulonAUC@assays@data@listData$AUC[regulonsToPlot,]))
Idents(sce) <- sce$sub_celltype
nlen <- length(regulonsToPlot)
hei <- ceiling(nlen)*0.4
blu<-colorRampPalette(brewer.pal(9,"Blues"))(100)
nlen2 <- length(unique(sce$sub_celltype))
wei <- nlen2*2
pdf('regulons_activity_in_dotplot.pdf',wei,hei)
print(DotPlot(sce, features = unique(regulonsToPlot)) + coord_flip()+
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))+
scale_color_gradientn(colours = blu)
)
dev.off()
hei=ceiling(nlen/4)*4
pdf('regulons_activity_in_umap.pdf',16,hei)
print(RidgePlot(sce, features = regulonsToPlot , ncol = 4))
print(VlnPlot(sce, features = regulonsToPlot,pt.size = 0 ))
print(FeaturePlot(sce, features = regulonsToPlot))
dev.off()
grn <- read.table(ingrn,sep='\t',header=T,stringsAsFactors=F)
inregulons1=gsub('[(+)]','',inregulons)
c1 <- which(grn$TF %in% inregulons1)
grn <- grn[c1,]
#edge1 <- data.frame()
#node1 <- data.frame()
pdf(paste0(ntop2,'_regulon_netplot.pdf'),10,10)
for (tf in unique(grn$TF)) {
tmp <- subset(grn,TF==tf)
if (dim(tmp)[1] > ntop2) {
tmp <- tmp[order(tmp$importance,decreasing=T),]
tmp <- tmp[1:ntop2,]
}
node2 <- data.frame(tmp$target)
node2$node.size=1.5
node2$node.colour <- 'black'
colnames(node2) <- c('node','node.size','node.colour')
df1 <- data.frame(node=tf,node.size=2,node.colour='#FFDA00')
node2 <- rbind(df1,node2)
edge2 <- tmp
colnames(edge2) <- c('from','to','edge.width')
edge2$edge.colour <- "#1B9E77"
torange=c(0.1,1)
edge2$edge.width <- scales::rescale(edge2$edge.width,to=torange)
graph_data <- tidygraph::tbl_graph(nodes = node2, edges = edge2, directed = T)
p1 <- ggraph(graph = graph_data, layout = "stress", circular = TRUE) + geom_edge_arc(aes(edge_colour = edge.colour, edge_width = edge.width)) +
scale_edge_width_continuous(range = c(1,0.2)) +geom_node_point(aes(colour = node.colour, size = node.size))+ theme_void() +
geom_node_label(aes(label = node,colour = node.colour),size = 3.5, repel = TRUE)
p1 <- p1 + scale_color_manual(values=c('#FFDA00','black'))+scale_edge_color_manual(values=c("#1B9E77"))
print(p1)
}
dev.off()
#plot activity heatmap
meta <- sce@meta.data
celltype <- ct.col
cellsPerGroup <- split(rownames(meta),meta[,celltype])
sub_regulonAUC <- regulonAUC[onlyNonDuplicatedExtended(rownames(regulonAUC)),]
# Calculate average expression:
regulonActivity_byGroup <- sapply(cellsPerGroup,
function(cells)
rowMeans(getAUC(sub_regulonAUC)[,cells]))
scale='row'
rss <- regulonActivity_byGroup
hei <- ceiling(length(regulonsToPlot)*0.4)
pn1 <- rss[regulonsToPlot,]
pdf(paste0('regulon_activity_in_',ct.col,'.pdf'),wei,hei)
print(
pheatmap(pn1,scale=scale,show_rownames = T, main='regulons activity')
)
dev.off()
hei <- ceiling(length(rownames(rss))*0.2)
pdf(paste0('all_regulons_activity_in_',ct.col,'.pdf'),wei,hei)
print(
pheatmap(rss,scale=scale,show_rownames = T,main='all regulons activity')
)
dev.off()
##calculate fc
df = do.call(rbind,
lapply(1:ncol(rss), function(i){
dat= data.frame(
regulon = rownames(rss),
cluster = colnames(rss)[i],
sd.1 = rss[,i],
sd.2 = apply(rss[,-i], 1, get(infun))
)
}))
df$fc = df$sd.1 - df$sd.2
#select top regulon
ntopg <- df %>% group_by(cluster) %>% top_n(ntop1, fc)
ntopgene <- unique(ntopg$regulon)
write.table(ntopgene,'sd_regulon_activity.list',sep='\t',quote=F,row.names=F,col.names=F)
anrow = data.frame( group = ntopg$cluster)
lcolor <- incolor[1:length(unique(ntopg$cluster))]
names(lcolor) <- unique(anrow$group)
annotation_colors <- list(group=lcolor)
pn1 = rss[ntopg$regulon,]
pn2 = rss[unique(ntopg$regulon),]
rownames(pn1) <- make.unique(rownames(pn1))
rownames(anrow) <- rownames(pn1)
scale='row'
hei <- ceiling(length(ntopg$regulon)*0.4)
pdf(paste0('regulon_activity_in_sd_topgene_',ct.col,'.pdf'),wei,hei)
print(
pheatmap(pn1,annotation_row = anrow,scale=scale,annotation_colors=annotation_colors,show_rownames = T,main='sd top regulons')
)
print(
pheatmap(pn2,scale=scale,show_rownames = T, main='sd top unique regulons ')
)
dev.off()
}
使用方法及結(jié)果
使用示例:
plot_pyscenic(inloom='aucell.loom',incolor=incolor,inrss='seurat_annotations_rss.rds',inrds='subset.rds',infun='median', ct.col='seurat_annotations',inregulons=NULL,ingrn='grn.tsv',ntop1=5,ntop2=50)
生成文件如下:
-
所有細(xì)胞中找到的所有regulons的活性熱圖文件all_regulon_activity_in_allcells.pdf
-
分組中找到的所有regulons的活性熱圖文件all_regulons_activity_in_seurat_annotations.pdf
-
分組中找到的所有regulons的RSS分值熱圖文件
all_regulons_RSS_in_seurat_annotations.pdf
-
分組plotRSS函數(shù)篩選的regulons活性在所有細(xì)胞熱圖文件 regulon_activity_in_allcells.pdf
-
分組差值篩選的top regulons活性熱圖文件 regulon_activity_in_sd_topgene_seurat_annotations.pdf
-
分組plotRSS函數(shù)篩選的regulons活性熱圖文件 regulon_activity_in_seurat_annotations.pdf
-
基于Seurat的plotRSS函數(shù)篩選的regulons活性氣泡圖文件 regulons_activity_in_dotplot.pdf
-
基于Seurat的plotRSS函數(shù)篩選的regulons活性峰圖漱逸、小提琴圖和散點(diǎn)圖文件 regulons_activity_in_umap.pdf
-
plotRSS函數(shù)篩選的regulons調(diào)控網(wǎng)絡(luò)圖文件 50_regulon_netplot.pdf
-
分組plotRSS函數(shù)篩選的regulons RSS分值熱圖文件
-
分組差值篩選的top regulons RSS分值熱圖文件
-
分組plotRSS函數(shù)篩選的regulons RSS分值氣泡圖文件
小結(jié)與討論
這是我目前想到的可視化類(lèi)型泪姨,歡迎大家評(píng)論區(qū)補(bǔ)充。
*以上作圖參考了pySCENIC的轉(zhuǎn)錄因子分析及數(shù)據(jù)可視化的系列文章和使用pyscenic做轉(zhuǎn)錄因子分析饰抒。