Trajectory analysis series:
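Monocle3 is not fundamentally different from Monocle2. The main change is that the dimensionality reduction used for the trajectory switched from DDRTree to UMAP, presumably because the package authors consider UMAP better at reflecting the structure of the high-dimensional data.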
For the principles of pseudotime analysis, see: Trajectory inference analysis of scRNA-seq data
The principles and use of Monocle2 were covered earlier: monocle2
The three main functions of Monocle3:
1. Clustering and counting cells
2. Constructing cell trajectories
3. Differential expression analysis
Monocle3 workflow: [figure]
Official Monocle3 website: https://cole-trapnell-lab.github.io/monocle3/
1. Installation
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
# Bioconductor 3.10 pairs with R 3.6; use the Bioconductor release that matches your R version
BiocManager::install(version = "3.10")
# The dependency list changes between Monocle3 releases; check the official installation page
BiocManager::install(c('BiocGenerics', 'DelayedArray', 'DelayedMatrixStats',
                       'limma', 'S4Vectors', 'SingleCellExperiment',
                       'SummarizedExperiment', 'batchelor', 'Matrix.utils'))
install.packages("devtools")
devtools::install_github('cole-trapnell-lab/leidenbase')
devtools::install_github('cole-trapnell-lab/monocle3')
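To confirm that the installation worked, a small check like the one below lists which required packages still cannot be loaded. `missing_packages()` is a hypothetical helper written for this note, not part of Monocle3 or BiocManager; it only relies on base R's `requireNamespace()`.

```r
# Hypothetical helper: return the packages that cannot be loaded in this session
missing_packages <- function(pkgs) {
  loadable <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!loadable]
}

# Example: an empty result means every listed package is available
missing_packages(c("BiocGenerics", "SingleCellExperiment", "leidenbase", "monocle3"))
```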
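2. Data preparation: create the CDS object
Note: this example uses the pbmc3k dataset. PBMCs are all mature, differentiated immune cells with, in theory, no direct differentiation relationships between them, so the dataset is not suitable for pseudotime trajectory analysis. It is used here only as a learning demonstration.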
library(Seurat)
library(monocle3)
library(tidyverse)
library(patchwork)
rm(list=ls())
dir.create("Monocle3")
setwd("Monocle3")
## Create the CDS object from the Seurat object
pbmc <- readRDS("pbmc.rds")
# Raw counts; with Seurat v5, use layer = 'counts' instead of slot = 'counts'
data <- GetAssayData(pbmc, assay = 'RNA', slot = 'counts')
cell_metadata <- pbmc@meta.data
# Monocle3 expects the gene metadata to contain a gene_short_name column
gene_annotation <- data.frame(gene_short_name = rownames(data))
rownames(gene_annotation) <- rownames(data)
cds <- new_cell_data_set(data,
                         cell_metadata = cell_metadata,
                         gene_metadata = gene_annotation)
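`new_cell_data_set()` expects genes as rows and cells as columns, cell metadata whose rows follow the matrix columns, and gene metadata whose rows follow the matrix rows (with a `gene_short_name` column, which `plot_cells()` uses to look up genes). The sketch below checks those assumptions on toy data before a CDS is built; `check_cds_inputs()` is a hypothetical helper, and the small matrix only stands in for the Seurat counts.

```r
# Hypothetical helper: list inconsistencies between the three new_cell_data_set() inputs
check_cds_inputs <- function(expr, cell_metadata, gene_metadata) {
  problems <- character(0)
  if (!identical(colnames(expr), rownames(cell_metadata)))
    problems <- c(problems, "colnames(expr) differ from rownames(cell_metadata)")
  if (!identical(rownames(expr), rownames(gene_metadata)))
    problems <- c(problems, "rownames(expr) differ from rownames(gene_metadata)")
  if (!"gene_short_name" %in% colnames(gene_metadata))
    problems <- c(problems, "gene_metadata has no gene_short_name column")
  problems
}

# Toy inputs: 3 genes x 4 cells
expr <- matrix(rpois(12, 3), nrow = 3,
               dimnames = list(c("CD4", "CD52", "JUN"), paste0("cell", 1:4)))
cell_meta <- data.frame(cluster = c(0, 0, 1, 1), row.names = colnames(expr))
gene_meta <- data.frame(gene_short_name = rownames(expr), row.names = rownames(expr))

check_cds_inputs(expr, cell_meta, gene_meta)   # character(0): inputs are consistent
```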
3. Preprocessing
3.1 Normalization and PCA
(PCA is used for RNA-seq data; for ATAC-seq data, use Latent Semantic Indexing, method = "LSI".)
# preprocess_cds() is roughly equivalent to NormalizeData + ScaleData + RunPCA in Seurat
cds <- preprocess_cds(cds, num_dim = 50)
plot_pc_variance_explained(cds)
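As a rough picture of what `preprocess_cds()` does with the PCA method, the base-R sketch below normalizes by size factors, log-transforms with a pseudocount of 1, scales the genes and runs PCA, then computes the variance explained per PC (what `plot_pc_variance_explained()` visualizes). It is an approximation for intuition only: Monocle3 works on sparse matrices and uses irlba, and details such as the log base may differ. The size-factor formula (cell total divided by the geometric mean of all totals) is an assumption about Monocle3's default.

```r
set.seed(1)
counts <- matrix(rpois(200 * 30, lambda = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:30)))

# 1. Size factors: each cell's total count relative to the geometric mean of totals
totals <- colSums(counts)
size_factors <- totals / exp(mean(log(totals)))

# 2. Divide each cell by its size factor, then log-transform with pseudocount 1
norm <- log1p(sweep(counts, 2, size_factors, "/"))

# 3. Drop zero-variance genes, center and scale genes, and run PCA on the cells
norm <- norm[apply(norm, 1, var) > 0, ]
pca <- prcomp(t(norm), center = TRUE, scale. = TRUE, rank. = 10)

# Fraction of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
dim(pca$x)   # 30 cells x 10 PCs
```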
3.2 Visualization
- UMAP
cds <- reduce_dimension(cds,preprocess_method = "PCA") # preprocess_method defaults to PCA
plot_cells(cds)
The color_cells_by parameter sets the coloring of the UMAP plot; it can be any column of colData(cds).
colnames(colData(cds))
[1] "orig.ident" "nCount_RNA"
[3] "nFeature_RNA" "percent.mt"
[5] "RNA_snn_res.0.5" "seurat_clusters"
[7] "cell_type" "Size_Factor"
# Color by the existing Seurat clusters to compare with the Seurat clustering
p1 <- plot_cells(cds, reduction_method="UMAP", color_cells_by="seurat_clusters") + ggtitle('cds.umap')
## Replace Monocle3's UMAP coordinates with the integrated UMAP coordinates from Seurat
cds.embed <- cds@int_colData$reducedDims$UMAP
int.embed <- Embeddings(pbmc, reduction = "umap")
int.embed <- int.embed[rownames(cds.embed),]
cds@int_colData$reducedDims$UMAP <- int.embed
p2 <- plot_cells(cds, reduction_method="UMAP", color_cells_by="seurat_clusters") + ggtitle('int.umap')
p = p1|p2
ggsave("Reduction_Compare.pdf", plot = p, width = 10, height = 5)
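The coordinate swap above only works if the Seurat embedding is first reordered to the cell order of the CDS, because each row of a reducedDims matrix corresponds positionally to a column (cell) of the object; that is what `int.embed[rownames(cds.embed),]` does. A defensive version of that step can stop with a readable message when a cell is missing, as in this sketch. `align_embedding()` is a hypothetical helper, and the toy matrix just borrows Seurat's UMAP_1/UMAP_2 column names.

```r
# Hypothetical helper: reorder embedding rows to match target_cells, or fail clearly
align_embedding <- function(target_cells, embedding) {
  missing <- setdiff(target_cells, rownames(embedding))
  if (length(missing) > 0)
    stop(length(missing), " cell(s) have no coordinates in the embedding")
  embedding[target_cells, , drop = FALSE]
}

seurat_umap <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2,
                      dimnames = list(c("c1", "c2", "c3"), c("UMAP_1", "UMAP_2")))
cds_cells <- c("c3", "c1", "c2")               # cell order used by the CDS
aligned <- align_embedding(cds_cells, seurat_umap)
aligned[, "UMAP_1"]   # c3 = 3, c1 = 1, c2 = 2
```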
If the dataset is very large (more than about 10,000 cells), a few parameters of reduce_dimension() can speed up UMAP: umap.fast_sgd=TRUE switches to a fast stochastic gradient descent method, and the cores parameter enables multithreading.
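One way to turn these settings on only for large datasets is sketched below. `umap_speed_args()` and its 10,000-cell threshold are hypothetical; `umap.fast_sgd` and `cores` are the `reduce_dimension()` arguments named above. Fast SGD and multithreading in the underlying uwot package generally make the embedding not exactly reproducible from run to run, so it may be worth leaving them off when reproducibility matters.

```r
# Hypothetical helper: pick reduce_dimension() speed-up arguments by cell count
umap_speed_args <- function(n_cells, threshold = 10000, cores = 4) {
  if (n_cells > threshold) {
    list(umap.fast_sgd = TRUE, cores = cores)
  } else {
    list(umap.fast_sgd = FALSE, cores = 1)
  }
}

str(umap_speed_args(50000))

# Usage with Monocle3 (not run here):
# cds <- do.call(reduce_dimension,
#                c(list(cds, preprocess_method = "PCA"), umap_speed_args(ncol(cds))))
```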
Visualize selected genes
genes_to_plot <- c("CD4","CD52","JUN")
plot_cells(cds,
           genes=genes_to_plot,
           label_cell_groups=FALSE,
           show_trajectory_graph=FALSE)
- tSNE can also be used
cds <- reduce_dimension(cds, reduction_method="tSNE")
plot_cells(cds, reduction_method="tSNE", color_cells_by="seurat_clusters")
- Monocle3 could then also be used to cluster the cells, identify marker genes for each cluster and annotate the cells. Since the data were already annotated in Seurat, these steps are not repeated with Monocle3 here.
plot_cells(cds, reduction_method="UMAP", color_cells_by="cell_type")
4. Cluster your cells
Here the clustering mainly serves to define partitions: cells in different partitions get separate trajectories.
cds <- cluster_cells(cds)
plot_cells(cds, color_cells_by = "partition")
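Partitions are groups of cells that are well separated from one another. Monocle3 determines them with a statistical test of connectivity between clusters (controlled by the `partition_qval` argument of `cluster_cells()`), so the breadth-first search below is only a simplified analogy: it labels the connected components of a small, made-up neighbor graph. `connected_components()` is a hypothetical helper for this illustration. If a single trajectory across all partitions is wanted, `learn_graph()` has a `use_partition` argument.

```r
# Simplified analogy, not Monocle3's algorithm: label the connected
# components of a graph given as a symmetric adjacency matrix (BFS)
connected_components <- function(adj) {
  n <- nrow(adj)
  label <- integer(n)                # 0 means "not visited yet"
  current <- 0L
  for (start in seq_len(n)) {
    if (label[start] != 0L) next
    current <- current + 1L
    label[start] <- current
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nbrs <- which(adj[v, ] != 0 & label == 0L)   # unvisited neighbors
      label[nbrs] <- current
      queue <- c(queue, nbrs)
    }
  }
  label
}

# Cells 1-3 are linked, cells 4-5 are linked, no edge between the groups
adj <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(2, 3), c(4, 5))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1
connected_components(adj)   # 1 1 1 2 2
```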
5. Construct the cell trajectory
5.1 Learn the trajectory graph (with learn_graph())
## Identify the trajectory
cds <- learn_graph(cds)
p = plot_cells(cds, color_cells_by = "cell_type", label_groups_by_cluster=FALSE,
               label_leaves=FALSE, label_branch_points=FALSE)
ggsave("Trajectory.pdf", plot = p, width = 8, height = 6)
上面這個圖將被用于許多下游分析登淘,比如分支分析和差異表達分析。
plot_cells(cds, color_cells_by = "cell_type", label_groups_by_cluster=FALSE,
+ label_leaves=TRUE, label_branch_points=TRUE,graph_label_size=1.5)
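The black lines show the structure of the graph. Numbers in white circles mark the different outcomes (fates), i.e. the leaves. Numbers in black circles mark branch points, from which cells can proceed to more than one outcome. Whether these numbers are shown is controlled by the label_leaves and label_branch_points parameters.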
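5.2 Order cells in pseudotime
Once the graph has been learned, the cells can be ordered by their progress along the learned developmental trajectory (pseudotime).
To order the cells, we first have to tell Monocle where the process starts, i.e. specify the 'roots' of the trajectory.
- Select the root manually
# Workaround for the order_cells(cds) error "object 'V1' not found":
# rownames(cds@principal_graph_aux[["UMAP"]]$dp_mst) <- NULL
# colnames(cds@int_colData@listData$reducedDims@listData$UMAP) <- NULL
# Without root_pr_nodes or root_cells, order_cells() opens an interactive window for picking the root node(s)
cds <- order_cells(cds)
p = plot_cells(cds, color_cells_by = "pseudotime", label_cell_groups = FALSE,
               label_leaves = FALSE, label_branch_points = FALSE)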
ggsave("Trajectory_Pseudotime.pdf", plot = p, width = 8, height = 6)
saveRDS(cds, file = "cds.rds")
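Conceptually, pseudotime is the distance a cell has travelled along the learned graph from the root. The sketch below computes shortest-path distances from a root with Dijkstra's algorithm on a toy weighted graph; it is not Monocle3's code (which also projects each cell onto the graph), but it shows why cells in a partition that contains no root end up with infinite pseudotime. `dijkstra()` is a hypothetical helper. To avoid the interactive window, `order_cells()` also accepts `root_cells` or `root_pr_nodes`.

```r
# Shortest-path distance from `root` on a symmetric weight matrix (0 = no edge).
# Nodes that cannot be reached keep distance Inf.
dijkstra <- function(w, root) {
  n <- nrow(w)
  dist <- rep(Inf, n)
  done <- rep(FALSE, n)
  dist[root] <- 0
  repeat {
    candidates <- which(!done & is.finite(dist))
    if (length(candidates) == 0) break
    u <- candidates[which.min(dist[candidates])]   # closest unfinished node
    done[u] <- TRUE
    for (v in which(w[u, ] > 0 & !done)) {
      dist[v] <- min(dist[v], dist[u] + w[u, v])
    }
  }
  dist
}

# Toy graph: nodes 1-2-3 form a chain; node 4 sits in a separate partition
w <- matrix(0, 4, 4)
w[1, 2] <- w[2, 1] <- 1.5
w[2, 3] <- w[3, 2] <- 2
dijkstra(w, root = 1)   # 0.0 1.5 3.5 Inf
```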
6. Differential expression analysis
There are two approaches for differential analysis in Monocle:
- Regression analysis: using fit_models(), you can evaluate whether each gene depends on variables such as time, treatments, etc.
- Graph-autocorrelation analysis: using graph_test(), you can find genes that vary over a trajectory or between clusters.
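As an illustration of the regression approach, the sketch below simulates counts for one gene whose mean rises with pseudotime and fits a quasi-Poisson GLM with base R's `glm()`. `fit_models()` fits per-gene models of this kind (quasi-Poisson is, as far as I know, its default `expression_family`), e.g. `fit_models(cds, model_formula_str = "~pseudotime")` once pseudotime has been stored as a column of `colData(cds)`. The simulated data and the variable name `ptime` are assumptions for this example.

```r
set.seed(42)
ptime <- runif(200, min = 0, max = 10)               # simulated pseudotime per cell
y <- rpois(200, lambda = exp(0.2 + 0.15 * ptime))    # counts whose mean rises with ptime

# Quasi-Poisson regression of expression on pseudotime
fit <- glm(y ~ ptime, family = quasipoisson())
coef(summary(fit))["ptime", ]   # positive slope with a very small p-value
```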
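6.1 Find genes that change along the pseudotime trajectory
# The key output of graph_test() is Moran's I (morans_I), which ranges from -1 to 1.
# A value near 0 means the gene shows no spatial autocorrelation; a value near 1 means
# cells that are close to each other have highly similar expression of the gene.
Track_genes <- graph_test(cds, neighbor_graph="principal_graph", cores=6)
# Reorder columns so gene_short_name comes first (indices assume it is the only gene
# metadata column), then keep genes with q_value < 1e-3
Track_genes <- Track_genes[,c(5,2,3,4,1,6)] %>% filter(q_value < 1e-3)
write.csv(Track_genes, "Trajectory_genes.csv", row.names = F)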
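For one gene with expression values x on a neighbor graph with weights w_ij (summing to W over N cells), Moran's I is I = (N / W) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2. The base-R function below implements that formula on a toy five-cell chain to show what high and low values mean. As far as I know, Monocle3 computes the statistic and its p-value through the spdep package, so this is only an illustration, not its implementation.

```r
# Moran's I of values x on a graph with symmetric weight matrix w
morans_i <- function(x, w) {
  z <- x - mean(x)                               # deviations from the mean
  n <- length(x)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Five cells on a chain: 1 - 2 - 3 - 4 - 5
path <- matrix(0, 5, 5)
for (i in 1:4) path[i, i + 1] <- path[i + 1, i] <- 1

morans_i(c(1, 2, 3, 4, 5), path)    # 0.5: smooth gradient, neighbors are alike
morans_i(c(1, -1, 1, -1, 1), path)  # -1: neighbors alternate
```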
6.2 Plot the top 10 genes
Track_genes_sig <- Track_genes %>% top_n(n=10, morans_I) %>%
  pull(gene_short_name) %>% as.character()
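The same filtering and ranking can be written in base R, shown here on a made-up table that uses the `graph_test()` column names from above. Note that dplyr's `top_n()` keeps ties (so it can return more than 10 genes) and is superseded by `slice_max()` in dplyr 1.0 and later, where `with_ties = FALSE` gives exactly n rows; sorting and taking `head()` as below also returns exactly n genes.

```r
# Made-up values with the column names of graph_test() output
res <- data.frame(
  gene_short_name = paste0("gene", 1:5),
  morans_I = c(0.42, 0.38, 0.55, 0.12, 0.30),
  q_value  = c(1e-8, 1e-6, 1e-12, 0.02, 1e-5),
  stringsAsFactors = FALSE
)

sig <- res[res$q_value < 1e-3, ]                          # significant genes only
ranked <- sig[order(sig$morans_I, decreasing = TRUE), ]   # highest Moran's I first
top_genes <- head(ranked$gene_short_name, 3)
top_genes   # "gene3" "gene1" "gene2"
```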
Gene expression trends along pseudotime
p <- plot_genes_in_pseudotime(cds[Track_genes_sig,], color_cells_by="seurat_clusters",
                              min_expr=0.5, ncol = 2)
ggsave("Genes_Jitterplot.pdf", plot = p, width = 8, height = 6)
FeaturePlot-style plots
p <- plot_cells(cds, genes=Track_genes_sig, show_trajectory_graph=FALSE,
                label_cell_groups=FALSE, label_leaves=FALSE)
# Arrange the gene panels in 5 columns
p$facet$params$ncol <- 5
ggsave("Genes_Featureplot.pdf", plot = p, width = 20, height = 8)
Find co-expressed gene modules
Track_genes <- read.csv("Trajectory_genes.csv")
genelist <- pull(Track_genes, gene_short_name) %>% as.character()
gene_module <- find_gene_modules(cds[genelist,], resolution=1e-1, cores = 6)
write.csv(gene_module, "Genes_Module.csv", row.names = F)
cell_group <- tibble::tibble(cell=row.names(colData(cds)),
                             cell_group=colData(cds)$seurat_clusters)
agg_mat <- aggregate_gene_expression(cds, gene_module, cell_group)
row.names(agg_mat) <- stringr::str_c("Module ", row.names(agg_mat))
p <- pheatmap::pheatmap(agg_mat, scale="column", clustering_method="ward.D2")
ggsave("Genes_Module.pdf", plot = p, width = 8, height = 8)
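`aggregate_gene_expression()` collapses the matrix to one value per module and cell group. Under my reading of its defaults (genes summed within a module, cells averaged within a group, computed on normalized values and then scaled), the core of that step looks like the base-R sketch below. The toy matrix, module assignment and group labels are made up, and normalization is skipped for brevity. `pheatmap(scale = "column")` then z-scores each column (cell group), which base R's `scale()` reproduces.

```r
expr <- matrix(c(1, 2, 0, 4,
                 3, 0, 1, 2,
                 0, 5, 2, 1), nrow = 3, byrow = TRUE,
               dimnames = list(c("gA", "gB", "gC"), paste0("cell", 1:4)))
gene_module <- c(gA = "1", gB = "1", gC = "2")                         # gene -> module
cell_group  <- c(cell1 = "T", cell2 = "T", cell3 = "B", cell4 = "B")   # cell -> group

# 1. Sum the genes of each module: rows = modules, columns = cells
module_by_cell <- rowsum(expr, group = gene_module[rownames(expr)])

# 2. Average the cells of each group: rows = modules, columns = groups
agg <- sapply(split(colnames(expr), cell_group[colnames(expr)]),
              function(cells) rowMeans(module_by_cell[, cells, drop = FALSE]))
rownames(agg) <- paste("Module", rownames(agg))
agg   # Module 1: B = 3.5, T = 3.0; Module 2: B = 1.5, T = 2.5

# 3. Column-wise z-scores, as shown by pheatmap(scale = "column")
scale(agg)
```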
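Extract the pseudotime results and add them back to the Seurat object
pseudotime <- pseudotime(cds, reduction_method = 'UMAP')
pseudotime <- pseudotime[rownames(pbmc@meta.data)]
# Cells not reachable from the chosen root(s), e.g. cells in other partitions, get
# infinite pseudotime, which FeaturePlot cannot draw; set those values to NA first.
pseudotime[is.infinite(pseudotime)] <- NA
pbmc$pseudotime <- pseudotime
p = FeaturePlot(pbmc, reduction = "umap", features = "pseudotime")
ggsave("Pseudotime_Seurat.pdf", plot = p, width = 8, height = 6)
saveRDS(pbmc, file = "sco_pseudotime.rds")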