同系列文章:
- sc-RAN-seq 數(shù)據(jù)分析||Seurat新版教程:Guided Clustering Tutorial
- sc-RAN-seq 數(shù)據(jù)分析||Seurat新版教程: Integrating datasets to learn cell-type specific responses
- sc-RAN-seq 數(shù)據(jù)分析||Seurat新版教程: Using sctransform in Seurat
- 單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)分析||Seurat新版教程:Differential expression testing
- 單細(xì)胞轉(zhuǎn)錄組 數(shù)據(jù)分析||Seurat新版教程:New data visualization methods in v3.0
- 單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)分析||Seurat并行策略
- Seurat Weekly NO.0 || 開刊詞
- Seurat Weekly NO.1 || 到底分多少個(gè)群是合適的茄茁?颂斜!
- Seurat Weekly NO.2 || 我該如何取子集
- 你到底想要什么樣的umap/tsne圖?
- scRNA-seq擬時(shí)分析 || Monocle2 踩坑教程
- scRNA-seq數(shù)據(jù)分析 || Monocle3
關(guān)于擬時(shí)分析的一些基本知識(shí)請(qǐng)參看以下兩篇及其參考文章:
更多詳細(xì)內(nèi)容請(qǐng)參考monocle3官網(wǎng)江滨,本文僅為個(gè)人學(xué)習(xí)筆記庸论。
在發(fā)育過程中职辅,細(xì)胞對(duì)刺激作出反應(yīng),并在整個(gè)生命過程中聂示,從一種功能“狀態(tài)”過渡到另一種功能“狀態(tài)”域携。不同狀態(tài)的細(xì)胞表達(dá)不同的基因,產(chǎn)生蛋白質(zhì)和代謝物的動(dòng)態(tài)重復(fù)序列鱼喉,從而完成它們的工作秀鞭。當(dāng)細(xì)胞在狀態(tài)之間移動(dòng)時(shí),它們經(jīng)歷一個(gè)轉(zhuǎn)錄重組的過程扛禽,一些基因被沉默锋边,另一些基因被激活。這些瞬時(shí)狀態(tài)通常很難描述编曼,因?yàn)樵诟€(wěn)定的端點(diǎn)狀態(tài)之間純化細(xì)胞可能是困難的或不可能的豆巨。單細(xì)胞RNA-Seq可以使您在不需要純化的情況下看到這些狀態(tài)。然而掐场,要做到這一點(diǎn)往扔,我們必須確定每個(gè)cell在可能的狀態(tài)范圍內(nèi)的位置。
Monocle介紹了利用RNA-Seq進(jìn)行單細(xì)胞軌跡分析的策略熊户。Monocle不是通過實(shí)驗(yàn)將細(xì)胞純化成離散狀態(tài)萍膛,而是使用一種算法來學(xué)習(xí)每個(gè)細(xì)胞必須經(jīng)歷的基因表達(dá)變化序列,作為動(dòng)態(tài)生物學(xué)過程的一部分嚷堡。一旦它了解了基因表達(dá)變化的整體“軌跡”卦羡,Monocle就可以將每個(gè)細(xì)胞置于軌跡中的適當(dāng)位置。然后,您可以使用Monocle的微分分析工具包來查找在軌跡過程中受到調(diào)控的基因绿饵,如查找作為偽時(shí)間函數(shù)變化的基因一節(jié)所述欠肾。如果這個(gè)過程有多個(gè)結(jié)果,Monocle將重建一個(gè)“分支”軌跡拟赊。這些分支與細(xì)胞的“決策”相對(duì)應(yīng)刺桃,Monocle提供了強(qiáng)大的工具來識(shí)別受它們影響的基因,并參與這些基因的形成吸祟。在分析單細(xì)胞軌跡中的分支的小節(jié)中瑟慈,您可以看到如何分析分支。
monocle 能做的不只是擬時(shí)分析屋匕,或者說為了做擬時(shí)分析他也做了sc-rna-seq的基本分析流程:數(shù)據(jù)讀入葛碧,均一化,降維(PCA过吻,umap,tsne,)进泼,聚類,marker基因篩選以及可視化函數(shù)纤虽。在新的學(xué)習(xí)中我們發(fā)現(xiàn)monocle能做的遠(yuǎn)不只這些乳绕,例如用shiny開發(fā)了web程序,更加用戶友好逼纸;借助garnett包可以做細(xì)胞定義-----monocle已經(jīng)是一個(gè)sc-rna-seq數(shù)據(jù)分析的工具箱洋措。
library(monocle3)
packageVersion("monocle3")
[1] ‘0.1.1’
monocle3是新的開始大部分的函數(shù)名稱都改變了,連版本號(hào)都從零開始杰刽。
多種格式的數(shù)據(jù)讀入菠发,這里我們依然是從seurat3對(duì)象中讀入:
#Generate a cell_data_set
#expression_matrix <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_expression.rds"))
#cell_metadata <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_colData.rds"))
#gene_annotation <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_rowData.rds"))
#cds <- new_cell_data_set(expression_matrix,
# cell_metadata = cell_metadata,
# gene_metadata = gene_annotation)
讀10X數(shù)據(jù),cellranger V2 V3 均可:
#Generate a cell_data_set from 10X output
# Provide the path to the Cell Ranger output.
#cds <- load_cellranger_data("D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\V2")
setwd("D:\\Users\\Administrator\\Desktop\\Novo周運(yùn)來\\SingleCell\\scrna_tools")
#library(monocle)
#<-importCDS(pbmc,import_all = TRUE)
#Load Seurat object
pbmc <- readRDS('D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19pbmc_tutorial.rds')
#Extract data, phenotype data, and feature data from the SeuratObject
data <- as(as.matrix(pbmc@assays$RNA@counts), 'sparseMatrix')
pd <- pbmc@meta.data
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))
colnames(pd)
[1] "orig.ident" "nCount_RNA" "nFeature_RNA" "percent.mt" "RNA_snn_res.0.5" "seurat_clusters"
創(chuàng)建cds對(duì)象,注意這里每一項(xiàng)是什么。
#Construct monocle cds
cds <- new_cell_data_set(data,
cell_metadata = pd,
gene_metadata = fData)
> cds
class: cell_data_set
dim: 13714 2638
metadata(1): cds_version
assays(1): counts
rownames(13714): AL627309.1 AP006222.2 ... PNRC2.1 SRSF10.1
rowData names(1): gene_short_name
colnames(2638): AAACATACAACCAC AAACATTGAGCTAC ... TTTGCATGAGAGGC TTTGCATGCCTCAC
colData names(10): orig.ident nCount_RNA ... cell_type cluster_ext_type
reducedDimNames(2): PCA UMAP
spikeNames(0):
降維
- 標(biāo)準(zhǔn)化
Monocle3中的大多數(shù)分析(包括軌跡推斷和聚類)都需要各種規(guī)范化和預(yù)處理步驟贺嫂。preprocess_cds執(zhí)行并存儲(chǔ)這些預(yù)處理步驟滓鸠。
具體地說,根據(jù)選擇的參數(shù)涝婉,preprocess_cds首先根據(jù)日志和因子大小對(duì)數(shù)據(jù)進(jìn)行規(guī)范化(normalizes)哥力,以處理深度差異蔗怠,或者只根據(jù)大小因子進(jìn)行規(guī)范化墩弯。有三種方法: PCA or LSI. For LS,默認(rèn)是PCA寞射。接下來渔工,preprocess_cds計(jì)算一個(gè)較低的維度空間,該空間將用作進(jìn)一步降維的輸入桥温,如tSNE和UMAP引矩。
#Pre-process the data
cds = preprocess_cds(cds, num_dim = 100)
?preprocess_cds
plot_pc_variance_explained(cds)
可視化降維
- UMAP
#Reduce dimensionality and visualize the cells
cds = reduce_dimension(cds) #Monocle uses UMAP by default
plot_cells(cds)
#No trajectory to plot. Has learn_graph() been called yet?
head(rownames(cds))
"AL627309.1" "AP006222.2" "RP11-206L10.2" "RP11-206L10.9" "LINC00115" "NOC2L"
plot_cells(cds, genes=c("S100A9", "RPS27", "FCER1A", "GNLY"))
- tSNE
cds = reduce_dimension(cds, reduction_method="tSNE")
plot_cells(cds, reduction_method="tSNE", color_cells_by="seurat_clusters")
可以自己設(shè)置分組看看是否有批次效應(yīng)。
#Check for and remove batch effects
plot_cells(cds, color_cells_by="seurat_clusters", label_cell_groups=FALSE)
聚類
無監(jiān)督聚類是許多單細(xì)胞表達(dá)工作中的一個(gè)常見步驟。在包含多種細(xì)胞類型的實(shí)驗(yàn)中旺韭,每個(gè)cluxter可能對(duì)應(yīng)不同的細(xì)胞類型氛谜。該函數(shù)接受cell_data_set作為輸入,使用Louvain community detection對(duì)細(xì)胞進(jìn)行聚類区端,并返回一個(gè)cell_data_set值漫,其中包含內(nèi)部存儲(chǔ)的聚類結(jié)果。
#Group cells into clusters
#library(reticulate)
#use_python("E:\\conda\\python.exe")
#reticulate::py_install("louvain")
cds = cluster_cells(cds, python_home = "E:\\conda\\python",verbose = T,resolution=c(10^seq(-6,-1)))
Running louvain clustering algorithm ...
Run kNN based graph clustering starts:
-Input data of 2638 rows and 2 columns
-k is set to 20
Finding nearest neighbors...DONE ~ 0.129999999997381 s
Compute jaccard coefficient between nearest-neighbor sets ...DONE ~ 0.129999999997381 s
Build undirected graph from the weighted links ...DONE ~ 0.14 s
Run louvain clustering on the graph ...
Running louvain iteration 1 ...
Current iteration is 1; current resolution is 1e-06; Modularity is 0.547843095426104; Number of clusters are 3
Current iteration is 1; current resolution is 1e-05; Modularity is 0.547843095426104; Number of clusters are 3
Current iteration is 1; current resolution is 1e-04; Modularity is 0.547843095426104; Number of clusters are 3
Current iteration is 1; current resolution is 0.001; Modularity is 0.723545062987134; Number of clusters are 5
Current iteration is 1; current resolution is 0.01; Modularity is 0.865973276843872; Number of clusters are 13
Current iteration is 1; current resolution is 0.1; Modularity is 0.822758898433183; Number of clusters are 49
Maximal modularity is 0.865973276843872; corresponding resolution is 0.01
Run kNN based graph clustering DONE, totally takes 2.71215486526489 s.
-Number of clusters: 13
plot_cells(cds,graph_label_size=15,cell_size=1.5)
尋找marker 基因
#Find marker genes expressed by each cluster
marker_test_res = top_markers(cds, group_cells_by="seurat_clusters", reference_cells=1000, cores=8)
library(dplyr)
top_specific_markers = marker_test_res %>%
filter(fraction_expressing >= 0.10) %>%
group_by(cell_group) %>%
top_n(1, pseudo_R2)
top_specific_marker_ids = unique(top_specific_markers %>% pull(gene_id))
plot_genes_by_group(cds,
top_specific_marker_ids,
group_cells_by="seurat_clusters",
ordering_type="maximal_on_diag",
max.size=3)
top_specific_markers = marker_test_res %>%
filter(fraction_expressing >= 0.10) %>%
group_by(cell_group) %>%
top_n(3, pseudo_R2)
top_specific_marker_ids = unique(top_specific_markers %>% pull(gene_id))
plot_genes_by_group(cds,
top_specific_marker_ids,
group_cells_by="seurat_clusters",
ordering_type="cluster_row_col",
max.size=3)
已知細(xì)胞類型
#Annotate your cells according to type
colData(cds)$assigned_cell_type=as.character(clusters(cds))
colData(cds)$assigned_cell_type = dplyr::recode(colData(cds)$assigned_cell_type,
"1"="Body wall muscle",
"2"="Germline","3"="Unclassified neurons","4"="Seam cells",
"5"="Coelomocytes",
"6"="Pharyngeal epithelia",
"7"="Vulval precursors",
"8"="Non-seam hypodermis",
"9"="Intestinal/rectal muscle",
"10"="Touch receptor neurons",
"11"="Unclassified neurons",
"12"="flp-1(+) interneurons",
"13"="Canal associated neurons")
plot_cells(cds, group_cells_by="cluster", color_cells_by="assigned_cell_type",group_label_size=4,cell_size=1.5)
手動(dòng)選擇細(xì)胞
cds_subset = choose_cells(cds)
#Warning: package ‘shiny’ was built under R version 3.5.3
軌跡推斷
Monocle3的目的是在實(shí)驗(yàn)中了解細(xì)胞是如何通過一個(gè)基因表達(dá)變化的生物程序進(jìn)行轉(zhuǎn)化的织盼。每個(gè)細(xì)胞都可以看作是高維空間中的一個(gè)點(diǎn)杨何,每個(gè)維描述了不同基因的表達(dá)。識(shí)別基因表達(dá)變化的程序相當(dāng)于學(xué)習(xí)細(xì)胞在這個(gè)空間中遵循的軌跡沥邻。然而危虱,分析中維度越多,學(xué)習(xí)軌跡就越困難唐全。
幸運(yùn)的是埃跷,許多基因通常彼此共存,因此可以使用各種不同的算法來降低數(shù)據(jù)的維數(shù)芦瘾。Monocle3通過reduce_dimension提供了兩種不同的降維算法(UMAP和tSNE)捌蚊。兩者都使用cell_data_set對(duì)象和一些允許用于縮小空間的維度。您還可以提供一個(gè)模型公式近弟,指示從數(shù)據(jù)中“減去”的一些變量(例如批ID或其他技術(shù)因素)缅糟,這樣它就不會(huì)對(duì)軌跡產(chǎn)生影響。
函數(shù)learn_graph是繼preprocess_cds祷愉、reduce_dimensions和cluster_cells之后的軌跡構(gòu)建過程的第四個(gè)步驟窗宦。在learn_graph之后,通常調(diào)用order_cells二鳄。
> cds <- learn_graph(cds)
|===============================================================================================================================================| 100%
|===============================================================================================================================================| 100%
|===============================================================================================================================================| 100%
> heand(colData(cds))
> head(colData(cds))
DataFrame with 6 rows and 11 columns
orig.ident nCount_RNA nFeature_RNA percent.mt RNA_snn_res.0.5 seurat_clusters Size_Factor garnett_cluster cell_type cluster_ext_type assigned_cell_type
<factor> <numeric> <integer> <numeric> <factor> <factor> <numeric> <logical> <character> <character> <character>
AAACATACAACCAC pbmc3k 2419 779 3.01777594047127 1 1 1.10763503821696 NA T cells CD4 T cells Coelomocytes
AAACATTGAGCTAC pbmc3k 4903 1352 3.79359575769937 3 3 2.24503290300858 NA Unknown B cells Unclassified neurons
AAACATTGATCAGC pbmc3k 3147 1129 0.889736256752463 1 1 1.44097869585315 NA CD4 T cells CD4 T cells Seam cells
AAACCGTGCTTCCG pbmc3k 2639 960 1.74308450170519 2 2 1.20837075893119 NA Monocytes Monocytes flp-1(+) interneurons
AAACCGTGTATGCG pbmc3k 980 521 1.22448979591837 6 6 0.44873184681795 NA NK cells NK cells Intestinal/rectal muscle
AAACGCACTGGTAC pbmc3k 2163 781 1.66435506241331 1 1 0.99041529047676 NA T cells CD4 T cells Seam cells
plot_cells(cds,
color_cells_by = "assigned_cell_type",
label_groups_by_cluster=FALSE,
label_leaves=FALSE,
label_branch_points=FALSE,
group_label_size=4,cell_size=1.5)
選擇root
#Order the cells in pseudotime
cds = order_cells(cds)
plot_cells(cds,
color_cells_by = "pseudotime",
label_cell_groups=FALSE,
label_leaves=TRUE,
label_branch_points=TRUE,
graph_label_size=1.5,
group_label_size=4,cell_size=1.5)
# a helper function to identify the root principal points:
get_earliest_principal_node <- function(cds, time_bin="Body wall muscle"){
cell_ids <- which(colData(cds)[, "assigned_cell_type"] == time_bin)
closest_vertex <-cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
root_pr_nodes <-
igraph::V(principal_graph(cds)[["UMAP"]])$name[as.numeric(names(which.max(table(closest_vertex[cell_ids,]))))]
root_pr_nodes
}
cds = order_cells(cds, root_pr_nodes=get_earliest_principal_node(cds))
plot_cells(cds,
color_cells_by = "pseudotime",
label_cell_groups=FALSE,
label_leaves=FALSE,
label_branch_points=FALSE,
graph_label_size=1.5,
group_label_size=4,cell_size=1.5)
3D 圖
#Working with 3D trajectories
cds_3d = reduce_dimension(cds, max_components = 3)
cds_3d = cluster_cells(cds_3d)
cds_3d = learn_graph(cds_3d)
cds_3d = order_cells(cds_3d, root_pr_nodes=get_earliest_principal_node(cds))
cds_3d_plot_obj = plot_cells_3d(cds_3d, color_cells_by="assigned_cell_type")
cds_3d_plot_obj
單個(gè)基因的擬時(shí)軌跡
AFD_genes = c("S100A9", "RPS27", "FCER1A")
AFD_lineage_cds = cds[AFD_genes,
clusters(cds) %in% c(11, 12, 5)]
plot_genes_in_pseudotime(AFD_lineage_cds,
color_cells_by="pseudotime",
min_expr=0.5)
細(xì)胞定義
setwd("D:\\Users\\Administrator\\Desktop\\Novo周運(yùn)來\\SingleCell\\scrna_tools")
library(garnett)
load("hsPBMC")# 下載好的訓(xùn)練模型
library(org.Hs.eg.db)
cds <- classify_cells(cds, hsPBMC,
db = org.Hs.eg.db,
cluster_extend = TRUE,
cds_gene_id_type = "SYMBOL")
> head(pData(cds))
DataFrame with 6 rows and 11 columns
orig.ident nCount_RNA nFeature_RNA percent.mt RNA_snn_res.0.5 seurat_clusters Size_Factor garnett_cluster cell_type cluster_ext_type assigned_cell_type
<factor> <numeric> <integer> <numeric> <factor> <factor> <numeric> <logical> <character> <character> <character>
AAACATACAACCAC pbmc3k 2419 779 3.01777594047127 1 1 1.10763503821696 NA T cells CD4 T cells Coelomocytes
AAACATTGAGCTAC pbmc3k 4903 1352 3.79359575769937 3 3 2.24503290300858 NA Unknown B cells Unclassified neurons
AAACATTGATCAGC pbmc3k 3147 1129 0.889736256752463 1 1 1.44097869585315 NA CD4 T cells CD4 T cells Seam cells
AAACCGTGCTTCCG pbmc3k 2639 960 1.74308450170519 2 2 1.20837075893119 NA Monocytes Monocytes flp-1(+) interneurons
AAACCGTGTATGCG pbmc3k 980 521 1.22448979591837 6 6 0.44873184681795 NA NK cells NK cells Intestinal/rectal muscle
AAACGCACTGGTAC pbmc3k 2163 781 1.66435506241331 1 1 0.99041529047676 NA T cells CD4 T cells Seam cells
> table(pData(cds)$cell_type)
B cells CD34+ CD4 T cells CD8 T cells Dendritic cells Monocytes NK cells T cells Unknown
297 3 563 147 27 421 252 345 583
plot_cells(cds,
group_cells_by="partition",
color_cells_by="cell_type",
cell_size=1.5,
group_label_size=5)
回歸分析
在本節(jié)中赴涵,我們將探索如何使用Monocle來發(fā)現(xiàn)根據(jù)幾種不同標(biāo)準(zhǔn)表達(dá)不同的基因。根據(jù)分析的復(fù)雜程度订讼,對(duì)cell_data_set對(duì)象中的所有基因執(zhí)行差異表達(dá)分析可能需要幾分鐘到幾個(gè)小時(shí)髓窜。為了讓這個(gè)小插曲簡單而快速,我們將使用一組小的基因欺殿。不過寄纵,請(qǐng)放心,Monocle即使在大型實(shí)驗(yàn)中也可以分析數(shù)千個(gè)基因脖苏,這對(duì)于在您正在研究的生物學(xué)過程中發(fā)現(xiàn)動(dòng)態(tài)調(diào)控基因非常有用程拭。
Monocle中的差異分析工具非常靈活。Monocle的工作原理是為每個(gè)基因擬合一個(gè)回歸模型棍潘。您可以指定此模型來考慮實(shí)驗(yàn)中的各種因素(時(shí)間恃鞋、治療等)崖媚。例如,在胚胎數(shù)據(jù)中恤浪,細(xì)胞是在不同的時(shí)間點(diǎn)收集的畅哑。我們可以通過先對(duì)每個(gè)基因擬合一個(gè)廣義線性模型來測(cè)試上述基因的表達(dá)是否會(huì)隨著時(shí)間發(fā)生變化:
ciliated_genes = top_specific_marker_ids[0:4]
cds_subset = cds[rowData(cds)$gene_short_name %in% ciliated_genes,]
gene_fits = fit_models(cds_subset, model_formula_str = "~seurat_clusters")
# A tibble: 4 x 4
gene_short_name model model_summary status
<fct> <list> <list> <chr>
1 S100A9 <speedglm> <smmry.sp> OK
2 S100A8 <speedglm> <smmry.sp> OK
3 RPS27 <speedglm> <smmry.sp> OK
4 FCER1A <speedglm> <smmry.sp> OK
gene_fits是一個(gè)包含每個(gè)基因一行的tibble。模型列包含廣義線性模型對(duì)象水由,每個(gè)模型對(duì)象都旨在使用上面的方程解釋基因在細(xì)胞間的表達(dá)敢课。參數(shù)model_formula a_str應(yīng)該是指定模型公式的字符串。您在測(cè)試中使用的模型公式可以包含colData表中作為列存在的任何術(shù)語绷杜,包括Monocle在其他分析步驟中添加的那些列直秆。例如,如果使用cluster_cells鞭盟,可以使用cluster或partition(分別)作為模型公式來測(cè)試集群和分區(qū)之間不同的基因圾结。您還可以包含多個(gè)變量,例如~胚齿诉。時(shí)間+批處理筝野,這對(duì)于減去不需要的效果非常有用。
> fit_coefs = coefficient_table(gene_fits)
> fit_coefs
# A tibble: 36 x 10
gene_short_name status term estimate std_err test_val p_value normalized_effect model_component q_value
<fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 S100A9 OK (Intercept) -1.78 0.196 -9.08 2.07e- 19 0 count 2.07e- 19
2 S100A9 OK seurat_clusters1 0.129 0.295 0.437 6.62e- 1 0.176 count 1.00e+ 0
3 S100A9 OK seurat_clusters2 5.16 0.196 26.3 2.36e-135 7.36 count 7.08e-135
4 S100A9 OK seurat_clusters3 0.0943 0.330 0.286 7.75e- 1 0.129 count 1.00e+ 0
5 S100A9 OK seurat_clusters4 0.00262 0.369 0.0071 9.94e- 1 0.00357 count 9.94e- 1
6 S100A9 OK seurat_clusters5 2.80 0.220 12.7 3.82e- 36 3.96 count 1.15e- 35
7 S100A9 OK seurat_clusters6 -0.223 0.503 -0.443 6.58e- 1 -0.301 count 1.00e+ 0
8 S100A9 OK seurat_clusters7 2.68 0.309 8.68 7.04e- 18 3.79 count 1.41e- 17
9 S100A9 OK seurat_clusters8 1.70 0.621 2.74 6.16e- 3 2.39 count 1.85e- 2
10 S100A8 OK (Intercept) -2.33 0.230 -10.1 1.29e- 23 0 count 2.58e- 23
# ... with 26 more rows
> emb_time_terms = fit_coefs %>% filter(term == "seurat_clusters1")
> emb_time_terms
# A tibble: 4 x 10
gene_short_name status term estimate std_err test_val p_value normalized_effect model_component q_value
<fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 S100A9 OK seurat_clusters1 0.129 0.295 0.437 6.62e- 1 0.176 count 1.00e+ 0
2 S100A8 OK seurat_clusters1 0.190 0.341 0.558 5.77e- 1 0.251 count 1.00e+ 0
3 RPS27 OK seurat_clusters1 -0.192 0.0174 -11.1 8.74e-28 -0.277 count 3.50e-27
4 FCER1A OK seurat_clusters1 -2.47 1.30 -1.90 5.70e- 2 -1.51 count 1.71e- 1
>
現(xiàn)在粤剧,讓我們找出那些具有重要時(shí)間成分的基因歇竟。ent_table()計(jì)算每個(gè)系數(shù)在Wald檢驗(yàn)下是否顯著不同于零。默認(rèn)情況下抵恋,ent_table()使用Benjamini和Hochberg方法對(duì)這些p值進(jìn)行調(diào)整焕议,用于多重假設(shè)檢驗(yàn)。這些調(diào)整值可以在q_value列中找到弧关。我們可以對(duì)結(jié)果進(jìn)行過濾盅安,控制假發(fā)現(xiàn)率如下:
emb_time_terms = fit_coefs %>% filter(term == "seurat_clusters1")
emb_time_terms %>% filter (q_value < 0.05) %>%
select(gene_short_name, term, q_value, estimate)
plot_genes_violin(cds_subset[,], group_cells_by="cell_type", ncol=2) +
theme(axis.text.x=element_text(angle=45, hjust=1))