hello,大家好梨撞,今天給大家分享一個(gè)轉(zhuǎn)錄因子活性預(yù)測(cè)的工具雹洗,DoRothEA,在多篇高分文章中都有運(yùn)用卧波,我們就來(lái)看看這個(gè)軟件的優(yōu)勢(shì)吧时肿。大家可以參考DoRothEA。
先來(lái)看看介紹
首先是數(shù)據(jù)庫(kù)港粱,DoRothEA是一種包含轉(zhuǎn)錄因子(TF)與其靶標(biāo)相互作用的基因集螃成。一個(gè)TF及其對(duì)應(yīng)靶點(diǎn)的集合被定義為調(diào)節(jié)子(regulons)。DoRothEA regulons 收集了文獻(xiàn)查坪,ChIP-seq peaks寸宏,TF結(jié)合位點(diǎn)基序,從基因表達(dá)推斷相互作用等不同類型的互作證據(jù)偿曙。TF和靶標(biāo)之間的互作可信度根據(jù)支持的證據(jù)數(shù)量劃分為A-E五個(gè)等級(jí)氮凝,A是最可信,E為可信度低望忆。
TF 活性是根據(jù)其靶標(biāo)的 mRNA 表達(dá)水平計(jì)算的罩阵。 因此竿秆,可以將 TF 活性視為給定轉(zhuǎn)錄狀態(tài)的代表 。
看一看代碼案例
安裝和加載
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("dorothea")
## We load the required packages
library(dorothea)
library(dplyr)
library(Seurat)
library(tibble)
library(pheatmap)
library(tidyr)
library(viper)
讀取數(shù)據(jù)(以pbmc為例)
## Load the PBMC dataset
pbmc.data <- Read10X(data.dir = "filtered_gene_bc_matrices/hg19/")
## Initialize the Seurat object with the raw (non-normalized data).
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k",
min.cells = 3, min.features = 200)
前處理(可選稿壁,如果讀取的rds已經(jīng)做過(guò)處理幽钢,這一步就不需要了)
## Identification of mithocondrial genes
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
## Filtering cells following standard QC criteria.
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 &
percent.mt < 5)
## Normalizing the data
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize",
scale.factor = 10000)
pbmc <- NormalizeData(pbmc)
## Identify the 2000 most highly variable genes
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
## In addition we scale the data
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
降維聚類(可選,Seurat的方法常摧,通常我們前面都已經(jīng)分析過(guò)了)
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc),
verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:10, verbose = FALSE)
pbmc <- FindClusters(pbmc, resolution = 0.5, verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:10, umap.method = "uwot", metric = "cosine")
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25,
logfc.threshold = 0.25, verbose = FALSE)
## Assigning cell type identity to clusters
new.cluster.ids <- c("Naive CD4 T", "Memory CD4 T", "CD14+ Mono", "B", "CD8 T",
"FCGR3A+ Mono", "NK", "DC", "Platelet")
names(new.cluster.ids) <- levels(pbmc)
pbmc <- RenameIdents(pbmc, new.cluster.ids)
DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()
計(jì)算細(xì)胞的TF活性,案例中首先通過(guò)使用包裝函數(shù) run_viper() 在 DoRothEA 的regulons上運(yùn)行 VIPER 以獲得 TFs activity。 該函數(shù)可以處理不同的輸入類型威创,例如矩陣落午、數(shù)據(jù)框、表達(dá)式集甚至 Seurat 對(duì)象肚豺。 在 seurat 對(duì)象的情況下溃斋,該函數(shù)返回相同的 seurat 對(duì)象,其中包含一個(gè)名為 dorothea 的assay吸申,其中包含slot數(shù)據(jù)中的 TFs activity梗劫。
## We read Dorothea Regulons for Human:
dorothea_regulon_human <- get(data("dorothea_hs", package = "dorothea"))
## We obtain the regulons based on interactions with confidence level A, B and C
regulon <- dorothea_regulon_human %>%
dplyr::filter(confidence %in% c("A","B","C"))
## We compute Viper Scores
pbmc <- run_viper(pbmc, regulon,
options = list(method = "scale", minsize = 4,
eset.filter = FALSE, cores = 1,
verbose = FALSE))
然后我們應(yīng)用 Seurat 按照與上述相同的方法但使用 TFs activity分?jǐn)?shù)對(duì)細(xì)胞進(jìn)行聚類。
## We compute the Nearest Neighbours to perform cluster
DefaultAssay(object = pbmc) <- "dorothea"
pbmc <- ScaleData(pbmc)
pbmc <- RunPCA(pbmc, features = rownames(pbmc), verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:10, verbose = FALSE)
pbmc <- FindClusters(pbmc, resolution = 0.5, verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:10, umap.method = "uwot", metric = "cosine")
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25,
logfc.threshold = 0.25, verbose = FALSE)
## Assigning cell type identity to clusters
new.cluster.ids <- c("Naive CD4 T", "Memory CD4 T", "CD14+ Mono", "B", "CD8 T",
"FCGR3A+ Mono", "NK", "DC", "Platelet")
names(new.cluster.ids) <- levels(pbmc)
pbmc <- RenameIdents(pbmc, new.cluster.ids)
DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()
每個(gè)細(xì)胞群的TF活性(相當(dāng)于每個(gè)細(xì)胞群的bulk RNAseq),根據(jù)先前計(jì)算的 DoRothEA regulons的 VIPER 分?jǐn)?shù)截碴,我們根據(jù)它們的TF activities來(lái)表征不同的細(xì)胞群梳侨。
## We transform Viper scores, scaled by seurat, into a data frame to better
## handling the results
viper_scores_df <- GetAssayData(pbmc, slot = "scale.data",
assay = "dorothea") %>%
data.frame(check.names = F) %>%
t()
## We create a data frame containing the cells and their clusters
CellsClusters <- data.frame(cell = names(Idents(pbmc)),
cell_type = as.character(Idents(pbmc)),
check.names = F)
## We create a data frame with the Viper score per cell and its clusters
viper_scores_clusters <- viper_scores_df %>%
data.frame() %>%
rownames_to_column("cell") %>%
gather(tf, activity, -cell) %>%
inner_join(CellsClusters)
## We summarize the Viper scores by cellpopulation
summarized_viper_scores <- viper_scores_clusters %>%
group_by(tf, cell_type) %>%
summarise(avg = mean(activity),
std = sd(activity))
選擇在細(xì)胞群間變化最大的20個(gè)TFs進(jìn)行可視化
## We select the 20 most variable TFs. (20*9 populations = 180)
highly_variable_tfs <- summarized_viper_scores %>%
group_by(tf) %>%
mutate(var = var(avg)) %>%
ungroup() %>%
top_n(180, var) %>%
distinct(tf)
## We prepare the data for the plot
summarized_viper_scores_df <- summarized_viper_scores %>%
semi_join(highly_variable_tfs, by = "tf") %>%
dplyr::select(-std) %>%
spread(tf, avg) %>%
data.frame(row.names = 1, check.names = FALSE)
palette_length = 100
my_color = colorRampPalette(c("Darkblue", "white","red"))(palette_length)
my_breaks <- c(seq(min(summarized_viper_scores_df), 0,
length.out=ceiling(palette_length/2) + 1),
seq(max(summarized_viper_scores_df)/palette_length,
max(summarized_viper_scores_df),
length.out=floor(palette_length/2)))
viper_hmap <- pheatmap(t(summarized_viper_scores_df),fontsize=14,
fontsize_row = 10,
color=my_color, breaks = my_breaks,
main = "DoRothEA (ABC)", angle_col = 45,
treeheight_col = 0, border_color = NA)
感覺(jué)還挺好,方便日丹,能說(shuō)明一些生物學(xué)的問(wèn)題
生活很好走哺,有你更好