在2019/08/07的Nature刊中殷勘,中科院景乃禾課題組發(fā)表了文章——Molecular architecture of lineage allocation and tissue organization in early mouse embryo 厌殉,我在這篇文章中發(fā)現(xiàn)了一個(gè)被湯神組 (就是Hemberg-lab單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)分析(二)- 實(shí)驗(yàn)平臺(tái)中開辟了單細(xì)胞轉(zhuǎn)錄組領(lǐng)域的人)反復(fù)用到的R工具-SCENIC
皿淋,現(xiàn)在讓我們來(lái)一起看看該工具有何妙用!
SCENIC簡(jiǎn)介
SCENIC
是一種同時(shí)重建基因調(diào)控網(wǎng)絡(luò)并從單細(xì)胞RNA-seq數(shù)據(jù)中鑒定stable cell states的工具藐俺∪笄福基于共表達(dá)和DNA淖阍桑基序 (motif)分析推斷基因調(diào)控網(wǎng)絡(luò) ,然后在每個(gè)細(xì)胞中分析網(wǎng)絡(luò)活性以鑒定細(xì)胞狀態(tài)读虏。
SCENIC
發(fā)表于2017年的Nature method文章责静。具體見鏈接:
https://www.nature.com/articles/nmeth.4463
要求:當(dāng)前版本的SCENIC支持人類,鼠和果蠅(Drosophila melanogaster)盖桥。
要將SCENIC應(yīng)用于其他物種灾螃,需要手動(dòng)調(diào)整第二步(例如使用新的RcisTarget
數(shù)據(jù)庫(kù)或使用不同的motif-enrichment-analysis
工具)。
輸入:SCENIC需要輸入的是單細(xì)胞RNA-seq表達(dá)矩陣
—— 每列對(duì)應(yīng)于樣品(細(xì)胞)揩徊,每行對(duì)應(yīng)一個(gè)基因腰鬼。基因ID應(yīng)該是gene-symbol并存儲(chǔ)為rownames (尤其是基因名字部分是為了與RcisTarget
數(shù)據(jù)庫(kù)兼容)塑荒;表達(dá)數(shù)據(jù)是Gene的reads count熄赡。根據(jù)作者的測(cè)試,提供原始的或Normalized UMI count袜炕,無(wú)論是否log轉(zhuǎn)換本谜,或使用TPM值,結(jié)果相差不大偎窘。(Overall, SCENIC is quite robust to this choice, we have applied SCENIC to datasets using raw (logged) UMI counts, normalized UMI counts, and TPM and they all provided reliable results (see Aibar et al. (2017)).)
SCENIC
先安裝R乌助,如果處理大樣本數(shù)據(jù)溜在,則建議使用Python版 (https://pyscenic.readthedocs.io/en/latest/);
SCENIC在R中實(shí)現(xiàn)基于三個(gè)R包:
-
GENIE3:
推斷基因共表達(dá)網(wǎng)絡(luò)
-
RcisTarget:
用于分析轉(zhuǎn)錄因子結(jié)合motif
-
AUCell:
用于鑒定scRNA-seq數(shù)據(jù)中具有活性基因集(基因網(wǎng)絡(luò))的細(xì)胞
運(yùn)行SCENIC需要安裝這些軟件包以及一些額外的依賴包:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::version()
# If your bioconductor version is previous to 3.9, see the section bellow
## Required
BiocManager::install(c("AUCell", "RcisTarget"))
BiocManager::install(c("GENIE3")) # Optional. Can be replaced by GRNBoost
## Optional (but highly recommended):
# To score the network on cells (i.e. run AUCell):
BiocManager::install(c("zoo", "mixtools", "rbokeh"))
# For various visualizations and perform t-SNEs:
BiocManager::install(c("DT", "NMF", "pheatmap", "R2HTML", "Rtsne"))
# To support paralell execution (not available in Windows):
BiocManager::install(c("doMC", "doRNG"))
# To export/visualize in http://scope.aertslab.org
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("aertslab/SCopeLoomR", build_vignettes = TRUE)
Version: AUCell >=1.4.1 (minimum 1.2.4), RcisTarget>=1.2.0 (minimum 1.0.2) and GENIE3>=1.4.0(minimum 1.2.1).
安裝SCENIC:
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("aertslab/SCENIC")
packageVersion("SCENIC")
下載評(píng)分?jǐn)?shù)據(jù)庫(kù)
除了必要的R包之外,需要下載RcisTarget
的物種特定數(shù)據(jù)庫(kù)(https://resources.aertslab.org/cistarget/
他托;主題排名)掖肋。默認(rèn)情況下,SCENIC
使用在基因啟動(dòng)子(TSS上游500 bp
)和TSS周圍 20 kb (+/- 10kb)
中對(duì)模序進(jìn)行評(píng)分的數(shù)據(jù)庫(kù)赏参。
- For human:
dbFiles <- c("https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-500bp-upstream-7species.mc9nr.feather",
"https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather")
# mc9nr: Motif collection version 9: 24k motifs
- For mouse:
dbFiles <- c("https://resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/mm9-500bp-upstream-7species.mc9nr.feather",
"https://resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/mm9-tss-centered-10kb-7species.mc9nr.feather")
# mc9nr: Motif collection version 9: 24k motifs
- For fly:
dbFiles <- c("https://resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc8nr/gene_based/dm6-5kb-upstream-full-tx-11species.mc8nr.feather")
# mc8nr: Motif collection version 8: 20k motifs
注意:下載后最好確認(rèn)下載的數(shù)據(jù)是否完整志笼,可基于MD5值評(píng)估。(參考鏈接:https://resources.aertslab.org/cistarget/databases/sha256sum.txt)
完成這些設(shè)置步驟后把篓,SCENIC
即可運(yùn)行纫溃!
數(shù)據(jù)格式不同時(shí)如何讀入?
最終讀入的信息有兩個(gè)韧掩,一個(gè)是前面說(shuō)的表達(dá)矩陣紊浩,還有一個(gè)是樣品分組信息。
- a) From
.loom
file
.loom文件可以通過(guò)SCopeLoomR
包直接導(dǎo)入SCENIC疗锐。(loom
格式是用于存儲(chǔ)非常大的組學(xué)數(shù)據(jù)集的專屬格式坊谁,具體見 http://linnarssonlab.org/loompy/)
## Download:download.file("http://loom.linnarssonlab.org/clone/Previously%20Published/Cortex.loom", "Cortex.loom")
loomPath <- "Cortex.loom"
- b) From 10X/CellRanger output files
10X/CellRanger
輸出結(jié)果可用作SCENIC的輸入文件 (需要先安裝Seurat
)。
# BiocManager::install("Seurat")
# 測(cè)試數(shù)據(jù)也可以從Seurat官網(wǎng)下載滑臊,自己修改路徑
# 測(cè)試數(shù)據(jù):https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/matrices#r-load-mat>
singleCellMatrix <- Seurat::Read10X(data.dir="data/pbmc3k/filtered_gene_bc_matrices/hg19/")
cellInfo <- data.frame(seuratCluster=Idents(seuratObject))
- c) From other R objects (e.g. Seurat, SingleCellExperiment)
sce <- load_as_sce(dataPath) # any SingleCellExperiment object
exprMat <- counts(sce)
cellInfo <- colData(sce)
- d) From GEO
從GEO下載和格式化數(shù)據(jù)集的示例(例如口芍,GEOGSE60361
:3005小鼠腦細(xì)胞數(shù)據(jù)集)
# 獲取數(shù)據(jù)及數(shù)據(jù)標(biāo)注
# dir.create("SCENIC_MouseBrain"); setwd("SCENIC_MouseBrain") # if needed
# (This may take a few minutes)
if (!requireNamespace("GEOquery", quietly = TRUE)) BiocManager::install("GEOquery")
library(GEOquery)
geoFile <- getGEOSuppFiles("GSE60361", makeDirectory=FALSE)
gzFile <- grep("Expression", basename(rownames(geoFile)), value=TRUE)
txtFile <- gsub(".gz", "", gzFile)
gunzip(gzFile, destname=txtFile, remove=TRUE)
library(data.table)
geoData <- fread(txtFile, sep="\t")
geneNames <- unname(unlist(geoData[,1, with=FALSE]))
exprMatrix <- as.matrix(geoData[,-1, with=FALSE])
rm(geoData)
dim(exprMatrix)
rownames(exprMatrix) <- geneNames
exprMatrix <- exprMatrix[unique(rownames(exprMatrix)),]
exprMatrix[1:5,1:4]
# Remove file downloaded:
file.remove(txtFile)
運(yùn)行 SCENIC
SCENIC workflow:
建立基因調(diào)控網(wǎng)絡(luò)(Gene Regulation Network,GRN
):
基于共表達(dá)識(shí)別每個(gè)轉(zhuǎn)錄因子TF的潛在靶標(biāo)雇卷。
過(guò)濾表達(dá)矩陣并運(yùn)行GENIE3或者GRNBoost鬓椭,它們是利用表達(dá)矩陣推斷基因調(diào)控網(wǎng)絡(luò)的一種算法,能得到轉(zhuǎn)錄因子和潛在靶標(biāo)的相關(guān)性網(wǎng)絡(luò)聋庵;
將目標(biāo)從GENIE3或者GRNBoost格式轉(zhuǎn)為共表達(dá)模塊膘融。根據(jù)DNA模序分析(
motif
)選擇潛在的直接結(jié)合靶標(biāo)(調(diào)節(jié)因子)(利用RcisTarget
包:TF基序分析)
確定細(xì)胞狀態(tài)及其調(diào)節(jié)因子:
- 分析每個(gè)細(xì)胞中的網(wǎng)絡(luò)活性(AUCell)
- 在細(xì)胞中評(píng)分調(diào)節(jié)子(計(jì)算AUC)
SCENIC完整流程
### 導(dǎo)入數(shù)據(jù)
loomPath <- system.file(package="SCENIC", "examples/mouseBrain_toy.loom")
library(SCopeLoomR)
loom <- open_loom(loomPath, mode="r")
exprMat <- get_dgem(loom)
cellInfo <- get_cellAnnotation(loom)
close_loom(loom)
### Initialize settings 初始設(shè)置,導(dǎo)入評(píng)分?jǐn)?shù)據(jù)庫(kù)
library(SCENIC)
scenicOptions <- initializeScenic(org="mgi", dbDir="cisTarget_databases", nCores=10)
# scenicOptions@inputDatasetInfo$cellInfo <- "int/cellInfo.Rds"
saveRDS(scenicOptions, file="int/scenicOptions.Rds")
### 共表達(dá)網(wǎng)絡(luò)
genesKept <- geneFiltering(exprMat, scenicOptions)
exprMat_filtered <- exprMat[genesKept, ]
runCorrelation(exprMat_filtered, scenicOptions)
exprMat_filtered_log <- log2(exprMat_filtered+1)
runGenie3(exprMat_filtered_log, scenicOptions)
### Build and score the GRN 構(gòu)建并給基因調(diào)控網(wǎng)絡(luò)(GRN)打分
exprMat_log <- log2(exprMat+1)
scenicOptions@settings$dbs <- scenicOptions@settings$dbs["10kb"] # Toy run settings
runSCENIC_1_coexNetwork2modules(scenicOptions)
runSCENIC_2_createRegulons(scenicOptions, coexMethod=c("top5perTarget")) # Toy run settings
runSCENIC_3_scoreCells(scenicOptions, exprMat_log)
# Export:
export2scope(scenicOptions, exprMat)
# Binarize activity?
# aucellApp <- plotTsne_AUCellApp(scenicOptions, exprMat_log)
# savedSelections <- shiny::runApp(aucellApp)
# newThresholds <- savedSelections$thresholds
# scenicOptions@fileNames$int["aucell_thresholds",1] <- "int/newThresholds.Rds"
# saveRDS(newThresholds, file=getIntName(scenicOptions, "aucell_thresholds"))
# saveRDS(scenicOptions, file="int/scenicOptions.Rds")
runSCENIC_4_aucell_binarize(scenicOptions)
### Exploring output
# Check files in folder 'output'
# .loom file @ http://scope.aertslab.org
# output/Step2_MotifEnrichment_preview.html in detail/subset:
motifEnrichment_selfMotifs_wGenes <- loadInt(scenicOptions, "motifEnrichment_selfMotifs_wGenes")
tableSubset <- motifEnrichment_selfMotifs_wGenes[highlightedTFs=="Sox8"]
viewMotifs(tableSubset)
# output/Step2_regulonTargetsInfo.tsv in detail:
regulonTargetsInfo <- loadInt(scenicOptions, "regulonTargetsInfo")
tableSubset <- regulonTargetsInfo[TF=="Stat6" & highConfAnnot==TRUE]
viewMotifs(tableSubset)
舉個(gè)栗子祭玉!
輸入表達(dá)矩陣
在本教程中氧映,我們提供了一個(gè)示例,樣本是小鼠大腦的200個(gè)細(xì)胞和862個(gè)基因:
loomPath <- system.file(package="SCENIC", "examples/mouseBrain_toy.loom")
打開loom文件并加載表達(dá)矩陣脱货;
library(SCopeLoomR)
loom <- open_loom(loomPath, mode="r")
exprMat <- get_dgem(loom)
cellInfo <- get_cellAnnotation(loom)
close_loom(loom)
dim(exprMat)
細(xì)胞信息/表型
# cellInfo$nGene <- colSums(exprMat>0)
head(cellInfo)
cellInfo <- data.frame(cellInfo)
cellTypeColumn <- "Class"
colnames(cellInfo)[which(colnames(cellInfo)==cellTypeColumn)] <- "CellType"
cbind(table(cellInfo$CellType))
# Color to assign to the variables (same format as for NMF::aheatmap)
colVars <- list(CellType=c("microglia"="forestgreen",
"endothelial-mural"="darkorange",
"astrocytes_ependymal"="magenta4",
"oligodendrocytes"="hotpink",
"interneurons"="red3",
"pyramidal CA1"="skyblue",
"pyramidal SS"="darkblue"))
colVars$CellType <- colVars$CellType[intersect(names(colVars$CellType), cellInfo$CellType)]
saveRDS(colVars, file="int/colVars.Rds")
plot.new(); legend(0,1, fill=colVars$CellType, legend=names(colVars$CellType))
初始化SCENIC設(shè)置
為了在SCENIC
的多個(gè)步驟中保持設(shè)置一致岛都,SCENIC包中的大多數(shù)函數(shù)使用一個(gè)公共對(duì)象,該對(duì)象存儲(chǔ)當(dāng)前運(yùn)行的選項(xiàng)并代替大多數(shù)函數(shù)的“參數(shù)”振峻。比如下面的org
臼疫,dbDir
等,可以在開始就將物種rog(mgi
—— mouse扣孟, hgnc
—— human烫堤, dmel
—— fly)和RcisTarge
數(shù)據(jù)庫(kù)位置分別讀給對(duì)象org
,dbDir
,之后統(tǒng)一用函數(shù)initializeScenic
得到對(duì)象scenicOptions
鸽斟。具體參數(shù)設(shè)置可以用?initializeScenic
help一下拔创。
library(SCENIC)
org="mgi" # or hgnc, or dmel
dbDir="cisTarget_databases" # RcisTarget databases location
myDatasetTitle="SCENIC example on Mouse brain" # choose a name for your analysis
data(defaultDbNames)
dbs <- defaultDbNames[[org]]
scenicOptions <- initializeScenic(org=org, dbDir=dbDir, dbs=dbs, datasetTitle=myDatasetTitle, nCores=10)
# Modify if needed
scenicOptions@inputDatasetInfo$cellInfo <- "int/cellInfo.Rds"
scenicOptions@inputDatasetInfo$colVars <- "int/colVars.Rds"
# Databases:
# scenicOptions@settings$dbs <- c("mm9-5kb-mc8nr"="mm9-tss-centered-5kb-10species.mc8nr.feather")
# scenicOptions@settings$db_mcVersion <- "v8"
# Save to use at a later time...
saveRDS(scenicOptions, file="int/scenicOptions.Rds")
共表達(dá)網(wǎng)絡(luò)
SCENIC
工作流程的第一步是根據(jù)表達(dá)數(shù)據(jù)推斷潛在的轉(zhuǎn)錄因子靶標(biāo)。為此富蓄,我們使用GENIE3或GRNBoost剩燥,輸入文件是表達(dá)矩陣(過(guò)濾后的)和轉(zhuǎn)錄因子列表。GENIE3/GRBBoost的輸出結(jié)果和相關(guān)矩陣將用于創(chuàng)建共表達(dá)模塊(runSCENIC_1_coexNetwork2modules()
)立倍。
基因過(guò)濾/選擇
-
按每個(gè)基因的reads總數(shù)進(jìn)行過(guò)濾灭红。
該filter旨在去除最可能是噪音的基因。
默認(rèn)情況下口注,它(
minCountsPerGene
)保留所有樣品中至少帶有6個(gè)UMI reads的基因(例如变擒,如果在1%的細(xì)胞中以3的值表達(dá),則基因?qū)⒕哂械目倲?shù))寝志。 -
通過(guò)基因的細(xì)胞數(shù)來(lái)實(shí)現(xiàn)過(guò)濾(例如 UMI > 0 赁项,或log 2(TPM)> 1 )。
默認(rèn)情況下(
minSamples
)澈段,保留下來(lái)的基因能在至少1%的細(xì)胞中檢測(cè)得到。 最后舰攒,只保留
RcisTarget
數(shù)據(jù)庫(kù)中可用的基因败富。
# (Adjust minimum values according to your dataset)
genesKept <- geneFiltering(exprMat, scenicOptions=scenicOptions,
minCountsPerGene=3*.01*ncol(exprMat),
minSamples=ncol(exprMat)*.01)
在進(jìn)行網(wǎng)絡(luò)推斷之前,檢查是否有任何已知的相關(guān)基因被過(guò)濾掉(如果缺少任何相關(guān)基因摩窃,請(qǐng)仔細(xì)檢查filter
設(shè)置是否合適):
interestingGenes <- c("Sox9", "Sox10", "Dlx5")
# any missing?
interestingGenes[which(!interestingGenes %in% genesKept)]
相關(guān)性
GENIE33或者GRNBoost可以檢測(cè)正負(fù)關(guān)聯(lián)兽叮。為了區(qū)分潛在的激活和抑制,我們將目標(biāo)分為正相關(guān)和負(fù)相關(guān)目標(biāo)(比如TF與潛在目標(biāo)之間的Spearman
相關(guān)性)猾愿。
runCorrelation(exprMat_filtered, scenicOptions)
運(yùn)行GENIE3
得到潛在轉(zhuǎn)錄因子TF
## If launched in a new session, you will need to reload...
# setwd("...")
# loomPath <- "..."
# loom <- open_loom(loomPath, mode="r")
# exprMat <- get_dgem(loom)
# close_loom(loom)
# genesKept <- loadInt(scenicOptions, "genesKept")
# exprMat_filtered <- exprMat[genesKept,]
# library(SCENIC)
# scenicOptions <- readRDS("int/scenicOptions.Rds")
# Optional: add log (if it is not logged/normalized already)
exprMat_filtered <- log2(exprMat_filtered+1)
# Run GENIE3
runGenie3(exprMat_filtered, scenicOptions)
構(gòu)建并評(píng)分GRN(runSCENIC_ …)
必要時(shí)重新加載表達(dá)式矩陣:
loom <- open_loom(loomPath, mode="r")
exprMat <- get_dgem(loom)
close_loom(loom)
# Optional: log expression (for TF expression plot, it does not affect any other calculation)
logMat <- log2(exprMat+1)
dim(exprMat)
使用wrapper
函數(shù)運(yùn)行其余步驟:
library(SCENIC)
scenicOptions <- readRDS("int/scenicOptions.Rds")
scenicOptions@settings$verbose <- TRUE
scenicOptions@settings$nCores <- 10
scenicOptions@settings$seed <- 123
# For a very quick run:
# coexMethod=c("top5perTarget")
scenicOptions@settings$dbs <- scenicOptions@settings$dbs["10kb"] # For toy run
# save...
runSCENIC_1_coexNetwork2modules(scenicOptions)
runSCENIC_2_createRegulons(scenicOptions, coexMethod=c("top5perTarget")) #** Only for toy run!!
runSCENIC_3_scoreCells(scenicOptions, logMat)
可選步驟
將network activity轉(zhuǎn)換成ON/OFF(二進(jìn)制)格式
nPcs <- c(5) # For toy dataset
# nPcs <- c(5,15,50)
scenicOptions@settings$seed <- 123 # same seed for all of them
# Run t-SNE with different settings:
fileNames <- tsneAUC(scenicOptions, aucType="AUC", nPcs=nPcs, perpl=c(5,15,50))
fileNames <- tsneAUC(scenicOptions, aucType="AUC", nPcs=nPcs, perpl=c(5,15,50), onlyHighConf=TRUE, filePrefix="int/tSNE_oHC")
# Plot as pdf (individual files in int/):
fileNames <- paste0("int/",grep(".Rds", grep("tSNE_", list.files("int"), value=T), value=T))
par(mfrow=c(length(nPcs), 3))
fileNames <- paste0("int/",grep(".Rds", grep("tSNE_AUC", list.files("int"), value=T, perl = T), value=T))
plotTsne_compareSettings(fileNames, scenicOptions, showLegend=FALSE, varName="CellType", cex=.5)
# Using only "high-confidence" regulons (normally similar)
par(mfrow=c(3,3))
fileNames <- paste0("int/",grep(".Rds", grep("tSNE_oHC_AUC", list.files("int"), value=T, perl = T), value=T))
plotTsne_compareSettings(fileNames, scenicOptions, showLegend=FALSE, varName="CellType", cex=.5)
輸出到 loom/SCope
SCENIC
生成的結(jié)果既能在http://scope.aertslab.org
查看鹦聪,還能用函數(shù)export2scope()
(需要SCopeLoomR
包)保存成.loom
文件。
# DGEM (Digital gene expression matrix)
# (non-normalized counts)
# exprMat <- get_dgem(open_loom(loomPath))
# dgem <- exprMat
# head(colnames(dgem)) #should contain the Cell ID/name
# Export:
scenicOptions@fileNames$output["loomFile",] <- "output/mouseBrain_SCENIC.loom"
export2scope(scenicOptions, exprMat)
加載.loom文件中的結(jié)果
SCopeLoomR
中也有函數(shù)可以導(dǎo)入.loom
文件中的內(nèi)容蒂秘,比如調(diào)節(jié)因子泽本,AUC
和封裝內(nèi)容(比如regulon activity
的t-SNE和UMAP
結(jié)果)。
library(SCopeLoomR)
scenicLoomPath <- getOutName(scenicOptions, "loomFile")
loom <- open_loom(scenicLoomPath)
# Read information from loom file:
regulons_incidMat <- get_regulons(loom)
regulons <- regulonsToGeneLists(regulons_incidMat)
regulonsAUC <- get_regulonsAuc(loom)
regulonsAucThresholds <- get_regulonThresholds(loom)
embeddings <- get_embeddings(loom)
解讀結(jié)果
1. 細(xì)胞狀態(tài)
AUCell
提供跨細(xì)胞的調(diào)節(jié)子的活性,AUCell使用“Area under Curve 曲線下面積”(AUC
)來(lái)計(jì)算輸入基因集的關(guān)鍵子集是否在每個(gè)細(xì)胞的表達(dá)基因中富集姻僧。通過(guò)該調(diào)節(jié)子活性(連續(xù)或二進(jìn)制AUC矩陣)來(lái)聚類細(xì)胞规丽,我們可以看出是否存在傾向于具有相同調(diào)節(jié)子活性的細(xì)胞群,并揭示在多個(gè)細(xì)胞中反復(fù)發(fā)生的網(wǎng)絡(luò)狀態(tài)撇贺。這些狀態(tài)等同于網(wǎng)絡(luò)的吸引子狀態(tài)赌莺。將這些聚類與不同的可視化方法相結(jié)合,我們可以探索細(xì)胞狀態(tài)與特定調(diào)節(jié)子的關(guān)聯(lián)松嘶。
將AUC和TF表達(dá)投射到t-SNE上
logMat <- exprMat # Better if it is logged/normalized
aucellApp <- plotTsne_AUCellApp(scenicOptions, logMat) # default t-SNE
savedSelections <- shiny::runApp(aucellApp)
print(tsneFileName(scenicOptions))
tSNE_scenic <- readRDS(tsneFileName(scenicOptions))
aucell_regulonAUC <- loadInt(scenicOptions, "aucell_regulonAUC")
# Show TF expression:
par(mfrow=c(2,3))
AUCell::AUCell_plotTSNE(tSNE_scenic$Y, exprMat, aucell_regulonAUC[onlyNonDuplicatedExtended(rownames(aucell_regulonAUC))[c("Dlx5", "Sox10", "Sox9","Irf1", "Stat6")],], plots="Expression")
# Save AUC as PDF:
Cairo::CairoPDF("output/Step4_BinaryRegulonActivity_tSNE_colByAUC.pdf", width=20, height=15)
par(mfrow=c(4,6))
AUCell::AUCell_plotTSNE(tSNE_scenic$Y, cellsAUC=aucell_regulonAUC, plots="AUC")
dev.off()
library(KernSmooth)
library(RColorBrewer)
dens2d <- bkde2D(tSNE_scenic$Y, 1)$fhat
image(dens2d, col=brewer.pal(9, "YlOrBr"), axes=FALSE)
contour(dens2d, add=TRUE, nlevels=5, drawlabels=FALSE)
#par(bg = "black")
par(mfrow=c(1,2))
regulonNames <- c( "Dlx5","Sox10")
cellCol <- plotTsne_rgb(scenicOptions, regulonNames, aucType="AUC", aucMaxContrast=0.6)
text(0, 10, attr(cellCol,"red"), col="red", cex=.7, pos=4)
text(-20,-10, attr(cellCol,"green"), col="green3", cex=.7, pos=4)
regulonNames <- list(red=c("Sox10", "Sox8"),
green=c("Irf1"),
blue=c( "Tef"))
cellCol <- plotTsne_rgb(scenicOptions, regulonNames, aucType="Binary")
text(5, 15, attr(cellCol,"red"), col="red", cex=.7, pos=4)
text(5, 15-4, attr(cellCol,"green"), col="green3", cex=.7, pos=4)
text(5, 15-8, attr(cellCol,"blue"), col="blue", cex=.7, pos=4)
GRN:Regulon靶標(biāo)和模序
regulons <- loadInt(scenicOptions, "regulons")
regulons[c("Dlx5", "Irf1")]
regulons <- loadInt(scenicOptions, "aucell_regulons")
head(cbind(onlyNonDuplicatedExtended(names(regulons))))
regulonTargetsInfo <- loadInt(scenicOptions, "regulonTargetsInfo")
tableSubset <- regulonTargetsInfo[TF=="Stat6" & highConfAnnot==TRUE]
viewMotifs(tableSubset)
2. 細(xì)胞群的調(diào)控因子
regulonAUC <- loadInt(scenicOptions, "aucell_regulonAUC")
regulonAUC <- regulonAUC[onlyNonDuplicatedExtended(rownames(regulonAUC)),]
regulonActivity_byCellType <- sapply(split(rownames(cellInfo), cellInfo$CellType),
function(cells) rowMeans(getAUC(regulonAUC)[,cells]))
regulonActivity_byCellType_Scaled <- t(scale(t(regulonActivity_byCellType), center = T, scale=T))
pheatmap::pheatmap(regulonActivity_byCellType_Scaled, #fontsize_row=3,
color=colorRampPalette(c("blue","white","red"))(100), breaks=seq(-3, 3, length.out = 100),
treeheight_row=10, treeheight_col=10, border_color=NA)
# filename="regulonActivity_byCellType.pdf", width=10, height=20)
topRegulators <- reshape2::melt(regulonActivity_byCellType_Scaled)
colnames(topRegulators) <- c("Regulon", "CellType", "RelativeActivity")
topRegulators <- topRegulators[which(topRegulators$RelativeActivity>0),]
viewTable(topRegulators)
minPerc <- .7
binaryRegulonActivity <- loadInt(scenicOptions, "aucell_binary_nonDupl")
cellInfo_binarizedCells <- cellInfo[which(rownames(cellInfo)%in% colnames(binaryRegulonActivity)),, drop=FALSE]
regulonActivity_byCellType_Binarized <- sapply(split(rownames(cellInfo_binarizedCells), cellInfo_binarizedCells$CellType),
function(cells) rowMeans(binaryRegulonActivity[,cells, drop=FALSE]))
binaryActPerc_subset <- regulonActivity_byCellType_Binarized[which(rowSums(regulonActivity_byCellType_Binarized>minPerc)>0),]
pheatmap::pheatmap(binaryActPerc_subset, # fontsize_row=5,
color = colorRampPalette(c("white","pink","red"))(100), breaks=seq(0, 1, length.out = 100),
treeheight_row=10, treeheight_col=10, border_color=NA)
參考文獻(xiàn)
Aibar, Sara, Carmen Bravo González-Blas, Thomas Moerman, Jasper Wouters, Van Anh Huynh-Thu, Hana Imrichová, Zeynep Kalender Atak, et al. 2017. “SCENIC: Single-Cell Regulatory Network Inference and Clustering.” Nature Methods 14 (october): 1083–6. doi:10.1038/nmeth.4463.
Davie, Kristofer, Jasper Janssens, Duygu Koldere, and “et al.” 2018. “A Single-Cell Transcriptome Atlas of the Aging Drosophila Brain.” Cell, june. doi:10.1016/j.cell.2018.05.057.
Huynh-Thu, Van Anh, Alexandre Irrthum, Louis Wehenkel, and Pierre Geurts. 2010. “Inferring Regulatory Networks from Expression Data Using Tree-Based Methods.” PloS One 5 (9). doi:10.1371/journal.pone.0012776.
Marbach, Daniel, James C. Costello, Robert Küffner, Nicole M. Vega, Robert J. Prill, Diogo M. Camacho, Kyle R. Allison, et al. 2012. “Wisdom of Crowds for Robust Gene Network Inference.” Nature Methods 9 (8): 796–804. doi:10.1038/nmeth.2016.