GRN:基因調(diào)控網(wǎng)絡(luò)
分為四步進(jìn)行GRN分析
1.GENIE3/GRNBoost:基于共表達(dá)情況鑒定每個(gè)TF的潛在靶點(diǎn);
過(guò)濾表達(dá)矩陣并運(yùn)行R包GENIE3/GRNBoost侍咱;
將得到的靶點(diǎn)格式化為共表達(dá)模塊骗灶;
2.RcisTarget:基于DNA-motif 分析選擇潛在的直接結(jié)合靶點(diǎn)(調(diào)控原件)
鑒定細(xì)胞狀態(tài)和細(xì)胞的調(diào)控分子
3. AUCell:分析每個(gè)細(xì)胞的網(wǎng)絡(luò)活動(dòng)度
計(jì)算AUC對(duì)細(xì)胞的調(diào)控元件進(jìn)行評(píng)分;
將網(wǎng)絡(luò)活動(dòng)度轉(zhuǎn)為二分類(lèi)矩陣(on、off)
4.細(xì)胞聚類(lèi):基于GRN的活動(dòng)度鑒定穩(wěn)定的細(xì)胞狀態(tài)并對(duì)結(jié)果進(jìn)行探索
一.input數(shù)據(jù)準(zhǔn)備:
exprMatPath <- "data/sceMouseBrain.RData"
library(SingleCellExperiment)
sceMouseBrain <- readRDS(exprMatPath)
exprMat <- counts(sceMouseBrain)
dim(exprMat)
提供cell info或者表型相關(guān)的信息,用于step3~4的比較分析
cellInfo <- get_cellAnnotation(loom)
close_loom(loom)# cellInfo$nGene <- colSums(exprMat>0)# cellInfo <- data.frame(cellInfo)head(cellInfo)
dir.create("int")
saveRDS(cellInfo, file="int/cellInfo.Rds")# Color to assign to the variables (same format as for NMF::aheatmap)colVars <- list(CellType=c("microglia"="forestgreen",
? ? ? ? ? ? ? ? ? ? ? ? ? "endothelial-mural"="darkorange",
? ? ? ? ? ? ? ? ? ? ? ? ? "astrocytes_ependymal"="magenta4",
? ? ? ? ? ? ? ? ? ? ? ? ? "oligodendrocytes"="hotpink",
? ? ? ? ? ? ? ? ? ? ? ? ? "interneurons"="red3",
? ? ? ? ? ? ? ? ? ? ? ? ? "pyramidal CA1"="skyblue",
? ? ? ? ? ? ? ? ? ? ? ? ? "pyramidal SS"="darkblue"))
colVars$CellType <- colVars$CellType[intersect(names(colVars$CellType), cellInfo$CellType)]
saveRDS(colVars, file="int/colVars.Rds")
plot.new(); legend(0,1, fill=colVars$CellType, legend=names(colVars$CellType))
二.初始化SCENIC的設(shè)置參數(shù)
org父丰,dbDir,dbs,datasetTitle
library(SCENIC)
org="mgi" # or hgnc, or dmel
dbDir="cisTarget_databases" # RcisTarget databases location
myDatasetTitle="SCENIC example on Mouse brain" # choose a name for your analysis
data(defaultDbNames)
dbs <- defaultDbNames[[org]]
scenicOptions <- initializeScenic(org=org, dbDir=dbDir, dbs=dbs,datasetTitle=myDatasetTitle, nCores=10)?
根據(jù)需要進(jìn)行修正
scenicOptions@inputDatasetInfo$cellInfo <- "int/cellInfo.Rds"
scenicOptions@inputDatasetInfo$colVars <- "int/colVars.Rds"
# Databases:
# scenicOptions@settings$dbs <- c("mm9-5kb-mc8nr"="mm9-tss-centered-5kb-10species.mc8nr.feather")
# scenicOptions@settings$db_mcVersion <- "v8"
saveRDS(scenicOptions, file="int/scenicOptions.Rds")
三.共表達(dá)網(wǎng)絡(luò)
1.利用R包GENIE3/GRNBoost推斷潛在轉(zhuǎn)錄因子靶點(diǎn)
input數(shù)據(jù)形式:表達(dá)矩陣(過(guò)濾)或者轉(zhuǎn)錄因子的列表
output數(shù)據(jù)形式:相關(guān)矩陣著蟹,用于構(gòu)建共表達(dá)模塊(runSCENIC_1_coexNetwork2modules())
關(guān)于R包GENIE3/GRNBoost的選擇:
GENIE3:推薦3-5k細(xì)胞數(shù)據(jù)集,耗時(shí)較長(zhǎng)
GRNBoost:較大的數(shù)據(jù)集窒盐,耗時(shí)稍短
細(xì)胞的二次抽樣:
如果input的數(shù)據(jù)集存在較多低質(zhì)量的細(xì)胞或者耗時(shí)較長(zhǎng)草则,可以選擇對(duì)數(shù)據(jù)集的細(xì)胞先進(jìn)行二次抽樣(可以是隨機(jī)抽樣或者是選高質(zhì)量細(xì)胞),作為共表達(dá)的訓(xùn)練集蟹漓,用于AUCell所有細(xì)胞評(píng)估炕横。注意二次抽樣的細(xì)胞必須具有代表性。
基因的過(guò)濾葡粒、篩選:推薦較松的過(guò)濾標(biāo)準(zhǔn)份殿,移除低表達(dá)或者少數(shù)細(xì)胞表達(dá)的基因
(1)total number of reads per gene 平均每個(gè)基因的讀序數(shù)
目的:是移除疑似“噪聲”的基因
默認(rèn)保存所有樣本≥6UMI counts的基因,根據(jù)數(shù)據(jù)集的表達(dá)量單位(UMI,TPM等)校正(minCountsPerGene)嗽交。
?(2) number of cells in which the gene is detected表達(dá)該基因的細(xì)胞數(shù)
目的:移除少量“噪聲”細(xì)胞运准,即該基因在少數(shù)細(xì)胞高表達(dá)窄做,導(dǎo)致噪聲產(chǎn)生
默認(rèn)保存至少在1%的細(xì)胞表達(dá)的基因(minSamples)。
為避免移除有生物學(xué)意義的罕見(jiàn)細(xì)胞,推薦設(shè)置低于細(xì)胞最小分群占比的百分?jǐn)?shù)作為篩選標(biāo)
準(zhǔn)
(3)僅保留RcisTarget包中數(shù)據(jù)庫(kù)有的基因
目的:移除下游分析不需要的基因随橘,節(jié)省GENIE3/GRNBoost運(yùn)行的時(shí)間
# (Adjust minimum values according to your dataset)genesKept <- geneFiltering(exprMat, scenicOptions=scenicOptions,
? ? ? ? ? ? ? ? ? ? ? ? ? minCountsPerGene=3*.01*ncol(exprMat),
? ? ? ? ? ? ? ? ? ? ? ? ? minSamples=ncol(exprMat)*.01)
## Maximum value in the expression matrix: 421
## Ratio of detected vs non-detected: 0.24
## Number of counts (in the dataset units) per gene:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
##? ? 0.0? ? 17.0? ? 48.0? 146.8? 137.0? 5868.0
## Number of cells in which each gene is detected:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
##? ? 0.00? 10.00? 25.00? 39.01? 60.00? 180.00
##
## Number of genes left after applying the following filters (sequential):
## 771 genes with counts per gene > 6
## 770 genes detected in more than 2 cells
## 770 genes available in RcisTarget database
## Gene list saved in int/1.1_genesKept.Rds
在進(jìn)行共表達(dá)網(wǎng)絡(luò)推斷之前,先檢查是否把相關(guān)的TF篩除了搜锰,以判斷過(guò)濾標(biāo)準(zhǔn)是否設(shè)置正確缭乘。
interestingGenes <- c("Sox9", "Sox10", "Dlx5")
# any missing?
interestingGenes[which(!interestingGenes %in% genesKept)]
## character(0)
從exprMat取過(guò)濾后的基因的部分矩陣作為過(guò)濾后的矩陣exprMat_filtered
exprMat_filtered <- exprMat[genesKept, ]
dim(exprMat_filtered)
## [1] 770 200
移除exprMat
rm(exprMat)
計(jì)算spearman相關(guān)系數(shù)
runCorrelation(exprMat_filtered, scenicOptions)
推斷TF靶點(diǎn)
函數(shù) runGenie3 以默認(rèn)參數(shù)運(yùn)行r包GENIE3,將RcisTarget的數(shù)據(jù)庫(kù)作為候選的調(diào)控因子邑茄,對(duì)于大多數(shù)數(shù)據(jù)集是足夠的姨蝴。
GENIE3運(yùn)用了隨機(jī)森林的算法,每次運(yùn)行的結(jié)果都有些許不同肺缕,ntrees越多左医,可變性就越低。推薦設(shè)置set.seed從而產(chǎn)生較為準(zhǔn)確的結(jié)果同木。因?yàn)檫@一步耗時(shí)較長(zhǎng)(數(shù)小時(shí)或數(shù)天)浮梢,推薦可以在另外的r console或者r server或者HPC進(jìn)行下面的分析。
# log轉(zhuǎn)換(如果未進(jìn)行歸一化或者log轉(zhuǎn)換)?
exprMat_filtered <- log2(exprMat_filtered+1)?
# 運(yùn)行GENIE3
runGenie3(exprMat_filtered, scenicOptions)
四.構(gòu)建并對(duì)GRN進(jìn)行評(píng)分
構(gòu)建基因調(diào)控網(wǎng)絡(luò):
1.獲得共表達(dá)模塊
2.r 包RcisTarget進(jìn)行TFmotif分析獲得調(diào)控元件
鑒定細(xì)胞狀態(tài):
3.r包AUCell對(duì)細(xì)胞的GRN進(jìn)行評(píng)分
4.根據(jù)GRN活動(dòng)度對(duì)細(xì)胞進(jìn)行聚類(lèi)
loom <- open_loom(loomPath, mode="r")
exprMat <- get_dgem(loom)
close_loom(loom)
# Optional: log expression (for TF expression plot, it does not affect any other calculation)
logMat <- log2(exprMat+1)
dim(exprMat)
scenicOptions@settings$verbose <- TRUE
scenicOptions@settings$nCores <- 10
scenicOptions@settings$seed <- 123
#用的測(cè)試數(shù)據(jù)
scenicOptions@settings$dbs <- scenicOptions@settings$dbs["10kb"]
runSCENIC_1_coexNetwork2modules(scenicOptions)
#用top5perTarget進(jìn)行測(cè)試
runSCENIC_2_createRegulons(scenicOptions, coexMethod=c("top5perTarget"))
runSCENIC_3_scoreCells(scenicOptions, logMat)
對(duì)網(wǎng)絡(luò)活動(dòng)度進(jìn)行AUC評(píng)分彤路,對(duì)細(xì)胞進(jìn)行二分類(lèi)
output是一個(gè)二進(jìn)制矩陣黔寇,行名代表調(diào)控元件,列名代表細(xì)胞斩萌,“1”代表激活調(diào)控元件缝裤,“0”代表其他屏轰。r包AUCell可以自動(dòng)確定AUCscore的閾值,但是一般比較保守憋飞,建議人為進(jìn)行調(diào)整從而選擇最佳閾值霎苗。
aucellApp <- plotTsne_AUCellApp(scenicOptions, logMat)
savedSelections <- shiny::runApp(aucellApp)
# Save the modified thresholds:
newThresholds <- savedSelections$thresholds
scenicOptions@fileNames$int["aucell_thresholds",1] <- "int/newThresholds.Rds"saveRDS(newThresholds, file=getIntName(scenicOptions, "aucell_thresholds"))
saveRDS(scenicOptions, file="int/scenicOptions.Rds")?
最后根據(jù)閾值進(jìn)行二分類(lèi)
# scenicOptions@settings$devType="png"
runSCENIC_4_aucell_binarize(scenicOptions)
五.根據(jù)GRN評(píng)分對(duì)細(xì)胞進(jìn)行聚類(lèi)
聚類(lèi)方法可以是tsne,umap或者其他HC方法均可榛做;建議聚類(lèi)時(shí)嘗試不同參數(shù)看細(xì)胞狀態(tài)是否穩(wěn)定唁盏;
#選擇PC的數(shù)目
nPcs <- c(5,15,50)
scenicOptions@settings$seed <- 123
#用不同參數(shù)進(jìn)行t-SNE降維聚類(lèi)
fileNames <- tsneAUC(scenicOptions, aucType="AUC", nPcs=nPcs, perpl=c(5,15,50))
fileNames <- tsneAUC(scenicOptions, aucType="AUC", nPcs=nPcs, perpl=c(5,15,50), onlyHighConf=TRUE, filePrefix="int/tSNE_oHC")
#保存為PDF文件
fileNames <- paste0("int/",grep(".Rds", grep("tSNE_", list.files("int"), value=T), value=T))
#對(duì)t-SNE結(jié)果進(jìn)行可視化,并進(jìn)行比較
par(mfrow=c(length(nPcs), 3))
fileNames <- paste0("int/",grep(".Rds", grep("tSNE_AUC", list.files("int"), value=T, perl = T), value=T))
plotTsne_compareSettings(fileNames, scenicOptions, showLegend=FALSE, varName="CellType", cex=.5)
#選擇高度可信的調(diào)控元件
par(mfrow=c(3,3))
fileNames <- paste0("int/",grep(".Rds", grep("tSNE_oHC_AUC", list.files("int"), value=T, perl = T), value=T))
plotTsne_compareSettings(fileNames, scenicOptions, showLegend=FALSE, varName="CellType", cex=.5)
上述選擇的t-SNE用于繪圖检眯,并保存為.rds文件
scenicOptions@settings$defaultTsne$aucType <- "AUC"
scenicOptions@settings$defaultTsne$dims <- 5
scenicOptions@settings$defaultTsne$perpl <- 15
saveRDS(scenicOptions, file="int/scenicOptions.Rds")
六.輸出為.loom文件
需要r包:SCopeLoomR厘擂;運(yùn)用函數(shù)export2scope()
# Export:
scenicOptions@fileNames$output["loomFile",] <- "output/mouseBrain_SCENIC.loom"
export2scope(scenicOptions, exprMat)
從.loom讀取結(jié)果:regulons, AUC,? embeddings 這3個(gè)結(jié)果;motif富集分析和共表達(dá)模塊分析的結(jié)果以txt.file形式另外保存锰瘸。
library(SCopeLoomR)
scenicLoomPath <- getOutName(scenicOptions, "loomFile")
loom <- open_loom(scenicLoomPath)# Read information from loom file:regulons_incidMat <- get_regulons(loom)
regulons <- regulonsToGeneLists(regulons_incidMat)
regulonsAUC <- get_regulonsAuc(loom)
regulonsAucThresholds <- get_regulonThresholds(loom)
embeddings <- get_embeddings(loom)