CibersortX網(wǎng)站是常用的工具,但是是網(wǎng)頁上傳數(shù)據(jù)励两,現(xiàn)在網(wǎng)頁503打不開,而BayesPrism在PMID: 37717006 文章benchmark 9種方法中發(fā)現(xiàn)BayesPrism的假陽性與假陰性數(shù)量上最低,并且在分解精細(xì)的免疫譜系時(shí)展現(xiàn)出最佳性能古毛;因此可以作為替代工具钦购,并且BayesPrism也提供了網(wǎng)頁工具檐盟,不過我們還是習(xí)慣本地跑代碼;新版的BayesPrism支持系數(shù)矩陣作為輸入押桃。
BayesPrism是一個(gè)綜合工具葵萎,旨在利用貝葉斯統(tǒng)計(jì)方法從bulk RNA測(cè)序數(shù)據(jù)中精確解析腫瘤微環(huán)境的細(xì)胞組成,并同時(shí)考慮細(xì)胞特異性的基因表達(dá)模式怨规,通過先進(jìn)的算法模塊實(shí)現(xiàn)對(duì)復(fù)雜細(xì)胞混合物的深入分析和理解陌宿。
BayesPrism包含細(xì)胞去卷積模塊和嵌入學(xué)習(xí)模塊。細(xì)胞去卷積模塊依據(jù)來自單細(xì)胞RNA測(cè)序(scRNA-seq)的細(xì)胞類型特異性表達(dá)輪廓建立先驗(yàn)波丰,聯(lián)合估計(jì)腫瘤(或非腫瘤)樣本的bulk RNA-seq表達(dá)數(shù)據(jù)中細(xì)胞類型組成及其特異性基因表達(dá)的后驗(yàn)分布壳坪。嵌入學(xué)習(xí)模塊采用期望最大化算法,基于去卷積模塊推測(cè)出的非惡性細(xì)胞表達(dá)量和比例條件掰烟,通過線性組合惡性基因程序來近似腫瘤表達(dá)模式爽蝴。
安裝
library(remotes)
remotes::install_github("Danko-Lab/BayesPrism")
注:using是我寫的函數(shù),有需要可以后臺(tái)聯(lián)系蝎亚,加入交流群九孩;using作用是一次性加載多個(gè)R包,不用寫雙引號(hào)发框,并且不在屏幕上打印包的加載信息
加載示例數(shù)據(jù)躺彬,可以后臺(tái)聯(lián)系獲得數(shù)據(jù)代碼和結(jié)果文件
using(remotes, data.table, BayesPrism)
load('tutorial.gbm.rdata')
輸入文件
bk.dat
: bulk轉(zhuǎn)錄組表達(dá)譜(矩陣、行樣本梅惯、列基因)
sc.dat
: 單細(xì)胞轉(zhuǎn)錄組表達(dá)譜(矩陣宪拥、行樣本、列基因)
cell.type.labels
: 細(xì)胞標(biāo)簽(向量)
cell.state.labels
: 細(xì)胞狀態(tài)標(biāo)簽(向量铣减,每個(gè)標(biāo)簽對(duì)應(yīng)的細(xì)胞數(shù)目要大于20)
注意:
- bk.dat等只是變量名換成其他也行她君;
- bk.dat和sc.dat為同樣的標(biāo)準(zhǔn)化方式,支持raw count葫哗、TPM, RPM, RPKM, FPKM缔刹,不支持log轉(zhuǎn)化后的數(shù)據(jù)
質(zhì)控圖
相關(guān)性圖查看樣本間相關(guān)性
BayesPrism::plot.cor.phi(
input = sc.dat,
input.labels = cell.state.labels,
title = "cell state correlation",
pdf.prefix="gbm.cor.cs",
cexRow = 0.2,
cexCol = 0.2,
margins = c(2, 2)
)
相關(guān)性圖查看細(xì)胞類型間相關(guān)性
BayesPrism::plot.cor.phi(
input = sc.dat,
input.labels = cell.type.labels,
title = "cell type correlation",
pdf.prefix="gbm.cor.ct",
cexRow = 0.5,
cexCol = 0.5
)
過濾基因
目的:去除線粒體、核糖體基因劣针、性染色體基因校镐、低表達(dá)基因,只選擇編碼蛋白的基因
繪制單細(xì)胞和bulk的離群基因
圖中顯示每個(gè)基因的歸一化平均表達(dá)(x 軸)和最大特異性(y 軸)的對(duì)數(shù)捺典,以及每個(gè)基因是否屬于一個(gè)潛在的“異趁鹣瑁”,糖體蛋白基因通常表現(xiàn)出高平均表達(dá)水平和低細(xì)胞類型特異性得分辣苏。
sc.stat <- BayesPrism::plot.scRNA.outlier(
input = sc.dat, # make sure the colnames are gene symbol or ENSMEBL ID
cell.type.labels = cell.type.labels,
species = "hs", # currently only human(hs) and mouse(mm) annotations are supported
return.raw = TRUE # return the data used for plotting.
pdf.prefix="gbm.sc.stat"
)
bk.stat <- BayesPrism::plot.bulk.outlier(
bulk.input = bk.dat, # make sure the colnames are gene symbol or ENSMEBL ID
sc.input = sc.dat, # make sure the colnames are gene symbol or ENSMEBL ID
cell.type.labels = cell.type.labels,
species = "hs", # currently only human(hs) and mouse(mm) annotations are supported
return.raw = TRUE
pdf.prefix="gbm.bk.stat"
)
去除線粒體肝箱、核糖體基因、性染色體基因稀蟋、低表達(dá)基因
sc.dat.filtered <- BayesPrism::cleanup.genes(
input = sc.dat,
input.type = "count.matrix",
species = "hs",
gene.group = c("Rb", "Mrp", "other_Rb", "chrM", "MALAT1", "chrX", "chrY"),
exp.cells = 5
)
繪制bk.dat與sc.dat間的相關(guān)性(只支持人的基因數(shù)據(jù))
BayesPrism::plot.bulk.vs.sc(
sc.input = sc.dat.filtered,
bulk.input = bk.dat
pdf.prefix="gbm.bk.vs.sc"
)
只選擇編碼蛋白的基因
sc.dat.filtered.pc <- BayesPrism::select.gene.type(sc.dat.filtered, gene.type = "protein_coding")
選擇signature基因
用差異分析方法給每個(gè)細(xì)胞類型選擇相應(yīng)的marker基因(>50)煌张,如果基因少,可以調(diào)整閾值
diff.exp.stat <- BayesPrism::get.exp.stat(
sc.dat = sc.dat[, colSums(sc.dat > 0) > 3], # filter genes to reduce memory use
cell.type.labels = cell.type.labels,
cell.state.labels = cell.state.labels,
pseudo.count = 0.1, # a numeric value used for log2 transformation. =0.1 for 10x data, =10 for smart-seq. Default=0.1.
cell.count.cutoff = 50, # a numeric value to exclude cell state with number of cells fewer than this value for t test. Default=50.
n.cores = 1 # number of threads
)
sc.dat.filtered.pc.sig <- BayesPrism::select.marker(
sc.dat = sc.dat.filtered.pc,
stat = diff.exp.stat,
pval.max = 0.01,
lfc.min = 0.1
)
構(gòu)造Prism對(duì)象
myPrism <- BayesPrism::new.prism(
reference = sc.dat.filtered.pc,
mixture = bk.dat,
input.type = "count.matrix",
cell.type.labels = cell.type.labels,
cell.state.labels = cell.state.labels,
key = "tumor",
outlier.cut = 0.01,
outlier.fraction = 0.1,
)
運(yùn)行BayesPrism
分布運(yùn)行退客,
bp.res.initial <- BayesPrism::run.prism(prism = myPrism, n.cores = 50, update.gibbs = FALSE)
base::saveRDS(bp.res.initial, file = file.path(outdir, "bp.res.initial.rds"))
bp.res.update <- BayesPrism::update.theta(bp = bp.res.initial)
base::saveRDS(bp.res.update, file = file.path(outdir, "bp.res.update.rds"))
提取細(xì)胞比例
BayesPrism在輸出中同時(shí)保留了細(xì)胞類型組成θ0的初始估計(jì)值和經(jīng)過更新的細(xì)胞類型組成估計(jì)值θf骏融。大多數(shù)情況下,用戶應(yīng)使用更新后的θ值萌狂,因?yàn)樗芨倪M(jìn)初始估計(jì)档玻。然而,在某些特殊情況下茫藏,可能需要使用初始估計(jì)值θ0误趴。例如,當(dāng)混合物中腫瘤成分含量較少(<50%)時(shí)务傲,或者參考樣本和混合樣本之間不存在批次效應(yīng)凉当,比如參考數(shù)據(jù)是從同一bulk RNA-seq平臺(tái)上通過流式細(xì)胞分選獲得的情況下枣申。
theta <- BayesPrism::get.fraction(
bp = bp.res.update,
which.theta = "final",
state.or.type = "type"
)
data.table::as.data.table(theta, keep.rownames = "Sample") %>% data.table::fwrite(file.path(outdir, "theta.csv"))
Reference
https://www.bayesprism.org/
https://github.com/Danko-Lab/BayesPrism
本文由mdnice多平臺(tái)發(fā)布