本文是參考學(xué)習(xí)單細(xì)胞轉(zhuǎn)錄組基礎(chǔ)分析二:數(shù)據(jù)質(zhì)控與標(biāo)準(zhǔn)化的學(xué)習(xí)筆記沈贝∪杖梗可能根據(jù)學(xué)習(xí)情況有所改動(dòng)错负。
本節(jié)及之后的三您访、四節(jié)主要介紹單細(xì)胞表達(dá)矩陣到細(xì)胞類型鑒定的分析步驟,流程圖如下:
10X官網(wǎng)下載數(shù)據(jù)
官方演示數(shù)據(jù)集的頁面如下振诬,Chromium Connect是自動(dòng)化系統(tǒng)挑随,官方介紹操作造成的批次效應(yīng)較小善炫。Chromium Next GEM最新版的芯片,由老版的雙十字微流控芯片改成了單十字微流控芯片捐韩。目前國(guó)內(nèi)的10X試劑基本都是V3版本的了退唠,因此也不要下載V1和V2試劑的數(shù)據(jù)了。為了保證筆記本電腦能運(yùn)行荤胁,我們下載細(xì)胞數(shù)比較少的數(shù)據(jù)瞧预,下圖紅色箭頭所示:
本教程的分析都是基于箭頭標(biāo)示的數(shù)據(jù)。點(diǎn)擊之后需要提交個(gè)人信息仅政,提交之后就進(jìn)入數(shù)據(jù)下載界面 了垢油。
下載紅框標(biāo)示的數(shù)據(jù),Seurat分析的輸入文件在這里圆丹,解壓后如下圖所示:
數(shù)據(jù)質(zhì)控與標(biāo)準(zhǔn)化
上游分析軟件Cell Ranger會(huì)對(duì)數(shù)據(jù)進(jìn)行質(zhì)控滩愁,但是在進(jìn)行下游分析前,我們一般會(huì)對(duì)數(shù)據(jù)進(jìn)行更嚴(yán)格的過濾辫封。
library(Seurat)
運(yùn)行上面的代碼后會(huì)在"QC"文件夾下面得到4個(gè)文件
打vlnplot_before_qc的文件
上面的小提琴圖依次是細(xì)胞的基因數(shù)量惊楼、mRNA分子數(shù)量、線粒體基因比例秸讹、紅細(xì)胞基因比例檀咙。我們一般根據(jù)基因數(shù)量和線粒體比例來過濾細(xì)胞,細(xì)胞的最低基因數(shù)一般選擇200-500璃诀,最大基因數(shù)和核糖體比例根據(jù)上圖來選擇弧可,我的建議是minGene=500,maxGene=4000劣欢,pctMT=15棕诵。
##設(shè)置質(zhì)控標(biāo)準(zhǔn)
運(yùn)行上面的代碼后會(huì)在"QC"文件夾下面得到vlnplot_after_qc的文件
可以看到基因數(shù)和核糖體比例不正常的細(xì)胞都過濾了裁良。以上代碼的最后一行是對(duì)數(shù)據(jù)進(jìn)行標(biāo)準(zhǔn)化,它的作用是讓測(cè)序數(shù)據(jù)量不同的細(xì)胞的基因表達(dá)量具有可比性校套。計(jì)算公式如下:標(biāo)準(zhǔn)化后基因表達(dá)量 = log1p(10000基因counts/細(xì)胞總counts)*
> ##創(chuàng)建seurat對(duì)象
> scRNA.counts <- Read10X(data.dir = "filtered_feature_bc_matrix")
> try({scRNA = CreateSeuratObject(scRNA.counts[['Gene Expression']])},silent=T)
> if(exists('scRNA')){} else {scRNA = CreateSeuratObject(scRNA.counts)}
> table(scRNA@meta.data$orig.ident) #查看樣本的細(xì)胞數(shù)量
SeuratProject
1222
> ##計(jì)算質(zhì)控指標(biāo)
> #計(jì)算細(xì)胞中核糖體基因比例
> scRNA[["percent.mt"]] <- PercentageFeatureSet(scRNA, pattern = "^MT-")
> #計(jì)算紅細(xì)胞比例
> HB.genes <- c("HBA1","HBA2","HBB","HBD","HBE1","HBG1","HBG2","HBM","HBQ1","HBZ")
> HB_m <- match(HB.genes, rownames(scRNA@assays$RNA))
> HB.genes <- rownames(scRNA@assays$RNA)[HB_m]
> HB.genes <- HB.genes[!is.na(HB.genes)]
> scRNA[["percent.HB"]]<-PercentageFeatureSet(scRNA, features=HB.genes)
> head(scRNA@meta.data)
orig.ident nCount_RNA nFeature_RNA percent.mt percent.HB
AAACCCAAGGAGAGTA-1 SeuratProject 8288 2620 10.774614 0
AAACGCTTCAGCCCAG-1 SeuratProject 5512 1808 7.964441 0
AAAGAACAGACGACTG-1 SeuratProject 4283 1562 6.187252 0
AAAGAACCAATGGCAG-1 SeuratProject 2754 1225 5.991285 0
AAAGAACGTCTGCAAT-1 SeuratProject 6592 1831 6.614078 0
AAAGGATAGTAGACAT-1 SeuratProject 8845 2048 7.959299 0
> col.num <- length(levels(scRNA@active.ident))
> violin <- VlnPlot(scRNA,
+ features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.HB"),
+ cols =rainbow(col.num),
+ pt.size = 0.01, #不需要顯示點(diǎn)价脾,可以設(shè)置pt.size = 0
+ ncol = 4) +
+ theme(axis.title.x=element_blank(), axis.text.x=element_blank(), axis.ticks.x=element_blank())
> ggsave("QC/vlnplot_before_qc.pdf", plot = violin, width = 12, height = 6)
> ggsave("QC/vlnplot_before_qc.png", plot = violin, width = 12, height = 6)
> plot1 <- FeatureScatter(scRNA, feature1 = "nCount_RNA", feature2 = "percent.mt")
> plot2 <- FeatureScatter(scRNA, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
> plot3 <- FeatureScatter(scRNA, feature1 = "nCount_RNA", feature2 = "percent.HB")
> pearplot <- CombinePlots(plots = list(plot1, plot2, plot3), nrow=1, legend="none")
Warning message:
CombinePlots is being deprecated. Plots should now be combined using the patchwork system.
> ggsave("QC/pearplot_before_qc.pdf", plot = pearplot, width = 12, height = 5)
> ggsave("QC/pearplot_before_qc.png", plot = pearplot, width = 12, height = 5)
> # 運(yùn)行上面的代碼后會(huì)在"QC"文件夾下面得到4個(gè)文件
> # 圖片
> #
> # 打vlnplot_before_qc的文件
> # 圖片
> #
> # 上面的小提琴圖依次是細(xì)胞的基因數(shù)量、mRNA分子數(shù)量笛匙、線粒體基因比例侨把、紅細(xì)胞基因比例。我們一般根據(jù)基因數(shù)量和線粒體比例來過濾細(xì)胞妹孙,細(xì)胞的最低基因數(shù)一般選擇200-500秋柄,最大基因數(shù)和核糖體比例根據(jù)上圖來選擇,我的建議是minGene=500蠢正,maxGene=4000骇笔,pctMT=15。
> ##設(shè)置質(zhì)控標(biāo)準(zhǔn)
> print(c("請(qǐng)輸入允許基因數(shù)和核糖體比例嚣崭,示例如下:", "minGene=500", "maxGene=4000", "pctMT=20"))
[1] "請(qǐng)輸入允許基因數(shù)和核糖體比例笨触,示例如下:" "minGene=500"
[3] "maxGene=4000" "pctMT=20"
> minGene=500
> maxGene=4000
> pctMT=15
> ##數(shù)據(jù)質(zhì)控
> scRNA <- subset(scRNA, subset = nFeature_RNA > minGene & nFeature_RNA < maxGene & percent.mt < pctMT)
> col.num <- length(levels(scRNA@active.ident))
> violin <-VlnPlot(scRNA,
+ features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.HB"),
+ cols =rainbow(col.num),
+ pt.size = 0.1,
+ ncol = 4) +
+ theme(axis.title.x=element_blank(), axis.text.x=element_blank(), axis.ticks.x=element_blank())
> ggsave("QC/vlnplot_after_qc.pdf", plot = violin, width = 12, height = 6)
> ggsave("QC/vlnplot_after_qc.png", plot = violin, width = 12, height = 6)
> scRNA <- NormalizeData(scRNA, normalization.method = "LogNormalize", scale.factor = 10000)
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> ##保存中間結(jié)果
> saveRDS(scRNA, file="scRNA.rds")