For more content, follow the author's WeChat official account: KS科研分享与服务.
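This continues from the previous installment (Learning single-cell transcriptome analysis with Cell (2): reading single-cell transcriptome sequencing files and building the Seurat object).
After building the Seurat object, we still need to run a series of quality-control (QC) steps on the data before dimensionality reduction and clustering. QC has a considerable effect on the downstream analysis, so it deserves attention.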
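QC for downstream analysis generally covers:
Number of genes detected per cell. Low-quality cells usually have few detected genes, while doublets or several cells captured together show a very high gene count. Cells at both the low and the high extreme should therefore be removed.
Number of molecules detected per cell.
Percentage of mitochondrial genes. Low-quality or dead cells generally show a high mitochondrial fraction. Special cases need special handling, though: some cells are functionally active, with active mitochondria, and naturally show a high fraction, so a single fixed cutoff should not be applied blindly.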
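To make the three metrics concrete, here is a minimal base R sketch that computes them by hand on a toy count matrix. The gene names, cell names, and counts are invented for illustration; in a Seurat object the corresponding columns are `nFeature_RNA`, `nCount_RNA`, and (once computed) `percent.mt`.

```r
# Toy count matrix: rows are genes, columns are cells (all values are made up).
counts <- matrix(
  c(5, 0, 2,
    0, 0, 1,
    3, 1, 0,
    2, 9, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC", "MT-CO1"),
                  c("cell1", "cell2", "cell3"))
)

# Number of genes detected per cell: non-zero entries in each column.
n_feature <- colSums(counts > 0)

# Number of molecules detected per cell: column totals.
n_count <- colSums(counts)

# Percentage of counts coming from genes whose names start with "MT-".
mt_genes <- grepl("^MT-", rownames(counts))
percent_mt <- colSums(counts[mt_genes, , drop = FALSE]) / n_count * 100

qc <- data.frame(nFeature = n_feature, nCount = n_count, percent_mt = percent_mt)
print(qc)
```

In this toy data, cell2 has few detected genes and 90% mitochondrial counts, which is the pattern the text associates with low-quality or dead cells.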
First compute the mitochondrial gene percentage, then use violin plots to show the metrics before QC.
library(Seurat)

# Percentage of counts from mitochondrial genes (gene names starting with "MT-")
GM[["percent.mt"]] <- PercentageFeatureSet(GM, pattern = "^MT-")
BM[["percent.mt"]] <- PercentageFeatureSet(BM, pattern = "^MT-")

# Violin plots of the three QC metrics before filtering
preQC_GM <- VlnPlot(GM, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
                    ncol = 3,
                    group.by = "orig.ident",
                    pt.size = 0)
preQC_BM <- VlnPlot(BM, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
                    ncol = 3,
                    group.by = "orig.ident",
                    pt.size = 0)
preQC_GM:
preQC_BM:
Next, apply QC using the criteria from the original Cell paper.
# Keep cells with 200 < detected genes < 5000 and less than 15% mitochondrial counts
GM <- subset(GM, subset = nFeature_RNA > 200 & nFeature_RNA < 5000 & percent.mt < 15)
BM <- subset(BM, subset = nFeature_RNA > 200 & nFeature_RNA < 5000 & percent.mt < 15)

# Violin plots of the same metrics after filtering
postQC_GM <- VlnPlot(GM, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
                     ncol = 3,
                     group.by = "orig.ident",
                     pt.size = 0)
postQC_BM <- VlnPlot(BM, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
                     ncol = 3,
                     group.by = "orig.ident",
                     pt.size = 0)
postQC_GM:
postQC_BM:
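To see how the three-part filter condition behaves, the following base R sketch applies the same thresholds to a small table of per-cell metrics. The table is hypothetical (cell names and values are invented) and stands in for the metadata of a Seurat object; it is only an illustration of the logic, not a replacement for `subset()`.

```r
# Hypothetical per-cell QC metrics (values invented for illustration).
meta <- data.frame(
  cell = c("c1", "c2", "c3", "c4", "c5"),
  nFeature_RNA = c(150, 1200, 6200, 3000, 2500),
  percent.mt = c(3, 8, 5, 22, 14.9),
  stringsAsFactors = FALSE
)

# Same thresholds as the subset() calls above: all three conditions must hold.
keep <- meta$nFeature_RNA > 200 &
  meta$nFeature_RNA < 5000 &
  meta$percent.mt < 15

kept <- meta[keep, ]
print(kept)
```

Here c1 fails the lower gene-count bound, c3 fails the upper bound, and c4 fails the mitochondrial cutoff, so only c2 and c5 remain.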
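The paper also compares cell numbers before and after QC; that is not shown here because it adds little. The next step is to combine the two datasets, remove batch effects, and integrate them into a single Seurat object for downstream dimensionality reduction.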
# Normalize the data and identify highly variable genes
BM <- NormalizeData(BM)
BM <- FindVariableFeatures(BM, nfeatures = 4000)
GM <- NormalizeData(GM)
GM <- FindVariableFeatures(GM, nfeatures = 4000)
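As a rough illustration of what the normalization step does, the sketch below reproduces by hand the log-normalization that `NormalizeData` applies when called with its defaults, assuming the "LogNormalize" method and a scale factor of 10,000. The toy matrix is invented, and this is only a sketch of the arithmetic, not Seurat's implementation.

```r
# Toy count matrix: 4 genes x 2 cells (values are made up; filled column-wise).
counts <- matrix(
  c(1, 3, 0, 6,
    0, 2, 2, 0),
  nrow = 4,
  dimnames = list(paste0("Gene", 1:4), c("cell1", "cell2"))
)

scale_factor <- 10000          # assumed default scale factor
totals <- colSums(counts)      # total counts per cell

# Divide each column by its cell total, scale, then take log(1 + x).
norm <- log1p(sweep(counts, 2, totals, "/") * scale_factor)
print(round(norm, 3))
```

Dividing by each cell's total puts cells with different sequencing depth on a comparable scale before the log transform.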
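Use FindIntegrationAnchors to find anchors between the two datasets and IntegrateData to integrate them and remove batch effects. Other functions are available as well; this is the approach the original authors used. Finally, save the resulting data.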
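sampleList <- list(GM, BM)
# Find integration anchors between the two samples
anchors <- FindIntegrationAnchors(object.list = sampleList, dims = 1:50)
# Integrate the samples into a single Seurat object
scedata <- IntegrateData(anchorset = anchors, dims = 1:50)
save(scedata, file = "scedata.RData")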
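For picking the analysis up again later, here is a small base R sketch of the save/load round trip. It uses a stand-in list and a temporary file rather than the real integrated object; `scedata_demo` and the file name are hypothetical.

```r
# Stand-in for the integrated object (hypothetical; not a real Seurat object).
scedata_demo <- list(note = "stand-in for the integrated object")

# Save to a temporary file, then remove the object from the workspace.
path <- file.path(tempdir(), "scedata_demo.RData")
save(scedata_demo, file = path)
rm(scedata_demo)

# load() restores the object under its original name and
# returns the names of the objects it restored.
restored <- load(path)
print(restored)
print(scedata_demo$note)
```

Because `save()` stores the variable name together with the object, loading `scedata.RData` in a later session would bring back an object called `scedata`.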
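This QC is fairly simple, but controlling data quality starts at cell collection, and the sequencing run needs its own QC as well. Only when every stage is done well can the reliability of the data be ensured as far as possible. The next installment covers dimensionality reduction and clustering of single-cell data!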