Data processing
1. Loading the raw data
First, load the raw scRNA-seq data:
library(Scillus)
library(tidyverse)
library(Seurat)
library(magrittr)
scRNA <- load_scfile(m)
Scillus creates a Seurat object for each sample and automatically calls PercentageFeatureSet() to compute the percentage of mitochondrial genes. The result, scRNA, is a list of Seurat objects. Its length equals the number of rows in the metadata table m:
length(scRNA)
[1] 6
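To make the mitochondrial percentage concrete, here is a minimal base-R sketch of the quantity on a toy count matrix. It does not use Seurat or Scillus. The gene names, the counts, and the "^MT-" pattern for mitochondrial genes are assumptions made for illustration, not values taken from the package.

```r
# Toy count matrix: rows are genes, columns are cells (values are made up).
counts <- matrix(
  c(10,  0,  5,
     2,  3,  0,
     8,  7,  5,
     0, 10, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "ACTB", "GAPDH"),
                  c("cell1", "cell2", "cell3"))
)

# Assumption: mitochondrial genes are those whose symbol starts with "MT-".
is_mito <- grepl("^MT-", rownames(counts))

# Percentage of each cell's counts that come from mitochondrial genes.
percent_mt <- 100 * colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)
print(round(percent_mt, 1))
```

Cells with a high value here are the ones a later percent.mt threshold would remove.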
2. Plotting QC figures
QC plots are drawn with plot_qc(). The resulting plots can be customized with ggplot syntax (e.g. axis.title, theme).
plot_qc(scRNA, metrics = "percent.mt")
plot_qc(scRNA, metrics = "nFeature_RNA")
plot_qc(scRNA, metrics = "nCount_RNA")
plot_qc() has 3 optional parameters: plot_type, group_by, and pal_setup.
The default plot_type is "combined", which draws a box plot and a violin plot together. If only one of the two is wanted, set it to "box" or "violin".
plot_qc(scRNA, metrics = "percent.mt", plot_type = "box")
"density" draws a density plot. Note that additional ggplot syntax can be appended (here, a log10 transformation of the x axis).
plot_qc(scRNA, metrics = "nCount_RNA", plot_type = "density") + scale_x_log10()
The default value of group_by is "sample", which corresponds to the sample column in the metadata table m. Because the metadata is attached during loading, QC results can also be plotted by any of those factors, for example "group" (group corresponds to a column of the metadata m).
plot_qc(scRNA, metrics = "percent.mt", group_by = "group")
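The idea behind grouping by a metadata column can be sketched in base R: each cell carries its sample label, and the sample is looked up in the metadata table to find its group. The two data frames below are hypothetical stand-ins (meta for m, cells for the per-cell QC values) with made-up numbers. This illustrates the lookup only, not how plot_qc() is implemented.

```r
# Hypothetical metadata table with one row per sample (stand-in for m).
meta <- data.frame(sample = c("S1", "S2", "S3", "S4"),
                   group  = c("Normal", "Normal", "CTCL", "CTCL"))

# Hypothetical per-cell QC values, each cell tagged with its sample.
cells <- data.frame(sample     = c("S1", "S1", "S2", "S3", "S4", "S4"),
                    percent.mt = c(2, 4, 6, 8, 10, 12))

# Look up each cell's group through its sample.
cells$group <- meta$group[match(cells$sample, meta$sample)]

# Median percent.mt summarized by sample and by group.
by_sample <- tapply(cells$percent.mt, cells$sample, median)
by_group  <- tapply(cells$percent.mt, cells$group, median)
print(by_sample)
print(by_group)
```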
The pal_setup parameter accepts three types of input:
- an RColorBrewer palette name
- a palette setup data frame (see the last part of the previous section)
- a manually specified vector of colors
The default is the palette "Set2".
plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = "Accent")
plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = pal)
plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = c("purple","yellow"))
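3. Filtering and integration
The filter_scdata() function subsets the Seurat objects. The subset argument uses the same syntax as the subset() function for Seurat objects. A bar plot showing the number of cells before and after filtering is drawn automatically.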
scRNA_f <- filter_scdata(scRNA, subset = nFeature_RNA > 500 & percent.mt < 10)
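As a rough illustration of what this condition selects, the base-R sketch below applies the same two thresholds to a toy per-cell table and counts cells per sample before and after. The data frame qc and its values are invented for the example. It mimics the logic of the filter and the before/after tally, not the filter_scdata() implementation.

```r
# Toy per-cell QC table (values are made up for illustration).
qc <- data.frame(
  sample       = c("A", "A", "A", "B", "B", "B"),
  nFeature_RNA = c(1200, 300, 800, 450, 2000, 950),
  percent.mt   = c(4.0, 2.5, 12.0, 6.0, 9.5, 3.0)
)

# Same condition as in the filter_scdata() call: both criteria must hold.
keep <- qc$nFeature_RNA > 500 & qc$percent.mt < 10

# Cells per sample before and after filtering.
before <- table(qc$sample)
after  <- table(factor(qc$sample[keep], levels = names(before)))
print(rbind(before, after))
```

A cell is dropped if it fails either criterion: too few detected genes or too high a mitochondrial percentage.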
The filtered list of Seurat objects, scRNA_f, is then processed further with the standard Seurat workflow:
scRNA_f %<>%
    purrr::map(.f = NormalizeData) %>%
    purrr::map(.f = FindVariableFeatures) %>%
    purrr::map(.f = CellCycleScoring,
               s.features = cc.genes$s.genes,
               g2m.features = cc.genes$g2m.genes)
The list of Seurat objects scRNA_f can then be combined into a single Seurat object, scRNA_int, for integrated analysis:
scRNA_int <- IntegrateData(
    anchorset = FindIntegrationAnchors(object.list = scRNA_f, dims = 1:30, k.filter = 50),
    dims = 1:30
)
scRNA_int %<>%
    ScaleData(vars.to.regress = c("nCount_RNA", "percent.mt", "S.Score", "G2M.Score"))
scRNA_int %<>%
    RunPCA(npcs = 50, verbose = TRUE)
scRNA_int %<>%
    RunUMAP(reduction = "pca", dims = 1:20, n.neighbors = 30) %>%
    FindNeighbors(reduction = "pca", dims = 1:20) %>%
    FindClusters(resolution = 0.3)
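4. Factoring
Converting the Seurat object metadata to factors with refactor_seurat() is an optional step, mainly useful for better plotting. The function takes the metadata m as an argument and gives the Seurat object metadata the same factor levels as the metadata in m. If the metadata argument is not provided, all character vectors in the Seurat object metadata are converted to factors.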
m %<>%
    mutate(group = factor(group, levels = c("Normal", "CTCL")))
scRNA_int %<>%
    refactor_seurat(metadata = m)
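Why the level order matters can be seen with plain base-R factors: by default the levels are sorted alphabetically, and plotting functions generally follow the level order. The vector below is a made-up stand-in for a metadata column. The sketch shows factor behavior only and does not call refactor_seurat().

```r
# A character vector of group labels, as it might appear in object metadata.
group_chr <- c("CTCL", "Normal", "CTCL", "Normal")

# Default factor conversion orders the levels alphabetically.
default_levels <- levels(factor(group_chr))

# Explicit levels put "Normal" first, matching the mutate() call.
group_fct <- factor(group_chr, levels = c("Normal", "CTCL"))

print(default_levels)
print(levels(group_fct))
print(table(group_fct))
```

With the explicit levels, "Normal" is tabulated (and would typically be drawn) before "CTCL".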
Reference:
https://github.com/xmc811/Scillus