(1) The Seurat data structure
Version: 3.1.5
Typing the name of a Seurat object at the console prints a summary like the following:
An object of class Seurat
13425 features across 39233 samples within 1 assay
Active assay: RNA (13425 features, 3000 variable features)
 3 dimensional reductions calculated: pca, umap, tsne
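This tells us that the body of the object is a 13425 (genes) x 39233 (cells) matrix, that it has one assay named RNA, that 3000 genes in this assay have been selected as variable features (usually the input to PCA), and that three dimensional reductions have been computed: PCA, UMAP, and t-SNE.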
Assay
The Assay object is the basic unit of Seurat; each Assay stores raw, normalized, and scaled data as well as cluster information, variable features, and any other assay-specific metadata. Assays should contain single cell expression data such as RNA-seq, protein, or imputed expression data.
By default, a Seurat object contains a single Assay named RNA. While processing the data, for example when running integration, a transformation (SCTransform), decontamination (SoupX), or merging in velocity data, we may create additional Assays that hold the resulting matrices. Later steps can then use the data in whichever Assay is appropriate. When no Assay is specified, Seurat uses the contents of the default Assay. The current default Assay can be inspected with object@active.assay and changed with the DefaultAssay function. Within an Assay, counts holds the raw data, data holds the normalized data, and scale.data holds the scaled data.
Assay data is accessed as follows, taking the normalized data in the RNA Assay of a Seurat object named PBMC as an example:
PBMC@assays$RNA@data
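The same data can also be reached through accessor functions rather than slots. The following is a minimal sketch, assuming the Seurat 3.x API and using pbmc_small, the small example object bundled with Seurat, as a stand-in for the PBMC object in the text:

```r
library(Seurat)

# pbmc_small ships with Seurat; it stands in for the article's PBMC object
obj <- pbmc_small

DefaultAssay(obj)              # name of the current default assay
DefaultAssay(obj) <- "RNA"     # change the default assay (no real change here)

# Direct slot access, as in the text
norm_slot <- obj@assays$RNA@data

# Accessor equivalent; `slot` can be "counts", "data", or "scale.data"
norm_acc <- GetAssayData(obj, assay = "RNA", slot = "data")

dim(norm_acc)                  # genes x cells
```

Going through GetAssayData keeps the code independent of how the object is laid out internally, which may matter if the object structure changes between Seurat versions.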
meta.data
Metadata: a description of each cell. Computed values such as nFeature_RNA are stored as metafeatures in the metadata of the Seurat object. Clustering results are usually stored in the metadata as RNA_snn_res.x, where x is the resolution used.
There are several ways to retrieve a metafeature from the metadata. Taking the metafeature stim in an object named PBMC as an example:
Method 1: PBMC[["stim"]]
Method 2: PBMC$stim
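The two methods return different types, which a small example makes visible. This sketch assumes Seurat 3.x and uses the bundled pbmc_small object, whose metadata has a groups column rather than stim. The stim column added at the end is made up for illustration, and its mapping from groups is arbitrary:

```r
library(Seurat)

obj <- pbmc_small              # bundled example object

df  <- obj[["groups"]]         # one-column data.frame, rows named by cell ID
vec <- obj$groups              # named vector, one entry per cell

head(obj[[]])                  # the whole meta.data table

# Adding a per-cell annotation (hypothetical "stim" column, arbitrary mapping)
obj$stim <- ifelse(obj$groups == "g1", "Ctrl", "Stim")
table(obj$stim)
```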
reductions
The coordinates of each cell after dimensional reduction.
Taking the PCA embedding (that is, the coordinates) of an object named PBMC as an example:
PBMC@reductions$pca@cell.embeddings
rownames(object) returns all genes
colnames(object) returns all cell IDs
VariableFeatures(object) returns the variable features of the current object
levels(object) returns the identity classes (cluster labels) of the current object
(2) Functions in Seurat
Seurat provides a rich set of functions for single-cell data analysis. I will first divide them into the following groups:
First, functions for extracting data
    including subset, WhichCells, VariableFeatures, Cells
Second, functions for processing data
    including NormalizeData, RunPCA, RunUMAP
Third, functions for displaying data
    including DotPlot, DoHeatmap, DimPlot, UMAPPlot, FeaturePlot
1 Functions for extracting data
Once the structure of a Seurat object is understood, data can be pulled directly out of the object. Presumably for convenience, Seurat also provides functions that extract the data we want.
Some examples follow.
1.1 Extracting cell IDs
Cell IDs of the whole object: Cells(object), colnames(object)
Cell IDs selected by idents: WhichCells(object, idents = c(1, 2))
Cell IDs selected by gene expression: WhichCells(object, expression = gene1 > 1), WhichCells(object, expression = gene1 > 1, slot = "counts")
1.2 Extracting an object containing a subset of cells
By cell ID: subset(x = object, cells = cells)
By idents: subset(x = object, idents = c(1, 2))
By the stim value set in meta.data: subset(x = object, stim == "Ctrl")
By the clusters at a given resolution: subset(x = object, RNA_snn_res.2 == 2)
By the expression level of a gene: subset(x = object, gene1 > 1), subset(x = object, gene1 > 1, slot = "counts")
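The calls in 1.1 and 1.2 can be tried on the bundled pbmc_small object. This is a sketch assuming Seurat 3.x. The gene MS4A1, the groups column, and the thresholds are specific to this example dataset and are not taken from the text above:

```r
library(Seurat)

obj <- pbmc_small
ids <- levels(obj)                       # identity classes of the example object

# 1.1 cell IDs
all_cells   <- Cells(obj)                # same as colnames(obj)
ident_cells <- WhichCells(obj, idents = ids[1:2])
expr_cells  <- WhichCells(obj, expression = MS4A1 > 3)                   # normalized data
raw_cells   <- WhichCells(obj, expression = MS4A1 > 0, slot = "counts")  # raw counts

# 1.2 objects restricted to a subset of cells
by_cells <- subset(obj, cells = expr_cells)
by_ident <- subset(obj, idents = ids[1:2])
by_meta  <- subset(obj, subset = groups == "g1")
by_gene  <- subset(obj, subset = MS4A1 > 3)

ncol(by_gene)                            # number of cells kept
```

WhichCells returns a character vector of cell IDs, while subset returns a new Seurat object, so the former is convenient when only the IDs are needed, for example as the cells argument of a plotting function.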
1.3 Extracting coordinates after dimensional reduction
Embeddings(object = object[["pca"]])
Embeddings(object = object[["umap"]])
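As a quick check that the accessor and the slot hold the same matrix, here is a sketch assuming Seurat 3.x and the bundled pbmc_small object, which carries pca and tsne reductions but no umap:

```r
library(Seurat)

obj <- pbmc_small

names(obj@reductions)                            # reductions stored in the object

pca_slot <- obj@reductions$pca@cell.embeddings   # direct slot access
pca_acc  <- Embeddings(object = obj[["pca"]])    # accessor on the reduction
pca_acc2 <- Embeddings(obj, reduction = "pca")   # accessor on the whole object

head(pca_acc[, 1:2])                             # first two PCs, one row per cell
```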
2 Functions for processing data
As an R package for single-cell data processing, Seurat has a very rich set of processing functions. Here is a brief introduction and summary.
2.1 Normalization
The usual choice: NormalizeData()
An alternative: SCTransform(). SCTransform is more than a simple normalization: it generates data, scale.data, and variable features, and stores them in an assay named SCT.
2.2 Dimensional reduction
Seurat provides RunPCA, RunUMAP, and RunTSNE, and each of them lets you choose among different methods.
2.3 Clustering
FindClusters()
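Sections 2.1 to 2.3 chain together roughly as follows. This is a sketch assuming Seurat 3.x and the bundled pbmc_small object. The intermediate steps FindVariableFeatures, ScaleData, and FindNeighbors are not listed above but are needed before PCA and clustering. The parameter values are picked to suit this tiny dataset and are not recommendations:

```r
library(Seurat)

obj <- pbmc_small

obj <- NormalizeData(obj, verbose = FALSE)                        # 2.1 normalization
obj <- FindVariableFeatures(obj, nfeatures = 50, verbose = FALSE)
obj <- ScaleData(obj, verbose = FALSE)                            # fills scale.data
obj <- RunPCA(obj, npcs = 10, verbose = FALSE)                    # 2.2 dimensional reduction
obj <- FindNeighbors(obj, dims = 1:10, verbose = FALSE)           # SNN graph for clustering
obj <- FindClusters(obj, resolution = 0.5, verbose = FALSE)       # 2.3 clustering

# Cluster labels are stored in meta.data and become the active identities
head(obj$RNA_snn_res.0.5)
levels(obj)
```

RunUMAP and RunTSNE would follow RunPCA in the same way. They are left out here because their default neighbor and perplexity settings are tuned for datasets much larger than this 80-cell example.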
2.4 Differential expression analysis
Differences between specific idents: FindMarkers(object = object, ident.1 = 1, ident.2 = 2), FindMarkers(object = object, ident.1 = c(1, 2), ident.2 = c(3, 4))
Differences between each ident and all other idents: FindAllMarkers(object = object)
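A sketch of both calls on the bundled pbmc_small object, assuming Seurat 3.x. The identity classes are read from the object rather than hard-coded, and only.pos = TRUE is an optional argument added here to keep only up-regulated genes:

```r
library(Seurat)

obj <- pbmc_small
ids <- levels(obj)                       # identity classes present in the object

# One group of cells against another
m12 <- FindMarkers(object = obj, ident.1 = ids[1], ident.2 = ids[2])

# Every identity class against all remaining cells
m_all <- FindAllMarkers(object = obj, only.pos = TRUE, verbose = FALSE)

head(m12)                                # rows are genes
head(m_all)                              # adds "cluster" and "gene" columns
```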
2.5 Cell cycle
CellCycleScoring(object = object, s.features = s.genes, g2m.features = g2m.genes)
2.6 Gene set expression
Seurat offers two ways to summarize the overall expression of a gene set.
Compute a gene module score (Calculate module scores for feature expression programs in single cells): AddModuleScore(object = object, features = genes, name = "Module_Score")
Aggregate gene set expression (Aggregate expression of multiple features into a single feature): MetaFeature(object = object, features = genes, meta.name = "Aggregate_Feature")
3 Functions for displaying data
Presentation matters a great deal in data analysis, and Seurat offers many kinds of plots for showing results. Because many of Seurat's plotting functions are built on ggplot, the plots can be customized extensively beyond the fixed arguments of the Seurat functions.
3.1 Displaying dimensional reductions
DimPlot(object = object, reduction = reduction.name, group.by = groups, label = TRUE)
Shortcuts for specific reductions: UMAPPlot, TSNEPlot
3.2 Displaying expression on a dimensional reduction
FeaturePlot(object = object, features = c("gene1", "gene2", "gene3", "gene4")); the features can also be values in meta.data
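Because these functions return ggplot objects, ggplot2 layers can be added with +. This sketch assumes Seurat 3.x and the bundled pbmc_small object, which has a tsne reduction but no umap. The title and theme settings are arbitrary illustrations:

```r
library(Seurat)
library(ggplot2)

obj <- pbmc_small

# 3.1: cells on the t-SNE embedding, colored by a meta.data column
p1 <- DimPlot(object = obj, reduction = "tsne", group.by = "groups", label = TRUE)
p1 <- p1 + ggtitle("t-SNE colored by groups") + theme(legend.position = "bottom")

# 3.2: expression of one gene, then a meta.data value, on the same embedding
p2 <- FeaturePlot(object = obj, features = "MS4A1", reduction = "tsne")
p3 <- FeaturePlot(object = obj, features = "nCount_RNA", reduction = "tsne")

print(p1)
```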
3.3 Violin Plot
VlnPlot(object = merged, features = c("nFeature_RNA", "nCount_RNA", "gene1", "gene2"), ncol = 2, pt.size = 0.1)
VlnPlot(object = merged, features = c("gene1", "gene2", "gene3", "gene4"), ncol = 2, pt.size = 0.1, slot = "counts")
3.4 DotPlot
DotPlot(object = object, features = genes)
3.5 DoHeatmap
DoHeatmap(object = object, features = genes)
Because clusters contain different numbers of cells, downsampling before plotting is recommended:
DoHeatmap(object = object, features = genes, cells = downsampledCells)
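One way to build downsampledCells is the downsample argument of WhichCells, which caps the number of cells taken from each identity class. This is a sketch assuming Seurat 3.x and the bundled pbmc_small object. The cap of 10 cells is arbitrary, and the variable features are used because DoHeatmap draws from scale.data:

```r
library(Seurat)

obj <- pbmc_small

# At most 10 cells per identity class
downsampledCells <- WhichCells(obj, downsample = 10)
table(Idents(obj)[downsampledCells])

# Plot only features present in scale.data
p <- DoHeatmap(object = obj, features = VariableFeatures(obj), cells = downsampledCells)
```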
3.6 Scatter Plot
Scatter of two features: FeatureScatter(object = object, feature1 = feature1, feature2 = feature2)
Scatter of all features for two cells: CellScatter(object = object, cell1 = cell1, cell2 = cell2)
Reference: https://www.bilibili.com/read/cv7142541?spm_id_from=333.999.0.0