Seurat - Dimensional Reduction Vignette
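We know that a major characteristic of single-cell transcriptome data is that it is sparse and high-dimensional. For this reason, Seurat provides a number of dimensionality reduction methods.
The main three are PCA, t-SNE, and UMAP, though there are in fact a great many other dimensionality reduction methods.
So what do we need to do if we want to apply a different dimensionality reduction method to our data? Today we walk through multi-dimensional scaling (MDS) on a Seurat object. MDS is what you get when you require that the distances between samples in the original space be preserved in the low-dimensional space. We use it to explore the general recipe for dimensionality reduction and to get to know the Seurat data structure better.
What, you have not figured out PCA, t-SNE, and UMAP yet, and now you are asking what MDS means? Have a look at the notes from Brother Yunlai's previous "relationship":
Numerical Ecology Notes || Unconstrained Ordination | NMDS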
The dimensional reduction structure in Seurat3
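In Seurat v3.0, storing and interacting with dimensional reduction information has been generalized and formalized into the DimReduc object. Each dimensional reduction is stored as a DimReduc object in the object@reductions slot, as an element of a named list. These reductions can be accessed with the [[ operator, using the name of the desired reduction. For example, after running principal component analysis with RunPCA, object[['pca']] will contain the PCA results. By adding new elements to the list, users can add additional, custom dimensional reductions. Each stored dimensional reduction contains the following slots: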
- cell.embeddings: stores the coordinates for each cell in low-dimensional space.
- feature.loadings: stores the weight for each feature along each dimension of the embedding.
- feature.loadings.projected: Seurat typically calculates the dimensional reduction on a subset of genes (for example, high-variance genes) and then projects that structure onto the entire dataset (all genes). The results of that projection (calculated with ProjectDim) are stored in this slot. Note that the cell embeddings remain unchanged after projection, but there are now feature loadings for all features.
- stdev: the standard deviation of each dimension. Most often used with PCA (storing the square roots of the eigenvalues of the covariance matrix), and useful when looking at the drop-off in the amount of variance explained by each successive dimension.
- key: sets the column names for the cell.embeddings and feature.loadings matrices. For example, for PCA the column names are PC1, PC2, etc., so the key is "PC".
- jackstraw: stores the results of the jackstraw procedure run using this dimensional reduction technique. Currently supported only for PCA.
- misc: bonus slot to store any other information you might want.
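To make the stdev description concrete, here is a minimal base-R sketch on simulated data. The matrix x and every variable name are made up for illustration, and prcomp stands in for Seurat's own PCA routine, so treat it as a demonstration of the idea rather than of Seurat internals. It checks that the per-dimension standard deviations are the square roots of the covariance eigenvalues and shows how the explained variance drops off.

```r
# Simulated data: 40 observations (rows) x 5 variables (columns); illustrative only.
set.seed(42)
x <- matrix(rnorm(200), nrow = 40, ncol = 5)

# prcomp() centers the columns and returns one standard deviation per component.
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Eigenvalues of the covariance matrix, returned in decreasing order.
eig <- eigen(cov(x), symmetric = TRUE)$values

# The component standard deviations equal the square roots of those eigenvalues.
sd_matches <- isTRUE(all.equal(pca$sdev, sqrt(eig)))
print(sd_matches)

# Fraction of the total variance explained by each successive dimension.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))
```

Plotting var_explained against the dimension index gives the usual "elbow" view of how quickly the explained variance falls away.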
To access these slots, we provide the Embeddings, Loadings, and Stdev functions:
library(Seurat)
pbmc_small[["pca"]]
A dimensional reduction object with key PC_
Number of dimensions: 19
Projected dimensional reduction calculated: TRUE
Jackstraw run: TRUE
Computed using assay: RNA
Let's inspect the object with the corresponding accessor functions:
> head(Embeddings(pbmc_small, reduction = "pca")[, 1:5]) # PCA coordinates of each cell
PC_1 PC_2 PC_3 PC_4 PC_5
ATGCCAGAACGACT -0.77403708 -0.8996461 -0.2493078 0.5585948 0.4650838
CATGGCCTGTGCAT -0.02602702 -0.3466795 0.6651668 0.4182900 0.5853204
GAACCTGATGAACC -0.45650250 0.1795811 1.3175907 2.0137210 -0.4818851
TGACTGGATTCTCA -0.81163243 -1.3795340 -1.0019320 0.1390503 -1.5982232
AGTCAGACTGCACA -0.77403708 -0.8996461 -0.2493078 0.5585948 0.4650838
TCTGATACACGTGT -0.77403708 -0.8996461 -0.2493078 0.5585948 0.4650838
> head(Loadings(pbmc_small, reduction = "pca")[, 1:5]) # loading of each gene on each principal component
PC_1 PC_2 PC_3 PC_4 PC_5
PPBP 0.33832535 0.04095778 0.02926261 0.03111034 -0.090420744
IGLL5 -0.03504289 0.05815335 -0.29906272 0.54744454 0.214603428
VDAC3 0.11990482 -0.10994433 -0.02386025 0.06015126 -0.809207588
CD1C -0.04690284 0.19835522 -0.35090617 -0.51112169 -0.130306281
AKR1C3 -0.03894635 -0.42880452 0.08845847 -0.27274386 0.087791646
PF4 0.34392057 0.02474860 -0.02519515 -0.01231411 -0.006725932
> head(Stdev(pbmc_small, reduction = "pca")) # standard deviations
[1] 2.7868782 1.6145733 1.3162945 1.1241143 1.0347596 0.9876531
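Seurat provides RunPCA (pca) and RunTSNE (tsne), which represent the dimensional reduction techniques most commonly applied to scRNA-seq data. When you use these functions, all slots are filled in automatically.
Users can also add the results of a custom dimensional reduction technique that was computed separately (for example, multi-dimensional scaling (MDS) or zero-inflated factor analysis). All you need is a matrix containing each cell's coordinates in the low-dimensional space, as shown below.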
Storing a custom dimensional reduction calculation
Classical (Metric) Multidimensional Scaling
Classical multidimensional scaling (MDS) of a data matrix. Also known as principal coordinates analysis (Gower, 1966).
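As a rough illustration of what classical MDS does internally, the base-R sketch below double-centers the squared distance matrix, takes its leading eigenvectors, and compares the result with cmdscale. The toy data and the helper name classical_mds are hypothetical and exist only for this example.

```r
# Toy data: 6 points in 3 dimensions; illustrative only.
set.seed(1)
x <- matrix(rnorm(18), nrow = 6, ncol = 3)
d <- dist(x)  # pairwise Euclidean distances

# Hypothetical helper: classical MDS via double centering and eigendecomposition.
classical_mds <- function(d, k = 2) {
  d2 <- as.matrix(d)^2                 # squared distances
  n <- nrow(d2)
  j <- diag(n) - matrix(1 / n, n, n)   # centering matrix
  b <- -0.5 * j %*% d2 %*% j           # double-centered Gram matrix
  e <- eigen(b, symmetric = TRUE)
  # Coordinates are the top-k eigenvectors scaled by the square roots of their eigenvalues.
  e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), nrow = k)
}

manual <- classical_mds(d, k = 2)
builtin <- cmdscale(d, k = 2)

# Eigenvector signs are arbitrary, so compare absolute values.
same_up_to_sign <- isTRUE(all.equal(abs(manual), abs(builtin), check.attributes = FALSE))
print(same_up_to_sign)

# Going from 3 to 2 dimensions preserves the distances only approximately.
max_distortion <- max(abs(as.vector(dist(builtin)) - as.vector(d)))
print(max_distortion)
```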
Although it is not part of the Seurat package, multi-dimensional scaling (MDS) is easy to run in R. If you are interested in running MDS and storing the output in your Seurat object:
# Before running MDS, we first calculate a distance matrix between all pairs of cells. Here we
# use a simple euclidean distance metric on all genes, using scale.data as input
d <- dist(t(GetAssayData(pbmc_small, slot = "scale.data")))
# Run the MDS procedure, k determines the number of dimensions
mds <- cmdscale(d = d, k = 2)
head(mds)
[,1] [,2]
ATGCCAGAACGACT 0.77403708 -0.8996461
CATGGCCTGTGCAT 0.02602702 -0.3466795
GAACCTGATGAACC 0.45650250 0.1795811
TGACTGGATTCTCA 0.81163243 -1.3795340
AGTCAGACTGCACA 0.77403708 -0.8996461
TCTGATACACGTGT 0.77403708 -0.8996461
# cmdscale returns the cell embeddings, we first label the columns to ensure downstream
# consistency
colnames(mds) <- paste0("MDS_", 1:2)
# We will now store this as a custom dimensional reduction called 'mds'
pbmc_small[["mds"]] <- CreateDimReducObject(embeddings = mds, key = "MDS_", assay = DefaultAssay(pbmc_small))
pbmc_small
An object of class Seurat
230 features across 80 samples within 1 assay
Active assay: RNA (230 features)
3 dimensional reductions calculated: pca, tsne, mds
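The mds reduction stored above carries only embeddings. If you also wanted a per-dimension standard deviation, one possible approach is sketched below in base R on simulated data: cmdscale with eig = TRUE returns the eigenvalues, from which the spread of each axis follows. The matrix expr and all variable names are assumptions for illustration. Passing the result to the stdev argument of CreateDimReducObject is a suggestion that is not run here.

```r
# Simulated stand-in for a scaled expression matrix: 10 genes (rows) x 8 cells (columns).
set.seed(7)
expr <- matrix(rnorm(80), nrow = 10, ncol = 8,
               dimnames = list(paste0("gene", 1:10), paste0("cell", 1:8)))

# Distances between cells, then classical MDS keeping the eigenvalues.
d <- dist(t(expr))
fit <- cmdscale(d, k = 2, eig = TRUE)

# Embeddings: one row per cell, columns named with the key prefix plus an index.
embeddings <- fit$points
colnames(embeddings) <- paste0("MDS_", seq_len(ncol(embeddings)))

# MDS axes are centered, so the sd of axis i is sqrt(eigenvalue_i / (n - 1)).
n_cells <- nrow(embeddings)
mds_sd <- sqrt(fit$eig[1:2] / (n_cells - 1))

# Cross-check against the empirical standard deviation of each column.
sd_check <- isTRUE(all.equal(mds_sd, unname(apply(embeddings, 2, sd))))
print(sd_check)
```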
Our object now contains an mds reduction. Below we visualize it just as we would pca, tsne, or umap:
# We can now use this as you would any other dimensional reduction in all downstream functions
DimPlot(pbmc_small, reduction = "mds", pt.size = 0.5)
pbmc_small <- ProjectDim(pbmc_small, reduction = "mds")
MDS_ 1
Positive: HLA-DPB1, HLA-DQA1, S100A9, S100A8, GNLY, RP11-290F20.3, CD1C, AKR1C3, IGLL5, VDAC3
PARVB, RUFY1, PGRMC1, MYL9, TREML1, CA2, TUBB1, PPBP, PF4, SDPR
Negative: SDPR, PF4, PPBP, TUBB1, CA2, TREML1, MYL9, PGRMC1, RUFY1, PARVB
VDAC3, IGLL5, AKR1C3, CD1C, RP11-290F20.3, GNLY, S100A8, S100A9, HLA-DQA1, HLA-DPB1
MDS_ 2
Positive: HLA-DPB1, HLA-DQA1, S100A8, S100A9, CD1C, RP11-290F20.3, PARVB, IGLL5, MYL9, SDPR
PPBP, CA2, RUFY1, TREML1, PF4, TUBB1, PGRMC1, VDAC3, AKR1C3, GNLY
Negative: GNLY, AKR1C3, VDAC3, PGRMC1, TUBB1, PF4, TREML1, RUFY1, CA2, PPBP
SDPR, MYL9, IGLL5, PARVB, RP11-290F20.3, CD1C, S100A9, S100A8, HLA-DQA1, HLA-DPB1
Warning message:
In print.DimReduc(x = redeuc, dims = dims.print, nfeatures = nfeatures.print, :
Only 2 dimensions have been computed.
# Display the results as a heatmap
DimHeatmap(pbmc_small, reduction = "mds", dims = 1, cells = 500, projected = TRUE, balanced = TRUE)
VlnPlot(pbmc_small, features = "MDS_1")
See how the MDS_1 dimension correlates with the PC_1 dimension:
# See how the first MDS dimension is correlated with the first PC dimension
FeatureScatter(pbmc_small, feature1 = "MDS_1", feature2 = "PC_1")
FeatureScatter(pbmc_small, feature1 = "MDS_1", feature2 = "tSNE_1")
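A strong relationship between MDS_1 and PC_1 is expected: classical MDS on Euclidean distances recovers the same coordinates as PCA on the same centered data, up to the sign of each axis. The base-R sketch below shows this on simulated data. The matrix x is a made-up stand-in, and whether the match is exact for a given Seurat object depends on which features and options were used for its PCA.

```r
# Simulated stand-in for scaled expression: 30 cells (rows) x 12 features (columns).
set.seed(123)
x <- matrix(rnorm(360), nrow = 30, ncol = 12)

# First two principal component scores of the centered data.
pca_scores <- prcomp(x, center = TRUE, scale. = FALSE)$x[, 1:2]

# Classical MDS on Euclidean distances between the same rows.
mds_coords <- cmdscale(dist(x), k = 2)

# The two embeddings agree up to the sign of each axis.
equal_up_to_sign <- isTRUE(all.equal(abs(unname(pca_scores)), abs(unname(mds_coords)),
                                     check.attributes = FALSE))
print(equal_up_to_sign)

# Absolute correlation between the first MDS axis and the first PC.
axis_cor <- abs(cor(pca_scores[, 1], mds_coords[, 1]))
print(axis_cor)
```

This also accounts for the head(mds) output shown earlier, whose values mirror the PCA embeddings with the first column's sign flipped.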