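10x Genomics single-cell transcriptomics
Clustering is the core of 10x Genomics single-cell transcriptome analysis, but clustering single-cell data still poses many difficulties and open challenges; see the paper "Challenges in unsupervised clustering of single-cell RNA-seq data" for details. The clustering and subcluster re-analysis described here use Seurat, the software package most commonly used for 10x Genomics single-cell transcriptome data.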
Hi,
Determining the "right" set of clusters for a single-cell dataset is a challenging problem and often requires interpretation from a biological viewpoint. As mentioned in #819, this article provides a good review on single cell clustering.
satijalab Further subdivisions within clusters #1192
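Strategies for subdividing clusters in 10x Genomics single-cell transcriptome data
Two strategies are currently in common use:
- Adjust the clustering resolution
- Extract the cells of a cluster and re-cluster them from scratch
Adjusting the clustering resolution
Subdividing clusters by adjusting the clustering resolution is covered in the official Seurat Guided Clustering Tutorial: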
Further subdivisions within cell types
If you perturb some of our parameter choices above (for example, setting resolution=0.8 or changing the number of PCs), you might see the CD4 T cells subdivide into two groups. You can explore this subdivision to find markers separating the two T cell subsets. However, before reclustering (which will overwrite object@ident), we can stash our renamed identities to be easily recovered later.
# First lets stash our identities for later
pbmc <- StashIdent(object = pbmc, save.name = "ClusterNames_0.6")
# Note that if you set save.snn=T above, you don't need to recalculate the
# SNN, and can simply put: pbmc <- FindClusters(pbmc,resolution = 0.8)
pbmc <- FindClusters(object = pbmc, reduction.type = "pca", dims.use = 1:10,
                     resolution = 0.8, print.output = FALSE)
## Warning in BuildSNN(object = object, genes.use = genes.use, reduction.type
## = reduction.type, : Build parameters exactly match those of already
## computed and stored SNN. To force recalculation, set force.recalc to TRUE.
# Demonstration of how to plot two tSNE plots side by side, and how to color
# points based on different criteria
plot1 <- TSNEPlot(object = pbmc, do.return = TRUE, no.legend = TRUE, do.label = TRUE)
plot2 <- TSNEPlot(object = pbmc, do.return = TRUE, group.by = "ClusterNames_0.6",
                  no.legend = TRUE, do.label = TRUE)
plot_grid(plot1, plot2)
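Besides comparing the two plots by eye, one way to see which stashed identities were split at the new resolution is to cross-tabulate the two sets of labels. The sketch below is base R on toy label vectors (the values are hypothetical, not real PBMC results). In the Seurat v2 session above, the stashed labels should be available as pbmc@meta.data$ClusterNames_0.6 and the new ones as pbmc@ident; treat that mapping as an assumption to check against your own object.

```r
# Toy stand-ins for the two label vectors (hypothetical values).
# Assumed Seurat v2 equivalents: pbmc@meta.data$ClusterNames_0.6 and pbmc@ident.
old_labels <- c("CD4 T", "CD4 T", "CD4 T", "CD4 T", "B", "B", "NK")
new_labels <- c("0", "0", "5", "5", "1", "1", "2")

# Rows: stashed identities; columns: clusters at the new resolution.
crosstab <- table(old = old_labels, new = new_labels)
print(crosstab)

# An old cluster that spreads over more than one new cluster was subdivided.
n_new_per_old <- rowSums(crosstab > 0)
split_clusters <- names(n_new_per_old)[n_new_per_old > 1]
print(split_clusters)
```

In this toy example only "CD4 T" is spread over two new clusters, which mirrors the CD4 T cell subdivision the tutorial describes.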
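Extracting the cells of a cluster and re-clustering from scratch
This approach uses the expression matrix and the clustering file to pull out the expression matrix of all cells in one cluster, then reruns the analysis from the beginning. Extracting the expression matrix requires two files:
- A CSV file listing each cell and its cluster number
- The expression matrix file for all cells
CSV file of cells and cluster numbers:
Two columns: the first is the cell barcode, the second is the cluster number.
Expression matrix file
The first row is the header and the first column holds the gene names; every other column corresponds to one cell barcode. Each row gives the expression of one gene across all cells, and each value is the expression level of that gene in that cell.
The actual extraction script will be covered in a separate article and is not described in detail here.
After extracting the expression file, rerun the analysis with the same pipeline to obtain the new clustering results.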
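As a rough illustration of the extraction step described above, here is a minimal base R sketch. It is not the author's script: the function name extract_cluster_matrix, the assumption that both files are comma-separated with a header row, and the toy barcodes and gene values are all hypothetical.

```r
# Hypothetical helper: keep only the cells that belong to one cluster.
# Assumes both files are comma-separated and have a header row.
extract_cluster_matrix <- function(matrix_file, cluster_file, cluster_id) {
  # Column 1: cell barcode, column 2: cluster number.
  clusters <- read.csv(cluster_file, header = TRUE, stringsAsFactors = FALSE)
  # Column 1 becomes the gene row names; remaining columns are cell barcodes.
  # check.names = FALSE keeps barcodes such as "AAAC-1" unchanged.
  expr <- read.csv(matrix_file, header = TRUE, row.names = 1,
                   check.names = FALSE, stringsAsFactors = FALSE)
  wanted <- clusters[[1]][clusters[[2]] == cluster_id]
  keep <- intersect(colnames(expr), wanted)
  expr[, keep, drop = FALSE]
}

# Toy input files (made-up barcodes and counts) written to temporary paths.
cluster_file <- tempfile(fileext = ".csv")
matrix_file <- tempfile(fileext = ".csv")
write.csv(data.frame(Barcode = c("AAAC-1", "AAAG-1", "AACT-1"),
                     Cluster = c(1, 2, 1)),
          cluster_file, row.names = FALSE)
write.csv(data.frame(Gene = c("CD3E", "MS4A1"),
                     `AAAC-1` = c(5, 0), `AAAG-1` = c(0, 7), `AACT-1` = c(3, 0),
                     check.names = FALSE),
          matrix_file, row.names = FALSE)

cluster1_expr <- extract_cluster_matrix(matrix_file, cluster_file, cluster_id = 1)
print(cluster1_expr)
```

The resulting data frame keeps every gene row but only the columns for cluster 1, and can be written back out with write.csv as the input to a fresh run of the pipeline.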
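Special note:
The above mainly applies to subdividing clusters within a single sample. If the analysis involves comparisons between samples (differential analysis), you still need to extract the expression matrix or the S4 object first and then redo the clustering and differential analysis. This is the reply from the official GitHub repository: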
You can certainly subset your data, and recalculate Variable Genes, scale, run PCA, and cluster.
Note that you can set ident.use = c(0, 1) to subset two clusters.
satijalab Re-clustering of given clusters #752
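The steps named in that reply can be sketched as a small wrapper, assuming the Seurat v2 API used elsewhere in this post (SubsetData, FindVariableGenes, ScaleData, RunPCA, FindClusters). The wrapper name recluster_subset and its default dims and resolution values are hypothetical, and all parameter choices should be revisited for a real dataset; newer Seurat releases renamed several of these functions.

```r
# Hypothetical wrapper around Seurat v2 calls; requires Seurat v2 when called.
recluster_subset <- function(object, clusters, dims = 1:10, resolution = 0.6) {
  # Keep only the cells whose current identity is in `clusters`.
  obj <- Seurat::SubsetData(object = object, ident.use = clusters)
  # Recompute variable genes on the subset (default cutoffs, no plot).
  obj <- Seurat::FindVariableGenes(object = obj, do.plot = FALSE)
  # Rescale and rerun PCA on the subset's variable genes.
  obj <- Seurat::ScaleData(object = obj)
  obj <- Seurat::RunPCA(object = obj, pc.genes = obj@var.genes, do.print = FALSE)
  # Cluster again; this overwrites the identities of the subset object only.
  Seurat::FindClusters(object = obj, reduction.type = "pca", dims.use = dims,
                       resolution = resolution, print.output = FALSE)
}

# Example call, assuming `pbmc` is the Seurat v2 object from the tutorial:
# tcells <- recluster_subset(pbmc, clusters = c(0, 1))
```

Because the function returns a new object, the identities in the original object are left untouched and the two sets of clusters can be compared afterwards.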