Seurat Weekly NO.0 || 開刊詞
Seurat Weekly NO.1 || 到底分多少個(gè)群是合適的绅你?!
在過去的一周里Seurat社區(qū)在github總提問數(shù)由上周的3090上升到3116卒密,當(dāng)然有同一問題反復(fù)討論的情況,也有之前的問題還有人再問的情況棠赛,總的來說上周其在github中的issues往來郵件一共110封.本次主要和大家分享一下幾個(gè)哮奇,本期的封面故事是:Randomly downsample seurat object .
在這之前,我們先講幾個(gè)Seurat的tips睛约。讀入數(shù)據(jù)鼎俘,并人工生成兩個(gè)分組:
library(Seurat)
pbmc <- readRDS(file = "F:\\Rstudio\\SingleR\\data\\pbmc5k.rds")
pbmc
An object of class Seurat
18792 features across 4666 samples within 1 assay
Active assay: RNA (18792 features)
3 dimensional reductions calculated: pca, tsne, umap
# pretend that cells were originally assigned to one of two replicates (we assign randomly here)
# if your cells do belong to multiple replicates, and you want to add this info to the Seurat
# object create a data frame with this information (similar to replicate.info below)
set.seed(42)
pbmc$replicate <- sample(c("rep1", "rep2"), size = ncol(pbmc), replace = TRUE)
head(pbmc@meta.data)
orig.ident nCount_RNA nFeature_RNA percent.mito percent.HB RNA_snn_res.0.6 seurat_clusters
AAACCCAAGCGTATGG-1 pbmc5k 13509 3498 10.659560 0 1 1
AAACCCAGTCCTACAA-1 pbmc5k 12637 3382 5.610509 0 1 1
AAACGCTAGGGCATGT-1 pbmc5k 5743 1798 10.691276 0 7 7
AAACGCTGTAGGTACG-1 pbmc5k 13107 2888 7.866026 0 9 9
AAACGCTGTGTCCGGT-1 pbmc5k 15510 3807 7.446809 0 3 3
AAACGCTGTGTGATGG-1 pbmc5k 6131 2348 9.982058 0 5 5
replicate
AAACCCAAGCGTATGG-1 rep1
AAACCCAGTCCTACAA-1 rep1
AAACGCTAGGGCATGT-1 rep1
AAACGCTGTAGGTACG-1 rep1
AAACGCTGTGTCCGGT-1 rep2
AAACGCTGTGTGATGG-1 rep2
分群和樣本間標(biāo)簽轉(zhuǎn)換
分群標(biāo)簽:
# Plot UMAP, coloring cells by cell type (currently stored in object@ident)
DimPlot(pbmc, reduction = "umap")
樣本標(biāo)簽:
# How do I create a UMAP plot where cells are colored by replicate? First, store the current
# identities in a new column of meta.data called CellType
pbmc$CellType <- Idents(pbmc)
# Next, switch the identity class of all cells to reflect replicate ID
Idents(pbmc) <- "replicate"
DimPlot(pbmc, reduction = "umap")
這里注意Idents
的應(yīng)用,可以直接指定meta.data
某一列作為pbmc@active.ident
.演示完了辩涝,我們再把標(biāo)簽換回來贸伐。
# alternately : DimPlot(pbmc, reduction = 'umap', group.by = 'replicate') you can pass the
# shape.by to label points by both replicate and cell type
# Switch back to cell type labels
Idents(pbmc) <- "CellType"
其實(shí)我們不轉(zhuǎn)換ident大部分的繪圖也可以用group.by
來指定分群方式
DimPlot(pbmc, reduction = "umap",group.by = "replicate")
統(tǒng)計(jì)分群信息
# How many cells are in each cluster
table(Idents(pbmc))
0 1 2 3 4 5 6 7 8 9 10 11 12
1183 752 662 325 314 311 305 260 258 212 32 26 26
# How many cells are in each replicate?
table(pbmc$replicate)
rep1 rep2
2356 2310
prop.table(table(Idents(pbmc)))
0 1 2 3 4 5 6 7 8
0.253536219 0.161165881 0.141877411 0.069652808 0.067295328 0.066652379 0.065366481 0.055722246 0.055293613
9 10 11 12
0.045435062 0.006858123 0.005572225 0.005572225
# How does cluster membership vary by replicate?
table(Idents(pbmc), pbmc$replicate)
rep1 rep2
0 599 584
1 405 347
2 315 347
3 170 155
4 159 155
5 140 171
6 138 167
7 140 120
8 132 126
9 110 102
10 18 14
11 15 11
12 15 11
prop.table(table(Idents(pbmc), pbmc$replicate), margin = 2)
rep1 rep2
0 0.254244482 0.252813853
1 0.171901528 0.150216450
2 0.133701188 0.150216450
3 0.072156197 0.067099567
4 0.067487267 0.067099567
5 0.059422750 0.074025974
6 0.058573854 0.072294372
7 0.059422750 0.051948052
8 0.056027165 0.054545455
9 0.046689304 0.044155844
10 0.007640068 0.006060606
11 0.006366723 0.004761905
12 0.006366723 0.004761905
常見的頻率統(tǒng)計(jì)圖:
as.data.frame(prop.table(table(Idents(pbmc), pbmc@meta.data[,"replicate"]), margin = 2))-> pdf -> td
library(tidyverse)
allcolour=c("#DC143C","#0000FF","#20B2AA","#FFA500","#9370DB","#98FB98","#F08080","#1E90FF","#7CFC00","#FFFF00","#808000","#FF00FF","#FA8072","#7B68EE","#9400D3","#800080","#A0522D","#D2B48C","#D2691E","#87CEEB","#40E0D0","#5F9EA0","#FF1493","#0000CD","#008B8B","#FFE4B5","#8A2BE2","#228B22","#E9967A","#4682B4","#32CD32","#F0E68C","#FFFFE0","#EE82EE","#FF6347","#6A5ACD","#9932CC","#8B008B","#8B4513","#DEB887")
allcolour -> colour1
plt<- ggplot(td,aes(x=td[,2],y=td[,3],fill=td[,1]))+
geom_bar(position = 'stack',stat="identity")+
labs(x="replicate",y="Cells Ratio")+
theme(panel.background=element_rect(fill='transparent', color='black'),
legend.key=element_rect(fill='transparent', color='transparent'),axis.text = element_text(color="black"))+
scale_y_continuous(expand=c(0.001,0.001))+
scale_fill_manual(values=colour1)+
guides(fill = guide_legend(keywidth = 1, keyheight = 1,ncol=1,title = 'Cluster'))
plt
取子集
Randomly downsample seurat object
github: https://github.com/satijalab/seurat/issues/3108
pbmc[, sample(colnames(pbmc), size =30, replace=F)]
An object of class Seurat
18792 features across 30 samples within 1 assay
Active assay: RNA (18792 features)
3 dimensional reductions calculated: pca, tsne, umap
當(dāng)我們進(jìn)入Seurat的進(jìn)一步分析
的時(shí)候,就會遇到這種情況:
- 取符合某種規(guī)律的
細(xì)胞
出來分析 - 取符合某種規(guī)律的
基因
出來分析 - 取符合某種規(guī)律的
Seurat對象
-
循環(huán)
地取符合某種規(guī)律的對象 -
隨機(jī)
取細(xì)胞子集 - 取Seurat對象中數(shù)據(jù)存儲
某一部分
- Seurat 在計(jì)算的過程中默認(rèn)值的過濾怔揩,特別是基因,如:
Genes filtering prior to Normalization #3104
https://github.com/satijalab/seurat/issues/3104
missing features in SCT scale data of integrated object #3056
https://github.com/satijalab/seurat/issues/3056
有時(shí)候在做了用了相應(yīng)的函數(shù)后發(fā)現(xiàn)基因數(shù)少了很多捉邢,其實(shí)是函數(shù)默認(rèn)值卡掉了脯丝,這個(gè)是可以自己設(shè)置的。
取子的動機(jī)很簡單伏伐,就是為了提出來重點(diǎn)分析:
- 統(tǒng)計(jì)某類特征
- 與其他分析結(jié)合
關(guān)于循環(huán)宠进,我們上周講了:Seurat Weekly NO.1 || 到底分多少個(gè)群是合適的?藐翎!
# What are the cell names of all 12 cells?
WhichCells(pbmc, idents = "12")
[1] "AAGATAGTCTTTACAC-1" "AATTCCTAGGATCACG-1" "ACCGTTCTCTTAGCAG-1" "ACCTACCGTGGACCAA-1" "AGGACTTGTAGCGAGT-1" "AGGTAGGGTACTCGAT-1" "CATTCTAAGATCGGTG-1"
[8] "CGGGTGTGTGTTACTG-1" "CTAAGTGTCCTCAGAA-1" "CTCAGGGTCAACCTTT-1" "CTCTCGACAGGTCCGT-1" "GAAATGAAGCGCACAA-1" "GAGGGTATCCTAGCTC-1" "GGATCTATCGGCTATA-1"
[15] "GGGTCTGAGCGACATG-1" "GGGTTATCATGGAGAC-1" "GTAGAGGTCAGGACAG-1" "GTAGGTTCAGGGTTGA-1" "GTTGCGGCACTTGAGT-1" "TAACACGCACTGCACG-1" "TAACTTCTCAGGTGTT-1"
[22] "TCGGGTGGTCTGTTAG-1" "TCTTTGACAAACCATC-1" "TTACAGGTCGCCTCTA-1" "TTCCAATTCCACGAAT-1" "TTCCTTCTCAACACGT-1"
特定群的count矩陣:
# How can I extract expression matrix for all ident = 12 cells (perhaps, to load into another package)
subraw.data <- as.matrix(GetAssayData(pbmc, slot = "counts")[, WhichCells(pbmc, ident = "12")])
subraw.data[1:4,1:4]
AAGATAGTCTTTACAC-1 AATTCCTAGGATCACG-1 ACCGTTCTCTTAGCAG-1 ACCTACCGTGGACCAA-1
RP11-34P13.7 0 0 0 0
FO538757.2 0 0 0 1
AP006222.2 0 1 1 0
RP4-669L17.10 0 0 0 0
Seurat 對象:
特定分組:
subset(pbmc, subset = replicate == "rep2")
An object of class Seurat
18792 features across 2310 samples within 1 assay
Active assay: RNA (18792 features)
3 dimensional reductions calculated: pca, tsne, umap
特定亞群:
# Can I create a Seurat object of just the 12 cells and 11 cells?
subset(pbmc, idents = c("12", "11"))
An object of class Seurat
18792 features across 52 samples within 1 assay
Active assay: RNA (18792 features)
3 dimensional reductions calculated: pca, tsne, umap
表達(dá)量條件:
# Can I create a Seurat object based on expression of a feature or value in object metadata?
subset(pbmc, subset = MS4A1 > 1)
An object of class Seurat
18792 features across 567 samples within 1 assay
Active assay: RNA (18792 features)
3 dimensional reductions calculated: pca, tsne, umap
排除法:
# Can I create a Seurat object of all cells except the 12 cells and 11 cells?
subset(pbmc, idents = c("12", "11"), invert = TRUE)
An object of class Seurat
18792 features across 4614 samples within 1 assay
Active assay: RNA (18792 features)
3 dimensional reductions calculated: pca, tsne, umap
計(jì)算平均表達(dá)量
# How can I calculate the average expression of all cells within a cluster?
cluster.averages <- AverageExpression(pbmc)
head(cluster.averages[["RNA"]][, 1:5])
0 1 2 3 4
RP11-34P13.7 0.006605121 0.0127983446 0.010980787 0.008408694 0.006315117
FO538757.2 0.318666146 0.5629066563 0.323559434 0.596101470 0.378227054
AP006222.2 0.076024402 0.0882946264 0.088864328 0.080678612 0.082825181
RP4-669L17.10 0.005443071 0.0081874454 0.001254630 0.005508605 0.004664880
RP5-857K21.4 0.002683901 0.0008380836 0.001704552 0.000000000 0.004602852
RP11-206L10.9 0.099380365 0.0914327250 0.099045164 0.110635758 0.109613675
返回一個(gè)平均表達(dá)的Seurat對象:
# Return this information as a Seurat object (enables downstream plotting and analysis) First,
# replace spaces with underscores '_' so ggplot2 doesn't fail
#orig.levels <- levels(pbmc)
#Idents(pbmc) <- gsub(pattern = " ", replacement = "_", x = Idents(pbmc))
#orig.levels <- gsub(pattern = " ", replacement = "_", x = orig.levels)
#levels(pbmc) <- orig.levels
cluster.averages <- AverageExpression(pbmc, return.seurat = TRUE)
cluster.averages
An object of class Seurat
18792 features across 13 samples within 1 assay
Active assay: RNA (18792 features)
head(cluster.averages@meta.data)
orig.ident nCount_RNA nFeature_RNA
0 Average 10000 16897
1 Average 10000 15744
2 Average 10000 16344
3 Average 10000 15908
4 Average 10000 14570
5 Average 10000 14523
)
啊砰苍,為什么nCount_RNA
都是10000?計(jì)算平均的數(shù)據(jù)用的solt一定是data
了阱高,驗(yàn)證一下: ?AverageExpression
果然赚导。
# How can I calculate expression averages separately for each replicate?
cluster.averages <- AverageExpression(pbmc, return.seurat = TRUE, add.ident = "replicate")
其他
細(xì)胞相關(guān)性
CellScatter(object = pbmc_small, cell1 = 'ATAGGAGAAACAGA', cell2 = 'CATCAGGATGCACA',smooth =T)
Identifying differential expressed genes across conditions #3111
https://github.com/satijalab/seurat/issues/3111#event-3412957227
What is most likely happening is that the gene may be expressed in a small number of cells, or have outlier expression in just a few cells. This can cause it to appear to have a high expression in the 'pseudobulk', but will not return a statistically significant result when using FindMarkers. We would suggest trusting the Findmarkers output when there are discrepancies like this, but you can explore the individual genes in more detail to see what is going on. For example, please feel free to examine one of the genes that concerns you and to post a violin plot comparing the expression of any gene across conditions
數(shù)據(jù)格式轉(zhuǎn)換R包:SeuratDisk
https://github.com/mojaveazure/seurat-disk
特別是Seurat and Scanpy.之間的對象傳輸。