Seurat Weekly NO.2 || 我該如何取子集鳍咱?

Seurat Weekly NO.0 || 開刊詞
Seurat Weekly NO.1 || 到底分多少個(gè)群是合適的绅你?!

在過去的一周里Seurat社區(qū)在github總提問數(shù)由上周的3090上升到3116卒密,當(dāng)然有同一問題反復(fù)討論的情況,也有之前的問題還有人再問的情況棠赛,總的來說上周其在github中的issues往來郵件一共110封.本次主要和大家分享一下幾個(gè)哮奇,本期的封面故事是:Randomly downsample seurat object .

在這之前,我們先講幾個(gè)Seurat的tips睛约。讀入數(shù)據(jù)鼎俘,并人工生成兩個(gè)分組:

library(Seurat)
pbmc <- readRDS(file = "F:\\Rstudio\\SingleR\\data\\pbmc5k.rds")
pbmc
An object of class Seurat 
18792 features across 4666 samples within 1 assay 
Active assay: RNA (18792 features)
 3 dimensional reductions calculated: pca, tsne, umap


# pretend that cells were originally assigned to one of two replicates (we assign randomly here)
# if your cells do belong to multiple replicates, and you want to add this info to the Seurat
# object create a data frame with this information (similar to replicate.info below)
set.seed(42)
pbmc$replicate <- sample(c("rep1", "rep2"), size = ncol(pbmc), replace = TRUE)
head(pbmc@meta.data)

                   orig.ident nCount_RNA nFeature_RNA percent.mito percent.HB RNA_snn_res.0.6 seurat_clusters
AAACCCAAGCGTATGG-1     pbmc5k      13509         3498    10.659560          0               1               1
AAACCCAGTCCTACAA-1     pbmc5k      12637         3382     5.610509          0               1               1
AAACGCTAGGGCATGT-1     pbmc5k       5743         1798    10.691276          0               7               7
AAACGCTGTAGGTACG-1     pbmc5k      13107         2888     7.866026          0               9               9
AAACGCTGTGTCCGGT-1     pbmc5k      15510         3807     7.446809          0               3               3
AAACGCTGTGTGATGG-1     pbmc5k       6131         2348     9.982058          0               5               5
                   replicate
AAACCCAAGCGTATGG-1      rep1
AAACCCAGTCCTACAA-1      rep1
AAACGCTAGGGCATGT-1      rep1
AAACGCTGTAGGTACG-1      rep1
AAACGCTGTGTCCGGT-1      rep2
AAACGCTGTGTGATGG-1      rep2

分群和樣本間標(biāo)簽轉(zhuǎn)換

分群標(biāo)簽:

# Plot UMAP, coloring cells by cell type (currently stored in object@ident)
DimPlot(pbmc, reduction = "umap")

樣本標(biāo)簽:

# How do I create a UMAP plot where cells are colored by replicate?  First, store the current
# identities in a new column of meta.data called CellType
pbmc$CellType <- Idents(pbmc)
# Next, switch the identity class of all cells to reflect replicate ID
Idents(pbmc) <- "replicate"
DimPlot(pbmc, reduction = "umap")

這里注意Idents的應(yīng)用,可以直接指定meta.data某一列作為pbmc@active.ident.演示完了辩涝,我們再把標(biāo)簽換回來贸伐。

# alternately : DimPlot(pbmc, reduction = 'umap', group.by = 'replicate') you can pass the
# shape.by to label points by both replicate and cell type

# Switch back to cell type labels
Idents(pbmc) <- "CellType"

其實(shí)我們不轉(zhuǎn)換ident大部分的繪圖也可以用group.by來指定分群方式

DimPlot(pbmc, reduction = "umap",group.by = "replicate")

統(tǒng)計(jì)分群信息

# How many cells are in each cluster
table(Idents(pbmc))

  0    1    2    3    4    5    6    7    8    9   10   11   12 
1183  752  662  325  314  311  305  260  258  212   32   26   26 
# How many cells are in each replicate?
table(pbmc$replicate)

rep1 rep2 
2356 2310 
prop.table(table(Idents(pbmc)))

          0           1           2           3           4           5           6           7           8 
0.253536219 0.161165881 0.141877411 0.069652808 0.067295328 0.066652379 0.065366481 0.055722246 0.055293613 
          9          10          11          12 
0.045435062 0.006858123 0.005572225 0.005572225 
# How does cluster membership vary by replicate?
table(Idents(pbmc), pbmc$replicate)

     rep1 rep2
  0   599  584
  1   405  347
  2   315  347
  3   170  155
  4   159  155
  5   140  171
  6   138  167
  7   140  120
  8   132  126
  9   110  102
  10   18   14
  11   15   11
  12   15   11

prop.table(table(Idents(pbmc), pbmc$replicate), margin = 2)
    
            rep1        rep2
  0  0.254244482 0.252813853
  1  0.171901528 0.150216450
  2  0.133701188 0.150216450
  3  0.072156197 0.067099567
  4  0.067487267 0.067099567
  5  0.059422750 0.074025974
  6  0.058573854 0.072294372
  7  0.059422750 0.051948052
  8  0.056027165 0.054545455
  9  0.046689304 0.044155844
  10 0.007640068 0.006060606
  11 0.006366723 0.004761905
  12 0.006366723 0.004761905

常見的頻率統(tǒng)計(jì)圖:

as.data.frame(prop.table(table(Idents(pbmc), pbmc@meta.data[,"replicate"]), margin = 2))-> pdf -> td
library(tidyverse)
allcolour=c("#DC143C","#0000FF","#20B2AA","#FFA500","#9370DB","#98FB98","#F08080","#1E90FF","#7CFC00","#FFFF00","#808000","#FF00FF","#FA8072","#7B68EE","#9400D3","#800080","#A0522D","#D2B48C","#D2691E","#87CEEB","#40E0D0","#5F9EA0","#FF1493","#0000CD","#008B8B","#FFE4B5","#8A2BE2","#228B22","#E9967A","#4682B4","#32CD32","#F0E68C","#FFFFE0","#EE82EE","#FF6347","#6A5ACD","#9932CC","#8B008B","#8B4513","#DEB887")
allcolour -> colour1
plt<- ggplot(td,aes(x=td[,2],y=td[,3],fill=td[,1]))+
  geom_bar(position = 'stack',stat="identity")+
  labs(x="replicate",y="Cells Ratio")+
  theme(panel.background=element_rect(fill='transparent', color='black'),
        legend.key=element_rect(fill='transparent', color='transparent'),axis.text = element_text(color="black"))+
  scale_y_continuous(expand=c(0.001,0.001))+
  scale_fill_manual(values=colour1)+
  guides(fill = guide_legend(keywidth = 1, keyheight = 1,ncol=1,title = 'Cluster'))

plt

取子集

Randomly downsample seurat object

github: https://github.com/satijalab/seurat/issues/3108

pbmc[, sample(colnames(pbmc), size =30, replace=F)]
An object of class Seurat 
18792 features across 30 samples within 1 assay 
Active assay: RNA (18792 features)
 3 dimensional reductions calculated: pca, tsne, umap

當(dāng)我們進(jìn)入Seurat的進(jìn)一步分析的時(shí)候,就會遇到這種情況:

  • 取符合某種規(guī)律的細(xì)胞出來分析
  • 取符合某種規(guī)律的基因出來分析
  • 取符合某種規(guī)律的Seurat對象
  • 循環(huán)地取符合某種規(guī)律的對象
  • 隨機(jī)取細(xì)胞子集
  • 取Seurat對象中數(shù)據(jù)存儲某一部分
  • Seurat 在計(jì)算的過程中默認(rèn)值的過濾怔揩,特別是基因,如:

Genes filtering prior to Normalization #3104

https://github.com/satijalab/seurat/issues/3104

missing features in SCT scale data of integrated object #3056

https://github.com/satijalab/seurat/issues/3056

有時(shí)候在做了用了相應(yīng)的函數(shù)后發(fā)現(xiàn)基因數(shù)少了很多捉邢,其實(shí)是函數(shù)默認(rèn)值卡掉了脯丝,這個(gè)是可以自己設(shè)置的。

取子的動機(jī)很簡單伏伐,就是為了提出來重點(diǎn)分析:

  • 統(tǒng)計(jì)某類特征
  • 與其他分析結(jié)合

關(guān)于循環(huán)宠进,我們上周講了:Seurat Weekly NO.1 || 到底分多少個(gè)群是合適的?藐翎!

# What are the cell names of all 12 cells?
WhichCells(pbmc, idents = "12")
 [1] "AAGATAGTCTTTACAC-1" "AATTCCTAGGATCACG-1" "ACCGTTCTCTTAGCAG-1" "ACCTACCGTGGACCAA-1" "AGGACTTGTAGCGAGT-1" "AGGTAGGGTACTCGAT-1" "CATTCTAAGATCGGTG-1"
 [8] "CGGGTGTGTGTTACTG-1" "CTAAGTGTCCTCAGAA-1" "CTCAGGGTCAACCTTT-1" "CTCTCGACAGGTCCGT-1" "GAAATGAAGCGCACAA-1" "GAGGGTATCCTAGCTC-1" "GGATCTATCGGCTATA-1"
[15] "GGGTCTGAGCGACATG-1" "GGGTTATCATGGAGAC-1" "GTAGAGGTCAGGACAG-1" "GTAGGTTCAGGGTTGA-1" "GTTGCGGCACTTGAGT-1" "TAACACGCACTGCACG-1" "TAACTTCTCAGGTGTT-1"
[22] "TCGGGTGGTCTGTTAG-1" "TCTTTGACAAACCATC-1" "TTACAGGTCGCCTCTA-1" "TTCCAATTCCACGAAT-1" "TTCCTTCTCAACACGT-1"

特定群的count矩陣:

# How can I extract expression matrix for all ident  =  12  cells (perhaps, to load into another package)
subraw.data <- as.matrix(GetAssayData(pbmc, slot = "counts")[, WhichCells(pbmc, ident = "12")])
subraw.data[1:4,1:4]

              AAGATAGTCTTTACAC-1 AATTCCTAGGATCACG-1 ACCGTTCTCTTAGCAG-1 ACCTACCGTGGACCAA-1
RP11-34P13.7                   0                  0                  0                  0
FO538757.2                     0                  0                  0                  1
AP006222.2                     0                  1                  1                  0
RP4-669L17.10                  0                  0                  0                  0

Seurat 對象:

特定分組:

 subset(pbmc, subset = replicate == "rep2")
An object of class Seurat 
18792 features across 2310 samples within 1 assay 
Active assay: RNA (18792 features)
 3 dimensional reductions calculated: pca, tsne, umap

特定亞群:

# Can I create a Seurat object of just the 12 cells and 11 cells?
subset(pbmc, idents = c("12", "11"))
An object of class Seurat 
18792 features across 52 samples within 1 assay 
Active assay: RNA (18792 features)
 3 dimensional reductions calculated: pca, tsne, umap

表達(dá)量條件:

 # Can I create a Seurat object based on expression of a feature or value in object metadata?
 subset(pbmc, subset = MS4A1 > 1)
An object of class Seurat 
18792 features across 567 samples within 1 assay 
Active assay: RNA (18792 features)
 3 dimensional reductions calculated: pca, tsne, umap

排除法:

# Can I create a Seurat object of all cells except the 12 cells and 11  cells?

subset(pbmc, idents = c("12", "11"), invert = TRUE)
An object of class Seurat 
18792 features across 4614 samples within 1 assay 
Active assay: RNA (18792 features)
 3 dimensional reductions calculated: pca, tsne, umap

計(jì)算平均表達(dá)量

# How can I calculate the average expression of all cells within a cluster?
cluster.averages <- AverageExpression(pbmc)
head(cluster.averages[["RNA"]][, 1:5])

                       0            1           2           3           4
RP11-34P13.7  0.006605121 0.0127983446 0.010980787 0.008408694 0.006315117
FO538757.2    0.318666146 0.5629066563 0.323559434 0.596101470 0.378227054
AP006222.2    0.076024402 0.0882946264 0.088864328 0.080678612 0.082825181
RP4-669L17.10 0.005443071 0.0081874454 0.001254630 0.005508605 0.004664880
RP5-857K21.4  0.002683901 0.0008380836 0.001704552 0.000000000 0.004602852
RP11-206L10.9 0.099380365 0.0914327250 0.099045164 0.110635758 0.109613675

返回一個(gè)平均表達(dá)的Seurat對象:

# Return this information as a Seurat object (enables downstream plotting and analysis) First,
# replace spaces with underscores '_' so ggplot2 doesn't fail
#orig.levels <- levels(pbmc)
#Idents(pbmc) <- gsub(pattern = " ", replacement = "_", x = Idents(pbmc))
#orig.levels <- gsub(pattern = " ", replacement = "_", x = orig.levels)
#levels(pbmc) <- orig.levels
cluster.averages <- AverageExpression(pbmc, return.seurat = TRUE)
cluster.averages

An object of class Seurat 
18792 features across 13 samples within 1 assay 
Active assay: RNA (18792 features)

 head(cluster.averages@meta.data)
  orig.ident nCount_RNA nFeature_RNA
0    Average      10000        16897
1    Average      10000        15744
2    Average      10000        16344
3    Average      10000        15908
4    Average      10000        14570
5    Average      10000        14523

)

啊砰苍,為什么nCount_RNA都是10000?計(jì)算平均的數(shù)據(jù)用的solt一定是data了阱高,驗(yàn)證一下: ?AverageExpression果然赚导。

# How can I calculate expression averages separately for each replicate?
cluster.averages <- AverageExpression(pbmc, return.seurat = TRUE, add.ident = "replicate")

其他

細(xì)胞相關(guān)性

CellScatter(object = pbmc_small, cell1 = 'ATAGGAGAAACAGA', cell2 = 'CATCAGGATGCACA',smooth =T)

Identifying differential expressed genes across conditions #3111

https://github.com/satijalab/seurat/issues/3111#event-3412957227

What is most likely happening is that the gene may be expressed in a small number of cells, or have outlier expression in just a few cells. This can cause it to appear to have a high expression in the 'pseudobulk', but will not return a statistically significant result when using FindMarkers. We would suggest trusting the Findmarkers output when there are discrepancies like this, but you can explore the individual genes in more detail to see what is going on. For example, please feel free to examine one of the genes that concerns you and to post a violin plot comparing the expression of any gene across conditions

數(shù)據(jù)格式轉(zhuǎn)換R包:SeuratDisk

https://github.com/mojaveazure/seurat-disk

特別是Seurat and Scanpy.之間的對象傳輸。

喬治·修拉(Georges Seurat赤惊,1859—1891)吼旧,法國畫家
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個(gè)濱河市未舟,隨后出現(xiàn)的幾起案子圈暗,更是在濱河造成了極大的恐慌,老刑警劉巖裕膀,帶你破解...
    沈念sama閱讀 206,013評論 6 481
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件员串,死亡現(xiàn)場離奇詭異,居然都是意外死亡昼扛,警方通過查閱死者的電腦和手機(jī)寸齐,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 88,205評論 2 382
  • 文/潘曉璐 我一進(jìn)店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來抄谐,“玉大人渺鹦,你說我怎么就攤上這事∮己” “怎么了毅厚?”我有些...
    開封第一講書人閱讀 152,370評論 0 342
  • 文/不壞的土叔 我叫張陵,是天一觀的道長浦箱。 經(jīng)常有香客問我吸耿,道長,這世上最難降的妖魔是什么酷窥? 我笑而不...
    開封第一講書人閱讀 55,168評論 1 278
  • 正文 為了忘掉前任咽安,我火速辦了婚禮,結(jié)果婚禮上竖幔,老公的妹妹穿的比我還像新娘板乙。我一直安慰自己是偷,他們只是感情好拳氢,可當(dāng)我...
    茶點(diǎn)故事閱讀 64,153評論 5 371
  • 文/花漫 我一把揭開白布募逞。 她就那樣靜靜地躺著,像睡著了一般馋评。 火紅的嫁衣襯著肌膚如雪放接。 梳的紋絲不亂的頭發(fā)上,一...
    開封第一講書人閱讀 48,954評論 1 283
  • 那天留特,我揣著相機(jī)與錄音纠脾,去河邊找鬼。 笑死蜕青,一個(gè)胖子當(dāng)著我的面吹牛苟蹈,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播右核,決...
    沈念sama閱讀 38,271評論 3 399
  • 文/蒼蘭香墨 我猛地睜開眼慧脱,長吁一口氣:“原來是場噩夢啊……” “哼!你這毒婦竟也來了贺喝?” 一聲冷哼從身側(cè)響起菱鸥,我...
    開封第一講書人閱讀 36,916評論 0 259
  • 序言:老撾萬榮一對情侶失蹤,失蹤者是張志新(化名)和其女友劉穎躏鱼,沒想到半個(gè)月后氮采,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體,經(jīng)...
    沈念sama閱讀 43,382評論 1 300
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡染苛,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 35,877評論 2 323
  • 正文 我和宋清朗相戀三年鹊漠,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片茶行。...
    茶點(diǎn)故事閱讀 37,989評論 1 333
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡贸呢,死狀恐怖,靈堂內(nèi)的尸體忽然破棺而出拢军,到底是詐尸還是另有隱情楞陷,我是刑警寧澤,帶...
    沈念sama閱讀 33,624評論 4 322
  • 正文 年R本政府宣布茉唉,位于F島的核電站固蛾,受9級特大地震影響,放射性物質(zhì)發(fā)生泄漏度陆。R本人自食惡果不足惜艾凯,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 39,209評論 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一處隱蔽的房頂上張望懂傀。 院中可真熱鬧趾诗,春花似錦、人聲如沸。這莊子的主人今日做“春日...
    開封第一講書人閱讀 30,199評論 0 19
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽。三九已至贝乎,卻和暖如春情连,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背览效。 一陣腳步聲響...
    開封第一講書人閱讀 31,418評論 1 260
  • 我被黑心中介騙來泰國打工却舀, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留,地道東北人锤灿。 一個(gè)月前我還...
    沈念sama閱讀 45,401評論 2 352
  • 正文 我出身青樓挽拔,卻偏偏與公主長得像,于是被迫代替她去往敵國和親但校。 傳聞我的和親對象是個(gè)殘疾皇子篱昔,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 42,700評論 2 345