重生之我在劍橋大學學習單細胞RNA-seq分析——8. scRNA-seq數(shù)據(jù)整合(4)

8.8 Seurat v3, 3’ 10k PBMC和全血STRT-Seq
盡管我們的/data文件夾中已經(jīng)有了所有必要的文件,我們?nèi)钥梢詮腉EO數(shù)據(jù)庫下載必要的文件:

download.file("https://ftp.ncbi.nlm.nih.gov/geo/series/GSE149nnn/GSE149938/suppl/GSE149938_umi_matrix.csv.gz",
              destfile = "GSE149938_umi_matrix.csv.gz")
download.file("https://cf.10xgenomics.com/samples/cell-exp/4.0.0/Parent_NGSC3_DI_PBMC/Parent_NGSC3_DI_PBMC_filtered_feature_bc_matrix.h5",
              destfile = "3p_pbmc10k_filt.h5")
> umi_gz <- gzfile("data/update/GSE149938_umi_matrix.csv.gz",'rt')  
> umi <- read.csv(umi_gz,check.names = F,quote = "")
> matrix_3p    <- Read10X_h5("data/update/3p_pbmc10k_filt.h5",use.names = T)

接下來账锹,讓我們創(chuàng)建Seurat對象并重新定義一些元數(shù)據(jù)列(GEO數(shù)據(jù)集將細胞類型放入orig.ident slot掠哥,這會干擾我們接下來要做的事情):

> srat_wb <- CreateSeuratObject(t(umi),project = "whole_blood")
> srat_3p <- CreateSeuratObject(matrix_3p,project = "pbmc10k_3p")
> rm(umi_gz)
> rm(umi)
> rm(matrix_3p)
> colnames(srat_wb@meta.data)[1] <- "cell_type"
> srat_wb@meta.data$orig.ident <- "whole_blood"
> srat_wb@meta.data$orig.ident <- as.factor(srat_wb@meta.data$orig.ident)
> head(srat_wb[[]])
                   cell_type nCount_RNA nFeature_RNA  orig.ident
BNK_spBM1_L1_bar25       BNK      24494         1869 whole_blood
BNK_spBM1_L1_bar26       BNK      61980         2051 whole_blood
BNK_spBM1_L1_bar27       BNK     124382         3872 whole_blood
BNK_spBM1_L1_bar28       BNK       8144         1475 whole_blood
BNK_spBM1_L1_bar29       BNK      53612         2086 whole_blood
BNK_spBM1_L1_bar30       BNK      33582         2038 whole_blood

進行基本的QC。STRT-seq與10x有很大不同趾代,每個細胞檢測到的基因要多得多。此外,出于某種原因番电,全血數(shù)據(jù)集的矩陣中沒有MT基因,但這并不重要辆琅。

> srat_wb <- SetIdent(srat_wb,value = "orig.ident")
> srat_wb[["percent.mt"]] <- PercentageFeatureSet(srat_wb, pattern = "^MT-")
> srat_wb[["percent.rbp"]] <- PercentageFeatureSet(srat_wb, pattern = "^RP[SL]")
> srat_3p[["percent.mt"]] <- PercentageFeatureSet(srat_3p, pattern = "^MT-")
> srat_3p[["percent.rbp"]] <- PercentageFeatureSet(srat_3p, pattern = "^RP[SL]")
> VlnPlot(srat_wb, features = c("nFeature_RNA","nCount_RNA","percent.mt","percent.rbp"), ncol = 4)
> VlnPlot(srat_3p, features = c("nFeature_RNA","nCount_RNA","percent.mt","percent.rbp"), ncol = 4)

用于處理GEO全血數(shù)據(jù)集的注釋文件與Cell Ranger GRCh38-2020A有很大不同漱办。讓我們看看有多少個共有基因:

> table(rownames(srat_3p) %in% rownames(srat_wb))

FALSE  TRUE 
18050 18551
> common_genes <- rownames(srat_3p)[rownames(srat_3p) %in% rownames(srat_wb)]

過濾基因數(shù)量過高或過低、MT基因含量過高的細胞婉烟。另外娩井,讓我們將單個矩陣限制為僅包含共有基因:

> srat_3p <- subset(srat_3p, subset = nFeature_RNA > 500 & nFeature_RNA < 5000 & percent.mt < 15)
> srat_wb <- subset(srat_wb, subset = nFeature_RNA > 1000 & nFeature_RNA < 6000)
> srat_3p <- srat_3p[rownames(srat_3p) %in% common_genes,]
> srat_wb <- srat_wb[rownames(srat_wb) %in% common_genes,]

與之前的Seurat v3一樣,讓我們列出一個列表并為每個對象標準化/篩選HVG:

> wb_list <- list()
> wb_list[["pbmc10k_3p"]]   <- srat_3p
> wb_list[["whole_blood"]]  <- srat_wb
> for (i in 1:length(wb_list)) {
    wb_list[[i]] <- NormalizeData(wb_list[[i]], verbose = F)
    wb_list[[i]] <- FindVariableFeatures(wb_list[[i]], selection.method = "vst", nfeatures = 2000, verbose = F)
  }

接下來進行整合似袁,Seurat 3分兩步完成洞辣。

> wb_anchors <- FindIntegrationAnchors(object.list = wb_list, dims = 1:30)
> wb_seurat  <- IntegrateData(anchorset = wb_anchors, dims = 1:30)
> rm(wb_list)
> rm(wb_anchors)

對未整合的數(shù)據(jù)集進行基本的處理和可視化:

> DefaultAssay(wb_seurat) <- "RNA"
> wb_seurat <- NormalizeData(wb_seurat, verbose = F)
> wb_seurat <- FindVariableFeatures(wb_seurat, selection.method = "vst", nfeatures = 2000, verbose = F)
> wb_seurat <- ScaleData(wb_seurat, verbose = F)
> wb_seurat <- RunPCA(wb_seurat, npcs = 30, verbose = F)
> wb_seurat <- RunUMAP(wb_seurat, reduction = "pca", dims = 1:30, verbose = F) 
> DimPlot(wb_seurat,reduction = "umap") + plot_annotation(title = "10k 3' PBMC and whole blood, before integration")

現(xiàn)在,讓我們看一下整合的數(shù)據(jù)集:

> DefaultAssay(wb_seurat) <- "integrated"
> wb_seurat <- ScaleData(wb_seurat, verbose = F)
> wb_seurat <- RunPCA(wb_seurat, npcs = 30, verbose = F)
> wb_seurat <- RunUMAP(wb_seurat, reduction = "pca", dims = 1:30, verbose = F)
> DimPlot(wb_seurat, reduction = "umap") + plot_annotation(title = "10k 3' PBMC and white blood cells, after integration (Seurat 3)")

讓我們看一些marker:

> FeaturePlot(wb_seurat,c("MS4A1","LYZ","NKG7","PPBP","LTF","HBA1","FCER1A","IL7R","FCGR3B")) & scale_colour_gradientn(colours = rev(brewer.pal(n = 11, name = "Spectral")))

從圖中我們可以看出昙衅,有一些重要的細胞類型在PBMC數(shù)據(jù)集中不存在扬霜,但存在于全血數(shù)據(jù)集中。LTF基因是中性粒細胞最顯著的marker而涉,HBA1是在紅細胞中表達的血紅蛋白基因著瓶。
現(xiàn)在讓我們對整合矩陣進行聚類,并查看聚類在兩個數(shù)據(jù)集之間的分布情況:

> wb_seurat <- FindNeighbors(wb_seurat, dims = 1:30, k.param = 10, verbose = F)
> wb_seurat <- FindClusters(wb_seurat, verbose = F)
> DimPlot(wb_seurat,label = T) + NoLegend()

聚類組成顯示了全血數(shù)據(jù)集獨有的許多聚類:

> count_table <- table(wb_seurat@meta.data$seurat_clusters, wb_seurat@meta.data$orig.ident)
> count_table
    
     pbmc10k_3p whole_blood
  0        1426         237
  1        1385          71
  2        1264         130
  3        1211         112
  4        1115         145
  5           0        1052
  6         355         467
  7         377         211
  8         386         199
  9           0         550
  10        343         157
  11        390          82
  12          0         441
  13        283         125
  14          7         388
  15          3         380
  16         19         362
  17          4         367
  18          2         316
  19        297          13
  20          0         308
  21          0         265
  22          0         222
  23          0         221
  24         15         179
  25        106          22
  26         93          19
  27          0         103
  28         77           3
  29          0          50
  30         32           2
  31          0          32
> plot_integrated_clusters(wb_seurat)

我們可以利用GSE149938中存在的元數(shù)據(jù):

> meta <- wb_seurat[[]]
> table(meta[meta$seurat_clusters == '5',]$cell_type) ## erythrocytes

   BNK    CMP    ery   immB    MEP    MLP   preB   proB toxiNK 
     1      2   1040      3      1      1      2      1      1 
> table(meta[meta$seurat_clusters == '20',]$cell_type) ## neutrophils

    BNK     CLP     CMP     HSC    immB  kineNK matureN   metaN     MLP    myeN 
      2       2       1       1       1      55       1       7       1      99 
 plasma    proN  toxiNK 
      1     136       1 
> table(meta[meta$seurat_clusters == '24',]$cell_type) ## plasma

   ery   naiB plasma 
    11      2    166 
> table(meta[meta$seurat_clusters == '16',]$cell_type) ## platelets

    BNK     CLP     CMP     ery     HSC    LMPP matureN     MEP     MPP     NKP 
      1       3      61       4      72       1       1     144      74       1
> rm(wb_seurat)

8.9 Harmony, 3’ 10k PBMC和全血STRT-Seq
首先創(chuàng)建一個合并的Seurat數(shù)據(jù)集啼县,并對其進行標準化和后續(xù)處理:

> wb_harmony    <- merge(srat_3p,srat_wb)
> wb_harmony <- NormalizeData(wb_harmony, verbose = F)
> wb_harmony <- FindVariableFeatures(wb_harmony, selection.method = "vst", nfeatures = 2000, verbose = F)
> wb_harmony <- ScaleData(wb_harmony, verbose = F)
> wb_harmony <- RunPCA(wb_harmony, npcs = 30, verbose = F)
> wb_harmony <- RunUMAP(wb_harmony, reduction = "pca", dims = 1:30, verbose = F)

看一下PCA圖的變化材原,以及沿第一個主成分的分布:

> p1 <- DimPlot(object = wb_harmony, reduction = "pca", pt.size = .1, group.by = "orig.ident") + NoLegend()
> p2 <- VlnPlot(object = wb_harmony, features = "PC_1", group.by = "orig.ident", pt.size = .1) + NoLegend()
> plot_grid(p1,p2)

UMAP也顯示了數(shù)據(jù)集之間的明顯差異:

> DimPlot(wb_harmony,reduction = "umap") + plot_annotation(title = "10k 3' PBMC and whole blood, before integration")

使用SeuratWrappers庫中名為RunHarmony的函數(shù)來運行harmony

> wb_harmony <- wb_harmony %>% RunHarmony("orig.ident", plot_convergence = T)

這會生成我們稍后將用于所有下游分析的降維結(jié)果沸久。

> harmony_embeddings <- Embeddings(wb_harmony, 'harmony')
> harmony_embeddings[1:5, 1:5]
                   harmony_1  harmony_2 harmony_3   harmony_4  harmony_5
AAACCCACATAACTCG-1  2.592137 -1.6045869  3.192993 -0.03594751  3.8447941
AAACCCACATGTAACC-1  4.244764  3.2122684 -9.738222 -5.90380632  0.9607984
AAACCCAGTGAGTCAG-1  3.208084 -2.1054920  2.035577  1.90984384  5.3665634
AAACGAACAGTCAGTT-1 -1.106694  1.8151680  3.092745 -2.34038488  7.6785360
AAACGAACATTCGGGC-1  4.735928 -0.4421468 -2.196355  2.77622970 -3.2385050

校正后的PCA和分布:

> p1 <- DimPlot(object = wb_harmony, reduction = "harmony", pt.size = .1, group.by = "orig.ident") + NoLegend()
> p2 <- VlnPlot(object = wb_harmony, features = "harmony_1", group.by = "orig.ident", pt.size = .1) + NoLegend()
> plot_grid(p1,p2)

進行UMAP降維和Louvain聚類:

> wb_harmony <- wb_harmony %>% 
    RunUMAP(reduction = "harmony", dims = 1:30, verbose = F) %>% 
    FindNeighbors(reduction = "harmony", k.param = 10, dims = 1:30) %>% 
    FindClusters() %>% 
    identity()
> wb_harmony <- SetIdent(wb_harmony,value = "orig.ident")
> DimPlot(wb_harmony,reduction = "umap") + plot_annotation(title = "10k 3' PBMC and whole blood, after integration (Harmony)")
> DimPlot(wb_harmony, reduction = "umap", group.by = "orig.ident", pt.size = .1, split.by = 'orig.ident') + NoLegend()

該數(shù)據(jù)集的整合結(jié)果看起來與Seurat v3非常相似:

> wb_harmony <- SetIdent(wb_harmony,value = "seurat_clusters")
> DimPlot(wb_harmony,label = T) + NoLegend()

更詳細的聚類檢查似乎也證實了這一點:

> plot_integrated_clusters(wb_harmony)
> rm(wb_harmony)

8.10 LIGER, 3’ 10k PBMC和全血STRT-Seq
最后,使用LIGER進行數(shù)據(jù)整合华糖。此步驟需要一些時間來運行:

> wb_liger    <- merge(srat_3p,srat_wb) 
> wb_liger    <- NormalizeData(wb_liger)
> wb_liger    <- FindVariableFeatures(wb_liger)
> wb_liger    <- ScaleData(wb_liger, split.by = "orig.ident", do.center = F)
> wb_liger    <- RunOptimizeALS(wb_liger, k = 30, lambda = 5, split.by = "orig.ident")
> wb_liger    <- RunQuantileNorm(wb_liger, split.by = "orig.ident")

然后麦向,使用與之前類似的設置執(zhí)行Louvain聚類(FindNeighborsFindClusters):

> wb_liger    <- FindNeighbors(wb_liger,reduction = "iNMF",k.param = 10,dims = 1:30)
> wb_liger    <- FindClusters(wb_liger)

讓我們用幾種不同的方式來看一下整合后的UMAP圖:

> wb_liger    <- RunUMAP(wb_liger, dims = 1:ncol(wb_liger[["iNMF"]]), reduction = "iNMF",verbose = F)
> wb_liger <- SetIdent(wb_liger,value = "orig.ident")
> DimPlot(wb_liger,reduction = "umap") + plot_annotation(title = "10k 3' PBMC and 10k 5' PBMC cells, after integration (LIGER)")
> DimPlot(wb_liger, reduction = "umap", group.by = "orig.ident", pt.size = .1, split.by = 'orig.ident') + NoLegend()

最后,看一下每個cluster的分布:

> plot_integrated_clusters(wb_liger)
> rm(wb_liger)
> rm(srat_wb)
> rm(srat_3p)

往期內(nèi)容:
重生之我在劍橋大學學習單細胞RNA-seq分析——8. scRNA-seq數(shù)據(jù)整合(1)
重生之我在劍橋大學學習單細胞RNA-seq分析——8. scRNA-seq數(shù)據(jù)整合(2)
重生之我在劍橋大學學習單細胞RNA-seq分析——8. scRNA-seq數(shù)據(jù)整合(3)

?著作權歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末客叉,一起剝皮案震驚了整個濱河市诵竭,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌兼搏,老刑警劉巖卵慰,帶你破解...
    沈念sama閱讀 218,386評論 6 506
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場離奇詭異佛呻,居然都是意外死亡裳朋,警方通過查閱死者的電腦和手機,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,142評論 3 394
  • 文/潘曉璐 我一進店門吓著,熙熙樓的掌柜王于貴愁眉苦臉地迎上來鲤嫡,“玉大人,你說我怎么就攤上這事绑莺∨郏” “怎么了?”我有些...
    開封第一講書人閱讀 164,704評論 0 353
  • 文/不壞的土叔 我叫張陵纺裁,是天一觀的道長诫肠。 經(jīng)常有香客問我,道長欺缘,這世上最難降的妖魔是什么栋豫? 我笑而不...
    開封第一講書人閱讀 58,702評論 1 294
  • 正文 為了忘掉前任,我火速辦了婚禮谚殊,結(jié)果婚禮上丧鸯,老公的妹妹穿的比我還像新娘。我一直安慰自己嫩絮,他們只是感情好丛肢,可當我...
    茶點故事閱讀 67,716評論 6 392
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著絮记,像睡著了一般摔踱。 火紅的嫁衣襯著肌膚如雪虐先。 梳的紋絲不亂的頭發(fā)上怨愤,一...
    開封第一講書人閱讀 51,573評論 1 305
  • 那天,我揣著相機與錄音蛹批,去河邊找鬼撰洗。 笑死篮愉,一個胖子當著我的面吹牛,可吹牛的內(nèi)容都是我干的差导。 我是一名探鬼主播试躏,決...
    沈念sama閱讀 40,314評論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼,長吁一口氣:“原來是場噩夢啊……” “哼设褐!你這毒婦竟也來了颠蕴?” 一聲冷哼從身側(cè)響起,我...
    開封第一講書人閱讀 39,230評論 0 276
  • 序言:老撾萬榮一對情侶失蹤助析,失蹤者是張志新(化名)和其女友劉穎犀被,沒想到半個月后,有當?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體外冀,經(jīng)...
    沈念sama閱讀 45,680評論 1 314
  • 正文 獨居荒郊野嶺守林人離奇死亡寡键,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 37,873評論 3 336
  • 正文 我和宋清朗相戀三年,在試婚紗的時候發(fā)現(xiàn)自己被綠了雪隧。 大學時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片西轩。...
    茶點故事閱讀 39,991評論 1 348
  • 序言:一個原本活蹦亂跳的男人離奇死亡,死狀恐怖脑沿,靈堂內(nèi)的尸體忽然破棺而出藕畔,到底是詐尸還是另有隱情,我是刑警寧澤捅伤,帶...
    沈念sama閱讀 35,706評論 5 346
  • 正文 年R本政府宣布劫流,位于F島的核電站床估,受9級特大地震影響堕虹,放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜失暂,卻給世界環(huán)境...
    茶點故事閱讀 41,329評論 3 330
  • 文/蒙蒙 一熄诡、第九天 我趴在偏房一處隱蔽的房頂上張望可很。 院中可真熱鬧,春花似錦凰浮、人聲如沸我抠。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,910評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽菜拓。三九已至,卻和暖如春笛厦,著一層夾襖步出監(jiān)牢的瞬間纳鼎,已是汗流浹背。 一陣腳步聲響...
    開封第一講書人閱讀 33,038評論 1 270
  • 我被黑心中介騙來泰國打工, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留贱鄙,地道東北人劝贸。 一個月前我還...
    沈念sama閱讀 48,158評論 3 370
  • 正文 我出身青樓,卻偏偏與公主長得像逗宁,于是被迫代替她去往敵國和親映九。 傳聞我的和親對象是個殘疾皇子,可洞房花燭夜當晚...
    茶點故事閱讀 44,941評論 2 355

推薦閱讀更多精彩內(nèi)容