2021-05-26 單細(xì)胞分析之harmony與Seurat

參考:生信會(huì)客廳

harmony原理

Harmony需要輸入低維空間的坐標(biāo)值(embedding)弯院,一般使用PCA的降維結(jié)果缸兔。Harmony導(dǎo)入PCA的降維數(shù)據(jù)后瘟裸,會(huì)采用soft k-means clustering算法將細(xì)胞聚類学辱。常用的聚類算法僅考慮細(xì)胞在低維空間的距離缎除,但是soft clustering算法會(huì)考慮我們提供的校正因素域慷。這就好比我們的高考加分制度荒辕,小明高考成績(jī)本來(lái)達(dá)不到A大學(xué)的錄取分?jǐn)?shù)線,但是他有一項(xiàng)省級(jí)競(jìng)賽一等獎(jiǎng)加10分就夠線了犹褒。同樣的道理抵窒,細(xì)胞c2距離cluster1有點(diǎn)遠(yuǎn),本來(lái)不能算作cluster1的一份子叠骑;但是c2和cluster1的細(xì)胞來(lái)自不同的數(shù)據(jù)集李皇,因?yàn)槲覀兤谕煌臄?shù)據(jù)集融合,所以破例讓它加入cluster1了宙枷。聚類之后先計(jì)算每個(gè)cluster內(nèi)各個(gè)數(shù)據(jù)集的細(xì)胞的中心點(diǎn)掉房,然后根據(jù)這些中心點(diǎn)計(jì)算各個(gè)cluster的中心點(diǎn)。最后通過(guò)算法讓cluster內(nèi)的細(xì)胞向中心聚集慰丛,實(shí)在收斂不了的離群細(xì)胞就過(guò)濾掉卓囚。調(diào)整之后的數(shù)據(jù)重復(fù):聚類—計(jì)算cluster中心點(diǎn)—收斂細(xì)胞—聚類的過(guò)程,不斷迭代直至聚類效果趨于穩(wěn)定诅病。

  • 從第三方的評(píng)測(cè)結(jié)果來(lái)看哪亿,harmony的整合效果可以與seurat的錨點(diǎn)整合方法媲美,并且運(yùn)行時(shí)間方面具有明顯優(yōu)勢(shì)贤笆,不過(guò)內(nèi)存消耗好像沒(méi)有那么夸張的優(yōu)秀蝇棉。(哈哈我的電腦用seurat錨點(diǎn)整合被卡住了,試了一下harmony芥永,時(shí)間挺快的篡殷,而且電腦也帶得動(dòng)(記錄一下學(xué)習(xí)筆記)
library(harmony)
install_github("immunogenomics/harmony")
library(devtools)
##==準(zhǔn)備seurat對(duì)象列表==##
library(Seurat)
library(tidyverse)
library(patchwork)
rm(list=ls())
dir = c('Data/GSE139324/GSE139324_SUB/GSM4138110', 
        'Data/GSE139324/GSE139324_SUB/GSM4138111', 
        'Data/GSE139324/GSE139324_SUB/GSM4138128',
        'Data/GSE139324/GSE139324_SUB/GSM4138129',
        'Data/GSE139324/GSE139324_SUB/GSM4138148',
        'Data/GSE139324/GSE139324_SUB/GSM4138149',
        'Data/GSE139324/GSE139324_SUB/GSM4138162',
        'Data/GSE139324/GSE139324_SUB/GSM4138163',
        'Data/GSE139324/GSE139324_SUB/GSM4138168',
        'Data/GSE139324/GSE139324_SUB/GSM4138169')       #GSE139324是存放數(shù)據(jù)的目錄
sample_name <- c('HNC01PBMC', 'HNC01TIL', 'HNC10PBMC', 'HNC10TIL', 'HNC20PBMC',
                 'HNC20TIL',  'PBMC1', 'PBMC2', 'Tonsil1', 'Tonsil2')
scRNAlist <- list()
for(i in 1:length(dir)){
  counts <- Read10X(data.dir = dir[i])
  scRNAlist[[i]] <- CreateSeuratObject(counts, project=sample_name[i], min.cells=3, min.features = 200)
  scRNAlist[[i]][["percent.mt"]] <- PercentageFeatureSet(scRNAlist[[i]], pattern = "^MT-")
  scRNAlist[[i]] <- subset(scRNAlist[[i]], subset = percent.mt < 10) 
}   
saveRDS(scRNAlist, "scRNAlist.rds")




##==harmony整合多樣本==##
library(harmony)
scRNAlist <- readRDS("scRNAlist.rds")
##PCA降維
scRNA_harmony <- merge(scRNAlist[[1]], y=c(scRNAlist[[2]], scRNAlist[[3]], scRNAlist[[4]], scRNAlist[[5]], 
                                           scRNAlist[[6]], scRNAlist[[7]], scRNAlist[[8]], scRNAlist[[9]], scRNAlist[[10]]))
 scRNA_harmony <- NormalizeData(scRNA_harmony) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA(verbose=FALSE)
##整合
system.time({scRNA_harmony <- RunHarmony(scRNA_harmony, group.by.vars = "orig.ident")})
#   用戶   系統(tǒng)   流逝 
# 34.308  0.024 34.324

#user  system elapsed 
#62.90    4.06   72.08 

#降維聚類
scRNA_harmony <- RunUMAP(scRNA_harmony, reduction = "harmony", dims = 1:30)
scRNA_harmony <- FindNeighbors(scRNA_harmony, reduction = "harmony", dims = 1:30) %>% FindClusters()
##作圖
#group_by_cluster
plot1 = DimPlot(scRNA_harmony, reduction = "umap", label=T) 
#group_by_sample
plot2 = DimPlot(scRNA_harmony, reduction = "umap", group.by='orig.ident') 
#combinate
plotc <- plot1+plot2
plotc

練習(xí)另外一個(gè)數(shù)據(jù)

#harmonny
library(harmony)
library(Seurat)
library(tidyverse)
library(dplyr)
library(patchwork)
setwd("D:\\genetic_r\\study\\geo\\lesson7")
library(harmony)
scRNA.2=Read10X("D:\\genetic_r\\study\\geo\\lesson7\\GSE152048_BC2.matrix\\BC2")
scRNA.3=Read10X("D:\\genetic_r\\study\\geo\\lesson7\\GSE152048_BC3.matrix\\BC3")
scRNA.2 = CreateSeuratObject(scRNA.2 ,project="sample_2",min.cells = 3, min.features = 200)
scRNA.3 = CreateSeuratObject(scRNA.3 ,project="sample_3",min.cells = 3, min.features = 200)
scRNA_harmony <- merge(scRNA.2, y=c(scRNA.3 ))
Warning message:
  In CheckDuplicateCellNames(object.list = objects) :
  Some cell names are duplicated across objects provided. Renaming to enforce unique cell names.
scRNA_harmony <- NormalizeData(scRNA_harmony) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA(verbose=FALSE)
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
  [----|----|----|----|----|----|----|----|----|----|
     **************************************************|
     Calculating gene variances
   0%   10   20   30   40   50   60   70   80   90   100%
     [----|----|----|----|----|----|----|----|----|----|
        **************************************************|
        Calculating feature variances of standardized and clipped values
      0%   10   20   30   40   50   60   70   80   90   100%
        [----|----|----|----|----|----|----|----|----|----|
           **************************************************|
           Centering and scaling data matrix
         |============================================================| 100%
#開(kāi)始整合
system.time({scRNA_harmony <- RunHarmony(scRNA_harmony, group.by.vars = "orig.ident")})
Harmony 1/10
0%   10   20   30   40   50   60   70   80   90   100%
  [----|----|----|----|----|----|----|----|----|----|
     **************************************************|
     Harmony 2/10
   0%   10   20   30   40   50   60   70   80   90   100%
     [----|----|----|----|----|----|----|----|----|----|
        **************************************************|
        Harmony 3/10
      0%   10   20   30   40   50   60   70   80   90   100%
        [----|----|----|----|----|----|----|----|----|----|
           **************************************************|
           Harmony 4/10
         0%   10   20   30   40   50   60   70   80   90   100%
           [----|----|----|----|----|----|----|----|----|----|
              **************************************************|
              Harmony 5/10
            0%   10   20   30   40   50   60   70   80   90   100%
              [----|----|----|----|----|----|----|----|----|----|
                 **************************************************|
                 Harmony 6/10
               0%   10   20   30   40   50   60   70   80   90   100%
                 [----|----|----|----|----|----|----|----|----|----|
                    **************************************************|
                    Harmony converged after 6 iterations
                  Warning: Invalid name supplied, making object name syntactically valid. New object name is Seurat..ProjectDim.RNA.harmony; see ?make.names for more details on syntax validity
                  用戶  系統(tǒng)  流逝 
                  67.39  5.28 88.92 
降維聚類
scRNA_harmony <- FindNeighbors(scRNA_harmony, reduction = "harmony", dims = 1:15) %>% FindClusters(resolution = 0.5)
                  Computing nearest neighbor graph
                  Computing SNN
                  Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
                  
                  Number of nodes: 14760
                  Number of edges: 529809
                  
                  Running Louvain algorithm...
                  0%   10   20   30   40   50   60   70   80   90   100%
                    [----|----|----|----|----|----|----|----|----|----|
                       **************************************************|
                       Maximum modularity in 10 random starts: 0.9244
                     Number of communities: 12
                     Elapsed time: 3 seconds
降維可視化
scRNA_harmony <- RunUMAP(scRNA_harmony, reduction = "harmony", dims = 1:15)
                     11:41:02 UMAP embedding parameters a = 0.9922 b = 1.112
                     11:41:02 Read 14760 rows and found 15 numeric columns
                     11:41:02 Using Annoy for neighbor search, n_neighbors = 30
                     11:41:02 Building Annoy index with metric = cosine, n_trees = 50
                     0%   10   20   30   40   50   60   70   80   90   100%
                       [----|----|----|----|----|----|----|----|----|----|
                          **************************************************|
                          11:41:09 Writing NN index file to temp file C:\Users\ASUS\AppData\Local\Temp\RtmpO8LMz0\file2da4113d7aa5
                        11:41:09 Searching Annoy index using 1 thread, search_k = 3000
                        11:41:16 Annoy recall = 100%
                        11:41:39 Commencing smooth kNN distance calibration using 1 thread
                        11:41:45 Initializing from normalized Laplacian + noise
                        11:41:47 Commencing optimization for 200 epochs, with 640848 positive edges
                        0%   10   20   30   40   50   60   70   80   90   100%
                          [----|----|----|----|----|----|----|----|----|----|
                             **************************************************|
                             11:42:07 Optimization finished
畫圖看看整合效果
#group_by_cluster
plot1 = DimPlot(scRNA_harmony, reduction = "umap", label=T) 
#group_by_sample
plot2 = DimPlot(scRNA_harmony, reduction = "umap", group.by='orig.ident') 
#combinate
plotc <- plot1+plot2
plotc
對(duì)比一下Seurat整合樣本,Seurat是采用錨定的方法來(lái)整合埋涧。
數(shù)據(jù)歸一化處理
scRNA.list <- list(scRNA.2,scRNA.3)
for (i in 1:length(scRNA.list)) {
scRNA.list[[i]] <- NormalizeData(scRNA.list[[i]])
scRNA.list[[i]] <- FindVariableFeatures(scRNA.list[[i]], selection.method = "vst")
}
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
  [----|----|----|----|----|----|----|----|----|----|
     **************************************************|
     Calculating gene variances
   0%   10   20   30   40   50   60   70   80   90   100%
     [----|----|----|----|----|----|----|----|----|----|
        **************************************************|
        Calculating feature variances of standardized and clipped values
      0%   10   20   30   40   50   60   70   80   90   100%
        [----|----|----|----|----|----|----|----|----|----|
           **************************************************|
           Performing log-normalization
         0%   10   20   30   40   50   60   70   80   90   100%
           [----|----|----|----|----|----|----|----|----|----|
              **************************************************|
              Calculating gene variances
            0%   10   20   30   40   50   60   70   80   90   100%
              [----|----|----|----|----|----|----|----|----|----|
                 **************************************************|
                 Calculating feature variances of standardized and clipped values
               0%   10   20   30   40   50   60   70   80   90   100%
                 [----|----|----|----|----|----|----|----|----|----|
                    **************************************************|

尋找錨點(diǎn)                    
system.time({scRNA.anchors <- FindIntegrationAnchors(object.list = scRNA.list)})
Computing 2000 integration features
Scaling features for provided objects
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=13s  
Finding all pairwise anchors
|                                                  | 0 % ~calculating  Running CCA
Merging objects
Finding neighborhoods
Finding anchors
Found 12929 anchors
Filtering anchors
Retained 4266 anchors
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=15m 55s
user  system elapsed 
365.04  207.64  973.92 
#錨定后整合
system.time({scRNA_seurat <- IntegrateData(anchorset = scRNA.anchors)})
scRNA_seurat<- IntegrateData(anchorset = scRNA.anchors)
Merging dataset 1 into 2
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
  [----|----|----|----|----|----|----|----|----|----|
     **************************************************|
     Integrating data
save(scRNA.anchors,file='scRNA.anchors.RData')
rm(list = ls())
降維聚類
scRNA_seurat <- ScaleData(scRNA_seurat) %>% RunPCA(verbose=FALSE)
Centering and scaling data matrix
|===========================================================| 100%
scRNA_seurat <- FindNeighbors(scRNA_seurat, dims = 1:15) %>% FindClusters(resolution = 0.5)
Computing nearest neighbor graph
Computing SNN
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 14760
Number of edges: 523184

Running Louvain algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
  [----|----|----|----|----|----|----|----|----|----|
     **************************************************|
     Maximum modularity in 10 random starts: 0.9068
   Number of communities: 12
   Elapsed time: 3 seconds
scRNA_seurat <- ScaleData(scRNA_seurat, features = VariableFeatures(scRNA_seurat))
scRNA_seurat <- RunPCA(scRNA_seurat, features = VariableFeatures(scRNA_seurat))
plot1 <- DimPlot(scRNA_seurat, reduction = "pca", group.by="orig.ident")
plot2 <- ElbowPlot(scRNA_seurat, ndims=30, reduction="pca") 
plotc <- plot1+plot2
pc.num=1:20
scRNA_seurat <- FindNeighbors(scRNA_seurat, dims = pc.num) 
scRNA_seurat <- FindClusters(scRNA_seurat, resolution = 0.5)
table(scRNA_seurat@meta.data$seurat_clusters)
metadata <- scRNA_seurat@meta.data
cell_cluster <- data.frame(cell_ID=rownames(metadata), cluster_ID=metadata$seurat_clusters)
#tSNE
scRNA_seurat = RunTSNE(scRNA_seurat, dims = pc.num)
#group_by_cluster
plot1 = DimPlot(scRNA_seurat, reduction = "tsne", label=T) 

#group_by_sample
plot2 = DimPlot(scRNA_seurat, reduction = "tsne", group.by='orig.ident') 

#combinate
plotc <- plot1+plot2

#UMAP
scRNA_seurat <- RunUMAP(scRNA_seurat, dims = pc.num)
#group_by_cluster
plot3 = DimPlot(scRNA_seurat, reduction = "umap", label=T) 
#group_by_sample
plot4 = DimPlot(scRNA_seurat, reduction = "umap", group.by='orig.ident')
#combinate
plotc <- plot3+plot4


plotc <- plot3+plot4+ plot_layout(guides = 'collect')

哈哈 暫時(shí)還看不出harmony與Seurat誰(shuí)的整合效果好贴唇。。飞袋。

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個(gè)濱河市链患,隨后出現(xiàn)的幾起案子巧鸭,更是在濱河造成了極大的恐慌,老刑警劉巖麻捻,帶你破解...
    沈念sama閱讀 216,372評(píng)論 6 498
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件纲仍,死亡現(xiàn)場(chǎng)離奇詭異呀袱,居然都是意外死亡,警方通過(guò)查閱死者的電腦和手機(jī)郑叠,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 92,368評(píng)論 3 392
  • 文/潘曉璐 我一進(jìn)店門夜赵,熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái),“玉大人乡革,你說(shuō)我怎么就攤上這事寇僧。” “怎么了沸版?”我有些...
    開(kāi)封第一講書人閱讀 162,415評(píng)論 0 353
  • 文/不壞的土叔 我叫張陵嘁傀,是天一觀的道長(zhǎng)。 經(jīng)常有香客問(wèn)我视粮,道長(zhǎng)细办,這世上最難降的妖魔是什么? 我笑而不...
    開(kāi)封第一講書人閱讀 58,157評(píng)論 1 292
  • 正文 為了忘掉前任蕾殴,我火速辦了婚禮笑撞,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘钓觉。我一直安慰自己茴肥,他們只是感情好,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,171評(píng)論 6 388
  • 文/花漫 我一把揭開(kāi)白布议谷。 她就那樣靜靜地躺著炉爆,像睡著了一般。 火紅的嫁衣襯著肌膚如雪卧晓。 梳的紋絲不亂的頭發(fā)上芬首,一...
    開(kāi)封第一講書人閱讀 51,125評(píng)論 1 297
  • 那天,我揣著相機(jī)與錄音逼裆,去河邊找鬼郁稍。 笑死,一個(gè)胖子當(dāng)著我的面吹牛胜宇,可吹牛的內(nèi)容都是我干的耀怜。 我是一名探鬼主播,決...
    沈念sama閱讀 40,028評(píng)論 3 417
  • 文/蒼蘭香墨 我猛地睜開(kāi)眼桐愉,長(zhǎng)吁一口氣:“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼财破!你這毒婦竟也來(lái)了?” 一聲冷哼從身側(cè)響起从诲,我...
    開(kāi)封第一講書人閱讀 38,887評(píng)論 0 274
  • 序言:老撾萬(wàn)榮一對(duì)情侶失蹤左痢,失蹤者是張志新(化名)和其女友劉穎,沒(méi)想到半個(gè)月后,有當(dāng)?shù)厝嗽跇?shù)林里發(fā)現(xiàn)了一具尸體俊性,經(jīng)...
    沈念sama閱讀 45,310評(píng)論 1 310
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡略步,尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,533評(píng)論 2 332
  • 正文 我和宋清朗相戀三年,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了定页。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片趟薄。...
    茶點(diǎn)故事閱讀 39,690評(píng)論 1 348
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡,死狀恐怖典徊,靈堂內(nèi)的尸體忽然破棺而出杭煎,到底是詐尸還是另有隱情,我是刑警寧澤宫峦,帶...
    沈念sama閱讀 35,411評(píng)論 5 343
  • 正文 年R本政府宣布岔帽,位于F島的核電站,受9級(jí)特大地震影響导绷,放射性物質(zhì)發(fā)生泄漏犀勒。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,004評(píng)論 3 325
  • 文/蒙蒙 一妥曲、第九天 我趴在偏房一處隱蔽的房頂上張望贾费。 院中可真熱鬧,春花似錦檐盟、人聲如沸褂萧。這莊子的主人今日做“春日...
    開(kāi)封第一講書人閱讀 31,659評(píng)論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽(yáng)导犹。三九已至,卻和暖如春羡忘,著一層夾襖步出監(jiān)牢的瞬間谎痢,已是汗流浹背。 一陣腳步聲響...
    開(kāi)封第一講書人閱讀 32,812評(píng)論 1 268
  • 我被黑心中介騙來(lái)泰國(guó)打工卷雕, 沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留节猿,地道東北人。 一個(gè)月前我還...
    沈念sama閱讀 47,693評(píng)論 2 368
  • 正文 我出身青樓漫雕,卻偏偏與公主長(zhǎng)得像滨嘱,于是被迫代替她去往敵國(guó)和親。 傳聞我的和親對(duì)象是個(gè)殘疾皇子浸间,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 44,577評(píng)論 2 353

推薦閱讀更多精彩內(nèi)容