2021-05-26 單細(xì)胞分析之harmony與Seurat

參考：生信會(huì)客廳

harmony原理

Harmony需要輸入低維空間的坐標(biāo)值（embedding）弯院，一般使用PCA的降維結(jié)果缸兔。Harmony導(dǎo)入PCA的降維數(shù)據(jù)后瘟裸，會(huì)采用soft k-means clustering算法將細(xì)胞聚類学辱。常用的聚類算法僅考慮細(xì)胞在低維空間的距離缎除，但是soft clustering算法會(huì)考慮我們提供的校正因素域慷。這就好比我們的高考加分制度荒辕，小明高考成績(jī)本來(lái)達(dá)不到A大學(xué)的錄取分?jǐn)?shù)線，但是他有一項(xiàng)省級(jí)競(jìng)賽一等獎(jiǎng)加10分就夠線了犹褒。同樣的道理抵窒，細(xì)胞c2距離cluster1有點(diǎn)遠(yuǎn)，本來(lái)不能算作cluster1的一份子叠骑；但是c2和cluster1的細(xì)胞來(lái)自不同的數(shù)據(jù)集李皇，因?yàn)槲覀兤谕煌臄?shù)據(jù)集融合，所以破例讓它加入cluster1了宙枷。聚類之后先計(jì)算每個(gè)cluster內(nèi)各個(gè)數(shù)據(jù)集的細(xì)胞的中心點(diǎn)掉房，然后根據(jù)這些中心點(diǎn)計(jì)算各個(gè)cluster的中心點(diǎn)。最后通過(guò)算法讓cluster內(nèi)的細(xì)胞向中心聚集慰丛，實(shí)在收斂不了的離群細(xì)胞就過(guò)濾掉卓囚。調(diào)整之后的數(shù)據(jù)重復(fù)：聚類—計(jì)算cluster中心點(diǎn)—收斂細(xì)胞—聚類的過(guò)程，不斷迭代直至聚類效果趨于穩(wěn)定诅病。

從第三方的評(píng)測(cè)結(jié)果來(lái)看哪亿，harmony的整合效果可以與seurat的錨點(diǎn)整合方法媲美，并且運(yùn)行時(shí)間方面具有明顯優(yōu)勢(shì)贤笆，不過(guò)內(nèi)存消耗好像沒(méi)有那么夸張的優(yōu)秀蝇棉。(哈哈我的電腦用seurat錨點(diǎn)整合被卡住了，試了一下harmony芥永，時(shí)間挺快的篡殷，而且電腦也帶得動(dòng)（記錄一下學(xué)習(xí)筆記）

library(harmony)
install_github("immunogenomics/harmony")
library(devtools)
##==準(zhǔn)備seurat對(duì)象列表==##
library(Seurat)
library(tidyverse)
library(patchwork)
rm(list=ls())
dir = c('Data/GSE139324/GSE139324_SUB/GSM4138110', 
        'Data/GSE139324/GSE139324_SUB/GSM4138111', 
        'Data/GSE139324/GSE139324_SUB/GSM4138128',
        'Data/GSE139324/GSE139324_SUB/GSM4138129',
        'Data/GSE139324/GSE139324_SUB/GSM4138148',
        'Data/GSE139324/GSE139324_SUB/GSM4138149',
        'Data/GSE139324/GSE139324_SUB/GSM4138162',
        'Data/GSE139324/GSE139324_SUB/GSM4138163',
        'Data/GSE139324/GSE139324_SUB/GSM4138168',
        'Data/GSE139324/GSE139324_SUB/GSM4138169')       #GSE139324是存放數(shù)據(jù)的目錄
sample_name <- c('HNC01PBMC', 'HNC01TIL', 'HNC10PBMC', 'HNC10TIL', 'HNC20PBMC',
                 'HNC20TIL',  'PBMC1', 'PBMC2', 'Tonsil1', 'Tonsil2')
scRNAlist <- list()
for(i in 1:length(dir)){
  counts <- Read10X(data.dir = dir[i])
  scRNAlist[[i]] <- CreateSeuratObject(counts, project=sample_name[i], min.cells=3, min.features = 200)
  scRNAlist[[i]][["percent.mt"]] <- PercentageFeatureSet(scRNAlist[[i]], pattern = "^MT-")
  scRNAlist[[i]] <- subset(scRNAlist[[i]], subset = percent.mt < 10) 
}   
saveRDS(scRNAlist, "scRNAlist.rds")




##==harmony整合多樣本==##
library(harmony)
scRNAlist <- readRDS("scRNAlist.rds")
##PCA降維
scRNA_harmony <- merge(scRNAlist[[1]], y=c(scRNAlist[[2]], scRNAlist[[3]], scRNAlist[[4]], scRNAlist[[5]], 
                                           scRNAlist[[6]], scRNAlist[[7]], scRNAlist[[8]], scRNAlist[[9]], scRNAlist[[10]]))
 scRNA_harmony <- NormalizeData(scRNA_harmony) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA(verbose=FALSE)
##整合
system.time({scRNA_harmony <- RunHarmony(scRNA_harmony, group.by.vars = "orig.ident")})
#   用戶   系統(tǒng)   流逝 
# 34.308  0.024 34.324

#user  system elapsed 
#62.90    4.06   72.08 

#降維聚類
scRNA_harmony <- RunUMAP(scRNA_harmony, reduction = "harmony", dims = 1:30)
scRNA_harmony <- FindNeighbors(scRNA_harmony, reduction = "harmony", dims = 1:30) %>% FindClusters()
##作圖
#group_by_cluster
plot1 = DimPlot(scRNA_harmony, reduction = "umap", label=T) 
#group_by_sample
plot2 = DimPlot(scRNA_harmony, reduction = "umap", group.by='orig.ident') 
#combinate
plotc <- plot1+plot2
plotc

練習(xí)另外一個(gè)數(shù)據(jù)

#harmonny
library(harmony)
library(Seurat)
library(tidyverse)
library(dplyr)
library(patchwork)
setwd("D:\\genetic_r\\study\\geo\\lesson7")
library(harmony)
scRNA.2=Read10X("D:\\genetic_r\\study\\geo\\lesson7\\GSE152048_BC2.matrix\\BC2")
scRNA.3=Read10X("D:\\genetic_r\\study\\geo\\lesson7\\GSE152048_BC3.matrix\\BC3")
scRNA.2 = CreateSeuratObject(scRNA.2 ,project="sample_2",min.cells = 3, min.features = 200)
scRNA.3 = CreateSeuratObject(scRNA.3 ,project="sample_3",min.cells = 3, min.features = 200)
scRNA_harmony <- merge(scRNA.2, y=c(scRNA.3 ))
Warning message:
  In CheckDuplicateCellNames(object.list = objects) :
  Some cell names are duplicated across objects provided. Renaming to enforce unique cell names.
scRNA_harmony <- NormalizeData(scRNA_harmony) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA(verbose=FALSE)
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
  [----|----|----|----|----|----|----|----|----|----|
     **************************************************|
     Calculating gene variances
   0%   10   20   30   40   50   60   70   80   90   100%
     [----|----|----|----|----|----|----|----|----|----|
        **************************************************|
        Calculating feature variances of standardized and clipped values
      0%   10   20   30   40   50   60   70   80   90   100%
        [----|----|----|----|----|----|----|----|----|----|
           **************************************************|
           Centering and scaling data matrix
         |============================================================| 100%
#開(kāi)始整合
system.time({scRNA_harmony <- RunHarmony(scRNA_harmony, group.by.vars = "orig.ident")})
Harmony 1/10
0%   10   20   30   40   50   60   70   80   90   100%
  [----|----|----|----|----|----|----|----|----|----|
     **************************************************|
     Harmony 2/10
   0%   10   20   30   40   50   60   70   80   90   100%
     [----|----|----|----|----|----|----|----|----|----|
        **************************************************|
        Harmony 3/10
      0%   10   20   30   40   50   60   70   80   90   100%
        [----|----|----|----|----|----|----|----|----|----|
           **************************************************|
           Harmony 4/10
         0%   10   20   30   40   50   60   70   80   90   100%
           [----|----|----|----|----|----|----|----|----|----|
              **************************************************|
              Harmony 5/10
            0%   10   20   30   40   50   60   70   80   90   100%
              [----|----|----|----|----|----|----|----|----|----|
                 **************************************************|
                 Harmony 6/10
               0%   10   20   30   40   50   60   70   80   90   100%
                 [----|----|----|----|----|----|----|----|----|----|
                    **************************************************|
                    Harmony converged after 6 iterations
                  Warning: Invalid name supplied, making object name syntactically valid. New object name is Seurat..ProjectDim.RNA.harmony; see ?make.names for more details on syntax validity
                  用戶  系統(tǒng)  流逝 
                  67.39  5.28 88.92 
降維聚類
scRNA_harmony <- FindNeighbors(scRNA_harmony, reduction = "harmony", dims = 1:15) %>% FindClusters(resolution = 0.5)
                  Computing nearest neighbor graph
                  Computing SNN
                  Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
                  
                  Number of nodes: 14760
                  Number of edges: 529809
                  
                  Running Louvain algorithm...
                  0%   10   20   30   40   50   60   70   80   90   100%
                    [----|----|----|----|----|----|----|----|----|----|
                       **************************************************|
                       Maximum modularity in 10 random starts: 0.9244
                     Number of communities: 12
                     Elapsed time: 3 seconds
降維可視化
scRNA_harmony <- RunUMAP(scRNA_harmony, reduction = "harmony", dims = 1:15)
                     11:41:02 UMAP embedding parameters a = 0.9922 b = 1.112
                     11:41:02 Read 14760 rows and found 15 numeric columns
                     11:41:02 Using Annoy for neighbor search, n_neighbors = 30
                     11:41:02 Building Annoy index with metric = cosine, n_trees = 50
                     0%   10   20   30   40   50   60   70   80   90   100%
                       [----|----|----|----|----|----|----|----|----|----|
                          **************************************************|
                          11:41:09 Writing NN index file to temp file C:\Users\ASUS\AppData\Local\Temp\RtmpO8LMz0\file2da4113d7aa5
                        11:41:09 Searching Annoy index using 1 thread, search_k = 3000
                        11:41:16 Annoy recall = 100%
                        11:41:39 Commencing smooth kNN distance calibration using 1 thread
                        11:41:45 Initializing from normalized Laplacian + noise
                        11:41:47 Commencing optimization for 200 epochs, with 640848 positive edges
                        0%   10   20   30   40   50   60   70   80   90   100%
                          [----|----|----|----|----|----|----|----|----|----|
                             **************************************************|
                             11:42:07 Optimization finished
畫圖看看整合效果
#group_by_cluster
plot1 = DimPlot(scRNA_harmony, reduction = "umap", label=T) 
#group_by_sample
plot2 = DimPlot(scRNA_harmony, reduction = "umap", group.by='orig.ident') 
#combinate
plotc <- plot1+plot2
plotc

對(duì)比一下Seurat整合樣本，Seurat是采用錨定的方法來(lái)整合埋涧。
數(shù)據(jù)歸一化處理
scRNA.list <- list(scRNA.2,scRNA.3)
for (i in 1:length(scRNA.list)) {
scRNA.list[[i]] <- NormalizeData(scRNA.list[[i]])
scRNA.list[[i]] <- FindVariableFeatures(scRNA.list[[i]], selection.method = "vst")
}
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
  [----|----|----|----|----|----|----|----|----|----|
     **************************************************|
     Calculating gene variances
   0%   10   20   30   40   50   60   70   80   90   100%
     [----|----|----|----|----|----|----|----|----|----|
        **************************************************|
        Calculating feature variances of standardized and clipped values
      0%   10   20   30   40   50   60   70   80   90   100%
        [----|----|----|----|----|----|----|----|----|----|
           **************************************************|
           Performing log-normalization
         0%   10   20   30   40   50   60   70   80   90   100%
           [----|----|----|----|----|----|----|----|----|----|
              **************************************************|
              Calculating gene variances
            0%   10   20   30   40   50   60   70   80   90   100%
              [----|----|----|----|----|----|----|----|----|----|
                 **************************************************|
                 Calculating feature variances of standardized and clipped values
               0%   10   20   30   40   50   60   70   80   90   100%
                 [----|----|----|----|----|----|----|----|----|----|
                    **************************************************|

尋找錨點(diǎn)                    
system.time({scRNA.anchors <- FindIntegrationAnchors(object.list = scRNA.list)})
Computing 2000 integration features
Scaling features for provided objects
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=13s  
Finding all pairwise anchors
|                                                  | 0 % ~calculating  Running CCA
Merging objects
Finding neighborhoods
Finding anchors
Found 12929 anchors
Filtering anchors
Retained 4266 anchors
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=15m 55s
user  system elapsed 
365.04  207.64  973.92 
#錨定后整合
system.time({scRNA_seurat <- IntegrateData(anchorset = scRNA.anchors)})
scRNA_seurat<- IntegrateData(anchorset = scRNA.anchors)
Merging dataset 1 into 2
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
  [----|----|----|----|----|----|----|----|----|----|
     **************************************************|
     Integrating data
save(scRNA.anchors,file='scRNA.anchors.RData')
rm(list = ls())
降維聚類
scRNA_seurat <- ScaleData(scRNA_seurat) %>% RunPCA(verbose=FALSE)
Centering and scaling data matrix
|===========================================================| 100%
scRNA_seurat <- FindNeighbors(scRNA_seurat, dims = 1:15) %>% FindClusters(resolution = 0.5)
Computing nearest neighbor graph
Computing SNN
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 14760
Number of edges: 523184

Running Louvain algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
  [----|----|----|----|----|----|----|----|----|----|
     **************************************************|
     Maximum modularity in 10 random starts: 0.9068
   Number of communities: 12
   Elapsed time: 3 seconds
scRNA_seurat <- ScaleData(scRNA_seurat, features = VariableFeatures(scRNA_seurat))
scRNA_seurat <- RunPCA(scRNA_seurat, features = VariableFeatures(scRNA_seurat))
plot1 <- DimPlot(scRNA_seurat, reduction = "pca", group.by="orig.ident")
plot2 <- ElbowPlot(scRNA_seurat, ndims=30, reduction="pca") 
plotc <- plot1+plot2
pc.num=1:20
scRNA_seurat <- FindNeighbors(scRNA_seurat, dims = pc.num) 
scRNA_seurat <- FindClusters(scRNA_seurat, resolution = 0.5)
table(scRNA_seurat@meta.data$seurat_clusters)
metadata <- scRNA_seurat@meta.data
cell_cluster <- data.frame(cell_ID=rownames(metadata), cluster_ID=metadata$seurat_clusters)
#tSNE
scRNA_seurat = RunTSNE(scRNA_seurat, dims = pc.num)
#group_by_cluster
plot1 = DimPlot(scRNA_seurat, reduction = "tsne", label=T) 

#group_by_sample
plot2 = DimPlot(scRNA_seurat, reduction = "tsne", group.by='orig.ident') 

#combinate
plotc <- plot1+plot2

#UMAP
scRNA_seurat <- RunUMAP(scRNA_seurat, dims = pc.num)
#group_by_cluster
plot3 = DimPlot(scRNA_seurat, reduction = "umap", label=T) 
#group_by_sample
plot4 = DimPlot(scRNA_seurat, reduction = "umap", group.by='orig.ident')
#combinate
plotc <- plot3+plot4


plotc <- plot3+plot4+ plot_layout(guides = 'collect')

哈哈暫時(shí)還看不出harmony與Seurat誰(shuí)的整合效果好贴唇。。飞袋。

最后編輯于：2021.06.23 14:27:57

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者

人面猴
序言：七十年代末，一起剝皮案震驚了整個(gè)濱河市链患，隨后出現(xiàn)的幾起案子巧鸭，更是在濱河造成了極大的恐慌，老刑警劉巖麻捻，帶你破解...
沈念sama閱讀 216,372評(píng)論 6贊 498
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件纲仍，死亡現(xiàn)場(chǎng)離奇詭異呀袱，居然都是意外死亡，警方通過(guò)查閱死者的電腦和手機(jī)郑叠，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 92,368評(píng)論 3贊 392
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門夜赵，熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái)，“玉大人乡革，你說(shuō)我怎么就攤上這事寇僧。” “怎么了沸版？”我有些...
開(kāi)封第一講書人閱讀 162,415評(píng)論 0贊 353
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵嘁傀，是天一觀的道長(zhǎng)。經(jīng)常有香客問(wèn)我视粮，道長(zhǎng)细办，這世上最難降的妖魔是什么？我笑而不...
開(kāi)封第一講書人閱讀 58,157評(píng)論 1贊 292
?港島之戀（遺憾婚禮）
正文為了忘掉前任蕾殴，我火速辦了婚禮笑撞，結(jié)果婚禮上，老公的妹妹穿的比我還像新娘钓觉。我一直安慰自己茴肥，他們只是感情好，可當(dāng)我...
茶點(diǎn)故事閱讀 67,171評(píng)論 6贊 388
惡毒庶女頂嫁案：這布局不是一般人想出來(lái)的
文/花漫我一把揭開(kāi)白布议谷。她就那樣靜靜地躺著炉爆，像睡著了一般。火紅的嫁衣襯著肌膚如雪卧晓。梳的紋絲不亂的頭發(fā)上芬首，一...
開(kāi)封第一講書人閱讀 51,125評(píng)論 1贊 297
城市分裂傳說(shuō)
那天，我揣著相機(jī)與錄音逼裆，去河邊找鬼郁稍。笑死，一個(gè)胖子當(dāng)著我的面吹牛胜宇，可吹牛的內(nèi)容都是我干的耀怜。我是一名探鬼主播，決...
沈念sama閱讀 40,028評(píng)論 3贊 417
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開(kāi)眼桐愉，長(zhǎng)吁一口氣：“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼财破！你這毒婦竟也來(lái)了？” 一聲冷哼從身側(cè)響起从诲，我...
開(kāi)封第一講書人閱讀 38,887評(píng)論 0贊 274
萬(wàn)榮殺人案實(shí)錄
序言：老撾萬(wàn)榮一對(duì)情侶失蹤左痢，失蹤者是張志新（化名）和其女友劉穎，沒(méi)想到半個(gè)月后，有當(dāng)?shù)厝嗽跇?shù)林里發(fā)現(xiàn)了一具尸體俊性，經(jīng)...
沈念sama閱讀 45,310評(píng)論 1贊 310
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡略步，尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點(diǎn)故事閱讀 37,533評(píng)論 2贊 332
?白月光啟示錄
正文我和宋清朗相戀三年，在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了定页。大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片趟薄。...
茶點(diǎn)故事閱讀 39,690評(píng)論 1贊 348
活死人
序言：一個(gè)原本活蹦亂跳的男人離奇死亡，死狀恐怖典徊，靈堂內(nèi)的尸體忽然破棺而出杭煎，到底是詐尸還是另有隱情，我是刑警寧澤宫峦，帶...
沈念sama閱讀 35,411評(píng)論 5贊 343
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布岔帽，位于F島的核電站，受9級(jí)特大地震影響导绷，放射性物質(zhì)發(fā)生泄漏犀勒。R本人自食惡果不足惜，卻給世界環(huán)境...
茶點(diǎn)故事閱讀 41,004評(píng)論 3贊 325
男人毒藥：我在死后第九天來(lái)索命
文/蒙蒙一妥曲、第九天我趴在偏房一處隱蔽的房頂上張望贾费。院中可真熱鬧，春花似錦檐盟、人聲如沸褂萧。這莊子的主人今日做“春日...
開(kāi)封第一講書人閱讀 31,659評(píng)論 0贊 22
一樁弒父案葵萎，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽(yáng)导犹。三九已至，卻和暖如春羡忘，著一層夾襖步出監(jiān)牢的瞬間谎痢，已是汗流浹背。一陣腳步聲響...
開(kāi)封第一講書人閱讀 32,812評(píng)論 1贊 268
情欲美人皮
我被黑心中介騙來(lái)泰國(guó)打工卷雕，沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留节猿，地道東北人。一個(gè)月前我還...
沈念sama閱讀 47,693評(píng)論 2贊 368
代替公主和親
正文我出身青樓漫雕，卻偏偏與公主長(zhǎng)得像滨嘱，于是被迫代替她去往敵國(guó)和親。傳聞我的和親對(duì)象是個(gè)殘疾皇子浸间，可洞房花燭夜當(dāng)晚...
茶點(diǎn)故事閱讀 44,577評(píng)論 2贊 353

2021-05-26 單細(xì)胞分析之harmony與Seurat

harmony原理

練習(xí)另外一個(gè)數(shù)據(jù)

推薦閱讀更多精彩內(nèi)容