Seurat3||整合方法merge()與IntegrateData()

隨著單細(xì)胞測序技術(shù)的成熟,越來越多的研究者選擇應(yīng)用該技術(shù)來闡釋手上的生物學(xué)問題。同時(shí)單細(xì)胞也不再是單樣本單物種單器官的技術(shù)胚嘲,往往會(huì)用到多樣本整合分析的技術(shù),這方面Seurat團(tuán)隊(duì)是最值得關(guān)注的洛二。他們提出了一套用于單細(xì)胞樣本整合分析的算法馋劈,Comprehensive integration of single cell data,并打包成Rpackages可以說是很貼心了晾嘶。

其整合單細(xì)胞數(shù)據(jù)的核心函數(shù)之一就是:FindIntegrationAnchors {Seurat}

如其名稱所指妓雾,是用來Finds the integration anchors的,他是如何做到的呢垒迂?

?FindIntegrationAnchors

但是在整合的時(shí)候有時(shí)候整合不成功械姻,特別是應(yīng)用之前的單細(xì)胞技術(shù),識(shí)別的細(xì)胞較少的時(shí)候娇斑,如可能會(huì)報(bào)錯(cuò):

Filtering Anchors
Error in nn2(data = cn.data2[nn.cells2, ], query = cn.data1[nn.cells1,  : 
  Cannot find more nearest neighbours than there are points

github上有這個(gè)問題的討論:https://github.com/satijalab/seurat/issues/997

有用的回答是這樣的:

I experienced the same problem when integrating very heterogeneous datasets, and it seems to be related to the size of k.filter parameter.In my case, lowering the default value of 200 to 150 did the job. However, I am still trying to figure out whether the results are meaningful, and the underlying biological rationale of it, so any explanation would be welcome!
那么k.filter 的影響到底是多大呢策添?我們來試一下。

我用Unravelling subclonal heterogeneity and aggressive disease states in TNBC through single-cell RNA-seq這篇文章的數(shù)據(jù)集:GSE118390.

library(R.utils)
 library(tidyverse)
 library(reticulate)
 use_python("E:\\conda\\python.exe")
gunzip("D:\\Users\\Administrator\\Desktop\\RStudio\\con-cluster\\GSE118389_counts_rsem.txt.gz", remove=FALSE)

medata<-read.table("D:\\Users\\Administrator\\Desktop\\RStudio\\con-cluster\\GSE118389_counts_rsem.txt")
 dim(medata) 
table(str_split(colnames(medata),pattern="_",simplify=T)[,1])

PT039 PT058 PT081 PT084 PT089 PT126 
  341    96   288   286   333   190 

Seurat基本流程:

library(Seurat)
pbmc <- CreateSeuratObject(counts = medata, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
pbmc <- ScaleData(pbmc,features=VariableFeatures(pbmc)) 
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5)
pbmc <- RunUMAP(pbmc, dims = 1:10)
pbmc <- RunTSNE(pbmc, dims = 1:10)

?DimPlot
table(pbmc@meta.data$orig.ident)

PT039 PT058 PT081 PT089 
  127    58    57   237 

DimPlot(pbmc, reduction = "umap",label = TRUE,group.by="orig.ident")

這就是沒有整合的時(shí)候毫缆,四個(gè)樣本的降維結(jié)果唯竹。

下面我們用不同的k.filter參數(shù)來試一下,

pancreas.list <- SplitObject(pbmc, split.by = "orig.ident")
pancreas.list <- pancreas.list[c("PT039", "PT058", "PT081", "PT089")]

for (i in 1:length(pancreas.list)) {
  pancreas.list[[i]] <- NormalizeData(pancreas.list[[i]], verbose = FALSE)
  pancreas.list[[i]] <- FindVariableFeatures(pancreas.list[[i]], selection.method = "vst", 
                                             nfeatures = 2000, verbose = FALSE)
}

#reference.list <- pancreas.list[c("PT081", "PT089")]
SelectIntegrationFeatures(object.list = pancreas.list)
#?FindIntegrationAnchors
pancreas.anchors <- FindIntegrationAnchors(object.list = pancreas.list, dims = 1:20,k.anchor = 5,k.filter = 30)
#?IntegrateData
pancreas.integrated <- IntegrateData(anchorset = pancreas.anchors, dims = 1:20)

Merging dataset 3 into 1
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
Merging dataset 2 into 1 3
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
Merging dataset 4 into 1 3 2
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data

pancreas.integrated <- ScaleData(pancreas.integrated,features=VariableFeatures(pbmc)) 
pancreas.integrated <- RunPCA(pancreas.integrated, features = VariableFeatures(object = pbmc))
pancreas.integrated <- FindNeighbors(pancreas.integrated, dims = 1:10)
pancreas.integrated <- FindClusters(pancreas.integrated, resolution = 0.5)
pancreas.integrated <- RunUMAP(pancreas.integrated, dims = 1:10)
pancreas.integrated <- RunTSNE(pancreas.integrated, dims = 1:10)
DimPlot(pancreas.integrated, reduction = "umap",label = TRUE,group.by="orig.ident")+ggtitle("k.filter = 30")

可見苦丁,seurat在整合多樣本的時(shí)候并不會(huì)自動(dòng)為研究者提供合適的參數(shù)浸颓,我們也不應(yīng)這樣要求他們。需要注意的是default雖然是用的最多的旺拉,并不一定是最優(yōu)的产上。

還有一種方式merge()即簡單地講多個(gè)數(shù)據(jù)集放到一起,并不運(yùn)行整合分析蛾狗。

pancreas.list <- SplitObject(pbmc, split.by = "orig.ident")
pancreas.list <- pancreas.list[c("PT039", "PT058", "PT081", "PT089")]

mpancreas.list<-pancreas.list[c("PT039", "PT058", "PT081")]
mymerg<-merge(pancreas.list$PT089,mpancreas.list)

mymerg <- NormalizeData(mymerg, normalization.method = "LogNormalize", scale.factor = 10000)
mymerg <- FindVariableFeatures(mymerg, selection.method = "vst", nfeatures = 2000)
mymerg <- ScaleData(mymerg,features=VariableFeatures(mymerg)) 
mymerg <- RunPCA(mymerg, features = VariableFeatures(object = mymerg))
mymerg <- FindNeighbors(mymerg, dims = 1:10)
mymerg <- FindClusters(mymerg, resolution = 0.5)
mymerg <- RunUMAP(mymerg, dims = 1:10)
mymerg <- RunTSNE(mymerg, dims = 1:10)

pm<-DimPlot(mymerg, reduction = "umap",label = TRUE,group.by="orig.ident")+ggtitle("merge")

CombinePlots(plots = list(p0,pm),legend="bottom")

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末晋涣,一起剝皮案震驚了整個(gè)濱河市,隨后出現(xiàn)的幾起案子沉桌,更是在濱河造成了極大的恐慌谢鹊,老刑警劉巖算吩,帶你破解...
    沈念sama閱讀 206,013評論 6 481
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場離奇詭異佃扼,居然都是意外死亡偎巢,警方通過查閱死者的電腦和手機(jī),發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 88,205評論 2 382
  • 文/潘曉璐 我一進(jìn)店門兼耀,熙熙樓的掌柜王于貴愁眉苦臉地迎上來压昼,“玉大人,你說我怎么就攤上這事瘤运∏舷迹” “怎么了?”我有些...
    開封第一講書人閱讀 152,370評論 0 342
  • 文/不壞的土叔 我叫張陵尽超,是天一觀的道長官撼。 經(jīng)常有香客問我,道長似谁,這世上最難降的妖魔是什么? 我笑而不...
    開封第一講書人閱讀 55,168評論 1 278
  • 正文 為了忘掉前任掠哥,我火速辦了婚禮巩踏,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘续搀。我一直安慰自己塞琼,他們只是感情好,可當(dāng)我...
    茶點(diǎn)故事閱讀 64,153評論 5 371
  • 文/花漫 我一把揭開白布禁舷。 她就那樣靜靜地躺著彪杉,像睡著了一般。 火紅的嫁衣襯著肌膚如雪牵咙。 梳的紋絲不亂的頭發(fā)上派近,一...
    開封第一講書人閱讀 48,954評論 1 283
  • 那天,我揣著相機(jī)與錄音洁桌,去河邊找鬼渴丸。 笑死,一個(gè)胖子當(dāng)著我的面吹牛另凌,可吹牛的內(nèi)容都是我干的谱轨。 我是一名探鬼主播,決...
    沈念sama閱讀 38,271評論 3 399
  • 文/蒼蘭香墨 我猛地睜開眼吠谢,長吁一口氣:“原來是場噩夢啊……” “哼土童!你這毒婦竟也來了?” 一聲冷哼從身側(cè)響起工坊,我...
    開封第一講書人閱讀 36,916評論 0 259
  • 序言:老撾萬榮一對情侶失蹤献汗,失蹤者是張志新(化名)和其女友劉穎错沃,沒想到半個(gè)月后,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體雀瓢,經(jīng)...
    沈念sama閱讀 43,382評論 1 300
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡枢析,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 35,877評論 2 323
  • 正文 我和宋清朗相戀三年,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了刃麸。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片醒叁。...
    茶點(diǎn)故事閱讀 37,989評論 1 333
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡,死狀恐怖泊业,靈堂內(nèi)的尸體忽然破棺而出把沼,到底是詐尸還是另有隱情,我是刑警寧澤吁伺,帶...
    沈念sama閱讀 33,624評論 4 322
  • 正文 年R本政府宣布饮睬,位于F島的核電站,受9級特大地震影響篮奄,放射性物質(zhì)發(fā)生泄漏捆愁。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 39,209評論 3 307
  • 文/蒙蒙 一窟却、第九天 我趴在偏房一處隱蔽的房頂上張望昼丑。 院中可真熱鬧,春花似錦夸赫、人聲如沸菩帝。這莊子的主人今日做“春日...
    開封第一講書人閱讀 30,199評論 0 19
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽呼奢。三九已至,卻和暖如春切平,著一層夾襖步出監(jiān)牢的瞬間握础,已是汗流浹背。 一陣腳步聲響...
    開封第一講書人閱讀 31,418評論 1 260
  • 我被黑心中介騙來泰國打工揭绑, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留弓候,地道東北人。 一個(gè)月前我還...
    沈念sama閱讀 45,401評論 2 352
  • 正文 我出身青樓他匪,卻偏偏與公主長得像菇存,于是被迫代替她去往敵國和親。 傳聞我的和親對象是個(gè)殘疾皇子邦蜜,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 42,700評論 2 345

推薦閱讀更多精彩內(nèi)容