系列回顧:
ArchR官網(wǎng)教程學(xué)習(xí)筆記1:Getting Started with ArchR
ArchR官網(wǎng)教程學(xué)習(xí)筆記2:基于ArchR推測Doublet
ArchR官網(wǎng)教程學(xué)習(xí)筆記3:創(chuàng)建ArchRProject
ArchR官網(wǎng)教程學(xué)習(xí)筆記4:ArchR的降維
ArchR官網(wǎng)教程學(xué)習(xí)筆記5:ArchR的聚類
ArchR官網(wǎng)教程學(xué)習(xí)筆記6:單細(xì)胞嵌入(Single-cell Embeddings)
ArchR官網(wǎng)教程學(xué)習(xí)筆記7:ArchR的基因評分和Marker基因
除了使用基因評分進(jìn)行clusters鑒定外全蝶,ArchR還支持與scRNA-seq的整合逆害。這有助于clusters鑒定分配筷转,因?yàn)槟憧梢灾苯邮褂迷趕cRNA-seq空間中的clusters次屠,或者在整合后使用基因表達(dá)測定炸茧。這種整合的工作方式是通過比對scATAC-seq基因評分矩陣和scRNA-seq基因表達(dá)矩陣摩骨,直接將scRNA-seq的細(xì)胞與scATAC-seq的細(xì)胞對應(yīng)上蕾盯。這種對齊是使用來自Seurat
包的FindTransferAnchors()
函數(shù)執(zhí)行的竹揍,該函數(shù)允許你跨兩個(gè)數(shù)據(jù)集比對數(shù)據(jù)。但是痕鳍,為了適當(dāng)?shù)貙⒋诉^程擴(kuò)展到數(shù)十萬個(gè)細(xì)胞硫豆,ArchR通過將總細(xì)胞劃分為更小的細(xì)胞組并平行化分別執(zhí)行比對。
實(shí)際上笼呆,對于scATAC-seq數(shù)據(jù)中的每個(gè)細(xì)胞熊响,這個(gè)整合過程是在scRNA-seq數(shù)據(jù)中查找看起來最相似的細(xì)胞,并將該scRNA-seq細(xì)胞的基因表達(dá)數(shù)據(jù)分配給scATAC-seq細(xì)胞诗赌。最后汗茄,scATAC-seq空間中的每個(gè)細(xì)胞都被分配了一個(gè)可用于下游分析的基因表達(dá)特征。本章介紹了如何使用這些信息來分配clusters境肾,而后面的章節(jié)則展示了如何將鏈接的scRNA-seq數(shù)據(jù)應(yīng)用于更復(fù)雜的分析剔难,比如識別預(yù)測的順式調(diào)控元件胆屿。我們相信奥喻,隨著多組學(xué)單細(xì)胞的商業(yè)化,這些綜合分析將變得越來越重要非迹。同時(shí)环鲤,使用匹配細(xì)胞類型的公開可用的scRNA-seq數(shù)據(jù),或在你感興趣的樣本上生成的scRNA-seq數(shù)據(jù)憎兽,可以增強(qiáng)在ArchR中執(zhí)行的scATAC-seq分析冷离。
(一)跨平臺將scATAC-seq細(xì)胞與scRNA-seq細(xì)胞聯(lián)系
為了將我們的教程scATAC-seq數(shù)據(jù)與匹配的scRNA-seq數(shù)據(jù)進(jìn)行整合,我們將使用Granja*等人(2019)從相同的造血細(xì)胞類型中得到的scRNA-seq數(shù)據(jù)纯命。
我們已經(jīng)把這個(gè)scRNA-seq數(shù)據(jù)存儲為一個(gè)大小為111 MB的RangedSummarizedExperiment
對象西剥。但是,ArchR也接受未修改的Seurat
對象作為整合工作流的輸入亿汞。下載并檢查這個(gè)對象瞭空,我們看到它有一個(gè)基因表達(dá)計(jì)數(shù)矩陣和相關(guān)的metadata:
> if(!file.exists("scRNA-Hematopoiesis-Granja-2019.rds")){
download.file(
url = "https://jeffgranja.s3.amazonaws.com/ArchR/TestData/scRNA-Hematopoiesis-Granja-2019.rds",
destfile = "scRNA-Hematopoiesis-Granja-2019.rds"
)
}
> seRNA <- readRDS("scRNA-Hematopoiesis-Granja-2019.rds")
> seRNA
class: RangedSummarizedExperiment
dim: 20287 35582
metadata(0):
assays(1): counts
rownames(20287): FAM138A OR4F5 ... S100B PRMT2
rowData names(3): gene_name gene_id exonLength
colnames(35582): CD34_32_R5:AAACCTGAGTATCGAA-1
CD34_32_R5:AAACCTGAGTCGTTTG-1 ...
BMMC_10x_GREENLEAF_REP2:TTTGTTGCATGTGTCA-1
BMMC_10x_GREENLEAF_REP2:TTTGTTGCATTGAAAG-1
colData names(10): Group nUMI_pre ... BioClassification Barcode
這個(gè)metadata包含了一列,叫BioClassification
疗我,包含了scRNA-seq數(shù)據(jù)集里每個(gè)細(xì)胞的細(xì)胞類型鑒定咆畏。
> colnames(colData(seRNA))
[1] "Group" "nUMI_pre" "nUMI"
[4] "nGene" "initialClusters" "UMAP1"
[7] "UMAP2" "Clusters" "BioClassification"
[10] "Barcode"
使用table()
可以看一下每一個(gè)細(xì)胞類型里有多少個(gè)細(xì)胞:
> table(colData(seRNA)$BioClassification)
01_HSC 02_Early.Eryth 03_Late.Eryth 04_Early.Baso
1425 1653 446 111
05_CMP.LMPP 06_CLP.1 07_GMP 08_GMP.Neut
2260 903 2097 1050
09_pDC 10_cDC 11_CD14.Mono.1 12_CD14.Mono.2
544 325 1800 4222
13_CD16.Mono 14_Unk 15_CLP.2 16_Pre.B
292 520 377 710
17_B 18_Plasma 19_CD8.N 20_CD4.N1
1711 62 1521 2470
21_CD4.N2 22_CD4.M 23_CD8.EM 24_CD8.CM
2364 3539 796 2080
25_NK 26_Unk
2143 161
我們可以執(zhí)行兩種類型的整合方法。無限制整合是一種完全不可知的(agnostic)方法吴裤,它采用scATAC-seq實(shí)驗(yàn)中的所有細(xì)胞旧找,并嘗試把它們與scRNA-seq實(shí)驗(yàn)中的任何細(xì)胞進(jìn)行比對。雖然這是一個(gè)可行的初步解決方案麦牺,但我們可以通過限制整合過程來提高跨平臺比對的質(zhì)量钮蛛。為了執(zhí)行限制整合鞭缭,我們使用細(xì)胞類型的先驗(yàn)知識來限制比對的搜索空間。舉個(gè)例子:如果我們知道cluster A, B和C 在scATAC-seq數(shù)據(jù)對應(yīng)于三種不同T細(xì)胞群魏颓,并且我們知細(xì)胞群X和Y在 scRNA-seq數(shù)據(jù)對應(yīng)于兩個(gè)不同T細(xì)胞群缚去,我們可以告訴ArchR特別嘗試使細(xì)胞從scATAC-seq里的A,B和C細(xì)胞比對scRNA-seq里的X和Y 。下面我們舉例說明這兩種方法琼开,首先執(zhí)行無限制整合來獲得初步的cluster鑒定易结,然后使用這些先驗(yàn)知識來執(zhí)行更精細(xì)的限制整合。
(一)無限制整合
為了整合scATAC-seq和scRNA-seq柜候,我們使用了addGeneIntegrationMatrix()
函數(shù)搞动。如上所述,該函數(shù)通過seRNA
參數(shù)接受Seurat
對象或RangedSummarizedExperiment
對象渣刷。我們執(zhí)行的第一輪合成是初步的無限制合成鹦肿,我們不會將其存儲在Arrow files中(addToArrow = FALSE
)。我們?yōu)楹铣傻木仃囂峁┮粋€(gè)名稱辅柴,該名稱將通過matrixName
參數(shù)存儲在ArchRProject中箩溃。此函數(shù)的其他關(guān)鍵參數(shù)提供cellColData列名,其中將存儲某些信息:nameCell
將存儲來自匹配的scRNA-seq細(xì)胞的細(xì)胞ID, nameGroup
將存儲來自scRNA-seq細(xì)胞的細(xì)胞群ID碌嘀,而nameScore
將存儲跨平臺整合得分涣旨。
> projHeme2 <- addGeneIntegrationMatrix(
ArchRProj = projHeme2,
useMatrix = "GeneScoreMatrix",
matrixName = "GeneIntegrationMatrix",
reducedDims = "IterativeLSI",
seRNA = seRNA,
addToArrow = FALSE,
groupRNA = "BioClassification",
nameCell = "predictedCell_Un",
nameGroup = "predictedGroup_Un",
nameScore = "predictedScore_Un"
)
(二)限制整合
現(xiàn)在我們已經(jīng)完成了初步的無限制整合,我們將識別通用的細(xì)胞類型來提供一個(gè)框架股冗,從而進(jìn)一步優(yōu)化整合結(jié)果霹陡。
鑒于本教程的數(shù)據(jù)來自于造血細(xì)胞,我們限制整合會很理想止状,使相似的細(xì)胞類型相關(guān)聯(lián)烹棉。首先,我們將鑒定從scRNA-seq數(shù)據(jù)中哪些細(xì)胞類型在scATAC-seq cluster中最富集怯疤。這樣做的目的是使用無限制整合識別scATAC-seq和scRNA-seq數(shù)據(jù)中與T細(xì)胞和NK細(xì)胞對應(yīng)的細(xì)胞浆洗,以便我們可以使用這些信息執(zhí)行有限制整合。為此集峦,我們將創(chuàng)建一個(gè)confusionMatrix
伏社,它查看Clusters
和predictedGroup_Un
的交集,后者包含由scRNA-seq鑒定的細(xì)胞類型少梁。
> cM <- as.matrix(confusionMatrix(projHeme2$Clusters, projHeme2$predictedGroup_Un))
> preClust <- colnames(cM)[apply(cM, 1 , which.max)]
> cbind(preClust, rownames(cM)) #Assignments
preClust
[1,] "08_GMP.Neut" "C4"
[2,] "21_CD4.N2" "C7"
[3,] "16_Pre.B" "C9"
[4,] "17_B" "C10"
[5,] "11_CD14.Mono.1" "C1"
[6,] "01_HSC" "C5"
[7,] "02_Early.Eryth" "C3"
[8,] "22_CD4.M" "C8"
[9,] "09_pDC" "C11"
[10,] "25_NK" "C6"
[11,] "15_CLP.2" "C12"
[12,] "11_CD14.Mono.1" "C2"
上面的list顯示了哪些scRNA-seq細(xì)胞類型在12個(gè)scATAC-seq 聚類里最富集洛口。
首先,來看一下上面在無限制整合的scRNA-seq數(shù)據(jù)中的細(xì)胞類型標(biāo)簽:
> unique(unique(projHeme2$predictedGroup_Un))
[1] "08_GMP.Neut" "23_CD8.EM" "16_Pre.B" "25_NK"
[5] "17_B" "12_CD14.Mono.2" "11_CD14.Mono.1" "02_Early.Eryth"
[9] "03_Late.Eryth" "07_GMP" "22_CD4.M" "21_CD4.N2"
[13] "05_CMP.LMPP" "09_pDC" "20_CD4.N1" "15_CLP.2"
[17] "06_CLP.1" "01_HSC" "04_Early.Baso" "19_CD8.N"
在上面的列表中凯沪,我們可以看到scRNA-seq數(shù)據(jù)中的對應(yīng)的T細(xì)胞和NK細(xì)胞是clusters19-25第焰。我們將創(chuàng)建一個(gè)基于字符串表示法用于下游限制整合。
> cTNK <- paste0(paste0(19:25), collapse="|")
> cTNK
[1] "19|20|21|22|23|24|25"
然后我們可以用所有其他clusters妨马,創(chuàng)建一個(gè)基于字符串的表示法表示所有“非T細(xì)胞挺举,非NK細(xì)胞”的clusters杀赢。(Cluster 1 - 18)
> cNonTNK <- paste0(c(paste0("0", 1:9), 10:13, 15:18), collapse="|")
> cNonTNK
[1] "01|02|03|04|05|06|07|08|09|10|11|12|13|15|16|17|18"
這些基于字符串的表示法是模式匹配字符串,我們將使用grep提取與這些scRNA-seq細(xì)胞類型對應(yīng)的scATAC-seq clusters湘纵。字符串中的|
充當(dāng)or
語句脂崔,因此我們在confusion matrix的preClust列中搜索與模式匹配字符串中提供的scRNA-seq cluster數(shù)字匹配的任何行。
對于T細(xì)胞和NK細(xì)胞梧喷,識別scATAC-seq cluster C6砌左、C7和C8:
> clustTNK <- rownames(cM)[grep(cTNK, preClust)]
> clustTNK
[1] "C7" "C8" "C6"
對于非T細(xì)胞和非NK細(xì)胞:
> clustNonTNK <- rownames(cM)[grep(cNonTNK, preClust)]
> clustNonTNK
[1] "C4" "C9" "C10" "C1" "C5" "C3" "C11" "C12" "C2"
然后我們執(zhí)行類似的操作來鑒定對應(yīng)于這些相同細(xì)胞類型的scRNA-seq細(xì)胞。首先铺敌,我們在scRNA-seq數(shù)據(jù)中鑒定T細(xì)胞和NK細(xì)胞:
> rnaTNK <- colnames(seRNA)[grep(cTNK, colData(seRNA)$BioClassification)]
> head(rnaTNK)
[1] "PBMC_10x_GREENLEAF_REP1:AAACCCAGTCGTCATA-1"
[2] "PBMC_10x_GREENLEAF_REP1:AAACCCATCCGATGTA-1"
[3] "PBMC_10x_GREENLEAF_REP1:AAACCCATCTCAACGA-1"
[4] "PBMC_10x_GREENLEAF_REP1:AAACCCATCTCTCGAC-1"
[5] "PBMC_10x_GREENLEAF_REP1:AAACGAACAATCGTCA-1"
[6] "PBMC_10x_GREENLEAF_REP1:AAACGAACACGATTCA-1"
再在scRNA-seq數(shù)據(jù)中鑒定非T細(xì)胞和非NK細(xì)胞:
> rnaNonTNK <- colnames(seRNA)[grep(cNonTNK, colData(seRNA)$BioClassification)]
> head(rnaNonTNK)
[1] "CD34_32_R5:AAACCTGAGTATCGAA-1" "CD34_32_R5:AAACCTGAGTCGTTTG-1"
[3] "CD34_32_R5:AAACCTGGTTCCACAA-1" "CD34_32_R5:AAACGGGAGCTTCGCG-1"
[5] "CD34_32_R5:AAACGGGAGGGAGTAA-1" "CD34_32_R5:AAACGGGAGTTACGGG-1"
為了準(zhǔn)備限制整合汇歹,我們創(chuàng)建一個(gè)嵌套列表。這是一個(gè)包含多個(gè)SimpleList
對象的SimpleList
偿凭,每個(gè)對象對應(yīng)我們想要限制的組产弹。在這個(gè)例子中,我們有兩組:一組稱為TNK弯囊,在兩個(gè)平臺上識別T細(xì)胞和NK細(xì)胞痰哨,另一組稱為NonTNK,在兩個(gè)平臺上識別非T細(xì)胞和NK細(xì)胞匾嘱。每個(gè)SimpleList
對象都有兩個(gè)細(xì)胞id載體斤斧,一個(gè)稱為ATAC,另一個(gè)稱為RNA奄毡,如下所示:
> groupList <- SimpleList(
TNK = SimpleList(
ATAC = projHeme2$cellNames[projHeme2$Clusters %in% clustTNK],
RNA = rnaTNK
),
NonTNK = SimpleList(
ATAC = projHeme2$cellNames[projHeme2$Clusters %in% clustNonTNK],
RNA = rnaNonTNK
)
)
#我們將這個(gè)列表傳遞給' addGeneIntegrationMatrix() '函數(shù)的' groupList '參數(shù)來限制我們的整合折欠。
#注意,在本例中吼过,我們?nèi)匀粵]有將這些結(jié)果添加到Arrow files中(' addToArrow = FALSE ')。
#我們建議在將結(jié)果保存到Arrow files之前咪奖,根據(jù)你的期望檢查整合的結(jié)果盗忱。我們將在下一部分說明這個(gè)過程。
> projHeme2 <- addGeneIntegrationMatrix(
ArchRProj = projHeme2,
useMatrix = "GeneScoreMatrix",
matrixName = "GeneIntegrationMatrix",
reducedDims = "IterativeLSI",
seRNA = seRNA,
addToArrow = FALSE,
groupList = groupList,
groupRNA = "BioClassification",
nameCell = "predictedCell_Co",
nameGroup = "predictedGroup_Co",
nameScore = "predictedScore_Co"
)
(三)比較無限制整合和限制整合
為了比較無限制和有限制的整合羊赵,我們將根據(jù)通過整合識別的scRNA-seq細(xì)胞類型為scATAC-seq數(shù)據(jù)中的細(xì)胞著色趟佃。為此,我們將使用內(nèi)置的ArchR函數(shù)paletteDiscrete()
創(chuàng)建一個(gè)調(diào)色板(palette)昧捷。
> pal <- paletteDiscrete(values = colData(seRNA)$BioClassification)
Length of unique values greater than palette, interpolating..
在ArchR中闲昭,調(diào)色板本質(zhì)上是一個(gè)命名的向量,其中的值是十六進(jìn)制代碼靡挥,對應(yīng)于與名稱相關(guān)聯(lián)的顏色序矩。
> pal
01_HSC 02_Early.Eryth 03_Late.Eryth 04_Early.Baso
"#D51F26" "#502A59" "#235D55" "#3D6E57"
05_CMP.LMPP 06_CLP.1 07_GMP 08_GMP.Neut
"#8D2B8B" "#DE6C3E" "#F9B712" "#D8CE42"
09_pDC 10_cDC 11_CD14.Mono.1 12_CD14.Mono.2
"#8E9ACD" "#B774B1" "#D69FC8" "#C7C8DE"
13_CD16.Mono 14_Unk 15_CLP.2 16_Pre.B
"#8FD3D4" "#89C86E" "#CC9672" "#CF7E96"
17_B 18_Plasma 19_CD8.N 20_CD4.N1
"#A27AA4" "#CD4F32" "#6B977E" "#518AA3"
21_CD4.N2 22_CD4.M 23_CD8.EM 24_CD8.CM
"#5A5297" "#0F707D" "#5E2E32" "#A95A3C"
25_NK 26_Unk
"#B28D5C" "#3D3D3D"
現(xiàn)在,我們可以通過在無限制整合的基礎(chǔ)上覆蓋scATAC-seq數(shù)據(jù)上的scRNA-seq細(xì)胞類型來可視化整合:
> p1 <- plotEmbedding(
projHeme2,
colorBy = "cellColData",
name = "predictedGroup_Un",
pal = pal
)
> p1
同樣的跋破,我們也可以可視化通過限制整合的基礎(chǔ)上覆蓋scATAC-seq數(shù)據(jù)上的scRNA-seq細(xì)胞類型:
> p2 <- plotEmbedding(
projHeme2,
colorBy = "cellColData",
name = "predictedGroup_Co",
pal = pal
)
> p2
在本例中簸淀,無限制整合和限制整合之間的區(qū)別非常微妙瓶蝴,這主要是因?yàn)楦信d趣的細(xì)胞類型已經(jīng)非常不同了。然而租幕,你應(yīng)該注意到差異舷手,特別是在T細(xì)胞簇(簇17-22)。
保存圖片:
> plotPDF(p1,p2, name = "Plot-UMAP-RNA-Integration.pdf", ArchRProj = projHeme2, addDOC = FALSE, width = 5, height = 5)
保存project:
> saveArchRProject(ArchRProj = projHeme2, outputDirectory = "D:/yanfang/ArchR/Save-ProjHeme2", load = FALSE)
(二)給scATAC-seq數(shù)據(jù)添加擬scRNA-seq表達(dá)譜
現(xiàn)在我們對scATAC-seq和scRNA-seq整合的結(jié)果感覺還可以劲绪,我們可以使用addToArrow = TRUE
重新運(yùn)行整合的過程男窟,這次我們將鏈接的基因表達(dá)數(shù)據(jù)添加到Arrow files中。如前所述贾富,我們通過groupList
限制整合蝎宇,并給cellColData的每個(gè)元數(shù)據(jù)nameCell
、nameGroup
和nameScore
添加列名祷安。在這里姥芥,我們創(chuàng)建了projHeme3,將在本教程中繼續(xù)使用汇鞭。
> projHeme3 <- addGeneIntegrationMatrix(
ArchRProj = projHeme2,
useMatrix = "GeneScoreMatrix",
matrixName = "GeneIntegrationMatrix",
reducedDims = "IterativeLSI",
seRNA = seRNA,
addToArrow = TRUE,
force= TRUE,
groupList = groupList,
groupRNA = "BioClassification",
nameCell = "predictedCell",
nameGroup = "predictedGroup",
nameScore = "predictedScore"
)
現(xiàn)在凉唐,當(dāng)我們想查看哪些矩陣是可用的,我們使用getAvailableMatrices()
函數(shù)查看霍骄。發(fā)現(xiàn)GeneIntegrationMatrix已經(jīng)被加到Arrow files里了:
> getAvailableMatrices(projHeme3)
[1] "GeneIntegrationMatrix" "GeneScoreMatrix" "TileMatrix"
有了這個(gè)新的GeneIntegrationMatrix台囱,我們可以比較鏈接的基因表達(dá)和從基因評分中推測的基因表達(dá)水平。
首先读整,我們先要確保在project里添加了填充權(quán)重:
> projHeme3 <- addImputeWeights(projHeme3)
然后簿训,可以畫一些UMAP圖,是GeneIntegrationMatrix里的基因表達(dá)值:
> markerGenes <- c(
"CD34", #Early Progenitor
"GATA1", #Erythroid
"PAX5", "MS4A1", #B-Cell Trajectory
"CD14", #Monocytes
"CD3D", "CD8A", "TBX21", "IL7R" #TCells
)
> p1 <- plotEmbedding(
ArchRProj = projHeme3,
colorBy = "GeneIntegrationMatrix",
name = markerGenes,
continuousSet = "horizonExtra",
embedding = "UMAP",
imputeWeights = getImputeWeights(projHeme3)
)
再畫一些UMAP圖米间,是來自GeneScoreMatrix的基因評分值:
> p2 <- plotEmbedding(
ArchRProj = projHeme3,
colorBy = "GeneScoreMatrix",
continuousSet = "horizonExtra",
name = markerGenes,
embedding = "UMAP",
imputeWeights = getImputeWeights(projHeme3)
)
使用cowplot
把所有的marker基因畫出來:
> p1c <- lapply(p1, function(x){
x + guides(color = FALSE, fill = FALSE) +
theme_ArchR(baseSize = 6.5) +
theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) +
theme(
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank()
)
})
> p2c <- lapply(p2, function(x){
x + guides(color = FALSE, fill = FALSE) +
theme_ArchR(baseSize = 6.5) +
theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) +
theme(
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank()
)
})
> do.call(cowplot::plot_grid, c(list(ncol = 3), p1c))
> do.call(cowplot::plot_grid, c(list(ncol = 3), p2c))
和我們預(yù)期的一樣强品,用兩種方法推測的基因表達(dá)很相似,但不是完全一樣屈糊。
保存:
> plotPDF(plotList = p1,
name = "Plot-UMAP-Marker-Genes-RNA-W-Imputation.pdf",
ArchRProj = projHeme3,
addDOC = FALSE, width = 5, height = 5)
(三)使用scRNA-seq信息給scATAC-seq clusters做標(biāo)記
一旦我們確定了 scATAC-seq 和scRNA-seq的比對是正確的的榛,我們可以使用scRNA-seq數(shù)據(jù)里的細(xì)胞類型給scATAC-seq clusters標(biāo)記。
首先我們要先創(chuàng)建一個(gè)confusion matrix逻锐,包括scATAC-seq聚類夫晌,以及從整合分析中獲得的predictedGroup
對象:
> cM <- confusionMatrix(projHeme3$Clusters, projHeme3$predictedGroup)
> labelOld <- rownames(cM)
> labelOld
[1] "C4" "C7" "C9" "C10" "C1" "C5" "C3" "C8" "C11" "C6" "C12" "C2"
對于每一個(gè)scATAC-seq cluster,我們從predictedGroup
里鑒定細(xì)胞類型:
> labelNew <- colnames(cM)[apply(cM, 1, which.max)]
> labelNew
[1] "08_GMP.Neut" "21_CD4.N2" "16_Pre.B" "17_B"
[5] "12_CD14.Mono.2" "01_HSC" "03_Late.Eryth" "22_CD4.M"
[9] "09_pDC" "22_CD4.M" "15_CLP.2" "12_CD14.Mono.2"
接下來昧诱,我們需要重新分類這些新的labels晓淀,弄一個(gè)更簡單的分類。對于每個(gè)scRNA-seq cluster盏档,我們將重新定義它們的標(biāo)簽凶掰,使得更加容易的去解讀。
> remapClust <- c(
"01_HSC" = "Progenitor",
"02_Early.Eryth" = "Erythroid",
"03_Late.Eryth" = "Erythroid",
"04_Early.Baso" = "Basophil",
"05_CMP.LMPP" = "Progenitor",
"06_CLP.1" = "CLP",
"07_GMP" = "GMP",
"08_GMP.Neut" = "GMP",
"09_pDC" = "pDC",
"10_cDC" = "cDC",
"11_CD14.Mono.1" = "Mono",
"12_CD14.Mono.2" = "Mono",
"13_CD16.Mono" = "Mono",
"15_CLP.2" = "CLP",
"16_Pre.B" = "PreB",
"17_B" = "B",
"18_Plasma" = "Plasma",
"19_CD8.N" = "CD8.N",
"20_CD4.N1" = "CD4.N",
"21_CD4.N2" = "CD4.N",
"22_CD4.M" = "CD4.M",
"23_CD8.EM" = "CD8.EM",
"24_CD8.CM" = "CD8.CM",
"25_NK" = "NK"
)
#將新定義的標(biāo)簽與上面從`predictedGroup`里鑒定細(xì)胞類型做個(gè)匹配
#把匹配上的存到一個(gè)新的變量里
> remapClust <- remapClust[names(remapClust) %in% labelNew]
使用mapLabels()
函數(shù),轉(zhuǎn)換我們的標(biāo)簽:
> labelNew2 <- mapLabels(labelNew, oldLabels = names(remapClust), newLabels = remapClust)
> labelNew2
[1] "GMP" "CD4.N" "PreB" "B" "Mono"
[6] "Progenitor" "Erythroid" "CD4.M" "pDC" "CD4.M"
[11] "CLP" "Mono"
結(jié)合labelsOld
和labelsNew2
锄俄,我們現(xiàn)在可以使用mapLabels()
功能再創(chuàng)建新的cluster標(biāo)簽在 cellColData
里:
> projHeme3$Clusters2 <- mapLabels(projHeme3$Clusters, newLabels = labelNew2, oldLabels = labelOld)
用新鑒定的cluster畫UMAP:
> p1 <- plotEmbedding(projHeme3, colorBy = "cellColData", name = "Clusters2")
> p1
保存:
> plotPDF(p1, name = "Plot-UMAP-Remap-Clusters.pdf", ArchRProj = projHeme2, addDOC = FALSE, width = 5, height = 5)
保存project3:
> saveArchRProject(ArchRProj = projHeme3, outputDirectory = "Save-ProjHeme3", load = FALSE)