第十四計 借尸還魂
迷信人認(rèn)為人死后靈魂可附著于別人的尸體而復(fù)活厢岂。后用以比喻已經(jīng)消滅或沒落的事物绳瘟,又假托別的名義或以另一種形式重新出現(xiàn)易茬。此言兵法,是說兵家要善于抓住一切機(jī)會萧吠,甚至是看去無什用處的東西左冬,努力爭取主動,壯大自己纸型,即時利用而轉(zhuǎn)不利為有利拇砰,乃至轉(zhuǎn)敗為勝。
我們演示了如何通過基于規(guī)范性標(biāo)記計算細(xì)胞周期階段評分并在預(yù)處理過程中將其從數(shù)據(jù)中回歸來減輕scRNA-seq數(shù)據(jù)中細(xì)胞周期異質(zhì)性的影響狰腌。我們在鼠類造血祖細(xì)胞的數(shù)據(jù)集上進(jìn)行了證明(Nestorowa等人除破,Blood 2016)。您可以在此處下載運(yùn)行此小插圖所需的文件癌别。
library(Seurat)
# Read in the expression matrix The first row is a header row, the first column is rownames
exp.mat <- read.table(file = "../data/nestorawa_forcellcycle_expressionMatrix.txt", header = TRUE,
as.is = TRUE, row.names = 1)
# A list of cell cycle markers, from Tirosh et al, 2015, is loaded with Seurat. We can
# segregate this list into markers of G2/M phase and markers of S phase
s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes
# Create our Seurat object and complete the initalization steps
marrow <- CreateSeuratObject(counts = exp.mat)
marrow <- NormalizeData(marrow)
marrow <- FindVariableFeatures(marrow, selection.method = "vst")
marrow <- ScaleData(marrow, features = rownames(marrow))
如果我們使用FindVariableFeatures()`上面發(fā)現(xiàn)的可變基因在我們的對象上運(yùn)行PCA 皂岔,我們會看到,雖然大多數(shù)差異可以通過譜系來解釋展姐,但PC8和PC10會在包括TOP2A和MKI67的細(xì)胞周期基因上分裂。我們將嘗試從數(shù)據(jù)中回歸該信號剖毯,以使細(xì)胞周期異質(zhì)性不會有助于PCA或下游分析圾笨。
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), ndims.print = 6:10, nfeatures.print = 10)
## PC_ 6
## Positive: SELL, ARL6IP1, CCL9, CD34, ADGRL4, BPIFC, NUSAP1, FAM64A, CD244, C030034L19RIK
## Negative: LY6C2, AA467197, CYBB, MGST2, ITGB2, PF4, CD74, ATP1B1, GP1BB, TREM3
## PC_ 7
## Positive: HDC, CPA3, PGLYRP1, MS4A3, NKG7, UBE2C, CCNB1, NUSAP1, PLK1, FUT8
## Negative: F13A1, LY86, CFP, IRF8, CSF1R, TIFAB, IFI209, CCR2, TNS4, MS4A6C
## PC_ 8
## Positive: NUSAP1, UBE2C, KIF23, PLK1, CENPF, FAM64A, CCNB1, H2AFX, ID2, CDC20
## Negative: WFDC17, SLC35D3, ADGRL4, VLDLR, CD33, H2AFY, P2RY14, IFI206, CCL9, CD34
## PC_ 9
## Positive: IGKC, JCHAIN, LY6D, MZB1, CD74, IGLC2, FCRLA, IGKV4-50, IGHM, IGHV9-1
## Negative: SLC2A6, HBA-A1, HBA-A2, IGHV8-7, FCER1G, F13A1, HBB-BS, PLD4, HBB-BT, IGFBP4
## PC_ 10
## Positive: CTSW, XKRX, PRR5L, RORA, MBOAT4, A630014C17RIK, ZFP105, COL9A3, CLEC2I, TRAT1
## Negative: H2AFX, FAM64A, ZFP383, NUSAP1, CDC25B, CENPF, GBP10, TOP2A, GBP6, GFRA1
DimHeatmap(marrow, dims = c(8, 10))
分配細(xì)胞周期分?jǐn)?shù)
首先,我們根據(jù)每個細(xì)胞的G2 / M和S期標(biāo)志物的表達(dá)為其分配分?jǐn)?shù)逊谋。這些標(biāo)記物組在表達(dá)水平上應(yīng)該是反相關(guān)的擂达,并且不表達(dá)它們的細(xì)胞可能不會循環(huán)并處于G1期。
我們在CellCycleScoring()函數(shù)中分配分?jǐn)?shù)胶滋,該函數(shù)將S和G2 / M分?jǐn)?shù)存儲在對象元數(shù)據(jù)中板鬓,以及G2M,S或G1階段中每個單元格的預(yù)測分類究恤。CellCycleScoring()也可以通過傳遞將Seurat對象的標(biāo)識設(shè)置為細(xì)胞周期階段set.ident = TRUE
(原始標(biāo)識存儲為old.ident
)俭令。請注意,Seurat在下游細(xì)胞周期回歸中不使用離散分類(G2M / G1 / S)部宿。相反抄腔,它使用G2M和S期的定量評分。但是理张,如果有興趣赫蛇,我們會提供預(yù)測的分類。
marrow <- CellCycleScoring(marrow, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE)
# view cell cycle scores and phase assignments
head(marrow[[]])
## orig.ident nCount_RNA nFeature_RNA S.Score G2M.Score Phase
## Prog_013 Prog 2563089 10211 -0.14248691 -0.4680395 G1
## Prog_019 Prog 3030620 9991 -0.16915786 0.5851766 G2M
## Prog_031 Prog 1293487 10192 -0.34627038 -0.3971879 G1
## Prog_037 Prog 1357987 9599 -0.44270212 0.6820229 G2M
## Prog_008 Prog 4079891 10540 0.55854051 0.1284359 S
## Prog_014 Prog 2569783 10788 0.07116218 0.3166073 G2M
## old.ident
## Prog_013 Prog
## Prog_019 Prog
## Prog_031 Prog
## Prog_037 Prog
## Prog_008 Prog
## Prog_014 Prog
# Visualize the distribution of cell cycle markers across
RidgePlot(marrow, features = c("PCNA", "TOP2A", "MCM6", "MKI67"), ncol = 2)
# Running a PCA on cell cycle genes reveals, unsurprisingly, that cells separate entirely by
# phase
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes))
DimPlot(marrow)
我們根據(jù)Tirosh等人描述的評分策略對單個細(xì)胞評分雾叭。2016悟耘。有關(guān)`?AddModuleScore()更多信息,請參見Seurat织狐。此功能可用于計算任何基因列表的監(jiān)督模塊評分暂幼。
在數(shù)據(jù)縮放期間回歸出細(xì)胞周期得分
現(xiàn)在掘殴,我們嘗試從數(shù)據(jù)中減去(“回歸”)這種異質(zhì)性來源。對于Seurat v1.4的用戶粟誓,此功能已在中實現(xiàn)RegressOut
奏寨。但是,由于此過程的結(jié)果存儲在縮放后的數(shù)據(jù)槽中(因此將覆蓋的輸出ScaleData())鹰服,因此我們現(xiàn)在將此功能合并到ScaleData()`函數(shù)本身中病瞳。
對于每個基因,Seurat建谋幔基因表達(dá)與S和G2M細(xì)胞周期得分之間的關(guān)系套菜。該模型的比例殘差表示“校正”的表達(dá)矩陣,可將其用于下游以進(jìn)行尺寸縮減设易。
marrow <- ScaleData(marrow, vars.to.regress = c("S.Score", "G2M.Score"), features = rownames(marrow))
# Now, a PCA on the variable genes no longer returns components associated with cell cycle
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), nfeatures.print = 10)
## PC_ 1
## Positive: BLVRB, CAR2, KLF1, AQP1, CES2G, ERMAP, CAR1, FAM132A, RHD, SPHK1
## Negative: TMSB4X, H2AFY, CORO1A, PLAC8, EMB, MPO, PRTN3, CD34, LCP1, BC035044
## PC_ 2
## Positive: ANGPT1, ADGRG1, MEIS1, ITGA2B, MPL, DAPP1, APOE, RAB37, GATA2, F2R
## Negative: LY6C2, ELANE, HP, IGSF6, ANXA3, CTSG, CLEC12A, TIFAB, SLPI, ALAS1
## PC_ 3
## Positive: APOE, GATA2, NKG7, MUC13, MS4A3, RAB44, HDC, CPA3, FCGR3, TUBA8
## Negative: FLT3, DNTT, LSP1, WFDC17, MYL10, GIMAP6, LAX1, GPR171, TBXA2R, SATB1
## PC_ 4
## Positive: CSRP3, ST8SIA6, DNTT, MPEG1, SCIN, LGALS1, CMAH, RGL1, APOE, MFSD2B
## Negative: PROCR, MPL, HLF, MMRN1, SERPINA3G, ESAM, GSTM1, D630039A03RIK, MYL10, LY6A
## PC_ 5
## Positive: CPA3, LMO4, IKZF2, IFITM1, FUT8, MS4A2, SIGLECF, CSRP3, HDC, RAB44
## Negative: PF4, GP1BB, SDPR, F2RL2, RAB27B, SLC14A1, TREML1, PBX1, F2R, TUBA8
# When running a PCA on only cell cycle genes, cells no longer separate by cell-cycle phase
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes))
DimPlot(marrow)
由于最佳的細(xì)胞周期標(biāo)記在組織和物種之間極為保守逗柴,因此我們發(fā)現(xiàn)此程序可在各種數(shù)據(jù)集上穩(wěn)定可靠地工作。
備用工作流程
上面的過程將刪除與細(xì)胞周期相關(guān)的所有信號顿肺。在某些情況下戏溺,我們發(fā)現(xiàn)這會對下游分析產(chǎn)生負(fù)面影響,尤其是在分化過程中(如鼠類造血)屠尊,在此過程中干細(xì)胞處于靜止?fàn)顟B(tài)旷祸,分化的細(xì)胞正在增殖(反之亦然)。在這種情況下讼昆,將所有細(xì)胞周期效應(yīng)消退托享,也會使干細(xì)胞和祖細(xì)胞之間的區(qū)別變得模糊。
作為替代方案浸赫,我們建議逐步淘汰G2M和S期評分之間的差異闰围。這意味著將保持分離非循環(huán)細(xì)胞和循環(huán)細(xì)胞的信號,但是增殖細(xì)胞之間的細(xì)胞周期相位差異(通常是無趣的)將被從數(shù)據(jù)中剔除既峡。
marrow$CC.Difference <- marrow$S.Score - marrow$G2M.Score
marrow <- ScaleData(marrow, vars.to.regress = "CC.Difference", features = rownames(marrow))
# cell cycle effects strongly mitigated in PCA
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), nfeatures.print = 10)
## PC_ 1
## Positive: BLVRB, KLF1, ERMAP, FAM132A, CAR2, RHD, CES2G, SPHK1, AQP1, SLC38A5
## Negative: TMSB4X, CORO1A, PLAC8, H2AFY, LAPTM5, CD34, LCP1, TMEM176B, IGFBP4, EMB
## PC_ 2
## Positive: APOE, GATA2, RAB37, ANGPT1, ADGRG1, MEIS1, MPL, F2R, PDZK1IP1, DAPP1
## Negative: CTSG, ELANE, LY6C2, HP, CLEC12A, ANXA3, IGSF6, TIFAB, SLPI, MPO
## PC_ 3
## Positive: APOE, GATA2, NKG7, MUC13, ITGA2B, TUBA8, CPA3, RAB44, SLC18A2, CD9
## Negative: DNTT, FLT3, WFDC17, LSP1, MYL10, LAX1, GIMAP6, IGHM, CD24A, MN1
## PC_ 4
## Positive: CSRP3, ST8SIA6, SCIN, LGALS1, APOE, ITGB7, MFSD2B, RGL1, DNTT, IGHV1-23
## Negative: MPL, MMRN1, PROCR, HLF, SERPINA3G, ESAM, PTGS1, D630039A03RIK, NDN, PPIC
## PC_ 5
## Positive: HDC, LMO4, CSRP3, IFITM1, FCGR3, HLF, CPA3, PROCR, PGLYRP1, IKZF2
## Negative: GP1BB, PF4, SDPR, F2RL2, TREML1, RAB27B, SLC14A1, PBX1, PLEK, TUBA8
# when running a PCA on cell cycle genes, actively proliferating cells remain distinct from G1
# cells however, within actively proliferating cells, G2M and S phase cells group together
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes))
DimPlot(marrow)