pd - l1等抑制性免疫檢查點分子的表達在人類癌癥中較為常見,可導致T細胞介導的免疫應答的抑制捞奕。在這里,我們應用ECCITE-seq技術來探索調控pd - l1表達的分子網絡详瑞。ECCITE-seq技術將混合的CRISPR篩查與單細胞mRNA和表面蛋白測量相結合遗嗽。我們還開發(fā)了一個計算框架粘我,mixscape
,它通過識別和去除混雜的變異源痹换,大大提高了單細胞擾動屏幕的信噪比征字。利用這些工具,我們識別和驗證PD-L1的調控因子娇豫,并利用我們的多模態(tài)數據識別轉錄和轉錄后的調控模式匙姜。特別是,我們發(fā)現kelch樣蛋白keap1和轉錄激活因子NRF2介導了IFN刺激后pd - l1的上調冯痢。我們的結果為免疫檢查點的調節(jié)確定了一個新的機制氮昧,并為分析多模態(tài)單細胞perturbation screens提供了一個強大的分析框架 。
免疫檢查點(IC)分子調節(jié)免疫反應中激活和抑制之間的關鍵平衡浦楣。在正常生理條件下袖肥,抑制IC分子對于維持自身耐受和防止自身免疫是必不可少的,但在人類癌癥中椒振,抑制IC分子的表達常常被錯誤調控昭伸,以逃避免疫監(jiān)控。例如澎迎,抑制性IC PD-L1與T細胞上的pd -1受體相互作用庐杨,抑制T細胞活化,在許多癌癥中過表達夹供,并作為患者生存和免疫治療反應的預后因素灵份。因此,人們不僅對識別阻斷這些相互作用的治療途徑感興趣哮洽,而且對理解癌細胞上調PD-L1等ICs的分子網絡也很感興趣填渠。
之前的研究已經初步建立了一套能夠影響PD-L1 mRNA和表面蛋白水平的分子調控因子。大量研究發(fā)現,干擾素(IFN)的暴露可迅速誘導pd - l1在體外和腫瘤微環(huán)境中的表達[7-10]氛什。因此莺葫,IFN反應的核心成分是pd - l1表達的上游調控因子,包括轉錄因子IRF1(直接與pd - l1啟動子結合)枪眉、JAK-STATsignal轉導通路和IFN受體本身捺檬。此外,還鑒定了其他IFN的阻斷信號贸铜、pd - l1啟動子染色質狀態(tài)或對uv介導的應激的調節(jié)因子堡纬。
此外,最近人們特別關注pd - l1穩(wěn)定性和降解的轉錄后調節(jié)因子的特性蒿秦。例如烤镐,Cullin 3-SPOPE3-ligase復合物可以細胞周期依賴的方式直接泛素化PD-L1,導致其降解棍鳖。此外炮叶,全基因組CRISPR篩選發(fā)現了兩個之前未鑒定的調節(jié)因子cmtm6和CMTM4,它們通過防止溶酶體介導的降解來穩(wěn)定PD-L1表面表達鹊杖。在這些案例中悴灵,pd - l1調節(jié)劑的干擾被證明可以調節(jié)抗腫瘤T細胞的活性,這就突出了理解抑制IC分子調控的治療興趣骂蓖。
我們最近引入了擴展的crispr兼容的CITE-seq (ECCITE-seq)积瞒,它同時測量轉錄組、表面蛋白水平和在單細胞分辨率上的擾動登下,作為一種識別和表征分子調節(jié)劑[18]的新方法茫孔。ECCITE-seq建立在混合CRISPR屏幕的實驗設計上,在單一實驗中被芳,多個擾動被復用在一起缰贝,但是有明顯的優(yōu)勢。首先畔濒,單細胞測序讀數(即 Perturb-seq, CROP-seq, CRISP-seq)能夠測量詳細的分子表型剩晴,而不是單個表型(單個蛋白的表達或細胞活力)。其次侵状,通過同時耦合測量mRNA赞弥、表面蛋白和直接檢測同一細胞內的導聯,ECCITE-seq允許對每個擾動進行多模態(tài)表征趣兄。因此绽左,我們認為ECCITE-seq將使我們能夠同時測試和識別IC分子的新調控因子,特別是區(qū)分轉錄模式和轉錄后模式艇潭。此外拼窥,豐富和高維的讀出容易促進網絡和基于路徑的分析戏蔑,這可以超越鑒定單個基因和產生對其調控機制的洞察。
在這里鲁纠,我們應用ECCITE-seq來同時擾亂和表征假定的調節(jié)抑制分子對IFN不刺激的反應总棵。當分析我們的單細胞數據時,我們確定了混雜的異質性來源房交,包括那些接受了靶向引導RNA但沒有表現出干擾效應的細胞彻舰,這些細胞在下游分析中引入了大量的噪聲。我們開發(fā)并驗證了計算方法來控制這些因素候味,并大幅增加了我們的統(tǒng)計能力來表征多模態(tài)擾動。
利用這些工具隔心,我們確定了一組基因白群,其擾動會影響pd - l1轉錄水平、表面蛋白水平硬霍,或兩者都影響帜慢,并確定了每個調控器所使用的潛在分子通路。特別是唯卖,我們發(fā)現在人類癌癥中經常突變的kelch樣蛋白keap1和轉錄激活因子NRF2可以改變pd - l1水平粱玲。我們將這些發(fā)現與CUL3的一個新的調節(jié)機制聯系起來,并表明該基因通過穩(wěn)定NRF2通路作為PD-L1mRNA的間接轉錄激活劑拜轨。綜上所述抽减,我們的發(fā)現確定了免疫檢查點調節(jié)的一個重要途徑,并提出了一個強大的和廣泛適用的分析框架來分析ECCITE-seq數據橄碾。
在ECCITE-seq實驗中卵沉,我們運行了10x Genomics 5’(Chromium Single Cell Immune Profiling Solution v1.0, #1000014法牲, #1000020史汗, #1000151)的8個lanes ,目標是每個lanes 捕獲10000個細胞拒垃。在運行之前停撞,細胞存活率被確定和細胞數量估計之前描述。為了跟蹤每個生物學重復身份悼瓮,樣本按照細胞hashing protocol 進行戈毒。
- mRNA
- hashtags (Hashtag-derived oligos, HTOs)
- protein (Antibody-derived oligos, ADTs)
- gRNA (Guide-derived oligos, GDOs)
文庫均由10x genomics and ECCITE-seq protocols方法構建。在NovaSeq運行時谤牡,所有庫都在兩個lanes上進行測序副硅。利用Cellranger (V2.1.1)將來自mRNA文庫的測序片段映射到hg19參考基因組。為了生成HTO翅萤、ADT和GDO庫的計數矩陣恐疲,使用了CITE-seq-count package (https://github.com/Hoohm/CITE-seq-Count)腊满。然后使用計數矩陣作為Seurat R包的輸入來執(zhí)行所有的下游分析。
下面我們跟著官網教程來看看是如何達到目的的培己。
首先碳蛋,我們需要安裝新的Seurat,下載示例數據:
remotes::install_github(“satijalab/seurat”, ref = “mixscape”)
# Load packages.
library(Seurat)
library(SeuratData)
library(ggplot2)
library(patchwork)
library(scales)
library(dplyr)
# Download dataset using SeuratData.
InstallData(ds = "thp1.eccite")
# Setup custom theme for plotting.
custom_theme <- theme(plot.title = element_text(size = 16, hjust = 0.5), legend.key.size = unit(0.7,
"cm"), legend.text = element_text(size = 14))
老規(guī)矩省咨,我們來看一下數據格式肃弟。
# Load object.
eccite <- LoadData(ds = "thp1.eccite")
eccite
An object of class Seurat
18776 features across 20729 samples within 4 assays
Active assay: RNA (18649 features, 0 variable features)
3 other assays present: ADT, HTO, GDO
我們看到有4個assay: RNA , ADT, HTO, GDO,一個Seurat玩轉4套數據零蓉,在才叫多模態(tài)啊笤受。我們分別看看這四套數據的內容:
eccite@assays$RNA@data[1:4,1:4]
4 x 4 sparse Matrix of class "dgCMatrix"
l1_AAACCTGAGCCAGAAC l1_AAACCTGAGTGGACGT l1_AAACCTGCATGAGCGA l1_AAACCTGTCTTGTCAT
AL627309.1 . . . .
AP006222.2 . . . .
RP4-669L17.10 . . . .
RP11-206L10.3 . . . .
> eccite@assays$ADT@data[1:4,1:4]
4 x 4 sparse Matrix of class "dgCMatrix"
l1_AAACCTGAGCCAGAAC l1_AAACCTGAGTGGACGT l1_AAACCTGCATGAGCGA l1_AAACCTGTCTTGTCAT
CD86 118 243 99 110
PDL1 522 139 109 334
PDL2 94 104 56 48
CD366 67 59 80 47
> eccite@assays$HTO@data[1:4,1:4]
4 x 4 sparse Matrix of class "dgCMatrix"
l1_AAACCTGAGCCAGAAC l1_AAACCTGAGTGGACGT l1_AAACCTGCATGAGCGA l1_AAACCTGTCTTGTCAT
rep1-tx 82 18 55 14
rep1-ctrl 3 1 4 .
rep2-tx 13 11 6 6
rep2-ctrl 1 4 . .
> eccite@assays$GDO@data[1:4,1:4]
4 x 4 sparse Matrix of class "dgCMatrix"
l1_AAACCTGAGCCAGAAC l1_AAACCTGAGTGGACGT l1_AAACCTGCATGAGCGA l1_AAACCTGTCTTGTCAT
eGFPg1 1 1 1 1
CUL3g1 1 1 1 1
CUL3g2 1 1 1 1
CUL3g3 1 1 1 1
metadata的內容:
head(eccite@meta.data)
orig.ident nCount_RNA nFeature_RNA nCount_HTO nFeature_HTO nCount_GDO nFeature_GDO nCount_ADT nFeature_ADT percent.mito MULTI_ID MULTI_classification
l1_AAACCTGAGCCAGAAC Lane1 17207 3942 99 4 576 111 801 4 2.295577 rep1-tx rep1-tx
l1_AAACCTGAGTGGACGT Lane1 9506 2948 35 5 190 111 545 4 4.512939 rep1-tx rep1-tx
l1_AAACCTGCATGAGCGA Lane1 15256 4258 66 4 212 111 344 4 4.116413 rep1-tx rep1-tx
l1_AAACCTGTCTTGTCAT Lane1 5135 1780 22 3 243 111 539 4 5.491723 rep1-tx rep1-tx
l1_AAACGGGAGAACAACT Lane1 9673 2671 99 5 198 111 1053 4 3.359868 rep1-tx rep1-tx
l1_AAACGGGAGACAGAGA Lane1 14941 3918 97 5 124 111 487 4 3.379961 rep1-tx rep1-tx
MULTI_classification.global HTO_classification guide_ID guide_ID.global gene con NT crispr replicate S.Score G2M.Score Phase
l1_AAACCTGAGCCAGAAC rep1-tx rep1-tx STAT2g2 Singlet STAT2 tx STAT2g2 Perturbed rep1 -0.25271565 -0.77130934 G1
l1_AAACCTGAGTGGACGT rep1-tx rep1-tx CAV1g4 Singlet CAV1 tx CAV1g4 Perturbed rep1 -0.12380192 -0.33260303 G1
l1_AAACCTGCATGAGCGA rep1-tx rep1-tx STAT1g2 Singlet STAT1 tx STAT1g2 Perturbed rep1 -0.15463259 -0.69441836 G1
l1_AAACCTGTCTTGTCAT rep1-tx rep1-tx CD86g1 Singlet CD86 tx CD86g1 Perturbed rep1 -0.06126198 -0.03781951 G1
l1_AAACGGGAGAACAACT rep1-tx rep1-tx IRF7g2 Singlet IRF7 tx IRF7g2 Perturbed rep1 -0.13218850 -0.35315597 G1
l1_AAACGGGAGACAGAGA rep1-tx rep1-tx NTg1 Singlet NT tx NT NT rep1 -0.20998403 -0.58247261 G1
禁不住用我們的桑吉圖看看分屬關系:
library(ggforce)
eccite@meta.data %>%
gather_set_data(17:21) %>%
ggplot(aes(x, id = id, split = y, value = 1)) +
geom_parallel_sets(aes(fill = NT), show.legend = FALSE, alpha = 0.3) +
geom_parallel_sets_axes(axis.width = 0.1, color = "lightgrey", fill = "white") +
geom_parallel_sets_labels(angle = 0) +
theme_no_axes()
要理解這里的每一列是什么就要學習CRISPR的基本知識了。重要的一點是理解perturbation 的概念敌蜂。
# Normalize protein.
eccite <- NormalizeData(object = eccite, assay = "ADT", normalization.method = "CLR", margin = 2)
為了獲得全局的觀點箩兽,我們先對RNA的數據執(zhí)行Seurat的一般流程:基于rna的聚類是由混雜的變異源驅動的。
# Prepare RNA assay for dimensionality reduction: Normalize data, find variable features and
# scale data.
DefaultAssay(object = eccite) <- "RNA"
eccite <- NormalizeData(object = eccite) %>% FindVariableFeatures() %>% ScaleData()
# Run Principle Component Analysis (PCA) to reduce the dimensionality of the data.
eccite <- RunPCA(object = eccite)
# Run Uniform Manifold Approximation and Projection (UMAP) to visualize clustering in 2-D.
eccite <- RunUMAP(object = eccite, dims = 1:40)
# Generate plots to check if clustering is driven by biological replicate ID, cell cycle phase
# or target gene class.
p1 <- DimPlot(eccite, group.by = "replicate", label = F, pt.size = 0.2, reduction = "umap") + scale_color_brewer(palette = "Dark2") +
ggtitle("Biological Replicate") + xlab("UMAP 1") + ylab("UMAP 2") + custom_theme
p2 <- DimPlot(eccite, group.by = "Phase", label = F, pt.size = 0.2, reduction = "umap") + ggtitle("Cell Cycle Phase") +
ylab("UMAP 2") + xlab("UMAP 1") + custom_theme
p3 <- DimPlot(eccite, group.by = "crispr", pt.size = 0.2, reduction = "umap", split.by = "crispr",
ncol = 1, cols = c("grey39", "goldenrod3")) + ggtitle("Perturbation Status") + ylab("UMAP 2") +
xlab("UMAP 1") + custom_theme
# Visualize plots.
((p1/p2 + plot_layout(guides = "auto")) | p3)
接下來章喉,計算局部擾動特征減輕混雜效應汗贫。
Calculate local perturbation signature.
eccite <- CalcPerturbSig(object = eccite, assay = "RNA", slot = "data", gd.class = "gene", nt.cell.class = "NT",
reduction = "pca", ndims = 40, num.neighbors = 20, split.by = "replicate", new.assay.name = "PRTB")
# Prepare PRTB assay for dimensionality reduction: Normalize data, find variable features and
# center data.
DefaultAssay(object = eccite) <- "PRTB"
# Use variable features from RNA assay.
VariableFeatures(object = eccite) <- VariableFeatures(object = eccite[["RNA"]])
eccite <- ScaleData(eccite, do.scale = F, do.center = T)
# Run PCA to reduce the dimensionality of the data.
eccite <- RunPCA(object = eccite, reduction.key = "prtbpca", reduction.name = "prtbpca")
# Run UMAP to visualize clustering in 2-D.
eccite <- RunUMAP(object = eccite, dims = 1:40, reduction = "prtbpca", reduction.key = "prtbumap",
reduction.name = "prtbumap")
# Generate plots to check if clustering is driven by biological replicate ID, cell cycle phase
# or target gene class.
q1 <- DimPlot(eccite, group.by = "replicate", reduction = "prtbumap", pt.size = 0.2) + scale_color_brewer(palette = "Dark2") +
ggtitle("Biological Replicate") + ylab("UMAP 2") + xlab("UMAP 1") + custom_theme
q2 <- DimPlot(eccite, group.by = "Phase", reduction = "prtbumap", pt.size = 0.2) + ggtitle("Cell Cycle Phase") +
ylab("UMAP 2") + xlab("UMAP 1") + custom_theme
q3 <- DimPlot(eccite, group.by = "crispr", reduction = "prtbumap", split.by = "crispr", ncol = 1,
pt.size = 0.2, cols = c("grey39", "goldenrod3")) + ggtitle("Perturbation Status") + ylab("UMAP 2") +
xlab("UMAP 1") + custom_theme
# Visualize plots.
(q1/q2 + plot_layout(guides = "auto") | q3)
Mixscape識別沒有檢測擾動的細胞。
#install.packages('mixtools')
# Run mixscape to classify cells based on their perturbation status.
eccite <- RunMixscape(object = eccite, assay = "PRTB", slot = "scale.data", labels = "gene", nt.class.name = "NT",
min.de.genes = 5, iter.num = 10, de.assay = "RNA", verbose = F)
# Show that only IFNG pathway KO cells have a reduction in PD-L1 protein expression.
Idents(object = eccite) <- "gene"
VlnPlot(object = eccite, features = "adt_PDL1", idents = c("NT", "JAK2", "STAT1", "IFNGR1", "IFNGR2",
"IRF1"), group.by = "gene", pt.size = 0.2, sort = T, split.by = "mixscape_class.global", cols = c("coral3",
"grey79", "grey39")) + ggtitle("PD-L1 protein") + theme(axis.text.x = element_text(angle = 0,
hjust = 0.5))
KO,NP,NT 是啥意思秸脱?
- If the cell received a NT gRNA, it retains its assignment as non-targeting (NT)
- If the cell received a targeting gRNA, and is classified in step 8 as NT, it is assigned a non-perturbed (NP) label
- If the cell received a targeting gRNA, and is classified in step 8 as KO, it receives a perturbed/knock-out (KO) label
用線性判別分析(LDA)可視化擾動響應落包。
# Remove non-perturbed cells and run LDA to reduce the dimensionality of the data.
Idents(eccite) <- "mixscape_class.global"
sub <- subset(eccite, idents = c("KO", "NT"))
sub <- MixscapeLDA(object = sub, assay = "RNA", pc.assay = "PRTB", labels = "gene", nt.label = "NT",
npcs = 10, logfc.threshold = 0.25, verbose = F)
# Use LDA results to run UMAP and visualize cells on 2-D.
sub <- RunUMAP(sub, dims = 1:11, reduction = "lda", reduction.key = "ldaumap", reduction.name = "ldaumap")
# Visualize UMAP clustering results.
Idents(sub) <- "mixscape_class"
sub$mixscape_class <- as.factor(sub$mixscape_class)
p <- DimPlot(sub, reduction = "ldaumap", label = T, repel = T, label.size = 5)
col = setNames(object = hue_pal()(12), nm = levels(sub$mixscape_class))
names(col) <- c(names(col)[1:7], "NT", names(col)[9:12])
col[8] <- "grey39"
p + scale_color_manual(values = col, drop = FALSE) + ylab("UMAP 2") + xlab("UMAP 1") + custom_theme
代碼和降維圖均來自mixscape官網,為什么沒有自己畫摊唇,因為畫到一般發(fā)現自己的PC計算資源不夠用了咐蝇。對原理感興趣看一看文章:Characterizing the molecular regulation of inhibitory immune checkpoints with multi-modal single-cell screens.
在分析的過程中注意Seurat數據的assay之間的切換,這是四套數據了遏片。大部分函數是我們熟悉的Seurat的函數嘹害,幾個關鍵的函數需要我們親自查看help文檔,如RunMixscape ``CalcPerturbSig
吮便。當然笔呀,最為關鍵的還是我們對ECCITE-seq和CRISPR技術的原理和細節(jié)。
Mixscape 的出現再次驗證了:單細胞只是概念髓需,我們可以基于此開發(fā)相應的技術许师。