本文是參考學習CNS圖表復現(xiàn)20—第三次分群乔煞,以T細胞為例的學習筆記吁朦。可能根據(jù)學習情況有所改動渡贾。
前面我們展現(xiàn)了 CNS圖表復現(xiàn)08—腫瘤單細胞數(shù)據(jù)第一次分群通用規(guī)則逗宜,然后呢,第二次分群的上皮細胞可以細分惡性與否空骚,免疫細胞呢纺讲,細分可以成為: B細胞,T細胞囤屹,巨噬細胞刻诊,樹突細胞等等。實際上每個免疫細胞亞群仍然可以繼續(xù)精細的劃分牺丙,以文章為例:
- Macrophages from lung tumor biopsies (n = 1,379) were clus- tered into 5 distinct groups (Figure S7A) followed by differential gene expression in each resulting cluster
- T cells and NK cells (n = 2,226) were analyzed in the same manner as macrophages and resulted in 5 distinct T/NK cell pop- ulations .
這就是正文的圖表5:
首先是Macrophages細分 :
各個細胞亞群的比例展示及標記基因小提琴圖:
各個細胞亞群的分布情況及標記基因熱圖;
然后是T cells and NK cells的細分
各個細胞亞群的比例展示及標記基因小提琴圖:
各個細胞亞群的分布情況及標記基因熱圖复局;
嘗試復現(xiàn)T細胞的亞群細分
文章提到的是
- Macrophages from lung tumor biopsies (n = 1,379) were clus- tered into 5 distinct groups (Figure S7A) followed by differential gene expression in each resulting cluster
- T cells and NK cells (n = 2,226) were analyzed in the same manner as macrophages and resulted in 5 distinct T/NK cell pop- ulations .
首先我們拿到T cells and NK cells數(shù)據(jù)集
rm(list=ls())
options(stringsAsFactors = F)
library(Seurat)
library(ggplot2)
### 來源于:CNS圖表復現(xiàn)02—Seurat標準流程之聚類分群的step1-create-sce.R
load(file = 'first_sce.Rdata')
### 來源于 step2-anno-first.R
load(file = 'phe-of-first-anno.Rdata')
sce=sce.first
table(phe$immune_annotation)
# # immune (CD45+,PTPRC), epithelial/cancer (EpCAM+,EPCAM), and stromal (CD10+,MME,fibo or CD31+,PECAM1,endo)
genes_to_check = c("PTPRC","CD19",'PECAM1','MME','CD3G', 'CD4', 'CD8A' )
p <- DotPlot(sce, features = genes_to_check,
assay='RNA' )
p
table(phe$immune_annotation,phe$seurat_clusters)
cells.use <- row.names(sce@meta.data)[which(phe$immune_annotation=='immune')]
length(cells.use)
sce <-subset(sce, cells=cells.use)
sce
# 實際上這里需要重新對sce進行降維聚類分群冲簿,為了節(jié)省時間
# 我們直接載入前面的降維聚類分群結果,但是并沒有載入tSNE哦
## 來源于:CNS圖表復現(xiàn)05—免疫細胞亞群再分類
load(file = 'phe-of-subtypes-Immune-by-manual.Rdata')
sce@meta.data=phe
table(phe$immuSub)
# 需要背景知識
# https://www.abcam.com/primary-antibodies/immune-cell-markers-poster
# NK Cell* CD11b+, CD122+, NK1.1+, NKG2D+, NKp46+
# NCR1 Gene. Natural Cytotoxicity Triggering Receptor 1; NK-P46; Natural Killer Cell P46-Related Protein;
# NKG2D is a transmembrane protein belonging to the NKG2 family of C-type lectin-like receptors. NKG2D is encoded by KLRK1 gene
# T Cell* CD3+
genes_to_check = c('CD3G', 'CD4', 'CD8A', 'FOXP3',
'TNF','IFNG','KLRK1','NCR1')
p <- DotPlot(sce, features = genes_to_check,group.by = 'seurat_clusters') + coord_flip()
p
cells.use <- row.names(sce@meta.data)[which(phe$immuSub=='Tcells')]
length(cells.use)
sce <-subset(sce, cells=cells.use)
sce
# 26485 features across 4555 samples within 1 assay
然后走降維聚類分群流程
挑選出來了 4555 個細胞亿昏,走標準流程即可:
sce <- NormalizeData(sce, normalization.method = "LogNormalize",
scale.factor = 10000)
GetAssay(sce,assay = "RNA")
sce <- FindVariableFeatures(sce,
selection.method = "vst", nfeatures = 2000)
# 步驟 ScaleData 的耗時取決于電腦系統(tǒng)配置(保守估計大于一分鐘)
sce <- ScaleData(sce)
sce <- RunPCA(object = sce, pc.genes = VariableFeatures(sce))
DimHeatmap(sce, dims = 1:12, cells = 100, balanced = TRUE)
ElbowPlot(sce)
sce <- FindNeighbors(sce, dims = 1:15)
sce <- FindClusters(sce, resolution = 0.2)
table(sce@meta.data$RNA_snn_res.0.2)
sce <- RunUMAP(object = sce, dims = 1:15, do.fast = TRUE)
DimPlot(sce,reduction = "umap",label=T)
# 這個時候分成了6群
查亞群的標記基因
如果我們?nèi)?https://www.abcam.com/primary-antibodies/t-cells-basic-immunophenotyping 可以看到
- Killer T cells (cytotoxic T lymphocytes: CD8+) , Effector or memory
- Helper T cells (Th cells: CD4+), TH1,2,17
- Regulatory T cells (Treg cells: CD4+, CD25+, FoxP3+, CD127+)
- interferon gamma (IFN-γ) and tumor necrosis factor alpha (TNF-α), two factors produced by activated T cells
- Helper Th1,IFNγ, IL-2, IL-12, IL-18
比較簡單的CD4,CD8,以及FOXP3各自代表的T細胞亞群峦剔。比較復雜的可以是:張澤民團隊的單細胞研究把T細胞分的如此清楚
但是很不幸,在這個數(shù)據(jù)集里面它們都不是主要的細胞亞群決定因素角钩。
genes_to_check = c('CD3G', 'CD4', 'CD8A', 'FOXP3',
'TNF','IFNG','KLRK1','NCR1')
p1 <- DotPlot(sce, features = genes_to_check,group.by = 'seurat_clusters') + coord_flip()
p1
p2=DimPlot(sce,reduction = "umap",label=T)
library(patchwork)
p1+p2
出圖如下:
文章呢吝沫,選取的其實是 biopsy_site == "Lung" 的T cells and NK cells (n = 2,226) ,區(qū)分成為5 distinct T/NK cell pop- ulations .
> table(sce@meta.data$biopsy_site)
Adrenal Brain Liver LN lung Lung Pleura
226 2 586 700 485 1692 864
所以我們只能是繼續(xù)挑選子集走流程递礼。代碼如下:
sce <- NormalizeData(sce, normalization.method = "LogNormalize",
scale.factor = 10000)
GetAssay(sce,assay = "RNA")
sce <- FindVariableFeatures(sce,
selection.method = "vst", nfeatures = 2000)
# 步驟 ScaleData 的耗時取決于電腦系統(tǒng)配置(保守估計大于一分鐘)
sce <- ScaleData(sce)
sce <- RunPCA(object = sce, pc.genes = VariableFeatures(sce))
DimHeatmap(sce, dims = 1:12, cells = 100, balanced = TRUE)
ElbowPlot(sce)
sce <- FindNeighbors(sce, dims = 1:15)
sce <- FindClusters(sce, resolution = 0.5)
table(sce@meta.data$RNA_snn_res.0.5)
sce <- RunUMAP(object = sce, dims = 1:15, do.fast = TRUE)
genes_to_check = c('CD3G', 'CD4', 'CD8A', 'FOXP3',
'TNF','IFNG','KLRK1','NCR1')
p1 <- DotPlot(sce, features = genes_to_check,group.by = 'seurat_clusters') + coord_flip()
p1
p2=DimPlot(sce,reduction = "umap",label=T)
library(patchwork)
p1+p2
可以看到惨险,并沒有本質(zhì)上的區(qū)別:
復現(xiàn)失敗了!唯一看起來比較類似的就是 FOXP3 positive Tregs in cluster 5脊髓,哈哈哈哈辫愉!
- Cluster0: generic dont really know.
- Cluster1: Exhaustion in the pD cluster (1) and the PD/PR cluster (4). (PR/PD feature)
- Cluster2: Cluster 2 is cytotoxic (GZMK, EOMES, CD8pos, IFNG) (Less in PD)
- Cluster3: Cluster 3 is enriched for NK cells (Naive feature)
- Cluster4: Exhaustion in the pD cluster (1) and the PD/PR cluster (4). (PR/PD feature)
- Cluster5: FOXP3 positive Tregs in cluster 5 (Less in PR)