單細胞數據挖掘實戰(zhàn):文獻復現(xiàn)(一)批量讀取數據
單細胞數據挖掘實戰(zhàn):文獻復現(xiàn)(二)批量創(chuàng)建Seurat對象及質控
前面得到了samples_objects Seurat對象并進行了質控,下面進行降維、聚類和細胞注釋,主要是復現(xiàn)一下文獻中的Fig. 1b和Fig. 1c
一、高變基因
samples_objects <- lapply(seq_along(samples_objects), function(i) {
samples_objects[[i]] <- FindVariableFeatures(object = samples_objects[[i]],
selection.method = "vst",
nfeatures = 2000, verbose = FALSE)
})
names(samples_objects) <- names(samples_raw_data)
二闯狱、在Seurat對象中添加一些相關信息
- 文獻中Fig. 1b和Fig. 1c主要用f_ctrl_1, f_tumor_1, m_ctrl_1, m_tumor_1這四個樣本;
- 添加相關基因:用來計算細胞周期評分
#細胞周期相關基因
s_genes <- cc.genes$s.genes
g2m_genes <- cc.genes$g2m.genes
#小膠質細胞和巨噬細胞標志物
microglia_markers <- c("Tmem119", "P2ry12", "Sall1", "Pros1", "Crybb1")
macrophages_markers <- c("Itga4", "Tgfbi", "Cxcl2", "Ccr2", "Il10", "Fgr")
samples_objects <- lapply(seq_along(samples_objects), function(i) {
# 添加shortID
samples_objects[[i]]$shortID <- substr(as.character(samples_objects[[i]]$orig.ident), 15,
nchar(as.character(samples_objects[[i]]$orig.ident)))
# 添加sex和sex_condition
samples_objects[[i]]$sex <- substr(as.character(samples_objects[[i]]$shortID), 12, 12)
samples_objects[[i]]$sex_condition <- paste0(samples_objects[[i]]$sex, "_", samples_objects[[i]]$condition)
# 添加var.features
samples_objects[[i]]@assays$RNA@var.features <- unique(c(samples_objects[[i]]@assays$RNA@var.features,
s_genes, g2m_genes, microglia_markers, macrophages_markers))
samples_objects[[i]]
})
names(samples_objects) <- names(samples_raw_data)
三抛计、按條件整合樣本哄孤、PCA, t-SNE 及找細胞類群
設置參數
anchor_dims <- 30
pca_dims <- 30
clustering_resolution <- 0.3
n_features <- 2000
下面這個過程耗時較長
sex_condition_objects <- lapply(c(1,3,5,7), function(i) { # f_ctrl_1, f_tumor_1, m_ctrl_1, m_tumor_1
sex_condition <- unique(samples_objects[[i]]$sex_condition)
#按條件整合樣本(f_ctrl_1, f_tumor_1, m_ctrl_1, m_tumor_1)
print(paste0("Find Integration Anchors: ", sex_condition))
samples_anchors <- FindIntegrationAnchors(object.list = list(samples_objects[[i]],
samples_objects[[i + 1]]),
dims = 1:anchor_dims,
anchor.features = n_features)
print(paste0("Integrate Data: ", sex_condition))
samples_integrated <- IntegrateData(anchorset = samples_anchors, dims = 1:anchor_dims)
DefaultAssay(object=samples_integrated) <- "integrated"
#細胞周期評分(CellCycleScoring函數)
print(paste0("Cell Cycle Scoring:", sex_condition))
samples_integrated <- CellCycleScoring(object = samples_integrated, s.features = s_genes,g2m.features = g2m_genes, set.ident = TRUE)
print(paste0("Scale Data: ", sex_condition, " regress out: nCount_RNA, percent_mito, CC_difference"))
#計算 G2M 和 S 期分數之間的差異,以分離非周期細胞和周期細胞
# Seurat細胞周期評分和回歸
samples_integrated$CC_Difference <- samples_integrated$S.Score - samples_integrated$G2M.Score
samples_integrated <- ScaleData(object = samples_integrated, verbose = FALSE,
vars.to.regress = c("nCount_RNA",
"percent_mito",
"CC_Difference"))
print(paste0("Run PCA: ", sex_condition))
samples_integrated <- RunPCA(object = samples_integrated, npcs = pca_dims, verbose = FALSE)
print(paste0("Run t-SNE:", sex_condition))
samples_integrated <- RunTSNE(object = samples_integrated, reduction = "pca", dims=1:pca_dims)
print(paste0("Find Neighbors: ", sex_condition))
samples_integrated <- FindNeighbors(object = samples_integrated, dims = 1:pca_dims)
print(paste0("Find Clusters: ", sex_condition))
samples_integrated <- FindClusters(object = samples_integrated, resolution = clustering_resolution)
samples_integrated
})
names(sex_condition_objects) <- substring(names(samples_objects), 1, nchar(names(samples_objects))-2)[c(1,3,5,7)]
保存
saveRDS(sex_condition_objects, file = "sex_condition_objects.RDS")
四吹截、細胞注釋
- 細胞注釋這一步文獻中是手工注釋的瘦陈,需要找一些marker,然后根據marker基因在哪些cluster中表達來判斷波俄;
- 可以用DotPlot和FeaturePlot可視化marker基因晨逝;
- 文獻中用到的marker在附件中,可以下載懦铺;
先看一下文獻中的圖
1.png
這里只嘗試f-ctrl這個樣本捉貌,其它三個大家可以自己去復現(xiàn)一下
先簡單畫個圖,看下cluster
DimPlot(sex_condition_objects[[1]],label = T)
2.png
選取marker并作圖觀察
##這些marker來自文獻中Fig. 1c中所示的基因
ctrl_gene_panel <- c("Cd14", "Tmem119", "P2ry12", "Csf1", "Crybb1", "Mcm5", "Ifit3", "Tgfbi", "Ifitm2", "Ifitm3", "S100a6", "Ly6c2", "Ccr2", "Mrc1", "Cd163", "Cd24a","Ncam1")
########################FeaturePlot###########
for (i in seq_along(ctrl_gene_panel)) {
FeaturePlot(sex_condition_objects[[1]], features = ctrl_gene_panel[i], coord.fixed = T, order = T)
ggsave2(paste0("FeaturePlot_tumor_gene_panel_", tumor_gene_panel[i], ".png"), path = "./annotation/f_ctrl", width = 10, height = 10, units = "cm")
}
#######################
#############################DotPlot#####################
p <- DotPlot(sex_condition_objects[[1]], features = ctrl_gene_panel,
assay='RNA' ) + coord_flip()
p
###############################################
3.png
需要對這些marker進行觀察判斷冬念,然后根據marker對細胞進行注釋趁窃,這里就按照文獻里的注釋進行下一步;
注釋:Fig. 1b
#################################################
anno_cells <- c("MG1","MG2","MG3","BAM","MG4","MG5","pre-MG",
"pre-MG","pre-MG","pre-MG","MG6","pre-MG")
names(anno_cells) <- levels(sex_condition_objects[[1]])
sex_condition_objects[[1]] <- RenameIdents(sex_condition_objects[[1]],anno_cells)
DimPlot(sex_condition_objects[[1]],label = T,pt.size = 0.5,
cols = c("#9AEAFF","#41589C","#64A7D4","#0FD1AE",
"#3A2985","#427DB0","#29A5C8","#518FC3"))+NoLegend()
#################################################
和文獻的圖比較一下
4.png
總體感覺差不多急前,MG4位置有點不一樣醒陆,不知道是不是因為R包版本不一致引起的?
Fig. 1c
DotPlot(sex_condition_objects[[1]], features = ctrl_gene_panel,
cols = c("#92cee9","#00ed01")) + RotatedAxis() +
theme(legend.position="top")
和文獻的圖比較一下
5.png
顏色需要調一下裆针,字體什么的可以去AI里修改刨摩,這里只做了f_ctrl_1一個樣本,還有f_tumor_1, m_ctrl_1, m_tumor_1三個樣本据块,感興趣的話可以去試一下码邻。
往期單細胞數據挖掘實戰(zhàn):