The code is mainly taken from: https://satijalab.org/seurat/articles/integration_introduction.html
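1. Data preparation
Network speed limits really are everywhere. This dataset is fairly large, and the official site downloads it from code, which almost never succeeds from mainland China. It is more reliable to download the package file first and install it as a local R package.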
rm(list = ls())
library(Seurat)
library(SeuratData)
library(patchwork)
# install dataset
#InstallData("ifnb")
#install.packages("ifnb.SeuratData_3.1.0.tar.gz",repos = NULL)
# load dataset
ifnb = LoadData("ifnb")
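The local-install route described above can be sketched as follows. The tarball path is an assumption (a file already downloaded into the working directory); adjust it to wherever the file was saved.

```r
# Assumed local path to the already-downloaded tarball (adjust as needed)
tarball <- "ifnb.SeuratData_3.1.0.tar.gz"

# Only install if the data package is not already available
if (!requireNamespace("ifnb.SeuratData", quietly = TRUE)) {
  if (file.exists(tarball)) {
    # repos = NULL makes install.packages() treat the argument as a local file
    install.packages(tarball, repos = NULL, type = "source")
  } else {
    message("Tarball not found at: ", tarball)
  }
}
```

Once the package is installed, LoadData("ifnb") should work without any further download.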
2. Understanding and splitting the data
This is an integration example, but the built-in dataset ships as a single object, so it has to be split first.
# split the dataset into a list of two seurat objects (stim and CTRL)
head(ifnb@meta.data)
## orig.ident nCount_RNA nFeature_RNA stim seurat_annotations
## AAACATACATTTCC.1 IMMUNE_CTRL 3017 877 CTRL CD14 Mono
## AAACATACCAGAAA.1 IMMUNE_CTRL 2481 713 CTRL CD14 Mono
## AAACATACCTCGCT.1 IMMUNE_CTRL 3420 850 CTRL CD14 Mono
## AAACATACCTGGTA.1 IMMUNE_CTRL 3156 1109 CTRL pDC
## AAACATACGATGAA.1 IMMUNE_CTRL 1868 634 CTRL CD4 Memory T
## AAACATACGGCATT.1 IMMUNE_CTRL 1581 557 CTRL CD14 Mono
table(ifnb@meta.data$stim)
##
## CTRL STIM
## 6548 7451
ifnb.list <- SplitObject(ifnb, split.by = "stim")
length(ifnb.list)
## [1] 2
The table() output above shows the number of cells in the CTRL and STIM groups.
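To make the idea of splitting by a metadata column concrete, here is a toy base R sketch. The data frame is made up for illustration, and split() merely stands in for the idea behind SplitObject; it is not how Seurat implements it.

```r
# Made-up metadata table: one row per cell, with a condition column
meta <- data.frame(
  cell = c("c1", "c2", "c3", "c4", "c5"),
  stim = c("CTRL", "STIM", "CTRL", "STIM", "STIM")
)

# Split the rows into one data frame per condition
meta.list <- split(meta, meta$stim)

# One list element per condition, each holding only its own cells
names(meta.list)
sapply(meta.list, nrow)
```

On the real list, sapply(ifnb.list, ncol) gives the analogous per-object cell counts, since ncol() on a Seurat object returns the number of cells.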
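3. Performing the integration
Each of the two split objects is normalized and has its highly variable genes identified separately; anchors are then found between them and the objects are combined.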
# normalize and identify variable features for each dataset independently
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
# select features that are repeatedly variable across datasets for integration
features <- SelectIntegrationFeatures(object.list = ifnb.list)
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features)
# this command creates an 'integrated' data assay
immune.combined <- IntegrateData(anchorset = immune.anchors)
# specify that we will perform downstream analysis on the corrected data; note that the
# original unmodified data still resides in the 'RNA' assay
DefaultAssay(immune.combined) <- "integrated"
From here on, the analysis uses the integrated assay by default.
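One detail of the lapply() call in this section is easy to miss: the anonymous function ends with an assignment rather than a bare x. In R, an assignment returns its value invisibly, so lapply() still collects the processed objects. A toy base R sketch (the function names are hypothetical):

```r
# Toy function that, like the anonymous function above, ends with an assignment
add_one <- function(x) {
  x <- x + 1
}

# lapply() still receives the assigned value from each call
res <- lapply(list(a = 1, b = 2), add_one)
str(res)

# Equivalent version with the return value spelled out
add_one_explicit <- function(x) {
  x <- x + 1
  x
}
res_explicit <- lapply(list(a = 1, b = 2), add_one_explicit)
```

Both versions behave the same; ending the function with a bare x is simply easier to read.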
4. Standard dimensionality reduction and clustering
# Run the standard workflow for visualization and clustering
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 13999
## Number of edges: 569703
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9057
## Number of communities: 16
## Elapsed time: 1 seconds
# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap", label = TRUE, repel = TRUE)
p1 + p2
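Besides looking at the UMAP, a quick numeric check of the integration is to tabulate how each cluster is composed by condition. Below is a toy base R sketch with made-up vectors; on the real object, the same table could be built with table(Idents(immune.combined), immune.combined$stim).

```r
# Made-up cluster and condition assignments for six cells
cluster <- c("0", "0", "1", "1", "1", "2")
stim    <- c("CTRL", "STIM", "CTRL", "STIM", "STIM", "CTRL")

# Cell counts per cluster and condition
counts <- table(cluster, stim)

# Row-wise proportions: the share of each condition within every cluster
props <- prop.table(counts, margin = 1)
round(props, 2)
```

A cluster made up almost entirely of one condition may reflect a real biological difference or incomplete integration, so it is worth a closer look.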
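5. SingleR annotation
The official tutorial identifies cell types manually from marker genes and then sets the labels. Here I take a shortcut and use SingleR instead.
SingleR's categories are fairly coarse: the large region on the right of the plot is all labelled as monocytes, with no finer detail. Manual annotation would allow further subdivision.
# Annotation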
library(celldex)
library(SingleR)
#ref <- celldex::HumanPrimaryCellAtlasData()
ref <- get(load("single_ref/ref_Hematopoietic.RData"))
library(BiocParallel)
pred.scRNA <- SingleR(test = immune.combined@assays$integrated@data,
                      ref = ref,
                      labels = ref$label.main,
                      clusters = immune.combined@active.ident)
pred.scRNA$pruned.labels
## [1] "Monocytes" "CD8+ T cells" "CD4+ T cells" "Monocytes"
## [5] "B cells" "CD8+ T cells" "NK cells" "CD4+ T cells"
## [9] "Monocytes" "B cells" "CD8+ T cells" "Dendritic cells"
## [13] "Monocytes" "Monocytes" "HSCs"
plotScoreHeatmap(pred.scRNA, clusters = rownames(pred.scRNA), fontsize.row = 9, show_colnames = TRUE)
new.cluster.ids <- pred.scRNA$pruned.labels
names(new.cluster.ids) <- levels(immune.combined)
levels(immune.combined)
## [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14"
immune.combined <- RenameIdents(immune.combined,new.cluster.ids)
levels(immune.combined)
## [1] "Monocytes" "CD8+ T cells" "CD4+ T cells" "B cells"
## [5] "NK cells" "Dendritic cells" "HSCs"
UMAPPlot(object = immune.combined, pt.size = 0.5, label = TRUE)
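Two details of the relabelling step are worth spelling out. First, pruned.labels can contain NA for clusters whose assignment SingleR considers low quality, in which case falling back to the unpruned labels column is one option. Second, when several clusters receive the same label they collapse into a single identity, which is why 15 clusters become 7 levels above. The toy base R sketch below uses made-up values and a plain factor as a stand-in for the Seurat identities; it is not the RenameIdents implementation.

```r
# Toy stand-ins (made-up values, not the real SingleR output)
cluster.ids <- c("0", "1", "2", "3")
labels      <- c("Monocytes", "CD8+ T cells", "CD4+ T cells", "Monocytes")
pruned      <- c("Monocytes", "CD8+ T cells", NA, "Monocytes")

# Fall back to the unpruned label wherever pruning returned NA
final.labels <- ifelse(is.na(pruned), labels, pruned)

# Per-cell cluster assignments as a factor
cells <- factor(c("0", "1", "2", "3", "1", "0"), levels = cluster.ids)

# Assigning duplicated labels as levels merges the matching clusters
renamed <- cells
levels(renamed) <- final.labels

levels(renamed)
table(renamed)
```

Here clusters "0" and "3" both map to "Monocytes" and end up as one level, mirroring the merged identities in the output above.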