Introduction
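This article walks you through the analysis of a single-cell joint profiling dataset that measures gene expression and DNA accessibility in the same cells.
The dataset was published by Chen, Lake, and Zhang (2019) and was generated with a method called SNARE-seq. Because no fragment file was released with this dataset, we re-mapped the raw data to the mm10 genome and created one. The files can be downloaded here: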
- Fragment file: https://signac-objects.s3.amazonaws.com/snareseq/fragments.sort.bed.gz
- Fragment file index: https://signac-objects.s3.amazonaws.com/snareseq/fragments.sort.bed.gz.tbi
- Code used to generate the fragment file from the raw data: https://github.com/timoast/SNARE-seq
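For orientation, a fragment file is a tab-separated, BED-like text file with one row per unique ATAC fragment. The columns are chromosome, start, end, cell barcode, and the number of reads supporting that fragment. The minimal base R sketch below parses three made-up rows (the coordinates and barcodes are invented) and tallies fragments per barcode. The real file is bgzip-compressed and is queried region by region through its .tbi index rather than read whole.

```r
# Invented example rows in the fragment file layout:
# chrom, start, end, cell barcode, read support (tab-separated, no header)
example_lines <- c(
  "chr1\t3000100\t3000300\tAAACGAAAGAGCAAGA\t1",
  "chr1\t3000500\t3000720\tAAACGAAAGAGCAAGA\t2",
  "chr2\t22625000\t22625180\tTTTGTGTTCATGCTAG\t1"
)

# Parse the rows into a data.frame with named, typed columns
fragments_df <- read.table(
  text = example_lines,
  sep = "\t",
  col.names = c("chrom", "start", "end", "barcode", "reads"),
  colClasses = c("character", "integer", "integer", "character", "integer")
)

# Fragment lengths, and the number of fragments observed per cell barcode
fragments_df$length <- fragments_df$end - fragments_df$start
frags_per_cell <- table(fragments_df$barcode)
print(frags_per_cell)
```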
Loading data
We start by building a Seurat object that holds two assays: one for gene expression and one for DNA accessibility.
To load the count data, we use the Read10X() function from Seurat. Before calling it, place the barcodes.tsv.gz, matrix.mtx.gz and features.tsv.gz files for each assay in a separate folder of their own.
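A quick sanity check before calling Read10X() is to confirm that each assay folder holds the three files it expects. The helper below, check_10x_dir(), is hypothetical (it is not part of Seurat) and only tests for the three file names listed above. The demo uses empty placeholder files in a temporary directory.

```r
# Hypothetical helper (not part of Seurat): TRUE if `dir` holds the three
# files that Read10X() reads for one assay, otherwise FALSE plus a message
check_10x_dir <- function(dir) {
  required <- c("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")
  missing <- required[!file.exists(file.path(dir, required))]
  if (length(missing) > 0) {
    message("Missing in ", dir, ": ", paste(missing, collapse = ", "))
  }
  length(missing) == 0
}

# Demo on a fresh temporary folder filled with empty placeholder files
demo_dir <- file.path(tempdir(), "rna_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("barcodes.tsv.gz", "features.tsv.gz"))))
check_10x_dir(demo_dir)  # FALSE, with a message that matrix.mtx.gz is missing
invisible(file.create(file.path(demo_dir, "matrix.mtx.gz")))
check_10x_dir(demo_dir)  # TRUE
```

In this tutorial the check would be run on both the RNA and the ATAC folders passed to Read10X().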
library(Signac)
library(Seurat)
library(ggplot2)
library(EnsDb.Mmusculus.v79)
# load processed data matrices for each assay
rna <- Read10X("../vignette_data/snare-seq/GSE126074_AdBrainCortex_rna/", gene.column = 1)
atac <- Read10X("../vignette_data/snare-seq/GSE126074_AdBrainCortex_atac/", gene.column = 1)
fragments <- "../vignette_data/snare-seq/fragments.sort.bed.gz"
# create a Seurat object and add the assays
snare <- CreateSeuratObject(counts = rna)
snare[['ATAC']] <- CreateChromatinAssay(
counts = atac,
sep = c(":", "-"),
genome = "mm10",
fragments = fragments
)
# extract gene annotations from EnsDb
annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
# change to UCSC style since the data was mapped to mm10
# (note: this turns Ensembl "MT" into "chrMT", whereas UCSC mm10 calls it "chrM")
seqlevels(annotations) <- paste0('chr', seqlevels(annotations))
genome(annotations) <- "mm10"
# add the gene information to the object
Annotation(snare[["ATAC"]]) <- annotations
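Two naming conventions matter in the block above. First, the ATAC features are named like "chr1:3005833-3005982", and sep = c(":", "-") tells CreateChromatinAssay() how to split such a name into chromosome, start and end. Second, EnsDb annotations use Ensembl chromosome names ("1", "X", "MT"), which must be prefixed with "chr" to match the mm10 peak names. The base R sketch below only illustrates both conventions on invented peak names. Signac does the real parsing internally, and parse_peaks() is a hypothetical helper.

```r
# Invented ATAC feature names in the "chrom:start-end" style used above
peak_names <- c("chr1:3005833-3005982", "chr2:22620500-22621200")

# Hypothetical helper: split on sep[1] (":") and then sep[2] ("-"),
# mirroring the meaning of sep = c(":", "-") in CreateChromatinAssay()
parse_peaks <- function(x, sep = c(":", "-")) {
  chrom_rest <- do.call(rbind, strsplit(x, sep[1], fixed = TRUE))
  start_end <- do.call(rbind, strsplit(chrom_rest[, 2], sep[2], fixed = TRUE))
  data.frame(
    chrom = chrom_rest[, 1],
    start = as.integer(start_end[, 1]),
    end = as.integer(start_end[, 2])
  )
}
peaks <- parse_peaks(peak_names)
print(peaks)

# Ensembl-style chromosome names converted to UCSC style
ensembl_levels <- c("1", "2", "X", "Y", "MT")
ucsc_levels <- paste0("chr", ensembl_levels)   # ..., "chrMT"
ucsc_levels[ucsc_levels == "chrMT"] <- "chrM"  # UCSC mm10 mitochondrial name
print(ucsc_levels)
```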
Quality control
DefaultAssay(snare) <- "ATAC"
snare <- TSSEnrichment(snare)
snare <- NucleosomeSignal(snare)
snare$blacklist_fraction <- FractionCountsInRegion(
object = snare,
assay = 'ATAC',
regions = blacklist_mm10
)
Idents(snare) <- "all" # group all cells together, rather than by replicate
VlnPlot(
snare,
features = c("nCount_RNA", "nCount_ATAC", "TSS.enrichment",
"nucleosome_signal", "blacklist_fraction"),
pt.size = 0.1,
ncol = 5
)
snare <- subset(
x = snare,
subset = blacklist_fraction < 0.03 &
TSS.enrichment < 20 &
nCount_RNA > 800 &
nCount_ATAC > 500
)
snare
## An object of class Seurat
## 277704 features across 8055 samples within 2 assays
## Active assay: ATAC (244544 features, 0 variable features)
## 2 layers present: counts, data
## 1 other assay present: RNA
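FractionCountsInRegion() reports, for each cell, the share of its counts that fall in a given set of regions (here the mm10 blacklist). The base R sketch below imitates that idea on a toy peak-by-cell matrix with one invented blacklist interval. It then applies the same four thresholds used in subset() above. Note that TSS.enrichment < 20 is an upper cutoff, so cells with unusually high enrichment are removed too. All numbers are made up; this illustrates the logic and is not Signac's implementation.

```r
# Toy peak x cell count matrix (rows = peaks, columns = cells); values invented
counts <- matrix(
  c(600,   0, 400,
     10, 800,   0,
      0, 100, 900),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("chr1:100-600", "chr1:5000-5500", "chr2:900-1400"),
                  c("cellA", "cellB", "cellC"))
)
peak_chrom <- c("chr1", "chr1", "chr2")
peak_start <- c(100, 5000, 900)
peak_end <- c(600, 5500, 1400)

# One hypothetical blacklist interval; a peak counts if it overlaps it at all
bl_chrom <- "chr1"; bl_start <- 5200; bl_end <- 5300
in_blacklist <- peak_chrom == bl_chrom & peak_start < bl_end & peak_end > bl_start

# Fraction of each cell's counts that land in blacklisted peaks
blacklist_fraction <- colSums(counts[in_blacklist, , drop = FALSE]) / colSums(counts)

# Invented per-cell metadata, then the same filter as in subset() above
meta <- data.frame(
  nCount_ATAC = colSums(counts),
  nCount_RNA = c(1500, 2000, 3000),
  TSS.enrichment = c(6.5, 5.1, 25),
  blacklist_fraction = blacklist_fraction,
  row.names = colnames(counts)
)
keep <- with(meta, blacklist_fraction < 0.03 & TSS.enrichment < 20 &
               nCount_RNA > 800 & nCount_ATAC > 500)
print(rownames(meta)[keep])  # cellB fails the blacklist cutoff, cellC the TSS cutoff
```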
Gene expression data processing
- Process the gene expression data with Seurat
DefaultAssay(snare) <- "RNA"
snare <- NormalizeData(snare)
snare <- FindVariableFeatures(snare, nfeatures = 3000)
snare <- ScaleData(snare)
snare <- RunPCA(snare, npcs = 30)
snare <- RunUMAP(snare, dims = 1:30, reduction.name = "umap.rna")
snare <- FindNeighbors(snare, dims = 1:30)
# algorithm = 3 selects the SLM (smart local moving) algorithm
snare <- FindClusters(snare, resolution = 0.5, algorithm = 3)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8055
## Number of edges: 324240
##
## Running smart local moving algorithm...
## Maximum modularity in 10 random starts: 0.8900
## Number of communities: 14
## Elapsed time: 4 seconds
p1 <- DimPlot(snare, label = TRUE) + NoLegend() + ggtitle("RNA UMAP")
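NormalizeData() uses Seurat's default "LogNormalize" method with a scale factor of 10,000. Each count is divided by the cell's total count, multiplied by 10,000 and log1p-transformed. ScaleData() then centers and scales each gene across cells. The base R sketch below reproduces both ideas on an invented 3 x 3 matrix. The clipping at 10 mirrors ScaleData()'s default scale.max, but the code as a whole is an illustration, not Seurat's implementation.

```r
# Invented gene x cell count matrix
rna_counts <- matrix(
  c(5, 0, 3,
    1, 2, 0,
    4, 8, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("c1", "c2", "c3"))
)

# Log-normalization: per-cell totals -> relative counts -> x 1e4 -> log1p
log_normalize <- function(counts, scale.factor = 1e4) {
  lib_size <- colSums(counts)
  log1p(sweep(counts, 2, lib_size, "/") * scale.factor)
}
norm <- log_normalize(rna_counts)

# Scaling: z-score each gene (row) across cells, then clip large values at 10
scaled <- t(scale(t(norm)))
scaled[scaled > 10] <- 10
print(round(norm, 3))
```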
DNA accessibility data processing
- Process the DNA accessibility data with Signac
DefaultAssay(snare) <- 'ATAC'
snare <- FindTopFeatures(snare, min.cutoff = 10)
snare <- RunTFIDF(snare)
snare <- RunSVD(snare)
# skip LSI component 1, which often reflects sequencing depth rather than biology
snare <- RunUMAP(snare, reduction = 'lsi', dims = 2:30, reduction.name = 'umap.atac')
p2 <- DimPlot(snare, reduction = 'umap.atac', label = TRUE) + NoLegend() + ggtitle("ATAC UMAP")
p1 + p2
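RunTFIDF() followed by RunSVD() is latent semantic indexing (LSI). The sketch below is modeled on what Signac documents as its default TF-IDF (method = 1, log(TF x IDF) with a scale factor of 10,000). Term frequency is each peak's count divided by the cell's total, and inverse document frequency is the number of cells divided by the peak's total count. The toy matrix is invented, and base R's full svd() stands in for the truncated SVD used on real data.

```r
# Invented peak x cell count matrix
counts <- matrix(
  c(1, 0, 2, 0,
    1, 1, 1, 1,
    0, 3, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("peak", 1:3), paste0("cell", 1:4))
)

tfidf <- function(counts, scale.factor = 1e4) {
  tf <- sweep(counts, 2, colSums(counts), "/")  # term frequency within each cell
  idf <- ncol(counts) / rowSums(counts)          # inverse document frequency per peak
  log1p(tf * idf * scale.factor)                 # idf (length nrow) scales each row
}
norm <- tfidf(counts)

# LSI: SVD of the TF-IDF matrix; rows of lsi$v are per-cell component scores.
# On real data the first component often tracks per-cell depth, which is why
# the UMAP above starts at dimension 2.
lsi <- svd(norm)
print(round(lsi$v, 3))
```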
Integration with scRNA-seq
Next, we annotate the cell types in this dataset by transferring labels from a single-cell RNA-seq (scRNA-seq) dataset of the adult mouse brain.
# label transfer from Allen brain
allen <- readRDS("../vignette_data/allen_brain.rds")
allen <- UpdateSeuratObject(allen)
# use the RNA assay in the SNARE-seq data for integration with scRNA-seq
DefaultAssay(snare) <- 'RNA'
transfer.anchors <- FindTransferAnchors(
reference = allen,
query = snare,
dims = 1:30,
reduction = 'cca'
)
predicted.labels <- TransferData(
anchorset = transfer.anchors,
refdata = allen$subclass,
weight.reduction = snare[['pca']],
dims = 1:30
)
snare <- AddMetaData(object = snare, metadata = predicted.labels)
# label clusters based on predicted ID
new.cluster.ids <- c(
"L2/3 IT",
"L4",
"L6 IT",
"L5 CT",
"L4",
"L5 PT",
"Pvalb",
"Sst",
"Astro",
"Oligo",
"Vip/Lamp5",
"L6 IT.2",
"L6b",
"NP"
)
names(x = new.cluster.ids) <- levels(x = snare)
snare <- RenameIdents(object = snare, new.cluster.ids)
snare$celltype <- Idents(snare)
DimPlot(snare, group.by = 'celltype', label = TRUE, reduction = 'umap.rna')
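The tutorial does not say exactly how the 14 names in new.cluster.ids were chosen. One common approach is to take the most frequent predicted.id within each cluster. The base R sketch below shows that majority vote on invented cluster assignments, then applies the result as a named lookup vector, the same pattern as names(new.cluster.ids) <- levels(snare). Because two clusters map to the same label (as "L4" does twice in new.cluster.ids), they collapse into a single identity.

```r
# Invented cluster assignments and predicted labels for eight cells
cluster <- factor(c("0", "0", "1", "1", "1", "4", "4", "2"))
predicted_id <- c("L2/3 IT", "L2/3 IT", "L4", "L4", "L5 IT", "L4", "L4", "L6 IT")

# Majority predicted label per cluster (one possible way to choose names)
votes <- table(cluster, predicted_id)
majority <- colnames(votes)[apply(votes, 1, which.max)]
names(majority) <- rownames(votes)  # cluster level -> label lookup
print(majority)

# Rename each cell via the lookup; clusters "1" and "4" both become "L4"
celltype <- factor(unname(majority[as.character(cluster)]))
print(table(celltype))
```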
Jointly visualizing gene expression and DNA accessibility
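With the CoveragePlot() function we can view gene expression and DNA accessibility together. This makes it easy to compare DNA accessibility in a given region across cell types, and to display the expression of one or more genes alongside the accessibility tracks.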
DefaultAssay(snare) <- "ATAC"
CoveragePlot(snare, region = "chr2-22620000-22660000", features = "Gad2")
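The region argument above is a "chrom-start-end" string. The base R sketch below uses two hypothetical helpers, parse_region() and make_region(), to read that string and build one with flanking sequence around a locus. The 22630000-22650000 core interval is made up. Within Signac itself, CoveragePlot() also offers extend.upstream and extend.downstream arguments for widening a region.

```r
# Hypothetical helper: split a "chrom-start-end" region string
parse_region <- function(region) {
  parts <- strsplit(region, "-", fixed = TRUE)[[1]]
  list(chrom = parts[1], start = as.integer(parts[2]), end = as.integer(parts[3]))
}

# Hypothetical helper: build a region string, padding both sides by `flank` bp
make_region <- function(chrom, start, end, flank = 0L) {
  sprintf("%s-%d-%d", chrom,
          as.integer(max(0L, start - flank)), as.integer(end + flank))
}

r <- parse_region("chr2-22620000-22660000")
r$end - r$start                                            # 40000, a 40 kb window
make_region("chr2", 22630000L, 22650000L, flank = 10000L)  # "chr2-22620000-22660000"
```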