Seurat - Guided Clustering Tutorial

Compiled: February 08, 2021

Source: vignettes/pbmc3k_tutorial.Rmd

Set up the Seurat Object

For this tutorial, we will be analyzing a dataset of Peripheral Blood Mononuclear Cells (PBMC) freely available from 10X Genomics. There are 2,700 single cells that were sequenced on the Illumina NextSeq 500. The raw data can be found here.

We start by reading in the data. The [Read10X()](https://satijalab.org/seurat/reference/Read10X.html) function reads in the output of the cellranger pipeline from 10X, returning a unique molecular identifier (UMI) count matrix. The values in this matrix represent the number of molecules for each feature (i.e. gene; row) that are detected in each cell (column).

We next use the count matrix to create a Seurat object. The object serves as a container for both data (like the count matrix) and analysis results (like PCA or clustering results) for a single-cell dataset. For example, the count matrix is stored in pbmc[["RNA"]]@counts. For a technical discussion of the Seurat object structure, check out our GitHub Wiki.

library(dplyr)
library(Seurat)
library(patchwork)

# Load the PBMC dataset
pbmc.data <- Read10X(data.dir = "../data/pbmc3k/filtered_gene_bc_matrices/hg19/")
# Initialize the Seurat object with the raw (non-normalized data).
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc

## An object of class Seurat 
## 13714 features across 2700 samples within 1 assay 
## Active assay: RNA (13714 features, 0 variable features)
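
The min.cells and min.features arguments passed to CreateSeuratObject() above filter the count matrix before the object is built. The following is a minimal base-R sketch of that kind of filtering on a toy matrix. The matrix, the thresholds, and the order of the two filters (cells first, then features) are assumptions made for illustration, not a copy of Seurat's implementation.

```r
# Toy count matrix: rows are features (genes), columns are cells (hypothetical data).
counts <- matrix(
  c(1, 0, 2, 0,
    0, 0, 3, 0,
    4, 1, 5, 0,
    0, 2, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4))
)

min.features <- 2  # keep cells with at least this many detected features
min.cells <- 2     # keep features detected in at least this many cells

# Drop cells with too few detected features (assumed to happen first).
features.per.cell <- colSums(counts > 0)
filtered <- counts[, features.per.cell >= min.features, drop = FALSE]

# Drop features detected in too few of the remaining cells.
cells.per.feature <- rowSums(filtered > 0)
filtered <- filtered[cells.per.feature >= min.cells, , drop = FALSE]

dim(filtered)
```

In this toy case the empty cell4 and the rarely detected gene2 are dropped, leaving a 3 x 3 matrix.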

Standard pre-processing workflow

The steps below encompass the standard pre-processing workflow for scRNA-seq data in Seurat. These represent the selection and filtration of cells based on QC metrics, data normalization and scaling, and the detection of highly variable features.

QC and selecting cells for further analysis

Seurat allows you to easily explore QC metrics and filter cells based on any user-defined criteria. A few QC metrics commonly used by the community include:

- The number of unique genes detected in each cell.
  - Low-quality cells or empty droplets will often have very few genes
  - Cell doublets or multiplets may exhibit an aberrantly high gene count
- Similarly, the total number of molecules detected within a cell (correlates strongly with unique genes)
- The percentage of reads that map to the mitochondrial genome
  - Low-quality / dying cells often exhibit extensive mitochondrial contamination
  - We calculate mitochondrial QC metrics with the [PercentageFeatureSet()](https://satijalab.org/seurat/reference/PercentageFeatureSet.html) function, which calculates the percentage of counts originating from a set of features
  - We use the set of all genes starting with MT- as a set of mitochondrial genes

# The [[ operator can add columns to object metadata. This is a great place to stash QC stats
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
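
To make the percentage calculation concrete, here is a minimal base-R sketch of the same idea on a toy matrix: sum the counts of features whose names match ^MT- and divide by the total counts per cell. The gene names and values are hypothetical, and this illustrates the arithmetic rather than reproducing PercentageFeatureSet() itself.

```r
# Toy count matrix with two mitochondrial genes and one nuclear gene (hypothetical data).
counts <- matrix(
  c(10, 0,  5,
     2, 1,  0,
     8, 9, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "ACTB"), c("cell1", "cell2", "cell3"))
)

# Rows whose names start with "MT-".
mt.genes <- grep(pattern = "^MT-", x = rownames(counts))

# Percentage of each cell's counts that come from those rows.
percent.mt <- colSums(counts[mt.genes, , drop = FALSE]) / colSums(counts) * 100
percent.mt
```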

In the example below, we visualize QC metrics, and use these to filter cells.

- We filter out cells that have unique feature counts over 2,500 or less than 200
- We filter out cells that have >5% mitochondrial counts

# Visualize QC metrics as a violin plot
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)

# FeatureScatter is typically used to visualize feature-feature relationships, but can be used
# for anything calculated by the object, i.e. columns in object metadata, PC scores etc.

plot1 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot1 + plot2

pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)

Normalizing the data

After removing unwanted cells from the dataset, the next step is to normalize the data. By default, we employ a global-scaling normalization method “LogNormalize” that normalizes the feature expression measurements for each cell by the total expression, multiplies this by a scale factor (10,000 by default), and log-transforms the result. Normalized values are stored in pbmc[["RNA"]]@data.

pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)

For clarity, in this previous line of code (and in future commands), we provide the default values for certain parameters in the function call. However, this isn’t required and the same behavior can be achieved with:

pbmc <- NormalizeData(pbmc)
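
The arithmetic behind “LogNormalize” can be sketched in base R on a toy matrix: divide each cell's counts by that cell's total, multiply by the scale factor, and apply a natural-log transform with log1p(). The matrix below is hypothetical, and the snippet is an illustration of the described steps rather than Seurat's own code.

```r
# Toy count matrix: 3 genes x 2 cells (hypothetical data).
counts <- matrix(
  c(1, 3, 0,    # cell1, total = 4
    6, 4, 10),  # cell2, total = 20
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("cell1", "cell2"))
)

scale.factor <- 10000

# Divide every column (cell) by its total, rescale, then log-transform.
relative <- sweep(counts, MARGIN = 2, STATS = colSums(counts), FUN = "/")
normalized <- log1p(relative * scale.factor)
normalized
```

After this transformation, cells with very different sequencing depths (4 versus 20 total counts here) are on a comparable scale, and zero counts stay at zero.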

Identification of highly variable features (feature selection)

We next calculate a subset of features that exhibit high cell-to-cell variation in the dataset (i.e., they are highly expressed in some cells, and lowly expressed in others). We and others have found that focusing on these genes in downstream analysis helps to highlight biological signal in single-cell datasets.

Our procedure in Seurat is described in detail here. It improves on previous versions by directly modeling the mean-variance relationship inherent in single-cell data, and is implemented in the [FindVariableFeatures()](https://satijalab.org/seurat/reference/FindVariableFeatures.html) function. By default, we return 2,000 features per dataset. These will be used in downstream analysis, like PCA.

pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)

# Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(pbmc), 10)

# plot variable features with and without labels
plot1 <- VariableFeaturePlot(pbmc)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
plot1 + plot2
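
As a rough intuition for what feature selection looks for, the sketch below ranks genes in a toy matrix by their variance-to-mean ratio. This is a deliberately simpler stand-in and is not the "vst" method used above, which models the mean-variance relationship; the data and gene names are hypothetical.

```r
# Toy expression matrix: 4 genes x 6 cells (hypothetical data).
counts <- rbind(
  gene1 = c(5, 5, 5, 5, 5, 5),     # constant across cells
  gene2 = c(0, 10, 0, 10, 0, 10),  # on in some cells, off in others
  gene3 = c(4, 6, 4, 6, 4, 6),     # mild fluctuation
  gene4 = c(1, 1, 2, 1, 1, 0)      # low expression, mild fluctuation
)

gene.mean <- rowMeans(counts)
gene.var <- apply(counts, 1, var)

# Variance-to-mean ratio: a crude way to compare variability across expression levels.
dispersion <- gene.var / gene.mean

# Keep the two most variable genes by this measure.
top2 <- names(sort(dispersion, decreasing = TRUE))[1:2]
top2
```

gene2, which is highly expressed in some cells and absent in others, ranks first, matching the informal definition of a highly variable feature given above.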

Scaling the data

Next, we apply a linear transformation (‘scaling’) that is a standard pre-processing step prior to dimensional reduction techniques like PCA. The [ScaleData()](https://satijalab.org/seurat/reference/ScaleData.html) function:

- Shifts the expression of each gene, so that the mean expression across cells is 0
- Scales the expression of each gene, so that the variance across cells is 1
  - This step gives equal weight in downstream analyses, so that highly-expressed genes do not dominate
- The results of this are stored in pbmc[["RNA"]]@scale.data

all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
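
The centering and scaling described above can be illustrated in base R with scale(), applied per gene (row). The toy matrix is hypothetical, and the sketch covers only the two steps listed; it does not reproduce any additional options that ScaleData() provides.

```r
# Toy normalized expression: 2 genes x 4 cells (hypothetical data).
expr <- rbind(
  geneA = c(1, 2, 3, 4),
  geneB = c(10, 10, 50, 90)
)

# scale() works on columns, so transpose to scale each gene, then transpose back.
scaled <- t(scale(t(expr), center = TRUE, scale = TRUE))

rowMeans(scaled)       # approximately 0 for every gene
apply(scaled, 1, var)  # 1 for every gene
```

Although geneB has much larger raw values than geneA, both contribute with the same variance after scaling.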

Perform linear dimensional reduction

Next we perform PCA on the scaled data. By default, only the previously determined variable features are used as input, but you can choose a different subset with the features argument.

pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))

Seurat provides several useful ways of visualizing both cells and features that define the PCA, including VizDimLoadings(), [DimPlot()](https://satijalab.org/seurat/reference/DimPlot.html), and [DimHeatmap()](https://satijalab.org/seurat/reference/DimHeatmap.html).

# Examine and visualize PCA results a few different ways
print(pbmc[["pca"]], dims = 1:5, nfeatures = 5)

## PC_ 1 
## Positive:  CST3, TYROBP, LST1, AIF1, FTL 
## Negative:  MALAT1, LTB, IL32, IL7R, CD2 
## PC_ 2 
## Positive:  CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1 
## Negative:  NKG7, PRF1, CST7, GZMB, GZMA 
## PC_ 3 
## Positive:  HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1 
## Negative:  PPBP, PF4, SDPR, SPARC, GNG11 
## PC_ 4 
## Positive:  HLA-DQA1, CD79B, CD79A, MS4A1, HLA-DQB1 
## Negative:  VIM, IL7R, S100A6, IL32, S100A8 
## PC_ 5 
## Positive:  GZMB, NKG7, S100A8, FGFBP2, GNLY 
## Negative:  LTB, IL7R, CKB, VIM, MS4A7

VizDimLoadings(pbmc, dims = 1:2, reduction = "pca")

DimPlot(pbmc, reduction = "pca")

In particular [DimHeatmap()](https://satijalab.org/seurat/reference/DimHeatmap.html) allows for easy exploration of the primary sources of heterogeneity in a dataset, and can be useful when trying to decide which PCs to include for further downstream analyses. Both cells and features are ordered according to their PCA scores. Setting cells to a number plots the ‘extreme’ cells on both ends of the spectrum, which dramatically speeds plotting for large datasets. Though clearly a supervised analysis, we find this to be a valuable tool for exploring correlated feature sets.

DimHeatmap(pbmc, dims = 1, cells = 500, balanced = TRUE)

DimHeatmap(pbmc, dims = 1:15, cells = 500, balanced = TRUE)

Determine the ‘dimensionality’ of the dataset

To overcome the extensive technical noise in any single feature for scRNA-seq data, Seurat clusters cells based on their PCA scores, with each PC essentially representing a ‘metafeature’ that combines information across a correlated feature set. The top principal components therefore represent a robust compression of the dataset. However, how many components should we choose to include? 10? 20? 100?

In Macosko et al., we implemented a resampling test inspired by the JackStraw procedure. We randomly permute a subset of the data (1% by default) and rerun PCA, constructing a ‘null distribution’ of feature scores, and repeat this procedure. We identify ‘significant’ PCs as those that have a strong enrichment of low p-value features.

# NOTE: This process can take a long time for big datasets, comment out for expediency. More
# approximate techniques such as those implemented in ElbowPlot() can be used to reduce
# computation time
pbmc <- JackStraw(pbmc, num.replicate = 100)
pbmc <- ScoreJackStraw(pbmc, dims = 1:20)

The [JackStrawPlot()](https://satijalab.org/seurat/reference/JackStrawPlot.html) function provides a visualization tool for comparing the distribution of p-values for each PC with a uniform distribution (dashed line). ‘Significant’ PCs will show a strong enrichment of features with low p-values (solid curve above the dashed line). In this case it appears that there is a sharp drop-off in significance after the first 10-12 PCs.

JackStrawPlot(pbmc, dims = 1:15)

An alternative heuristic method generates an ‘Elbow plot’: a ranking of principal components based on the percentage of variance explained by each one ([ElbowPlot()](https://satijalab.org/seurat/reference/ElbowPlot.html) function). In this example, we can observe an ‘elbow’ around PC9-10, suggesting that the majority of true signal is captured in the first 10 PCs.

ElbowPlot(pbmc)
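
The quantity behind an elbow plot can be sketched with base R's prcomp() on simulated data. Everything below (the simulated matrix, its two planted signal dimensions, and the variable names) is an assumption for illustration; it does not use the pbmc object or Seurat's internals.

```r
set.seed(42)

# Simulate 100 "cells" x 20 "features": two planted signal dimensions plus noise.
n.cells <- 100
signal <- cbind(rep(c(-3, 3), each = 50), rep(c(-2, 2), times = 50))
loadings <- matrix(rnorm(2 * 20), nrow = 2)
data <- signal %*% loadings + matrix(rnorm(n.cells * 20), nrow = n.cells)

# PCA, then the percentage of total variance explained by each component.
pca <- prcomp(data, center = TRUE, scale. = FALSE)
pct.var <- pca$sdev^2 / sum(pca$sdev^2) * 100

# Components are already ranked; an "elbow" is where these values flatten out.
round(pct.var[1:5], 1)
```

With two planted signals, the first two components carry most of the variance and the remaining ones form a flat tail, which is the pattern one looks for when choosing a cutoff.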

Identifying the true dimensionality of a dataset can be challenging and uncertain for the user. We therefore suggest considering these three approaches. The first is more supervised, exploring PCs to determine relevant sources of heterogeneity, and could be used in conjunction with GSEA for example. The second implements a statistical test based on a random null model, but is time-consuming for large datasets, and may not return a clear PC cutoff. The third is a heuristic that is commonly used, and can be calculated instantly. In this example, all three approaches yielded similar results, but we might have been justified in choosing anything between PC 7-12 as a cutoff.

We chose 10 here, but encourage users to consider the following:

- Dendritic cell and NK aficionados may recognize that genes strongly associated with PCs 12 and 13 define rare immune subsets (e.g. MZB1 is a marker for plasmacytoid DCs). However, these groups are so rare that they are difficult to distinguish from background noise for a dataset of this size without prior knowledge.
- We encourage users to repeat downstream analyses with a different number of PCs (10, 15, or even 50!). As you will observe, the results often do not differ dramatically.
- We advise users to err on the higher side when choosing this parameter. For example, performing downstream analyses with only 5 PCs significantly and adversely affects results.

Cluster the cells

Seurat v3 applies a graph-based clustering approach, building upon initial strategies in Macosko et al. Importantly, the distance metric that drives the clustering analysis (based on previously identified PCs) remains the same. However, our approach to partitioning the cellular distance matrix into clusters has dramatically improved. Our approach was heavily inspired by recent manuscripts which applied graph-based clustering approaches to scRNA-seq data [SNN-Cliq, Xu and Su, Bioinformatics, 2015] and CyTOF data [PhenoGraph, Levine et al., Cell, 2015]. Briefly, these methods embed cells in a graph structure - for example a K-nearest neighbor (KNN) graph, with edges drawn between cells with similar feature expression patterns, and then attempt to partition this graph into highly interconnected ‘quasi-cliques’ or ‘communities’.

As in PhenoGraph, we first construct a KNN graph based on the Euclidean distance in PCA space, and refine the edge weights between any two cells based on the shared overlap in their local neighborhoods (Jaccard similarity). This step is performed using the [FindNeighbors()](https://satijalab.org/seurat/reference/FindNeighbors.html) function, and takes as input the previously defined dimensionality of the dataset (first 10 PCs).
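
The KNN-plus-Jaccard idea in the paragraph above can be sketched in base R on a handful of toy cells. The coordinates, the neighborhood size k, and the choice to count each cell as its own neighbor are assumptions for illustration; this is not the FindNeighbors() implementation.

```r
# Six toy cells placed along one "PC" axis (hypothetical coordinates).
coords <- matrix(c(0, 1, 2.1, 3.3, 10, 11), ncol = 1)

# Pairwise Euclidean distances.
d <- as.matrix(dist(coords))

# Each cell's k nearest neighbors; the cell itself is included (distance 0).
k <- 3
knn <- t(apply(d, 1, function(row) order(row)[1:k]))

# Jaccard similarity between two neighbor sets.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Edge weight between every pair of cells = overlap of their neighborhoods.
n <- nrow(coords)
snn <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    snn[i, j] <- jaccard(knn[i, ], knn[j, ])
  }
}
round(snn, 2)
```

Cells 1 and 3 share two of four distinct neighbors (weight 0.5), while cells 1 and 5 share none (weight 0), so the weights reflect neighborhood overlap rather than raw distance alone.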

To cluster the cells, we next apply modularity optimization techniques such as the Louvain algorithm (default) or SLM [SLM, Blondel et al., Journal of Statistical Mechanics], to iteratively group cells together, with the goal of optimizing the standard modularity function. The [FindClusters()](https://satijalab.org/seurat/reference/FindClusters.html) function implements this procedure, and contains a resolution parameter that sets the ‘granularity’ of the downstream clustering, with increased values leading to a greater number of clusters. We find that setting this parameter between 0.4-1.2 typically returns good results for single-cell datasets of around 3K cells. Optimal resolution often increases for larger datasets. The clusters can be found using the [Idents()](https://satijalab.org/seurat/reference/Idents.html) function.

pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5)

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 2638
## Number of edges: 95965
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8723
## Number of communities: 9
## Elapsed time: 0 seconds

# Look at cluster IDs of the first 5 cells
head(Idents(pbmc), 5)

## AAACATACAACCAC-1 AAACATTGAGCTAC-1 AAACATTGATCAGC-1 AAACCGTGCTTCCG-1 
##                2                3                2                1 
## AAACCGTGTATGCG-1 
##                6 
## Levels: 0 1 2 3 4 5 6 7 8
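
The "standard modularity function" that the clustering step optimizes can be computed by hand for a small graph. The sketch below is a base-R illustration on a hypothetical six-node graph (two triangles joined by one edge); it evaluates modularity for a given partition and does not implement the Louvain or SLM optimization itself.

```r
# Undirected toy graph: two triangles (1-2-3 and 4-5-6) joined by edge 3-4.
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
n <- 6
A <- matrix(0, n, n)
A[edges] <- 1            # fill one triangle of the adjacency matrix
A[edges[, 2:1]] <- 1     # mirror it so the matrix is symmetric

# Q = (1 / 2m) * sum_ij (A_ij - k_i * k_j / 2m) * [c_i == c_j]
modularity <- function(A, membership) {
  m2 <- sum(A)                                   # 2m: twice the number of edges
  k <- rowSums(A)                                # node degrees
  same <- outer(membership, membership, "==")    # TRUE where nodes share a community
  sum((A - outer(k, k) / m2) * same) / m2
}

modularity(A, c(1, 1, 1, 2, 2, 2))  # one community per triangle
modularity(A, rep(1, n))            # everything in a single community
```

Splitting the graph along the bridging edge gives a modularity of 5/14 (about 0.357), whereas lumping all nodes together gives 0, which is why an optimizer would prefer the two-community partition.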

Run non-linear dimensional reduction (UMAP/tSNE)

Seurat offers several non-linear dimensional reduction techniques, such as tSNE and UMAP, to visualize and explore these datasets. The goal of these algorithms is to learn the underlying manifold of the data in order to place similar cells together in low-dimensional space. Cells within the graph-based clusters determined above should co-localize on these dimension reduction plots. As input to the UMAP and tSNE, we suggest using the same PCs as input to the clustering analysis.

# If you haven't installed UMAP, you can do so via reticulate::py_install(packages =
# 'umap-learn')
pbmc <- RunUMAP(pbmc, dims = 1:10)

# note that you can set `label = TRUE` or use the LabelClusters function to help label
# individual clusters
DimPlot(pbmc, reduction = "umap")

You can save the object at this point so that it can easily be loaded back in without having to rerun the computationally intensive steps performed above, or easily shared with collaborators.

saveRDS(pbmc, file = "../output/pbmc_tutorial.rds")

Finding differentially expressed features (cluster biomarkers)

Seurat can help you find markers that define clusters via differential expression. By default, it identifies positive and negative markers of a single cluster (specified in ident.1), compared to all other cells. [FindAllMarkers()](https://satijalab.org/seurat/reference/FindAllMarkers.html) automates this process for all clusters, but you can also test groups of clusters vs. each other, or against all cells.

The min.pct argument requires a feature to be detected at a minimum percentage in either of the two groups of cells, and the logfc.threshold argument requires a feature to be differentially expressed (on average) by some amount between the two groups. You can set both of these to 0, but with a dramatic increase in time, since this will test a large number of features that are unlikely to be highly discriminatory. As another option to speed up these computations, max.cells.per.ident can be set. This will downsample each identity class to have no more cells than whatever this is set to. While there is generally going to be a loss in power, the speed increases can be significant and the most highly differentially expressed features will likely still rise to the top.
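
The detection-rate pre-filter described above can be sketched in base R: compute the fraction of cells in each group that express a feature, and keep the feature if either fraction reaches the threshold. The toy matrix, group labels, and threshold below are hypothetical, and the snippet only illustrates the min.pct idea, not the full FindMarkers() procedure.

```r
# Toy expression matrix: 3 genes x 8 cells, first 4 cells in group g1 (hypothetical data).
expr <- rbind(
  geneA = c(2, 3, 0, 1, 0, 0, 0, 0),
  geneB = c(0, 0, 0, 1, 0, 0, 0, 0),
  geneC = c(0, 0, 0, 0, 1, 2, 0, 3)
)
group <- rep(c("g1", "g2"), each = 4)

# Fraction of cells in each group with non-zero expression.
pct.1 <- rowMeans(expr[, group == "g1", drop = FALSE] > 0)
pct.2 <- rowMeans(expr[, group == "g2", drop = FALSE] > 0)

# Keep features detected in at least min.pct of cells in either group.
min.pct <- 0.5
keep <- pmax(pct.1, pct.2) >= min.pct
rownames(expr)[keep]
```

geneB is detected in only one of eight cells and is skipped before any statistical test is run, which is where the time savings come from.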

# find all markers of cluster 2
cluster2.markers <- FindMarkers(pbmc, ident.1 = 2, min.pct = 0.25)
head(cluster2.markers, n = 5)

##             p_val avg_log2FC pct.1 pct.2    p_val_adj
## IL32 2.593535e-91  1.2154360 0.949 0.466 3.556774e-87
## LTB  7.994465e-87  1.2828597 0.981 0.644 1.096361e-82
## CD3D 3.922451e-70  0.9359210 0.922 0.433 5.379250e-66
## IL7R 1.130870e-66  1.1776027 0.748 0.327 1.550876e-62
## LDHB 4.082189e-65  0.8837324 0.953 0.614 5.598314e-61

# find all markers distinguishing cluster 5 from clusters 0 and 3
cluster5.markers <- FindMarkers(pbmc, ident.1 = 5, ident.2 = c(0, 3), min.pct = 0.25)
head(cluster5.markers, n = 5)

##                       p_val avg_log2FC pct.1 pct.2     p_val_adj
## FCGR3A        2.150929e-209   4.267579 0.975 0.039 2.949784e-205
## IFITM3        6.103366e-199   3.877105 0.975 0.048 8.370156e-195
## CFD           8.891428e-198   3.411039 0.938 0.037 1.219370e-193
## CD68          2.374425e-194   3.014535 0.926 0.035 3.256286e-190
## RP11-290F20.3 9.308287e-191   2.722684 0.840 0.016 1.276538e-186

# find markers for every cluster compared to all remaining cells, report only the positive ones
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
pbmc.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_log2FC)

## # A tibble: 18 x 7
## # Groups:   cluster [9]
##        p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene    
##        <dbl>      <dbl> <dbl> <dbl>     <dbl> <fct>   <chr>   
##  1 1.74e-109       1.07 0.897 0.593 2.39e-105 0       LDHB    
##  2 1.17e- 83       1.33 0.435 0.108 1.60e- 79 0       CCR7    
##  3 0.              5.57 0.996 0.215 0.        1       S100A9  
##  4 0.              5.48 0.975 0.121 0.        1       S100A8  
##  5 7.99e- 87       1.28 0.981 0.644 1.10e- 82 2       LTB     
##  6 2.61e- 59       1.24 0.424 0.111 3.58e- 55 2       AQP3    
##  7 0.              4.31 0.936 0.041 0.        3       CD79A   
##  8 9.48e-271       3.59 0.622 0.022 1.30e-266 3       TCL1A   
##  9 1.17e-178       2.97 0.957 0.241 1.60e-174 4       CCL5    
## 10 4.93e-169       3.01 0.595 0.056 6.76e-165 4       GZMK    
## 11 3.51e-184       3.31 0.975 0.134 4.82e-180 5       FCGR3A  
## 12 2.03e-125       3.09 1     0.315 2.78e-121 5       LST1    
## 13 1.05e-265       4.89 0.986 0.071 1.44e-261 6       GZMB    
## 14 6.82e-175       4.92 0.958 0.135 9.36e-171 6       GNLY    
## 15 1.48e-220       3.87 0.812 0.011 2.03e-216 7       FCER1A  
## 16 1.67e- 21       2.87 1     0.513 2.28e- 17 7       HLA-DPB1
## 17 7.73e-200       7.24 1     0.01  1.06e-195 8       PF4     
## 18 3.68e-110       8.58 1     0.024 5.05e-106 8       PPBP

Seurat has several tests for differential expression which can be set with the test.use parameter (see our DE vignette for details). For example, the ROC test returns the ‘classification power’ for any individual marker (ranging from 0 - random, to 1 - perfect).

cluster0.markers <- FindMarkers(pbmc, ident.1 = 0, logfc.threshold = 0.25, test.use = "roc", only.pos = TRUE)
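
To show what a ‘classification power’ between 0 and 1 can mean, the sketch below computes the area under the ROC curve (AUC) for one hypothetical marker using ranks, then rescales it so that 0.5 (random) maps to 0 and 1 (perfect) maps to 1. The expression values are made up, and the rescaling formula is an assumption chosen to match the range described above rather than a statement about Seurat's code.

```r
# Expression of one marker in the cluster of interest and in all other cells (hypothetical).
in.cluster <- c(3, 4, 5)
other <- c(1, 2, 3.5)

# Rank-based AUC: probability that a random in-cluster cell outranks a random other cell.
r <- rank(c(in.cluster, other))
n1 <- length(in.cluster)
n2 <- length(other)
auc <- (sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2) / (n1 * n2)

# Rescale so that AUC = 0.5 gives 0 and AUC = 1 (or 0) gives 1 (assumed form).
power <- abs(auc - 0.5) * 2

c(auc = auc, power = power)
```

Here 8 of the 9 possible in-cluster/other pairs are ordered correctly, giving an AUC of 8/9 and a rescaled power of 7/9.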

We include several tools for visualizing marker expression. [VlnPlot()](https://satijalab.org/seurat/reference/VlnPlot.html) (shows expression probability distributions across clusters) and [FeaturePlot()](https://satijalab.org/seurat/reference/FeaturePlot.html) (visualizes feature expression on a tSNE or PCA plot) are our most commonly used visualizations. We also suggest exploring [RidgePlot()](https://satijalab.org/seurat/reference/RidgePlot.html), [CellScatter()](https://satijalab.org/seurat/reference/CellScatter.html), and [DotPlot()](https://satijalab.org/seurat/reference/DotPlot.html) as additional methods to view your dataset.

VlnPlot(pbmc, features = c("MS4A1", "CD79A"))

# you can plot raw counts as well
VlnPlot(pbmc, features = c("NKG7", "PF4"), slot = "counts", log = TRUE)

FeaturePlot(pbmc, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP", 
    "CD8A"))

[DoHeatmap()](https://satijalab.org/seurat/reference/DoHeatmap.html) generates an expression heatmap for given cells and features. In this case, we are plotting the top 10 markers (or all markers if fewer than 10) for each cluster.

top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_log2FC)
DoHeatmap(pbmc, features = top10$gene) + NoLegend()

Assigning cell type identity to clusters

Fortunately in the case of this dataset, we can use canonical markers to easily match the unbiased clustering to known cell types:

Cluster ID	Markers	Cell Type
0	IL7R, CCR7	Naive CD4+ T
1	CD14, LYZ	CD14+ Mono
2	IL7R, S100A4	Memory CD4+ T
3	MS4A1	B
4	CD8A	CD8+ T
5	FCGR3A, MS4A7	FCGR3A+ Mono
6	GNLY, NKG7	NK
7	FCER1A, CST3	DC
8	PPBP	Platelet

new.cluster.ids <- c("Naive CD4 T", "CD14+ Mono", "Memory CD4 T", "B", "CD8 T", "FCGR3A+ Mono", 
    "NK", "DC", "Platelet")
names(new.cluster.ids) <- levels(pbmc)
pbmc <- RenameIdents(pbmc, new.cluster.ids)
DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()

saveRDS(pbmc, file = "../output/pbmc3k_final.rds")

<details open="" style="box-sizing: border-box; display: block;"><summary style="box-sizing: border-box; display: list-item;">Session Info</summary>

sessionInfo()

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.3.3      patchwork_1.1.1    SeuratObject_4.0.0 Seurat_4.0.0      
## [5] dplyr_1.0.4       
## 
## loaded via a namespace (and not attached):
##   [1] Rtsne_0.15           colorspace_2.0-0     deldir_0.2-9        
##   [4] ellipsis_0.3.1       ggridges_0.5.3       rprojroot_2.0.2     
##   [7] fs_1.5.0             spatstat.data_1.7-0  farver_2.0.3        
##  [10] leiden_0.3.7         listenv_0.8.0        ggrepel_0.9.1       
##  [13] fansi_0.4.2          RSpectra_0.16-0      codetools_0.2-16    
##  [16] splines_4.0.3        cachem_1.0.3         knitr_1.31          
##  [19] polyclip_1.10-0      jsonlite_1.7.2       ica_1.0-2           
##  [22] cluster_2.1.0        png_0.1-7            uwot_0.1.10         
##  [25] shiny_1.6.0          sctransform_0.3.2    compiler_4.0.3      
##  [28] httr_1.4.2           assertthat_0.2.1     Matrix_1.2-18       
##  [31] fastmap_1.1.0        lazyeval_0.2.2       cli_2.3.0           
##  [34] limma_3.46.0         later_1.1.0.1        formatR_1.7         
##  [37] htmltools_0.5.1.1    tools_4.0.3          igraph_1.2.6        
##  [40] gtable_0.3.0         glue_1.4.2           RANN_2.6.1          
##  [43] reshape2_1.4.4       Rcpp_1.0.6           spatstat_1.64-1     
##  [46] scattermore_0.7      pkgdown_1.6.1        vctrs_0.3.6         
##  [49] nlme_3.1-149         lmtest_0.9-38        xfun_0.20           
##  [52] stringr_1.4.0        globals_0.14.0       mime_0.9            
##  [55] miniUI_0.1.1.1       lifecycle_0.2.0      irlba_2.3.3         
##  [58] goftest_1.2-2        future_1.21.0        MASS_7.3-53         
##  [61] zoo_1.8-8            scales_1.1.1         ragg_0.4.1          
##  [64] promises_1.1.1       spatstat.utils_2.0-0 parallel_4.0.3      
##  [67] RColorBrewer_1.1-2   yaml_2.2.1           memoise_2.0.0       
##  [70] reticulate_1.18      pbapply_1.4-3        gridExtra_2.3       
##  [73] rpart_4.1-15         stringi_1.5.3        highr_0.8           
##  [76] desc_1.2.0           rlang_0.4.10         pkgconfig_2.0.3     
##  [79] systemfonts_1.0.0    matrixStats_0.58.0   evaluate_0.14       
##  [82] lattice_0.20-41      tensor_1.5           ROCR_1.0-11         
##  [85] purrr_0.3.4          labeling_0.4.2       htmlwidgets_1.5.3   
##  [88] cowplot_1.1.1        tidyselect_1.1.0     parallelly_1.23.0   
##  [91] RcppAnnoy_0.0.18     plyr_1.8.6           magrittr_2.0.1      
##  [94] R6_2.5.0             generics_0.1.0       DBI_1.1.1           
##  [97] withr_2.4.1          mgcv_1.8-33          pillar_1.4.7        
## [100] fitdistrplus_1.1-3   survival_3.2-7       abind_1.4-5         
## [103] tibble_3.0.6         future.apply_1.7.0   crayon_1.4.0        
## [106] utf8_1.1.4           KernSmooth_2.23-17   plotly_4.9.3        
## [109] rmarkdown_2.6        grid_4.0.3           data.table_1.13.6   
## [112] digest_0.6.27        xtable_1.8-4         tidyr_1.1.2         
## [115] httpuv_1.5.5         textshaping_0.2.1    munsell_0.5.0       
## [118] viridisLite_0.3.0
</details>

Developed by Paul Hoffman, Satija Lab and Collaborators.
