Title: Cell type discovery and representation in the era of high-content single cell phenotyping
Authors and affiliations:
Trygve Bakken, Lindsay Cowell, Brian D. Aevermann, Mark Novotny, Rebecca Hodge, Jeremy A. Miller, Alexandra Lee, Ivan Chang, Jamison McCorrison, Bali Pulendran, Yu Qian, Nicholas J. Schork, Roger S. Lasken, Ed S. Lein and Richard H. Scheuermann
- J. Craig Venter Institute, 4120 Capricorn Lane, La Jolla, CA 92037, USA
- Department of Pathology, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
Journal and publication date:
BMC Bioinformatics
- Published: 21 December 2017
Abstract:
Background
A fundamental characteristic of multicellular organisms is the specialization of functional cell types through the process of differentiation. These specialized cell types not only characterize the normal functioning of different organs and tissues, they can also be used as cellular biomarkers of a variety of different disease states and therapeutic/vaccine responses. In order to serve as a reference for cell type representation, the Cell Ontology has been developed to provide a standard nomenclature of defined cell types for comparative analysis and biomarker discovery. Historically, these cell types have been defined based on unique cellular shapes and structures, anatomic locations, and marker protein expression. However, we are now experiencing a revolution in cellular characterization resulting from the application of new high-throughput, high-content cytometry and sequencing technologies. The resulting explosion in the number of distinct cell types being identified is challenging the current paradigm for cell type definition in the Cell Ontology.
Results
In this paper, we provide examples of state-of-the-art cellular biomarker characterization using high-content cytometry and single cell RNA sequencing, and present strategies for standardized cell type representations based on the data outputs from these cutting-edge technologies, including “context annotations” in the form of standardized experiment metadata about the specimen source analyzed and marker genes that serve as the most useful features in machine learning-based cell type classification models. We also propose a statistical strategy for comparing new experiment data to these standardized cell type representations.
Conclusion
The advent of high-throughput/high-content single cell technologies is leading to an explosion in the number of distinct cell types being identified. It will be critical for the bioinformatics community to develop and adopt data standard conventions that will be compatible with these new technologies and support the data representation needs of the research community. The proposals enumerated here will serve as a useful starting point to address these challenges.
Keywords
- Cell ontology
- Single cell transcriptomics
- Cell phenotype
- Peripheral blood mononuclear cells
- Neuron
- Next generation sequencing
- Cytometry
- Open biomedical ontologies
- Marker genes
Selected figures:
Fig. 1 Identification of myeloid cell subtypes using manual gating and directed automated filtering.
A gating hierarchy (a series of iterative two-dimensional manual data partitions) has been established by the investigative team in which peripheral blood mononuclear cells (PBMC) are assessed for expression of HLA-DR and CD3, CD3- cells (Population #5) are assessed for expression of CD19 and CD14, CD19- cells (Population #7) are then assessed for expression of HLA-DR and CD16, HLA-DR+ cells (Population #10) are assessed for expression of HLA-DR and CD14, CD14- cells (Population #19) are assessed for expression of CD123 and CD141, CD141- cells (Population #21) are assessed for expression of CD11c and CD123, and CD11c+ cells (Population #23) are assessed for expression of CD1c and CD16. Manual gating results are shown in the top panel; directed automated filter results using the DAFi method, a modified version of the FLOCK algorithm [21], are shown in the bottom panel.
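The gating path in this caption can be read as a sequence of filters, each applied to the population kept by the previous one. The sketch below illustrates only that idea and is not the investigators' gating or the DAFi method: it assumes one intensity value per marker and a single hypothetical cutoff (`THRESHOLD`), whereas real gates are two-dimensional regions drawn on the data. The cell values and the `make_cell` helper are invented for illustration.

```python
# Gating path taken from the Fig. 1 caption:
# (marker, keep marker-positive cells?, name of the resulting population).
GATING_PATH = [
    ("CD3", False, "Population #5"),
    ("CD19", False, "Population #7"),
    ("HLA-DR", True, "Population #10"),
    ("CD14", False, "Population #19"),
    ("CD141", False, "Population #21"),
    ("CD11c", True, "Population #23"),
]

THRESHOLD = 1.0  # hypothetical positive/negative cutoff, not from the paper
MARKERS = [marker for marker, _, _ in GATING_PATH]


def apply_gating_path(cells, path=GATING_PATH, threshold=THRESHOLD):
    """Apply each gate to the cells kept by the previous gate."""
    populations = {}
    current = list(cells)
    for marker, keep_positive, name in path:
        current = [
            cell for cell in current
            if (cell[marker] >= threshold) == keep_positive
        ]
        populations[name] = current
    return populations


def make_cell(positive_markers):
    """Build a toy cell: 2.0 for markers it expresses, 0.0 otherwise."""
    return {m: (2.0 if m in positive_markers else 0.0) for m in MARKERS}


example_cells = [
    make_cell({"CD3"}),              # removed at the CD3 gate
    make_cell({"CD19", "HLA-DR"}),   # removed at the CD19 gate
    make_cell({"HLA-DR", "CD14"}),   # removed at the CD14 gate
    make_cell({"HLA-DR", "CD11c"}),  # survives to Population #23
]

gated = apply_gating_path(example_cells)
counts = {name: len(cells) for name, cells in gated.items()}
print(counts)
```

Each population is by construction a subset of the one before it, which is what makes the sequence a hierarchy.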
Fig. 2 Cell type representations in the Cell Ontology.
a The expanded is_a hierarchy of the monocyte branch. b The expanded is_a hierarchy of the dendritic cell branch. c An example of a cell type term record for dendritic cell. Note the presence of both textual definitions in the “definition” field, and the components of the logical axioms in the “has part”, “lacks_plasma_membrane_part”, and “subClassOf” fields.
Fig. 3 Cell type clustering and marker gene expression from RNA sequencing of single nuclei isolated from layer 1 cortex of post-mortem human brain.
a Heatmap of CPM expression levels of a subset of genes that show selective expression in the 11 clusters of cells identified by principal component analysis (not shown). An example of the statistical methods used to identify cell clusters and marker genes from single cell/single nuclei data can be found in [13]. b Violin plots of selected marker genes in each of the 11 cell clusters. c The expanded is_a hierarchy of the neuron branch of the Cell Ontology, with the interneuron sub-branch highlighted.
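The caption does not expand CPM; the sketch below assumes it has its usual RNA sequencing meaning of counts per million, where each gene's read count in a cell is divided by that cell's total count and multiplied by 10^6. The gene counts are made up, and `OTHER` is a placeholder for all remaining genes.

```python
def counts_per_million(counts):
    """Scale one cell's raw read counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize a cell with zero total counts")
    return {gene: 1_000_000 * n / total for gene, n in counts.items()}


# Made-up raw counts for a single nucleus (illustrative only).
raw_counts = {"GAD1": 30, "GRIK3": 10, "OTHER": 60}
cpm = counts_per_million(raw_counts)
print(cpm["GAD1"])
```

Normalizing this way makes expression values comparable between nuclei that were sequenced to different depths.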
Cell population identification from single cell transcriptional profiling
While flow cytometry relies on detection of a pre-selected set of proteins to help define a cell’s “parts list”, transcriptional profiling uses unbiased RNA detection and quantification to characterize the parts list. Recently, the RNA sequencing technology for transcriptional profiling has been optimized for use on single cells, so-called single cell RNA sequencing (scRNAseq). The application of scRNAseq on samples from a variety of different normal and abnormal tissues is revealing a level of cellular complexity that was unanticipated only a few years ago. Thus, we are experiencing an explosion in the number of new cell types being identified using these unbiased high-throughput/high-content experimental technologies.
As an example, our group has recently completed an analysis of the transcriptional profiles of single nuclei from post-mortem human brain using single nucleus RNA sequencing (snRNAseq). Single nuclei from cortical layer 1 of the middle temporal gyrus were sorted into individual wells of a microtiter plate for snRNAseq analysis, and specific cell type clusters were identified using iterative principal component analysis (unpublished). A heatmap of gene expression values reveals the differential expression pattern across cells from the 11 different neuronal cell clusters identified (Fig. 3a). Note that cells in all 11 clusters express GAD1 (top row), a well-known marker of inhibitory interneurons. Violin plots of selected marker genes for each cell cluster demonstrate their selective expression patterns (Fig. 3b). For example, GRIK3 is selectively expressed in the i2 cluster.
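The contrast between a pan-cluster marker such as GAD1 and a selective marker such as GRIK3 can be illustrated by comparing mean expression per cluster. This is a minimal sketch with invented values, not the authors' unpublished clustering or marker selection method. Only the cluster name `i2` comes from the text; `i1` and `i3` are assumed names, and the example uses three clusters rather than eleven.

```python
from collections import defaultdict


def mean_by_cluster(expression, labels):
    """Return the mean expression of one gene within each cluster."""
    sums = defaultdict(float)
    sizes = defaultdict(int)
    for value, label in zip(expression, labels):
        sums[label] += value
        sizes[label] += 1
    return {label: sums[label] / sizes[label] for label in sums}


def top_cluster(expression, labels):
    """Return the cluster with the highest mean expression of the gene."""
    means = mean_by_cluster(expression, labels)
    return max(means, key=means.get)


# Invented per-nucleus expression values (illustrative only).
labels = ["i1", "i1", "i2", "i2", "i3", "i3"]
grik3 = [0.0, 2.0, 90.0, 110.0, 1.0, 3.0]     # high only in i2
gad1 = [50.0, 60.0, 55.0, 65.0, 45.0, 70.0]   # present in every cluster

grik3_means = mean_by_cluster(grik3, labels)
gad1_means = mean_by_cluster(gad1, labels)
print(top_cluster(grik3, labels), grik3_means)
```

A selective marker shows a large gap between its top cluster and the rest, while a shared marker such as GAD1 has comparable means everywhere.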
In order to determine if the distinct cell types reflected in these snRNAseq-derived clusters have been previously reported, we examined the neuronal branch of the Cell Ontology (CL) (Fig. 3c) and found that cerebral cortex GABAergic interneuron is probably the closest match, based on the following relevant definitions:
cerebral cortex GABAergic interneuron - a GABAergic interneuron that is part_of a cerebral cortex.
GABAergic interneuron – An interneuron that uses GABA as a vesicular neurotransmitter.
interneuron – Most generally any neuron which is not motor or sensory. Interneurons may also refer to neurons whose axons remain within a particular brain region as contrasted with projection neurons which have axons projecting to other brain regions.
neuron - The basic cellular unit of nervous tissue. Each neuron consists of a body, an axon, and dendrites. Their purpose is to receive, conduct, and transmit impulses in the nervous system.
Given these definitions, it appears that each of the cell types defined by these single nuclei expression clusters represents a novel cell type that should be positioned under the cerebral cortex GABAergic interneuron parent class in the CL.
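The chain of definitions above implies a simple is_a path from the proposed new classes up to neuron. The sketch below encodes that path as a child-to-parent map and walks it. It is a simplification: a real ontology term can have several parents and other relations such as part_of, and the term `i2 cluster cell type (placeholder)` is a hypothetical name used only for illustration, not one proposed in the paper.

```python
# Child -> parent (is_a) links implied by the quoted definitions.
# Single-parent simplification; the placeholder term is hypothetical.
IS_A = {
    "i2 cluster cell type (placeholder)": "cerebral cortex GABAergic interneuron",
    "cerebral cortex GABAergic interneuron": "GABAergic interneuron",
    "GABAergic interneuron": "interneuron",
    "interneuron": "neuron",
}


def ancestors(term, is_a=IS_A):
    """Return the is_a ancestors of a term, nearest parent first."""
    path = []
    seen = {term}
    while term in is_a:
        term = is_a[term]
        if term in seen:  # guard against accidental cycles in the map
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        path.append(term)
    return path


lineage = ancestors("i2 cluster cell type (placeholder)")
print(" -> ".join(lineage))
```

Placing each new cluster under the same parent means it inherits every property asserted for the ancestors, such as using GABA as a neurotransmitter.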
Fig. 4 Proposed cell type names and definitions for cell types identified from the snRNAseq experiment shown in Fig. 3
Translation team:
王俊豪, 葉名琛, 鄭易民, 陳志榮, 鄧峻瑋, 鄭凌伶