Amezquita, Robert & Hicks, Stephanie. (2019). Orchestrating Single-Cell Analysis with Bioconductor. 10.1101/590562.
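In recent years, advances in experimental technologies such as single-cell RNA sequencing have made high-dimensional profiling of genome-wide features in individual cells possible. This has spurred large-scale data generation projects that quantify unprecedented levels of biological variation at the single-cell level. The data these projects produce show distinctive characteristics in both the number of features and the number of samples, including increased sparsity and scale.
Because of these characteristics, specialized statistical methods and fast, efficient software implementations are needed to extract biological insight. Bioconductor is an open-source, open-development software project based on the R programming language. Drawing on a rich history of software and methods development, it has pioneered the analysis of this kind of high-throughput, high-dimensional biological data.
Bioconductor offers state-of-the-art computational methods, standardized data infrastructure, and interactive data visualization tools, all readily accessible as software packages, so that a diverse range of users can analyze data from cutting-edge single-cell assays. Here, we give prospective users and contributors an overview of single-cell RNA sequencing analysis and highlight Bioconductor's contributions to it.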
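In sc-Review: Best Practices for Single-Cell RNA-seq Data Analysis, we covered the key points of each step in single-cell data analysis. Single-cell data analysis has a built-in advantage: most gene-level analysis methods and statistical algorithms are already open source, which is one reason the field has moved so quickly. Bioconductor reflects how bioinformatics as a whole has developed: open source, convenient, and well documented. In 2019, the number of Bioconductor tools for single-cell data grew explosively, and dedicated data storage formats emerged in the form of classes, objects, and packages.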
- Sample: a single biological unit that is assayed.
- Feature: a trait of a sample that is measured. Examples include mRNAs in RNA-seq experiments, genomic loci for ChIP-seq experiments, and cell markers in flow/CyTOF experiments.
- Experiment: a procedure where a set of features are measured for each sample; in this usage, typically involves multiple samples, possibly with varying conditions (e.g. treatments, time points).
- High-throughput assay: an assay that captures and measures features from many samples. Examples include flow cytometry, CyTOF, and certain scRNA-seq platforms, which can quantify tens or hundreds of thousands to millions of cells. For this reason, in our review, most bulk assays are not considered high-throughput as they profile a limited number of samples.
- High-dimensional assay: an assay that captures thousands or tens of thousands of features per single sample unit. In our review, high-throughput assays such as flow cytometry are not considered high-dimensional as they profile a limited number of proteins.
- Bulk assay: an assay that measures pools of cells to produce a set of measured features as a single observation unit per pool.
- Single-cell assay: a technology where a single sample corresponds to a single cell; includes flow cytometry, CyTOF, and single-cell RNA-seq (scRNA-seq) across various platform technologies (plate-based, droplet, etc.).
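Data structure:
A : A minimal sce object is built by supplying data such as a per-cell count matrix (blue box), made up of features such as genes (rows) and cells (columns). Metadata describing the cells can also be supplied, with cells as rows and known characteristics of the cells as columns (orange box). Similarly, metadata describing the features can be added (green box). These different types of data are stored in separate parts of the sce object, called slots. The data in each slot can be accessed programmatically through accessors named after the slots (arrows): rowRanges refers to the feature metadata, colData to the sample metadata, and assay to the data.
B : Analysis with SingleCellExperiment (sce) compatible workflows appends data to the initial sce object. For example, computing library normalization factors for each cell creates a new slot (pink box). These factors can be used to derive a normalized count matrix, which is stored in the same assay slot alongside the initial count data (dark blue box). The assay slot can therefore hold any number of data transformations. Cell quality metrics (describing characteristics of the cells) are appended to the sample metadata slot, colData. Finally, in the same way as the assay slot, any number of dimension-reduced representations of the data can be stored; they reside in their own slot, reducedDim.
C : The sce object evolves over the course of a typical analysis, storing the various metrics and representations derived from the initial data. For more information on the SingleCellExperiment class, see SingleCellExperiment (https://bioconductor.org/packages/singlecellexper).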
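The slot layout described in panels A to C can be sketched in a few lines. This is a toy analogue written in plain Python, not the Bioconductor API: the class name `MiniSCE`, its field names, and the helper `add_normalized` are all hypothetical, and the normalization is a simple library-size scaling chosen for illustration. It assumes every cell has a nonzero total count.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Matrix = List[List[float]]  # rows = features (genes), columns = cells


@dataclass
class MiniSCE:
    """Toy stand-in for an sce-like container (hypothetical, not the real class)."""
    assays: Dict[str, Matrix]                                   # named feature-by-cell matrices
    col_data: Dict[str, list] = field(default_factory=dict)     # per-cell metadata
    row_data: Dict[str, list] = field(default_factory=dict)     # per-feature metadata
    size_factors: List[float] = field(default_factory=list)     # per-cell normalization factors
    reduced_dims: Dict[str, Matrix] = field(default_factory=dict)  # cell-by-dimension matrices


def add_normalized(sce: MiniSCE) -> None:
    """Compute library-size factors and store a scaled matrix next to the raw counts."""
    counts = sce.assays["counts"]
    lib_sizes = [sum(col) for col in zip(*counts)]       # total counts per cell
    mean_lib = sum(lib_sizes) / len(lib_sizes)
    sce.size_factors = [lib / mean_lib for lib in lib_sizes]
    sce.assays["normalized"] = [
        [value / f for value, f in zip(row, sce.size_factors)]
        for row in counts
    ]


# Panel A: a minimal object built from counts plus cell and feature metadata.
sce = MiniSCE(
    assays={"counts": [[4, 2, 15], [0, 10, 15], [6, 8, 0]]},
    col_data={"cell_id": ["c1", "c2", "c3"]},
    row_data={"gene": ["g1", "g2", "g3"]},
)

# Panel B: the analysis appends to the same object instead of creating new ones.
add_normalized(sce)
sce.col_data["n_detected"] = [
    sum(1 for v in col if v > 0) for col in zip(*sce.assays["counts"])
]
# Placeholder "embedding" (cells x 2): just the first two normalized genes per cell,
# standing in for a real dimensionality reduction.
norm = sce.assays["normalized"]
sce.reduced_dims["first_two_genes"] = [
    [row[j] for row in norm[:2]] for j in range(len(norm[0]))
]

print(sorted(sce.assays))          # raw and normalized matrices live side by side
print(sce.col_data["n_detected"])  # quality metric appended to the cell metadata
```

The point of the design is that one object accumulates every intermediate result, so downstream steps only need to agree on where to look (assays, cell metadata, reduced dimensions) rather than pass separate matrices around.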
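Bioconductor has another major advantage: datasets can be distributed as packages, which makes large amounts of data directly accessible from within R.
Standard single-cell data workflow:
The results of these analyses are stored in:
Existing tool libraries: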