10X single-cell trajectory analysis (pseudotime analysis) with CytoTRACE

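Hello everyone. This time we are sharing a tool for trajectory analysis: CytoTRACE. The paper, "Single-cell transcriptional diversity is a hallmark of developmental potential", was published in Science in January 2020, which is quite impressive and on a par with URD. Of course, I have shared many trajectory analysis methods before, such as the VIA post on pseudotime analysis of single-cell data ("my advantages are beyond your reach"), a review of 10X single-cell trajectory analysis, the pseudotime tool Palantir, and trajectory analysis of 10X spatial transcriptomics data. Today let's see what sets this tool apart.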

Abstract

Single-cell RNA sequencing (scRNA-seq) is a powerful approach for reconstructing cellular differentiation trajectories. However, inferring both the state and direction of differentiation is challenging (this is self-evident). Here, we demonstrate a simple, yet robust, determinant of developmental potential, the number of expressed genes per cell, and leverage this measure of transcriptional diversity to develop a computational framework (CytoTRACE) for predicting differentiation states from scRNA-seq data (inferring developmental trajectories from the number of expressed genes? Impressive). When applied to diverse tissue types and organisms, CytoTRACE outperformed previous methods and nearly 19,000 annotated gene sets for resolving 52 experimentally determined developmental trajectories (the benchmark is certainly extensive). Additionally, it facilitated the identification of quiescent stem cells and revealed genes that contribute to breast tumorigenesis. This study thus establishes a key RNA-based feature of developmental potential and a platform for delineation of cellular hierarchies. (It looks like this method has a lot worth examining.)

Introduction

In multicellular organisms, tissues are hierarchically organized into distinct cell types and cellular states with intrinsic differences in function and developmental potential. Of course, many new methods are already available, but: Though powerful, these technologies cannot be applied to human tissues in vivo and generally require prior knowledge of cell type–specific genetic markers (trajectory analysis requires cell annotation first; skipping it is not serious analysis). These limitations have made it difficult to study the developmental organization of primary human tissues under physiological and pathological conditions. (I wonder how deeply everyone digs when doing pseudotime analysis.)
Single-cell RNA-sequencing (scRNA-seq) has emerged as a promising approach to study cellular differentiation trajectories at high resolution in primary tissue specimens (single-cell sequencing really is an epoch-making technology). Most current trajectory analysis tools require:
(1) a priori knowledge of the starting point (and thus, direction) of the inferred biological process (prior knowledge; again, running trajectory analysis without annotating cells first is not serious analysis).
(2) the presence of intermediate cell states to reconstruct the trajectory (intermediate states of differentiation must be present; in theory this is the case).
These requirements can be challenging to satisfy in certain contexts, such as human cancer development (those working with tumor single-cell data will know this well).
Current methods have another shortcoming:
with existing in silico approaches, it is difficult to distinguish quiescent (noncycling) adult stem cells that have long-term regenerative potential from more specialized cells (in practice this situation is very rare in the single-cell data we work with). In addition, the utility of gene expression–based models across diverse developmental systems and single-cell sequencing technologies is still unclear.
Here, we systematically evaluated RNA-based features, including nearly 19,000 annotated gene sets, to identify factors that accurately predict cellular differentiation status independently of tissue type, species, and platform. (Here the authors start praising their own tool.) Let's look at the theory and use of this tool.

Result 1 RNA-based correlates of single-cell differentiation states (the most important part)

Our initial goal was to identify robust, RNA-based determinants of developmental potential without the need for a priori knowledge of developmental direction or intermediate cell states marking cell fate transitions (that is, identifying the direction of development and cell state transitions without prior knowledge). Using scRNA-seq data, we evaluated ~19,000 potential correlates of cell potency, including all available gene sets in the Molecular Signatures Database, 896 gene sets covering transcription factor binding sites from ENCODE (17) and ChEA (18), an mRNA expression–derived stemness index (mRNAsi) (15), and three computational techniques that infer stemness as a measure of transcriptional entropy (a passing familiarity with these is enough). We also explored the utility of “gene counts,” or the number of detectably expressed genes per cell. Although anecdotally observed to correlate with differentiation status in a limited number of settings (this is the paper's key point: the relationship between the number of genes and development), the reliability of this association and whether it reflects a general property of cellular ontogeny are unknown.
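To make "gene counts" concrete, here is a minimal sketch in base R. The toy matrix and the "greater than zero" detection threshold are my own assumptions for illustration, not values taken from the paper.

```r
# Toy genes-by-cells count matrix (hypothetical values, for illustration only).
expr <- matrix(
  c(5, 0, 2,
    1, 0, 0,
    0, 3, 0,
    7, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Gene counts: number of genes with detectable expression in each cell.
# "Detectable" is assumed here to mean a count greater than zero.
gene_counts <- colSums(expr > 0)
print(gene_counts)
```

Under the paper's hypothesis, the cell with the highest gene count would be the candidate for the least differentiated state.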
To assess these RNA-based features, we compiled a training cohort consisting of nine gold standard scRNA-seq datasets with experimentally confirmed differentiation trajectories. These datasets were selected to prioritize commonly used benchmarking datasets from earlier studies and to ensure a broad sampling of developmental states from the mammalian zygote to terminally differentiated cells (these are true developmental trajectories). Overall, the training cohort encompassed 3174 single cells spanning 49 phenotypes, six biological systems, and three scRNA-seq platforms (quite a complete range). To evaluate performance, we used Spearman correlation to compare each RNA-based feature, averaged by phenotype, against known differentiation states. We then averaged the results across the nine training datasets to yield a final score and rank for every feature (a correlation test).
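The evaluation described above (average a feature within each phenotype, then take the Spearman correlation against the known differentiation order) can be sketched roughly as follows for a single dataset. All values, labels, and the numeric coding of the known order are hypothetical.

```r
# Hypothetical per-cell feature (e.g. gene counts) and phenotype labels.
feature   <- c(9000, 8500, 6000, 6400, 3000, 2800)
phenotype <- c("stem", "stem", "progenitor", "progenitor", "mature", "mature")

# Known differentiation order (assumed coding): higher = less differentiated.
known_potency <- c(stem = 3, progenitor = 2, mature = 1)

# Average the feature within each phenotype.
feature_by_pheno <- tapply(feature, phenotype, mean)

# Align the phenotype means with the known order, then compute Spearman's rho.
pheno_means <- as.numeric(feature_by_pheno[names(known_potency)])
rho <- cor(pheno_means, as.numeric(known_potency), method = "spearman")
print(rho)
```

In the paper this per-dataset correlation is then averaged across the nine training datasets; the sketch stops at one dataset.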
This systematic screen revealed many known and unexpected correlates of differentiation status.

[Figure]

However, one feature in particular showed notable performance: the number of detectably expressed genes per cell (gene counts) (the gene-count feature stands out very clearly). The reasoning given here concerns stem cells: pluripotent stem cells express a relatively large number of genes, whereas mature cell types express relatively few. Pluripotency genes (readers interested in this class of genes can look them up), by contrast, showed an arc-like pattern early in human embryogenesis that was characterized by progressively increasing expression until the emergence of embryonic stem cells, followed by decreasing expression (an interesting finding).

[Figure]

To summarize: cells with greater differentiation potential express relatively more genes, whereas pluripotency genes follow an arc-shaped trend.
These findings suggested that gene counts might extend beyond isolated experimental systems to recapitulate the full spectrum of developmental potential. The authors then validated this with mouse data.

[Figure]

Consistent with the earlier results, the correlation was very high, and the same result was found in other species,

[Figure]

suggesting that it is a general feature of cellular ontogeny.

Next comes a study of the relationship between chromatin accessibility and development.
The authors tested whether single-cell gene counts are ultimately a surrogate for global chromatin accessibility, which has been shown to decrease with differentiation in certain contexts. Genome-wide chromatin accessibility was observed to progressively decrease with differentiation of hESCs into paraxial mesoderm and lateral mesoderm lineages (a result anyone could have guessed).

[Figure]

We observed strong concordance between the number of accessible peaks and the mean number of detectably expressed genes per phenotype.

[Figure]

It seems these results share a common pattern.

Result 2 Development of CytoTRACE

The number of expressed genes per cell generally showed consistent performance with respect to key technical parameters and was generally correlated with mRNA content (naturally). However, in some datasets, such as that for in vitro differentiation of hESCs into the gastrulation layers, the number of expressed genes per cell exhibited considerable intraphenotypic variation (phenotype information is still used relatively little in single-cell work, but the ATAC content is also quite important).

[Figure]

So trajectory analysis does appear to be strongly tied to the number of expressed genes.
We reasoned that genes whose expression patterns correlate with gene counts might better capture differentiation states. Indeed, by simply averaging the expression levels of genes that were most highly correlated with gene counts in each dataset (this has been verified countless times), the resulting dataset-specific gene counts signature (GCS) became the top-performing measure in the screen, outranking every predefined gene set and computational tool that we assessed.
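The GCS idea can be sketched in a few lines of base R: correlate every gene with gene counts, keep the most correlated genes, and average their expression per cell. This is only an illustration of the description above, not the package's implementation; the toy matrix, the use of Pearson correlation, the cutoff n_top, and the omission of any normalization step are all assumptions on my part.

```r
# Toy genes-by-cells expression matrix (hypothetical values).
expr <- matrix(
  c(5, 4, 2, 0,
    3, 1, 0, 0,
    1, 0, 0, 0,
    2, 1, 1, 0,
    0, 0, 0, 6),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)

# Number of detectably expressed genes per cell (assumed threshold: > 0).
gene_counts <- colSums(expr > 0)

# Correlation of each gene's expression with gene counts across cells.
gene_cor <- apply(expr, 1, function(g) cor(g, gene_counts))

# Keep the genes most correlated with gene counts (n_top is an arbitrary choice).
n_top <- 2
top_genes <- names(sort(gene_cor, decreasing = TRUE))[seq_len(n_top)]

# GCS: average expression of the selected genes in each cell.
gcs <- colMeans(expr[top_genes, , drop = FALSE])
print(top_genes)
print(gcs)
```

In this toy example gene5, which is expressed only in the cell with the fewest detected genes, is negatively correlated with gene counts and so never enters the signature.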

[Figure]

GCS, like gene counts, is inherently insensitive to dropout events, is agnostic to prior knowledge of developmentally regulated genes (that is, it depends little on technical artifacts or prior knowledge), and is not solely attributable to multilineage priming or a known molecular signature.

Result 3 Performance evaluation across tissues, species, and platforms (data from many sources; we will go through this part briefly)

When assessed at the single-cell level, CytoTRACE outperformed all evaluated RNA-based features in the validation cohort,

[Figure]

achieving a substantial gain in performance over the second-highest-ranking approach.

[Figure]

Similar improvements were observed across many complex systems, including bone marrow differentiation.

[Figure]

In addition, CytoTRACE results were positively correlated with the direction of differentiation in 88% of datasets (datasets with known developmental trajectories were used to verify the tool's accuracy, and the results are of course good).
Moreover, no significant biases in performance were observed in relation to tissue type, species, the number of cells analyzed, time series experiments versus snapshots of developmental states, or plate-based versus droplet-based technologies (the bias is small, which is good).
Next, the authors also compared against RNA velocity (velocyto) results; unsurprisingly, CytoTRACE's results were quite good.

[Figure]

The authors attribute CytoTRACE's higher accuracy to the following: "This was likely due to the short mRNA half-lives and developmental time scales assumed for the RNA velocity model."
The paper also validates batch effects across multiple samples, but nowadays we generally remove batch effects first and then run trajectory analysis; methods should be applied flexibly.

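Result 4 Stem cell–related genes and hierarchies

[Figure]

A key point raised here is that CytoTRACE can identify the correct starting point. Frankly, I do not believe this holds in real-world data. A brief look at this part of the results is enough; when actually doing trajectory analysis, human supervision is a must.

Result 5 Application to neoplastic disease

[Figure]

Cell types still need to be identified. I really do not believe this tool can identify the developmental starting point from the data alone.

Next, let's look at the example code.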

Running CytoTRACE

Load CytoTRACE in R with library(CytoTRACE). The package contains the following contents:

CytoTRACE: function to run CytoTRACE on a custom scRNA-seq dataset
iCytoTRACE: function to run CytoTRACE across multiple, heterogeneous scRNA-seq batches/datasets
plotCytoTRACE: function to generate 2D visualizations of CytoTRACE, phenotypes, and gene expression
Two bone marrow differentiation scRNA-seq datasets (marrow_10x_expr and marrow_plate_expr) with corresponding phenotype labels (marrow_10x_pheno and marrow_plate_pheno)

Example I: Run CytoTRACE on a custom scRNA-seq dataset

Use the bone marrow 10x scRNA-seq dataset to run CytoTRACE

results <- CytoTRACE(marrow_10x_expr)

CytoTRACE will automatically run in fast mode, a subsampling approach used to reduce runtime and memory usage, when the number of cells in the dataset exceeds 3,000. Users can additionally multi-thread using 'ncores' (default = 1) or indicate the subsampling size using 'subsamplesize' (default = 1,000 cells). Run the following dataset in fast mode using 8 cores and a subsample size of 1,000.

results <- CytoTRACE(marrow_10x_expr, ncores = 8, subsamplesize = 1000)

The output is a list object containing numeric values for CytoTRACE (values ranging from 0 (more differentiated) to 1 (less differentiated)), ranked CytoTRACE, GCS, and gene counts, a numeric vector of the Pearson correlation between each gene and CytoTRACE, a numeric vector of the Pearson correlation between each gene and gene counts, the IDs of filtered cells, and a normalized gene expression table (see package documentation for more details).

Example II: Run iCytoTRACE on multiple scRNA-seq batches/datasets

Run iCytoTRACE on a list containing two bone marrow scRNA-seq datasets profiled on different platforms, 10x and Smart-seq2

datasets <- list(marrow_10x_expr, marrow_plate_expr)
results <- iCytoTRACE(datasets)

The output is a list object containing numeric values for the merged CytoTRACE (values ranging from 0 (more differentiated) to 1 (less differentiated)), ranked CytoTRACE, GCS, gene counts, the Scanorama-corrected gene expression matrix, the merged low dimensional embedding, and the IDs of filtered cells (see package documentation for more details).

Example III: Plot CytoTRACE and iCytoTRACE results

Visualizing CytoTRACE results

Generate 2D plots and tables to visualize CytoTRACE, known phenotypes, and gene expression. The current implementation uses t-SNE for dimensional reduction but users can also input their own embeddings. At minimum, the plotCytoTRACE function takes as input a list object generated by either the CytoTRACE or iCytoTRACE functions. Users can also optionally provide phenotype labels or gene names to generate additional plots. Boxplots of CytoTRACE by phenotype labels are automatically generated when phenotype labels are provided.

plotCytoTRACE(results, phenotype = marrow_10x_pheno, gene = "Kit")

The function saves two files to disk:

a pdf of 2D embedded plots colored by CytoTRACE and, if provided, phenotype labels and gene expression.
a tab-delimited text file containing a table of CytoTRACE values, t-SNE embeddings, and, if provided, phenotype labels and gene expression values.

Visualizing genes associated with CytoTRACE

Generate a bar plot to visualize genes associated with CytoTRACE. At minimum, the plotCytoGenes function takes as input a list object generated by either the CytoTRACE or iCytoTRACE functions. Users can also indicate the number of genes and colors to display.

plotCytoGenes(results, numOfGenes = 10)

The function saves one file to disk:

a pdf of bar plots indicating the genes associated with least and most differentiated cells based on correlation with CytoTRACE.
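
To show what "genes associated with least and most differentiated cells based on correlation" means, here is a rough base-R sketch that ranks genes by their Pearson correlation with a per-cell score. It does not call the CytoTRACE package; the score vector, the toy matrix, and the names n_show, least_diff_genes and most_diff_genes are hypothetical.

```r
# Hypothetical per-cell score in [0, 1]; 1 = less differentiated, 0 = more differentiated.
score <- c(cell1 = 1.0, cell2 = 0.7, cell3 = 0.3, cell4 = 0.0)

# Toy genes-by-cells expression matrix (hypothetical values).
expr <- matrix(
  c(9, 6, 2, 0,   # expression falls as cells differentiate
    0, 1, 5, 8,   # expression rises as cells differentiate
    3, 3, 3, 4),  # weaker trend
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), names(score))
)

# Pearson correlation between each gene and the score, sorted from high to low.
gene_cor <- sort(apply(expr, 1, function(g) cor(g, score)), decreasing = TRUE)

# Genes at the top track less differentiated cells; genes at the bottom track
# more differentiated cells. n_show is an arbitrary number of genes to report.
n_show <- 1
least_diff_genes <- head(names(gene_cor), n_show)
most_diff_genes  <- tail(names(gene_cor), n_show)
print(gene_cor)
```

Checking whether the top-ranked genes are plausible stemness or maturation markers for the tissue is one practical form of the human supervision argued for in this post.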

For reference, see the CytoTRACE website.

The code is quite simple, so give it a try yourself. Judging from the results, though, human supervision is indispensable.

Life is good, and even better with you.

Last edited: 2021.04.14 12:28:37
