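Hello everyone. Yesterday we shared example code for VECTOR in the post "10X single-cell (10X spatial transcriptomics) trajectory analysis (pseudotime analysis): VECTOR". The paper was published in Cell Reports in August 2020. Its underlying principle is worth summarizing carefully, so in this short post we go through the paper, pick out the key points, and look at the software's features and applications, so that we have a clear picture of what it can and cannot do.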
SUMMARY
A key step in trajectory inference is the determination of starting cells (everyone who has done this knows it well, which is why cell-type annotation has to come before any custom analysis), which is typically done by using manually selected marker genes (most annotation methods still rely on manually chosen markers; similarity-mapping approaches still have too many problems). In this study, we find that the quantile polarization (quantile polarization???) of a cell's principal-component values is strongly associated with their respective states in development hierarchy (PC values are associated with a cell's developmental state), and therefore provides an unsupervised solution for determining the starting cells (this point deserves a closer look). Based on this finding, we developed a tool named VECTOR that infers vectors of developmental directions for cells in Uniform Manifold Approximation and Projection (UMAP). In seven datasets of different developmental scenarios, VECTOR correctly identifies the starting cells and successfully infers the vectors of developmental directions. VECTOR is freely available for academic use at https://github.com/jumphone/Vector. (The application examples look good, but every paper says that.)
INTRODUCTION
Let's distill this part.
The algorithms of TI methods (Monocle, PAGA, Slingshot, etc.; everyone should be familiar with these tools) share two common components:
- The use of dimensional reduction, clustering, or graph-building techniques to convert scRNA-seq data into a simplified representation of trajectory, and the ordering of cells along the trajectory. (Dimensionality reduction and clustering; very standard.)
- Because there may be many alternative trajectories to choose from, most TI methods require the use of prior information, such as a set of known marker genes, to determine the starting cells (SCs) of the correct trajectory. (Put simply, you need cell-type annotation to decide where development starts; trajectory analysis without cell annotation is not to be trusted.)
Choosing markers subjectively by hand does introduce large errors. Recently, a new study found that RNA velocity (velocyto does do well here, with less manual intervention), the time derivative of gene expression states, could be estimated by modeling the relationship between unspliced and spliced mRNAs, making it possible to deduce the future transcriptional states of cells and consequently the developmental trajectories without the need of prior information for determining SCs (trajectories are inferred from the balance of unspliced and spliced reads, an approach high-impact papers use often). Without using any prior information, that study used RNA velocity to identify a new developmental model for neural crest lineage cells, demonstrating its usefulness in developmental lineage analysis.
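For readers new to RNA velocity, the kinetic model that velocyto builds on can be summarized as below. The notation is ours, not the VECTOR paper's: $u$ and $s$ are the unspliced and spliced mRNA abundances of one gene, $\alpha$ is the transcription rate, $\beta$ the splicing rate and $\gamma$ the degradation rate. In the steady-state variant, $\beta$ is scaled to 1 and $\hat{\gamma}$ is fitted on cells assumed to be at equilibrium, so the velocity is the excess of unspliced reads over that fit:

```latex
\begin{aligned}
\frac{du}{dt} &= \alpha - \beta u, \\
\frac{ds}{dt} &= \beta u - \gamma s, \\
v &= \frac{ds}{dt} \approx u - \hat{\gamma}\, s \qquad (\beta := 1)
\end{aligned}
```

A positive $v$ means the gene is being induced (its spliced level is expected to rise), which is what lets velocity methods predict a cell's future state without first choosing starting cells. It also explains the drawback discussed next: $u$ has to be counted from intronic reads.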
Now let's look at the drawbacks of RNA velocity (velocyto):
- It requires reanalyzing raw sequencing data to determine intron reads for quantifying unspliced mRNAs, which is time-consuming and sometimes may not be possible because of the limitation of the sequencing platforms. (Honestly, this is not much of a drawback.)
PCA is indeed a required step in single-cell analysis nowadays. Cells at different developmental states have been shown to have distinct patterns of PC values. However, the patterns of a cell's PC values have not yet been fully explored in the current TI methods (the authors hedge on this point). In this study, we observed that the averaged polarization of a cell's PC values across a large number of PC subspaces is strongly correlated with their developmental states, with SCs having the most polarized PC values. (Pay attention here; I wonder whether anyone has noticed this before: are the PC values of starting cells really that special?? We will check the methods later.) We thus provided an unsupervised solution for determining the SCs based on the averaged polarization of a cell's PC values. (Determining the developmental starting point from PC values; I would not call this unsupervised, in practice it has to be semi-supervised.) The authors' examples of course look very good, but we need to be careful when applying it to our own data.
RESULTS
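The first step is to validate the software's reliability on two single-cell datasets whose cell states are already well defined.
When running PCA, we usually keep the first dozen or so PCs for downstream analysis, and Seurat computes 50 PCs by default, yet the authors use 150 PCs here. What is the rationale? We will need to check the methods.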
In the dataset analysis, the authors found: For both oligodendrocyte and enterocyte lineages, we found that cells at earlier developmental stages tend to have more extreme PC values (either very small or very large, i.e., highly polarized (so that is what "polarization" means here)), while those at later developmental stages tend to have more intermediate PC values (I had never noticed this pattern; it is worth testing on our own data). Such patterns were more obvious if we inspected the density of the PC value quantiles at all 150 PC subspaces for cells at different developmental stages. (The pattern in the figure is indeed very clear.)
To quantify the polarization of the PC value quantiles, we next defined a quantile polarization (QP) score that averages the polarization of the PC value quantile of a given cell across all 150 PC subspaces (this is how QP is defined; honestly, it is the first time I have seen this approach). The QP score turns out to be highly correlated with the developmental hierarchy, with cells at the earliest developmental stages having the greatest QP scores.
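To make the QP idea concrete, here is a minimal Python sketch. It is not VECTOR's code, and the exact formula is an assumption: each cell's value on a PC is turned into a quantile by ranking, the polarization of that quantile is taken as its squared distance from the median (0.5), and the QP score is the mean over all PCs. A cell that sits at either end of many PCs therefore scores high, while a cell in the middle of most PCs scores low.

```python
def quantile_polarization(pc_values):
    """Toy QP score per cell.

    pc_values: one row per cell, one column per PC (the paper uses 150 PCs).
    ASSUMPTION: quantile = rank-based position in (0, 1), polarization =
    (quantile - 0.5) ** 2, QP = mean over PCs. VECTOR's formula may differ.
    """
    n_cells = len(pc_values)
    n_pcs = len(pc_values[0])
    scores = [0.0] * n_cells
    for j in range(n_pcs):
        # rank the cells along PC j (ties are broken by input order in this sketch)
        order = sorted(range(n_cells), key=lambda i: pc_values[i][j])
        for rank, i in enumerate(order, start=1):
            quantile = (rank - 0.5) / n_cells  # symmetric around 0.5
            scores[i] += (quantile - 0.5) ** 2
    return [s / n_pcs for s in scores]


# Five toy cells x three PCs; cell 0 sits at an extreme of every PC.
pcs = [
    [10.0, -9.0, 8.0],
    [0.1, 0.2, -0.1],
    [0.3, -0.2, 0.2],
    [-0.5, 0.5, 0.0],
    [0.0, 0.0, 0.1],
]
scores = quantile_polarization(pcs)
print([round(s, 3) for s in scores])  # cell 0 gets the highest score
```

Because only ranks are used, the score ignores the sign and scale of each PC, so "very small" and "very large" count equally as polarized.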
We further experimented with using a different number of PCs, and found that such correlations were robust if the number of PCs used could explain ~20%–80% of the total variance。
Inferring trajectories directly on the UMAP embedding, an idea Monocle 3 also uses.
In essence, VECTOR treats a two-dimensional UMAP representation of cells as an image and splits it into a number of pixels. After removing those pixels that do not include any cells, VECTOR focuses on the largest connected pixel (LCP) network in UMAP to infer developmental directions (so the software infers the trajectory on the UMAP plot itself). By averaging the QP scores of cells inside each pixel, VECTOR identifies the high-scoring pixels that have the greatest QP scores (top 10% by default), i.e., the polarization of PC values is used to locate the starting cells. The authors also note that this step may produce false positives. Here, VECTOR considers not only QP scores but also the connectivity of cells in UMAP; from the high-scoring pixels, it selects the largest connected high-scoring pixels as the starting point of development (QP is combined with UMAP connectivity to obtain the starting cells). Those isolated high-scoring pixels that are likely false positives are then filtered out (there is actually a weak spot here).
For each pixel in the LCP network, VECTOR computes a pseudotime score defined as its network distance to the starting point of development (most tools compute pseudotime this way). Finally, for a given target pixel VECTOR computes a vector (with arrow and length) by taking into consideration the information of all pixels in the LCP network, including the direction of the unit vector pointing from a selected pixel to the target pixel, the relative pseudotime score between the target pixel and the selected pixel, and the closeness of the selected pixel to the target pixel in the LCP network, and so on. The output is a plot similar to an RNA velocity plot: the arrow direction is the direction of development, arrows are short near the starting point and in mid-development, and longer near the developmental end points.
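The steps above can be sketched in Python as follows. This is a toy reconstruction, not VECTOR's own implementation, and several choices are assumptions: it starts from pixels that already hold their cells' QP scores, treats pixels as connected when they are 8-neighbours on the grid, uses hop counts in the pixel network as the "network distance", and combines unit direction, relative pseudotime and closeness with a hypothetical weight of 1 / (1 + hops). The paper lists these ingredients "and so on" without the exact formula, so treat the weighting as an illustration only.

```python
import math
from collections import deque


def neighbors(p, occupied):
    """8-connected neighbours of pixel p that are occupied (assumed connectivity)."""
    c, r = p
    for dc in (-1, 0, 1):
        for dr in (-1, 0, 1):
            q = (c + dc, r + dr)
            if q != p and q in occupied:
                yield q


def bfs(sources, occupied):
    """Hop distance from the nearest source pixel, moving only through occupied pixels."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        cur = queue.popleft()
        for q in neighbors(cur, occupied):
            if q not in dist:
                dist[q] = dist[cur] + 1
                queue.append(q)
    return dist


def largest_component(pixels):
    """Largest 8-connected group among `pixels`."""
    pixels = set(pixels)
    remaining, best = set(pixels), set()
    while remaining:
        component = set(bfs([next(iter(remaining))], pixels))
        remaining -= component
        if len(component) > len(best):
            best = component
    return best


def infer_vectors(pixel_qp, top=0.1):
    """pixel_qp: {(col, row): [QP scores of the cells in that pixel]}, non-empty pixels only."""
    lcp = largest_component(pixel_qp)  # largest connected pixel (LCP) network
    mean_qp = {p: sum(pixel_qp[p]) / len(pixel_qp[p]) for p in lcp}
    k = max(1, round(len(lcp) * top))  # top 10% of LCP pixels by mean QP
    ranked = sorted(lcp, key=lambda p: mean_qp[p], reverse=True)
    start = largest_component(ranked[:k])  # isolated high-scoring pixels are dropped
    pseudotime = bfs(list(start), lcp)  # network distance to the starting point
    vectors = {}
    for t in lcp:
        hops = bfs([t], lcp)  # network distance of every pixel to the target t
        vx = vy = 0.0
        for j in lcp:
            if j == t:
                continue
            dx, dy = t[0] - j[0], t[1] - j[1]
            norm = math.hypot(dx, dy)  # > 0 because j != t
            weight = 1.0 / (1 + hops[j])  # HYPOTHETICAL closeness weight
            delta = pseudotime[t] - pseudotime[j]  # relative pseudotime
            vx += weight * delta * dx / norm
            vy += weight * delta * dy / norm
        vectors[t] = (vx, vy)
    return start, pseudotime, vectors


# Toy input: a strip of 10 pixels whose QP falls from left to right,
# plus one isolated pixel that lies outside the LCP.
pixel_qp = {(c, 0): [1.0 - 0.1 * c] for c in range(10)}
pixel_qp[(20, 20)] = [5.0]
start, pseudotime, vectors = infer_vectors(pixel_qp)
print(start)  # {(0, 0)}
```

With this toy input the start is the left-most pixel, pseudotime grows by one per pixel to the right, every arrow points to the right (away from the start), and the isolated pixel is ignored despite its high QP because it lies outside the LCP.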
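Application examples
On the two well-defined datasets above, VECTOR performed well, correctly identifying the developmental starting points and trajectories.
Applied to the other example datasets, the results were also good.
Comparison of VECTOR and RNA velocity (velocyto)
VECTOR performed better; the velocyto trajectories were truncated, which may be caused by the lack of intron reads in these cells. In addition, velocyto also struggles to identify the developmental starting point.
Next, VECTOR was applied to data with multiple developmental branches.
The results were good. The software also provides an option to select the developmental starting point manually.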
METHODS
The workflow of VECTOR
Given a two-dimensional UMAP representation of cells, VECTOR treats it as an image and then splits it into a number of pixels. We provide a parameter called "N" for defining the number of pixels in UMAP.
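A minimal sketch of this gridding step in Python follows. The function name is hypothetical, and it assumes `n` bins per axis over the bounding box of the embedding; the source only says that N defines the number of pixels, so this reading of N is an assumption. Only occupied pixels are returned, which corresponds to dropping empty pixels before the LCP network is built.

```python
def umap_to_pixels(umap_coords, n):
    """Split the bounding box of a 2-D embedding into an n x n grid.

    Returns {(col, row): [cell indices]}; pixels without cells never appear.
    ASSUMPTION: n is the number of bins per axis; VECTOR's N may be defined differently.
    """
    xs = [x for x, _ in umap_coords]
    ys = [y for _, y in umap_coords]
    x0, y0 = min(xs), min(ys)
    span_x = (max(xs) - x0) or 1.0  # avoid division by zero for degenerate input
    span_y = (max(ys) - y0) or 1.0
    pixels = {}
    for i, (x, y) in enumerate(umap_coords):
        col = min(int((x - x0) / span_x * n), n - 1)  # clamp the max edge into the last bin
        row = min(int((y - y0) / span_y * n), n - 1)
        pixels.setdefault((col, row), []).append(i)
    return pixels


coords = [(0.0, 0.0), (0.1, 0.05), (5.0, 5.0), (9.9, 9.8), (10.0, 10.0)]
print(umap_to_pixels(coords, 10))  # {(0, 0): [0, 1], (5, 5): [2], (9, 9): [3, 4]}
```

Only 3 of the 100 pixels are occupied here. A larger `n` gives finer pixels but more gaps between them, which can break the connected network into pieces.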
So the method involves not only data processing but also ideas from image processing.
Why not give it a try?
Life is good, and better with you.