How to map cell fate to branches?
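Pseudotime analysis produces many important outputs, but how should they be interpreted? Take the branch-point analysis result in the figure below as an example:
In the figure, each row is a gene, which is straightforward. The heatmap columns fall into three groups: Pre-branch, Cell fate 1, and Cell fate 2. What do these three column groups mean?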
Pre-branch
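To interpret the result, let's look at the state plot produced by the pseudotime analysis and ask which cells the Pre-branch corresponds to.
Here we want to compare state 7 with state 1, that is, to analyze branch point 3 (identify genes expressed in a branch-dependent manner). Which cells does the Pre-branch actually contain in this case?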
In fact, BEAM tries to traverse backward from the cell on the branch point all the way back to the root cell (the cell with pseudotime 0) and uses all those cells as the pre-branch.
From this explanation, the Pre-branch here contains the cells in states 2, 3, and 5.
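The backward traversal described in the quoted reply can be pictured with a toy example. This is only a sketch in base R, not Monocle's internal code: the data frame `cells`, the helper `trace_to_root`, the parent links, and the assignment of cells to states are all made up for illustration, chosen so that the walk passes through states 2, 3, and 5.

```r
# Toy trajectory, NOT Monocle's internal data structure: each cell stores
# its parent along the tree (the root has parent NA) and a hypothetical state.
cells <- data.frame(
  id     = c("c1", "c2", "c3", "c4", "c5", "c6"),
  parent = c(NA, "c1", "c2", "c3", "c4", "c4"),
  state  = c(2, 2, 3, 5, 1, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk from a starting cell back to the root,
# collecting every cell visited along the way.
trace_to_root <- function(cells, start) {
  path <- character(0)
  current <- start
  while (!is.na(current)) {
    path <- c(path, current)
    current <- cells$parent[match(current, cells$id)]
  }
  path
}

# Assume cell c4 sits on the branch point of interest.
pre_branch_cells  <- trace_to_root(cells, "c4")
pre_branch_states <- sort(unique(cells$state[cells$id %in% pre_branch_cells]))

print(pre_branch_cells)
print(pre_branch_states)
```

In this toy tree, the two children of `c4` (states 1 and 7) are the two branches being compared, and everything on the path from `c4` to the root forms the pre-branch.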
'cell fate 1' and 'cell fate 2'
What exactly do cell fate 1 and cell fate 2 refer to? Again taking branch point 3 as the example:
Cell fate 1 corresponds to the state with the smaller id (in this case, state 1) while cell fate 2 corresponds to the state with the bigger id (in this case, state 2)
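From this explanation, the mapping is as follows (the quoted reply names state 2 as the bigger id, but in the comparison of states 1 and 7 the bigger id is state 7):
- [x] Cell fate 1: state 1
- [x] Cell fate 2: state 7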
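The rule from the quoted reply (smaller state id becomes cell fate 1, bigger state id becomes cell fate 2) can be written down in a couple of lines. The helper `map_cell_fates` below is hypothetical and not part of Monocle; it only restates the rule in base R.

```r
# Hypothetical helper, not part of Monocle: label two compared states by id.
# The smaller id becomes "Cell fate 1", the bigger id becomes "Cell fate 2".
map_cell_fates <- function(states) {
  stopifnot(length(states) == 2)
  ordered <- sort(states)
  c("Cell fate 1" = ordered[1], "Cell fate 2" = ordered[2])
}

# Comparing state 7 and state 1, as in the branch point 3 example.
fates <- map_cell_fates(c(7, 1))
print(fates)
```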
Pre-branch in other scenarios
If we compare state 4 and state 7 instead, which cells make up the Pre-branch?
This is a very good question since state 4 relates to branch point 2 while state 7 relates to branch point 3. For this test, the pre-branch will only include cells from state 2.
In this case the Pre-branch contains only the cells in state 2.
Afterword
This post only covers how to interpret branch-dependent genes; other results will be explained in later posts.
About the plot_multiple_branches_pseudotime function
plot_multiple_branches_pseudotime: Create kinetic curves to demonstrate the bifurcation of gene expression along multiple branches.
This function makes it possible to compare several branches in a single analysis.
plot_multiple_branches_pseudotime(cds, branches, branches_name = NULL,
  min_expr = NULL, cell_size = 0.75, norm_method = c("vstExprs", "log"),
  nrow = NULL, ncol = 1, panel_order = NULL, color_by = "Branch",
  trend_formula = "~sm.ns(Pseudotime, df=3)", label_by_short_name = TRUE,
  TPM = FALSE, cores = 1)
# Example command (this one calls the related plot_multiple_branches_heatmap function)
plot_multiple_branches_heatmap(celltrajectory.monocle, branches = c(6,7),
cluster_rows = TRUE, hclust_method = "ward.D2", num_clusters = 6,
hmcols = NULL, add_annotation_row = NULL, add_annotation_col = NULL,
show_rownames = FALSE, use_gene_short_name = TRUE,
norm_method = c("vstExprs", "log"), scale_max = 3, scale_min = -3,
trend_formula = "~sm.ns(Pseudotime, df=3)", return_heatmap = FALSE,
cores = 1)
What does each column of the heatmap represent?
If you're looking for a deeper understanding of what the function is doing, I'd recommend digging into the source code for the function. The plot_genes_branched_heatmap function is in R/plotting.R, but it calls a nested function (buildBranchCellDataSet) that's contained in R/BEAM.R. I found it valuable to run through the code line by line and see what variables get made/changed.
But to briefly answer your question, monocle orders your cells along the trajectory, giving each cell a pseudotime value. Now, with expression values for each gene at different points in pseudotime (i.e. each cell), it uses a VGLM with splines to fit non-linear expression dynamics as a function of pseudotime. This model can then directly be used for differential expression if desired (e.g. using a likelihood ratio test against a reduced model that doesn't incorporate pseudotime).
For plotting a heatmap though, there's a problem: the pseudotime values for your cells do not increase by sequential integers (i.e. 1,2,3,..,n). This is because monocle was designed recognizing that the jumps between cells along a trajectory aren't always the same distance. So if you were to make a heatmap, your column representation of pseudotime wouldn't be linear; it would depend on your sampling density along the trajectory. It could go, for example, 1,1.15,1.25,5,6,6.25,10 (see the problem?). So what the plotting function does (more specifically, a function called genSmoothCurves) is use the constructed models from before to predict gene expression of all genes along 100 evenly spaced pseudotime values spanning the range, and then makes a heatmap of those predictions rather than your scRNA-Seq measurements themselves. Each column represents one of those 100 pseudotime values.
The branched heatmap function is similar, except things are ordered differently. Those modelled values are ordered from the middle of the heatmap outwards. The left and right directions represent the modelled expression for two separate branches of the trajectory. The small region in the middle that is symmetrical represents the "progenitors" (the nomenclature used by the devs) prior to the branchpoint, and the point moving outwards where that symmetry breaks is the bifurcation point of the two independent branches. Going through the source code for this would really help make this clear.
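The "middle outwards" layout described above can be mimicked with a toy matrix. This is a sketch under stated assumptions, not Monocle's actual implementation: `branch_a` and `branch_b` are made-up modelled values for 3 genes at 100 pseudotime points each, and the first `n_shared` columns are assumed to be the shared progenitor portion.

```r
n_genes  <- 3
n_points <- 100
n_shared <- 10   # assumed number of progenitor (pre-branch) columns

pseudotime <- seq(0, 1, length.out = n_points)

# Made-up modelled expression: both branches share the progenitor columns,
# then diverge in opposite directions after the branch point.
shared   <- outer(1:n_genes, pseudotime[1:n_shared])
late     <- outer(1:n_genes, pseudotime[(n_shared + 1):n_points])
branch_a <- cbind(shared, late + 1)
branch_b <- cbind(shared, late - 1)

# Reverse one branch and place it to the left of the other, so pseudotime
# runs from the middle of the matrix outwards in both directions.
heatmap_matrix <- cbind(branch_a[, n_points:1], branch_b)

# The columns around the middle mirror each other (the progenitors);
# symmetry breaks where the two branches diverge.
left_middle  <- heatmap_matrix[, n_points:(n_points - n_shared + 1)]
right_middle <- heatmap_matrix[, (n_points + 1):(n_points + n_shared)]
print(all.equal(left_middle, right_middle))
```

Reading such a matrix from the centre to the left follows one branch, and from the centre to the right follows the other, which matches the description of the symmetric progenitor region in the quoted reply.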
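In short, the range of pseudotime values is divided into 100 evenly spaced points (bins), and each column of the heatmap represents one of those pseudotime values.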
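The idea of predicting on 100 evenly spaced pseudotime values can be sketched in base R. Note the assumptions: this uses an ordinary `lm` with `splines::ns` as a simple stand-in for Monocle's VGLM fit, the pseudotime values are the uneven ones from the quoted example, and the expression values are invented purely for illustration.

```r
library(splines)

# Unevenly sampled pseudotime values (from the quoted example) and
# made-up expression values for a single gene (illustration only).
obs <- data.frame(
  pseudotime = c(1, 1.15, 1.25, 5, 6, 6.25, 10),
  expr       = c(0.2, 0.3, 0.35, 2.1, 2.6, 2.7, 1.0)
)

# Stand-in for Monocle's VGLM: a linear model on a natural spline basis.
fit <- lm(expr ~ ns(pseudotime, df = 3), data = obs)

# 100 evenly spaced pseudotime values spanning the observed range.
grid <- data.frame(
  pseudotime = seq(min(obs$pseudotime), max(obs$pseudotime), length.out = 100)
)

# Predicted (smoothed) expression at each grid point; in a heatmap, these
# 100 values would form one row, with one column per grid point.
smoothed <- predict(fit, newdata = grid)
print(length(smoothed))
```

Because the columns come from the evenly spaced grid rather than from the cells themselves, the horizontal axis of the heatmap is linear in pseudotime regardless of how densely the trajectory was sampled.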
References
Official explanation: How to map cell fate to branches?
plot_multiple_branches_pseudotime source code
Understanding plot_genes_branched_heatmap columns