RNA注釋
這里以inDrop 實驗數(shù)據(jù)舉例殊鞭,spliced/unspliced的RNA可以通過:
1.使用dropEst 輸出管道生成 類似10X的 bam 文件:
~/dropEst/build/dropest -m -F -L eiEIBA -o run1 -g cellranger/refdata-cellranger-mm10-1.2.0/genes/genes.gtf -c ~/dropEst/configs/indrop_v3.xml *.bam
2.使用velocyto.py來注釋spliced/unspliced的reads,寫出標準loom文件:
velocyto run -u Gene -o out -e SCG71 -m mm10_rmsk_srt.gtf -v SCG_71_tophat.filtered.sorted.bam UCSC/mm10/Annotation/Genes/genes.gtf
(注意登馒,也可以使用-V參數(shù)直接通過 DropEst 注釋spliced/unspliced的reads:)
~/dropEst/dropest -V -C 6000 -m -g ucsc_mm10_exons.gtf.gz -c ~/dropEst/configs/indrop_v3.xml *.aligned.bam)
下面的例子從 velocyto.py 生成的loom文件開始,使用 pagoda2 獲取細胞clusters/embedding笨篷,然后估計/可視化RNA速率怀喉。
加載R包:
library(velocyto.R)
Loading required package: Matrix
數(shù)據(jù)下載
下載預(yù)先計算的loom矩陣
wget http://pklab.med.harvard.edu/velocyto/mouseBM/SCG71.loom
讀取矩陣
ldat <-read.loom.matrices(url("http://pklab.med.harvard.edu/velocyto/mouseBM/SCG71.loom"))
#Error: is.character(name) is not TRUE
使用pagoda2標準化和聚類細胞
使用spliced 表達矩陣作為pagoda2的輸入,并查看分布脓恕。
emat <- ldat$spliced
hist(log10(colSums(emat)),col='wheat',xlab='cell size')
# this dataset has already been pre-filtered, but this is where one woudl do some filtering
emat <- emat[,colSums(emat)>=1e3]
pagoda2處理
pagoda2用于生成細胞嵌入、細胞聚類以及更精確的細胞距離矩陣窿侈。您也可以使用其他工具生成這些工具炼幔,如 Seurat等。
創(chuàng)建pagoda2對象史简,調(diào)整方差:
library(pagoda2)
r <- Pagoda2$new(emat,modelType='plain',trim=10,log.scale=T)
#2600 cells, 7301 genes; normalizing ... using plain model winsorizing ... log scale ... done.
r$adjustVariance(plot=T,do.par=T,gam.k=10)
#calculating variance fit ... using gam 137 overdispersed genes ... 137 persisting ... done.
運行基本分析步驟以生成細胞嵌入和聚類乃秀,并可視化:
r$calculatePcaReduction(nPcs=100,n.odgenes=3e3,maxit=300)
running PCA using 3000 OD genes ..
Loading required package: irlba
.. done
r$makeKnnGraph(k=30,type='PCA',center=T,distance='cosine');
Loading required package: igraph
Attaching package: ‘igraph’
The following objects are masked from ‘package:stats’:decompose, spectrum
The following object is masked from ‘package:base’:union
creating space of type angular done
adding data ... done
building index ... done
querying ... done
r$getKnnClusters(method=multilevel.community,type='PCA',name='multilevel')r$getEmbedding(type='PCA',embeddingType='tSNE',perplexity=50,verbose=T)
calculating distance ... pearson ...running tSNE using 16 cores:
Read the 2600 x 2600 data matrix successfully!
OpenMP is working...
Using no_dims = 2, perplexity = 50.000000, and theta = 0.500000
Computing input similarities...
Building tree...
Done in 9.36 seconds (sparsity = 0.084143)!
Learning embedding...
Iteration 50: error is 73.294295 (50 iterations in 3.70 seconds)
Iteration 100: error is 65.648297 (50 iterations in 3.19 seconds)
Iteration 150: error is 65.151406 (50 iterations in 3.17 seconds)
Iteration 200: error is 65.090414 (50 iterations in 3.24 seconds)
Iteration 250: error is 65.075571 (50 iterations in 3.27 seconds)
Iteration 300: error is 1.814009 (50 iterations in 3.11 seconds)
Iteration 350: error is 1.699950 (50 iterations in 3.08 seconds)
Iteration 400: error is 1.660607 (50 iterations in 3.01 seconds)
Iteration 450: error is 1.644676 (50 iterations in 2.98 seconds)
Iteration 500: error is 1.638744 (50 iterations in 2.96 seconds)
Iteration 550: error is 1.633982 (50 iterations in 3.00 seconds)
Iteration 600: error is 1.629732 (50 iterations in 2.99 seconds)
Iteration 650: error is 1.628101 (50 iterations in 3.12 seconds)
Iteration 700: error is 1.625991 (50 iterations in 3.27 seconds)
Iteration 750: error is 1.624012 (50 iterations in 3.15 seconds)
Iteration 800: error is 1.622847 (50 iterations in 3.27 seconds)
Iteration 850: error is 1.621847 (50 iterations in 3.31 seconds)
Iteration 900: error is 1.620831 (50 iterations in 3.34 seconds)
Iteration 950: error is 1.619936 (50 iterations in 3.21 seconds)
Iteration 1000: error is 1.618511 (50 iterations in 3.16 seconds)
Fitting performed in 63.52 seconds.
繪制嵌入、集群標記(左)和"Xist"表達圖(將男性和女性分開展示)
par(mfrow=c(1,2))r$plotEmbedding(type='PCA',embeddingType='tSNE',show.legend=F,mark.clusters=T,min.group.size=10,shuffle.colors=F,mark.cluster.cex=1,alpha=0.3,main='cell clusters')r$plotEmbedding(type='PCA',embeddingType='tSNE',colors=r$depth,main='depth')
treating colors as a gradient with zlim: 1000.9 2939
速率估計
準備矩陣和聚類數(shù)據(jù):
emat <- ldat$spliced; nmat <- ldat$unspliced; # disregarding spanning reads, as there are too few of thememat <- emat[,rownames(r$counts)]; nmat <- nmat[,rownames(r$counts)]; # restrict to cells that passed p2 filter# take cluster labelscluster.label <- r$clusters$PCA$multilevel # take the cluster factor that was calculated by p2cell.colors <- pagoda2:::fac2col(cluster.label)# take embedding form p2emb <- r$embeddings$PCA$tSNE
除了聚類和 t-SNE 嵌入圆兵,從 p2 處理跺讯,我們將采取細胞距離,優(yōu)于默認的R 通常會使用全轉(zhuǎn)錄體相關(guān)距離殉农。
cell.dist <- as.dist(1-armaCor(t(r$reductions$PCA)))
基于最低平均表達量(至少在其中一個cluster中)篩選基因刀脏,輸出生成的有效基因總數(shù):
emat <- filter.genes.by.cluster.expression(emat,cluster.label,min.max.cluster.average = 0.2)nmat <- filter.genes.by.cluster.expression(nmat,cluster.label,min.max.cluster.average = 0.05)length(intersect(rownames(emat),rownames(nmat)))
[1] 2541
估計RNA速率:
fit.quantile <- 0.02rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=1,kCells=25,cell.dist=cell.dist,fit.quantile=fit.quantile)
calculating cell knn ... done
calculating convolved matrices ... done
fitting gamma coefficients ... done. succesfful fit for 743 genes
filtered out 149 out of 743 genes due to low nmat-emat correlation
filtered out 27 out of 594 genes due to low nmat-emat slope
calculating RNA velocity shift ... done
calculating extrapolated cell state ... done
可視化 t-SNE 嵌入上的速率:
show.velocity.on.embedding.cor(emb,rvel.cd,n=200,scale='sqrt',cell.colors=ac(cell.colors,alpha=0.5),cex=0.8,arrow.scale=3,show.grid.flow=TRUE,min.grid.cell.mass=0.5,grid.n=40,arrow.lwd=1,do.par=F,cell.border.alpha = 0.1)
delta projections ... sqrt knn ... transition probs ... done
calculating arrows ... done
grid estimates ... grid.sd= 0.6580196 min.arrow.size= 0.01316039 max.grid.arrow.length= 0.05887327 done
特定基因可視化:
gene <- "Camp"gene.relative.velocity.estimates(emat,nmat,deltaT=1,kCells = 25,kGenes=1,fit.quantile=fit.quantile,cell.emb=emb,cell.colors=cell.colors,cell.dist=cell.dist,show.gene=gene,old.fit=rvel.cd,do.par=T)
calculating convolved matrices ... done
[1] 1
調(diào)整視圖
調(diào)整kCells,這會給我們一個更理想化的圖像視圖超凳,差異肉眼可見:
gene <- "Camp"gene.relative.velocity.estimates(emat,nmat,deltaT=1,kCells = 100,kGenes=1,fit.quantile=fit.quantile,cell.emb=emb,cell.colors=cell.colors,cell.dist=cell.dist,show.gene=gene,do.par=T)
calculating cell knn ... done
calculating convolved matrices ... done
[1] 1