RNA velocity analysis practice (3): generating loom files

Series recap:
1. RNA velocity analysis practice (1): file download and preprocessing
2. RNA velocity analysis practice (2): software installation

Velocyto extracts loom files differently depending on the sequencing platform; see the official velocyto documentation for the procedure that matches your data.

My practice data were sequenced on the Smartseq2 platform, so only that case is covered here.

velocyto includes a shortcut to perform the read counting for UMI-less, not stranded, full-length techniques such as SmartSeq2.

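As described in my earlier notes, you need to have the genome annotation file and the repeat_msk.gtf file ready. How long this step takes depends on the sequencing depth of your data and on your computer, but it generally does not exceed 6 hours. (In practice, my old computer really did run for 6 hours...)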

The bam files are already prepared.

The genome annotation file (mouse) has been downloaded.

I downloaded mine last year (vM22); the annotation has since been updated to vM25.

The repeat_msk.gtf file has been downloaded and decompressed.

(1) Generating the loom file

Change into the folder that contains the bam files:

$ velocyto run_smartseq2 -o /media/yanfang/FYWD/scRNA_seq/RNA_relocity/loom_files/ -m /media/yanfang/FYWD/RNA_seq/ref_genome/mm10_repeat_msk.gtf -e MyTissue /media/yanfang/FYWD/scRNA_seq/RNA_relocity/GSE99933_bam/*.bam /media/yanfang/FYWD/RNA_seq/ref_genome/gencode.vM22.annotation.gtf

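(The name "MyTissue" passed to -e becomes the prefix of the loom file that is generated.)
-o: the folder for the output file
-m: the repeat mask file (repeat_msk.gtf)
run_smartseq2: the subcommand that specifies which platform the sequencing data come from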
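
Because this step can run for hours, it may be worth checking the inputs before starting. The following is a minimal shell sketch, not part of velocyto: `check_inputs` and `count_bams` are hypothetical helper names introduced here, and it assumes bash and a `find` that supports `-maxdepth`.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names, not provided by velocyto).

# Report every path that does not exist; return non-zero if any is missing.
check_inputs() {
    local missing=0
    local path
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Print the number of *.bam files directly inside a folder
# (for Smartseq2 data, one bam file corresponds to one cell).
count_bams() {
    find "$1" -maxdepth 1 -name '*.bam' | wc -l | tr -d ' '
}
```

For example, with `MASK`, `GTF` and `BAM_DIR` (variable names chosen here) set to the paths used in the command above, `check_inputs "$MASK" "$GTF" "$BAM_DIR" && count_bams "$BAM_DIR"` prints the number of bam files only if all three inputs exist. That number should match the number of cells in the final matrix.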

A lot of messages are printed while it runs; a few lines appear for each bam file that is read, for example:

2020-05-19 17:13:10,863 - DEBUG - Reading /media/yanfang/FYWD/scRNA_seq/RNA_relocity/GSE99933_bam/E13.5_P9.bam
2020-05-19 17:13:11,197 - DEBUG - Read first 0 million reads
2020-05-19 17:13:22,762 - DEBUG - Counting for batch 768, containing 1 cells and 419294 reads
2020-05-19 17:13:37,676 - DEBUG - 42428 reads in repeat masked regions
2020-05-19 17:13:37,676 - DEBUG - 191333 reads overlapping with features on plus strand
2020-05-19 17:13:37,677 - DEBUG - 188152 reads overlapping with features on minus strand
2020-05-19 17:13:37,677 - DEBUG - 25321 reads overlapping with features on both strands
2020-05-19 17:13:41,095 - WARNING - The barcode selection mode is off, no cell events will be identified by <80 counts
2020-05-19 17:13:41,095 - WARNING - 0 of the barcodes where without cell

Once all the bam files have been read, it prints:

2020-05-19 17:13:41,161 - DEBUG - Counting done!
2020-05-19 17:13:41,161 - DEBUG - Example of barcode: E13.5_D9.bam and cell_id: MyTissue:E13.5_D9.bam
2020-05-19 17:13:41,161 - DEBUG - Generating output file /media/yanfang/FYWD/scRNA_seq/RNA_relocity/loom_files/MyTissue.loom
2020-05-19 17:13:41,161 - DEBUG - Collecting row attributes
2020-05-19 17:13:41,324 - DEBUG - Generating data table
2020-05-19 17:13:44,062 - DEBUG - Writing loom file
2020-05-19 17:13:53,902 - DEBUG - Terminated Succesfully!
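
If the output is saved to a file, the run can be checked afterwards instead of scrolling through the terminal. This is a minimal sketch, assuming the messages were captured with something like `velocyto run_smartseq2 ... 2>&1 | tee velocyto.log`. The log file name and both helper functions are made up for illustration, and the search patterns are taken from the messages shown above.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names) for a saved velocyto log.

# Print how many bam files the log reports as being read.
bams_read() {
    grep -c 'DEBUG - Reading ' "$1"
}

# Succeed only if the log contains the final success line
# (spelled "Succesfully" in the output shown above).
run_finished() {
    grep -q 'Terminated Succesfully!' "$1"
}
```

Comparing `bams_read velocyto.log` with the number of bam files in the input folder is a quick way to see whether every cell was processed, and `run_finished velocyto.log && echo done` confirms that the loom file was written.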

This produces the loom file MyTissue.loom in the output folder.

(2) Reading the loom file

References:
https://www.cnblogs.com/raisok/p/12425258.html
http://pklab.med.harvard.edu/velocyto/notebooks/R/SCG71.nb.html
https://satijalab.org/loomR/loomR_tutorial.html
https://bustools.github.io/BUS_notebooks_R/velocity.html

R is used as the example here:

> library(devtools)
> install_github("velocyto-team/velocyto.R")
> library(velocyto.R)
#load data
> input_loom <- "MyTissue.loom"
> adata <- read.loom.matrices(input_loom)

Let's look at what the adata object contains.

Here spliced and unspliced are the values we need for mature and immature mRNA respectively; spanning is the count of reads that span an intron and an exon.

# Use the spliced data as the input data
> spliced_adata <- adata$spliced
> dim(spliced_adata)
[1] 55487   768   # more than 50,000 genes, 768 cells

NOTE
You can also read the bam files directly in R (for Smartseq2 data). I have not run this myself; it is simply copied from the code provided by the paper's authors (source: https://github.com/velocyto-team/velocyto-notebooks/blob/master/R/chromaffin.Rmd):

> path <- "data/e12.5.bams"
> files <- system(paste('find',path,'-name "*unique.bam" -print'),intern=T)
> names(files) <- gsub(".*\\/(.*)_unique.bam","\\1",files)
# parse gene annotation, annotate bam file reads
> dat <- read.smartseq2.bams(files,"data/genes.refFlat",n.cores=40)

At this point we have the count matrices we need. What follows is the RNA velocity analysis itself.

Last edited: 2020.05.21 09:35:55


Reposting is not permitted; to repost, please contact the author by private message or comment.