Overview
SPRINT is a tool for detecting RNA editing sites, published by Zhang et al. in Bioinformatics in 2017 under the title "SPRINT: an SNP-free toolkit for identifying RNA editing sites". Unlike traditional RES (RNA editing site) detection methods, it does not depend on SNP sites recorded in databases.
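In short, because RNA editing usually occurs in clusters, SPRINT defines the concept of an SNV duplet: if two adjacent SNV sites on the genome are closer together than a given threshold, they are called an SNV duplet, and both SNV sites are defined as RES. The duplet threshold can take different values in different regions of the genome (for example, Alu regions tend to undergo more RNA editing, so the threshold for Alu regions is set lower).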
Reading the SPRINT paper
Introduction
RNA editing falls mainly into two types, A-to-I and C-to-U; about 95% of the RNA editing that occurs in human tissues is A-to-I.
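The traditional approach to RES detection first compares the RNA-Seq data against a reference genome or reference transcriptome to find all SNVs (Single Nucleotide Variants), then filters out the SNP sites that are already present in the genome; what remains is taken to be the RES.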
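A-to-I RES have been found to occur in clusters on the genome, whereas SNPs have a very low density on the genome and different SNPs occur independently of one another. SPRINT therefore defines two adjacent SNVs of the same variant type as an SNV duplet, and uses the different distributions of SNV duplets to distinguish SNPs from RES.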
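The duplet idea can be illustrated with a minimal Python 3 sketch. It is not SPRINT's actual implementation: the `SNV` record, the function names, and the decision to look for neighbours within each chromosome and variant type are assumptions made for this example, and the default of 200 simply mirrors the `-cd` default listed in the usage section.

```python
from collections import namedtuple

# Hypothetical record type for this sketch; SPRINT's own data structures may differ.
SNV = namedtuple("SNV", ["chrom", "pos", "change"])  # change is e.g. "AG" for A-to-G


def find_snv_duplets(snvs, max_distance=200):
    """Return pairs of neighbouring same-type SNVs closer than max_distance bp."""
    groups = {}
    for snv in snvs:
        # Assumption: adjacency is evaluated per chromosome and per variant type.
        groups.setdefault((snv.chrom, snv.change), []).append(snv)

    duplets = []
    for key in sorted(groups):
        ordered = sorted(groups[key], key=lambda s: s.pos)
        # Compare each SNV with its immediate right-hand neighbour.
        for left, right in zip(ordered, ordered[1:]):
            if right.pos - left.pos < max_distance:
                duplets.append((left, right))
    return duplets


def candidate_res(snvs, max_distance=200):
    """Collect every SNV that belongs to at least one duplet."""
    sites = set()
    for left, right in find_snv_duplets(snvs, max_distance):
        sites.add(left)
        sites.add(right)
    return sorted(sites)


if __name__ == "__main__":
    example = [
        SNV("chr1", 100, "AG"),
        SNV("chr1", 150, "AG"),   # 50 bp from the previous A-to-G SNV: a duplet
        SNV("chr1", 5000, "AG"),  # isolated, behaves more like an SNP
        SNV("chr1", 160, "CT"),   # different variant type, not paired with A-to-G
    ]
    for site in candidate_res(example):
        print(site)
```

Isolated SNVs drop out because they have no same-type neighbour within the cutoff, which is the property the text uses to separate sparsely distributed SNPs from clustered RES.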
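In addition, for reads that fail to align to the genome, Porath et al. replaced every A with G and then aligned the reads to the reference genome again, which revealed that some regions of the genome carry a large amount of RNA editing; this phenomenon is called RNA hyper-editing. Using the same approach, SPRINT can also detect hyper-RES.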
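The A-to-G replacement trick can be sketched in a few lines of Python 3. This is only an illustration: the function names are hypothetical, the sketch assumes the reference is masked in the same way as the read (the text above does not spell this out), and it assumes an ungapped alignment of equal length rather than calling a real aligner.

```python
def mask_a_to_g(sequence):
    """Replace every A with G, preserving case, so A/G differences disappear."""
    return sequence.translate(str.maketrans("Aa", "Gg"))


def a_to_g_mismatches(read, reference):
    """List positions where the reference has A and the read has G.

    Assumes the read and the reference segment are already aligned
    base by base (no gaps) and have the same length.
    """
    if len(read) != len(reference):
        raise ValueError("read and reference segment must have the same length")
    pairs = zip(read.upper(), reference.upper())
    return [i for i, (r, g) in enumerate(pairs) if g == "A" and r == "G"]


if __name__ == "__main__":
    reference = "ACATTAGA"
    read = "GCGTTGGG"  # heavily edited: every reference A is read as G

    # After masking, the edited read and the reference become identical,
    # so a read that was too divergent to align can now be placed.
    print(mask_a_to_g(read) == mask_a_to_g(reference))

    # The editing candidates are then recovered from the unmasked sequences.
    print(a_to_g_mismatches(read, reference))
```

Once the masked read has been placed, comparing the original sequences recovers the individual A-to-G mismatches, which are the candidate hyper-editing sites.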
Methods
Specifically, the SPRINT workflow is as follows:
Installing SPRINT
Installing the latest release, SPRINT v0.1.8, is very simple: first download the source package from https://github.com/jumphone/SPRINT, then install it with pip in a Python 2.7 environment:
pip install SPRINT-master.zip
Using SPRINT
Prepare: Mask reference genome and build mapping index
sprint prepare [options] reference_genome(.fa) bwa_path
[options]:
-t transcript_annotation(.gtf)    # Optional
Main: Identify regular- and hyper- RESs
sprint main [options] reference_genome(.fa) output_path bwa_path samtools_path
[options]:
-1 read1(.fq)    # Required !
-2 read2(.fq)    # Optional
-rp repeat_file    # Optional, you can download it from http://sprint.software/SPRINT/dbrep/
-ss INT    # when input is strand-specific sequencing data, please specify the direction of read1. [0 for antisense; 1 for sense] (default is 0)
-c INT    # Remove the first INT bp of each read (default is 0)
-p INT    # Mapping CPU (default is 1)
-cd INT    # The distance cutoff of SNV duplets (default is 200)
-csad1 INT    # Regular - [-rp is required] cluster size - Alu - AD >=1 (default is 3)
-csad2 INT    # Regular - [-rp is required] cluster size - Alu - AD >=2 (default is 2)
-csnar INT    # Regular - [-rp is required] cluster size - nonAlu Repeat - AD >=1 (default is 5)
-csnr INT    # Regular - [-rp is required] cluster size - nonRepeat - AD >=1 (default is 7)
-csrg INT    # Regular - [without -rp] cluster size - AD >=1 (default is 5)
-csahp INT    # Hyper - [-rp is required] cluster size - Alu - AD >=1 (default is 5)
-csnarhp INT    # Hyper - [-rp is required] cluster size - nonAlu Repeat - AD >=1 (default is 5)
-csnrhp INT    # Hyper - [-rp is required] cluster size - nonRepeat - AD >=1 (default is 5)
-cshp INT    # Hyper - [without -rp] cluster size - AD >=1 (default is 5)
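When `sprint main` is run over many samples, it can help to assemble the command line programmatically. The following is a minimal Python 3 sketch, kept separate from the Python 2.7 environment that SPRINT itself needs. The helper name `build_sprint_main_cmd` and all file paths are hypothetical; only flags that appear in the option list above are used.

```python
def build_sprint_main_cmd(reference, output_dir, bwa, samtools, read1,
                          read2=None, repeat_file=None, cpus=None,
                          duplet_distance=None):
    """Build the argument list for `sprint main` from the documented options."""
    if not read1:
        # -1 is the only option marked as required in the usage text.
        raise ValueError("read1 (-1) is required")

    cmd = ["sprint", "main", "-1", read1]
    if read2:
        cmd += ["-2", read2]                    # second mate, paired-end only
    if repeat_file:
        cmd += ["-rp", repeat_file]             # repeat annotation
    if cpus is not None:
        cmd += ["-p", str(cpus)]                # mapping CPUs
    if duplet_distance is not None:
        cmd += ["-cd", str(duplet_distance)]    # SNV duplet distance cutoff

    # Positional arguments come last, in the order given by the usage line.
    cmd += [reference, output_dir, bwa, samtools]
    return cmd


if __name__ == "__main__":
    # All paths below are placeholders for illustration.
    cmd = build_sprint_main_cmd(
        reference="hg38.fa",
        output_dir="sprint_out",
        bwa="/path/to/bwa",
        samtools="/path/to/samtools",
        read1="sample_1.fastq",
        read2="sample_2.fastq",
        repeat_file="hg38_repeat.bed",
        cpus=8,
    )
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` in an environment where the `sprint` executable is on the PATH.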
Start from aligned reads
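For BAM files that have already been aligned, the sprint_from_bam command can be used to look for RES. However, hyper RES cannot be found from the BAM file alone, because detecting them requires the unmapped reads left over by the aligner. To obtain hyper RES, first use samtools to extract the unmapped reads from the BAM file, convert them to fastq format, and then run the standard two-step sprint workflow described above on those unmapped reads.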
sprint_from_bam [options] aligned_reads(.bam) reference_genome(.fa) output_path samtools_path
[options]:
-rp repeat_file    # Optional, you can download it from http://sprint.software/SPRINT/dbrep/
-cd INT    # The distance cutoff of SNV duplets (default is 200)
-csad1 INT    # Regular - [-rp is required] cluster size - Alu - AD >=1 (default is 3)
-csad2 INT    # Regular - [-rp is required] cluster size - Alu - AD >=2 (default is 2)
-csnar INT    # Regular - [-rp is required] cluster size - nonAlu Repeat - AD >=1 (default is 5)
-csnr INT    # Regular - [-rp is required] cluster size - nonRepeat - AD >=1 (default is 7)
-csrg INT    # Regular - [without -rp] cluster size - AD >=1 (default is 5)
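The unmapped-read route for hyper RES mentioned at the start of this section can be sketched in Python 3 as follows. The function names and file names are hypothetical. The sketch assumes a reasonably recent samtools that provides the `fastq` subcommand (older builds may not), and it treats the data as single-end: `-f 4` keeps reads flagged as unmapped, and mate handling for paired-end data is left out.

```python
import subprocess


def unmapped_fastq_commands(bam, unmapped_bam, samtools="samtools"):
    """Return the two samtools commands: extract unmapped reads, then convert to fastq."""
    # -f 4 keeps only reads whose "unmapped" flag is set; -b writes BAM output.
    view_cmd = [samtools, "view", "-b", "-f", "4", "-o", unmapped_bam, bam]
    # `samtools fastq` writes fastq records to stdout when no output option is given.
    fastq_cmd = [samtools, "fastq", unmapped_bam]
    return view_cmd, fastq_cmd


def extract_unmapped(bam, unmapped_bam, fastq_path, samtools="samtools"):
    """Run both commands and store the fastq output in fastq_path."""
    view_cmd, fastq_cmd = unmapped_fastq_commands(bam, unmapped_bam, samtools)
    subprocess.run(view_cmd, check=True)
    with open(fastq_path, "w") as handle:
        subprocess.run(fastq_cmd, stdout=handle, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; file names are placeholders.
    for command in unmapped_fastq_commands("aligned.bam", "unmapped.bam"):
        print(" ".join(command))
```

The resulting fastq file can then be given to `sprint main` through `-1`, after `sprint prepare` has been run on the reference.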
Hands-on example
cd /local/txm/txmdata/scRNA_editing/SRRdata/SRR7311317/sprinttest/
sprint prepare -t ./Homo_sapiens.GRCh38.87.chr.gtf ./hg38.fa /local/txm/anaconda3/envs/py2/bin/bwa
sprint main -rp ./hg38_repeat.bed -p 8 -1 ../SRR7311317_1.fastq -2 ../SRR7311317_2.fastq ./hg38.fa ./ /local/txm/anaconda3/envs/py2/bin/bwa /local/txm/txmdata/scRNA_editing/SPRINT-master/samtools_and_bwa/samtools
References
https://academic.oup.com/bioinformatics/article/33/22/3538/4004872
https://github.com/jumphone/SPRINT
https://github.com/jumphone/SPRINT/blob/master/SPRINT_manual.pdf