This article is reposted from: http://www.cnblogs.com/ZHshuang463508120/p/3593679.html
1. Installing miRDeep2
Download and unpack
wget http://mdc.helmholtz.de/38350089/en/research/research_teams/systems_biology_of_gene_regulatory_elements/projects/miRDeep/mirdeep2_0_0_5.zip
unzip mirdeep2_0_0_5.zip
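If you install with the install.pl script bundled with miRDeep2, you may find that some of the files it tries to download, such as bowtie, no longer exist.
In that case you need to install several programs yourself. The README in the unpacked directory describes in detail how to install miRDeep2 manually, but a few details need adjusting.
First, download the required packages. Here they are all downloaded to /home/disk6/src and unpacked there too, so each install block below starts from that directory.
(PS: for the download URLs of all the dependencies, see the README in the unpacked mirdeep2 directory.)
bowtie                  # version 0.12.7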
ViennaRNA-1.8.5.tar.gz
squid-1.9g.tar.gz
randfold-2.0.tar.gz
PDF-API2-0.73.tar.gz
perl                    # my version is 5.10.1
~~~~~~~~~~Install bowtie
unzip bowtie-0.12.7-linux-x86_64.zip
After unpacking you get ready-to-run binaries, so nothing needs to be compiled.
Add the bowtie directory to the PATH environment variable.
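One way to do this is sketched below. It assumes the archive unpacked into /home/disk6/src/bowtie-0.12.7 (a guess based on the archive name; use whatever directory unzip actually created). The helper name path_append is made up for this example. It skips directories that are already listed, so the lines can sit in ~/.bash_profile without PATH growing on every login.

```shell
# Append a directory to PATH only if it is not already listed.
# path_append is a hypothetical helper name, not part of bowtie or miRDeep2.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: do nothing
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Assumed unpack location of the bowtie binaries.
path_append /home/disk6/src/bowtie-0.12.7
```

The same helper can be reused later for the randfold and miRDeep2 directories.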
~~~~~~~~~Install ViennaRNA
tar -zxf ViennaRNA-1.8.5.tar.gz
cd ViennaRNA-1.8.5
./configure --prefix=/home/disk6/tools/ViennaRNA  # /home/disk6/tools/ is where I install software: I put commonly used tools here (or symlink them with ln -s into a directory under tools) and then add each one to PATH
make
make install
~~~~~~~~~Install squid-1.9g.tar.gz and randfold-2.0.tar.gz
tar -zxf squid-1.9g.tar.gz
cd squid-1.9g
./configure --prefix=/home/disk6/tools/squid    # squid.h only exists after configure has run, and randfold 2.0 below needs that file
make
make install
tar -zxf randfold-2.0.tar.gz
cd randfold2.0
Edit the Makefile and replace the line that starts with INCLUDE=-I with INCLUDE=-I. -I/home/disk6/src/squid-1.9g/ -L/home/disk6/src/squid-1.9g/
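If you would rather not edit the file by hand, the same substitution can be scripted. This is a minimal sketch assuming GNU sed (for the -i option) and the squid source path used above. The function name patch_randfold_makefile and the variable SQUID_SRC are names chosen for this example.

```shell
# Rewrite the INCLUDE line of a randfold Makefile in place (GNU sed syntax).
# patch_randfold_makefile and SQUID_SRC are names chosen for this sketch.
SQUID_SRC=/home/disk6/src/squid-1.9g
patch_randfold_makefile() {
    sed -i "s|^INCLUDE=-I.*|INCLUDE=-I. -I${SQUID_SRC}/ -L${SQUID_SRC}/|" "$1"
}

# Usage, from inside the randfold source directory:
#   patch_randfold_makefile Makefile
```

Using | as the sed delimiter avoids having to escape the slashes in the paths.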
make
Add randfold to PATH.
~~~~~~~~~~~~Install PDF-API2-0.73.tar.gz
tar -zxf PDF-API2-0.73.tar.gz
cd PDF-API2-0.73
mkdir ../mirdeep2/lib/   # do not forget this: mirdeep2 was unpacked at the very start, so create a lib directory inside it
perl Makefile.PL PREFIX=/home/disk6/src/mirdeep2 LIB=/home/disk6/src/mirdeep2/lib
make
make test
make install   # at this point /home/disk6/src/mirdeep2/lib contains two directories, PDF and x86_64-linux-thread-multi
~~~~~~~~~~~~Configure PERL5LIB for miRDeep2 (so that Perl can find the PDF module just installed)
Add the following line to ~/.bash_profile:
export PERL5LIB=$PERL5LIB:/home/disk6/src/mirdeep2/lib
~~~~~~~~~Test that everything installed works
To test whether everything is installed properly, type in:
1) bowtie
2) RNAfold -h
3) randfold
4) make_html.pl
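Before running the four commands one by one, it can help to confirm that they are on PATH at all. The loop below is a small sketch of that check. check_tools is a hypothetical helper name, and it only tests whether each command can be found, not whether it runs correctly.

```shell
# Report which of the given commands are missing from PATH.
# check_tools is a hypothetical helper; it prints one line per tool and
# returns non-zero if any tool was not found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The four programs from the list above; '|| true' keeps the shell going
# even when something is missing.
check_tools bowtie RNAfold randfold make_html.pl || true
```

Any tool reported as MISSING usually means its directory has not been added to PATH yet, or the profile has not been re-read with source ~/.bash_profile.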
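~~~~~~~~~~Finally, add the miRDeep2 directory to PATH
2. Introduction to miRDeep2
The miRDeep2 directory ships with a tutorial; working through this example is a good way to learn miRDeep2.
The tutorial_dir directory contains the following files (.fa means FASTA format):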
cel_cluster.fa:                  # genome file of the species under study
mature_ref_this_species.fa:      # mature miRNAs of the species under study, downloadable from miRBase
mature_ref_other_species.fa:     # mature miRNAs of related species, downloadable from miRBase
precursors_ref_this_species.fa:  # miRNA precursors of the species under study, downloadable from miRBase
reads.fa:                        # deep sequencing reads
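~~~~~~~~~~Step 1~~~~~~~~~
# Build an index of the genome file with bowtie-build
bowtie-build cel_cluster.fa cel_cluster
# cel_cluster.fa is the genome file and cel_cluster is the prefix of the index files.
# The prefix can be any string; it does not have to match the name of the genome file.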
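Indexing a real genome takes a while, so it is worth skipping the step when the index is already there. The sketch below assumes the bowtie 1 naming scheme, in which an index consists of six files: prefix.1.ebwt to prefix.4.ebwt, plus prefix.rev.1.ebwt and prefix.rev.2.ebwt. The function name bowtie_index_exists is made up for this example.

```shell
# Return 0 if all six bowtie 1 index files exist for the given prefix.
# bowtie_index_exists is a name made up for this sketch.
bowtie_index_exists() {
    for suffix in 1 2 3 4 rev.1 rev.2; do
        [ -f "$1.$suffix.ebwt" ] || return 1
    done
    return 0
}

# Build the index only when it is missing and the inputs are available.
if ! bowtie_index_exists cel_cluster \
        && command -v bowtie-build >/dev/null 2>&1 \
        && [ -f cel_cluster.fa ]; then
    bowtie-build cel_cluster.fa cel_cluster
fi
```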
~~~~~~~~~~Step 2~~~~~~~~~
# Process the reads file and map it to the genome
perl mapper.pl reads.fa -c -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m -p cel_cluster -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v
Parameters explained
-c  the input file is in FASTA format; the related options are -a (seq.txt format), -b (qseq.txt format), -e (fastq format) and -d (config file)
-j  discard entries containing non-canonical letters (anything other than a,c,g,t,u,n,A,C,G,T,U,N)
-k  clip the 3' adapter; followed by the adapter sequence, which in the example is TCGTATGCCGTCTTCTGCTTGT
-l  discard reads shorter than the given length; in the example, reads shorter than 18 nt are discarded
-m  collapse the reads (identical reads are merged into a single entry)
-p  map the processed reads to the previously indexed genome, cel_cluster in the example
-s  write the processed reads to a file, reads_collapsed.fa in the example
-t  write the mapping results to a file, reads_collapsed_vs_genome.arf in the example
-v  show progress on screen. Note 1 shows the difference with and without -v: with -v the screen shows not only the final summary but also what the mapper is doing (discarding, clipping, collapsing, trimming); without -v only the summary is shown
Parameters not used in the example
Processing/mapping parameters
-g  prefix for the read IDs; the default is seq, so the read IDs in the -s and -t output files start with the three letters seq
-h  parse to FASTA format
-i  convert the RNA alphabet to DNA (to map against the genome)
-q  map with one mismatch in the seed (mapping takes longer)
-r  maximum number of genome positions a read may map to; the default is 5
-u  do not remove the directory with temporary files
-n  overwrite existing files
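To make the -m option more concrete, here is a rough illustration of what collapsing means. It is not mapper.pl's actual implementation. It assumes a FASTA file with exactly one sequence line per record, and it assumes collapsed IDs of the form prefix_number_xcount (for example seq_0_x3); the exact numbering scheme used by mapper.pl may differ. The function name collapse_reads_sketch is invented for this example.

```shell
# Merge identical sequences and record how often each one occurred.
# $1: FASTA file with single-line sequences; $2: three-letter ID prefix.
# collapse_reads_sketch is an illustrative name, not a miRDeep2 script.
collapse_reads_sketch() {
    grep -v '^>' "$1" |              # keep sequence lines only
        sort | uniq -c |             # count identical sequences
        sort -k1,1nr -k2,2 |         # most abundant first, ties by sequence
        awk -v p="$2" '{ printf(">%s_%d_x%d\n%s\n", p, NR - 1, $1, $2) }'
}

# Usage:
#   collapse_reads_sketch reads.fa seq > collapsed_sketch.fa
```

Collapsing reduces the number of sequences that have to be mapped while the read count is preserved in the ID, which is why the downstream steps can still report expression levels.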
~~~~~~~~~~Step 3~~~~~~~~~
# Fast quantitation of reads mapping to known miRBase precursors.
(This step is not required for identification of known and novel miRNAs in the deep sequencing data when using miRDeep2.pl.)
quantifier.pl -p precursors_ref_this_species.fa -m mature_ref_this_species.fa -r reads_collapsed.fa -t cel -y 16_19
Parameters explained
-p  miRNA precursor file, downloadable from miRBase
-m  mature miRNA sequence file, downloadable from miRBase
-r  reads file
-t  species; when given, only data for that species is considered in the analysis. If omitted, all species are analysed
-y [time]  optional timestamp used in the output names (16_19 in the example); if omitted, a new one is generated
Output shown on screen
getting samples and corresponding read numbers
seq?????374333 reads
Converting input files
building bowtie index
mapping mature sequences against index
# reads processed: 174
# reads with at least one reported alignment: 6 (3.45%)
# reads that failed to align: 168 (96.55%)
Reported 6 alignments to 1 output stream(s)
mapping read sequences against index
# reads processed: 1505
# reads with at least one reported alignment: 1088 (72.29%)
# reads that failed to align: 417 (27.71%)
Reported 1099 alignments to 1 output stream(s)
analyzing data
6 mature mappings to precursors
Expressed miRNAs are written to expression_analyses/expression_analyses_16_19/miRNA_expressed.csv
not expressed miRNAs are written to
expression_analyses/expression_analyses_16_19/miRNA_not_expressed.csv
Creating miRBase.mrd file
after READS READ IN thing
make_html2.pl -q expression_analyses/expression_analyses_16_19/miRBase.mrd -k
mature_ref_this_species.fa -z -t C.elegans -y 16_19??-o -i
expression_analyses/expression_analyses_16_19/mature_ref_this_species_mapped.arf??-l -m cel
miRNAs_expressed_all_samples_16_19.csv
miRNAs_expressed_all_samples_16_19.csv file with miRNA expression values
parsing miRBase.mrd file finished
creating PDF files
creating pdf for cel-mir-39 finished
creating pdf for cel-mir-40 finished
creating pdf for cel-mir-37 finished
creating pdf for cel-mir-36 finished
creating pdf for cel-mir-38 finished
creating pdf for cel-mir-41 finished
#
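This produces several outputs: expression_16_19.html, the expression_analyses directory (which contains many files), miRNAs_expressed_all_samples_16_19.csv, and the pdfs_16_19 directory.
~~~~~~~~~~Step 4~~~~~~~~~
# Identify known and novel miRNAs in the deep sequencing data
miRDeep2.pl reads_collapsed.fa cel_cluster.fa reads_collapsed_vs_genome.arf mature_ref_this_species.fa mature_ref_other_species.fa precursors_ref_this_species.fa -t C.elegans 2> report.log
# reads_collapsed.fa: the reads processed by mapper.pl
# cel_cluster.fa: the genome file
# reads_collapsed_vs_genome.arf: the mapping results
# mature_ref_this_species.fa: mature miRNAs of the species under study, downloadable from miRBase
# mature_ref_other_species.fa: mature miRNAs of related species, downloadable from miRBase
# precursors_ref_this_species.fa: miRNA precursors of the species under study, downloadable from miRBase
# If you only have the reads, the arf file and the genome file, write the word none in place of each missing file (the usage line shows these three arguments as miRNAs_ref/none miRNAs_other/none precursors/none): no mature miRNAs for this species, none for related species, and no precursors.
Parameters explained
-t  species
2> report.log  redirects standard error, where the processing steps are logged, to the file report.log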
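The 2> redirection is a feature of the shell, not of miRDeep2: it sends only the standard error stream to the file and leaves standard output alone. The toy example below shows the effect with a stand-in function; fake_tool is purely illustrative and has nothing to do with miRDeep2's real output.

```shell
# Stand-in for a program that writes results to stdout and progress to stderr.
# fake_tool is purely illustrative; it is not miRDeep2.
fake_tool() {
    echo "result line"            # standard output
    echo "#progress message" >&2  # standard error
}

log=$(mktemp)
captured=$(fake_tool 2> "$log")   # stderr goes to the file, stdout is captured

echo "stdout: $captured"
echo "log:    $(cat "$log")"
```

To keep both streams in one file instead, the usual shell idiom is > report.log 2>&1.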
# Screen output
#####################################
#                                   #
# miRDeep2                          #
#                                   #
# last change: 07/07/2011           #
#                                   #
#####################################
miRDeep2 started at 19:44:43
#Starting miRDeep2
#testing input files
#Quantitation of known miRNAs in data
#parsing genome mappings
#excising precursors
#preparing signature
#folding precursors
#computing randfold p-values
#running miRDeep core algorithm
#running permuted controls
#doing survey of accuracy
#producing graphic results
miRDeep runtime:
started: 19:44:43
ended: 19:46:15
total:0h:1m:32s
~~~~~~~~~~Step 5~~~~~~~~~
# Browse the results
Open the .html file in a browser.
Note that cel-miR-37 is predicted twice, because two potential precursors at this locus can both fold into hairpin structures. However, the annotated hairpin scores far higher than the unannotated one (miRDeep2 score 6.1e+4 vs. -0.2).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~Note 1~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
######With -v, the screen output is as follows####
discarding sequences with non-canonical letters
clipping 3' adapters
discarding short reads
collapsing reads
mapping reads to genome index
# reads processed: 1609
# reads with at least one reported alignment: 470 (29.21%)
# reads that failed to align: 1139 (70.79%)
Reported 480 alignments to 1 output stream(s)
trimming unmapped nts in the 3' ends
######Without -v, the screen output is as follows####
# reads processed: 1609
# reads with at least one reported alignment: 470 (29.21%)
# reads that failed to align: 1139 (70.79%)
Reported 480 alignments to 1 output stream(s)
~~~~~~~~~~~~~~Note 1~~~~~~~~~~~~~~~~~~