有人問(wèn)miRNA靶基因如何預(yù)測(cè)的問(wèn)題蚁鳖,鑒于自己也做過(guò)相關(guān)的分析昏兆,稍有點(diǎn)經(jīng)驗(yàn)鸳碧,總結(jié)記錄一下各淀。
一懒鉴、miranda預(yù)測(cè)
- 軟件安裝
$ wget http://cbio.mskcc.org/microrna_data/miRanda-aug2010.tar.gz
$ tar -xvf miRanda-aug2010.tar.gz
$ cd miRanda-3.3a/
$ ./configure --prefix=/Lustre01/user/software/miRanda-3.3a/
$ make && make install
$ cd ***/miRanda-3.3a/bin #你自己的軟件路徑
$ ./miranda --help #產(chǎn)看幫助信息和參數(shù)等等,具體如下
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
miranda v3.3a microRNA Target Scanning Algorithm
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
(c) 2003 Memorial Sloan-Kettering Cancer Center, New York
Authors: Anton Enright, Bino John, Chris Sander and Debora Marks
(mirnatargets (at) cbio.mskcc.org - reaches all authors)
Software written by: Anton Enright
Distributed for anyone to use under the GNU Public License (GPL),
See the files 'COPYING' and 'LICENSE' for details
If you use this software please cite:
Enright AJ, John B, Gaul U, Tuschl T, Sander C and Marks DS;
(2003) Genome Biology; 5(1):R1.
miranda comes with ABSOLUTELY NO WARRANTY;
This is free software, and you are welcome to redistribute it
under certain conditions; type `miranda --license' for details.
miRanda is an miRNA target scanner which aims to predict mRNA
targets for microRNAs using dynamic-programming alignment and
thermodynamics.
Usage: miranda file1 file2 [options..]
Where:
'file1' is a FASTA file with a microRNA query
'file2' is a FASTA file containing the sequence(s)
to be scanned.
OPTIONS
--help -h Display this message
--version -v Display version information
--license Display license information
Core algorithm parameters:
-sc S Set score threshold to S [DEFAULT: 140.0]
-en -E Set energy threshold to -E kcal/mol [DEFAULT: 1.0]
-scale Z Set scaling parameter to Z [DEFAULT: 4.0]
-strict Demand strict 5' seed pairing [DEFAULT: off]
Alignment parameters:
-go -X Set gap-open penalty to -X [DEFAULT: -4.0]
-ge -Y Set gap-extend penalty to -Y [DEFAULT: -9.0]
General Options:
-out file Output results to file [DEFAULT: off]
-quiet Output fewer event notifications [DEFAULT: off]
-trim T Trim reference sequences to T nt [DEFAULT: off]
-noenergy Do not perform thermodynamics [DEFAULT: off]
-restrict file Restricts scans to those between
specific miRNAs and UTRs
provided in a pairwise
tab-separated file [DEFAULT: off]
Other Options:
-keyval Key value pairs ?? [DEFAULT:]
This software will be further developed under the open source model,
coordinated by Anton Enright and Chris Sander (miranda (at) cbio.mskcc.org).
Please send bug reports to: miranda (at) cbio.mskcc.org.
- 測(cè)試example
$ cd ***/miRanda-3.3a/examples #你自己的軟件路徑
$ ls #查看路徑下有兩個(gè)fastq序列文件
bantam_stRNA.fasta hid_UTR.fasta
# 我們挨個(gè)看一下:[如下碎浇,前者是miRNA成熟序列文件(可以是下載的miRNA序列临谱,自己鑒定的miRNA序列等等),后者是待匹配/查找的基因文件(circRNA奴璃、lncRNA悉默、PCG等序列文件)。]
$ less -SN bantam_stRNA.fasta
1 >gi|29565487|emb|AJ550546.1| Drosophila melanogaster microRNA miR-bantam
2 GTGAGATCATTTTGAAAGCTG
$ less -SN hid_UTR.fasta
1 >gi|945100|gb|U31226.1|DMU31226 Drosophila melanogaster head involution defective protein (hid) mRNA, complete cds (3'UTR only)
2 TGACAAAAAATAAAAAACGAAATCCATCGTGAACAGTTTTGTGTTTTTAAATCAGTTCTAAACACGAAAA
3 GGGTTGATGAAAAACGCAGAAGAATCCGAAAAACTAACTAACCGAGCAAAAACTTGACTTGAGTGTTGTT
4 TGACAAATCAGGAAAGATAAAAAACAAATCATAAGAAAAAACTGCACGAAAAATGAAAAAGTTTCTAATA
5 TTCAAAATCTTGCACAAGAAATACAAAATCAATTAAAGTGAACTCTAACCAAAAGTTGTACACAAAATAA
6 AAAGCAAAACAAAGCAGCGAAGAACAATCACAAGAAGAGCAAAGTGCCAACAAAGTGCAGGAAGGAAGGA
7 AGCGGATAAGGACAAAAAGGAAGCCAGCACACACACACACACCCACACAATGGCCGTGCCCTTTTATTTG
8 CCCGAGGGCGGCGCCGATGACGTAGCGTCGAGTTCATCGGGAGCCTCGGGCAACTCCTCCCCCCACAACC
9 ACCCACTTCCCTCGAGCGCATCCTCGTCCGTCTCCTCCTCGGGCGTGTCCTCGGCCTCCGCCTCCTCGGC
10 CTCATCTTCGTCATCCGCATCGTCGGACGGCGCCAGCAGCGCCGCCTCGCAATCGCCGAACACCACCACC
11 TCGTCGGCCACGCAGACGCCGATGCAGTCTCCACTGCCCACCGACCAAGTGCTATACGCCCTCTACGAGT
12 GGGTCAGGATGTACCAGAGCCAGCAGAGTGCCCCGCAAATCTTCCAGTATCCGCCGCCAAGCCCCTCTTG
13 CAATTTCACTGGCGGCGATGTGTTCTTTCCGCACGGCCATCCGAATCCGAACTCGAATCCCCATCCGCGC
14 ACCCCCCGAACCAGCGTGAGCTTCTCCTCCGGCGAGGAGTACAACTTCTTCCGGCAGCAGCAGCCGCAAC
15 CACATCCGTCATATCCGGCGCCATCAACACCGCAGCCAATGCCACCGCAGTCAGCGCCGCCGATGCACTG
16 CAGCCACAGCTACCCGCAGCAGTCGGCGCACATGATGCCACACCATTCCGCTCCCTTCGGAATGGGCGGT
17 ACCTACTACGCCGGCTACACGCCACCACCCACTCCGAACACGGCCAGTGCGGGCACCTCCAGCTCATCGG
18 CGGCCTTCGGCTGGCACGGCCACCCCCACAGCCCCTTCACGTCGACCTCCACGCCGTTATCGGCGCCAGT
19 GGCGCCCAAGATGCGCCTGCAGCGCAGCCAGTCGGATGCGGCCAGACGCAAGCGATTGACCTCGACGGGC
20 GAGGATGAGCGCGAGTACCAGAGCGATCATGAGGCCACTTGGGACGAGTTTGGCGATCGCTACGACAACT
運(yùn)行測(cè)試預(yù)測(cè)靶基因代碼(提前添加bin到.bashrc):
$ miranda bantam_stRNA.fasta hid_UTR.fasta > test.out.txt # 這里只設(shè)置輸入與輸出苟穆,其他參數(shù)默認(rèn)抄课,或者如下添加其他參數(shù)
$ miranda bantam_stRNA.fasta hid_UTR.fasta -sc 140 -en -1 > test.out.txt
$ cat test.out.txt | grep ">" | less -SN
1 >gi|29565487|emb|AJ550546.1| gi|945100|gb|U31226.1|DMU31226 167.00 -24.54 2 20 3340 3360 18 83.33% 94.44%
2 >gi|29565487|emb|AJ550546.1| gi|945100|gb|U31226.1|DMU31226 156.00 -20.03 2 17 2505 2525 15 86.67% 93.33%
3 >gi|29565487|emb|AJ550546.1| gi|945100|gb|U31226.1|DMU31226 155.00 -14.57 2 16 2852 2872 14 78.57% 85.71%
4 >gi|29565487|emb|AJ550546.1| gi|945100|gb|U31226.1|DMU31226 152.00 -14.18 2 18 3820 3841 17 76.47% 76.47%
5 >>gi|29565487|emb|AJ550546.1| gi|945100|gb|U31226.1|DMU31226 630.00 -73.32 167.00 -24.54 1 21 3902 3340 2505 2852 3820
# 結(jié)果解讀:
1)">"開(kāi)頭的幾行唱星,是miRNA(gi|29565487|emb|AJ550546.1|)靶標(biāo)到基因(gi|945100|gb|U31226.1|DMU31226)上的不同位置
2)">"開(kāi)頭的行對(duì)應(yīng)的信息依次是:miRNA id,基因id跟磨,打分间聊,自由能,miRNA起始位置抵拘,miRNA終止位置哎榴,基因起始位置,基因終止位置僵蛛,靶標(biāo)結(jié)合miRNA長(zhǎng)度尚蝌,miRNA結(jié)合長(zhǎng)度與miRNA總長(zhǎng)占比,基因上結(jié)合長(zhǎng)度(大于等于前者充尉,可能是一個(gè)跨度)對(duì)miRNA總長(zhǎng)占比
3)">>"的行是綜合信息驼壶,依次代表:miRNA id,基因id喉酌,總打分,總自由能泵喘,最大打分泪电,最大自由能,鏈信息纪铺,miRNA長(zhǎng)度相速,基因長(zhǎng)度,靶標(biāo)到基因上的位置(1個(gè)或多個(gè)鲜锚,這里是4個(gè))
4)綜上突诬,我們可以抓取“>>”的行,來(lái)獲取靶向的基因芜繁,如果需要其他信息也可以自己按需提取旺隙,so easy
- 再舉一例(個(gè)性化分析)
miRNA鑒定等分析這里就不說(shuō)了(贅述),(1)我們測(cè)miRNA數(shù)據(jù)骏令,鑒定并得到差異miRNA(known or novel均可)蔬捷,是否希望看一下他們靶向與那些基因呢?(2)文獻(xiàn)榔袋、試驗(yàn)等等周拐,發(fā)現(xiàn)某些miRNA(可獲取序列或名字的)有著重要的生物學(xué)功能,機(jī)理尚不明確凰兑,是否也想看看其靶向關(guān)系呢妥粟?可以運(yùn)行miranda。按照(2)我們?cè)鞄讉€(gè)miRNA進(jìn)行舉例:
3.1 下載miRBase成熟序列吏够,提取human的miRNA進(jìn)行舉例
$ wget ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.gz
$ gunzip mature.fa.gz
$ less -SN mature.fa # 格式如下
1 >cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p
2 UGAGGUAGUAGGUUGUAUAGUU
3 >cel-let-7-3p MIMAT0015091 Caenorhabditis elegans let-7-3p
4 CUAUGCAAUUUUCUACCUUACC
5 >cel-lin-4-5p MIMAT0000002 Caenorhabditis elegans lin-4-5p
6 UCCCUGAGACCUCAAGUGUGA
7 >cel-lin-4-3p MIMAT0015092 Caenorhabditis elegans lin-4-3p
8 ACACCUGGGCUCUCCGGGUACC
我通過(guò)python編程取出10個(gè)human的miRNA勾给,存為hsa_mature10.fa
$ cat hsa_mature10.fa
>hsa-let-7a-5p
UGAGGUAGUAGGUUGUAUAGUU
>hsa-let-7a-3p
CUAUACAAUCUACUGUCUUUC
>hsa-let-7a-2-3p
CUGUACAGCCUCCUAGCUUUCC
>hsa-let-7b-5p
UGAGGUAGUAGGUUGUGUGGUU
>hsa-let-7b-3p
CUAUACAACCUACUGCCUUCCC
>hsa-let-7c-5p
UGAGGUAGUAGGUUGUAUGGUU
>hsa-let-7c-3p
CUGUACAACCUUCUAGCUUUCC
>hsa-let-7d-5p
AGAGGUAGUAGGUUGCAUAGUU
>hsa-let-7d-3p
CUAUACGACCUGCUGCCUUUCU
>hsa-let-7e-5p
UGAGGUAGGAGGUUGUAUAGUU
接著準(zhǔn)備靶向的基因集合滩报,我是通過(guò)gtf位置提取了human基因組所有的轉(zhuǎn)錄本,取了其中17個(gè)來(lái)測(cè)試(17.fa)锦秒,當(dāng)然你也可以去ensembl下載CDS序列露泊,其他方法得到的基因序列均可。
$ grep ">" 17.fa
>ENST00000000233.9 +
>ENST00000000412.7 -
>ENST00000000442.10 +
>ENST00000001008.5 +
>ENST00000001146.6 -
>ENST00000002125.8 +
>ENST00000002165.10 -
>ENST00000002501.10 -
>ENST00000002596.5 -
>ENST00000002829.7 +
>ENST00000003084.10 +
>ENST00000003100.12 -
>ENST00000003302.8 -
>ENST00000003583.12 -
>ENST00000003912.7 +
>ENST00000004103.7 +
>ENST00000004531.14 +
然后預(yù)測(cè)二者的靶標(biāo)關(guān)系(通過(guò)管道輸出):
$ miranda hsa_mature10.fa 17.fa -sc 140 -en -1 | grep ">>" > hsa_mature10_targets.txt
我們大概看看幾行結(jié)果:
$ less -SN hsa_mature10_targets.txt
1 >>hsa-let-7a-5p ENST00000000412.7 146.00 -11.97 146.00 -11.97 2 22 2756 960
2 >>hsa-let-7a-5p ENST00000001008.5 143.00 -17.75 143.00 -17.75 4 22 3732 2611
3 >>hsa-let-7a-5p ENST00000002125.8 156.00 -21.39 156.00 -21.39 6 22 2176 1248
4 >>hsa-let-7a-5p ENST00000002829.7 146.00 -16.38 146.00 -16.38 10 22 3802 3480
5 >>hsa-let-7a-5p ENST00000003084.10 299.00 -39.65 152.00 -20.76 11 22 6132 3077 5087
6 >>hsa-let-7a-5p ENST00000003100.12 143.00 -18.40 143.00 -18.40 12 22 3210 2586
7 >>hsa-let-7a-5p ENST00000003302.8 438.00 -47.15 152.00 -16.09 13 22 4669 1132 2011 4340
8 >>hsa-let-7a-5p ENST00000004531.14 163.00 -20.40 163.00 -20.40 17 22 7560 1101
9 >>hsa-let-7a-3p ENST00000002125.8 161.00 -12.52 161.00 -12.52 23 21 2176 1476
10 >>hsa-let-7a-3p ENST00000002165.10 158.00 -14.96 158.00 -14.96 24 21 2356 1993
11 >>hsa-let-7a-3p ENST00000003084.10 148.00 -6.47 148.00 -6.47 28 21 6132 5758
12 >>hsa-let-7a-3p ENST00000003302.8 149.00 -16.95 149.00 -16.95 30 21 4669 3959
13 >>hsa-let-7a-3p ENST00000003912.7 146.00 -7.39 146.00 -7.39 32 21 5481 3640
14 >>hsa-let-7a-3p ENST00000004531.14 292.00 -16.76 146.00 -13.42 34 21 7560 3954 6657
tips:我們看到結(jié)果的格式如上面例子所描述的一致旅择,此外惭笑,也看到同一miRNA靶向與不同的轉(zhuǎn)錄本(基因)上面,關(guān)于怎么去解釋結(jié)果和行文生真、試驗(yàn)等等沉噩,就要看你的閱歷和水平了≈埃可以提取所有結(jié)果進(jìn)行網(wǎng)絡(luò)圖的構(gòu)建川蒙、可以提取最好(根據(jù)打分和自由能)的基因進(jìn)行驗(yàn)證(相關(guān)性計(jì)算、定量长已、濕試驗(yàn)等等)畜眨、單純查找資料討論∈跷停【轉(zhuǎn)錄本與基因的對(duì)應(yīng)關(guān)系需要自行轉(zhuǎn)換】康聂。下面也附上miRBase下載成熟序列的截圖,也很簡(jiǎn)單胞四√裰【本教程再提取序列時(shí)需要一定的編程基礎(chǔ),如果你只有兩者的序列辜伟,也是可以進(jìn)行靶向預(yù)測(cè)的】
http://www.mirbase.org
download