0. Background
Introduction to non-coding RNA: see the Baidu Baike entry.
miRNA naming rules:
For a detailed introduction, see https://mp.weixin.qq.com/s/SnrUB8v0_Mzlg2f4nHju3Q
1. Downloading sequences
Ensembl Plants
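Arabidopsis thaliana is listed directly under "Favourite genomes".
Download the genome sequence and annotation, along with the gene, cDNA, ncRNA and other sequence files.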
Arabidopsis_thaliana.TAIR10.42.gff3.gz
Arabidopsis_thaliana.TAIR10.cdna.abinitio.fa.gz
Arabidopsis_thaliana.TAIR10.cdna.all.fa.gz
Arabidopsis_thaliana.TAIR10.cds.all.fa.gz
Arabidopsis_thaliana.TAIR10.dna.toplevel.fa
Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz
Arabidopsis_thaliana.TAIR10.ncrna.fa.gz
Arabidopsis_thaliana.TAIR10.pep.abinitio.fa.gz
Arabidopsis_thaliana.TAIR10.pep.all.fa.gz
$ zcat Arabidopsis_thaliana.TAIR10.42.gff3.gz | grep -v "^#" | cut -f3 | sort | uniq -c
286067 CDS
7 chromosome
313952 exon
56384 five_prime_UTR
27655 gene
3879 lnc_RNA
325 miRNA
48359 mRNA
377 ncRNA
5178 ncRNA_gene
15 rRNA
287 snoRNA
82 snRNA
48308 three_prime_UTR
689 tRNA
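The feature counts only show how many records of each type the annotation holds. To look at the records themselves, a small filter like the one below may help. It is a minimal sketch: the function name `gff3_select_type` is made up for this note, and it assumes standard tab-separated GFF3 with the feature type in column 3.

```shell
# Print seqid, start, end, strand and attributes for one feature type.
# Reads GFF3 from stdin; the feature type (e.g. miRNA) is the first argument.
# gff3_select_type is a hypothetical helper name, not part of any tool.
gff3_select_type() {
    awk -F'\t' -v OFS='\t' -v type="$1" \
        '!/^#/ && $3 == type { print $1, $4, $5, $7, $9 }'
}
```

For example, `zcat Arabidopsis_thaliana.TAIR10.42.gff3.gz | gff3_select_type miRNA | head` should list the first few miRNA features with their coordinates and strand.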
2. Non-coding RNA databases
2.1 PNRD
Developed by China Agricultural University: http://structuralbiology.cau.edu.cn/PNRD/download.php
The data are easy to download, but the database no longer appears to be updated.
Download all Arabidopsis non-coding RNA sequences and the target information.
$ ll
total 24M
-rw-r--r-- 1 huangsiyuan grp3 2.9M Jan 19 11:41 lncRNA.txt
-rw-r--r-- 1 huangsiyuan grp3 64K Jan 19 11:40 miRNA.txt
-rw-r--r-- 1 huangsiyuan grp3 26K Jan 19 11:41 snoRNA.txt
-rw-r--r-- 1 huangsiyuan grp3 8.5K Jan 19 11:42 snRNA.txt
-rw-r--r-- 1 huangsiyuan grp3 233K Jan 19 11:40 stem_loop.txt
-rw-r--r-- 1 huangsiyuan grp3 21M Jan 19 11:44 tar.txt
-rw-r--r-- 1 huangsiyuan grp3 948 Jan 19 11:42 tasiRNA.txt
-rw-r--r-- 1 huangsiyuan grp3 67K Jan 19 11:42 tRNA.txt
Take a look at the downloaded sequences:
$ head -n 6 miRNA.txt  # mature miRNA sequences
>ath-miR156a
UGACAGAAGAGAGUGAGCAC
>ath-miR156b
UGACAGAAGAGAGUGAGCAC
>ath-miR156c
UGACAGAAGAGAGUGAGCAC
huangsiyuan 13:57:16 ~/learn_rnaseq/srna_project/ref_ncrna
$ head -n 6 stem_loop.txt  # stem-loop (precursor) sequences
>ath-MIR156a
CAAGAGAAACGCAAAGAAACUGACAGAAGAGAGUGAGCACACAAAGGCAAUUUGCAUAUCAUUGCACUUGCUUCUCUUGCGUGCUCACUGCUCUUUCUGUCAGAUUCCGGUGCUGAUCUCUUU
>ath-MIR156b
GCUAGAAGAGGGAGAGAUGGUGAUUGAGGAAUGCAACAGAGAAAACUGACAGAAGAGAGUGAGCACAUGCAGGCACUGUUAUGUGUCUAUAACUUUGCGUGUGCGUGCUCACCUCUCUUUCUGUCAGUUGCCUAUCUCUGCCUGCUUGACCUCUCUCUCUCUCUCUCUCUCUCAAAUUUGGCU
>ath-MIR156c
CGCAUAGAAACUGACAGAAGAGAGUGAGCACACAAAGGCACUUUGCAUGUUCGAUGCAUUUGCUUCUCUUGCGUGCUCACUGCUCUAUCUGUCAGAUUCCGGCU
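The records in the two files are in corresponding order.
Different stem-loop sequences can give rise to highly similar mature miRNAs (in the three records shown here, identical ones).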
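One way to check the correspondence between the two files is to pair the Nth mature record with the Nth stem-loop record and test whether the mature sequence occurs inside the hairpin. The sketch below does that. The function name `check_mature_in_stemloop` is made up for this note, and it assumes each record has exactly one sequence line, as in the PNRD files shown above.

```shell
# Pair record N of a mature-miRNA FASTA with record N of a stem-loop FASTA
# and report whether the mature sequence is a substring of the stem-loop.
# Assumption: one header line and one sequence line per record.
# check_mature_in_stemloop is a hypothetical helper name.
check_mature_in_stemloop() {
    awk -v OFS='\t' '
        /^>/ { name = substr($0, 2); next }          # remember the record name
        FNR == NR {                                  # first file: mature miRNAs
            n++; mname[n] = name; mseq[n] = $0; next
        }
        {                                            # second file: stem-loops
            k++
            status = index($0, mseq[k]) ? "found" : "missing"
            print mname[k], name, status
        }
    ' "$1" "$2"
}
```

Running `check_mature_in_stemloop miRNA.txt stem_loop.txt | head` prints one line per pair. Any "missing" line would suggest that the two files are not in the same order at that position, or that a sequence spans several lines.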
2.2 miRBase
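The microRNA database. It has the most complete collection of microRNAs and is kept up to date. You can look up stem-loop sequences, mature miRNA sequences and target sequences, but the sequences do not seem to be directly downloadable, so they have to be derived from the gff3 file it provides.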
http://www.mirbase.org/
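The gff3-to-sequence conversion mentioned above could be sketched as follows: turn the GFF3 records into BED intervals, then cut those intervals out of the genome FASTA. This is only one possible approach. The function name `gff3_to_bed` and the file names `ath.gff3`, `ath_hairpin.bed` and `ath_hairpin.fa` are placeholders, and the sketch assumes that the feature type is in column 3 and that the attribute column carries a `Name=` tag.

```shell
# Convert GFF3 records of one feature type to BED6 (0-based, half-open).
# Reads GFF3 from stdin; the feature type is the first argument.
# The name column is taken from the Name= attribute when present,
# otherwise the whole attribute string is used.
# gff3_to_bed is a hypothetical helper name.
gff3_to_bed() {
    awk -F'\t' -v OFS='\t' -v type="$1" '
        !/^#/ && $3 == type {
            name = $9
            if (match($9, /Name=[^;]+/))
                name = substr($9, RSTART + 5, RLENGTH - 5)
            print $1, $4 - 1, $5, name, 0, $7    # GFF3 start is 1-based
        }'
}

# Possible follow-up with bedtools (file names are placeholders):
# gff3_to_bed miRNA_primary_transcript < ath.gff3 > ath_hairpin.bed
# bedtools getfasta -s -name \
#     -fi Arabidopsis_thaliana.TAIR10.dna.toplevel.fa \
#     -bed ath_hairpin.bed -fo ath_hairpin.fa
```

Two caveats: the chromosome names in the gff3 must match those in the genome FASTA (rename one side if they differ), and the extracted sequences are DNA, so they contain T where the database records show U.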
How do you look up the precursor structure or sequence of a miRNA? (Figures 1 to 3)
After processing, the sequences on the two arms of the stem-loop: (Figure 4)