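Most eukaryotic genes are split genes: their coding sequences are usually interrupted by introns. The intron-exon boundaries and their flanking sequences are short, conserved nucleotide motifs within the pre-mRNA.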
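The 5' splice site of an intron begins with GU and is called the donor site.
The 3' splice site of an intron ends with AG and is called the acceptor site.
There is also a branch point inside the intron, close to the 3' end; it is usually an A and is followed by a polypyrimidine tract.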
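As a minimal sketch of these signals, the Python below scans a DNA string (where GU is written GT) for spans that start with GT and end with AG, and measures how pyrimidine-rich the region just upstream of the terminal AG is. The function names, the `min_len` threshold, and the `window` size are illustrative assumptions, not part of any real splice-site predictor.

```python
def candidate_introns(dna, min_len=20):
    """Return (start, end) pairs such that dna[start:end] begins with GT and ends with AG."""
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    # Store the end coordinate (exclusive) of every AG dinucleotide.
    acceptor_ends = [j + 2 for j in range(len(dna) - 1) if dna[j:j + 2] == "AG"]
    return [(s, e) for s in donors for e in acceptor_ends if e - s >= min_len]


def polypyrimidine_fraction(intron, window=10):
    """Fraction of C/T in the `window` bases immediately upstream of the terminal AG."""
    region = intron.upper()[-(window + 2):-2]
    if not region:
        return 0.0
    return sum(base in "CT" for base in region) / len(region)


# Toy sequence (made up for illustration): exon + intron + exon.
dna = "ATG" + "GTAAGTCTAACTTTTCTTTTCAG" + "GCC"
print(candidate_introns(dna))                 # [(3, 26)]
print(polypyrimidine_fraction(dna[3:26]))     # 1.0
```

Matching the two dinucleotides alone yields many false positives on real sequence, which is why dedicated tools also model the surrounding context.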
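When analysing genomic data, one often needs to predict how a gene's RNA is alternatively spliced, that is, the positions and number of its introns and exons.
These predictions rest on the conserved GU-AG rule of RNA splicing. Combined with ORF, BLAST, and other evidence, it allows the mature mRNA of an uncharacterised gene to be predicted.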
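To make the idea of combining the GU-AG rule with ORF evidence concrete, here is a rough Python sketch. It removes each GT...AG span in turn and keeps only those removals that leave a complete open reading frame. It assumes a single intron, a sequence that starts at the start codon, and an arbitrary minimum intron length; all names are hypothetical, and real gene-prediction tools are far more elaborate.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(seq):
    """True if seq starts with ATG, ends with a stop codon, and has no internal stop."""
    if len(seq) < 6 or len(seq) % 3 != 0 or not seq.startswith("ATG"):
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return codons[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in codons[:-1])


def spliced_orfs(dna, min_intron=20):
    """Return (start, end, mrna) for every GT...AG span whose removal leaves a complete ORF."""
    dna = dna.upper()
    hits = []
    for s in range(len(dna) - 1):
        if dna[s:s + 2] != "GT":          # donor must start with GT
            continue
        for e in range(s + min_intron, len(dna) + 1):
            if dna[e - 2:e] == "AG":      # acceptor must end with AG
                mrna = dna[:s] + dna[e:]
                if is_complete_orf(mrna):
                    hits.append((s, e, mrna))
    return hits


# Toy gene (made up for illustration): two exons separated by one intron.
gene = "ATGGCC" + "GTAAGTCTAACTTTTCTTTTCAG" + "AAATAA"
print(spliced_orfs(gene))   # [(6, 29, 'ATGGCCAAATAA')]
```

The ORF filter is what narrows the many GT...AG pairs down to plausible introns; similarity evidence such as BLAST hits would narrow them further.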
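Prediction tools
- For genomic nucleotide sequences, splice sites and introns can be predicted directly with NetGene2 or Splice View.
- For mRNA/cDNA, tools such as Splign, SIM4, BLAT, and BLAST are needed to infer the gene structure from the corresponding genomic sequence.
- The Human Splicing Finder (HSF)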
A prediction example with NCBI Splign
Reference manual
1 Identifying the exon composition of an mRNA with Splign
Or, using the online version:
- Navigate to the Online page using the menu at the top of the page.
- Type or copy/paste your input sequences in the cDNA and Genomic text areas. Sequences in each box can be specified as identifiers (accessions or GIs), or in FASTA format. Entering both FASTA data and identifiers in the same entry will generate an error. You can specify up to five cDNA sequences at a time, but only one genomic sequence.
- Check the "Reverse and complement the query" box if you want your cDNA to be aligned in antisense. For example, EST sequences are often not guaranteed to have a sense orientation.
- Check "Cross-species mode" if your cDNA and genomic sequences are from different species. Internally, the cross-species mode means less stringent blast hits.
- Upon job submission, results will appear in a few seconds or more, depending primarily on the lengths and the number of sequences being spligned. Since fetching large chromosomal sequences (like whole-length human chromosomes) and running blast on them can be time-consuming, consider specifying shorter genomic sequences such as contigs. Smaller chromosomal sequences (e.g. Drosophila chromosomes) are ok.
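For readers unsure what the "Reverse and complement the query" option in the steps above does to a sequence, the operation itself can be sketched in a few lines of Python. This is only an illustration of reverse complementation; it is not Splign's code, and the function name is an assumption.

```python
# Translation table mapping each base to its complement (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGGTAAG"))   # CTTACCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check when preparing EST input of unknown orientation.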
The results are as follows:
Interpreting the results
For details, see https://www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi?textpage=documentation
- Plus (sense) and minus signs next to accessions indicate orientations in which the sequences were aligned. The remaining columns are explained below: