Follow the WeChat official account: oddxix
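Introduction
Region-based annotation annotates variants according to the genomic region they fall in; it considers neither the exact position nor the nucleotide identity. To run it, use the --regionanno argument (the default is --geneanno); the commands below write it with a single dash, as -regionanno.
Databases covered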
- Conserved genomic elements annotation
- Transcription factor binding site
- Identify cytogenetic band
- Identify variants disrupting microRNAs and snoRNAs
- Identify variants disrupting predicted microRNA binding sites
- Identify variants located in segmental duplications
- Identify previously reported structural variants
- Identify variants reported in previously published GWAS
- Identify variants in ENCODE annotated regions
- Identify dbSNP variants in user-specified regions
- Identify non-coding variants that disrupt enhancers, repressors, promoters
- Identify variants in other genomic regions annotated with other functions
- Annotating custom-made databases conforming to GFF3 (Generic Feature Format version 3)
- Identifying variants in regions specified in BED files
Conserved genomic elements annotation
Identifies variants that fall within conserved genomic regions. Use -dbtype phastConsElements46way.
# Download
annotate_variation.pl -build hg19 -downdb phastConsElements46way humandb/
NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/phastConsElements46way.txt.gz ... OK
NOTICE: Uncompressing downloaded files
NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory
# Annotate
annotate_variation.pl -regionanno -build hg19 -out ex1 -dbtype phastConsElements46way example/ex1.avinput humandb/
NOTICE: Reading annotation database humandb/hg19_phastConsElements46way.txt ... Done with 5163775 regions
Interpreting the output
phastConsElements46way Score=420;Name=lod=68 16 50756540 50756540 G C comments: rs2066845 (G908R), a non-synonymous SNP in NOD2
phastConsElements46way Score=385;Name=lod=49 16 50763778 50763778 - C comments: rs2066847 (c.3016_3017insC), a frameshift SNP in NOD2
phastConsElements46way Score=395;Name=lod=54 13 20763686 20763686 G - comments: rs1801002 (del35G), a frameshift mutation in GJB2, associated with hearing loss
phastConsElements46way Score=545;Name=lod=218 13 20797176 21105944 0 - comments: a 342kb deletion encompassing GJB6, associated with hearing loss
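Column 1 (phastConsElements46way): the annotation type.
Column 2: the score and the name. The score is the normalized score assigned by the UCSC Genome Browser; it ranges from 0 to 1000 and exists so that the browser can display values on a standard range.
The remaining columns are identical to the columns of the input file. Only variants that actually lie within a conserved region are written to the output file.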
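The column layout described above can be read programmatically. The following is a minimal sketch, assuming the output file is tab-delimited (the sample rows above show the columns separated by whitespace only because of how they were pasted); the function name `parse_region_line` is hypothetical and not part of ANNOVAR.

```python
def parse_region_line(line):
    """Split one tab-delimited region-annotation row into its parts."""
    cols = line.rstrip("\n").split("\t")
    dbtype, annotation, variant = cols[0], cols[1], cols[2:]
    # Column 2 looks like "Score=420;Name=lod=68".
    # Split each part on the first "=" only, because the name itself contains "=".
    fields = dict(part.split("=", 1) for part in annotation.split(";"))
    score = int(fields["Score"]) if "Score" in fields else None
    return {"dbtype": dbtype, "score": score, "name": fields.get("Name"), "variant": variant}


# Sample row copied from the output above, with tabs assumed between columns.
sample = "phastConsElements46way\tScore=420;Name=lod=68\t16\t50756540\t50756540\tG\tC\tcomments: rs2066845"
row = parse_region_line(sample)
print(row["dbtype"], row["score"], row["name"], row["variant"][:3])
```

Keeping the input columns as a plain list makes it easy to join the annotation back to the original avinput rows.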
Transcription factor binding site
This database contains the locations and scores of transcription factor binding sites conserved in the human/mouse/rat alignment; the scores and thresholds are computed with the Transfac Matrix Database. Use -dbtype tfbsConsSites.
# Download
annotate_variation.pl -build hg19 -downdb tfbsConsSites humandb/
# Annotate
annotate_variation.pl -regionanno -build hg19 -out ex1 -dbtype tfbsConsSites example/ex1.avinput humandb/
# Output file
tfbsConsSites Score=767;Name=V$PAX5_02 16 50745926 50745926 C T comments: rs2066844 (R702W), a non-synonymous SNP in NOD2
tfbsConsSites Score=880;Name=V$CEBPA_01 16 50756540 50756540 G C comments: rs2066845 (G908R), a non-synonymous SNP in NOD2
tfbsConsSites Score=878;Name=V$FREAC3_01 13 20763686 20763686 G - comments: rs1801002 (del35G), a frameshift mutation in GJB2, associated with hearing loss
tfbsConsSites Score=1000;Name=V$STAT4_01,V$AML1_01,V$SRY_01,V$MZF1_01 13 20797176 21105944 0 - comments: a 342kb deletion encompassing GJB6, associated with hearing loss
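Column 2 carries the Score and Name annotations: the normalized score and the name of the binding-site motif. A name such as V$PAX5_02 is only the name of a motif recognized by some transcription factor; it is up to the user to determine which transcription factors recognize that motif.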
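As a rough aid for that lookup, the motif names can be pulled out of the Name field and reduced to a candidate factor label. This is only a sketch: stripping the `V$` prefix and the trailing `_NN` suffix is a naming heuristic assumed here, not a rule stated by ANNOVAR or Transfac, and both function names are hypothetical.

```python
def motif_names(annotation):
    """Return the motif names listed in a 'Score=...;Name=...' annotation."""
    fields = dict(part.split("=", 1) for part in annotation.split(";"))
    return fields["Name"].split(",")


def factor_guess(motif):
    """Heuristic (assumption): drop the 'V$' prefix and the trailing '_NN' suffix."""
    core = motif[2:] if motif.startswith("V$") else motif
    return core.rsplit("_", 1)[0]


# Annotation copied from the 342kb deletion row above.
names = motif_names("Score=1000;Name=V$STAT4_01,V$AML1_01,V$SRY_01,V$MZF1_01")
guesses = [factor_guess(n) for n in names]
print(guesses)
```

The resulting labels are starting points for a database search, not confirmed transcription factor assignments.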
Identify cytogenetic band
To identify the Giemsa-stained chromosome band that each variant falls in, use -dbtype cytoBand.
# Download
annotate_variation.pl -build hg19 -downdb cytoBand humandb/
# Annotate
annotate_variation.pl -regionanno -build hg19 -out ex1 -dbtype cytoBand example/ex1.avinput humandb/
# Output file
cytoBand 1q23.3 1 162736463 162736463 C T comments: rs1000050, a SNP in Illumina SNP arrays
cytoBand 1p31.1 1 84875173 84875173 C T comments: rs6576700 or SNP_A-1780419, a SNP in Affymetrix SNP arrays
cytoBand 1p36.21 1 13211293 13211294 TC - comments: rs59770105, a 2-bp deletion
cytoBand 1p36.22 1 11403596 11403596 - AT comments: rs35561142, a 2-bp insertion
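The band labels in column 2 (for example 1q23.3) combine the chromosome, the arm, and the band number. A small sketch for splitting them follows, assuming single-band labels of the form shown above; labels in any other form return `None`. The function name `split_band` is hypothetical.

```python
import re

# Chromosome (1-22, X or Y), then arm (p or q), then the band number.
BAND = re.compile(r"^(?P<chrom>[0-9]{1,2}|[XY])(?P<arm>[pq])(?P<band>[0-9.]+)$")


def split_band(label):
    """Split a label such as '1q23.3' into chromosome, arm, and band."""
    match = BAND.match(label)
    return match.groupdict() if match else None


# Labels copied from the output above.
for label in ["1q23.3", "1p31.1", "1p36.21", "1p36.22"]:
    print(label, split_band(label))
```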
Identify variants disrupting microRNAs and snoRNAs
UCSC provides the wgRna table of snoRNAs and microRNAs, built from a miRBase release and snoRNABase. Use -dbtype wgRna.
# Download
annotate_variation.pl -build hg19 -downdb wgRna humandb/
# Annotate
annotate_variation.pl -regionanno -build hg19 -out ex1 -dbtype wgRna example/ex1.avinput humandb/
Identify variants disrupting predicted microRNA binding sites
This step finds variants that disrupt predicted microRNA binding sites. UCSC provides the TargetScanS annotation database, which shows conserved mammalian microRNA regulatory target sites for conserved microRNA families in the 3' UTR regions of RefSeq genes. Use -dbtype targetScanS.
# Download
annotate_variation.pl -build hg19 -downdb targetScanS humandb/
# Annotate
annotate_variation.pl -regionanno -build hg19 -out ex1 -dbtype targetScanS example/ex1.avinput humandb/
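Every database so far follows the same two-step pattern: download with -downdb, then annotate with -regionanno and -dbtype. The sketch below builds both command lines for a list of database types, using only the flags and paths that appear in the commands above; the helper `region_commands` is hypothetical and nothing is executed.

```python
def region_commands(dbtype, build="hg19", avinput="example/ex1.avinput",
                    dbdir="humandb/", out="ex1"):
    """Return the (download, annotate) argument lists for one database type."""
    download = ["annotate_variation.pl", "-build", build, "-downdb", dbtype, dbdir]
    annotate = ["annotate_variation.pl", "-regionanno", "-build", build,
                "-out", out, "-dbtype", dbtype, avinput, dbdir]
    return download, annotate


for dbtype in ["phastConsElements46way", "tfbsConsSites", "cytoBand", "wgRna", "targetScanS"]:
    for cmd in region_commands(dbtype):
        print(" ".join(cmd))
```

To actually run a command, each list could be passed to `subprocess.run(cmd, check=True)`, assuming annotate_variation.pl is on the PATH.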
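Identify variants located in segmental duplications
Genetic variants that map to segmental duplications are likely to be sequence alignment errors. They sometimes appear as SNPs with high fold coverage and possibly high confidence scores, yet they may actually represent two non-polymorphic sites in the genome that happen to share the same flanking sequence. Use -dbtype genomicSuperDups.
# Download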
annotate_variation.pl -build hg19 -downdb genomicSuperDups humandb/
# Annotate
annotate_variation.pl -regionanno -build hg19 -out ex1 -dbtype genomicSuperDups example/ex1.avinput humandb/
# Output file
genomicSuperDups Score=0.99612;Name=chr1:13142561 1 13211293 13211294 TC - comments: rs59770105, a 2-bp deletion
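The Name field in the output gives the other "matching" segment in the genome (here on the same chromosome, at chr1:13142561). The Score field is the sequence identity, with indels, between the two genomic segments.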
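A short sketch of reading those two fields, assuming the annotation lists a single partner segment in the `chrom:position` form shown above. The function name `parse_superdup` and the 0.99 cutoff are illustrative assumptions, not ANNOVAR defaults.

```python
def parse_superdup(annotation):
    """Parse 'Score=0.99612;Name=chr1:13142561' into identity and partner locus."""
    fields = dict(part.split("=", 1) for part in annotation.split(";"))
    chrom, pos = fields["Name"].split(":")
    return {
        "identity": float(fields["Score"]),  # sequence identity between the two segments
        "partner_chrom": chrom,
        "partner_pos": int(pos),
    }


hit = parse_superdup("Score=0.99612;Name=chr1:13142561")
# Hypothetical cutoff: treat calls in near-identical duplications as suspect.
suspect = hit["identity"] >= 0.99
print(hit, suspect)
```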
Identify previously reported structural variants
ANNOVAR can also conveniently annotate deletions and duplications and compare them with previously published variants in the Database of Genomic Variants (DGV). Use -dbtype dgvMerged.
# Download
annotate_variation.pl -build hg19 -downdb dgvMerged humandb/
# Annotate
annotate_variation.pl -regionanno -build hg19 -out ex1 -dbtype dgvMerged example/ex1.avinput humandb/
# Output file
dgvMerged Name=nsv516482 1 162736463 162736463 C T comments: rs1000050, a SNP in Illumina SNP arrays
dgvMerged Name=nsv830437 1 84875173 84875173 C T comments: rs6576700 or SNP_A-1780419, a SNP in Affymetrix SNP arrays
dgvMerged Name=dgv168n71,nsv821616,nsv510936,nsv871634,esv27872,nsv870885,dgv147n71,nsv517190,nsv821333,nsv871687,nsv7172,nsv428410,nsv8768,dgv20e1,dgv22e1 1 13211293 13211294 TC - comments: rs59770105, a 2-bp deletion
dgvMerged Name=nsv832536 1 11403596 11403596 - AT comments: rs35561142, a 2-bp insertion
For deletion and duplication structural variants, -minqueryoverlap can be used to remove SNP or indel overlaps (otherwise many SNPs/indels will overlap DGV regions).
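The output above shows the problem: single-base SNPs and 2-bp indels all receive DGV hits. As a separate, after-the-fact alternative (this does not reproduce what the ANNOVAR argument does), rows can be split by the length of the query variant. The sketch below uses coordinates from this article; the `min_span` cutoff of 50 bases and both function names are assumptions.

```python
def variant_span(start, end):
    """Length in bases of a variant given 1-based inclusive coordinates."""
    return end - start + 1


def split_by_size(rows, min_span=50):
    """Separate small variants (SNPs, short indels) from large ones."""
    small, large = [], []
    for row in rows:
        bucket = large if variant_span(row["start"], row["end"]) >= min_span else small
        bucket.append(row)
    return small, large


rows = [
    {"chrom": "1", "start": 162736463, "end": 162736463},  # SNP rs1000050
    {"chrom": "1", "start": 13211293, "end": 13211294},    # 2-bp deletion
    {"chrom": "13", "start": 20797176, "end": 21105944},   # large deletion from ex1.avinput
]
small, large = split_by_size(rows)
print(len(small), "small;", len(large), "large")
```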
Identify variants reported in previously published GWAS
To find variants previously reported as associated with diseases or traits in genome-wide association studies, use -dbtype gwasCatalog.
# Download
annotate_variation.pl -build hg19 -downdb gwasCatalog humandb/
# Annotate
annotate_variation.pl -regionanno -build hg19 -out ex1 -dbtype gwasCatalog example/ex1.avinput humandb/
# Output file
gwasCatalog Name=Ankylosing spondylitis,Crohn's disease,Ulcerative colitis,Psoriasis,Inflammatory bowel disease 1 67705958 67705958 comments: rs11209026 (R381Q), a SNP in IL23R associated with Crohn's disease
gwasCatalog Name=Crohn's disease 2 234183368 234183368 A G comments: rs2241880 (T300A), a SNP in the ATG16L1 associated with Crohn's disease
gwasCatalog Name=Crohn's disease 16 50763778 50763778 - C comments: rs2066847 (c.3016_3017insC), a frameshift SNP in NOD2
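Because one variant can carry several traits, it can help to invert the output into a trait-to-variant index. A minimal sketch follows, assuming a tab-delimited file (trait names contain spaces, so splitting on whitespace would not work) and assuming trait names contain no commas. The sample rows are abbreviated to the first five columns, and `traits_index` is a hypothetical name.

```python
from collections import defaultdict


def traits_index(lines):
    """Map each trait to the (chromosome, start) of the variants annotated with it."""
    index = defaultdict(list)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        traits = cols[1].split("=", 1)[1].split(",")  # text after "Name="
        locus = (cols[2], int(cols[3]))
        for trait in traits:
            index[trait].append(locus)
    return dict(index)


lines = [
    "gwasCatalog\tName=Ankylosing spondylitis,Crohn's disease,Ulcerative colitis,"
    "Psoriasis,Inflammatory bowel disease\t1\t67705958\t67705958",
    "gwasCatalog\tName=Crohn's disease\t2\t234183368\t234183368",
    "gwasCatalog\tName=Crohn's disease\t16\t50763778\t50763778",
]
index = traits_index(lines)
print(index["Crohn's disease"])
```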
Identify variants in ENCODE annotated regions
ENCODE now provides a large amount of data as Genome Browser tracks that ANNOVAR can annotate against, such as transcribed regions, H3K4Me1 regions, H3K4Me3 regions, H3K27Ac regions, DNaseI hypersensitivity regions, and transcription factor ChIP-Seq regions.
Identify dbSNP variants in user-specified regions
This differs from filter-based annotation. Here we only care whether two regions overlap, not whether they are identical, so a deleted region can match multiple SNPs in the dbSNP database. Use -dbtype snp138.
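The difference between overlap and identity can be sketched with a simple interval test. The deletion coordinates come from the example input in this article; the SNP names and positions are hypothetical placeholders, not real dbSNP records, and the function names are illustrative.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def matching_snps(region, snps):
    """Return the names of SNPs on the same chromosome that overlap the region."""
    chrom, start, end = region
    return [name for name, (c, s, e) in snps.items()
            if c == chrom and overlaps(start, end, s, e)]


deletion = ("13", 20797176, 21105944)
snps = {  # hypothetical SNP loci, for illustration only
    "snpA": ("13", 20797176, 20797176),  # on the deletion boundary
    "snpB": ("13", 20900000, 20900000),  # inside the deletion
    "snpC": ("13", 21105945, 21105945),  # one base past the end
    "snpD": ("1", 20900000, 20900000),   # different chromosome
}
hits = matching_snps(deletion, snps)
print(hits)
```

A filter-based comparison would require the start, end, and alleles to be identical, so none of these placeholder SNPs would match the deletion.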