For details, see: http://www.bio-info-trainee.com/1469.html
http://www.reibang.com/p/3e545b9a3c68
https://blog.csdn.net/leadingsci/article/details/82947869
ftp://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/
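Download the hg19 reference genome from UCSC (the group owner's blog explains this in detail), download the gene annotation file from the GENCODE database, and use IGV to view the structure of the genes you are interested in, such as TP53 and EGFR.
Take screenshots of several genes' structures in IGV! You can also download the GTF files from ENSEMBL and NCBI, load them into IGV as well, and screenshot the gene structures. Get to know the basics of IGV.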
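Taking screenshots of several genes by hand gets repetitive, and IGV can also run a batch script. The sketch below writes one. It assumes IGV's batch commands `new`, `genome`, `load`, `snapshotDirectory`, `goto`, `snapshot` and `exit`. The function name `make_igv_batch`, the GTF file name and the output directory are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Write an IGV batch script that takes one screenshot per gene.
# Assumption: IGV batch commands new/genome/load/snapshotDirectory/goto/snapshot/exit.
# make_igv_batch, the GTF name and the output directory are hypothetical placeholders.
make_igv_batch() {
    local genome="$1" gtf="$2" outdir="$3"
    shift 3                          # the remaining arguments are gene names
    echo "new"
    echo "genome ${genome}"
    echo "load ${gtf}"
    echo "snapshotDirectory ${outdir}"
    for gene in "$@"; do
        echo "goto ${gene}"          # IGV accepts gene names as loci
        echo "snapshot ${gene}.png"
    done
    echo "exit"
}

make_igv_batch hg19 gencode.v19.annotation.gtf igv_snapshots TP53 EGFR > igv_batch.txt
```

The resulting `igv_batch.txt` can then be run from IGV's Tools > Run Batch Script menu. IGV may not create the snapshot directory for you, so make it first (`mkdir -p igv_snapshots`), and change the genome ID and GTF path to the files you actually downloaded.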
First, how the NCBI, UCSC and ENSEMBL versions correspond:
NCBI36, often written GRCh36 (hg18): ENSEMBL release_52.
GRCh37 (hg19): ENSEMBL release_59/61/64/68/69/75.
GRCh38 (hg38): ENSEMBL release_76/77/78/80/81/82, and every later release.
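The assembly correspondence above can be kept as a small lookup so that scripts do not mix up names. This is a minimal sketch; the function name `ucsc_to_grc` is hypothetical. It returns `NCBI36` for hg18 because NCBI Build 36 is the official name of the assembly often written as GRCh36.

```shell
#!/usr/bin/env bash
# Map a UCSC genome short name to the corresponding NCBI/GRC assembly name.
# ucsc_to_grc is a hypothetical helper; only the three human builds above are covered.
ucsc_to_grc() {
    case "$1" in
        hg18) echo "NCBI36" ;;       # often written GRCh36
        hg19) echo "GRCh37" ;;
        hg38) echo "GRCh38" ;;
        *)    echo "unknown UCSC genome: $1" >&2; return 1 ;;
    esac
}

ucsc_to_grc hg19
```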
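As you can see, ENSEMBL versioning is especially complicated!!! It is very easy to mix up!
UCSC versioning, on the other hand, is simple: just hg18, hg19 and hg38. hg19 is the most commonly used, but I recommend that everyone switch to hg38.
NCBI also looks simple, just GRCh36, 37 and 38, but there is more to it than meets the eye!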
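For example, there are 37.1, 37.2, 37.3 and so on, but these sub-versions generally refer to annotation updates; the genome sequence itself usually does not change!!!
In any case, just remember that the hg19 genome is about 3 GB, and roughly 800 to 900 MB compressed!!!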
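To sanity-check a downloaded genome against the size above, you can count its bases. A minimal sketch with a hypothetical helper `count_bases`: it sums the lengths of all non-header lines, so N's and lowercase (soft-masked) bases are counted too, and a full hg19 FASTA should come out at roughly 3 billion.

```shell
#!/usr/bin/env bash
# Count the bases in a FASTA file: total length of all non-header lines.
# count_bases is a hypothetical helper; carriage returns are stripped first.
count_bases() {
    awk '!/^>/ { gsub(/\r/, ""); n += length($0) }
         END   { printf "%.0f\n", n }' "$1"   # %.0f avoids scientific notation for large totals
}
```

Usage: `count_bases hg19.fasta`. Using `printf "%.0f"` rather than `print` keeps the multi-billion total as a plain integer across different awk implementations.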
If you are downloading GTF annotation files, the genome version matters enormously!!!
For NCBI: ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/GFF/   ## latest version (hg38)
ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/   ## other versions
For Ensembl:
ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz
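Change the release number (it appears both in the directory and in the file name) to get any other version; all releases are listed at ftp://ftp.ensembl.org/pub/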
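Swapping the release number can be scripted. This is a sketch under stated assumptions: the helper `ensembl_gtf_url` is hypothetical, it assumes releases 55 to 75 are GRCh37 and 76 onward are GRCh38, and it assumes the file naming of the release-75 URL above holds for the other releases. Check the FTP listing before relying on a generated URL.

```shell
#!/usr/bin/env bash
# Build an Ensembl human GTF URL for a given release number.
# Assumptions: releases 55-75 use GRCh37, 76+ use GRCh38, and file names follow
# the pattern Homo_sapiens.<assembly>.<release>.gtf.gz seen in the release-75 URL.
ensembl_gtf_url() {
    local release="$1" assembly
    if [ "$release" -ge 76 ]; then
        assembly="GRCh38"
    elif [ "$release" -ge 55 ]; then
        assembly="GRCh37"
    else
        echo "release $release predates GRCh37; browse ftp://ftp.ensembl.org/pub/ manually" >&2
        return 1
    fi
    echo "ftp://ftp.ensembl.org/pub/release-${release}/gtf/homo_sapiens/Homo_sapiens.${assembly}.${release}.gtf.gz"
}

ensembl_gtf_url 75
```

The output for release 75 reproduces the URL given above; pass it to `wget` to download.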
For UCSC, things are a bit more cumbersome:
you have to select a series of parameters in the Table Browser:
http://genome.ucsc.edu/cgi-bin/hgTables
1. Navigate to http://genome.ucsc.edu/cgi-bin/hgTables
2. Select the following options:
clade: Mammal
genome: Human
assembly: Feb. 2009 (GRCh37/hg19)
group: Genes and Gene Predictions
track: UCSC Genes
table: knownGene
region: Select "genome" for the entire genome.
output format: GTF - gene transfer format
output file: enter a file name to save your results to a file, or leave blank to display results in the browser
3. Click 'get output'.
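Whichever database a GTF comes from (UCSC, ENSEMBL, GENCODE), it helps to check which feature types it actually contains before loading it into IGV, since the sources differ. A minimal sketch with a hypothetical helper `gtf_feature_counts`, which tallies column 3 of a plain or gzip-compressed GTF:

```shell
#!/usr/bin/env bash
# Count GTF records by feature type (column 3), skipping '#' comment lines.
# gtf_feature_counts is a hypothetical helper; .gz input is decompressed on the fly.
gtf_feature_counts() {
    local gtf="$1"
    if [ "${gtf%.gz}" != "$gtf" ]; then
        gzip -dc "$gtf"
    else
        cat "$gtf"
    fi | awk -F'\t' '!/^#/ && NF >= 3 { print $3 }' | sort | uniq -c
}
```

For example, `gtf_feature_counts Homo_sapiens.GRCh37.75.gtf.gz` for the Ensembl file above, or pass the file name you entered in the Table Browser's output file field.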
Now for the key part: once the version relationships are clear, it's time to download!
Downloading from UCSC is very convenient: you just build the URL from the genome's short name:
http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz
http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/chromFa.tar.gz
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/chromFa.tar.gz
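Building the URL from the short name can be wrapped in a tiny function. A minimal sketch; `ucsc_chromfa_url` is a hypothetical helper that simply follows the pattern of the four URLs above. File names inside bigZips can differ between assemblies, so check the directory listing (for example the hg38 bigZips link at the top) before relying on it.

```shell
#!/usr/bin/env bash
# Build a UCSC bigZips chromFa.tar.gz URL from a genome short name,
# following the pattern of the URLs listed above (hypothetical helper).
ucsc_chromfa_url() {
    local genome="$1"
    echo "http://hgdownload.cse.ucsc.edu/goldenPath/${genome}/bigZips/chromFa.tar.gz"
}

# Print the URLs for the four assemblies listed above.
for genome in mm10 mm9 hg19 hg38; do
    ucsc_chromfa_url "$genome"
done
```

To actually download and unpack one: `wget "$(ucsc_chromfa_url hg19)" && tar -xzf chromFa.tar.gz`.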
Alternatively, use a shell script that downloads the chromosomes you specify:
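for i in $(seq 1 22) X Y M; do
    echo "$i"
    wget "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chr${i}.fa.gz"
    ## NCBI's chromosome files could also be used here: ftp://ftp.ncbi.nih.gov/genomes/M_musculus/ARCHIVE/MGSCv3_Release3/Assembled_Chromosomes/ (chr prefix)
done
gunzip chr*.fa.gz
## Start from an empty file so that re-running the script does not append duplicate sequences
: > hg19.fasta
for i in $(seq 1 22) X Y M; do
    cat "chr${i}.fa" >> hg19.fasta
done
## Remove the per-chromosome files (they end in .fa, so chr*.fasta would match nothing)
rm -f chr*.fa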
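Because the concatenation loop fixes the chromosome order, it is worth confirming that the merged file really has chr1 to chr22, chrX, chrY and chrM in that order. A minimal sketch with a hypothetical helper `check_chrom_order`; it assumes UCSC-style headers such as `>chr1` and compares only the first word of each header line.

```shell
#!/usr/bin/env bash
# Succeed only if the FASTA headers are exactly >chr1 .. >chr22, >chrX, >chrY, >chrM in order.
# check_chrom_order is a hypothetical helper; only the first word of each header is compared.
check_chrom_order() {
    local expected actual
    expected=$(for i in $(seq 1 22) X Y M; do echo ">chr${i}"; done)
    actual=$(grep '^>' "$1" | awk '{ print $1 }')
    [ "$expected" = "$actual" ]
}
```

Usage: `check_chrom_order hg19.fasta && echo "chromosome order OK"`.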