Common methods for converting genomic coordinates between assembly versions include:
1. NCBI Remap
See the previous article: http://www.reibang.com/p/41e5280f59c3
2. UCSC LiftOver
https://genome.ucsc.edu/cgi-bin/hgLiftOver
3. CrossMap: http://crossmap.sourceforge.net/#installation
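This article focuses on CrossMap and recommends it.
The tool is simple to use: you only need to supply two files (a chain file and the coordinate file to convert).
3.1 Download and installation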
(1)Use pip to install CrossMap
pip3 install git+https://github.com/liguowang/CrossMap.git
or
pip3 install CrossMap #Install CrossMap supporting Python3
or
conda install CrossMap
(2) Install CrossMap from source code
$ tar zxf CrossMap-VERSION.tar.gz
$ cd CrossMap-VERSION
# install CrossMap to default location. In Linux/Unix, this location is like:
# /home/user/lib/python2.7/site-packages/
$ python setup.py install
# or you can install CrossMap to a specified location:
$ python setup.py install --root=/home/user/CrossMap
# setup PYTHONPATH. Skip this step if CrossMap was installed to default location.
$ export PYTHONPATH=/home/user/CrossMap/usr/local/lib/python2.7/site-packages:$PYTHONPATH
# Skip this step if CrossMap was installed to default location.
$ export PATH=/home/user/CrossMap/usr/local/bin:$PATH
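After installing, it can help to confirm that the CrossMap script is actually reachable through PATH. The following is a minimal sketch using only the Python standard library. The function name `find_crossmap` is hypothetical, and the candidate executable names (`CrossMap.py`, plus `CrossMap` as a possible name in other releases) are assumptions, so adjust them to whatever your installation provides.

```python
import shutil


def find_crossmap(candidates=("CrossMap.py", "CrossMap")):
    """Return the full path of the first candidate found on PATH, or None.

    The candidate names are assumptions; change them to match your install.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    location = find_crossmap()
    if location is None:
        print("CrossMap not found on PATH; re-check the export PATH step")
    else:
        print("CrossMap found at", location)
```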
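3.2 Download the chain file
The chain file is one of the inputs for the coordinate conversion. It can be downloaded directly from the website; just pick the file that matches your source and target assembly versions, as listed below: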
UCSC built chain files (Human, Homo sapiens)
hg38ToHg19.over.chain.gz (Chain file for hg38 to hg19 conversion)
hg19ToHg38.over.chain.gz (Chain file for hg19 to hg38 conversion)
hg18ToHg38.over.chain.gz (Chain file for hg18 to hg38 conversion)
hg19ToHg18.over.chain.gz (Chain file for hg19 to hg18 conversion)
hg19ToHg17.over.chain.gz (Chain file for hg19 to hg17 conversion)
hg18ToHg19.over.chain.gz (Chain file for hg18 to hg19 conversion)
hg18ToHg17.over.chain.gz (Chain file for hg18 to hg17 conversion)
hg17ToHg19.over.chain.gz (Chain file for hg17 to hg19 conversion)
hg17ToHg18.over.chain.gz (Chain file for hg17 to hg18 conversion)
GRCh37ToHg19.over.chain.gz (Chain file for GRCh37 to hg19 conversion)
hg19ToGRCh37.over.chain.gz (Chain file for hg19 to GRCh37 conversion)
UCSC built chain files (Mouse, Mus musculus)
mm10ToMm9.over.chain.gz (Chain file for mm10 to mm9 conversion)
mm9ToMm10.over.chain.gz (Chain file for mm9 to mm10 conversion)
mm9ToMm8.over.chain.gz (Chain file for mm9 to mm8 conversion)
UCSC Chain file of other species can be downloaded from: http://hgdownload.soe.ucsc.edu/downloads.html
The list mainly covers human conversion files. For example, to convert hg38 coordinates to hg19, simply download hg38ToHg19.over.chain.gz (Chain file for hg38 to hg19 conversion).
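The chain file names above follow a regular pattern: source assembly, then "To", then the target assembly with its first letter capitalized. The sketch below builds the file name from the two assembly names. The helper names are hypothetical, and the URL layout (`goldenPath/<source>/liftOver/`) is an assumption about how the UCSC download server is organized; it is not expected to hold for every file in the list, such as the GRCh37 ones.

```python
def chain_file_name(source, target):
    """Build a UCSC-style chain file name from two assembly names.

    Example: source="hg38", target="hg19" gives "hg38ToHg19.over.chain.gz".
    """
    return f"{source}To{target[0].upper()}{target[1:]}.over.chain.gz"


def chain_file_url(source, target,
                   base="https://hgdownload.soe.ucsc.edu/goldenPath"):
    """Guess the download URL (assumed layout: <base>/<source>/liftOver/<file>)."""
    return f"{base}/{source}/liftOver/{chain_file_name(source, target)}"


if __name__ == "__main__":
    print(chain_file_name("hg38", "hg19"))
    print(chain_file_url("hg38", "hg19"))
```

If the guessed URL does not resolve, fall back to browsing the download page linked above.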
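3.3 Prepare the input BED file
CrossMap accepts many types of input coordinate files, such as BED, BAM, Wiggle, GFF/GTF, VCF, and MAF. BED is the most common. The BED file must contain at least three tab-separated columns (chr, start, end). It may contain more columns, such as strand or ref.Function, but no more than 12 in total.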
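Because a malformed input file is an easy mistake to make (spaces instead of tabs, too many columns), a quick check before conversion can save time. The sketch below validates one BED line against the rules stated above: 3 to 12 tab-separated columns, with integer start and end. The function name `validate_bed_line` is hypothetical, and the extra check that start is not greater than end is an assumption added for illustration.

```python
def validate_bed_line(line):
    """Parse one BED line and return its fields, or raise ValueError.

    Checks: 3 to 12 tab-separated columns, integer start/end,
    start >= 0 and end >= start (the last check is an added assumption).
    """
    fields = line.rstrip("\n").split("\t")
    if not 3 <= len(fields) <= 12:
        raise ValueError(
            f"expected 3 to 12 tab-separated columns, got {len(fields)}"
        )
    chrom = fields[0]
    start = int(fields[1])  # int() raises ValueError on non-integer text
    end = int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"invalid interval {start}-{end}")
    return [chrom, start, end] + fields[3:]


if __name__ == "__main__":
    print(validate_bed_line("chr1\t100\t200\t+\n"))
```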
3.4 Example
python3 CrossMap.py bed hg38ToHg19.over.chain.gz in.origion.hg38.bed out.convert.hg19.bed
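(1) Locate the CrossMap.py script you just installed; it is usually in the bin directory of your Python installation;
(2) bed specifies that the input file is in BED format (for example, a file holding the coordinates of a single site);
(3) hg38ToHg19.over.chain.gz is the chain file downloaded earlier;
(4) in.origion.hg38.bed is the input BED file containing the original coordinates; three columns are used here;
(5) out.convert.hg19.bed is the name of the output file, which will have the same number of columns as the input BED file.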
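When the conversion has to be run for several files, the same command can be assembled from a script. The sketch below builds the argument list in exactly the order described above and runs it with `subprocess`. The helper names are hypothetical, and it assumes `python3` is on PATH and that the script path you pass in really points to CrossMap.py.

```python
import subprocess


def build_crossmap_command(script, chain_file, input_bed, output_bed):
    """Return the argument list for: python3 <script> bed <chain> <in> <out>."""
    return ["python3", script, "bed", chain_file, input_bed, output_bed]


def run_crossmap(script, chain_file, input_bed, output_bed):
    """Run the conversion; raises CalledProcessError on a non-zero exit."""
    command = build_crossmap_command(script, chain_file, input_bed, output_bed)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_crossmap() to actually execute it.
    print(" ".join(build_crossmap_command(
        "CrossMap.py",
        "hg38ToHg19.over.chain.gz",
        "in.origion.hg38.bed",
        "out.convert.hg19.bed",
    )))
```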
Note that if a region is no longer contiguous after conversion to the new coordinates, it is split into two or more regions.
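To see why a single region can come back as several, consider a toy model of a chain file as a list of ungapped alignment blocks. This is only an illustration under simplifying assumptions (one chromosome, forward strand, half-open coordinates) and is not CrossMap's actual implementation; the function name `lift_interval` and the block layout are hypothetical.

```python
def lift_interval(start, end, blocks):
    """Map the half-open interval [start, end) through alignment blocks.

    blocks is a list of (src_start, src_end, dst_start) tuples, each an
    ungapped stretch where source position p maps to
    p - src_start + dst_start. An interval that overlaps several blocks
    yields one output piece per block.
    """
    pieces = []
    for src_start, src_end, dst_start in blocks:
        lo = max(start, src_start)
        hi = min(end, src_end)
        if lo < hi:  # the interval overlaps this block
            offset = dst_start - src_start
            pieces.append((lo + offset, hi + offset))
    return pieces


if __name__ == "__main__":
    # Two blocks that are adjacent in the source but far apart in the target.
    toy_blocks = [(0, 100, 1000), (100, 200, 5000)]
    print(lift_interval(10, 20, toy_blocks))   # lies inside one block
    print(lift_interval(50, 150, toy_blocks))  # spans both blocks, so it splits
```

In this toy setup the second interval crosses a block boundary, so it is returned as two separate target regions, which mirrors the splitting behaviour described above.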