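First, thanks to jimmy for the very detailed tutorial "Hi-C Data Analysis in Practice: HiC-Pro".
This is the second post in my 3D genomics study notes. It mainly records the problems I ran into while installing HiC-Pro, plus part of a hands-on run.
Installation
- First install the dependencies listed in the requirements. They can be installed with conda; pay attention to the versions.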
- The bowtie2 mapper
- Python (>2.7, python-3 is not supported) with the pysam (>=0.8.3), bx-python (>=0.5.0), numpy (>=1.8.2), and scipy (>=0.15.1) libraries
- R with the RColorBrewer and ggplot2 (>2.2.1) packages
- g++ compiler
- samtools (>1.1)
- Unix sort (which supports the -V option) is required! Mac OS users should install the GNU core utilities.
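The requirement list above can be checked before running make configure. This is a minimal sketch, assuming the tools are expected on PATH (for example inside an activated conda environment). The function names `check_tools` and `sort_supports_V` are my own and are not part of HiC-Pro, and the sketch does not check versions.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are missing from PATH.
check_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Print the missing tools (nothing after the colon if none) and fail if any are missing.
    echo "missing:$missing"
    [ -z "$missing" ]
}

# Hypothetical helper: succeed only if the local sort understands -V (version sort).
sort_supports_V() {
    printf 'chr10\nchr2\n' | sort -V >/dev/null 2>&1
}

# Example call; it only prints a hint and does not abort the shell.
check_tools bowtie2 samtools R python g++ || echo "install the tools listed above first"
sort_supports_V || echo "this sort lacks -V; install GNU coreutils"
```

Versions still need to be compared by hand against the list, for example with `samtools --version` and `python --version`.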
- Installing HiC-Pro
Packages that are not available through conda:
$ pip install https://bitbucket.org/mirnylab/mirnylib/get/tip.tar.gz
$ pip install https://bitbucket.org/mirnylab/hiclib/get/tip.tar.gz
# Install HiC-Pro
$ cd ~/biosoft/hicpro
$ git clone https://github.com/nservant/HiC-Pro.git
$ cd HiC-Pro
# Edit the install configuration file here (see below)
$ cat config-install.txt
$ mkdir ~/biosoft/hicpro/bin
$ make configure
$ make install
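### During the final install step you may see "Directory does not exit!". This is probably because the program assumes by default that a bin folder already exists under the home directory; creating the bin folder fixes it. Finally, if running /absolute/path/HiC-Pro -h prints the help text, the installation succeeded.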
SYSTEM CONFIGURATION
Variable | Description |
---|---|
PREFIX | Path to installation folder |
BOWTIE2_PATH | Full path to the bowtie2 installation directory |
SAMTOOLS_PATH | Full path to the samtools installation directory |
R_PATH | Full path to the R installation directory |
PYTHON_PATH | Full path to the python installation directory (>2.7 - python3 not supported) |
CLUSTER_SYS | Scheduler to use for cluster submission. Must be TORQUE, SGE, SLURM or LSF |
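As an illustration of the settings listed above, config-install.txt could be filled in roughly as follows. This is only a sketch. Every path is a hypothetical placeholder (a conda environment assumed to live under ~/miniconda3/envs/hic, and PREFIX assumed to be the bin folder created earlier) and must be replaced with the locations on your own system. CLUSTER_SYS is set to SGE here only because qsub is used later in these notes.

```shell
#!/bin/sh
# Write an example config-install.txt. All paths below are hypothetical
# placeholders; replace them with the directories on your own system.
CONFIG_OUT="${CONFIG_OUT:-config-install.txt}"
ENV_BIN="$HOME/miniconda3/envs/hic/bin"   # assumed conda environment, not from the original notes

cat > "$CONFIG_OUT" <<EOF
PREFIX = $HOME/biosoft/hicpro/bin
BOWTIE2_PATH = $ENV_BIN
SAMTOOLS_PATH = $ENV_BIN
R_PATH = $ENV_BIN
PYTHON_PATH = $ENV_BIN
CLUSTER_SYS = SGE
EOF

# Show the result so it can be reviewed before running make configure.
cat "$CONFIG_OUT"
```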
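Running
- First, obtain the BED file of restriction fragments and the chromosome sizes table. This requires the restriction enzyme cut site and the reference genome. Choose the enzyme according to where the test data come from, then generate the fragments with digest_genome.py: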
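# -o names the output BED file (the name here is only an illustration); the FASTA file is the positional argument
$ /PATH/HiC-Pro-master/bin/utils/digest_genome.py -r hindiii -o Refgenome_hindiii.bed Refgenome.fasta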
# BED file format (first line)
chr1 0 16007 HIC_chr1_1 0 +
# Chromosome sizes (first line)
chr1 249250621
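The chromosome sizes table is just the sequence name and its length, separated by a tab. These notes do not say how the table was produced. One tool-free way to sketch it is with awk over the FASTA file (the first two columns of a samtools faidx index carry the same information). The function name `fasta_sizes` and the tiny demo genome are my own.

```shell
#!/bin/sh
# Hypothetical helper: print "name<TAB>length" for every record in a FASTA file.
fasta_sizes() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            # Keep only the first word of the header, without the leading ">".
            name = substr($1, 2); len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Tiny demo genome so the sketch runs without a real reference.
demo=$(mktemp)
printf '>chr1 demo\nACGTAC\nGT\n>chr2\nAAGCTT\n' > "$demo"
fasta_sizes "$demo"
rm -f "$demo"
```

On a real genome the output would be redirected to a file, and that file is what GENOME_SIZE in the HiC-Pro configuration points to.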
$ HiC-Pro --help
usage : HiC-Pro -i INPUT -o OUTPUT -c CONFIG [-s ANALYSIS_STEP] [-p] [-h] [-v]
Use option -h|--help for more information
HiC-Pro 2.10.0
---------------
OPTIONS
-i|--input INPUT : input data folder; Must contains a folder per sample with input files
-o|--output OUTPUT : output folder
-c|--conf CONFIG : configuration file for Hi-C processing
[-p|--parallel] : if specified run HiC-Pro on a cluster
[-s|--step ANALYSIS_STEP] : run only a subset of the HiC-Pro workflow; if not specified the complete workflow is run
mapping: perform reads alignment
proc_hic: perform Hi-C filtering
quality_checks: run Hi-C quality control plots
build_contact_maps: build raw inter/intrachromosomal contact maps
ice_norm: run ICE normalization on contact maps
[-h|--help]: help
[-v|--version]: version
- Following the documentation, copy the configuration file 'config-hicpro.txt' to your current directory and edit it. The test data for this run come from Tung B. K. Le et al., Science 2013: https://www.ncbi.nlm.nih.gov/sra/?term=srr824846. The rawdata files need no particular ordering, but because of how the program reads its input, the data must be placed in a separate folder.
Put all input files in a rawdata folder. The input files have to be organized with one folder per sample.
$ mkdir -p ~/data/project/hic/fq/s1/
$ cd ~/data/project/hic/fq/s1/
858M Jul 3 16:21 SRR824846_Q20L10_1.fastq.gz
857M Jul 3 16:22 SRR824846_Q20L10_2.fastq.gz
# Multiple input files
+ PATH_TO_MY_DATA
+ sample1
++ file1_R1.fastq.gz
++ file1_R2.fastq.gz
++ ...
+ sample2
++ file1_R1.fastq.gz
++ file1_R2.fastq.gz
*...
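The layout above can be set up with a small loop. This is a sketch under two assumptions of mine: paired files are named <sample>_1.fastq.gz and <sample>_2.fastq.gz (as with the SRR824846 files here), and each sample folder takes that prefix as its name, whereas these notes simply use a folder called s1. The function name `arrange_by_sample` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: move <sample>_1.fastq.gz / <sample>_2.fastq.gz pairs
# found in directory $1 into one sub-folder per sample, as HiC-Pro expects.
arrange_by_sample() {
    rawdir=$1
    for r1 in "$rawdir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        sample=$(basename "$r1" _1.fastq.gz)
        r2="$rawdir/${sample}_2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "skipping $sample: mate file not found" >&2
            continue
        fi
        mkdir -p "$rawdir/$sample"
        mv "$r1" "$r2" "$rawdir/$sample/"
    done
}

# Demo on empty placeholder files in a temporary directory.
demo=$(mktemp -d)
touch "$demo/SRR824846_Q20L10_1.fastq.gz" "$demo/SRR824846_Q20L10_2.fastq.gz"
arrange_by_sample "$demo"
ls "$demo/SRR824846_Q20L10"
rm -rf "$demo"
```

Files without a mate are left where they are, so an incomplete download is not silently treated as a sample.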
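- The commands to run are below. jimmy uses a number of tricks in his post that are worth studying in detail.
# Main settings to change in the configuration file
BOWTIE2_IDX_PATH = # directory containing the bowtie2 index; remember to use an absolute path
REFERENCE_GENOME = # name (prefix) of the bowtie2 index
GENOME_SIZE = # file listing the size of each sequence in the reference genome
GENOME_FRAGMENT = # path to the BED file of restriction fragments
LIGATION_SITE = # ligation site
# For a single sequencing dataset: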
PAIR1_EXT = SRR824846_Q20L10_1
PAIR2_EXT = SRR824846_Q20L10_2
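Before launching a run, it can help to confirm that none of the settings listed above were left blank. This is a minimal sketch, assuming the configuration file uses plain `KEY = value` lines. The helpers `get_conf` and `check_conf` are my own names and not part of HiC-Pro, and an empty value is only reported as a warning because some protocols may legitimately leave a field empty.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY ($1) from a "KEY = value" file ($2).
get_conf() {
    sed -n "s/^$1[[:space:]]*=[[:space:]]*//p" "$2" | head -n 1
}

# Hypothetical check: list the settings discussed above that are still empty.
check_conf() {
    status=0
    for key in BOWTIE2_IDX_PATH REFERENCE_GENOME GENOME_SIZE GENOME_FRAGMENT \
               LIGATION_SITE PAIR1_EXT PAIR2_EXT; do
        if [ -z "$(get_conf "$key" "$1")" ]; then
            echo "not set: $key"
            status=1
        fi
    done
    return $status
}

# Demo on a throwaway config with only the two PAIR*_EXT values filled in.
demo=$(mktemp)
printf 'PAIR1_EXT = SRR824846_Q20L10_1\nPAIR2_EXT = SRR824846_Q20L10_2\nLIGATION_SITE =\n' > "$demo"
check_conf "$demo" || echo "edit the config before running HiC-Pro"
rm -f "$demo"
```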
$ MY_INSTALL_PATH/bin/HiC-Pro -i FULL_PATH_TO_DATA_FOLDER -o FULL_PATH_TO_OUTPUTS -c MY_LOCAL_CONFIG_FILE
$ cd out
$ qsub HiCPro_step1_.sh
$ qsub HiCPro_step2_.sh
One problem to note here: the qsub command fails with an error, so for now I run the shell scripts with sh instead.
$ qsub HiCPro_step1_.sh -p 20
Unable to initialize environment because of error: cell directory "/opt/gridengine/default" doesn't exist
Exiting.
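The error above says the Grid Engine cell directory does not exist, so qsub cannot work on this machine as configured. The workaround described here (run the scripts with sh) can be sketched as a fallback. It assumes the usual Grid Engine convention that the cell directory is $SGE_ROOT/$SGE_CELL with `default` as the cell name. The function names `sge_usable` and `run_step` are my own.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if qsub exists and the SGE cell directory is present.
sge_usable() {
    command -v qsub >/dev/null 2>&1 &&
        [ -n "${SGE_ROOT:-}" ] &&
        [ -d "$SGE_ROOT/${SGE_CELL:-default}" ]
}

# Hypothetical helper: submit the script with qsub when possible, otherwise run it with sh.
run_step() {
    if sge_usable; then
        qsub "$1"
    else
        echo "qsub not usable, running $1 with sh" >&2
        sh "$1"
    fi
}

# Typical use inside the HiC-Pro output folder:
#   run_step HiCPro_step1_.sh && run_step HiCPro_step2_.sh
```

With sh the two steps run one after the other. With qsub the call returns as soon as the job is queued, so step 2 should only be submitted once the step 1 jobs have finished.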
Once the results are out, I will study them further.
Related links:
HiC-Pro_github
HiC-Pro: An optimized and flexible pipeline for Hi-C processing