EndHiC
Compared with HiC-Pro, installing EndHiC is much simpler: download it and it is ready to use.
Installing EndHiC
git clone git@github.com:fanagislab/EndHiC.git
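All the scripts you need are in the cloned folder, so you can call them directly.
So how do you use it? I have to say, the instructions on GitHub are pretty sketchy.
It is quicker to just read the script in the example they provide.
Using EndHiC
The provided example script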
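Since installation is nothing more than a clone, a quick check that the scripts used later in this post are actually present can save a confusing error further down. The sketch below is only that: `check_endhic_install` is a hypothetical helper name, not something shipped with EndHiC, and the script list is simply the set of files called in this post. Also note that the `git@github.com:` URL goes over SSH and needs a GitHub SSH key; without one, cloning `https://github.com/fanagislab/EndHiC.git` should work instead.

```shell
#!/usr/bin/env bash
# Check that an EndHiC checkout contains the scripts called in this post.
# check_endhic_install is a hypothetical helper, not part of EndHiC itself.
check_endhic_install() {
    local dir="$1" missing=0 script
    for script in endhic.pl fastaDeal.pl cluster2agp.pl agp2fasta.pl \
                  cluster2bed.pl matrix2heatmap.py; do
        if [ ! -f "${dir}/${script}" ]; then
            echo "missing: ${script}"
            missing=1
        fi
    done
    # Return 0 when every script was found, 1 otherwise.
    return "${missing}"
}

# Usage (the path is an example):
# check_endhic_install ./EndHiC && echo "EndHiC scripts found"
```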
$ cat biosoft/EndHiC/z.testing_data/Arabidopsis_thalina/work.sh
##Atha.contigs.fa is generated by Hifiasm
##AthaHiC_100000_abs.bed, AthaHiC_100000.matrix, AthaHiC_100000_iced.matrix are generated by HiC-pro using Atha.contigs.fa as the reference genome
gzip -d Atha.contigs.fa.gz
##get contig length
perl ../../fastaDeal.pl -attr id:len Atha.contigs.fa > Atha.contigs.fa.len
##draw contig Hi-C heatmaps with 10*100000 (1-Mb) resolution
../../matrix2heatmap.py AthaHiC_100000_abs.bed AthaHiC_100000.matrix 10
##Run one round, when the contig assembly is quite good
perl ../../endhic.pl Atha.contigs.fa.len AthaHiC_100000_abs.bed AthaHiC_100000.matrix AthaHiC_100000_iced.matrix
ln Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster* ./
##convert cluster file to agp file
perl ../../cluster2agp.pl Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster Atha.contigs.fa.len > Atha.scaffolds.agp
##get final scaffold sequence file
perl ../../agp2fasta.pl Atha.scaffolds.agp Atha.contigs.fa > Atha.scaffolds.fa
##draw HiC heatmaps for scaffolds with 10*100000 (1-Mb) resolution
../../cluster2bed.pl AthaHiC_100000_abs.bed z.EndHiC.A.results.summary.cluster > clusterA_100000_abs.bed 2> clusterA.id.len
../../matrix2heatmap.py clusterA_100000_abs.bed AthaHiC_100000.matrix 10
##Here, Arabidopsis thalina has 5 chromosomes, and all these chromosomes can be successfully scaffolded by EndHiC
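The input data is exactly what HiC-Pro produced in the previous step: the raw `_abs.bed` and `.matrix` files plus the `_iced.matrix` file, all at the 100000 (100-kb) bin size.
The adapted script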
contig=/share/home/off/Work/Genome_assembly/Assembly/contig.fa ##contig file; must be the same one used as the HiC-Pro reference
endhic_dir=/share/home/off_wenhao/biosoft/EndHiC ##EndHiC installation path
name=dlo ##species name; must also match the HiC-Pro setting, i.e. the prefix of the HiC-Pro output folder `**_outdir_new`
##get contig length
perl ${endhic_dir}/fastaDeal.pl -attr id:len ${contig} > contigs.fa.len
##draw contig Hi-C heatmaps with 10*100000 (1-Mb) resolution
hic_pro_dir=/share/home/off/Work/Genome_assembly/Assembly/08.EndHiC/01.hicprp/${name}_outdir_new/hic_results/matrix/${name}
${endhic_dir}/matrix2heatmap.py ${hic_pro_dir}/raw/100000/${name}_100000_abs.bed ${hic_pro_dir}/raw/100000/${name}_100000.matrix 10
##Run one round, when the contig assembly is quite good
perl ${endhic_dir}/endhic.pl contigs.fa.len ${hic_pro_dir}/raw/100000/${name}_100000_abs.bed ${hic_pro_dir}/raw/100000/${name}_100000.matrix ${hic_pro_dir}/iced/100000/${name}_100000_iced.matrix
ln Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster* ./
##convert cluster file to agp file
perl ${endhic_dir}/cluster2agp.pl Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster contigs.fa.len > scaffolds.agp
##get final scaffold sequence file
perl ${endhic_dir}/agp2fasta.pl scaffolds.agp ${contig} > ${name}.scaffolds.fa
##draw HiC heatmaps for scaffolds with 10*100000 (1-Mb) resolution
${endhic_dir}/cluster2bed.pl ${hic_pro_dir}/raw/100000/${name}_100000_abs.bed Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster > clusterA_100000_abs.bed 2> clusterA.id.len
${endhic_dir}/matrix2heatmap.py clusterA_100000_abs.bed ${hic_pro_dir}/raw/100000/${name}_100000.matrix 10
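To see at a glance whether scaffolding changed anything, it helps to summarise the length tables. Here is a minimal sketch, assuming that `contigs.fa.len` (and likewise `clusterA.id.len`) is a whitespace-separated table with the sequence ID in column 1 and the length in column 2; I have not checked this against the EndHiC source, so verify it on your own files. `len_summary` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Summarise an "id length" table: sequence count, total length and N50.
# Assumption: column 2 holds the sequence length. len_summary is a hypothetical helper.
len_summary() {
    # Sort by length, longest first, then accumulate until half the total is reached.
    sort -k2,2nr "$1" | awk '
        { len[NR] = $2; total += $2 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                run += len[i]
                if (run >= half) { n50 = len[i]; break }
            }
            printf "sequences=%d total=%.0f N50=%.0f\n", NR, total, n50
        }'
}

# Usage:
# len_summary contigs.fa.len
# len_summary clusterA.id.len
```

If the scaffolding worked, the second call should report far fewer sequences and a much larger N50 than the first.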
Results
clusterA.id.len
clusterA_100000_abs.bed
clusterA_100000_abs.bed.pdf
endhic.100000.10.iced.sh
endhic.100000.20.iced.sh
endhic.100000.5.iced.sh
endhic.100000.10.raw.sh
endhic.100000.20.raw.sh
endhic.100000.5.raw.sh
endhic.100000.15.raw.sh
endhic.100000.25.raw.sh
endhic.Round_A.sh
endhic.100000.15.iced.sh
endhic.100000.25.iced.sh
endhic.log
EndHic.sh
dlo.scaffolds.fa
Round_A.01.contig_end_contact_results/
Round_A.02.GFA_contig_graph_results/
Round_A.03.cluster_order_orient_results/
Round_A.04.summary_and_merging_results/
scaffolds.agp
contigs.fa.len
z.EndHiC.A.results.summary.cluster
z.EndHiC.A.results.summary.cluster.GFA.v1.2.GFA
z.EndHiC.A.results.summary.cluster.GFA
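There are a lot of files, but the only two we really need are scaffolds.agp and prefix.scaffolds.fa (dlo.scaffolds.fa in this run). The FASTA file holds the scaffold sequences, and the AGP file is the map that records how the contigs are placed in each scaffold.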
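A quick way to inspect the map file is to count the contigs and gaps in each scaffold. The sketch below assumes `scaffolds.agp` follows the standard AGP layout, where column 1 is the scaffold name, column 3 is the end coordinate on the scaffold, and column 5 is the component type (`N` or `U` for gaps, anything else for a sequence component). I have not confirmed that `cluster2agp.pl` writes exactly this layout, so treat it as an assumption. `agp_summary` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Per-scaffold summary of an AGP file: length, number of contigs, number of gaps.
# Assumption: standard AGP columns (1 = object, 3 = object end, 5 = component type).
# agp_summary is a hypothetical helper, not part of EndHiC.
agp_summary() {
    awk '
        /^#/ || NF < 5 { next }                              # skip comments and short lines
        !($1 in seen) { seen[$1] = 1; order[++n] = $1 }      # remember first-seen order
        ($3 + 0) > end[$1] { end[$1] = $3 + 0 }              # largest end coordinate = scaffold length
        $5 == "N" || $5 == "U" { gaps[$1]++; next }          # gap lines
        { contigs[$1]++ }                                    # everything else is a sequence component
        END {
            for (i = 1; i <= n; i++) {
                s = order[i]
                printf "%s\tlength=%.0f\tcontigs=%d\tgaps=%d\n", s, end[s], contigs[s], gaps[s]
            }
        }' "$1"
}

# Usage:
# agp_summary scaffolds.agp
```

For a species with a known chromosome number, the number of long scaffolds in this summary is the first thing to compare against.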