https://github.com/nygenome/conpair
Dependencies: Python, numpy, scipy, GATK3
Installing numpy and scipy:
sudo pip install numpy
sudo pip install scipy
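GATK4 does not work; I used 3.8.
1. The official guide says to edit the profile file. However, CONPAIR_DIR and GATK_JAR can both be supplied as command-line arguments, while PYTHONPATH has no corresponding argument, so I only added CONPAIR_DIR and PYTHONPATH to the profile. The three lines from the official guide are listed below; the GATK_JAR line can be left out if the JAR is passed with --gatk: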
sudo vi /etc/profile
export CONPAIR_DIR=/your/path/to/CONPAIR
export GATK_JAR=/your/path/to/GenomeAnalysisTK.jar
export PYTHONPATH=${PYTHONPATH}:/your/path/to/CONPAIR/modules/
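After reloading the profile (for example with `source /etc/profile`), a quick check along these lines can confirm that the two variables took effect. This is only a sketch: `check_conpair_env` is a hypothetical helper, not something shipped with Conpair.

```shell
# check_conpair_env: hypothetical helper, not part of Conpair.
# Succeeds only if CONPAIR_DIR is set and PYTHONPATH contains
# ${CONPAIR_DIR}/modules (with or without a trailing slash).
check_conpair_env() {
    status=0
    if [ -z "${CONPAIR_DIR:-}" ]; then
        echo "CONPAIR_DIR is not set" >&2
        status=1
    fi
    modules="${CONPAIR_DIR%/}/modules"
    case ":${PYTHONPATH:-}:" in
        *":$modules:"*|*":$modules/:"*) ;;
        *) echo "PYTHONPATH does not contain $modules" >&2
           status=1 ;;
    esac
    return "$status"
}
```

Usage: `check_conpair_env && echo "environment looks fine"`.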
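2. Three reference genome files are required, but they do not all need to be listed after --reference; giving only the first one (the FASTA) is enough: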
human_g1k_v37.fa
human_g1k_v37.fa.fai
human_g1k_v37.dict
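Since only the FASTA is passed on the command line, it can help to confirm beforehand that the index and dictionary really sit next to it. A minimal sketch, assuming the naming shown above (`.fai` appended to the FASTA name, `.dict` replacing the `.fa` extension); `check_reference` is a hypothetical helper name:

```shell
# check_reference: hypothetical helper, not part of Conpair.
# Given the FASTA path, verify that the FASTA, its .fai index and its
# .dict sequence dictionary all exist in the same directory.
check_reference() {
    ref="$1"
    missing=0
    # human_g1k_v37.fa -> human_g1k_v37.fa.fai and human_g1k_v37.dict
    for f in "$ref" "$ref.fai" "${ref%.*}.dict"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_reference /your/path/to/human_g1k_v37.fa && echo "reference files found"`.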
3. Generate the pileup files (one each for Tumor and Normal)
run_gatk_pileup_for_sample.py -B TUMOR_bam -O TUMOR_pileup
run_gatk_pileup_for_sample.py -B NORMAL_bam -O NORMAL_pileup
Other parameters:
--reference REFERENCE reference genome in the fasta format, two additional files (.fai, .dict) located in the same directory as the fasta file are required. You may choose to avoid specifying the reference by following the steps in the "default reference genome" section above.
--markers MARKERS the set of preselected genomic positions in the BED format. Default: ${CONPAIR_DIR}/data/markers/GRCh37.autosomes.phase3_shapeit2_mvncall_integrated.20130502.SNV.genotype.sselect_v4_MAF_0.4_LD_0.8.bed
--conpair_dir CONPAIR_DIR path to ${CONPAIR_DIR}
--gatk GATK path to GATK JAR [$GATK by default]
--java JAVA path to JAVA [java by default]
--temp_dir_java TEMP_DIR_JAVA java temporary directory to set -Djava.io.tmpdir
--xmx_java XMX_JAVA Xmx java memory setting [default: 12g]
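The main ones to add are --reference and --gatk.
The --markers file is included in the download package. If CONPAIR_DIR is set in the profile and the markers folder has not been moved, it does not need to be specified.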
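Putting this together, the pileup step for both samples could be scripted roughly as follows. This is a sketch that only prints the commands: the paths are the same placeholders used earlier, `TUMOR_bam`/`NORMAL_bam` stand in for real file names, and `pileup_cmd` is a hypothetical helper.

```shell
# Placeholder paths; replace them with your own locations.
REFERENCE=/your/path/to/human_g1k_v37.fa
GATK_JAR=/your/path/to/GenomeAnalysisTK.jar

# pileup_cmd: hypothetical helper that prints the pileup command for one
# sample (TUMOR or NORMAL), adding --reference and --gatk explicitly.
pileup_cmd() {
    sample="$1"
    echo "run_gatk_pileup_for_sample.py -B ${sample}_bam -O ${sample}_pileup --reference $REFERENCE --gatk $GATK_JAR"
}

# Print one command per sample; nothing is executed here.
for s in TUMOR NORMAL; do
    pileup_cmd "$s"
done
```

Once the printed commands look right, they can be run by piping the output to `sh`, provided the paths contain no spaces.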
4. Verify Tumor/Normal concordance
verify_concordance.py -T TUMOR_pileup -N NORMAL_pileup
Optional:
--help show help message and exit
--outfile OUTFILE write output to OUTFILE
--normal_homozygous_markers_only use only normal homozygous positions to calculate concordance between TUMOR and NORMAL
--min_cov MIN_COV require min of MIN_COV in both TUMOR and NORMAL to use the marker
--min_mapping_quality MIN_MAP_QUAL do not use reads with mapping qual below MIN_MAP_QUAL [default: 10]
--min_base_quality MIN_BASE_QUAL do not use reads with base qual below MIN_BASE_QUAL of a specified position [default: 20]
--markers MARKERS the set of preselected genomic positions in the TXT format. Default: ${CONPAIR_DIR}/data/markers/GRCh37.autosomes.phase3_shapeit2_mvncall_integrated.20130502.SNV.genotype.sselect_v4_MAF_0.4_LD_0.8.txt
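The official documentation also notes at the end that, to account for the effect of CNV, it is best to add the -H flag, yet the help text does not list this flag. I tried adding -H, and the concordance rose from 99.18% to 100%.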
To eliminate the effect of copy number variation on the concordance levels, we recommend using the -H flag. If two samples are concordant the expected concordance level should be close to 99-100%.
For discordant samples concordance level should be close to 40%.
You can observe slightly lower concordance (80-99%) in presence of contamination and/or copy number changes (if the -H option wasn't used) in at least one of the samples.
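The guidance quoted above can be turned into a rough triage of a concordance percentage. This is a sketch under assumptions: the cutoffs 99 and 80 are read off the quoted ranges, the documentation says nothing about values between roughly 40% and 80% (they are lumped into the last bucket here), and `classify_concordance` and its labels are hypothetical.

```shell
# classify_concordance: hypothetical helper, not part of Conpair.
# Takes a concordance percentage (a number without the % sign) and prints
# a rough interpretation. The cutoffs are assumptions taken from the
# ranges quoted in the documentation.
classify_concordance() {
    awk -v c="$1" 'BEGIN {
        c = c + 0                      # force numeric comparison
        if (c >= 99)      print "concordant"
        else if (c >= 80) print "possible contamination or CNV"
        else              print "likely discordant"
    }'
}
```

For example, `classify_concordance 99.18` prints `concordant`, which matches the pair tested above before -H was added.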
5. Estimate the contamination level
estimate_tumor_normal_contamination.py -T TUMOR_pileup -N NORMAL_pileup
Optional:
--help show help message and exit
--outfile OUTFILE write output to OUTFILE
--min_mapping_quality MIN_MAP_QUAL do not use reads with mapping qual below MIN_MAP_QUAL [default: 10]
--markers MARKERS the set of preselected genomic positions in the TXT format. Default: ${CONPAIR_DIR}/data/markers/GRCh37.autosomes.phase3_shapeit2_mvncall_integrated.20130502.SNV.genotype.sselect_v4_MAF_0.4_LD_0.8.txt
--conpair_dir CONPAIR_DIR path to ${CONPAIR_DIR}
--grid GRID grid interval [default: 0.01]
Even a very low contamination level (such as 0.5%) in the tumor sample will have a severe effect on calling somatic mutations, resulting in decreased specificity. Cross-individual contamination in the normal sample usually has a milder effect on somatic calling.