Question 1: Which BED file should we use?
target vs bait BED:
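For hybrid capture:
-- The targeted regions (or “primary targets”): the regions the probes were designed to cover, e.g. the exons of the genes of interest
-- The baited regions (or “capture targets”): the regions the probes actually capture, which usually extend about 50 bp beyond each side of the targeted intervals
CNVkit needs the bait/capture BED file.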
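If only a target BED is at hand, the ~50 bp flanks described above can be illustrated by padding each interval. The sketch below is a hypothetical helper (`pad_bed`, with 50 bp as an assumed default), not a CNVkit command and not a substitute for the vendor's actual bait file.

```shell
# Hypothetical helper: widen every BED interval by PAD bp on each side.
# BED starts are 0-based, so the new start is clamped at 0; chromosome
# ends are NOT clamped, because no chrom-sizes file is read here.
pad_bed() {
  local pad="${1:-50}"
  awk -v pad="$pad" 'BEGIN { OFS = "\t" }
    /^(track|browser|#)/ { print; next }   # pass BED header lines through
    {
      s = $2 - pad
      if (s < 0) s = 0
      $2 = s
      $3 = $3 + pad
      print
    }'
}

# Example with made-up intervals (BED on stdin, padded BED on stdout)
printf 'chr1\t69090\t70008\tOR4F5\nchr1\t20\t100\ttoy\n' | pad_bed 50
```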
target
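Prepare the my_targets.bed file (--split divides large bait intervals into smaller bins; --annotate adds gene names from refFlat.txt):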
cnvkit.py target my_baits.bed --annotate refFlat.txt --split -o my_targets.bed
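The --split step divides each interval into roughly equal bins; in the .cnn example further down, chr1:69069-70029 (960 bp) appears as four 240 bp bins. A rough awk sketch of that idea follows. The function name `split_bed`, the rounding rule and the 267 bp average size are assumptions for illustration; CNVkit exposes the real setting as `--avg-size`, and its exact splitting rule may differ.

```shell
# Hypothetical sketch: split each BED interval into n bins of near-equal
# size, where n = round(span / AVG), at least 1. This is not CNVkit's code.
split_bed() {
  local avg="${1:-267}"
  awk -v avg="$avg" 'BEGIN { OFS = "\t" }
    {
      span = $3 - $2
      n = int(span / avg + 0.5)               # number of bins, rounded
      if (n < 1) n = 1
      for (i = 0; i < n; i++) {
        s = $2 + int(i * span / n + 0.5)       # bin start
        e = $2 + int((i + 1) * span / n + 0.5) # bin end = next bin start
        print $1, s, e, $4
      }
    }'
}

# 960 bp interval with an assumed average bin size of 267 bp
printf 'chr1\t69069\t70029\tOR4F5\n' | split_bed 267
```

With these numbers, 960 / 267 rounds to 4 bins of exactly 240 bp, which matches the bin boundaries in the .cnn example.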
1.fix
Combine the uncorrected target and antitarget coverage tables (.cnn) and correct for biases in regional coverage and GC content, according to the given reference. Output a table of copy number ratios (.cnr).
cnvkit.py fix Sample.targetcoverage.cnn Sample.antitargetcoverage.cnn Reference.cnn -o Sample.cnr
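At its simplest, fix compares each bin's coverage with the reference. The sketch below shows only that step: a bin's log2 ratio is the sample's log2 depth minus the reference's log2 depth for the same bin. It leaves out the regional-coverage and GC corrections described above (and any other adjustments the real command makes), and the function name `naive_ratio` is hypothetical.

```shell
# Simplified sketch of the core of `fix`:
#   per-bin log2 ratio = sample log2 depth - reference log2 depth,
# matching bins on chromosome, start and end. No bias correction is applied.
# Usage: naive_ratio SAMPLE.cnn REFERENCE.cnn
naive_ratio() {
  awk 'BEGIN { OFS = "\t" }
    FNR == 1 {                           # header line of each file
      split("", col)
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    NR == FNR {                          # first file read: the reference
      ref[$col["chromosome"] ":" $col["start"] ":" $col["end"]] = $col["log2"]
      next
    }
    {                                    # second file: the sample
      key = $col["chromosome"] ":" $col["start"] ":" $col["end"]
      if (key in ref)
        printf "%s\t%s\t%s\t%s\t%.4f\n", $col["chromosome"], $col["start"],
               $col["end"], $col["gene"], $col["log2"] - ref[key]
    }' "$2" "$1"
}

# e.g. naive_ratio Sample.targetcoverage.cnn Reference.cnn
```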
2.segment
Infer discrete copy number segments from the given coverage table:
cnvkit.py segment Sample.cnr -o Sample.cns
3.call
Given segmented log2 ratio estimates (.cns), derive each segment’s absolute integer copy number using either:
-- a list of threshold log2 values for each copy number state (-m threshold), or
-- rescaling for a given known tumor cell fraction and normal ploidy, then simple rounding to the nearest integer copy number (-m clonal).
cnvkit.py call Sample.cns -y -m threshold -t=-1.1,-0.4,0.3,0.7 -o Sample.call.cns
cnvkit.py call Sample.cns -y -m clonal --purity 0.65 -o Sample.call.cns
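The two calling rules can be sketched for a single log2 value as below. This is a simplified reading of the methods described above for a diploid autosome, not CNVkit's implementation: the function names are hypothetical, sex chromosomes are not special-cased, the behavior above the last threshold (rounding 2 × 2^log2 up) and the clipping at 0 are assumptions, and the clonal formula is the usual correction for dilution by normal (copy number 2) cells.

```shell
# Simplified sketch of both calling rules for a diploid autosome (ploidy 2).

# threshold: the first threshold t_i (i = 0, 1, ...) with log2 <= t_i gives
# copy number i; above the last threshold, assume ceil(2 * 2^log2).
cn_threshold() {  # usage: cn_threshold "-1.1,-0.4,0.3,0.7" LOG2
  awk -v t="$1" -v x="$2" 'BEGIN {
    v = x + 0
    n = split(t, th, ",")
    for (i = 1; i <= n; i++)
      if (v <= th[i] + 0) { print i - 1; exit }
    c = 2 * 2 ^ v                        # copy number of a pure diploid sample
    if (c == int(c)) print c; else print int(c) + 1   # round up
  }'
}

# clonal: remove the normal-cell contribution for the given purity, then
# round to the nearest integer (negative values clipped to 0).
cn_clonal() {  # usage: cn_clonal PURITY LOG2
  awk -v p="$1" -v x="$2" 'BEGIN {
    c = (2 * 2 ^ (x + 0) - 2 * (1 - p)) / p
    if (c < 0) c = 0
    print int(c + 0.5)
  }'
}

cn_threshold "-1.1,-0.4,0.3,0.7" -0.619849   # -0.619849 <= -0.4, prints 1
cn_clonal 0.65 0.312921                       # (2.484 - 0.7) / 0.65 = 2.745, prints 3
```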
Target and antitarget bin-level coverages (.cnn)
Chromosome or reference sequence name (chromosome)
Start position (start)
End position (end)
Gene name (gene)
Log2 mean coverage depth (log2)
Absolute-scale mean coverage depth (depth)
chromosome start end gene depth log2
chr1 69069 69309 OR4F5 280.079 8.12969
chr1 69309 69549 OR4F5 264.517 8.04721
chr1 69549 69789 OR4F5 248.579 7.95756
chr1 69789 70029 OR4F5 261.962 8.03322
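In the example rows, the log2 column is simply log2 of the depth column (log2(280.079) ≈ 8.1297). The hypothetical awk helper below recomputes it from depth so the two can be compared. It locates columns by header name and prints NA for zero depth, where log2 is undefined; how CNVkit itself handles such bins is not covered by this sketch.

```shell
# Recompute log2(depth) for each bin of a .cnn file and print it next to the
# stored log2 value. Columns are located by header name.
# Usage: check_cnn_log2 Sample.targetcoverage.cnn   (or pipe the file in)
check_cnn_log2() {
  awk 'BEGIN { OFS = "\t" }
    FNR == 1 { split("", col); for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      d = $col["depth"] + 0
      calc = (d > 0) ? log(d) / log(2) : "NA"   # log2 undefined at depth 0
      print $col["chromosome"], $col["start"], $col["end"], $col["log2"], calc
    }' "$@"
}

printf 'chromosome\tstart\tend\tgene\tdepth\tlog2\nchr1\t69069\t69309\tOR4F5\t280.079\t8.12969\n' | check_cnn_log2
```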
Bin-level log2 ratios (.cnr)
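weight: the bin's weight, reflecting how reliable its estimate is (the other columns are as in .cnn, except that log2 is now the log2 copy ratio relative to the reference)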
chromosome start end gene log2 depth weight
chr1 69069 69309 OR4F5 0.220677 280.079 0.542821
chr1 69309 69549 OR4F5 0.213013 264.517 0.557108
chr1 69549 69789 OR4F5 -0.0232971 248.579 0.548714
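The weight column can be used when combining bins. As an illustration (not a CNVkit command; the function name is hypothetical), the sketch below computes a weight-averaged log2 ratio per gene, so that bins with higher weight count for more. Bins sharing a gene name are pooled regardless of chromosome.

```shell
# Weighted mean log2 ratio per gene from a .cnr file:
#   sum(weight * log2) / sum(weight), plus the number of bins per gene.
# Usage: gene_mean_log2 Sample.cnr   (or pipe the file in)
gene_mean_log2() {
  awk 'FNR == 1 { split("", col); for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      g = $col["gene"]; w = $col["weight"] + 0
      if (!(g in sw)) order[++n] = g          # remember first-seen order
      sw[g] += w; swx[g] += w * $col["log2"]; nb[g]++
    }
    END {
      for (k = 1; k <= n; k++) {
        g = order[k]
        if (sw[g] > 0) printf "%s\t%d\t%.4f\n", g, nb[g], swx[g] / sw[g]
      }
    }' "$@"
}

printf 'chromosome\tstart\tend\tgene\tlog2\tdepth\tweight\nchr1\t69069\t69309\tOR4F5\t0.220677\t280.079\t0.542821\nchr1\t69309\t69549\tOR4F5\t0.213013\t264.517\t0.557108\n' | gene_mean_log2
```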
Segmented log2 ratios (.cns)
probes: the number of bins covered by the segment
chromosome start end gene log2 depth probes weight
chr1 148009310 148021662 NBPF19,LOC100996740,NBPF26 -0.619849 267.693 12 4.71205
chr2 86343627 86371817 PTCD3,IMMT -0.405761 35.5856 22 10.1462
chr2 179528335 179549158 MIR548N,TTN,TTN 0.312921 82.0201 40 20.673
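Since probes is the number of bins covered by a segment, it can be cross-checked by counting the .cnr bins inside the segment's coordinates. A minimal sketch, assuming segment boundaries coincide with bin edges; the function name is hypothetical.

```shell
# Count the .cnr bins lying entirely within one segment (CHROM:START-END).
# Usage: segment_probes CHROM START END [FILE.cnr]   (reads stdin if no file)
segment_probes() {
  awk -v c="$1" -v s="$2" -v e="$3" '
    FNR == 1 { split("", col); for (i = 1; i <= NF; i++) col[$i] = i; next }
    $col["chromosome"] == c && $col["start"] + 0 >= s + 0 && $col["end"] + 0 <= e + 0 { n++ }
    END { print n + 0 }' "${4:--}"
}

# e.g. segment_probes chr1 148009310 148021662 Sample.cnr
```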
For details, see: https://cnvkit.readthedocs.io/en/stable/quickstart.html
call.cns (segments, with the integer copy number added in the cn column)
chromosome start end gene log2 cn depth probes weight
chr1 148009310 148021662 NBPF19,LOC100996740,NBPF26 -0.619849 1 267.693 12 4.71205
chr2 86343627 86371817 PTCD3,IMMT -0.405761 2 35.5856 22 10.1462
chr2 179528335 179549158 MIR548N,TTN,TTN 0.312921 2 82.0201 40 20.673
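To pull the gains and losses out of a called file, one option is to keep the rows whose cn differs from 2. This is a sketch assuming a diploid genome; chrX/chrY in male samples would need a different expected copy number, which this hypothetical helper does not handle. It works on call.cns or call.cnr, since columns are found by header name.

```shell
# Print called rows whose integer copy number (cn) is not 2.
# Usage: non_neutral Sample.call.cns   (or pipe the file in)
non_neutral() {
  awk 'BEGIN { OFS = "\t" }
    FNR == 1 { split("", col); for (i = 1; i <= NF; i++) col[$i] = i; next }
    $col["cn"] != 2 {
      print $col["chromosome"], $col["start"], $col["end"], $col["gene"], $col["cn"]
    }' "${1:--}"
}

printf 'chromosome\tstart\tend\tgene\tlog2\tcn\tdepth\tprobes\tweight\nchr1\t148009310\t148021662\tNBPF19,LOC100996740,NBPF26\t-0.619849\t1\t267.693\t12\t4.71205\n' | non_neutral
```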
call.cnr (bins, likewise with the cn column added)
chromosome start end gene log2 cn depth weight
chr1 69069 69309 OR4F5 0.220677 2 280.079 0.542821
chr1 69309 69549 OR4F5 0.213013 2 264.517 0.557108
chr1 69549 69789 OR4F5 -0.0232971 2 248.579 0.548714
chr1 69789 70029 OR4F5 -0.0932431 2 261.962 0.47815