gatk4 mutect2 分析somatic突變


第一步:創(chuàng)建normal樣品的PoN(Create a sites-only PoN with CreateSomaticPanelOfNormals)

To create a panel of normals (PoN), call on each normal sample as if a tumor sample. Then use CreateSomaticPanelOfNormals to output a PoN of germline and artifactual sites. This contrasts with the GATK3 workflow, which uses CombineVariants to retain variant sites called in at least two samples and then uses Picard MakeSitesOnlyVcf to simplify the callset for use as a PoN.

參考:https://software.broadinstitute.org/gatk/documentation/article?id=11136

First, run Mutect2??Mutect2?in?tumor-only mode?on each normal sample.?In?tumor-only mode, a single case sample is analyzed with the?-tumor?flag?without?an accompanying matched control?-normal?sample. For the tutorial, we run this command only for sample HG00190.

gatk Mutect2 \

-R ~/Documents/ref/hg38/Homo_sapiens_assembly38.fasta \

-I HG00190.bam \

-tumor HG00190 \

--disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \

-L chr17plus.interval_list \

--germline-resource?af-only-gnomad.raw.sites.b37.vcf.gz \

-O 3_HG00190.vcf.gz

Second, collate all the normal VCFs into a single callset with?CreateSomaticPanelOfNormals.?For the tutorial, to illustrate the step with small data, we run this command on three normal sample VCFs. The general recommendation for panel of normals is?a minimum of forty samples.

gatk CreateSomaticPanelOfNormals \

-vcfs 3_HG00190.vcf.gz \

-vcfs 4_NA19771.vcf.gz \

-vcfs 5_HG02759.vcf.gz \

--min-sample-count 2 \

-O 6_threesamplepon.vcf.gz

結(jié)果如下:

第二步:call somatic short variants and indel

1仅政、Tumor with matched normal

????????Given a matched normal, Mutect2 is designed to call somatic variants only. The tool includes logic to skip emitting variants that are clearly present in the germline based on the evidence present in the matched normal. This is done at an early stage to avoid spending computational resources on germline events. If the variant's germline status is borderline, then Mutect2 will emit the variant to the callset with a germline-risk filter. Such filtered emissions enable manual review.

gatk-launch --javaOptions "-Xmx4g" Mutect2 \

? -R ref_fasta.fa \

? -I tumor.bam \

? -tumor tumor_sample_name \

? -I normal.bam \

? -normal normal_sample_name \

? --germline_resource af-only-gnomad.vcf.gz \

? --normal_panel pon.vcf.gz \

? -L intervals.list \

? -O tumor_matched_m2_snvs_indels.vcf.gz

2殉挽、Single tumor sample

? gatk-launch --javaOptions "-Xmx4g" Mutect2 \

? -R ref_fasta.fa \

? -I tumor.bam \

? -tumor tumor_sample_name \

? --germline_resource af-only-gnomad.vcf.gz \

? --normal_panel pon.vcf.gz \

? -L intervals.list \

? -O tumor_unmatched_m2_snvs_indels.vcf.gz

第三步:? Estimate cross-sample contamination using GetPileupSummaries and CalculateContamination.

First, run?GetPileupSummaries?on the tumor BAM to summarize read support for a set number of known variant sites.Use a population germline resource containing only common biallelic variants, e.g. subset by using?SelectVariants?--restrict-alleles-to BIALLELIC, as well as population?AF?allele frequencies in the INFO field [4]. The tool tabulates read counts that support reference, alternate and other alleles for the sites in the resource.

gatk GetPileupSummaries \

-I tumor.bam \

-V small_exac_common_3_b37.vcf.gz \

-O 7_tumor_getpileupsummaries.table

This produces a six-column table as shown. The?alt_count?is the count of reads that support the ALT allele in the germline resource. The?allele_frequency?corresponds to that given in the germline resource. Counts for?other_alt_count?refer to reads that support all other alleles.


Second, estimate contamination with?CalculateContamination.?The tool takes the summary table from GetPileupSummaries and gives the fraction contamination. This estimation informs downstream filtering by FilterMutectCalls.

gatk CalculateContamination? -I 7_tumor_getpileupsummaries.table -O 8_tumor_calculatecontamination.table

or?

gatk CalculateContamination -I 7_tumor_getpileupsummaries.table -matched normal.getpileupsummaries.table -O 8_tumor_calculatecontamination.table

This produces a table with estimates for contamination and error. The estimate for the full tumor sample is shown below and gives a contamination fraction of 0.0205. Going forward, we know to suspect calls with less than ~2% alternate allele fraction.


第四:?Filter for confident somatic calls using FilterMutectCalls

FilterMutectCalls determines whether a call is a confident somatic call. The tool uses the annotations within the callset and applies preset thresholds that are tuned for human somatic analyses.

Filter the Mutect2 callset with FilterMutectCalls.?Here we use the full callset,?somatic_m2.vcf.gz. To activate filtering based on the contamination estimate, provide the contamination table with?--contamination-table. In GATK v4.0.0.0, the tool uses the contamination estimate as a hard cutoff.

gatk FilterMutectCalls \

-V somatic_m2.vcf.gz \

--contamination-table tumor_calculatecontamination.table \

-O 9_somatic_oncefiltered.vcf.gz

This produces a VCF callset?9_somatic_oncefiltered.vcf.gz?and index. Calls that are likely true positives get the PASS label in the FILTER field, and calls that are likely false positives are labeled with the reason(s) for filtering in the FILTER field of the VCF. We can view the available filters in the VCF header using?grep '##FILTER'.


5. (Optional) Estimate artifacts with CollectSequencingArtifactMetrics and filter them with FilterByOrientationBias

FilterByOrientationBias allows filtering based on sequence context artifacts, e.g. OxoG and FFPE. This step is optional and if employed, should always be performed?after?filtering with FilterMutectCalls. The tool requires the?pre_adapter_detail_metricsfrom Picard CollectSequencingArtifactMetrics.

First, collect metrics on sequence context artifacts with?CollectSequencingArtifactMetrics.?The tool categorizes these as those that occur before hybrid selection (preadapter) and those that occur during hybrid selection (baitbias). Results provide a global view across the genome that empowers decision making in ways that site-specific analyses cannot. The metrics can help decide whether to consider downstream filtering.

gatk CollectSequencingArtifactMetrics \

-I TESTMOS18080001.recal.bam \

-O TESTMOS18080001.artifact \

--FILE_EXTENSION ".txt" \

-R b37.fa

Alternatively, use the tool from a standalone Picard jar.

java -jar picard.jar \

CollectSequencingArtifactMetrics \

I=tumor.bam \

O=10_tumor_artifact \

FILE_EXTENSION=.txt \

R=~/Documents/ref/hg38/Homo_sapiens_assembly38.fasta

This generates five metrics files, including?pre_adapter_detail_metrics, which contains counts that FilterByOrientationBias uses. Below are the summary?pre_adapter_summary_metrics?for the full data. Our samples were not from FFPE so we do not expect this artifact. However, it appears that we could have some OxoG transversions.


Second, perform orientation bias filtering with?FilterByOrientationBias.?We provide the tool with the once-filtered calls?9_somatic_oncefiltered.vcf.gz, the?pre_adapter_detail_metrics?file and the sequencing contexts for FFPE (C→T transition) and OxoG (G→T transversion). The tool knows to include the reverse complement contexts.

gatk FilterByOrientationBias \

-A G/T \

-A C/T \

-V 9_somatic_oncefiltered.vcf.gz \

-P tumor_artifact.pre_adapter_detail_metrics.txt \

-O 11_somatic_twicefiltered.vcf.gz

This produces a VCF?11_somatic_twicefiltered.vcf.gz, index and summary?11_somatic_twicefiltered.vcf.gz.summary. In the summary, we see the number of calls for the sequence context and the number of those that the tool filters.


In the VCF header, we see the addition of the 15th filter,?orientation_bias, which the tool applies to 56 calls. All 56 of these calls were previously PASS sites, i.e. unfiltered. We now have 673 passing calls out of 3,695 total calls.

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個濱河市,隨后出現(xiàn)的幾起案子妻味,更是在濱河造成了極大的恐慌饵沧,老刑警劉巖,帶你破解...
    沈念sama閱讀 218,682評論 6 507
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場離奇詭異匿刮,居然都是意外死亡,警方通過查閱死者的電腦和手機探颈,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,277評論 3 395
  • 文/潘曉璐 我一進店門熟丸,熙熙樓的掌柜王于貴愁眉苦臉地迎上來,“玉大人伪节,你說我怎么就攤上這事光羞。” “怎么了怀大?”我有些...
    開封第一講書人閱讀 165,083評論 0 355
  • 文/不壞的土叔 我叫張陵狞山,是天一觀的道長。 經(jīng)常有香客問我叉寂,道長,這世上最難降的妖魔是什么总珠? 我笑而不...
    開封第一講書人閱讀 58,763評論 1 295
  • 正文 為了忘掉前任屏鳍,我火速辦了婚禮,結(jié)果婚禮上局服,老公的妹妹穿的比我還像新娘钓瞭。我一直安慰自己,他們只是感情好淫奔,可當(dāng)我...
    茶點故事閱讀 67,785評論 6 392
  • 文/花漫 我一把揭開白布山涡。 她就那樣靜靜地躺著,像睡著了一般唆迁。 火紅的嫁衣襯著肌膚如雪鸭丛。 梳的紋絲不亂的頭發(fā)上,一...
    開封第一講書人閱讀 51,624評論 1 305
  • 那天唐责,我揣著相機與錄音鳞溉,去河邊找鬼。 笑死鼠哥,一個胖子當(dāng)著我的面吹牛熟菲,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播朴恳,決...
    沈念sama閱讀 40,358評論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼抄罕,長吁一口氣:“原來是場噩夢啊……” “哼!你這毒婦竟也來了于颖?” 一聲冷哼從身側(cè)響起呆贿,我...
    開封第一講書人閱讀 39,261評論 0 276
  • 序言:老撾萬榮一對情侶失蹤,失蹤者是張志新(化名)和其女友劉穎恍飘,沒想到半個月后榨崩,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體谴垫,經(jīng)...
    沈念sama閱讀 45,722評論 1 315
  • 正文 獨居荒郊野嶺守林人離奇死亡,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 37,900評論 3 336
  • 正文 我和宋清朗相戀三年母蛛,在試婚紗的時候發(fā)現(xiàn)自己被綠了翩剪。 大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點故事閱讀 40,030評論 1 350
  • 序言:一個原本活蹦亂跳的男人離奇死亡彩郊,死狀恐怖前弯,靈堂內(nèi)的尸體忽然破棺而出,到底是詐尸還是另有隱情秫逝,我是刑警寧澤恕出,帶...
    沈念sama閱讀 35,737評論 5 346
  • 正文 年R本政府宣布,位于F島的核電站违帆,受9級特大地震影響浙巫,放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜刷后,卻給世界環(huán)境...
    茶點故事閱讀 41,360評論 3 330
  • 文/蒙蒙 一的畴、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧尝胆,春花似錦丧裁、人聲如沸。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,941評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽。三九已至贪染,卻和暖如春缓呛,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背杭隙。 一陣腳步聲響...
    開封第一講書人閱讀 33,057評論 1 270
  • 我被黑心中介騙來泰國打工强经, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留,地道東北人寺渗。 一個月前我還...
    沈念sama閱讀 48,237評論 3 371
  • 正文 我出身青樓匿情,卻偏偏與公主長得像,于是被迫代替她去往敵國和親信殊。 傳聞我的和親對象是個殘疾皇子炬称,可洞房花燭夜當(dāng)晚...
    茶點故事閱讀 44,976評論 2 355

推薦閱讀更多精彩內(nèi)容