Using Picard

https://broadinstitute.github.io/picard/picard-metric-definitions.html
https://broadinstitute.github.io/picard/index.html
Picard is a set of command-line tools for working with high-throughput sequencing data and related formats such as SAM/BAM and VCF. For the file formats, see the Hts-specs repository, which contains the SAM specification and the VCF specification.

Usage:

java jvm-args -jar picard.jar PicardToolName OPTION1=value1 OPTION2=value2...
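
As a rough illustration of the syntax above, the following Python sketch assembles such a command line as an argument list. The helper name `build_picard_command` and the `-Xmx4g` JVM argument are assumptions made for this example; only the `java ... -jar picard.jar ToolName NAME=value` layout comes from the usage line.

```python
def build_picard_command(tool, options, jar="picard.jar", jvm_args=()):
    """Assemble the argument list for one Picard invocation (hypothetical helper)."""
    command = ["java", *jvm_args, "-jar", jar, tool]
    # The usage line passes every option as NAME=value.
    command += [f"{name}={value}" for name, value in options.items()]
    return command


cmd = build_picard_command(
    "CollectHsMetrics",
    {"I": "input.bam", "O": "hs_metrics.txt"},
    jvm_args=["-Xmx4g"],  # assumed JVM setting, adjust to your machine
)
print(" ".join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` without going through a shell, which avoids quoting problems in file names.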

All tools (metrics classes)

- AlignmentSummaryMetrics: summary statistics for alignment results (SAM/BAM), produced by CollectAlignmentSummaryMetrics and written to a file with the extension ".alignment_summary_metrics".
- BaseDistributionByCycleMetrics:
- ClusteredCrosscheckMetric: holds the results of clustered crosschecking of fingerprints.
- CollectHiSeqXPfFailMetrics.PFFailDetailedMetric: a metric class for describing FP failing reads from an Illumina HiSeqX lane.
- CollectHiSeqXPfFailMetrics.PFFailSummaryMetric: Metrics produced by the GetHiSeqXPFFailMetrics program.
- CollectOxoGMetrics.CpcgMetrics: Metrics class for outputs.
- CollectQualityYieldMetrics.QualityYieldMetrics: a set of metrics describing the general quality of a BAM file.
- CollectRawWgsMetrics.RawWgsMetrics:
- CollectVariantCallingMetrics.VariantCallingDetailMetrics: SNP- and indel-related metrics for a given VCF file.
- CollectVariantCallingMetrics.VariantCallingSummaryMetrics: same as above.
- CollectWgsMetrics.WgsMetrics: metrics for evaluating whole-genome sequencing results.
- CollectWgsMetricsWithNonZeroCoverage.WgsMetricsWithNonZeroCoverage: same as above.
- CrosscheckMetric: holds the results of crosschecking fingerprints.
- DuplicationMetrics: metrics computed while marking duplicates in a SAM file.
- ErrorSummaryMetrics: summary metrics computed by CollectSequencingArtifactMetrics, giving the error rate for each type of base change.
- ExtractIlluminaBarcodes.BarcodeMetric: metrics computed by ExtractIlluminaBarcodes, which analyzes the data in the Basecalling directory to determine which barcode each read belongs to.
- FingerprintingDetailMetrics: detailed metrics about an individual SNP/haplotype comparison within a fingerprint.
- FingerprintingSummaryMetrics: summary fingerprinting metrics and statistics about the comparison of the sequence data.
- GcBiasDetailMetrics: Class that holds detailed metrics about reads that fall within windows of a certain GC bin on the reference genome.
- GcBiasMetrics:
- GcBiasSummaryMetrics: High level metrics that capture how biased the coverage in a certain lane is.
- GenotypeConcordanceContingencyMetrics: Class that holds metrics about the Genotype Concordance contingency tables.
- GenotypeConcordanceDetailMetrics: Class that holds detail metrics about Genotype Concordance.
- GenotypeConcordanceSummaryMetrics: Class that holds summary metrics about Genotype Concordance.
- HsMetrics: Metrics generated by CollectHsMetrics for the analysis of target-capture sequencing experiments.
- IlluminaBasecallingMetrics: Metric for Illumina Basecalling that stores means and standard deviations on a per-barcode per-lane basis.
- IlluminaLaneMetrics: Embodies characteristics that describe a lane.
- IlluminaPhasingMetrics: Metrics for Illumina Basecalling that stores median phasing and prephasing percentages on a per-template-read, per-lane basis.
- IndependentReplicateMetric: A class to store information relevant for biological rate estimation.
- InsertSizeMetrics: Metrics about the insert size distribution of a paired-end library, created by the CollectInsertSizeMetrics program and usually written to a file with the extension ".insert_size_metrics".
- JumpingLibraryMetrics: High level metrics about the presence of outward- and inward-facing pairs within a SAM file generated with a jumping library, produced by the CollectJumpingLibraryMetrics program and usually stored in a file with the extension ".jump_metrics".
- MendelianViolationMetrics: Describes the type and number of mendelian violations found within a Trio.
- MergeableMetricBase: An extension of MetricBase that knows how to merge-by-adding fields that are appropriately annotated.
- MultilevelMetrics:
- RnaSeqMetrics: Metrics about the alignment of RNA-seq reads within a SAM file to genes, produced by the CollectRnaSeqMetrics program and usually stored in a file with the extension ".rna_metrics".
- RrbsCpgDetailMetrics: Holds information about CpG sites encountered for RRBS processing QC.
- RrbsSummaryMetrics: Holds summary statistics from RRBS processing QC.
- SequencingArtifactMetrics.BaitBiasDetailMetrics: Bait bias artifacts broken down by context.
- SequencingArtifactMetrics.BaitBiasSummaryMetrics: Summary analysis of a single bait bias artifact, also known as a reference bias artifact.
- SequencingArtifactMetrics.PreAdapterDetailMetrics: Pre-adapter artifacts broken down by context.
- SequencingArtifactMetrics.PreAdapterSummaryMetrics: Summary analysis of a single pre-adapter artifact.
- TargetedPcrMetrics: Metrics class for the analysis of reads obtained from targeted pcr experiments e.g.
- UmiMetrics: Metrics that are calculated during the process of marking duplicates within a stream of SAMRecords using the UmiAwareDuplicateSetIterator.

Detailed functionality

CollectHsMetrics:

Computes metrics for targeted (hybrid-selection) sequencing.

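This command reads a SAM/BAM file. HS (hybrid selection, also called hybrid capture) is a commonly used technique for targeted sequencing, for example exome sequencing. For more information, see the GATK Dictionary entry.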

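This command requires:
1) Alignment results (SAM/BAM).
2) The capture intervals (provided by the manufacturer of the capture kit). If the intervals are in BED format, convert them with the BedToIntervalList tool into the interval_list format that Picard requires.
3) Optionally, a reference sequence. When one is given, the AT_DROPOUT and GC_DROPOUT metrics are computed as well.
Regions whose GC content is too high or too low tend to have a higher sequencing error rate, so fewer reads align to them; that is, both alignment efficiency and coverage drop.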
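
To make the BED-to-interval_list step in item 2 concrete, here is a minimal Python sketch of the coordinate change involved: BED uses 0-based, half-open coordinates, while interval_list rows use 1-based, closed coordinates in the column order chromosome, start, end, strand, name. The function name and the "." placeholder for a missing name are assumptions for illustration. A real interval_list file also begins with a SAM-style header (sequence dictionary), which this sketch does not write; in practice the BedToIntervalList tool handles both parts.

```python
def bed_to_interval_row(bed_line):
    """Convert one tab-separated BED line to one interval_list data row (sketch)."""
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Optional BED columns: name is column 4, strand is column 6.
    name = fields[3] if len(fields) > 3 else "."  # "." is a placeholder chosen here
    strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "+"
    # 0-based half-open [start, end) becomes 1-based closed [start + 1, end].
    return "\t".join([chrom, str(start + 1), str(end), strand, name])


row = bed_to_interval_row("chr1\t99\t200\texon1")
print(row)
```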

Use the PER_TARGET_COVERAGE option to obtain per-target information such as the GC content and sequencing depth of each capture target.
Metrics labeled pct are fractions.

java -jar picard.jar CollectHsMetrics \
      I=input.bam \
      O=hs_metrics.txt \
      R=reference_sequence.fasta \
      BAIT_INTERVALS=bait.interval_list \
      TARGET_INTERVALS=target.interval_list
 # BAIT_INTERVALS may be the same as TARGET_INTERVALS (though I do not fully understand why yet)
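
Once the command above has written hs_metrics.txt, the metrics can be read back programmatically. The sketch below assumes the usual Picard metrics layout: comment lines starting with "#", a "## METRICS CLASS" line, a tab-separated header row, one or more data rows, then a blank line. The function name and the two-column example content are made up for illustration; a real file has many more columns.

```python
def parse_picard_metrics(text):
    """Return the rows of the metrics table as a list of dicts (values kept as strings)."""
    lines = text.splitlines()
    rows = []
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            for data_line in lines[i + 2:]:
                # The table ends at the first blank line or the next "#" section.
                if not data_line.strip() or data_line.startswith("#"):
                    break
                rows.append(dict(zip(header, data_line.split("\t"))))
            break
    return rows


# Illustrative content only, not real output.
example = "\n".join([
    "## htsjdk.samtools.metrics.StringHeader",
    "# CollectHsMetrics INPUT=input.bam",
    "",
    "## METRICS CLASS\tpicard.analysis.directed.HsMetrics",
    "BAIT_SET\tMEAN_TARGET_COVERAGE",
    "bait\t85.2",
    "",
    "## HISTOGRAM\tjava.lang.Integer",
])
metrics = parse_picard_metrics(example)
print(metrics[0]["MEAN_TARGET_COVERAGE"])
```

For a file on disk, pass `open("hs_metrics.txt").read()` to the same function.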

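Difference between bait and target:
Few reads are removed when bait coverage is computed, so the bait metrics give a direct impression of how well the wet-lab experiment worked. When target coverage is computed, many bases are removed because they contribute little to variant calling. The descriptions of the various PCT_EXC metrics explain why so many reads are filtered out for the target metrics. Most of the filters can be adjusted through parameters.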

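For a detailed description of the output, see the CollectHsMetrics documentation.
The metrics reported by CollectHsMetrics fall into three groups.
1) Basic sequencing metrics, which are used to compute the other metrics, for example genome size, the total number of reads, and the number of aligned reads.
bait_set: name of the bait set used in the hybrid selection.
bait_territory: number of bases that have one or more baits on top of them.
target_territory: number of unique bases covered by the target regions.
bait_design_efficiency: design efficiency, the ratio target_territory/bait_territory. A value of 1 means a perfectly efficient design; 0.5 means half of the bait bases lie outside the target regions.
PF_READS: total number of reads that pass the vendor's filter.
PF_BASES_ALIGNED: number of unique bases that pass the quality filter (PF) and align to the genome (mapping score > 0).
on_bait_bases: number of PF_BASES_ALIGNED bases that map to a baited region of the genome.
genome_size
total_reads: total number of reads in the SAM file.
pf_reads: total number of reads that pass the platform/vendor quality filter.
pf_bases: number of bases in PF_READS.
pf_unique_reads: number of PF reads that are not duplicates.
pf_uq_reads_aligned: number of PF unique reads that align to the reference.
pf_bases_aligned: total number of aligned bases.
pf_uq_bases_aligned: number of bases in the aligned unique reads.
on_target_bases: total number of bases that align to a target region.
pct_pf_reads: fraction of all reads that pass the quality filter.
pct_pf_uq_reads: fraction of all reads that pass the quality filter and are not duplicates.
pct_pf_uq_reads_aligned: fraction of PF unique reads that align to the reference.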
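
The ratio metrics in this group follow directly from the counts. The sketch below shows the arithmetic as I understand it from the definitions above; the function name and all input numbers are invented for illustration, and the denominators for the last two ratios reflect my reading of the descriptions rather than a quoted formula.

```python
def basic_ratios(m):
    """Derive the ratio metrics of group 1 from the raw counts (sketch)."""
    return {
        "bait_design_efficiency": m["target_territory"] / m["bait_territory"],
        "pct_pf_reads": m["pf_reads"] / m["total_reads"],
        "pct_pf_uq_reads": m["pf_unique_reads"] / m["total_reads"],
        # Assumed denominator: PF unique reads.
        "pct_pf_uq_reads_aligned": m["pf_uq_reads_aligned"] / m["pf_unique_reads"],
    }


# Made-up counts, chosen only to keep the arithmetic easy to follow.
counts = {
    "target_territory": 500_000,
    "bait_territory": 1_000_000,
    "total_reads": 1000,
    "pf_reads": 900,
    "pf_unique_reads": 800,
    "pf_uq_reads_aligned": 600,
}
ratios = basic_ratios(counts)
print(ratios)
```

With these counts the design efficiency is 0.5, meaning half of the bait bases fall outside the targets.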

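2) Experiment quality, for example the number or fraction of bases that map on, near, or off the baits, the fold-80 base penalty, the size of the captured library, and the capture penalty. These metrics are computed before filtering (for low mapping quality, low base quality, duplicate reads, and so on).
near_bait_bases: number of aligned bases that map near a bait, within a fixed interval around it, but not on the bait itself.
off_bait_bases: number of aligned bases that map neither on nor near a bait.
pct_selected_bases: (near_bait_bases + on_bait_bases) / PF_BASES_ALIGNED.
pct_off_bait: off_bait_bases / PF_BASES_ALIGNED.
on_bait_vs_selected: fraction of the bases on or near baits that are on a bait, on_bait_bases / (on_bait_bases + near_bait_bases).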
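
The three selection ratios above can be reproduced from the base counts. This is a small sketch of that arithmetic; the function name and the example counts are hypothetical, and the on_bait_vs_selected formula is the standard on / (on + near) ratio rather than something stated in the original note.

```python
def selection_ratios(on_bait, near_bait, off_bait, pf_bases_aligned):
    """Compute the bait selection ratios from base counts (sketch)."""
    selected = on_bait + near_bait
    return {
        "pct_selected_bases": selected / pf_bases_aligned,
        "pct_off_bait": off_bait / pf_bases_aligned,
        "on_bait_vs_selected": on_bait / selected,
    }


# Made-up counts: 700 on-bait, 200 near-bait, 100 off-bait aligned bases.
sel = selection_ratios(on_bait=700, near_bait=200, off_bait=100, pf_bases_aligned=1000)
print(sel)
```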

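fold_80_base_penalty: a measure of coverage uniformity. It is the fold of additional sequencing needed to raise 80% of the bases in non-zero-coverage targets to the mean coverage of those targets. Lower is better; the ideal value is 1.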
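
One common way to picture this metric is as the mean coverage divided by the coverage that 80% of bases meet or exceed (the 20th percentile). The sketch below works on a plain list of per-base depths and uses a simple sorted-index percentile; both the function name and the percentile method are assumptions, and Picard's own calculation from its coverage histogram may differ in detail.

```python
def fold_80_base_penalty(depths):
    """Mean depth divided by the depth that 80% of bases meet or exceed (sketch)."""
    ordered = sorted(depths)
    # Index of the 20th percentile; integer division keeps this exact.
    pct20 = ordered[len(ordered) // 5]
    mean_depth = sum(ordered) / len(ordered)
    return mean_depth / pct20


# Made-up per-base depths for bases in non-zero-coverage targets.
depths = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
penalty = fold_80_base_penalty(depths)
print(round(penalty, 3))
```

Here the mean is 55 and 80% of bases have depth 30 or more, so the penalty is 55 / 30. Perfectly uniform coverage would give exactly 1.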

hs_library_size: estimated number of unique library fragments that were captured.
hs_penalty_10x: the capture penalty incurred to get 80% of the target bases to 10X. That is, for a design with 10 Mb of target, reaching 10X coverage requires sequencing until PF_ALIGNED_BASES = 10^7 * 10 * HS_PENALTY_10X.
hs_penalty_20x: the same penalty for getting 80% of the target bases to 20X coverage.
hs_penalty_30x
hs_penalty_40x
hs_penalty_50x
hs_penalty_100x
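
The sequencing budget implied by a capture penalty is a single multiplication, following the formula given for hs_penalty_10x. In this sketch the function name and the penalty value of 4.0 are made up for illustration.

```python
def required_aligned_bases(target_size, desired_coverage, hs_penalty):
    """Aligned PF bases needed so that 80% of target bases reach the desired coverage."""
    return target_size * desired_coverage * hs_penalty


# 10 Mb target, 10X goal, hypothetical HS_PENALTY_10X of 4.0.
needed = required_aligned_bases(10**7, 10, 4.0)
print(f"{needed:.2e} aligned bases")
```

Without any penalty (a value of 1) the same design would need 10^8 aligned bases, so the penalty reads directly as the multiple of extra sequencing.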

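3) Target coverage assessment, which indicates how reliable the downstream analysis will be, for example the mean coverage of the target regions, the fraction of bases at various coverage levels, and the fraction of bases removed by each filter. These metrics are computed after all filters have been applied.
mean_bait_coverage: mean coverage over all baits.
pct_usable_bases_on_bait: fraction of the available PF bases that are aligned, de-duplicated, and on a bait.
pct_usable_bases_on_target: fraction of the available PF bases that are aligned, de-duplicated, and on a target.
fold_enrichment: the fold by which the baited region is enriched over the genomic background.
mean_target_coverage: mean coverage of the target regions.
median_target_coverage: median coverage of the target regions.
max_target_coverage: maximum coverage of the target regions.
min_target_coverage: minimum coverage of the target regions.
zero_cvg_targets_pct: fraction of targets in which no base reached a coverage of 1.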
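
As a rough picture of how the target coverage summaries relate to the underlying depths, the sketch below computes them from per-base depths grouped by target. The function name, the input layout, and the numbers are assumptions for illustration; the summaries are taken over all target bases here, which may not match Picard's implementation in every detail.

```python
from statistics import mean, median


def summarize_target_coverage(depths_by_target):
    """Summarize per-base depths grouped by target name (sketch)."""
    all_depths = [d for depths in depths_by_target.values() for d in depths]
    # A target counts as zero-coverage when none of its bases reached depth 1.
    zero_targets = sum(1 for depths in depths_by_target.values() if max(depths) < 1)
    return {
        "mean_target_coverage": mean(all_depths),
        "median_target_coverage": median(all_depths),
        "max_target_coverage": max(all_depths),
        "min_target_coverage": min(all_depths),
        "zero_cvg_targets_pct": zero_targets / len(depths_by_target),
    }


# Made-up depths for four small targets; t2 has no coverage at all.
coverage = summarize_target_coverage({
    "t1": [10, 20, 30],
    "t2": [0, 0, 0],
    "t3": [5, 0, 15],
    "t4": [40, 40, 40],
})
print(coverage)
```

Note that t3 contains an uncovered base but still counts as covered, because zero_cvg_targets_pct only flags targets where every base is at depth 0.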

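Fraction of aligned bases removed by each filter:
pct_exc_dupe: bases in reads marked as duplicates.
pct_exc_adapter: bases in adapter reads.
pct_exc_mapq: bases in reads with low mapping quality.
pct_exc_baseq: bases with low base quality.
pct_exc_overlap: bases that are the second observation from an insert with overlapping reads. When the two reads of a pair overlap, the overlapping bases are counted only once.
pct_exc_off_target: bases that map outside the target regions.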

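Fraction of bases at each coverage level:
pct_target_bases_1x: fraction of target bases with a coverage of at least 1X.
pct_target_bases_2x
pct_target_bases_10x
pct_target_bases_20x
pct_target_bases_30x
pct_target_bases_40x
pct_target_bases_50x
pct_target_bases_100x
at_dropout: how far regions with low GC content (GC <= 50%) fall below the mean coverage. The value is a percentage: a value of 5 means that 5% of all reads that should have mapped to low-GC regions mapped elsewhere.
gc_dropout: the same measure for regions with high GC content (GC >= 50%).

het_snp_sensitivity: theoretical sensitivity for detecting heterozygous (HET) SNPs.
het_snp_q: the theoretical HET SNP sensitivity expressed as a Phred-scaled Q score.
sample
library
read_group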
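
The at_dropout and gc_dropout metrics listed above can be pictured as follows: for each GC bin, compare the share of target territory in that bin with the share of aligned reads that landed there, and add up the shortfalls. The sketch below follows that reading; the function name, the input layout, and the numbers are assumptions, and the sign convention reflects my understanding of the metric rather than a formula quoted from the Picard documentation.

```python
def gc_dropouts(bins):
    """bins maps GC percent -> (fraction of target territory, fraction of aligned reads).

    Returns (at_dropout, gc_dropout) as percentages (sketch).
    """
    at_dropout = 0.0
    gc_dropout = 0.0
    for gc, (target_fraction, read_fraction) in bins.items():
        shortfall = (target_fraction - read_fraction) * 100
        if shortfall > 0:  # only under-covered bins contribute
            if gc <= 50:
                at_dropout += shortfall
            if gc >= 50:
                gc_dropout += shortfall
    return at_dropout, gc_dropout


# Made-up bins: the 30% GC bin is under-covered by 10 points, the 70% bin by 5.
at, gc = gc_dropouts({
    30: (0.30, 0.20),
    45: (0.30, 0.45),
    70: (0.40, 0.35),
})
print(round(at, 2), round(gc, 2))
```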

Last edited: 2020.11.26 10:33:13

