MCMC-PurBayes

Related Knowledge

Heterogeneity

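  • Tumor heterogeneity is one of the hallmarks of malignant tumors. As a tumor grows and its cells go through repeated rounds of division and proliferation, the daughter cells acquire molecular or genetic changes, which produce differences in growth rate, invasiveness, drug sensitivity, prognosis, and other respects. Put simply, a single tumor can contain cells of many different genotypes or subtypes. As a result, the same type of tumor can show different treatment responses and prognoses in different individuals, and even the tumor cells within one individual can differ in their properties.
  • Tumor heterogeneity means that the somatic mutations carried by different tumor cells or subpopulations within a tumor tissue are not fully identical.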

Tumor purity

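  • The cancer cells in a tumor sample are always mixed with an unknown proportion of normal cells. The fraction of cells in the tumor sample that are cancer cells is called the tumor purity.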

SNV

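  • An SNV (single-nucleotide variant) is a site in the genome where a single base has changed. SNVs are widely distributed across the genome.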

Abstract

  • PurBayes estimates tumor purity and detects intratumor heterogeneity from next-generation sequencing (NGS) data of paired tumor-normal tissue samples, using finite mixture modeling methods.

Introduction

  • With advances in high-throughput next-generation sequencing (NGS) technologies, sequencing of tumor-normal tissue pairs is becoming commonplace in cancer studies. Often, the sampled tumor tissue is contaminated with stromal cells, resulting in a mixture of tumor and normal sequence data in the tumor sample. There has been recent interest in accurate estimation of tumor purity levels in tumor data analysis, including methods specific to NGS data such as PurityEst.

  • However, a subset of the observed somatic mutations may be subclonal because of intratumor heterogeneity. Unlike clonal mutations, which are observed tumor-wide, subclonal mutations are observed at cellularities below the tumor purity level, and they therefore bias purity estimates made under an assumption of tumor tissue homogeneity. By modeling this heterogeneity, it may also be possible to make inferences about tumor evolution and founder events. To date there are no methods that aim to both quantify tumor purity and detect intratumor heterogeneity using NGS data.

  • In this article, we present PurBayes, a Bayesian mixture modeling approach for estimating tumor purity and subclonality from NGS data. It yields posterior distributions of tumor cellularities from which credible intervals (CIs) can be derived. To illustrate its implementation, we conduct a simulation study under a variety of conditions and discuss the performance of PurBayes on synthetic data.

Methods

  • For a given tumor sequencing sample, consider a set of $S$ observed heterozygous loci arising from somatically acquired single-nucleotide variants (SNVs). Each SNV $i$ can be represented by its normal and mutant allele read counts $X_i$ and $Y_i$. The total number of sample reads $N_i = X_i + Y_i$ can in turn be decomposed into tumor and normal tissue read counts $N_{ti}$ and $N_{wi}$, such that $N_i = N_{wi} + N_{ti}$. Because it cannot be directly determined which cell type each individual read was derived from, $N_{ti}$ and $N_{wi}$ are latent variables. If we assume $N_{ti} \sim \mathrm{Bin}(N_i, \lambda)$, where $\lambda$ is the tumor sample purity, and $Y_i \mid N_{ti} \sim \mathrm{Bin}(N_{ti}, 0.50)$, then $Y_i$ follows a binomial-binomial hierarchical mixture model with marginal distribution $Y_i \sim \mathrm{Bin}(N_i, \lambda/2)$, because each read is independently both tumor-derived and mutant with probability $\lambda \times 0.50$.
  • Consider a tumor that exhibits intratumor heterogeneity. If we assume that subclonal mutations cluster into an a priori finite number of $J-1$ subclonal populations, $Y$ can be modeled under a Bayesian finite mixture model. Let $K_j$ denote the probability that a mutation corresponds to variant population $j$ with cellularity $\lambda_j$, for $j = 1, \dots, J$, such that $\sum_{j=1}^{J} K_j = 1$, $\lambda_1 < \dots < \lambda_J$, and $\lambda_J \equiv \lambda$, with uniform priors on the $\lambda_j$. To obtain a data-driven value for $J$, PurBayes generates model fits iteratively: it first assumes tumor homogeneity and then increases the subclonal population count by one until an optimal model fit is achieved under a penalized expected deviance (PED) criterion.
  • Mapping bias can result in non-reference alleles at heterozygous loci being mapped at rates below 0.50, which would affect tumor purity estimation. PurBayes can accommodate this bias by estimating it from additional reference and alternate allele counts in heterozygous normal tissue variant calls.
  • PurBayes is implemented in the statistical programming language R and uses the MCMC software JAGS. The only inputs required for PurBayes are the tumor tissue read counts ($N$ and $Y$) for a set of high-confidence SNVs, which can easily be derived from the output file formats of most variant calling software for NGS data.
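
The binomial-binomial hierarchy described above can be checked numerically. The following is a minimal Python sketch, not PurBayes itself (which is written in R and uses JAGS). The function names `simulate_snv` and `naive_purity`, the fixed depth of 100 reads per SNV, and the purity of 0.6 are illustrative assumptions. It draws read counts through the two binomial stages and recovers purity with the moment estimate implied by the marginal $Y_i \sim \mathrm{Bin}(N_i, \lambda/2)$.

```python
import random


def simulate_snv(depth, purity, rng):
    """Draw (N, Y) for one SNV under the binomial-binomial hierarchy."""
    # Stage 1: each of the N reads comes from a tumor cell with probability `purity`.
    n_tumor = sum(rng.random() < purity for _ in range(depth))
    # Stage 2: each tumor-derived read carries the mutant allele with probability 0.5.
    y_mutant = sum(rng.random() < 0.5 for _ in range(n_tumor))
    return depth, y_mutant


def naive_purity(counts):
    """Moment estimate of purity from the marginal Y ~ Bin(N, purity / 2).

    Assumes a homogeneous tumor and no mapping bias (both are simplifications).
    """
    total_n = sum(n for n, _ in counts)
    total_y = sum(y for _, y in counts)
    return 2.0 * total_y / total_n


rng = random.Random(0)  # fixed seed so the sketch is reproducible
counts = [simulate_snv(100, 0.6, rng) for _ in range(100)]  # S = 100 SNVs (assumed)
estimate = naive_purity(counts)
print(f"estimated purity: {estimate:.3f}")
```

Doubling the pooled mutant-allele fraction only makes sense under homogeneity. Once subclonal SNVs are present, the pooled fraction mixes several cellularities, which is the situation the finite mixture model is meant to handle.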

Simulation

  • To illustrate the performance of PurBayes under a variety of conditions, we conducted simulation studies based on real sequencing data from the 1000 Genomes Project (details in Supplementary Materials). We first simulated read count data for homogeneous tumors ranging in purity from 20% to 80%, with S = 100 and average sequencing depths of 50x and 100x. We ran 100 replications of each unique set of conditions and examined the PurBayes posterior median estimates. We ran similar simulations for heterogeneous tumor data with J = 2 at 100x for various values of Kj and λj to determine how well PurBayes can detect intratumor heterogeneity and estimate tumor purity. For each application, we also simulated read count data from 100 additional germline variant calls to account for mapping bias. For comparison, we also applied the PurityEst algorithm to each simulation replicate.

  • For each application of PurBayes, the first 50000 iterations of the optimal MCMC model fit were discarded as burn-in before posterior sampling of 10000 iterations. Mean per-sample execution time was 2 min on a workstation equipped with an Intel Core i5 3.10 GHz processor and 4 GB of random access memory.
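
To make the burn-in and posterior sampling steps concrete, here is a small random-walk Metropolis sampler in Python for the homogeneous case only. This is a sketch under stated assumptions, not the JAGS model that PurBayes runs. The toy data, the proposal step size, and the iteration counts (2000 burn-in, 5000 kept, far fewer than the 50000 and 10000 used in the study) are all illustrative choices, and `log_posterior` and `metropolis` are hypothetical names.

```python
import math
import random
import statistics


def log_posterior(lam, counts):
    """Log posterior (up to a constant) of purity `lam`.

    Assumes a Uniform(0, 1) prior and the marginal likelihood Y ~ Bin(N, lam / 2).
    """
    if not 0.0 < lam < 1.0:
        return -math.inf  # outside the support of the prior
    p = lam / 2.0
    return sum(y * math.log(p) + (n - y) * math.log(1.0 - p) for n, y in counts)


def metropolis(counts, burn_in, keep, step=0.02, seed=1):
    """Random-walk Metropolis; the first `burn_in` draws are discarded."""
    rng = random.Random(seed)
    lam = 0.5  # arbitrary starting value
    current = log_posterior(lam, counts)
    samples = []
    for iteration in range(burn_in + keep):
        proposal = lam + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, counts)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            lam, current = proposal, candidate
        if iteration >= burn_in:
            samples.append(lam)
    return samples


# Toy data (assumed): 50 SNVs, each with N = 100 reads and Y = 30 mutant reads.
toy_counts = [(100, 30)] * 50
draws = sorted(metropolis(toy_counts, burn_in=2000, keep=5000))
posterior_median = statistics.median(draws)
ci_low = draws[int(0.025 * len(draws))]
ci_high = draws[int(0.975 * len(draws))]
print(f"median={posterior_median:.3f}  95% CI=({ci_low:.3f}, {ci_high:.3f})")
```

The posterior median and the 2.5% and 97.5% quantiles of the retained draws play the roles of the point estimate and credible interval reported in the simulations.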

Results and Discussion

  • For the homogeneous tumor simulations, PurBayes correctly identified tumor homogeneity in all replications. Distributions of the posterior median estimates of tumor purity for each value of λ and each method are displayed in Figure 1. Estimates from PurBayes and PurityEst were nearly identical, with a Pearson correlation of 0.9997. Both methods were accurate, tending toward overestimation at lower values of λ. When we applied PurBayes to heterogeneous data, the ability to detect heterogeneity was highly dependent on the disparity between cellularities. The proportion of clonal variants also affected detection, with larger values of K1 leading to higher mean absolute error (MAE) of the posterior median purity estimates. Although PurityEst performed comparably under certain conditions, the ability of PurBayes to detect heterogeneity generally resulted in greater estimate accuracy.
  • Our simulation results highlight the potential bias of tumor purity estimates in the presence of unaccounted intratumor heterogeneity. By simultaneously estimating tumor purity and subclonality, PurBayes may also provide additional advantages, such as facilitating inference about tumor composition and evolution as well as isolation of potential founder events. Because PurBayes is a Bayesian approach, measures of uncertainty are derived directly from the posterior distributions of the cellularities in the form of CIs.
  • One possible issue in applying PurBayes is that it may estimate J to be larger than the true value because of outlier observations, which leads to a positively biased tumor purity estimate. This can be especially problematic in the presence of copy number variation (CNV) and structural rearrangements. Regions of CNV have a multiplicative impact on the number of mapped reads, and SNVs within such regions do not truly reflect heterozygosity at a proportion of 0.50, so such SNVs would strongly influence the estimation of J. We therefore expect PurityEst, with its robust estimation procedures, to perform better when CNVs are present and unaccounted for in purity estimation. It is thus highly recommended that regions indicated to be CNVs by parallel analyses be filtered out before estimation.
  • We foresee a variety of extensions to the concepts in PurBayes. For example, the mixture model could alternatively be formulated to characterize tumor cellularity as a continuous distribution using semi-parametric approaches. Integration of CNV and ploidy information would also make PurBayes a more effective estimator.
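
The bias caused by unaccounted heterogeneity can be illustrated with a short expected-value calculation. This Python sketch is an illustration under simplifying assumptions, not a reproduction of the simulation study: every SNV is assumed to have the same depth, there is no mapping bias, and the "naive" estimator is simply twice the pooled mutant-allele fraction. The function name `expected_naive_purity` and the cellularities 0.4 and 0.8 are hypothetical.

```python
def expected_naive_purity(weights, cellularities):
    """Expected value of 2 * Y / N under a J-population mixture.

    Each SNV belongs to population j with probability weights[j] and has
    Y ~ Bin(N, cellularities[j] / 2); all SNVs share the same depth N (assumed).
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("population weights must sum to 1")
    return sum(k * lam for k, lam in zip(weights, cellularities))


true_purity = 0.8   # cellularity of the clonal population (assumed)
subclone = 0.4      # cellularity of one subclonal population (assumed)
for clonal_share in (1.0, 0.75, 0.5):
    weights = [1.0 - clonal_share, clonal_share]
    naive = expected_naive_purity(weights, [subclone, true_purity])
    print(f"clonal share={clonal_share:.2f}  naive estimate={naive:.2f}  "
          f"bias={naive - true_purity:+.2f}")
```

Under these assumptions the homogeneous estimate is pulled toward the subclonal cellularity as the subclonal share grows, which is why modeling the populations separately can improve the purity estimate.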