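Mutation landscape of 173 genes in 2,433 breast cancer patients
Published in Nature Communications (NC) in 2016: The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes. Arguably, any later cohort-level mutation study of breast cancer needs to cite the data and results of this paper. It covers quite a few analysis points, and most of them are fairly easy to reproduce.
These 2,433 patients come from the METABRIC project, which already provides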
- copy number aberration (CNA)
- gene expression
- long-term clinical follow-up
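for each of them, so adding capture sequencing of 173 genes gives a more complete picture of breast cancer patients.
Breast cancer shows genomic variability both between patients and within a single patient's tumour. Biological subtypes of early breast cancer are defined by this inter-patient heterogeneity. In current clinical practice, patients are usually evaluated by morphological assessment (size, grade, lymph node status) or by testing markers such as ER, PR and HER2. The main subtypes at present are:
- luminal A
- luminal B
- normal breast-like
- HER2-enriched
- basal-like breast cancer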
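Pereira et al. sequenced 173 genes in 2,433 breast cancer samples and identified 40 Mut-driver genes, comprising tumour suppressor genes and oncogenes. The biological processes these genes take part in include:
- AKT signalling
- cell cycle regulation
- chromatin function
- DNA damage and apoptosis
- MAPK signalling
- tissue organization
- transcription regulation
- ubiquitination
They also found that PI3K mutations in ER+ breast cancer patients were associated with differences in survival.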
Selecting genes before the experiment
The 173 genes were chosen based on the earlier TCGA project. A few of them are listed below:
#Supplementary Dataset 1 - Details of genes & mutations in this study
#Gene names, positions and annotation transcripts, numbers of various classes of mutations, numbers of CNAs, numbers of samples with double mutations, whether gene was included because of homozygous deletions
For the full table, see: Supplementary Data 1
HGNC_symbol | Chr | Start | End | Strand | Annotation_transcript | Number_mutations | Number_synonymous | Number_missense |
---|---|---|---|---|---|---|---|---|
ACVRL1 | 12 | 52300702 | 52317645 | + | ENST00000388922 | 72 | 7 | 12 |
AFF2 | X | 147581639 | 148082693 | + | ENST00000370460 | 296 | 28 | 40 |
AGMO | 7 | 15239443 | 15602140 | - | ENST00000342526 | 117 | 11 | 24 |
AGTR2 | X | 115301458 | 115306725 | + | ENST00000371906 | 40 | 0 | 14 |
AHNAK | 11 | 62200516 | 62314832 | - | ENST00000378024 | 387 | 82 | 237 |
AHNAK2 | 14 | 105403091 | 105445194 | - | ENST00000333244 | 878 | 322 | 524 |
AKAP9 | 7 | 91569689 | 91740487 | + | ENST00000356239 | 265 | 30 | 137 |
AKT1 | 14 | 105235187 | 105262580 | - | ENST00000554581 | 193 | 17 | 96 |
AKT2 | 19 | 40735724 | 40791765 | - | ENST00000392038 | 138 | 10 | 12 |
ALK | 2 | 29415140 | 30144932 | - | ENST00000389048 | 188 | 37 | 49 |
APC | 5 | 112042702 | 112182436 | + | ENST00000457016 | 159 | 18 | 55 |
ARID1A | 1 | 27022022 | 27109101 | + | ENST00000324856 | 243 | 39 | 57 |
ARID1B | 6 | 157098564 | 157532413 | + | ENST00000346085 | 204 | 40 | 54 |
ARID2 | 12 | 46123120 | 46302319 | + | ENST00000334344 | 159 | 29 | 36 |
ARID5B | 10 | 63660513 | 63857207 | + | ENST00000279873 | 143 | 18 | 39 |
ASXL1 | 20 | 30945647 | 31027622 | + | ENST00000375687 | 142 | 21 | 50 |
ASXL2 | 2 | 25961753 | 26101812 | - | ENST00000435504 | 128 | 13 | 42 |
Somatic mutation results
Most of the analysis material is in: Supplementary Information
The processed results are here: Somatic mutation calls and ASCAT segment files for 2,433 primary tumours are available at http://github.com/cclab-brca
The raw data, however, are under EGAS00001001753 and can only be downloaded after an access application.
The mutations are still dominated by PIK3CA (coding mutations in 40.1% of the samples) and TP53 (35.4%).
Beyond these two, only 5 genes are mutated in more than 10% of samples: MUC16 (16.8%); AHNAK2 (16.2%); SYNE1 (12.0%); KMT2C (also known as MLL3; 11.4%) and GATA3 (11.1%). MUC16 itself has too much background noise, however, and is not well suited to next-generation sequencing.
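The per-gene percentages above are fractions of samples carrying at least one coding mutation in that gene. A minimal sketch of that calculation, assuming the calls have already been reduced to (sample, gene) pairs; the sample IDs and the toy data are hypothetical and are not taken from the paper:

```python
from collections import defaultdict


def mutation_frequency(calls, n_samples):
    """Return {gene: fraction of samples with at least one coding mutation}.

    calls     -- iterable of (sample_id, gene) pairs, one per coding mutation
    n_samples -- total number of sequenced samples (the denominator)
    """
    samples_by_gene = defaultdict(set)
    for sample_id, gene in calls:
        # A sample counts once per gene, however many mutations it carries.
        samples_by_gene[gene].add(sample_id)
    return {gene: len(s) / n_samples for gene, s in samples_by_gene.items()}


# Hypothetical toy cohort of 5 samples (S5 carries no mutation).
calls = [
    ("S1", "PIK3CA"), ("S1", "PIK3CA"),  # two hits in one sample count once
    ("S2", "PIK3CA"), ("S2", "TP53"),
    ("S3", "TP53"),
    ("S4", "GATA3"),
]
freq = mutation_frequency(calls, n_samples=5)
for gene, fraction in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{gene}\t{fraction:.1%}")
```

Using a set per gene is what keeps a sample with several mutations in the same gene from being counted twice.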
Pathogenic germline mutations
Here the authors simply discuss the well-known genes:
- BRCA1 and BRCA2 were identified in 1.36% and 1.64% of the cohort, respectively
- 2.22% of tumours harboured pathogenic CHEK2 germline mutations.
- TP53 pathogenic germline mutations were found in 0.82% of the tumours.
Mutation filtering strategy
Note in particular: All reads with a mapping quality < 70 were removed prior to calling.
The other filters include:
- Based on our analysis of replicates, SNVs with MuTect quality scores <6.95 were removed.
- We removed those variants that overlapped with repetitive regions
- Fisher’s exact test was used to identify variants exhibiting read direction bias
- SNVs present at VAFs smaller than 0.1 or at loci covered by fewer than 10 reads were removed, unless they were also present and confirmed somatic in the Catalogue of Somatic Mutations in Cancer (COSMIC).
- Variants with a frequency above 1% in any 1000 Genomes Project population (AMR, ASN, AFR) were removed.
- We used the normal samples in our data set (normal pool) to control for both sequencing noise and germline variants, and removed any SNV observed in the normal pool (at a VAF of at least 0.1).
In principle, these filters should be adopted in your own studies as well.
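The thresholds quoted above can be combined into a single pass/fail check per SNV. The following is a minimal sketch, not the authors' pipeline: the `Snv` record and its field names are hypothetical, each annotation is assumed to have been computed upstream, and the read-level mapping-quality filter and the strand-bias test are left out because the text gives no per-variant cut-off for them.

```python
from dataclasses import dataclass


@dataclass
class Snv:
    """Hypothetical per-variant record; all field names are illustrative."""
    vaf: float                  # variant allele fraction in the tumour
    depth: int                  # number of reads covering the locus
    mutect_score: float         # MuTect quality score
    in_repeat: bool             # overlaps a repetitive region
    in_cosmic: bool             # present and confirmed somatic in COSMIC
    max_1000g_freq: float       # highest frequency in any 1000 Genomes population
    max_normal_pool_vaf: float  # highest VAF observed in the normal pool


def passes_filters(v: Snv) -> bool:
    """Apply the thresholds quoted in the text to one SNV."""
    if v.mutect_score < 6.95:
        return False
    if v.in_repeat:
        return False
    # Low VAF or low coverage is tolerated only for COSMIC-confirmed variants.
    if (v.vaf < 0.1 or v.depth < 10) and not v.in_cosmic:
        return False
    if v.max_1000g_freq > 0.01:
        return False
    if v.max_normal_pool_vaf >= 0.1:
        return False
    return True


candidates = [
    Snv(0.25, 60, 12.0, False, False, 0.0, 0.0),   # clean call
    Snv(0.05, 60, 12.0, False, True, 0.0, 0.0),    # low VAF, rescued by COSMIC
    Snv(0.05, 60, 12.0, False, False, 0.0, 0.0),   # low VAF, not in COSMIC
    Snv(0.30, 60, 12.0, False, False, 0.02, 0.0),  # common in 1000 Genomes
    Snv(0.30, 60, 5.0, False, False, 0.0, 0.0),    # low MuTect score
]
kept = [v for v in candidates if passes_filters(v)]
print(f"kept {len(kept)} of {len(candidates)} candidate SNVs")
```

Keeping each rule as its own early return makes it easy to switch a single filter off when comparing against a different cohort.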
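Finding driver mutations
The authors follow the method of Vogelstein et al. (reference 16 in the paper) and pinpoint 40 genes: We used a ratiometric method to identify 40 Mut-driver genes
The key step is to distinguish recurrent mutations from inactivating mutations.
Recurrent mutations include: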
- nonsynonymous SNVs
- in-frame indels
- oncogene score (ONC)
Inactivating mutations include:
- frameshift indels
- nonsense SNVs
- splice site mutations
- tumour suppressor gene score (TSG)
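The two scores in the lists above can be read as simple proportions per gene: ONC as the share of mutations that are recurrent, TSG as the share that are inactivating. Below is a minimal sketch under stated assumptions: the mutation-type labels are hypothetical, "recurrent" is taken to mean a nonsynonymous SNV or in-frame indel at a position hit at least twice, and the 20% cut-off follows the "20/20 rule" described by Vogelstein et al. The paper's exact definitions and thresholds may differ.

```python
from collections import Counter

# Hypothetical labels for the mutation classes named in the text.
RECURRENT_TYPES = {"missense", "inframe_indel"}
INACTIVATING_TYPES = {"frameshift_indel", "nonsense", "splice_site"}


def ratiometric_scores(mutations, min_recurrence=2):
    """Return (onc_score, tsg_score) for one gene.

    mutations      -- list of (position, mutation_type) tuples for the gene
    min_recurrence -- assumed number of hits that makes a position recurrent
    """
    total = len(mutations)
    if total == 0:
        return 0.0, 0.0
    hits = Counter(pos for pos, kind in mutations if kind in RECURRENT_TYPES)
    recurrent = sum(n for n in hits.values() if n >= min_recurrence)
    inactivating = sum(1 for _, kind in mutations if kind in INACTIVATING_TYPES)
    return recurrent / total, inactivating / total


def classify(onc, tsg, threshold=0.20):
    """Assumed 20% cut-off; TSG evidence takes priority when both pass."""
    if tsg > threshold:
        return "tumour suppressor"
    if onc > threshold:
        return "oncogene"
    return "unclassified"


# Toy genes: one with hotspots, one with scattered truncating events.
hotspot_gene = [(1047, "missense")] * 6 + [(545, "missense")] * 2 + [
    (10, "missense"), (20, "synonymous")]
truncated_gene = [(i, "frameshift_indel") for i in range(3)] + [
    (i, "nonsense") for i in range(10, 12)] + [(30, "splice_site")] + [
    (i, "missense") for i in range(40, 44)]

for name, muts in [("hotspot_gene", hotspot_gene), ("truncated_gene", truncated_gene)]:
    onc, tsg = ratiometric_scores(muts)
    print(name, f"ONC={onc:.2f}", f"TSG={tsg:.2f}", classify(onc, tsg))
```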
The mutation patterns of some Mut-driver genes differed by ER status.
Worth noting:
- Overall, 22.6% of tumours harboured a coding mutation in one of the seven Mut-driver genes involved in chromatin function (KMT2C, ARID1A, NCOR1, CTCF, KDM6A, PBRM1 and TBL1XR1).
- Of the 40 genes, 8 were independently identified as Mut-driver tumour suppressor genes using the ratiometric method described above: FOXO3, CTNNA1, FOXP1, MEN1, CHEK2 in ER+ tumours; CDKN2A, KDM6A and MLLT4 in both ER+ and ER− tumours.
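Exploring relationships between mutations: mutual exclusivity or co-occurrence
First, the relationships among somatic SNVs:
[Figure not available: pairwise relationships among somatic SNVs]
Once you have this mutation information, for example somatic mutations in MAF format, an existing R package such as maftools can produce this kind of figure.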
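For readers who want to see what such a pairwise analysis does under the hood, here is a minimal sketch of a co-occurrence / mutual exclusivity test for one gene pair, using a two-sided Fisher's exact test on the 2x2 table of mutated versus wild-type samples. This is the general technique that maftools' `somaticInteractions` applies pairwise; it is not claimed to be the exact procedure used in the paper, and the sample IDs are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def pair_association(samples_a, samples_b, all_samples):
    """Return (direction, p_value) for two genes given their mutated sample sets."""
    both = len(samples_a & samples_b)
    only_a = len(samples_a - samples_b)
    only_b = len(samples_b - samples_a)
    neither = len(all_samples) - both - only_a - only_b
    p_value = fisher_exact_two_sided(both, only_a, only_b, neither)
    # Add 0.5 to each cell so the odds ratio is defined when a cell is zero.
    odds = ((both + 0.5) * (neither + 0.5)) / ((only_a + 0.5) * (only_b + 0.5))
    direction = "co-occurrence" if odds > 1 else "mutual exclusivity"
    return direction, p_value


# Toy cohort of 6 samples in which the two genes never overlap.
all_samples = {f"S{i}" for i in range(6)}
gene_a = {"S0", "S1", "S2"}
gene_b = {"S3", "S4", "S5"}
direction, p_value = pair_association(gene_a, gene_b, all_samples)
print(direction, round(p_value, 3))
```

With 40 driver genes there are 780 pairs, so in a real analysis the p-values would also need a multiple-testing correction.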
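Next, the relationships involving somatic CNVs:
[Figure not available: relationships between somatic CNVs and point mutations]
This one is slightly more involved, because copy number alterations have to be linked to the point mutation data.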
Mutations by IntClust classification
The analyses above all grouped the 2,000-plus breast cancer patients by ER status. Here the authors examine mutations using the IntClust classification from their earlier publication. The resulting mutation landscape figure is the centrepiece of the whole paper.
Exploring tumour heterogeneity with mutant-allele tumour heterogeneity (MATH)
The conclusions are clear:
- ER+ tumours generally had lower MATH scores (median=0.29, IQR=0.18–0.44) than ER− tumours (median=0.41, IQR=0.25–0.56).
- Higher MATH scores were associated with worse outcome in ER+ cancers
This analysis is also wrapped up in maftools, so it is easy to reproduce on your own data.
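The MATH score itself is a short calculation on the variant allele fractions (VAFs) of one tumour: the median absolute deviation of the VAFs divided by their median. A minimal sketch follows, assuming the commonly cited definition with a 1.4826 scaling of the MAD and a factor of 100. The medians quoted above (0.29, 0.41) appear to be on a different scale, so the scale factor is kept as a parameter, and which convention the paper used is an assumption left open here.

```python
from statistics import median


def math_score(vafs, scale=100.0, mad_constant=1.4826):
    """Mutant-allele tumour heterogeneity for one tumour.

    vafs         -- variant allele fractions of the tumour's somatic mutations
    scale        -- 100 in the commonly cited definition; pass 1.0 for a ratio
    mad_constant -- factor that makes the MAD comparable to a standard deviation
    """
    centre = median(vafs)
    mad = mad_constant * median(abs(v - centre) for v in vafs)
    return scale * mad / centre


# Hypothetical VAFs: a narrow distribution versus a spread-out one.
homogeneous = [0.28, 0.30, 0.30, 0.31, 0.32]
heterogeneous = [0.10, 0.20, 0.30, 0.40, 0.50]
print(round(math_score(homogeneous), 2))
print(round(math_score(heterogeneous), 2))
```

A wider spread of VAFs around the median gives a higher score, which is why the score is read as a proxy for intra-tumour heterogeneity.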
(Reposted from jimmy's 2018 literature reading notes)