April, week 2, literature reading 2: MOBCdb: a comprehensive database integrating multi-omics data on breast cancer for precision medicine
Abstract
Background:
Breast cancer is one of the most frequently diagnosed cancers among women worldwide and is characterized by diverse biological heterogeneity.
It is well known that complex, combined gene regulation across multiple omics layers is involved in the occurrence and development of breast cancer.
Results:
In this paper, we present the Multi-Omics Breast Cancer Database (MOBCdb), a simple and easily accessible repository that integrates genomic, transcriptomic, epigenomic, clinical, and drug response data of different subtypes of breast cancer.
MOBCdb allows users to retrieve simple nucleotide variation (SNV), gene expression, microRNA expression, DNA methylation, and specific drug response data through various search methods.
The genome-wide browser/navigation facility in MOBCdb provides an interface for visualizing multi-omics data of multiple samples simultaneously.
Furthermore, the survival module provides survival analysis for all or some of the samples by using data from three omics types.
Approved drugs associated with genetic variants in breast cancer are also included in MOBCdb.
Conclusion: In summary, MOBCdb provides users with a unique web interface to the integrated multi-omics data of different subtypes of breast cancer, which enables them to identify potential novel biomarkers for precision medicine.
Keywords: Precision medicine · Breast cancer · Multi-omics · Genome · Transcriptome · Epigenome · Database
Introduction
Breast cancer is the most frequently diagnosed cancer (accounting for 30% of cancer cases) and the second leading cause of cancer-related mortality (14% of cancer deaths) among women in the United States [1].
Moreover, high incidence rates have also been reported in Europe and China [2, 3].
Breast cancer is highly heterogeneous; its molecular subtypes, stratified on the basis of microarray-based gene expression, comprise luminal A, luminal B, HER2-enriched, basal-like, and normal-like [4].
Adjuvant endocrine therapy and chemotherapy are extensively used for the treatment of luminal breast cancer [5].
Two approved HER2-targeted agents (trastuzumab and lapatinib) are used for the treatment of HER2-enriched breast cancer [6].
However, currently available targeted systemic therapy does not seem very effective for patients with basal-like breast cancer [7].
With the rapid development of next-generation sequencing, massive amounts of breast cancer omics data have been produced.
Compared with using single-omics data [4, 8], integrating multiple omics is more effective in identifying specific cancer subtypes and finding novel biomarkers [9].
Integrating multi-omics data to identify molecular patterns associated with a disease is popular in life science research [10, 11].
Some integration methods have shown significant clinical implications [9, 12].
An integrated web interface for breast cancer data storage, retrieval, and visualization is urgently needed to deal with the increasing amount of multi-omics breast cancer data for precision medicine applications.
(Current situation and the need for a multi-omics breast cancer database)
Over the past years, several databases have been developed for the storage and analysis of breast cancer data [13–17].
Among them, BIC [13] (Breast Cancer Information Core, by the National Human Genome Research Institute) is an open-access database dedicated to cataloging all reported mutations and polymorphisms in BRCA1 and BRCA2.
It has a collection of 3416 and 2292 entries describing the genetic variants of BRCA1 and BRCA2, respectively, with detailed detection protocols and technologies.
BCGD [14] (Breast Cancer Gene Database, by Baylor College of Medicine) is a compendium of molecular data related to 60 genes involved in breast cancer; most of the data were extracted from published biomedical research papers.
Both BIC and BCGD are based on published information about different biological processes at the genomic, transcriptomic, and proteomic levels, among others.
However, these two databases do not provide users with organized raw data for identifying new biomarkers of breast cancer.
Unlike BIC and BCGD, ROCK [15] is a resource of microarray gene expression data, DNA copy numbers, and RNA interference screening data from breast cancer cell lines and tumor samples.
However, these different types of data were collected from different samples, which makes data integration, analysis, and utilization very complicated or even impossible.
Another database, G2SBC [16] (Genes-to-Systems Breast Cancer), is an integrated data source of genes, transcripts, and proteins of breast cancer cell lines described in the literature.
The existing breast cancer databases therefore not only hold very limited amounts of data but are also relatively incomplete, which makes them poorly suited to precision medicine applications.
(Limitations of existing databases)
On the other hand, The Cancer Genome Atlas (TCGA) has collected detailed omics data and clinical records of various cancers, including breast cancer.
It is more like a primary data source: even though the data are classified into levels 1 to 3, users can hardly access the huge raw data directly.
cBioPortal [18, 19] provides a web resource for exploring, visualizing, and analyzing multidimensional TCGA data, and its interactive exploration and graphical summaries make complex cancer genomics profiles accessible.
However, cBioPortal lacks a genome browser and analysis tools for integrated multi-omics, as well as targeted drug information.
To support the application of precision medicine to breast cancer, we constructed the Multi-Omics Breast Cancer Database (MOBCdb for short).
MOBCdb extracted SNV, gene expression, microRNA expression, DNA methylation data, and clinical records from TCGA, and integrated these data with breast cancer-related drugs from PharmGKB [20] under a web interface.
In addition, MOBCdb modified the JBrowse genome browser [21] to support genome-wide visualization of multi-omics data at the same time.
Furthermore, MOBCdb has a survival analysis tool that provides the Kaplan–Meier curve and statistical information relevant to clinical factors.
Last but not least, MOBCdb collected drug data for specific genetic variants, which may be essential for interpreting the clinical implications of biological omics data.
Currently, MOBCdb is freely available at the URL: http://bigd.big.ac.cn/MOBCdb/.
(MOBCdb data sources and database features)
MOBCdb: an overview
SNV, gene expression, microRNA expression, DNA methylation, and clinical data of breast cancer were extracted from the level 3 datasets of the TCGA portal (https://cancergenome.nih.gov/).
More than 10,000 files were stored in MOBCdb.
The details regarding the number of samples and data types are shown in Supplementary Table 1.
Access to the protected SNV data was authorized, and the SNV data can be annotated with ANNOVAR [22] using build hg38.
The basic gene annotation file was downloaded from GENCODE and NCBI.
The microRNA annotation file was downloaded from miRBase version 21 [23].
Information regarding the correlation of drugs with specific genetic variants was gathered from PharmGKB [20].
(Information sources of MOBCdb)
To process the SNV data, four tools were used: MuSE, MuTect2, SomaticSniper, and VarScan2.
Because the results from these tools vary considerably, each SNV was handled using from one to four of the tools.
There are 41355226 records in the SNV data.
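The text does not say how calls from the four tools are combined, so the following is only a minimal sketch of one plausible approach: keep one entry per variant and record which callers reported it. The function names `merge_calls` and `caller_column`, the tuple layout, and the example coordinates are all assumptions made for illustration, not MOBCdb's actual pipeline.

```python
from collections import defaultdict

# The four callers named in the text; the order here fixes the column order.
CALLERS = ("MuSE", "MuTect2", "SomaticSniper", "VarScan2")


def merge_calls(calls_by_caller):
    """Collapse per-caller SNV calls into one entry per variant.

    calls_by_caller maps a caller name to an iterable of
    (chrom, pos, ref, alt) tuples. Returns a dict mapping each variant
    to the list of callers that reported it, in CALLERS order.
    """
    support = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        if caller not in CALLERS:
            raise ValueError(f"unknown caller: {caller}")
        for variant in variants:
            support[variant].add(caller)
    return {
        variant: [c for c in CALLERS if c in seen]
        for variant, seen in support.items()
    }


def caller_column(callers):
    """Render the supporting callers as one semicolon-joined table cell."""
    return ";".join(callers)


# Hypothetical calls; coordinates are placeholders, not real TCGA variants.
example_calls = {
    "MuSE": [("chr1", 1000, "G", "A")],
    "MuTect2": [("chr1", 1000, "G", "A"), ("chr2", 2000, "C", "T")],
    "VarScan2": [("chr1", 1000, "G", "A")],
}
merged = merge_calls(example_calls)
for variant, callers in sorted(merged.items()):
    print(variant, caller_column(callers))
```

Keeping the caller list per variant, rather than one row per caller, matches the idea of supporting each SNV with one to four tools while avoiding duplicate rows.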
The gene expression data come in three forms: Count, FPKM, and FPKM_UQ.
Ensembl gene IDs were used in the level 3 gene expression data, so the number of Ensembl gene IDs is consistent across the three forms.
The GENCODE v25 basic annotation (ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/) was used to translate Ensembl IDs to gene names.
There are 43084054 records in the gene expression data.
The microRNA data have two forms: read count and reads per million miRNA mapped.
There are 2270367 records in the microRNA expression data.
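For readers unfamiliar with the normalized forms, here is a short sketch of the commonly used definitions of FPKM, FPKM-UQ, and reads per million (RPM). The source does not spell out the exact denominators used upstream, so the formulas below are the generic textbook versions, and all function names and input numbers are illustrative assumptions.

```python
def fpkm(read_count, gene_length_bp, total_mapped_reads):
    """Fragments per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def fpkm_uq(read_count, gene_length_bp, upper_quartile_count):
    """FPKM variant scaled by the sample's 75th-percentile gene count."""
    return read_count * 1e9 / (gene_length_bp * upper_quartile_count)


def rpm(read_count, total_mirna_reads):
    """Reads per million miRNA-mapped reads."""
    return read_count * 1e6 / total_mirna_reads


# Hypothetical numbers chosen only to make the arithmetic easy to follow.
print(fpkm(500, 2_000, 25_000_000))  # 500 reads, 2 kb gene, 25 M mapped reads
print(fpkm_uq(500, 2_000, 1_000))    # same gene, upper-quartile count of 1000
print(rpm(250, 5_000_000))           # 250 reads out of 5 M miRNA-mapped reads
```

The raw count is what the other two forms are derived from, which is why the three forms share the same set of gene identifiers.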
The DNA methylation data consist of two datasets: a 27K dataset and a 450K dataset.
The former has 27578 CpG sites and the latter has 485578 CpG sites.
There are 361238969 records in the DNA methylation data.
The variant-related drug information collected from PharmGKB contains variant, PMID, drug, gene, p value, race, association, etc.
There are 986 records in the drug data.
(Record counts and processing for each data type)
The data shown in the genome-wide browser come from the data mentioned above.
They include SNV, gene expression, microRNA expression, DNA methylation, and clinical information.
The SNV data were converted to VCF format, with the clinical factors joined by semicolons in the last column.
The gene expression and microRNA expression data were converted to GFF format, with the clinical factors joined by semicolons in the last column.
The DNA methylation data were transformed to BigWig.
(Data formats for each data type)
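To make the "clinical factors joined by semicolons in the last column" idea concrete, here is a minimal sketch that writes one GFF3-style feature line. GFF3's ninth column holds `key=value` attributes separated by semicolons, which fits this layout naturally. The function name `gff3_line`, the choice to store expression in the score column, the `example` source label, and all coordinates and clinical values are assumptions for illustration; the source does not describe MOBCdb's exact column usage.

```python
from urllib.parse import quote


def gff3_line(seqid, start, end, strand, name, score, clinical):
    """Build one tab-separated GFF3 feature line.

    Clinical factors go into column 9 as key=value pairs joined by
    semicolons. Values are percent-encoded so that ';', '=' or tabs inside
    a value cannot break the attribute column.
    """
    attributes = [f"Name={quote(str(name), safe=' ')}"]
    attributes += [
        f"{key}={quote(str(value), safe=' ')}" for key, value in clinical.items()
    ]
    fields = [
        seqid,
        "example",        # source column: hypothetical label
        "gene",           # feature type
        str(start),
        str(end),
        f"{score:.4f}",   # expression value stored in the score column
        strand,
        ".",              # phase is not applicable to gene features
        ";".join(attributes),
    ]
    return "\t".join(fields)


# Coordinates, score, and clinical values are placeholders for illustration.
line = gff3_line(
    "chr17", 1000, 5000, "-", "BRCA1", 12.5,
    {"ER": "Positive", "PR": "Negative", "HER2": "Negative"},
)
print(line)
```

Carrying the clinical factors inside each feature line lets a genome browser show them when a track is clicked, without a second lookup against the clinical table.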
MOBCdb was implemented with Linux, Apache, MySQL, Struts, and MyBatis, which form an open-source development environment in which new data and new web applications can be added easily.
Multi-omics raw data were processed with Perl and R.
The omics data were stored in a MySQL relational database system (version 5.6.19).
The omics data were organized in two forms, MySQL tables and files, to support efficient access for general retrieval and for JBrowse.
The computing component of the survival analysis was implemented in R.
The browser-based interfaces for the different omics data include various figures, which were developed with a number of web front-end technologies, including JavaScript, CSS, and Highcharts.
(Website architecture, tools, and data storage forms)
Multiple ways to search breast cancer omics data
The search section comprises four components according to the data to be searched: somatic mutation, gene expression, miRNA expression, and DNA methylation.
MOBCdb offers multiple ways to retrieve information efficiently, including by gene, miRNA, cgid, and chromosomal region.
MOBCdb was designed to provide a user-friendly web interface for data search.
For example, a dictionary of entry names (e.g., gene names) in the database was integrated into the search bar so that users can get candidate names and conveniently form queries.
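The candidate-name lookup described above can be sketched as a case-insensitive prefix search over a sorted list. This is only an illustration of the idea: the class name `NameDictionary`, its `suggest` method, and the small name list are assumptions, and the source does not say how MOBCdb implements its search bar.

```python
import bisect


class NameDictionary:
    """Sorted dictionary of entry names supporting prefix suggestions."""

    def __init__(self, names):
        # Store (lowercased, original) pairs sorted for binary search.
        self._entries = sorted((name.lower(), name) for name in set(names))
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix, limit=10):
        """Return up to `limit` names starting with `prefix`, ignoring case."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key, name in self._entries[start:]:
            if not key.startswith(prefix) or len(matches) >= limit:
                break
            matches.append(name)
        return matches


# A tiny illustrative dictionary mixing gene, miRNA, and CpG identifiers.
names = NameDictionary(
    ["BRCA1", "BRCA2", "BRAF", "ERBB2", "ESR1", "hsa-let-7a-1", "cg07054526"]
)
print(names.suggest("brc"))
print(names.suggest("cg"))
```

Because all names sharing a prefix are contiguous in sorted order, one binary search finds the first candidate and the loop stops at the first non-match.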
The results are presented as figures (the upper part) and a table (the lower part), as shown in Fig. 1, and consist of data of different dimensions.
The figures present statistical analysis results, while the detailed information is given in the table.
In Fig. 1a, the pie chart illustrates the BRCA1 somatic mutation status of all the samples.
In Fig. 1b, BRCA1 expression data of tumor samples and normal samples are presented separately as histograms.
Figure 1c shows the methylation data of cg07054526 as a line graph.
Figure 1d–f illustrates the status of ER, PR, and HER2.
From the figures of statistical analysis results, we can see that the mutation types of BRCA1 are diverse, the expression level is relatively low, and some methylation sites exhibit hypermethylation.
In Fig. 1g, the users can search, sort, copy, and download the detailed data in the table.
The results from the four tools related to somatic mutation data are integrated and presented in one column.
The users can easily see which tool produced each search result.
Different types of expression data (FPKM, count, and RPM) are shown simultaneously in the table, and the 450K/27K methylation data are also integrated into the same column.
(Description of the web page display and visualizations)
Visualization and analysis tools to facilitate the usage of omics data
The genome-wide browser in MOBCdb was built on JBrowse.
We redeveloped the selection box so that it is more suitable for presenting multi-omics data.
As shown in Fig. 2, the browser contains a selection box on the left panel and the corresponding visualization on the right panel.
The selection box has three parts, from top to bottom: Clinical Factors, Omics Category, and Sample.
The users can filter their search results by clinical factors such as age, race, AJCC classification, and ER, PR, and HER2 status.
Similarly, the Omics Category part contains four boxes, corresponding to SNV, gene expression, microRNA expression, and DNA methylation data.
The Sample part lists the different TCGA sample names.
Furthermore, the Omics Category and Sample parts provide checkboxes that allow the users to select multiple boxes at the same time.
The checkboxes make it easy to show multi-omics and multi-sample data, so that the users can compare the data of a given region across many samples, or across different omics types.
The figure on the right panel shows a variety of data tracks and allows zooming in and out for detailed or general information.
Clicking a track opens a new window with its detailed information, including all the collected clinical characteristics, the data of the related sample, and the description of the omics data.
By using the genome browser, the users can easily visualize different omics data of a certain sample, and compare the specified omics data across different samples in a selected subgroup.
(Visualization of omics data)
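The clinical-factor filtering in the selection box amounts to keeping only the samples whose clinical fields match every chosen value. A minimal sketch of that logic follows; the function name `filter_samples`, the record layout, and the sample IDs are hypothetical and are not taken from MOBCdb.

```python
def filter_samples(samples, **criteria):
    """Return IDs of samples whose clinical fields match every criterion."""
    return [
        sample["sample_id"]
        for sample in samples
        if all(sample.get(field) == wanted for field, wanted in criteria.items())
    ]


# Hypothetical clinical records; IDs and values are placeholders.
samples = [
    {"sample_id": "S1", "ER": "Positive", "PR": "Positive", "HER2": "Negative"},
    {"sample_id": "S2", "ER": "Negative", "PR": "Negative", "HER2": "Negative"},
    {"sample_id": "S3", "ER": "Positive", "PR": "Negative", "HER2": "Positive"},
]
print(filter_samples(samples, ER="Positive"))
print(filter_samples(samples, ER="Negative", PR="Negative", HER2="Negative"))
```

The selected sample IDs would then determine which tracks are offered in the Sample part of the selection box.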
Three datasets (gene expression, microRNA expression, and DNA methylation) were used for survival analysis.
SNV data were not used because more than 90% of the SNVs occurred in no more than three of the 1097 samples.
The users can search by gene, microRNA, cgid, or their combinations, and the samples can be filtered by four features: ER status, PR status, HER2 status, and sample type.
As shown in Fig. 3, the results are presented as a statistical summary table with the omics signature and clinical factors, and a Kaplan–Meier plot of two groups stratified by the mean score obtained from Cox regression.
The Kaplan–Meier plot shows a noticeable difference in survival time between the two groups (BRCA1, hsa-let-7a-1, cg07054526) in tumor samples (hazard ratio = 1.5207, C-index = 0.5921, p value = 0.036).
In the table, HR, C-index, p value, and clinical features (e.g., age, ER, PR, HER2 status, and lymph nodes) are presented.
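The two steps named above, splitting samples at the mean risk score and drawing a Kaplan–Meier curve per group, can be sketched in plain Python. The sketch assumes the Cox-derived scores are already available as a list; it does not fit a Cox model, and MOBCdb itself performs this analysis in R. The function names and all numbers are illustrative assumptions.

```python
def split_by_mean(scores):
    """Split sample indices into high- and low-score groups at the mean."""
    mean = sum(scores) / len(scores)
    high = [i for i, score in enumerate(scores) if score > mean]
    low = [i for i, score in enumerate(scores) if score <= mean]
    return high, low


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each event time.

    times are follow-up durations; events are 1 for an observed death
    and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all observations that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical risk scores, follow-up times (months), and event flags.
scores = [0.2, 1.4, 0.9, 0.1, 1.8, 0.3]
times = [5, 8, 12, 20, 3, 15]
events = [1, 0, 1, 0, 1, 1]

high, low = split_by_mean(scores)
for label, group in (("high", high), ("low", low)):
    curve = kaplan_meier([times[i] for i in group], [events[i] for i in group])
    print(label, curve)
```

Censored samples lower the number at risk without lowering the survival estimate, which is what distinguishes this estimator from a simple fraction of survivors.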
Drugs with variant information useful for precision medicine
As our understanding of diseases deepens, targeted therapy has become an important part of precision medicine.
Thanks to large-scale GWAS and clinical research, PharmGKB has accumulated a large number of genetic variants associated with drug response.
We extracted 12 breast cancer-related drugs and their variant information.
The drugs collected were approved by the US Food and Drug Administration (FDA), the European Medicines Agency (EMA), the Pharmaceuticals and Medical Devices Agency, Japan (PMDA), and Health Canada (Santé Canada) (HCSC).
Each of the 12 drugs has a summary describing how to use it under specific conditions, along with its reliability.
Seven of the 12 drugs have 986 entries of information associated with variants and genes, which impact specific genotypes and may increase or decrease breast cancer risk.
The population size, race, significance, and phenotype are also provided, as shown in Fig. 4.
Information on genetic variants related to drug response is extremely useful in precision medicine, because it makes it possible to apply omics data to breast cancer clinical practice.
The users can easily query the information by rsid, gene, and drug, and can upload an annotated VCF file after they perform genetic analysis.
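One way to picture the VCF upload is as a lookup of the rsIDs in the file's ID column against the drug-variant table. The sketch below shows that matching step only; the function names, the record fields, and every identifier in the example (rsIDs, gene and drug names) are hypothetical placeholders, not PharmGKB or MOBCdb content.

```python
def rsids_from_vcf(lines):
    """Collect rsIDs from the ID column (third column) of VCF data lines."""
    rsids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a well-formed data line
        # The VCF ID column may hold several identifiers joined by ';'.
        for identifier in fields[2].split(";"):
            if identifier.startswith("rs"):
                rsids.add(identifier)
    return rsids


def match_drug_records(vcf_lines, drug_records):
    """Return the drug records whose variant appears in the VCF."""
    rsids = rsids_from_vcf(vcf_lines)
    return [record for record in drug_records if record["variant"] in rsids]


# Placeholder VCF content and drug table; nothing here is real data.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\trs0000001\tG\tA\t.\tPASS\t.",
    "chr2\t2000\t.\tC\tT\t.\tPASS\t.",
    "chr3\t3000\trs0000003;rs0000004\tT\tC\t.\tPASS\t.",
]
drug_records = [
    {"variant": "rs0000001", "gene": "GENE_A", "drug": "drug_A"},
    {"variant": "rs0000002", "gene": "GENE_B", "drug": "drug_B"},
    {"variant": "rs0000004", "gene": "GENE_C", "drug": "drug_C"},
]
for record in match_drug_records(vcf_lines, drug_records):
    print(record["variant"], record["gene"], record["drug"])
```

Matching on rsID is the simplest option; variants without an rsID (the `.` entries) would need a position-based lookup instead.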
Discussion
With the development of next-generation sequencing technologies, the volume of multi-omics data on breast cancer has been increasing rapidly, and some related databases have been built.
However, most of the existing databases focus on single-omics data to identify potential targets [24–26].
As life is a complex regulatory system, using single-omics data to determine effective therapeutic biomarkers clearly has its limitations [11].
Instead, integrating multi-omics data for deep analysis will provide new perspectives for precision medicine.
Therefore, we collected large amounts of genomic, transcriptomic, epigenomic, and drug response data and built the MOBCdb database, with the aim of providing integrated data and analysis tools for precision medicine.
The current implementation of MOBCdb integrates SNV, gene expression, microRNA expression, DNA methylation, clinical information, and targeted drug response data.
Each of these data types was processed with multiple tools or methods, and various search methods are provided.
Based on the JBrowse framework, we developed an efficient embedded web genome browser, with selection boxes for conveniently displaying multi-omics data simultaneously.
The users can choose one or more samples of interest to check specific clinical characteristics.
Survival analysis tools are provided to help researchers find prognostic biomarkers for different subtypes of breast cancer.
MOBCdb also collected data on drug response to specific genetic variants of breast cancer.
(Summary and strengths of the MOBCdb database)
In summary, MOBCdb can serve the breast cancer research and clinical community as a valuable resource of multi-omics data and analysis.
Certainly, MOBCdb still has some limitations.
First, SNV data were obtained from exome sequencing, which covers only about 1.5% of the human genome.
Second, DNA methylation data were derived from the Illumina 27K/450K bead arrays, which account for at most 2% of the total CpGs in the human genome.
The low coverage of the SNV and DNA methylation data will surely limit their applications.
Third, the normal tissue samples account for about 10% of all samples, and there are no data available from healthy individuals.
In the future, on the one hand, we will try to include more multi-omics data from both breast cancer and control samples.
On the other hand, we will enrich the data analysis tools to support real clinical practice.
(Limitations of MOBCdb)
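The methylation coverage figure can be sanity-checked with quick arithmetic from the array sizes given earlier. The genome-wide CpG total used below (about 28 million) is an outside approximation and an assumption of this sketch, not a number from the source; the constant and function names are likewise illustrative.

```python
CPG_SITES_27K = 27_578            # from the text
CPG_SITES_450K = 485_578          # from the text
ASSUMED_GENOME_CPGS = 28_000_000  # assumption: approximate, not from the source


def coverage_percent(covered, total):
    """Share of `total` sites that are covered, as a percentage."""
    return 100.0 * covered / total


pct_27k = coverage_percent(CPG_SITES_27K, ASSUMED_GENOME_CPGS)
pct_450k = coverage_percent(CPG_SITES_450K, ASSUMED_GENOME_CPGS)
print(f"27K array:  {pct_27k:.2f}% of CpGs")
print(f"450K array: {pct_450k:.2f}% of CpGs")
```

Under that assumed total, the 450K array lands a little under 2% and the 27K array well under 1%, which is consistent with the limitation stated above.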
Compliance with ethical standards
Conflict of interest: We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work; there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled.
For details, see the paper:
MOBCdb: a comprehensive database integrating multi-omics data on breast cancer for precision medicine
https://link.springer.com/article/10.1007%2Fs10549-018-4708-z