Yuque: 左手柳葉刀右手炭火燒
WeChat official account: 研平方 | Jianshu: 研平方
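Recently, while combing through the literature, I came across an application that caught my interest: text mining can be used to assess the association between selected genes and cancer. With a technique like text mining involved, I naturally had to take a closer look.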
1. The original text
Literature evidence for the identified target genes in cancer
We used OncoScore, a text mining tool to assess the associations between each gene and specific cancers based on the literature. A cutoff value of 21.09 was suggested to determine true positives and the true negatives in cancer gene identification.
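2. Looking it up
Out of habit I opened the browser, ready to get to the bottom of it, and was pleasantly surprised to find that OncoScore is a ready-made R package hosted on Bioconductor, so it can be installed and used directly. Although the paper was published in Sci Rep, I still think it is worth a try.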
3. What it can do
The OncoScore analysis consists of two parts. One can estimate a score to assess the
oncogenic potential of a set of genes, given the literature knowledge, at the time of the
analysis, or one can study the trend of such score over time.
In other words, OncoScore can not only score the oncogenic potential of a given list of target genes based on the published literature, but also track how that score changes over time.
4. Showtime: pull up a chair
4.1 Preparation
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("OncoScore")
# load the library
library(OncoScore)
# Define a query
query = perform.query(c("ASXL1","IDH1","IDH2","SETBP1","TET2"))
### Starting the queries for the selected genes.
### Performing queries for cancer literature
Number of papers found in PubMed for ASXL1 was: 923
Number of papers found in PubMed for IDH1 was: 3691
Number of papers found in PubMed for IDH2 was: 1318
Number of papers found in PubMed for SETBP1 was: 177
Number of papers found in PubMed for TET2 was: 1609
### Performing queries for all the literature
Number of papers found in PubMed for ASXL1 was: 1018
Number of papers found in PubMed for IDH1 was: 3902
Number of papers found in PubMed for IDH2 was: 1499
Number of papers found in PubMed for SETBP1 was: 229
Number of papers found in PubMed for TET2 was: 2117
From the output above, the query returned the number of cancer-related papers for each gene, as well as the total number of papers that mention each queried gene.
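As a quick sanity check on these counts, the share of each gene's papers that are cancer-related can be worked out by hand. The following is a minimal sketch in base R: the numbers are copied from the output above, and the variable names are my own rather than anything defined by OncoScore.

```r
# Citation counts copied from the perform.query() output above
citations_cancer <- c(ASXL1 = 923, IDH1 = 3691, IDH2 = 1318, SETBP1 = 177, TET2 = 1609)
citations_all    <- c(ASXL1 = 1018, IDH1 = 3902, IDH2 = 1499, SETBP1 = 229, TET2 = 2117)

# Percentage of each gene's papers that also fall in the cancer literature
perc_cancer <- 100 * citations_cancer / citations_all
print(round(perc_cancer, 1))
```

This raw percentage is only a first look; it is not the OncoScore itself.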
OncoScore provides a function to merge gene names if requested by the user. This function is useful when there are aliases in the gene list.
combine.query.results(query, c('IDH1', 'IDH2'), 'new_gene')
CitationsGene CitationsGeneInCancer
ASXL1 1018 923
SETBP1 229 177
TET2 2117 1609
new_gene 5401 5009
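Judging from the numbers, the merged row is simply the sum of the two original rows (3902 + 1499 = 5401 and 3691 + 1318 = 5009). A minimal base R sketch of that bookkeeping is shown here; `merge_rows` is a hypothetical helper written for illustration and is not how OncoScore implements `combine.query.results`.

```r
# Counts copied from the perform.query() output above
counts <- matrix(
  c(1018, 3902, 1499, 229, 2117,
    923, 3691, 1318, 177, 1609),
  ncol = 2,
  dimnames = list(c("ASXL1", "IDH1", "IDH2", "SETBP1", "TET2"),
                  c("CitationsGene", "CitationsGeneInCancer"))
)

# Hypothetical helper: replace the rows in `genes` with one row holding their column sums
merge_rows <- function(m, genes, new_name) {
  merged <- colSums(m[genes, , drop = FALSE])
  kept <- m[!(rownames(m) %in% genes), , drop = FALSE]
  out <- rbind(kept, merged)
  rownames(out)[nrow(out)] <- new_name
  out
}

merged_counts <- merge_rows(counts, c("IDH1", "IDH2"), "new_gene")
print(merged_counts)
```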
Of course, OncoScore can also retrieve genes by chromosomal location. That is not demonstrated here.
4.2 The main event
4.2.1 Computing the oncogenic score of each gene
result = compute.oncoscore(query)
### Processing data
### Computing frequencies scores
### Estimating oncogenes
### Results:
ASXL1 -> 81.59349
IDH1 -> 86.66355
IDH2 -> 79.59096
SETBP1 -> 67.43283
TET2 -> 69.12424
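The printed scores can be reproduced from the citation counts with a simple formula: the percentage of cancer-related papers, shrunk by a factor of 1/log2(total papers) so that genes with few papers are penalized. That this is what `compute.oncoscore()` does internally is my inference from the numbers matching, not something verified against the package source here. The sketch below, in base R with my own variable names, also applies the 21.09 cutoff quoted from the paper in section 1.

```r
# Citation counts copied from the perform.query() output in section 4.1
citations_cancer <- c(ASXL1 = 923, IDH1 = 3691, IDH2 = 1318, SETBP1 = 177, TET2 = 1609)
citations_all    <- c(ASXL1 = 1018, IDH1 = 3902, IDH2 = 1499, SETBP1 = 229, TET2 = 2117)

# Percentage of cancer-related papers per gene
perc_cancer <- 100 * citations_cancer / citations_all

# Assumed penalty: larger when the gene has few papers overall
alpha <- 1 / log2(citations_all)

# Assumed score: the percentage reduced by the penalty
score <- perc_cancer * (1 - alpha)
print(round(score, 5))

# Cutoff suggested in the quoted paper for calling a gene cancer-associated
cutoff <- 21.09
print(score > cutoff)
```

All five genes score well above the cutoff, which is unsurprising given that they were chosen as known cancer genes.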
4.2.2 OncoScore timeline analysis
query.timepoints = perform.query.timeseries(c("ASXL1","IDH1","IDH2","SETBP1","TET2"),
    c("2012/03/01", "2013/03/01", "2014/03/01", "2015/03/01", "2016/03/01"))
### Starting the queries for the selected genes.
### Quering PubMed for timepoint 2012/03/01
### Performing queries for cancer literature
Number of papers found in PubMed for ASXL1 was: 86
Number of papers found in PubMed for IDH1 was: 409
Number of papers found in PubMed for IDH2 was: 173
Number of papers found in PubMed for SETBP1 was: 5
Number of papers found in PubMed for TET2 was: 173
### Performing queries for all the literature
Number of papers found in PubMed for ASXL1 was: 92
Number of papers found in PubMed for IDH1 was: 489
Number of papers found in PubMed for IDH2 was: 235
Number of papers found in PubMed for SETBP1 was: 10
Number of papers found in PubMed for TET2 was: 197
### Quering PubMed for timepoint 2013/03/01
### Performing queries for cancer literature
Number of papers found in PubMed for ASXL1 was: 135
Number of papers found in PubMed for IDH1 was: 662
Number of papers found in PubMed for IDH2 was: 267
Number of papers found in PubMed for SETBP1 was: 11
Number of papers found in PubMed for TET2 was: 258
### Performing queries for all the literature
Number of papers found in PubMed for ASXL1 was: 150
Number of papers found in PubMed for IDH1 was: 753
Number of papers found in PubMed for IDH2 was: 336
Number of papers found in PubMed for SETBP1 was: 18
Number of papers found in PubMed for TET2 was: 303
### Quering PubMed for timepoint 2014/03/01
### Performing queries for cancer literature
Number of papers found in PubMed for ASXL1 was: 188
Number of papers found in PubMed for IDH1 was: 904
Number of papers found in PubMed for IDH2 was: 365
Number of papers found in PubMed for SETBP1 was: 29
Number of papers found in PubMed for TET2 was: 347
### Performing queries for all the literature
Number of papers found in PubMed for ASXL1 was: 209
Number of papers found in PubMed for IDH1 was: 1003
Number of papers found in PubMed for IDH2 was: 440
Number of papers found in PubMed for SETBP1 was: 36
Number of papers found in PubMed for TET2 was: 431
### Quering PubMed for timepoint 2015/03/01
### Performing queries for cancer literature
Number of papers found in PubMed for ASXL1 was: 257
Number of papers found in PubMed for IDH1 was: 1198
Number of papers found in PubMed for IDH2 was: 468
Number of papers found in PubMed for SETBP1 was: 51
Number of papers found in PubMed for TET2 was: 461
### Performing queries for all the literature
Number of papers found in PubMed for ASXL1 was: 286
Number of papers found in PubMed for IDH1 was: 1304
Number of papers found in PubMed for IDH2 was: 551
Number of papers found in PubMed for SETBP1 was: 66
Number of papers found in PubMed for TET2 was: 583
### Quering PubMed for timepoint 2016/03/01
### Performing queries for cancer literature
Number of papers found in PubMed for ASXL1 was: 323
Number of papers found in PubMed for IDH1 was: 1506
Number of papers found in PubMed for IDH2 was: 569
Number of papers found in PubMed for SETBP1 was: 68
Number of papers found in PubMed for TET2 was: 587
### Performing queries for all the literature
Number of papers found in PubMed for ASXL1 was: 359
Number of papers found in PubMed for IDH1 was: 1625
Number of papers found in PubMed for IDH2 was: 661
Number of papers found in PubMed for SETBP1 was: 89
Number of papers found in PubMed for TET2 was: 745
The perform.query.timeseries() function retrieves the literature counts as of each of the specified time points.
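The time points above were typed out by hand. When more of them are needed, they can be generated instead. This is a small base R sketch that builds the same five dates in the "YYYY/MM/DD" format used in the call above; the variable name `timepoints` is my own.

```r
# Yearly time points starting from 2012-03-01, formatted as in the call above
timepoints <- format(
  seq(as.Date("2012-03-01"), by = "year", length.out = 5),
  "%Y/%m/%d"
)
print(timepoints)
```

The resulting character vector could then be passed as the second argument to perform.query.timeseries() in place of the literal dates.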
result.timeseries = compute.oncoscore.timeseries(query.timepoints)
### Computing oncoscore for timepoint 2012/03/01
### Processing data
### Computing frequencies scores
### Estimating oncogenes
### Results:
ASXL1 -> 79.14893
IDH1 -> 74.27776
IDH2 -> 64.27063
SETBP1 -> 34.9485
TET2 -> 76.29579
### Computing oncoscore for timepoint 2013/03/01
### Processing data
### Computing frequencies scores
### Estimating oncogenes
### Results:
ASXL1 -> 77.54983
IDH1 -> 78.71551
IDH2 -> 69.99559
SETBP1 -> 46.4559
TET2 -> 74.81894
### Computing oncoscore for timepoint 2014/03/01
### Processing data
### Computing frequencies scores
### Estimating oncogenes
### Results:
ASXL1 -> 78.28121
IDH1 -> 81.08963
IDH2 -> 73.50788
SETBP1 -> 64.97398
TET2 -> 71.31087
### Computing oncoscore for timepoint 2015/03/01
### Processing data
### Computing frequencies scores
### Estimating oncogenes
### Results:
ASXL1 -> 78.84769
IDH1 -> 82.99363
IDH2 -> 75.60886
SETBP1 -> 64.48853
TET2 -> 70.46695
### Computing oncoscore for timepoint 2016/03/01
### Processing data
### Computing frequencies scores
### Estimating oncogenes
### Results:
ASXL1 -> 79.37202
IDH1 -> 83.9881
IDH2 -> 76.89328
SETBP1 -> 64.60591
TET2 -> 70.53378
4.2.3 Visualization
## Oncogenetic potential of the considered genes
plot.oncoscore(result, col = 'darkblue')
## Absolute values of the oncogenetic potential of the considered genes over time
plot.oncoscore.timeseries(result.timeseries)
## Variations of the oncogenetic potential of the considered genes over time
plot.oncoscore.timeseries(result.timeseries,
    incremental = TRUE,
    ylab='absolute variation')
## Variations as relative values of the oncogenetic potential of the considered genes over time
plot.oncoscore.timeseries(result.timeseries,
    incremental = TRUE,
    relative = TRUE,
    ylab='relative variation')
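To get a feel for what the variation plots summarize, the changes between consecutive time points can be computed by hand from the scores printed in section 4.2.2. The sketch below uses SETBP1 and base R only. Treating the "relative" variation as the change divided by the previous score is my assumption; the package may define it differently.

```r
# SETBP1 scores copied from the compute.oncoscore.timeseries() output above
setbp1 <- c("2012/03/01" = 34.9485,
            "2013/03/01" = 46.4559,
            "2014/03/01" = 64.97398,
            "2015/03/01" = 64.48853,
            "2016/03/01" = 64.60591)

# Absolute variation: difference between each time point and the one before it
abs_variation <- diff(setbp1)

# Relative variation (assumed definition): change as a fraction of the previous score
rel_variation <- abs_variation / head(setbp1, -1)

print(round(abs_variation, 4))
print(round(rel_variation, 4))
```

The numbers show that SETBP1's score rose sharply up to 2014 and then levelled off.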