[Bioinformatics Analysis] Text-mining target genes and assessing their oncogenic potential: did you know about this?

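Yuque: 左手柳葉刀右手炭火燒
WeChat official account: 研平方 | Jianshu: 研平方
Follow for more research tutorials and tips. If you have questions or suggestions, please leave a comment.
You are welcome to follow me: let's learn and improve together!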

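Recently, while "sweeping" through the literature, I came across an application that really caught my interest: text mining can be used to assess the association between selected genes and cancer. With text mining involved, I naturally had to take a closer look.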

1. The original text

Literature evidence for the identified target genes in cancer

We used OncoScore, a text mining tool to assess the associations between each gene and specific cancers based on the literature. A cutoff value of 21.09 was suggested to determine true positives and the true negatives in cancer gene identification.

2. Looking it up

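Out of habit I opened the browser, ready to get to the bottom of it, and was pleasantly surprised to find that OncoScore is a ready-made R package hosted on Bioconductor, so it can be installed and used directly. Although the paper was published in Sci Rep, I think it is still worth a try.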


3. What it can do

The OncoScore analysis consists of two parts. One can estimate a score to assess the
oncogenic potential of a set of genes, given the literature knowledge, at the time of the
analysis, or one can study the trend of such score over time.

In other words, OncoScore can not only score the oncogenic potential of a given list of target genes based on what is known in the literature, but also track how that score changes over time.

4. Showtime: grab a seat and watch

4.1 Preparation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("OncoScore")

# load the library
library(OncoScore)
# Define a query
query = perform.query(c("ASXL1","IDH1","IDH2","SETBP1","TET2"))

### Starting the queries for the selected genes.

### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 923 
    Number of papers found in PubMed for IDH1 was: 3691 
    Number of papers found in PubMed for IDH2 was: 1318 
    Number of papers found in PubMed for SETBP1 was: 177 
    Number of papers found in PubMed for TET2 was: 1609 

### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 1018 
    Number of papers found in PubMed for IDH1 was: 3902 
    Number of papers found in PubMed for IDH2 was: 1499 
    Number of papers found in PubMed for SETBP1 was: 229 
    Number of papers found in PubMed for TET2 was: 2117

As the output above shows, the query returns, for each gene, the number of cancer-related papers as well as the total number of papers that mention the gene.
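As a quick sanity check, the two sets of counts can be put side by side and the share of cancer-related papers computed per gene. This is a minimal base-R sketch that uses the counts printed above; the data frame and its column names are illustrative and are not part of OncoScore.

```r
# Counts copied from the PubMed query output above. A re-run will give
# different numbers, because PubMed keeps growing.
counts <- data.frame(
  gene   = c("ASXL1", "IDH1", "IDH2", "SETBP1", "TET2"),
  cancer = c(923, 3691, 1318, 177, 1609),   # papers also matching cancer terms
  total  = c(1018, 3902, 1499, 229, 2117)   # all papers mentioning the gene
)

# Percentage of each gene's literature that is cancer-related
counts$pct_cancer <- round(100 * counts$cancer / counts$total, 1)

# Show genes from the highest to the lowest cancer share
print(counts[order(-counts$pct_cancer), ])
```

The raw percentage alone ignores how much literature exists for a gene, which is one reason to look at the score computed in section 4.2 rather than at this ratio.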

OncoScore provides a function to merge gene names if requested by the user. This function is useful when there are aliases in the gene list.

combine.query.results(query, c('IDH1', 'IDH2'), 'new_gene')
         CitationsGene CitationsGeneInCancer
ASXL1             1018                   923
SETBP1             229                   177
TET2              2117                  1609
new_gene          5401                  5009
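
The merged row is consistent with simply adding up the counts of the two genes (3902 + 1499 = 5401 and 3691 + 1318 = 5009). Below is a base-R sketch of that idea, written independently of the package; `merge_genes` is a hypothetical helper, not an OncoScore function.

```r
# Count matrix in the same layout as the query result shown above
counts <- matrix(
  c(1018, 3902, 1499, 229, 2117,
     923, 3691, 1318, 177, 1609),
  ncol = 2,
  dimnames = list(c("ASXL1", "IDH1", "IDH2", "SETBP1", "TET2"),
                  c("CitationsGene", "CitationsGeneInCancer"))
)

# Hypothetical helper: replace the rows listed in `aliases` with a single
# row that holds their column-wise sums.
merge_genes <- function(x, aliases, new_name) {
  merged <- colSums(x[aliases, , drop = FALSE])
  kept   <- x[!(rownames(x) %in% aliases), , drop = FALSE]
  out    <- rbind(kept, merged)
  rownames(out)[nrow(out)] <- new_name
  out
}

merged_counts <- merge_genes(counts, c("IDH1", "IDH2"), "new_gene")
print(merged_counts)
```

Keep in mind that plain summation counts a paper twice if it mentions both names, so merged counts are best read as an upper bound.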

OncoScore can also retrieve genes based on chromosomal information, but that is not demonstrated here.

4.2 The key part

4.2.1 Computing the oncogenic score of each gene
result = compute.oncoscore(query)

### Processing data
### Computing frequencies scores 
### Estimating oncogenes
### Results:
     ASXL1 -> 81.59349 
     IDH1 -> 86.66355 
     IDH2 -> 79.59096 
     SETBP1 -> 67.43283 
     TET2 -> 69.12424
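
The printed scores can be reproduced from the citation counts of section 4.1 with a simple formula: the percentage of cancer-related citations, shrunk by a factor of 1 - 1/log2(total citations), so that genes with little literature are penalised. This formula is inferred from the numbers (it matches the output to the printed precision) and is not stated in the quoted text, so treat it as an assumption and check the package documentation. The sketch also applies the 21.09 cutoff quoted in section 1; `oncoscore_sketch` is a hypothetical helper name.

```r
# Hypothetical re-implementation for illustration; not the package's own code.
oncoscore_sketch <- function(cancer, total) {
  pct_cancer <- 100 * cancer / total   # share of cancer-related papers
  penalty    <- 1 / log2(total)        # larger for sparsely studied genes
  pct_cancer * (1 - penalty)
}

genes  <- c("ASXL1", "IDH1", "IDH2", "SETBP1", "TET2")
cancer <- c(923, 3691, 1318, 177, 1609)
total  <- c(1018, 3902, 1499, 229, 2117)

scores <- setNames(oncoscore_sketch(cancer, total), genes)
print(round(scores, 5))

# Apply the cutoff suggested in the quoted paper
cutoff <- 21.09
print(scores > cutoff)
```

All five genes score far above 21.09, which fits the fact that they were picked as well-known cancer genes.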
4.2.2 Time-trend analysis (OncoScore timeline analysis)
query.timepoints = perform.query.timeseries(c("ASXL1","IDH1","IDH2","SETBP1","TET2"),
                                            c("2012/03/01", "2013/03/01", "2014/03/01", "2015/03/01", "2016/03/01"))

### Starting the queries for the selected genes.
### Quering PubMed for timepoint 2012/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 86 
    Number of papers found in PubMed for IDH1 was: 409 
    Number of papers found in PubMed for IDH2 was: 173 
    Number of papers found in PubMed for SETBP1 was: 5 
    Number of papers found in PubMed for TET2 was: 173 
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 92 
    Number of papers found in PubMed for IDH1 was: 489 
    Number of papers found in PubMed for IDH2 was: 235 
    Number of papers found in PubMed for SETBP1 was: 10 
    Number of papers found in PubMed for TET2 was: 197 
### Quering PubMed for timepoint 2013/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 135 
    Number of papers found in PubMed for IDH1 was: 662 
    Number of papers found in PubMed for IDH2 was: 267 
    Number of papers found in PubMed for SETBP1 was: 11 
    Number of papers found in PubMed for TET2 was: 258 
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 150 
    Number of papers found in PubMed for IDH1 was: 753 
    Number of papers found in PubMed for IDH2 was: 336 
    Number of papers found in PubMed for SETBP1 was: 18 
    Number of papers found in PubMed for TET2 was: 303 
### Quering PubMed for timepoint 2014/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 188 
    Number of papers found in PubMed for IDH1 was: 904 
    Number of papers found in PubMed for IDH2 was: 365 
    Number of papers found in PubMed for SETBP1 was: 29 
    Number of papers found in PubMed for TET2 was: 347
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 209 
    Number of papers found in PubMed for IDH1 was: 1003 
    Number of papers found in PubMed for IDH2 was: 440 
    Number of papers found in PubMed for SETBP1 was: 36 
    Number of papers found in PubMed for TET2 was: 431 
### Quering PubMed for timepoint 2015/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 257 
    Number of papers found in PubMed for IDH1 was: 1198 
    Number of papers found in PubMed for IDH2 was: 468 
    Number of papers found in PubMed for SETBP1 was: 51 
    Number of papers found in PubMed for TET2 was: 461 
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 286 
    Number of papers found in PubMed for IDH1 was: 1304 
    Number of papers found in PubMed for IDH2 was: 551 
    Number of papers found in PubMed for SETBP1 was: 66 
    Number of papers found in PubMed for TET2 was: 583 
### Quering PubMed for timepoint 2016/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 323 
    Number of papers found in PubMed for IDH1 was: 1506 
    Number of papers found in PubMed for IDH2 was: 569 
    Number of papers found in PubMed for SETBP1 was: 68 
    Number of papers found in PubMed for TET2 was: 587
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 359 
    Number of papers found in PubMed for IDH1 was: 1625 
    Number of papers found in PubMed for IDH2 was: 661 
    Number of papers found in PubMed for SETBP1 was: 89 
    Number of papers found in PubMed for TET2 was: 745 

The perform.query.timeseries() function retrieves the literature counts as of each of the specified time points.

result.timeseries = compute.oncoscore.timeseries(query.timepoints)

### Computing oncoscore for timepoint 2012/03/01 
### Processing data
### Computing frequencies scores 
### Estimating oncogenes
### Results:
     ASXL1 -> 79.14893 
     IDH1 -> 74.27776 
     IDH2 -> 64.27063 
     SETBP1 -> 34.9485 
     TET2 -> 76.29579 
### Computing oncoscore for timepoint 2013/03/01 
### Processing data
### Computing frequencies scores 
### Estimating oncogenes
### Results:
     ASXL1 -> 77.54983 
     IDH1 -> 78.71551 
     IDH2 -> 69.99559 
     SETBP1 -> 46.4559 
     TET2 -> 74.81894 
### Computing oncoscore for timepoint 2014/03/01 
### Processing data
### Computing frequencies scores 
### Estimating oncogenes
### Results:
     ASXL1 -> 78.28121 
     IDH1 -> 81.08963 
     IDH2 -> 73.50788 
     SETBP1 -> 64.97398 
     TET2 -> 71.31087 
### Computing oncoscore for timepoint 2015/03/01 
### Processing data
### Computing frequencies scores 
### Estimating oncogenes
### Results:
     ASXL1 -> 78.84769 
     IDH1 -> 82.99363 
     IDH2 -> 75.60886 
     SETBP1 -> 64.48853 
     TET2 -> 70.46695 
### Computing oncoscore for timepoint 2016/03/01 
### Processing data
### Computing frequencies scores 
### Estimating oncogenes
### Results:
     ASXL1 -> 79.37202 
     IDH1 -> 83.9881 
     IDH2 -> 76.89328 
     SETBP1 -> 64.60591 
     TET2 -> 70.53378 
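
To see the trend at a glance, the per-time-point scores printed above can be collected into a matrix and differenced. This is a base-R sketch; the definitions of absolute and relative variation used here (the difference from the previous time point, and that difference divided by the previous score) are my assumption and may not match exactly what the package plots.

```r
timepoints <- c("2012/03/01", "2013/03/01", "2014/03/01",
                "2015/03/01", "2016/03/01")

# Scores copied from the output above: one row per gene, one column per time point
scores <- rbind(
  ASXL1  = c(79.14893, 77.54983, 78.28121, 78.84769, 79.37202),
  IDH1   = c(74.27776, 78.71551, 81.08963, 82.99363, 83.98810),
  IDH2   = c(64.27063, 69.99559, 73.50788, 75.60886, 76.89328),
  SETBP1 = c(34.94850, 46.45590, 64.97398, 64.48853, 64.60591),
  TET2   = c(76.29579, 74.81894, 71.31087, 70.46695, 70.53378)
)
colnames(scores) <- timepoints

# Absolute variation: change since the previous time point
abs_var <- t(apply(scores, 1, diff))

# Relative variation: change as a fraction of the previous score
rel_var <- abs_var / scores[, -ncol(scores)]

print(round(abs_var, 2))
print(round(rel_var, 3))
```

In these numbers SETBP1 shows by far the largest rise (from about 35 to about 65), while TET2 drifts slightly downward.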
4.2.3 Visualization
## Oncogenetic potential of the considered genes
plot.oncoscore(result, col = 'darkblue')

## Absolute values of the oncogenetic potential of the considered genes over times
plot.oncoscore.timeseries(result.timeseries)

## Variations of the oncogenetic potential of the considered genes over times
plot.oncoscore.timeseries(result.timeseries,
                          incremental = TRUE,
                          ylab='absolute variation')

## Variations as relative values of the oncogenetic potential of the considered genes over times
plot.oncoscore.timeseries(result.timeseries,
                          incremental = TRUE,
                          relative = TRUE,
                          ylab='relative variation')

Tip: the reading experience is better on Yuque!
