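Background
This paper, published in IEEE TRANSACTIONS ON SOFTWARE ENGINEERING in October 2014, detects software vulnerabilities using text-mining techniques. The paper itself has only 14 citations and its idea is not particularly new. Still, the journal is highly ranked and the paper contains a lot of data processing and analysis, so it is worth borrowing from for future writing.
- Source: IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 40, NO. 10, OCTOBER 2014
- Authors: Riccardo Scandariato, James Walden, Aram Hovsepyan, and Wouter Joosen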
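Overview
Main Idea
The source code of Android applications is treated as text. The statements and terms in the source code play the role of the words in a document and serve as the features. Naïve Bayes and Random Forest are then used to build a model that predicts which source files are vulnerable.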
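The idea above can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The tokenizer rules are assumptions: identifiers, numbers and single symbols are kept, and comments are stripped. The add-one smoothing is also an assumption, and the toy snippets and labels are invented for the example. Random Forest is omitted for brevity.

```python
import math
import re
from collections import Counter

# Assumption: a "term" is a Java identifier/keyword, a number, or a single
# non-whitespace symbol; the paper's exact tokenization may differ.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")


def tokenize(source: str) -> Counter:
    """Turn one Java source file into a bag-of-words term-frequency vector."""
    # Strip block and line comments before tokenizing (an assumption).
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.S)
    source = re.sub(r"//[^\n]*", " ", source)
    return Counter(TOKEN_RE.findall(source))


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.term_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
            self.vocab.update(doc)
        # Class priors estimated from label frequencies.
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.term_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for term, n in doc.items():
                if term not in self.vocab:
                    continue  # ignore terms never seen in training
                p = (self.term_counts[c][term] + 1) / (self.totals[c] + v)
                score += n * math.log(p)  # log space avoids underflow
            if score > best_score:
                best, best_score = c, score
        return best


# Toy, invented training data: two "vulnerable" and two "clean" snippets.
train = [
    'String q = "SELECT * FROM t WHERE id=" + id; db.rawQuery(q, null);',
    'Log.d(TAG, password); db.rawQuery(sql, null);',
    'int sum = a + b; return sum;',
    'for (int i = 0; i < n; i++) { total += i; }',
]
labels = ["vulnerable", "vulnerable", "clean", "clean"]
model = MultinomialNB().fit([tokenize(s) for s in train], labels)
print(model.predict(tokenize("Cursor c = db.rawQuery(query, null);")))  # vulnerable
```

Multinomial Naive Bayes matches the bag-of-words view naturally, because it models each file purely as a multiset of term counts and ignores term order.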
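Contributions
- The first application of text-mining methods to software vulnerability prediction: the source code itself is used as the feature set, rather than software metrics or developer-related features.
- Compared with existing vulnerability prediction models, the resulting model achieves better precision and recall.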
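For reference, precision and recall for the "vulnerable" class can be computed as follows. The label strings and the sample predictions are illustrative assumptions.

```python
def precision_recall(actual, predicted, positive="vulnerable"):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


actual = ["vulnerable", "vulnerable", "vulnerable", "clean", "clean"]
predicted = ["vulnerable", "vulnerable", "vulnerable", "vulnerable", "clean"]
print(precision_recall(actual, predicted))  # (0.75, 1.0)
```

Precision is the share of files flagged as vulnerable that really are vulnerable. Recall is the share of truly vulnerable files that get flagged.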
Method
Related Work
The authors compare related work on vulnerability prediction models along five dimensions, as shown in the figure below:

[Figure: related work on vulnerability prediction compared along five dimensions]
Main Steps
- Selection of applications:
  - Source: the F-Droid repository (f-droid.org)
  - Selection criteria: programming language, application size, and the number of versions released
- Construction of the dataset:
  - Tool: HP Fortify SCA scans the source code and reports vulnerability warnings for each application
  - Why: the NVD (nvd.nist.gov) records too few vulnerabilities related to Android applications, so a static analysis tool is used to obtain labels instead
- Input: each Java file is tokenized into a vector of terms
- Machine learning techniques: five well-known learning techniques were applied to the approach: Decision Trees, k-Nearest Neighbor, Naïve Bayes, Random Forest, and Support Vector Machine (SVM). The best results were obtained with Naïve Bayes and Random Forest.
- Experiment design:
  - Validation: 10-fold cross-validation
  - Experiment 1: models were built with both Naïve Bayes and Random Forest on the first version (v0) of each application, to show that the method can build high-quality prediction models for Android applications.
  - Experiment 2: a model was built from the initial version (using all source files available in v0) and used to predict all subsequent versions of the same application (v1 onward). Result: the technique forecasts the vulnerable files of future versions of an Android application with excellent performance.
  - Experiment 3: 20 models were built, one from version v0 of each application, and each model was tested by predicting the vulnerable files in the v0 versions of the other 19 applications. Result: a model built from a single application can predict which software components are vulnerable in other applications.
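The labeling and evaluation steps above can be sketched as plain Python helpers. The sketch makes three assumptions. A file counts as vulnerable when the scanner reports at least one warning for it. The warning list is a hypothetical list of (file path, category) pairs, not Fortify's actual report format. Folds are formed by a seeded shuffle. Training and prediction are left out, so any classifier can be plugged into these splits.

```python
import random


def label_files(java_files, warnings):
    """Label a file 'vulnerable' if at least one warning points at it.

    warnings: hypothetical (file_path, category) pairs standing in for a
    static-analysis report.
    """
    flagged = {path for path, _category in warnings}
    return {f: ("vulnerable" if f in flagged else "clean") for f in java_files}


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Every index not in the current test fold goes to training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_version_pairs(versions):
    """Experiment 2: train on v0, test on every later version."""
    return [(versions[0], v) for v in versions[1:]]


def cross_project_pairs(apps):
    """Experiment 3: train on each app's v0, test on every other app's v0."""
    return [(train, test) for train in apps for test in apps if train != test]


apps = [f"app{i:02d}" for i in range(20)]
print(len(cross_project_pairs(apps)))  # 380 = 20 models x 19 target apps
```

Experiment 1 corresponds to running `k_fold_splits` over the files of each application's v0. Experiments 2 and 3 replace random folds with the fixed train/test pairs.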
Innovations
- Applying text-mining methods to software vulnerability (defect) detection
- In the experiment design, comparative experiments are run in three directions
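Summary
Strengths
- The data comparisons in the experimental section are well done: from a very simple idea, the authors produced three sets of experiments
Weaknesses
- The innovation is fairly simple
- The data comparisons are strained, because the datasets differ
- Journal papers lag behind: this paper was published in October 2014, but the research was done around 2012. In future I should read more conference papers and use journal papers only as references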
My Thoughts
- Combine this with the VCCFinder paper