These notes are based on the radiomics tutorial video series by the Bilibili uploader 有Li.
This section (24) walks through one paper to see how different dimensionality reduction methods and classifiers can be combined.
The paper was published in European Radiology in 2018: Radiomics features on non-contrast-enhanced CT scan can precisely classify AVM-related hematomas from other spontaneous intraparenchymal hematoma types. Here AVM stands for arteriovenous malformation.
1. feature extraction
The study initially extracted 576 features and divided them into 6 groups:
(1) First-order statistics of hematoma intensity (n = 18),
(2) shape (n = 16),
(3) texture (n = 22, derived from GLCM),
(4) texture (n = 16, derived from GLRLM),
(5) wavelet-based features (n = 448),
(6) Laplacian of Gaussian-filtered image features (n = 56).
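Image segmentation was performed by two radiologists. The authors kept only the features with ICC (intraclass correlation coefficient) > 0.8 for the subsequent feature selection and modeling.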
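The ICC filter could be sketched roughly as below. The notes do not say which ICC form the paper used, so this assumes ICC(2,1) (two-way random effects, absolute agreement, single rater). The function names `icc_2_1` and `stable_feature_mask` and the array layout are hypothetical, chosen only for illustration.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1) for one feature; ratings has shape (n_subjects, n_raters).

    Assumption: two-way random effects, absolute agreement, single rater.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-subject means
    col_means = ratings.mean(axis=0)   # per-rater means
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((ratings - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # between-subject mean square
    msc = ss_cols / (k - 1)                 # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def stable_feature_mask(feats_a, feats_b, threshold=0.8):
    """feats_a, feats_b: (n_subjects, n_features) arrays from the two readers."""
    iccs = np.array([
        icc_2_1(np.column_stack([feats_a[:, j], feats_b[:, j]]))
        for j in range(feats_a.shape[1])
    ])
    return iccs > threshold, iccs
```

Features whose mask entry is `True` would move on to feature selection; the 0.8 threshold is the one quoted in the notes.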
2. feature selection
2.1 Dimensionality reduction (11 filter-based feature selection methods):
Univariate analysis (p < 0.1)
gini index (GINI)
relief (RELF)
information gain (IFGN)
gain ratio (GNRO)
Euclidean distance (EUDT)
F-ANOVA (FAOV)
t test-score (TSCR)
Wilcoxon rank sum (WLCR)
fisher score (FSCR)
Multivariate analysis
mutual information (MUIF)
MRMR
2.2 Implementation:
FS methods including GINI, RELF, IFGN, GNRO, and EUDT were performed by R software package “CORElearn” by the “attrEval” function.
FAOV and MUIF were conducted using the feature_selection module in sklearn (f_classif and mutual_info_classif), MRMR by the “pymrmr” package in Python.
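For the two sklearn-based methods, a minimal sketch of how the scores could be turned into rankings is given below. The data are synthetic (`make_classification`), not the paper's, and the variable names are illustrative assumptions; the CORElearn and pymrmr steps are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif

# Synthetic stand-in for a radiomics feature matrix (assumption, not the paper's data).
X, y = make_classification(n_samples=120, n_features=20, n_informative=5,
                           random_state=0)

# FAOV: one-way ANOVA F statistic (and p-value) for each feature.
f_scores, p_values = f_classif(X, y)

# MUIF: estimated mutual information between each feature and the label.
mi_scores = mutual_info_classif(X, y, random_state=0)

# Rank features from most to least relevant under each criterion.
faov_rank = np.argsort(f_scores)[::-1]
muif_rank = np.argsort(mi_scores)[::-1]
```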
Note that:
We selected features according to rankings in their own group instead of rankings among all features since this enabled a systematic description of different aspects of the hematomas and avoided selecting features from a certain feature group.
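Ranking within each feature group rather than across all features might look like the sketch below. The helper `top_k_per_group`, the group labels, and the value of `k` are hypothetical; the notes do not state how many features were taken from each group.

```python
import numpy as np

def top_k_per_group(scores, groups, k=2):
    """Return sorted indices of the k highest-scoring features within each group."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    selected = []
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)             # features belonging to group g
        best = idx[np.argsort(scores[idx])[::-1][:k]]  # top k inside this group only
        selected.extend(best.tolist())
    return sorted(selected)

# Toy example: six features in two groups (labels are illustrative).
scores = [0.9, 0.1, 0.5, 0.8, 0.7, 0.2]
groups = ["shape", "shape", "shape", "glcm", "glcm", "glcm"]
chosen = top_k_per_group(scores, groups, k=2)
print(chosen)  # [0, 2, 3, 4]
```

With a global ranking and the same total of four features, the result here would be the same by coincidence; the per-group rule matters when one group dominates the top of the global list.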
3. machine learning and evaluation of the model
The authors used eight supervised machine learning algorithms (through the sklearn package in Python):
neural network (NN)
decision tree (Decision Tree)
AdaBoost classifier (AD)
naïve Bayes (NB)
random forest (RF)
logistic regression (LG)
support vector machines (SVM)
k-nearest neighbors (KNN)
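One way to set up these eight classifiers in sklearn is sketched below. The specific estimator classes (for example `GaussianNB` for naïve Bayes and `MLPClassifier` for the neural network) and all hyperparameters are assumptions; the notes only say that sklearn was used.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Keys follow the abbreviations used in the notes; estimator choices are assumed.
classifiers = {
    "NN": MLPClassifier(max_iter=1000, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "AD": AdaBoostClassifier(random_state=0),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(random_state=0),
    "LG": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True, random_state=0),  # probabilities needed for ROC/AUC
    "KNN": KNeighborsClassifier(),
}
```

Pairing each entry with each of the 11 feature selection methods gives the 11 × 8 grid of models.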
This gives 88 (11 × 8) models in total. The researchers trained them with threefold cross-validation and evaluated model performance using AUC and RSD (relative standard deviation), where
RSD = (sd_AUC / mean_AUC) × 100
The lower the RSD value, the more stable the predicting model.
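The AUC and RSD evaluation of a single model could be sketched as follows. The data are synthetic, the helper name `rsd` is hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption, since the notes do not say which form the paper used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def rsd(auc_scores):
    """Relative standard deviation (%) of per-fold AUC values (ddof=0 assumed)."""
    auc_scores = np.asarray(auc_scores, dtype=float)
    return auc_scores.std() / auc_scores.mean() * 100

# Synthetic stand-in for the selected features (assumption, not the paper's data).
X, y = make_classification(n_samples=150, n_features=8, random_state=0)

# Threefold cross-validation, scoring each fold by ROC AUC.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
aucs = cross_val_score(AdaBoostClassifier(random_state=0), X, y,
                       cv=cv, scoring="roc_auc")

print(f"mean AUC = {aucs.mean():.3f}, RSD = {rsd(aucs):.2f}")
```

Repeating this for every feature selection and classifier pair would fill the two heatmaps (AUC and RSD) described in the results.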
4. Results
- Boxplot of ICC of features extracted from 6 feature groups.
- Heatmaps illustrating the predictive performance (AUC) of different combinations of feature selection methods (rows) and classification algorithms (columns).
(a) Cross-validated AUC values of 88 models on the train and validation datasets.
(b) RSD values of 88 models on the train and validation datasets.
- The RELF_Ada model showed the best performance.
(a) Illustration of the threefold cross-validated ROC curve of model RELF_Ada.
(b) ROC curve of RELF_Ada on the test dataset.
(c) Normalized confusion matrix of RELF_Ada.
- Comparison of prediction performance between the model and radiologists.
Finally, the authors also compared the 8 features included in the RELF_Ada model between the AVM-related group and the Other etiologies group.