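These notes are based on the radiomics tutorial video series by the Bilibili uploader 有Li.
This section (5) covers feature selection with the variance selection method.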
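For the programming problems that medical staff run into in radiomics research, Dr. Li suggests:
It is easier if you already have a foundation in one programming language.
Learn to speak first, then learn the grammar.
Learn in the order your own needs dictate, not in the order a textbook lays out.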
Variance selection method:
Question to consider: what should the variance of a feature that can be used for classification look like?
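One way to explore this question is with a small synthetic experiment. The sketch below uses made-up data (the names `feature_a` and `feature_b` are hypothetical, not from the tutorial): a feature that barely changes from sample to sample carries almost no information for separating classes, while a feature whose values shift with the class tends to have a larger overall variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 50 samples per class, two features.
labels = np.array([0] * 50 + [1] * 50)

# Feature A: almost constant for every sample, regardless of class.
feature_a = 5.0 + rng.normal(0.0, 1e-3, size=100)

# Feature B: its mean shifts with the class, so its overall variance is larger.
feature_b = np.where(labels == 1, 10.0, 0.0) + rng.normal(0.0, 1.0, size=100)

var_a = feature_a.var()  # population variance (ddof=0)
var_b = feature_b.var()

print(f"variance of A: {var_a:.6f}")
print(f"variance of B: {var_b:.2f}")
```

Note that this only argues in one direction: a near-zero variance makes a feature useless for classification, but a large variance alone does not guarantee that the feature separates the classes.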
Variance formula:
Formula.jpg
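The formula image is not reproduced in these notes. Assuming it showed the standard definition, the variance of one feature with values x_1, ..., x_n over n samples can be written as below. This is the population form (dividing by n), which is what NumPy's `var` computes by default; a sample-variance version would divide by n - 1 instead.

```latex
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```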
Code implementing dimensionality reduction with the variance selection method:
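import pandas as pd
import numpy as np
from sklearn.utils import shuffle
# Paths to the two feature tables, one Excel file per class
xlsx1_filePath = 'C:/Users/RONG/Desktop/PythonBasic/data_A.xlsx'
xlsx2_filePath = 'C:/Users/RONG/Desktop/PythonBasic/data_B.xlsx'
data_1 = pd.read_excel(xlsx1_filePath)
data_2 = pd.read_excel(xlsx2_filePath)
# Add a class label as the first column: 0 for data_A, 1 for data_B
rows_1, __ = data_1.shape
rows_2, __ = data_2.shape
data_1.insert(0, 'label', [0] * rows_1)
data_2.insert(0, 'label', [1] * rows_2)
# Stack both groups, shuffle the rows, and replace missing values with 0
data = pd.concat([data_1, data_2])
data = shuffle(data)
data = data.fillna(0)
# Note: columns[0:] keeps every column, including 'label'; use columns[1:] to exclude it
X = data[data.columns[0:]]
X.head()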
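The Excel files above are not included with these notes, so the following is a minimal, self-contained sketch of the same preparation steps on made-up data. The frames `demo_1` and `demo_2`, the column names, and the random seeds are all hypothetical stand-ins. Unlike the code above, this version keeps the label in a separate `y_demo` series so that only feature columns go into the selector.

```python
import numpy as np
import pandas as pd
from sklearn.utils import shuffle

rng = np.random.default_rng(42)

# Hypothetical stand-ins for data_A.xlsx and data_B.xlsx.
feature_names = ['feat_small', 'feat_medium', 'feat_large']
scales = np.array([0.01, 1.0, 1000.0])
demo_1 = pd.DataFrame(rng.normal(size=(6, 3)) * scales, columns=feature_names)
demo_2 = pd.DataFrame(rng.normal(size=(4, 3)) * scales, columns=feature_names)
demo_2.iloc[0, 1] = np.nan  # simulate a missing value

# Label each group, as in the notes: 0 for the first table, 1 for the second.
demo_1.insert(0, 'label', [0] * len(demo_1))
demo_2.insert(0, 'label', [1] * len(demo_2))

# Stack, shuffle reproducibly, and fill missing values with 0.
demo = pd.concat([demo_1, demo_2], ignore_index=True)
demo = shuffle(demo, random_state=0)
demo = demo.fillna(0)

# Separate the label from the feature matrix.
y_demo = demo['label']
X_demo = demo.drop(columns='label')
print(X_demo.shape, y_demo.value_counts().to_dict())
```

Passing `random_state` to `shuffle` is optional; it is used here only so that repeated runs give the same row order.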
Variance selection
# VarianceSelection
from sklearn.feature_selection import VarianceThreshold
selector = VarianceThreshold(1e10) # adjust this threshold to control which features are kept
selector.fit_transform(X)
# print('EveryVaris:'+str(selector.variances_))
print('selectedFeatureIndex:'+str(selector.get_support(True)))
print('selectedFeatureNameis:'+str(X.columns[selector.get_support(True)]))
# print('excludedFeatureNameis:'+str(X.columns[~ selector.get_support()])) # '~' inverts the boolean mask
Output:
# selectedFeatureIndex:[17 30 34 92]
# selectedFeatureNameis:Index(['original_firstorder_Energy', 'original_firstorder_TotalEnergy',
# 'original_glcm_ClusterProminence',
# 'original_glszm_LargeAreaHighGrayLevelEmphasis'],
# dtype='object')
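The threshold is compared against the raw, unscaled variance of each column, so features with numerically large values pass a threshold like 1e10 easily. The sketch below illustrates this on a made-up table (`X_toy` and its column names are hypothetical): it first inspects the per-column variances with pandas, then applies the same threshold as above.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)

# Hypothetical feature table: three columns on very different scales.
X_toy = pd.DataFrame({
    'feat_constant': np.full(200, 3.0),
    'feat_unit': rng.normal(0.0, 1.0, size=200),
    'feat_huge': rng.normal(0.0, 1e6, size=200),
})

# Inspect the population variance of each column before choosing a threshold.
variances = X_toy.var(ddof=0)
print(variances.sort_values(ascending=False))

# Keep only the columns whose variance exceeds the threshold.
selector_toy = VarianceThreshold(threshold=1e10)
X_toy_selected = selector_toy.fit_transform(X_toy)

mask = selector_toy.get_support()  # boolean mask over the columns
kept = list(X_toy.columns[mask])
dropped = list(X_toy.columns[~mask])
print('kept:', kept)
print('dropped:', dropped)
```

Looking at the variances first makes it easier to pick a threshold deliberately. If no column exceeds the threshold, `fit` raises a `ValueError`, so an overly large value fails loudly rather than returning an empty table.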