Note: for importing data, see "Python Learning: Data Import".
1. Read the data (X: independent variables, Y: dependent variable)
# Import packages
import numpy as np  # arrays and matrices
import matplotlib.pyplot as plt  # plotting and visualization
import pandas as pd  # data loading and preprocessing
# Import the dataset
datasets = pd.read_csv('Data.csv')
# Options for handling missing data: 1. the max/min value, 2. the mean, 3. deletion
X = datasets.iloc[:, :-1].values  # independent variables: every column except the last
Y = datasets.iloc[:, 3].values  # dependent variable: the column at index 3
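# Data preprocessing: fill in missing values
# Note: Imputer was removed in scikit-learn 0.22; on newer versions use sklearn.impute.SimpleImputer
from sklearn.preprocessing import Imputer
# How to handle the data: replace each missing value with the column mean
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
# Which data to handle: columns 1 and 2 of X
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
# View the data after the missing values have been filled
X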
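On scikit-learn 0.22 and later the import above fails, because the Imputer class no longer exists. The following is a minimal sketch of the same column-mean fill using SimpleImputer, which always works column by column and has no axis parameter. The small array is a made-up stand-in for the X read from Data.csv; its values are assumptions for illustration only.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for the feature matrix X read from Data.csv:
# column 0 is a category label, columns 1 and 2 are numeric with gaps.
X = np.array([
    ["France", 44.0, 72000.0],
    ["Spain", 27.0, np.nan],
    ["Germany", np.nan, 54000.0],
    ["Spain", 38.0, 61000.0],
], dtype=object)

# Replace each NaN with the mean of its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X[:, 1:3] = imputer.fit_transform(X[:, 1:3].astype(float))

# Confirm that no missing values remain in the numeric columns.
has_missing = np.isnan(X[:, 1:3].astype(float)).any()
print(X)
print("missing values left:", has_missing)
```

Here fit_transform combines the separate fit and transform calls used above; keeping them separate is still useful when the same fitted imputer must later be applied to new data.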
Explanation of "imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)":
NaN: marks missing data
strategy: how missing data is handled; here, the mean:
If “mean”, then replace missing values using the mean along the axis.
If “median”, then replace missing values using the median along the axis.
If “most_frequent”, then replace missing using the most frequent value along the axis.
axis:
If axis=0, then impute along columns. If axis=1, then impute along rows.
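To make the axis setting concrete, here is a small sketch that reproduces the idea with plain NumPy rather than with Imputer itself. The array values are invented for illustration, and np.nanmean is used as a stand-in for the "mean" strategy.

```python
import numpy as np

# Made-up data with one gap in each column.
data = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

# axis=0: one mean per column, ignoring NaN.
col_means = np.nanmean(data, axis=0)
# axis=1: one mean per row, ignoring NaN.
row_means = np.nanmean(data, axis=1)

# Fill each gap with the mean of its column (the axis=0 behaviour).
filled_by_column = np.where(np.isnan(data), col_means, data)
# Fill each gap with the mean of its row (the axis=1 behaviour).
filled_by_row = np.where(np.isnan(data), row_means[:, None], data)

print(filled_by_column)
print(filled_by_row)
```

For tabular data where each column is one feature, filling along columns is usually the sensible choice, since a row mean would mix unrelated quantities such as age and salary.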
2. View the data after the missing values have been filled