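Dataset link: http://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
A description of the dataset is available on that page, so I won't repeat it here.
I used linear regression, a decision tree, and a random forest to predict the rental count, and compared which model predicts more accurately.
When using the random forest, if training time is not a big concern, you can set n_estimators somewhat higher; anything up to about 200 is fine. Accuracy grows roughly like a logarithmic function of the number of trees, so the gains are large at first and then level off.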
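To make the point about n_estimators concrete, here is a minimal sketch that fits forests of increasing size and prints the test MSE for each. It runs on synthetic data from scikit-learn's make_friedman1 rather than on hour.csv; the sample size, noise level, and random_state values are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; this is not the bike-sharing dataset).
X, y = make_friedman1(n_samples=600, n_features=10, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit one forest per tree count and record its test MSE.
mse_by_trees = {}
for n_trees in (1, 5, 20, 50, 100, 200):
    model = RandomForestRegressor(
        n_estimators=n_trees, min_samples_leaf=2, random_state=0
    )
    model.fit(X_train, y_train)
    mse_by_trees[n_trees] = mean_squared_error(y_test, model.predict(X_test))

for n_trees, mse in mse_by_trees.items():
    print(f"n_estimators={n_trees:>3}  test MSE={mse:.3f}")
```

On data like this, the error usually drops quickly over the first few dozen trees and changes little after that, while training time keeps growing linearly with the number of trees.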
Code:
Reading the CSV file (saved as read_file.py, which the next script imports)
import pandas as pd
import matplotlib.pyplot as plt
bike_rentals=pd.read_csv('./data/hour.csv')
#plt.hist(bike_rentals['cnt'])
#plt.show()
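# numeric_only=True skips the date string column 'dteday' (needed on pandas >= 2.0)
cnt_correlations=bike_rentals.corr(numeric_only=True)['cnt']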
print("\n Reading success! cnt-correlations:\n")
print(cnt_correlations)
Preparing the data, building the models, and predicting
import read_file
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
bike_rentals=read_file.bike_rentals
# Bucket the 'hr' column into four parts of the day
def assign_label(hour):
    if hour >=0 and hour < 6:
        return 4
    elif hour >=6 and hour < 12:
        return 1
    elif hour >= 12 and hour < 18:
        return 2
    elif hour >= 18 and hour <=24:
        return 3
bike_rentals['time_labels']=bike_rentals['hr'].apply(assign_label)
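# Split the data: a random 80% for training, the remaining 20% for testing
train=bike_rentals.sample(frac=.8)
# .copy() lets us add a 'predictions' column later without a SettingWithCopyWarning
test=bike_rentals.loc[~bike_rentals.index.isin(train.index)].copy()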
# Remove the target ('cnt'), the columns that sum to it ('casual', 'registered'), and the date string
columns=list(bike_rentals.columns)
columns.remove('cnt')
columns.remove('casual')
columns.remove('dteday')
columns.remove('registered')
print("\n===========>>>>>>Predicting:\n")
# Predict the target column, using MSE as the metric.
#LinearRegression
model=LinearRegression()
model.fit(train[columns],train['cnt'])
predictions=model.predict(test[columns])
mse=mean_squared_error(test['cnt'],predictions)
print("MSE using LinearRegression: ",end='')
print(mse,'\n')
#DecisionTreeRegression
model=DecisionTreeRegressor(min_samples_leaf=5)
model.fit(train[columns],train['cnt'])
predictions=model.predict(test[columns])
mse=mean_squared_error(test['cnt'],predictions)
print("MSE using DecisionTreeRegression: ",end='')
print(mse,'\n')
#RandomForestRegression
model=RandomForestRegressor(n_estimators=50,min_samples_leaf=2)
model.fit(train[columns],train['cnt'])
predictions=model.predict(test[columns])
mse=mean_squared_error(test['cnt'],predictions)
test['predictions']=predictions
print("MSE using RandomForestRegression: ",end='')
print(mse,'\n')
print(test.iloc[:10][['cnt','predictions']])
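The three fit/predict/score steps above follow the same pattern, so they can also be written as one loop over a dictionary of models. The sketch below is one possible arrangement, not the original script: it uses synthetic data from make_friedman1 in place of hour.csv, and train_test_split with a fixed random_state so that repeated runs give the same split. The data parameters and seeds are assumptions for illustration; the model settings mirror the ones used above.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; this is not the bike-sharing dataset).
X, y = make_friedman1(n_samples=600, n_features=10, noise=1.0, random_state=0)

# A fixed random_state makes the 80/20 split reproducible between runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(
        min_samples_leaf=5, random_state=0
    ),
    "RandomForestRegressor": RandomForestRegressor(
        n_estimators=50, min_samples_leaf=2, random_state=0
    ),
}

# Fit each model on the same split and score it with the same metric.
mse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse_by_model[name] = mean_squared_error(y_test, model.predict(X_test))
    print(f"MSE using {name}: {mse_by_model[name]:.3f}")
```

Because every model sees the same split, the printed MSE values are directly comparable, and each label is generated from the dictionary key, so it cannot drift out of sync with the model it describes.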
Results: