Reference Kaggle notebook:
keras
I. Problem
1. Project title: Walmart sales forecasting
Forecast Walmart's unit sales for the next 28 days.
2. Evaluation metric: RMSSE (Root Mean Squared Scaled Error)
[Figure: RMSSE formula (RMSSE.png)]
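The formula image is not reproduced here. As a reconstruction based on the commonly published RMSSE definition (an assumption, since the original image is unavailable), the metric for a single series can be written as:

```latex
\mathrm{RMSSE} = \sqrt{
  \frac{\dfrac{1}{h} \sum_{t=n+1}^{n+h} \left( Y_t - \hat{Y}_t \right)^2}
       {\dfrac{1}{n-1} \sum_{t=2}^{n} \left( Y_t - Y_{t-1} \right)^2}
}
```

The numerator is the mean squared forecast error over the horizon. The denominator is the mean squared error of a one-step naive forecast on the training period, which makes the score comparable across series with different sales volumes.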
n is the training sample size (40341), h is the forecast horizon (28 days), Y_t is the actual sales value, and Ŷ_t is the predicted sales value.
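To make the metric concrete, here is a minimal sketch of RMSSE for one series, assuming the usual definition: the mean squared forecast error over the h forecast days, divided by the mean squared day-to-day change over the n training days, then square-rooted. The function name `rmsse` and the toy numbers are illustrative and not from the original notebook. The sketch also does not trim leading zero-sales periods, which a competition scorer may do.

```python
import numpy as np

def rmsse(train, actual, forecast):
    """RMSSE for a single series (illustrative helper, not the official scorer).

    train    : the n historical observations Y_1..Y_n
    actual   : the h true values Y_{n+1}..Y_{n+h}
    forecast : the h predicted values
    """
    train = np.asarray(train, dtype=float)
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Numerator: mean squared error over the forecast horizon
    mse = np.mean((actual - forecast) ** 2)
    # Denominator: mean of the n-1 squared one-step differences in training
    scale = np.mean(np.diff(train) ** 2)
    return np.sqrt(mse / scale)

# Toy example: 4 training days, 2 forecast days
train = [0, 2, 1, 3]
actual = [2, 2]
forecast = [1, 3]
score = rmsse(train, actual, forecast)
print(round(score, 4))
```

A score below 1 means the forecast beats the naive "tomorrow equals today" baseline measured on the training period.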
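3. Data description
The data covers 3,049 products in 3 categories and 7 departments, sold in 10 stores across 3 states.
sales_train.csv: the main training set. It contains daily unit sales for each item over the 1,941 days from 2011-01-29 to 2016-05-22 (excluding the 28 days up to 2016-06-19), along with the item ID, department, category, store, and state.
sell_prices.csv: the weekly average price of each item in each store.
calendar.csv: the day of the week, month, and year for each date, plus whether the state allows purchases with food stamps (a subsidy for low-income households) on that date.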
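The three files are usually combined into one long table before modeling. The following is a rough sketch of that join on toy stand-ins for the files. The column names (`d`, `wm_yr_wk`, `sell_price`, and so on) are assumptions based on the public M5 data layout and are not shown in this write-up, and the toy values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the three files (assumed column names, made-up values)
sales = pd.DataFrame({
    "id": ["FOODS_1_001_CA_1_validation"],
    "item_id": ["FOODS_1_001"],
    "store_id": ["CA_1"],
    "d_1": [3],
    "d_2": [0],
})
calendar = pd.DataFrame({
    "d": ["d_1", "d_2"],
    "date": ["2011-01-29", "2011-01-30"],
    "wm_yr_wk": [11101, 11101],
})
prices = pd.DataFrame({
    "store_id": ["CA_1"],
    "item_id": ["FOODS_1_001"],
    "wm_yr_wk": [11101],
    "sell_price": [2.0],
})

# Wide to long: one row per (item, day) instead of one column per day
long_df = sales.melt(
    id_vars=["id", "item_id", "store_id"],
    var_name="d",
    value_name="sales",
)
# Attach calendar information through the day key
long_df = long_df.merge(calendar, on="d", how="left")
# Attach the weekly price through store, item, and week
long_df = long_df.merge(
    prices, on=["store_id", "item_id", "wm_yr_wk"], how="left"
)
print(long_df)
```

On the real data the long table is very large, which is one reason the memory reduction step later in this post matters.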
II. Main text
1. Loading the data
# Import libraries
import pandas as pd
import seaborn as sns
import lightgbm as lgb
import numpy as np
# Import data
calendar = pd.read_csv('calendar.csv')
sample_submission = pd.read_csv('sample_submission.csv')
sales_train_validation = pd.read_csv('sales_train_validation.csv')
sell_prices = pd.read_csv('sell_prices.csv')
# Reduce memory usage by downcasting numeric columns
def reduce_mem_usage(df, verbose=True):
    numerics = ["int16", "int32", "int64", "float16", "float32", "float64"]
    start_mem = df.memory_usage().sum() / 1024 ** 2
    for col in df.columns:
        col_type = df[col].dtypes
        if col_type in numerics:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == "int":
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)
            else:
                if (
                    c_min > np.finfo(np.float16).min
                    and c_max < np.finfo(np.float16).max
                ):
                    df[col] = df[col].astype(np.float16)
                elif (
                    c_min > np.finfo(np.float32).min
                    and c_max < np.finfo(np.float32).max
                ):
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
    end_mem = df.memory_usage().sum() / 1024 ** 2
    if verbose:
        print(
            "Mem. usage decreased to {:5.2f} Mb ({:.1f}% reduction)".format(
                end_mem, 100 * (start_mem - end_mem) / start_mem
            )
        )
    return df
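The function above saves memory by storing each numeric column in the smallest type whose range covers its values. As a small illustration of the same idea, the sketch below downcasts a toy DataFrame with `pd.to_numeric(..., downcast=...)`. This is a swapped-in pandas alternative, not the function used in this post, and the toy column names and values are made up. It also shows one caveat of the float16 branch: float16 has limited precision, so a range check alone does not guarantee the values survive unchanged.

```python
import numpy as np
import pandas as pd

# Toy frame with 64-bit columns (made-up values)
toy = pd.DataFrame({
    "units": np.array([0, 3, 120], dtype=np.int64),
    "price": np.array([0.5, 9.5, 2049.0], dtype=np.float64),
})
before = toy.memory_usage(index=False).sum()

# Downcast each column to the smallest type pandas considers safe
small = toy.copy()
small["units"] = pd.to_numeric(small["units"], downcast="integer")
small["price"] = pd.to_numeric(small["price"], downcast="float")
after = small.memory_usage(index=False).sum()

print(small.dtypes)
print(before, "bytes ->", after, "bytes")

# Caveat: 2049.0 is within float16's range but not exactly representable
lossy = np.float16(2049.0)
print(float(lossy))
```

Note that `pd.to_numeric` stops at float32 for floats, whereas the function in this post can go down to float16, which saves more memory at the cost of precision.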
# Reduce the memory footprint of each DataFrame
print("Memory usage before reduction:", sell_prices.memory_usage().sum() / (1024 ** 2), "MB")
calendar = reduce_mem_usage(calendar)
sample_submission = reduce_mem_usage(sample_submission)
sales_train_validation = reduce_mem_usage(sales_train_validation)
sell_prices = reduce_mem_usage(sell_prices)
print("Memory usage after reduction:", sell_prices.memory_usage().sum() / (1024 ** 2), "MB")
sales_train_validation.head()
calendar.head()