1. Basic Principles
DeepFM is another deep learning model widely applied to click-through-rate (CTR) prediction. It focuses on learning the feature interactions underlying user behavior, with the goal of maximizing the recommender system's CTR.
DeepFM is a neural network framework that integrates an FM (factorization machine) and a DNN, which act as its wide and deep components respectively. The FM and DNN parts of DeepFM share the same input, which improves training efficiency and removes the need for extra feature engineering: the FM models low-order feature interactions while the DNN models high-order ones, so the model learns both low- and high-order feature interactions from the raw features simultaneously.
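The two parts are combined at the output unit, as in the original paper [1]:

$$\hat{y} = \operatorname{sigmoid}\left(y_{FM} + y_{DNN}\right)$$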
1.1 FM Structure
The FM consists of a first-order term and a second-order term: the first-order term multiplies each feature by its corresponding weight, while the second-order term forms all pairwise feature combinations and learns a corresponding latent vector for each feature.
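In standard notation, the FM prediction is

$$\hat{y}_{FM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$$

where $v_i \in \mathbb{R}^K$ is the latent vector associated with feature $i$.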
To implement the second-order part more conveniently, it can be rewritten as the difference between a square-of-sum part and a sum-of-squares part:
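$$\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i, v_j\rangle\, x_i x_j = \frac{1}{2}\sum_{f=1}^{K}\left[\left(\sum_{i=1}^{n} v_{i,f}\, x_i\right)^{2} - \sum_{i=1}^{n} v_{i,f}^{2}\, x_i^{2}\right]$$

This identity reduces the cost of the pairwise term from $O(Kn^2)$ to $O(Kn)$.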
1.2 DNN Structure
The deep part is a feed-forward neural network. The raw feature input for CTR prediction is highly sparse and extremely high-dimensional, mixes categorical and continuous data, and is grouped by field. An embedding layer is therefore introduced before the first hidden layer to compress the input vector into a low-dimensional dense vector.
The embedding layer has two interesting properties: 1) although the inputs of different fields have different lengths, their embeddings all share the same length K; 2) the latent vectors learned in the FM now serve as the weights of this embedding layer.
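A minimal NumPy sketch of the lookup, with made-up sizes (illustrative only, not the article's code), makes the two properties concrete:

import numpy as np

K = 4                                    # embedding size
V = np.random.randn(10, K)               # FM latent matrix: one K-dim vector per feature id
feat_index = np.array([0, 3, 7])         # one active feature id per field (3 fields here)
feat_value = np.array([1.0, 1.0, 0.5])   # 1 for one-hot categories, raw value for numerics

# Every field yields a K-dim vector regardless of its vocabulary size,
# and the FM latent vectors double as the embedding weights.
emb = V[feat_index] * feat_value[:, None]    # shape (3, K)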
2. Algorithm in Practice
We again use the data from Porto Seguro's Safe Driver Prediction; see the previous installment, Recommender Systems (5): Recommendation Models Based on Deep Learning, for a description of the data. The data loading and preprocessing are the same as before, reproduced below.
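For completeness, the code assumes the following imports (the same as in the previous installment; TensorFlow 1.x is assumed):

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.base import BaseEstimator, TransformerMixin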
TRAIN_FILE = "Driver_Prediction_Data/train.csv"
TEST_FILE = "Driver_Prediction_Data/test.csv"
NUMERIC_COLS = [
"ps_reg_01", "ps_reg_02", "ps_reg_03",
"ps_car_12", "ps_car_13", "ps_car_14", "ps_car_15"
]
IGNORE_COLS = [
"id", "target",
"ps_calc_01", "ps_calc_02", "ps_calc_03", "ps_calc_04",
"ps_calc_05", "ps_calc_06", "ps_calc_07", "ps_calc_08",
"ps_calc_09", "ps_calc_10", "ps_calc_11", "ps_calc_12",
"ps_calc_13", "ps_calc_14",
"ps_calc_15_bin", "ps_calc_16_bin", "ps_calc_17_bin",
"ps_calc_18_bin", "ps_calc_19_bin", "ps_calc_20_bin"
]
train_data = pd.read_csv(TRAIN_FILE)
test_data = pd.read_csv(TEST_FILE)
data = pd.concat([train_data,test_data])
feature_dict = {}
total_feature = 0
for col in data.columns:
    if col in IGNORE_COLS:
        continue
    elif col in NUMERIC_COLS:
        feature_dict[col] = total_feature
        total_feature += 1
    else:
        unique_val = data[col].unique()
        feature_dict[col] = dict(zip(unique_val, range(total_feature, len(unique_val) + total_feature)))
        total_feature += len(unique_val)
# convert train dataset
train_y = train_data[['target']].values.tolist()
train_data.drop(['target','id'],axis=1, inplace=True)
train_feature_index = train_data.copy()
train_feature_value = train_data.copy()
for col in train_feature_index.columns:
    if col in IGNORE_COLS:
        train_feature_index.drop(col, axis=1, inplace=True)
        train_feature_value.drop(col, axis=1, inplace=True)
        continue
    elif col in NUMERIC_COLS:
        train_feature_index[col] = feature_dict[col]
    else:
        train_feature_index[col] = train_feature_index[col].map(feature_dict[col])
        train_feature_value[col] = 1
field_size = train_feature_value.shape[1]
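As a toy illustration of this index/value encoding (the column names here are hypothetical, not from the dataset): a numeric column keeps its raw value under a single feature index, while each categorical value gets its own index with value 1.

import pandas as pd

toy = pd.DataFrame({"num_a": [0.7, 0.3], "cat_b": ["x", "y"]})
# The scheme above would produce: feature_dict == {"num_a": 0, "cat_b": {"x": 1, "y": 2}}
# Row 1 (num_a=0.3, cat_b="y") is encoded as indices [0, 2] and values [0.3, 1].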
The package imports are exactly the same as in the previous installment, and the class definition is essentially the same as well; the only difference is the addition of the two flags self.use_fm and self.use_deep.
class DeepFM(BaseEstimator, TransformerMixin):
    def __init__(self, feature_size=100, embedding_size=8, deep_layers=[32, 32], batch_size=256,
                 learning_rate=0.001, optimizer='adam', random_seed=2020, use_fm=True,
                 use_deep=True, loss_type='logloss', l2_reg=0.0, field_size=39):
        self.feature_size = feature_size
        self.field_size = field_size
        self.embedding_size = embedding_size
        self.deep_layers = deep_layers
        self.deep_layers_activation = tf.nn.relu
        self.use_fm = use_fm
        self.use_deep = use_deep
        self.l2_reg = l2_reg
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.optimizer_type = optimizer
        self.random_seed = random_seed
        self.loss_type = loss_type
        self.train_result, self.valid_result = [], []
        self.max_iteration = 200
        self._init_graph()
Initialize the weights, including the embedding layer, the deep layers, and the final concat projection layer.
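The deep layers use Glorot (Xavier) initialization, i.e. weights drawn from a normal distribution with standard deviation

$$\sigma = \sqrt{\frac{2}{n_{in} + n_{out}}}$$

which is what the glorot variable below computes.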
    def _initialize_weights(self):
        weights = dict()
        # embeddings layer
        weights['feature_embeddings'] = tf.Variable(
            tf.random_normal([self.feature_size, self.embedding_size], 0.0, 0.01), name='feature_embeddings')
        weights['feature_bias'] = tf.Variable(
            tf.random_normal([self.feature_size, 1], 0.0, 1.0), name='feature_bias')
        # deep layers
        input_size = self.field_size * self.embedding_size
        glorot = np.sqrt(2.0 / (input_size + self.deep_layers[0]))
        weights['layer_0'] = tf.Variable(
            np.random.normal(loc=0, scale=glorot, size=(input_size, self.deep_layers[0])), dtype=np.float32)
        weights['bias_0'] = tf.Variable(
            np.random.normal(loc=0, scale=glorot, size=(1, self.deep_layers[0])), dtype=np.float32)
        glorot = np.sqrt(2.0 / (self.deep_layers[0] + self.deep_layers[1]))
        weights['layer_1'] = tf.Variable(
            np.random.normal(loc=0, scale=glorot, size=(self.deep_layers[0], self.deep_layers[1])), dtype=np.float32)
        weights['bias_1'] = tf.Variable(
            np.random.normal(loc=0, scale=glorot, size=(1, self.deep_layers[1])), dtype=np.float32)
        # final concat projection layer
        if self.use_fm and self.use_deep:
            input_size = self.field_size + self.embedding_size + self.deep_layers[1]
        elif self.use_fm:
            input_size = self.field_size + self.embedding_size
        elif self.use_deep:
            input_size = self.deep_layers[1]
        glorot = np.sqrt(2.0 / (input_size + 1))
        weights['concat_projection'] = tf.Variable(
            np.random.normal(loc=0, scale=glorot, size=(input_size, 1)), dtype=np.float32)
        weights['concat_bias'] = tf.Variable(tf.constant(0.01), dtype=np.float32)
        return weights
Build the graph, whose structure follows the theory section; the FM part is split into the first-order and second-order terms, which are computed separately.
    def _init_graph(self):
        self.graph = tf.Graph()
        with self.graph.as_default():
            tf.set_random_seed(self.random_seed)
            self.feat_index = tf.placeholder(tf.int32, shape=[None, None], name='feat_index')
            self.feat_value = tf.placeholder(tf.float32, shape=[None, None], name='feat_value')
            self.label = tf.placeholder(tf.float32, shape=[None, 1], name='label')
            self.dropout_keep_deep = tf.placeholder(tf.float32, shape=[None], name='dropout_keep_deep')
            self.train_phase = tf.placeholder(tf.bool, name='train_phase')
            self.weights = self._initialize_weights()
            # model: look up each field's embedding and scale it by the feature value
            self.embeddings = tf.nn.embedding_lookup(self.weights['feature_embeddings'], self.feat_index)
            feat_value = tf.reshape(self.feat_value, shape=[-1, self.field_size, 1])
            self.embeddings = tf.multiply(self.embeddings, feat_value)
            # first order term
            self.y_first_order = tf.nn.embedding_lookup(self.weights['feature_bias'], self.feat_index)
            self.y_first_order = tf.reduce_sum(tf.multiply(self.y_first_order, feat_value), 2)
            # second order term: square-of-sum part
            self.summed_features_emb = tf.reduce_sum(self.embeddings, 1)  # None * K
            self.summed_features_emb_square = tf.square(self.summed_features_emb)  # None * K
            # sum-of-squares part
            self.squared_features_emb = tf.square(self.embeddings)
            self.squared_sum_features_emb = tf.reduce_sum(self.squared_features_emb, 1)  # None * K
            # second order
            self.y_second_order = 0.5 * tf.subtract(self.summed_features_emb_square, self.squared_sum_features_emb)
            # deep component
            self.y_deep = tf.reshape(self.embeddings, shape=[-1, self.field_size * self.embedding_size])
            self.y_deep = tf.add(tf.matmul(self.y_deep, self.weights['layer_0']), self.weights['bias_0'])
            self.y_deep = self.deep_layers_activation(self.y_deep)
            self.y_deep = tf.add(tf.matmul(self.y_deep, self.weights['layer_1']), self.weights['bias_1'])
            self.y_deep = self.deep_layers_activation(self.y_deep)
            # -----DeepFM-----
            self.concat_input = tf.concat([self.y_first_order, self.y_second_order, self.y_deep], axis=1)
            self.out = tf.add(tf.matmul(self.concat_input, self.weights['concat_projection']), self.weights['concat_bias'])
Several loss functions and optimization methods are provided to choose from, and the model is also saved and its graph logged.
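For reference, the 'logloss' option is the standard binary cross-entropy computed by tf.losses.log_loss:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\right]$$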
            # loss
            if self.loss_type == 'logloss':
                self.out = tf.nn.sigmoid(self.out)
                self.loss = tf.losses.log_loss(self.label, self.out)
            elif self.loss_type == 'mse':
                self.loss = tf.nn.l2_loss(tf.subtract(self.label, self.out))
            # l2 regularization on weights
            if self.l2_reg > 0:
                self.loss += tf.contrib.layers.l2_regularizer(self.l2_reg)(self.weights['concat_projection'])
                if self.use_deep:
                    self.loss += tf.contrib.layers.l2_regularizer(self.l2_reg)(self.weights['layer_0'])
                    self.loss += tf.contrib.layers.l2_regularizer(self.l2_reg)(self.weights['layer_1'])
            # optimizer
            if self.optimizer_type == 'adam':
                self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate, beta1=0.9,
                                                        beta2=0.999, epsilon=1e-8).minimize(self.loss)
            elif self.optimizer_type == 'adagrad':
                # AdagradOptimizer (not AdagradDAOptimizer, which would additionally require a global_step)
                self.optimizer = tf.train.AdagradOptimizer(learning_rate=self.learning_rate,
                                                           initial_accumulator_value=1e-8).minimize(self.loss)
            elif self.optimizer_type == 'gd':
                self.optimizer = tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate).minimize(self.loss)
            elif self.optimizer_type == 'momentum':
                self.optimizer = tf.train.MomentumOptimizer(learning_rate=self.learning_rate,
                                                            momentum=0.95).minimize(self.loss)
            # init, save a checkpoint, and write the graph for TensorBoard
            self.saver = tf.train.Saver()
            save_path = "deepfm/model.ckpt"
            init = tf.global_variables_initializer()
            self.sess = tf.Session()
            self.sess.run(init)
            save_path = self.saver.save(self.sess, save_path, global_step=1)
            writer = tf.summary.FileWriter("D:/logs/deepfm/", tf.get_default_graph())
            writer.close()
From the saved graph definition, the network structure can be inspected with TensorBoard.
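For example, pointing TensorBoard at the log directory written above:

tensorboard --logdir D:/logs/deepfm/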
Finally, define the training function, then instantiate the class and run training.
    def train(self, train_feature_index, train_feature_value, train_y):
        with tf.Session(graph=self.graph) as sess:
            sess.run(tf.global_variables_initializer())
            for i in range(self.max_iteration):
                epoch_loss, _ = sess.run([self.loss, self.optimizer],
                                         feed_dict={self.feat_index: train_feature_index,
                                                    self.feat_value: train_feature_value,
                                                    self.label: train_y})
                print('epoch %s, loss is %s' % (str(i), str(epoch_loss)))

deepfm = DeepFM(feature_size=total_feature, field_size=field_size, embedding_size=8)
deepfm.train(train_feature_index, train_feature_value, train_y)
The training results are as follows:
epoch 0, loss is 1.2481447
epoch 1, loss is 1.2091292
epoch 2, loss is 1.1717732
epoch 3, loss is 1.1359197
epoch 4, loss is 1.1009481
epoch 5, loss is 1.0666977
···
epoch 195, loss is 0.15519407
epoch 196, loss is 0.15500054
epoch 197, loss is 0.15480281
epoch 198, loss is 0.15460847
epoch 199, loss is 0.15441668
References
[1] Guo, Huifeng, Tang, Ruiming, Ye, Yunming, Li, Zhenguo, & He, Xiuqiang. (2017). DeepFM: A Factorization-Machine Based Neural Network for CTR Prediction.
[2] 黃昕 et al. 推薦系統(tǒng)與深度學習 (Recommender Systems and Deep Learning). Tsinghua University Press, 2019.
[3] https://github.com/wangby511/Recommendation_System
Spring, and I think of Jiangnan,
the Jiangnan of the Tang poems, where at nine
I picked mulberry leaves and chased dragonflies
—— Yu Guangzhong, "Spring, I Suddenly Remember"