Recommender Systems
# install.packages('recommenderlab')
require(recommenderlab)
data(MovieLense)
MovieLense
# Ratings from 943 users on 1664 movies
# Look at one user's ratings for each movie:
# convert the first row to a list and take the first 10 entries
as(MovieLense[1, ], 'list')[[1]][1:10]
# Ratings by users 1:100 for movies 1:100
# Some users rate many movies, while others rate few or none
image(MovieLense[1:100, 1:100])
hist(rowCounts(MovieLense))
hist(colCounts(MovieLense))
mean(rowMeans(MovieLense))
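recommenderlab
For a realRatingMatrix, recommenderlab provides six different recommendation methods in the version used here,
e.g. RANDOM (random recommendation), POPULAR (popularity-based recommendation), and IBCF (item-based collaborative filtering)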
recommenderRegistry$get_entries(dataType = 'realRatingMatrix')
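Taking IBCF_realRatingMatrix as an example, its parameters are:
k: how many of the most similar items to keep
method: the similarity measure; cosine similarity by default
normalize: which normalization method to apply
normalize_sim_matrix: whether to normalize the similarity matrix
alpha:
na_as_zero: whether to treat NA as 0
minRating: the minimum rating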
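To make the roles of `method` and `k` more concrete, here is a minimal base-R sketch of item-to-item cosine similarity on a made-up rating matrix. The data, the helper `cosine_sim`, and the choice to compare only users who rated both items are assumptions for illustration; recommenderlab's internal handling of missing values may differ.

```r
# Toy user-by-item rating matrix (made-up values; NA = not rated)
ratings <- matrix(
  c(5,  3,  NA,
    4,  NA, 2,
    1,  1,  5,
    NA, 2,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), paste0("i", 1:3))
)

# Cosine similarity of two item columns, using only users who rated both
# (hypothetical helper, not part of recommenderlab)
cosine_sim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (!any(both)) return(NA_real_)
  sum(x[both] * y[both]) / (sqrt(sum(x[both]^2)) * sqrt(sum(y[both]^2)))
}

n_items <- ncol(ratings)
sim <- matrix(NA_real_, n_items, n_items,
              dimnames = list(colnames(ratings), colnames(ratings)))
for (a in seq_len(n_items)) {
  for (b in seq_len(n_items)) {
    if (a != b) sim[a, b] <- cosine_sim(ratings[, a], ratings[, b])
  }
}

# Keep only the k most similar items for each item (the role of `k`)
k <- 1
top_k <- lapply(rownames(sim), function(item) {
  s <- sort(sim[item, ], decreasing = TRUE, na.last = NA)  # drops the NA diagonal
  names(s)[seq_len(min(k, length(s)))]
})
names(top_k) <- rownames(sim)
top_k
```

With `k = 1`, each item keeps a single neighbour; a larger `k` keeps more of each row of the similarity matrix and therefore lets more items contribute to a predicted rating.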
Recommendation model
ml.recomModel = Recommender(MovieLense[1:800], method = 'IBCF')
# Top-N recommendations: the 5 best items for users 805 to 807
ml.predict1 = predict(ml.recomModel, MovieLense[805:807], n = 5)
ml.predict1
as(ml.predict1, 'list')
# Predicted ratings
ml.predict2 = predict(ml.recomModel, MovieLense[805:807], type = 'ratings')
ml.predict2
as(ml.predict2, 'matrix')[1:3, 1:6]
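The two prediction types are closely related: a top-N list can be read as the N highest entries of a user's predicted ratings. The base-R sketch below illustrates that idea on made-up numbers. The vector `pred` and the helper `top_n` are hypothetical, and it assumes that items the user has already rated appear as NA in the predicted ratings.

```r
# Hypothetical predicted ratings for one user (NA = item already rated)
pred <- c(m1 = 3.2, m2 = NA, m3 = 4.7, m4 = 4.1, m5 = NA, m6 = 2.5)

# Return the names of the n items with the highest predicted rating
top_n <- function(pred, n) {
  ranked <- sort(pred, decreasing = TRUE, na.last = NA)  # NA entries are dropped
  names(ranked)[seq_len(min(n, length(ranked)))]
}

top_n(pred, 2)
```

Here the call returns `"m3"` and `"m4"`, the two unrated items with the highest predicted ratings.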
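Model evaluation
The recommenderlab package provides a dedicated evaluation setup through the function evaluationScheme,
which can be configured for either n-fold cross-validation or a simple training/test split. The latter is used here:
the dataset is split into a training set and a test set, the model is trained on the training set, and then evaluated on the test set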
model.eval = evaluationScheme(MovieLense[1:943], method = 'split', train = 0.9,
                              given = 15, goodRating = 5)
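As a rough illustration of what `train = 0.9` and `given = 15` mean, the base-R sketch below splits a made-up rating matrix by user, then keeps 15 ratings per test user as "known" and withholds the rest as "unknown". All names and data are assumptions for illustration, and this is not how recommenderlab implements the split internally.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

n_users <- 20
n_items <- 30

# Made-up ratings: every user rates 20 random items with values 1..5
ratings <- matrix(NA_integer_, n_users, n_items)
for (u in seq_len(n_users)) {
  rated <- sample(n_items, 20)
  ratings[u, rated] <- sample(1:5, 20, replace = TRUE)
}

train_frac <- 0.9  # share of users used for training
given <- 15        # ratings per test user shown to the model

# Split users into a training set and a test set
train_idx <- sample(n_users, size = round(train_frac * n_users))
test_idx <- setdiff(seq_len(n_users), train_idx)

train <- ratings[train_idx, , drop = FALSE]
known <- ratings[test_idx, , drop = FALSE]
unknown <- ratings[test_idx, , drop = FALSE]

# For each test user, keep `given` ratings as known and withhold the rest
for (r in seq_along(test_idx)) {
  rated <- which(!is.na(known[r, ]))
  keep <- sample(rated, given)
  known[r, setdiff(rated, keep)] <- NA
  unknown[r, keep] <- NA
}

rowSums(!is.na(known))    # 15 ratings per test user
rowSums(!is.na(unknown))  # the remaining 5 ratings per test user
```

The model only sees the "known" part of each test user when predicting, and the withheld "unknown" ratings serve as the ground truth for the error computation.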
# Build prediction models with RANDOM, UBCF, and IBCF
model.random = Recommender(getData(model.eval, 'train'), method = 'RANDOM')
model.ubcf = Recommender(getData(model.eval, 'train'), method = 'UBCF')
model.ibcf = Recommender(getData(model.eval, 'train'), method = 'IBCF')
# Predict ratings with each model,
# using the known part of the test data as input
predict.random = predict(model.random, getData(model.eval, 'known'), type = 'ratings')
predict.ubcf = predict(model.ubcf, getData(model.eval, 'known'), type = 'ratings')
predict.ibcf = predict(model.ibcf, getData(model.eval, 'known'), type = 'ratings')
# Compare the predictions against the withheld ('unknown') test ratings
error = rbind(
  calcPredictionAccuracy(predict.random, getData(model.eval, 'unknown')),
  calcPredictionAccuracy(predict.ubcf, getData(model.eval, 'unknown')),
  calcPredictionAccuracy(predict.ibcf, getData(model.eval, 'unknown'))
)
rownames(error) = c('RANDOM', 'UBCF', 'IBCF')
error
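In this run, UBCF gives the lowest prediction error of the three methods. Because the split is random, the exact numbers vary between runs.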
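For rating predictions, calcPredictionAccuracy reports RMSE, MSE, and MAE. The base-R sketch below shows how these three measures are computed on a handful of made-up values; the vectors `actual` and `predicted` are illustrative only and are not results from MovieLense.

```r
# Made-up true and predicted ratings for four withheld items
actual <- c(4, 3, 5, 2)
predicted <- c(3.5, 3, 4, 3)

err <- actual - predicted

mse <- mean(err^2)     # mean squared error
rmse <- sqrt(mse)      # root mean squared error
mae <- mean(abs(err))  # mean absolute error

c(RMSE = rmse, MSE = mse, MAE = mae)
```

For these values the result is RMSE = 0.75, MSE = 0.5625, and MAE = 0.625. Lower is better for all three measures, and RMSE penalizes large individual errors more strongly than MAE.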