I followed along and ran the baseline, and noted down the functions and issues I wasn't familiar with:
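- drop_duplicates: removes duplicate rows. When given a list of columns, rows count as duplicates if they match on those columns, and by default the first occurrence is kept.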
drop_duplicates(subset=['user_id', 'click_article_id', 'click_timestamp'])
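- reset_index(): resets the index to the default 0..n-1 integer index. Unless drop=True, the old index is kept as a new column.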
DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
- defaultdict(int)
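The defaultdict class returns a dict-like object (it is a subclass of dict). Its first argument is assigned to the default_factory attribute, and all remaining arguments are passed to the dict constructor. Put simply, defaultdict's initializer takes a type (more precisely, any zero-argument callable) as its argument. When you access a key that does not exist, it calls that factory to create a default value and stores it under that key (https://blog.csdn.net/Alen_1996/article/details/87916039). With int, a missing key maps to 0 (http://www.reibang.com/p/bbd258f99fd3).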
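The sketch below shows drop_duplicates and reset_index together on a toy click log. It is a minimal example: the rows are made up and only the column names follow the baseline. It shows that drop_duplicates leaves gaps in the index, which is usually why reset_index is called right after it.

```python
import pandas as pd

# Toy click log (made-up rows); column names follow the baseline's click data
click_df = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "click_article_id": [10, 10, 11, 10],
    "click_timestamp": [100, 100, 105, 100],
})
cols = ["user_id", "click_article_id", "click_timestamp"]

# Rows 0 and 1 match on all three columns; keep="first" (the default) keeps row 0
deduped = click_df.drop_duplicates(subset=cols)
print(deduped.index.tolist())  # [0, 2, 3]  <- the index now has a gap

# Without drop=True, the old index is kept as a new column named "index"
with_old = deduped.reset_index()
print(with_old.columns.tolist())  # ['index', 'user_id', 'click_article_id', 'click_timestamp']

# drop=True discards the old labels and renumbers from 0
deduped = deduped.reset_index(drop=True)
print(deduped.index.tolist())  # [0, 1, 2]
```

Passing drop=True is the usual choice when the old row labels carry no meaning. Otherwise an extra "index" column ends up in the data.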
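A small sketch of defaultdict's behaviour, using only the standard library. The article ids are made up. It checks three points from the notes above: a missing key read with [] gets int() == 0 and is inserted, .get() does not trigger the factory, and extra arguments go to the dict constructor.

```python
from collections import defaultdict

# int() returns 0, so a missing key starts at 0 and "+= 1" works without a check
counts = defaultdict(int)
for article_id in [10, 11, 10, 12, 10]:
    counts[article_id] += 1
print(dict(counts))  # {10: 3, 11: 1, 12: 1}

# Reading a missing key with [] calls default_factory AND inserts the key
print(counts[99])    # 0
print(99 in counts)  # True

# .get() does not call default_factory, so nothing is inserted
print(counts.get(42))  # None
print(42 in counts)    # False

# Arguments after the factory are passed to the dict constructor
d = defaultdict(list, {"a": [1]})
d["b"].append(2)
print(dict(d))  # {'a': [1], 'b': [2]}
```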
For the itemCF part, I supplemented my study with this document: https://github.com/datawhalechina/team-learning-rs/blob/master/RecommendationSystemFundamentals/02%20%E5%8D%8F%E5%90%8C%E8%BF%87%E6%BB%A4.md
Question to look into later: Inverse User Frequency (IUF)
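As a starting point for the IUF question, here is a minimal sketch of the ItemCF-IUF variant. This is the formulation I assume (I have not checked it against the baseline's code):

w_ij = Σ_{u ∈ N(i)∩N(j)} 1/log(1+|N(u)|) / sqrt(|N(i)|·|N(j)|)

Here N(u) is the set of items user u clicked, and N(i) is the set of users who clicked item i. The IUF term makes a very active user's co-occurrences count for less. The function name `itemcf_iuf_sim` and the input format (a dict mapping user id to clicked item ids) are hypothetical. The sketch also leaves out any other weighting a real baseline might apply.

```python
import math
from collections import defaultdict


def itemcf_iuf_sim(user_items):
    """Item-item similarity with an IUF penalty (hypothetical helper).

    user_items: dict mapping user_id -> list of clicked item ids.
    """
    item_cnt = defaultdict(int)                   # |N(i)|: number of users who clicked item i
    co = defaultdict(lambda: defaultdict(float))  # IUF-weighted co-occurrence sums
    for items in user_items.values():
        items = set(items)                        # count each item once per user
        if not items:
            continue                              # log(1 + 0) = 0 would divide by zero
        weight = 1.0 / math.log(1 + len(items))   # IUF: active users contribute less
        for i in items:
            item_cnt[i] += 1
            for j in items:
                if i != j:
                    co[i][j] += weight
    # Normalize by sqrt(|N(i)| * |N(j)|) so popular items are not favored automatically
    return {
        i: {j: w / math.sqrt(item_cnt[i] * item_cnt[j]) for j, w in related.items()}
        for i, related in co.items()
    }


sim = itemcf_iuf_sim({"u1": ["a", "b"], "u2": ["a", "b", "c"]})
print(sim["a"])  # "b" scores higher than "c": two shared users vs. one
```

The pair (a, b) appears for both users, so its score combines both IUF weights. The pair (a, c) only gets the smaller weight from the more active user u2.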