Summary
This post covers the evolution of Pinterest's Related Pins. Related Pins grew from 10% to 40% of Pinterest's total saves, and Related Pins Save Propensity is their main optimization target. "Related Pins leverages this human-curated content to provide personalized recommendations of pins based on a given query pin."
System Overview
The system has three stages: candidate generation, Memboost, and ranking. Note the order in which the three run in the pipeline.
- Candidates: narrow the corpus from roughly a billion pins down to about 1,000.
- Memboost: "memorizes past engagement on specific query and result pairs."
- Ranking: "maximize our target engagement metric of Save Propensity."
(figure: system overview)
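The three-stage funnel can be sketched as below. All function names and data shapes are my own illustrative assumptions, not Pinterest's actual code:

```python
def generate_candidates(query_pin, corpus):
    # Stage 1: narrow a huge corpus to ~1000 loosely relevant candidates
    # (naive board co-occurrence stands in for the real generators here).
    return [dict(p) for p in corpus if p["board"] == query_pin["board"]][:1000]

def apply_memboost(query_pin, candidates, engagement):
    # Stage 2: add memorized engagement for specific (query, result) pairs.
    for p in candidates:
        p["score"] = p.get("score", 0.0) + engagement.get((query_pin["id"], p["id"]), 0.0)
    return candidates

def rank(candidates):
    # Stage 3: order by score to maximize save propensity.
    return sorted(candidates, key=lambda p: p["score"], reverse=True)
```

The key point is the ordering: each stage consumes the previous stage's output and works on a progressively smaller set.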
Evolution of Candidate Generation
Initially, candidates came mainly from pin co-occurrence on boards. After Memboost and learning-to-rank were introduced, the candidate-generation problem shifted from precision toward recall, and more candidate sources were added.
Board co-occurrence
1) MapReduce: count how often two pins co-occur on the same boards, weighted by a relevance score based on category and text match. Two problems: a) the long tail (rare pins rarely co-occur); b) heavy reliance on relevance scores derived from basic metadata.
2) Pixie random walk. a) Heuristic rules prune certain overly popular and low-relevance nodes from the graph; b) following Twitter's random-walk approach, the results of over 100,000 walk steps are aggregated, with a reset probability applied at every step.
3) Pros and cons
Pro: recall is decent. Cons: 1) boards can be too broad, and a board's content drifts as the owner's interests shift; 2) boards can also be too narrow; for example, whiskey and cocktail pins often sit on different boards even though they co-occur in the same sessions.
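The Pixie-style walk in 2) can be sketched roughly as follows. Everything here (graph layout, step budget, reset probability) is an illustrative assumption, not Pixie's actual implementation:

```python
import random
from collections import Counter

def pixie_walk(query_pin, pin_to_boards, board_to_pins,
               n_steps=100_000, reset_prob=0.5, rng=None):
    # Random walk with restarts on the bipartite pin-board graph:
    # alternate pin -> board -> pin hops and count pin visits.
    rng = rng or random.Random(0)
    visits = Counter()
    pin = query_pin
    for _ in range(n_steps):
        board = rng.choice(pin_to_boards[pin])   # hop to one of the pin's boards
        pin = rng.choice(board_to_pins[board])   # hop to one of the board's pins
        visits[pin] += 1
        if rng.random() < reset_prob:            # restart at the query pin
            pin = query_pin
    visits.pop(query_pin, None)                  # never recommend the query itself
    return visits.most_common()
```

Heavily visited pins are the candidates; the restart keeps the walk localized around the query pin.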
Chaining user behavior over time (sessions) addresses both the too-broad and the too-narrow board problems.
"Pin2Vec is a learned embedding of the N most popular (head) pins in a d-dimensional space, with the goal of minimizing the distance between pins that are saved in the same session."
Pins saved by the same user within a certain time window are considered related (save actions, not click actions!); this captures a large amount of user behavior in a compact vector representation.
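A minimal sketch of how such session-based training pairs might be built from save logs. The 30-minute window and the data layout are assumptions on my part, not values from the paper:

```python
from itertools import combinations

def session_pairs(saves, window_secs=1800):
    """saves: list of (user_id, pin_id, timestamp), pre-sorted by time.
    Returns (pin_a, pin_b) pairs saved by the same user in one session."""
    pairs = []
    by_user = {}
    for user, pin, ts in saves:
        by_user.setdefault(user, []).append((pin, ts))
    for events in by_user.values():
        # Split each user's saves into sessions separated by > window_secs.
        session = [events[0]]
        for pin, ts in events[1:]:
            if ts - session[-1][1] <= window_secs:
                session.append((pin, ts))
            else:
                pairs += [(a, b) for (a, _), (b, _) in combinations(session, 2)]
                session = [(pin, ts)]
        pairs += [(a, b) for (a, _), (b, _) in combinations(session, 2)]
    return pairs
```

These pairs would then feed a word2vec-style training loop that pulls co-saved pins together in the embedding space.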
Supplementary candidates
Two problems motivated additional sources: 1) "the cold start problem: rare pins do not have a lot of candidates because they do not appear on many boards"; 2) "after we added ranking, we wanted to expand our candidate sets in the cases where diversity of results would lead to more engagement."
1) Search-based candidates
We generate candidates by leveraging Pinterest’s text-based search, using the query pin’s annotations (words from the web link or description) as query tokens. Each popular search query is backed by a precomputed set of pins from Pinterest Search
2) Visually similar candidates
a) If the query image is a near-duplicate, then we add the Related Pins recommendations for the duplicate image to the results.
b) use the Visual Search backend to return visually similar images, based on a nearest-neighbor lookup
- Locale candidates
the content activation problem: rare pins do not show up as candidates because they do not appear on many boards.
we generate additional candidate sets segmented by locale for many of the above generation techniques
The same approach also addresses other content activation problems, such as gender-specific content or fresh content.
Evolution of Memboost
"We built Memboost to memorize the best result pins for each query." Memboost as a whole introduces significant system complexity by adding feedback loops into the system.
1) Clicks over expected clicks (COEC) corrects for position and platform bias.
2) The actions considered are clicks, long clicks, closeups, and saves; the Memboost score combines these COEC-normalized action counts with per-action weights.
3) Introducing a new ranker, the passage of time, or other system changes can invalidate Memboost's historically accumulated scores; the fix is to feed the Memboost score to the ranker as a feature rather than applying it as a fixed adjustment.
4) Memboost insertion re-surfaces high-quality results that candidate generation and ranking would otherwise miss.
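The COEC idea in 1) and the weighted combination in 2) can be illustrated as follows. The action weights and the exact combination are my assumptions, not the paper's published formula:

```python
def coec(observed, impressions_by_rank, rate_by_rank):
    # Expected actions if this result performed like an average pin shown at
    # the same ranks (the same normalization extends to platform segments).
    expected = sum(n * rate_by_rank[rank] for rank, n in impressions_by_rank.items())
    return observed / expected if expected > 0 else 0.0

def memboost_score(stats, rates, weights=None):
    # stats: action -> (observed count, {rank: impressions}).
    # The per-action weights below are made-up illustrative values.
    weights = weights or {"click": 1.0, "long_click": 2.0, "save": 5.0}
    return sum(w * coec(stats[a][0], stats[a][1], rates[a])
               for a, w in weights.items() if a in stats)
```

A COEC above 1.0 means the (query, result) pair outperformed the rank-level average, so it gets boosted; below 1.0, demoted.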
Evolution of Ranking
- Overview
They hypothesized that ranking was the part most likely to improve results going forward; the first version lifted engagement by 30%. That version used only the pin's raw data; later versions added Memboost and user data, including user behavior such as recent searches. Features include raw features (topic, category), normalized features (Memboost), one-hot encoded features, and relevance features (topic match between query and candidate).
Three big design questions had to be settled in practice:
- Training data: use Memboost scores or individual sessions?
- Learning objective: pointwise or pairwise?
- Model class: linear model or tree model?
Evolution
(figure: evolution of the ranking model)
1). Memboost training data, relevance pair labels, pairwise loss, and linear RankSVM model
- Pairs (r1, rn) and (rn, rrand) for each query, where r1 and rn are the results with the highest and lowest Memboost scores, respectively, for a given query. rrand is a random popular pin from Pinterest, added to stabilize the rankings, as suggested in [14].
- We reasoned that pins with low Memboost scores would still be more relevant than purely random pins.
Pros:
- The training data was fairly clean.
- A much smaller corpus sufficed, so a model could be trained within minutes on a single machine.
Cons:
- Memboost data inherently precludes personalization because it is aggregated over many users, losing the association with individual users and session context (it cannot model how different users react differently to the same result).
- Only popular content had enough interactions for reliable Memboost data.
2). Individual Related Pins sessions
A logged session consists of the query pin, the viewing user, the recent action context, and a list of result pins.
Relevance ordering: save > long click > click > closeup > impression only.
We trim the logged set of pins, taking each engaged pin as well as the two pins immediately preceding it in rank order, under the assumption that the user probably saw the pins immediately preceding the pin they engaged with (pairs are only formed with immediately preceding pins).
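The trimming heuristic above can be sketched as pair extraction over a logged result list. The data layout is an assumption on my part:

```python
# Relevance ordering from the notes: save > long click > click > closeup > impression.
RELEVANCE = {"save": 4, "long_click": 3, "click": 2, "closeup": 1, "impression": 0}

def training_pairs(results):
    """results: list of (pin_id, action) in displayed rank order.
    Pair each engaged pin against the two pins shown immediately before it."""
    pairs = []
    for i, (pin, action) in enumerate(results):
        if action == "impression":
            continue  # only engaged pins anchor pairs
        for j in range(max(0, i - 2), i):
            other, other_action = results[j]
            if RELEVANCE[action] > RELEVANCE[other_action]:
                pairs.append((pin, other))  # (more relevant, less relevant)
    return pairs
```

Pins far above the engaged one are skipped, since the user may never have scrolled past them.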
3). Moved to a RankNet GBDT Model
Drawbacks of the linear model:
- It forces the score to depend linearly on each feature.
- It cannot make use of features that depend only on the query and not on the candidate pin, and engineering feature crosses by hand is time-consuming.
Advantages of tree models:
- They allow a non-linear response to individual features.
- Decision trees inherently capture interactions between features, up to the depth of the tree.
4). Moved to a pointwise classification loss, binary labels, and a logistic GBDT model
- Optimizing for closeups and clicks seemed counterproductive, since these actions may not reflect save propensity.
- We found that giving examples simple binary labels (“saved” or “not saved”) and reweighting positive examples to combat class imbalance proved effective at increasing save propensity.
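One simple way to realize the reweighting described above (my own sketch, not the paper's exact scheme) is to scale positive examples by the negative-to-positive ratio inside a pointwise logistic loss:

```python
import math

def sample_weights(labels):
    # Upweight rare positives so the two classes contribute comparably.
    pos = sum(labels)
    neg = len(labels) - pos
    w_pos = neg / pos if pos else 1.0  # e.g., 1 save per 3 non-saves -> weight 3
    return [w_pos if y else 1.0 for y in labels]

def weighted_log_loss(scores, labels, weights):
    # Standard binary cross-entropy on sigmoid(score), with per-example weights.
    loss = 0.0
    for s, y, w in zip(scores, labels, weights):
        p = 1.0 / (1.0 + math.exp(-s))
        loss += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss / sum(weights)
```

In a GBDT library this corresponds to passing per-example sample weights (or a positive-class weight) rather than hand-computing the loss.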
- Model bias
the model that is currently deployed dramatically impacts the training examples produced for future models.
- Before the first ranking model, the logs reflected users' engagement with results ordered only by the candidate generator.
- Over the following months, the training data only reflected engagement with pins that were highly ranked by the existing model.
- As a result, the training pins no longer matched the distribution of pins ranked at serving time.
Solution: reserve a small percentage of traffic for unbiased data collection. For those queries, show a random sample of candidates, randomly ordered without ranking; each user is served unranked pins on only a small random subset of queries.
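The unbiased-data-collection setup can be sketched as follows. The hash-based bucketing is my assumption; it simply makes the holdout decision deterministic per (user, query) pair:

```python
import hashlib
import random

def in_unbiased_holdout(user_id, query_pin_id, fraction=0.01):
    # Deterministically assign each (user, query) pair to a bucket, so a user
    # sees unranked results on the same small subset of queries every time.
    key = f"{user_id}:{query_pin_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 10_000
    return bucket < fraction * 10_000

def serve(user_id, query_pin_id, candidates, ranker):
    if in_unbiased_holdout(user_id, query_pin_id):
        # Random order, no ranking; these sessions are logged for training.
        rng = random.Random(f"{user_id}:{query_pin_id}")
        return rng.sample(candidates, len(candidates))
    return ranker(candidates)
```

Only the holdout sessions are used as unbiased training data; regular traffic still gets ranked results.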
- Evaluation metrics
The faster the iteration, the better. Online A/B tests judge ranking by save propensity, but an online experiment takes days, so offline evaluation matters a great deal.
- sample individual Related Pins sessions, but choose a distinct range of dates that follows the training date range, and a slightly different sampling strategy
- For each session we rescore the pins that the user actually saw, using the models under test, then measure the agreement between the scores and the logged user behavior.
- We examined the directionality as well as the magnitude of the difference predicted by offline evaluation, and compared it to the actual experiment results.
- We found that PR AUC metrics are extremely predictive of closeups and clickthroughs in A/B experiments, but we had difficulty predicting the save behavior using offline evaluation.
Serving Architecture
A combination of offline and online computation.
(figure: serving architecture)
Relationship between ranking and Memboost
In principle both ranking and Memboost reorder results, and at Pinterest the two modules have kept evolving alongside each other. We can see:
- In stage three, ranking is pre-computed as a coarse ordering, then Memboost fine-tunes it. (The reverse, coarse-ranking with Memboost and then fine-ranking with the model, is also conceivable; it is unclear why Pinterest chose this order.)
- In stage four it becomes joint learning: the Memboost score is just another feature of the ranker and is trained jointly.
Challenges
- Changing Anything Changes Everything
Inputs are never really independent; improving one component may actually result in worse overall performance.
Our general solution is to jointly train/automate as much of the system as possible for each experiment.
- To avoid other changes leaving hyperparameters suboptimal, we implemented a parallelized system for automated hyperparameter tuning.
- "Improvements" to the raw data can harm results, since the downstream model is trained on the old definition of the feature; this holds even when the change is a bug fix.
- Changing or introducing a candidate generator can make the ranker worse, since the training data distribution will no longer match the distribution of data ranked at serving time. Our current solution is to insert the new candidates into the training data collection for some time before running an experiment with a newly trained model.
- Content cold start
- We dive deeper into how we solved for content activation for the particular case of localization.
- SWAP: we check if there is a local alternative with the same image.
- BOOST: artificially promote local pins to a higher position in the result set.
- BLEND: produce a segmented corpus of pins for each language, and blend local candidates into the results at various ratios.
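BLEND can be illustrated as a ratio-controlled interleave. The exact interleaving policy below is an assumption, not the paper's algorithm:

```python
def blend(global_results, local_results, local_ratio=0.25):
    """Interleave local candidates into the global list so that, at every
    prefix, roughly local_ratio of the shown pins are local."""
    out, li, gi = [], 0, 0
    total = len(global_results) + len(local_results)
    for pos in range(total):
        # Take a local pin whenever we are behind the target ratio.
        want_local = li < len(local_results) and (li + 1) <= local_ratio * (pos + 1)
        if want_local or gi >= len(global_results):
            out.append(local_results[li]); li += 1
        else:
            out.append(global_results[gi]); gi += 1
    return out
```

Sweeping `local_ratio` is what lets the team measure engagement at various blend ratios.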
TODO
Pinterest's latest direction appears to be graph convolutional networks (GCN).
References and Further Reading
Pinterest推薦系統四年進化之路 (four years of evolution of Pinterest's recommender system)
WTF: The Who to Follow Service at Twitter
Visual Search at Pinterest
Visual Discovery at Pinterest