This post walks through the image-search ranking algorithms Pinterest has experimented with, looking at them from the angles of training data, user/image featurization, and ranking models, and reviews the performance and quality tests.
Background:
1. Purpose: improve image search quality by evolving the ranking techniques.
The first search system was built on top of Apache Lucene and Solr; results were ranked by the text relevance score between the query and the text description of each image. The search system was subsequently evolved by adding various advancements that address the unique challenges of Pinterest (i.e., problems encountered and the solutions explored).
2. Three challenges:
The first challenge stems from why users search for images on Pinterest: their intent shows up through multiple explicit engagement actions on pins, such as 'try it', 'close up' and 'repin'. This raises a new problem: it is hard to define a universal performance rule that weighs these actions against one another; is 'try it' more important than 'close up'? Another challenge comes from the nature of images: a single image may carry a lot of information that is difficult to extract and describe in a few words. The third is balancing the efficiency and effectiveness of ranking algorithms: although many algorithms have been deployed in industry, it is rarely known which one is best for a given application.
3. Three aspects:
Data: combine engagement training data & human-curated relevance judgment data
Featurization: includes feature engineering, word embeddings and visual embeddings, visual relevance signals, query intent understanding, user intent understanding, etc.
Modeling: a cascading core ranking component (to balance search latency against search quality)
Section 2: How training data are curated
Goal: maximize both engagement & relevance.
Engagement data (user behavioral metrics):
Click-through engagement logs have become the standard training data for learning-to-rank optimization in search engines, recorded as tuples ⟨q, u, (P, T)⟩. On Pinterest, however, T can be any of multiple actions towards pins, including click-through, repin, close-up, like, hide, comment, try-it, etc.
Hence a new challenge: how should we simultaneously combine and optimize multiple kinds of feedback?
Option 1: train one model per engagement type, then ensemble and tune the resulting models. Many variants were tried, but none managed to avoid sacrificing some engagement type.
Option 2: ensemble at the data level. Let l(p | q, u), written l_p, denote the engagement-based quality label of pin p for user u under query q. For a pin set P (the recall set of a user query), a label set L can then be generated; for each pin in P, l_p is a linear combination over the action set T:
Formula 1: l_p = Σ_{t∈T} w_t · c_t, where T is the set of actions listed above (repin, close-up, etc.), c_t is the count of action t, and w_t is the weight of action t, inversely proportional to the action's volume (close-ups are more voluminous than clicks, so they weigh less).
Formula 2 then normalizes each pin's label by position and age: age_p and pos_p are the age and position of pin p, τ is the normalization weight for pin ages, and λ is the parameter that controls position decay.
Another challenge: the samples are skewed, negatives > positives. The sample set is pruned by 1. discarding query groups in which every l_p is below zero, and 2. discarding query groups whose number of negative samples exceeds a threshold. A sketch of both steps follows.
The final engagement training data take the form ⟨q, u, (P, T, L)⟩.
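To make the label construction concrete, here is a minimal Python sketch of formula 1 plus the two pruning rules. The specific action weights (including a negative weight for 'hide') and the negative-sample threshold are illustrative assumptions, not the paper's values; the age/position normalization of formula 2 is omitted since its exact form is not reproduced here.

```python
# Hypothetical action weights: the paper only states that w_t is inversely
# proportional to the volume of action t (close-ups are more common than
# clicks, so they weigh less); the numbers below, and the negative weight
# for 'hide', are assumptions for illustration.
ACTION_WEIGHT = {"click": 1.0, "repin": 2.0, "close_up": 0.5, "hide": -4.0}

def engagement_label(action_counts: dict) -> float:
    """Formula 1: l_p = sum over actions t of w_t * c_t for one pin p."""
    return sum(ACTION_WEIGHT.get(t, 0.0) * c for t, c in action_counts.items())

def prune_query_groups(groups: dict, max_negatives: int = 100) -> dict:
    """Pruning rules: drop query groups whose labels are all negative, and
    groups with more than max_negatives negative samples (threshold assumed)."""
    return {
        q: labels for q, labels in groups.items()
        if not all(l < 0 for l in labels)
        and sum(l < 0 for l in labels) <= max_negatives
    }
```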
Human relevance data (human relevance judgment):
Although aggregating large numbers of individually unreliable user search sessions provides reliable engagement training data from implicit feedback, it also inherits biases from the current ranking function; position bias is one of them. To correct the ranking bias, relevance judgment data are also curated from human experts, much like a survey with manual labeling.
Combining:
We simply consider each training data source independently and feed each into the ranking function to train a source-specific model.
Section 3: Feature representation for ranking
How traditional ranking features are enhanced to address Pinterest's unique challenges.
Beyond Text Relevance Features:
Background: the text description of each Pin is usually short and noisy.
1. Word-level relevance: each image is given text annotations in n-gram form (unigrams, bigrams and trigrams) drawn from different sources such as pin texts, descriptions, text from the crawled linked web page, and automatically classified annotation labels. A text matching score is then computed with BM25 or proximity BM25 (see the sketch after this list).
2. Categoryboost: 32 L1 categories and 500 L2 categories.
3. Topicboost: topics are discovered by statistical topic modeling of word distributions (Latent Dirichlet Allocation).
4. Embedding Features: describe the similarity between users' query requests and the pins in a latent representation space.
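For reference, the word-level relevance score can be illustrated with a plain BM25 implementation; this is the textbook formula with the usual default constants, not Pinterest's proximity variant.

```python
import math
from collections import Counter

def bm25(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
         k1=1.2, b=0.75):
    """Textbook BM25 between query n-grams and one pin's text annotations.
    doc_freq maps each term to the number of pins whose annotations contain it."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if term not in tf:
            continue
        idf = math.log(1 + (num_docs - doc_freq[term] + 0.5)
                       / (doc_freq[term] + 0.5))
        norm = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_doc_len))
        score += idf * norm
    return score
```

The same scoring would run once per n-gram source (unigrams, bigrams, trigrams), each producing a separate ranking feature.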
User Intent Features: help our learning algorithm learn which types of pins are "really" relevant and interesting to users.
1. Navboost: describes how a pin performs both in general and under specific queries and user segments, based on historical statistics of close-up, click, long-click and repin actions, segmented by country, gender, aggregation time, and so on (see the sketch after this list).
2. Tokenboost: how well a pin performs in general and in the context of a specific token.
3. Gender Features:
4. Personalized Features:
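Read as a smoothed historical action rate, a Navboost/Tokenboost-style feature can be sketched as below; the smoothing prior is an assumption, since the paper does not spell out the estimator.

```python
def boost_feature(action_count: int, impressions: int,
                  prior_rate: float = 0.01,
                  prior_strength: float = 100.0) -> float:
    """Smoothed historical action rate (e.g. repins / impressions) for a pin.
    The Beta-style prior keeps low-traffic pins from getting extreme scores;
    prior_rate and prior_strength are illustrative assumptions."""
    return ((action_count + prior_rate * prior_strength)
            / (impressions + prior_strength))

# Navboost would aggregate counts per pin and per (query, pin, country,
# gender, time-window) segment before smoothing; Tokenboost per (token, pin).
```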
Socialness, Visual and Other Features:
image size, aspect ratio, image hash features, pin following, friends following
Section 4: Cascading models
Three stages: a light-weight stage, a full-ranking stage, and a re-ranking stage (see Figure 4), narrowing the candidates from millions of pins, to tens of thousands, to about 1,000 (a code sketch follows the stage list below).
Notation: q is the query, u the user, p a pin; x is the feature vector of a tuple (q, u, p); l_p is as in Section 2 and y is its ground-truth value; s_p is the quality score of p given q & u; L is the loss function and S is the scoring function.
Table 1 lists the models used at each stage. The rule-based models are skipped here because they are straightforward.
Light-weight stage:
filters out negative (low-quality) pins
Full-ranking stage:
Re-ranking stage:
improves freshness, diversity, localness and language awareness.
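A schematic sketch of the three-stage cascade, assuming placeholder model objects with a score method; the cutoffs follow the millions → tens of thousands → ~1,000 funnel described above.

```python
def cascade_rank(candidates, light_model, full_model, rerank_model):
    """Each stage ranks the survivors of the previous one with a costlier
    model over a smaller set, trading search quality against latency."""
    # Light-weight stage: a cheap model prunes millions of candidates.
    stage1 = sorted(candidates, key=light_model.score, reverse=True)[:20_000]
    # Full-ranking stage: the heavy model keeps the ~1,000 best pins.
    stage2 = sorted(stage1, key=full_model.score, reverse=True)[:1_000]
    # Re-ranking stage: adjust for freshness, diversity, localness, language.
    return sorted(stage2, key=rerank_model.score, reverse=True)
```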
Models:
Gradient Boost Decision Tree (GBDT): uses mean squared error as the training loss.
Deep Neural Network (DNN): learns to predict the quality score s_p, cast as a multi-class classification problem over a 4-scale label; l_p is mapped to y ∈ [1, 2, 3, 4] and the model is trained with cross-entropy loss.
Convolutional Neural Network (CNN): identical to the DNN setup except for the network architecture.
RankNet: learns the correct ordering of pin pairs; the model learns a ranking function S(q, u, p; θ) and predicts the probability that s_{p_i} > s_{p_j}. Pairs are built from the unnormalized labels: l_{p_i} > l_{p_j} means p_i should rank before p_j. The loss function is formula 6 (see the sketch after this list).
RankSVM:
Gradient Boost Ranking Tree (GBRT): a combination of RankSVM and GBDT; each iteration learns a function modeling the probability that p_i ranks before p_j, and, as in GBDT, each ranker is a regression tree of limited depth.
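For concreteness, here is the standard RankNet pairwise loss (Burges et al.): the probability that p_i outranks p_j is modeled as a logistic function of the score difference and trained with cross-entropy. This is presumably what formula 6 is; the paper's exact parameterization may differ.

```python
import math

def ranknet_loss(s_i: float, s_j: float, target: float) -> float:
    """Pairwise cross-entropy loss for one pin pair under the same (q, u).
    target is 1.0 if l_pi > l_pj (p_i should rank first), else 0.0."""
    p_ij = 1.0 / (1.0 + math.exp(-(s_i - s_j)))   # modeled P(s_pi > s_pj)
    p_ij = min(max(p_ij, 1e-12), 1.0 - 1e-12)     # guard the logs
    return -(target * math.log(p_ij) + (1.0 - target) * math.log(1.0 - p_ij))
```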
Model ensembling:
The engagement data and the human relevance data produce different models, which are blended linearly.
Models that are cheap to train are blended during training; the others are blended after training.
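A minimal sketch of the linear blending step, with the blend weight as a placeholder to be tuned on validation or online data.

```python
def blended_score(engagement_score: float, relevance_score: float,
                  alpha: float = 0.5) -> float:
    """Linear blend of the engagement-trained and relevance-trained models;
    alpha = 0.5 is a placeholder, not a value from the paper."""
    return alpha * engagement_score + (1.0 - alpha) * relevance_score
```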
Section 5: Experiments
Offline Experimental Setting:
For each country and language, human relevance data: 5,000 queries, with user judgments collected for 400 pins per query; engagement data: a random 1% sample of the logs from the most recent 7 days. 70% is used for training, 20% for testing, and 10% for validation.
Offline Measurement Metrics: Figure 6 shows the distributions of some features. Offline evaluation uses NDCG (see the reference at the end).
Online Experimental Setting:
Users are split into 100 buckets for the online A/B experiments.
Online Measurement Metrics
Measured at both the user level and the query level.
For the query-level measurement metrics, repin per search (Qrepin), click per search (Qclick), close-up per search (Qclose up) and engagement per search (Qengaged) were the main metrics used (a sketch follows).
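These per-search metrics are plain ratios over the experiment logs; a sketch under an assumed (search_id, action) log format:

```python
from collections import Counter

def query_level_metrics(logs):
    """Compute Qrepin, Qclick, Qclose_up, ... = action count / search count.
    `logs` is assumed to be an iterable of (search_id, action) pairs."""
    logs = list(logs)
    num_searches = max(len({sid for sid, _ in logs}), 1)
    counts = Counter(action for _, action in logs)
    return {f"Q{action}": c / num_searches for action, c in counts.items()}
```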
Performance Results
5.3.1 Lightweight Ranking Comparison.
Figure 7: on offline data, with the rule-based model as the baseline, RankSVM brings a large improvement; online, however, the RankSVM gain is small, a common pattern. The quality lift is modest, but latency drops (Table 2).
5.3.2 Full Ranking Comparison
Offline, compared against RankSVM: CNN > GBRT > DNN > RankNet > GBDT.
Online: neural network models introduce latency, so their scores are precomputed offline and added as features to the online tree models.
5.3.3 Re-ranking Comparison.
When the rule-based re-ranking model was replaced with GBRT, the repin rate and click-through rate on fresh pins increased by 20%.
Reference
Normalized Discounted Cumulative Gain (NDCG)
Precision and recall give every item in the returned set the same weight: the item at position k counts as much as the item at position 1. Users do not read results that way: in a search engine's ranked list, the top answers matter far more than the ones further down.
Normalized Discounted Cumulative Gain (NDCG) accounts for this. It builds on three progressively refined metrics: Cumulative Gain (CG), then Discounted Cumulative Gain (DCG), and finally the normalized version. CG sums the relevance scores of the top k returned items; DCG multiplies each item's score by a weight inversely related to its position, so that earlier positions weigh more; NDCG then normalizes DCG by the ideal DCG (the DCG of the same items sorted by true relevance), so the score always falls between 0.0 and 1.0. The Wikipedia article gives the detailed formulas.
DCG and NDCG are used mostly in information retrieval and in methods that care about the positions of returned items.
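The verbal description above maps directly to code; a standard implementation with the common log2 position discount (variants exist, e.g. the 2^rel − 1 gain):

```python
import math

def dcg(relevances, k):
    """Discounted cumulative gain of the top-k results: each relevance score
    is divided by log2(position + 1), so earlier positions count more."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg(relevances, k):
    """Normalize by the ideal DCG (the same items sorted by true relevance),
    so the score always falls in [0.0, 1.0]."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Example: graded relevances of a ranked result list.
print(ndcg([3, 2, 3, 0, 1, 2], k=6))
```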