Original source: AI千集
Netflix Technology Blog
Dec 8, 2017 · 13 min read
By Ashok Chandrashekar, Fernando Amat, Justin Basilico and Tony Jebara
For many years, the main goal of the Netflix personalized recommendation system has been to get the right titles in front of each of our members at the right time. With a catalog spanning thousands of titles and a diverse member base spanning over a hundred million accounts, recommending the titles that are just right for each member is crucial. But the job of recommendation does not end there. Why should you care about any particular title we recommend? What can we say about a new and unfamiliar title that will pique your interest? How do we convince you that a title is worth watching? Answering these questions is critical in helping our members discover great content, especially for unfamiliar titles.
One avenue to address this challenge is to consider the artwork or imagery we use to portray the titles. If the artwork representing a title captures something compelling to you, then it acts as a gateway into that title and gives you some visual “evidence” for why the title might be good for you. The artwork may highlight an actor that you recognize, capture an exciting moment like a car chase, or contain a dramatic scene that conveys the essence of a movie or TV show. If we present that perfect image on your homepage (and as they say: an image is worth a thousand words), then maybe, just maybe, you will give it a try. This is yet another way Netflix differs from traditional media offerings: we don’t have one product but over 100 million different products, one for each of our members, with personalized recommendations and personalized visuals.
A Netflix homepage without artwork. This is how historically our recommendation algorithms viewed a page.
In previous work, we discussed an effort to find the single perfect artwork for each title across all our members. Through multi-armed bandit algorithms, we hunted for the best artwork for a title, say Stranger Things, that would earn the most plays from the largest fraction of our members. However, given the enormous diversity in taste and preferences, wouldn’t it be better if we could find the best artwork for each of our members to highlight the aspects of a title that are specifically relevant to them?
Artwork for Stranger Things that each receive over 5% of impressions from our personalization algorithm. Different images cover a breadth of themes in the show to go beyond what any single image portrays.
As inspiration, let us explore scenarios where personalization of artwork would be meaningful. Consider the following examples where different members have different viewing histories. On the left are three titles a member watched in the past. To the right of the arrow is the artwork that a member would get for a particular movie that we recommend for them.
Let us consider trying to personalize the image we use to depict the movie Good Will Hunting. Here we might personalize this decision based on how much a member prefers different genres and themes. Someone who has watched many romantic movies may be interested in Good Will Hunting if we show the artwork containing Matt Damon and Minnie Driver, whereas a member who has watched many comedies might be drawn to the movie if we use the artwork containing Robin Williams, a well-known comedian.
In another scenario, let’s imagine how the different preferences for cast members might influence the personalization of the artwork for the movie Pulp Fiction. A member who watches many movies featuring Uma Thurman would likely respond positively to the artwork for Pulp Fiction that contains Uma. Meanwhile, a fan of John Travolta may be more interested in watching Pulp Fiction if the artwork features John.
Of course, not all the scenarios for personalizing artwork are this clear and obvious. So we don’t enumerate such hand-derived rules but instead rely on the data to tell us what signals to use. Overall, by personalizing artwork we help each title put its best foot forward for every member and thus improve our member experience.
Challenges
At Netflix, we embrace personalization and algorithmically adapt many aspects of our member experience, including the rows we select for the homepage, the titles we select for those rows, the galleries we display, the messages we send, and so forth. Each new aspect that we personalize has unique challenges; personalizing the artwork we display is no exception and presents different personalization challenges. One challenge of image personalization is that we can only select a single piece of artwork to represent each title in each place we present it. In contrast, typical recommendation settings let us present multiple selections to a member where we can subsequently learn about their preferences from the item a member selects. This means that image selection is a chicken-and-egg problem operating in a closed loop: if a member plays a title it can only come from the image that we decided to present to that member. What we seek to understand is when presenting a specific piece of artwork for a title influenced a member to play (or not to play) a title and when a member would have played a title (or not) regardless of which image we presented. Therefore artwork personalization sits on top of the traditional recommendation problem and the algorithms need to work in conjunction with each other. Of course, to properly learn how to personalize artwork we need to collect a lot of data to find signals that indicate when one piece of artwork is significantly better for a member.
Another challenge is to understand the impact of changing the artwork that we show a member for a title between sessions. Does changing artwork reduce recognizability of the title and make it difficult to visually locate the title again, for example if the member was interested before but had not yet watched it? Or, does changing the artwork itself lead the member to reconsider it due to an improved selection? Clearly, if we find better artwork to present to a member we should probably use it; but continuous changes can also confuse people. Changing images also introduces an attribution problem as it becomes unclear which image led a member to be interested in a title.
Next, there is the challenge of understanding how artwork performs in relation to other artwork we select in the same page or session. Maybe a bold close-up of the main character works for a title on a page because it stands out compared to the other artwork. But if every title had a similar image then the page as a whole may not seem as compelling. Looking at each piece of artwork in isolation may not be enough and we need to think about how to select a diverse set of images across titles on a page and across a session. Beyond the artwork for other titles, the effectiveness of the artwork for a title may depend on what other types of evidence and assets (e.g. synopses, trailers, etc.) we also display for that title. Thus, we may need a diverse selection where each can highlight complementary aspects of a title that may be compelling to a member.
To achieve effective personalization, we also need a good pool of artwork for each title. This means that we need several assets where each is engaging, informative and representative of a title to avoid “clickbait”. The set of images for a title also needs to be diverse enough to cover a wide potential audience interested in different aspects of the content. After all, how engaging and informative a piece of artwork is truly depends on the individual seeing it. Therefore, we need to have artwork that highlights not only different themes in a title but also different aesthetics. Our teams of artists and designers strive to create images that are diverse across many dimensions. They also take into consideration the personalization algorithms which will select the images during their creative process for generating artwork.
Finally, there are engineering challenges to personalize artwork at scale. One challenge is that our member experience is very visual and thus contains a lot of imagery. So using personalized selection for each asset means handling a peak of over 20 million requests per second with low latency. Such a system must be robust: failing to properly render the artwork in our UI significantly degrades the experience. Our personalization algorithm also needs to respond quickly when a title launches, which means rapidly learning to personalize in a cold-start situation. Then, after launch, the algorithm must continuously adapt as the effectiveness of artwork may change over time as both the title evolves through its life cycle and member tastes evolve.
Contextual bandits approach
Much of the Netflix recommendation engine is powered by machine learning algorithms. Traditionally, we collect a batch of data on how our members use the service. Then we run a new machine learning algorithm on this batch of data. Next we test this new algorithm against the current production system through an A/B test. An A/B test helps us see if the new algorithm is better than our current production system by trying it out on a random subset of members. Members in group A get the current production experience while members in group B get the new algorithm. If members in group B have higher engagement with Netflix, then we roll out the new algorithm to the entire member population. Unfortunately, this batch approach incurs regret: many members over a long period of time did not benefit from the better experience. This is illustrated in the figure below.
To reduce this regret, we move away from batch machine learning and consider online machine learning. For artwork personalization, the specific online learning framework we use is contextual bandits. Rather than waiting to collect a full batch of data, waiting to learn a model, and then waiting for an A/B test to conclude, contextual bandits rapidly figure out the optimal personalized artwork selection for a title for each member and context. Briefly, contextual bandits are a class of online learning algorithms that trade off the cost of gathering training data required for learning an unbiased model on an ongoing basis with the benefits of applying the learned model to each member context. In our previous unpersonalized image selection work, we used non-contextual bandits where we found the winning image regardless of the context. For personalization, the member is the context as we expect different members to respond differently to the images.
A key property of contextual bandits is that they are designed to minimize regret. At a high level, the training data for a contextual bandit is obtained through the injection of controlled randomization in the learned model’s predictions. The randomization schemes can vary in complexity from simple epsilon-greedy formulations with uniform randomness to closed loop schemes that adaptively vary the degree of randomization as a function of model uncertainty. We broadly refer to this process as data exploration. The number of candidate artworks that are available for a title along with the size of the overall population for which the system will be deployed informs the choice of the data exploration strategy. With such exploration, we need to log information about the randomization for each artwork selection. This logging allows us to correct for skewed selection propensities and thereby perform offline model evaluation in an unbiased fashion, as described later.
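To make the simplest scheme mentioned above concrete, here is a minimal sketch of epsilon-greedy selection that also logs the propensity of each choice, which is the piece of information needed later for unbiased offline evaluation. The function name, the image ids, and the scores are all made up for illustration; this is not the production implementation.

```python
import random


def epsilon_greedy_select(scores, epsilon, rng):
    """Pick an image id and return it with its selection propensity.

    scores:  dict mapping image id -> model-predicted play probability
             (hypothetical values, stand-in for a learned model).
    epsilon: probability of exploring uniformly at random.
    rng:     a random.Random instance, passed in for reproducibility.
    """
    images = list(scores)
    best = max(images, key=lambda image: scores[image])
    if rng.random() < epsilon:
        chosen = rng.choice(images)  # explore: uniform over all candidates
    else:
        chosen = best                # exploit: current best prediction
    # Probability that this policy selects `chosen`: every image gets an
    # equal share of epsilon, and the best image also gets 1 - epsilon.
    propensity = epsilon / len(images)
    if chosen == best:
        propensity += 1.0 - epsilon
    return chosen, propensity


rng = random.Random(0)
scores = {"img_a": 0.12, "img_b": 0.08, "img_c": 0.05}  # made-up predictions
log = []
for _ in range(1000):
    image, propensity = epsilon_greedy_select(scores, 0.1, rng)
    log.append({"image": image, "propensity": propensity})
```

Storing the propensity next to each impression is what allows skewed selection frequencies to be corrected for afterwards.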
Exploration in contextual bandits typically has a cost (or regret) due to the fact that our artwork selection in a member session may not use the predicted best image for that session. What impact does this randomization have on the member experience (and consequently on our metrics)? With over a hundred million members, the regret incurred by exploration is typically very small and is amortized across our large member base with each member implicitly helping provide feedback on artwork for a small portion of the catalog. This makes the cost of exploration per member negligible, which is an important consideration when choosing contextual bandits to drive a key aspect of our member experience. Randomization and exploration with contextual bandits would be less suitable if the cost of exploration were high.
Under our online exploration scheme, we obtain a training dataset that records, for each (member, title, image) tuple, whether that selection resulted in a play of the title or not. Furthermore, we can control the exploration such that artwork selections do not change too often. This gives a cleaner attribution of the member’s engagement to specific artwork. We also carefully determine the label for each observation by looking at the quality of engagement to avoid learning a model that recommends “clickbait” images: ones that entice a member to start playing but ultimately result in low-quality engagement.
Model training
In this online learning setting, we train our contextual bandit model to select the best artwork for each member based on their context. We typically have up to a few dozen candidate artwork images per title. To learn the selection model, we can consider a simplification of the problem by ranking images for a member independently across titles. Even with this simplification we can still learn member image preferences across titles because, for every image candidate, we have some members who were presented with it and engaged with the title and some members who were presented with it and did not engage. These preferences can be modeled to predict for each (member, title, image) tuple, the probability that the member will enjoy a quality engagement. These can be supervised learning models or contextual bandit counterparts with Thompson Sampling, LinUCB, or Bayesian methods that intelligently balance making the best prediction with data exploration.
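As one illustration of how Thompson Sampling balances prediction with exploration, the sketch below keeps an independent Beta posterior over the take rate for each (context, image) pair. It is a deliberately simplified stand-in: the context is a single coarse label rather than a feature vector, and the class name, contexts, image ids, and simulated take rates are all assumptions made for the example.

```python
import random
from collections import defaultdict


class BetaThompsonSampler:
    """Thompson Sampling with one Beta posterior per (context, image) pair."""

    def __init__(self, images, seed=None):
        self.images = list(images)
        self.rng = random.Random(seed)
        # [alpha, beta] parameters; [1, 1] is a uniform Beta(1, 1) prior.
        self.counts = defaultdict(lambda: [1, 1])

    def select(self, context):
        # Draw one plausible take rate per image from its posterior and
        # show the image with the largest draw. Uncertain images get
        # explored because their draws vary widely.
        draws = {
            image: self.rng.betavariate(*self.counts[(context, image)])
            for image in self.images
        }
        return max(draws, key=draws.get)

    def update(self, context, image, quality_play):
        # Index 0 accumulates quality plays, index 1 the other impressions.
        self.counts[(context, image)][0 if quality_play else 1] += 1


# Simulated environment with made-up take rates, for illustration only.
true_rates = {
    ("comedy_fan", "robin_williams"): 0.30,
    ("comedy_fan", "couple"): 0.10,
    ("romance_fan", "robin_williams"): 0.12,
    ("romance_fan", "couple"): 0.28,
}
sampler = BetaThompsonSampler(["robin_williams", "couple"], seed=0)
env = random.Random(1)
for _ in range(2000):
    context = env.choice(["comedy_fan", "romance_fan"])
    image = sampler.select(context)
    played = env.random() < true_rates[(context, image)]
    sampler.update(context, image, played)
```

A model such as LinUCB would replace the per-bucket counts with a shared parametric model over context features, so that evidence generalizes across members.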
Potential signals
In contextual bandits, the context is usually represented as a feature vector provided as input to the model. There are many signals we can use as features for this problem. In particular, we can consider many attributes of the member: the titles they’ve played, the genre of the titles, interactions of the member with the specific title, their country, their language preferences, the device that the member is using, the time of day and the day of week. Since our algorithm selects images in conjunction with our personalized recommendation engine, we can also use signals regarding what our various recommendation algorithms think of the title, irrespective of what image is used to represent it.
An important consideration is that some images are naturally better than others in the candidate pool. We observe the overall take rates for all the images in our data exploration, which is simply the number of quality plays divided by the number of impressions. Our previous work on unpersonalized artwork selection used overall differences in take rates to determine the single best image to select for a whole population. In our new contextual personalized model, the overall take rates are still important and personalization still recovers selections that agree on average with the unpersonalized model’s ranking.
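The take rate defined above (quality plays divided by impressions) can be computed directly from exploration logs. The sketch below assumes a hypothetical log format of (image id, quality play) pairs and uses made-up data.

```python
from collections import Counter


def take_rates(impressions):
    """Return {image_id: quality plays / impressions}.

    impressions: iterable of (image_id, quality_play) pairs, where
    quality_play is True when the impression led to a quality play.
    """
    shown = Counter()
    played = Counter()
    for image_id, quality_play in impressions:
        shown[image_id] += 1
        if quality_play:
            played[image_id] += 1
    return {image_id: played[image_id] / shown[image_id] for image_id in shown}


# Made-up exploration log for two candidate images of one title.
logged = [
    ("img_a", True), ("img_a", False), ("img_a", False), ("img_a", True),
    ("img_b", False), ("img_b", True), ("img_b", False), ("img_b", False),
]
rates = take_rates(logged)
# An unpersonalized policy would show the single image with the best rate.
best_overall = max(rates, key=rates.get)
```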
Image selection
The optimal assignment of image artwork to a member is a selection problem to find the best candidate image from a title’s pool of available images. Once the model is trained as above, we use it to rank the images for each context. The model predicts the probability of play for a given image in a given member context. We sort a candidate set of images by these probabilities and pick the one with the highest probability. That is the image we present to that particular member.
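The ranking step can be sketched in a few lines. The predictor here is a hypothetical lookup table standing in for a trained model, and the contexts, image ids, and probabilities are invented for the example.

```python
def select_image(candidates, predict_play_probability, member_context):
    """Rank candidate images by predicted play probability, best first.

    Returns (top_image, full_ranking).
    """
    ranking = sorted(
        candidates,
        key=lambda image: predict_play_probability(member_context, image),
        reverse=True,
    )
    return ranking[0], ranking


# Hypothetical stand-in for a trained model: a table of made-up scores.
TOY_SCORES = {
    ("comedy_fan", "robin_williams"): 0.30,
    ("comedy_fan", "couple"): 0.10,
    ("romance_fan", "robin_williams"): 0.12,
    ("romance_fan", "couple"): 0.28,
}


def toy_predictor(member_context, image):
    return TOY_SCORES[(member_context, image)]


candidates = ["robin_williams", "couple"]
comedy_pick, _ = select_image(candidates, toy_predictor, "comedy_fan")
romance_pick, _ = select_image(candidates, toy_predictor, "romance_fan")
```

With these toy scores the two contexts end up with different images for the same title, which is the behavior personalization is after.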
Performance evaluation
Offline
To evaluate our contextual bandit algorithms prior to deploying them online on real members, we can use an offline technique known as replay [1]. This method allows us to answer counterfactual questions based on the logged exploration data (Figure 1). In other words, we can compare offline what would have happened in historical sessions under different scenarios if we had used different algorithms in an unbiased way.
Figure 1: Simple example of calculating a replay metric from logged data. For each member, a random image was assigned (top row). The system logged the impression and whether the profile played the title (green circle) or not (red circle). The replay metric for a new model is calculated by matching the profiles where the random assignment and the model assignment are the same (black square) and computing the take fraction over that subset.
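The matching procedure in Figure 1 can be sketched as follows, assuming the logged image was assigned uniformly at random (otherwise each matched session would need to be reweighted by its logged propensity). The session format, the policies, and the data are hypothetical.

```python
def replay_take_fraction(logged_sessions, policy):
    """Estimate a policy's take fraction from randomly assigned impressions.

    Only sessions where the policy would have shown the same image that
    was actually shown are kept; the take fraction is computed on those.
    Returns (take_fraction or None, number_of_matched_sessions).
    """
    matched = 0
    plays = 0
    for session in logged_sessions:
        if policy(session["context"]) == session["shown_image"]:
            matched += 1
            plays += int(session["played"])
    if matched == 0:
        return None, 0
    return plays / matched, matched


# Made-up exploration log.
sessions = [
    {"context": "comedy_fan", "shown_image": "robin_williams", "played": True},
    {"context": "comedy_fan", "shown_image": "couple", "played": False},
    {"context": "romance_fan", "shown_image": "couple", "played": True},
    {"context": "romance_fan", "shown_image": "robin_williams", "played": False},
    {"context": "comedy_fan", "shown_image": "robin_williams", "played": False},
    {"context": "romance_fan", "shown_image": "couple", "played": True},
]


def contextual_policy(context):
    return "robin_williams" if context == "comedy_fan" else "couple"


def single_image_policy(context):
    return "robin_williams"


contextual_result = replay_take_fraction(sessions, contextual_policy)
single_result = replay_take_fraction(sessions, single_image_policy)
```

On this toy log the contextual policy matches four sessions with three plays, while the single-image policy matches three sessions with one play. Because only matched sessions count, replay needs a large exploration log to give stable estimates.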
Replay allows us to see how members would have engaged with our titles if we had hypothetically presented images that were selected through a new algorithm rather than the algorithm used in production. For images, we are interested in several metrics, particularly the take fraction, as described above.
Figure 2 shows how the contextual bandit approach helps increase the average take fraction across the catalog compared to random selection or non-contextual bandits.
Figure 2: Average image take fraction (the higher the better) for different algorithms based on replay from logged image explore data. The Random (green) policy selects one image at random. The simple Bandit algorithm (yellow) selects the image with highest take fraction. Contextual Bandit algorithms (blue and pink) use context to select different images for different members.
Figure 3: Example of contextual image selection based on the type of profile. Comedy refers to a profile that mostly watches comedy titles. Similarly, Romance watches mostly romantic titles. The contextual bandit selects the image of Robin Williams, a famous comedian, for comedy-inclined profiles while selecting an image of a kissing couple for profiles more inclined towards romance.
Online
After experimenting with many different models offline and finding ones that had a substantial increase in replay, we ultimately ran an A/B test to compare the most promising personalized contextual bandits against unpersonalized bandits. As we suspected, the personalization worked and generated a significant lift in our core metrics. We also saw a reasonable correlation between what we measured offline in replay and what we saw online with the models. The online results also produced some interesting insights. For example, the improvement of personalization was larger in cases where the member had no prior interaction with the title. This makes sense because we would expect that the artwork would be more important to someone when a title is less familiar.
Conclusion
With this approach, we’ve taken our first steps in personalizing the selection of artwork for our recommendations and across our service. This has resulted in a meaningful improvement in how our members discover new content… so we’ve rolled it out to everyone! This project is the first instance of personalizing not just what we recommend but also how we recommend to our members. But there are many opportunities to expand and improve this initial approach. These opportunities include developing algorithms to handle cold-start by personalizing new images and new titles as quickly as possible, for example by using techniques from computer vision. Another opportunity is extending this personalization approach across other types of artwork we use and other evidence that describes our titles such as synopses, metadata, and trailers. There is also an even broader problem: helping artists and designers figure out what new imagery we should add to the set to make a title even more compelling and personalizable.
If these types of challenges interest you, please let us know! We are always looking for great people to join our team, and, for these types of projects, we are especially excited by candidates with machine learning and/or computer vision expertise.
References
[1] L. Li, W. Chu, J. Langford, and X. Wang, “Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms,” in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, New York, NY, USA, 2011, pp. 297–306.