The Netflix Recommender System: Algorithms, Business Value, and Innovation

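User research found that Netflix members lose interest after roughly 60 to 90 seconds of browsing, by which point they have looked at 10 to 20 titles across one or two screens. The goal of the recommender system is to help each member find something of interest within those two screens.
Among the signals used is how each member watches (e.g., the device, time of day, day of week, intensity of watching).
There are several recommendation strategies: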
1) Personalized Video Ranker (PVR)
Orders the entire catalog of videos (or subsets selected by genre or other filtering) for each member profile in a personalized way.
Because we use PVR so widely, it must be good at general-purpose relative rankings throughout the entire catalog; this limits how personalized it can actually be.
In other words, PVR has to rank every video within a category, and do so for every category, which in practice limits how personalized it can be.
2) Top-N Video Ranker
Finds the best few personalized recommendations in the entire catalog for each member, that is, it focuses only on the head of the ranking, a freedom that PVR does not have because it gets used to rank arbitrary subsets of the catalog.
The Top-N ranker only has to rank the head of the catalog and pick out the top N, so it has more freedom in its approach than PVR. The two nevertheless share many of the same properties.
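The difference between the two rankers can be sketched in a few lines. This is only an illustration under stated assumptions: the `scores` and `genres` tables and the helper names `rank_subset` and `top_n` are hypothetical, and a real system would get its scores from trained models rather than a lookup table.

```python
import heapq

# Hypothetical per-member relevance scores (made up for illustration).
scores = {
    "Drama A": 0.91, "Comedy B": 0.40, "Drama C": 0.75,
    "Thriller D": 0.62, "Comedy E": 0.88,
}
genres = {
    "Drama A": "drama", "Comedy B": "comedy", "Drama C": "drama",
    "Thriller D": "thriller", "Comedy E": "comedy",
}

def rank_subset(scores, candidates):
    """PVR-style use: order an arbitrary subset (e.g. one genre row) by score."""
    return sorted(candidates, key=lambda v: scores[v], reverse=True)

def top_n(scores, n):
    """Top-N-style use: only the head of the full-catalog ranking matters."""
    return heapq.nlargest(n, scores, key=scores.get)

comedy_row = rank_subset(scores, [v for v, g in genres.items() if g == "comedy"])
top_picks = top_n(scores, 2)
```

The PVR-style function must give sensible relative orderings for whatever subset it is handed, while the Top-N function is judged only on the few items it returns.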
3) Trending Now
Used to drive the Trending Now row. It works especially well in two situations:

  • Seasonal trends, such as Valentine's Day
  • Short-term, real-time events, such as a hurricane
4) Continue Watching
The Continue Watching ranker sorts the subset of recently viewed titles based on our best estimate of whether the member intends to resume watching or rewatch. The main features are:
  • Time elapsed since the title was last watched
  • The point of abandonment (mid-program, beginning, or end)
  • The device used
  • Whether other [related] titles have been watched
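As a rough illustration of how the features listed above could feed a ranker, here is a toy scoring sketch. Everything in it is an assumption made for illustration: the `ResumeFeatures` fields, the hand-picked `WEIGHTS`, and the logistic form are hypothetical and are not taken from the paper, which does not describe the actual model.

```python
import math
from dataclasses import dataclass

@dataclass
class ResumeFeatures:
    days_since_last_view: float   # time elapsed since the title was last watched
    fraction_watched: float       # 0.0 = abandoned at the start, 1.0 = finished
    same_device: bool             # is the member on the device used last time?
    watched_other_titles: bool    # have other titles been watched since?

# Hypothetical weights for illustration only; a real ranker would learn them from data.
WEIGHTS = {"recency": -0.15, "mid_program": 2.0, "same_device": 0.5, "other_titles": -0.7}

def resume_score(f: ResumeFeatures) -> float:
    # Peaks at 1.0 when the member stopped mid-program (fraction near 0.5).
    mid_program = 1.0 - abs(f.fraction_watched - 0.5) * 2.0
    z = (WEIGHTS["recency"] * f.days_since_last_view
         + WEIGHTS["mid_program"] * mid_program
         + WEIGHTS["same_device"] * f.same_device
         + WEIGHTS["other_titles"] * f.watched_other_titles)
    return 1.0 / (1.0 + math.exp(-z))  # squash to the interval (0, 1)

recent = {
    "Series X": ResumeFeatures(1, 0.5, True, False),
    "Movie Y": ResumeFeatures(30, 0.95, False, True),
}
row = sorted(recent, key=lambda t: resume_score(recent[t]), reverse=True)
```

With these made-up weights, a series paused halfway through yesterday ranks above a movie that was nearly finished a month ago.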
5) Video-Video Similarity
An unpersonalized algorithm that computes a ranked list of videos (the similars) for every video in our catalog. Although the similars themselves are unpersonalized, the choice of which Because You Watched (BYW) rows make it onto a homepage is personalized.
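One simple way to produce an unpersonalized ranked list of similars for every video is cosine similarity over co-watch sets. This is a swapped-in, textbook technique used here only as a sketch; the notes do not say how Netflix computes similarity, and the `histories` data and `build_similars` helper are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical viewing histories: member -> set of videos watched.
histories = {
    "m1": {"A", "B", "C"},
    "m2": {"A", "B"},
    "m3": {"B", "C"},
    "m4": {"A", "D"},
}

def build_similars(histories):
    """Return, for every video, the other videos ranked by co-watch cosine similarity."""
    viewers = defaultdict(set)
    for member, videos in histories.items():
        for v in videos:
            viewers[v].add(member)
    similars = {}
    for v, seen_by in viewers.items():
        scored = []
        for other, other_seen in viewers.items():
            if other == v:
                continue
            overlap = len(seen_by & other_seen)
            if overlap:
                sim = overlap / math.sqrt(len(seen_by) * len(other_seen))
                scored.append((sim, other))
        # Sort by similarity descending, then by title for a stable order.
        scored.sort(key=lambda s: (-s[0], s[1]))
        similars[v] = [other for _, other in scored]
    return similars

similars = build_similars(histories)
```

The output is the same for every member, which matches the unpersonalized nature of the similars; personalization would enter later, when deciding which rows built from these lists to show.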
6) Page Generation: Row Selection and Ranking
Selects and orders rows from a large pool of candidates to create an ordering optimized for relevance and diversity. (How are relevance and diversity evaluated? See a recent blog post, Learning a Personalized Homepage.)
7) Evidence
Evidence selection algorithms evaluate all the possible evidence items that we can display for every recommendation, to select the few that we think will be most helpful to the member viewing the recommendation. In short, this covers choosing and presenting the reasons behind a recommendation.
For example, they decide whether to show that a certain movie won an Oscar or instead show the member that the movie is similar to another video recently watched by that member.
8) Search
a) Search recommends videos for a given query as alternative results for a failed search.
b) What we know about the searching member's taste is also especially important for us.
  • One algorithm attempts to find the videos that match a given query
  • Another algorithm predicts interest in a concept given a partial query
  • A third algorithm finds video recommendations for a given concept
  1. Business value
    The effective catalog size (ECS) is a metric that describes how spread viewing is across the items in our catalog. It tells us how many videos are required to account for a typical hour streamed.
    ECS is computed as follows:


    [Image: ECS formula]

    Note that $p_i \ge p_{i+1}$ for $i = 1, \dots, N-1$ and $\sum_{i=1}^{N} p_i = 1$.
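The formula image is missing from these notes. As best it can be reconstructed from the paper (worth checking against the original), the definition is ECS = 2 * (sum over i of p_i * i) - 1, where p_i is the share of hours streamed that came from the i-th most popular video. The sketch below computes it under that assumption; the function name and the sample inputs are made up for illustration.

```python
def effective_catalog_size(hours_by_video):
    """ECS = 2 * sum_i(p_i * i) - 1, with shares p_i sorted in decreasing order.

    Assumed reconstruction of the formula; `hours_by_video` is a list of
    hours streamed per video, in any order.
    """
    total = sum(hours_by_video)
    shares = sorted((h / total for h in hours_by_video), reverse=True)
    return 2 * sum(p * rank for rank, p in enumerate(shares, start=1)) - 1

uniform = effective_catalog_size([10, 10, 10, 10])  # viewing spread evenly over 4 videos
single = effective_catalog_size([40, 0, 0, 0])      # all viewing on a single video
```

Under this definition the metric runs from 1 (all hours on one video) up to N (hours spread evenly over N videos), which fits the description of it as the number of videos needed to account for a typical hour streamed.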

  2. Metrics
    Intuition does not always match online results. For example, for House of Cards, related recommendations that look more similar to the show perform worse than a broader set of results.
    We have observed that improving engagement (the time that our members spend viewing Netflix content) is strongly correlated with improving retention.
    Statistical significance depends heavily on the number of members in each test cell. For example, if we find that 50% of the members in the test have retained when we compute our retention metric, then we need roughly 2 million members per cell to measure a retention delta of 50.05% - 49.95% = 0.1% with statistical confidence. This type of plot can be used as a guide to choose the sample size for the cells in a test. For example, detecting a retention delta of 0.2% requires the sample size traced by the black line labeled 0.2%, which changes as a function of the average retention rate when the experiment stops, being maximum (south of 500k members per cell) when the retention rate is 50%.


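    [Image: plot of required sample size per cell against average retention rate, one line per retention delta]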
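The sample sizes quoted above can be roughly reproduced with a back-of-the-envelope calculation. The sketch assumes a two-sided z-test at 95% confidence on the difference between two proportions and ignores statistical power; the paper does not state its exact method, so this is an assumption that happens to give numbers of the same order. The function name `members_per_cell` is hypothetical.

```python
import math
from statistics import NormalDist

def members_per_cell(retention, delta, confidence=0.95):
    """Approximate cell size for a two-cell test to resolve `delta` in retention.

    Assumes a two-sided z-test on the difference of two proportions and
    ignores statistical power, so treat the result as a rough lower bound.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    variance_of_difference = 2 * retention * (1 - retention)
    return math.ceil(z ** 2 * variance_of_difference / delta ** 2)

n_small_delta = members_per_cell(0.50, 0.001)   # roughly 1.9 million
n_larger_delta = members_per_cell(0.50, 0.002)  # roughly 480 thousand
```

Because the required size scales with retention * (1 - retention), it peaks at a 50% retention rate and shrinks as retention moves toward 0% or 100%; halving the detectable delta quadruples the required size.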

    Offline testing speeds up iteration: offline experiments allow us to iterate quickly on algorithm prototypes, and to prune the candidate variants that we use in actual A/B experiments.

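  3. Key problems
    1) Better Experimentation Protocols
    Better offline and online evaluation metrics are still needed to capture the overall benefit of a change, for example when weighing long-term gains against short-term ones.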
    2) Global Algorithms
    3) Controlling for Presentation Bias
    Introduce randomness into the recommendations.
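One way to picture "introducing randomness" is a small amount of random swapping in an otherwise ranked list, so that lower-ranked items occasionally get exposure and their true appeal can be measured. The scheme below is an assumption chosen for illustration, not the method described in the paper; `explore` and `epsilon` are hypothetical names.

```python
import random

def explore(ranked, epsilon, rng):
    """With probability epsilon per slot, swap in a uniformly chosen later item."""
    result = list(ranked)
    for i in range(len(result) - 1):
        if rng.random() < epsilon:
            j = rng.randrange(i + 1, len(result))
            result[i], result[j] = result[j], result[i]
    return result

ranked = ["A", "B", "C", "D", "E"]
unchanged = explore(ranked, epsilon=0.0, rng=random.Random(0))
shuffled = explore(ranked, epsilon=0.3, rng=random.Random(0))
```

Logging which slots were randomized would then let feedback on those slots be analyzed separately from feedback that was shaped by the ranker's own placement.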
    4) Page Construction
    It took us a couple of years to find a fully personalized algorithm to construct a page of recommendations that A/B tested better than a page based on a template (itself optimized through years of A/B testing).
    5) Member Coldstarting
    Today, our member coldstart approach has evolved into a survey given during the sign-up process, during which we ask new members to select videos from an algorithmically populated set that we use as input into all of our algorithms.
    6) Choosing the Best Evidence to Support Each Recommendation
    Highlight different aspects of a video, such as an actor or director involved in it.

  4. Further reading
    Learning a Personalized Homepage


    [Image]

    We want our recommendations to be accurate in that they are relevant to the tastes of our members, but they also need to be diverse so that we can address the spectrum of a member's interests versus only focusing on one. We want to be able to highlight the depth in the catalog we have in those interests and also the breadth we have across other areas to help our members explore and even find new interests. We want our recommendations to be fresh and responsive to the actions a member takes, such as watching a show, adding to their list, or rating; but we also want some stability so that people are familiar with their homepage and can easily find videos they've been recommended in the recent past.
    The page is a two-dimensional layout of multiple rows: reading across a row naturally provides relevance, while reading down the page naturally provides diversity.
    Factors we consider important include:

  • the quality of the videos in the row,
  • the amount of diversity on the page,
  • the affinity of members for specific kinds of rows,
  • and the quality of the evidence we can surface for each video.

A simple way to add in diversity is to switch from a row-ranking approach to a stage-wise approach using a scoring function that considers both a row as well as its relationship to both the previous rows and the previous videos already chosen for the page. Other approaches to greedily add diversity based on submodular function maximization can also be used.
Diversity can additionally be incorporated into the scoring model when considering the features of a row compared to the rest of the page, by looking at how similar the row is to the rest of the rows, or how similar the videos in the row are to the videos on the rest of the page.
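The stage-wise idea can be sketched as a greedy loop that rescores the remaining candidate rows against what is already on the page. This is a minimal illustration under assumptions: the overlap penalty, the `diversity_weight` parameter, the `build_page` helper, and the sample rows with their relevance numbers are all hypothetical rather than taken from the blog post.

```python
def build_page(rows, num_rows, diversity_weight=0.5):
    """Greedily pick rows; each stage rescores candidates against the page so far.

    `rows` maps a row name to (relevance, non-empty list of videos). The score
    is the row's relevance minus a penalty for videos already on the page.
    """
    page, shown = [], set()
    candidates = dict(rows)
    while candidates and len(page) < num_rows:
        def score(name):
            relevance, videos = candidates[name]
            overlap = len(shown & set(videos)) / len(videos)
            return relevance - diversity_weight * overlap
        best = max(candidates, key=score)
        page.append(best)
        shown.update(candidates.pop(best)[1])
    return page

# Hypothetical candidate rows with made-up relevance scores.
rows = {
    "Top Picks": (0.9, ["A", "B", "C"]),
    "Popular Dramas": (0.8, ["A", "B", "D"]),
    "Comedies": (0.6, ["E", "F", "G"]),
}
page = build_page(rows, num_rows=2)
```

With the penalty switched off (`diversity_weight=0.0`) the loop reduces to plain row ranking and picks the two most relevant rows even though they share most of their videos; with the penalty on, the second slot goes to the row that adds new videos to the page.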
