SLIDE 1 Implicit Feedback and Performance Evaluation in Recommender Systems
Shay Ben Elazar, Mike Gartrell, Noam Koenigstein, Gal Lavee
SLIDE 2 Agenda
- Intro: Universal Store Recommendations
- Extreme Classification with Matrix Factorization
- Offline Evaluation Techniques
- Online Evaluation
- The Gap
- Bridging The Gap…
SLIDE 3
Microsoft Universal Store Recommendations
SLIDE 4
Windows Store
SLIDE 5
Groove Music
SLIDE 6
Xbox
SLIDE 7
Extreme Classification with Matrix Factorization
SLIDE 8 History: Netflix Prize
[Illustration: sparse user-item matrix of explicit star ratings (1-5)]
SLIDE 9 Two-class data – Extreme Classification
[Illustration: user-item matrix with two-class (positive/negative) feedback]
SLIDE 10 One-class data
[Illustration: user-item matrix with one-class (positive-only) feedback]
SLIDE 11 Problem formulation
N ≈ 10K–1M nodes, M ≈ 10–500M nodes
Bipartite graph → we care about ? = p(link) for the unobserved user-item pairs
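As a rough sketch of what such a model predicts (names and dimensions are illustrative, not the production system), a matrix-factorization link predictor gives every user and item a latent vector and turns their dot product into a link probability:

import numpy as np

def link_probability(user_vec, item_vec, user_bias=0.0, item_bias=0.0):
    # Score one edge of the bipartite graph and squash it into [0, 1].
    score = user_vec @ item_vec + user_bias + item_bias
    return 1.0 / (1.0 + np.exp(-score))

# Toy usage with 20-dimensional latent traits for one user and one item.
rng = np.random.default_rng(0)
u, v = rng.normal(size=20), rng.normal(size=20)
print(link_probability(u, v))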
SLIDE 12
Fully Bayesian model based on Variational Bayes optimization
SLIDE 13
Offline Evaluation Techniques
SLIDE 14 RMSE - Root Mean Square Error
RMSE is computed by averaging the squared error over all user-item pairs (u, i) ∈ R:
RMSE = \sqrt{ \frac{1}{|\mathcal{R}|} \sum_{(u,i) \in \mathcal{R}} \mathrm{SE}_{ui} }, where \mathrm{SE}_{ui} = (r_{ui} - \hat{r}_{ui})^2 is the squared error for pair (u, i).
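A minimal sketch of this computation in Python (variable names are illustrative):

import numpy as np

def rmse(ratings, predictions):
    # Root mean square error over all observed user-item pairs.
    ratings, predictions = np.asarray(ratings, float), np.asarray(predictions, float)
    return np.sqrt(np.mean((ratings - predictions) ** 2))

print(rmse([4, 2, 5], [3.5, 2.5, 4.0]))  # ~0.707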
SLIDE 15 wRMSE - Weighted Root Mean Square Error
This variant of RMSE is obtained by assigning each data point a weight w_{ui} based on its importance:
wRMSE = \sqrt{ \frac{1}{\sum_{(u,i) \in \mathcal{R}} w_{ui}} \sum_{(u,i) \in \mathcal{R}} w_{ui} \cdot \mathrm{SE}_{ui} }
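The weighted variant only changes the averaging; a sketch under the same assumptions as the RMSE snippet:

import numpy as np

def weighted_rmse(ratings, predictions, weights):
    # Each squared error is scaled by its importance weight w_ui,
    # and the sum is normalized by the total weight.
    ratings, predictions, weights = (np.asarray(a, float) for a in (ratings, predictions, weights))
    squared_errors = (ratings - predictions) ** 2
    return np.sqrt(np.sum(weights * squared_errors) / np.sum(weights))

print(weighted_rmse([4, 2, 5], [3.5, 2.5, 4.0], [1.0, 1.0, 2.0]))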
SLIDE 16 Precision@k / Recall@k
Ground truth: Positive Result 1, Positive Result 2, Positive Result 3
Ranking induced by algorithm: Positive Result 1, Positive Result 3, Negative Result, Positive Result 2
k = 3
precision@k = 2/3   recall@k = 2/3
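A small sketch that reproduces the example above (item names are placeholders):

def precision_recall_at_k(ranked_items, positives, k):
    # Precision@k and Recall@k for a single user's ranked list.
    hits = sum(1 for item in ranked_items[:k] if item in positives)
    return hits / k, hits / len(positives)

ranking = ["pos1", "pos3", "neg", "pos2"]
print(precision_recall_at_k(ranking, {"pos1", "pos2", "pos3"}, k=3))  # (2/3, 2/3)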
SLIDE 17 Mean Average Precision
[Plot: precision as a function of recall, with recall at 0%, 33%, 67%, 100%]
We can plot precision as a function of recall
Ranking induced by algorithm: Positive Result 1, Positive Result 3, Negative Result, Positive Result 2
Average Precision is the average of the precision values taken at the ranks where the positive results appear.
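A sketch of average precision for one ranked list; MAP is then the mean of this value over all test users (item names are placeholders):

def average_precision(ranked_items, positives):
    # Average the precision@i values at every rank i where a positive item appears.
    hits, precisions = 0, []
    for i, item in enumerate(ranked_items, start=1):
        if item in positives:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(positives) if positives else 0.0

print(average_precision(["pos1", "pos3", "neg", "pos2"], {"pos1", "pos2", "pos3"}))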
SLIDE 18 NDCG - Normalized Discounted Cumulative Gain
The relevance at rank j is discounted by \delta_j = \frac{1}{\log_2(j+1)}, and the sum @k is normalized by its upper bound, the IDCG.
Ground truth: Positive Result 1 (relevance 5), Positive Result 2 (relevance 3), Positive Result 3 (relevance 1)
Ranking induced by algorithm: Positive Result 1, Positive Result 3, Negative Result, Positive Result 2
k = 3
DCG@k = \frac{1}{\log_2(1+1)} + 0 + \frac{5}{\log_2(3+1)} = 3.5
IDCG@k = \frac{5}{\log_2(1+1)} + \frac{3}{\log_2(2+1)} + \frac{1}{\log_2(3+1)} = 7.39
NDCG@k = \frac{3.5}{7.39} ≈ 0.47
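A sketch of the computation; the relevance list is ordered as the algorithm ranked the items and is chosen to reproduce the slide's DCG@k = 3.5:

import math

def dcg_at_k(relevances, k):
    # Relevance at rank j is discounted by 1 / log2(j + 1).
    return sum(rel / math.log2(j + 1) for j, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    # Normalize DCG by its upper bound, the ideal DCG of the sorted relevances.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([1, 0, 5, 3], k=3))  # ~0.47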
SLIDE 19 MPR - Mean Percentile Rank
Sometimes there is only one "positive" item in the test set...
Ground truth: Positive Result 1
Ranking induced by algorithm: Positive Result 1, Positive Result 3, Negative Result, Positive Result 2, Negative Result, Negative Result
rank_i = 3, MPR = 0.5
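A sketch of the percentile rank of a single positive item; MPR averages this quantity over all test cases. Conventions differ (rank/N vs. (rank-1)/(N-1)); the rank/N form used here reproduces the 0.5 above:

def percentile_rank(ranked_items, positive_item):
    # 1-based rank of the positive item, normalized by the list length.
    rank = ranked_items.index(positive_item) + 1
    return rank / len(ranked_items)

ranking = ["neg", "neg", "pos1", "neg", "neg", "neg"]
print(percentile_rank(ranking, "pos1"))  # 0.5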
SLIDE 20
MPR in Xbox
SLIDE 21 Spearman’s Rho Coefficient
In scenarios where we want to emphasize the full ranking, we may compare the ranking of the algorithm to a reference ranking.
Ground truth ranking: Result 1, Result 2, Result 3, Result 4
Ranking induced by algorithm: Result 1, Result 3, Result 4, Result 2
Rank differences: r_1 - \hat{r}_1 = 1 - 3,  r_2 - \hat{r}_2 = 2 - 4,  r_3 - \hat{r}_3 = 3 - 1,  r_4 - \hat{r}_4 = 4 - 2
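From these rank differences, Spearman's rho follows the standard closed form 1 - 6·Σd²/(n(n²-1)); a minimal sketch:

def spearman_rho(true_ranks, predicted_ranks):
    # Spearman's rho from squared rank differences.
    n = len(true_ranks)
    d_squared = sum((r - r_hat) ** 2 for r, r_hat in zip(true_ranks, predicted_ranks))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# The differences above: 1-3, 2-4, 3-1, 4-2.
print(spearman_rho([1, 2, 3, 4], [3, 4, 1, 2]))  # -0.6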
SLIDE 22 Kendall’s Tau Coefficient
In scenarios where we want to emphasize the full ranking, we may compare the ranking of the algorithm to a reference ranking.
Ground truth ranking: Positive Result 1, Positive Result 2, Positive Result 3
Ranking induced by algorithm: Positive Result 1, Positive Result 3, Negative Result, Positive Result 2, Negative Result
Same order: \mathrm{sign}(r_1 - r_2) \cdot \mathrm{sign}(\hat{r}_1 - \hat{r}_2) = 1
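Kendall's tau counts how many item pairs keep the same order in both rankings versus how many flip; a brute-force sketch:

def kendall_tau(true_ranks, predicted_ranks):
    # (concordant pairs - discordant pairs) / total number of pairs.
    n = len(true_ranks)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (true_ranks[i] - true_ranks[j]) * (predicted_ranks[i] - predicted_ranks[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau([1, 2, 3, 4], [3, 4, 1, 2]))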
SLIDE 23 Offline Techniques – Open Questions
- How do we measure the importance / relevance of the positive items?
- Long-tail items are more important, but how do we quantify that?
- How many items do we care to recommend?
- Should the best item be the first item?
- Maybe the best item should be in the middle?
- What about diversity?
- What about contextual effects?
- What about item fatigue?
SLIDE 24
Online Experimentation
SLIDE 25 Online Experiments
- Randomized controlled experiments
- Measure KPIs (Key Performance Indicators) directly
- Can compare several variants simultaneously
- The ultimate evaluation technique!
SLIDE 26
Online Experiments in Xbox
SLIDE 27 Game Purchase
Direct Purchases
SLIDE 28 Total Game Purchase
Total Purchases
SLIDE 29 Experimentation Caveats
- What KPIs to measure?
- How long to run the experiment?
- External factors may influence the results
- Cannibalization is hard to account for
- Expensive to implement
- Can’t compare algorithms before “lighting up”
SLIDE 30
The Gap
SLIDE 31
Accuracy and Diversity Interactions
SLIDE 32 Characterizing The Offline / Online Evaluation Gap
- Overemphasis of popular items
- List recommendations (diversity, item position)
- Freshness / fatigue
- Contextual information is not fully utilized
- Learning from historical data lets you predict the future. But what we really care about is changing the future!
SLIDE 33
Bridging The Gap
SLIDE 34 Mitigating Evaluation Techniques
- Domain experts / focus groups
- Internal user studies
- Off-policy evaluation techniques
SLIDE 35 Off-Policy Evaluation - Example
\hat{V}_h^{\pi}(S) is the expected reward of a policy h given data S collected by a "logging policy" \pi:
\hat{V}_h^{\pi}(S) = \frac{1}{|S|} \sum_{(x, a, r) \in S} \frac{r \cdot \mathbb{I}[h(x) = a]}{\max(\hat{\pi}(a \mid x), \tau)}
where S denotes the set of context-action-reward tuples (x, a, r) available in the logs.
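A sketch of this estimator with hypothetical names: each logged tuple carries the context, the logged action, the observed reward, and the logging policy's propensity for that action; rewards count only when the candidate policy h picks the logged action, and propensities are clipped at tau:

def off_policy_value(logs, policy, tau=0.01):
    # Clipped importance-weighted estimate of the expected reward of `policy`.
    total = 0.0
    for context, action, reward, propensity in logs:
        if policy(context) == action:               # the indicator I[h(x) == a]
            total += reward / max(propensity, tau)  # clip small propensities at tau
    return total / len(logs)

# Toy usage: integer contexts, a candidate policy that recommends item (x % 2).
logs = [(0, 0, 1.0, 0.5), (1, 1, 0.0, 0.25), (2, 1, 1.0, 0.2)]
print(off_policy_value(logs, policy=lambda x: x % 2))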
SLIDE 36 Caveats of Off-policy Evaluation
- Need to formulate everything in terms of a policy
- Needs sufficient support
- Becomes very difficult when your policies are time-dependent
SLIDE 37
We are looking for postdoc researchers to join us in Israel… Email: RecoRecruitmentEmail@microsoft.com
Thank you!