Recommender Systems

Alban Galland (INRIA-Saclay), 18 March 2010

Introduction: What is this lecture about?
◮ What is the purpose of a recommender system?
◮ What are the key features?
◮ How does it work?
◮ What are the main challenges?
◮ When to use it?
◮ How to design it?

Content
1 Who uses a recommender system?
2 What tasks and data correspond to a recommendation problem?
3 How to do it?
    Content-filtering algorithms
    Collaborative-filtering algorithms
        Not personalized
        User-based
        Item-based
    Hybrid methods
4 To go further
    Interesting issues
    Bibliography

1 Who uses a recommender system?

Content site
Examples: AlloCine, Zagat, LibraryThing, Last.fm, Pandora, StumbleUpon
Task: predict the ratings of items by a given user, or find a list of interesting items
Data: precise content description, explicit ratings for some users
[Figure: Recommendation on LibraryThing]

eCommerce site
Examples: Amazon, Netflix
Task: build groups of products for bundle sales or, more generally, find a list of products that the user is likely to buy
Data: list of purchases and browsing history for all users
[Figure: Recommendation on Amazon]

eCommerce site: The Netflix challenge
◮ $1M prize competition
◮ Input: huge training dataset
◮ Goal: improve the root mean square error of the predictions by 10% compared to the Netflix algorithm
◮ 40000+ teams from 186 countries (5000+ teams with valid submissions)
◮ Began in October 2006, winners in June 2009

Advertisement
Examples: Google AdSense, DoubleClick
Task: find a list of advertisements optimized according to expected income
Data: browsing history for all users
[Figure: Recommendation on Google]
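The Netflix challenge goal above is stated in terms of root mean square error (RMSE). As a minimal sketch of what that metric measures, the following computes the RMSE of a list of predicted ratings against the true ratings, and the relative improvement of one score over a baseline. The function names `rmse` and `relative_improvement` are illustrative and do not come from the lecture.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse is lower than baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse
```

Under this definition, a 10% improvement means `relative_improvement(baseline, new)` reaches 0.10, where `baseline` would be the RMSE of the reference algorithm on the same test ratings.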

2 What tasks and data correspond to a recommendation problem?

What to do with data?
Two kinds of problem with data:
◮ Information retrieval (IR): static content, dynamic query ⇒ modeling content (organized with an index)
◮ Information filtering (IF): dynamic content, static query ⇒ modeling the query (organized as filters)
Recommendation lies between IR and IF, since the content varies slowly and the queries depend on few parameters. Methods from both IR and IF are therefore used to reduce computation at query time.

Task (1): Degree of personalization
◮ Generic: everyone receives the same recommendations
◮ Demographic: everyone in the same category receives the same recommendations
◮ Contextual: the recommendation depends only on the current activity
◮ Persistent: the recommendation depends on long-term interests

Task (2): General purpose
◮ Top-k filtering: list of “best” items (main usage) or anti-spam
◮ Item correlation: find similar items
◮ Prediction of rating: predict the affinity between any pair of a user and an item (more general)
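The top-k filtering task listed above can be sketched as follows, assuming predicted affinity scores are already available for each item. The function name `top_k`, the `already_seen` parameter, and the score dictionary layout are assumptions made for illustration; the lecture does not prescribe them.

```python
import heapq

def top_k(scores, k, already_seen=()):
    """Return the k items with the highest predicted score.

    scores: dict mapping item id -> predicted affinity (hypothetical layout).
    already_seen: items to exclude, e.g. those the user has already rated.
    """
    seen = set(already_seen)
    candidates = ((score, item) for item, score in scores.items()
                  if item not in seen)
    # nlargest keeps only k candidates in memory and returns them sorted
    # from best to worst.
    return [item for score, item in heapq.nlargest(k, candidates)]
```

With this framing, rating prediction is the more general task: once an affinity is predicted for every user and item pair, a top-k list is a selection over those predictions.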

Data (1)
◮ Context of the current page (current request, item currently explored, and structured content about this context)
◮ History of the current user on the system (explicit or implicit ratings)
◮ History of all users on the system
◮ History of the current user on multiple systems, the whole web, or even on their computer
◮ History of all users on multiple systems, the whole web, or even their computers

Data (2)
In general, three matrices as input:
◮ User attributes
◮ Item attributes
◮ Rating matrix

Explicit ratings
Numeric ratings:
◮ Numeric scale, usually between 2 levels (thumb up/thumb down) and 15 levels (between A+ and E-).
◮ The more levels you have, the more data you get, but the more variance you have in these data.
◮ Numeric ratings should be normalized.
Partial order: comparison between two items
Semantic information: tags, labels

Implicit ratings
Based on interaction and time:
◮ purchase
◮ clicks
◮ browsing (page view time)
◮ cursor on the page
Used to generate an implicit numeric rating
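The slides state that numeric ratings should be normalized but do not say how. One common choice, shown here as an assumption rather than the lecture's method, is to subtract each user's mean rating so that a generous rater and a strict rater become comparable. Missing ratings are represented by `None`, and the function name `mean_center` is illustrative.

```python
def mean_center(row):
    """Subtract the user's mean rating from each rating in one row.

    row: list of ratings for one user, with None for unrated items.
    Unrated items stay None; a row with no ratings is returned unchanged.
    """
    rated = [r for r in row if r is not None]
    if not rated:
        return list(row)
    mean = sum(rated) / len(rated)
    return [None if r is None else r - mean for r in row]
```

After centering, a positive value means "better than this user's average" and a negative value means "worse", regardless of the scale each user tends to use.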

3 How to do it?

General scope
◮ Purely editorial (still used for some advertisement)
◮ Content filtering: depends on the attributes of items
◮ Collaborative filtering: depends on the ratings of all users
◮ Hybrid

Content-filtering algorithms
A content-filtering algorithm usually means an algorithm based on the attributes of the items and the ratings of the targeted user.
It interprets the preferences of users as a function of the attributes.
Two main methods:
◮ Heuristic-based: use common information-retrieval techniques presented earlier in the course: TF/IDF, cosine, clustering...
◮ Model-based: use a probabilistic model to learn to predict user preferences from the attributes

Collaborative-filtering algorithms: Direct aggregation
A collaborative-filtering algorithm usually means an algorithm based on the rating matrix.
The recommender system displays some summary statistics:
◮ the average rating of the users
◮ the average rating of professional reviewers
◮ a set of reviews by users or by professional reviewers
Some basic techniques, such as explicit voting or date, are used to rank the reviews.
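The heuristic-based content filtering described above can be sketched with the cosine measure. This is one possible reading under stated assumptions, not the lecture's algorithm: each item has a numeric attribute vector, the user's profile is the rating-weighted average of the attribute vectors of the items they rated (ratings assumed positive), and unseen items are ranked by cosine similarity to that profile. All names and the data layout are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is null)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def user_profile(item_attrs, user_ratings):
    """Rating-weighted average of the attribute vectors of rated items."""
    dim = len(next(iter(item_attrs.values())))
    profile = [0.0] * dim
    total = sum(user_ratings.values())
    for item, rating in user_ratings.items():
        for d, value in enumerate(item_attrs[item]):
            profile[d] += rating * value
    return [p / total for p in profile] if total else profile

def rank_unseen(item_attrs, user_ratings):
    """Items the user has not rated, most similar to the profile first."""
    profile = user_profile(item_attrs, user_ratings)
    unseen = [item for item in item_attrs if item not in user_ratings]
    return sorted(unseen, key=lambda i: cosine(profile, item_attrs[i]),
                  reverse=True)

# Hypothetical attributes: [action, comedy, romance]
item_attrs = {"m1": [1, 0, 0], "m2": [0, 1, 0], "m3": [1, 0, 1], "m4": [0, 1, 1]}
user_ratings = {"m1": 5, "m2": 1}
ranking = rank_unseen(item_attrs, user_ratings)
```

Note that only the targeted user's own ratings are used here, which is what distinguishes content filtering from the collaborative methods based on the ratings of all users.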

Collaborative-filtering algorithms: User-based collaborative filtering
[Diagram with labels: user, correlation, users, aggregation, ratings]
◮ For each user $u_i$, compute the correlation with the other users
◮ For each item $i_k$, aggregate the ratings of $i_k$ by the users highly correlated with $u_i$
Problem: sparsity of data (little information about each user) ⇒ bad correlation, easy to attack (cf. the cold-start and attack issues)

Collaborative-filtering algorithms: Some correlation methods
Let $U_i$ be the vector of ratings of user $u_i$ (seen as a row).
◮ Scalar product similarity: $\mathrm{sim}(u_i, u_j) = U_i \, {}^tU_j$
◮ Cosine similarity: $\mathrm{sim}(u_i, u_j) = \dfrac{U_i \, {}^tU_j}{\|U_i\| \, \|U_j\|}$
◮ Another one: $\mathrm{sim}(u_i, u_j) = \dfrac{U_i \, {}^tU_j}{\|U_i\|^2}$
Usually, $U_i$ has to be normalized to get meaningful results.

Collaborative-filtering algorithms: Some aggregation methods
Let $\hat{r}(u_i, i_k)$ be the rating prediction for user $u_i$ and item $i_k$.
Let $S_t(u_i) = \{u_j, \ \mathrm{sim}(u_i, u_j) > t\}$ be the users highly correlated with $u_i$ for a threshold $t$.
◮ Mean over the best users: $\hat{r}(u_i, i_k) = \dfrac{1}{|S_t(u_i)|} \sum_{u_j \in S_t(u_i)} r(u_j, i_k)$
◮ Weighted average over the best users: $\hat{r}(u_i, i_k) = \dfrac{\sum_{u_j \in S_t(u_i)} \mathrm{sim}(u_i, u_j) \, r(u_j, i_k)}{\sum_{u_j \in S_t(u_i)} \mathrm{sim}(u_i, u_j)}$
The choice of $S_t(u_i)$ is usually delicate, since it is a trade-off between sparsity and noise.

Collaborative-filtering algorithms: Application on an example
What would you predict for user1 on item5, item6 and item7?

user    item1  item2  item3  item4  item5  item6  item7
user1     5      3      4      1      ?      ?      ?
user2     5      3      4      1      5      2      5
user3     5      ?      4      1      5      3      ?
user4     1      3      2      5      1      4      2
user5     4      ?      4      4      4      ?      4
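The user-based method can be tried on the example table above. The sketch below combines the cosine similarity with the weighted-average aggregation, under assumptions the slides leave open: the cosine is computed on raw (non-normalized) ratings, restricted to items both users rated; users who have not rated the target item are skipped; and the threshold `t = 0.9` is an arbitrary illustrative value.

```python
import math

# Rating table from the example; None marks a missing rating ("?").
ratings = {
    "user1": [5, 3, 4, 1, None, None, None],
    "user2": [5, 3, 4, 1, 5, 2, 5],
    "user3": [5, None, 4, 1, 5, 3, None],
    "user4": [1, 3, 2, 5, 1, 4, 2],
    "user5": [4, None, 4, 4, 4, None, 4],
}

def cosine_sim(a, b):
    """Cosine similarity over the items rated by both users (an assumption)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(user, item, threshold=0.9):
    """Similarity-weighted average of the item's ratings over S_t(user).

    item is a 0-based column index. Returns None when no sufficiently
    similar user has rated the item.
    """
    num = den = 0.0
    for other, row in ratings.items():
        if other == user or row[item] is None:
            continue
        sim = cosine_sim(ratings[user], row)
        if sim > threshold:
            num += sim * row[item]
            den += sim
    return num / den if den else None

predictions = {f"item{k + 1}": predict("user1", k) for k in (4, 5, 6)}
```

With these choices, user2 and user3 match user1 exactly on their co-rated items and are the only users above the threshold, so the predictions come out at about 5 for item5, 2.5 for item6, and 5 for item7. The similarity with user5 is about 0.89, just under the threshold, which illustrates how sensitive the result is to the choice of $S_t(u_i)$. Because raw ratings are all positive, even the dissimilar user4 gets a cosine of about 0.6, which is why the slides recommend normalizing $U_i$ first.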
