
CS490W: What is Collaborative Filtering?



CS-490W
Collaborative Filtering
Luo Si
Department of Computer Science, Purdue University

Outline
• Introduction to collaborative filtering
• Main framework
• Memory-based collaborative filtering approach
• Model-based collaborative filtering approach
  – Aspect model & two-way clustering model
  – Flexible mixture model
  – Decoupled model
• Unified filtering by combining content-based and collaborative filtering

What is Collaborative Filtering?
Collaborative Filtering (CF): making recommendation decisions for a specific user based on the judgments of users with similar tastes. In the example below, the test user's missing rating ("?") is to be predicted from the ratings of the two training users:

                      O1      O2      O3      O4      O5
Train_User 1           1       5       3       3       4
Train_User 2           4       1       5       3       2
Test User              1       ?       3               4

Content-Based vs. Collaborative Filtering
• Content-Based Filtering (CBF): recommend by analyzing the content information
• Collaborative Filtering: recommend based on the judgments of similar users

Why Collaborative Filtering?
• Advantages of collaborative filtering
  – CF does not need the content information required by CBF, which matters when:
    – the contents of items belong to a third party (not accessible or available)
    – the contents of items are difficult to index or analyze (e.g., multimedia information)
• Problems of collaborative filtering
  – Privacy issues: how can users share their interests without disclosing too much detailed information?

Why Collaborative Filtering? (continued)
• Applications
  – E-commerce
  – Email ranking: borrow email rankings from your office mates
  – Web search? (e.g., local search)

Formal Framework for Collaborative Filtering
Objects: O_1, O_2, O_3, ..., O_j, ..., O_M
Users: U_1, U_2, ..., U_i, ..., U_N, plus a test user U_t
What we have:
• Ratings by the training users, forming a sparse user-by-object matrix (example rows: U_1: 3 2 4; U_2: 4 1 1; U_N: 5 2 2)
• Some amount of additional training data provided by the test user (example row: U_t: 2 3)
What we do:
• Predict the test user's rating R_t(O_j) based on the training information

Memory-Based Approaches
• Given a specific user u, find a set of similar users
• Predict u's rating based on the ratings of the similar users
• Issues
  – How to determine the similarity between users?
  – How to combine the ratings from similar users to make the predictions (how to weight different users)?

Memory-Based Approaches: How to determine the similarity between users?
• Measure the similarity in rating patterns between different users. The sums run over the objects o rated by both user u and test user t; \bar{R}_u and \bar{R}_t are the users' average ratings (be careful…).

Pearson correlation coefficient similarity:
$$w_{u,t} = \frac{\sum_{o} \big(R_u(o) - \bar{R}_u\big)\big(R_t(o) - \bar{R}_t\big)}{\sqrt{\sum_{o} \big(R_u(o) - \bar{R}_u\big)^2 \sum_{o} \big(R_t(o) - \bar{R}_t\big)^2}}$$

Vector space similarity:
$$w_{u,t} = \frac{\sum_{o} R_u(o)\, R_t(o)}{\sqrt{\sum_{o} R_u(o)^2 \sum_{o} R_t(o)^2}}$$

Memory-Based Approaches: How to combine the ratings from similar users for predicting?
• Weight the similar users by their similarity with the specific user, and use these weights to combine their ratings:
$$\hat{R}_t(o) = \bar{R}_t + \frac{\sum_{u} w_{u,t}\,\big(R_u(o) - \bar{R}_u\big)}{\sum_{u} |w_{u,t}|}$$
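The memory-based recipe can be sketched in a few lines of Python. This is a minimal sketch, not the course's code: the helper names (`mean`, `pearson`, `vector_space`, `predict`) are hypothetical, user means are taken over all of a user's ratings while the similarity sums use only co-rated objects, the prediction is normalized by the sum of |w|, and the test user's three known ratings are assumed to sit on O1, O3 and O5 of the introductory Train_User/Test User table. Under those assumptions it reproduces the weights 0.92 and -0.44 from the slides' worked example and predicts about 4.53 for O2, which rounds to 5.

```python
import math
from typing import Callable, Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {object: rating}; unrated objects are absent


def mean(r: Dict[str, float]) -> float:
    """User's average rating over all objects they rated (one choice of R-bar)."""
    return sum(r.values()) / len(r)


def pearson(ru: Dict[str, float], rt: Dict[str, float]) -> float:
    """Pearson correlation over co-rated objects, centring each user by their own mean."""
    mu, mt = mean(ru), mean(rt)
    common = ru.keys() & rt.keys()
    num = sum((ru[o] - mu) * (rt[o] - mt) for o in common)
    den = math.sqrt(sum((ru[o] - mu) ** 2 for o in common)
                    * sum((rt[o] - mt) ** 2 for o in common))
    return num / den if den > 0 else 0.0


def vector_space(ru: Dict[str, float], rt: Dict[str, float]) -> float:
    """Cosine similarity of the raw (uncentred) ratings over co-rated objects."""
    common = ru.keys() & rt.keys()
    num = sum(ru[o] * rt[o] for o in common)
    den = math.sqrt(sum(ru[o] ** 2 for o in common)
                    * sum(rt[o] ** 2 for o in common))
    return num / den if den > 0 else 0.0


Similarity = Callable[[Dict[str, float], Dict[str, float]], float]


def predict(ratings: Ratings, t: str, obj: str, sim: Similarity = pearson) -> float:
    """Test user's mean plus similarity-weighted deviations, normalised by sum of |w|."""
    rt = ratings[t]
    num, norm = 0.0, 0.0
    for u, ru in ratings.items():
        if u == t or obj not in ru:
            continue  # only other users who rated obj contribute
        w = sim(ru, rt)
        num += w * (ru[obj] - mean(ru))
        norm += abs(w)
    return mean(rt) + (num / norm if norm > 0 else 0.0)


# Toy table from the slides; the test user's ratings are assumed to sit on O1, O3, O5.
ratings: Ratings = {
    "Train_User 1": {"O1": 1, "O2": 5, "O3": 3, "O4": 3, "O5": 4},
    "Train_User 2": {"O1": 4, "O2": 1, "O3": 5, "O4": 3, "O5": 2},
    "Test User": {"O1": 1, "O3": 3, "O5": 4},
}

if __name__ == "__main__":
    for u in ("Train_User 1", "Train_User 2"):
        print(u, round(pearson(ratings[u], ratings["Test User"]), 2))
    print("prediction for O2:", round(predict(ratings, "Test User", "O2"), 2))
```

Passing `sim=vector_space` to `predict` swaps in the vector space similarity; since it works on raw ratings, its weights are never negative for positive rating scales.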

Memory-Based Approaches
Step 1. Remove the user-specific rating bias (normalize ratings by subtracting each user's mean rating):

                      O1      O2      O3      O4      O5
Train_User 1           1       5       3       3       4
Sub Mean (Train1)   -2.2     1.8    -0.2    -0.2     0.8
Train_User 2           4       1       5       3       2
Sub Mean (Train2)      1      -2       2       0      -1
Test User              1       ?       3               4
Sub Mean (Test)   -1.667           0.333           1.333

Step 2. Calculate similarity: W_trn1,test = 0.92; W_trn2,test = -0.44
Step 3. Make prediction: the test user's missing rating on O2 is predicted as 5.

Memory-Based Approaches: Problems
• A large online computation cost (each prediction has to go over all users; is there a fast indexing approach?)
• Heuristic methods are used to calculate user similarity and to make rating predictions
• Possible solutions
  – Cluster users/items offline to save online computation cost
  – Propose more solid probabilistic modeling methods

Model-Based Approaches
Aspect Model (Hofmann et al., 1999)
– Models individual ratings as a convex combination of preference factors:
$$P(o_{(l)}, u_{(l)}, r_{(l)}) = \sum_{z \in Z} P(z)\, P(o_{(l)} \mid z)\, P(u_{(l)} \mid z)\, P(r_{(l)} \mid z)$$
[Figure: aspect-model graphical model; latent Z with P(Z), P(o|Z), P(u|Z), P(r|Z); observed O_l, U_l, R_l repeated over L ratings]

Two-Sided Clustering Model (Hofmann et al., 1999)
– Assumes each user belongs to exactly one user group and each item to exactly one item group:
$$P(o_{(l)}, u_{(l)}, r_{(l)}) = P(o_{(l)})\, P(u_{(l)}) \sum_{v,u} I_{x(l)v}\, J_{y(l)u}\, C_{vu}$$
where I_{x(l)v} and J_{y(l)u} are indicator variables and C_{vu} is the association parameter between classes v and u.
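Once its parameters are known, the aspect model's joint probability can be evaluated directly and normalized over r to obtain a rating distribution for any (object, user) pair. A minimal pure-Python sketch, assuming small hypothetical sizes and random stand-in parameters (EM training is not shown). Using the expected value of P(r | o, u) as the point prediction is an assumption of this sketch; the slides do not specify how a rating is read off the model.

```python
import random
from typing import List

random.seed(0)
N_Z, N_OBJ, N_USER, N_RATING = 3, 5, 4, 5  # hypothetical sizes; ratings r = 1..5


def random_dist(n: int) -> List[float]:
    """Random probability vector standing in for parameters that EM would learn."""
    w = [random.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


p_z = random_dist(N_Z)                                # p_z[z] = P(z)
p_o_z = [random_dist(N_OBJ) for _ in range(N_Z)]      # p_o_z[z][o] = P(o|z)
p_u_z = [random_dist(N_USER) for _ in range(N_Z)]     # p_u_z[z][u] = P(u|z)
p_r_z = [random_dist(N_RATING) for _ in range(N_Z)]   # p_r_z[z][r-1] = P(r|z)


def joint(o: int, u: int, r: int) -> float:
    """Aspect model: P(o, u, r) = sum over z of P(z) P(o|z) P(u|z) P(r|z)."""
    return sum(p_z[z] * p_o_z[z][o] * p_u_z[z][u] * p_r_z[z][r - 1]
               for z in range(N_Z))


def rating_distribution(o: int, u: int) -> List[float]:
    """P(r | o, u) for r = 1..N_RATING, by normalising the joint over r."""
    scores = [joint(o, u, r) for r in range(1, N_RATING + 1)]
    total = sum(scores)
    return [s / total for s in scores]


def expected_rating(o: int, u: int) -> float:
    """Mean of P(r | o, u), used here as a point prediction (an assumption)."""
    dist = rating_distribution(o, u)
    return sum(r * p for r, p in zip(range(1, N_RATING + 1), dist))


if __name__ == "__main__":
    print([round(p, 3) for p in rating_distribution(o=2, u=1)])
    print(round(expected_rating(o=2, u=1), 2))
```

Because every conditional sums to 1, the joint sums to 1 over all (o, u, r), which is a quick sanity check on any learned parameters.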

Collaborative Filtering: Thoughts
• Previous algorithms all cluster users and objects, either implicitly (memory-based) or explicitly (model-based)
  – The aspect model allows users and objects to belong to different classes, but clusters them together
  – The two-sided clustering model clusters users and objects separately, but only allows each to belong to a single class
• Previous algorithms address the problem that users with similar tastes may have different rating patterns only implicitly (by normalizing user ratings)

Previous Work: Thoughts
Example of two rating patterns:
  One user:     "Nice" rating: 5, "Mean" rating: 2
  Another user: "Nice" rating: 3, "Mean" rating: 1
• Cluster users and objects separately AND allow them to belong to different classes → Flexible Mixture Model (FMM)
• Explicitly decouple users' preference values from the rating values → Decoupled Model (DM)

Flexible Mixture Model (FMM)
Cluster users and objects separately AND allow them to belong to different classes.
[Figure: FMM graphical model; object class Z_o and user class Z_u with P(Z_o), P(Z_u), P(o|Z_o), P(u|Z_u), P(r|Z_o, Z_u); observed O_l, U_l, R_l repeated over L ratings]
Joint probability:
$$P(o_{(l)}, u_{(l)}, r_{(l)}) = \sum_{Z_o, Z_u} P(Z_o)\, P(Z_u)\, P(o_{(l)} \mid Z_o)\, P(u_{(l)} \mid Z_u)\, P(r_{(l)} \mid Z_o, Z_u)$$
Training procedure: Annealed Expectation Maximization (AEM) algorithm
E-step: calculate the posterior probabilities
$$P(z_o, z_u \mid o_{(l)}, u_{(l)}, r_{(l)}) = \frac{\big(P(z_o)\, P(z_u)\, P(o_{(l)} \mid z_o)\, P(u_{(l)} \mid z_u)\, P(r_{(l)} \mid z_o, z_u)\big)^{b}}{\sum_{Z_o, Z_u} \big(P(Z_o)\, P(Z_u)\, P(o_{(l)} \mid Z_o)\, P(u_{(l)} \mid Z_u)\, P(r_{(l)} \mid Z_o, Z_u)\big)^{b}}$$
where b is the annealing exponent.
"A Study of Mixture Models for Collaborative Filtering", Journal of IR

Decoupled Model (DM)
Separate the preference value Z_pre ∈ {1, ..., k} (1 = disfavor, k = favor) from the rating r ∈ {1, 2, 3, 4, 5}.
[Figure: DM graphical model; Z_o, Z_u, Z_pre, Z_R with P(Z_o), P(Z_u), P(o|Z_o), P(u|Z_u), P(r|Z_pre, Z_R); observed O_l, U_l, R_l repeated over L ratings]
Joint probability:
$$P(o_{(l)}, u_{(l)}, r_{(l)}) = \sum_{Z_o, Z_u, Z_R} P(Z_o)\, P(Z_u)\, P(o_{(l)} \mid Z_o)\, P(u_{(l)} \mid Z_u)\, P(Z_R \mid u_{(l)}) \sum_{Z_{pre}} \big[P(Z_{pre} \mid Z_o, Z_u)\, P(r_{(l)} \mid Z_{pre}, Z_R)\big]$$
"Preference-Based Graphical Model for Collaborative Filtering", UAI'03
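The FMM joint probability and the annealed E-step posterior can be computed directly from the formulas. A minimal sketch with hypothetical class counts and random stand-in parameters; only the E-step is shown (the slides do not give the M-step). Here b is the exponent from the E-step formula: b = 1 yields the ordinary EM posterior, and smaller values flatten the posterior toward uniform (b = 0 makes it exactly uniform).

```python
import random
from typing import Dict, List, Tuple

random.seed(1)
N_ZO, N_ZU = 2, 3                  # hypothetical numbers of object and user classes
N_OBJ, N_USER, N_RATING = 4, 5, 5  # hypothetical sizes; ratings r = 1..5


def random_dist(n: int) -> List[float]:
    """Random probability vector standing in for trained parameters."""
    w = [random.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


p_zo = random_dist(N_ZO)                               # P(Z_o)
p_zu = random_dist(N_ZU)                               # P(Z_u)
p_o_zo = [random_dist(N_OBJ) for _ in range(N_ZO)]     # p_o_zo[zo][o] = P(o|Z_o)
p_u_zu = [random_dist(N_USER) for _ in range(N_ZU)]    # p_u_zu[zu][u] = P(u|Z_u)
p_r_z = [[random_dist(N_RATING) for _ in range(N_ZU)]  # p_r_z[zo][zu][r-1]
         for _ in range(N_ZO)]                         #   = P(r|Z_o, Z_u)


def term(zo: int, zu: int, o: int, u: int, r: int) -> float:
    """One summand of the FMM joint: P(Z_o)P(Z_u)P(o|Z_o)P(u|Z_u)P(r|Z_o,Z_u)."""
    return p_zo[zo] * p_zu[zu] * p_o_zo[zo][o] * p_u_zu[zu][u] * p_r_z[zo][zu][r - 1]


def joint(o: int, u: int, r: int) -> float:
    """P(o, u, r): sum of term() over all object classes and user classes."""
    return sum(term(zo, zu, o, u, r) for zo in range(N_ZO) for zu in range(N_ZU))


def e_step(o: int, u: int, r: int, b: float = 1.0) -> Dict[Tuple[int, int], float]:
    """Annealed E-step: posterior over (Z_o, Z_u) proportional to term ** b."""
    powered = {(zo, zu): term(zo, zu, o, u, r) ** b
               for zo in range(N_ZO) for zu in range(N_ZU)}
    total = sum(powered.values())
    return {k: v / total for k, v in powered.items()}


if __name__ == "__main__":
    post = e_step(o=0, u=1, r=4, b=0.8)
    print({k: round(v, 3) for k, v in post.items()})
```

In a full AEM loop these posteriors would be accumulated over all L observed ratings and fed to an M-step that re-estimates the five parameter tables.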
