CS535 Big Data, 4/13/2020, Week 12-A, Sangmi Lee Pallickara



  1. CS535 Big Data | Computer Science | Colorado State University

     CS535 BIG DATA FAQs
     • Wednesday (4/15) is the GEAR Session IV presentation
     • Discussion will be available on 4/15, 16, and 17
     • Watch video clips on Canvas → Assignments → Echo360

     PART B. GEAR SESSIONS
     SESSION 4: LARGE SCALE RECOMMENDATION SYSTEMS AND SOCIAL MEDIA
     Sangmi Lee Pallickara
     Computer Science, Colorado State University
     http://www.cs.colostate.edu/~cs535

     Topics of Today's Class
     • Part 1: Collaborative Filtering with the case study of Item-to-Item CF
     • Part 2: Collaborative Filtering with the case study of Latent Factor CF
     • Part 3: Evaluating Recommendation Systems

     GEAR Session 4. Large Scale Recommendation Systems and Social Media
     Lecture 2. Large Scale Recommendation Systems
     Amazon.com: Item-to-item collaborative filtering

     Recommendation System
     • Amazon.com uses recommendations as a targeted marketing tool
       • Email campaigns
       • Most of their web pages
     • Find a set of customers whose purchased and rated items overlap the user's purchased and rated items
     • Eliminate items the user has already purchased (or rated)
     • Recommend the remaining items to the user

     http://www.cs.colostate.edu/~cs535 Spring 2020 Colorado State University, page 1

  2. What if they use a Traditional CF [1/4]
     • Build a utility matrix
       • N-dimensional vector of items per user regarding their ratings
       • Where N is the number of distinct catalog items
       • Positive for purchased or positively rated items
       • Negative for negatively rated items
     • To compensate for the best-selling items
       • Multiply the vector components by the inverse frequency
       • Making less well-known items more relevant

     What if they use a Traditional CF [2/4]
     • Find similar users
       • Cosine similarity between the vectors
       • E.g. users A and B
       • Cosine_Similarity(A, B) = cos(A, B) = $\frac{A \cdot B}{\|A\| \, \|B\|}$
     • Select items within the group of items purchased by the similar users
       • E.g. rank each item according to how many similar customers purchased it
     • Highly ranked item(s) will be recommended

     What if they use a Traditional CF [3/4]
     • For N items (in the catalog) and M users
     • Worst case
       • O(MN)
     • Average customer vector is extremely sparse
       • O(M+N)
       • Most of the scanning will be approximately O(M)
       • There are a few customers who have purchased or rated a significant percentage of the catalog
     • Therefore, the final performance of the algorithm is approximately O(M+N)

     What if they use a Traditional CF [4/4]
     • Dimensionality reduction
       • Reducing M by randomly sampling customers or discarding customers with few purchases
       • Reducing N by discarding very popular or unpopular items
     • What are the problems with the above approaches?
     • Disadvantages
       • Hard to capture the similarity between the users
       • Item-space partitioning restricts recommendations to a specific product or subject area
       • If the algorithm discards the most popular or unpopular items
         • They will never appear as recommendations

     Item-to-item collaborative filtering
     • It does NOT match the user to similar customers
     • Item-to-item collaborative filtering
       • Matches each of the user's purchased and rated items to similar items
       • Combines those similar items into a recommendation list
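The traditional (user-based) CF steps on the slides above can be sketched roughly as follows. This is only an illustration, not Amazon's implementation: the toy utility matrix, the function names `cosine_similarity` and `recommend_user_based`, and the neighbour cutoff `k` are assumptions made for this example, and the inverse-frequency weighting is left out for brevity.

```python
import math

def cosine_similarity(a, b):
    """cos(A, B) = (A . B) / (||A|| * ||B||); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend_user_based(utility, user, k=2):
    """Rank items the user has not purchased or rated by how many of the
    k most similar users purchased or positively rated them."""
    target = utility[user]
    # Scan every other customer: M vectors with up to N components each.
    neighbours = sorted(
        (other for other in utility if other != user),
        key=lambda other: cosine_similarity(target, utility[other]),
        reverse=True,
    )[:k]
    counts = {}
    for other in neighbours:
        for item, value in enumerate(utility[other]):
            # Keep positive entries only, and skip items the user already has.
            if value > 0 and target[item] == 0:
                counts[item] = counts.get(item, 0) + 1
    # Most frequently purchased first; ties broken by item index.
    return sorted(counts, key=lambda item: (-counts[item], item))

# Hypothetical utility matrix: one N-dimensional vector per user,
# +1 = purchased or positively rated, -1 = negatively rated, 0 = unknown.
utility = {
    "A": [1, 1, 0, 0, 0],
    "B": [1, 1, 1, 0, 0],
    "C": [1, 0, 1, 1, 0],
    "D": [0, 0, 0, -1, 1],
}
print(recommend_user_based(utility, "A"))  # → [2, 3]
```

Every call compares the target user against all other customers, which is the scan that the O(MN) worst case and the O(M+N) typical case on the [3/4] slide refer to.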

  3. Determining the most-similar match
     • The algorithm builds a similar-items table
       • By finding items that customers tend to purchase together
       • Based on the items that co-occur in a client's purchase history
       • E.g. if a client A has bought a headset X and a lawn mower Y, X and Y can be considered "similar" items in this context
     • How to build a similar-items matrix

       For each item in product catalog, I1
         For each customer C who purchased I1
           For each item I2 purchased by customer C
             Record that a customer purchased I1 and I2
         For each item I2
           Compute the similarity between I1 and I2

     Determining the most-similar match
     • Calculating the similarity between a single product and all related products
       • It is not the same "similarity" between items
     • How about building a product-to-product matrix by iterating through all item pairs and computing a similarity metric for each pair?
       • Many product pairs have no common customer
       • If you already bought a TV today, will you buy another TV again today?

     Part 1: tracking co-occurrence items [1/3]
     Purchase records
       U_A = {I1, I3, I4}
       U_B = {I2, I3, I4}
       U_C = {I2}
       U_D = {I0, I5, I6}
       U_E = {I1, I3}
       U_F = {I0, I3, I5}
       U_G = {I5, I6}
     The co-occurrence matrix (rows and columns I0 to I6) starts out empty.

     Part 1: tracking co-occurrence items [2/3]
     Matrix after recording the purchases of U_A = {I1, I3, I4} ("-" marks an empty cell):

           I0  I1  I2  I3  I4  I5  I6
       I0   -   -   -   -   -   -   -
       I1   -   -   -   1   1   -   -
       I2   -   -   -   -   -   -   -
       I3   -   1   -   -   1   -   -
       I4   -   1   -   1   -   -   -
       I5   -   -   -   -   -   -   -
       I6   -   -   -   -   -   -   -

     Part 1: tracking co-occurrence items [3/3]
     Co-occurrence matrix after all purchase records are processed:

           I0  I1  I2  I3  I4  I5  I6
       I0   0   0   0   1   0   2   1
       I1   0   0   0   2   1   0   0
       I2   0   0   0   1   1   0   0
       I3   1   2   1   0   2   1   0
       I4   0   1   1   2   0   0   0
       I5   2   0   0   1   0   0   2
       I6   1   0   0   0   0   2   0

     Part 2: Computing similarity between items
     • Using cosine measure
       • Each vector corresponds to an item
       • Items A and B (rather than customers)
       • Cosine_Similarity(A, B) = cos(A, B) = $\frac{A \cdot B}{\|A\| \, \|B\|}$
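The two parts above (tracking co-occurrences, then computing item similarity) can be sketched in a few lines. The purchase records are the ones from the slides; the function names `build_cooccurrence` and `item_similarity` are made up for this example. The slides do not say what the components of an item vector are, so the sketch assumes one 0/1 component per customer. Under that assumption the dot product A · B is exactly the co-occurrence count, and ‖A‖ is the square root of the number of customers who bought A.

```python
import math
from collections import defaultdict

# Purchase records from the slides (indices 0..6 stand for I0..I6).
purchases = {
    "UA": [1, 3, 4],
    "UB": [2, 3, 4],
    "UC": [2],
    "UD": [0, 5, 6],
    "UE": [1, 3],
    "UF": [0, 3, 5],
    "UG": [5, 6],
}
NUM_ITEMS = 7

def build_cooccurrence(purchases, num_items):
    """Part 1: count how many customers bought each pair of items together."""
    # Invert the records so we can look up the customers who purchased an item.
    buyers = defaultdict(list)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].append(customer)
    matrix = [[0] * num_items for _ in range(num_items)]
    for i1 in range(num_items):              # for each item I1 in the catalog
        for customer in buyers[i1]:          # for each customer C who purchased I1
            for i2 in purchases[customer]:   # for each item I2 purchased by C
                if i2 != i1:
                    matrix[i1][i2] += 1      # record that C purchased I1 and I2
    return matrix, buyers

def item_similarity(matrix, buyers, i1, i2):
    """Part 2: cosine similarity of two items, assuming 0/1 customer vectors."""
    n1, n2 = len(buyers[i1]), len(buyers[i2])
    if n1 == 0 or n2 == 0:
        return 0.0
    return matrix[i1][i2] / math.sqrt(n1 * n2)

matrix, buyers = build_cooccurrence(purchases, NUM_ITEMS)
print(matrix[3])  # → [1, 2, 1, 0, 2, 1, 0] (row I3 of the slide's matrix)
print(round(item_similarity(matrix, buyers, 1, 3), 4))  # → 0.7071
```

Only item pairs that share at least one customer are ever touched, which avoids iterating over every product pair, most of which have no common customer.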
