SLIDE 3 CS535 Big Data 4/13/2020 Week 12-A Sangmi Lee Pallickara http://www.cs.colostate.edu/~cs535 Spring 2020 Colorado State University, page 3
Determining the most-similar match
- The algorithm builds a similar-items table
- By finding items that customers tend to purchase together
- How about building a product-to-product matrix by iterating through all item pairs and
computing a similarity metric for each pair?
- Many product pairs have no common customer
- If you already bought a TV today, will you buy another TV again today?
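To make the sparsity point concrete, the short sketch below counts how many item pairs share at least one customer. It uses the seven example purchase records that appear later in these slides; the variable names are illustrative assumptions, not part of the lecture.

```python
from itertools import combinations

# Purchase records of the seven example users from the slides.
purchases = {
    "UA": {"I1", "I3", "I4"},
    "UB": {"I2", "I3", "I4"},
    "UC": {"I2"},
    "UD": {"I0", "I5", "I6"},
    "UE": {"I1", "I3"},
    "UF": {"I0", "I3", "I5"},
    "UG": {"I5", "I6"},
}

items = sorted(set().union(*purchases.values()))

# Naive approach: enumerate every item pair in the catalog.
all_pairs = set(combinations(items, 2))

# Pairs that at least one customer actually bought together.
pairs_with_common_customer = set()
for basket in purchases.values():
    pairs_with_common_customer.update(combinations(sorted(basket), 2))

print(len(all_pairs))                   # 21 candidate pairs
print(len(pairs_with_common_customer))  # 10 pairs with a common customer
```

Even in this tiny catalog, more than half of the pairs have no common customer, so computing a similarity for every pair would mostly be wasted work.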
CS535 Big Data | Computer Science | Colorado State University
Determining the most-similar match
- Calculating the similarity between a single product and all related products
- This is not “similarity” in the usual sense of items being alike
- It is based on the items that co-occur in a client’s purchase history
- E.g., if client A has bought a headset X and a lawn mower Y, X and Y can be considered “similar” items in this context
- How to build a similar-items matrix
    For each item in product catalog, I1
        For each customer C who purchased I1
            For each item I2 purchased by customer C
                Record that a customer purchased I1 and I2
        For each item I2
            Compute the similarity between I1 and I2
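The recording loops of the pseudocode above can be sketched in Python as follows. This is a minimal sketch, not the lecture's implementation: the function name `build_cooccurrence`, the dictionary layout, and the toy customers `c1` and `c2` are assumptions made for illustration. Skipping the case I2 = I1 is also an assumption, chosen so that an item is not counted as co-occurring with itself. The similarity step is left out here because it is covered in Part 2.

```python
from collections import defaultdict

def build_cooccurrence(purchases):
    """purchases maps a customer id to the set of item ids that customer bought."""
    # Invert the records: item -> customers who purchased it.
    buyers = defaultdict(set)
    for customer, basket in purchases.items():
        for item in basket:
            buyers[item].add(customer)

    cooccurrence = defaultdict(lambda: defaultdict(int))
    for i1 in buyers:                        # for each item I1 in the catalog
        for customer in buyers[i1]:          # for each customer C who purchased I1
            for i2 in purchases[customer]:   # for each item I2 purchased by C
                if i2 != i1:                 # assumption: ignore self-pairs
                    cooccurrence[i1][i2] += 1  # record that C purchased I1 and I2
    return cooccurrence

# Hypothetical toy data, loosely following the headset / lawn mower example.
purchases = {
    "c1": {"headset", "lawn mower"},
    "c2": {"headset", "lawn mower", "tv"},
}
counts = build_cooccurrence(purchases)
print(counts["headset"]["lawn mower"])  # 2
print(counts["tv"]["headset"])          # 1
```

Only pairs that actually appear together in some purchase history are ever touched, which is the point of iterating over customers instead of over all item pairs.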
Part 1: tracking co-occurrence items [1/3]
    I0  I1  I2  I3  I4  I5  I6
I0   -   -   -   -   -   -   -
I1   -   -   -   -   -   -   -
I2   -   -   -   -   -   -   -
I3   -   -   -   -   -   -   -
I4   -   -   -   -   -   -   -
I5   -   -   -   -   -   -   -
I6   -   -   -   -   -   -   -
Purchase record for the user UA = { I1, I3, I4 }
Purchase record for the user UB = { I2, I3, I4 }
Purchase record for the user UC = { I2 }
Purchase record for the user UD = { I0, I5, I6 }
Purchase record for the user UE = { I1, I3 }
Purchase record for the user UF = { I0, I3, I5 }
Purchase record for the user UG = { I5, I6 }
Part 1: tracking co-occurrence items [2/3]
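State after recording the purchases of user UA = { I1, I3, I4 }:
    I0  I1  I2  I3  I4  I5  I6
I0   -   -   -   -   -   -   -
I1   -   -   -   1   1   -   -
I2   -   -   -   -   -   -   -
I3   -   1   -   -   1   -   -
I4   -   1   -   1   -   -   -
I5   -   -   -   -   -   -   -
I6   -   -   -   -   -   -   -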
Part 1: tracking co-occurrence items [3/3]
    I0  I1  I2  I3  I4  I5  I6
I0   -   -   -   1   -   2   1
I1   -   -   -   2   1   -   -
I2   -   -   -   1   1   -   -
I3   1   2   1   -   2   1   -
I4   -   1   1   2   -   -   -
I5   2   -   -   1   -   -   2
I6   1   -   -   -   -   2   -
Co-occurrence matrix
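As a cross-check, the same counts can be obtained in matrix form. The sketch below assumes a binary customer-by-item matrix U (a name introduced here, not in the slides) and computes the off-diagonal entries of UᵀU in plain Python. The diagonal is set to 0 to match the empty diagonal of the slide's matrix.

```python
items = [f"I{k}" for k in range(7)]

# The seven purchase records from the slides.
records = {
    "UA": {"I1", "I3", "I4"},
    "UB": {"I2", "I3", "I4"},
    "UC": {"I2"},
    "UD": {"I0", "I5", "I6"},
    "UE": {"I1", "I3"},
    "UF": {"I0", "I3", "I5"},
    "UG": {"I5", "I6"},
}

# Binary customer-by-item matrix: U[c][i] = 1 if customer c bought item i.
U = [[1 if item in basket else 0 for item in items] for basket in records.values()]

# C[i][j] = number of customers who bought both item i and item j (i != j).
n = len(items)
C = [
    [0 if i == j else sum(row[i] * row[j] for row in U) for j in range(n)]
    for i in range(n)
]

for label, row in zip(items, C):
    print(label, row)
# I0 [0, 0, 0, 1, 0, 2, 1]
# I1 [0, 0, 0, 2, 1, 0, 0]
# I2 [0, 0, 0, 1, 1, 0, 0]
# I3 [1, 2, 1, 0, 2, 1, 0]
# I4 [0, 1, 1, 2, 0, 0, 0]
# I5 [2, 0, 0, 1, 0, 0, 2]
# I6 [1, 0, 0, 0, 0, 2, 0]
```

The matrix is symmetric, so in practice only the upper or lower triangle needs to be stored.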
Part 2: Computing similarity between items
- Using cosine measure
- Each vector corresponds to an item
- A and B are items (rather than customers)
- Cosine_Similarity(A, B) = cos(A, B) = $\frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$
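A minimal sketch of the cosine measure for two items follows. The slides do not spell out what the vector components are; this example assumes each item vector has one binary component per customer (1 if that customer bought the item), built from the example purchase records. The helper names `item_vector` and `cosine_similarity` are hypothetical.

```python
import math

customers = ["UA", "UB", "UC", "UD", "UE", "UF", "UG"]

# Customers who bought each item, taken from the example purchase records.
buyers = {
    "I1": {"UA", "UE"},
    "I3": {"UA", "UB", "UE", "UF"},
    "I6": {"UD", "UG"},
}

def item_vector(item):
    # Assumption: one binary component per customer.
    return [1 if c in buyers[item] else 0 for c in customers]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for items nobody bought
    return dot / (norm_a * norm_b)

print(round(cosine_similarity(item_vector("I1"), item_vector("I3")), 4))  # 0.7071
print(cosine_similarity(item_vector("I1"), item_vector("I6")))            # 0.0
```

I1 and I3 share two customers (UA and UE), giving 2 / (√2 · 2) ≈ 0.7071, while I1 and I6 share none and score 0.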