Market basket introduction
MARKET BASKET ANALYSIS IN R
Christopher Bruffaerts
Statistician
Overview: Market Basket course
Chapter 1: Introduction to market basket analysis
Chapter 2: Metrics and techniques in market basket analysis
Chapter 3: Visualization in market basket analysis
Chapter 4: Case study: Movie recommendations @ movieLens
Basket = collection of items
Examples of baskets:
What's in the store? What are you up for today?
One bread
Three pieces of cheese
What's in the store?
store = c("Bread", "Butter", "Cheese", "Wine")
set.seed(1234)
n_items = 4
my_basket = data.frame(
  TID = rep(1, n_items),
  Product = sample(store, n_items, replace = TRUE))
R output
my_basket
  TID Product
1   1   Bread
2   1  Cheese
3   1  Cheese
4   1  Cheese
My original basket: one record per item purchased
  TID Product
1   1   Bread
2   1  Cheese
3   1  Cheese
4   1  Cheese
My adjusted basket: one record per distinct item purchased
# A tibble: 2 x 3
    TID Product Quantity
  <dbl> <fct>      <int>
1     1 Bread          1
2     1 Cheese         3
Reshaping the basket data
library(dplyr)

# Adjusting my basket: count each product, drop duplicate rows,
# and rename the count column
my_basket = my_basket %>%
  add_count(Product) %>%
  unique() %>%
  rename(Quantity = n)

# Number of distinct items
n_distinct(my_basket$Product)
2

# Total basket size
my_basket %>% summarize(sum(Quantity))
4
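The same reshaping can be cross-checked without dplyr. The sketch below is only an illustration in base R: the name `basket_long` is hypothetical, and the basket is typed in by hand because the result of `sample()` for a given seed can differ between R versions.

```r
# Hand-typed copy of the basket shown above (one row per item purchased)
basket_long <- data.frame(
  TID = rep(1, 4),
  Product = c("Bread", "Cheese", "Cheese", "Cheese")
)

# Add a helper column of ones and sum it per (TID, Product) pair
adjusted <- aggregate(
  Quantity ~ TID + Product,
  data = transform(basket_long, Quantity = 1L),
  FUN = sum
)

adjusted                 # one row per distinct item
nrow(adjusted)           # number of distinct items
sum(adjusted$Quantity)   # total basket size
```

The row count gives the number of distinct items and the sum of `Quantity` gives the total basket size, matching the two summaries computed with dplyr.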
Visualizing items in my basket
library(ggplot2)

# Plotting items, ordered by quantity
ggplot(my_basket,
       aes(x = reorder(Product, Quantity), y = Quantity)) +
  geom_col() +
  coord_flip() +
  xlab("Items") +
  ggtitle("Summary of items in my basket")
Question: Is there any relationship between items within a basket?
Back to examples
What's in the store? What are you up for today?
{"Bread", "Cheese", "Cheese", "Cheese"}
Focus of market basket analysis:
{"Bread", "Cheese"}
My store - set
X = {"Bread", "Butter", "Cheese", "Wine"}
Subsets of X - itemsets
Size 0: ∅
Size 1: {"Bread"}, {"Wine"}, ...
Size 2: {"Bread", "Wine"}, ...
Supersets
{"Bread", "Butter"} is a superset of {"Bread"}
{"Bread", "Butter", "Cheese", "Wine"} is a superset of {"Bread", "Butter"}
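A minimal sketch of how the superset relation could be checked in R. The helper `is_superset` is a hypothetical name introduced here for illustration; it is not part of base R or dplyr.

```r
X <- c("Bread", "Butter", "Cheese", "Wine")

# a is a superset of b when every element of b also appears in a
is_superset <- function(a, b) all(b %in% a)

is_superset(c("Bread", "Butter"), "Bread")    # TRUE
is_superset(X, c("Bread", "Butter"))          # TRUE
is_superset("Bread", c("Bread", "Butter"))    # FALSE
is_superset("Bread", character(0))            # TRUE: the empty set is a subset of every set
```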
Question: What is the set of all possible subsets of X? X = {A, B, C, D}
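One way to answer the question for a small set is to enumerate every subset. The sketch below is an illustration in base R, not code from the course: each number from 0 to 2^n - 1 is read as a pattern of n on/off switches, one per element of X.

```r
X <- c("A", "B", "C", "D")
n <- length(X)

# For each mask, keep the elements whose binary digit is 1
subsets <- lapply(0:(2^n - 1), function(mask) {
  X[(mask %/% 2^(seq_len(n) - 1)) %% 2 == 1]
})

length(subsets)          # number of subsets of X
table(lengths(subsets))  # how many subsets there are of each size
```

The first element of `subsets` is the empty set (`character(0)`) and the last one is X itself.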
Intersection
{"Bread"} ∩ {"Butter"} = ∅
{"Bread", "Butter"} ∩ {"Butter", "Wine"} = {"Butter"}
library(dplyr)
A = c("Bread", "Butter")
B = c("Bread", "Wine")
intersect(A, B)
[1] "Bread"
Union
{"Bread"} ∪ {"Butter"} = {"Bread", "Butter"}
union(A, B)
[1] "Bread"  "Butter" "Wine"
Question: How many possible subsets of size k from a set of size n? "n choose k"
$$\binom{n}{k} = \frac{n!}{(n-k)!\,k!},$$
where
$$n! = n \times (n-1) \times (n-2) \times \dots \times 2 \times 1$$
Example: Number of baskets with 2 distinct items from the store:
$$\binom{4}{2} = \frac{4!}{(4-2)!\,2!} = 6$$
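The formula can be checked numerically. The sketch below compares the factorial expression with R's built-in `choose()` and lists the pairs with `combn()`; the variable names are illustrative only.

```r
n <- 4
k <- 2

# "n choose k" from the factorial formula and from the built-in function
by_formula <- factorial(n) / (factorial(n - k) * factorial(k))
by_choose  <- choose(n, k)

# List the 2-item baskets explicitly: one column per pair
item_pairs <- combn(c("Bread", "Butter", "Cheese", "Wine"), k)

by_formula        # 6
by_choose         # 6
ncol(item_pairs)  # 6
```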
Question: How many possible baskets can be created from a set of size n?
Newton's binomial theorem:
$$\sum_{k=0}^{n} \binom{n}{k} = 2^n$$
2^(n_items)
Example:
$$2^4 = 16$$
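As a quick sanity check, the sum over all subset sizes can be compared with the power of two directly. This is a small sketch in base R; `store_sizes` is a made-up set of store sizes used only for illustration.

```r
n_items <- 4

# Sum of "n choose k" over k = 0..n versus 2^n
total_by_sum   <- sum(choose(n_items, 0:n_items))
total_by_power <- 2^n_items

total_by_sum    # 16
total_by_power  # 16

# The identity holds for other (hypothetical) store sizes as well
store_sizes <- c(1, 5, 10, 20)
identity_holds <- sapply(store_sizes, function(n) sum(choose(n, 0:n)) == 2^n)
identity_holds
```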
Combinations in R
n_items = 4
basket_size = 2
choose(n_items, basket_size)
[1] 6

# Looping through all possible values
store = matrix(NA, nrow = 5, ncol = 2)
for (i in 0:n_items) {
  store[i + 1, ] = c(i, choose(n_items, i))
}
Output
colnames(store) = c("size", "nb_combi")
store
     size nb_combi
[1,]    0        1
[2,]    1        4
[3,]    2        6
[4,]    3        4
[5,]    4        1
Get an idea of how fast the number of combinations grows
library(ggplot2)

n_items = 50
fun_nk = function(x) choose(n_items, x)

# Plotting the number of subsets against the subset size
ggplot(data = data.frame(x = 0), mapping = aes(x = x)) +
  stat_function(fun = fun_nk) +
  xlim(0, n_items) +
  xlab("Subset size") +
  ylab("Number of subsets")
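The shape of that curve can also be inspected numerically. The sketch below is an illustration in base R for the same store of 50 items; it finds the subset size with the most combinations and the total number of possible baskets.

```r
n_items <- 50

# Number of subsets for every subset size from 0 to 50
counts <- choose(n_items, 0:n_items)

# Subset size with the largest number of combinations
peak_size <- which.max(counts) - 1
peak_size                               # 25, i.e. half of n_items

# Total number of possible baskets, written without scientific notation
format(sum(counts), big.mark = ",", scientific = FALSE)
```

The count peaks at half the store size and is symmetric around it, since choosing k items is the same as choosing which n - k items to leave out.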
What's in the store?
Basket 1: {"Bread", "Cheese"}
Basket 2: {"Bread", "Wine", "Cheese"}
Multiple baskets
If 100 customers visit the grocery store, can we find associations of items that occur together?
Example: Bread and Cheese
Outcome: "if this, then that"
Learning from multiple baskets
Different applications:
E-commerce: "customers who bought this also bought this"
Retail: items which are "bundled or placed together"
Social media: friends and connections recommendation
Videos and movies recommendation
Create a dataset containing multiple baskets!
my_baskets = data.frame(
  "Basket" = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  "Product" = c("Bread", "Cheese", "Cheese", "Cheese",
                "Bread", "Butter", "Wine",
                "Butter", "Butter",
                "Butter", "Wine", "Wine",
                "Butter", "Cheese",
                "Cheese", "Wine",
                "Wine", "Wine")
)
A glimpse at my baskets
head(my_baskets)
  Basket Product
1      1   Bread
2      1  Cheese
3      1  Cheese
4      1  Cheese
5      2   Bread
6      2  Butter
Questions
How many distinct items are there?
n_distinct(my_baskets$Product)
[1] 4
How many baskets are there?
n_distinct(my_baskets$Basket)
[1] 7
How many items are there in each basket?
df_basket = my_baskets %>%
  group_by(Basket) %>%
  summarize(
    n_total = n(),
    n_items = n_distinct(Product))
  Basket n_total n_items
   <dbl>   <int>   <int>
1      1       4       2
2      2       3       3
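The per-basket counts can be cross-checked in base R by splitting the long table into one vector of products per basket. This is a sketch rather than the course's method; the names `basket_list`, `n_total_base` and `n_items_base` are illustrative.

```r
my_baskets <- data.frame(
  Basket = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  Product = c("Bread", "Cheese", "Cheese", "Cheese",
              "Bread", "Butter", "Wine",
              "Butter", "Butter",
              "Butter", "Wine", "Wine",
              "Butter", "Cheese",
              "Cheese", "Wine",
              "Wine", "Wine")
)

# One character vector of purchased products per basket
basket_list <- split(my_baskets$Product, my_baskets$Basket)

# Total items and distinct items in each basket
n_total_base <- lengths(basket_list)
n_items_base <- lengths(lapply(basket_list, unique))

n_total_base
n_items_base

# Average basket sizes
mean(n_total_base)
mean(n_items_base)
```

Applying `unique()` to each element of `basket_list` also gives each basket as an itemset, which is the form market basket analysis focuses on.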
Average basket sizes
# Average over the per-basket summary computed above (df_basket)
df_basket %>%
  summarize(
    avg_total_items = mean(n_total),
    avg_dist_items = mean(n_items))
# A tibble: 1 x 2
  avg_total_items avg_dist_items
            <dbl>          <dbl>
1            2.57           1.86
Distribution of basket size
# Distribution of distinct items
ggplot(df_basket, aes(n_items)) +
  geom_bar()
Which item are you looking at?
How many times does an item appear across all baskets?
How many baskets contain that item?
Example: Filtering for Cheese in R
# Number of baskets containing Cheese
my_baskets %>%
  filter(Product == "Cheese") %>%
  summarize(
    n_tot_items = n(),
    n_basket_item = n_distinct(Basket))
  n_tot_items n_basket_item
1           5             3
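The same two questions can be answered for every item at once. The sketch below uses base R only and is an illustration rather than the course's code; `appearances` and `baskets_per_item` are made-up names.

```r
my_baskets <- data.frame(
  Basket = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  Product = c("Bread", "Cheese", "Cheese", "Cheese",
              "Bread", "Butter", "Wine",
              "Butter", "Butter",
              "Butter", "Wine", "Wine",
              "Butter", "Cheese",
              "Cheese", "Wine",
              "Wine", "Wine")
)

# How many times each item appears across all baskets
appearances <- table(my_baskets$Product)

# How many baskets contain each item: drop repeated (Basket, Product) rows first
baskets_per_item <- table(unique(my_baskets)$Product)

appearances
baskets_per_item
```

For Cheese this reproduces the figures from the filtered query: 5 appearances spread over 3 baskets.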
Association rule mining: finding frequent co-occurring associations among a collection of items.
Example of rule extraction:
{Bread} → {Butter}
{Bread, Cheese} → {Wine}
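Before any rule is extracted, it helps to see how often pairs of items occur in the same basket. The sketch below is a rough illustration in base R and not an association rule mining algorithm: it builds a basket-by-item incidence matrix from the `my_baskets` example data and multiplies it by its transpose to count co-occurrences. The names `inc` and `co` are illustrative.

```r
my_baskets <- data.frame(
  Basket = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  Product = c("Bread", "Cheese", "Cheese", "Cheese",
              "Bread", "Butter", "Wine",
              "Butter", "Butter",
              "Butter", "Wine", "Wine",
              "Butter", "Cheese",
              "Cheese", "Wine",
              "Wine", "Wine")
)

# Basket-by-item counts, then reduced to 1 (item present) or 0 (absent)
counts <- table(my_baskets$Basket, my_baskets$Product)
inc <- matrix(as.numeric(counts > 0),
              nrow = nrow(counts),
              dimnames = dimnames(counts))

# Entry [i, j] = number of baskets containing both item i and item j;
# the diagonal holds the number of baskets containing each single item
co <- t(inc) %*% inc
co
```

In this small dataset, Butter and Wine share two baskets while every other pair shares one, which is the kind of signal a rule such as {Butter} → {Wine} would be built on.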
Agenda for the rest of the course:
Chapter 2: Metrics & techniques in market basket analysis
Chapter 3: Visualization in market basket analysis
Chapter 4: Case study: Movie recommendations @ movieLens