

slide-1
SLIDE 1

Market basket introduction

MARKET BASKET ANALYSIS IN R

Christopher Bruffaerts

Statistician

slide-2
SLIDE 2

MARKET BASKET ANALYSIS IN R

Overview

Market Basket course

  • Chapter 1: Introduction to market basket analysis
  • Chapter 2: Metrics and techniques in market basket analysis
  • Chapter 3: Visualization in market basket analysis
  • Chapter 4: Case study: Movie recommendations @ movieLens

slide-3
SLIDE 3

MARKET BASKET ANALYSIS IN R

What is a basket?

Basket = collection of items

Items:

  1. Products at the supermarket
  2. Products on an online website
  3. DataCamp courses
  4. Movies watched by users

Examples of baskets:

  1. Your basket @ the grocery store
  2. Your Amazon shopping cart
  3. Your courses @ DataCamp
  4. The movies you watched on Netflix

slide-4
SLIDE 4

MARKET BASKET ANALYSIS IN R

Grocery store example

What's in the store?

What are you up for today?

  • One bread
  • Three pieces of cheese

slide-5
SLIDE 5

MARKET BASKET ANALYSIS IN R

Grocery store example in R

What's in the store?

store = c("Bread", "Butter", "Cheese", "Wine")
set.seed(1234)
n_items = 4
my_basket = data.frame(
  TID = rep(1, n_items),
  Product = sample(store, n_items, replace = TRUE))

R output

my_basket
  TID Product
1   1   Bread
2   1  Cheese
3   1  Cheese
4   1  Cheese

slide-6
SLIDE 6

MARKET BASKET ANALYSIS IN R

What's in my basket?

My original basket: one record per item purchased

  TID Product
1   1   Bread
2   1  Cheese
3   1  Cheese
4   1  Cheese

My adjusted basket: one record per distinct item purchased

# A tibble: 2 x 3
    TID Product Quantity
  <dbl> <fct>      <int>
1     1 Bread          1
2     1 Cheese         3

slide-7
SLIDE 7

MARKET BASKET ANALYSIS IN R

What's in my R basket?

Reshaping the basket data

library(dplyr)

# Adjusting my basket: count each product, drop duplicate rows, rename the count column
my_basket = my_basket %>%
  add_count(Product) %>%
  unique() %>%
  rename(Quantity = n)

# Number of distinct items
n_distinct(my_basket$Product)
2

# Total basket size
my_basket %>% summarize(sum(Quantity))
4

slide-8
SLIDE 8

MARKET BASKET ANALYSIS IN R

Visualizing items in my basket


library(ggplot2)

# Plotting items: one horizontal bar per product, ordered by quantity
ggplot(my_basket, aes(x = reorder(Product, Quantity), y = Quantity)) +
  geom_col() +
  coord_flip() +
  xlab("Items") +
  ggtitle("Summary of items in my basket")

slide-9
SLIDE 9

MARKET BASKET ANALYSIS IN R

Why are we looking at my basket?

Question: Is there any relationship between items within a basket?

Back to examples:

  1. Your basket @ the grocery store, e.g. spaghetti and tomato sauce
  2. Your Amazon shopping cart, e.g. a phone and a phone case
  3. Your courses @ DataCamp, e.g. "Introduction to R" and "Intermediate R"

slide-10
SLIDE 10

Happy shopping!

MARKET BASKET ANALYSIS IN R

slide-11
SLIDE 11

Item combinations

MARKET BASKET ANALYSIS IN R

Christopher Bruffaerts

Statistician

slide-12
SLIDE 12

MARKET BASKET ANALYSIS IN R

Back to the grocery store

What's in the store?

What are you up for today?
{"Bread", "Cheese", "Cheese", "Cheese"}

Focus of market basket analysis:
{"Bread", "Cheese"}
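The step from the purchased items to the itemset can be sketched in base R. This is only an illustration; the variable name `purchased` is an assumption, not something from the slides.

```r
# Items as purchased: one entry per unit bought (assumed variable name)
purchased = c("Bread", "Cheese", "Cheese", "Cheese")

# Quantities per item, as in the adjusted basket
table(purchased)

# Distinct items only: the itemset that market basket analysis focuses on
unique(purchased)
```

Quantities are dropped on purpose here: the analysis asks which items occur together, not how many units of each were bought.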

slide-13
SLIDE 13

MARKET BASKET ANALYSIS IN R

Subsets and supersets

My store - set X = {"Bread", "Butter", "Cheese", "Wine"}

Subsets of X - itemsets

  • Size 0: ∅ (the empty set)
  • Size 1: {"Bread"}, {"Wine"}, ...
  • Size 2: {"Bread", "Wine"}, ...

Supersets

  • {"Bread", "Butter"} is a superset of {"Bread"}
  • {"Bread", "Butter", "Cheese", "Wine"} is a superset of {"Bread", "Butter"}
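A subset relation can be checked in base R with `%in%`. The helper `is_subset` below is a hypothetical name introduced for illustration; it is not part of the course code.

```r
# Assumed helper (not from the slides): TRUE when every item of `a` is also in `b`
is_subset = function(a, b) all(a %in% b)

X = c("Bread", "Butter", "Cheese", "Wine")

is_subset("Bread", c("Bread", "Butter"))   # TRUE: {"Bread", "Butter"} is a superset of {"Bread"}
is_subset(c("Bread", "Butter"), X)         # TRUE: every itemset is a subset of the store X
is_subset(character(0), X)                 # TRUE: the empty set is a subset of any set
is_subset(c("Bread", "Milk"), X)           # FALSE: "Milk" is not in the store
```

Saying that `b` is a superset of `a` is the same statement read in the other direction, so one helper covers both.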

slide-14
SLIDE 14

MARKET BASKET ANALYSIS IN R

Itemset graph

Question: What is the set of all possible subsets of X?

X = {A, B, C, D}
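One way to list every subset of X is to loop over the subset sizes with base R's `combn()`. This is a sketch, not the course's own code; the name `all_subsets` is an assumption, and the empty set is added by hand.

```r
X = c("A", "B", "C", "D")

# Start with the empty set, then add every subset of size 1..n using combn()
all_subsets = c(
  list(character(0)),
  unlist(
    lapply(seq_along(X), function(k) combn(X, k, simplify = FALSE)),
    recursive = FALSE))

# Number of subsets, including the empty set and X itself
length(all_subsets)

# Print each subset as a string; the empty set is shown as {}
sapply(all_subsets, function(s) paste0("{", paste(s, collapse = ", "), "}"))
```

Each level of the itemset graph corresponds to one subset size, from the empty set at one end to X itself at the other.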

slide-15
SLIDE 15

MARKET BASKET ANALYSIS IN R

Intersections and unions

Intersection

{"Bread"} ∩ {"Butter"} = ∅
{"Bread", "Butter"} ∩ {"Butter", "Wine"} = {"Butter"}

library(dplyr)
A = c("Bread", "Butter")
B = c("Bread", "Wine")
intersect(A, B)
[1] "Bread"

Union

{"Bread"} ∪ {"Butter"} = {"Bread", "Butter"}

union(A, B)
[1] "Bread"  "Butter" "Wine"

slide-16
SLIDE 16

MARKET BASKET ANALYSIS IN R

How many baskets of size k?

Question: How many possible subsets of size k from a set of size n? "n choose k":

$$\binom{n}{k} = \frac{n!}{(n-k)!\,k!},$$

where

$$n! = n \times (n-1) \times (n-2) \times \dots \times 2 \times 1$$

Example: Number of baskets with 2 distinct items from the store:

$$\binom{4}{2} = \frac{4!}{(4-2)!\,2!} = 6$$
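The "n choose k" count, n! / ((n − k)! k!), can be checked against R's built-in `choose()`. The helper `n_choose_k` is a hypothetical name used only for this sketch.

```r
# Assumed helper (not from the slides): "n choose k" written out with factorials
n_choose_k = function(n, k) factorial(n) / (factorial(n - k) * factorial(k))

n_choose_k(4, 2)   # 4! / ((4 - 2)! * 2!) = 24 / (2 * 2) = 6
choose(4, 2)       # built-in equivalent, also 6

# List the six 2-item baskets from the store; each column is one basket
combn(c("Bread", "Butter", "Cheese", "Wine"), 2)
```

In practice `choose()` is the safer option, since `factorial()` overflows for large n.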

slide-17
SLIDE 17

MARKET BASKET ANALYSIS IN R

How many possible baskets?

Question: How many possible baskets can be created from a set of size n?

Newton's binomial theorem:

$$\sum_{k=0}^{n} \binom{n}{k} = 2^n$$

2^(n_items)

Example: Total number of baskets:

$$2^4 = 16$$
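For reference, the total of 2^n follows from the general binomial theorem by setting both terms to 1. This is the standard derivation, not a step shown on the slide:

```latex
(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{k} y^{n-k}
\quad\Longrightarrow\quad
2^n = (1 + 1)^n = \sum_{k=0}^{n} \binom{n}{k}

% For the store with n = 4 items:
\binom{4}{0} + \binom{4}{1} + \binom{4}{2} + \binom{4}{3} + \binom{4}{4}
  = 1 + 4 + 6 + 4 + 1 = 16 = 2^4
```

The count includes the empty basket (k = 0) and the basket containing every item in the store (k = n).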

slide-18
SLIDE 18

MARKET BASKET ANALYSIS IN R

How many baskets in R?

Combinations in R

n_items = 4
basket_size = 2
choose(n_items, basket_size)
[1] 6

# Looping through all possible values
# (note: this reuses the name store, replacing the item vector defined earlier)
store = matrix(NA, nrow = 5, ncol = 2)
for (i in 0:n_items) {
  store[i + 1, ] = c(i, choose(n_items, i))
}

Output

colnames(store) = c("size", "nb_combi")
store
     size nb_combi
[1,]    0        1
[2,]    1        4
[3,]    2        6
[4,]    3        4
[5,]    4        1
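The same table can be built without a loop, since `choose()` is vectorised over its second argument. This is an alternative sketch, not the slide's code; `combi_table` is an assumed name, chosen so the `store` item vector is not overwritten.

```r
n_items = 4

# choose() accepts a vector of sizes, so all values 0..n_items are computed at once
combi_table = data.frame(
  size     = 0:n_items,
  nb_combi = choose(n_items, 0:n_items))

combi_table

# Total number of baskets across all sizes, equal to 2^n_items
sum(combi_table$nb_combi)
```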

slide-19
SLIDE 19

MARKET BASKET ANALYSIS IN R

Plotting number of combinations

Get an idea of how fast the number of combinations grows

library(ggplot2)

n_items = 50
fun_nk = function(x) choose(n_items, x)

# Plotting the number of subsets as a function of subset size
ggplot(data = data.frame(x = 0), mapping = aes(x = x)) +
  stat_function(fun = fun_nk) +
  xlim(0, n_items) +
  xlab("Subset size") +
  ylab("Number of subsets")

slide-20
SLIDE 20

Are you ready to count?

MARKET BASKET ANALYSIS IN R

slide-21
SLIDE 21

What is market basket analysis ?

MARKET BASKET ANALYSIS IN R

Christopher Bruffaerts

Statistician

slide-22
SLIDE 22

MARKET BASKET ANALYSIS IN R

Multiple baskets @ grocery store

What's in the store?

Basket 1: {"Bread", "Cheese"}
Basket 2: {"Bread", "Wine", "Cheese"}

Multiple baskets

If 100 customers visit the grocery store, can we find associations of items that occur together?

Example: Bread and Cheese
Outcome: “if this, then that”

slide-23
SLIDE 23

MARKET BASKET ANALYSIS IN R

Market basket applications

Learning from multiple baskets

Different applications:

  • E-commerce: “customers who bought this also bought this”
  • Retail: items which are “bundled or placed together”
  • Social media: friends and connections recommendation
  • Videos and movies recommendation

slide-24
SLIDE 24

MARKET BASKET ANALYSIS IN R

Multiple baskets in R

Create a dataset containing multiple baskets!

my_baskets = data.frame(
  "Basket" = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  "Product" = c("Bread", "Cheese", "Cheese", "Cheese",
                "Bread", "Butter", "Wine",
                "Butter", "Butter",
                "Butter", "Wine", "Wine",
                "Butter", "Cheese",
                "Cheese", "Wine",
                "Wine", "Wine")
)

A glimpse at my baskets

head(my_baskets)
  Basket Product
1      1   Bread
2      1  Cheese
3      1  Cheese
4      1  Cheese
5      2   Bread
6      2  Butter

slide-25
SLIDE 25

MARKET BASKET ANALYSIS IN R

What's in our baskets?

Questions

How many distinct items are there?

n_distinct(my_baskets$Product)
[1] 4

How many baskets are there?

n_distinct(my_baskets$Basket)
[1] 7

How many items are there in each basket?

df_basket = my_baskets %>%
  group_by(Basket) %>%
  summarize(
    n_total = n(),
    n_items = n_distinct(Product))

  Basket n_total n_items
   <dbl>   <int>   <int>
1      1       4       2
2      2       3       3

slide-26
SLIDE 26

MARKET BASKET ANALYSIS IN R

How big are baskets?

Average basket sizes

# Averages over the per-basket summary df_basket
df_basket %>%
  summarize(
    avg_total_items = mean(n_total),
    avg_dist_items = mean(n_items))

# A tibble: 1 x 2
  avg_total_items avg_dist_items
            <dbl>          <dbl>
1            2.57           1.86

Distribution of basket size

# Distribution of distinct items
ggplot(df_basket, aes(n_items)) +
  geom_bar()

slide-27
SLIDE 27

MARKET BASKET ANALYSIS IN R

Specific products in the baskets

Which item are you looking at?

  • How many times does an item appear across all baskets?
  • How many baskets contain that item?

Example: Filtering for Cheese in R

# Number of baskets containing Cheese
my_baskets %>%
  filter(Product == "Cheese") %>%
  summarize(
    n_tot_items = n(),
    n_basket_item = n_distinct(Basket))

  n_tot_items n_basket_item
1           5             3

slide-28
SLIDE 28

MARKET BASKET ANALYSIS IN R

Association rule mining

Association rule mining: finding frequent co-occurring associations among a collection of items.

Examples of rule extraction:

  • {Bread} → {Butter}
  • {Bread, Cheese} → {Wine}
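As a first step toward such rules, one can count how many baskets contain all items of a candidate rule. The sketch below uses base R on the seven baskets defined earlier in this chapter; `itemsets` and `n_baskets_with` are hypothetical names, and this is plain counting, not the mining technique covered in the later chapters.

```r
# The seven baskets from the earlier slide, repeated so the example is self-contained
my_baskets = data.frame(
  Basket = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  Product = c("Bread", "Cheese", "Cheese", "Cheese",
              "Bread", "Butter", "Wine",
              "Butter", "Butter",
              "Butter", "Wine", "Wine",
              "Butter", "Cheese",
              "Cheese", "Wine",
              "Wine", "Wine"))

# One set of distinct items per basket
itemsets = lapply(split(my_baskets$Product, my_baskets$Basket), unique)

# Assumed helper: number of baskets that contain every item in `items`
n_baskets_with = function(items) {
  sum(sapply(itemsets, function(b) all(items %in% b)))
}

n_baskets_with("Bread")                # 2 baskets contain Bread
n_baskets_with(c("Bread", "Butter"))   # 1 basket contains both sides of {Bread} -> {Butter}
n_baskets_with(c("Bread", "Cheese"))   # 1 basket contains both Bread and Cheese
```

With only seven baskets these counts are too small to support any conclusion; they only show what "co-occurring" means in terms of the data.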

slide-29
SLIDE 29

MARKET BASKET ANALYSIS IN R

So what's coming next?

Agenda for the rest of the course:

  • Chapter 2: Metrics & techniques in market basket analysis
  • Chapter 3: Visualization in market basket analysis
  • Chapter 4: Case study: Movie recommendations @ movieLens

slide-30
SLIDE 30

Let's play with baskets!

MARKET BASKET ANALYSIS IN R