SLIDE 1

Implications of Probabilistic Data Modeling for Rule Mining

Michael Hahsler, Kurt Hornik and Thomas Reutterer

Wirtschaftsuniversität Wien
29th Annual Conference of the German Classification Society (GfKl 2005)
Magdeburg, March 9-11, 2005

SLIDE 2

Motivation

  • Mining association rules is an important technique for discovering meaningful patterns in transaction databases.
      – Example: diapers ⇒ beer
      – Applications: product assortment decisions, adapting promotional activities, personalized product recommendations, adaptive user interfaces
  • Current literature focuses on the properties of algorithms.
  • We will discuss properties of
      – transaction data sets and
      – interest measures
    from a probabilistic point of view.

SLIDE 3

Outline

  • 1. Association rules
  • 2. Probabilistic model for transaction data
  • 3. Simulation with R
  • 4. Implications for confidence and lift
  • 5. New measure: hyperlift
  • 6. Conclusion
SLIDE 4

Association Rules

An association rule is a rule of the form X ⇒ Y, where X and Y are two disjoint sets of items (itemsets).

Rule selection with thresholds on interest measures:

  • Support: fraction of transactions containing an itemset
  • Confidence: probability of seeing Y under the condition that the transactions also contain X

Found rules are often ranked by:

  • Lift: how many times more often X and Y occur together than expected if they were statistically independent
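The three measures can be computed by hand on a small example. The sketch below uses a made-up five-transaction database; the matrix `Tr_toy`, its item names, and the helper functions `supp`, `conf` and `lift_rule` are illustrative assumptions, not part of the presentation.

```r
# Toy transaction database (hypothetical data): rows are transactions,
# columns are items; TRUE means the item is in the transaction.
Tr_toy <- matrix(c(
  1, 1, 0,
  1, 1, 1,
  1, 0, 0,
  0, 1, 1,
  0, 0, 1
), ncol = 3, byrow = TRUE,
dimnames = list(NULL, c("diapers", "beer", "milk"))) == 1

# Support: fraction of transactions that contain every item of the itemset.
supp <- function(items) mean(apply(Tr_toy[, items, drop = FALSE], 1, all))

# Confidence of X => Y: support of X and Y together divided by support of X.
conf <- function(X, Y) supp(c(X, Y)) / supp(X)

# Lift of X => Y: confidence relative to the support of Y.
lift_rule <- function(X, Y) conf(X, Y) / supp(Y)

supp_x  <- supp("diapers")               # 3 of 5 transactions
conf_xy <- conf("diapers", "beer")       # 2 of the 3 diaper transactions
lift_xy <- lift_rule("diapers", "beer")  # conf_xy divided by supp("beer")
```

With so few transactions, a lift slightly above 1 carries little evidence of an association, which is the kind of effect the following slides examine.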

SLIDE 5

A simple probabilistic framework for transaction data

Transactions occur following a Poisson process

[Figure: time axis of length t with the transactions Tr1, Tr2, ..., Trm marked along it]

We analyze transactions that are recorded in a fixed time interval of length t. The number of transactions m in the time interval is then Poisson distributed with parameter θt:

$$P(M = m) = \frac{e^{-\theta t} (\theta t)^m}{m!} \tag{1}$$

SLIDE 6

A simple probabilistic framework (cont’d)

  • n independent items L = {l1, l2, ..., ln},
  • each with a fixed success probability of occurring in a transaction, given by the vector p = (p1, p2, ..., pn).

Following the framework, ci, the observed number of transactions that contain item li, can be interpreted as a realization of a random variable Ci.

Given a fixed number of transactions m, this random variable has a binomial distribution:

$$P(C_i = c_i \mid M = m) = \binom{m}{c_i} p_i^{c_i} (1 - p_i)^{m - c_i} \tag{2}$$

SLIDE 7

A simple probabilistic framework (cont’d)

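Since the number of transactions m is not fixed for a fixed time interval t, the unconditional distribution is obtained by summing over m:

$$
\begin{aligned}
P(C_i = c_i) &= \sum_{m=c_i}^{\infty} P(C_i = c_i \mid M = m) \cdot P(M = m) \\
&= \sum_{m=c_i}^{\infty} \binom{m}{c_i} p_i^{c_i} (1 - p_i)^{m - c_i} \, \frac{e^{-\theta t} (\theta t)^m}{m!} \\
&= \frac{e^{-\theta t} (p_i \theta t)^{c_i}}{c_i!} \sum_{m=c_i}^{\infty} \frac{((1 - p_i) \theta t)^{m - c_i}}{(m - c_i)!} \\
&= \frac{e^{-p_i \theta t} (p_i \theta t)^{c_i}}{c_i!}
\end{aligned}
\tag{3}
$$

The remaining sum is the series expansion of e^{(1 − pi)θt}, so Ci has a Poisson distribution with parameter λi = piθt.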
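The result in (3) can be checked numerically by mixing the binomial distribution (2) over the Poisson distribution (1) and comparing with the closed form. This is a minimal sketch; the parameter values and the truncation point `m_max` are arbitrary assumptions chosen to keep the sum short.

```r
# Illustrative values (assumptions, chosen small to keep the sum short).
theta <- 2     # transactions per unit of time
t     <- 5     # length of the observation interval
p_i   <- 0.3   # occurrence probability of item l_i
c_i   <- 0:10  # counts at which the two sides are compared

# Left-hand side of (3): mix the binomial (2) over the Poisson (1).
# The infinite sum is truncated at m_max, where the Poisson tail is negligible.
m_max <- 200
mixture <- sapply(c_i, function(ci) {
  m <- ci:m_max
  sum(dbinom(ci, size = m, prob = p_i) * dpois(m, lambda = theta * t))
})

# Right-hand side of (3): Poisson with parameter lambda_i = p_i * theta * t.
closed_form <- dpois(c_i, lambda = p_i * theta * t)

# Largest deviation between the two sides; it should be at rounding level.
max_abs_diff <- max(abs(mixture - closed_form))
```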

SLIDE 8

A simple probabilistic framework (cont’d)

Representation of transaction data as a binary incidence matrix:

                items
transactions    l1      l2      l3      ...   ln
Tr1             0       1       0       ...   1
Tr2             0       1       0       ...   1
Tr3             0       1       0       ...   0
Tr4             0       0       0       ...   0
...             ...     ...     ...     ...   ...
Trm-1           1       0       0       ...   1
Trm             0       0       1       ...   1

c               99      201     7       ...   411
p               0.005   0.01    0.0003  ...   0.025

SLIDE 9

Simulation

For simplicity we will assume for the following simulation that the parameters in λ are chosen from a single gamma distribution with parameters k = 0.75 and a = 250. We will simulate the counts ci, for n = 200 different items over a t = 30 day period with transaction intensity θ = 300 transactions per day.

> m <- rpois(1, theta * t)
[1] 8885
> p <- sort(rgamma(n, shape = k, scale = a)/m,
+     decreasing = TRUE)

Now we can simulate the transactions in the database by m Bernoulli trials for each of the n items and calculate the count vector c.

> Tr <- matrix(rbinom(m * n, 1, p), ncol = n, byrow = TRUE)
> c <- (apply(Tr, 2, sum))
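The console snippets above assume that n, k, a, theta and t already exist in the session. A self-contained version might look like the following sketch. The seed is an arbitrary assumption, so the drawn m will differ from the 8885 shown on the slide, and the names `days` and `counts` replace the slide's `t` and `c` only to avoid masking the base R functions of the same name.

```r
set.seed(1)  # arbitrary seed (assumption), only for reproducibility

# Parameters as stated in the text.
n     <- 200   # number of items
days  <- 30    # length of the observation period t in days
theta <- 300   # transaction intensity per day
k     <- 0.75  # gamma shape parameter
a     <- 250   # gamma scale parameter

# Number of transactions in the period: Poisson with parameter theta * t.
m <- rpois(1, theta * days)

# Expected item counts lambda_i drawn from the gamma distribution, converted
# to per-transaction probabilities and sorted from frequent to rare.
p <- sort(rgamma(n, shape = k, scale = a) / m, decreasing = TRUE)

# m Bernoulli trials per item: filling by row with ncol = n recycles p so
# that column j is drawn with probability p[j].
Tr <- matrix(rbinom(m * n, 1, p), ncol = n, byrow = TRUE)

# Item counts (the vector called c in the text).
counts <- colSums(Tr)
```

`colSums(Tr)` gives the same result as `apply(Tr, 2, sum)` and is usually faster on a matrix of this size.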

SLIDE 10

Simulation (cont’d)

We can directly calculate the support of each item from the transaction counts.

> supp1 <- c/m
> plot(supp1, type = "h", xlab = "items",
+     ylab = "support")

SLIDE 12

Simulation (cont’d)

Next, we extend the framework to the occurrences of 2-itemsets with a symmetric n × n count matrix c2 and a support matrix (supp2):

> c2 <- sapply(1:n, function(i) {
+     apply(Tr[, i] & Tr[, 1:n], 2, sum)})
> diag(c2) <- NA
> supp2 <- c2/m
> persp(supp2, expand = 0.5, ticktype = "detailed",
+     border = 0, shade = 1, zlab = "support",
+     xlab = "items", ylab = "items")

SLIDE 14

Implications for confidence

Confidence is defined by

$$\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X + Y)}{\mathrm{supp}(X)}, \tag{4}$$

where X + Y denotes the itemset that contains all items of X and Y.

From our 2-itemsets we can generate rules of the form li ⇒ lj, where i, j = 1, 2, ..., n and i ≠ j. We calculate confidence for the n(n − 1) possible rules in the data set.

> conf2 <- supp2/supp1
> persp(conf2, expand = 0.5, ticktype = "detailed",
+     border = 0, shade = 1, zlab = "confidence",
+     xlab = "items", ylab = "items")

SLIDE 16

Implications for confidence (cont’d)

  • Confidence values are generally very low, which reflects the fact that there are no associations in the data.
  • Some rules have a confidence of one. However, their left-hand sides (X) have low support.
  • Confidence increases as the item in the right-hand side Y of the rule gets more frequent.

The fact that confidence systematically favors some rules makes the measure problematic when it comes to ranking rules.
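The dependence on the right-hand side follows from independence: if li and lj are unrelated, conf(li ⇒ lj) is approximately supp(lj). The sketch below illustrates this with four independent items; the probabilities, the number of transactions and the seed are illustrative assumptions.

```r
set.seed(42)  # arbitrary seed (assumption)

m <- 20000                      # number of transactions (illustrative)
p <- c(0.20, 0.10, 0.05, 0.01)  # item probabilities, frequent to rare
n <- length(p)

# Independent items: column j is Bernoulli with probability p[j].
Tr <- matrix(rbinom(m * n, 1, p), ncol = n, byrow = TRUE)

supp1 <- colMeans(Tr)       # support of single items
supp2 <- crossprod(Tr) / m  # support of all 2-itemsets (co-occurrences / m)
diag(supp2) <- NA

# conf2[i, j] is the confidence of the rule l_i => l_j.
conf2 <- supp2 / supp1

# Average confidence of all rules that share the same right-hand side l_j.
mean_conf_by_rhs <- colMeans(conf2, na.rm = TRUE)
round(rbind(supp_rhs = supp1, mean_conf = mean_conf_by_rhs), 3)
```

The two printed rows should be close to each other, so ranking by confidence largely reproduces the ranking of right-hand-side items by frequency.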

SLIDE 17

Implications for lift

Typically, rules mined using minimum support (and confidence) are filtered or ordered using their lift value. The measure lift is defined as:

$$\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)} \tag{5}$$

A lift value close to 1 indicates that the items co-occur in the database as often as expected under independence.

> lift <- conf2/matrix(supp1, ncol = n, nrow = n,
+     byrow = TRUE)
> persp(lift, expand = 0.5, ticktype = "detailed",
+     border = 0, shade = 1, zlab = "lift",
+     xlab = "items", ylab = "items")
> length(which(lift > 2))
[1] 3424

SLIDE 19

Implications for lift (cont’d)

To counter the problem with extremely high lift values, we discard all 2-itemsets which do not satisfy a minimum support of 0.1%.

> min_supp <- 0.001
> length(lift[supp2 >= min_supp])
[1] 7096
> lift[supp2 < min_supp] <- 1
> persp(lift, expand = 0.5, ticktype = "detailed",
+     border = 0, shade = 1, zlab = "lift",
+     xlab = "items", ylab = "items")
> length(which(lift > 2))
[1] 130

SLIDE 21

Implications for lift (cont’d)

  • Lift performs poorly at filtering random noise in transaction data, especially for relatively rare items.
  • Lift has a tendency to produce higher values for rules with items close to minimum support.

This makes using lift problematic for ranking discovered rules.
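A short calculation shows why rare items are affected most. Written with counts, lift is the observed number of co-occurrences divided by the number expected under independence, ci · cj / m. The numbers below are made up for illustration, and `lift_from_counts` is a hypothetical helper.

```r
m <- 10000  # number of transactions (illustrative)

# Lift written with counts: observed co-occurrences divided by the number
# expected under independence, c_i * c_j / m.
lift_from_counts <- function(c_ij, c_i, c_j) c_ij / (c_i * c_j / m)

# A single chance co-occurrence of two rare items (10 transactions each,
# 0.01 co-occurrences expected under independence) ...
lift_rare <- lift_from_counts(c_ij = 1, c_i = 10, c_j = 10)

# ... versus one extra co-occurrence for two frequent items (1000 each,
# 100 co-occurrences expected under independence).
lift_frequent <- lift_from_counts(c_ij = 101, c_i = 1000, c_j = 1000)
```

Under these assumptions one random co-occurrence gives the rare pair a lift of 100, while the frequent pair stays at 1.01.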

SLIDE 22

New measure: hyperlift

  • The n × n co-occurrence matrix can be modeled by n^2 random variables Ci,j.
  • The framework results in hypergeometric distributions for the Ci,j (urn model).
  • Using the expected value of Ci,j, lift can be rewritten as:

$$\mathrm{lift}(l_i \Rightarrow l_j) = \frac{P(l_i + l_j)}{P(l_i) P(l_j)} = \frac{c_{i,j}}{E[C_{i,j}]} \tag{6}$$

  • As a more conservative approach we use the quantile Qδ[Ci,j] instead of the expected value:

$$\mathrm{hyperlift}(l_i \Rightarrow l_j) = \frac{c_{i,j}}{Q_\delta[C_{i,j}]} \tag{7}$$
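For a single pair of items, definitions (6) and (7) can be sketched as follows. The helper names `hyper_quantile`, `hyperlift_pair` and `lift_pair`, as well as the counts, are illustrative assumptions; the urn parametrization of `qhyper` mirrors the one used in the presentation's own code.

```r
m <- 10000  # number of transactions (illustrative)

# Quantile of the hypergeometric count C_ij: the c_i transactions containing
# l_i are drawn from an urn with c_j "white" transactions (containing l_j)
# and m - c_j "black" ones.
hyper_quantile <- function(c_i, c_j, delta = 0.99) {
  qhyper(delta, m = c_j, n = m - c_j, k = c_i)
}

# Hyperlift (7): observed co-occurrences relative to the delta-quantile.
hyperlift_pair <- function(c_ij, c_i, c_j, delta = 0.99) {
  c_ij / hyper_quantile(c_i, c_j, delta)
}

# Lift (6): observed co-occurrences relative to the expected value.
lift_pair <- function(c_ij, c_i, c_j) c_ij / (c_i * c_j / m)

# Two items with 100 occurrences each that co-occur 3 times.
q  <- hyper_quantile(100, 100)
hl <- hyperlift_pair(3, 100, 100)
l  <- lift_pair(3, 100, 100)
```

Because the 0.99 quantile is at least as large as the expected value of 1 in this example, `hl` cannot exceed `l`, which is the sense in which hyperlift is the more conservative measure.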

SLIDE 23

New measure: hyperlift (cont’d)

Calculating hyperlift for δ = 0.99:

> calc_hyperbase <- function(ci, cj) {
+     qhyper(0.99, m = cj, n = m - cj, k = ci)}
> hyperlift <- c2/outer(c, c, FUN = calc_hyperbase)
> hyperlift[is.infinite(hyperlift)] <- NA
> persp(hyperlift, shade = 1, ticktype = "detailed",
+     border = 0, expand = 0.5, zlab = "hyperlift",
+     xlab = "items", ylab = "items")
> length(which(hyperlift > 2))
[1] 2

SLIDE 25

New measure: hyperlift (cont’d)

  • Hyperlift values are generally smaller than 1 and more evenly distributed than lift values. This indicates that hyperlift filters random co-occurrences better than lift.
  • Hyperlift shows a weak systematic tendency to favor rules with more frequent items.

SLIDE 26

Comparing lift and hyperlift on a grocery database

  • 1 month of real-world point-of-sale transaction data from a local grocery outlet with
      – m = 9835 transactions and
      – n = 169 categories.
  • Support, confidence and lift distributions look almost identical to the simulated data.

SLIDE 27

Lift for 2-itemsets for items with support of 0.1% in the grocery database

SLIDE 28

Hyperlift for 2-itemsets for items in the grocery database

SLIDE 29

Comparing lift and hyperlift (cont’d)

Top 10 rules (ordered by lift, minimum support = 0.001)

     l_i                      l_j              supp       lift
20   mayonnaise               mustard          0.001423   12.965
 8   Instant food products    hamburger meat   0.003050   11.421
15   softener                 detergent        0.001118   10.600
16   liquor                   red/blush wine   0.002135   10.025
 6   flour                    sugar            0.004982    8.463
 4   popcorn                  salty snack      0.002237    8.192
11   processed cheese         ham              0.003050    7.071
 9   sauces                   hamburger meat   0.001220    6.684
 3   meat spreads             cream cheese     0.001118    6.605
14   house keeping products   detergent        0.001017    6.346

SLIDE 30

Comparing lift and hyperlift (cont’d)

Top 10 rules (ordered by hyperlift, no minimum support)

      l_i                     l_j              supp     hyperlift   lift
  11  Instant food products   hamburger meat   0.0030   4.286       11.421
   9  flour                   sugar            0.0049   4.083        8.463
  15  liquor                  red/blush wine   0.0021   3.500       10.025
* 17  cooking chocolate       baking powder    0.0007   3.500       15.826
  18  mayonnaise              mustard          0.0014   3.500       12.965
   6  processed cheese        white bread      0.0041   3.154        5.975
   7  popcorn                 salty snack      0.0022   3.143        8.192
  13  processed cheese        ham              0.0030   3.000        7.071
   3  liquor                  bottled beer     0.0046   2.875        5.241
  14  softener                detergent        0.0011   2.750       10.600
   8  baking powder           sugar            0.0032   2.667        5.432

SLIDE 31

Comparing lift and hyperlift (cont’d)

  • All rules for lift (with support) and hyperlift make intuitive sense.
  • Rules with high hyperlift potentially also have high lift.
  • Hyperlift selects rules with support varying from very rare to relatively frequent (the tendency of hyperlift to favor rules with more frequent items seems not too strong).
  • Hyperlift is also able to deal with very infrequent rules.
SLIDE 32

Conclusion

  • Interest measures are systematically influenced by the frequencies of items in the corresponding itemsets or rules.
  • Lift performs poorly at filtering random noise.
  • The presented framework provides many possibilities for further research:
      – Adapt hyperlift to finding substitutes (instead of complements).
      – Analyze the systematic influence of the occurrence frequency of items on the hyperlift measure.
      – Use p-value instead of hyperlift.
      – Expand the model to itemsets of size > 2.
      – Model dependencies between items.
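The presentation does not specify how a p-value would be defined. One possible reading, sketched here purely as an assumption, is the upper-tail probability of the observed co-occurrence count under the same hypergeometric urn model used for hyperlift. The function name `cooccurrence_p_value` and all counts are hypothetical.

```r
m <- 10000  # number of transactions (illustrative)

# Upper-tail probability P(C_ij >= c_ij) under the hypergeometric model.
# phyper(q, ..., lower.tail = FALSE) returns P(C_ij > q), hence c_ij - 1.
cooccurrence_p_value <- function(c_ij, c_i, c_j) {
  phyper(c_ij - 1, m = c_j, n = m - c_j, k = c_i, lower.tail = FALSE)
}

# Two items with 100 occurrences each: one co-occurrence is unremarkable,
# ten co-occurrences are very unlikely under independence.
p_chance   <- cooccurrence_p_value(c_ij = 1,  c_i = 100, c_j = 100)
p_unlikely <- cooccurrence_p_value(c_ij = 10, c_i = 100, c_j = 100)
```

Unlike hyperlift, such a value would not need a fixed δ, but with n(n − 1) candidate rules it would call for some correction for multiple comparisons.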
