
Implications of Probabilistic Data Modeling for Rule Mining
Michael Hahsler, Kurt Hornik and Thomas Reutterer
Wirtschaftsuniversität Wien
29th Annual Conference of the German Classification Society (GfKl 2005)
Magdeburg, March 9-11, 2005

Motivation
• Mining association rules is an important technique for discovering meaningful patterns in transaction databases.
  – Example: diapers ⇒ beer
  – Applications: product assortment decisions, adapting promotional activities, personalized product recommendations, adaptive user interfaces
• Current literature focuses on the properties of algorithms.
• We will discuss properties of
  – transaction data sets and
  – interest measures
  from a probabilistic point of view.
M. Hahsler, K. Hornik and T. Reutterer 2 Magdeburg, March 9-11, 2005

Outline
1. Association rules
2. Probabilistic model for transaction data
3. Simulation with R
4. Implications for confidence and lift
5. New measure: hyperlift
6. Conclusion

Association Rules
An association rule is a rule of the form X ⇒ Y, where X and Y are two disjoint sets of items (itemsets).
Rule selection with thresholds on interest measures:
• Support: fraction of transactions containing an itemset
• Confidence: probability of seeing Y under the condition that the transactions also contain X
Found rules are often ranked by:
• Lift: how many times more often X and Y occur together than expected if they were statistically independent
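The three measures can be illustrated on a tiny example. The following is a minimal sketch in R; the incidence matrix and the item names are made up for illustration and are not taken from the talk.

```r
# Toy incidence matrix: 5 transactions (rows) x 3 items (columns).
# Item names and values are hypothetical.
Tr <- matrix(c(1, 1, 0,
               1, 1, 1,
               0, 1, 0,
               1, 0, 0,
               1, 1, 0),
             ncol = 3, byrow = TRUE,
             dimnames = list(NULL, c("diapers", "beer", "milk")))

m <- nrow(Tr)

# Support: fraction of transactions containing the itemset.
supp_X  <- sum(Tr[, "diapers"]) / m                    # 4/5
supp_Y  <- sum(Tr[, "beer"]) / m                       # 4/5
supp_XY <- sum(Tr[, "diapers"] & Tr[, "beer"]) / m     # 3/5

# Confidence of diapers => beer: support of both over support of the left-hand side.
conf_XY <- supp_XY / supp_X                            # 0.75

# Lift: confidence relative to the support of the right-hand side.
lift_XY <- conf_XY / supp_Y                            # 0.9375
```

In this toy data the lift is slightly below 1, so diapers and beer co-occur a little less often than independence would suggest.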

A simple probabilistic framework for transaction data
Transactions occur following a Poisson process.
[Figure: timeline from 0 to t with the transactions Tr1, Tr2, ..., Trm marked on it]
We analyze transactions which are recorded in a fixed time interval of length t. The number of transactions m in the time interval is then Poisson distributed with parameter θt:
\[ P(M = m) = \frac{e^{-\theta t} (\theta t)^m}{m!} \tag{1} \]

A simple probabilistic framework (cont’d)
• n independent items L = {l_1, l_2, ..., l_n},
• each with a fixed success probability to occur in a transaction, given by the vector p = (p_1, p_2, ..., p_n).
Following the framework, c_i, the observed number of transactions that contain item l_i, can be interpreted as a realization of a random variable C_i. Under the condition of a fixed number of transactions m, this random variable has a binomial distribution:
\[ P(C_i = c_i \mid M = m) = \binom{m}{c_i} p_i^{c_i} (1 - p_i)^{m - c_i} \tag{2} \]

A simple probabilistic framework (cont’d)
Since the number of transactions m is not fixed for a fixed time interval t, the unconditional distribution is:
\[
\begin{aligned}
P(C_i = c_i) &= \sum_{m = c_i}^{\infty} P(C_i = c_i \mid M = m) \cdot P(M = m) \\
&= \sum_{m = c_i}^{\infty} \binom{m}{c_i} p_i^{c_i} (1 - p_i)^{m - c_i} \, \frac{e^{-\theta t} (\theta t)^m}{m!} \\
&= \frac{e^{-\theta t} (p_i \theta t)^{c_i}}{c_i!} \sum_{m = c_i}^{\infty} \frac{((1 - p_i) \theta t)^{m - c_i}}{(m - c_i)!} \\
&= \frac{e^{-p_i \theta t} (p_i \theta t)^{c_i}}{c_i!}
\end{aligned}
\tag{3}
\]
The last step uses the exponential series: the remaining sum equals e^{(1 − p_i)θt}. The result is a Poisson distribution with parameter λ_i = p_i θt.
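The result of the derivation can be checked numerically. The sketch below is not part of the talk: it truncates the infinite sum at a large upper bound and compares the mixture of binomials with the Poisson probabilities. The parameter values and the names `t_days`, `p_i` and `m_grid` are assumptions chosen for illustration.

```r
# Illustrative parameters (assumed): intensity, interval length, item probability.
theta  <- 300
t_days <- 30
p_i    <- 0.005

# Truncate the sum over m far into the upper tail of Poisson(theta * t_days).
m_grid <- 0:12000
c_vals <- 0:100

# Mixture: sum over m of P(C_i = c | M = m) * P(M = m).
# dbinom() returns 0 whenever c > m, so the sum effectively starts at m = c.
mixture <- sapply(c_vals, function(ci) {
  sum(dbinom(ci, size = m_grid, prob = p_i) * dpois(m_grid, theta * t_days))
})

# Closed form from the derivation: Poisson with parameter p_i * theta * t.
closed_form <- dpois(c_vals, p_i * theta * t_days)

max_abs_diff <- max(abs(mixture - closed_form))
```

The maximum absolute difference should be at the level of floating-point noise, which is consistent with λ_i = p_i θt.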

A simple probabilistic framework (cont’d)
Representation of transaction data as a binary incidence matrix (columns: items, rows: transactions; p: success probabilities, c: item counts):

          l1      l2      l3      ...   ln
  p       0.005   0.01    0.0003  ...   0.025
  Tr1     0       1       0       ...   1
  Tr2     0       1       0       ...   1
  Tr3     0       1       0       ...   0
  Tr4     0       0       0       ...   0
  ...     ...     ...     ...     ...   ...
  Trm-1   1       0       0       ...   1
  Trm     0       0       1       ...   1
  c       99      201     7       ...   411

Simulation
For simplicity, we assume for the following simulation that the parameters in λ are drawn from a single gamma distribution with shape k = 0.75 and scale a = 250. We simulate the counts c_i for n = 200 different items over a t = 30 day period with transaction intensity θ = 300 transactions per day.
> # parameter values as stated above
> n <- 200; k <- 0.75; a <- 250; t <- 30; theta <- 300
> m <- rpois(1, theta * t)
> m
[1] 8885
> p <- sort(rgamma(n, shape = k, scale = a)/m,
+     decreasing = TRUE)
Now we can simulate the transactions in the database by m Bernoulli trials for each of the n items and calculate the count vector c.
> Tr <- matrix(rbinom(m * n, 1, p), ncol = n, byrow = TRUE)
> c <- (apply(Tr, 2, sum))

Simulation (cont’d)
We can directly calculate the support of each item from the transaction counts.
> supp1 <- c/m
> plot(supp1, type = "h", xlab = "items",
+     ylab = "support")


Simulation (cont’d)
Next, we extend the framework to the occurrences of 2-itemsets with a symmetric n × n count matrix (c2) and a support matrix (supp2):
> c2 <- sapply(1:n, function(i) {
+     apply(Tr[, i] & Tr[, 1:n], 2, sum)})
> diag(c2) <- NA
> supp2 <- c2/m
> persp(supp2, expand = 0.5, ticktype = "detailed",
+     border = 0, shade = 1, zlab = "support",
+     xlab = "items", ylab = "items")
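As an aside that is not on the slides, the same co-occurrence counts can be obtained with a single matrix product, since entry (i, j) of t(Tr) %*% Tr counts the transactions that contain both item i and item j. The sketch below compares the two approaches on a small simulated matrix; the sizes, probabilities and names `c2_loop` and `c2_fast` are chosen only for illustration.

```r
set.seed(1)

# Small illustrative data set (assumed values, much smaller than in the talk).
m <- 500
n <- 6
p <- c(0.3, 0.2, 0.1, 0.05, 0.02, 0.01)
Tr <- matrix(rbinom(m * n, 1, p), ncol = n, byrow = TRUE)

# Column-by-column approach, as in the simulation code.
c2_loop <- sapply(1:n, function(i) {
  apply(Tr[, i] & Tr[, 1:n], 2, sum)
})

# Matrix-product alternative: crossprod(Tr) computes t(Tr) %*% Tr.
c2_fast <- crossprod(Tr)

# Both give the same symmetric count matrix.
same_counts <- all(c2_loop == c2_fast)

# The diagonal holds the single-item counts and is masked as before.
diag(c2_fast) <- NA
supp2 <- c2_fast / m
```

For the n = 200 items used in the simulation, the matrix product avoids the nested loop over columns and is typically much faster.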

Implications for confidence
Confidence is defined by
\[ \mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X + Y)}{\mathrm{supp}(X)} \tag{4} \]
From our 2-itemsets we can generate rules of the form l_i ⇒ l_j, where i, j = 1, 2, ..., n and i ≠ j. We calculate confidence for the n(n − 1) possible rules in the data set.
> conf2 <- supp2/supp1
> persp(conf2, expand = 0.5, ticktype = "detailed",
+     border = 0, shade = 1, zlab = "confidence",
+     xlab = "items", ylab = "items")

Implications for confidence (cont’d)
• Confidence values are generally very low, which reflects the fact that there are no associations in the data.
• Some rules have a confidence of one. However, their left-hand sides (X) have low support.
• Confidence increases as the item in the right-hand side Y of the rule becomes more frequent.
The fact that confidence systematically favors some rules makes the measure problematic when it comes to ranking rules.

Implications for lift
Typically, rules mined using minimum support (and confidence) are filtered or ordered using their lift value. The measure lift is defined as:
\[ \mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)} \tag{5} \]
A lift value close to 1 indicates that the items are co-occurring in the database as expected under independence.
> lift <- conf2/matrix(supp1, ncol = n, nrow = n,
+     byrow = TRUE)
> persp(lift, expand = 0.5, ticktype = "detailed",
+     border = 0, shade = 1, zlab = "lift",
+     xlab = "items", ylab = "items")
> length(which(lift > 2))
[1] 3424
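Substituting the definition of confidence (4) into the definition of lift (5) gives a form that makes two properties explicit. This short worked step is added for illustration and uses only the symbols defined above.

```latex
\[
\mathrm{lift}(X \Rightarrow Y)
  = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)}
  = \frac{\mathrm{supp}(X + Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)}
  = \mathrm{lift}(Y \Rightarrow X)
\]
```

Lift is therefore symmetric in X and Y, and under independence, where supp(X + Y) is approximately supp(X) supp(Y), it is approximately 1.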

Implications for lift (cont’d)
To counter the problem of extremely high lift values, we discard all 2-itemsets which do not satisfy a minimum support of 0.1% by setting their lift to 1.
> min_supp <- 0.001
> length(lift[supp2 >= min_supp])
[1] 7096
> lift[supp2 < min_supp] <- 1
> persp(lift, expand = 0.5, ticktype = "detailed",
+     border = 0, shade = 1, zlab = "lift",
+     xlab = "items", ylab = "items")
> length(which(lift > 2))
[1] 130

Implications for lift (cont’d)
• Lift performs poorly at filtering random noise in transaction data, especially for relatively rare items.
• Lift has a tendency to produce higher values for rules with items close to minimum support.
This makes using lift problematic for ranking discovered rules.
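A small calculation shows why rare items are affected so strongly. The sketch below is an illustration rather than part of the talk: the helper `lift_from_counts` and all counts are hypothetical, and the chance of a random co-occurrence is computed with the hypergeometric (urn) model for fixed item counts.

```r
# Lift of l_i => l_j from raw counts: (c_ij / m) / ((c_i / m) * (c_j / m)).
lift_from_counts <- function(c_ij, c_i, c_j, m) {
  c_ij * m / (c_i * c_j)
}

m <- 9000  # illustrative number of transactions (assumed)

# Two rare items (20 occurrences each) that co-occur once by chance.
lift_rare <- lift_from_counts(1, 20, 20, m)          # 22.5

# Two frequent items (900 occurrences each): the expected co-occurrence
# count is 90, and one extra co-occurrence barely moves the lift.
lift_frequent <- lift_from_counts(91, 900, 900, m)   # about 1.011

# Probability that the two rare, independent items co-occur at least once.
p_any <- 1 - dhyper(0, 20, m - 20, 20)
```

A single chance co-occurrence of two rare items already yields a lift far above 2, and with many rare item pairs in a database such events are bound to happen somewhere.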

New measure: hyperlift
• The n × n co-occurrence matrix can be modeled by n² random variables C_{i,j}.
• The framework results in hypergeometric distributions for the C_{i,j} (urn model).
• Using the expected value of C_{i,j}, lift can be rewritten as:
\[ \mathrm{lift}(l_i \Rightarrow l_j) = \frac{P(l_i + l_j)}{P(l_i)\,P(l_j)} = \frac{c_{i,j}}{E[C_{i,j}]} \tag{6} \]
• As a more conservative approach, we use the quantile Q_δ[C_{i,j}] instead of the expected value:
\[ \mathrm{hyperlift}(l_i \Rightarrow l_j) = \frac{c_{i,j}}{Q_\delta[C_{i,j}]} \tag{7} \]
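Definition (7) can be sketched in R with the hypergeometric quantile function `qhyper`. This is a minimal sketch and not the authors' implementation: the function name `hyperlift`, the choice δ = 0.99 and the example counts are assumptions, and the urn is set up as c_j draws from m transactions of which c_i contain item l_i.

```r
# Hyperlift from counts: observed co-occurrences over the delta-quantile of
# the hypergeometric distribution of C_ij.
# Assumption: delta = 0.99 is used here only as an illustrative default.
hyperlift <- function(c_ij, c_i, c_j, m, delta = 0.99) {
  # qhyper(p, m, n, k): m = white balls, n = black balls, k = balls drawn.
  q <- qhyper(delta, c_i, m - c_i, c_j)
  # Note: q can be 0 for very rare items, in which case the ratio is not finite.
  c_ij / q
}

m <- 9000  # illustrative number of transactions (assumed)

# Two rare items (20 occurrences each) co-occurring once by chance:
# the expected-value based lift is large, the quantile-based measure is not.
lift_rare      <- 1 * m / (20 * 20)         # 22.5
hyperlift_rare <- hyperlift(1, 20, 20, m)   # 1, since the 0.99 quantile is 1

# Two frequent items with more co-occurrences than the expected 90.
hyperlift_frequent <- hyperlift(150, 900, 900, m)
```

Because the quantile is at least as large as the median of C_{i,j}, the denominator grows relative to the expected value exactly where chance co-occurrences are plausible, which dampens the inflated values seen for rare items.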
