

SLIDE 1

Association Rule Mining

SLIDE 2

What Is Association Rule Mining?

  • Association rule mining is finding frequent patterns or associations among sets of items or objects, usually amongst transactional data
  • Applications include Market Basket analysis, cross-marketing, catalog design, etc.

SLIDE 3

Association Mining

  • Examples, in the rule form “Body → Head [support, confidence]”:
    – buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%]
    – buys(x, “bread”) → buys(x, “milk”) [0.6%, 65%]
    – major(x, “CS”) ∧ takes(x, “DB”) → grade(x, “A”) [1%, 75%]
    – age(X, 30-45) ∧ income(X, 50K-75K) → buys(X, SUVcar), i.e., age = “30-45”, income = “50K-75K” → car = “SUV”

SLIDE 4

Market-basket Analysis & Finding Associations

  • Do items occur together?
  • Proposed by Agrawal et al. in 1993.
  • An important data mining model, studied extensively by the database and data mining community.
  • Assumes all data are categorical.
  • Initially used for Market Basket Analysis to find how items purchased by customers are related, e.g., Bread → Milk [sup = 5%, conf = 100%]

SLIDE 5

Association Rule: Basic Concepts

  • Given: (1) a database of transactions; (2) each transaction is a list of items (purchased by a customer in a visit)
  • Find: all rules that correlate the presence of one set of items with that of another set of items
    – E.g., 98% of people who purchase tires and auto accessories also get automotive services done
  • Applications
    – * → Maintenance Agreement (what should the store do to boost Maintenance Agreement sales?)
    – Home Electronics → * (what other products should the store stock up on?)
    – Detecting “ping-pong”ing of patients, faulty “collisions”

SLIDE 6

Association Rule Mining

  • Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction

Market-basket transactions:

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of association rules:
{Diaper} → {Beer}, {Milk, Bread} → {Eggs, Coke}, {Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!
An itemset is simply a set of items.

SLIDE 7

Examples from a Supermarket

  • Can you think of association rules from a supermarket?
  • Let’s say you identify association rules from a supermarket; how might you exploit them?
    – That is, if you are the store manager, how might you make money?
    – Assume you have a rule of the form X → Y

SLIDE 8

Supermarket examples

  • If you have a rule X → Y, you could:
    – Run a sale on X if you want to increase sales of Y
    – Locate the two items near each other
    – Locate the two items far from each other to make the shopper walk through the store
    – Print out a coupon on checkout for Y if the shopper bought X but not Y

SLIDE 9

Association “rules” – standard format

Rule format (a set can consist of just a single item):

If {set of items} Then {set of items}, i.e., Condition implies Result

Example: If {Diapers, Baby Food} (Condition) Then {Beer, Chips} (Result)

(Diagram: customers who buy diapers, customers who buy beer, and the overlap who buy both.)

The right side is very often a single item. Rules do not imply causality.

SLIDE 10

What is an Interesting Association?

  • Requires domain-knowledge validation
    – Actionable, non-trivial, understandable
  • Algorithms provide a first pass based on statistics on how “unexpected” an association is
  • Some standard statistics used, for a rule C → R:
    – support ≈ p(R & C): the percent of “baskets” where the rule holds
    – confidence ≈ p(R | C): the percent of times R holds when C holds

SLIDE 11

Support and Confidence

  • Find all the rules X → Y with minimum confidence and support
    – Support = probability that a transaction contains {X, Y}, i.e., the ratio of transactions in which X and Y occur together to all transactions in the DB
    – Confidence = conditional probability that a transaction having X contains Y, i.e., the ratio of transactions in which X and Y occur together to those in which X occurs

The confidence of a rule LHS => RHS can be computed as the support of the whole itemset divided by the support of the LHS:

Confidence(LHS => RHS) = Support(LHS ∪ RHS) / Support(LHS)

(Diagram: customers who buy diapers, customers who buy beer, and the overlap who buy both.)
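A minimal sketch of these two measures in Python (my own illustration, not from the slides; the transactions are the market-basket table from Slide 6):

    transactions = [
        {"Bread", "Milk"},
        {"Bread", "Diaper", "Beer", "Eggs"},
        {"Milk", "Diaper", "Beer", "Coke"},
        {"Bread", "Milk", "Diaper", "Beer"},
        {"Bread", "Milk", "Diaper", "Coke"},
    ]

    def support(itemset, transactions):
        # fraction of transactions that contain every item in the itemset
        itemset = set(itemset)
        return sum(itemset <= t for t in transactions) / len(transactions)

    def confidence(lhs, rhs, transactions):
        # Confidence(LHS => RHS) = Support(LHS ∪ RHS) / Support(LHS)
        return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

    print(support({"Milk", "Diaper", "Beer"}, transactions))       # 0.4
    print(confidence({"Milk", "Diaper"}, {"Beer"}, transactions))  # 0.666...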

SLIDE 12

Definition: Frequent Itemset

  • Itemset
    – A collection of one or more items, e.g., {Milk, Bread, Diaper}
    – k-itemset: an itemset with k items
  • Support count (σ)
    – Frequency count of occurrence of an itemset
    – E.g., σ({Milk, Bread, Diaper}) = 2
  • Support
    – Fraction of transactions containing the itemset
    – E.g., s({Milk, Bread, Diaper}) = 2/5
  • Frequent Itemset
    – An itemset whose support is greater than or equal to a minsup threshold

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

SLIDE 13

Support and Confidence Calculations

Given the association rule {Milk, Diaper} → {Beer} and the transactions:

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Rule evaluation metrics:
  – Support (s): the fraction of transactions that contain both X and Y
  – Confidence (c): measures how often items in Y appear in transactions that contain X

Now compute these two metrics:

s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67

SLIDE 14

Support and Confidence – 2nd Example

Transaction ID  Items Bought
1001            A, B, C
1002            A, C
1003            A, D
1004            B, E, F
1005            A, D, F

Itemset {A, C} has a support of 2/5 = 40%
Rule {A} ==> {C} has a confidence of 50%
Rule {C} ==> {A} has a confidence of 100%

Support for {A, C, E}? Support for {A, D, F}?
Confidence for {A, D} ==> {F}? Confidence for {A} ==> {D, F}?

Goal: Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).
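The four questions above can be checked directly (a sketch; sup() treats a string like "ACE" as the itemset {A, C, E}):

    T = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}, {"A", "D", "F"}]

    def sup(s):
        return sum(set(s) <= t for t in T) / len(T)

    print(sup("ACE"))              # 0.0  (E never occurs with A)
    print(sup("ADF"))              # 0.2  (only transaction 1005)
    print(sup("ADF") / sup("AD"))  # confidence {A, D} ==> {F} = 0.5
    print(sup("ADF") / sup("A"))   # confidence {A} ==> {D, F} = 0.25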

SLIDE 15

Example

  • Transaction data:
    t1: Beef, Chicken, Milk
    t2: Beef, Cheese
    t3: Cheese, Boots
    t4: Beef, Chicken, Cheese
    t5: Beef, Chicken, Clothes, Cheese, Milk
    t6: Chicken, Clothes, Milk
    t7: Chicken, Milk, Clothes
  • Assume: minsup = 30%, minconf = 80%
  • An example frequent itemset: {Chicken, Clothes, Milk} [sup = 3/7]
  • Rules from an itemset are partitions of its items
  • Association rules from the above itemset:
    Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
    Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]

SLIDE 16

Mining Association Rules

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of rules:
{Milk, Diaper} → {Beer} (s = 0.4, c = 0.67)
{Milk, Beer} → {Diaper} (s = 0.4, c = 1.0)
{Diaper, Beer} → {Milk} (s = 0.4, c = 0.67)
{Beer} → {Milk, Diaper} (s = 0.4, c = 0.67)
{Diaper} → {Milk, Beer} (s = 0.4, c = 0.5)
{Milk} → {Diaper, Beer} (s = 0.4, c = 0.5)

Observations:
  • All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
  • Rules originating from the same itemset have identical support (by definition) but may have different confidence values
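These six rules are exactly the binary partitions of the itemset, so they can be enumerated mechanically. A sketch reusing the support() and confidence() helpers (and the transactions list) from the Slide 11 example:

    from itertools import combinations

    X = {"Milk", "Diaper", "Beer"}
    for r in range(1, len(X)):
        for lhs in combinations(sorted(X), r):
            rhs = X - set(lhs)
            s = support(X, transactions)                 # identical for all six rules
            c = confidence(set(lhs), rhs, transactions)  # varies by partition
            print(set(lhs), "->", rhs, "s=%.1f c=%.2f" % (s, c))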

SLIDE 17

Drawback of Confidence

            Coffee   not Coffee   sum (row)
Tea         15       5            20
not Tea     75       5            80
sum (col.)  90       10           100

Association rule: Tea → Coffee
Confidence = P(Coffee | Tea) = 15/20 = 0.75, but P(Coffee) = 0.9
Although the confidence is high, the rule is misleading: P(Coffee | not Tea) = 75/80 = 0.9375

SLIDE 18

Mining Association Rules

  • Two-step approach:
    1. Frequent itemset generation: generate all itemsets whose support ≥ minsup
    2. Rule generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
  • Frequent itemset generation is still computationally expensive

SLIDE 19

Transaction data representation

  • A simplistic view of “shopping baskets”
  • Some important information is not considered:
    – the quantity of each item purchased
    – the price paid

SLIDE 20

Many mining algorithms

  • There are a large number of them
  • They use different strategies and data structures
  • Their resulting sets of rules are all the same
    – Given a transaction data set T, a minimum support, and a minimum confidence, the set of association rules existing in T is uniquely determined
  • Any algorithm should find the same set of rules, although their computational efficiencies and memory requirements may differ
  • We study only one: the Apriori algorithm

SLIDE 21

The Apriori algorithm

  • The best-known algorithm
  • Two steps:
    – Find all itemsets that have minimum support (frequent itemsets, also called large itemsets)
    – Use frequent itemsets to generate rules
  • E.g., a frequent itemset
    {Chicken, Clothes, Milk} [sup = 3/7]
    and one rule from the frequent itemset
    Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]

SLIDE 22

Step 1: Mining all Frequent Itemsets

  • A frequent itemset is an itemset whose support is ≥ minsup
  • Key idea: the Apriori property (downward closure property): any subset of a frequent itemset is also a frequent itemset

(Itemset lattice over {A, B, C, D}: A, B, C, D; AB, AC, AD, BC, BD, CD; ABC, ABD, ACD, BCD)

SLIDE 23

Steps in Association Rule Discovery

  • Find frequent itemsets
    – Itemsets with at least minimum support
    – Support is “downward closed”, so a subset of a frequent itemset must be frequent:
      – if {A, B} is a frequent itemset, both {A} and {B} are frequent itemsets
      – if an itemset does not satisfy minimum support, none of its supersets will either (this is the key point that allows pruning of the search space)
    – Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
  • Use the frequent itemsets to generate association rules
    – Generate all binary partitions, but they may have to fit a template
    – E.g., only one item on the right side, or only two items on the left side

SLIDE 24

Frequent Itemset Generation

(Itemset lattice over {A, B, C, D, E}, from the null set up to ABCDE.)

Given d items, there are 2^d possible candidate itemsets.

SLIDE 25

Mining Association Rules—An Example

Transaction ID  Items Bought
2000            A, B, C
1000            A, C
4000            A, D
5000            B, E, F

  • Min. support 50%
  • Min. confidence 50%
(The user specifies these.)

Frequent Itemset  Support
{A}               75%
{B}               50%
{C}               50%
{A, C}            50%

For rule A → C:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.6%

The Apriori principle: any subset of a frequent itemset must be frequent.

SLIDE 26

Illustrating the Apriori Principle

(Itemset lattice over {A, B, C, D, E}, shown twice: in the first copy an itemset is found to be infrequent; in the second, all of its supersets are pruned.)

SLIDE 27

The Apriori Algorithm

  • Terminology:
    – Ck is the set of candidate k-itemsets
    – Lk is the set of frequent k-itemsets
  • Join step: Ck is generated by joining two elements from Lk-1
    – There must be a lot of overlap for the join to increase the length by only 1
  • Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
    – This is a bit confusing since we use it the other way: we prune a candidate k-itemset if any of its (k-1)-subsets is not in our list of frequent (k-1)-itemsets
  • To utilize this, you simply start with k = 1 (single-item itemsets) and then work your way up from there!

SLIDE 28

The Algorithm

  • Iterative algorithm (also called level-wise search): find all 1-item frequent itemsets, then all 2-item frequent itemsets, and so on
    – In each iteration k, only consider itemsets that contain some frequent (k-1)-itemset
  • Find frequent itemsets of size 1: F1
  • For k = 2 onward:
    – Ck = candidates of size k: those itemsets of size k that could be frequent, given Fk-1
    – Fk = those candidates that are actually frequent, Fk ⊆ Ck (requires one scan of the database)
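A self-contained sketch of this level-wise loop in Python (my rendering, not the lecture's code; minsup is taken here as an absolute count):

    from itertools import combinations
    from collections import Counter

    def apriori(transactions, minsup_count):
        """Return {frozenset: support count} for all frequent itemsets."""
        transactions = [frozenset(t) for t in transactions]
        # F1: frequent 1-itemsets
        counts = Counter(frozenset([i]) for t in transactions for i in t)
        Fk = {s for s, n in counts.items() if n >= minsup_count}
        frequent = {s: counts[s] for s in Fk}
        k = 2
        while Fk:
            # Ck: union of two frequent (k-1)-itemsets that yields a k-itemset,
            # kept only if all of its (k-1)-subsets are frequent (Apriori property)
            Ck = {a | b for a in Fk for b in Fk
                  if len(a | b) == k
                  and all(frozenset(s) in Fk for s in combinations(a | b, k - 1))}
            # one scan of the database per level to count the candidates
            counts = Counter(c for t in transactions for c in Ck if c <= t)
            Fk = {s for s, n in counts.items() if n >= minsup_count}
            frequent.update({s: counts[s] for s in Fk})
            k += 1
        return frequent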

SLIDE 29

Apriori Candidate Generation

  • The candidate-gen function takes Lk-1 and returns a superset (called the candidates) of the set of all frequent k-itemsets
  • There are two steps:
    – Join step: generate all possible candidate itemsets Ck of length k
    – Prune step: remove those candidates in Ck that cannot be frequent

SLIDE 30

How to Generate Candidates?

  • Suppose the items in Lk-1 are listed in an order
  • Step 1: self-joining Lk-1
    – The description below is a bit confusing – all we do is splice two sets together so that only one new item is added (see next slide)

    insert into Ck
    select p.item1, p.item2, …, p.itemk-1, q.itemk-1
    from Lk-1 p, Lk-1 q
    where p.item1 = q.item1, …, p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1

  • Step 2: pruning

    forall itemsets c in Ck do
      forall (k-1)-subsets s of c do
        if (s is not in Lk-1) then delete c from Ck
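The same join and prune steps in Python (a sketch; like the pseudocode, it assumes the items within each itemset are kept sorted, so two itemsets join only when they agree everywhere except the last item):

    from itertools import combinations

    def candidate_gen(Lk_1, k):
        """Lk_1: set of frozensets of size k-1; returns the candidate k-itemsets."""
        prev = sorted(sorted(s) for s in Lk_1)  # itemsets as sorted lists
        Ck = set()
        for p in prev:
            for q in prev:
                # join step: same first k-2 items, and p's last item < q's last item
                if p[:-1] == q[:-1] and p[-1] < q[-1]:
                    c = frozenset(p) | {q[-1]}
                    # prune step: every (k-1)-subset of c must be in Lk-1
                    if all(frozenset(s) in Lk_1 for s in combinations(c, k - 1)):
                        Ck.add(c)
        return Ck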

SLIDE 31

Self-Joining Step

  • All items in the itemsets to be self-joined are in a consistent order – any order, such as lexicographic (alphabetical) order
  • Two itemsets can be joined only if they differ in the last position
  • When you join them, the size of the itemset goes up by one
  • See the example on the next slide

SLIDE 32

Example of Generating Candidates (1)

  • L3 = {abc, abd, acd, ace, bcd}
  • Self-joining: L3 * L3
    – abc and abd yields abcd
    – acd and ace yields acde
    – We do not join abd and acd
      – even though it would give abcd, which is a candidate
      – if the product were a candidate, it would have already been generated, given the ordering
      – this may not be obvious at first glance

SLIDE 33

Example of Generating Candidates (2)

  • Note that for abcd to be frequent, by the Apriori property abc, bcd, and abd must be frequent
  • abc and abd are alphabetically before bcd
  • So if we see abc and bcd, we do not need to generate abcd, because if abd were frequent, abcd would have already been generated
    – If abd is not there, then abcd would be pruned later

SLIDE 34

Example of Generating Candidates (3)

  • Given the joined candidates abcd and acde, we go to the pruning phase
    – acde is removed because ade is not in L3
    – the merge step does not ensure all subsets are frequent
  • C4 = {abcd}
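Running the candidate_gen sketch from Slide 30 on this example reproduces the result:

    L3 = {frozenset(s) for s in ["abc", "abd", "acd", "ace", "bcd"]}
    print(candidate_gen(L3, k=4))
    # {frozenset({'a', 'b', 'c', 'd'})}: acde is joined but then pruned (ade is not in L3)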

SLIDE 35

The Apriori Algorithm — Example (minsup = 30%)

Database D:

TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5

Scan D → candidate 1-itemsets C1 with counts:

itemset  sup.
{1}      2
{2}      3
{3}      3
{4}      1
{5}      3

L1 (30% of 4 transactions means a count of at least 2, so {4} is dropped):

itemset  sup.
{1}      2
{2}      3
{3}      3
{5}      3

C2 = L1 joined with L1: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}

Scan D → C2 with counts:

itemset  sup
{1 2}    1
{1 3}    2
{1 5}    1
{2 3}    2
{2 5}    3
{3 5}    2

L2:

itemset  sup
{1 3}    2
{2 3}    2
{2 5}    3
{3 5}    2

C3 = {2 3 5}; scan D → L3:

itemset  sup
{2 3 5}  2
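The whole trace can be reproduced with the apriori sketch from Slide 28 (minsup = 30% of 4 transactions, i.e., a count of at least 2):

    D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
    freq = apriori(D, minsup_count=2)
    for itemset in sorted(freq, key=lambda s: (len(s), sorted(s))):
        print(sorted(itemset), freq[itemset])
    # [1] 2, [2] 3, [3] 3, [5] 3, [1, 3] 2, [2, 3] 2, [2, 5] 3, [3, 5] 2, [2, 3, 5] 2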

SLIDE 36

Warning: Do Not Forget Pruning

  • Candidates get pruned in two ways
    – The Apriori property is violated
    – If the Apriori property is not violated, you must still scan the database, and if minsup is not met, then prune
      – the Apriori property is necessary but not sufficient to keep a candidate
    – If you forget to prune via the Apriori property, you will get the same results, since the scan will catch it
      – but I will take off points on an exam: make it clear when you prune using the Apriori property (do not fill in a count when crossing a candidate off)
  • The Apriori property cannot be violated until k = 3; things begin to get trickier at k = 4, since there are more subsets to check

SLIDE 37

Step 2: Rules from Frequent Itemsets

  • Frequent itemsets → association rules
  • One more step is needed
  • For each frequent itemset X, for each proper nonempty subset A of X:
    – Let B = X - A
    – A → B is an association rule if confidence(A → B) ≥ minconf, where
      support(A → B) = support(A ∪ B) = support(X)
      confidence(A → B) = support(A ∪ B) / support(A)
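A sketch of this rule-generation step, assuming the mining phase returned a dict mapping each frequent itemset to its support count (as the Slide 28 sketch does):

    from itertools import combinations

    def gen_rules(frequent, minconf):
        """Yield (A, B, confidence) for every rule A -> B meeting minconf."""
        for X, supX in frequent.items():
            if len(X) < 2:
                continue
            for r in range(1, len(X)):                # proper nonempty subsets A
                for A in map(frozenset, combinations(X, r)):
                    conf = supX / frequent[A]         # support(X) / support(A)
                    if conf >= minconf:
                        yield A, X - A, conf

No database scan is needed here: by the Apriori property every subset A of a frequent X is itself frequent, so its support count is already recorded in the dict.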

SLIDE 38

Generating Rules: an Example

  • Suppose {2, 3, 4} is frequent, with sup = 50%
    – Proper nonempty subsets: {2,3}, {2,4}, {3,4}, {2}, {3}, {4}, with sup = 50%, 50%, 75%, 75%, 75%, 75% respectively
    – These generate the following association rules, all with support = 50%:
      – 2,3 → 4, confidence = 100%
      – 2,4 → 3, confidence = 100%
      – 3,4 → 2, confidence = 67%
      – 2 → 3,4, confidence = 67%
      – 3 → 2,4, confidence = 67%
      – 4 → 2,3, confidence = 67%
    – Then apply the confidence threshold to identify strong rules (rules that meet the support and confidence requirements)
    – If the confidence threshold is 80%, we are left with 2 strong rules

SLIDE 39

Generating Rules: Summary

  • To recap, in order to obtain A → B, we need support(A ∪ B) and support(A)
  • All the required information for the confidence computation has already been recorded during itemset generation; there is no need to see the data T any more
  • This step is not as time-consuming as frequent itemset generation
    – Hint: I almost always ask this on the exam

SLIDE 40

On Apriori Algorithm

Seems to be very expensive, but:

  • Level-wise search
  • K = the size of the largest itemset
  • It makes at most K passes over the data
  • In practice, K is bounded (around 10)
  • The algorithm is very fast; under some conditions, all rules can be found in linear time
  • It scales up to large data sets

SLIDE 41

Granularity of items

  • One exception to the “ease” of applying association rules is selecting the granularity of the items
  • Should you choose:
    – diet coke?
    – coke product?
    – soft drink?
    – beverage?
  • Should you include more than one level of granularity?
    – Some association-finding techniques allow you to represent hierarchies explicitly

SLIDE 42

Multiple-Level Association Rules

  • Items often form a hierarchy
    – Items at the lower level are expected to have lower support
    – Rules regarding itemsets at appropriate levels could be quite useful
    – A transaction database can be encoded based on dimensions and levels

(Hierarchy: Food → {Milk, Bread}; Milk → {Skim, 2%}; Bread → {Wheat, White})

SLIDE 43

Mining Multi-Level Associations

  • A top-down, progressive-deepening approach
    – First find high-level strong rules:
      milk → bread [20%, 60%]
    – Then find their lower-level “weaker” rules:
      2% milk → wheat bread [6%, 50%]
    – Usually requires different thresholds at different levels to find meaningful rules
      – lower support at lower levels

SLIDE 44

Interestingness Measurements

  • Objective measures
    – Two popular measurements: support and confidence
  • Subjective measures (Silberschatz & Tuzhilin, KDD95): a rule (pattern) is interesting if
    – it is unexpected (surprising to the user); and/or
    – it is actionable (the user can do something with it)

SLIDE 45

Criticism to Support and Confidence

  • Example 1: among 5000 students
    – 3000 play basketball
    – 3750 eat cereal
    – 2000 both play basketball and eat cereal

             basketball   not basketball   sum (row)
cereal       2000         1750             3750
not cereal   1000         250              1250
sum (col.)   3000         2000             5000

  • play basketball → eat cereal [40%, 66.7%] is misleading, because the overall percentage of students eating cereal is 75%, which is higher than 66.7%
  • play basketball → not eat cereal [20%, 33.3%] is far more interesting, although it has lower support and confidence
  • Lift of A => B = P(B|A) / P(B); a rule is interesting if its lift is not near 1.0
    – What is the lift of this second rule? (1/3) / (1250/5000) = 1.33
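Both lifts can be checked numerically straight from the contingency table (a quick sketch):

    n, basketball, cereal, both = 5000, 3000, 3750, 2000

    conf_cereal = both / basketball            # P(cereal | basketball) = 0.667
    print(conf_cereal / (cereal / n))          # lift = 0.667 / 0.75 = 0.89 (< 1)

    conf_no_cereal = (basketball - both) / basketball   # 1000/3000 = 0.333
    print(conf_no_cereal / ((n - cereal) / n))          # lift = 0.333 / 0.25 = 1.33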

SLIDE 46

Customer Number vs. Transaction ID

  • In the homework you may have a problem where there is a customer id for each transaction
    – You can be asked to do association analysis based on the customer id
    – If so, you need to aggregate the transactions to the customer level
    – If a customer has 3 transactions, then you just create an itemset containing all of the items in the union of the 3 transactions
    – Note that we will ignore the frequency of purchase
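A sketch of that aggregation (the rows here are hypothetical (customer id, items) pairs):

    from collections import defaultdict

    rows = [("c1", {"Bread", "Milk"}), ("c1", {"Beer"}), ("c2", {"Diaper"})]

    baskets = defaultdict(set)
    for customer_id, items in rows:
        baskets[customer_id] |= items  # set union ignores purchase frequency, as noted

    transactions = list(baskets.values())
    # c1 -> {Bread, Milk, Beer}; c2 -> {Diaper}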

SLIDE 47

Virtual items

  • If you’re interested in including other possible variables, you can create “virtual items”
    – gift-wrap, used-coupon, new-store, winter-holidays, bought-nothing, …

SLIDE 48

Associations: Pros and Cons

  • Pros
    – can quickly mine patterns describing business/customers/etc. without major effort in problem formulation
    – virtual items allow much flexibility
    – unparalleled tool for hypothesis generation
  • Cons
    – unfocused
      – not clear exactly how to apply mined “knowledge”
      – only hypothesis generation
    – can produce many, many rules!
      – there may be only a few nuggets among them (or none)

SLIDE 49

Association Rules

  • Association rule types:
    – Actionable rules: contain high-quality, actionable information
    – Trivial rules: information already well known by those familiar with the business
    – Inexplicable rules: no explanation and do not suggest action
  • Trivial and inexplicable rules occur most often