

SLIDE 1

Association Rule Mining

SLIDE 2

What Is Association Rule Mining?

  • Association rule mining is finding frequent patterns or associations among sets of items or objects, usually amongst transactional data
  • Applications include Market Basket analysis, cross-marketing, catalog design, etc.

SLIDE 3

Association Mining

  • Examples, in the rule form “Body → Head [support, confidence]”:
    – buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%]
    – buys(x, “bread”) → buys(x, “milk”) [0.6%, 65%]
    – major(x, “CS”) ∧ takes(x, “DB”) → grade(x, “A”) [1%, 75%]
    – age(X, 30-45) ∧ income(X, 50K-75K) → buys(X, SUVcar), i.e., age = “30-45”, income = “50K-75K” → car = “SUV”

SLIDE 4

Market-basket Analysis & Finding Associations

  • Do items occur together?
  • Proposed by Agrawal et al. in 1993.
  • An important data mining model, studied extensively by the database and data mining community.
  • Assumes all data are categorical.
  • Initially used for Market Basket Analysis to find how items purchased by customers are related, e.g., Bread → Milk [sup = 5%, conf = 100%]

SLIDE 5

Association Rule: Basic Concepts

  • Given: (1) a database of transactions; (2) each transaction is a list of items (purchased by a customer in a visit)
  • Find: all rules that correlate the presence of one set of items with that of another set of items
    – E.g., 98% of people who purchase tires and auto accessories also get automotive services done
  • Applications
    – * → Maintenance Agreement (what should the store do to boost Maintenance Agreement sales?)
    – Home Electronics → * (what other products should the store stock up on?)
    – Detecting “ping-pong”ing of patients, faulty “collisions”

SLIDE 6

Association Rule Mining

  • Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction

Market-basket transactions:

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of association rules:
{Diaper} → {Beer}, {Milk, Bread} → {Eggs, Coke}, {Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!
An itemset is simply a set of items.

SLIDE 7

Examples from a Supermarket

  • Can you think of association rules from a supermarket?
  • Let’s say you identify association rules from a supermarket; how might you exploit them?
    – That is, if you are the store manager, how might you make money?
    – Assume you have a rule of the form X → Y

SLIDE 8

Supermarket examples

  • If you have a rule X → Y, you could:
    – Run a sale on X if you want to increase sales of Y
    – Locate the two items near each other
    – Locate the two items far from each other to make the shopper walk through the store
    – Print out a coupon on checkout for Y if the shopper bought X but not Y

SLIDE 9

Association “rules” – standard format

Rule format (a set can consist of just a single item):

If {set of items} Then {set of items}, i.e., Condition implies Result

Example: If {Diapers, Baby Food} (Condition) Then {Beer, Chips} (Result)

(Diagram: customers who buy diapers, customers who buy beer, and the overlap who buy both.)

The right side is very often a single item. Rules do not imply causality.

SLIDE 10

What is an Interesting Association?

  • Requires domain-knowledge validation
    – Actionable, non-trivial, understandable
  • Algorithms provide a first pass based on statistics on how “unexpected” an association is
  • Some standard statistics used, for a rule C → R:
    – support ≈ p(R & C): the percent of “baskets” where the rule holds
    – confidence ≈ p(R | C): the percent of times R holds when C holds

SLIDE 11

Support and Confidence

  • Find all the rules X → Y with minimum confidence and support
    – Support = probability that a transaction contains {X, Y}, i.e., the ratio of transactions in which X and Y occur together to all transactions in the DB
    – Confidence = conditional probability that a transaction having X contains Y, i.e., the ratio of transactions in which X and Y occur together to those in which X occurs

The confidence of a rule LHS => RHS can be computed as the support of the whole itemset divided by the support of the LHS:

Confidence(LHS => RHS) = Support(LHS ∪ RHS) / Support(LHS)

(Diagram: customers who buy diapers, customers who buy beer, and the overlap who buy both.)
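A minimal sketch of these two measures in Python (my own illustration, not from the slides; the transactions are the market-basket table from Slide 6):

    transactions = [
        {"Bread", "Milk"},
        {"Bread", "Diaper", "Beer", "Eggs"},
        {"Milk", "Diaper", "Beer", "Coke"},
        {"Bread", "Milk", "Diaper", "Beer"},
        {"Bread", "Milk", "Diaper", "Coke"},
    ]

    def support(itemset, transactions):
        # fraction of transactions that contain every item in the itemset
        itemset = set(itemset)
        return sum(itemset <= t for t in transactions) / len(transactions)

    def confidence(lhs, rhs, transactions):
        # Confidence(LHS => RHS) = Support(LHS ∪ RHS) / Support(LHS)
        return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

    print(support({"Milk", "Diaper", "Beer"}, transactions))       # 0.4
    print(confidence({"Milk", "Diaper"}, {"Beer"}, transactions))  # 0.666...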

SLIDE 12

Definition: Frequent Itemset

  • Itemset
    – A collection of one or more items, e.g., {Milk, Bread, Diaper}
    – k-itemset: an itemset with k items
  • Support count (σ)
    – Frequency count of occurrence of an itemset
    – E.g., σ({Milk, Bread, Diaper}) = 2
  • Support
    – Fraction of transactions containing the itemset
    – E.g., s({Milk, Bread, Diaper}) = 2/5
  • Frequent Itemset
    – An itemset whose support is greater than or equal to a minsup threshold

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

SLIDE 13

Support and Confidence Calculations

Given the association rule {Milk, Diaper} → {Beer} and the transactions:

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Rule evaluation metrics:
  – Support (s): the fraction of transactions that contain both X and Y
  – Confidence (c): measures how often items in Y appear in transactions that contain X

Now compute these two metrics:

s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67

SLIDE 14

Support and Confidence – 2nd Example

Transaction ID  Items Bought
1001            A, B, C
1002            A, C
1003            A, D
1004            B, E, F
1005            A, D, F

Itemset {A, C} has a support of 2/5 = 40%
Rule {A} ==> {C} has a confidence of 50%
Rule {C} ==> {A} has a confidence of 100%

Support for {A, C, E}? Support for {A, D, F}?
Confidence for {A, D} ==> {F}? Confidence for {A} ==> {D, F}?

Goal: Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).
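The four questions above can be checked directly (a sketch; sup() treats a string like "ACE" as the itemset {A, C, E}):

    T = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}, {"A", "D", "F"}]

    def sup(s):
        return sum(set(s) <= t for t in T) / len(T)

    print(sup("ACE"))              # 0.0  (E never occurs with A)
    print(sup("ADF"))              # 0.2  (only transaction 1005)
    print(sup("ADF") / sup("AD"))  # confidence {A, D} ==> {F} = 0.5
    print(sup("ADF") / sup("A"))   # confidence {A} ==> {D, F} = 0.25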

SLIDE 15

Example

  • Transaction data:
    t1: Beef, Chicken, Milk
    t2: Beef, Cheese
    t3: Cheese, Boots
    t4: Beef, Chicken, Cheese
    t5: Beef, Chicken, Clothes, Cheese, Milk
    t6: Chicken, Clothes, Milk
    t7: Chicken, Milk, Clothes
  • Assume: minsup = 30%, minconf = 80%
  • An example frequent itemset: {Chicken, Clothes, Milk} [sup = 3/7]
  • Rules from an itemset are partitions of its items
  • Association rules from the above itemset:
    Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
    Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]

SLIDE 16

Mining Association Rules

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of rules:
{Milk, Diaper} → {Beer} (s = 0.4, c = 0.67)
{Milk, Beer} → {Diaper} (s = 0.4, c = 1.0)
{Diaper, Beer} → {Milk} (s = 0.4, c = 0.67)
{Beer} → {Milk, Diaper} (s = 0.4, c = 0.67)
{Diaper} → {Milk, Beer} (s = 0.4, c = 0.5)
{Milk} → {Diaper, Beer} (s = 0.4, c = 0.5)

Observations:
  • All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
  • Rules originating from the same itemset have identical support (by definition) but may have different confidence values
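These six rules are exactly the binary partitions of the itemset, so they can be enumerated mechanically. A sketch reusing the support() and confidence() helpers (and the transactions list) from the Slide 11 example:

    from itertools import combinations

    X = {"Milk", "Diaper", "Beer"}
    for r in range(1, len(X)):
        for lhs in combinations(sorted(X), r):
            rhs = X - set(lhs)
            s = support(X, transactions)                 # identical for all six rules
            c = confidence(set(lhs), rhs, transactions)  # varies by partition
            print(set(lhs), "->", rhs, "s=%.1f c=%.2f" % (s, c))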

SLIDE 17

Drawback of Confidence

            Coffee   not Coffee   sum (row)
Tea         15       5            20
not Tea     75       5            80
sum (col.)  90       10           100

Association rule: Tea → Coffee
Confidence = P(Coffee | Tea) = 15/20 = 0.75, but P(Coffee) = 0.9
Although the confidence is high, the rule is misleading: P(Coffee | not Tea) = 75/80 = 0.9375

SLIDE 18

Mining Association Rules

  • Two-step approach:
    1. Frequent itemset generation: generate all itemsets whose support ≥ minsup
    2. Rule generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
  • Frequent itemset generation is still computationally expensive

SLIDE 19

Transaction data representation

  • A simplistic view of “shopping baskets”
  • Some important information is not considered:
    – the quantity of each item purchased
    – the price paid

SLIDE 20

Many mining algorithms

  • There are a large number of them
  • They use different strategies and data structures
  • Their resulting sets of rules are all the same
    – Given a transaction data set T, a minimum support, and a minimum confidence, the set of association rules existing in T is uniquely determined
  • Any algorithm should find the same set of rules, although their computational efficiencies and memory requirements may differ
  • We study only one: the Apriori algorithm

SLIDE 21

The Apriori algorithm

  • The best-known algorithm
  • Two steps:
    – Find all itemsets that have minimum support (frequent itemsets, also called large itemsets)
    – Use frequent itemsets to generate rules
  • E.g., a frequent itemset
    {Chicken, Clothes, Milk} [sup = 3/7]
    and one rule from the frequent itemset
    Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]

SLIDE 22

Step 1: Mining all Frequent Itemsets

  • A frequent itemset is an itemset whose support is ≥ minsup
  • Key idea: the Apriori property (downward closure property): any subset of a frequent itemset is also a frequent itemset

(Itemset lattice over {A, B, C, D}: A, B, C, D; AB, AC, AD, BC, BD, CD; ABC, ABD, ACD, BCD)

SLIDE 23

Steps in Association Rule Discovery

  • Find frequent itemsets
    – Itemsets with at least minimum support
    – Support is “downward closed”, so a subset of a frequent itemset must be frequent:
      – if {A, B} is a frequent itemset, both {A} and {B} are frequent itemsets
      – if an itemset does not satisfy minimum support, none of its supersets will either (this is the key point that allows pruning of the search space)
    – Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
  • Use the frequent itemsets to generate association rules
    – Generate all binary partitions, but they may have to fit a template
    – E.g., only one item on the right side, or only two items on the left side

SLIDE 24

Frequent Itemset Generation

(Itemset lattice over {A, B, C, D, E}, from the null set up to ABCDE.)

Given d items, there are 2^d possible candidate itemsets.

SLIDE 25

Mining Association Rules—An Example

Transaction ID  Items Bought
2000            A, B, C
1000            A, C
4000            A, D
5000            B, E, F

  • Min. support 50%
  • Min. confidence 50%
(The user specifies these.)

Frequent Itemset  Support
{A}               75%
{B}               50%
{C}               50%
{A, C}            50%

For rule A → C:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.6%

The Apriori principle: any subset of a frequent itemset must be frequent.

SLIDE 26

Illustrating the Apriori Principle

(Itemset lattice over {A, B, C, D, E}, shown twice: in the first copy an itemset is found to be infrequent; in the second, all of its supersets are pruned.)

SLIDE 27

The Apriori Algorithm

  • Terminology:
    – Ck is the set of candidate k-itemsets
    – Lk is the set of frequent k-itemsets
  • Join step: Ck is generated by joining two elements from Lk-1
    – There must be a lot of overlap for the join to increase the length by only 1
  • Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
    – This is a bit confusing since we use it the other way: we prune a candidate k-itemset if any of its (k-1)-subsets is not in our list of frequent (k-1)-itemsets
  • To utilize this, you simply start with k = 1 (single-item itemsets) and then work your way up from there!

SLIDE 28

The Algorithm

  • Iterative algorithm (also called level-wise search): find all 1-item frequent itemsets, then all 2-item frequent itemsets, and so on
    – In each iteration k, only consider itemsets that contain some frequent (k-1)-itemset
  • Find frequent itemsets of size 1: F1
  • For k = 2 onward:
    – Ck = candidates of size k: those itemsets of size k that could be frequent, given Fk-1
    – Fk = those candidates that are actually frequent, Fk ⊆ Ck (requires one scan of the database)
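A self-contained sketch of this level-wise loop in Python (my rendering, not the lecture's code; minsup is taken here as an absolute count):

    from itertools import combinations
    from collections import Counter

    def apriori(transactions, minsup_count):
        """Return {frozenset: support count} for all frequent itemsets."""
        transactions = [frozenset(t) for t in transactions]
        # F1: frequent 1-itemsets
        counts = Counter(frozenset([i]) for t in transactions for i in t)
        Fk = {s for s, n in counts.items() if n >= minsup_count}
        frequent = {s: counts[s] for s in Fk}
        k = 2
        while Fk:
            # Ck: union of two frequent (k-1)-itemsets that yields a k-itemset,
            # kept only if all of its (k-1)-subsets are frequent (Apriori property)
            Ck = {a | b for a in Fk for b in Fk
                  if len(a | b) == k
                  and all(frozenset(s) in Fk for s in combinations(a | b, k - 1))}
            # one scan of the database per level to count the candidates
            counts = Counter(c for t in transactions for c in Ck if c <= t)
            Fk = {s for s, n in counts.items() if n >= minsup_count}
            frequent.update({s: counts[s] for s in Fk})
            k += 1
        return frequent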

SLIDE 29

Apriori Candidate Generation

  • The candidate-gen function takes Lk-1 and returns a superset (called the candidates) of the set of all frequent k-itemsets
  • There are two steps:
    – Join step: generate all possible candidate itemsets Ck of length k
    – Prune step: remove those candidates in Ck that cannot be frequent

SLIDE 30

How to Generate Candidates?

  • Suppose the items in Lk-1 are listed in an order
  • Step 1: self-joining Lk-1
    – The description below is a bit confusing – all we do is splice two sets together so that only one new item is added (see next slide)

    insert into Ck
    select p.item1, p.item2, …, p.itemk-1, q.itemk-1
    from Lk-1 p, Lk-1 q
    where p.item1 = q.item1, …, p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1

  • Step 2: pruning

    forall itemsets c in Ck do
      forall (k-1)-subsets s of c do
        if (s is not in Lk-1) then delete c from Ck
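The same join and prune steps in Python (a sketch; like the pseudocode, it assumes the items within each itemset are kept sorted, so two itemsets join only when they agree everywhere except the last item):

    from itertools import combinations

    def candidate_gen(Lk_1, k):
        """Lk_1: set of frozensets of size k-1; returns the candidate k-itemsets."""
        prev = sorted(sorted(s) for s in Lk_1)  # itemsets as sorted lists
        Ck = set()
        for p in prev:
            for q in prev:
                # join step: same first k-2 items, and p's last item < q's last item
                if p[:-1] == q[:-1] and p[-1] < q[-1]:
                    c = frozenset(p) | {q[-1]}
                    # prune step: every (k-1)-subset of c must be in Lk-1
                    if all(frozenset(s) in Lk_1 for s in combinations(c, k - 1)):
                        Ck.add(c)
        return Ck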

SLIDE 31

Self-Joining Step

  • All items in the itemsets to be self-joined are in a consistent order – any order, such as lexicographic (alphabetical) order
  • Two itemsets can be joined only if they differ in the last position
  • When you join them, the size of the itemset goes up by one
  • See the example on the next slide

SLIDE 32

Example of Generating Candidates (1)

  • L3 = {abc, abd, acd, ace, bcd}
  • Self-joining: L3 * L3
    – abc and abd yields abcd
    – acd and ace yields acde
    – We do not join abd and acd
      – even though it would give abcd, which is a candidate
      – if the product were a candidate, it would have already been generated, given the ordering
      – this may not be obvious at first glance

SLIDE 33

Example of Generating Candidates (2)

  • Note that for abcd to be frequent, by the Apriori property abc, bcd, and abd must be frequent
  • abc and abd are alphabetically before bcd
  • So if we see abc and bcd, we do not need to generate abcd, because if abd were frequent, abcd would have already been generated
    – If abd is not there, then abcd would be pruned later

SLIDE 34

Example of Generating Candidates (3)

  • Given the joined candidates abcd and acde, we go to the pruning phase
    – acde is removed because ade is not in L3
    – the merge step does not ensure all subsets are frequent
  • C4 = {abcd}
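Running the candidate_gen sketch from Slide 30 on this example reproduces the result:

    L3 = {frozenset(s) for s in ["abc", "abd", "acd", "ace", "bcd"]}
    print(candidate_gen(L3, k=4))
    # {frozenset({'a', 'b', 'c', 'd'})}: acde is joined but then pruned (ade is not in L3)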

SLIDE 35

The Apriori Algorithm — Example (minsup = 30%)

Database D:

TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5

Scan D → candidate 1-itemsets C1 with counts:

itemset  sup.
{1}      2
{2}      3
{3}      3
{4}      1
{5}      3

L1 (30% of 4 transactions means a count of at least 2, so {4} is dropped):

itemset  sup.
{1}      2
{2}      3
{3}      3
{5}      3

C2 = L1 joined with L1: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}

Scan D → C2 with counts:

itemset  sup
{1 2}    1
{1 3}    2
{1 5}    1
{2 3}    2
{2 5}    3
{3 5}    2

L2:

itemset  sup
{1 3}    2
{2 3}    2
{2 5}    3
{3 5}    2

C3 = {2 3 5}; scan D → L3:

itemset  sup
{2 3 5}  2
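The whole trace can be reproduced with the apriori sketch from Slide 28 (minsup = 30% of 4 transactions, i.e., a count of at least 2):

    D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
    freq = apriori(D, minsup_count=2)
    for itemset in sorted(freq, key=lambda s: (len(s), sorted(s))):
        print(sorted(itemset), freq[itemset])
    # [1] 2, [2] 3, [3] 3, [5] 3, [1, 3] 2, [2, 3] 2, [2, 5] 3, [3, 5] 2, [2, 3, 5] 2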

SLIDE 36

Warning: Do Not Forget Pruning

  • Candidates get pruned in two ways
    – The Apriori property is violated
    – If the Apriori property is not violated, you must still scan the database, and if minsup is not met, then prune
      – the Apriori property is necessary but not sufficient to keep a candidate
    – If you forget to prune via the Apriori property, you will get the same results, since the scan will catch it
      – but I will take off points on an exam: make it clear when you prune using the Apriori property (do not fill in a count when crossing a candidate off)
  • The Apriori property cannot be violated until k = 3; things begin to get trickier at k = 4, since there are more subsets to check

SLIDE 37

Step 2: Rules from Frequent Itemsets

  • Frequent itemsets → association rules
  • One more step is needed
  • For each frequent itemset X, for each proper nonempty subset A of X:
    – Let B = X - A
    – A → B is an association rule if confidence(A → B) ≥ minconf, where
      support(A → B) = support(A ∪ B) = support(X)
      confidence(A → B) = support(A ∪ B) / support(A)
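A sketch of this rule-generation step, assuming the mining phase returned a dict mapping each frequent itemset to its support count (as the Slide 28 sketch does):

    from itertools import combinations

    def gen_rules(frequent, minconf):
        """Yield (A, B, confidence) for every rule A -> B meeting minconf."""
        for X, supX in frequent.items():
            if len(X) < 2:
                continue
            for r in range(1, len(X)):                # proper nonempty subsets A
                for A in map(frozenset, combinations(X, r)):
                    conf = supX / frequent[A]         # support(X) / support(A)
                    if conf >= minconf:
                        yield A, X - A, conf

No database scan is needed here: by the Apriori property every subset A of a frequent X is itself frequent, so its support count is already recorded in the dict.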

SLIDE 38

Generating Rules: an Example

  • Suppose {2, 3, 4} is frequent, with sup = 50%
    – Proper nonempty subsets: {2,3}, {2,4}, {3,4}, {2}, {3}, {4}, with sup = 50%, 50%, 75%, 75%, 75%, 75% respectively
    – These generate the following association rules, all with support = 50%:
      – 2,3 → 4, confidence = 100%
      – 2,4 → 3, confidence = 100%
      – 3,4 → 2, confidence = 67%
      – 2 → 3,4, confidence = 67%
      – 3 → 2,4, confidence = 67%
      – 4 → 2,3, confidence = 67%
    – Then apply the confidence threshold to identify strong rules (rules that meet the support and confidence requirements)
    – If the confidence threshold is 80%, we are left with 2 strong rules

SLIDE 39

Generating Rules: Summary

  • To recap, in order to obtain A → B, we need support(A ∪ B) and support(A)
  • All the required information for the confidence computation has already been recorded during itemset generation; there is no need to see the data T any more
  • This step is not as time-consuming as frequent itemset generation
    – Hint: I almost always ask this on the exam

SLIDE 40

On Apriori Algorithm

Seems to be very expensive, but:

  • Level-wise search
  • K = the size of the largest itemset
  • It makes at most K passes over the data
  • In practice, K is bounded (around 10)
  • The algorithm is very fast; under some conditions, all rules can be found in linear time
  • It scales up to large data sets

SLIDE 41

Granularity of items

  • One exception to the “ease” of applying association rules is selecting the granularity of the items
  • Should you choose:
    – diet coke?
    – coke product?
    – soft drink?
    – beverage?
  • Should you include more than one level of granularity?
    – Some association-finding techniques allow you to represent hierarchies explicitly

SLIDE 42

Multiple-Level Association Rules

  • Items often form a hierarchy
    – Items at the lower level are expected to have lower support
    – Rules regarding itemsets at appropriate levels could be quite useful
    – A transaction database can be encoded based on dimensions and levels

(Hierarchy: Food → {Milk, Bread}; Milk → {Skim, 2%}; Bread → {Wheat, White})

SLIDE 43

Mining Multi-Level Associations

  • A top-down, progressive-deepening approach
    – First find high-level strong rules:
      milk → bread [20%, 60%]
    – Then find their lower-level “weaker” rules:
      2% milk → wheat bread [6%, 50%]
    – Usually requires different thresholds at different levels to find meaningful rules
      – lower support at lower levels

SLIDE 44

Interestingness Measurements

  • Objective measures
    – Two popular measurements: support and confidence
  • Subjective measures (Silberschatz & Tuzhilin, KDD95): a rule (pattern) is interesting if
    – it is unexpected (surprising to the user); and/or
    – it is actionable (the user can do something with it)

SLIDE 45

Criticism to Support and Confidence

  • Example 1: among 5000 students
    – 3000 play basketball
    – 3750 eat cereal
    – 2000 both play basketball and eat cereal

             basketball   not basketball   sum (row)
cereal       2000         1750             3750
not cereal   1000         250              1250
sum (col.)   3000         2000             5000

  • play basketball → eat cereal [40%, 66.7%] is misleading, because the overall percentage of students eating cereal is 75%, which is higher than 66.7%
  • play basketball → not eat cereal [20%, 33.3%] is far more interesting, although it has lower support and confidence
  • Lift of A => B = P(B|A) / P(B); a rule is interesting if its lift is not near 1.0
    – What is the lift of this second rule? (1/3) / (1250/5000) = 1.33
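Both lifts can be checked numerically straight from the contingency table (a quick sketch):

    n, basketball, cereal, both = 5000, 3000, 3750, 2000

    conf_cereal = both / basketball            # P(cereal | basketball) = 0.667
    print(conf_cereal / (cereal / n))          # lift = 0.667 / 0.75 = 0.89 (< 1)

    conf_no_cereal = (basketball - both) / basketball   # 1000/3000 = 0.333
    print(conf_no_cereal / ((n - cereal) / n))          # lift = 0.333 / 0.25 = 1.33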

SLIDE 46

Customer Number vs. Transaction ID

  • In the homework you may have a problem where there is a customer id for each transaction
    – You can be asked to do association analysis based on the customer id
    – If so, you need to aggregate the transactions to the customer level
    – If a customer has 3 transactions, then you just create an itemset containing all of the items in the union of the 3 transactions
    – Note that we will ignore the frequency of purchase
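A sketch of that aggregation (the rows here are hypothetical (customer id, items) pairs):

    from collections import defaultdict

    rows = [("c1", {"Bread", "Milk"}), ("c1", {"Beer"}), ("c2", {"Diaper"})]

    baskets = defaultdict(set)
    for customer_id, items in rows:
        baskets[customer_id] |= items  # set union ignores purchase frequency, as noted

    transactions = list(baskets.values())
    # c1 -> {Bread, Milk, Beer}; c2 -> {Diaper}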

SLIDE 47

Virtual items

  • If you’re interested in including other possible variables, you can create “virtual items”
    – gift-wrap, used-coupon, new-store, winter-holidays, bought-nothing, …

SLIDE 48

Associations: Pros and Cons

  • Pros
    – can quickly mine patterns describing business/customers/etc. without major effort in problem formulation
    – virtual items allow much flexibility
    – unparalleled tool for hypothesis generation
  • Cons
    – unfocused
      – not clear exactly how to apply mined “knowledge”
      – only hypothesis generation
    – can produce many, many rules!
      – there may be only a few nuggets among them (or none)

SLIDE 49

Association Rules

  • Association rule types:
    – Actionable rules: contain high-quality, actionable information
    – Trivial rules: information already well known by those familiar with the business
    – Inexplicable rules: no explanation and do not suggest action
  • Trivial and inexplicable rules occur most often