
1

Mining Association Rules

2

Mining Association Rules

  • What is Association rule mining
  • Apriori Algorithm
  • Additional Measures of rule interestingness
  • Advanced Techniques

3

What Is Association Rule Mining?

  • Association rule mining: finding frequent patterns, associations, correlations, or causal structures among sets of items in transaction databases
  • Goal: understand customer buying habits by finding associations and correlations between the different items that customers place in their “shopping basket”
  • Applications: basket data analysis, cross-marketing, catalog design, loss-leader analysis, web log analysis, fraud detection (supervisor → examiner)

4

What Is Association Rule Mining?

Rule form: Antecedent → Consequent [support, confidence]
(support and confidence are user-defined measures of interestingness)

Examples:

buys(x, “computer”) → buys(x, “financial management software”) [0.5%, 60%]

age(x, “30..39”) ∧ income(x, “42..48K”) → buys(x, “car”) [1%, 75%]


5

How can Association Rules be used?

Probably mom was calling dad at work to buy diapers on the way home, and he decided to buy a six-pack as well. The retailer could move diapers and beer to separate places and position high-profit items of interest to young fathers along the path.

6

How can Association Rules be used?

Let the rule discovered be {Bagels, ...} → {Potato Chips}

  • Potato chips as consequent: can be used to determine what should be done to boost their sales
  • Bagels in the antecedent: can be used to see which products would be affected if the store discontinues selling bagels
  • Bagels in the antecedent and potato chips in the consequent: can be used to see what products should be sold with bagels to promote the sale of potato chips

7

Association Rule: Basic Concepts

Given: (1) a database of transactions; (2) each transaction is a list of items purchased by a customer in a visit

Find: all rules that correlate the presence of one set of items (itemset) with that of another set of items

E.g., 98% of people who purchase tires and auto accessories also get automotive services done

8

Rule Basic Measures: Support and Confidence

A ⇒ B [s, c]

Support: denotes the frequency of the rule within transactions. A high value means that the rule involves a great part of the database.

support(A ⇒ B [s, c]) = p(A ∪ B)

Confidence: denotes the percentage of transactions containing A which also contain B. It is an estimate of the conditional probability.

confidence(A ⇒ B [s, c]) = p(B|A) = sup(A, B) / sup(A)
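To make the two measures concrete, here is a minimal Python sketch (an illustration added here, not part of the original deck; the function names are ours). Transactions are modelled as sets of items, and the printed values match the example on the next slides.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of p(B|A) = sup(A, B) / sup(A)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# The four-transaction example used on the next slides:
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
print(support({"A", "C"}, transactions))       # 0.5 -> 50% support for A => C
print(confidence({"A"}, {"C"}, transactions))  # 0.666... -> 66.6% confidence
```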


9

Example

Trans. Id   Purchased Items
1           A, D
2           A, C
3           A, B, C
4           B, E, F

Itemset: A,B or B,E,F

Support of an itemset: Sup(A,B) = 1; Sup(A,C) = 2

Frequent pattern: given min. sup = 2, {A,C} is a frequent pattern

For minimum support = 50% and minimum confidence = 50%, we have the following rules:
A ⇒ C with 50% support and 66% confidence
C ⇒ A with 50% support and 100% confidence

10

Mining Association Rules

  • What is Association rule mining
  • Apriori Algorithm
  • Additional Measures of rule interestingness
  • Advanced Techniques

11

Boolean association rules

Each transaction is represented by a Boolean vector.

12

Mining Association Rules - An Example

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

  • Min. support 50%
  • Min. confidence 50%

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A,C}              50%

For rule A ⇒ C:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.6%

13

The Apriori principle

Any subset of a frequent itemset must be frequent.

A transaction containing {beer, diaper, nuts} also contains {beer, diaper}: if {beer, diaper, nuts} is frequent, {beer, diaper} must also be frequent.

14

Apriori principle

No superset of any infrequent itemset should be generated or tested; many item combinations can be pruned.

15

Itemset Lattice

[Figure: the lattice of all itemsets over {A, B, C, D, E}, from null through the pairs, triples, and quadruples up to ABCDE.]

16

Apriori principle for pruning candidates

If an itemset is infrequent, then all of its supersets must also be infrequent.

[Figure: the same itemset lattice; once an itemset is found to be infrequent, all of its supersets are pruned.]


17

Mining Frequent Itemsets (the Key Step)

  • Find the frequent itemsets: the sets of items that have minimum support
  • A subset of a frequent itemset must also be a frequent itemset
  • Generate length-(k+1) candidate itemsets from length-k frequent itemsets, and test the candidates against the DB to determine which are in fact frequent
  • Use the frequent itemsets to generate association rules: generation is straightforward

18

The Apriori Algorithm — Example

  • Min. support: 2 transactions

Database D:
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Scan D → C1:          L1:
itemset   sup.        itemset   sup.
{1}       2           {1}       2
{2}       3           {2}       3
{3}       3           {3}       3
{4}       1           {5}       3
{5}       3

C2: {1 2} {1 3} {1 5} {2 3} {2 5} {3 5}

Scan D → C2:          L2:
itemset   sup         itemset   sup
{1 2}     1           {1 3}     2
{1 3}     2           {2 3}     2
{1 5}     1           {2 5}     3
{2 3}     2           {3 5}     2
{2 5}     3
{3 5}     2

C3: {2 3 5}; Scan D → L3: {2 3 5} with sup 2

19

How to Generate Candidates?

The items in Lk-1 are listed in an order.

Step 1: self-joining Lk-1

insert into Ck
select p.item1, p.item2, ..., p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1 q
where p.item1 = q.item1, ..., p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
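The SQL-style join above maps directly to code. A minimal Python sketch (ours, for illustration), assuming, as the slide does, that every itemset is kept as a sorted tuple, so two (k-1)-itemsets join when they agree on their first k-2 items:

```python
def self_join(L_prev):
    """Join step: combine (k-1)-itemsets that agree on all but the last item."""
    candidates = set()
    for p in L_prev:
        for q in L_prev:
            # p.item1=q.item1, ..., p.item(k-2)=q.item(k-2), p.item(k-1) < q.item(k-1)
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                candidates.add(p + (q[-1],))
    return candidates

# L3 = {abc, abd, acd, ace, bcd} from the example two slides ahead:
L3 = {("a","b","c"), ("a","b","d"), ("a","c","d"), ("a","c","e"), ("b","c","d")}
print(sorted(self_join(L3)))  # [('a','b','c','d'), ('a','c','d','e')]
```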

20

How to Generate Candidates?

Step 2: pruning

for all itemsets c in Ck do
    for all (k-1)-subsets s of c do
        if (s is not in Lk-1) then delete c from Ck
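And a matching sketch of the prune step (again ours), using the same sorted-tuple representation; applied to the join result above it removes acde, exactly as in the example on the next slide:

```python
from itertools import combinations

def prune(candidates, L_prev):
    """Prune step: keep a candidate only if all its (k-1)-subsets are frequent."""
    return {c for c in candidates
            if all(s in L_prev for s in combinations(c, len(c) - 1))}

L3 = {("a","b","c"), ("a","b","d"), ("a","c","d"), ("a","c","e"), ("b","c","d")}
C4 = prune({("a","b","c","d"), ("a","c","d","e")}, L3)
print(C4)  # {('a','b','c','d')} -- acde dropped because ade is not in L3
```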


21

Example of Generating Candidates

L3 = {abc, abd, acd, ace, bcd}

Self-joining: L3 * L3
    abcd from abc and abd
    acde from acd and ace

Pruning (before counting its support):
    acde is removed because ade is not in L3

C4 = {abcd}

22

The Apriori Algorithm

  • Ck: candidate itemset of size k; Lk: frequent itemset of size k
  • Join step: Ck is generated by joining Lk-1 with itself
  • Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
  • Algorithm:

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return L = ∪k Lk;
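The pseudocode above fits in a short self-contained Python sketch (illustrative, not an optimized implementation; itemsets are sorted tuples as before). Run on the example database D from slide 18 with min. support 2, it reproduces L1, L2 and L3:

```python
from collections import defaultdict
from itertools import combinations

def apriori(transactions, min_support_count):
    # L1 = {frequent items}
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[(item,)] += 1
    L = {c: n for c, n in counts.items() if n >= min_support_count}
    frequent = dict(L)
    while L:
        # Ck+1: self-join Lk, then prune by the Apriori principle
        keys = sorted(L)
        candidates = {p + (q[-1],) for p in keys for q in keys
                      if p[:-1] == q[:-1] and p[-1] < q[-1]}
        candidates = {c for c in candidates
                      if all(s in L for s in combinations(c, len(c) - 1))}
        # increment the count of all candidates contained in each transaction
        counts = defaultdict(int)
        for t in transactions:
            for c in candidates:
                if set(c) <= t:
                    counts[c] += 1
        # Lk+1 = candidates with min_support
        L = {c: n for c, n in counts.items() if n >= min_support_count}
        frequent.update(L)
    return frequent

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(apriori(D, 2))  # includes (1,3):2, (2,3):2, (2,5):3, (3,5):2, (2,3,5):2
```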

23

How to Count Supports of Candidates?

  • Why is counting supports of candidates a problem?
  • The total number of candidates can be very huge
  • One transaction may contain many candidates
  • Method:
  • Candidate itemsets are stored in a hash-tree
  • Leaf node of hash-tree contains a list of itemsets and counts
  • Interior node contains a hash table
  • Subset function: finds all the candidates contained in a transaction
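A real hash-tree is more than a slide's worth of code, but a flat hash table already illustrates what the subset function buys: instead of testing every candidate against every transaction, enumerate each transaction's k-subsets and probe the table. A simplified sketch (ours; the dict stands in for the hash-tree):

```python
from itertools import combinations

def count_supports(transactions, candidates, k):
    """Count k-candidates via subset enumeration plus hash lookups."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for s in combinations(sorted(t), k):  # all k-subsets of the transaction
            if s in counts:
                counts[s] += 1
    return counts

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
C2 = {(1, 2), (1, 3), (1, 5), (2, 3), (2, 5), (3, 5)}
print(count_supports(D, C2, 2))  # {(2,5): 3, (1,3): 2, (2,3): 2, (3,5): 2, ...}
```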

24

Generating AR from frequent itemsets

Confidence(A ⇒ B) = P(B|A) = support_count({A, B}) / support_count({A})

For every frequent itemset x, generate all non-empty subsets of x.

For every non-empty subset s of x, output the rule “s ⇒ (x − s)” if

support_count({x}) / support_count({s}) ≥ min_conf
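A minimal sketch of this generation step (ours; the function name and table layout are illustrative), assuming the support counts have already been produced by Apriori. With the {B,C,E} itemset from the example two slides ahead and min_conf = 75%, it emits the six rules shown there:

```python
from itertools import combinations

def rules_from_itemset(x, support_count, min_conf):
    """Output s => (x - s) for every non-empty proper subset s of x
    whenever support_count(x) / support_count(s) >= min_conf."""
    x = frozenset(x)
    for r in range(1, len(x)):
        for s in combinations(sorted(x), r):
            s = frozenset(s)
            conf = support_count[x] / support_count[s]
            if conf >= min_conf:
                print(set(s), "=>", set(x - s), f"{conf:.0%}")

# Support counts for {B,C,E} and its subsets (from the 5-transaction example):
sup = {frozenset(k): v for k, v in {("B",): 4, ("C",): 4, ("E",): 4,
       ("B", "C"): 3, ("B", "E"): 4, ("C", "E"): 3, ("B", "C", "E"): 3}.items()}
rules_from_itemset({"B", "C", "E"}, sup, 0.75)
```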


25

From Frequent Itemsets to Association Rules

Q: Given frequent set {A,B,E}, what are possible association rules?

  • A => B, E
  • A, B => E
  • A, E => B
  • B => A, E
  • B, E => A
  • E => A, B
  • __ => A, B, E (empty rule), or true => A, B, E

26

Generating Rules: example

Trans-ID   Items
1          A C D
2          B C E
3          A B C E
4          B E
5          A B C E

Min_support: 60%; Min_confidence: 75%

Frequent Itemset       Support
{B,C,E}, {A,C}         60%
{B,C}, {C,E}, {A}      60%
{B,E}, {B}, {C}, {E}   80%

Rule           Conf.
{B,C} => {E}   100%
{B,E} => {C}   75%
{C,E} => {B}   100%
{B} => {C,E}   75%
{C} => {B,E}   75%
{E} => {B,C}   75%

27

Exercise

TID   Items
1     Bread, Milk, Chips, Mustard
2     Beer, Diaper, Bread, Eggs
3     Beer, Coke, Diaper, Milk
4     Beer, Bread, Diaper, Milk, Chips
5     Coke, Bread, Diaper, Milk
6     Beer, Bread, Diaper, Milk, Mustard
7     Coke, Bread, Diaper, Milk

Boolean format (one column per item, 1 if the transaction contains it):

TID   Bread  Milk  Chips  Mustard  Beer  Diaper  Eggs  Coke
1     1      1     1      1        0     0       0     0
2     1      0     0      0        1     1       1     0
3     0      1     0      0        1     1       0     1
4     1      1     1      0        1     1       0     0
5     1      1     0      0        0     1       0     1
6     1      1     0      1        1     1       0     0
7     1      1     0      0        0     1       0     1

Convert the data to Boolean format and, for a support threshold of 40%, apply the Apriori algorithm.

28

Minimum support count: 0.4 × 7 = 2.8, i.e., an itemset is frequent if it appears in at least 3 transactions.

C1                 L1
Bread     6        Bread    6
Milk      6        Milk     6
Chips     2        Beer     4
Mustard   2        Diaper   6
Beer      4        Coke     3
Diaper    6
Eggs      1
Coke      3

C2                      L2
Bread,Milk      5       Bread,Milk     5
Bread,Beer      3       Bread,Beer     3
Bread,Diaper    5       Bread,Diaper   5
Bread,Coke      2       Milk,Beer      3
Milk,Beer       3       Milk,Diaper    5
Milk,Diaper     5       Milk,Coke      3
Milk,Coke       3       Beer,Diaper    4
Beer,Diaper     4       Diaper,Coke    3
Beer,Coke       1
Diaper,Coke     3

C3                                L3
Bread,Milk,Beer     2             Bread,Milk,Diaper   4
Bread,Milk,Diaper   4             Bread,Beer,Diaper   3
Bread,Beer,Diaper   3             Milk,Beer,Diaper    3
Milk,Beer,Diaper    3             Milk,Diaper,Coke    3
Milk,Beer,Coke      (pruned: Beer,Coke ∉ L2)
Milk,Diaper,Coke    3

Brute-force enumeration of all 1-, 2-, and 3-itemsets over the 8 items would require 8 + C(8,2) + C(8,3) = 8 + 28 + 56 = 92 candidates, whereas Apriori counted only 8 + 10 + 6 = 24.
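The tables above can be double-checked by brute force, which is still feasible at this size (a sketch of ours; note this deliberately is not Apriori, since it counts every occurring k-subset directly instead of generating candidates):

```python
from collections import Counter
from itertools import combinations

T = [{"Bread", "Milk", "Chips", "Mustard"},
     {"Beer", "Diaper", "Bread", "Eggs"},
     {"Beer", "Coke", "Diaper", "Milk"},
     {"Beer", "Bread", "Diaper", "Milk", "Chips"},
     {"Coke", "Bread", "Diaper", "Milk"},
     {"Beer", "Bread", "Diaper", "Milk", "Mustard"},
     {"Coke", "Bread", "Diaper", "Milk"}]

# min support 40% of 7 transactions = 2.8, i.e. a count of at least 3
for k in (1, 2, 3):
    counts = Counter(s for t in T for s in combinations(sorted(t), k))
    print(f"L{k}:", {s: n for s, n in counts.items() if n >= 3})
```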


29

Challenges of Frequent Pattern Mining

  • Challenges
  • Multiple scans of transaction database
  • Huge number of candidates
  • Tedious workload of support counting for candidates
  • Improving Apriori: general ideas
  • Reduce number of transaction database scans
  • Shrink number of candidates
  • Facilitate support counting of candidates

30

Improving Apriori’s Efficiency

Problem with Apriori: every pass goes over the whole data.

AprioriTID: generates candidates as Apriori does, but the DB is used for counting support only on the first pass.
  • Needs much more memory than Apriori
  • Builds a storage set C^k that stores in memory the frequent sets per transaction

AprioriHybrid: use Apriori in the initial passes; estimate the size of C^k; switch to AprioriTID when C^k is expected to fit in memory.
  • The switch takes time, but it is still better in most cases

31

AprioriTID example:

Database:                 C^1:
TID   Items               TID   Set-of-itemsets
100   1 3 4               100   { {1}, {3}, {4} }
200   2 3 5               200   { {2}, {3}, {5} }
300   1 2 3 5             300   { {1}, {2}, {3}, {5} }
400   2 5                 400   { {2}, {5} }

L1:                       C2: {1 2} {1 3} {1 5} {2 3} {2 5} {3 5}
Itemset   Support
{1}       2
{2}       3
{3}       3
{5}       3

C^2:                      L2:
TID   Set-of-itemsets                                 Itemset   Support
100   { {1 3} }                                       {1 3}     2
200   { {2 3}, {2 5}, {3 5} }                         {2 3}     3
300   { {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5} }    {2 5}     3
400   { {2 5} }                                       {3 5}     2

C3: {2 3 5}

C^3:                      L3:
TID   Set-of-itemsets     Itemset   Support
200   { {2 3 5} }         {2 3 5}   2
300   { {2 3 5} }

32

Improving Apriori’s Efficiency

Transaction reduction: a transaction that does not contain any frequent k-itemset is useless in subsequent scans.

Sampling: mining on a subset of the given data.
  • The sample should fit in memory
  • Use a lower support threshold to reduce the probability of missing some itemsets
  • The rest of the DB is used to determine the actual itemset counts


33

Improving Apriori’s Efficiency

  • Partitioning: any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB (2 DB scans)
  • (the support threshold in a partition is lowered in proportion to the number of elements in the partition)

Phase I: divide D into n non-overlapping partitions; find the frequent itemsets local to each partition (parallel algorithm).

Phase II: combine the results to form a global set of candidate itemsets; find the global frequent itemsets among the candidates.

34

Improving Apriori’s Efficiency

  • Dynamic itemset counting (DIC): partitions the DB into several blocks, each marked by a start point.
  • At each start point, DIC estimates the support of all itemsets that are currently counted and adds new itemsets to the set of candidate itemsets if all their subsets are estimated to be frequent.
  • If DIC adds all frequent itemsets to the set of candidate itemsets during the first scan, it will have counted each itemset’s exact support at some point during the second scan; thus DIC can complete in two scans.

35

Comment

Traditional methods such as database queries support hypothesis verification about a relationship, such as the co-occurrence of diapers and beer.

Data mining methods automatically discover significant association rules from data:
  • They find whatever patterns exist in the database, without the user having to specify in advance what to look for (data driven).
  • They therefore allow finding unexpected correlations.

36

Mining Association Rules

  • What is Association rule mining
  • Apriori Algorithm
  • Additional Measures of rule interestingness
  • Advanced Techniques


37

Interestingness Measurements

  • Are all of the strong association rules discovered interesting enough to present to the user?
  • How can we measure the interestingness of a rule?
  • Subjective measures: a rule (pattern) is interesting if
  • it is unexpected (surprising to the user); and/or
  • actionable (the user can do something with it)
  • (only the user can judge the interestingness of a rule)

38

Objective measures of rule interest

  • Support
  • Confidence or strength
  • Lift or Interest or Correlation
  • Conviction
  • Leverage or Piatetsky-Shapiro
  • Coverage

39

Criticism to Support and Confidence

Example 1 (Aggarwal & Yu, PODS98): among 5000 students,
  • 3000 play basketball
  • 3750 eat cereal
  • 2000 both play basketball and eat cereal

             basketball   not basketball   sum(row)
cereal       2000         1750             3750 (75%)
not cereal   1000         250              1250 (25%)
sum(col.)    3000 (60%)   2000 (40%)       5000

play basketball ⇒ eat cereal [40%, 66.7%] is misleading, because the overall percentage of students eating cereal is 75%, which is higher than 66.7%.

play basketball ⇒ not eat cereal [20%, 33.3%] is more accurate, although it has lower support and confidence.

40

Lift of a Rule

Lift (Correlation, Interest):

Lift(A → B) = sup(A, B) / (sup(A) · sup(B)) = P(B|A) / P(B)

A and B are negatively correlated if the value is less than 1; otherwise A and B are positively correlated.

X   1 1 1 1 0 0 0 0
Y   1 1 0 0 0 0 0 0
Z   0 1 1 1 1 1 1 1

rule    Support   Lift
X ⇒ Y   25%       2.00
X ⇒ Z   37.50%    0.86
Y ⇒ Z   12.50%    0.57


41

Lift of a Rule

Example 1 (cont.)

play basketball ⇒ eat cereal [40%, 66.7%]
play basketball ⇒ not eat cereal [20%, 33.3%]

             basketball   not basketball   sum(row)
cereal       2000         1750             3750
not cereal   1000         250              1250
sum(col.)    3000         2000             5000

Lift(basketball ⇒ cereal) = (2000/5000) / ((3000/5000) × (3750/5000)) = 0.89
Lift(basketball ⇒ not cereal) = (1000/5000) / ((3000/5000) × (1250/5000)) = 1.33

42

Problems With Lift

Rules that hold 100% of the time may not have the highest possible lift. For example, if 5% of people are Vietnam veterans and 90% of the people are more than 5 years old, we get a lift of 0.05 / (0.05 × 0.9) = 1.11, which is only slightly above 1, for the rule Vietnam veterans → more than 5 years old.

Also, lift is symmetric:

not eat cereal ⇒ play basketball [20%, 80%]
Lift = (1000/5000) / ((1250/5000) × (3000/5000)) = 1.33

43

Conviction of a Rule

Conv(A → B) = P(A) · P(¬B) / P(A, ¬B) = sup(A) · (1 − sup(B)) / (sup(A) − sup(A, B))

Note that A → B can be rewritten as ¬(A, ¬B).

  • Conviction is a measure of the implication and has value 1 if the items are unrelated.
  • play basketball ⇒ eat cereal [40%, 66.7%]:
    Conv = (3000/5000) · (1 − 3750/5000) / (3000/5000 − 2000/5000) = 0.75
  • eat cereal ⇒ play basketball, conv: 0.85
  • play basketball ⇒ not eat cereal [20%, 33.3%]:
    Conv = (3000/5000) · (1 − 1250/5000) / (3000/5000 − 1000/5000) = 1.125
  • not eat cereal ⇒ play basketball, conv: 1.43

44

Leverage of a Rule

Leverage or Piatetsky-Shapiro (PS):

PS(A → B) = sup(A, B) − sup(A) · sup(B)

  • It is the proportion of additional elements covered by both the premise and the consequence, above what would be expected if they were independent.


45

Coverage of a Rule

coverage(A → B) = sup(A)
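For reference, the objective measures from the last few slides as one-line Python helpers over support fractions (a sketch of ours; the argument names are illustrative), checked against the basketball/cereal example:

```python
def lift(sup_ab, sup_a, sup_b):        # > 1 means positive correlation
    return sup_ab / (sup_a * sup_b)

def conviction(sup_ab, sup_a, sup_b):  # 1 if A and B are unrelated
    return sup_a * (1 - sup_b) / (sup_a - sup_ab)

def leverage(sup_ab, sup_a, sup_b):    # Piatetsky-Shapiro
    return sup_ab - sup_a * sup_b

def coverage(sup_a):
    return sup_a

# play basketball => eat cereal: sup(A)=0.6, sup(B)=0.75, sup(A,B)=0.4
print(lift(0.4, 0.6, 0.75))        # 0.888... (negatively correlated)
print(conviction(0.4, 0.6, 0.75))  # 0.75
print(leverage(0.4, 0.6, 0.75))    # -0.05
```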

46

Association Rules Visualization

The coloured column indicates the association rule B → C. Different icon colours are used to show different metadata values of the association rule.

47

Association Rules Visualization

48

Association Rules Visualization

Size of ball equates to total support; height equates to confidence.


49

Association Rules Visualization - Ball graph

50

The Ball graph Explained

  • A ball graph consists of a set of nodes and arrows. All the nodes are yellow, green or blue. The blue nodes are active nodes representing the items in the rule in which the user is interested. The yellow nodes are passive, representing items related to the active nodes in some way. The green nodes merely assist in visualizing two or more items in either the head or the body of the rule.
  • A circular node represents a frequent (large) data item. The volume of the ball represents the support of the item. Only those items which occur sufficiently frequently are shown.
  • An arrow between two nodes represents the rule implication between the two items. An arrow will be drawn only when the support of a rule is no less than the minimum support.

51

Association Rules Visualization

52

Mining Association Rules

  • What is Association rule mining
  • Apriori Algorithm
  • FP-tree Algorithm
  • Additional Measures of rule interestingness
  • Advanced Techniques


53

Multiple-Level Association Rules

  • Fresh ⇒ Bakery [20%, 60%]
  • Dairy ⇒ Bread [6%, 50%]
  • Fruit ⇒ Bread [1%, 50%] is not valid

[Figure: item hierarchy — FoodStuff splits into Frozen, Refrigerated, Fresh, Bakery, etc.; Fresh splits into Vegetable, Fruit, Dairy, etc.; Fruit into Banana, Apple, Orange, etc.]

Items often form a hierarchy. Flexible support settings: items at the lower levels are expected to have lower support. The transaction database can be encoded based on dimensions and levels, to explore shared multi-level mining.
54

Multi-Dimensional Association Rules

Single-dimensional rules:

buys(X, “milk”) ⇒ buys(X, “bread”)

Multi-dimensional rules: ≥ 2 dimensions or predicates

Inter-dimension association rules (no repeated predicates):

age(X, ”19-25”) ∧ occupation(X, “student”) ⇒ buys(X, “coke”)

Hybrid-dimension association rules (repeated predicates):

age(X, ”19-25”) ∧ buys(X, “popcorn”) ⇒ buys(X, “coke”)

55

Quantitative Association Rules

age(X, ”30-34”) ∧ income(X, ”24K - 48K”) ⇒ buys(X, ”high resolution TV”)

Mining Sequential Patterns

10% of customers bought “Foundation” and “Ringworld” in one transaction, followed by “Ringworld Engineers” in another transaction.

56

Sequential Pattern Mining

  • Given:
  • A database of customer transactions ordered by increasing transaction time
  • Each transaction is a set of items
  • A sequence is an ordered list of itemsets
  • Example:
  • 10% of customers bought “Foundation” and “Ringworld” in one transaction, followed by “Ringworld Engineers” in another transaction.
  • 10% is called the support of the pattern (a transaction may contain more books than those in the pattern)
  • Problem:
  • Find all sequential patterns supported by more than a user-specified percentage of data sequences


61

Application Difficulties

Wal-Mart knows that customers who buy Barbie dolls (it sells one every 20 seconds) have a 60% likelihood of buying one of three types of candy bars. What does Wal-Mart do with information like that?

'I don't have a clue,' says Wal-Mart's chief of merchandising, Lee Scott.

See KDnuggets 98:01 for many ideas: www.kdnuggets.com/news/98/n01.html

62

Some Suggestions

  • By increasing the price of the Barbie doll and giving the candy bar away free, Wal-Mart can reinforce the buying habits of that particular type of buyer
  • Highest-margin candy to be placed near the dolls
  • Special promotions for Barbie dolls with candy at a slightly higher margin
  • Take a poorly selling product X and incorporate an offer on it which is based on buying Barbie and candy. If the customer is likely to buy these two products anyway, then why not try to increase sales of X?
  • They can not only bundle candy of type A with Barbie dolls, but can also introduce a new candy of type N in this bundle while offering a discount on the whole bundle. As the bundle is going to sell because of the Barbie dolls and candy of type A, candy of type N gets a free ride into customers' houses. And given that people tend to like what they see often, candy of type N can become popular.

63

References

Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, 2000

Vipin Kumar and Mahesh Joshi, “Tutorial on High Performance Data Mining”, 1999

Rakesh Agrawal and Ramakrishnan Srikant, “Fast Algorithms for Mining Association Rules”, Proc. VLDB, 1994 (http://www.cs.tau.ac.il/~fiat/dmsem03/Fast%20Algorithms%20for%20Mining%20Association%20Rules.ppt)

Alípio Jorge, “selecção de regras: medidas de interesse e meta queries” (http://www.liacc.up.pt/~amjorge/Aulas/madsad/ecd2/ecd2_Aulas_AR_3_2003.pdf)

64

Thank you!!!