Data Mining and Exploration: Association Rules

Amos Storkey, School of Informatics
February 7, 2006
http://www.inf.ed.ac.uk/teaching/courses/dme/

These lecture slides are based extensively on previous versions of the course written by Chris Williams.

Association Rules

◮ Itemsets, association rules
◮ Frequency, accuracy
◮ APRIORI algorithm
◮ Comments on Association Rules

Reading: HMS chapter 13
Additional reading: Witten and Frank § 4.5, Han and Kamber § 6.1, 6.2

About Association Rules

◮ We are looking for patterns, i.e. local regularities in the data
◮ Examples of frequent itemsets, association rules
  ◮ 10% of supermarket customers buy wine and cheese
  ◮ If a person visits the CNN website, there is a 60% chance that they will visit the ABC website in the same month
◮ Association rules are like classification rules, except that they can predict any attribute, not just the class
◮ Association rules are not intended to be used together as a set (cf. classification rules)
◮ Example of association rules: market basket analysis, the process of analyzing customer buying habits by finding associations between items that customers place in their “shopping baskets”
◮ Each row of the data matrix has a 1 if the corresponding product was in the basket. Data is often sparse
◮ Can recode k-valued categorical variables (e.g. outlook = {sunny, overcast, rainy}) as k binary variables
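The recoding of a k-valued categorical variable as k binary variables can be sketched as follows. This is a minimal illustration, not part of the original slides: the helper name `to_binary_items` and the three toy records are assumptions made for the example.

```python
# Hypothetical helper (not from the slides): recode categorical records
# as one 0/1 column per attribute=value pair.
def to_binary_items(records):
    """Return (columns, matrix) where each column is an (attribute, value) pair."""
    # Collect every attribute=value pair seen in the data, in a stable order.
    columns = sorted({(attr, val) for rec in records for attr, val in rec.items()})
    # One row per record, with a 1 where the record has that attribute=value.
    matrix = [[1 if rec.get(attr) == val else 0 for attr, val in columns]
              for rec in records]
    return columns, matrix


# Toy records, invented for illustration only.
records = [
    {"outlook": "sunny"},
    {"outlook": "overcast"},
    {"outlook": "rainy"},
]
columns, matrix = to_binary_items(records)
for (attr, val), column in zip(columns, zip(*matrix)):
    print(f"{attr}={val}: {list(column)}")
```

Each record then looks like a shopping basket containing exactly one "outlook" item, so the same itemset machinery applies to categorical data.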

Itemsets, Frequency, Accuracy

◮ An itemset is a pattern defined by
    $(A_{i_1} = a_{j_1}) \wedge (A_{i_2} = a_{j_2}) \wedge \dots \wedge (A_{i_k} = a_{j_k})$
◮ The frequency (or support) of an itemset $X$ is simply $P(X)$
◮ Example: in the “Play Tennis” data
    $P(\text{Humidity} = \text{Normal} \wedge \text{Play} = \text{Yes} \wedge \text{Windy} = \text{False}) = 4/14$
◮ The accuracy (or confidence) of an association rule if Y=y then Z=z is
    $P(Z = z \mid Y = y)$
◮ Example
    $P(\text{Windy} = \text{False} \wedge \text{Play} = \text{Yes} \mid \text{Humidity} = \text{Normal}) = 4/7$

Play Tennis Example

Day  Outlook   Temperature  Humidity  Wind   PlayTennis
D1   Sunny     Hot          High      False  No
D2   Sunny     Hot          High      True   No
D3   Overcast  Hot          High      False  Yes
D4   Rain      Mild         High      False  Yes
D5   Rain      Cool         Normal    False  Yes
D6   Rain      Cool         Normal    True   No
D7   Overcast  Cool         Normal    True   Yes
D8   Sunny     Mild         High      False  No
D9   Sunny     Cool         Normal    False  Yes
D10  Rain      Mild         Normal    False  Yes
D11  Sunny     Mild         Normal    True   Yes
D12  Overcast  Mild         High      True   Yes
D13  Overcast  Hot          Normal    False  Yes
D14  Rain      Mild         High      True   No

Generating rules from itemsets

◮ An itemset of size $k$ can give rise to $2^k - 1$ rules
◮ Example. Itemset
    Windy=False, Play=Yes, Humidity=Normal
  gives rise to 7 rules including
    IF Windy=False and Humidity=Normal THEN Play=Yes (4/4)
    IF Play=Yes THEN Humidity=Normal and Windy=False (4/9)
    IF True THEN Windy=False and Play=Yes and Humidity=Normal (4/14)
◮ Select association rules that have accuracy greater than some threshold $a$

Finding Frequent Itemsets

◮ Task: find all itemsets with frequency $\geq s$
◮ Key observation: a set $X$ of variables can be frequent only if all subsets of variables are frequent (monotonicity property), i.e. $P(A, B) \leq P(A)$ and $P(A, B) \leq P(B)$
◮ So find frequent singleton sets, then sets of size 2, and so on ...
◮ An efficient algorithm using this idea for finding frequent itemsets is the APRIORI algorithm (Agrawal and Srikant (1994), Mannila et al (1994))
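The frequency, accuracy, and rule-generation figures above can be checked directly against the Play Tennis table. The sketch below is one way to do so. The function names (`count`, `support`, `confidence`, `rules_from`) are illustrative assumptions, not from the slides, and the Wind column is stored as Python booleans under the key "Windy".

```python
from itertools import combinations

# The 14 Play Tennis records: (Outlook, Temperature, Humidity, Windy, Play).
FIELDS = ("Outlook", "Temperature", "Humidity", "Windy", "Play")
ROWS = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rain", "Mild", "High", False, "Yes"),
    ("Rain", "Cool", "Normal", False, "Yes"),
    ("Rain", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rain", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rain", "Mild", "High", True, "No"),
]
DATA = [dict(zip(FIELDS, row)) for row in ROWS]


def count(itemset):
    """Number of records matching every attribute=value pair in the itemset."""
    return sum(all(rec[a] == v for a, v in itemset.items()) for rec in DATA)


def support(itemset):
    """Frequency P(X) of the itemset, estimated as a fraction of records."""
    return count(itemset) / len(DATA)


def confidence(antecedent, consequent):
    """Accuracy P(consequent | antecedent), estimated as a ratio of counts."""
    return count({**antecedent, **consequent}) / count(antecedent)


def rules_from(itemset):
    """Yield every (antecedent, consequent) split with a non-empty consequent."""
    items = list(itemset.items())
    for size in range(len(items)):  # antecedent sizes 0 .. k-1
        for ante in combinations(items, size):
            cons = [item for item in items if item not in ante]
            yield dict(ante), dict(cons)


itemset = {"Windy": False, "Play": "Yes", "Humidity": "Normal"}
print(count(itemset), "of", len(DATA))  # 4 of 14
for ante, cons in rules_from(itemset):
    print("IF", ante or True, "THEN", cons,
          f"({count(itemset)}/{count(ante)})")
```

An empty antecedent matches every record, which reproduces the "IF True THEN ..." rule with accuracy 4/14.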

APRIORI algorithm (for binary variables)

    i = 1
    C_i = {{A} | A is a variable}
    while C_i is not empty
        database pass:
            for each set in C_i, test if it is frequent
            let L_i be the collection of frequent sets from C_i
        candidate formation:
            let C_{i+1} be those sets of size i + 1 all of whose subsets are frequent
        i = i + 1
    end while

◮ Single database pass is linear in $|C_i| n$; make a pass for each $i$ until $C_i$ is empty
◮ Candidate formation
  ◮ Find all pairs of sets $\{U, V\}$ from $L_i$ such that $U \cup V$ has size $i + 1$ and test if this union is really a potential candidate. $O(|L_i|^3)$
◮ Example: 5 three-item sets
    (ABC), (ABD), (ACD), (ACE), (BCD)
  Candidate four-item sets
    (ABCD) ok
    (ACDE) not ok because (CDE) is not present above
◮ Data structure techniques can be used for speedups
◮ Other algorithms possible for finding frequent itemsets, e.g. Han’s FP-growth

APRIORI and Algorithm Components

◮ Finding association rules as Exploratory Data Analysis
◮ Task: Rule Pattern Discovery
◮ Structure: Association Rules
◮ Score Function: Support
◮ Search: Breadth First with Pruning
◮ Data Management Technique: Linear Scans

Comments on Association Rules

◮ Finding association rules is just the beginning in a data mining effort. Some will be trivial, others interesting. The challenge is to select potentially interesting rules
◮ Trivial rule example:
    pregnant ⇒ female
  with accuracy 1!
◮ For rule $A \Rightarrow B$, it can be useful to compare $P(B \mid A)$ to $P(B)$
◮ APRIORI algorithm can be generalized to frequent structure mining, e.g. finding episodes from sequences or frequently-occurring trees
◮ Example application: Health Insurance Commission (HIC) in Australia detected patterns of ordering of medical tests that suggested that some of the tests ordered were unnecessary (Cabeña et al, 1998)
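The APRIORI loop and the pairwise candidate-formation step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: baskets are Python sets of items, the names `candidates` and `apriori` are hypothetical, and none of the data-structure speedups mentioned on the slide are attempted.

```python
from itertools import combinations


def candidates(frequent, size):
    """Form sets of the given size all of whose (size-1)-subsets are frequent."""
    frequent = set(frequent)
    out = set()
    # Join pairs {U, V} of frequent sets whose union has exactly `size` items.
    for u, v in combinations(frequent, 2):
        union = u | v
        if len(union) == size and all(
            frozenset(sub) in frequent for sub in combinations(union, size - 1)
        ):
            out.add(union)
    return out


def apriori(baskets, min_support):
    """Return {itemset: frequency} for all itemsets with frequency >= min_support."""
    n = len(baskets)
    level = {frozenset([item]) for basket in baskets for item in basket}  # C_1
    size = 1
    result = {}
    while level:
        # Database pass: count each candidate and keep the frequent ones (L_i).
        frequent = {}
        for cand in level:
            freq = sum(cand <= basket for basket in baskets) / n
            if freq >= min_support:
                frequent[cand] = freq
        result.update(frequent)
        # Candidate formation: build C_{i+1} from L_i.
        size += 1
        level = candidates(frequent, size)
    return result


# The slide's example: five frequent three-item sets yield one candidate.
three_sets = {frozenset(s) for s in ("ABC", "ABD", "ACD", "ACE", "BCD")}
print(candidates(three_sets, 4))  # only ABCD survives

# Toy baskets, invented for illustration only.
baskets = [{"wine", "cheese"}, {"wine", "cheese", "bread"}, {"bread"}, {"wine"}]
found = apriori(baskets, min_support=0.5)
for itemset, freq in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), freq)
```

The subset check inside `candidates` is what rejects (ACDE) in the slide's example: its three-item subsets (ADE) and (CDE) are not among the frequent sets.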

Summary

◮ Finding frequent itemsets
  ◮ Done with APRIORI algorithm
◮ Given frequent itemsets, construct association rules with accuracy $> a$
◮ Select interesting rules
◮ Generalize to frequent structure mining
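For the "select interesting rules" step, one simple way to compare P(B | A) with P(B) is to take their ratio, a quantity commonly called lift (the name and the function below are not from the slides). A minimal sketch using counts read off the Play Tennis table:

```python
def lift(n_both, n_antecedent, n_consequent, n_total):
    """Ratio P(B | A) / P(B) from counts; a value near 1 means A tells us little about B."""
    confidence = n_both / n_antecedent   # estimate of P(B | A)
    baseline = n_consequent / n_total    # estimate of P(B)
    return confidence / baseline


# Humidity=Normal => Play=Yes: 6 of the 7 Normal days are Yes; 9 of 14 days are Yes.
print(lift(n_both=6, n_antecedent=7, n_consequent=9, n_total=14))

# Windy=False => Play=Yes: 6 of the 8 non-windy days are Yes.
print(lift(n_both=6, n_antecedent=8, n_consequent=9, n_total=14))
```

A rule can have high accuracy simply because its consequent is common, so the ratio gives a rough screen for rules whose antecedent actually changes the probability of the consequent.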
