Mining Frequent Patterns, Associations and Correlations (Week 3)


SLIDE 1

Mining Frequent Patterns, Associations and Correlations

Week 3

SLIDE 2

Team Homework Assignment #2

  • Read pp. 285 – 300 of the text book.
  • Do Example 6.1. Prepare for the results of the homework assignment.
  • Due date
    – beginning of the lecture on Friday, February 18th.

SLIDE 3

Team Homework Assignment #3

  • Prepare for the one‐page description of your group project topic
  • Prepare for presentation using slides
  • Due date
    – beginning of the lecture on Friday, February 11th.

SLIDE 4

http://www.lucyluvs.com/images/fittedXLpooh.JPG
http://www.mondobirra.org/sfondi/BudLight.sized.jpg

SLIDE 5

cell_cycle ‐> [+]Exp1,[+]Exp2,[+]Exp3,[+]Exp4, support = 52.94% (9 genes)
apoptosis ‐> [+]Exp6,[+]Exp7,[+]Exp8, support = 76.47% (13 genes)
http://www.cnb.uam.es/~pcarmona/assocrules/imag4.JPG

SLIDE 6

Table 8.3 The substitution matrix of amino acids.

Figure 8.8 Scoring two potential pairwise alignments, (a) and (b), of amino acids.

SLIDE 7

Figure 9.1 A sample graph data set.

Figure 9.2 Frequent graph.

SLIDE 8

Figure 9.14 A chemical database.

SLIDE 9

What Is Frequent Pattern Analysis?

  • Frequent pattern: a pattern (a set of items, a subsequence, a substructure, etc.) that occurs frequently in a data set
  • First proposed by Agrawal, Imielinski, and Swami in 1993, in the context of frequent itemsets and association rule mining

SLIDE 10

Why Is Frequent Pattern Mining Important?

  • Discloses an intrinsic and important property of data sets
  • Forms the foundation for many essential data mining tasks and applications
    – What products were often purchased together? Beer and diapers?
    – What are the subsequent purchases after buying a PC?
    – What kinds of DNA are sensitive to this new drug?
    – Can we automatically classify web documents?

SLIDE 11

Topics of Frequent Pattern Mining (1)

  • Based on the kinds of patterns to be mined
    – Frequent itemset mining
    – Sequential pattern mining
    – Structured pattern mining

SLIDE 12

Topics of Frequent Pattern Mining (2)

  • Based on the levels of abstraction involved in the rule set
    – Single‐level association rules
    – Multi‐level association rules

SLIDE 13

Topics of Frequent Pattern Mining (3)

  • Based on the number of data dimensions involved in the rule
    – Single‐dimensional association rules
    – Multi‐dimensional association rules

SLIDE 14

Association Rule Mining Process

  • Find all frequent itemsets
    – Join steps
    – Prune steps
  • Generate “strong” association rules from the frequent itemsets

SLIDE 15

Basic Concepts of Frequent Itemsets

  • Let I = {I1, I2, …, Im} be a set of items
  • Let D, the task‐relevant data, be a set of database transactions where each transaction T is a set of items such that T ⊆ I
  • Each transaction is associated with an identifier, called TID
  • Let A be a set of items
  • A transaction T is said to contain A if and only if A ⊆ T
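These definitions map directly onto Python sets; the following sketch (my encoding, not from the slides) uses the first three transactions of Table 5.1, shown later in the deck:

```python
# Transactions encoded as sets of items; "T contains A" is the subset
# test A <= T. The three transactions are from Table 5.1 (AllElectronics).
D = {
    "T100": {"I1", "I2", "I5"},
    "T200": {"I2", "I4"},
    "T300": {"I2", "I3"},
}

def contains(T, A):
    """A transaction T contains itemset A if and only if A is a subset of T."""
    return A <= T  # subset test on Python sets

print(contains(D["T100"], {"I1", "I5"}))  # True:  {I1, I5} is a subset of T100
print(contains(D["T200"], {"I1"}))        # False: I1 is not in T200
```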

SLIDE 16

How to Generate Frequent Itemsets?

  • Suppose the items in Lk‐1 are listed in an order
  • The join step: To find Lk, a set of candidate k‐itemsets, Ck, is generated by joining Lk‐1 with itself. Let l1 and l2 be itemsets in Lk‐1. The resulting itemset formed by joining l1 and l2 is {l1[1], l1[2], …, l1[k‐2], l1[k‐1], l2[k‐1]}
  • The prune step: Scan data set D and compare the support count of each candidate in Ck with the minimum support count. Remove candidate itemsets whose support count is less than the minimum support count, resulting in Lk
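One join/prune level can be sketched as follows, assuming itemsets are kept as sorted tuples (the function names `join` and `prune` are mine, not from the text; the data is Table 5.1):

```python
# One level of candidate generation: the join step merges two
# (k-1)-itemsets that agree on their first k-2 items; the prune step
# scans D and keeps candidates meeting the minimum support count.
def join(L_prev):
    Ck = set()
    for l1 in L_prev:
        for l2 in L_prev:
            # l1 and l2 share a (k-2)-prefix; append the larger last item
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                Ck.add(l1 + (l2[-1],))
    return Ck

def prune(Ck, D, min_sup_count):
    return {c for c in Ck
            if sum(1 for T in D if set(c) <= T) >= min_sup_count}

# Transactions from Table 5.1
D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
     {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
     {"I1", "I2", "I3"}]
L1 = {("I1",), ("I2",), ("I3",), ("I4",), ("I5",)}
L2 = prune(join(L1), D, min_sup_count=2)
print(sorted(L2))  # the six frequent 2-itemsets of Figure 5.2
```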

SLIDE 17

Apriori Algorithm

  • Initially, scan DB once to get the frequent 1‐itemsets
  • Generate length‐(k+1) candidate itemsets from length‐k frequent itemsets
  • Prune length‐(k+1) candidate itemsets with the Apriori property
    – Apriori property: All nonempty subsets of a frequent itemset must also be frequent
  • Test the candidates against DB
  • Terminate when no frequent or candidate set can be generated
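The steps above can be put together into a compact level-wise sketch (variable names are mine; the transactions are those of Table 5.1, shown later in the deck):

```python
from itertools import combinations

def apriori(D, min_sup_count):
    """Return the frequent itemsets of each length, as a list of sets of
    sorted tuples, following the Apriori level-wise scheme."""
    items = sorted({i for T in D for i in T})
    # Initial scan: frequent 1-itemsets
    L = [{(i,) for i in items
          if sum(1 for T in D if i in T) >= min_sup_count}]
    while L[-1]:
        prev = L[-1]
        # Join step: merge (k-1)-itemsets sharing a (k-2)-prefix
        Ck = {l1 + (l2[-1],) for l1 in prev for l2 in prev
              if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]}
        # Apriori-property prune: every (k-1)-subset must be frequent
        Ck = {c for c in Ck
              if all(s in prev for s in combinations(c, len(c) - 1))}
        # Test the surviving candidates against the database
        L.append({c for c in Ck
                  if sum(1 for T in D if set(c) <= T) >= min_sup_count})
    return [level for level in L if level]

D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
     {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
     {"I1", "I2", "I3"}]
result = apriori(D, min_sup_count=2)
print(sorted(result[-1]))  # the largest frequent itemsets
```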

SLIDE 18

Figure 5.4 The Apriori algorithm for discovering frequent itemsets for mining Boolean association rules.

SLIDE 19

Transactional Database

TID    List of item_IDs
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Table 5.1 Transactional data for an AllElectronics branch.

SLIDE 20

Figure 5.2 Generation of candidate itemsets and frequent itemsets, where the minimum support count is 2.

SLIDE 21

Generating Strong Association Rules

  • From the frequent itemsets
  • For each frequent itemset l, generate all nonempty subsets of l
  • For every nonempty subset s of l, output the rule “s ⇒ (l – s)” if support_count(l) / support_count(s) ≥ min_conf, where min_conf is the minimum confidence threshold
  • Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong
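This procedure can be sketched as a small helper (the function name and the `support_count` mapping from frozensets to counts are my assumptions; the counts shown are those of Table 5.1):

```python
from itertools import combinations

def generate_rules(l, support_count, min_conf):
    """For each nonempty proper subset s of frequent itemset l, output
    s => (l - s) when support_count[l] / support_count[s] >= min_conf."""
    rules = []
    for k in range(1, len(l)):
        for s in map(frozenset, combinations(sorted(l), k)):
            conf = support_count[l] / support_count[s]
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules

# Support counts for l = {I1, I2, I5} and its subsets (from Table 5.1)
sc = {frozenset(k): v for k, v in [
    (("I1",), 6), (("I2",), 7), (("I5",), 2),
    (("I1", "I2"), 4), (("I1", "I5"), 2), (("I2", "I5"), 2),
    (("I1", "I2", "I5"), 2)]}
strong = generate_rules(frozenset({"I1", "I2", "I5"}), sc, min_conf=0.7)
for s, rhs, conf in strong:
    print(sorted(s), "=>", sorted(rhs), f"conf = {conf:.0%}")
```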

SLIDE 22

Support

  • The rule A ⇒ B holds in the transaction set D with support s
    – support, s: the probability that a transaction contains A ∪ B
    – support(A ⇒ B) = P(A ∪ B)

SLIDE 23

Confidence

  • The rule A ⇒ B has confidence c in the transaction set D
    – confidence, c: the conditional probability that a transaction containing A also contains B
    – confidence(A ⇒ B) = P(B | A)

Confidence(A ⇒ B) = P(B | A) = support(A ∪ B) / support(A) = support_count(A ∪ B) / support_count(A)

SLIDE 24

Generating Association Rules from Frequent Itemsets

  • Example 5.4: Suppose the data contain the frequent itemset l = {I1, I2, I5}. What are the association rules that can be generated from l? If the minimum confidence threshold is 70%, then which rules are strong?
    – I1 ∧ I2 ‐> I5, confidence = 2/4 = 50%
    – I1 ∧ I5 ‐> I2, confidence = 2/2 = 100%
    – I2 ∧ I5 ‐> I1, confidence = 2/2 = 100%
    – I1 ‐> I2 ∧ I5, confidence = 2/6 = 33%
    – I2 ‐> I1 ∧ I5, confidence = 2/7 = 29%
    – I5 ‐> I1 ∧ I2, confidence = 2/2 = 100%
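The six confidences can be verified directly against the Table 5.1 transactions (a quick check, not part of the original slides):

```python
# Transactions from Table 5.1; conf(s => l - s) = count(l) / count(s)
D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
     {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
     {"I1", "I2", "I3"}]

def count(itemset):
    """Support count: number of transactions containing the itemset."""
    return sum(1 for T in D if itemset <= T)

l = {"I1", "I2", "I5"}
for s in [{"I1", "I2"}, {"I1", "I5"}, {"I2", "I5"},
          {"I1"}, {"I2"}, {"I5"}]:
    print(sorted(s), "=>", sorted(l - s),
          f"confidence = {count(l)}/{count(s)} = {count(l) / count(s):.0%}")
```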

SLIDE 25

Exercise

5.3 A database has five transactions. Let min_sup = 60% and min_conf = 80%.

TID    Items_bought
T100   {M, O, N, K, E, Y}
T200   {D, O, N, K, E, Y}
T300   {M, A, K, E}
T400   {M, U, C, K, Y}
T500   {C, O, O, K, I, E}

(a) Find all frequent itemsets.
(b) List all of the strong association rules (with support s and confidence c) matching the following meta‐rule, where X is a variable representing customers, and item_i denotes variables representing items (e.g., “A”, “B”, etc.):

∀x ∈ transaction, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3) [s, c]

SLIDE 26

Challenges of Frequent Pattern Mining

  • Challenges
    – Multiple scans of transaction database
    – Huge number of candidates
    – Tedious workload of support counting for candidates
  • Improving Apriori
    – Reduce passes of transaction database scans
    – Shrink number of candidates
    – Facilitate support counting of candidates

SLIDE 27

Advanced Methods for Mining Frequent Itemsets

  • Mining frequent itemsets without candidate generation
    – Frequent‐pattern growth (FP‐growth—Han, Pei & Yin @SIGMOD’00)
  • Mining frequent itemsets using vertical data format
    – Vertical data format approach (ECLAT—Zaki @IEEE‐TKDE’00)

SLIDE 28

Mining Various Kinds of Association Rules

  • Mining multilevel association rules
  • Mining multidimensional association rules

SLIDE 29

Mining Multilevel Association Rules (1)

  • Data mining systems should provide capabilities for mining association rules at multiple levels of abstraction
  • Exploration of shared multi‐level mining (Agrawal & Srikant@VLDB’95, Han & Fu@VLDB’95)

SLIDE 30

Mining Multilevel Association Rules (2)

  • For each level, any algorithm for discovering frequent itemsets may be used, such as Apriori or its variations
    – Using uniform minimum support for all levels (referred to as uniform support)
    – Using reduced minimum support at lower levels (referred to as reduced support)
    – Using item or group‐based minimum support (referred to as group‐based support)

SLIDE 31

Table 5.6 Task‐relevant data D.

SLIDE 32

Figure 5.10 A concept hierarchy for AllElectronics computer items.

SLIDE 33

Figure 5.11 Multilevel mining with uniform support.

SLIDE 34

Figure 5.12 Multilevel mining with reduced support.

SLIDE 35

Multilevel mining with group-based support.

SLIDE 36

Mining Multilevel Association Rules (3)

  • Side effect
    – The generation of many redundant rules across multiple levels of abstraction due to the ancestor relationships among items
    – buys(X, “laptop computer”) ⇒ buys(X, “HP printer”) [support = 8%, confidence = 70%]
    – buys(X, “IBM laptop computer”) ⇒ buys(X, “HP printer”) [support = 2%, confidence = 72%]

SLIDE 37

Mining Multidimensional Association Rules

  • Single‐dimensional rules:

    buys(X, “milk”) ⇒ buys(X, “bread”)

  • Multi‐dimensional rules: ≥ 2 dimensions or predicates
    – Inter‐dimension assoc. rules (no repeated predicates)

      age(X,”19-25”) ∧ occupation(X,“student”) ⇒ buys(X, “coke”)

    – Hybrid‐dimension assoc. rules (repeated predicates)

      age(X,”19-25”) ∧ buys(X, “popcorn”) ⇒ buys(X, “coke”)

SLIDE 38

Mining Quantitative Association Rules

  • ARCS (Association Rule Clustering System): Cluster adjacent rules to form general association rules using a 2‐D grid
    – age(X,”34‐35”) ∧ income(X,”31‐50K”) ⇒ buys(X,”high resolution TV”)
    – Proposed by Lent, Swami and Widom, ICDE’97

SLIDE 39

Figure 5.14 A 2-D grid for tuples representing customers who purchase high-definition TVs.

age(X,34) ∧ income(X,”31‐40K”) ⇒ buys(X,”high resolution TV”)
age(X,35) ∧ income(X,”31‐40K”) ⇒ buys(X,”high resolution TV”)
age(X,34) ∧ income(X,”41‐50K”) ⇒ buys(X,”high resolution TV”)
age(X,35) ∧ income(X,”41‐50K”) ⇒ buys(X,”high resolution TV”)
age(X,”34‐35”) ∧ income(X,”31‐50K”) ⇒ buys(X,”high resolution TV”)

SLIDE 40

Strong Rules Are Not Necessarily Interesting (1)

  • Suppose we are interested in analyzing transactions in AllElectronics with respect to the purchase of computer games and videos. Let game refer to the transactions containing computer games, and video refer to those containing videos. Of the 10,000 transactions analyzed, the data show that 6,000 of the customer transactions included computer games, while 7,500 included videos, and 4,000 included both computer games and videos.

SLIDE 41

Strong Rules Are Not Necessarily Interesting (2)

  • Suppose that a data mining program for discovering association rules is run on the data, using a minimum support of, say, 30% and a minimum confidence of 60%. Is the following association rule strong?

  • buys(X, ”computer games”) ⇒ buys(X, ”videos”)
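With the counts from the previous slide (10,000 transactions; 6,000 with games, 4,000 with both), the rule's support and confidence follow directly, as a quick sketch:

```python
# support = P(game and video); confidence = P(video | game)
n, games, both = 10_000, 6_000, 4_000
support = both / n         # 0.40, which meets the 30% threshold
confidence = both / games  # ~0.667, which meets the 60% threshold
print(support, round(confidence, 3))  # the rule is therefore strong
```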

SLIDE 42

Strong Rules Are Not Necessarily Interesting (3)

  • The rule above is misleading because the probability of purchasing videos is already 75%.
  • It does not measure the real strength of the correlation and implication between computer games and videos.
  • How can we tell which strong association rules are really interesting?

SLIDE 43

Correlation Analysis

  • Correlation measure

    A ⇒ B {support, confidence, correlation}

  • Correlation metrics
    – lift
    – chi‐square
    – all_confidence
    – cosine measure

SLIDE 44

Correlation Analysis Using Lift

lift(A, B) = P(A ∪ B) / (P(A) P(B)) = P(B | A) / P(B) = conf(A ⇒ B) / sup(B)

  • If the resulting value is greater than 1, then A and B are positively correlated, meaning that the occurrence of one implies the occurrence of the other
  • If the resulting value is equal to 1, then A and B are independent and there is no correlation between them
  • If the resulting value is less than 1, then the occurrence of A is negatively correlated with the occurrence of B

SLIDE 45

Correlation Analysis Using Lift

Table 5.7 A 2 × 2 contingency table summarizing the transactions with respect to game and video purchases.

lift(A, B) = P(A ∪ B) / (P(A) P(B))

P({game}) =
P({video}) =
P({game, video}) =
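Filling in the blanks with the counts from the earlier game/video example (a sketch; the slide leaves the values for the reader):

```python
# Probabilities from the 10,000-transaction example
n = 10_000
p_game = 6_000 / n   # P({game})  = 0.60
p_video = 7_500 / n  # P({video}) = 0.75
p_both = 4_000 / n   # P({game, video}) = 0.40
lift = p_both / (p_game * p_video)
print(round(lift, 2))  # 0.89 < 1: game and video are negatively correlated
```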

SLIDE 46

Correlation Analysis Using Chi‐square

χ² = Σ (Observed − Expected)² / Expected

χ² = Σ_{i=1..c} Σ_{j=1..r} (o_ij − e_ij)² / e_ij

e_ij = count(A = a_i) × count(B = b_j) / N

  • The larger the χ² value, the more likely the variables are related.
  • If the observed value of a cell is less than the expected value, the two variables associated with the cell are negatively correlated.
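Applying the formula to the game/video contingency table from the earlier slides (observed counts 4,000 / 2,000 / 3,500 / 500; my encoding of the table), a sketch:

```python
# Chi-square with expected counts e_ij = (row total * column total) / N
N = 10_000
observed = {("game", "video"): 4_000, ("game", "no video"): 2_000,
            ("no game", "video"): 3_500, ("no game", "no video"): 500}
row = {"game": 6_000, "no game": 4_000}
col = {"video": 7_500, "no video": 2_500}

chi2 = sum((o - row[r] * col[c] / N) ** 2 / (row[r] * col[c] / N)
           for (r, c), o in observed.items())
print(round(chi2, 1))  # 555.6: far larger than expected under independence
```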

SLIDE 47

Correlation Analysis Using Chi‐square

Table 5.8 The above contingency table, now shown with the expected values.

χ² = Σ (Observed − Expected)² / Expected
