CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets - PowerPoint PPT Presentation

CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets
Jian Pei, Jiawei Han and Runying Mao

The shortcomings of frequent pattern mining
• There may exist a large number of frequent itemsets in a transaction database, especially when the support threshold is low.
• There may exist a huge number of association rules. It is hard for users to comprehend and manipulate a huge number of rules.

An interesting alternative
• Instead of mining the complete set of frequent itemsets and their associations,
• mine only the frequent closed itemsets and their corresponding association rules.

A simple example
Transaction ID    Items in transaction
10                a1, a2, a3, …, a100
20                a1, a2, a3, …, a50
The minimum support threshold is 1; the minimum confidence threshold is 50%.

The comparison of the two mining methods
Traditional method
• ≈ 10^30 frequent itemsets: (a1), …, (a100), (a1, a2), …, (a99, a100), …, (a1, a2, …, a100)
• A tremendous number of association rules
FCI method
• Only two frequent closed itemsets: (a1, a2, …, a50) and (a1, a2, …, a100)
• One association rule: (a1, a2, …, a50) → (a51, a52, …, a100)

DEFINITION 1 (Frequent Closed Itemset)
• An itemset X is a closed itemset if there exists no itemset X' such that (1) X' is a proper superset of X, and (2) every transaction containing X also contains X'.
• A closed itemset X is frequent if its support passes the given support threshold.

DEFINITION 2 (Conditional Database)
• Given a transaction database TDB, let k be a frequent item in TDB. The k-conditional database, denoted TDB|k, is the subset of transactions in TDB containing k, with all occurrences of infrequent items, item k, and items following k in the f_list omitted.

An important Lemma
• Given a transaction database TDB, a support threshold min_sup, and f_list = (i_1, i_2, …, i_n), the problem of mining the complete set of frequent closed itemsets can be divided into n sub-problems: the j-th problem (1 ≤ j ≤ n) is to find the complete set of frequent closed itemsets containing i_(n+1-j) but no i_k (for n+1-j < k ≤ n).
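Definition 1 and the comparison above can be checked by brute force on a scaled-down version of the simple example. The sketch below is only an illustration, not part of CLOSET: it uses 6 items instead of 100 (the full example has 2^100 - 1 ≈ 1.27 × 10^30 frequent itemsets, which cannot be enumerated), and the function names are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_sup):
    """Brute-force enumeration; only feasible for a tiny item universe."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s >= min_sup:
                result[frozenset(combo)] = s
    return result

def closed_itemsets(frequent):
    """Keep X only if no proper superset has the same support.

    A proper superset with the same support is contained in every
    transaction that contains X, which is exactly what Definition 1 forbids.
    """
    return {
        x: s for x, s in frequent.items()
        if not any(x < y and s == sy for y, sy in frequent.items())
    }

# Scaled-down analogue of the example: a1..a6 and a1..a3 (assumption: 6 items).
tdb = [frozenset(f"a{i}" for i in range(1, 7)),
       frozenset(f"a{i}" for i in range(1, 4))]
fi = frequent_itemsets(tdb, min_sup=1)
fci = closed_itemsets(fi)
print(len(fi), "frequent itemsets,", len(fci), "frequent closed itemsets")
```

Even at this small scale every non-empty subset of the six items is frequent, while only the two transactions themselves are closed.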

The transaction database TDB
Transaction ID    Items in transaction
10                a, c, d, e, f
20                a, b, e
30                c, e, f
40                a, c, d, f
50                c, e, f
Min_sup = 2
f_list: <c:4, e:4, f:4, a:3, d:2>

TDB with frequent items ordered by f_list: cefad, ea, cef, cfad, cef
• d-cond DB (d:2): cefa, cfa. Output F.C.I.: cfad:2
• a-cond DB (a:3): cef, e, cf. Output F.C.I.: a:3. F_list|a = (c:2, e:2, f:2)
  • fa-cond DB (fa:2): ce, c
  • ea-cond DB (ea:2): c. Output F.C.I.: ea:2
  • ca-cond DB (ca:2)
• f-cond DB (f:4): ce:3, c. Output F.C.I.: cf:4, cef:3
• e-cond DB (e:4): c:3. Output F.C.I.: e:4
• c-cond DB: no items remain, since c is first in the f_list

Optimization 1
Compress transactional and conditional databases using an FP-tree structure.
Benefits:
• The FP-tree compresses the database for frequent itemset mining.
• Conditional databases can be derived from the FP-tree efficiently.

Optimization 2
Extract items appearing in every transaction of a conditional database.
Example: d-cond DB (d:2) contains cefa and cfa; c, f and a appear in both, so the output F.C.I. is cfad:2.
Benefits:
• It reduces the size of the FP-tree.
• It reduces the level of recursion.
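The f_list, the d-conditional database and the Optimization 2 step for the example TDB can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: items are single characters, ties in support are broken alphabetically (which happens to match the f_list on the slide), and all function names are hypothetical.

```python
from collections import Counter

def build_f_list(transactions, min_sup):
    """Frequent items with their supports, in descending support order."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_sup]
    # Assumption: ties are broken alphabetically.
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))

def conditional_db(transactions, k, order):
    """TDB|k: transactions containing k, keeping only frequent items before k."""
    rank = {item: n for n, item in enumerate(order)}
    projected = []
    for t in transactions:
        if k in t:
            kept = sorted((i for i in set(t) if i in rank and rank[i] < rank[k]),
                          key=rank.get)
            projected.append("".join(kept))
    return projected

def items_in_every_transaction(db):
    """Optimization 2: items shared by all transactions of a conditional database."""
    return set.intersection(*(set(t) for t in db)) if db else set()

tdb = ["acdef", "abe", "cef", "acdf", "cef"]
f_list = build_f_list(tdb, min_sup=2)
order = [item for item, _ in f_list]
d_db = conditional_db(tdb, "d", order)
common = items_in_every_transaction(d_db)
print(f_list)
print(d_db, sorted(common))
```

Here `common` is {c, f, a}, so joining it with the prefix d gives cfad with support 2, matching the output listed for the d-conditional database.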

Lemma 2
• If an itemset Y is the maximal set of items appearing in every transaction in the X-conditional database, and X ∪ Y is not subsumed by some already found frequent closed itemset with identical support, then X ∪ Y is a frequent closed itemset.

Optimization 3
Directly extract frequent closed itemsets from the FP-tree.
Example: f-cond DB (f:4) contains ce:3 and c. Its FP-tree is the single path Null() → c:4 → e:3. Output F.C.I.: cf:4, cef:3

DEFINITION 3 (k-single segment itemsets)
• Let k be a frequent item in the X-conditional database. If there is only one node N labeled k in the corresponding FP-tree, every ancestor of N has only one child, and N has (1) no child, (2) more than one child, or (3) one child with count value smaller than that of N, then the k-single segment itemset is the union of itemset X and the set of items including N and N's ancestors (excluding the root).

Lemma 3
• The i-single segment itemset Y is a frequent closed itemset if the support of i within the conditional database passes the given threshold and Y is not a proper subset of any frequent closed itemset already found.
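For the common case where the FP-tree of a conditional database is a single path, Definition 3 and Lemma 3 can be sketched as follows. This is a simplified illustration, not the authors' implementation: the path is given as a plain list instead of a real FP-tree, case (2) of Definition 3 (a node with several children) is not handled, and the subset check is restricted to already found itemsets with the same support, which is an assumption made here for consistency with Lemma 2. The function name and parameters are hypothetical.

```python
def closed_from_single_path(path, x, min_sup, found=()):
    """Extract single segment itemsets from a single-path FP-tree.

    path:  (item, count) pairs from the child of the root down to the leaf.
    x:     the prefix itemset X of the conditional database.
    found: (itemset, support) pairs of frequent closed itemsets found so far.
    """
    results = []
    for depth, (item, count) in enumerate(path):
        is_last = depth == len(path) - 1
        # Cases (1) and (3) of Definition 3: no child, or one child with a smaller count.
        if count >= min_sup and (is_last or path[depth + 1][1] < count):
            itemset = frozenset(x) | {i for i, _ in path[: depth + 1]}
            # Assumption: only itemsets with the same support can subsume this one.
            if not any(itemset < other and count == sup for other, sup in found):
                results.append((itemset, count))
    return results

# f-conditional database of the example: single path c:4 -> e:3, prefix X = {f}.
already_found = [(frozenset("acdf"), 2), (frozenset("a"), 3), (frozenset("ae"), 2)]
segments = closed_from_single_path([("c", 4), ("e", 3)], "f", 2, already_found)
print(segments)
```

On the f-conditional database this yields cf with support 4 and cef with support 3, the same two itemsets listed in the example.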

Optimization 4
Prune search branches.

Lemma 4
• Let X and Y be two frequent itemsets with the same support. If X ⊂ Y, and Y is closed, then there exists no frequent closed itemset containing X but not Y − X.

[Figure: the conditional databases of the example TDB, repeated.]

The Algorithm of CLOSET
• Initialization. Let FCI be the set of frequent closed itemsets. Initialize FCI ← ∅.
• Find frequent items. Scan transaction database TDB and compute the frequent item list f_list.
• Mine frequent closed itemsets recursively. Call CLOSET(∅, TDB, f_list, FCI).

Subroutine CLOSET(X, DB, f_list, FCI)
• 1. Let Y be the set of items in f_list that appear in every transaction of DB. Insert X ∪ Y into FCI if it is not a proper subset of some itemset in FCI with the same support. // Applying Optimization 2
• 2. Build the FP-tree for DB; items already extracted should be excluded. // Applying Optimization 1
• 3. Apply Optimization 3 to extract frequent closed itemsets if possible.
• 4. Form the conditional database for every remaining item in f_list and, at the same time, compute the local frequent item lists for these conditional databases.

Subroutine CLOSET(X, DB, f_list, FCI) (continued)
• 5. For each remaining item i in f_list, starting from the last one, call CLOSET(iX, DB|i, f_list_i, FCI) if iX is not a subset of any frequent closed itemset already found with the same support count, where DB|i is the i-conditional database with respect to DB and f_list_i is the corresponding frequent item list. // Applying Optimization 4

Scaling up CLOSET in large database
When the transaction database is large, it is unrealistic to construct a main memory-based FP-tree. Two alternatives:
• Construct conditional database without FP-tree
• Construct disk-based FP-tree

Performance Study
Reduction of the size of itemsets
Support         #F.C.I     #F.I         #F.I / #F.C.I
64179 (95%)     812        2,205        2.72
60801 (90%)     3,486      27,127       7.78
54046 (80%)     15,107     533,975      35.35
47290 (70%)     35,875     4,129,839    115.12
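Steps 1, 4 and 5 of the CLOSET subroutine can be illustrated with a short recursive sketch that works on plain conditional databases, in the spirit of the "conditional database without FP-tree" option. This is not the authors' implementation: it omits the FP-tree (steps 2 and 3), it carries the items of Y along in the prefix because they are removed from the conditional databases, it breaks support ties alphabetically, and the names `closet` and `subsumed` are hypothetical.

```python
from collections import Counter

def subsumed(itemset, support, fci):
    """True if an already found closed itemset with the same support contains `itemset`."""
    return any(itemset <= other and support == s for other, s in fci.items())

def closet(x, db, min_sup, fci):
    """x: prefix itemset; db: the x-conditional database as a list of frozensets;
    fci: dict mapping each frequent closed itemset to its support (updated in place)."""
    support = len(db)  # every transaction of the conditional database contains x
    counts = Counter(item for t in db for item in t)

    # Step 1 (Optimization 2): items appearing in every transaction.
    y = frozenset(item for item, c in counts.items() if c == support)
    prefix = x | y
    if prefix and not subsumed(prefix, support, fci):
        fci[prefix] = support

    # Step 4: local frequent item list of the remaining items.
    local = sorted((item for item, c in counts.items() if c >= min_sup and item not in y),
                   key=lambda item: (-counts[item], item))

    # Step 5: process items from the last one; prune with Optimization 4.
    for n in range(len(local) - 1, -1, -1):
        item = local[n]
        new_x = prefix | {item}
        if subsumed(new_x, counts[item], fci):
            continue
        allowed = set(local[:n])  # only items preceding `item` in the local list
        cond_db = [t & allowed for t in db if item in t]
        closet(new_x, cond_db, min_sup, fci)

tdb = [frozenset(t) for t in ("acdef", "abe", "cef", "acdf", "cef")]
fci = {}
closet(frozenset(), tdb, 2, fci)
found = {"".join(sorted(k)): v for k, v in fci.items()}
print(found)
```

On the five-transaction example with Min_sup = 2 this produces the six itemsets listed in the walkthrough: cfad:2, a:3, ea:2, cf:4, cef:3 and e:4 (printed with items in alphabetical order).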

Performance Study
[Figure: runtime in seconds versus support threshold for CLOSET, A-CLOSE and CHARM on three datasets: the sparse dataset T25I20D100K (support thresholds from 0.7% to 1.5%) and the dense datasets Pumsb and Connect-4.]

[Figure: runtime in seconds versus replication factor for T25I20D100K (1%), Connect4 (70%) and Pumsb (85%).]

Conclusions
Three techniques:
• Applying a compressed FP-tree structure;
• Developing a single prefix path compression technique;
• Exploring a partition-based projection mechanism.
