SmartMiner: A Depth First Algorithm Guided by Tail Information for Mining Maximal Frequent Itemsets

Qinghua Zou, Computer Science Department, University of California-Los Angeles, zou@cs.ucla.edu
Wesley W. Chu, Computer Science Department, University of California-Los Angeles, wwc@cs.ucla.edu
Baojing Lu, Computer Science Department, North Dakota State University, baojing.lu@ndsu.nodak.edu

ABSTRACT
Maximal frequent itemsets (MFI) are crucial to many tasks in data mining. Since the MaxMiner algorithm first introduced enumeration trees for mining MFI in 1998, several methods have been proposed that use depth first search to improve performance. To further improve the performance of mining MFI, we propose a technique that gathers and passes tail (of a node) information to determine the next node to explore during the mining process. Our algorithm uses an augmented dynamic reordering heuristic that takes the tail information into account. Compared with Mafia and GenMax, SmartMiner generates a much smaller search tree, requires fewer support counting operations, and does not require superset checking. Using the datasets Mushroom and Connect, our experimental study reveals that SmartMiner generates the same MFI as Mafia and GenMax, but yields an order of magnitude improvement in speed.

Keywords
Data mining, frequent patterns, maximal frequent pattern, tail information, search space pruning.

1. INTRODUCTION
Mining frequent itemsets in large datasets is an important problem in the data mining field since it enables essential data mining tasks such as discovering association rules, data correlations, sequential patterns, etc. The problem of finding frequent itemsets was originally proposed by Agrawal [1] in his association rule model and the support confidence framework. It can be formally stated as follows:

Let I be a set of items and D be a set of transactions, where a transaction is an itemset. The support of an itemset is the number of transactions containing the itemset. An itemset is frequent if its support is at least a user specified minimum support value, minSup. Let FI denote the set of all frequent itemsets. An itemset is closed if no proper superset has the same support. The set of all frequent closed itemsets is denoted by FCI. A frequent itemset is called maximal if it is not a subset of any other frequent itemset. We denote MFI as the set of all maximal frequent itemsets. Any maximal frequent itemset X is a frequent closed itemset since no proper superset of X is frequent. Thus we have MFI ⊆ FCI ⊆ FI.

There are three different approaches for generating FI. First, the candidate set generate-and-test approach [1,11,14,8,12,7]: most previous algorithms belong to this group. The basic idea is to generate and then test the candidate set. This process is repeated in a bottom up fashion until no candidate set can be formed. Second, the sampling approach [7]: it selects samples of a dataset to form the candidate set. The candidate set is tested in the entire dataset to identify frequent itemsets. Sampling reduces computation complexity but the result is incomplete. Third, the data transformation approach [6,16,17]: it transforms a dataset for efficient mining. For example, the FP-tree algorithm [6] builds a compressed data representation called an FP-tree from a dataset and then mines frequent itemsets directly from the FP-tree. The pattern decomposition algorithm (PDA) [16,17] decomposes transactions and shrinks the dataset in each pass. Both FP-tree and PDA greatly reduce the original dataset and do not need to generate candidate sets.

When the frequent patterns are long, mining FI is infeasible because of the exponential number of frequent itemsets. Thus, algorithms for mining FCI [9,15,10] have been proposed, since FCI is enough to generate association rules. However, FCI can also be exponentially large, like FI. As a result, researchers now turn to finding MFI. Given the set of MFI, it is easy to analyze many interesting properties of the dataset, such as the longest pattern, the overlap of the MFI, etc. All FI can be built up from MFI and can be counted for support in a single scan of the database. Moreover, we can focus on part of the MFI to do supervised data mining.

In this paper we introduce SmartMiner, which at each step passes tail information (defined in section 2) to guide the search for new MFI. Using an augmented heuristic and tail information gives SmartMiner several benefits: it does not require superset checking, reduces the computation for counting support, and yields a small search tree. Our experimental results reveal that SmartMiner is an order of magnitude faster than Mafia [4] and GenMax [5] in generating MFI on the same datasets.

1.1 Related works
We first introduce an enumeration tree for an itemset I. Assume there is a total ordering ≤_L over the items I in the database. We say i_j ≤_L i_k if item i_j occurs before item i_k in the ordering. This ordering can be used to enumerate the item subset lattice (search space). Each node, composed of a head and a tail, represents a state in the search space. The head is a candidate for FI while the tail contains candidate items to form new heads. For example, Figure 1 shows a complete enumeration tree over five items abcde with the ordering a,b,c,d,e. Each node is written as head:tail. It begins with root node :abcde. For each item a_i in the tail of a node X:Y, a sub node is created with Xa_i as its head and the items
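
The definitions of support, FI, FCI, and MFI given in the Introduction can be checked on a small example. The sketch below is not SmartMiner or any algorithm from the paper: it is a brute-force enumeration, practical only for a handful of items, and the four-transaction dataset, the minSup value of 2, and all function names are hypothetical choices made for illustration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_sup):
    """Brute force: test every non-empty subset of the item universe."""
    items = sorted(set().union(*transactions))
    fi = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            sup = support(candidate, transactions)
            if sup >= min_sup:
                fi[candidate] = sup
    return fi

def closed_itemsets(fi):
    """Frequent itemsets with no proper superset of equal support.

    Checking frequent supersets is enough: a superset with the same
    support as a frequent itemset is itself frequent.
    """
    return {x for x in fi if not any(x < y and fi[y] == fi[x] for y in fi)}

def maximal_itemsets(fi):
    """Frequent itemsets that are not a proper subset of another one."""
    return {x for x in fi if not any(x < y for y in fi)}

# Hypothetical toy dataset: each transaction is a set of single-letter items.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("ad")]
min_sup = 2  # assumed threshold for this example

fi = frequent_itemsets(transactions, min_sup)
fci = closed_itemsets(fi)
mfi = maximal_itemsets(fi)

for name, family in (("FI", fi), ("FCI", fci), ("MFI", mfi)):
    print(name, sorted("".join(sorted(s)) for s in family))
```

On this toy data FI has seven members, FCI has three (a, ab, abc), and MFI has one (abc), which illustrates the containment MFI ⊆ FCI ⊆ FI and why MFI is the most compact of the three representations.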
