From Path Tree To Frequent Patterns: A Framework for Mining Frequent Patterns

Yabo Xu, Jeffrey Xu Yu Chinese University of Hong Kong Hong Kong, China

{ybxu,yu}@se.cuhk.edu.hk

Guimei Liu, Hongjun Lu The Hong Kong University of Science and Technology Hong Kong, China

{cslgm,luhj}@cs.ust.hk

Abstract

In this paper, we propose a new framework for mining frequent patterns from large transactional databases. The core of the framework is a novel coded prefix-path tree with two representations, namely, a memory-based prefix-path tree and a disk-based prefix-path tree. The disk-based prefix-path tree is simple in its data structure yet rich in the information it contains, and is small in size. The memory-based prefix-path tree is simple and compact. Upon the memory-based prefix-path tree, a new depth-first frequent pattern discovery algorithm, called PP-Mine, is proposed in this paper that outperforms FP-growth significantly. The memory-based prefix-path tree can be stored on disk as a disk-based prefix-path tree with the assistance of the new coding scheme. We present efficient loading algorithms to load the minimal required disk-based prefix-path tree into main memory. Our technique is to push constraints into the loading process, which has not been well studied yet.

1. Introduction

Recent studies show that the pattern-growth method is one of the most effective methods for frequent pattern mining [1, 2, 4, 5, 7, 8, 9]. As a divide-and-conquer method, it partitions (projects) the database into partitions recursively, but does not generate candidate sets. It also makes use of the Apriori property [3]: if any length-k pattern is not frequent in the database, its length-(k+1) super-patterns can never be frequent. It counts frequent patterns in order to decide whether it can assemble longer patterns. Most of the algorithms use a tree as the basic data structure to mine frequent patterns, such as the lexicographic tree [1, 2, 4, 5] and the FP-tree [8]. Different strategies have been extensively studied, such as depth-first [1, 2], breadth-first [2, 4], top-down [11] and bottom-up [8]. Coding techniques are also used. In [1], bit-patterns are used for efficient counting. In [5], a vertical tid-vector is used, in which bits of 1 and 0 represent the presence and absence, respectively, of items in the set of transactions. Other data layouts, such as the vertical tid-list, horizontal item-vector and horizontal item-list, have also been studied [6, 10, 12]. In this paper, we study a general framework for a multi-user environment where a large number of users might issue different mining queries from time to time. In brief, the main tasks in our general framework are listed below.

1. Constructing an initial tree in memory for a transactional database.

2. Mining using the tree constructed in main memory.

3. Converting the in-memory tree to a disk-based tree.

4. Loading a portion of the tree on disk into main memory for mining. (Note that the mining is the same as in task 2.)

We observe that the existing algorithms become deficient in such an environment, because all of them aim at processing a single mining task in a one-by-one manner. In other words, the existing algorithms repeat the first two tasks, 1 and 2, for every mining query, even when the mining queries are the same. In order to process mining queries efficiently in a multi-user environment, it is highly desirable to i) have an even faster algorithm for mining in main memory (tasks 1 and 2), and ii) reduce the cost of reconstructing a tree (tasks 3 and 4). Both motivate us to study new mining algorithms and new data structures that differ from the existing FP-growth algorithm and its data structure, the FP-tree, because the complex node-links cross the FP-tree in an unpredictable manner, and the bottom-up FP-growth algorithm makes the FP-tree difficult to implement efficiently on disk. The main contribution of our work is given below. We propose a novel coded prefix-path tree, the PP-tree, as the core of our framework. This prefix-path tree has two representations, a disk-based representation and a memory-based representation. Both are node-link-free. It is worth noting that the memory-based representation and the disk-based representation are designed for different purposes. The former


is for fast mining, and the latter is for efficiently loading a portion of the tree into main memory. The novel coding scheme assists conversion between the memory representation and the disk representation of the prefix-path tree, and assists loading the minimum subtree from disk into memory. For task 2, we propose a novel mining algorithm, called PP-Mine, which does not generate any conditional FP-tree, and outperforms FP-growth significantly. A collection of novel loading algorithms is also proposed, by which constraints can be further pushed into the loading process (task 4). We will address tasks 1 and 3, which are straightforward, and report our findings in our experimental studies later in this paper.

2. Frequent Pattern Mining

Let I = {i_1, i_2, ..., i_n} be a set of items. An itemset X is a subset of I, X ⊆ I. A transaction T_i = (tid, X) is a pair, where X is an itemset and tid is its unique identifier. A transaction T_i = (tid, X) is said to contain T_j = (tid, Y) if and only if Y ⊆ X. A transaction database, TDB, is a set of transactions. The number of transactions in TDB that contain X is called the support of X, denoted sup(X). An itemset X is a frequent pattern if and only if sup(X) ≥ ξ, where ξ is a threshold called the minimum support. The frequent pattern mining problem is to find the complete set of frequent patterns in a given transaction database with respect to a given support threshold, ξ.

Example 1. Let the first two columns of Table 1 be our running transaction database, TDB, and let the minimum support threshold be ξ = 2. The frequent items are shown in the third column of Table 1.

Trans ID | Items         | Frequent items
100      | c,d,e,f,g,i   | c,d,e,g
200      | a,c,d,e,m     | a,c,d,e
300      | a,b,d,e,g,k   | a,d,e,g
400      | a,c,h         | a,c

Table 1. The transaction database TDB
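To make the definitions concrete, the supports in Table 1 can be checked with a short Python sketch (the function name sup and the (tid, items) pair layout are ours, used only for illustration):

```python
def sup(tdb, X):
    """Support of itemset X: the number of transactions that contain X."""
    return sum(1 for tid, items in tdb if set(X) <= set(items))

# Example 1's running transaction database as (tid, items) pairs
TDB = [
    (100, "cdefgi"),
    (200, "acdem"),
    (300, "abdegk"),
    (400, "ach"),
]

# With the threshold xi = 2, {a, c} and {c, d, e} are frequent (support 2)
```

For instance, sup(TDB, "cde") returns 2, so {c, d, e} is a frequent pattern under ξ = 2, while any itemset containing f, h, i, k or m has support 1 and is not.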

Given a threshold ξ and a non-empty itemset S, we consider three primary types of mining queries in this paper.

- Frequent Itemsets Mining: mining frequent patterns whose support is greater than or equal to ξ.

- Frequent Superitemsets Mining: mining frequent patterns that include all items in S and have a support that is greater than or equal to ξ. Examples include finding the causes of a certain rule, for example, * => S, where * denotes any itemset.

- Frequent Subitemsets Mining: mining frequent patterns that are included in S and have a support that is greater than or equal to ξ. Examples include mining rules for a limited set of products, for example, dairy products.

For processing these three types of mining queries, we propose a novel coded prefix-path tree, the PP-tree, which has two representations: a memory-based representation (the PPm-tree) and a disk-based representation (the PPd-tree). In our framework, a PPd-tree, with a threshold ξ_m, called the materialization threshold, is possibly maintained on disk for the database TDB. The PPd-tree is built on disk by i) constructing a PPm-tree with ξ_m in memory (task 1), and ii) converting the PPm-tree to a PPd-tree (task 3). The materialization threshold, ξ_m, is selected as the minimum threshold needed to support most mining tasks. With ξ_m = 1, the whole database can be materialized.

There are two main cases when processing one of the three types of mining queries with a threshold ξ and a possible itemset S.

- When the PPd-tree is not available, or the PPd-tree is available but ξ < ξ_m, the mining is conducted by constructing an initial PPm-tree from the raw TDB (task 1) and mining the PPm-tree in memory (task 2). We propose a novel mining algorithm, PP-Mine, that mines the PPm-tree efficiently in memory. PP-Mine outperforms both FP-growth [8] and H-Mine [9], as shown in our experimental studies later in this paper.

- When the PPd-tree is available and ξ ≥ ξ_m, the mining is conducted in two steps: loading (task 4) and mining (task 2).

  - In the loading phase, a minimum subtree of the PPd-tree is loaded from disk, and a PPm-tree is constructed in memory. The given ξ and S are pushed into the loading phase. We propose three primary loading algorithms: PPF-load, PPS-load and PPB-load. The PPF-load algorithm supports loading for frequent itemsets mining. The integration of PPS-load with PPF-load supports loading for frequent superitemsets mining. The integration of PPB-load with PPF-load supports loading for frequent subitemsets mining.

  - In the mining phase, as above, PP-Mine mines the PPm-tree efficiently in memory. It is important to note that, because ξ and S are pushed into the loading phase, PP-Mine does not need to check S in the mining phase.

In the following, we concentrate on the coded prefix-path tree, the mining algorithm PP-Mine, and the three loading algorithms.

3. A Coded Prefix-Path Tree

Definition 1. A Prefix-Path tree (or PP-tree for short) is an ordered tree. Let I be a set of frequent items (1-itemsets) in a total order (≺).(1) A node in the tree is labelled with a frequent item in I. The root of the tree represents the "null" item. The children of a node are listed following the order. A path of length k from the root to a node in the tree represents a k-itemset. The rank of a PP-tree is the number of frequent 1-itemsets.

Definition 2. A complete prefix-path tree of rank n is a prefix-path tree with 2^n nodes. Each node is encoded with a number given by the pre-order traversal of the tree. The number associated with a node is called the code of that node. The code of the root is 0.

Definition 3. A PP-tree is coded using the code of the corresponding node in the complete PP-tree with the same rank.

Figure 1. The PP-tree for Example 1

In the following, a PP-tree is a coded prefix-path tree, unless otherwise specified. The PP-tree for the frequent items in the third column of Table 1 is shown as the shaded subtree in Figure 1. The rank of this PP-tree is 5, because five frequent items, a, c, d, e and g, are represented in frequency order; their support is greater than or equal to the minimum support (ξ = 2). Its complete prefix-path tree has 2^5 = 32 nodes in total. The root is numbered 0, and its five children, a, c, d, e and g, are numbered 1, 17, 25, 29 and 31, respectively. The first subtree of the root, a, has four children, c, d, e and g, numbered 2, 10, 14 and 16. A code in a PP-tree uniquely represents a path from the root, and therefore an itemset. The code 3 represents a path (a frequent itemset) acd, and 19 represents cde. Given a PP-tree of rank n, where I is the set of n frequent 1-itemsets kept in the PP-tree, some observations can be made below.

(1) The order can be any order, such as frequency order or lexicographic order.

- A PP-tree of rank n is built for a database with a given minimum support, ξ_m, called the materialization threshold, where n is the number of frequent 1-itemsets. When ξ_m = 1, the whole database is maintained in the PP-tree.

- The PP-tree can be used to mine the database with any minimum support ξ ≥ ξ_m.

- It has n subtrees, and the size of the i-th subtree is 2^(n-i) (1 ≤ i ≤ n).

- A function idx(n, cd) is defined, which indicates that a code cd, 1 ≤ cd < 2^n, is in the i-th subtree: idx(n, cd) = n - i', where i' is the maximum number satisfying 2^n - cd - (2^i' - 1) > 0. Recall that 0 is the code of the root.

- The code of the i-th child of the root, 1 ≤ i ≤ n, can be calculated with the function child(n, i) = 1 + sum_{j=1..i-1} 2^(n-j) = 1 + 2^n - 2^(n-i+1). The function child can be calculated easily using bit-shift operators.

- The item that the i-th child represents, 1 ≤ i ≤ n, is the i-th item in I.

- All codes in the i-th subtree range between child(n, i) and child(n, i+1), for i ≤ n - 1. The last subtree has no children.

It is important to note that, given a PP-tree of rank n, the codes/itemsets along the path from the root to a node can be computed from the code of that node. For example, as shown in Figure 1, code 19 represents the itemset {c, d, e}.

In our framework, we use the notion of the complete prefix-path tree only to code nodes. In practice, a PP-tree of rank n is much smaller than the corresponding complete prefix-path tree; we only deal with prefix-path trees.
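For illustration, the coding scheme can be sketched in Python; child_code implements the child function of the observations above, and decode recovers the itemset (as 1-based positions in the total order) from a node's code. Both function names are ours:

```python
def child_code(n, i):
    """Code of the i-th child (1 <= i <= n) of the root of a complete
    PP-tree of rank n: child(n, i) = 1 + 2^n - 2^(n-i+1)."""
    return 1 + (1 << n) - (1 << (n - i + 1))

def decode(n, code):
    """Recover the itemset represented by `code` in a complete PP-tree of
    rank n, as 1-based item positions in the total order."""
    items, base, rank, rel = [], 0, n, code
    while rel > 0:
        # find the child subtree of the current root in which `rel` falls
        i = 1
        while i < rank and rel >= child_code(rank, i + 1):
            i += 1
        items.append(base + i)
        # descend: the i-th subtree is a complete tree of rank (rank - i)
        # over the items that follow the i-th item in the order
        rel -= child_code(rank, i)
        base += i
        rank -= i
    return items
```

For the rank-5 tree of Figure 1 (items a, c, d, e, g), child_code(5, i) yields the codes 1, 17, 25, 29 and 31 for the root's children, and decode(5, 19) yields positions [2, 3, 4], i.e., the itemset cde.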

3.1 PP-tree Representations and Their Construction

A prefix-path tree has memory-based and disk-based representations. The in-memory representation of a PP-tree, denoted the PPm-tree, is a tree. Besides the pointers to its children, a node in a PPm-tree consists of an item-name, a count, and a node-link. The count registers the number of transactions represented by the portion of the path reaching from the root to this node. The disk representation of a PP-tree of rank n, denoted the PPd-tree, is represented as (T, F, I, ξ_m). Here, T is a heap for the tree structure, in which an element consists of a code and its count; F stores the n frequent 1-itemsets with their counts in order; I is an index indicating the ranges of codes in disk pages; and ξ_m is the minimum support used to build the PPd-tree on disk. This PPd-tree can be used for mining frequent itemsets with a minimum support ξ ≥ ξ_m.

The PPm-tree and PPd-tree for Example 1 (ξ_m = 2) are shown in Figure 2 (a) and (b), respectively.

Figure 2. The PP-tree representations for Example 1. (a) The memory representation (PPm-tree): root -> a:3, c:1; a:3 -> c:2 -> d:1 -> e:1; a:3 -> d:1 -> e:1 -> g:1; c:1 -> d:1 -> e:1 -> g:1. (b) The disk representation (PPd-tree): T = P1: 1:3, 2:2, 3:1; P2: 4:1, 10:1, 11:1; P3: 12:1, 17:1, 18:1; P4: 19:1, 20:1; index I: P1:[1,3], P2:[4,11], P3:[12,18], P4:[19,20]; F (frequency order): a:3, c:3, d:3, e:3, g:2

Recall that when ξ = 2, the frequent items are shown in the third column of Table 1, and are represented as shaded nodes in Figure 1. In the PPm-tree, i:s represents item:count. All node-links in the PPm-tree are initialized as null; those node-links are used during mining. In the PPd-tree, T is stored in four pages, where c:s represents code:count. In F, i:s represents item:count. As mentioned above, we can simply compute the item(s) a code represents; therefore, we do not need to store items in T. The index indicates that codes 1-3 are stored in page P1, and so forth. The minimum support ξ_m used to build this tree is 2. Given a transactional database

TDB and a minimum support (ξ_m), an initial PPm-tree can be constructed as follows. First, we scan the database to find all the frequent items; then, we scan the database again to construct the PPm-tree in memory. For each transaction, the infrequent items are removed; the remaining frequent items are sorted in the total order and inserted into the PPm-tree. The construction time for the PPm-tree is slightly less than that for the FP-tree, because it does not need to build node-links in the tree initially. A PPm-tree can be converted to a PPd-tree and maintained on disk continuously using our coding scheme. We omit the details here.
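As a concrete sketch of tasks 1 and 3 (all class and function names here are ours, and the transactions are those of Example 1), the following builds a PPm-tree in memory and then flattens it into the disk heap T of (code, count) pairs using the coding scheme; the result agrees with Figure 2 (b):

```python
from collections import Counter

class Node:
    def __init__(self):
        self.count, self.children = 0, {}

def build_ppm_tree(tdb, min_sup):
    """Two scans: find the frequent items, then insert each transaction's
    frequent items in frequency order (ties broken lexicographically)."""
    sup = Counter(i for t in tdb for i in set(t))
    freq = sorted((i for i in sup if sup[i] >= min_sup),
                  key=lambda i: (-sup[i], i))
    pos = {i: k + 1 for k, i in enumerate(freq)}  # 1-based positions
    root = Node()
    for t in tdb:
        node = root
        for i in sorted((i for i in set(t) if i in pos), key=pos.get):
            node = node.children.setdefault(i, Node())
            node.count += 1
    return root, freq

def child_code(n, i):
    # code of the i-th child of a complete PP-tree root of rank n
    return 1 + (1 << n) - (1 << (n - i + 1))

def ppm_to_heap(root, freq):
    """Flatten the PPm-tree into T: pre-order (code, count) pairs."""
    pos = {i: k + 1 for k, i in enumerate(freq)}
    heap = []
    def walk(node, code, rank, base):
        for item in sorted(node.children, key=pos.get):
            k = pos[item] - base            # child index in this subtree
            c = code + child_code(rank, k)  # absolute code on disk
            heap.append((c, node.children[item].count))
            walk(node.children[item], c, rank - k, pos[item])
    walk(root, 0, len(freq), 0)
    return heap

TDB = ["cdefgi", "acdem", "abdegk", "ach"]
root, freq = build_ppm_tree(TDB, 2)  # freq: a, c, d, e, g
```

Flattening this tree reproduces the heap of Figure 2 (b): (1,3), (2,2), (3,1), (4,1), (10,1), (11,1), (12,1), (17,1), (18,1), (19,1), (20,1).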

4. PP-Mine: Mining In-Memory

In this section, we propose a novel mining algorithm, called PP-Mine, using a PPm-tree. For simplicity, we use a prefix-path to identify a subtree. Here, the prefix-path is expressed in dot-notation concatenating items. For example, in Figure 3, a-prefix identifies the leftmost subtree containing a, and a.c-prefix identifies the second subtree rooted under a-prefix. In the following, we use i_j and i_k for single-item prefix-paths, and X, Y and Z for prefix-paths in general, which may be empty.

The PP-Mine algorithm is based on two properties. The first property restates the Apriori property.

Property 1. Given a PPm-tree of rank n for a set of frequent items (i_1, i_2, ..., i_n), where a total order (≺) is defined on I, a pattern represented by X.i_j.i_k-prefix can be frequent only if the pattern represented by X.i_j-prefix is frequent, where i_j ≺ i_k.

The second property specifies the subtrees that need to be mined for a pattern. It is built on two concepts: containment and coverage. Given a PPm-tree of rank n for a set of frequent items (i_1, i_2, ..., i_n), where a total order (≺) is defined on I, we say a prefix-path (representing a subtree), i_k.X-prefix, is contained in i_j.X-prefix, denoted i_k.X-prefix ⊑ i_j.X-prefix, if i_j ≺ i_k. In addition, X-prefix ⊑ Z-prefix if X-prefix ⊑ Y-prefix and Y-prefix ⊑ Z-prefix. The coverage of a prefix-path X-prefix is defined as all the Y-prefixes that contain X-prefix (including X-prefix itself).

Property 2. Given a PPm-tree of rank n for a set of frequent items (i_1, i_2, ..., i_n), where a total order (≺) is defined on I, mining a pattern represented by a prefix-path X-prefix is to mine the coverage of X-prefix.

For example, Figure 3 shows a PP-tree with four items {a, b, c, d}; assume they are in lexicographic order. The coverage of b.c.d-prefix includes b.c.d-prefix and a.b.c.d-prefix. It implies that we only need to check these two subtrees in order to determine whether the pattern {b, c, d} is frequent. Also, the coverage of c.d-prefix includes c.d-prefix, b.c.d-prefix, a.c.d-prefix and a.b.c.d-prefix. It implies that we only need to check these four subtrees in order to determine whether the pattern {c, d} is frequent.
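The coverage of a prefix can be enumerated directly from its definition: prepend every subset of the items that precede the prefix's head item in the total order. A small Python sketch (the function name is ours):

```python
from itertools import combinations

def coverage(prefix, order):
    """All prefixes (in dot-notation) whose subtrees must be examined to
    count `prefix`, per Property 2."""
    preceding = order[:order.index(prefix[0])]
    result = []
    for r in range(len(preceding) + 1):
        for head in combinations(preceding, r):
            result.append(".".join(list(head) + prefix))
    return result
```

On the four items of Figure 3, coverage(["b", "c", "d"], "abcd") yields the two subtrees b.c.d and a.b.c.d, and coverage(["c", "d"], "abcd") yields the four subtrees c.d, b.c.d, a.c.d and a.b.c.d, exactly as above.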

Based on the above two properties, we derive three main features: two pushing operations and a no-counting strategy.

- Push-down: Processing at a node in a PPm-tree is to check an itemset represented by the prefix-path from the root to the node in question. Pushing down to one of its children is to check the itemset with one more item. Property 1 states the Apriori heuristic. We implement it as a depth-first traversal that builds a sub header-table.

- Push-right: Mining an itemset requires identifying a minimal coverage in the PPm-tree to mine. Property 2 specifies such a minimal coverage for any prefix-path. Pushing right is a technique that helps to identify the coverage transitively, based on Property 2. In other words, the push-right strategy pushes a child to its corresponding sibling. We implement it as dynamic link-adjustment, which is best illustrated with an example. In Figure 3, after we have mined all the patterns in the leftmost subtree (a-prefix), we push-right a.b-prefix to the subtree b-prefix, push-right a.c-prefix to the subtree c-prefix, and push-right a.d-prefix to the subtree d-prefix. After mining the subtree (b-prefix), b.c-prefix is pushed to c, as well as a.b.c-prefix transitively. It is worth noting that the subtree a.c-prefix does not need to be pushed into the subtree b.c-prefix, because the former checks the itemset {a, c, d} excluding {b}, whereas the latter checks the itemset {b, c, d} excluding {a}.

- No-counting: Counting is done as a side-effect of pushing right (dynamic link-adjustment) in an accumulated manner. For example, after we push-right a.b-prefix to the subtree b-prefix, all the prefix-paths and their support counts for b-prefix are collected by the dynamic link-adjustment automatically. Therefore, the counting cost is minimized; no extra counting is needed.
Figure 3. A PPm-tree with four items

The PP-Mine algorithm is given in Algorithm 1. The procedure checks all the items in the header table H passed in (lines 1-10). In lines 2-3, we check whether the corresponding count (num) for a_i is greater than or equal to the minimum support, ξ; recall that counts are accumulated through pushing right. If num for a_i is greater than or equal to ξ, we output the pattern represented by the path. Then, at line 4, a sub header-table is created by removing all the entries before a_i (including a_i). Pushing down a_i (line 5) is outlined as follows: because the coverage of a_i-prefix has already been linked through the link fields in the header-table H (by the previous push-rights), all of a_i's k-th children on the link are pushed down (chained) into the corresponding k-th entry in the sub header-table (H_sub). Line 6 calls PP-Mine recursively to check (k+1)-itemsets if the length of the path is k. After returning, the sub header-table is deleted. Regardless of the minimum support, pushing right a_i (line 9) is described as follows: a) the coverage of a_i's left siblings is pushed right from a_i to its right siblings, and b) all of a_i's k-th children on the link are pushed right (chained) into the corresponding entries in the header table H.

Consider the mining process using the constructed PPm-tree (Figure 2 (a)). Here, the initial header table H includes all single items in the PPm-tree. Only the children of the root are linked from the header-table, and their counts are copied into the corresponding num fields in the header-table.

Algorithm 1

PP-Mine(X, H)
Input: A constructed PPm-tree identified by the prefix-path X, and the header table H.
1: for all a_i in the header table H do
2:   if a_i's support >= ξ then
3:     output X.a_i and a_i's support;
4:     generate a header-table, H_sub, for the subtree rooted at X.a_i, based on H;
5:     push-down(a_i);
6:     PP-Mine(X.a_i, H_sub);
7:     delete H_sub;
8:   end if
9:   push-right(a_i);
10: end for

Figure 4. An example: (a) the header table H_a and its PPm-tree rooted at a-prefix; (b) the header table H_a after mining the subtree rooted at a.c-prefix

The other links/nums in the header-table are initialized as null and zero. (The initial header table H is shown in Figure 4 (a).)

1. Call PP-Mine(root, H). Item a is the first to be processed, as the first entry in H. The support of a is 3; it is the exact total support for the item a, because a has no left siblings. Next, the subtree a-prefix is to be mined. The second header table, H_a, consists of all items in H except a. Only the children nodes of a are pushed down into H_a (Figure 4 (a)). In H_a, the counts of c and d are copied from the nodes a.c and a.d in the PPm-tree; their values are 2 and 1.

2. Call PP-Mine(a-prefix, H_a). Item c is picked up as the first entry in H_a. Because c's count (num) is 2 (frequent), we output a.c. Next, the subtree a.c-prefix is to be mined. A third header-table, denoted H_ac, is constructed for the subtree of a.c-prefix, in which d's num is 1 and d's link points to the node a.c.d. The other fields, for e and g, are set to zero/null.

3. Call PP-Mine(a.c-prefix, H_ac). Item d is picked up. Because d's num is 1 (infrequent), return.

4. Backtrack to the subtree a-prefix. Here, the header-table H_a is reset (Figure 4 (b)). First, the entry c in H_a becomes null (done). Second, a.c's child, d, is pushed right into d's entry in the header-table H_a. In other words, the link of the entry d in H_a is linked to the node a.d through the node a.c.d, and d's count (num) in H_a is accumulated to 2, which indicates that {a, d} occurs 2 times.

The correctness of PP-Mine can be shown briefly as follows. A PPm-tree of rank n has n subtrees. First, we mine the patterns in a subtree following a depth-first traversal order; all patterns in a subtree are mined (vertically). Second, the i-th subtree is mined by linking all the required subtrees among its left siblings (horizontally); linking to those subtrees is completed by the time the i-th subtree is to be mined. Third, the above holds for any subtree in the PPm-tree of rank n (recursively).
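The output of PP-Mine on Example 1 can be cross-checked with a brute-force enumeration; the sketch below is only a correctness oracle for small data, not the paper's algorithm:

```python
from itertools import combinations

def frequent_patterns(tdb, min_sup):
    """Enumerate all frequent itemsets by exhaustive candidate checking."""
    items = sorted({i for t in tdb for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = sum(1 for t in tdb if set(cand) <= set(t))
            if s >= min_sup:
                result[cand] = s
    return result

# Example 1's transactions
TDB = ["cdefgi", "acdem", "abdegk", "ach"]
pats = frequent_patterns(TDB, 2)
# {a, d} has support 2, as found in step 4 of the walkthrough above
```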

5. Efficient Loading

In this section, we assume that a PPd-tree is available on disk with ξ_m, and discuss how to process any of the three primary types of mining queries (frequent itemsets mining, frequent superitemsets mining and frequent subitemsets mining) with a threshold ξ and an itemset S. We emphasize two things: a) loading a sub PPd-tree from disk, and b) constructing a minimum PPm-tree in memory. Here, the minimum PPm-tree is a PPm-tree such that the mining query cannot be processed correctly if any node in the tree is removed. It is important to note that a) reduces the I/O cost of loading, while b) further reduces the CPU cost of mining in memory.

We studied three primary loading algorithms: PPF-load, PPS-load and PPB-load. These algorithms load subtrees of a PPd-tree from disk and construct a PPm-tree in memory. The PPF-load algorithm supports loading for frequent itemsets mining. The integration of PPS-load with PPF-load supports loading for frequent superitemsets mining. The integration of PPB-load with PPF-load supports loading for frequent subitemsets mining. Due to the space limit, we only present the PPF-load algorithm in this paper.

The loading algorithm, PPF-load, is outlined in Algorithm 2. Four parameters are passed: the code of a root node r of a prefix-path tree of rank n, the reading position d, and a new rank m. The new rank is computed from the given ξ as follows. Suppose the prefix-path tree on disk is based on frequency order; m is the total number of stored frequent 1-itemsets whose support is greater than or equal to ξ. The larger the given threshold ξ, the smaller the computed m, and therefore the smaller the PP-tree loaded into memory. The newly computed m reduces the number of page accesses.

Initially, we call PPF-load(0, m, n, 0), where the first zero is the code of the root of the PP-tree of rank n, and the second zero is the reading position of the PPd-tree on disk. Algorithm 2 is a recursive algorithm. The subtree represented by r has at most n children to be loaded. Lines 1-3 read the page at the reading position d, if it has not been read in. Lines 4-5 calculate the codes of the children nodes: c_j is the code relative to the subtree passed by the parameter r, and a_j is the code in terms of the whole PPd-tree on disk. Lines 7-12 jump to a page and find the next position to read if the code at the reading position is less than that of the j-th child (a_j). The readPage function uses the index to load a page containing at least one code greater than or equal to a_j. If the code at position d matches a_j (line 13), a new child node is constructed in memory, and PPF-load is called recursively. Note that d is passed by reference. The coding scheme and the index allow us to reduce the I/O cost to a minimum.

Algorithm 2 PPF-load(r, m, n, d)
Input: the code of the root (r), the required rank (m), the rank of the PPd-tree (n), and the current reading position on disk (d) (call by reference).
Output: a PPm-tree.
1: if page(d) does not exist in memory then
2:   readPage(d) using the index;
3: end if
4: let c_j be the code of the j-th child of r (of rank n);
5: a_j = c_j + r;
6: while j <= m do
7:   if code(d) < a_j then
8:     d = readPage(a_j);
9:     while code(d) < a_j do
10:      d++;
11:    end while
12:  end if
13:  if a_j = code(d) then
14:    build the new child node for the element at d in memory as the child of r;
15:    d++;
16:    PPF-load(a_j, m - j, n - j, d);
17:  end if
18: end while
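The effect of pushing the threshold into loading can be simulated on the flattened heap of Figure 2 (b): given the newly computed rank m, only codes whose decoded items all lie among the first m frequent items are materialized. This is a simplification with our own helper names; the loading algorithm proper works page by page on disk using the index:

```python
def child_code(n, i):
    # code of the i-th child of a complete PP-tree root of rank n
    return 1 + (1 << n) - (1 << (n - i + 1))

def decode(n, code):
    """Itemset represented by `code`, as 1-based positions in the order."""
    items, base, rank, rel = [], 0, n, code
    while rel > 0:
        i = 1
        while i < rank and rel >= child_code(rank, i + 1):
            i += 1
        items.append(base + i)
        rel -= child_code(rank, i)
        base += i
        rank -= i
    return items

def load_with_rank(heap, n, m):
    """Keep only heap entries whose itemsets use the first m items."""
    return [(c, cnt) for c, cnt in heap if max(decode(n, c)) <= m]

# The PPd-tree heap of Figure 2 (b), built over the items a, c, d, e, g
HEAP = [(1, 3), (2, 2), (3, 1), (4, 1), (10, 1), (11, 1), (12, 1),
        (17, 1), (18, 1), (19, 1), (20, 1)]
```

Mining with ξ = 3 keeps only a, c, d and e (m = 4), so the two codes involving g (12 and 20) are never materialized, reducing both what is read from disk and what is built in memory.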

6. Performance Study

We conducted performance studies to analyze the efficiency of PP-Mine in comparison with FP-growth [8] and H-Mine [9]. We did not compare PP-Mine with TreeProjection [2] because, as reported in [8], FP-growth outperforms TreeProjection.


All three algorithms were implemented using Visual C++ 6.0. The synthetic data sets were generated using the procedure described in [3]. All our experiments were conducted on a 900MHz Pentium PC with 128MB main memory and a 20GB hard disk, running Microsoft Windows NT.

Given a database TDB, we reemphasize the differences between PP-Mine and FP-growth/H-Mine for a mining task with a minimum support ξ.

- In our framework, a PPd-tree is possibly stored on disk with a materialization threshold ξ_m. For a mining task with a minimum support ξ ≥ ξ_m, with or without an itemset constraint (⊇ S or ⊆ S), a loading algorithm loads a subtree from disk and constructs a PPm-tree in memory. The conditions (ξ, ⊇ S, ⊆ S) are pushed into the loading. Once the prefix-path tree is constructed in memory, PP-Mine further mines the PPm-tree using ξ only. Otherwise, when the PPd-tree is not available or ξ < ξ_m, a PPm-tree is constructed in memory from the transactional database and then mined.

- Both FP-growth and H-Mine consist of two phases, construction and mining. In the construction phase, they scan TDB and construct an FP-tree/H-struct in memory using the minimum support ξ. In the mining phase, they conduct the mining task further using the minimum support ξ.

6.1 PP-Mine, FP-growth and H-Mine

In this section, we focus on the mining task with a minimum support only. We assume that no PPd-tree exists on disk, so that, for a given minimum support ξ, the PPm-tree, FP-tree and H-struct must all be constructed in memory from scratch. The construction time for both the H-struct and the PPm-tree is marginally better than that of FP-tree construction. To give a fair view of the three algorithms, we only compare their mining phases here.

We have conducted experimental studies using the same datasets as reported in [8]. We report our results using one of them, T25.I20.D100K with 10K items, as representative. In this dataset, the average transaction size and average maximal potentially frequent itemset size are set to 25 and 20, respectively, while the number of transactions in the dataset is 100K. There are exponentially many frequent itemsets in this dataset when the minimum support is small. The frequent patterns include long frequent itemsets as well as a large number of short frequent itemsets.

The scalability of the three algorithms, PP-Mine, FP-growth and H-Mine, is shown in Figure 5 (a). As the support threshold decreases, the number as well as the length of frequent itemsets increases, and a high overhead is incurred for handling projected transactions. FP-growth needs to construct conditional FP-trees repeatedly, using extra memory space. H-Mine needs to count every projected transaction. PP-Mine does not need to construct conditional trees and uses an accumulation technique, which avoids unnecessary counting. From Figure 5 (a), we can see that PP-Mine significantly outperforms FP-growth and H-Mine, and scales much better than both.
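The contrast between counting and accumulating can be illustrated with a toy prefix tree: once each item's nodes are linked from a header table, an item's support is the sum of a few node counts, with no rescan of the projected transactions. This simplified structure is our own sketch, not the exact PPm-tree or H-struct layout:

```python
from collections import defaultdict

class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

def build_prefix_tree(transactions):
    """Insert each (sorted) transaction into a prefix tree, keeping a
    header table that links every tree node labelled with each item."""
    root, header = Node(None), defaultdict(list)
    for t in transactions:
        node = root
        for item in sorted(t):
            if item not in node.children:
                child = Node(item)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
            node.count += 1  # shared prefixes accumulate here
    return root, header

def support(header, item):
    # Accumulate counts over the few tree nodes carrying `item`
    # instead of counting it in every projected transaction.
    return sum(n.count for n in header[item])

txns = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
root, header = build_prefix_tree(txns)
```

Because shared prefixes collapse into one node, the number of nodes summed per item is typically far smaller than the number of transactions containing it.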

[Figure 5 plots: runtime (seconds) vs. support threshold (%) for FP-growth, H-Mine and PP-Mine; (a) small thresholds, (b) large thresholds.]

Figure 5. Scalability

We also compared the mining phase of the three algorithms using a very dense dataset. The dataset was generated with 101 distinct items and 1K transactions. The average transaction size and average maximal potentially frequent itemset size are set to 40 and 10. When the minimum support is 40%, the number of frequent patterns is 65,540; when the minimum support becomes 10%, the number of frequent patterns is up to 3,453,240. As shown in Figure 5 (b), PP-Mine outperforms both FP-growth and H-Mine significantly, and has the best scalability as the threshold decreases.

For sparse datasets and small datasets, PP-Mine only marginally outperforms H-Mine, because both use a similar dynamic link adjusting technique, and the effectiveness of PP-Mine's accumulation (or non-counting) technique becomes weaker. Both PP-Mine and H-Mine outperform FP-growth.

6.2 PP-Mine Analysis

In this section, we further analyze the effectiveness of PP-Mine (and the PP-tree) in terms of loading/constructing/mining, using a very large tree. Such a large tree was generated using T40I10D100K. Its average transaction size and average maximal potentially frequent itemset size are 40 and 10, respectively. The number of distinct items generated was 59. We chose a minimum support of 50% to build a PPd-tree on disk for this dataset. This minimum support was chosen because the resulting number of frequent patterns, 138,272,944, is large enough for our testing purposes. The PPd-tree we built on disk has 51,982 nodes, which is considerably small.

[Figure 6 plot: time (seconds) vs. tree size (K) for FP-Build and PP-Load.]

Figure 6. Scalability with the tree size

Figure 6 compares the cost for FP-growth to construct an FP-tree in memory with the cost for PP-load to load a sub-PPd-tree and construct a rather small PPm-tree. The intention of the figure is to show the necessity of the PPd-tree. In Figure 6, we use tree size rather than threshold, because a threshold does not precisely indicate the tree size: different thresholds may end up with the same tree size. The (tree size, threshold) pairs used in this figure are (1,100, 90%), (3,943, 80%), (11,281, 77%), (28,474, 76%), (36,038, 75%) and (51,982, 50%). The tree sizes are the same for thresholds in the range 75-50%. Note that a smaller threshold results in a larger tree. As shown in Figure 6, the PP-load loading time is much smaller than the FP-growth constructing time (constructing an initial FP-tree in memory), as expected. Saving a PPd-tree on disk can significantly reduce both the time to construct a tree in memory and the memory space. It is worth noting that the loading time for a tree is proportional to the size of the loaded sub-PPd-tree. That suggests that, if we only need a small portion of the data, with the help of the PPd-tree, we do not need to load the whole dataset.
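Loading only the needed portion can be sketched as follows, assuming a hypothetical on-disk layout in which the tree is dumped in preorder as (depth, item, count) records; since a node's count never exceeds its parent's, a pruned node's entire subtree can be skipped without inspection:

```python
def load_pruned(records, min_count):
    """Load a prefix-path tree stored as a preorder list of
    (depth, item, count) records, keeping only nodes whose count
    meets min_count. When a node is pruned, every deeper record
    that follows it belongs to its subtree and is skipped, so the
    in-memory tree stays small. (Hypothetical record layout.)"""
    kept, pruned_depth = [], None
    for depth, item, count in records:
        if pruned_depth is not None and depth > pruned_depth:
            continue                 # inside a pruned subtree
        pruned_depth = None
        if count < min_count:
            pruned_depth = depth     # prune this whole subtree
            continue
        kept.append((depth, item, count))
    return kept

# preorder dump: a(5) -> b(4) -> c(1), a -> d(2); e(1) -> f(1)
disk = [(0, "a", 5), (1, "b", 4), (2, "c", 1), (1, "d", 2),
        (0, "e", 1), (1, "f", 1)]
```

With `min_count = 2`, the subtrees below c(1) and e(1) are never materialized, which mirrors why loading time tracks the size of the loaded subtree rather than the whole dataset.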

7. Conclusion

In this paper, we propose a new framework for mining frequent patterns from large transactional databases in a multiuser environment. With this framework, we propose a novel coded prefix-path tree with two representations, a memory-based prefix-path tree and a disk-based prefix-path tree. The coding scheme is based on a depth-first traversal order. Its unique features include easy identification of a node's location in a prefix-path tree and easy identification of the itemsets. The loading scheme makes the disk-based prefix-path tree node-link-free. With the help of a simple index, several new loading algorithms are proposed which can further push constraints into the loading process, and therefore reduce both I/O cost and CPU cost, because the prefix-path tree constructed in memory becomes smaller. In terms of mining in memory, the PP-Mine algorithm outperforms FP-growth significantly, because PP-Mine does not need to construct any conditional FP-trees for handling projected databases; instead, dynamic link adjusting is used. Both PP-Mine and H-Mine adopt the dynamic link adjusting technique. In addition, PP-Mine further minimizes counting cost: an accumulation technique is used, and therefore unnecessary counting is avoided. PP-Mine outperforms H-Mine significantly when the dataset is dense, and marginally when the dataset is sparse and small.

Acknowledgment: The work described in this paper was supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (CUHK4229/01E, DAG01/02.EG14).

References

[1] R. C. Agarwal, C. C. Aggarwal, and V. V. V. Prasad. Depth first generation of long patterns. In Proc. 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pages 108-118. ACM Press, 2001.
[2] R. C. Agarwal, C. C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent item sets. Journal of Parallel and Distributed Computing, 61:350-371, 2001.
[3] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In J. B. Bocca, M. Jarke, and C. Zaniolo, editors, Proc. 20th Int. Conf. on Very Large Data Bases (VLDB), pages 487-499. Morgan Kaufmann, 1994.
[4] R. J. Bayardo. Efficiently mining long patterns from databases. In 1998 ACM SIGMOD Int. Conference on Management of Data, pages 85-93. ACM Press, 1998.
[5] D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: A maximal frequent itemset algorithm for transactional databases. In 2001 Int. Conference on Data Engineering (ICDE), pages 443-452, 2001.
[6] B. Dunkel and N. Soparkar. Data organization and access for efficient data mining. In Proc. of 15th IEEE Int. Conf. on Data Engineering, pages 522-529, 1999.
[7] J. Han and J. Pei. Mining frequent patterns by pattern-growth: Methodology and implications. ACM SIGKDD Explorations, ACM Press, 2001.
[8] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In W. Chen, J. Naughton, and P. A. Bernstein, editors, 2000 ACM SIGMOD Int. Conference on Management of Data, pages 1-12. ACM Press, 2000.
[9] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang. H-Mine: Hyper-structure mining of frequent patterns in large databases. In 2001 IEEE Int. Conference on Data Mining (ICDM), 2001.
[10] P. Shenoy, J. R. Haritsa, S. Sudarshan, G. Bhalotia, M. Bawa, and D. Shah. Turbo-charging vertical mining of large databases. In 2000 ACM SIGMOD Int. Conference on Management of Data, pages 22-33. ACM Press, 2000.
[11] K. Wang, L. Tang, J. Han, and J. Liu. Top down FP-growth for association rule mining. In Proc. of 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2002.
[12] M. J. Zaki. Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, 12(2):372-390, 2000.