Verifying and Mining Frequent Patterns from Large Windows over Data - PDF document

Verifying and Mining Frequent Patterns from Large Windows over Data Streams

Barzan Mozafari, Hetal Thakkar, Carlo Zaniolo
Computer Science Department, University of California, Los Angeles, CA, USA
{barzan,hthakkar,zaniolo}@cs.ucla.edu

Abstract. Mining frequent itemsets from data streams has proved to be very difficult because of computational complexity and the need for real-time response. In this paper, we introduce a novel verification algorithm which we then use to improve the performance of monitoring and mining tasks for association rules. Thus, we propose a frequent itemset mining method for sliding windows, which is faster than the state-of-the-art methods. In fact, its running time is nearly constant with respect to the window size, which enables the mining of much larger windows than was possible before. The performance of other frequent itemset mining methods (including those on static data) can be improved likewise, by replacing their counting methods (e.g., those using hash trees) with our verification algorithm.

I. INTRODUCTION

Data streams have received much attention in recent years. Furthermore, interest in online stream mining has also dramatically increased [1], [2], [3], [4], [5], [6]. This interest is largely due to the growing set of streaming applications, such as credit card fraud detection and market basket data analysis, where data mining plays a critical role. In this paper, we focus on the problem of mining frequent itemsets on large windows defined over such data streams. This problem appears in many of the applications mentioned above in different forms.

Mining frequent itemsets for association rules has been studied extensively since it was first introduced by Agrawal et al. [1]. Since then many faster algorithms have been proposed [2], [3], [6], [7]. Furthermore, this problem appears as a subproblem in many other mining contexts such as finding sequential patterns [7], [3], clustering [8], and classification [9], [10].

The recent growth of interest in data stream systems and data stream mining is due to the fact that, in many applications, data must be processed continuously, either because of real-time requirements or simply because the stream is too massive for a store-now & process-later approach. However, mining of data streams brings many challenges not encountered in database mining, because of the real-time response requirement and the presence of bursty arrivals and concept shifts (i.e., changes in the statistical properties of data). In order to cope with such challenges, the continuous stream is often divided into windows, thus reducing the size of the data that need to be stored and mined. This allows detecting concept drifts/shifts by monitoring changes between subsequent windows. Even so, association rule mining over such large windows remains a computationally challenging problem requiring algorithms that are faster and lighter than those used on stored data. Thus, algorithms that make multiple scans of the data should be avoided in favor of single-scan, incremental algorithms. In particular, the technique of partitioning large windows into slides (a.k.a. panes) to support incremental computations has proved very valuable in DSMS [11], [12] and will be exploited in our approach.

We will also make use of the following observation: in real-world applications there is an obvious difference between the problem of (i) finding new association rules, and (ii) verifying the continuous validity of existing rules.

Normally, finding new rules requires both machines and domain experts, since the size of the data is too large to be mined by a person and the importance of new rules with respect to the application can only be validated by domain experts. In this situation, delays by the mining algorithms in detecting new frequent itemsets are also acceptable, provided that they add little to the typical time required by the domain experts to validate new rules. Thus, we propose an algorithm for incremental mining of frequent itemsets that compares favorably with existing algorithms when real-time response is required. Furthermore, the performance of the proposed algorithm improves when small delays are acceptable.

Although a real-time introduction of new association rules is neither sensible nor feasible, the on-line verification of old rules is highly desirable in most application scenarios: we need to determine immediately when old rules no longer hold to stop them from pestering customers with improper recommendations. Therefore, in this paper we propose fast algorithms, called verifiers henceforth, for verifying the frequency of previously frequent itemsets over newly arriving windows. Toward this goal, we use sliding windows, whereby a large window is partitioned into smaller panes [11] and a response is returned promptly at the end of each slide (rather than at the end of each large window). This also leads to a more efficient computation since the frequency of the itemsets in the whole window can be computed incrementally by counting itemsets in the new incoming (and old expiring) panes. To make this counting efficient, we introduce a novel concept of conditional counting, a.k.a. verification, which can be realized efficiently by the proposed verifiers. Thus, the proposed incremental algorithm for finding frequent
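
The pane-based incremental counting described above can be illustrated with a minimal Python sketch. It is not the verifier proposed in this paper: the per-pane counting here is a naive subset test, and the names `count_in_pane`, `SlidingWindowVerifier`, `panes_per_window`, and `min_support` are hypothetical, chosen only for illustration. What the sketch does show is how the window-level count of each watched itemset can be updated by adding the counts of the incoming pane and subtracting those of the expiring pane, so that an answer is available at the end of every slide.

```python
from collections import Counter, deque


def count_in_pane(pane, itemsets):
    """Count, for each watched itemset, the transactions in `pane` that contain it.

    Naive stand-in for a real verifier: one subset test per (transaction, itemset).
    """
    counts = Counter()
    for transaction in pane:
        items = set(transaction)
        for itemset in itemsets:
            if itemset <= items:
                counts[itemset] += 1
    return counts


class SlidingWindowVerifier:
    """Tracks the support of a fixed set of itemsets over the last k panes (hypothetical helper)."""

    def __init__(self, itemsets, panes_per_window, min_support):
        self.itemsets = [frozenset(s) for s in itemsets]
        self.panes_per_window = panes_per_window
        self.min_support = min_support  # fraction of the window's transactions
        self.panes = deque()            # (pane_size, pane_counts) per live pane
        self.window_counts = Counter()  # itemset -> count over the whole window
        self.window_size = 0            # transactions currently in the window

    def add_pane(self, pane):
        """Slide the window by one pane; return itemset -> still frequent?"""
        pane = list(pane)
        counts = count_in_pane(pane, self.itemsets)
        self.panes.append((len(pane), counts))
        self.window_counts.update(counts)        # add the incoming pane
        self.window_size += len(pane)
        if len(self.panes) > self.panes_per_window:
            old_size, old_counts = self.panes.popleft()
            self.window_counts.subtract(old_counts)  # remove the expiring pane
            self.window_size -= old_size
        threshold = self.min_support * self.window_size
        return {s: self.window_counts[s] >= threshold for s in self.itemsets}
```

A caller would invoke `add_pane` once per slide and act on any itemset whose value turns `False`. Only the incoming pane is scanned on each call; the expiring pane is handled by subtracting its stored counts, which is the incremental property the text relies on.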
