Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding - PDF document

Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window

Yun Chi ∗, Haixun Wang †, Philip S. Yu †, Richard R. Muntz ∗
∗ Department of Computer Science, University of California, Los Angeles, CA 90095
† IBM Thomas J. Watson Research Center, Hawthorne, NY 10532
ychi@cs.ucla.edu, {haixun,psyu}@us.ibm.com, muntz@cs.ucla.edu

Abstract

This paper considers the problem of mining closed frequent itemsets over a sliding window using limited memory space. We design a synopsis data structure to monitor transactions in the sliding window so that we can output the current closed frequent itemsets at any time. Due to time and memory constraints, the synopsis data structure cannot monitor all possible itemsets. However, monitoring only frequent itemsets will make it impossible to detect new itemsets when they become frequent. In this paper, we introduce a compact data structure, the closed enumeration tree (CET), to maintain a dynamically selected set of itemsets over a sliding window. The selected itemsets consist of a boundary between closed frequent itemsets and the rest of the itemsets. Concept drifts in a data stream are reflected by boundary movements in the CET. In other words, a status change of any itemset (e.g., from non-frequent to frequent) must occur through the boundary. Because the boundary is relatively stable, the cost of mining closed frequent itemsets over a sliding window is dramatically reduced to that of mining transactions that can possibly cause boundary movements in the CET. Our experiments show that our algorithm performs much better than previous approaches.

1 Introduction

Mining data streams for knowledge discovery is important to many applications, such as fraud detection, intrusion detection, trend learning, etc. In this paper, we consider the problem of mining closed frequent itemsets on data streams.

Mining frequent itemsets on static datasets has been studied extensively. However, data streams have posed new challenges. First, data streams are continuous, high-speed, and unbounded. It is impossible to mine association rules from them using algorithms that require multiple scans. Second, the data distribution in streams is usually changing with time, and very often people are interested in the most recent patterns.

It is thus of great interest to mine itemsets that are currently frequent. One approach is to always focus on frequent itemsets in the most recent window. A similar effect can be achieved by exponentially discounting old itemsets.

For the window-based approach, we can come up with two naive methods:

1. Regenerate frequent itemsets from the entire window whenever a new transaction enters the window or an old transaction leaves it.

2. Store every itemset, frequent or not, in a traditional data structure such as the prefix tree, and update its support whenever a new transaction enters the window or an old transaction leaves it.

Clearly, method 1 is not efficient. In fact, as long as the window size is reasonable and the concept drift in the stream is not too dramatic, most itemsets do not change their status (from frequent to non-frequent or from non-frequent to frequent) often. Thus, instead of regenerating all frequent itemsets every time from the entire window, we shall adopt an incremental approach.

Method 2 is incremental. However, its space requirement makes it infeasible in practice. The prefix tree [1] is often used for mining association rules on static data sets. In a prefix tree, each node n_I represents an itemset I, and each child node of n_I represents an itemset obtained by adding a new item to I. The total number of nodes is exponential. Due to memory constraints, we cannot keep a prefix tree in memory, and disk-based structures will make real-time updates costly.

In view of these challenges, we focus on a dynamically selected set of itemsets that are i) informative enough to answer, at any time, queries such as "what are the (closed) frequent itemsets in the current window", and at the same time ii) small enough so that they can be easily maintained in memory and updated in real time.

The problem is, of course, which itemsets we should select for this purpose. To reduce memory usage, we are tempted to select, for example, nothing but frequent (or even closed frequent) itemsets. However, if the frequency of a non-frequent itemset is not monitored, we will never know when it becomes frequent. A naive approach is to monitor all itemsets whose support is above a reduced threshold minsup − ε, so that we will not miss itemsets whose current support is within ε of minsup when they become frequent. This approach is evidently not general enough.

In this paper, we design a synopsis data structure to keep track of the boundary between closed frequent itemsets and the rest of the itemsets. Concept drifts in a data stream are reflected by boundary movements in the data structure. In

∗ The work of these two authors was partly supported by NSF under Grant Nos. 0086116, 0085773, and 9817773.
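
To make naive method 1 from the introduction concrete, the following is a minimal sketch of the "regenerate from the entire window" baseline; it is not the CET algorithm this paper proposes. It assumes the standard definition of a closed itemset (a frequent itemset with no proper superset of equal support), which the excerpt above does not spell out, and an absolute support count for minsup. The names `frequent_itemsets`, `closed_itemsets`, and `mine_stream` are hypothetical and chosen only for illustration.

```python
from collections import deque
from itertools import combinations


def frequent_itemsets(window, minsup):
    """Count every itemset occurring in the window; keep those with support >= minsup."""
    counts = {}
    for transaction in window:
        items = sorted(transaction)
        # Enumerate all non-empty subsets of the transaction (exponential in its length).
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] = counts.get(subset, 0) + 1
    return {itemset: c for itemset, c in counts.items() if c >= minsup}


def closed_itemsets(frequent):
    """Keep frequent itemsets that have no proper superset with the same support."""
    closed = {}
    for itemset, support in frequent.items():
        members = set(itemset)
        has_equal_superset = any(
            len(other) > len(itemset)
            and members.issubset(other)
            and frequent[other] == support
            for other in frequent
        )
        if not has_equal_superset:
            closed[itemset] = support
    return closed


def mine_stream(stream, window_size, minsup):
    """Naive method 1: re-mine the whole window each time it slides."""
    window = deque(maxlen=window_size)  # the oldest transaction drops out automatically
    for transaction in stream:
        window.append(frozenset(transaction))
        if len(window) == window_size:
            yield closed_itemsets(frequent_itemsets(window, minsup))


# Assumed toy stream, window of 3 transactions, minsup of 2 occurrences.
stream = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"b", "d"}]
results = list(mine_stream(stream, window_size=3, minsup=2))
# results[0] == {("a", "b"): 3, ("a", "b", "c"): 2}
# results[1] == {("b",): 3, ("a", "b"): 2}
```

Every slide repeats the full enumeration even though only one transaction entered and one left, which is the inefficiency that motivates an incremental approach.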
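
The statement that the total number of prefix-tree nodes is exponential can be illustrated with a short count. This is a worst-case sketch under the assumption that the tree stores one node for every possible itemset over n distinct items (the symbol n is introduced here and does not appear in the original text), with the root standing for the empty itemset.

```latex
\[
  \#\text{nodes} \;=\; \sum_{k=0}^{n} \binom{n}{k} \;=\; 2^{n},
  \qquad \text{e.g. } n = 30 \;\Rightarrow\; 2^{30} = 1{,}073{,}741{,}824 \approx 1.07 \times 10^{9}.
\]
```

Even a modest item vocabulary therefore puts the complete tree well beyond what can be held in memory, which is consistent with the argument against method 2.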
