IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 22, NO. 1, JANUARY 2010, p. 120

The Dynamic Bloom Filters

Deke Guo, Member, IEEE, Jie Wu, Fellow, IEEE, Honghui Chen, Ye Yuan, and Xueshan Luo

Abstract: A Bloom filter is an effective, space-efficient data structure for concisely representing a set and supporting approximate membership queries. Traditionally, the Bloom filter and its variants just focus on how to represent a static set and decrease the false positive probability to a sufficiently low level. By investigating mainstream applications based on the Bloom filter, we reveal that dynamic data sets are more common and important than static sets. However, existing variants of the Bloom filter cannot support dynamic data sets well. To address this issue, we propose dynamic Bloom filters to represent dynamic sets as well as static sets, and design the necessary item insertion, membership query, item deletion, and filter union algorithms. The dynamic Bloom filter can control the false positive probability at a low level by expanding its capacity as the set cardinality increases. Through comprehensive mathematical analysis, we show that the dynamic Bloom filter uses less expected memory than the Bloom filter when representing dynamic sets with an upper bound on set cardinality, and also that the dynamic Bloom filter is more stable than the Bloom filter due to infrequent reconstruction when addressing dynamic sets without an upper bound on set cardinality. Moreover, the analysis results hold in stand-alone applications as well as distributed applications.

Index Terms: Bloom filters, dynamic Bloom filters, information representation.

1 INTRODUCTION

Information representation and processing of membership queries are two associated issues that encompass the core problems in many computer applications. Representation means organizing information based on a given format and mechanism such that information is operable by a corresponding method. The processing of membership queries involves making decisions based on whether an item with a specific attribute value belongs to a given set. A standard Bloom filter (SBF) is a space-efficient data structure for representing a set and answering membership queries within a constant delay [1]. The space efficiency is achieved at the cost of false positives in membership queries, and for many applications, the space savings outweigh this drawback when the probability of an error is sufficiently low.

The SBF has been extensively used in many database applications [2], for example, the Bloom join [3]. Recently, it has started receiving more widespread attention in networking literature [4]. An SBF can be used as a summarizing technique to aid global collaboration in peer-to-peer (P2P) networks [5], [6], [7], support probabilistic algorithms for routing and locating resources [8], [9], [10], [11], and share Web cache information [12]. In addition, SBFs have great potential for representing a set in main memory [13] in stand-alone applications. For example, SBFs have been used to provide a probabilistic approach for explicit state model checking of finite-state transition systems [13], to summarize the contents of stream data in memory [14], [15], to store the states of flows in the on-chip memory at networking devices [16], and to store the statistical values of tokens to speed up the statistical-based Bayesian filters [17].

The SBF has been modified and improved from different aspects for a variety of specific problems. The most important variations include compressed Bloom filters [18], counting Bloom filters [12], distance-sensitive Bloom filters [19], Bloom filters with two hash functions [20], space-code Bloom filters [21], spectral Bloom filters [22], generalized Bloom filters [23], Bloomier filters [24], and Bloom filters based on partitioned hashing [25]. Compressed Bloom filters can improve performance in terms of bandwidth saving when an SBF is passed on as a message. Counting Bloom filters deal mainly with the item deletion operation. Distance-sensitive Bloom filters, using locality-sensitive hash functions, can answer queries of the form, "Is x close to an item of S?" Bloom filters with two hash functions use a standard technique in hashing to simplify the implementation of SBFs significantly. Space-code Bloom filters and spectral Bloom filters focus on multisets, which support queries of the form, "How many occurrences of an item are there in a given multiset?" The SBF and its mainstream variations are suitable for representing static sets whose cardinality is known prior to design and deployment.

Although the SBF and its variations have found suitable applications in different fields, the following three obstacles still lack suitable and practical solutions:

1. For stand-alone applications that know the upper bound on set cardinality for a dynamic set in advance, a large number of bits are allocated for an SBF to represent all possible items of the dynamic set at the outset. This approach diminishes the space

D. Guo, H. Chen, and X. Luo are with the Key Laboratory of C4ISR Technology, National University of Defense Technology, Changsha 410073, China. E-mail: {guodeke, chh0808}@gmail.com, xsluo@nudt.edu.cn.

J. Wu is with the Department of Computer and Information Sciences, Temple University, 1805 N. Broad Street, Philadelphia, PA 19122. E-mail: jiewu@temple.edu.

Y. Yuan is with the Institute of Computer Systems, Northeastern University, 132#, Shen Yang City, Liao Ning Province 110004, China. E-mail: linuxyy@gmail.com.

Manuscript received 26 May 2007; revised 19 July 2008; accepted 10 Feb. 2009; published online 18 Feb. 2009.
Recommended for acceptance by D. Gunopulos.
For information on obtaining reprints of this article, please send e-mail to: tkde@computer.org, and reference IEEECS Log Number TKDE-2007-05-0239.
Digital Object Identifier no. 10.1109/TKDE.2009.57.
1041-4347/10/$26.00 © 2010 IEEE. Published by the IEEE Computer Society.
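To make the data structures discussed on this page concrete, here is a minimal Python sketch under stated assumptions. StandardBloomFilter illustrates the SBF idea of m bits probed by k hash functions; the salted SHA-256 hashing and all parameter names are illustrative choices, not taken from the paper. GrowingBloomFilter is a hypothetical reading of the abstract's remark that a dynamic Bloom filter expands its capacity as the set grows: it simply appends a fresh fixed-size filter once a chosen per-filter capacity is reached. It should not be read as the authors' actual algorithm, which this page does not show.

```python
import hashlib


class StandardBloomFilter:
    """Minimal standard Bloom filter (SBF): m bits, k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # all bits start at 0

    def _positions(self, item: str):
        # Assumption: derive k positions by salting SHA-256 with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True if every probed bit is set; this may be a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class GrowingBloomFilter:
    """Hypothetical sketch: a list of SBFs that grows when the active one fills."""

    def __init__(self, m: int, k: int, capacity: int) -> None:
        self.m, self.k, self.capacity = m, k, capacity
        self.filters = [StandardBloomFilter(m, k)]
        self.active_count = 0  # items inserted into the newest filter

    def add(self, item: str) -> None:
        if self.active_count >= self.capacity:
            # Active filter reached its assumed capacity: start a new one.
            self.filters.append(StandardBloomFilter(self.m, self.k))
            self.active_count = 0
        self.filters[-1].add(item)
        self.active_count += 1

    def __contains__(self, item: str) -> bool:
        # An item is reported present if any underlying filter reports it.
        return any(item in f for f in self.filters)


sbf = StandardBloomFilter(m=1024, k=4)
for word in ("alpha", "beta", "gamma"):
    sbf.add(word)

dbf = GrowingBloomFilter(m=256, k=3, capacity=2)
items = ["a", "b", "c", "d", "e"]
for it in items:
    dbf.add(it)
```

Inserted items are always reported as present (no false negatives), while a query for an item that was never inserted can occasionally return True. In the growing variant, every additional filter adds its own chance of a false positive to a query, which is the kind of trade-off the abstract says the paper analyzes.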
