Sampling and Reconstruction Using Bloom Filters
Neha Sengupta (1), Amitabha Bagchi (1), Srikanta Bedathur (2), Maya Ramanath (1)
(1) IIT Delhi, (2) IBM India Research Lab

Bloom Filters
● Compact storage
● m bits, k hash functions
● Set Membership (Query: W)
● Union
● Intersection
● Sampling
● Reconstruction
Example: S = { A, B, C } stored as the bit vector 0 1 1 1 0 1 0 1 0 1 1 0
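The operations listed on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `BloomFilter`, the salted-MD5 hashing scheme, and the choice m = 64 are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """A Bloom filter with m bits and k hash functions, stored as a Python int."""

    def __init__(self, m, k, bits=0):
        self.m, self.k, self.bits = m, k, bits

    def _positions(self, item):
        # Assumed hashing scheme: k salted MD5 digests reduced modulo m.
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        # True for every inserted item; may also be True for others (false positives).
        return all((self.bits >> p) & 1 for p in self._positions(item))

    def union(self, other):
        # Bitwise OR represents the union of the two underlying sets.
        return BloomFilter(self.m, self.k, self.bits | other.bits)

    def intersection(self, other):
        # Bitwise AND over-approximates the intersection of the two sets.
        return BloomFilter(self.m, self.k, self.bits & other.bits)


s = BloomFilter(m=64, k=2)
for x in ("A", "B", "C"):
    s.add(x)
print("A" in s, format(s.bits, "064b"))
```

Union and intersection only make sense between filters that share m, k, and the hash functions, which is why both methods reuse the parameters of `self`.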

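Sampling from a Bloom filter
Two Approaches:
● Dictionary Attack: Slow
● Hash Sample: Non-uniform Sample, needs an Invertible hash function
[Figure: Dictionary Attack runs Membership tests over the Namespace against the Bloom filter 0100010001011110 and draws the Sample from the elements that pass. Hash Sample inverts a Set Bit of the Bloom filter to obtain a Candidate Set, applies Membership, and draws the Sample.]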
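A rough sketch of the Dictionary Attack approach follows. The helper names (`positions`, `build`, `member`, `dictionary_attack_sample`), the salted-MD5 hashes, and the parameters M = 1000, m = 256, k = 2 are all assumptions chosen for illustration. The cost is one membership test per namespace element, which is why the slide calls it slow.

```python
import hashlib
import random


def positions(x, m, k):
    """k bit positions for x; salted MD5 is an assumed stand-in for the real hashes."""
    return [int.from_bytes(hashlib.md5(f"{i}:{x}".encode()).digest(), "big") % m
            for i in range(k)]


def build(items, m, k):
    """Return the Bloom filter of `items` as an integer bit vector."""
    bits = 0
    for x in items:
        for p in positions(x, m, k):
            bits |= 1 << p
    return bits


def member(x, bits, m, k):
    return all((bits >> p) & 1 for p in positions(x, m, k))


def dictionary_attack_sample(bits, namespace, m, k, rng=random):
    """Test every namespace element, then pick uniformly among those that pass."""
    candidates = [x for x in namespace if member(x, bits, m, k)]
    return (rng.choice(candidates) if candidates else None), candidates


M, m, k = 1000, 256, 2
S = {17, 250, 999}
b = build(S, m, k)
sample, candidates = dictionary_attack_sample(b, range(M), m, k, random.Random(0))
print(sample, len(candidates))
```

The candidate list contains every element of S plus any false positives of the filter, so the sample is uniform over that list rather than over S itself.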

BloomSampleTree: Sampling from Bloom filters
● Set S contains elements in range [0,15]: M = 16
● S stored in Bloom filter b with 10 bits: m = 10
● Bloom filters use 2 hash functions: k = 2

BloomSampleTree bT, created using (M = 16, m = 10, k = 2):
● Root: (0..15)
● (0..7): 1111111010, (8..15): 1111111110
● (0..3): 1101100010, (4..7): 0110011000, (8..11): 1100110010, (12..15): 0111011100

Bloom filters b1, b2, b3 storing sets S1, S2, S3 respectively:
● Bloom filter b1 = 0100111000, S1 = {1,12}
● Bloom filter b2 = 0111000000, S2 = {4,6}
● Bloom filter b3 = 1000010100, S3 = {3,7}

bT can be used to sample from all 3 sets.
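One way to build such a tree is sketched below, under assumptions: each leaf stores a Bloom filter of every element in its range, each internal node stores the bitwise OR of its children, and the hash functions are salted MD5. The function names and the dictionary-based node layout are hypothetical, and the printed bit strings will differ from the slide because the slide's hash functions are not given.

```python
import hashlib


def positions(x, m, k):
    """k bit positions for x; salted MD5 is an assumed stand-in for the real hashes."""
    return [int.from_bytes(hashlib.md5(f"{i}:{x}".encode()).digest(), "big") % m
            for i in range(k)]


def bloom(items, m, k):
    bits = 0
    for x in items:
        for p in positions(x, m, k):
            bits |= 1 << p
    return bits


def build_tree(lo, hi, leaf_size, m, k):
    """Build the node covering the inclusive range (lo..hi)."""
    node = {"range": (lo, hi), "children": []}
    if hi - lo + 1 <= leaf_size:
        # Leaf: Bloom filter of every element in the range.
        node["bits"] = bloom(range(lo, hi + 1), m, k)
    else:
        mid = (lo + hi) // 2
        node["children"] = [build_tree(lo, mid, leaf_size, m, k),
                            build_tree(mid + 1, hi, leaf_size, m, k)]
        # Internal node: union (bitwise OR) of the two children.
        node["bits"] = node["children"][0]["bits"] | node["children"][1]["bits"]
    return node


def leaves(node):
    if not node["children"]:
        return [node]
    return [leaf for c in node["children"] for leaf in leaves(c)]


bT = build_tree(0, 15, leaf_size=4, m=10, k=2)
for leaf in leaves(bT):
    print(leaf["range"], format(leaf["bits"], "010b"))
```

Because the tree depends only on (M, m, k) and the hash functions, the same bT serves every Bloom filter built with those parameters, which matches the slide's remark that it can be used for all three sets.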

BloomSampleTree Example: Sampling from Bloom filter b1 using BloomSampleTree bT
Original Set S1 = {1, 12}, b1 = 0100111000
Start at the root (0..15) and intersect b1 with each child ("." denotes bitwise AND):
● (0..7): 0100111000 . 1111111010 = 0100111000, p L = 0.5
● (8..15): 0100111000 . 1111111110 = 0100111000, p R = 0.5
Leaves: (0..3) 1101100010, (4..7) 0110011000, (8..11) 1100110010, (12..15) 0111011100

BloomSampleTree Example: Sampling from Bloom filter b1 using BloomSampleTree bT
Original Set S1 = {1, 12}, b1 = 0100111000
Chosen subtree: (0..7). Intersect b1 with each of its children:
● (0..3): 0100111000 . 1101100010 = 0100100000, p L = 0.52
● (4..7): 0100111000 . 0110011000 = 0100011000, p R = 0.48

BloomSampleTree Example: Sampling from Bloom filter b1 using BloomSampleTree bT
Original Set S1 = {1, 12}, b1 = 0100111000
Chosen leaf: (4..7), 0110011000
● Membership(4, b1) = false
● Membership(5, b1) = false
● Membership(6, b1) = false
● Membership(7, b1) = false
This path was a false positive.

BloomSampleTree Example: Sampling from Bloom filter b1 using BloomSampleTree bT
Original Set S1 = {1, 12}, b1 = 0100111000
Chosen leaf: (0..3), 1101100010
● Membership(0, b1) = false
● Membership(1, b1) = true
● Membership(2, b1) = false
● Membership(3, b1) = false
Sample = 1
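The walk-through above can be sketched as a recursive descent. Several details here are assumptions rather than the authors' method: the branch weights are the number of set bits in the intersection (the slides give probabilities such as 0.52 and 0.48 without stating the estimator), a false positive path triggers backtracking to the sibling, and the hashes are salted MD5, so the bit patterns differ from the slides.

```python
import hashlib
import random


def positions(x, m, k):
    return [int.from_bytes(hashlib.md5(f"{i}:{x}".encode()).digest(), "big") % m
            for i in range(k)]


def bloom(items, m, k):
    bits = 0
    for x in items:
        for p in positions(x, m, k):
            bits |= 1 << p
    return bits


def member(x, bits, m, k):
    return all((bits >> p) & 1 for p in positions(x, m, k))


def build_tree(lo, hi, leaf_size, m, k):
    node = {"range": (lo, hi), "children": []}
    if hi - lo + 1 <= leaf_size:
        node["bits"] = bloom(range(lo, hi + 1), m, k)
    else:
        mid = (lo + hi) // 2
        node["children"] = [build_tree(lo, mid, leaf_size, m, k),
                            build_tree(mid + 1, hi, leaf_size, m, k)]
        node["bits"] = node["children"][0]["bits"] | node["children"][1]["bits"]
    return node


def sample(node, b, m, k, rng):
    """Return one element that passes Membership(x, b), or None on a dead end."""
    lo, hi = node["range"]
    if not node["children"]:
        hits = [x for x in range(lo, hi + 1) if member(x, b, m, k)]
        return rng.choice(hits) if hits else None  # None: false positive path
    # Assumed weighting: set bits in (child filter AND b); empty intersections are skipped.
    options = [(c, bin(c["bits"] & b).count("1")) for c in node["children"]]
    options = [(c, w) for c, w in options if w > 0]
    while options:
        i = rng.choices(range(len(options)), weights=[w for _, w in options])[0]
        child, _ = options.pop(i)
        x = sample(child, b, m, k, rng)
        if x is not None:
            return x
    return None


M, m, k = 16, 10, 2
bT = build_tree(0, M - 1, 4, m, k)
b1 = bloom({1, 12}, m, k)
rng = random.Random(1)
draws = [sample(bT, b1, m, k, rng) for _ in range(200)]
print(sorted(set(draws)))
```

With only 10 bits the filter has false positives, so the draws can include elements outside S1; every draw does pass the membership test against b1.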

BloomSampleTree - Sampling
[Figure: a BloomSampleTree with numbered nodes over the range (1 ... 10M). Legend: False Positive Path, Empty Intersection, Potential Path, True Path, Subtree pruned from search, Subtree not visited at all.]

BloomSampleTree - Sampling
[Plots for the experimental setting: Sample Quality; Running Time]

BloomSampleTree - Sampling
Algorithms:
● Dictionary Attack (DA)
● BloomSampleTree (BST)
Setting:
● M = 10^7
● k = 3
● m increases with desired Accuracy
● Size of set S, |S| varies from 100 to 50K
[Plots: Simple Hash Functions; MD5/Murmur Hash Functions]
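The slides do not define Accuracy, so the following is only a sketch under the assumption that it tracks the standard Bloom filter false positive approximation p ≈ (1 - e^(-kn/m))^k for n stored elements. Solving for m gives m = -kn / ln(1 - p^(1/k)), which shows why m must grow as the target p shrinks. The function names are hypothetical.

```python
import math


def false_positive_rate(n, m, k):
    """Standard approximation of the false positive rate of a Bloom filter."""
    return (1.0 - math.exp(-k * n / m)) ** k


def bits_for_target(n, k, p):
    """Smallest m whose approximate false positive rate is at most p, for fixed k."""
    return math.ceil(-k * n / math.log(1.0 - p ** (1.0 / k)))


# k = 3 and the two extremes of |S| come from the setting above; the targets p are assumed.
for n in (100, 50_000):
    for p in (0.1, 0.01):
        m = bits_for_target(n, k=3, p=p)
        print(n, p, m, round(false_positive_rate(n, m, 3), 4))
```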

BloomSampleTree - Reconstruction
● Similar to Sampling
● Follow all positive paths
● Multi-threaded
● Challenge:
− Bloom filter intersections are almost never empty
− Every path is a “true path”
− Follow a path only if the size of the intersection exceeds a threshold.
● False positives of set S become part of the reconstructed set S’
Example: BloomSampleTree bT, S = { 4, 6 }, Bloom filter b = 0111000000, Reconstructed Set: S’ = { 4, 6, 13 }
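A single-threaded sketch of this reconstruction is shown below. The threshold is modelled as a hypothetical `min_bits` parameter that counts set bits in the intersection; the slides do not say how intersection size is measured, so this is an assumption, as are the salted-MD5 hashes and the helper names.

```python
import hashlib


def positions(x, m, k):
    return [int.from_bytes(hashlib.md5(f"{i}:{x}".encode()).digest(), "big") % m
            for i in range(k)]


def bloom(items, m, k):
    bits = 0
    for x in items:
        for p in positions(x, m, k):
            bits |= 1 << p
    return bits


def member(x, bits, m, k):
    return all((bits >> p) & 1 for p in positions(x, m, k))


def build_tree(lo, hi, leaf_size, m, k):
    node = {"range": (lo, hi), "children": []}
    if hi - lo + 1 <= leaf_size:
        node["bits"] = bloom(range(lo, hi + 1), m, k)
    else:
        mid = (lo + hi) // 2
        node["children"] = [build_tree(lo, mid, leaf_size, m, k),
                            build_tree(mid + 1, hi, leaf_size, m, k)]
        node["bits"] = node["children"][0]["bits"] | node["children"][1]["bits"]
    return node


def reconstruct(node, b, m, k, min_bits=1):
    """Follow every path whose intersection with b has at least min_bits set bits."""
    if bin(node["bits"] & b).count("1") < min_bits:
        return []  # pruned: intersection too small
    lo, hi = node["range"]
    if not node["children"]:
        return [x for x in range(lo, hi + 1) if member(x, b, m, k)]
    return [x for c in node["children"] for x in reconstruct(c, b, m, k, min_bits)]


M, m, k = 16, 10, 2
bT = build_tree(0, M - 1, 4, m, k)
S = {4, 6}
b = bloom(S, m, k)
S_prime = reconstruct(bT, b, m, k)
print(S_prime)
```

With `min_bits=1` nothing that passes the membership test is lost, so S’ equals the Dictionary Attack result; a larger threshold prunes more subtrees at the risk of dropping true elements.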

BloomSampleTree - Reconstruction
Algorithms:
● Dictionary Attack (DA): test each element in the namespace for membership
● HashInvert (HI): invert each set bit in the Bloom filter and prune using membership
● BloomSampleTree (BST)
Setting:
● M = 10^7
● k = 3
● m increases with desired Precision
● Size of set S, |S| varies from 100 to 50K
[Plots: Simple Hash Functions; MD5 Hash Functions]
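HashInvert needs hash functions that can be inverted, so the sketch below assumes simple affine hashes h_i(x) = (a_i x + c_i) mod m with m prime. The constants in `HASHES`, the sizes M and m, and the function names are all hypothetical. Inverting one hash function is enough to collect candidates, because every member sets the bit chosen by that hash; membership then prunes the candidates.

```python
M, m = 1000, 101                       # namespace size and filter size (m prime)
HASHES = [(3, 7), (5, 11), (7, 13)]    # assumed (a, c) pairs, k = 3


def h(i, x):
    a, c = HASHES[i]
    return (a * x + c) % m


def build(items):
    bits = 0
    for x in items:
        for i in range(len(HASHES)):
            bits |= 1 << h(i, x)
    return bits


def member(x, bits):
    return all((bits >> h(i, x)) & 1 for i in range(len(HASHES)))


def invert(i, p):
    """All x in [0, M) with h(i, x) == p, found by solving a*x + c = p (mod m)."""
    a, c = HASHES[i]
    x0 = (pow(a, -1, m) * (p - c)) % m   # modular inverse needs Python 3.8+
    return range(x0, M, m)


def hash_invert_reconstruct(bits):
    candidates = set()
    for p in range(m):
        if (bits >> p) & 1:
            candidates.update(invert(0, p))
    # Prune the candidate set using the full membership test.
    return sorted(x for x in candidates if member(x, bits))


S = {4, 6, 512}
b = build(S)
print(hash_invert_reconstruct(b))
```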

Pruned BloomSampleTree
The range of possible values is large, but the actually used namespace is much smaller.
● Do not expand nodes corresponding to ‘unused’ regions
● Smaller BloomSampleTree, faster Sampling
● Add nodes to BloomSampleTree as namespace changes
[Figure: the example tree over (0..15) with internal nodes (0..7) and (8..15), in which only some leaves are expanded; the leaf filters shown are 1100110010 and 0110011000.]
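The pruning idea can be sketched by building a node only when some used identifier falls inside its range. This is an illustration under assumptions: `used` is a sorted list of identifiers in use, each expanded leaf still stores its whole range, the hashes are salted MD5, and adding nodes incrementally as the namespace changes is not shown.

```python
import bisect
import hashlib


def positions(x, m, k):
    return [int.from_bytes(hashlib.md5(f"{i}:{x}".encode()).digest(), "big") % m
            for i in range(k)]


def bloom(items, m, k):
    bits = 0
    for x in items:
        for p in positions(x, m, k):
            bits |= 1 << p
    return bits


def build_pruned(lo, hi, used, leaf_size, m, k):
    """Build the node for (lo..hi) only if a used id lies inside; `used` must be sorted."""
    i = bisect.bisect_left(used, lo)
    if i == len(used) or used[i] > hi:
        return None  # 'unused' region: do not expand
    node = {"range": (lo, hi), "children": []}
    if hi - lo + 1 <= leaf_size:
        node["bits"] = bloom(range(lo, hi + 1), m, k)
        return node
    mid = (lo + hi) // 2
    kids = [build_pruned(lo, mid, used, leaf_size, m, k),
            build_pruned(mid + 1, hi, used, leaf_size, m, k)]
    node["children"] = [c for c in kids if c is not None]
    node["bits"] = 0
    for c in node["children"]:
        node["bits"] |= c["bits"]
    return node


def count_nodes(node):
    return 0 if node is None else 1 + sum(count_nodes(c) for c in node["children"])


full = build_pruned(0, 15, list(range(16)), 4, 10, 2)
pruned = build_pruned(0, 15, [5, 9], 4, 10, 2)
print(count_nodes(full), count_nodes(pruned))
```

With used identifiers 5 and 9, only the leaves (4..7) and (8..11) are expanded, so the pruned tree has fewer nodes than the full one and a search has fewer intersections to compute.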

Pruned BloomSampleTree
● Sampling for smaller Namespace with Simple Hash Functions
● Namespace Fraction: how large is the actually used section of the range?
● How are the actually used parts of the namespace distributed within it?
− Uniform or Clustered
● Affects BST size, sampling time

Thank you!
