Bloom Filters Raphaël Clifford (Slides by Benjamin Sach and - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Data Structures and Algorithms – COMS21103

Bloom Filters

Raphaël Clifford (Slides by Benjamin Sach and Ashley Montanaro)

slide-5
SLIDE 5

Introduction

In this lecture we are interested in space-efficient data structures for storing a set S which support only two basic operations:

INSERT(k) - inserts the key k from U into S
MEMBER(k) - outputs ‘yes’ if k ∈ S and ‘no’ otherwise

U is the universe, containing all possible keys. Let n be an upper bound on the number of keys that will ever be in S.

Our motivation comes from applications where the size of the universe U is much, much larger than n.

Important: You cannot ask “which keys are in S?”, only “is this key in S?”

[Figure: the universe U, with a key in S marked]

slide-16
SLIDE 16

Example and Motivation

Imagine you are attempting to build a blacklist of unsafe URLs that users should not visit.

The universe contains all possible URLs.
Whenever a new unsafe URL is discovered, it is inserted into the data structure.
Whenever we want to visit a URL, we check the data structure.

INSERT(www.AwfulVirus.com)
INSERT(www.VirusStore.com)
MEMBER(www.BBC.co.uk) - returns ‘no’
MEMBER(www.VirusStore.com) - returns ‘yes’
INSERT(www.CleanUpPC.com)
MEMBER(www.BBC.co.uk) - returns ‘yes’ ?!

(Disclaimer: I take no responsibility for the contents of these websites.)

A Bloom filter is a randomised data structure - sometimes it gets the answer wrong.

slide-27
SLIDE 27

Bloom filters

A Bloom filter is a randomised data structure for storing a set S which supports two operations.

The INSERT(k) operation inserts the key k from U into S (it never does this incorrectly).

In a Bloom filter, the MEMBER(k) operation

  • always returns ‘yes’ if k ∈ S
  • however, if k is not in S, there is a small chance (say 1%) that it will still say ‘yes’

Why use a Bloom filter then?

Both operations run in O(1) time and the space used is very, very good. It will use O(n) bits of space to store up to n keys

  • the exact number of bits will depend on the failure probability (we’ll come back to this at the end)

slide-34
SLIDE 34

Approach 1: build an array

Before discussing Bloom filters, let's consider a naive approach using an array...

For simplicity, let us think of the universe U as containing the numbers 1, 2, 3, ..., |U|.

We could maintain a bit string B of length |U|, where B[k] = 1 if k ∈ S and B[k] = 0 otherwise.

Example: here |U| = 10 and S contains 3, 6 and 8

  position  1 2 3 4 5 6 7 8 9 10
  B         0 0 1 0 0 1 0 1 0 0

While the operations take O(1) time, this array is |U| bits long! It certainly isn’t suitable for the application we have seen.
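The array approach can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `BitArraySet` and its method names are made up here, and a `bytearray` spends one byte per position rather than one bit, which is enough to show the idea but is not a packed bit string.

```python
class BitArraySet:
    """Approach 1: one slot per key in the universe {1, ..., universe_size}."""

    def __init__(self, universe_size):
        self.universe_size = universe_size
        # Index 0 is unused so that positions match the 1-based keys above.
        # Assumption: one byte per position for simplicity, not one bit.
        self.bits = bytearray(universe_size + 1)

    def insert(self, k):
        # INSERT(k): set B[k] = 1
        self.bits[k] = 1

    def member(self, k):
        # MEMBER(k): 'yes' exactly when B[k] = 1
        return "yes" if self.bits[k] == 1 else "no"


s = BitArraySet(10)          # |U| = 10
for key in (3, 6, 8):        # S contains 3, 6 and 8
    s.insert(key)
print(s.member(6), s.member(7))  # prints: yes no
```

The answers are always correct, but the storage grows with |U| rather than with n, which is the problem the next approaches address.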

slide-45
SLIDE 45

Approach 2: build a hash table

We could solve the problem by hashing...

We now maintain a much shorter bit string B of some length m < |U| (to be determined later).

Assume we have access to a hash function h which maps each key k ∈ U to an integer h(k) between 1 and m.

INSERT(k) sets B[h(k)] = 1
MEMBER(k) returns ‘yes’ if B[h(k)] = 1 and ‘no’ if B[h(k)] = 0

Example: Imagine that m = 3 and

  h(www.AwfulVirus.com) = 2
  h(www.VirusStore.com) = 3
  h(www.BBC.co.uk) = 3

INSERT(www.AwfulVirus.com)
INSERT(www.VirusStore.com)

  position  1 2 3
  B         0 1 1

MEMBER(www.BBC.co.uk) - returns ‘yes’, even though www.BBC.co.uk was never inserted, because h(www.BBC.co.uk) = h(www.VirusStore.com) = 3.

This is called a collision.
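The example above can be replayed in Python. This is a minimal sketch: the class name `SingleHashFilter` is hypothetical, and the hash function is hard-coded as a lookup table holding exactly the three values from the example, which a real hash function would of course compute rather than look up.

```python
class SingleHashFilter:
    """Approach 2: a bit string of length m indexed by one hash function h."""

    def __init__(self, m, h):
        self.m = m
        self.h = h                   # maps a key to a position in 1..m
        self.bits = [0] * (m + 1)    # positions 1..m; index 0 is unused

    def insert(self, k):
        # INSERT(k): set B[h(k)] = 1
        self.bits[self.h(k)] = 1

    def member(self, k):
        # MEMBER(k): 'yes' if B[h(k)] = 1, 'no' if B[h(k)] = 0
        return "yes" if self.bits[self.h(k)] == 1 else "no"


# Hash values taken from the example (m = 3); hard-coded for illustration only.
EXAMPLE_H = {
    "www.AwfulVirus.com": 2,
    "www.VirusStore.com": 3,
    "www.BBC.co.uk": 3,
}

f = SingleHashFilter(3, EXAMPLE_H.__getitem__)
f.insert("www.AwfulVirus.com")
f.insert("www.VirusStore.com")
print(f.bits[1:])                 # prints: [0, 1, 1]
print(f.member("www.BBC.co.uk"))  # prints: yes (a false positive caused by the collision)
```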

slide-50
SLIDE 50

Approach 2: build a hash table

The problem with hashing is that if m < |U| then there will be some keys that hash to the same position (these are called collisions).

If we call MEMBER(k) for some key k not in S, but there is a key k′ ∈ S with h(k) = h(k′), we will incorrectly output ‘yes’.

To make sure that the probability of an error is low for every operation sequence, we pick the hash function h at random.

Important: h is chosen before any operations happen and never changes.

For every key k ∈ U, the value of h(k) is chosen independently and uniformly at random: that is, the probability that h(k) = j is 1/m for all j between 1 and m (each position is equally likely).
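One way to picture such a random h is the following sketch. The class `RandomHash` is hypothetical and only models the assumption used in the analysis: it draws h(k) uniformly from 1..m the first time a key is seen and remembers the value, which behaves the same as fixing all values up front. Because it stores one entry per key, it is not something a space-efficient structure would actually use.

```python
import random


class RandomHash:
    """Models a fixed, uniformly random function from keys to {1, ..., m}."""

    def __init__(self, m, seed=None):
        self.m = m
        self._rng = random.Random(seed)
        self._values = {}  # remembered h(k), so h never changes once drawn

    def __call__(self, key):
        if key not in self._values:
            # Each position 1..m is equally likely, independently of other keys.
            self._values[key] = self._rng.randint(1, self.m)
        return self._values[key]


h = RandomHash(m=10, seed=1)
first = h("www.BBC.co.uk")
second = h("www.BBC.co.uk")
print(first == second)  # prints: True (the same key always hashes to the same position)
```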

slide-59
SLIDE 59

What is the probability of an error?

Assume we have already INSERTED n keys into the structure.

Further, we have just called MEMBER(k) for some key k not in S (which will check whether B[h(k)] = 1).

We want to know the probability that the answer returned is ‘yes’ (which would be bad).

The bit string B contains at most n 1’s among the m positions.

[Figure: the bit string B of length m with some positions set to 1, and h(k) pointing at one position]

By definition, h(k) is equally likely to be any position between 1 and m.

Therefore the probability that B[h(k)] = 1 is at most n/m.

If we choose m = 100n then we get a failure probability of at most 1%.
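The n/m bound can be checked empirically with a small simulation. This is a sketch under the same assumption as the analysis (every hash value is an independent uniform draw from 1..m); the function name `false_positive_rate`, the seed, and the chosen values of n and the number of trials are arbitrary illustrations.

```python
import random


def false_positive_rate(n, m, trials, seed=0):
    """Estimate how often MEMBER says 'yes' for a key that was never inserted."""
    rng = random.Random(seed)
    bits = [0] * (m + 1)               # positions 1..m; index 0 is unused
    for _ in range(n):
        bits[rng.randint(1, m)] = 1    # INSERT: h(key) is uniform in 1..m
    # MEMBER for fresh keys: each h(k) is again uniform and independent of B.
    hits = sum(bits[rng.randint(1, m)] for _ in range(trials))
    return hits / trials


n = 1000
m = 100 * n
rate = false_positive_rate(n, m, trials=20000)
print(f"estimated rate: {rate:.4f}, bound n/m: {n / m:.4f}")
```

The estimate should land close to, and typically slightly below, the bound of 0.01, since collisions among the inserted keys can leave fewer than n bits set.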

slide-71
SLIDE 71

Approach 2: build a hash table

We have developed a randomised data structure for storing a set S which supports two operations.

The INSERT(k) operation inserts the key k from U into S (it never does this incorrectly).

Like in a Bloom filter, the MEMBER(k) operation

  • always returns ‘yes’ if k ∈ S
  • however, if k is not in S, there is a small chance (in fact at most 1%) that it will still say ‘yes’

Both operations run in O(1) time and the space used is 100n bits when storing up to n keys

  • neither the space nor the failure probability depends on |U|
  • if we wanted a better probability, we could use more space

Why use a Bloom filter then?

  • we will get much better space usage for the same probability

slide-77
SLIDE 77

Approach 3: build a bloom filter

We still maintain a bit string B of some length m < |U| Example:

Imagine that m = 4, r = 2 and

INSERT(k) sets B[hi(k)] = 1 MEMBER(k) returns ‘yes’ if and only if for all i, B[hi(k)] = 1

h1(AwVi.com) = 2

INSERT(AwVi.com)

h1(ViSt.com) = 3 h1(BBC.com) = 2

1

INSERT(ViSt.com)

Now we have r hash functions: h1, h2, . . . , hr

h1, h2, . . . , hr

(we will choose r and m later) for all i between 1 and r Each hash function hi maps a key k, to an integer hi(k) between 1 and m

h2(AwVi.com) = 1 h2(ViSt.com) = 2 h2(BBC.com) = 4

1

1 2 3 4

slide-78
SLIDE 78

Approach 3: build a bloom filter

We still maintain a bit string B of some length m < |U| Example:

Imagine that m = 4, r = 2 and

INSERT(k) sets B[hi(k)] = 1 MEMBER(k) returns ‘yes’ if and only if for all i, B[hi(k)] = 1

h1(AwVi.com) = 2

INSERT(AwVi.com)

h1(ViSt.com) = 3 h1(BBC.com) = 2

1

INSERT(ViSt.com)

1

Now we have r hash functions: h1, h2, . . . , hr

h1, h2, . . . , hr

(we will choose r and m later) for all i between 1 and r Each hash function hi maps a key k, to an integer hi(k) between 1 and m

h2(AwVi.com) = 1 h2(ViSt.com) = 2 h2(BBC.com) = 4

1

1 2 3 4

slide-79
SLIDE 79

Approach 3: build a bloom filter

We still maintain a bit string B of some length m < |U| Example:

Imagine that m = 4, r = 2 and

INSERT(k) sets B[hi(k)] = 1 MEMBER(k) returns ‘yes’ if and only if for all i, B[hi(k)] = 1

h1(AwVi.com) = 2

INSERT(AwVi.com) MEMBER(BBC.com) - returns ‘no’

h1(ViSt.com) = 3 h1(BBC.com) = 2

1

INSERT(ViSt.com)

1

Now we have r hash functions: h1, h2, . . . , hr

h1, h2, . . . , hr

(we will choose r and m later) for all i between 1 and r Each hash function hi maps a key k, to an integer hi(k) between 1 and m

h2(AwVi.com) = 1 h2(ViSt.com) = 2 h2(BBC.com) = 4

1

1 2 3 4

slide-80
SLIDE 80

Approach 3: build a bloom filter

We still maintain a bit string B of some length m < |U| Example:

Imagine that m = 4, r = 2 and

INSERT(k) sets B[hi(k)] = 1 MEMBER(k) returns ‘yes’ if and only if for all i, B[hi(k)] = 1

h1(AwVi.com) = 2

INSERT(AwVi.com) MEMBER(BBC.com) - returns ‘no’

h1(ViSt.com) = 3 h1(BBC.com) = 2

1

INSERT(ViSt.com)

1

Now we have r hash functions: h1, h2, . . . , hr

h1, h2, . . . , hr

(we will choose r and m later) for all i between 1 and r Each hash function hi maps a key k, to an integer hi(k) between 1 and m

h2(AwVi.com) = 1 h2(ViSt.com) = 2 h2(BBC.com) = 4

Much better! 1

1 2 3 4

slide-81
SLIDE 81

Approach 3: build a bloom filter

We still maintain a bit string B of some length m < |U|.

Now we have r hash functions: h1, h2, . . . , hr (we will choose r and m later).

Each hash function hi maps a key k to an integer hi(k) between 1 and m.

INSERT(k) sets B[hi(k)] = 1 for all i between 1 and r.

MEMBER(k) returns ‘yes’ if and only if for all i, B[hi(k)] = 1.

Example: Imagine that m = 4, r = 2 and

h1(AwVi.com) = 2    h2(AwVi.com) = 1
h1(ViSt.com) = 3    h2(ViSt.com) = 2
h1(BBC.com) = 2     h2(BBC.com) = 4

INSERT(AwVi.com) sets B[2] and B[1] to 1.

INSERT(ViSt.com) sets B[3] and B[2] to 1.

The bit string is now:

position:  1 2 3 4
B:         1 1 1 0

MEMBER(BBC.com) - returns ‘no’, because B[h2(BBC.com)] = B[4] = 0, even though B[h1(BBC.com)] = B[2] = 1.

Much better!

(not convinced?)
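The example above can be replayed in a few lines of Python. This is only a sketch: the dictionaries `H1` and `H2` simply hard-code the hash values from the slide, and the function names `insert` and `member` are illustrative stand-ins for INSERT and MEMBER.

```python
# Bit string B of length m = 4. Index 0 is unused so that positions
# run from 1 to m, as on the slide.
m = 4
B = [0] * (m + 1)

# Hash values hard-coded from the example (r = 2 hash functions).
H1 = {"AwVi.com": 2, "ViSt.com": 3, "BBC.com": 2}
H2 = {"AwVi.com": 1, "ViSt.com": 2, "BBC.com": 4}
HASHES = [H1, H2]


def insert(k):
    """INSERT(k): set B[h_i(k)] = 1 for every hash function h_i."""
    for h in HASHES:
        B[h[k]] = 1


def member(k):
    """MEMBER(k): True ('yes') iff B[h_i(k)] = 1 for every hash function h_i."""
    return all(B[h[k]] == 1 for h in HASHES)


insert("AwVi.com")
insert("ViSt.com")
print(B[1:])              # bits at positions 1..4
print(member("BBC.com"))  # B[2] is set, but B[4] is not
```

With a single hash function h1, BBC.com would collide with AwVi.com at position 2; the second hash function is what lets the filter answer ‘no’ here.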

slide-83
SLIDE 83

Approach 3: build a bloom filter

We still maintain a bit string B of some length m < |U|.

Now we have r hash functions: h1, h2, . . . , hr (we will choose r and m later).

Each hash function hi maps a key k to an integer hi(k) between 1 and m.

INSERT(k) sets B[hi(k)] = 1 for all i between 1 and r.

MEMBER(k) returns ‘yes’ if and only if for all i, B[hi(k)] = 1.

For every key k ∈ U, the value of each hi(k) is chosen independently and uniformly at random: that is, the probability that hi(k) = j is 1/m for all j between 1 and m (each position is equally likely).

But what is the probability of a wrong answer?

slide-94
SLIDE 94

What is the probability of an error?

Assume we have already INSERTED n keys into the Bloom filter. Further, we have just called MEMBER(k) for some key k not in S. This will check whether B[hi(k)] = 1 for all i = 1, 2, . . . , r.

This is the same as checking whether r randomly chosen bits of B all equal 1. We will now show that there is only a small probability of this happening.

As there are at most n keys in the filter, at most nr bits of B are set to 1 (each INSERT sets at most r bits to 1).

So the fraction of bits set to 1 is at most $nr/m$,

so the probability that a randomly chosen bit is 1 is at most $nr/m$,

so the probability that r randomly chosen bits all equal 1 is at most $(nr/m)^r$ (do this independently r times).

[Figure: the bit string B of length m, with some of its bits set to 1]
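The bound can be checked empirically. The sketch below is an illustration rather than part of the lecture: it models each hash value as an independent, uniformly random position (the assumption made above), uses 0-based positions for convenience, and the names `error_bound` and `simulate` and the parameter values are made up for the example.

```python
import random


def error_bound(n, r, m):
    """Upper bound (nr/m)^r on the probability of a wrong 'yes'."""
    return (n * r / m) ** r


def simulate(n, r, m, trials, seed=0):
    """Estimate how often MEMBER says 'yes' for a key that was never inserted."""
    rng = random.Random(seed)
    B = [0] * m  # positions 0..m-1 here, rather than 1..m
    # INSERT n keys: each sets r independently, uniformly chosen positions.
    for _ in range(n):
        for _ in range(r):
            B[rng.randrange(m)] = 1
    # MEMBER on fresh keys: each checks r independently, uniformly chosen positions.
    wrong_yes = sum(
        all(B[rng.randrange(m)] for _ in range(r)) for _ in range(trials)
    )
    return wrong_yes / trials


n, r, m = 100, 4, 1000
print("bound:    ", error_bound(n, r, m))
print("simulated:", simulate(n, r, m, trials=20000))
```

The simulated rate should come out below the bound, since the bound pessimistically assumes that all nr insertions hit distinct bits.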

slide-101
SLIDE 101

What is the probability of a collision?

We now choose r to minimise this probability. . .

By differentiating, we can find that $(nr/m)^r$ is minimised by letting $r = m/(ne)$, where $e = 2.718\ldots$

If we plug this in we get that the probability of failure is at most

$(1/e)^{m/(ne)} \approx (0.69)^{m/n}$

In particular, to achieve a 1% failure probability, we can set m ≈ 12.52n bits.

This is much better than the 100n bits we needed with a single hash function to achieve the same probability.

Neither the space nor the failure probability depends on |U|.

If we wanted a better probability, we could use more space.
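For readers who want the differentiation step spelled out, one way to carry it out is sketched below. The function name $f$ is introduced here only for the derivation; working with $\ln f$ is a convenience, since it has the same minimiser as $f$.

```latex
\begin{align*}
f(r) &= \left(\frac{nr}{m}\right)^{r},
  \qquad \ln f(r) = r \ln\!\left(\frac{nr}{m}\right) \\
\frac{d}{dr} \ln f(r) &= \ln\!\left(\frac{nr}{m}\right) + 1 = 0
  \;\Longrightarrow\; \frac{nr}{m} = \frac{1}{e}
  \;\Longrightarrow\; r = \frac{m}{ne} \\
\frac{d^{2}}{dr^{2}} \ln f(r) &= \frac{1}{r} > 0
  \quad \text{(so this is a minimum)} \\
f\!\left(\frac{m}{ne}\right) &= \left(\frac{1}{e}\right)^{m/(ne)}
  = \left(e^{-1/e}\right)^{m/n} \approx (0.69)^{m/n} \\
\left(\frac{1}{e}\right)^{m/(ne)} = 0.01
  &\;\Longrightarrow\; \frac{m}{n} = e \ln 100 \approx 12.52
\end{align*}
```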

slide-102
SLIDE 102

Bloom filter summary

A Bloom filter is a randomised data structure for storing a set S which supports two operations, each in O(1) time.

The INSERT(k) operation inserts the key k from U into S (it never does this incorrectly).

In a Bloom filter, the MEMBER(k) operation always returns ‘yes’ if k ∈ S; however, if k is not in S there is a small chance, ε, that it will still say ‘yes’.

We have seen that if ε = 0.01 (1%) then the space used is m ≈ 12.52n bits when storing up to n keys.

By improving the analysis, one can show that only ≈ 1.44 n log2(1/ε) bits are needed (≈ 9.57n bits when ε = 0.01).
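The space figures quoted in this lecture can be reproduced numerically. In the sketch below the function names are illustrative; `bits_per_key_simple` solves $(1/e)^{m/(ne)} = \epsilon$ for m/n, `bits_per_key_improved` uses the rounded constant 1.44 exactly as quoted on the slide, and `bits_per_key_single_hash` assumes the single-hash-function bound is n/m (the r = 1 case of the bound above).

```python
import math


def bits_per_key_single_hash(eps):
    """Single hash function: failure bound n/m <= eps, so m/n = 1/eps."""
    return 1 / eps


def bits_per_key_simple(eps):
    """Bound from this lecture: (1/e)^(m/(n e)) <= eps, so m/n = e * ln(1/eps)."""
    return math.e * math.log(1 / eps)


def bits_per_key_improved(eps):
    """Improved analysis, with the constant 1.44 as quoted on the slide."""
    return 1.44 * math.log2(1 / eps)


eps = 0.01
for name, f in [
    ("single hash function", bits_per_key_single_hash),
    ("Bloom filter, simple bound", bits_per_key_simple),
    ("Bloom filter, improved bound", bits_per_key_improved),
]:
    print(f"{name}: {f(eps):.2f} bits per key")
```

Each extra factor of 10 in accuracy costs the Bloom filter a fixed number of additional bits per key, whereas the single-hash-function approach needs 10 times as much space.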

slide-107
SLIDE 107

Practical hash functions

We made the unrealistic assumption that each hash function hi maps a key k to a uniformly random integer between 1 and m.

In practice, we pick each hash function hi randomly from a fixed set of hash functions.

One way of doing this for integer keys is the following (see CLRS 11.3.3). For each i:

  • 1. Pick a prime number p > |U|.
  • 2. Pick random integers a ∈ {1, . . . , p − 1}, b ∈ {0, . . . , p − 1}.
  • 3. Let hi be defined by hi(k) = 1 + (((ak + b) mod p) mod m).

Some number theory can be used to prove that this set of hash functions is “pseudorandom” in some sense; however, technically they are not “random enough” for our analysis above to go through.

Nevertheless, in practice hash functions like this are very effective.
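The three steps above might be coded as follows. This is a sketch under stated assumptions, not the lecture's own code: the universe is taken to be the integers {0, . . . , 2^61 − 2} so that the Mersenne prime P = 2^61 − 1 satisfies p > |U|, and the names `make_hash` and `make_hashes` are hypothetical.

```python
import random

# Step 1: a prime p > |U|. Assumption: U = {0, ..., 2**61 - 2},
# so the Mersenne prime 2**61 - 1 is large enough.
P = 2**61 - 1


def make_hash(m, rng):
    """Return one hash function h(k) = 1 + (((a*k + b) mod p) mod m)."""
    a = rng.randrange(1, P)  # Step 2: a in {1, ..., p - 1}
    b = rng.randrange(0, P)  #         b in {0, ..., p - 1}

    def h(k):
        # Step 3: the result is an integer between 1 and m.
        return 1 + ((a * k + b) % P) % m

    return h


def make_hashes(r, m, seed=None):
    """Pick r hash functions h_1, ..., h_r independently at random."""
    rng = random.Random(seed)
    return [make_hash(m, rng) for _ in range(r)]


hashes = make_hashes(r=3, m=16, seed=1)
print([h(12345) for h in hashes])  # three positions between 1 and 16
```

Once a and b are fixed, each function is deterministic, which is what guarantees that MEMBER(k) probes the same positions that INSERT(k) set.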
