SLIDE 1
Lecture #2: Advanced hashing and concentration bounds
- Bloom filters
- Cuckoo hashing
- Load balancing
- Tail bounds
SLIDE 2
Bloom filters
Idea: For the sake of efficiency, sometimes we allow our data structure to make mistakes.
Bloom filter: A hash table that has only false positives: it may report that a key is present when it is not, but it always reports a key that is present. Very simple and fast.
Example: Google Chrome uses a Bloom filter to maintain its list of potentially malicious web sites.
- Most queried keys are not in the table
- If a key is in the table, can check against a slower (errorless) hash table
Many applications in networking (see survey by Broder and Mitzenmacher)
SLIDE 3
Bloom filters
Data structure: Universe 𝒰. Parameters m, k ≥ 1.
Maintain an array B of m bits; initially B[0] = B[1] = ⋯ = B[m−1] = 0.
Choose k hash functions h_1, h_2, …, h_k : 𝒰 → [m] (assume completely random functions for the sake of analysis).
SLIDE 4
Bloom filters
Data structure: Universe 𝒰. Parameters m, k ≥ 1.
Maintain an array B of m bits; initially B[0] = B[1] = ⋯ = B[m−1] = 0.
Choose k hash functions h_1, h_2, …, h_k : 𝒰 → [m] (assume completely random functions for the sake of analysis).
To add a key y ∈ 𝒰 to the dictionary S ⊆ 𝒰, set bits B[h_1(y)] ← 1, B[h_2(y)] ← 1, …, B[h_k(y)] ← 1.
SLIDE 5
Bloom filters
Data structure: Universe 𝒰. Parameters m, k ≥ 1.
Maintain an array B of m bits; initially B[0] = B[1] = ⋯ = B[m−1] = 0.
Choose k hash functions h_1, h_2, …, h_k : 𝒰 → [m] (assume completely random functions for the sake of analysis).
To add a key y ∈ 𝒰 to the dictionary S ⊆ 𝒰, set bits B[h_1(y)] ← 1, B[h_2(y)] ← 1, …, B[h_k(y)] ← 1.
To answer a query "q ∈ S?": check whether B[h_i(q)] = 1 for all i = 1, 2, …, k. If yes, answer Yes. If no, answer No.
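A minimal Python sketch of these operations; salted calls to Python's built-in hash stand in for the fully random h_i (a real implementation would use an explicit hash family):

```python
import random

class BloomFilter:
    """Minimal sketch: m bits, k (simulated) random hash functions."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m
        # Random salts stand in for k independent hash functions h_i : U -> [m].
        self.salts = [random.random() for _ in range(k)]

    def _positions(self, key):
        return [hash((salt, key)) % self.m for salt in self.salts]

    def add(self, y):
        # Set bits B[h_1(y)], ..., B[h_k(y)] to 1.
        for pos in self._positions(y):
            self.bits[pos] = 1

    def query(self, q):
        # Answer Yes iff B[h_i(q)] = 1 for all i = 1, ..., k.
        return all(self.bits[pos] == 1 for pos in self._positions(q))

bf = BloomFilter(m=1000, k=6)
bf.add("evil-site.example")
print(bf.query("evil-site.example"))    # True: never a false negative
print(bf.query("benign-site.example"))  # False, except with small probability
```

Since adding a key can never clear a bit, a stored key always answers Yes; only keys outside S can be misreported.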
SLIDE 6
Bloom filters
No false negatives: Clearly if y ∈ S, we return Yes. But there is some chance that other keys have caused the bits in positions h_1(y), …, h_k(y) to be set even if y ∉ S.
SLIDE 7
Bloom filters
No false negatives: Clearly if y ∈ S, we return Yes. But there is some chance that other keys have caused the bits in positions h_1(y), …, h_k(y) to be set even if y ∉ S.
Heuristic analysis: Let us assume that |S| = n. Compute Pr[B[ℓ] = 0] for some location ℓ ∈ [m]:
p(m, n) = (1 − 1/m)^{kn} ≈ e^{−kn/m}
(Here we use the approximation (1 − 1/m)^m ≈ e^{−1} for m large enough.)
SLIDE 8
Bloom filters
No false negatives: Clearly if y ∈ S, we return Yes. But there is some chance that other keys have caused the bits in positions h_1(y), …, h_k(y) to be set even if y ∉ S.
Heuristic analysis: Let us assume that |S| = n. Compute Pr[B[ℓ] = 0] for some location ℓ ∈ [m]:
p(m, n) = (1 − 1/m)^{kn} ≈ e^{−kn/m}
(Here we use the approximation (1 − 1/m)^m ≈ e^{−1} for m large enough.)
If each location in B is 0 with probability p(m, n), then a false positive for y ∉ S should happen with probability at most
(1 − p(m, n))^k ≈ (1 − e^{−kn/m})^k
SLIDE 9
Bloom filters
Heuristic analysis: If each location in B is 0 with probability p(m, n), then a false positive for y ∉ S should happen with probability at most
(1 − p(m, n))^k ≈ (1 − e^{−kn/m})^k
SLIDE 10
Bloom filters
Heuristic analysis: If each location in B is 0 with probability p(m, n), then a false positive for y ∉ S should happen with probability at most
(1 − p(m, n))^k ≈ (1 − e^{−kn/m})^k
But the actual fraction of 0's in the hash table is a random variable X_{m,n} with expectation E[X_{m,n}] = p(m, n). To get the analysis right, we need a concentration bound: we want to say that X_{m,n} is close to its expected value with high probability. [We will return to this in the 2nd half of the lecture]
SLIDE 11
Bloom filters
Heuristic analysis: If each location in B is 0 with probability p(m, n), then a false positive for y ∉ S should happen with probability at most
(1 − p(m, n))^k ≈ (1 − e^{−kn/m})^k
But the actual fraction of 0's in the hash table is a random variable X_{m,n} with expectation E[X_{m,n}] = p(m, n). To get the analysis right, we need a concentration bound: we want to say that X_{m,n} is close to its expected value with high probability. [We will return to this in the 2nd half of the lecture]
If the heuristic analysis is correct, it gives nice estimates: For instance, if m = 8n, then choosing the optimal value k = 6 (the real-valued optimum is k = (m/n) ln 2 ≈ 5.5) gives false positive rate about 2%.
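A quick numeric sanity check of the heuristic estimate (the function below simply evaluates the formula above):

```python
import math

def heuristic_fp_rate(m, n, k):
    # (1 - p(m, n))^k with p(m, n) ~ e^{-kn/m}, as derived above.
    return (1 - math.exp(-k * n / m)) ** k

n = 10_000
m = 8 * n
for k in range(3, 9):
    print(f"k = {k}: false positive rate ~ {heuristic_fp_rate(m, n, k):.4f}")
# For m = 8n the rate bottoms out around 2% at k = 6.
```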
SLIDE 12
Lecture #2: Advanced hashing and concentration bounds
- Bloom filters
- Cuckoo hashing
- Load balancing
- Tail bounds
Cuckoo hashing is a hash scheme with worst-case constant lookup time. The name derives from the behavior of some species of cuckoo, where the cuckoo chick pushes the other eggs or young out of the nest when it hatches; analogously, inserting a new key into a cuckoo hashing table may push an older key to a different location in the table.
SLIDE 13
Cuckoo hashing
Idea: Simple hashing without errors.
- Lookups are worst-case O(1) time
- Deletions are worst-case O(1) time
- Insertions are expected O(1) time
- Insertion time is O(1) with good probability [will require a concentration bound]
SLIDE 14
Cuckoo hashing
Data structure: Two tables B_1 and B_2, both of size m = cn. Two hash functions h_1, h_2 : 𝒰 → [m] (we will assume the hash functions are fully random). When an element y ∈ S is inserted, if either B_1[h_1(y)] or B_2[h_2(y)] is empty, store y there.
SLIDE 15
Cuckoo hashing
Data structure: Two tables B_1 and B_2, both of size m = cn. Two hash functions h_1, h_2 : 𝒰 → [m] (we will assume the hash functions are fully random). When an element y ∈ S is inserted, if either B_1[h_1(y)] or B_2[h_2(y)] is empty, store y there. If both locations are occupied, then place y in B_1[h_1(y)] and bump the current occupant.
Bump: Whenever an element z is bumped from B_i[h_i(z)], attempt to store it in the other location B_j[h_j(z)] (here (i, j) = (1, 2) or (2, 1)).
SLIDE 16
Cuckoo hashing
Data structure: Two tables B_1 and B_2, both of size m = cn. Two hash functions h_1, h_2 : 𝒰 → [m] (we will assume the hash functions are fully random). When an element y ∈ S is inserted, if either B_1[h_1(y)] or B_2[h_2(y)] is empty, store y there. If both locations are occupied, then place y in B_1[h_1(y)] and bump the current occupant.
Bump: Whenever an element z is bumped from B_i[h_i(z)], attempt to store it in the other location B_j[h_j(z)] (here (i, j) = (1, 2) or (2, 1)).
Abort: After 6 log n consecutive bumps, stop the process and build a fresh hash table using new random hash functions h_1, h_2.
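A Python sketch of the insert/bump/abort procedure (an illustration under assumptions: salted built-in hashes stand in for fully random h_1, h_2, and the abort threshold is taken relative to the table size m rather than n):

```python
import math
import random

class CuckooHashTable:
    def __init__(self, m):
        self.m = m
        self.tables = [[None] * m, [None] * m]
        self.salts = (random.random(), random.random())

    def _h(self, i, y):
        # Salted built-in hash stands in for a fully random h_i : U -> [m].
        return hash((self.salts[i], y)) % self.m

    def lookup(self, y):
        # Worst-case O(1): y can only live in B1[h1(y)] or B2[h2(y)].
        return (self.tables[0][self._h(0, y)] == y
                or self.tables[1][self._h(1, y)] == y)

    def insert(self, y):
        if self.lookup(y):
            return
        for i in (0, 1):  # if either location is empty, store y there
            if self.tables[i][self._h(i, y)] is None:
                self.tables[i][self._h(i, y)] = y
                return
        # Both occupied: place y in B1[h1(y)] and bump the occupant;
        # each bumped key then tries its location in the other table.
        i = 0
        for _ in range(6 * max(1, math.ceil(math.log2(self.m)))):
            pos = self._h(i, y)
            self.tables[i][pos], y = y, self.tables[i][pos]
            if y is None:
                return
            i = 1 - i
        # Abort: rebuild from scratch with fresh random hash functions
        # (may rebuild again, but only with small probability).
        keys = [x for t in self.tables for x in t if x is not None] + [y]
        self.__init__(self.m)
        for x in keys:
            self.insert(x)
```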
SLIDE 17
Cuckoo hashing
Arrows represent the alternate location for each key. If we insert an item at the location of key B, then B gets bumped, thereby bumping C, and then we are done. Cycles are possible (where the insertion process never completes). What's an example?
Alternatively (as in the picture), we can use a single table with 2m entries and two hash functions h_1, h_2 : 𝒰 → [2m] (with the same "bumping" algorithm).
SLIDE 18
Cuckoo hashing
Data structure: Two tables B_1 and B_2, both of size m = cn. Two hash functions h_1, h_2 : 𝒰 → [m] (we will assume the hash functions are fully random).
Theorem: The expected time to perform an insert operation is O(1) if m ≥ 4n.
SLIDE 19
Cuckoo hashing
Data structure: Two tables B_1 and B_2, both of size m = cn. Two hash functions h_1, h_2 : 𝒰 → [m] (we will assume the hash functions are fully random).
Theorem: The expected time to perform an insert operation is O(1) if m ≥ 4n.
Pretty good… but only 25% memory utilization. Can actually get about 50% memory utilization. Experimentally, with 3 hash functions instead of 2, one can get ≈ 90% utilization, but it is an open question to provide tight analyses for d hash functions when d ≥ 3.
SLIDE 20
Lecture #2: Advanced hashing and concentration bounds
- Bloom filters
- Cuckoo hashing
- Load balancing
- Tail bounds
SLIDE 21
Load balancing
Suppose we have n jobs to assign to n servers. Clearly we could achieve a load of one job per server, but this might result in an expensive/hard-to-parallelize allocation rule.
SLIDE 22
Load balancing
Suppose we have n jobs to assign to n servers. Clearly we could achieve a load of one job per server, but this might result in an expensive/hard-to-parallelize allocation rule.
We could hash the balls into bins. Let's again consider the case of a uniformly random hash function h : [n] → [n].
SLIDE 23
Load balancing
Suppose we have n jobs to assign to n servers. Clearly we could achieve a load of one job per server, but this might result in an expensive/hard-to-parallelize allocation rule.
We could hash the balls into bins. Let's again consider the case of a uniformly random hash function h : [n] → [n].
Claim: The max-loaded server has < 8 log n / log log n jobs with probability at least 1 − 1/n.
SLIDE 24
Load balancing
Suppose we have n jobs to assign to n servers. Clearly we could achieve a load of one job per server, but this might result in an expensive/hard-to-parallelize allocation rule.
We could hash the balls into bins. Let's again consider the case of a uniformly random hash function h : [n] → [n].
Claim: The max-loaded server has < 8 log n / log log n jobs with probability at least 1 − 1/n.
Proof: The probability that a fixed server i ∈ {1, 2, …, n} gets at least k jobs is at most
(n choose k) (1/n)^k ≤ (n^k / k!) · (1/n^k) = 1/k! ≤ k^{−k/2}
SLIDE 25
Load balancing
Suppose we have n jobs to assign to n servers. Clearly we could achieve a load of one job per server, but this might result in an expensive/hard-to-parallelize allocation rule.
We could hash the balls into bins. Let's again consider the case of a uniformly random hash function h : [n] → [n].
Claim: The max-loaded server has < 8 log n / log log n jobs with probability at least 1 − 1/n.
Proof: The probability that a fixed server i ∈ {1, 2, …, n} gets at least k jobs is at most
(n choose k) (1/n)^k ≤ (n^k / k!) · (1/n^k) = 1/k! ≤ k^{−k/2}
If we choose k = 8 log n / log log n, this is at most 1/n².
Explanation: Since k ≥ √(log n) for n large enough (logs base 2),
k^{k/2} ≥ (√(log n))^{4 log n / log log n} = 2^{2 log n} = n²
SLIDE 26
Load balancing
Suppose we have n jobs to assign to n servers. Clearly we could achieve a load of one job per server, but this might result in an expensive/hard-to-parallelize allocation rule.
We could hash the balls into bins. Let's again consider the case of a uniformly random hash function h : [n] → [n].
Claim: The max-loaded server has < 8 log n / log log n jobs with probability at least 1 − 1/n.
Proof: The probability that a fixed server i ∈ {1, 2, …, n} gets at least k jobs is at most
(n choose k) (1/n)^k ≤ (n^k / k!) · (1/n^k) = 1/k! ≤ k^{−k/2}
If we choose k = 8 log n / log log n, this is at most 1/n².
Now a union bound shows that the probability of any server getting at least k jobs is at most 1/n.
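A quick simulation comparing the bound with typical behavior (a sketch; the numbers are illustrative):

```python
import math
import random
from collections import Counter

def max_load(n):
    # Assign n jobs to n servers uniformly at random; return the maximum load.
    return max(Counter(random.randrange(n) for _ in range(n)).values())

n = 100_000
bound = 8 * math.log(n) / math.log(math.log(n))
print("observed max loads:", [max_load(n) for _ in range(10)])
print(f"claimed bound 8 log n / log log n = {bound:.1f}")
# Observed values sit well below the bound; the claim is deliberately loose
# (the truth is (1 + o(1)) log n / log log n).
```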
SLIDE 27
Concentration bounds
Claim: The max-loaded server has < 8 log n / log log n jobs with probability at least 1 − 1/n.
Proof: The probability that a fixed server i ∈ {1, 2, …, n} gets at least k jobs is at most
(n choose k) (1/n)^k ≤ (n^k / k!) · (1/n^k) = 1/k! ≤ k^{−k/2}
If we choose k = 8 log n / log log n, this is at most 1/n². Now a union bound shows that the probability of any server getting at least k jobs is at most 1/n.
This is an example of a concentration bound. Let L_i be the number of jobs assigned to the i-th server. By linearity of expectation,
E[L_i] = Σ_{j=1}^{n} Pr[job j → server i] = n · (1/n) = 1.
SLIDE 28
Concentration bounds
Claim: The max-loaded server has < 8 log n / log log n jobs with probability at least 1 − 1/n.
Proof: The probability that a fixed server i ∈ {1, 2, …, n} gets at least k jobs is at most
(n choose k) (1/n)^k ≤ (n^k / k!) · (1/n^k) = 1/k! ≤ k^{−k/2}
If we choose k = 8 log n / log log n, this is at most 1/n². Now a union bound shows that the probability of any server getting at least k jobs is at most 1/n.
This is an example of a concentration bound. Let L_i be the number of jobs assigned to the i-th server. By linearity of expectation, E[L_i] = Σ_{j=1}^{n} Pr[job j → server i] = n · (1/n) = 1.
We showed that Pr[L_i ≥ 8 log n / log log n] ≤ 1/n² and then took a union bound over all n servers.
SLIDE 29
Concentration bounds
This is an example of a concentration bound. Let L_i be the number of jobs assigned to the i-th server. By linearity of expectation, E[L_i] = Σ_{j=1}^{n} Pr[job j → server i] = n · (1/n) = 1. We showed that Pr[L_i ≥ 8 log n / log log n] ≤ 1/n² and then took a union bound over all n servers.
This is a common analysis technique: If a random variable (like L_i) depends in a "smooth" way on the outcome of many independent events, then it is likely not too far from its expectation.
SLIDE 30
Concentration bounds
Let L_i be the number of jobs assigned to the i-th server. By linearity of expectation, E[L_i] = Σ_{j=1}^{n} Pr[job j → server i] = n · (1/n) = 1. We showed that Pr[L_i ≥ 8 log n / log log n] ≤ 1/n² and then took a union bound over all n servers.
This is a common analysis technique: If a random variable (like L_i) depends in a "smooth" way on the outcome of many independent events, then it is likely not too far from its expectation. "Smooth" in this case means that the outcome of any decision (where to put job j) does not affect the value of L_i by too much (only by 1). This is an example of a concentration bound.
SLIDE 31
EXERCISE
Is it concentrated? [why or why not?]
#1: Choose a uniformly random vector X ∈ ℝ^n with ‖X‖ = √(X_1² + X_2² + ⋯ + X_n²) = 1.
What is E[X_1²]?
What is the typical value of the maximum max(|X_1|, |X_2|, …, |X_n|)?
#2 Rich get richer: Suppose we have n people. Everyone starts with 1 dollar.
We assign n² more dollars in rounds. k-th round: If person i already has d_i dollars, we give them the k-th dollar with probability d_i/(n + k − 1), i.e., with probability proportional to the amount of money they already have. Let W_i be the amount of money person i ends up with. What is the typical value of W_1? Is W_1 concentrated? What is the typical value of max(W_1, W_2, …, W_n)? Is it concentrated?
SLIDE 32
Lecture #2: Advanced hashing and concentration bounds
- Bloom filters
- Cuckoo hashing
- Load balancing
- Tail bounds
SLIDE 33
Markov's inequality
The more you know: The more information we have about a random variable, the stronger the concentration we can prove.
SLIDE 34
Markov's inequality
The more you know: The more information we have about a random variable, the stronger the concentration we can prove.
The most basic concentration bound is Markov's inequality. It requires knowing only the expected value: If X is a non-negative random variable, then for any t > 0,
Pr[X ≥ t] ≤ E[X]/t
Proof? (it's written there)
SLIDE 35
Markov's inequality
The more you know: The more information we have about a random variable, the stronger the concentration we can prove.
The most basic concentration bound is Markov's inequality. It requires knowing only the expected value: If X is a non-negative random variable, then for any t > 0,
Pr[X ≥ t] ≤ E[X]/t
Proof? (it's written there)
Example: If your expected revenue is $10,000, then the probability of making at least $100,000 is at most 1/10.
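A tiny simulation of this example (assuming, purely for illustration, exponentially distributed revenue with mean $10,000):

```python
import random

mean, t = 10_000, 100_000
samples = [random.expovariate(1 / mean) for _ in range(1_000_000)]
empirical = sum(x >= t for x in samples) / len(samples)
print(f"empirical P[X >= {t}] = {empirical:.2e}, Markov bound E[X]/t = {mean / t}")
# Markov uses only the mean, so the bound 0.1 is valid but far above the
# actual exponential tail e^{-10} ~ 4.5e-5.
```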
SLIDE 36
EXERCISE
Markov's inequality: If X is a non-negative random variable, then for any t > 0, Pr[X ≥ t] ≤ E[X]/t.
A permutation is an invertible mapping π : {1, 2, …, n} → {1, 2, …, n}. A number i is called a fixed point of π if π(i) = i.
Exercise: Prove that if π is a uniformly random permutation, then
Pr[π has more than k fixed points] ≤ 1/k
SLIDE 37
Chebyshev's inequality
Recall that the variance of a random variable X is the value var(X) = σ² = E[(X − E[X])²].
SLIDE 38
Chebyshev's inequality
Recall that the variance of a random variable X is the value var(X) = σ² = E[(X − E[X])²].
Chebyshev's inequality: If X is a random variable with var(X) = σ², then for any t > 0,
Pr[|X − E[X]| ≥ tσ] ≤ 1/t²
SLIDE 39
Chebyshev's inequality
Recall that the variance of a random variable X is the value var(X) = σ² = E[(X − E[X])²].
Chebyshev's inequality: If X is a random variable with var(X) = σ², then for any t > 0,
Pr[|X − E[X]| ≥ tσ] ≤ 1/t²
Proof: Apply Markov's inequality to the random variable Y = (X − E[X])².
SLIDE 40
Chebyshev's inequality
Recall that the variance of a random variable X is the value var(X) = σ² = E[(X − E[X])²].
Chebyshev's inequality: If X is a random variable with var(X) = σ², then for any t > 0,
Pr[|X − E[X]| ≥ tσ] ≤ 1/t²
Proof: Apply Markov's inequality to the random variable Y = (X − E[X])².
Application: Suppose we map n balls into n bins using a 2-universal hash family ℋ. Then with probability at least 1/2, the maximum load is at most O(√n).
SLIDE 41
Chebyshev's inequality
Application: Suppose we map n balls into n bins using a 2-universal hash family ℋ. Then with probability at least 1/2, the maximum load is at most O(√n).
Let L_i be the load of bin i. Let Y_{ij} be the indicator random variable such that Y_{ij} = 1 ⟺ the i-th bin gets the j-th ball. Note that E[Y_{ij}] = 1/n for each j = 1, …, n.
SLIDE 42
Chebyshev's inequality
Application: Suppose we map n balls into n bins using a 2-universal hash family ℋ. Then with probability at least 1/2, the maximum load is at most O(√n).
Let L_i be the load of bin i. Let Y_{ij} be the indicator random variable such that Y_{ij} = 1 ⟺ the i-th bin gets the j-th ball. Note that E[Y_{ij}] = 1/n for each j = 1, …, n.
Exercise: For any random variable X, var(X) = E[X²] − (E[X])².
SLIDE 43
Chebyshev's inequality
Application: Suppose we map n balls into n bins using a 2-universal hash family ℋ. Then with probability at least 1/2, the maximum load is at most O(√n).
Let L_i be the load of bin i. Let Y_{ij} be the indicator random variable such that Y_{ij} = 1 ⟺ the i-th bin gets the j-th ball. Note that E[Y_{ij}] = 1/n for each j = 1, …, n.
Exercise: For any random variable X, var(X) = E[X²] − (E[X])².
So write: var(L_i) = E[(Y_{i1} + ⋯ + Y_{in})²] − 1. We have E[Y_{ij}²] = E[Y_{ij}] = 1/n and, for j ≠ ℓ, E[Y_{ij} Y_{iℓ}] = Pr[h(j) = h(ℓ) = i] ≤ 1/n² using the 2-universal property, so
var(L_i) ≤ n · (1/n) + n(n−1) · (1/n²) − 1 = 1 − 1/n ≤ 1
SLIDE 44
Chebyshev's inequality
var(L_i) ≤ n · (1/n) + n(n−1) · (1/n²) − 1 = 1 − 1/n ≤ 1
Chebyshev's inequality: If X is a random variable with var(X) = σ², then for any t > 0, Pr[|X − E[X]| ≥ tσ] ≤ 1/t²
SLIDE 45
Chebyshev's inequality
var(L_i) ≤ n · (1/n) + n(n−1) · (1/n²) − 1 = 1 − 1/n ≤ 1
Chebyshev's inequality: If X is a random variable with var(X) = σ², then for any t > 0, Pr[|X − E[X]| ≥ tσ] ≤ 1/t²
Apply Chebyshev's inequality to L_i (using σ ≤ 1), yielding Pr[|L_i − 1| ≥ t] ≤ 1/t².
SLIDE 46
Chebyshev's inequality
var(L_i) ≤ n · (1/n) + n(n−1) · (1/n²) − 1 = 1 − 1/n ≤ 1
Chebyshev's inequality: If X is a random variable with var(X) = σ², then for any t > 0, Pr[|X − E[X]| ≥ tσ] ≤ 1/t²
Apply Chebyshev's inequality to L_i (using σ ≤ 1), yielding Pr[|L_i − 1| ≥ t] ≤ 1/t².
Thus Pr[|L_i − 1| ≥ √(2n)] ≤ 1/(2n), so a union bound yields
Pr[max(L_1, …, L_n) ≥ √(2n) + 1] ≤ 1/2
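A sketch pairing the bound with a concrete 2-universal family, the classic Carter-Wegman construction h(x) = ((ax + b) mod p) mod n (the prime and parameters here are illustrative):

```python
import random

p = 1_000_003  # a prime larger than the universe size
n = 10_000

def trial():
    # Draw h from the (approximately) 2-universal family
    # h(x) = ((a*x + b) mod p) mod n.
    a, b = random.randrange(1, p), random.randrange(p)
    loads = [0] * n
    for x in range(n):  # hash n balls (keys 0, ..., n-1) into n bins
        loads[((a * x + b) % p) % n] += 1
    return max(loads)

print("observed max loads:", [trial() for _ in range(5)])
print("Chebyshev bound sqrt(2n) + 1 =", round((2 * n) ** 0.5 + 1, 1))
# Pairwise independence alone only guarantees O(sqrt(n)); in practice the
# max load is far smaller, as in the fully random claim earlier.
```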
SLIDE 47
EXERCISE
Chebyshev's inequality: If X is a random variable with var(X) = σ², then for any t > 0, Pr[|X − E[X]| ≥ tσ] ≤ 1/t²
Suppose we choose n independent random voters and ask them whether they prefer candidate #1 over candidate #2. We see outcomes X_1, X_2, …, X_n ∈ {0, 1}. Let p be the actual percentage of the population that prefers candidate #1, and let p̂ = (X_1 + ⋯ + X_n)/n denote the empirical mean.
Exercise: Prove that if we want |p − p̂| ≤ ε to hold with 99% probability, then we need only sample n = O(1/ε²) voters.
SLIDE 48
Sums of independent random variables
Hoeffding's inequality: Let X_1, …, X_n be a sequence of independent random variables where, for each 1 ≤ i ≤ n, we have a_i ≤ X_i ≤ b_i. Let X = (X_1 + ⋯ + X_n)/n. Then:
Pr[|X − E[X]| ≥ t] ≤ 2 exp(−2n²t² / Σ_{i=1}^{n} (b_i − a_i)²)
SLIDE 49
Sums of independent random variables
Hoeffding's inequality: Let X_1, …, X_n be a sequence of independent random variables where, for each 1 ≤ i ≤ n, we have a_i ≤ X_i ≤ b_i. Let X = (X_1 + ⋯ + X_n)/n. Then:
Pr[|X − E[X]| ≥ t] ≤ 2 exp(−2n²t² / Σ_{i=1}^{n} (b_i − a_i)²)
Suppose we wanted our poll from the previous slide to be correct with probability at least 1 − δ. Chebyshev's inequality would tell us we need at most O(1/(ε²δ)) samples.
SLIDE 50
Sums of independent random variables
Hoeffding's inequality: Let X_1, …, X_n be a sequence of independent random variables where, for each 1 ≤ i ≤ n, we have a_i ≤ X_i ≤ b_i. Let X = (X_1 + ⋯ + X_n)/n. Then:
Pr[|X − E[X]| ≥ t] ≤ 2 exp(−2n²t² / Σ_{i=1}^{n} (b_i − a_i)²)
Suppose we wanted our poll from the previous slide to be correct with probability at least 1 − δ. Chebyshev's inequality would tell us we need at most O(1/(ε²δ)) samples.
Setting a_i = 0, b_i = 1, and t = ε in Hoeffding's inequality gives Pr[|p̂ − p| ≥ ε] ≤ 2e^{−2ε²n}, so we only need n ≤ O(log(1/δ)/ε²) samples.
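A small calculation comparing the two sample-size bounds (a sketch; the Chebyshev count uses var(p̂) ≤ 1/(4n) for {0,1} samples):

```python
import math

def chebyshev_samples(eps, delta):
    # P[|p_hat - p| >= eps] <= var(p_hat)/eps^2 <= 1/(4 n eps^2) <= delta.
    return math.ceil(1 / (4 * eps ** 2 * delta))

def hoeffding_samples(eps, delta):
    # 2 exp(-2 eps^2 n) <= delta  =>  n >= log(2/delta) / (2 eps^2).
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

eps = 0.01
for delta in (0.01, 1e-6):
    print(delta, chebyshev_samples(eps, delta), hoeffding_samples(eps, delta))
# Chebyshev's count grows like 1/delta; Hoeffding's only like log(1/delta).
```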
SLIDE 51
Sums of independent random variables
Chernoff bound (multiplicative): Let X_1, …, X_n be a sequence of independent {0,1}-valued random variables. Let p_i = E[X_i], X = X_1 + X_2 + ⋯ + X_n, μ = E[X]. Then for every γ ≥ 1:
Pr[X ≥ γμ] ≤ (e^{γ−1} / γ^γ)^μ
Pr[X ≤ μ/γ] ≤ (e^{1/γ−1} · γ^{1/γ})^μ
SLIDE 52
Sums of independent random variables
Chernoff bound (multiplicative): Let X_1, …, X_n be a sequence of independent {0,1}-valued random variables. Let p_i = E[X_i], X = X_1 + X_2 + ⋯ + X_n, μ = E[X]. Then for every γ ≥ 1:
Pr[X ≥ γμ] ≤ (e^{γ−1} / γ^γ)^μ
Pr[X ≤ μ/γ] ≤ (e^{1/γ−1} · γ^{1/γ})^μ
Reproduce balls in bins: n balls are thrown randomly into n bins. X_i = 1 if the i-th ball ends up in the first bin and X_i = 0 otherwise. Then X = # of balls in the first bin. As we calculated earlier, E[X] = 1. For γ ≈ log n / log log n, the Chernoff bound gives Pr[X ≥ γ] ≤ 1/n².
SLIDE 53
Sums of independent random variables
Chernoff bound (multiplicative): Let X_1, …, X_n be a sequence of independent {0,1}-valued random variables. Let p_i = E[X_i], X = X_1 + X_2 + ⋯ + X_n, μ = E[X]. Then for every γ ≥ 1:
Pr[X ≥ γμ] ≤ (e^{γ−1} / γ^γ)^μ
Pr[X ≤ μ/γ] ≤ (e^{1/γ−1} · γ^{1/γ})^μ
Reproduce balls in bins: n balls are thrown randomly into n bins. X_i = 1 if the i-th ball ends up in the first bin and X_i = 0 otherwise. Then X = # of balls in the first bin. As we calculated earlier, E[X] = 1. For γ ≈ log n / log log n, the Chernoff bound gives Pr[X ≥ γ] ≤ 1/n².
This type of analysis works for much more complicated kinds of events (see homework #2).
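An empirical look at the upper tail for one bin (a sketch; at this small n the constants are only indicative, and the 1/n² estimate above is asymptotic):

```python
import math
import random

n, trials = 1000, 5_000
gamma = math.log(n) / math.log(math.log(n))
# X = number of balls (out of n) landing in bin 0; mu = E[X] = 1.
tail = sum(sum(random.randrange(n) == 0 for _ in range(n)) >= gamma
           for _ in range(trials)) / trials
bound = math.exp(gamma - 1) / gamma ** gamma  # (e^{gamma-1}/gamma^gamma)^mu, mu = 1
print(f"gamma = {gamma:.2f}, empirical tail = {tail:.2e}, Chernoff bound = {bound:.2e}")
```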
SLIDE 54
(return to) Bloom filters
Heuristic analysis: If each location in B is 0 with probability p(m, n), then a false positive for y ∉ S should happen with probability at most
(1 − p(m, n))^k ≈ (1 − e^{−kn/m})^k
But the actual fraction of 0's in the hash table is a random variable X_{m,n} with expectation E[X_{m,n}] = p(m, n). To get the analysis right, we need a concentration bound: we want to say that X_{m,n} is close to its expected value with high probability. Let's analyze!
SLIDE 55
(return to) Bloom filters
We have an array with m bits, and to hash an element y ∈ 𝒰 we set the bits in positions h_1(y), h_2(y), …, h_k(y) to 1.
SLIDE 56
(return to) Bloom filters
Let Z be the # of 0's in the hash table after n elements are hashed. We have an array with m bits, and to hash an element y ∈ 𝒰 we set the bits in positions h_1(y), h_2(y), …, h_k(y) to 1.
SLIDE 57
(return to) Bloom filters
Let Z be the # of 0's in the hash table after n elements are hashed. Consider the n elements to hash: y_1, y_2, …, y_n. We have an array with m bits, and to hash an element y ∈ 𝒰 we set the bits in positions h_1(y), h_2(y), …, h_k(y) to 1.
SLIDE 58
(return to) Bloom filters
Let Z be the # of 0's in the hash table after n elements are hashed. Consider the n elements to hash: y_1, y_2, …, y_n. Let α(y_i) = (h_1(y_i), …, h_k(y_i)). We have an array with m bits, and to hash an element y ∈ 𝒰 we set the bits in positions h_1(y), h_2(y), …, h_k(y) to 1.
SLIDE 59
(return to) Bloom filters
Let Z be the # of 0's in the hash table after n elements are hashed. Consider the n elements to hash: y_1, y_2, …, y_n. Let α(y_i) = (h_1(y_i), …, h_k(y_i)). Define Z_i = E[Z | α(y_1), α(y_2), …, α(y_i)], the expected # of 0's in the final hash table conditioned on the hash values of the first i elements. We have an array with m bits, and to hash an element y ∈ 𝒰 we set the bits in positions h_1(y), h_2(y), …, h_k(y) to 1.
SLIDE 60
(return to) Bloom filters
Let Z be the # of 0's in the hash table after n elements are hashed. Consider the n elements to hash: y_1, y_2, …, y_n. Let α(y_i) = (h_1(y_i), …, h_k(y_i)). Define Z_i = E[Z | α(y_1), α(y_2), …, α(y_i)], the expected # of 0's in the final hash table conditioned on the hash values of the first i elements. Note that y_1, …, y_n are an arbitrary fixed set of keys: the randomness here is all in the choice of the hash functions h_1, …, h_k. We have an array with m bits, and to hash an element y ∈ 𝒰 we set the bits in positions h_1(y), h_2(y), …, h_k(y) to 1.
SLIDE 61
(return to) Bloom filters
Let Z be the # of 0's in the hash table after n elements are hashed. Consider the n elements to hash: y_1, y_2, …, y_n. α(y_i) = (h_1(y_i), …, h_k(y_i)). Z_i = E[Z | α(y_1), α(y_2), …, α(y_i)].
We calculated before that Z_0 = E[Z] = m(1 − 1/m)^{kn} [why?]
SLIDE 62
(return to) Bloom filters
Let Z be the # of 0's in the hash table after n elements are hashed. Consider the n elements to hash: y_1, y_2, …, y_n. α(y_i) = (h_1(y_i), …, h_k(y_i)). Z_i = E[Z | α(y_1), α(y_2), …, α(y_i)].
We calculated before that Z_0 = E[Z] = m(1 − 1/m)^{kn} [why?]
Now we want to know the probability that Z is much different from its expectation Z_0.
SLIDE 63
(return to) Bloom filters
Let Z be the # of 0's in the hash table after n elements are hashed. Consider the n elements to hash: y_1, y_2, …, y_n. α(y_i) = (h_1(y_i), …, h_k(y_i)). Z_i = E[Z | α(y_1), α(y_2), …, α(y_i)].
We calculated before that Z_0 = E[Z] = m(1 − 1/m)^{kn} [why?]
Now we want to know the probability that Z is much different from its expectation Z_0.
Claim #1: |Z_{i+1} − Z_i| ≤ k for all i = 0, 1, …, n − 1
SLIDE 64
(return to) Bloom filters
Let Z be the # of 0's in the hash table after n elements are hashed. Consider the n elements to hash: y_1, y_2, …, y_n. α(y_i) = (h_1(y_i), …, h_k(y_i)). Z_i = E[Z | α(y_1), α(y_2), …, α(y_i)].
We calculated before that Z_0 = E[Z] = m(1 − 1/m)^{kn} [why?]
Now we want to know the probability that Z is much different from its expectation Z_0.
Claim #1: |Z_{i+1} − Z_i| ≤ k for all i = 0, 1, …, n − 1
Claim #2: E[Z_{i+1} | α(y_1), …, α(y_i)] = Z_i for all i = 0, 1, …, n − 1
SLIDE 65
(return to) Bloom filters
Let Z be the # of 0's in the hash table after n elements are hashed. Consider the n elements to hash: y_1, y_2, …, y_n. α(y_i) = (h_1(y_i), …, h_k(y_i)). Z_i = E[Z | α(y_1), α(y_2), …, α(y_i)].
We calculated before that Z_0 = E[Z] = m(1 − 1/m)^{kn} [why?]
Now we want to know the probability that Z is much different from its expectation Z_0.
Claim #1: |Z_{i+1} − Z_i| ≤ k for all i = 0, 1, …, n − 1
Claim #2: E[Z_{i+1} | α(y_1), …, α(y_i)] = Z_i for all i = 0, 1, …, n − 1
Such a sequence of random variables is called a martingale.
SLIDE 66
Azuma's inequality
Suppose that {Z_0, Z_1, …, Z_n} is a martingale such that, for some constants {c_i}, |Z_{i+1} − Z_i| ≤ c_{i+1} for all i = 0, 1, …, n − 1. Then for any t > 0,
Pr[|Z_n − Z_0| ≥ t] ≤ 2 exp(−t² / (2(c_1² + ⋯ + c_n²)))
SLIDE 67
Azuma's inequality
Suppose that {Z_0, Z_1, …, Z_n} is a martingale such that, for some constants {c_i}, |Z_{i+1} − Z_i| ≤ c_{i+1} for all i = 0, 1, …, n − 1. Then for any t > 0,
Pr[|Z_n − Z_0| ≥ t] ≤ 2 exp(−t² / (2(c_1² + ⋯ + c_n²)))
For our problem: c_1 = c_2 = ⋯ = c_n = k. So the probability that the # of 0's differs from its expectation by more than t is at most 2 exp(−t²/(2k²n)). So the deviation is ≈ k√n, and Z is tightly concentrated in this window.
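A simulation sketch of this window (parameters illustrative):

```python
import math
import random
import statistics

def num_zeros(m, n, k):
    bits = [0] * m
    for _ in range(n * k):  # n keys, k hashes each (fully random model)
        bits[random.randrange(m)] = 1
    return bits.count(0)

m, n, k = 80_000, 10_000, 6
samples = [num_zeros(m, n, k) for _ in range(100)]
print(f"E[Z] = {m * (1 - 1 / m) ** (k * n):.0f}, "
      f"sample mean = {statistics.mean(samples):.0f}")
print(f"sample std = {statistics.stdev(samples):.0f} "
      f"vs Azuma window k*sqrt(n) = {k * math.sqrt(n):.0f}")
# Observed fluctuations are even smaller than the Azuma window k*sqrt(n),
# and far smaller than m: the count is tightly concentrated.
```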
SLIDE 68
(take home) EXERCISE
Suppose that {Z_0, Z_1, …, Z_n} is a martingale such that, for some constants {c_i}, |Z_{i+1} − Z_i| ≤ c_{i+1} for all i = 0, 1, …, n − 1. Then for any t > 0,
Pr[|Z_n − Z_0| ≥ t] ≤ 2 exp(−t² / (2(c_1² + ⋯ + c_n²)))
For our problem: c_1 = c_2 = ⋯ = c_n = k. So the probability that the # of 0's differs from its expectation by more than t is at most 2 exp(−t²/(2k²n)), and the deviation is ≈ k√n.
Exercise: Improve the error probability to 2 exp(−t²/(2kn)) using a different martingale.