Model, relax, optimize: express lookup cost and memory footprint as functions f(p_0, p_1, …) of the Bloom filters' false positive rates, relax the rates to continuous values with $0 < p_i < 1$, then optimize.

Bloom filters: one per level, with false positive rates $p_0, p_1, p_2, \ldots$ (level 0 is the largest). A zero-result lookup probes every level's filter, so
$$\text{lookup cost} = \sum_i p_i$$

Memory footprint of one Bloom filter:
$$\text{false positive rate} = e^{-\frac{\text{bits}}{\text{entries}} \ln(2)^2} \quad\Longleftrightarrow\quad \text{bits} = -\text{entries} \cdot \frac{\ln(\text{false positive rate})}{\ln(2)^2}$$

With N entries and size ratio T, level i holds about $N/T^i$ entries, so its filter needs $\text{bits}(p_0, N)$, $\text{bits}(p_1, N/T)$, $\text{bits}(p_2, N/T^2)$, and so on. Summing over the levels:
$$\text{memory} = -c \cdot N \cdot \sum_i \frac{\ln(p_i)}{T^i}, \qquad c = \frac{1}{\ln(2)^2}$$

Optimize: choose the $p_i$ that minimize lookup cost for a given memory budget.
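A minimal Python sketch of this model, assuming level i holds exactly $N/T^i$ entries with level 0 the largest. The function names `bloom_bits`, `bloom_fpr` and `lsm_cost` and the example numbers are hypothetical; the code just evaluates the two formulas for a list of per-level false positive rates.

```python
import math

LN2_SQ = math.log(2) ** 2  # ln(2)^2 from the Bloom filter formula


def bloom_bits(fpr: float, entries: float) -> float:
    """Bits a Bloom filter needs for `entries` keys at false positive rate `fpr`."""
    return -entries * math.log(fpr) / LN2_SQ


def bloom_fpr(bits: float, entries: float) -> float:
    """Inverse of bloom_bits: false positive rate for a given bit budget."""
    return math.exp(-(bits / entries) * LN2_SQ)


def lsm_cost(fprs: list[float], n: float, t: float) -> tuple[float, float]:
    """Return (lookup cost, memory in bits) for per-level rates fprs[i],
    assuming level i holds n / t**i entries (level 0 is the largest)."""
    lookup = sum(fprs)  # one expected false positive probe per level
    memory = sum(bloom_bits(p, n / t**i) for i, p in enumerate(fprs))
    return lookup, memory


# Hypothetical example: 2**20 entries, size ratio 10, 3 levels, 1% at every level.
lookup, memory = lsm_cost([0.01] * 3, n=2**20, t=10)
print(f"lookup cost = {lookup:.3f}, memory = {memory / 8 / 1024:.0f} KiB")
```

A 1% filter costs about 9.6 bits per entry, which is why the memory term is dominated by the largest level.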
Monkey Bloom filters: false positive rates decrease exponentially toward the smaller levels: $p_0$, $p_0/T$, $p_0/T^2$, …

State-of-the-art Bloom filters: the same false positive rate p at every level.

At the same memory, Monkey gives the largest level a higher rate ($p_0 > p$) and the smaller levels lower rates ($p_0/T < p$, $p_0/T^2 < p$, …), so its lookup cost is lower:
$$\text{Monkey: } \sum_i p_i \;<\; \text{state of the art: } \sum_i p$$

Asymptotically:
$$\text{state of the art: } O\left(\log(N) \cdot e^{-M/N}\right) \qquad \text{Monkey: } O\left(e^{-M/N}\right)$$
where N is the number of entries and M is the overall memory for Bloom filters. This is an asymptotic win: Monkey's lookup cost increases at a slower rate as the data grows.
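To see the win concretely, the sketch below (hypothetical helper `monkey_rates`, same $N/T^i$ level-size assumption as the memory model) solves for the $p_0$ that spends exactly the memory a uniform rate p would, then compares lookup costs. Equating the two memory sums gives $\ln p_0 = \ln p + \ln T \cdot S_1 / S_0$, with $S_0 = \sum_i T^{-i}$ and $S_1 = \sum_i i\,T^{-i}$.

```python
def monkey_rates(p_uniform: float, t: float, levels: int) -> list[float]:
    """Rates p0, p0/T, p0/T^2, ... that use the same Bloom filter memory as
    p_uniform at every level (level i assumed to hold N / T^i entries)."""
    s0 = sum(t**-i for i in range(levels))      # sum of level sizes / N
    s1 = sum(i * t**-i for i in range(levels))  # size-weighted level index
    # Equal memory: ln(p0) * s0 - ln(T) * s1 == ln(p_uniform) * s0
    p0 = p_uniform * t ** (s1 / s0)
    if p0 >= 1:
        raise ValueError("memory budget too small: p0 would reach 1")
    return [p0 / t**i for i in range(levels)]


for levels in (3, 5, 7):
    uniform = levels * 0.01  # same rate p = 1% at every level
    monkey = sum(monkey_rates(0.01, t=10, levels=levels))
    print(f"{levels} levels: uniform {uniform:.4f}  monkey {monkey:.4f}")
```

As levels are added, the uniform cost grows linearly (levels × p) while Monkey's stays nearly flat, which is the $\log(N)$ factor disappearing.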
Monkey Bloom filters: the false positive rates $p_0, p_0/T, p_0/T^2, \ldots$ form a convergent geometric series, so the lookup cost stays bounded no matter how many levels there are.

Substituting these rates into the memory model for N entries,
$$\text{memory} = -c \cdot N \cdot \sum_i \frac{\ln(p_i)}{T^i}$$
collapses into a function of the lookup cost alone, up to constants that depend on the size ratio T:
$$\text{memory} \approx -c \cdot N \cdot \ln(\text{lookup cost})$$
This models the lookups vs. memory trade-off.
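The same idea run in reverse: this sketch (hypothetical `monkey_memory`, level i assumed to hold $N/T^i$ entries) picks $p_0$ so the rates sum to a target lookup cost and returns the bits required. Because every $p_i$ scales with $p_0$, halving the lookup cost always adds the same $N \cdot S_0 / \ln 2$ bits, i.e. memory grows linearly in $-\ln(\text{lookup cost})$.

```python
import math

LN2_SQ = math.log(2) ** 2


def monkey_memory(lookup_cost: float, n: float, t: float, levels: int) -> float:
    """Bloom filter bits so that rates p0, p0/T, ... sum to `lookup_cost`.
    Level i is assumed to hold n / t**i entries (level 0 is the largest)."""
    # Geometric series: sum_i T^-i < T / (T - 1) for any number of levels.
    s0 = sum(t**-i for i in range(levels))
    p0 = lookup_cost / s0
    if p0 >= 1:
        raise ValueError("lookup_cost too high for this model")
    return sum(-(n / t**i) * math.log(p0 / t**i) / LN2_SQ for i in range(levels))


n, t, levels = 2**20, 10, 5
for r in (0.1, 0.05, 0.025):
    bits = monkey_memory(r, n, t, levels)
    print(f"lookup cost {r:<6} -> {bits / n:.2f} bits per entry")
```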
Design space at fixed memory (plot: lookup cost vs. update cost): existing systems fall short of the Pareto frontier.
Problem 1: suboptimal filter allocation. The Bloom filter size sets the lookups vs. memory trade-off.
Problem 2: hard to tune. The merge policy sets the lookups vs. updates trade-off, and with it the achievable throughput.
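A Pareto frontier here is the set of designs that no other design beats on both lookup cost and update cost. A minimal sketch; the design names and cost pairs are made up purely for illustration.

```python
def pareto_frontier(designs: dict[str, tuple[float, float]]) -> list[str]:
    """Names of designs not dominated in (lookup cost, update cost).
    Lower is better on both axes; result is sorted by update cost."""
    frontier = []
    for name, (lookup, update) in designs.items():
        dominated = any(
            l2 <= lookup and u2 <= update and (l2, u2) != (lookup, update)
            for l2, u2 in designs.values()
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier, key=lambda nm: designs[nm][1])


# Hypothetical (lookup cost, update cost) points.
designs = {"A": (1.0, 5.0), "B": (2.0, 2.0), "C": (2.5, 3.0), "D": (4.0, 1.0)}
print(pareto_frontier(designs))  # C is dominated by B
```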
Monkey: Optimal Navigable Key-Value Store
Observations: LSM-tree Bloom filters in memory use the same fixed false positive rate at every level, and the merge policy makes ad-hoc trade-offs between lookups and updates.
Insights: lookup cost = $\sum_i p_i$, so the existing filter allocation is suboptimal.
Steps: optimize the allocation (asymptotically better lookups), navigate the memory vs. lookups and updates vs. lookups trade-offs, and answer what-if design questions.
Plot: lookup cost vs. update cost, from a log to a sorted array, comparing existing systems with Monkey.
Identify: the tuning knobs, merge policy and size ratio.
Map: how they trade lookups against updates, spanning the design space from a log to a sorted array, with the LSM-tree in between.
Navigate: given the workload and hardware, find the optimal design for maximum throughput.
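The Navigate step can be sketched as a brute-force search over merge policy and size ratio. The cost formulas are common LSM-tree approximations assumed for illustration, not taken from these slides: with L levels and B entries per page, leveling pays about T·L/B I/Os per update and probes one run per level, while tiering pays about L/B and probes up to T runs per level, with the same false positive rate on every run. Function names and defaults are hypothetical.

```python
from itertools import product


def num_levels(n: int, buffer: int, t: int) -> int:
    """Smallest L with buffer * t**L >= n (integer arithmetic, no float log)."""
    levels, capacity = 1, buffer * t
    while capacity < n:
        capacity *= t
        levels += 1
    return levels


def costs(policy: str, t: int, n: int, buffer: int, page: int, fpr: float):
    """Assumed approximations: (zero-result lookup I/Os, update I/Os per entry)."""
    levels = num_levels(n, buffer, t)
    if policy == "leveling":  # one run per level, rewritten up to T times
        return levels * fpr, t * levels / page
    if policy == "tiering":   # up to T runs per level, each entry merged once per level
        return t * levels * fpr, levels / page
    raise ValueError(policy)


def navigate(lookup_share: float, n=2**30, buffer=2**20, page=64, fpr=0.01):
    """(policy, size ratio) minimizing average I/O per operation for a
    workload whose fraction of lookups is `lookup_share`."""
    def avg_io(design):
        lookup, update = costs(*design, n, buffer, page, fpr)
        return lookup_share * lookup + (1 - lookup_share) * update
    return min(product(("leveling", "tiering"), range(2, 33)), key=avg_io)


for share in (0.0, 0.5, 1.0):
    print(share, navigate(share))
```

Minimizing average I/O per operation stands in for maximizing throughput; a real tuner would plug in measured hardware costs.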
Merge policies:
Leveling (read-optimized): one run per level; each flush or merge from the previous level is merged into that run, and each level is T times bigger than the previous one.
Tiering (write-optimized): up to T runs per level; once a level is full, its runs are merged and flushed as a single run into the next level, which is T times bigger.
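A small simulation, under simplifying assumptions (every flush writes `buffer` new entries with no overwritten keys, and level i holds up to buffer·T^(i+1) entries), counts how many entries each merge policy writes to storage. The function `simulate` is hypothetical.

```python
def simulate(policy: str, t: int, buffer: int, flushes: int) -> int:
    """Total entries written after `flushes` buffer flushes.
    Level i holds up to buffer * t**(i + 1) entries (T times bigger each level)."""
    written = 0
    levels: list[list[int]] = []  # run sizes per level, level 0 is the smallest
    for _ in range(flushes):
        incoming, i = buffer, 0
        while True:
            if i == len(levels):
                levels.append([])
            runs = levels[i]
            if policy == "leveling":
                # merge incoming data with the level's single run and rewrite it
                merged = incoming + sum(runs)
                written += merged
                levels[i] = [merged]
                full = merged >= buffer * t ** (i + 1)
            else:  # tiering: add a new run without touching existing ones
                written += incoming
                runs.append(incoming)
                full = len(runs) >= t
            if not full:
                break
            # level is full: merge its contents and push them one level down
            incoming = sum(levels[i])
            levels[i] = []
            i += 1
    return written


for policy in ("leveling", "tiering"):
    total = simulate(policy, t=4, buffer=1, flushes=64)
    print(policy, "write amplification:", total / 64)  # leveling 8.5, tiering 4.0
```

Tiering writes each entry about once per level, while leveling rewrites a level's run on every merge into it; the price of tiering is the extra runs a lookup must probe.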