Leveraging bloom filters on Redis Cristian Castiblanco - - PowerPoint PPT Presentation

Leveraging bloom filters on Redis

Cristian Castiblanco

me@cristian.io | cristian@scopely.com

https://cristian.io

Stream processing at Scopely

Idempotence

An operation is idempotent when applying it multiple times has the same effect as applying it once.

Simplest approach to idempotence

Idempotence with Redis sets
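
A minimal sketch of the idea, assuming the idempotence store exposes Redis `SADD` semantics: `SADD` returns how many members were actually added, so 1 means the ID is new and 0 means it was already recorded (redis-py's `Redis.sadd(name, *values)` returns that count). `InMemorySetStore`, `process_once` and the key name are hypothetical; the in-memory store only stands in for Redis so the example runs without a server.

```python
from typing import Callable, Dict, Set


class InMemorySetStore:
    """Hypothetical stand-in for a Redis client; mimics the return value of SADD."""

    def __init__(self) -> None:
        self._sets: Dict[str, Set[str]] = {}

    def sadd(self, name: str, *values: str) -> int:
        members = self._sets.setdefault(name, set())
        added = 0
        for value in values:
            if value not in members:
                members.add(value)
                added += 1
        return added  # number of members that were not already present


def process_once(store, key: str, event_id: str, handle: Callable[[str], None]) -> bool:
    """Call `handle` only the first time `event_id` is recorded under `key`.

    In Redis, SADD is a single atomic check-and-insert, so two workers racing
    on the same event cannot both see "newly added".
    """
    if store.sadd(key, event_id) == 1:
        handle(event_id)
        return True
    return False  # duplicate delivery: skip it


if __name__ == "__main__":
    store = InMemorySetStore()
    processed = []
    for event in ["a", "b", "a", "c", "b"]:
        process_once(store, "seen_events", event, processed.append)
    print(processed)  # ['a', 'b', 'c']
```

Recording the ID before calling the handler means a crash inside the handler leaves the event marked but unprocessed; recording it afterwards risks processing it twice. Which way to err depends on the pipeline.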

Memory usage per idempotence store

320 million records/day ≈ 70GB of memory

Is there a better way?

  • Space-efficient
  • Cost-effective
  • More performant
  • Awesome

Enter bloom filters

Probabilistic data structure to check for item membership

Querying a bloom filter

  • Definitely not in the set
  • Probably in the set
  • Configurable error rate
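
Those answers fall out of the structure: each item is hashed to k positions in an m-bit array; adding sets those bits, and a lookup answers "definitely not" if any bit is clear and "probably" if all are set (other items may have set them). Below is a minimal in-process sketch, not ReBloom's implementation. The sizing formulas are the standard textbook ones; deriving the k positions from two slices of one SHA-256 digest (double hashing) is a choice made here for brevity.

```python
import hashlib
import math


class BloomFilter:
    """Minimal in-process bloom filter (illustrative sketch, not ReBloom)."""

    def __init__(self, capacity: int, error_rate: float) -> None:
        # Textbook sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit indexes from two 64-bit hash values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely never added. True: probably added.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(capacity=1000, error_rate=0.01)
    bf.add("event-1")
    print(bf.might_contain("event-1"))  # True
```

Items that were added always report `True`; items that were never added usually report `False`, with the configured probability of a wrong `True`.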

Bloom filter space efficiency

Given 10,000,000 UUIDs:

  • Redis set: 1 GB
  • Plain text: ~300 MB
  • gzip: ~150 MB
  • Bloom filter with 1e-05 error rate (i.e., 1 in 100,000): ~30 MB
  • Bloom filter with 1e-11 error rate (i.e., 1 in 100 billion): ~60 MB
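
The bloom filter sizes can be sanity-checked with the textbook formula for an optimally configured filter, m = -n ln p / (ln 2)^2 bits for n items at false positive probability p, with k = (m / n) ln 2 hash functions. A quick calculation, assuming decimal megabytes and ignoring per-implementation overhead or rounding (real implementations such as ReBloom may size differently):

```python
import math


def bloom_bits(capacity: int, error_rate: float) -> int:
    """Optimal number of bits: m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)


def bloom_hashes(capacity: int, error_rate: float) -> int:
    """Optimal number of hash functions: k = (m / n) * ln 2."""
    return max(1, round(bloom_bits(capacity, error_rate) / capacity * math.log(2)))


if __name__ == "__main__":
    n = 10_000_000  # 10 million UUIDs, as on the slide
    for p in (1e-5, 1e-11):
        megabytes = bloom_bits(n, p) / 8 / 1e6
        print(f"p={p:g}: {megabytes:.1f} MB, k={bloom_hashes(n, p)}")
```

This gives about 30 MB (17 hashes) for 1e-05 and about 66 MB (37 hashes) for 1e-11, the same ballpark as the figures above. Note that memory grows with log(1/p), so making errors a million times rarer only roughly doubles the size.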

Memory usage comparison

Redis sets: 70 GB vs. bloom filters: 7 GB

Latency comparison

[Chart: latency of Redis sets vs. bloom filters]

Bloom filters example

False positive == dropped data: a new record that the filter wrongly reports as already seen is skipped.
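
To make that failure mode concrete, the sketch below uses a deliberately undersized filter: 16 bits and 2 hash positions (hypothetical values chosen only to force collisions), with a Python set of bit positions standing in for the bit array. It scans fresh IDs until one is reported as present even though it was never added; a dedup step trusting the filter would skip that event.

```python
import hashlib

NUM_BITS = 16   # hypothetical, deliberately tiny to provoke collisions
NUM_HASHES = 2  # hypothetical


def positions(item: str):
    # Derive NUM_HASHES bit positions from the first bytes of one SHA-256 digest.
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return [digest[i] % NUM_BITS for i in range(NUM_HASHES)]


def find_dropped_event(seen_ids):
    bits = set()  # stands in for the bit array: positions that are set
    for event_id in seen_ids:
        bits.update(positions(event_id))
    # Scan fresh IDs until one lands entirely on bits set by other items.
    for i in range(100_000):
        candidate = f"new-event-{i}"
        if candidate not in seen_ids and all(p in bits for p in positions(candidate)):
            return candidate  # never seen, yet the filter says "probably seen"
    return None


if __name__ == "__main__":
    seen = {f"event-{i}" for i in range(5)}
    print("would be dropped:", find_dropped_event(seen))
```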

Bloom filter characteristics

  • Capacity
  • Error rate (false positive probability)

Scaling bloom filters
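
One widely used way to scale is the scalable bloom filter described by Almeida et al.: when the current filter reaches its capacity, append a new, larger filter with a tighter error rate, and check all of them on lookup. This is a named technique swapped in for illustration, not necessarily the design in the talk; the growth factor of 2 and tightening ratio of 0.5 are assumptions.

```python
import hashlib
import math


class _Filter:
    """One fixed-size sub-filter with textbook sizing."""

    def __init__(self, capacity: int, error_rate: float) -> None:
        self.capacity = capacity
        self.count = 0
        self.m = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        d = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)
        self.count += 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


class ScalableBloomFilter:
    def __init__(self, initial_capacity: int, error_rate: float,
                 growth: int = 2, tightening: float = 0.5) -> None:
        self.error_rate = error_rate
        self.growth = growth          # assumed: each new sub-filter is 2x larger
        self.tightening = tightening  # assumed: each new sub-filter's p is halved
        # p0 = p * (1 - r), so p0 + p0*r + p0*r^2 + ... stays <= p (union bound).
        self.filters = [_Filter(initial_capacity, error_rate * (1 - tightening))]

    def __contains__(self, item: str) -> bool:
        return any(item in f for f in self.filters)

    def add(self, item: str) -> bool:
        """Add item; return False if it was (probably) already present."""
        if item in self:
            return False
        current = self.filters[-1]
        if current.count >= current.capacity:
            i = len(self.filters)
            next_p = self.error_rate * (1 - self.tightening) * self.tightening ** i
            current = _Filter(current.capacity * self.growth, next_p)
            self.filters.append(current)
        current.add(item)
        return True
```

Every lookup now touches every sub-filter, so a filter that keeps growing gets slower to query; a reasonable initial capacity keeps the chain short.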

Tuning bloom filters

Filter size depends on the capacity and the target error probability

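  • False positive probability:
      • Depends on your use case
  • Initial capacity:
      • Can't be too generous (unused capacity still costs memory)
      • Can't be too conservative (once a filter fills up, its error rate climbs or it has to scale)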
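
The capacity trade-off can be quantified with the standard approximation for the false positive rate of a filter with m bits and k hash functions after n insertions, p ≈ (1 - e^(-kn/m))^k. The sketch below sizes a filter for a hypothetical 1,000,000 items at 1e-05 and evaluates that approximation as inserts go past the planned capacity.

```python
import math


def sized_filter(capacity: int, error_rate: float):
    """Return (m, k) for the textbook optimal sizing."""
    m = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
    k = max(1, round(m / capacity * math.log(2)))
    return m, k


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Approximate false positive probability after n inserts: (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    capacity, target = 1_000_000, 1e-5  # hypothetical sizing target
    m, k = sized_filter(capacity, target)
    for load in (0.5, 1.0, 2.0, 4.0):
        n = int(capacity * load)
        print(f"{load}x capacity: p ~ {false_positive_rate(m, k, n):.2e}")
```

With these parameters the rate is about 1e-05 at the planned capacity but close to 1% at twice the capacity, which is why an overly conservative estimate hurts; an overly generous one keeps the rate low but pays for bits that are never used.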

First attempt: Lua scripts

Second attempt: bloomd

github.com/armon/bloomd

bloomd drawbacks

  • Lack of High Availability
  • No clustering support
  • Maintenance
  • Rigid API
  • Feels like abandonware

ReBloom

Bloom filters as a Redis module

ReBloom example

    > BF.RESERVE your_filter 0.00001 50000000
    OK
    > BF.ADD your_filter foo
    1
    > BF.EXISTS your_filter foo
    1
    > BF.EXISTS your_filter bar
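
From application code, the same commands can go through any Redis client. A minimal sketch assuming a redis-py `Redis` client and a server with the ReBloom module loaded; it uses only the generic `execute_command` and `exists` methods, so it does not depend on module-specific helpers. The filter name, error rate, capacity and `handle` callback are hypothetical. `BF.ADD` returns 1 when the item is newly added and 0 when it may already have been present, so a single call works as an atomic check-and-mark.

```python
from typing import Callable


def ensure_filter(client, name: str, error_rate: float, capacity: int) -> None:
    """Reserve the filter if it does not exist yet.

    `client` is assumed to be a redis-py `Redis` instance. The exists-then-reserve
    check is racy across workers: a concurrent BF.RESERVE on an existing key fails.
    """
    if not client.exists(name):
        client.execute_command("BF.RESERVE", name, error_rate, capacity)


def process_once(client, name: str, event_id: str, handle: Callable[[str], None]) -> bool:
    # BF.ADD returns 1 if the item was newly added, 0 if it may already be present.
    if client.execute_command("BF.ADD", name, event_id) == 1:
        handle(event_id)
        return True
    return False  # probably a duplicate (or a false positive, i.e. dropped data)
```

Usage would be something like `ensure_filter(redis.Redis(), "your_filter", 0.00001, 50_000_000)` once at startup, then `process_once(...)` per incoming event. Reserving explicitly matters: if `BF.ADD` hits a missing key, ReBloom creates a filter with its default capacity and error rate, which is unlikely to match an idempotence workload.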

ReBloom

  • Clustering
  • Redundancy/replication
  • Lower cognitive overhead
  • Powerful API
  • No maintenance

Summary

  • Bloom filters significantly reduce memory usage (here, from 70 GB to 7 GB) and latency
  • Redis modules allow your custom data structures to scale

github.com/casidiablo
cristian.io