
SLIDE 1

Scaling Saved Searches

Serving real-time push notifications for millions of saved searches

SLIDE 2

SLIDE 3

SLIDE 4

Who are we?

SLIDE 5

eBay Kleinanzeigen ≠ eBay

SLIDE 6

SLIDE 7

What are we?

SLIDE 8

ads = classified ads

SLIDE 9

SLIDE 10

SLIDE 11

SLIDE 12

SLIDE 13

SLIDE 14

SLIDE 15

some numbers

SLIDE 16

22M ads live!

SLIDE 17

18M searches/day

SLIDE 18

SLIDE 19

SLIDE 20

SLIDE 21

Saved Searches

Serving real-time push notifications for millions of saved searches

SLIDE 22

700k new ads/day
8M saved searches

SLIDE 23

48,000,000,000 theoretical matches a day!

SLIDE 24

process it!

SLIDE 25

What?

SLIDE 26

SLIDE 27

SLIDE 28

SLIDE 29

SLIDE 30

SLIDE 31

SLIDE 32

SLIDE 33

SLIDE 34

SLIDE 35

How?

SLIDE 36

* * 0/1 * * ?

SLIDE 37

real time?

SLIDE 38

scalable?

SLIDE 39

Can we do better?

SLIDE 40

2015

SLIDE 41

[Image: Elastic logo, source: https://www.esciencecenter.nl/img/main/logo-elastic.png]

SLIDE 42

Percolator

“Traditionally you design documents based on your data, store them into an index, and then define queries via the search API in order to retrieve these documents. The percolator works in the opposite direction. First you store queries into an index and then, via the percolate API, you define documents in order to retrieve these queries.”

Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html
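To make the reversed direction concrete, here is a minimal in-memory sketch of the percolation idea. It is not the Elasticsearch API: the `TinyPercolator` class, its `register` and `percolate` methods, and the sample searches are hypothetical names chosen for illustration, and matching is simplified to "every query term occurs in the ad text".

```python
from dataclasses import dataclass, field


@dataclass
class TinyPercolator:
    """Toy percolator: queries are stored, documents are matched against them."""

    # Maps a saved-search id to the set of lowercase terms it requires.
    queries: dict = field(default_factory=dict)

    def register(self, query_id: str, text: str) -> None:
        """Store a query (the 'index the query' step)."""
        self.queries[query_id] = set(text.lower().split())

    def percolate(self, document: str) -> list:
        """Return the ids of all stored queries whose terms all occur in the document."""
        words = set(document.lower().split())
        return sorted(qid for qid, terms in self.queries.items() if terms <= words)


percolator = TinyPercolator()
percolator.register("search-1", "copper lamp")  # hypothetical saved search
percolator.register("search-2", "bicycle")      # hypothetical saved search
matches = percolator.percolate("Antique copper lamp from Pankow")
```

Note that this toy version checks every stored query for every document, so its cost grows with the number of saved searches.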

SLIDE 43

SLIDE 44

SLIDE 45

SLIDE 46

SLIDE 47

SLIDE 48

SLIDE 49

SLIDE 50

SLIDE 51

SLIDE 52

SLIDE 53

SLIDE 54

SLIDE 55

SLIDE 56

SLIDE 57

SLIDE 58

SLIDE 59

How many pushes per day?

SLIDE 60

SLIDE 61

~3x

SLIDE 62

How?

SLIDE 63

700k new ads/day

SLIDE 64

match all?

SLIDE 65

ask search

SLIDE 66

how many results?

SLIDE 67

create buckets

SLIDE 68

0 - 100: RT
101 - 1000: 1h
1001 - 10000: 2h
> 10000: 6h
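The bucket table can be read as a lookup from a search's result count to its push interval. The sketch below assumes that RT stands for real time and represents it as `None`; the names `BUCKETS`, `FALLBACK_INTERVAL` and `push_interval` are illustrative, not taken from the talk.

```python
from datetime import timedelta
from typing import Optional

# (upper bound on result count, push interval); None is assumed to mean "real time".
BUCKETS = [
    (100, None),
    (1000, timedelta(hours=1)),
    (10000, timedelta(hours=2)),
]
FALLBACK_INTERVAL = timedelta(hours=6)  # more than 10000 results


def push_interval(result_count: int) -> Optional[timedelta]:
    """Return how long a search sleeps between pushes, or None for real time."""
    for upper_bound, interval in BUCKETS:
        if result_count <= upper_bound:
            return interval
    return FALLBACK_INTERVAL


interval_for_broad_search = push_interval(25000)
```

A narrow search with few results is pushed immediately, while a very broad one is batched into fewer notifications.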

SLIDE 69

...

SLIDE 70

lifetime of a search

SLIDE 71

SLIDE 72

SLIDE 73

sleep ...

ZZZZZZ

SLIDE 74

ZZZZZZ

SLIDE 75

SLIDE 76

SLIDE 77

SLIDE 78

Setup

SLIDE 79

Setup

SLIDE 80

SLIDE 81

cloud

SLIDE 82

2 data centers

SLIDE 83

2 data centers
10 data + 3 master

SLIDE 84

2 data centers
10 data + 3 master

SLIDE 85

replication x1
shards x80

SLIDE 86

SLIDE 87

SLIDE 88

SLIDE 89

SOLVED ES5

SLIDE 90

skip on overload

SLIDE 91

SLIDE 92

elastic fast on indexing

SLIDE 93

filter sleeping searches

SLIDE 94

SLIDE 95

metadata

SLIDE 96

filter: { "next_pushdate": [* TO NOW] }
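The effect of this metadata filter can be sketched in plain Python: only searches whose `next_pushdate` has already passed are considered, and sleeping ones are skipped before any matching happens. This is an in-memory illustration, not an Elasticsearch request; the `awake_searches` function and the sample records are assumptions.

```python
from datetime import datetime, timedelta, timezone


def awake_searches(searches, now):
    """Keep only searches whose next_pushdate is at or before `now`."""
    return [s for s in searches if s["next_pushdate"] <= now]


now = datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc)
searches = [
    {"id": "a", "next_pushdate": now - timedelta(hours=1)},  # due: awake
    {"id": "b", "next_pushdate": now + timedelta(hours=2)},  # still sleeping
    {"id": "c", "next_pushdate": now},                       # due right now
]
awake_ids = [s["id"] for s in awake_searches(searches, now)]
```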

SLIDE 97

only 30% of searches are online

SLIDE 98

desktop

SLIDE 99

avoid db-read per search

SLIDE 100

hash per search

SLIDE 101

bloom filter in cookie

SLIDE 102

[Image: Bloom filter diagram, source: https://upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Bloom_filter.svg/2000px-Bloom_filter.svg.png]
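One plausible reading of "hash per search" plus "bloom filter in cookie" is that the hashes of a user's saved searches are packed into a small Bloom filter, which travels in a cookie so that the site can test "is this search probably saved?" without a database read. The sketch below illustrates that reading only; the `BloomFilter` class, its sizes, and the cookie encoding are assumptions, not the implementation from the talk.

```python
import base64
import hashlib


class BloomFilter:
    """Small Bloom filter that can be serialized into a cookie-sized string."""

    def __init__(self, size_bits=256, num_hashes=3, bits=0):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bits  # the bit array, kept as one Python integer

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely not added"; True means "probably added".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def to_cookie_value(self):
        raw = self.bits.to_bytes(self.size_bits // 8, "big")
        return base64.urlsafe_b64encode(raw).decode("ascii")

    @classmethod
    def from_cookie_value(cls, value, size_bits=256, num_hashes=3):
        raw = base64.urlsafe_b64decode(value.encode("ascii"))
        return cls(size_bits, num_hashes, int.from_bytes(raw, "big"))


saved = BloomFilter()
saved.add("cars berlin")  # hypothetical saved-search key
restored = BloomFilter.from_cookie_value(saved.to_cookie_value())
```

A Bloom filter never reports a saved search as missing, but it can occasionally report an unsaved one as present, so a positive answer is only a hint.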

SLIDE 103

apps

SLIDE 104

deep link on result size

SLIDE 105

SLIDE 106

5

SLIDE 107

5 1

SLIDE 108

store searches locally

SLIDE 109

backend sync on actions

SLIDE 110

Saved Search

SLIDE 111

Stable?

SLIDE 112

Stabilize elastic

SLIDE 113

Boost your percolator!

Tips & Tricks

SLIDE 114

“This indeed seems like a large application of percolate.”

Elastic support, June 2015

SLIDE 115

Performance linear with number of queries

SLIDE 116

1. Consider using other systems.

SLIDE 117

1. Consider using other systems.

“It is worth noting that simple exist matches on a field are probably not a great application for percolator. This doesn’t utilize any text matching capability or complex boolean.”

Anything, anywhere!

Every ad offering something for free!

SLIDE 118

1. Consider using other systems.

SLIDE 119

2. Optimise your data structure.

SLIDE 120

2. Optimise your data structure.

SLIDE 121

2. Optimise your data structure.

SLIDE 122

3. Filter, filter, filter!

SLIDE 123

3. Filter, filter, filter!

“The filter only works on the metadata fields. The query field isn’t indexed by default.”

SLIDE 124

3. Filter, filter, filter!

CATEGORY: cars
CATEGORY: all
CATEGORY: cars OR all
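The category lines can be read as a pre-filter: an ad in category "cars" only needs to be checked against searches restricted to "cars" and searches for "all" categories. The sketch below shows that reading in memory; the `candidate_searches` function, the `category` field, and the sample data are assumptions for illustration.

```python
def candidate_searches(searches, ad_category):
    """Keep searches that target the ad's category or every category ("all")."""
    wanted = {ad_category, "all"}
    return [s for s in searches if s["category"] in wanted]


searches = [
    {"id": 1, "category": "cars"},
    {"id": 2, "category": "all"},
    {"id": 3, "category": "furniture"},
]
candidate_ids = [s["id"] for s in candidate_searches(searches, "cars")]
```

Every search removed by the filter is one query the percolator does not have to evaluate for this ad.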

SLIDE 125

3. Filter, filter, filter!

… what else can we filter?

SLIDE 126

3. Filter, filter, filter!

SLIDE 127

4. Use bulk requests.

SLIDE 128

5. Use parallel bulk requests.

SLIDE 129

5. Use parallel bulk requests.

[Diagram labels: index; node1: A1; node2: A2]

SLIDE 130

5. Use parallel bulk requests.

“Currently, to utilise all of your shards, you would need to consider sending multipercolate requests in parallel.”

[Diagram labels: index; node1: A1; node2: A2]

Source: https://github.com/elastic/elasticsearch/issues/13177
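A minimal sketch of the "parallel bulk requests" idea: split the new ads into bulks and send several bulks concurrently. The `send_bulk` function is a hypothetical stand-in for one multi-percolate request and returns no matches; the bulk size and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split items into consecutive bulks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def send_bulk(ads):
    """Hypothetical stand-in for one multi-percolate request.

    A real implementation would send the bulk to the cluster and return the
    matching saved-search ids per ad; here every ad matches nothing.
    """
    return {ad: [] for ad in ads}


def percolate_in_parallel(ads, bulk_size=2, workers=4, send=send_bulk):
    """Send all bulks concurrently and merge the per-ad results."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields the bulk results in submission order.
        for bulk_result in pool.map(send, chunked(ads, bulk_size)):
            results.update(bulk_result)
    return results


results = percolate_in_parallel(["ad-1", "ad-2", "ad-3"])
```

Threads are sufficient here because the work is dominated by waiting on network responses rather than by local computation.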

SLIDE 131

5. Use parallel bulk requests.

SLIDE 132

6. Degrade gracefully

SLIDE 133

6. Degrade gracefully

Matthias: Antique copper lamps in Pankow
André: Cars in Berlin

SLIDE 134

6. Degrade gracefully

André: Cars in Berlin
Matthias: Antique copper lamps in Pankow

SLIDE 135

6. Degrade gracefully

HIGH PRIORITY
LOW PRIORITY
André: Cars in Berlin
Matthias: Antique copper lamps in Pankow
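One way to degrade gracefully with two priority classes is to serve high-priority searches first and skip whatever does not fit into the current capacity. The sketch below is an assumption about the mechanism, not the talk's implementation: the transcript does not say which search belongs to which class, so the task names, the `budget` parameter and `select_within_budget` are all hypothetical.

```python
import heapq

HIGH, LOW = 0, 1  # smaller number = more important


def select_within_budget(tasks, budget):
    """Return the names of the `budget` most important tasks.

    Tasks of equal priority keep their arrival order; everything that does
    not fit into the budget is skipped for this round.
    """
    heap = [(priority, arrival, name) for arrival, (name, priority) in enumerate(tasks)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(min(budget, len(heap)))]


tasks = [("search-a", LOW), ("search-b", HIGH), ("search-c", HIGH)]
served = select_within_budget(tasks, 2)
```

Under normal load the budget covers every task, so nothing is skipped; only under overload do low-priority searches wait.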

SLIDE 136

6. Degrade gracefully

SLIDE 137

Outcome

Reduced percolation time:

SLIDE 138

Outcome

Doubled the number of push notifications:

SLIDE 139

Stabilize elastic

SLIDE 140

Stable?

SLIDE 141

8,000,000 searches
700,000 ads/day

SLIDE 142

Stabilize platform

SLIDE 143

eBayK saved searches go 2016

architecture

SLIDE 144

Before: one DB rules it all

Diagram labels: MySQL

SLIDE 145

Before: one DB rules it all

Diagram labels: create saved search; MySQL

SLIDE 146

Before: one DB rules it all

Diagram labels: create saved search; change saved search; MySQL

SLIDE 147

Before: one DB rules it all

Diagram labels: create ad; create saved search; change saved search; MySQL

SLIDE 148

Before: one DB rules it all

Diagram labels: create ad; create saved search; change saved search; MySQL; found match

SLIDE 149

Before: one DB rules it all

Diagram labels: create ad; create saved search; change saved search; MySQL; got push; found match

SLIDE 150

Before...

Diagram labels: MySQL

SLIDE 151

Before...

Diagram labels: MySQL; AwakeJob

SLIDE 152

Before...

Diagram labels: MySQL; AwakeJob; SendJob; CreateJob

SLIDE 153

Before...

Diagram labels: MySQL; CleanupJob; AwakeJob; SendJob; IndexerJob; CreateJob; ExpireJob

SLIDE 154

Before: bottleneck

communication via DB
super high performance, resiliency, scalability ..?

SLIDE 155

Goal: event-driven data pipeline

SLIDE 156

SLIDE 157

What is Apache Kafka?

distributed messaging system
- persistent
- high throughput

Diagram labels: Topic 1; Topic 2; Producer (x2); Consumer (x3)
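The "persistent" property can be illustrated with a toy model: a topic is an append-only log, and each consumer keeps its own read offset, so a message stays available to other consumers after one of them has read it. This is an in-memory sketch, not the Kafka client API; the `Topic` and `Consumer` classes and the event names are hypothetical.

```python
class Topic:
    """Append-only log of messages."""

    def __init__(self):
        self.log = []

    def produce(self, message):
        """Append a message and return its offset in the log."""
        self.log.append(message)
        return len(self.log) - 1


class Consumer:
    """Reads a topic from its own offset, independently of other consumers."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self):
        """Return all messages not yet seen by this consumer."""
        messages = self.topic.log[self.offset:]
        self.offset = len(self.topic.log)
        return messages


ads = Topic()
ads.produce("ad-created:1")
matcher = Consumer(ads)
first_batch = matcher.poll()
ads.produce("ad-created:2")
second_batch = matcher.poll()
late_consumer = Consumer(ads)
replayed = late_consumer.poll()  # a new consumer still sees the full log
```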

SLIDE 158

But what’s new?

SLIDE 159

But what’s new?

1 2 3

SLIDE 160

Now: streams and data flows

Diagram labels: percolate create ad

SLIDE 161

Now: streams and data flows

Diagram labels: percolate create ad found match

SLIDE 162

Now: streams and data flows

Diagram labels: percolate process match create ad found match

SLIDE 163

Now: streams and data flows

Diagram labels: percolate process push create ad found match

SLIDE 164

Now: streams and data flows

Diagram labels: percolate process push create ad found match MySQL

SLIDE 165

Now: streams and data flows

Diagram labels: percolate process push create ad found match MySQL

SLIDE 166

Compaction

SLIDE 167

Compaction: Kafka == source of truth?

SLIDE 168

Compaction: Kafka == source of truth?

Log over time: A: 23, B: 12, B: null, C:, A: 24

SLIDE 169

Compaction: Kafka == source of truth?

Log over time: A: 23, B: 12, B: null, C:, A: 24
Compacted log: A: 24, C:

SLIDE 170

Compaction: Kafka == source of truth?

Compacted log over time: A: 24, C:

SLIDE 171

Compaction: Kafka == source of truth?

Diagram labels: Consumer; A: 24; C:

SLIDE 172

Compaction: Kafka == source of truth?

Diagram labels: Consumer; A: 24; C:

SLIDE 173

Compaction: Kafka == source of truth?

Diagram labels: Consumer; A: 24; C:
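The compaction example can be replayed in a few lines: for each key only the latest value survives, and a null value acts as a delete marker. This is a simplified sketch of the idea, not Kafka's implementation (real compaction runs in the background and keeps delete markers around for a while). The `compact` function is a hypothetical name, and the empty string for key C is a placeholder because the value is not legible in the transcript.

```python
def compact(log):
    """Reduce a key/value log to the latest value per key.

    A value of None is treated as a delete marker (tombstone) for its key.
    """
    latest = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)  # tombstone: forget the key
        else:
            latest[key] = value
    return latest


# Log from the slide; "" stands in for C's value, which the transcript omits.
log = [("A", 23), ("B", 12), ("B", None), ("C", ""), ("A", 24)]
state = compact(log)
```

A consumer that starts from the compacted log ends up with the same state as one that replayed every record, which is what makes a compacted topic usable as a source of truth.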

SLIDE 174

Issues encountered

SLIDE 175

Issues encountered

latency - used local cache

SLIDE 176

Issues encountered

some components couldn’t keep up - spot-on optimisation
latency - used local cache

SLIDE 177

Issues encountered

some components couldn’t keep up - spot-on optimisation
out of order writes - ?
latency - used local cache

SLIDE 178

wrap up

SLIDE 179

simplicity
fine tune elastic
use streaming

SLIDE 180

Thank you

SLIDE 181

SLIDE 182

References

“Building LinkedIn’s Real-time Activity Data Pipeline”, Ken Goodhope, Joel Koshy, Jay Kreps, Neha Narkhede, Richard Park, Jun Rao, Victor Yang Ye