Scaling Saved Searches Serving real time push-notifications for - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Scaling Saved Searches

Serving real time push-notifications for millions of saved searches
slide-2
SLIDE 2
slide-3
SLIDE 3
slide-4
SLIDE 4

Who are we?

slide-5
SLIDE 5

ebay kleinanzeigen ≠ ebay

slide-6
SLIDE 6
slide-7
SLIDE 7

What are we?

slide-8
SLIDE 8

ads = classified ads

slide-9
SLIDE 9
slide-10
SLIDE 10
slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14
slide-15
SLIDE 15

some numbers

slide-16
SLIDE 16

22M ads live!

slide-17
SLIDE 17

18M searches/day

slide-18
SLIDE 18
slide-19
SLIDE 19
slide-20
SLIDE 20
slide-21
SLIDE 21

Saved Searches

slide-22
SLIDE 22

700k new ads/day
8M saved searches

slide-23
SLIDE 23

48.000.000.000 theoretical matches a day!

slide-24
SLIDE 24

process it!

slide-25
SLIDE 25

What?

slide-26
SLIDE 26
slide-27
SLIDE 27
slide-28
SLIDE 28
slide-29
SLIDE 29
slide-30
SLIDE 30
slide-31
SLIDE 31
slide-32
SLIDE 32
slide-33
SLIDE 33
slide-34
SLIDE 34
slide-35
SLIDE 35

How?

slide-36
SLIDE 36

* * 0/1 * * ?

slide-37
SLIDE 37

realtime?

slide-38
SLIDE 38

scalable?

slide-39
SLIDE 39

Can we do better?

slide-40
SLIDE 40

2015

slide-41
SLIDE 41

[Image: Elastic logo]

slide-42
SLIDE 42

Percolator

“Traditionally you design documents based on your data, store them into an index, and then define queries via the search API in order to retrieve these documents. The percolator works in the opposite direction. First you store queries into an index and then, via the percolate API, you define documents in order to retrieve these queries.”

Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html

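To make the "opposite direction" concrete, here is a minimal in-memory sketch of the idea. It is not the Elasticsearch percolate API: the class `ToyPercolator`, its methods, and the term-set query model are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPercolator:
    """Stores queries first, then matches incoming documents against them."""
    queries: dict = field(default_factory=dict)

    def register(self, query_id, required_terms):
        # A "query" here is just a set of terms that must all appear in the text.
        self.queries[query_id] = {t.lower() for t in required_terms}

    def percolate(self, document_text):
        # Reverse search: one document in, ids of all matching queries out.
        words = set(document_text.lower().split())
        return sorted(qid for qid, terms in self.queries.items() if terms <= words)


percolator = ToyPercolator()
percolator.register("search-1", ["copper", "lamp"])
percolator.register("search-2", ["bicycle"])
matches = percolator.percolate("Antique copper lamp for sale")
```

Each new ad plays the role of the document and each saved search the role of a stored query, so the cost of one `percolate` call grows with the number of stored queries.
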
slide-43
SLIDE 43
slide-44
SLIDE 44
slide-45
SLIDE 45
slide-46
SLIDE 46
slide-47
SLIDE 47
slide-48
SLIDE 48
slide-49
SLIDE 49
slide-50
SLIDE 50
slide-51
SLIDE 51
slide-52
SLIDE 52
slide-53
SLIDE 53
slide-54
SLIDE 54
slide-55
SLIDE 55
slide-56
SLIDE 56
slide-57
SLIDE 57
slide-58
SLIDE 58
slide-59
SLIDE 59

How many pushes per day?

slide-60
SLIDE 60
slide-61
SLIDE 61

~3x

slide-62
SLIDE 62

How?

slide-63
SLIDE 63

700k new ads/day

slide-64
SLIDE 64

match all?

slide-65
SLIDE 65

ask search

slide-66
SLIDE 66

how many results?

slide-67
SLIDE 67

create buckets

slide-68
SLIDE 68

0 - 100: RT
101 - 1000: 1h
1001 - 10000: 2h
> 10000: 6h
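The bucket table can be read as a lookup from a search's result count to its push interval. A small sketch of that lookup follows. The function and constant names are illustrative, and "RT" is read as real time and modelled as a zero delay, which is an assumption.

```python
from datetime import timedelta

# Upper bound of each bucket (inclusive) and its push interval, as on the slide.
# "RT" is interpreted as real time and modelled as zero delay (an assumption).
BUCKETS = [
    (100, timedelta(0)),
    (1000, timedelta(hours=1)),
    (10000, timedelta(hours=2)),
]
FALLBACK_INTERVAL = timedelta(hours=6)  # more than 10000 results


def push_interval(result_count):
    """Return how long to wait between pushes for a search with this many results."""
    for upper_bound, interval in BUCKETS:
        if result_count <= upper_bound:
            return interval
    return FALLBACK_INTERVAL
```

Narrow searches with few results stay in the real-time bucket, while broad searches are checked less often.
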

slide-69
SLIDE 69

...

slide-70
SLIDE 70

lifetime of a search

slide-71
SLIDE 71
slide-72
SLIDE 72
slide-73
SLIDE 73

sleep ...

ZZZZZZ
slide-74
SLIDE 74

ZZZZZZ

slide-75
SLIDE 75
slide-76
SLIDE 76
slide-77
SLIDE 77
slide-78
SLIDE 78

Setup

slide-79
SLIDE 79

Setup

slide-80
SLIDE 80
slide-81
SLIDE 81

cloud

slide-82
SLIDE 82

2 data centers

slide-83
SLIDE 83

2 data centers
10 data + 3 master

slide-84
SLIDE 84

2 data centers
10 data + 3 master

slide-85
SLIDE 85

replication x1
shards x80

slide-86
SLIDE 86
slide-87
SLIDE 87
slide-88
SLIDE 88
slide-89
SLIDE 89

SOLVED ES5

slide-90
SLIDE 90

skip on overload

slide-91
SLIDE 91
slide-92
SLIDE 92

elastic fast on indexing

slide-93
SLIDE 93

filter sleeping searches

slide-94
SLIDE 94
slide-95
SLIDE 95

metadata

slide-96
SLIDE 96

filter: { "next_pushdate": [* TO NOW] }
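The filter keeps only searches whose next push date has already passed, so sleeping searches are never evaluated. A minimal local sketch of that behaviour follows. The field name `next_pushdate` comes from the slide. The helper names and the dict-based record layout are assumptions.

```python
from datetime import datetime, timedelta


def awake_searches(saved_searches, now):
    """Keep only searches whose next_pushdate is not in the future."""
    return [s for s in saved_searches if s["next_pushdate"] <= now]


def put_to_sleep(search, now, interval):
    """After a push, move next_pushdate forward so the search is skipped until then."""
    return {**search, "next_pushdate": now + interval}


now = datetime(2016, 1, 1, 12, 0)
searches = [
    {"id": "a", "next_pushdate": now - timedelta(minutes=5)},
    {"id": "b", "next_pushdate": now + timedelta(hours=2)},
]
awake_ids = [s["id"] for s in awake_searches(searches, now)]
```
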

slide-97
SLIDE 97

only 30% searches are online

slide-98
SLIDE 98

desktop

slide-99
SLIDE 99

avoid db-read per search

slide-100
SLIDE 100

hash per search

slide-101
SLIDE 101

bloom filter in cookie
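One way to read these three slides: hash each saved search, put the hashes into a Bloom filter, and ship the filter in a cookie so the desktop site can tell "is this search already saved?" without a database read. The sketch below works under that reading. The sizes, the SHA-256 based hashing, the base64 cookie encoding, and the `search_hash` helper are illustrative choices, not details from the talk.

```python
import base64
import hashlib


class BloomFilter:
    def __init__(self, size_bits=256, num_hashes=3, bits=0):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bits  # bit array stored as one Python integer

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely not saved"; True means "possibly saved".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def to_cookie(self):
        raw = self.bits.to_bytes(self.size_bits // 8, "big")
        return base64.urlsafe_b64encode(raw).decode("ascii")

    @classmethod
    def from_cookie(cls, value, size_bits=256, num_hashes=3):
        raw = base64.urlsafe_b64decode(value.encode("ascii"))
        return cls(size_bits, num_hashes, int.from_bytes(raw, "big"))


def search_hash(params):
    # Hypothetical "hash per search": a stable digest of the sorted search parameters.
    canonical = "&".join(f"{k}={params[k]}" for k in sorted(params))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


saved = BloomFilter()
saved.add(search_hash({"q": "copper lamp", "location": "Pankow"}))
cookie_value = saved.to_cookie()
restored = BloomFilter.from_cookie(cookie_value)
```

A Bloom filter can return false positives but never false negatives, so a "possibly saved" answer may still need confirmation, while a "definitely not saved" answer needs no lookup at all.
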

slide-102
SLIDE 102

[Image: Bloom filter diagram, https://upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Bloom_filter.svg/2000px-Bloom_filter.svg.png]

slide-103
SLIDE 103

apps

slide-104
SLIDE 104

deep link on result size

slide-105
SLIDE 105
slide-106
SLIDE 106

5

slide-107
SLIDE 107

5 1

slide-108
SLIDE 108

store searches local

slide-109
SLIDE 109

backend sync on actions

slide-110
SLIDE 110

Saved Search

slide-111
SLIDE 111

Stable?

slide-112
SLIDE 112

Stabilize elastic

slide-113
SLIDE 113

Boost your percolator!

Tips & Tricks
slide-114
SLIDE 114

“This indeed seems like a large application of percolate.”

Elastic support, June 2015
slide-115
SLIDE 115

Performance linear with number of queries

slide-116
SLIDE 116
  • 1. Consider using other systems.
slide-117
SLIDE 117
  • 1. Consider using other systems.
“It is worth noting that simple exist matches on a field are probably not a great application for percolator. This doesn’t utilize any text matching capability or complex boolean.”
Anything, anywhere!
Every ad offering something for free!
slide-118
SLIDE 118
  • 1. Consider using other systems.
slide-119
SLIDE 119
  • 2. Optimise your data structure.
slide-120
SLIDE 120
  • 2. Optimise your data structure.
slide-121
SLIDE 121
  • 2. Optimise your data structure.
slide-122
SLIDE 122
  • 3. Filter, filter, filter!
slide-123
SLIDE 123
  • 3. Filter, filter, filter!
“The filter only works on the metadata fields. The query field isn’t indexed by default.”
slide-124
SLIDE 124
  • 3. Filter, filter, filter!
CATEGORY: cars
CATEGORY: all
CATEGORY: cars OR all
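Read as a metadata filter, this means an ad in the cars category only needs to be checked against searches saved for "cars" or for "all" categories. A small sketch follows. The `terms` dict uses the general shape of an Elasticsearch terms clause, while the field name `category` and both helper functions are assumptions.

```python
def category_filter(ad_category):
    """Metadata filter for an ad: searches in its category OR in all categories."""
    return {"terms": {"category": [ad_category, "all"]}}


def candidates(saved_searches, ad_category):
    # Local equivalent of the filter: discard searches that cannot match
    # before any stored query is evaluated.
    allowed = set(category_filter(ad_category)["terms"]["category"])
    return [s for s in saved_searches if s["category"] in allowed]


saved_searches = [
    {"id": 1, "category": "cars"},
    {"id": 2, "category": "all"},
    {"id": 3, "category": "furniture"},
]
candidate_ids = [s["id"] for s in candidates(saved_searches, "cars")]
```
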
slide-125
SLIDE 125
  • 3. Filter, filter, filter!
… what else can we filter?
slide-126
SLIDE 126
  • 3. Filter, filter, filter!
slide-127
SLIDE 127
  • 4. Use bulk requests.
slide-128
SLIDE 128
  • 5. Use parallel bulk requests.
slide-129
SLIDE 129
  • 5. Use parallel bulk requests.
[Diagram: index, node1 with A1, node2 with A2]
slide-130
SLIDE 130
  • 5. Use parallel bulk requests.
“Currently, to utilise all of your shards, you would need to consider sending multipercolate requests in parallel.”
[Diagram: index, node1 with A1, node2 with A2]
https://github.com/elastic/elasticsearch/issues/13177
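A rough sketch of tips 4 and 5 combined: group ads into batches, then send several batches at once. `send_bulk` is a placeholder for whatever issues one multi-percolate request. It is not a real client call, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def percolate_in_parallel(ads, send_bulk, batch_size=100, workers=4):
    """Split ads into batches and send the batches concurrently.

    send_bulk is a placeholder: it takes a list of ads and returns a list
    of matches for that batch.
    """
    batches = list(chunked(ads, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(send_bulk, batches)  # results come back in batch order
        return [match for batch in results for match in batch]


def fake_send_bulk(batch):
    # Stand-in for a network call, used only to exercise the sketch.
    return [f"match-{ad}" for ad in batch]


all_matches = percolate_in_parallel(list(range(5)), fake_send_bulk, batch_size=2)
```

Threads are enough here because the work is I/O bound: each worker mostly waits for its request to return.
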
slide-131
SLIDE 131
  • 5. Use parallel bulk requests.
slide-132
SLIDE 132
  • 6. Degrade gracefully
slide-133
SLIDE 133
  • 6. Degrade gracefully
Matthias: Antique copper lamps in Pankow
André: Cars in Berlin
slide-134
SLIDE 134
  • 6. Degrade gracefully
André: Cars in Berlin
Matthias: Antique copper lamps in Pankow
slide-135
SLIDE 135
  • 6. Degrade gracefully
HIGH PRIORITY, LOW PRIORITY
André: Cars in Berlin
Matthias: Antique copper lamps in Pankow
slide-136
SLIDE 136
  • 6. Degrade gracefully
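One possible shape for graceful degradation is to serve high-priority searches first and defer the rest when a cycle is over capacity. The sketch below assumes that shape. The slides do not say how priorities are assigned or what happens to deferred work, so the priority values, the capacity limit, and `select_work` are all hypothetical.

```python
HIGH, LOW = 0, 1


def select_work(pending, capacity):
    """Pick what to percolate in this cycle.

    pending: list of (priority, search_id) in arrival order.
    capacity: maximum number of items to process per cycle.
    Under overload, high-priority items are served first and the rest is
    deferred to a later cycle instead of failing the whole run.
    """
    # sorted() is stable, so arrival order is kept within each priority.
    ordered = sorted(pending, key=lambda item: item[0])
    return ordered[:capacity], ordered[capacity:]


pending = [(LOW, "s1"), (HIGH, "s2"), (LOW, "s3"), (HIGH, "s4")]
process_now, deferred = select_work(pending, capacity=3)
```
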
slide-137
SLIDE 137

Outcome

Reduced percolation time:

slide-138
SLIDE 138

Outcome

Doubled the number of push notifications:

slide-139
SLIDE 139

Stabilize elastic

slide-140
SLIDE 140

Stable?

slide-141
SLIDE 141

8 000 000 searches
700 000 ads/day

slide-142
SLIDE 142

Stabilize platform

slide-143
SLIDE 143

eBayK saved searches goes 2016 architecture

slide-144
SLIDE 144

Before: one DB rules it all
MySQL

slide-145
SLIDE 145

Before: one DB rules it all
create saved search, MySQL

slide-146
SLIDE 146

Before: one DB rules it all
create saved search, change saved search, MySQL

slide-147
SLIDE 147

Before: one DB rules it all
create ad, create saved search, change saved search, MySQL

slide-148
SLIDE 148

Before: one DB rules it all
create ad, create saved search, change saved search, MySQL, found match

slide-149
SLIDE 149

Before: one DB rules it all
create ad, create saved search, change saved search, MySQL, got push, found match

slide-150
SLIDE 150

Before...
MySQL

slide-151
SLIDE 151

Before...
MySQL, AwakeJob

slide-152
SLIDE 152

Before...
MySQL, AwakeJob, SendJob, CreateJob

slide-153
SLIDE 153

Before...
MySQL, CleanupJob, AwakeJob, SendJob, IndexerJob, CreateJob, ExpireJob

slide-154
SLIDE 154

Before: bottleneck
communication via DB
super high performance, resiliency, scalability ..?

slide-155
SLIDE 155

Goal: event-driven data pipeline

slide-156
SLIDE 156
slide-157
SLIDE 157

What is Apache Kafka?

distributed messaging system
- persistent
- high throughput

[Diagram: Producer, Producer; Topic 1, Topic 2; Consumer, Consumer, Consumer]

slide-158
SLIDE 158

But what’s new?

slide-159
SLIDE 159

But what’s new?
1 2 3

slide-160
SLIDE 160

Now: streams and data flows
percolate, create ad

slide-161
SLIDE 161

Now: streams and data flows
percolate, create ad, found match

slide-162
SLIDE 162

Now: streams and data flows
percolate, process match, create ad, found match

slide-163
SLIDE 163

Now: streams and data flows
percolate, process push, create ad, found match

slide-164
SLIDE 164

Now: streams and data flows
percolate, process push, create ad, found match, MySQL

slide-165
SLIDE 165

Now: streams and data flows
percolate, process push, create ad, found match, MySQL

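The data flow on these slides can be pictured as components that react to events on topics instead of polling a shared database. Here is a toy in-memory sketch of that style. `ToyBroker` is a stand-in written for this example and is not Kafka client code. The topic names and the two handlers are assumptions derived from the slide labels.

```python
from collections import defaultdict, deque


class ToyBroker:
    """In-memory stand-in for a message broker with named topics."""

    def __init__(self):
        self.topics = defaultdict(deque)
        self.handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, event):
        self.topics[topic].append(event)

    def run(self):
        # Deliver events until every topic is drained; handlers may publish more.
        progressed = True
        while progressed:
            progressed = False
            for topic in list(self.topics):
                while self.topics[topic]:
                    event = self.topics[topic].popleft()
                    for handler in self.handlers[topic]:
                        handler(event)
                    progressed = True


broker = ToyBroker()
pushes = []


def percolate(ad):
    # Hypothetical matcher: pretend every ad matches one saved search.
    broker.publish("found-match", {"ad": ad["id"], "search": "search-1"})


def process_push(match):
    pushes.append(f"push for {match['search']} about ad {match['ad']}")


broker.subscribe("create-ad", percolate)
broker.subscribe("found-match", process_push)
broker.publish("create-ad", {"id": 42})
broker.run()
```

Each stage only knows the topic it reads from and the topic it writes to, which is what lets stages be scaled or replaced independently.
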
slide-166
SLIDE 166

Compaction

slide-167
SLIDE 167

Compaction: Kafka == source of truth?

slide-168
SLIDE 168

Compaction: Kafka == source of truth?
Log over time: A: 23, B: 12, B: null, C:, A: 24

slide-169
SLIDE 169

Compaction: Kafka == source of truth?
Log over time: A: 23, B: 12, B: null, C:, A: 24
Compacted: A: 24, C:

slide-170
SLIDE 170

Compaction: Kafka == source of truth?
Log over time: A: 24, C:

slide-171
SLIDE 171

Compaction: Kafka == source of truth?
Consumer reads: A: 24, C:

slide-172
SLIDE 172

Compaction: Kafka == source of truth?
Consumer reads: A: 24, C:

slide-173
SLIDE 173

Compaction: Kafka == source of truth?
Consumer reads: A: 24, C:

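The compaction example can be replayed in a few lines: keep only the latest record per key and treat a null value as a delete marker. This is a simplified model, not broker code. Real Kafka compacts in the background and keeps delete markers for a configurable retention period before dropping them. The value for key C is missing from the slide, so `7` below is a made-up placeholder.

```python
def compact(log):
    """Return the compacted log: the latest record per key, tombstones removed.

    log is a list of (key, value) pairs in write order; a value of None is
    a tombstone that deletes the key.
    """
    latest_offset = {key: offset for offset, (key, _) in enumerate(log)}
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if latest_offset[key] == offset and value is not None
    ]


def rebuild_state(log):
    # A consumer reading from the start ends up with the current value per key.
    state = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


# Same sequence as on the slide; the value 7 for C is a hypothetical placeholder.
log = [("A", 23), ("B", 12), ("B", None), ("C", 7), ("A", 24)]
compacted = compact(log)
```

A consumer that replays the compacted log reaches the same state as one that replays the full log, which is the property that makes a compacted topic usable as a source of truth.
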
slide-174
SLIDE 174

Issues encountered

slide-175
SLIDE 175

Issues encountered
latency - used local cache

slide-176
SLIDE 176

Issues encountered
latency - used local cache
some components couldn’t keep up - spot-on optimisation

slide-177
SLIDE 177

Issues encountered
latency - used local cache
some components couldn’t keep up - spot-on optimisation
out of order writes - ?

slide-178
SLIDE 178

wrap up

slide-179
SLIDE 179

simplicity
fine tune elastic
use streaming

slide-180
SLIDE 180

Thank you

slide-181
SLIDE 181
slide-182
SLIDE 182

References

“Building LinkedIn’s Real-time Activity Data Pipeline”, Ken Goodhope, Joel Koshy, Jay Kreps, Neha Narkhede, Richard Park, Jun Rao, Victor Yang Ye