Counting events reliably with storm & riak - Frank Schröder - eBay - PowerPoint PPT Presentation


SLIDE 1

Counting events reliably with storm & riak

Frank Schröder - eBay Classifieds Group Amsterdam

SLIDE 2

marktplaats.nl

  • Classifieds
SLIDE 3

Admarkt

  • Pay-per-click ads for professional sellers

SLIDE 4

Seller places ad, chooses a budget and cost per click

SLIDE 5

We show the ad if it is relevant and budget is available


SLIDE 7

Count clicks & impressions 
 Update budget & ranking

SLIDE 8

We chose Storm & riak for ranking calculation

SLIDE 9

Constraints

SLIDE 10

135M events/day @ 3.2K/sec peak

SLIDE 11

  • accurate
  • real-time
  • scale horizontally
  • handle events out-of-order


SLIDE 13

Storm

  • Real-time computation framework from Twitter
  • Stream-based producer-consumer topologies
  • Nice properties for concurrent processing

SLIDE 14

Storm

You write:
  a) code that handles a single event in a single-threaded context
  b) configuration for how events are produced and flow through the topology

Then Storm sets up the queues and manages the Java VMs which run your code

SLIDE 15

Storm

  • Spouts emit tuples (Producer)
  • Bolts consume tuples and can emit them, too (Consumer & Producer)
  • Storm worker = Java VM, each spout & bolt = 1 thread in a worker
  • Concurrency is configurable and independent of your code
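The spout/bolt contract above can be sketched with plain Python threads and queues. This is only an illustration of the producer-consumer model, not the actual Storm API; Storm itself wires up the queues and threads for you.

```python
import queue
import threading

# Illustration of the spout/bolt contract (NOT the Storm API): the
# framework owns the queue and the threads; your code only handles
# one tuple at a time in a single-threaded context.

def spout(out_q, events):
    """Producer: emits tuples into the topology."""
    for event in events:
        out_q.put(event)
    out_q.put(None)  # sentinel: no more events

def bolt(in_q, results):
    """Consumer: processes one tuple at a time."""
    while True:
        event = in_q.get()
        if event is None:
            break
        results.append(event.upper())  # stand-in for real processing

q, results = queue.Queue(), []
t1 = threading.Thread(target=spout, args=(q, ["click", "impression"]))
t2 = threading.Thread(target=bolt, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # ['CLICK', 'IMPRESSION']
```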

SLIDE 16

Storm simple topology

[Diagram: event source -> AMQ -> spout -> bolt -> bolt -> riak, with the spout and bolts inside the Storm topology]

SLIDE 17

Storm complex topology

[Diagram: event source -> load balancer -> 3x AMQ -> 3x spout -> 4x bolt -> riak cluster (5 nodes), with the spouts and bolts inside the Storm topology]

SLIDE 18

Storm

Our topology

[Diagram: marktplaats.nl -> AMQ -> spout -> event handler -> avg 1 read/update, avg 2 read/update -> store -> riak]

SLIDE 19

Storm

Our topology

  • 7 riak nodes
  • 3 spouts on 3 servers
  • 24 avg1 bolts
  • 24 avg2 bolts
  • 96 event handler bolts

SLIDE 20

Storm Hardware Setup

  • storm001-003: storm nimbus, storm ui, storm workers
  • stormzoo001-003: ZooKeeper + ActiveMQ on each
  • stormriak001-007: riak nodes

SLIDE 21

Admarkt click-counter

  1. Service writes JSON event to file and sends it to ActiveMQ. Use same format for logs and Storm.
  2. Spouts read JSON events from ActiveMQ and emit them into the topology.
  3. Bolts process events and update state in riak.

If something goes wrong we replay events by putting the logs on the queue again

SLIDE 22

riak for persistence
How fast can we write?

SLIDE 23

Riak Write Performance

riak 1.2.1, 5 node cluster

[Chart: write ops per second (5,000-25,000) vs. document size in bytes (256, 1024, 4096, 8192, 16384)]

1 read + 1 write/sec peak

SLIDE 24

Conclusion: Document size is important

SLIDE 25

How can we be accurate?

SLIDE 26

How can we be accurate?

  • Handle each event exactly once
SLIDE 27

But events can arrive out-of-order…
SLIDE 29

How can we know whether we have seen an event before?

SLIDE 30

Idea 1: Comparing timestamps

  • event timestamp < last timestamp: we have seen it already
  • Milliseconds are not accurate enough
  • NTP clock skew
  • Replaying and bootstrapping does not work since you can't tell an old from a replayed event

SLIDE 31

Idea 2: Sequential Counters

  • event id < last id: we have seen it already
  • How do you build a distributed, reliable, sorted counter?
  • How do you handle service restarts?
  • How can this not be the SPOF of the service? No idea ...
  • Replaying and bootstrapping does not work for the same reasons as before

SLIDE 32

Idea 3: Keep track of hashes

  • Event hash in current document: we have seen it already
  • Bootstrapping and replaying just works
  • Over-counting cannot happen
  • On failure just replay the logs
  • but ...
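The dedup idea above can be sketched in a few lines. The field names and document shape here are assumptions for illustration, not the deck's actual code: the counter document carries the hashes of events already counted, so a replayed event is recognized and skipped.

```python
import hashlib

def event_hash(event: dict) -> str:
    """Derive a stable hash from the event's identifying fields
    (field names are assumed for this sketch)."""
    key = f"{event['ad_id']}|{event['type']}|{event['ts']}|{event['source']}"
    return hashlib.md5(key.encode()).hexdigest()

def apply_event(doc: dict, event: dict) -> bool:
    """Count the event unless its hash is already in the document.
    Returns True if counted, False if it was a duplicate."""
    h = event_hash(event)
    if h in doc["hashes"]:
        return False  # seen before: replay-safe, no over-counting
    doc["hashes"].add(h)
    doc["clicks"] += 1
    return True

doc = {"clicks": 0, "hashes": set()}
e = {"ad_id": 42, "type": "click", "ts": 1383146743000, "source": "hostA"}
assert apply_event(doc, e) is True   # first delivery: counted
assert apply_event(doc, e) is False  # replayed event: skipped
assert doc["clicks"] == 1
```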

SLIDE 33

How many hashes do we have?

SLIDE 34

Keeping track of events

  • 135M events per day -> 135M hashes
  • 650K live ads -> 210 events per day/ad
  • But a handful of outliers get 40,000 events / hour - each
  • sha1: 40 chars, md5: 32 chars, crc32: 8 chars
  • Collisions?

SLIDE 35

Hash sizes

Remember that document size is important

  • sha1: 210*40 = 8.4KB
  • md5: 210*32 = 6.7KB
  • crc32: 210*8 = 1.7KB
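The sizes above are just the average daily event count per ad times the hex-digest length of each hash function; a quick check:

```python
# Average events per ad per day (from the deck) times the hex-digest
# length gives the per-ad hash payload per day.
EVENTS_PER_AD_PER_DAY = 210
digest_chars = {"sha1": 40, "md5": 32, "crc32": 8}

for name, chars in digest_chars.items():
    size_kb = EVENTS_PER_AD_PER_DAY * chars / 1000
    print(f"{name}: {EVENTS_PER_AD_PER_DAY}*{chars} = {size_kb:.1f}KB")
# sha1: 210*40 = 8.4KB
# md5: 210*32 = 6.7KB
# crc32: 210*8 = 1.7KB
```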

SLIDE 36

Keeping documents small

  • Usually events are played forward in chronological order
  • Only during replay and failure do we need older hashes

SLIDE 37

Keeping documents small

  • Keep only the current hour in the main document (hot set)
  • Hash must be unique per ad per hour -> should take care of collisions. Should ...
  • At hh:00 move the older hashes into a separate document
  • Keep documents with older hashes for as long as we want to be able to replay (1-2 weeks)
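The hot-set rotation can be sketched as follows. The document layout (hashes grouped by hour key) is an assumption for illustration: the main document only keeps the current hour's hashes, and at the top of the hour older hashes move into per-hour archive documents kept for the replay window.

```python
# Sketch of the hot-set rotation (document layout is an assumption):
# the main document keeps only the current hour's hashes; older hours
# are moved into a per-hour archive kept for the replay window.

def rotate(main_doc: dict, archive: dict, now_hour: str) -> None:
    """Move hashes from past hours out of the main document."""
    for hour in list(main_doc["hashes_by_hour"]):
        if hour != now_hour:
            archive.setdefault(hour, set()).update(
                main_doc["hashes_by_hour"].pop(hour)
            )

main_doc = {"hashes_by_hour": {"2013103014": {"a1", "b2"},
                               "2013103015": {"c3"}}}
archive = {}
rotate(main_doc, archive, now_hour="2013103015")
assert main_doc["hashes_by_hour"] == {"2013103015": {"c3"}}
assert archive == {"2013103014": {"a1", "b2"}}
```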

SLIDE 38

But with riak we don’t have TX …

SLIDE 39

Moving hashes from one doc to another without TX

  1. Write archive doc with older hashes but keep them in the main document
  2. Remove older events from the current document and then write it

SLIDE 40

Replaying events without TX

  1. Load older hashes from riak and merge them with main document
  2. Write archive doc with older hashes but keep them in the main document
  3. Remove older events from the current document and then write it
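The write ordering is what makes the move safe without transactions: the archive copy exists before the hashes leave the main document, so a crash between the two writes can only leave a hash in both places (harmless for dedup), never in neither. A sketch against a hypothetical key-value store interface (keys and document shape are assumptions):

```python
# Sketch of the transaction-free move (store interface is hypothetical).
# Order matters: write the archive copy FIRST, then shrink the main doc.
# A crash in between duplicates hashes, which dedup tolerates; the
# reverse order could lose hashes and allow over-counting.

def move_old_hashes(store: dict, ad_id: str, current_hour: str) -> None:
    main = store[f"main/{ad_id}"]
    old = {h: hr for h, hr in main["hashes"].items() if hr != current_hour}
    # Step 1: persist archive doc while old hashes are still in main.
    store.setdefault(f"archive/{ad_id}", {}).update(old)
    # Step 2: only now remove old hashes from the main doc and write it.
    main["hashes"] = {h: hr for h, hr in main["hashes"].items()
                      if hr == current_hour}
    store[f"main/{ad_id}"] = main

store = {"main/ad42": {"hashes": {"a1": "14", "b2": "15"}}}
move_old_hashes(store, "ad42", current_hour="15")
assert store["archive/ad42"] == {"a1": "14"}
assert store["main/ad42"]["hashes"] == {"b2": "15"}
```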

SLIDE 41

Serialization

  • Document size is important -> serialization makes a difference
  • Kryo isn't as fast as you might think
  • JSON isn't as bad as you might think
  • Custom beats everything by a wide margin
  • Maintainability is important, too
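To illustrate why a custom encoding can beat JSON on size (a toy comparison; the record shape is an assumption, not the deck's actual format): JSON repeats field names in every document, while a fixed binary layout only stores the values.

```python
import json
import struct

# Toy comparison of JSON vs. a custom fixed binary layout for one
# counter record (record shape assumed for illustration).
record = {"clicks": 1234, "impressions": 56789, "ts": 1383146743}

as_json = json.dumps(record).encode()
# Custom: three unsigned 32-bit integers in a fixed, known order.
as_custom = struct.pack("!III", record["clicks"],
                        record["impressions"], record["ts"])

print(len(as_json), len(as_custom))
assert len(as_custom) == 12           # fixed-size binary
assert len(as_custom) < len(as_json)  # custom wins on size
```

The trade-off is exactly the one the next slide makes: the JSON version is readable and self-describing, while the custom layout needs out-of-band versioning to evolve.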

SLIDE 42

Serialization

  • Maintainability is important, too
  • You can look at JSON (helpful)
  • Schema evolution via Content-Type headers

SLIDE 43

Persistence

  • Average ad has average number of hashes
  • Can be written in real-time
  • Outliers have orders of magnitude more hashes
  • More hashes -> bigger docs & more writes -> kills riak (even a handful of them)
SLIDE 44

Persistence

  • Simple back pressure rule (deferred writes) saves us
  • Small doc -> write immediately
  • Larger doc -> wait up to 5 sec
  • Volatile docs receive lots of events during defer period. Saves writes
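The rule above can be sketched as follows. The size threshold is an assumed parameter standing in for a real tuning value: small documents are flushed immediately, large ones are held back so bursts of updates coalesce into a single write.

```python
# Sketch of the deferred-write rule (SMALL_DOC_BYTES and the 5s defer
# window are assumptions standing in for real tuning values).
SMALL_DOC_BYTES = 4096
DEFER_SECONDS = 5.0

class DeferredWriter:
    def __init__(self, store: dict):
        self.store = store
        self.pending = {}  # key -> (doc, deadline)

    def write(self, key: str, doc: bytes, now: float) -> None:
        if len(doc) <= SMALL_DOC_BYTES:
            self.store[key] = doc        # small doc: write immediately
            return
        # Large doc: keep the newest version, preserve the old deadline
        # so repeated updates coalesce into one write.
        _, deadline = self.pending.get(key, (None, now + DEFER_SECONDS))
        self.pending[key] = (doc, deadline)

    def flush_due(self, now: float) -> int:
        """Write out deferred docs whose defer window has elapsed."""
        due = [k for k, (_, dl) in self.pending.items() if dl <= now]
        for k in due:
            self.store[k] = self.pending.pop(k)[0]
        return len(due)

w = DeferredWriter(store={})
w.write("ad1", b"x" * 100, now=0.0)       # small: written immediately
w.write("ad2", b"x" * 10000, now=0.0)     # large: deferred
w.write("ad2", b"y" * 10001, now=2.0)     # burst: coalesced, one write
assert "ad1" in w.store and "ad2" not in w.store
assert w.flush_due(now=5.0) == 1          # window elapsed: one write
assert w.store["ad2"] == b"y" * 10001
```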

SLIDE 45

8 months in

  • Lessons learned
SLIDE 46

Riak

  • Cleaning up riak is hard since you can't/shouldn't list buckets or keys. Easier with 2.0
  • Can't query riak for "how many docs have value x > 5" without a program. Easier with 2.0
  • MapRed with gzipped JSON requires Erlang code. JS can't handle it. Not in 2.0
SLIDE 47

Riak

  • Deferred writes only help so much. Maybe use constant write rate to make system more predictable.
  • Riak scales nicely with more nodes.

SLIDE 48

Storm

  • Mostly stable and fast (v0.8.2)
  • Must understand internal queues and their sizing. Otherwise, topology just stops
  • Need external tools for verifying that topology is working correctly

SLIDE 49

Hashes

  • Nice idea but creates unbounded number of documents. Disks fill up and cleaning up is hard.
  • Replay logic kills performance. Replaying is too slow if we need to replay a full day or more.

SLIDE 50

rethink

SLIDE 51

We don't want to know what we have seen
We want to know what we have not seen

SLIDE 52

This would solve some problems:

  • doc size constant
  • number of docs constant
  • riak cleanup not necessary

SLIDE 53

But how do we know what we haven’t seen if we don’t know what is coming?


SLIDE 55

Idea 2: Sequential Counters

  • event id < last id: we have seen it already
  • How do you build a distributed, reliable, sequential counter?
  • How do you handle service restarts?
  • How can this not be the SPOF of the service? No idea ...
  • Replaying and bootstrapping does not work for the same reasons as before

SLIDE 56

Why just one counter?

SLIDE 57

Let's have multiple

SLIDE 58

Let's have multiple, e.g. one per service instance

SLIDE 59

eventId = counterId + counterValue

e.g. hostA-20131030_152543:15
SLIDE 60

Create unique counter id at service start and start counting from 0

Increment atomically (AtomicLong) and send counter id + value to storm
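A sketch of this id scheme (the class name is made up; the id format follows the deck's `hostA-20131030_152543:15` example): the counter id is minted once at service start from the host name and start time, then a local atomic counter supplies the value part.

```python
import itertools
import time

# Sketch of the per-instance counter id scheme. The counter id (host +
# start timestamp) is created once at service start and is unique per
# restart; the value increases monotonically, as in the deck's example
# "hostA-20131030_152543:15".

class EventIdGenerator:
    def __init__(self, host: str, start_time: float):
        stamp = time.strftime("%Y%m%d_%H%M%S", time.gmtime(start_time))
        self.counter_id = f"{host}-{stamp}"   # unique per service start
        self._value = itertools.count(0)      # stand-in for AtomicLong

    def next_event_id(self) -> str:
        return f"{self.counter_id}:{next(self._value)}"

gen = EventIdGenerator("hostA", start_time=time.time())
assert gen.next_event_id().endswith(":0")
assert gen.next_event_id().endswith(":1")
```

Because each restart mints a fresh counter id, the value can safely start again from 0 without colliding with ids from the previous run.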

SLIDE 61

Storm keeps track of counter value per counter id

Keep gap lists of missed events
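A sketch of the gap-list idea (the data structure is an assumption): for each counter id, track the highest value seen plus the set of values skipped along the way, so an out-of-order event is accepted exactly once and a replayed one is rejected.

```python
# Sketch of the gap-list idea: per counter id we remember the highest
# value seen and the set of values skipped along the way. An event is
# new iff it is above the high-water mark or fills a known gap.

class GapTracker:
    def __init__(self):
        self.highest = -1
        self.gaps = set()  # counter values we have not seen yet

    def is_new(self, value: int) -> bool:
        if value > self.highest:
            # record any values we jumped over as gaps
            self.gaps.update(range(self.highest + 1, value))
            self.highest = value
            return True
        if value in self.gaps:
            self.gaps.remove(value)  # late arrival fills the gap
            return True
        return False  # duplicate: already counted

t = GapTracker()
assert t.is_new(0) is True
assert t.is_new(3) is True       # out-of-order jump: gaps {1, 2}
assert t.is_new(1) is True       # late event fills a gap
assert t.is_new(1) is False      # replayed event: rejected
assert t.gaps == {2}             # value 2 is still missing
```

This is what makes the state per counter id constant-sized in the normal case: when events arrive in order, the gap set stays empty and only the high-water mark is stored.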

SLIDE 62

Now we can predict what is coming

SLIDE 63

Questions?

SLIDE 64

Thank you

  • frschroeder@ebay.com