

SLIDE 1

End-to-end Exactly-once Aggregation over Ad Streams

Amiraj Dhawan Amit Ramesh

SLIDE 2

Yelp’s Mission

Connecting people with great local businesses.

SLIDE 3
  • Background & context
  • Business requirements
  • Design iterations
  • Exactly-once aggregation
  • What’s next?

Outline

SLIDE 4

Local Ads

  • Work done within the Local Ads group
  • Manage a few hundred thousand ad campaigns daily
  • Mom and pop stores to national chains
  • Pipelines receive a few thousand msgs/sec
  • Pipelines in production for more than a year
SLIDE 5

Local Ads – Consumer facing

SLIDE 6

Local Ads – Advertiser facing

SLIDE 7

Local Ads – Ad Campaign Management

SLIDE 8

Distilled Business Requirements

  • Aggregate events over a day period
  • Slice aggregates along defined dimensions
  • Provide partial aggregates as day progresses
  • Make aggregates as accurate as possible

Day | Dimension 1 | Dimension 2 | … | Dimension N | Aggregate 1 | Aggregate 2 | … | Aggregate M

SLIDE 9

An Illustrative Example

  • Count ad clicks over a day period
  • Provide click counts by ad campaign
  • Provide partial click counts as day progresses

Day | Campaign ID | Number of clicks
4/17/2019 | 23265 | 35

Day | Campaign ID | Number of clicks
4/17/2019 | 23265 | 42
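The example above can be sketched as a plain in-memory aggregation (a simplified illustration, not the production pipeline; the event shape and function name are assumptions):

```python
from collections import defaultdict

def aggregate_clicks(events):
    """Count clicks per (day, campaign_id) over a stream of click events."""
    counts = defaultdict(int)
    for event in events:
        counts[(event["day"], event["campaign_id"])] += 1
    return dict(counts)

# Two clicks for campaign 23265 on the same day:
clicks = [
    {"day": "4/17/2019", "campaign_id": 23265},
    {"day": "4/17/2019", "campaign_id": 23265},
]
totals = aggregate_clicks(clicks)
```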

SLIDE 10

Stream Processing 101

[Diagram: input stream(s) → stream processing engine, backed by a database → output stream(s)]

SLIDE 11

Stream Processing 101

[Diagram: input stream(s) → stream processing engine, backed by a database → output stream(s)]

SLIDE 12

Windowed operations

[Diagram: tumbling windows (non-overlapping) vs. sliding windows (overlapping)]
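The two window types can be made concrete with a minimal sketch over integer timestamps (function names are illustrative, not a streaming-framework API):

```python
def tumbling_window(ts, width):
    """Tumbling windows tile the timeline; each timestamp falls in exactly one."""
    start = ts - (ts % width)
    return (start, start + width)

def sliding_windows(ts, width, slide):
    """Sliding windows overlap; a timestamp can fall in several of them."""
    windows = []
    start = (ts // slide) * slide
    while start + width > ts and start >= 0:
        windows.append((start, start + width))
        start -= slide
    return windows

tumbling_window(12, width=10)           # (10, 20)
sliding_windows(12, width=10, slide=5)  # [(10, 20), (5, 15)]
```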

SLIDE 13

Processing pipeline

Why not...

Day | Campaign ID | Number of clicks
4/17/2019 | 23265 | 35

SLIDE 14

Processing pipeline

Why not...

Day | Campaign ID | Number of clicks
4/17/2019 | 23265 | 35

  • Need partial click counts as day progresses!
  • Stateful operation

SLIDE 15

Processing pipeline

How about...

Day | Campaign ID | Number of clicks
4/17/2019 | 23265 | 35

∆’s

SLIDE 16

Processing pipeline

How about...

Day | Campaign ID | Number of clicks
4/17/2019 | 23265 | 35

∆’s

  • Cassandra has a Counter column type
  • Integer type with increment and decrement
SLIDE 17

However...

  • Counter is not meant to be idempotent
  • Good for approximate metrics (likes/follows)
  • Reported discrepancies of up to 5%
  • Discrepancies due to being distributed
  • No plans to make it idempotent
SLIDE 18

Processing pipeline

Alright...

Day | Campaign ID | Number of clicks
4/17/2019 | 23265 | 35

[Diagram: Spark reads ∑t from Cassandra and writes back ∑t + ∆]

  • Use Cassandra for the current count
  • Increment in Spark and update Cassandra
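A toy version of this read-increment-write step, with a dict standing in for Cassandra (function and store shape are assumptions for illustration), also shows why it needs more machinery: if a micro-batch is retried after a failure, its delta is applied twice:

```python
def apply_delta(store, key, delta):
    current = store.get(key, 0)   # read the running total ∑t
    store[key] = current + delta  # write back ∑t + ∆
    return store[key]

store = {}
apply_delta(store, ("4/17/2019", 23265), 35)
# If the same micro-batch is replayed after a crash, it double-counts:
apply_delta(store, ("4/17/2019", 23265), 35)  # store now holds 70, not 35
```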
SLIDE 19

Kafka 101

[Diagram: three Kafka partitions, each an ordered log of messages at offsets 0 through 10]

  • Data is in partitions
  • Partition is ordered
  • Consumers track their own progress
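A toy consumer illustrates the model (a sketch of the concept, not the Kafka client API; the class and method names are invented):

```python
class ToyConsumer:
    """Each consumer tracks, per partition, the next offset it will read."""

    def __init__(self):
        self.position = {}  # partition -> next offset to consume

    def poll(self, log, partition):
        pos = self.position.get(partition, 0)
        messages = log[partition][pos:]
        self.position[partition] = len(log[partition])
        return messages

log = {0: ["m0", "m1", "m2"]}  # partition 0 with three messages
consumer = ToyConsumer()
first = consumer.poll(log, 0)   # ["m0", "m1", "m2"]
log[0].append("m3")
second = consumer.poll(log, 0)  # ["m3"]
```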

SLIDE 20

Spark Streaming 101

  • Micro-batching
  • No pipelining
  • App manages offset commits
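Why the app commits offsets itself: committing only after a batch's results are safely written gives at-least-once delivery, since a crash replays the uncommitted tail. A minimal sketch (the loop and names are illustrative, not the Spark API):

```python
def process_from(messages, committed_offset, process):
    """Process everything past the committed offset; commit after the side effect."""
    for offset in range(committed_offset, len(messages)):
        process(messages[offset])
        committed_offset = offset + 1  # commit only after processing succeeded
    return committed_offset

seen = []
new_offset = process_from(["a", "b", "c"], 1, seen.append)
# Offsets 0 is skipped (already committed); "b" and "c" are processed.
```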
SLIDE 21

Putting them together

[Diagram: Kafka → Stage 1 (compute ∆) → Stage 2 (read ∑t) → Stage 3 (write ∑t + ∆) → Kafka offset commit]

SLIDE 22

In the words of Ken Arnold

Failure is the defining difference between distributed and local programming, so you have to design distributed systems with the expectation of failure. Imagine asking people, "If the probability of something happening is one in ten to the thirteenth, how often would it happen?" Your natural human sense would be to answer, "Never." That is an infinitely large number in human terms. But if you ask a physicist, she would say, "All the time. In a cubic foot of air, those things happen all the time." When you design distributed systems, you have to say, "Failure happens all the time." So when you design, you design for failure. It is your number one concern.

SLIDE 23

Failure Modes

[Diagram: Kafka → Stage 1 (∆) → Stage 2 (read ∑t) → Stage 3 (write ∑t + ∆) → offset commit, with a failure point highlighted]

SLIDE 24

Failure Modes

[Diagram: Kafka → Stage 1 (∆) → Stage 2 (read ∑t) → Stage 3 (write ∑t + ∆) → offset commit, with a failure point highlighted]

SLIDE 25

Failure Modes

[Diagram: Kafka → Stage 1 (∆) → Stage 2 (read ∑t) → Stage 3 (write ∑t + ∆) → offset commit, with a failure point highlighted]

SLIDE 26

Failure Modes

[Diagram: Kafka → Stage 1 (∆) → Stage 2 (read ∑t) → Stage 3 (write ∑t + ∆) → offset commit, with a failure point highlighted]

SLIDE 27

At Least + At Most = Exactly-once

  • Should be able to distinguish processed data
  • Versioning rows is one way to do it
  • Versions need to be monotonically increasing
  • Data within a Kafka partition is already ordered
  • Versioning can leverage data order
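A sketch of the version-gated write this implies, with a dict in place of the datastore (the version would be derived from the Kafka offset of the last message folded into the row; names are illustrative):

```python
def versioned_write(store, key, value, version):
    """Apply the write only if its version is newer than the stored one."""
    _, stored_version = store.get(key, (None, -1))
    if version <= stored_version:
        return False  # already applied; a replayed batch becomes a no-op
    store[key] = (value, version)
    return True

store = {}
first = versioned_write(store, ("4/17/2019", 5), 3, version=2)   # applied
replay = versioned_write(store, ("4/17/2019", 5), 3, version=2)  # rejected
```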
SLIDE 28

Basic Idea

Day | Campaign ID | Number of clicks | Version
4/17/2019 | 5 | 3 | 2

[Diagram: one partition, offsets 0 through 5, with click (CLK) and non-click messages for campaign 5; the commit offset is marked]

SLIDE 29

Basic Idea

[Diagram: one partition, offsets 0 through 5, with click (CLK) and non-click messages for campaigns 5 and 9; the commit offset is marked]

Day | Campaign ID | Number of clicks | Version
4/17/2019 | 5 | 1 | 2
4/17/2019 | 9 | 2 | 1

SLIDE 30

Basic Idea

Day | Campaign ID | Number of clicks | Version
4/17/2019 | 5 | 2 | P0: 2, P1: 3
4/17/2019 | 9 | 3 | P0: 0, P1: 1

[Diagram: two partitions (Partition 0 and Partition 1), each with offsets 0 through 5 of click and non-click messages for campaigns 5 and 9]
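With several input partitions there is no single total order, so the row carries one version per partition. An update from partition p is accepted only if it beats the stored version for p (a sketch with assumed names; the store stands in for Cassandra):

```python
def versioned_add(store, key, delta, partition, version):
    """Fold a per-partition delta into the total, gated by that partition's version."""
    total, versions = store.get(key, (0, {}))
    if version <= versions.get(partition, -1):
        return False  # duplicate delivery from this partition
    versions = {**versions, partition: version}
    store[key] = (total + delta, versions)
    return True

store = {}
versioned_add(store, ("4/17/2019", 5), 2, partition=0, version=2)
versioned_add(store, ("4/17/2019", 5), 3, partition=1, version=3)
replayed = versioned_add(store, ("4/17/2019", 5), 2, partition=0, version=2)
# Total is 5 with versions {0: 2, 1: 3}; the replay is rejected.
```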

SLIDE 31

Exactly-once Aggregation

[Diagram: Kafka → Stage 1 (compute ∆ and Ver from offsets) → Stage 2 (read ∑t, Vert) → Stage 3 (write ∑t + ∆, Vert+1) → offset commit]
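The stages can be condensed into a single-process sketch of the idea (in the talk's pipeline the store is Cassandra and the batches are Spark micro-batches; all names here are assumptions):

```python
def run_micro_batch(batch, store):
    """batch: list of (offset, campaign_id) click messages from one partition."""
    # Stage 1: compute per-key deltas and derive Ver from the batch's offsets.
    ver = max(offset for offset, _ in batch)
    deltas = {}
    for _, campaign in batch:
        deltas[campaign] = deltas.get(campaign, 0) + 1
    # Stages 2 and 3: read (∑t, Vert); write (∑t + ∆, Vert+1) only if newer.
    for campaign, delta in deltas.items():
        total, stored_ver = store.get(campaign, (0, -1))
        if ver > stored_ver:
            store[campaign] = (total + delta, ver)

store = {}
batch = [(0, 5), (1, 5), (2, 9)]  # two clicks for campaign 5, one for 9
run_micro_batch(batch, store)
run_micro_batch(batch, store)  # replay after a failure: no double counting
# store == {5: (2, 2), 9: (1, 2)}
```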

SLIDE 32

Exactly-once Aggregation

[Diagram: Kafka → Stage 1 (∆, Ver) → Stage 2 (read ∑t, Vert) → Stage 3 (write ∑t + ∆, Vert+1) → offset commit]

SLIDE 33

Exactly-once Aggregation

[Diagram: Kafka → Stage 1 (∆, Ver) → Stage 2 (read ∑t, Vert) → Stage 3 (write ∑t + ∆, Vert+1) → offset commit]

SLIDE 34

Exactly-once Aggregation

[Diagram: Kafka → Stage 1 (∆, Ver) → Stage 2 (read ∑t, Vert) → Stage 3 (write ∑t + ∆, Vert+1) → offset commit]

SLIDE 35

Exactly-once Aggregation

[Diagram: Kafka → Stage 1 (∆, Ver) → Stage 2 (read ∑t, Vert) → Stage 3 (write ∑t + ∆, Vert+1) → offset commit]

SLIDE 36

Generalization

  • Aggregation logic is in the pipeline
  • Logic can be arbitrarily complex
  • Does not have to be a mathematical function
  • Strings, sets, lists, maps, etc.
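The version gate does not care what the aggregate is, only that old state and new data are merged under it; for example, the same pattern accumulating a set of distinct user ids (names here are illustrative):

```python
def versioned_merge(store, key, new_ids, version):
    """Merge a set-valued delta into the aggregate, gated by version."""
    ids, stored_version = store.get(key, (set(), -1))
    if version <= stored_version:
        return False  # replayed batch; already merged
    store[key] = (ids | set(new_ids), version)
    return True

store = {}
versioned_merge(store, "campaign-5", {"u1", "u2"}, version=0)
versioned_merge(store, "campaign-5", {"u2", "u3"}, version=1)
stale = versioned_merge(store, "campaign-5", {"u9"}, version=1)  # rejected
# store["campaign-5"] holds ({"u1", "u2", "u3"}, 1)
```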
SLIDE 37

What’s next?

  • Windowed joins
    ○ As a specialization of aggregation
    ○ Allows for arbitrary business rules in joins
  • Deduplication within aggregation
    ○ Input streams can typically have duplicates

SLIDE 38

www.yelp.com/careers/

We're Hiring!

SLIDE 39

@YelpEngineering fb.com/YelpEngineers engineeringblog.yelp.com github.com/yelp