Taming large state for real-time joins - Sonali Sharma & Shriya Arora - PowerPoint PPT Presentation




SLIDE 1

Taming large state for real-time joins

Sonali Sharma & Shriya Arora Netflix

SLIDE 2

SLIDE 3

Waiting for your data be like ....

SLIDE 4

SLIDE 5

“I love waiting for my data” - said no stakeholder ever!

SLIDE 6

Sonali Sharma & Shriya Arora

  • Senior Data Engineers, Data Science and Engineering, Netflix
  • Build data products for personalization
  • Build low-latency data pipelines
  • Deal with PB scale of data
SLIDE 7

Coming up in the next 40 minutes

  • Use case for a stateful streaming pipeline
  • Concepts and building blocks of streaming apps
  • Data joins in a streaming context (windows)
  • Challenges in building low-latency pipelines
SLIDE 8

Use case for streaming pipeline

SLIDE 9

SLIDE 10

SLIDE 11

Netflix Traffic

  • 1 trillion events per day
  • 100 PB of data stored in the cloud

SLIDE 12

Recommendations everywhere!

SLIDE 13

Which artwork to show?

SLIDE 14

Signal: Take Fraction

Take Fraction = 1 / 3

(Diagram: users A, B and C are each shown the title; one plays and the others do not, giving a take fraction of 1/3)
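The take-fraction arithmetic can be sketched in a few lines. This is a minimal stdlib-Python illustration, not the actual pipeline: the real system joins two Kafka streams, and the `(profile_id, title_id)` event shape here is an assumption for the example.

```python
# Hypothetical event shapes: the real pipeline joins impression and
# playback streams, but the arithmetic is the same.

def take_fraction(impressions, plays):
    """impressions and plays are lists of (profile_id, title_id) pairs."""
    shown = set(impressions)
    played = set(plays) & shown   # only count plays that were shown
    return len(played) / len(shown) if shown else 0.0

impressions = [("A", "title1"), ("B", "title1"), ("C", "title1")]
plays = [("A", "title1")]  # only profile A pressed play
# take_fraction(impressions, plays) == 1/3
```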

SLIDE 15

Making a case for streaming ETL


  • Real-time reporting
  • Real-time alerting
  • Faster training of ML models
  • Computational gains

SLIDE 16

Recap: Use case

  • Join impression events with playback events in real time to calculate take fraction
  • Train models faster and on fresher data
  • Convert a large batch data processing pipeline to a stateful streaming pipeline

SLIDE 17

Concepts and Building Blocks

SLIDE 18

Modern stream processing frameworks

QCon stream processing talks, 2017

SLIDE 19

Bounded vs Unbounded Data

Batch data is at rest, with hard boundaries; stream data is unbounded.

(Diagram: a window drawn over an unbounded stream)

SLIDE 20

Solution: Windows

Windows split the stream into buckets of finite size, over which we can apply computations.

Group by:

    stream.keyBy(...)
          .window(...)
          [.trigger(...)]
          [.allowedLateness(...)]
          .reduce/aggregate/fold/apply()

Join:

    stream.join(otherStream)
          .where(<KeySelector>)
          .equalTo(<KeySelector>)
          .window(<WindowAssigner>)
          .apply(<JoinFunction>)
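The bucketing idea behind windows can be shown without Flink. This is a stdlib-Python sketch of tumbling event-time windows, not the Flink implementation; the function name and event shape are invented for illustration.

```python
from collections import defaultdict

def tumbling_windows(events, size_ms):
    """Assign each (timestamp_ms, value) event to the fixed-size bucket
    [start, start + size_ms) identified by the window start time."""
    buckets = defaultdict(list)
    for ts, value in events:
        start = ts - (ts % size_ms)   # floor to the window boundary
        buckets[start].append(value)
    return dict(buckets)

events = [(1000, "a"), (4999, "b"), (5000, "c")]
windows = tumbling_windows(events, 5000)
# windows == {0: ["a", "b"], 5000: ["c"]}
```

Once events are grouped into finite buckets like this, a reduce, aggregate, or join can run per bucket.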

SLIDE 21

Event time vs processing time

(Diagram: a clock with events 1-5, contrasting event time with processing time)

SLIDE 22

Out-of-order and late-arriving events

(Diagram: two bursts of events from the Netflix apps pass through the ingestion pipeline; processing-time windows split them differently than event-time windows)

SLIDE 23

Solution: Watermark

A watermark is a notion of input completeness with respect to event time. Watermarks act as a metric of progress when processing an unbounded data source.
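The watermark idea can be sketched as a running bound on event time. This is a minimal stdlib-Python analogue of a bounded-out-of-orderness watermark generator; the class and method names are invented for the example, not Flink API.

```python
class BoundedOutOfOrdernessWatermark:
    """Emit watermark = (max event time seen) - (allowed out-of-orderness):
    a claim that no event older than the watermark is still expected."""

    def __init__(self, max_out_of_orderness_ms):
        self.bound = max_out_of_orderness_ms
        self.max_ts = float("-inf")

    def on_event(self, event_ts):
        self.max_ts = max(self.max_ts, event_ts)

    def current_watermark(self):
        return self.max_ts - self.bound

wm = BoundedOutOfOrdernessWatermark(2000)
for ts in [1000, 5000, 3000]:   # 3000 arrives late; it moves nothing
    wm.on_event(ts)
# wm.current_watermark() == 3000: windows ending at or before 3000 may fire
```

The trade-off from the challenges section shows up directly here: a larger bound tolerates more lateness (completeness) but holds windows open longer (latency).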

SLIDE 24

Slowly changing dimensions

Enriching stream with dimensional data

Combine streams

(Diagram: raw streams are combined and enriched via API calls to movie metadata (Hive or a data map), producing an enriched stream)

SLIDE 25

Fault tolerance

Checkpoint:

  • Snapshot of the metadata and state of the app
  • Helps in recovery

(Diagram: checkpoints {n-1} and {n} taken along event time at a fixed checkpoint interval, with older records behind and newer records ahead)

SLIDE 26

Checkpoint interval

The interval should cover the checkpoint duration and pauses, with some buffer.
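The sizing rule above is simple arithmetic; a sketch, with the function name and the example numbers invented for illustration:

```python
def min_checkpoint_interval(avg_checkpoint_ms, max_pause_ms, buffer_ms=0):
    """The interval must at least cover the checkpoint duration plus any
    alignment/GC pauses, with headroom so the job spends most of its time
    processing rather than checkpointing."""
    return avg_checkpoint_ms + max_pause_ms + buffer_ms

# e.g. 30 s checkpoints, up to 10 s of pauses, 20 s of headroom:
# min_checkpoint_interval(30_000, 10_000, 20_000) == 60_000
```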

SLIDE 27

Recap: Concepts and Building blocks

  • Handle unbounded data: define boundaries using windows
  • Event-time processing
  • Handle out-of-order and late-arriving events using watermarks
  • Enrich data in-stream using external calls
  • Fault tolerance is very important for streaming applications

SLIDE 28

Making a stream join work

SLIDE 29

Data Flow Architecture

(Diagram: the impression stream and playback stream are read from Kafka, each goes through Transform + AssignTs, is partitioned with .keyBy, reduced, and written to the output. Kafka logo: By Source, Fair use, https://en.wikipedia.org/w/index.php?curid=47175041)

SLIDE 30

Data Flow Architecture

Transform + AssignTs = Parse (raw -> T) -> Filter (T -> T) -> AssignTs (t.getTs())

(Kafka logo: By Source, Fair use, https://en.wikipedia.org/w/index.php?curid=47175041)
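The Parse -> Filter -> AssignTs chain can be sketched outside Flink. This stdlib-Python version is illustrative only: the `transform` helper, the JSON record shape, and the "impression" event type are assumptions for the example.

```python
import json

def transform(raw_events, parse, keep, get_ts):
    """Parse raw records, drop the ones we don't want, and pair each
    surviving event with its event timestamp."""
    out = []
    for raw in raw_events:
        event = parse(raw)          # Parse (raw -> T)
        if keep(event):             # Filter (T -> T)
            out.append((get_ts(event), event))  # AssignTs (t.getTs())
    return out

raw = ['{"ts": 5, "type": "impression"}', '{"ts": 7, "type": "heartbeat"}']
result = transform(raw, json.loads,
                   lambda e: e["type"] == "impression",
                   lambda e: e["ts"])
# result == [(5, {"ts": 5, "type": "impression"})]
```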

SLIDE 31

Joining streams: Keyed Streams

(Diagram: .keyBy turns a DataStream into a KeyedStream, partitioning events by key)

SLIDE 32

Stream joins in Flink: Maintaining State

  • Events need to be held in memory for user-defined intervals of time for meaningful aggregations
  • Data held in memory needs to be cleared when no longer needed

(Diagram: keyed state for keys A, B, C is stored in RocksDB and snapshotted by checkpoints)

SLIDE 33

Aggregating streams: Windows

Windows split the stream into buckets of finite size, over which we can apply computations.

  • Stream volume: 200k events/sec/region
  • Repeating values for the same keys: 3-4

SLIDE 34

Aggregating streams

Can the events be summarized as they come?
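With 3-4 repeats per key, folding events into a running aggregate on arrival keeps state proportional to the number of keys rather than the number of events. A stdlib-Python sketch of the idea (class name invented for the example):

```python
class IncrementalCount:
    """Instead of buffering every event until a window fires, keep only a
    running aggregate per key -- state stays O(keys), not O(events)."""

    def __init__(self):
        self.counts = {}

    def add(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1

agg = IncrementalCount()
for key in ["K1", "K1", "K3", "K1"]:  # repeated keys, per the slide
    agg.add(key)
# agg.counts == {"K1": 3, "K3": 1}
```

The same shape works for any associative summary (counts, sums, min/max), which is what makes summarize-on-arrival viable at 200k events/sec/region.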

SLIDE 35

Updating state: CoProcess Function

(Diagram: impression (I) and playback (P) events for keys K1, K3, K4 are merged into a ValueState<T> composite type; K1 and K3 each hold I + P, while K4 holds only P)
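The per-key composite state above can be sketched without Flink. This stdlib-Python class mimics what a CoProcessFunction's value state does: buffer each side of the join under its key, then emit the combined record when a timer fires. The class and method names are invented for illustration, not Flink API.

```python
class JoinState:
    """Per-key composite state for a two-stream join: accumulate
    impressions (I) and playbacks (P), emit and clear on flush."""

    def __init__(self):
        self.state = {}    # key -> {"I": [...], "P": [...]}
        self.joined = []   # emitted (key, impressions, playbacks) records

    def _cell(self, key):
        return self.state.setdefault(key, {"I": [], "P": []})

    def on_impression(self, key, event):
        self._cell(key)["I"].append(event)

    def on_playback(self, key, event):
        self._cell(key)["P"].append(event)

    def flush(self, key):
        """Called when the key's timer fires: emit whatever both sides
        accumulated, and clear the state so it cannot grow unbounded."""
        cell = self.state.pop(key, {"I": [], "P": []})
        self.joined.append((key, cell["I"], cell["P"]))

js = JoinState()
js.on_impression("K1", "imp1")
js.on_playback("K1", "play1")
js.flush("K1")
# js.joined == [("K1", ["imp1"], ["play1"])] and js.state is empty again
```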

SLIDE 36

Stream joins in Flink: Updating State

  • Timers
    ○ Flink’s TimerService can be used to register callbacks for future time instants.
  • processElement()
  • onTimer()

(Diagram: the timer service drives reads of state and emission of aggregated elements)
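The timer mechanics can be sketched as a priority queue keyed by firing time. This stdlib-Python version only imitates the semantics of Flink's TimerService (register a callback for a future instant; fire it when the watermark passes); the class shape is an assumption for the example.

```python
import heapq

class TimerService:
    """Register (fire_ts, key) callbacks; when the watermark passes
    fire_ts, invoke on_timer so keyed state can be emitted and cleared."""

    def __init__(self, on_timer):
        self.timers = []        # min-heap of (fire_ts, key)
        self.on_timer = on_timer

    def register(self, fire_ts, key):
        heapq.heappush(self.timers, (fire_ts, key))

    def advance_watermark(self, watermark):
        # Fire every timer whose time is at or before the watermark.
        while self.timers and self.timers[0][0] <= watermark:
            fire_ts, key = heapq.heappop(self.timers)
            self.on_timer(key, fire_ts)

fired = []
svc = TimerService(lambda key, ts: fired.append(key))
svc.register(10_000, "K1")
svc.register(20_000, "K3")
svc.advance_watermark(15_000)
# fired == ["K1"]; K3's timer is still pending
```

In the real pipeline, the onTimer() callback is where the aggregated impression/playback state for a key is emitted downstream and the state cleared.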

SLIDE 37

Recap

(Diagram: the impression stream and playback stream are read from Kafka, each goes through Transform + AssignTs, is partitioned with .keyBy, summarized, and written to the output. Kafka logo: By Source, Fair use, https://en.wikipedia.org/w/index.php?curid=47175041)

SLIDE 38

Challenges

SLIDE 39

SLIDE 40

Challenge: Data Correctness

  • Trade-offs
    ○ Latency vs. completeness
  • Duplicates
    ○ Most streaming systems are at-least-once
    ○ De-duplication explodes state
  • Data validation
    ○ Real-time auditing of data
    ○ How to stop the incoming flow of bad data?

SLIDE 41

Challenge: Operations

Visibility into event time progression

SLIDE 42

Challenge: Operations

  • Visibility into state
  • Monitoring checkpoints
  • Periodic savepoints
  • Intercepting RocksDB metrics

SLIDE 43

Challenge: Data recovery

  • Replaying from Kafka
    ○ Checkpoints contain offset information
    ○ Different streams have different volumes
  • Replaying from Hive
    ○ Kafka retention is expensive
    ○ Easier for stateless applications

SLIDE 44

Solution: Replaying from Kafka

  • Ingestion-time filtering
    ○ Read all input streams from earliest
    ○ The Netflix Kafka producer stamps processing time
    ○ Filter out events based on processing time

(Timeline T0-T10: the system went down at T2 and came back up at T7)

    stream.filter(e => e.ingestionTs > T2 && e.ingestionTs < T7)
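The same filter can be exercised outside Flink. A stdlib-Python equivalent of the snippet above, with the event shape and helper name invented for the example:

```python
def replay_filter(events, down_ts, up_ts):
    """Keep only events whose producer-stamped ingestion time falls
    strictly inside the outage window (down_ts, up_ts), mirroring
    stream.filter(e => e.ingestionTs > T2 && e.ingestionTs < T7)."""
    return [e for e in events if down_ts < e["ingestionTs"] < up_ts]

events = [{"ingestionTs": t} for t in (1, 3, 5, 7, 9)]
# replay_filter(events, 2, 7) keeps only ingestionTs 3 and 5
```

Replaying everything from earliest and filtering on ingestion time avoids double-processing events that were already handled before the outage.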

SLIDE 45

Challenge: Region failovers

  • Event time is dependent on incoming data
  • Force-move the watermark via a maxInactivity parameter
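When a region fails over and a source goes quiet, no events arrive to advance event time, so the watermark stalls. A stdlib-Python sketch of forcing it forward after a period of inactivity; note that `maxInactivity` is the deck's name for this knob, and the class here is an invented illustration, not a standard Flink parameter or API.

```python
class IdleAwareWatermark:
    """Normally the watermark follows the max event time seen. If no
    events arrive for max_inactivity_ms, force it forward from the
    wall clock so downstream windows can still fire."""

    def __init__(self, max_inactivity_ms):
        self.max_inactivity = max_inactivity_ms
        self.max_event_ts = float("-inf")
        self.last_arrival = None

    def on_event(self, event_ts, now_ms):
        self.max_event_ts = max(self.max_event_ts, event_ts)
        self.last_arrival = now_ms

    def watermark(self, now_ms):
        if self.last_arrival is not None and \
           now_ms - self.last_arrival > self.max_inactivity:
            return now_ms - self.max_inactivity   # forced progress
        return self.max_event_ts

wm = IdleAwareWatermark(60_000)
wm.on_event(1_000, now_ms=1_000)
# after a quiet minute the watermark jumps past the last event:
# wm.watermark(120_000) == 60_000
```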

SLIDE 46

Challenges we are working on

  • State Schema Evolution
  • Application level De-duplication
  • Auto Scaling and recovery
  • Replaying and Restating data
SLIDE 47

Finally

SLIDE 48

What sparked joy

  • Fresher data for personalization models
  • Enhanced user experience
  • Enable stakeholders to make early decisions
  • Save on storage and compute costs
  • Real-time auditing and early detection of data gaps

SLIDE 49

Questions?

Join us!

@NetflixData