DRIZZLE: FAST AND ADAPTABLE STREAM PROCESSING AT SCALE



SLIDE 1

DRIZZLE: FAST AND ADAPTABLE STREAM PROCESSING AT SCALE

Shivaram Venkataraman, Aurojit Panda, Kay Ousterhout, Michael Armbrust, Ali Ghodsi, Michael Franklin, Benjamin Recht, Ion Stoica

SLIDE 2

STREAMING WORKLOADS

SLIDE 3

Streaming Trends: Low latency

Results power decisions by machines

  • Credit card fraud → Disable account
  • Suspicious user logins → Ask security questions
  • Slow video load → Direct user to new CDN

SLIDE 4

  • Disable stolen accounts
  • Detect suspicious logins
  • Dynamically adjust application behavior

Streaming Requirements: High throughput

As many as tens of millions of updates per second
Need a distributed system

SLIDE 5

Distributed Execution Models

SLIDE 6

Execution models: CONTINUOUS OPERATORS


Group by user, run anomaly detection

SLIDE 7

Execution models: CONTINUOUS OPERATORS


Group by user, run anomaly detection

Mutable local state
Low latency output

SLIDE 8

Execution models: CONTINUOUS OPERATORS


Group by user, run anomaly detection

Systems: Google MillWheel, Naiad; streaming DBs: Borealis, Flux, etc.

Mutable local state
Low latency output

SLIDE 9

Execution models: Micro-batch

Group by user, run anomaly detection

Tasks output state on completion
Output at task granularity

SLIDE 10

Execution models: Micro-batch

Group by user, run anomaly detection

Tasks output state on completion
Output at task granularity

Dynamic task scheduling → adaptability: straggler mitigation, elasticity, fault tolerance

Systems: Microsoft Dryad, Google FlumeJava
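To make the micro-batch model concrete, here is a toy sketch in Python: events arrive in small batches, each batch is grouped by user, and each "task" outputs its state on completion. All names (`run_micro_batch`, the per-user count standing in for anomaly detection) are invented for illustration; this is not Drizzle's or Spark's actual API.

```python
from collections import defaultdict

def run_micro_batch(events, state):
    """Process one micro-batch: group events by user, then update
    per-user state. A real system would run anomaly detection here;
    we just count events per user as a stand-in."""
    grouped = defaultdict(list)           # stage 1: group by user
    for user, value in events:
        grouped[user].append(value)
    for user, values in grouped.items():  # stage 2: update state;
        state[user] = state.get(user, 0) + len(values)  # output on completion
    return state

state = {}
state = run_micro_batch([("alice", 1), ("bob", 2), ("alice", 3)], state)
state = run_micro_batch([("bob", 4)], state)
# state now holds per-user counts accumulated across two micro-batches
```

Because the work is split into discrete tasks per batch, the scheduler can place, restart, or rebalance those tasks between batches, which is exactly the adaptability the slide describes.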

SLIDE 11

Failure recovery

SLIDE 12

Failure recovery: continuous operators

Chandy-Lamport asynchronous checkpointing of state
On failure, all machines replay from the last checkpoint

SLIDE 13

Failure recovery: Micro-batch

Task output is periodically checkpointed

Task boundaries capture task interactions!

SLIDE 14

Failure recovery: Micro-batch

Task output is periodically checkpointed
Replay tasks from the failed machine; parallelize replay across machines
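The parallel-recovery idea can be sketched as follows: only the failed machine's un-checkpointed tasks are replayed, spread across the surviving machines. The function and variable names here are hypothetical, not part of any real system's API.

```python
def plan_recovery(tasks_by_machine, checkpointed, failed, survivors):
    """Replay only the failed machine's tasks whose outputs were not
    checkpointed, assigning them round-robin across survivors so the
    replay itself runs in parallel."""
    lost = [t for t in tasks_by_machine[failed] if t not in checkpointed]
    plan = {m: [] for m in survivors}
    for i, task in enumerate(lost):
        plan[survivors[i % len(survivors)]].append(task)
    return plan

tasks = {"m1": ["t1", "t2", "t3", "t4"], "m2": ["t5"]}
plan = plan_recovery(tasks, checkpointed={"t1"},
                     failed="m1", survivors=["m2", "m3"])
# t2, t3, t4 are replayed in parallel on the two surviving machines
```

Contrast with the continuous-operator model on the previous slide, where every machine must roll back to the checkpoint and replay.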

SLIDE 15

Execution models

Continuous operators:
  • Scheduling granularity: static scheduling (inflexible, slow failover)
  • Processing granularity: low latency

Micro-batch:
  • Scheduling granularity: dynamic scheduling (adaptable, parallel recovery, straggler mitigation)
  • Processing granularity: higher latency

SLIDE 16

Execution models

Continuous operators: static scheduling, low latency
Micro-batch: dynamic scheduling (coarse granularity), higher latency (coarse-grained processing)
Drizzle: low latency (fine-grained processing) + dynamic scheduling (coarse granularity)

SLIDE 17

INSIDE THE SCHEDULER

(1) Decide how to assign tasks to machines (data locality, fair sharing)
(2) Serialize and send tasks

SLIDE 18

SCHEDULING OVERHEADS

Cluster: 4-core r3.xlarge machines
Workload: sum of 10k numbers per core
[Chart: median task-time breakdown, time (ms) vs. machines (4 to 128): Compute + Data Transfer, Task Fetch, Scheduler Delay]

SLIDE 19

INSIDE THE SCHEDULER

(1) Decide how to assign tasks to machines (data locality, fair sharing)
(2) Serialize and send tasks

Reuse scheduling decisions!
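A minimal sketch of reusing scheduling decisions: compute the task-to-machine assignment once and serve it from a cache for subsequent micro-batches of the same shape. `CachingScheduler` and its round-robin placement are invented for illustration; a real scheduler would weigh locality and fair sharing.

```python
class CachingScheduler:
    """Toy scheduler: the expensive assignment decision is made once
    per task-count and reused for later identical micro-batches."""
    def __init__(self, machines):
        self.machines = machines
        self._cache = {}
        self.decisions_made = 0

    def assign(self, num_tasks):
        if num_tasks not in self._cache:
            self.decisions_made += 1      # pay the decision cost only once
            self._cache[num_tasks] = {
                t: self.machines[t % len(self.machines)]
                for t in range(num_tasks)
            }
        return self._cache[num_tasks]

sched = CachingScheduler(["m1", "m2"])
for _ in range(100):                      # 100 identical micro-batches
    assignment = sched.assign(4)
# only one scheduling decision is made across all 100 micro-batches
```

This works because consecutive micro-batches of a streaming job run the same DAG of tasks, so yesterday's placement is still valid today.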

SLIDE 20

DRIZZLE

Goal: remove frequent scheduler interaction
(1) Pre-schedule reduce tasks
(2) Group-schedule micro-batches

SLIDE 21

(1) Pre-schedule reduce tasks
Goal: remove scheduler involvement for reduce tasks

SLIDE 22

(1) Pre-schedule reduce tasks
Goal: remove scheduler involvement for reduce tasks

SLIDE 23

COORDINATING SHUFFLES: Existing systems

Metadata describes shuffle data locations
Data fetched from remote machines

SLIDE 24

COORDINATING SHUFFLES: Pre-scheduling

(1) Pre-schedule reducers
(2) Mappers get metadata
(3) Mappers trigger reducers
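The three steps above can be sketched like this; the names (`pre_schedule_shuffle`, the round-robin placement) are hypothetical stand-ins for Drizzle's real mechanism.

```python
def pre_schedule_shuffle(mappers, reducers, machines):
    """(1) Place reduce tasks up front, (2) hand reducer locations to
    mappers as metadata, (3) let mappers trigger reducers directly,
    with no scheduler round-trip between the stages."""
    reducer_loc = {r: machines[i % len(machines)]
                   for i, r in enumerate(reducers)}          # step 1
    mapper_tasks = [(m, reducer_loc) for m in mappers]       # step 2
    triggered = set()
    for _mapper, locations in mapper_tasks:                  # step 3
        triggered.update(locations)                          # notify reducers
    return reducer_loc, triggered

locs, fired = pre_schedule_shuffle(["map0", "map1"],
                                   ["red0", "red1"], ["m1", "m2"])
```

Because reducer locations are fixed before the map stage runs, mappers can hand off their outputs without the scheduler mediating the shuffle.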

SLIDE 25

DRIZZLE

(1) Pre-schedule reduce tasks
(2) Group-schedule micro-batches
Goal: avoid returning to the scheduler after every micro-batch

SLIDE 26

Group scheduling

Group of 2
Schedule a group of micro-batches at once
Fault tolerance and scheduling decisions at group boundaries
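Group scheduling can be sketched as batching the batches: with group size g, the driver is consulted once per group instead of once per micro-batch. The helper below is a toy illustration, not Drizzle's API.

```python
def group_schedule(micro_batches, group_size):
    """Split micro-batches into groups that are scheduled in one shot;
    scheduler (and fault-tolerance) decisions happen only at group
    boundaries, so interactions drop by a factor of group_size."""
    groups = [micro_batches[i:i + group_size]
              for i in range(0, len(micro_batches), group_size)]
    return groups, len(groups)   # second value: scheduler interactions

groups, interactions = group_schedule(list(range(6)), group_size=2)
# 6 micro-batches in groups of 2 -> 3 scheduler interactions instead of 6
```

The trade-off, as the slide notes, is that adaptation and failure handling now happen at group granularity rather than per batch.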

SLIDE 27

[Chart: time per iteration (ms) vs. machines (4 to 128): Baseline, Only Pre-Scheduling, Drizzle-10, Drizzle-100]

Micro-benchmark: 2 stages, 100 iterations; breakdown of pre-scheduling and group-scheduling

In the paper: group-size auto-tuning

SLIDE 28

Evaluation

Continuous operators: static scheduling, low latency
Micro-batch: dynamic scheduling (coarse granularity), higher latency (coarse-grained processing)
Drizzle: low latency (fine-grained processing) + dynamic scheduling (coarse granularity)

  • 1. Latency?
  • 2. Adaptability?
SLIDE 29

EVALUATION: Latency

Yahoo! Streaming Benchmark
Input: JSON events of ad-clicks
Compute: number of clicks per campaign
Window: update every 10s
Comparing Spark 2.0, Flink 1.1.1, and Drizzle on 128 Amazon EC2 r3.xlarge instances

SLIDE 30

STREAMING BENCHMARK: Performance

[Chart: CDF of event latency (ms) for Spark, Drizzle, and Flink]

Yahoo Streaming Benchmark: 20M JSON ad-events per second, 128 machines
Event latency: difference between window end and processing end
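The event-latency definition above is simply the gap between when a window closes and when its result is emitted; as a one-line sketch (function name invented):

```python
def event_latency_ms(window_end_ms, processing_end_ms):
    """Event latency per the benchmark: time from the end of the
    aggregation window to the end of processing for that window."""
    return processing_end_ms - window_end_ms

# a window ending at t=10s whose result is emitted at t=10.35s
latency = event_latency_ms(10_000, 10_350)
```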

SLIDE 31

ADAPTABILITY: Fault Tolerance

Yahoo Streaming Benchmark: 20M JSON ad-events per second, 128 machines
Machine failure injected at 240 seconds

[Chart: latency (ms) vs. time (s) for Spark, Flink, and Drizzle, with a zoomed view around the failure]

SLIDE 32

Execution models

Continuous operators: static scheduling, low latency
Micro-batch: dynamic scheduling, higher latency, optimization of batches
Drizzle: low latency (fine-grained processing), dynamic scheduling (coarse granularity), optimization of batches

SLIDE 33

INTRA-BATCH QUERY OPTIMIZATION

Optimize execution of each micro-batch by pushing down aggregation

Yahoo Streaming Benchmark: 20M JSON ad-events per second, 128 machines
[Chart: CDF of event latency (ms) for Spark, Drizzle, Flink, and Drizzle-Optimized]

SLIDE 34

EVALUATION

End-to-end latency, fault tolerance, query optimization
Synthetic micro-benchmarks, Yahoo Streaming Benchmark, video analytics
Also: throughput, elasticity, group-size tuning
Shivaram's thesis: iterative ML algorithms

SLIDE 35

Conclusion

Continuous operators: static scheduling, low latency
Micro-batch: dynamic scheduling (coarse granularity), higher latency (coarse-grained processing), optimization of batches
Drizzle: low latency (fine-grained processing), dynamic scheduling (coarse granularity), optimization of batches

Source code: https://github.com/amplab/drizzle-spark
Shivaram is answering questions on sli.do