Streaming Auto-Scaling in Google Cloud Dataflow
Manuel Fahndrich, Software Engineer, Google


SLIDE 1

Software Engineer Google

Manuel Fahndrich

Streaming Auto-Scaling in Google Cloud Dataflow

SLIDE 2

https://commons.wikimedia.org/wiki/File:Globe_centered_in_the_Atlantic_Ocean_(green_and_grey_globe_scheme).svg

Addictive Mobile Game

SLIDE 3

[Leaderboard mockup: Team Ranking and Individual Ranking, Hourly and Daily, with players such as Sarah, Joe, and Milo and scores 1,251,965 / 1,019,341 / 989,673 and 151,365 / 109,903 / 98,736.]

SLIDE 4

An Unbounded Stream of Game Events

[Timeline of game events from 1:00 to 14:00.]

SLIDE 5

… with unknown delays.

[Timeline with several 8:00 events arriving late, interleaved with events from 9:00 through 14:00.]

SLIDE 6

The Resource Allocation Problem

[Two charts of workload over time: one with over-provisioned resources, one with under-provisioned resources.]

SLIDE 7

Matching Resources to Workload

[Chart: workload over time with auto-tuned resources tracking it.]

SLIDE 8

Resources = Parallelism

[Chart: workload over time with auto-tuned parallelism tracking it.]

More generally: VMs (including CPU, RAM, network, IO).

SLIDE 9

Assumptions

  • Big Data Problem
  • Embarrassingly Parallel
  • Scaling VMs ==> Scales Throughput
  • Horizontal Scaling

SLIDE 10

Agenda

  • 1. Streaming Dataflow Pipelines
  • 2. Pipeline Execution
  • 3. Adjusting Parallelism Automatically
  • 4. Summary + Future Work

SLIDE 11

Streaming Dataflow

1

SLIDE 12

Google’s Data-Related Systems

[Timeline 2002-2016: GFS, MapReduce, Big Table, Dremel, Pregel, FlumeJava, Colossus, Spanner, MillWheel, Dataflow.]

SLIDE 13

Google Dataflow SDK

Open Source SDK used to construct a Dataflow pipeline. (Now Incubating as Apache Beam)

SLIDE 14

Computing Team Scores

// Collection of raw log lines
PCollection<String> raw = ...;

// Element-wise transformation into team/score pairs
PCollection<KV<String, Integer>> input =
    raw.apply(ParDo.of(new ParseFn()));

// Composite transformation containing an aggregation
PCollection<KV<String, Integer>> output = input
    .apply(Window.into(FixedWindows.of(Duration.standardMinutes(60))))
    .apply(Sum.integersPerKey());

SLIDE 15

Google Cloud Dataflow

  • Given code in the Dataflow SDK (incubating as Apache Beam)…
  • Pipelines can run…
    ○ On your development machine
    ○ On the Dataflow Service on Google Cloud Platform
    ○ On third-party environments like Spark or Flink.

SLIDE 16

Cloud Dataflow

A fully-managed cloud service and programming model for batch and streaming big data processing.


SLIDE 17

Google Cloud Dataflow

[Service diagram: the pipeline graph is optimized and scheduled by the service, with input and output in GCS.]

SLIDE 18

Back to the Problem at Hand

[Chart: workload over time with auto-tuned parallelism tracking it.]

SLIDE 19

Auto-Tuning Ingredients

  • Signals measuring Workload
  • Policy making Decisions
  • Mechanism actuating Change

SLIDE 20

Pipeline Execution

2

SLIDE 21

Optimized Pipeline = DAG of Stages

[Diagram: stages S0, S1, S2 transforming raw input into individual points and then team points.]

SLIDE 22

Stage Throughput Measure

[Diagram: the same DAG of stages S0, S1, S2 (raw input, individual points, team points), with throughput measured at each stage.]

SLIDE 23

Picture by Alexandre Duret-Lutz, Creative Commons 2.0 Generic

SLIDE 24

Queues of Data Ready for Processing

Queue Size = Backlog

[Diagram: input queues in front of stages S0, S1, S2.]

SLIDE 25

Backlog Growth vs. Backlog Size

SLIDE 26

Backlog Growth = Processing Deficit

SLIDE 27

Derived Signal: Stage Input Rate

Input Rate = Throughput + Backlog Growth
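For example (illustrative numbers), a stage processing 80 MB/s whose backlog is growing by 20 MB/s is receiving input at 100 MB/s.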

SLIDE 28

Constant Backlog... could be bad

SLIDE 29

Backlog Time = Backlog Size / Throughput
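For example (illustrative numbers), a 12 MB backlog at a throughput of 2 MB/s corresponds to a backlog time of 6 seconds.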

SLIDE 30

Backlog Time = Time to get through backlog

SLIDE 31

Bad Backlog = Long Backlog Time

SLIDE 32

Backlog Growth and Backlog Time Inform Upscaling. What Signals indicate Downscaling?

SLIDE 33

Low CPU Utilization

SLIDE 34

Signals Summary

  • Throughput
  • Backlog growth
  • Backlog time
  • CPU utilization

SLIDE 35

Goals:

  • 1. No backlog growth
  • 2. Short backlog time
  • 3. Reasonable CPU utilization

Policy: making Decisions

SLIDE 36

Upscaling Policy: Keeping Up

Given M machines
For a stage, given:
  average stage throughput T
  average positive backlog growth G of the stage

Machines needed for the stage to keep up:

  M' = M × (T + G) / T
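For example (illustrative numbers), with M = 10 machines, T = 100 MB/s, and G = 20 MB/s, the stage needs M' = 10 × (100 + 20) / 100 = 12 machines to keep up.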

SLIDE 37

Upscaling Policy: Catching Up

Given M machines
Given R (time to reduce the backlog)
For a stage, given:
  average backlog time B

Extra machines to remove the backlog:

  Extra = M × B / R
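For example (illustrative numbers), with M = 10 machines, a backlog time of B = 30 s, and a recovery target of R = 60 s, Extra = 10 × 30 / 60 = 5 extra machines.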

SLIDE 38

Upscaling Policy: All Stages

Want all stages to:

  • 1. keep up
  • 2. have low backlog time

Pick the maximum over all stages of M' + Extra (a minimal sketch follows).
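A minimal Java sketch of this upscaling policy, combining the keeping-up and catching-up formulas from the previous slides. The signal holder type, the names, and the guard against zero throughput are illustrative assumptions, not the actual Dataflow implementation.

import java.util.List;

class UpscalingPolicy {

  /** Per-stage signals as defined on the earlier slides (hypothetical holder type). */
  record StageSignals(double throughputMBps, double backlogGrowthMBps, double backlogTimeSec) {}

  /**
   * Desired machine count = max over all stages of M' + Extra, where
   *   M'    = M * (T + G) / T   (keep up with the stage's input rate)
   *   Extra = M * B / R         (drain the backlog within R seconds)
   */
  static int desiredMachines(int m, double recoverySec, List<StageSignals> stages) {
    double desired = m;
    for (StageSignals s : stages) {
      if (s.throughputMBps() <= 0) continue;              // avoid division by zero
      double growth = Math.max(0, s.backlogGrowthMBps()); // only positive growth counts
      double keepUp = m * (s.throughputMBps() + growth) / s.throughputMBps();
      double extra = m * s.backlogTimeSec() / recoverySec;
      desired = Math.max(desired, keepUp + extra);
    }
    return (int) Math.ceil(desired);
  }

  public static void main(String[] args) {
    // One stage: keeping up needs M' = 12, draining 30 s of backlog in 60 s needs Extra = 5.
    List<StageSignals> stages = List.of(new StageSignals(100, 20, 30));
    System.out.println(desiredMachines(10, 60, stages)); // 17
  }
}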

SLIDE 39

Example (signals)

[Charts: input rate and throughput (MB/s), and backlog growth and backlog time (seconds), over time.]

SLIDE 40

Example (signals)


SLIDE 41

Example (signals)


SLIDE 42

Example (signals)


SLIDE 43

Example (policy)

[Chart: machines over time, showing M, M', and Extra with R = 60 s.]

SLIDE 44

Example (policy)


SLIDE 45

Example (policy)


SLIDE 46

Example (policy)


SLIDE 47

Preconditions for Downscaling

  • Low backlog time
  • No backlog growth
  • Low CPU utilization

SLIDE 48

How far can we downscale?

Stay tuned...

SLIDE 49

Adjusting Parallelism of a Running Streaming Pipeline

Mechanism: actuating Change

3

SLIDE 50


Optimized Pipeline = DAG of Stages

SLIDE 51


Optimized Pipeline = DAG of Stages

SLIDE 52

Optimized Pipeline = DAG of Stages

[Diagram: the DAG of stages S0, S1, S2 running on Machine 0.]

SLIDE 53

Adding Parallelism

[Diagram: Machine 0 running three parallel copies of the stages S0, S1, S2.]

SLIDE 54

Adding Parallelism

[Diagram: parallel copies of the stages S0, S1, S2 on Machine 0 and Machine 1.]

SLIDE 55

Adding Parallelism = Splitting Key Ranges

[Diagram: the key range split between Machine 0 and Machine 1, each running S0, S1, S2.]

SLIDE 56

Migrating a Computation

SLIDE 57

Adding Parallelism = Migrating Computation Ranges

[Diagram: computation ranges migrated from Machine 0 to Machine 1, each machine running S0, S1, S2.]

SLIDE 58

Checkpoint and Recovery ~ Computation Migration

SLIDE 59

Key Ranges and Persistence

[Diagram: key ranges and their persistent disks spread across Machines 0-3, each running S0, S1, S2.]

SLIDE 60

Downscaling from 4 to 2 Machines

[Diagram: four machines (Machines 0-3), each running S0, S1, S2.]

SLIDE 61

Downscaling from 4 to 2 Machines


SLIDE 62

Downscaling from 4 to 2 Machines

[Diagram: the two remaining machines, each running S0, S1, S2.]

SLIDE 63

Downscaling from 4 to 2 Machines

[Diagram: the two remaining machines after migration, each running S0, S1, S2 and holding the migrated key ranges.]

Upsizing = Steps in Reverse

SLIDE 64

Granularity of Parallelism

As of March 2016, Google Cloud Dataflow:

  • Splits Key Ranges initially Based on Max Machines
  • At Max: 1 Logical Persistent Disk per Machine

Each disk holds a slice of the key ranges from all stages

  • Only (relatively) even Disk Distributions
  • Results in Scaling Quanta
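A rough illustration of the even-distribution rule behind the scaling quanta table on the next slide: with a fixed number of persistent disks (one per machine at max), each candidate machine count gets either the floor or the ceiling of disks divided by machines. The helper below is a hypothetical sketch; the rule for which machine counts are allowed at all (e.g. the N/A row) is not modeled.

import java.util.List;

class DiskDistribution {

  /** Disks per machine when maxDisks persistent disks are spread as evenly as
      possible over m machines: every machine gets floor(maxDisks/m) or ceil(maxDisks/m). */
  static List<Integer> disksPerMachine(int maxDisks, int m) {
    int lo = maxDisks / m;
    int hi = (maxDisks + m - 1) / m;
    return lo == hi ? List.of(lo) : List.of(lo, hi);
  }

  public static void main(String[] args) {
    // With 60 disks (max = 60 machines): 7 machines -> [8, 9], 10 machines -> [6].
    System.out.println(disksPerMachine(60, 7));   // [8, 9]
    System.out.println(disksPerMachine(60, 10));  // [6]
  }
}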
SLIDE 65

Example Scaling Quanta: Max = 60 Machines

Parallelism   Disks per Machine
     3        N/A
     4        15
     5        12
     6        10
     7        8, 9
     8        7, 8
     9        6, 7
    10        6
    12        5
    15        4
    20        3
    30        2
    60        1

SLIDE 66

Goals:

  • 1. No backlog growth
  • 2. Short backlog time
  • 3. Reasonable CPU utilization

Policy: making Decisions

SLIDE 67

Preconditions for Downscaling

  • Low backlog time
  • No backlog growth
  • Low CPU utilization

SLIDE 68

Downscaling Policy

Next lower scaling quantum => M' machines
Estimate the future per-machine CPU utilization:

  CPU_M' = CPU_M × M / M'

If the estimated CPU_M' < threshold (say 90%), downscale to M'.
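A minimal Java sketch of this downscaling check, combined with the preconditions from the previous slide. The thresholds and names are illustrative assumptions, not the service's actual configuration.

class DownscalingPolicy {

  // Illustrative thresholds (assumptions, not the actual service values).
  private static final double LOW_BACKLOG_TIME_SEC = 10.0;
  private static final double LOW_CPU = 0.7;
  private static final double TARGET_CPU = 0.9;

  /** Returns the new machine count, or the current count if downscaling is not safe. */
  static int maybeDownscale(int m, int nextLowerQuantum, double cpuUtilization,
                            double backlogTimeSec, double backlogGrowth) {
    boolean preconditionsHold = backlogTimeSec < LOW_BACKLOG_TIME_SEC
        && backlogGrowth <= 0
        && cpuUtilization < LOW_CPU;
    if (!preconditionsHold) return m;
    // Estimated per-machine CPU after moving to M' machines: CPU_M' = CPU_M * M / M'.
    double projectedCpu = cpuUtilization * m / nextLowerQuantum;
    return projectedCpu < TARGET_CPU ? nextLowerQuantum : m;
  }

  public static void main(String[] args) {
    // 10 machines at 50% CPU, no backlog: projected CPU at 8 machines is 62.5% -> downscale.
    System.out.println(maybeDownscale(10, 8, 0.5, 0.0, 0.0)); // 8
  }
}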

SLIDE 69

Summary + Future Work

4

SLIDE 70

Artificial Experiment

SLIDE 71

Auto-Scaling Summary

Signals: throughput, backlog time, backlog growth, CPU utilization
Policy: keep up, reduce backlog, use CPUs
Mechanism: split key ranges, migrate computations

SLIDE 72
Future Work

  • Experiment with non-uniform disk distributions to address hot ranges
  • Dynamically splitting ranges finer than initially done
  • Approximate model of the relation between #VMs and throughput

SLIDE 73

Questions?

Further reading on the streaming model:

  • The world beyond batch: Streaming 101
  • The world beyond batch: Streaming 102