No Shard Left Behind: Straggler-free data processing in Cloud Dataflow (PowerPoint PPT Presentation)


SLIDE 1

No Shard Left Behind

Straggler-free data processing in Cloud Dataflow

Eugene Kirpichov, Senior Software Engineer

SLIDE 2

Google Cloud Platform 2

(Gantt chart: workers vs. time)

SLIDE 3

SLIDE 4

Plan

01 Intro: Setting the stage
02 Stragglers: Where they come from and how people fight them
03 Dynamic rebalancing: 1. How it works 2. Why it is hard
04 Autoscaling: Why dynamic rebalancing really matters
05 If you remember two things: Philosophy of everything above

SLIDE 5

01 Intro

Setting the stage

SLIDE 6

Google’s data processing timeline

2002 → 2016: GFS, MapReduce, BigTable, Dremel, Pregel, FlumeJava, Colossus, Spanner, MillWheel, Dataflow, Apache Beam

SLIDE 7

Pipeline p = Pipeline.create(options);
p.apply(TextIO.Read.from("gs://dataflow-samples/shakespeare/*"))
 .apply(FlatMapElements.via(
     word -> Arrays.asList(word.split("[^a-zA-Z']+"))))
 .apply(Filter.byPredicate(word -> !word.isEmpty()))
 .apply(Count.perElement())
 .apply(MapElements.via(
     count -> count.getKey() + ": " + count.getValue()))
 .apply(TextIO.Write.to("gs://.../..."));
p.run();

WordCount

SLIDE 8

ParDo: DoFn: A → [B]

GroupByKey (GBK): (K, V) → (K, [V])

MapReduce = ParDo + GroupByKey + ParDo
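The equation above can be sketched in plain Java as an in-memory toy (this is not the Beam/Dataflow API; `parDo`, `groupByKey`, and `wordCount` are illustrative names):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// In-memory toy of MapReduce = ParDo + GroupByKey + ParDo.
public class MiniMapReduce {

    // ParDo: apply a DoFn A -> [B] to every input element.
    static <A, B> List<B> parDo(List<A> input, Function<A, List<B>> doFn) {
        return input.stream()
                    .flatMap(a -> doFn.apply(a).stream())
                    .collect(Collectors.toList());
    }

    // GroupByKey: (K, V) pairs -> K mapped to [V].
    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
        Map<K, List<V>> grouped = new HashMap<>();
        for (Map.Entry<K, V> e : pairs)
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(e.getValue());
        return grouped;
    }

    // WordCount = ParDo (split lines, emit (word, 1)) + GBK + ParDo (sum).
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = parDo(lines, line -> {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            for (String w : line.split("[^a-zA-Z']+"))
                if (!w.isEmpty()) out.add(Map.entry(w, 1));
            return out;
        });
        Map<String, Integer> counts = new HashMap<>();
        groupByKey(pairs).forEach((word, ones) -> counts.put(word, ones.size()));
        return counts;
    }
}
```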

SLIDE 9

Running a ParDo

shard 1, shard 2, …, shard N: a DoFn instance processes each shard

SLIDE 10

Gantt charts

(x axis: time; y axis: workers; one bar per shard, shard 1 … shard N)

SLIDE 11

Large WordCount: read files, GroupByKey, write files. 400 workers, 20 minutes.

SLIDE 12

02 Stragglers

Where they come from, and how people fight them

SLIDE 13

Stragglers


SLIDE 14

Amdahl’s law: it gets worse at scale

Higher scale ⇒ more bottlenecked by serial parts.

(Chart: speedup vs. #workers, one curve per serial fraction)
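Amdahl's law behind the chart: with serial fraction s, the speedup on n workers is S(n) = 1 / (s + (1 - s) / n), capped at 1/s no matter how many workers you add. A one-line sketch (`speedup` is an illustrative helper, not from any library):

```java
// Amdahl's law: S(n) = 1 / (s + (1 - s) / n), capped at 1/s.
public class Amdahl {
    static double speedup(double serialFraction, int workers) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / workers);
    }
}
```

For example, with a 10% serial fraction even 1000 workers give less than a 10x speedup, which is why higher scale is more bottlenecked by serial parts.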

SLIDE 15

Where do stragglers come from?

Uneven partitioning: e.g. process a dictionary in parallel by first letter; ⅙ of words start with ‘t’ ⇒ < 6x speedup.
Uneven complexity: join Foos / Bars in parallel by Foos; some Foos have far more Bars than others.
Uneven resources: bad machines, bad network, resource contention.
Noise: spuriously slow external RPCs, bugs.

SLIDE 16

What would you do?

Uneven partitioning / complexity: oversplit, hand-tune, use data statistics. Predictive ⇒ unreliable.
Uneven resources / noise: backups, restarts. Weak.

SLIDE 17

These kinda work. But not really.

Manual tuning = Sisyphean task: time-consuming, uninformed, obsoleted by data drift ⇒ almost always tuned wrong.
Statistics are often missing or wrong, don’t exist for intermediate data, and size != complexity.
Backups/restarts only address slow workers.

SLIDE 18

Upfront heuristics don’t work: they will predict wrong, and higher scale makes that more likely.

SLIDE 19

High scale triggers worst-case behavior.

Corollary: If you’re bottlenecked by worst-case behavior, you won’t scale.

SLIDE 20

03.1 Dynamic rebalancing

How it works

SLIDE 21

Detect and fight stragglers


SLIDE 22

What is a straggler, really?

Slower than perfectly-parallel execution: t_end > Σ t_end / N

SLIDE 23

Split stragglers, return residuals into pool of work

Example: a worker reading foo.txt [100, 200) is at position 130 but its projected completion time is far above the average. Split at 170: the worker keeps running [100, 170), and the residual [170, 200) is scheduled back into the pool (cheap, atomic).
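The foo.txt example can be sketched as a tiny helper that carves the not-yet-read remainder into a primary and a residual range (illustrative names only, not the Dataflow API):

```java
// Sketch of splitting a running straggler's shard: a worker reading
// byte range [start, end) has reached `position`; we split at a point
// ahead of the reader. The primary keeps running; the residual goes
// back into the pool of work.
public class ShardSplit {
    // Returns {primary, residual}, each as a {lo, hi} half-open range.
    static long[][] split(long start, long end, long position, long splitPos) {
        if (splitPos <= position || splitPos >= end)
            throw new IllegalArgumentException("split point must lie in (position, end)");
        return new long[][] { {start, splitPos}, {splitPos, end} };
    }
}
```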

SLIDE 24

Rinse, repeat (“liquid sharding”)


SLIDE 25

Results: savings from dynamic rebalancing: ~50% on a skewed single-ParDo job (24 workers); ~25% on a uniform ParDo/GBK/ParDo job (400 workers).

SLIDE 26

Get out of trouble > avoid trouble

Adaptive > Predictive

SLIDE 27

03.2 Dynamic rebalancing

Why is it hard?

SLIDE 28

And that’s it? What’s so hard?

Semantics: what can be split? (not just files); data consistency; APIs; wait-free splitting; perfect granularity.
Quality: non-uniform density; stuckness; “dark matter”; making predictions; measuring quality; testing consistency; debugging; being sure it works.

SLIDE 29

What is splitting

foo.txt [100, 200), split at 170 ⇒ foo.txt [100, 170) + foo.txt [170, 200)

SLIDE 30

What is splitting: Associativity

[A, B) + [B, C) = [A, C)

SLIDE 31

What is splitting: Rounding up

[A, B) = records starting in [A, B) Random access ⇒ Can split without scanning data!

SLIDE 32

What is splitting: Rounding up (example)

Dictionary: apple beet fig grape kiwi lime pear rose squash vanilla

[a, h): apple beet fig grape
[h, s): kiwi lime pear rose
[s, $): squash vanilla
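The "rounding up" rule on this slide can be sketched in a few lines, assuming the word itself stands in for the record's start position (`recordsIn` is an illustrative name): a range [A, B) owns exactly the records starting in [A, B), so every record belongs to exactly one shard and [A, B) + [B, C) = [A, C).

```java
import java.util.ArrayList;
import java.util.List;

// "Rounding up": shard [A, B) owns exactly the records whose start
// position falls in [A, B). `b == null` stands for "$" (end of range).
public class RoundingUp {
    static List<String> recordsIn(List<String> sorted, String a, String b) {
        List<String> out = new ArrayList<>();
        for (String w : sorted)
            if (w.compareTo(a) >= 0 && (b == null || w.compareTo(b) < 0))
                out.add(w);
        return out;
    }
}
```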

SLIDE 33

What is splitting: Blocks

[A, B) = records in blocks starting in [A, B)

SLIDE 34

What is splitting: Readers

“Reader”: foo.txt [100, 200), split at 170 ⇒ foo.txt [100, 170) + foo.txt [170, 200)

Re-reading consistency: continuing until EOF must yield the same records as re-reading the shard.

SLIDE 35

Dynamic splitting: readers

A shard is part read, part not yet read; splitting inside the already-read part is not ok.

X = last record read: positions must be exact and increasing. E.g. you can’t split an arbitrary SQL query.

SLIDE 36

[A, B) = blocks of records starting in [A, B)
[A, B) + [B, C) = [A, C)
Random access ⇒ no scanning needed to split
Reading is repeatable, ordered by position, positions exact

SLIDE 37

Concurrency when splitting

(Timeline: the worker alternates Read and Process steps, record after record.)

“Should I split?” While we wait for the current record to finish, 1000s of workers idle. Per-element processing in O(hours) is common!

SLIDE 38

Concurrency when splitting

(Timeline: reading and processing continue while the split request is answered immediately: “split!” “ok.”)

Split wait-free (but race-free), while processing/reading. See code: RangeTracker.
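A minimal sketch of such a tracker, modeled loosely on the RangeTracker idea the slide points at (the class name, method names, and signatures here are illustrative, not the actual Dataflow/Beam API): the reader claims positions as it goes, a concurrent splitter may shrink the range, and both operations are short synchronized sections, so neither side ever waits for an in-flight record to finish.

```java
// Minimal range tracker: the reader claims record positions; a
// concurrent splitter shrinks the range. Split succeeds only if the
// split point is still ahead of the reader, so the two never race.
public class MiniRangeTracker {
    private final long start;
    private long stop;              // exclusive; may shrink via split
    private long lastClaimed = -1;  // position of last record returned

    public MiniRangeTracker(long start, long stop) {
        this.start = start;
        this.stop = stop;
    }

    // Called by the reader before emitting the record at `position`.
    // Returns false if that position was split away into the residual.
    public synchronized boolean tryClaimRecordAt(long position) {
        if (position >= stop) return false;
        lastClaimed = position;
        return true;
    }

    // Called concurrently by the service. Succeeds only if the split
    // point lies strictly between the reader and the end of the range;
    // the primary then shrinks to [start, splitPosition).
    public synchronized boolean trySplitAtPosition(long splitPosition) {
        if (splitPosition <= lastClaimed || splitPosition >= stop) return false;
        stop = splitPosition;
        return true;
    }

    public synchronized long getStop() { return stop; }
}
```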

SLIDE 39

Perfectly granular splitting

“Few records, heavy processing” is common. ⇒ Perfect parallelism required

SLIDE 40

Separation: ParDo { record → sleep(∞) } parallelized perfectly

(requires wait-free + perfectly granular)

SLIDE 41

Separation is a qualitative improvement

ParDo: expand glob /path/to/foo*.txt ⇒ foo5.txt foo42.txt foo8.txt foo100.txt foo91.txt foo26.txt foo87.txt foo56.txt

ParDo: read records

Perfectly parallel over files, perfectly parallel over records ⇒ infinite scalability (no “shard per file”)

See also: Splittable DoFn http://s.apache.org/splittable-do-fn

SLIDE 42

“Practical” solutions improve performance. “No compromise” solutions reduce the dimension of the problem space.
SLIDE 43

SLIDE 44

Making predictions: easy, right?

Bytes: at position 130 in [100, 200): (130 - 100) / (200 - 100) = 0.3 ⇒ ~30% complete. Split at 70%: 100 + 0.7 × (200 - 100) = 170.

Keys (apple beet fig grape kiwi): at “kiwi” in [a, z): k / [a, z) ≈ 0.5 ⇒ ~50% complete. Split at 70%: 0.7 of [a, z) ≈ “t”.
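The byte-range arithmetic above as a small helper (illustrative names; key ranges would need an analogous interpolation over the key space):

```java
// Linear progress estimation over a byte range, as on the slide:
// fraction consumed = (position - start) / (stop - start);
// the position for "split at fraction f" is start + f * (stop - start).
public class ProgressEstimate {
    static double fractionConsumed(long start, long stop, long position) {
        return (double) (position - start) / (stop - start);
    }

    static long splitPositionAtFraction(long start, long stop, double fraction) {
        return start + (long) (fraction * (stop - start));
    }
}
```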

SLIDE 45

(Charts: progress vs. time for several runs, extrapolating linearly from the current point (tx, px))

Easy; usually too good to be true.

SLIDE 46

Accurate predictions are the wrong goal, and infeasible: even wildly-off estimates should leave the system working. Optimize for emergent behavior (separation). The better goal: detect stuckness.

SLIDE 47

Dark matter

Heavy work that you don’t know exists, until you hit it. Goal: discover and distribute dark matter as quickly as possible. (Image credit: NASA)

SLIDE 48

04 Autoscaling

Why dynamic rebalancing really matters

SLIDE 49

A lot of work ⇒ A lot of workers

How much work will there be? Can’t predict: data size, complexity, etc. What should you do? Adaptive > Predictive: keep re-estimating the total work and scale up/down. (Image credit: Wikipedia)
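The "keep re-estimating" loop's core decision can be sketched as follows; every name here, and the simple throughput model, is an assumption for illustration, not the Dataflow autoscaler:

```java
// Adaptive autoscaling sketch: instead of predicting the job size up
// front, repeatedly re-estimate the remaining work from observed
// progress and pick the worker count that would finish it within a
// target time, clamped to the allowed range.
public class AutoscalerSketch {
    // remainingWork: current estimate of work units left
    // perWorkerThroughput: work units per second one worker completes
    // targetSeconds: how soon we'd like the remainder to finish
    static int desiredWorkers(double remainingWork,
                              double perWorkerThroughput,
                              double targetSeconds,
                              int minWorkers, int maxWorkers) {
        int needed = (int) Math.ceil(
            remainingWork / (perWorkerThroughput * targetSeconds));
        return Math.max(minWorkers, Math.min(maxWorkers, needed));
    }
}
```

Note that this only decides how many workers to run; as the next slides argue, the extra workers are useless unless dynamic rebalancing can also produce enough pieces of work for them.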

SLIDE 50

Start off with 3 workers; things are looking okay: ~10m of estimated work. Re-estimation ⇒ orders of magnitude more work (~3 days): need 100 workers! But 100 workers are useless without 100 pieces of work; without rebalancing, 92 of them sit idle.

SLIDE 51

Autoscaling + dynamic rebalancing

Now scaling up is no big deal: add workers and the work distributes itself. A job smoothly scales 3 → 1000 workers.

(Chart: upscaling & VM startup, followed by waves of splitting)

SLIDE 52

05 If you remember two things

Philosophy of everything above

SLIDE 53

If you remember two things

1. Adaptive > Predictive: fighting stragglers > preventing stragglers; emergent behavior > local precision.
2. “No compromise” solutions matter: reducing dimension > incremental improvement; “corner cases” are clues that you’re still compromising.

wait-free, perfectly granular, separation, heavy records, reading-as-ParDo, rebalancing, autoscaling, reusability

SLIDE 54

Thank you Q&A

SLIDE 55

References

Apache Beam
No shard left behind: Dynamic work rebalancing in Cloud Dataflow
Comparing Cloud Dataflow Autoscaling to Spark and Hadoop
Splittable DoFn
Documentation on Dataflow/Beam source APIs