Keeping Master Green at Scale



slide-1
SLIDE 1

Keeping Master Green at Scale

Sundaram Ananthanarayanan, Masoud Saeida Ardekani, Denis Haenikel, Balaji Varadarajan, Simon Soriano, Dhaval Patel, Ali-Reza Adl-Tabatabai (https://eng.uber.com/research/keeping-master-green-at-scale/)

slide-2
SLIDE 2
  • Single, shared repo hosting a company’s software assets

Advantages of a Monorepo [Ciera et al. @ICSE’18]

✔ Simplified Dependency Management

✔ Improved Code Visibility

Monorepo is popular!

Multirepo Monorepo!

slide-3
SLIDE 3
  • Monorepos handle a huge volume of commits every day
  • Existing CI workflows do not guarantee an always green master

‐ Too hard at scale

  • Submit Queue guarantees an always-green master at scale

Always green master considered hard

slide-4
SLIDE 4

Outline

01 Why green master is hard
02 Probabilistic Speculation
03 Conflict Analyzer
04 Evaluation

slide-5
SLIDE 5

Lifecycle of a change in monorepo

[Figure: lifecycle diagram with labels Developer, Change/Revision, Peer Review, CI Server (Build, Test, Result), and Monorepo.]

slide-6
SLIDE 6

Challenge: Concurrent conflicting changes

master C1

Alice

C2

Bob

slide-7
SLIDE 7

Challenge: Concurrent conflicting changes

[Figure: C1 (Alice) and C2 (Bob) both land on master; with C1 and C2 together, build steps fail.]

slide-8
SLIDE 8

Example of a real conflict

slide-9
SLIDE 9

How often do conflicts happen?

slide-10
SLIDE 10

How often do conflicts happen?

Observation: The chance of a conflict rises from 5% to 40% as the number of concurrent, potentially conflicting changes grows.

slide-11
SLIDE 11

Drawbacks of a red master

  • Delayed rollouts
  • Hampered productivity
  • Complex rollbacks

slide-12
SLIDE 12

Keeping master green: Queue

Alice, Bob, and Carol enqueue the changes they want to commit to master.

Alice Bob

H C3 C2 C1

Carol

slide-13
SLIDE 13

Keeping master green: Queue

master

Alice Bob

H C3 C2 C1

Carol

C1 is built and tested against mainline head (H).

slide-14
SLIDE 14

Keeping master green: Queue

Build steps for H ⊕ C1 succeed.

master

Alice Bob

H C3 C2 C1

Carol

slide-15
SLIDE 15

Keeping master green: Queue

C1 is committed and becomes the new head. C2 is tested against it.

master

Alice Bob

C3 C2

Carol

H

slide-16
SLIDE 16

Keeping master green: Queue

Build steps for H ⊕ C2 fail, and C2 is rejected.

master

Alice Bob

H C3 C2

Carol

H

slide-17
SLIDE 17

Keeping master green: Queue

✔ Guarantees an always green master by serializing changes

✖ Does not scale to 1000s of changes/day

master

Alice Bob

H C3

Carol

H
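
The queue walkthrough on the preceding slides can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: `process_queue`, `run_build_steps`, and `toy_build` are hypothetical names, and the toy build rule (C2 fails once C1 is on master) is an assumption chosen to mirror the slides.

```python
from typing import Callable, List, Tuple


def process_queue(
    head: List[str],
    pending: List[str],
    run_build_steps: Callable[[List[str], str], bool],
) -> Tuple[List[str], List[str]]:
    """Serially test each pending change against the current head.

    `head` lists the changes already on master; `run_build_steps` is a
    hypothetical stand-in for running build steps on head + change.
    """
    master = list(head)
    rejected: List[str] = []
    for change in pending:
        if run_build_steps(master, change):
            master.append(change)    # change commits and becomes the new head
        else:
            rejected.append(change)  # master stays green; change is rejected
    return master, rejected


# Toy build rule (assumption): C2 fails once C1 is on master.
def toy_build(master: List[str], change: str) -> bool:
    return not (change == "C2" and "C1" in master)


master, rejected = process_queue(["H"], ["C1", "C2", "C3"], toy_build)
print(master, rejected)
```

Because every change waits for the builds of all changes ahead of it, turnaround time grows with the queue length, which is the scaling problem the slide points out.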

slide-18
SLIDE 18

Keeping master green: Batching changes

C1 and C2 are batched and build steps are run.

master

Alice Bob

H C3 C2 C1

Carol

slide-19
SLIDE 19


Keeping master green: Batching changes

master

Alice Bob

H C3 C2 C1

Carol

✔ Improves the throughput if batches succeed more often than not

✖ Testing batches masks intermediate changes that fail

✖ Batches will fail often as the size of the batch increases

What happens when batches fail?
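
To see why larger batches fail more often, assume (a simplification not stated on the slide) that each change passes independently with the same probability p. A batch of k changes then passes only if all k pass. The function name below is hypothetical.

```python
def batch_success_probability(p: float, k: int) -> float:
    """Probability that a batch of k changes passes, assuming each change
    passes independently with probability p (illustrative assumption)."""
    return p ** k


for k in (1, 5, 10, 20):
    print(k, round(batch_success_probability(0.95, k), 3))
```

Under this assumption, even with a 95% per-change pass rate, a batch of 20 passes only about 36% of the time.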

slide-20
SLIDE 20

Challenge: how to do this at scale? (1000s of commits/day)

Keeping master green: Goals

Guarantee serializability

  • Illusion of a single queue when committing changes
  • Git only offers serializability of patches

Provide reasonable SLAs

  • Overheads should be small enough for developers to trade speed for correctness!

slide-21
SLIDE 21

Submit Queue: Overview

Speculation Engine

  • Speculates on success/failure of changes
  • Builds speculation graph

Conflict Analyzer

  • Determines independent changes
  • Constructs conflict graph

Planner Engine

  • Selects most valuable builds from speculation engine
  • Executes builds and commits changes

slide-22
SLIDE 22

Speculation Tree

C3 C2 C1

C1, C2, C3: pending changes

slide-23
SLIDE 23

Speculation Tree

B1 C3 C2 C1

B1: Build Steps for H ⨁ C1

slide-24
SLIDE 24

Speculation Tree

B1: Build Steps for H ⨁ C1
B1.2: Build C2 against (H ⨁ C1)
B2: Build Steps for H ⨁ C2

B1 fails → C1 rejected
B1 succeeds → C1 commits

1. Precompute the outcome of committing C2 under different realities
2. Commit or reject C2 based on the outcome of B1 and one of {B2, B1.2}
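
The decision rule on this slide can be sketched as follows. This is an illustrative sketch only; `decide_c2` and the dictionary layout are hypothetical, and the build names follow the slide.

```python
from typing import Dict


def decide_c2(build_results: Dict[str, bool]) -> Dict[str, bool]:
    """Return commit decisions for C1 and C2 from precomputed builds.

    build_results maps build names to pass/fail:
      "B1"   : build steps for H + C1
      "B1.2" : build C2 against H + C1
      "B2"   : build steps for H + C2
    """
    c1_commits = build_results["B1"]
    # Use the speculative build that matches the reality B1 produced.
    relevant = "B1.2" if c1_commits else "B2"
    return {"C1": c1_commits, "C2": build_results[relevant]}


print(decide_c2({"B1": True, "B1.2": False, "B2": True}))
```

Only one of B1.2 and B2 is ever consulted, so the other build's work is discarded; this is the resource cost of speculation.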

slide-25
SLIDE 25

Speculation Tree

Challenge: Which builds to run?

Builds: B1, B1.2, B2, B1.2.3, B3, B2.3, B1.3

  • B1 fails → C1 rejected; B1 succeeds → C1 commits
  • B1.2 fails → C2 rejected; B1.2 succeeds → C2 commits
  • B2 fails → C2 rejected; B2 succeeds → C2 commits

C3 C2 C1

slide-26
SLIDE 26

Approach #1: Speculate Them All

Speculate on all possible outcomes equally

  • Selects builds in a breadth-first order

B1

B12

B123 B2 B3 B23 B13 C3 C2 C1

Does not scale for 1000s of changes/day

  • Need to run 2^n builds in parallel to commit n changes

Leads to substantial waste of resources
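
The exponential growth can be checked by enumerating the full speculation tree. In this sketch (function name hypothetical), each change is built against every subset of the changes ahead of it, which yields 2^n − 1 builds for n changes: 7 for the three-change tree shown on the slides.

```python
from itertools import combinations
from typing import List


def all_speculation_builds(n: int) -> List[str]:
    """List every build in the full speculation tree for changes C1..Cn.

    Build "B1.3" means: build C3 assuming C1 committed and C2 was rejected.
    """
    builds = []
    for target in range(1, n + 1):
        earlier = range(1, target)
        for size in range(len(earlier) + 1):
            for assumed in combinations(earlier, size):
                builds.append("B" + ".".join(map(str, assumed + (target,))))
    return builds


builds = all_speculation_builds(3)
print(len(builds), builds)
```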

slide-27
SLIDE 27

Speculate Them All: Resource Wastage

B1

B12

B123 B2 B3 B23 B13 C3 C2 C1

slide-28
SLIDE 28

Speculate Them All: Observation

If we select and execute only the builds whose outcomes are most likely to be needed, then we require only n (out of 2^n) builds.

Challenge: Which n builds are likely to be needed?
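
To illustrate why n builds suffice in hindsight, the sketch below walks the tree along the path that actually happens. It assumes build outcomes are known (which is exactly what a real system cannot know in advance); `needed_builds` and `build_passes` are hypothetical names.

```python
from typing import Callable, List


def needed_builds(n: int, build_passes: Callable[[List[int], int], bool]) -> List[str]:
    """Return the one build per change whose result is actually used.

    `build_passes(committed, change)` is a hypothetical oracle telling
    whether `change` passes on top of the already committed changes.
    """
    committed: List[int] = []
    needed = []
    for change in range(1, n + 1):
        needed.append("B" + ".".join(map(str, committed + [change])))
        if build_passes(committed, change):
            committed.append(change)
    return needed


# Toy outcome (assumption): C2 always fails, C1 and C3 always pass.
print(needed_builds(3, lambda committed, change: change != 2))
```

With these toy outcomes, only B1, B1.2, and B1.3 are consulted; the other four builds in the tree are never needed.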

slide-29
SLIDE 29

Probabilistic Speculation

The value attached to each build B_C represents the probability that the result of B_C is used to commit or reject C.

[Figure: speculation tree with builds B1, B1.2, B1.2.3, B2, B3, B2.3, B1.3]

slide-30
SLIDE 30

Probabilistic Speculation

The root B1 is always needed, as it is used to determine whether C1 can be committed.

[Figure: speculation tree with builds B1, B1.2, B1.2.3, B2, B3, B2.3, B1.3]

slide-31
SLIDE 31

Probabilistic Speculation

represents the prob. that change C1 succeeds individually B1 B1.2 B1.2.3 B2 B3 B2.3 B1.3

slide-32
SLIDE 32

Probabilistic Speculation

represents the prob. that change C1 succeeds individually B1 B1.2 B1.2.3 B2 B3 B2.3 B1.3

slide-33
SLIDE 33

Probabilistic Speculation

represents the prob. that C2 conflicts with C1 B1 B1.2 B1.2.3 B2 B3 B2.3 B1.3

B1.2 : Build C2 against (H ⨁ C1)

slide-34
SLIDE 34

Probabilistic Speculation

B1 B1.2 B1.2.3 B2 B3 B2.3 B1.3

slide-35
SLIDE 35

Probabilistic Speculation: Summary

Choose most valuable builds by determining

  • Probability of success of a change
  • Probability of a conflict bet. changes

B1

B12

B123 B2 B3 B23 B13
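
A simplified version of this selection can be sketched as below. It is not the authors' model: it assumes each change commits with a fixed probability regardless of which earlier changes committed, so conflict probabilities are ignored. All function names and the example probabilities are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def needed_probabilities(p_success: List[float]) -> Dict[str, float]:
    """P(needed) for every build in the speculation tree.

    Simplifying assumption (not from the slides): change i commits with
    probability p_success[i - 1] regardless of which earlier changes
    committed, i.e. conflicts between changes are ignored.
    """
    result = {}
    for target in range(1, len(p_success) + 1):
        earlier = list(range(1, target))
        for size in range(len(earlier) + 1):
            for assumed in combinations(earlier, size):
                prob = 1.0
                for j in earlier:
                    p = p_success[j - 1]
                    prob *= p if j in assumed else 1.0 - p
                name = "B" + ".".join(map(str, assumed + (target,)))
                result[name] = prob
    return result


def most_valuable(p_success: List[float], budget: int) -> List[Tuple[str, float]]:
    """Pick the `budget` builds with the highest P(needed)."""
    probs = needed_probabilities(p_success)
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:budget]


for name, prob in most_valuable([0.9, 0.8, 0.95], budget=3):
    print(name, round(prob, 3))
```

In this model the root B1 always has probability 1, and B1.2 is needed exactly when C1 succeeds, which matches the labels described on the preceding slides.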

slide-36
SLIDE 36
  • Logistic regression to train prediction models

○ Feature set includes 100+ hand-picked features

○ Prediction accuracy of 97%

Evaluating and

Speculation

  • dynamic features to re-adjust weights based on initial predictions

  • # speculations succeeded
  • # speculations failed

Change

  • # affected targets
  • # git commits
  • # files changed
  • status of pre-submit checks

Developer

  • developer name
  • employment proficiencies
slide-37
SLIDE 37

Speculation

  • dynamic features to re-adjust weights based on initial predictions

  • # speculations succeeded
  • # speculations failed

Revision

  • revision is a container for changes
  • # changes submitted
  • revert and test plans
  • # Submit attempts made

Change

  • # affected targets
  • # git commits
  • # files changed
  • status of pre-submit checks

Developer

  • developer name
  • employment proficiencies

Features for Training ML Models
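
As a rough illustration of how a logistic regression model could turn such features into a success probability, here is a tiny pure-Python sketch. The two features, the training rows, and all function names are made up for illustration; the slides describe a model with 100+ features, which this does not reproduce.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    learning_rate: float = 0.1,
    epochs: int = 2000,
) -> List[float]:
    """Fit weights (bias first) by batch gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            features = [1.0, *x]
            error = sigmoid(sum(w * f for w, f in zip(weights, features))) - y
            for i, f in enumerate(features):
                grad[i] += error * f
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict_success(weights: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability that a change with features x succeeds."""
    return sigmoid(sum(w * f for w, f in zip(weights, [1.0, *x])))


# Made-up rows: (files changed / 10, pre-submit checks passed); label 1 = change succeeded.
rows = [(0.1, 1), (0.2, 1), (0.3, 1), (2.0, 0), (3.0, 0), (2.5, 1), (0.2, 0), (4.0, 0)]
labels = [1, 1, 1, 0, 0, 0, 1, 0]
weights = train_logistic_regression(rows, labels)
print(predict_success(weights, (0.1, 1)), predict_success(weights, (3.5, 0)))
```

The predicted probability is the kind of per-change success estimate that probabilistic speculation needs as input.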

slide-38
SLIDE 38

Conflict Analyzer

  • So far, we assumed all changes potentially conflict with each other

○ Cannot commit in parallel

  • What if changes can be proved to be independent?

○ Commit changes in parallel

○ Trim speculation space

  • We use Conflict Analyzer to find independent changes
slide-39
SLIDE 39

[Figure: conflict graph for changes C1, C2, C3, where C1 and C2 are independent of each other and each conflicts with C3.]

Conflict Analyzer: Commit Changes in Parallel

slide-40
SLIDE 40

Insight: Changes C1 and C2 can be committed in parallel.

[Figure: conflict graph over C1, C2, C3 and speculation tree with builds B1, B2, B3, B1.3, B2.3, B1.2.3; branches labeled B1 fails / B1 succeeds and B2 fails / B2 succeeds.]

Conflict Analyzer: Commit Changes in Parallel

slide-41
SLIDE 41

[Figure: conflict graph for C1, C2, C3, where C1 conflicts with both C2 and C3, and C2 and C3 are independent of each other.]

Conflict Analyzer: Trim Speculation Space

slide-42
SLIDE 42

Insight: Because C3 does not speculate on C2, the number of possible builds for C3 reduces to 2.

[Figure: conflict graph over C1, C2, C3 and speculation tree with builds B1, B2, B1.2, B3, B1.3; branches labeled B1 fails / B1 succeeds.]

Conflict Analyzer: Trim Speculation Space
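
The trimming effect can be sketched by letting each change speculate only on the earlier changes it conflicts with. The function name and the adjacency-set encoding of the conflict graph are hypothetical; the two graphs are the ones described on slides 39 and 41.

```python
from itertools import combinations
from typing import Dict, List, Set


def speculation_builds(order: List[str], conflicts: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each change, list the builds it may need.

    A change only speculates on earlier changes it conflicts with.
    """
    plan = {}
    for index, change in enumerate(order):
        earlier = [c for c in order[:index] if c in conflicts.get(change, set())]
        builds = []
        for size in range(len(earlier) + 1):
            for assumed in combinations(earlier, size):
                builds.append(" + ".join(("H",) + assumed + (change,)))
        plan[change] = builds
    return plan


# Slide 41's graph: C1 conflicts with C2 and C3; C2 and C3 are independent.
trimmed = speculation_builds(
    ["C1", "C2", "C3"], {"C1": {"C2", "C3"}, "C2": {"C1"}, "C3": {"C1"}}
)
# Slide 39's graph: C1 and C2 are independent; both conflict with C3.
parallel = speculation_builds(
    ["C1", "C2", "C3"], {"C1": {"C3"}, "C2": {"C3"}, "C3": {"C1", "C2"}}
)
print(trimmed["C3"])
print(parallel["C1"], parallel["C2"])
```

In the first graph C3 needs only two possible builds instead of four; in the second, C1 and C2 each need a single build against H and can therefore be committed in parallel.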

slide-43
SLIDE 43
  • Use the build system to detect whether changes are independent
  • Code is partitioned into smaller entities called targets
  • Every change affects a set of targets

Conflict Analyzer: Detecting conflicts at scale

[Figure: example build graph with nodes main.exe, util.o, main.o, util.c, util.h, main.c, and targets T1, T2, T3.]

slide-44
SLIDE 44

Two changes are independent if they affect a disjoint set of targets.

Detecting Conflicts: Intuition
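
The intuition can be sketched with a small dependency map. The graph below is hypothetical (it is not the example graph from the slides), as are the function names; a change's affected targets are taken to be every target that transitively depends on a file it touches.

```python
from typing import Dict, Set

# Hypothetical build graph: target -> direct inputs (files or other targets).
DEPS: Dict[str, Set[str]] = {
    "lib_a": {"a.c", "common.h"},
    "lib_b": {"b.c", "common.h"},
    "app": {"lib_a"},
    "tool": {"lib_b"},
}


def affected_targets(changed_files: Set[str]) -> Set[str]:
    """Targets that transitively depend on any changed file."""
    affected: Set[str] = set()
    grew = True
    while grew:
        grew = False
        for target, inputs in DEPS.items():
            if target not in affected and inputs & (changed_files | affected):
                affected.add(target)
                grew = True
    return affected


def independent(files_a: Set[str], files_b: Set[str]) -> bool:
    """Two changes are independent if their affected target sets are disjoint."""
    return affected_targets(files_a).isdisjoint(affected_targets(files_b))


print(independent({"a.c"}, {"b.c"}))
print(independent({"a.c"}, {"common.h"}))
```

A change to `a.c` and a change to `b.c` touch disjoint targets and are independent, while a change to the shared header affects every target and conflicts with both.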

slide-45
SLIDE 45

[Figure: original build graph for H with targets X, Y, Z, and the build graphs for H ⊕ C1 and H ⊕ C2 obtained by applying C1 and C2.]

Detecting Conflicts: Example

slide-46
SLIDE 46

[Figure: original build graph for H with targets X, Y, Z, and the build graphs for H ⊕ C1 and H ⊕ C2 obtained by applying C1 and C2.]

  • C1 and C2 are conflicting
  • But, the intersection of affected targets is empty!

Detecting Conflicts: Puzzle

slide-47
SLIDE 47

Value of each target in each build graph:

Build graph      Target X  Target Y  Target Z  Affected targets
H                1         2         3         (original)
H ⊕ C1           4         5         3         {(x, 4), (y, 5)}
H ⊕ C2           1         2         6         {(z, 6)}
H ⊕ C1 ⊕ C2      4         5         7         {(x, 4), (y, 5), (z, 7)}

{(x, 4), (y, 5)} ∪ {(z, 6)} ≠ {(x, 4), (y, 5), (z, 7)}

Thus, C1 and C2 are conflicting!

Detecting Conflicts: Composition
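
The composition check on this slide can be written out directly. In this sketch a build graph is reduced to a map from target name to the value shown on the slide; the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, int]  # target name -> value shown in the build graph


def affected(base: Graph, changed: Graph) -> Set[Tuple[str, int]]:
    """(target, new value) pairs that differ from the base graph."""
    return {(t, v) for t, v in changed.items() if base.get(t) != v}


def conflicting(base: Graph, with_c1: Graph, with_c2: Graph, with_both: Graph) -> bool:
    """Changes conflict if applying both affects targets differently than
    the union of applying each one alone."""
    return affected(base, with_c1) | affected(base, with_c2) != affected(base, with_both)


h = {"x": 1, "y": 2, "z": 3}
h_c1 = {"x": 4, "y": 5, "z": 3}
h_c2 = {"x": 1, "y": 2, "z": 6}
h_c1_c2 = {"x": 4, "y": 5, "z": 7}
print(conflicting(h, h_c1, h_c2, h_c1_c2))
```

Here target Z ends up with value 7 when both changes are applied, which neither change produces alone, so the union differs and the changes conflict even though their individually affected targets do not overlap.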

slide-48
SLIDE 48
  • Intersection Approach

✖ Does not detect all kinds of conflicts

  • Union Approach

✖ Determining conflicts for n changes requires n^2 build graphs!

  • Hybrid Approach

✔ Only 7.9% of changes cause a change to the build graph

  • Union Graph Approach (details in paper)

Detecting Conflicts: Summary

slide-49
SLIDE 49

Submit Queue: Architecture Overview

[Figure: architecture diagram. Components: API Service, Web UI, CLI, Monorepo, Build Controller, Core Service, Planner Engine, Speculation Engine (Speculation Graph), Conflict Analyzer (Conflict Graph). Arrow labels: Submit Change, Select builds, Determine conflicts, Schedule builds, Commit change’s patch, Push Changes.]

slide-50
SLIDE 50

Evaluation

Questions

  • How does Submit Queue performance compare against other strategies?

○ Queue, Speculate-all, Optimistic

  • What is the impact of the conflict analyzer?

Setup

  • Implemented an Oracle that always predicts the outcome of a change correctly

○ All results normalized against the Oracle

  • Ingested real changes into our system at different rates
slide-51
SLIDE 51

Evaluation: Submit Queue Performance

slide-52
SLIDE 52

Speculate-all suffers up to 15x slowdown compared to the Oracle.

Evaluation: Submit Queue Performance (P50)

Speculate-all

slide-53
SLIDE 53

Evaluation: Submit Queue Performance (P50)

Speculate-all Optimistic speculation

Optimistic speculation performs better than speculate-all, especially under contention.

slide-54
SLIDE 54

Speculate-all

Submit Queue has the best performance among all the approaches.

Optimistic speculation Submit Queue

Evaluation: Submit Queue Performance (P50)

slide-55
SLIDE 55

Evaluation: Submit Queue Performance (P50)

Submit Queue

Performance matches Oracle’s performance under low contention

slide-56
SLIDE 56

Evaluation: Submit Queue Performance (P99)

Submit Queue

P99 turnaround time is only 4x worse under extreme contention. We typically do not operate in that regime in production.

slide-57
SLIDE 57
  • Oracle’s turnaround time improves by up to 50% with conflict analyzer.
  • Benefit for SQ and Speculate-all steadily converges towards Oracle.

Evaluation: Impact of Conflict Analyzer

P95 Turnaround Time Improvement for 500 changes/hour

slide-58
SLIDE 58

Submit Queue guarantees always-green master

  • Probabilistic speculation powered by logistic regression to select builds that are likely to succeed, and execute them in parallel

  • Conflict analyzer to commit independent changes in parallel, and trim the speculation space.

  • Evaluated Submit Queue in production deployment
slide-59
SLIDE 59

Thank you!