Parity Models Erasure-Coded Resilience for Prediction Serving - PowerPoint PPT Presentation

Parity Models: Erasure-Coded Resilience for Prediction Serving Systems
Jack Kosaian, Rashmi Vinayak, Shivaram Venkataraman

Machine learning lifecycle
• Training: get a model to reach desired accuracy
  - "Batch" jobs
  - Hours to weeks
• Inference: deploy model in target domain
  - Online
  - Milliseconds

Machine learning inference
[Figure: queries go into a model, which returns predictions as class probabilities, e.g. cat 0.8, dog 0.05, bird 0.15]

Prediction serving systems
Inference in datacenter/cluster settings
[Figure: examples grouped as Open Source and Cloud Services]

Prediction serving system architectures
[Figure: queries arrive at a frontend, which dispatches them to model instances and returns predictions]

Machine learning inference
[Figure: example applications: question-answering, translation, ranking]
Must operate with low, predictable latency

Unavailability in serving systems
• Slowdowns and failures (unavailability)
  - Resource contention
  - Hardware failures
  - Runtime slowdowns
  - ML-specific events
• Result in inflated tail latency
  - Cause prediction serving systems to miss SLOs
Must alleviate slowdowns and failures

Redundancy-based resilience
• Proactive: send each query to 2+ servers
• Reactive: wait for a timeout before duplicating query
[Figure: recovery delay (lower is better) vs. resource overhead (lower is better), with points for Reactive and Proactive]

Erasure codes: proactive, resource-efficient
• Encoding: k data units produce r "parity" units
• Decoding: any k out of (k + r) units recover the original k data units
• Relation to (n, k) notation: n = k + r
• Example with data units D_1, D_2 and parity P: encode P = D_1 + D_2; if D_2 is unavailable, decode D_2 = P - D_1
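The addition/subtraction example above can be sketched in a few lines. This is a minimal illustration assuming NumPy arrays as data units; the function names `encode` and `decode` are hypothetical and not taken from the talk.

```python
import numpy as np

def encode(data_units):
    """Return the parity unit: the element-wise sum of the k data units."""
    return np.sum(data_units, axis=0)

def decode(parity, available_units):
    """Recover the single missing data unit by subtracting the available ones."""
    return parity - np.sum(available_units, axis=0)

d1 = np.array([1.0, 2.0, 3.0])
d2 = np.array([4.0, 5.0, 6.0])

p = encode([d1, d2])         # P = D_1 + D_2
recovered = decode(p, [d1])  # D_2 = P - D_1, used when D_2 is unavailable
```

With r = 1, this code tolerates the loss of any one of the k + 1 units.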

Erasure codes: proactive, resource-efficient
[Figure: recovery delay (lower is better) vs. resource overhead (lower is better), with points for Reactive, Proactive, and erasure codes; labels: Storage, Communication, Prediction Serving Systems]

Coded-computation
Our goal: Using erasure codes to reduce tail latency in prediction serving
Goal: preserve results of computation over queries
[Figure: queries X_1 and X_2 are sent to copies of model F, which return predictions F(X_1) and F(X_2)]

Coded-computation
• Encode queries: X_1 and X_2 are encoded into a "parity query" P, which is sent to another copy of F
• Decode results of inference over queries: if F(X_2) is unavailable, decode it from F(X_1) and F(P)

Traditional coding vs. coded-computation
• Codes for storage: encode D_1, D_2 into P; decode D_2 from D_1 and P
• Coded-computation: encode X_1, X_2 into P; decode F(X_2) from F(X_1) and F(P)
Need to recover computation over inputs

Challenge: Non-linear computation
Both cases encode P = X_1 + X_2 and decode with F(X_2) = F(P) - F(X_1).
• Linear computation, example F(X) = 2X:
  F(P) - F(X_1) = 2(X_1 + X_2) - 2X_1 = 2X_2, which equals F(X_2)
• Non-linear computation, example F(X) = X^2:
  F(P) - F(X_1) = (X_1 + X_2)^2 - X_1^2 = X_2^2 + 2 X_1 X_2, but the actual F(X_2) is X_2^2
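A quick numeric check makes the gap concrete. The sketch below uses the two example functions from the slide with arbitrarily chosen inputs X_1 = 3 and X_2 = 5; the helper name `check` is hypothetical.

```python
def check(f, x1, x2):
    """Return (decoded, actual) values of F(X_2) using the parity P = X_1 + X_2."""
    p = x1 + x2
    decoded = f(p) - f(x1)  # F(P) - F(X_1)
    return decoded, f(x2)

def linear(x):
    return 2 * x

def square(x):
    return x ** 2

lin_decoded, lin_actual = check(linear, 3, 5)  # (10, 10): decoding is exact
sq_decoded, sq_actual = check(square, 3, 5)    # (55, 25): off by 2 * X_1 * X_2 = 30
```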

Current approaches to coded-computation
• Lots of great work on linear computations
  - Huang 1984, Lee 2015, Dutta 2016, Dutta 2017, Mallick 2018, more…
• Recent work supports restricted nonlinear computations
  - Yu 2018
  - At least 2x resource overhead
Current approaches insufficient for neural networks in prediction serving systems

Our approach: Learning-based coded-computation
• Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation
  https://arxiv.org/abs/1806.01259
• Parity Models: Erasure-Coded Resilience for Prediction Serving Systems
  To appear in ACM SOSP 2019
  https://jackkosaian.github.io

Learning an erasure code?
Design encoder and decoder as neural networks
[Figure: X_1 and X_2 pass through a learned encoder; a learned decoder produces the reconstruction]
• Accurate
• Computationally expensive encoder/decoder
Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation
https://arxiv.org/abs/1806.01259

Learn computation over parities
Use simple, fast encoders and decoders
Learn computation over parities: "parity model"
• Encoder: P = X_1 + X_2
• Parity model (F_P): computes F_P(P)
• Decoder: F(X_2) = F_P(P) - F(X_1)
• Accurate
• Efficient encoder/decoder
Parity Models: Erasure-Coded Resilience for Prediction Serving Systems
To appear in ACM SOSP 2019
https://jackkosaian.github.io

Designing parity models
Parity model goal: transform parities such that decoder can reconstruct unavailable predictions
• Encoder: P = X_1 + X_2
• Decoder: F(X_2) = F_P(P) - F(X_1)
• Learn a parity model (F_P) such that F_P(P) = F(X_1) + F(X_2)
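Because a learned F_P only approximates F(X_1) + F(X_2), the decoded prediction is itself approximate. The sketch below assumes NumPy and uses invented numbers for the parity model's output; `reconstruct` is a hypothetical helper name, not an API from the talk.

```python
import numpy as np

def reconstruct(parity_output, available_predictions):
    """Decode one unavailable prediction: F_P(P) minus the available predictions."""
    return parity_output - np.sum(available_predictions, axis=0)

f_x1 = np.array([0.80, 0.15, 0.05])  # available prediction F(X_1)
f_x2 = np.array([0.20, 0.70, 0.10])  # prediction F(X_2), assumed slow or failed

# Invented parity-model output; the ideal value would be f_x1 + f_x2 = [1.0, 0.85, 0.15].
parity_out = np.array([0.95, 0.90, 0.15])

approx_f_x2 = reconstruct(parity_out, [f_x1])  # approximately [0.15, 0.75, 0.10]
```

In this made-up case the reconstruction differs from F(X_2) in its values but still ranks the same class highest, which is what matters for a classification result.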

Training a parity model
1. Sample k inputs and encode: P = X_1 + X_2
2. Perform inference with parity model: F_P(P)
3. Compute loss against desired output: F(X_1) + F(X_2)
4. Backpropagate loss
5. Repeat
[Figure: example prediction vectors (0.8, 0.15, 0.05) and (0.2, 0.7, 0.1)]
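The five steps above can be sketched as a small NumPy training loop. Everything concrete here is an assumption made for illustration: the base model F is a fixed random softmax classifier standing in for a deployed model, the parity model is a one-hidden-layer network, and the loss is mean squared error. The talk does not specify these choices.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, CLASSES, HIDDEN, K = 4, 3, 32, 2

# Hypothetical stand-in for the deployed model F: a fixed softmax classifier.
W_base = rng.normal(size=(DIM, CLASSES))

def base_model(x):
    z = x @ W_base
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Parity model F_P: one hidden tanh layer, trained from scratch.
params = {
    "W1": rng.normal(scale=0.5, size=(DIM, HIDDEN)),
    "b1": np.zeros(HIDDEN),
    "W2": rng.normal(scale=0.5, size=(HIDDEN, CLASSES)),
    "b2": np.zeros(CLASSES),
}

def parity_model(p):
    h = np.tanh(p @ params["W1"] + params["b1"])
    return h, h @ params["W2"] + params["b2"]

def sample_batch(n):
    """Step 1: sample K inputs per example and encode them by addition."""
    xs = rng.normal(size=(K, n, DIM))
    parity = xs.sum(axis=0)                  # P = X_1 + X_2
    target = sum(base_model(x) for x in xs)  # F(X_1) + F(X_2)
    return parity, target

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

val_p, val_t = sample_batch(512)
loss_before = mse(parity_model(val_p)[1], val_t)

lr = 0.05
for step in range(2000):                     # step 5: repeat
    p, t = sample_batch(64)
    h, out = parity_model(p)                 # step 2: inference with F_P
    grad_out = 2 * (out - t) / out.size      # step 3: gradient of the MSE loss
    grad_W2 = h.T @ grad_out                 # step 4: backpropagate
    grad_b2 = grad_out.sum(axis=0)
    grad_pre = (grad_out @ params["W2"].T) * (1 - h ** 2)
    grad_W1 = p.T @ grad_pre
    grad_b1 = grad_pre.sum(axis=0)
    for name, g in (("W1", grad_W1), ("b1", grad_b1),
                    ("W2", grad_W2), ("b2", grad_b2)):
        params[name] -= lr * g

loss_after = mse(parity_model(val_p)[1], val_t)
```

Note that the base model is only queried to produce training targets; its weights are never updated.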

Training a parity model: higher parameter k
1. Sample inputs and encode: P = X_1 + X_2 + X_3 + X_4
2. Perform inference with parity model: F_P(P)
3. Compute loss against desired output: F(X_1) + F(X_2) + F(X_3) + F(X_4)
4. Backpropagate loss
5. Repeat

Training a parity model: different encoder
1. Sample inputs and encode: P = [encoder output shown as a figure]
2. Perform inference with parity model: F_P(P)
3. Compute loss
4. Backpropagate loss
5. Repeat

Learning results in approximate reconstructions
Appropriate for machine learning inference
1. Predictions resulting from inference are approximations
2. Inaccuracy only at play when predictions otherwise slow/failed

Implementing parity models in Clipper
[Figure: queries enter the frontend, which contains an encoder and a decoder; model instances and a parity model serve them; the decoder handles a slow/failed prediction]

Design space in parity models framework
Encoder/decoder
• Many possibilities
• Generic: addition/subtraction (P = X_1 + X_2, F(X_2) = F_P(P) - F(X_1))
• Can specialize to task
Parity model architecture (F_P)
• Again, many possibilities
• Same as original model ⇒ same latency as original

Evaluation
1. How accurate are reconstructions using parity models?
2. How much can parity models help reduce tail latency?

Evaluation of Accuracy
Parity model only comes into play when predictions are slow/failed
• Addition/subtraction code
• k = 2, r = 1 (P = X_1 + X_2)
• 2x less overhead than replication

Evaluation of Overall Accuracy
Parity model only comes into play when predictions are slow/failed
• Addition/subtraction code
• k = 2, r = 1 (P = X_1 + X_2)
• 2x less overhead than replication
[Figure: overall accuracy plot annotated with 0.6% and 6.1%, with the expected operating regime marked]

Evaluation of Accuracy: Higher values of k
Tradeoff between resource-overhead, resilience, and accuracy
• Addition/subtraction code
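The resource side of this tradeoff is simple arithmetic. The sketch below measures overhead as extra instances per original instance, which is one reasonable definition assumed here; the function name is hypothetical.

```python
def redundancy_overhead(k, r):
    """Extra model instances as a fraction of the k original instances."""
    return r / k

replication = redundancy_overhead(1, 1)  # 1.0: one extra copy per instance
parity_k2 = redundancy_overhead(2, 1)    # 0.5: half the overhead of replication
parity_k4 = redundancy_overhead(4, 1)    # 0.25
```

Raising k lowers the overhead, but with r = 1 a single parity model must then cover k queries and only one unavailable prediction per group of k can be reconstructed.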

Evaluation of Accuracy: Object-localization
[Figure: localization results labeled Ground Truth, Available, and Parity Models]

Evaluation of Accuracy: Task-specific encoder
22% accuracy improvement over addition/subtraction at k = 4
[Figure: input images encoded into a single parity image; dimensions labeled 32]
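The slide does not spell out the encoder, so the following is only one plausible reading, stated as an assumption: each of the k = 4 input images is assumed to be 32x32, is downsampled to 16x16, and is placed in one quadrant of a 32x32 parity image. The function names and the 2x2 average pooling are illustrative choices, not details from the talk.

```python
import numpy as np

def downsample_2x(img):
    """Halve height and width by averaging each 2x2 block of pixels."""
    h, w = img.shape[:2]
    blocks = img.reshape(h // 2, 2, w // 2, 2, *img.shape[2:])
    return blocks.mean(axis=(1, 3))

def grid_encode(images):
    """Place four downsampled images into the quadrants of one parity image."""
    a, b, c, d = (downsample_2x(img) for img in images)
    top = np.concatenate([a, b], axis=1)
    bottom = np.concatenate([c, d], axis=1)
    return np.concatenate([top, bottom], axis=0)

# Four constant-valued 32x32 RGB images make the quadrant layout easy to see.
images = [np.full((32, 32, 3), float(i)) for i in range(4)]
parity = grid_encode(images)  # same shape as a single input image
```

Unlike addition, this kind of encoder keeps each input spatially separate in the parity image, at the cost of resolution.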

Evaluation of Tail Latency Reduction: Setup
• Implemented in Clipper prediction serving system
• Evaluate with 18-36 nodes on AWS with varying:
  - Inference hardware (GPUs, CPUs)
  - Query arrival rates
  - Batch sizes
  - Levels of load imbalance
  - Amounts of redundancy
  - Baseline approaches
• Baseline: approach with same number of resources as parity models
