Parity Models: Erasure-Coded Resilience for Prediction Serving Systems
Jack Kosaian Rashmi Vinayak Shivaram Venkataraman
Machine learning lifecycle

- Training: get a model to reach the desired accuracy. "Batch" jobs; hours to weeks.
- Inference: deploy the model in its target domain. Online; milliseconds per query.
Example: an image classifier maps a query to class probabilities, e.g. cat 0.15, dog 0.8, bird 0.05.
Prediction serving systems are available both as open-source frameworks and as cloud services.
A prediction serving system: a frontend receives queries, dispatches them to model instances, and returns the resulting predictions.
These systems serve workloads such as translation, question answering, and ranking.
Model instances can be slow or can fail. Approaches to resilience can be compared along two axes, recovery delay and resource overhead (lower is better on both). Reactive approaches have low resource overhead but high recovery delay; proactive approaches, such as replication, have low recovery delay but high resource overhead.
Erasure codes: encode data units into redundant parity units; when a unit becomes unavailable, decode it from the remaining units and the parities.
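As a toy illustration of erasure codes in the storage setting (a sketch only; plain addition is used as the code here, while real systems use codes such as Reed-Solomon):

```python
# Toy sketch: one parity unit protecting two data units.
d1, d2 = 7, 9
p = d1 + d2              # encoding: compute the parity unit
# Suppose d2 becomes unavailable; decode it from d1 and the parity:
recovered = p - d1
print(recovered)         # 9
```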
Erasure codes occupy an attractive point on both axes: lower recovery delay than reactive approaches, with lower resource overhead than proactive replication.
Our goal: use erasure codes to reduce tail latency in prediction serving. Queries X1 and X2 are dispatched to model instances running F, yielding predictions F(X1) and F(X2). We want to preserve the results of computation over queries even when an instance is slow or has failed.
Step 1: encode queries. The encoder combines X1 and X2 into a "parity query" P, which is dispatched to an additional model instance.
Step 2: decode the results of inference over queries. If F(X2) is slow or fails, the decoder reconstructs it from the available prediction F(X1) and the parity prediction F(P).
Unlike traditional erasure coding, which recovers stored data units (e.g. D2 from D1 and parity P), we need to recover the result of a computation over inputs: F(X2) from F(X1) and F(P).
Linear computation. Example: F(X) = 2X, with P = X1 + X2:
F(P) - F(X1) = 2(X1 + X2) - 2X1 = 2X2 = F(X2). Decoding is exact.

Non-linear computation. Example: F(X) = X^2, with P = X1 + X2:
F(P) - F(X1) = (X1 + X2)^2 - X1^2 = X2^2 + 2*X1*X2, but the actual F(X2) is X2^2. Decoding fails.
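The contrast between the linear and non-linear cases can be checked numerically; this is an illustrative sketch, not code from the talk:

```python
# Sketch: the subtraction decoder recovers F(X2) exactly for a linear F,
# but not for a non-linear F.
def decode(f, x1, x2):
    p = x1 + x2          # encoder: parity query by addition
    return f(p) - f(x1)  # decoder: F(P) - F(X1)

linear = lambda x: 2 * x      # F(X) = 2X
square = lambda x: x * x      # F(X) = X^2

x1, x2 = 3.0, 4.0
print(decode(linear, x1, x2), linear(x2))   # 8.0 8.0   (exact recovery)
print(decode(square, x1, x2), square(x2))   # 40.0 16.0 (off by 2*X1*X2 = 24)
```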
The subtraction decoder F(X2) = F(P) - F(X1) is exact only for linear F. For the non-linear computations performed by neural networks, F(X2) = ???: no simple closed-form decoder applies.
Two approaches:

- Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation (https://arxiv.org/abs/1806.01259)
- Parity Models: Erasure-Coded Resilience for Prediction Serving Systems (to appear in ACM SOSP 2019)

https://jackkosaian.github.io
Approach 1, Learning a Code: learn the encoder and decoder themselves as neural networks operating over the queries X1 and X2 and the available predictions. This yields accurate reconstructions, but the learned encoder and decoder are computationally expensive.
Approach 2, Parity Models: keep the encoder and decoder simple and fast (encode: P = X1 + X2; decode: F(X2) = FP(P) - F(X1)), and instead introduce a parity model FP that runs inference over parity queries. Efficient encoder/decoder.
The parity model's job is to transform parities such that the decoder can reconstruct unavailable predictions. With the addition encoder and subtraction decoder above, the desired property is FP(P) = F(X1) + F(X2). Rather than hand-designing FP, we learn a parity model that approximates this property.
Training a parity model: sample queries X1 and X2, encode P = X1 + X2, run FP(P), and compute a loss between FP(P) and the desired output F(X1) + F(X2); update FP via gradient descent. For example, an untrained parity model might output [0.3, 0.3, 0.4] when the desired output is [0.03, 0.02, 0.95]; over training iterations, FP(P) approaches the desired sums.
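A minimal sketch of this training loop, under stated assumptions: a frozen toy base model f and a single linear layer standing in for the parity model (the actual parity models are neural networks, typically with the same architecture as the base model):

```python
# Sketch of parity-model training: only FP is trained; the base model is frozen.
import numpy as np

rng = np.random.default_rng(0)
W_base = rng.normal(size=(4, 3))

def f(x):                        # frozen base model (toy stand-in)
    return np.tanh(x @ W_base)

W_p = np.zeros((4, 3))           # parity model parameters (assumed linear layer)
lr = 0.01
losses = []
for step in range(3000):
    x1, x2 = rng.normal(size=(2, 4))
    p = x1 + x2                          # encoder: addition
    target = f(x1) + f(x2)               # desired output F(X1) + F(X2)
    out = p @ W_p                        # parity model output FP(P)
    err = out - target
    losses.append(0.5 * float(err @ err))
    W_p -= lr * np.outer(p, err)         # SGD step on the squared error

# Decode an "unavailable" prediction with the trained parity model:
x1, x2 = rng.normal(size=(2, 4))
approx_f_x2 = (x1 + x2) @ W_p - f(x1)    # F(X2) ~= FP(P) - F(X1)
```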
The same training approach generalizes beyond pairs of queries, e.g. encoding four queries as P = X1 + X2 + X3 + X4 with desired output F(X1) + F(X2) + F(X3) + F(X4), and to encoders other than simple addition.
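A sketch of decoding at k = 4, using a toy linear model (the linear f below stands in for both the base model and an exact parity model; it is not the learned method itself). The point it illustrates: reconstructing one prediction needs the parity prediction plus the other k - 1 available predictions.

```python
# Toy sketch: one parity query over k = 4 queries.
f = lambda x: 3 * x                  # toy linear model standing in for F
xs = [1.0, 2.0, 3.0, 4.0]
p = sum(xs)                          # parity query: X1 + X2 + X3 + X4
# Suppose f(xs[2]) is unavailable:
others = [f(x) for i, x in enumerate(xs) if i != 2]
recovered = f(p) - sum(others)       # f(p) stands in for FP(P)
print(recovered)                     # 9.0
```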
Putting it together: the frontend houses the encoder and decoder, and a parity model is deployed alongside the original model instances. When a prediction is slow or has failed, the decoder reconstructs it from the parity model's output. The parity model serves parity queries at the same latency as the original model.
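One way a frontend might use this decode path is deadline-based reconstruction. This is an assumption-laden sketch, not the paper's implementation; the toy model here is linear so decoding is exact:

```python
# Sketch: issue the original queries plus a parity query; if a prediction
# misses its deadline, decode it instead of waiting for the straggler.
import concurrent.futures as cf
import time

def f(x):                    # toy linear model instance
    return 2 * x

def fp(p):                   # parity model: exact here because f is linear
    return 2 * p

def slow_f(x):               # a straggling model instance
    time.sleep(0.5)
    return f(x)

def serve(x1, x2, deadline=0.1):
    pool = cf.ThreadPoolExecutor(max_workers=3)
    fut1 = pool.submit(f, x1)          # instance serving X1
    fut2 = pool.submit(slow_f, x2)     # instance serving X2 (straggler)
    futp = pool.submit(fp, x1 + x2)    # parity model on the parity query
    y1 = fut1.result(timeout=deadline)
    try:
        y2 = fut2.result(timeout=deadline)
    except cf.TimeoutError:
        y2 = futp.result() - y1        # decode: FP(P) - F(X1)
    pool.shutdown(wait=False)
    return y1, y2

print(serve(3, 4))   # (6, 8): F(X2) reconstructed without waiting
```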
Evaluation, compared against replication:

- The parity model only comes into play when predictions are slow or failed.
- In the expected operating regime, parity models reduce tail latency relative to replication.
- [Plots: latency distributions for parity models vs. replication, with annotated values 6.1% and 0.6%.]
Tradeoff between resource overhead, resilience, and accuracy.
[Figure: example reconstructions, comparing ground truth, the originally available predictions, and parity-model reconstructions.]

[Plot: tail latency reduced by 40%, with the same median latency.]
Summary: erasure codes can impart resilience to prediction serving systems. Parity models avoid expensive learned encoders and decoders by using simple, fast ones (encode: P = X1 + X2; decode: F(X2) = FP(P) - F(X1)) together with a parity model FP learned so that FP(P) approximates F(X1) + F(X2).

https://jackkosaian.github.io