Willump: A Statistically-Aware End-to-end Optimizer for ML Inference
Peter Kraft, Daniel Kang, Deepak Narayanan, Shoumik Palkar, Peter Bailis, Matei Zaharia
Problem: ML Inference
Often performance-critical. Recent focus on tools for
Receive Raw Data → Compute Features → Predict With Model
Source: Pretzel (OSDI ‘18)
Feature computation takes >99% of the time in a production Microsoft sentiment analysis pipeline!
[Chart: feature computation time vs. model run time]
Artist             Score  Rank
Beatles            9.7    1
Bruce Springsteen  9.5    2
…                  …      …
Justin Bieber      5.6    999
Nickelback         4.1    1000
High-value: Rank precisely, return. Low-value: Approximate, discard.
Source: Cheng et al. (DLRS ‘16), Kang et al. (VLDB ‘17)
○ Easy to use.
○ Slow.
○ Fast.
○ Require a lot of work to implement.
Input Pipeline:

    def pipeline(x1, x2):
        input = lib.transform(x1, x2)
        preds = model.predict(input)
        return preds
Willump Optimization:
    Infer Transformation Graph
    Statistically-Aware Optimizations
    Compiler Optimizations (Weld; Palkar et al., VLDB ‘18)

Optimized Pipeline:

    def willump_pipeline(x1, x2):
        preds = compiled_code(x1, x2)
        return preds
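To make the "Infer Transformation Graph" step more concrete, here is a minimal sketch of recovering a dataflow graph from the input pipeline's source using Python's standard `ast` module. It is an illustration only, not Willump's implementation: the helper name `infer_graph` and the graph representation (variable name mapped to the called function and its inputs) are assumptions made for this example, and it only handles statements of the simple form `name = callee(args...)`.

```python
import ast  # ast.unparse requires Python 3.9+

# The input pipeline from the slide, kept as source text so it can be parsed
# without needing real `lib` or `model` objects.
PIPELINE_SOURCE = '''
def pipeline(x1, x2):
    input = lib.transform(x1, x2)
    preds = model.predict(input)
    return preds
'''


def infer_graph(source):
    """Map each assigned variable to (called function, input variable names).

    Hypothetical helper for illustration; handles only `name = callee(args...)`.
    """
    func = ast.parse(source).body[0]
    graph = {}
    for stmt in func.body:
        if (isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)
                and isinstance(stmt.value, ast.Call)):
            target = stmt.targets[0].id
            callee = ast.unparse(stmt.value.func)
            inputs = [a.id for a in stmt.value.args if isinstance(a, ast.Name)]
            graph[target] = (callee, inputs)
    return graph


graph = infer_graph(PIPELINE_SOURCE)
for node, (callee, inputs) in graph.items():
    print(f"{node} <- {callee}({', '.join(inputs)})")
```

Each entry records which call produces a variable and which variables feed it, which is the kind of structure an optimizer would need before it can reason about individual feature computations.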
Source: Viola-Jones (CVPR ‘01), Kang et al. (VLDB ‘17)
Original: Compute All Features → Model → Prediction
With approximation: Compute Selected Features → Approximate Model → Confidence > Threshold?
    Yes: Prediction (from the Approximate Model)
    No: Compute Remaining Features → Original Model → Prediction
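The control flow in the diagram above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Willump's code: `cascade_predict` and the four stand-in functions are hypothetical, the "models" are hard-coded rules rather than trained classifiers, and confidence is taken to be the larger of the two class probabilities.

```python
def cascade_predict(raw, compute_selected, compute_remaining,
                    approx_model, full_model, threshold):
    """Return (label, used_full_model) for one input.

    The approximate model sees only the selected features. If it is confident
    enough, its prediction is returned; otherwise the remaining features are
    computed and the original model decides.
    """
    selected = compute_selected(raw)
    prob = approx_model(selected)            # estimated P(label == 1)
    confidence = max(prob, 1.0 - prob)
    if confidence > threshold:
        return int(prob >= 0.5), False
    remaining = compute_remaining(raw)
    full_prob = full_model(selected, remaining)
    return int(full_prob >= 0.5), True


# Toy stand-ins (assumptions for illustration only).
def compute_selected(text):
    return {"has_insult": "stupid" in text.lower()}


def compute_remaining(text):
    return {"exclamations": text.count("!")}


def approx_model(selected):
    return 0.95 if selected["has_insult"] else 0.4


def full_model(selected, remaining):
    flagged = selected["has_insult"] or remaining["exclamations"] >= 3
    return 0.9 if flagged else 0.1


args = (compute_selected, compute_remaining, approx_model, full_model, 0.9)
easy = cascade_predict("You are stupid", *args)   # approximate model is confident
hard = cascade_predict("Have a nice day", *args)  # falls through to the original model
print(easy, hard)
```

The second element of each result shows whether the remaining features had to be computed, which is where the savings come from when most inputs take the "Yes" branch.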
Key question: Which features should be selected?
Yes: can approximate query. No: can’t approximate query.
Yes branch: Compute Selected Features (S) → Approximate Model → Prediction. P(Yes) = P(approx); cost = cost(S).
No branch: Compute Selected Features (S) → Compute Remaining Features → Original Model → Prediction. P(No) = P(~approx); cost = cost(F), where F is the full feature set.
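Combining the two branches gives an expected per-query cost, which is one way to compare candidate feature sets. The sketch below assumes the "Yes" branch costs cost(S) (the selected features) and the "No" branch costs cost(F) (all features, since the remaining ones are computed too); these are the two cost terms on the slide, written here with S and F. The candidate sets, their probabilities, and their costs are made-up numbers for illustration, and the procedure Willump actually uses to pick S is not shown here.

```python
def expected_cost(p_approx, cost_selected, cost_all):
    """Expected cost: P(approx) * cost(S) + P(~approx) * cost(F)."""
    if not 0.0 <= p_approx <= 1.0:
        raise ValueError("p_approx must be a probability")
    return p_approx * cost_selected + (1.0 - p_approx) * cost_all


# Hypothetical candidates: name -> (P(approx), cost(S)). Numbers are invented.
candidates = {
    "cheap_only": (0.50, 1.0),
    "cheap_plus_mid": (0.80, 3.0),
    "almost_all": (0.95, 9.0),
}
COST_ALL = 10.0  # cost(F), also invented

costs = {
    name: expected_cost(p, cost_s, COST_ALL)
    for name, (p, cost_s) in candidates.items()
}
best = min(costs, key=costs.get)
print(best, costs)
```

The example shows the tension behind the key question: a larger S lets the approximate model answer more queries, but every query pays cost(S) up front, so neither the cheapest nor the most complete candidate is necessarily best.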
Artist             Score  Rank
Beatles            9.7    1
Bruce Springsteen  9.5    2
…                  …      …
Justin Bieber      5.6    999
Nickelback         4.1    1000
High-value: Rank precisely, return. Low-value: Approximate, discard.
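The "approximate, discard / rank precisely, return" idea can be sketched as a two-pass top-K query. This is a minimal illustration, not Willump's algorithm: `top_k_with_filter` and its `keep_ratio` parameter are hypothetical, the approximate score is simply the truncated table score, and only the four artists named in the table are used.

```python
import heapq

# Scores taken from the table above (only the rows that are shown).
SCORES = {
    "Nickelback": 4.1,
    "Justin Bieber": 5.6,
    "Bruce Springsteen": 9.5,
    "Beatles": 9.7,
}

precise_calls = []  # records which items paid for the precise score


def approx_score(artist):
    # Cheap stand-in for an approximate model: the score truncated to an integer.
    return int(SCORES[artist])


def precise_score(artist):
    # Stand-in for the expensive original model.
    precise_calls.append(artist)
    return SCORES[artist]


def top_k_with_filter(items, k, approx, precise, keep_ratio=2):
    """Score everything approximately, then rank only the best candidates precisely."""
    n_keep = min(len(items), k * keep_ratio)
    candidates = heapq.nlargest(n_keep, items, key=approx)
    ranked = sorted(candidates, key=precise, reverse=True)
    return ranked[:k]


top = top_k_with_filter(list(SCORES), 1, approx_score, precise_score)
print(top, precise_calls)
```

Low-value items are discarded after the cheap pass and never reach `precise_score`; `keep_ratio` controls how many extra candidates are kept as a safety margin against errors in the approximate scores.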
Source: Cheng et al. (DLRS ‘16)
○ Music (music recommendation – queries remotely stored precomputed features)
○ Purchase (predict next purchase – tabular AutoML features)
○ Toxic (toxic comment detection – computes string features)
[Figure: benchmark chart; bar labels as extracted: 15x, 1.6x, 1x, 1x, 2.4x, 3.2x]
[Figure: benchmark chart; bar labels as extracted: 4.0x, 1x, 2.7x, 1x, 3.2x, 30x]