
SLIDE 1

Willump: A Statistically-Aware End-to-end Optimizer for ML Inference

Peter Kraft, Daniel Kang, Deepak Narayanan, Shoumik Palkar, Peter Bailis, Matei Zaharia


SLIDE 2

Problem: ML Inference

  • Often performance-critical.
  • Recent focus on tools for ML prediction serving.


SLIDE 3

A Common Bottleneck: Feature Computation

  • Many applications are bottlenecked by feature computation.
  • A pipeline of transformations computes numerical features from data for the model.

Receive Raw Data → Compute Features → Predict With Model

SLIDE 4

A Common Bottleneck: Feature Computation

  • Feature computation is the bottleneck when models are inexpensive—boosted trees, not DNNs.
  • Common on tabular/structured data!
SLIDE 5

A Common Bottleneck: Feature Computation

Source: Pretzel (OSDI ‘18)

[Chart: production Microsoft sentiment analysis pipeline. Feature computation takes >99% of the time; model run time is a sliver.]

SLIDE 6

Current State-of-the-art

  • Apply traditional serving optimizations, e.g. caching (Clipper), compiler optimizations (Pretzel).
  • Neglect unique statistical properties of ML apps.

SLIDE 7

Statistical Properties of ML

Amenability to approximation


SLIDE 8

Statistical Properties of ML

Amenability to approximation


Easy input: Definitely not a dog. Hard input: Maybe a dog?

SLIDE 9

Statistical Properties of ML

Amenability to approximation

Existing Systems: Use Expensive Model for Both

Easy input: Definitely not a dog. Hard input: Maybe a dog?

SLIDE 10

Statistical Properties of ML

Amenability to approximation

Statistically-Aware Systems: Use the cheap model on the bucket, the expensive model on the cat.

Easy input: Definitely not a dog. Hard input: Maybe a dog?

SLIDE 11

Statistical Properties of ML

  • Model is often part of a bigger app (e.g. top-K query)


SLIDE 12

Statistical Properties of ML

  • Model is often part of a bigger app (e.g. top-K query)

Artist              Score  Rank
Beatles             9.7    1
Bruce Springsteen   9.5    2
…                   …      …
Justin Bieber       5.6    999
Nickelback          4.1    1000

Problem: Return top 10 artists.

SLIDE 13

Statistical Properties of ML

  • Model is often part of a bigger app (e.g. top-K query)

Artist              Score  Rank
Beatles             9.7    1
Bruce Springsteen   9.5    2
…                   …      …
Justin Bieber       5.6    999
Nickelback          4.1    1000

Existing Systems: Use expensive model for everything!

SLIDE 14

Statistical Properties of ML

  • Model is often part of a bigger app (e.g. top-K query)

Artist              Score  Rank
Beatles             9.7    1
Bruce Springsteen   9.5    2
…                   …      …
Justin Bieber       5.6    999
Nickelback          4.1    1000

Statistically-aware Systems:
High-value: Rank precisely, return. Low-value: Approximate, discard.

SLIDE 15

Prior Work: Statistically-Aware Optimizations

  • Statistically-aware optimizations exist in literature.
  • Always application-specific and custom-built.
  • Never automatic!

Source: Cheng et al. (DLRS ’16), Kang et al. (VLDB ’17)

SLIDE 16

ML Inference Dilemma

  • ML inference systems: easy to use, but slow.
  • Statistically-aware systems: fast, but require a lot of work to implement.

SLIDE 17

Can an ML inference system be fast and easy to use?

SLIDE 18

Willump: Overview

  • Statistically-aware optimizer for ML inference.
  • Targets feature computation!
  • Automatic, model-agnostic, statistically-aware optimizations.
  • 10x throughput and latency improvements.

SLIDE 19

Outline

  • System Overview
  • Optimization 1: End-to-end Cascades
  • Optimization 2: Top-K Query Approximation
  • Evaluation
SLIDE 20

Willump: Goals

  • Automatically maximize performance of ML inference applications whose performance bottleneck is feature computation.

SLIDE 21

System Overview

Input Pipeline:

def pipeline(x1, x2):
    input = lib.transform(x1, x2)
    preds = model.predict(input)
    return preds

SLIDE 22

System Overview

Input Pipeline:

def pipeline(x1, x2):
    input = lib.transform(x1, x2)
    preds = model.predict(input)
    return preds

Willump Optimization:
  Infer Transformation Graph

SLIDE 23

System Overview

Input Pipeline:

def pipeline(x1, x2):
    input = lib.transform(x1, x2)
    preds = model.predict(input)
    return preds

Willump Optimization:
  Infer Transformation Graph
  Statistically-Aware Optimizations:
  • 1. End-To-End Cascades
  • 2. Top-K Query Approximation

SLIDE 24

System Overview

Input Pipeline:

def pipeline(x1, x2):
    input = lib.transform(x1, x2)
    preds = model.predict(input)
    return preds

Willump Optimization:
  Infer Transformation Graph
  Statistically-Aware Optimizations:
  • 1. End-To-End Cascades
  • 2. Top-K Query Approximation
  Compiler Optimizations (Weld—Palkar et al. VLDB ‘18)
SLIDE 25

System Overview

Input Pipeline:

def pipeline(x1, x2):
    input = lib.transform(x1, x2)
    preds = model.predict(input)
    return preds

Willump Optimization:
  Infer Transformation Graph
  Statistically-Aware Optimizations:
  • 1. End-To-End Cascades
  • 2. Top-K Query Approximation
  Compiler Optimizations (Weld—Palkar et al. VLDB ‘18)

Optimized Pipeline:

def willump_pipeline(x1, x2):
    preds = compiled_code(x1, x2)
    return preds
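To make the graph-inference step concrete, here is a minimal sketch of what a transformation graph could look like; the class and helper names are our own illustration, not Willump's actual internals.

# Illustrative sketch only (hypothetical names, not Willump's API):
# each node computes one transformation; the graph runs in topological
# order, so the optimizer can reason about per-node costs and execute
# or skip subsets of nodes.
class TransformNode:
    def __init__(self, name, fn, inputs):
        self.name = name        # output this node produces
        self.fn = fn            # the transformation function
        self.inputs = inputs    # names of upstream outputs / raw inputs

def execute_graph(nodes, raw_inputs):
    """Run nodes (assumed topologically sorted) and collect results."""
    results = dict(raw_inputs)
    for node in nodes:
        args = [results[name] for name in node.inputs]
        results[node.name] = node.fn(*args)
    return results

Once the pipeline is in this form, the cascades optimization can execute only a subset of nodes, and the compiler can fuse the rest.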
SLIDE 26

Outline

  • System Overview
  • Optimization 1: End-to-end Cascades
  • Optimization 2: Top-K Query Approximation
  • Evaluation
SLIDE 27

Background: Model Cascades

  • Classify “easy” inputs with cheap model.
  • Cascade to expensive model for “hard” inputs.


Easy input: Definitely not a dog. Hard input: Maybe a dog?

SLIDE 28

Background: Model Cascades

  • Used for image classification, object detection.
  • Existing systems application-specific and custom-built.

Source: Viola-Jones (CVPR ’01), Kang et al. (VLDB ’17)

SLIDE 29

Our Optimization: End-to-end Cascades

  • Compute only some features for “easy” data inputs; cascade to computing all for “hard” inputs.
  • Automatic and model-agnostic, unlike prior work:
    ○ Estimates for runtime performance & accuracy of a feature set
    ○ Efficient search process for tuning parameters

SLIDE 30

End-to-end Cascades: Original Model

Compute All Features → Model → Prediction

SLIDE 31

Cascades Optimization

End-to-end Cascades: Approximate Model

Original:    Compute All Features → Model → Prediction
Approximate: Compute Selected Features → Approximate Model → Prediction

SLIDE 32

Cascades Optimization

End-to-end Cascades: Confidence

Original:    Compute All Features → Model → Prediction
Approximate: Compute Selected Features → Approximate Model → Confidence > Threshold? Yes → Prediction

SLIDE 33

Cascades Optimization

End-to-end Cascades: Final Pipeline

Compute Selected Features → Approximate Model → Confidence > Threshold?
  Yes → Prediction
  No  → Compute Remaining Features → Original Model → Prediction
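In code, the final pipeline behaves roughly like this minimal sketch; it assumes a scikit-learn-style classifier with predict_proba, and compute_selected_features / compute_remaining_features are hypothetical stand-ins for code Willump generates.

import numpy as np

def cascade_predict(x, approx_model, full_model, threshold):
    # Cheap path: compute only the selected feature subset S.
    selected = compute_selected_features(x)      # hypothetical helper
    probs = approx_model.predict_proba(selected)[0]
    if probs.max() > threshold:                  # confident: "easy" input
        return probs.argmax()
    # Cascade: compute the remaining features, use the original model.
    remaining = compute_remaining_features(x)    # hypothetical helper
    features = np.concatenate([selected, remaining], axis=1)
    return full_model.predict(features)[0]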

SLIDE 34

End-to-end Cascades: Constructing Cascades

  • Construct cascades during model training.
  • Need model training set and an accuracy target.
SLIDE 35

End-to-end Cascades: Selecting Features

Compute Selected Features → Approximate Model → Confidence > Threshold?
  Yes → Prediction
  No  → Compute Remaining Features → Original Model → Prediction

Key question: Which features to select?

SLIDE 36

End-to-end Cascades: Selecting Features

  • Goal: Select features that minimize expected query time given an accuracy target.

SLIDE 37

End-to-end Cascades: Selecting Features

Compute Selected Features → Approximate Model → Confidence > Threshold?
  Yes → Prediction (can approximate query)
  No  → Compute Remaining Features → Original Model → Prediction (can’t approximate query)

Two possibilities for a query: can approximate it or not.

SLIDE 38

End-to-end Cascades: Selecting Features

Compute Selected Features (S) → Approximate Model → Confidence > Threshold? Yes → Prediction

P(Yes) = P(approx); cost of this branch = cost(S)

min_S  P(approx)·cost(S) + P(~approx)·cost(F)    (F = all features)

SLIDE 39

End-to-end Cascades: Selecting Features

Compute Selected Features (S) → Approximate Model → Confidence > Threshold? No → Compute Remaining Features → Original Model → Prediction

P(No) = P(~approx); cost of this branch = cost(F)

min_S  P(approx)·cost(S) + P(~approx)·cost(F)

SLIDE 40

End-to-end Cascades: Selecting Features

Compute Selected Features (S) → Approximate Model → Confidence > Threshold?
  Yes → Prediction (P(Yes) = P(approx), cost = cost(S))
  No  → Compute Remaining Features → Original Model → Prediction (P(No) = P(~approx), cost = cost(F))

min_S  P(approx)·cost(S) + P(~approx)·cost(F)
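For intuition, with illustrative numbers (not from the paper): if P(approx) = 0.8, cost(S) = 1 ms, and cost(F) = 10 ms, the expected query time is 0.8 · 1 + 0.2 · 10 = 2.8 ms, versus 10 ms for always computing all features, roughly a 3.6x speedup. Note that the No branch pays the full cost(F) because S ⊆ F: the selected features are computed first, then the remaining ones.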

SLIDE 41

End-to-end Cascades: Selecting Features

  • Goal: Select feature set S that minimizes query time:

min_S  P(approx)·cost(S) + P(~approx)·cost(F)

SLIDE 42

End-to-end Cascades: Selecting Features

  • Goal: Select feature set S that minimizes query time:

min_S  P(approx)·cost(S) + P(~approx)·cost(F)

  • Approach (search loop sketched below):
    ○ Choose several potential values of c_max.
    ○ For each, find the best feature set S with cost(S) ≤ c_max.
    ○ Train a model & find a cascade threshold for each set.
    ○ Pick the best overall.
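A minimal sketch of that search loop; every helper here (candidate_cost_budgets, select_features, train_approx_model, find_threshold, feature_cost) is a hypothetical stand-in for the steps the next slides describe.

def tune_cascade(train_set, holdout_set, accuracy_target, full_cost):
    best = None
    for c_max in candidate_cost_budgets(full_cost):  # e.g. fractions of full_cost
        S = select_features(train_set, c_max)        # best set with cost(S) <= c_max
        approx_model = train_approx_model(train_set, S)
        threshold, p_approx = find_threshold(        # tuned on held-out data
            approx_model, holdout_set, S, accuracy_target)
        expected = p_approx * feature_cost(S) + (1 - p_approx) * full_cost
        if best is None or expected < best[0]:       # minimize expected query time
            best = (expected, S, approx_model, threshold)
    return best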


SLIDE 48

End-to-end Cascades: Selecting Features

  • Subgoal: Find S minimizing query time if cost(S) = c_max.

min_S  P(approx)·cost(S) + P(~approx)·cost(F)

SLIDE 49

End-to-end Cascades: Selecting Features

  • Subgoal: Find S minimizing query time if cost(S) = c_max.

min_S  P(approx)·cost(S) + P(~approx)·cost(F)

  • Solution:
    ○ Find S maximizing approximate model accuracy.

SLIDE 50

End-to-end Cascades: Selecting Features

  • Subgoal: Find S minimizing query time if cost(S) = c_max.

min_S  P(approx)·cost(S) + P(~approx)·cost(F)

  • Solution:
    ○ Find S maximizing approximate model accuracy.
    ○ Problem: Computing accuracy expensive.

SLIDE 51

End-to-end Cascades: Selecting Features

  • Subgoal: Find S minimizing query time if cost(S) = c_max.

min_S  P(approx)·cost(S) + P(~approx)·cost(F)

  • Solution:
    ○ Find S maximizing approximate model accuracy.
    ○ Problem: Computing accuracy expensive.
    ○ Solution: Estimate accuracy via permutation importance → knapsack problem (sketched below).
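A sketch of the estimate-and-select step, assuming per-feature costs are known from profiling; accuracy() is a hypothetical helper, and Willump's actual implementation differs in detail.

import numpy as np

def permutation_importance(model, X_valid, y_valid, feature_idx):
    base = accuracy(model, X_valid, y_valid)        # hypothetical helper
    X_perm = X_valid.copy()
    np.random.shuffle(X_perm[:, feature_idx])       # break feature-label link
    return base - accuracy(model, X_perm, y_valid)  # accuracy drop = importance

def select_features(importances, costs, c_max):
    # Greedy knapsack: take the best importance-per-unit-cost first.
    order = sorted(range(len(costs)),
                   key=lambda i: importances[i] / costs[i], reverse=True)
    S, total_cost = [], 0.0
    for i in order:
        if total_cost + costs[i] <= c_max:
            S.append(i)
            total_cost += costs[i]
    return S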

SLIDE 52

End-to-end Cascades: Selecting Features

  • Goal: Select feature set S that minimizes query time:

min_S  P(approx)·cost(S) + P(~approx)·cost(F)

  • Approach:
    ○ Choose several potential values of c_max.
    ○ For each, find the best feature set S with cost(S) ≤ c_max.
    ○ Train a model & find a cascade threshold for each set.
    ○ Pick the best overall.

SLIDE 53

End-to-end Cascades: Selecting Features

  • Subgoal: Train model & find cascade threshold for S.

min_S  P(approx)·cost(S) + P(~approx)·cost(F)

  • Solution:
    ○ Compute empirically on held-out data.

SLIDE 54

End-to-end Cascades: Selecting Features

  • Subgoal: Train model & find cascade threshold for S.

min_S  P(approx)·cost(S) + P(~approx)·cost(F)

  • Solution:
    ○ Compute empirically on held-out data.
    ○ Train approximate model from S.

SLIDE 55

End-to-end Cascades: Selecting Features

  • Subgoal: Train model & find cascade threshold for S.

min_S  P(approx)·cost(S) + P(~approx)·cost(F)

  • Solution:
    ○ Compute empirically on held-out data.
    ○ Train approximate model from S.
    ○ Predict the held-out set; determine the cascade threshold empirically from the accuracy target (sketched below).
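A minimal sketch of that empirical procedure (our names; assumes a classifier with predict_proba and the original model's held-out predictions):

import numpy as np

def find_threshold(approx_model, X_holdout_S, y_holdout, full_preds,
                   accuracy_target):
    probs = approx_model.predict_proba(X_holdout_S)
    conf = probs.max(axis=1)
    approx_preds = probs.argmax(axis=1)
    for t in np.linspace(0.5, 1.0, 51):    # smallest threshold that works
        easy = conf > t                    # queries answered approximately
        preds = np.where(easy, approx_preds, full_preds)
        if np.mean(preds == y_holdout) >= accuracy_target:
            return t, easy.mean()          # threshold, estimated P(approx)
    return 1.0, 0.0                        # fallback: never approximate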

SLIDE 56

End-to-end Cascades: Selecting Features

  • Goal: Select feature set S that minimizes query time:

min_S  P(approx)·cost(S) + P(~approx)·cost(F)

  • Approach:
    ○ Choose several potential values of c_max.
    ○ For each, find the best feature set S with cost(S) ≤ c_max.
    ○ Train a model & find a cascade threshold for each set.
    ○ Pick the best overall.

SLIDE 57

End-to-end Cascades: Results

  • Speedups of up to 5x without statistically significant accuracy loss.

  • Full evaluation at end of talk!
SLIDE 58

Outline

  • System Overview
  • Optimization 1: End-to-end Cascades
  • Optimization 2: Top-K Query Approximation
  • Evaluation
SLIDE 59

Top-K Approximation: Query Overview

  • Top-K problem: Rank the K highest-scoring items of a dataset.
  • Top-K example: Find the 10 artists a user would like most (recommender system).

SLIDE 60

Top-K Approximation: Asymmetry

  • High-value items must be predicted, ranked precisely.
  • Low-value items need only be identified as low value.

Artist              Score  Rank
Beatles             9.7    1
Bruce Springsteen   9.5    2
…                   …      …
Justin Bieber       5.6    999
Nickelback          4.1    1000

High-value: Rank precisely, return. Low-value: Approximate, discard.

SLIDE 61

Top-K Approximation: How it Works

  • Use approximate model to identify and discard low-value items.
  • Rank high-value items with powerful model.
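A minimal sketch with hypothetical scoring functions; r (how many candidates to keep per returned item) is the kind of knob Willump tunes automatically against the accuracy target.

import numpy as np

def approximate_top_k(items, cheap_score, full_score, K, r=4):
    cheap = np.array([cheap_score(item) for item in items])
    n_keep = min(len(items), K * r)            # safety margin of candidates
    candidates = np.argsort(cheap)[-n_keep:]   # likely high-value items
    ranked = sorted(candidates,                # precise pass on survivors
                    key=lambda i: full_score(items[i]), reverse=True)
    return [items[i] for i in ranked[:K]]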

SLIDE 62

Top-K Approximation: Prior Work

  • Existing systems have similar ideas.
  • However, we automatically generate approximate models for any ML application—prior systems don’t.
  • Similar challenges as in cascades.

Source: Cheng et al. (DLRS ‘16)

SLIDE 63

Top-K Approximation: Automatic Tuning

  • Automatically selects features and tunes parameters to maximize performance given an accuracy target.
  • Works similarly to cascades.
  • See paper for details!

SLIDE 64

Top-K Approximation: Results

  • Speedups of up to 10x for top-K queries.
  • Full eval at end of talk!
SLIDE 65

Outline

  • System Overview
  • Optimization 1: End-to-end Cascades
  • Optimization 2: Top-K Query Approximation
  • Evaluation
SLIDE 66

Willump Evaluation: Benchmarks

  • Benchmarks curated from top-performing entries to data science competitions (e.g. Kaggle, WSDM, CIKM).
  • Three benchmarks in presentation (more in paper):
    ○ Music (music recommendation; queries remotely stored precomputed features)
    ○ Purchase (predict next purchase; tabular AutoML features)
    ○ Toxic (toxic comment detection; computes string features)

SLIDE 67

End-to-End Cascades Evaluation: Throughput

[Bar chart: throughput across benchmarks; baselines at 1x, with speedups of 15x, 1.6x, 2.4x, and 3.2x.]

SLIDE 68

End-to-End Cascades Evaluation: Latency

SLIDE 69

Top-K Query Approximation Evaluation

[Bar chart: top-K results across benchmarks; baselines at 1x, with speedups of 4.0x, 2.7x, 3.2x, and 30x.]

SLIDE 70

Summary

  • We introduce Willump, a statistically-aware end-to-end optimizer for ML inference.
  • The statistical nature of ML enables new optimizations: Willump applies them automatically for 10x speedups.

github.com/stanford-futuredata/Willump