

  1. Willump: A Statistically-Aware End-to-end Optimizer for ML Inference
     Peter Kraft, Daniel Kang, Deepak Narayanan, Shoumik Palkar, Peter Bailis, Matei Zaharia

  2. Problem: ML Inference
     ● Often performance-critical.
     ● Recent focus on tools for ML prediction serving.

  3. A Common Bottleneck: Feature Computation
     ● Many applications bottlenecked by feature computation.
     ● Pipeline of transformations computes numerical features from data for the model.
     [Diagram: Receive Raw Data → Compute Features → Predict With Model]

  4. A Common Bottleneck: Feature Computation
     ● Feature computation is the bottleneck when models are inexpensive (boosted trees, not DNNs).
     ● Common on tabular/structured data!

  5. A Common Bottleneck: Feature Computation
     ● In a production Microsoft sentiment analysis pipeline, feature computation takes >99% of the time!
     [Chart: feature computation time vs. model run time]
     Source: Pretzel (OSDI '18)

  6. Current State-of-the-art
     ● Apply traditional serving optimizations, e.g. caching (Clipper), compiler optimizations (Pretzel).
     ● Neglect unique statistical properties of ML apps.

  7. Statistical Properties of ML
     ● Amenability to approximation.

  8. Statistical Properties of ML
     Amenability to approximation:
     ● Easy input: "Definitely not a dog."
     ● Hard input: "Maybe a dog?"

  9. Statistical Properties of ML
     Amenability to approximation:
     ● Easy input: "Definitely not a dog."
     ● Hard input: "Maybe a dog?"
     Existing systems: use the expensive model for both.

  10. Statistical Properties of ML
      Amenability to approximation:
      ● Easy input: "Definitely not a dog."
      ● Hard input: "Maybe a dog?"
      Statistically-aware systems: use a cheap model on the easy input (the bucket), the expensive model on the hard one (the cat), as in the sketch below.
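
      To make the routing concrete, here is a minimal sketch of confidence-based model selection, assuming two scikit-learn-style classifiers (cheap_model and expensive_model are hypothetical stand-ins, and the threshold is illustrative):

      import numpy as np

      def cascade_predict(x, cheap_model, expensive_model, threshold=0.9):
          # Try the cheap model first; trust it only when it is confident.
          probs = cheap_model.predict_proba([x])[0]
          if np.max(probs) > threshold:       # "easy" input (the bucket)
              return int(np.argmax(probs))
          # Low confidence: "hard" input (the cat), pay for the expensive model.
          return expensive_model.predict([x])[0]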

  11. Statistical Properties of ML
      ● Model is often part of a bigger app (e.g. a top-K query).

  12. Statistical Properties of ML
      ● Model is often part of a bigger app (e.g. a top-K query).
      Problem: Return the top 10 artists.

      Artist              Score  Rank
      Beatles             9.7    1
      Bruce Springsteen   9.5    2
      …                   …      …
      Justin Bieber       5.6    999
      Nickelback          4.1    1000

  13. Statistical Properties of ML
      ● Model is often part of a bigger app (e.g. a top-K query).
      Existing systems: use the expensive model for everything, all 1000 artists in the table above!

  14. Statistical Properties of ML
      ● Model is often part of a bigger app (e.g. a top-K query).
      Statistically-aware systems:
      ● High-value artists (top of the ranking): rank precisely, return.
      ● Low-value artists (bottom of the ranking): approximate, discard.
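
      As a concrete illustration of this idea, the following sketch scores every item with a cheap approximate model, keeps only the most promising candidates, and ranks just those precisely. cheap_score and exact_score are hypothetical scoring functions, and the oversampling factor slack is illustrative:

      import heapq

      def approximate_top_k(items, cheap_score, exact_score, k=10, slack=4):
          # Cheap pass over all items: keep the k * slack best candidates.
          candidates = heapq.nlargest(k * slack, items, key=cheap_score)
          # Expensive pass over the candidates only: rank survivors precisely.
          return heapq.nlargest(k, candidates, key=exact_score)

      Low-value items never pay the full feature-computation cost; only plausible top-10 candidates do.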

  15. Prior Work: Statistically-Aware Optimizations
      ● Statistically-aware optimizations exist in the literature.
      ● Always application-specific and custom-built.
      ● Never automatic!
      Source: Cheng et al. (DLRS '16), Kang et al. (VLDB '17)

  16. ML Inference Dilemma
      ● ML inference systems:
        ○ Easy to use.
        ○ Slow.
      ● Statistically-aware systems:
        ○ Fast.
        ○ Require a lot of work to implement.

  17. Can an ML inference system be fast and easy to use?

  18. Willump: Overview
      ● Statistically-aware optimizer for ML inference.
      ● Targets feature computation!
      ● Automatic, model-agnostic, statistically-aware optimizations.
      ● 10× improvements in throughput and latency.

  19. Outline
      ● System Overview
      ● Optimization 1: End-to-end Cascades
      ● Optimization 2: Top-K Query Approximation
      ● Evaluation

  20. Willump: Goals
      ● Automatically maximize performance of ML inference applications whose performance bottleneck is feature computation.

  21. System Overview
      Input pipeline:

      def pipeline(x1, x2):
          input = lib.transform(x1, x2)
          preds = model.predict(input)
          return preds

  22. System Overview
      Willump optimization, step 1: infer a transformation graph from the input pipeline above.

  23. System Overview
      Willump optimization, step 2: apply statistically-aware optimizations:
      1. End-to-end cascades
      2. Top-K query approximation

  24. System Overview
      Willump optimization, step 3: apply compiler optimizations (Weld, Palkar et al. VLDB '18).

  25. System Overview
      Willump emits the optimized pipeline:

      def willump_pipeline(x1, x2):
          preds = compiled_code(x1, x2)
          return preds
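
      End to end, the user-facing flow might look like the sketch below. willump_optimize is a hypothetical stand-in for Willump's graph inference and optimization passes (not the system's actual API), and the transform/predict functions are toy stand-ins so the example runs:

      def willump_optimize(pipeline, accuracy_target=0.999):
          # Hypothetical driver: the real system would infer the
          # transformation graph, apply cascades / top-K approximation,
          # and compile. Here it just returns the pipeline unchanged.
          return pipeline

      def transform(x1, x2):             # toy stand-in for lib.transform
          return [x1 + x2, x1 * x2]

      def predict(features):             # toy stand-in for model.predict
          return int(features[0] > features[1])

      def pipeline(x1, x2):
          features = transform(x1, x2)   # bottleneck: feature computation
          return predict(features)

      willump_pipeline = willump_optimize(pipeline)
      print(willump_pipeline(2, 3))      # same signature, same predictions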

  26. Outline
      ● System Overview
      ● Optimization 1: End-to-end Cascades
      ● Optimization 2: Top-K Query Approximation
      ● Evaluation

  27. Background: Model Cascades
      ● Classify "easy" inputs with a cheap model.
      ● Cascade to an expensive model for "hard" inputs.
      ● Easy input: "Definitely not a dog." Hard input: "Maybe a dog?"

  28. Background: Model Cascades
      ● Used for image classification, object detection.
      ● Existing systems are application-specific and custom-built.
      Source: Viola-Jones (CVPR '01), Kang et al. (VLDB '17)

  29. Our Optimization: End-to-end Cascades
      ● Compute only some features for "easy" data inputs; cascade to computing all of them for "hard" inputs.
      ● Automatic and model-agnostic, unlike prior work:
        ○ Estimates for runtime performance & accuracy of a feature set.
        ○ Efficient search process for tuning parameters.

  30. End-to-end Cascades: Original Model
      [Diagram: Compute All Features → Model → Prediction]

  31. End-to-end Cascades: Approximate Model
      [Diagram: the cascades optimization adds a fast path beside the original one.
       Original: Compute All Features → Model → Prediction.
       New: Compute Selected Features → Approximate Model → Prediction]

  32. End-to-end Cascades: Confidence
      [Diagram: Compute Selected Features → Approximate Model → Confidence > Threshold? Yes → Prediction]

  33. End-to-end Cascades: Final Pipeline
      [Diagram: Compute Selected Features → Approximate Model → Confidence > Threshold?
       Yes → Prediction.
       No → Compute Remaining Features → Original Model → Prediction]
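
      Put together, the final pipeline behaves like this sketch (the feature lists and models are illustrative; the approximate model is assumed to be trained on the selected features only):

      import numpy as np

      def cascade_pipeline(x, selected, remaining, approx_model,
                           original_model, threshold):
          # Fast path: compute only the selected (cheap) features.
          f_sel = [f(x) for f in selected]
          probs = approx_model.predict_proba([f_sel])[0]
          if np.max(probs) > threshold:   # confident: return early
              return int(np.argmax(probs))
          # Slow path: compute the remaining features, run the original model.
          f_all = f_sel + [f(x) for f in remaining]
          return original_model.predict([f_all])[0]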

  34. End-to-end Cascades: Constructing Cascades
      ● Construct cascades during model training.
      ● Need the model training set and an accuracy target.
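
      For example, the cascade threshold can be picked on held-out training data: choose the smallest threshold whose cascade output still matches the original model closely enough. This is a hedged sketch of that step, not necessarily Willump's exact procedure:

      import numpy as np

      def tune_threshold(approx_conf, approx_preds, orig_preds, accuracy_target):
          # approx_conf / approx_preds: the approximate model's confidence
          # and label on a held-out set; orig_preds: the original model's
          # labels, used as the accuracy reference.
          # Smaller thresholds approximate more queries, so try them first.
          for t in np.arange(0.5, 1.001, 0.01):
              approximated = approx_conf > t
              cascade_preds = np.where(approximated, approx_preds, orig_preds)
              if np.mean(cascade_preds == orig_preds) >= accuracy_target:
                  return t
          return np.inf                   # can't meet target: never approximate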

  35. End-to-end Cascades: Selecting Features
      Key question: select which features?
      [Diagram: the final cascade pipeline, with "Compute Selected Features" highlighted]

  36. End-to-end Cascades: Selecting Features
      ● Goal: Select features that minimize expected query time given the accuracy target.

  37. End-to-end Cascades: Selecting Features
      Two possibilities for a query: it can be approximated or not.
      [Diagram: at Confidence > Threshold, "Yes" → can approximate query (return the approximate prediction); "No" → can't approximate query (compute remaining features, run the original model)]

  38. End-to-end Cascades: Selecting Features
      min_S  P(approx) · cost(S) + P(~approx) · cost(F)
      [Diagram, fast path: Compute Selected Features (S) → Approximate Model → Confidence > Threshold, P(Yes) = P(approx) → Prediction. The fast path costs cost(S).]

  39. End-to-end Cascades: Selecting Features
      min_S  P(approx) · cost(S) + P(~approx) · cost(F)
      [Diagram, slow path: Confidence > Threshold, P(No) = P(~approx) → Compute Remaining Features → Original Model → Prediction. The slow path costs cost(F): computing the full feature set F and running the original model.]

  40. End-to-end Cascades: Selecting Features
      min_S  P(approx) · cost(S) + P(~approx) · cost(F)
      [Diagram, both paths: fast path (P(Yes) = P(approx)) costs cost(S); slow path (P(No) = P(~approx)) costs cost(F).]

  41. End-to-end Cascades: Selecting Features
      ● Goal: Select the feature set S that minimizes expected query time:
        min_S  P(approx) · cost(S) + P(~approx) · cost(F)
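
      To make the objective concrete with illustrative numbers: if P(approx) = 0.8, cost(S) = 1 ms, and cost(F) = 10 ms, the expected query time is 0.8 × 1 ms + 0.2 × 10 ms = 2.8 ms, versus 10 ms for always running the full pipeline, roughly a 3.6× speedup at the chosen accuracy target.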

  42. End-to-end Cascades: Selecting Features
      ● Goal: Select the feature set S that minimizes expected query time:
        min_S  P(approx) · cost(S) + P(~approx) · cost(F)
      ● Approach (sketched below):
        ○ Choose several potential values of cost(S).
        ○ Find the best feature set for each cost(S).
        ○ Train a model & find the cascade threshold for each set.
        ○ Pick the best overall.
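
      A hedged sketch of that search, with an illustrative greedy selection rule (Willump's actual feature-selection procedure may differ; train_fn and tune_fn are caller-supplied):

      def pick_features_within_budget(features, importance, costs, budget):
          # Greedy knapsack by importance-per-cost: an illustrative rule.
          chosen, spent = [], 0.0
          ranked = sorted(features, key=lambda f: importance[f] / costs[f],
                          reverse=True)
          for f in ranked:
              if spent + costs[f] <= budget:
                  chosen.append(f)
                  spent += costs[f]
          return chosen

      def select_cascade(features, importance, costs, budgets, train_fn, tune_fn):
          # For each candidate budget cost(S): pick a feature set, train an
          # approximate model on it, tune its threshold to meet the accuracy
          # target, and keep the configuration with the lowest expected time.
          best = None
          for budget in budgets:
              S = pick_features_within_budget(features, importance, costs, budget)
              model = train_fn(S)
              threshold, expected_time = tune_fn(model, S)
              if best is None or expected_time < best[3]:
                  best = (S, model, threshold, expected_time)
          return best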

