PRETZEL:Opening the Black Box of Machine Learning Prediction Serving - - PowerPoint PPT Presentation
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
Presented by Qinyuan Sun. Slides are modified from first author Yunseong Lee's slides.
Outline
- Prediction Serving Systems
- Limitations of Black Box Approaches
- PRETZEL: White-box Prediction Serving System
- Evaluation
- Conclusion
Machine Learning Prediction Serving
- 1. Models are learned from data (training)
- 2. Models are deployed and served together (prediction serving)
- Performance goals: 1) low latency, 2) high throughput, 3) minimal resource usage
- Assumption: models are black boxes
- Re-use the same code as in the training phase
- Encapsulate all operations into a single function call (e.g., predict())
- Apply external optimizations around the box
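The black-box contract above can be sketched as follows. This is an illustrative Python sketch, not the actual Clipper/ML.Net API: `BlackBoxModel`, `serve`, and the caching layer are assumed names. The point is that the serving layer sees only `predict()`, so it can optimize around the model (e.g., result caching) but never inside it.

```python
from functools import lru_cache

class BlackBoxModel:
    """Wraps training-time code behind a single predict() entry point."""
    def __init__(self, pipeline):
        self._pipeline = pipeline  # opaque: the server cannot inspect or rewrite it

    def predict(self, text):
        return self._pipeline(text)

# hypothetical model; the serving layer only ever calls predict()
model = BlackBoxModel(lambda text: "positive" if "tasty" in text else "negative")

@lru_cache(maxsize=1024)  # external optimization: result caching around the box
def serve(text):
    return model.predict(text)

print(serve("Pretzel is tasty"))  # → positive
```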
ML Prediction Serving Systems: State-of-the-art
- Examples: Clipper, TensorFlow Serving, ML.Net
- A prediction serving system hosts black-box models, e.g., text analysis ("Pretzel is tasty" → positive) or image recognition (cat vs. car)
- External optimizations: result caching, ensembles, replication, request batching
How do Models Look inside Boxes?
<Example: Sentiment Analysis>
- Input: "Pretzel is tasty" (text) → Model → positive vs. negative
- A model is a DAG of operators:
- Featurizers: Tokenizer → Char Ngram, Word Ngram → Concat
- Predictor: Logistic Regression
- Operator roles: Tokenizer splits text into tokens; Char/Word Ngram extract N-grams; Concat merges the two vectors; Logistic Regression computes the final score
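The DAG above can be sketched end-to-end in a few lines. This is a hypothetical Python rendering of the slide's pipeline; the operator names mirror the slide, not any real ML.Net API, and the toy weights are made up for illustration.

```python
import math

def tokenizer(text):                      # split text into tokens
    return text.lower().split()

def char_ngrams(tokens, n=3):             # extract character N-grams
    joined = " ".join(tokens)
    return [joined[i:i + n] for i in range(len(joined) - n + 1)]

def word_ngrams(tokens, n=2):             # extract word N-grams
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def concat(a, b):                         # merge two feature vectors
    return a + b

def logistic_regression(features, weights, bias=0.0):  # compute final score
    score = bias + sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-score))

tokens = tokenizer("Pretzel is tasty")
features = concat(char_ngrams(tokens), word_ngrams(tokens))
prob = logistic_regression(features, weights={"is tasty": 2.0})  # toy weight
print("positive" if prob > 0.5 else "negative")  # → positive
```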
Many Models Have Similar Structures
- Many parts of a model can be re-used in other models
- e.g., customer personalization, templates, transfer learning
- Identical sets of operators, only with different parameters
Outline
- Prediction Serving Systems
- Limitations of Black Box Approaches
- PRETZEL: White-box Prediction Serving System
- Evaluation
- Conclusion
Limitation 1: Resource Waste
- Resources are isolated across black boxes
- 1. Unable to share memory space → wastes memory maintaining duplicate objects (despite similarities between models)
- 2. No coordination of CPU resources between boxes → serving many models on one machine can use too many threads
Limitation 2: Disregard for Operators' Characteristics
- 1. Operators have different performance characteristics
- Concat materializes a vector
- LogReg takes only 0.3% of the latency (contrary to the training phase)
- 2. A better plan exists if such characteristics are considered
- Re-use the existing vectors
- Apply in-place updates in LogReg
- Latency breakdown: CharNgram 23.1%, WordNgram 34.2%, Concat 32.7%, LogReg 0.3%, Others 9.6%
Limitation 3: Lazy Initialization
- ML.Net initializes code and memory lazily (efficient in the training phase)
- Experiment: run 250 Sentiment Analysis models 100 times each (cold: first execution; hot: average of the rest)
- Long-tail latency in the cold case: operators run 13x to 444x slower than hot
- Causes: code analysis, just-in-time (JIT) compilation, memory allocation, etc.
- Difficult to provide strong Service-Level Agreements (SLAs)
Outline
- (Black-box) Prediction Serving Systems
- Limitations of Black Box Approaches
- PRETZEL: White-box Prediction Serving System
- Evaluation
- Conclusion
PRETZEL: White-box Prediction Serving
- We analyze models to optimize their internal execution
- We let models co-exist on the same runtime, sharing computation and memory resources
- We optimize models in two directions:
- 1. End-to-end optimizations
- 2. Multi-model optimizations
End-to-End Optimizations
Optimize the execution of individual models from start to end
- 1. [Ahead-of-time Compilation] Compile operators' code in advance → no JIT overhead
- 2. [Vector Pooling] Pre-allocate data structures → no memory allocation on the data path
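The vector-pooling idea can be sketched as follows. This is a minimal illustrative sketch with an assumed interface (`VectorPool`, `acquire`, `release`), not PRETZEL's actual .NET implementation: all buffers are allocated once up front, so the hot data path only pops and pushes recycled buffers.

```python
class VectorPool:
    def __init__(self, num_vectors, size):
        # all allocation happens up front, before serving starts
        self._free = [bytearray(size) for _ in range(num_vectors)]

    def acquire(self):
        return self._free.pop()          # O(1), no allocation on the data path

    def release(self, vec):
        for i in range(len(vec)):        # scrub the buffer before recycling it
            vec[i] = 0
        self._free.append(vec)

pool = VectorPool(num_vectors=4, size=1024)
buf = pool.acquire()                     # used as an operator's output vector
buf[0] = 42
pool.release(buf)                        # returned to the pool, not the allocator
```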
Multi-model Optimizations
Share computation and memory across models
- 1. [Object Store] Share operators' parameters/weights → maintain only one copy
- 2. [Sub-plan Materialization] Reuse intermediate results computed by other models → save computation
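Sub-plan materialization can be sketched conceptually as a cache keyed by (stage, input): when two models share a prefix of stages, the second model reuses the intermediate result the first already computed. All names below (`featurize`, `run_stage`, the cache layout) are illustrative, not PRETZEL's actual design.

```python
call_count = {"featurize": 0}

def featurize(text):                       # shared stage (e.g., Tokenizer + N-grams)
    call_count["featurize"] += 1
    return tuple(text.lower().split())

materialized = {}                          # (stage_id, input) -> cached output

def run_stage(stage_fn, stage_id, inp):
    key = (stage_id, inp)
    if key not in materialized:            # compute once...
        materialized[key] = stage_fn(inp)
    return materialized[key]               # ...reuse across models

def model_a(text):                         # two models sharing the same prefix
    return len(run_stage(featurize, "feat", text))

def model_b(text):
    return run_stage(featurize, "feat", text)[:1]

model_a("Pretzel is tasty")
model_b("Pretzel is tasty")
print(call_count["featurize"])             # → 1: computed once, shared twice
```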
System Components
- 1. Flour: intermediate representation (e.g., var fContext = ...; var tTokenizer = ...; return fPrgrm.Plan();)
- 2. Oven: compiler/optimizer
- 3. Runtime: executes inference queries (Object Store, Scheduler)
- 4. FrontEnd: handles user requests
Prediction Serving with PRETZEL
- 1. Offline phase
- Analyze structural information of models
- Build a ModelPlan for optimal execution
- Register the ModelPlan to the Runtime
- 2. Online phase
- Handle prediction requests
- Coordinate CPU & memory resources
System Design: Offline Phase
- 1. Translate the model (Tokenizer → Char Ngram, Word Ngram → Concat → Log Reg) into a Flour program

<Flour Program>
    var fContext = new FlourContext(...);
    var tTokenizer = fContext.CSV
        .FromText(fields, fieldsType, sep)
        .Tokenize();
    var tCNgram = tTokenizer.CharNgram(numCNgrms, ...);
    var tWNgram = tTokenizer.WordNgram(numWNgrms, ...);
    var fPrgrm = tCNgram
        .Concat(tWNgram)
        .ClassifierBinaryLinear(cParams);
    return fPrgrm.Plan();
System Design: Offline Phase
- 2. The Oven optimizer/compiler builds a ModelPlan from the Flour program
- A rule-based optimizer rewrites the logical DAG, e.g., push the linear predictor and remove Concat
- Operators are grouped into stages (Stage 1, Stage 2)
- The resulting ModelPlan contains:
- Logical DAG (the staged operator graph)
- Parameters (e.g., dictionary, N-gram length)
- Statistics (e.g., dense vs. sparse, maximum vector size)
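The "push linear predictor & remove Concat" rule exploits linearity: a linear model over concat(a, b) equals the sum of two partial dot products, so the concatenated vector never needs to be materialized. A minimal worked sketch (illustrative numbers and helper names, not Oven's actual code):

```python
# w · concat(a, b) == w_a · a + w_b · b, so Concat need not materialize a vector.
def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

a, b = [1.0, 2.0], [3.0]             # outputs of two featurizer branches
w = [0.5, 0.5, 2.0]                  # weights over the concatenated vector

naive = dot(w, a + b)                # materializes concat(a, b) first

w_a, w_b = w[:len(a)], w[len(a):]    # the rewrite: split the weights instead
fused = dot(w_a, a) + dot(w_b, b)    # no intermediate vector is built

assert naive == fused                # both give 0.5 + 1.0 + 6.0 = 7.5
```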
System Design: Offline Phase
- 3. The ModelPlan is registered to the Runtime
- 1. Store parameters and the mapping between logical stages in the Object Store
- 2. Find the most efficient physical implementation using parameters & statistics (e.g., N-gram length 1 vs. 3, sparse vs. dense vectors)
- 3. Register the selected physical stages to the Catalog
System Design: Online Phase
- 1. A prediction request arrives, e.g., <Model1, "Pretzel is tasty">
- 2. Instantiate physical stages along with parameters from the Object Store
- 3. Execute the stages using thread pools managed by the Scheduler
- 4. Send the result back to the client
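The online phase above can be sketched as a lookup-then-execute loop. Everything here is an assumed, simplified layout (a dict-based object store and catalog, toy parameters): a real runtime would schedule the stages on thread pools rather than run them inline.

```python
object_store = {                        # written during the offline phase
    ("model1", "S1"): {"vocab": {"pretzel", "tasty"}},
    ("model1", "S2"): {"weights": {"tasty": 2.0}, "bias": -1.0},
}

catalog = {                             # logical stage -> selected physical impl
    "S1": lambda text, p: [t for t in text.lower().split() if t in p["vocab"]],
    "S2": lambda feats, p: p["bias"] + sum(p["weights"].get(f, 0.0) for f in feats),
}

def serve(model_id, text, stages=("S1", "S2")):
    data = text
    for s in stages:                    # execute stages in DAG order; a real
        params = object_store[(model_id, s)]   # runtime schedules these on
        data = catalog[s](data, params)        # shared thread pools
    return data

score = serve("model1", "Pretzel is tasty")
print("positive" if score > 0 else "negative")  # → positive
```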
Outline
- (Black-box) Prediction Serving Systems
- Limitations of Black Box Approaches
- PRETZEL: White-box Prediction Serving System
- Evaluation
- Conclusion
Evaluation
- Q: How does PRETZEL improve performance over black-box approaches, in terms of latency, memory, and throughput?
- 500 models from a Microsoft machine learning team:
- 250 Sentiment Analysis models (memory-bound)
- 250 Attendee Count models (compute-bound)
- System configuration: 16-core CPU, 32 GB RAM, Windows 10, .NET Core 2.0
Evaluation: Latency
- Micro-benchmark (no server-client communication)
- Score 250 Sentiment Analysis models, 100 times each
- Compare ML.Net vs. PRETZEL
- Results (latency CDF, log-scaled, in ms):
- P99 (hot): ML.Net 0.6 vs. PRETZEL 0.2 (3x better)
- P99 (cold): ML.Net 8.1 vs. PRETZEL 0.8 (10x better)
- Worst (cold): ML.Net 280.2 vs. PRETZEL 6.2 (45x better)
Evaluation: Memory
- Measure cumulative memory usage after loading 250 Attendee Count models (smaller than Sentiment Analysis)
- 4 settings for comparison (shared runtime / shared objects):
- ML.Net + Clipper (nothing shared): 9.7 GB
- ML.Net (shared runtime): 3.7 GB
- PRETZEL without Object Store (shared runtime): 2.9 GB
- PRETZEL (shared runtime + shared objects): 164 MB
- PRETZEL uses 25x less memory than ML.Net and 62x less than ML.Net + Clipper
Evaluation: Throughput
- Micro-benchmark
- Score 250 Attendee Count models, 1000 times each
- Requests sent in batches of 1000 queries
- Compare ML.Net vs. PRETZEL while scaling the number of CPU cores
- PRETZEL achieves up to 10x higher throughput and scales close to ideally with the number of cores
- More results in the paper!
Conclusion
- PRETZEL is the first white-box prediction serving system for ML pipelines
- By using models' structural information, we enable two types of optimizations:
- End-to-end optimizations generate efficient execution plans for a model
- Multi-model optimizations let models share computation and memory resources
- Our evaluation shows that PRETZEL improves performance compared to black-box systems (e.g., ML.Net):
- Decreased latency and memory footprint
- Increased resource utilization and throughput
- Many external optimizations used by Clipper are orthogonal to PRETZEL and could be applied on top