PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems


  1. PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems. Presented by Qinyuan Sun. Slides are modified from the first author Yunseong Lee's slides.

  2. Outline
     • Prediction Serving Systems
     • Limitations of Black-Box Approaches
     • PRETZEL: White-Box Prediction Serving System
     • Evaluation
     • Conclusion

  3. Machine Learning Prediction Serving
     • 1. Models are learned from data (training); 2. Models are deployed and served together (prediction serving)
     • Performance goals: 1) low latency, 2) high throughput, 3) minimal resource usage
     • [Figure: Data → Learn → Model → Deploy → Server → Users]

  4. ML Prediction Serving Systems: State of the Art
     • e.g., Clipper, TensorFlow Serving, ML.Net
     • Assumption: models are black boxes
     • Re-use the same code as in the training phase
     • Encapsulate all operations into a single function call (e.g., predict())
     • Apply external optimizations: replication, result ensembling, caching, request batching
     • [Figure: text analysis ("Pretzel is tasty") and image recognition (cat vs. car) requests served by a prediction serving system]

  5. How do Models Look inside Boxes?
     • Example: Sentiment Analysis — the text "Pretzel is tasty" goes into a model that outputs positive vs. negative

  6. How do Models Look inside Boxes? A DAG of Operators
     • Featurizers: Tokenizer → CharNgram and WordNgram → Concat
     • Predictor: Logistic Regression
     • Example: Sentiment Analysis — "Pretzel is tasty" → positive vs. negative

  7. How do Models Look inside Boxes? A DAG of Operators
     • Tokenizer splits the text into tokens
     • CharNgram and WordNgram extract n-grams
     • Concat merges the two vectors
     • Logistic Regression computes the final score
     • Example: Sentiment Analysis

  8. Many Models Have Similar Structures
     • Many parts of a model can be re-used in other models
     • e.g., customer personalization, templates, transfer learning
     • Identical sets of operators with different parameters

  9. Outline
     • Prediction Serving Systems
     • Limitations of Black-Box Approaches
     • PRETZEL: White-Box Prediction Serving System
     • Evaluation
     • Conclusion

  10. Limitation 1: Resource Waste
     • Resources are isolated across black boxes
     • 1. Unable to share memory space → memory is wasted maintaining duplicate objects (despite the similarities between models)
     • 2. No coordination of CPU resources between boxes → serving many models can use too many threads per machine

  11. Limitation 2: Operators' Characteristics Are Not Considered
     • Operators have different performance characteristics: Concat materializes a vector, while LogReg takes only 0.3% of the latency (contrary to the training phase)
     • A better plan is possible if such characteristics are considered: re-use the existing vectors and apply the update in place in LogReg (see the sketch below)
     • [Figure: latency breakdown — the featurization operators (CharNgram, WordNgram, Concat, Tokenizer) dominate; LogReg accounts for only 0.3%]
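
     A minimal sketch of the kind of better plan this slide hints at, assuming a simple binary linear model over two featurized vectors; the class, function name, and signature are illustrative, not ML.Net or PRETZEL API:

        using System;

        static class InPlaceLogRegSketch
        {
            // Score without materializing Concat(charNgrams, wordNgrams): the linear model
            // walks the two existing vectors directly, using an offset into its weight array.
            public static float Score(float[] charNgrams, float[] wordNgrams, float[] weights, float bias)
            {
                double z = bias;
                for (int i = 0; i < charNgrams.Length; i++)
                    z += weights[i] * charNgrams[i];                     // weights for the char n-grams
                for (int i = 0; i < wordNgrams.Length; i++)
                    z += weights[charNgrams.Length + i] * wordNgrams[i]; // weights for the word n-grams
                return (float)(1.0 / (1.0 + Math.Exp(-z)));              // logistic link
            }
        }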

  12. Limitation 3: Lazy Initialization
     • ML.Net initializes code and memory lazily (efficient in the training phase)
     • Experiment: run 250 Sentiment Analysis models 100 times each — cold: first execution / hot: average of the remaining 99
     • Long-tail latency in the cold case: code analysis, just-in-time (JIT) compilation, memory allocation, etc.
     • Difficult to provide a strong Service Level Agreement (SLA)
     • [Figure: cold executions are dramatically slower than hot ones (annotated 444x and 13x)]
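
     A rough sketch of the cold-vs-hot measurement described on this slide; the predict delegate stands in for whatever prediction entry point a served model exposes and is not an actual ML.Net or PRETZEL API:

        using System;
        using System.Diagnostics;
        using System.Linq;

        static class ColdHotBenchmarkSketch
        {
            // Times `runs` invocations; the first one pays JIT compilation and lazy allocation
            // costs ("cold"), the rest are averaged as the "hot" latency.
            public static (double coldMs, double hotMs) Measure(Func<string, float> predict, string input, int runs = 100)
            {
                var timings = new double[runs];
                var sw = new Stopwatch();
                for (int i = 0; i < runs; i++)
                {
                    sw.Restart();
                    predict(input);
                    sw.Stop();
                    timings[i] = sw.Elapsed.TotalMilliseconds;
                }
                return (timings[0], timings.Skip(1).Average());
            }
        }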

  13. Outline
     • (Black-Box) Prediction Serving Systems
     • Limitations of Black-Box Approaches
     • PRETZEL: White-Box Prediction Serving System
     • Evaluation
     • Conclusion

  14. PRETZEL: White-Box Prediction Serving
     • We analyze models to optimize their internal execution
     • We let models co-exist on the same runtime, sharing computation and memory resources
     • We optimize models in two directions: 1. end-to-end optimizations, 2. multi-model optimizations

  15. End-to-End Optimizations
     • Optimize the execution of individual models from start to end
     • 1. [Ahead-of-Time Compilation] Compile operators' code in advance → no JIT overhead
     • 2. [Vector Pooling] Pre-allocate data structures → no memory allocation on the data path (see the sketch below)
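
     A minimal sketch of the vector-pooling idea, assuming the vector sizes are known ahead of time from the model plan; the class and its members are illustrative, not PRETZEL's actual data structures:

        using System;
        using System.Collections.Concurrent;

        sealed class VectorPoolSketch
        {
            private readonly ConcurrentBag<float[]> _pool = new ConcurrentBag<float[]>();
            private readonly int _vectorSize;

            // All buffers are allocated up front (offline), so the prediction path only rents and returns.
            public VectorPoolSketch(int vectorSize, int initialCount)
            {
                _vectorSize = vectorSize;
                for (int i = 0; i < initialCount; i++)
                    _pool.Add(new float[vectorSize]);
            }

            public float[] Rent() => _pool.TryTake(out var v) ? v : new float[_vectorSize];

            public void Return(float[] v)
            {
                Array.Clear(v, 0, v.Length);   // scrub before putting the buffer back
                _pool.Add(v);
            }
        }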

  16. Multi-Model Optimizations
     • Share computation and memory across models
     • 1. [Object Store] Share operators' parameters/weights → maintain only one copy
     • 2. [Sub-Plan Materialization] Reuse intermediate results computed by other models → save computation (see the sketch below)
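
     An illustrative sketch of the two ideas above, assuming parameters can be keyed by a content hash and sub-plans by an identifier plus the raw input; the types and names are hypothetical and do not reflect PRETZEL's internals:

        using System;
        using System.Collections.Concurrent;

        // One shared copy of identical parameters (e.g., an n-gram dictionary) across models.
        sealed class ObjectStoreSketch
        {
            private readonly ConcurrentDictionary<string, object> _objects = new ConcurrentDictionary<string, object>();

            public T GetOrAdd<T>(string contentKey, Func<T> create) where T : class =>
                (T)_objects.GetOrAdd(contentKey, _ => create());
        }

        // Reuse an intermediate result already computed by another model sharing the same sub-plan.
        sealed class SubPlanCacheSketch
        {
            private readonly ConcurrentDictionary<(string subPlanId, string input), float[]> _cache =
                new ConcurrentDictionary<(string, string), float[]>();

            public float[] GetOrCompute(string subPlanId, string input, Func<float[]> compute) =>
                _cache.GetOrAdd((subPlanId, input), _ => compute());
        }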

  17. System Components
     • 1. Flour: intermediate representation (Flour programs such as var fContext = ...; return fPrgrm.Plan();)
     • 2. Oven: compiler/optimizer
     • 3. Runtime: executes inference queries (Object Store, Scheduler)
     • 4. FrontEnd: handles user requests

  18. Prediction Serving with PRETZEL
     • 1. Offline: analyze the structural information of models, build a ModelPlan for optimal execution, and register the ModelPlan with the Runtime
     • 2. Online: handle prediction requests and coordinate CPU & memory resources (FrontEnd, Runtime)

  19. System Design: Offline Phase
     • 1. Translate the Model into a Flour Program
     • <Model>: Tokenizer → CharNgram / WordNgram → Concat → Logistic Regression
     • <Flour Program>:

        var fContext = new FlourContext(...);
        var tTokenizer = fContext.CSV
                                 .FromText(fields, fieldsType, sep)
                                 .Tokenize();
        var tCNgram = tTokenizer.CharNgram(numCNgrms, ...);
        var tWNgram = tTokenizer.WordNgram(numWNgrms, ...);
        var fPrgrm = tCNgram
                     .Concat(tWNgram)
                     .ClassifierBinaryLinear(cParams);
        return fPrgrm.Plan();

  20. System Design: Offline Phase
     • 2. Oven (rule-based optimizer/compiler) builds the ModelPlan from the Flour program
     • Rules: push the linear predictor & remove Concat; group operators into stages
     • <Model Plan>: a logical DAG of stages, S1 (Stage 1) and S2 (Stage 2)

  21. System Design: Offline Phase
     • 2. (continued) Besides the logical DAG (S1, S2), the ModelPlan also carries:
     • Parameters — e.g., the dictionary and the n-gram length
     • Statistics — e.g., dense vs. sparse, maximum vector size
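
     A rough data-structure sketch of what such a ModelPlan might carry, per the slide; the field names and types are invented for illustration:

        using System.Collections.Generic;

        sealed class ModelPlanSketch
        {
            public List<string> LogicalStages { get; } = new List<string>();                       // e.g., "S1", "S2"
            public Dictionary<string, object> Parameters { get; } = new Dictionary<string, object>(); // dictionaries, n-gram lengths, weights
            public Dictionary<string, object> Statistics { get; } = new Dictionary<string, object>(); // dense vs. sparse, max vector size
        }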

  22. System Design: Offline Phase
     • 3. The ModelPlan is registered with the Runtime
     • 1. Store the parameters and the mapping between logical stages in the Object Store
     • 2. Find the most efficient physical implementation of each logical stage using the parameters and statistics
     • [Figure: Model1's logical stages S1, S2 mapped to physical stages backed by the Object Store]

  23. System Design: Offline Phase
     • 3. (continued) Register the selected physical stages in the Catalog
     • Selection keys include, e.g., the n-gram length (1 vs. 3) and sparse vs. dense representations (see the sketch below)
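
     A hedged sketch of that selection step: given the statistics gathered offline, pick a physical implementation for a logical stage. The types, thresholds, and candidate implementations are illustrative, not PRETZEL's actual catalog:

        sealed class StageStatisticsSketch
        {
            public bool IsSparse { get; set; }
            public int MaxVectorSize { get; set; }
            public int NGramLength { get; set; }
        }

        interface IPhysicalStage { float[] Run(float[] input); }

        static class CatalogSketch
        {
            // e.g., a sparse featurizer output with a large maximum size favors a sparse kernel;
            // otherwise a dense implementation is typically the better pick.
            public static IPhysicalStage Select(StageStatisticsSketch stats) =>
                stats.IsSparse && stats.MaxVectorSize > 1000
                    ? (IPhysicalStage)new SparseStage()
                    : new DenseStage();

            private sealed class SparseStage : IPhysicalStage { public float[] Run(float[] x) => x; }
            private sealed class DenseStage  : IPhysicalStage { public float[] Run(float[] x) => x; }
        }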

  24. System Design: Online Phase
     • 1. A prediction request arrives, e.g., <Model1, "Pretzel is tasty">
     • 2. Instantiate the physical stages along with their parameters from the Object Store
     • 3. Execute the stages using thread pools managed by the Scheduler
     • 4. Send the result back to the client (see the sketch below)
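
     A much-simplified sketch of this online path, with invented names: a registered plan is just an ordered list of stage functions, and each request runs them on a pooled worker thread before the result is returned:

        using System;
        using System.Collections.Generic;
        using System.Threading.Tasks;

        sealed class PretzelRuntimeSketch
        {
            private readonly Dictionary<string, Func<float[], float[]>[]> _plans =
                new Dictionary<string, Func<float[], float[]>[]>();

            // Offline: the ModelPlan's physical stages are registered under a model id.
            public void Register(string modelId, params Func<float[], float[]>[] stages) =>
                _plans[modelId] = stages;

            // Online: run the stages of the requested model in order on a pooled worker thread.
            public Task<float[]> PredictAsync(string modelId, float[] input) =>
                Task.Run(() =>
                {
                    var v = input;
                    foreach (var stage in _plans[modelId])
                        v = stage(v);
                    return v;
                });
        }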

  25. Outline
     • (Black-Box) Prediction Serving Systems
     • Limitations of Black-Box Approaches
     • PRETZEL: White-Box Prediction Serving System
     • Evaluation
     • Conclusion

  26. Evaluation
     • Q: How does PRETZEL improve performance over black-box approaches, in terms of latency, memory, and throughput?
     • 500 models from the Microsoft Machine Learning Team: 250 Sentiment Analysis (memory-bound), 250 Attendee Count (compute-bound)
     • System configuration: 16-core CPU, 32 GB RAM, Windows 10, .NET Core 2.0
