
CS 744: CLIPPER Shivaram Venkataraman Fall 2020



  1. CS 744: CLIPPER Shivaram Venkataraman Fall 2020

  2. ADMINISTRIVIA Course Project Proposals: - Due on Friday! - See Piazza for template - Submission instructions soon Midterm details: - Open book, open notes - Held in class time 9.30-10.45am Central Time - Type / Upload photos (extra 15 mins)

  3. MACHINE LEARNING: INFERENCE

  4. GOALS - Interactive latencies (tail latency < 100ms, e.g. 99th or 99.9th percentile latency) - High throughput to handle load (many users, many requests) - Improved prediction accuracy (ML specific) - Generality (?): handle as many models / frameworks as possible
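The tail-latency goal above can be checked with a short calculation. The sketch below is illustrative only: the function names `tail_latency` and `meets_slo` are hypothetical, and it assumes the nearest-rank definition of a percentile, which the slides do not specify.

```python
import math


def tail_latency(latencies_ms, percentile):
    """Nearest-rank percentile: the smallest sample such that at least
    `percentile` percent of all samples are <= it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def meets_slo(latencies_ms, percentile=99, slo_ms=100):
    """True if the chosen tail percentile is under the SLO (assumed 100 ms)."""
    return tail_latency(latencies_ms, percentile) < slo_ms
```

Note that a service can have a low mean latency and still fail this check if a small fraction of requests is very slow.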

  5. ARCHITECTURE [Architecture diagram; legible annotations: requests arrive over HTTP, improve accuracy, failures, replicas, scikit, Spark]

  6. MODEL CONTAINERS - Common interface (API), implemented once per framework (e.g. a TF shim that instantiates the model) - Run using Docker containers - Can be replicated across machines
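One way to picture an interface that is implemented once per framework is an abstract base class with a thin per-framework shim. This is a minimal sketch, not Clipper's actual container API: `ModelContainer`, `predict_batch`, and `CallableContainer` are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod


class ModelContainer(ABC):
    """Hypothetical uniform interface the serving layer talks to."""

    @abstractmethod
    def predict_batch(self, inputs):
        """Return one prediction per input, in the same order."""


class CallableContainer(ModelContainer):
    """Toy shim: wraps any Python callable as a model.

    A real shim would load a framework-specific model instead."""

    def __init__(self, model_fn):
        self._model_fn = model_fn

    def predict_batch(self, inputs):
        return [self._model_fn(x) for x in inputs]
```

Because the serving layer only sees `predict_batch`, a container can be swapped or replicated without changes to the layers above it.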

  7. MODEL ABSTRACTION LAYER Interface: predict, feedback (update) Caching - Improve performance for frequent queries (e.g. predicting movies for a user id) - LRU eviction policy - Important for feedback
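A prediction cache with LRU eviction can be sketched with `collections.OrderedDict`. The class name `PredictionCache`, the `(model_id, query)` key, and the `get_or_compute` method are assumptions made for this example; the slides only state that caching with LRU eviction is used.

```python
from collections import OrderedDict


class PredictionCache:
    """Hypothetical LRU cache keyed by (model_id, query)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, model_id, query, predict_fn):
        key = (model_id, query)
        if key in self._entries:
            # Mark as most recently used and skip the model call.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = predict_fn(query)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return value
```

Queries must be hashable for this to work, so a real system would likely hash the serialized input rather than use it directly as a key.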

  8. BATCHING, QUEUING Goals, Insight - Increase latency (within SLO) for improved throughput - Reduce RPC overheads (amortize the fixed cost over a batch) - GPU / BLAS acceleration (hardware parallelism) Approach - Per container queues. - Why? The best batch size could vary for each model and hardware (CPU vs. GPU)

  9. ADAPTIVE BATCHING AIMD: Additive Increase, Multiplicative Decrease - Observe latency against the SLO; increase the batch size carefully - Why? Delayed: Wait until batch exists - Why? [Figure: batch size over time]
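The AIMD idea can be sketched as a small controller that grows the batch size by a constant while observed latency stays within the SLO and shrinks it by a factor once the SLO is exceeded. The class name `AimdBatchSizer`, the step of 1, the 10% backoff, and the batch-size cap are all assumptions for illustration, not values taken from the slides.

```python
class AimdBatchSizer:
    """Hypothetical additive-increase / multiplicative-decrease controller."""

    def __init__(self, slo_ms, additive_step=1, backoff=0.9, max_batch=1024):
        self.slo_ms = slo_ms
        self.additive_step = additive_step
        self.backoff = backoff      # assumed 10% reduction on an SLO miss
        self.max_batch = max_batch
        self.batch_size = 1

    def update(self, observed_latency_ms):
        """Adjust the batch size after observing one batch's latency."""
        if observed_latency_ms <= self.slo_ms:
            # Additive increase: probe for more throughput slowly.
            self.batch_size = min(self.max_batch,
                                  self.batch_size + self.additive_step)
        else:
            # Multiplicative decrease: back off quickly when the SLO is missed.
            self.batch_size = max(1, int(self.batch_size * self.backoff))
        return self.batch_size


def simulate(sizer, steps, latency_fn):
    """Drive the controller with a synthetic latency model (illustrative)."""
    for _ in range(steps):
        sizer.update(latency_fn(sizer.batch_size))
    return sizer.batch_size
```

With a synthetic latency model such as `lambda b: 5 + 2 * b` and a 40 ms SLO, the batch size climbs one step at a time and then hovers just around the largest size that still meets the SLO.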

  10. MODEL SELECTION - Improve accuracy - Ensembles

  11. SINGLE MODEL SELECTION Multi-Arm Bandit formulation - Explore vs Exploit - Regret: Loss by not picking optimal action - Goal: Minimize regret Clipper - Exp3 algorithm: each option (model) gets a weight; update weights based on feedback - Single evaluation - Scales to more models
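A rough sketch of the textbook reward-based Exp3 update follows. It is not necessarily the exact variant Clipper uses; the class name, the exploration rate `gamma=0.1`, and the fixed per-model rewards in `simulate` are assumptions for illustration. Rewards are expected to lie in [0, 1].

```python
import math
import random


class Exp3:
    """Textbook Exp3: one weight per arm (model), updated from feedback."""

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.n_arms = n_arms
        self.gamma = gamma
        self.weights = [1.0] * n_arms
        self._rng = random.Random(seed)

    def probabilities(self):
        total = sum(self.weights)
        # Mix the weight-based distribution with uniform exploration.
        return [(1 - self.gamma) * w / total + self.gamma / self.n_arms
                for w in self.weights]

    def select(self):
        return self._rng.choices(range(self.n_arms),
                                 weights=self.probabilities())[0]

    def update(self, arm, reward):
        # Importance-weighted reward: only the chosen arm was evaluated.
        p = self.probabilities()[arm]
        self.weights[arm] *= math.exp(self.gamma * (reward / p) / self.n_arms)
        # Rescale to avoid overflow; probabilities are unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def simulate(rewards_per_arm, rounds=1000, seed=0):
    """Run Exp3 against fixed per-arm rewards (a toy feedback source)."""
    bandit = Exp3(len(rewards_per_arm), seed=seed)
    for _ in range(rounds):
        arm = bandit.select()
        bandit.update(arm, rewards_per_arm[arm])
    return bandit.probabilities()
```

Only the selected model is evaluated in each round, which is why this approach costs a single evaluation per query and scales to more models.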

  12. MULTI MODELS Ensemble - Combine output from models (weighted average, i.e. a linear combination) - How do we get the weights? Update them from feedback (Exp4) Robust Prediction - React to model changes - Output confidence score (e.g. a binary cat / dog classifier compares the combined score with a threshold)
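The weighted-average combination can be sketched for a binary classifier as below. The function name, the 0.5 threshold, and the way confidence is derived from the combined score are assumptions for illustration; the slides do not define the confidence score precisely.

```python
def ensemble_predict(scores, weights, threshold=0.5):
    """Combine per-model scores for the positive class into one prediction.

    scores  : each model's probability that the label is 1 (e.g. "cat")
    weights : non-negative per-model weights, e.g. learned from feedback
    Returns (label, confidence).
    """
    if len(scores) != len(weights) or not scores:
        raise ValueError("need one weight per score")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = sum(w * s for w, s in zip(weights, scores)) / total
    label = 1 if combined >= threshold else 0
    # Assumed definition: confidence is the combined score for the chosen label.
    confidence = combined if label == 1 else 1.0 - combined
    return label, confidence
```

An application could fall back to a default action when the returned confidence is low, which is one way to read the robust prediction bullet.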

  13. STRAGGLER MITIGATION - Why do stragglers occur? With N model containers, some of them might be slow to reply. Wait for them? Add more replicas? - Approach: return an approximate result based on whatever has finished and ignore late replies (ML specific); more replies give a better approximation
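The approach of answering from whatever has finished can be sketched with `concurrent.futures`. In this hypothetical example each model is a plain Python callable and the combined result is an unweighted mean; the function name and the deadline parameter are assumptions, not part of Clipper's API.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def predict_with_deadline(models, query, deadline_s):
    """Query all models in parallel; average whatever returns by the deadline.

    Returns (prediction, number_of_models_used); prediction is None if
    no model finished in time.
    """
    pool = ThreadPoolExecutor(max_workers=len(models))
    futures = [pool.submit(model, query) for model in models]
    done, _late = wait(futures, timeout=deadline_s)
    # Do not block on stragglers; their results are simply ignored.
    pool.shutdown(wait=False)
    results = [f.result() for f in done if f.exception() is None]
    if not results:
        return None, 0
    return sum(results) / len(results), len(results)
```

The second return value tells the caller how many models contributed, which can feed into a confidence estimate for the approximate answer.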

  14. SUMMARY • Clipper: ML inference Workloads + Requirements • Layered architecture provides generality • Caching, Batching, Replication to improve latency, throughput • Multi-Arm bandits to improve accuracy

  15. DISCUSSION https://forms.gle/FCVhPURqz7HSbDtg6

  16. Consider a scenario where you run a model serving service that hosts a number of different applications. The traffic for some applications is sporadic (e.g. only a few hours where they are used). What are some advantages / disadvantages of using Clipper for such a service? Advantages / Disadvantages (handwritten notes, only partly legible): adaptive batching, delayed batching, multiple replicas and elasticity, containerization

  17. [Handwritten discussion notes, largely illegible; legible fragments mention ensembles, accuracy, and latency]
