ALERT: Accurate Learning for Energy and Timeliness
Chengcheng Wan, Muhammad Husni Santriaji, Eri Rogers, Henry Hoffmann, Michael Maire, and Shan Lu (PowerPoint presentation)

  1. ALERT: Accurate Learning for Energy and Timeliness. Chengcheng Wan, Muhammad Husni Santriaji, Eri Rogers, Henry Hoffmann, Michael Maire, and Shan Lu

  2. DNNs are Deployed Everywhere: trading, auto driving, smart city, QA robots, weather forecasting, text generation.

  3. DNN Deployment is Challenging. [Diagram: a DNN system maps an input (e.g., a road image) to an output under latency (L), accuracy (A), and energy (E) requirements.] Challenges: the configuration space is huge; the environment may change dynamically; the manager must be low overhead.

  4. Previous Work. Prior approaches each address only part of the challenge: resource management handles the huge configuration space and DNN design handles the dynamic environment, but no prior work covers all three challenges (huge configuration space, dynamic environment, low overhead) together. [1] H. Hoffmann et al. JouleGuard: energy guarantees for approximate applications. SOSP, 2015. [2] C. Imes et al. POET: a portable approach to minimizing energy under soft real-time constraints. RTAS, 2015. [3] N. Mishra et al. CALOREE: learning control for predictable latency and low energy. ASPLOS, 2018. [4] A. Rahmani et al. SPECTR: formal supervisory control and coordination for many-core systems resource management. ASPLOS, 2018.

  5. Our ALERT System. [Diagram: for each input (e.g., a road image), ALERT selects a DNN and a power cap under L/A/E requirements, guided by feedback-based estimation from runtime measurement.] It targets all three challenges: the configuration space is huge; the environment may change dynamically; it must be low overhead.

  6. Our ALERT System. [Same diagram, highlighting that the feedback-based estimation is built around the global slow-down factor ξ.]

  7. Evaluation Highlights. ✔ ALERT satisfies latency, accuracy, and energy (LAE) constraints in 99.9% of cases for vision and 98.5% of cases for NLP. ✔ Its probabilistic design overcomes dynamic variability efficiently: ALERT achieves 93-99% of the Oracle's performance. ✔ Coordinating application- and system-level adaptation improves performance, reducing energy by 13% and error by 27% over the prior approach.

  8. Outline: Understanding DNN Deployment Challenges; ALERT Run-time Inference Management; Experiments and Results

  9. Outline: Understanding DNN Deployment Challenges; ALERT Run-time Inference Management; Experiments and Results

  10. Experiment Settings. Platforms (4): ODroid, CPUs, GPU. DNNs (4): ResNet50, VGG16, RNN, BERT. Tasks (3): image classification (ImageNet), sentence prediction (PTB), question answering (SQuAD).

  11. Tradeoffs from DNNs. [Chart: top-5 error rate (%) vs. inference time of one image (s) for 42 DNNs on ImageNet classification, including MobileNet-v1 (α=1), MobileNet-v2 (α=1.3), ResNet50, NasNet-large, and PnasNet-large.] High accuracy comes with long latency.

  12. Tradeoffs from System Settings. [Chart: average energy (J) vs. inference time of one image (s) across power-limit settings (W); the fastest setting and the least-energy setting differ.] No setting is optimal for both energy and latency.

  13. Run-time Variability. [Charts: inference latency without and with a co-located job.]

  14. Run-time Variability. Latency variation is increased by co-located jobs. [Charts: inference latency without and with a co-located job.]

  15. Potential Solutions. [Chart: average energy (J) under constraint settings (deadline × accuracy goal, deadlines 0.1s-0.7s) for sys-level, app-level, and combined adaptation; sys-level and app-level alone fail (∞) under tight constraints.] Combining both levels achieves the best performance.

  16. Outline: Understanding DNN Deployment Challenges; ALERT Run-time Inference Management; Experiments and Results

  17. Three Dimensions & Two Tasks. The three dimensions are inference latency (L), inference accuracy (A), and energy consumption (E). Task 1, Maximize Accuracy: maximize accuracy subject to an energy-consumption goal and an inference deadline. Task 2, Minimize Energy: minimize energy subject to an accuracy goal and an inference deadline.

  18. Maximize Accuracy Task. The configurations form a grid of DNNs × power caps. Constraints: latency L < X and energy E < Y. Optimization: max(A) over the feasible configurations.
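
The selection step above can be sketched as a search over the DNN × power-cap grid. This is a minimal illustration, not ALERT's implementation; all configuration names and the latency/energy/accuracy numbers are made up for the example.

```python
def select_config(configs, deadline, energy_cap):
    """Pick the (dnn, power_cap) entry with the highest predicted accuracy
    whose predicted latency and energy both satisfy the constraints."""
    best = None
    for cfg in configs:
        if cfg["latency"] < deadline and cfg["energy"] < energy_cap:
            if best is None or cfg["accuracy"] > best["accuracy"]:
                best = cfg
    return best

# Illustrative grid: two DNNs under two power caps (values are invented).
configs = [
    {"dnn": "resnet50",  "power_cap": 35, "latency": 0.12, "energy": 4.2, "accuracy": 0.93},
    {"dnn": "resnet50",  "power_cap": 20, "latency": 0.19, "energy": 3.1, "accuracy": 0.93},
    {"dnn": "mobilenet", "power_cap": 20, "latency": 0.06, "energy": 1.4, "accuracy": 0.89},
]
chosen = select_config(configs, deadline=0.15, energy_cap=5.0)
```

Here the slower ResNet50 setting misses the deadline and is excluded, so the search keeps the more accurate of the remaining feasible entries.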

  19. (L) How to estimate the inference latency? Two key challenges. Challenge 1, runtime variation: the inference time may differ even for the same configuration (e.g., profiled at 50, but observed at runtime as 52, 46, 58, 53, 70, 99, 75, 51, ..., 94).

  20. (L) How to estimate the inference latency? Challenge 2: too many combinations of DNNs (d_1, ..., d_l) and power caps (p_1, ..., p_k) to track individually.

  21. (L) Potential Solution: a Kalman filter that estimates the latency of each configuration from its own recent execution history (e.g., DNN2 under P1 from samples 51, 52, 43, 58, 49; DNN1 under P2 from samples 29, 31, 30).
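
A per-configuration estimator of this kind can be sketched as a one-dimensional Kalman filter. This is an illustrative sketch, not ALERT's code; the noise parameters and the initial estimate are assumptions chosen for the example.

```python
class LatencyKalman:
    """Scalar Kalman filter tracking the latency of one configuration."""

    def __init__(self, init_est, init_var=25.0, process_var=1.0, meas_var=16.0):
        self.est, self.var = init_est, init_var
        self.process_var, self.meas_var = process_var, meas_var

    def update(self, measured):
        # Predict: uncertainty grows between observations.
        self.var += self.process_var
        # Correct: blend the prediction with the new measurement.
        gain = self.var / (self.var + self.meas_var)
        self.est += gain * (measured - self.est)
        self.var *= (1.0 - gain)
        return self.est

kf = LatencyKalman(init_est=50.0)
for sample in [51, 52, 43, 58, 49]:  # recent history of one configuration
    kf.update(sample)
```

After these updates the estimate settles near the sample mean, with the gain controlling how quickly old history is forgotten.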

  22. (L) Potential Solution: drawback. A per-configuration Kalman filter cannot solve the problem, because there is not enough history for each configuration: e.g., DNN2 under P1 has samples (51, 52, 43, 58, 49) and DNN1 under P2 has (29, 31, 30), but DNN1 under P1 and DNN2 under P2 have no history at all, so their latencies cannot be predicted.

  23. (L) How to estimate the inference latency? Global slow-down factor ξ: use recent execution history under any DNN or resource setting. Example: DNN1 under P1 was profiled at 40 and observed at 60, and DNN1 under P2 was profiled at 20 and observed at 30, so ξ = 150%; the unobserved configurations can then be predicted from their profiles (DNN2 under P1: 34 → 51; DNN2 under P2: 30 → 45).

  24. (L) How to estimate the inference latency? A mean estimate is not sufficient: the variation might be too big to give a good prediction, and different variations have different implications for DNN selection. Example: three histories with (roughly) the same mean of 50 but different variation: Sequence 1 (52, 43, 58, 49) has variation 5; Sequence 2 (51, 50, 49, 49, 50) has variation 1; Sequence 3 (15, 99, 10, 70, 50) has variation 40.

  25. (L) How to estimate the inference latency? Global slow-down factor ξ: use recent execution history under any DNN or resource setting, and estimate its distribution, i.e., its mean and variance (e.g., history 52, 43, 58, 49 gives mean 50 and variation 5).
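
The ξ estimate can be sketched as follows: every finished inference, under any configuration, contributes one observed/profiled ratio, and the mean and variance of those ratios summarize the current slow-down distribution. The simple batch statistics below are an illustrative choice, not ALERT's exact estimator.

```python
import statistics

def estimate_xi(history):
    """history: list of (profiled_latency, observed_latency) pairs, drawn
    from *any* DNN/power-cap configuration. Returns (mean, variance) of
    the global slow-down factor xi."""
    ratios = [obs / prof for prof, obs in history]
    return statistics.mean(ratios), statistics.pvariance(ratios)

# Observations from the slide: DNN1 under P1 (40 -> 60) and under P2 (20 -> 30).
xi_mean, xi_var = estimate_xi([(40, 60), (20, 30)])

# Predict a never-executed configuration from its profile: DNN2 under P1.
predicted = 34 * xi_mean
```

Because ξ is shared across configurations, two observations of DNN1 are enough to predict DNN2's latency, which the per-configuration Kalman filter could not do.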

  26. (A) How to estimate accuracy under a deadline? Can inference finish before the deadline? If yes, the accuracy is the training accuracy of the selected DNN; if not, it drops to random-guess accuracy, unless the model is an Anytime DNN.

  27. (A) What is an Anytime DNN? A traditional DNN produces its single output (e.g., "Road") only at the end of inference, so a missed deadline leaves no usable answer. An Anytime DNN produces a sequence of increasingly accurate outputs over time (e.g., "Chocolate", then "Ground", then "Road"), so the best output so far is available at the deadline. [1] C. Wan et al. Orthogonalized SGD and Nested Architectures for Anytime Neural Networks. ICML, 2020.

  28. (A) How to estimate accuracy under a deadline? Can inference finish before the deadline? If yes, use the training accuracy of the selected DNN. If not: a traditional DNN falls back to random-guess accuracy, while an Anytime DNN keeps the accuracy of its last completed output.

  29. (A) How to estimate accuracy under a deadline? Combine the accuracy-latency relation of the DNN with the latency distribution (from ξ) to compute the expectation of accuracy.
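
This combination can be sketched for an anytime DNN: each intermediate output contributes its accuracy gain weighted by the probability that it completes before the deadline under the ξ distribution. The Gaussian latency model and the accuracy-latency points below are illustrative assumptions, not values from the paper.

```python
import math

def latency_cdf(t, mean, std):
    """P(latency <= t) under a Gaussian latency model."""
    return 0.5 * (1.0 + math.erf((t - mean) / (std * math.sqrt(2.0))))

def expected_accuracy(outputs, deadline, xi_mean, xi_std):
    """outputs: list of (profiled_time, accuracy) for an anytime DNN's
    successive outputs, sorted by time. Each output's actual completion
    time is its profiled time scaled by xi, so stage i finishes by the
    deadline with probability P(xi * t_i <= deadline)."""
    exp_acc, prev_acc = 0.0, 0.0  # accuracy before any output ~ random guess
    for t, acc in outputs:
        p_done = latency_cdf(deadline, xi_mean * t, xi_std * t)
        exp_acc += (acc - prev_acc) * p_done  # gain of this stage, if it lands
        prev_acc = acc
    return exp_acc

# Two-stage anytime DNN (invented numbers): 60% accuracy at 0.05s, 90% at 0.1s.
loose = expected_accuracy([(0.05, 0.6), (0.1, 0.9)], deadline=0.2, xi_mean=1.5, xi_std=0.1)
tight = expected_accuracy([(0.05, 0.6), (0.1, 0.9)], deadline=0.1, xi_mean=1.5, xi_std=0.1)
```

With the loose deadline both stages almost surely finish, so the expectation approaches 0.9; with the tight deadline only the first stage fits, and it drops toward 0.6.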

  30. (E) How to manage energy? Use the power cap as a knob to configure system resources. Note the idle power: other processes may still consume energy after DNN inference has finished, while the system waits for the next input before the latency target.

  31. (E) How to estimate the energy consumption? Estimate energy from power: DNN active power is the power-cap setting (active energy = power setting × inference time), and DNN idle power is estimated by a Kalman filter (idle energy = estimated idle power × time until the next input).
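
The two-term energy estimate on this slide can be written out directly. A minimal sketch, with illustrative wattages and times; a real system would feed the idle-power term from the Kalman filter rather than a constant.

```python
def estimate_energy(power_cap, inference_time, idle_power, latency_target):
    """Energy for one input period: active phase at the power-cap setting,
    then idle phase (at the estimated idle power) until the next input."""
    active = power_cap * inference_time
    idle = idle_power * max(0.0, latency_target - inference_time)
    return active + idle

# e.g. a 15 W cap, 0.1 s inference, 3 W estimated idle power, 0.2 s period:
# 15 * 0.1 + 3 * 0.1 = 1.8 J per input (all numbers invented).
energy = estimate_energy(15.0, 0.1, 3.0, 0.2)
```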

  32. Our ALERT System. [Diagram recap: for each input, ALERT selects a DNN and a power cap, driven by feedback-based estimation from runtime measurement.]

  33. Outline: Understanding DNN Deployment Challenges; ALERT Run-time Inference Management; Experiments and Results

  34. Experiment Settings. Platforms (3): CPUs, GPU. Tasks (2): minimize energy, maximize accuracy. DNNs (2): Sparse ResNet50, RNN. Scenarios (5): default, compute-intensive (2), memory-intensive (2).

  35. Schemes. Oracles: Oracle changes the configuration for every input, assumes perfect knowledge of the future, and is emulated from profiling results; Oracle-static uses the same configuration for all inputs. Baselines: Sys-only adjusts only the power cap; App-only uses only an Anytime DNN; No-coord uses an Anytime DNN without coordinating with the power cap.

  36. Evaluation: Scheduler Performance. [Chart: average performance and constraint violations (%) for the minimize-energy task, normalized to Oracle-static (smaller is better), comparing App-only, Sys-only, No-coord, Sys+App (ALERT), and Oracle.]

  37. Evaluation: Scheduler Performance. [Chart: average performance and constraint violations (%) for the minimize-error task, normalized to Oracle-static (smaller is better), comparing App-only, Sys-only, No-coord, Sys+App (ALERT), and Oracle.]

  38. How ALERT Works with a Traditional DNN: it meets requirements in most cases, quickly detects contention changes, and switches to an Anytime DNN under an unstable environment.

  39. How ALERT Works with a Traditional DNN (continued): [timeline highlighting how quickly ALERT detects contention changes].

  40. How ALERT Works with Anytime + Traditional DNNs: it meets requirements in most cases, quickly detects contention changes, and uses the Anytime DNN under an unstable environment.

  41. Conclusion. We characterize the challenges of DNN inference deployment and present ALERT, a run-time inference management system that delivers high performance and energy efficiency.
