NeuOS: A Latency-Predictable Multi-Dimensional Optimization Framework for DNN-driven Autonomous Systems


slide-1
SLIDE 1

NeuOS: A Latency-Predictable Multi-Dimensional Optimization Framework for DNN-driven Autonomous Systems

Soroush Bateni

The University of Texas at Dallas

Cong Liu

The University of Texas at Dallas

slide-2
SLIDE 2

Deep Neural Networks (DNNs) Autonomous Embedded Systems

The tale of two worlds

Background

2

[Figure: DNN pipeline (Σ layers, FC layers, SoftMax) producing an autonomous decision]

slide-3
SLIDE 3

Deep Neural Networks (DNNs) Autonomous Embedded Systems

The tale of two worlds

Background

3

[Figure: DNN pipeline (Σ layers, FC layers, SoftMax) producing an autonomous decision]

Main Objective

  • Maximum Accuracy

Main Objectives

  • Timing predictability
  • Energy efficiency
  • Safety
slide-4
SLIDE 4

Deep Neural Networks (DNNs) Autonomous Embedded Systems

Marriage between the two worlds

Background

4

[Figure: DNN pipeline (Σ layers, FC layers, SoftMax) producing an autonomous decision]

slide-5
SLIDE 5

The big picture

Hardware/software stack for executing DNNs in Autonomous Embedded Systems

Background

5

DNN

Framework/OS

slide-6
SLIDE 6

The big picture

Hardware/software stack for executing DNNs in Autonomous Embedded Systems

Background

6

DNN

The focus of related research in AES is currently mostly on the DNN and the hardware.

Framework/OS

slide-7
SLIDE 7

The big picture

Hardware/software stack for executing DNNs in Autonomous Embedded Systems

Background

7

DNN

Efficient DNNs

  • Quantization
  • Low-rank approximation

Framework/OS

slide-8
SLIDE 8

The big picture

Hardware/software stack for executing DNNs in Autonomous Embedded Systems

Background

8

DNN

Special Processors

  • AI accelerators
  • DNN-focused SoCs

Framework/OS

slide-9
SLIDE 9

Where system software/frameworks can help

Goals

9

DNN

Challenges

  • Meet timing requirements
  • Be energy efficient
  • Minimize accuracy loss.

All the above goals must be achieved at the same time.

Framework/OS

slide-10
SLIDE 10

Timing predictable & energy efficient

Can be achieved at system level via Dynamic Voltage Frequency Scaling (DVFS).

Timing predictable & accurate

Can be achieved at application level via DNN configuration change.

Master of none

Combining the two (even at different rates) will yield unpredictable results.

Jack of all trades, master of none

Motivation

10

slide-12
SLIDE 12

Jack of all trades, master of none

Motivation

12

Need per-layer adjustments. Need per-layer adjustments. Need coordination.

slide-13
SLIDE 13

No one is alone

Multiple ResNet-50 instances executed together

The underlying system-level solution here is PredJoule1

Takeaways

The first DNN instance wins; the other instances are not as lucky, because the method used here is greedy: the chosen DVFS configurations only work well for the first DNN instance.

Motivation

1Bateni, Soroush, Husheng Zhou, Yuankun Zhu, and Cong Liu. "PredJoule: A timing-predictable energy optimization framework for deep neural networks." In 2018 IEEE Real-Time Systems Symposium (RTSS).

13

slide-14
SLIDE 14

No one is alone

Motivation

14

Need cross-DNN coordination.

slide-15
SLIDE 15

Core Targets

  • Timing predictable: the system must meet deadlines set by the system designer for the DNN.
  • Energy efficient: the system must use DVFS to achieve near-optimal energy usage for DNNs.
  • Accurate: the system can change accuracy dynamically but must do so cautiously.
  • Multi-DNN compatibility: the system should be able to coordinate and find an efficient solution for all DNN instances.

Optimization Targets

The system must also be flexible enough to adapt to different system constraints. We offer three optimization targets (switchable by an external policy controller):

  • Min Energy (Mp) is used when our design is deployed in extremely low-power scenarios such as remote sensing.
  • Max Accuracy (MA) is used when our design is deployed in extremely mission-critical scenarios.
  • Balanced Energy and Accuracy is the scenario where our design can choose what is best given the timing requirement.

Design Goals

Design

15

slide-16
SLIDE 16

LAG analysis

  • Keep track of per-layer progress

Proportional Deadline

  • Build an ideal schedule by setting per-layer sub-deadlines in proportion to each layer's execution time

Timing predictability

Design

16

[Equation: accumulative LAG computed from per-layer sub-deadlines and tracked per-layer execution times, relative to the end-to-end deadline for the DNN instance]
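The proportional sub-deadline and LAG bookkeeping described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the layer times and deadline are made-up numbers:

```python
def proportional_subdeadlines(layer_times, deadline):
    """Split an end-to-end deadline into per-layer sub-deadlines
    in proportion to each layer's profiled execution time."""
    total = sum(layer_times)
    return [deadline * t / total for t in layer_times]

def lag(subdeadlines, tracked_times, upto):
    """Accumulative LAG after layer `upto`: ideal progress (sum of
    sub-deadlines) minus actual tracked execution time. Positive LAG
    means the instance is ahead of schedule; negative means behind."""
    return sum(subdeadlines[:upto]) - sum(tracked_times[:upto])

# Example: a 4-layer DNN with a 100 ms end-to-end deadline.
subs = proportional_subdeadlines([10.0, 30.0, 40.0, 20.0], 100.0)
print(lag(subs, [12.0, 33.0], 2))  # behind by 5 ms -> -5.0
```

With this bookkeeping, a negative LAG after any layer signals that the remaining layers need a speedup (e.g., a higher DVFS state) to still meet the end-to-end deadline.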

slide-17
SLIDE 17

Building a cohort

We keep a pair of local variables for each DNN instance.

∆ Calculator

1. Based on the last reported values of LAG in the cohort, calculate a speedup (or slowdown).
2. Look up1 the best possible DVFS configuration for that slowdown.
3. The output is a list (∆) of optimal DVFS configurations for each DNN instance.

Xi Calculator

1. For each element of ∆, calculate the required (further) speedup (or slowdown) for the other DNN instances.
2. This time, look up1 the best possible approximation configuration that matches that slowdown.

1Please see the paper and the source code for more information.

Coordination

Design

17
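A highly simplified sketch of the ∆ Calculator's first two steps. The table contents, configuration names, and lookup granularity are illustrative assumptions; the real tables are profiled per device (see the paper and source code):

```python
# Hypothetical SpeedUp table: maps an achievable speedup factor to a
# DVFS configuration (cpu state, gpu state). These entries are made up.
DVFS_TABLE = {
    0.8: ("cpu_low", "gpu_low"),
    1.0: ("cpu_mid", "gpu_mid"),
    1.5: ("cpu_high", "gpu_high"),
}

def required_speedup(lag_ms, remaining_budget_ms):
    """Speedup (>1) or slowdown (<1) needed so the remaining layers
    fit the remaining budget, given the accumulated LAG."""
    return remaining_budget_ms / (remaining_budget_ms + lag_ms)

def delta_calculator(lags, budgets):
    """For each DNN instance in the cohort, pick the smallest tabled
    DVFS configuration whose speedup covers the required one."""
    deltas = []
    for lag_ms, budget in zip(lags, budgets):
        need = required_speedup(lag_ms, budget)
        candidates = [s for s in sorted(DVFS_TABLE) if s >= need]
        chosen = candidates[0] if candidates else max(DVFS_TABLE)
        deltas.append(DVFS_TABLE[chosen])
    return deltas

# One instance behind by 5 ms, one ahead by 10 ms, 50 ms budget each.
print(delta_calculator([-5.0, 10.0], [50.0, 50.0]))
# -> [('cpu_high', 'gpu_high'), ('cpu_mid', 'gpu_mid')]
```

The instance that is behind gets a faster DVFS state; the instance that is ahead can afford a slower (more energy-efficient) one, which is the cohort-wide coordination the greedy per-instance approach lacks.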

slide-18
SLIDE 18

The decision tree: overview of modes

Optimization

Design

18

[Diagram: DNN instances 1…n in the cohort report δ1…δn to the ∆ Calculator, which outputs ∆ = (TB1, TB2, …, TBo); each entry feeds a Xi Calculator.]

slide-19
SLIDE 19

The decision tree: overview of modes

Choosing a δ (DVFS configuration) will have consequences in terms of accuracy for all DNNs in the cohort. Therefore, the question is, which δ is the best?

Min. Energy (Mp) chooses the δ that has the least PowerUp value in the PowerUp/SpeedUp table, without looking at accuracy loss.

  • Max. Accuracy (MA) chooses the δ so as to minimize the value of Σ∀εj TBj.
  • Balanced Energy and Accuracy uses Bivariate Regression Analysis (BRA) to achieve a balanced approach backed by statistical analysis of the tree1.

1Please see the paper for more information.

Optimization

Design

19

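The three modes can be sketched as different scoring functions over candidate δ entries. The fields, values, and the weighted score standing in for BRA are illustrative assumptions, not the paper's tables or analysis:

```python
# Hypothetical candidate δ entries with their PowerUp value and the
# total accuracy loss they would impose on the cohort (made-up numbers).
CANDIDATES = [
    {"delta": "d1", "power_up": 1.1, "accuracy_loss": 0.05},
    {"delta": "d2", "power_up": 1.4, "accuracy_loss": 0.01},
    {"delta": "d3", "power_up": 1.2, "accuracy_loss": 0.03},
]

def choose(mode, candidates):
    if mode == "min_energy":    # Mp: least PowerUp, ignore accuracy loss
        return min(candidates, key=lambda c: c["power_up"])
    if mode == "max_accuracy":  # MA: least total accuracy loss
        return min(candidates, key=lambda c: c["accuracy_loss"])
    # Balanced: a toy weighted score standing in for the paper's BRA.
    return min(candidates, key=lambda c: c["power_up"] + 10 * c["accuracy_loss"])

print(choose("min_energy", CANDIDATES)["delta"])    # d1
print(choose("max_accuracy", CANDIDATES)["delta"])  # d2
```

Swapping the scoring function while keeping the same candidate set is what lets an external policy controller switch modes at runtime without touching the rest of the pipeline.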

slide-20
SLIDE 20

Based on Caffe

  • Available as an open-source project on GitHub

  • No need to use APIs
  • No need to redesign DNN models
  • Need to generate hash tables and a low-rank approximated version of your DNN model

Tested extensively

  • Tested on NVIDIA Jetson TX2 and Jetson AGX Xavier

  • Tested using image recognition DNNs
  • AlexNet, GoogleNet, ResNet-50, VGGNet
  • Tested using three cohort sizes
  • Small: 1 DNN instance
  • Medium: 2-4 DNN instances
  • Large: 6-8 DNN instances
  • We include a mixed scenario that uses a combination of all the DNN models

Overview

Implementation and Evaluation

20

slide-21
SLIDE 21

Energy

Evaluation

21

  • 68% avg. improvement on TX2
  • 46% avg. improvement on AGX Xavier
  • 70% avg. improvement on TX2

slide-22
SLIDE 22

Energy

Evaluation

22

slide-23
SLIDE 23

Latency

Evaluation

23

slide-24
SLIDE 24

Latency

Evaluation

24

  • 68% avg. improvement on TX2
  • 40% avg. improvement on AGX Xavier
  • 53% avg. improvement on TX2
  • 32% avg. improvement on AGX Xavier

slide-25
SLIDE 25

Small cohort

3.25% deadline miss rate.

Medium cohort

Deadline miss rate same as the small cohort.

Large cohort

Deadline miss rate same as the small cohort.

Tail Latency

Evaluation

25

slide-26
SLIDE 26

Relative Accuracy

Evaluation

26

slide-27
SLIDE 27

Flexibility

Evaluation

27

slide-28
SLIDE 28

Flexibility

Evaluation

28

11,759 DVFS configurations on Jetson TX2.

51,967 DVFS configurations on Jetson AGX Xavier.
slide-29
SLIDE 29

Computation

Relatively negligible execution overhead (in ms).

Memory

Overhead includes the low-rank version of each DNN model. The right side shows how much of the total memory of each device is occupied.

Overhead

Evaluation

29

slide-30
SLIDE 30

The system community to the rescue

Conclusion

30

  • Certain problems cannot be solved at the application level (by AI researchers) and at the hardware level separately

  • Ensuring timing predictability, energy efficiency, and accuracy for DNNs in Autonomous Embedded Systems requires coordination
  • We presented the design of NeuOS that can achieve these three goals by
  • Using LAG analysis to ensure real-time performance
  • Efficiently propagating all possible choices
  • Having flexibility in terms of choosing the best combination of configurations based on the system designer's criteria or an external policy controller

  • We extensively evaluated NeuOS
  • Using the latest AES devices
  • Using prominent image recognition DNNs
  • Under multiple configurations, including various cohort sizes
  • Against the most prominent accessible solutions available to researchers.
slide-31
SLIDE 31

Questions

Please do not hesitate to send your questions to soroush@utdallas.edu.

Source Code

https://github.com/Soroosh129/NeuOS

Thank you