Mixed Precision Neural Architecture Search for Energy Efficient Deep Learning (PowerPoint PPT Presentation)



SLIDE 1

Chengyue Gong*1, Zixuan Jiang*2, Dilin Wang1, Yibo Lin2, Qiang Liu1, and David Z. Pan2

1CS Department, 2ECE Department

The University of Texas at Austin

Mixed Precision Neural Architecture Search for Energy Efficient Deep Learning


∗ indicates equal contributions

SLIDE 2

Contents

• Introduction
• Our algorithm
• Experimental results
• Conclusion

SLIDE 3

Success of Machine Learning

[Figure: examples of machine learning successes, including Chinese-to-English translation and image classification ("cat")]

SLIDE 4

Energy Efficient Computation

• Energy consumption, latency, security, etc. are critical metrics for edge inference.
• There is a tradeoff between the accuracy and the complexity of models.
• Efficient computation
  › Neural architecture
  › Quantization

[Diagram: cloud training (higher accuracy) vs. edge inference (less energy)]

SLIDE 5

Neural Architecture Design

• The mechanism of neural networks is not well understood.
• Designing neural architectures is challenging.
• Can we advance AI/ML using artificial intelligence instead of human intelligence?

[Figure: ImageNet results: layers, speed (ms), top-1 and top-5 error rates for AlexNet, Inception-V1, VGG-16, VGG-19, and several ResNet variants]

SLIDE 6

Neural Architecture Search

[Diagram: the controller π_θ samples networks α from the search space; the environment, treated as a black box, performs training and evaluation plus hardware simulation; the result is used to update the controller]

SLIDE 7

Neural Architecture Search

• Black box optimization
  › Find the optimal network configuration that maximizes performance
  › Huge search space
• Available methods
  › Reinforcement learning
  › Evolutionary algorithms
  › Differentiable architecture search

H. Liu, K. Simonyan, and Y. Yang, "DARTS: Differentiable architecture search," ICLR 2019.

[Diagram: search space → sample following the policy → black box → update the policy]

SLIDE 8

Quantization

• Weights and activations can be quantized due to the inherent redundancy in their representations.
• Mixed precision for different layers
  › HAQ

K. Wang, Z. Liu, Y. Lin, J. Lin, and S. Han, "HAQ: Hardware-aware automated quantization with mixed precision," CVPR 2019.
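As a concrete illustration of the redundancy argument above, here is a minimal uniform fixed-point quantizer. This is a generic sketch, not the HAQ implementation; the function name and the symmetric-range choice are ours.

```python
import numpy as np

def quantize(x, bits):
    """Uniformly quantize x to signed `bits`-bit fixed point (symmetric range)."""
    levels = 2 ** (bits - 1) - 1                    # e.g. 127 levels for 8 bits
    scale = max(np.max(np.abs(x)), 1e-12) / levels  # map the max magnitude to the top level
    q = np.clip(np.round(x / scale), -levels, levels)
    return q * scale                                # dequantized approximation of x

w = np.array([0.9, -0.45, 0.12, -0.01])
w8 = quantize(w, 8)   # 8-bit: error bounded by scale/2, nearly lossless here
w2 = quantize(w, 2)   # 2-bit: only the levels {-0.9, 0.0, 0.9} are representable
```

Lower bitwidths shrink model size and energy per operation at the cost of accuracy, which is exactly the tradeoff the layer-wise bitwidth search exploits.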

[Figure: MobileNet-V1 on ImageNet: top-1/top-5 accuracy vs. energy (mJ) for fix8, fix6, fix4, and HAQ mixed precision]

SLIDE 9

Our Work

[Diagram: our work lies at the intersection of neural architecture search, mixed precision quantization, and energy efficient computation]

SLIDE 10

Contents

• Introduction
• Our algorithm
• Experimental results
• Conclusion

SLIDE 11

Search Space Basis: MobileNetV2 Block (MB)

Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L. C., "MobileNetV2: Inverted residuals and linear bottlenecks," CVPR 2018.

Per-block choices:
• expand ratio e ∈ {1, 3, 6}
• kernel size k ∈ {3, 5, 7}
• network connectivity c ∈ {0, 1}
• layer-wise bitwidths b_w, b_a ∈ {2, 4, 6, 8}

SLIDE 12

Search Space

Input Shape     | Block Type               | Bitwidth | #Channels | Stride | #Blocks
224 × 224 × 3   | Conv 3 × 3               | 8        | 32        | 2      | 1
112 × 112 × 32  | block(e, k, c, b_w, b_a) | searched | 16        | 2      | 1
56 × 56 × 16    | block(e, k, c, b_w, b_a) | searched | 24        | 1      | 2
56 × 56 × 24    | block(e, k, c, b_w, b_a) | searched | 32        | 2      | 4
28 × 28 × 32    | block(e, k, c, b_w, b_a) | searched | 64        | 2      | 4
14 × 14 × 64    | block(e, k, c, b_w, b_a) | searched | 128       | 1      | 4
14 × 14 × 128   | block(e, k, c, b_w, b_a) | searched | 160       | 2      | 5
7 × 7 × 160     | block(e, k, c, b_w, b_a) | searched | 256       | 1      | 2
7 × 7 × 256     | Conv 1 × 1               | 8        | 1280      | 1      | 1
7 × 7 × 1280    | Pooling and FC           | 8        | –         | 1      | 1

Neural architecture: e, k, c; Quantization: b_w, b_a

Number of settings: 288^22 ≈ 1.28 × 10^54
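Under one reading of the table above (22 searchable blocks, each choosing an expand ratio from 3 options, a kernel size from 3, connectivity from 2, and two bitwidths from 4 options each), the size of the joint search space can be tallied directly:

```python
# Per-block choices: e in {1,3,6}, k in {3,5,7}, c in {0,1},
# b_w and b_a each in {2,4,6,8}.
per_block = 3 * 3 * 2 * 4 * 4            # 288 configurations per block
num_blocks = 1 + 2 + 4 + 4 + 4 + 5 + 2   # searchable blocks in the table
search_space = per_block ** num_blocks   # 288**22, on the order of 1e54
print(f"{search_space:.2e}")
```

A space this size rules out exhaustive evaluation, which is why the controller-based search described next is needed.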

SLIDE 13

Our Framework

[Diagram: the controller π_θ samples architectures α; the environment trains the network (loss) and deploys it to measure energy; evaluation combines both into a weighted reward that updates the controller]

SLIDE 14

Problem Formulation

• Discover neural architectures that minimize the task-related loss while satisfying the energy constraint.

min_θ E_{α∼π_θ} [ L(w*(α); D_validate) ]        (expectation of loss)
s.t.  w*(α) = argmin_w L(w(α); D_train)         (training of NN)
      E_{α∼π_θ} [ E(α) ] ≤ C                    (energy constraint)

where π_θ is the policy with parameter θ, and α represents a neural network with weights w.

SLIDE 15

Problem Formulation

• Discover neural architectures that minimize the task-related loss while satisfying the energy constraint.

min_θ E_{α∼π_θ} [ L(w*(α); D_validate) ]
s.t.  w*(α) = argmin_w L(w(α); D_train)
      E_{α∼π_θ} [ E(α) ] ≤ C

• Relaxation

min_θ E_{α∼π_θ} [ L(w*(α); D_validate) ] + λ · max( E_{α∼π_θ} [ E(α) ] − C, 0 )
s.t.  w*(α) = argmin_w L(w(α); D_train)
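The effect of the relaxation fits in a few lines: the hard energy constraint becomes a hinge penalty, so over-budget architectures pay a cost proportional to λ while under-budget ones are untouched. The function and variable names below are ours, for illustration only.

```python
def relaxed_objective(expected_loss, expected_energy, budget, lam):
    """Task loss plus the hinge penalty lambda * max(E[energy] - C, 0)."""
    return expected_loss + lam * max(expected_energy - budget, 0.0)

# Under the budget C = 10 mJ: the penalty term vanishes.
under = relaxed_objective(2.0, 8.0, budget=10.0, lam=0.1)   # 2.0
# Over the budget: the objective grows linearly with the excess energy.
over = relaxed_objective(2.0, 16.0, budget=10.0, lam=0.1)   # 2.0 + 0.1 * 6 = 2.6
```

Larger λ values push the search toward lower-energy architectures, which matches the Ours-small vs. Ours-base distinction in the results.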

SLIDE 16

Hardware Environment

[Diagram: framework loop with the hardware side highlighted: controller π_θ → training/deployment → energy and loss → evaluation → weighted reward → update controller]

SLIDE 17

REINFORCE Algorithm

• Policy gradient theorem

For any differentiable policy π_θ and any policy objective function J, the policy gradient is
∇_θ J(θ) = E_{π_θ} [ ∇_θ log π_θ(α) · R(α) ]

• Non-differentiable energy measures

∇_θ E_{α∼π_θ} [ R(α) ] = E_{α∼π_θ} [ R(α) ∇_θ log π_θ(α) ] ≈ (1/N) Σ_{i=1}^{N} R(α_i) ∇_θ log π_θ(α_i)
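The sample estimate above is easy to reproduce in a toy setting. The sketch below is our own construction, not the paper's controller: a 3-way softmax policy over candidate architectures, with the gradient estimated as the average of R(α_i) ∇_θ log π_θ(α_i) over N samples.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(3)                 # logits of a softmax policy pi_theta

def pi(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_pi(theta, a):
    # d/dtheta of log softmax(theta)[a] = onehot(a) - pi(theta)
    return np.eye(3)[a] - pi(theta)

R = np.array([0.1, 1.0, 0.5])       # reward of each candidate architecture
N = 20000
samples = rng.choice(3, size=N, p=pi(theta))
grad_est = np.mean([R[a] * grad_log_pi(theta, a) for a in samples], axis=0)
# A gradient ascent step along grad_est shifts probability toward
# the high-reward candidate (index 1).
```

Because only sampled rewards are needed, the energy measured by the hardware simulator can enter the reward without being differentiable.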


SLIDE 18

Software Environment

[Diagram: framework loop with the software side highlighted: controller π_θ → training α → loss → evaluation → weighted reward → update controller]

SLIDE 19

Non-Differentiability

• Relax the discrete mask variable p to a continuous random variable computed by the Gumbel-Softmax function

p_i = exp( (g_i + log α_i) / τ ) / Σ_j exp( (g_j + log α_j) / τ )

where α_i is the logit, g_i ∼ Gumbel(0, 1), and τ is the temperature.

• One-hot [0, 1, 0] → continuous [.3, .5, .2]

E. Jang, S. Gu, and B. Poole, "Categorical reparameterization with Gumbel-Softmax," ICLR 2017.
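A minimal sketch of the Gumbel-Softmax relaxation, with assumed toy values (the α_i are taken here as unnormalized positive logits). Reusing the same Gumbel noise at two temperatures shows how the sample sharpens toward one-hot as τ shrinks:

```python
import numpy as np

def gumbel_softmax(alpha, tau, g):
    """p_i = exp((g_i + log alpha_i)/tau) / sum_j exp((g_j + log alpha_j)/tau)."""
    y = (g + np.log(alpha)) / tau
    y = np.exp(y - y.max())                    # numerically stable softmax
    return y / y.sum()

rng = np.random.default_rng(0)
alpha = np.array([0.2, 0.5, 0.3])              # unnormalized logits
g = -np.log(-np.log(rng.uniform(size=3)))      # Gumbel(0, 1) noise
soft = gumbel_softmax(alpha, tau=1.0, g=g)     # continuous, sums to 1
sharp = gumbel_softmax(alpha, tau=0.1, g=g)    # same argmax, much closer to one-hot
```

The continuous sample is differentiable in the logits, which is what lets gradients flow through the discrete architecture choices.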


SLIDE 20

Bilevel Optimization

• Whenever the policy parameter θ changes, the network weights w(α) need to be retrained.
• Motivated by differentiable architecture search (DARTS), we propose the following algorithm:

1. Sample a minibatch of network configurations α from the controller.
2. Update the network models w(α) by minimizing the training loss.
3. Update the controller parameters θ.

H. Liu, K. Simonyan, and Y. Yang, "DARTS: Differentiable architecture search," ICLR 2019.
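The alternating loop above can be caricatured with scalar stand-ins. This is entirely our toy construction (quadratic losses, a deterministic "controller", a one-step-unrolled outer gradient), meant only to show the alternating structure, not the paper's actual updates.

```python
# Toy bilevel loop: the inner step fits weights w to the sampled config alpha
# (= theta here); the outer step updates theta through the one-step-unrolled
# validation loss, in the spirit of DARTS-style alternating optimization.
theta, w = 2.0, 5.0
lr_w, lr_theta = 0.3, 0.1

for _ in range(300):
    alpha = theta                     # "sample" a configuration
    w -= lr_w * 2.0 * (w - alpha)     # inner: gradient of (w - alpha)^2 in w
    dw_dtheta = 2.0 * lr_w            # sensitivity from the unrolled inner step
    theta -= lr_theta * (2.0 * w) * dw_dtheta   # outer: gradient of val loss w^2

# Both the weights and the controller settle near the joint optimum (0, 0).
```

Interleaving the two updates avoids retraining the network to convergence for every controller change, which is what makes the joint search tractable.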

SLIDE 21

Contents

• Introduction
• Our algorithm
• Experimental results
• Conclusion

SLIDE 22

Experimental Settings

• Hardware simulator of Bit Fusion [1, 2]
• First, search architectures and mixed precision for each layer on a proxy task, Tiny ImageNet
  › Trained for a fixed 60 epochs
  › 5 days on 1 NVIDIA Tesla P100
• Next, train the discovered architectures on CIFAR-100 and ImageNet from scratch.

[1] H. Sharma, J. Park, N. Suda, L. Lai, B. Chau, V. Chandra, and H. Esmaeilzadeh, "Bit Fusion: Bit-level dynamically composable architecture for accelerating deep neural network," in Proc. ISCA, June 2018, pp. 764–775.
[2] https://github.com/hsharma35/bitfusion

SLIDE 23

Searched Results

[Figure: searched architectures Ours-small (λ = 0.1) and Ours-base (λ = 0.01)]

SLIDE 24

Results on ImageNet

Model      | Top-5 Error | Model Size (MB) | Energy (mJ) | Latency (ms)
HAQ-small  | 12.7        | 1.7             | 12.9        | 32.1
Ours-small | 11.6        | 1.44            | 8.91        | 21.2
HAQ-base   | 10.1        | 2.12            | 16.3        | 40.2
Ours-base  | 9.94        | 2.06            | 10.9        | 24.7

SLIDE 25

Joint NAS and Mixed Precision Quantization

[Figure: error (%) vs. energy (mJ) for Ours-small, Ours-base, NAS + quantization (small), and NAS + quantization (base); marker size indicates model size]

SLIDE 26

Adaptive Mixed Precision Quantization

• Pareto front for error rate, latency, and energy

SLIDE 27

Contents

• Introduction
• Our algorithm
• Experimental results
• Conclusion

SLIDE 28

Conclusion

• We propose a new methodology that jointly optimizes NAS and mixed precision quantization in an extended search space.
• Hardware performance is incorporated into the objective function.
• Our methodology facilitates an end-to-end design automation flow for neural network design and deployment, especially for edge inference.

SLIDE 29

Thank you!

SLIDE 30

Backup: Framework

• Pipelines of hardware-centric design automation for efficient neural networks
• Limited research considers each stage of the pipeline collaboratively
• Our proposed framework: Mixed Precision NAS

[Diagram: Mixed Precision NAS spanning model quantization, model pruning, and neural architecture search]

SLIDE 31

Backup: Results on ImageNet

Model       | Precision | Top-1 Error | Top-5 Error | Model Size (MB) | Energy (mJ) | Latency (ms)
VGG-16      | FXP8      | 29.1        | 7.4         | 138             | 753         | 838
ResNet-50   | FXP8      | 24.7        | 5.3         | 25.5            | 557         | 591
MobileNetV2 | FXP8      | 28.19       | 9.75        | 3.4             | 29          | 73.9
FBNet-B     | FXP8      | 26.84       | 8.97        | 4.5             | 34.7        | 83.9
FBNet-B     | FXP3      | 36.29       | 15.4        | 1.68            | 13.5        | 27.9
HAQ-small   | Mixed     | 33.01       | 12.7        | 1.7             | 12.9        | 32.1
Ours-small  | Mixed     | 31.62       | 11.6        | 1.44            | 8.91        | 21.2
HAQ-base    | Mixed     | 29.1        | 10.1       | 2.12            | 16.3        | 40.2
Ours-base   | Mixed     | 28.23       | 9.94        | 2.06            | 10.9        | 24.7