Mixed Precision Neural Architecture Search for Energy Efficient Deep Learning (PowerPoint PPT Presentation)



SLIDE 1

Chengyue Gong*1, Zixuan Jiang*2, Dilin Wang1, Yibo Lin2, Qiang Liu1, and David Z. Pan2

1CS Department, 2ECE Department

The University of Texas at Austin

Mixed Precision Neural Architecture Search for Energy Efficient Deep Learning


∗ indicates equal contributions

SLIDE 2

Contents

• Introduction
• Our algorithm
• Experimental results
• Conclusion

SLIDE 3

Success of Machine Learning

[Figure: examples of machine learning successes, including Chinese-to-English translation and image classification ("cat")]

SLIDE 4

Energy Efficient Computation

• Energy consumption, latency, security, etc. are critical metrics for edge inference.
• There is a tradeoff between the accuracy and the complexity of models.
• Efficient computation
  › Neural architecture
  › Quantization

[Diagram: cloud training (higher accuracy) vs. edge inference (less energy)]

SLIDE 5

Neural Architecture Design

• The mechanism of neural networks is not well understood.
• Designing neural architectures is challenging.
• Can we advance AI/ML using artificial intelligence instead of human intelligence?

[Figure: ImageNet results: layers, speed (ms), top-1 and top-5 error rates for AlexNet, Inception-V1, VGG-16, VGG-19, and several ResNet variants]

SLIDE 6

Neural Architecture Search

[Diagram: the controller π_θ samples networks α from the search space; the environment, treated as a black box, performs training and evaluation plus hardware simulation; the result is used to update the controller]

SLIDE 7

Neural Architecture Search

• Black box optimization
  › Find the optimal network configuration that maximizes performance
  › Huge search space
• Available methods
  › Reinforcement learning
  › Evolutionary algorithms
  › Differentiable architecture search

H. Liu, K. Simonyan, and Y. Yang, "DARTS: Differentiable architecture search," ICLR 2019.

[Diagram: search space → sample following the policy → black box → update the policy]

SLIDE 8

Quantization

• Weights and activations can be quantized due to the inherent redundancy in their representations.
• Mixed precision for different layers
  › HAQ

K. Wang, Z. Liu, Y. Lin, J. Lin, and S. Han, "HAQ: Hardware-aware automated quantization with mixed precision," CVPR 2019.
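As a concrete illustration of the redundancy argument above, here is a minimal uniform fixed-point quantizer. This is a generic sketch, not the HAQ implementation; the function name and the symmetric-range choice are ours.

```python
import numpy as np

def quantize(x, bits):
    """Uniformly quantize x to signed `bits`-bit fixed point (symmetric range)."""
    levels = 2 ** (bits - 1) - 1                    # e.g. 127 levels for 8 bits
    scale = max(np.max(np.abs(x)), 1e-12) / levels  # map the max magnitude to the top level
    q = np.clip(np.round(x / scale), -levels, levels)
    return q * scale                                # dequantized approximation of x

w = np.array([0.9, -0.45, 0.12, -0.01])
w8 = quantize(w, 8)   # 8-bit: error bounded by scale/2, nearly lossless here
w2 = quantize(w, 2)   # 2-bit: only the levels {-0.9, 0.0, 0.9} are representable
```

Lower bitwidths shrink model size and energy per operation at the cost of accuracy, which is exactly the tradeoff the layer-wise bitwidth search exploits.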

[Figure: MobileNet-V1 on ImageNet: top-1/top-5 accuracy vs. energy (mJ) for fix8, fix6, fix4, and HAQ mixed precision]

SLIDE 9

Our Work

[Diagram: our work lies at the intersection of neural architecture search, mixed precision quantization, and energy efficient computation]

SLIDE 10

Contents

• Introduction
• Our algorithm
• Experimental results
• Conclusion

SLIDE 11

Search Space Basis: MobileNetV2 Block (MB)

Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L. C., "MobileNetV2: Inverted residuals and linear bottlenecks," CVPR 2018.

Per-block choices:
• expand ratio e ∈ {1, 3, 6}
• kernel size k ∈ {3, 5, 7}
• network connectivity c ∈ {0, 1}
• layer-wise bitwidths b_w, b_a ∈ {2, 4, 6, 8}

SLIDE 12

Search Space

Input Shape     | Block Type               | Bitwidth | #Channels | Stride | #Blocks
224 × 224 × 3   | Conv 3 × 3               | 8        | 32        | 2      | 1
112 × 112 × 32  | block(e, k, c, b_w, b_a) | searched | 16        | 2      | 1
56 × 56 × 16    | block(e, k, c, b_w, b_a) | searched | 24        | 1      | 2
56 × 56 × 24    | block(e, k, c, b_w, b_a) | searched | 32        | 2      | 4
28 × 28 × 32    | block(e, k, c, b_w, b_a) | searched | 64        | 2      | 4
14 × 14 × 64    | block(e, k, c, b_w, b_a) | searched | 128       | 1      | 4
14 × 14 × 128   | block(e, k, c, b_w, b_a) | searched | 160       | 2      | 5
7 × 7 × 160     | block(e, k, c, b_w, b_a) | searched | 256       | 1      | 2
7 × 7 × 256     | Conv 1 × 1               | 8        | 1280      | 1      | 1
7 × 7 × 1280    | Pooling and FC           | 8        | –         | 1      | 1

Neural architecture: e, k, c; Quantization: b_w, b_a

Number of settings: 288^22 ≈ 1.28 × 10^54
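Under one reading of the table above (22 searchable blocks, each choosing an expand ratio from 3 options, a kernel size from 3, connectivity from 2, and two bitwidths from 4 options each), the size of the joint search space can be tallied directly:

```python
# Per-block choices: e in {1,3,6}, k in {3,5,7}, c in {0,1},
# b_w and b_a each in {2,4,6,8}.
per_block = 3 * 3 * 2 * 4 * 4            # 288 configurations per block
num_blocks = 1 + 2 + 4 + 4 + 4 + 5 + 2   # searchable blocks in the table
search_space = per_block ** num_blocks   # 288**22, on the order of 1e54
print(f"{search_space:.2e}")
```

A space this size rules out exhaustive evaluation, which is why the controller-based search described next is needed.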

SLIDE 13

Our Framework

[Diagram: the controller π_θ samples architectures α; the environment trains the network (loss) and deploys it to measure energy; evaluation combines both into a weighted reward that updates the controller]

SLIDE 14

Problem Formulation

• Discover neural architectures that minimize the task-related loss while satisfying the energy constraint.

min_θ E_{α∼π_θ} [ L(w*(α); D_validate) ]        (expectation of loss)
s.t.  w*(α) = argmin_w L(w(α); D_train)         (training of NN)
      E_{α∼π_θ} [ E(α) ] ≤ C                    (energy constraint)

where π_θ is the policy with parameter θ, and α represents a neural network with weights w.

SLIDE 15

Problem Formulation

• Discover neural architectures that minimize the task-related loss while satisfying the energy constraint.

min_θ E_{α∼π_θ} [ L(w*(α); D_validate) ]
s.t.  w*(α) = argmin_w L(w(α); D_train)
      E_{α∼π_θ} [ E(α) ] ≤ C

• Relaxation

min_θ E_{α∼π_θ} [ L(w*(α); D_validate) ] + λ · max( E_{α∼π_θ} [ E(α) ] − C, 0 )
s.t.  w*(α) = argmin_w L(w(α); D_train)
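The effect of the relaxation fits in a few lines: the hard energy constraint becomes a hinge penalty, so over-budget architectures pay a cost proportional to λ while under-budget ones are untouched. The function and variable names below are ours, for illustration only.

```python
def relaxed_objective(expected_loss, expected_energy, budget, lam):
    """Task loss plus the hinge penalty lambda * max(E[energy] - C, 0)."""
    return expected_loss + lam * max(expected_energy - budget, 0.0)

# Under the budget C = 10 mJ: the penalty term vanishes.
under = relaxed_objective(2.0, 8.0, budget=10.0, lam=0.1)   # 2.0
# Over the budget: the objective grows linearly with the excess energy.
over = relaxed_objective(2.0, 16.0, budget=10.0, lam=0.1)   # 2.0 + 0.1 * 6 = 2.6
```

Larger λ values push the search toward lower-energy architectures, which matches the Ours-small vs. Ours-base distinction in the results.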

SLIDE 16

Hardware Environment

[Diagram: framework loop with the hardware side highlighted: controller π_θ → training/deployment → energy and loss → evaluation → weighted reward → update controller]

SLIDE 17

REINFORCE Algorithm

• Policy gradient theorem

For any differentiable policy π_θ and any policy objective function J, the policy gradient is
∇_θ J(θ) = E_{π_θ} [ ∇_θ log π_θ(α) · R(α) ]

• Non-differentiable energy measures

∇_θ E_{α∼π_θ} [ R(α) ] = E_{α∼π_θ} [ R(α) ∇_θ log π_θ(α) ] ≈ (1/N) Σ_{i=1}^{N} R(α_i) ∇_θ log π_θ(α_i)
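The sample estimate above is easy to reproduce in a toy setting. The sketch below is our own construction, not the paper's controller: a 3-way softmax policy over candidate architectures, with the gradient estimated as the average of R(α_i) ∇_θ log π_θ(α_i) over N samples.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(3)                 # logits of a softmax policy pi_theta

def pi(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_pi(theta, a):
    # d/dtheta of log softmax(theta)[a] = onehot(a) - pi(theta)
    return np.eye(3)[a] - pi(theta)

R = np.array([0.1, 1.0, 0.5])       # reward of each candidate architecture
N = 20000
samples = rng.choice(3, size=N, p=pi(theta))
grad_est = np.mean([R[a] * grad_log_pi(theta, a) for a in samples], axis=0)
# A gradient ascent step along grad_est shifts probability toward
# the high-reward candidate (index 1).
```

Because only sampled rewards are needed, the energy measured by the hardware simulator can enter the reward without being differentiable.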


SLIDE 18

Software Environment

[Diagram: framework loop with the software side highlighted: controller π_θ → training α → loss → evaluation → weighted reward → update controller]

SLIDE 19

Non-Differentiability

• Relax the discrete mask variable p to a continuous random variable computed by the Gumbel-Softmax function

p_i = exp( (g_i + log α_i) / τ ) / Σ_j exp( (g_j + log α_j) / τ )

where α_i is the logit, g_i ∼ Gumbel(0, 1), and τ is the temperature.

• One-hot [0, 1, 0] → continuous [.3, .5, .2]

E. Jang, S. Gu, and B. Poole, "Categorical reparameterization with Gumbel-Softmax," ICLR 2017.
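A minimal sketch of the Gumbel-Softmax relaxation, with assumed toy values (the α_i are taken here as unnormalized positive logits). Reusing the same Gumbel noise at two temperatures shows how the sample sharpens toward one-hot as τ shrinks:

```python
import numpy as np

def gumbel_softmax(alpha, tau, g):
    """p_i = exp((g_i + log alpha_i)/tau) / sum_j exp((g_j + log alpha_j)/tau)."""
    y = (g + np.log(alpha)) / tau
    y = np.exp(y - y.max())                    # numerically stable softmax
    return y / y.sum()

rng = np.random.default_rng(0)
alpha = np.array([0.2, 0.5, 0.3])              # unnormalized logits
g = -np.log(-np.log(rng.uniform(size=3)))      # Gumbel(0, 1) noise
soft = gumbel_softmax(alpha, tau=1.0, g=g)     # continuous, sums to 1
sharp = gumbel_softmax(alpha, tau=0.1, g=g)    # same argmax, much closer to one-hot
```

The continuous sample is differentiable in the logits, which is what lets gradients flow through the discrete architecture choices.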


SLIDE 20

Bilevel Optimization

• Whenever the policy parameter θ changes, the network weights w(α) need to be retrained.
• Motivated by differentiable architecture search (DARTS), we propose the following algorithm:

1. Sample a minibatch of network configurations α from the controller.
2. Update the network models w(α) by minimizing the training loss.
3. Update the controller parameters θ.

H. Liu, K. Simonyan, and Y. Yang, "DARTS: Differentiable architecture search," ICLR 2019.
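The alternating loop above can be caricatured with scalar stand-ins. This is entirely our toy construction (quadratic losses, a deterministic "controller", a one-step-unrolled outer gradient), meant only to show the alternating structure, not the paper's actual updates.

```python
# Toy bilevel loop: the inner step fits weights w to the sampled config alpha
# (= theta here); the outer step updates theta through the one-step-unrolled
# validation loss, in the spirit of DARTS-style alternating optimization.
theta, w = 2.0, 5.0
lr_w, lr_theta = 0.3, 0.1

for _ in range(300):
    alpha = theta                     # "sample" a configuration
    w -= lr_w * 2.0 * (w - alpha)     # inner: gradient of (w - alpha)^2 in w
    dw_dtheta = 2.0 * lr_w            # sensitivity from the unrolled inner step
    theta -= lr_theta * (2.0 * w) * dw_dtheta   # outer: gradient of val loss w^2

# Both the weights and the controller settle near the joint optimum (0, 0).
```

Interleaving the two updates avoids retraining the network to convergence for every controller change, which is what makes the joint search tractable.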

SLIDE 21

Contents

• Introduction
• Our algorithm
• Experimental results
• Conclusion

SLIDE 22

Experimental Settings

• Hardware simulator of Bit Fusion [1, 2]
• First, search architectures and mixed precision for each layer on a proxy task, Tiny ImageNet
  › Trained for a fixed 60 epochs
  › 5 days on 1 NVIDIA Tesla P100
• Next, train the discovered architectures on CIFAR-100 and ImageNet from scratch.

[1] H. Sharma, J. Park, N. Suda, L. Lai, B. Chau, V. Chandra, and H. Esmaeilzadeh, "Bit Fusion: Bit-level dynamically composable architecture for accelerating deep neural network," in Proc. ISCA, June 2018, pp. 764–775.
[2] https://github.com/hsharma35/bitfusion

SLIDE 23

Searched Results

[Figure: searched architectures Ours-small (λ = 0.1) and Ours-base (λ = 0.01)]

SLIDE 24

Results on ImageNet

Model      | Top-5 Error | Model Size (MB) | Energy (mJ) | Latency (ms)
HAQ-small  | 12.7        | 1.7             | 12.9        | 32.1
Ours-small | 11.6        | 1.44            | 8.91        | 21.2
HAQ-base   | 10.1        | 2.12            | 16.3        | 40.2
Ours-base  | 9.94        | 2.06            | 10.9        | 24.7

SLIDE 25

Joint NAS and Mixed Precision Quantization

[Figure: error (%) vs. energy (mJ) for Ours-small, Ours-base, NAS + quantization (small), and NAS + quantization (base); marker size indicates model size]

SLIDE 26

Adaptive Mixed Precision Quantization

• Pareto front for error rate, latency, and energy

SLIDE 27

Contents

• Introduction
• Our algorithm
• Experimental results
• Conclusion

SLIDE 28

Conclusion

• We propose a new methodology that jointly optimizes NAS and mixed precision quantization in an extended search space.
• Hardware performance is incorporated into the objective function.
• Our methodology facilitates an end-to-end design automation flow for neural network design and deployment, especially for edge inference.

SLIDE 29

Thank you!

SLIDE 30

Backup: Framework

• Pipelines of hardware-centric design automation for efficient neural networks
• Limited research considers each stage of the pipeline collaboratively
• Our proposed framework: Mixed Precision NAS

[Diagram: Mixed Precision NAS spanning model quantization, model pruning, and neural architecture search]

SLIDE 31

Backup: Results on ImageNet

Model       | Precision | Top-1 Error | Top-5 Error | Model Size (MB) | Energy (mJ) | Latency (ms)
VGG-16      | FXP8      | 29.1        | 7.4         | 138             | 753         | 838
ResNet-50   | FXP8      | 24.7        | 5.3         | 25.5            | 557         | 591
MobileNetV2 | FXP8      | 28.19       | 9.75        | 3.4             | 29          | 73.9
FBNet-B     | FXP8      | 26.84       | 8.97        | 4.5             | 34.7        | 83.9
FBNet-B     | FXP3      | 36.29       | 15.4        | 1.68            | 13.5        | 27.9
HAQ-small   | Mixed     | 33.01       | 12.7        | 1.7             | 12.9        | 32.1
Ours-small  | Mixed     | 31.62       | 11.6        | 1.44            | 8.91        | 21.2
HAQ-base    | Mixed     | 29.1        | 10.1       | 2.12            | 16.3        | 40.2
Ours-base   | Mixed     | 28.23       | 9.94        | 2.06            | 10.9        | 24.7