SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks



SLIDE 1

SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks

Vahideh Akhlaghi, Amir Yazdanbakhsh, Kambiz Samadi, Rajesh K. Gupta, Hadi Esmaeilzadeh
*Equal contribution

University of California, San Diego | †Georgia Institute of Technology | ‡Qualcomm Technologies, Inc.

ISCA '18

SLIDE 2

CNNs perform trillions of operations for one input

[Image classification example: input photo → "Dog"]

CNN model     Operations for one inference
GoogLeNet     283,000,000,000
SqueezeNet    222,000,000,000
AlexNet       1,147,000,000,000
VGG-16        16,362,000,000,000

SLIDE 3

(Same operation counts as Slide 2.)

≥ 90% of the operations are in the convolutional layers.

Convolutions dominate CNN computation

SLIDE 4

Research challenge: how can CNN computation be reduced with minimal effect on accuracy?

Our solution: SnaPEA

  • 1. Leverage algorithmic structure
  • 2. Exploit runtime information
  • 3. Tune with a static multi-variable optimization

SLIDE 5

(1) Algorithmic structure of CNNs guides SnaPEA

[Layer pipeline: Conv → ReLU → Normalization → Pooling → … → Conv → ReLU]

Each kernel j produces a pre-activation output x^j_lk at spatial position (l, k); ReLU maps it to y^j_lk.

SLIDE 6

(1) Algorithmic structure of CNNs guides SnaPEA

[Same pipeline, highlighting the Rectified Linear Unit (ReLU): y^j_lk = max(0, x^j_lk)]
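ReLU is the piece SnaPEA exploits: it discards the value of every negative convolution output, keeping only the fact that it was negative. As a minimal sketch:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: pass positives through, zero out negatives."""
    return np.maximum(x, 0.0)
```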

SLIDE 7

Opportunity to reduce the computation

A large number of convolution outputs are negative: 61% on average.

[Bar chart, 0%-100%: negative inputs to the activation layers for AlexNet, GoogLeNet, SqueezeNet, VGGNet, and their average]

[GoogLeNet feature maps after Conv → ReLU; black pixels are zero values]
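A statistic like the 61% above can be measured for any layer by counting negative pre-activation values. A minimal NumPy sketch; the random tensor here is a stand-in for a real trained layer's output, not the paper's measurement setup:

```python
import numpy as np

def negative_fraction(conv_out):
    """Fraction of convolution outputs that ReLU will zero out."""
    return np.mean(conv_out < 0)

# Stand-in for one convolutional layer's pre-activation outputs.
rng = np.random.default_rng(0)
pre_activation = rng.normal(loc=-0.1, scale=1.0, size=(64, 56, 56))

frac = negative_fraction(pre_activation)
print(f"{frac:.0%} of outputs are negative and will be zeroed by ReLU")
```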

SLIDE 8

ReLU maps negative outputs to zero, so a convolution whose result will be negative can be cut short.

Early termination of convolution

[Input → Convolution → Output; blue boxes mark the operations actually performed in two highlighted convolution windows]

SLIDE 9

(2) Runtime information enables reducing computation: the distribution of zero and non-zero outputs varies at runtime.

[Same Conv → ReLU → Normalization → Pooling pipeline; GoogLeNet feature maps after ReLU show a varying mix of zero and non-zero activations across inputs and layers]

SLIDE 10

SnaPEA: Principles

SnaPEA leverages the algorithmic structure of CNNs and runtime information to:

  • Reduce computation without accuracy loss
  • Trade accuracy for further computation reduction
  • Add minimal hardware overhead

SLIDE 11

SnaPEA: An illustrative example

[Original convolution: every weight w is multiplied with its input and accumulated into the partial sum; ReLU is applied only after all multiply-accumulate operations complete]

SLIDE 12

SnaPEA: An illustrative example (Exact mode)

[Convolution in SnaPEA (Exact mode) vs. the original convolution: the weights are reordered so that positive weights are applied first; during the negative-weight phase the sign of the partial sum is checked after each operation, and once it turns negative the remaining operations are skipped, since the sum can only keep decreasing and ReLU will output zero]
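The exact mode can be sketched behaviorally in NumPy. This is our reading of the slide, not the hardware: weights are reordered positive-first, and the sign of the partial sum is checked during the negative phase. It assumes the inputs are non-negative (they come out of a previous ReLU), so after the positive phase the sum can only decrease. The function name is ours:

```python
import numpy as np

def exact_mode_dot(weights, inputs):
    """One convolution window in SnaPEA-style exact mode.

    Returns (relu_output, macs_performed). Exact: the result always
    equals max(0, weights . inputs), only the work done may shrink.
    """
    order = np.argsort(weights < 0, kind="stable")  # positive weights first
    w, x = weights[order], inputs[order]
    num_pos = int(np.sum(weights >= 0))

    acc, macs = 0.0, 0
    for i in range(len(w)):
        acc += w[i] * x[i]
        macs += 1
        # In the negative-weight phase a negative partial sum is final:
        # remaining terms are <= 0, so ReLU will output zero anyway.
        if i >= num_pos and acc < 0:
            return 0.0, macs
    return max(acc, 0.0), macs
```

For example, with weights [0.5, -0.8, 0.2, -0.6] and inputs [1, 2, 1, 1], the reordered sum turns negative after three operations, so the fourth is skipped while the output (zero) is unchanged.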

SLIDE 13

Potential benefits in the exact mode

[Bar chart, 20-100 scale: % negative weights for AlexNet, GoogLeNet, SqueezeNet, VGGNet]

On average, 54% of the weights are negative.

SLIDE 14

SnaPEA: An illustrative example (Predictive mode)

[Convolution in SnaPEA (Predictive mode): a small number of speculation operations, using the largest-magnitude weights, run first. If the speculative partial sum is ≤ th, the output is predicted negative and the remaining operations are skipped (Yes branch); otherwise the convolution runs to completion (No branch)]
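The predictive mode can be sketched in the same style. The N speculation operations run first; if their partial sum falls at or below the threshold th, the output is predicted negative and the rest is skipped. A behavioral sketch with names of our own choosing; mispredicting a borderline window is exactly the accuracy that the (Th, N) tuning later controls:

```python
import numpy as np

def predictive_mode_dot(weights, inputs, spec_idx, th):
    """One convolution window in SnaPEA-style predictive mode.

    spec_idx: indices of the N speculation weights (largest magnitudes).
    th: threshold; a speculative partial sum <= th predicts a negative
    output. Returns (relu_output, macs_performed).
    """
    spec_sum = float(np.dot(weights[spec_idx], inputs[spec_idx]))
    if spec_sum <= th:
        return 0.0, len(spec_idx)          # predicted negative: terminate
    # Prediction says "likely positive": finish the remaining operations.
    rest = np.setdiff1d(np.arange(len(weights)), spec_idx)
    total = spec_sum + float(np.dot(weights[rest], inputs[rest]))
    return max(total, 0.0), len(weights)
```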

SLIDE 15

Speculation operations

[Selecting the speculation operations x * w: the weights are ordered by absolute value, from large to small, and partitioned into n groups; the largest weight is taken from each group, so the n speculation weights span the magnitude range rather than clustering at the top]
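Under that reading of the slide, the selection can be sketched as follows; the helper name is ours, and the grouping scheme is our interpretation of the figure:

```python
import numpy as np

def pick_speculation_indices(weights, n):
    """Pick n speculation weights: split the magnitude-sorted weight
    indices into n groups (large to small) and take the head of each."""
    order = np.argsort(-np.abs(weights))        # descending |w|
    groups = np.array_split(order, n)
    return np.array([g[0] for g in groups])     # largest from each group
```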

SLIDE 16

Optimize the level of speculation

Speculation parameters:
  • Th: threshold
  • N: number of speculation operations

Find (Th, N) for all kernels in a CNN to minimize operations while satisfying the accuracy constraint.

SLIDE 17

Optimize the level of speculation

[All convolution kernels in a CNN (Layer 1, Layer 2, …, Layer L) each receive a (Th, N) configuration]

Kernel Profiling → Local Optimization → Global Optimization

SLIDE 18

Optimize the level of speculation: Kernel Profiling

[Per-kernel sensitivity analysis: for every kernel, profile the number of operations as a function of the threshold value on a sample input ("Dog")]

Kernel Profiling → Local Optimization → Global Optimization

SLIDE 19

Optimize the level of speculation: Kernel Profiling (continued)

[The same per-kernel sensitivity analysis on another sample input ("Cat")]

Kernel Profiling → Local Optimization → Global Optimization

SLIDE 20

Optimize the level of speculation: Kernel Profiling (continued)

[The sensitivity analysis is repeated across sample inputs ("Dog")]

Kernel Profiling → Local Optimization → Global Optimization

SLIDE 21

Optimize the level of speculation: Local Optimization

[Profiling yields a set of (Th, N) configurations per layer, one per kernel (Kernel 1 … Kernel m)]

Kernel Profiling → Local Optimization → Global Optimization

SLIDE 22

Optimize the level of speculation: Local Optimization (continued)

[Each layer's kernels (Kernel 1 … Kernel k) are assigned configurations that preserve the classification ("Dog" stays "Dog")]

Kernel Profiling → Local Optimization → Global Optimization

SLIDE 23

Optimize the level of speculation: Global Optimization

[Adjust the parameters with respect to the cross-layer effect, across kernels from different layers (Layer 1 Kernel 1, Layer 1 Kernel 2, …, Layer L Kernel k), validated on a sample input ("Cat")]

Kernel Profiling → Local Optimization → Global Optimization

SLIDE 24

Optimize the level of speculation: Global Optimization (continued)

[Cross-layer adjustment validated on another sample input ("Bird")]

Kernel Profiling → Local Optimization → Global Optimization

SLIDE 25

Optimize the level of speculation: Global Optimization (continued)

[Cross-layer adjustment validated on another sample input ("Dog")]

Kernel Profiling → Local Optimization → Global Optimization
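The three stages can be sketched as one search loop. This is a simplified reading, not the paper's algorithm: profiling produces, per kernel, candidate (Th, N) settings with estimated operation counts; the local step picks each kernel's cheapest candidate; the global step then restores accuracy by backing off the kernels whose next-safer setting costs the fewest extra operations. The names (`tune_speculation`, `evaluate_accuracy`) are hypothetical stand-ins:

```python
def tune_speculation(kernels, candidates, evaluate_accuracy, max_loss):
    """Greedy (Th, N) tuning across all kernels of a CNN.

    candidates: dict kernel -> list of (th, n, est_ops), ordered from
        most aggressive (fewest ops) to safest (exact mode last).
    evaluate_accuracy: callable(config) -> validation accuracy.
    max_loss: allowed accuracy drop vs. the all-exact configuration.
    """
    # Baseline: the safest (exact-mode) configuration for every kernel.
    config = {k: candidates[k][-1] for k in kernels}
    baseline = evaluate_accuracy(config)

    # Local step: independently move each kernel to its cheapest candidate.
    for k in kernels:
        config[k] = candidates[k][0]

    # Global step: while the accuracy constraint is violated, back off
    # the kernel whose next-safer candidate adds the fewest operations.
    level = {k: 0 for k in kernels}
    while evaluate_accuracy(config) < baseline - max_loss:
        movable = [k for k in kernels if level[k] + 1 < len(candidates[k])]
        if not movable:
            break  # everything is already in exact mode
        k = min(movable, key=lambda kk: candidates[kk][level[kk] + 1][2]
                                        - candidates[kk][level[kk]][2])
        level[k] += 1
        config[k] = candidates[k][level[k]]
    return config
```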

SLIDE 26

SnaPEA: Hardware implementation

Add low-overhead sign checks and threshold checks to the hardware.

[Array of processing engines PE1,1 … PEn,m. Each MAC unit is paired with a Prediction Activation Unit (PAU) that monitors the partial result: in exact mode the sign bit triggers termination; in predictive mode a threshold comparison does]

SLIDE 27

SnaPEA: Hardware implementation

[Processing Engine (PE): K compute lanes, each pairing a MAC with a Prediction Activation Unit (PAU), fed by a weight and in/out buffer and an index buffer]

SLIDE 28

Experimental setup

Benchmarks:

CNN Model    Year  Top-1 Accuracy  Top-5 Accuracy
AlexNet      2012  57.2%           80.1%
GoogLeNet    2015  68.7%           89.0%
SqueezeNet   2016  57.5%           80.3%
VGG-16       2014  70.5%           89.9%

Optimization: algorithm built on top of Caffe.

Hardware implementation:
  • Simulation: cycle accurate
  • Power estimation: Design Compiler using TSMC 45 nm
  • Baseline design: Eyeriss with the same number of MAC units (256)
  • SnaPEA area overhead compared to Eyeriss: 4.5%

SLIDE 29

Experimental results

Speedup over Eyeriss (chart axis 0.5-2.5):

Configuration                    AlexNet  GoogLeNet  SqueezeNet  VGGNet  Geomean
Exact                            1.27     1.34       1.29        1.24    1.28
Predictive (accuracy loss ≤ 1%)  1.37     1.45       1.44        1.26    1.38
Predictive (accuracy loss ≤ 2%)  1.54     2.02       1.52        1.51    1.63
Predictive (accuracy loss ≤ 3%)  1.89     2.08       1.83        1.81    1.81

SLIDE 30

Experimental results

Layers in the predictive mode for accuracy loss ≤ 3%:

Network     % of Conv Layers  Speedup  Energy Improvement
AlexNet     60.0              2.11     1.97
GoogLeNet   84.2              2.14     2.04
SqueezeNet  65.4              1.94     1.84
VGGNet      61.5              1.87     1.73

On average, 68% of layers operate in the predictive mode (3% accuracy drop).

SLIDE 31

Experimental results

[Per-layer speedup chart] The highest speedup observed in a single layer is 3.6×, in GoogLeNet.

SLIDE 32

Conclusion

SnaPEA:
  • Exploits algorithmic structure and runtime information
  • Reduces computation in convolutional layers
  • Controls accuracy with a multi-variable optimization
  • Adds minimal hardware overhead

Future directions:
  • Leverage more runtime information (e.g., patterns in inputs and activations)
  • Expand to other activation functions (e.g., sigmoid)
  • Tune the hardware for more parallelism