

SLIDE 1

An Analytical Method to Determine Minimum Per-Layer Precision of Deep Neural Networks

Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign

Charbel Sakr, Naresh Shanbhag

Code: https://github.com/charbel-sakr/Precision-Analysis-of-Neural-Networks Also found on sakr2.web.engr.Illinois.edu

SLIDE 2

Machine Learning ASICs

  • Eyeriss [Sze’16, ISSCC]: AlexNet accelerator, 16b fixed-point
  • PuDianNao [Chen’15, ASPLOS]: ML accelerator, 16b fixed-point
  • TPU [Google’17, ISCA]: TensorFlow accelerator, 8b fixed-point

How are they choosing these precisions? Why is it working? Can it be determined analytically?

SLIDE 3

Current Approaches

  • Stochastic Rounding during training [Gupta, ICML’15 – Hubara, NIPS’16]

→ Difficulty of training in a discrete space

  • Trial-and-error approach [Sung, SiPS’14]

→ Exhaustive search is expensive

  • SQNR based precision allocation [Lin, ICML’16]

→ Lack of precision/accuracy understanding

Fixed-point quantization: no accuracy vs. precision understanding.
SLIDE 4

International Conference on Machine Learning (ICML) - 2017

SLIDE 5

Precision in Neural Networks

[Sakr et al., ICML’17]

[Figure: a floating-point network and its fixed-point (quantized) counterpart classify the same input; their classification outputs are compared.]

𝑞𝑛: “mismatch probability” – the probability that the fixed-point network’s predicted label differs from the floating-point network’s.
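To make the definition concrete, here is a minimal sketch (not from the talk) that estimates 𝑞𝑛 empirically for a toy one-layer classifier: quantize inputs and weights to a fixed-point grid and count how often the quantized model’s label disagrees with the floating-point one. The linear model, random data, and 6-bit setting are placeholders for illustration only.

```python
import numpy as np

def quantize(x, bits):
    """Uniform fixed-point quantization to `bits` bits over (-1, 1):
    step size delta = 2^(1-bits), round to nearest, saturate at the limits."""
    delta = 2.0 ** (1 - bits)
    return np.clip(np.round(x / delta) * delta, -1.0, 1.0 - delta)

# Toy floating-point "network": a single linear classifier with random
# weights (a stand-in for a trained model, for illustration only).
rng = np.random.default_rng(0)
W = rng.uniform(-1, 1, size=(10, 64))      # 10 classes, 64 features
X = rng.uniform(-1, 1, size=(1000, 64))    # 1000 test inputs

labels_fl = np.argmax(X @ W.T, axis=1)                              # floating-point labels
labels_fx = np.argmax(quantize(X, 6) @ quantize(W, 6).T, axis=1)    # 6-bit fixed-point labels

q_n = np.mean(labels_fl != labels_fx)      # empirical mismatch probability
print(f"empirical q_n = {q_n:.3f}")
```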

SLIDE 6

Second Order Bound on 𝒒𝒏

[Sakr et al., ICML’17]

  • Input/Weight precision trade-off:

→ Optimal precision allocation by balancing the sum

  • Data dependence (compute once and reuse)

→ Derivatives obtained in the last step of backprop
→ Only one forward-backward pass needed

𝑞𝑛 ≤ (Δ_A²/24)·E_A + (Δ_W²/24)·E_W, where E_A is the activation quantization noise gain, E_W is the weight quantization noise gain, and Δ_A, Δ_W are the activation and weight quantization step sizes.
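A small sketch of how the bound can be used once the two noise gains are known: assuming uniform quantization of quantities normalized to (−1, 1), the step size is Δ = 2^(1−B) for B bits, and one can sweep (B_A, B_W) to find the cheapest pair meeting a target mismatch budget. The gain values and the 1% budget below are assumptions for illustration; real gains come from the forward-backward pass described above.

```python
def mismatch_bound(B_A, B_W, E_A, E_W):
    """Second-order bound q_n <= (Δ_A²/24)·E_A + (Δ_W²/24)·E_W, assuming a
    uniform step Δ = 2^(1-B) for a B-bit format normalized to (-1, 1)."""
    d_a = 2.0 ** (1 - B_A)
    d_w = 2.0 ** (1 - B_W)
    return (d_a ** 2) / 24 * E_A + (d_w ** 2) / 24 * E_W

# Hypothetical noise gains (placeholders, not measured values); sweep the
# precisions to find the cheapest pair meeting a 1% mismatch budget.
E_A, E_W = 50.0, 200.0
budget = 0.01
feasible = [(ba, bw) for ba in range(2, 17) for bw in range(2, 17)
            if mismatch_bound(ba, bw, E_A, E_W) <= budget]
print(min(feasible, key=lambda p: p[0] + p[1]))   # cheapest (B_A, B_W) pair
```

This also illustrates the input/weight precision trade-off above: the larger noise gain (here the weights’) ends up with the larger precision, so that the two terms of the sum are balanced.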

SLIDE 7

Proof Sketch

  • For one input, when do we have a mismatch?

→ If the FL network predicts label “𝑘” but the FX network predicts label “𝑗” with 𝑗 ≠ 𝑘 (output re-ordering due to quantization)
→ This happens with some probability, computed from the quantization noise reflected at the output, whose variance is already known (symmetry of quantization noise)
→ Applying Chebyshev’s inequality + the law of total probability (LTP) yields the result

[Sakr et al., ICML’17]
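As a numeric illustration of the Chebyshev step (the margin, the noise variance, and the uniform noise model below are made up, not values from the paper): for one competing class, a mismatch requires the quantization perturbation of the logit margin to exceed the margin itself, so Chebyshev bounds the flip probability by variance over margin squared.

```python
import numpy as np

# Chebyshev step for one competing class j: the floating-point network gives
# a positive logit margin m = z_k - z_j; quantization adds a zero-mean
# perturbation e to that margin, and a mismatch needs e <= -m, so
#     P(flip) <= P(|e| >= m) <= Var(e) / m^2    (Chebyshev's inequality).
m, var_e = 0.3, 0.05                      # hypothetical margin and variance
print("Chebyshev bound:", var_e / m**2)

# Monte-Carlo check with zero-mean uniform noise of the same variance.
half_range = np.sqrt(3 * var_e)           # Var of U(-a, a) is a^2 / 3
e = np.random.default_rng(0).uniform(-half_range, half_range, 10**6)
print("empirical flip rate:", np.mean(e <= -m))
```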

SLIDE 8

Tighter Bound on 𝒒𝒏

𝑁: number of classes
𝑇: signal-to-quantization-noise ratio
𝑄1 & 𝑄2: correction factors

  • Mismatch probability decreases double exponentially with precision

→ Theoretically stronger than Theorem 1
→ Unfortunately, less practical [Sakr et al., ICML’17]
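A rough illustration of why the decrease is double exponential (the constants below are placeholders, not the paper’s): the signal-to-quantization-noise ratio 𝑇 grows as 4^B, roughly 6 dB per added bit, so any bound that decays exponentially in 𝑇 decays doubly exponentially in the number of bits B.

```python
import math

# T grows as 4^B (about 6 dB per extra bit), so a bound of the form
# C * exp(-T / c) shrinks like exp(-const * 4^B) in the precision B.
# C, c, and T0 are placeholder constants for illustration only.
C, c, T0 = 10.0, 2.0, 1.0
for B in range(2, 9):
    T = T0 * 4.0 ** B
    print(B, C * math.exp(-T / c))
```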

SLIDE 9

Per-layer Precision Assignment

  • Per-layer precision allows for more aggressive complexity reductions
  • Search space is huge: a 2L-dimensional grid (one activation precision and one weight precision per layer)
  • Example: 5-layer network, precision considered up to 16 bits → 16¹⁰ ≈ one million million (10¹²) design points (see the quick size check below)
  • We need an analytical way to reduce the search space → maybe using the analysis of Sakr et al.

[Figure: a network with layers 1 through L, each layer having its own activation precision and weight precision.]

Key idea: equalization of reflected quantization noise variances
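A quick check of the search-space size quoted above (the 16-bit sweep range per quantity matches the example on this slide):

```python
# Naive per-layer search: one activation precision and one weight precision
# per layer, each swept over 1..16 bits, gives a 2L-dimensional grid.
L, B_max = 5, 16
print(f"{B_max ** (2 * L):,}")   # 1,099,511,627,776 ≈ 10^12 design points
```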

SLIDE 10

Fine-grained Precision Analysis

Extending the bound of [Sakr et al., ICML’17] to per-layer step sizes [Sakr et al., ICASSP’18]:

𝑞𝑛 ≤ Σ_{l=1…L} [ (Δ_{A_l}²/24)·E_{A_l} + (Δ_{W_l}²/24)·E_{W_l} ]

where E_{A_l} and E_{W_l} are the per-layer quantization noise gains of layer l’s activations and weights, and Δ_{A_l}, Δ_{W_l} are the corresponding quantization step sizes.
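One possible way to estimate the per-layer noise gains is with automatic differentiation: for each input, accumulate, over every non-predicted class i, the squared gradients of the logit margin z_ŷ − z_i with respect to each layer’s activations and weights, normalized by the squared margin. The sketch below uses PyTorch and assumes the intermediate activation tensors were retained from the forward pass (e.g., via forward hooks) and that `logits` is the 1-D output for a single input; it illustrates the structure of the quantity, not the authors’ implementation (their code is at the GitHub link on slide 1).

```python
import torch

def per_layer_noise_gains(params, layer_acts, logits, y_hat):
    """Per-layer quantization noise gains for a single input.
    For each quantity q (one layer's activations or weights):
        E_q = sum_{i != y_hat} ||d(z_yhat - z_i)/dq||^2 / (z_yhat - z_i)^2
    `params` are the per-layer weight tensors, `layer_acts` the retained
    intermediate activations, and `y_hat` the floating-point prediction."""
    gains_a = [0.0] * len(layer_acts)
    gains_w = [0.0] * len(params)
    for i in range(logits.numel()):
        if i == y_hat:
            continue
        margin = logits[y_hat] - logits[i]
        grads = torch.autograd.grad(margin, list(layer_acts) + list(params),
                                    retain_graph=True, allow_unused=True)
        for l, g in enumerate(grads[:len(layer_acts)]):
            if g is not None:
                gains_a[l] += (g.pow(2).sum() / margin.pow(2)).item()
        for l, g in enumerate(grads[len(layer_acts):]):
            if g is not None:
                gains_w[l] += (g.pow(2).sum() / margin.pow(2)).item()
    return gains_a, gains_w   # average over a dataset to estimate E_{A_l}, E_{W_l}
```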

SLIDE 11

Per-layer Precision Assignment Method

Equalization of quantization noise variances → the search space is reduced to only a one-dimensional axis of reference (minimum) precision.

Step 1: compute the least quantization noise gain.
Step 2: select each precision so as to equalize all quantization noise variances.
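A minimal sketch of the two steps, assuming uniform quantization with step size Δ = 2^(1−B): equalizing the reflected variances Δ_i²·E_i then means giving each quantity roughly 0.5·log2(E_i/E_min) extra bits over the reference precision B_ref. The noise-gain values below are made up for illustration.

```python
import math

def assign_precisions(noise_gains, B_ref):
    """Equalize reflected quantization noise variances Δ_i²·E_i across all
    2L per-layer quantities. Step 1: find the least noise gain E_min (that
    quantity gets the reference precision B_ref). Step 2: give every other
    quantity just enough extra bits that Δ_i²·E_i <= Δ_ref²·E_min,
    assuming a uniform quantization step Δ = 2^(1-B)."""
    E_min = min(noise_gains)                                          # Step 1
    return [B_ref + max(0, math.ceil(0.5 * math.log2(E / E_min)))     # Step 2
            for E in noise_gains]

# Hypothetical per-layer noise gains (activations and weights interleaved);
# real values would come from the fine-grained analysis on the previous slide.
gains = [3.1, 41.0, 7.5, 260.0, 1.2, 95.0]
print(assign_precisions(gains, B_ref=2))   # -> [3, 5, 4, 6, 2, 6]
```

The remaining search is then a single sweep along the reference precision, picking the smallest value whose assignment still meets the target mismatch probability.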

SLIDE 12

Comparison with related works

  • Simplified but meaningful model of complexity

→ Computational cost
   → Total number of FAs (full adders) used, assuming folded MACs with bit growth allowed
   → Number of MACs equals the number of dot products computed
   → Number of FAs per MAC:
→ Representational cost
   → Total number of bits needed to represent weights and activations
   → High-level measure of area and communication cost (data movement)
(a rough sketch of both costs follows after this list)

  • Other works considered

→ Stochastic quantization (SQ) [Gupta’15, ICML]
   → 784 – 1000 – 1000 – 10 (MNIST)
   → 64C5 – MP2 – 64C5 – MP2 – 64FC – 10 (CIFAR-10)
→ BinaryNet (BN) [Hubara’16, NIPS]
   → 784 – 2048 – 2048 – 2048 – 10 (MNIST) & VGG (CIFAR-10)
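To make the complexity model above concrete, here is a rough sketch; the layer sizes and precisions are made up, and the exact FAs-per-MAC expression is the one on the slide, so it is left as an input here rather than guessed.

```python
def representational_cost(layer_sizes, B_W, B_A):
    """Representational cost: total number of bits needed to store all
    weights and all activations, given per-layer precisions B_W[l], B_A[l].
    layer_sizes[l] = (num_weights, num_activations) for layer l."""
    return sum(nw * bw + na * ba
               for (nw, na), bw, ba in zip(layer_sizes, B_W, B_A))

def computational_cost(num_macs, fas_per_mac):
    """Computational cost: total full adders, assuming folded MACs with bit
    growth; fas_per_mac[l] is the per-layer FAs-per-MAC value obtained from
    the expression on the slide (not reproduced here)."""
    return sum(n * f for n, f in zip(num_macs, fas_per_mac))

# Toy example with made-up layer sizes and precisions, for illustration only.
sizes = [(784 * 128, 128), (128 * 10, 10)]
print(representational_cost(sizes, B_W=[5, 6], B_A=[4, 5]))
```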

SLIDE 13

Precision Profiles – CIFAR-10

VGG-11 ConvNet on CIFAR-10: 32C3-32C3-MP2-64C3-64C3-MP2-128C3-128C3-256FC-256FC-10

Precision chosen such that 𝑞𝑛 ≤ 1%

  • Weight precision requirements are greater than those of activations [Sakr et al., ICML’17]
  • Precision decreases with depth – early layers are more sensitive to perturbations [Raghu et al., ICML’17]

SLIDE 14

Precision – Accuracy Trade-offs

  • Precision reduction is greatest when using the proposed fine-grained precision assignment

SLIDE 15

Precision – Accuracy – Complexity Trade-offs

  • Best complexity reduction is obtained with the proposed precision assignment
  • Complexity is even much smaller than a BinaryNet’s at the same accuracy, because the BinaryNet requires a much more complex network

SLIDE 16

Conclusion & Future Work

  • Presented an analytical method to determine per-layer precision of neural networks
  • Method based on equalization of reflected quantization noise variances
  • Per-layer precision assignment reveals interesting properties of neural networks
  • Method leads to significant overall complexity reduction, e.g., reduction of minimum precision to 2 bits and lower implementation cost than BinaryNets
  • Future work:
    → Trade-offs between precision, depth, and width
    → Trade-offs between precision and pruning
SLIDE 17

Thank you!

Acknowledgment: This work was supported in part by Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Centers, sponsored by MARCO and DARPA.