  1. An Analytical Method to Determine Minimum Per-Layer Precision of Deep Neural Networks. Charbel Sakr, Naresh Shanbhag. Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. Code: https://github.com/charbel-sakr/Precision-Analysis-of-Neural-Networks (also found on sakr2.web.engr.Illinois.edu)

  2. Machine Learning ASICs
  • Eyeriss [Sze’16, ISSCC]: AlexNet accelerator, 16b fixed-point
  • PuDianNao [Chen’15, ASPLOS]: ML accelerator, 16b fixed-point
  • TPU [Google’17, ISCA]: TensorFlow accelerator, 8b fixed-point
  How are they choosing these precisions? Why is it working? Can it be determined analytically?

  3. Current Approaches to Fixed-Point Quantization
  • Stochastic rounding during training [Gupta, ICML’15; Hubara, NIPS’16] → difficulty of training in a discrete space
  • Trial-and-error approach [Sung, SiPS’14] → exhaustive search is expensive
  • SQNR-based precision allocation [Lin, ICML’16] → lack of precision/accuracy understanding
  • Common gap: no analytical understanding of the accuracy vs. precision trade-off

  4. International Conference on Machine Learning (ICML) - 2017

  5. Precision in Neural Networks
  • Compare the classification output of the floating-point network with that of its fixed-point (quantized) counterpart on the same input
  • q_n: “mismatch probability”, the probability that the fixed-point network’s predicted label differs from the floating-point network’s prediction [Sakr et al., ICML’17] (an empirical estimation sketch follows this slide)
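  A minimal sketch of how q_n can be estimated empirically: run the floating-point network and its fixed-point counterpart on the same inputs and count label disagreements. The tiny random linear “network” below is a placeholder for illustration only, not a model from the paper.

    import numpy as np

    def quantize(x, num_bits):
        # Uniform quantization with step size 2^-(num_bits - 1)
        # (fractional precision only; no clipping, for simplicity).
        step = 2.0 ** -(num_bits - 1)
        return np.round(x / step) * step

    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((10, 64))      # "floating-point" weights (10 classes)
    X = rng.standard_normal((10000, 64))         # evaluation inputs

    labels_fl = np.argmax(X @ W.T, axis=1)                              # floating-point labels
    labels_fx = np.argmax(quantize(X, 6) @ quantize(W, 6).T, axis=1)    # 6b activations/weights

    q_n = np.mean(labels_fl != labels_fx)        # empirical mismatch probability
    print(f"estimated q_n = {q_n:.4f}")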

  6. Second-Order Bound on q_n [Sakr et al., ICML’17]
  • The bound is a sum of two terms: an activation quantization noise gain and a weight quantization noise gain (a hedged reconstruction follows this slide)
  • Input/weight precision trade-off → optimal precision allocation obtained by balancing the two terms of the sum
  • Data dependence (compute once and reuse) → the derivatives are obtained in the last step of backprop → only one forward-backward pass needed
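  The bound itself is an image in the original deck; the expression below is a hedged reconstruction based on our reading of [Sakr et al., ICML’17], with activation and weight precisions B_A and B_W, quantization step sizes Delta_A = 2^-(B_A-1) and Delta_W = 2^-(B_W-1), soft outputs z_i, and y-hat the label predicted by the floating-point network:

    \[
      q_n \;\le\;
      \underbrace{\frac{\Delta_A^2}{24}\,
        \mathbb{E}\!\left[\sum_{a_h \in \mathcal{A}} \sum_{i \neq \hat{y}}
          \frac{\left|\partial(z_i - z_{\hat{y}})/\partial a_h\right|^2}
               {(z_i - z_{\hat{y}})^2}\right]}_{\text{activation quantization noise gain } E_A}
      \;+\;
      \underbrace{\frac{\Delta_W^2}{24}\,
        \mathbb{E}\!\left[\sum_{w_h \in \mathcal{W}} \sum_{i \neq \hat{y}}
          \frac{\left|\partial(z_i - z_{\hat{y}})/\partial w_h\right|^2}
               {(z_i - z_{\hat{y}})^2}\right]}_{\text{weight quantization noise gain } E_W}
    \]

  The two gains E_A and E_W depend only on the floating-point network and the data, which is why they can be computed once and reused while sweeping B_A and B_W.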

  7. Proof Sketch [Sakr et al., ICML’17]
  • For one input, when do we have a mismatch?
  → The floating-point (FL) network predicts label “k”, but the fixed-point (FX) network predicts label “j” where j ≠ k
  → This happens when quantization re-orders the outputs, i.e., the quantization noise reaching the output pair exceeds the floating-point margin between them
  → By symmetry of the quantization noise, this probability is at most half the probability that the noise magnitude exceeds the margin
  → The output quantization noise is (to first order) a linear combination of the individual weight and activation quantization noises, so its variance is known
  → Applying Chebyshev’s inequality and the law of total probability (LTP) yields the result (worked out after this slide)
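  A hedged sketch of the chain of inequalities behind this slide, in the notation above (q_{z_j} and q_{z_k} denote the quantization noise terms reaching soft outputs z_j and z_k, and the FL network predicting k means z_k > z_j):

    \begin{align*}
      \Pr(\text{FX predicts } j \mid \text{input})
        &\le \Pr\bigl(z_j + q_{z_j} \ge z_k + q_{z_k}\bigr)
           = \Pr\bigl(q_{z_j} - q_{z_k} \ge z_k - z_j\bigr)
           && \text{(output re-ordering due to quantization)} \\
        &\le \tfrac{1}{2}\,\Pr\bigl(\lvert q_{z_j} - q_{z_k}\rvert \ge z_k - z_j\bigr)
           && \text{(symmetry of the quantization noise)} \\
        &\le \frac{\operatorname{Var}\!\left(q_{z_j} - q_{z_k}\right)}{2\,(z_k - z_j)^2}
           && \text{(Chebyshev's inequality)}
    \end{align*}

  The variance in the numerator follows from the linear noise model above; summing over j ≠ k and averaging over inputs (law of total probability) gives the bound on q_n.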

  8. Tighter Bound on q_n [Sakr et al., ICML’17]
  • Mismatch probability decreases double exponentially with precision
  → Theoretically stronger than Theorem 1
  → Unfortunately, less practical
  • N: number of classes, T: signal-to-quantization-noise ratio, Q1 & Q2: correction factors

  9. Per-layer Precision Assignment
  • Per-layer precision (one activation precision and one weight precision per layer, layers 1 … L) allows for more aggressive complexity reductions
  • The search space is huge: a 2L-dimensional grid of (weight, activation) precisions
  • Example: 5-layer network, precision considered up to 16 bits → 16^10 ≈ one million millions (10^12) design points (worked count after this slide)
  • We need an analytical way to reduce the search space → maybe using the analysis of Sakr et al.
  • Key idea: equalization of reflected quantization noise variances
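  The worked count behind the example above, assuming each of the L = 5 layers gets an independent weight precision and activation precision swept up to B_max = 16 bits:

    \[
      \lvert \mathcal{S} \rvert \;=\; B_{\max}^{\,2L} \;=\; 16^{2 \times 5} \;=\; 16^{10} \;=\; 2^{40} \;\approx\; 1.1 \times 10^{12}
    \]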

  10. Fine-grained Precision Analysis
  • The network-level bound of [Sakr et al., ICML’17] is refined into a per-layer bound expressed in terms of per-layer quantization noise gains [Sakr et al., ICASSP’18] (a hedged sketch follows this slide)
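  The per-layer expression is an image in the original deck; as an assumption, the sketch below is the natural per-layer analogue of the network-level bound above, with per-layer precisions B_{A_l} and B_{W_l}, step sizes Delta_{A_l} and Delta_{W_l}, and per-layer noise gains E_{A_l} and E_{W_l} for layers l = 1, …, L:

    \[
      q_n \;\le\; \sum_{l=1}^{L} \frac{\Delta_{A_l}^2}{24}\, E_{A_l}
             \;+\; \sum_{l=1}^{L} \frac{\Delta_{W_l}^2}{24}\, E_{W_l},
      \qquad
      \Delta_{A_l} = 2^{-(B_{A_l}-1)},\quad \Delta_{W_l} = 2^{-(B_{W_l}-1)}
    \]

  Each term couples one layer’s quantization step size with its noise gain; equalizing these 2L contributions is what the assignment method on the next slide does.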

  11. Per-layer Precision Assignment Method: equalization of quantization noise variances
  • Step 1: compute the least quantization noise gain; the corresponding (least sensitive) quantity gets the reference, i.e., minimum, precision
  • Step 2: select every other precision so that all reflected quantization noise variances are equalized to that of the reference
  • The search space is reduced to a one-dimensional axis: the reference (minimum) precision (a sketch of the procedure follows this slide)
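  A minimal sketch of the equalization step, assuming the per-layer noise gains E_l have already been computed as above (an illustrative sketch, not the authors’ released code):

    import math

    def equalize_precisions(noise_gains, b_ref):
        """noise_gains: per-layer quantization noise gains E_l (weights or activations).
        b_ref: reference precision (bits) given to the least sensitive layer.
        Returns a per-layer precision assignment equalizing 2^(-2*B_l) * E_l."""
        e_min = min(noise_gains)              # Step 1: least quantization noise gain
        precisions = []
        for e_l in noise_gains:
            # Step 2: require 2^(-2*B_l) * E_l <= 2^(-2*b_ref) * e_min,
            # i.e. B_l >= b_ref + 0.5 * log2(E_l / e_min), rounded up to an integer.
            precisions.append(b_ref + math.ceil(0.5 * math.log2(e_l / e_min)))
        return precisions

    # Hypothetical noise gains for a 4-layer example; sweep the single knob b_ref
    # and keep the smallest value for which the mismatch-probability bound <= 1%.
    print(equalize_precisions([4.0, 1.0, 16.0, 2.0], b_ref=4))   # -> [5, 4, 6, 5]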

  12. Comparison with related works
  • Simplified but meaningful model of complexity
  → Computational cost: total number of FAs used, assuming folded MACs with bit growth allowed; the number of MACs equals the number of dot products computed; the number of FAs per MAC follows from the MAC bit widths (formula on the original slide not captured here; a hedged cost-model sketch follows this slide)
  → Representational cost: total number of bits needed to represent weights and activations; a high-level measure of area and communication (data movement) cost
  • Other works considered
  → Stochastic quantization (SQ) [Gupta’15, ICML]: 784 – 1000 – 1000 – 10 (MNIST); 64C5 – MP2 – 64C5 – MP2 – 64FC – 10 (CIFAR-10)
  → BinaryNet (BN) [Hubara’16, NIPS]: 784 – 2048 – 2048 – 2048 – 10 (MNIST) & VGG (CIFAR-10)
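  A hedged sketch of the complexity model in code. The exact FAs-per-MAC expression is not in this transcript, so the accounting below (array multiplier ≈ B_A·B_W full adders, plus an accumulator that grows to B_A + B_W + ceil(log2 D) bits) is an assumption used only for illustration; the layer sizes in the example come from the SQ MNIST network listed above.

    import math

    def computational_cost(layers):
        """layers: list of (num_dot_products, dot_product_length, B_A, B_W).
        Returns an estimate of the total number of 1-bit full adders (FAs),
        with one folded MAC per dot product."""
        total = 0
        for num_macs, d, b_a, b_w in layers:
            fas_per_mac = b_a * b_w + (b_a + b_w + math.ceil(math.log2(d)))  # multiplier + accumulator
            total += num_macs * fas_per_mac
        return total

    def representational_cost(layers):
        """layers: list of (num_weights, B_W, num_activations, B_A).
        Returns the total number of bits needed to store weights and activations."""
        return sum(nw * bw + na * ba for nw, bw, na, ba in layers)

    # 784 - 1000 - 1000 - 10 fully connected network at 8b weights and activations.
    fc = [(1000, 784, 8, 8), (1000, 1000, 8, 8), (10, 1000, 8, 8)]
    print(computational_cost(fc))
    print(representational_cost([(784 * 1000, 8, 784, 8), (1000 * 1000, 8, 1000, 8), (1000 * 10, 8, 1000, 8)]))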

  13. Precision Profiles – CIFAR-10
  VGG-11 ConvNet on CIFAR-10: 32C3-32C3-MP2-64C3-64C3-MP2-128C3-128C3-256FC-256FC-10; precision chosen such that q_n ≤ 1%
  • Weight precision requirements are greater than activation precision requirements [Sakr et al., ICML’17]
  • Precision decreases with depth – early layers are more sensitive to perturbations [Raghu et al., ICML’17]

  14. Precision – Accuracy Trade-offs
  • Precision reduction is greatest when using the proposed fine-grained precision assignment

  15. Precision – Accuracy – Complexity Trade-offs
  • The best complexity reduction is obtained with the proposed precision assignment
  • Complexity is even much smaller than that of a BinaryNet at the same accuracy, because the BinaryNet requires a much higher network complexity

  16. Conclusion & Future Work
  • Presented an analytical method to determine per-layer precision of neural networks
  • Method based on equalization of reflected quantization noise variances
  • Per-layer precision assignment reveals interesting properties of neural networks
  • Method leads to significant overall complexity reduction, e.g., minimum precision reduced to 2 bits and a lower implementation cost than BinaryNets
  • Future work:
  → Trade-offs between precision vs. depth vs. width
  → Trade-offs between precision and pruning

  17. Thank you! Acknowledgment: This work was supported in part by Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Centers, sponsored by MARCO and DARPA.
