Learning Accurate Low-bit Deep Neural Networks with Stochastic Quantization

Yinpeng Dong¹, Renkun Ni², Jianguo Li³, Yurong Chen³, Jun Zhu¹, Hang Su¹
¹ Department of CST, Tsinghua University   ² University of Virginia   ³ Intel Labs China
Deep Learning is Everywhere: Self-Driving, AlphaGo, Machine Translation, Dota 2
Limitations

- More data + deeper models → more FLOPs + larger memory
- Computation intensive
- Memory intensive
- Hard to deploy on mobile devices
Low-bit DNNs for Efficient Inference

- High redundancy in DNN weights;
- Quantize full-precision (32-bit) weights to binary (1-bit) or ternary (2-bit) weights;
- Replace multiplication (convolution) with addition and subtraction;
Typical Low-bit DNNs

- BinaryConnect: stochastic binarization,
  $B_i = +1$ with probability $p = \sigma(W_i)$, and $B_i = -1$ with probability $1 - p$
- BWN: minimize $\|W - \alpha B\|^2$, with
  $B_i = \mathrm{sign}(W_i)$ and $\alpha = \frac{1}{n}\sum_{i=1}^{n} |W_i|$
- TWN: minimize $\|W - \alpha T\|^2$, with
  $T_i = +1$ if $W_i > \Delta$, $T_i = 0$ if $|W_i| \le \Delta$, $T_i = -1$ if $W_i < -\Delta$,
  $\Delta = \frac{0.7}{n}\sum_{i=1}^{n} |W_i|$, and $\alpha = \frac{1}{|I_\Delta|}\sum_{i \in I_\Delta} |W_i|$ with $I_\Delta = \{i : |W_i| > \Delta\}$
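For concreteness, here is a minimal NumPy sketch of the BWN and TWN quantizers defined above; the function names are my own, and the scaling factors follow the formulas on this slide.

```python
import numpy as np

def bwn_quantize(w):
    """Binary-Weight-Network quantizer: W ~= alpha * B, with B = sign(W)."""
    alpha = np.abs(w).mean()            # alpha = (1/n) * sum_i |W_i|
    b = np.where(w >= 0, 1.0, -1.0)     # B_i = sign(W_i)
    return alpha, b

def twn_quantize(w):
    """Ternary-Weight-Network quantizer: W ~= alpha * T, with T in {-1, 0, +1}."""
    delta = 0.7 * np.abs(w).mean()      # Delta = (0.7/n) * sum_i |W_i|
    t = np.zeros_like(w)
    t[w > delta] = 1.0
    t[w < -delta] = -1.0
    mask = np.abs(w) > delta            # I_Delta = {i : |W_i| > Delta}
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha, t
```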
Training & Inference of Low-bit DNNs

- Let $W$ be the full-precision weights and $Q$ the low-bit weights ($B$, $T$, $\alpha B$, or $\alpha T$).
- Forward propagation: quantize $W$ to $Q$ and perform convolution or multiplication with $Q$.
- Backward propagation: use $Q$ to calculate gradients.
- Parameter update: $W_{t+1} = W_t - \eta_t \frac{\partial L}{\partial Q_t}$
- Inference: only the low-bit weights $Q$ need to be kept.
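As a rough illustration of this scheme (not the authors' released implementation), the hypothetical PyTorch layer below keeps the full-precision weights W but convolves with their BWN quantization Q, so the optimizer step updates W with the gradient computed through Q.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BWNConv2d(nn.Conv2d):
    """Convolution layer that keeps full-precision weights W but convolves
    with their BWN quantization Q = alpha * sign(W) in the forward pass."""

    def forward(self, x):
        w = self.weight
        alpha = w.abs().mean()              # alpha = (1/n) * sum_i |W_i|
        q = alpha * torch.sign(w)           # Q = alpha * B,  B_i = sign(W_i)
        # Straight-through trick: the forward pass uses Q, the gradient flows
        # back to W, so the step realizes W_{t+1} = W_t - eta_t * dL/dQ_t.
        q = w + (q - w).detach()
        return F.conv2d(x, q, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

Only Q (i.e., B plus one scalar alpha per layer) needs to be stored for inference.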
Motivations

- Existing methods quantize all weights simultaneously;
- The quantization error $\|W - Q\|$ may be large for some elements/filters;
- Large-error elements induce inappropriate gradient directions.

Our approach:
- Quantize only a portion of the weights at each iteration;
- Select that portion stochastically;
- The scheme can be applied to any low-bit setting.
Roulette Selection Algorithm

[Figure: a 4-channel weight matrix is stochastically partitioned with r = 50%; two roulette spins (v = 0.58, v = 0.37) select channels C2 and C3 for quantization, while C1 and C4 keep their full-precision values in the hybrid weight matrix.]

- Quantization error (per channel $i$): $e_i = \|W_i - Q_i\|_1 / \|W_i\|_1$
- Quantization probability: larger quantization error means smaller quantization probability, e.g. $p_i \propto 1/e_i$
- Quantization ratio $r$: gradually increased to 100%
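A minimal sketch of this roulette selection, assuming channel-major weight tensors and $p_i \propto 1/e_i$; the function name and the epsilon smoothing are my own assumptions.

```python
import numpy as np

def roulette_select(W, Q, r, eps=1e-7):
    """Pick round(r * num_channels) channels to quantize, favoring channels
    with small relative quantization error e_i = ||W_i - Q_i||_1 / ||W_i||_1."""
    n = W.shape[0]                                    # channels along axis 0
    e = (np.abs(W - Q).reshape(n, -1).sum(1)
         / (np.abs(W).reshape(n, -1).sum(1) + eps))
    f = 1.0 / (e + eps)                               # p_i proportional to 1/e_i
    selected, candidates = [], list(range(n))
    for _ in range(int(round(r * n))):
        p = f[candidates] / f[candidates].sum()
        idx = np.random.choice(len(candidates), p=p)  # one spin of the roulette
        selected.append(candidates.pop(idx))
    return selected
```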
Training & Inference

- Hybrid weight matrix: $\widetilde{W}_i = Q_i$ if channel $i$ is selected, $W_i$ otherwise
- Parameter update: $W_{t+1} = W_t - \eta_t \frac{\partial L}{\partial \widetilde{W}_t}$
- Inference: all weights are quantized; use $Q$ to perform inference
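Continuing the same sketch, forming the hybrid weight matrix and applying one step with the gradient taken through it (variable names are illustrative):

```python
import numpy as np

def hybrid_weights(W, Q, selected):
    """W_tilde: quantized rows for the selected channels, full precision otherwise."""
    W_tilde = W.copy()
    W_tilde[selected] = Q[selected]
    return W_tilde

# One training step, conceptually:
#   W_tilde = hybrid_weights(W, Q, roulette_select(W, Q, r))
#   grad    = backprop(loss, W_tilde)     # dL / dW_tilde
#   W      -= lr * grad                   # W_{t+1} = W_t - eta_t * dL/dW_tilde
```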
Ablation Studies

- Selection granularity:
  - Filter-level > element-level
- Selection/partition algorithms:
  - Stochastic (roulette) > deterministic (sorting) ≈ fixed (selection only at the first iteration)
- Quantization probability functions:
  - Linear > sigmoid > constant ≈ softmax, where the softmax variant normalizes the channel scores as $p_i = \exp(f_i) / \sum_j \exp(f_j)$, with $f_i$ a decreasing function of the error $e_i$
- Quantization ratio update scheme:
  - Exponential > fine-tune > uniform
  - 50% → 75% → 87.5% → 100%
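One hedged reading of the exponential schedule listed above (50% → 75% → 87.5% → 100%) is to halve the remaining full-precision fraction at each stage; the helper below is only an interpretation, not the paper's exact schedule.

```python
def sq_ratio_schedule(num_stages=4):
    """Returns [0.5, 0.75, 0.875, 1.0] for num_stages=4: each stage halves the
    un-quantized portion, and the last stage quantizes everything."""
    return [1.0 - 0.5 ** (k + 1) for k in range(num_stages - 1)] + [1.0]
```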
Results -- CIFAR

Test error (%) of VGG-9 and ResNet-56 trained with 5 different methods on CIFAR-10 and CIFAR-100:

Method    Bits   CIFAR-10 VGG-9   CIFAR-10 ResNet-56   CIFAR-100 VGG-9   CIFAR-100 ResNet-56
FWN       32     9.00             6.69                 30.68             29.49
BWN       1      10.67            16.42                37.68             35.01
SQ-BWN    1      9.40             7.15                 35.25             31.56
TWN       2      9.87             7.64                 34.80             32.09
SQ-TWN    2      8.37             6.20                 34.24             28.90

[Figure: training loss vs. iterations (k), comparing FWN, BWN, SQ-BWN (left) and FWN, TWN, SQ-TWN (right).]
Results -- ImageNet

Test error (%) of AlexNet-BN and ResNet-18 trained with 5 different methods on ImageNet:

Method    Bits   AlexNet-BN top-1   AlexNet-BN top-5   ResNet-18 top-1   ResNet-18 top-5
FWN       32     44.18              20.83              34.80             13.60
BWN       1      51.22              27.18              45.20             21.08
SQ-BWN    1      48.78              24.86              41.64             18.35
TWN       2      47.54              23.81              39.83             17.02
SQ-TWN    2      44.70              21.40              36.18             14.26
Conclusions

- We propose a stochastic quantization algorithm for training low-bit DNNs;
- The algorithm can be flexibly applied to any low-bit setting;
- It consistently improves performance;
- We release our code publicly for future development:
  https://github.com/dongyp13/Stochastic-Quantization
Q & A