

  1. Learning Accurate Low-bit Deep Neural Networks with Stochastic Quantization
     Yinpeng Dong¹, Renkun Ni², Jianguo Li³, Yurong Chen³, Jun Zhu¹, Hang Su¹
     ¹Department of CST, Tsinghua University; ²University of Virginia; ³Intel Labs China

  2. Deep Learning is Everywhere: self-driving, AlphaGo, machine translation, Dota 2

  3. Limitations
     - More data + deeper models → more FLOPs + larger memory
     - Computation intensive
     - Memory intensive
     - Hard to deploy on mobile devices

  4. Low-bit DNNs for Efficient Inference
     - High redundancy in DNNs;
     - Quantize full-precision (32-bit) weights to binary (1-bit) or ternary (2-bit) weights;
     - Replace multiplication (convolution) by addition and subtraction (see the sketch below).
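To make the last point concrete, here is a minimal sketch (not from the slides; the values are made up) showing that a dot product with binary weights in {−1, +1} needs no multiplications, only additions and subtractions:

```python
import numpy as np

# Hypothetical activations and binary weights, chosen only for illustration.
x = np.array([0.3, -1.2, 0.7, 2.0])   # activations
b = np.array([+1, -1, -1, +1])        # binary weights in {-1, +1}

full = np.dot(x, b)                                # usual multiply-accumulate
mult_free = x[b == +1].sum() - x[b == -1].sum()    # only adds and subtracts

assert np.isclose(full, mult_free)
```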

  5. Typical Low-bit DNNs (code sketch below)
     - BinaryConnect: B_i = +1 with probability p = σ(W_i), −1 with probability 1 − p
     - BWN: minimize ‖W − αB‖, with B_i = sign(W_i) and α = (1/n) ∑_i |W_i|
     - TWN: minimize ‖W − αT‖, with
         T_i = +1 if W_i > Δ, 0 if |W_i| ≤ Δ, −1 if W_i < −Δ,
         Δ = (0.7/n) ∑_i |W_i|, α = (1/|I_Δ|) ∑_{i∈I_Δ} |W_i|, where I_Δ = {i : |W_i| > Δ}
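A minimal NumPy sketch of the deterministic BWN and TWN quantizers above (an illustration of the formulas, not the authors' released code):

```python
import numpy as np

def bwn_quantize(W):
    """BWN: approximate W by alpha * B with B_i = sign(W_i)."""
    alpha = np.mean(np.abs(W))               # alpha = (1/n) * sum_i |W_i|
    B = np.where(W >= 0, 1.0, -1.0)          # sign(W_i), mapping 0 to +1
    return alpha, B

def twn_quantize(W):
    """TWN: approximate W by alpha * T with T_i in {-1, 0, +1}."""
    delta = 0.7 * np.mean(np.abs(W))         # Delta = (0.7/n) * sum_i |W_i|
    T = np.zeros_like(W)
    T[W > delta] = 1.0
    T[W < -delta] = -1.0
    mask = np.abs(W) > delta                 # I_Delta = {i : |W_i| > Delta}
    alpha = np.abs(W[mask]).mean() if mask.any() else 0.0
    return alpha, T
```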

  6. Training & Inference of Low-bit DNNs
     - Let W be the full-precision weights and Q the low-bit weights (B, T, αB, or αT).
     - Forward propagation: quantize W to Q and perform the convolution or multiplication with Q.
     - Backward propagation: use Q to calculate the gradients.
     - Parameter update: W^{t+1} = W^t − η_t ∂L/∂Q, i.e. the gradient w.r.t. Q updates the full-precision W (see the sketch below).
     - Inference: only the low-bit weights Q need to be kept.
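A toy end-to-end illustration of this update rule on a single linear layer with a squared loss (a sketch under assumed names, not the paper's implementation): the forward and backward passes use the quantized weights Q, while the SGD step is applied to the full-precision weights W.

```python
import numpy as np

def bwn_quantize(W):
    return np.mean(np.abs(W)) * np.where(W >= 0, 1.0, -1.0)   # alpha * sign(W)

def train_step(W, x, y, lr):
    Q = bwn_quantize(W)           # forward: quantize W to Q
    pred = x @ Q                  # layer computed with the low-bit weights
    grad_Q = x.T @ (pred - y)     # backward: grad of 0.5*||x Q - y||^2 w.r.t. Q
    return W - lr * grad_Q        # update the full-precision weights with dL/dQ

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 1))
x, y = rng.normal(size=(32, 8)), rng.normal(size=(32, 1))
for _ in range(20):
    W = train_step(W, x, y, lr=0.01)
# For inference, only Q = bwn_quantize(W) is stored.
```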

  7. Motivations
     Existing methods quantize all weights simultaneously:
     - The quantization error ‖W − Q‖ may be large for some elements/filters;
     - This induces inappropriate gradient directions.
     Our idea: quantize only a portion of the weights at a time:
     - Stochastic selection of which weights to quantize;
     - Applicable to any low-bit setting.

  8. Roulette Selection Algorithm
     [Figure: a 4x4 example weight matrix is stochastically partitioned by channel with ratio r = 50%;
      two roulette selections (v = 0.58 picks C2, v = 0.37 picks C3) quantize those rows, while the
      other rows stay full-precision, giving a hybrid weight matrix.]
     - Quantization error: e_i = ‖W_i − Q_i‖_1 / ‖W_i‖_1
     - Quantization probability: a larger quantization error means a smaller quantization probability,
       e.g. p_i ∝ 1/e_i (see the sketch below)
     - Quantization ratio r: gradually increased to 100%
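A minimal NumPy sketch of roulette channel selection as described above (the function and variable names are my own, not the paper's; channels are assumed to be rows):

```python
import numpy as np

def roulette_select(W, Q, r, rng=None):
    """Pick round(r * #channels) rows to quantize; rows with a larger
    relative quantization error get a smaller selection probability."""
    rng = rng or np.random.default_rng()
    err = np.abs(W - Q).sum(axis=1) / (np.abs(W).sum(axis=1) + 1e-12)  # e_i
    f = 1.0 / (err + 1e-12)                 # e.g. f_i = 1 / e_i
    p = f / f.sum()                         # selection probabilities p_i
    n_sel = int(round(r * W.shape[0]))
    return rng.choice(W.shape[0], size=n_sel, replace=False, p=p)
```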

  9. Training & Inference
     - Hybrid weight matrix Q̃: Q̃_i = Q_i if channel i is selected, W_i otherwise (see the sketch below)
     - Parameter update: W^{t+1} = W^t − η_t ∂L/∂Q̃
     - Inference: all weights are quantized; use Q to perform inference
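A small sketch of how the hybrid matrix could be assembled from the roulette selection (my own helper name, assuming channels are rows):

```python
import numpy as np

def hybrid_weights(W, Q, selected):
    """Quantized rows for the selected channels, full-precision rows elsewhere."""
    Q_tilde = W.copy()
    Q_tilde[selected] = Q[selected]
    return Q_tilde

# Gradients are then taken w.r.t. Q_tilde and applied to the full-precision W,
# exactly as in the low-bit update rule on slide 6.
```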

  10. Ablation Studies
      - Selection granularity: filter-level > element-level
      - Selection/partition algorithm: stochastic (roulette) > deterministic (sorting) ≈ fixed (selection only at the first iteration)
      - Quantization probability function: linear > sigmoid > constant ≈ softmax,
        where the softmax is p_i = exp(f_i) / ∑_j exp(f_j) with f_i = 1/e_i
      - Quantization ratio update scheme: exponential > fine-tune > uniform,
        e.g. 50% → 75% → 87.5% → 100% (see the sketch below)
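One possible reading of the exponential ratio schedule (50% → 75% → 87.5% → 100%) is that the remaining full-precision share is halved at each stage, with everything quantized in the final stage; this is my interpretation, not stated explicitly on the slide:

```python
def ratio_schedule(n_stages=4):
    """Exponential quantization-ratio schedule (assumed): halve the
    full-precision share each stage, then quantize everything at the end."""
    r, stages = 0.5, []
    for _ in range(n_stages - 1):
        stages.append(r)
        r = 1.0 - (1.0 - r) / 2.0
    stages.append(1.0)
    return stages

print(ratio_schedule())  # [0.5, 0.75, 0.875, 1.0]
```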

  11. Results -- CIFAR
      Test error (%) of VGG-9 and ResNet-56 trained with 5 different methods on CIFAR-10 and CIFAR-100:

      Method   Bits   CIFAR-10 VGG-9   CIFAR-10 ResNet-56   CIFAR-100 VGG-9   CIFAR-100 ResNet-56
      FWN       32         9.00              6.69               30.68               29.49
      BWN        1        10.67             16.42               37.68               35.01
      SQ-BWN     1         9.40              7.15               35.25               31.56
      TWN        2         9.87              7.64               34.80               32.09
      SQ-TWN     2         8.37              6.20               34.24               28.90

      [Figure: training-loss curves vs. iterations (0-256k), left panel comparing FWN, BWN, SQ-BWN and right panel comparing FWN, TWN, SQ-TWN.]

  12. Results -- ImageNet
      Error (%) of AlexNet-BN and ResNet-18 trained with 5 different methods on ImageNet:

      Method   Bits   AlexNet-BN top-1   AlexNet-BN top-5   ResNet-18 top-1   ResNet-18 top-5
      FWN       32         44.18              20.83              34.80             13.60
      BWN        1         51.22              27.18              45.20             21.08
      SQ-BWN     1         48.78              24.86              41.64             18.35
      TWN        2         47.54              23.81              39.83             17.02
      SQ-TWN     2         44.70              21.40              36.18             14.26

  13. Conclusions
      - We propose a stochastic quantization algorithm for training low-bit DNNs;
      - The algorithm can be flexibly applied to any low-bit setting;
      - It consistently improves performance;
      - We release our code to the public for future development:
        https://github.com/dongyp13/Stochastic-Quantization

  14. Q & A
