Learning Accurate Low-bit Deep Neural Networks with Stochastic Quantization
Yinpeng Dong1, Renkun Ni2, Jianguo Li3, Yurong Chen3, Jun Zhu1, Hang Su1
1Department of CST, Tsinghua University 2University of Virginia 3Intel Labs China
• More data + deeper models → more FLOPs + larger models
• Computation intensive
• Memory intensive
• Hard to deploy on mobile devices
• High redundancy in DNNs
• Quantize full-precision (32-bit) weights to binary (1-bit) ones
• Replace multiplications (convolutions) with additions and subtractions
• BinaryConnect: $Q = \mathrm{sign}(W)$, quantizing each weight to $\pm 1$
• BWN: minimize $\|W - \alpha B\|^2$ with $B \in \{-1,+1\}^n$; the solution is $B = \mathrm{sign}(W)$, $\alpha = \frac{1}{n}\sum_{i=1}^{n}|W_i|$
• TWN: minimize $\|W - \alpha T\|^2$ with $T \in \{-1,0,+1\}^n$; the solution is $T_i = +1$ if $W_i > \Delta$, $T_i = 0$ if $|W_i| \le \Delta$, $T_i = -1$ if $W_i < -\Delta$, with $\Delta = \frac{0.7}{n}\sum_{i=1}^{n}|W_i|$ and $\alpha$ the mean of $|W_i|$ over $\{i : |W_i| > \Delta\}$ (see the sketch below)
• Let $W$ be the full-precision weights and $Q$ be the low-bit weights
• Forward propagation: quantize $W$ to $Q$ and compute with $Q$
• Backward propagation: use $Q$ to calculate the gradients
• Parameter update: $W^{t+1} = W^{t} - \eta^{t}\frac{\partial L}{\partial Q^{t}}$ (see the sketch below)
• Inference: only need to keep the low-bit weights $Q$
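A minimal sketch of one training step under the procedure above; `quantize` and `compute_gradients` are hypothetical placeholders standing in for the quantizer and for backpropagation through the network.

```python
def train_step(W, lr, quantize, compute_gradients):
    """One low-bit training step: forward/backward with Q, update on W."""
    Q = quantize(W)               # forward: the network computes with low-bit Q
    dL_dQ = compute_gradients(Q)  # backward: gradients w.r.t. the quantized weights
    return W - lr * dL_dQ         # update the full-precision weights with those gradients
```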
• Previous methods quantize all weights simultaneously; the quantization error $\|W - Q\|$ may be large for some elements/filters
• This induces inappropriate gradient directions
• Our idea: quantize only a portion of the weights at each iteration
• Select that portion stochastically
• Can be applied to any low-bit setting
Figure: an example of stochastic partition with ratio r = 50% on a four-filter weight matrix (rows C1–C4). Given the per-filter quantization errors, a roulette wheel picks the filters to quantize (1st selection: v = 0.58, C2 selected, then rotation; 2nd selection: v = 0.37, C3 selected), yielding a hybrid weight matrix of quantized and full-precision rows.
" โ ๐ " B
" B
]^
• Hybrid weight matrix $\widehat{W}$: $\widehat{W}_i = Q_i$ if the $i$-th filter is selected, $\widehat{W}_i = W_i$ otherwise
• Parameter update: update all weights with the gradients computed from $\widehat{W}$ (see the sketch below)
• Inference: all weights are quantized; use $Q$ to perform the computation
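A NumPy sketch of one SQ step on a weight matrix `W` with one filter per row, assuming the linear probability function; `quantize` is a per-filter quantizer such as BWN/TWN above, and drawing without replacement stands in for the repeated roulette rotations.

```python
import numpy as np

def sq_step(W, quantize, r, eps=1e-7):
    """Quantize a ratio r of the filters (rows) of W; keep the rest full-precision."""
    Q = np.stack([quantize(w) for w in W])       # quantize every filter
    e = np.abs(W - Q).sum(1) / np.abs(W).sum(1)  # e_i = ||W_i - Q_i||_1 / ||W_i||_1
    v = 1.0 / (e + eps)                          # smaller error -> more likely to quantize
    p = v / v.sum()                              # linear probability function
    n_sel = int(round(r * len(W)))
    sel = np.random.choice(len(W), n_sel, replace=False, p=p)  # roulette selection
    W_hybrid = W.copy()
    W_hybrid[sel] = Q[sel]                       # hybrid matrix: Q_i if selected, else W_i
    return W_hybrid
```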
• Selection granularity:
  ◦ Filter-level > element-level
• Selection/partition algorithms:
  ◦ Stochastic (roulette) > deterministic (sorting) ≈ fixed
• Quantization probability functions (see the sketch after this list):
  ◦ Linear > sigmoid > constant ≈ softmax
  ◦ e.g., softmax: $p_i = \frac{\exp(v_i)}{\sum_j \exp(v_j)}$, where $v_i = \frac{1}{e_i}$
• Quantization ratio update scheme:
  ◦ Exponential > fine-tune > uniform
  ◦ e.g., 50% → 75% → 87.5% → 100%
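A sketch of the probability functions compared above; the epsilon term and the max-subtraction in the softmax are numerical-stability assumptions, not from the slides.

```python
import numpy as np

def selection_probs(e, kind="linear", eps=1e-7):
    """Map per-filter quantization errors e_i to selection probabilities p_i."""
    v = 1.0 / (e + eps)                       # v_i = 1 / e_i
    if kind == "constant":
        return np.full(len(e), 1.0 / len(e))  # uniform over filters
    if kind == "linear":
        return v / v.sum()                    # p_i = v_i / sum_j v_j
    if kind == "softmax":
        z = np.exp(v - v.max())               # p_i = exp(v_i) / sum_j exp(v_j)
        return z / z.sum()
    if kind == "sigmoid":
        s = 1.0 / (1.0 + np.exp(-v))
        return s / s.sum()
    raise ValueError(kind)

# Exponential schedule for the SQ ratio r across training stages: 50% -> 75% -> 87.5% -> 100%.
ratios = [0.5, 0.75, 0.875, 1.0]
```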
Method   Bits   CIFAR-10             CIFAR-100
                VGG-9    ResNet-56   VGG-9    ResNet-56
FWN       32     9.00      6.69      30.68     29.49
BWN        1    10.67     16.42      37.68     35.01
SQ-BWN     1     9.40      7.15      35.25     31.56
TWN        2     9.87      7.64      34.80     32.09
SQ-TWN     2     8.37      6.20      34.24     28.90
Table: test error (%) of VGG-9 and ResNet-56 trained with 5 different methods on the CIFAR-10 and CIFAR-100 datasets.
Figure: training loss vs. iterations (k) for FWN, BWN, and SQ-BWN (left) and for FWN, TWN, and SQ-TWN (right).
Method   Bits   AlexNet-BN        ResNet-18
                top-1   top-5     top-1   top-5
FWN       32    44.18   20.83     34.80   13.60
BWN        1    51.22   27.18     45.20   21.08
SQ-BWN     1    48.78   24.86     41.64   18.35
TWN        2    47.54   23.81     39.83   17.02
SQ-TWN     2    44.70   21.40     36.18   14.26
Table: top-1/top-5 error (%) of AlexNet-BN and ResNet-18 trained with 5 different methods on ImageNet.
• We propose a stochastic quantization algorithm for learning accurate low-bit DNNs
• Our algorithm can be flexibly applied to any low-bit setting (e.g., BWN, TWN)
• Our algorithm helps to consistently improve the accuracy of low-bit DNNs
• We release our code to the public for future development:
  ◦ https://github.com/dongyp13/Stochastic-Quantization