CMSC5743 L06: Binary/Ternary Network
Bei Yu
(Latest update: November 2, 2020)
Fall 2020
1 / 21
These slides contain/adapt materials developed by:
– Ritchie Zhao et al. (2017). "Accelerating binarized convolutional neural networks with software-programmable FPGAs". In: Proc. FPGA, pp. 15–24
– Mohammad Rastegari et al. (2016). "XNOR-Net: ImageNet classification using binary convolutional neural networks". In: Proc. ECCV, pp. 525–542
2 / 21
[Figure: histogram of trained weight values; x-axis: Weight Value (−0.05 to 0.05), y-axis: Count (up to 6400)]
3 / 21
CNN:
  Input Map: 2.4 6.2 … 3.3 1.8
  Weights: 0.8 0.1 0.3 0.8
  Output Map: 5.0 9.1 … 4.3 7.8

BNN:
  Input Map (Binary): 1 −1 … 1 1
  Weights (Binary): 1 −1 1 −1
  Output Map (Integer): 1 −3 … 3 −7

Batch normalization: y_ij = (x_ij − µ) / σ
Binarization: b_ij = +1 if y_ij ≥ 0, −1 otherwise

A BNN keeps inputs and weights binary in each layer; the convolution produces integer outputs, which batch normalization re-centers before they are binarized for the next layer.
4 / 21
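The arithmetic above can be sketched in a few lines of NumPy. The window values and the batch-norm statistics (µ, σ) below are hypothetical stand-ins for one output position of a binary conv layer, not values from the slides:

```python
import numpy as np

# One window of a binary input map and one binary filter, values in {+1, -1}
x_bin = np.array([1, -1, 1, 1])
w_bin = np.array([1, -1, 1, -1])

# The binary convolution produces an *integer* output map entry
y_int = int(x_bin @ w_bin)            # 1 + 1 + 1 - 1 = 2

# Batch normalization re-centers the integer output (mu, sigma assumed trained)
mu, sigma = 0.5, 2.0
y_norm = (y_int - mu) / sigma

# Binarization feeds the next binary layer
b = 1 if y_norm >= 0 else -1
```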
[2] M. Courbariaux et al. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1
[Figure: BNN architecture (VGG-style) on CIFAR-10; feature map dimensions 32×32 → 16×16 → 8×8 → 4×4; number of feature maps: 3, 128, 128, 256, 256, 512, 512, 1024, 1024, 10]
– Encode {+1, −1} as {0, 1} → multiplies become XORs
– Conv/dense layers do dot products → XOR and popcount
– Operations can map to LUT fabric as opposed to DSPs
– Fewer bits per weight may be offset by having more weights
Multiplication over {+1, −1}:
b1   b2   b1 × b2
+1   +1   +1
+1   −1   −1
−1   +1   −1
−1   −1   +1

Encoding +1 as 0 and −1 as 1, multiplication becomes XOR:
b1   b2   b1 XOR b2
0    0    0
0    1    1
1    0    1
1    1    0
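With the {0, 1} encoding, a whole dot product runs as one XOR plus a popcount. A minimal Python sketch with bit-packed words (the example vectors are assumptions for illustration):

```python
# Encode +1 -> 0 and -1 -> 1, then pack elements into the bits of an int.
# Bit i holds element i (LSB first).
x = 0b0101   # elements [-1, +1, -1, +1]
w = 0b0011   # elements [-1, -1, +1, +1]
n = 4        # vector length

# XOR is 1 exactly where the +/-1 product would be -1
neg = bin(x ^ w).count("1")   # popcount of mismatching positions

# dot = (# of +1 products) - (# of -1 products) = n - 2 * neg
dot = n - 2 * neg             # here: 4 - 2*2 = 0
```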
Architecture            Depth   Param Bits (Float)   Param Bits (Fixed-Point)   Error Rate (%)
ResNet [3] (CIFAR-10)   164     51.9M                13.0M*                     11.26
BNN [2]                 9                                                       11.40

– Conservative assumption: ResNet can use 8-bit weights
– BNN is based on VGG (a less advanced architecture)
– BNN seems to hold promise!
* Assuming each float param can be quantized to 8-bit fixed-point
[3] K. He, X. Zhang, S. Ren, and J. Sun. Identity Mappings in Deep Residual Networks. ECCV 2016.
Outline
– Minimize the Quantization Error
– Reduce the Gradient Error
5 / 21
Minimize the Quantization Error
6 / 21
1Mohammad Rastegari et al. (2016). "XNOR-NET: Imagenet classification using binary convolutional neural networks". In: Proc. ECCV, pp. 525–542
6 / 21
[Chart: AlexNet Top-1 (%) on ILSVRC2012; Full Precision: 56.7, Naïve binarization: 0.2]
Train for binary weights:
3. Load a random input image X
4. W_B = sign(W)
5. α = ‖W‖ℓ1 / n
6. Forward pass with α, W_B
7. Compute loss function C
8. ∂C/∂W = Backward pass with α, W_B
9. Update W (W ← W − ∂C/∂W)
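One iteration of steps 3–9 can be sketched with NumPy on a toy linear model. The data, squared-error loss, and learning rate are assumptions for illustration; XNOR-Net computes α per filter, and the backward pass here is the usual straight-through sketch that ignores ∂α/∂W:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=4)           # real-valued weights, kept for updates

# Step 3: load a random input (a toy vector in place of an image)
X = rng.normal(size=4)
target = 1.0

# Steps 4-5: binarize and compute the scaling factor alpha = ||W||_l1 / n
W_B = np.sign(W)
alpha = np.abs(W).sum() / W.size

# Step 6: forward pass with alpha, W_B
y = (alpha * W_B) @ X

# Step 7: compute loss function C
C = (y - target) ** 2

# Step 8: backward pass with alpha, W_B (straight-through:
# treat sign as identity, so dC/dW ~ dC/dy * alpha * X)
dC_dW = 2 * (y - target) * alpha * X

# Step 9: update the real-valued W
lr = 0.1
W = W - lr * dC_dW
```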
[Chart: AlexNet Top-1 (%) on ILSVRC2012; Full Precision: 56.7, Naïve: 0.2, Binary Weight: 56.8]
XNOR-Net approximates a convolution in three steps:
(1) Binarizing Weights: W ∈ ℝ → B = sign(W)
(2) Binarizing Input: binarizing each window separately repeats the scaling computation over overlapping areas (redundant, inefficient). Efficient version: compute A = Σ_i |X_{:,:,i}| / c once over the whole input, then apply an average filter to A to obtain the scaling factor of every window; the input itself is binarized with sign(X).
(3) Convolution with XNOR-Bitcount
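The efficient variant of step (2) can be sketched in NumPy: average |X| over channels once, then one pass of an average filter yields every window's scaling factor. The shapes below are assumptions, and the loops stand in for a library convolution:

```python
import numpy as np

rng = np.random.default_rng(0)
c, h, w, k = 3, 8, 8, 3                  # channels, height, width, filter size
X = rng.normal(size=(c, h, w))

# A = sum_i |X_{:,:,i}| / c : channel-wise mean of |X|, computed once
A = np.abs(X).mean(axis=0)

# Convolve A with a k x k average filter (plain loops for clarity):
# K[i, j] is the scaling factor of the window at (i, j), with no
# recomputation over overlapping areas
K = np.empty((h - k + 1, w - k + 1))
for i in range(h - k + 1):
    for j in range(w - k + 1):
        K[i, j] = A[i:i + k, j:j + k].mean()

sign_X = np.sign(X)                      # the binarized input itself
```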
[Chart: AlexNet Top-1 (%) on ILSVRC2012; Full Precision: 56.7, Naïve: 0.2, Binary Weight: 56.8, XNOR-Net: 30.5]
Binary activation: sign(x) maps inputs to {−1, +1}.
A typical block in CNN: Conv → BNorm → Activ → Pool (Max-Pooling)
Reordered block in XNOR-Net: BNorm → BinActiv → BinConv → Pool (vs. the typical Conv → BNorm → Activ → Pool)
✓ 32× Smaller Model
[Chart: AlexNet Top-1 (%) on ILSVRC2012; Full Precision: 56.7, Naïve: 0.2, Binary Weight: 56.8, XNOR-Net: 30.5 → 44.2 with the reordered block]

Model size, Float vs Binary:
AlexNet: 245 MB → 7.4 MB
VGG: 500 MB → 16 MB
ResNet-18: 100 MB → 1.5 MB

✓ 58× Less Computation
[Charts: speedup by varying channel size (1–1024 channels, 0×–80×); speedup by varying filter size (up to 20×20, 50×–65×)]
[Chart: AlexNet Top-1 & Top-5 (%) on ILSVRC2012]
Earlier binarization schemes, BinaryConnect ("BinaryConnect: Training deep neural networks with binary weights during propagations". In: Advances in Neural Information Processing Systems) and BNN ("Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1". In: arXiv preprint arXiv:1602.02830), suffer accuracy loss; the goal is to make this loss as small as possible.
7 / 21
8 / 21
ABC-Net: approximate the real-valued weights with a linear combination of multiple binary weight bases.
9 / 21
W ≈ α₁B₁ + … + α_M B_M,  B_m = sign(W̄ + u_m std(W))
where W̄ = W − mean(W) and u_m = −1 + (m − 1) · 2/(M − 1).
10 / 21
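A sketch of approximating weights with M shifted binary bases in the style of ABC-Net. The toy weights and the least-squares fit for the α's are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=16)      # toy real-valued weights
M = 3                        # number of binary bases

W_bar = W - W.mean()
std = W.std()

# Shifted binarizations: B_m = sign(W_bar + u_m * std),
# u_m = -1 + (m - 1) * 2 / (M - 1)  ->  -1, 0, +1 for M = 3
u = [-1 + (m - 1) * 2 / (M - 1) for m in range(1, M + 1)]
B = np.stack([np.sign(W_bar + u_m * std) for u_m in u], axis=1)

# Fit the alphas so that W ~ B @ alpha (least squares)
alpha, *_ = np.linalg.lstsq(B, W, rcond=None)
W_approx = B @ alpha
```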
11 / 21
Activations are binarized the same way, A = H_v(R), where v is a shift parameter and R is the real-valued activation.
12 / 21
Approximate the convolution with a linear combination of binary convolutions, with activations binarized using different functions H_{v1}, H_{v2}, H_{v3}:
W ≈ Σ_{m=1}^{M} α_m B_m,  R ≈ Σ_{n=1}^{N} β_n A_n
Conv(W, R) ≈ Σ_{m=1}^{M} Σ_{n=1}^{N} α_m β_n Conv(B_m, A_n)
13 / 21
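The double sum is just bilinearity of convolution. A NumPy sketch with random binary stand-ins for the bases; `conv2d_valid` is a hypothetical helper, not the paper's implementation:

```python
import numpy as np

def conv2d_valid(A, B):
    """Plain valid 2-D correlation, standing in for Conv."""
    kh, kw = B.shape
    out = np.empty((A.shape[0] - kh + 1, A.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (A[i:i + kh, j:j + kw] * B).sum()
    return out

rng = np.random.default_rng(1)
M, N = 3, 3
alphas = rng.normal(size=M)
betas = rng.normal(size=N)
Bs = [np.sign(rng.normal(size=(3, 3))) for _ in range(M)]   # binary filters
As = [np.sign(rng.normal(size=(8, 8))) for _ in range(N)]   # binary activations

# Conv(W, R) ~ sum_m sum_n alpha_m * beta_n * Conv(B_m, A_n)
out = sum(alphas[m] * betas[n] * conv2d_valid(As[n], Bs[m])
          for m in range(M) for n in range(N))

# Bilinearity check: equals one convolution of the two linear combinations
W_hat = sum(a * B for a, B in zip(alphas, Bs))
R_hat = sum(b * A for b, A in zip(betas, As))
assert np.allclose(out, conv2d_valid(R_hat, W_hat))
```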
2Xiaofan Lin, Cong Zhao, and Wei Pan (2017). “Towards accurate binary convolutional neural network”. In: Advances in
Neural Information Processing Systems, pp. 345–353.
14 / 21
Reduce the Gradient Error
15 / 21
There is a gradient mismatch between the gradient of the binarization function (zero almost everywhere) and the approximate gradient with which the real-valued weights are updated.
16 / 21
∂L/∂A_r^{l,t} = (∂L/∂A_b^{l,t}) · (∂A_b^{l,t} / ∂A_r^{l,t}) = (∂L/∂A_b^{l,t}) · (∂Sign(A_r^{l,t}) / ∂A_r^{l,t}) ≈ (∂L/∂A_b^{l,t}) · (∂F(A_r^{l,t}) / ∂A_r^{l,t})

where F(·) is a differentiable approximation function of the Sign function.
17 / 21
Bi-Real Net uses ApproxSign as the approximation F:

ApproxSign(x) =
  −1,        if x < −1
  2x + x²,   if −1 ≤ x < 0
  2x − x²,   if 0 ≤ x < 1
  1,         otherwise

∂ApproxSign(x)/∂x =
  2 + 2x,    if −1 ≤ x < 0
  2 − 2x,    if 0 ≤ x < 1
  0,         otherwise
18 / 21
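The piecewise function and its derivative translate directly to code; a sketch of the approximation used in place of the Sign gradient during backpropagation:

```python
import numpy as np

def approx_sign(x):
    # Piecewise-polynomial approximation of sign(x) from Bi-Real Net
    return np.where(x < -1, -1.0,
           np.where(x < 0, 2 * x + x ** 2,
           np.where(x < 1, 2 * x - x ** 2, 1.0)))

def approx_sign_grad(x):
    # d ApproxSign / dx: 2 + 2x on [-1, 0), 2 - 2x on [0, 1), else 0
    return np.where((x >= -1) & (x < 0), 2 + 2 * x,
           np.where((x >= 0) & (x < 1), 2 - 2 * x, 0.0))

# Backward pass: dL/dA_r = dL/dA_b * approx_sign_grad(A_r)
```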
3Zechun Liu et al. (2018). "Bi-Real Net: Enhancing the performance of 1-bit CNNs with improved representational capability and advanced training algorithm". In: Proc. ECCV, pp. 722–737.
19 / 21
Overview of the trained ternary quantization procedure.
4Chenzhuo Zhu et al. (2017). "Trained ternary quantization". In: Proc. ICLR.
20 / 21
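A sketch of the ternarization step in trained ternary quantization: threshold the full-precision weights, then assign learned positive/negative scales. The threshold fraction and scale values below are hypothetical stand-ins, not trained values:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=16)              # full-precision weights

# Threshold Delta set relative to the largest weight magnitude
delta = 0.05 * np.abs(W).max()

# Scales for positive and negative weights (assumed here; TTQ learns
# W_p and W_n by gradient descent)
W_p, W_n = 1.0, 1.2

# Ternary weights take values in {+W_p, 0, -W_n}
W_t = np.where(W > delta, W_p,
      np.where(W < -delta, -W_n, 0.0))
```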
Ternary weights value (above) and distribution (below) with iterations for different layers of ResNet-20 on CIFAR-10.
21 / 21