CENG5030 Part 2-4: CNN Inaccurate Speedup-2 — Quantization
Bei Yu
(Latest update: March 25, 2019)
Spring 2019
These slides contain/adapt materials developed by:
- Suyog Gupta et al. (2015). "Deep learning with limited numerical precision". In: Proc. ICML, pp. 1737–1746
- Ritchie Zhao et al. (2017). "Accelerating binarized convolutional neural networks with software-programmable FPGAs". In: Proc. FPGA, pp. 15–24
- Mohammad Rastegari et al. (2016). "XNOR-Net: ImageNet classification using binary convolutional neural networks". In: Proc. ECCV, pp. 525–542
[Motivation comic: "What should I learn to do well in computer vision research?" — "DEEP LEARNING" — but deep learning needs a GPU server — "Ohhh No!!!"]
Outline
- Fixed-Point Representation
- Binary/Ternary Network
- Reading List
Fixed-Point Representation
Number representation: fixed-point format ⟨IL, FL⟩ with word length WL = IL + FL (integer bits plus fractional bits).
Granularity: the smallest representable step is ε = 2^(−FL); representable values cover [−2^(IL−1), 2^(IL−1) − 2^(−FL)].

1 Suyog Gupta et al. (2015). "Deep learning with limited numerical precision". In: Proc. ICML, pp. 1737–1746.
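As a concrete illustration (a minimal sketch, not the paper's implementation), converting a real number to the ⟨IL, FL⟩ format with round-to-nearest and saturation can be written as:

```python
def to_fixed_point(x, il, fl):
    """Quantize x to the <IL, FL> fixed-point format: round to the
    nearest multiple of eps = 2**-fl, then saturate to the
    representable range [-2**(il-1), 2**(il-1) - eps]."""
    eps = 2.0 ** -fl
    lo = -(2.0 ** (il - 1))
    hi = 2.0 ** (il - 1) - eps
    q = round(x / eps) * eps
    return min(max(q, lo), hi)
```

For example, to_fixed_point(0.3, 4, 4) returns 0.3125, the nearest multiple of 2⁻⁴, and out-of-range inputs clamp to the limits of the ⟨4, 4⟩ range.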
Multiply-and-ACCumulate (MACC): each WL-bit multiplier feeds a wide (48-bit) accumulator, so partial sums are kept at high precision and rounding back to the WL-bit format happens only once, on the final result.
Rounding a number x to granularity ε (with ⌊x⌋ the largest multiple of ε not exceeding x):
- Round-to-nearest: deterministically pick the closer of ⌊x⌋ and ⌊x⌋ + ε.
- Stochastic rounding: pick either ⌊x⌋ or ⌊x⌋ + ε at random, rounding up with probability (x − ⌊x⌋)/ε; the expected rounding error is then zero, E[Round(x, ε)] = x.
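A minimal sketch of stochastic rounding (illustrative, not the paper's hardware implementation):

```python
import math
import random

def stochastic_round(x, eps):
    """Round x down to floor = floor(x/eps)*eps, or up to floor + eps
    with probability (x - floor)/eps, so that E[result] = x."""
    floor = math.floor(x / eps) * eps
    p_up = (x - floor) / eps
    return floor + eps if random.random() < p_up else floor

# Averaging many rounded copies of the same value recovers it, which is
# what keeps tiny parameter updates from vanishing during training:
vals = [stochastic_round(0.3, 2 ** -4) for _ in range(100_000)]
mean = sum(vals) / len(vals)   # close to 0.3
```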
[Training curves: test error vs. epochs for float and fixed-point training with fractional lengths FL = 14, 10, 8.]
With round-to-nearest, accuracy degrades as precision is lowered: small parameter updates are rounded to zero when using the round-to-nearest scheme. With stochastic rounding, low-precision training closely tracks the float baseline.
FPGA prototype (Xilinx Kintex K325T): an n × n wavefront systolic array of Multiply-and-ACCumulate (MACC) units computes the matrix product AB; arrows in the figure indicate dataflow. Input FIFOs feed matrices A and B, each MACC unit maps onto a DSP block, and the top-level controller and memory hierarchy (an L2 cache in BRAM with READ/WRITE paths, an L2-to-SA interface, and an AXI interface to an 8 GB DDR3 SO-DIMM) are designed to maximize data reuse.
L2 cache organization: BRAM buffers of p·n rows × n columns, multiplexed into the array, where n is the dimension of the systolic array and p is a parameter chosen based on the available BRAM resources.
Output path: results drain from the MACC array through local registers into the output C FIFOs via DSP ROUND units. Each accumulated result is stochastically rounded using a pseudo-random number generated by an LFSR: truncate the LSBs, and saturate to the limits if the result exceeds the representable range. These operations can be implemented efficiently using a single DSP unit.
Binary/Ternary Network
Convolution in a CNN vs. a Binarized Neural Network (BNN):
- CNN: real-valued input map (2.4, 6.2, …, 3.3, 1.8) ∗ real-valued weights (0.8, 0.1, 0.3, 0.8) → real-valued output map (5.0, 9.1, …, 4.3, 7.8).
- BNN: binary input map (1, −1, …, 1, 1) ∗ binary weights (1, −1, 1, −1) → integer output map (1, −3, …, 3, −7).
The integer outputs are batch-normalized and re-binarized for the next layer:

  x̃_ij = (x_ij − μ) / σ                    (batch normalization)
  x^b_ij = +1 if x̃_ij ≥ 0, −1 otherwise    (binarization)
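The pattern above can be sketched in a few lines (a toy single-output version; the names `bnn_dot`, `mu`, `sigma` are illustrative, not from the BNN paper):

```python
def binarize(xs):
    """Map real values to {+1, -1} by sign (with sign(0) taken as +1)."""
    return [1 if x >= 0 else -1 for x in xs]

def bnn_dot(x_real, w_real, mu=0.0, sigma=1.0):
    """One toy BNN step: binarize input and weights, take the integer
    dot product, batch-normalize with (mu, sigma), and re-binarize."""
    xb, wb = binarize(x_real), binarize(w_real)
    y = sum(a * b for a, b in zip(xb, wb))   # integer-valued output
    return 1 if (y - mu) / sigma >= 0 else -1
```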
BNN for CIFAR-10 [2]: a VGG-like network whose feature map dimensions shrink 32×32 → 16×16 → 8×8 → 4×4 while the number of feature maps grows 3 → 128 → 128 → 256 → 256 → 512 → 512, followed by fully-connected layers 1024 → 1024 → 10.

Why binarize?
- Encode {+1, −1} as {0, 1} → multiplies become XORs
- Conv/dense layers do dot products → XOR and popcount
- Operations can map to LUT fabric as opposed to DSPs
- Fewer bits per weight may be offset by having more weights

[2] M. Courbariaux et al. "Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or −1". 2016.
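The bullets above can be made concrete: with +1 encoded as bit 0 and −1 as bit 1, XOR of two packed words gives the signs of the elementwise products, and a popcount turns that into the dot product (a sketch; real implementations pack many bits per machine word):

```python
def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {+1,-1} vectors packed as integers
    (+1 -> bit 0, -1 -> bit 1). XOR marks the -1 products; if c of the
    n products are -1, the dot product is (n - c) - c = n - 2c."""
    c = bin(a_bits ^ b_bits).count("1")   # popcount
    return n - 2 * c

# a = [+1, -1, +1, +1] -> 0b0100 ; b = [+1, -1, -1, +1] -> 0b0110
# plain dot product: (+1)(+1) + (-1)(-1) + (+1)(-1) + (+1)(+1) = 2
```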
Multiplication over {+1, −1} vs. XOR over the {0, 1} encoding:

b1   b2   b1 × b2        b1   b2   b1 XOR b2
+1   +1   +1             0    0    0
+1   −1   −1             0    1    1
−1   +1   −1             1    0    1
−1   −1   +1             1    1    0
Architecture (CIFAR-10) | Depth | Param Bits (Float) | Param Bits (Fixed-Point) | Error Rate (%)
ResNet [3]              | 164   | 51.9M              | 13.0M*                   | 11.26
BNN [2]                 | 9     |                    |                          | 11.40

* Assuming each float param can be quantized to 8-bit fixed-point.
- Conservative assumption: ResNet can use 8-bit weights
- BNN is based on VGG (a less advanced architecture)
- BNN seems to hold promise!

[3] K. He, X. Zhang, S. Ren, and J. Sun. "Identity Mappings in Deep Residual Networks". ECCV 2016.
2 Mohammad Rastegari et al. (2016). "XNOR-Net: ImageNet classification using binary convolutional neural networks". In: Proc. ECCV, pp. 525–542.
Binary-Weight-Networks: approximate the real-valued weight filter W by a scaled binary filter, W ≈ αW^B, so a convolution becomes X ∗ W ≈ (X ⊕ W^B) α, where ⊕ is a multiplication-free convolution. The best approximation solves

  α∗, W^B∗ = argmin_{W^B, α} ‖W − αW^B‖²

with the closed-form solution (n = number of elements in W)

  W^B∗ = sign(W),   α∗ = (1/n) ‖W‖_ℓ1.
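The closed-form solution is two lines of code (a sketch; `binarize_weights` is an illustrative name):

```python
def binarize_weights(w):
    """Least-squares binary approximation W ~ alpha * W_B:
    W_B = sign(W) and alpha = (1/n) * ||W||_1, i.e. the mean
    absolute value of the weights."""
    wb = [1 if v >= 0 else -1 for v in w]
    alpha = sum(abs(v) for v in w) / len(w)
    return alpha, wb

alpha, wb = binarize_weights([0.8, -0.1, 0.3, -0.8])
# wb == [1, -1, 1, -1]; alpha is the mean of |W|, here ~0.5
```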
[Bar chart: AlexNet Top-1 (%) on ILSVRC2012: Full Precision 56.7; Naïve binarization 0.2.]
Training with binary weights (each iteration):
3. Load a random input image X
4. W^B = sign(W)
5. α = ‖W‖_ℓ1 / n
6. Forward pass with α, W^B
7. Compute loss function C
8. ∂C/∂W = backward pass with α, W^B
9. Update W (W = W − ∂C/∂W)
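A runnable toy version of this loop (a sketch under simplifying assumptions: a 1-D linear model with squared loss, hand-derived gradients, and the straight-through estimator treating sign as the identity; none of the names come from the paper):

```python
import random

def train_binary_weights(data, w, lr=0.1, epochs=30):
    """Toy binary-weight training: keep real-valued weights w, run the
    forward pass with the binarized copy alpha * sign(w), and apply the
    gradient update to the real-valued w (steps 3-9 above)."""
    n = len(w)
    for _ in range(epochs):
        x, target = random.choice(data)                  # 3. random sample
        wb = [1 if v >= 0 else -1 for v in w]            # 4. W_B = sign(W)
        alpha = sum(abs(v) for v in w) / n               # 5. alpha = ||W||_l1 / n
        y = alpha * sum(b * xi for b, xi in zip(wb, x))  # 6. forward pass
        err = y - target                                 # 7. loss C = err**2 / 2
        grad = [err * alpha * xi for xi in x]            # 8. straight-through dC/dW
        w = [wi - lr * g for wi, g in zip(w, grad)]      # 9. update real-valued W
    return w

# Fit target 2.0 for x = [1, -1]; the binarized model converges toward
# sign(w) = [+1, -1] with alpha approaching 1.
w = train_binary_weights([([1.0, -1.0], 2.0)], [0.5, 0.5])
```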
Backward through binarization: sign(x) has zero gradient almost everywhere, so its gradient g_x is approximated with the straight-through estimator [Hinton et al. 2012]: pass the gradient through as if sign were the identity, clipped to the region |x| ≤ 1.
[Bar chart: AlexNet Top-1 (%) on ILSVRC2012: Full Precision 56.7; Naïve 0.2; Binary Weight 56.8.]
XNOR-Networks: binarize both input and weights, X ∗ W ≈ (sign(X) ⊛ sign(W)) αβ, where ⊛ is a convolution using XNOR and bitcount. Analogously to the weights-only case:

  W^B∗ = sign(W),   α∗ = (1/n) ‖W‖_ℓ1
  X^B∗ = sign(X),   β∗ = (1/n) ‖X‖_ℓ1

obtained by solving min_{Y^B, γ} ‖Y − γY^B‖²₂ with Y = XW, Y^B = X^B W^B, γ = αβ.
Computing the input scaling factors β efficiently: naïvely computing β for every overlapping sub-tensor is inefficient due to redundant computation in the overlapping areas. Instead:
(1) Binarizing weights: W^B = sign(W) with scale α.
(2) Binarizing input: compute A = (Σᵢ |X_{:,:,i}|)/c, the per-pixel average of absolute values over the c channels, then convolve A with a 2D average filter k to obtain K = A ∗ k, one scaling factor per output location.
(3) Convolution with XNOR-Bitcount: X ∗ W ≈ (sign(X) ⊛ sign(W)) ⊙ Kα.
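Step (2) can be sketched as follows (pure Python, "valid" convolution; the name `input_scale_map` is illustrative):

```python
def input_scale_map(x, fh, fw):
    """XNOR-Net-style input scaling sketch: x is a list of c channels,
    each an H x W grid. A = per-pixel mean of |x| across channels;
    K = A convolved (valid) with an fh x fw average filter, giving one
    scaling factor per output location, computed once instead of once
    per overlapping sub-tensor."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    a = [[sum(abs(x[ch][i][j]) for ch in range(c)) / c for j in range(w)]
         for i in range(h)]
    return [[sum(a[i + di][j + dj] for di in range(fh) for dj in range(fw))
             / (fh * fw)
             for j in range(w - fw + 1)]
            for i in range(h - fh + 1)]
```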
[Bar chart: AlexNet Top-1 (%) on ILSVRC2012: Full Precision 56.7; Naïve 0.2; Binary Weight 56.8; binary input and weights 30.5.]
Block structure matters. A typical block in a CNN is Conv → BNorm → Activ → Pool. Binarizing with sign(x) and then max-pooling loses information (pooling over ±1 outputs returns mostly +1), so XNOR-Net reorders the block to BNorm → BinActiv → BinConv → Pool: normalize first, binarize, convolve, then pool the real-valued convolution outputs.
[Bar chart: AlexNet Top-1 (%) on ILSVRC2012: Full Precision 56.7; Naïve 0.2; Binary Weight 56.8; binary input and weights 30.5; XNOR-Net 44.2.]
✓ 32× smaller model. Float vs. binary model size: AlexNet 245 MB → 7.4 MB; VGG 500 MB → 16 MB; ResNet-18 100 MB → 1.5 MB.
✓ 58× less computation. [Charts: speedup grows with the number of channels (1 to 1024, 0× to ~60×) and varies with filter size (roughly 50× to 65×).]
[Bar chart: AlexNet Top-1 & Top-5 (%) on ILSVRC2012.]
Reading List
Fixed-Point Representation:
- Darryl Lin et al. (2016). "Fixed point quantization of deep convolutional networks". In: Proc. ICML, pp. 2849–2858

Binary/Ternary Network:
- "… Convolutional Neural Networks". In: Proc. DAC, 60:1–60:6