

SLIDE 1

CENG5030 Part 2-4: CNN Inaccurate Speedup-2 — Quantization

Bei Yu

(Latest update: March 25, 2019)

Spring 2019


SLIDE 2

These slides contain/adapt materials developed by:

  • Suyog Gupta et al. (2015). “Deep learning with limited numerical precision”. In: Proc. ICML, pp. 1737–1746
  • Ritchie Zhao et al. (2017). “Accelerating binarized convolutional neural networks with software-programmable FPGAs”. In: Proc. FPGA, pp. 15–24
  • Mohammad Rastegari et al. (2016). “XNOR-Net: ImageNet classification using binary convolutional neural networks”. In: Proc. ECCV, pp. 525–542


SLIDE 4

“What should I learn to do well in computer vision research? I want to research on a topic with DEEP LEARNING in it?”

SLIDE 5

DEEP LEARNING

SLIDE 6

GPU Server

SLIDE 7

Ohhh No!!!

SLIDE 8

State-of-the-art recognition methods are very expensive in:

  • Memory
  • Computation
  • Power

SLIDE 9

Overview

  • Fixed-Point Representation
  • Binary/Ternary Network
  • Reading List

SLIDE 11

Fixed-Point vs. Floating-Point

(figure comparing the fixed-point and floating-point number formats)

SLIDE 14

Fixed-Point Arithmetic

  • Number representation: a value is stored in a ⟨IL, FL⟩ format, with IL integer bits and FL fractional bits (word length WL = IL + FL)
  • Granularity: the smallest representable step is ε = 2^(−FL)

Source: Suyog Gupta et al. (2015). “Deep learning with limited numerical precision”. In: Proc. ICML, pp. 1737–1746.

SLIDE 15

Fixed-Point Arithmetic

  • Multiply-and-ACCumulate (MACC): a WL-bit multiplier feeding a wide (48-bit) accumulator

SLIDE 16

Fixed-Point Arithmetic: Rounding Modes

  • Round-to-nearest

SLIDE 17

Fixed-Point Arithmetic: Rounding Modes

  • Stochastic rounding (vs. round-to-nearest):
  • Non-zero probability of rounding x to either ⌊x⌋ or ⌊x⌋ + ε
  • Unbiased rounding scheme: the expected rounding error is zero
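A minimal NumPy sketch of the two rounding modes on a ⟨IL, FL⟩ grid (the function name, word-length handling, and RNG choice are illustrative assumptions, not from the paper):

```python
import numpy as np

def quantize(x, fl, wl=16, mode="stochastic", rng=np.random.default_rng(0)):
    """Quantize x onto the fixed-point grid with step eps = 2**-fl."""
    eps = 2.0 ** -fl
    scaled = np.asarray(x) / eps                # position on the integer grid
    if mode == "nearest":                       # round-to-nearest
        q = np.floor(scaled + 0.5)
    else:                                       # stochastic rounding:
        floor = np.floor(scaled)                # round up with probability
        q = floor + (rng.random(scaled.shape) < scaled - floor)
    lo, hi = -2 ** (wl - 1), 2 ** (wl - 1) - 1  # saturate to the WL-bit range
    return np.clip(q, lo, hi) * eps
```

Averaging many stochastic quantizations of the same value recovers it in expectation, which is exactly the unbiasedness property above.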

SLIDE 18

MNIST: Fully-connected DNNs

(figure: training curves with round-to-nearest at fractional lengths FL = 8, 10, 14 vs. a 32-bit float baseline; smaller FL means lower precision)

SLIDE 19

MNIST: Fully-connected DNNs

  • For small fractional lengths (FL < 12), a large majority of weight updates are rounded to zero when using the round-to-nearest scheme
  • Convergence slows down
  • For FL < 12, there is a noticeable degradation in the classification accuracy
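A concrete instance of this underflow effect, assuming FL = 12 (granularity ε = 2⁻¹²) and a weight update of magnitude 2⁻¹⁴:

```latex
\text{round-to-nearest: } \operatorname{Round}\!\left(2^{-14}\right) = 0
  \quad \text{(since } 2^{-14} < \varepsilon/2\text{)}
\text{stochastic: } \Pr\!\left[\operatorname{Round}\!\left(2^{-14}\right) = \varepsilon\right]
  = \frac{2^{-14}}{2^{-12}} = \frac{1}{4}, \qquad
\mathbb{E}\!\left[\operatorname{Round}\!\left(2^{-14}\right)\right] = 2^{-14}
```

Every such update vanishes deterministically under round-to-nearest, but survives in expectation under stochastic rounding.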

SLIDE 20

MNIST: Fully-connected DNNs

  • Stochastic rounding preserves gradient information (statistically)
  • No degradation in convergence properties
  • Test error nearly equal to that obtained using 32-bit floats

SLIDE 21

FPGA prototyping: GEMM with stochastic rounding

  • Systolic Array (SA) of Multiply-and-ACCumulate (MACC) units: an n × n array of DSP MACCs with input FIFOs for Matrix A and Matrix B
  • Memory hierarchy: L2 cache (BRAM), top controller with READ/WRITE paths, L2-to-SA interface, AXI interface to DDR3 (8 GB DDR3 SO-DIMM)
  • Platform: Xilinx Kintex K325T FPGA
  • Wavefront systolic array computes the matrix product AB (arrows in the figure indicate dataflow)
  • Top-level controller and memory hierarchy designed to maximize data reuse and overlap computation with communication

SLIDE 22

Maximizing data reuse

  • A-cache holds p·n rows of Matrix A [N × K]; B-cache holds n columns of Matrix B [K × M] (both fed from the L2 cache through MUXes)
  • Inner loop: cycle through columns of Matrix B (M/n iterations)
  • Outer loop: cycle through rows of Matrix A (K/(p·n) iterations)
  • Reuse factor for Matrix A: M times; reuse factor for Matrix B: p·n times
  • n: dimension of the systolic array; p: parameter chosen based on available BRAM resources
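A schematic loop nest for this blocking scheme (illustrative Python over NumPy, assuming the tile sizes divide the matrix dimensions evenly):

```python
import numpy as np

def blocked_gemm(A, B, n, p):
    """C = A @ B with the slide's blocking: a tile of p*n rows of A is
    cached and reused against every n-column strip of B."""
    N, K = A.shape
    _, M = B.shape
    C = np.zeros((N, M))
    for i in range(0, N, p * n):          # outer loop: row tiles of A
        A_cache = A[i:i + p * n, :]       # A-cache, reused M times
        for j in range(0, M, n):          # inner loop: column strips of B
            B_cache = B[:, j:j + n]       # B-cache, reused p*n times
            # on the FPGA this strip product streams through the
            # n x n systolic array of MACC units
            C[i:i + p * n, j:j + n] = A_cache @ B_cache
    return C
```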

SLIDE 23

Stochastic rounding

  • Output path: accumulated results leave the MACC array through local registers and output C FIFOs, passing through dedicated DSP ROUND units (figure)

SLIDE 24

Stochastic rounding

  • DSP ROUND unit: add a pseudo-random number (generated using an LFSR) to the LSBs of the accumulated result that are to be rounded off
  • Truncate the LSBs, and saturate to the limits if the result exceeds the representable range
  • These operations can be implemented efficiently using a single DSP unit
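A software sketch of this add-random-then-truncate trick (plain Python; the 16-bit Fibonacci LFSR with taps 16, 14, 13, 11 is a standard maximal-length choice, used here as a stand-in for the hardware PRNG):

```python
def lfsr16(state):
    """One step of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11); state != 0."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def stochastic_round(acc, drop_bits, out_bits, state):
    """Round a wide accumulator: add LFSR noise to the LSBs being dropped,
    truncate them, then saturate to an out_bits-wide result."""
    state = lfsr16(state)
    noise = state & ((1 << drop_bits) - 1)   # uniform in [0, 2**drop_bits)
    q = (acc + noise) >> drop_bits           # add, then truncate the LSBs
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, q)), state        # saturate to the limits
```

Adding uniform noise in [0, 2^k) before dropping k LSBs rounds up with probability equal to the discarded fraction, which is precisely stochastic rounding.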

SLIDE 25

Overview

  • Fixed-Point Representation
  • Binary/Ternary Network
  • Reading List

SLIDE 26

Binarized Neural Networks (BNN)

CNN: a real-valued input map (e.g. 2.4, 6.2, ..., 3.3, 1.8) convolved with real-valued weights (0.8, 0.1, 0.3, 0.8) produces a real-valued output map (5.0, 9.1, ..., 4.3, 7.8).

BNN: a binary input map (1, −1, ..., 1, 1) convolved with binary weights (1, −1, 1, −1) produces an integer output map (1, −3, ..., 3, −7), which is batch-normalized and then binarized:

$$y^b_{ij} = \begin{cases} +1 & \text{if } \hat{y}_{ij} \ge 0 \\ -1 & \text{otherwise} \end{cases}$$

Key differences (CNN vs. BNN):

  • 1. Inputs are binarized (−1 or +1)
  • 2. Weights are binarized (−1 or +1)
  • 3. Results are binarized after batch normalization
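A schematic forward pass for one BNN layer, following the three differences above (NumPy; the dense layer and batch-norm parameters are illustrative):

```python
import numpy as np

def bnn_layer(x_b, w_b, gamma, beta, mu, sigma):
    """Binary inputs/weights in {-1, +1}: integer accumulation,
    batch normalization, then sign binarization for the next layer."""
    y = x_b @ w_b                              # integer-valued output map
    y_hat = gamma * (y - mu) / sigma + beta    # batch normalization
    return np.where(y_hat >= 0, 1, -1)         # binarize: +1 if >= 0 else -1
```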

SLIDE 27

BNN CIFAR-10 Architecture [2]

  • 6 conv layers, 3 dense layers, 3 max-pooling layers
  • All conv filters are 3×3
  • First conv layer takes in floating-point input
  • 13.4 Mbits total model size (after hardware optimizations)

(figure: feature-map dimensions 32×32, 16×16, 8×8, 4×4; numbers of feature maps 3, 128, 128, 256, 256, 512, 512, 1024, 1024, 10)

[2] M. Courbariaux et al. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or −1. arXiv:1602.02830, Feb 2016.

SLIDE 28

Advantages of BNN

  • 1. Floating-point ops replaced with binary logic ops
    – Encode {+1, −1} as {0, 1} → multiplies become XORs
    – Conv/dense layers do dot products → XOR and popcount
    – Operations can map to LUT fabric as opposed to DSPs
  • 2. Binarized weights may reduce total model size
    – Fewer bits per weight may be offset by having more weights

(table: the ±1 products b1 × b2 match b1 XOR b2 under the {0, 1} encoding)
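A bit-level sketch of the XOR-and-popcount dot product (plain Python, packing each {−1, +1} vector into an integer bit mask with −1 encoded as 1):

```python
def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element {-1,+1} vectors stored as bit masks.
    An XOR bit of 1 marks a product term of -1, so the sum is n - 2*diff."""
    diff = bin((a_bits ^ b_bits) & ((1 << n) - 1)).count("1")  # popcount
    return n - 2 * diff

# e.g. a = (+1, -1, +1, +1), b = (+1, -1, -1, +1)  ->  dot product = 2
assert binary_dot(0b0100, 0b0110, 4) == 2
```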

SLIDE 29

BNN vs CNN Parameter Efficiency

Architecture (CIFAR-10)   Depth   Param Bits (Float)   Param Bits (Fixed-Point)   Error Rate (%)
ResNet [3]                164     51.9M                13.0M*                     11.26
BNN [2]                   9       —                    13.4M                      11.40

* Assuming each float param can be quantized to 8-bit fixed-point

Comparison:
  – Conservative assumption: ResNet can use 8-bit weights
  – BNN is based on VGG (less advanced architecture)
  – BNN seems to hold promise!

[2] M. Courbariaux et al. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or −1. arXiv:1602.02830, Feb 2016.
[3] K. He, X. Zhang, S. Ren, and J. Sun. Identity Mappings in Deep Residual Networks. ECCV 2016.

SLIDE 30

Binary Weight Networks and XNOR-Networks

Network                   Operations        Memory saving   Computation saving
Standard convolution      +, −, ×           1x              1x
Binary Weight Networks    +, −              ~32x            ~2x
XNOR-Networks             XNOR, bit-count   ~32x            ~58x

(figure: real-valued input I convolved with a real filter (R ∗ R), a binary filter (R ∗ B), and a binary filter on binarized input (B ∗ B))

Source: Mohammad Rastegari et al. (2016). “XNOR-Net: ImageNet classification using binary convolutional neural networks”. In: Proc. ECCV, pp. 525–542.

SLIDE 32

Binary Weight Networks: approximate the real-valued filter with a binary filter plus a scaling factor:

$$I * W \approx (I * W_B)\,\alpha, \qquad W_B = \mathrm{sign}(W)$$

SLIDE 33

Quantization Error

$$W_B = \mathrm{sign}(W)$$

(figure: the quantization error between the real filter W and its binarized version W_B)

SLIDE 34

Optimal Scaling Factor

$$\alpha^*, W_B^* = \arg\min_{W_B,\,\alpha} \|W - \alpha W_B\|^2$$

$$\alpha^* = \frac{1}{n}\|W\|_{\ell_1}, \qquad W_B^* = \mathrm{sign}(W)$$

so $I * W \approx (I * W_B)\,\alpha$, and computing α requires only the mean absolute value of W.
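The closed form follows by expanding the objective, using the fact that $W_B \in \{-1,+1\}^n$ implies $W_B^\top W_B = n$ (a short derivation, as in the paper):

```latex
\|W - \alpha W_B\|^2
  = \alpha^2 \, W_B^\top W_B - 2\alpha \, W_B^\top W + W^\top W
  = \alpha^2 n - 2\alpha \, W_B^\top W + \text{const}
% For alpha > 0 the optimum maximizes W_B^T W, so W_B^* = sign(W),
% giving W_B^{*T} W = ||W||_{l1}; setting the derivative in alpha to zero:
\alpha^* = \frac{W_B^{*\top} W}{n} = \frac{\|W\|_{\ell_1}}{n}
```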

SLIDE 35

$$I * W \approx (I * W_B)\,\alpha \qquad \text{(computing } \alpha \text{ from } W\text{)}$$

How to train a CNN with binary filters?

SLIDE 36

Training Binary Weight Networks

Naive solution:
  • 1. Train a network with real-valued weights
  • 2. Binarize the weights: WB = sign(W)

SLIDE 37

(figure: AlexNet Top-1 (%) on ILSVRC2012: Full Precision 56.7, Naïve binarization 0.2)

SLIDE 38

(figure: a network with real-valued weights W is binarized layer by layer into WB)

SLIDE 39

(figure: the binarized network at inference, classifying e.g. Person vs. Dog)

SLIDE 40

Binary Weight Network

Train for binary weights:
1. Randomly initialize W
2. For iter = 1 to N:
3.   Load a random input image X
4.   WB = sign(W)
5.   α = ‖W‖ℓ1 / n
6.   Forward pass with α, WB
7.   Compute loss function C
8.   ∂C/∂W = backward pass with α, WB
9.   Update W: W = W − ∂C/∂W
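A minimal PyTorch sketch of steps 4–8, with the straight-through gradient that the next slides motivate (the helper names are illustrative; real use would wrap a convolution rather than a matmul):

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Forward: sign(W). Backward: pass the gradient straight through."""
    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                    # straight-through estimator

def binary_weight_forward(x, w):
    alpha = w.abs().mean()                 # step 5: alpha = ||W||_l1 / n
    w_b = BinarizeSTE.apply(w)             # step 4: WB = sign(W)
    return alpha * (x @ w_b)               # step 6: forward with alpha, WB
```

Calling loss.backward() on a loss built from this output yields ∂C/∂W on the real-valued W (step 8), which is what step 9 updates.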


SLIDE 45

Binary Weight Network

Backward through sign(x): its true gradient is zero almost everywhere, so the gradient Gx is passed straight through the binarization (the straight-through estimator, active within [−1, +1]) [Hinton et al. 2012]

(training algorithm as on Slide 40)

SLIDE 46

Binary Weight Network

The gradient Gw computed through the binarized network updates the real-valued weights: W = W − Gw

(training algorithm as on Slide 40)

SLIDE 47

(figure: AlexNet Top-1 (%) on ILSVRC2012: Full Precision 56.7, Naïve 0.2, Binary Weight 56.8)

SLIDE 48

XNOR-Networks: binarize both the input and the weights

Network                   Operations        Memory saving   Computation saving
Standard convolution      +, −, ×           1x              1x
Binary Weight Networks    +, −              ~32x            ~2x
XNOR-Networks             XNOR, bit-count   ~32x            ~58x

SLIDE 49

Binary Input and Binary Weight (XNOR-Net)

$$I * W \approx (X_B \circledast W_B)\,\alpha\beta, \qquad X_B = \mathrm{sign}(X), \quad W_B = \mathrm{sign}(W)$$

where β scales the binarized input X_B just as α scales the binarized weights W_B.

SLIDE 50

Binary Input and Binary Weight (XNOR-Net)

With Y the element-wise product of the input sub-tensor X and the weight W:

$$Y \approx \gamma\, Y_B, \qquad Y_B^*, \gamma^* = \arg\min_{Y_B,\,\gamma} \|Y - \gamma Y_B\|_2^2$$

$$\gamma^* = \frac{1}{n}\|Y\|_{\ell_1}, \qquad Y_B^* = \mathrm{sign}(Y)$$

$$\alpha^* = \frac{1}{n}\|W\|_{\ell_1}, \quad \beta^* = \frac{1}{n}\|X\|_{\ell_1}, \qquad W_B^* = \mathrm{sign}(W), \quad X_B^* = \mathrm{sign}(X)$$
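A quick numerical check of these closed forms (illustrative NumPy, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X, W = rng.standard_normal(64), rng.standard_normal(64)
Y = X * W                               # element-wise product
gamma = np.abs(Y).mean()                # gamma* = ||Y||_l1 / n
Y_B = np.sign(Y)                        # YB* = sign(Y)
assert np.allclose(Y_B, np.sign(X) * np.sign(W))  # sign factorizes
print(np.linalg.norm(Y - gamma * Y_B))  # residual of the approximation
```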

SLIDE 51

Binarizing the input efficiently

(1) Binarizing weights: $W_B = \mathrm{sign}(W)$

(2) Binarizing input: computing β separately for every sub-tensor sign(X) repeats work in overlapping areas (inefficient). Efficient alternative: compute the channel-wise mean $A = \frac{\sum_i |X_{:,:,i}|}{c}$ once, then convolve it with an average filter to obtain all scaling factors

(3) Convolution with XNOR-bitcount
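A NumPy sketch of this one-pass computation (assuming an input of shape (c, H, W) and an h × w filter; a "valid" convolution with the average filter stands in for the figure's K = A ∗ k):

```python
import numpy as np

def input_scaling_factors(X, h, w):
    """A = channel-wise mean of |X|; K[i, j] = mean of A over the h x w
    window at (i, j), i.e. one scaling factor per filter position."""
    A = np.abs(X).mean(axis=0)                     # (H, W)
    H, W = A.shape
    K = np.empty((H - h + 1, W - w + 1))
    for i in range(K.shape[0]):
        for j in range(K.shape[1]):
            K[i, j] = A[i:i + h, j:j + w].mean()   # average filter
    return K
```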

SLIDE 52

Training XNOR-Networks: the same procedure as for binary weight networks (Slide 40), with the input also binarized via sign(X) and scaled by β∗ in the forward pass.

SLIDE 53

(figure: AlexNet Top-1 (%) on ILSVRC2012: Full Precision 56.7, Naïve 0.2, Binary Weight 56.8, XNOR-Net 30.5)

SLIDE 54

Network Structure in XNOR-Networks

A typical block in CNN: Conv → BNorm → Activ → Pool

Max-pooling on sign(x)-binarized activations:
  ✗ Information Loss
  ✓ Multiple Maximums

SLIDE 56

Network Structure in XNOR-Networks

Reordered block: BNorm → Activ → Conv → Pool (batch normalization and binary activation placed before the convolution, so pooling acts on the conv outputs)
  ✓ Information Loss
  ✓ Multiple Maximums
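A schematic of the two block orderings (PyTorch; BinActive is a hypothetical stand-in for the sign binarization, ignoring the straight-through gradient):

```python
import torch
import torch.nn as nn

class BinActive(nn.Module):
    """Binarize activations to {-1, +1}."""
    def forward(self, x):
        return torch.sign(x)

# A typical block in CNN: Conv -> BNorm -> Activ -> Pool
cnn_block = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64),
    nn.ReLU(), nn.MaxPool2d(2))

# XNOR-Net ordering: BNorm -> BinActiv -> Conv -> Pool, so pooling sees
# integer-valued conv outputs instead of {-1, +1} activations
xnor_block = nn.Sequential(
    nn.BatchNorm2d(64), BinActive(), nn.Conv2d(64, 64, 3, padding=1),
    nn.MaxPool2d(2))
```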

SLIDE 57

Putting it together: train with the reordered block (Slide 56) and the binary-weight training procedure (Slide 40), binarizing the input with sign(X) and applying the scaling factors α∗, β∗ in the forward pass.

SLIDE 58

(figure: AlexNet Top-1 (%) on ILSVRC2012: Full Precision 56.7, Naïve 0.2, Binary Weight 56.8, XNOR-Net 30.5, improving to 44.2 with the reordered blocks)

✓ 32x smaller model
(figure: model sizes, float vs. binary: AlexNet 245 MB vs. 7.4 MB; VGG 500 MB vs. 16 MB; ResNet-18 100 MB vs. 1.5 MB)

✓ 58x less computation
(figure: speedup vs. number of channels, 1 to 1024, up to ~60x; speedup vs. filter size, roughly 50x to 65x)

SLIDE 59

(figure: AlexNet Top-1 & Top-5 (%) on ILSVRC2012)

SLIDE 60

Overview

  • Fixed-Point Representation
  • Binary/Ternary Network
  • Reading List

SLIDE 61

Further Reading List

Fixed-Point Representation:

  • Darryl Lin, Sachin Talathi, and Sreekanth Annapureddy (2016). “Fixed point quantization of deep convolutional networks”. In: Proc. ICML, pp. 2849–2858
  • Soroosh Khoram and Jing Li (2018). “Adaptive quantization of neural networks”. In: Proc. ICLR

Binary/Ternary Network:

  • Hyeonuk Kim et al. (2017). “A Kernel Decomposition Architecture for Binary-weight Convolutional Neural Networks”. In: Proc. DAC, 60:1–60:6
  • Chenzhuo Zhu et al. (2017). “Trained ternary quantization”. In: Proc. ICLR