Deep Learning for Mobile Part II
Instructor - Simon Lucey
16-623 - Designing Computer Vision Apps
Today
Motivation
XNOR Networks
YOLO
State of the Art Recognition Methods
Convolutional Neural Networks
Common Deep Learning Packages
TensorFlow
Number of operations:
Inference time on CPU:
Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
TensorFlow iOS
Accelerate Framework
“image operations” “matrix operations” “signal processing” “misc math” “basic neural network subroutines” (BNNS, 2016)
(Taken from https://www.bignerdranch.com/blog/neural-networks-in-ios-10-and-macos/ )
Deep Learning Kit
http://deeplearningkit.org/
Today
Lower Precision
Reducing Precision: 32-bit, 8-bit, 1-bit
{-1, +1} ↔ {0, 1}
MUL ↔ XNOR
ADD, SUB ↔ Bit-Count (popcount)
[Han et al. 2016]
[Figure: histogram with axis ticks −0.05, 0.05 and 1600, 3200, 4800, 6400]
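The correspondence between multiply/add on {-1, +1} values and XNOR/popcount on {0, 1} bits can be checked with a small sketch. The helper names `pack_bits` and `binary_dot` are hypothetical, and the encoding (+1 as bit 1, -1 as bit 0) is an assumption made for illustration.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int: +1 -> bit 1, -1 -> bit 0."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors given their packed bits."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask   # bit is 1 wherever the two signs agree
    matches = bin(xnor).count("1")     # popcount
    return 2 * matches - n             # agreements minus disagreements


a = [+1, -1, +1, +1, -1, -1, +1, -1]
b = [+1, +1, -1, +1, -1, +1, +1, -1]

reference = sum(x * y for x, y in zip(a, b))           # multiply and add
fast = binary_dot(pack_bits(a), pack_bits(b), len(a))  # XNOR and popcount
print(reference, fast)
```

Both paths give the same value, which is why a 1-bit network can swap multiply-accumulate for bitwise operations.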
Why Binary?
                                                      Operations        Memory saving   Computation saving
Real-valued input, real-valued weights                +, −, ×           1x              1x
Binary Weight Networks (real input, binary weights)   +, −              ~32x            ~2x
XNOR-Networks (binary input, binary weights)          XNOR, bit-count   ~32x            ~58x
Reminder: XNOR
$I * W \approx I * W^B$, where the input $I$ and the weights $W$ are real-valued (R) and $W^B$ is binary (B)
$W^B = \mathrm{sign}(W)$
[Figure: real-valued filter $W$ and its binary counterpart $W^B$; the value 0.75 appears in the figure]
Quantization Methods
Optimal Scaling Factor
$\alpha^*, W^{B*} = \arg\min_{\alpha, W^B} J(\alpha, W^B)$
$J(\alpha, W^B) = \|W - \alpha W^B\|_2^2$
$= \mathrm{tr}\{(W - \alpha W^B)^T (W - \alpha W^B)\}$
$= \alpha^2 n - 2\alpha \cdot \mathrm{tr}\{W^T W^B\} + \mathrm{tr}\{W^T W\}$
Simple Example
$\arg\min_{b, \alpha} (w - \alpha b)^2$ s.t. $b \in \{+1, -1\}$
Optimal Scaling Factor
$W^B = \mathrm{sign}(W)$
$\|W\|_{\ell_1} = \mathrm{tr}\{W^T \mathrm{sign}(W)\}$
$\alpha^*, W^{B*} = \arg\min_{W^B, \alpha} \{\|W - \alpha W^B\|^2\}$
$\alpha^* = \frac{1}{n}\|W\|_{\ell_1}$
$W^{B*} = \mathrm{sign}(W)$
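The closed-form solution can be sanity-checked numerically on a tiny weight vector by trying every possible binary pattern. This is only a sketch: the vector `w` and the function names are made up for illustration, and the brute-force search keeps $\alpha \ge 0$ because flipping the signs of both $\alpha$ and $W^B$ gives the same product.

```python
from itertools import product


def sign(x):
    return 1.0 if x >= 0 else -1.0


def closed_form(w):
    """alpha* = ||W||_l1 / n and W^B* = sign(W)."""
    n = len(w)
    alpha = sum(abs(x) for x in w) / n
    return alpha, [sign(x) for x in w]


def error(w, alpha, b):
    """Squared error ||W - alpha * B||^2."""
    return sum((wi - alpha * bi) ** 2 for wi, bi in zip(w, b))


def brute_force(w):
    """Try every B in {-1, +1}^n; for a fixed B the best alpha is (W . B) / n."""
    n = len(w)
    best = None
    for b in product([-1.0, 1.0], repeat=n):
        alpha = sum(wi * bi for wi, bi in zip(w, b)) / n
        if alpha < 0:
            continue  # (-alpha, -B) is the same approximation as (alpha, B)
        e = error(w, alpha, b)
        if best is None or e < best[0]:
            best = (e, alpha, list(b))
    return best


w = [0.8, -0.3, 0.1, -1.2]  # hypothetical weights
alpha, b = closed_form(w)
best_err, best_alpha, best_b = brute_force(w)
print(alpha, b)
print(best_alpha, best_b, best_err)
```

For this vector the exhaustive search lands on the same $\alpha$ and the same sign pattern as the closed form.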
How to train a CNN with binary filters?
[Figure: $I * W \approx (I * W^B)\,\alpha$, computing $\alpha$ for real-valued (R) input $I$ and binary (B) weights $W^B$]
Naive Solution
[Figure: the real-valued (R) weight filters W are replaced by binary (B) filters $W^B$]
AlexNet Top-1 (%) on ILSVRC2012: Full Precision 56.7, Naive 0.2
Binary Weight Network
[Figure: real-valued (R) weights W are binarized to $W^B$ (B) for the forward pass; the loss produces a gradient $G_w$ that updates the real-valued W]
Train for binary weights:
3. Load a random input image X
4. $W^B = \mathrm{sign}(W)$
5. $\alpha = \frac{\|W\|_{\ell_1}}{n}$
6. Forward pass with $\alpha$, $W^B$
7. Compute loss function C
8. $\frac{\partial C}{\partial W}$ = Backward pass with $\alpha$, $W^B$
9. Update W ($W = W - \frac{\partial C}{\partial W}$)
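The training steps can be sketched on a toy linear model. The following are assumptions made for illustration and do not come from the slides: the model is a single dot product, the whole toy dataset is used at every step instead of one random image (to keep the run deterministic), a learning rate `lr` scales the update, and the backward pass simply passes the gradient of the scaled binary weights through to the real-valued weights.

```python
def sign(x):
    return 1.0 if x >= 0 else -1.0


def train_binary_weights(inputs, targets, w, lr=0.1, steps=200):
    """Toy version of steps 4 to 9 for the model y = sum_i (alpha * wb_i) * x_i."""
    n = len(w)
    for _ in range(steps):
        wb = [sign(x) for x in w]                    # step 4: W^B = sign(W)
        alpha = sum(abs(x) for x in w) / n           # step 5: alpha = ||W||_l1 / n
        w_tilde = [alpha * b for b in wb]
        grad = [0.0] * n
        for x, y in zip(inputs, targets):
            pred = sum(wt * xi for wt, xi in zip(w_tilde, x))  # step 6: forward
            err = pred - y                                     # step 7: squared loss
            for i in range(n):
                grad[i] += 2.0 * err * x[i]                    # step 8: backward
        # step 9: the update is applied to the real-valued weights, not to W^B
        w = [wi - lr * g for wi, g in zip(w, grad)]
    wb = [sign(x) for x in w]
    alpha = sum(abs(x) for x in w) / n
    return w, alpha, wb


# Hypothetical data: the targets are reachable with alpha = 0.5 and signs (+, -, +, -)
inputs = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
targets = [0.5, -0.5, 0.5, -0.5]
w0 = [0.1, 0.1, -0.1, -0.1]

w, alpha, wb = train_binary_weights(inputs, targets, w0)
print(alpha, wb)
```

Keeping a real-valued copy of W is what lets small gradient steps accumulate until a sign eventually flips; updating $W^B$ directly would discard them.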
Binary Weight Network
Gradients of Binary Weights
$g(w) = f(\mathrm{sign}(w)) = f(w^b)$
$\frac{\partial g(w)}{\partial w^T} = \frac{\partial f(w^b)}{\partial [w^b]^T} \frac{\partial\, \mathrm{sign}(w)}{\partial w^T}$
[Figure: $\mathrm{sign}(x)$ and its gradient $G_x = \frac{\partial\, \mathrm{sign}(x)}{\partial x}$, with levels −1 and +1]
Update: $W = W - G_w$
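Because the true derivative of sign is zero almost everywhere, the chain rule above needs a surrogate for $\partial\, \mathrm{sign}(w) / \partial w$. The sketch below assumes one common choice, a straight-through estimator that passes the gradient where $|w| \le 1$ and blocks it elsewhere; the slides do not state which surrogate is used, and the function names are hypothetical.

```python
def sign_grad_ste(w):
    """Assumed surrogate for d sign(w) / dw: 1 inside [-1, 1], 0 outside."""
    return 1.0 if abs(w) <= 1.0 else 0.0


def backward_through_sign(weights, grad_wb):
    """Chain rule: dg/dw_i = df/dwb_i * d sign(w_i)/dw_i (surrogate)."""
    return [g * sign_grad_ste(w) for w, g in zip(weights, grad_wb)]


weights = [0.3, -0.7, 1.5, -2.0]   # real-valued weights
grad_wb = [0.2, -0.4, 0.6, 0.8]    # gradient with respect to the binary weights
grad_w = backward_through_sign(weights, grad_wb)
print(grad_w)
```

Weights whose magnitude already exceeds 1 receive no gradient under this choice, which keeps them from growing without bound.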
Binary Weight Network
AlexNet Top-1 (%) on ILSVRC2012: Full Precision 56.7, Naive 0.2, Binary Weight 56.8
Reminder
                                                      Operations        Memory saving   Computation saving
Real-valued input, real-valued weights                +, −, ×           1x              1x
Binary Weight Networks (real input, binary weights)   +, −              ~32x            ~2x
XNOR-Networks (binary input, binary weights)          XNOR, bit-count   ~32x            ~58x
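Binary Weight - Binary Input Network
[Figure: the real-valued (R) input X and weights W are approximated by binary (B) $X^B$ and $W^B$ with scaling factors $\beta$ and $\alpha$]
$Y \approx \gamma Y^B$
$Y^{B*}, \gamma^* = \arg\min_{Y^B, \gamma} \|Y - \gamma Y^B\|^2$
$Y^{B*} = \mathrm{sign}(Y)$, $\gamma^* = \frac{1}{n}\|Y\|_{\ell_1}$
$W^{B*} = \mathrm{sign}(W)$, $\alpha^* = \frac{1}{n}\|W\|_{\ell_1}$
$X^{B*} = \mathrm{sign}(X)$, $\beta^* = \frac{1}{n}\|X\|_{\ell_1}$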
Binary Weight - Binary Input Network
(1) Binarizing Weights
(2) Binarizing Input (inefficient): redundant computation in overlapping areas
(2) Binarizing Input (efficient): the channel-wise mean $\frac{1}{c}\sum_i |X_{:,:,i}|$, followed by an average filter
(3) Convolution with XNOR-Bitcount, using sign(X)
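The efficient input-binarization path can be sketched in two steps: average the absolute values across channels once, then slide an averaging window over the result. The names `channel_mean_abs`, `average_filter`, `a`, and `k_map` are hypothetical, and the absence of padding is an assumption.

```python
def channel_mean_abs(x):
    """x[channel][row][col] -> a[row][col] = mean over channels of |x|."""
    c = len(x)
    rows, cols = len(x[0]), len(x[0][0])
    return [[sum(abs(x[ch][r][q]) for ch in range(c)) / c
             for q in range(cols)] for r in range(rows)]


def average_filter(a, k):
    """Slide a k x k averaging window over a (no padding)."""
    rows, cols = len(a), len(a[0])
    out = []
    for r in range(rows - k + 1):
        out.append([
            sum(a[r + i][q + j] for i in range(k) for j in range(k)) / (k * k)
            for q in range(cols - k + 1)
        ])
    return out


# Hypothetical input with c = 2 channels of size 3 x 3
x = [
    [[1.0, -2.0, 3.0], [0.0, 1.0, -1.0], [2.0, 2.0, -2.0]],
    [[-3.0, 2.0, 1.0], [4.0, -1.0, 1.0], [0.0, -2.0, 2.0]],
]
a = channel_mean_abs(x)        # computed once for the whole input
k_map = average_filter(a, 2)   # one scaling factor per 2 x 2 window position
print(a)
print(k_map)
```

Computing the channel mean once avoids recomputing the same sums for every overlapping window, which is the redundancy noted in the inefficient variant.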
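Binary Convolution
$I * W \approx (\mathrm{sign}(I) \circledast \mathrm{sign}(W)) \odot K\alpha$
The total number of operations in a standard convolution is $cN_WN_I$, where
c = no. of channels
$N_W$ = no. of elements in W
$N_I$ = no. of elements in I
With 64 binary operations per CPU clock cycle, the speedup is
$S = \frac{cN_WN_I}{\frac{1}{64}cN_WN_I + N_I} = \frac{64cN_W}{cN_W + 64}$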
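The speedup expression $\frac{64cN_W}{cN_W + 64}$ is easy to evaluate for concrete layer shapes. The sketch below is plain arithmetic; the function name, the default of 64 binary operations per cycle, and the example layer (256 channels, 3 x 3 filter) are illustrative assumptions.

```python
def speedup(c, n_w, binary_ops_per_cycle=64):
    """Theoretical speedup per output position: c * n_w real-valued operations
    versus c * n_w / 64 binary operations plus 1 non-binary operation."""
    standard = c * n_w
    binary = c * n_w / binary_ops_per_cycle + 1
    return standard / binary


s = speedup(c=256, n_w=3 * 3)
print(round(s, 2))
```

Note that $N_I$ cancels, so the estimate depends only on the number of channels and the filter size; it approaches 64 as $cN_W$ grows.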
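Binary Weight - Binary Input Network
[Figure: the input is binarized with sign(X); scaling factors $\alpha^*$, $\beta^*$]
3. Load a random input image X
4. $W^B = \mathrm{sign}(W)$
5. $\alpha = \frac{\|W\|_{\ell_1}}{n}$
6. Forward pass with $\alpha$, $W^B$
7. Compute loss function C
8. $\frac{\partial C}{\partial W}$ = Backward pass with $\alpha$, $W^B$
9. Update W ($W = W - \frac{\partial C}{\partial W}$)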
XNOR Networks
AlexNet Top-1 (%) on ILSVRC2012: Full Precision 56.7, Naive 0.2, Binary Weight 56.8, Binary Weight - Binary Input 30.5
Problem with Pooling
A typical block in CNN: Conv → BNorm → Activ → Pool
Max-Pooling: ✗ Information Loss, ✓ Multiple Maximums
A block in XNOR-Net: BNorm → BinActiv → BinConv → Pool
XNOR Network
[Figure: block with BNorm, Activ, Pool, and Conv layers; the input is binarized with sign(X) and scaled by $\alpha^*$, $\beta^*$]
Training follows steps 3 to 9 listed earlier for binary weights.
Refer to this as an “XNOR” Network
Results
AlexNet Top-1 (%) on ILSVRC2012: Full Precision 56.7, Naive 0.2, Binary Weight 56.8, Binary Weight - Binary Input 30.5, XNOR-Net 44.2
✓ 32x Smaller Model
Model       Float     Binary
AlexNet     245 MB    7.4 MB
VGG         500 MB    16 MB
ResNet-18   100 MB    1.5 MB
✓ 58x Less Computation
[Figure: speedup by varying channel size; number of channels from 1 to 1024, speedup from 0x to 80x]
[Figure: speedup by varying filter size; filter size from 0x0 to 20x20, speedup from 50x to 65x]
Results
[Figure: AlexNet Top-1 & 5 (%) on ILSVRC2012]
Today
You Only Look Once (YOLO)
Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." CVPR 2016.
conditional probability map
YOLO on Nature