Deep Learning for Mobile Part II Instructor - Simon Lucey 16-623 - Designing Computer Vision Apps
Today • Motivation • XNOR Networks • YOLO
State of the Art Recognition Methods
State-of-the-art recognition methods are very expensive in:
• Memory
• Computation
• Power
Convolutional Neural Networks
Common Deep Learning Packages
Commonly used deep learning packages include:
• Caffe (out of Berkeley; the first popular package).
• MatConvNet (MATLAB interface; very easy to use).
• Torch (based on Lua; used by Facebook).
• TensorFlow (based on Python; used by Google).
TensorFlow
GPU
Number of operations / inference time on a CPU:
• AlexNet → 1.5B FLOPs, ~3 fps
• VGG → 19.6B FLOPs, ~0.25 fps
Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
TensorFlow iOS
Accelerate Framework
“image operations”, “matrix operations”, “signal processing”, “misc math”
BNNS (2016): “basic neural network subroutines”
(Taken from https://www.bignerdranch.com/blog/neural-networks-in-ios-10-and-macos/ )
Deep Learning Kit http://deeplearningkit.org/
Today • Motivation • XNOR Networks • YOLO
Lower Precision [Han et al. 2016]
[histogram: distribution of network weights, concentrated around zero]
Reducing precision:
• Saves memory
• Saves computation
W ∈ R (32-bit) → 8-bit → W^B ∈ {−1, +1} (1-bit)
With {−1, +1} values: MUL becomes XNOR; ADD/SUB become Bit-Count (popcount).
Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
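The memory saving from binarization can be sketched directly: packing one sign bit per weight instead of a 32-bit float gives the ~32x reduction. A minimal sketch with numpy, assuming a hypothetical 4096×4096 weight matrix (not a layer from the slides):

```python
import numpy as np

# Hypothetical 4096 x 4096 fully connected layer.
W = np.random.randn(4096, 4096).astype(np.float32)

# Binarize: keep only the sign, then pack 8 signs per byte.
W_binary = (W >= 0)                  # boolean {0, 1} stands in for {-1, +1}
W_packed = np.packbits(W_binary, axis=None)

full_bytes = W.nbytes                # 32 bits per weight
packed_bytes = W_packed.nbytes       # 1 bit per weight
print(full_bytes // packed_bytes)    # -> 32
```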
Why Binary?
• Binary instructions: AND, OR, XOR, XNOR, PopCount (Bit-Count)
• Low-power devices
Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
Why Binary?
Network                          | Operations       | Memory saving | Computation saving
Standard convolution (R ∗ R)     | +, −, ×          | 1x            | 1x
Binary Weight Networks (R ∗ B)   | +, −             | ~32x          | ~2x
XNOR-Networks (B ∗ B)            | XNOR, Bit-Count  | ~32x          | ~58x
Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
Reminder: XNOR Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
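The reason XNOR matters: for vectors with entries in {−1, +1}, a dot product reduces to an XNOR followed by a popcount, since XNOR is 1 exactly where the two sign bits agree. A minimal sketch (illustrative, not the paper's bit-packed implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
x = rng.choice([-1, 1], size=n)
w = rng.choice([-1, 1], size=n)

# Encode -1 as bit 0 and +1 as bit 1.
xb = (x > 0).astype(np.uint8)
wb = (w > 0).astype(np.uint8)

# XNOR: 1 where the bits agree, i.e. where x[i] * w[i] == +1.
agree = np.logical_not(np.logical_xor(xb, wb))
popcount = int(agree.sum())

# Dot product = (#agreements) - (#disagreements) = 2 * popcount - n.
dot_via_xnor = 2 * popcount - n
assert dot_via_xnor == int(np.dot(x, w))
```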
Why Binary?
Binary Weight Networks approximate the real-valued convolution with a binary filter and a real scaling factor:
I ∗ W ≈ (I ∗ W^B) α,  where W^B = sign(W)
Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
Quantization Methods
W ≈ α W^B,  W^B = sign(W)  (e.g. α = 0.75)
Optimal Scaling Factor
α∗, W^B∗ = arg min_{α, W^B} J(α, W^B),  where  J(α, W^B) = ||W − α W^B||²₂
Expanding, with B = W^B:
J(α, B) = tr(α² BᵀB − 2α BᵀW + WᵀW) = α² n − 2α · tr(BᵀW) + constant
(BᵀB = n because every entry of B is ±1; WᵀW does not depend on α or B.)
Simple Example
For a single weight w, minimize over b:
arg min_b  α² − 2α · b · w   s.t.  b ∈ {+1, −1}
• Since we know that α is always positive:
  b = +1 if w > 0,  b = −1 if w < 0
• Or more simply, b = sign(w)
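A quick numeric check of the scalar case, with illustrative values for α and w (not from the slides):

```python
import math

# Minimize alpha^2 - 2*alpha*b*w over b in {+1, -1}, with alpha > 0.
alpha, w = 0.75, -0.3      # illustrative values

def J(b):
    return alpha**2 - 2 * alpha * b * w

best = min([+1, -1], key=J)
assert best == math.copysign(1, w)   # the minimizer is b = sign(w)
```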
Optimal Scaling Factor
• Since W^B = sign(W), we have tr(Wᵀ sign(W)) = ||W||ℓ1.
• Therefore, J(α) = α² n − 2α · ||W||ℓ1 + constant
Optimal Scaling Factor
α∗, W^B∗ = arg min_{W^B, α} ||W − α W^B||²
W^B∗ = sign(W),   α∗ = (1/n) ||W||ℓ1
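The closed-form solution above is easy to check numerically: binarize a filter with sign, set α to the mean absolute value, and verify that perturbing α only increases the reconstruction error. A sketch with a hypothetical 3×3×64 filter:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 3, 64))   # hypothetical filter, not from the slides
n = W.size

W_B = np.sign(W)                      # optimal binary filter W^B* = sign(W)
alpha = np.abs(W).sum() / n           # optimal scale alpha* = (1/n) * ||W||_l1

def err(a, B):
    return np.sum((W - a * B) ** 2)

# The closed-form (alpha, W_B) should beat nearby alternatives.
assert err(alpha, W_B) <= err(alpha * 1.1, W_B)
assert err(alpha, W_B) <= err(alpha * 0.9, W_B)
```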
How to train a CNN with binary filters?
I ∗ W ≈ (I ∗ W^B) α  (computing α from W)
Naive Solution 1. Train a network with real parameters. 2. Binarize the weight filters.
Naive Solution
[diagram: a trained real-valued network (W ∈ R) with every filter binarized to W^B ∈ B]
Naive Solution
AlexNet Top-1 (%) on ILSVRC 2012: Full Precision 56.7; Naïve binarization 0.2.
Binary Weight Network
Train for binary weights:
1. Randomly initialize W
2. For iter = 1 to N:
3.   Load a random input image X
4.   W^B = sign(W)
5.   α = ||W||ℓ1 / n
6.   Forward pass with α, W^B
7.   Compute loss function C
8.   ∂C/∂W = backward pass with α, W^B
9.   Update W (W = W − ∂C/∂W)
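The training loop above can be sketched for a single linear layer. This is a schematic, not the paper's implementation: the layer shape, squared loss, dummy target, and explicit learning rate (left implicit in the slide's update) are all illustrative assumptions, and the gradient is passed straight through the sign binarization:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((10, 4)) * 0.1    # real-valued "master" weights
lr = 0.01                                 # learning rate (implicit on the slide)

for it in range(100):
    X = rng.standard_normal(10)           # stand-in for "load a random input image"
    t = np.zeros(4)                       # dummy target for this sketch

    # Binarize for the forward/backward pass only.
    W_B = np.sign(W)
    alpha = np.abs(W).sum() / W.size

    y = (alpha * W_B).T @ X               # forward pass with alpha, W^B
    C = 0.5 * np.sum((y - t) ** 2)        # loss function C

    dC_dy = y - t
    dC_dW = np.outer(X, dC_dy) * alpha    # backward pass through alpha * W^B
    W = W - lr * dC_dW                    # update the *real-valued* weights
```

The key design point is that binarization happens inside the loop: gradients update the real-valued W, which is re-binarized on every iteration.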
Gradients of Binary Weights
g(w) = f(sign(w)) = f(w^b)
∂g(w)/∂wᵀ = ∂f(w^b)/∂[w^b]ᵀ · ∂sign(w)/∂wᵀ
[plot: sign(x) is a step function (−1 for x < 0, +1 for x > 0), so ∂sign(x)/∂x is zero almost everywhere — the gradient cannot flow through sign directly]
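Because the true derivative of sign is zero almost everywhere, training uses a surrogate gradient. A common choice (the straight-through estimator with clipping, used in the binary-networks literature) pretends the derivative is 1 where |w| ≤ 1 and 0 elsewhere. A minimal sketch:

```python
import numpy as np

def sign_forward(w):
    return np.sign(w)

def sign_backward_ste(grad_out, w, clip=1.0):
    # Straight-through estimator: pretend d sign(w)/dw = 1 for |w| <= clip
    # and 0 elsewhere, since the true derivative is 0 almost everywhere.
    return grad_out * (np.abs(w) <= clip)

w = np.array([-2.0, -0.5, 0.3, 1.7])
g = np.ones_like(w)                      # upstream gradient
print(sign_backward_ste(g, w))           # -> [0. 1. 1. 0.]
```

The clipping also stops updates to weights that are already saturated well past the binarization threshold.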
Binary Weight Network
AlexNet Top-1 (%) on ILSVRC 2012: Full Precision 56.7; Naïve 0.2; Binary Weight 56.8.
Reminder
Network                          | Operations       | Memory saving | Computation saving
Standard convolution (R ∗ R)     | +, −, ×          | 1x            | 1x
Binary Weight Networks (R ∗ B)   | +, −             | ~32x          | ~2x
XNOR-Networks (B ∗ B)            | XNOR, Bit-Count  | ~32x          | ~58x
Binary Weight - Binary Input Network
XᵀW ≈ β α (sign(X)ᵀ sign(W))
Both scaling factors follow from the same optimization used for the weights. For any tensor Y:
Y^B∗, γ∗ = arg min_{Y^B, γ} ||Y − γ Y^B||²  ⇒  Y^B∗ = sign(Y),  γ∗ = (1/n) ||Y||ℓ1
Applied to inputs and weights:
X^B∗ = sign(X),  β∗ = (1/n) ||X||ℓ1;   W^B∗ = sign(W),  α∗ = (1/n) ||W||ℓ1
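The double-binarized approximation XᵀW ≈ βα · sign(X)ᵀsign(W) can be checked numerically. A sketch with hypothetical vectors, where the input is constructed to be correlated with the weights so the dot product is clearly non-zero:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 256
W = rng.standard_normal(n)
X = W + 0.1 * rng.standard_normal(n)   # input correlated with weights (illustrative)

beta = np.abs(X).sum() / n             # input scaling factor beta*
alpha = np.abs(W).sum() / n            # weight scaling factor alpha*

approx = beta * alpha * np.dot(np.sign(X), np.sign(W))
exact = np.dot(X, W)

# The binary approximation preserves the sign of the true dot product.
assert (approx > 0) == (exact > 0)
```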
Binary Weight - Binary Input Network
(1) Binarizing weights: W ≈ α W^B, as before.
(2) Binarizing inputs: computing β separately for every overlapping window is redundant and inefficient. Instead, compute A = (1/c) Σᵢ |X:,:,i| (the per-pixel mean of absolute values across the c channels) once, then convolve A with an average filter; this yields the scaling factor for every window efficiently.
(3) Convolution with XNOR and Bit-Count: X ∗ W ≈ (sign(X) ⊛ sign(W)) scaled by α and the per-window β.
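Step (2) above can be sketched in numpy. The explicit loop stands in for the average-filter convolution; the check confirms that averaging the channel-mean map A over a k×k window equals the naive per-window β computed from the full k×k×c patch:

```python
import numpy as np

rng = np.random.default_rng(4)
H = Wd = 8; c = 3; k = 3                 # toy input and filter sizes (illustrative)
X = rng.standard_normal((H, Wd, c))

# A: per-pixel mean of |X| across channels, computed once.
A = np.abs(X).mean(axis=2)

# K = A convolved with a k x k averaging filter: K[i, j] is the scaling
# factor beta for the window at (i, j).
K = np.zeros((H - k + 1, Wd - k + 1))
for i in range(K.shape[0]):
    for j in range(K.shape[1]):
        K[i, j] = A[i:i+k, j:j+k].mean()

# Check against the naive per-window computation of beta.
i, j = 2, 3
patch = X[i:i+k, j:j+k, :]
beta_naive = np.abs(patch).sum() / patch.size
assert np.isclose(K[i, j], beta_naive)
```

Averaging A reuses each pixel's channel-mean across all overlapping windows, which is exactly the redundancy the naive computation wastes.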