
Deep Learning for Mobile Part II - Instructor: Simon Lucey - 16-623 - Designing Computer Vision Apps



  1. Deep Learning for Mobile Part II Instructor - Simon Lucey 16-623 - Designing Computer Vision Apps

  2. Today • Motivation • XNOR Networks • YOLO

  3. State of the Art Recognition Methods • Very expensive in terms of: • Memory • Computation • Power

  4. Convolutional Neural Networks [network diagram]

  5. Common Deep Learning Packages • Deep learning packages in common use include: • Caffe (out of Berkeley; the first popular package) • MatConvNet (MATLAB interface, very easy to use) • Torch (based on Lua, used by Facebook) • TensorFlow (based on Python, used by Google)

  6. TensorFlow

  7. TensorFlow

  8. GPU • Number of operations: AlexNet → 1.5B FLOPs, VGG → 19.6B FLOPs • Inference time on CPU: AlexNet → ~3 fps, VGG → ~0.25 fps Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016

  9. TensorFlow iOS

  10. Accelerate Framework “image operations” “matrix operations” “signal processing” “misc math”

  11. Accelerate Framework “image operations” “matrix operations” “signal processing” “misc math” BNNS (2016) “basic neural network subroutines”

  12. Accelerate Framework “image operations” “matrix operations” “signal processing” “misc math” BNNS (2016) “basic neural network subroutines” (Taken from https://www.bignerdranch.com/blog/neural-networks-in-ios-10-and-macos/ )

  13. Deep Learning Kit http://deeplearningkit.org/

  14. Today • Motivation • XNOR Networks • YOLO

  15. Lower Precision [Han et al. 2016] • Reducing precision (32-bit → 8-bit → 1-bit) • Saves memory • Saves computation • With weights in {−1, +1} (or {0, 1}): MUL becomes XNOR, and ADD/SUB become bit-count (popcount) Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016

  16. Why Binary? • Binary instructions: AND, OR, XOR, XNOR, popcount (bit-count) • Low-power devices Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016

  17. Why Binary? (operations / memory saving / computation saving) • Real input ∗ real weights (standard convolution): +, −, × operations, 1x, 1x • Real input ∗ binary weights (Binary Weight Networks): +, − operations, ~32x, ~2x • Binary input ∗ binary weights (XNOR-Networks): XNOR, bit-count operations, ~32x, ~58x Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016

  18. Why Binary? (the same comparison, shown without the network-type labels) Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
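A rough sanity check on the ~32x memory figure (my numbers, not from the slides): AlexNet has roughly 61M weights, so about 61M × 4 bytes ≈ 240 MB in 32-bit floats versus about 61M / 8 ≈ 7.6 MB when every weight is stored as a single bit, plus one full-precision scaling factor per filter.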

  19. Reminder: XNOR Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
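To make the XNOR + popcount trick concrete, here is a minimal Python sketch (my own illustration, not code from the lecture; pack_bits and binary_dot are made-up names): a dot product between two {−1, +1} vectors packed as bit masks reduces to one XNOR and one popcount.

```python
def pack_bits(v):
    """Pack a list of {-1, +1} values into an integer bit mask (1 bit per entry)."""
    mask = 0
    for i, x in enumerate(v):
        if x > 0:
            mask |= 1 << i
    return mask

def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1, +1} vectors of length n stored as bit masks."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # 1 wherever the signs agree
    agree = bin(xnor).count("1")                # popcount
    return 2 * agree - n                        # (+1)*agree + (-1)*(n - agree)

a = [+1, -1, -1, +1]
b = [+1, +1, -1, -1]
assert binary_dot(pack_bits(a), pack_bits(b), 4) == sum(x * y for x, y in zip(a, b))
```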

  20. Why Binary? Approximate the real-valued weights with binary weights plus a real scaling factor: Xᵀ W ≈ (Xᵀ W^B) α, where W^B = sign(W). Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016

  21. Quantization Methods • W^B = sign(W) • W ≈ α W^B (α = 0.75 in the slide's worked example)

  22. Optimal Scaling Factor α*, W^B* = argmin over (α, W^B) of J(α, W^B), where J(α, W^B) = ‖W − α W^B‖²₂

  23. Optimal Scaling Factor J(α, W^B) = ‖W − α W^B‖²₂. Expanding (writing B = W^B): J(α, B) = tr(α² BᵀB − 2α BᵀW + WᵀW)

  24. Optimal Scaling Factor J(α, B) = tr(α² BᵀB − 2α BᵀW + WᵀW) = α² n − 2α · tr(BᵀW) + constant, since tr(BᵀB) = n for B ∈ {−1, +1}ⁿ and the WᵀW term does not depend on (α, B)

  25. Simple Example (scalar case) argmin over b of α² − 2α · b · w + constant, s.t. b ∈ {+1, −1} • Since α is always positive: b = +1 if w > 0, and b = −1 if w < 0 • Or more simply, b = sign(w)
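A quick numeric check of this scalar case (illustrative only, not from the lecture): brute-forcing over a grid of α values confirms that b = sign(w) and α = |w| minimize (w − αb)².

```python
import numpy as np

w = -0.37
alphas = np.linspace(0, 2, 2001)                 # grid of candidate scaling factors
best = min(((w - a * b) ** 2, b, a) for b in (-1, +1) for a in alphas)
print(best[1], best[2])                          # -> -1 (= sign(w)) and ~0.37 (= |w|)
```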

  26. Optimal Scaling Factor • Since W^B = sign(W), tr(Wᵀ sign(W)) = ‖W‖ℓ1 • Therefore J(α) = α² n − 2α · ‖W‖ℓ1 + constant

  27. Optimal Scaling Factor α*, W^B* = argmin over (W^B, α) of ‖W − α W^B‖², with solution W^B* = sign(W) and α* = (1/n) ‖W‖ℓ1
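The closed form is easy to verify numerically; below is a small NumPy sketch (my own illustration; the filter shape and variable names are arbitrary) that computes W^B = sign(W) and α = ‖W‖ℓ1 / n and checks that this α beats nearby scalings.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3, 64))       # a hypothetical 3x3x64 convolution filter
n = W.size

W_B = np.sign(W)                      # binary weights
alpha = np.abs(W).sum() / n           # optimal scaling factor, ||W||_l1 / n

err = lambda a: np.linalg.norm(W - a * W_B) ** 2
assert err(alpha) <= min(err(0.9 * alpha), err(1.1 * alpha))
print(alpha, err(alpha))
```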

  28. How to train a CNN with binary filters? I ∗ W ≈ (I ∗ W^B) α, with W^B = sign(W) and α computed as above.

  29. Naive Solution 1. Train a network with real parameters. 2. Binarize the weight filters.

  30. Naive Solution [diagram: a network trained with real-valued weights W]

  31. Naive Solution [diagram: the trained real-valued weights W binarized to W^B]

  32. Naive Solution AlexNet Top-1 accuracy (%) on ILSVRC2012: Full precision 56.7, Naïve binarization 0.2

  33. Binary Weight Network Training for binary weights: 1. Randomly initialize W 2. For iter = 1 to N: 3. Load a random input image X 4. W^B = sign(W) 5. α = ‖W‖ℓ1 / n 6. Forward pass with α, W^B 7. Compute loss function C 8. Backward pass with α, W^B to obtain ∂C/∂W 9. Update W (W = W − ∂C/∂W)

  34.-38. Binary Weight Network (the same training loop, repeated as the network diagram is built up: the real-valued weights W are binarized to W^B with scaling factor α, the forward pass and loss are evaluated with (α, W^B), and the backward pass produces the gradient G_W = ∂C/∂W that is used to update the real-valued W)

  39. Gradients of Binary Weights g(w) = f(sign(w)) = f(w^b), so by the chain rule ∂g(w)/∂wᵀ = ∂f(w^b)/∂[w^b]ᵀ · ∂sign(w)/∂wᵀ. The true derivative of sign is zero almost everywhere, so the backward pass uses the surrogate gradient sketched on the slide [plots: sign(x), and the surrogate ∂sign(x)/∂x, which is non-zero only on a bounded interval around the origin]. Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
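A minimal sketch of that surrogate (straight-through) gradient, assuming the clipped form used in the XNOR-Net / BinaryNet line of work; the function names are my own:

```python
import numpy as np

def sign_forward(w):
    # forward pass: exact binarization
    return np.sign(w)

def sign_backward(w, grad_out):
    # backward pass: pass the incoming gradient straight through,
    # zeroed where |w| > 1 (straight-through estimator)
    return grad_out * (np.abs(w) <= 1.0)

w = np.array([-2.0, -0.5, 0.3, 1.7])
print(sign_forward(w))                    # -> [-1. -1.  1.  1.]
print(sign_backward(w, np.ones_like(w)))  # -> [ 0.  1.  1.  0.]
```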

  40. Binary Weight Network The same loop, with the weight update made explicit: the backward pass with (α, W^B) yields G_W = ∂C/∂W, and the update W = W − G_W is applied to the real-valued weights W, not to the binarized copy.
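A toy, runnable version of this loop (my own illustration: a single linear layer with a squared loss instead of a CNN, and a made-up regression target), showing the mechanics of steps 3-9: binarize, run the forward and backward pass with (α, W^B), then update the real-valued W. No claim is made about convergence of this toy.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=10)          # real-valued weights (the copy that is kept)
lr = 0.1

for it in range(200):
    x = rng.normal(size=10)                 # 3. "load a random input"
    t = float(x.sum())                      # toy regression target

    W_B = np.sign(W)                        # 4. binarize
    alpha = np.abs(W).mean()                # 5. alpha = ||W||_l1 / n

    y = alpha * (W_B @ x)                   # 6. forward pass with (alpha, W_B)
    C = 0.5 * (y - t) ** 2                  # 7. loss

    # 8. backward pass: the gradient flows straight through sign(W) to the
    #    real-valued W (the path through alpha is ignored here for brevity,
    #    although the XNOR-Net paper does account for it)
    dC_dW = (y - t) * alpha * x

    W -= lr * dC_dW                         # 9. update the real-valued weights

print(np.sign(W))                           # binarized weights after the toy run
```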

  41. Binary Weight Network AlexNet Top-1 accuracy (%) on ILSVRC2012: Full precision 56.7, Naïve 0.2, Binary Weight 56.8

  42. Reminder (operations / memory saving / computation saving) • Real input ∗ real weights: +, −, × operations, 1x, 1x • Real input ∗ binary weights: +, − operations, ~32x, ~2x • Binary input ∗ binary weights (XNOR-Networks): XNOR, bit-count operations, ~32x, ~58x

  43. Binary Weight - Binary Input Network Xᵀ W ≈ (sign(X)ᵀ sign(W)) α β, with binary input X^B = sign(X), binary weights W^B = sign(W), and a real scaling factor for each (β for the input, α for the weights)

  44. Binary Weight - Binary Input Network Xᵀ W ≈ (X^Bᵀ W^B) α β. For any tensor Y, approximate Y ≈ γ Y^B with Y^B*, γ* = argmin over (Y^B, γ) of ‖Y − γ Y^B‖², giving Y^B* = sign(Y) and γ* = (1/n) ‖Y‖ℓ1. Applied to both operands: X^B* = sign(X) with β* = (1/n) ‖X‖ℓ1, and W^B* = sign(W) with α* = (1/n) ‖W‖ℓ1
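A small NumPy sketch of this approximation (illustrative only; with i.i.d. random vectors the match is rough, and it is tighter on real activation/weight statistics):

```python
import numpy as np

def binary_approx_dot(X, W):
    # X^T W  ≈  (sign(X)^T sign(W)) * beta * alpha
    beta = np.abs(X).mean()                   # input scaling factor,  ||X||_l1 / n
    alpha = np.abs(W).mean()                  # weight scaling factor, ||W||_l1 / n
    return float(np.sign(X) @ np.sign(W)) * beta * alpha

rng = np.random.default_rng(1)
X = rng.normal(size=512)
W = rng.normal(scale=0.1, size=512)
print(X @ W, binary_approx_dot(X, W))         # exact vs. binary approximation
```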

  45. Binary Weight - Binary Input Network (1) Binarizing the weights: W ≈ α W^B with W^B = sign(W). (2) Binarizing the input: computing a scaling factor separately for every overlapping window is redundant and inefficient; instead, average |X| over the channels, A = (Σᵢ |X:,:,i|) / c, and convolve A with an averaging filter to obtain all input scaling factors efficiently. (3) Convolution with XNOR and bit-count: convolve sign(X) with sign(W) using XNOR and bit-count operations, then rescale with the scaling factors.
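A sketch of step (2), under the assumption that it follows the XNOR-Net paper's A / k / K construction (SciPy is used here only for the 2-D convolution; shapes and variable names are my own):

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 32, 3))            # H x W x C input tensor
h = w = 3                                   # spatial size of the filter

A = np.abs(X).mean(axis=2)                  # average |X| across channels
k = np.full((h, w), 1.0 / (h * w))          # averaging filter
K = convolve2d(A, k, mode="valid")          # one scaling factor per window, all at once

# spot-check one window against the direct (redundant) computation
direct = np.abs(X[0:h, 0:w, :]).mean()
assert np.isclose(K[0, 0], direct)
```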
