Deep Learning for Mobile, Part II. Instructor: Simon Lucey. 16-623: Designing Computer Vision Apps

Today • Motivation • XNOR Networks • YOLO

State of the Art Recognition Methods: state-of-the-art recognition methods are very expensive in • Memory • Computation • Power

Convolutional Neural Networks [figure]

Common Deep Learning Packages • Deep learning packages in common use include: • Caffe (from Berkeley; one of the first popular packages). • MatConvNet (MATLAB interface; very easy to use). • Torch (based on Lua; used by Facebook). • TensorFlow (Python interface; developed by Google).

TensorFlow

GPU. A real-valued convolution $I \ast W$, with $I, W \in \mathbb{R}$, uses +, −, ×. Number of operations: • AlexNet → 1.5B FLOPs • VGG → 19.6B FLOPs. Inference time on CPU: • AlexNet → ~3 fps • VGG → ~0.25 fps. Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
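The CPU frame rates closely track the operation counts. The sketch below works through that arithmetic. It assumes the common convention of 2 FLOPs per multiply-accumulate (MAC), although counting conventions vary between sources. It also uses a hypothetical layer shaped like AlexNet's first convolution. `conv_macs` and `effective_throughput` are illustrative helpers and do not come from the lecture:

```python
def conv_macs(h_out, w_out, c_out, c_in, k):
    """Multiply-accumulate (MAC) count of one dense k x k convolution layer, bias ignored."""
    return h_out * w_out * c_out * c_in * k * k


def effective_throughput(flops_per_image, fps):
    """FLOP/s a processor must sustain to run a network at the given frame rate."""
    return flops_per_image * fps


# Hypothetical layer shaped like AlexNet's first convolution:
# 96 filters of 11 x 11 x 3 producing a 55 x 55 output map.
macs = conv_macs(55, 55, 96, 3, 11)
print(f"conv1-like layer: {macs / 1e6:.1f}M MACs, {2 * macs / 1e6:.1f}M FLOPs at 2 FLOPs per MAC")

# FLOP counts and CPU frame rates quoted on the slide.
for name, flops, fps in [("AlexNet", 1.5e9, 3), ("VGG", 19.6e9, 0.25)]:
    print(f"{name}: {effective_throughput(flops, fps) / 1e9:.1f} GFLOP/s")
```

Both networks imply a CPU sustaining roughly 4.5 to 5 GFLOP/s. VGG's ~13x larger operation count therefore translates almost directly into a ~12x lower frame rate.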

TensorFlow iOS

Accelerate Framework: “image operations”, “matrix operations”, “signal processing”, “misc math”, and BNNS (2016), “basic neural network subroutines” (Taken from https://www.bignerdranch.com/blog/neural-networks-in-ios-10-and-macos/ )

Deep Learning Kit http://deeplearningkit.org/

Today • Motivation • XNOR Networks • YOLO

Lower Precision [Han et al. 2016]. Reducing precision • saves memory • saves computation. [Figure: histogram of trained weight values, concentrated between −0.05 and 0.05.] Going from 32-bit to 8-bit to 1-bit, real-valued weights $W \in \mathbb{R}$ become binary weights $W^B \in \{-1, +1\}$, stored as bits $\{0, 1\}$. Multiplication (MUL) then becomes XNOR, and addition/subtraction (ADD, SUB) becomes a bit count (popcount). Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
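Reducing precision shrinks storage in proportion to the bit width. The sketch below shows where the ~32x memory saving of 1-bit weights comes from. It assumes densely packed weights and a hypothetical layer of 256 filters, each 3×3×256; `weight_bytes` is an illustrative helper:

```python
import math


def weight_bytes(num_weights, bits):
    """Bytes needed to store num_weights values at `bits` bits each, densely packed."""
    return math.ceil(num_weights * bits / 8)


# Hypothetical layer: 256 filters, each 3 x 3 x 256.
n = 256 * 3 * 3 * 256
full = weight_bytes(n, 32)
for bits in (32, 8, 1):
    size = weight_bytes(n, bits)
    print(f"{bits:2d}-bit: {size:,} bytes ({full // size}x less than 32-bit)")
```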

Why Binary? • Binary instructions: AND, OR, XOR, XNOR, popcount (bit-count) • Low-power devices. Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
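The binary instructions can compute a dot product of two ±1 vectors with one XNOR and one popcount. The sketch below assumes vectors short enough to fit in one machine word, with a Python int standing in for a register. `pack` and `binary_dot` are hypothetical helpers. A real kernel would use a hardware popcount instruction instead of `bin(...).count("1")`:

```python
def pack(signs):
    """Pack a sequence of +1/-1 values into an int: bit i is 1 for +1 and 0 for -1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1,+1}^n vectors stored as n-bit words.

    XNOR sets a bit wherever the two signs agree (product +1); popcount counts
    those bits. Every other position contributes -1, so dot = 2 * matches - n.
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")  # XNOR, then popcount
    return 2 * matches - n


a = [1, -1, -1, 1, 1]
b = [1, 1, -1, -1, 1]
print(binary_dot(pack(a), pack(b), len(a)), sum(x * y for x, y in zip(a, b)))
```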

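Why Binary? Operations, memory saving and computation saving for each kind of convolution: • Real-valued, $I \ast W$ with $I, W \in \mathbb{R}$: +, −, ×; memory 1x; computation 1x • Binary Weight Networks, $I \ast W^B$ with binary $W^B$: +, −; memory ~32x; computation ~2x • XNOR-Networks, $I^B \ast W^B$ with both operands binary: XNOR, bit-count; memory ~32x; computation ~58x. Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016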
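In a Binary Weight Network, every multiplication in $I \ast W^B$ becomes an addition or a subtraction, which is where the computation saving comes from. The sketch below shows this on a single dot product with illustrative values. `dot_real` and `dot_binary_weights` are hypothetical names:

```python
def dot_real(x, w):
    """Standard dot product: one multiply and one add per element."""
    return sum(xi * wi for xi, wi in zip(x, w))


def dot_binary_weights(x, w_signs):
    """Dot product with weights in {-1, +1}: each term is an add or a subtract."""
    acc = 0.0
    for xi, si in zip(x, w_signs):
        acc = acc + xi if si > 0 else acc - xi
    return acc


x = [0.5, -1.25, 2.0, 0.75]
w_signs = [1, -1, -1, 1]
print(dot_binary_weights(x, w_signs), dot_real(x, w_signs))
```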

Reminder: XNOR outputs 1 when its two input bits are equal and 0 when they differ. Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
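A quick check of why XNOR can stand in for multiplication. It assumes the encoding +1 → bit 1 and −1 → bit 0, which is an illustrative choice. `to_bit` and `xnor` are hypothetical helpers:

```python
from itertools import product


def to_bit(s):
    """Encode +1 as 1 and -1 as 0."""
    return 1 if s == 1 else 0


def xnor(a, b):
    """1 when the two bits are equal, 0 otherwise."""
    return 1 - (a ^ b)


for s, t in product((-1, 1), repeat=2):
    # The product of two signs is +1 exactly when they agree, which is what XNOR reports.
    assert to_bit(s * t) == xnor(to_bit(s), to_bit(t))
    print(f"{s:+d} * {t:+d} = {s * t:+d}   XNOR({to_bit(s)}, {to_bit(t)}) = {xnor(to_bit(s), to_bit(t))}")
```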

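Why Binary? Binarize the weights, $W^B = \mathrm{sign}(W)$, and approximate the real-valued convolution with a binary-weight one: $I \ast W \approx I \ast W^B$. For a single dot product this reads $X^\top W \approx X^\top W^B$. Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016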

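Quantization Methods: binarize with $W^B = \mathrm{sign}(W)$ and rescale, e.g. $I \ast W \approx 0.75\,(I \ast W^B)$. [Figure: a real-valued filter $W \in \mathbb{R}$ and its binary counterpart $W^B$.]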

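Optimal Scaling Factor: $\alpha^*, W^{B*} = \arg\min_{\alpha, W^B} J(\alpha, W^B)$, where $J(\alpha, W^B) = \|W - \alpha W^B\|_2^2$. Writing $B = W^B$ and expanding gives $J(\alpha, B) = \mathrm{tr}\left(\alpha^2 B^\top B - 2\alpha B^\top W + W^\top W\right) = \alpha^2 n - 2\alpha\,\mathrm{tr}\left(B^\top W\right) + \text{constant}$. This holds because every entry of $B$ is $\pm 1$, so $B^\top B = n$, and $W^\top W$ depends on neither $\alpha$ nor $B$.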
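The expansion of $J$ can be checked numerically. The sketch below assumes a small random weight vector and an arbitrary $\alpha$. `objective` and `expanded` are hypothetical names for the two sides of the identity:

```python
import random


def objective(w, b, alpha):
    """J(alpha, B) = ||W - alpha * B||_2^2, computed directly."""
    return sum((wi - alpha * bi) ** 2 for wi, bi in zip(w, b))


def expanded(w, b, alpha):
    """alpha^2 * n - 2 * alpha * B^T W + W^T W; valid when every b_i is +1 or -1."""
    n = len(w)
    btw = sum(bi * wi for bi, wi in zip(b, w))
    wtw = sum(wi * wi for wi in w)
    return alpha ** 2 * n - 2 * alpha * btw + wtw


rng = random.Random(0)
w = [rng.gauss(0.0, 0.05) for _ in range(9)]   # e.g. one 3 x 3 filter
b = [rng.choice((-1, 1)) for _ in range(9)]    # any binary pattern, not necessarily optimal
print(objective(w, b, 0.04), expanded(w, b, 0.04))
```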

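Simple Example (a single weight, $n = 1$): $\arg\min_b \; \alpha^2 - 2\alpha \cdot b \cdot w + \text{constant}$ s.t. $b \in \{+1, -1\}$. • Since we know that $\alpha$ is always positive, the objective is smallest when $b \cdot w$ is largest: $b = +1$ if $w > 0$, and $b = -1$ if $w < 0$. • Or, more simply, $b = \mathrm{sign}(w)$.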
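The scalar argument can be confirmed by brute force over both values of $b$. The check below covers a few positive $\alpha$ and nonzero $w$, all illustrative values. `best_b` is a hypothetical helper:

```python
def scalar_objective(b, w, alpha):
    """alpha^2 - 2 * alpha * b * w (the constant is dropped; b^2 = 1 for b in {-1, +1})."""
    return alpha ** 2 - 2 * alpha * b * w


def best_b(w, alpha):
    """Minimize over the two admissible values of b by trying both."""
    return min((1, -1), key=lambda b: scalar_objective(b, w, alpha))


for w in (0.3, -0.7, 1e-3, -2.5):
    for alpha in (0.1, 1.0, 5.0):
        assert best_b(w, alpha) == (1 if w > 0 else -1)  # b = sign(w)
print("b = sign(w) in every case")
```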

Optimal Scaling Factor: $W^B = \mathrm{sign}(W)$. • Since $\mathrm{tr}\left(W^\top \mathrm{sign}(W)\right) = \|W\|_{\ell_1}$, • therefore $J(\alpha) = \alpha^2 n - 2\alpha\,\|W\|_{\ell_1} + \text{constant}$.

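Optimal Scaling Factor: approximate $W \approx \alpha W^B$, with $W \in \mathbb{R}$ and binary $W^B$, by solving $\alpha^*, W^{B*} = \arg\min_{W^B, \alpha} \|W - \alpha W^B\|^2$. The solution is $W^{B*} = \mathrm{sign}(W)$ and, setting $dJ/d\alpha = 2\alpha n - 2\|W\|_{\ell_1} = 0$, $\alpha^* = \frac{1}{n}\|W\|_{\ell_1}$.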
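The closed form can be compared against a grid search over $\alpha$. The sketch below uses illustrative weights and maps $\mathrm{sign}(0)$ to $+1$, which is an assumption. `best_alpha` and `approx_error` are hypothetical helpers:

```python
def sign(v):
    # Map 0 to +1 so every weight gets a binary value.
    return 1 if v >= 0 else -1


def best_alpha(w):
    """Closed form alpha* = ||W||_1 / n, for W^B = sign(W)."""
    return sum(abs(v) for v in w) / len(w)


def approx_error(w, alpha):
    """||W - alpha * sign(W)||_2^2."""
    return sum((v - alpha * sign(v)) ** 2 for v in w)


w = [0.9, -0.4, 0.05, -1.2, 0.6]                 # illustrative weights
a_star = best_alpha(w)
grid = [i / 1000 for i in range(2001)]           # candidate alphas in [0, 2]
a_grid = min(grid, key=lambda a: approx_error(w, a))
print(a_star, a_grid)
```

Both estimates land on the mean absolute weight, about 0.63 for these values.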

How to train a CNN with binary filters? The goal is $I \ast W \approx \alpha\,(I \ast W^B)$, with $I, W \in \mathbb{R}$ and binary $W^B$.

Naive Solution 1. Train a network with real parameters. 2. Binarize the weight filters.

Naive Solution [figure: the trained real-valued network, with weights $W \in \mathbb{R}$, is turned into a binary network by replacing each layer's filters with $W^B$]

Naive Solution: AlexNet top-1 accuracy (%) on ILSVRC2012 • Full precision: 56.7 • Naïve: 0.2

Binary Weight Network. Train for binary weights:
1. Randomly initialize $W$
2. For iter = 1 to $N$
   3. Load a random input image $X$
   4. $W^B = \mathrm{sign}(W)$
   5. $\alpha = \frac{1}{n}\|W\|_{\ell_1}$
   6. Forward pass with $\alpha, W^B$
   7. Compute loss function $C$
   8. $\frac{\partial C}{\partial W}$ = Backward pass with $\alpha, W^B$
   9. Update $W$: $W = W - \eta\,\frac{\partial C}{\partial W}$, with learning rate $\eta$

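Gradients of Binary Weights: let $g(w) = f(\mathrm{sign}(w)) = f(w^b)$. By the chain rule, $\frac{\partial g(w)}{\partial w^\top} = \frac{\partial f(w^b)}{\partial [w^b]^\top}\,\frac{\partial\,\mathrm{sign}(w)}{\partial w^\top}$. The true derivative of $\mathrm{sign}(x)$ is zero for every $x \neq 0$ (and undefined at $0$), so it is replaced by an approximation that lets the gradient $G_x$ pass straight through: $\frac{\partial\,\mathrm{sign}(x)}{\partial x} \approx 1$ for $-1 \le x \le +1$ and $0$ otherwise. [Figure: $\mathrm{sign}(x)$, stepping from $-1$ to $+1$, and its approximate derivative.]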
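The sketch below implements the gradient approximation above. It assumes the box-shaped straight-through estimate: $1$ for $|x| \le 1$ and $0$ otherwise. `sign_ste_grad` is a hypothetical helper; deep learning frameworks express the same idea as a custom backward rule:

```python
def sign(x):
    # Map 0 to +1 so the output is always binary.
    return 1.0 if x >= 0 else -1.0


def sign_ste_grad(x, upstream):
    """Straight-through estimate of d sign(x)/dx applied to an upstream gradient.

    The true derivative is 0 almost everywhere, which would stop learning, so the
    upstream gradient is passed through unchanged where |x| <= 1 and cut to 0 outside.
    """
    return upstream if abs(x) <= 1.0 else 0.0


# g(w) = f(sign(w)) with f(b) = (b - 1)^2, so f'(b) = 2 * (b - 1).
w = -0.3
b = sign(w)
dg_dw = sign_ste_grad(w, 2.0 * (b - 1.0))
print(b, dg_dw)
```

Here the estimated gradient is negative, so a gradient step $w - \eta\, G_w$ increases $w$. Enough such steps push $w$ toward the positive side, where $f$ is minimal.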

Binary Weight Network (update): the gradient $G_w$, obtained by backpropagating through the binarized weights, updates the real-valued weights: $W = W - \eta\, G_w$. The binary weights are then recomputed as $W^B = \mathrm{sign}(W)$ at the next iteration.
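The whole training loop can be exercised on a toy problem. The sketch below makes several assumptions that differ from the lecture's CNN setting:

- a linear model $y = x^\top W$ instead of a CNN;
- a squared-error loss;
- plain SGD with learning rate $\eta$ (`lr`);
- the straight-through estimate for $\mathrm{sign}$.

The backward pass differentiates exactly through $\alpha = \frac{1}{n}\|W\|_{\ell_1}$. `train_bwn` and the synthetic data are hypothetical:

```python
import random


def sign(v):
    # Map 0 to +1 so every weight gets a binary value.
    return 1.0 if v >= 0 else -1.0


def train_bwn(data, n, steps=6000, lr=0.005, seed=0):
    """Fit y ~ alpha * (x . sign(W)) with the Binary Weight Network loop on a toy linear model."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(n)]        # 1. random real-valued W
    for _ in range(steps):                                 # 2. for iter = 1 to N
        x, y = rng.choice(data)                            # 3. random training sample
        wb = [sign(v) for v in w]                          # 4. W^B = sign(W)
        alpha = sum(abs(v) for v in w) / n                 # 5. alpha = ||W||_1 / n
        dot = sum(xi * bi for xi, bi in zip(x, wb))
        pred = alpha * dot                                 # 6. forward pass
        dc_dpred = 2.0 * (pred - y)                        # 7. loss C = (pred - y)^2
        # 8. backward pass: exact through alpha (d alpha / d w_i = sign(w_i) / n),
        #    straight-through estimate for sign (pass-through where |w_i| <= 1).
        dc_dalpha = dc_dpred * dot
        grads = [
            dc_dalpha * sign(v) / n
            + dc_dpred * xi * alpha * (1.0 if abs(v) <= 1.0 else 0.0)
            for xi, v in zip(x, w)
        ]
        w = [v - lr * g for v, g in zip(w, grads)]         # 9. update the real-valued W
    return w


# Synthetic data whose true weights are exactly representable: alpha = 0.5, signs (+, -, +, -).
data_rng = random.Random(1)
target = [0.5, -0.5, 0.5, -0.5]
data = []
for _ in range(200):
    x = [data_rng.gauss(0.0, 1.0) for _ in range(4)]
    data.append((x, sum(xi * ti for xi, ti in zip(x, target))))

w = train_bwn(data, 4)
print([sign(v) for v in w], sum(abs(v) for v in w) / 4)
```

Only the real-valued $W$ is ever updated; $W^B$ and $\alpha$ are re-derived from it at every step. Because the target equals $0.5 \cdot \mathrm{sign}$ pattern exactly, the binary model can fit it with $\alpha = 0.5$.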

Binary Weight Network: AlexNet top-1 accuracy (%) on ILSVRC2012 • Full precision: 56.7 • Naïve: 0.2 • Binary Weight: 56.8

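Reminder: relative to real-valued convolution, binary weights alone (Binary Weight Networks, +, − only) give ~32x memory and ~2x computation savings. Binarizing the inputs as well (XNOR-Networks, XNOR and bit-count) keeps the ~32x memory saving and raises the computation saving to ~58x.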

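Binary Weight - Binary Input Network: binarize both operands of the dot product, $X \approx \beta X^B$ and $W \approx \alpha W^B$, so that $X^\top W \approx \beta\,\alpha\,(X^B)^\top W^B$. Equivalently, with the elementwise product $Y = X \odot W$, approximate $Y \approx \gamma Y^B$:
$Y^{B*}, \gamma^* = \arg\min_{Y^B, \gamma} \|Y - \gamma Y^B\|^2$, giving $Y^{B*} = \mathrm{sign}(Y)$ and $\gamma^* = \frac{1}{n}\|Y\|_{\ell_1}$.
Since $\mathrm{sign}(Y) = \mathrm{sign}(X) \odot \mathrm{sign}(W)$, and treating $X$ and $W$ as independent so that $\gamma^* \approx \beta^*\alpha^*$, the factors are:
$X^{B*} = \mathrm{sign}(X)$, $\beta^* = \frac{1}{n}\|X\|_{\ell_1}$; $W^{B*} = \mathrm{sign}(W)$, $\alpha^* = \frac{1}{n}\|W\|_{\ell_1}$.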
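The sketch below applies the binary weight - binary input approximation to one dot product. The Gaussian test vectors are an illustrative assumption, and `l1_mean`, `xnor_dot` and `gamma_star` are hypothetical helpers. The code also compares $\gamma^*$ with $\beta^*\alpha^*$; the two should nearly agree when $X$ and $W$ are independent:

```python
import random


def sign(v):
    # Map 0 to +1 so every entry gets a binary value.
    return 1.0 if v >= 0 else -1.0


def l1_mean(v):
    """(1/n) ||v||_1: the optimal scale c when v is approximated by c * sign(v)."""
    return sum(abs(a) for a in v) / len(v)


def xnor_dot(x, w):
    """x . w approximated by beta * alpha * (sign(x) . sign(w))."""
    beta, alpha = l1_mean(x), l1_mean(w)
    return beta * alpha * sum(sign(a) * sign(b) for a, b in zip(x, w))


def gamma_star(x, w):
    """Optimal gamma when Y = x * w (elementwise) is approximated by gamma * sign(Y)."""
    return l1_mean([a * b for a, b in zip(x, w)])


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(10000)]
w = [rng.gauss(0.0, 1.0) for _ in range(10000)]
# For independent standard normal x and w, both values should be close to 2/pi.
print(gamma_star(x, w), l1_mean(x) * l1_mean(w))
```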

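Binary Weight - Binary Input Network, applied to convolution:
(1) Binarizing weights: $W \approx \alpha\,\mathrm{sign}(W)$.
(2) Binarizing input, inefficient: computing $\beta$ separately for every sub-tensor of $X$ that the filter covers repeats the computation in overlapping areas.
(2) Binarizing input, efficient: average the absolute input over the $c$ channels, $A = \sum_i |X_{:,:,i}| / c$. Then convolve $A$ with an average filter $k$ to obtain all scaling factors at once, $K = A \ast k$.
(3) Convolution with XNOR-bitcount: $X \ast W \approx (\mathrm{sign}(X) \circledast \mathrm{sign}(W)) \odot K\alpha$, where $\circledast$ denotes a convolution computed with XNOR and bit-count.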
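The efficient input binarization can be checked against the direct per-window computation. The sketch below assumes an H×W×C input stored as nested lists, a k×k window, stride 1 and no padding. The helper names are hypothetical:

```python
import random


def channel_mean_abs(x):
    """A = sum_i |X[:, :, i]| / c for an H x W x C tensor stored as nested lists."""
    return [[sum(abs(v) for v in px) / len(px) for px in row] for row in x]


def box_filter(a, k):
    """Convolve a 2D map with a k x k average filter (stride 1, no padding)."""
    h, w = len(a), len(a[0])
    return [
        [sum(a[r + i][c + j] for i in range(k) for j in range(k)) / (k * k)
         for c in range(w - k + 1)]
        for r in range(h - k + 1)
    ]


def scaling_map_efficient(x, k):
    """K = A * k: every window's beta from one channel reduction plus one averaging pass."""
    return box_filter(channel_mean_abs(x), k)


def scaling_map_naive(x, k):
    """beta = ||X_window||_1 / n for each k x k x c window separately (overlaps recomputed)."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    n = k * k * c
    return [
        [sum(abs(x[r + i][col + j][q]) for i in range(k) for j in range(k) for q in range(c)) / n
         for col in range(w - k + 1)]
        for r in range(h - k + 1)
    ]


rng = random.Random(0)
height, width, channels, k = 6, 5, 3, 3
x = [[[rng.uniform(-1.0, 1.0) for _ in range(channels)] for _ in range(width)]
     for _ in range(height)]
fast = scaling_map_efficient(x, k)
slow = scaling_map_naive(x, k)
print(len(fast), len(fast[0]))  # 4 x 3 window positions
```

The naive version reads k·k·c values for every output position, so overlapping windows re-read the same inputs. The efficient version reduces over channels once, then needs only a k×k average per position.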
