BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks (PowerPoint PPT Presentation)


SLIDE 1

BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks

ICPR 2016

Surat Teerapittayanon, Brad McDanel, H. T. Kung

Harvard John A. Paulson School of Engineering and Applied Sciences


SLIDE 2

Outline

  • Motivation and Background
      • Trend towards deeper networks
      • Auxiliary network structures (GoogLeNet)
  • BranchyNet
      • Architecture
      • Training
      • Inference
  • Experimental Results
  • Future Work
  • Conclusion

[Figure: BranchyNet with 3 exits]

SLIDE 3

Trend Towards Deeper Networks

[Figure: Accuracy vs. Depth (ILSVRC workshop, Kaiming He)]

SLIDE 4

Auxiliary Networks

GoogLeNet introduces auxiliary networks:

  • They provide regularization to the main network, improving accuracy by ≈ 1%
  • They are removed after training; only the main network is used during inference

Can we leverage auxiliary networks to address the inference runtime of deeper networks?

[Figure: Section of GoogLeNet]

SLIDE 5

BranchyNet

  • Easier input samples require only lower-level features for correct classification; harder input samples require higher-level features
  • Use early exit branches (auxiliary networks) to classify the easier samples, so that no computation is performed at the higher layers
  • Requires a mechanism for determining the network's confidence about a sample in order to use an exit
  • Jointly training the main network and the early exit branches improves the quality of the lower branches, allowing more samples to exit at earlier points

[Figure: BranchyNet (LeNet)]

SLIDE 6

BranchyNet Example: Easy Sample

  • A new sample enters the network and reaches Exit 1
  • The exit is determined to be "confident" (Confident? Yes)
  • The sample is classified at Exit 1; no additional work is performed at the upper layers


SLIDE 7

BranchyNet Example: Hard Sample

  • A new sample enters the network and reaches Exit 1
  • The exit is determined to be "not confident" (Confident? No)
  • The sample continues up the main network (no re-computation of the lower layers)
  • The sample must exit (be classified) at Exit 2, as it is the final exit point


SLIDE 8

Measuring Network Confidence

Use the entropy of the softmax output to measure confidence (a code sketch follows):

$\mathrm{entropy}(y) = -\sum_{c \in C} y_c \log y_c$

where $y$ is a vector containing the computed probabilities for all possible class labels and $C$ is the set of all possible labels.

  • Choice of entropy versus other confidence measures

[Figure: Exit 1 Softmax Output]
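As a minimal sketch of this confidence check (assuming NumPy; the `logits` values are illustrative, not from the paper):

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def entropy(y, eps=1e-12):
    # Shannon entropy of the predicted class distribution; values near 0
    # mean the probability mass sits on one class, i.e., high confidence.
    return -np.sum(y * np.log(y + eps))

logits = np.array([4.0, 0.5, 0.2])  # hypothetical Exit 1 output
y_hat = softmax(logits)
print(entropy(y_hat))               # low entropy -> confident enough to exit
```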

SLIDE 9

BranchyNet Training

  • Pretrain the main network first
  • Add the exit branches and train again
  • The final loss function is the weighted sum of the losses of all exits (a code sketch follows):

$L_{\mathrm{branchynet}}(\hat{y}, y; \theta) = \sum_{n=1}^{N} w_n L(\hat{y}_{\mathrm{exit}_n}, y; \theta)$

where $N$ is the total number of exit points.

  • Early exit weights: $w_{1..N-1} = 1$
  • Last exit weight: $w_N = 0.3$
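A minimal sketch of this weighted loss, written with PyTorch for illustration (the paper's released implementation is in Chainer; the model here is hypothetical and the weights follow the slide's setting):

```python
import torch
import torch.nn.functional as F

def branchynet_loss(exit_logits, target, weights):
    # L_branchynet = sum_n w_n * L(y_hat_exit_n, y), using cross-entropy
    # as the per-exit loss L.
    total = torch.zeros(())
    for logits, w in zip(exit_logits, weights):
        total = total + w * F.cross_entropy(logits, target)
    return total

# Example with N = 3 exits: early exits weighted 1, last exit 0.3.
weights = [1.0, 1.0, 0.3]
batch, classes = 8, 10
target = torch.randint(0, classes, (batch,))
exit_logits = [torch.randn(batch, classes, requires_grad=True)
               for _ in range(3)]
loss = branchynet_loss(exit_logits, target, weights)
loss.backward()
```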

SLIDE 10

BranchyNet Inference

procedure BranchyNetFastInference(x, T)
    for n = 1..N do
        z = f_exit_n(x)
        ŷ = softmax(z)
        e = entropy(ŷ)
        if e < T_n then
            return arg max ŷ
    return arg max ŷ

Figure: BranchyNet Fast Inference Algorithm. x is an input sample, T is a vector where the n-th entry T_n is the threshold for determining whether to exit a sample at the n-th exit point, and N is the number of exit points of the network.
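A runnable sketch of this procedure (the exit functions below are stand-ins for per-branch classifiers; a real implementation would also cache and reuse the lower-layer activations rather than recompute them):

```python
import numpy as np

def branchynet_fast_inference(x, exits, thresholds):
    """Classify x, returning at the first exit whose softmax entropy
    falls below its threshold; fall back to the final exit otherwise."""
    y_hat = None
    for f_exit, t_n in zip(exits, thresholds):
        z = f_exit(x)
        y_hat = np.exp(z - np.max(z))
        y_hat /= y_hat.sum()                        # softmax
        e = -np.sum(y_hat * np.log(y_hat + 1e-12))  # entropy
        if e < t_n:
            return int(np.argmax(y_hat))            # early exit
    return int(np.argmax(y_hat))                    # final exit

# Usage with two hypothetical linear exits; np.inf on the last threshold
# makes the final exit always fire when Exit 1 is not confident.
rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
exits = [lambda x: x @ W1, lambda x: x @ W2]
print(branchynet_fast_inference(np.ones(4), exits, [0.025, np.inf]))
```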

SLIDE 11

Networks and Datasets

Network architectures:

  • LeNet (on MNIST) and its branched variant, Branchy-LeNet
  • AlexNet (on CIFAR-10) and its branched variant, Branchy-AlexNet

SLIDE 12

Results

  • Points on the curve are found by sweeping over values of T; in the case of more than one early exit, we take combinations of the T_i values (see the sketch below)
  • The accuracy improvement over the baseline network (red diamond) is due to joint training
  • The runtime improvements over the baseline network come from classifying the majority of samples at the early exit points (no computation performed for the higher layers)
  • As the T values increase, more samples exit at the higher exit branches
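A hypothetical sketch of that sweep (the grid values and the `evaluate` stub are illustrative, not the paper's experimental setup):

```python
import itertools

# Candidate entropy thresholds for each of two early exits (illustrative).
grid = [0.0001, 0.001, 0.01, 0.05, 0.1]

def evaluate(thresholds):
    """Placeholder: run branchynet_fast_inference over a validation set
    and return (accuracy, mean_runtime_ms) for this threshold setting."""
    return 0.0, 0.0  # stand-in values; replace with a real evaluation

# Sweep all (T1, T2) combinations to trace the accuracy/runtime curve.
curve = []
for t in itertools.product(grid, repeat=2):
    acc, runtime = evaluate(t)
    curve.append((runtime, acc, t))
```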

SLIDE 13

Future Work

  • Automatically find the threshold values T for each exit branch
  • Investigate confidence measures other than softmax entropy (e.g., OpenMax, GANs)
  • Dynamically adjust the loss weights based on individual samples:
      • easier samples have more weight at the lower branches
      • harder samples have more weight at the higher branches

SLIDE 14

Conclusion

  • Introduced a mechanism to exit a percentage of samples at earlier points in the network
  • Jointly training these exit points improves accuracy, which allows additional samples to exit early
  • Achieved a 2-4x speedup over the baseline single network in our test cases
  • The BranchyNet implementation is written in Chainer and open source: https://gitlab.com/htkung/branchynet

SLIDE 15

Thanks for your attention! Comments and questions?

SLIDE 16

Results Table

Table: Selected performance results for BranchyNet on the different network structures. The BranchyNet rows correspond to the knee points (denoted as green stars in the previous slides). "Exit (%)" is the percentage of samples classified at each exit point.

      Network     Acc. (%)  Time (ms)  Gain  Threshold T   Exit (%)
CPU   LeNet       99.20     3.37
      B-LeNet     99.25     0.62       5.4x  0.025         94.3, 5.63
      AlexNet     78.38     9.56
      B-AlexNet   79.19     6.32       1.5x  0.0001, 0.05  65.6, 25.2, 9.2
GPU   LeNet       99.20     1.58
      B-LeNet     99.25     0.34       4.7x  0.025         94.3, 5.63
      AlexNet     78.38     3.15
      B-AlexNet   79.19     1.30       2.4x  0.0001, 0.05  65.6, 25.2, 9.2