SLIDE 1

Shallow-Deep Networks: Understanding and Mitigating Network Overthinking

Yiğitcan Kaya, Sanghyun Hong, Tudor Dumitraș

University of Maryland, College Park
ICML 2019 - Long Beach, CA

SLIDE 2

What is overthinking?

We, especially grad students, often think more than needed to solve a problem.

SLIDE 3

What is overthinking?

We, especially grad students, often think more than needed to solve a problem.

i. Wastes our valuable energy (wasteful)

SLIDE 4

What is overthinking?

We, especially grad students, often think more than needed to solve a problem.

i. Wastes our valuable energy (wasteful)
ii. Causes us to make mistakes (destructive)

SLIDE 5

Do deep neural networks overthink too?

Without requiring the full depth, DNNs can correctly classify the majority of samples.

Experiments on four recent CNNs and three common image classification tasks.

SLIDE 6

Do deep neural networks overthink too?

Without requiring the full depth, DNNs can correctly classify the majority of samples.

i. Wastes computation for up to 95% of the samples (wasteful)

SLIDE 7

Do deep neural networks overthink too?

Without requiring the full depth, DNNs can correctly classify the majority of samples.

i. Wastes computation for up to 95% of the samples (wasteful)
ii. Occurs in ~50% of all misclassifications (destructive)

SLIDE 8

How do we detect overthinking?

Internal classifiers allow us to observe whether the DNN correctly classifies the sample at an earlier layer.

SLIDE 9

How do we detect overthinking?

Internal classifiers allow us to observe whether the DNN correctly classifies the sample at an earlier layer.

➢ Our generic Shallow-Deep Network (SDN) modification introduces internal classifiers to DNNs.

SLIDE 10

The SDN modification

[Figure: the original CNN (conv1 → conv2 → conv3 → conv4 → fully connected layers) compared with its SDN modification, which attaches internal classifiers to the internal layers; each internal classifier emits an internal prediction before the final classifier emits the final prediction.]

Applied to VGG, ResNet, WideResNet and MobileNet.
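As a rough illustration, the modification can be sketched in plain Python; everything below (the block and classifier stand-ins, the `ShallowDeepNet` class) is an illustrative assumption, not the authors' implementation:

```python
# Toy sketch of a Shallow-Deep Network: a backbone is a sequence of
# "blocks", and internal classifiers are attached after selected blocks
# so the network can emit a prediction before reaching its final layer.
# All names here are illustrative stand-ins, not the authors' code.

def make_block(scale):
    # Stand-in for a convolutional block: just scales the feature value.
    return lambda x: x * scale

def make_classifier(threshold):
    # Stand-in for a classifier head: predicts class 1 if the feature
    # exceeds a threshold, with a crude "confidence" score.
    def classify(x):
        label = 1 if x > threshold else 0
        confidence = min(abs(x - threshold) / threshold, 1.0)
        return label, confidence
    return classify

class ShallowDeepNet:
    def __init__(self, blocks, internal_heads, final_head):
        self.blocks = blocks                  # backbone feature blocks
        self.internal_heads = internal_heads  # {block_index: classifier}
        self.final_head = final_head

    def forward_all(self, x):
        """Run the full network, collecting every internal prediction."""
        internal = []
        for i, block in enumerate(self.blocks):
            x = block(x)
            if i in self.internal_heads:
                internal.append(self.internal_heads[i](x))
        return internal, self.final_head(x)

net = ShallowDeepNet(
    blocks=[make_block(2.0), make_block(1.5), make_block(1.2)],
    internal_heads={0: make_classifier(1.0), 1: make_classifier(2.0)},
    final_head=make_classifier(3.0),
)
internal_preds, final_pred = net.forward_all(1.0)
```

In the real setting the blocks would be convolutional stages of VGG, ResNet, WideResNet or MobileNet, and each head a small classifier trained on that stage's features.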

SLIDE 11

The SDN modification

Challenge

How to train accurate internal classifiers?

SLIDE 12

The SDN modification

Challenge

How to train accurate internal classifiers?

Prior Work

Claims that this hurts the accuracy of off-the-shelf DNNs and proposes a unique architecture [1].

[1] Huang, Gao, et al. "Multi-scale dense convolutional networks for efficient prediction." ICLR 2018

SLIDE 13

The SDN modification

Challenge

How to train accurate internal classifiers?

Results

Our modification often improves the original accuracy by up to 10%. (See our poster)

SLIDE 14

The wasteful effect of overthinking

[Figure: an early internal classifier already predicts "Horse" correctly, and the final classifier also predicts "Horse"; the remaining computation is wasteful for this correct classification.]

SLIDE 15

The wasteful effect of overthinking

Challenge

How can we know where in the DNN to stop?

SLIDE 16

The wasteful effect of overthinking

Challenge

How can we know where in the DNN to stop?

Our Solution

Classification confidence of the internal classifiers

SLIDE 17

The wasteful effect of overthinking

Our Solution

Classification confidence of the internal classifiers

Results

A confidence-based early exit scheme reduces the average inference cost by up to 50%. (See our poster)
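The early-exit scheme can be sketched as follows, in plain Python; the function name, the threshold value, and the toy probability vectors are illustrative assumptions, not the authors' code:

```python
# Confidence-based early exit: walk the internal classifiers in order of
# depth and stop at the first one whose top-class probability clears a
# confidence threshold; fall back to the final classifier otherwise.
# This is a sketch of the idea, not the authors' implementation.

def early_exit(prob_vectors, threshold=0.9):
    """prob_vectors: per-classifier probability lists, shallow to deep;
    the last entry is the final classifier. Returns (exit_index, label)."""
    for i, probs in enumerate(prob_vectors[:-1]):
        confidence = max(probs)
        if confidence >= threshold:          # confident enough: stop here
            return i, probs.index(confidence)
    final = prob_vectors[-1]                 # no early exit: use full depth
    return len(prob_vectors) - 1, final.index(max(final))

# An easy sample exits at the first internal classifier, skipping the
# deeper (more expensive) layers entirely.
easy = [[0.95, 0.03, 0.02], [0.97, 0.02, 0.01], [0.99, 0.005, 0.005]]
print(early_exit(easy))   # → (0, 0)
```

Averaged over a dataset, samples that exit early are the ones whose computation the full network would have wasted.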

SLIDE 18

The destructive effect of overthinking

[Figure: an internal classifier correctly predicts "Horse", but the final classifier predicts "Dog"; the extra depth is destructive to the correct classification.]

SLIDE 19

The destructive effect causes disagreement

[Figure: the internal classifiers predict "Horse" while the final classifier predicts "Dog"; the internal predictions disagree with the final prediction.]

SLIDE 20

The destructive effect causes disagreement

Challenge

How can we quantify the internal disagreement?

Our Solution

The confusion metric

SLIDE 21

The destructive effect causes disagreement

Our Solution

The confusion metric

Results

Confusion indicates whether a misclassification is likely, making it a reliable error indicator. (See our poster)
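One simple way to instantiate such a disagreement score, in plain Python; this is a hedged sketch that only captures the idea of quantifying internal disagreement, not the paper's exact confusion formula:

```python
# Illustrative disagreement score: the fraction of internal classifiers
# whose prediction disagrees with the final prediction. The paper defines
# its own confusion metric; this function is only a stand-in for the idea.

def confusion(internal_labels, final_label):
    """0.0 means every internal classifier agrees with the final
    prediction; 1.0 means total disagreement."""
    if not internal_labels:
        return 0.0
    disagreements = sum(1 for lbl in internal_labels if lbl != final_label)
    return disagreements / len(internal_labels)

# Agreement all the way down: low confusion, the prediction is trustworthy.
print(confusion(["horse", "horse", "horse"], "horse"))   # → 0.0
# Internal classifiers mostly say "horse" but the network outputs "dog":
# high confusion signals a likely misclassification.
print(confusion(["horse", "horse", "dog"], "dog"))
```

A high score flags exactly the destructive cases from the previous slides, where the early layers were right and the full network was wrong.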

SLIDE 22

The destructive effect causes disagreement

Our Solution

The confusion metric

Results

Backdoor attacks [2] also increase the confusion of the victim DNN for malicious samples. (See our poster)

[2] Gu, Tianyu, et al. "BadNets: Evaluating Backdooring Attacks on Deep Neural Networks." IEEE Access 7 (2019): 47230-47244.

SLIDE 23

Implications

  • Eliminating overthinking would lead to a significant boost in accuracy and a reduction in inference time.

  • We need DNNs that can adjust their complexity based on the required feature complexity.

SLIDE 24

Thank you!

Don’t overthink! Come and see our poster!

Pacific Ballroom – Poster #24 – 06:30-09:00 PM


For more details, visit our website http://shallowdeep.network