Jumpout: Improved Dropout for Deep Neural Networks with ReLUs



SLIDE 1

Jumpout : Improved Dropout for Deep Neural Networks with ReLUs

Shengjie Wang*, Tianyi Zhou*, Jeff A. Bilmes University of Washington, Seattle

SLIDE 2

Dropout has a few Drawbacks...

  • Dropout encourages DNNs to apply the same linear model to different data points but does not enforce local smoothness.

  • Dropping zeros has no effect but still counts toward the drop rate.
  • Dropout does not work well with BatchNorm.


SLIDE 3

Dropout has a few Drawbacks...

  • Dropout encourages DNNs to apply the same linear model to different data points but does not enforce local smoothness.

  • Dropping zeros has no effect but still counts toward the drop rate (see the sketch below).
  • Dropout does not work well with BatchNorm.
  • Jumpout improves dropout with three modifications, with (almost) no extra computation/memory cost.

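The second drawback can be checked in a few lines. The sketch below is our own NumPy illustration (not code from the paper): with a nominal drop rate p applied after a ReLU, roughly half of the drops land on units that are already zero, so the fraction of units whose value actually changes is well below p.

import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                    # nominal dropout rate

pre_act = rng.normal(size=100_000)         # pre-activations of one wide layer
h = np.maximum(pre_act, 0.0)               # ReLU: about half the units are already zero

mask = rng.random(h.shape) >= p            # standard dropout mask (keep prob. 1 - p)
h_dropped = h * mask

changed = h_dropped != h                   # units whose value dropout actually changed
print(f"nominal drop rate:              {p:.2f}")
print(f"fraction of units already zero: {(h == 0).mean():.2f}")   # ~0.5
print(f"fraction actually zeroed out:   {changed.mean():.2f}")    # ~p * 0.5 = 0.25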

SLIDE 4

Jumpout Modification I – Encourage Local Smoothness

  • Instead of applying a constant dropout rate, the dropout rate is sampled from the positive part of a Gaussian distribution, and the standard deviation is used to control the strength of regularization (see the sketch below).

[Figure legend: data point, row of W, monotone dropout rate, constant dropout rate.]
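A minimal sketch of Modification I, assuming NumPy; the function names, the sigma value, and the clipping cap p_max are our own illustrative choices, not the authors' code. The dropout rate is redrawn from the positive half of a Gaussian N(0, sigma^2) on every forward pass, so sigma directly controls the strength of the regularization.

import numpy as np

rng = np.random.default_rng(0)

def sample_jumpout_rate(sigma: float, p_max: float = 0.9) -> float:
    # Draw a dropout rate from the positive half of N(0, sigma^2),
    # clipped so it stays a valid drop probability.
    return min(abs(rng.normal(loc=0.0, scale=sigma)), p_max)

def inverted_dropout(h: np.ndarray, p: float) -> np.ndarray:
    # Standard (inverted) dropout with rate p.
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

h = np.maximum(rng.normal(size=(32, 256)), 0.0)   # a post-ReLU activation batch
p_t = sample_jumpout_rate(sigma=0.1)              # a fresh rate for this forward pass
out = inverted_dropout(h, p_t)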

SLIDE 5

Jumpout Modification II

  • Better Control of Regularization
  • The dropout rate is normalized by the proportion of active neurons of the input layer, so that we can better control the regularization for different layers and for different training stages (see the sketch below).

[Figure: portion of active (ReLU) neurons vs. training epoch for layers conv 1, conv 2, conv 3, and fc 1.]
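A sketch of Modification II under the same assumptions (our own NumPy illustration, not the released code): the desired rate is divided by the fraction of active, i.e. nonzero, post-ReLU units, so a given p perturbs a comparable share of the layer regardless of how sparse its activations are at that layer or training stage.

import numpy as np

rng = np.random.default_rng(0)

def normalized_dropout(h: np.ndarray, p: float, p_max: float = 0.9) -> np.ndarray:
    # Rescale the drop rate by the proportion of active neurons so that,
    # on average, a fraction p of *all* units is actually perturbed.
    active_frac = (h > 0).mean()
    if active_frac == 0.0:
        return h                                  # dead layer: nothing to drop
    p_eff = min(p / active_frac, p_max)
    mask = rng.random(h.shape) >= p_eff
    return h * mask / (1.0 - p_eff)

h = np.maximum(rng.normal(size=(32, 256)), 0.0)   # roughly 50% of units active after ReLU
out = normalized_dropout(h, p=0.1)                # effective rate on this layer: ~0.2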

SLIDE 6

Jumpout Modification III

  • Synergize well with BatchNorm
  • The rescaling factor for training is changed to (1 - p)^(-0.75) to account for the changes of both the mean and the variance (see the sketch below).

[Figure: changes of mean (left) and variance (right) across training epochs when applying the rescaling factors (1 - p)^(-1), (1 - p)^(-0.5), and (1 - p)^(-0.75). Blue: dropout; grey: jumpout.]
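A sketch of Modification III on synthetic half-normal activations (our own NumPy illustration): standard training-time dropout rescales the kept units by (1 - p)^(-1), which preserves the mean but inflates the variance that a following BatchNorm layer expects, while (1 - p)^(-0.5) preserves the second moment but shrinks the mean. The exponent -0.75 on the slide trades the two off; the loop below prints the mean and variance ratios for each choice.

import numpy as np

rng = np.random.default_rng(0)
p = 0.2
h = np.maximum(rng.normal(size=1_000_000), 0.0)   # post-ReLU activations

mask = rng.random(h.shape) >= p                   # one shared dropout mask
for exponent in (-1.0, -0.5, -0.75):
    scaled = h * mask * (1.0 - p) ** exponent     # rescaled, dropped activations
    print(f"(1 - p)^({exponent:+.2f}):  "
          f"mean ratio = {scaled.mean() / h.mean():.3f},  "
          f"variance ratio = {scaled.var() / h.var():.3f}")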

SLIDE 7

Results

[Results table/figure: STL10.]

SLIDE 8

Thank you!

  • For more details, please come to our poster session: Tuesday, 06:30 - 09:00 PM, Pacific Ballroom #29.