Jumpout: Improved Dropout for Deep Neural Networks with ReLUs



SLIDE 1

Jumpout : Improved Dropout for Deep Neural Networks with ReLUs

Shengjie Wang*, Tianyi Zhou*, Jeff A. Bilmes University of Washington, Seattle

SLIDE 2

Dropout has a few Drawbacks...

  • Dropout encourages DNNs to apply the same linear model to different data points but does not enforce local smoothness.

  • Dropping zeros has no effect but still counts toward the drop rate.
  • Dropout does not work well with BatchNorm.


SLIDE 3

Dropout has a few Drawbacks...

  • Dropout encourages DNNs to apply the same linear model to different data points but does not enforce local smoothness.

  • Dropping zeros has no effect but still counts toward the drop rate (see the sketch below).
  • Dropout does not work well with BatchNorm.
  • Jumpout improves dropout with three modifications, with (almost) no extra computation/memory cost.

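The second drawback can be checked in a few lines. The sketch below is our own NumPy illustration (not code from the paper): with a nominal drop rate p applied after a ReLU, roughly half of the drops land on units that are already zero, so the fraction of units whose value actually changes is well below p.

import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                    # nominal dropout rate

pre_act = rng.normal(size=100_000)         # pre-activations of one wide layer
h = np.maximum(pre_act, 0.0)               # ReLU: about half the units are already zero

mask = rng.random(h.shape) >= p            # standard dropout mask (keep prob. 1 - p)
h_dropped = h * mask

changed = h_dropped != h                   # units whose value dropout actually changed
print(f"nominal drop rate:              {p:.2f}")
print(f"fraction of units already zero: {(h == 0).mean():.2f}")   # ~0.5
print(f"fraction actually zeroed out:   {changed.mean():.2f}")    # ~p * 0.5 = 0.25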

SLIDE 4

Jumpout Modification I – Encourage Local Smoothness

  • Instead of applying a constant dropout rate, the dropout rate is sampled from the positive part of a Gaussian distribution, and the standard deviation is used to control the strength of regularization (see the sketch below).

[Figure legend: data point, row of W, monotone dropout rate, constant dropout rate.]
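A minimal sketch of Modification I, assuming NumPy; the function names, the sigma value, and the clipping cap p_max are our own illustrative choices, not the authors' code. The dropout rate is redrawn from the positive half of a Gaussian N(0, sigma^2) on every forward pass, so sigma directly controls the strength of the regularization.

import numpy as np

rng = np.random.default_rng(0)

def sample_jumpout_rate(sigma: float, p_max: float = 0.9) -> float:
    # Draw a dropout rate from the positive half of N(0, sigma^2),
    # clipped so it stays a valid drop probability.
    return min(abs(rng.normal(loc=0.0, scale=sigma)), p_max)

def inverted_dropout(h: np.ndarray, p: float) -> np.ndarray:
    # Standard (inverted) dropout with rate p.
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

h = np.maximum(rng.normal(size=(32, 256)), 0.0)   # a post-ReLU activation batch
p_t = sample_jumpout_rate(sigma=0.1)              # a fresh rate for this forward pass
out = inverted_dropout(h, p_t)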

SLIDE 5

Jumpout Modification II

  • Better Control of Regularization
  • The dropout rate is normalized by the proportion of active neurons of the input layer, so that we can better control the regularization for different layers and for different training stages (see the sketch below).

[Figure: portion of active (ReLU) neurons vs. training epoch for layers conv 1, conv 2, conv 3, and fc 1.]
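A sketch of Modification II under the same assumptions (our own NumPy illustration, not the released code): the desired rate is divided by the fraction of active, i.e. nonzero, post-ReLU units, so a given p perturbs a comparable share of the layer regardless of how sparse its activations are at that layer or training stage.

import numpy as np

rng = np.random.default_rng(0)

def normalized_dropout(h: np.ndarray, p: float, p_max: float = 0.9) -> np.ndarray:
    # Rescale the drop rate by the proportion of active neurons so that,
    # on average, a fraction p of *all* units is actually perturbed.
    active_frac = (h > 0).mean()
    if active_frac == 0.0:
        return h                                  # dead layer: nothing to drop
    p_eff = min(p / active_frac, p_max)
    mask = rng.random(h.shape) >= p_eff
    return h * mask / (1.0 - p_eff)

h = np.maximum(rng.normal(size=(32, 256)), 0.0)   # roughly 50% of units active after ReLU
out = normalized_dropout(h, p=0.1)                # effective rate on this layer: ~0.2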

SLIDE 6

Jumpout Modification III

  • Synergize well with BatchNorm
  • The rescaling factor for training is changed to (1 - p)^(-0.75) to account for the changes of both the mean and the variance (see the sketch below).

[Figure: changes of mean (left) and variance (right) across training epochs when applying the rescaling factors (1 - p)^(-1), (1 - p)^(-0.5), and (1 - p)^(-0.75). Blue: dropout; grey: jumpout.]
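A sketch of Modification III on synthetic half-normal activations (our own NumPy illustration): standard training-time dropout rescales the kept units by (1 - p)^(-1), which preserves the mean but inflates the variance that a following BatchNorm layer expects, while (1 - p)^(-0.5) preserves the second moment but shrinks the mean. The exponent -0.75 on the slide trades the two off; the loop below prints the mean and variance ratios for each choice.

import numpy as np

rng = np.random.default_rng(0)
p = 0.2
h = np.maximum(rng.normal(size=1_000_000), 0.0)   # post-ReLU activations

mask = rng.random(h.shape) >= p                   # one shared dropout mask
for exponent in (-1.0, -0.5, -0.75):
    scaled = h * mask * (1.0 - p) ** exponent     # rescaled, dropped activations
    print(f"(1 - p)^({exponent:+.2f}):  "
          f"mean ratio = {scaled.mean() / h.mean():.3f},  "
          f"variance ratio = {scaled.var() / h.var():.3f}")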

SLIDE 7

Results

[Results table/figure: STL10.]

SLIDE 8

Thank you!

  • For more details, please come to our poster session: Tuesday, 06:30 - 09:00 PM, Pacific Ballroom #29.