Jumpout: Improved Dropout for Deep Neural Networks with ReLUs
Shengjie Wang*, Tianyi Zhou*, Jeff A. Bilmes
University of Washington, Seattle
Dropout has a few Drawbacks...
- Dropout encourages DNNs to apply the same linear model to different data points, but does not enforce local smoothness.
- Dropping zeros has no effect, yet zeros still count toward the drop rate.
- Dropout does not work well with BatchNorm.
- Jumpout improves dropout with three modifications, at (almost) no extra computation/memory cost.
Jumpout Modification I – Encourage Local Smoothness
- Instead of applying a constant dropout rate, the dropout rate is sampled from the positive part of a Gaussian distribution; the standard deviation controls the strength of the regularization (sketched below).
[Figure: a data point and the rows of W, comparing a monotone dropout rate to a constant dropout rate.]
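A minimal PyTorch sketch of Modification I, assuming the rate is resampled on each forward pass; the names sigma and p_max and the cap on the sampled rate are illustrative choices, not the paper's exact procedure.

import torch

def sample_halfgaussian_rate(sigma=0.05, p_max=0.6):
    # Draw a dropout rate from the positive half of N(0, sigma^2);
    # sigma controls the regularization strength. p_max is a
    # hypothetical cap keeping the rate well below 1.
    p = abs(torch.randn(1).item()) * sigma
    return min(p, p_max)

def halfgaussian_dropout(x, sigma=0.05, training=True):
    # Standard inverted dropout, but with a freshly sampled rate
    # each call instead of a constant one.
    if not training:
        return x
    p = sample_halfgaussian_rate(sigma)
    mask = (torch.rand_like(x) >= p).float()
    return x * mask / (1.0 - p)

Because the rate varies around zero, nearby inputs see many slightly different dropped sub-networks rather than one fixed expected rate, which is what pushes the learned function toward local smoothness.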
Jumpout Modification II – Better Control of Regularization
- The dropout rate is normalized by the proportion of active neurons of the input layer, so that we can better control the regularization for different layers and at different training stages (see the sketch after the figure below).
[Figure: ReLU activation portion vs. epoch for conv 1, conv 2, conv 3, and fc 1. Portion of active neurons across training epochs for different layers.]
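A minimal PyTorch sketch of Modification II, assuming x is a post-ReLU activation; the epsilon floor and the 0.9 cap are illustrative guards, not taken from the paper.

import torch

def active_fraction(x, eps=1e-6):
    # Fraction of post-ReLU activations that are strictly positive.
    return max((x > 0).float().mean().item(), eps)

def normalized_dropout(x, p=0.1, training=True):
    # Dropping zeros has no effect, so divide the target rate p by
    # the active fraction: the effective rate on the non-zero units
    # then matches p regardless of layer or training stage. The cap
    # at 0.9 is a hypothetical guard against degenerate inputs.
    if not training:
        return x
    p_eff = min(p / active_fraction(x), 0.9)
    mask = (torch.rand_like(x) >= p_eff).float()
    return x * mask / (1.0 - p_eff)

As the figure above suggests, the active fraction differs widely between layers and drifts over training, so a fixed nominal rate would otherwise impose very uneven regularization.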
Jumpout Modification III – Synergize Well with BatchNorm
- The rescaling factor for training is changed to (1 − q)^(−0.75) to account for the changes of both the mean and the variance (sketched after the figure below).
[Figure: mean ratio (left) and variance ratio (right) vs. epoch under the rescaling factors (1 − p)^(−1), (1 − p)^(−0.5), and (1 − p)^(−0.75). Changes of mean and variance when applying various rescaling factors. Blue: dropout; grey: jumpout.]
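A minimal PyTorch sketch of Modification III; the exponent alpha = 0.75 follows the slide, and the comments note why it sits between preserving the mean (alpha = 1) and the second moment (alpha = 0.5).

import torch

def jumpout_rescaled_dropout(x, p=0.1, alpha=0.75, training=True):
    # Standard inverted dropout multiplies kept units by (1 - p)**(-1),
    # which preserves the mean but inflates the second moment seen by a
    # following BatchNorm layer; (1 - p)**(-0.5) would preserve the
    # second moment but shrink the mean. alpha = 0.75 balances the two.
    if not training:
        return x
    mask = (torch.rand_like(x) >= p).float()
    return x * mask * (1.0 - p) ** (-alpha)

The arithmetic behind the comment: with keep probability 1 - p and scale s, the output mean is s(1 - p)E[x] and the second moment is s^2 (1 - p)E[x^2], so s = (1 - p)^(-1) fixes the former and s = (1 - p)^(-0.5) fixes the latter; no single s fixes both, hence the compromise.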
Results
[Results table: STL10]
Thank you!
- For more details, please come to our poster session.