

  1. Jumpout: Improved Dropout for Deep Neural Networks with ReLUs. Shengjie Wang*, Tianyi Zhou*, Jeff A. Bilmes. University of Washington, Seattle.

  2. Dropout has a few Drawbacks...
     • Dropout encourages DNNs to apply the same linear model to different data points, but does not enforce local smoothness.
     • Dropping neurons that are already zero (after ReLU) has no effect, yet such drops still count toward the drop rate.
     • Dropout does not work well with BatchNorm.

  3. Dropout has a few Drawbacks...
     • Dropout encourages DNNs to apply the same linear model to different data points, but does not enforce local smoothness.
     • Dropping neurons that are already zero (after ReLU) has no effect, yet such drops still count toward the drop rate (a numerical sketch follows this slide).
     • Dropout does not work well with BatchNorm.
     • Jumpout improves dropout with three modifications, at (almost) no extra computation or memory cost.
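The second drawback is easy to check numerically. Below is a minimal PyTorch sketch (my own illustration, not from the slides): with post-ReLU activations roughly half the units are already zero, so a nominal drop rate of 0.5 actually perturbs only about a quarter of the layer.

```python
import torch

torch.manual_seed(0)
h = torch.relu(torch.randn(1000))          # post-ReLU activations; roughly half are zero
p = 0.5                                    # nominal dropout rate
mask = (torch.rand_like(h) > p).float()    # Bernoulli keep-mask with keep probability 1 - p

dropped = mask == 0
frac_active = (h > 0).float().mean().item()
# Only drops that hit an active (non-zero) unit actually change the output.
effective = (dropped & (h > 0)).float().mean().item()
print(f"fraction of active units:       {frac_active:.2f}")
print(f"nominal drop rate:              {p:.2f}")
print(f"fraction of units truly zeroed: {effective:.2f}")   # ~ p * frac_active, well below p
```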

  4. Jumpout Modification I: Encourage Local Smoothness
     • Instead of applying a constant dropout rate, the dropout rate is sampled from the positive part of a Gaussian distribution; the standard deviation is used to control the strength of regularization.
     (Figure: constant dropout rate vs. monotone dropout rate, plotted over data points and rows of W.)
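A minimal PyTorch sketch of this idea, assuming an inverted-dropout implementation; the function names (`halfgaussian_drop_rate`, `jumpout_mod1`), the default `sigma`, and the clipping constant are my own choices, not the authors' released code.

```python
import torch

def halfgaussian_drop_rate(sigma: float, max_rate: float = 0.9) -> float:
    """Sample a drop rate from the positive half of N(0, sigma^2), clipped at max_rate."""
    return min(abs(torch.randn(()).item()) * sigma, max_rate)

def jumpout_mod1(h: torch.Tensor, sigma: float = 0.05, training: bool = True) -> torch.Tensor:
    """Dropout with a freshly sampled half-Gaussian drop rate on every forward pass."""
    if not training:
        return h
    p = halfgaussian_drop_rate(sigma)
    mask = (torch.rand_like(h) > p).float()
    return h * mask / (1.0 - p)   # standard inverted-dropout rescaling (revised in Modification III)
```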

  5. Jumpout Modification II: Better Control of Regularization
     • The dropout rate is normalized by the proportion of active neurons of the input layer, so that we can better control the regularization for different layers and for different training stages.
     (Figure: portion of active ReLU neurons across training epochs for layers conv 1, conv 2, conv 3, and fc 1.)
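A hedged sketch of this normalization, again my own approximation rather than the released implementation: the target rate is divided by the fraction of active units, so the effective perturbation stays close to the target regardless of how sparse the ReLU activations are.

```python
import torch

def jumpout_mod2(h: torch.Tensor, p_target: float = 0.25, training: bool = True) -> torch.Tensor:
    """Rescale the drop rate by the portion of active (non-zero) ReLU units."""
    if not training:
        return h
    frac_active = (h > 0).float().mean().clamp(min=1e-6).item()   # portion of active neurons
    p_adj = min(p_target / frac_active, 0.9)   # drop a larger share of active units when few are active
    mask = (torch.rand_like(h) > p_adj).float()
    return h * mask / (1.0 - p_adj)
```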

  6. Jumpout Modification III: Synergize Well with BatchNorm
     • The rescaling factor for training is changed to (1 - p)^(-0.75) to account for the changes of both the mean and the variance.
     (Figure: changes of the mean (left) and the variance (right) across epochs when applying the rescaling factors (1 - p)^(-1), (1 - p)^(-0.5), and (1 - p)^(-0.75). Blue: Dropout; Grey: Jumpout.)
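A short sketch of the revised rescaling, with the exponent taken from the figure legend; combining it with the previous two pieces into one layer is straightforward but omitted here.

```python
import torch

def jumpout_mod3(h: torch.Tensor, p: float, training: bool = True) -> torch.Tensor:
    """Dropout whose training-time rescale is (1 - p)^(-0.75) instead of (1 - p)^(-1):
    a compromise between preserving the mean (exponent -1) and the variance
    (exponent -0.5), so the layer interferes less with BatchNorm statistics."""
    if not training:
        return h
    mask = (torch.rand_like(h) > p).float()
    return h * mask * (1.0 - p) ** (-0.75)
```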

  7. Results on STL10

  8. Thank you! • For more details, please come to our poster session: Tuesday, 06:30 - 09:00 PM, Pacific Ballroom #29.
