Follow the Leader with Dropout Perturbations
Tim van Erven, COLT 2014 (PowerPoint presentation transcript)


  1. Follow the Leader with Dropout Perturbations
     Tim van Erven, COLT 2014
     Joint work with: Wojciech Kotłowski, Manfred Warmuth

  2. Neural Network

  3. Neural Network

  4. Dropout Training
     ● Stochastic gradient descent
     ● Randomly remove every hidden/input unit with probability 1/2 before each gradient-descent update [Hinton et al., 2012]

  5. Dropout Training
     ● Very successful in, e.g., image classification and speech recognition
     ● Many people are trying to analyse why it works [Wager, Wang, Liang, 2013]

  6. Prediction with Expert Advice
     ● Every round t = 1, 2, ..., T:
       1. (Randomly) choose an expert k_t ∈ {1, ..., K}
       2. Observe the expert losses ℓ_{t,1}, ..., ℓ_{t,K}
       3. Our loss is ℓ_{t,k_t}
     ● Goal: minimize the expected regret E[Σ_{t≤T} ℓ_{t,k_t}] − L*_T, where L*_T = min_k Σ_{t≤T} ℓ_{t,k} is the loss of the best expert

  7. Follow-the-Leader
     ● Deterministically choose the expert that has predicted best in the past: k_t = argmin_k Σ_{s<t} ℓ_{s,k} is the leader.
     ● Can be fooled: the regret grows linearly in T on adversarial data.
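A quick simulation makes the linear-regret claim concrete. The loss sequence below is a standard two-expert construction (my own illustration, not from the slides): after a small first-round asymmetry, the losses alternate so that deterministic FTL always follows the expert about to incur loss 1.

```python
import numpy as np

def adversarial_losses(T):
    """Loss sequence that fools deterministic FTL (2 experts)."""
    losses = np.zeros((T, 2))
    losses[0] = [0.5, 0.0]                 # round 1 breaks the symmetry
    for t in range(1, T):                  # rounds 2..T alternate
        losses[t] = [0.0, 1.0] if (t + 1) % 2 == 0 else [1.0, 0.0]
    return losses

def ftl_loss(losses):
    """Total loss of deterministic FTL (ties broken toward lower index)."""
    T, K = losses.shape
    cum = np.zeros(K)
    total = 0.0
    for t in range(T):
        leader = int(np.argmin(cum))       # follow the current leader
        total += losses[t, leader]
        cum += losses[t]
    return total

T = 100
L = adversarial_losses(T)
ftl = ftl_loss(L)                          # 0.5 + 1 per round from round 2 on
best = L.sum(axis=0).min()                 # roughly T/2
print(ftl, best, ftl - best)               # regret grows linearly in T
```

FTL incurs loss 1 in every round after the first, while each expert only pays on half the rounds, so the regret is about T/2.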

  8. Dropout Perturbations
     ● Before each round, independently set every past loss to 0 with dropout probability α (fresh draws every round).
     ● k_t = argmin_k Σ_{s<t} ℓ̃_{s,k} is the perturbed leader.
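A minimal sketch of the algorithm, assuming the dropout mask is redrawn independently at every round (function and variable names are mine):

```python
import numpy as np

def dropout_ftl(losses, alpha=0.5, seed=0):
    """Follow the perturbed leader: before each round, every past loss is
    independently set to 0 with probability alpha (fresh draws each round),
    and we follow the expert with the smallest perturbed cumulative loss."""
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    total = 0.0
    for t in range(T):
        keep = rng.random((t, K)) >= alpha       # Bernoulli(1 - alpha) mask
        perturbed_cum = (losses[:t] * keep).sum(axis=0)
        leader = int(np.argmin(perturbed_cum))   # the perturbed leader
        total += losses[t, leader]               # incur the true loss
    return total

# tiny demo on random binary losses for K = 3 experts
rng = np.random.default_rng(42)
losses = (rng.random((100, 3)) < 0.5).astype(float)
total = dropout_ftl(losses)
print(total)
```

Redrawing the perturbation every round is what makes the algorithm randomized and hard to fool, unlike plain FTL.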

  9. Dropout Perturbations for Binary Losses
     ● For losses in {0, 1} it works: the expected regret is of order √(L*_T log K) for any dropout probability α ∈ (0, 1)
     ● No tuning required!

  10. Dropout Perturbations for Binary Losses
     ● For losses in {0, 1} it works: the expected regret is of order √(L*_T log K) for any dropout probability α ∈ (0, 1)
     ● No tuning required!
     ● But it does not work for continuous losses in [0, 1]: there exist losses for which the regret grows linearly in T

  11. Binarized Dropout Perturbations: Continuous Losses
     ● The right generalization for losses in [0, 1]: replace each past loss ℓ_{s,k} by a fresh sample ℓ̃_{s,k} ∈ {0, 1} with P(ℓ̃_{s,k} = 1) = (1 − α)·ℓ_{s,k}
     ● On binary losses this coincides with ordinary dropout
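As I read the slide, the binarized perturbation replaces each past loss ℓ ∈ [0, 1] by a {0, 1} sample with mean (1 − α)·ℓ, which reduces to plain dropout when ℓ is already binary. A sketch under that assumption:

```python
import numpy as np

def binarized_dropout_cum(past_losses, alpha=0.5, rng=None):
    """Perturbed cumulative losses under binarized dropout: each past
    loss l in [0, 1] is replaced by a fresh {0, 1} sample that equals 1
    with probability (1 - alpha) * l."""
    if rng is None:
        rng = np.random.default_rng()
    samples = rng.random(past_losses.shape) < (1 - alpha) * past_losses
    return samples.sum(axis=0)

# sanity check: the perturbed mean matches (1 - alpha) * l
rng = np.random.default_rng(0)
past = np.full((10000, 1), 0.8)   # 10000 rounds, one expert, loss 0.8
cum = binarized_dropout_cum(past, alpha=0.5, rng=rng)
print(cum[0] / 10000)             # concentrates near (1 - 0.5) * 0.8 = 0.4
```

Because the sample's expectation is (1 − α)·ℓ, the perturbed cumulative loss stays an unbiased (up to the 1 − α factor, common to all experts) picture of the true cumulative loss.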

  12. Small Regret for IID Data
     ● If the loss vectors are
       – independent and identically distributed between trials,
       – with a gap between the expected loss of the best expert and the rest,
     then the regret is constant (independent of T) w.h.p.
     ● Algorithms that rely on the doubling trick (for T or L*_T) do not get this.
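The constant-regret behaviour is easy to see empirically. The simulation below (a self-contained, minimal re-implementation of the perturbed leader, with made-up Bernoulli loss means) shows the regret staying flat as T grows:

```python
import numpy as np

def dropout_ftl_regret(losses, alpha=0.5, seed=1):
    """Regret of follow-the-leader on dropout-perturbed losses."""
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    total = 0.0
    for t in range(T):
        keep = rng.random((t, K)) >= alpha    # drop each past loss w.p. alpha
        leader = int(np.argmin((losses[:t] * keep).sum(axis=0)))
        total += losses[t, leader]
    return total - losses.sum(axis=0).min()   # regret vs. the best expert

# i.i.d. {0,1} losses: expert 0 has mean 0.25, the others 0.75 (gap 0.5)
rng = np.random.default_rng(0)
losses = (rng.random((1000, 3)) < np.array([0.25, 0.75, 0.75])).astype(float)
r_short = dropout_ftl_regret(losses[:200])
r_long = dropout_ftl_regret(losses)
print(r_short, r_long)   # both small: the regret does not grow with T
```

With a gap this large, the perturbed cumulative losses separate the best expert from the rest after only a handful of rounds, so almost all of the regret is paid at the start.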

  13. Instance of Follow-the-Perturbed-Leader
     ● Follow-the-Perturbed-Leader [Kalai, Vempala, 2005]: we have data-dependent perturbations that differ between experts.
     ● Standard analysis: bound the probability of a leader change in the be-the-leader lemma.
     ● There is an elegant, simple bound for the perturbations of Kalai and Vempala, but not for ours.

  14. Related Work: RWP
     ● Random-walk perturbation [Devroye et al., 2013]: add an independent centered Bernoulli variable to every loss
     ● Equivalent to dropout (with α = 1/2) when all losses equal 1
     ● But the perturbations do not adapt to the data, so no L*_T-dependent bound

  15. Proof Outline ● Find worst-case loss sequence

  16. Proof Outline
     ● Find the worst-case loss sequence: e.g. for 3 experts with cumulative losses 1, 3 and 5

  17. Proof Outline
     ● Find the worst-case loss sequence: e.g. for 3 experts with cumulative losses 1, 3 and 5
       1. Cumulative losses approximately equal: apply the lemma from RWP roughly once per K rounds
       2. Expert 1 has a much smaller cumulative loss: Hoeffding

  18. Summary
     ● Simple algorithm: follow-the-leader on losses that are perturbed by binarized dropout
     ● No tuning necessary
     ● On any losses: regret of order √(L*_T log K)
     ● On i.i.d. loss vectors with a gap between the best expert and the rest: constant regret w.h.p.

  19. Many Open Questions
     To discuss at the poster!
     ● Can we use dropout for:
       – Tracking the best expert?
       – Combinatorial settings (e.g. online shortest path)?
     ● Need to reuse randomness between experts
     ● What about variations on the dropout perturbations?
       – Drop the whole loss vector at once?

  20. References
     ● Hinton, Srivastava, Krizhevsky, Sutskever, Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, 2012.
     ● Wager, Wang, Liang. Dropout training as adaptive regularization. NIPS, 2013.
     ● Kalai, Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.
     ● Devroye, Lugosi, Neu. Prediction by random-walk perturbation. COLT, 2013.
     ● Van Erven, Kotłowski, Warmuth. Follow the leader with dropout perturbations. COLT, 2014.
