  1. SVRE: New Method for Training GANs
     Gauthier Gidel, Mila, Université de Montréal; research intern at Element AI.
     Generative Modeling and Model-Based Reasoning for Robotics and AI Workshop, June 14, 2019.

  2. Reducing Noise in GAN Training with Variance Reduced Extragradient
     Tatjana Chavdarova*, Gauthier Gidel*, François Fleuret, Simon Lacoste-Julien (*equal contribution).

  3. Generative Adversarial Networks [Goodfellow et al., 2014]

  4.-5. Challenges
     - Standard supervised learning: a single minimization, $\min_\theta \mathcal{L}(\theta)$.
     - GANs: a harder, qualitatively different optimization problem: a minimax game.
     (Image source: Vaishnavh Nagarajan)
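For concreteness, the two problems contrasted above can be written out. The minimax objective below is the original GAN formulation of Goodfellow et al. [2014], cited on the previous slide:

```latex
% Standard supervised learning: a single minimization
\min_{\theta} \; \mathcal{L}(\theta)

% GAN training: a two-player minimax game between generator G and discriminator D
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```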

  6. "Noise": noisy gradient estimates due to stochasticity
     - The parameters are updated using sub-samples (mini-batches) of the full dataset.
     - Variance Reduced (VR) gradient methods: optimization methods that reduce such noise.
     [Figure: single-objective minimization over (θ, φ); the batch-method direction vs. the noisy stochastic-method direction.]
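As an illustration of this noise (my sketch, not from the slides, assuming a simple least-squares loss), a mini-batch gradient is an unbiased but noisy estimate of the full-batch direction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
w = np.zeros(d)

def grad(idx, w):
    # Gradient of the least-squares loss on the examples in `idx`.
    Xi, yi = X[idx], y[idx]
    return Xi.T @ (Xi @ w - yi) / len(idx)

full = grad(np.arange(n), w)             # full-batch direction
mini = grad(rng.choice(n, size=32), w)   # noisy mini-batch estimate
cos = full @ mini / (np.linalg.norm(full) * np.linalg.norm(mini))
print(f"cosine(full, mini-batch) = {cos:.3f}")  # typically < 1: the estimate is noisy
```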

  7.-9. Variance Reduction: Motivation for Games
     - Intuitively: minimization vs. game (noise from the stochastic gradient). [Figure: with noise, a minimization step still points in "approximately" the right direction, while a game step with noise can point in a "bad" direction.]
     - Empirically: BigGAN, "increased batch size significantly improves performance". Brock et al. [2018] report a 46% relative improvement in the Inception Score metric [Salimans et al., 2016] on ImageNet when the mini-batch size is increased 8-fold.
     - To sum up, two issues and the corresponding remedies:
       - the adversarial aspect of the min-max problem → extragradient;
       - the noise from the stochastic gradient → variance reduction.

  10. Extragradient

  11. Extragradient
     Two players, θ and φ. Idea: perform a "lookahead" step.
     Extrapolation:
       $\theta_{t+1/2} = \theta_t - \eta \nabla_\theta \mathcal{L}_G(\theta_t, \varphi_t)$
       $\varphi_{t+1/2} = \varphi_t - \eta \nabla_\varphi \mathcal{L}_D(\theta_t, \varphi_t)$
     Update:
       $\theta_{t+1} = \theta_t - \eta \nabla_\theta \mathcal{L}_G(\theta_{t+1/2}, \varphi_{t+1/2})$
       $\varphi_{t+1} = \varphi_t - \eta \nabla_\varphi \mathcal{L}_D(\theta_{t+1/2}, \varphi_{t+1/2})$
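A minimal sketch of this extrapolate-then-update scheme on the toy bilinear game $\min_\theta \max_\varphi \theta\varphi$ (a hypothetical example, not from the talk; simultaneous gradient steps spiral outward on this game, while extragradient converges). Here the maximizing player takes an ascent step, which is the same as descending its own loss $\mathcal{L}_D = -\theta\varphi$ in the slide's notation:

```python
eta = 0.1
theta, phi = 1.0, 1.0  # players of the toy game min_theta max_phi theta*phi

for t in range(500):
    # Extrapolation: a "lookahead" step from the current point.
    theta_h = theta - eta * phi    # grad wrt theta of theta*phi is phi
    phi_h   = phi   + eta * theta  # phi ascends: grad wrt phi is theta
    # Update: step from the ORIGINAL point using gradients at the lookahead point.
    theta, phi = theta - eta * phi_h, phi + eta * theta_h

print(theta, phi)  # both shrink toward the equilibrium (0, 0)
```

The key design point is that the update uses gradients evaluated at the extrapolated point, which anticipates the opponent's move and damps the rotation that makes simultaneous steps diverge.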

  12. Variance Reduced Gradient Methods

  13.-16. Variance Reduced Estimate of the Gradient
     Based on the finite-sum assumption: $\mathcal{L}(\omega) = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}(x_i, \omega)$.
     Epoch-based algorithm:
     - Save the full gradient $\frac{1}{n}\sum_i \nabla\mathcal{L}(x_i, \omega^S)$ and the snapshot $\omega^S$.
     - For one epoch, use the update rule
       $\omega \leftarrow \omega - \eta \Big( \underbrace{\nabla\mathcal{L}(x_i, \omega)}_{\text{stochastic gradient}} \underbrace{- \nabla\mathcal{L}(x_i, \omega^S) + \tfrac{1}{n}\textstyle\sum_j \nabla\mathcal{L}(x_j, \omega^S)}_{\text{correction using saved past iterate}} \Big)$
     - Requires two stochastic gradients per step (one at the current point, one at the snapshot).
     - If $\omega^S$ is close to $\omega$, the estimate is close to the full-batch gradient, hence has small variance.
     - The full-batch gradient is expensive but tractable, e.g., compute it once per pass over the data.
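A sketch of this epoch-based update on a least-squares problem (my illustration; the loss, step size, and epoch count are assumptions, not from the slides). The snapshot's per-example gradient is subtracted and the saved full gradient added back, so the estimate stays unbiased while its variance vanishes as $\omega$ approaches $\omega^S$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def grad_i(i, w):
    # Per-example gradient of the least-squares loss 0.5*(x_i.w - y_i)^2.
    return X[i] * (X[i] @ w - y[i])

w, eta = np.zeros(d), 0.05
for epoch in range(20):
    w_snap = w.copy()                         # snapshot omega^S
    mu = X.T @ (X @ w_snap - y) / n           # full gradient at the snapshot
    for _ in range(n):                        # one epoch of cheap steps
        i = rng.integers(n)
        # Variance-reduced estimate: stochastic gradient + correction.
        g = grad_i(i, w) - grad_i(i, w_snap) + mu
        w -= eta * g

print(np.linalg.norm(X.T @ (X @ w - y) / n))  # near zero after a few epochs
```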


  17.-22. SVRE: Variance Reduction + Extragradient
     Pseudo-algorithm:
     1. Save the snapshot $\omega^S \leftarrow \omega_t$ and compute the full gradient $\frac{1}{n}\sum_i \nabla\mathcal{L}(x_i, \omega^S)$.
     2. For $i$ in $1, \ldots,$ epoch_length:
        - Compute $\omega_{t+1/2}$ with variance-reduced gradients at $\omega_t$ (extrapolation).
        - Compute $\omega_{t+1}$ with variance-reduced gradients at $\omega_{t+1/2}$ (update).
        - $t \leftarrow t + 1$
     3. Repeat until convergence.
     SVRE yields the fastest known convergence rate in the literature for strongly convex stochastic game optimization.
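A schematic sketch of this pseudo-algorithm on a stochastic bilinear toy game (my illustration, not the authors' implementation; the game, step size, and epoch length are assumptions). Each epoch saves a snapshot and its full gradient, then runs extragradient steps with variance-reduced gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
a = 1.0 + 0.5 * rng.normal(size=n)  # game: min_theta max_phi (1/n) sum_i a_i*theta*phi

def vr_grads(i, th, ph, th_s, ph_s, mu_th, mu_ph):
    # Variance-reduced gradients: stochastic term, minus the same term
    # at the snapshot, plus the saved full-batch gradient.
    g_th = a[i] * ph - a[i] * ph_s + mu_th  # d/dtheta of a_i*theta*phi is a_i*phi
    g_ph = a[i] * th - a[i] * th_s + mu_ph  # d/dphi is a_i*theta
    return g_th, g_ph

eta, th, ph = 0.1, 1.0, 1.0
for epoch in range(50):
    th_s, ph_s = th, ph                              # 1. snapshot omega^S ...
    mu_th, mu_ph = a.mean() * ph_s, a.mean() * th_s  #    ... and its full gradient
    for _ in range(n):                               # 2. one epoch of steps
        i, j = rng.integers(n), rng.integers(n)
        g_th, g_ph = vr_grads(i, th, ph, th_s, ph_s, mu_th, mu_ph)
        th_h, ph_h = th - eta * g_th, ph + eta * g_ph  # extrapolation at omega_t
        g_th, g_ph = vr_grads(j, th_h, ph_h, th_s, ph_s, mu_th, mu_ph)
        th, ph = th - eta * g_th, ph + eta * g_ph      # update at omega_{t+1/2}

print(th, ph)  # iterates shrink toward the equilibrium (0, 0)
```

Note that the extrapolation and update steps draw independent samples (i and j), so the two variance-reduced gradients are unbiased at their respective points.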

