
Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs


  1. Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
     Timur Garipov (1, 2), Pavel Izmailov (3), Dmitrii Podoprikhin (4), Dmitry Vetrov (5), Andrew Gordon Wilson (3)
     (1) Samsung AI Center in Moscow, (2) Skolkovo Institute of Science and Technology, (3) Cornell University, (4) Samsung-HSE Laboratory, (5) National Research University Higher School of Economics
     Neural Information Processing Systems, Montreal, Canada, December 4, 2018

  2. Loss Surfaces: ResNet-164, CIFAR-100 [figure]

  3. Loss Surfaces: ResNet-164, CIFAR-100 [figure]

  4. Finding Paths between Modes
     Weights of pretrained networks: $\hat{w}_1, \hat{w}_2 \in \mathbb{R}^{|net|}$
     Define parametric curve: $\phi_\theta(\cdot) : [0, 1] \to \mathbb{R}^{|net|}$, with $\phi_\theta(0) = \hat{w}_1$ and $\phi_\theta(1) = \hat{w}_2$
     DNN loss function: $\mathcal{L}(w)$
     Minimize averaged loss w.r.t. $\theta$:
     $\min_\theta \ell(\theta) = \int_0^1 \mathcal{L}(\phi_\theta(t))\,dt = \mathbb{E}_{t \sim U(0,1)}\, \mathcal{L}(\phi_\theta(t))$

  6. Loss Surfaces: VGG-16, CIFAR-10 [figure: panels showing train loss and test error (%)]

  8. Fast Geometric Ensembles (FGE) [figure: learning rate (between α1 and α2), test error (%), and distance plotted against FGE iteration number in units of c; plot labels: "Ensemble", "75% training"]

  9. Ensembling Results [figure: test accuracy (%) vs. training budget (0 to 2B) for SSE separate, SSE ensemble, FGE separate, FGE ensemble, and a 1B model]
     SSE = Huang et al. ("Snapshot Ensembles: Train 1, Get M for Free"), ICLR 2017

  10. Summary
     Local optima are connected by simple curves.
     To find these curves, we minimize the loss in expectation over points sampled uniformly along a path from one mode to another.
     These insights motivate a fast ensembling algorithm (FGE).
     PyTorch code is released for both mode connectivity and FGE.
     Come to our poster #162!
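
The path-finding objective on slide 4 can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration and is not the authors' released PyTorch code: the 2-D toy loss with a ring of minima stands in for the DNN loss, the curve is a polygonal chain with a single bend `theta`, and the names `loss`, `phi`, and `fit_curve` are hypothetical. The sketch samples t uniformly from [0, 1] and takes a stochastic gradient step on the loss at the sampled point of the curve, which is one way to minimize the expectation form of the objective.

```python
import random

# Toy stand-in for the DNN loss L(w): zero on the unit circle, 1 at the origin.
def loss(w):
    x, y = w
    return (x * x + y * y - 1.0) ** 2

def loss_grad(w):
    x, y = w
    c = 4.0 * (x * x + y * y - 1.0)
    return (c * x, c * y)

# Two "modes" (endpoints of the curve); both have zero loss.
W1, W2 = (-1.0, 0.0), (1.0, 0.0)

def phi(theta, t):
    """Polygonal chain with one bend: phi(0) = W1, phi(0.5) = theta, phi(1) = W2."""
    if t <= 0.5:
        a, b = 1.0 - 2.0 * t, 2.0 * t      # weights on W1 and theta
        return (a * W1[0] + b * theta[0], a * W1[1] + b * theta[1])
    a, b = 2.0 * t - 1.0, 2.0 - 2.0 * t    # weights on W2 and theta
    return (a * W2[0] + b * theta[0], a * W2[1] + b * theta[1])

def dphi_dtheta(t):
    """Scalar factor of d phi / d theta (the Jacobian is this value times the identity)."""
    return 2.0 * t if t <= 0.5 else 2.0 - 2.0 * t

def mean_path_loss(theta, n=201):
    """Grid estimate of the averaged loss along the curve."""
    return sum(loss(phi(theta, i / (n - 1))) for i in range(n)) / n

def fit_curve(steps=5000, lr=0.02, seed=0):
    rng = random.Random(seed)
    # Start slightly off the straight segment; the exact midpoint is a
    # symmetric stationary point of this particular toy loss.
    theta = [0.0, 0.1]
    for _ in range(steps):
        t = rng.random()                   # t ~ U(0, 1)
        g = loss_grad(phi(theta, t))       # gradient of L at the sampled point
        s = dphi_dtheta(t)                 # chain rule through phi
        theta[0] -= lr * s * g[0]
        theta[1] -= lr * s * g[1]
    return tuple(theta)

line_loss = mean_path_loss((0.0, 0.0))     # bend at the midpoint = straight segment
theta = fit_curve()
curve_loss = mean_path_loss(theta)
```

In this toy setting the straight segment between the two modes crosses the high-loss region at the origin, while the fitted bend moves the path toward the ring of minima, so `curve_loss` ends up well below `line_loss`.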
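
The FGE slide (slide 8) shows a learning rate that cycles between α1 and α2 as a function of the iteration number measured in units of c. A minimal sketch of such a schedule follows. The slide itself gives only the plot, so the piecewise-linear (triangular) form, the snapshot rule at the middle of each cycle, and the function names `fge_learning_rate` and `is_snapshot_step` are assumptions based on the description in the accompanying paper rather than a copy of the released code.

```python
def fge_learning_rate(i, c, alpha_1, alpha_2):
    """Cyclical learning rate at iteration i (1-based) with cycle length c.

    Assumed form: linear decay from alpha_1 to alpha_2 over the first half
    of each cycle, then linear increase back to alpha_1 over the second half.
    """
    t = ((i - 1) % c + 1) / c              # position within the cycle, in (0, 1]
    if t <= 0.5:
        return (1.0 - 2.0 * t) * alpha_1 + 2.0 * t * alpha_2
    return (2.0 - 2.0 * t) * alpha_2 + (2.0 * t - 1.0) * alpha_1

def is_snapshot_step(i, c):
    """True at the middle of a cycle, where the rate reaches alpha_2 (assumes even c)."""
    return 2 * ((i - 1) % c + 1) == c

# Example: two cycles of length 4 with illustrative (not paper) values.
schedule = [fge_learning_rate(i, 4, 0.1, 0.01) for i in range(1, 9)]
snapshots = [i for i in range(1, 9) if is_snapshot_step(i, 4)]
```

Under these assumptions, one network would be added to the ensemble at each snapshot step, and the ensemble prediction would be the average of the members' predictions.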
