Learning Optimal Linear Regularizers


  1. Learning Optimal Linear Regularizers
     Matthew Streeter

  3. Setup
     ● Want to produce a model θ
     ● Will minimize training loss + regularizer: L_train(θ) + R(θ)
     ● Ultimately, we care about test loss: L_test(θ)
     ● An optimal regularizer: R(θ) = L_test(θ) - L_train(θ)
       ○ suggests that a good regularizer should upper bound the generalization gap
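
To see why R(θ) = L_test(θ) - L_train(θ) is optimal, note that the regularized objective then collapses to the test loss itself. A minimal Python sketch, assuming three hypothetical candidate models with made-up loss values (the names and numbers are illustrative, not from the talk):

```python
# Hypothetical candidates: name -> (training loss, test loss). Values are made up.
candidates = {
    "small":  (0.40, 0.45),
    "medium": (0.25, 0.35),
    "large":  (0.05, 0.60),
}

def optimal_regularizer(name):
    """R(theta) = L_test(theta) - L_train(theta), i.e. the generalization gap."""
    train, test = candidates[name]
    return test - train

def regularized_objective(name):
    """L_train(theta) + R(theta); with the optimal R this equals L_test(theta)."""
    train, _ = candidates[name]
    return train + optimal_regularizer(name)

# Minimizing training loss alone picks the model that overfits.
best_unregularized = min(candidates, key=lambda n: candidates[n][0])
# Minimizing training loss + optimal regularizer picks the best test-loss model.
best_regularized = min(candidates, key=regularized_objective)
best_by_test = min(candidates, key=lambda n: candidates[n][1])

print(best_unregularized, best_regularized, best_by_test)
```

In practice the test loss is unknown, so this R cannot be computed directly; the sketch only shows what a regularizer would ideally stand in for.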

  7. What makes a good regularizer?
     ● Want to find regularizer R that minimizes L_test(θ_R), where θ_R is the model obtained by minimizing L_train(θ) + R(θ)
     ● Approximate by maximizing over small set of models (estimating test loss using validation set)
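
One plain reading of this slide can be sketched as follows: restrict attention to a small pool of already-trained models, let the regularizer pick the pool member that minimizes L_train + R, and score the regularizer by that member's validation loss. The pool, its loss values, the `norm` feature, and the helper names are all hypothetical, and the slide's exact formulation may differ.

```python
from typing import Callable, Dict, NamedTuple

class Model(NamedTuple):
    train_loss: float
    val_loss: float   # stands in for the unknown test loss
    norm: float       # hypothetical model feature, e.g. a weight norm

# Made-up pool of trained models.
pool: Dict[str, Model] = {
    "a": Model(train_loss=0.05, val_loss=0.60, norm=9.0),
    "b": Model(train_loss=0.20, val_loss=0.30, norm=4.0),
    "c": Model(train_loss=0.40, val_loss=0.45, norm=1.0),
}

def selected_model(regularizer: Callable[[Model], float]) -> str:
    """theta_R restricted to the pool: the argmin of L_train + R."""
    return min(pool, key=lambda k: pool[k].train_loss + regularizer(pool[k]))

def score(regularizer: Callable[[Model], float]) -> float:
    """Estimated test loss of theta_R (lower is better)."""
    return pool[selected_model(regularizer)].val_loss

# Three candidate regularizers of increasing strength.
no_reg = lambda m: 0.0
weak_penalty = lambda m: 0.05 * m.norm
strong_penalty = lambda m: 0.5 * m.norm

for name, reg in [("none", no_reg), ("weak", weak_penalty), ("strong", strong_penalty)]:
    print(name, selected_model(reg), score(reg))
```

With these made-up numbers, no regularization selects the overfit model, too strong a penalty selects the underfit one, and the weak penalty selects the model with the lowest validation loss.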

  12. Learning linear regularizers
     ● Linear regularizer: R(θ) = λ * feature_vector(θ)
     ● LearnReg: given models with known training & validation loss, finds best λ (in terms of approximation on previous slide)
       ○ Solves a sequence of linear programs
       ○ Under certain assumptions, can “jump” to optimal λ given data from just 1 + |λ| models
     ● TuneReg: uses LearnReg iteratively to do hyperparameter tuning
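
The sketch below illustrates what choosing λ for a linear regularizer R(θ) = λ * feature_vector(θ) could look like on a handful of models with known training and validation loss. It is not the linear-programming procedure from the talk: as a stand-in, it brute-forces λ over a small grid and keeps the value whose selected model has the lowest validation loss. The models, the two features, and the function names are hypothetical.

```python
from itertools import product

# Made-up models: (training loss, validation loss, feature vector).
# The two features are hypothetical, e.g. an L1 norm and a squared L2 norm.
models = [
    (0.05, 0.60, (8.0, 9.0)),
    (0.15, 0.28, (3.0, 5.0)),
    (0.20, 0.30, (5.0, 2.0)),
    (0.40, 0.45, (1.0, 1.0)),
]

def linear_regularizer(lam, features):
    """R(theta) = lam * feature_vector(theta), computed as a dot product."""
    return sum(l * f for l, f in zip(lam, features))

def selected_index(lam):
    """Index of the model minimizing L_train + R among the known models."""
    return min(
        range(len(models)),
        key=lambda i: models[i][0] + linear_regularizer(lam, models[i][2]),
    )

def grid_search_lambda(grid):
    """Brute-force stand-in for LearnReg: best lam on a 2-D grid by validation loss."""
    return min(
        product(grid, repeat=2),
        key=lambda lam: models[selected_index(lam)][1],
    )

grid = [0.0, 0.01, 0.02, 0.05, 0.1]
best_lam = grid_search_lambda(grid)
best_model = models[selected_index(best_lam)]
print(best_lam, best_model)
```

A grid grows exponentially with the number of regularization hyperparameters, which is one reason a linear-programming formulation such as the one the slide mentions is attractive.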

  15. Hyperparameter tuning experiment
     ● Inception-v3 transfer learning problem, linear combination of 4 regularizers
     [Figure annotation: LearnReg kicks in here]
