SLIDE 1

Optimizer Benchmarking Needs to Account for Hyperparameter Tuning

Prabhu Teja S*¹,², Florian Mai*¹,², Thijs Vogels², Martin Jaggi², François Fleuret¹,²

¹Idiap Research Institute, ²EPFL, Switzerland. *Equal contribution. {prabhu.teja, florian.mai}@idiap.ch

SLIDE 2

The problem of optimizer evaluation

Figure: Expected loss L(θ) as a function of hyperparameter θ for two optimizers A & B, with their optima θ⋆_A and θ⋆_B marked. Which one do we prefer in practice?

SLIDE 3

The problem of optimizer evaluation

Figure: Expected loss L(θ) as a function of hyperparameter θ for two optimizers A & B, with their optima θ⋆_A and θ⋆_B marked. Which one do we prefer in practice?

  • 1. The absolute performance of the optimizer → L(θ⋆_A), L(θ⋆_B)

  • 2. The difficulty of finding a good hyperparameter configuration ≈ θ⋆_A, θ⋆_B

SLIDE 4

The Problem of Optimizer Evaluation: SGD vs Adam

  • 1. In previous literature, SGD often achieves better peak performance than Adam.

  • 2. We take into account the cost of automatic Hyperparameter Optimization (HPO), and find:

Figure: Probability of being the best optimizer as a function of the hyperparameter optimization budget (# models trained, 10–60), for Adam (only l.r. tuned), Adam (all params. tuned), SGD (tuned l.r., fixed mom. and w.d.), and SGD (l.r. schedule tuned, fixed mom. and w.d.); reported shares are 58%, 17%, 13%, and 12%.

Our method eliminates human biases arising from manual hyperparameter tuning.

SLIDE 5

Revisiting the notion of an Optimizer

Definition: An optimizer is a pair M = (UΘ, pΘ) that applies its update rule U(St; Θ) at each step t, depending on its current state St. Its hyperparameters Θ = (θ1, . . . , θN) have a prior probability distribution pΘ : Θ → ℝ. The prior pΘ should be specified by the optimizer designer; e.g., for Adam, ε > 0 and close to 0 ⟹ ε ∼ Log-uniform(−8, 0).
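To make the definition concrete, here is a minimal sketch (not from the slides) of how a prior pΘ could be encoded with scipy.stats distributions; the ranges below are illustrative placeholders, except for ε ∼ Log-uniform(−8, 0), which follows the slide's example.

```python
# Minimal sketch of an optimizer prior p_Theta; ranges are illustrative,
# only epsilon follows the slide's Log-uniform(-8, 0) example (base-10 exponents).
from scipy.stats import loguniform

adam_prior = {
    "learning_rate": loguniform(1e-4, 1e-1),      # placeholder range
    "one_minus_beta1": loguniform(1e-5, 1e-1),    # so beta1 stays close to 1
    "one_minus_beta2": loguniform(1e-5, 1e-1),
    "epsilon": loguniform(1e-8, 1e0),             # Log-uniform(-8, 0)
}

def sample_configuration(prior, rng=None):
    """Draw one hyperparameter configuration Theta ~ p_Theta."""
    return {name: dist.rvs(random_state=rng) for name, dist in prior.items()}
```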

SLIDE 6

HPO-aware optimizer benchmarking

Algorithm 1: Benchmark with 'expected quality at budget'
Input: optimizer O, cross-task hyperparameter prior pΘ, task T, tuning budget B
Initialize list ← [ ]
for R repetitions do
    Perform random search with budget B:
        S ← sample B elements from pΘ
        list ← [best(S), . . . list]
return mean(list), var(list), or other statistics
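A short Python sketch of Algorithm 1, under the assumption of a hypothetical train_and_evaluate(task, config) routine that trains one model and returns a score to maximise:

```python
import statistics

def expected_quality_at_budget(sample_config, train_and_evaluate, task,
                               budget, repetitions=50):
    """Algorithm 1: expected quality of the best model found by random
    search with `budget` trials, averaged over `repetitions` searches."""
    best_scores = []
    for _ in range(repetitions):
        # one random search: draw `budget` configurations from p_Theta
        scores = [train_and_evaluate(task, sample_config())
                  for _ in range(budget)]
        best_scores.append(max(scores))
    # mean(list), var(list) as in the algorithm
    return statistics.mean(best_scores), statistics.variance(best_scores)
```

In practice one would presumably train each sampled configuration only once and reuse its score across repetitions (bootstrapping over a fixed pool of trained models) rather than retraining inside every repetition.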

SLIDE 7

Calibrated task-independent priors pΘ

Optimizer   Tunable parameters   Cross-task prior
SGD         Learning rate        ??
            Momentum             ??
            Weight decay         ??
            Poly decay (p)       ??
Adagrad     Learning rate        ??
Adam        Learning rate        ??
            β1, β2               ??
            ε                    ??

SLIDE 8

Calibrated task-independent priors pΘ

Optimizer   Tunable parameters   Cross-task prior
SGD         Learning rate        ??
            Momentum             ??
            Weight decay         ??
            Poly decay (p)       ??
Adagrad     Learning rate        ??
Adam        Learning rate        ??
            β1, β2               ??
            ε                    ??

1. Sample a large number of points, and their performance, from a large range of admissible values.
2. Compute a Maximum Likelihood Estimate (MLE) of the prior's parameters using the top 20% best-performing values from the previous step.

SLIDE 9

Calibrated task-independent priors pΘ

Optimizer   Tunable parameters   Cross-task prior
SGD         Learning rate        Log-normal(−2.09, 1.312)
            Momentum             U[0, 1]
            Weight decay         Log-uniform(−5, −1)
            Poly decay (p)       U[0.5, 5]
Adagrad     Learning rate        Log-normal(−2.004, 1.20)
Adam        Learning rate        Log-normal(−2.69, 1.42)
            β1, β2               1 − Log-uniform(−5, −1)
            ε                    Log-uniform(−8, 0)

1. Sample a large number of points, and their performance, from a large range of admissible values.
2. Compute a Maximum Likelihood Estimate (MLE) of the prior's parameters using the top 20% best-performing values from the previous step.
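As an illustration of the calibration step (an implementation assumption, not code from the paper), fitting a log-normal prior to the top 20% of sampled values could look like this:

```python
import numpy as np
from scipy.stats import lognorm

def calibrate_lognormal_prior(values, scores, top_fraction=0.2):
    """Fit a log-normal prior to the best-performing sampled values.

    `values` are hyperparameter samples from a wide admissible range,
    `scores` their measured performance (higher is better)."""
    values, scores = np.asarray(values), np.asarray(scores)
    k = max(1, int(top_fraction * len(values)))
    top = values[np.argsort(scores)[-k:]]                # keep the top 20%
    mu, sigma = np.log(top).mean(), np.log(top).std()    # MLE for a log-normal
    return lognorm(s=sigma, scale=np.exp(mu))            # scipy's parameterisation
```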

SLIDE 10

The importance of Recipes

Optimizer label   Tunable parameters
SGD-MCWC          SGD(γ, µ=0.9, λ=10⁻⁵)
SGD-MCD           SGD(γ, µ=0.9, λ=10⁻⁵) + Poly Decay(p)
SGD-MW            SGD(γ, µ, λ)
Adam-LR           Adam(γ, β1=0.9, β2=0.999, ε=10⁻⁸)
Adam              Adam(γ, β1, β2, ε)

SGD(γ, µ, λ) denotes SGD with learning rate γ, momentum µ, and weight decay coefficient λ. Adagrad(γ) is Adagrad with learning rate γ. Adam(γ, β1, β2, ε) is Adam with learning rate γ, momentum parameters β1, β2, and normalization parameter ε.
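One possible way to encode these recipes for the benchmark (a hypothetical data layout, not from the paper): each label records which hyperparameters are drawn from the cross-task prior and which are held fixed.

```python
# Hypothetical encoding of the recipes above: "tuned" parameters are sampled
# from the cross-task prior, "fixed" parameters use the listed defaults.
RECIPES = {
    "SGD-MCWC": {"tuned": ["learning_rate"],
                 "fixed": {"momentum": 0.9, "weight_decay": 1e-5}},
    "SGD-MCD":  {"tuned": ["learning_rate", "poly_decay_p"],
                 "fixed": {"momentum": 0.9, "weight_decay": 1e-5}},
    "SGD-MW":   {"tuned": ["learning_rate", "momentum", "weight_decay"],
                 "fixed": {}},
    "Adam-LR":  {"tuned": ["learning_rate"],
                 "fixed": {"beta1": 0.9, "beta2": 0.999, "epsilon": 1e-8}},
    "Adam":     {"tuned": ["learning_rate", "beta1", "beta2", "epsilon"],
                 "fixed": {}},
}
```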

SLIDE 11

Performance at a budget

Figure: Test accuracy of Adam-LR, Adam, SGD-MCWC, SGD-MW, and SGD-MCD on CIFAR-10 and on IMDb LSTM, at hyperparameter search budgets of 1, 4, 16, and 64.

SLIDE 12

Summarizing our findings

Figure: Aggregated relative performance vs. # hyperparameter configurations (budget) for Adam, Adam-LR, SGD-MCWC, and SGD-Decay.

Summary statistic: S(o, k) = (1/|P|) Σ_{p∈P} o(k, p) / max_{o′∈O} o′(k, p),

where o(k, p) denotes the expected performance of optimizer o ∈ O on test problem p ∈ P after k iterations of hyperparameter search.
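A small sketch of how S(o, k) could be computed, assuming a nested mapping perf[o][p][k] (a hypothetical layout) holding the expected performance of optimizer o on problem p at budget k:

```python
def aggregated_relative_performance(perf, optimizer, k):
    """S(o, k): mean over test problems of o's expected performance at budget k,
    normalised by the best optimizer on each problem."""
    problems = next(iter(perf.values())).keys()   # assume all optimizers share P
    ratios = [perf[optimizer][p][k] / max(perf[o][p][k] for o in perf)
              for p in problems]
    return sum(ratios) / len(ratios)
```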

SLIDE 13

Our findings

  • 1. Support the hypothesis that adaptive gradient methods are easier to tune than non-adaptive methods.

The substantial value of adaptive gradient methods, specifically Adam, lies in their amenability to hyperparameter search.

SLIDE 14

Our findings

  • 1. Support the hypothesis that adaptive gradient methods are easier to tune than non-adaptive methods.

The substantial value of adaptive gradient methods, specifically Adam, lies in their amenability to hyperparameter search.

  • 2. Tuning optimizers' hyperparameters apart from the learning rate becomes more useful as the available tuning budget goes up.

Even with a relatively large tuning budget, tuning only the learning rate of Adam is the safer choice, as it achieves good results with high probability.

SLIDE 15

THANK YOU
