Learning Optimal Linear Regularizers (Matthew Streeter) - PowerPoint PPT Presentation

Sep 20, 2022

Learning Optimal Linear Regularizers Matthew Streeter
Setup
● Want to produce a model θ
● Will minimize training loss + regularizer: L_train(θ) + R(θ)
● Ultimately, we care about test loss: L_test(θ)
● An optimal regularizer: R(θ) = L_test(θ) - L_train(θ)
○ suggests that a good regularizer should upper bound the generalization gap
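A short worked derivation may help show why this choice of R is optimal. It uses only the slide's symbols; R* is a label introduced here for the optimal regularizer, and the second line spells out one reading of "upper bound the generalization gap" (an assumption about the intended meaning, not a statement from the talk).

```latex
\begin{align*}
L_{\mathrm{train}}(\theta) + R^{*}(\theta)
  &= L_{\mathrm{train}}(\theta)
     + \bigl(L_{\mathrm{test}}(\theta) - L_{\mathrm{train}}(\theta)\bigr)
   = L_{\mathrm{test}}(\theta), \\
R(\theta) \ge L_{\mathrm{test}}(\theta) - L_{\mathrm{train}}(\theta)
  \;\; \forall \theta
  &\implies
  L_{\mathrm{train}}(\theta) + R(\theta) \ge L_{\mathrm{test}}(\theta)
  \;\; \forall \theta .
\end{align*}
```

With R*, minimizing the regularized training objective is the same as minimizing test loss. With any R that upper bounds the gap, the regularized objective upper bounds test loss, and the bound is tighter the closer R is to the gap.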
What makes a good regularizer?
● Want to find the regularizer R that minimizes L_test(θ_R), where θ_R is the model obtained by minimizing L_train(θ) + R(θ)
● Approximate by searching only over a small set of models (estimating test loss using a validation set)
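The approximation above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names and the example numbers are hypothetical, and the regularizer is represented simply by its value on each model in the small set.

```python
from typing import Sequence


def select_model(train_loss: Sequence[float], reg_values: Sequence[float]) -> int:
    """Index of the model that minimizes train_loss + R over the small set."""
    return min(range(len(train_loss)), key=lambda i: train_loss[i] + reg_values[i])


def regularizer_quality(
    train_loss: Sequence[float],
    val_loss: Sequence[float],
    reg_values: Sequence[float],
) -> float:
    """Validation loss (a proxy for test loss) of the model that R selects."""
    return val_loss[select_model(train_loss, reg_values)]


# Hypothetical numbers for three models (not taken from the talk).
train = [0.10, 0.20, 0.30]
val = [0.50, 0.35, 0.40]

no_reg = [0.0, 0.0, 0.0]      # R = 0: the lowest training loss wins
some_reg = [0.40, 0.10, 0.05]  # an R that penalizes the first model most

quality_no_reg = regularizer_quality(train, val, no_reg)
quality_some_reg = regularizer_quality(train, val, some_reg)
```

A lower returned value means the regularizer steers selection toward a model that generalizes better on the validation set.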
Learning linear regularizers
● Linear regularizer: R(θ) = λ · feature_vector(θ)
● LearnReg: given models with known training and validation loss, finds the best λ (in terms of the approximation on the previous slide)
○ Solves a sequence of linear programs
○ Under certain assumptions, can "jump" to the optimal λ given data from just 1 + |λ| models
● TuneReg: uses LearnReg iteratively to do hyperparameter tuning
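As a rough illustration of a linear regularizer and of fitting λ from observed models, here is a minimal Python sketch. It is not the LearnReg procedure: it assumes a single positive feature per model, in which case the tightest λ satisfying λ · q_i ≥ (validation loss - training loss) for every model can be read off directly without an LP solver. All names and numbers are hypothetical.

```python
from typing import Sequence


def linear_regularizer(lam: Sequence[float], features: Sequence[float]) -> float:
    """R(theta) = lam . feature_vector(theta), written as a plain dot product."""
    return sum(l * f for l, f in zip(lam, features))


def tightest_upper_bound_1d(gaps: Sequence[float], features: Sequence[float]) -> float:
    """Smallest lam >= 0 with lam * q_i >= gap_i for every observed model.

    Assumes one feature per model and q_i > 0. Minimizing total slack
    sum(lam * q_i - gap_i) under these constraints gives the smallest
    feasible lam, which is max_i(gap_i / q_i), clipped at zero.
    """
    if any(q <= 0 for q in features):
        raise ValueError("this sketch assumes strictly positive features")
    return max(0.0, max(g / q for g, q in zip(gaps, features)))


# Hypothetical data for three models (not taken from the talk).
train = [0.10, 0.20, 0.30]
val = [0.50, 0.35, 0.40]
feature = [4.0, 1.0, 1.0]  # e.g. a squared-norm feature per model

gaps = [v - t for v, t in zip(val, train)]
lam = tightest_upper_bound_1d(gaps, feature)
reg_values = [linear_regularizer([lam], [q]) for q in feature]
```

With several features the same objective and constraints form a genuine linear program in λ, which would need an LP solver.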
Hyperparameter tuning experiment
● Inception-v3 transfer learning problem, linear combination of 4 regularizers
[Figure: experiment results plot, with an annotation "LearnReg kicks in here"]
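The iterative tuning loop can be sketched as follows. This is a minimal sketch under loud assumptions: `pick_lambda` is a grid-search stand-in for LearnReg (it does not use linear programming), `toy_train` is a hypothetical one-parameter problem with a closed-form solution, and none of the names come from the talk. The loop starts from a few initial λ values, then repeatedly fits a new λ from all models seen so far and trains one more model with it.

```python
from typing import Callable, List, Sequence, Tuple

# One record per trained model: (lambda used, train loss, validation loss, feature).
Record = Tuple[float, float, float, float]


def pick_lambda(records: Sequence[Record], candidates: Sequence[float]) -> float:
    """Stand-in for LearnReg (assumption): grid search instead of an LP.

    For each candidate lambda, find the observed model minimizing
    train_loss + lambda * feature, and return the candidate whose selected
    model has the lowest validation loss (the first candidate wins ties).
    """
    def selected_val_loss(lam: float) -> float:
        best = min(records, key=lambda r: r[1] + lam * r[3])
        return best[2]

    return min(candidates, key=selected_val_loss)


def tune_reg(
    train_fn: Callable[[float], Tuple[float, float, float]],
    initial_lambdas: Sequence[float],
    candidates: Sequence[float],
    rounds: int,
) -> Tuple[Record, List[Record]]:
    """Train initial models, then alternate between fitting lambda and training."""
    records: List[Record] = [(lam, *train_fn(lam)) for lam in initial_lambdas]
    for _ in range(rounds):
        lam = pick_lambda(records, candidates)
        records.append((lam, *train_fn(lam)))
    best = min(records, key=lambda r: r[2])  # lowest validation loss seen
    return best, records


def toy_train(lam: float) -> Tuple[float, float, float]:
    """Hypothetical problem: minimize (theta - 1)^2 + lam * theta^2 in closed form."""
    theta = 1.0 / (1.0 + lam)
    train_loss = (theta - 1.0) ** 2
    val_loss = (theta - 0.5) ** 2  # validation optimum sits at theta = 0.5
    return train_loss, val_loss, theta ** 2


best, history = tune_reg(
    toy_train,
    initial_lambdas=[0.0, 4.0],
    candidates=[0.0, 0.25, 0.5, 1.0, 2.0, 4.0],
    rounds=2,
)
```

Passing the training routine and the λ-fitting rule as arguments keeps the loop independent of how either is implemented, so an LP-based fit could replace `pick_lambda` without changing `tune_reg`.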

