

SLIDE 1

Empirical Risk Minimization

October 29, 2015

SLIDE 2

Outline

  • Empirical risk minimization view

– Perceptron
– CRF

SLIDE 3

Notation for Linear Models

  • Training data: {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}
  • Testing data: {(x_{N+1}, y_{N+1}), …, (x_{N+N′}, y_{N+N′})}
  • Feature function: g
  • Weights: w
  • Decoding: ŷ = argmax_y w · g(x, y)  (sketched in code after this list)
  • Learning: choosing w using the training data
  • Evaluation: comparing predictions ŷ to the gold y on the testing data
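
As a concrete reading of this notation, here is a minimal sketch of decoding, assuming a toy problem whose output space is small enough to enumerate; w, g, x, and the candidate set are all illustrative placeholders, not anything prescribed by the slides.

```python
import numpy as np

def decode(w, g, x, candidates):
    """Decoding: return the y maximizing the linear score w . g(x, y)."""
    scores = [np.dot(w, g(x, y)) for y in candidates]
    return candidates[int(np.argmax(scores))]
```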
SLIDE 4

Structured Perceptron

  • Described as an online algorithm.
  • On each iteration, take one example and update the weights according to:
        w ← w + g(x_i, y_i) − g(x_i, ŷ_i),  where ŷ_i = argmax_y w · g(x_i, y)
    (a runnable sketch of this update follows this list)
  • Not discussing today: the theoretical guarantees this gives, separability, and the averaged and voted versions.
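
A minimal sketch of this update in code, reusing the toy enumeration assumption from above; a real structured perceptron would replace the enumeration with a dynamic-programming decoder, and all names here are illustrative.

```python
import numpy as np

def perceptron_epoch(w, data, g, candidates):
    """One online pass of the structured perceptron.

    data: list of (x, y) pairs with hashable labels y;
    g(x, y): feature vector; candidates(x): enumerable output space.
    """
    for x, y in data:
        # Decode with the current weights.
        y_hat = max(candidates(x), key=lambda yc: np.dot(w, g(x, yc)))
        if y_hat != y:
            # On a mistake, move toward the gold features and away
            # from the features of the wrong prediction.
            w = w + g(x, y) - g(x, y_hat)
    return w
```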

SLIDE 5

Empirical Risk Minimization

  • A unifying framework for many learning algorithms: choose w to minimize
        Σ_{i=1}^{N} L(x_i, y_i, w) + R(w)
  • Many options for the loss function L and the regularization function R. (A code sketch of the objective follows below.)
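
In code, the objective is just a sum; the sketch below assumes loss(x, y, w) is any per-example loss L and uses the squared L2 norm (scaled by lam) as one illustrative choice of R.

```python
import numpy as np

def empirical_risk(w, data, loss, lam=0.1):
    """Regularized empirical risk: sum_i L(x_i, y_i, w) + R(w)."""
    return sum(loss(x, y, w) for x, y in data) + lam * np.dot(w, w)
```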

SLIDE 6

Solving the Minimization Problem

  • In some friendly cases, there is a closed-form solution for the minimizing w.
– E.g., the maximum likelihood estimator for HMMs.
  • Usually, we have to use an iterative algorithm which amounts to progressively finding better versions of w (a generic sketch follows below).
– This involves hard/soft inference with each improved value of w, on either part or all of the training set.
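
A generic sketch of such an iterative algorithm, assuming risk_grad(w) returns a (sub)gradient of the regularized empirical risk; the eta0/sqrt(t+1) step size is one common schedule, not something the slides prescribe.

```python
import numpy as np

def subgradient_descent(w0, risk_grad, steps=100, eta0=0.1):
    """Progressively find better versions of w.

    Each evaluation of risk_grad typically requires hard/soft
    inference with the current w, as noted above.
    """
    w = w0.copy()
    for t in range(steps):
        w = w - (eta0 / np.sqrt(t + 1)) * risk_grad(w)
    return w
```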

SLIDE 7

Loss Functions You May Know

Name                           Expression of L(x_i, y_i, w)
Log loss (joint)               −log p_w(x_i, y_i)
Log loss (conditional)         −log p_w(y_i | x_i)
Zero-one loss                  1{ŷ(x_i) ≠ y_i}
Expected zero-one loss         Σ_y p_w(y | x_i) · 1{y ≠ y_i}
SLIDE 9

Loss Functions You May Know

Name                           Expression of L(x_i, y_i, w)
Log loss (joint)               −log p_w(x_i, y_i)
Log loss (conditional)         −log p_w(y_i | x_i)
Cost                           cost(x_i, y_i, ŷ(x_i))
Expected cost, a.k.a. “risk”   Σ_y p_w(y | x_i) · cost(x_i, y_i, y)

SLIDE 10

CRFs and Loss

  • Plugging in the log-linear form (and not worrying at this level about locality of features):
        p_w(y | x) = exp(w · g(x, y)) / Σ_y′ exp(w · g(x, y′))
    the conditional log loss becomes:
        L(x_i, y_i, w) = −w · g(x_i, y_i) + log Σ_y exp(w · g(x_i, y))
    (sketched in code below)
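
A sketch of this loss in code, again enumerating a small output space for clarity; real CRFs compute the log-partition term with dynamic programming over local features. The max is subtracted before exponentiating for numerical stability.

```python
import numpy as np

def crf_log_loss(w, g, x, y, candidates):
    """Conditional log loss: -w.g(x, y) + log sum_y' exp(w.g(x, y'))."""
    scores = np.array([np.dot(w, g(x, yc)) for yc in candidates])
    m = scores.max()  # stabilize the log-sum-exp
    log_z = m + np.log(np.exp(scores - m).sum())
    return -np.dot(w, g(x, y)) + log_z
```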

SLIDE 12

Training CRFs and Other Linear Models

  • Early days: iterative scaling (a specialized method for log-linear models only)
  • ~2002: quasi-Newton methods
– (using L-BFGS, which dates from the late 1980s; see the sketch after this list)
  • ~2006: stochastic gradient descent
  • ~2010: adaptive gradient methods
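
For the quasi-Newton entry, SciPy's L-BFGS implementation is one readily available option; the sketch below assumes objective(w) returns the regularized loss and its gradient as a pair, and the function names are placeholders rather than a specific toolkit's API.

```python
import numpy as np
from scipy.optimize import minimize

def train_lbfgs(objective, dim):
    """Batch training with L-BFGS; objective maps w to (loss, gradient)."""
    result = minimize(objective, np.zeros(dim), jac=True, method="L-BFGS-B")
    return result.x
```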
SLIDE 13

Perceptron and Loss

  • Not immediately clear what L is, but the “gradient” of L should be:
        g(x_i, ŷ_i) − g(x_i, y_i),  where ŷ_i = argmax_y w · g(x_i, y)
  • The vector of the above quantities is actually a subgradient of:
        L(x_i, y_i, w) = −w · g(x_i, y_i) + max_y w · g(x_i, y)

SLIDE 14

Compare

  • CRF (log loss):  −w · g(x_i, y_i) + log Σ_y exp(w · g(x_i, y))
  • Perceptron:      −w · g(x_i, y_i) + max_y w · g(x_i, y)

The only difference is how competitor scores are aggregated: log-sum-exp (a soft max) versus a hard max; see the side-by-side sketch below.
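
A hedged sketch of the comparison, reusing the toy enumeration from the earlier examples:

```python
import numpy as np

def compare_losses(w, g, x, y, candidates):
    """Return (CRF log loss, perceptron loss) for one example."""
    scores = np.array([np.dot(w, g(x, yc)) for yc in candidates])
    gold = np.dot(w, g(x, y))
    m = scores.max()
    log_sum_exp = m + np.log(np.exp(scores - m).sum())  # "soft" max
    return -gold + log_sum_exp, -gold + scores.max()    # hard max
```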

SLIDE 15

Loss Functions

SLIDE 16

Loss Functions You Know

Name                           Expression of L(x_i, y_i, w)              Convex?
Log loss (joint)               −log p_w(x_i, y_i)                        yes
Log loss (conditional)         −log p_w(y_i | x_i)                       yes
Cost                           cost(x_i, y_i, ŷ(x_i))                    no
Expected cost, a.k.a. “risk”   Σ_y p_w(y | x_i) · cost(x_i, y_i, y)      no
Perceptron loss                −w · g(x_i, y_i) + max_y w · g(x_i, y)    yes


SLIDE 17

Loss Functions You Know

Name                           Expression of L(x_i, y_i, w)              Continuous?
Log loss (joint)               −log p_w(x_i, y_i)                        yes
Log loss (conditional)         −log p_w(y_i | x_i)                       yes
Cost                           cost(x_i, y_i, ŷ(x_i))                    no
Expected cost, a.k.a. “risk”   Σ_y p_w(y | x_i) · cost(x_i, y_i, y)      yes
Perceptron loss                −w · g(x_i, y_i) + max_y w · g(x_i, y)    yes


SLIDE 18

Loss Functions You Know

Name                           Expression of L(x_i, y_i, w)              Cost-aware?
Log loss (joint)               −log p_w(x_i, y_i)                        no
Log loss (conditional)         −log p_w(y_i | x_i)                       no
Cost                           cost(x_i, y_i, ŷ(x_i))                    yes
Expected cost, a.k.a. “risk”   Σ_y p_w(y | x_i) · cost(x_i, y_i, y)      yes
Perceptron loss                −w · g(x_i, y_i) + max_y w · g(x_i, y)    no


SLIDE 19

The Ideal Loss Function

For computational convenience:

  • Convex
  • Continuous

For good performance:

  • Cost-aware
  • Theoretically sound
SLIDE 20

On Regularization

  • In principle, this choice is independent of the choice of the loss function.
  • The squared L2 norm, R(w) = λ‖w‖₂², is the most common starting place.
  • L1 regularization, R(w) = λ‖w‖₁, and other sparsity-inducing regularizers, as well as structured regularizers, are also of interest. (A sketch of both appears below.)
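A minimal sketch of the two regularizers named above, with lam playing the role of λ:

```python
import numpy as np

def l2_reg(w, lam):
    """Squared L2 norm: the most common starting place."""
    return lam * np.dot(w, w)

def l1_reg(w, lam):
    """L1 norm: a sparsity-inducing alternative."""
    return lam * np.abs(w).sum()
```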

SLIDE 21

Practical Advice

  • Features are still more important than the loss function.
– But general, easy-to-implement algorithms are quite useful!
  • The perceptron is easiest to implement.
  • CRFs and max-margin techniques usually do better.
  • Tune the regularization constant, λ.
– Never on the test data.