SLIDE 1

INTRODUCTION TO LEARNING

REGULARIZATION METHODS FOR HIGH DIMENSIONAL LEARNING

Francesca Odone and Lorenzo Rosasco
odone@disi.unige.it - lrosasco@mit.edu

June 6, 2011

SLIDE 2

DIFFERENT PROBLEMS IN SUPERVISED LEARNING

In supervised learning we are given a set of input-output pairs (x1, y1), . . . , (xn, yn) that we call a training set.

• Classification: a learning problem with output values taken from a finite unordered set C = {C1, . . . , Ck}. A special case is binary classification, where yi ∈ {−1, 1}.
• Regression: a learning problem whose output values are real, yi ∈ R.
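As a concrete illustration of the two settings (a minimal sketch; the arrays are made up for demonstration):

```python
import numpy as np

# Binary classification: feature vectors with labels y_i in {-1, 1}
X_clf = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.2]])
y_clf = np.array([1, -1, 1])

# Regression: the same kind of inputs, but real-valued outputs y_i
X_reg = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.2]])
y_reg = np.array([0.7, -2.3, 1.1])
```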

SLIDE 3

LEARNING IS INFERENCE

PREDICTIVITY OR GENERALIZATION
Given the data, the goal is to learn how to make decisions/predictions about future data, i.e. data not belonging to the training set. The problem is: avoid overfitting!

SLIDE 4

PREDICTIVITY

Among many possible solutions, how can we choose one that correctly applies to previously unseen data?

SLIDE 8

THE ROLE OF PROBABILITY

In supervised learning we consider the relationship between input and output. The relationship can be stochastic, or deterministic with stochastic noise. If it is entirely unpredictable, no learning takes place (we are not about to learn how to predict lotto numbers!).

SLIDE 9

DATA GENERATED BY A PROBABILITY DISTRIBUTION

We assume that X and Y are two sets of random variables. We consider a training set S = {(x1, y1), . . . , (xn, yn)}, a set of independent, identically distributed samples drawn from the probability distribution on X × Y. The joint and conditional probabilities are related by

p(x, y) = p(y|x) p(x)

p(x, y) is fixed but unknown.
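A training set of this kind can be simulated by first drawing x from the marginal p(x) and then y from the conditional p(y|x). A minimal sketch, where the choice of p(x) (uniform) and p(y|x) (a noisy sine) is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_training_set(n):
    """Draw n i.i.d. pairs from p(x, y) = p(y|x) p(x)."""
    x = rng.uniform(0.0, 1.0, size=n)                          # x_i ~ p(x)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, size=n)   # y_i ~ p(y|x_i)
    return x, y

x_train, y_train = sample_training_set(20)
```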

SLIDE 10

NOISE ...

[Figure: the conditional distribution p(y|x) over Y for a fixed x ∈ X]

The same x can generate different y (according to p(y|x)):
• the underlying process is deterministic, but there is noise in the measurement of y;
• the underlying process is not deterministic;
• the underlying process is deterministic, but only incomplete information is available.

SLIDE 11

...AND SAMPLING

[Figure: a sample of input points x drawn from the marginal p(x)]

EVEN IN A NOISE FREE CASE WE HAVE TO DEAL WITH SAMPLING

The marginal distribution p(x) might model:
• errors in the location of the input points;
• discretization error for a given grid;
• presence or absence of certain input instances.

SLIDE 15

HYPOTHESIS SPACE

Predictivity is a trade-off between the information provided by the training data and the complexity of the solution we are looking for.
The hypothesis space H is the space of functions where we look for our solution.
Supervised learning uses the training data to learn a function f ∈ H, f : X → Y, that can be applied to previously unseen data: ypred = f(xnew)

SLIDE 16

LOSS FUNCTIONS

How do we choose a “good” f ∈ H?

LOSS FUNCTION
In order to measure the goodness of our function f we use a non-negative function V called the loss function. In general, V(f(x), y) denotes the price to pay for associating the prediction f(x) to x when the true output is y.

SLIDE 17

LOSS FUNCTIONS FOR REGRESSION

• The most common is the square loss, or L2 loss: V(f(x), y) = (f(x) − y)²
• Absolute value, or L1 loss: V(f(x), y) = |f(x) − y|
• Vapnik's ε-insensitive loss: V(f(x), y) = (|f(x) − y| − ε)+
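These three losses are direct to implement; a minimal sketch (function names are illustrative):

```python
import numpy as np

def square_loss(fx, y):
    """L2 loss: (f(x) - y)^2."""
    return (fx - y) ** 2

def absolute_loss(fx, y):
    """L1 loss: |f(x) - y|."""
    return np.abs(fx - y)

def eps_insensitive_loss(fx, y, eps=0.1):
    """Vapnik's epsilon-insensitive loss: (|f(x) - y| - eps)_+;
    deviations smaller than eps cost nothing."""
    return np.maximum(np.abs(fx - y) - eps, 0.0)
```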

SLIDE 18

LOSS FUNCTIONS FOR (BINARY) CLASSIFICATION

• The most intuitive one, the 0−1 loss: V(f(x), y) = θ(−y f(x)) (θ is the step function)
• The more tractable hinge loss: V(f(x), y) = (1 − y f(x))+
• And again the square loss, or L2 loss: V(f(x), y) = (f(x) − y)²
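For binary labels y ∈ {−1, 1} these can be sketched as follows (the value assigned at y f(x) = 0 depends on the convention chosen for the step function θ):

```python
import numpy as np

def zero_one_loss(fx, y):
    """0-1 loss: theta(-y * f(x)); here the boundary case counts as an error."""
    return (y * fx <= 0).astype(float)

def hinge_loss(fx, y):
    """Hinge loss: (1 - y * f(x))_+, a convex surrogate of the 0-1 loss."""
    return np.maximum(1.0 - y * fx, 0.0)
```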

SLIDE 20

LEARNING ALGORITHM

LEARNING ALGORITHM
If Z = X × Y, a learning algorithm is a map L : Z^n → H that looks at the training set S and selects from H a function fS : X → Y such that fS(x) ∼ y in a generalizing way.

SLIDE 21

WHAT WE HAVE SEEN SO FAR

We are considering:
• an input space X and an output space Y ⊂ R
• an unknown probability distribution on the product space Z = X × Y: p(x, y)
• a training set of n samples drawn i.i.d. from p: S = {(x1, y1), . . . , (xn, yn)}
• a hypothesis space H, that is, a space of functions f : X → Y
• a learning algorithm, that is, a map L : Z^n → H selecting from H a function fS such that fS(x) ∼ y in a predictive way

SLIDE 22

LEARNING AS RISK MINIMIZATION

Learning means to produce a hypothesis making the expected error or true error small.

Expected error:

I[f] = ∫_{X×Y} V(f(x), y) p(x, y) dx dy

We would like to obtain fH = arg min_{f∈H} I[f]. If the probability density is known, then learning is easy! Unfortunately it is usually fixed but unknown. What we do have is the training set S.
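Although p(x, y) is unknown in practice, when we can sample from it the integral can be approximated by Monte Carlo; a minimal sketch, reusing the illustrative noisy-sine distribution from above with the square loss:

```python
import numpy as np

rng = np.random.default_rng(1)

def expected_risk_mc(f, n_mc=100_000):
    """Monte Carlo estimate of I[f] = E[(f(x) - y)^2] under the
    illustrative distribution (uniform x, noisy sine for y)."""
    x = rng.uniform(0.0, 1.0, size=n_mc)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, size=n_mc)
    return np.mean((f(x) - y) ** 2)

# The regression function itself attains I[f] ~ noise variance (0.01):
print(expected_risk_mc(lambda x: np.sin(2 * np.pi * x)))
```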

SLIDE 23

EMPIRICAL RISK MINIMIZATION (ERM)

Given a loss function V = V(y, f(x)), we define the empirical risk Iemp[f, S] as

Iemp[f, S] = (1/n) Σ_{i=1}^{n} V(f(xi), yi)

ERM PRINCIPLE
The Empirical Risk Minimization (ERM) principle chooses the function fS ∈ H according to the following:

fS = arg min_{f∈H} Iemp[f, S]
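As a toy instance of ERM (a sketch, not an algorithm from the lecture): take H to be the polynomials of degree at most d with the square loss, so that the empirical risk minimizer is given by a least-squares fit:

```python
import numpy as np

def erm_polynomial(x, y, degree):
    """ERM over H = {polynomials of degree <= degree} with the square loss:
    the minimizer of the empirical risk is the least-squares polynomial."""
    f_S = np.poly1d(np.polyfit(x, y, degree))
    emp_risk = np.mean((f_S(x) - y) ** 2)
    return f_S, emp_risk
```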

SLIDE 24

GOOD QUALITIES OF A SOLUTION

For a solution to be useful in the context of learning it must:
• generalize
• be stable (well posed)

SLIDE 25

REMINDER

ILL-POSED PROBLEM
A mathematical problem is well posed in the sense of Hadamard if:
• the solution exists
• the solution is unique
• the solution depends continuously on the data
If a problem is not well posed, it is called ill posed.

SLIDE 26

REMINDER

CONVERGENCE IN PROBABILITY
Let {Xn} be a sequence of bounded random variables. Then

lim_{n→∞} Xn = X  in probability

if, for all ε > 0,

lim_{n→∞} P{|Xn − X| ≥ ε} = 0

SLIDE 27

CONSISTENCY AND GENERALIZATION

A desirable property for fS is consistency:

lim_{n→∞} I[fS] = I[fH]

that is, the expected error of the learned function must converge to the best expected error attainable in H. Consistency guarantees generalization.

SLIDE 28

EMPIRICAL RISK AND GENERALIZATION

The ERM principle approximates I[f] with the empirical risk Iemp[f, S]. How well? We know that the empirical risk converges in probability to the expected risk as the number of examples grows (law of large numbers). That is, for a given f,

lim_{n→∞} Iemp[f, S] = I[f]

ERM and consistency:

lim_{n→∞} min_{f∈H} Iemp[f, S] = min_{f∈H} I[f]

It can be shown that this property holds only under appropriate conditions on the hypothesis space.
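The first limit is easy to check numerically; a minimal sketch under the same illustrative noisy-sine distribution, for a fixed f:

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(2 * np.pi * x)   # a fixed function, chosen for illustration

for n in [10, 100, 10_000, 1_000_000]:
    x = rng.uniform(0.0, 1.0, size=n)
    y = f(x) + rng.normal(0.0, 0.1, size=n)
    print(n, np.mean((f(x) - y) ** 2))   # empirical risk -> I[f] = 0.01
```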

SLIDE 29

ERM AND ILL-POSEDNESS

Ill-posed problems often arise if one tries to infer general laws from few data:
• the hypothesis space is too large
• there are not enough data

In general ERM leads to ill-posed solutions because the solution:
• may be too complex
• may not be unique
• may change radically when leaving one sample out
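The instability is easy to exhibit; a sketch building on the toy polynomial ERM above (the sizes are arbitrary choices): fit a high-degree polynomial, refit after dropping one sample, and compare the two solutions on new inputs:

```python
import numpy as np

rng = np.random.default_rng(3)
n, degree = 12, 10                      # nearly as many parameters as points
x = rng.uniform(0.0, 1.0, size=n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, size=n)

f_all = np.poly1d(np.polyfit(x, y, degree))           # ERM on all n samples
f_loo = np.poly1d(np.polyfit(x[1:], y[1:], degree))   # one sample left out

x_new = np.linspace(0.0, 1.0, 5)
print(np.abs(f_all(x_new) - f_loo(x_new)))   # typically wildly different
```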

SLIDE 30

THE ERM, GENERALIZATION AND STABILITY

For a solution to be useful in the context of learning it must:
• generalize
• exist, be unique, and be stable (well-posedness)

Well-posedness is important in terms of stability of the solution. In the case of ERM:
• for a fixed training set S, the ERM principle does not in general exhibit generalization
• it often leads to ill-posed solutions

SLIDE 31

REGULARIZATION APPROACH

• Appropriate choices of the hypothesis space guarantee generalization and stability
• The basic idea of regularization is to restore well-posedness and generalization of ERM by constraining the hypothesis space
• Regularization was originally introduced by Tikhonov (1943) in the context of inverse problems

SLIDE 32

IVANOV REGULARIZATION

The direct way consists of minimizing the empirical error while requiring that f lie in a predefined subset of H, a ball of radius A:

IVANOV

min_{f∈H} (1/n) Σ_{i=1}^{n} V(f(xi), yi)   subject to   ||f||²_H ≤ A

where ||f||_H is the norm in the function space H
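For linear least squares the Ivanov problem can be solved explicitly (a sketch under illustrative assumptions): if the unconstrained least-squares solution already satisfies ||w||² ≤ A we are done; otherwise the constraint is active and the solution has the form (XᵀX + nλI)⁻¹Xᵀy for the λ > 0 at which ||w(λ)||² = A, found here by bisection:

```python
import numpy as np

def ivanov_least_squares(X, y, A, tol=1e-10):
    """Minimize (1/n) * ||X @ w - y||^2 subject to ||w||^2 <= A."""
    n, d = X.shape
    w_ls = np.linalg.lstsq(X, y, rcond=None)[0]
    if w_ls @ w_ls <= A:
        return w_ls                                  # constraint inactive

    def w_of(lam):                                   # Tikhonov solution at lam
        return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

    lo, hi = 0.0, 1.0
    while w_of(hi) @ w_of(hi) > A:                   # bracket the boundary
        hi *= 2.0
    while hi - lo > tol:                             # ||w(lam)||^2 decreases in lam
        lam = 0.5 * (lo + hi)
        if w_of(lam) @ w_of(lam) > A:
            lo = lam
        else:
            hi = lam
    return w_of(hi)
```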

SLIDE 33

TIKHONOV REGULARIZATION

An indirect way, a relaxation of the Ivanov formulation (its Lagrangian form):

TIKHONOV

arg min_{f∈H} (1/n) Σ_{i=1}^{n} V(f(xi), yi) + λ||f||²_H

where ||f||_H is the norm in the function space H
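With the square loss and a linear hypothesis space this is ridge regression, whose minimizer has a closed form; a minimal sketch:

```python
import numpy as np

def tikhonov_least_squares(X, y, lam):
    """argmin_w (1/n) * ||X @ w - y||^2 + lam * ||w||^2.
    Setting the gradient to zero gives (X.T X + n*lam*I) w = X.T y."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)
```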

SLIDE 34

REGULARIZATION APPROACH

More in general, a possible way is considering a penalized ERM:

ERR(f) + λ PEN(f)

• λ is a regularization parameter trading off between the two terms
• λ needs to be tuned (this is done by prior knowledge or, more often in learning, using the available data)
• This scheme is general: by using different loss functions we obtain different algorithms
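Tuning λ from the data is typically done on held-out samples; a minimal sketch (the grid and split ratio are arbitrary choices, and it reuses the ridge solver sketched above):

```python
import numpy as np

def choose_lambda(X, y, lambdas, val_fraction=0.3, seed=0):
    """Pick the lambda with the lowest square loss on a held-out split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_val = int(val_fraction * len(y))
    val, tr = idx[:n_val], idx[n_val:]
    best_lam, best_err = None, np.inf
    for lam in lambdas:
        w = tikhonov_least_squares(X[tr], y[tr], lam)   # solver from the sketch above
        err = np.mean((X[val] @ w - y[val]) ** 2)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```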

SLIDE 35

OCCAM’S RAZOR

All things being equal, the simplest solution tends to be the best one (attributed to William of Ockham, 1285 - 1349). For regularization this principle can be interpreted as: choose the penalty term to measure the complexity of H.

SLIDE 36

REGULARIZATION APPROACH

More in general, a possible way is considering a penalized ERM:

ERR(f) + λ PEN(f)

λ is a regularization parameter trading off between the two terms. The penalization term should include some notion of smoothness of f.

The main point is understanding how to choose a norm that encodes a notion of smoothness of the solution. ... we need to formalize this a bit more.

SLIDE 37

APPENDIX: ERROR ANALYSIS

We start by defining the target space T, the space of functions that is assumed to contain the “true” function minimizing the risk:

f0 = arg min_{f∈T} I[f]

We may assume for simplicity that I[f0] = 0.
• approximation error: I[fH] − I[f0]
• generalization error: I[fS] − I[f0]
• sample or estimation error: I[fS] − I[fH]

The error is the sum of the sample error and the approximation error:

I[fS] − I[f0] = (I[fS] − I[fH]) + (I[fH] − I[f0])

A large H leads to a small approximation error; a small H leads to a small sample error.
