

SLIDE 1

RegML 2020 Class 1 Statistical Learning Theory

Lorenzo Rosasco UNIGE-MIT-IIT

SLIDE 2

All starts with DATA

◮ Supervised: $\{(x_1, y_1), \dots, (x_n, y_n)\}$,
◮ Unsupervised: $\{x_1, \dots, x_m\}$,
◮ Semi-supervised: $\{(x_1, y_1), \dots, (x_n, y_n)\} \cup \{x_1, \dots, x_m\}$.

SLIDE 3

Learning from examples

SLIDE 4

Setting for the supervised learning problem

◮ $X \times Y$ probability space, with measure $\rho$.
◮ $S_n = (x_1, y_1), \dots, (x_n, y_n) \sim \rho^n$, i.e. sampled i.i.d.
◮ $L : Y \times Y \to [0, \infty)$, measurable loss function.
◮ Expected risk: $\mathcal{E}(f) = \int_{X \times Y} L(y, f(x)) \, d\rho(x, y)$.

Problem: solve
$$\min_{f : X \to Y} \mathcal{E}(f),$$
given only $S_n$ ($\rho$ fixed, but unknown).
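The expected risk cannot be evaluated, since $\rho$ is unknown; all an algorithm can work with is the empirical average over $S_n$. A minimal numpy sketch of this substitution (the sampling distribution, the target $\sin(\pi x)$, and the square loss are illustrative assumptions, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem: rho is known here only so that we can sample from it.
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(n)

def empirical_risk(f, x, y, loss=lambda y, fx: (y - fx) ** 2):
    """Empirical counterpart of E(f) = integral of L(y, f(x)) d rho(x, y)."""
    return np.mean(loss(y, f(x)))

print(empirical_risk(lambda t: np.sin(np.pi * t), x, y))  # close to the noise variance 0.01
print(empirical_risk(lambda t: np.zeros_like(t), x, y))   # much larger: a poor predictor
```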

SLIDE 5

Data space

◮ $X$: input space
◮ $Y$: output space

SLIDE 6

Input space

$X$ input space:
◮ linear spaces, e.g.
  – vectors,
  – functions,
  – matrices/operators;
◮ “structured” spaces, e.g.
  – strings,
  – probability distributions,
  – graphs.

SLIDE 7

Output space

$Y$ output space:
◮ linear spaces, e.g.
  – $Y = \mathbb{R}$, regression,
  – $Y = \mathbb{R}^T$, multi-task regression,
  – $Y$ Hilbert space, functional regression;
◮ “structured” spaces, e.g.
  – $Y = \{+1, -1\}$, classification,
  – $Y = \{1, \dots, T\}$, multi-class classification,
  – strings,
  – probability distributions,
  – graphs.

SLIDE 8

Probability distribution

Reflects uncertainty and stochasticity of the learning problem:
$$\rho(x, y) = \rho_X(x)\, \rho(y|x),$$
◮ $\rho_X$ marginal distribution on $X$,
◮ $\rho(y|x)$ conditional distribution on $Y$ given $x \in X$.

SLIDE 9

Conditional distribution and noise

[Figure: samples $(x_1, y_1), \dots, (x_5, y_5)$ scattered around the graph of a target function $f_*$]

Regression model:
$$y_i = f_*(x_i) + \epsilon_i,$$
◮ $f_* : X \to Y$ a fixed function,
◮ $\epsilon_1, \dots, \epsilon_n$ zero-mean random variables,
◮ $x_1, \dots, x_n$ random.
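The noise model is easy to instantiate. A minimal sketch of sampling from it (the specific $f_*$, the input distribution, and the Gaussian noise are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def f_star(x):
    """Fixed (but, to the learner, unknown) target function; an arbitrary choice."""
    return x ** 3 - x

n = 50
x = rng.uniform(-1.5, 1.5, size=n)    # random inputs x_1, ..., x_n
eps = 0.2 * rng.standard_normal(n)    # zero-mean noise eps_1, ..., eps_n
y = f_star(x) + eps                   # observations y_i = f*(x_i) + eps_i
```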

SLIDE 10

Conditional distribution and misclassification

Classification: $\rho(y|x) = \{\rho(1|x), \rho(-1|x)\}$.

[Figure: $\rho(1|x)$ and $\rho(-1|x)$ as functions of $x$; the classes overlap where both are bounded away from 0 and 1]

Noise in classification is the overlap between the classes:
$$\Delta_t = \big\{ x \in X : |\rho(1|x) - \rho(-1|x)| \le t \big\}.$$
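For intuition, the noise region $\Delta_t$ can be computed in a toy one-dimensional model; here two equiprobable classes with Gaussian class-conditional densities (an assumption for illustration, not from the slides):

```python
import numpy as np
from scipy.stats import norm

# Toy model: equal class priors, Gaussian class-conditional densities.
p_plus, p_minus = norm(loc=1.0), norm(loc=-1.0)

x = np.linspace(-4.0, 4.0, 801)
# Bayes' rule with equal priors gives the posterior difference rho(1|x) - rho(-1|x).
diff = (p_plus.pdf(x) - p_minus.pdf(x)) / (p_plus.pdf(x) + p_minus.pdf(x))

t = 0.25
delta_t = x[np.abs(diff) <= t]       # grid points falling in Delta_t
print(delta_t.min(), delta_t.max())  # the overlap is concentrated around x = 0
```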
SLIDE 11

Marginal distribution and sampling

$\rho_X$ takes into account uneven sampling of the input space.

SLIDE 12

Marginal distribution, densities and manifolds

$$p(x) = \frac{d\rho_X(x)}{dx} \quad\to\quad p(x) = \frac{d\rho_X(x)}{d\mathrm{vol}(x)},$$

[Figure: samples drawn from densities supported on and around a low-dimensional manifold]

SLIDE 13

Loss functions

$L : Y \times Y \to [0, \infty)$,
◮ the cost of predicting $f(x)$ in place of $y$,
◮ part of the problem definition: $\mathcal{E}(f) = \int_{X \times Y} L(y, f(x)) \, d\rho(x, y)$,
◮ measures the pointwise error.

SLIDE 14

Losses for regression

$L(y, y') = L(y - y')$:
◮ Square loss: $L(y, y') = (y - y')^2$,
◮ Absolute loss: $L(y, y') = |y - y'|$,
◮ $\epsilon$-insensitive: $L(y, y') = \max(|y - y'| - \epsilon, 0)$.

[Figure: square loss, absolute loss and $\epsilon$-insensitive loss as functions of $y - y'$]
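These formulas translate directly into code; a short sketch (the tube width $\epsilon = 0.1$ is an arbitrary default):

```python
import numpy as np

def square_loss(y, y_pred):
    return (y - y_pred) ** 2

def absolute_loss(y, y_pred):
    return np.abs(y - y_pred)

def eps_insensitive_loss(y, y_pred, eps=0.1):
    # Zero inside the eps-tube around y, linear outside it.
    return np.maximum(np.abs(y - y_pred) - eps, 0.0)
```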
SLIDE 15

Losses for classification

$L(y, y') = L(-yy')$:
◮ 0-1 loss: $L(y, y') = \mathbf{1}_{\{-yy' > 0\}}$,
◮ Square loss: $L(y, y') = (1 - yy')^2$,
◮ Hinge loss: $L(y, y') = \max(1 - yy', 0)$,
◮ Logistic loss: $L(y, y') = \log(1 + \exp(-yy'))$.

[Figure: 0-1 loss, square loss, hinge loss and logistic loss as functions of the margin $yy'$]
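The classification losses are functions of the margin $yy'$ alone; a direct transcription of the formulas above (np.log1p is used for the numerics of $\log(1 + \exp(-yy'))$):

```python
import numpy as np

def zero_one_loss(y, y_pred):
    # 1 exactly when -y * y' > 0, i.e. when the margin is negative.
    return (y * y_pred < 0).astype(float)

def square_loss(y, y_pred):
    return (1.0 - y * y_pred) ** 2

def hinge_loss(y, y_pred):
    return np.maximum(1.0 - y * y_pred, 0.0)

def logistic_loss(y, y_pred):
    return np.log1p(np.exp(-y * y_pred))
```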

SLIDE 16

Losses for structured prediction

Losses are specific to each learning task, e.g.:
◮ Multi-class: square loss, weighted square loss, logistic loss, . . .
◮ Multi-task: weighted square loss, absolute loss, . . .
◮ . . .

SLIDE 17

Expected risk

$$\mathcal{E}(f) = \mathbb{E}[L(y, f(x))] = \int_{X \times Y} L(y, f(x)) \, d\rho(x, y);$$
note that $f \in \mathcal{F}$, where $\mathcal{F} = \{f : X \to Y \mid f \text{ measurable}\}$.

Example: $Y = \{-1, +1\}$, $L(y, f(x)) = \mathbf{1}_{\{-yf(x) > 0\}}$, then
$$\mathcal{E}(f) = P(\{(x, y) \in X \times Y \mid f(x) \neq y\}).$$

SLIDE 18

Target function

$$f_\rho = \operatorname*{arg\,min}_{f \in \mathcal{F}} \mathcal{E}(f)$$

can be derived for many loss functions...

SLIDE 19

Target functions in regression

◮ Square loss: $f_\rho(x) = \int_Y y \, d\rho(y|x)$.
◮ Absolute loss: $f_\rho(x) = \operatorname{median} \rho(y|x)$, where $\operatorname{median} p(\cdot) = y$ s.t.
$$\int_{-\infty}^{y} dp(t) = \int_{y}^{+\infty} dp(t).$$
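Both identities can be checked numerically: over a skewed sample, the empirical square risk of a constant prediction is minimized near the sample mean, the empirical absolute risk near the sample median (a sketch; the exponential sample is an arbitrary choice with mean $1$ and median $\log 2$):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=2000)       # skewed, so mean != median

c = np.linspace(0.0, 3.0, 1001)                 # candidate constant predictions
square_risk = np.mean((y[:, None] - c) ** 2, axis=0)
absolute_risk = np.mean(np.abs(y[:, None] - c), axis=0)

print(c[square_risk.argmin()], y.mean())        # both close to 1.0 (the mean)
print(c[absolute_risk.argmin()], np.median(y))  # both close to log 2 ~ 0.69 (the median)
```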

SLIDE 20

Target functions in classification

◮ 0-1 loss: $f_\rho(x) = \operatorname{sign}(\rho(1|x) - \rho(-1|x))$
◮ Square loss: $f_\rho(x) = \rho(1|x) - \rho(-1|x)$
◮ Logistic loss: $f_\rho(x) = \log \dfrac{\rho(1|x)}{\rho(-1|x)}$
◮ Hinge loss: $f_\rho(x) = \operatorname{sign}(\rho(1|x) - \rho(-1|x))$
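The square-loss entry, for instance, follows from the usual conditional-mean computation; spelled out for completeness (a standard one-line derivation, not on the slide):

```latex
\[
\operatorname*{arg\,min}_{a \in \mathbb{R}} \int_Y (y - a)^2 \, d\rho(y|x)
  = \int_Y y \, d\rho(y|x)
  = (+1)\,\rho(1|x) + (-1)\,\rho(-1|x)
  = \rho(1|x) - \rho(-1|x).
\]
```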

SLIDE 21

Learning algorithms

$$S_n \to f_n = f_{S_n}$$

$f_n$ estimates $f_\rho$ given the observed examples $S_n$. How to measure the error of an estimator?

SLIDE 22

Excess risk

Excess risk:
$$\mathcal{E}(f_n) - \inf_{f \in \mathcal{F}} \mathcal{E}(f)$$

Consistency: for any $\epsilon > 0$,
$$\lim_{n \to \infty} P\Big( \mathcal{E}(f_n) - \inf_{f \in \mathcal{F}} \mathcal{E}(f) > \epsilon \Big) = 0.$$

SLIDE 23

Tail bounds, sample complexity and error bound

◮ Tail bounds: for any $\epsilon > 0$, $n \in \mathbb{N}$,
$$P\Big( \mathcal{E}(f_n) - \inf_{f \in \mathcal{F}} \mathcal{E}(f) > \epsilon \Big) \le \delta(n, \mathcal{F}, \epsilon).$$

◮ Sample complexity: for any $\epsilon > 0$, $\delta \in (0, 1]$, when $n \ge n_0(\epsilon, \delta, \mathcal{F})$,
$$P\Big( \mathcal{E}(f_n) - \inf_{f \in \mathcal{F}} \mathcal{E}(f) > \epsilon \Big) \le \delta.$$

◮ Error bounds: for any $\delta \in (0, 1]$, $n \in \mathbb{N}$, with probability at least $1 - \delta$,
$$\mathcal{E}(f_n) - \inf_{f \in \mathcal{F}} \mathcal{E}(f) \le \epsilon(n, \mathcal{F}, \delta).$$
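The three statements package the same tail behavior: inverting $\delta(n, \mathcal{F}, \epsilon)$ with respect to $\epsilon$ yields an error bound, inverting it with respect to $n$ yields a sample complexity. As a concrete instance (a standard Hoeffding-plus-union-bound result for empirical risk minimization over a finite class with a loss bounded in $[0, 1]$; stated as an illustration, not taken from the slides):

```latex
\[
\mathcal{E}(f_n) - \inf_{f \in \mathcal{H}} \mathcal{E}(f)
  \;\le\; 2\sqrt{\frac{\log(2|\mathcal{H}|/\delta)}{2n}}
  \qquad \text{with probability at least } 1 - \delta .
\]
```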

SLIDE 24

Error bounds and no free-lunch theorem

Theorem. For any $f_n$, there exists a problem for which
$$\mathbb{E}\Big[ \mathcal{E}(f_n) - \inf_{f \in \mathcal{F}} \mathcal{E}(f) \Big] > 0.$$

SLIDE 25

No free-lunch theorem continued

Theorem. For any $f_n$, there exists a $\rho$ such that
$$\mathbb{E}\Big[ \mathcal{E}(f_n) - \inf_{f \in \mathcal{F}} \mathcal{E}(f) \Big] > 0.$$

Remedy: restrict $\mathcal{F} \to \mathcal{H}$, a hypothesis space.

SLIDE 26

Hypothesis space

$\mathcal{H} \subset \mathcal{F}$. E.g. $X = \mathbb{R}^d$,
$$\mathcal{H} = \Big\{ f(x) = \langle w, x \rangle = \sum_{j=1}^{d} w_j x_j \;\Big|\; w \in \mathbb{R}^d,\ \forall x \in X \Big\},$$
then $\mathcal{H} \simeq \mathbb{R}^d$.

SLIDE 27

Finite dictionaries

$$D = \{\phi_i : X \to \mathbb{R} \mid i = 1, \dots, p\}$$
$$\mathcal{H} = \Big\{ f(x) = \sum_{j=1}^{p} w_j \phi_j(x) \;\Big|\; w_1, \dots, w_p \in \mathbb{R},\ \forall x \in X \Big\}$$
$$f(x) = w^\top \Phi(x), \qquad \Phi(x) = (\phi_1(x), \dots, \phi_p(x))$$
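A minimal numpy sketch of a finite dictionary (monomials, an arbitrary choice) with empirical risk minimization for the square loss, which reduces to least squares on the features $\Phi(x)$; the data-generating target is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

p = 4
def feature_map(x):
    """Phi(x) = (phi_1(x), ..., phi_p(x)) with phi_j(x) = x**j (monomial dictionary)."""
    return np.stack([x ** j for j in range(1, p + 1)], axis=1)

# Training data; the target and the noise level are assumptions for illustration.
n = 100
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(n)

Phi = feature_map(x)                            # n x p design matrix
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)     # ERM for the square loss over H

f = lambda t: feature_map(t) @ w                # f(x) = w^T Phi(x)
print(np.mean((f(x) - y) ** 2))                 # empirical square risk on S_n
```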

SLIDE 28

This class

Learning theory ingredients:
◮ Data space/distribution
◮ Loss function, risks and target functions
◮ Learning algorithms and error estimates
◮ Hypothesis space

SLIDE 29

Next class

◮ Regularized learning algorithm: penalization
◮ Statistics and computations
◮ Nonparametrics and kernels
