

  1. Regularization for Multi-Output Learning Lorenzo Rosasco 9.520

  2. About this class Goal In many practical problems, it is convenient to model the object of interest as a function with multiple outputs. In machine learning, this problem typically goes under the name of multi-task or multi-output learning. We present some concepts and algorithms to solve this kind of problem.

  3. Plan Examples and Set-up; Tikhonov Regularization for Multiple Output Learning; Regularizers and Kernels; Vector Fields; Multiclass; Conclusions.

  4. Customer Modeling The goal is to model the buying preferences of several people based on their previous purchases. Borrowing strength: people with similar tastes tend to buy similar items, so their buying histories are related. The idea is then to predict the preferences of all individuals simultaneously by solving a multi-output learning problem. Each consumer is modeled as a task, and their previous purchases form the corresponding training set.

  5. Multi-task Learning We are given T scalar tasks. For each task $j = 1, \dots, T$, we are given a set of examples $S_j = \{(x_i^j, y_i^j)\}_{i=1}^{n_j}$ sampled i.i.d. according to a distribution $P_j$. The goal is to find $f_j(x) \sim y$, for $j = 1, \dots, T$.
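
To make the setup concrete, here is a minimal sketch (not from the slides) that generates T toy regression tasks sharing a common underlying function; the data-generating process and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 4                                  # number of tasks
tasks = []
for j in range(T):
    n_j = rng.integers(10, 20)         # each task may have its own n_j
    x = rng.uniform(0, 2 * np.pi, n_j)
    # shared component plus a small task-specific one, plus noise
    y = np.sin(x) + 0.2 * j * np.cos(x) + 0.1 * rng.standard_normal(n_j)
    tasks.append((x, y))               # S_j, drawn i.i.d. from a task distribution P_j
```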

  6. Multi-task Learning [Figure: two panels, Task 1 and Task 2, each plotting the output Y against the input X.]

  7. Pharmacological Data Blood concentration of a medicine across different times. Each task is a patient. [Figure: single-task vs. multi-task estimates for four patients; red dots are test points and black dots are training points. (Pics from Pillonetto et al. 08)]

  8. Names and Applications Related problems: conjoint analysis, transfer learning, collaborative filtering, co-kriging. Examples of applications: geophysics; music recommendation (Dinuzzo 08); pharmacological data (Pillonetto et al. 08); binding data (Jacob et al. 08); movie recommendation (Abernethy et al. 08); HIV therapy screening (Bickel et al. 08).

  9. Multi-task Learning: Remarks The framework is very general. The input spaces can be different. The output spaces can be different. The hypothesis spaces can be different.

  10. How Can We Design an Algorithm? In all the above problems one can hope to improve performance by exploiting the relations among the different outputs. A possible way to do this is penalized empirical risk minimization: $\min_{f_1,\dots,f_T} ERR[f_1,\dots,f_T] + \lambda\, PEN(f_1,\dots,f_T)$. Typically, the error term is the sum of the empirical risks, and the penalty term enforces similarity among the tasks.

  11. Error Term We are going to choose the square loss to measure errors: $ERR[f_1,\dots,f_T] = \sum_{j=1}^{T} \frac{1}{n_j} \sum_{i=1}^{n_j} (y_i^j - f_j(x_i^j))^2$
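
A direct transcription of this error term, assuming each task's data is stored as a pair of arrays as in the earlier sketch; `err`, `f`, and `tasks` are illustrative names.

```python
import numpy as np

def err(f, tasks):
    """Sum over tasks of the per-task empirical risks under the square loss."""
    total = 0.0
    for f_j, (x_j, y_j) in zip(f, tasks):
        total += np.mean((y_j - f_j(x_j)) ** 2)   # (1/n_j) * sum_i (y_i^j - f_j(x_i^j))^2
    return total

# e.g. err([np.sin] * len(tasks), tasks) with `tasks` from the earlier sketch
```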

  12. MTL Let $f_j : X \to \mathbb{R}$, $j = 1,\dots,T$; then $ERR[f_1,\dots,f_T] = \sum_{j=1}^{T} I_{S_j}[f_j]$ with $I_S[f] = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2$

  13. Building Regularizers We assume that the input, output, and hypothesis spaces are the same, i.e. $X_j = X$, $Y_j = Y$, and $\mathcal{H}_j = \mathcal{H}$ for all $j = 1,\dots,T$. We also assume $\mathcal{H}$ to be a RKHS with kernel $K$.

  14. Regularizers: Mixed Effect For each component/task the solution is the same function plus a component/task-specific component: $PEN(f_1,\dots,f_T) = \lambda \sum_{j=1}^{T} \|f_j\|_K^2 + \gamma \sum_{j=1}^{T} \big\| f_j - \frac{1}{T}\sum_{s=1}^{T} f_s \big\|_K^2$
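
One way to evaluate this penalty numerically is to represent every task as a kernel expansion over a common set of points, $f_j = \sum_i C_{j,i} K(\cdot, x_i)$, so that RKHS norms become quadratic forms in the Gram matrix G. A minimal sketch under this assumption (the names `C`, `G`, `lam`, `gam` are illustrative):

```python
import numpy as np

def mixed_effect_penalty(C, G, lam, gam):
    """lam * sum_j ||f_j||_K^2 + gam * sum_j ||f_j - (1/T) sum_s f_s||_K^2,
    for f_j = sum_i C[j, i] K(., x_i), so that ||f||_K^2 = c^T G c."""
    D = C - C.mean(axis=0)                        # coefficients of f_j minus the mean function
    pen = lam * np.einsum('ji,ik,jk->', C, G, C)  # sum_j c_j^T G c_j
    pen += gam * np.einsum('ji,ik,jk->', D, G, D)
    return pen
```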

  15. Regularizers: Graph Regularization We can define a regularizer that, in addition to a standard regularization on the single components, forces stronger or weaker similarity through a $T \times T$ positive weight matrix $M$: $PEN(f_1,\dots,f_T) = \gamma \sum_{\ell,q=1}^{T} \|f_\ell - f_q\|_K^2 M_{\ell q} + \lambda \sum_{\ell=1}^{T} \|f_\ell\|_K^2 M_{\ell\ell}$
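
The same quadratic-form trick applies here. A sketch assuming the shared-expansion representation above, with `M` a symmetric nonnegative weight matrix; norms expand as $\|f_\ell - f_q\|^2 = \langle f_\ell,f_\ell\rangle - 2\langle f_\ell,f_q\rangle + \langle f_q,f_q\rangle$:

```python
import numpy as np

def graph_penalty(C, G, M, lam, gam):
    """gam * sum_{l,q} ||f_l - f_q||_K^2 M[l,q] + lam * sum_l ||f_l||_K^2 M[l,l]."""
    S = C @ G @ C.T                     # S[l, q] = <f_l, f_q>_K
    T = M.shape[0]
    pen = 0.0
    for l in range(T):
        for q in range(T):
            pen += gam * M[l, q] * (S[l, l] - 2 * S[l, q] + S[q, q])
    pen += lam * sum(M[l, l] * S[l, l] for l in range(T))
    return pen
```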

  16. Regularizers: Cluster The components/tasks are partitioned into c clusters: components in the same cluster should be similar. Let $m_r$, $r = 1,\dots,c$, be the cardinality of each cluster and $I(r)$, $r = 1,\dots,c$, be the index set of the components that belong to cluster $r$: $PEN(f_1,\dots,f_T) = \gamma \sum_{r=1}^{c} \sum_{l \in I(r)} \|f_l - \bar{f}_r\|_K^2 + \lambda \sum_{r=1}^{c} m_r \|\bar{f}_r\|_K^2$ where $\bar{f}_r$, $r = 1,\dots,c$, is the mean of the components in cluster $r$.
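
And analogously for the cluster penalty; `clusters` below is a hypothetical list of index arrays playing the role of $I(r)$, with `C` and `G` as in the previous sketches:

```python
import numpy as np

def cluster_penalty(C, G, clusters, lam, gam):
    """gam * sum_r sum_{l in I(r)} ||f_l - fbar_r||_K^2 + lam * sum_r m_r ||fbar_r||_K^2."""
    pen = 0.0
    for idx in clusters:                         # idx plays the role of I(r)
        c_bar = C[idx].mean(axis=0)              # coefficients of the cluster mean fbar_r
        for l in idx:
            d = C[l] - c_bar
            pen += gam * d @ G @ d               # ||f_l - fbar_r||_K^2
        pen += lam * len(idx) * (c_bar @ G @ c_bar)
    return pen
```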

  17. How Can We Find the Solution? We have to solve $\min_{f_1,\dots,f_T} \big\{ \frac{1}{n} \sum_{j=1}^{T} \sum_{i=1}^{n} (y_i^j - f_j(x_i))^2 + \lambda \sum_{j=1}^{T} \|f_j\|_K^2 + \gamma \sum_{j=1}^{T} \big\| f_j - \frac{1}{T}\sum_{s=1}^{T} f_s \big\|_K^2 \big\}$ (we considered the first regularizer as an example). The theory of RKHS gives us a way to do this using what we already know from the scalar case.

  18. Tikhonov Regularization We now show that for all the above penalties we can define a suitable RKHS with kernel Q (and re-index the sums in the error term), so that $\min_{f_1,\dots,f_T} \big\{ \sum_{j=1}^{T} \frac{1}{n_j} \sum_{i=1}^{n} (y_i^j - f_j(x_i))^2 + \lambda\, PEN(f_1,\dots,f_T) \big\}$ can be written as $\min_{f \in \mathcal{H}} \big\{ \frac{1}{nT} \sum_{i=1}^{nT} (y_i - f(x_i, t_i))^2 + \lambda \|f\|_Q^2 \big\}$
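
Once the Gram matrix of Q over the nT pairs $(x_i, t_i)$ is available, the reformulated problem is ordinary kernel ridge regression, and the representer theorem gives the usual linear system. A minimal sketch (the scaling of the regularizer follows the $\frac{1}{nT}$ normalization above; names are illustrative):

```python
import numpy as np

def solve_joint_tikhonov(Q_gram, y, lam):
    """Kernel ridge regression with the joint kernel Q over all N = nT examples:
    f = sum_i c_i Q(., (x_i, t_i)) with c = (Q_gram + lam * N * I)^{-1} y."""
    N = len(y)
    return np.linalg.solve(Q_gram + lam * N * np.eye(N), y)
```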

  19. Kernels to the Rescue Consider a (joint) kernel $Q : (X \times \Pi) \times (X \times \Pi) \to \mathbb{R}$, where $\Pi = \{1,\dots,T\}$ is the index set of the output components. A function in the space is $f(x,t) = \sum_i Q((x,t),(x_i,t_i))\, c_i$, with norm $\|f\|_Q^2 = \sum_{i,j} Q((x_j,t_j),(x_i,t_i))\, c_i c_j$.

  20. A Useful Class of Kernels Let A be a $T \times T$ positive definite matrix and K a scalar kernel. Consider the kernel $Q : (X \times \Pi) \times (X \times \Pi) \to \mathbb{R}$ defined by $Q((x,t),(x',t')) = K(x,x')\, A_{t,t'}$. Then the norm of a function is $\|f\|_Q^2 = \sum_{i,j} K(x_i,x_j)\, A_{t_i,t_j}\, c_i c_j$.
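
Building the Gram matrix of such a Q amounts to computing the scalar Gram matrix of K and multiplying entrywise by the A-entries selected by the (integer) task indices. A sketch; the Gaussian kernel and all names are illustrative:

```python
import numpy as np

def q_gram(X, t, A, k):
    """Gram matrix of Q((x,t),(x',t')) = K(x,x') * A[t,t'] over examples (X[i], t[i])."""
    K = k(X[:, None], X[None, :])       # scalar Gram matrix K(x_i, x_j)
    return K * A[np.ix_(t, t)]          # entrywise product with A[t_i, t_j]

# e.g. with a Gaussian scalar kernel:
gauss = lambda a, b: np.exp(-(a - b) ** 2)
```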

  21. Regularizers and Kernels If we fix t, then $f_t(x) = f(x,t)$ is one of the tasks. The norm $\|\cdot\|_Q$ can be related to the scalar products among the tasks: $\|f\|_Q^2 = \sum_{s,t} A^\dagger_{s,t} \langle f_s, f_t \rangle_K$. This implies that: a regularizer of the form $\sum_{s,t} A^\dagger_{s,t} \langle f_s, f_t \rangle_K$ defines a kernel Q; the norm induced by a kernel Q of the form $K(x,x')A$ can be seen as a regularizer. The matrix A encodes relations among outputs.

  25. Regularizers and Kernels We sketch the proof of $\|f\|_Q^2 = \sum_{s,t} A^\dagger_{s,t} \langle f_s, f_t \rangle_K$. Recall that $\|f\|_Q^2 = \sum_{i,j} K(x_i,x_j)\, A_{t_i,t_j}\, c_i c_j$, and note that if $f_t(x) = \sum_i K(x,x_i)\, A_{t,t_i}\, c_i$, then $\langle f_s, f_t \rangle_K = \sum_{i,j} K(x_i,x_j)\, A_{s,t_i} A_{t,t_j}\, c_i c_j$. We need to multiply the last equality by $A^{-1}_{s,t}$ (or rather $A^\dagger_{s,t}$) and sum over s, t.
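
The identity can also be checked numerically. The following sketch builds a random positive definite A, computes $\|f\|_Q^2$ directly and via the task inner products, and verifies they agree (A is invertible here, so $A^\dagger = A^{-1}$; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 5, 3
X = rng.standard_normal(n)
t_idx = rng.integers(0, T, n)                  # task index t_i of each example
c = rng.standard_normal(n)
B = rng.standard_normal((T, T))
A = B @ B.T + T * np.eye(T)                    # random positive definite A
K = np.exp(-(X[:, None] - X[None, :]) ** 2)    # Gaussian scalar Gram matrix

# ||f||_Q^2 = sum_{ij} K(x_i,x_j) A[t_i,t_j] c_i c_j
norm_Q = c @ (K * A[np.ix_(t_idx, t_idx)]) @ c

# f_t = sum_i K(., x_i) A[t, t_i] c_i, so <f_s, f_t>_K = (W K W^T)[s, t] with W[t, i] = A[t, t_i] c_i
W = A[:, t_idx] * c
S = W @ K @ W.T                                # S[s, t] = <f_s, f_t>_K
assert np.isclose(norm_Q, np.sum(np.linalg.inv(A) * S))
```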
