

SLIDE 1

RegML 2020 Class 4: Regularization for multi-task learning

Lorenzo Rosasco, UNIGE-MIT-IIT

SLIDE 2

Supervised learning so far

◮ Regression $f : X \to Y \subseteq \mathbb{R}$
◮ Classification $f : X \to Y = \{-1, 1\}$

What next?

◮ Vector-valued $f : X \to Y \subseteq \mathbb{R}^T$
◮ Multiclass $f : X \to Y = \{1, 2, \dots, T\}$
◮ ...

SLIDE 3

Multitask learning

Given $S_1 = (x_i^1, y_i^1)_{i=1}^{n_1}, \dots, S_T = (x_i^T, y_i^T)_{i=1}^{n_T}$,

find $f_1 : X_1 \to Y_1, \dots, f_T : X_T \to Y_T$

SLIDE 4

Multitask learning

Given $S_1 = (x_i^1, y_i^1)_{i=1}^{n_1}, \dots, S_T = (x_i^T, y_i^T)_{i=1}^{n_T}$,

find $f_1 : X_1 \to Y_1, \dots, f_T : X_T \to Y_T$

◮ vector-valued regression, $S_n = (x_i, y_i)_{i=1}^{n}$, $x_i \in X$, $y_i \in \mathbb{R}^T$: MTL with equal inputs! Output coordinates are “tasks”
◮ multiclass, $S_n = (x_i, y_i)_{i=1}^{n}$, $x_i \in X$, $y_i \in \{1, \dots, T\}$

SLIDE 5

Why MTL?

[Figure: two panels, “Task 1” and “Task 2”, sharing the same input space $X$ and output space $Y$]

SLIDE 6

Why MTL?

[Figure: four panels of real data]

Real data!

SLIDE 7

Why MTL?

Related problems:
◮ conjoint analysis
◮ transfer learning
◮ collaborative filtering
◮ co-kriging

Examples of applications:
◮ geophysics
◮ music recommendation (Dinuzzo 08)
◮ pharmacological data (Pillonetto et al. 08)
◮ binding data (Jacob et al. 08)
◮ movie recommendation (Abernethy et al. 08)
◮ HIV therapy screening (Bickel et al. 08)

SLIDE 8

Why MTL?

VVR, e.g. vector field estimation

SLIDE 9

Why MTL?

[Figure: two panels, “Component 1” and “Component 2”, of a vector-valued function from $X$ to $Y$]

SLIDE 10

Penalized regularization for MTL

$$\mathrm{err}(w_1, \dots, w_T) + \mathrm{pen}(w_1, \dots, w_T)$$

We start with linear models $f_1(x) = w_1^\top x, \dots, f_T(x) = w_T^\top x$

SLIDE 11

Empirical error

$$\mathcal{E}(w_1, \dots, w_T) = \sum_{i=1}^{T} \frac{1}{n_i} \sum_{j=1}^{n_i} \big(y_j^i - w_i^\top x_j^i\big)^2$$

◮ could consider other losses
◮ could try to “couple” errors

SLIDE 12

Least squares error

We focus on vector-valued regression (VVR)

$$S_n = (x_i, y_i)_{i=1}^{n}, \qquad x_i \in X, \; y_i \in \mathbb{R}^T$$

SLIDE 13

Least squares error

We focus on vector-valued regression (VVR)

$$S_n = (x_i, y_i)_{i=1}^{n}, \qquad x_i \in X, \; y_i \in \mathbb{R}^T$$

$$\frac{1}{n} \sum_{t=1}^{T} \sum_{i=1}^{n} \big(y_i^t - w_t^\top x_i\big)^2 = \frac{1}{n} \big\| \underbrace{\hat X}_{n \times d} \underbrace{W}_{d \times T} - \underbrace{Y}_{n \times T} \big\|_F^2$$

$$\|W\|_F^2 = \mathrm{Tr}(W^\top W), \qquad W = (w_1, \dots, w_T), \qquad Y_{it} = y_i^t, \quad i = 1, \dots, n, \; t = 1, \dots, T$$
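The identity can be verified numerically; a minimal numpy sketch with made-up sizes:

```python
import numpy as np

# Hypothetical sizes: n points in d dimensions, T tasks.
rng = np.random.default_rng(0)
n, d, T = 10, 4, 3
X = rng.standard_normal((n, d))
W = rng.standard_normal((d, T))
Y = rng.standard_normal((n, T))

# (1/n) * sum_{t,i} (y_i^t - w_t^T x_i)^2 ...
loss_sum = sum((Y[i, t] - W[:, t] @ X[i]) ** 2
               for t in range(T) for i in range(n)) / n
# ... equals (1/n) * ||X W - Y||_F^2.
loss_frob = np.linalg.norm(X @ W - Y, "fro") ** 2 / n
assert np.isclose(loss_sum, loss_frob)
```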


SLIDE 14

MTL by regularization

$$\mathrm{pen}(w_1, \dots, w_T)$$

◮ Coupling task solutions by regularization
◮ Borrowing strength
◮ Exploiting structure

SLIDE 15

Regularizations for MTL

$$\mathrm{pen}(w_1, \dots, w_T) = \sum_{t=1}^{T} \|w_t\|^2$$

SLIDE 16

Regularizations for MTL

$$\mathrm{pen}(w_1, \dots, w_T) = \sum_{t=1}^{T} \|w_t\|^2$$

Single-task regularization!

$$\min_{w_1, \dots, w_T} \frac{1}{n} \sum_{t=1}^{T} \sum_{i=1}^{n} \big(y_i^t - w_t^\top x_i\big)^2 + \lambda \sum_{t=1}^{T} \|w_t\|^2 = \sum_{t=1}^{T} \Big( \min_{w_t} \frac{1}{n} \sum_{i=1}^{n} \big(y_i^t - w_t^\top x_i\big)^2 + \lambda \|w_t\|^2 \Big)$$
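The decoupling is easy to check numerically: the joint least squares problem acts on each column of $W$ separately, so it gives the same solution as $T$ independent ridge regressions. A minimal sketch, with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T, lam = 50, 5, 3, 0.1
X, Y = rng.standard_normal((n, d)), rng.standard_normal((n, T))

# Joint problem: the normal equations read (X^T X / n + lam*I) W = X^T Y / n,
# and they act on each column of W separately.
W_joint = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ Y / n)

# T independent ridge regressions, one per task (output column).
W_tasks = np.column_stack([
    np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ Y[:, t] / n)
    for t in range(T)
])
assert np.allclose(W_joint, W_tasks)  # single-task regularization decouples
```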


SLIDE 17

Regularizations for MTL

◮ Isotropic coupling

$$(1 - \alpha) \sum_{j=1}^{T} \|w_j\|^2 + \alpha \sum_{j=1}^{T} \Big\| w_j - \frac{1}{T} \sum_{i=1}^{T} w_i \Big\|^2$$

SLIDE 18

Regularizations for MTL

◮ Isotropic coupling

$$(1 - \alpha) \sum_{j=1}^{T} \|w_j\|^2 + \alpha \sum_{j=1}^{T} \Big\| w_j - \frac{1}{T} \sum_{i=1}^{T} w_i \Big\|^2$$

◮ Graph coupling. Let $M \in \mathbb{R}^{T \times T}$ be an adjacency matrix with $M_{ts} \ge 0$:

$$\sum_{t=1}^{T} \sum_{s=1}^{T} M_{ts} \|w_t - w_s\|^2 + \gamma \sum_{t=1}^{T} \|w_t\|^2$$

Special case: outputs divided into clusters.

SLIDE 19

A general form of regularization

All the regularizers so far are of the form

$$\sum_{t=1}^{T} \sum_{s=1}^{T} A_{ts}\, w_t^\top w_s$$

for a suitable positive definite matrix $A$

SLIDE 20

MTL regularization revisited

◮ Single tasks: $\sum_{j=1}^{T} \|w_j\|^2 \implies A = I$

SLIDE 21

MTL regularization revisited

◮ Single tasks: $\sum_{j=1}^{T} \|w_j\|^2 \implies A = I$
◮ Isotropic coupling:

$$(1 - \alpha) \sum_{j=1}^{T} \|w_j\|^2 + \alpha \sum_{j=1}^{T} \Big\| w_j - \frac{1}{T} \sum_{i=1}^{T} w_i \Big\|^2 \implies A = I - \frac{\alpha}{T} \mathbf{1},$$

where $\mathbf{1}$ is the $T \times T$ all-ones matrix

SLIDE 22

MTL regularization revisited

◮ Single tasks: $\sum_{j=1}^{T} \|w_j\|^2 \implies A = I$
◮ Isotropic coupling:

$$(1 - \alpha) \sum_{j=1}^{T} \|w_j\|^2 + \alpha \sum_{j=1}^{T} \Big\| w_j - \frac{1}{T} \sum_{i=1}^{T} w_i \Big\|^2 \implies A = I - \frac{\alpha}{T} \mathbf{1}$$

◮ Graph coupling:

$$\sum_{t=1}^{T} \sum_{s=1}^{T} M_{ts} \|w_t - w_s\|^2 + \gamma \sum_{t=1}^{T} \|w_t\|^2 \implies A = L + \gamma I,$$

where $L$ is the graph Laplacian of $M$: $L = D - M$, $D = \mathrm{diag}\big(\sum_j M_{1j}, \dots, \sum_j M_{Tj}\big)$
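All three matrices are straightforward to build; a numpy sketch (the adjacency matrix M below is a made-up example):

```python
import numpy as np

T, alpha, gamma = 4, 0.5, 0.1

# Single tasks: A = I.
A_single = np.eye(T)

# Isotropic coupling: A = I - (alpha/T) * 1, with 1 the all-ones matrix.
A_iso = np.eye(T) - (alpha / T) * np.ones((T, T))

# Graph coupling: A = L + gamma*I, with L = D - M the Laplacian of a
# symmetric adjacency matrix M, M_ts >= 0 (a chain of 4 tasks here).
M = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(M.sum(axis=1)) - M
A_graph = L + gamma * np.eye(T)
```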


SLIDE 23

A general form of regularization

Let $W = (w_1, \dots, w_T)$, $A \in \mathbb{R}^{T \times T}$. Note that

$$\sum_{t=1}^{T} \sum_{s=1}^{T} A_{ts}\, w_t^\top w_s = \mathrm{Tr}(W A W^\top)$$

SLIDE 24

A general form of regularization

Let $W = (w_1, \dots, w_T)$, $A \in \mathbb{R}^{T \times T}$. Note that

$$\sum_{t=1}^{T} \sum_{s=1}^{T} A_{ts}\, w_t^\top w_s = \mathrm{Tr}(W A W^\top)$$

Indeed, writing $W_i$ for the $i$-th row of $W$,

$$\mathrm{Tr}(W A W^\top) = \sum_{i=1}^{d} W_i^\top A W_i = \sum_{i=1}^{d} \sum_{t,s=1}^{T} A_{ts} W_{it} W_{is} = \sum_{t,s=1}^{T} A_{ts} \sum_{i=1}^{d} W_{it} W_{is} = \sum_{t,s=1}^{T} A_{ts}\, w_t^\top w_s$$
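A numeric spot-check of the identity (random $W$ and $A$, sizes mine):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 3
W, A = rng.standard_normal((d, T)), rng.standard_normal((T, T))

lhs = np.trace(W @ A @ W.T)
rhs = sum(A[t, s] * (W[:, t] @ W[:, s])
          for t in range(T) for s in range(T))
assert np.isclose(lhs, rhs)
```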


SLIDE 25

Computations

$$\frac{1}{n} \|XW - Y\|_F^2 + \lambda\, \mathrm{Tr}(W A W^\top)$$

SLIDE 26

Computations

$$\frac{1}{n} \|XW - Y\|_F^2 + \lambda\, \mathrm{Tr}(W A W^\top)$$

Consider the SVD $A = U \Sigma U^\top$, $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_T)$

SLIDE 27

Computations

$$\frac{1}{n} \|XW - Y\|_F^2 + \lambda\, \mathrm{Tr}(W A W^\top)$$

Consider the SVD $A = U \Sigma U^\top$, $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_T)$.

Let $\tilde W = W U$, $\tilde Y = Y U$; then we can rewrite the above problem as

$$\frac{1}{n} \|X \tilde W - \tilde Y\|_F^2 + \lambda\, \mathrm{Tr}(\tilde W \Sigma \tilde W^\top)$$

SLIDE 28

Computations (cont.)

Finally, rewrite

$$\frac{1}{n} \|X \tilde W - \tilde Y\|_F^2 + \lambda\, \mathrm{Tr}(\tilde W \Sigma \tilde W^\top)$$

as

$$\sum_{t=1}^{T} \Big( \frac{1}{n} \sum_{i=1}^{n} \big(\tilde y_i^t - \tilde w_t^\top x_i\big)^2 + \lambda \sigma_t \|\tilde w_t\|^2 \Big)$$

and use $W = \tilde W U^\top$. Compare to single-task regularization... (a code sketch follows below)
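Putting the change of coordinates to work, a minimal sketch (function name mine; assumes $A$ symmetric positive definite, so the SVD above coincides with the eigendecomposition):

```python
import numpy as np

def mtl_ridge(X, Y, A, lam):
    """Solve min_W (1/n)||XW - Y||_F^2 + lam * Tr(W A W^T) by rotating
    to the eigenbasis of A (a sketch, assuming A symmetric positive
    definite)."""
    n, d = X.shape
    sigma, U = np.linalg.eigh(A)      # A = U diag(sigma) U^T
    Y_tilde = Y @ U                   # rotated targets
    G = X.T @ X / n
    # One ridge problem per rotated task, with penalty lam * sigma_t.
    W_tilde = np.column_stack([
        np.linalg.solve(G + lam * s * np.eye(d), X.T @ yt / n)
        for s, yt in zip(sigma, Y_tilde.T)
    ])
    return W_tilde @ U.T              # back to the original coordinates
```

Each rotated task is an ordinary ridge problem whose effective regularization parameter $\lambda \sigma_t$ depends on the task.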


SLIDE 29

Computations (cont.)

$$\mathcal{E}_\lambda(W) = \frac{1}{n} \|XW - Y\|_F^2 + \lambda\, \mathrm{Tr}(W A W^\top)$$

Alternatively, gradient descent:

$$\nabla \mathcal{E}_\lambda(W) = \frac{2}{n} X^\top (XW - Y) + 2 \lambda W A, \qquad W_{t+1} = W_t - \gamma \nabla \mathcal{E}_\lambda(W_t)$$

Trivially extends to other loss functions.
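The gradient iteration in numpy (a sketch; the function name, fixed step size, and iteration count are mine):

```python
import numpy as np

def mtl_gd(X, Y, A, lam, gamma=0.01, iters=1000):
    """Gradient descent on E_lam(W) = (1/n)||XW - Y||_F^2
    + lam * Tr(W A W^T), for symmetric A."""
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    for _ in range(iters):
        grad = (2 / n) * X.T @ (X @ W - Y) + 2 * lam * W @ A
        W -= gamma * grad
    return W
```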


SLIDE 30

Beyond Linearity

$$f_t(x) = w_t^\top \Phi(x), \qquad \Phi(x) = (\varphi_1(x), \dots, \varphi_p(x))$$

$$\mathcal{E}_\lambda(W) = \frac{1}{n} \|\Phi W - Y\|^2 + \lambda\, \mathrm{Tr}(W A W^\top),$$

with $\Phi$ the matrix with rows $\Phi(x_1), \dots, \Phi(x_n)$

SLIDE 31

Nonparametrics and kernels

$$f_t(x) = \sum_{i=1}^{n} K(x, x_i)\, C_{it}, \qquad \text{with} \qquad C_{\ell+1} = C_\ell - \gamma \Big( \frac{2}{n} (K C_\ell - Y) + 2 \lambda C_\ell A \Big)$$

◮ $C_\ell \in \mathbb{R}^{n \times T}$
◮ $K \in \mathbb{R}^{n \times n}$, $K_{ij} = K(x_i, x_j)$
◮ $Y \in \mathbb{R}^{n \times T}$, $Y_{ij} = y_i^j$
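The kernelized iteration transcribes directly (a sketch; the function name, step size, and iteration count are mine):

```python
import numpy as np

def kernel_mtl(K, Y, A, lam, gamma=0.001, iters=1000):
    """Iteration from the slide: C_{l+1} = C_l - gamma * ((2/n)(K C_l - Y)
    + 2 lam C_l A), with K the n x n kernel matrix and Y in R^{n x T}."""
    n, T = Y.shape
    C = np.zeros((n, T))
    for _ in range(iters):
        C -= gamma * ((2 / n) * (K @ C - Y) + 2 * lam * C @ A)
    return C  # predictions: f_t(x) = sum_i K(x, x_i) * C[i, t]
```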


SLIDE 32

Spectral filtering for MTL

Beyond penalization

$$\min_W \frac{1}{n} \|XW - Y\|^2 + \lambda\, \mathrm{Tr}(W A W^\top),$$

other forms of regularization can be considered:

◮ projection
◮ early stopping

SLIDE 33

Multiclass and MTL

$$Y = \{1, \dots, T\}$$

SLIDE 34

From Multiclass to MTL

Encoding. For $j = 1, \dots, T$: $j \to e_j$, the canonical vector of $\mathbb{R}^T$; the problem reduces to vector-valued regression.

Decoding. For $f(x) \in \mathbb{R}^T$:

$$f(x) \to \operatorname{argmax}_{t=1,\dots,T} e_t^\top f(x) = \operatorname{argmax}_{t=1,\dots,T} f_t(x)$$
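In numpy the encoding and decoding are one-liners (a sketch; labels taken zero-based for convenience):

```python
import numpy as np

def encode(labels, T):
    # Class j -> canonical vector e_j of R^T (one row per example).
    return np.eye(T)[labels]

def decode(F):
    # f(x) -> argmax_t f_t(x), one predicted class per row of F.
    return np.argmax(F, axis=1)
```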


SLIDE 35

Single MTL and OVA

Write

$$\min_W \frac{1}{n} \|XW - Y\|^2 + \lambda\, \mathrm{Tr}(W W^\top)$$

as

$$\sum_{t=1}^{T} \min_{w_t} \frac{1}{n} \sum_{i=1}^{n_t} \big(w_t^\top x_i^t - y_i^t\big)^2 + \lambda \|w_t\|^2$$

This is known as one versus all (OVA)

SLIDE 36

Beyond OVA

Consider

$$\min_W \frac{1}{n} \|XW - Y\|^2 + \lambda\, \mathrm{Tr}(W A W^\top),$$

that is

$$\sum_{t=1}^{T} \min_{\tilde w_t} \Big( \frac{1}{n} \sum_{i=1}^{n} \big(\tilde y_i^t - \tilde w_t^\top x_i\big)^2 + \lambda \sigma_t \|\tilde w_t\|^2 \Big)$$

Class relatedness is encoded in $A$

SLIDE 37

Back to MTL

$$\sum_{t=1}^{T} \frac{1}{n_t} \sum_{j=1}^{n_t} \big(y_j^t - w_t^\top x_j^t\big)^2$$

$$\Downarrow$$

$$\big\| \big( \underbrace{\hat X}_{n \times d} \underbrace{W}_{d \times T} - \underbrace{Y}_{n \times T} \big) \odot \underbrace{M}_{n \times T} \big\|_F^2, \qquad n = \sum_{t=1}^{T} n_t$$

◮ $\odot$ Hadamard product
◮ $M$ mask
◮ $Y$ has one non-zero value per row

SLIDE 38

Computations

$$\min_W \big\| (\hat X W - Y) \odot M \big\|_F^2 + \lambda\, \mathrm{Tr}(W A W^\top)$$

◮ can be rewritten using tensor calculus
◮ the computations for vector-valued regression extend easily
◮ sparsity of $M$ can be exploited
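A gradient-descent sketch for the masked objective (names mine; assumes a binary mask, so $M \odot M = M$, and absorbs the per-task weights into the step size):

```python
import numpy as np

def masked_mtl_gd(X, Y, M, A, lam, gamma=0.01, iters=1000):
    """Gradient descent for min_W ||(X W - Y) * M||_F^2 + lam * Tr(W A W^T),
    with * the entrywise (Hadamard) product and M a binary mask."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(iters):
        R = (X @ W - Y) * M                          # masked residuals
        W -= gamma * (2 * X.T @ R + 2 * lam * W @ A)
    return W
```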


SLIDE 39

From MTL to matrix completion

Special case: take $d = n$ and $X = I$:

$$\big\| (\hat X W - Y) \odot M \big\|_F^2 = \sum_{t=1}^{T} \sum_{i=1}^{n} (w_{it} - \bar y_{it})^2 M_{it}$$

SLIDE 40

Summary so far

A regularization framework for
◮ VVR
◮ Multiclass
◮ MTL
◮ Matrix completion

if the structure of the “tasks” is known. What if it is not?

SLIDE 41

The structure of MTL

Consider

$$\min_W \frac{1}{n} \|XW - Y\|^2 + \lambda\, \mathrm{Tr}(W A W^\top);$$

the matrix $A$ encodes structure. Can we learn it?

SLIDE 42

Learning structure of MTL

Consider

$$\min_{W, A} \frac{1}{n} \|XW - Y\|^2 + \lambda\, \mathrm{Tr}(W A W^\top) + \gamma\, \mathrm{pen}(A)$$

Estimate a positive definite matrix $A$ using a regularizer $\mathrm{pen}(A)$

SLIDE 43

Regularizers for MTL

For example, consider

$$\min_{W, A} \frac{1}{n} \|XW - Y\|^2 + \lambda\, \mathrm{Tr}(W A W^\top) + \gamma\, \mathrm{Tr}(A^{-2});$$

using the same change of coordinates as before we have

$$\min_{\tilde w_1, \dots, \tilde w_T, \sigma_1, \dots, \sigma_T} \sum_{t=1}^{T} \Big( \frac{1}{n} \sum_{i=1}^{n} \big(\tilde y_i^t - \tilde w_t^\top x_i\big)^2 + \lambda \sigma_t \|\tilde w_t\|^2 \Big) + \gamma \sum_{t=1}^{T} \frac{1}{\sigma_t^2}$$

◮ we avoid each task having too little weight

SLIDE 44

Alternating minimization

Solving

$$\min_{W, A} \frac{1}{n} \|XW - Y\|^2 + \lambda\, \mathrm{Tr}(W A W^\top) + \gamma\, \mathrm{pen}(A)$$

SLIDE 45

Alternating minimization

Solving

$$\min_{W, A} \frac{1}{n} \|XW - Y\|^2 + \lambda\, \mathrm{Tr}(W A W^\top) + \gamma\, \mathrm{pen}(A)$$

◮ Fix $A = A_0$
◮ Compute $W_1$ solving $\min_W \frac{1}{n} \|XW - Y\|^2 + \lambda\, \mathrm{Tr}(W A_0 W^\top)$
◮ Compute $A_1$ solving $\min_A \lambda\, \mathrm{Tr}(W_1 A W_1^\top) + \gamma\, \mathrm{pen}(A)$
◮ Repeat...
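A sketch of the loop for $\mathrm{pen}(A) = \mathrm{Tr}(A^{-2})$ (names mine): the $W$-step reuses the rotated ridge solver from the earlier sketch (`mtl_ridge`), and for this penalty the $A$-step has a closed form, since setting the gradient $\lambda W^\top W - 2\gamma A^{-3}$ to zero gives $A = \big(\frac{\lambda}{2\gamma} W^\top W\big)^{-1/3}$ (assuming $W^\top W$ invertible):

```python
import numpy as np

def alternating_mtl(X, Y, lam, gam, n_outer=10):
    """Alternating minimization sketch for
    min_{W,A} (1/n)||XW - Y||_F^2 + lam*Tr(W A W^T) + gam*Tr(A^{-2})."""
    T = Y.shape[1]
    A = np.eye(T)                                # A_0
    for _ in range(n_outer):
        W = mtl_ridge(X, Y, A, lam)              # W-step: A fixed
        # A-step: A = ((lam / (2*gam)) W^T W)^(-1/3), via eigendecomposition
        # (requires W^T W to have strictly positive eigenvalues).
        s, U = np.linalg.eigh((lam / (2 * gam)) * W.T @ W)
        A = U @ np.diag(s ** (-1.0 / 3.0)) @ U.T
    return W, A
```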


SLIDE 46

This class

◮ Why MTL?
◮ Regularization for MTL to exploit structure
◮ MTL and other problems
◮ Learning tasks AND their structure

SLIDE 47

Next class

Sparsity!