Multi-Task Learning: Models, Optimization and Applications
Linli Xu, University of Science and Technology of China


  1. Multi-Task Learning: Models, Optimization and Applications
     Linli Xu, University of Science and Technology of China

  2. Outline
     • Introduction to multi-task learning (MTL): problem and models
     • Multi-task learning with task-feature co-clusters
     • Low-rank optimization in multi-task learning
     • Multi-task learning applied to trajectory regression

  3. Multiple Tasks: Examination Score Prediction¹ (Argyriou et al. '08)
     One prediction task per school: predict each student's exam score from
     student-dependent features (birth year, previous score) and
     school-dependent features (school ranking).

     School                               Student id  Birth year  Previous score  School ranking  …  Exam score
     School 1 - Alverno High School       72981       1985        95              83%             …  ?
     School 138 - Jefferson Intermediate  31256       1986        87              72%             …  ?
     School 139 - Rosemead High School    12381       1986        83              77%             …  ?

     ¹ Data from the Inner London Education Authority (ILEA).

  4. Learning Multiple Tasks: Learning Each Task Independently
     Each school defines its own task, trained only on that school's data:
     School 1 → 1st task, School 138 → 138th task, School 139 → 139th task.
     [Slide figure: the three school tables from the previous slide, each
     feeding a separate, independently trained predictor]

  5. Learning Multiple Tasks: Learning Multiple Tasks Simultaneously
     The same school tasks are now learned jointly: learn the tasks
     simultaneously and model the relationships between them.
     [Slide figure: the three school tables feeding a single joint learner]

  6. Multi-Task Learning
     • Single task learning: the training data of each task is used to train
       a separate model, independently of the other tasks.
     • Multi-task learning: the m tasks are trained simultaneously to exploit
       the relationships between them.
     [Slide figure: m independent training pipelines vs. one joint training
     block producing all m models]

  7. Exploiting Task Relationships
     Key challenge in multi-task learning: exploiting the (statistical)
     relationships between the tasks so as to improve individual and/or
     overall predictive accuracy, in comparison to training individual models.

  8. How Are Tasks Related?
     • All tasks are related
       – Models of all tasks are close to each other
       – Models of all tasks share a common set of features
       – Models of all tasks share the same low-rank subspace
     • Structure in tasks: clusters / graphs / trees
     • Learning with outlier tasks

  9. Regularization-Based Multi-Task Learning
     Setup: m tasks; task i has a feature matrix X_i (n_i samples, d
     features) and a target vector Y_i; learning produces a model matrix W.
     We focus on linear models:
         Y_i ≈ X_i w_i,   X_i ∈ ℝ^{n_i×d},   Y_i ∈ ℝ^{n_i×1},
         W = [w_1, w_2, …, w_m] ∈ ℝ^{d×m}
     Generic framework:
         min_W  Σ_i Loss(W, X_i, Y_i) + λ Reg(W)
     Various types of relations between the tasks are imposed through Reg(W).
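To make the generic framework concrete, here is a minimal NumPy sketch (not from the slides), assuming a squared loss per task; the function names `mtl_objective` and `reg`, and the column-per-task layout of `W`, are illustrative choices.

```python
import numpy as np

def mtl_objective(W, Xs, Ys, reg, lam=0.1):
    """Generic regularized MTL objective with a squared loss per task:
    sum_i ||X_i w_i - Y_i||^2 + lam * Reg(W).
    Column i of W (d x m) is the weight vector of task i."""
    loss = sum(np.sum((X @ w - y) ** 2)
               for X, y, w in zip(Xs, Ys, W.T))
    return loss + lam * reg(W)
```

Any of the regularizers on the following slides can be plugged in as `reg`, e.g. `mtl_objective(W, Xs, Ys, reg=lambda W: np.sum(W**2))` for a plain ridge-style penalty.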

  10. How Are Tasks Related?
      • All tasks are related
        – Models of all tasks are close to each other
        – Models of all tasks share a common set of features
        – Models of all tasks share the same low-rank subspace
      • Structure in tasks: clusters / graphs / trees
      • Learning with outlier tasks

  11. MTL Methods: Mean-Regularized MTL (Evgeniou & Pontil, 2004 KDD)
      Assumption: the model parameters of all tasks are close to each other.
      – Advantage: simple, intuitive, easy to implement
      – Disadvantage: too simple
      Regularization penalizes the deviation of each task from the mean:
          min_W  Loss(W) + λ Σ_{i=1}^{m} ‖ w_i − (1/m) Σ_{s=1}^{m} w_s ‖₂²
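A minimal sketch of the mean-deviation penalty (again with tasks as the columns of `W`, an illustrative layout), usable as the `reg` argument of the `mtl_objective` sketch above:

```python
import numpy as np

def mean_reg(W):
    """Mean-regularized penalty: sum_i || w_i - (1/m) sum_s w_s ||_2^2,
    with the tasks as the columns of W."""
    w_bar = W.mean(axis=1, keepdims=True)   # d x 1 mean model
    return np.sum((W - w_bar) ** 2)
```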

  12. MTL Methods: Joint Feature Learning (Evgeniou et al. 2006 NIPS;
      Obozinski et al. 2009 Stat Comput; Liu et al. 2010 Technical Report)
      Assumption: the models of all tasks share a common set of features.
      – Group sparsity via ℓ_{1,q}-norm regularization over the rows
        (features) of W:
            ‖W‖_{1,q} = Σ_{i=1}^{d} ‖ w^i ‖_q,  where w^i is the i-th row of W
      – When q > 1 we have group sparsity: each feature is selected or
        discarded jointly across all tasks.
          min_W  Loss(W) + λ ‖W‖_{1,q}
      [Slide figure: the d×m matrix W with a few nonzero rows (Feature 1 …
      Feature d) shared across Task 1 … Task m]
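The ℓ_{1,q} regularizer is easy to write down directly. A sketch, assuming rows of `W` are features and columns are tasks, matching the slide's d×m layout:

```python
import numpy as np

def l1q_norm(W, q=2.0):
    """l_{1,q} norm: the l_q norm of each feature row of W (across tasks),
    summed over the d rows; q > 1 ties each feature's selection together
    across all tasks."""
    return np.sum(np.linalg.norm(W, ord=q, axis=1))
```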

  13. MTL Methods: Low-Rank MTL (Ji et al. 2009 ICML)
      Assumption: in a high-dimensional feature space, the linear models
      share the same low-rank subspace.
      Regularization: rank minimization formulation
          min_W  Loss(W) + λ · rank(W)
      – Rank minimization is NP-hard for general loss functions.
      • Convex relaxation: nuclear norm minimization
          min_W  Loss(W) + λ ‖W‖_*
        (‖W‖_*: the sum of the singular values of W)
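A sketch of the nuclear norm together with singular value thresholding, the standard proximal operator used by proximal-gradient solvers for this relaxation (the specifics of the accelerated solver in Ji et al. are not reproduced here):

```python
import numpy as np

def nuclear_norm(W):
    """||W||_*: sum of singular values, the convex surrogate for rank(W)."""
    return np.linalg.svd(W, compute_uv=False).sum()

def svt(W, tau):
    """Singular value thresholding: the proximal operator of tau*||.||_*,
    i.e. soft-threshold the singular values of W by tau."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```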

  14. How Are Tasks Related?
      • All tasks are related
        – Models of all tasks are close to each other
        – Models of all tasks share a common set of features
        – Models of all tasks share the same low-rank subspace
      • Structure in tasks: clusters / graphs / trees
      • Learning with outlier tasks

  15. MTL Methods: Clustered MTL (Zhou et al. 2011 NIPS)
      Assumption: there is a cluster structure in the tasks; the models of
      tasks from the same group are closer to each other than those from a
      different group.
      Regularization that captures the clustered structure (and thereby
      improves generalization performance):
          min_{W, F: FᵀF = I_k}  Loss(W) + α ( tr(WᵀW) − tr(FᵀWᵀWF) ) + β tr(WᵀW)
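A direct transcription of the clustered-MTL penalty as a function of `W` and the relaxed cluster-indicator matrix `F` (an m×k matrix with orthonormal columns); this only evaluates the regularizer, it does not perform the alternating optimization over W and F:

```python
import numpy as np

def clustered_reg(W, F, alpha, beta):
    """Clustered-MTL penalty:
    alpha * (tr(W^T W) - tr(F^T W^T W F)) + beta * tr(W^T W),
    where F (m x k) is a relaxed cluster indicator with F^T F = I_k."""
    G = W.T @ W                                   # m x m task Gram matrix
    within = np.trace(G) - np.trace(F.T @ G @ F)  # within-cluster spread
    return alpha * within + beta * np.trace(G)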

  16. Regularization-Based MTL: Decomposition Framework
      • In practice, it is too restrictive to constrain all tasks to share a
        single structure.
      • Assumption: the model is the sum of two components, W = P + Q
        – a shared low-dimensional subspace plus a task-specific component
          (Ando and Zhang, 2005, JMLR)
        – a group-sparse component plus a task-specific sparse component
          (Jalali et al., 2010, NIPS)
        – a low-rank structure among relevant tasks plus outlier tasks
          (Gong et al., 2011, KDD)
      A generic solver step for such models is sketched below.
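For a decomposition model min_{P,Q} Loss(P+Q) + λ₁Ω₁(P) + λ₂Ω₂(Q), a common solver template is proximal gradient descent: both components receive the same loss gradient and are then mapped through the proximal operator of their own regularizer. The sketch below is one such generic step under that assumption, not the specific algorithm of any of the cited papers:

```python
def decomposition_step(P, Q, grad_loss, lr, prox_P, prox_Q):
    """One proximal-gradient step for min Loss(P+Q) + lam1*Om1(P) + lam2*Om2(Q).
    grad_loss: gradient of the loss at W = P + Q (the same d x m matrix
    for both components). prox_P / prox_Q: proximal operators of the
    scaled regularizers lr*lam1*Om1 and lr*lam2*Om2."""
    P_new = prox_P(P - lr * grad_loss)
    Q_new = prox_Q(Q - lr * grad_loss)
    return P_new, Q_new
```

For the Robust MTL model on slide 18, `prox_P` would be singular value thresholding (the `svt` sketch above) and `prox_Q` column-wise soft thresholding.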

  17. How Are Tasks Related?
      • All tasks are related
        – Models of all tasks are close to each other
        – Models of all tasks share a common set of features
        – Models of all tasks share the same low-rank subspace
      • Structure in tasks: clusters / graphs / trees
      • Learning with outlier tasks

  18. MTL Methods: Robust MTL (Chen et al. 2011 KDD)
      Assumption: the models share the same low-rank subspace, plus outlier
      tasks: W = P + Q, where P is low rank across the features and the
      nonzero columns of Q carry the outlier tasks.
      Regularization:
      – ‖P‖_*: nuclear norm (low rank)
      – ‖Q‖_{2,1} = Σ_{j=1}^{m} ‖ q_{:,j} ‖₂ (column-sparse)
          min_W  Loss(W) + α ‖P‖_* + β ‖Q‖_{2,1}
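Both penalties of Robust MTL are simple to evaluate. A sketch, with the columns of `Q` indexing tasks:

```python
import numpy as np

def l21_norm(Q):
    """||Q||_{2,1}: sum of the l2 norms of the columns of Q; drives whole
    task columns to zero, so the surviving columns mark outlier tasks."""
    return np.linalg.norm(Q, axis=0).sum()

def robust_mtl_penalty(P, Q, alpha, beta):
    """alpha*||P||_* (shared low-rank part) + beta*||Q||_{2,1} (outlier part)."""
    nuc = np.linalg.svd(P, compute_uv=False).sum()
    return alpha * nuc + beta * l21_norm(Q)
```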

  19. Summary So Far…
      • All of the multi-task learning formulations discussed above fit into
        the W = P + Q schema.
        – Component P: shared structure
        – Component Q: information not captured by the shared structure

  20. Outline
      • Introduction to multi-task learning (MTL): problem and models
      • Multi-task learning with task-feature co-clusters
      • Low-rank optimization in multi-task learning
      • Multi-task learning applied to trajectory regression

  21. Recap: How Are Tasks Related?
      • All tasks are related
        – Models of all tasks are close to each other
        – Models of all tasks share a common set of features
        – Models of all tasks share the same low-rank subspace
      • Structure in tasks: clusters / graphs / trees
      • Learning with outlier tasks
      All of the above model relatedness at the task level.

  22. How Tasks Are Related
      • Existing methods consider structure at a general task level.
      • This assumption is restrictive in practice:
        – In document classification, different tasks may be relevant to
          different sets of words.
        – In a recommender system, two users with similar tastes on one
          feature subset may have totally different preferences on another.

  23. CoCMTL: MTL with Task-Feature Co-Clusters (Xu et al., AAAI15)
      • Motivation: feature-level groups, obtained by clustering tasks and
        features jointly on the task-feature bipartite graph.
      • Impose the task-feature co-clustering structure through Reg(W).

  24. CoCMTL: Model
      • Decomposition model: W = P + Q
          min_W  Loss(W) + λ₁ Ω₁(P) + λ₂ Ω₂(Q)

  25. CoCMTL: Model
      • Decomposition model: W = P + Q
          min_W  Loss(W) + λ₁ Ω₁(P) + λ₂ Ω₂(Q)
      • Ω₂ penalizes the trailing singular values of Q (non-convex):
          Ω₂(Q) = Σ_{i=k+1}^{min(d,m)} σᵢ²(Q)
      • With Ω₁ written as a Laplacian-style quadratic form, the objective
        becomes
          min_W  Loss(W) + λ₁ tr(P L Pᵀ) + λ₂ Σ_{i=k+1}^{min(d,m)} σᵢ²(Q)
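The tail-of-spectrum penalty Ω₂ is straightforward to evaluate from an SVD. A minimal sketch (evaluation only; the non-convex optimization is a separate matter):

```python
import numpy as np

def tail_sv_penalty(Q, k):
    """Omega_2(Q) = sum_{i > k} sigma_i(Q)^2: the squared singular values
    beyond the top k. It is zero exactly when rank(Q) <= k."""
    s = np.linalg.svd(Q, compute_uv=False)   # singular values, descending
    return float(np.sum(s[k:] ** 2))
```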
