SLIDE 1

Self-reflective Multi-task Gaussian Process

Kohei Hayashi¹, Takashi Takenouchi¹, Ryota Tomioka², Hisashi Kashima²

¹ Graduate School of Information Science, Nara Institute of Science and Technology
² Department of Mathematical Informatics, The University of Tokyo

July 2nd, 2011

SLIDE 2

Multi-task learning: problem definition

  • tasks and data points are correlated

Goal: predict the unobserved responses from the observed responses and the additional information

SLIDE 4

Gaussian process for multi-task learning

Idea: capture the correlations by measuring similarities between the responses.

Multi-task GP [Bonilla+ AISTATS'07] [Yu+ NIPS'07]: separately measures task and data-point similarities using additional information

SLIDE 5

Challenges

1. Good similarity measurement

  • additional information may not be enough to capture the correlations ⇒ inaccurate prediction

2. Computational complexity

  • inverting the Gram matrix is not practical for large-scale datasets

SLIDE 6

Our contributions

Propose a new framework for multi-task learning:

  • Self-measuring similarity: measures similarities using the observed responses themselves
  • Efficient, exact learning algorithm: ∼10 min for a 1000 × 1000 matrix

Apply it to a recommender system:

  • Outperforms existing methods

SLIDE 7

Model

SLIDE 8

Simple linear model

Consider a linear Gaussian model

  xik = w⊤ξik + εik,   (i, k) ∈ I

  • w ∈ R^K: weight parameter
  • ξik ∈ R^K: latent feature vector of xik
  • εik ∼ N(0, σ²): observation noise
  • I: index set of the observed elements
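For concreteness, here is a minimal numpy sketch of this generative model; the dimension K, the number of observed elements, and the noise level are arbitrary demo values, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 6, 100              # feature dimension and # observed elements (demo values)
sigma2 = 0.1               # noise variance σ²

w = rng.normal(size=K)                           # weight parameter w ∈ R^K
xi = rng.normal(size=(M, K))                     # latent features ξik, one per element
eps = rng.normal(scale=np.sqrt(sigma2), size=M)  # εik ~ N(0, σ²)
x = xi @ w + eps                                 # xik = w⊤ξik + εik, (i, k) ∈ I
```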

SLIDE 9

Bilinear assumption

Assume that ξik is decomposed into ψi and φk:

  w⊤ξik = w⊤(φk ⊗ ψi) = ψi⊤ W φk

  • ψi ∈ R^K1: i-th row-dependent feature
  • φk ∈ R^K2: k-th column-dependent feature
  • W ∈ R^K1×K2: weight parameter (vec W = w)
  • K = K1 K2
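The bilinear identity is easy to sanity-check numerically. A small sketch, using a column-major vec so that vec W stacks the columns of W:

```python
import numpy as np

rng = np.random.default_rng(0)
K1, K2 = 3, 4
psi = rng.normal(size=K1)        # row-dependent feature ψi
phi = rng.normal(size=K2)        # column-dependent feature φk
W = rng.normal(size=(K1, K2))    # weight matrix
w = W.flatten(order="F")         # vec W = w (column-major stacking)

lhs = w @ np.kron(phi, psi)      # w⊤(φk ⊗ ψi)
rhs = psi @ W @ phi              # ψi⊤ W φk
assert np.isclose(lhs, rhs)
```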

SLIDE 10

Now ψi and φk are given by feature functions: ψi = ψ(xi:), φk = φ(x:k)

  • xi: ∈ R^D2: i-th row vector of X
  • x:k ∈ R^D1: k-th column vector of X

SLIDE 11

Kernel representation

  x^pred_ik = ŵ⊤(φ(x:k) ⊗ ψ(xi:))                        (primal)
            = ∑_(j,l)∈I β̂_jl k({xi:, x:k}, {xj:, x:l})    (dual)

Self-measuring kernel (similarity)

  k({xi:, x:k}, {xj:, x:l}) = kψ(xi:, xj:) kφ(x:k, x:l)
                            = ⟨ψ(xi:), ψ(xj:)⟩ ⟨φ(x:k), φ(x:l)⟩

i.e. sim(xik, xjl) = sim(xi:, xj:) × sim(x:k, x:l)
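As an illustration, here is a minimal sketch of the self-measuring kernel with identity feature maps ψ and φ, so that each factor reduces to a plain inner product; any other row/column kernel (e.g. the RBF kernel used in the experiments) would slot in the same way:

```python
import numpy as np

def self_measuring_kernel(X, i, k, j, l):
    """k({xi:, x:k}, {xj:, x:l}) = kψ(xi:, xj:) · kφ(x:k, x:l),
    with ψ and φ taken as identity feature maps for simplicity
    (missing entries are assumed already filled in; see the next slide)."""
    k_psi = X[i, :] @ X[j, :]   # row (data-point) similarity kψ
    k_phi = X[:, k] @ X[:, l]   # column (task) similarity kφ
    return k_psi * k_phi
```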

SLIDE 12

Latent variables for missing values

  x^pred_ik = ∑_(j,l)∈I β̂_jl k({xi:, x:k}, {xj:, x:l})

How to compute k(·, ·) with missing values?

  • introduce latent variables Z:

  xi: = (1, ·, 3, 4, ·)⊤ ⇒ (1, zi2, 3, 4, zi5)⊤

EM-like iterative estimation

  • initialize Z^0 by the data mean
  • estimate z^t_ik = x^pred_ik(Z^{t−1}) for t = 1, 2, . . .
  • early stopping with a validation set
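A minimal sketch of this loop, assuming a predict(Z) callable that returns the model's predictions x^pred(Z) for the whole matrix; the helper names are mine, not the paper's:

```python
import numpy as np

def em_like_impute(X, obs_mask, predict, val_idx=None, x_val=None, max_iters=50):
    """Iteratively re-estimate latent variables Z for the missing entries."""
    Z = np.where(obs_mask, X, X[obs_mask].mean())  # initialize Z^0 by the data mean
    best_rmse, best_Z = np.inf, Z.copy()
    for t in range(1, max_iters + 1):
        pred = predict(Z)                          # x^pred(Z^{t-1})
        Z = np.where(obs_mask, X, pred)            # z^t_ik = x^pred_ik(Z^{t-1})
        if val_idx is not None:                    # early stopping on a validation set
            rmse = np.sqrt(np.mean((pred[val_idx] - x_val) ** 2))
            if rmse > best_rmse:
                break
            best_rmse, best_Z = rmse, Z.copy()
    return best_Z if val_idx is not None else Z
```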

SLIDE 13

Use of additional information

We can exploit additional information S = (s1, . . . , sD1) and T = (t1, . . . , tD2) by combining it with the self-measuring similarity, e.g.

  kψ(·, ·) = k(xi:, xj:) k(si, sj),   kφ(·, ·) = k(x:k, x:l) k(tk, tl)
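Concretely, the combination is an element-wise product of kernel factors. A tiny sketch (the function names are placeholders, not from the paper):

```python
def combined_row_kernel(X, S, i, j, k_self, k_side):
    """kψ(xi:, xj:) = k(xi:, xj:) · k(si, sj): self-measuring row similarity
    multiplied by the similarity of the additional row features si, sj."""
    return k_self(X[i, :], X[j, :]) * k_side(S[i], S[j])
```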

SLIDE 14

Optimization

SLIDE 16

Strategy

L2-regularized least-squares solution: β̂ = K⁻¹ xI

  • K = Ω ⊗ Σ + σ²I: Gram matrix
  • xI ∈ R^M: observed elements of X
  • M = |I|: # observations

Naïve approach: compute K⁻¹

  • O(M^3) time and O(M^2) space
  • too expensive

Solve xI = Kβ by conjugate gradient with the vec-trick

  • O(M^(3/2)) time and O(M) space
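A minimal sketch of the idea, assuming the row-similarity Gram matrix Σ and the column-similarity Gram matrix Ω are precomputed dense arrays; CG only ever sees a matrix-vector product, so the M × M Gram matrix is never formed:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_dual(Sigma, Omega, rows, cols, x_obs, sigma2):
    """Solve (Ω ⊗ Σ + σ²I) β = xI by conjugate gradient.

    rows, cols: arrays indexing the observed entries (i, k) ∈ I.
    """
    D1, D2 = Sigma.shape[0], Omega.shape[0]
    M = len(x_obs)

    def matvec(beta):
        V = np.zeros((D1, D2))
        V[rows, cols] = beta                  # scatter β onto the D1 × D2 grid
        U = Sigma @ V @ Omega                 # vec-trick: (Ω ⊗ Σ) vec V = vec(Σ V Ω), Ω symmetric
        return U[rows, cols] + sigma2 * beta  # gather back + σ²I term

    beta, info = cg(LinearOperator((M, M), matvec=matvec), x_obs)
    return beta
```

Each CG iteration then costs two dense matrix products via the vec-trick instead of a multiply with an explicit M × M matrix, which is what keeps the memory footprint at O(M).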

SLIDE 17

Experiment (updated)

SLIDE 18

Dataset

Dataset: MovieLens 100k

  • 1682 movies × 943 users
  • xik ∈ {1, . . . , 5}: rating of the i-th movie by the k-th user
  • # observations: 100,000
    • 86,040 for training
    • 4,530 for validation (early stopping)
    • 9,430 for testing
  • additional information
    • user-specific features: age, gender, ...
    • movie-specific features: genre, release date, ...
SLIDE 19

Settings

  • RBF kernel: k(x, x′) = exp(−λ ||x − x′||²)
  • hyper-parameters {σ², λ}: chosen by 3-fold CV

                   kψ                       kφ
  Multi-task GP    k(si, sj)                k(tk, tl)
  Self-measuring   k(xi:, xj:)              k(x:k, x:l)
  Product          k(xi:, xj:) k(si, sj)    k(x:k, x:l) k(tk, tl)
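The RBF kernel above is a one-liner; a minimal sketch, where λ is the bandwidth hyper-parameter selected by CV:

```python
import numpy as np

def rbf(x, y, lam):
    # k(x, x') = exp(−λ ||x − x'||²)
    return np.exp(-lam * np.sum((x - y) ** 2))
```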

SLIDE 20

Results

  Method                 RMSE     time
  Matrix Factorization   0.9345   1m38s
  Multi-task GP          1.0517   7m01s
  Self-measuring         0.9308   16m22s
  Product                0.9256   18m25s

  • The best score on http://mlcomp.org/datasets/341

SLIDE 21

Conclusion

1. Proposed a kernel-based method for multi-task learning problems
  • self-measuring similarity
  • an efficient algorithm using the CG method

2. Applied it to a recommender system
  • outperformed existing methods on the MovieLens 100k dataset

SLIDE 22

Questions?
