Output Kernel Learning Methods Francesco Dinuzzo 1 Cheng Soon Ong 2 - PowerPoint PPT Presentation

Output Kernel Learning Methods Francesco Dinuzzo 1 Cheng Soon Ong 2 Kenji Fukumizu 3 1 Max Planck Institute for Intelligent Systems, T¨ ubingen, Germany 2 NICTA, Melbourne 3 Institute of Statistical Mathematics, Japan

Part I Learning multiple tasks and their relationships

Multiple regression: population pharmacokinetics Xenobiotic concentration in 27 human subjects after a bolus administration 120 100 80 Concentration 60 40 20 0 0 5 10 15 20 25 Time (hours) The response curves have similar shapes. However, there is macroscopic inter-individual variability.

Multiple regression: population pharmacokinetics Subject #1 Subject #2 Subject #3 120 120 120 100 100 100 80 80 80 Concentration Concentration Concentration 60 60 60 40 40 40 20 20 20 0 0 0 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 Time (hours) Time (hours) Time (hours) Subject #4 Subject #5 Subject #6 120 120 120 100 100 100 80 80 80 Concentration Concentration Concentration 60 60 60 40 40 40 20 20 20 0 0 0 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 Time (hours) Time (hours) Time (hours) Few data points per subject with sparse sampling.

Multiple regression: population pharmacokinetics Subject #1 Subject #2 Subject #3 120 120 120 100 100 100 80 80 80 Concentration Concentration Concentration 60 60 60 40 40 40 20 20 20 0 0 0 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 Time (hours) Time (hours) Time (hours) Subject #4 Subject #5 Subject #6 120 120 120 100 100 100 80 80 80 Concentration Concentration Concentration 60 60 60 40 40 40 20 20 20 0 0 0 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 Time (hours) Time (hours) Time (hours) Can we combine the datasets to better estimate all the curves?

Collaborative filtering and recommender systems Data: collections of ratings assigned by several users to a set of items. Problem: estimate the preferences of the every user for all the items. Preference profiles are different for every user. However, similar users have similar preferences.

Collaborative filtering and recommender systems Additional information: Data about the items. Data about the users (e.g. gender, age, occupation). Data about the ratings themselves (e.g. timestamp, tags). Can we combine all these data to better estimate individual preferences?

Multi-task learning: dataset structure 10 20 30 40 Task 50 Sampling can be very sparse. 60 70 80 90 100 10 20 x

Multi-task learning: dataset structure 10 20 30 40 Task 50 Few samples per task... 60 70 80 90 100 10 20 x

Multi-task learning: dataset structure 10 20 30 40 Task 50 ... but each sample is shared by many tasks. 60 70 80 90 100 10 20 x

Object recognition with structure discovery 1 Build an object classifier with good generalization performance 2 Discover relationships between the different classes

Organizing the classes in a graph structure baseball-bat goose breadmaker calculator monitor chopsticks bear killer-whale helicopter boom-box laptop microwave sword chimp porcupine swan speed-boat vcr motorbikes photocopier tweezer gorilla video-projector tennis-court airplanes mountain-bike toaster refrigerator skunk comet cactus centipede butterfly fireworks golden-gate buddha grapes spaghetti fern crab grasshopper bowling-ball galaxy light-house eiffel-tower toad scorpion hummingbird hibiscus cd lightning teepee minaret iris brain frisbee rainbow windmill skyscraper mars steering-wheel tower-pisa How can we generate such a graph automatically?

Part II Output Kernel Learning

Kernel-based multi-task learning Multi-task supervised learning Synthesizing multiple functions f j : X → Y , j = 1 , . . . , m from multiple datasets of input-output pairs ( x ij , y ij ).

Kernel-based multi-task learning Multi-task supervised learning Synthesizing multiple functions f j : X → Y , j = 1 , . . . , m from multiple datasets of input-output pairs ( x ij , y ij ). Multi-task kernels For every pair of inputs ( x 1 , x 2 ) and every pair of task indices ( i, j ), specify a similarity value K (( x 1 , i ) , ( x 2 , j )) Equivalently, specify a matrix valued function H such that [ H ( x 1 , x 2 )] ij = K (( x 1 , i ) , ( x 2 , j ))

Decomposable kernels Decomposable kernels K (( x 1 , i ) , ( x 2 , j )) = K X ( x 1 , x 2 ) K Y ( i, j ) , Matrix-valued kernel H ( x 1 , x 2 ) = K X ( x 1 , x 2 ) · L , L ij = K Y ( i, j ) K X is the input kernel . K Y is the output kernel (equivalently, L ∈ S m + ).

Kernel-based regularization methods Kernel-based regularization   ℓ j m � � V ( y ij , f j ( x ij )) + � f � 2 min  H L  f ∈H L j =1 i =1

Kernel-based regularization methods Kernel-based regularization   ℓ j m � � V ( y ij , f j ( x ij )) + � f � 2 min  H L  f ∈H L j =1 i =1 Representer theorem � ℓ k m � � � f j ( x ) = L jk c ij K X ( x ij , x ) i =1 k =1

Kernel-based regularization methods Kernel-based regularization   ℓ j m � � V ( y ij , f j ( x ij )) + � f � 2 min  H L  f ∈H L j =1 i =1 Representer theorem � ℓ k m � � � f j ( x ) = L jk c ij K X ( x ij , x ) i =1 k =1 How to choose the output kernel? Independent single-task learning: L = I .

Kernel-based regularization methods Kernel-based regularization   ℓ j m � � V ( y ij , f j ( x ij )) + � f � 2 min  H L  f ∈H L j =1 i =1 Representer theorem � ℓ k m � � � f j ( x ) = L jk c ij K X ( x ij , x ) i =1 k =1 How to choose the output kernel? Independent single-task learning: L = I . Pooled single-task learning: L = 1 .

Kernel-based regularization methods Kernel-based regularization   ℓ j m � � V ( y ij , f j ( x ij )) + � f � 2 min  H L  f ∈H L j =1 i =1 Representer theorem � ℓ k m � � � f j ( x ) = L jk c ij K X ( x ij , x ) i =1 k =1 How to choose the output kernel? Independent single-task learning: L = I . Pooled single-task learning: L = 1 . Design it using prior knowledge.

Kernel-based regularization methods Kernel-based regularization   ℓ j m � � V ( y ij , f j ( x ij )) + � f � 2 min  H L  f ∈H L j =1 i =1 Representer theorem � ℓ k m � � � f j ( x ) = L jk c ij K X ( x ij , x ) i =1 k =1 How to choose the output kernel? Independent single-task learning: L = I . Pooled single-task learning: L = 1 . Design it using prior knowledge. Learn it from the data.

Multiple Kernel Learning Multiple Kernel Learning (MKL) N � K = d k ≥ 0 . d k K k , k =1

Multiple Kernel Learning Multiple Kernel Learning (MKL) N � K = d k ≥ 0 . d k K k , k =1 MKL with decomposable basis kernels N � d k K k X ( x 1 , x 2 ) K k K (( x 1 , i ) , ( x 2 , j )) = Y ( i, j ) k =1

Multiple Kernel Learning Multiple Kernel Learning (MKL) N � K = d k ≥ 0 . d k K k , k =1 MKL with decomposable basis kernels N � d k K k X ( x 1 , x 2 ) K k K (( x 1 , i ) , ( x 2 , j )) = Y ( i, j ) k =1 MKL with decomposable basis kernels (common input kernel) � N � � d k K k K (( x 1 , i ) , ( x 2 , j )) = K X ( x 1 , x 2 ) Y ( i, j ) k =1

Multiple Kernel Learning Multiple Kernel Learning (MKL) N � K = d k ≥ 0 . d k K k , k =1 MKL with decomposable basis kernels N � d k K k X ( x 1 , x 2 ) K k K (( x 1 , i ) , ( x 2 , j )) = Y ( i, j ) k =1 MKL with decomposable basis kernels (common input kernel) � N � � d k K k K (( x 1 , i ) , ( x 2 , j )) = K X ( x 1 , x 2 ) Y ( i, j ) k =1 Two issues: 1 The maximum number of kernels is limited by memory constraints. 2 Specifying the dictionary of basis kernels requires domain knowledge.

Output Kernel Learning Optimization problem     ℓ j m  min � � V ( y ij , f j ( x ij )) + � f � 2  , min H L + Ω( L )   L ∈ S + f ∈H L j =1 i =1

Output Kernel Learning Optimization problem     ℓ j m  min � � V ( y ij , f j ( x ij )) + � f � 2  , min H L + Ω( L )   L ∈ S + f ∈H L j =1 i =1 Examples: Squared Frobenius norm Ω( L ) = � L � 2 F . Sparsity-inducing regularizer Ω( L ) = � L � 1 . Low-rank inducing regularizer Ω( L ) = tr( L ) + I (rank( L ) ≤ p ) .

Low-Rank Output Kernel Learning Low-Rank OKL C T KCL � � �� W ⊙ ( Y − KCL ) � 2 � � + tr + tr( L ) F min min . L ∈ S m,p 2 λ 2 2 C ∈ R ℓ × m +

Low-Rank Output Kernel Learning Low-Rank OKL C T KCL � � �� W ⊙ ( Y − KCL ) � 2 � � + tr + tr( L ) F min min . L ∈ S m,p 2 λ 2 2 C ∈ R ℓ × m + A non-linear generalization of reduced-rank regression.

Low-Rank Output Kernel Learning Low-Rank OKL C T KCL � � �� W ⊙ ( Y − KCL ) � 2 � � + tr + tr( L ) F min min . L ∈ S m,p 2 λ 2 2 C ∈ R ℓ × m + A non-linear generalization of reduced-rank regression. One of the reformulations only requires storing low-rank matrices.

Output Kernel Learning Methods Francesco Dinuzzo 1 Cheng Soon Ong 2 - PowerPoint PPT Presentation

Output Kernel Learning Methods Francesco Dinuzzo 1 Cheng Soon Ong 2 Kenji Fukumizu 3 1 Max Planck Institute for Intelligent Systems, T ubingen, Germany 2 NICTA, Melbourne 3 Institute of Statistical Mathematics, Japan Part I Learning multiple

Tra ffi c Management as a Service | Ghent, Belgium INPUT PROCESS OUTPUT INPUT PROCESS OUTPUT

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel -means Clustering Manuel

File Input and Output File Input and Output 1 / 9 File input/output input function reads values

Chapter 12 Overview Devices and Output Visual Output Dynamic Visualizations Sound

16. Recursion 2 Output: 103 Input: (3 + 5) * 20 Output: 160 Input: -(3 + 5) + 20 Output: 12

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel -means Clustering Manuel

Machine learning theory Kernel methods Hamid Beigy Sharif university of technology April 20,

Black Kernel Rot Malady of Pecan B Wood, C Bock, l Wells, T Cottrell, M Hotchkiss Black Kernel

Kernel Properties - Convexity Leila Wehbe October 1st 2013 Leila Wehbe Kernel Properties -

Processes, Protection and the Kernel: Processes, Protection and the Kernel: Mode, Space, and

Linux Kernel Debugging Your kernel just oopsed - What do you do, hotshot? Muli Ben-Yehuda

Introduction to Linux Kernel Modules Luca Abeni luca.abeni@santannapisa.it Linux Kernel Modules

Kernel Machines Support Vector Machines 1 Kernel Machines Optimal Separating HyperPlanes Soft

Multiple Kernel Learning and Feature Space Denoising Fei Yan, Josef Kittler and Krystian

Prediction in kernelized output spaces: output kernel trees and ensemble methods Pierre Geurts

Machine Learning Kernel Methods Hamid R. Rabiee Mohammad H. Rohban Spring 2015

Path Stitc hing: Inte r ne t-Wide Path and De lay E stimation fr om E xisting Me asur e me

Taking Drupal development to the Cloud Karel Bemelmans About me Working with Internet based

Entrapment: Tricking Malware with Transparent, Scalable Malware Analysis Paul Royal

Introduction to IP networking Olof Hagsand KTH CSC 1 Example: Packet transfer www.server.org

Fast lightweight accurate xenograft sorting (Faster xenograft sorting with 3-way bucketed Cuckoo

Stat 8931 (Aster Models) Lecture Slides Deck 6 Charles J. Geyer School of Statistics University

Notes: GIS Applications in Fire Ecology & Management In this lesson we will cover the

From little things big things grow How digital connectivity is helping Australian small

Output Kernel Learning Methods Francesco Dinuzzo 1 Cheng Soon Ong 2 - PowerPoint PPT Presentation

Output Kernel Learning Methods Francesco Dinuzzo 1 Cheng Soon Ong 2 Kenji Fukumizu 3 1 Max Planck Institute for Intelligent Systems, T ubingen, Germany 2 NICTA, Melbourne 3 Institute of Statistical Mathematics, Japan Part I Learning multiple

Tra ffi c Management as a Service | Ghent, Belgium INPUT PROCESS OUTPUT INPUT PROCESS OUTPUT

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel -means Clustering Manuel

File Input and Output File Input and Output 1 / 9 File input/output input function reads values

Chapter 12 Overview Devices and Output Visual Output Dynamic Visualizations Sound

16. Recursion 2 Output: 103 Input: (3 + 5) * 20 Output: 160 Input: -(3 + 5) + 20 Output: 12

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel -means Clustering Manuel

Machine learning theory Kernel methods Hamid Beigy Sharif university of technology April 20,

Black Kernel Rot Malady of Pecan B Wood, C Bock, l Wells, T Cottrell, M Hotchkiss Black Kernel

Kernel Properties - Convexity Leila Wehbe October 1st 2013 Leila Wehbe Kernel Properties -

Processes, Protection and the Kernel: Processes, Protection and the Kernel: Mode, Space, and

Linux Kernel Debugging Your kernel just oopsed - What do you do, hotshot? Muli Ben-Yehuda

Introduction to Linux Kernel Modules Luca Abeni luca.abeni@santannapisa.it Linux Kernel Modules

Kernel Machines Support Vector Machines 1 Kernel Machines Optimal Separating HyperPlanes Soft

Multiple Kernel Learning and Feature Space Denoising Fei Yan, Josef Kittler and Krystian

Prediction in kernelized output spaces: output kernel trees and ensemble methods Pierre Geurts

Machine Learning Kernel Methods Hamid R. Rabiee Mohammad H. Rohban Spring 2015

Path Stitc hing: Inte r ne t-Wide Path and De lay E stimation fr om E xisting Me asur e me

Taking Drupal development to the Cloud Karel Bemelmans About me Working with Internet based

Entrapment: Tricking Malware with Transparent, Scalable Malware Analysis Paul Royal

Introduction to IP networking Olof Hagsand KTH CSC 1 Example: Packet transfer www.server.org

Fast lightweight accurate xenograft sorting (Faster xenograft sorting with 3-way bucketed Cuckoo

Stat 8931 (Aster Models) Lecture Slides Deck 6 Charles J. Geyer School of Statistics University

Notes: GIS Applications in Fire Ecology &amp; Management In this lesson we will cover the

From little things big things grow How digital connectivity is helping Australian small

Notes: GIS Applications in Fire Ecology & Management In this lesson we will cover the