SLIDE 1

Prediction in kernelized output spaces: output kernel trees and ensemble methods

Pierre Geurts, Florence d'Alché-Buc

IBISC CNRS, Université d'Evry, GENOPOLE, Evry, France
Department of EE and CS, University of Liège, Belgium

January 25, 2007

SLIDE 2

Motivation

In many domains (e.g. text, computational biology), we want to predict complex or structured outputs: graphs, time series, classes with hierarchical relations, position in a graph, images... The main goal of our research team is to develop machine learning tools to extract structures. We address this issue in several ways, through text and systems biology applications:

- learning the structure of BNs (Bayesian networks) and DBNs (dynamic Bayesian networks): unsupervised approaches
- learning interactions as a classification concept: supervised and semi-supervised approaches
- learning a mapping between structures when input and output are strongly dependent: supervised approaches
- learning a mapping between input feature vectors and structured outputs (this talk)

SLIDE 3

Supervised learning with structured outputs

Example 1: image reconstruction. Example 2: find the position of a gene/protein/enzyme in a biological network from various biological descriptors (function of the protein, localization, expression data). Very few solutions exist for these tasks (one precursor: KDE), and none are explanatory.

We present a set of methods for handling complex outputs that have some explanatory power, and illustrate them on these two problems, with a main focus on biological network completion:

- Output Kernel Tree: an extension of regression trees to kernelized output spaces
- Ensemble methods devoted to regressors in kernelized output spaces

SLIDE 4

Outline

1. Motivation
2. Supervised learning in kernelized output spaces
3. Output Kernel Tree
4. Ensemble methods (parallel ensemble methods; gradient boosting)
5. Experiments (image reconstruction; completion of biological networks; boosting)
6. Conclusion and future works

SLIDE 5

Supervised learning with complex outputs

Suppose we have a sample of objects $\{o_i, i = 1,\dots,N\}$ drawn from a fixed but unknown probability distribution, and suppose we have two representations of the objects:

- an input feature vector representation: $x_i = x(o_i) \in \mathcal{X}$
- an output representation: $y_i = y(o_i) \in \mathcal{Y}$, where $\mathcal{Y}$ is not necessarily a vector space (it can be a finite set with complex relations between its elements)

From a learning sample $\{(x_i, y_i)\,|\,i = 1,\dots,N\}$ with $x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$, find a function $h: \mathcal{X} \to \mathcal{Y}$ that minimizes the expectation of some loss function $\ell: \mathcal{Y}\times\mathcal{Y} \to \mathbb{R}$ over the joint distribution of input/output pairs:
$$E_{x,y}\{\ell(h(x), y)\}$$

Complex outputs: no constraint (for the moment) on the nature of $\mathcal{Y}$.

SLIDE 6

General approach

SLIDE 7

Use the kernel trick for the outputs

Additional information to the training set: a Gram matrix $K = (k_{ij})$, with $k_{ij} = k(y_i, y_j)$ and $k$ a Mercer kernel with corresponding feature map $\varphi$ such that $k(y, y') = \langle\varphi(y), \varphi(y')\rangle$.

Approach:

1. Approximate the feature map $\varphi$ with a function $h_\varphi: \mathcal{X} \to \mathcal{H}$ defined on the input space
2. Get a prediction in the original output space by approximating the function $\varphi^{-1}$ (pre-image problem)

SLIDE 8

Possible applications

- Learning a mapping from an input vector into a structured output (graphs, sequences, trees, time series...)
- Learning with alternative loss functions (e.g., hierarchical classification)
- Learning a kernel as a function of some inputs

SLIDE 9

Learning a kernel as a function of some inputs

In some applications, we want to learn a relationship between objects rather than an output (e.g., the network completion problem).

Learning data set: $\{x_i\,|\,i = 1,\dots,N\}$ and $K = (k_{ij})$, $i, j = 1,\dots,N$.

In this case, we can make kernel predictions from predictions in $\mathcal{H}$ (without needing pre-images):
$$g(x, x') = \langle h_\varphi(x), h_\varphi(x')\rangle$$

SLIDE 10

Outline

1. Motivation
2. Supervised learning in kernelized output spaces
3. Output Kernel Tree
4. Ensemble methods (parallel ensemble methods; gradient boosting)
5. Experiments (image reconstruction; completion of biological networks; boosting)
6. Conclusion and future works

SLIDE 11

Standard regression trees

A learning algorithm that solves the regression problem ($\mathcal{Y} = \mathbb{R}$ and $\ell(y_1, y_2) = (y_1 - y_2)^2$) with a tree-structured model.

Basic idea of the learning procedure:

- Recursively split the learning sample with tests based on the inputs, trying to reduce as much as possible the variance of the output
- Stop when the output is constant in the leaf (or some stopping criterion is met)

SLIDE 12

Focus on regression trees on multiple outputs

$\mathcal{Y} = \mathbb{R}^n$ and $\ell(y_1, y_2) = \|y_1 - y_2\|^2$. The algorithm is the same, but the best split is the one that maximizes the variance reduction:
$$\mathrm{Score}_R(\mathrm{Test}, S) = \mathrm{var}\{y|S\} - \frac{N_l}{N}\mathrm{var}\{y|S_l\} - \frac{N_r}{N}\mathrm{var}\{y|S_r\},$$
where $N$ is the size of $S$, $N_l$ (resp. $N_r$) the size of $S_l$ (resp. $S_r$), and $\mathrm{var}\{y|S\}$ denotes the variance of the output $y$ in the subset $S$:
$$\mathrm{var}\{y|S\} = \frac{1}{N}\sum_{i=1}^N \|y_i - \bar{y}\|^2 \quad\text{with}\quad \bar{y} = \frac{1}{N}\sum_{i=1}^N y_i,$$
which is the average squared distance to the center of mass.
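As an illustration, here is a minimal numpy sketch of this multi-output variance-reduction score; the function and variable names are ours, not from the slides:

```python
import numpy as np

def variance(Y):
    """Mean squared distance of the output vectors to their center of mass."""
    if len(Y) == 0:
        return 0.0
    return np.mean(np.sum((Y - Y.mean(axis=0)) ** 2, axis=1))

def score_r(Y, left_mask):
    """Variance reduction Score_R of a candidate split on a subset S.
    Y: (N, n) outputs in S; left_mask: booleans marking the left child."""
    N = len(Y)
    Yl, Yr = Y[left_mask], Y[~left_mask]
    return variance(Y) - len(Yl) / N * variance(Yl) - len(Yr) / N * variance(Yr)

# Toy check: a split that separates two output clusters scores highly.
Y = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
x = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 1.0])
print(score_r(Y, x < 0.5))
```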

SLIDE 13

Regression trees in output feature space

Let us suppose we have access to an output Gram matrix $k(y_i, y_j)$, with $k$ a kernel defined on $\mathcal{Y}\times\mathcal{Y}$ (with corresponding feature map $\varphi: \mathcal{Y} \to \mathcal{F}$ such that $k(y_i, y_j) = \langle\varphi(y_i), \varphi(y_j)\rangle$). The idea is to grow a multiple output regression tree in the output feature space:

- The variance becomes:
$$\mathrm{var}\{\varphi(y)|S\} = \frac{1}{N}\sum_{i=1}^N \Big\|\varphi(y_i) - \frac{1}{N}\sum_{j=1}^N \varphi(y_j)\Big\|^2$$
- Predictions at leaf nodes become pre-images of the centers of mass:
$$\hat{y}_L = \varphi^{-1}\Big(\frac{1}{N_L}\sum_{i=1}^{N_L}\varphi(y_i)\Big)$$

We need to express everything in terms of kernel values only and return to the original output space $\mathcal{Y}$.

SLIDE 14

Kernelization

The variance may be written:
$$\mathrm{var}\{\varphi(y)|S\} = \frac{1}{N}\sum_{i=1}^N \Big\|\varphi(y_i) - \frac{1}{N}\sum_{j=1}^N\varphi(y_j)\Big\|^2 = \frac{1}{N}\sum_{i=1}^N \langle\varphi(y_i), \varphi(y_i)\rangle - \frac{1}{N^2}\sum_{i,j=1}^N \langle\varphi(y_i), \varphi(y_j)\rangle,$$
which makes use only of dot products between vectors in the output feature space. We can use the kernel trick and replace these dot products by kernel values:
$$\mathrm{var}\{\varphi(y)|S\} = \frac{1}{N}\sum_{i=1}^N k(y_i, y_i) - \frac{1}{N^2}\sum_{i,j=1}^N k(y_i, y_j)$$
From kernel values only, we can thus grow a regression tree that minimizes output feature space variance.
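A small numpy sketch of this kernelized variance, computed from a Gram submatrix only; the sanity check against a linear output kernel is our addition:

```python
import numpy as np

def kernel_variance(K, idx):
    """Feature-space variance of the outputs indexed by `idx`, from the Gram
    matrix alone: (1/N) sum_i k(y_i,y_i) - (1/N^2) sum_{i,j} k(y_i,y_j)."""
    Ks = K[np.ix_(idx, idx)]
    n = len(idx)
    return np.trace(Ks) / n - Ks.sum() / n ** 2

# Sanity check with a linear output kernel, where the feature map is the
# identity and the formula must reduce to the ordinary (multi-output) variance.
Y = np.random.randn(10, 3)
K = Y @ Y.T
ordinary = np.mean(np.sum((Y - Y.mean(axis=0)) ** 2, axis=1))
assert np.isclose(kernel_variance(K, np.arange(10)), ordinary)
```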

SLIDE 15

Prediction in the original output space

Each leaf is associated with a subset of outputs from the learning sample:
$$\hat{y}_L = \varphi^{-1}\Big(\frac{1}{N_L}\sum_{i=1}^{N_L}\varphi(y_i)\Big)$$
Generic proposal for the pre-image problem: find the output in the leaf closest to the center of mass:
$$\hat{y}_L = \arg\min_{y'\in\{y_1,\dots,y_{N_L}\}} \Big\|\varphi(y') - \frac{1}{N_L}\sum_{i=1}^{N_L}\varphi(y_i)\Big\|^2 = \arg\min_{y'\in\{y_1,\dots,y_{N_L}\}} k(y', y') - \frac{2}{N_L}\sum_{i=1}^{N_L} k(y_i, y')$$
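A short sketch of this pre-image rule, assuming we are given the Gram matrix restricted to the outputs stored in one leaf (the names are ours):

```python
import numpy as np

def leaf_preimage_index(K_leaf):
    """Pick, among the outputs stored in a leaf, the one closest in feature
    space to their center of mass:
    argmin_{y'} k(y', y') - (2 / N_L) * sum_i k(y_i, y')."""
    n = K_leaf.shape[0]
    scores = np.diag(K_leaf) - (2.0 / n) * K_leaf.sum(axis=0)
    return int(np.argmin(scores))
```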

SLIDE 16

Outline

1. Motivation
2. Supervised learning in kernelized output spaces
3. Output Kernel Tree
4. Ensemble methods (parallel ensemble methods; gradient boosting)
5. Experiments (image reconstruction; completion of biological networks; boosting)
6. Conclusion and future works

SLIDE 17

Ensemble methods

Parallel ensemble methods based on randomization:

- Grow several models in parallel and average their predictions
- Greatly improve the accuracy of single regressors by reducing their variance
- Usually, they can be applied directly (e.g., bagging, random forests, extra-trees)

Boosting algorithms:

- Grow the models in sequence by focusing on "difficult" examples
- Need to be extended to regressors with kernelized outputs
- We propose a kernelization of gradient boosting approaches (Friedman, 2001).

SLIDE 18

Parallel ensemble methods

To make a prediction, we need to compute:
$$\hat{y}_T(x) = \varphi^{-1}\Big(\frac{1}{M}\sum_{m=1}^M h_\varphi(x; a_m)\Big).$$
With output kernel trees as base regressors, ensemble predictions in the output feature space may be written:
$$\frac{1}{M}\sum_{m=1}^M h_\varphi(x; a_m) = \sum_{i=1}^{N_{LS}} k_T(x_i, x)\,\varphi(y_i), \quad\text{with}\quad k_T(x, x') = \frac{1}{M}\sum_{m=1}^M k_{t_m}(x, x'),$$
where $k_{t_m}(x, x') = N_L^{-1}$ if $x$ and $x'$ reach the same leaf $L$ (of size $N_L$) in the $m$th tree $t_m$, and 0 otherwise. Predictions can then be computed by:
$$\hat{y}_T(x) = \arg\min_{y'\in LS\,|\,k_T(x', x)\neq 0}\ k(y', y') - 2\sum_{i=1}^{N_{LS}} k_T(x_i, x)\,k(y_i, y').$$
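A hypothetical numpy sketch of $k_T$ computed from leaf memberships, assuming each tree exposes the leaf index reached by every example (this data layout is our choice, not the authors'):

```python
import numpy as np

def ensemble_kernel(train_leaves, test_leaves):
    """k_T between test and training points from leaf memberships.
    train_leaves: (M, N), test_leaves: (M, Q); entry [m, i] is the leaf
    reached by example i in tree m.  Per tree, the kernel is 1/N_L for
    points sharing a leaf of size N_L, 0 otherwise; k_T averages over trees."""
    M, N = train_leaves.shape
    Q = test_leaves.shape[1]
    KT = np.zeros((Q, N))
    for m in range(M):
        leaves, sizes = np.unique(train_leaves[m], return_counts=True)
        size_of = dict(zip(leaves, sizes))
        for q in range(Q):
            leaf = test_leaves[m, q]
            if leaf in size_of:
                KT[q, train_leaves[m] == leaf] += 1.0 / size_of[leaf]
    return KT / M
```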

SLIDE 19

Generic gradient boosting (1/3)

General supervised learning problem: from a learning sample $\{(x_i, y_i)\,|\,i = 1,\dots,N\}$ with $x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$, find a function $F: \mathcal{X} \to \mathcal{Y}$ that minimizes the expectation of some loss function $\ell$ over the joint distribution of input/output pairs: $E_{x,y}\{\ell(F(x), y)\}$.

Boosting tries to find an approximation $F(x)$ of the form:
$$F(x) = F_0(x) + \sum_{m=1}^M \beta_m h(x; a_m),$$
where $h(x; a)$ is a simple parametrized function of the input variables $x$, characterized by a vector of parameters $a$.

SLIDE 20

Generic gradient boosting (2/3)

"Greedy-stagewise" approach: from some starting function $F_0(x)$, for $m = 1, 2, \dots, M$:
$$(\beta_m, a_m) = \arg\min_{\beta, a}\sum_{i=1}^N \ell(y_i, F_{m-1}(x_i) + \beta h(x_i; a))$$
$$F_m(x) = F_{m-1}(x) + \beta_m h(x; a_m)$$
The minimization over $\beta, a$ may be difficult to compute ⇒ find the function that is closest to the steepest-descent direction in the $N$-dimensional data space at $F_{m-1}(x)$:
$$-g_m(x_i) = -\Big[\frac{\partial\ell(y_i, F(x_i))}{\partial F(x_i)}\Big]_{F(x) = F_{m-1}(x)}$$
To generalize, find the function $h(x; a_m)$ that produces $\{h(x_i; a_m)\}_{i=1}^N$ most parallel to $-g_m$, e.g. obtained from:
$$a_m = \arg\min_{\beta, a}\sum_{i=1}^N (-g_m(x_i) - \beta h(x_i; a))^2.$$

SLIDE 21

Generic gradient boosting (3/3)

Gradient Boost

1. $F_0(x) = \arg\min_\rho \sum_{i=1}^N \ell(y_i, \rho)$
2. For $m = 1$ to $M$ do:
   1. $y_i^m = -\Big[\frac{\partial\ell(y_i, F(x_i))}{\partial F(x_i)}\Big]_{F(x) = F_{m-1}(x)},\ i = 1,\dots,N$
   2. $a_m = \arg\min_{a,\beta}\sum_{i=1}^N (y_i^m - \beta h(x_i; a))^2$
   3. $\rho_m = \arg\min_\rho \sum_{i=1}^N \ell(y_i, F_{m-1}(x_i) + \rho h(x_i; a_m))$
   4. $F_m(x) = F_{m-1}(x) + \rho_m h(x; a_m)$

This replaces a minimization over any (differentiable) loss $\ell$ by a least-squares minimization (step 2.2) and a single-parameter optimization based on $\ell$ (step 2.3). It can take benefit of any $h(x; a)$ for which a feasible least-squares algorithm exists.
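A rough Python sketch of this generic loop, shown with the absolute loss so that steps 2.1 (pseudo-residuals) and 2.3 (line search) are non-trivial; the use of scikit-learn trees as $h(x; a)$ and of scipy's scalar minimizer for a rough numerical line search are our assumptions, not part of the original method:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, M=50, J=4):
    """Generic Gradient Boost sketch for the absolute loss l(y, F) = |y - F|,
    whose negative gradient is sign(y - F)."""
    loss = lambda F: np.abs(y - F).sum()
    # Step 1: constant start F_0 = argmin_rho sum_i l(y_i, rho) (the median).
    F0 = minimize_scalar(lambda r: loss(r)).x
    F = np.full(len(y), F0)
    models = []
    for m in range(M):
        g = np.sign(y - F)                                    # step 2.1
        t = DecisionTreeRegressor(max_leaf_nodes=J).fit(X, g) # step 2.2
        h = t.predict(X)
        rho = minimize_scalar(lambda r: loss(F + r * h)).x    # step 2.3
        F = F + rho * h                                       # step 2.4
        models.append((rho, t))
    return F0, models
```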

SLIDE 22

Gradient boosting with square loss

If $\ell(y_1, y_2) = (y_1 - y_2)^2/2$, the algorithm becomes:

LS Boost

1. $F_0(x) = \frac{1}{N}\sum_{i=1}^N y_i$
2. For $m = 1$ to $M$ do:
   1. $y_i^m = y_i - F_{m-1}(x_i),\ i = 1,\dots,N$
   2. $a_m = \arg\min_a \sum_{i=1}^N (y_i^m - h(x_i; a))^2$
   3. $F_m(x) = F_{m-1}(x) + \mu h(x; a_m)$

e.g., the $h(x; a)$ are small regression trees (Friedman's Multiple Additive Regression Trees, MART). In practice, it is very useful to regularize ($\mu \ll 1$).
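A compact sketch of LS Boost with shrinkage, again assuming scikit-learn regression trees as the base learner $h(x; a)$ (an assumption on our part):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def ls_boost(X, y, M=200, J=4, mu=0.05):
    """LS Boost / MART sketch: fit each small tree to the current residuals,
    then add its shrunk prediction to the model."""
    F0 = y.mean()                                    # step 1
    F = np.full(len(y), F0)
    trees = []
    for m in range(M):
        t = DecisionTreeRegressor(max_leaf_nodes=J)  # h(x; a): small tree
        t.fit(X, y - F)                              # steps 2.1-2.2
        F += mu * t.predict(X)                       # step 2.3, shrunk by mu
        trees.append(t)
    predict = lambda Xnew: F0 + mu * sum(t.predict(Xnew) for t in trees)
    return predict
```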

SLIDE 24

Kernelizing the output (1/2)

LS Boost in a kernelized output space

1. $F_0^\varphi(x) = \frac{1}{N}\sum_{i=1}^N \varphi(y_i)$
2. For $m = 1$ to $M$ do:
   1. $\varphi_i^m = \varphi(y_i) - F_{m-1}^\varphi(x_i),\ i = 1,\dots,N$
   2. $a_m = \arg\min_a \sum_{i=1}^N \|\varphi_i^m - h_\varphi(x_i; a)\|^2$
   3. $F_m^\varphi(x) = F_{m-1}^\varphi(x) + h_\varphi(x; a_m)$

Replace $y$ by a vector $\varphi(y)$ from some feature space $\mathcal{H}$ (in which we only assume it is possible to compute dot products). $F^\varphi$ and $h_\varphi$ are now functions from $\mathcal{X}$ to $\mathcal{H}$.

SLIDE 25

Kernelizing the output (2/2)

To be a feasible solution, we need to be able to compute, from kernel values only:

- the output Gram matrix $K^m$ at step $m$, i.e. $K^m_{i,j} = \langle\varphi_i^m, \varphi_j^m\rangle$ (to compute step 2.2)
- $\langle F_M^\varphi(x), \varphi(y)\rangle$ for all $x, y$ (to compute predictions and pre-images)

This is possible when $h_\varphi(x; a_m)$ at step $m$ may be written:
$$h_\varphi(x; a_m) = \sum_{i=1}^N w_i(x; a_m)\,\varphi_i^m$$

SLIDE 26

Kernelizing the output: learning stage (1/2)

$$K^m_{i,j} \triangleq \langle\varphi_i^m, \varphi_j^m\rangle = \langle\varphi(y_i) - F_{m-1}^\varphi(x_i),\ \varphi(y_j) - F_{m-1}^\varphi(x_j)\rangle$$
$$= \langle\varphi(y_i) - F_{m-2}^\varphi(x_i) - h_\varphi(x_i; a_{m-1}),\ \varphi(y_j) - F_{m-2}^\varphi(x_j) - h_\varphi(x_j; a_{m-1})\rangle$$
$$= \langle\varphi_i^{m-1}, \varphi_j^{m-1}\rangle - \langle\varphi_i^{m-1}, h_\varphi(x_j; a_{m-1})\rangle - \langle h_\varphi(x_i; a_{m-1}), \varphi_j^{m-1}\rangle + \langle h_\varphi(x_i; a_{m-1}), h_\varphi(x_j; a_{m-1})\rangle.$$

Using $h_\varphi(x; a_{m-1}) = \sum_{i=1}^N w_i(x; a_{m-1})\,\varphi_i^{m-1}$ and $K^{m-1}_{i,j} \triangleq \langle\varphi_i^{m-1}, \varphi_j^{m-1}\rangle$:
$$K^m_{i,j} = K^{m-1}_{i,j} - \sum_{l=1}^N w_l(x_j; a_{m-1})\,K^{m-1}_{i,l} - \sum_{l=1}^N w_l(x_i; a_{m-1})\,K^{m-1}_{l,j} + \sum_{k,l=1}^N w_k(x_i; a_{m-1})\,w_l(x_j; a_{m-1})\,K^{m-1}_{k,l}.$$

SLIDE 27

Kernelizing the output: learning stage (2/2)

Output kernel based boosting: learning

Input: a learning sample $\{(x_i, y_i)\}_{i=1}^N$ and an output Gram matrix $K$ (with $K_{i,j} = k(y_i, y_j)$).
Output: an ensemble of weight functions $\{(w_i(x; a_m))_{i=1}^N\}_{m=0}^M$.

1. $w_i(x; a_0) \equiv 1/N$; $W^0_{i,j} = 1/N$, $\forall i, j = 1,\dots,N$; $K^0 = K$.
2. For $m = 1$ to $M$ do:
   1. $K^m = (I - W^{m-1})'\,K^{m-1}\,(I - W^{m-1})$.
   2. Apply the base learner to the output Gram matrix $K^m$ to get a model $(w_i(x; a_m))_{i=1}^N$.
   3. Compute $W^m_{i,j} = w_i(x_j; a_m)$, $i, j = 1,\dots,N$, from the resulting model.

($K^1$ is the original output Gram matrix, centered.)
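A minimal numpy sketch of this learning loop; `base_learner` is a hypothetical callable standing in for OK3, fitted on the residual Gram matrix:

```python
import numpy as np

def okboost_learn(K, base_learner, M):
    """Learning stage of output-kernel boosting (sketch).
    K: (N, N) output Gram matrix.  base_learner(Km) is assumed to fit on the
    residual Gram matrix Km and return (model, W) with W[i, j] = w_i(x_j; a_m)."""
    N = K.shape[0]
    W = np.full((N, N), 1.0 / N)      # W^0: uniform weights (step 1)
    Km = K.copy()                     # K^0 = K
    models = []
    for m in range(M):
        R = np.eye(N) - W
        Km = R.T @ Km @ R             # step 2.1: K^m = (I-W)' K^{m-1} (I-W)
        model, W = base_learner(Km)   # steps 2.2 and 2.3
        models.append(model)
    return models
```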

SLIDE 28

Kernelizing the output: prediction stage (1/3)

In the output feature space, predictions are of the form:
$$F_M^\varphi(x) = \sum_{i=1}^N w_i^F(x)\,\varphi(y_i).$$
Output and kernel predictions are then obtained from:
$$F(x) = \arg\min_{y'\in\mathcal{Y}} \Big\|\varphi(y') - \sum_{i=1}^N w_i^F(x)\,\varphi(y_i)\Big\|^2 = \arg\min_{y'\in\mathcal{Y}} k(y', y') - 2\sum_{i=1}^N w_i^F(x)\,k(y_i, y').$$
$$\hat{k}(x_1, x_2) = \sum_{i=1}^N\sum_{j=1}^N w_i^F(x_1)\,w_j^F(x_2)\,K_{i,j}.$$
A recursive algorithm may be devised to compute the weight vector $w^F(x) = (w_1^F(x), \dots, w_N^F(x))$.

SLIDE 29

Kernelizing the output: prediction stage (2/3)

For the $m$th model, we have:
$$h_\varphi(x; a_m) = \sum_{i=1}^N w_i(x; a_m)\,\varphi_i^m,$$
where each output feature vector $\varphi_i^m$ may be further written:
$$\varphi_i^m = \sum_{j=1}^N O^m_{i,j}\,\varphi(y_j)$$
The following recursion computes the $N\times N$ matrices $O^m$:
$$O^0 = I, \qquad O^m = O^{m-1} - W^{m-1\prime}\,O^{m-1}, \quad \forall m = 1,\dots,M$$
Denoting by $p^m(x)$ the row vector $(w_1(x; a_m), \dots, w_N(x; a_m))$, we thus have:
$$w^F(x) = \sum_{m=0}^M p^m(x)\,O^m$$

SLIDE 30

Kernelizing the output: prediction stage (3/3)

Output kernel based boosting: predictions

Input: a test sample of $Q$ input vectors, $\{x'_1, \dots, x'_Q\}$.
Output: a prediction $F_M^{\mathcal{Y}}(x'_i) \in \mathcal{Y}$ for each input $x'_i$, $i = 1,\dots,Q$, and an output kernel matrix prediction $\hat{K}$ with $\hat{K}_{i,j} = \langle F_M^\varphi(x'_i), F_M^\varphi(x'_j)\rangle$, $i, j = 1,\dots,Q$.

1. $O^0 = I$; $W^0_{i,j} = 1/N$, $\forall i, j = 1,\dots,N$; $W^F_{i,j} = 1/N$, $\forall i = 1,\dots,Q$, $j = 1,\dots,N$.
2. For $m = 1$ to $M$ do:
   1. $O^m = O^{m-1} - W^{m-1\prime}\,O^{m-1}$.
   2. Compute the $Q\times N$ matrix $P^m$ with $P^m_{i,j} = w_j(x'_i; a_m)$, $\forall i = 1,\dots,Q$, $\forall j = 1,\dots,N$.
   3. Set $W^F$ to $W^F + P^m O^m$.
   4. Compute $W^m_{i,j} = w_i(x_j; a_m)$, $\forall i, j = 1,\dots,N$, from the $m$th model.
3. To compute predictions in the output space:
   1. Compute $S = 1_{Q\times 1}\,\mathrm{diag}(K)' - 2\,W^F K$.
   2. $F_M^{\mathcal{Y}}(x'_i) = y_k$ with $k = \arg\min_{j=1,\dots,N} S_{i,j}$, $\forall i = 1,\dots,Q$.
4. To compute kernel predictions: $\hat{K} = W^F K W^{F\prime}$.
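A numpy sketch of this prediction stage, assuming the per-model weight matrices $W^m$ and test weight matrices $P^m$ have already been extracted from the fitted models (that data layout is our assumption):

```python
import numpy as np

def okboost_predict(K, Y, W_list, P_list):
    """Prediction stage of output-kernel boosting (sketch).
    K: (N, N) training output Gram matrix; Y: list of N training outputs;
    W_list[m-1]: (N, N) with entries w_i(x_j; a_m);
    P_list[m-1]: (Q, N) with entries w_j(x'_i; a_m)."""
    N = K.shape[0]
    Q = P_list[0].shape[0]
    O = np.eye(N)                          # O^0 = I
    W_prev = np.full((N, N), 1.0 / N)      # W^0
    WF = np.full((Q, N), 1.0 / N)          # contribution of F_0
    for Wm, Pm in zip(W_list, P_list):
        O = O - W_prev.T @ O               # O^m = O^{m-1} - W^{m-1}' O^{m-1}
        WF = WF + Pm @ O                   # accumulate p^m(x') O^m
        W_prev = Wm
    S = np.outer(np.ones(Q), np.diag(K)) - 2.0 * WF @ K   # pre-image scores
    preds = [Y[j] for j in S.argmin(axis=1)]              # F_M^Y(x'_i)
    K_hat = WF @ K @ WF.T                                 # kernel predictions
    return preds, K_hat
```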

SLIDE 33

With OK3 as base learner

The prediction of the $m$th tree at some point $x$ is given by:
$$h_\varphi(x; a_m) = \sum_{i=1}^N w_i(x; a_m)\,\varphi_i^m,$$
with $w_i(x; a_m) = 1/N_L$ if $x$ and $x_i$ reach the same leaf of size $N_L$, and 0 otherwise.

- The matrices $W^m$ are symmetric and positive definite.
- $\phi(x)$ is an $N$-dimensional vector whose $i$th component is equal to $1/\sqrt{N_L}$ when $x$ reaches the leaf of size $N_L$ that contains $x_i$, and 0 otherwise.
- To constrain the tree complexity, we fix the number of splits to a small number $J$ (using a best-first strategy to grow the tree).
- When $\mathcal{Y} = \mathbb{R}$ and $k(y_1, y_2) = y_1 y_2$, we get MART.

SLIDE 34

Direct features from tree-based methods

Interpretability:

- Single tree: a single tree provides a rule-based model that is directly interpretable
- Ensemble of trees: a ranking of the features according to their relevance can be obtained by summing the total variance reduction over all nodes where the feature appears, and normalizing over all variables

Computational efficiency:

- Learning stage: node splitting goes from $O(N)$ for standard trees to $O(N^2)$ for OK3. Matrix updates for gradient boosting are also quadratic in $N$ with output kernel trees.
- Prediction stage: pre-image computation is $O(N_L^2)$, where $N_L$ is the size of the leaf for a single tree, and of the support of $k_T(x, \cdot)$ for an ensemble.

SLIDE 35

Outline

1. Motivation
2. Supervised learning in kernelized output spaces
3. Output Kernel Tree
4. Ensemble methods (parallel ensemble methods; gradient boosting)
5. Experiments (image reconstruction; completion of biological networks; boosting)
6. Conclusion and future works

SLIDE 36

First application: image reconstruction

(Weston et al., NIPS 2002)

[Figure: training pairs of (top half, bottom half) digit images]

- Predict the bottom half of an image representing a handwritten digit from its top half
- Subset of the USPS dataset: 1000 images, 16×16 pixels
- Input variables: 8×16 (= 128) continuous variables
- Output kernel: radial basis function (RBF) kernel: $k(y, y') = \exp(-\|y - y'\|^2 / 2\sigma^2)$

Protocol:

- Estimation of the average RBF loss by 5-fold CV
- Comparison with k-NN and Kernel Dependency Estimation (KDE, a full kernel-based method for structured output prediction that fits a ridge regression model on each direction found by kernel PCA)
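A small sketch of the RBF output kernel; reading the "average RBF loss" as the induced feature-space distance $\|\varphi(y) - \varphi(y')\|^2 = 2 - 2k(y, y')$ is our interpretation, not stated on the slide:

```python
import numpy as np

def rbf_gram(Y1, Y2, sigma):
    """k(y, y') = exp(-||y - y'||^2 / (2 sigma^2)) between two output sets."""
    d2 = ((Y1[:, None, :] - Y2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rbf_loss(y_pred, y_true, sigma):
    """Feature-space squared distance induced by the RBF kernel:
    ||phi(y) - phi(y')||^2 = 2 - 2 k(y, y'), since k(y, y) = 1."""
    k = np.exp(-np.sum((y_pred - y_true) ** 2) / (2.0 * sigma ** 2))
    return 2.0 - 2.0 * k
```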

SLIDE 37

Illustration: accuracy results

Method              RBF error
Baseline            1.0853
Best achievable     0.3584
k-NN                0.7501
KDE linear          0.7990
KDE RBF             0.6778
OK3+Single trees    0.9013
OK3+Bagging         0.7337
OK3+Extra-trees     0.6949

[Figures: examples of predictions (KDE RBF vs. OK3+ET) and feature ranking]

SLIDE 38

Supervised graph inference

(Yamanishi et al., 2004)

[Figure: a known network (learning sample, known edges) extended with predicted edges to new vertices (test sample)]

From a known network where each vertex is described by some input feature vector x, predict the edges involving new vertices described by their input feature vector

SLIDE 39

A general solution based on a kernelized output space

1. Define a kernel $k$ on pairs of vertices such that $k(v, v')$ encodes the proximity of the vertices in the graph.
2. Use a machine learning method that can handle a kernelized output space to get an approximation $g(x(v), x(v'))$ of the kernel value between $v$ and $v'$, described by their input feature vectors $x(v)$ and $x(v')$.
3. Connect two vertices if $g(x(v), x(v')) > k_{th}$ (by varying $k_{th}$ we get different tradeoffs between true positive and false positive rates).

SLIDE 40

A kernel on graph nodes

Diffusion kernel (Kondor and Lafferty, 2002): the Gram matrix $K$ with $K_{i,j} = k(v_i, v_j)$ is given by:
$$K = \exp(-\beta L)$$
where the graph Laplacian $L$ is defined by:
$$L_{i,j} = \begin{cases} d_i, \text{ the degree of node } v_i, & \text{if } i = j;\\ -1, & \text{if } v_i \text{ and } v_j \text{ are connected};\\ 0, & \text{otherwise.}\end{cases}$$

As the diffusion coefficient β increases, kernel values diffuse more completely through the graph
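A short sketch of this diffusion kernel using scipy's matrix exponential; the toy graph is ours:

```python
import numpy as np
from scipy.linalg import expm

def diffusion_kernel(adj, beta):
    """K = exp(-beta * L), with L = D - A the graph Laplacian."""
    L = np.diag(adj.sum(axis=1)) - adj
    return expm(-beta * L)

# Toy 3-node path graph: larger beta spreads similarity further along paths.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
print(np.round(diffusion_kernel(A, beta=1.0), 3))
```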

SLIDE 41

Biological networks

Application to two networks in yeast:

- Protein-protein interaction network: 984 proteins, 2478 edges (Kato et al., 2005)
- Enzyme network: 668 enzymes, 2782 edges (Yamanishi et al., 2005)

Input features:

- Expression data: expression of the gene in 325 experiments
- Phylogenetic profiles: presence or absence of an ortholog in 145 species
- Localization data: presence or absence of the protein in 23 intracellular locations
- Yeast two-hybrid data: data from a high-throughput experiment to detect protein-protein interactions

SLIDE 42

Comparison of different tree based methods

[Figure: ROC curves on the enzyme network. Extra-trees (ET): AUC = 0.845 ± 0.022; Bagging: AUC = 0.779 ± 0.030; Single tree (ST): AUC = 0.648 ± 0.046]

Diffusion kernel as a graph kernel, 10-fold cross-validation, ensembles of 100 output kernel trees.

SLIDE 43

Comparison of different sets of features

[Figure: ROC curves on the protein-protein interaction network. All variables: AUC = 0.910; Expression: AUC = 0.851; Y2H: AUC = 0.790; Localization: AUC = 0.725; Phylogenetic profiles: AUC = 0.693]

[Figure: ROC curves on the enzyme network. All variables: AUC = 0.844; Phylogenetic profiles: AUC = 0.815; Expression: AUC = 0.714; Y2H: AUC = 0.639; Localization: AUC = 0.587]

Diffusion kernel as a graph kernel, 10-fold cross-validation, ensembles of 100 output kernel trees, extra-trees randomization method.

SLIDE 44

Comparison with full kernel based methods

Protein network:

Inputs   OK3+ET   [1]
expr     0.851    0.776
phy      0.693    0.767
loc      0.725    0.788
y2h      0.790    0.612
All      0.910    0.939

Enzyme network:

Inputs   OK3+ET   [2]
expr     0.714    0.706
phy      0.815    0.747
loc      0.587    0.577
All      0.847    0.804

[1] Kato et al., ISMB 2005: an EM-based algorithm for kernel matrix completion.
[2] Yamanishi et al., ISMB 2005: compares a kernel canonical correlation analysis based solution and a metric learning approach.

SLIDE 45

Robustness

Evolution of the AUC when x% of the edges are randomly deleted from the learning sample (OK3+ET, 100 trees):

Deleted edges     0%     20%    50%    80%
Protein network   0.910  0.906  0.896  0.883
Enzyme network    0.844  0.800  0.812  0.753

SLIDE 46

Interpretability: rules and clusters (an example with a protein-protein network)

SLIDE 47

Interpretability: feature ranking

Protein-protein interactions:

#   Attribute                                 Imp
1   loc - nucleolus                           0.021
2   expr (Spell.) - elu 120                   0.013
3   loc - cytoplasm                           0.012
4   expr (Eisen) - sporulation ndt80 early    0.012
5   loc - nucleus                             0.012
6   expr (Eisen) - sporulation 30m            0.011
7   expr (Eisen) - sporulation ndt80 middle   0.010
8   expr (Spell.) - alpha 14                  0.010
9   expr (Spell.) - elu 150                   0.010
10  loc - mitochondrion                       0.009

Enzyme network:

#   Attribute                           Imp
1   phy - dre                           0.011
2   phy - rno                           0.009
3   expr (Eisen) - cdc15 120m           0.008
4   phy - ecu                           0.008
5   expr (Eisen) - cdc15 160m           0.008
6   phy - pfa                           0.007
7   phy - mmu                           0.007
8   loc - cytoplasm                     0.006
9   expr (Eisen) - cdc15 30m            0.005
10  expr (Eisen) - elutriation 5.5hrs   0.005

SLIDE 48

Experiments with boosting

On the image completion task (#LS = 200, #TS = 800):

[Figure: learning curves for ν = 0.5, J = 10 and ν = 0.05, J = 10, plotting Errφ(TS), ErrY(TS), Errφ(LS), ErrY(LS) against the number of boosting iterations]

On the network completion problem (#LS = 334, #TS = 334):

[Figure: learning curves for ν = 0.25, J = 10 and ν = 0.05, J = 10, plotting Errφ(TS), Errφ(LS), AUC(TS), AUC(LS) against the number of boosting iterations]

(Errφ is the error of $F_M^\varphi(x)$, i.e. in $\mathcal{H}$; ErrY is the error of the pre-image.)

SLIDE 49

Experiments with boosting

Image task:

Method               ErrY(TS)
OK3 (single trees)   1.0399
OK3+Bagging          0.8643
OK3+ET               0.8169
OK3+OKBoost          0.8318
OK3+OKBoost+ET       0.8071

Network task:

Method               AUC(TS)
OK3 (single trees)   0.6001
OK3+Bagging          0.7100
OK3+ET               0.7884
OK3+OKBoost          0.7033
OK3+OKBoost+ET       0.7811

(µ = 0.01, M = 500, tree size J determined by 5-fold CV)

SLIDE 50

Conclusion and future works

A new method for prediction in kernelized output spaces:

- When used in a single tree, it can provide interpretable results in the form of a rule-based model.
- When used in an ensemble of trees, it provides competitive accuracy and can rank the input features according to their relevance.

Future works:

- Other frameworks: transduction (straightforward), semi-supervised learning...
- Other applications: hierarchical classification, sequence prediction...
- Analysis of the role played by the output kernel in the cost function (regularization, output kernel learning)
- Improving gradient boosting by explicit regularization and the use of other base learners
- Study of the links between this approach and Taskar's approaches (features on inputs/outputs)

SLIDE 51

Another base learner for boosting (1/2)

A simpler base learner:

1. Find a direction $v = \sum_{i=1}^N v_i\,\varphi_i^m$, with $\|v\| = 1$, in the output feature space
2. Project the data on this direction to obtain a training set $\{(x_i, \langle\varphi_i^m, v\rangle)\,|\,i = 1,\dots,N\}$, and use any regression method to find an approximation $f_m(x_i)$ of $\langle\varphi_i^m, v\rangle$
3. Output the function:
$$h_\varphi(x; a_m) = f_m(x)\,v = \sum_{i=1}^N f_m(x)\,v_i\,\varphi_i^m$$

(Here $\varphi_i^m$ denotes the $m$th-step residual output feature vectors defined earlier.)

SLIDE 52

Another base learner for boosting (2/2)

How to select the direction $v$?

- Choose the direction of maximum variance (kernel PCA)
- Choose a direction at random
- For computational efficiency reasons, choose a direction among the outputs in the learning sample.

Regression method:

- Any regression method can be plugged in
- With regression trees, we get MART when $\mathcal{Y} = \mathbb{R}$ and $k(y_1, y_2) = y_1 y_2$
