slide-1
SLIDE 1

MACHINE LEARNING - Doctoral Class - EDOC

MACHINE LEARNING GPLVM, LWPR

Slides for GPLVM are adapted from Neil Lawrence's lectures; slides for LWPR are adapted from Stefan Schaal's lectures

slide-2
SLIDE 2

MACHINE LEARNING - Doctoral Class - EDOC

Why reduce data dimensionality?

Raw data: trying to find some structure in the data…

slide-3
SLIDE 3

MACHINE LEARNING - Doctoral Class - EDOC

Why reduce data dimensionality?

Reduce the dimensionality of the dataset at hand so that subsequent computation is more tractable.

Idea: only a few of the dimensions matter; the projections of the data along the remaining dimensions convey no structure of the data (already a form of generalization).

slide-4
SLIDE 4

MACHINE LEARNING - Doctoral Class - EDOC

Revision of PCA and kPCA

We have already seen PCA and kernel PCA in the previous lectures.

PCA
  • finds a linear projected space
  • maximizes the variance in the projected space / minimizes the squared reconstruction error

kPCA
  • finds a non-linear projected space
  • maximizes the variance in the projected space / minimizes the squared reconstruction error

BUT… PCA was extended with:

slide-5
SLIDE 5

MACHINE LEARNING - Doctoral Class - EDOC

Two-sided projection: what if we need to manipulate data in the projected space and then reconstruct it back to the original space?

Why Probabilistic Dimensionality Reduction?

kPCA does not define the projection sending points back from feature space to the original data space.

slide-6
SLIDE 6

MACHINE LEARNING - Doctoral Class - EDOC

Why Probabilistic Dimensionality Reduction?

Non-probabilistic treatment -> no systematic way to process missing data. One usually either discards the incomplete data (time consuming with huge amounts of data) or completes the missing values using a variety of interpolation methods. Probabilistic PCA allows handling missing data through an E-M approach to computing the principal projections. Missing data: real-world data are often incomplete (e.g. the state vector of a robot consists of several measurements delivered by the sensors; due to network problems, some sensor readings are lost).

slide-7
SLIDE 7

MACHINE LEARNING - Doctoral Class - EDOC

Notation

q : dimension of the latent / embedded space
N : dimension of the data space
M : number of data points

Y = [y_1, …, y_M]^T = [y_{:,1}, …, y_{:,N}] ∈ R^{M×N} : centered original data
X = [x_1, …, x_M]^T = [x_{:,1}, …, x_{:,q}] ∈ R^{M×q} : projected (latent) data
W ∈ R^{N×q} : mapping matrix
slide-8
SLIDE 8

MACHINE LEARNING - Doctoral Class - EDOC

Linear Dimensionality Reduction: Recap

Represent the data Y with a lower dimensional set of latent variables X. Assume a linear relationship of the form

y_{i,:} = W x_{i,:} + η_{i,:},   where η_{i,:} ~ N(0, σ² I)

slide-9
SLIDE 9

MACHINE LEARNING - Doctoral Class - EDOC

Linear Dimensionality Reduction: Recap

Represent the data Y with a lower dimensional set of latent variables X. Assume a linear relationship of the form

y_{i,:} = W x_{i,:} + η_{i,:},   where η_{i,:} ~ N(0, σ² I)

W is the mapping matrix.

slide-10
SLIDE 10

MACHINE LEARNING - Doctoral Class - EDOC

Linear Dimensionality Reduction: Recap

Represent the data Y with a lower dimensional set of latent variables X. Assume a linear relationship of the form

y_{i,:} = W x_{i,:} + η_{i,:},   where η_{i,:} ~ N(0, σ² I)

η_{i,:} is the Gaussian noise term.

slide-11
SLIDE 11

MACHINE LEARNING - Doctoral Class - EDOC

Probabilistic Regression: Recap

A statistical approach to classical linear regression estimates the relationship between zero-mean variables y and x by building a linear model of the form:

y = f(x, w) = x^T w,   w, x ∈ R^N

If one assumes that the observed values of y differ from f(x) by an additive noise ε that follows a zero-mean Gaussian distribution (such an assumption amounts to putting a prior distribution over the noise), then:

y = f(x, w) + ε = x^T w + ε,   with ε ~ N(0, σ²)

slide-12
SLIDE 12

MACHINE LEARNING - Doctoral Class - EDOC

Probabilistic Regression: Recap

Consider a training set of M pairs of data points {x_i, y_i}_{i=1}^M, with each pair independently and identically distributed (i.i.d.) according to a Gaussian distribution. The likelihood of the regressive model y = x^T w + ε, ε ~ N(0, σ²), is given by computing the probability density of each training pair given the parameters of the model (w, σ):

p(ŷ | x, w, σ) = ∏_{i=1}^{M} p(y_i | x_i, w, σ) = ∏_{i=1}^{M} (1 / √(2πσ²)) exp( −(y_i − x_i^T w)² / (2σ²) )
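To make this likelihood concrete, here is a minimal numpy sketch that evaluates the Gaussian log-likelihood of a linear model on toy data; the function name, the toy data and the noise level are illustrative assumptions, not part of the original lecture.

# Minimal sketch: log-likelihood of a linear model under i.i.d. Gaussian noise.
# Assumes y_i = w^T x_i + eps, eps ~ N(0, sigma^2); names are illustrative.
import numpy as np

def linear_log_likelihood(X, y, w, sigma):
    """X: (M, N) inputs, y: (M,) targets, w: (N,) weights, sigma: noise std."""
    residuals = y - X @ w                      # y_i - x_i^T w for every pair
    log_probs = -0.5 * np.log(2 * np.pi * sigma**2) - residuals**2 / (2 * sigma**2)
    return log_probs.sum()                     # sum of log N(y_i | x_i^T w, sigma^2)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true + 0.1 * rng.standard_normal(100)
print(linear_log_likelihood(X, y, w_true, 0.1))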

slide-13
SLIDE 13

MACHINE LEARNING - Doctoral Class - EDOC

Probabilistic Regression: Recap

In the Bayesian formalism, one would also specify a prior over the parameter w. A typical choice is a zero-mean Gaussian prior with fixed covariance:

p(w) = N(w | 0, Σ_w)

One can then compute the posterior distribution over the parameter w using Bayes' theorem:

p(w | y, x) = p(y | x, w) p(w) / p(y | x)       (posterior = likelihood × prior / marginal likelihood)

The value of w that maximizes the posterior distribution is called the maximum a posteriori (MAP) estimate of w.

slide-14
SLIDE 14

MACHINE LEARNING - Doctoral Class - EDOC

Probabilistic PCA: Recap

Assumptions: all datapoints in Y are i.i.d., and the relationship between the data and the latent variables is Gaussian:

p(Y | X, W) = ∏_{i=1}^{M} N(y_{i,:} | W x_{i,:}, σ² I)

slide-15
SLIDE 15

MACHINE LEARNING - Doctoral Class - EDOC

Probabilistic PCA: Recap

The marginal likelihood can be obtained in two ways: marginalizing over the latent variables (PPCA) or marginalizing over the parameters (Dual PPCA).

p(Y | X, W) = ∏_{i=1}^{M} N(y_{i,:} | W x_{i,:}, σ² I)

Define a Gaussian prior over the latent space X:

p(X) = ∏_{i=1}^{M} N(x_{i,:} | 0, I)

Integrate out the latent variables:

p(Y | W) = ∏_{i=1}^{M} N(y_{i,:} | 0, W W^T + σ² I)

Parameter optimization: maximize p(Y | W) with respect to W and σ².
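A minimal sketch of how this marginal likelihood can be evaluated numerically, assuming centered data and using scipy's multivariate normal; the toy data and parameter values are illustrative only.

# Sketch: evaluate the PPCA marginal likelihood p(Y|W) = prod_i N(y_i | 0, W W^T + sigma^2 I)
# for centered data Y (M x N). Names and toy data are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

def ppca_log_marginal(Y, W, sigma2):
    N = Y.shape[1]
    C = W @ W.T + sigma2 * np.eye(N)           # N x N marginal covariance
    return multivariate_normal(mean=np.zeros(N), cov=C).logpdf(Y).sum()

rng = np.random.default_rng(1)
M, N, q = 200, 5, 2
X_lat = rng.standard_normal((M, q))
W = rng.standard_normal((N, q))
Y = X_lat @ W.T + 0.1 * rng.standard_normal((M, N))
Y -= Y.mean(axis=0)                            # PPCA assumes centered data
print(ppca_log_marginal(Y, W, 0.01))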

slide-16
SLIDE 16

MACHINE LEARNING - Doctoral Class - EDOC

E-M for PCA

Probabilistic PCA: Recap

During the E-step of the EM algorithm, if the data matrix is incomplete, one can replace the missing elements with the mean (which affects the estimate very little) or with an extreme value (which increases the uncertainty of the model).

slide-17
SLIDE 17

MACHINE LEARNING - Doctoral Class - EDOC

E-M for PCA

Probabilistic PCA: Revision

This is OK only if the number of missing values is small. If the number of missing values for Y is large, one instead finds the projection y* which minimizes the MSE while ensuring that the solution lies in the sub-space spanned by the known entries of y.

slide-18
SLIDE 18

MACHINE LEARNING - Doctoral Class - EDOC

Dual Probabilistic PCA

p(Y | X, W) = ∏_{i=1}^{M} N(y_{i,:} | W x_{i,:}, σ² I)

Define a Gaussian prior over the parameters W:

p(W) = ∏_{i=1}^{N} N(w_{i,:} | 0, I)

Integrate out the parameters:

p(Y | X) = ∏_{i=1}^{N} N(y_{:,i} | 0, X X^T + σ² I)

Latent variables optimization: maximize p(Y | X) with respect to X.

slide-19
SLIDE 19

MACHINE LEARNING - Doctoral Class - EDOC

Dual Probabilistic PCA II

The marginalized likelihood takes the form:

p(Y | X, σ) = ∏_{i=1}^{N} N(y_{:,i} | 0, X X^T + σ² I)

To find the latent variables, optimize the log-likelihood

L = −(MN/2) ln 2π − (N/2) ln|K| − (1/2) tr(K^{-1} Y Y^T),   with K = X X^T + σ² I

The gradient of L w.r.t. the latent variables:

∂L/∂X = K^{-1} Y Y^T K^{-1} X − N K^{-1} X
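A small numpy sketch of the log-likelihood and gradient above, written directly from the formulas on this slide; the toy data are illustrative.

# Sketch: dual PPCA log-likelihood L and its gradient w.r.t. the latent X,
# with K = X X^T + sigma^2 I. Names and toy data are illustrative.
import numpy as np

def dual_ppca_loglik_and_grad(X, Y, sigma2):
    M, N = Y.shape
    K = X @ X.T + sigma2 * np.eye(M)           # M x M covariance over data points
    Kinv = np.linalg.inv(K)
    YYt = Y @ Y.T
    _, logdet = np.linalg.slogdet(K)
    L = -0.5 * M * N * np.log(2 * np.pi) - 0.5 * N * logdet - 0.5 * np.trace(Kinv @ YYt)
    dL_dX = Kinv @ YYt @ Kinv @ X - N * (Kinv @ X)
    return L, dL_dX

rng = np.random.default_rng(2)
Y = rng.standard_normal((30, 5))
X = rng.standard_normal((30, 2))
L, g = dual_ppca_loglik_and_grad(X, Y, 0.1)
print(L, g.shape)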

slide-20
SLIDE 20

MACHINE LEARNING - Doctoral Class - EDOC

Dual Probabilistic PCA II

Introducing the substitution S = N^{-1} Y Y^T, we rewrite the gradient of the likelihood as

∂L/∂X = N ( K^{-1} S K^{-1} X − K^{-1} X )

or, setting the gradient to zero at a stationary point,

S (X X^T + σ² I)^{-1} X = X

slide-21
SLIDE 21

MACHINE LEARNING - Doctoral Class - EDOC

Dual Probabilistic PCA II

Introducing the substitution S = N^{-1} Y Y^T, we rewrite the gradient of the likelihood as

∂L/∂X = N ( K^{-1} S K^{-1} X − K^{-1} X ),   or   S (X X^T + σ² I)^{-1} X = X

Let us consider the singular value decomposition X = U L V^T; substituting it, we can rewrite the equation for the latent X as

S U [σ² L^{-1} + L]^{-1} V^T = U L V^T,   which gives   S U = U (σ² I + L²)

The solution is invariant to V.

slide-22
SLIDE 22

MACHINE LEARNING - Doctoral Class - EDOC

Dual Probabilistic PCA II

S U = U (σ² I + L²)

S is the covariance matrix of the observable data, U contains the eigenvectors of S, and Λ is the matrix of eigenvalues of S. This implies that the elements on the diagonal of L are given by

l_i = (λ_i − σ²)^{1/2}

We still need to find the matrices in the decomposition X = U L V^T.

slide-23
SLIDE 23

MACHINE LEARNING - Doctoral Class - EDOC

Equivalence of PPCA and Dual PPCA

Solution for PPCA:       Y^T Y U_W = U_W Λ,   W = U_W L V^T
Solution for Dual PPCA:  Y Y^T U = U Λ,       X = U L V^T
Equivalence is of the form:   U_W = Y^T U Λ^{-1/2}

If one knows a solution of PPCA, then the solution of Dual PPCA can be computed directly through this equivalence. Marginalization over the latent variables and marginalization over the parameters are equivalent, but marginalization over the parameters allows for interesting extensions; see next.
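A quick numerical check of this equivalence on toy data (illustrative only): the eigenvectors of Y Y^T, rotated by Y^T U Λ^{-1/2}, are unit-norm eigenvectors of Y^T Y with the same eigenvalues.

# Sketch: numerical check of the PPCA / dual PPCA equivalence on toy data.
import numpy as np

rng = np.random.default_rng(3)
M, N, q = 50, 4, 2
Y = rng.standard_normal((M, N))
Y -= Y.mean(axis=0)

# dual PPCA side: eigen-decomposition of the M x M matrix Y Y^T
lam, U = np.linalg.eigh(Y @ Y.T)
lam, U = lam[::-1][:q], U[:, ::-1][:, :q]      # keep the q largest eigenpairs

# equivalence: rotate back to the PPCA (N x N) eigenproblem
U_W = Y.T @ U @ np.diag(lam ** -0.5)

# U_W should contain unit-norm eigenvectors of Y^T Y with the same eigenvalues
print(np.allclose((Y.T @ Y) @ U_W, U_W @ np.diag(lam)))   # True
print(np.allclose(np.linalg.norm(U_W, axis=0), 1.0))      # True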

slide-24
SLIDE 24

MACHINE LEARNING - Doctoral Class - EDOC

Gaussian Processes I

To recall, the marginal likelihood in Dual PPCA is given by:

p(Y | X) = ∏_{i=1}^{N} N(y_{:,i} | 0, X X^T + σ² I)

slide-25
SLIDE 25

MACHINE LEARNING - Doctoral Class - EDOC

Gaussian Processes I

To recall, the marginal likelihood in Dual PPCA is given by:

p(Y | X) = ∏_{i=1}^{N} N(y_{:,i} | 0, X X^T + σ² I)

Note that the following distribution is known as a Gaussian process:

p(y | X) = N(y | 0, K(X))

Hence the marginal likelihood in Dual PPCA is a product of independent Gaussian processes, one per data dimension.

slide-26
SLIDE 26

MACHINE LEARNING - Doctoral Class - EDOC

Gaussian Processes I

Let's have another look at the marginal likelihood in Dual PPCA:

p(Y | X) = ∏_{i=1}^{N} N(y_{:,i} | 0, X X^T + σ² I)

Note that the following distribution is known as a Gaussian process:

p(y | X) = N(y | 0, K(X))

Gaussian processes have many useful properties (e.g. for modeling non-linear functional dependencies); see the next lecture. For now, we just need to pay attention to the covariance function.

slide-27
SLIDE 27

MACHINE LEARNING - Doctoral Class - EDOC

Gaussian Processes I

Let's have another look at the marginal likelihood in Dual PPCA:

p(Y | X) = ∏_{i=1}^{N} N(y_{:,i} | 0, X X^T + σ² I)

Note that the following distribution is known as a Gaussian process:

p(y | X) = N(y | 0, K(X))

where K(X) is a matrix function known as the covariance matrix. Covariance functions form a special class of functions satisfying a number of constraints (again, we will discuss these constraints next time).

slide-28
SLIDE 28

MACHINE LEARNING - Doctoral Class - EDOC

Gaussian Processes III

For today’s lecture we need to know that:

  • There are several ‘popular’ covariance functions
  • New covariance functions can be defined as sums of other covariance functions
slide-29
SLIDE 29

MACHINE LEARNING - Doctoral Class - EDOC

Gaussian Processes III

Some ‘popular’ covariance functions:

Linear covariance:   K(X) = X X^T

RBF covariance:   k(x, x') = α₁ exp( −(α₂/2) ||x − x'||² )

Random functions generated by Gaussian processes with different covariance functions.
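A minimal sketch of the two covariance functions above and of drawing random functions from the corresponding zero-mean Gaussian processes; the parameter values and the jitter term are illustrative choices.

# Sketch: linear and RBF covariance functions, plus random GP samples.
import numpy as np

def linear_cov(X):
    return X @ X.T                                        # K(X) = X X^T

def rbf_cov(X, alpha1=1.0, alpha2=10.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    return alpha1 * np.exp(-0.5 * alpha2 * d2)            # alpha1 exp(-alpha2/2 ||x-x'||^2)

rng = np.random.default_rng(4)
X = np.linspace(-1, 1, 100)[:, None]
for K in (linear_cov(X), rbf_cov(X)):
    K_jittered = K + 1e-8 * np.eye(len(X))                # numerical stability
    sample = rng.multivariate_normal(np.zeros(len(X)), K_jittered)
    print(sample[:3])                                     # one random function per kernel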

slide-30
SLIDE 30

MACHINE LEARNING - Doctoral Class - EDOC

Non-linear Latent Variables Model

In the marginal likelihood of Dual PPCA,

p(Y | X) = ∏_{i=1}^{N} N(y_{:,i} | 0, X X^T + σ² I)

the covariance is a linear covariance function plus the noise term.

slide-31
SLIDE 31

MACHINE LEARNING - Doctoral Class - EDOC

Non-linear Latent Variables Model

p(Y | X) = ∏_{i=1}^{N} N(y_{:,i} | 0, K)

The marginal likelihood is a product of Gaussian processes. What if we use a non-linear covariance function (e.g. RBF)?

slide-32
SLIDE 32

MACHINE LEARNING - Doctoral Class - EDOC

GP Latent Variables Model

  • Consider a Gaussian kernel:   k(x, x') = α₁ exp( −(α₂/2) ||x − x'||² )
  • It is no longer possible to find a closed-form solution when optimizing for X, nor to simply proceed to a singular value decomposition.
  • Instead, find the gradients w.r.t. X, α₁, α₂, σ² and optimize using conjugate gradients.

We get a non-linear mapping into a low dimensional manifold: the Gaussian Process Latent Variables Model.

slide-33
SLIDE 33

MACHINE LEARNING - Doctoral Class - EDOC

GP Latent Variables Model

Optimization of the non-linear model is done by gradient descent on

∂L/∂K = K^{-1} Y Y^T K^{-1} − N K^{-1},   with each element of K given by k(x, x') = α₁ exp( −(α₂/2) ||x − x'||² )

This is computationally heavy; a sparse technique: pick a subset of datapoints according to how much they reduce the posterior process entropy.

We get a non-linear mapping into a low dimensional manifold: the Gaussian Process Latent Variables Model.
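A toy sketch of GP-LVM optimization under these formulas. The lecture uses analytic gradients and conjugate gradients over X and the kernel parameters; this simplified version keeps α₁, α₂, σ² fixed and lets scipy approximate the gradient of the negative log-likelihood numerically. All data and settings are illustrative.

# Sketch: GP-LVM negative log-likelihood with an RBF kernel, optimized over X.
import numpy as np
from scipy.optimize import minimize

def rbf_K(X, alpha1, alpha2, sigma2):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return alpha1 * np.exp(-0.5 * alpha2 * d2) + sigma2 * np.eye(len(X))

def neg_log_lik(x_flat, Y, q, alpha1=1.0, alpha2=1.0, sigma2=0.01):
    M, N = Y.shape
    X = x_flat.reshape(M, q)
    K = rbf_K(X, alpha1, alpha2, sigma2)
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * N * logdet + 0.5 * np.trace(np.linalg.solve(K, Y @ Y.T))

rng = np.random.default_rng(5)
Y = rng.standard_normal((20, 4))
Y -= Y.mean(axis=0)
q = 2
X0 = np.linalg.svd(Y, full_matrices=False)[0][:, :q]       # PCA-style initialization
res = minimize(neg_log_lik, X0.ravel(), args=(Y, q), method="L-BFGS-B")
X_latent = res.x.reshape(-1, q)
print(res.fun, X_latent.shape)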

slide-34
SLIDE 34

MACHINE LEARNING - Doctoral Class - EDOC

Initialization of GP-LVM

In contrast to linear models such as PCA and PPCA, which allow for a closed-form solution for the latent variables (or projections in feature space), GP-LVM has to proceed iteratively through gradient-based optimization. Therefore, a smart initialization is important. Usually initialization with PCA works fine…

slide-35
SLIDE 35

MACHINE LEARNING - Doctoral Class - EDOC

Initialization of GP-LVM

But not always… We have already seen the ‘swiss roll’ dataset when we discussed kPCA.

For this data the true structure is known: the manifold is a two dimensional square twisted into a spiral along one of its dimensions and living in a three dimensional space.

slide-36
SLIDE 36

MACHINE LEARNING - Doctoral Class - EDOC

Initialization of GP-LVM

GP-LVM with PCA initialization gives poor results. Initialization with Isomap allows the original structure of the data to be recovered.

slide-37
SLIDE 37

MACHINE LEARNING - Doctoral Class - EDOC

In this example, one combines the strengths of two different approaches: Isomap provides a unique solution which can recover the structure of the manifold on which the data lies;

GP-LVM provides an underlying probabilistic model and an easy way to compute the mapping from the latent to the observed space. Due to the probabilistic nature of the GP-LVM we can also compare the resulting models through their log-likelihood. The log-likelihood of the model when initialized with Isomap is much higher than that of the model when initialized with PCA (−45 vs −534).

Initialization of GP-LVM

slide-38
SLIDE 38

MACHINE LEARNING - Doctoral Class - EDOC

Demo 2: Oil Data

Red crosses, green circles and blue + signs represent stratified, annular and homogeneous flows respectively. The grayscale in the right figure indicates the precision with which the manifold was expressed in data-space for that latent point.

Multi-phase oil flow data : 12 dimensional data set containing data of three known classes corresponding to the phase of flow in an oil pipeline: stratified, annular and homogeneous.

PCA (GP-LVM with the linear kernel) GP-LVM with the RBF kernel

slide-39
SLIDE 39

MACHINE LEARNING - Doctoral Class - EDOC

Demo 2: GP-LVM vs PCA

Red crosses, green circles and blue plus signs represent stratified, annular and homogeneous flows respectively. The grayscale in the plots indicates the precision with which the manifold was expressed in data-space for that latent point.

Multi-phase oil flow data : 12 dimensional data set containing data of three known classes corresponding to the phase of flow in an oil pipeline: stratified, annular and homogeneous.

PCA (GP-LVM with the linear kernel) GP-LVM with the RBF kernel

PCA: overlap of stratified and annular flows. GP-LVM: flows are separated.

slide-40
SLIDE 40

MACHINE LEARNING - Doctoral Class - EDOC

Demo 2: GP-LVM vs kPCA

Red crosses, green circles and blue plus signs represent stratified, annular and homogeneous flows respectively. The grayscale in the plots indicates the precision with which the manifold was expressed in data-space for that latent point.

Multi-phase oil flow data : 12 dimensional data set containing data of three known classes corresponding to the phase of flow in an oil pipeline: stratified, annular and homogeneous.

kPCA GP-LVM with the RBF kernel

Flows are separated Overlap of stratified and annular flows

slide-41
SLIDE 41

MACHINE LEARNING - Doctoral Class - EDOC

Back Constraints I

Most dimensionality reduction techniques preserve local distances. kPCA maps smoothly from the data space to the latent space: points close in the data space are close in the latent space.

slide-42
SLIDE 42

MACHINE LEARNING - Doctoral Class - EDOC

Back Constraints I

Most dimensionality reduction techniques preserve local distances. kPCA maps smoothly from the original space to the latent space: points close in the data space are close in the latent space, but points close in the latent space may not be close in the data space.

slide-43
SLIDE 43

MACHINE LEARNING - Doctoral Class - EDOC

Back Constraints I

Most dimensionality reduction techniques preserve local distances. kPCA maps smoothly from the original space to the latent space: points close in the data space are close in the latent space, but points close in the latent space may not be close in the data space.

GP-LVM does not preserve local distances, but points close in the latent space are close in the data space.

slide-44
SLIDE 44

MACHINE LEARNING - Doctoral Class - EDOC

Back Constraints II: Runner Dataset

Motion capture data of human walking. The paths of the sequences in the latent space are shown as solid lines. The dimension of the original data is 102 (34 markers x 3 coordinates).

slide-45
SLIDE 45

MACHINE LEARNING - Doctoral Class - EDOC

The data were obtained from a subject breaking into a run from standing (a cyclic motion); neighbors in time are connected with lines.

Back Constraints II: Runner Dataset

slide-46
SLIDE 46

MACHINE LEARNING - Doctoral Class - EDOC

We are supposed to see a smooth periodic pattern in the latent space. But, instead …

Non-smooth mapping

Back Constraints II: Runner Dataset

slide-47
SLIDE 47

MACHINE LEARNING - Doctoral Class - EDOC

Back Constraints III

Lowe and Tipping [1997] made the latent positions a function of the data. The function was either a multi-layer perceptron or a radial basis function network:

x_{ij} = f_j(y_{i,:}, w)

slide-48
SLIDE 48

MACHINE LEARNING - Doctoral Class - EDOC

Back Constraints III

The same idea can be used to force the GP-LVM to respect local distances [Lawrence and Candela, 2006]. By constraining each x_i to be a smooth mapping from y_i, local distances can be respected. This works because in GP-LVM one maximizes w.r.t. the latent variables and does not integrate them out.

slide-49
SLIDE 49

MACHINE LEARNING - Doctoral Class - EDOC

Back Constraints III

GP-LVM normally proceeds by optimizing L(X) = log p(Y | X) with respect to X using ∂L/∂X.

The back constraints are of the form

x_{ij} = f_j(y_{i,:}, B)

where B is a set of unknown parameters. We can compute ∂L/∂B via the chain rule and optimize with respect to B.

slide-50
SLIDE 50

MACHINE LEARNING - Doctoral Class - EDOC

Back Constraints IV: Runner Dataset

GP-LVM with back constraints applied to the runner dataset that we have seen before.

Smooth cyclical pattern

slide-51
SLIDE 51

MACHINE LEARNING - Doctoral Class - EDOC

Back Constraints IV: Runner Dataset

GP-LVM with back constraints applied to the runner dataset that we have seen before.

slide-52
SLIDE 52

MACHINE LEARNING - Doctoral Class - EDOC

Dynamics in GP-LVM

Observable data Y are generated by a low-dimensional dynamical process (e.g. in full-body motion, a dynamical law controlling all body joints can be described in a low-dimensional space).

How can we learn the latent dynamics?

slide-53
SLIDE 53

MACHINE LEARNING - Doctoral Class - EDOC

Dynamics in GP-LVM

y x

η B x y η A x x + = + =

) , ( ) ; (

1 t t t t

g f

Observable data are generated by a low-dimensional dynamical process (e.g. in the full-body motion, a dynamical law controlling all

body joints can be described in a low-dimensional space).

How can we learn the latent dynamics? Assume the latent-variable mapping with a first-order Markov dynamics where and are the matrices of parameters (equivalent to over which we marginalized in DPPCA)

Y A B W

Isotropic Gaussian noise

slide-54
SLIDE 54

MACHINE LEARNING - Doctoral Class - EDOC

Dynamics in GP-LVM

y x

η B x y η A x x + = + =

) , ( ) ; (

1 t t t t

g f

Observable data are generated by a low-dimensional dynamical process. Assume the latent-variable mapping with the first-order Markov dynamics where and are the matrices of parameters (equivalent to over which we marginalized in DPPCA)

Y A B W

standard GP-LVM mapping

slide-55
SLIDE 55

MACHINE LEARNING - Doctoral Class - EDOC

Dynamics in GP-LVM

y x

η B x y η A x x + = + =

) , ( ) ; (

1 t t t t

g f

Observable data are generated by a low-dimensional dynamical process. Assume the latent-variable mapping with the first-order Markov dynamics where and are the matrices of parameters (equivalent to over which we marginalized in DPPCA)

Y A B W

standard GP-LVM mapping

) ' 2 exp( ) ' , (

2 2 1

x x x x − − = β β

Y

k

Use the RBF kernel:

slide-56
SLIDE 56

MACHINE LEARNING - Doctoral Class - EDOC

Dynamics in GP-LVM

y x

η B x y η A x x + = + =

) , ( ) ; (

1 t t t t

g f

Observable data are generated by a low-dimensional dynamical process. Assume the latent-variable mapping with the first-order Markov dynamics where and are the matrices of parameters (equivalent to over which we marginalized in DPPCA)

Y A B W

novel, auto- regressive part

slide-57
SLIDE 57

MACHINE LEARNING - Doctoral Class - EDOC

Dynamics in GP-LVM

y x

η B x y η A x x + = + =

) , ( ) ; (

1 t t t t

g f

Observable data are generated by a low-dimensional dynamical process. Assume the latent-variable mapping with the first-order Markov dynamics where and are the matrices of parameters (equivalent to over which we marginalized in DPPCA)

Y A B W

novel, auto- regressive part

Use the RBF + linear kernel, the marginal distribution is getting More complicated

' ) ' 2 exp( ) ' , (

3 2 2 1

x x x x x x

T X

k α α α + − − =

slide-58
SLIDE 58

MACHINE LEARNING - Doctoral Class - EDOC

Dynamics in GP-LVM

A A A X X d p p p ) | ( ) , | ( ) | ( α α α

=

1 1 2

( | ) ( ) ( | , , ) ( | )

T t t t

p p p p d α α α

− =

=

∏ ∫

X x x x A A A

1 1 ( 1)

1 1 ( | ) ( ) exp( ( )) 2 (2 )

T X

  • ut
  • ut

N T N X

p p tr α π

− −

= − X x K X X K

2

[ ,..., ]T

  • ut

T

= X x x

( 1) ( 1) T T X

R

− × −

∈ K

  • Marginalize over the parameters
  • Incorporate the Markov Prior
  • With the isotropic Gaussian prior on

A

) (A p

where

slide-59
SLIDE 59

MACHINE LEARNING - Doctoral Class - EDOC

Dynamics in GP-LVM

  • The joint distribution of the latent variables is not Gaussian
  • Therefore, the log-likelihood is not quadratic in X
  • However, one can compute maximum a posteriori (MAP) estimates of the solution by minimizing the negative log-posterior:

L = −ln p(X, α, β | Y)
  = (q/2) ln|K_X| + (1/2) tr(K_X^{−1} X_out X_out^T) + Σ_i ln α_i
  + (N/2) ln|K_Y| + (1/2) tr(K_Y^{−1} Y Y^T) + Σ_i ln β_i + const

slide-60
SLIDE 60

MACHINE LEARNING - Doctoral Class - EDOC

GPDM: Results

Each pose was defined by 56 Euler angles for the joints, 3 global pose angles, and 3 global translational velocities for the torso. For learning, the data were mean-subtracted and the latent coordinates were initialized with PCA.

Original Walk

slide-61
SLIDE 61

MACHINE LEARNING - Doctoral Class - EDOC

GPDM: Results

Models of the latent dynamics learned from a walking sequence of 2.5 gait cycles.

GP-LVM GPDM

Non-smooth mapping (GP-LVM without back constraints)

slide-62
SLIDE 62

MACHINE LEARNING - Doctoral Class - EDOC

GPDM: Results

(d) Random trajectories drawn from the model using Monte Carlo. (e) A GPDM of walk data learned with RBF+linear kernel dynamics. The poses were reconstructed from points on the trajectory.

slide-63
SLIDE 63

MACHINE LEARNING - Doctoral Class - EDOC

GPDM: Missing Data

50 missing frames in the 157-frame walk sequence

missing data

slide-64
SLIDE 64

MACHINE LEARNING - Doctoral Class - EDOC

GPDM: Missing Data

The latent coordinates for missing data are initialized by cubic spline interpolation from the 3D PCA initialization of observations. Problem: the difficulty of initializing the missing latent positions sufficiently close to the training data.

spline data

slide-65
SLIDE 65

MACHINE LEARNING - Doctoral Class - EDOC

GPDM: Missing Data

  • Learn a model with a subsampled data sequence: this oversmooths the data and generates a more uncertain but smoother distribution over the latent variables.
  • Restart the learning with the entire data set, but with the kernel hyperparameters fixed (optimize w.r.t. the latent variables X).
  • The dynamics terms in the objective function then exert more influence over the latent coordinates of the training data, and a smooth model is learned.

slide-66
SLIDE 66

MACHINE LEARNING - Doctoral Class - EDOC

GPDM: Missing Data

slide-67
SLIDE 67

MACHINE LEARNING - Doctoral Class - EDOC

References on GPLVM

1. N. Lawrence. Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models. Journal of Machine Learning Research, 2005.
2. M. Tipping and C. Bishop. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society, 1999.
3. N. D. Lawrence and J. Quinonero-Candela. Local Distance Preservation in the GP-LVM through Back Constraints.
4. J. M. Wang, D. J. Fleet and A. Hertzmann. Gaussian Process Dynamical Models. In Proceedings of the International Conference on Neural Information Processing Systems, 2005.
5. The tutorial and the demo software from the website http://www.cs.man.ac.uk/~neill/
6. The demos from the website http://www.dgp.toronto.edu/~jmwang/gpdm/
slide-68
SLIDE 68

MACHINE LEARNING - Doctoral Class - EDOC

Non-linear function approximation

  • We saw that SVR offers a powerful way of estimating arbitrary non-linear functions.
  • It has, however, a cost: computation grows with the number of support vectors.
  • While penalties can be imposed to keep this number low, how to actually set this penalty is not easy.
  • Besides, SVR does not allow the algorithm to be retrained easily by adding new data incrementally, as this may change drastically which datapoints are used as SVs (neural networks with Hebbian learning provide good alternatives for incremental learning, but they are not guaranteed to find an optimal solution).

slide-69
SLIDE 69

MACHINE LEARNING - Doctoral Class - EDOC

Nonlinear function approximation with high-dimensional input data remains a nontrivial problem, especially in incremental and real-time formulations.

Two broad classes of function approximation methods:
1. Methods that fit nonlinear functions globally, typically by input space expansions (feature space) with predefined or parameterized basis functions and subsequent linear combinations of the expanded inputs (e.g. SVR, GPLVM (today's lecture), GPR (next week)).
2. Methods that fit nonlinear functions locally, usually by using spatially localized simple models in the original input space and automatically adjusting the complexity (e.g., the number of local models and their locality) to accurately account for the nonlinearities and distributions of the target function (e.g. LWPR (today's lecture), GMR (next week)).

Non-linear function approximation

slide-70
SLIDE 70

MACHINE LEARNING - Doctoral Class - EDOC

Locally weighted learning

  • Classical regression relies on minimizing the MSE.
  • We discussed the importance of weighted regression.
  • Such a method still assumes that a single linear dependency applies everywhere. This is not true for data sets with local dependencies.

It would be useful to design a regression method that best estimates the linear dependencies locally: locally weighted learning (LWR, Atkeson et al. 1997).

slide-71
SLIDE 71

MACHINE LEARNING - Doctoral Class - EDOC

Locally weighted learning

{ } ( )

( )

( )

1 2 1

For a set of M multi-dimensional pair of datapoints , , a linear model of the form , = , where is the slope of the function, if optimizes by minimizing MSE, The data s

i i i i i i i

M i T M T i

x y y f x x J y x β β β β β

= =

= = −

( )

( )

( )

( )

2 1

et can be tailored to the query point x' by emphasizing nearby points in the regression. One can do this by weighting the training criterion , ' where K( ) is the weighting or kernel

i i

M T i i

J y x K d x x β β

=

= −

function and ( , ') is the distance between the data point and the query point '.

i i

d x x x x

slide-72
SLIDE 72

MACHINE LEARNING - Doctoral Class - EDOC

Locally weighted learning

{ } ( )

( )

( )

1 2 1

For a set of M multi-dimensional pair of datapoints , , a linear model of the form , = , where is the slope of the function, if optimizes by minimizing MSE, The data s

i i i i i i i

M i T M T i

x y y f x x J y x β β β β β

= =

= = −

( )

( )

( )

( )

2 1

et can be tailored to the query point x' by emphasizing nearby points in the regression. One can do this by weighting the training criterion , ' where K( ) is the weighting or kernel

i i

M T i i

J y x K d x x β β

=

= −

function and ( , ') is the distance between the data point and the query point '.

i i

d x x x x

( )

( )

Using this criterion, each , ' become a local model and can have a different set of parameters for each query point '.

i

f x x x β

slide-73
SLIDE 73

MACHINE LEARNING - Doctoral Class - EDOC

Locally weighted learning

( ) ( )

( ) ( )

2 1 1

If we assume each local model to be linear in the parameters, i.e. , = , finding the optimal projection with unweighted projection is done by minimizing MSE: . In weighted

T M T i i i T T

f x x J x y X X X y β β β β β

= −

= − ⇒ =

( ) ( )

( )

( )

regression, if we set w ' = , ' , When optimizing for a given query point x', we set W a diagonal matrix with entries w , , v , one can get an estimator for ˆ at the query point: ' '

i i i

x K d x x Z WX Wy y y x x = = =

( )

1 T T T

Z Z Z v
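A minimal sketch of this locally weighted estimator at a single query point, assuming a Gaussian weighting kernel with bandwidth h and a bias term appended to the inputs (the slides assume zero-mean data and omit the bias); names, data and parameter values are illustrative.

# Sketch: locally weighted regression at one query point (Z = W X, v = W y).
import numpy as np

def lwr_predict(X, y, x_query, h=0.3):
    w = np.exp(-0.5 * np.sum((X - x_query) ** 2, axis=1) / h**2)   # kernel weights
    Xb = np.hstack([X, np.ones((len(X), 1))])                      # append bias term
    Z = w[:, None] * Xb                                            # Z = W X
    v = w * y                                                      # v = W y
    beta = np.linalg.lstsq(Z, v, rcond=None)[0]                    # beta = (Z^T Z)^{-1} Z^T v
    return np.append(x_query, 1.0) @ beta                          # y_hat(x') = x'^T beta

rng = np.random.default_rng(6)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.05 * rng.standard_normal(200)
print(lwr_predict(X, y, np.array([0.5])))      # compare to sin(1.5) ~ 0.997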

slide-74
SLIDE 74

MACHINE LEARNING - Doctoral Class - EDOC

Locally weighted learning

Unweighted Regression Weighted Regression

slide-75
SLIDE 75

MACHINE LEARNING - Doctoral Class - EDOC

Locally weighted projected regression (LWPR)

  • LWL is computationally expensive.
  • Computation costs increase quadratically with the dimensionality of the data.
  • The method is numerically brittle due to incremental NxN matrix inversion.
  • Too many "manual tuning parameters".

Idea: find an optimal local projection to reduce the dimensionality locally, and then proceed to LWR in each local subspace.

slide-76
SLIDE 76

MACHINE LEARNING - Doctoral Class - EDOC

Non-linear function approximation – Local Dimensionality Reduction

  • Globally high dimensional data can be locally low dimensional
  • Use local dimensionality reduction techniques:
  • PCA
  • Factor Analysis
  • Partial Least Squares Regression
slide-77
SLIDE 77

MACHINE LEARNING - Doctoral Class - EDOC

Methods for Real-Time Function Approximation?

Classical neural networks (slow, structure?), mixture models (expensive, local minima), support vector machines and Gaussian process regression (O(N²) - O(N³)) are global optimization processes (computationally expensive, slow):

J = Σ_{i=1}^{M} ( y_i − Σ_{k=1}^{K} β_k φ_k(x_i) )²

Nonparametric statistics (a.k.a. locally weighted learning) instead uses a kernel-weighted criterion:

J = Σ_{i=1}^{M} Σ_{k=1}^{K} w_{k,i} ( y_i − β_k^T x_i )²

K independent local linear optimization problems!

slide-78
SLIDE 78

MACHINE LEARNING - Doctoral Class - EDOC

Choosing the actual number of local models is often difficult, as it can lead to overfitting. This is not a problem when the local models are learned purely from local data: then, an increasing number of local models does not overfit!

Locally weighted projected regression (LWPR)

slide-79
SLIDE 79

MACHINE LEARNING - Doctoral Class - EDOC

Nonparametric Statistics (a.k.a. Locally Weighted Learning)

J(W, β) = Σ_{i=1}^{M} Σ_{k=1}^{K} w_{k,i} ( y_i − β_k^T x_i )²

K independent local linear optimization problems, with

w_{k,i} = exp( −(1/2) (x_i − x'_k)^T D_k (x_i − x'_k) ),   β_k = (X^T W_k X)^{-1} X^T W_k Y

Locally weighted projected regression (LWPR)

slide-80
SLIDE 80

MACHINE LEARNING - Doctoral Class - EDOC

Locally weighted projected regression (LWPR)

w_{k,i} = exp( −(1/2) (x_i − x'_k)^T D_k (x_i − x'_k) ),   β_k = (X^T W_k X)^{-1} X^T W_k Y

slide-81
SLIDE 81

MACHINE LEARNING - Doctoral Class - EDOC

Approximate non-linear functions with a combination of multiple weighted linear models:

w_{k,i} = exp( −(1/2) (x_i − x'_k)^T D_k (x_i − x'_k) )
β_k = (X^T W_k X)^{-1} X^T W_k Y
ŷ_k = x'^T β_k
ŷ = Σ_k w_k ŷ_k / Σ_k w_k

Solve this problem for high dimensional spaces: LWPR.

Sethu Vijayakumar, Aaron D'Souza and Stefan Schaal, Online Learning in High Dimensions, Neural Computation, vol. 17, pp. 2602-34 (2005). Sethu Vijayakumar @ Univ. of Edinburgh

Locally weighted projected regression (LWPR)
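A small sketch of this prediction rule, combining K hand-built local linear models with Gaussian receptive-field weights. The centers, the metric D and the local fits are illustrative assumptions; a full LWPR implementation would additionally learn each D_k and the local PLS projections online.

# Sketch: blending K local linear models, y_hat = sum_k w_k y_hat_k / sum_k w_k.
import numpy as np

def local_weights(x, centers, D):
    diff = centers - x                                    # (K, d)
    return np.exp(-0.5 * np.einsum('kd,de,ke->k', diff, D, diff))

def lwpr_style_predict(x, centers, betas, D):
    w = local_weights(x, centers, D)                      # w_k(x)
    x_tilde = np.append(x, 1.0)                           # augmented input [x^T 1]^T
    y_local = betas @ x_tilde                             # y_hat_k = beta_k^T x_tilde
    return (w @ y_local) / w.sum()                        # normalized weighted average

centers = np.linspace(-2, 2, 9)[:, None]                  # K local models on a 1-D input
D = np.array([[4.0]])                                     # shared distance metric D = M^T M
# local linear fits of sin(x) around each center: slope cos(c), offset sin(c) - c cos(c)
betas = np.stack([[np.cos(c), np.sin(c) - c * np.cos(c)] for c in centers[:, 0]])
print(lwpr_style_predict(np.array([0.7]), centers, betas, D), np.sin(0.7))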

slide-82
SLIDE 82

MACHINE LEARNING - Doctoral Class - EDOC

Locally weighted projected regression (LWPR)

  • The linear model:   y = β_x^T x + β_0 = β^T x̃,   where x̃ = [x^T 1]^T
  • The kernel function:   w = exp( −(1/2) (x − c)^T D (x − c) ),   where D = M^T M
  • The prediction:   ŷ = Σ_{k=1}^{K} w_k ŷ_k / Σ_{k=1}^{K} w_k

Open parameters: β, D

slide-83
SLIDE 83

MACHINE LEARNING - Doctoral Class - EDOC

Locally weighted projected regression (LWPR)

  • The linear model:   y = β_x^T x + β_0 = β^T x̃,   where x̃ = [x^T 1]^T
  • The kernel function:   w = exp( −(1/2) (x − c)^T D (x − c) ),   where D = M^T M
  • The prediction:   ŷ = Σ_{k=1}^{K} w_k ŷ_k / Σ_{k=1}^{K} w_k

D is usually a diagonal matrix. The centers c of the local models are fixed, not learned.

slide-84
SLIDE 84

MACHINE LEARNING - Doctoral Class - EDOC

For learning the linear models Ψ(x), LWPR employs an online formulation of weighted partial least squares (PLS) regression.

Locally weighted projected regression (LWPR)

slide-85
SLIDE 85

MACHINE LEARNING - Doctoral Class - EDOC

Partial Least Square Method and Regression

Partial Least Squares (PLS) refers to a wide class of methods for modeling relations between sets of observed variables by means of latent variables. It comprises regression and classification tasks as well as dimension reduction techniques and modeling tools.

X = {x_i}_{i=1}^M,   Y = {y_i}_{i=1}^M

Least-squares regression looks for a mapping w that sends X onto Y, such that:

min_w (1/2) Σ_{i=1}^{M} ( w^T x_i − y_i )²

slide-86
SLIDE 86

MACHINE LEARNING - Doctoral Class - EDOC

Partial Least Square Method and Regression

PCA, and CCA by extension, can be viewed as regression problems, whereby one set of variables Y is expressed in terms of a linear combination of a second set of variables X.

PCA regression does not take the response variable into account when constructing the principal components or latent variables: it is an unsupervised problem.

PLS incorporates information about the response in the model by using latent variables: it creates orthogonal score vectors (also called latent vectors or components) by maximizing the covariance between the different sets of variables.

PLS is similar to CCA in its principle; it differs, however, in the algorithm.
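A toy sketch contrasting the first PCA direction (largest variance, response ignored) with the first PLS direction (maximal covariance with the response, which for a single output is proportional to X^T y); multi-component PLS with deflation (e.g. NIPALS) is omitted, and the data are illustrative.

# Sketch: first PCA direction vs first PLS direction on toy data.
import numpy as np

rng = np.random.default_rng(7)
M = 500
X = rng.standard_normal((M, 3)) * np.array([5.0, 1.0, 1.0])   # dominant variance on dim 0
y = X[:, 1] + 0.1 * rng.standard_normal(M)                    # response depends on dim 1 only
Xc, yc = X - X.mean(0), y - y.mean()

# PCA: dominant eigenvector of X^T X (largest-variance direction)
w_pca = np.linalg.eigh(Xc.T @ Xc)[1][:, -1]

# PLS: direction maximizing cov(X w, y) -> proportional to X^T y
w_pls = Xc.T @ yc
w_pls /= np.linalg.norm(w_pls)

print(np.round(np.abs(w_pca), 2))   # ~[1, 0, 0]: picks the high-variance, irrelevant direction
print(np.round(np.abs(w_pls), 2))   # ~[0, 1, 0]: picks the direction predictive of y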

slide-87
SLIDE 87

MACHINE LEARNING - Doctoral Class - EDOC

PLS and CCA

PLS represents a form of CCA, where the criterion of maximal correlation is balanced with the requirement to explain as much variance as possible in both the X and Y spaces:

max_{w_x, w_y} cov(X w_x, Y w_y)² / ( [ (1 − γ_X) var(X w_x) + γ_X ] [ (1 − γ_Y) var(Y w_y) + γ_Y ] )

  • CCA: γ_X = γ_Y = 0
  • PLS: γ_X = γ_Y = 1
slide-88
SLIDE 88

MACHINE LEARNING - Doctoral Class - EDOC

PLS and CCA

What about γ_X = 1, γ_Y = 0?

slide-89
SLIDE 89

MACHINE LEARNING - Doctoral Class - EDOC

For learning the linear models Ψ(x), LWPR employs an online formulation of weighted partial least squares (PLS) regression. Within each local model, the input x is projected along selected directions u_i, yielding "latent" variables s_i.

Locally weighted projected regression (LWPR)

slide-90
SLIDE 90

MACHINE LEARNING - Doctoral Class - EDOC

LWPR: A Basic Incremental Algorithm

Given a new sample (x, y), for all K local models update the regression parameters by weighted recursive least squares with forgetting factor λ:

β_k^{n+1} = β_k^n + w P_k^{n+1} x̃ ( y − x̃^T β_k^n )

P_k^{n+1} = (1/λ) ( P_k^n − ( P_k^n x̃ x̃^T P_k^n ) / ( λ/w + x̃^T P_k^n x̃ ) )
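A minimal sketch of this weighted recursive least squares update for a single local model, with an illustrative receptive-field weight and toy data; the forgetting factor λ and the initial P are assumptions.

# Sketch: weighted recursive least squares for one local model.
import numpy as np

def rls_update(beta, P, x_tilde, y, w, lam=0.999):
    """One incremental update of (beta, P) with a new weighted sample."""
    denom = lam / w + x_tilde @ P @ x_tilde
    P_new = (P - np.outer(P @ x_tilde, x_tilde @ P) / denom) / lam
    beta_new = beta + w * (P_new @ x_tilde) * (y - x_tilde @ beta)
    return beta_new, P_new

rng = np.random.default_rng(8)
beta = np.zeros(3)                      # [slope_1, slope_2, offset]
P = np.eye(3) * 100.0                   # large initial covariance
for _ in range(2000):
    x = rng.uniform(-1, 1, size=2)
    x_tilde = np.append(x, 1.0)         # augmented input
    y = 2.0 * x[0] - 1.0 * x[1] + 0.5 + 0.01 * rng.standard_normal()
    w = np.exp(-0.5 * (x @ x) / 0.5)    # receptive-field weight (illustrative)
    beta, P = rls_update(beta, P, x_tilde, y, w)
print(np.round(beta, 2))                # should approach [2.0, -1.0, 0.5]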

slide-91
SLIDE 91

MACHINE LEARNING - Doctoral Class - EDOC

LWPR: A Basic Incremental Algorithm

The distance metric D, and hence the locality of the receptive fields, can be learned for each local model individually by stochastic gradient descent on a penalized leave-one-out cross-validation cost function:

J = ( 1 / Σ_{i=1}^{M} w_{k,i} ) Σ_{i=1}^{M} w_{k,i} ( y_i − ŷ_{k,i,−i} )² + γ Σ_{i,j=1}^{n} D_{k,ij}²

M_k^{n+1} = M_k^n − α ∂J/∂M,   with   D_k^{n+1} = (M_k^{n+1})^T M_k^{n+1}

The first term is the leave-one-out estimate of the prediction error of y; the second is a penalty term that keeps the receptive fields from shrinking indefinitely as the amount of data grows.

slide-92
SLIDE 92

MACHINE LEARNING - Doctoral Class - EDOC

LWPR: A Basic Incremental Algorithm

Given (x, y), for all K local models:

  • Recursive least squares (with forgetting factor λ):

β_k^{n+1} = β_k^n + w P_k^{n+1} x̃ ( y − x̃^T β_k^n )
P_k^{n+1} = (1/λ) ( P_k^n − ( P_k^n x̃ x̃^T P_k^n ) / ( λ/w + x̃^T P_k^n x̃ ) )

  • Stochastic leave-one-out cross validation for the distance metric:

J = ( 1 / Σ_{i=1}^{M} w_{k,i} ) Σ_{i=1}^{M} w_{k,i} ( y_i − ŷ_{k,i,−i} )² + γ Σ_{i,j=1}^{n} D_{k,ij}²
M_k^{n+1} = M_k^n − α ∂J/∂M,   D_k^{n+1} = (M_k^{n+1})^T M_k^{n+1}

  • Automatic structure determination; create a new model:

if min_k w_k(x) < w_gen, create a new receptive field at c_{K+1} = x

slide-93
SLIDE 93

MACHINE LEARNING - Doctoral Class - EDOC

[Figure: a) global function fitting with a sigmoidal neural network; b) local function fitting with receptive fields; c) learned organization of receptive fields. Legend: original training data, new training data, true y, predicted y, predicted y after new training data.]

Locally weighted projected regression (LWPR)

slide-94
SLIDE 94

MACHINE LEARNING - Doctoral Class - EDOC

Increasing the number of components leads to a better fit of the local linearities.

Locally weighted projected regression (LWPR)

slide-95
SLIDE 95

MACHINE LEARNING - Doctoral Class - EDOC

Empirical Evaluations (Cross Data)

Learned function / Target function
Input dimensionality = 2 (+ 8 or 18 redundant dimensions). Noise ~ N(0, 0.01). Number of training data = 500.
Initial receptive fields / Learned receptive fields

Sethu Vijayakumar @ Univ. of Edinburgh

slide-96
SLIDE 96

MACHINE LEARNING - Doctoral Class - EDOC

Summary: class so far

DIMENSIONALITY REDUCTION and FEATURE ANALYSIS:

  • Importance of methods for reduction of dimensionality (PCA)
  • Advantage of a probabilistic phrasing of PCA (missing data)
  • Advantage of non-linear transformations to infer regularities in the data in feature space (kPCA)

FUNCTION ESTIMATION:

  • Estimation of a function through regression
  • Advantage of formulating the problem statistically (probabilistic regression): deal with missing data, handle noise
  • Non-linear transformations useful again to estimate the function through latent variables (SVR)
  • Usefulness of being able to project back from feature space to the original space (GPLVM)
  • Incremental estimation of a function (LWPR)