Latent Force Models Neil D. Lawrence (work with Magnus Rattray, - PowerPoint PPT Presentation

Latent Force Models Neil D. Lawrence (work with Magnus Rattray, Mauricio ´ Alvarez , Pei Gao, Antti Honkela, David Luengo, Guido Sanguinetti, Michalis Titsias, Jennifer Withers) University of Sheffield University of Edinburgh Bayes 250 Workshop 6th September 2011

Outline Motivation and Review Motion Capture Example

Styles of Machine Learning Background: interpolation is easy, extrapolation is hard ◮ Urs H¨ olzle keynote talk at NIPS 2005. ◮ Emphasis on massive data sets. ◮ Let the data do the work—more data, less extrapolation. ◮ Alternative paradigm: ◮ Very scarce data: computational biology, human motion. ◮ How to generalize from scarce data? ◮ Need to include more assumptions about the data (e.g. invariances).

General Approach Broadly Speaking: Two approaches to modeling data modeling mechanistic modeling

General Approach Broadly Speaking: Two approaches to modeling data modeling mechanistic modeling let the data“speak”

General Approach Broadly Speaking: Two approaches to modeling data modeling mechanistic modeling let the data“speak” impose physical laws

General Approach Broadly Speaking: Two approaches to modeling data modeling mechanistic modeling let the data“speak” impose physical laws data driven

General Approach Broadly Speaking: Two approaches to modeling data modeling mechanistic modeling let the data“speak” impose physical laws data driven knowledge driven

General Approach Broadly Speaking: Two approaches to modeling data modeling mechanistic modeling let the data“speak” impose physical laws data driven knowledge driven adaptive models

General Approach Broadly Speaking: Two approaches to modeling data modeling mechanistic modeling let the data“speak” impose physical laws data driven knowledge driven adaptive models differential equations

General Approach Broadly Speaking: Two approaches to modeling data modeling mechanistic modeling let the data“speak” impose physical laws data driven knowledge driven adaptive models differential equations digit recognition

General Approach Broadly Speaking: Two approaches to modeling data modeling mechanistic modeling let the data“speak” impose physical laws data driven knowledge driven adaptive models differential equations digit recognition climate, weather models

General Approach Broadly Speaking: Two approaches to modeling data modeling mechanistic modeling Weakly Mechanistic let the data“speak” impose physical laws data driven knowledge driven adaptive models differential equations digit recognition climate, weather models

General Approach Broadly Speaking: Two approaches to modeling data modeling mechanistic modeling Strongly Mechanistic Weakly Mechanistic let the data“speak” impose physical laws data driven knowledge driven adaptive models differential equations digit recognition climate, weather models

Weakly Mechanistic vs Strongly Mechanistic ◮ Underlying data modeling techniques there are weakly mechanistic principles (e.g. smoothness). ◮ In physics the models are typically strongly mechanistic . ◮ In principle we expect a range of models which vary in the strength of their mechanistic assumptions. ◮ This work is one part of that spectrum: add further mechanistic ideas to weakly mechanistic models.

Dimensionality Reduction ◮ Linear relationship between the data, X ∈ ℜ n × p , and a reduced dimensional representation, F ∈ ℜ n × q , where q ≪ p . X = FW + ǫ , ǫ ∼ N ( 0 , Σ ) ◮ Integrate out F , optimize with respect to W . ◮ For Gaussian prior, F ∼ N ( 0 , I ) ◮ and Σ = σ 2 I we have probabilistic PCA (Tipping and Bishop, 1999; Roweis, 1998) . ◮ and Σ constrained to be diagonal, we have factor analysis.

Dimensionality Reduction: Temporal Data ◮ Deal with temporal data with a temporal latent prior. ◮ Independent Gauss-Markov priors over each f i ( t ) leads to : Rauch-Tung-Striebel (RTS) smoother (Kalman filter). ◮ More generally consider a Gaussian process (GP) prior, q � � � p ( F | t ) = N f : , i | 0 , K f : , i , f : , i . i =1

Joint Gaussian Process ◮ Given the covariance functions for { f i ( t ) } we have an implied covariance function across all { x i ( t ) } —(ML: semi-parametric latent factor model (Teh et al., 2005) , Geostatistics: linear model of coregionalization). ◮ Rauch-Tung-Striebel smoother has been preferred ◮ linear computational complexity in n . ◮ Advances in sparse approximations have made the general GP framework practical. (Titsias, 2009; Snelson and Ghahramani, nonero Candela and Rasmussen, 2005) . 2006; Qui˜

Gaussian Process: Exponentiated Quadratic Covariance ◮ Take, for example, exponentiated quadratic form for covariance. � −|| t − t ′ || 2 � t , t ′ � � k = α exp 2 ℓ 2 1 5 0.5 10 ◮ Gaussian process over m 0 latent functions. 15 −0.5 20 −1 25 5 10 15 20 25 n

Mechanical Analogy Back to Mechanistic Models! ◮ These models rely on the latent variables to provide the dynamic information. ◮ We now introduce a further dynamical system with a mechanistic inspiration. ◮ Physical Interpretation: ◮ the latent functions, f i ( t ) are q forces. ◮ We observe the displacement of p springs to the forces., ◮ Interpret system as the force balance equation, XD = FS + ǫ . ◮ Forces act, e.g. through levers — a matrix of sensitivities, S ∈ ℜ q × p . ◮ Diagonal matrix of spring constants, D ∈ ℜ p × p . ◮ Original System: W = SD − 1 .

Extend Model ◮ Add a damper and give the system mass. FS = ¨ XM + ˙ XC + XD + ǫ . ◮ Now have a second order mechanical system. ◮ It will exhibit inertia and resonance. ◮ There are many systems that can also be represented by differential equations. ◮ When being forced by latent function(s), { f i ( t ) } q i =1 , we call this a latent force model .

Physical Analogy

Gaussian Process priors and Latent Force Models Driven Harmonic Oscillator ◮ For Gaussian process we can compute the covariance matrices for the output displacements. ◮ For one displacement the model is q � m k ¨ x k ( t ) + c k ˙ x k ( t ) + d k x k ( t ) = b k + s ik f i ( t ) , (1) i =0 where, m k is the k th diagonal element from M and similarly for c k and d k . s ik is the i , k th element of S . ◮ Model the latent forces as q independent, GPs with exponentiated quadratic covariances − ( t − t ′ ) 2 � � k f i f l ( t , t ′ ) = exp δ il . 2 ℓ 2 i

Covariance for ODE Model ◮ Exponentiated Quadratic Covariance function for f ( t ) � t q 1 � x j ( t ) = s ji exp( − α j t ) f i ( τ ) exp( α j τ ) sin( ω j ( t − τ )) d τ m j ω j 0 i =1 ◮ Joint distribution f(t) 0.8 for x 1 ( t ), x 2 ( t ), 0.6 y 1 (t) x 3 ( t ) and f ( t ). 0.4 0.2 Damping ratios: y 2 (t) 0 ζ 1 ζ 2 ζ 3 −0.2 y 3 (t) 0.125 2 1 −0.4 f(t) y 1 (t) y 2 (t) y 3 (t)

Covariance for ODE Model ◮ Analogy � � � � e ⊤ e ⊤ x = i f i f i ∼ N ( 0 , Σ i ) → x ∼ N 0 , i Σ i e i i i ◮ Joint distribution f(t) 0.8 for x 1 ( t ), x 2 ( t ), 0.6 y 1 (t) x 3 ( t ) and f ( t ). 0.4 0.2 Damping ratios: y 2 (t) 0 ζ 1 ζ 2 ζ 3 −0.2 0.125 2 1 y 3 (t) −0.4 f(t) y 1 (t) y 2 (t) y 3 (t)

Covariance for ODE Model ◮ Exponentiated Quadratic Covariance function for f ( t ) � t q 1 � x j ( t ) = s ji exp( − α j t ) f i ( τ ) exp( α j τ ) sin( ω j ( t − τ )) d τ m j ω j 0 i =1 ◮ Joint distribution f(t) 0.8 for x 1 ( t ), x 2 ( t ), 0.6 y 1 (t) x 3 ( t ) and f ( t ). 0.4 0.2 Damping ratios: y 2 (t) 0 ζ 1 ζ 2 ζ 3 −0.2 y 3 (t) 0.125 2 1 −0.4 f(t) y 1 (t) y 2 (t) y 3 (t)

Joint Sampling of x ( t ) and f ( t ) ◮ lfmSample 1.5 1 0.5 0 −0.5 −1 −1.5 −2 50 55 60 65 70 Figure: Joint samples from the ODE covariance, black : f ( t ), red : x 1 ( t ) (underdamped), green : x 2 ( t ) (overdamped), and blue : x 3 ( t ) (critically damped).

Joint Sampling of x ( t ) and f ( t ) ◮ lfmSample 1.5 2 1 1.5 0.5 1 0 0.5 −0.5 0 −1 −0.5 −1.5 −1 −2 50 50 55 55 60 60 65 65 70 70 Figure: Joint samples from the ODE covariance, black : f ( t ), red : x 1 ( t ) (underdamped), green : x 2 ( t ) (overdamped), and blue : x 3 ( t ) (critically damped).

Joint Sampling of x ( t ) and f ( t ) ◮ lfmSample 1.5 2.5 2 2 1 1.5 1.5 0.5 1 1 0.5 0 0.5 0 −0.5 −0.5 0 −1 −1 −1.5 −0.5 −1.5 −2 −2.5 −2 −1 50 50 50 55 55 55 60 60 60 65 65 65 70 70 70 Figure: Joint samples from the ODE covariance, black : f ( t ), red : x 1 ( t ) (underdamped), green : x 2 ( t ) (overdamped), and blue : x 3 ( t ) (critically damped).

Joint Sampling of x ( t ) and f ( t ) ◮ lfmSample 2.5 1.5 2 2 2 1.5 1 1.5 1.5 1 0.5 1 0.5 1 0.5 0 0 0.5 0 −0.5 −0.5 −0.5 −1 0 −1 −1 −1.5 −1.5 −0.5 −1.5 −2 −2 −2.5 −2.5 −2 −1 50 50 50 50 55 55 55 55 60 60 60 60 65 65 65 65 70 70 70 70 Figure: Joint samples from the ODE covariance, black : f ( t ), red : x 1 ( t ) (underdamped), green : x 2 ( t ) (overdamped), and blue : x 3 ( t ) (critically damped).

Latent Force Models Neil D. Lawrence (work with Magnus Rattray, - PowerPoint PPT Presentation

Latent Force Models Neil D. Lawrence (work with Magnus Rattray, Mauricio Alvarez , Pei Gao, Antti Honkela, David Luengo, Guido Sanguinetti, Michalis Titsias, Jennifer Withers) University of Sheffield University of Edinburgh Bayes 250

1 Latent variable models In the next section we will discuss latent variable models for

Part III: Latent Tree Models Le Song ICML 2012 Tutorial on Spectral Algorithms for Latent

Latent Force Models with Gaussian Processes Neil D. Lawrence Bayesian Research Kitchen,

Latent Variable Models CS3750 Xiaoting Li 1 Out utli line Latent Variable Models

Learning Overcomplete Latent Variable Models through Tensor Methods Anima Anandkumar UC Irvine

Latent Class Models: The Latent Class Logit Model Accouting for unobserved heterogeneity:

Pengtao Xie Joint work with Yuntian Deng and Eric Xing Carnegie Mellon University 1 Latent

Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 6 Stefano

C unobserved construct (e.g. Disordered v. Non- Disordered) Latent classes are mutually

Learning Latent Variable Models through Tensor Methods Anima Anandkumar U.C. Irvine Challenges

Models for Retrieval Models for Retrieval 1. HMM/N-gram-based 2. Latent Semantic Indexing (LSI)

Bond Task Force Draft Bond Task Force Recommendations Tuesday, February 27 , 2018 Bond Task

Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model CS330

Optimization-Based Model Fitting for Latent Class and Latent Profile Analyses Guan-Hua Huang,

Latent Damage and Reliability in Semiconductor Devices May1625 - Advisor & Client: Dr. Randy

ZEB1 Regulates the Latent- -Lytic Lytic Switch Switch ZEB1 Regulates the Latent in Infection

Spherical Designs and Determinantal Point Processes Masatake HIRAO ( ) (Aichi

sr t rtt

Course on Inverse Problems Albert Tarantola Lesson VIII: Monte Carlo Methods Monte Carlo

More on kernels Marcel Lthi Graphics and Vision Research Group Department of Mathematics and

Standards for Knowledge Domain Representation Knowledge domain ontologies Existing

The Gibbs Sampler CSE 527 Lecture 9 Lawrence, et al. Detecting Subtle Sequence

Multi-parameter models - Gibbs Sampling Applied Bayesian Statistics Dr. Earvin Balderama

Gibbs Sampling from -Determinantal Point Processes Alireza Rezaei University of Washington

Latent Force Models Neil D. Lawrence (work with Magnus Rattray, - PowerPoint PPT Presentation

Latent Force Models Neil D. Lawrence (work with Magnus Rattray, Mauricio Alvarez , Pei Gao, Antti Honkela, David Luengo, Guido Sanguinetti, Michalis Titsias, Jennifer Withers) University of Sheffield University of Edinburgh Bayes 250

1 Latent variable models In the next section we will discuss latent variable models for

Part III: Latent Tree Models Le Song ICML 2012 Tutorial on Spectral Algorithms for Latent

Latent Force Models with Gaussian Processes Neil D. Lawrence Bayesian Research Kitchen,

Latent Variable Models CS3750 Xiaoting Li 1 Out utli line Latent Variable Models

Learning Overcomplete Latent Variable Models through Tensor Methods Anima Anandkumar UC Irvine

Latent Class Models: The Latent Class Logit Model Accouting for unobserved heterogeneity:

Pengtao Xie Joint work with Yuntian Deng and Eric Xing Carnegie Mellon University 1 Latent

Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 6 Stefano

C unobserved construct (e.g. Disordered v. Non- Disordered) Latent classes are mutually

Learning Latent Variable Models through Tensor Methods Anima Anandkumar U.C. Irvine Challenges

Models for Retrieval Models for Retrieval 1. HMM/N-gram-based 2. Latent Semantic Indexing (LSI)

Bond Task Force Draft Bond Task Force Recommendations Tuesday, February 27 , 2018 Bond Task

Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model CS330

Optimization-Based Model Fitting for Latent Class and Latent Profile Analyses Guan-Hua Huang,

Latent Damage and Reliability in Semiconductor Devices May1625 - Advisor &amp; Client: Dr. Randy

ZEB1 Regulates the Latent- -Lytic Lytic Switch Switch ZEB1 Regulates the Latent in Infection

Spherical Designs and Determinantal Point Processes Masatake HIRAO ( ) (Aichi

sr t rtt

Course on Inverse Problems Albert Tarantola Lesson VIII: Monte Carlo Methods Monte Carlo

More on kernels Marcel Lthi Graphics and Vision Research Group Department of Mathematics and

Standards for Knowledge Domain Representation Knowledge domain ontologies Existing

The Gibbs Sampler CSE 527 Lecture 9 Lawrence, et al. Detecting Subtle Sequence

Multi-parameter models - Gibbs Sampling Applied Bayesian Statistics Dr. Earvin Balderama

Gibbs Sampling from -Determinantal Point Processes Alireza Rezaei University of Washington

Latent Damage and Reliability in Semiconductor Devices May1625 - Advisor & Client: Dr. Randy