Uncertainty in compositional models of alignment

Ieva Kazlauskaite (University of Bath), Neill D.F. Campbell (University of Bath), Carl Henrik Ek (University of Bristol), Ivan Ustyuzhaninov (University of Tübingen), Tom Waterson (Electronic Arts)


1. Uncertainty in compositional models of alignment. Ieva Kazlauskaite (University of Bath), Neill D.F. Campbell (University of Bath), Carl Henrik Ek (University of Bristol), Ivan Ustyuzhaninov (University of Tübingen), Tom Waterson (Electronic Arts). September 2019.

2. Motivation
Data:
• Motion capture sequences, e.g. a jump or a golf swing.
• Each motion corresponds to a different style or mood.
Goal: generate new motions by interpolating between the captured clips.
Pre-processing: the clips need to be temporally aligned.

3. Motivation
Assume we are given some time-series data with inputs x ∈ R^N and J output sequences {y_j ∈ R^N}. We know that there are multiple underlying functions that generated this data, say K such functions f_k(·), and the observed data was generated by warping the inputs to the true functions using some warping function g_j(x), such that:

    y_j = f_k(g_j(x)) + noise.   (1)

Two groups of unknowns (to be found automatically): the warps and the latent functions.
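A minimal sketch of this generative process, assuming hypothetical latent functions and a simple quadratic warp family (neither is specified in the slides):

```python
# Simulate J sequences, each a warped version of one of K latent
# functions: y_j = f_k(g_j(x)) + noise.
import numpy as np

rng = np.random.default_rng(0)
N, J, K = 100, 6, 2
x = np.linspace(0, 1, N)

# Hypothetical latent functions f_k (assumed for illustration).
latent = [lambda t: np.sin(2 * np.pi * t),
          lambda t: np.sin(4 * np.pi * t) * (1 - t)]

def random_warp(x, a):
    """Monotonic warp of [0, 1] onto itself; a controls its strength."""
    return (1 - a) * x + a * x ** 2

Y = np.empty((J, N))
for j in range(J):
    k = rng.integers(K)                          # unknown cluster label
    g = random_warp(x, rng.uniform(-0.5, 0.5))   # unknown warp g_j
    Y[j] = latent[k](g) + 0.05 * rng.standard_normal(N)
```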

4. Motivation
Unknowns:
• Number of underlying functions K
• Underlying functions f_k(·)
• Warps g_j(·) for each sequence
[Figure: the observed data sequences]

5. Motivation
Let's try to find K using K-means clustering:
[Figure: K-means initialisation with 2 clusters and with 3 clusters]

6. Motivation
K-means clustering vs. correct labels:
[Figure: K-means initialisation with 2 and 3 clusters, alongside the correct clustering of inputs]

7. Motivation
A PCA scatter plot of the data:
[Figure: PCA initialisation with correct labels]

8. Alignment model
Three constituent parts:
• Model of transformations (warps), g_j
• Model of sequences, f_k
• Alignment objective

9. Model of transformations (warps)
• Parametric warps, with weights constrained to the simplex: Σ_{i∈I} w_i = 1, w_i ≥ 0 ∀ i ∈ I.
• Nonparametric warps, for example monotonic GPs.
In general, we prefer warps that are close to the identity; a sketch of a parametric warp family follows below.
[Figure: observed sequences and an example warp g(x)]
Riihimäki & Vehtari. Gaussian processes with monotonicity information (2010)
Kazlauskaite et al. Monotonic Gaussian Process Flow (2019)
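One way to realise such a parametric family (the basis functions here are an assumption, not taken from the slides) is a convex combination of fixed monotonic basis warps; a softmax keeps the weights on the simplex, so the mixture stays monotonic:

```python
import numpy as np

def warp(x, raw_w):
    """Warp x in [0, 1] using unconstrained parameters raw_w."""
    w = np.exp(raw_w) / np.exp(raw_w).sum()   # simplex: sums to 1, >= 0
    basis = np.stack([x,                      # identity warp
                      x ** 2,                 # slow start, fast finish
                      1 - (1 - x) ** 2])      # fast start, slow finish
    return w @ basis                          # monotone convex mixture

x = np.linspace(0, 1, 100)
g = warp(x, np.zeros(3))   # equal weights: a warp close to the identity
```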

10. Model of sequences
Option 1: interpolate the sequences using linear interpolation or splines.
Option 2: fit GPs to the sequences.
• a principled way to handle observational noise
• can impose priors on f_k
[Figure: observed sequences and the corresponding GP regression fits]
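A minimal GP-regression sketch for a single sequence (numpy only, RBF kernel; the hyperparameter values are placeholders rather than learned):

```python
import numpy as np

def rbf(a, b, ell=0.1, var=1.0):
    """RBF kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(x, y, x_star, beta=100.0):
    """Posterior mean and covariance of f at x_star, given y = f(x) + eps."""
    K = rbf(x, x) + np.eye(len(x)) / beta     # noisy kernel matrix
    Ks = rbf(x, x_star)
    mean = Ks.T @ np.linalg.solve(K, y)
    cov = rbf(x_star, x_star) - Ks.T @ np.linalg.solve(K, Ks)
    return mean, cov
```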

11. Notation
Assume that the observed data was generated as:

    y_j = f_k(g_j(x)) + ε_j,   ε_j ∼ N(0, β_j⁻¹)   (2)

where x are fixed, linearly spaced input locations (or evenly sampled time). Then the corresponding aligned sequences are:

    s_j := f_k(x)   (3)

The joint conditional likelihood, with warped inputs G_j = g_j(X), is:

    p([s_j, y_j]ᵀ | G_j, X, θ_j) = N( 0, [ k_{θ_j}(X, X)     k_{θ_j}(X, G_j)
                                           k_{θ_j}(G_j, X)   k_{θ_j}(G_j, G_j) + β_j⁻¹ I ] )   (4)
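The block covariance in (4) can be assembled directly; a sketch, reusing the `rbf` kernel from the previous sketch as `kern`:

```python
import numpy as np

def joint_cov(X, G, kern, beta):
    """Covariance of [s_j; y_j]: pseudo-observations at the even grid X,
    noisy observations at the warped inputs G = g_j(X)."""
    top = np.hstack([kern(X, X), kern(X, G)])
    bottom = np.hstack([kern(G, X), kern(G, G) + np.eye(len(G)) / beta])
    return np.vstack([top, bottom])
```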

12. Model of sequences
[Diagram: pseudo-observations S at evenly spaced inputs X; observations Y at warped inputs g(X)]
Then the goal is to:
• Fit GPs to the observations and pseudo-observations {[g(X), X], [Y, S]} for each sequence
• Impose the alignment constraint on the pseudo-observations {X, S}

13. Alignment objective
We want an alignment objective that:
• infers the number of clusters (underlying functions) K
• aligns the sequences within these clusters
We aim to design a clustering or dimensionality-reduction objective that is invariant to the transformations (warps) of the inputs.

14. Pairwise distance alignment objective
Minimise the pairwise distance between all sequences (irrespective of the underlying clusters of functions):

    L = Σ_{n=1}^{J} Σ_{m=n+1}^{J} || s_n(x) − s_m(x) ||²   (5)

[Figure: warps and aligned functions; complexity 1.845, alignment error 1.735]
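Equation (5) in code, for the aligned sequences stacked row-wise in a (J, N) array:

```python
import numpy as np

def pairwise_alignment_loss(S):
    """Sum of squared distances over all pairs of aligned sequences."""
    J = len(S)
    return sum(np.sum((S[n] - S[m]) ** 2)
               for n in range(J) for m in range(n + 1, J))
```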

15. Traditional GP-LVM
• Observe high-dimensional data S.
• Find a low-dimensional representation Z that captures the structure of S.
• Find a mapping h from Z to S.
[Diagram: latent space Z, mapping h, inputs S]

    s_j = h(z_j, θ) + noise,

where θ are the parameters of h.

16. Traditional GP-LVM
In a GP-LVM, the GPs are taken to be independent across the features and the likelihood function is:

    p(S | Z) = ∏_{d=1}^{D} p(s_d | Z) = ∏_{d=1}^{D} N(s_d | 0, K + γ⁻¹ I)   (6)

[Figure: observed data Y and aligned data S in matrix form]

17. GP-LVM as alignment objective
We impose the alignment objective by learning a low-dimensional representation Z of the pseudo-observations S.

    L_GP-LVM = log p(S | Z, θ_h, θ_z, β)
             = −(N/2) log |K_zz| − (1/2) Tr(K_zz⁻¹ S Sᵀ)     [complexity term; data-fitting term]
               + log p(Z | θ_z) + log p(θ_h) + const          [prior over latent variables; prior over GP mappings]   (7)

As an alignment objective, it is controlled by:
1. the prior over the latent variables Z, p(Z) ∼ N(0, θ_z I)
2. the lengthscale in the GP-LVM mapping (part of θ_h)
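A sketch of the terms in (7) for a one-dimensional latent space, with `kern` again standing in for a kernel such as the `rbf` above (the jitter and the omitted hyperprior over θ_h are simplifications):

```python
import numpy as np

def gplvm_objective(Z, S, kern, theta_z=1.0):
    """Complexity + data-fit terms of (7), plus the N(0, theta_z I)
    prior over the latent locations Z; S is (J, N), K_zz is (J, J)."""
    K = kern(Z, Z) + 1e-6 * np.eye(len(Z))    # jitter for stability
    N = S.shape[1]
    complexity = -0.5 * N * np.linalg.slogdet(K)[1]
    data_fit = -0.5 * np.trace(np.linalg.solve(K, S @ S.T))
    prior_z = -0.5 * np.sum(Z ** 2) / theta_z
    return complexity + data_fit + prior_z
```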

18. Aside: Pairwise distance alignment objective
Toy example: y_i^transformed = y_i^input + w_i, with y_i, w_i ∈ R⁸ and penalty γ ||w_i||², i = 1, 2, 3, 4.
[Figure: observations and the resulting alignments for γ ∈ {0.000, 0.100, 0.464, 2.154, 10.000}]

19. Aside: GP-LVM as alignment objective
The same toy example (y_i^transformed = y_i^input + w_i, y_i, w_i ∈ R⁸, penalty γ ||w_i||², i = 1, 2, 3, 4) under the GP-LVM objective.
[Figure: observations and the resulting alignments for γ ∈ {0.000, 0.100, 0.464, 2.154, 10.000}]

20. Aside: Bayesian Mixture Model as alignment objective
The same toy example under a Bayesian mixture-model objective.
[Figure: observations, alignments, and cluster assignments for γ ∈ {0.000, 0.100, 0.464, 2.154, 10.000}]

21. Full objective for sequence alignment
1. For each of the J sequences we perform standard GP regression on the observed data y_j and the pseudo-observations s_j, learning the hyperparameters of the GPs and the parameters of the warps.
2. We impose the alignment objective on the pseudo-observations S.
The sum of the log-likelihoods is:

    L = Σ_{j=1}^{J} L_GP_j + L_GP-LVM + Σ_{j=1}^{J} log p(g_j)
      = Σ_{j=1}^{J} log p([s_j, y_j]ᵀ | x, g_j, θ_j, β_j) + L_GP-LVM(Z, ψ_h, ψ_z, γ) + Σ_{j=1}^{J} log p(g_j)   (8)
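How the terms of (8) compose, reusing `joint_cov` and `gplvm_objective` from the sketches above (the warp prior `log_p_warp` is a placeholder, a scalar noise precision is shared across sequences, and additive constants are dropped):

```python
import numpy as np

def full_objective(X, warps, S, Y, Z, kern, beta, log_p_warp):
    """Per-sequence GP log-likelihoods on [s_j, y_j], plus the GP-LVM
    alignment term, plus the warp priors."""
    total = 0.0
    for j in range(len(Y)):
        G = warps[j](X)                        # warped inputs g_j(X)
        C = joint_cov(X, G, kern, beta)        # block matrix from (4)
        v = np.concatenate([S[j], Y[j]])
        total += -0.5 * (np.linalg.slogdet(C)[1]
                         + v @ np.linalg.solve(C, v))
        total += log_p_warp(warps[j])
    return total + gplvm_objective(Z, S, kern)
```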

22. Results on ECG data
Input data and the alignment obtained with the GP-LVM objective.
[Figure: warps, aligned functions, and manifold locations; complexity 6.447, alignment error 0.411]

23. Competing objectives and joint model
[Graphical model: inputs X and latents Z; warps g, latent functions f, GP-LVM mapping h; noise precisions β and γ; pseudo-observations S and observations Y; plate over the J sequences]

24. Competing objectives and joint model
[Graphical model as on the previous slide]
Likelihood p(S | H, F_X) as an equal mixture (where S_j and S^n refer to the rows and columns of S, respectively):

    p(S | H, F_X) = (1/2) [ Σ_n N(S^n | H^n, γ⁻¹ I_J) + Σ_j N(S_j | F_{X_j}, β_j⁻¹ I_N) ]
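A literal transcription of this mixture into code (the isotropic Gaussian density is written out explicitly; shapes follow S ∈ R^{J×N}):

```python
import numpy as np

def gauss(x, mean, prec):
    """Isotropic Gaussian density N(x | mean, prec^{-1} I)."""
    d = x - mean
    return (prec / (2 * np.pi)) ** (d.size / 2) * np.exp(-0.5 * prec * d @ d)

def mixture_likelihood(S, H, FX, gamma, beta):
    """Equal mixture: columns S^n explained by the GP-LVM mapping H,
    rows S_j explained by the warped sequence GPs F_X."""
    cols = sum(gauss(S[:, n], H[:, n], gamma) for n in range(S.shape[1]))
    rows = sum(gauss(S[j], FX[j], beta[j]) for j in range(S.shape[0]))
    return 0.5 * (cols + rows)
```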

25. Multi-task learning and matrix distributions
Given data Y ∈ R^{J×N}:
1. each sequence (row) has a GP prior, and a free-form matrix C models the covariances between the sequences¹;
2. learn a sparse inverse covariance between features while accounting for a low-rank confounding covariance between samples using a GP-LVM²:

    p(Y | R, C⁻¹) = N(vec(Y) | 0_{N×D}, C ⊗ R + σ² I_{N×D})   (9)

¹ Bonilla et al. Multi-task Gaussian Process Prediction (2008)
² Stegle et al. Efficient inference in matrix-variate Gaussian models with iid observation noise (2011)
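A sketch of the Kronecker-structured log-likelihood in (9). The conventions here are an assumption: C (J×J) couples sequences, R (N×N) couples samples, and Y is flattened row-major so that cov(vec(Y)) = C ⊗ R + σ²I:

```python
import numpy as np

def matrix_gaussian_loglik(Y, R, C, sigma2):
    """Log-density of Y under N(vec(Y) | 0, C kron R + sigma^2 I)."""
    y = Y.ravel()                                  # row-major vec(Y)
    cov = np.kron(C, R) + sigma2 * np.eye(y.size)  # Kronecker + noise
    sign, logdet = np.linalg.slogdet(cov)
    return -0.5 * (logdet + y @ np.linalg.solve(cov, y)
                   + y.size * np.log(2 * np.pi))
```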

26. More generally...
These types of constructions are useful when:
1. the data has a hierarchical structure with additional constraints:

    y_j = f_k(g_j(x)) + ε_j,   ε_j ∼ N(0, β_j⁻¹)

2. we want to perform dimensionality reduction or clustering that is invariant to a specific transformation.

27. Uncertainty in alignment model

28. Uncertainty in alignment model
While the alignment model is probabilistic, so far we have only considered point estimates, ignoring the uncertainties associated with the warps and the group assignments. Uncertainty in the alignment model arises from three sources:
1. the observed sequences are often noisy
2. the warps are uncertain
3. the assignment of sequences to groups is ambiguous
