

SLIDE 1

Uncertainty in compositional models of alignment

Ieva Kazlauskaite, University of Bath
Neill D. F. Campbell, University of Bath
Carl Henrik Ek, University of Bristol
Ivan Ustyuzhaninov, University of Tübingen
Tom Waterson, Electronic Arts

September 2019

SLIDE 2

Motivation

Data:

  • Motion capture sequences, e.g. a jump or a golf swing.
  • Each motion corresponds to a different style or mood.

Goal: Generate new motions by interpolating between the captured clips.
Pre-processing: The clips need to be temporally aligned.

SLIDE 3

Motivation

Assume we are given some time-series data with inputs x ∈ R^N and J output sequences {yj ∈ R^N}. We know that there are multiple underlying functions that generated this data, say K such functions fk(·), and that the observed data was generated by warping the inputs to the true functions using some warping function gj(x), such that:

yj = fk(gj(x)) + noise   (1)

There are two groups of unknowns, to be found automatically: the warps gj and the latent functions fk.
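
To make Eq. (1) concrete, here is a minimal numpy sketch of the generative process; the two latent shapes (a sine and a triangle wave) and the power-law warps gj(x) = x^a are hypothetical choices for illustration, not the data used in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, K = 100, 6, 2
x = np.linspace(0, 1, N)                # shared, evenly spaced inputs

# Two hypothetical latent functions f_k (the "true" cluster shapes).
f = [lambda t: np.sin(2 * np.pi * t),
     lambda t: 1.0 - 2.0 * np.abs(t - 0.5)]

Y = np.zeros((J, N))
for j in range(J):
    k = j % K                           # ground-truth cluster of sequence j
    a = rng.uniform(0.5, 2.0)           # monotonic power-law warp g_j(x) = x**a
    Y[j] = f[k](x ** a) + 0.05 * rng.standard_normal(N)   # y_j = f_k(g_j(x)) + noise
```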

SLIDE 4

Motivation

Unknowns:

  • Number of underlying functions K
  • Underlying functions fk(·)
  • Warps gj(·) for each sequence

[Figure: the observed data]

SLIDE 5

Motivation

Let’s try to find K using K-means clustering:

[Figures: K-means initialisation with 2 clusters; K-means initialisation with 3 clusters]

SLIDE 6

Motivation

K-means clustering vs. correct labels:

[Figures: K-means initialisation with 2 clusters; K-means initialisation with 3 clusters; correct clustering of inputs]

SLIDE 7

Motivation

A PCA scatter plot of the data:

[Figure: PCA initialisation with correct labels]
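
A short scikit-learn sketch of these diagnostics on synthetic sequences generated as in Eq. (1) (the shapes and warps are again illustrative assumptions); because the random warps dominate the Euclidean distances between sequences, both the K-means labels and the PCA coordinates tend to reflect the warps rather than the underlying functions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
# Six sequences from two underlying shapes, each under a random power-law warp.
Y = np.stack([np.sin(2 * np.pi * x ** rng.uniform(0.5, 2.0)) if j % 2 == 0
              else 1.0 - 2.0 * np.abs(x ** rng.uniform(0.5, 2.0) - 0.5)
              for j in range(6)])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Y)
coords = PCA(n_components=2).fit_transform(Y)   # 2-D scatter of the sequences
```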

SLIDE 8

Alignment model

Three constituent parts:

  • Model of transformations (warps), gj
  • Model of sequences, fk
  • Alignment objective

SLIDE 9

Model of transformations (warps)

[Figures: observed sequences; an example warp x ↦ g(x)]

  • Parametric warps, with weights satisfying Σ_{i∈I} wi = 1, wi ≥ 0 ∀ i ∈ I (see the sketch below).
  • Nonparametric warps, for example monotonic GPs.

In general, we prefer warps that are close to the identity.

Riihimäki & Vehtari. Gaussian Processes with Monotonicity Information (2010)
K. et al. Monotonic Gaussian Process Flow (2019)
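
As a concrete instance of the parametric option, here is a minimal numpy sketch of one common construction, an assumption for illustration rather than necessarily the talk's exact parameterisation: a piecewise-linear warp whose softmax-normalised increments satisfy wi ≥ 0 and Σi wi = 1, making it monotonic with g(0) = 0 and g(1) = 1:

```python
import numpy as np

def parametric_warp(x, w):
    """Piecewise-linear monotonic warp on [0, 1].

    The unconstrained parameters w are softmax-normalised into increments
    w_i >= 0 with sum_i w_i = 1, so the warp is monotonic and endpoint-preserving.
    """
    inc = np.exp(w - w.max())
    inc /= inc.sum()
    knots = np.concatenate([[0.0], np.cumsum(inc)])
    grid = np.linspace(0.0, 1.0, len(knots))
    return np.interp(x, grid, knots)

x = np.linspace(0, 1, 100)
g = parametric_warp(x, np.random.default_rng(1).standard_normal(10))  # example warp
```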

SLIDE 10

Model of sequences

Option 1: interpolate sequences using linear interpolation or splines.
Option 2: fit GPs to the sequences.

  • principled way to handle observational noise
  • can impose priors on fk

[Figures: observed sequences; GP regression fits]
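
A minimal sketch of Option 2 using scikit-learn; the RBF kernel and initial noise level are illustrative assumptions:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)[:, None]
y = np.sin(2 * np.pi * x[:, 0]) + 0.05 * rng.standard_normal(100)

# RBF prior over f_k plus a WhiteKernel that learns the observation noise.
gp = GaussianProcessRegressor(kernel=RBF(0.1) + WhiteKernel(0.01)).fit(x, y)
mean, std = gp.predict(x, return_std=True)
```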

SLIDE 11

Notation

Assume that the observed data was generated as:

yj = fk(gj(x)) + εj,  εj ∼ N(0, βj⁻¹)   (2)

where x are fixed, linearly spaced input locations (or evenly sampled time). Then the corresponding aligned sequences are:

sj := fk(x)   (3)

The joint conditional likelihood is:

p([sj, yj]ᵀ | Gj, Xj, θj) ∼ N(0, [kθj(X, X), kθj(X, Gj); kθj(Gj, X), kθj(Gj, Gj) + βj⁻¹ I])   (4)
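
A small numpy sketch of the joint covariance in Eq. (4), assuming a squared-exponential kernel and a toy power-law warp; note that the noise precision βj enters only the observed block:

```python
import numpy as np

def k_rbf(a, b, ls=0.1):
    """Squared-exponential kernel k_theta on two 1-D input sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

x = np.linspace(0, 1, 50)     # evenly spaced inputs X
g = x ** 1.5                  # an example warp G_j = g_j(X)
beta = 100.0                  # noise precision beta_j

# Joint covariance of [s_j; y_j]: noise is added only where y_j is observed.
K = np.block([[k_rbf(x, x), k_rbf(x, g)],
              [k_rbf(g, x), k_rbf(g, g) + np.eye(len(x)) / beta]])
```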

SLIDE 12

Model of sequences

[Diagram: pseudo-observations S at evenly spaced inputs X; observations Y at warped inputs g(X)]

Then the goal is to:

  • Fit GPs to the observations and pseudo-observations {[g(X), X], [Y, S]} for each sequence
  • Impose the alignment constraint on the pseudo-observations {X, S}

SLIDE 13

Alignment objective

We want an alignment objective that:

  • infers the number of clusters (underlying functions) K
  • aligns the sequences within these clusters

We aim to design a clustering or dimensionality-reduction objective that is invariant to the transformations (warps) of the inputs.

SLIDE 14

Pairwise distance alignment objective

Minimise the pairwise distance between all sequences (irrespective of the underlying clusters of functions):

L = Σ_{n=1}^{J} Σ_{m=n+1}^{J} ‖sn(x) − sm(x)‖²   (5)

[Figures: warps (complexity: 1.845); aligned functions (alignment error: 1.735)]
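
Eq. (5) translates directly into code; a minimal numpy version, where S holds the J aligned sequences as rows:

```python
import numpy as np

def pairwise_alignment_loss(S):
    """Sum of squared distances between all pairs of aligned sequences, Eq. (5)."""
    J = S.shape[0]
    return sum(np.sum((S[n] - S[m]) ** 2)
               for n in range(J) for m in range(n + 1, J))

loss = pairwise_alignment_loss(np.random.default_rng(0).standard_normal((4, 100)))
```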

SLIDE 15

Traditional GP-LVM

  • Observe high-dimensional data S.
  • Find a low-dimensional representation Z that captures the structure of S.
  • Find a mapping h from Z to S.

[Diagram: latent space Z, mapping h, inputs S]

sj = h(zj, θ) + noise, where θ are the parameters of h.

SLIDE 16

Traditional GP-LVM

In a GP-LVM, the GPs are taken to be independent across the features, and the likelihood function is:

p(S | x) = Π_{d=1}^{D} p(sd | x) = Π_{d=1}^{D} N(sd | 0, K + γ⁻¹ I)   (6)

[Figures: observed data Y in matrix form; aligned data S in matrix form]
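
A minimal numpy/scipy sketch of Eq. (6), treating the D feature columns of S as independent zero-mean GP draws; K and gamma stand for the kernel matrix on the latent inputs and the noise precision:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gplvm_loglik(S, K, gamma):
    """log p(S | x) under Eq. (6): independent GPs across the D features."""
    cov = K + np.eye(K.shape[0]) / gamma
    return sum(multivariate_normal.logpdf(S[:, d], cov=cov)
               for d in range(S.shape[1]))
```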

SLIDE 17

GP-LVM as alignment objective

We impose the alignment objective by learning a low-dimensional representation Z of the pseudo-observations S:

L_GP-LVM = log p(S | Z, θh, θz, β)
         = −(N/2) log|Kzz| − (1/2) Tr(Kzz⁻¹ S Sᵀ) + log p(Z | θz) + log p(θh) + const   (7)

where −(N/2) log|Kzz| is the complexity term, −(1/2) Tr(Kzz⁻¹ S Sᵀ) is the data-fitting term, log p(Z | θz) is the prior over the latent variables, and log p(θh) is the prior over the GP mappings.

As an alignment objective, it is controlled by:

  1. the prior over the latent variables Z, p(Z) ∼ N(0, θz I)
  2. the lengthscale in the GP-LVM mapping (part of θh)

SLIDE 18

Aside: Pairwise distance alignment objective

[Figures: observations and pairwise-distance solutions for γ = 0.000, 0.100, 0.464, 2.154, 10.000]

yi^transformed = yi^input + wi, with yi, wi ∈ R^8 and penalty γ‖w‖², i = 1, 2, 3, 4

SLIDE 19

Aside: GP-LVM as alignment objective

[Figures: observations and GP-LVM solutions for γ = 0.000, 0.100, 0.464, 2.154, 10.000]

yi^transformed = yi^input + wi, with yi, wi ∈ R^8 and penalty γ‖w‖², i = 1, 2, 3, 4

SLIDE 20

Aside: Bayesian Mixture Model as alignment objective

[Figures: observations, and transformed sequences with cluster assignments (1, 2, 3), for γ = 0.000, 0.100, 0.464, 2.154, 10.000]

yi^transformed = yi^input + wi, with yi, wi ∈ R^8 and penalty γ‖w‖², i = 1, 2, 3, 4

SLIDE 21

Full objective for sequence alignment

  1. For each of the J sequences, perform standard GP regression on the observed data yj and the pseudo-observations sj by learning the hyperparameters of the GPs and the parameters of the warps.
  2. Impose the alignment objective on the pseudo-observations S.

The sum of the log-likelihoods is:

L = Σ_{j=1}^{J} L_GPj + L_GP-LVM + Σ_{j=1}^{J} log p(gj)
  = Σ_{j=1}^{J} log p([sj, yj]ᵀ | x, gj, θj, βj) + L_GP-LVM(Z, ψh, ψz, γ) + Σ_{j=1}^{J} log p(gj)   (8)
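
Schematically, Eq. (8) just adds up the pieces sketched on the earlier slides. The skeleton below is a hedged sketch of that composition, with the three terms passed in as callables rather than tied to any particular implementation:

```python
def alignment_objective(S, Y, warps, Z, joint_gp_loglik, gplvm_loglik, log_warp_prior):
    """Eq. (8): per-sequence GP fits + GP-LVM alignment term + warp priors."""
    fit = sum(joint_gp_loglik(S[j], Y[j], warps[j]) for j in range(len(warps)))
    return (fit
            + gplvm_loglik(S, Z)                        # L_GP-LVM
            + sum(log_warp_prior(g) for g in warps))    # sum_j log p(g_j)
```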

SLIDE 22

Results on ECG data

Input data and alignment with the GP-LVM objective:

[Figures: warps (complexity: 6.447); aligned functions (alignment error: 0.411); manifold locations]

SLIDE 23

Competing objectives and joint model

[Graphical model relating Y, f, g, X, S, h and Z, with noise precisions γ and β, plate over the J sequences]

SLIDE 24

Competing objectives and joint model

[Graphical model relating Y, f, g, X, S, h and Z, with noise precisions γ and β, plate over the J sequences]

Likelihood p(S | H, FX) as an equal mixture (where Sj and Sn refer to the rows and columns of S):

p(S | H, FX) = (1/2) [ Π_n N(Sn | Hn, γ⁻¹ IJ) + Π_j N(Sj | FXj, βj⁻¹ IN) ]

SLIDE 25

Multi-task learning and matrix distributions

Given data Y ∈ R^{J×N}:

  1. Each sequence (row) has a GP prior, and a free-form matrix C models the covariances between the sequences¹.
  2. Learn a sparse inverse covariance between features while accounting for a low-rank confounding covariance between samples using a GP-LVM²:

p(Y | R, C⁻¹) = N(vec(Y) | 0_{N×D}, C ⊗ R + σ² I_{N×D})   (9)

¹ Bonilla et al. Multi-task Gaussian Process Prediction (2008)
² Stegle et al. Efficient Inference in Matrix-variate Gaussian Models with iid Observation Noise (2011)
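
The Kronecker structure of Eq. (9) is straightforward to assemble in numpy; the covariances below are toy placeholders:

```python
import numpy as np

N, D = 20, 5
# Sample (row) covariance R, here an RBF kernel on an integer grid.
R = np.exp(-0.5 * (np.subtract.outer(np.arange(N), np.arange(N)) / 3.0) ** 2)
# Free-form column covariance C between the features.
C = 0.5 * np.eye(D) + 0.5 * np.ones((D, D))

Sigma = np.kron(C, R) + 0.01 * np.eye(N * D)   # covariance of vec(Y) in Eq. (9)
```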

SLIDE 26

More generally...

These types of constructions are useful when:

  1. The data has a hierarchical structure with additional constraints:
     yj = fk(gj(x)) + εj,  εj ∼ N(0, βj⁻¹)
  2. We want to perform dimensionality reduction or clustering that is invariant to a specific transformation.

SLIDE 27

Uncertainty in alignment model

SLIDE 28

Uncertainty in alignment model

While the alignment model is probabilistic, so far we have only considered point estimates and ignored the uncertainties associated with the warpings and the group assignments. Uncertainty in the alignment model arises because:

  1. the observed sequences are often noisy,
  2. the warpings are uncertain,
  3. the assignment of sequences to groups is ambiguous.

SLIDE 29

Uncertainty in alignment model

[Figures: true warps; input sequences]

SLIDE 30

Going beyond the point estimates of the warps

  • So far we have been computing point estimates of the warps (by optimising Gj directly).
  • To model warping uncertainty, we developed a nonparametric model² of monotonic warps based on the Gaussian process differential flow model¹.

[Figure: random samples from the prior]

¹ Hegde et al. Deep Learning with Differential Gaussian Process Flows (2019)
² K. et al. Monotonic Gaussian Process Flow (2019)

SLIDE 31

Fully probabilistic model - Mean-field

  • The composition of a warp (g-function) and a GP (f-function) is similar to a two-layer DGP.
  • Exact inference is also intractable, so we augment both layers with inducing points {Ug} and {Uf}.
  • The inducing points effectively define the mappings in each layer. If they are independent, the mappings do not match each other to fit the observations.

[Figures: observations; layer 1; layer 2]

Aside: Girard et al. Gaussian Process Priors With Uncertain Inputs (2013)

SLIDE 32

Beyond mean-field variational distribution

Use the optimal distribution of inducing points¹. Two components of the variational distribution:

  1. A free-form variational distribution q({Ug}) for the inducing points of the warp.
  2. For a given output G of the warp, we define q({Uf}) to be the optimal variational distribution¹ of the inducing points in a GP mapping G to the observations.

¹ M. Titsias. Variational Learning of Inducing Variables in Sparse Gaussian Processes (2009)

SLIDE 33

Beyond mean-field variational distribution

Use the optimal distribution of inducing points. Fitting the model:

  1. Sample {Ug} ∼ q({Ug}).
  2. Conditioned on this sample, sample (again) the output of the warps, G ∼ p(G | {Ug}).
  3. Conditioned on G, compute the optimal distribution of inducing points q({Uf}) and the likelihood p(Y | G) = ∫ p(Y | G, {Uf}) q({Uf}) dUf.

The only variational parameters to optimise are those of q({Ug}), which we can do by maximising p(Y | G) (using the reparametrisation trick).

Salimbeni & Deisenroth. Doubly Stochastic Variational Inference for Deep Gaussian Processes (2017)
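
A heavily simplified numpy sketch of steps 1 and 2 of this procedure, with a toy kernel and toy variational parameters, the monotonicity of the warp ignored, and step 3 (the closed-form optimal q({Uf}) of Titsias) left as a comment, purely to show the control flow:

```python
import numpy as np

rng = np.random.default_rng(0)

def k_rbf(a, b, ls=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

x = np.linspace(0, 1, 30)                 # warp-layer inputs
z = np.linspace(0, 1, 5)                  # inducing inputs of the warp layer
mu, L = np.zeros(5), 0.1 * np.eye(5)      # parameters of q({U_g})

# Step 1: sample inducing outputs via the reparametrisation trick.
u_g = mu + L @ rng.standard_normal(5)

# Step 2: conditioned on u_g, take the warp outputs G ~ p(G | {U_g});
# for brevity we use the conditional mean instead of a full sample.
Kzz = k_rbf(z, z) + 1e-6 * np.eye(5)
G = k_rbf(x, z) @ np.linalg.solve(Kzz, u_g)

# Step 3 (omitted): given G, compute Titsias' optimal q({U_f}) in closed
# form and evaluate the collapsed likelihood p(Y | G).
```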

SLIDE 34

2-layer DGP

Consider a 2-layer DGP where the first layer is monotonic:

[Figures: overall fit, warpings, and fit in warped coordinates, for the base model vs. the model with optimal inducing points]

SLIDE 35

Thank you

  • I. Kazlauskaite, C. H. Ek, N. D. F. Campbell. Gaussian Process Latent Variable Alignment Learning. AISTATS (2019)
  • I. Kazlauskaite, I. Ustyuzhaninov, C. H. Ek, N. D. F. Campbell. Sequence Alignment with Dirichlet Process Mixtures. Bayesian Nonparametrics Workshop at NIPS (2018)
  • I. Ustyuzhaninov*, I. Kazlauskaite*, C. H. Ek, N. D. F. Campbell. Monotonic Gaussian Process Flow. arXiv (2019)
  • I. Ustyuzhaninov*, I. Kazlauskaite*, M. Kaiser, E. Bodin, C. H. Ek, N. D. F. Campbell. Compositional Uncertainty in Deep Gaussian Processes. arXiv (2019)