Probabilistic Dimensionality Reduction
Neil D. Lawrence
University of Sheffield
Facebook, London, 14th April 2016
Outline
Probabilistic Linear Dimensionality Reduction
Non-Linear Probabilistic Dimensionality Reduction
Examples
Conclusions
Notation

q — dimension of latent/embedded space
p — dimension of data space
n — number of data points
data: Y = [y_{1,:}, …, y_{n,:}]⊤ = [y_{:,1}, …, y_{:,p}] ∈ ℜ^{n×p}
centred data: Ŷ = [ŷ_{1,:}, …, ŷ_{n,:}]⊤ = [ŷ_{:,1}, …, ŷ_{:,p}] ∈ ℜ^{n×p}, with ŷ_{i,:} = y_{i,:} − µ
latent variables: X = [x_{1,:}, …, x_{n,:}]⊤ = [x_{:,1}, …, x_{:,q}] ∈ ℜ^{n×q}
mapping matrix: W ∈ ℜ^{p×q}
a_{i,:} is a vector from the ith row of a given matrix A
a_{:,j} is a vector from the jth column of a given matrix A
Reading Notation

X and Y are design matrices.
◮ Data covariance given by (1/n)Ŷ⊤Ŷ:
  cov(Y) = (1/n) Σ_{i=1}^n ŷ_{i,:} ŷ_{i,:}⊤ = (1/n) Ŷ⊤Ŷ = S.
◮ Inner product matrix given by YY⊤:
  K = [k_{i,j}]_{i,j},  k_{i,j} = y_{i,:}⊤ y_{j,:}.
Linear Dimensionality Reduction

◮ Find a lower dimensional plane embedded in a higher dimensional space.
◮ The plane is described by the matrix W ∈ ℜ^{p×q}:
  y = Wx + µ

Figure: Mapping a two dimensional plane to a higher dimensional space in a linear way. Data are generated by corrupting points on the plane with noise.
Linear Dimensionality Reduction
Linear Latent Variable Model

◮ Represent data, Y, with a lower dimensional set of latent variables X.
◮ Assume a linear relationship of the form
  y_{i,:} = W x_{i,:} + ε_{i,:},  where ε_{i,:} ∼ N(0, σ²I).
Linear Latent Variable Model
Probabilistic PCA

◮ Define linear-Gaussian relationship between latent variables and data.
◮ Standard latent variable approach:
  ◮ Define Gaussian prior over latent space, X.
  ◮ Integrate out latent variables.

p(Y|X, W) = Π_{i=1}^n N(y_{i,:} | W x_{i,:}, σ²I)
p(X) = Π_{i=1}^n N(x_{i,:} | 0, I)
p(Y|W) = Π_{i=1}^n N(y_{i,:} | 0, WW⊤ + σ²I)
Computation of the Marginal Likelihood

y_{i,:} = W x_{i,:} + ε_{i,:},  x_{i,:} ∼ N(0, I),  ε_{i,:} ∼ N(0, σ²I)
W x_{i,:} ∼ N(0, WW⊤)
W x_{i,:} + ε_{i,:} ∼ N(0, WW⊤ + σ²I)
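The marginalisation can be checked numerically. Below is a minimal numpy sketch (my own, not from the slides; the sizes and noise level are illustrative assumptions) confirming that samples of W x_{i,:} + ε_{i,:} have covariance WW⊤ + σ²I:

import numpy as np

rng = np.random.default_rng(0)
p, q, n_samples, sigma2 = 5, 2, 200000, 0.1

W = rng.standard_normal((p, q))                             # mapping matrix
X = rng.standard_normal((n_samples, q))                     # x_i ~ N(0, I)
E = np.sqrt(sigma2) * rng.standard_normal((n_samples, p))   # eps_i ~ N(0, sigma^2 I)
Y = X @ W.T + E                                             # y_i = W x_i + eps_i

empirical = Y.T @ Y / n_samples                             # Monte Carlo estimate of cov(y)
analytic = W @ W.T + sigma2 * np.eye(p)                     # W W^T + sigma^2 I
print(np.max(np.abs(empirical - analytic)))                 # small, and shrinks as n_samples grows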
Linear Latent Variable Model II
Probabilistic PCA Max. Likelihood Soln (Tipping and Bishop, 1999)

p(Y|W) = Π_{i=1}^n N(y_{i,:} | 0, C),  C = WW⊤ + σ²I
log p(Y|W) = −(n/2) log|C| − (1/2) tr(C⁻¹ Y⊤Y) + const.

If U_q are the first q principal eigenvectors of n⁻¹Y⊤Y and the corresponding eigenvalues are Λ_q, then
  W = U_q L R⊤,  L = (Λ_q − σ²I)^{1/2},
where R is an arbitrary rotation matrix.
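A short numpy sketch of this closed-form solution (my own illustration, with R taken to be the identity; in the full solution σ² is the average of the discarded eigenvalues, here it is simply passed in):

import numpy as np

def ppca_ml_W(Y, q, sigma2):
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)               # centre the data
    S = Yc.T @ Yc / n                     # sample covariance n^{-1} Y^T Y
    eigvals, eigvecs = np.linalg.eigh(S)  # ascending order
    idx = np.argsort(eigvals)[::-1][:q]   # first q principal eigenvectors
    U_q, Lambda_q = eigvecs[:, idx], eigvals[idx]
    L = np.sqrt(np.maximum(Lambda_q - sigma2, 0.0))
    return U_q * L                        # W = U_q (Lambda_q - sigma^2 I)^{1/2}, R = I

Y = np.random.default_rng(1).standard_normal((100, 5))
W = ppca_ml_W(Y, q=2, sigma2=0.1)
print(W.shape)                            # (5, 2)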
Linear Latent Variable Model
Factor Analysis

◮ Linear-Gaussian relationship between latent variables and data,
  y_{i,:} = W x_{i,:} + µ + η_{i,:}.
◮ Now each η_{i,j} ∼ N(0, σ²_j) has a separate variance.
◮ Optimize the likelihood wrt W.

p(Ŷ|X, W) = Π_{i=1}^n N(ŷ_{i,:} | W x_{i,:}, D)
p(X) = Π_{i=1}^n N(x_{i,:} | 0, I)
p(Ŷ|W) = Π_{i=1}^n N(ŷ_{i,:} | 0, WW⊤ + D)

where D is diagonal with elements given by σ²_j.
Factor Analysis Optimization
◮ Optimization is more difficult: no longer an eigenvalue
problem.
Linear Latent Variable Model
Independent Component Analysis

◮ Linear-Gaussian relationship between latent variables and data,
  y_{i,:} = W x_{i,:} + µ + η_{i,:}.
◮ Now the latent variables are independent and non-Gaussian: x_{i,:} ∼ Π_{j=1}^q p(x_{i,j}).
◮ Optimize the likelihood wrt W.

p(Ŷ|X, W) = Π_{i=1}^n N(ŷ_{i,:} | W x_{i,:}, D)
p(X) = Π_{i=1}^n Π_{j=1}^q p(x_{i,j})
Independent Component Analysis Samples

◮ Rotational symmetry of the Gaussian is removed.

Figure: Independent variables which are Gaussian.
Figure: Independent variables which are super-Gaussian.
Figure: Independent variables which are sub-Gaussian.
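To make the point concrete, here is a small numpy sketch (my own illustration; the mixing matrix and distributions are arbitrary choices) drawing independent Gaussian, super-Gaussian and sub-Gaussian latents: after mixing, all three have the same covariance, and only the non-Gaussian cases break the rotational symmetry that ICA exploits.

import numpy as np

rng = np.random.default_rng(2)
n = 5000
latents = {
    "gaussian": rng.standard_normal((n, 2)),
    "super-gaussian": rng.laplace(size=(n, 2)) / np.sqrt(2.0),      # heavy tailed, unit variance
    "sub-gaussian": rng.uniform(-np.sqrt(3), np.sqrt(3), (n, 2)),   # light tailed, unit variance
}
W = np.array([[2.0, 0.5], [0.5, 1.0]])         # arbitrary mixing matrix
for name, X in latents.items():
    Y = X @ W.T
    print(name, np.cov(Y.T).round(2))          # second moments agree; higher moments differ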
Outline
Probabilistic Linear Dimensionality Reduction
Non-Linear Probabilistic Dimensionality Reduction
Examples
Conclusions
Motivation for Non-Linear Dimensionality Reduction
USPS Data Set Handwritten Digit

◮ 3648 dimensions (64 rows by 57 columns).
◮ The space contains more than just this digit.
◮ Even if we sample every nanosecond from now until the end of the universe, you won't see the original six!
Simple Model of Digit
Rotate a ’Prototype’
MATLAB Demo

demDigitsManifold([1 2], 'all')
Figure: data projected onto the first two principal components (PC no 1 vs PC no 2).

demDigitsManifold([1 2], 'sixnine')
Figure: data projected onto the first two principal components (PC no 1 vs PC no 2).
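demDigitsManifold is from the speaker's MATLAB toolbox; the following is only a rough Python analogue of the idea (a sketch of my own, using a stand-in image rather than the USPS six): rotate a prototype through 360 degrees, treat each rotated image as a high dimensional point, and project onto the first two principal components, which traces out a closed loop.

import numpy as np
from scipy.ndimage import rotate

prototype = np.zeros((32, 32))
prototype[8:24, 14:18] = 1.0          # stand-in 'digit': a vertical bar
angles = np.arange(0, 360, 4)
Y = np.stack([rotate(prototype, a, reshape=False).ravel() for a in angles])

Yc = Y - Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
pcs = Yc @ Vt[:2].T                   # coordinates on PC no 1 and PC no 2
print(pcs.shape)                      # (len(angles), 2)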
Low Dimensional Manifolds
Pure Rotation is too Simple
◮ In practice the data may undergo several distortions.
◮ e.g. digits undergo ‘thinning’, translation and rotation.
◮ For data with ‘structure’:
◮ we expect fewer distortions than dimensions; ◮ we therefore expect the data to live on a lower dimensional
manifold.
◮ Conclusion: deal with high dimensional data by looking
for lower dimensional non-linear embedding.
Linear Latent Variable Model III
Dual Probabilistic PCA

◮ Define linear-Gaussian relationship between latent variables and data.
◮ Novel latent variable approach:
  ◮ Define Gaussian prior over parameters, W.
  ◮ Integrate out parameters.

p(Y|X, W) = Π_{i=1}^n N(y_{i,:} | W x_{i,:}, σ²I)
p(W) = Π_{i=1}^p N(w_{i,:} | 0, I)
p(Y|X) = Π_{j=1}^p N(y_{:,j} | 0, XX⊤ + σ²I)
Computation of the Marginal Likelihood

y_{:,j} = X w_{:,j} + ε_{:,j},  w_{:,j} ∼ N(0, I),  ε_{:,j} ∼ N(0, σ²I)
X w_{:,j} ∼ N(0, XX⊤)
X w_{:,j} + ε_{:,j} ∼ N(0, XX⊤ + σ²I)
Linear Latent Variable Model IV
Dual Probabilistic PCA Max. Likelihood Soln (Lawrence, 2004, 2005)

p(Y|X) = Π_{j=1}^p N(y_{:,j} | 0, K),  K = XX⊤ + σ²I
log p(Y|X) = −(p/2) log|K| − (1/2) tr(K⁻¹ YY⊤) + const.

If U′_q are the first q principal eigenvectors of p⁻¹YY⊤ and the corresponding eigenvalues are Λ_q, then
  X = U′_q L R⊤,  L = (Λ_q − σ²I)^{1/2},
where R is an arbitrary rotation matrix.
Linear Latent Variable Model IV
PPCA Max. Likelihood Soln (Tipping and Bishop, 1999)

p(Y|W) = Π_{i=1}^n N(y_{i,:} | 0, C),  C = WW⊤ + σ²I
log p(Y|W) = −(n/2) log|C| − (1/2) tr(C⁻¹ Y⊤Y) + const.

If U_q are the first q principal eigenvectors of n⁻¹Y⊤Y and the corresponding eigenvalues are Λ_q, then
  W = U_q L R⊤,  L = (Λ_q − σ²I)^{1/2},
where R is an arbitrary rotation matrix.
Equivalence of Formulations
The Eigenvalue Problems are Equivalent

◮ Solution for Probabilistic PCA (solves for the mapping):
  Y⊤Y U_q = U_q Λ_q,  W = U_q L R⊤
◮ Solution for Dual Probabilistic PCA (solves for the latent positions):
  YY⊤ U′_q = U′_q Λ_q,  X = U′_q L R⊤
◮ Equivalence follows from
  U_q = Y⊤ U′_q Λ_q^{−1/2}
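A quick numerical check of the stated equivalence (my own sketch, with arbitrary data; eigenvectors are compared up to sign):

import numpy as np

rng = np.random.default_rng(3)
Y = rng.standard_normal((20, 7))
q = 3

evals_d, evecs_d = np.linalg.eigh(Y @ Y.T)       # dual problem, n x n
order = np.argsort(evals_d)[::-1][:q]
U_dual, Lam = evecs_d[:, order], evals_d[order]

U_from_dual = Y.T @ U_dual / np.sqrt(Lam)        # U_q = Y^T U'_q Lambda_q^{-1/2}

evals_p, evecs_p = np.linalg.eigh(Y.T @ Y)       # primal problem, p x p
order_p = np.argsort(evals_p)[::-1][:q]
print(np.allclose(np.abs(U_from_dual), np.abs(evecs_p[:, order_p]), atol=1e-6))  # True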
Gaussian Processes: Extremely Short Overview

[Figure slides illustrating Gaussian processes.]
Non-Linear Latent Variable Model
Dual Probabilistic PCA

◮ Inspection of the marginal likelihood shows...
◮ The covariance matrix is a covariance function.
◮ We recognise it as the 'linear kernel'.
◮ We call this the Gaussian Process Latent Variable Model (GP-LVM).

p(Y|X) = Π_{j=1}^p N(y_{:,j} | 0, K),  K = XX⊤ + σ²I

This is a product of Gaussian processes with linear kernels. Replace the linear kernel with a non-linear kernel for a non-linear model.
Non-linear Latent Variable Models
Exponentiated Quadratic (EQ) Covariance

◮ The EQ covariance has the form k_{i,j} = k(x_{i,:}, x_{j,:}), where
  k(x_{i,:}, x_{j,:}) = α exp(−‖x_{i,:} − x_{j,:}‖²₂ / (2ℓ²)).
◮ No longer possible to optimise wrt X via an eigenvalue problem.
◮ Instead find gradients with respect to X, α, ℓ and σ², and optimise using conjugate gradients.
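A compact sketch of fitting a GP-LVM with the EQ covariance (my own simplification: scipy's L-BFGS with finite-difference gradients stands in for the analytic gradients and conjugate gradients mentioned above; sizes are illustrative):

import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

def eq_kernel(X, alpha, lengthscale):
    d2 = cdist(X, X, "sqeuclidean")
    return alpha * np.exp(-0.5 * d2 / lengthscale**2)

def neg_log_lik(params, Y, q):
    n, p = Y.shape
    X = params[:n * q].reshape(n, q)
    alpha, ell, sigma2 = np.exp(params[n * q:])        # log-parameterised for positivity
    K = eq_kernel(X, alpha, ell) + sigma2 * np.eye(n)
    L = np.linalg.cholesky(K)
    Kinv_Y = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * p * logdet + 0.5 * np.sum(Y * Kinv_Y)  # -log p(Y|X) up to a constant

rng = np.random.default_rng(4)
Y = rng.standard_normal((30, 4))
q = 2
x0 = np.concatenate([0.1 * rng.standard_normal(30 * q), np.log([1.0, 1.0, 0.1])])
res = minimize(neg_log_lik, x0, args=(Y, q), method="L-BFGS-B")
X_learned = res.x[:30 * q].reshape(30, q)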
Outline
Probabilistic Linear Dimensionality Reduction
Non-Linear Probabilistic Dimensionality Reduction
Examples
Conclusions
Applications
Style Based Inverse Kinematics
◮ Facilitating animation through modeling human motion (Grochow et al., 2004)
Tracking
◮ Tracking using human motion models (Urtasun et al., 2005, 2006)
Assisted Animation
◮ Generalizing drawings for animation (Baxter and Anjyo, 2006)
Shape Models
◮ Inferring shape (e.g. pose from silhouette). (Ek et al., 2008b,a;
Prisacariu and Reid, 2011a,b)
Example: Latent Doodle Space
(Baxter and Anjyo, 2006)
Generalization with much less Data than Dimensions
◮ Powerful uncertainty handling of GPs leads to surprising
properties.
◮ Non-linear models can be used where there are fewer data
points than dimensions without overfitting.
Prior for Supervised Learning
(Urtasun and Darrell, 2007)

◮ We introduce a prior based on the Fisher criterion,
  p(X) ∝ exp(−(1/σ²_d) tr(S_w⁻¹ S_b)),
  with S_b the between-class matrix and S_w the within-class matrix:

  S_b = Σ_{i=1}^L (n_i/n) (M_i − M_0)(M_i − M_0)⊤
  S_w = Σ_{i=1}^L (n_i/n) (1/n_i) Σ_{k=1}^{n_i} (x_k^{(i)} − M_i)(x_k^{(i)} − M_i)⊤

  where X^{(i)} = [x_1^{(i)}, …, x_{n_i}^{(i)}] are the n_i training points of class i, M_i is the mean of the elements of class i, and M_0 is the mean of all the training points over all classes.
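A small numpy sketch of the ingredients of this prior (my own illustration; σ²_d and the toy data are arbitrary). It builds the between-class and within-class scatter matrices as defined above and evaluates the exponent of the unnormalised prior:

import numpy as np

def fisher_prior_exponent(X, labels, sigma_d2=1.0):
    n, d = X.shape
    M0 = X.mean(axis=0)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        ni, Mi = Xc.shape[0], Xc.mean(axis=0)
        Sb += (ni / n) * np.outer(Mi - M0, Mi - M0)
        Sw += (ni / n) * np.mean([np.outer(x - Mi, x - Mi) for x in Xc], axis=0)
    return -np.trace(np.linalg.solve(Sw, Sb)) / sigma_d2   # log p(X) up to a constant

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(3.0, 1.0, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
print(fisher_prior_exponent(X, labels))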
GaussianFace
(Lu and Tang, 2014)

◮ First system to surpass human performance on the cropped Labeled Faces in the Wild (LFW) data. http://tinyurl.com/nkt9a38
◮ Lots of feature engineering, followed by a Discriminative GP-LVM.

Figure: The ROC curve on LFW (true positive rate vs false positive rate). GaussianFace-FE + GaussianFace-BC (98.52%) achieves the best performance, beating human-level performance on cropped faces (97.53%, Kumar et al. 2009) as well as DeepFace-ensemble (97.35%), TL Joint Bayesian (96.33%), high-dimensional LBP (95.17%), Fisher Vector Faces (93.03%) and ConvNet-RBM (92.52%).

Figure: Examples of matched and mismatched pairs from LFW that were incorrectly classified by the GaussianFace model.
Continuous Character Control
(Levine et al., 2012)

◮ Graph diffusion prior for enforcing connectivity between motions:
  log p(X) = w_c Σ_{i,j} log K^d_{ij},
  with the graph diffusion kernel K^d obtained from K^d = exp(βH),
  where H = −T^{−1/2} L T^{−1/2} is the graph Laplacian, T is a diagonal matrix with T_{ii} = Σ_j w(x_i, x_j),
  L_{ij} = Σ_k w(x_i, x_k) if i = j, and −w(x_i, x_j) otherwise,
  and w(x_i, x_j) = ‖x_i − x_j‖^{−p} measures similarity.
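A sketch of the diffusion kernel construction (my own, assuming similarity exponent p = 2 and weight w_c = 1; scipy's matrix exponential computes exp(βH)):

import numpy as np
from scipy.linalg import expm
from scipy.spatial.distance import cdist

def diffusion_kernel(X, beta=1.0, p=2):
    D = cdist(X, X) + np.eye(len(X))      # identity on the diagonal avoids division by zero
    W = D ** (-float(p))                  # w(x_i, x_j) = ||x_i - x_j||^{-p}
    np.fill_diagonal(W, 0.0)
    t = W.sum(axis=1)                     # T_ii = sum_j w(x_i, x_j)
    L = np.diag(t) - W                    # graph Laplacian
    Tinv_sqrt = np.diag(1.0 / np.sqrt(t))
    H = -Tinv_sqrt @ L @ Tinv_sqrt        # H = -T^{-1/2} L T^{-1/2}
    return expm(beta * H)                 # K^d = exp(beta H)

X = np.random.default_rng(6).standard_normal((10, 2))
Kd = diffusion_kernel(X)
log_prior = np.sum(np.log(Kd))            # w_c * sum_{i,j} log K^d_{ij}, with w_c = 1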
Character Control: Results
GPLVM for Character Animation
◮ Learn a GPLVM from a small mocap sequence.
◮ Pose synthesis by solving an optimization problem:
  arg min_{x,y} − log p(y|x)  such that C(y) = 0
◮ These handle constraints may come from a user in an interactive session, or from a mocap system.
◮ Smooth the latent space by adding noise in order to reduce the number of local minima.
◮ Optimize in an annealed fashion over different annealed versions of the latent space.
Application: Replay same motion
(Grochow et al., 2004)
Application: Keyframing joint trajectories
(Grochow et al., 2004)
Application: Deal with missing data in mocap
(Grochow et al., 2004)
Application: Style Interpolation
(Grochow et al., 2004)
Shape Priors in Level Set Segmentation

◮ Represent contours with elliptic Fourier descriptors.
◮ Learn a GPLVM on the parameters of those descriptors.
◮ We can now generate closed contours from the latent space.
◮ Segmentation is done by non-linear minimization of an image-driven energy which is a function of the latent space.
GPLVM on Contours
[ V. Prisacariu and I. Reid, ICCV 2011]
Segmentation Results
[ V. Prisacariu and I. Reid, ICCV 2011]
Style-Content Separation and Multi-linear Models

Multiple aspects affect the input signal; it is interesting to factorize them.
Multilinear models

◮ Style-Content Separation (Tenenbaum and Freeman, 2000):
  y = Σ_{i,j} w_{i,j} a_i b_j + ε
◮ Multi-linear analysis (Vasilescu and Terzopoulos, 2002):
  y = Σ_{i,j,k,…} w_{i,j,k,…} a_i b_j c_k ⋯ + ε
◮ Non-linear basis functions (Elgammal and Lee, 2004):
  y = Σ_{i,j} w_{i,j} a_i φ_j(b) + ε
Multi (non)-linear models with GPs

◮ In the GPLVM,
  y = Σ_j w_j φ_j(x) + ε = w⊤Φ(x) + ε,
  with E[y y′] = Φ(x)⊤Φ(x′) + β⁻¹δ = k(x, x′) + β⁻¹δ.
◮ Multifactor Gaussian process:
  y = Σ_{i,j,k,…} w_{ijk⋯} φ_i^{(1)} φ_j^{(2)} φ_k^{(3)} ⋯ + ε,
  with E[y y′] = Π_m Φ^{(m)}(x^{(m)})⊤Φ^{(m)}(x^{(m)′}) + β⁻¹δ = Π_m k_m(x^{(m)}, x^{(m)′}) + β⁻¹δ.
◮ Learning in this model is the same; just the kernel changes.
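A sketch of that multifactor covariance (my own illustration; all three factor kernels are taken to be EQ kernels and the factor dimensions are arbitrary):

import numpy as np
from scipy.spatial.distance import cdist

def eq(X, ell=1.0):
    return np.exp(-0.5 * cdist(X, X, "sqeuclidean") / ell**2)

rng = np.random.default_rng(7)
n = 15
subject = rng.standard_normal((n, 2))    # x^(1): subject/style factor
gait = rng.standard_normal((n, 2))       # x^(2): gait/content factor
state = rng.standard_normal((n, 3))      # x^(3): pose state factor

beta_inv = 0.01
K = eq(subject) * eq(gait) * eq(state) + beta_inv * np.eye(n)   # product of per-factor kernels plus noise
print(np.all(np.linalg.eigvalsh(K) > 0))                         # a valid covariance matrix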
Training Data
Each training motion is a collection of poses, sharing the same combination of subject (s) and gait (g).
Character Animation
(Wang et al., 2007)
Training data, 6 sequences, 314 frames in total
Generating new styles for a subject
(Wang et al., 2007)
Interpolating Gaits
(Wang et al., 2007)
Generating Different Styles
(Wang et al., 2007)
Other Topics

◮ Dynamical models
◮ Hierarchical models
◮ Bayesian GP-LVM
◮ Deep GPs
Hierarchical GP-LVM
(Lawrence and Moore, 2007)
Stacking Gaussian Processes
◮ Regressive dynamics provides a simple hierarchy.
◮ The input space of the GP is governed by another GP.
◮ By stacking GPs we can consider more complex
hierarchies.
◮ Ideally we should marginalise latent spaces
◮ In practice we seek MAP solutions.
Two Correlated Subjects
(Lawrence and Moore, 2007)
Figure: Hierarchical model of a ’high five’.
Within Subject Hierarchy
(Lawrence and Moore, 2007)
Decomposition of Body
Figure: Decomposition of a subject.
Single Subject Run/Walk
(Lawrence and Moore, 2007)
Figure: Hierarchical model of a walk and a run.
Selecting Data Dimensionality
◮ GP-LVM Provides probabilistic non-linear dimensionality
reduction.
◮ How to select the dimensionality?
◮ Need to estimate the marginal likelihood.
◮ In the standard GP-LVM it increases with increasing q.
Integrate Mapping Function and Latent Variables
Bayesian GP-LVM

◮ Start with a standard GP-LVM.
◮ Apply the standard latent variable approach:
  ◮ Define Gaussian prior over latent space, X.
  ◮ Integrate out latent variables.
◮ Unfortunately the integration is intractable.

p(Y|X) = Π_{j=1}^p N(y_{:,j} | 0, K)
p(X) = Π_{j=1}^q N(x_{:,j} | 0, α_j⁻² I)
p(Y|α) = ??
Priors for Latent Space
Titsias and Lawrence (2010)
◮ Variational marginalization of X allows us to learn
parameters of p(X).
◮ In the standard GP-LVM, where X is learnt by MAP, this is not
possible (see e.g. Wang et al., 2008).
◮ First example: learn the dimensionality of latent space.
Graphical Representations of GP-LVM

The scale parameter can be placed on the prior for w, on the prior for x, or on individual dimensions of either:

  w ∼ N(0, αI),    x ∼ N(0, I),      y ∼ N(x⊤w, σ²)
  w ∼ N(0, I),     x ∼ N(0, αI),     y ∼ N(x⊤w, σ²)
  w ∼ N(0, I),     x_i ∼ N(0, α_i),  y ∼ N(x⊤w, σ²)
  w_i ∼ N(0, α_i), x ∼ N(0, I),      y ∼ N(x⊤w, σ²)
Non-linear f(x)

◮ In the linear case there is an equivalence because f(x) = w⊤x with
  p(w_i) = N(0, α_i).
◮ In the non-linear case we need to scale the columns of X in the prior for f(x).
◮ This implies scaling the columns of X in the covariance function,
  k(x_{i,:}, x_{j,:}) = exp(−(1/2)(x_{i,:} − x_{j,:})⊤ A (x_{i,:} − x_{j,:})),
  where A is diagonal with elements α²_i. Now keep the prior spherical,
  p(X) = Π_{j=1}^q N(x_{:,j} | 0, I).
◮ Covariance functions of this type are known as ARD (see e.g. Neal, 1996; MacKay, 2003; Rasmussen and Williams, 2006).
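A minimal sketch of such an ARD covariance (my own; the α values are arbitrary), where a near-zero α_i switches the corresponding latent dimension off:

import numpy as np

def ard_kernel(X, alphas):
    # k(x, x') = exp(-0.5 (x - x')^T A (x - x')), with A = diag(alpha_i^2)
    Xs = X * alphas                              # scale each column by alpha_i
    sq = np.sum(Xs**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Xs @ Xs.T
    return np.exp(-0.5 * np.maximum(d2, 0.0))

X = np.random.default_rng(8).standard_normal((6, 3))
alphas = np.array([1.0, 0.2, 1e-3])              # third latent dimension is almost irrelevant
K = ard_kernel(X, alphas)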
Automatic Dimensionality Detection

◮ Achieved by employing an Automatic Relevance Determination (ARD) covariance function for the prior on the GP mapping.
Gaussian Process Dynamical Systems
(Damianou et al., 2011)

Gaussian Process over Latent Space

◮ Assume a GP prior for p(X).
◮ The input to the process is time, p(X|t).
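As a small sketch of this idea (my own; the time kernel and the number of latent dimensions are arbitrary choices), each latent dimension can be drawn from a GP with time as the input, giving smooth latent trajectories:

import numpy as np

rng = np.random.default_rng(9)
t = np.linspace(0.0, 10.0, 100)[:, None]
Kt = np.exp(-0.5 * (t - t.T) ** 2) + 1e-8 * np.eye(len(t))   # EQ kernel over time
L = np.linalg.cholesky(Kt)
q = 2
X = L @ rng.standard_normal((len(t), q))                     # columns x_{:,j} ~ N(0, K_t)
print(X.shape)                                               # (100, 2) smooth latent curves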
Interpolation of HD Video
Modeling Multiple ‘Views’
◮ Single space to model correlations between two different data
sources, e.g., images & text, image & pose.
◮ Shared latent spaces: (Shon et al., 2006; Navaratnam et al., 2007; Ek et al., 2008b)
[Graphical model: shared latent space X generating Y^(1) and Y^(2).]

◮ Effective when the 'views' are correlated.
◮ But not all information is shared between both 'views'.
◮ PCA applied to concatenated data vs CCA applied to data.
Shared-Private Factorization
◮ In real scenarios, the ‘views’ are neither fully independent, nor
fully correlated.
◮ Shared models
◮ either allow information relevant to a single view to be
mixed in the shared signal,
◮ or are unable to model such private information.
◮ Solution: Model shared and private information (Virtanen et al., 2011; Ek et al., 2008a; Leen and Fyfe, 2006; Klami and Kaski, 2007, 2008; Tucker, 1958)
[Graphical model: private spaces Z^(1), Z^(2) and shared space X generating Y^(1), Y^(2).]
◮ Probabilistic CCA is case when dimensionality of Z matches Y(i)
(cf Inter Battery Factor Analysis (Tucker, 1958)).
Manifold Relevance Determination
Damianou et al. (2012)
Shared GP-LVM

Separate ARD parameters for the mappings to Y^(1) and Y^(2).
Example: Yale Faces

◮ Dataset Y: 3 persons under all illumination conditions.
◮ Dataset Z: as above, for 3 different persons.
◮ Align datapoints x_n and z_n only based on the lighting direction.

Results (Damianou et al., 2012)

◮ Latent space X initialised with 14 dimensions.
◮ Weights define a segmentation of X.
◮ Video / demo.

Potential applications?
Manifold Relevance Determination
Deep Neural Network

[Figure: network with input layer x, then latent layer 1, hidden layer 1, latent layer 2, hidden layer 2, latent layer 3, hidden layer 3, and output label y.]
Deep Neural Network

Given x:
  x₁ = V₁⊤ x
  h₁ = g(U₁ x₁)
  x₂ = V₂⊤ h₁
  h₂ = g(U₂ x₂)
  x₃ = V₃⊤ h₂
  h₃ = g(U₃ x₃)
  y = w₄⊤ h₃
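A numpy sketch of that cascade (my own, with g = tanh and random weights assumed; the layer widths are purely illustrative), just to make the alternating projections concrete:

import numpy as np

rng = np.random.default_rng(10)
g = np.tanh
d_x, d_x1, d_x2, d_x3 = 6, 4, 6, 8     # widths of x and the latent layers (illustrative)
d_h1, d_h2, d_h3 = 4, 6, 8             # widths of the hidden layers (illustrative)

x = rng.standard_normal(d_x)
V1 = rng.standard_normal((d_x, d_x1));  U1 = rng.standard_normal((d_h1, d_x1))
V2 = rng.standard_normal((d_h1, d_x2)); U2 = rng.standard_normal((d_h2, d_x2))
V3 = rng.standard_normal((d_h2, d_x3)); U3 = rng.standard_normal((d_h3, d_x3))
w4 = rng.standard_normal(d_h3)

x1 = V1.T @ x;  h1 = g(U1 @ x1)
x2 = V2.T @ h1; h2 = g(U2 @ x2)
x3 = V3.T @ h2; h3 = g(U3 @ x3)
y = w4 @ h3                             # scalar output label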
Outline
Probabilistic Linear Dimensionality Reduction
Non-Linear Probabilistic Dimensionality Reduction
Examples
Conclusions
Summary

◮ We've advocated dimensionality reduction as a good way of probabilistic modelling in high dimensions.
◮ Spectral techniques lead to convex algorithms.
◮ Probabilistic techniques map the "correct way" around.
  ◮ This leads to problems with local minima.
◮ We have shown the ability of probabilistic techniques to deal with high dimensional data.
◮ Probabilistic dimensionality reduction is useful in practice.
◮ There are still many open problems to be overcome.
References I
- W. V. Baxter and K.-I. Anjyo. Latent doodle space. In EUROGRAPHICS, volume 25, pages 477–485, Vienna, Austria,
September 4-8 2006.
- A. Damianou, C. H. Ek, M. K. Titsias, and N. D. Lawrence. Manifold relevance determination. In J. Langford and
- J. Pineau, editors, Proceedings of the International Conference in Machine Learning, volume 29, San Francisco, CA,
- 2012. Morgan Kauffman. [PDF].
- A. Damianou, M. K. Titsias, and N. D. Lawrence. Variational Gaussian process dynamical systems. In P. Bartlett,
- F. Peirrera, C. Williams, and J. Lafferty, editors, Advances in Neural Information Processing Systems, volume 24,
Cambridge, MA, 2011. MIT Press. [PDF].
- C. H. Ek, J. Rihan, P. Torr, G. Rogez, and N. D. Lawrence. Ambiguity modeling in latent spaces. In A. Popescu-Belis
and R. Stiefelhagen, editors, Machine Learning for Multimodal Interaction (MLMI 2008), LNCS, pages 62–73. Springer-Verlag, 28–30 June 2008a. [PDF].
- C. H. Ek, P. H. Torr, and N. D. Lawrence. Gaussian process latent variable models for human pose estimation. In
- A. Popescu-Belis, S. Renals, and H. Bourlard, editors, Machine Learning for Multimodal Interaction (MLMI 2007),
volume 4892 of LNCS, pages 132–143, Brno, Czech Republic, 2008b. Springer-Verlag. [PDF].
- A. Elgammal and C. S. Lee. Inferring 3D body pose from silhouettes using activity manifold learning. In Proceedings
  of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004.
- Z. Ghahramani, editor. Proceedings of the International Conference in Machine Learning, volume 24, 2007. Omnipress.
[Google Books] .
- K. Grochow, S. L. Martin, A. Hertzmann, and Z. Popovic. Style-based inverse kinematics. In ACM Transactions on
Graphics (SIGGRAPH 2004), pages 522–531, 2004.
- A. Klami and S. Kaski. Local dependent components analysis. In Ghahramani (2007). [Google Books] .
- A. Klami and S. Kaski. Probabilistic approach to detecting dependencies between data sets. Neurocomputing, 72:
39–46, 2008.
- N. D. Lawrence. Gaussian process models for visualisation of high dimensional data. In S. Thrun, L. Saul, and
  B. Schölkopf, editors, Advances in Neural Information Processing Systems, volume 16, pages 329–336, Cambridge,
  MA, 2004. MIT Press.
References II
- N. D. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable
  models. Journal of Machine Learning Research, 6:1783–1816, 2005.
- N. D. Lawrence and A. J. Moore. Hierarchical Gaussian process latent variable models. In Ghahramani (2007),
pages 481–488. [Google Books] . [PDF].
- G. Leen and C. Fyfe. A Gaussian process latent variable model formulation of canonical correlation analysis.
  Bruges (Belgium), 26–28 April 2006.
- S. Levine, J. M. Wang, A. Haraux, Z. Popović, and V. Koltun. Continuous character control with low-dimensional
  embeddings. ACM Transactions on Graphics (SIGGRAPH 2012), 31(4), 2012.
- C. Lu and X. Tang. Surpassing human-level face verification performance on LFW with GaussianFace. Technical
  report, 2014.
- D. J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge,
U.K., 2003. [Google Books] .
- R. Navaratnam, A. Fitzgibbon, and R. Cipolla. The joint manifold model for semi-supervised multi-valued
- regression. In IEEE International Conference on Computer Vision (ICCV). IEEE Computer Society Press, 2007.
- R. M. Neal. Bayesian Learning for Neural Networks. Springer, 1996. Lecture Notes in Statistics 118.
- V. Prisacariu and I. D. Reid. Nonlinear shape manifolds as shape priors in level set segmentation and tracking. In
  Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011a.
- V. Prisacariu and I. D. Reid. Shared shape spaces. In IEEE International Conference on Computer Vision (ICCV), 2011b.
- C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.
[Google Books] .
- A. P. Shon, K. Grochow, A. Hertzmann, and R. P. N. Rao. Learning shared latent structure for image synthesis and
robotic imitation. In Weiss et al. (2006).
- J. B. Tenenbaum and W. T. Freeman. Separating style and content with bilinear models. Neural Computation, 12:
1247–1283, 2000.
- M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, B,
  61(3):611–622, 1999. [PDF]. [DOI].
References III
- M. K. Titsias and N. D. Lawrence. Bayesian Gaussian process latent variable model. In Y. W. Teh and D. M.
Titterington, editors, Proceedings of the Thirteenth International Workshop on Artificial Intelligence and Statistics, volume 9, pages 844–851, Chia Laguna Resort, Sardinia, Italy, 13-16 May 2010. JMLR W&CP 9. [PDF].
- L. R. Tucker. An inter-battery method of factor analysis. Psychometrika, 23(2):111–136, 1958.
- R. Urtasun and T. Darrell. Discriminative Gaussian process latent variable model for classification. In Ghahramani
(2007). [Google Books] .
- R. Urtasun, D. J. Fleet, and P. Fua. 3D people tracking with Gaussian process dynamical models. In Proceedings of the
IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 238–245, New York, U.S.A., 17–22 Jun. 2006. IEEE Computer Society Press.
- R. Urtasun, D. J. Fleet, A. Hertzmann, and P. Fua. Priors for people tracking from small training sets. In IEEE
International Conference on Computer Vision (ICCV), pages 403–410, Bejing, China, 17–21 Oct. 2005. IEEE Computer Society Press.
- M. A. O. Vasilescu and D. Terzopoulos. Multilinear analysis of image ensembles: Tensorfaces. In European
Conference on Computer Vision, pages 447–460, 2002.
- S. Virtanen, A. Klami, and S. Kaski. Bayesian CCA via group sparsity. In L. Getoor and T. Scheffer, editors,
Proceedings of the International Conference in Machine Learning, volume 28, 2011.
- J. M. Wang, D. J. Fleet, and A. Hertzmann. Multifactor Gaussian process models for style-content separation. In
  Ghahramani (2007), pages 975–982. [Google Books] .
- J. M. Wang, D. J. Fleet, and A. Hertzmann. Gaussian process dynamical models for human motion. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 30(2):283–298, 2008. ISSN 0162-8828. [DOI].
- Y. Weiss, B. Schölkopf, and J. C. Platt, editors. Advances in Neural Information Processing Systems, volume 18,
  Cambridge, MA, 2006. MIT Press.