1. Resampling PCA & GP Inference
Manfred Opper (ISIS, University of Southampton)

2. Motivation
• Construct a "simple" but intractable GP model
• Study approximate (EC/EP) inference
• "MC" (direct resampling) is conceptually simple
• Get a quantitative idea of why EC inference works

3. Resampling (Bootstrap)
Estimate average-case properties (test errors) of statistical estimators based on a single dataset $D_0 = \{y_1, y_2, y_3\}$.
Bootstrap: resample with replacement to generate pseudo-data:
$$D_1 = \{y_1, y_2, y_2\}, \quad D_2 = \{y_1, y_1, y_1\}, \quad D_3 = \{y_2, y_3, y_3\}, \ \ldots \ \text{etc.}$$
Problem: each sample requires retraining of some learning algorithm.
Mapping to a probabilistic model & approximate inference: only a single training (inference) run for a single (effective) model is required (Malzahn & Opper 2003).
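A minimal numpy sketch of the resampling step (the dataset values and the number of pseudo-datasets are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_datasets(data, n_sets):
    """Draw bootstrap pseudo-datasets: N indices sampled with replacement."""
    N = len(data)
    return [data[rng.integers(0, N, size=N)] for _ in range(n_sets)]

D0 = np.array([1.3, 0.7, 2.1])            # stands in for {y1, y2, y3}
for Db in bootstrap_datasets(D0, 3):      # e.g. {y1, y2, y2}, {y1, y1, y1}, ...
    print(Db)
```

The point of the slide is exactly that a naive loop like this needs one retraining per pseudo-dataset; the probabilistic mapping replaces all of them by a single inference run.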

4. PCA
• Goal: project ($d$-dimensional) data vectors $y \to P_q[y]$ onto a $q < d$ dimensional subspace with minimal reconstruction error $E\,\|y - P_q[y]\|^2$.
• Method: approximate the expectation by $N$ training data $D_0$, given by the $(d \times N)$ matrix $Y = (y_1, y_2, \ldots, y_N)$, $y_i \in \mathbb{R}^d$; $d = \infty$ is allowed (feature vectors). The optimal subspace is spanned by the eigenvectors $u_l$ of the data covariance matrix $C = \frac{1}{N} YY^T$ corresponding to the $q$ largest eigenvalues $\lambda_l \geq \lambda$.
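A short numpy sketch of the construction for finite $d$ (the sizes echo the later artificial-data experiments but are otherwise arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, q = 25, 50, 5                       # illustrative sizes
Y = rng.standard_normal((d, N))           # columns y_i are the training data

lam, U = np.linalg.eigh(Y @ Y.T / N)      # C = (1/N) Y Y^T; eigenvalues ascending
U_q = U[:, -q:]                           # eigenvectors of the q largest eigenvalues

def project(y):
    """P_q[y]: projection onto the leading-q eigenspace."""
    return U_q @ (U_q.T @ y)

y = rng.standard_normal(d)                # a novel test vector
err = np.sum((y - project(y)) ** 2)       # reconstruction error ||y - P_q[y]||^2
```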

5. Reconstruction Error
Expected reconstruction error (on novel data):
$$\varepsilon(\lambda) = \sum_{l:\, \lambda_l < \lambda} E\,(y \cdot u_l)^2$$
Resample-averaged reconstruction error:
$$E_r = \frac{1}{N}\, E_{D}\Big[\operatorname{Tr} \sum_{y_i \notin D;\ \lambda_l < \lambda} y_i y_i^T\, u_l u_l^T\Big]$$
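$E_r$ can be estimated by brute force, which is the Monte Carlo baseline the approximate result is later compared against. A sketch, assuming the out-of-bag points $y_i \notin D$ play the role of test data and the directions with $\lambda_l < \lambda$ are the discarded ones:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 25, 50                                       # illustrative sizes
Y = rng.standard_normal((d, N))

def resampled_error(Y, lam_cut, n_boot=2000):
    """Direct MC estimate of E_r: reconstruction error of out-of-bag points in the
    eigendirections of the resampled covariance with eigenvalue below lam_cut."""
    d, N = Y.shape
    total = 0.0
    for _ in range(n_boot):
        idx = rng.integers(0, N, size=N)            # resample with replacement
        lam, U = np.linalg.eigh(Y[:, idx] @ Y[:, idx].T / N)
        oob = np.setdiff1d(np.arange(N), idx)       # i with y_i not in D
        U_low = U[:, lam < lam_cut]                 # discarded eigendirections
        total += np.sum((U_low.T @ Y[:, oob]) ** 2) # sum_i sum_l (y_i . u_l)^2
    return total / (n_boot * N)
```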

6. Bootstrap of Density of Eigenvalues
[Figure: bootstrap density of eigenvalues ($N = 50$ random data, Dim $= 25$), 1× and 3× oversampled, as a function of the eigenvalue $\lambda$.]
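The direct Monte Carlo estimate behind this figure can be reproduced along these lines (a sketch; the oversampling factor multiplies the number of draws per pseudo-dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, mu = 25, 50, 3                                # 3x oversampled: mu*N draws
Y = rng.standard_normal((d, N))

eigs = []
for _ in range(500):                                # number of bootstrap rounds (arbitrary)
    idx = rng.integers(0, N, size=mu * N)           # resample with replacement
    Cb = Y[:, idx] @ Y[:, idx].T / (mu * N)         # resampled covariance matrix
    eigs.extend(np.linalg.eigvalsh(Cb))

dens, edges = np.histogram(eigs, bins=40, range=(0, 4), density=True)
```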

7. The model
• Let $s_i = $ number of times $y_i \in D$.
• Diagonal random matrix $D_{ii} = D_i = \frac{1}{\mu\Gamma}\big(s_i + \epsilon\,\delta_{s_i,0}\big)$ and $C(\epsilon) = \frac{\Gamma}{N}\, Y D Y^T$, so that $C(0) \propto$ covariance matrix of the resampled data.
• Kernel matrix $K = \frac{1}{N} Y^T Y$.
• Partition function
$$Z = \int d^N x\, \exp\Big(-\tfrac{1}{2}\, x^T \big(K^{-1} + D\big)\, x\Big) = |K|^{1/2}\, (2\pi)^{(N-d)/2} \int d^d z\, \exp\Big(-\tfrac{1}{2\Gamma}\, z^T \big(C(\epsilon) + \Gamma I\big)\, z\Big).$$
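The second equality is a Gaussian change of variables and reduces to a determinant identity, which can be checked numerically. A sketch under the reconstruction above ($D_i = (s_i + \epsilon\,\delta_{s_i,0})/(\mu\Gamma)$, $C(\epsilon) = \frac{\Gamma}{N} YDY^T$); all sizes and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, mu, Gamma, eps = 25, 50, 1.0, 0.5, 0.1       # illustrative values
Y = rng.standard_normal((d, N))
K = Y.T @ Y / N                                    # kernel matrix K = Y^T Y / N

s = np.bincount(rng.integers(0, N, size=int(mu * N)), minlength=N)  # s_i = # times y_i in D
Dvec = (s + eps * (s == 0)) / (mu * Gamma)         # D_i = (s_i + eps*delta_{s_i,0})/(mu*Gamma)
C = (Gamma / N) * (Y * Dvec) @ Y.T                 # C(eps) = (Gamma/N) Y D Y^T

# det(K^{-1} + D) |K| = det(I + K D) = Gamma^{-d} det(C(eps) + Gamma I):
# the determinant identity behind the x- vs z-space forms of Z
lhs = np.linalg.slogdet(np.eye(N) + K * Dvec)[1] + d * np.log(Gamma)
rhs = np.linalg.slogdet(C + Gamma * np.eye(d))[1]
print(np.allclose(lhs, rhs))                       # True up to floating point
```

The $\det(I + KD)$ form is used instead of $\det(K^{-1}+D)\det(K)$ so the check also works when $K$ is singular (here $N > d$, so it is).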

8. Z as generating function
$$-2\,\frac{\partial \ln Z}{\partial \epsilon}\Big|_{\epsilon=0} = \frac{1}{\mu N} \sum_{j=1}^{N} \delta_{s_j,0}\, \operatorname{Tr}\big[y_j y_j^T\, G(\Gamma)\big], \qquad -2\,\frac{\partial \ln Z}{\partial \Gamma} = -\frac{d}{\Gamma} + \operatorname{Tr} G(\Gamma)$$
with
$$G(\Gamma) = \big(C(0) + \Gamma I\big)^{-1} = \sum_k \frac{u_k u_k^T}{\lambda_k + \Gamma}.$$
Compare with the (resample-averaged) reconstruction error
$$E_r = \frac{1}{N}\, E_{D}\Big[\operatorname{Tr} \sum_{y_i \notin D;\ \lambda_l < \lambda} y_i y_i^T\, u_l u_l^T\Big].$$
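The $\epsilon$-derivative identity can be checked by a finite difference, under the same slide-7 reconstruction (illustrative sizes; the additive constants in $\ln Z$ drop out):

```python
import numpy as np

rng = np.random.default_rng(1)
d, N, mu, Gamma = 40, 30, 1.0, 0.5                 # illustrative values
Y = rng.standard_normal((d, N))
K = Y.T @ Y / N

s = np.bincount(rng.integers(0, N, size=int(mu * N)), minlength=N)

def lnZ(eps):
    """ln Z up to eps-independent constants (the (2*pi) and |K| factors drop out)."""
    Dvec = (s + eps * (s == 0)) / (mu * Gamma)
    return -0.5 * np.linalg.slogdet(np.eye(N) + K * Dvec)[1]

h = 1e-6
lhs = -2.0 * (lnZ(h) - lnZ(0.0)) / h               # -2 d(ln Z)/d(eps) at eps = 0

C0 = (Y * (s / (mu * N))) @ Y.T                    # C(0) = (1/(mu N)) sum_i s_i y_i y_i^T
G = np.linalg.inv(C0 + Gamma * np.eye(d))          # G(Gamma) = (C(0) + Gamma I)^{-1}
rhs = sum(Y[:, j] @ G @ Y[:, j] for j in range(N) if s[j] == 0) / (mu * N)
print(lhs, rhs)                                    # agree to several decimals
```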

9. Analytical Continuation
Reconstruction error
$$E_r = \frac{1}{N}\, E_{D}\Big[\operatorname{Tr} \sum_{y_i \notin D;\ \lambda_l < \lambda} y_i y_i^T\, u_l u_l^T\Big]$$
Use the representation of the Dirac $\delta$,
$$\delta(x) = \lim_{\eta \to 0^+} \frac{1}{\pi}\, \Im\, \frac{1}{x - i\eta},$$
and get
$$E_r = E_r^0 + \int_{0^+}^{\lambda} d\lambda'\, \varepsilon_r(\lambda'),$$
where
$$\varepsilon_r(\lambda) = \frac{1}{\pi N} \lim_{\eta \to 0^+} \Im\, E_{D}\Big[\sum_j \delta_{s_j,0}\, \operatorname{Tr}\big[y_j y_j^T\, G(-\lambda - i\eta)\big]\Big]$$
defines the error density from all eigenvalues $> 0$, and $E_r^0$ is the contribution from the eigenspace with $\lambda_k = 0$.
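Numerically, the limit $\eta \to 0^+$ becomes a small fixed $\eta$ and the average $E_D$ a Monte Carlo loop over resamplings. A sketch of the resulting error-density estimator at resampling rate $\mu = 1$ ($\eta$ and the number of bootstrap rounds are tuning choices):

```python
import numpy as np

rng = np.random.default_rng(2)
d, N, eta = 25, 50, 1e-3                           # illustrative sizes; small eta > 0
Y = rng.standard_normal((d, N))

def error_density(lam, n_boot=500):
    """MC estimate of eps_r(lambda): imaginary part of the out-of-bag trace with
    the resolvent G evaluated just below the real axis, at -lambda - i*eta."""
    total = 0.0
    for _ in range(n_boot):
        s = np.bincount(rng.integers(0, N, size=N), minlength=N)  # occupation numbers
        C0 = (Y * (s / N)) @ Y.T                   # C(0) at mu = 1
        G = np.linalg.inv(C0 + (-lam - 1j * eta) * np.eye(d))
        total += sum(Y[:, j] @ G @ Y[:, j] for j in range(N) if s[j] == 0).imag
    return total / (np.pi * N * n_boot)
```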

10. Replica Trick
Data-averaged free energy:
$$-E_D[\ln Z] = -\lim_{n \to 0} \frac{1}{n} \ln E_D[Z^n],$$
for integer $n$:
$$Z^{(n)} \doteq E_D[Z^n] = \int dx\, \psi_1(x)\, \psi_2(x)$$
where we set $x \doteq (x_1, \ldots, x_n)$ and
$$\psi_1(x) = \exp\Big(-\frac{1}{2} \sum_{a=1}^{n} x_a^T K^{-1} x_a\Big), \qquad \psi_2(x) = E_D\Big[\exp\Big(-\frac{1}{2} \sum_{a=1}^{n} x_a^T D\, x_a\Big)\Big]$$
— intractable!
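The first line rests on a short expansion worth spelling out: since $Z^n = e^{n \ln Z} = 1 + n \ln Z + O(n^2)$,
$$E_D[\ln Z] = \lim_{n \to 0} \frac{E_D[Z^n] - 1}{n} = \lim_{n \to 0} \frac{1}{n} \ln E_D[Z^n],$$
where the last step uses $\ln(1 + x) = x + O(x^2)$.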

11. Approximate Inference (EC: Opper & Winther)
$$p_1(x) = \frac{1}{Z_1}\, \psi_1(x)\, e^{-\Lambda_1 x^T x}, \qquad p_0(x) = \frac{1}{Z_0}\, e^{-\frac{1}{2} \Lambda_0 x^T x}$$
with $\Lambda_1$ and $\Lambda_0$ "variational" parameters.
$$Z^{(n)} = Z_1 \int dx\, p_1(x)\, \psi_2(x)\, e^{\Lambda_1 x^T x} \approx Z_1 \int dx\, p_0(x)\, \psi_2(x)\, e^{\Lambda_1 x^T x} \equiv Z^{(n)}_{EC}(\Lambda_1, \Lambda_0)$$
Match moments $\langle x^T x \rangle_1 = \langle x^T x \rangle_0$ & stationarity w.r.t. $\Lambda_1$.
Final result:
$$-\ln Z_{EC} = -E_D\Big[\ln \int dx\, e^{-\frac{1}{2} x^T (D + (\Lambda_0 - \Lambda) I)\, x}\Big] - \ln \int dx\, e^{-\frac{1}{2} x^T (K^{-1} + \Lambda I)\, x} + \ln \int dx\, e^{-\frac{1}{2} \Lambda_0 x^T x},$$
where we have set $\Lambda = \Lambda_0 - \Lambda_1$. Tractable!
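In the $n \to 0$ limit, the matching conditions amount to equating $\langle x^T x \rangle$ under the three Gaussian measures appearing in the final result. A minimal numpy/scipy sketch of one way to solve them, assuming that reading of the conditions; the sizes are illustrative, $d > N$ keeps $K$ invertible, and the root-finder bracket was chosen by trial:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(3)
d, N, mu, Gamma = 40, 30, 1.0, 0.5                 # illustrative; d > N so K is invertible
Y = rng.standard_normal((d, N))
kappa = np.linalg.eigvalsh(Y.T @ Y / N)            # eigenvalues of K = Y^T Y / N

S = np.stack([np.bincount(rng.integers(0, N, size=int(mu * N)), minlength=N)
              for _ in range(500)])                # MC pool of occupation numbers s_i
Dmat = S / (mu * Gamma)                            # D_i for each resampling (eps = 0)

def m_K(Lam):
    """<x^T x> under the Gaussian with precision K^{-1} + Lam*I."""
    return np.sum(kappa / (1.0 + Lam * kappa))

def gap(Lam):
    """Moment mismatch of the D-term after eliminating Lam0 via N/Lam0 = m_K(Lam).
    Lam0 - Lam > 0 holds automatically because Lam * m_K(Lam) < N."""
    Lam0 = N / m_K(Lam)
    return np.mean(np.sum(1.0 / (Dmat + Lam0 - Lam), axis=1)) - m_K(Lam)

Lam = brentq(gap, 1e-6, 1e3)                       # bracket chosen by trial
Lam0 = N / m_K(Lam)
```

At the root, $N/\Lambda_0 = \operatorname{Tr}(K^{-1} + \Lambda I)^{-1} = E_D \sum_i (D_i + \Lambda_0 - \Lambda)^{-1}$, i.e. all three second moments agree.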

12. Result: Artificial Data
[Figure: EC vs. direct resampling, density of eigenvalues ($N = 50$ data, Dim $= 25$, 3× oversampled), as a function of the eigenvalue $\lambda$.]

13. The PCA Reconstruction Error
[Figure: approximate bootstrap, 3× oversampled ($N = 32$ artificial random data, Dim $= 25$): test error versus sum of eigenvalues (training error), as a function of the eigenvalue threshold $\lambda$.]

14. Approximate Bootstrap: Handwritten Digits
[Figure: density of eigenvalues and reconstruction error ($N = 100$ data, Dim $= 784$).]

15. The result without replicas
$$-\ln Z = -\ln \int dx\, e^{-\frac{1}{2} x^T (D + (\Lambda_0 - \Lambda) I)\, x} - \ln \int dx\, e^{-\frac{1}{2} x^T (K^{-1} + \Lambda I)\, x} + \ln \int dx\, e^{-\frac{1}{2} \Lambda_0 x^T x} + \frac{1}{2} \ln \det(I + r)$$
with
$$r_{ij} = \Big(1 - \frac{\Lambda_0}{\Lambda_0 - \Lambda + D_i}\Big)\Big[\Lambda_0 \big(K^{-1} + \Lambda I\big)^{-1} - I\Big]_{ij}.$$
Expand
$$\ln \det(I + r) = \operatorname{Tr} \ln(I + r) = \operatorname{Tr} \sum_{k=1}^{\infty} \frac{(-1)^{k+1}\, r^k}{k}.$$
We have $E_D[r_{ij}] = 0$, so the first-order term vanishes after the average; the second-order term yields on average
$$\Delta F = -\frac{1}{4} \sum_i E_D\Big[\Big(1 - \frac{\Lambda_0}{\Lambda_0 - \Lambda + D_i}\Big)^2\Big]\, \Big(\big[\Lambda_0 (K^{-1} + \Lambda I)^{-1}\big]_{ii} - 1\Big)^2.$$
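Once stationary values $(\Lambda, \Lambda_0)$ are available, $\Delta F$ is cheap to evaluate. A sketch that continues the EC sketch after slide 11 (it reuses Y, N, Dmat, Lam, Lam0 from there) and implements the $\Delta F$ formula as reconstructed above:

```python
import numpy as np

# reuses Y, N, Dmat, Lam, Lam0 from the EC moment-matching sketch (slide 11)
K = Y.T @ Y / N                                    # d > N there, so K is invertible
A = Lam0 * np.linalg.inv(np.linalg.inv(K) + Lam * np.eye(N))
a2 = np.mean((1.0 - Lam0 / (Lam0 - Lam + Dmat)) ** 2, axis=0)  # E_D[(1 - Lam0/(Lam0-Lam+D_i))^2]
dF = -0.25 * np.sum(a2 * (np.diag(A) - 1.0) ** 2)  # second-order term of (1/2) ln det(I + r)
```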

16. Correction
[Figure: correction to the resampling error, and the resampled reconstruction error ($\lambda = 0$), both as functions of the resampling rate $\mu$.]

17. Correction to EC
$$\frac{Z^{(n)}}{Z_1} = \int dx\, p_1(x)\, \psi_2(x)\, e^{\Lambda_1 x^T x} = \int dx\, \psi_2(x)\, e^{\Lambda_1 x^T x} \int \frac{dk}{(2\pi)^{Nn}}\, e^{i k^T x}\, \chi(k)$$
where $\chi(k) \doteq \int dx\, p_1(x)\, e^{-i k^T x}$ is the characteristic function of the density $p_1$. The cumulant expansion starts with a quadratic term (EC):
$$\ln \chi(k) = -\frac{M_2}{2}\, k^T k + R(k), \qquad (1)$$
where $M_2 = \langle x_a^T x_a \rangle_1$. Expanding the fourth-order term in $R(k)$ as $e^{R(k)} = 1 + R(k) + \ldots$ leads to $\Delta F$. Possibility of perturbative improvement?
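To see why the quadratic truncation is exactly EC, note the Gaussian Fourier identity (here $M_2$ is read as the variance per component; the slide's $\langle x_a^T x_a \rangle_1$ differs from this by a dimension factor):
$$\int \frac{dk}{(2\pi)^{Nn}}\, e^{i k^T x}\, e^{-\frac{M_2}{2}\, k^T k} = (2\pi M_2)^{-Nn/2}\, e^{-\frac{x^T x}{2 M_2}},$$
so dropping $R(k)$ replaces $p_1$ by an isotropic Gaussian with matched second moment, i.e. the substitution $p_1 \to p_0$ of slide 11; the first neglected (fourth-order) cumulant produces $\Delta F$.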

18. Conclusion
• Non-Bayesian inference problems can be related to "hidden" probabilistic models via analytic continuation.
• EC approximate inference appears to be robust: it survives analytic continuation and limits.
