Estimation of the Kernel Mean Embedding (with uncertainty)
Paul Rubenstein, University of Cambridge / Max-Planck Institute for Intelligent Systems, Tübingen. Presentation transcript.


  1. Estimation of the Kernel Mean Embedding (with uncertainty) Paul Rubenstein University of Cambridge Max-Planck Institute for Intelligent Systems, Tübingen 20th January 2016

  2. RKHS theory: A function k : X × X → R is a kernel if, given x_1, ..., x_n ∈ X, the matrix K_ij = k(x_i, x_j) is symmetric and positive semi-definite (i.e. K is a valid covariance matrix). Associated to k are: ◮ A Hilbert space H of functions X → R ◮ A 'feature map' φ : X → H such that k(x, x′) = ⟨φ(x), φ(x′)⟩
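A quick numerical illustration of the definition: the sketch below (the Gaussian RBF kernel and the random data are illustrative assumptions, not fixed by the slides) builds a Gram matrix K_ij = k(x_i, x_j) and checks that it is symmetric and positive semi-definite, i.e. a valid covariance matrix.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-(x - y)^2 / (2 * lengthscale^2))."""
    return np.exp(-(x - y) ** 2 / (2.0 * lengthscale ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=10)

# Gram matrix K_ij = k(x_i, x_j) via broadcasting
K = rbf_kernel(x[:, None], x[None, :])

# Symmetric ...
assert np.allclose(K, K.T)
# ... and positive semi-definite: eigenvalues nonnegative up to round-off,
# so K could serve as a covariance matrix.
assert np.linalg.eigvalsh(K).min() > -1e-10
```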

  3. RKHS theory: Suppose we are given: ◮ A random variable X ∼ P taking values in X ◮ A function f : X → R and that we want to evaluate ∫ f(x) dP(x) = E_X f(X). If f ∈ H, then E_X f(X) = E_X ⟨f, φ(X)⟩ = ⟨f, E_X φ(X)⟩. So if we know the mean embedding of X, µ_X := E_X φ(X), then we can calculate the expectation of any function in H by taking an inner product.
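To make the reproducing trick concrete, a minimal sketch (the expansion points y_j and coefficients α_j defining f are arbitrary choices): for f = Σ_j α_j k(y_j, ·) ∈ H, the inner product of f with the empirical mean embedding equals the sample mean of f.

```python
import numpy as np

def rbf(a, b):
    return np.exp(-(a - b) ** 2 / 2.0)

rng = np.random.default_rng(1)
X = rng.normal(size=50)           # samples X_i ~ P

# A function f in H, written as f = sum_j alpha_j k(y_j, .)
y = np.array([-1.0, 0.5, 2.0])    # arbitrary expansion points
alpha = np.array([0.3, -1.2, 0.7])

def f(t):
    return rbf(y[:, None], np.atleast_1d(t)[None, :]).T @ alpha

# Empirical mean embedding at the expansion points:
# mu_hat(y_j) = (1/n) sum_i k(X_i, y_j)
mu_hat_at_y = rbf(X[:, None], y[None, :]).mean(axis=0)

# <f, mu_hat>_H = sum_j alpha_j mu_hat(y_j), which matches (1/n) sum_i f(X_i)
inner_product = alpha @ mu_hat_at_y
sample_mean = f(X).mean()
assert np.isclose(inner_product, sample_mean)
```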

  4. RKHS theory: For certain k, the mapping P ↦ µ_P is injective, i.e. P = Q ⟺ µ_P = µ_Q. We can exploit this to construct statistical tests of properties of distributions. Two-sample test: given {X_i} ∼ P, {Y_i} ∼ Q, does P = Q? Idea: estimate µ_P and µ_Q and see how different they are. Independence testing: given {(X_i, Y_i)} ∼ P_XY, does P_XY = P_X P_Y? Idea: estimate µ_{P_XY} and µ_{P_X P_Y} and see how different they are.
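The two-sample idea can be sketched by comparing empirical embeddings directly: ‖µ̂_P − µ̂_Q‖²_H expands into three Gram-matrix means (the biased MMD statistic). The kernel, bandwidth, and distributions below are illustrative assumptions.

```python
import numpy as np

def rbf_gram(a, b, ls=1.0):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ls ** 2))

def mmd2(x, y):
    """Biased estimate of ||mu_P_hat - mu_Q_hat||_H^2 (the MMD statistic)."""
    return rbf_gram(x, x).mean() + rbf_gram(y, y).mean() - 2 * rbf_gram(x, y).mean()

rng = np.random.default_rng(2)
X = rng.normal(0.0, 1.0, size=300)   # samples from P
Y1 = rng.normal(0.0, 1.0, size=300)  # same distribution as P
Y2 = rng.normal(2.0, 1.0, size=300)  # shifted distribution Q != P

# Embeddings of equal distributions are close; of different ones, far apart.
assert mmd2(X, Y1) < mmd2(X, Y2)
```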

  5. Estimating µ_X: How do we estimate µ_X = E φ(X) = ∫ k(x, ·) dP(x)? If {X_i}_{i=1}^n ∼ P, then the empirical mean embedding µ̂ := (1/n) Σ_{i=1}^n k(X_i, ·) → µ_X. Writing 1_n = (1/n, ..., 1/n)^⊺ and Φ = (k(X_1, ·), ..., k(X_n, ·))^⊺, this is µ̂ = 1_n^⊺ Φ. [Figure: plot of the empirical mean embedding µ̂(x) against x.]
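A minimal sketch of the empirical mean embedding evaluated on a grid, as in the slide's plot (the standard-normal P and unit-bandwidth RBF kernel are assumptions chosen so that the population embedding has a closed form to compare against):

```python
import numpy as np

def rbf(a, b):
    return np.exp(-(a - b) ** 2 / 2.0)

rng = np.random.default_rng(3)
X = rng.normal(size=500)                    # X_i ~ P = N(0, 1)

grid = np.linspace(-5, 5, 201)              # evaluation points, as in the plot
Phi = rbf(X[:, None], grid[None, :])        # Phi_ij = k(X_i, grid_j)
one_n = np.full(len(X), 1.0 / len(X))       # the vector 1_n = (1/n, ..., 1/n)

mu_hat = one_n @ Phi                        # mu_hat = 1_n^T Phi on the grid

# mu_hat(t) estimates E k(X, t); for standard-normal X and unit-bandwidth
# RBF kernel the population value is sqrt(1/2) * exp(-t^2 / 4).
true_mu = np.sqrt(0.5) * np.exp(-grid ** 2 / 4.0)
assert np.abs(mu_hat - true_mu).max() < 0.15
```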

  6. Estimating µ_X: In Muandet et al. 2015(?) (Kernel Mean Shrinkage Estimators), the risk of an estimator µ̂ is defined as ∆ = E ‖µ̂ − µ‖²_H, and estimators that minimise ∆ are sought. Two proposals: ◮ For a particular α that can be estimated from observations, µ̂_α = (1 − α) µ̂ = (1 − α) 1_n^⊺ Φ ◮ For λ estimated (by cross validation) from observations, µ̂_λ = Φ^⊺ (K + λI)^{−1} K 1_n (this looks like GP regression)
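Both proposals reweight the same feature evaluations, µ = Σ_i β_i k(X_i, ·); only the weight vector β differs. A sketch (the α and λ values are arbitrary choices) checks the shrinkage property: both estimators have smaller RKHS norm than the plain empirical embedding.

```python
import numpy as np

def rbf(a, b):
    return np.exp(-(a - b) ** 2 / 2.0)

rng = np.random.default_rng(4)
n = 50
X = rng.normal(size=n)
K = rbf(X[:, None], X[None, :])
one_n = np.full(n, 1.0 / n)

alpha, lam = 0.1, 1e-2
beta_alpha = (1 - alpha) * one_n                            # weights for mu_alpha
beta_lam = np.linalg.solve(K + lam * np.eye(n), K @ one_n)  # weights for mu_lam

def embed(beta, t):
    """Evaluate sum_i beta_i k(X_i, t)."""
    return rbf(X[:, None], np.atleast_1d(t)[None, :]).T @ beta

# mu_alpha is a pointwise rescaling of the empirical embedding ...
assert np.allclose(embed(beta_alpha, 0.0), (1 - alpha) * embed(one_n, 0.0))

# ... and both estimators shrink the RKHS norm ||mu||_H^2 = beta^T K beta.
norm_hat = one_n @ K @ one_n
assert beta_lam @ K @ beta_lam < norm_hat
assert beta_alpha @ K @ beta_alpha < norm_hat
```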

  7. Bayesian estimation of µ_X: Kernel ridge regression ⟺ MAP inference in GP regression. Can we show that the estimators µ̂_α = (1 − α) 1_n^⊺ Φ and µ̂_λ = Φ^⊺ (K + λI)^{−1} K 1_n are the MAP solutions to Bayesian inference problems? Two candidate models: ◮ µ ∼ GP(0, k), µ̂ | µ ∼ GP(µ, γk) ◮ µ ∼ GP(0, k), µ̂ | µ ∼ GP(µ, λ I[x = x′]) Define 'pseudo-targets' µ̂(x) = K 1_n and then perform Bayesian inference.

  8. Deriving µ̂_α = (1 − α) 1_n^⊺ Φ: Consider µ ∼ GP(0, k), µ̂ | µ ∼ GP(µ, γk). Choose a previously unobserved z and consider the joint distribution of (µ(z), µ̂(x)):
  (µ(z), µ̂(x)) ∼ N(0, [[k_zz, k_z^⊺], [k_z, (1 + γ)K]])
  ⟹ µ(z) | µ̂(x) ∼ N( (1/(1 + γ)) k_z^⊺ 1_n, k_zz − (1/(1 + γ)) k_z^⊺ K^{−1} k_z )
  using the pseudo-targets µ̂(x) = K 1_n. So if 1/(1 + γ) = 1 − α, i.e. γ = α/(1 − α), then the MAP solution is µ = (1 − α) 1_n^⊺ Φ.
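The conditioning step can be checked numerically: with pseudo-targets K 1_n and observation covariance (1 + γ)K, the Gaussian posterior mean at a new point z comes out to (1 − α) µ̂(z) when γ = α/(1 − α). The data points, α, and z below are arbitrary choices.

```python
import numpy as np

def rbf(a, b):
    return np.exp(-(a - b) ** 2 / 2.0)

X = np.linspace(-3.0, 3.0, 8)        # observed points
n = len(X)
K = rbf(X[:, None], X[None, :])
one_n = np.full(n, 1.0 / n)

alpha = 0.2
gamma = alpha / (1 - alpha)          # so that 1 / (1 + gamma) = 1 - alpha
z = 0.7                              # a previously unobserved point
k_z = rbf(X, z)
pseudo_targets = K @ one_n           # mu_hat evaluated at the observed points

# Gaussian conditioning of (mu(z), mu_hat(x)) with Cov(mu_hat(x)) = (1+gamma) K:
post_mean = k_z @ np.linalg.solve((1 + gamma) * K, pseudo_targets)

# The slide's claim: the posterior mean is (1 - alpha) * mu_hat(z).
mu_hat_z = k_z @ one_n
assert np.isclose(post_mean, (1 - alpha) * mu_hat_z)
```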

  9. Deriving µ̂_λ = Φ^⊺ (K + λI)^{−1} K 1_n: Considering next µ ∼ GP(0, k), µ̂ | µ ∼ GP(µ, λ I[x = x′]):
  (µ(z), µ̂(x)) ∼ N(0, [[k_zz, k_z^⊺], [k_z, K + λI]])
  ⟹ µ(z) | µ̂(x) ∼ N( k_z^⊺ (K + λI)^{−1} K 1_n, k_zz − k_z^⊺ (K + λI)^{−1} k_z )
  Thus the MAP solution is µ = Φ^⊺ (K + λI)^{−1} K 1_n.
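The same conditioning for the λ model, sketched numerically (the data points and λ are arbitrary choices). It also exhibits the behaviour slide 10 complains about: far from the data the posterior variance stays at the prior value k_zz rather than decaying.

```python
import numpy as np

def rbf(a, b):
    return np.exp(-(a - b) ** 2 / 2.0)

X = np.linspace(-3.0, 3.0, 8)           # observed points
n = len(X)
K = rbf(X[:, None], X[None, :])
one_n = np.full(n, 1.0 / n)
lam = 0.1

pseudo_targets = K @ one_n              # pseudo-targets mu_hat(x) = K 1_n

def posterior(z):
    """Condition (mu(z), mu_hat(x)) ~ N(0, [[k_zz, k_z^T], [k_z, K + lam I]])."""
    k_z = rbf(X, z)
    A = np.linalg.solve(K + lam * np.eye(n), np.column_stack([pseudo_targets, k_z]))
    mean = k_z @ A[:, 0]                # k_z^T (K + lam I)^{-1} K 1_n
    var = rbf(z, z) - k_z @ A[:, 1]     # k_zz - k_z^T (K + lam I)^{-1} k_z
    return mean, var

# Far from the data, k_z vanishes, so the posterior variance stays at the
# prior value k_zz = 1: the uncertainty never shrinks out there.
mean_far, var_far = posterior(50.0)
assert abs(mean_far) < 1e-6 and abs(var_far - 1.0) < 1e-6
```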

  10. Some problems: Although we derive the same solutions, most of the approach taken above doesn't really make sense: ◮ The prior over µ is not sensible ◮ The likelihood for µ̂ is wrong; in fact, for large n, µ̂ ≈ GP(µ, (1/n)[C_XX − µ_X ⊗ µ_X]) ◮ Uncertainty does not decay far away from observations as n grows.

  11. Thanks! Discussion?
