

SLIDE 1

Estimation of the Kernel Mean Embedding (with uncertainty)

Paul Rubenstein

University of Cambridge Max-Planck Institute for Intelligent Systems, Tübingen

20th January 2016

SLIDE 2

RKHS theory

A function k : 𝒳 × 𝒳 → ℝ is a kernel if, given x_1, …, x_n ∈ 𝒳, the matrix K with K_ij = k(x_i, x_j) is symmetric and positive semi-definite (i.e. is a valid covariance matrix). Associated to k are:

◮ A Hilbert space H of functions 𝒳 → ℝ

◮ A ‘feature map’ φ : 𝒳 → H such that k(x, x′) = ⟨φ(x), φ(x′)⟩_H
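A minimal numerical sketch of the two defining properties (my own illustration, not from the slides; it assumes a Gaussian RBF kernel and a small random sample):

import numpy as np

def rbf(x, y, ell=1.0):
    # Gaussian RBF kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2))
    return np.exp(-(x - y) ** 2 / (2 * ell ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=6)                          # x_1, ..., x_n
K = rbf(x[:, None], x[None, :])                 # K_ij = k(x_i, x_j)

assert np.allclose(K, K.T)                      # symmetric
assert np.linalg.eigvalsh(K).min() > -1e-10     # positive semi-definite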


SLIDE 3

RKHS theory

Suppose we are given:

◮ A random variable X ∼ P taking values in 𝒳

◮ A function f : 𝒳 → ℝ

and that we want to evaluate ∫ f(x) dP(x) = E_X f(X). If f ∈ H, then

E_X f(X) = E_X ⟨f, φ(X)⟩_H = ⟨f, E_X φ(X)⟩_H

So if we know the mean embedding of X, µ_X := E_X φ(X), then we can calculate the expectation of any function in H by taking an inner product.
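To make the identity concrete, here is a small check (my own sketch, with an RBF kernel assumed): take f to be a finite kernel expansion f = Σ_j a_j k(z_j, ·), which lies in H; then ⟨f, µ̂⟩_H = Σ_j a_j µ̂(z_j) reproduces the empirical mean of f exactly.

import numpy as np

def k(x, y, ell=1.0):
    return np.exp(-(x - y) ** 2 / (2 * ell ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=500)                 # sample from P
z = np.array([-1.0, 0.5, 2.0])           # expansion points of f
a = np.array([0.3, -1.2, 0.7])           # coefficients: f = sum_j a_j k(z_j, .)

mu_hat = lambda t: k(X, t).mean()        # empirical mean embedding at t

inner = sum(a_j * mu_hat(z_j) for a_j, z_j in zip(a, z))   # <f, mu_hat>_H
direct = np.mean([np.dot(a, k(z, x_i)) for x_i in X])      # (1/n) sum_i f(X_i)
assert np.isclose(inner, direct)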


SLIDE 4

RKHS theory

For certain k, the mapping P ↦ µ_P is injective, i.e. P = Q ⇐⇒ µ_P = µ_Q. We can exploit this to construct statistical tests of properties of distributions.

Two-sample test: given {X_i} ∼ P and {Y_i} ∼ Q, does P = Q? Idea: estimate µ_P and µ_Q and see how different they are.

Independence test: given {(X_i, Y_i)} ∼ P_XY, does P_XY = P_X P_Y? Idea: estimate the embeddings of P_XY and P_X P_Y and see how different they are.
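As an illustration of the two-sample idea (a hedged sketch, with my own choice of RBF kernel and the standard biased estimator): the squared distance ‖µ̂_P − µ̂_Q‖²_H expands into three Gram-matrix means.

import numpy as np

def gram(A, B, ell=1.0):
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * ell ** 2))

rng = np.random.default_rng(2)
X = rng.normal(0.0, 1.0, size=300)       # {X_i} ~ P
Y = rng.normal(0.5, 1.0, size=300)       # {Y_i} ~ Q

# ||mu_hat_P - mu_hat_Q||_H^2 = mean(K_XX) - 2 mean(K_XY) + mean(K_YY)
mmd2 = gram(X, X).mean() - 2 * gram(X, Y).mean() + gram(Y, Y).mean()
print(mmd2)   # near 0 when P = Q; clearly positive here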


SLIDE 5

Estimating µX

How do we estimate µ_X?

µ_X = E φ(X) = ∫ k(x, ·) dP(x)

If {X_i}_{i=1}^n ∼ P, then

µ̂ := (1/n) Σ_{i=1}^n k(X_i, ·) → µ_X

Writing 1_n = (1/n, …, 1/n)⊺ and Φ = (k(X_1, ·), …, k(X_n, ·))⊺, this is µ̂ = Φ⊺1_n.

[Figure: the empirical mean embedding µ̂(x), plotted as a function of x.]
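An empirical embedding like the one in the figure takes only a few lines to compute (a sketch under my own assumptions about the kernel and the sample):

import numpy as np

def k(x, y, ell=1.0):
    return np.exp(-(x - y) ** 2 / (2 * ell ** 2))

rng = np.random.default_rng(3)
X = rng.normal(loc=3.0, size=50)              # {X_i} ~ P
grid = np.linspace(1.0, 5.0, 200)             # points x at which to evaluate

# mu_hat(x) = (1/n) sum_i k(X_i, x), i.e. Phi^T 1_n evaluated at x
mu_hat = k(X[:, None], grid[None, :]).mean(axis=0)

# import matplotlib.pyplot as plt
# plt.plot(grid, mu_hat); plt.xlabel("x"); plt.ylabel("mu(x)")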


SLIDE 6

Estimating µX

In Muandet et al. 2015(?) (Kernel Mean Shrinkage Estimators), the risk of an estimator µ̂ is defined as

∆ = E ‖µ̂ − µ‖²_H

and estimators that minimise ∆ are sought. Two proposals (see the sketch after this list):

◮ For a particular α that can be estimated from the observations, µ̂_α = (1 − α)µ̂ = (1 − α)Φ⊺1_n

◮ For λ estimated (by cross-validation) from the observations, µ̂_λ = Φ⊺(K + λI)⁻¹K1_n (this looks like GP regression)
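Both proposals only reweight the point evaluations k(X_i, ·). A sketch of the two weight vectors (hedged: α and λ are fixed by hand here for illustration, rather than estimated as in the paper):

import numpy as np

def gram(A, B, ell=1.0):
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * ell ** 2))

rng = np.random.default_rng(4)
X = rng.normal(size=50)
n = len(X)
K = gram(X, X)
ones = np.full(n, 1.0 / n)                    # the vector 1_n

alpha, lam = 0.1, 0.5                         # illustrative values only
w_alpha = (1 - alpha) * ones                  # mu_hat_alpha = sum_i w_i k(X_i, .)
w_lambda = np.linalg.solve(K + lam * np.eye(n), K @ ones)   # mu_hat_lambda weights

# e.g. evaluate mu_hat_lambda at a new point t:
t = 0.3
print(gram(X, np.array([t]))[:, 0] @ w_lambda)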


SLIDE 7

Bayesian estimation of µX

µ̂_α = (1 − α)Φ⊺1_n
µ̂_λ = Φ⊺(K + λI)⁻¹K1_n

Kernel ridge regression ⇐⇒ MAP inference in GP regression. Can we show that these estimators are the MAP solutions to Bayesian inference problems?

◮ µ ∼ GP(0, k), µ̂ | µ ∼ GP(µ, γk)

◮ µ ∼ GP(0, k), µ̂ | µ ∼ GP(µ, λ·𝟙[x = x′])

Define ‘pseudo-targets’ µ̂(x) = K1_n (as in the sketch below) and then perform Bayesian inference.
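The pseudo-targets are simply the empirical embedding evaluated at the sample points themselves (a minimal sketch, reusing the Gram matrix K from slide 5):

import numpy as np

def gram(A, B, ell=1.0):
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * ell ** 2))

X = np.random.default_rng(5).normal(size=50)
K = gram(X, X)
ones = np.full(len(X), 1.0 / len(X))     # 1_n

# mu_hat(X_j) = (1/n) sum_i k(X_i, X_j), i.e. the vector K 1_n
targets = K @ ones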


SLIDE 8

Deriving µ̂_α = (1 − α)Φ⊺1_n

Consider µ ∼ GP(0, k), µ̂ | µ ∼ GP(µ, γk). Choose a previously unobserved point z and consider the joint distribution of µ(z) and µ̂(x):

$$ \begin{pmatrix} \mu(z) \\ \hat{\mu}(\mathbf{x}) \end{pmatrix} \sim \mathcal{N}\!\left( 0,\; \begin{pmatrix} k_{zz} & k_z^\top \\ k_z & (1+\gamma)K \end{pmatrix} \right) $$

Conditioning and substituting the pseudo-targets µ̂(x) = K1_n gives

µ(z) | µ̂(x) ∼ N( (1/(1+γ)) k_z⊺1_n , k_zz − (1/(1+γ)) k_z⊺K⁻¹k_z )

So if 1/(1+γ) = (1 − α), i.e. γ = α/(1−α), then the MAP solution is

µ̂_α = (1 − α)Φ⊺1_n
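A quick numerical check of this conditional (my own verification sketch, RBF kernel assumed): with targets K1_n and observation covariance (1+γ)K, the generic Gaussian-conditioning formula collapses to (1/(1+γ)) k_z⊺1_n.

import numpy as np

def gram(A, B, ell=0.5):
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * ell ** 2))

rng = np.random.default_rng(6)
X = rng.normal(size=8)
n, gamma, z = len(X), 0.25, 0.4

K = gram(X, X)
k_z = gram(X, np.array([z]))[:, 0]            # the vector k_z
ones = np.full(n, 1.0 / n)                    # 1_n
targets = K @ ones                            # pseudo-targets K 1_n

# Generic conditional mean: k_z^T ((1+gamma) K)^{-1} (K 1_n)
post_mean = k_z @ np.linalg.solve((1 + gamma) * K, targets)
assert np.isclose(post_mean, k_z @ ones / (1 + gamma))   # = (1/(1+gamma)) k_z^T 1_n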


SLIDE 9

Deriving µ̂_λ = Φ⊺(K + λI)⁻¹K1_n

Consider next µ ∼ GP(0, k), µ̂ | µ ∼ GP(µ, λ·𝟙[x = x′]):

$$ \begin{pmatrix} \mu(z) \\ \hat{\mu}(\mathbf{x}) \end{pmatrix} \sim \mathcal{N}\!\left( 0,\; \begin{pmatrix} k_{zz} & k_z^\top \\ k_z & K + \lambda I \end{pmatrix} \right) $$

Conditioning on the pseudo-targets µ̂(x) = K1_n gives

µ(z) | µ̂(x) ∼ N( k_z⊺(K + λI)⁻¹K1_n , k_zz − k_z⊺(K + λI)⁻¹k_z )

Thus the MAP solution is

µ̂_λ = Φ⊺(K + λI)⁻¹K1_n
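The same check for the noisy-observation model (again a hedged verification sketch with an RBF kernel): conditioning with covariance K + λI and targets K1_n gives exactly the shrinkage estimator evaluated at z.

import numpy as np

def gram(A, B, ell=0.5):
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * ell ** 2))

rng = np.random.default_rng(7)
X = rng.normal(size=8)
n, lam, z = len(X), 0.5, 0.4

K = gram(X, X)
k_z = gram(X, np.array([z]))[:, 0]
ones = np.full(n, 1.0 / n)

# Posterior mean  k_z^T (K + lam I)^{-1} K 1_n  and variance
post_mean = k_z @ np.linalg.solve(K + lam * np.eye(n), K @ ones)
post_var = 1.0 - k_z @ np.linalg.solve(K + lam * np.eye(n), k_z)   # k_zz = k(z, z) = 1
print(post_mean, post_var)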


SLIDE 10

Some problems

Although we arrive at the same solutions, most of the approach taken above doesn't really make sense:

◮ The prior over µ is not sensible.

◮ The likelihood of µ̂ is wrong: in fact, for large n, µ̂ ≈ GP(µ, (1/n)[C_XX − µ_X ⊗ µ_X]) (see the sketch below).

◮ The uncertainty does not decay far away from the observations as n grows.
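Pointwise, the claimed large-n behaviour is just the CLT: Var[µ̂(z)] = (1/n) Var[k(X, z)], which is the diagonal of (1/n)[C_XX − µ_X ⊗ µ_X]. A Monte Carlo sketch (my illustration, Gaussian P and RBF kernel assumed):

import numpy as np

def k(x, y, ell=1.0):
    return np.exp(-(x - y) ** 2 / (2 * ell ** 2))

rng = np.random.default_rng(8)
n, reps, z = 100, 5000, 0.7

# Variance of mu_hat(z) across fresh size-n samples from P = N(0, 1)
vals = np.array([k(rng.normal(size=n), z).mean() for _ in range(reps)])

# CLT prediction: (1/n) Var[k(X, z)]
pred = np.var(k(rng.normal(size=200_000), z)) / n
print(vals.var(), pred)   # agree to a few percent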


SLIDE 11

Thanks!

Discussion?
