  1. Consistent Kernel Mean Estimation for Functions of Random Variables
Ilya Tolstikhin, jointly with C.-J. Simon-Gabriel, A. Ścibior, and B. Schölkopf (NIPS 2016)
Dagstuhl, December 2016

  2. Motivation
Given:
◮ independent random variables X ∈ 𝒳 and Y ∈ 𝒴;
◮ i.i.d. samples {X_i}_{i=1}^N and {Y_j}_{j=1}^N;
◮ any function f : 𝒳 × 𝒴 → 𝒵.
Construct a flexible representation for the distribution of Z = f(X, Y).
Let's represent distributions using their mean embeddings. The simplest estimator is

    μ̂_Z^(1) := (1/N) Σ_{i=1}^N k_Z(f(X_i, Y_i), ·),    which is √N-consistent.

Experiments show that the U-statistic estimator performs better:

    μ̂_Z^(2) := (1/N²) Σ_{i,j=1}^N k_Z(f(X_i, Y_j), ·),    also √N-consistent.
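As an aside (not from the slides), both estimators are easy to sketch in NumPy. The Gaussian kernel, the bandwidth, and the choice f(x, y) = x + y below are placeholder assumptions for illustration; both functions evaluate the estimated embedding of Z at a single point z:

```python
import numpy as np

def gauss_kernel(a, b, gamma=1.0):
    """Gaussian kernel k(a, b) = exp(-gamma * (a - b)^2) for scalars/arrays."""
    return np.exp(-gamma * (a - b) ** 2)

def mu_hat_1(f, X, Y, z, gamma=1.0):
    """Diagonal estimator: (1/N) sum_i k_Z(f(X_i, Y_i), z)."""
    Z = f(X, Y)                    # paired evaluations f(X_i, Y_i)
    return gauss_kernel(Z, z, gamma).mean()

def mu_hat_2(f, X, Y, z, gamma=1.0):
    """U-statistic estimator: (1/N^2) sum_{i,j} k_Z(f(X_i, Y_j), z)."""
    Z = f(X[:, None], Y[None, :])  # all N^2 pairs f(X_i, Y_j)
    return gauss_kernel(Z, z, gamma).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=500)
Y = rng.normal(size=500)
f = np.add                         # example choice: Z = X + Y
```

Since X and Y are independent, both estimators target the same embedding, so their values at any fixed z should be close; the U-statistic averages over N² pairs instead of N, which is where its better empirical behavior (and its cost) comes from.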

  4. Motivation
Experiments show that the U-statistic estimator performs better:

    μ̂_Z^(2) := (1/N²) Σ_{i,j=1}^N k_Z(f(X_i, Y_j), ·),    √N-consistent.

Unfortunately, the N² terms may be computationally prohibitive.
Schölkopf et al. (2015): take n ≪ N and use reduced set methods to
1. approximate (1/N) Σ_{i=1}^N k(X_i, ·) ≈ Σ_{i=1}^n w_i k(X′_i, ·);
2. approximate (1/N) Σ_{j=1}^N k(Y_j, ·) ≈ Σ_{j=1}^n v_j k(Y′_j, ·);
3. use the following estimator:

    μ̂_Z := Σ_{i,j=1}^n w_i v_j k_Z(f(X′_i, Y′_j), ·).

Question: is μ̂_Z consistent?
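Reduced set methods come in several variants; the sketch below is a minimal stand-in, not the specific construction of Schölkopf et al. (2015). It assumes the expansion points X′, Y′ are a plain subsample and computes the weights w, v by least squares against the full empirical embedding in the RKHS:

```python
import numpy as np

def gram(A, B, gamma=1.0):
    """Gaussian Gram matrix K[i, j] = exp(-gamma * (A_i - B_j)^2)."""
    return np.exp(-gamma * (A[:, None] - B[None, :]) ** 2)

def reduced_weights(X, Xp, gamma=1.0, reg=1e-8):
    """Least-squares weights w so that sum_i w_i k(X'_i, .) approximates
    the empirical embedding (1/N) sum_j k(X_j, .) in the RKHS norm."""
    Knn = gram(Xp, Xp, gamma)           # n x n Gram of expansion points
    KnN = gram(Xp, X, gamma)            # n x N cross Gram
    return np.linalg.solve(Knn + reg * np.eye(len(Xp)), KnN.mean(axis=1))

rng = np.random.default_rng(1)
N, n = 2000, 50
X, Y = rng.normal(size=N), rng.normal(size=N)
Xp, Yp = X[:n], Y[:n]                   # expansion points: a subsample
w = reduced_weights(X, Xp)
v = reduced_weights(Y, Yp)

# Reduced-set estimator of the embedding of Z = X + Y, evaluated at z = 0,
# using only n^2 = 2500 terms instead of N^2 = 4,000,000:
Zpairs = Xp[:, None] + Yp[None, :]      # f(X'_i, Y'_j) with f = addition
mu_Z_at_0 = (w[:, None] * v[None, :] * np.exp(-Zpairs ** 2)).sum()
```

With X, Y ~ N(0, 1) and Z = X + Y ~ N(0, 2), the embedding value at z = 0 is 1/√5 ≈ 0.447, which the n² = 2500-term estimator should roughly reproduce.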

  6. New results
Answer: yes, μ̂_Z is indeed consistent. Proof based on [SS16]. Assume:
◮ 𝒳 and 𝒵 are compact;
◮ f : 𝒳 → 𝒵 is continuous;
◮ k_𝒳, k_𝒵 are continuous p.d. kernels on 𝒳 and 𝒵;
◮ k_𝒳 is c₀-universal;
◮ there exists C s.t. Σ_i |w_i| ≤ C independently of N.
Then:

    Σ_{i=1}^N w_i k_𝒳(X_i, ·) → μ_X in H_{k_𝒳}   ⇒   Σ_{i=1}^N w_i k_𝒵(f(X_i), ·) → μ_Z in H_{k_𝒵}.

◮ Importantly, w_1, …, w_N and X_1, …, X_N can be interdependent.
◮ Finite-sample guarantees for 𝒳 = ℝ^d, 𝒵 = ℝ^{d′} and Matérn kernels.
◮ Applications: probabilistic programming, privacy-preserving ML, …
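The implication can be probed numerically. The sketch below is only an illustration, not a verification of the theorem (in particular it uses Gaussian inputs, so 𝒳 is not compact, and uniform weights w_i = 1/N): it computes the squared RKHS distance between a weighted embedding and a large-sample proxy for the true embedding, on both the input side and the output side, and checks that both shrink as N grows:

```python
import numpy as np

def gram(A, B, gamma=1.0):
    return np.exp(-gamma * (A[:, None] - B[None, :]) ** 2)

def rkhs_dist2(x, w, y, u, gamma=1.0):
    """||sum_i w_i k(x_i,.) - sum_j u_j k(y_j,.)||^2 in the RKHS,
    expanded via Gram matrices."""
    return (w @ gram(x, x, gamma) @ w
            - 2 * w @ gram(x, y, gamma) @ u
            + u @ gram(y, y, gamma) @ u)

rng = np.random.default_rng(2)
f = lambda x: x ** 2                 # a continuous map playing the role of f
ref = rng.normal(size=2000)          # large-sample proxy for the distribution
u = np.full(len(ref), 1.0 / len(ref))

errs_X, errs_Z = [], []
for N in (50, 500, 2000):
    X = rng.normal(size=N)
    w = np.full(N, 1.0 / N)          # uniform weights for this illustration
    errs_X.append(rkhs_dist2(X, w, ref, u))          # input-side error
    errs_Z.append(rkhs_dist2(f(X), w, f(ref), u))    # output-side error
```

As the input-side embedding error goes to zero, the output-side error follows, which is exactly the shape of the implication on this slide.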

  8. Related results…
◮ Minimax Estimation of Kernel Mean Embeddings (T., Sriperumbudur, Muandet, 2016, arXiv).
  Task: estimate ∫_𝒳 k(x, ·) dP(x) based on the i.i.d. sample {X_i}_{i=1}^N.
  Result: for translation-invariant kernels you cannot do it faster than N^{−1/2}.
◮ Minimax Estimation of MMD with Radial Kernels (T., Sriperumbudur, Schölkopf, 2016, NIPS).
  Task: estimate ‖μ_P − μ_Q‖_{H_k} based on i.i.d. samples {X_i}_{i=1}^N and {Y_i}_{i=1}^M.
  Result: for radial kernels you cannot do it faster than N^{−1/2} + M^{−1/2}.
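For context on the second task, the standard unbiased U-statistic estimator of squared MMD is easy to write down; a minimal NumPy sketch with a Gaussian kernel (the kernel and bandwidth are illustrative choices, not tied to the lower-bound results above):

```python
import numpy as np

def mmd2_unbiased(X, Y, gamma=1.0):
    """Unbiased U-statistic estimator of MMD^2(P, Q) = ||mu_P - mu_Q||^2
    for 1-d samples X ~ P and Y ~ Q under a Gaussian kernel."""
    k = lambda a, b: np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)
    n, m = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))   # within-X, i != j
            + (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1)) # within-Y, i != j
            - 2.0 * Kxy.mean())                           # cross term

rng = np.random.default_rng(3)
X = rng.normal(size=1000)
Y_same = rng.normal(size=1000)         # same distribution: MMD^2 near 0
Y_shift = rng.normal(size=1000) + 2.0  # shifted distribution: MMD^2 > 0
```

The slide's result says that no estimator of this quantity can beat the N^{−1/2} + M^{−1/2} rate for radial kernels, so this simple estimator is already rate-optimal up to constants.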
