The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal

Jiantao Jiao (Berkeley EECS), Weihao Gao (UIUC ECE), Yanjun Han (Stanford EE)

NIPS 2018, Montréal, Canada
Differential Entropy Estimation

Differential entropy of a continuous density f on R^d:

    h(f) = \int_{\mathbb{R}^d} f(x) \log \frac{1}{f(x)} \, dx

Applications:
◮ machine learning tasks, e.g., classification, clustering, feature selection
◮ causal inference
◮ sociology
◮ computational biology
◮ · · ·

Our Task
Given empirical samples X_1, · · · , X_n ∼ f, estimate h(f).
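For orientation, recall two standard closed-form instances of this definition (in nats):

    h(\mathrm{Unif}[0,1]^d) = 0, \qquad h\big(\mathcal{N}(\mu, \sigma^2)\big) = \tfrac{1}{2} \log(2\pi e \sigma^2).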
Ideas of Nearest Neighbor

Notation:
◮ n: number of samples
◮ d: dimensionality
◮ k: number of nearest neighbors
◮ R_{i,k}: ℓ2 distance of the i-th sample to its k-th nearest neighbor
◮ vol_d(r): volume of the d-dimensional ball with radius r

Idea

    h(f) = \mathbb{E}[-\log f(X)] \approx -\frac{1}{n} \sum_{i=1}^{n} \log \hat{f}(X_i), \qquad \hat{f}(X_i) \cdot \mathrm{vol}_d(R_{i,k}) \approx \frac{k}{n}
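Heuristic behind the second approximation: the ball around X_i that just reaches its k-th nearest neighbor captures empirical probability mass k/n, and if f is roughly constant over that ball,

    \frac{k}{n} \approx \Pr\big(X \in B(X_i, R_{i,k})\big) = \int_{B(X_i, R_{i,k})} f(x) \, dx \approx f(X_i) \cdot \mathrm{vol}_d(R_{i,k}),

so \hat{f}(X_i) = k / (n \, \mathrm{vol}_d(R_{i,k})) acts as a local density estimate at X_i.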
Kozachenko–Leonenko Estimator

Definition (Kozachenko–Leonenko Estimator)

    \hat{h}^{\mathrm{KL}}_{n,k} = \frac{1}{n} \sum_{i=1}^{n} \log\Big( \frac{n}{k} \, \mathrm{vol}_d(R_{i,k}) \Big) + \underbrace{\log(k) - \psi(k)}_{\text{bias correction term}}

where ψ is the digamma function.

◮ Easy to implement: no numerical integration (see the code sketch below)
◮ Only tuning parameter: k
◮ Good empirical performance, but no theoretical guarantee, especially when the density may be close to zero
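A minimal implementation sketch in Python, assuming SciPy is available; the function name kl_entropy and its interface are our own choices, not from the paper. It follows the formula above, using a k-d tree for the neighbor distances and scipy.special.digamma for ψ.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(samples, k=1):
    """Kozachenko-Leonenko estimate of differential entropy (in nats).

    samples: (n, d) array of i.i.d. draws from the unknown density f.
    k: number of nearest neighbors, the estimator's only tuning parameter.
    """
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    # R_{i,k}: distance from each sample to its k-th nearest neighbor.
    # query() counts the point itself as neighbor 0, so request k+1 columns.
    tree = cKDTree(samples)
    r_ik = tree.query(samples, k=k + 1)[0][:, k]
    # log vol_d(R_{i,k}) for the Euclidean ball:
    # vol_d(r) = pi^(d/2) / Gamma(d/2 + 1) * r^d.
    log_vol = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1) + d * np.log(r_ik)
    # (1/n) sum_i log((n/k) vol_d(R_{i,k})) + log(k) - psi(k).
    return np.mean(np.log(n / k) + log_vol) + np.log(k) - digamma(k)

# Sanity check: Unif[0,1]^2 has h(f) = 0, so the estimate should be near 0.
rng = np.random.default_rng(0)
print(kl_entropy(rng.random((10000, 2)), k=1))
```

Note that duplicate sample points would make some R_{i,k} zero and break the logarithm; under a continuous density this happens with probability zero.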
Main Result

Let H^s_d be the class of probability densities supported on [0,1]^d which are Hölder smooth with parameter s ≥ 0.

Theorem (Main Result)
For fixed k and s ∈ (0, 2],

    \sup_{f \in \mathcal{H}^s_d} \Big( \mathbb{E}_f \big[ \hat{h}^{\mathrm{KL}}_{n,k} - h(f) \big]^2 \Big)^{1/2} \lesssim n^{-\frac{s}{s+d}} \log n + n^{-\frac{1}{2}}.

This is the first theoretical guarantee for the Kozachenko–Leonenko estimator that does not assume the density is bounded away from zero.
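To read the rate: for Lipschitz densities (s = 1) in dimension d = 1 the bound becomes

    \Big( \mathbb{E}_f \big[ \hat{h}^{\mathrm{KL}}_{n,k} - h(f) \big]^2 \Big)^{1/2} \lesssim n^{-\frac{1}{2}} \log n + n^{-\frac{1}{2}} = O\big( n^{-\frac{1}{2}} \log n \big),

the parametric rate up to a logarithmic factor; whenever s < d, the nonparametric term n^{-s/(s+d)} dominates instead.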
Matching Lower Bound

Theorem (Han–Jiao–Weissman–Wu '17)
For any s ≥ 0,

    \inf_{\hat{h}} \sup_{f \in \mathcal{H}^s_d} \Big( \mathbb{E}_f \big[ \hat{h} - h(f) \big]^2 \Big)^{1/2} \gtrsim n^{-\frac{s}{s+d}} (\log n)^{-\frac{s+2d}{s+d}} + n^{-\frac{1}{2}}.

Take-home Message
◮ The nearest neighbor estimator is nearly minimax rate-optimal.
◮ The nearest neighbor estimator adapts to the unknown smoothness s.
◮ A maximal inequality plays a central role in dealing with small densities.
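Comparing the two theorems: at every fixed s and d, the upper and lower bounds agree up to a polylogarithmic factor,

    \frac{ n^{-\frac{s}{s+d}} \log n }{ n^{-\frac{s}{s+d}} (\log n)^{-\frac{s+2d}{s+d}} } = (\log n)^{1 + \frac{s+2d}{s+d}} = (\log n)^{\frac{2s+3d}{s+d}},

which is the precise sense in which the estimator is "near" (rather than exactly) minimax rate-optimal, and it achieves this simultaneously over all s ∈ (0, 2] with a fixed k.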