

SLIDE 1

Inference and Estimation Using Nearest Neighbors

Yung-Kyun Noh

Seoul National University → Hanyang University

2019 The Second Korea-Japan Machine Learning Workshop

  • 2019. 2. 22 (Fri.)
SLIDE 2

Nearest Neighbors

  • Similar data share similar properties

Figure: points of class 1 and class 2; "similar properties" = labels, or behavior.

SLIDE 3

Figure: a query point $x$ in the data space, with class-conditional densities $p_1(x)$ and $p_2(x)$.

The nearest neighbor $x_{NN}$ converges to $x$ uniformly as $N$ increases. In the limit, for two-class classification as an example, the nearest neighbor error $R$ satisfies

$$R^* \;\le\; R \;\le\; 2R^*(1 - R^*),$$

where $R^*$ is the Bayes error.

[T. Cover and P. Hart, IEEE TIT, 1967]

SLIDE 4

Applications of Nearest Neighbors

  • Prediction using k-Nearest Neighbor Information (a minimal code sketch follows below)
    – k-Nearest Neighbor Classification
    – k-Nearest Neighbor Regression
  • Estimation using k-Nearest Neighbor Information

Figure: points of class 1 and class 2; the quantity shown is the distance from a point to its nearest neighbor in class c.

[Leonenko, N., Pronzato, L., & Savani, V., 2008]
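A minimal sketch of the two prediction uses listed above (illustrative code, not the speaker's implementation; integer class labels assumed for classification):

import numpy as np

def knn_predict(X_train, y_train, X_query, k=5, regression=False):
    # Pairwise Euclidean distances from each query to each training point
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    idx = np.argsort(dists, axis=1)[:, :k]      # indices of the k nearest neighbors
    neighbor_y = y_train[idx]                   # their labels (or targets)
    if regression:
        return neighbor_y.mean(axis=1)          # k-NN regression: average the targets
    # k-NN classification: majority vote among the k neighbors
    return np.array([np.bincount(row.astype(int)).argmax() for row in neighbor_y])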

SLIDE 5

Similar Formulations

  • Nadaraya-Watson estimator for kernel classification/regression

Figure: kernel weight as a function of the distance, controlled by the bandwidth.
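For reference, the standard form of the estimator, with a radial kernel $K$ and bandwidth $h$ (notation assumed; the slide's own equation did not survive extraction):

$$\hat{y}(x) \;=\; \frac{\sum_{i=1}^{N} K\!\left(\|x - x_i\|/h\right)\, y_i}{\sum_{i=1}^{N} K\!\left(\|x - x_i\|/h\right)}$$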

SLIDE 6

Bias Analysis

  • k-Nearest Neighbor Classification

[R. R. Snapp et al., The Annals of Statistics, 1998] [Y.-K. Noh et al., IEEE TPAMI, 2018]

①: asymptotic NN error; ②: residual due to finite sampling.
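The decomposition in the cited work has the following shape (a sketch only; the exact constants and regularity conditions are in Snapp et al. and Noh et al.):

$$\mathbb{E}[R_N] \;=\; \underbrace{R_\infty}_{①} \;+\; \underbrace{\textstyle\sum_{j} c_j\, N^{-j/d}}_{②}$$

where $R_\infty$ is the asymptotic NN error, $d$ is the dimensionality, and the $c_j$ depend on the densities and the metric.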

SLIDE 7

Change of Metric

A linear transformation $z = L^\top x$ changes the metric: $x \sim p$ becomes $z \sim \tilde{p}$.

[Y.-K. Noh et al., IEEE TPAMI, 2018]

Figure: nearest neighbors and class-conditional densities $p_1, p_2$ around a query point under the Euclidean metric ($A = I$) versus the optimal metric ($A = A_{\mathrm{Opt}}$).
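In the usual metric-learning notation (an assumption here, consistent with the $A = I$ vs. $A = A_{\mathrm{Opt}}$ labels), the two views are equivalent:

$$d_A(x, x') \;=\; (x - x')^\top A\, (x - x'), \qquad A = LL^\top \succ 0,$$

i.e., nearest neighbor search under the squared distance $d_A$ is Euclidean search in the transformed space $z = L^\top x$.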
SLIDE 8

Nearest Neighbor Classification with Metric

Figure: classification performance with the learned metric (about a 20% increase).

[Y.-K. Noh et al., IEEE TPAMI, 2018]

  • Obtain $\nabla^2 p_1$, $\nabla^2 p_2$, $p_1$, $p_2$ from generative models.

SLIDE 9

Bandwidth and Nadaraya-Watson Regression


SLIDE 10

Bias Analysis

  • k-Nearest Neighbor Classification
  • Bias

  → Minimizes the mean squared error (MSE)
  → Metric-independent asymptotic property

$$\mathbb{E}\left[\hat{y}(x) - y(x)\right] \;=\; h^2 \left( \frac{\nabla^\top p(x)\, \nabla y(x)}{p(x)} + \frac{\nabla^2 y(x)}{2} \right) + o(h^4)$$

SLIDE 11

For x & y Jointly Gaussian

  • Learned metric is not sensitive to the bandwidth

[Y.-K. Noh et al., NeurIPS, 2017]

SLIDE 12

[Y.-K. Noh et al., NeurIPS, 2017]

SLIDE 13

Variance Reduction Is Not Critical in High Dimensions

Proposition: In a high-dimensional space, once the bias is minimized and the bandwidth is chosen accordingly, reducing the variance is not critical.

[Y.-K. Noh, et al., NeurIPS, 2017]

SLIDE 14

Information-theoretic Measure Estimation

Figure: the estimator decomposes into a metric-invariant part ① and a metric-dependent part ②, with the condition ① = ②. The quantity shown is the distance from a point to its nearest neighbor in class c.

SLIDE 15

Increasing the KL-Divergence of Two Gaussians and Its Estimation

[Y.-K. Noh et al., NeCo, 2018]
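For reference, the closed form being estimated, for Gaussians $\mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}(\mu_2, \Sigma_2)$ in $d$ dimensions (a standard identity, not recovered from the slide):

$$D_{KL} \;=\; \frac{1}{2}\left( \operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2 - \mu_1)^\top \Sigma_2^{-1} (\mu_2 - \mu_1) - d + \ln\frac{\det \Sigma_2}{\det \Sigma_1} \right)$$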

SLIDE 16

Making General Estimators for f-Divergences

SLIDE 17

Estimation of the General f-Divergences

  • Shannon Entropy Estimation

[D. Lombardi and S. Pant, Phys. Rev. E, 2016] [A. Kraskov, H. Stögbauer, and P. Grassberger, Phys. Rev. E, 2004]
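The k-NN Shannon entropy estimator of Kozachenko and Leonenko, in the form popularized by Kraskov et al. (a sketch from the cited literature; the slide's own equations did not survive extraction):

$$\hat{H}(X) \;=\; \psi(N) - \psi(k) + \log c_d + \frac{d}{N} \sum_{i=1}^{N} \log \epsilon(i)$$

where $\psi$ is the digamma function, $c_d$ is the volume of the $d$-dimensional unit ball, and $\epsilon(i)$ is twice the distance from $x_i$ to its $k$-th nearest neighbor.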

SLIDE 18

Density Estimator and Entropy Estimator

  • Loftsgaarden and Quesenberry (1965)
  • Shannon Entropy Estimator

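The Loftsgaarden-Quesenberry k-NN density estimate and the plug-in entropy estimator it induces (standard forms, up to the usual $k$ vs. $k-1$ convention; notation assumed):

$$\hat{p}(x) \;=\; \frac{k/N}{c_d\, R_k(x)^d}, \qquad \hat{H} \;=\; -\frac{1}{N} \sum_{i=1}^{N} \log \hat{p}(x_i),$$

where $R_k(x)$ is the distance from $x$ to its $k$-th nearest neighbor and $c_d$ is the volume of the $d$-dimensional unit ball. The digamma terms in the estimator above correct the bias that this naive plug-in incurs.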

SLIDE 19

Historical Remarks on Plug-in Estimators

  • Plug-in and correction: Rényi and Tsallis entropies, Shannon entropy
    [N. Leonenko, L. Pronzato, & V. Savani, Annals of Statistics, 2008] [B. Poczos and J. Schneider, AISTATS, 2011]
  • [K. Moon & A. Hero, 2014] consider the general f-divergence plug-in estimator

SLIDE 20

Plug-in Nearest Neighbor f-divergence Estimator

  • Kullback-Leibler Divergence
  • Tsallis-alpha Divergence

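A minimal sketch of a standard plug-in k-NN estimator of the KL divergence (in the style of Wang, Kulkarni & Verdú; illustrative code and names, not the speaker's implementation):

import numpy as np
from scipy.spatial import cKDTree

def knn_kl_divergence(X, Y, k=1):
    """Plug-in k-NN estimate of KL(p||q) from samples X ~ p (n x d) and Y ~ q (m x d)."""
    n, d = X.shape
    m = Y.shape[0]
    # rho: distance from each x_i to its k-th nearest neighbor among X \ {x_i}
    rho = cKDTree(X).query(X, k=k + 1)[0][:, -1]   # k+1 because x_i matches itself
    # nu: distance from each x_i to its k-th nearest neighbor among Y
    nu = cKDTree(Y).query(X, k=k)[0]
    if k > 1:
        nu = nu[:, -1]
    # Average log-ratio of the two k-NN density estimates, plus the sample-size term
    return (d / n) * np.sum(np.log(nu / rho)) + np.log(m / (n - 1.0))

# Example: two 2-D unit Gaussians shifted by 1 in the first coordinate
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(2000, 2))
Y = rng.normal(0.0, 1.0, size=(2000, 2)); Y[:, 0] += 1.0
print(knn_kl_divergence(X, Y))  # true KL is 0.5 for this shift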

SLIDE 21

Plug-in methods do not work for general f-divergences

Figure: the plug-in estimator converges to a value different from the true f-divergence; the analysis connects to the asymptotic nearest neighbor classification error.

[T. Cover, 1968] [Y.-K. Noh, Ph.D. thesis, 2011]

SLIDE 22

Obtaining the General f-Divergence Estimator


Inverse Laplace Transform

SLIDE 23


arXiv:1805.08342

SLIDE 24

Summary

  • Asymptotically, nearest neighbor methods are very nice (in terms of theory!).
  • With finite samples, treating the bias through a change of geometry can significantly improve conventional nonparametric methods, especially in high-dimensional spaces.
  • A general and systematic way of obtaining f-divergences using nearest neighbor information.

SLIDE 25

Thank You

Yung-Kyun Noh nohyung@snu.ac.kr