

SLIDE 1

Inference and Estimation Using Nearest Neighbors

Yung-Kyun Noh

Seoul National University → Hanyang University

2019 The Second Korea-Japan Machine Learning Workshop

  • 2019. 2. 22 (Fri.)
SLIDE 2

Nearest Neighbors

  • Similar data share similar properties

Figure: points of class 1 and class 2; "similar properties" = labels, or behavior.

SLIDE 3

Figure: a query point $x$ in the data space, with class-conditional densities $p_1(x)$ and $p_2(x)$.

The nearest neighbor $x_{NN}$ converges to $x$ uniformly as $N$ increases. In the limit, for two-class classification as an example, the nearest neighbor error $R$ satisfies

$$R^* \;\le\; R \;\le\; 2R^*(1 - R^*),$$

where $R^*$ is the Bayes error.

[T. Cover and P. Hart, IEEE TIT, 1967]

SLIDE 4

Applications of Nearest Neighbors

  • Prediction using k-Nearest Neighbor Information (a minimal code sketch follows below)
    – k-Nearest Neighbor Classification
    – k-Nearest Neighbor Regression
  • Estimation using k-Nearest Neighbor Information

Figure: points of class 1 and class 2; the quantity shown is the distance from a point to its nearest neighbor in class c.

[Leonenko, N., Pronzato, L., & Savani, V., 2008]
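A minimal sketch of the two prediction uses listed above (illustrative code, not the speaker's implementation; integer class labels assumed for classification):

import numpy as np

def knn_predict(X_train, y_train, X_query, k=5, regression=False):
    # Pairwise Euclidean distances from each query to each training point
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    idx = np.argsort(dists, axis=1)[:, :k]      # indices of the k nearest neighbors
    neighbor_y = y_train[idx]                   # their labels (or targets)
    if regression:
        return neighbor_y.mean(axis=1)          # k-NN regression: average the targets
    # k-NN classification: majority vote among the k neighbors
    return np.array([np.bincount(row.astype(int)).argmax() for row in neighbor_y])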

SLIDE 5

Similar Formulations

  • Nadaraya-Watson estimator for kernel classification/regression

Figure: kernel weight as a function of the distance, controlled by the bandwidth.
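For reference, the standard form of the estimator, with a radial kernel $K$ and bandwidth $h$ (notation assumed; the slide's own equation did not survive extraction):

$$\hat{y}(x) \;=\; \frac{\sum_{i=1}^{N} K\!\left(\|x - x_i\|/h\right)\, y_i}{\sum_{i=1}^{N} K\!\left(\|x - x_i\|/h\right)}$$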

SLIDE 6

Bias Analysis

  • k-Nearest Neighbor Classification

[R. R. Snapp et al., The Annals of Statistics, 1998] [Y.-K. Noh et al., IEEE TPAMI, 2018]

①: asymptotic NN error; ②: residual due to finite sampling.
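The decomposition in the cited work has the following shape (a sketch only; the exact constants and regularity conditions are in Snapp et al. and Noh et al.):

$$\mathbb{E}[R_N] \;=\; \underbrace{R_\infty}_{①} \;+\; \underbrace{\textstyle\sum_{j} c_j\, N^{-j/d}}_{②}$$

where $R_\infty$ is the asymptotic NN error, $d$ is the dimensionality, and the $c_j$ depend on the densities and the metric.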

SLIDE 7

Change of Metric

A linear transformation $z = L^\top x$ changes the metric: $x \sim p$ becomes $z \sim \tilde{p}$.

[Y.-K. Noh et al., IEEE TPAMI, 2018]

Figure: nearest neighbors and class-conditional densities $p_1, p_2$ around a query point under the Euclidean metric ($A = I$) versus the optimal metric ($A = A_{\mathrm{Opt}}$).
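In the usual metric-learning notation (an assumption here, consistent with the $A = I$ vs. $A = A_{\mathrm{Opt}}$ labels), the two views are equivalent:

$$d_A(x, x') \;=\; (x - x')^\top A\, (x - x'), \qquad A = LL^\top \succ 0,$$

i.e., nearest neighbor search under the squared distance $d_A$ is Euclidean search in the transformed space $z = L^\top x$.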
SLIDE 8

Nearest Neighbor Classification with Metric

Figure: classification performance with the learned metric (about a 20% increase).

[Y.-K. Noh et al., IEEE TPAMI, 2018]

  • Obtain $\nabla^2 p_1$, $\nabla^2 p_2$, $p_1$, $p_2$ from generative models.

SLIDE 9

Bandwidth and Nadaraya-Watson Regression


SLIDE 10

Bias Analysis

  • k-Nearest Neighbor Classification
  • Bias

  → Minimizes the mean squared error (MSE)
  → Metric-independent asymptotic property

$$\mathbb{E}\left[\hat{y}(x) - y(x)\right] \;=\; h^2 \left( \frac{\nabla^\top p(x)\, \nabla y(x)}{p(x)} + \frac{\nabla^2 y(x)}{2} \right) + o(h^4)$$

SLIDE 11

For x & y Jointly Gaussian

  • Learned metric is not sensitive to the bandwidth

[Y.-K. Noh et al., NeurIPS, 2017]

SLIDE 12

[Y.-K. Noh et al., NeurIPS, 2017]

SLIDE 13

Variance Reduction Is Not Critical in High Dimensions

Proposition: In a high-dimensional space, once the bias is minimized and the bandwidth is chosen accordingly, reducing the variance is not critical.

[Y.-K. Noh, et al., NeurIPS, 2017]

SLIDE 14

Information-theoretic Measure Estimation

Figure: the estimator decomposes into a metric-invariant part ① and a metric-dependent part ②, with the condition ① = ②. The quantity shown is the distance from a point to its nearest neighbor in class c.

SLIDE 15

Increasing the KL-Divergence of Two Gaussians and Its Estimation

[Y.-K. Noh et al., NeCo, 2018]
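For reference, the closed form being estimated, for Gaussians $\mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}(\mu_2, \Sigma_2)$ in $d$ dimensions (a standard identity, not recovered from the slide):

$$D_{KL} \;=\; \frac{1}{2}\left( \operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2 - \mu_1)^\top \Sigma_2^{-1} (\mu_2 - \mu_1) - d + \ln\frac{\det \Sigma_2}{\det \Sigma_1} \right)$$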

SLIDE 16

Making General Estimators for f-Divergences

SLIDE 17

Estimation of the General f-Divergences

  • Shannon Entropy Estimation

[D. Lombardi and S. Pant, Phys. Rev. E, 2016] [A. Kraskov, H. Stögbauer, and P. Grassberger, Phys. Rev. E, 2004]
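The k-NN Shannon entropy estimator of Kozachenko and Leonenko, in the form popularized by Kraskov et al. (a sketch from the cited literature; the slide's own equations did not survive extraction):

$$\hat{H}(X) \;=\; \psi(N) - \psi(k) + \log c_d + \frac{d}{N} \sum_{i=1}^{N} \log \epsilon(i)$$

where $\psi$ is the digamma function, $c_d$ is the volume of the $d$-dimensional unit ball, and $\epsilon(i)$ is twice the distance from $x_i$ to its $k$-th nearest neighbor.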

SLIDE 18

Density Estimator and Entropy Estimator

  • Loftsgaarden and Quesenberry (1965)
  • Shannon Entropy Estimator

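The Loftsgaarden-Quesenberry k-NN density estimate and the plug-in entropy estimator it induces (standard forms, up to the usual $k$ vs. $k-1$ convention; notation assumed):

$$\hat{p}(x) \;=\; \frac{k/N}{c_d\, R_k(x)^d}, \qquad \hat{H} \;=\; -\frac{1}{N} \sum_{i=1}^{N} \log \hat{p}(x_i),$$

where $R_k(x)$ is the distance from $x$ to its $k$-th nearest neighbor and $c_d$ is the volume of the $d$-dimensional unit ball. The digamma terms in the estimator above correct the bias that this naive plug-in incurs.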

SLIDE 19

Historical Remarks on Plug-in Estimators

  • Plug-in and correction: Rényi and Tsallis entropies, Shannon entropy
    [N. Leonenko, L. Pronzato, & V. Savani, Annals of Statistics, 2008] [B. Poczos and J. Schneider, AISTATS, 2011]
  • [K. Moon & A. Hero, 2014] consider the general f-divergence plug-in estimator

SLIDE 20

Plug-in Nearest Neighbor f-divergence Estimator

  • Kullback-Leibler Divergence
  • Tsallis-alpha Divergence

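A minimal sketch of a standard plug-in k-NN estimator of the KL divergence (in the style of Wang, Kulkarni & Verdú; illustrative code and names, not the speaker's implementation):

import numpy as np
from scipy.spatial import cKDTree

def knn_kl_divergence(X, Y, k=1):
    """Plug-in k-NN estimate of KL(p||q) from samples X ~ p (n x d) and Y ~ q (m x d)."""
    n, d = X.shape
    m = Y.shape[0]
    # rho: distance from each x_i to its k-th nearest neighbor among X \ {x_i}
    rho = cKDTree(X).query(X, k=k + 1)[0][:, -1]   # k+1 because x_i matches itself
    # nu: distance from each x_i to its k-th nearest neighbor among Y
    nu = cKDTree(Y).query(X, k=k)[0]
    if k > 1:
        nu = nu[:, -1]
    # Average log-ratio of the two k-NN density estimates, plus the sample-size term
    return (d / n) * np.sum(np.log(nu / rho)) + np.log(m / (n - 1.0))

# Example: two 2-D unit Gaussians shifted by 1 in the first coordinate
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(2000, 2))
Y = rng.normal(0.0, 1.0, size=(2000, 2)); Y[:, 0] += 1.0
print(knn_kl_divergence(X, Y))  # true KL is 0.5 for this shift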

SLIDE 21

Plug-in methods do not work for general f-divergences

Figure: the plug-in estimator converges to a value different from the true f-divergence; the analysis connects to the asymptotic nearest neighbor classification error.

[T. Cover, 1968] [Y.-K. Noh, Ph.D. thesis, 2011]

SLIDE 22

Obtaining the General f-Divergence Estimator


Inverse Laplace Transform

SLIDE 23


arXiv:1805.08342

SLIDE 24

Summary

  • Asymptotically, nearest neighbor methods are very nice (in terms of theory!).
  • With finite samples, treating the bias through a change of geometry can significantly improve conventional nonparametric methods, especially in high-dimensional spaces.
  • A general and systematic way of obtaining f-divergences using nearest neighbor information.

SLIDE 25

Thank You

Yung-Kyun Noh nohyung@snu.ac.kr