BAYES AND NEAREST NEIGHBOR CLASSIFIERS - PowerPoint PPT Presentation



SLIDE 1

Matthieu R Bloch Tuesday, January 21, 2020

BAYES AND NEAREST NEIGHBOR CLASSIFIERS

SLIDE 2

LOGISTICS

TAs and Office hours:
  • Monday: Mehrdad (TSRB 523a) - 2pm-3:15pm
  • Tuesday: TJ (VL C449 Cubicle D) - 1:30pm-2:45pm
  • Wednesday: Matthieu (TSRB 423) - 12:00pm-1:15pm
  • Thursday: Hossein (VL C449 Cubicle B) - 10:45am-12:00pm
  • Friday: Brighton (TSRB 523a) - 12pm-1:15pm

Homework 1 posted on Canvas. Due Wednesday, January 29, 2020 (11:59PM EST) (Wednesday, February 5, 2020 for DL).

SLIDE 3

RECAP: BAYES CLASSIFIER

What is the best (smallest) risk that we can achieve? Assume that we actually know $P_X$ and $P_{Y|X}$. Denote the a posteriori class probabilities of $x \in \mathcal{X}$ by $\eta_k(x) \triangleq \mathbb{P}(Y = k | X = x)$, and denote the a priori class probabilities by $\pi_k \triangleq \mathbb{P}(Y = k)$.

Lemma (Bayes classifier). The classifier $h_B(x) \triangleq \operatorname{argmax}_{k \in [0; K-1]} \eta_k(x)$ is optimal, i.e., for any classifier $h$, we have $R(h_B) \leq R(h)$, and $R(h_B) = \mathbb{E}_X[1 - \max_k \eta_k(X)]$.

Terminology: $h_B$ is called the Bayes classifier, and $R_B \triangleq R(h_B)$ is called the Bayes risk.
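As a sanity check on the lemma, here is a minimal sketch computing $h_B$ and the Bayes risk for a small discrete distribution (the joint probabilities are made up for illustration):

```python
# Toy joint distribution P(X = x, Y = y) with X in {0, 1, 2}, Y in {0, 1}.
p_xy = {
    (0, 0): 0.30, (0, 1): 0.05,
    (1, 0): 0.10, (1, 1): 0.15,
    (2, 0): 0.10, (2, 1): 0.30,
}

# Marginal P(X = x) and a posteriori probabilities eta_k(x) = P(Y = k | X = x).
p_x = {x: sum(p_xy[(x, y)] for y in (0, 1)) for x in (0, 1, 2)}
eta = {x: {y: p_xy[(x, y)] / p_x[x] for y in (0, 1)} for x in (0, 1, 2)}

# Bayes classifier: h_B(x) = argmax_k eta_k(x).
h_B = {x: max((0, 1), key=lambda y: eta[x][y]) for x in (0, 1, 2)}

# Bayes risk: R_B = E_X[1 - max_k eta_k(X)].
R_B = sum(p_x[x] * (1 - max(eta[x].values())) for x in (0, 1, 2))
print(h_B)  # {0: 0, 1: 1, 2: 1}
print(R_B)  # 0.25
```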

SLIDE 4

SLIDE 5

OTHER FORMS OF THE BAYES CLASSIFIER

The Bayes classifier $h_B(x) \triangleq \operatorname{argmax}_{k \in [0; K-1]} \eta_k(x)$ can equivalently be written $h_B(x) \triangleq \operatorname{argmax}_{k \in [0; K-1]} \pi_k p_{X|Y}(x|k)$.

For $K = 2$ (binary classification): log-likelihood ratio test $\log \frac{p_{X|Y}(x|1)}{p_{X|Y}(x|0)} \gtrless \log \frac{\pi_0}{\pi_1}$.

If all classes are equally likely ($\pi_0 = \pi_1 = \cdots = \pi_{K-1}$): $h_B(x) \triangleq \operatorname{argmax}_{k \in [0; K-1]} p_{X|Y}(x|k)$.

Example (Bayes classifier). Assume $X|Y=0 \sim \mathcal{N}(0, 1)$ and $X|Y=1 \sim \mathcal{N}(1, 1)$. The Bayes risk for $\pi_0 = \pi_1$ is $R(h_B) = \Phi(-\frac{1}{2})$, with $\Phi \triangleq$ Normal CDF.

In practice we do not know $P_X$ and $P_{Y|X}$. Plugin methods: use the data to learn the distributions and plug the result into the Bayes classifier.
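The Gaussian example can be corroborated numerically with a quick Monte Carlo sketch (the sample size and seed are arbitrary choices):

```python
import math
import random

def bayes_classifier(x):
    # With X|Y=0 ~ N(0, 1), X|Y=1 ~ N(1, 1) and pi_0 = pi_1, the
    # log-likelihood ratio test reduces to thresholding at x = 1/2.
    return 1 if x > 0.5 else 0

def normal_cdf(t):
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

rng = random.Random(0)
n = 200_000
errors = 0
for _ in range(n):
    y = rng.randint(0, 1)          # equally likely classes
    x = rng.gauss(float(y), 1.0)   # X|Y=y ~ N(y, 1)
    errors += bayes_classifier(x) != y

empirical_risk = errors / n
bayes_risk = normal_cdf(-0.5)      # Phi(-1/2), about 0.3085
print(empirical_risk, bayes_risk)
```

With 200,000 samples the empirical risk should match $\Phi(-\frac{1}{2})$ to about two decimal places.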

SLIDE 6

OTHER LOSS FUNCTIONS

We have focused on the risk $\mathbb{P}(h(X) \neq Y)$, obtained for the binary loss function $1\{h(X) \neq Y\}$.

There are many situations in which this is not appropriate:
  • Cost-sensitive classification: false alarm and missed detection may not be equivalent, e.g., use the loss $c_0 1\{h(X) \neq 0 \text{ and } Y = 0\} + c_1 1\{h(X) \neq 1 \text{ and } Y = 1\}$
  • Unbalanced data set: the probability of the largest class will dominate

More to explore in the next homework!
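A tiny sketch of the cost-sensitive loss above (the costs $c_0$, $c_1$ and the prediction/label pairs are hypothetical):

```python
def cost_sensitive_loss(h_x, y, c0=1.0, c1=5.0):
    """Loss c0*1{h(X)!=0 and Y=0} + c1*1{h(X)!=1 and Y=1}.

    Here c1 > c0 makes a missed detection cost more than a false alarm
    (values chosen arbitrarily for illustration).
    """
    return c0 * (h_x != 0 and y == 0) + c1 * (h_x != 1 and y == 1)

# Hypothetical (prediction, label) pairs: one false alarm, one missed detection.
pairs = [(0, 0), (1, 0), (1, 1), (0, 1)]
risk = sum(cost_sensitive_loss(h, y) for h, y in pairs) / len(pairs)
print(risk)  # (1.0 + 5.0) / 4 = 1.5
```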

SLIDE 7

NEAREST NEIGHBOR CLASSIFIER

Back to our training dataset $\mathcal{D} \triangleq \{(x_1, y_1), \cdots, (x_N, y_N)\}$.

The nearest-neighbor (NN) classifier is $h_{NN}(x) \triangleq y_{NN(x)}$ where $NN(x) \triangleq \operatorname{argmin}_i \|x_i - x\|$.

Risk of the NN classifier conditioned on $x$ and $x_{NN(x)}$: $R_{NN}(x, x_{NN(x)}) = \sum_k \eta_k(x_{NN(x)})(1 - \eta_k(x)) = \sum_k \eta_k(x)(1 - \eta_k(x_{NN(x)}))$.

How well does the average risk $R(h_{NN})$ compare to the Bayes risk for large $N$?

Lemma. Let $x$, $\{x_i\}_{i=1}^N$ be i.i.d. $\sim P_x$ in a separable metric space $\mathcal{X}$. Let $x_{NN(x)}$ be the nearest neighbor of $x$. Then $x_{NN(x)} \to x$ with probability one as $N \to \infty$.

Theorem (Binary NN classifier). Let $\mathcal{X}$ be a separable metric space. Let $p(x|y=0)$ and $p(x|y=1)$ be such that, with probability one, $x$ is either a continuity point of $p(x|y=0)$ and $p(x|y=1)$ or a point of non-zero probability measure. Then, as $N \to \infty$, $R(h_B) \leq R(h_{NN}) \leq 2 R(h_B)(1 - R(h_B))$.
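A minimal sketch of the NN classifier, computing the argmin by brute force over a small made-up 2-D training set:

```python
import math

def nn_classify(x, data):
    """1-NN classifier: return the label of the training point
    minimizing the Euclidean distance ||x_i - x||."""
    _, y_nn = min(data, key=lambda pair: math.dist(pair[0], x))
    return y_nn

# Hypothetical training set of (point, label) pairs.
data = [((0.0, 0.0), 0), ((0.2, 0.1), 0),
        ((1.0, 1.0), 1), ((0.9, 1.2), 1)]

print(nn_classify((0.1, 0.0), data))  # 0
print(nn_classify((1.1, 0.9), data))  # 1
```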

SLIDE 8

SLIDE 9

SLIDE 10

K NEAREST NEIGHBOR CLASSIFIER

We can drive the risk of the NN classifier toward the Bayes risk by increasing the size of the neighborhood: assign a label to $x$ by taking a majority vote among the $K$ nearest neighbors. For instance, $\lim_{N \to \infty} \mathbb{E}[R(h_{K\text{-NN}})] \leq \left(1 + \sqrt{\frac{2}{K}}\right) R(h_B)$.

Definition. Let $\hat{h}_N$ be a classifier learned from a set of $N$ data points. The classifier is consistent if $\mathbb{E}[R(\hat{h}_N)] \to R_B$ as $N \to \infty$.

Theorem (Stone's Theorem). If $N \to \infty$, $K \to \infty$, and $K/N \to 0$, then $h_{K\text{-NN}}$ is consistent.

Choosing $K$ is a problem of model selection. Do not choose $K$ by minimizing the empirical risk on the training set: $\hat{R}_N(h_{1\text{-NN}}) = \frac{1}{N} \sum_{i=1}^N 1\{h_{1\text{-NN}}(x_i) \neq y_i\} = 0$. Need to rely on estimates from model selection techniques (more later!)
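A sketch of the majority-vote $K$-NN rule, which also shows why training error is useless for picking $K$: with $K = 1$ every training point is its own nearest neighbor, so the empirical risk on the training set is exactly zero (the training set below is made up):

```python
import math
from collections import Counter

def knn_classify(x, data, k):
    """K-NN: majority vote among the k training points closest to x."""
    neighbors = sorted(data, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(y for _, y in neighbors)
    return votes.most_common(1)[0][0]

data = [((0.0, 0.0), 0), ((0.2, 0.1), 0), ((0.1, 0.3), 0),
        ((1.0, 1.0), 1), ((0.9, 1.2), 1), ((1.1, 0.8), 1)]

# 1-NN has zero empirical risk on training data: each point is its own NN.
train_risk = sum(knn_classify(x, data, k=1) != y for x, y in data) / len(data)
print(train_risk)                             # 0.0
print(knn_classify((0.15, 0.15), data, k=3))  # 0
```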

SLIDE 11

K NEAREST NEIGHBOR CLASSIFIER

Given enough data, a $K$-NN classifier will do just as well as pretty much any other method. But the number of samples $N$ needed can be huge (especially in high dimension), and the choice of $K$ matters a lot, so model selection is important.

Finding the $K$ nearest neighbors out of millions of data points is still computationally hard: k-d trees help, but remain expensive in high dimension when $N \approx d$.

We will discuss other classifiers that make more assumptions about the underlying data.

SLIDE 12
