SLIDE 1

Wentworth Institute of Technology
COMP4050 – Machine Learning | Fall 2015 | Derbinsky

k-Nearest Neighbors

Lecture 2

September 16, 2015

SLIDE 2

Outline

  • 1. Learning via distance measurements
  • 2. Model parameters

– Bias vs. Variance

  • 3. Extensions

– Regression
– Improving Efficiency

SLIDE 3

A Motivating Example

| Movie Title | # of Kicks | # of Kisses | Type of Movie |
|---|---|---|---|
| California Man | 3 | 104 | Romance |
| He's Not Really into Dudes | 2 | 100 | Romance |
| Beautiful Woman | 1 | 81 | Romance |
| Kevin Longblade | 101 | 10 | Action |
| Robo Slayer 3000 | 99 | 5 | Action |
| Amped II | 98 | 2 | Action |
| ? | 18 | 90 | ? |


[Figure: scatter plot of # of Kicks vs. # of Kisses, showing the Romance, Action, and Unknown points]

SLIDE 4

A Motivating Example

| Movie Title | # of Kicks | # of Kisses | Type of Movie | L2 Distance |
|---|---|---|---|---|
| California Man | 3 | 104 | Romance | 20.52 |
| He's Not Really into Dudes | 2 | 100 | Romance | 18.87 |
| Beautiful Woman | 1 | 81 | Romance | 19.24 |
| Kevin Longblade | 101 | 10 | Action | 115.28 |
| Robo Slayer 3000 | 99 | 5 | Action | 117.41 |
| Amped II | 98 | 2 | Action | 118.93 |
| ? | 18 | 90 | ? | 0 |


[Figure: the same scatter plot of # of Kicks vs. # of Kisses, showing the Romance, Action, and Unknown points]
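For reference, the distances in the table are easy to reproduce. Below is a minimal numpy sketch (mine, not from the slides; the variable names are illustrative):

```python
# Reproduce the L2 distances in the table above.
import numpy as np

# (kicks, kisses) per movie, in table order
movies = np.array([[3, 104], [2, 100], [1, 81], [101, 10], [99, 5], [98, 2]])
labels = ["Romance", "Romance", "Romance", "Action", "Action", "Action"]
unknown = np.array([18, 90])

dists = np.linalg.norm(movies - unknown, axis=1)  # Euclidean (L2) distance
for d, label in sorted(zip(dists, labels)):
    print(f"{d:7.2f}  {label}")
# The three nearest neighbors are all Romance, so 3-NN labels the unknown movie Romance.
```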

SLIDE 5

kNN

  • Store examples
  • Find the k nearest neighbors to the target

– Via distance function

  • Vote on result


[Diagram: Training vs. Testing phases of kNN]
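Putting the three steps together, here is a minimal kNN classifier sketch (my own illustration, not the course's reference code):

```python
# Minimal kNN classifier: store examples, find the k nearest via a distance
# function, and vote on the result.
from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, target, k=3, dist=euclidean):
    """train: list of (features, label) pairs; target: feature tuple."""
    # "Training" is just storage; all the work happens at query time.
    neighbors = sorted(train, key=lambda xy: dist(xy[0], target))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Usage with the movie example from the previous slides:
train = [((3, 104), "Romance"), ((2, 100), "Romance"), ((1, 81), "Romance"),
         ((101, 10), "Action"), ((99, 5), "Action"), ((98, 2), "Action")]
print(knn_predict(train, (18, 90), k=3))  # -> Romance
```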

SLIDE 6

2D Multiclass Classification

[Figure: Ground Truth vs. 1-NN via Linear Scan]

SLIDE 7

Model Parameters

  • k – number of neighbors to find
  • D(x1,x2) – distance function
  • V({x, y}) – voting function

Related

  • Feature representation

– Scaling
– Curse of dimensionality

  • Efficiency

– Storage/search

SLIDE 8

Choosing k

  • k = 1: nearest neighbor
  • Pro tip: for binary classification, choose an odd k to avoid ties
  • Tradeoff: under/over-fitting

– Small k: sensitive to noise
– Large k: includes distant points

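In practice, k is often chosen by validation. A hedged sketch using scikit-learn (assuming it is available; `X` and `y` stand for your feature matrix and label vector), with odd candidates per the tie-avoidance tip above:

```python
# Choose k by 5-fold cross-validation (a sketch; assumes scikit-learn).
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def best_k(X, y, candidates=(1, 3, 5, 7, 9, 11)):
    scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                 X, y, cv=5).mean()
              for k in candidates}
    return max(scores, key=scores.get)  # k with highest mean validation accuracy
```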

SLIDE 9

Bias vs. Variance Revisited


We model $y = f(x)$ and estimate it as $\hat{f}(x)$. The expected prediction error decomposes as

$$\mathrm{Err}(x) = E\big[(Y - \hat{f}(x))^2\big] = \mathrm{Bias}^2 + \mathrm{Variance} + \text{Irreducible Error}$$

In general:

$$\mathrm{Bias} = E[\hat{f}(x)] - f(x), \qquad \mathrm{Variance} = E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big], \qquad \text{Irreducible Error} = \sigma^2$$

For kNN, with $N_i(x)$ the $i$-th nearest neighbor of $x$:

$$\mathrm{Bias} = f(x) - \frac{1}{k} \sum_{i=1}^{k} f\big(N_i(x)\big), \qquad \mathrm{Variance} = \frac{\sigma^2}{k}$$

The bias term monotonically increases with k, while the variance monotonically decreases with k.

Example: http://scott.fortmann-roe.com/docs/BiasVariance.html
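These two trends can be checked empirically. A minimal numpy simulation sketch (my own, assuming a known true function $f(x) = \sin(x)$ with Gaussian noise): it repeatedly redraws a training set, forms the kNN regression estimate at one query point, and measures the bias and variance of that estimate across draws.

```python
# Estimate bias and variance of kNN regression at a single point, for several k.
import numpy as np

rng = np.random.default_rng(0)
f = np.sin                        # "true" function
sigma = 0.3                       # noise standard deviation
x_test = 1.0                      # query point
n_train, n_trials = 200, 500

for k in (1, 5, 25, 100):
    preds = np.empty(n_trials)
    for t in range(n_trials):
        X = rng.uniform(0, 2 * np.pi, n_train)
        y = f(X) + rng.normal(0, sigma, n_train)
        nearest = np.argsort(np.abs(X - x_test))[:k]  # k nearest in 1-D
        preds[t] = y[nearest].mean()                  # kNN regression estimate
    bias, var = preds.mean() - f(x_test), preds.var()
    print(f"k={k:3d}  bias={bias:+.3f}  variance={var:.4f}  sigma^2/k={sigma**2 / k:.4f}")
# Bias grows in magnitude with k; variance shrinks roughly like sigma^2 / k.
```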

SLIDE 10

Common Distance Functions

  • Manhattan (L1)
  • Euclidean (L2)
  • Cosine similarity

– Useful in high dimensions:

  • Edit distance
  • Graph traversal

– Decay

  • Modern: learn a useful distance measure!

– Individual instance weighting


$$\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$
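For concreteness, plain-Python sketches of the first three distance functions (illustrative implementations, not the course's code):

```python
import math

def manhattan(a, b):              # L1
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):              # L2
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):        # 1 - cos(theta); useful in high dimensions
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)  # assumes neither vector is all-zero
```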

SLIDE 11

Issues with Distance Functions

  • Categorical data

– Indicator function is safe (i.e. Hamming Distance)

  • Pay attention to nominal features!
  • Curses!

– Euclidean becomes less discriminating in high dimensions

  • Normalization

– Consider how the distance function treats features on very different scales:

  • Annual salary
  • Height in meters

– Common to scale features to [0, 1]


$$X_{\text{scaled}} = \frac{X - \min}{\max - \min}$$
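A numpy sketch of this scaling (an assumed helper of my own, not from the slides; `X` is an (n_examples, n_features) array):

```python
import numpy as np

def min_max_scale(X):
    # Scale each feature (column) to [0, 1]; assumes max > min per feature.
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

# In practice, compute mn/mx on the training data only and reuse them to
# scale query points, so train and test live in the same coordinate system.
```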

SLIDE 12

V = Majority Vote

$$y' = \operatorname*{argmax}_{v} \sum_{(x_i, y_i) \in D_z} I(v = y_i)$$

where $D_z$ is the set of the k nearest neighbors of the query point and $I(\cdot)$ is the indicator function.
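A direct translation of this vote (a sketch; `neighbors` holds the k retrieved $(x_i, y_i)$ pairs):

```python
from collections import Counter

def majority_vote(neighbors):
    # Return the label v maximizing the count of I(v = y_i).
    return Counter(y for _, y in neighbors).most_common(1)[0][0]
```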

SLIDE 13

V = Distance-Weighted Vote

$$y' = \operatorname*{argmax}_{v} \sum_{(x_i, y_i) \in D_z} w_i \, I(v = y_i), \qquad w_i = \frac{1}{d(x', x_i)^2}$$


Useful if the nearest neighbors vary widely in their distance and the closer neighbors more reliably indicate the class of the object.
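A sketch of the weighted vote (here `neighbors` is a list of (distance, label) pairs for the query point; the epsilon guard is my addition for the d = 0 case):

```python
from collections import defaultdict

def weighted_vote(neighbors, eps=1e-12):
    scores = defaultdict(float)
    for d, label in neighbors:
        scores[label] += 1.0 / (d * d + eps)  # w_i = 1 / d(x', x_i)^2
    return max(scores, key=scores.get)
```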

SLIDE 14

Efficiency

Assume N training examples, d features…

  • What is the computational cost of training on a new instance?
    O(d) ∼ O(1) — the example is simply stored
  • How much space is required to store the model?
    O(N · d)
  • What is the computational cost of predicting the result of a new test instance?
    O(N · d) — a linear scan over all stored examples

SLIDE 15

Some Theory (Cover & Hart, 1967)

  • The Bayes error rate is the lowest possible error rate achievable on a given problem

– Non-zero if the class distributions overlap
– More in later lectures

  • As the amount of data approaches infinity, 1-NN is guaranteed to yield an error rate no worse than twice the Bayes error rate
  • kNN is guaranteed to approach the Bayes error rate for some value of k (where k increases as a function of the number of data points)

SLIDE 16

Applying kNN to Regression

  • Rather than voting on a label, the voting function produces a value

– Average
– Weighted average (w.r.t. distance)

SLIDE 17

Example: House Price Index

| Age | Loan | House Price Index |
|---|---|---|
| 25 | $40,000 | 135 |
| 35 | $60,000 | 256 |
| 45 | $80,000 | 231 |
| 20 | $20,000 | 267 |
| 35 | $120,000 | 139 |
| 52 | $18,000 | 150 |
| 23 | $95,000 | 127 |
| 40 | $62,000 | 216 |
| 60 | $100,000 | 139 |
| 48 | $220,000 | 250 |
| 33 | $150,000 | 264 |
| 48 | $142,000 | ? |


Example from: http://www.saedsayad.com/k_nearest_neighbors_reg.htm
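A numpy sketch of this example (the linked page computes distances on the raw features; here the features are also min-max scaled, per the normalization slide, so Loan does not dominate Age — note that scaling changes which neighbors win):

```python
import numpy as np

data = np.array([[25, 40_000], [35, 60_000], [45, 80_000], [20, 20_000],
                 [35, 120_000], [52, 18_000], [23, 95_000], [40, 62_000],
                 [60, 100_000], [48, 220_000], [33, 150_000]], dtype=float)
hpi = np.array([135, 256, 231, 267, 139, 150, 127, 216, 139, 250, 264], dtype=float)
query = np.array([48, 142_000], dtype=float)

# Min-max scale the features using the training data's ranges.
mn, mx = data.min(axis=0), data.max(axis=0)
scaled, q = (data - mn) / (mx - mn), (query - mn) / (mx - mn)

k = 3
nearest = np.argsort(np.linalg.norm(scaled - q, axis=1))[:k]
print(hpi[nearest].mean())  # predicted HPI = average of the k nearest values
```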

SLIDE 18

Improving Efficiency

  • Filtered Storage

– Condensed NN

  • Intelligent Search

– Space partitioning (k-d tree, R-tree)

  • Approximate NN

– Locality-Sensitive Hashing
– Boundary Forests

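As one concrete example of space partitioning, SciPy ships a k-d tree; a brief sketch (assumes SciPy is installed; LSH and Boundary Forests would come from other libraries):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((10_000, 3))            # N = 10,000 training points, d = 3
tree = cKDTree(points)                      # built once up front
dist, idx = tree.query(rng.random(3), k=5)  # 5 nearest neighbors; typically
                                            # much faster than a linear scan
```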

SLIDE 19

A 2D Classification Example

SLIDE 20

Interleaved Train/Query (1)

[Figure: Ground Truth vs. Boundary Tree]

SLIDE 21

Interleaved Train/Query (2)

[Figure: Ground Truth vs. Boundary Tree]

SLIDE 22

Interleaved Train/Query (3)

[Figure: Ground Truth vs. Boundary Tree]

SLIDE 23

Interleaved Train/Query (4)

[Figure: Ground Truth vs. Boundary Tree]

SLIDE 24

Interleaved Train/Query (5)

[Figure: Ground Truth vs. Boundary Tree]

SLIDE 25

Interleaved Train/Query (6)

[Figure: Ground Truth vs. Boundary Tree]

SLIDE 26

Interleaved Train/Query (7)

[Figure: Ground Truth vs. Boundary Tree]

SLIDE 27

Interleaved Train/Query (8)

[Figure: Ground Truth vs. Boundary Tree]

SLIDE 28

Performance & Scaling

[Figure: Boundary Tree vs. 1-NN via Linear Scan]

SLIDE 29

Improving Accuracy via Forests

Linear increase in memory + time

[Figure: 1 Tree vs. 10 Trees]

SLIDE 30

Algorithm Sketch

Required Parameters

  • nt = number of trees
  • k = maximum outdegree

– Typically leads to eventual logarithmic scaling

  • d( x, y ) = distance metric

– Need not be a true metric
– No assumptions made about its properties

SLIDE 31

Algorithm Sketch

Boundary Tree

Query( y )

  • v = root
  • loop

– cand = children( v )
– if |children( v )| < k: cand = cand ∪ { v }
– vmin = argmin over w ∈ cand of d( w, y )
– if vmin = v: break
– v = vmin

Result

  • NN: vmin
  • Classification: class( vmin )
  • Regression: value( vmin )

Train( y )

  • n = Query( y )
  • if ShouldAdd( n, y )

– Connect( n, y )

ShouldAdd

  • NN: True
  • Classification: different class
  • Regression: differs by more than ε

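A compact Python reading of the pseudocode above, specialized to classification (my own sketch of the Boundary Tree idea, not the official implementation):

```python
import math

class BoundaryTree:
    """Boundary Tree sketch: k is the maximum outdegree (None = unbounded),
    d any distance function (defaults to Euclidean)."""
    def __init__(self, x, label, k=None, d=None):
        self.x, self.label, self.children = x, label, []
        self.k = k
        self.d = d or (lambda a, b: math.dist(a, b))

    def _query_node(self, y):
        v = self
        while True:
            cand = list(v.children)
            if v.k is None or len(v.children) < v.k:
                cand.append(v)            # v itself is a candidate if it has room
            vmin = min(cand, key=lambda w: self.d(w.x, y))
            if vmin is v:
                return v                  # locally closest node reached
            v = vmin

    def query(self, y):
        return self._query_node(y).label

    def train(self, y, label):
        n = self._query_node(y)
        if n.label != label:              # ShouldAdd for classification
            n.children.append(BoundaryTree(y, label, self.k, self.d))  # Connect
```

For example, `BoundaryTree((3, 104), "Romance", k=5)` seeds a tree that is then grown incrementally with `train` calls; only examples that the current tree misclassifies get stored.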

SLIDE 32

Algorithm Sketch

Boundary Forest

Query( y )

  • for ti : trees

– result[ i ] = ti.Query( y )

Result

  • NN: smallest d
  • Classification: 1/d-weighted vote
  • Regression: 1/d-weighted average

Train( y )

  • for ti : trees

– ti.Train( y )

Initialization

  • Root( ti ) = example[ i ]
  • r = the remaining ( nt − 1 ) seed examples

– ti.Train( Rand( r, i ) )

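And a matching forest wrapper, reusing the `BoundaryTree` class from the previous slide's sketch (note: the slide trains each tree on the remaining nt − 1 seed examples in random order; this sketch uses the given order for brevity):

```python
from collections import defaultdict

class BoundaryForest:
    def __init__(self, examples, labels, k=None, d=None):
        # Root tree i at seed example i, then train it on the other seeds.
        self.trees = [BoundaryTree(x, lab, k, d) for x, lab in zip(examples, labels)]
        for i, t in enumerate(self.trees):
            for j, (x, lab) in enumerate(zip(examples, labels)):
                if j != i:
                    t.train(x, lab)

    def train(self, y, label):
        for t in self.trees:
            t.train(y, label)

    def query(self, y, eps=1e-12):
        votes = defaultdict(float)
        for t in self.trees:
            n = t._query_node(y)
            votes[n.label] += 1.0 / (t.d(n.x, y) + eps)  # 1/d-weighted vote
        return max(votes, key=votes.get)
```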

SLIDE 33

Checkup

  • ML task(s)?

– Classification: binary/multi-class?

  • Feature type(s)?
  • Implicit/explicit?
  • Parametric?
  • Online?

SLIDE 34

Summary: kNN

  • Practicality

– Easy, generally applicable
– Requires no knowledge of the underlying process

  • Efficiency

– Training: lazy
– Testing: tractable only for small datasets

  • Though there are methods to help scale
  • Performance

– Depends upon the data and parameters (e.g., D, V, k, …)
– Bounded above by twice the Bayes error under certain reasonable assumptions; the error of the general kNN method asymptotically approaches the Bayes error and can be used to approximate it
