Lecture 2: Nearest Neighbour Classifier
Aykut Erdem, September 2017, Hacettepe University

SLIDE 1

Lecture 2:
Nearest Neighbour Classifier

Aykut Erdem

September 2017, Hacettepe University

SLIDE 2

Your 1st Classifier: 
 Nearest Neighbor Classifier

SLIDE 3

Concept Learning

  • Definition: Acquire an operational definition of a general category of objects given positive and negative training examples.
  • Also called binary classification, binary supervised learning.

slide by Thorsten Joachims

SLIDE 4

Concept Learning Example

  • Instance Space X: Set of all possible objects describable by attributes (often called features).
  • Concept c: Subset of objects from X (c is unknown).
  • Target Function f: Characteristic function indicating membership in c based on attributes (i.e. label) (f is unknown).
  • Training Data S: Set of instances labeled with the target function.

  #  correct   color  original  presentation  binder  A+
  1  complete  yes    yes       clear         no      yes
  2  complete  no     yes       clear         no      yes
  3  partial   yes    no        unclear       no      no
  4  complete  yes    yes       clear         yes     yes

Attribute values: correct ∈ {complete, partial, guessing}; color, original, binder ∈ {yes, no}; presentation ∈ {clear, unclear, cryptic}.

slide by Thorsten Joachims

SLIDE 5

Concept Learning as Learning a Binary Function

  • Task


– Learn (to imitate) a function f : X → {+1,-1}

  • Training Examples


 – Learning algorithm is given the correct value of the function for particular inputs → training examples
 – An example is a pair (x, y), where x is the input and y = f(x) is the output of the target function applied to x.

  • Goal

 – Find a function h: X → {+1,-1} that approximates f: X → {+1,-1} as well as possible.

slide by Thorsten Joachims

SLIDE 6

Supervised Learning

  • Task

 – Learn (to imitate) a function f : X → Y

  • Training Examples

 – Learning algorithm is given the correct value of the function for particular inputs → training examples
 – An example is a pair (x, f(x)), where x is the input and y = f(x) is the output of the target function applied to x.

  • Goal

 – Find a function h: X → Y that approximates f: X → Y as well as possible.

slide by Thorsten Joachims

SLIDE 7

Supervised / Inductive Learning

  • Given: examples of a function (x, f (x))
  • Predict: function f (x) for new examples x
 – Discrete f (x): Classification
 – Continuous f (x): Regression
 – f (x) = Probability(x): Probability estimation

slide by Thorsten Joachims

SLIDE 8

SLIDE 9

Image Classification: a core task in Computer Vision

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 10

The problem: semantic gap

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 11

Challenges: Viewpoint Variation

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 12

Challenges: Illumination

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 13

Challenges: Deformation

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 14

Challenges: Occlusion

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 15

Challenges: Background clutter

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 16

Challenges: Intraclass variation

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 17

An image classifier

Unlike e.g. sorting a list of numbers, there is no obvious way to hard-code an algorithm for recognizing a cat or other classes.

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 18

Attempts have been made

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 19

Data-driven approach:
1. Collect a dataset of images and labels
2. Use Machine Learning to train an image classifier
3. Evaluate the classifier on a withheld set of test images

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
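
As a concrete reading of these three steps, here is a short Python/NumPy sketch. The train/predict interface is an assumption borrowed from the Nearest Neighbor classifier defined a few slides below, and the accuracy measure is the usual fraction of correctly labeled test images.

import numpy as np

# Sketch of the data-driven pipeline. `classifier` is any object with
# train(X, y) and predict(X) methods, e.g. the NearestNeighbor class
# reconstructed under SLIDE 26 below.
def evaluate(classifier, X_train, y_train, X_test, y_test):
    classifier.train(X_train, y_train)      # step 2: train on collected data
    y_pred = classifier.predict(X_test)     # step 3: predict on withheld images
    return np.mean(y_pred == y_test)        # accuracy on the test set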

SLIDE 20

First classifier: Nearest Neighbor Classifier

Remember all training images and their labels.
Predict the label of the most similar training image.

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 21

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 22

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 23

How do we compare the images? What is the distance metric?

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
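
The following slides answer this with the L1 (Manhattan) distance, the sum of absolute pixel-wise differences: $d_1(I_1, I_2) = \sum_p |I_1^p - I_2^p|$. A toy check in NumPy (the 2x2 "images" are made-up numbers for illustration):

import numpy as np

# L1 (Manhattan) distance between two images: sum of absolute
# pixel-wise differences.
I1 = np.array([[56, 32], [10, 18]])
I2 = np.array([[10, 20], [24, 17]])

d1 = np.sum(np.abs(I1 - I2))   # |56-10| + |32-20| + |10-24| + |18-17| = 73
print(d1)                      # -> 73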

SLIDE 24

Nearest Neighbor classifier

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 25

Nearest Neighbor classifier

remember the training data

SLIDE 26

Nearest Neighbor classifier

for every test image:
  • find the nearest training image using the L1 distance
  • predict the label of that nearest training image

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
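
The classifier code shown on slides 24-26 did not survive this transcript; the following NumPy sketch reconstructs the same idea in the spirit of the CS231n original: train() merely memorizes the data, predict() scans for the L1-nearest training image.

import numpy as np

class NearestNeighbor:
    """Nearest Neighbor classifier with L1 distance (a sketch)."""

    def train(self, X, y):
        # X is N x D, one flattened image per row; y holds the N labels.
        # "Training" just memorizes the data.
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        # X is M x D; return a length-M array of predicted labels.
        num_test = X.shape[0]
        y_pred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distance from test image i to every training image
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            nearest = np.argmin(distances)  # index of the closest training image
            y_pred[i] = self.ytr[nearest]   # copy over its label
        return y_pred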

SLIDE 27

Nearest Neighbor classifier

Q: how does the classification speed depend on the size of the training data?

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 28

Nearest Neighbor classifier

Q: how does the classification speed depend on the size of the training data?
A: linearly :( Each test image must be compared against all n training images, so training is instant but prediction time grows linearly with n.

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 29

Aside: Approximate Nearest Neighbor
find approximate nearest neighbors quickly

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
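
A brute-force scan costs O(n) per query; approximate nearest neighbor libraries trade a little accuracy for large speedups. As one hedged illustration, here is what an approximate 1-NN query could look like with the Annoy library. The choice of Annoy, and all the numbers, are assumptions on my part; the lecture does not name a library, and alternatives such as FLANN serve the same purpose.

import numpy as np
from annoy import AnnoyIndex   # assumption: the Annoy ANN library is installed

D = 3072                            # e.g. a flattened 32x32x3 image
index = AnnoyIndex(D, "manhattan")  # L1 metric, to match the earlier slides

Xtr = np.random.rand(1000, D)       # stand-in for real training images
for i, x in enumerate(Xtr):
    index.add_item(i, x.tolist())   # register each training vector
index.build(10)                     # 10 trees; more trees = better accuracy

query = np.random.rand(D)
nearest = index.get_nns_by_vector(query.tolist(), 1)  # approximate 1-NN index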

SLIDE 30

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 31

k-Nearest Neighbor

find the k nearest images, have them vote on the label

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 32

K-Nearest Neighbor (kNN)

  • Given: Training data $\{(\vec{x}_1, y_1), \ldots, (\vec{x}_n, y_n)\}$
 – Attribute vectors: $\vec{x}_i \in X$
 – Labels: $y_i \in Y$
  • Parameter:
 – Similarity function: $\mathrm{sim} : X \times X \to \mathbb{R}$
 – Number of nearest neighbors to consider: $k$
  • Prediction rule
 – New example $\vec{x}'$
 – K-nearest neighbors: the $k$ training examples with the largest $\mathrm{sim}(\vec{x}_i, \vec{x}')$
 – Predict the majority label among those neighbors:
   $\hat{y} = \arg\max_{y \in Y} \sum_{i \in \mathrm{knn}(\vec{x}')} [\![\, y_i = y \,]\!]$

slide by Thorsten Joachims
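
A minimal sketch of this prediction rule, assuming the similarity is the negative L1 distance (any similarity function from the slide's definition would do):

import numpy as np
from collections import Counter

def knn_predict(Xtr, ytr, x_new, k=3):
    """Majority vote over the k most similar training examples."""
    sims = -np.sum(np.abs(Xtr - x_new), axis=1)  # similarity = negative L1 distance
    top_k = np.argsort(sims)[-k:]                # indices of the k largest similarities
    votes = Counter(ytr[top_k].tolist())         # count each neighbor's label
    return votes.most_common(1)[0][0]            # the label with the most votes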

SLIDE 33

1-Nearest Neighbor

slide by Thorsten Joachims

SLIDE 34

4-Nearest Neighbors

slide by Thorsten Joachims

SLIDE 35

4-Nearest Neighbors Sign

slide by Thorsten Joachims

SLIDE 36

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 37

We will talk about this later!

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 38

If we get more data

  • 1-Nearest Neighbor
 – Converges to a perfect solution if there is clear separation
 – Converges to twice the minimal error rate, 2p(1-p), for noisy problems
  • k-Nearest Neighbor
 – Converges to a perfect solution if there is clear separation (but needs more data)
 – Converges to the minimal error min(p, 1-p) for noisy problems as k increases
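
A one-line sanity check of the 2p(1-p) figure, assuming each label is independently flipped with probability p: with unlimited data the nearest neighbor lies essentially at the query point, so 1-NN errs exactly when one of the two labels (the query's or the neighbor's) is flipped and the other is not:

\Pr[\text{1-NN error}] = p(1-p) + (1-p)p = 2p(1-p)

For example, at p = 0.1 the minimal (Bayes) error is min(p, 1-p) = 0.1, while 1-NN converges to 2(0.1)(0.9) = 0.18, at most twice the optimum.
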
SLIDE 39

Demo

SLIDE 40

Weighted K-Nearest Neighbor

  • Given: Training data $\{(\vec{x}_1, y_1), \ldots, (\vec{x}_n, y_n)\}$
 – Attribute vectors: $\vec{x}_i \in X$
 – Target attribute: $y_i \in Y$
  • Parameter:
 – Similarity function: $\mathrm{sim} : X \times X \to \mathbb{R}$
 – Number of nearest neighbors to consider: $k$
  • Prediction rule
 – New example $\vec{x}'$
 – K-nearest neighbors: the $k$ training examples with the largest $\mathrm{sim}(\vec{x}_i, \vec{x}')$
 – Predict the label with the largest similarity-weighted vote:
   $\hat{y} = \arg\max_{y \in Y} \sum_{i \in \mathrm{knn}(\vec{x}')} \mathrm{sim}(\vec{x}_i, \vec{x}')\, [\![\, y_i = y \,]\!]$
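
Relative to plain kNN only the voting step changes; a sketch using inverse L1 distance as the (assumed, strictly positive) similarity weight:

import numpy as np
from collections import defaultdict

def weighted_knn_predict(Xtr, ytr, x_new, k=3, eps=1e-8):
    """Similarity-weighted vote; weight = inverse L1 distance (an assumption)."""
    dists = np.sum(np.abs(Xtr - x_new), axis=1)
    top_k = np.argsort(dists)[:k]                 # the k closest training examples
    scores = defaultdict(float)
    for i in top_k:
        scores[ytr[i]] += 1.0 / (dists[i] + eps)  # closer neighbors vote harder
    return max(scores, key=scores.get)            # label with the largest weighted vote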

SLIDE 41

More Nearest Neighbors in Visual Data

SLIDE 42

Where in the World? [Hays & Efros, CVPR 2008]

A nearest neighbor recognition example

slide by James Hays

SLIDE 43

Where in the World? [Hays & Efros, CVPR 2008]

slide by James Hays

SLIDE 44

Where in the World? [Hays & Efros, CVPR 2008]

slide by James Hays

SLIDE 45

6+ million geotagged photos by 109,788 photographers
Annotated by Flickr users

slide by James Hays

SLIDE 46

6+ million geotagged photos by 109,788 photographers
Annotated by Flickr users

slide by James Hays

SLIDE 47

slide by James Hays

SLIDE 48

Scene Matches

slide by James Hays

SLIDE 49

slide by James Hays

SLIDE 50

Scene Matches

slide by James Hays

SLIDE 51

slide by James Hays

SLIDE 52

Scene Matches

slide by James Hays

SLIDE 53

slide by James Hays

SLIDE 54

The Importance of Data

slide by James Hays

SLIDE 55

Scene Completion [Hays & Efros, SIGGRAPH 2007]

slide by James Hays

SLIDE 56

… 200 total

Hays and Efros, SIGGRAPH 2007; slide by James Hays

SLIDE 57

Context Matching

Hays and Efros, SIGGRAPH 2007; slide by James Hays

SLIDE 58

Graph cut + Poisson blending

Hays and Efros, SIGGRAPH 2007; slide by James Hays

SLIDE 59

Hays and Efros, SIGGRAPH 2007; slide by James Hays

SLIDE 60

Hays and Efros, SIGGRAPH 2007; slide by James Hays

SLIDE 61

Hays and Efros, SIGGRAPH 2007; slide by James Hays

SLIDE 62

Hays and Efros, SIGGRAPH 2007; slide by James Hays

SLIDE 63

Hays and Efros, SIGGRAPH 2007; slide by James Hays

SLIDE 64

Hays and Efros, SIGGRAPH 2007; slide by James Hays

SLIDE 65

Weighted K-NN for Regression

slide by Thorsten Joachims

  • Given: Training data $\{(\vec{x}_1, y_1), \ldots, (\vec{x}_n, y_n)\}$
 – Attribute vectors: $\vec{x}_i \in X$
 – Target attribute: $y_i \in \mathbb{R}$
  • Parameter:
 – Similarity function: $\mathrm{sim} : X \times X \to \mathbb{R}$
 – Number of nearest neighbors to consider: $k$
  • Prediction rule
 – New example $\vec{x}'$
 – K-nearest neighbors: the $k$ training examples with the largest $\mathrm{sim}(\vec{x}_i, \vec{x}')$
 – Predict the similarity-weighted average of the neighbors' target values:
   $\hat{y} = \frac{\sum_{i \in \mathrm{knn}(\vec{x}')} \mathrm{sim}(\vec{x}_i, \vec{x}')\, y_i}{\sum_{i \in \mathrm{knn}(\vec{x}')} \mathrm{sim}(\vec{x}_i, \vec{x}')}$
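
The same neighbors and weights as before, but now averaging real-valued targets instead of voting; again a sketch with assumed inverse-distance weights:

import numpy as np

def weighted_knn_regress(Xtr, ytr, x_new, k=3, eps=1e-8):
    """Weight-normalized average of the k nearest targets."""
    dists = np.sum(np.abs(Xtr - x_new), axis=1)
    top_k = np.argsort(dists)[:k]            # the k closest training examples
    weights = 1.0 / (dists[top_k] + eps)     # closer neighbors count more
    return np.sum(weights * ytr[top_k]) / np.sum(weights)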

SLIDE 66

Collaborative Filtering

slide by Thorsten Joachims
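
Collaborative filtering applies weighted kNN regression across users: to estimate how a user would rate an item, find the users with the most similar rating vectors and average their ratings of that item. A toy sketch; the ratings matrix, the cosine similarity, and the zero-means-unrated convention are all illustrative assumptions:

import numpy as np

def predict_rating(R, user, item, k=2):
    """Predict R[user, item] from the k most similar other users (0 = unrated)."""
    # cosine similarity between this user's ratings and every other user's
    norms = np.linalg.norm(R, axis=1) * np.linalg.norm(R[user]) + 1e-8
    sims = R @ R[user] / norms
    sims[user] = -np.inf                      # exclude the user themselves
    candidates = [u for u in np.argsort(sims)[::-1] if R[u, item] > 0][:k]
    return np.mean([R[u, item] for u in candidates])

# Toy ratings matrix: rows = users, columns = items, 0 = unrated.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 4, 4]], dtype=float)
print(predict_rating(R, user=0, item=2))      # estimate from similar users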

SLIDE 67

Overview of Nearest Neighbors

  • Very simple method
  • Retains all the training data
  • Can be slow at test time: finding the nearest neighbor in high dimensions is slow
  • The choice of distance metric is very important
  • A good baseline

slide by Rob Fergus

SLIDE 68

Next Class:
Linear Regression and Least Squares