Aykut Erdem // Hacettepe University // Fall 2019
Lecture 2:
Machine Learning by Examples, Nearest Neighbor Classifier
BBM406
Fundamentals of Machine Learning
photo:@rewardyfahmi // Unsplash
When Do We Use Machine Learning?
ML is used when:
2
slide based on Ethem Alpaydin
A classic example of a task that requires machine learning: It is very hard to say what makes a 2
3
slide by Geoffrey Hinton
5
slide by Alex Smola
6
Amazon book recommendations. Don’t mix preferences on Netflix!
slide by Alex Smola
7
Should be careful
8
Avatar learns from your behavior
Black & White Lionsgate Studios
slide by Alex Smola
9
https://www.youtube.com/watch?v=lleRKHsJBJ0
slide by Alex Smola
10
https://www.youtube.com/watch?v=5iZlrBqDYPM
11
ham vs. spam
slide by Alex Smola
12
segment images, recognize handwriting
slide by Alex Smola
13
slide by Alex Smola
14
why these ads?
slide by Alex Smola
15
Image: https://medium.com/waymo/simulation-how-one-flashing-yellow-light-turns-into-thousands-of-hours-of-experience-a7a1cb475565
16
Given an audio waveform, robustly extract & recognize any spoken words
17
“I need to hide a body”: noun, verb, preposition, …
18
Yang et al., From Facial Parts Responses to Face Detection: A Deep Learning Approach, ICCV 2015
Scene Labeling via Deep Learning
19
[Farabet et al. ICML 2012, PAMI 2013]
slide by Eric Eaton
Topic Models of Text Documents
20
Topic Models of Text Documents
slide by Eric Sudderth
Genomics: group individuals by genetic similarity
21
slide by Daphne Koller
[figure: matrix of individuals × genes]
22
[diagram: data + prior knowledge → Learning → knowledge]
slide by Stuart Russell
23
[diagram: data + prior knowledge → Learning → knowledge, e.g. a rule IF x THEN DO y]
slide by Stuart Russell
24
slide by Mehryar Mohri
Objectives of Machine Learning
25
Design general learning algorithms that:
– deal with large-scale problems.
– make accurate predictions on unseen examples.
– handle a variety of different learning problems.
Theoretical questions:
– What can be learned? Under what conditions?
– What learning guarantees can be given?
– What is the algorithmic complexity?
slide by Mehryar Mohri
Definitions and Terminology
– Features: attributes associated to an example, represented as a vector (e.g., height and weight for gender prediction).
– Labels: values or categories assigned to an object (e.g., positive or negative in binary classification; a real value in regression).
– Training data: data used by the learning algorithm (often labeled data).
26
slide by Mehryar Mohri
Definitions and Terminology (cont’d.)
– Unsupervised learning: no labels are available to the learning algorithm (unlabeled data).
– Semi-supervised learning: intermediate scenarios with both labeled and unlabeled data.
– Reinforcement learning: the learner receives rewards for a sequence of actions.
27
slide by Mehryar Mohri
slide by Alex Smola
– Binary classification: given x, find y in {-1, 1}
– Multiclass classification: given x, find y in {1, …, k}
– Regression: given x, find y in ℝ (or ℝ^d)
– Sequence annotation: given a sequence x1 … xl, find y1 … yl
– Hierarchical classification: given x, find a point in the hierarchy of y (e.g., a tree)
– Sequence prediction: given xt and yt-1 … y1, find yt
29
Loss function: l(y, f(x)) measures the cost of predicting f(x) when the true label is y.
slide by Alex Smola
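As a concrete illustration (the function names are mine, not from the slides), two standard instances of l(y, f(x)):

```python
# Two standard loss functions l(y, f(x)); names are illustrative.

def zero_one_loss(y, y_hat):
    """Classification loss: 1 if the prediction is wrong, 0 otherwise."""
    return 0 if y == y_hat else 1

def squared_loss(y, y_hat):
    """Regression loss: squared difference between truth and prediction."""
    return (y - y_hat) ** 2

print(zero_one_loss(+1, -1))    # misclassification costs 1
print(squared_loss(2.0, 1.5))   # (2.0 - 1.5)^2 = 0.25
```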
30
slide by Alex Smola
Multiclass Classification + Annotation
31
slide by Alex Smola
32
linear vs. nonlinear
slide by Alex Smola
33
Given a sequence: gene finding, speech recognition, activity segmentation, named entities
slide by Alex Smola
34
webpages, genes
slide by Alex Smola
35
tomorrow’s stock price
slide by Alex Smola
slide by Alex Smola
– Clustering: find a set of prototypes representing the data
– Subspace/dimensionality reduction: find a subspace representing the data
– Latent sequence models: find a latent causal sequence for the observations
– Factor models: find a (small) set of factors for the observations
– Novelty detection: find the odd one out
37
slide by Alex Smola
...
38
slide by Alex Smola
39
Variance component model to account for sample structure in genome-wide association studies, Nature Genetics 2010
slide by Alex Smola
40
slide by Alex Smola
41
find them automatically
slide by Alex Smola
42
typical vs. atypical
slide by Alex Smola
algorithm and its tuning
43
Concept learning: infer a concept from positive and negative training examples (a form of supervised learning).
slide by Thorsten Joachims
– Each example is described by attributes (often called features).
– The unknown target function f determines membership in the concept c based on the attributes (i.e., the label).
46
Attributes:
– correct (complete, partial, guessing)
– color (yes, no)
– original (yes, no)
– presentation (clear, unclear, cryptic)
– binder (yes, no)

   correct    color  original  presentation  binder  A+
1  complete   yes    yes       clear         no      yes
2  complete   no     yes       clear         no      yes
3  partial    yes    no        unclear       no      no
4  complete   yes    yes       clear         yes     yes
slide by Thorsten Joachims
Concept Learning as Learning A Binary Function
– Learn (to imitate) a function f : X → {+1, -1}.
– The learning algorithm is given the correct value of the function for particular inputs → training examples.
– An example is a pair (x, y), where x is the input and y = f(x) is the output of the target function applied to x.
– Goal: find a function h : X → {+1, -1} that approximates f : X → {+1, -1} as well as possible.
47
slide by Thorsten Joachims
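To make the f vs. h distinction concrete, here is a hypothetical sketch on the grading example: the rule "A+ iff the answer is complete" is just one candidate h that happens to be consistent with all four training examples (attribute names follow the table; "original" is my label for the unnamed yes/no attribute).

```python
# A candidate hypothesis h : X -> {+1, -1} for the A+ concept.
# This rule is one hypothesis consistent with the four training
# examples from the slide, not the true target function f.

def h(example):
    return +1 if example["correct"] == "complete" else -1

# The four (x, y) training examples from the grading table.
training_examples = [
    ({"correct": "complete", "color": "yes", "original": "yes",
      "presentation": "clear", "binder": "no"}, +1),
    ({"correct": "complete", "color": "no", "original": "yes",
      "presentation": "clear", "binder": "no"}, +1),
    ({"correct": "partial", "color": "yes", "original": "no",
      "presentation": "unclear", "binder": "no"}, -1),
    ({"correct": "complete", "color": "yes", "original": "yes",
      "presentation": "clear", "binder": "yes"}, +1),
]

print(all(h(x) == y for x, y in training_examples))  # h fits all examples
```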
48
– Learn (to imitate) a function f : X → Y.
– The learning algorithm is given the correct value of the function for particular inputs → training examples.
– An example is a pair (x, f(x)), where x is the input and y = f(x) is the output of the target function applied to x.
– Goal: find a function h : X → Y that approximates f : X → Y as well as possible.
slide by Thorsten Joachims
Supervised / Inductive Learning
49
slide by Thorsten Joachims
Image Classification: a core task in Computer Vision
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
50
The problem: semantic gap
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
51
Challenges: Viewpoint Variation
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
52
Challenges: Illumination
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
53
Challenges: Deformation
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
54
Challenges: Occlusion
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
55
Challenges: Background clutter
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
56
Challenges: Intraclass variation
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
57
An image classifier
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
Unlike, e.g., sorting a list of numbers, there is no obvious way to hard-code an algorithm for recognizing a cat or other object classes.
58
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
Attempts have been made
59
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
Data-driven approach:
1. Collect a dataset of images and labels
2. Use machine learning to train an image classifier
3. Evaluate the classifier on a withheld set of test images
60
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
First classifier: Nearest Neighbor Classifier
Remember all training images and their labels.
Predict the label of the most similar training image.
61
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
62
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
63
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
How do we compare the images? What is the distance metric?
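A common answer is the L1 (Manhattan) distance: the sum of absolute pixel-wise differences; the L2 (Euclidean) distance is the other usual choice. A minimal sketch on made-up flattened toy images:

```python
def l1_distance(img1, img2):
    """L1 distance: sum of absolute pixel-wise differences."""
    return sum(abs(p - q) for p, q in zip(img1, img2))

def l2_distance(img1, img2):
    """L2 distance: square root of summed squared pixel differences."""
    return sum((p - q) ** 2 for p, q in zip(img1, img2)) ** 0.5

# Toy 2x2 "images", flattened to pixel lists.
a = [56, 32, 10, 18]
b = [10, 25, 10, 20]
print(l1_distance(a, b))  # 46 + 7 + 0 + 2 = 55
```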
64
Lecture 2 - 6 Jan 2016
Nearest Neighbor classifier
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
65
remember the training data
Nearest Neighbor classifier
66
for every test image: find the nearest training image using the L1 distance and predict its label
Nearest Neighbor classifier
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
67
Q: how does the classification speed depend on the size of the training data?
Nearest Neighbor classifier
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
68
Q: how does the classification speed depend on the size of the training data? A: linearly :(
Nearest Neighbor classifier
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
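The classifier on these slides is shown in NumPy; the following is an equivalent pure-Python sketch (the train/predict structure mirrors the slides, the data is made up):

```python
class NearestNeighbor:
    """Memorize the training set; predict by the closest image in L1 distance."""

    def train(self, images, labels):
        # "Training" is just remembering all images and their labels.
        self.images = images
        self.labels = labels

    def predict(self, test_image):
        # For the test image, find the training image with the smallest
        # L1 distance and return that image's label. One such search per
        # test image is why prediction time grows linearly with the data.
        def l1(img):
            return sum(abs(p - q) for p, q in zip(img, test_image))
        nearest = min(range(len(self.images)), key=lambda i: l1(self.images[i]))
        return self.labels[nearest]

nn = NearestNeighbor()
nn.train([[0, 0, 0], [200, 200, 200]], ["dark", "bright"])
print(nn.predict([10, 5, 0]))  # closest to the all-zero image -> "dark"
```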
69
Aside: Approximate Nearest Neighbor methods find approximate nearest neighbors quickly (trading accuracy for speed).
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
70
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
71
k-Nearest Neighbor
find the k nearest images, have them vote on the label
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
72
73
Training examples: (y_1, z_1), …, (y_n, z_n)
– Attribute vectors: y_j ∈ Y
– Labels: z_j ∈ Z
– Similarity function: L : Y × Y → ℝ
– Number of nearest neighbors to consider: k
Prediction for a new example y′:
– k-nearest neighbors: the k training examples with largest L(y_j, y′)
slide by Thorsten Joachims
74
slide by Thorsten Joachims
75
slide by Thorsten Joachims
76
slide by Thorsten Joachims
77
slide by Thorsten Joachims
For binary classification problems, why is it a good idea to use an odd value of k?
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
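One way to see it: with two classes and an even k, the neighbors' vote can split exactly in half, while an odd k guarantees a strict majority. A small hypothetical check:

```python
from collections import Counter

def has_tie(votes):
    """True if the top two vote counts are equal (no strict majority)."""
    counts = Counter(votes).most_common()
    return len(counts) > 1 and counts[0][1] == counts[1][1]

print(has_tie(["+", "+", "-", "-"]))       # k = 4 can tie 2-2
print(has_tie(["+", "+", "-", "-", "+"]))  # k = 5 cannot tie with 2 classes
```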
78
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
We will talk about this later!
79
80
81
82
Training examples: (y_1, z_1), …, (y_n, z_n)
– Attribute vectors: y_j ∈ Y
– Target attribute: z_j ∈ Z
– Similarity function: L : Y × Y → ℝ
– Number of nearest neighbors to consider: k
Prediction for a new example y′:
– k-nearest neighbors: the k training examples with largest L(y_j, y′)
83
Where in the World? [Hays & Efros, CVPR 2008]
84
A nearest neighbor recognition example
slide by James Hays
85
Where in the World? [Hays & Efros, CVPR 2008]
slide by James Hays
86
Where in the World? [Hays & Efros, CVPR 2008]
slide by James Hays
Annotated by Flickr users
6+ million geotagged photos by 109,788 photographers
slide by James Hays
87
slide by James Hays
88
89
slide by James Hays
90
slide by James Hays
slide by James Hays
91
92
slide by James Hays
slide by James Hays
93
94
slide by James Hays
slide by James Hays
95
96
slide by James Hays
Scene Completion [Hays & Efros, SIGGRAPH07]
97
slide by James Hays
98
… 200 total
Hays and Efros, SIGGRAPH 2007 slide by James Hays
99
Hays and Efros, SIGGRAPH 2007 slide by James Hays
100
Graph cut + Poisson blending
Hays and Efros, SIGGRAPH 2007 slide by James Hays
101
Hays and Efros, SIGGRAPH 2007 slide by James Hays
102
Hays and Efros, SIGGRAPH 2007 slide by James Hays
103
Hays and Efros, SIGGRAPH 2007 slide by James Hays
104
Hays and Efros, SIGGRAPH 2007 slide by James Hays
105
Hays and Efros, SIGGRAPH 2007 slide by James Hays
106
Hays and Efros, SIGGRAPH 2007 slide by James Hays
107
slide by Thorsten Joachims
Training examples: (y_1, z_1), …, (y_n, z_n)
– Attribute vectors: y_j ∈ Y
– Target attribute: z_j ∈ ℝ
– Similarity function: L : Y × Y → ℝ
– Number of nearest neighbors to consider: k
Prediction for a new example y′:
– k-nearest neighbors: the k training examples with largest L(y_j, y′)
108
slide by Thorsten Joachims
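Since the target attribute is now real-valued, the neighbors are found the same way, but the prediction is (for instance) the average of their targets. A sketch with made-up 1-D data:

```python
def knn_regress(train, new_x, k):
    """train: list of (x, z) pairs with real-valued targets z.
    Predict the mean target of the k nearest examples (1-D inputs here)."""
    neighbors = sorted(train, key=lambda xz: abs(xz[0] - new_x))[:k]
    return sum(z for _, z in neighbors) / k

data = [(1.0, 2.0), (2.0, 2.4), (3.0, 3.1), (10.0, 9.5)]
print(knn_regress(data, 1.5, k=2))  # mean of the targets 2.0 and 2.4
```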
Overview of Nearest Neighbors
109
slide by Rob Fergus
Linear Regression and Least Squares
110