ECE 5984: Introduction to Machine Learning


SLIDE 1

ECE 5984: Introduction to Machine Learning

Dhruv Batra Virginia Tech

Topics:

– Supervised Learning
– Measuring performance
– Nearest Neighbour

Readings: Barber 14 (kNN)

SLIDE 2

TA: Qing Sun

  • PhD candidate in the ECE department
  • Research work/interest:

– Diverse outputs based on structured probabilistic models
– Structured-output prediction


SLIDE 3

Recap from last time


SLIDE 4


Slide Credit: Yaser Abu-Mostafa

SLIDE 5

Nearest Neighbour

  • Demo 1

– http://cgm.cs.mcgill.ca/~soss/cs644/projects/perrier/Nearest.html

  • Demo 2

– http://www.cs.technion.ac.il/~rani/LocBoost/


SLIDE 6

Spring 2013 Projects

  • Gender Classification from body proportions

– Igor Janjic & Daniel Friedman, Juniors


SLIDE 7

Plan for today

  • Supervised/Inductive Learning

– (A bit more on) Loss functions

  • Nearest Neighbour

– Common Distance Metrics
– Kernel Classification/Regression
– Curse of Dimensionality


SLIDE 8

Loss/Error Functions

  • How do we measure performance?
  • Regression:

– L2 error

  • Classification:

– #misclassifications
– Weighted misclassification via a cost matrix
– For 2-class classification:

  • True Positive, False Positive, True Negative, False Negative

– For k-class classification:

  • Confusion Matrix
  • ROC curves

– http://psych.hanover.edu/JavaTest/SDT/ROC.html
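As a concrete illustration (not from the slides), here is a minimal NumPy sketch of the regression and 2-class measures above; the helper names `l2_error` and `confusion_counts` are my own:

```python
import numpy as np

def l2_error(y_true, y_pred):
    # L2 regression error: sum of squared differences
    return np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def confusion_counts(y_true, y_pred):
    # 2-class counts with labels in {0, 1}, where 1 is "positive"
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # true negatives
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives
    return tp, fp, tn, fn

print(l2_error([1.0, 2.0], [1.5, 1.0]))                    # 1.25
print(confusion_counts([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))  # (2, 1, 1, 1)
```

A k-class confusion matrix generalizes these four counts to a k-by-k table, and an ROC curve is traced by sweeping a decision threshold and plotting the resulting true-positive rate against the false-positive rate.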


SLIDE 9

Nearest Neighbours

Image Credit: Wikipedia

SLIDE 10

Instance/Memory-based Learning

Four things make a memory-based learner:

  • A distance metric
  • How many nearby neighbors to look at?
  • A weighting function (optional)
  • How to fit with the local points?

Slide Credit: Carlos Guestrin

SLIDE 11

1-Nearest Neighbour

Four things make a memory-based learner:

  • A distance metric

– Euclidean (and others)

  • How many nearby neighbors to look at?

– 1

  • A weighting function (optional)

– unused

  • How to fit with the local points?

– Just predict the same output as the nearest neighbour.

Slide Credit: Carlos Guestrin
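A minimal sketch of this rule, assuming NumPy arrays and Euclidean distance; the function name `predict_1nn` is hypothetical, not from the slides:

```python
import numpy as np

def predict_1nn(X_train, y_train, query):
    # 1-NN: copy the output of the single closest training point
    dists = np.linalg.norm(X_train - query, axis=1)  # Euclidean distances
    return y_train[np.argmin(dists)]                 # output of the nearest one

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y_train = np.array([0, 1, 1])
print(predict_1nn(X_train, y_train, np.array([0.9, 1.2])))  # 1 (nearest is [1, 1])
```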

SLIDE 12

k-Nearest Neighbour

Four things make a memory-based learner:

  • A distance metric

– Euclidean (and others)

  • How many nearby neighbors to look at?

– k

  • A weighting function (optional)

– unused

  • How to fit with the local points?

– Just predict the average output among the nearest neighbours.

Slide Credit: Carlos Guestrin
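The same sketch extended to k neighbours; with k = 1 it reduces to the 1-NN rule above. As before, the names are my own:

```python
import numpy as np

def predict_knn(X_train, y_train, query, k=3):
    # k-NN: average the outputs of the k closest training points
    # (for classification, a majority vote would replace the mean)
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists)[:k]        # indices of the k nearest points
    return np.mean(y_train[nearest])

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])
print(predict_knn(X_train, y_train, np.array([1.5]), k=2))  # 2.5 = mean(1.0, 4.0)
```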

SLIDE 13

1-NN for Regression


[Figure: 1-NN regression fit on an x-y plot; the annotation reads "Here, this is the closest datapoint". Figure Credit: Carlos Guestrin]

SLIDE 14

Multivariate distance metrics

Suppose the input vectors $x_1, x_2, \ldots, x_N$ are two-dimensional: $x_1 = (x_{11}, x_{12})$, $x_2 = (x_{21}, x_{22})$, …, $x_N = (x_{N1}, x_{N2})$. One can draw the nearest-neighbour regions in input space.

$\text{Dist}(x_i, x_j) = (x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2$

$\text{Dist}(x_i, x_j) = (x_{i1} - x_{j1})^2 + (3x_{i2} - 3x_{j2})^2$

The relative scalings in the distance metric affect region shapes.

Slide Credit: Carlos Guestrin
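A quick numerical check of this point (not from the slides; `dist_scaled` is a hypothetical helper): with s = 1 the metric is the plain one above, with s = 3 it is the scaled one, and the same query gets a different nearest neighbour.

```python
import numpy as np

def dist_scaled(xi, xj, s=3.0):
    # (xi1 - xj1)^2 + (s*xi2 - s*xj2)^2, as in the slide's scaled metric
    return (xi[0] - xj[0]) ** 2 + (s * xi[1] - s * xj[1]) ** 2

query = np.array([0.0, 0.0])
a = np.array([1.0, 0.0])   # differs only in the first coordinate
b = np.array([0.0, 0.6])   # differs only in the second coordinate

print(dist_scaled(a, query, s=1.0), dist_scaled(b, query, s=1.0))  # 1.0 0.36 -> b is nearer
print(dist_scaled(a, query, s=3.0), dist_scaled(b, query, s=3.0))  # 1.0 3.24 -> a is nearer
```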

SLIDE 15

Euclidean distance metric

$$D(x, x') = \sqrt{\sum_i \sigma_i^2 \,(x_i - x'_i)^2}$$

Or equivalently,

$$D(x, x') = \sqrt{(x - x')^\top A \,(x - x')}$$

where $A$ is a diagonal matrix with entries $A_{ii} = \sigma_i^2$.

Slide Credit: Carlos Guestrin
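A small sketch verifying the equivalence of the two forms, assuming NumPy; `dist_weighted` and `dist_quadratic` are my own names:

```python
import numpy as np

def dist_weighted(x, xp, sigma):
    # sqrt( sum_i sigma_i^2 * (x_i - x'_i)^2 ), the per-coordinate weighted form
    return np.sqrt(np.sum(sigma ** 2 * (x - xp) ** 2))

def dist_quadratic(x, xp, A):
    # sqrt( (x - x')^T A (x - x') ), the general quadratic form
    d = x - xp
    return np.sqrt(d @ A @ d)

x, xp = np.array([1.0, 2.0]), np.array([3.0, 1.0])
sigma = np.array([2.0, 0.5])
A = np.diag(sigma ** 2)            # diagonal A recovers the weighted form
print(dist_weighted(x, xp, sigma))   # 4.031...
print(dist_quadratic(x, xp, A))      # same value
```

A non-diagonal positive-definite A gives the Mahalanobis-style metrics on the next slide.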

SLIDE 16

Notable distance metrics (and their level sets)

[Figure: level sets of the Scaled Euclidean (L2) and Mahalanobis (non-diagonal A) metrics]

Slide Credit: Carlos Guestrin

SLIDE 17

Minkowski distance


Image Credit: By Waldir (Based on File:MinkowskiCircles.svg) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
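The figure shows unit circles for different Minkowski orders p. Assuming the standard definition $D_p(x, y) = \left(\sum_i |x_i - y_i|^p\right)^{1/p}$, a minimal sketch (p = 1 gives the L1 norm, p = 2 the Euclidean norm, and p → ∞ the max norm):

```python
import numpy as np

def minkowski(x, y, p):
    # D_p(x, y) = ( sum_i |x_i - y_i|^p )^(1/p)
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(minkowski(x, y, 1))      # 7.0  (L1 / Manhattan)
print(minkowski(x, y, 2))      # 5.0  (L2 / Euclidean)
print(np.max(np.abs(x - y)))   # 4.0  (limit p -> inf: max norm)
```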

SLIDE 18

Notable distance metrics (and their level sets)

[Figure: level sets of the L1 (absolute) norm, L∞ (max) norm, Scaled Euclidean (L2), and Mahalanobis (non-diagonal A) metrics]

Slide Credit: Carlos Guestrin

SLIDE 19

Parametric vs Non-Parametric Models

  • Does the capacity (size of the hypothesis class) grow with the size of the training data?

– Yes = Non-Parametric Models
– No = Parametric Models

  • Example

– http://www.theparticle.com/applets/ml/nearest_neighbor/


SLIDE 20

Weighted k-NNs

  • Neighbors are not all the same
SLIDE 21

1 vs k Nearest Neighbour

Image Credit: Ying Wu

SLIDE 22

1 vs k Nearest Neighbour

Image Credit: Ying Wu

SLIDE 23

1-NN for Regression


[Figure: 1-NN regression fit on an x-y plot; the annotation reads "Here, this is the closest datapoint". Figure Credit: Carlos Guestrin]

SLIDE 24

1-NN for Regression

  • Often bumpy (overfits)

Figure Credit: Andrew Moore

SLIDE 25

9-NN for Regression

  • Smoother than 1-NN, but often still bumpy

Figure Credit: Andrew Moore

SLIDE 26

Kernel Regression/Classification

Four things make a memory-based learner:

  • A distance metric

– Euclidean (and others)

  • How many nearby neighbors to look at?

– All of them

  • A weighting function (optional)

– $w_i = \exp(-d(x_i, \text{query})^2 / \sigma^2)$
– Nearby points to the query are weighted strongly, far points weakly. The σ parameter is the Kernel Width. Very important.
  • How to fit with the local points?

– Predict the weighted average of the outputs: $\text{predict} = \sum_i w_i y_i \,/\, \sum_i w_i$

Slide Credit: Carlos Guestrin
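A minimal end-to-end sketch of this weighted-average predictor, assuming NumPy and the Gaussian weighting from the slide; `kernel_regression` is a hypothetical name:

```python
import numpy as np

def kernel_regression(X_train, y_train, query, sigma=1.0):
    # Weight every training point by exp(-d^2 / sigma^2), then return
    # the weighted average of the outputs: sum(w_i * y_i) / sum(w_i)
    d2 = np.sum((X_train - query) ** 2, axis=1)   # squared distances to query
    w = np.exp(-d2 / sigma ** 2)                  # Gaussian kernel weights
    return np.sum(w * y_train) / np.sum(w)

X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([0.0, 1.0, 4.0])
print(kernel_regression(X_train, y_train, np.array([1.0]), sigma=0.5))
# ~1.04: with a small kernel width the nearby point dominates
print(kernel_regression(X_train, y_train, np.array([1.0]), sigma=5.0))
# ~1.66: a large width weights all points nearly equally (global mean is ~1.67)
```

The two calls illustrate why σ matters so much: a small width behaves almost like 1-NN, while a large width smooths toward the global average.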