SLIDE 1

Introduction to Machine Learning

COMPSCI 371D — Machine Learning

SLIDE 2

Outline

1. Classification, Regression, Unsupervised Learning
2. About Dimensionality
3. Drawings and Intuition in Higher Dimensions
4. Classification through Regression
5. Linear Separability

SLIDE 3

About Slides

  • By popular demand, lecture slides will be made available online
  • They will show up just before a lecture starts
  • Slides are grouped by topic, not by lecture
  • Slides are not for studying
  • Class notes and homework assignments are the materials of record

SLIDE 4

Classification, Regression, Unsupervised Learning

Parenthesis: Supervised vs Unsupervised

  • Supervised: Train with (x, y)
  • Classification: hand-written digit recognition
  • Regression: median age of YouTube viewers for each video
  • Unsupervised: Train with x
  • Clustering: color compression
  • Distances matter!
  • We will not cover unsupervised learning

SLIDE 5

Classification, Regression, Unsupervised Learning

Machine Learning Terminology

  • Predictor h : X → Y (the signature of h)
  • X ⊆ R^d is the data space
  • Y (categorical) is the label space for a classifier
  • Y (⊆ R^e) is the value space for a regressor
  • A target is either a label or a value
  • H is the hypothesis space (all h we can choose from)
  • A training set is a subset T of X × Y
  • T := 2^(X×Y) is the class of all possible training sets
  • Learner λ : T → H, so that λ(T) = h
  • ℓ(y, ŷ) is the loss incurred for estimating ŷ when the true target is y
  • L_T(h) = (1/N) Σ_{n=1}^{N} ℓ(y_n, h(x_n)) is the empirical risk of h on T (sketched below)
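To make the last definition concrete, here is a minimal Python sketch (mine, not from the slides) of the empirical risk L_T(h): the average loss of a predictor h over a training set T. The helper names, the toy predictor, and the data are all made up for illustration.

```python
# A minimal sketch of the empirical risk L_T(h) = (1/N) * sum_n loss(y_n, h(x_n)).
def empirical_risk(h, T, loss):
    return sum(loss(y, h(x)) for x, y in T) / len(T)

# Example with the 0-1 loss and a toy classifier on hypothetical data.
zero_one = lambda y, y_hat: 0.0 if y == y_hat else 1.0
h = lambda x: 's' if x[0] > 0 else 'c'
T = [((0.5, 1.0), 's'), ((-0.2, 0.3), 'c'), ((0.1, -0.4), 'c')]
print(empirical_risk(h, T, zero_one))  # fraction of mistakes h makes on T
```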

SLIDE 6

About Dimensionality

H is Typically Parametric

  • For polynomials, h ↔ c
  • We write L_T(c) instead of L_T(h)
  • “Searching H” means “find the parameters”:
    ĉ ∈ arg min_{c ∈ R^m} ‖Ac − b‖²
  • This is common in machine learning: h(x) = h_θ(x), with θ a vector of parameters
  • Abstract view: ĥ ∈ arg min_{h ∈ H} L_T(h)
  • Concrete view: θ̂ ∈ arg min_{θ ∈ R^m} L_T(θ)
  • Minimize a function of real variables, rather than of “functions”
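A small numerical sketch of the concrete view for polynomial fitting (my own illustration, not from the slides): build the matrix A of monomials and solve the least-squares problem min_c ‖Ac − b‖². The data and degree below are made up.

```python
import numpy as np

# Hypothetical 1-D training data (x_n, y_n) and polynomial degree k.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.0, 1.2, 2.1, 3.6, 5.9])
k = 2

# Columns of A are the monomials 1, x, ..., x^k evaluated at the samples.
A = np.vander(x, k + 1, increasing=True)

# Concrete view: c_hat in arg min_c ||A c - y||^2, solved by least squares.
c_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(c_hat)  # estimated polynomial coefficients
```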

SLIDE 7

About Dimensionality

Curb your Dimensions

  • For polynomials, h_c(x) : X → Y with x ∈ X ⊆ R^d and c ∈ R^m
  • We saw that d > 1 and degree k > 1 ⇒ m ≫ d
  • Specifically, m(d, k) = (d + k choose k)
  • Things blow up when k and d grow
  • More generally, h_θ(x) : X → Y with x ∈ X ⊆ R^d and θ ∈ R^m
  • Which dimension(s) do we want to curb? m? d?
  • Both, for different but related reasons
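To see the blow-up numerically, here is a small sketch (mine, not from the slides) that tabulates m(d, k) = (d + k choose k), the number of coefficients of a degree-k polynomial in d variables.

```python
from math import comb

# m(d, k) = C(d + k, k): dimension of the coefficient vector c for a
# polynomial of degree up to k in d variables.
def m(d, k):
    return comb(d + k, k)

for d in (1, 2, 10, 100):
    print(d, [m(d, k) for k in (1, 2, 3, 5)])
# e.g. m(100, 5) = 96,560,646: the parameter count grows very quickly with d and k.
```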

SLIDE 8

About Dimensionality

Problem with m Large

  • Even just for data fitting, we generally want N ≫ m, i.e., (possibly many) more samples than parameters to estimate
  • For instance, in Ac = b, we want A to have more rows than columns
  • Remember that annotating training data is costly
  • So we want to curb m: we want a small H

SLIDE 9

About Dimensionality

Problems with d Large

  • We do machine learning, not just data fitting!
  • We want h to generalize to new data
  • During training, we would like the learner to see a good sampling of all possible x (“fill X nicely”)
  • With large d, this is impossible: the curse of dimensionality (see the sketch below)
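A back-of-the-envelope sketch of this point (my own illustration, not from the slides): even a coarse grid with only 10 samples per axis needs 10^d points to fill [0, 1]^d.

```python
# Samples needed to cover [0, 1]^d with a coarse grid of 10 points per axis:
for d in (1, 2, 3, 10, 20):
    print(d, 10 ** d)
# d = 10 already needs ten billion samples; realistic d makes filling X hopeless.
```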

SLIDE 10

Drawings and Intuition in Higher Dimensions

Drawings Help Intuition

SLIDE 11

Drawings and Intuition in Higher Dimensions

Intuition Often Fails in Many Dimensions

[Figure: a square/cube of side 1 with a slightly smaller concentric one (side labels 1 and 1 − ε/2); the gray margin lies between them]

  • Gray parts dominate when d → ∞
  • Distance from center to corners diverges when d → ∞
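A quick numerical check of both bullets (my own sketch, not from the slides): the fraction of the unit cube's volume in the thin "gray" margin near its boundary tends to 1, and the distance from the center of [0, 1]^d to a corner, √d / 2, diverges.

```python
import math

eps = 0.1  # margin thickness: the inner cube has side 1 - eps
for d in (1, 2, 10, 100, 1000):
    gray_fraction = 1 - (1 - eps) ** d   # volume of the unit cube outside the inner cube
    corner_dist = math.sqrt(d) / 2       # distance from the center of [0, 1]^d to a corner
    print(d, round(gray_fraction, 4), round(corner_dist, 2))
# The gray margin eventually holds almost all the volume, and corner_dist grows without bound.
```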

SLIDE 12

Classification through Regression

Classifiers as Partitions of X

X_y := h⁻¹(y) partitions X (not just T!)

  • Classifier = partition
  • S = h⁻¹(red square), C = h⁻¹(blue circle)
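A minimal illustration of the partition view (mine, not from the slides): group points of X by the label a toy classifier assigns, approximating the cells X_y = h⁻¹(y). The classifier and grid below are made up.

```python
from collections import defaultdict

# Made-up classifier on X ⊆ R^2: 's' inside the unit disk, 'c' outside it.
def h(x):
    return 's' if x[0] ** 2 + x[1] ** 2 <= 1.0 else 'c'

# Sweep a coarse grid over X and collect the (approximate) cells X_y = h^{-1}(y).
grid = [(i / 2 - 2, j / 2 - 2) for i in range(9) for j in range(9)]
cells = defaultdict(list)
for x in grid:
    cells[h(x)].append(x)

print({label: len(points) for label, points in cells.items()})  # the cells cover X and do not overlap
```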

SLIDE 13

Classification through Regression

Classification, Geometry, and Regression

  • Classification partitions X ⊂ R^d into sets
  • How do we represent sets ⊂ R^d? How do we work with them?
  • We’ll see a couple of ways: nearest-neighbor classifier, decision trees
  • These methods have a strong geometric flavor
  • Beware of our intuition!
  • Another technique: score-based classifiers, i.e., classification through regression

SLIDE 14

Classification through Regression

Score-Based Classifiers

[Figure: a score function s over X; the curve s = 0 marks the decision boundary, with regions s > 0 and s < 0 on either side]

[Figure adapted from Wei et al., Structural and Multidisciplinary Optimization, 58:831–849, 2018]

  • s = 0 defines the decision boundaries
  • s > 0 and s < 0 define the (two) decision regions

SLIDE 15

Classification through Regression

Score-Based Classifiers

  • Threshold some score function s(x):
  • Example: 's' (red squares) and 'c' (blue circles)
  • These correspond to two sets S ⊆ X and C = X \ S
  • If we can estimate something like s(x) = P[x ∈ S], then
    h(x) = 's' if s(x) > 1/2, 'c' otherwise
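A minimal sketch of this rule (mine, not from the slides): threshold a probability-like score at 1/2. The score function below is a made-up stand-in for an estimate of P[x ∈ S] that a regressor would normally learn from T.

```python
# Made-up estimate of s(x) = P[x in S] for x in R^2 (a real one would be learned).
def s(x):
    return 1.0 / (1.0 + x[0] ** 2 + x[1] ** 2)

# Classification through regression: threshold the estimated probability at 1/2.
def h(x):
    return 's' if s(x) > 0.5 else 'c'

print(h((0.2, 0.3)), h((2.0, 1.0)))  # 's' near the origin, 'c' far from it
```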

SLIDE 16

Classification through Regression

Classification through Regression

  • If you prefer 0 as a threshold, let
    s(x) = 2 P[x ∈ S] − 1 ∈ [−1, 1], and
    h(x) = 's' if s(x) > 0, 'c' otherwise
  • Scores are convenient even without probabilities, because they are easy to work with
  • We implement a classifier h by building a regressor s
  • Example: logistic-regression classifiers (see the sketch below)
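As a concrete instance, here is a hedged sketch (not from the slides) using scikit-learn's LogisticRegression on made-up 2-D data: the fitted model's decision_function plays the role of the score s(x), and predict is equivalent to thresholding that score at 0.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up training set: label 's' near the origin, 'c' farther out.
X = np.array([[0.1, 0.2], [0.3, -0.1], [-0.2, 0.2],
              [2.0, 1.5], [1.8, -2.1], [-2.2, 1.9]])
y = np.array(['s', 's', 's', 'c', 'c', 'c'])

clf = LogisticRegression().fit(X, y)

# The classifier is a thresholded regressor: decision_function gives the score,
# and predict() returns the class on the positive side of s(x) = 0.
x_new = np.array([[0.0, 0.1], [2.5, 2.5]])
print(clf.decision_function(x_new))  # scores s(x); the decision boundary is s(x) = 0
print(clf.predict(x_new))            # equivalent to thresholding the scores at 0
```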

SLIDE 17

Linear Separability

Linearly Separable Training Sets

  • Some line (hyperplane in R^d) separates C, S
  • Requires much smaller H
  • Simplest score: s(x) = b + wᵀx. The line is s(x) = 0, and
    h(x) = 's' if s(x) > 0, 'c' otherwise
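A minimal sketch of the linear score (mine, not from the slides), with hypothetical parameters w and b; in practice these would be learned. Checking the sign of s on both classes also tells us whether this particular hyperplane separates them.

```python
import numpy as np

# Hypothetical linear-score parameters (normally learned from the training set).
w = np.array([1.0, -0.5])
b = -0.2

def s(x):
    return b + w @ x          # s(x) = b + w^T x

def h(x):
    return 's' if s(x) > 0 else 'c'

# Toy points from the two classes; this hyperplane separates them iff every
# S-point gets a positive score and every C-point a non-positive one.
S_pts = [np.array([1.0, 0.0]), np.array([2.0, 1.0])]
C_pts = [np.array([-1.0, 0.5]), np.array([0.0, 2.0])]
print([h(x) for x in S_pts], [h(x) for x in C_pts])
```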

SLIDE 18

Linear Separability

Data Representation?

  • Linear separability is a property of the data in a given representation

[Figure: a ring-shaped class S of radius r and half-width Δr centered at the origin]

  • Xform 1: z = x₁² + x₂² implies x ∈ S ⇔ a ≤ z ≤ b
  • Xform 2: z = |√(x₁² + x₂²) − r| implies linear separability:
    x ∈ S ⇔ z ≤ Δr
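A small sketch of the two representations (my own, not from the slides) on hypothetical ring data: under Xform 1 the class S maps to an interval [a, b], while under Xform 2 it maps to z ≤ Δr, a single threshold, hence linearly separable.

```python
import math, random

r, dr = 2.0, 0.3   # hypothetical ring: radius r, half-width Δr

def in_S(x):                                   # ground truth: x lies on the ring-shaped class S
    return abs(math.hypot(x[0], x[1]) - r) <= dr

pts = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(2000)]
S = [p for p in pts if in_S(p)]
C = [p for p in pts if not in_S(p)]

z1 = lambda x: x[0] ** 2 + x[1] ** 2           # Xform 1: S maps to an interval [a, b]
z2 = lambda x: abs(math.hypot(x[0], x[1]) - r) # Xform 2: S maps to z <= Δr

print(min(map(z1, S)), max(map(z1, S)))        # roughly (r - Δr)^2 and (r + Δr)^2
print(max(map(z2, S)) <= dr < min(map(z2, C))) # True: one threshold separates the classes
```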
