Machine Learning 2007: Lecture 3


SLIDE 1

Machine Learning 2007: Lecture 3
Instructor: Tim van Erven (Tim.van.Erven@cwi.nl)
Website: www.cwi.nl/~erven/teaching/0708/ml/
September 20, 2007

SLIDE 2

Overview

  • Organisational Matters
  • Hypothesis Spaces
  • Method: Least Squares Linear Regression
  • Being Informal about Feature Vectors
  • Method: LIST-THEN-ELIMINATE for Concept Learning
  • A Biased Hypothesis Space
  • An Unbiased Hypothesis Space?

SLIDE 3

Organisational Matters

Course Organisation:

  • Intermediate exam: October 25, 11.00 – 13.00 in 04A05.
  • Biweekly exercises

This Lecture versus Mitchell:

  • All of it is in the book (Chapters 1 and 2), except for “Being Informal about Feature Vectors”.
  • The presentation is different, though: we recognise methods from Mitchell as methods to deal with regression and classification.


SLIDE 5

Reminder of Machine Learning Categories

Prediction: Given data D = y1, . . . , yn, predict how the sequence continues with yn+1.

Regression: Given data D = (x1, y1), . . . , (xn, yn), learn to predict the value of the label y for any new feature vector x. Typically y can take infinitely many values. Acceptable if your prediction is close to the correct y.

Classification: Given data D = (x1, y1), . . . , (xn, yn), learn to predict the class label y for any new feature vector x. Only finitely many categories. Your prediction is either correct or wrong.

SLIDE 6

Hypotheses and Hypothesis Spaces

Definition of a Hypothesis:

A hypothesis h is a candidate description of the regularity or patterns in your data.

  • Prediction example: yn+1 = h(y1, . . . , yn) = yn−1 + yn
  • Regression example: y = h(x) = 5x1
  • Classification example: y = h(x) = +1 if 3x1 − 20 > 0; −1 otherwise.

Definition of a Hypothesis Space:

A hypothesis space H is the set {h} of hypotheses that are being considered.

  • Regression example: {ha(x) = a · x1 | a ∈ R}

SLIDE 8

Linear Regression

Linear Regression:

In linear regression the goal is to select a linear hypothesis that best captures the regularity in the data.

[Figure: scatter plot of the data in the (x, y) plane.]

SLIDE 9

Hypothesis Space of Linear Hypotheses

Linear Function:

y = hw(x) = w0 + w1x1 + . . . + wdxd

  • x = (x1, . . . , xd)⊤ is a d-dimensional feature vector.
  • w = (w0, w1, . . ., wd)⊤ are called the weights.

Examples:

  • hw(x) = 2 + 9x1 (w0 = 2, w1 = 9)
  • hw(x) = 3 + 16x1 − 2x3 (w0 = 3, w1 = 16, w2 = 0, w3 = −2)

Hypothesis Space of All Linear Hypotheses:

H = {hw | w ∈ Rd+1}.
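A linear hypothesis of this form is a one-liner in code. A minimal sketch in Python (numpy assumed available; the function name `h` is my own choice, the weights are the slide's second example):

```python
import numpy as np

def h(w, x):
    """Linear hypothesis h_w(x) = w0 + w1*x1 + ... + wd*xd."""
    return w[0] + np.dot(w[1:], x)

# Second example from the slide: hw(x) = 3 + 16*x1 - 2*x3
# (w0 = 3, w1 = 16, w2 = 0, w3 = -2)
w = np.array([3.0, 16.0, 0.0, -2.0])
print(h(w, np.array([1.0, 5.0, 2.0])))  # 3 + 16*1 + 0*5 - 2*2 = 15.0
```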

SLIDE 10

Example: A Linear Function with Noise

[Figure: scatter plot of noisy data around a line in the (x, y) plane.]

Data generated by a linear function y = 6x + 20 + ε, where ε is noise with distribution N(0, 10). Can we recover this function from the data alone?

SLIDE 11

Determining Weights from the Data

Squared Error:

For given w, we may evaluate the squared error of hw on a single data item (xi, yi):

Squared Error = (yi − hw(xi))^2

Least Squares Linear Regression:

Given data D = (x1, y1), . . . , (xn, yn), select w to minimize the sum of squared errors SSE(D) on all data:

min_w SSE(D) = min_w Σ_{i=1}^n (yi − hw(xi))^2
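This minimization can be sketched in a few lines of Python with numpy. Assumptions: the data comes from the slide's example line y = 6x + 20, with the noise left out here so the recovered weights are exact; the grid of x values is my own choice:

```python
import numpy as np

# Data from the example line y = 6x + 20 (noise omitted; add
# np.random.normal(0, 10, x.size) to y to reproduce the noisy version).
x = np.linspace(-10, 15, 50)
y = 6 * x + 20

# Design matrix with a column of ones so that w[0] plays the role of w0.
X = np.column_stack([np.ones_like(x), x])

# Least squares: minimize SSE(D) = sum_i (y_i - h_w(x_i))^2 over w.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # approximately [20. 6.]
```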


SLIDE 13

Linear Regression Example

The previous example again:

[Figure: the data with the original function and the least-squares fit.]

  Original Function:  y = 6x + 20 + ε
  Least Squares:      y = 6.38x + 17.37

SLIDE 14

Inductive Bias

Least Squares Linear Regression:

  • Only looks for linear patterns in the data. For example, it cannot discover y = x1^2 even if it gets an infinite amount of data.
  • Minimizes the sum of squared errors. Why not something else, like for example the sum of absolute errors?

min_w Σ_{i=1}^n |yi − hw(xi)|



SLIDE 17

EnjoySport Representation 1

Numbering Attribute Values:

  Attribute:  Sky                    AirTemp       EnjoySport
  Value:      Sunny  Cloudy  Rainy   Warm  Cold    No  Yes
  Encoding:   1      2       3       1     2       1   2

Example:

  Sky, AirTemp   EnjoySport   Representation
  Sunny, Warm    Yes          x = (1, 1)⊤, y = 2
  Rainy, Cold    No           x = (3, 2)⊤, y = 1
  Sunny, Cold    Yes          x = (1, 2)⊤, y = 2

  • The difference between feature vectors has no clear meaning. For example (3, 2)⊤ − (1, 1)⊤ = (2, 1)⊤.

SLIDE 19

EnjoySport Representation 2

Another Way to Do It:

  Attribute:  Sky                              AirTemp            EnjoySport
  Value:      Sunny     Cloudy     Rainy       Warm     Cold      No  Yes
  Encoding:   (1,0,0)⊤  (0,1,0)⊤   (0,0,1)⊤    (1,0)⊤   (0,1)⊤    1   2

Example:

  Sky, AirTemp   EnjoySport   Representation
  Sunny, Warm    Yes          x = (1, 0, 0, 1, 0)⊤, y = 2
  Rainy, Cold    No           x = (0, 0, 1, 0, 1)⊤, y = 1
  Sunny, Cold    Yes          x = (1, 0, 0, 0, 1)⊤, y = 2

  • The number of non-zero entries in the difference between two vectors is twice the number of attributes that differ.
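This second representation is what is nowadays usually called one-hot encoding. A minimal sketch in plain Python (the function names are my own):

```python
# One-hot encoding of the two EnjoySport attributes from the slide.
SKY = ['Sunny', 'Cloudy', 'Rainy']
AIRTEMP = ['Warm', 'Cold']

def one_hot(value, values):
    """A vector with a 1 in the position of `value` and 0s elsewhere."""
    return [1 if v == value else 0 for v in values]

def encode(sky, airtemp):
    """Concatenate the one-hot encodings of both attributes."""
    return one_hot(sky, SKY) + one_hot(airtemp, AIRTEMP)

a = encode('Sunny', 'Warm')  # [1, 0, 0, 1, 0]
b = encode('Rainy', 'Cold')  # [0, 0, 1, 0, 1]

# Non-zero entries in the difference: twice the number of differing attributes.
print(sum(1 for u, v in zip(a, b) if u != v))  # 4: both attributes differ
```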

SLIDE 20

Being Informal about Feature Vectors

  • (Feature) vectors x and labels y contain numbers.
  • But sometimes it will be convenient to be informal (mathematically imprecise):

      Formal            Informal
      x = (1, 1)⊤   ⇔   x = (Sunny, Warm)
      y = 2         ⇔   y = Yes

  • Why?
    Reason 1: Don’t care about details of representation.
    Reason 2: Emphasize meaning of features and labels.
  • Don’t forget what’s really going on!


SLIDE 24

Hypothesis Space for EnjoySport

A hypothesis h is specified by a list of constraints on the attributes: Sky, AirTemp, Humidity, Wind, Water, Forecast.

h(x) = yes if x satisfies all constraints, no otherwise.

A list of constraints looks like: ⟨?, Cold, High, ?, ?, ?⟩

  Constraint   Description
  ?            Any value is acceptable for the attribute.
  ∅            No value is acceptable.
  Warm         A single required value for the attribute (e.g. Warm).

Hypothesis Space:

H = {h} = {⟨?, ?, ?, ?, ?, ?⟩, ⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨Warm, ?, ?, ?, ?, ?⟩, . . . , ⟨∅, ∅, ∅, ∅, ∅, ∅⟩}
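Checking such a constraint list against a feature vector takes only a few lines. A minimal Python sketch (the name `matches` and the use of None to stand in for the ∅ constraint are my own choices):

```python
def matches(constraints, x):
    """True iff feature vector x satisfies every constraint.

    '?' accepts any value, None (standing in for the empty-set
    constraint) accepts no value, anything else must match exactly.
    """
    return all(c is not None and (c == '?' or c == v)
               for c, v in zip(constraints, x))

h = ('?', 'Cold', 'High', '?', '?', '?')
print(matches(h, ('Sunny', 'Cold', 'High', 'Strong', 'Warm', 'Same')))  # True
print(matches(h, ('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same')))  # False
```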


SLIDE 26

LIST-THEN-ELIMINATE Algorithm

  • Given: data D = (x1, y1), . . . , (xn, yn).
  • A hypothesis h is consistent with example (xi, yi) if it assigns the right label to xi: h(xi) = yi.
  • LIST-THEN-ELIMINATE finds the set, VersionSpace, of all hypotheses that are consistent with the training data.

LIST-THEN-ELIMINATE Algorithm:

1: VersionSpace ← H
2: for i = 1, . . . , n do
3:   Remove from VersionSpace any h such that h(xi) ≠ yi.
4: end for
5: return VersionSpace
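The pseudocode translates almost line for line into Python. A sketch, using the simplified two-attribute hypothesis space from the example run (hypotheses represented as constraint tuples, None standing in for ∅; all names are my own):

```python
def predicts(constraints, x):
    """Apply a constraint-list hypothesis: 'Yes' iff x satisfies all constraints."""
    ok = all(c is not None and (c == '?' or c == v)
             for c, v in zip(constraints, x))
    return 'Yes' if ok else 'No'

def list_then_eliminate(H, data):
    version_space = list(H)                       # 1: VersionSpace <- H
    for x, y in data:                             # 2: for each example
        version_space = [h for h in version_space
                         if predicts(h, x) == y]  # 3: drop inconsistent h
    return version_space                          # 5: return VersionSpace

H = [('?', '?'), ('Sunny', '?'), (None, '?')]
data = [(('Sunny', 'Warm'), 'Yes'), (('Rainy', 'Cold'), 'No')]
print(list_then_eliminate(H, data))  # [('Sunny', '?')]
```

Only ⟨Sunny, ?⟩ survives: ⟨?, ?⟩ wrongly says Yes on the second example, and ⟨∅, ?⟩ wrongly says No on the first.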

SLIDE 27

LIST-THEN-ELIMINATE Example Run

Simplified Hypothesis Space:

Suppose for the moment that H = {⟨?, ?⟩, ⟨Sunny, ?⟩, ⟨∅, ?⟩}.

Example Run:

  x1 = (Sunny, Warm), y1 = Yes
  x2 = (Rainy, Cold), y2 = No

               (x1, y1)   (x2, y2)
  ⟨?, ?⟩       +          −
  ⟨Sunny, ?⟩   +          +
  ⟨∅, ?⟩       −          +

  (+ = consistent, − = inconsistent)

Resulting VersionSpace:

VersionSpace = {⟨Sunny, ?⟩}

SLIDE 29

Classifying New Instances

LIST-THEN-ELIMINATE:

  • Given: data D = (x1, y1), . . . , (xn, yn).
  • LIST-THEN-ELIMINATE finds the set, VersionSpace, of all hypotheses that are consistent with the training data.

Classifying New Instances:

  • Suppose we get xn+1. How should we classify it?
  • If all hypotheses in VersionSpace agree on the label of xn+1, then it’s easy; otherwise we don’t know:

    yn+1 = z if h(xn+1) = z for all h ∈ VersionSpace; don’t know otherwise.

SLIDE 31

Inductive Bias and Practical Issues

Inductive Bias:

  • Can only learn target concepts that are contained in the hypothesis space H.
  • Not robust if the target concept is not in H.
  • Sensitive to noise/errors in the training data: might accidentally remove the best hypothesis.
  • Doesn’t have any preference between consistent hypotheses. (Strength or weakness?)

Practical Issue:

  • Uses too much memory (to store VersionSpace). The book discusses the CANDIDATE-ELIMINATION algorithm, which does the same thing using less memory.


SLIDE 37

Some Notation: The Sets X and Y

X and Y:

  • X = {x} is the set of all possible feature vectors.
  • Y = {y} is the set of all possible labels.

The Number of Elements in a Set:

For any set A, we let |A| denote the number of elements in A. For example, |{a, b, c}| = 3.

EnjoySport Example:

  Attribute:  Sky  AirTemp  Humidity  Wind  Water  Forecast
  # Values:   3    2        2         2     2      2

  • The number of possible feature vectors: |X| = 3 · 2^5 = 96
  • The number of possible labels: |Y| = 2
SLIDE 39

Counting Hypotheses

LIST-THEN-ELIMINATE:

  • Syntactically distinct hypotheses: 5 · 4^5 = 5120
  • But ⟨Warm, ?, ?, ∅, ?, Change⟩ = ⟨∅, ∅, ∅, ∅, ∅, ∅⟩, and the same holds for any hypothesis containing at least one ∅.
  • Semantically distinct hypotheses: 1 + 4 · 3^5 = 973

All Possible Hypotheses:

  • A hypothesis h can be any function from X to Y.
  • To each feature vector in X it might assign any label from Y.
  • Semantically distinct hypotheses: |Y|^|X| = 2^96 ≈ 10^29

Conclusion:

LIST-THEN-ELIMINATE has a very strong representation bias.
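These counts are easy to verify mechanically. A quick Python check, using the attribute value counts from the EnjoySport table:

```python
values = [3, 2, 2, 2, 2, 2]  # Sky, AirTemp, Humidity, Wind, Water, Forecast

# Syntactically distinct: each attribute allows its own values plus '?' and ∅.
syntactic = 1
for v in values:
    syntactic *= v + 2
print(syntactic)  # 5 * 4^5 = 5120

# Semantically distinct: every hypothesis containing a ∅ classifies everything
# as 'no', so all of them collapse into a single hypothesis (the "+ 1").
semantic = 1
for v in values:
    semantic *= v + 1
print(semantic + 1)  # 1 + 4 * 3^5 = 973

# All functions from X to Y: |Y|^|X| with |X| = 3 * 2^5 = 96 and |Y| = 2.
n_x = 1
for v in values:
    n_x *= v
print(n_x, 2 ** n_x)  # 96 and 2^96 (about 10^29)
```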



SLIDE 42

An Unbiased Hypothesis Space

All Possible Hypotheses:

  • Why not take all possible hypotheses as a hypothesis space for LIST-THEN-ELIMINATE?

    H = {h | h is a function from X to Y}

LIST-THEN-ELIMINATE:

  • Given: data D = (x1, y1), . . . , (xn, yn).
  • What happens if we try to classify a new feature vector xn+1?

SLIDE 44

Classifying New Instances

  • For any hypothesis h ∈ H, there exists an h′ ∈ H such that

    h(x) ≠ h′(x) if x = xn+1,
    h(x) = h′(x) for any other x.

Consequence:

  • Suppose xn+1 does not occur in D.
  • Then for every h ∈ VersionSpace, there exists an alternative h′ ∈ VersionSpace that disagrees on the label of xn+1: h(xn+1) ≠ h′(xn+1).

Conclusion:

In an unbiased hypothesis space, the LIST-THEN-ELIMINATE algorithm cannot generalise at all. Bias is unavoidable!


SLIDE 47

Summary

  • Hypothesis h: a candidate description of regularity in the data.
  • Hypothesis space H: the set of hypotheses being considered.
  • Least squares linear regression:
      - A method for regression.
      - Selects the linear hypothesis that minimizes the sum of squared errors on the data.
  • The LIST-THEN-ELIMINATE algorithm:
      - A method for classification/concept learning.
      - Finds the set, VersionSpace, of hypotheses in H that are consistent with the data.
      - With H containing a list of constraints on attributes, it has a strong representation bias.
      - With H containing all possible hypotheses it cannot generalise: bias is unavoidable!