
Supervised Learning

Steven J Zeil

Old Dominion Univ.

Fall 2010


Outline

1. Learning from Examples
   • Motivation
   • Aspects of Supervised Learning
2. Classification
   • Hypotheses
   • Noise
   • Multiple Classes
3. Regression
4. Model Selection & Generalization
5. Recap


Motivation

We have a collection of data, some instances of which are known to be examples of an interesting class. We wish to learn a rule by which we can determine membership in that class.


Example: Spam Detection

Training set: a collection of email messages, some of which humans have flagged as spam.

Knowledge Extraction:

• We scan the messages for “typical” spam vocabulary: names of drugs, “mortgage”, Nigeria, . . .
• We also scan for irregularities in the formal content: obfuscated URLs, routing via open SMTP relays, mismatches between “From:” domain and IP address of origin, etc.

Input Representation: x, where

• x_i, 0 ≤ i < k: count of the number of occurrences of spam term w_i
• x_i, k ≤ i < n: 1 if irregularity z_(i−k) is present, 0 otherwise

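To make this representation concrete, here is a minimal Python sketch; the term list, the irregularity checks, and the message handling are invented placeholders, not part of the slides:

```python
# Sketch of the input representation above. The specific spam terms and
# irregularity tests are hypothetical; the slides do not fix these choices.
SPAM_TERMS = ["viagra", "mortgage", "nigeria"]       # w_0 .. w_{k-1}
IRREGULARITY_CHECKS = [
    lambda msg: "http://" in msg and "@" in msg,     # z_0: crude obfuscated-URL test
    lambda msg: msg.count("Received:") > 5,          # z_1: suspiciously long relay chain
]

def features(message: str) -> list[float]:
    """Build x: term counts for 0 <= i < k, then 0/1 irregularity flags."""
    text = message.lower()
    counts = [float(text.count(term)) for term in SPAM_TERMS]
    flags = [1.0 if check(message) else 0.0 for check in IRREGULARITY_CHECKS]
    return counts + flags
```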


Example: “Family” Cars

Is car x a “family car”?

• Knowledge Extraction: what features define a “family” car?
• Training set: positive (+) and negative (−) examples
• Input representation: x_1: price, x_2: engine power


Aspects of Supervised Learning

We have a sample X = {x^t, r^t}, t = 1, . . . , N, that is independent and identically distributed (i.i.d.):

• ordering is not important
• all instances are drawn from the same joint distribution p(x, r)

A model g(x|θ), where θ are the parameters.

A procedure to find the “best” parameter values θ*.


Training Set X: Family Cars

X = {x^t, r^t}

• Input: x = (x_1, x_2)
• Label: r = 1 if x is a family car, 0 otherwise
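In code, such a training set might be stored as (input, label) pairs; the numbers below are invented for illustration:

```python
# Toy training set X = {(x^t, r^t)}: x = (price, engine power),
# r = 1 for a family car, 0 otherwise. All values are made up.
X = [
    ((18_000.0, 110.0), 1),   # positive example
    ((21_000.0, 130.0), 1),
    ((55_000.0, 300.0), 0),   # negative: too expensive / too powerful
    (( 9_000.0,  60.0), 0),   # negative: too cheap / too weak
]
```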


Hypothesis Class

We have a hypothesis class H parameterized by (p_1, p_2, e_1, e_2):

(p_1 ≤ price ≤ p_2) ∧ (e_1 ≤ engine power ≤ e_2)

We want to “learn” a hypothesis h ∈ H to approximate the true class C.
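A hypothesis in this class is just an axis-aligned rectangle test; a minimal sketch, with parameter names following the slide:

```python
def h(x, p1, p2, e1, e2):
    """Rectangle hypothesis: return 1 if price and engine power both
    fall within [p1, p2] x [e1, e2], else 0."""
    price, power = x
    return 1 if (p1 <= price <= p2) and (e1 <= power <= e2) else 0
```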


Empirical Error

Given a hypothesis h that returns 1 for a positive prediction and 0 for a negative prediction, the empirical error of the hypothesis is

E(h|X) = Σ_{t=1}^{N} 1(h(x^t) ≠ r^t)

where 1(·) is 1 if its argument is true and 0 otherwise.
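Computing E(h|X) amounts to counting misclassified training examples; a sketch, reusing the toy X and the rectangle h from the earlier sketches:

```python
def empirical_error(h, X, **params):
    """E(h|X) = number of t with h(x^t) != r^t."""
    return sum(1 for x, r in X if h(x, **params) != r)

# e.g. empirical_error(h, X, p1=15_000, p2=25_000, e1=90, e2=150)
```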


Version Space

In this case, there are infinitely many hypotheses (the version space) for which E(h|X) = 0.

Suppose that the blue area is the “true” class. If we choose the yellow region as our h, the pure yellow areas represent false positives and the pure blue areas false negatives.


Doubt

The most specific hypothesis S is the tightest rectangle enclosing the positive examples.

prone to false negatives

The most general hypothesis G is the largest rectangle enclosing the positive examples but containing no negative examples.

prone to false positives

Perhaps we should choose something in between? Any h with S ⊆ h ⊆ G is consistent with the training set; G − S is the region of doubt.
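For the rectangle class, S can be computed directly as the coordinate-wise min/max over the positive examples; a sketch, assuming the (price, power) training pairs used earlier:

```python
def most_specific(X):
    """Tightest axis-aligned rectangle enclosing the positive examples."""
    pos = [x for x, r in X if r == 1]
    prices = [p for p, _ in pos]
    powers = [e for _, e in pos]
    return min(prices), max(prices), min(powers), max(powers)  # p1, p2, e1, e2
```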


Model Complexity

What if not every point can be accounted for by H? Do we:

• use a more complicated hypothesis class, or
• stay with the simpler one and tolerate a non-zero error?

The answer depends partly on whether we believe this is noise or a failure of modeling.


Coping with Noise

Arguments in favor of staying with the simpler model:

• easier/faster to work with
• easier to train (fewer examples needed)
• easier to explain
• may generalize better (Occam’s razor)


Learning Multiple Classes

Training set is X = {x^t, r^t}, t = 1, . . . , N, where

r^t_i = 1 if x^t ∈ C_i, 0 if x^t ∈ C_j, j ≠ i

Notice that r^t partitions the input space (even if the {C_i} do not).


Multiple Class Hypotheses

Train hypotheses h_i(x), i = 1, . . . , k:

h_i(x^t) = 1 if x^t ∈ C_i, 0 if x^t ∈ C_j, j ≠ i

We train k hypotheses to recognize k classes,

each class against the world.
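This is the usual one-vs-rest scheme. A schematic sketch; the train_binary routine is an assumed placeholder, since the slides do not specify how each h_i is trained:

```python
def train_one_vs_rest(X, k, train_binary):
    """Train k hypotheses h_i; h_i treats class i as positive and all
    other classes as negative ('each class against the world')."""
    hypotheses = []
    for i in range(k):
        relabeled = [(x, 1 if c == i else 0) for x, c in X]  # r^t_i
        hypotheses.append(train_binary(relabeled))
    return hypotheses
```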


Regression

g(x) = w_1 x + w_0  or  g(x) = w_2 x² + w_1 x + w_0

X = {x^t, r^t}, with r^t ∈ ℝ and r^t = f(x^t) + ε

E(g|X) = (1/N) Σ_{t=1}^{N} (r^t − g(x^t))²
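Minimizing this squared error for the linear and quadratic models is ordinary least squares; a minimal numpy sketch on synthetic data (the data and noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
r = 2.0 * x + 0.5 + rng.normal(0, 0.1, size=x.size)  # r^t = f(x^t) + noise

w_lin = np.polyfit(x, r, deg=1)     # [w1, w0] for g(x) = w1*x + w0
w_quad = np.polyfit(x, r, deg=2)    # [w2, w1, w0]

def E(w, x, r):
    """Empirical squared error (1/N) * sum (r^t - g(x^t))^2."""
    return np.mean((r - np.polyval(w, x)) ** 2)

# Compare E(w_lin, x, r) with E(w_quad, x, r) to see the effect of
# model complexity on the training error.
```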

Regression

g(x) = w_1 x + w_0 or g(x) = w_2 x² + w_1 x + w_0?

• Learning is an ill-posed problem; the data alone are usually not sufficient to find a unique solution.
• Assumptions about H form our inductive bias.
• Generalization: how well does a model perform on new data?
• Overfitting: H is more complex than C (or f)
• Underfitting: H is less complex than C (or f)


Triple Trade-Off

There is a trade-off between three factors:

1. the complexity of H, c(H)
2. the training set size, N
3. the generalization error on new data, E

As N increases, E decreases. As c(H) increases, E initially decreases and then increases.


Recap: Aspects of Supervised Learning

We have a sample X = {x^t, r^t}, t = 1, . . . , N, that is independent and identically distributed (i.i.d.):

• ordering is not important
• all instances are drawn from the same joint distribution p(x, r)

A model g(x|θ), where θ are the parameters.

A loss function L(·) computing the difference between the desired output r^t and our approximation to it, g(x^t|θ).

Approximation error (loss):

E(θ|X) = Σ_t L(r^t, g(x^t|θ))

An optimization procedure to find θ*:

θ* = arg min_θ E(θ|X)
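Putting the recap together in code: choose a model g, a loss L, and minimize E(θ|X) over θ. A sketch for a linear model with squared-error loss, using scipy’s general-purpose minimizer (one of many possible optimization procedures; the toy data are invented):

```python
import numpy as np
from scipy.optimize import minimize

def g(x, theta):
    """Linear model g(x|theta) with theta = (w1, w0)."""
    w1, w0 = theta
    return w1 * x + w0

def E(theta, X):
    """E(theta|X) = sum_t L(r^t, g(x^t|theta)) with squared-error loss."""
    return sum((r - g(x, theta)) ** 2 for x, r in X)

X = [(0.0, 0.4), (0.5, 1.6), (1.0, 2.4)]          # toy data, invented
theta_star = minimize(E, x0=np.zeros(2), args=(X,)).x  # theta* = arg min E
```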