ECE 4524 Artificial Intelligence and Engineering Applications



SLIDE 1

ECE 4524 Artificial Intelligence and Engineering Applications

Lecture 23: Learning Theory
Reading: AIAMA 18.4-18.5

Today's Schedule:

◮ Evaluating Hypotheses/Models
◮ PAC Learning and Sample Complexity

SLIDE 2

Assumptions about Training and Testing Sets

Critical assumptions of supervised learning are:

◮ the true f does not change; it is stationary
◮ the samples from f are independent and identically distributed (IID)

SLIDE 3

Error Rate

We define the error rate as the proportion of mistakes made by h over a set of N examples:

$$\text{Error Rate} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}[y_i \neq h(x_i)]$$

where $\mathbf{1}[\cdot]$ is the indicator function.

◮ When this error rate is zero over the training set, h is said to be consistent.
◮ It is always possible to find a hypothesis space H complex enough so that some h ∈ H is consistent.
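
A minimal sketch of these definitions in Python, assuming h is a plain callable and the labeled examples are parallel lists (the names here are illustrative, not from the lecture):

    def error_rate(h, xs, ys):
        """Proportion of the N examples on which h makes a mistake."""
        mistakes = sum(1 for x, y in zip(xs, ys) if h(x) != y)
        return mistakes / len(xs)

    def is_consistent(h, xs, ys):
        """h is consistent when its error rate over the training set is zero."""
        return error_rate(h, xs, ys) == 0.0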

SLIDE 4

Test Error Rate

Thus we are more concerned with the test error rate.

◮ A low test error indicates that h generalizes well.
◮ Often a consistent hypothesis has worse generalization than a less-complex one.
◮ This trade-off between the complexity of H and the test performance is the core of supervised machine learning.

SLIDE 5

Cross-Validation

◮ So, the test error is the final word on the performance of h, but recall that we can use the test set only once; otherwise we are said to be peeking.
◮ However, if we use the entire training set for training, we will likely over-train.
◮ The answer is to use cross-validation to estimate the generalization performance of h. We partition the training set into a training set and a validation set.
◮ holdout cross-validation - reserve a percentage (typically 1/3) from D for validation.
◮ k-fold cross-validation - generate k independent subsets of D, giving k estimates of generalization performance (sketched below).
◮ when k = N this is called leave-one-out cross-validation.
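
A minimal k-fold sketch, assuming a learner train(xs, ys) that returns a fitted hypothesis as a callable; the learner's name and signature are hypothetical stand-ins:

    def k_fold_cv(train, xs, ys, k):
        """Average validation error rate of the learner over k folds of D."""
        n = len(xs)
        errors = []
        for i in range(k):
            lo, hi = i * n // k, (i + 1) * n // k
            # hold out fold i for validation, train on the remainder
            h = train(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:])
            mistakes = sum(1 for x, y in zip(xs[lo:hi], ys[lo:hi]) if h(x) != y)
            errors.append(mistakes / (hi - lo))
        return sum(errors) / k  # k = N gives leave-one-out cross-validation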

SLIDE 6
SLIDE 7

Selecting Hypothesis Complexity

So, to select an optimal h we need a learning algorithm, a way to optimize the parameters over a given set H (a sketch of this loop follows the list):

◮ Define the size of H as some parameter which adjusts the complexity of H.
◮ For increasing values of size, use cross-validation and the learning algorithm to give an estimate of the training and validation error.
◮ Stop when h is consistent or the training error has converged.
◮ Search backwards to find the size with the smallest validation error.
◮ Finally, train h at the optimal size using the full training set.
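
A minimal sketch of this procedure, simplified to scan every size and keep the one with the lowest validation error rather than the stop-then-search-backwards order above; train(size, xs, ys) is a hypothetical learner that fits an h of the given complexity, and k_fold_cv is the sketch from the cross-validation slide:

    def select_hypothesis(train, xs, ys, max_size, k=5):
        """Pick the complexity with the best k-fold validation error,
        then retrain at that size on the full training set."""
        best_size, best_err = None, float("inf")
        for size in range(1, max_size + 1):
            err = k_fold_cv(lambda a, b: train(size, a, b), xs, ys, k)
            if err < best_err:
                best_size, best_err = size, err
        return train(best_size, xs, ys)  # final h uses all training data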

SLIDE 8
SLIDE 9

Loss Functions

Minimizing the error rate assumes that all errors count equally toward the success of the agent. From our discussion of Utility we know this is not true.

◮ In ML it is traditional to work with a cost rather than a utility, via a loss function:

$$L(x, y, \hat{y}) = U(\text{result of } y \text{ given } x) - U(\text{result of } \hat{y} \text{ given } x)$$

where $y = f(x)$ and $\hat{y} = h(x)$.

◮ We often assume the loss does not depend on x, so we just have $L(y, \hat{y})$.
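
For concreteness, two standard choices for $L(y, \hat{y})$ from AIAMA 18.4, sketched in Python:

    def l01(y, y_hat):
        """0/1 loss: every mistake costs 1, regardless of how wrong."""
        return 0 if y == y_hat else 1

    def l1(y, y_hat):
        """Absolute-value loss for numeric outputs: bigger misses cost more."""
        return abs(y - y_hat)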

SLIDE 10

Empirical Loss

◮ We would like to minimize the expected loss over the validation set:

$$\sum_{i=1}^{N} L(y_i, h(x_i))\, P(x_i, y_i)$$

however, we don't know the joint probability $P(x, y)$.

◮ Instead we assume a uniform distribution and optimize the empirical loss:

$$\frac{1}{N} \sum_{i=1}^{N} L(y_i, h(x_i))$$
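
A minimal sketch of the empirical loss, assuming a loss function such as l01 above:

    def empirical_loss(L, h, xs, ys):
        """Average loss of h over the N examples, each weighted 1/N."""
        return sum(L(y, h(x)) for x, y in zip(xs, ys)) / len(xs)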

SLIDE 11

Probably Approximately Correct Learning

For Boolean functions (binary classifiers), define the generalization error as the expected 0/1 loss:

$$\text{error}(h) = \sum_{x,y} L_{0/1}(y, h(x))\, P(x, y)$$

A hypothesis h is approximately correct when $\text{error}(h) \leq \epsilon$. For h to be probably (with probability at least $1 - \delta$) approximately correct, the number of training examples must satisfy:

$$N \geq \frac{1}{\epsilon} \left( \ln \frac{1}{\delta} + \ln |H| \right)$$
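
A worked instance of the bound, with $\epsilon$, $\delta$, and $|H|$ chosen arbitrarily for illustration:

    import math

    def pac_sample_size(eps, delta, h_size):
        """Smallest integer N with N >= (1/eps) * (ln(1/delta) + ln|H|)."""
        return math.ceil((math.log(1 / delta) + math.log(h_size)) / eps)

    # e.g. eps = 0.1, delta = 0.05, |H| = 2**10 hypotheses:
    # pac_sample_size(0.1, 0.05, 2**10) == 100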

SLIDE 12

Next Actions

◮ Reading on Linear Models (AIAMA 18.6)
◮ No warmup.

Reminders:

◮ Quiz 3 is this Thursday (4/12).