Read Chapter 7 of Machine Learning [Suggested exercises: 7.1, 7.2, 7.5, 7.7] - PowerPoint PPT Presentation

SLIDE 1

Read Chapter 7 of Machine Learning

[Suggested exercises: 7.1, 7.2, 7.5, 7.7]

SLIDE 2

Function Approximation

Given:

  • Instance space X:
  • e.g., X is the set of boolean vectors of length n; x = <0,1,1,0,0,1>
  • Hypothesis space H: set of functions h: X → Y
  • e.g., H is the set of boolean functions (Y = {0,1}) defined by conjunctions of constraints on the features of x.
  • Training examples D: sequence of positive and negative examples of an unknown target function c: X → {0,1}
  • <x1, c(x1)>, … <xm, c(xm)>

Determine:

  • A hypothesis h in H such that h(x) = c(x) for all x in X

SLIDE 3

Function Approximation

Given:

  • Instance space X:
  • e.g., X is the set of boolean vectors of length n; x = <0,1,1,0,0,1>
  • Hypothesis space H: set of functions h: X → Y
  • e.g., H is the set of boolean functions (Y = {0,1}) defined by conjunctions of constraints on the features of x.
  • Training examples D: sequence of positive and negative examples of an unknown target function c: X → {0,1}
  • <x1, c(x1)>, … <xm, c(xm)>

Determine:

  • A hypothesis h in H such that h(x) = c(x) for all x in X (what we want)
  • A hypothesis h in H such that h(x) = c(x) for all x in D (what we can observe)
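The conjunction-of-constraints setting above can be made concrete with a small sketch. This is an illustrative Find-S-style learner, not code from the slides: a hypothesis keeps, per feature, a required value or None for "don't care", and each positive example generalizes it minimally.

```python
# Illustrative Find-S-style learner for conjunctions of constraints on
# boolean features. A hypothesis is a list with one entry per feature:
# 0 or 1 (required value) or None ("don't care"). Sketch only.

def find_s(examples):
    """examples: list of (x, label), x a tuple of 0/1, label in {0, 1}.
    Returns the most specific conjunction consistent with the positives."""
    h = None  # "no hypothesis yet": rejects every instance
    for x, label in examples:
        if label == 1:
            if h is None:
                h = list(x)  # most specific hypothesis covering x
            else:
                # Generalize minimally: drop constraints x violates.
                h = [hi if hi == xi else None for hi, xi in zip(h, x)]
    return h

def predict(h, x):
    if h is None:
        return 0
    return int(all(hi is None or hi == xi for hi, xi in zip(h, x)))

examples = [((0, 1, 1, 0, 0, 1), 1),
            ((0, 1, 0, 0, 0, 1), 1),
            ((1, 0, 0, 1, 1, 0), 0)]
h = find_s(examples)   # -> [0, 1, None, 0, 0, 1]
```

Note that Find-S only ever consults the positive examples; a consistent hypothesis then classifies the negatives correctly as well whenever the target is itself a conjunction in H.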
SLIDE 4
SLIDE 5
SLIDE 6

Instances, Hypotheses, and More-General-Than

SLIDE 7

i.e., minimizes the number of queries needed to converge to the correct hypothesis.

SLIDE 8
SLIDE 9
SLIDE 10
SLIDE 11
SLIDE 12

Set of training examples D: instances drawn at random from probability distribution P(x)

SLIDE 13

Set of training examples D: instances drawn at random from probability distribution P(x). Can we bound error_true(h) in terms of error_train(h) over D?
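A minimal sketch of what true error over P(x) means. For illustration only, P(x) is taken to be uniform over boolean vectors, and c and h below are hypothetical stand-ins, not anything from the slides:

```python
import random

# Monte Carlo view of error_true(h) = Pr_{x ~ P}[h(x) != c(x)].
# P(x) is assumed uniform over boolean vectors; c and h are
# hypothetical stand-ins, purely for illustration.

def c(x):   # unknown target concept: x0 AND x1
    return int(x[0] == 1 and x[1] == 1)

def h(x):   # a learned hypothesis: just x0
    return x[0]

def true_error(h, c, n_features, n_samples=100_000, seed=0):
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_samples):
        x = tuple(rng.randint(0, 1) for _ in range(n_features))
        errors += h(x) != c(x)
    return errors / n_samples

err = true_error(h, c, n_features=6)   # close to 0.25 (h errs iff x0=1, x1=0)
```

In practice the learner never sees error_true directly; it only sees error_train on the finite sample D, which is why the bounds on the following slides matter.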

SLIDE 14
SLIDE 15

The version space VS_{H,D} is ε-exhausted when every hypothesis h it contains has true error less than ε.

SLIDE 16

Any(!) learner that outputs a hypothesis consistent with all training examples (i.e., an h contained in VS_{H,D})

SLIDE 17

What it means

[Haussler, 1988]: the probability that the version space is not ε-exhausted after m training examples is at most |H| e^(−εm)

  • 1. How many training examples suffice? Suppose we want this probability to be at most δ:

    |H| e^(−εm) ≤ δ, which holds when m ≥ (1/ε)(ln |H| + ln(1/δ))

  • 2. If m ≥ (1/ε)(ln |H| + ln(1/δ)), then with probability at least (1−δ), every consistent h in VS_{H,D} has true error at most ε.
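The bound turns directly into a sample-size calculator. A small sketch, using Mitchell's |H| = 3^n count for conjunctions over n boolean features as the example hypothesis space (the function name is mine):

```python
from math import ceil, log

# Sample complexity from the [Haussler, 1988] bound:
# m >= (1/eps) * (ln|H| + ln(1/delta)) examples suffice so that, with
# probability at least 1 - delta, every consistent h has true error <= eps.

def sample_complexity(h_size, eps, delta):
    return ceil((log(h_size) + log(1.0 / delta)) / eps)

# Example: conjunctions over n = 10 boolean features, |H| = 3**10
# (each feature is required 0, required 1, or unconstrained),
# target error eps = 0.1 with confidence delta = 0.05.
m = sample_complexity(3**10, eps=0.1, delta=0.05)   # -> 140
```

Note the bound grows only logarithmically in |H| and 1/δ, but linearly in 1/ε.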
SLIDE 18
SLIDE 19
SLIDE 20

Sufficient condition: holds if L requires only a polynomial number of training examples, and the processing time per example is polynomial.

SLIDE 21

(Plot: true error vs. training error; the gap between them is the degree of overfitting.)

SLIDE 22

Additive Hoeffding Bounds – Agnostic Learning

  • Given m independent flips of a coin with Pr(heads) = θ, bound the error in the estimate θ̂ (the observed fraction of heads):

    Pr[θ̂ > θ + ε] ≤ e^(−2mε²)

  • Relevance to agnostic learning: for any single hypothesis h,

    Pr[error_true(h) > error_train(h) + ε] ≤ e^(−2mε²)

  • But we must consider all hypotheses in H:

    Pr[(∃h ∈ H) error_true(h) > error_train(h) + ε] ≤ |H| e^(−2mε²)

  • So, with probability at least (1−δ), every h satisfies

    error_true(h) ≤ error_train(h) + √((ln |H| + ln(1/δ)) / (2m))
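A sketch of the two quantities in the agnostic bound, with an illustrative |H| = 3^10 and m = 1000 (the function names are mine, not from the slides):

```python
from math import exp, log, sqrt

# The single-hypothesis Hoeffding bound and the uniform (all-of-H)
# generalization gap. Illustrative helper names, not from the slides.

def one_sided_bound(m, eps):
    # Pr[theta_hat > theta + eps] <= exp(-2 m eps^2)
    return exp(-2 * m * eps**2)

def generalization_gap(h_size, m, delta):
    # With probability >= 1 - delta, every h in H satisfies
    # error_true(h) <= error_train(h) + this gap.
    return sqrt((log(h_size) + log(1.0 / delta)) / (2 * m))

p = one_sided_bound(m=100, eps=0.1)                   # exp(-2), about 0.135
gap = generalization_gap(3**10, m=1000, delta=0.05)   # about 0.084
```

Unlike the consistent-learner case, this bound does not require training error zero; it only bounds how far the true error can exceed the training error.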
SLIDE 23

General Hoeffding Bounds

  • When estimating a parameter θ ∈ [a,b] from m examples:

    Pr[|θ̂ − θ| > ε] ≤ 2 e^(−2mε² / (b−a)²)

  • When estimating a probability, θ ∈ [0,1], so:

    Pr[|θ̂ − θ| > ε] ≤ 2 e^(−2mε²)

  • And if we're interested in only one-sided error, then:

    Pr[θ̂ − θ > ε] ≤ e^(−2mε²)
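These bounds are easy to check empirically. A sketch that simulates biased coin flips and compares the observed deviation frequency against the two-sided bound for θ ∈ [0,1] (parameter choices are illustrative):

```python
import random
from math import exp

# Empirical check of the two-sided Hoeffding bound for theta in [0, 1]:
#   Pr[|theta_hat - theta| > eps] <= 2 * exp(-2 m eps^2)

def deviation_frequency(theta, m, eps, trials=20_000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # theta_hat: fraction of heads in m flips of a theta-biased coin
        theta_hat = sum(rng.random() < theta for _ in range(m)) / m
        hits += abs(theta_hat - theta) > eps
    return hits / trials

theta, m, eps = 0.3, 50, 0.15
observed = deviation_frequency(theta, m, eps)
bound = 2 * exp(-2 * m * eps**2)   # about 0.211
assert observed <= bound           # Hoeffding holds (loosely) in simulation
```

The observed frequency typically falls well below the bound; Hoeffding is worst-case over all θ and makes no use of the variance of the particular coin.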