
CS485/685 Lecture 16: March 1, 2012

Agnostic Learning [BDSS] Chapters 2, 3

CS485/685 (c) 2012 P. Poupart 1

Agnostic PAC Learning

  • Definition: A learner that doesn’t assume that H contains an error-free hypothesis and that simply finds the hypothesis with minimum training error is often called an agnostic learner.
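As a sketch of this idea (the dataset and threshold class below are my own toy example, not from the slides), an agnostic learner over a finite class just returns the hypothesis with smallest training error, even when no hypothesis is error-free:

```python
# Toy ERM (empirical risk minimization) over a finite hypothesis class.
# The class contains threshold functions h_t(x) = 1 if x >= t else 0.

def empirical_error(h, data):
    """Fraction of examples (x, y) that h misclassifies."""
    return sum(1 for x, y in data if h(x) != y) / len(data)

def erm(hypotheses, data):
    """Return the hypothesis with minimum training error (agnostic learner)."""
    return min(hypotheses, key=lambda h: empirical_error(h, data))

# Noisy data: no threshold classifies it perfectly (the agnostic setting).
data = [(0.1, 0), (0.2, 0), (0.4, 1), (0.5, 0), (0.7, 1), (0.9, 1)]
thresholds = [0.0, 0.3, 0.6, 1.0]
hypotheses = [lambda x, t=t: 1 if x >= t else 0 for t in thresholds]

best = erm(hypotheses, data)
print(empirical_error(best, data))  # best achievable training error: 1/6
```

Note that the returned hypothesis still misclassifies one point; the agnostic learner only promises to be competitive with the best hypothesis in the class, not perfect.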


Agnostic PAC Learnability

  • A hypothesis class H is agnostically PAC learnable if for any ε > 0, δ ∈ (0,1), there exists an N(ε, δ) ∈ ℕ and a learning algorithm such that, for any distribution D and any m ≥ N(ε, δ) i.i.d. samples, it returns h ∈ H such that with probability at least 1 − δ,

L_D(h) ≤ min_{h′ ∈ H} L_D(h′) + ε

where L_D(h) denotes the true (expected) error of h under D.

ε‐representative

  • Definition: A training set S is called ε‐representative if

∀h ∈ H, |L_S(h) − L_D(h)| ≤ ε

where L_S(h) is the training error of h on S.

  • Lemma: Assume that a training set S is ε/2‐representative. Then any output h_S of an empirical risk minimizing algorithm satisfies

L_D(h_S) ≤ min_{h ∈ H} L_D(h) + ε

  • Proof: For every h ∈ H,

L_D(h_S) ≤ L_S(h_S) + ε/2 ≤ L_S(h) + ε/2 ≤ L_D(h) + ε

where the first and last steps use ε/2‐representativeness and the middle step uses that h_S minimizes L_S.
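The lemma can be sanity-checked numerically (a toy setup of my own: exact loss tables standing in for L_S and L_D over three hypotheses): compute the largest deviation between training and true error, then verify the ERM output satisfies the guarantee.

```python
# Sanity check of the lemma: if S is (eps/2)-representative, then the ERM
# output h_S satisfies L_D(h_S) <= min_h L_D(h) + eps.

# true_loss[h] = L_D(h); emp_loss[h] = L_S(h) on some training set S.
true_loss = {"h1": 0.30, "h2": 0.25, "h3": 0.40}
emp_loss  = {"h1": 0.20, "h2": 0.28, "h3": 0.35}

# Largest deviation between training and true error, so S is
# half_eps-representative and the lemma applies with eps = 2 * half_eps.
half_eps = max(abs(emp_loss[h] - true_loss[h]) for h in true_loss)

# ERM picks the hypothesis with minimum *training* error.
h_S = min(emp_loss, key=emp_loss.get)

# The lemma's guarantee.
assert true_loss[h_S] <= min(true_loss.values()) + 2 * half_eps
print(h_S, true_loss[h_S], 2 * half_eps)
```

Here ERM picks h1 (smallest training error 0.20) even though h2 has the smallest true error; the lemma only bounds how much worse h1 can be.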


Uniform Convergence

  • Definition: A hypothesis class H has the uniform convergence property if there exists a function N^UC : (0,1)² → ℕ such that, for every ε, δ ∈ (0,1) and every probability distribution D, if S is a sample of m ≥ N^UC(ε, δ) examples drawn i.i.d. according to D, then with probability at least 1 − δ, S is ε‐representative.

Uniform Convergence

  • Corollary 2: If a class H has the uniform convergence property with a function N^UC, then the class is agnostically PAC learnable with sample complexity N(ε, δ) ≤ N^UC(ε/2, δ). Furthermore, an empirical risk minimization algorithm is a successful agnostic PAC learner for H.


Uniform Convergence

  • To show that uniform convergence holds, show that:

1. |L_S(h) − L_D(h)| is likely to be small for any fixed hypothesis h (chosen before seeing the data)

2. Think of L_S(h) as a random variable with mean L_D(h). Then the distribution of L_S(h) is concentrated around its mean for all h ∈ H.

Measure Concentration

  • Let θ_1, …, θ_m be i.i.d. random variables with mean μ. Then as m → ∞,

(1/m) Σ_{i=1}^m θ_i → μ (law of large numbers)

  • Use measure concentration inequalities to quantify the deviation of (1/m) Σ_i θ_i from μ for finite m.
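A quick simulation of this convergence (my own sketch, using Bernoulli(0.5) variables, so μ = 0.5):

```python
import random

random.seed(0)

def sample_mean(m, p=0.5):
    """Mean of m i.i.d. Bernoulli(p) draws."""
    return sum(random.random() < p for _ in range(m)) / m

# The empirical mean approaches mu = 0.5 as m grows (law of large numbers).
for m in [10, 1000, 100000]:
    print(m, abs(sample_mean(m) - 0.5))
```

The deviations printed shrink with m; the inequalities on the next slides make this rate quantitative.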


Markov’s Inequality

  • Markov’s inequality: for a non-negative random variable Z,

∀a > 0, Pr[Z ≥ a] ≤ E[Z] / a

  • Derivation:

E[Z] = ∫_0^∞ z p(z) dz
≥ ∫_a^∞ z p(z) dz
≥ a ∫_a^∞ p(z) dz
= a Pr[Z ≥ a]

Hence Pr[Z ≥ a] ≤ E[Z] / a.
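Markov’s inequality holds for any non-negative random variable, including the empirical distribution of a finite sample, so it can be checked exactly (a sketch with sample values of my own choosing):

```python
# Empirical check of Markov's inequality on a non-negative sample:
# the fraction of values >= a never exceeds (sample mean) / a.
values = [0.2, 1.5, 3.0, 0.1, 4.2, 0.7, 2.3, 0.05]
mean = sum(values) / len(values)

for a in [0.5, 1.0, 2.0, 4.0]:
    frac = sum(1 for z in values if z >= a) / len(values)
    assert frac <= mean / a  # Markov: Pr[Z >= a] <= E[Z] / a
    print(a, frac, mean / a)
```

The bound is loose for small a (it can exceed 1) and only bites for a well above the mean.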

Chebyshev’s Inequality

  • Bound deviation from the mean on both sides (apply Markov’s inequality to the non-negative variable (Z − E[Z])²):

Pr[|Z − E[Z]| ≥ a] = Pr[(Z − E[Z])² ≥ a²] ≤ Var[Z] / a²

  • Since Var[(1/m) Σ_i θ_i] = Var[θ] / m for i.i.d. θ_i’s, then

Pr[|(1/m) Σ_i θ_i − μ| ≥ a] ≤ Var[θ] / (m a²)
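Since the derivation is just Markov applied to (Z − E[Z])², Chebyshev’s inequality also holds exactly on an empirical distribution (again a sketch with sample values of my own choosing):

```python
# Empirical check of Chebyshev's inequality: the fraction of values
# deviating from the mean by at least a never exceeds Var / a^2.
values = [0.1, 0.9, 0.4, 0.6, 0.2, 0.8, 0.5, 0.5]
mean = sum(values) / len(values)
var = sum((z - mean) ** 2 for z in values) / len(values)

for a in [0.1, 0.2, 0.3, 0.4]:
    frac = sum(1 for z in values if abs(z - mean) >= a) / len(values)
    assert frac <= var / a ** 2  # Chebyshev: Pr[|Z - E[Z]| >= a] <= Var[Z] / a^2
    print(a, frac, var / a ** 2)
```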


Chebyshev’s Inequality

  • Lemma: Let θ_1, …, θ_m be i.i.d. with E[θ_i] = μ ∀i and 0 ≤ θ_i ≤ 1 ∀i. Then for any δ ∈ (0,1), with probability at least 1 − δ, we have

|(1/m) Σ_i θ_i − μ| ≤ √(1 / (δm))

  • Proof: Let δ = Pr[|(1/m) Σ_i θ_i − μ| ≥ a].
  • Then δ ≤ Var[θ] / (m a²) by Chebyshev’s inequality.
  • Hence a ≤ √(Var[θ] / (δm)),
  • and since θ_i ∈ [0,1] implies Var[θ] ≤ 1, with probability at least 1 − δ the deviation is at most √(1/(δm)).
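The lemma translates directly into a (loose) confidence interval: with probability 1 − δ the empirical mean of m values in [0,1] is within √(1/(δm)) of μ. A minimal sketch:

```python
import math

def chebyshev_width(delta, m):
    """Half-width of the (1 - delta)-confidence interval from the lemma:
    |mean of m i.i.d. [0,1] variables - mu| <= sqrt(1 / (delta * m))."""
    return math.sqrt(1 / (delta * m))

# With delta = 0.1 and m = 1000 samples, the deviation is at most 0.1
# with probability at least 0.9.
print(chebyshev_width(0.1, 1000))   # 0.1
# The width shrinks like 1/sqrt(m): four times the data halves it.
print(chebyshev_width(0.1, 4000))   # 0.05
```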

Hoeffding’s Inequality

  • Tighter bound than Chebyshev’s inequality
  • Let θ_1, …, θ_m be i.i.d. variables with mean μ
  • Assume that Pr[a ≤ θ_i ≤ b] = 1
  • Then Pr[|(1/m) Σ_i θ_i − μ| > ε] ≤ 2 exp(−2mε² / (b − a)²)
  • Hence for θ_i ∈ [0,1]: Pr[|(1/m) Σ_i θ_i − μ| > ε] ≤ 2 exp(−2mε²)
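Comparing the two tail bounds numerically makes the difference concrete (a sketch; for θ_i ∈ [0,1] we may use Var[θ] ≤ 1/4 in the Chebyshev bound):

```python
import math

def chebyshev_bound(m, eps, var=0.25):
    """Pr[|mean - mu| >= eps] <= Var / (m eps^2); Var <= 1/4 for [0,1] variables."""
    return var / (m * eps ** 2)

def hoeffding_bound(m, eps):
    """Pr[|mean - mu| > eps] <= 2 exp(-2 m eps^2) for [0,1] variables."""
    return 2 * math.exp(-2 * m * eps ** 2)

# Hoeffding decays exponentially in m, Chebyshev only polynomially,
# so for large m Hoeffding is far tighter.
for m in [100, 1000, 10000]:
    print(m, chebyshev_bound(m, 0.05), hoeffding_bound(m, 0.05))
```

At m = 100 the two are comparable, but by m = 10000 Hoeffding gives roughly 10⁻²² versus Chebyshev’s 0.01.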


Agnostic PAC Learnability

  • Theorem: Let H be finite, δ ∈ (0,1), ε > 0 and m ≥ 2 log(2|H|/δ) / ε². Then with probability at least 1 − δ, we have

L_D(h_S) ≤ min_{h ∈ H} L_D(h) + ε
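The theorem’s sample size can be computed directly (a minimal sketch of the bound m ≥ 2 log(2|H|/δ)/ε²; the parameter values are my own examples):

```python
import math

def sample_complexity(eps, delta, h_size):
    """Smallest integer m with m >= 2 * log(2|H|/delta) / eps^2."""
    return math.ceil(2 * math.log(2 * h_size / delta) / eps ** 2)

# e.g. |H| = 1000 hypotheses, accuracy eps = 0.1, confidence delta = 0.05:
print(sample_complexity(0.1, 0.05, 1000))
# Sample size grows only logarithmically with |H| ...
print(sample_complexity(0.1, 0.05, 10 ** 6))
# ... but quadratically as the accuracy requirement eps tightens.
print(sample_complexity(0.05, 0.05, 1000))
```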

Agnostic PAC Learnability

  • Proof: From Corollary 2, it suffices to show that

Pr[∃h ∈ H, |L_S(h) − L_D(h)| > ε/2] ≤ δ

  • Using the union bound and then Hoeffding’s inequality:

Pr[∃h ∈ H, |L_S(h) − L_D(h)| > ε/2]
≤ Σ_{h ∈ H} Pr[|L_S(h) − L_D(h)| > ε/2]
≤ 2|H| exp(−mε²/2)
≤ δ

since m ≥ 2 log(2|H|/δ) / ε².
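The union-bound step can be checked empirically, since for any collection of events the observed frequency of "some event occurs" never exceeds the sum of the individual frequencies. A sketch (the hypothesis error rates and parameters are my own choices):

```python
import random

random.seed(0)

# Simulate the union-bound step: one true error rate per hypothesis
# in a small finite class H.
true_errors = [0.1, 0.3, 0.5, 0.7]
m, eps, trials = 50, 0.1, 2000

def deviates(p):
    """Does the empirical error on m Bernoulli(p) samples deviate from p by more than eps?"""
    emp = sum(random.random() < p for _ in range(m)) / m
    return abs(emp - p) > eps

union_count = 0                       # trials where SOME hypothesis deviates
per_h_counts = [0] * len(true_errors) # per-hypothesis deviation counts
for _ in range(trials):
    hits = [deviates(p) for p in true_errors]
    union_count += any(hits)
    for i, hit in enumerate(hits):
        per_h_counts[i] += hit

# Union bound: Pr[exists h that deviates] <= sum_h Pr[h deviates].
# The same inequality holds exactly for the observed frequencies.
assert union_count <= sum(per_h_counts)
print(union_count / trials, sum(per_h_counts) / trials)
```

The printed union frequency sits below the summed frequencies, mirroring the first inequality in the proof; Hoeffding then bounds each summand.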