
The PAC Learning Framework

Guoqing Zheng January 20, 2015


Intro

Questions about learning:
- What can be learned efficiently? (What is hard to learn?)
- How many examples are needed?
- Is there a general model of learning?
Answer: the Probably Approximately Correct (PAC) learning framework.


Learning theory basics

Notation:
- X: input space; Y: label space, with Y = {0, 1}
- Concept c : X → Y; concept class C
- Hypothesis h; hypothesis set H (H may or may not be the same as C)
- S = (X1, ..., Xm): a sample of m i.i.d. examples drawn from an unknown but fixed distribution D
Task of learning: use S to select a hypothesis hS ∈ H that has small generalization error w.r.t. c.
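To make the notation concrete, here is a minimal Python sketch; all specifics (the uniform distribution on [0, 1]² and the particular rectangle concept, anticipating the later example) are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: X = [0, 1]^2 with D uniform; the target concept c labels a
# point 1 iff it falls inside a fixed axis-aligned rectangle (hypothetical).
TARGET_RECT = (0.2, 0.7, 0.3, 0.8)           # (x1_lo, x1_hi, x2_lo, x2_hi)

def target_concept(x, rect=TARGET_RECT):
    x1_lo, x1_hi, x2_lo, x2_hi = rect
    return int(x1_lo <= x[0] <= x1_hi and x2_lo <= x[1] <= x2_hi)

def draw_sample(m):
    """Draw S = ((X_1, c(X_1)), ..., (X_m, c(X_m))) i.i.d. from D."""
    xs = rng.uniform(0.0, 1.0, size=(m, 2))
    ys = np.array([target_concept(x) for x in xs])
    return xs, ys

xs, ys = draw_sample(100)                    # a labeled sample of m = 100 examples
```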


Generalization error and empirical error

Generalization error:
    R(h) = E_{X∼D}[ 1{h(X) ≠ c(X)} ]                                   (1)
Empirical error:
    R̂(h) = (1/m) Σ_{i=1}^m 1{h(Xi) ≠ c(Xi)}                            (2)
Relationship:
    E_{S∼D^m}[ R̂(h) ] = R(h)                                           (3)
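A small Python sketch of these two definitions; the concept, hypothesis, and distribution below are illustrative choices, not from the slides. The empirical error is the fraction of disagreements on the sample, and the generalization error is approximated by Monte Carlo (here it is also known in closed form).

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed 1-D setup: D uniform on [0, 1], target concept c(x) = 1{x >= 0.5},
# and a hypothesis h(x) = 1{x >= 0.6} that disagrees with c exactly on [0.5, 0.6).
c = lambda x: (x >= 0.5).astype(int)
h = lambda x: (x >= 0.6).astype(int)

def empirical_error(hyp, xs):
    """R_hat(hyp) = (1/m) * sum_i 1{hyp(X_i) != c(X_i)}  -- equation (2)."""
    return float(np.mean(hyp(xs) != c(xs)))

S = rng.uniform(0, 1, size=200)              # sample of m = 200 points
print("empirical error:", empirical_error(h, S))

# Monte Carlo estimate of R(h) = E[1{h(X) != c(X)}]; exact value is P([0.5, 0.6)) = 0.1.
big = rng.uniform(0, 1, size=1_000_000)
print("generalization error (approx.):", empirical_error(h, big))
```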


PAC learning

A concept class C is PAC-learnable if there exists an algorithm A such that, for all distributions D on X, for every target concept c ∈ C, and for any ǫ > 0 and δ > 0, after observing m ≥ poly(1/ǫ, 1/δ, n, size(c)) examples, A returns a hypothesis hS with
    P_{S∼D^m}[ R(hS) ≤ ǫ ] ≥ 1 − δ                                      (4)
Here R(hS) ≤ ǫ is the "approximately correct" part, and the confidence 1 − δ is the "probably" part. C is further efficiently PAC-learnable if A also runs in poly(1/ǫ, 1/δ, n, size(c)) time.


Example: Learning axis-aligned rectangles

Figure: R is the target rectangle, R’ is the constructed rectangle

Proof that the concept class is PAC-learnable. Construct RS = R' as the tightest rectangle containing the positive points. Denote by P(R) the probability that a point randomly drawn from D falls within R. Since RS ⊆ R, errors can only occur on points falling inside R. If P(R) ≤ ǫ, then R(RS) ≤ ǫ always, so P(R(RS) > ǫ) = 0 ≤ δ for any δ > 0.


Example: Learning axis-aligned rectangles (contd.)

Figure: R is the target rectangle, R’ is the constructed rectangle

Now assume P(R) > ǫ. Construct four regions r1, r2, r3, r4 along the sides of R such that P(ri) = ǫ/4 for i = 1, 2, 3, 4. If RS meets all four regions, then R(RS) ≤ ǫ; equivalently, if R(RS) > ǫ, then RS must miss at least one of the four regions.


Example: Learning axis-aligned rectangles (contd.)

Hence,
    P(R(RS) > ǫ) ≤ P( ∪_{i=1}^4 {RS ∩ ri = ∅} )                         (5)
                 ≤ Σ_{i=1}^4 P(RS ∩ ri = ∅)                             (6)
                 ≤ 4(1 − ǫ/4)^m                                         (7)
                 ≤ 4 exp(−mǫ/4)    (because 1 − x ≤ e^{−x})             (8)
Requiring 4 exp(−mǫ/4) ≤ δ is equivalent to m ≥ (4/ǫ) log(4/δ). So for any ǫ > 0 and δ > 0, when m ≥ (4/ǫ) log(4/δ), we have P(R(RS) > ǫ) ≤ δ. Also, the representation cost for a point and for a rectangle is constant. Hence, the concept class of axis-aligned rectangles is PAC-learnable.
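The argument can be checked numerically. Below is a minimal simulation sketch under assumed specifics (uniform distribution on the unit square, a particular target rectangle, Monte Carlo error estimates): with m ≥ (4/ǫ) log(4/δ) examples, the fraction of runs with R(RS) > ǫ should stay below δ.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed target rectangle R inside the unit square; D uniform on [0, 1]^2.
R = (0.2, 0.7, 0.3, 0.8)                     # (x1_lo, x1_hi, x2_lo, x2_hi)

def in_rect(points, rect):
    x1_lo, x1_hi, x2_lo, x2_hi = rect
    return ((points[:, 0] >= x1_lo) & (points[:, 0] <= x1_hi) &
            (points[:, 1] >= x2_lo) & (points[:, 1] <= x2_hi))

def learn_tightest_rectangle(m):
    """R_S = R': the tightest axis-aligned rectangle containing the positive points."""
    xs = rng.uniform(0, 1, size=(m, 2))
    pos = xs[in_rect(xs, R)]
    if len(pos) == 0:                        # no positive examples: predict all-negative
        return (1.0, 1.0, 1.0, 1.0)          # a degenerate (empty) rectangle
    return (pos[:, 0].min(), pos[:, 0].max(), pos[:, 1].min(), pos[:, 1].max())

def true_error(rect_s, n_test=100_000):
    """Monte Carlo estimate of R(R_S) = P(X falls in R but outside R_S)."""
    xs = rng.uniform(0, 1, size=(n_test, 2))
    return float(np.mean(in_rect(xs, R) & ~in_rect(xs, rect_s)))

eps, delta = 0.1, 0.05
m = int(np.ceil(4 / eps * np.log(4 / delta)))    # m >= (4/eps) log(4/delta)
fails = sum(true_error(learn_tightest_rectangle(m)) > eps for _ in range(200))
print(f"m = {m}, empirical P(R(R_S) > eps) = {fails / 200:.3f} (bound: {delta})")
```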


Generalization bounds for finite H (consistent case)

For finite H and a consistent hypothesis hS (i.e. R̂(hS) = 0), PAC learning is guaranteed if
    m ≥ (1/ǫ)( log|H| + log(1/δ) )                                      (9)
Proof:
    P( ∃h ∈ H : R̂(h) = 0 ∧ R(h) > ǫ )                                  (10)
  = P( (R̂(h1) = 0 ∧ R(h1) > ǫ)                                         (11)
       ∨ (R̂(h2) = 0 ∧ R(h2) > ǫ) ∨ ... )                               (12)
  ≤ Σ_{h∈H} P( R̂(h) = 0 ∧ R(h) > ǫ )                                   (13)
  ≤ Σ_{h∈H} P( R̂(h) = 0 | R(h) > ǫ )                                   (14)
  ≤ Σ_{h∈H} (1 − ǫ)^m = |H|(1 − ǫ)^m                                    (15)
  ≤ |H| e^{−mǫ}    (because 1 − x ≤ e^{−x})                             (16)
Setting |H| e^{−mǫ} ≤ δ and solving for m yields the bound (9).
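A small helper that evaluates the bound (9). The example hypothesis class (Boolean conjunctions over n = 10 variables, with |H| = 3^n + 1) is an illustrative count, not taken from the slides.

```python
import math

def sample_complexity_consistent(h_size, eps, delta):
    """Smallest integer m with m >= (1/eps) * (log|H| + log(1/delta))  -- bound (9)."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

# Illustrative (assumed) class: Boolean conjunctions over n = 10 variables, with
# |H| = 3^n + 1 -- each variable appears positively, negatively, or not at all,
# plus the always-false hypothesis.
n = 10
print(sample_complexity_consistent(h_size=3 ** n + 1, eps=0.1, delta=0.05))  # -> 140
```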


Hoeffding’s inequality

Markov's inequality: For X ≥ 0 and any ǫ > 0,
    P(X ≥ ǫ) ≤ ǫ^{−1} E(X)                                              (17)
because
    E(X) = ∫_0^∞ x p(x) dx ≥ ∫_ǫ^∞ x p(x) dx ≥ ∫_ǫ^∞ ǫ p(x) dx          (18)
         = ǫ ∫_ǫ^∞ p(x) dx = ǫ P(X ≥ ǫ)                                 (19)
Chernoff bounding technique: For any X and any ǫ > 0, t > 0,
    P(X ≥ ǫ) = P(e^{tX} ≥ e^{tǫ}) ≤ e^{−tǫ} E(e^{tX})                   (20)
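A quick Monte Carlo sanity check of Markov's inequality (17), using an Exponential(1) random variable as an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed example: X ~ Exponential(1), so X >= 0 and E(X) = 1.
X = rng.exponential(scale=1.0, size=1_000_000)

for eps in (1.0, 2.0, 4.0):
    lhs = np.mean(X >= eps)                  # P(X >= eps), Monte Carlo
    rhs = X.mean() / eps                     # eps^{-1} E(X), Markov's bound (17)
    print(f"eps = {eps}:  P(X >= eps) = {lhs:.4f}  <=  E(X)/eps = {rhs:.4f}")
```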


Hoeffding’s lemma

Hoeffding's lemma: Suppose E(X) = 0 and a ≤ X ≤ b. Then for any t > 0,
    E(e^{tX}) ≤ e^{t²(b−a)²/8}                                           (21)
Proof: By convexity of x ↦ e^{tx},
    e^{tX} = e^{t( ((b−X)/(b−a)) a + ((X−a)/(b−a)) b )} ≤ ((b−X)/(b−a)) e^{ta} + ((X−a)/(b−a)) e^{tb}    (22)
⇒  E(e^{tX}) ≤ (b/(b−a)) e^{ta} − (a/(b−a)) e^{tb} ≡ e^{g(u)}            (23)
where u = t(b−a), g(u) = −γu + log(1 − γ + γe^u), and γ ≡ −a/(b−a).

Hoeffding’s lemma (contd.)

For g(u) = −γu + log(1 − γ + γe^u), we can verify:
    g(0) = 0;
    g′(u) = −γ + γe^u/(γe^u + 1 − γ), hence g′(0) = 0;
    g″(u) = (1 − γ)γe^u / (γe^u + 1 − γ)² ≤ (1 − γ)γe^u / (4(1 − γ)γe^u) = 1/4     (24)
(because (a + b)² ≥ 4ab). By Taylor's theorem, there exists ξ ∈ (0, u) such that
    g(u) = g(0) + u g′(0) + (u²/2) g″(ξ) = (u²/2) g″(ξ) ≤ u²/8 = t²(b−a)²/8        (25)
⇒  E(e^{tX}) ≤ e^{g(u)} ≤ e^{t²(b−a)²/8}.                                          (26)
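The lemma can be checked numerically on a simple example. The two-point distribution below is a hypothetical choice, not from the slides; the convexity step (22) is an equality for such a distribution, so any slack comes only from g(u) ≤ u²/8.

```python
import numpy as np

# Assumed two-point distribution: P(X = 0.7) = 0.3, P(X = -0.3) = 0.7,
# so E(X) = 0 with a = -0.3, b = 0.7.
a, b = -0.3, 0.7
values, probs = np.array([a, b]), np.array([0.7, 0.3])
assert abs(np.dot(values, probs)) < 1e-12    # E(X) = 0

for t in (0.5, 1.0, 2.0):
    mgf = np.dot(probs, np.exp(t * values))          # E(e^{tX}), computed exactly
    bound = np.exp(t ** 2 * (b - a) ** 2 / 8)        # e^{t^2 (b-a)^2 / 8}, bound (21)
    print(f"t = {t}:  E(e^(tX)) = {mgf:.4f}  <=  {bound:.4f}")
```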


Hoeffding’s inequality

Hoeffding's inequality: Let S = (X1, X2, ..., Xm) be a sample of m independent random variables with Xi ∈ [a, b] and common mean µ, and let X̄m denote the sample mean. For any ǫ > 0,
    P_{S∼D^m}( X̄m − µ ≥ ǫ )  ≤ exp( −2mǫ²/(b − a)² )                    (27)
    P_{S∼D^m}( X̄m − µ ≤ −ǫ ) ≤ exp( −2mǫ²/(b − a)² )                    (28)
Proof: For any t > 0,
    P_{S∼D^m}( X̄m − µ ≥ ǫ ) = P_{S∼D^m}( Σ_{i=1}^m Xi − mµ ≥ mǫ )       (29)
      ≤ e^{−tmǫ} E( e^{t(Σ_{i=1}^m Xi − mµ)} ) = e^{−tmǫ} Π_{i=1}^m E( e^{t(Xi − µ)} )   (30)
      ≤ e^{−tmǫ} Π_{i=1}^m e^{t²(b−a)²/8} = e^{−tmǫ + t²m(b−a)²/8} ≤ e^{−2mǫ²/(b−a)²}    (31)
where (30) uses the Chernoff bounding technique and independence, (31) applies Hoeffding's lemma to each Xi − µ, and the last step takes t = 4ǫ/(b − a)², which minimizes the exponent. The other side is similar.
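A Monte Carlo sanity check of the one-sided bound (27), with illustrative choices (Bernoulli(0.3) variables, so [a, b] = [0, 1], with m = 100 and ǫ = 0.1):

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed example: X_i ~ Bernoulli(0.3), so [a, b] = [0, 1] and mu = 0.3.
m, mu, eps, trials = 100, 0.3, 0.1, 20_000
samples = rng.binomial(1, mu, size=(trials, m))
deviations = samples.mean(axis=1) - mu

empirical = np.mean(deviations >= eps)               # frequency of X_bar - mu >= eps
bound = np.exp(-2 * m * eps ** 2 / (1 - 0) ** 2)     # exp(-2 m eps^2 / (b-a)^2), (27)
print(f"empirical: {empirical:.4f}  <=  Hoeffding bound: {bound:.4f}")
```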


Generalization bounds for finite H (inconsistent case)

By Hoeffding's inequality (applied to the indicator losses, which lie in [0, 1]), for a fixed h,
    P_{S∼D^m}( |R̂(h) − R(h)| ≥ ǫ ) ≤ 2 exp(−2mǫ²)                       (32)
so, for a fixed h, with probability at least 1 − δ,
    R(h) ≤ R̂(h) + √( log(2/δ) / (2m) )                                  (33)
For finite H in the inconsistent case, with probability at least 1 − δ, for all h ∈ H,
    R(h) ≤ R̂(h) + √( (log|H| + log(2/δ)) / (2m) )                       (34)
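A small helper evaluating the bound (34) at assumed, illustrative numbers (the hypothesis-class size, empirical error, m, and δ below are not from the slides):

```python
import math

def generalization_bound(emp_error, h_size, m, delta):
    """R(h) <= R_hat(h) + sqrt((log|H| + log(2/delta)) / (2m))  -- bound (34)."""
    return emp_error + math.sqrt((math.log(h_size) + math.log(2 / delta)) / (2 * m))

# Illustrative (assumed) numbers: |H| = 10^6, empirical error 0.05,
# m = 10,000 examples, confidence 1 - delta = 0.95.
print(generalization_bound(emp_error=0.05, h_size=10 ** 6, m=10_000, delta=0.05))
# -> about 0.05 + 0.030 = 0.080
```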


Generalization bounds for finite H (inconsistent case, contd.)

Proof:
    P( ∃h ∈ H : |R̂(h) − R(h)| > ǫ )                                     (35)
  = P( |R̂(h1) − R(h1)| > ǫ ∨ ... ∨ |R̂(h_{|H|}) − R(h_{|H|})| > ǫ )      (36)
  ≤ Σ_{h∈H} P( |R̂(h) − R(h)| > ǫ )                                      (37)
  ≤ 2|H| exp(−2mǫ²)                                                      (38)
Setting the right-hand side to δ completes the proof.


Generalities

Agnostic (non-realizable) PAC learning: for all distributions D over X × Y, for any ǫ > 0 and δ > 0, and for sample size m ≥ poly(1/ǫ, 1/δ, n, size(c)), the following holds:
    P_{S∼D^m}( R(hS) − min_{h∈H} R(h) ≤ ǫ ) ≥ 1 − δ                      (39)
Bayes hypothesis: a hypothesis h such that
    R(h) = R* ≡ inf_h R(h)                                               (40)
Note: the Bayes hypothesis may or may not be in H.
Estimation and approximation errors: for h* = argmin_{h∈H} R(h),
    R(h) − R* = (R(h) − R(h*)) + (R(h*) − R*)                            (41)
                  [estimation]     [approximation]
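A hypothetical illustration of the decomposition (41): labels follow a noisy threshold function (Bayes error R* = 0.1), while H is restricted to a coarse grid of thresholds that excludes the true one, so the approximation error is strictly positive. All distributions and numbers below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed model: X ~ Uniform[0, 1], true label 1{x >= 0.5} flipped with
# probability 0.1, so the Bayes error is R* = 0.1. H contains only thresholds
# on a coarse grid that excludes 0.5.
def draw(m):
    x = rng.uniform(0, 1, size=m)
    flip = (rng.uniform(size=m) < 0.1).astype(int)
    y = (x >= 0.5).astype(int) ^ flip
    return x, y

H = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])         # candidate thresholds

def true_risk(theta):
    """Exact risk of h(x) = 1{x >= theta} under the assumed model."""
    return 0.1 + 0.8 * abs(theta - 0.5)

x, y = draw(m=2_000)
emp_risks = [np.mean((x >= t).astype(int) != y) for t in H]
h_erm = H[int(np.argmin(emp_risks))]                  # empirical risk minimizer over H

R_star = 0.1                                          # Bayes error
R_h_star = min(true_risk(t) for t in H)               # best achievable within H
print("estimation error   :", true_risk(h_erm) - R_h_star)
print("approximation error:", R_h_star - R_star)
```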


Generalities (contd.)

The estimation error can sometimes be bounded in terms of the generalization bounds from PAC learning. For example, for hS^ERM, the hypothesis returned by empirical risk minimization,
    R(hS^ERM) − R(h*) = R(hS^ERM) − R̂(hS^ERM) + R̂(hS^ERM) − R(h*)
                      ≤ R(hS^ERM) − R̂(hS^ERM) + R̂(h*) − R(h*)
                      ≤ 2 sup_{h∈H} |R(h) − R̂(h)|                        (42)
where the second line uses R̂(hS^ERM) ≤ R̂(h*), since hS^ERM minimizes the empirical risk over H.
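A numerical sanity check of (42) under the same kind of assumed threshold setup as above (all specifics hypothetical): the realized estimation error of ERM is compared with twice the largest gap between true and empirical risks over H.

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed model: noisy threshold labels with Bayes error 0.1, and H a grid of
# 21 thresholds; the exact risk of threshold t is 0.1 + 0.8 * |t - 0.5|.
def draw(m):
    x = rng.uniform(0, 1, size=m)
    y = (x >= 0.5).astype(int) ^ (rng.uniform(size=m) < 0.1).astype(int)
    return x, y

H = np.linspace(0, 1, 21)
true_risk = lambda t: 0.1 + 0.8 * abs(t - 0.5)

x, y = draw(m=500)
emp = np.array([np.mean((x >= t).astype(int) != y) for t in H])
h_erm = H[int(np.argmin(emp))]

lhs = true_risk(h_erm) - min(true_risk(t) for t in H)        # estimation error of ERM
rhs = 2 * np.max(np.abs(np.array([true_risk(t) for t in H]) - emp))
print(f"R(h_ERM) - R(h*) = {lhs:.4f}  <=  2 sup|R - R_hat| = {rhs:.4f}")
```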


Questions?

Thanks!
