Learning From Data Lecture 6: Bounding The Growth Function



SLIDE 1

Learning From Data Lecture 6 Bounding The Growth Function

Bounding the Growth Function
Models are either Good or Bad
The VC Bound: replacing $|\mathcal{H}|$ with $m_{\mathcal{H}}(N)$

M. Magdon-Ismail
CSCI 4100/6100

SLIDE 2

recap: The Growth Function $m_{\mathcal{H}}(N)$

A new measure for the diversity of a hypothesis set.

$$\mathcal{H}(x_1, \dots, x_N) = \{(h(x_1), \dots, h(x_N))\}$$

The dichotomies (N-tuples) $\mathcal{H}$ implements on $x_1, \dots, x_N$.

[Figure: $\mathcal{H}$, and $\mathcal{H}$ viewed through $\mathcal{D}$]

The growth function $m_{\mathcal{H}}(N)$ considers the worst possible $x_1, \dots, x_N$:

$$m_{\mathcal{H}}(N) = \max_{x_1, \dots, x_N} |\mathcal{H}(x_1, \dots, x_N)|.$$

This lecture:
Can we bound $m_{\mathcal{H}}(N)$ by a polynomial in $N$?
Can we replace $|\mathcal{H}|$ by $m_{\mathcal{H}}(N)$ in the generalization bound?
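As a small illustration of the definition, the sketch below counts the dichotomies that 1-D positive rays implement on N points. It assumes the hypotheses have the form h(x) = sign(x - a), and it assumes that any N distinct points give the same count, so a single point set stands in for the max. The function names are hypothetical.

```python
def positive_ray_dichotomies(points):
    """Return the set of dichotomies that h(x) = sign(x - a) implements on `points`."""
    xs = sorted(points)
    # One threshold below all points, one between each adjacent pair, one above all points.
    thresholds = [xs[0] - 1.0]
    thresholds += [(lo + hi) / 2.0 for lo, hi in zip(xs, xs[1:])]
    thresholds.append(xs[-1] + 1.0)
    return {tuple(+1 if x > a else -1 for x in points) for a in thresholds}


def growth_positive_rays(n):
    """Count dichotomies on n distinct points (assumed to be the worst case for rays)."""
    points = [float(i) for i in range(n)]
    return len(positive_ray_dichotomies(points))


for n in range(1, 6):
    print(n, growth_positive_rays(n))
```

The counts come out as N + 1, which is far fewer than the 2^N possible dichotomies once N is at least 2.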

© AML Creator: Malik Magdon-Ismail

SLIDE 3

Example Growth Functions

N                     1   2   3   4    5      ...
2-D perceptron        2   4   8   14   ...
1-D pos. ray          2   3   4   5    ...
2-D pos. rectangles   2   4   8   16   < 2^5  ...

  • $m_{\mathcal{H}}(N)$ drops below $2^N$: there is hope.
  • A break point is any $k$ for which $m_{\mathcal{H}}(k) < 2^k$.
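The break point definition can be read straight off the table. A minimal sketch, assuming the growth function values are supplied as a list starting at N = 1 (the helper name is hypothetical):

```python
def smallest_break_point(growth_values):
    """growth_values[i] holds m_H(i + 1); return the smallest k with m_H(k) < 2**k.

    Returns None when no listed value falls below 2**k.
    """
    for k, m in enumerate(growth_values, start=1):
        if m < 2 ** k:
            return k
    return None


# Rows copied from the table above (only the listed entries).
perceptron_2d = [2, 4, 8, 14]
positive_ray = [2, 3, 4, 5]

print(smallest_break_point(perceptron_2d))  # 14 < 2**4
print(smallest_break_point(positive_ray))   # 3 < 2**2
```

For the 2-D positive rectangles, the table only states that the value at N = 5 is below 2^5, so 5 is a break point even though the exact count is not listed.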

SLIDE 4

Pop Quiz I

I give you a set of $k^*$ points $x_1, \dots, x_{k^*}$ on which $\mathcal{H}$ implements $< 2^{k^*}$ dichotomies.
(a) $k^*$ is a break point.
(b) $k^*$ is not a break point.
(c) all break points are $> k^*$.
(d) all break points are $\le k^*$.
(e) we don't know anything about break points.

SLIDE 5

Pop Quiz I

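I give you a set of $k^*$ points $x_1, \dots, x_{k^*}$ on which $\mathcal{H}$ implements $< 2^{k^*}$ dichotomies.
(a) $k^*$ is a break point.
(b) $k^*$ is not a break point.
(c) all break points are $> k^*$.
(d) all break points are $\le k^*$.
(e) we don't know anything about break points.

Answer: (e). One set of $k^*$ points that is not shattered says nothing about the other sets of $k^*$ points.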

SLIDE 6

Pop Quiz II

For every set of $k^*$ points $x_1, \dots, x_{k^*}$, $\mathcal{H}$ implements $< 2^{k^*}$ dichotomies.
(a) $k^*$ is a break point.
(b) $k^*$ is not a break point.
(c) all $k \ge k^*$ are break points.
(d) all $k < k^*$ are break points.
(e) we don't know anything about break points.

SLIDE 7

Pop Quiz II

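For every set of $k^*$ points $x_1, \dots, x_{k^*}$, $\mathcal{H}$ implements $< 2^{k^*}$ dichotomies.
(a) $k^*$ is a break point.
(b) $k^*$ is not a break point.
(c) all $k \ge k^*$ are break points.
(d) all $k < k^*$ are break points.
(e) we don't know anything about break points.

Answer: (a) and (c). If some set of $k > k^*$ points were shattered, every subset of $k^*$ of those points would be shattered too.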

SLIDE 8

Pop Quiz III

To show that $k$ is not a break point for $\mathcal{H}$:
(a) Show a set of $k$ points $x_1, \dots, x_k$ which $\mathcal{H}$ can shatter.
(b) Show $\mathcal{H}$ can shatter any set of $k$ points.
(c) Show a set of $k$ points $x_1, \dots, x_k$ which $\mathcal{H}$ cannot shatter.
(d) Show $\mathcal{H}$ cannot shatter any set of $k$ points.
(e) Show $m_{\mathcal{H}}(k) = 2^k$.

SLIDE 9

Pop Quiz III

To show that $k$ is not a break point for $\mathcal{H}$:
(a) Show a set of $k$ points $x_1, \dots, x_k$ which $\mathcal{H}$ can shatter.
(b) Show $\mathcal{H}$ can shatter any set of $k$ points. (overkill)
(c) Show a set of $k$ points $x_1, \dots, x_k$ which $\mathcal{H}$ cannot shatter.
(d) Show $\mathcal{H}$ cannot shatter any set of $k$ points.
(e) Show $m_{\mathcal{H}}(k) = 2^k$.

Answer: (a) or (e); (b) is overkill.

SLIDE 10

Pop Quiz IV

To show that $k$ is a break point for $\mathcal{H}$:
(a) Show a set of $k$ points $x_1, \dots, x_k$ which $\mathcal{H}$ can shatter.
(b) Show $\mathcal{H}$ can shatter any set of $k$ points.
(c) Show a set of $k$ points $x_1, \dots, x_k$ which $\mathcal{H}$ cannot shatter.
(d) Show $\mathcal{H}$ cannot shatter any set of $k$ points.
(e) Show $m_{\mathcal{H}}(k) > 2^k$.

SLIDE 11

Pop Quiz IV

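To show that $k$ is a break point for $\mathcal{H}$:
(a) Show a set of $k$ points $x_1, \dots, x_k$ which $\mathcal{H}$ can shatter.
(b) Show $\mathcal{H}$ can shatter any set of $k$ points.
(c) Show a set of $k$ points $x_1, \dots, x_k$ which $\mathcal{H}$ cannot shatter.
(d) Show $\mathcal{H}$ cannot shatter any set of $k$ points.
(e) Show $m_{\mathcal{H}}(k) > 2^k$.

Answer: (d). Option (e) can never hold, since $m_{\mathcal{H}}(k) \le 2^k$ always.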

SLIDE 12

Back to Our Combinatorial Puzzle

How many dichotomies can you list on 4 points so that no 2 points are shattered?

[Table: dichotomies listed on $x_1, x_2, x_3, x_4$]

  • Can we add a 6th dichotomy?

SLIDE 13

Can’t Add A 6th Dichotomy

[Table: dichotomies listed on $x_1, x_2, x_3, x_4$]

SLIDE 14

The Combinatorial Quantity B(N, k)

How many dichotomies can you list on 4 points so that no 2 are shattered? (Here $N = 4$ and $k = 2$.)

$B(N, k)$: maximum number of dichotomies on $N$ points so that no $k$ are shattered.

[Tables: dichotomies listed on $x_1, x_2, x_3$ and on $x_1, x_2, x_3, x_4$]

$B(3, 2) = 4$
$B(4, 2) = 5$
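For very small N, the two values above can be checked by exhaustive search. The sketch below is a brute-force illustration, not the lecture's method, and its function names are hypothetical. It grows the list size until no valid list exists, relying on the fact that removing a dichotomy from a list can never create a shattered subset.

```python
from itertools import combinations, product


def shatters(dichotomies, cols):
    """True if the list shows all 2**len(cols) patterns on the chosen columns."""
    patterns = {tuple(d[c] for c in cols) for d in dichotomies}
    return len(patterns) == 2 ** len(cols)


def no_k_shattered(dichotomies, n, k):
    """True if no subset of k of the n points is shattered by the list."""
    return not any(shatters(dichotomies, cols) for cols in combinations(range(n), k))


def brute_force_B(n, k):
    """Largest list of distinct dichotomies on n points with no k points shattered."""
    all_dichotomies = list(product((-1, +1), repeat=n))
    best = 0
    for size in range(1, len(all_dichotomies) + 1):
        if any(no_k_shattered(s, n, k) for s in combinations(all_dichotomies, size)):
            best = size
        else:
            break  # no valid list of this size, so none of any larger size either
    return best


print(brute_force_B(3, 2), brute_force_B(4, 2))
```

The search space has 2^(2^N) candidate lists, so this is only practical for N up to about 4.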

SLIDE 15

Let’s Try To Bound B(4, 3)

How many dichotomies can you list on 4 points so that no subset of 3 is shattered?

[Table: dichotomies listed on $x_1, x_2, x_3, x_4$]

SLIDE 16

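Two Kinds of Dichotomies

Prefix appears once or prefix appears twice. (The prefix is the dichotomy restricted to $x_1, x_2, x_3$.)

[Table: dichotomies listed on $x_1, x_2, x_3, x_4$]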

SLIDE 17

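Reorder the Dichotomies

[Table: the dichotomies on $x_1, x_2, x_3, x_4$ reordered into three blocks: $\alpha$, $\beta$, $\beta$]

$\alpha$: prefix appears once
$\beta$: prefix appears twice

$$B(4, 3) = \alpha + 2\beta$$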
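The reordering can be sketched in code by grouping a list of distinct dichotomies by prefix. The example list here is a hypothetical one chosen for illustration (all dichotomies with at most one +1); it is not the list drawn on the slide.

```python
from collections import defaultdict


def split_by_prefix(dichotomies):
    """Group distinct dichotomies by prefix (every point except the last).

    Returns (alpha_rows, beta_pairs): rows whose prefix appears once, and
    pairs of rows that share a prefix and differ only on the last point.
    """
    groups = defaultdict(list)
    for d in dichotomies:
        groups[d[:-1]].append(d)
    alpha_rows = [rows[0] for rows in groups.values() if len(rows) == 1]
    beta_pairs = [tuple(rows) for rows in groups.values() if len(rows) == 2]
    return alpha_rows, beta_pairs


# Hypothetical example list on 4 points.
example = [
    (-1, -1, -1, -1),
    (+1, -1, -1, -1),
    (-1, +1, -1, -1),
    (-1, -1, +1, -1),
    (-1, -1, -1, +1),
]
alpha_rows, beta_pairs = split_by_prefix(example)
alpha, beta = len(alpha_rows), len(beta_pairs)
print(alpha, beta, alpha + 2 * beta)
```

Every row lands in exactly one group, so the list size always equals $\alpha + 2\beta$.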

SLIDE 18

First, Bound α + β

[Table: the dichotomies on $x_1, x_2, x_3, x_4$ in three blocks: $\alpha$, $\beta$, $\beta$]

$$\alpha + \beta \le B(3, 3)$$

A list on 3 points, with no 3 shattered (why?)

SLIDE 19

Second, Bound β

[Table: the dichotomies on $x_1, x_2, x_3, x_4$ in three blocks: $\alpha$, $\beta$, $\beta$]

$$\beta \le B(3, 2)$$

If 2 points are shattered, then using the mirror dichotomies you shatter 3 points (why?)

SLIDE 20

Combining to Bound α + 2β

[Table: the dichotomies on $x_1, x_2, x_3, x_4$ in three blocks: $\alpha$, $\beta$, $\beta$]

$$B(4, 3) = \alpha + \beta + \beta \le B(3, 3) + B(3, 2)$$

The argument generalizes to $(N, k)$:

$$B(N, k) \le B(N - 1, k) + B(N - 1, k - 1)$$

SLIDE 21

Boundary Cases: B(N, 1) and B(N, N)

N \ k    1    2    3    4    5    6   ...
  1      1
  2      1    3
  3      1         7
  4      1             15
  5      1                  31
  6      1                       63
 ...

$B(N, 1) = 1$ (why?)

$B(N, N) = 2^N - 1$ (why?)

SLIDE 22

Recursion Gives B(N, k) Bound

$$B(N, k) \le B(N - 1, k) + B(N - 1, k - 1)$$

N \ k    1    2    3    4    5    6   ...
  1      1
  2      1    3
  3      1    4    7
  4      1             15
  5      1                  31
  6      1                       63
 ...

Arrows (↘ from the entry at $N = 2, k = 1$ and ↓ from the entry at $N = 2, k = 2$) point to the new entry 4 at $N = 3, k = 2$.

SLIDE 23

Recursion Gives B(N, k) Bound

$$B(N, k) \le B(N - 1, k) + B(N - 1, k - 1)$$

N \ k    1    2    3    4    5    6   ...
  1      1
  2      1    3
  3      1    4    7
  4      1    5   11   15
  5      1    6   16   26   31
  6      1    7   22   42   57   63
 ...
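The table can be filled mechanically from the two boundary cases and the recursion. A minimal sketch (the function name is hypothetical); each entry is an upper bound on $B(N, k)$, obtained by treating the recursive inequality as an equality:

```python
def bound_table(max_n):
    """Return {(n, k): upper bound on B(n, k)} for 1 <= k <= n <= max_n."""
    table = {}
    for n in range(1, max_n + 1):
        table[(n, 1)] = 1           # boundary case B(N, 1) = 1
        table[(n, n)] = 2 ** n - 1  # boundary case B(N, N) = 2^N - 1
        for k in range(2, n):
            # B(N, k) <= B(N - 1, k) + B(N - 1, k - 1)
            table[(n, k)] = table[(n - 1, k)] + table[(n - 1, k - 1)]
    return table


table = bound_table(6)
for n in range(1, 7):
    print(n, [table[(n, k)] for k in range(1, n + 1)])
```

Each row depends only on the row above it, which is why the induction in the analytic bound runs on N.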

SLIDE 24

Analytic Bound for B(N, k)

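Theorem.

$$B(N, k) \le \sum_{i=0}^{k-1} \binom{N}{i}.$$

Proof: (Induction on $N$.)

1. Verify for $N = 1$: $B(1, 1) \le \binom{1}{0} = 1$.

2. Suppose $B(N, k) \le \sum_{i=0}^{k-1} \binom{N}{i}$.

Lemma. $\binom{N}{k} + \binom{N}{k-1} = \binom{N+1}{k}$.

$$
\begin{aligned}
B(N + 1, k) &\le B(N, k) + B(N, k - 1) \\
&\le \sum_{i=0}^{k-1} \binom{N}{i} + \sum_{i=0}^{k-2} \binom{N}{i} \\
&= \sum_{i=0}^{k-1} \binom{N}{i} + \sum_{i=1}^{k-1} \binom{N}{i-1} \\
&= 1 + \sum_{i=1}^{k-1} \left[ \binom{N}{i} + \binom{N}{i-1} \right] \\
&= 1 + \sum_{i=1}^{k-1} \binom{N+1}{i} \qquad \text{(lemma)} \\
&= \sum_{i=0}^{k-1} \binom{N+1}{i}
\end{aligned}
$$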
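The lemma and the induction step can be spot-checked numerically. This is a sanity check under the stated ranges, not a substitute for the proof; `binomial_sum` and `check_induction_step` are hypothetical helper names.

```python
from math import comb


def binomial_sum(n, k):
    """Right-hand side of the theorem: sum of C(n, i) for i = 0, ..., k - 1."""
    return sum(comb(n, i) for i in range(k))


def check_induction_step(max_n):
    """Check the lemma and the induction step for all 1 <= n <= max_n, 1 <= k <= n + 1."""
    for n in range(1, max_n + 1):
        for k in range(1, n + 2):
            # Lemma: C(N, k) + C(N, k - 1) = C(N + 1, k)
            if comb(n, k) + comb(n, k - 1) != comb(n + 1, k):
                return False
            # Induction step: the bounds for (N, k) and (N, k - 1) add up
            # to the bound for (N + 1, k).
            if binomial_sum(n, k) + binomial_sum(n, k - 1) != binomial_sum(n + 1, k):
                return False
    return True


print(check_induction_step(20))
```

Note that `math.comb(n, k)` returns 0 when k exceeds n, which matches the convention the sum needs.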

SLIDE 25

$m_{\mathcal{H}}(N)$ is bounded by $B(N, k)$!

Theorem. Suppose that $\mathcal{H}$ has a break point at $k$. Then,

$$m_{\mathcal{H}}(N) \le B(N, k).$$

[Table: the dichotomies that $\mathcal{H}$ implements on $x_1, x_2, x_3, x_4, \dots, x_N$]

Consider any $k$ points. They cannot be shattered (otherwise $k$ would not be a break point).
$B(N, k)$ is the size of the largest such list.
Hence $m_{\mathcal{H}}(N) \le B(N, k)$.

SLIDE 26

Once bitten, twice shy . . . Once Broken, Forever Polynomial

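Theorem. If $k$ is any break point for $\mathcal{H}$, so $m_{\mathcal{H}}(k) < 2^k$, then

$$m_{\mathcal{H}}(N) \le \sum_{i=0}^{k-1} \binom{N}{i}.$$

Facts (Problems 2.5 and 2.6):

$$\sum_{i=0}^{k-1} \binom{N}{i} \le \begin{cases} N^{k-1} + 1 \\[4pt] \left( \dfrac{eN}{k-1} \right)^{k-1} \end{cases} \qquad \text{(polynomial in } N\text{)}$$

This is huge: if we can replace $|\mathcal{H}|$ with $m_{\mathcal{H}}(N)$ in the bound, then learning is feasible.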
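A quick numerical comparison of the binomial sum with the two polynomial bounds. The ranges checked here ($k \ge 2$ and $N \ge k - 1$) are an assumption added for this sketch, chosen so that the second bound is well defined; the helper names are hypothetical.

```python
from math import comb, e


def binomial_sum(n, k):
    """Sum of C(n, i) for i = 0, ..., k - 1."""
    return sum(comb(n, i) for i in range(k))


def power_bound(n, k):
    """First bound: N^(k-1) + 1."""
    return n ** (k - 1) + 1


def exp_bound(n, k):
    """Second bound: (eN / (k-1))^(k-1); needs k >= 2."""
    return (e * n / (k - 1)) ** (k - 1)


def bounds_hold(max_n, max_k):
    """Check both bounds for 2 <= k <= max_k and k - 1 <= n <= max_n."""
    for k in range(2, max_k + 1):
        for n in range(k - 1, max_n + 1):
            s = binomial_sum(n, k)
            if s > power_bound(n, k) or s > exp_bound(n, k):
                return False
    return True


print(bounds_hold(50, 6))
print(binomial_sum(10, 4), power_bound(10, 4))
```

For a break point of $k = 4$ and $N = 10$, the sum is 176 while $N^3 + 1$ is 1001, so the polynomial bound is loose but still tiny next to $2^{10}$ raised to larger N.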

SLIDE 27

A Hypothesis Set is either Good or Bad

[Figure: $\log m_{\mathcal{H}}(N)$ versus $N$, with curves labelled "the good $\mathcal{H}$", "the bad $\mathcal{H}$", and "the ugly $\mathcal{H}$"]

N                     1   2   3   4    5      ...   bound on m_H(N)
2-D perceptron        2   4   8   14   ...          ≤ N^3 + 1
1-D pos. ray          2   3   4   5    ...          ≤ N^1 + 1
2-D pos. rectangles   2   4   8   16   < 2^5  ...   ≤ N^4 + 1

SLIDE 28

We have One Step in the Puzzle

Can we get a polynomial bound on $m_{\mathcal{H}}(N)$ even for infinite $\mathcal{H}$?
Can we replace $|\mathcal{H}|$ with $m_{\mathcal{H}}(N)$ in the generalization bound?

SLIDE 29

(i) How to Deal With Eout (Sketch)

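The ghost data set: a 'fictitious' data set $\mathcal{D}'$.

[Figure: Age vs. Income scatter plots illustrating $E_{\text{out}}$, $E_{\text{in}}$, and $E'_{\text{in}}$; probability distributions of $E_{\text{in}}$ and $E'_{\text{in}}$ around $E_{\text{out}}$]

$E'_{\text{in}}$ is like a test error on $N$ new points.
$E_{\text{in}}$ deviates from $E_{\text{out}}$ implies $E_{\text{in}}$ deviates from $E'_{\text{in}}$.
$E_{\text{in}}$ and $E'_{\text{in}}$ have the same distribution.

$$\mathbb{P}\big[(E'_{\text{in}}(g), E_{\text{in}}(g)) \text{ "deviate"}\big] \ge \tfrac{1}{2}\, \mathbb{P}\big[(E_{\text{out}}(g), E_{\text{in}}(g)) \text{ "deviate"}\big]$$

We can analyze deviations between two in-sample errors.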

SLIDE 30

(ii) Real Plus Ghost Data Set = 2N points

[Table: dichotomies on the $2N$ points $x_1, x_2, x_3, \dots, x_N, x_{N+1}, x_{N+2}, x_{N+3}, \dots, x_{2N}$]

Number of dichotomies is at most $m_{\mathcal{H}}(2N)$.

Up to technical details, analyze a "hypothesis set" of size at most $m_{\mathcal{H}}(2N)$.

SLIDE 31

The Vapnik-Chervonenkis Bound (VC Bound)

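$$\mathbb{P}\big[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,\big] \le 4\, m_{\mathcal{H}}(2N)\, e^{-\epsilon^2 N / 8}, \quad \text{for any } \epsilon > 0.$$

$$\mathbb{P}\big[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| \le \epsilon\,\big] \ge 1 - 4\, m_{\mathcal{H}}(2N)\, e^{-\epsilon^2 N / 8}, \quad \text{for any } \epsilon > 0.$$

$$E_{\text{out}}(g) \le E_{\text{in}}(g) + \sqrt{\frac{8}{N} \log \frac{4\, m_{\mathcal{H}}(2N)}{\delta}}, \quad \text{w.p. at least } 1 - \delta.$$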
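To get a feel for the numbers, the sketch below evaluates the error bar in the last form of the bound. It assumes that log is the natural logarithm and it substitutes the binomial-sum bound for $m_{\mathcal{H}}(2N)$, given a break point $k$; both function names are hypothetical.

```python
from math import comb, log, sqrt


def growth_bound(n, k):
    """Upper bound on m_H(n) when k is a break point: sum of C(n, i), i < k."""
    return sum(comb(n, i) for i in range(k))


def vc_error_bar(n, k, delta):
    """sqrt((8 / N) * ln(4 * m_H(2N) / delta)), with m_H(2N) replaced by its bound."""
    return sqrt(8.0 / n * log(4.0 * growth_bound(2 * n, k) / delta))


# Break point k = 4 (the 2-D perceptron row of the earlier table), delta = 0.05.
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(vc_error_bar(n, 4, 0.05), 3))
```

Because the growth bound is polynomial in N, the logarithm grows slowly and the error bar shrinks roughly like $\sqrt{(\log N)/N}$.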
