

SLIDE 1 Review of Lecture 6
  • $m_{\mathcal{H}}(N)$ is polynomial if $\mathcal{H}$ has a break point $k$

                 k
           1   2   3   4   5   6  ...
       1   1   2   2   2   2   2  ...
       2   1   3   4   4   4   4  ...
   N   3   1   4   7   8   8   8  ...
       4   1   5  11   .   .   .
       5   1   6   :
       6   1   7   :
       :   :   :   :

$$m_{\mathcal{H}}(N) \le \sum_{i=0}^{k-1} \binom{N}{i}$$

  • maximum power is $N^{k-1}$
  • The VC Inequality

[Figure: the space of data sets $\mathcal{D}$ under (a) the Hoeffding Inequality, (b) the Union Bound, (c) the VC Bound]

$$\mathbb{P}\left[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,\right] \le 2\,M\,e^{-2\epsilon^2 N}$$

↓

$$\mathbb{P}\left[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,\right] \le 4\,m_{\mathcal{H}}(2N)\,e^{-\frac{1}{8}\epsilon^2 N}$$
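To get a feel for the two right-hand sides in the review, a minimal Python sketch can evaluate them for concrete numbers. The function names are illustrative and not from the lecture; `growth_2n` stands for $m_{\mathcal{H}}(2N)$ and must be supplied by the caller. The example uses positive rays, whose growth function is $m_{\mathcal{H}}(N) = N + 1$.

```python
import math

def union_bound_rhs(M, eps, N):
    """Right-hand side 2 M exp(-2 eps^2 N) of the Hoeffding + union bound."""
    return 2.0 * M * math.exp(-2.0 * eps**2 * N)

def vc_bound_rhs(growth_2n, eps, N):
    """Right-hand side 4 m_H(2N) exp(-eps^2 N / 8) of the VC inequality."""
    return 4.0 * growth_2n * math.exp(-(eps**2) * N / 8.0)

if __name__ == "__main__":
    eps, N = 0.1, 10_000
    # Union bound for a finite hypothesis set with M = 1000 (an arbitrary choice).
    print(union_bound_rhs(1000, eps, N))
    # VC bound for positive rays: m_H(2N) = 2N + 1.
    print(vc_bound_rhs(2 * N + 1, eps, N))
```

The VC bound has the weaker exponent ($\frac{1}{8}$ instead of $2$), but it remains meaningful when $M$ is infinite, as long as $m_{\mathcal{H}}(2N)$ grows only polynomially.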

SLIDE 2 Learning From Data
Yaser S. Abu-Mostafa, California Institute of Technology
Lecture 7: The VC Dimension
Sponsored by Caltech's Provost Office, E&AS Division, and IST • Tuesday, April 24, 2012

SLIDE 3 Outline
  • The definition
  • VC dimension of perceptrons
  • Interpreting the VC dimension
  • Generalization bounds

AML   Creator: Yaser Abu-Mostafa - LFD Lecture 7   2/24

SLIDE 4 Definition of VC dimension
The VC dimension of a hypothesis set $\mathcal{H}$, denoted by $d_{\mathrm{vc}}(\mathcal{H})$, is the largest value of $N$ for which $m_{\mathcal{H}}(N) = 2^N$: "the most points $\mathcal{H}$ can shatter".

$N \le d_{\mathrm{vc}}(\mathcal{H}) \implies \mathcal{H}$ can shatter $N$ points

$k > d_{\mathrm{vc}}(\mathcal{H}) \implies k$ is a break point for $\mathcal{H}$
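The definition can be read as a small search: walk up from $N = 1$ and stop at the first $N$ where the growth function falls below $2^N$. The sketch below assumes the growth function is available as a Python callable. The three closed forms (positive rays, positive intervals, convex sets) are standard results quoted here as assumptions, and `n_max` is an arbitrary cutoff for the search.

```python
from math import comb

def vc_dimension(growth, n_max=50):
    """Largest N <= n_max with growth(N) == 2**N.

    Returns None when every N up to n_max is shattered (no break point found).
    """
    d = 0
    for n in range(1, n_max + 1):
        if growth(n) == 2**n:
            d = n
        else:
            # Once a break point appears, every larger N is a break point too.
            return d
    return None

# Closed-form growth functions, taken as given for this sketch.
def positive_rays(n):
    return n + 1

def positive_intervals(n):
    return comb(n + 1, 2) + 1

def convex_sets(n):
    return 2**n

if __name__ == "__main__":
    print(vc_dimension(positive_rays))
    print(vc_dimension(positive_intervals))
    print(vc_dimension(convex_sets))  # None: no break point up to n_max
```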

SLIDE 5 The growth function
In terms of a break point $k$:

$$m_{\mathcal{H}}(N) \le \sum_{i=0}^{k-1} \binom{N}{i}$$

In terms of the VC dimension $d_{\mathrm{vc}}$:

$$m_{\mathcal{H}}(N) \le \sum_{i=0}^{d_{\mathrm{vc}}} \binom{N}{i}$$

  • maximum power is $N^{d_{\mathrm{vc}}}$
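Both forms of the bound are the same sum with a different upper limit, since the smallest break point is $k = d_{\mathrm{vc}} + 1$. A minimal Python sketch, with illustrative function names that are not from the lecture:

```python
from math import comb

def growth_bound(N, k):
    """Upper bound sum_{i=0}^{k-1} C(N, i) on m_H(N) when k is a break point."""
    return sum(comb(N, i) for i in range(k))

def growth_bound_vc(N, d_vc):
    """The same bound written in terms of the VC dimension (k = d_vc + 1)."""
    return growth_bound(N, d_vc + 1)

if __name__ == "__main__":
    # Entries of the review table from Lecture 6: N = 3, k = 3 and N = 4, k = 3.
    print(growth_bound(3, 3), growth_bound(4, 3))
    # For d_vc = 3 the bound is a cubic polynomial in N.
    print([growth_bound_vc(N, 3) for N in (10, 100, 1000)])
```

For $N = 3, k = 3$ and $N = 4, k = 3$ the sum gives 7 and 11, which matches the corresponding entries of the table on the review slide.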

SLIDE 6 Examples
  • $\mathcal{H}$ is positive rays: $d_{\mathrm{vc}} = 1$
  • $\mathcal{H}$ is 2D perceptrons: $d_{\mathrm{vc}} = 3$
  • $\mathcal{H}$ is convex sets: $d_{\mathrm{vc}} = \infty$

SLIDE 7 VC dimension and learning
$d_{\mathrm{vc}}(\mathcal{H})$ is finite $\implies$ $g \in \mathcal{H}$ will generalize
  • Independent of the learning algorithm
  • Independent of the input distribution
  • Independent of the target function

[Figure: the learning diagram. An unknown target function $f: \mathcal{X} \to \mathcal{Y}$ and a probability distribution $P$ on $\mathcal{X}$ generate the training examples $(x_1, y_1), \dots, (x_N, y_N)$; the learning algorithm $\mathcal{A}$ picks from the hypothesis set $\mathcal{H}$ a final hypothesis $g \approx f$.]

SLIDE 8 VC dimension of perceptrons
For $d = 2$, $d_{\mathrm{vc}} = 3$

In general, $d_{\mathrm{vc}} = d + 1$

We will prove two directions:

$d_{\mathrm{vc}} \le d + 1$

$d_{\mathrm{vc}} \ge d + 1$

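SLIDE 9 Here is one direction
A set of $N = d + 1$ points in $\mathbb{R}^d$ shattered by the perceptron:

$$X = \begin{bmatrix} \mathbf{x}_1^{\mathsf{T}} \\ \mathbf{x}_2^{\mathsf{T}} \\ \mathbf{x}_3^{\mathsf{T}} \\ \vdots \\ \mathbf{x}_{d+1}^{\mathsf{T}} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 1 & 0 & \cdots & 0 & 1 \end{bmatrix}$$

$X$ is invertible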

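SLIDE 10 Can we shatter this data set?
For any

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{d+1} \end{bmatrix} = \begin{bmatrix} \pm 1 \\ \pm 1 \\ \vdots \\ \pm 1 \end{bmatrix},$$

can we find a vector $\mathbf{w}$ satisfying $\operatorname{sign}(X\mathbf{w}) = \mathbf{y}$?

Easy! Just make $X\mathbf{w} = \mathbf{y}$, so that $\operatorname{sign}(X\mathbf{w}) = \mathbf{y}$ holds automatically, which means

$$\mathbf{w} = X^{-1}\mathbf{y}$$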
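The argument can be checked numerically for small $d$. The sketch below builds the matrix of $d + 1$ points (with the constant coordinate $x_0 = 1$ in the first column), then for every sign vector $\mathbf{y}$ solves $X\mathbf{w} = \mathbf{y}$ and confirms that $\operatorname{sign}(X\mathbf{w}) = \mathbf{y}$. The helper names `shattering_points` and `shatters` are hypothetical, and NumPy is assumed to be available.

```python
import itertools
import numpy as np

def shattering_points(d):
    """(d+1) x (d+1) matrix whose rows are the points, with a leading 1 for the bias."""
    X = np.zeros((d + 1, d + 1))
    X[:, 0] = 1.0                  # constant coordinate x_0 = 1 for every point
    for i in range(1, d + 1):
        X[i, i] = 1.0              # point i is the i-th unit vector; point 0 is the origin
    return X

def shatters(X):
    """Check every dichotomy y in {-1, +1}^n by solving X w = y."""
    n = X.shape[0]
    for labels in itertools.product([-1.0, 1.0], repeat=n):
        y = np.array(labels)
        w = np.linalg.solve(X, y)  # w = X^{-1} y
        if not np.array_equal(np.sign(X @ w), y):
            return False
    return True

if __name__ == "__main__":
    for d in (1, 2, 3, 5):
        print(d, shatters(shattering_points(d)))
```

This only verifies the construction for the values of $d$ that are tried; the general statement rests on the invertibility of $X$.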

SLIDE 11 We can shatter these $d + 1$ points
This implies what?
[a] $d_{\mathrm{vc}} = d + 1$
[b] $d_{\mathrm{vc}} \ge d + 1$ ✓
[c] $d_{\mathrm{vc}} \le d + 1$
[d] No conclusion

SLIDE 12 Now, to show that $d_{\mathrm{vc}} \le d + 1$
We need to show that:
[a] There are $d + 1$ points we cannot shatter
[b] There are $d + 2$ points we cannot shatter
[c] We cannot shatter any set of $d + 1$ points
[d] We cannot shatter any set of $d + 2$ points ✓

SLIDE 13 Take any $d + 2$ points
For any $d + 2$ points,

$$\mathbf{x}_1, \cdots, \mathbf{x}_{d+1}, \mathbf{x}_{d+2}$$

More points than dimensions $\implies$ we must have

$$\mathbf{x}_j = \sum_{i \ne j} a_i\, \mathbf{x}_i$$

where not all the $a_i$'s are zeros

SLIDE 14 So?

$$\mathbf{x}_j = \sum_{i \ne j} a_i\, \mathbf{x}_i$$

Consider the following dichotomy:

$\mathbf{x}_i$'s with non-zero $a_i$ get $y_i = \operatorname{sign}(a_i)$, and $\mathbf{x}_j$ gets $y_j = -1$

No perceptron can implement such a dichotomy!

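SLIDE 15 Why?

$$\mathbf{x}_j = \sum_{i \ne j} a_i\, \mathbf{x}_i \implies \mathbf{w}^{\mathsf{T}}\mathbf{x}_j = \sum_{i \ne j} a_i\, \mathbf{w}^{\mathsf{T}}\mathbf{x}_i$$

If $y_i = \operatorname{sign}(\mathbf{w}^{\mathsf{T}}\mathbf{x}_i) = \operatorname{sign}(a_i)$, then $a_i\, \mathbf{w}^{\mathsf{T}}\mathbf{x}_i > 0$

This forces

$$\mathbf{w}^{\mathsf{T}}\mathbf{x}_j = \sum_{i \ne j} a_i\, \mathbf{w}^{\mathsf{T}}\mathbf{x}_i > 0$$

Therefore, $y_j = \operatorname{sign}(\mathbf{w}^{\mathsf{T}}\mathbf{x}_j) = +1$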
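A concrete instance makes the argument tangible. The sketch below takes $d = 2$ and four points (the corners of the unit square, each with a leading bias coordinate of 1, a choice made here for illustration), solves for the coefficients $a_i$, builds the dichotomy described above, and then samples many random weight vectors to see which dichotomies a perceptron produces. Sampling cannot prove impossibility; it only illustrates that the constructed dichotomy never shows up. NumPy, the seed, and the sample size are assumptions of this sketch.

```python
import numpy as np

# Four points in R^2 (d = 2, so d + 2 = 4), each with a leading bias coordinate of 1.
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# Express x_j (the last point) as a combination of the others: sum_i a_i x_i = x_j.
a = np.linalg.solve(X[:3].T, X[3])

# The dichotomy from the slides: y_i = sign(a_i) for the other points, y_j = -1.
forbidden = tuple(np.sign(a).astype(int)) + (-1,)

# Sample many weight vectors and record the dichotomies sign(X w) they produce.
rng = np.random.default_rng(0)
W = rng.standard_normal((20000, 3))
found = {tuple(row) for row in np.sign(W @ X.T).astype(int)}

if __name__ == "__main__":
    print("coefficients a:", a)
    print("constructed dichotomy:", forbidden)
    print("seen among sampled perceptrons:", forbidden in found)
    print("distinct dichotomies seen:", len(found), "of", 2 ** len(X))
```

Here $\mathbf{x}_4 = -\mathbf{x}_1 + \mathbf{x}_2 + \mathbf{x}_3$, so the constructed dichotomy is $(-1, +1, +1, -1)$, which is the XOR labelling of the square's corners.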

SLIDE 16 Putting it together
We proved $d_{\mathrm{vc}} \le d + 1$ and $d_{\mathrm{vc}} \ge d + 1$

$$d_{\mathrm{vc}} = d + 1$$

What is $d + 1$ in the perceptron?

It is the number of parameters $w_0, w_1, \cdots, w_d$

SLIDE 17 Outline
  • The definition
  • VC dimension of perceptrons
  • Interpreting the VC dimension
  • Generalization bounds

SLIDE 18 1. Degrees of freedom
Parameters create degrees of freedom

# of parameters: analog degrees of freedom

$d_{\mathrm{vc}}$: equivalent 'binary' degrees of freedom

SLIDE 19 The usual suspects
Positive rays ($d_{\mathrm{vc}} = 1$):

[Figure: points $x_1, x_2, x_3, \dots, x_N$ on a line with a threshold $a$; $h(x) = -1$ on one side of $a$ and $h(x) = +1$ on the other]

Positive intervals ($d_{\mathrm{vc}} = 2$):

[Figure: points $x_1, x_2, x_3, \dots, x_N$ on a line; $h(x) = +1$ inside the interval and $h(x) = -1$ on either side of it]
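Both values can be confirmed by brute force, since on a line only the ordering of distinct points matters. The sketch below enumerates the dichotomies each model produces on a given set of distinct points and checks whether all $2^N$ appear. It assumes positive rays are $h(x) = \operatorname{sign}(x - a)$ and positive intervals return $+1$ strictly inside the interval; the function names are illustrative.

```python
def _cuts(points):
    """Candidate thresholds: below all points, between neighbours, above all points."""
    xs = sorted(points)
    return [xs[0] - 1] + [(u + v) / 2 for u, v in zip(xs, xs[1:])] + [xs[-1] + 1]

def ray_dichotomies(points):
    """All dichotomies h(x) = sign(x - a) produces on distinct points."""
    return {tuple(+1 if x > a else -1 for x in points) for a in _cuts(points)}

def interval_dichotomies(points):
    """All dichotomies produced by h(x) = +1 inside (lo, hi), -1 outside."""
    cuts = _cuts(points)
    return {tuple(+1 if lo < x < hi else -1 for x in points)
            for lo in cuts for hi in cuts if lo <= hi}

def is_shattered(points, dichotomies):
    """True when the model realises all 2^N dichotomies on the points."""
    return len(dichotomies(points)) == 2 ** len(points)

if __name__ == "__main__":
    print(is_shattered([1.0], ray_dichotomies), is_shattered([1.0, 2.0], ray_dichotomies))
    print(is_shattered([1.0, 2.0], interval_dichotomies),
          is_shattered([1.0, 2.0, 3.0], interval_dichotomies))
```

Positive rays shatter one point but not two, and positive intervals shatter two points but not three, consistent with $d_{\mathrm{vc}} = 1$ and $d_{\mathrm{vc}} = 2$.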

SLIDE 20 Not just parameters
Parameters may not contribute degrees of freedom:

[Figure: diagram of a model taking input $x$ to output $y$]

$d_{\mathrm{vc}}$ measures the effective number of parameters

SLIDE 21 2. Number of data points needed
Two small quantities in the VC inequality:

$$\mathbb{P}\left[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,\right] \le \underbrace{4\,m_{\mathcal{H}}(2N)\,e^{-\frac{1}{8}\epsilon^2 N}}_{\delta}$$

If we want certain $\epsilon$ and $\delta$, how does $N$ depend on $d_{\mathrm{vc}}$?

Let us look at $N^d e^{-N}$

SLIDE 22 $N^d e^{-N}$
Fix $N^d e^{-N}$ = small value

How does $N$ change with $d$?

Rule of thumb: $N \ge 10\, d_{\mathrm{vc}}$

[Figure: plot of $N^{30} e^{-N}$ against $N$, for $N$ from 20 to 200, on a logarithmic vertical axis]
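The question on this slide can be explored numerically. The sketch below finds, for each $d$, the smallest integer $N$ past the peak of $N^d e^{-N}$ (which sits at $N = d$) where the expression drops to a chosen small value. It works with logarithms to avoid overflow. The target `1e-5` and the function name are arbitrary choices for illustration.

```python
import math

def smallest_n(d, target=1e-5):
    """Smallest integer N >= d with N**d * exp(-N) <= target.

    For N >= d the expression is decreasing in N, so a linear scan from the
    peak is enough. The comparison is done on d*ln(N) - N to avoid overflow.
    """
    log_target = math.log(target)
    n = max(d, 1)
    while d * math.log(n) - n > log_target:
        n += 1
    return n

if __name__ == "__main__":
    for d in (5, 10, 20, 30):
        n = smallest_n(d)
        print(d, n, round(n / d, 2))
```

In runs of this kind the ratio $N / d$ stays roughly constant as $d$ grows, which is the proportionality the slide points to. The constant obtained depends on the arbitrary target and on the simplified expression, so it should not be read as the lecture's practical factor of 10.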

SLIDE 23 Outline
  • The definition
  • VC dimension of perceptrons
  • Interpreting the VC dimension
  • Generalization bounds

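SLIDE 24 Rearranging things
Start from the VC inequality:

$$\mathbb{P}\left[\,|E_{\text{out}} - E_{\text{in}}| > \epsilon\,\right] \le \underbrace{4\,m_{\mathcal{H}}(2N)\,e^{-\frac{1}{8}\epsilon^2 N}}_{\delta}$$

Get $\epsilon$ in terms of $\delta$:

$$\delta = 4\,m_{\mathcal{H}}(2N)\,e^{-\frac{1}{8}\epsilon^2 N} \implies \epsilon = \underbrace{\sqrt{\frac{8}{N} \ln \frac{4\,m_{\mathcal{H}}(2N)}{\delta}}}_{\Omega}$$

With probability $\ge 1 - \delta$,

$$|E_{\text{out}} - E_{\text{in}}| \le \Omega(N, \mathcal{H}, \delta)$$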
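To see how $\Omega$ behaves, it can be evaluated numerically. Since $m_{\mathcal{H}}(2N)$ is generally not known exactly, the sketch below substitutes the polynomial upper bound $\sum_{i=0}^{d_{\mathrm{vc}}} \binom{2N}{i}$ from earlier in the lecture, which makes the result an upper estimate of $\Omega$. The function name and the sample values of $N$, $d_{\mathrm{vc}}$ and $\delta$ are illustrative choices.

```python
from math import comb, log, sqrt

def omega(N, d_vc, delta):
    """sqrt((8/N) * ln(4 m_H(2N) / delta)), with m_H(2N) replaced by the
    upper bound sum_{i=0}^{d_vc} C(2N, i)."""
    growth_2n = sum(comb(2 * N, i) for i in range(d_vc + 1))
    # Split the logarithm so that very large integers never become floats.
    return sqrt(8.0 / N * (log(4.0) + log(growth_2n) - log(delta)))

if __name__ == "__main__":
    for N in (100, 1_000, 10_000, 100_000):
        print(N, omega(N, d_vc=3, delta=0.05))
```

The estimate shrinks as $N$ grows and increases with $d_{\mathrm{vc}}$, which is the trade-off the bound expresses.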

SLIDE 25 Generalization bound
With probability $\ge 1 - \delta$,

$$|E_{\text{out}} - E_{\text{in}}| \le \Omega(N, \mathcal{H}, \delta)$$

$\implies$

With probability $\ge 1 - \delta$,

$$E_{\text{out}} \le E_{\text{in}} + \Omega$$
