

SLIDE 1 Evaluating Hypotheses

[Read Ch. 5] [Recommended exercises: 5.2, 5.3, 5.4]

• Sample error, true error
• Confidence intervals for observed hypothesis error
• Estimators
• Binomial distribution, Normal distribution, Central Limit Theorem
• Paired t tests
• Comparing learning methods

Lecture slides for textbook Machine Learning, T. Mitchell, McGraw Hill, 1997
SLIDE 2 Two Definitions of Error

The true error of hypothesis h with respect to target function f and distribution D is the probability that h will misclassify an instance drawn at random according to D:

  error_D(h) ≡ Pr_{x∈D}[ f(x) ≠ h(x) ]

The sample error of h with respect to target function f and data sample S is the proportion of examples h misclassifies:

  error_S(h) ≡ (1/n) Σ_{x∈S} δ(f(x) ≠ h(x))

where δ(f(x) ≠ h(x)) is 1 if f(x) ≠ h(x), and 0 otherwise.

How well does error_S(h) estimate error_D(h)?
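The sample-error definition above translates directly into code. A minimal sketch (the hypothesis, target function, and sample below are illustrative stand-ins, not from the slides):

```python
# Sample error error_S(h): fraction of examples in S that h misclassifies,
# i.e. (1/n) * sum over x in S of delta(f(x) != h(x)).
def sample_error(h, f, S):
    return sum(1 for x in S if h(x) != f(x)) / len(S)

# Toy example: target labels x > 0, hypothesis thresholds at 1 instead of 0.
f = lambda x: x > 0
h = lambda x: x > 1
S = [-2, -1, 0.5, 1.5, 2, 3]
err = sample_error(h, f, S)  # only x = 0.5 is misclassified -> 1/6
```

The true error error_D(h) cannot be computed this way, since it requires integrating over the full distribution D; the sample error is its estimator.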
SLIDE 3 Problems Estimating Error

1. Bias: If S is the training set, error_S(h) is optimistically biased:

     bias ≡ E[error_S(h)] − error_D(h)

   For an unbiased estimate, h and S must be chosen independently.

2. Variance: Even with unbiased S, error_S(h) may still vary from error_D(h).
SLIDE 4 Example

Hypothesis h misclassifies 12 of the 40 examples in S:

  error_S(h) = 12/40 = .30

What is error_D(h)?
SLIDE 5 Estimators

Experiment:
1. Choose sample S of size n according to distribution D
2. Measure error_S(h)

error_S(h) is a random variable (i.e., the result of an experiment).

error_S(h) is an unbiased estimator for error_D(h).

Given observed error_S(h), what can we conclude about error_D(h)?
SLIDE 6 Confidence Intervals

If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30

Then
• With approximately 95% probability, error_D(h) lies in the interval

    error_S(h) ± 1.96 √( error_S(h)(1 − error_S(h)) / n )
SLIDE 7 Confidence Intervals

If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30

Then
• With approximately N% probability, error_D(h) lies in the interval

    error_S(h) ± z_N √( error_S(h)(1 − error_S(h)) / n )

where

  N%:  50%  68%  80%  90%  95%  98%  99%
  z_N: 0.67 1.00 1.28 1.64 1.96 2.33 2.58
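The interval formula and z_N table above are easy to compute directly. A minimal sketch (the function name is my own; the example plugs in slide 4's numbers, 12/40 misclassified):

```python
import math

# z_N values from the slide's table, keyed by confidence level in percent.
Z = {50: 0.67, 68: 1.00, 80: 1.28, 90: 1.64, 95: 1.96, 98: 2.33, 99: 2.58}

def error_confidence_interval(error_s, n, confidence=95):
    # error_S(h) +/- z_N * sqrt(error_S(h)(1 - error_S(h)) / n); needs n >= 30.
    half = Z[confidence] * math.sqrt(error_s * (1 - error_s) / n)
    return (error_s - half, error_s + half)

# Slide 4's example: error_S(h) = 0.30 on n = 40 examples.
lo, hi = error_confidence_interval(0.30, 40)  # roughly (0.16, 0.44)
```

Note how wide the interval is at n = 40: observing 30% sample error says only that the true error is very likely somewhere between about 16% and 44%.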
SLIDE 8 error_S(h) is a Random Variable

Rerun the experiment with different randomly drawn S (of size n).

Probability of observing r misclassified examples:

[Figure: Binomial distribution of P(r) for n = 40, p = 0.3]

  P(r) = ( n! / (r!(n − r)!) ) · error_D(h)^r · (1 − error_D(h))^(n−r)
SLIDE 9 Binomial Probability Distribution

[Figure: Binomial distribution for n = 40, p = 0.3]

  P(r) = ( n! / (r!(n − r)!) ) · p^r · (1 − p)^(n−r)

Probability P(r) of r heads in n coin flips, if p = Pr(heads).

• Expected, or mean value of X, E[X], is

    E[X] ≡ Σ_{i=0}^{n} i·P(i) = np

• Variance of X is

    Var(X) ≡ E[(X − E[X])²] = np(1 − p)

• Standard deviation of X, σ_X, is

    σ_X ≡ √( E[(X − E[X])²] ) = √( np(1 − p) )
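These binomial quantities can be checked numerically for the slide's running example (n = 40, p = 0.3); a small sketch:

```python
import math

# P(r) = C(n, r) * p^r * (1 - p)^(n - r), the probability of r misclassifications.
def binomial_p(r, n, p):
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 40, 0.3
mean = n * p                 # E[X] = np = 12
var = n * p * (1 - p)        # Var(X) = np(1 - p) = 8.4
sigma = math.sqrt(var)       # standard deviation

# Sanity checks: probabilities sum to 1, and the weighted sum recovers np.
total = sum(binomial_p(r, n, p) for r in range(n + 1))
empirical_mean = sum(r * binomial_p(r, n, p) for r in range(n + 1))
```

So an experiment drawing 40 examples when error_D(h) = 0.3 yields 12 misclassifications on average, with standard deviation √8.4 ≈ 2.9, matching the spread visible in the slide's figure.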
SLIDE 10 Normal Distribution Approximates Binomial

error_S(h) follows a Binomial distribution, with
• mean

    μ_{error_S(h)} = error_D(h)

• standard deviation

    σ_{error_S(h)} = √( error_D(h)(1 − error_D(h)) / n )

Approximate this by a Normal distribution with
• mean

    μ_{error_S(h)} = error_D(h)

• standard deviation

    σ_{error_S(h)} ≈ √( error_S(h)(1 − error_S(h)) / n )
SLIDE 11 Normal Probability Distribution

[Figure: Normal distribution with mean 0, standard deviation 1]

  p(x) = (1 / √(2πσ²)) · e^( −(1/2)·((x − μ)/σ)² )

The probability that X will fall into the interval (a, b) is given by

  ∫_a^b p(x) dx

• Expected, or mean value of X, E[X], is E[X] = μ
• Variance of X is Var(X) = σ²
• Standard deviation of X, σ_X, is σ_X = σ
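The density and the interval probability above can be evaluated numerically; a sketch using a simple midpoint-rule integration (the function names and step count are my own choices):

```python
import math

# Normal density p(x) with mean mu and standard deviation sigma.
def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma**2)

# Probability that X falls in (a, b): integral of p(x) dx, midpoint rule.
def prob_in_interval(a, b, mu=0.0, sigma=1.0, steps=10_000):
    dx = (b - a) / steps
    return sum(normal_pdf(a + (i + 0.5) * dx, mu, sigma) for i in range(steps)) * dx

p95 = prob_in_interval(-1.96, 1.96)  # about 0.95, matching the z_N table
p68 = prob_in_interval(-1.00, 1.00)  # about 0.68
```

This numerically recovers the z_N table on the next slide: the central ±z_N interval of the standard Normal holds N% of the probability mass.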
SLIDE 12 Normal Probability Distribution

[Figure: standard Normal distribution (mean 0, standard deviation 1)]

80% of area (probability) lies in μ ± 1.28σ

N% of area (probability) lies in μ ± z_N·σ

  N%:  50%  68%  80%  90%  95%  98%  99%
  z_N: 0.67 1.00 1.28 1.64 1.96 2.33 2.58
SLIDE 13 Confidence Intervals, More Correctly

If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30

Then
• With approximately 95% probability, error_S(h) lies in the interval

    error_D(h) ± 1.96 √( error_D(h)(1 − error_D(h)) / n )

  equivalently, error_D(h) lies in the interval

    error_S(h) ± 1.96 √( error_D(h)(1 − error_D(h)) / n )

  which is approximately

    error_S(h) ± 1.96 √( error_S(h)(1 − error_S(h)) / n )
SLIDE 14 Central Limit Theorem

Consider a set of independent, identically distributed random variables Y_1 ... Y_n, all governed by an arbitrary probability distribution with mean μ and finite variance σ². Define the sample mean

  Ȳ ≡ (1/n) Σ_{i=1}^{n} Y_i

Central Limit Theorem. As n → ∞, the distribution governing Ȳ approaches a Normal distribution, with mean μ and variance σ²/n.
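The theorem's claim is easy to observe empirically. A sketch (my own illustrative setup, not from the slides): draw many sample means of n values from a decidedly non-Normal distribution and check that their mean is near μ and their variance near σ²/n.

```python
import random
import statistics

random.seed(0)
n, trials = 100, 2000

# Exponential with rate 1 is skewed, with mu = 1 and sigma^2 = 1.
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

mean_of_means = statistics.fmean(means)    # close to mu = 1
var_of_means = statistics.variance(means)  # close to sigma^2 / n = 0.01
```

A histogram of `means` would look bell-shaped even though each Y_i is exponential; this is why error_S(h), itself a sample mean of 0/1 losses, is well approximated by a Normal for n ≥ 30.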
SLIDE 15 Calculating Confidence Intervals

1. Pick parameter p to estimate
   • error_D(h)
2. Choose an estimator
   • error_S(h)
3. Determine the probability distribution that governs the estimator
   • error_S(h) is governed by a Binomial distribution, approximated by a Normal when n ≥ 30
4. Find interval (L, U) such that N% of the probability mass falls in the interval
   • Use table of z_N values
SLIDE 16 Difference Between Hypotheses

Test h_1 on sample S_1, test h_2 on S_2.

1. Pick parameter to estimate

     d ≡ error_D(h_1) − error_D(h_2)

2. Choose an estimator

     d̂ ≡ error_{S_1}(h_1) − error_{S_2}(h_2)

3. Determine the probability distribution that governs the estimator:

     σ_d̂ ≈ √( error_{S_1}(h_1)(1 − error_{S_1}(h_1)) / n_1 + error_{S_2}(h_2)(1 − error_{S_2}(h_2)) / n_2 )

4. Find interval (L, U) such that N% of the probability mass falls in the interval:

     d̂ ± z_N √( error_{S_1}(h_1)(1 − error_{S_1}(h_1)) / n_1 + error_{S_2}(h_2)(1 − error_{S_2}(h_2)) / n_2 )
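Steps 3 and 4 above combine into one small computation; a sketch (the function name and the example error rates are illustrative):

```python
import math

# N% interval for d = error_D(h1) - error_D(h2), from independent samples
# S1 (error rate e1, size n1) and S2 (error rate e2, size n2).
def difference_interval(e1, n1, e2, n2, z_n=1.96):
    d_hat = e1 - e2
    sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return (d_hat - z_n * sigma, d_hat + z_n * sigma)

# Hypothetical case: h1 errs 30% on 100 examples, h2 errs 20% on 100.
lo, hi = difference_interval(0.30, 100, 0.20, 100)
```

In this hypothetical case the 95% interval for d spans roughly (−0.02, 0.22): it contains zero, so a 10-point observed gap on samples of 100 does not establish that h_1 is truly worse than h_2.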
SLIDE 17 Paired t test to compare h_A, h_B

1. Partition data into k disjoint test sets T_1, T_2, ..., T_k of equal size, where this size is at least 30.
2. For i from 1 to k, do

     δ_i ← error_{T_i}(h_A) − error_{T_i}(h_B)

3. Return the value δ̄, where

     δ̄ ≡ (1/k) Σ_{i=1}^{k} δ_i

N% confidence interval estimate for d:

  δ̄ ± t_{N,k−1} · s_δ̄

where

  s_δ̄ ≡ √( (1 / (k(k − 1))) Σ_{i=1}^{k} (δ_i − δ̄)² )

Note: δ_i is approximately Normally distributed.
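The interval computation above can be sketched as follows; the δ_i values here are hypothetical per-test-set error differences, and the t value is the standard two-tailed t_{95, k−1=5} ≈ 2.571 (taken from a t table, not computed):

```python
import math

# Paired-t interval: delta_bar +/- t_value * s, where s is the standard error
# of the mean of the per-test-set differences delta_i.
def paired_t_interval(deltas, t_value):
    k = len(deltas)
    d_bar = sum(deltas) / k
    s = math.sqrt(sum((d - d_bar) ** 2 for d in deltas) / (k * (k - 1)))
    return (d_bar - t_value * s, d_bar + t_value * s)

deltas = [0.05, 0.02, 0.04, 0.00, 0.03, 0.04]      # hypothetical, k = 6
lo, hi = paired_t_interval(deltas, t_value=2.571)  # t_{95, 5} ~ 2.571
```

Because the same test sets are used for both hypotheses, the pairing cancels test-set-to-test-set variation, typically giving a much tighter interval than comparing two independent error estimates.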
SLIDE 18 Comparing learning algorithms L_A and L_B

What we'd like to estimate:

  E_{S⊂D}[ error_D(L_A(S)) − error_D(L_B(S)) ]

where L(S) is the hypothesis output by learner L using training set S; i.e., the expected difference in true error between hypotheses output by learners L_A and L_B, when trained using randomly selected training sets S drawn according to distribution D.

But, given a limited data sample D_0, what is a good estimator?
• Could partition D_0 into training set S and test set T, and measure

    error_T(L_A(S)) − error_T(L_B(S))

• Even better, repeat this many times and average the results (next slide).
SLIDE 19 Comparing learning algorithms L_A and L_B

1. Partition data D_0 into k disjoint test sets T_1, T_2, ..., T_k of equal size, where this size is at least 30.
2. For i from 1 to k, do (use T_i for the test set, and the remaining data for training set S_i):
   • S_i ← {D_0 − T_i}
   • h_A ← L_A(S_i)
   • h_B ← L_B(S_i)
   • δ_i ← error_{T_i}(h_A) − error_{T_i}(h_B)
3. Return the value δ̄, where

     δ̄ ≡ (1/k) Σ_{i=1}^{k} δ_i
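The k-fold procedure above can be sketched end to end. Everything below the function is an illustrative stand-in: a toy dataset labeled by x > 0, a "learner" A that returns the true threshold rule and a "learner" B that predicts its training set's majority class.

```python
import random

random.seed(1)

# Steps 1-3 of the slide: k disjoint test folds, train on the rest,
# average the per-fold error differences delta_i.
def compare_learners(learner_a, learner_b, data, k):
    data = list(data)
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]          # disjoint test sets T_i
    deltas = []
    for i in range(k):
        test = folds[i]
        train = [ex for j in range(k) if j != i for ex in folds[j]]  # S_i = D0 - T_i
        h_a, h_b = learner_a(train), learner_b(train)
        err = lambda h: sum(h(x) != y for x, y in test) / len(test)
        deltas.append(err(h_a) - err(h_b))          # delta_i
    return sum(deltas) / k                          # delta-bar

def learner_a(train):
    return lambda x: x > 0          # happens to match the target exactly

def learner_b(train):
    majority = sum(y for _, y in train) * 2 > len(train)
    return lambda x: majority       # constant majority-class predictor

data = [(x, x > 0) for x in range(-30, 31)]
d_bar = compare_learners(learner_a, learner_b, data, k=3)  # negative: A errs less
```

Note the folds here are smaller than the slide's "at least 30" guideline; a real comparison would use more data per fold.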
SLIDE 20 Comparing learning algorithms L_A and L_B

Notice we'd like to use the paired t test on δ̄ to obtain a confidence interval.

But this is not really correct, because the training sets in this algorithm are not independent (they overlap!).

It is more correct to view the algorithm as producing an estimate of

  E_{S⊂D_0}[ error_D(L_A(S)) − error_D(L_B(S)) ]

instead of

  E_{S⊂D}[ error_D(L_A(S)) − error_D(L_B(S)) ]

but even this approximation is better than no comparison.