

SLIDE 1 Evaluating Hypotheses

[Read Ch. 5] [Recommended exercises: 5.2, 5.3, 5.4]

• Sample error, true error
• Confidence intervals for observed hypothesis error
• Estimators
• Binomial distribution, Normal distribution, Central Limit Theorem
• Paired t tests
• Comparing learning methods

Lecture slides for textbook Machine Learning, T. Mitchell, McGraw Hill, 1997
SLIDE 2 Two Definitions of Error

The true error of hypothesis h with respect to target function f and distribution D is the probability that h will misclassify an instance drawn at random according to D:

  error_D(h) ≡ Pr_{x∈D}[ f(x) ≠ h(x) ]

The sample error of h with respect to target function f and data sample S is the proportion of examples h misclassifies:

  error_S(h) ≡ (1/n) Σ_{x∈S} δ(f(x) ≠ h(x))

where δ(f(x) ≠ h(x)) is 1 if f(x) ≠ h(x), and 0 otherwise.

How well does error_S(h) estimate error_D(h)?
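The sample-error definition above translates directly into code. A minimal sketch (the hypothesis, target function, and sample below are illustrative stand-ins, not from the slides):

```python
# Sample error error_S(h): fraction of examples in S that h misclassifies,
# i.e. (1/n) * sum over x in S of delta(f(x) != h(x)).
def sample_error(h, f, S):
    return sum(1 for x in S if h(x) != f(x)) / len(S)

# Toy example: target labels x > 0, hypothesis thresholds at 1 instead of 0.
f = lambda x: x > 0
h = lambda x: x > 1
S = [-2, -1, 0.5, 1.5, 2, 3]
err = sample_error(h, f, S)  # only x = 0.5 is misclassified -> 1/6
```

The true error error_D(h) cannot be computed this way, since it requires integrating over the full distribution D; the sample error is its estimator.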
SLIDE 3 Problems Estimating Error

1. Bias: If S is the training set, error_S(h) is optimistically biased:

     bias ≡ E[error_S(h)] − error_D(h)

   For an unbiased estimate, h and S must be chosen independently.

2. Variance: Even with unbiased S, error_S(h) may still vary from error_D(h).
SLIDE 4 Example

Hypothesis h misclassifies 12 of the 40 examples in S:

  error_S(h) = 12/40 = .30

What is error_D(h)?
SLIDE 5 Estimators

Experiment:
1. Choose sample S of size n according to distribution D
2. Measure error_S(h)

error_S(h) is a random variable (i.e., the result of an experiment).

error_S(h) is an unbiased estimator for error_D(h).

Given observed error_S(h), what can we conclude about error_D(h)?
SLIDE 6 Confidence Intervals

If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30

Then
• With approximately 95% probability, error_D(h) lies in the interval

    error_S(h) ± 1.96 √( error_S(h)(1 − error_S(h)) / n )
SLIDE 7 Confidence Intervals

If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30

Then
• With approximately N% probability, error_D(h) lies in the interval

    error_S(h) ± z_N √( error_S(h)(1 − error_S(h)) / n )

where

  N%:  50%  68%  80%  90%  95%  98%  99%
  z_N: 0.67 1.00 1.28 1.64 1.96 2.33 2.58
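The interval formula and z_N table above are easy to compute directly. A minimal sketch (the function name is my own; the example plugs in slide 4's numbers, 12/40 misclassified):

```python
import math

# z_N values from the slide's table, keyed by confidence level in percent.
Z = {50: 0.67, 68: 1.00, 80: 1.28, 90: 1.64, 95: 1.96, 98: 2.33, 99: 2.58}

def error_confidence_interval(error_s, n, confidence=95):
    # error_S(h) +/- z_N * sqrt(error_S(h)(1 - error_S(h)) / n); needs n >= 30.
    half = Z[confidence] * math.sqrt(error_s * (1 - error_s) / n)
    return (error_s - half, error_s + half)

# Slide 4's example: error_S(h) = 0.30 on n = 40 examples.
lo, hi = error_confidence_interval(0.30, 40)  # roughly (0.16, 0.44)
```

Note how wide the interval is at n = 40: observing 30% sample error says only that the true error is very likely somewhere between about 16% and 44%.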
SLIDE 8 error_S(h) is a Random Variable

Rerun the experiment with different randomly drawn S (of size n).

Probability of observing r misclassified examples:

[Figure: Binomial distribution of P(r) for n = 40, p = 0.3]

  P(r) = ( n! / (r!(n − r)!) ) · error_D(h)^r · (1 − error_D(h))^(n−r)
SLIDE 9 Binomial Probability Distribution

[Figure: Binomial distribution for n = 40, p = 0.3]

  P(r) = ( n! / (r!(n − r)!) ) · p^r · (1 − p)^(n−r)

Probability P(r) of r heads in n coin flips, if p = Pr(heads).

• Expected, or mean value of X, E[X], is

    E[X] ≡ Σ_{i=0}^{n} i·P(i) = np

• Variance of X is

    Var(X) ≡ E[(X − E[X])²] = np(1 − p)

• Standard deviation of X, σ_X, is

    σ_X ≡ √( E[(X − E[X])²] ) = √( np(1 − p) )
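These binomial quantities can be checked numerically for the slide's running example (n = 40, p = 0.3); a small sketch:

```python
import math

# P(r) = C(n, r) * p^r * (1 - p)^(n - r), the probability of r misclassifications.
def binomial_p(r, n, p):
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 40, 0.3
mean = n * p                 # E[X] = np = 12
var = n * p * (1 - p)        # Var(X) = np(1 - p) = 8.4
sigma = math.sqrt(var)       # standard deviation

# Sanity checks: probabilities sum to 1, and the weighted sum recovers np.
total = sum(binomial_p(r, n, p) for r in range(n + 1))
empirical_mean = sum(r * binomial_p(r, n, p) for r in range(n + 1))
```

So an experiment drawing 40 examples when error_D(h) = 0.3 yields 12 misclassifications on average, with standard deviation √8.4 ≈ 2.9, matching the spread visible in the slide's figure.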
SLIDE 10 Normal Distribution Approximates Binomial

error_S(h) follows a Binomial distribution, with
• mean

    μ_{error_S(h)} = error_D(h)

• standard deviation

    σ_{error_S(h)} = √( error_D(h)(1 − error_D(h)) / n )

Approximate this by a Normal distribution with
• mean

    μ_{error_S(h)} = error_D(h)

• standard deviation

    σ_{error_S(h)} ≈ √( error_S(h)(1 − error_S(h)) / n )
SLIDE 11 Normal Probability Distribution

[Figure: Normal distribution with mean 0, standard deviation 1]

  p(x) = (1 / √(2πσ²)) · e^( −(1/2)·((x − μ)/σ)² )

The probability that X will fall into the interval (a, b) is given by

  ∫_a^b p(x) dx

• Expected, or mean value of X, E[X], is E[X] = μ
• Variance of X is Var(X) = σ²
• Standard deviation of X, σ_X, is σ_X = σ
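The density and the interval probability above can be evaluated numerically; a sketch using a simple midpoint-rule integration (the function names and step count are my own choices):

```python
import math

# Normal density p(x) with mean mu and standard deviation sigma.
def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma**2)

# Probability that X falls in (a, b): integral of p(x) dx, midpoint rule.
def prob_in_interval(a, b, mu=0.0, sigma=1.0, steps=10_000):
    dx = (b - a) / steps
    return sum(normal_pdf(a + (i + 0.5) * dx, mu, sigma) for i in range(steps)) * dx

p95 = prob_in_interval(-1.96, 1.96)  # about 0.95, matching the z_N table
p68 = prob_in_interval(-1.00, 1.00)  # about 0.68
```

This numerically recovers the z_N table on the next slide: the central ±z_N interval of the standard Normal holds N% of the probability mass.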
SLIDE 12 Normal Probability Distribution

[Figure: standard Normal distribution (mean 0, standard deviation 1)]

80% of area (probability) lies in μ ± 1.28σ

N% of area (probability) lies in μ ± z_N·σ

  N%:  50%  68%  80%  90%  95%  98%  99%
  z_N: 0.67 1.00 1.28 1.64 1.96 2.33 2.58
SLIDE 13 Confidence Intervals, More Correctly

If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30

Then
• With approximately 95% probability, error_S(h) lies in the interval

    error_D(h) ± 1.96 √( error_D(h)(1 − error_D(h)) / n )

  equivalently, error_D(h) lies in the interval

    error_S(h) ± 1.96 √( error_D(h)(1 − error_D(h)) / n )

  which is approximately

    error_S(h) ± 1.96 √( error_S(h)(1 − error_S(h)) / n )
SLIDE 14 Central Limit Theorem

Consider a set of independent, identically distributed random variables Y_1 ... Y_n, all governed by an arbitrary probability distribution with mean μ and finite variance σ². Define the sample mean

  Ȳ ≡ (1/n) Σ_{i=1}^{n} Y_i

Central Limit Theorem. As n → ∞, the distribution governing Ȳ approaches a Normal distribution, with mean μ and variance σ²/n.
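The theorem's claim is easy to observe empirically. A sketch (my own illustrative setup, not from the slides): draw many sample means of n values from a decidedly non-Normal distribution and check that their mean is near μ and their variance near σ²/n.

```python
import random
import statistics

random.seed(0)
n, trials = 100, 2000

# Exponential with rate 1 is skewed, with mu = 1 and sigma^2 = 1.
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

mean_of_means = statistics.fmean(means)    # close to mu = 1
var_of_means = statistics.variance(means)  # close to sigma^2 / n = 0.01
```

A histogram of `means` would look bell-shaped even though each Y_i is exponential; this is why error_S(h), itself a sample mean of 0/1 losses, is well approximated by a Normal for n ≥ 30.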
SLIDE 15 Calculating Confidence Intervals

1. Pick parameter p to estimate
   • error_D(h)
2. Choose an estimator
   • error_S(h)
3. Determine the probability distribution that governs the estimator
   • error_S(h) is governed by a Binomial distribution, approximated by a Normal when n ≥ 30
4. Find interval (L, U) such that N% of the probability mass falls in the interval
   • Use table of z_N values
SLIDE 16 Difference Between Hypotheses

Test h_1 on sample S_1, test h_2 on S_2.

1. Pick parameter to estimate

     d ≡ error_D(h_1) − error_D(h_2)

2. Choose an estimator

     d̂ ≡ error_{S_1}(h_1) − error_{S_2}(h_2)

3. Determine the probability distribution that governs the estimator:

     σ_d̂ ≈ √( error_{S_1}(h_1)(1 − error_{S_1}(h_1)) / n_1 + error_{S_2}(h_2)(1 − error_{S_2}(h_2)) / n_2 )

4. Find interval (L, U) such that N% of the probability mass falls in the interval:

     d̂ ± z_N √( error_{S_1}(h_1)(1 − error_{S_1}(h_1)) / n_1 + error_{S_2}(h_2)(1 − error_{S_2}(h_2)) / n_2 )
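Steps 3 and 4 above combine into one small computation; a sketch (the function name and the example error rates are illustrative):

```python
import math

# N% interval for d = error_D(h1) - error_D(h2), from independent samples
# S1 (error rate e1, size n1) and S2 (error rate e2, size n2).
def difference_interval(e1, n1, e2, n2, z_n=1.96):
    d_hat = e1 - e2
    sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return (d_hat - z_n * sigma, d_hat + z_n * sigma)

# Hypothetical case: h1 errs 30% on 100 examples, h2 errs 20% on 100.
lo, hi = difference_interval(0.30, 100, 0.20, 100)
```

In this hypothetical case the 95% interval for d spans roughly (−0.02, 0.22): it contains zero, so a 10-point observed gap on samples of 100 does not establish that h_1 is truly worse than h_2.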
SLIDE 17 Paired t test to compare h_A, h_B

1. Partition data into k disjoint test sets T_1, T_2, ..., T_k of equal size, where this size is at least 30.
2. For i from 1 to k, do

     δ_i ← error_{T_i}(h_A) − error_{T_i}(h_B)

3. Return the value δ̄, where

     δ̄ ≡ (1/k) Σ_{i=1}^{k} δ_i

N% confidence interval estimate for d:

  δ̄ ± t_{N,k−1} · s_δ̄

where

  s_δ̄ ≡ √( (1 / (k(k − 1))) Σ_{i=1}^{k} (δ_i − δ̄)² )

Note: δ_i is approximately Normally distributed.
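The interval computation above can be sketched as follows; the δ_i values here are hypothetical per-test-set error differences, and the t value is the standard two-tailed t_{95, k−1=5} ≈ 2.571 (taken from a t table, not computed):

```python
import math

# Paired-t interval: delta_bar +/- t_value * s, where s is the standard error
# of the mean of the per-test-set differences delta_i.
def paired_t_interval(deltas, t_value):
    k = len(deltas)
    d_bar = sum(deltas) / k
    s = math.sqrt(sum((d - d_bar) ** 2 for d in deltas) / (k * (k - 1)))
    return (d_bar - t_value * s, d_bar + t_value * s)

deltas = [0.05, 0.02, 0.04, 0.00, 0.03, 0.04]      # hypothetical, k = 6
lo, hi = paired_t_interval(deltas, t_value=2.571)  # t_{95, 5} ~ 2.571
```

Because the same test sets are used for both hypotheses, the pairing cancels test-set-to-test-set variation, typically giving a much tighter interval than comparing two independent error estimates.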
SLIDE 18 Comparing learning algorithms L_A and L_B

What we'd like to estimate:

  E_{S⊂D}[ error_D(L_A(S)) − error_D(L_B(S)) ]

where L(S) is the hypothesis output by learner L using training set S; i.e., the expected difference in true error between hypotheses output by learners L_A and L_B, when trained using randomly selected training sets S drawn according to distribution D.

But, given a limited data sample D_0, what is a good estimator?
• Could partition D_0 into training set S and test set T, and measure

    error_T(L_A(S)) − error_T(L_B(S))

• Even better, repeat this many times and average the results (next slide).
SLIDE 19 Comparing learning algorithms L_A and L_B

1. Partition data D_0 into k disjoint test sets T_1, T_2, ..., T_k of equal size, where this size is at least 30.
2. For i from 1 to k, do (use T_i for the test set, and the remaining data for training set S_i):
   • S_i ← {D_0 − T_i}
   • h_A ← L_A(S_i)
   • h_B ← L_B(S_i)
   • δ_i ← error_{T_i}(h_A) − error_{T_i}(h_B)
3. Return the value δ̄, where

     δ̄ ≡ (1/k) Σ_{i=1}^{k} δ_i
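The k-fold procedure above can be sketched end to end. Everything below the function is an illustrative stand-in: a toy dataset labeled by x > 0, a "learner" A that returns the true threshold rule and a "learner" B that predicts its training set's majority class.

```python
import random

random.seed(1)

# Steps 1-3 of the slide: k disjoint test folds, train on the rest,
# average the per-fold error differences delta_i.
def compare_learners(learner_a, learner_b, data, k):
    data = list(data)
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]          # disjoint test sets T_i
    deltas = []
    for i in range(k):
        test = folds[i]
        train = [ex for j in range(k) if j != i for ex in folds[j]]  # S_i = D0 - T_i
        h_a, h_b = learner_a(train), learner_b(train)
        err = lambda h: sum(h(x) != y for x, y in test) / len(test)
        deltas.append(err(h_a) - err(h_b))          # delta_i
    return sum(deltas) / k                          # delta-bar

def learner_a(train):
    return lambda x: x > 0          # happens to match the target exactly

def learner_b(train):
    majority = sum(y for _, y in train) * 2 > len(train)
    return lambda x: majority       # constant majority-class predictor

data = [(x, x > 0) for x in range(-30, 31)]
d_bar = compare_learners(learner_a, learner_b, data, k=3)  # negative: A errs less
```

Note the folds here are smaller than the slide's "at least 30" guideline; a real comparison would use more data per fold.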
SLIDE 20 Comparing learning algorithms L_A and L_B

Notice we'd like to use the paired t test on δ̄ to obtain a confidence interval.

But this is not really correct, because the training sets in this algorithm are not independent (they overlap!).

It is more correct to view the algorithm as producing an estimate of

  E_{S⊂D_0}[ error_D(L_A(S)) − error_D(L_B(S)) ]

instead of

  E_{S⊂D}[ error_D(L_A(S)) − error_D(L_B(S)) ]

but even this approximation is better than no comparison.