Foundations of Machine Learning

page 16

This Lecture
Basic definitions and concepts. Introduction to the problem of learning. Probability tools.
page 17
Definitions

Spaces: input space $X$, output space $Y$.
Loss function: $L \colon Y \times Y \to \mathbb{R}$, where $L(h(x), y)$ is the cost of predicting $h(x)$ instead of the true label $y$.
page 18
Supervised learning set-up: the learner receives a labeled sample
$$S = ((x_1, y_1), \ldots, (x_m, y_m))$$
drawn i.i.d. according to an unknown distribution $D$ over $X \times Y$, and selects a hypothesis $h \in H$.
page 19
Generalization error: for $h \in H$,
$$R(h) = \mathbb{E}_{(x,y)\sim D}[L(h(x), y)].$$
Empirical error: for $h \in H$ and a sample $S$,
$$\widehat{R}(h) = \frac{1}{m}\sum_{i=1}^{m} L(h(x_i), y_i).$$
Bayes error:
$$R^\star = \inf_{h \text{ measurable}} R(h).$$
In the deterministic case, $R^\star = 0$.
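To make the empirical error concrete, here is a small Python sketch (our own toy example, not part of the lecture): it computes $\widehat{R}(h)$ under the zero-one loss in a deterministic setting, so the Bayes error is 0.

```python
import random

def empirical_error(h, sample):
    """Empirical error: average zero-one loss of hypothesis h on sample S."""
    return sum(1 for x, y in sample if h(x) != y) / len(sample)

# Deterministic toy setting: the label is 1 iff x >= 0.5, so R* = 0.
random.seed(0)
sample = [(x, int(x >= 0.5)) for x in (random.random() for _ in range(1000))]

def bayes(x):      # the target concept itself
    return int(x >= 0.5)

def shifted(x):    # a slightly mis-placed threshold
    return int(x >= 0.6)

print(empirical_error(bayes, sample))    # 0.0
print(empirical_error(shifted, sample))  # roughly 0.1
```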
page 20
Noise: for $x \in X$,
$$\mathrm{noise}(x) = \min\{\Pr[1 \mid x], \Pr[0 \mid x]\}.$$
page 23
Error decomposition: for any $h \in H$, with $h^\star$ a best-in-class hypothesis in $H$,
$$R(h) - R^\star = \underbrace{[R(h) - R(h^\star)]}_{\text{estimation}} + \underbrace{[R(h^\star) - R^\star]}_{\text{approximation}}.$$
page 24
Empirical risk minimization: select the hypothesis minimizing the empirical error,
$$h = \operatorname*{argmin}_{h \in H} \widehat{R}(h).$$
page 25
Uniform convergence bounds control
$$\Pr\Big[\sup_{h \in H} \big|R(h) - \widehat{R}(h)\big| > \epsilon\Big].$$
For the ERM solution $h_0$, since $\widehat{R}(h_0) \le \widehat{R}(h^\star)$,
$$R(h_0) - R(h^\star) = R(h_0) - \widehat{R}(h_0) + \widehat{R}(h_0) - R(h^\star) \le R(h_0) - \widehat{R}(h_0) + \widehat{R}(h^\star) - R(h^\star) \le 2 \sup_{h \in H} \big|R(h) - \widehat{R}(h)\big|.$$
How should we choose $H$? (model selection problem)
page 26
Decompose $H$ into hypothesis sets of varying complexity,
$$H = \bigcup_{\gamma \in \Gamma} H_\gamma.$$
[Figure: error versus $\gamma$, showing the trade-off between estimation and approximation errors; their upper bound is minimized at $\gamma^\star$.]
page 27
Structural risk minimization (Vapnik, 1995): consider a nested sequence of hypothesis sets
$$H_1 \subset H_2 \subset \cdots \subset H_n \subset \cdots$$
and select
$$h = \operatorname*{argmin}_{h \in H_n,\, n \in \mathbb{N}} \widehat{R}(h) + \mathrm{penalty}(H_n, m).$$
page 28
Empirical risk minimization:
$$h = \operatorname*{argmin}_{h \in H} \widehat{R}(h).$$
Structural risk minimization: with $H_n \subseteq H_{n+1}$,
$$h = \operatorname*{argmin}_{h \in H_n,\, n \in \mathbb{N}} \widehat{R}(h) + \mathrm{penalty}(H_n, m).$$
Regularization-based algorithms: with $\lambda \ge 0$,
$$h = \operatorname*{argmin}_{h \in H} \widehat{R}(h) + \lambda \|h\|^2.$$
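To make the regularization formulation concrete, here is a minimal one-dimensional ridge-regression sketch (our own illustration; the data and the value of $\lambda$ are assumptions). With the squared loss and $h(x) = wx$, the objective $\widehat{R}(w) + \lambda w^2$ has a closed-form minimizer obtained by setting its derivative to zero:

```python
import random
random.seed(0)

# Objective: (1/m) * sum_i (w*x_i - y_i)^2 + lam * w^2.
# Its derivative vanishes at w = (mean of x*y) / (mean of x^2 + lam).
m, w_true, lam = 200, 2.0, 0.5
xs = [random.gauss(0.0, 1.0) for _ in range(m)]
ys = [w_true * x + 0.1 * random.gauss(0.0, 1.0) for x in xs]

sxy = sum(x * y for x, y in zip(xs, ys)) / m
sxx = sum(x * x for x in xs) / m

w_erm = sxy / sxx          # plain ERM (lam = 0)
w_reg = sxy / (sxx + lam)  # regularized solution, shrunk toward 0
print(w_erm, w_reg)
```

Increasing $\lambda$ shrinks the solution further toward 0, trading a larger empirical error for a smaller hypothesis norm.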
page 30
Basic probability tools:
Union bound: $\Pr[A \vee B] \le \Pr[A] + \Pr[B]$.
Inversion: if $\Pr[X \ge \epsilon] \le f(\epsilon)$, then for any $\delta > 0$, with probability at least $1 - \delta$, $X \le f^{-1}(\delta)$.
Jensen's inequality: if $f$ is convex, $f(\mathbb{E}[X]) \le \mathbb{E}[f(X)]$.
Expectation: if $X \ge 0$, $\mathbb{E}[X] = \int_0^{+\infty} \Pr[X > t]\, dt$.
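The inversion technique is worth spelling out: given a tail bound $\Pr[X \ge \epsilon] \le f(\epsilon)$, set $f(\epsilon) = \delta$ and solve for $\epsilon$. A small sketch (ours) applying it to the two-sided Hoeffding-style bound $f(\epsilon) = 2e^{-2m\epsilon^2}$:

```python
import math

def hoeffding_radius(m, delta):
    """Solve 2*exp(-2*m*eps**2) = delta for eps (inversion technique)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * m))

eps = hoeffding_radius(m=1000, delta=0.05)
# Plugging eps back into the tail bound recovers delta.
print(eps, 2 * math.exp(-2 * 1000 * eps**2))
```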
page 31
Markov's inequality: for $X \ge 0$ and $\epsilon > 0$,
$$\Pr[X \ge \epsilon] \le \frac{\mathbb{E}[X]}{\epsilon}.$$
Chebyshev's inequality: for any $\epsilon > 0$,
$$\Pr\big[|X - \mathbb{E}[X]| \ge \epsilon\big] \le \frac{\sigma_X^2}{\epsilon^2}.$$
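A quick Monte Carlo illustration (ours; the Exponential(1) variable is an arbitrary choice): both bounds hold, though they can be quite loose.

```python
import random
random.seed(0)

# X ~ Exponential(1): X >= 0, E[X] = 1, Var[X] = 1.
N = 200_000
xs = [random.expovariate(1.0) for _ in range(N)]

eps = 3.0
tail = sum(x >= eps for x in xs) / N  # true value: e^{-3}, about 0.0498
print(tail, "<=", 1.0 / eps)          # Markov: E[X]/eps
# Since {X >= 3} is contained in {|X - E[X]| >= 2}, Chebyshev also applies:
print(tail, "<=", 1.0 / 2.0**2)       # Var[X]/eps'^2 with eps' = 2
```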
page 32
Hoeffding's inequality: let $X_1, \ldots, X_m$ be i.i.d. random variables with mean $\mu$ and $X_i \in [a, b]$, $a < b$. Then, for any $\epsilon > 0$,
$$\Pr\Big[\mu - \frac{1}{m}\sum_{i=1}^{m} X_i > \epsilon\Big] \le \exp\Big(\frac{-2m\epsilon^2}{(b-a)^2}\Big), \qquad \Pr\Big[\frac{1}{m}\sum_{i=1}^{m} X_i - \mu > \epsilon\Big] \le \exp\Big(\frac{-2m\epsilon^2}{(b-a)^2}\Big).$$
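A Monte Carlo sanity check of the two-sided version (our own sketch), for Bernoulli(1/2) variables so that $[a, b] = [0, 1]$:

```python
import math
import random
random.seed(1)

m, eps, trials = 100, 0.1, 20_000
bound = 2 * math.exp(-2 * m * eps**2)  # = 2e^{-2}, about 0.271

deviations = sum(
    abs(sum(random.random() < 0.5 for _ in range(m)) / m - 0.5) > eps
    for _ in range(trials)
)
print(deviations / trials, "<=", bound)  # empirical frequency stays below the bound
```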
page 33
McDiarmid's inequality (McDiarmid, 1989): let $X_1, \ldots, X_m$ be independent random variables, and let $f$ satisfy, for all $i$ and all points $x_1, \ldots, x_m, x_i'$,
$$\big|f(x_1, \ldots, x_i, \ldots, x_m) - f(x_1, \ldots, x_i', \ldots, x_m)\big| \le c_i.$$
Then, for any $\epsilon > 0$,
$$\Pr\Big[\big|f(X_1, \ldots, X_m) - \mathbb{E}[f(X_1, \ldots, X_m)]\big| \ge \epsilon\Big] \le 2 \exp\Big(\frac{-2\epsilon^2}{\sum_{i=1}^{m} c_i^2}\Big).$$
page 35
Markov's inequality: for $X \ge 0$ and $t > 0$,
$$\Pr\big[X \ge t\,\mathbb{E}[X]\big] \le \frac{1}{t}.$$
Proof:
$$\Pr\big[X \ge t\,\mathbb{E}[X]\big] = \sum_{x \ge t\,\mathbb{E}[X]} \Pr[X = x] \le \sum_{x \ge t\,\mathbb{E}[X]} \Pr[X = x]\,\frac{x}{t\,\mathbb{E}[X]} \le \sum_{x} \Pr[X = x]\,\frac{x}{t\,\mathbb{E}[X]} = \mathbb{E}\Big[\frac{X}{t\,\mathbb{E}[X]}\Big] = \frac{1}{t}.$$
page 36
Chebyshev's inequality: for $X$ with $\mathrm{Var}[X] < \infty$ and $t > 0$,
$$\Pr\big[|X - \mathbb{E}[X]| \ge t\sigma_X\big] \le \frac{1}{t^2}.$$
Proof: observe that
$$\Pr\big[|X - \mathbb{E}[X]| \ge t\sigma_X\big] = \Pr\big[(X - \mathbb{E}[X])^2 \ge t^2\sigma_X^2\big],$$
and apply Markov's inequality to the non-negative variable $(X - \mathbb{E}[X])^2$, whose expectation is $\sigma_X^2$.
page 37
Weak law of large numbers: let $(X_n)_{n \in \mathbb{N}}$ be a sequence of independent random variables with the same mean $\mu$ and variance $\sigma^2 < \infty$, and let $\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then, for any $\epsilon > 0$,
$$\lim_{n \to \infty} \Pr\big[|\overline{X}_n - \mu| \ge \epsilon\big] = 0.$$
Proof: by independence, $\mathrm{Var}[\overline{X}_n] = \sum_{i=1}^{n} \frac{\sigma^2}{n^2} = \frac{\sigma^2}{n}$, so by Chebyshev's inequality $\Pr[|\overline{X}_n - \mu| \ge \epsilon] \le \frac{\sigma^2}{n\epsilon^2} \to 0$.
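A quick simulation (ours; the Uniform(0,1) draws are an arbitrary choice) shows the tail probability vanishing as $n$ grows:

```python
import random
random.seed(0)

def tail_freq(n, eps=0.1, trials=2000):
    """Empirical Pr[|mean of n Uniform(0,1) draws - 1/2| >= eps]."""
    hits = 0
    for _ in range(trials):
        mean = sum(random.random() for _ in range(n)) / n
        hits += abs(mean - 0.5) >= eps
    return hits / trials

for n in (10, 100, 1000):
    print(n, tail_freq(n))  # frequency shrinks toward 0 as n grows
```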
page 39
Hoeffding's lemma: let $X$ be a random variable with $\mathbb{E}[X] = 0$ and $X \in [a, b]$, $b \ne a$. Then, for any $t > 0$,
$$\mathbb{E}\big[e^{tX}\big] \le e^{\frac{t^2(b-a)^2}{8}}.$$
Proof: by convexity of $x \mapsto e^{tx}$, for $a \le x \le b$,
$$e^{tx} \le \frac{b - x}{b - a}\,e^{ta} + \frac{x - a}{b - a}\,e^{tb}.$$
Thus, using $\mathbb{E}[X] = 0$,
$$\mathbb{E}\big[e^{tX}\big] \le \frac{b}{b - a}\,e^{ta} + \frac{-a}{b - a}\,e^{tb} = e^{\phi(t)},$$
with
$$\phi(t) = \log\Big(\frac{b}{b - a}\,e^{ta} + \frac{-a}{b - a}\,e^{tb}\Big) = ta + \log\Big(\frac{b}{b - a} + \frac{-a}{b - a}\,e^{t(b-a)}\Big).$$
page 40
Taking derivatives,
$$\phi'(t) = a - \frac{a\,e^{t(b-a)}}{\frac{b}{b-a} - \frac{a}{b-a}\,e^{t(b-a)}} = a - \frac{a}{\frac{b}{b-a}\,e^{-t(b-a)} - \frac{a}{b-a}},$$
so $\phi(0) = 0$ and $\phi'(0) = 0$. Furthermore, with $\alpha = \frac{-a}{b-a}$,
$$\phi''(t) = \frac{-ab\,e^{t(b-a)}}{\big[\frac{b}{b-a} - \frac{a}{b-a}\,e^{t(b-a)}\big]^2} = \frac{\alpha(1-\alpha)\,e^{t(b-a)}\,(b-a)^2}{\big[(1-\alpha) + \alpha\,e^{t(b-a)}\big]^2} = u(1-u)(b-a)^2 \le \frac{(b-a)^2}{4},$$
where $u = \frac{\alpha\,e^{t(b-a)}}{(1-\alpha) + \alpha\,e^{t(b-a)}} \in [0, 1]$. By the second-order Taylor expansion, for some $\theta \in [0, t]$,
$$\phi(t) = \phi(0) + t\phi'(0) + \frac{t^2}{2}\phi''(\theta) \le \frac{t^2(b-a)^2}{8}.$$
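A numerical sanity check of the lemma (our own sketch): take the two-point distribution on $\{a, b\}$ with mean zero, which is the extremal case in the convexity step above, and compare its moment-generating function with the bound.

```python
import math

# X = b with prob -a/(b-a), X = a with prob b/(b-a): then E[X] = 0 and X in [a, b].
a, b = -1.0, 3.0
p = -a / (b - a)  # = 0.25

def mgf(t):
    """E[exp(t*X)] for the centered two-point variable."""
    return (1 - p) * math.exp(t * a) + p * math.exp(t * b)

for t in (0.1, 0.5, 1.0):
    print(mgf(t), "<=", math.exp(t**2 * (b - a)**2 / 8))
```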
page 41
Hoeffding's theorem: let $X_1, \ldots, X_m$ be independent random variables with $X_i \in [a_i, b_i]$, and let $S_m = \sum_{i=1}^{m} X_i$. Then, for any $\epsilon > 0$,
$$\Pr\big[S_m - \mathbb{E}[S_m] \ge \epsilon\big] \le e^{-2\epsilon^2 / \sum_{i=1}^{m}(b_i - a_i)^2}, \qquad \Pr\big[S_m - \mathbb{E}[S_m] \le -\epsilon\big] \le e^{-2\epsilon^2 / \sum_{i=1}^{m}(b_i - a_i)^2}.$$
Proof: by the Chernoff bounding technique, for any random variable $X$ and $t > 0$,
$$\Pr[X \ge \epsilon] = \Pr\big[e^{tX} \ge e^{t\epsilon}\big] \le e^{-t\epsilon}\,\mathbb{E}\big[e^{tX}\big].$$
page 42
Applying this to $S_m - \mathbb{E}[S_m]$:
$$\Pr\big[S_m - \mathbb{E}[S_m] \ge \epsilon\big] \le e^{-t\epsilon}\,\prod_{i=1}^{m} \mathbb{E}\big[e^{t(X_i - \mathbb{E}[X_i])}\big] \qquad \text{(independence)}$$
$$\le e^{-t\epsilon}\,\prod_{i=1}^{m} e^{t^2(b_i - a_i)^2/8} \qquad \text{(lemma applied to } X_i - \mathbb{E}[X_i])$$
$$= e^{-t\epsilon}\,e^{t^2 \sum_{i=1}^{m}(b_i - a_i)^2/8} \le e^{-2\epsilon^2 / \sum_{i=1}^{m}(b_i - a_i)^2},$$
choosing $t = 4\epsilon / \sum_{i=1}^{m}(b_i - a_i)^2$, which minimizes the bound.
page 43
Application to classification: for any $\epsilon > 0$, any distribution $D$, and any fixed hypothesis $h \colon X \to \{0, 1\}$,
$$\Pr\big[\widehat{R}(h) - R(h) \ge \epsilon\big] \le e^{-2m\epsilon^2}, \qquad \Pr\big[\widehat{R}(h) - R(h) \le -\epsilon\big] \le e^{-2m\epsilon^2}.$$
By the union bound,
$$\Pr\big[|\widehat{R}(h) - R(h)| \ge \epsilon\big] \le 2e^{-2m\epsilon^2}.$$
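Inverting the two-sided bound gives a sample-size estimate: to have $|\widehat{R}(h) - R(h)| < \epsilon$ with probability at least $1 - \delta$ for a fixed $h$, it suffices that $m \ge \frac{\log(2/\delta)}{2\epsilon^2}$. A small sketch (ours; the values of $\epsilon$ and $\delta$ are arbitrary):

```python
import math

def sample_size(eps, delta):
    """Smallest m with 2*exp(-2*m*eps**2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps**2))

m = sample_size(eps=0.02, delta=0.01)
print(m)                                       # 6623
print(2 * math.exp(-2 * m * 0.02**2) <= 0.01)  # True
```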
page 44
Relative deviation bounds: for any $\epsilon > 0$, any distribution $D$, and any fixed hypothesis $h \colon X \to \{0, 1\}$,
$$\Pr\big[\widehat{R}(h) \ge (1 + \epsilon)R(h)\big] \le e^{-m R(h)\epsilon^2/3}, \qquad \Pr\big[\widehat{R}(h) \le (1 - \epsilon)R(h)\big] \le e^{-m R(h)\epsilon^2/2}.$$
page 46
Hoeffding's inequality follows as a special case of McDiarmid's inequality, with
$$f(x_1, \ldots, x_m) = \frac{1}{m}\sum_{i=1}^{m} x_i \quad \text{and} \quad c_i = \frac{|b_i - a_i|}{m}.$$
page 47
Jensen's inequality: if $f$ is convex, then
$$f(\mathbb{E}[X]) \le \mathbb{E}[f(X)].$$
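A quick numerical illustration (ours; the Exponential(1) distribution and $f(x) = x^2$ are arbitrary choices): here $f(\mathbb{E}[X]) = 1$ while $\mathbb{E}[f(X)] = \mathbb{E}[X^2] = 2$.

```python
import random
random.seed(0)

xs = [random.expovariate(1.0) for _ in range(100_000)]
mean = sum(xs) / len(xs)  # estimates E[X] = 1

def f(x):  # convex function
    return x * x

print(f(mean))                          # about 1  (f(E[X]))
print(sum(f(x) for x in xs) / len(xs))  # about 2  (E[f(X)] = E[X^2])
```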