SMILE Seminar, 24 September 2012
Mixability in Statistical Learning
Tim van Erven
Joint work with: Peter Grünwald, Mark Reid, Bob Williamson
Summary: stochastic mixability, fast rates of convergence in different settings: statistical …
\[
E\left[\frac{e^{-\eta \ell(Y, p_\mu)}}{e^{-\eta \ell(Y, p_{\mu^*})}}\right]
= \frac{1}{\sqrt{2\pi\tau^2}} \int e^{-\frac{\eta}{2\sigma^2}(y-\mu)^2 + \frac{\eta}{2\sigma^2}(y-\mu^*)^2 - \frac{1}{2\tau^2}(y-\mu^*)^2} \, dy
= \frac{1}{\sqrt{2\pi\tau^2}} \int e^{-\frac{1}{2\tau^2}(y-\mu)^2} \, dy = 1
\]
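The Gaussian computation above can be checked numerically. The sketch below assumes log loss, a model density p_μ = N(μ, σ²) and data Y ~ N(μ*, τ²), which is how the integrand reads; the function name and the integration grid are illustrative choices, not part of the slides. The second equality relies on η = σ²/τ², since only then do the two (y − μ*)² terms in the exponent cancel.

```python
import numpy as np

def mixability_ratio(mu, mu_star, sigma2, tau2, eta):
    """Numerically evaluate E[exp(-eta*loss(Y, p_mu)) / exp(-eta*loss(Y, p_mu*))]
    under log loss, with p_mu = N(mu, sigma2) and Y ~ N(mu_star, tau2)."""
    sd = np.sqrt(tau2)
    # Integration grid (an assumption): wide enough for moderate parameter values.
    lo = min(mu, mu_star) - 12 * sd
    hi = max(mu, mu_star) + 12 * sd
    y = np.linspace(lo, hi, 200_001)
    exponent = (-eta / (2 * sigma2) * (y - mu) ** 2
                + eta / (2 * sigma2) * (y - mu_star) ** 2
                - (y - mu_star) ** 2 / (2 * tau2))
    integrand = np.exp(exponent) / np.sqrt(2 * np.pi * tau2)
    dy = y[1] - y[0]
    # Trapezoidal rule, written out explicitly.
    return float(np.sum((integrand[1:] + integrand[:-1]) * dy / 2))

sigma2, tau2 = 2.0, 0.5
eta_star = sigma2 / tau2  # the value at which the exponent terms cancel
print(mixability_ratio(1.0, 0.0, sigma2, tau2, eta_star))
print(mixability_ratio(1.0, 0.0, sigma2, tau2, eta_star / 2))
print(mixability_ratio(1.0, 0.0, sigma2, tau2, eta_star * 2))
```

In this sketch the ratio equals 1 at η = σ²/τ², falls below 1 for smaller η, and exceeds 1 for larger η.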
\[
d(f, f^*) = E[\ell(Y, f(X)) - \ell(Y, f^*(X))], \qquad
V(f, f^*) = E\left[\big(\ell(Y, f(X)) - \ell(Y, f^*(X))\big)^2\right]
\]
κ ≥ 1, c0 > 0 [Tsybakov, 2004]
c0 > 0, κ = 1, ℓ in [0, V], (ℓ, F, P*)
κ = 1
against adversarial data
\[
\hat{f}_t^1, \ldots, \hat{f}_t^K
\]
\[
\sum_{t=1}^n \ell(y_t, \hat{f}_t(x_t)) - \min_k \sum_{t=1}^n \ell(y_t, \hat{f}_t^k(x_t))
\]
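As an illustration of regret against adversarial data, here is a minimal sketch of one standard instance: K experts predicting a binary outcome under log loss, combined by a uniform-prior mixture with learning rate η = 1. The choice of log loss, the learning rate, and all names below are assumptions made for the example, not taken from the slides. In this instance the learner's cumulative loss exceeds that of the best expert by at most ln K, whatever the outcome sequence.

```python
import math
import random

def mixture_regret(expert_probs, outcomes):
    """expert_probs[t][k] is the probability expert k gives to outcome 1 in round t;
    outcomes[t] is 0 or 1. Returns (regret of the mixture, ln K) under log loss."""
    K = len(expert_probs[0])
    log_w = [0.0] * K            # unnormalised log-weights, uniform prior
    learner_loss = 0.0
    expert_loss = [0.0] * K
    for probs, y in zip(expert_probs, outcomes):
        m = max(log_w)           # subtract the max for numerical stability
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        # The learner predicts the weighted average of the experts' probabilities.
        p1 = sum(wk * pk for wk, pk in zip(w, probs)) / total
        learner_loss += -math.log(p1 if y == 1 else 1.0 - p1)
        for k, pk in enumerate(probs):
            loss = -math.log(pk if y == 1 else 1.0 - pk)
            expert_loss[k] += loss
            log_w[k] -= loss     # exponential weights update with eta = 1
    return learner_loss - min(expert_loss), math.log(K)

rng = random.Random(0)
n, K = 200, 5
probs = [[rng.uniform(0.05, 0.95) for _ in range(K)] for _ in range(n)]
ys = [rng.randint(0, 1) for _ in range(n)]
regret, bound = mixture_regret(probs, ys)
print(regret, "<=", bound)
```

The outcomes here are random only for convenience; the ln K bound does not depend on how they are generated.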
\[
\min_{p \in F} E[-\ln p(Y)] = \min_{p \in \mathrm{co}(F)} E[-\ln p(Y)],
\]
where co(F) denotes the convex hull of F.
[Figure: two panels, "Stochastically mixable" and "Not stochastically mixable", each showing F, co(F), p* and pθ*.]
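Whether minimising over F and over co(F) gives the same expected log loss can be checked directly on a toy class. The sketch below uses a hypothetical two-element class of Bernoulli distributions, chosen only for illustration; for a single binary outcome, mixtures of Bernoulli distributions are again Bernoulli, so co(F) is the interval between the two parameters and can be scanned on a grid.

```python
import math

def expected_log_loss(q, p_true):
    """E[-ln q(Y)] for Y ~ Bernoulli(p_true), where q is the predicted P(Y = 1)."""
    return -(p_true * math.log(q) + (1.0 - p_true) * math.log(1.0 - q))

def min_over_class_and_hull(model, p_true, grid=10_001):
    """Return (min over F, min over co(F)) of the expected log loss,
    where F is a finite set of Bernoulli parameters strictly inside (0, 1)."""
    best_in_f = min(expected_log_loss(q, p_true) for q in model)
    # co(F) is the interval [min(F), max(F)]; approximate its minimum on a grid.
    lo, hi = min(model), max(model)
    best_in_hull = min(
        expected_log_loss(lo + (hi - lo) * i / (grid - 1), p_true)
        for i in range(grid)
    )
    return best_in_f, best_in_hull

F = [0.2, 0.8]
print(min_over_class_and_hull(F, p_true=0.2))  # true distribution lies in F
print(min_over_class_and_hull(F, p_true=0.5))  # true distribution lies only in co(F)
```

With the true parameter 0.2 the two minima coincide, so the equality holds. With the true parameter 0.5 the mixture Bernoulli(0.5) in co(F) attains expected loss ln 2, strictly below anything in F, so the equality fails.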
\[
\min_{p \in P_F(\eta)} E\left[-\tfrac{1}{\eta} \ln p(Y \mid X)\right]
= \min_{p \in \mathrm{co}(P_F(\eta))} E\left[-\tfrac{1}{\eta} \ln p(Y \mid X)\right]
\]
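\[
\min_{p \in P_F(\eta)} \frac{1}{n} \sum_{i=1}^n -\frac{1}{\eta} \ln p(Y_i \mid X_i)
\;\ge\;
\min_{p \in \mathrm{co}(P_F(\eta))} \frac{1}{n} \sum_{i=1}^n -\frac{1}{\eta} \ln p(Y_i \mid X_i) - \text{something}
\]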
where “something” depends on concentration inequalities and penalty function.
Statistics, 2006
2004.
System Sciences, 74:1228–1244, 2008.
Computational Learning Theory, pages 51–60. ACM, 1995.
Slides and NIPS 2012 paper: www.timvanerven.nl