Statistical Learning Theory: A Hitchhiker's Guide
John Shawe-Taylor (UCL) and Omar Rivasplata (UCL / DeepMind)
Neural Information Processing Systems (NeurIPS), December 2018
[Figure: error distribution picture (x axis 2–20, y axis 0.2–1.0)]
Outline:
- Why SLT
- Overview
- Notation
- First generation
- Second generation
- Next generation
The empirical risk of a hypothesis $h$ on the sample $S = ((X_1, Y_1), \ldots, (X_m, Y_m))$:

$\hat{R}_S(h) = \frac{1}{m} \sum_{i=1}^{m} \ell(h(X_i), Y_i)$
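As a minimal sketch (the function name and the choice of 0-1 loss are illustrative, not from the slides), the empirical risk can be computed as:

```python
def empirical_risk(h, X, Y):
    """Empirical risk (1/m) * sum_i loss(h(X_i), Y_i), here with 0-1 loss."""
    m = len(X)
    return sum(1 if h(x) != y else 0 for x, y in zip(X, Y)) / m

# Toy usage: a constant classifier that always predicts 1.
h = lambda x: 1
X = [0.1, 0.5, 0.9, 1.2]
Y = [1, 0, 1, 1]
print(empirical_risk(h, X, Y))  # 0.25 (one mistake out of four)
```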
For a single fixed hypothesis $h$ and a bounded loss, Hoeffding's inequality gives, with probability at least $1 - \delta$ over the sample,

$R(h) \le \frac{1}{m} \sum_{i=1}^{m} \ell(h(X_i), Y_i) + \sqrt{\frac{1}{2m} \log\frac{1}{\delta}}$
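A quick numeric check of the deviation term (a sketch; the function name is ours):

```python
import math

def hoeffding_radius(m, delta):
    # Deviation term sqrt(log(1/delta) / (2m)) for a single fixed hypothesis
    return math.sqrt(math.log(1.0 / delta) / (2.0 * m))

# With m = 1000 samples and delta = 0.05 the radius is about 0.039;
# it shrinks at rate 1/sqrt(m) as the sample grows.
print(hoeffding_radius(1000, 0.05))
```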
For a finite class $H$, a union bound makes this uniform: with probability at least $1 - \delta$, simultaneously for all $f \in H$,

$R(f) \le \hat{R}_S(f) + \sqrt{\frac{1}{2m} \log\frac{|H|}{\delta}}$
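The union-bound version only changes the log term; a sketch (names are ours):

```python
import math

def uniform_radius(m, delta, class_size):
    # sqrt(log(|H|/delta) / (2m)): the price of uniformity over a finite
    # class is an extra log|H| inside the square root.
    return math.sqrt(math.log(class_size / delta) / (2.0 * m))

# Doubling |H| only adds log(2)/(2m) inside the root, so the radius
# degrades very slowly with class size.
print(uniform_radius(1000, 0.05, 100))
```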
For infinite classes, Sauer's lemma bounds the growth function of a class of VC dimension $d$:

$\Pi_H(m) \le \sum_{k=0}^{d} \binom{m}{k} \le \left(\frac{em}{d}\right)^d$

which yields a VC bound of the form: with probability at least $1 - \delta$, for all $h \in H$,

$R(h) \le \hat{R}_S(h) + \sqrt{\frac{4}{m}\left(d \log\frac{2em}{d} + \log\frac{4}{\delta}\right)}$
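Plugging numbers into this form of the bound (a sketch; constant conventions vary across texts):

```python
import math

def vc_bound_term(m, d, delta):
    # sqrt((4/m) * (d*log(2em/d) + log(4/delta))): the capacity term of the VC bound
    return math.sqrt((4.0 / m) * (d * math.log(2.0 * math.e * m / d)
                                  + math.log(4.0 / delta)))

# Even for a modest d = 10 the term is ~0.19 at m = 10000:
# VC bounds need m >> d before they become informative.
print(vc_bound_term(10000, 10, 0.05))
```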
$P\left(X - E[X] \ge t\right) \le e^{-t^2 / (2\sigma^2)}$ (sub-Gaussian tail bound)
Structural risk minimisation: write $H = \bigcup_{k \in \mathbb{N}} H_k$ as a countable union and choose weights $w_k > 0$ with $\sum_k w_k \le 1$. Applying the bound within each $H_k$ at confidence $w_k \delta$ gives, with probability at least $1 - \delta$, a bound holding simultaneously for all $k$ and all $h \in H_k$, at the price of an extra $\log(1/w_k)$ term.
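With the common choice $w_k = 2^{-k}$ the extra price is $k \log 2$ inside the root; a sketch (names are ours):

```python
import math

def srm_radius(m, k, delta):
    # Weighted union bound with w_k = 2**(-k):
    # sqrt((log(1/w_k) + log(1/delta)) / (2m))
    #   = sqrt((k*log(2) + log(1/delta)) / (2m))
    return math.sqrt((k * math.log(2.0) + math.log(1.0 / delta)) / (2.0 * m))

# The penalty grows with the index k of the sub-class containing the
# hypothesis, implementing the usual complexity/fit trade-off.
print(srm_radius(1000, 5, 0.05))
```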
Symmetrization: introduce a ghost sample $Z' = (Z'_1, \ldots, Z'_m)$, independent of $Z = (Z_1, \ldots, Z_m)$. Then

$E\left[\sup_{h \in H}\left(R(h) - \hat{R}_S(h)\right)\right] \le E\left[\sup_{h \in H} \frac{1}{m} \sum_{i=1}^{m} \left(\ell(h, Z'_i) - \ell(h, Z_i)\right)\right] \le 2\, E\left[\sup_{h \in H} \frac{1}{m} \sum_{i=1}^{m} \sigma_i\, \ell(h, Z_i)\right]$

where the $\sigma_i$ are i.i.d. Rademacher signs; the right-hand expectation defines the Rademacher complexity $R_m(H)$.
Rademacher bound: with probability at least $1 - \delta$, simultaneously for all $h \in H$,

$R(h) \le \hat{R}_S(h) + 2 R_m(H) + \sqrt{\frac{1}{2m} \log\frac{2}{\delta}}$
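For a finite class given by its per-example loss vectors, the empirical Rademacher complexity can be estimated by Monte Carlo (a sketch; names are ours):

```python
import random

def empirical_rademacher(loss_vectors, n_trials=2000, seed=0):
    """Monte-Carlo estimate of E_sigma sup_h (1/m) sum_i sigma_i * loss_h(Z_i)
    for a finite class: loss_vectors[h][i] = loss of hypothesis h on example i."""
    rng = random.Random(seed)
    m = len(loss_vectors[0])
    total = 0.0
    for _ in range(n_trials):
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(m)]
        total += max(sum(s * l for s, l in zip(sigma, row)) / m
                     for row in loss_vectors)
    return total / n_trials

# A class with a single zero-loss hypothesis has complexity exactly 0.
print(empirical_rademacher([[0.0] * 20]))  # 0.0
```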
For a Gaussian posterior $Q_\mu$ obtained by shifting a unit-variance prior by $\mu$ along a direction, $\mathrm{KL}(Q_\mu \| P) = \frac{1}{2}\mu^2$, so the PAC-Bayes bound reads: with probability at least $1 - \delta$,

$\mathrm{kl}\left(\hat{R}_S(Q_\mu) \,\big\|\, R_{\mathrm{out}}(Q_\mu)\right) \le \frac{\frac{1}{2}\mu^2 + \log\frac{m+1}{\delta}}{m}$

with the stochastic error expressed through the Gaussian CDF $\Phi(a) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\, dx$.

[Figure: $R_{\mathrm{out}}(Q_\mu)$ as a function of $\mu$.]
The central PAC-Bayes theorem: for a prior $Q^0$ fixed before seeing the data and any posterior $Q$, with probability at least $1 - \delta$ over an $n$-sample,

$\mathrm{kl}\left(\hat{R}_S(Q) \,\big\|\, R(Q)\right) \le \frac{\mathrm{KL}(Q \| Q^0) + \log\frac{n+1}{\delta}}{n}$

A relaxation via Pinsker's inequality gives $R(Q) \le \hat{R}_S(Q) + \sqrt{\frac{\mathrm{KL}(Q \| Q^0) + \log\frac{n+1}{\delta}}{2n}}$.
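Turning the kl form into a numeric risk bound requires inverting $\mathrm{kl}(\hat{q} \| \cdot)$; a bisection sketch (function names are ours):

```python
import math

def kl_bin(q, p):
    # Binary KL divergence kl(q||p) between Bernoulli(q) and Bernoulli(p)
    eps = 1e-12
    q = min(max(q, eps), 1.0 - eps)
    p = min(max(p, eps), 1.0 - eps)
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))

def kl_inverse(q_hat, bound):
    """Largest p in [q_hat, 1] with kl(q_hat||p) <= bound, by bisection."""
    lo, hi = q_hat, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if kl_bin(q_hat, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo

def pac_bayes_risk_bound(q_hat, kl_posterior_prior, n, delta):
    # Invert kl(q_hat || R(Q)) <= (KL(Q||Q0) + log((n+1)/delta)) / n for R(Q)
    rhs = (kl_posterior_prior + math.log((n + 1) / delta)) / n
    return kl_inverse(q_hat, rhs)

print(pac_bayes_risk_bound(0.1, 5.0, 10000, 0.05))
```

The kl inversion is tighter than the Pinsker-style square-root relaxation, especially when the empirical risk is close to 0.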