Advanced Machine Learning
MEHRYAR MOHRI MOHRI@
COURANT INSTITUTE & GOOGLE RESEARCH..
Learning Kernels

Outline
Kernel methods.
Learning kernels scenario.
Learning bounds.
Algorithms.
page
Advanced Machine Learning - Mohri@
2
Images of the four input points in feature space (plot axes: √2 x1x2 and √2 x1):
(1, 1) ↦ (1, 1, +√2, +√2, +√2, 1)
(−1, 1) ↦ (1, 1, −√2, −√2, +√2, 1)
(−1, −1) ↦ (1, 1, +√2, −√2, −√2, 1)
(1, −1) ↦ (1, 1, −√2, +√2, −√2, 1)
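The four images listed here are consistent with the degree-2 feature map Φ(x1, x2) = (x1², x2², √2 x1x2, √2 x1, √2 x2, 1), whose inner product equals (x · x' + 1)². The following is a minimal sketch under that assumption; neither the map nor the kernel is spelled out in the surviving text, and the names `phi`, `poly_kernel`, and the XOR-style labels are illustrative.

```python
import math

def phi(x):
    """Assumed degree-2 feature map for a 2-D input x = (x1, x2)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0)

def poly_kernel(x, z):
    """Assumed kernel K(x, z) = (x . z + 1)^2."""
    return (x[0] * z[0] + x[1] * z[1] + 1.0) ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

points = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
# XOR-style labels: +1 when both coordinates share a sign, -1 otherwise.
labels = [1, -1, 1, -1]

# The explicit inner product in feature space matches the kernel value.
kernel_matches = all(
    math.isclose(dot(phi(x), phi(z)), poly_kernel(x, z))
    for x in points for z in points
)

# The sign of the third feature (sqrt(2) x1 x2) reproduces the labels,
# so the two classes are linearly separable in feature space.
separable = all((phi(x)[2] > 0) == (y > 0) for x, y in zip(points, labels))

print(kernel_matches, separable)
```

The point of the check is that the kernel value can be computed without ever forming the six-dimensional vectors, while the separating direction only exists in the feature space.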
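Support vector machines (Cortes and Vapnik, 1995; Boser, Guyon, and Vapnik, 1992): primal problem over (w, b) and dual problem over α, with sums over the m training points.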
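(Hoerl and Kennard, 1970; Saunders et al., 1998)

Primal:
$$\min_{w}\ \lambda \|w\|^2 + \sum_{i=1}^{m} \big(w \cdot \Phi(x_i) - y_i\big)^2$$

Dual:
$$\max_{\alpha \in \mathbb{R}^m}\ -\alpha^\top (K + \lambda I)\,\alpha + 2\,\alpha^\top y.$$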
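Setting the gradient of the dual objective −α⊤(K + λI)α + 2α⊤y to zero gives α = (K + λI)⁻¹y. The sketch below checks this on toy data; the data, the kernel K(x, x') = x x' + 1, the value of λ, and the helper `solve` are all illustrative assumptions, not part of the slides.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

xs = [0.0, 1.0, 2.0, 3.0]          # toy 1-D inputs (assumption)
ys = [0.1, 0.9, 2.1, 2.9]          # toy targets (assumption)
lam = 0.5
m = len(xs)

K = [[a * b + 1.0 for b in xs] for a in xs]
A = [[K[i][j] + (lam if i == j else 0.0) for j in range(m)] for i in range(m)]

def dual_objective(a):
    """-a^T (K + lam I) a + 2 a^T y."""
    quad = sum(a[i] * A[i][j] * a[j] for i in range(m) for j in range(m))
    return -quad + 2.0 * sum(a[i] * ys[i] for i in range(m))

alpha = solve(A, ys)               # closed-form maximizer of the dual
best = dual_objective(alpha)

def predict(x):
    """h(x) = sum_i alpha_i K(x_i, x)."""
    return sum(alpha[i] * (xs[i] * x + 1.0) for i in range(m))

print(best, predict(1.5))
```

At the maximizer the dual value simplifies to α⊤y, which gives a cheap sanity check on the solve.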
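For $1 \le p \le q$ and $x \in \mathbb{R}^N$, $x \ne 0$:
$$\|x\|_q \le \|x\|_p \le N^{\frac{1}{p} - \frac{1}{q}} \|x\|_q.$$

Upper bound (Hölder's inequality):
$$\|x\|_p = \Big[\sum_{i=1}^{N} |x_i|^p \cdot 1\Big]^{\frac{1}{p}} \le \Big[\Big(\sum_{i=1}^{N} (|x_i|^p)^{\frac{q}{p}}\Big)^{\frac{p}{q}} \Big(\sum_{i=1}^{N} (1)^{\frac{q}{q-p}}\Big)^{1 - \frac{p}{q}}\Big]^{\frac{1}{p}} = \|x\|_q\, N^{\frac{1}{p} - \frac{1}{q}}.$$

Lower bound:
$$\Big[\frac{\|x\|_p}{\|x\|_q}\Big]^{p} = \sum_{i=1}^{N} \Big[\underbrace{\frac{|x_i|}{\|x\|_q}}_{\le 1}\Big]^{p} \ge \sum_{i=1}^{N} \Big[\frac{|x_i|}{\|x\|_q}\Big]^{q} = 1.$$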
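The two norm inequalities can be checked numerically. This is a small sketch on random vectors; the exponents p = 1.5 and q = 4, the dimension N = 5, and the helper names are arbitrary choices made for illustration.

```python
import random

def lp_norm(x, p):
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def inequalities_hold(x, p, q, tol=1e-9):
    """Check ||x||_q <= ||x||_p <= N^(1/p - 1/q) ||x||_q for p <= q."""
    n = len(x)
    norm_q = lp_norm(x, q)
    norm_p = lp_norm(x, p)
    upper = n ** (1.0 / p - 1.0 / q) * norm_q
    return norm_q <= norm_p + tol and norm_p <= upper + tol

rng = random.Random(0)
trials = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(200)]
all_hold = all(inequalities_hold(x, 1.5, 4.0) for x in trials)

# A constant vector attains the upper bound with equality.
ones = [1.0] * 5
gap = abs(lp_norm(ones, 1.5) - 5 ** (1.0 / 1.5 - 1.0 / 4.0) * lp_norm(ones, 4.0))

print(all_hold, gap)
```

The constant vector shows that the factor N^(1/p − 1/q) cannot be improved in general.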
(Koltchinskii and Panchenko, 2002)
(Cortes, MM, and Rostamizadeh, 2010)
$\frac{1}{q} + \frac{1}{r} = 1$
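With $\frac{1}{q} + \frac{1}{r} = 1$:
$$
\begin{aligned}
\widehat{\mathfrak{R}}_S(H_q)
&= \frac{1}{m} \mathop{\mathrm{E}}_{\sigma}\Big[\sup_{h \in H_q} \sum_{i=1}^{m} \sigma_i h(x_i)\Big] \\
&= \frac{1}{m} \mathop{\mathrm{E}}_{\sigma}\Big[\sup_{\mu \in \Delta_q,\ \alpha^\top K_\mu \alpha \le 1} \sum_{i,j=1}^{m} \sigma_i \alpha_j K_\mu(x_i, x_j)\Big] \\
&= \frac{1}{m} \mathop{\mathrm{E}}_{\sigma}\Big[\sup_{\mu \in \Delta_q,\ \alpha^\top K_\mu \alpha \le 1} \sigma^\top K_\mu \alpha\Big] \\
&= \frac{1}{m} \mathop{\mathrm{E}}_{\sigma}\Big[\sup_{\mu \in \Delta_q,\ \|\alpha\|_{K_\mu^{1/2}} \le 1} \langle \sigma, \alpha \rangle_{K_\mu^{1/2}}\Big] \\
&= \frac{1}{m} \mathop{\mathrm{E}}_{\sigma}\Big[\sup_{\mu \in \Delta_q} \sqrt{\sigma^\top K_\mu \sigma}\Big] && \text{(Cauchy-Schwarz)} \\
&= \frac{1}{m} \mathop{\mathrm{E}}_{\sigma}\Big[\sup_{\mu \in \Delta_q} \sqrt{\mu \cdot u_\sigma}\Big] && \big[u_\sigma = (\sigma^\top K_1 \sigma, \ldots, \sigma^\top K_p \sigma)^\top\big] \\
&= \frac{1}{m} \mathop{\mathrm{E}}_{\sigma}\Big[\sqrt{\|u_\sigma\|_r}\Big]. && \text{(definition of dual norm)}
\end{aligned}
$$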
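The last step uses the fact that, for a nonnegative vector u and conjugate exponents q and r, the supremum of µ · u over nonnegative µ with ‖µ‖_q = 1 equals ‖u‖_r. The sketch below checks this on one example, with the maximizer taken as µ_k proportional to u_k^(r−1); the vector u, the choice q = 1.5, and the helper names are illustrative assumptions.

```python
import random

def lp_norm(v, r):
    return sum(abs(t) ** r for t in v) ** (1.0 / r)

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

u = [3.0, 1.0, 2.0]            # stands in for u_sigma, which is nonnegative
q = 1.5
r = q / (q - 1.0)              # conjugate exponent: 1/q + 1/r = 1, here r = 3

norm_r = lp_norm(u, r)

# Candidate maximizer: mu_k = u_k^(r-1) / ||u||_r^(r-1), which has unit q-norm.
mu_star = [t ** (r - 1.0) / norm_r ** (r - 1.0) for t in u]
attained = dot(mu_star, u)

# No random feasible mu (nonnegative, unit q-norm) should do better.
rng = random.Random(0)
smallest_gap = float("inf")
for _ in range(1000):
    raw = [rng.random() for _ in u]
    scale = lp_norm(raw, q)
    mu = [t / scale for t in raw]
    smallest_gap = min(smallest_gap, norm_r - dot(mu, u))

print(attained, norm_r, smallest_gap)
```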
(Cortes, MM, and Rostamizadeh, 2010)
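With $u = (\operatorname{Tr}[K_1], \ldots, \operatorname{Tr}[K_p])^\top$:
$$
\begin{aligned}
\widehat{\mathfrak{R}}_S(H_q)
&= \frac{1}{m} \mathop{\mathrm{E}}_{\sigma}\Big[\sqrt{\|u_\sigma\|_r}\Big] \\
&\le \frac{1}{m} \mathop{\mathrm{E}}_{\sigma}\Big[\sqrt{\|u_\sigma\|_s}\Big] && \text{(norm comparison, } s \le r\text{)} \\
&= \frac{1}{m} \mathop{\mathrm{E}}_{\sigma}\Big[\Big[\sum_{k=1}^{p} (\sigma^\top K_k \sigma)^s\Big]^{\frac{1}{2s}}\Big] \\
&\le \frac{1}{m} \Big[\mathop{\mathrm{E}}_{\sigma}\Big[\sum_{k=1}^{p} (\sigma^\top K_k \sigma)^s\Big]\Big]^{\frac{1}{2s}} && \text{(Jensen's inequality)} \\
&= \frac{1}{m} \Big[\sum_{k=1}^{p} \mathop{\mathrm{E}}_{\sigma}\big[(\sigma^\top K_k \sigma)^s\big]\Big]^{\frac{1}{2s}} \\
&\le \frac{1}{m} \Big[\sum_{k=1}^{p} \big(s \operatorname{Tr}[K_k]\big)^s\Big]^{\frac{1}{2s}} = \frac{\sqrt{s \|u\|_s}}{m}. && \text{(lemma)}
\end{aligned}
$$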
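For a small sample the expectation over σ can be computed exactly by enumerating all 2^m sign vectors, which makes it possible to compare (1/m) E[√‖u_σ‖_r] with the final bound √(s‖u‖_s)/m. The sketch below does this for r = s = 2 (that is, q = 2); the eight sample points and the three base kernels are arbitrary illustrative choices, not taken from the slides.

```python
import itertools
import math

xs = [-1.5, -1.0, -0.5, 0.0, 0.3, 0.8, 1.2, 2.0]   # toy sample (assumption)
m = len(xs)

# Three positive semi-definite base kernels (illustrative choices).
kernels = [
    lambda a, b: a * b,                           # linear
    lambda a, b: math.exp(-0.5 * (a - b) ** 2),   # Gaussian
    lambda a, b: (a * b + 1.0) ** 2,              # degree-2 polynomial
]
grams = [[[k(a, b) for b in xs] for a in xs] for k in kernels]

def quad_form(K, v):
    return sum(v[i] * K[i][j] * v[j] for i in range(m) for j in range(m))

def lp_norm(v, r):
    return sum(abs(t) ** r for t in v) ** (1.0 / r)

r = 2.0
s = 2

# Exact expectation over all 2^m Rademacher vectors.
total = 0.0
for sigma in itertools.product((-1.0, 1.0), repeat=m):
    u_sigma = [quad_form(K, sigma) for K in grams]
    total += math.sqrt(lp_norm(u_sigma, r))
rademacher = total / (2 ** m) / m

traces = [sum(K[i][i] for i in range(m)) for K in grams]
bound = math.sqrt(s * lp_norm(traces, s)) / m

print(rademacher, bound)
```

Exhaustive enumeration is only practical for very small m; for larger samples one would replace the loop with Monte Carlo sampling of σ.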
(Cortes, MM, and Rostamizadeh, 2010)
$\max_{k=1}^{p} \operatorname{Tr}[K_k]$
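$$\|u\|_s = \Big[\sum_{k=1}^{p} \operatorname{Tr}[K_k]^s\Big]^{\frac{1}{s}} \le p^{\frac{1}{s}} \max_{k=1}^{p} \operatorname{Tr}[K_k].$$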
(Srebro and Ben-David, 2006)
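$$R(h) \le \widehat{R}_\rho(h) + \sqrt{\frac{8\Big[2 + p \log \frac{128\, e\, m^3 R^2}{\rho^2 p} + 256 \frac{R^2}{\rho^2} \log \frac{\rho\, e\, m}{8R} \log \frac{128\, m R^2}{\rho^2} + \log(1/\delta)\Big]}{m}}.$$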
(Cortes, MM, and Rostamizadeh, 2010)
$\frac{1}{q} + \frac{1}{r} = 1$, $\quad p^{\frac{1}{r}} \max_{k=1}^{p} \operatorname{Tr}[K_k]$
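$$\sum_{k=1}^{p} \mu_k \le p^{\frac{1}{r}} \|\mu\|_q = p^{\frac{1}{r}}$$
$$\sum_{k=1}^{p} \mu_k K_k = \Big(\sum_{k=1}^{p} \mu_k\Big) K_1$$
$$\|h\|^2_{\mathbb{H}_{K_1}} = \Big(\sum_{k=1}^{p} \mu_k\Big) \|h\|^2_{\mathbb{H}_{K_\mu}}, \qquad \sum_{k=1}^{p} \mu_k \ne 0$$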
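$$
\begin{aligned}
\min_{\mu \in \Delta_1} \max_{\alpha \in \mathcal{A}}\ 2\,\alpha^\top \mathbf{1} - \alpha^\top Y K_\mu Y \alpha
&= \max_{\alpha \in \mathcal{A}} \min_{\mu \in \Delta_1}\ 2\,\alpha^\top \mathbf{1} - \alpha^\top Y K_\mu Y \alpha \\
&= \max_{\alpha \in \mathcal{A}}\ 2\,\alpha^\top \mathbf{1} - \max_{\mu \in \Delta_1} \alpha^\top Y K_\mu Y \alpha \\
&= \max_{\alpha \in \mathcal{A}}\ 2\,\alpha^\top \mathbf{1} - \max_{k \in [1, p]} \alpha^\top Y K_k Y \alpha,
\end{aligned}
$$
where $K_\mu = \sum_{k=1}^{p} \mu_k K_k$ and $\mu \in \Delta_1$.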
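The last equality holds because α⊤YK_µYα is linear in µ, so its maximum over the simplex is attained at a vertex, that is, at a single base kernel. A small numerical sketch of that step follows; the sample, labels, fixed α, and base kernels are illustrative assumptions.

```python
import random

def quad_form(K, v):
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))

xs = [-1.0, -0.2, 0.5, 1.3]            # toy inputs (assumption)
y = [1.0, -1.0, -1.0, 1.0]             # toy labels (assumption)
alpha = [0.4, 0.1, 0.3, 0.2]           # any fixed dual vector

grams = [
    [[a * b for b in xs] for a in xs],                   # linear
    [[(a * b + 1.0) ** 2 for b in xs] for a in xs],      # degree-2 polynomial
    [[1.0 if a == b else 0.0 for b in xs] for a in xs],  # identity
]

# Y alpha, with Y = diag(y).
y_alpha = [yi * ai for yi, ai in zip(y, alpha)]

# scores[k] = alpha^T Y K_k Y alpha
scores = [quad_form(K, y_alpha) for K in grams]
vertex_max = max(scores)

# For mu in the simplex, alpha^T Y K_mu Y alpha = sum_k mu_k * scores[k].
rng = random.Random(1)
sampled_max = float("-inf")
for _ in range(2000):
    w = [rng.expovariate(1.0) for _ in grams]
    total = sum(w)
    mu = [t / total for t in w]
    value = sum(mk * sk for mk, sk in zip(mu, scores))
    sampled_max = max(sampled_max, value)

print(vertex_max, sampled_max)
```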
(Lanckriet et al., 2004)
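$$\max_{\alpha \in \mathcal{A},\, t}\ 2\,\alpha^\top \mathbf{1} - t \quad \text{subject to} \quad t \ge \alpha^\top Y K_k Y \alpha,\ \ k \in [1, p].$$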
$$\min_{w,\ \mu \in \Delta_q}\ \frac{1}{2} \sum_{k=1}^{p} \frac{\|w_k\|_2^2}{\mu_k} + C \sum_{i=1}^{m} \max\Big\{0,\ 1 - y_i \Big(\sum_{k=1}^{p} w_k \cdot \Phi_k(x_i)\Big)\Big\}.$$
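To make the roles of w_k, µ_k, and C concrete, the objective can be evaluated directly for explicit feature maps. This is a sketch only: the two feature maps, the toy data, and the function name `mkl_objective` are assumptions introduced for illustration, and every µ_k is taken strictly positive so the division is defined.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mkl_objective(w, mu, C, feature_maps, xs, ys):
    """0.5 * sum_k ||w_k||^2 / mu_k + C * sum_i hinge(y_i * sum_k w_k . Phi_k(x_i))."""
    reg = 0.5 * sum(dot(wk, wk) / mk for wk, mk in zip(w, mu))
    loss = 0.0
    for x, y in zip(xs, ys):
        score = sum(dot(wk, phi(x)) for wk, phi in zip(w, feature_maps))
        loss += max(0.0, 1.0 - y * score)
    return reg + C * loss

# Two illustrative feature maps on 1-D inputs.
feature_maps = [
    lambda x: (x,),              # Phi_1: linear feature
    lambda x: (x * x, 1.0),      # Phi_2: quadratic feature plus constant
]
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1.0, -1.0, 1.0, 1.0]
C = 1.0

w_linear = [(1.0,), (0.0, 0.0)]  # uses only the first feature map
obj_uniform = mkl_objective(w_linear, (0.5, 0.5), C, feature_maps, xs, ys)
obj_skewed = mkl_objective(w_linear, (0.9, 0.1), C, feature_maps, xs, ys)
obj_zero = mkl_objective([(0.0,), (0.0, 0.0)], (0.5, 0.5), C, feature_maps, xs, ys)

print(obj_uniform, obj_skewed, obj_zero)
```

Shifting weight µ_k toward the kernel that w actually uses lowers the regularization term, which is the mechanism by which the joint minimization selects kernels.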
(Cortes, MM, and Rostamizadeh, 2009)
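$(K_\mu + \lambda I)^{-1} y$

$$\frac{\partial F}{\partial \mu_k} = \operatorname{Tr}\Big[\frac{\partial\, y^\top (K_\mu + \lambda I)^{-1} y}{\partial (K_\mu + \lambda I)}\ \frac{\partial (K_\mu + \lambda I)}{\partial \mu_k}\Big]$$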
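For F(µ) = y⊤(K_µ + λI)⁻¹y and, as an assumption made here, a linear combination K_µ = Σ_k µ_k K_k, the chain rule gives ∂F/∂µ_k = −α⊤K_kα with α = (K_µ + λI)⁻¹y. The sketch below compares that expression with central finite differences on toy data; the data, kernels, and helper names are illustrative, and the slides' kernel family may differ.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

xs = [-1.0, 0.0, 0.5, 2.0]          # toy inputs (assumption)
y = [0.5, -0.2, 0.1, 1.5]           # toy targets (assumption)
lam = 0.5
m = len(xs)
grams = [
    [[a * b for b in xs] for a in xs],                 # linear
    [[(a * b + 1.0) ** 2 for b in xs] for a in xs],    # degree-2 polynomial
]

def alpha_of(mu):
    """alpha = (K_mu + lam I)^{-1} y for K_mu = sum_k mu_k K_k."""
    A = [[sum(mk * K[i][j] for mk, K in zip(mu, grams)) + (lam if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    return solve(A, y)

def F(mu):
    return sum(yi * ai for yi, ai in zip(y, alpha_of(mu)))

mu = [0.6, 0.4]
alpha = alpha_of(mu)

# Analytic gradient: dF/dmu_k = -alpha^T K_k alpha.
grad = [-sum(alpha[i] * K[i][j] * alpha[j] for i in range(m) for j in range(m))
        for K in grams]

# Central finite differences for comparison.
eps = 1e-5
fd = []
for k in range(len(mu)):
    up = mu[:]
    down = mu[:]
    up[k] += eps
    down[k] -= eps
    fd.append((F(up) - F(down)) / (2.0 * eps))

max_error = max(abs(g - d) for g, d in zip(grad, fd))
print(grad, fd, max_error)
```

Every component of the gradient is nonpositive, so F decreases as any µ_k grows; this is why a constraint or regularizer on µ is needed for the minimization to be meaningful.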
$\mu_k \leftarrow \max(0, \mu'_k)$
$\|\mu - \mu_0\|$
[Figure: two panels, "Reuters (acq)" and "DVD", each with curves labeled baseline, L1, and L2; horizontal axis ticks run up to 6000.]
(Cortes, MM, and Rostamizadeh, 2009)
Bousquet, Olivier and Herrmann, Daniel J. L. On the complexity of learning the kernel matrix. In NIPS, 2002.
Corinna Cortes, Marius Kloft, and Mehryar Mohri. Learning kernels using local Rademacher complexity. In Proceedings of NIPS, 2013.
Corinna Cortes, Mehryar Mohri, and Afshin Rostamizadeh. Learning non-linear combinations of kernels. In NIPS, 2009.
Cortes, Corinna, Mohri, Mehryar, and Rostamizadeh, Afshin. Generalization Bounds for Learning Kernels. In ICML, 2010.
Cortes, Corinna, Mohri, Mehryar, and Rostamizadeh, Afshin. Two-stage learning kernel methods. In ICML, 2010.
Corinna Cortes, Mehryar Mohri, and Afshin Rostamizadeh. Algorithms for Learning Kernels Based on Centered Alignment. JMLR 13: 795-828, 2012.
Corinna Cortes, Mehryar Mohri, and Afshin Rostamizadeh. Tutorial: Learning
Zakria Hussain, John Shawe-Taylor. Improved Loss Bounds For Multiple Kernel
Sham M. Kakade, Shai Shalev-Shwartz, and Ambuj Tewari. Regularization Techniques for Learning with Matrices. JMLR 13: 1865-1890, 2012.
Koltchinskii, V. and Panchenko, D. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30, 2002.
Koltchinskii, Vladimir and Yuan, Ming. Sparse recovery in large ensembles of kernel machines on-line learning and bandits. In COLT, 2008.
Lanckriet, Gert, Cristianini, Nello, Bartlett, Peter, Ghaoui, Laurent El, and Jordan,
2004.
Mehmet Gönen and Ethem Alpaydin. Multiple Kernel Learning Algorithms. JMLR 12: 2211-2268, 2011.
Srebro, Nathan and Ben-David, Shai. Learning bounds for support vector machines with learned kernels. In COLT, 2006.
Ying, Yiming and Campbell, Colin. Generalization bounds for learning the kernel