A Chaining Algorithm for Online Nonparametric Regression
Pierre Gaillard, December 2, 2015
University of Copenhagen. Joint work with Sébastien Gerchinovitz.
Table of contents: 1. Online prediction of arbitrary sequences; 2. Finite …
Online prediction of arbitrary sequences. At each round t = 1, …, n the forecaster observes x_t, predicts ŷ_t, and then the outcome y_t ∈ [−B, B] is revealed; the forecaster suffers the square loss (y_t − ŷ_t)². Performance against a reference class F is measured by the regret

    Reg_n(F) def= ∑_{t=1}^n (y_t − ŷ_t)² − inf_{f∈F} ∑_{t=1}^n (y_t − f(x_t))².

Goal: make Reg_n(F) small, uniformly over all bounded sequences (x_t, y_t), for large nonparametric classes F.

Exponentially weighted average forecaster (EWA).¹ Given a finite set of experts f_1, …, f_K, predict ŷ_t = ∑_{k=1}^K p_{k,t} f_k(x_t) with weights

    p_{j,t} = exp(−η ∑_{s=1}^{t−1} (y_s − f_j(x_s))²) / ∑_{k=1}^K exp(−η ∑_{s=1}^{t−1} (y_s − f_k(x_s))²).

¹ Littlestone and M. K. Warmuth (1994) and Vovk (1990)
Regret bound (EWA). For η ⩽ 1/(8B²),

    ∑_{t=1}^n (y_t − ŷ_t)² − min_{1⩽k⩽K} ∑_{t=1}^n (y_t − f_k(x_t))² ⩽ log(K)/η.

Proof idea: for this range of η the square loss is exp-concave, and the weighted sums of exponentials telescope by definition of p_{k,t+1}.
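The EWA update above can be sketched in a few lines of Python (the function name and toy setup are mine, not from the slides); it maintains the weights p_{k,t} and uses the tuning η = 1/(8B²) from the bound:

```python
import math

def ewa_forecaster(experts, xs, ys, B, eta=None):
    """Exponentially weighted average forecaster for the square loss.

    experts: list of functions f_k; xs, ys: the data sequence (|y_t| <= B).
    Defaults to the learning rate eta = 1/(8 B^2) from the regret bound.
    """
    if eta is None:
        eta = 1.0 / (8 * B * B)
    K = len(experts)
    cum_loss = [0.0] * K          # cumulative square losses of the experts
    preds = []
    for x, y in zip(xs, ys):
        # weights p_{k,t} proportional to exp(-eta * cumulative loss)
        m = min(cum_loss)         # shift to keep the exponentials stable
        w = [math.exp(-eta * (L - m)) for L in cum_loss]
        Z = sum(w)
        p = [wk / Z for wk in w]
        preds.append(sum(pk * f(x) for pk, f in zip(p, experts)))
        for k, f in enumerate(experts):
            cum_loss[k] += (y - f(x)) ** 2
    return preds
```

With this tuning the cumulative square loss of the forecaster exceeds that of the best expert by at most log(K)/η.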
Discretization. Run EWA over a finite ε-net F_ε of F in sup norm. The regret splits into an estimation term and an approximation term:

    Reg_n(F) = Reg_n(F_ε) + inf_{f_ε∈F_ε} ∑_{t=1}^n (y_t − f_ε(x_t))² − inf_{f∈F} ∑_{t=1}^n (y_t − f(x_t))² ≲ log N∞(F, ε) + ε n,

where N∞(F, ε) is the ε-covering number of F in sup norm and ≲ hides constants depending on B.
Example (ε ≈ 1/n): linear classes F = {∑_j u_j φ_j : (u_j) in a bounded coefficient set} over a dictionary (φ_j).² Their metric entropy is logarithmic in 1/ε, so choosing ε ≈ 1/n yields a regret of logarithmic order.

² Journal of Approximation Theory (2013).
Polynomial metric entropy. If log N∞(F, ε) ≈ ε^{−p}, optimizing the bound with ε = n^{−1/(p+1)} gives regret of order n^{p/(p+1)} for the discretized EWA forecaster. The optimal rates are however n^{p/(p+2)} if p < 2 and n^{(p−1)/p} if p > 2.³ For β-Hölder functions on [0, 1] (entropy exponent p = 1/β) this reads: n^{1/(1+β)} for discretized EWA versus the optimal n^{1/(1+2β)}, e.g., n^{1/2} versus n^{1/3} in the Lipschitz case β = 1.

³ G. G. Lorentz. “Metric Entropy, Widths, and Superpositions of Functions”. In: Amer. Math. Monthly 6 (1962).
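The exponent bookkeeping above can be checked mechanically. This small sketch (function names are mine) substitutes p = 1/β into the two rates, assuming as above that the β-Hölder class on [0, 1] has entropy exponent p = 1/β:

```python
from fractions import Fraction

def ewa_rate(p):
    # discretized-EWA regret exponent: n^{p/(p+1)}
    return p / (p + 1)

def optimal_rate(p):
    # optimal regret exponent: p/(p+2) if p < 2, else (p-1)/p
    return p / (p + 2) if p < 2 else (p - 1) / p

beta = Fraction(1)          # Lipschitz functions on [0, 1]: p = 1/beta = 1
p = 1 / beta
print(ewa_rate(p))          # 1/2  -> regret n^{1/(1+beta)} = n^{1/2}
print(optimal_rate(p))      # 1/3  -> regret n^{1/(1+2beta)} = n^{1/3}
```

The identities n^{p/(p+1)} = n^{1/(1+β)} and n^{p/(p+2)} = n^{1/(1+2β)} hold for any β with p = 1/β, not just β = 1.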
Chaining. Instead of a single ε-net, use a whole hierarchy of nets at every scale between γ and ε (γ ⩾ ε ⩾ 0), and approximate each f ∈ F by successive refinements from one scale to the next. This multi-scale approximation is the key to reaching the optimal n^{p/(p+2)} rate.
Chaining decomposition. Every f ∈ F_ε can be written along the hierarchy as

    f = f_0 + ∑_{k=0}^{K_ε} g_k,   f_0 ∈ F^(0),   g_k ∈ G^(k),

where F^(0) is the coarsest net and the increments are small: ∥g_k∥∞ ⩽ 3γ/2^{k+1}. Hence

    inf_{f∈F_ε} ∑_{t=1}^n (y_t − f(x_t))² = inf_{f_0∈F^(0)} inf_{g_0∈G^(0), …, g_{K_ε}∈G^(K_ε)} ∑_{t=1}^n (y_t − f_0(x_t) − ∑_{k=0}^{K_ε} g_k(x_t))²,

so it suffices to be competitive simultaneously against the best f_0 and the best increments g_0, …, g_{K_ε}: a multi-variable online optimization problem.
Exponentiated Gradient (EG).⁵ Let ∆_N def= {u ∈ R^N_+ : ∑_{i=1}^N u_i = 1} denote the simplex. For convex differentiable losses ℓ_s on ∆_N, EG plays

    u_{k,t} def= exp(−η ∑_{s=1}^{t−1} ∂_{u_{k,s}} ℓ_s(u_s)) / Z_t,   k ∈ {1, …, N},

with Z_t a normalization factor, and with a suitable tuning of η guarantees

    ∑_{t=1}^n ℓ_t(u_t) − min_{u∈∆_N} ∑_{t=1}^n ℓ_t(u) ⩽ √(2 n log N) G   whenever ∥∇ℓ_t∥∞ ⩽ G.

⁵ Kivinen and M. Warmuth (1997) and Cesa-Bianchi (1999)
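A minimal EG sketch (names mine, not from the slides): it accumulates the partial derivatives evaluated at the past iterates and plays the corresponding exponential weights, with the tuning η = √(2 log(N)/n)/G:

```python
import math

def eg(grad, N, n, G):
    """Exponentiated Gradient on the simplex Delta_N.

    grad(t, u) returns the gradient of the round-t loss at u (sup norm <= G).
    Uses the tuning eta = sqrt(2 log(N) / n) / G from the regret bound.
    """
    eta = math.sqrt(2 * math.log(N) / n) / G
    cum_grad = [0.0] * N          # cumulative partial derivatives
    iterates = []
    for t in range(n):
        m = min(cum_grad)         # shift to keep the exponentials stable
        w = [math.exp(-eta * (g - m)) for g in cum_grad]
        Z = sum(w)
        iterates.append([wk / Z for wk in w])
        g = grad(t, iterates[-1])
        cum_grad = [c + gi for c, gi in zip(cum_grad, g)]
    return iterates
```

For linear losses ℓ_t(u) = ⟨c, u⟩ the guarantee above bounds the regret against the best vertex of the simplex by √(2 n log N) G.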
Multi-variable EG. Let ∆_N denote the simplex in R^N. Goal: minimize a sequence of multi-variable losses (u^(1), …, u^(K)) ↦ ℓ_t(u^(1), …, u^(K)) simultaneously over all variables (u^(1), …, u^(K)) ∈ ∆_{N_1} × … × ∆_{N_K}.

Input: tuning parameters η^(1), …, η^(K) > 0.
Initialization: set u_1^(k) def= (1/N_k, …, 1/N_k) ∈ ∆_{N_k} for all k = 1, …, K.
For each round t = 2, 3, …: compute the weight vectors (u_t^(1), …, u_t^(K)) ∈ ∆_{N_1} × … × ∆_{N_K} as follows (Z_t^(k) is a normalization factor):

    u_{t,i}^(k) def= exp(−η^(k) ∑_{s=1}^{t−1} ∂_{u_{s,i}^(k)} ℓ_s(u_s^(1), …, u_s^(K))) / Z_t^(k),   i ∈ {1, …, N_k}.
Regret bound: if the ℓ_t are jointly convex and differentiable with ∥∇_{u^(k)} ℓ_t∥∞ ⩽ G^(k), then multi-variable EG tuned with η^(k) = √(2 log(N_k)/n) / G^(k) satisfies

    ∑_{t=1}^n ℓ_t(u_t^(1), …, u_t^(K)) − min_{u^(1),…,u^(K)} ∑_{t=1}^n ℓ_t(u^(1), …, u^(K)) ⩽ √(2n) ∑_{k=1}^K G^(k) √(log N_k).
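The pseudocode above translates almost directly to Python (names mine): each block k runs its own exponential-weights update on its cumulative partial gradients, with the per-block tuning from the bound:

```python
import math

def multivariable_eg(K, Ns, Gs, n, grads):
    """Multi-variable EG: one EG instance per simplex Delta_{N_k}.

    grads(t, us) returns a list of K gradient vectors, one per block,
    with sup-norm bounds Gs[k]. Tuning: eta^(k) = sqrt(2 log(N_k)/n) / G^(k).
    """
    etas = [math.sqrt(2 * math.log(Ns[k]) / n) / Gs[k] for k in range(K)]
    cum = [[0.0] * Ns[k] for k in range(K)]   # cumulative partial gradients
    history = []
    for t in range(n):
        us = []
        for k in range(K):
            m = min(cum[k])                   # stabilize the exponentials
            w = [math.exp(-etas[k] * (c - m)) for c in cum[k]]
            Z = sum(w)
            us.append([wk / Z for wk in w])
        history.append(us)
        gs = grads(t, us)
        for k in range(K):
            cum[k] = [c + g for c, g in zip(cum[k], gs[k])]
    return history
```

For separable linear losses ℓ_t(u, v) = ⟨a, u⟩ + ⟨b, v⟩ (jointly convex), the bound above controls the total regret by √(2n) ∑_k G^(k) √(log N_k).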
Application to chaining: take one variable per level of the hierarchy and run multi-variable EG to be simultaneously competitive against the best f_0 ∈ F^(0) and the best increments g_0 ∈ G^(0), …, g_{K_ε} ∈ G^(K_ε). Since the level-k increments are bounded by 3γ/2^{k+1}, the gradient bounds G^(k) decrease geometrically with k, which keeps ∑_k G^(k) √(log N_k) under control.
[Slides 19–21: analysis sketch comparing the ∑_{t=1}^n ℓ_t terms of the two algorithms; the corresponding step for EWA was thanks to exp-concavity, and is handled here by the EG regret bound.]
Lipschitz classes on [0, 1]^d:

    Class    Metric entropy    EWA regret        Our regret
    d = 1    ε^{−1}            n^{2/3}           n^{1/2}
    d = 2    ε^{−2}            n^{3/4}           n^{1/2} log n
    d ⩾ 3    ε^{−d}            n^{(d+1)/(d+2)}   n^{(d−1)/d}
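A quick numeric check (code mine) that the new exponents in the table improve on the EWA exponents for every d ⩾ 3, i.e., (d−1)/d < (d+1)/(d+2):

```python
from fractions import Fraction

def ewa_exp(d):
    # EWA regret exponent from the table: n^{(d+1)/(d+2)}
    return Fraction(d + 1, d + 2)

def chaining_exp(d):
    # chaining regret exponent from the table: n^{(d-1)/d}
    return Fraction(d - 1, d)

for d in range(3, 10):
    assert chaining_exp(d) < ewa_exp(d)
```

The same comparison holds for d = 1 (1/2 < 2/3) and, up to the log factor, for d = 2 (1/2 < 3/4).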