Adaptativity of Stochastic Gradient Descent, Aymeric Dieuleveut



SLIDE 1

Setting

Adaptativity of Stochastic Gradient Descent
Aymeric Dieuleveut, F. Bach, Non parametric stochastic approximation with large step sizes, in the Annals of Statistics.

Setting: random-design least-squares regression problem in an RKHS framework.

Risk: for $g : \mathcal{X} \to \mathbb{R}$,

$$\varepsilon(g) := \mathbb{E}_\rho\left[(g(X) - Y)^2\right].$$

We thus want to minimize the prediction error.

Regression function: $g_\rho(X) = \mathbb{E}[Y \mid X]$ minimises $\varepsilon$ on $L^2_{\rho_X}$.

We build a sequence $(g_k)$ of estimators in an RKHS $\mathcal{H}$.

Why consider an RKHS?

  • hypothesis space for non-parametric regression,
  • analysis framework for high-dimensional problems (d >> n),
  • natural analysis when mapping data into a feature space via a positive-definite kernel.
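To make the risk concrete, here is a minimal sketch that estimates ε(g) by Monte Carlo. The toy model (X uniform on [-1, 1], Y = 2X plus Gaussian noise) and the names `empirical_risk` and `draw_samples` are assumptions made for illustration, not part of the talk. In this model the regression function is g_ρ(x) = 2x, so its estimated risk should be close to the noise variance and below that of any other predictor.

```python
import random

def empirical_risk(g, samples):
    """Monte Carlo estimate of eps(g) = E[(g(X) - Y)^2] from (x, y) pairs."""
    return sum((g(x) - y) ** 2 for x, y in samples) / len(samples)

def draw_samples(n, noise_std=0.5, seed=0):
    # Hypothetical toy model: X ~ Uniform(-1, 1), Y = 2 X + Gaussian noise.
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        samples.append((x, 2.0 * x + rng.gauss(0.0, noise_std)))
    return samples

samples = draw_samples(20000)

def g_rho(x):
    # Regression function E[Y | X = x] of the toy model.
    return 2.0 * x

def g_other(x):
    # Some other predictor, for comparison.
    return 1.5 * x

risk_rho = empirical_risk(g_rho, samples)
risk_other = empirical_risk(g_other, samples)
```

Here `risk_rho` approximates the noise variance (0.25 in this toy model), the part of the risk that no predictor can remove.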

Aymeric Dieuleveut Adaptativity of SGD 1 / 3

SLIDE 2

Regularity assumptions

Algorithm (Stochastic approximation): simple one-pass stochastic gradient descent with constant step sizes and averaging.

Difficulty of the problem: let $\Sigma = \mathbb{E}[K_x K_x^t]$ be the covariance operator.

  • We assume that $\operatorname{tr}(\Sigma^{1/\alpha}) < \infty$.
  • We assume $g_\rho \in \Sigma^r(L^2_{\rho_X})$.

(α, r) encode the difficulty of the problem.
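The algorithm on this slide (one pass over the data, constant step size, averaging of the iterates) can be sketched in the finite-dimensional Euclidean case as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the toy data model, the hand-picked step size 0.05, and the convention of averaging the iterates θ_1, ..., θ_n are all choices made here.

```python
import random

def averaged_sgd(stream, dim, step):
    """One pass of constant-step SGD on the squared loss; returns the averaged iterate."""
    theta = [0.0] * dim       # current iterate theta_k
    theta_bar = [0.0] * dim   # running average of theta_1, ..., theta_k
    for k, (x, y) in enumerate(stream, start=1):
        residual = sum(t * xi for t, xi in zip(theta, x)) - y
        # The gradient of 0.5 * (<theta, x> - y)^2 with respect to theta is residual * x.
        theta = [t - step * residual * xi for t, xi in zip(theta, x)]
        # Online mean update: bar_k = bar_{k-1} + (theta_k - bar_{k-1}) / k.
        theta_bar = [b + (t - b) / k for b, t in zip(theta_bar, theta)]
    return theta_bar

def toy_stream(n, theta_star, noise_std=0.1, seed=0):
    # Hypothetical data: X uniform on [-1, 1]^d, Y = <theta_star, X> + Gaussian noise.
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in theta_star]
        y = sum(t * xi for t, xi in zip(theta_star, x)) + rng.gauss(0.0, noise_std)
        yield x, y

theta_star = [1.0, -2.0, 0.5]
theta_bar = averaged_sgd(toy_stream(20000, theta_star), dim=3, step=0.05)
```

Each observation is used exactly once, so the procedure needs O(d) memory and a single pass over the stream. Averaging smooths out the fluctuations that a constant step size would otherwise leave in the last iterate.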


SLIDE 3

Results

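Theorem (Non parametric regression): under a suitable choice of the learning rate, we get the optimal rate of convergence for non-parametric regression.

Theorem (Adaptativity in Euclidean spaces): if $\mathcal{H}$ is a $d$-dimensional Euclidean space:

$$\mathbb{E}\left[\varepsilon(\bar g_n) - \varepsilon(g_\rho)\right] \le \min_{1 \le \alpha,\; -\frac{1}{2} \le q \le \frac{1}{2}} \left( \frac{16\,\sigma^2 \operatorname{tr}(\Sigma^{1/\alpha})\,(\gamma n)^{1/\alpha}}{n} + \frac{8\,\|T^{-q}\theta_H\|_H^2}{(n\gamma)^{2q+1}} \right).$$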

SGD is adaptative to the regularity of the objective function and to the decay of the spectrum of the covariance matrix.

This explains its behaviour for d >> n.
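To see how such a bound trades off its two terms, the sketch below evaluates it on a grid. It reads the bound as a variance term 16σ² tr(Σ^{1/α})(γn)^{1/α}/n plus a bias term 8‖T^{-q}θ_H‖²_H/(nγ)^{2q+1}, minimised over α ≥ 1 and q in [-1/2, 1/2]. Two assumptions are made here and are not taken from the slides: everything is expressed in the eigenbasis of Σ, and T is taken to share the eigenvalues of Σ. The function name `excess_risk_bound` and the example spectrum are hypothetical.

```python
def excess_risk_bound(eigvals, theta, sigma2, step, n, alphas, qs):
    """Grid-minimise variance(alpha) + bias(q); inputs are given in the eigenbasis of Sigma."""
    def variance(alpha):
        trace = sum(lam ** (1.0 / alpha) for lam in eigvals)
        return 16.0 * sigma2 * trace * (step * n) ** (1.0 / alpha) / n

    def bias(q):
        # Assumption: T has the eigenvalues of Sigma, so
        # ||T^{-q} theta||^2 = sum_i lam_i^{-2q} * theta_i^2.
        norm2 = sum(lam ** (-2.0 * q) * t * t for lam, t in zip(eigvals, theta))
        return 8.0 * norm2 / (n * step) ** (2.0 * q + 1.0)

    # The variance depends only on alpha and the bias only on q,
    # so the two minimisations can be carried out separately.
    best_alpha = min(alphas, key=variance)
    best_q = min(qs, key=bias)
    return variance(best_alpha) + bias(best_q), best_alpha, best_q

# Hypothetical example: polynomially decaying spectrum in dimension 50.
eigvals = [1.0 / i ** 2 for i in range(1, 51)]
theta = [1.0 / i for i in range(1, 51)]
alphas = [1.0, 1.5, 2.0, 3.0, 5.0]
qs = [-0.5, -0.25, 0.0, 0.25, 0.5]
bound_grid, best_alpha, best_q = excess_risk_bound(eigvals, theta, 0.01, 0.25, 10000, alphas, qs)
bound_single, _, _ = excess_risk_bound(eigvals, theta, 0.01, 0.25, 10000, [1.0], [0.0])
sanity, _, _ = excess_risk_bound([1.0], [1.0], 1.0, 0.5, 4, [1.0], [0.0])
```

Because the minimum is taken over α and q, the bound automatically picks the most favourable pair for the problem at hand, which is one way to read the adaptativity claim: the algorithm itself never needs to know α or q.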
