SLIDE 1
Empirical Loss Minimization Traffic sign - STOP Sample - - PowerPoint PPT Presentation
SLIDE 2
SLIDE 3
Empirical Loss Minimization
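A minimal statement of the problem, in assumed notation (the symbols w, f_i, h, and the loss ℓ are illustrative choices, not taken from the slides): given n i.i.d. samples (x_i, y_i), one minimizes the average loss over the sample.

```latex
\min_{w \in \mathbb{R}^d} \; F(w) \;=\; \frac{1}{n} \sum_{i=1}^{n} f_i(w),
\qquad
f_i(w) \;=\; \ell\bigl(h(x_i; w),\, y_i\bigr)
```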
SLIDE 4
Traffic sign - STOP
SLIDE 5
SLIDE 6
Sample i.i.d. points
SLIDE 7
Stochastic Gradient Descent
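A minimal sketch of plain SGD on a toy problem, assuming scalar least-squares losses f_i(w) = 0.5 (a_i w - b_i)^2. The data `a`, `b`, the step size, and the helper names are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy data: f_i(w) = 0.5 * (a[i] * w - b[i]) ** 2 with scalar w.
a = [1.0, 1.1, 0.9, 1.2, 0.8]
b = [2.1, 1.9, 2.0, 2.3, 1.7]
n = len(a)

# Closed-form minimizer of the average loss, used only for comparison.
w_star = sum(x * y for x, y in zip(a, b)) / sum(x * x for x in a)


def grad_i(w, i):
    """Gradient of the i-th component loss at w."""
    return a[i] * (a[i] * w - b[i])


def sgd(w0, step, iters, seed=0):
    """Plain SGD with a constant step size (an illustrative choice)."""
    rng = random.Random(seed)
    w = w0
    for _ in range(iters):
        i = rng.randrange(n)        # sample one index uniformly at random
        w -= step * grad_i(w, i)    # step along the single-sample gradient
    return w


w_sgd = sgd(w0=0.0, step=0.1, iters=500)
print(w_sgd, w_star)
```

With a constant step size the iterate only settles in a neighborhood of the minimizer, because the single-sample gradients do not vanish at w_star. That is the gap the variance-reduced methods aim to close.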
SLIDE 8
SLIDE 9
SLIDE 10
- Léon Bottou, Frank E. Curtis, Jorge Nocedal: Optimization Methods for Large-Scale Machine Learning
SLIDE 11
SVRG: Stochastic Variance Reduced Gradient
SLIDE 12
- Unbiased stochastic gradient:
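A sketch of the SVRG estimator and loop, assuming the standard form v = ∇f_i(w) - ∇f_i(w̃) + ∇F(w̃), where w̃ is the snapshot point. Averaged over i, the first two terms give ∇F(w) - ∇F(w̃), so the estimator is unbiased. The toy data, step size, and the choice to restart from the last inner iterate are assumptions for illustration.

```python
import random

# Hypothetical toy data: f_i(w) = 0.5 * (a[i] * w - b[i]) ** 2 with scalar w.
a = [1.0, 1.1, 0.9, 1.2, 0.8]
b = [2.1, 1.9, 2.0, 2.3, 1.7]
n = len(a)
w_star = sum(x * y for x, y in zip(a, b)) / sum(x * x for x in a)


def grad_i(w, i):
    """Gradient of the i-th component loss at w."""
    return a[i] * (a[i] * w - b[i])


def full_grad(w):
    """Gradient of the average loss F at w."""
    return sum(grad_i(w, i) for i in range(n)) / n


def svrg_estimate(w, w_snap, mu, i):
    """SVRG estimator: grad f_i(w) - grad f_i(w_snap) + mu, with mu = full_grad(w_snap)."""
    return grad_i(w, i) - grad_i(w_snap, i) + mu


def svrg(w0, step, inner_steps, epochs, seed=0):
    rng = random.Random(seed)
    w_snap = w0
    for _ in range(epochs):
        mu = full_grad(w_snap)      # one full gradient per restart
        w = w_snap
        for _ in range(inner_steps):
            i = rng.randrange(n)
            w -= step * svrg_estimate(w, w_snap, mu, i)
        w_snap = w                  # assumption: snapshot = last inner iterate
    return w_snap


w_svrg = svrg(w0=0.0, step=0.1, inner_steps=50, epochs=40)
print(w_svrg, w_star)
```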
SLIDE 13
SLIDE 14
SLIDE 15
SAG/SAGA
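A sketch of SAGA on the same kind of toy problem, assuming the usual formulation: keep a table with the most recent gradient of every f_i and correct each new sample gradient with it. The data, step size, and names are hypothetical.

```python
import random

# Hypothetical toy data: f_i(w) = 0.5 * (a[i] * w - b[i]) ** 2 with scalar w.
a = [1.0, 1.1, 0.9, 1.2, 0.8]
b = [2.1, 1.9, 2.0, 2.3, 1.7]
n = len(a)
w_star = sum(x * y for x, y in zip(a, b)) / sum(x * x for x in a)


def grad_i(w, i):
    """Gradient of the i-th component loss at w."""
    return a[i] * (a[i] * w - b[i])


def saga(w0, step, iters, seed=0):
    rng = random.Random(seed)
    w = w0
    table = [grad_i(w0, i) for i in range(n)]   # one stored gradient per sample
    table_mean = sum(table) / n
    for _ in range(iters):
        j = rng.randrange(n)
        g_new = grad_i(w, j)
        # SAGA direction (unbiased): new gradient - stored gradient + table mean.
        # SAG would instead use (g_new - table[j]) / n + table_mean, which is biased.
        w -= step * (g_new - table[j] + table_mean)
        table_mean += (g_new - table[j]) / n    # keep the running mean in sync
        table[j] = g_new
    return w


w_saga = saga(w0=0.0, step=0.2, iters=2000)
print(w_saga, w_star)
```

Unlike SVRG, this needs no periodic full gradient, at the cost of storing n gradients.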
SLIDE 16
SLIDE 17
SLIDE 18
SARAH
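A sketch of SARAH, assuming its recursive estimator v_t = ∇f_i(w_t) - ∇f_i(w_{t-1}) + v_{t-1}, started from a full gradient v_0 at each restart. The toy data, step size, and the choice to keep the last inner iterate are assumptions for illustration.

```python
import random

# Hypothetical toy data: f_i(w) = 0.5 * (a[i] * w - b[i]) ** 2 with scalar w.
a = [1.0, 1.1, 0.9, 1.2, 0.8]
b = [2.1, 1.9, 2.0, 2.3, 1.7]
n = len(a)
w_star = sum(x * y for x, y in zip(a, b)) / sum(x * x for x in a)


def grad_i(w, i):
    """Gradient of the i-th component loss at w."""
    return a[i] * (a[i] * w - b[i])


def full_grad(w):
    """Gradient of the average loss F at w."""
    return sum(grad_i(w, i) for i in range(n)) / n


def sarah(w0, step, inner_steps, outer_loops, seed=0):
    rng = random.Random(seed)
    w_tilde = w0
    for _ in range(outer_loops):
        w_prev = w_tilde
        v = full_grad(w_prev)           # v_0: full gradient at the restart point
        w = w_prev - step * v           # w_1 = w_0 - step * v_0
        for _ in range(inner_steps - 1):
            i = rng.randrange(n)
            # Recursive estimator: uses the previous iterate, not a fixed snapshot.
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            w_prev, w = w, w - step * v
        w_tilde = w                     # assumption: keep the last inner iterate
    return w_tilde


w_sarah = sarah(w0=0.0, step=0.1, inner_steps=50, outer_loops=40)
print(w_sarah, w_star)
```

On this quadratic toy problem each inner step multiplies v by (1 - step * a[i]**2), so the estimator shrinks toward zero within an outer loop.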
SLIDE 19
SLIDE 20
SLIDE 21
SLIDE 22
- …
SLIDE 23
RCV Dataset
- SVRG and SARAH need full gradient after restart
- Variance of SARAH goes to zero
- Variance of SVRG is decreased after each restart
SLIDE 24
SLIDE 25
SARAH+ Practical Variant
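A sketch of the SARAH+ idea, assuming the inner loop ends early once the squared norm of the recursive estimator drops below gamma times that of v_0, so the inner-loop length does not have to be tuned by hand. The value gamma = 1/8, the toy data, and the names are illustrative assumptions.

```python
import random

# Hypothetical toy data: f_i(w) = 0.5 * (a[i] * w - b[i]) ** 2 with scalar w.
a = [1.0, 1.1, 0.9, 1.2, 0.8]
b = [2.1, 1.9, 2.0, 2.3, 1.7]
n = len(a)
w_star = sum(x * y for x, y in zip(a, b)) / sum(x * x for x in a)


def grad_i(w, i):
    """Gradient of the i-th component loss at w."""
    return a[i] * (a[i] * w - b[i])


def full_grad(w):
    """Gradient of the average loss F at w."""
    return sum(grad_i(w, i) for i in range(n)) / n


def sarah_plus(w0, step, max_inner, outer_loops, gamma=0.125, seed=0):
    rng = random.Random(seed)
    w_tilde = w0
    for _ in range(outer_loops):
        w_prev = w_tilde
        v0 = full_grad(w_prev)          # full gradient at the restart point
        v = v0
        w = w_prev - step * v
        t = 1
        # Leave the inner loop once |v|^2 <= gamma * |v_0|^2, or after max_inner steps.
        while v * v > gamma * v0 * v0 and t < max_inner:
            i = rng.randrange(n)
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            w_prev, w = w, w - step * v
            t += 1
        w_tilde = w
    return w_tilde


w_sarah_plus = sarah_plus(w0=0.0, step=0.1, max_inner=50, outer_loops=40)
print(w_sarah_plus, w_star)
```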
SLIDE 26
good performance across many datasets
SLIDE 27
Numerical Experiments
SLIDE 28
One has to tune parameters to get good performance, but not for SARAH+!
SLIDE 29
SLIDE 30
SLIDE 31
SLIDE 32
Summary
SLIDE 33
SLIDE 34
Convex Case
SLIDE 35
SLIDE 36
Non-Convex Case
SLIDE 37
SLIDE 38
SLIDE 39