Stochastic Optimization Techniques for Big Data Machine Learning
Tong Zhang
Rutgers University & Baidu Inc.
- T. Zhang
Big Data Optimization 1 / 73
Outline
Background: big data optimization in machine learning: special structure
[Figure: training error against computational cost for stochastic gradient descent and gradient descent]
$\sum_{i=1}^{n} g_i = \nabla f$.
i := gi − ˜
[Figures: plots comparing SDCA, SDCA-Perm, and SGD; log-scale vertical axis from 10^-6 to 10^0]
[Figure: plot comparing SDCA, DCA-Cyclic, SDCA-Perm, and Bound; log-scale vertical axis from 10^-6 to 10^0]
$\phi_i^*(-(\alpha_i + \Delta)) = (\alpha_i + \Delta) Y_i \ln((\alpha_i + \Delta) Y_i) + (1 - (\alpha_i + \Delta) Y_i) \ln(1 - (\alpha_i + \Delta) Y_i)$
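The expression above has the form of the convex conjugate of the logistic loss. As a hedged sanity check, the sketch below assumes the loss is phi(a) = ln(1 + exp(-Y a)) with label Y in {-1, +1}, writes b for the quantity αi + ∆, and compares the closed form with a brute-force evaluation of sup_a (-b a - phi(a)). The function names, the grid bounds, and the symbols phi, a, and b are illustrative assumptions, not taken from the slides.

```python
import math


def logistic_loss(a, y):
    """Assumed loss: phi(a) = ln(1 + exp(-y * a)) for a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * a))


def conjugate_closed_form(b, y):
    """Closed form b*y*ln(b*y) + (1 - b*y)*ln(1 - b*y), valid for 0 < b*y < 1."""
    p = b * y
    return p * math.log(p) + (1.0 - p) * math.log(1.0 - p)


def conjugate_numeric(b, y, lo=-30.0, hi=30.0, steps=60001):
    """Approximate sup_a (-b*a - phi(a)) by a dense grid search over a."""
    best = -math.inf
    for k in range(steps):
        a = lo + (hi - lo) * k / (steps - 1)
        best = max(best, -b * a - logistic_loss(a, y))
    return best


if __name__ == "__main__":
    # Each pair (y, b) satisfies 0 < b*y < 1, where the closed form is defined.
    for y, b in [(1, 0.3), (-1, -0.8), (1, 0.5)]:
        print(y, b, conjugate_closed_form(b, y), conjugate_numeric(b, y))
```

The two columns printed for each pair should agree to several decimal places. The grid search is only a check; a dual coordinate update would use the closed form directly.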
2 (κ-strongly convex)
[Table: bounds expressed in terms of λ, ε, n, and d]
[Figure: two panels comparing AccProxSDCA, ProxSDCA, and FISTA]
[Figure: primal suboptimality (log scale, 10^-3 to 10^-1) against number of processed examples (log scale, 10^6 to 10^8) for m=52, m=523, m=5229, AGD, and SDCA]