Splash
User-friendly Programming Interface for Parallelizing Stochastic Algorithms
Yuchen Zhang and Michael Jordan
AMP Lab, UC Berkeley
Batch Algorithm vs. Stochastic Algorithm

Consider minimizing a loss function

    $\min_w \sum_{i=1}^{n} \ell_i(w).$
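The contrast this slide sets up can be stated in one update rule each. These are the standard textbook definitions, not text from the deck; the step size $\eta$ is my notation:

    % Batch gradient descent: each update touches all n samples.
    w_{t+1} = w_t - \eta \sum_{i=1}^{n} \nabla \ell_i(w_t)

    % Stochastic gradient descent: each update touches one random sample i_t.
    w_{t+1} = w_t - \eta \, \nabla \ell_{i_t}(w_t)

A stochastic update is roughly n times cheaper than a batch update, which is why SGD can reach a good loss value long before batch gradient descent finishes its first few passes, as the following plot shows.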
[Figure: loss function vs. running time (seconds), comparing Gradient Descent on 64 threads with single-thread Stochastic Gradient Descent.]
[Figure: loss function vs. running time (seconds), comparing single-thread SGD with parallel SGD on 64 threads.]
Two existing ways to resolve conflicts between threads:

1. Frequent communication between threads.
   Pros: a general approach to resolving conflicts.
   Cons: inter-node (asynchronous) communication is expensive!

2. Carefully partition the data so that threads never update the same parameters simultaneously.
   Pros: no need for frequent communication.
   Cons: requires problem-specific partitioning schemes; only works for a subset of problems.
Example: least-squares regression, minimizing $\sum_{i=1}^{n} (w x_i - y_i)^2$.
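For reference, here is a minimal single-thread SGD sketch in Scala for this objective. It is an illustration only, not Splash code; the data, step-size schedule, and pass count are all assumptions:

    // Minimal single-thread SGD for the least-squares objective above.
    // Illustration only -- not Splash code; data and step sizes are made up.
    object LeastSquaresSgd {
      def main(args: Array[String]): Unit = {
        val data: Seq[(Double, Double)] = Seq((1.0, 2.1), (2.0, 3.9), (3.0, 6.2)) // (x_i, y_i) pairs
        var w = 0.0
        var t = 0
        for (_ <- 1 to 100; (x, y) <- scala.util.Random.shuffle(data)) {
          t += 1
          val eta = 1.0 / math.sqrt(t)      // decaying step size
          val grad = 2.0 * (w * x - y) * x  // gradient of (w x_i - y_i)^2 at one sample
          w -= eta * grad
        }
        println(s"fitted w = $w")
      }
    }

Every update reads and writes the same w, which is exactly the conflict between threads discussed earlier.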
Running an algorithm on Splash takes three steps (sketched in code below):

1. Convert the RDD dataset to a Parametrized RDD.
2. Set a function that implements the algorithm.
3. Start running.
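The slide originally showed a code snippet for each step that did not survive extraction. Below is a hedged reconstruction following the general shape of the Splash documentation; class and method names such as ParametrizedRDD, setProcessFunction, and run may differ from the actual release, and parseSample is an assumed helper:

    import org.apache.spark.{SparkConf, SparkContext}
    // import for ParametrizedRDD omitted: its package path varies by Splash version

    val sc = new SparkContext(new SparkConf().setAppName("splash-example"))
    val rdd = sc.textFile("hdfs://...").map(parseSample) // parseSample: assumed parsing helper

    // Step 1: convert the RDD to a Parametrized RDD (data plus shared parameters).
    val paramRdd = new ParametrizedRDD(rdd)

    // Step 2: set the process function; Splash invokes it once per (weighted) sample.
    paramRdd.setProcessFunction((sample, weight, sharedVar, localVar) => {
      // read parameters from sharedVar, take a stochastic step, write back via add()
    })

    // Step 3: start running -- here, ten passes over the data.
    paramRdd.run(10)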
Splash's execution strategy, round by round:

1. Propose candidate degrees of parallelism m_1, ..., m_k such that Σ_i m_i = m := (# of cores). For each i ∈ [k], collect m_i cores and:
   1. Give each core a sub-sequence of samples (by default a 1/m fraction of the full data), which it processes sequentially using the process function.
   2. Combine the updates of all m_i cores into a global update. Different types of updates are combined by different strategies; add operations are averaged (see the sketch after this list).
2. If k > 1, select the best m_i by a parallel cross-validation procedure.
3. Broadcast the best update to all machines and apply it. Then proceed to the next round.
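A toy illustration of the combining step for add operations, assuming each core reports its parameter deltas as a Map keyed by parameter name; the representation is my assumption, not Splash internals:

    // Average the "add" updates proposed by the m_i cores of one candidate group.
    def combineAddUpdates(perCoreDeltas: Seq[Map[String, Double]]): Map[String, Double] = {
      val numCores = perCoreDeltas.size.toDouble
      perCoreDeltas
        .flatten                                  // all (parameter, delta) pairs across cores
        .groupBy(_._1)                            // group the deltas by parameter
        .map { case (param, pairs) =>
          param -> pairs.map(_._2).sum / numCores // average the deltas for this parameter
        }
    }

    // Example: two cores each propose a delta for parameter "w".
    // combineAddUpdates(Seq(Map("w" -> 0.4), Map("w" -> 0.2)))
    // returns Map("w" -> 0.3) (up to floating-point rounding).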
[Figure: 2-D illustration of update weighting. Panels: (a) optimal solution; (b) solution with full update; (c) local solutions with unit-weight update; (d) average of the local solutions in (c); (e) aggregate of the local solutions in (c); (f) local solutions with weighted update; (g) average of the local solutions in (f).]
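Reading the panels: neither averaging nor aggregating the unit-weight local solutions in (c) recovers the full-update solution in (b), while the weighted updates in (f) are constructed so that their average in (g) does. A one-step sketch of why, under the assumption (taken from the Splash paper, not this slide) that the weighted scheme processes each sample with weight m:

    % m groups; group j sees the sample subset S_j, with |S_j| = n/m.
    % Unit-weight local solution: moves only about 1/m as far as a full pass.
    w_j = w - \eta \sum_{i \in S_j} \nabla \ell_i(w)
    % Weighted local solution: each sample carries weight m.
    \tilde{w}_j = w - \eta\, m \sum_{i \in S_j} \nabla \ell_i(w)
    % Averaging the weighted solutions recovers a full-pass-sized step:
    \frac{1}{m} \sum_{j=1}^{m} \tilde{w}_j = w - \eta \sum_{i=1}^{n} \nabla \ell_i(w)

This identity holds exactly for a single simultaneous step from a common w; sequential processing within each group makes it an approximation.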
Experimental setup, three tasks:

- MNIST 8M (LR, logistic regression): 8 million samples, 7,840 parameters.
- Netflix (CF, collaborative filtering): 100 million samples, 65 million parameters.
- NYTimes (LDA): 100 million samples, 200 million parameters.
[Figure: MNIST 8M logistic regression. Left: loss function vs. runtime (seconds) for Splash (SGD), single-thread SGD, and MLlib (L-BFGS). Right: speedup rate of Splash over single-thread SGD and over MLlib (L-BFGS) at matched loss function values.]
[Figure: Netflix collaborative filtering. Prediction loss vs. runtime (seconds) for Splash (SGD), single-thread SGD, and MLlib (ALS).]
[Figure: NYTimes LDA. Predictive log-likelihood vs. runtime (seconds) for Splash (Gibbs sampling), single-thread Gibbs sampling, and MLlib (variational inference).]
[Figure: runtime per pass (seconds) on MNIST 8M (LR), Netflix (CF), and NYTimes (LDA), broken down into computation time, waiting time, and communication time.]
Summary of Splash:

- Fast performance: an order of magnitude faster than MLlib.
- Ease of use: call an algorithm with one line of code.
- Integration: easy to build a data analytics pipeline.

Algorithms implemented on Splash:

- Stochastic gradient descent.
- Stochastic matrix factorization.
- Gibbs sampling for LDA.