Massively Parallel Optimization on a Cluster Environment - Stratis Ioannidis



SLIDE 1

Massively Parallel Optimization on a Cluster Environment

Stratis Ioannidis

SLIDE 2

Data, Networks, and Algorithms Lab

  • Machine Learning
  • Optimization
  • Distributed Computing
  • Privacy

5000-Level Course:

  • Parallel Processing for Data Analytics

SLIDE 3

DNAL Research on MGHPCC

  • Machine Learning for Retinopathy of Prematurity (NSF-1622536)
    [Figure: Image Analysis and Deep Learning; class labels Pre-Plus, Plus, Normal]
  • Scalable Graph Distances (NSF-1741197)
  • Privacy-Preserving Machine Learning (NSF-1717213, Google Research)
    [Figure: Garbled Circuit, f]
  • Distributed Caching Algorithms (NSF-1718355)

SLIDE 4

Distributed Optimization and Big Data

  • Optimization over large datasets
      • TB of data
      • Millions of variables
      • 1000's of CPUs
  • Computational Frameworks
      • Map-Reduce/Spark
      • GraphLab
      • TensorFlow
      • MPI
      • …
  • Optimization Methods
      • ADMM
      • SGD
      • SDCA
      • …

SLIDE 5

Alternating Directions Method of Multipliers

$$\min_{\beta \in \mathbb{R}^d} \sum_{i=1}^{n} \ell(\beta; x_i, y_i) + \lambda \|\beta\|_1$$

[Figure labels: $\hat{\beta}_1$, $\hat{\beta}_2$, $\hat{\beta}_3$]

Reference: S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," 2011.

SLIDE 6

Alternating Directions Method of Multipliers

$$\min_{\beta \in \mathbb{R}^d} \sum_{i=1}^{n} \ell(\beta; x_i, y_i) + \lambda \|\beta\|_1$$

  • Consensus value: $\bar{\beta}$
  • Solve problem again, forcing agreement with $\bar{\beta}$

Reference: S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," 2011.
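The consensus idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: it assumes a squared loss for $\ell$ (the slide leaves the loss generic), uses the standard scaled-form consensus updates from the Boyd et al. reference, and the names `consensus_admm_lasso`, `parts`, `lam`, and `rho` are hypothetical.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise soft-thresholding, the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def consensus_admm_lasso(parts, lam=0.1, rho=1.0, iters=200):
    """Consensus ADMM for min_beta sum_i 0.5*||X_i beta - y_i||^2 + lam*||beta||_1.

    parts is a list of (X_i, y_i) blocks, one per (hypothetical) worker.
    Assumption: squared loss; the slide's loss is generic.
    """
    d = parts[0][0].shape[1]
    n_parts = len(parts)
    beta_bar = np.zeros(d)                   # consensus value
    betas = [np.zeros(d) for _ in parts]     # local estimates
    duals = [np.zeros(d) for _ in parts]     # scaled dual variables
    for _ in range(iters):
        # Local step: each worker re-solves its problem, pulled toward beta_bar.
        for i, (X, y) in enumerate(parts):
            lhs = X.T @ X + rho * np.eye(d)
            rhs = X.T @ y + rho * (beta_bar - duals[i])
            betas[i] = np.linalg.solve(lhs, rhs)
        # Consensus step: average, then apply the l1 penalty by soft-thresholding.
        avg = np.mean([b + u for b, u in zip(betas, duals)], axis=0)
        beta_bar = soft_threshold(avg, lam / (rho * n_parts))
        # Dual step: accumulate each worker's disagreement with the consensus.
        for i in range(n_parts):
            duals[i] = duals[i] + betas[i] - beta_bar
    return beta_bar

# Synthetic, noise-free data split across four hypothetical workers.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
X = rng.normal(size=(200, 5))
y = X @ beta_true
parts = [(X[i::4], y[i::4]) for i in range(4)]
beta_hat = consensus_admm_lasso(parts)
```

In a cluster setting the inner loop over `parts` is the piece that would run in parallel, with only the local estimates and the consensus value exchanged between iterations.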

SLIDE 7

ADMM properties

  • Converges if loss $\ell$ is convex
  • Admits many regularization penalties $\|\cdot\|$
  • Message complexity determined by data sparsity

[Figure: Dependence Graph between dataset points 1, 2, 3, …, i, …, n-1, n and the features feature_1, feature_2, feature_3, …, feature_j, …, feature_d-1, feature_d of β]

SLIDE 8

Our Research

https://github.com/yahoo/SparkADMM

  • Parallel implementation
  • Application to:
      • Timeseries Forecasting
      • Scalable Graph Distances

[I., Jiang, Amizadeh, Laptev, 2016] [I., Bento, 2017]

[Figure labels: t, t + 1]

SLIDE 9

Frank-Wolfe Algorithm

  • Marguerite Frank & Philip Wolfe, 1956
  • Sparse convex optimization
  • Continuous greedy algorithm for submodular maximization

Minimize $F(\theta)$ subject to $\theta \in D$.

FW Algorithm:

$$s_k = \arg\min_{s \in D} s^\top \nabla F(\theta_k)$$
$$\theta_{k+1} = (1 - \gamma_k)\,\theta_k + \gamma_k s_k$$

  • First step: minimize a linear function over $D$
  • Second step: interpolate between solutions
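The two steps above can be sketched in Python. This is an illustrative sketch under assumptions not stated on the slide: the feasible set $D$ is taken to be the probability simplex, the step size is the common choice $\gamma_k = 2/(k+2)$, and the objective $F(\theta) = \|\theta - c\|^2$ and the name `frank_wolfe_simplex` are hypothetical.

```python
import numpy as np

def frank_wolfe_simplex(grad, d, iters=2000):
    """Frank-Wolfe over the probability simplex D = {theta >= 0, sum(theta) = 1}.

    Assumptions (not from the slides): D is the simplex, gamma_k = 2 / (k + 2).
    """
    theta = np.zeros(d)
    theta[0] = 1.0                       # start at a vertex of D
    for k in range(iters):
        g = grad(theta)
        # Linear step: argmin over D of s^T g is the vertex
        # at the coordinate with the smallest gradient entry.
        s = np.zeros(d)
        s[np.argmin(g)] = 1.0
        # Interpolation step between the current iterate and s.
        gamma = 2.0 / (k + 2.0)
        theta = (1.0 - gamma) * theta + gamma * s
    return theta

# Hypothetical objective F(theta) = ||theta - c||^2 with c inside the simplex.
c = np.array([0.5, 0.3, 0.2, 0.0])
theta_hat = frank_wolfe_simplex(lambda th: 2.0 * (th - c), d=4)
```

Because every iterate is a convex combination of vertices of $D$, the method never needs a projection, and early iterates have few nonzero entries, which is one reason it is associated with sparse convex optimization.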

SLIDE 10

Our Research

https://github.com/neu-spiral/FrankWolfe [Moharrer, I., 2017]

  • Parallelize FW via map-reduce
  • Formal conditions under which map-reduce (MR) applies
  • Several problems amenable to parallelization:
      • Experiment Design, Adaboost, Convex Approximation
  • Implementation over Spark
      • Solve problems of 10M variables in 44 mins using 210 CPUs
      • Serial execution would take 3.4 days
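One way to picture a map-reduce Frank-Wolfe step is sketched below. It is not taken from the linked repository and does not reproduce the formal conditions mentioned on the slide. It assumes the objective is a sum of per-partition squared losses, that $D$ is the probability simplex, and it uses Python's built-in `map` and `functools.reduce` as stand-ins for the corresponding Spark operations; all function names are hypothetical.

```python
from functools import reduce
import numpy as np

def partition_gradient(part, theta):
    """Map step: gradient of 0.5*||X theta - y||^2 on one data partition."""
    X, y = part
    return X.T @ (X @ theta - y)

def mapreduce_gradient(parts, theta):
    """Reduce step: sum the per-partition gradients into the full gradient."""
    return reduce(np.add, map(lambda p: partition_gradient(p, theta), parts))

def fw_step(parts, theta, k):
    """One Frank-Wolfe step over the simplex using the map-reduce gradient."""
    g = mapreduce_gradient(parts, theta)
    s = np.zeros_like(theta)
    s[np.argmin(g)] = 1.0                # linear step over the simplex
    gamma = 2.0 / (k + 2.0)              # assumed step size
    return (1.0 - gamma) * theta + gamma * s

# Synthetic data split into three hypothetical partitions.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 6))
y = rng.normal(size=120)
parts = [(X[i::3], y[i::3]) for i in range(3)]
theta0 = np.full(6, 1.0 / 6.0)
theta1 = fw_step(parts, theta0, k=1)
```

Only the gradient vector crosses partition boundaries in this sketch; the raw data stays where it is stored.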

SLIDE 11

Proposal

  • Evaluate ADMM+FW over heterogeneous cluster architecture
  • Communication, computation, & memory profiling
  • Data partitioning
      • Communication
      • Convergence
  • Read/Writes:
      • Hard disk/RAM
      • Multi-tier caches

SLIDE 12

Thank You!