Splash: User-friendly Programming Interface for Parallelizing Stochastic Algorithms

SLIDE 1

Splash

User-friendly Programming Interface for Parallelizing Stochastic Algorithms

Yuchen Zhang and Michael Jordan

AMP Lab, UC Berkeley

SLIDE 2

Batch Algorithm vs. Stochastic Algorithm

Consider minimizing a loss function L(w) := (1/n) Σ_{i=1}^n ℓi(w).

SLIDE 3

Batch Algorithm vs. Stochastic Algorithm

Consider minimizing a loss function L(w) := (1/n) Σ_{i=1}^n ℓi(w).

Gradient Descent: iteratively update wt+1 = wt − ηt∇L(wt).

SLIDE 4

Batch Algorithm vs. Stochastic Algorithm

Consider minimizing a loss function L(w) := (1/n) Σ_{i=1}^n ℓi(w).

Gradient Descent: iteratively update wt+1 = wt − ηt∇L(wt).
  Pros: Easy to parallelize (via Spark).
  Cons: May need hundreds of iterations to converge.

[Plot: loss function vs. running time (seconds); curve: Gradient Descent - 64 threads]

SLIDE 5

Batch Algorithm vs. Stochastic Algorithm

Consider minimizing a loss function L(w) := (1/n) Σ_{i=1}^n ℓi(w).

Stochastic Gradient Descent (SGD): randomly draw ℓt, then wt+1 = wt − ηt∇ℓt(wt).

SLIDE 6

Batch Algorithm vs. Stochastic Algorithm

Consider minimizing a loss function L(w) := (1/n) Σ_{i=1}^n ℓi(w).

Stochastic Gradient Descent (SGD): randomly draw ℓt, then wt+1 = wt − ηt∇ℓt(wt).
  Pros: Much faster convergence.
  Cons: Sequential algorithm, difficult to parallelize.

[Plot: loss function vs. running time (seconds); curves: Gradient Descent - 64 threads, Stochastic Gradient Descent]

SLIDE 7

More Stochastic Algorithms

Convex Optimization:
  Adaptive SGD (Duchi et al.)
  Stochastic Average Gradient Method (Schmidt et al.)
  Stochastic Dual Coordinate Ascent (Shalev-Shwartz and Zhang)

SLIDE 8

More Stochastic Algorithms

Convex Optimization:
  Adaptive SGD (Duchi et al.)
  Stochastic Average Gradient Method (Schmidt et al.)
  Stochastic Dual Coordinate Ascent (Shalev-Shwartz and Zhang)

Probabilistic Model Inference:
  Markov chain Monte Carlo and Gibbs sampling
  Expectation propagation (Minka)
  Stochastic variational inference (Hoffman et al.)

SLIDE 9

More Stochastic Algorithms

Convex Optimization:
  Adaptive SGD (Duchi et al.)
  Stochastic Average Gradient Method (Schmidt et al.)
  Stochastic Dual Coordinate Ascent (Shalev-Shwartz and Zhang)

Probabilistic Model Inference:
  Markov chain Monte Carlo and Gibbs sampling
  Expectation propagation (Minka)
  Stochastic variational inference (Hoffman et al.)

SGD variants for:
  Matrix factorization
  Learning neural networks
  Learning denoising auto-encoders

SLIDE 10

More Stochastic Algorithms

Convex Optimization:
  Adaptive SGD (Duchi et al.)
  Stochastic Average Gradient Method (Schmidt et al.)
  Stochastic Dual Coordinate Ascent (Shalev-Shwartz and Zhang)

Probabilistic Model Inference:
  Markov chain Monte Carlo and Gibbs sampling
  Expectation propagation (Minka)
  Stochastic variational inference (Hoffman et al.)

SGD variants for:
  Matrix factorization
  Learning neural networks
  Learning denoising auto-encoders

How to parallelize these algorithms?

SLIDE 11

First Attempt

After processing a subsequence of random samples...

Single-thread Algorithm: incremental update w ← w + ∆.

SLIDE 12

First Attempt

After processing a subsequence of random samples...

Single-thread Algorithm: incremental update w ← w + ∆.

Parallel Algorithm:
  Thread 1 (on 1/m of samples): w ← w + ∆1.
  Thread 2 (on 1/m of samples): w ← w + ∆2.
  ...
  Thread m (on 1/m of samples): w ← w + ∆m.

SLIDE 13

First Attempt

After processing a subsequence of random samples...

Single-thread Algorithm: incremental update w ← w + ∆.

Parallel Algorithm:
  Thread 1 (on 1/m of samples): w ← w + ∆1.
  Thread 2 (on 1/m of samples): w ← w + ∆2.
  ...
  Thread m (on 1/m of samples): w ← w + ∆m.

Aggregate parallel updates: w ← w + ∆1 + · · · + ∆m.

SLIDE 14

First Attempt

After processing a subsequence of random samples...

Single-thread Algorithm: incremental update w ← w + ∆.

Parallel Algorithm:
  Thread 1 (on 1/m of samples): w ← w + ∆1.
  Thread 2 (on 1/m of samples): w ← w + ∆2.
  ...
  Thread m (on 1/m of samples): w ← w + ∆m.

Aggregate parallel updates: w ← w + ∆1 + · · · + ∆m.

[Plot: loss function vs. running time (seconds); curves: Single-thread SGD, Parallel SGD - 64 threads]

Doesn't work for SGD!
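To make the failure concrete, here is a minimal single-machine sketch (an illustration written for this transcript, not code from the talk) of the scheme above: m simulated threads each run SGD on a shard of the data, and their deltas are summed.

  case class Sample(x: Double, y: Double)

  // One "thread": run SGD on its shard starting from w0, return its delta.
  def sgdDelta(shard: Seq[Sample], w0: Double, eta: Double): Double = {
    var w = w0
    for (s <- shard) w -= eta * s.x * (w * s.x - s.y)  // least-squares gradient step
    w - w0
  }

  // Naive parallelization: shard the data, sum the deltas.
  // (Threads are simulated sequentially; leftover samples are ignored.)
  def naiveParallelStep(data: Seq[Sample], w: Double, eta: Double, m: Int): Double = {
    val shards = data.grouped(math.max(1, data.size / m)).toSeq.take(m)
    w + shards.map(shard => sgdDelta(shard, w, eta)).sum
  }

Because all m deltas are computed from the same starting point w and then summed, the combined move is roughly m times larger than a single SGD step, which is why the 64-thread curve above diverges.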

SLIDE 15

Conflicts in Parallel Updates

Reason for failure: ∆1, . . . , ∆m simultaneously manipulate the same variable w, causing conflicts in the parallel updates.

SLIDE 16

Conflicts in Parallel Updates

Reason for failure: ∆1, . . . , ∆m simultaneously manipulate the same variable w, causing conflicts in the parallel updates.

How to resolve conflicts?

SLIDE 17

Conflicts in Parallel Updates

Reason for failure: ∆1, . . . , ∆m simultaneously manipulate the same variable w, causing conflicts in the parallel updates.

How to resolve conflicts?

1. Frequent communication between threads:
   Pros: a general approach to resolving conflicts.
   Cons: inter-node (asynchronous) communication is expensive!

SLIDE 18

Conflicts in Parallel Updates

Reason for failure: ∆1, . . . , ∆m simultaneously manipulate the same variable w, causing conflicts in the parallel updates.

How to resolve conflicts?

1. Frequent communication between threads:
   Pros: a general approach to resolving conflicts.
   Cons: inter-node (asynchronous) communication is expensive!

2. Carefully partition the data so that threads never simultaneously manipulate the same variable:
   Pros: doesn't need frequent communication.
   Cons: needs problem-specific partitioning schemes; only works for a subset of problems.

SLIDE 19

Splash: A Principled Solution

Splash is:
  A programming interface for developing stochastic algorithms.
  An execution engine for running stochastic algorithms on distributed systems.

SLIDE 20

Splash: A Principled Solution

Splash is:
  A programming interface for developing stochastic algorithms.
  An execution engine for running stochastic algorithms on distributed systems.

Features of Splash include:
  Easy Programming: users develop single-thread algorithms via Splash: no communication protocol, no conflict management, no data partitioning, no hyper-parameter tuning.

SLIDE 21

Splash: A Principled Solution

Splash is:
  A programming interface for developing stochastic algorithms.
  An execution engine for running stochastic algorithms on distributed systems.

Features of Splash include:
  Easy Programming: users develop single-thread algorithms via Splash: no communication protocol, no conflict management, no data partitioning, no hyper-parameter tuning.
  Fast Performance: Splash adopts a novel strategy for automatic parallelization with infrequent communication, so communication is no longer a performance bottleneck.

SLIDE 22

Splash: A Principled Solution

Splash is:
  A programming interface for developing stochastic algorithms.
  An execution engine for running stochastic algorithms on distributed systems.

Features of Splash include:
  Easy Programming: users develop single-thread algorithms via Splash: no communication protocol, no conflict management, no data partitioning, no hyper-parameter tuning.
  Fast Performance: Splash adopts a novel strategy for automatic parallelization with infrequent communication, so communication is no longer a performance bottleneck.
  Integration with Spark: takes RDDs as input and returns RDDs as output. Works with KeystoneML, MLlib and other data analysis tools on Spark.

SLIDE 23

Programming Interface

SLIDE 24

Programming with Splash

Splash users implement the following function:

  def process(sample: Any, weight: Int, sharedVar: VariableSet) {
    /* implement the stochastic algorithm */
  }

where
  sample — a random sample from the dataset.
  weight — the algorithm should treat the sample as if it were duplicated weight times.
  sharedVar — the set of all shared variables.

SLIDE 25

Example: SGD for Linear Regression

Goal: find w∗ = arg min_w (1/n) Σ_{i=1}^n (w·xi − yi)².

SGD update: randomly draw (xi, yi), then w ← w − η∇w(w·xi − yi)².
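For reference (a step worked out here, not shown on the slide), the per-sample gradient is ∇w(w·xi − yi)² = 2(w·xi − yi)·xi; the constant 2 is typically absorbed into the step size η, as the implementation on the next slide does.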

SLIDE 26

Example: SGD for Linear Regression

Goal: find w∗ = arg min_w (1/n) Σ_{i=1}^n (w·xi − yi)².

SGD update: randomly draw (xi, yi), then w ← w − η∇w(w·xi − yi)².

Splash implementation:

  def process(sample: Any, weight: Int, sharedVar: VariableSet) {
    val stepsize = sharedVar.get("eta") * weight
    val gradient = sample.x * (sharedVar.get("w") * sample.x - sample.y)
    sharedVar.add("w", -stepsize * gradient)
  }

SLIDE 27

Example: SGD for Linear Regression

Goal: find w∗ = arg min_w (1/n) Σ_{i=1}^n (w·xi − yi)².

SGD update: randomly draw (xi, yi), then w ← w − η∇w(w·xi − yi)².

Splash implementation:

  def process(sample: Any, weight: Int, sharedVar: VariableSet) {
    val stepsize = sharedVar.get("eta") * weight
    val gradient = sample.x * (sharedVar.get("w") * sample.x - sample.y)
    sharedVar.add("w", -stepsize * gradient)
  }

Supported operations: get, add, multiply, delayedAdd.
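As a self-contained illustration (a toy written for this transcript; in real use the variable set is supplied by the Splash runtime), the same process logic can be exercised against a map-backed stand-in:

  import scala.collection.mutable

  // Toy stand-in for Splash's shared-variable store.
  class ToyVariableSet {
    private val table = mutable.Map[String, Double]().withDefaultValue(0.0)
    def get(key: String): Double = table(key)
    def add(key: String, delta: Double): Unit = table(key) += delta
  }

  case class Sample(x: Double, y: Double)

  def process(sample: Sample, weight: Int, sharedVar: ToyVariableSet): Unit = {
    val stepsize = sharedVar.get("eta") * weight
    val gradient = sample.x * (sharedVar.get("w") * sample.x - sample.y)
    sharedVar.add("w", -stepsize * gradient)
  }

  // Sequential SGD over a toy dataset: w approaches ~2, the least-squares slope.
  val vars = new ToyVariableSet
  vars.add("eta", 0.05)
  val data = Seq(Sample(1.0, 2.0), Sample(2.0, 4.1), Sample(3.0, 5.9))
  for (_ <- 1 to 100; s <- data) process(s, weight = 1, sharedVar = vars)
  println(vars.get("w"))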

SLIDE 28

Get Operations

Get the value of a variable (Double or Array[Double]).

  get(key): returns var[key]
  getArray(key): returns varArray[key]
  getArrayElement(key, index): returns varArray[key][index]
  getArrayElements(key, indices): returns varArray[key][indices]

Array-based operations are more efficient than element-wise operations, because the key-value retrieval is executed only once for the whole array.
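A toy cost model of that difference (an illustration, not Splash's implementation): element-wise access pays one key-value lookup per element, while the array-based call pays a single lookup and then indexes the array directly.

  import scala.collection.mutable

  val store = mutable.Map("w" -> Array.fill(1000)(1.0))

  // Element-wise: 1000 hash lookups of the key "w".
  val slow = (0 until 1000).map(i => store("w")(i)).sum

  // Array-based: one hash lookup, then plain array indexing.
  val arr = store("w")
  val fast = arr.sum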

SLIDE 29

Add Operations

Add a quantity δ to a variable.

  add(key, delta): var[key] += delta
  addArray(key, deltaArray): varArray[key] += deltaArray
  addArrayElement(key, index, delta): varArray[key][index] += delta
  addArrayElements(key, indices, deltaArrayElements): varArray[key][indices] += deltaArrayElements

SLIDE 30

Multiply Operations

Multiply a variable by a quantity γ.

  multiply(key, gamma): var[key] *= gamma
  multiplyArray(key, gamma): varArray[key] *= gamma

We have optimized the implementation so that the time complexity of multiplyArray is O(1), independent of the array dimension.

SLIDE 31

Multiply Operations

Multiply a variable by a quantity γ.

  multiply(key, gamma): var[key] *= gamma
  multiplyArray(key, gamma): varArray[key] *= gamma

We have optimized the implementation so that the time complexity of multiplyArray is O(1), independent of the array dimension.

Example: SGD with sparse features and ℓ2-norm regularization.

  w ← (1 − λ)·w      (multiply operation)          (1)
  w ← w − η∇f(w)     (addArrayElements operation)  (2)

Time complexity of (1) is O(1); time complexity of (2) is O(nnz(∇f(w))).
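One plausible way to make multiplyArray O(1) (an assumption about the mechanism; the slides don't spell out Splash's internals) is to keep a lazy scale factor and fold it into reads and element-wise writes:

  // Lazily scaled array: multiplying every element is O(1); reads and
  // element-wise adds compensate for the pending scale factor.
  // (A real implementation would re-normalize occasionally to avoid underflow.)
  class ScaledArray(n: Int) {
    private val raw = new Array[Double](n)
    private var scale = 1.0
    def multiplyAll(gamma: Double): Unit = scale *= gamma      // O(1)
    def get(i: Int): Double = scale * raw(i)
    def addElement(i: Int, delta: Double): Unit = raw(i) += delta / scale
  }

Under such a scheme the shrinkage step (1) costs O(1) per sample, and only the nnz(∇f(w)) coordinates touched in step (2) cost real work.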

SLIDE 32

Delayed Add Operations

Add a quantity δ to a variable; the operation is not executed until the next time the same sample is processed by the system.

  delayedAdd(key, delta): var[key] += delta
  delayedAddArray(key, deltaArray): varArray[key] += deltaArray
  delayedAddArrayElement(key, index, delta): varArray[key][index] += delta

SLIDE 33

Delayed Add Operations

Add a quantity δ to a variable; the operation is not executed until the next time the same sample is processed by the system.

  delayedAdd(key, delta): var[key] += delta
  delayedAddArray(key, deltaArray): varArray[key] += deltaArray
  delayedAddArrayElement(key, index, delta): varArray[key][index] += delta

Example: Collapsed Gibbs Sampling for LDA — update the word-topic counter nwk when topic k is assigned to word w.

  nwk ← nwk + weight   (add operation)          (3)
  nwk ← nwk − weight   (delayed add operation)  (4)

(3) is executed instantly; (4) will be executed at the next visit, just before a new topic is sampled for the same word.
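A toy model of these semantics (written for this transcript; Splash's engine does this bookkeeping internally): each sample carries a queue of pending deltas that is flushed just before the sample is processed again.

  import scala.collection.mutable

  class ToyDelayedVars {
    val table = mutable.Map[String, Double]().withDefaultValue(0.0)
    private val pending = mutable.Map[Long, mutable.Buffer[(String, Double)]]()
    def add(key: String, delta: Double): Unit = table(key) += delta
    def delayedAdd(id: Long, key: String, delta: Double): Unit =
      pending.getOrElseUpdate(id, mutable.Buffer()) += ((key, delta))
    // The engine calls this right before sample id is processed again.
    def beginVisit(id: Long): Unit =
      pending.remove(id).foreach(_.foreach { case (k, d) => table(k) += d })
  }

  // LDA counter pattern for one token: the decrement scheduled at the previous
  // visit lands first, then the new assignment is counted.
  val vars = new ToyDelayedVars
  def visitToken(id: Long, word: String, topic: Int, weight: Int): Unit = {
    vars.beginVisit(id)
    vars.add(s"n_${word}_$topic", weight)              // (3) instant
    vars.delayedAdd(id, s"n_${word}_$topic", -weight)  // (4) deferred
  }

  visitToken(42L, "apple", 3, 1)  // n_apple_3 becomes 1
  visitToken(42L, "apple", 5, 1)  // first flushes the -1 for topic 3, then counts topic 5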

SLIDE 34

Running Stochastic Algorithm

Three simple steps:

1. Convert the RDD dataset to a Parametrized RDD:

   val paramRdd = new ParametrizedRDD(rdd)

SLIDE 35

Running Stochastic Algorithm

Three simple steps:

1. Convert the RDD dataset to a Parametrized RDD:

   val paramRdd = new ParametrizedRDD(rdd)

2. Set a function that implements the algorithm:

   paramRdd.setProcessFunction(process)

SLIDE 36

Running Stochastic Algorithm

Three simple steps:

1. Convert the RDD dataset to a Parametrized RDD:

   val paramRdd = new ParametrizedRDD(rdd)

2. Set a function that implements the algorithm:

   paramRdd.setProcessFunction(process)

3. Start running:

   paramRdd.run()
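Putting the three steps together (a sketch assuming a live SparkContext sc, the process function from the linear-regression example, and a hypothetical parseSample parser for the input lines):

  val rdd = sc.textFile("data.txt").map(parseSample)  // parseSample: hypothetical
  val paramRdd = new ParametrizedRDD(rdd)
  paramRdd.setProcessFunction(process)
  paramRdd.run()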

SLIDE 37

Execution Engine

SLIDE 38

How does Splash work?

In each iteration, the execution engine does:

1. Propose candidate degrees of parallelism m1, . . . , mk such that Σ_{i=1}^k mi = m := (# of cores). For each i ∈ [k], collect mi cores and do:

SLIDE 39

How does Splash work?

In each iteration, the execution engine does:

1. Propose candidate degrees of parallelism m1, . . . , mk such that Σ_{i=1}^k mi = m := (# of cores). For each i ∈ [k], collect mi cores and do:

   1.1 Each core gets a sub-sequence of samples (by default 1/m of the full data). The cores process their samples sequentially using the process function. Every sample is weighted by mi.

SLIDE 40

How does Splash work?

In each iteration, the execution engine does:

1. Propose candidate degrees of parallelism m1, . . . , mk such that Σ_{i=1}^k mi = m := (# of cores). For each i ∈ [k], collect mi cores and do:

   1.1 Each core gets a sub-sequence of samples (by default 1/m of the full data). The cores process their samples sequentially using the process function. Every sample is weighted by mi.

   1.2 Combine the updates of all mi cores to get the global update. There are different strategies for combining different types of updates; for add operations, the updates are averaged.

SLIDE 41

How does Splash work?

In each iteration, the execution engine does:

1. Propose candidate degrees of parallelism m1, . . . , mk such that Σ_{i=1}^k mi = m := (# of cores). For each i ∈ [k], collect mi cores and do:

   1.1 Each core gets a sub-sequence of samples (by default 1/m of the full data). The cores process their samples sequentially using the process function. Every sample is weighted by mi.

   1.2 Combine the updates of all mi cores to get the global update. There are different strategies for combining different types of updates; for add operations, the updates are averaged.

2. If k > 1, select the best mi by a parallel cross-validation procedure.

SLIDE 42

How does Splash work?

In each iteration, the execution engine does:

1. Propose candidate degrees of parallelism m1, . . . , mk such that Σ_{i=1}^k mi = m := (# of cores). For each i ∈ [k], collect mi cores and do:

   1.1 Each core gets a sub-sequence of samples (by default 1/m of the full data). The cores process their samples sequentially using the process function. Every sample is weighted by mi.

   1.2 Combine the updates of all mi cores to get the global update. There are different strategies for combining different types of updates; for add operations, the updates are averaged.

2. If k > 1, select the best mi by a parallel cross-validation procedure.

3. Broadcast the best update to all machines and apply it, then proceed to the next iteration. (The degree of parallelism doesn't decrease.)
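A minimal single-machine sketch of one such iteration (an illustration for this transcript, with the mi parallel cores simulated sequentially; the real engine runs them across Spark executors):

  case class Sample(x: Double, y: Double)

  // One "core": weighted SGD on its shard, starting from w0; returns its delta.
  def weightedDelta(shard: Seq[Sample], w0: Double, eta: Double, mi: Int): Double = {
    var w = w0
    for (s <- shard) w -= (eta * mi) * s.x * (w * s.x - s.y)  // each sample weighted by mi
    w - w0
  }

  // One iteration at degree of parallelism mi: shard, process, average.
  def engineIteration(data: Seq[Sample], w: Double, eta: Double, mi: Int): Double = {
    val shards = data.grouped(math.max(1, data.size / mi)).toSeq.take(mi)
    val deltas = shards.map(shard => weightedDelta(shard, w, eta, mi))
    w + deltas.sum / deltas.size  // add-type updates are averaged
  }

Averaging the reweighted deltas, instead of summing unit-weight deltas as in the naive scheme, keeps the combined step well-scaled.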

SLIDE 43

Example: Reweighting for SGD

[Figure: contour plots showing (a) optimal solution, (b) solution with full update, (c) local solutions with unit-weight update]

SLIDE 44

Example: Reweighting for SGD

[Figure: contour plots showing (a) optimal solution, (b) solution with full update, (c) local solutions with unit-weight update, (d) average of local solutions in (c), (e) aggregate of local solutions in (c)]

SLIDE 45

Example: Reweighting for SGD

[Figure: contour plots showing (a) optimal solution, (b) solution with full update, (c) local solutions with unit-weight update, (d) average of local solutions in (c), (e) aggregate of local solutions in (c), (f) local solutions with weighted update]

SLIDE 46

Example: Reweighting for SGD

[Figure: contour plots showing (a) optimal solution, (b) solution with full update, (c) local solutions with unit-weight update, (d) average of local solutions in (c), (e) aggregate of local solutions in (c), (f) local solutions with weighted update, (g) average of local solutions in (f)]

SLIDE 47

Experiments

SLIDE 48

Experiment Setups

System: Amazon EC2 cluster with 8 workers. Each worker has 8 Intel Xeon E5-2665 cores and 30 GB of memory, and is connected to a commodity 1Gb network.

Algorithms: SGD for logistic regression; mini-batch SGD for collaborative filtering; Gibbs sampling for topic modelling.

Datasets:
  MNIST 8M (LR): 8 million samples, 7,840 parameters.
  Netflix (CF): 100 million samples, 65 million parameters.
  NYTimes (LDA): 100 million samples, 200 million parameters.

Baseline methods: single-thread stochastic algorithms; MLlib (the official machine learning library for Spark).

SLIDE 49

Logistic Regression on MNIST Digit Recognition

[Plots: loss function vs. runtime (seconds) for Splash (SGD), Single-thread SGD, and MLlib (L-BFGS); speedup rate of Splash over single-thread SGD and over MLlib (L-BFGS) at several loss function values]

Splash converges to a good solution in a few seconds, while other methods take hundreds of seconds.
Splash is 10x - 25x faster than single-thread SGD.
Splash is 15x - 30x faster than parallelized L-BFGS.

SLIDE 50

Netflix Movie Recommendation

[Plot: prediction loss vs. runtime (seconds) for Splash (SGD), Single-thread SGD, and MLlib (ALS)]

Splash is 36x faster than parallelized Alternating Least Squares (ALS).
Splash converges to a better solution than ALS (the problem is non-convex).

SLIDE 51

Topic Modelling on New York Times Articles

[Plot: predictive log-likelihood vs. runtime (seconds) for Splash (Gibbs), Single-thread (Gibbs), and MLlib (VI)]

Splash is 3x - 6x faster than parallelized Variational Inference (VI).
Splash converges to a better solution than VI.

SLIDE 52

Runtime Analysis

[Bar chart: runtime per pass, split into computation time, waiting time, and communication time, for MNIST 8M (LR), Netflix (CF), and NYTimes (LDA)]

Waiting time is 16%, 21%, and 26% of the computation time, respectively. Communication time is 6%, 39%, and 103% of the computation time.

SLIDE 53

Machine Learning Package

SLIDE 54

Stochastic Machine Learning Library on Splash

Goal:

  Fast performance: order-of-magnitude faster than MLlib.
  Ease of use: call with one line of code.
  Integration: easy to build a data analytics pipeline.

Algorithms:

  Stochastic gradient descent.
  Stochastic matrix factorization.
  Gibbs sampling for LDA.

More algorithms will be implemented in the future...

SLIDE 55

Summary

Splash is a general-purpose programming interface for developing stochastic algorithms.
Splash is also an execution engine for automatically parallelizing stochastic algorithms.
Reweighting is the key to achieving fast performance without sacrificing communication efficiency.
We observe good empirical performance, and we have theoretical guarantees for SGD.
Splash is online at http://zhangyuc.github.io/splash/.
