SLIDE 1

Dynamic Parameter Allocation in Parameter Servers

Alexander Renz-Wieland1, Rainer Gemulla2, Steffen Zeuch1,3, Volker Markl1,3

1TU Berlin, 2University of Mannheim, 3DFKI

VLDB 2020

1 / 11

SLIDE 2

Takeaways

◮ Key challenge in distributed Machine Learning (ML): communication overhead
◮ Parameter Servers (PSs)
  ◮ Intuitive
  ◮ Limited support for common techniques to reduce overhead
◮ How to improve support?
  ◮ Dynamic parameter allocation
◮ Is this support beneficial?
  ◮ Up to two orders of magnitude faster

2 / 11

SLIDE 3

Background: Distributed Machine Learning

◮ Distributed training is a necessity for large-scale ML tasks
◮ Parameter management is a key concern
◮ Parameter servers (PSs) are widely used

[Figure: logical view (one parameter server that workers access via push() and pull()) vs. physical view (parameters partitioned across nodes, each node hosting its workers and a shard of the parameters)]

3 / 11
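The push() and pull() primitives named on this slide can be illustrated with a toy, single-process key-value store (illustrative sketch only; a real PS shards keys across server nodes and communicates over the network):

```python
import numpy as np

class ParameterServer:
    """Toy stand-in for the logical PS interface: workers pull() current
    parameter values and push() additive updates (e.g. scaled gradients)."""

    def __init__(self, num_keys, dim):
        # One dense value vector per parameter key.
        self.store = {k: np.zeros(dim) for k in range(num_keys)}

    def pull(self, keys):
        # Return copies of the current values for the requested keys.
        return [self.store[k].copy() for k in keys]

    def push(self, keys, updates):
        # Apply additive updates to the requested keys.
        for k, u in zip(keys, updates):
            self.store[k] += u

ps = ParameterServer(num_keys=4, dim=2)
ps.push([0], [np.array([1.0, -1.0])])
print(ps.pull([0])[0])  # [ 1. -1.]
```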

SLIDE 4

Problem: Communication Overhead

◮ Communication overhead can limit scalability
◮ Performance can fall behind a single node

Training knowledge graph embeddings (RESCAL, dimension 100):

[Figure: epoch run time in minutes vs. parallelism (nodes x threads, 1x4 to 8x4) for the Classic PS (PS-Lite); labeled epoch times 4.5h, 4h, 2.4h, and 1.5h]
4 / 11

SLIDE 5

Problem: Communication Overhead

◮ Communication overhead can limit scalability
◮ Performance can fall behind a single node

Training knowledge graph embeddings (RESCAL, dimension 100):

[Figure: epoch run time in minutes vs. parallelism (nodes x threads, 1x4 to 8x4); same plot with an added series for the Classic PS with fast local access (labeled epoch time 1.2h) alongside the Classic PS (PS-Lite; 4.5h, 4h, 2.4h, 1.5h)]

4 / 11

SLIDE 6

Problem: Communication Overhead

◮ Communication overhead can limit scalability
◮ Performance can fall behind a single node

Training knowledge graph embeddings (RESCAL, dimension 100):

[Figure: epoch run time in minutes vs. parallelism (nodes x threads, 1x4 to 8x4); labeled epoch times include 4.5h, 4h, 2.4h, 1.5h (Classic PS, PS-Lite), 1.2h (Classic PS with fast local access), and 0.6h, 0.4h, 0.2h (Dynamic Allocation PS (Lapse), incl. fast local access)]

4 / 11

SLIDE 7

How to reduce communication overhead?

◮ Common techniques to reduce overhead:
  ◮ Data clustering
  ◮ Parameter blocking
  ◮ Latency hiding
◮ Key is to avoid remote accesses
◮ Do PSs support these techniques?
  ◮ Techniques require local access at different nodes over time
  ◮ But PSs allocate parameters statically

[Figure: for each technique, which data and parameters worker 1 and worker 2 access over time]

5 / 11
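Of the three techniques, latency hiding is the easiest to sketch in code: overlap the (slow, remote) pull for the next batch with computation on the current one. A minimal sketch, with a sleep standing in for network latency (names and structure are illustrative, not any particular PS's API):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def pull(keys):
    # Hypothetical remote pull; the sleep stands in for network latency.
    time.sleep(0.01)
    return {k: 0.0 for k in keys}

def train(batches):
    # Latency hiding: issue the pull for batch i+1 while batch i computes,
    # so the network round trip overlaps with useful work.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(pull, batches[0])   # prefetch first batch
        for i, keys in enumerate(batches):
            params = pending.result()             # wait for prefetched params
            if i + 1 < len(batches):
                pending = pool.submit(pull, batches[i + 1])  # prefetch next
            # ... compute gradient updates using `params` here ...
    return params
```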


SLIDE 9

Dynamic Parameter Allocation

◮ What if the PS could allocate parameters dynamically?

Localize(parameters)

◮ Would provide support for
  ◮ Data clustering
  ◮ Parameter blocking
  ◮ Latency hiding
◮ We call this dynamic parameter allocation

6 / 11
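The idea behind the Localize primitive can be sketched as moving a parameter's ownership to the calling node so that subsequent accesses are fast local ones. A simplified, single-process sketch (the class and method names are illustrative; Lapse's real implementation also migrates values and keeps concurrent operations sequentially consistent):

```python
class DynamicAllocationPS:
    """Sketch of dynamic parameter allocation: each key has a current owner
    node, and localize() relocates keys to the requesting node."""

    def __init__(self, num_nodes, keys):
        # Initial static partitioning of keys across nodes (a common default).
        self.owner = {k: k % num_nodes for k in keys}
        self.value = {k: 0.0 for k in keys}

    def localize(self, node, keys):
        # Relocate ownership to `node`; afterwards, that node's accesses to
        # these keys no longer cross the network.
        for k in keys:
            self.owner[k] = node

    def pull(self, node, key):
        is_local = self.owner[key] == node  # local accesses avoid the network
        return self.value[key], is_local

ps = DynamicAllocationPS(num_nodes=2, keys=range(4))
print(ps.pull(0, 1))   # (0.0, False): key 1 starts on node 1, so remote
ps.localize(0, [1])
print(ps.pull(0, 1))   # (0.0, True): now a fast local access
```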

SLIDE 10

The Lapse Parameter Server

◮ Features
  ◮ Dynamic allocation
  ◮ Location transparency
  ◮ Retains sequential consistency

[Table: per-key PS consistency guarantees for synchronous operations (PRAM, Causal, Sequential, Serializability) across Classic, Lapse, Stale, and Eventual PSs; Classic PSs and Lapse provide sequential consistency, and no PS provides serializability]

◮ Many system challenges (see paper)
  ◮ Manage parameter locations
  ◮ Route parameter accesses to current location
  ◮ Relocate parameters
  ◮ Handle reads and writes during relocations
  ◮ All while maintaining sequential consistency

7 / 11
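The first two challenges (managing locations and routing accesses) can be illustrated with a common design: give each key a fixed, computable home node that tracks the key's current owner, so any node can reach the owner in at most one extra lookup hop. This is a hypothetical sketch under that assumption, not a description of Lapse's actual internals:

```python
class LocationRouter:
    """Illustrative location management: a static home node per key records
    the key's current owner; relocations update only the home node's entry."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.current_owner = {}           # entries maintained at home nodes

    def home(self, key):
        # Static hash partitioning: every node can compute the home node.
        return key % self.num_nodes

    def route(self, key):
        # Ask the home node for the current owner; fall back to the home
        # node itself if the key was never relocated.
        return self.current_owner.get(key, self.home(key))

    def relocate(self, key, new_owner):
        # Record the new owner at the key's home node.
        self.current_owner[key] = new_owner

r = LocationRouter(num_nodes=4)
assert r.route(5) == 1    # never relocated: routed to home node 5 % 4
r.relocate(5, 3)
assert r.route(5) == 3    # routed to the new owner after relocation
```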

SLIDE 11

Experimental study

Tasks: matrix factorization, knowledge graph embeddings, word vectors
Cluster: 1–8 nodes, each with 4 worker threads, 10 GBit Ethernet

1. Performance of Classic PSs
  ◮ 2–8 nodes barely outperformed 1 node in all tested tasks
2. Effect of dynamic parameter allocation
  ◮ 4–203x faster than Classic PSs, up to linear speed-ups
3. Comparison to bounded staleness PSs
  ◮ 2–28x faster and more scalable
4. Comparison to manual management
  ◮ Competitive to a specialized low-level implementation
5. Ablation study
  ◮ Combining fast local access and dynamic allocation is key

8 / 11


SLIDE 12

Comparison to Bounded Staleness PS

◮ Matrix factorization (matrix with 1b entries, rank 100)
◮ Parameter blocking

[Figure: epoch run time in minutes (10–40) vs. parallelism (nodes x threads, 1x4 to 8x4) for the bounded staleness PS (Petuum) with client synchronization, the bounded staleness PS (Petuum) with server synchronization, and the Dynamic Allocation PS (Lapse); annotations note single-node overhead, non-linear scaling, and speed-ups of 0.6x, 2.9x, and 8.4x]

9 / 11



SLIDE 15

Dynamic Parameter Allocation in Parameter Servers

◮ Key challenge in distributed Machine Learning (ML): communication overhead
◮ Parameter Servers (PSs)
  ◮ Intuitive
  ◮ Limited support for common techniques to reduce overhead
◮ How to improve support?
  ◮ Dynamic parameter allocation
◮ Is this support beneficial?
  ◮ Up to two orders of magnitude faster

◮ Lapse is open source: https://github.com/alexrenz/lapse-ps

11 / 11