Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication

Feb 15, 2023

ICML 2019
Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication
Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi
EPFL, Switzerland
mlo.epfl.ch
June 11, 2019
S. U. Stich CHOCO-SGD 1

Decentralized Stochastic Optimization
$$\min_{x \in \mathbb{R}^d} \Big[ f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x) \Big]$$
[Figure: network graph; nodes are devices holding local objectives $f_i(x)$, $f_j(x)$, edges are communication links.]
Each device has oracle access to stochastic gradients $g_i(x)$ with $\mathbb{E}\, g_i(x) = \nabla f_i(x)$ and $\mathrm{Var}[g_i] \le \sigma_i^2$.
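
To make the setting concrete, here is a minimal Python sketch of the objective and the per-device gradient oracles. It assumes toy quadratic local objectives $f_i(x) = \frac{1}{2}\|x - b_i\|^2$ and Gaussian noise; these choices, and the names `make_local_oracles`, `global_objective` and `targets`, are illustrative assumptions and do not come from the slides.

```python
import numpy as np

def make_local_oracles(targets, sigma, rng):
    """Return one stochastic gradient oracle g_i per device.

    Assumption (not from the slides): f_i(x) = 0.5 * ||x - targets[i]||^2,
    so grad f_i(x) = x - targets[i]. Zero-mean Gaussian noise keeps the
    oracle unbiased: E[g_i(x)] = grad f_i(x).
    """
    def make(i):
        def g_i(x):
            noise = sigma * rng.standard_normal(x.shape)
            return (x - targets[i]) + noise
        return g_i
    return [make(i) for i in range(targets.shape[0])]

def global_objective(x, targets):
    """f(x) = (1/n) * sum_i f_i(x) for the toy quadratics above."""
    return 0.5 * np.mean(np.sum((x - targets) ** 2, axis=1))

rng = np.random.default_rng(0)
targets = rng.standard_normal((9, 5))      # n = 9 devices, d = 5
oracles = make_local_oracles(targets, sigma=0.1, rng=rng)
x_star = targets.mean(axis=0)              # minimizer of f for these quadratics
```

For this toy model the noise has total variance $d\,\sigma^2$ per oracle call, which plays the role of $\sigma_i^2$ in the slide.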

Decentralized Stochastic Optimization
Applications: servers, mobile devices, sensors, hospitals, ...
Advantages:
• no central coordinator
• local communication vs. all-reduce
• data distributed (storage & privacy aspects)
This work: bandwidth-restricted setting where communication is a bottleneck

Data Compression for Efficient Communication
Communication Compression: compress models/model updates before sending them over the network.
This work: arbitrary compressors, supporting the main SOTA techniques!
General Compressor: $Q : \mathbb{R}^d \to \mathbb{R}^d$ (can be biased!) with
$$\mathbb{E}_Q \|x - Q(x)\|^2 \le (1 - \delta)\,\|x\|^2 \quad \forall x \in \mathbb{R}^d$$
Examples: quantization, rounding, sign, top-$k$, rank-$k$
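
The compressor condition can be checked numerically. The sketch below implements two of the listed examples, top-$k$ and a sign compressor, and measures the ratio $\|x - Q(x)\|^2 / \|x\|^2$. The function names and the scaling of the sign compressor by $\|x\|_1 / d$ are assumptions chosen for illustration. Top-$k$ satisfies the condition with $\delta = k/d$, and this scaled sign variant with $\delta \ge 1/d$.

```python
import numpy as np

def top_k(x, k):
    """Keep the k largest-magnitude entries of x, zero the rest (biased)."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def scaled_sign(x):
    """Sign compressor rescaled by the mean absolute value (an assumed variant)."""
    return (np.sum(np.abs(x)) / x.size) * np.sign(x)

def compression_error_ratio(x, qx):
    """Return ||x - Q(x)||^2 / ||x||^2, which must be <= 1 - delta."""
    return np.sum((x - qx) ** 2) / np.sum(x ** 2)

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
print("top-10 ratio:", compression_error_ratio(x, top_k(x, 10)))
print("sign ratio:  ", compression_error_ratio(x, scaled_sign(x)))
```

Both compressors are deterministic, so the expectation over $Q$ in the condition is trivial here; a randomized quantizer would need the ratio averaged over draws.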

Main Contribution: CHOCO-SGD
We propose CHOCO-SGD: a decentralized SGD algorithm with communication compression.
Main result: CHOCO-SGD converges at the rate
$$f(\bar{x}_T) - f^\star = O\Big( \frac{\bar\sigma^2}{\mu n T} + \frac{1}{\mu^2 \delta^2 \rho^4 T^2} \Big)$$
First term: linear speedup, matches centralized baseline. Second term: higher-order term, accounting for topology and compression.
Here $f$ is $\mu$-strongly convex, the variance is $\bar\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \sigma_i^2$, and the spectral gap of the topology is $\rho > 0$.
• first scheme with linear speedup for arbitrary compressors
• improves over previous approach [Tang et al., NeurIPS 18]
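
As a rough worked reading of this rate, one can ask when the first term dominates the second. The derivation below ignores the constants hidden in $O(\cdot)$ and assumes $\bar\sigma^2 > 0$, so it is an order-of-magnitude sketch rather than a statement from the slides.

```latex
\frac{\bar\sigma^2}{\mu n T} \;\ge\; \frac{1}{\mu^2 \delta^2 \rho^4 T^2}
\quad\Longleftrightarrow\quad
T \;\ge\; \frac{n}{\mu\, \bar\sigma^2\, \delta^2 \rho^4}
```

Past this number of iterations the bound behaves like $\bar\sigma^2 / (\mu n T)$, so compression quality $\delta$ and spectral gap $\rho$ only affect how long the higher-order phase lasts.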

Key Technique: CHOCO-Gossip
We propose CHOCO-Gossip: a new algorithm with communication compression for the average consensus problem:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
classic gossip averaging [Xiao & Boyd, 04] + compression with error feedback [Stich et al., NeurIPS 18]
• linear convergence for arbitrary compressors
• all previous gossip schemes with compression did not converge linearly (or not at all) for arbitrary compressors
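
The slide does not spell out the update rule, so the following is only a sketch in the spirit of combining gossip averaging with compressed, error-compensated messages; it should not be read as the authors' exact algorithm. Each node keeps a publicly known estimate `X_hat[i]` of its own value, transmits only the compressed difference to that estimate, and takes a gossip step on the estimates. The ring weights of 1/3, the step size `gamma`, and all function names are assumptions.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic ring matrix (n >= 3): weight 1/3 on self and both neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def top_k(x, k):
    """Keep the k largest-magnitude entries of x, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def consensus_error(X):
    """Frobenius distance of the node values (rows of X) from their average."""
    return np.linalg.norm(X - X.mean(axis=0))

def compressed_gossip(X0, W, compress, gamma, steps):
    """Gossip on compressed estimates; rows of X0 are the nodes' initial vectors."""
    n = X0.shape[0]
    X = X0.copy()
    X_hat = np.zeros_like(X)          # estimates known to the neighbors
    for _ in range(steps):
        # Each node sends only the compressed gap to its public estimate.
        Q = np.stack([compress(row) for row in X - X_hat])
        X_hat = X_hat + Q
        # Gossip step on the estimates; columns of W - I sum to zero,
        # so the average of the rows of X is preserved exactly.
        X = X + gamma * (W - np.eye(n)) @ X_hat
    return X

rng = np.random.default_rng(0)
X0 = rng.standard_normal((9, 20))     # n = 9 nodes on a ring, d = 20
W = ring_mixing_matrix(9)
X_topk = compressed_gossip(X0, W, lambda v: top_k(v, 5), gamma=0.1, steps=50)
print(consensus_error(X0), consensus_error(X_topk))
```

With the identity compressor and `gamma=1.0` the loop reduces to the classic update `X = W @ X`, which makes the sketch easy to sanity-check against uncompressed gossip.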

Experimental Results
Example: quantization to 4 bits
[Figure: plots with axis labels "epochs" and "transmitted data".]
Logistic regression on epsilon dataset, ring topology with n = 9 nodes.
