ICML 2019
Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication
Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi. EPFL, Switzerland. mlo.epfl.ch. June 11, 2019.
Problem setup:

min_{x ∈ ℝᵈ}  f(x) = (1/n) Σᵢ₌₁ⁿ fᵢ(x)

Each function fᵢ is held by one device (a node of the communication graph); the edges of the graph are the communication links. Each device has oracle access to stochastic gradients gᵢ(x) with E[gᵢ(x)] = ∇fᵢ(x) and Var[gᵢ(x)] ≤ σᵢ².
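As a concrete illustration of this setup, here is a minimal numpy sketch (the quadratic local objectives and all names, e.g. grad_fi and stochastic_grad, are illustrative assumptions, not from the talk): each node i holds its own data and an oracle gᵢ whose minibatch gradients are unbiased estimates of ∇fᵢ.

```python
import numpy as np

# Hypothetical toy instance: node i holds a local least-squares objective
# f_i(x) = (1/2m) ||A_i x - b_i||^2 over its m local samples.
rng = np.random.default_rng(0)
d, m, n = 10, 50, 4
A = [rng.standard_normal((m, d)) for _ in range(n)]
b = [rng.standard_normal(m) for _ in range(n)]

def grad_fi(i, x):
    """Full local gradient ∇f_i(x)."""
    return A[i].T @ (A[i] @ x - b[i]) / m

def stochastic_grad(i, x, batch=5):
    """Stochastic oracle g_i(x): a minibatch gradient, unbiased
    (E[g_i(x)] = ∇f_i(x)) with variance bounded by some σ_i²."""
    idx = rng.choice(m, size=batch, replace=False)
    return A[i][idx].T @ (A[i][idx] @ x - b[i][idx]) / batch
```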
Applications: servers, mobile devices, sensors, hospitals, ...
Advantages: data stays on the devices, and no central coordinator is needed.
This work: the bandwidth-restricted setting, where communication is the bottleneck.
Communication compression: compress models / model updates before sending them over the network. This work supports arbitrary compressors, covering the main state-of-the-art techniques.

General compressor: Q: ℝᵈ → ℝᵈ (can be biased!) satisfying, for some δ > 0,

E_Q ‖x − Q(x)‖² ≤ (1 − δ) ‖x‖²   ∀x ∈ ℝᵈ

Examples: quantization, rounding, sign, top-k, rank-k.
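A minimal numpy sketch of two compressors of this kind (function names are illustrative). Both are biased: top-k satisfies the contraction property deterministically with δ = k/d, and the ℓ₁-rescaled sign compressor satisfies it with δ ≥ 1/d.

```python
import numpy as np

def top_k(x, k):
    """Biased top-k sparsification: keep the k largest-magnitude entries.
    Satisfies ||x - Q(x)||^2 <= (1 - k/d) ||x||^2, i.e. delta = k/d."""
    q = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    q[idx] = x[idx]
    return q

def rescaled_sign(x):
    """Biased sign compression, rescaled by ||x||_1 / d.
    One checks ||x - Q(x)||^2 = ||x||^2 - ||x||_1^2/d <= (1 - 1/d)||x||^2."""
    d = x.size
    return (np.linalg.norm(x, 1) / d) * np.sign(x)
```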
We propose CHOCO-SGD: a decentralized SGD algorithm with communication compression.

Main result: for µ-strongly convex objectives, CHOCO-SGD converges at the rate

f(x̄_T) − f⋆ = O( σ̄² / (µnT) + 1 / (µ²δ²ρ⁴T²) )

where σ̄² = (1/n) Σᵢ σᵢ² and ρ > 0 is the spectral gap of the topology. The first term shows linear speedup in n and matches the centralized baseline; the second, higher-order term depends on the topology (through ρ) and the compression quality (through δ).
We propose CHOCO-Gossip: a new algorithm with communication compression for the average consensus problem of computing x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ. It combines classic gossip averaging [Xiao & Boyd, 04] with compression with error feedback [Stich et al., NeurIPS 18]. CHOCO-Gossip converges linearly for arbitrary compressors, whereas naively compressed gossip converges only to a neighborhood of the average (or not at all). A minimal sketch of the update is below.
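Here is a minimal numpy simulation of a CHOCO-Gossip-style iteration, assuming the error-feedback mechanism described in the paper: every node i maintains a public estimate x̂ᵢ of its value, transmits only the compressed difference Q(xᵢ − x̂ᵢ), and the gossip averaging step acts on the public estimates. All names (choco_gossip, x_hat, gamma) are illustrative.

```python
import numpy as np

def choco_gossip(x0, W, Q, gamma, steps):
    """Simulate a CHOCO-Gossip-style consensus iteration.
    x0: (n, d) initial node values; W: (n, n) symmetric, doubly
    stochastic mixing matrix of the graph; Q: compression operator;
    gamma: consensus step size."""
    x = x0.copy()
    x_hat = np.zeros_like(x)  # publicly known compressed copies
    n = x.shape[0]
    for _ in range(steps):
        # each node compresses only the change w.r.t. its public copy
        # (the error-feedback part: compression error is never discarded)
        q = np.stack([Q(x[i] - x_hat[i]) for i in range(n)])
        x_hat += q  # every node updates the public copies it stores
        # gossip step on the public copies:
        # x_i += gamma * sum_j W_ij (x_hat_j - x_hat_i)
        x += gamma * (W - np.eye(n)) @ x_hat
    return x
```

As a sanity check of the sketch: with Q the identity and γ = 1, the update reduces to classic gossip averaging x ← Wx.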
Example: quantization to 4 bits. [Figure: convergence plotted against epochs and against transmitted data.] Logistic regression on the epsilon dataset, ring topology with n = 9 nodes.
+ Compression with error feedback gives a drastic reduction in communication without hurting convergence.
+ First compressed gossip scheme that converges at a linear rate.
+ First decentralized SGD with compressed communication that converges for arbitrary compressors (without hampering the rate).
Compression for free, by enabling error feedback in the decentralized setting. (A sketch of the full CHOCO-SGD loop is below.)
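Combining the stochastic gradient oracle with the compressed gossip step gives a CHOCO-SGD-style loop. This is a minimal sketch under the same assumptions as the gossip sketch above; grad(i, x) stands for the oracle gᵢ(x), and all names are illustrative.

```python
import numpy as np

def choco_sgd(x0, W, Q, grad, eta, gamma, steps):
    """Sketch of a CHOCO-SGD-style loop: a local stochastic gradient
    step on every node, followed by a compressed gossip step with
    error feedback (as in the CHOCO-Gossip sketch above)."""
    x = x0.copy()
    x_hat = np.zeros_like(x)
    n = x.shape[0]
    for _ in range(steps):
        for i in range(n):          # local SGD step on each node
            x[i] -= eta * grad(i, x[i])
        # compressed gossip step on the public copies
        q = np.stack([Q(x[i] - x_hat[i]) for i in range(n)])
        x_hat += q
        x += gamma * (W - np.eye(n)) @ x_hat
    return x.mean(axis=0)           # average model over the nodes
```

For instance, choco_sgd(x0, W, lambda v: top_k(v, 3), stochastic_grad, eta=0.01, gamma=0.1, steps=500) combines the earlier sketches on a given mixing matrix W.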