cpSGD: communication-efficient and differentially-private distributed SGD

Aug 26, 2023

SLIDE 1

cpSGD: communication-efficient and differentially-private distributed SGD

Naman Agarwal, Ananda Theertha Suresh, Felix X. Yu Sanjiv Kumar, H. Brendan McMahan

SLIDE 2

Distributed learning with mobile devices

Train a centralized model; data stays on mobile phones. In each iteration...

SLIDE 3

Server sends model to clients...

w ∈ R^d: the model vector

SLIDE 4

Clients send updates back...

w - learning_rate · ∑_i δw_i / n

n: number of clients
δw_i: gradient of the i-th client
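One round of this loop can be sketched in a few lines of Python. This is only an illustration: the quadratic client objective, the helper names `local_gradient` and `server_step`, and the toy data are assumptions made for the example and do not come from the slides.

```python
import random

def server_step(w, client_grads, learning_rate):
    """One round: average the clients' gradients and take a gradient step."""
    n = len(client_grads)
    d = len(w)
    # Coordinate-wise mean of the client updates: (1/n) * sum_i dw_i
    mean_grad = [sum(g[j] for g in client_grads) / n for j in range(d)]
    return [w[j] - learning_rate * mean_grad[j] for j in range(d)]

def local_gradient(w, data):
    """Hypothetical client objective: mean squared distance to the local points."""
    d = len(w)
    return [sum(2.0 * (w[j] - x[j]) for x in data) / len(data) for j in range(d)]

random.seed(0)
# 4 clients, each holding 5 private 3-dimensional points centred near 1.0
clients = [[[random.gauss(1.0, 0.1) for _ in range(3)] for _ in range(5)]
           for _ in range(4)]

w = [0.0, 0.0, 0.0]
for _ in range(50):
    grads = [local_gradient(w, data) for data in clients]  # computed on-device
    w = server_step(w, grads, learning_rate=0.1)           # only gradients are sent
```

Only the gradients leave the clients; the raw points in `clients` are never sent to the server.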

SLIDE 5

Challenge I: uplink communication is expensive

w - learning_rate · ∑_i Q(δw_i) / n

Q: quantization

SLIDE 6

How to design the quantization?

Convergence of SGD depends on the MSE of the estimated gradient.
Sufficient to study:

bits vs. quantization error in distributed mean estimation.

○ No compression (float): 32 bits per coordinate; 0 MSE.
○ Binary quantization: 1 bit; O(d/n) MSE.
○ Variable length coding: O(1/n) MSE.
○ [Suresh et al., 17] [Alistarh et al., 17] [Wen et al., 17] [Bernstein et al., 18]
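To make the 1-bit entry concrete, here is a minimal sketch of stochastic binary quantization, assuming the common scheme in which each coordinate is rounded at random to the vector's minimum or maximum so that the quantizer is unbiased. The function names and the toy experiment are illustrative assumptions, and the sketch does not try to reproduce the constants behind the O(d/n) bound.

```python
import random

def binary_quantize(x, rng):
    """Stochastically round each coordinate to min(x) or max(x) (1 bit each).

    The two floats min(x) and max(x) must also be sent as side information.
    """
    lo, hi = min(x), max(x)
    if hi == lo:
        return [lo] * len(x)
    # P[hi] = (x_j - lo) / (hi - lo) makes the quantizer unbiased: E[Q(x)] = x
    return [hi if rng.random() < (xj - lo) / (hi - lo) else lo for xj in x]

def mean_estimate(vectors, rng):
    """Server side: average the quantized client vectors."""
    n, d = len(vectors), len(vectors[0])
    quantized = [binary_quantize(v, rng) for v in vectors]
    return [sum(q[j] for q in quantized) / n for j in range(d)]

def empirical_mse(n, d, rng, trials=100):
    """Average squared L2 error of the estimated mean over repeated quantization."""
    vectors = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
    true_mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    total = 0.0
    for _ in range(trials):
        est = mean_estimate(vectors, rng)
        total += sum((e - t) ** 2 for e, t in zip(est, true_mean))
    return total / trials

rng = random.Random(0)
mse_small_n = empirical_mse(n=5, d=16, rng=rng)
mse_large_n = empirical_mse(n=200, d=16, rng=rng)
```

With the dimension held fixed, the error measured for 200 clients should be much smaller than for 5 clients, in line with the 1/n dependence listed above.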

SLIDE 7

Challenge II: user privacy is important

Differential privacy (DP)

○ Removing or changing a single client's data should not result in a big difference in the estimated mean.
○ Adding Gaussian noise [Abadi et al., 16]

Both communication efficiency and differential privacy

Goal of this paper

SLIDE 8

Attempt 1: add Gaussian noise on the server

∑_i Q(x_i)/n + Gaussian noise

DP results readily available

○ Assuming L2 norm of the gradient is bounded (gradient clipping).

Server has to be trustworthy.
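A rough sketch of this server-side approach follows, with the quantization Q left out for brevity. The names `clip_l2` and `noisy_mean` are made up for the example, and `clip_norm` and `sigma` are free parameters here; the sketch does not calibrate `sigma` to any particular privacy budget.

```python
import math
import random

def clip_l2(x, clip_norm):
    """Scale x down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in x))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in x]

def noisy_mean(client_vectors, clip_norm, sigma, rng):
    """Server side: average the clipped vectors, then add Gaussian noise."""
    n, d = len(client_vectors), len(client_vectors[0])
    clipped = [clip_l2(x, clip_norm) for x in client_vectors]
    mean = [sum(x[j] for x in clipped) / n for j in range(d)]
    # Replacing one client moves the mean by at most 2 * clip_norm / n in L2
    # norm, which is the sensitivity that the noise scale has to cover.
    return [m + rng.gauss(0.0, sigma) for m in mean]

rng = random.Random(0)
vectors = [[rng.uniform(-2.0, 2.0) for _ in range(4)] for _ in range(100)]
private_mean = noisy_mean(vectors, clip_norm=1.0, sigma=0.05, rng=rng)
```

The server sees every clipped vector before the noise is added, which is why this approach needs a trustworthy server.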

SLIDE 9

Attempt 2: add Gaussian noise on the client

∑_i Q(x_i)/n

After quantization: no communication efficiency.
Before quantization: hard to analyze.

SLIDE 10

cpSGD: add binomial noise after quantization

∑_i Q(x_i)/n

SLIDE 11

cpSGD

Maintains communication efficiency

○ Binomial is discrete.

Differentially private

○ Binomial is similar to Gaussian.
○ Extended to d dimensions with an improved bound.

Works if server is negligent but not malicious
Works even if clients do not trust the server

○ Secure aggregation.
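The idea of adding binomial noise after quantization can be sketched as below. This is a simplified illustration, not the paper's exact mechanism: the parameters `x_max`, `k`, `m`, `p` and the functions `encode` and `decode` are assumptions chosen for the example, no privacy calibration is attempted, and secure aggregation is not modeled.

```python
import math
import random

def encode(x, x_max, k, m, p, rng):
    """Client side: k-level stochastic rounding, then add Binomial(m, p) noise."""
    out = []
    for v in x:
        v = max(-x_max, min(x_max, v))                   # clip to [-x_max, x_max]
        u = (v + x_max) / (2 * x_max) * (k - 1)          # position on the integer grid
        level = math.floor(u)
        level += rng.random() < (u - level)              # unbiased rounding up
        noise = sum(rng.random() < p for _ in range(m))  # Binomial(m, p) draw
        out.append(level + noise)                        # integer in [0, k - 1 + m]
    return out

def decode(messages, x_max, k, m, p):
    """Server side: sum the integers, remove the noise mean, rescale."""
    n, d = len(messages), len(messages[0])
    step = 2 * x_max / (k - 1)
    est = []
    for j in range(d):
        total = sum(msg[j] for msg in messages)          # only the sum is needed
        est.append(-x_max + step * (total - n * m * p) / n)
    return est

rng = random.Random(0)
n, d, x_max, k, m, p = 500, 4, 1.0, 16, 16, 0.5
vectors = [[rng.uniform(-0.5, 0.9) for _ in range(d)] for _ in range(n)]
true_mean = [sum(v[j] for v in vectors) / n for j in range(d)]

messages = [encode(v, x_max, k, m, p, rng) for v in vectors]
est = decode(messages, x_max, k, m, p)
bits_per_coordinate = math.ceil(math.log2(k + m))        # each message entry is one of k + m values
```

Because each message is a small integer, the noise does not undo the savings from quantization, and because the server only needs the sum of the messages, the decoding step fits a setting where the sum is computed with secure aggregation.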

SLIDE 12

Tue Dec 4th 05:00 -- 07:00 PM Room 210 & 230 AB #27

For d variables and n ≈ d clients, cpSGD uses

O(log log(nd)) bits of communication per client per coordinate
Constant privacy