Distributed Learning over Unreliable Networks
Chen Yu, Hanlin Tang, Cedric Renggli, Simon Kassing, Ankit Singla, Dan Alistarh, Ce Zhang, Ji Liu
Presenter: Chen Yu
[Diagram: servers (Server 1, Server 2) exchanging gradients with Workers 1–3]
AllReduce SGD

    x_{t+1} = x_t - \frac{1}{n} \sum_{i=1}^{n} \nabla F(x_t; \xi_t^{(i)})
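As a concrete sketch (the function name and the explicit learning rate `lr` are assumptions; the slide folds the step size into the update), one AllReduce SGD step averages the workers' stochastic gradients and applies the same update on every worker:

```python
import numpy as np

def allreduce_sgd_step(xs, grads, lr=0.1):
    """One AllReduce SGD step: all n workers average their stochastic
    gradients, then every worker applies the identical update
    x_{t+1} = x_t - lr * (1/n) * sum_i grad_i."""
    avg_grad = np.mean(grads, axis=0)   # the AllReduce: sum over workers / n
    return [x - lr * avg_grad for x in xs]
```

Because every worker sees the same averaged gradient, all model replicas stay identical; this is exactly what breaks once some gradient packets never arrive.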
Unreliable Network

[Diagram: AllReduce between servers (Server 1, Server 2) and Workers 1–3 over an unreliable network]
[Figure: training loss vs. epoch for AllReduce SGD at 95% and 99% packet arrival rates vs. baseline]
Sharing Gradients Won’t Work
Local Update:

    v_t^{(i)} = x_t^{(i)} - \gamma g_t^{(i)}

Local Partition:

    v_t^{(i)} = \big( (v_t^{(i,1)})^\top, (v_t^{(i,2)})^\top, \cdots, (v_t^{(i,n)})^\top \big)^\top

Robust Averaging:

    \tilde{v}_t^{(j)} = \frac{1}{|\mathcal{O}_t^{(j)}|} \sum_{i \in \mathcal{O}_t^{(j)}} v_t^{(i,j)}

Model Update:

    x_{t+1}^{(i,j)} = \begin{cases} \tilde{v}_t^{(j)} & j \in \tilde{\mathcal{O}}_t^{(i)} \\ v_t^{(i,j)} & j \notin \tilde{\mathcal{O}}_t^{(i)} \end{cases}
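The steps above (local update, partition, robust averaging, model update) can be simulated end to end. This is a sketch under assumed names (`rps_step`, with `arrival` = 1 − p as the per-packet arrival probability), not the paper's implementation:

```python
import numpy as np

def rps_step(xs, grads, lr, arrival, rng):
    """One sketched step: each worker takes a local SGD step, partitions
    its model into n blocks, block j is averaged at worker j over the
    copies that actually arrive, and dropped blocks fall back to the
    sender's local copy."""
    n = len(xs)
    vs = [x - lr * g for x, g in zip(xs, grads)]        # local update
    blocks = [np.array_split(v, n) for v in vs]         # local partition
    # robust averaging: block j averaged over copies that reach worker j
    avg = [np.mean([blocks[i][j] for i in range(n)
                    if i == j or rng.random() < arrival], axis=0)
           for j in range(n)]
    # model update: use the averaged block j if it arrives back at
    # worker i, otherwise keep the local copy v_t^{(i,j)}
    return [np.concatenate([avg[j] if (i == j or rng.random() < arrival)
                            else blocks[i][j] for j in range(n)])
            for i in range(n)]
```

With `arrival = 1.0` this reduces exactly to AllReduce SGD; under drops, each worker's model mixes averaged and local blocks, which is what the convergence result has to control.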
Convergence Rate:

    \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\,\|\nabla f(x_t)\|^2 \lesssim \frac{(\sigma + \zeta)\,(1 + p(1-p))}{(1-p)\sqrt{nT}} + \frac{1}{T}
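Reading the p-dependent factor (1 + p(1 − p))/(1 − p) off the bound at face value, a quick numeric check (hypothetical helper name) shows how gently the leading term degrades at small drop rates:

```python
def degradation_factor(p):
    """Multiplier on the (sigma + zeta)/sqrt(nT) leading term as a
    function of the packet drop rate p, per the stated bound."""
    return (1 + p * (1 - p)) / (1 - p)

for p in (0.0, 0.01, 0.05, 0.5):
    print(f"p = {p:4.2f}: factor = {degradation_factor(p):.3f}")
```

For example, p = 0.01 inflates the constant by only about 2%, consistent with the experiments where training at 99% arrival tracks the baseline.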
Assumptions:

- p: packet drop rate; T: total iterations
- f is non-convex with an L-Lipschitz gradient
- Bounded data variance: \mathbb{E}_{\xi \sim \mathcal{D}_i} \|\nabla F(x; \xi) - \nabla f_i(x)\|^2 \le \sigma^2, \ \forall i, \forall x
- Bounded dataset difference: \frac{1}{n} \sum_{i=1}^{n} \|\nabla f_i(x) - \nabla f(x)\|^2 \le \zeta^2, \ \forall x
Experiments: 16 NVIDIA TITAN Xp GPUs, ResNet-110 on CIFAR-10

RPS is Robust; Standard SGD is Vulnerable
[Figure: training loss vs. epoch for RPS at 60%, 80%, 90%, 95%, and 99% arrival rates vs. baseline]
[Figure: training loss vs. epoch for standard SGD at 95% and 99% arrival rates vs. baseline]
Welcome to Pacific Ballroom, Poster #97, for more details.