

SLIDE 1

Coding for Distributed Computing

Albin Severinson†‡, Alexandre Graell i Amat†, and Eirik Rosnes‡

† Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden
‡ University of Bergen/Simula Research Lab, Bergen, Norway

Finse, May 09, 2018

SLIDE 2

Outline: Introduction · Block-diagonal coding · LT code-based scheme · Numerical results · Conclusion · One More Thing...

Motivation

[Figure: a master node connected to servers S1, S2, . . . , SK over a shared communication bus.]

Challenges

  • Straggler problem: May induce a large computational delay.
  • Bandwidth scarcity: Need to reduce the communication load.

Problem addressed: Matrix multiplication

  • Given an m × n matrix A and N vectors x1, . . . , xN, we want to compute y1 = Ax1, . . . , yN = AxN using K servers.
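The setup can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `distributed_multiply` and the modelling of servers as row-partitions of A are ours, not the paper's.

```python
# Uncoded baseline: split A row-wise across K simulated "servers"; each
# server multiplies its share of A by every input vector, and the master
# stacks the partial results.
import numpy as np

def distributed_multiply(A, xs, K):
    """Compute y_j = A @ x_j for each x_j using K row-partitions of A."""
    shares = np.array_split(A, K, axis=0)            # server k holds shares[k]
    partial = [[S @ x for x in xs] for S in shares]  # work done per server
    return [np.concatenate([partial[k][j] for k in range(K)])
            for j in range(len(xs))]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
xs = [rng.standard_normal(4) for _ in range(3)]
ys = distributed_multiply(A, xs, K=3)
assert all(np.allclose(y, A @ x) for y, x in zip(ys, xs))
```

Without redundancy, the master must wait for all K servers: this is exactly the straggler problem the following slides address.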

Coding for Distributed Computing | Albin Severinson, Alexandre Graell i Amat, and Eirik Rosnes 1 / 16

SLIDE 3

Bandwidth Scarcity

(Coded MapReduce, Li et al., 2015)

y1 = Ax1, y2 = Ax2, y3 = Ax3

[Figure: three servers S1, S2, S3. A is partitioned into A1, A2, A3; each server stores two of the three submatrices, computes the intermediate products it can, and a multicast XOR such as A3x2 ⊕ A1x3 ⊕ A2x1 lets each server recover a product it is missing, reducing the communication load.]


SLIDE 4

The straggler problem

(Speeding up Distributed Machine Learning Using Codes, Lee et al., 2016)

[Figure: two timelines showing when tasks 1–3 complete on servers S1–S3; without redundancy, the overall delay is set by the slowest (straggling) server.]


SLIDE 5

The Straggler Problem

[Figure: a (3, 2) MDS example for y = Ax. A is split into A1 and A2; servers S1, S2, and S3 compute A1x, A2x, and (A1 + A2)x, respectively, and the master decodes Ax from any two of the three results.]

In general

  • Introduce redundancy by encoding the input matrix A.
  • Each server is given more work. However, this may still lower the computational delay!
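The (A1, A2, A1 + A2) example can be checked directly. A minimal sketch: the `recover` helper and the server labels are illustrative, not from the paper.

```python
import numpy as np

# Toy (3, 2) MDS code: servers hold A1, A2, and A1 + A2. Any two of the
# three results suffice to recover Ax, so one straggler can be ignored.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
A1, A2 = A[:2], A[2:]

results = {"S1": A1 @ x, "S2": A2 @ x, "S3": (A1 + A2) @ x}

def recover(done):
    # done: results of the two fastest servers
    if "S1" in done and "S2" in done:
        return np.concatenate([done["S1"], done["S2"]])
    if "S1" in done:                      # S2 straggles: A2 x = S3 - S1
        return np.concatenate([done["S1"], done["S3"] - done["S1"]])
    return np.concatenate([done["S3"] - done["S2"], done["S2"]])  # S1 straggles

for straggler in ("S1", "S2", "S3"):
    done = {s: r for s, r in results.items() if s != straggler}
    assert np.allclose(recover(done), A @ x)
```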


SLIDE 6

Coding for distributed computing

  • [Lee et al. ’17]: Introduce redundant computations using MDS codes to alleviate the straggler problem.

  • [Li, Maddah-Ali, Avestimehr ’17]: A fundamental tradeoff between computational delay and communication load; a unified coding framework trades higher computational delay for lower communication load.

[Figure: communication load versus computational delay, illustrating the tradeoff.]


SLIDE 7

Unified coding framework [Li, Maddah-Ali, Avestimehr ’17]

  • Encode the columns of A ∈ Fm×n using an (r, m) MDS code by multiplying A by an r × m encoding matrix ΨMDS, i.e., C = ΨMDS A.
  • Code length r proportional to the number of rows of A → high overall delay!

[Figure: communication load and computational delay versus the number of servers K, with and without the encoding/decoding delay included.]

  • A with n = 10000 columns and m = 2000K/3 rows, N = 2000K/3 vectors, and code rate 2/3 (2000 rows assigned to each server).
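A hedged sketch of the encoding and decoding step: a random Gaussian matrix stands in for a proper (r, m) MDS code (e.g., Reed-Solomon over a finite field), since any m of its r rows are almost surely linearly independent, which is the only property used here.

```python
import numpy as np

# Unified-framework encoding step C = Psi @ A with an r x m encoding
# matrix. Any m received coded results let the master recover A @ x by
# inverting the corresponding m x m submatrix of Psi.
rng = np.random.default_rng(2)
m, n, r = 4, 5, 6
A = rng.standard_normal((m, n))
Psi = rng.standard_normal((r, m))      # stand-in for an (r, m) MDS code
C = Psi @ A                            # r coded rows, split over servers

x = rng.standard_normal(n)
fastest = [0, 2, 3, 5]                 # indices of the m fastest coded rows
y_coded = C[fastest] @ x               # what the master has received
y = np.linalg.solve(Psi[fastest], y_coded)   # undo the encoding
assert np.allclose(y, A @ x)
```

Note that the decode solves a single m × m system: with r (and m) growing with the problem size, this is the encoding/decoding delay the slide flags as high.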


SLIDE 8

In this talk

Two coding schemes to reduce the overall computational delay

  • Block-diagonal coding (BDC) scheme, based on a block-diagonal encoding matrix and shorter MDS codes.
  • LT code-based scheme under inactivation decoding.

Outcome

  • Block-diagonal coding scheme: Significantly lower overall computational delay than the scheme by [Li, Maddah-Ali, Avestimehr ’17], with little or no impact on the communication load.
  • LT code-based scheme: Very good performance when a deadline must be met with high probability, at the expense of an increased communication load.


SLIDE 9

Block-diagonal coding scheme

Idea

  • Partition A into T disjoint submatrices A1, . . . , AT and apply a smaller MDS code to each submatrix:

C = ΨBDC A,   ΨBDC = diag(ψ1, . . . , ψT),

where each ψi is an (r/T, m/T) MDS code.

[Figure: ΨBDC (r × m, block-diagonal with blocks ψ1, ψ2, ψ3) multiplies A (m × n, stacked as A1, A2, A3) to give the stacked coded blocks ψ1A1, ψ2A2, ψ3A3 (r × n).]

  • Need any m/T out of the r/T coded rows from each partition to decode.
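The per-partition decoding can be sketched as follows. Assumptions on our part: random Gaussian blocks stand in for the small MDS codes ψi, and the assignment of coded rows to servers (next slide) is ignored.

```python
import numpy as np

# Block-diagonal encoding: each of the T row blocks of A gets its own
# small (r/T, m/T) code. Decoding inverts T small (m/T x m/T) systems
# instead of one m x m system, which is where the complexity saving
# over the unpartitioned scheme comes from.
rng = np.random.default_rng(3)
T, mT, n = 3, 2, 4                     # T partitions, m/T = 2 rows each
rT = 3                                 # r/T = 3 coded rows per partition
psis = [rng.standard_normal((rT, mT)) for _ in range(T)]   # small "MDS" codes
blocks = [rng.standard_normal((mT, n)) for _ in range(T)]  # A1, ..., AT

x = rng.standard_normal(n)
A = np.vstack(blocks)

# Each partition is decoded independently from any m/T of its r/T rows.
y_parts = []
for psi, Ai in zip(psis, blocks):
    coded = psi @ Ai @ x               # r/T coded results for partition i
    have = [0, 2]                      # any m/T received rows suffice
    y_parts.append(np.linalg.solve(psi[have], coded[have]))
assert np.allclose(np.concatenate(y_parts), A @ x)
```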


SLIDE 10

Assignment of coded rows to servers

[Figure: the coded blocks ψ1A1, ψ2A2, ψ3A3 are assigned to servers S1, . . . , SK according to an assignment strategy produced by an optimization solver.]

  • In some instances (such as when the number of servers is small), coded rows must be assigned to servers very carefully.
  • This assignment can be formulated as an optimization problem.


SLIDE 11

Lossless partitioning

Theorem

For T ≤ r/(Kµq), there exists an assignment matrix such that the communication load and the computational delay (not taking the encoding/decoding delay into account) are equal to those of the unpartitioned scheme by [Li, Maddah-Ali, Avestimehr ’17].

However...

The overall computational delay of the block-diagonal coding scheme is much lower than that of the scheme by Li et al. due to its lower encoding and decoding complexity.


SLIDE 12

Luby-transform code-based scheme

LT code-based scheme

  • Encode A as C = ΨLTA; ΨLT corresponds to an LT code of fixed rate.
  • Decode the LT code using inactivation decoding.

Code design

  • Design the LT code for a minimum overhead εmin and a target failure probability Pf,target, such that Pf(εmin) ≤ Pf,target.
  • Increasing εmin lowers the encoding/decoding complexity but increases the communication load and may require waiting for more servers → the optimal εmin depends on the scenario.
  • For a given εmin and Pf,target, optimize the LT code so that the decoding complexity is minimized: for a fixed computational delay of computing Cx1, . . . , CxN, minimize the computational delay of the decoding phase.


SLIDE 13

Computational delay and communication load

[Figure: communication load and computational delay versus the number of servers K for the unified scheme [Li et al.], the BDC scheme, the LT scheme, and the scheme by [Lee et al.].]

  • A with n = 10000 columns and m = 2000K/3 rows, N = 2000K/3 vectors, and rate 2/3, i.e., 2000 rows assigned to each server and m/T = 10 rows per partition.


SLIDE 14

Performance as a function of the number of partitions

[Figure: communication load and computational delay versus the number of partitions T for the unified scheme [Li et al.], the BDC scheme, the LT scheme, and the scheme by [Lee et al.].]

  • A with m = 6000 rows and n = 6000 columns, N = 6 vectors, K = 9 servers, and code rate 2/3.


SLIDE 15

Distributed computing under a deadline

[Figure: Pr(delay > t) as a function of t for the uncoded, unified [Li et al.], BDC, and LT schemes.]

  • A with m = 134000 rows and n = 10000 columns, N = 134000 vectors, K = 201 servers, T = 13400 partitions, and code rate 2/3.
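The deadline behaviour can be illustrated with a small Monte Carlo experiment. An assumption on our part: the shifted-exponential server runtime model used by Lee et al., which is not necessarily the exact delay model behind the slide's curves.

```python
import numpy as np

# Uncoded computation must wait for all K servers, while an MDS-coded
# scheme only waits for the q fastest; the coded delay tail therefore
# decays much faster, which is what matters for meeting a deadline.
rng = np.random.default_rng(5)
K, q, trials = 20, 14, 100_000
samples = 1.0 + rng.exponential(scale=1.0, size=(trials, K))  # shift = 1

uncoded = samples.max(axis=1)                   # wait for every server
coded = np.sort(samples, axis=1)[:, q - 1]      # wait for the q-th fastest

t = 3.0
p_uncoded = (uncoded > t).mean()
p_coded = (coded > t).mean()
assert p_coded < p_uncoded                      # coding thins the tail
```

The parameters K, q, the shift, and the deadline t here are arbitrary illustration values, not those of the slide's experiment.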


SLIDE 16

Conclusion

Take-home message...

  • The encoding and decoding delay may contribute significantly to the overall computational delay.
  • The BDC scheme yields significantly lower computational delay (up to 70%–80%) with little or no impact on the communication load.

  • The LT code-based scheme achieves very good performance when a deadline must be met with high probability.

  • Paper available on arXiv: A. Severinson, A. Graell i Amat, and E. Rosnes, “Block-Diagonal and LT Codes for Distributed Computing With Straggling Servers”.
  • Code on GitHub: github.com/severinson/coded-computing-tools


SLIDE 17

One More Thing...

[Figure: A is partitioned into A1 and A2 and augmented with randomness; the encoded and scrambled pieces are distributed to servers S1–S4, each of which multiplies its piece by x; the master then descrambles and decodes to recover Ax.]
