Coding for Distributed Computing


  1. Coding for Distributed Computing
  Albin Severinson†‡, Alexandre Graell i Amat†, and Eirik Rosnes‡
  †Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden
  ‡University of Bergen/Simula Research Lab, Bergen, Norway
  Finse, May 09, 2018

  2. Motivation
  [Figure: a master node connected to servers S_1, S_2, ..., S_K through a shared communication bus.]
  Challenges
  • Straggler problem: may induce a large computational delay.
  • Bandwidth scarcity: need to reduce the communication load.
  Problem addressed: matrix multiplication
  • Given an m × n matrix A and N vectors x_1, ..., x_N, we want to compute y_1 = Ax_1, ..., y_N = Ax_N using K servers.
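
A minimal sketch (not from the talk) of the uncoded baseline for this problem: A is split row-wise across K servers and the master concatenates the partial results. The function name and parameters are illustrative only.

```python
# Uncoded distributed matrix-vector multiplication (illustrative sketch).
import numpy as np

def uncoded_multiply(A, xs, K):
    """Compute [A @ x for x in xs] by splitting A row-wise over K servers."""
    blocks = np.array_split(A, K, axis=0)      # rows handled by each server
    results = []
    for x in xs:
        partials = [B @ x for B in blocks]     # each server's local work
        results.append(np.concatenate(partials))
    return results

A = np.random.rand(12, 8)
xs = [np.random.rand(8) for _ in range(3)]
ys = uncoded_multiply(A, xs, K=3)
assert np.allclose(ys[0], A @ xs[0])
```

With this scheme the master must wait for all K servers (stragglers hurt) and every partial result is sent separately (high communication load), which motivates the coded schemes that follow.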

  3. Bandwidth scarcity (Coded MapReduce, Li et al., 2015)
  Goal: compute y_1 = Ax_1, y_2 = Ax_2, y_3 = Ax_3, with A split row-wise into A_1, A_2, A_3.
  [Figure: three servers exchanging partial results over a shared bus.]
  • Server S_1 has A_1 and A_3 (so it holds A_1x_1, ..., A_1x_3 and A_3x_1, ..., A_3x_3) and needs A_2x_1.
  • Server S_2 has A_2 and A_3 and needs A_1x_3.
  • Server S_3 has A_1 and A_2 and needs A_3x_2.
  • A single coded multicast such as A_3x_2 ⊕ A_1x_3 serves S_2 and S_3 simultaneously, since each can cancel the part it already holds; this is how coding reduces the communication load.
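
A toy sketch of the coded-multicast idea in the example above (not the authors' code). Instead of a bitwise XOR over packets, real-valued partial results are combined by addition; each receiver subtracts the part it already holds to recover the part it is missing.

```python
# Coded multicast over a shared bus (illustrative sketch).
import numpy as np

A1, A2, A3 = (np.random.rand(2, 4) for _ in range(3))
x1, x2, x3 = (np.random.rand(4) for _ in range(3))

# Server S1 stores A1 and A3, so it knows both A3 @ x2 and A1 @ x3.
coded = A3 @ x2 + A1 @ x3          # one multicast transmission

# Server S2 stores A2 and A3: it already knows A3 @ x2 and needs A1 @ x3.
recovered_S2 = coded - A3 @ x2
# Server S3 stores A1 and A2: it already knows A1 @ x3 and needs A3 @ x2.
recovered_S3 = coded - A1 @ x3

assert np.allclose(recovered_S2, A1 @ x3)
assert np.allclose(recovered_S3, A3 @ x2)
```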

  4. The straggler problem (Speeding up Distributed Machine Learning Using Codes, Lee et al., 2016)
  [Figure: two timelines showing when servers S_1, S_2, and S_3 complete tasks 1-3, illustrating the delay caused by a straggling server.]

  5. The straggler problem: coded example
  Goal: compute y = Ax, with A split row-wise into A_1 and A_2.
  • Server S_1 computes A_1 x, server S_2 computes A_2 x, and server S_3 computes (A_1 + A_2) x.
  • The results of any two servers suffice to decode Ax, so the slowest server can be ignored.
  In general
  • Introduce redundancy by encoding the input matrix A.
  • Each server is given more work. However, this may still lower the computational delay!
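
A minimal sketch of the coded example above (illustrative, not the talk's code): the master decodes Ax from whichever two of the three results arrive first.

```python
# Straggler mitigation via a simple sum code (illustrative sketch).
import numpy as np

A = np.random.rand(6, 4)
x = np.random.rand(4)
A1, A2 = A[:3], A[3:]

results = {"S1": A1 @ x, "S2": A2 @ x, "S3": (A1 + A2) @ x}

def decode(two):
    """Recover Ax = [A1 @ x; A2 @ x] from any two of the three results."""
    if "S1" in two and "S2" in two:
        return np.concatenate([two["S1"], two["S2"]])
    if "S1" in two:   # S1 and S3: A2 @ x = (A1 + A2) @ x - A1 @ x
        return np.concatenate([two["S1"], two["S3"] - two["S1"]])
    # S2 and S3: A1 @ x = (A1 + A2) @ x - A2 @ x
    return np.concatenate([two["S3"] - two["S2"], two["S2"]])

# Suppose S2 straggles and only S1 and S3 finish in time.
y = decode({"S1": results["S1"], "S3": results["S3"]})
assert np.allclose(y, A @ x)
```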

  6. Coding for distributed computing
  • [Lee et al. '17]: Introduce redundant computations using MDS codes to alleviate the straggler problem.
  • [Li, Maddah-Ali, Avestimehr '17]: A fundamental tradeoff between computational delay and communication load; a unified coding framework trading higher computational delay for lower communication load.
  [Figure: communication load versus computational delay, illustrating the tradeoff.]

  7. Unified coding framework [Li, Maddah-Ali, Avestimehr '17]
  • Encode the columns of A ∈ F^(m×n) using an (r, m) MDS code by multiplying A by an r × m encoding matrix Ψ_MDS, i.e., C = Ψ_MDS A.
  • Code length r proportional to the number of rows of A → high overall delay!
  [Figure: communication load and computational delay versus the number of servers K, with and without encoding/decoding delay.]
  • Setup: A with n = 10000 columns and m = 2000K/3 rows, N = 2000K/3 vectors, and code rate 2/3 (2000 rows assigned to each server).
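
A sketch of the encoding step C = Ψ_MDS A (illustrative only; a real system would use a structured MDS code over a finite field, e.g. Reed-Solomon). Here a Gaussian random matrix stands in for Ψ_MDS: any m of its r rows are invertible with probability 1, so any m coded rows suffice to decode.

```python
# MDS-style encoding of A and decoding from any m coded rows (sketch).
import numpy as np

m, n, r = 4, 6, 6                      # r > m adds redundancy
A = np.random.rand(m, n)
Psi = np.random.randn(r, m)            # stand-in for an (r, m) MDS encoding matrix
C = Psi @ A                            # r coded rows distributed to the servers

# Decode from any m of the r coded rows (here rows 1, 2, 4, 5).
idx = [1, 2, 4, 5]
A_hat = np.linalg.solve(Psi[idx], C[idx])
assert np.allclose(A_hat, A)
```

The drawback highlighted on the slide: decoding involves a code of length r, which grows with the number of rows of A, so the encoding/decoding delay becomes significant.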

  8. In this talk
  Two coding schemes to reduce the overall computational delay:
  • Block-diagonal coding scheme, based on a block-diagonal encoding matrix and shorter MDS codes.
  • LT code-based scheme under inactivation decoding.
  Outcome
  • Block-diagonal coding scheme: significantly lower overall computational delay than the scheme by [Li, Maddah-Ali, Avestimehr '17], with little or no impact on the communication load.
  • LT code-based scheme: very good performance when a deadline must be met with high probability, at the expense of an increased communication load.

  9. Block-diagonal coding scheme
  Idea
  • Partition A into T disjoint submatrices A_1, ..., A_T and apply a smaller (r/T, m/T) MDS code ψ_i to each submatrix:
    C = Ψ_BDC A, with Ψ_BDC = diag(ψ_1, ..., ψ_T).
  • Example for T = 3: Ψ_BDC A = diag(ψ_1, ψ_2, ψ_3) [A_1; A_2; A_3] = [ψ_1 A_1; ψ_2 A_2; ψ_3 A_3], where Ψ_BDC is r × m, A is m × n, and C is r × n.
  • Need any m/T out of the r/T coded rows from each partition to decode.
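
A sketch of block-diagonal encoding (illustrative): each of the T partitions of A is encoded with its own small code. Random Gaussian blocks again stand in for the per-partition MDS codes.

```python
# Block-diagonal encoding C = diag(psi_1, ..., psi_T) @ A (sketch).
import numpy as np
from scipy.linalg import block_diag

m, n, r, T = 6, 5, 9, 3                 # m/T = 2 rows and r/T = 3 coded rows per partition
A = np.random.rand(m, n)
psis = [np.random.randn(r // T, m // T) for _ in range(T)]   # small encoding matrices
Psi_BDC = block_diag(*psis)             # r x m block-diagonal encoding matrix
C = Psi_BDC @ A

# Any m/T of the r/T coded rows of each partition suffice to decode it.
for i, psi in enumerate(psis):
    rows = slice(i * (r // T), i * (r // T) + m // T)        # first m/T coded rows
    Ai_hat = np.linalg.solve(psi[: m // T], C[rows])
    assert np.allclose(Ai_hat, A[i * (m // T):(i + 1) * (m // T)])
```

The point of the block structure is that decoding works on T short codes of length r/T instead of one long code of length r, which is what lowers the encoding/decoding complexity.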

  10. Assignment of coded rows to servers
  [Figure: an optimization solver maps the coded blocks ψ_1 A_1, ψ_2 A_2, ψ_3 A_3 to servers S_1, ..., S_K according to an assignment strategy.]
  • The coded rows need to be assigned to servers very carefully in some instances (such as when the number of servers is small).
  • This assignment can be formulated as an optimization problem.
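
A toy baseline for such an assignment (not the optimized assignment from the talk): spread the r/T coded rows of each partition over the K servers in round-robin fashion, so that losing a few servers still leaves coded rows of every partition available.

```python
# Naive round-robin assignment of coded rows to servers (illustrative only).
from collections import defaultdict

def round_robin_assignment(T, rows_per_partition, K):
    """Map (partition, coded-row) pairs to servers 0..K-1."""
    assignment = defaultdict(list)          # server -> list of (partition, row)
    server = 0
    for t in range(T):
        for row in range(rows_per_partition):
            assignment[server].append((t, row))
            server = (server + 1) % K
    return assignment

assignment = round_robin_assignment(T=3, rows_per_partition=6, K=9)
for server, rows in sorted(assignment.items()):
    print(f"server {server}: {rows}")
```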

  11. Lossless partitioning
  Theorem: For T ≤ r / (K choose µq), there exists an assignment matrix such that the communication load and the computational delay (not taking encoding/decoding delay into account) are equal to those of the unpartitioned scheme by [Li, Maddah-Ali, Avestimehr '17].
  However: the overall computational delay of the block-diagonal coding scheme is much lower than that of the scheme by Li et al. due to its lower encoding and decoding complexity.

  12. Luby-transform (LT) code-based scheme
  LT code-based scheme
  • Encode A as C = Ψ_LT A, where Ψ_LT corresponds to an LT code of fixed rate.
  • Decode the LT code using inactivation decoding.
  Code design
  • Design the LT code for a minimum overhead ε_min and a target failure probability P_f,target, such that P_f(ε_min) ≤ P_f,target.
  • Increasing ε_min leads to lower encoding/decoding complexity but increased communication load, and may require waiting for more servers → the optimal ε_min depends on the scenario.
  • For a given ε_min and P_f,target, optimize the LT code so that the decoding complexity is minimized: for a fixed computational delay of computing Cx_1, ..., Cx_N, minimize the computational delay of the decoding phase.
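
An illustrative sketch of LT encoding of the rows of A (not the optimized code design from the talk). Each coded row is the sum of a randomly chosen subset of source rows, with the subset size ("degree") drawn from a degree distribution; the distribution below is a hard-coded toy, not the one optimized in the paper, and decoding (peeling plus inactivation) is omitted.

```python
# LT-style encoding of the rows of A (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 5, 12                      # m source rows, r coded rows (fixed rate m/r)
A = rng.random((m, n))

degrees = np.array([1, 2, 3, 4])
probs = np.array([0.2, 0.5, 0.2, 0.1])  # toy degree distribution

coded_rows, neighbor_sets = [], []
for _ in range(r):
    d = rng.choice(degrees, p=probs)             # degree of this coded row
    neighbors = rng.choice(m, size=d, replace=False)
    coded_rows.append(A[neighbors].sum(axis=0))  # coded row = sum of chosen source rows
    neighbor_sets.append(neighbors)

C = np.vstack(coded_rows)                        # r x n coded matrix sent to the servers
```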

  13. Computational delay and communication load
  [Figure: communication load and computational delay versus the number of servers K for the unified scheme [Li et al.], BDC, and LT [Lee et al.].]
  • Setup: A with n = 10000 columns and m = 2000K/3 rows, N = 2000K/3 vectors, code rate 2/3 (i.e., 2000 rows assigned to each server), and m/T = 10 rows per partition.

  14. Performance as a function of the number of partitions
  [Figure: communication load and computational delay versus the number of partitions T for the unified scheme [Li et al.], BDC, and LT [Lee et al.].]
  • Setup: A with m = 6000 rows and n = 6000 columns, N = 6 vectors, K = 9 servers, and code rate 2/3.

  15. Distributed computing under a deadline
  [Figure: Pr(delay > t) versus the deadline t for the uncoded, unified [Li et al.], BDC, and LT schemes.]
  • Setup: A with m = 134000 rows and n = 10000 columns, N = 134000 vectors, K = 201 servers, T = 13400 partitions, and code rate 2/3.
