Numerically Stable Binary Gradient Coding


  1. Numerically Stable Binary Gradient Coding
     Neophytos Charalambides, Hessam Mahdavifar, Alfred Hero
     Department of Electrical Engineering and Computer Science, University of Michigan
     June 2020

  2. Outline for section 1: Introduction and Motivation; Gradient Coding; Problem Setup; Binary Scheme; Allocation to Heterogeneous Workers

  3. Issues and Motivation (Introduction and Motivation)
     Machine Learning Today: Curse of Dimensionality
     ◮ Large datasets — many samples
     ◮ Complex datasets — large dimension
     ◮ Problems become intractable
     Use distributed methods
     ◮ Distribute smaller computation assignments
     ◮ Multiple servers complete various tasks
     Drawbacks of Distributed Synchronous Computations
     ◮ Requires all servers to respond — communication overhead
     ◮ What if stragglers are present?
     ◮ Stragglers — servers with delays, or non-responsive servers

  4. Gradient Coding [1] (Introduction and Motivation)
     1. Speed up distributed computation — gradient methods
     2. Mitigate stragglers
     [1] R. Tandon et al. "Gradient Coding: Avoiding Stragglers in Synchronous Gradient Descent". stat 1050 (2017), p. 8.

  5. Benefits of our Binary Scheme (Introduction and Motivation)
     Few schemes deal with exact recovery. Common issues with current exact recovery schemes:
     1. construct and search through a decoding matrix A^T ∈ R^((n choose s) × n)
     2. storage issue, and further delay
     3. work over R and C — further numerical instability
     4. have the strict assumption that (s+1) | n
     Our scheme:
     1. faster online decoding
     2. only deals with {0,1} encodings — view them as "task assignments"
     3. ... this makes encoding and decoding numerically stable
     4. works for any pair (s, n)
     5. ... we extend our construction to work for heterogeneous workers also

  6. Outline for section 2: Introduction and Motivation; Gradient Coding; Problem Setup; Binary Scheme; Allocation to Heterogeneous Workers

  7. Distributed Gradient Descent (Gradient Coding)
     ◮ Dataset D = {(x_i, y_i)}_{i=1}^N ⊂ R^p × R, or X ∈ R^(N×p), y ∈ R^N
     ◮ Partition D = ⋃_{j=1}^k D_j, s.t. D_i ∩ D_j = ∅ and |D_j| = N/k
     ◮ Partial gradients g_j — the gradient on D_j
     ◮ Minimize the loss L(D; θ) = Σ_{j=1}^k ℓ(D_j; θ)
     ◮ Gradient descent updates: θ^(t+1) = θ^(t) − α_t·g^(t)
     ◮ g^(t) = ∇_θ L(D; θ^(t)) = Σ_{j=1}^k ∇_θ ℓ(D_j; θ^(t)) = Σ_{j=1}^k g_j^(t)
     ◮ The additive structure allows g^(t) to be computed in parallel!
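To make the additive structure concrete, here is a minimal sketch (not from the slides; the squared-error loss and all function names are assumptions for illustration): the data is split into k partitions, each "worker" computes its partial gradient g_j, and the master only needs their sum for one descent update.

```python
import numpy as np

def partial_gradient(Xj, yj, theta):
    # Partial gradient g_j of the squared loss on partition D_j:
    # l(D_j; theta) = ||Xj @ theta - yj||^2, so g_j = 2 * Xj^T (Xj @ theta - yj)
    return 2.0 * Xj.T @ (Xj @ theta - yj)

def gradient_descent_step(X_parts, y_parts, theta, alpha):
    # Each "worker" computes its partial gradient (simulated serially here);
    # the additive structure gives g = sum_j g_j.
    g = sum(partial_gradient(Xj, yj, theta) for Xj, yj in zip(X_parts, y_parts))
    return theta - alpha * g

# Toy usage: N samples, p features, k = 4 disjoint partitions.
rng = np.random.default_rng(0)
N, p, k = 40, 3, 4
X, y = rng.standard_normal((N, p)), rng.standard_normal(N)
X_parts, y_parts = np.array_split(X, k), np.array_split(y, k)
theta = np.zeros(p)
for _ in range(100):
    theta = gradient_descent_step(X_parts, y_parts, theta, alpha=1e-3)
```

In an actual distributed run each partial_gradient call would execute on a separate worker, and the master would aggregate the returned vectors.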

  8. Synchronous Distributed Computation (Gradient Coding)
     ◮ Execute gradient descent distributively
     ◮ Need all workers to respond
     Figure: Need all responses — g = g_1 + g_2 + g_3

  9. Table of Contents: Introduction and Motivation; Gradient Coding; Problem Setup; Binary Scheme; Allocation to Heterogeneous Workers

  10. General Setup (Problem Setup)

  11. Encoding matrix (Problem Setup)
     ◮ Rows: workers {W_i}_{i=1}^n — b_i is the encoding vector of W_i
     ◮ Columns: partitions {D_j}_{j=1}^k
       1. nonzero entries: assigned partitions
       2. redundancy in assigned D_j's
     ◮ Stragglers ≡ erasing rows of B

  12. Table of Contents: Introduction and Motivation; Gradient Coding; Problem Setup; Binary Scheme; Allocation to Heterogeneous Workers

  13. Example of our Binary Scheme (Binary Scheme)
     n = k = 11, s = 3 ⇒ r = 3, where r ≡ n mod (s+1).
     The r congruence classes of workers form B_1, and the remaining (s+1−r) form B_2:
     B_1 ∈ {0,1}^(9×11), whose rows have 4 or 3 ones, and B_2 ∈ {0,1}^(2×11), whose rows have 6 and 5 ones.
     (The explicit 0/1 matrices are displayed on the slide; B is close to block diagonal.)
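As an illustration of how such a {0,1} assignment can arise, the sketch below builds, for n = k = 11 and s = 3, a matrix in which every congruence class of workers mod (s+1) covers the k partitions disjointly with consecutive blocks. This is a hedged sketch of the idea, not the authors' explicit algorithm (that is given in the appendix slides); it does reproduce the row weights of the slide's example: 4 and 3 for the r = 3 classes forming B_1, and 6 and 5 for the class forming B_2.

```python
import numpy as np

def binary_assignment(n, k, s):
    """Build a binary task-assignment matrix B in {0,1}^(n x k) such that the
    rows of every congruence class of workers (mod s+1) sum to the all-ones
    vector 1_{1 x k}.  Illustrative sketch; the paper's explicit construction
    may place the ones differently."""
    B = np.zeros((n, k), dtype=int)
    for c in range(s + 1):                      # one congruence class at a time
        workers = list(range(c, n, s + 1))      # workers with index ≡ c (mod s+1)
        # split the k partitions into consecutive blocks, one per worker,
        # with block sizes differing by at most one
        blocks = np.array_split(np.arange(k), len(workers))
        for w, cols in zip(workers, blocks):
            B[w, cols] = 1
    return B

B = binary_assignment(n=11, k=11, s=3)
assert B.sum() == 11 * (3 + 1)                  # nnzr(B) = k*(s+1)
for c in range(4):                              # each class covers every partition once
    assert (B[c::4].sum(axis=0) == 1).all()
```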

  14. Example — Encoding and Decoding (Binary Scheme)
     Decoding: only take the received workers of the same color (same congruence class).
     Example: I = {2, 6, 10}; the decoding vector a_I ∈ {0,1}^11 is the indicator of these workers, and a_I^T B = 1_{1×11}.
     (The slide displays B ∈ {0,1}^(11×11) with the rows of the class {2, 6, 10} highlighted.)
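A sketch of the corresponding online decoding (the function name and interface are assumptions): scan the s+1 congruence classes, pick one whose workers have all responded, and sum only their encoded results; because that class's rows of B superpose to 1_{1×k}, the sum is exactly the full gradient g.

```python
def decode(received, results, n, s):
    """received: set of indices of workers that responded (at most s missing).
    results[i]: worker i's encoded result, i.e. the sum of the partial
    gradients g_j over the partitions in row i of B.
    Returns the full gradient g.  Illustrative sketch."""
    for c in range(s + 1):
        cls = set(range(c, n, s + 1))           # congruence class c mod (s+1)
        if cls <= received:                     # a complete residue class responded
            # the decoding vector a is the indicator of this class, and
            # a^T B = 1_{1 x k}, so summing these results recovers g
            return sum(results[i] for i in cls)
    raise ValueError("more than s stragglers: no complete class received")
```

For example, with the n = k = 11, s = 3 matrix sketched after slide 13 and workers {1, 5, 9} straggling, the class {0, 4, 8} is complete and summing its three results returns g.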

  15. Main Idea of Our Binary Scheme (Binary Scheme)
     ◮ Have B as sparse as possible ⇒ nnzr(B) = k·(s+1)
     ◮ Work with congruence classes mod (s+1)
     ◮ the superposition of the rows of each class results in 1_{1×k}
     ◮ Allocate tasks s.t. ‖b_i‖_0 ≃ ‖b_j‖_0 for all i, j ∈ {1, ..., n}, while satisfying the above two constraints
     ◮ Formally, construct B that is a solution to
         min_{B ∈ N_0^(n×k)}  Σ_{i=1}^n | ‖b_i‖_0 − (s+1)·k/n |   s.t.  nnzr(B) = k·(s+1)
     ◮ Intuition: B is close to being block diagonal
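Both constraints, together with the balance objective being minimized, are easy to verify numerically; the helper below is an assumed utility for checking a candidate B, not part of the paper.

```python
import numpy as np

def gc_objective(B, s):
    """Check the two constraints from the slide and return the balance
    objective  sum_i | ||b_i||_0 - (s+1)*k/n |  that the construction
    tries to minimize.  Small illustrative helper."""
    n, k = B.shape
    assert B.sum() == k * (s + 1), "nnzr(B) must equal k*(s+1)"
    for c in range(s + 1):
        # rows of congruence class c must superpose to the all-ones vector
        assert (B[c::(s + 1)].sum(axis=0) == 1).all(), f"class {c} fails"
    loads = B.sum(axis=1)                       # per-worker task counts ||b_i||_0
    return float(np.abs(loads - (s + 1) * k / n).sum())
```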

  16. Construction and Decoding (Binary Scheme)
     ◮ Congruence classes C_1 = {[i]}_{i=0}^(r−1) and C_2 = {[i]}_{i=r}^(s):
       1. r ≡ n mod (s+1)
       2. the classes within C_1, and within C_2, are treated identically
       3. within each of C_1, C_2, the cardinalities do not differ by more than one
       4. construct B_1 and B_2
     ◮ B = the aggregation of B_1 and B_2
     ◮ Decoding: by the pigeonhole principle, for any f = n − s workers that respond, at least one complete residue system is present

  17. Larger Example: n = k = 165 and s = 15 ⇒ r = 5 (Binary Scheme)
     Do not want a lot of redundancy — close to block diagonal

  18. Outline for section 3: Introduction and Motivation; Gradient Coding; Problem Setup; Binary Scheme; Allocation to Heterogeneous Workers

  19. Setup a Linear System (Allocation to Heterogeneous Workers)
     ◮ Assume two groups of different machines T_1, T_2, s.t.
       t_i = E[time for a worker of T_i to compute g_j], with t_1 ≠ t_2
     ◮ Goal: the same expected completion time for every worker
     ◮ Let |J_{T_i}| = number of partitions allocated to each of T_i's workers
     ◮ Let |T_i| = τ_i, with τ_1 = (α/β)·τ_2
     Solve the linear system:
       1. t_1·|J_{T_1}| = t_2·|J_{T_2}|
       2. |J_{T_1}|·τ_1 + |J_{T_2}|·τ_2 = (s+1)·k
       3. τ_2 = (β/α)·τ_1
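A small sketch of solving this system for the per-worker loads (the function name and the toy numbers are assumptions; once τ_1, τ_2 are fixed, only equations 1 and 2 are needed):

```python
import numpy as np

def heterogeneous_loads(t1, t2, tau1, tau2, s, k):
    """Per-worker partition counts |J_T1|, |J_T2| for two machine types with
    expected per-partition times t1, t2 and tau1, tau2 workers of each type,
    so that t1*|J_T1| = t2*|J_T2| and tau1*|J_T1| + tau2*|J_T2| = (s+1)*k.
    The returned values are generally fractional; an actual allocation would
    round them to integers."""
    A = np.array([[t1, -t2], [tau1, tau2]], dtype=float)
    b = np.array([0.0, (s + 1) * k])
    J1, J2 = np.linalg.solve(A, b)
    return J1, J2

# Toy usage: type-1 machines are twice as fast (t1 = 1, t2 = 2),
# tau1 = 6 and tau2 = 3 workers, s = 2 stragglers, k = 12 partitions.
J1, J2 = heterogeneous_loads(1.0, 2.0, 6, 3, 2, 12)
# check: t1*J1 == t2*J2 and 6*J1 + 3*J2 == (s+1)*k == 36
```

With these values both machine types finish their assignments in the same expected time, while the total number of assigned partitions still equals (s+1)·k.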

  20. Main Takeaways of Our Scheme
     ◮ Gave a simple gradient coding scheme
     ◮ Faster online decoding
     ◮ Numerically stable encoding and decoding
     ◮ Works for any pair (s, n)
     ◮ Extended it to accommodate heterogeneous workers also

  21. Thank you for your attention!

  22. Outline for section 4: Additional Slides; Details of the constructions; Explicit Algorithms

  23. Idea Behind the Binary Scheme (Details of the constructions)
     ◮ When (s+1) | n and k = n, B is block diagonal: the n workers form ℓ = n/(s+1) blocks of s+1 workers, and the partitions are assigned within each block in a repeated sense
     ◮ For (s+1) ∤ n, each worker within a block of (s+1) rows corresponds to a distinct congruence class (c.c.) mod (s+1)
     ◮ When any f workers send their computations, at least one congruence class is met in every block — pigeonhole
     ◮ ∃ i ∈ Z/(s+1)Z s.t. i + j·(s+1) ∈ I for all j = 0, 1, ..., ℓ−1
     ◮ the received workers of that class "always form a coset"
     ◮ Decoding: select any such i, and sum the vectors received from the workers of the c.c. i — a^T = Σ_{j=0}^(ℓ−1) e_{i+j(s+1)}^T
     ◮ Want an "even" number of assignments — homogeneous servers

  24. Binary Scheme when (s+1) ∤ n (Details of the constructions)
     ◮ Determine the integer parameters:
         n = ℓ·(s+1) + r,      0 ≤ r < s+1
         r = t·ℓ + q,          0 ≤ q < ℓ
         n = λ·(ℓ+1) + r̃,      0 ≤ r̃ < ℓ+1
     ◮ Define: C_1 := {[i]_{s+1}}_{i=0}^(r−1) and C_2 := {[i]_{s+1}}_{i=r}^(s)
     ◮ the workers of C_1 lie in all (ℓ+1) blocks, and those of C_2 lie in the first ℓ
     ◮ C_1 load: {s+1, s} if ℓ + r > s, o.w. {λ+1, λ}
     ◮ C_2 load: {s+t+2, s+t+1} if q > 0, o.w. all have s+t+1
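The parameter bookkeeping on this slide is mechanical; the assumed helper below computes ℓ, r, t, q, λ, r̃ and the per-class loads, and with n = 11, s = 3 it reproduces the loads {4, 3} for C_1 and {6, 5} for C_2 seen in the earlier example.

```python
def scheme_parameters(n, s):
    """Integer parameters for the case (s+1) not dividing n, following the slide:
       n = l*(s+1) + r,          0 <= r < s+1
       r = t*l + q,              0 <= q < l
       n = lam*(l+1) + r_tilde,  0 <= r_tilde < l+1
    Assumes n > s+1 and (s+1) does not divide n.  Returns the parameters and
    the per-worker loads of the classes in C1 and C2."""
    l, r = divmod(n, s + 1)
    t, q = divmod(r, l)
    lam, r_tilde = divmod(n, l + 1)
    c1_loads = {s + 1, s} if l + r > s else {lam + 1, lam}
    c2_loads = {s + t + 2, s + t + 1} if q > 0 else {s + t + 1}
    return dict(l=l, r=r, t=t, q=q, lam=lam, r_tilde=r_tilde,
                C1_loads=c1_loads, C2_loads=c2_loads)

print(scheme_parameters(11, 3))
# l=2, r=3, t=1, q=1, lam=3, r_tilde=2; C1 loads {4, 3}, C2 loads {6, 5}
```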
