Coded QR Decomposition

Quang Minh Nguyen (MIT), Haewon Jeong (Harvard University), Pulkit Grover (Carnegie Mellon University)

Motivation

Motivation: Coded Computing
Coded computing = coding theory + distributed computing. It has mainly addressed the straggling issue in cloud computing:
- Coded matrix multiplication [Lee et al. '15, '17; Yu et al. '17; Jeong et al. '17, '18; Baharav '18; Sinong et al. '18; Shahrzad et al. '19]
- Coded MapReduce [Li et al. '15, '17, '18]
- Coded gradient descent [Tandon et al. '16; Raviv et al. '17; Halbawi et al. '18; Ye '18]
Other issues?

Motivation: Processor Failures in HPC
Other issues? Processor failures in high-performance computing (HPC).
• Larger scale → unreliability! The Fugaku supercomputer (2021) has 150,000 nodes. A system-level mean time between failures (MTBF) of 24-48 hours corresponds to a node-level MTBF of 411-822 years!
• HPC's solution: algorithm-based fault tolerance (ABFT), which adds encoded redundancy tailored to a specific algorithm. This is the same idea as coded computing!
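The two MTBF ranges are consistent with a simple model: if each of N nodes fails independently with an exponentially distributed lifetime and any single node failure interrupts the system, the system-level MTBF is the node-level MTBF divided by N. A minimal sketch under that assumption and a 365-day year (the helper name `node_mtbf_years` is ours, for illustration only):

```python
HOURS_PER_YEAR = 24 * 365  # assumed 365-day year; reproduces the 411 and 822 figures


def node_mtbf_years(system_mtbf_hours: float, num_nodes: int) -> float:
    """Node-level MTBF (in years) implied by a system-level MTBF (in hours).

    Assumes independent, exponentially distributed node lifetimes, so the
    system failure rate is num_nodes times the per-node failure rate.
    """
    return system_mtbf_hours * num_nodes / HOURS_PER_YEAR


for hours in (24, 48):
    print(f"system MTBF {hours} h -> node MTBF {node_mtbf_years(hours, 150_000):.0f} years")
```

Read in reverse, the same model says that even extremely reliable nodes yield a failure every day or two once there are 150,000 of them.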

Motivation: Bridge the Gap Between ABFT for HPC and Coded Computing
• QR decomposition: an important matrix factorization in HPC, where ABFT faces challenges.
• A more practical HPC setting that has not been considered in the coded computing literature:
- Block-cyclic distribution
- In-node checksum storage (storing redundancies in the systematic nodes)
→ Coded QR Decomposition

What is QR Decomposition?
A = QR, where Q is orthogonal (i.e., $Q^T Q = I$) and R is upper-triangular.
• QR decomposition is widely used in many HPC applications: solving systems of linear equations, SVMs, linear least-squares problems, etc.
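A quick NumPy illustration of the definition and of one listed application (least squares). It uses `numpy.linalg.qr`, which calls LAPACK's Householder QR rather than the Gram-Schmidt variants discussed later:

```python
import numpy as np

rng = np.random.default_rng(0)

# Square case: A = QR with orthogonal Q and upper-triangular R.
A = rng.standard_normal((5, 5))
Q, R = np.linalg.qr(A)
print(np.allclose(Q.T @ Q, np.eye(5)), np.allclose(R, np.triu(R)), np.allclose(Q @ R, A))

# Least squares: for a tall M = QR (reduced form), min ||M x - y|| is solved by R x = Q^T y.
M = rng.standard_normal((8, 3))
y = rng.standard_normal(8)
Qm, Rm = np.linalg.qr(M)  # Qm: 8 x 3 with orthonormal columns, Rm: 3 x 3
x_qr = np.linalg.solve(Rm, Qm.T @ y)
x_ref = np.linalg.lstsq(M, y, rcond=None)[0]
print(np.allclose(x_qr, x_ref))
```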

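ABFT for QR Decomposition
Key idea [O. Maslennikow et al. '98, P. Du et al. '12, P. Wu et al. '14]: append checksum columns to A and QR-factorize the augmented matrix,
$\begin{bmatrix} A & \text{checksums} \end{bmatrix} = Q R', \qquad R' = \begin{bmatrix} R & \text{checksums} \end{bmatrix}$.
R' is upper-triangular → R is upper-triangular. So we can retrieve A = Q × R as the QR decomposition of A, while the trailing columns of R' serve as checksums of R.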
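The R-protection trick can be checked numerically. A minimal sketch, assuming a random, hypothetical horizontal checksum generator `Gh` (n × c), so the appended columns are $A G_h$. It uses NumPy's Householder QR, not any particular ABFT library:

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 6, 2
A = rng.standard_normal((n, n))
Gh = rng.standard_normal((n, c))  # hypothetical horizontal checksum generator

# QR of the column-encoded matrix [A, A Gh]: Q is n x n, R' is n x (n + c).
Q, Rp = np.linalg.qr(np.hstack([A, A @ Gh]))
R = Rp[:, :n]  # leading block of R'

print(np.allclose(Q @ R, A))           # A = Q R
print(np.allclose(R, np.triu(R)))      # R is upper-triangular
print(np.allclose(Rp[:, n:], R @ Gh))  # trailing block of R' is the checksum R Gh
```

Since Q is square and orthogonal, the trailing block equals $Q^T A G_h = R G_h$, which is why the checksums of R come out of the same factorization.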

Challenges in Coding for QR Decomposition
• Can we do the same trick to protect Q? No. Appending checksum rows gives
$\begin{bmatrix} A \\ \text{checksums} \end{bmatrix} = Q' R, \qquad Q' = \begin{bmatrix} Q \\ \text{checksums} \end{bmatrix}$,
but $Q'^T Q' = I$ does not imply $Q^T Q = I$: Q is not orthogonal.
• Proven in [Theorem 5.1, P. Du et al. '12].
→ Challenge 1: Q protection via coding? Can we efficiently restore the orthogonality of Q?
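The reasoning behind the "No" is short. Write the checksum rows as $C = G_v A$ and partition $Q' = [Q_1; Q_c]$. Comparing the bottom rows of $[A; G_v A] = Q' R$ gives $Q_c = G_v Q_1$ (R is invertible), so $Q'^T Q' = I$ only says $Q_1^T (I + G_v^T G_v) Q_1 = I$. A NumPy sketch with a random, hypothetical vertical checksum generator `Gv`:

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 6, 2
A = rng.standard_normal((n, n))
Gv = rng.standard_normal((c, n))  # hypothetical vertical checksum generator

# QR of the row-encoded matrix [A; Gv A]; Q' is (n + c) x n with orthonormal columns.
Qp, R = np.linalg.qr(np.vstack([A, Gv @ A]))
Q1 = Qp[:n]  # systematic part of Q'

print(np.allclose(Qp.T @ Qp, np.eye(n)))   # True: Q' is orthonormal
print(np.allclose(Q1 @ R, A))              # True: A = Q1 R
print(np.allclose(Q1.T @ Q1, np.eye(n)))   # False: Q1 itself is not orthogonal
# Q1 is instead orthogonal with respect to the metric I + Gv^T Gv:
print(np.allclose(Q1.T @ (np.eye(n) + Gv.T @ Gv) @ Q1, np.eye(n)))  # True
```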

Challenges in Coding for QR Decomposition
In-node checksum storage:
• was recently proposed for ABFT [P. Du et al. '12];
• stores coded data (checksums) in the original processors instead of adding extra processors for fault tolerance.

Challenges in Coding for QR Decomposition
• Out-of-node checksum storage (conventional setting): nodes 0 and 1 hold A0 and A1, and an extra node 2 holds the checksum A0 + A1. The optimal coding strategy is known: MDS.
• In-node checksum storage: nodes 0 and 1 hold A0 and A1, and the checksum A0 + A1 is stored within these same nodes. Fundamental limit?
→ Can we still have some optimality guarantee like the MDS condition?
→ Challenge 2: what is the minimal number of checksums required under in-node checksum storage?
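A toy check of why in-node storage changes the counting. Assumptions (ours, for illustration): blocks are scalars, each checksum is a linear combination of the blocks, and only systematic nodes fail. A single failure of node i is recoverable if some checksum stored on a different node has a nonzero weight on A_i, because every other A_j survives. The function name `recoverable` is hypothetical, and this is not the paper's bound on the minimal number of checksums:

```python
def recoverable(num_systematic, checksums):
    """Check single-node failure tolerance for a toy checksum layout.

    checksums: list of (host, coeffs). `host` is the node storing the checksum
    (host >= num_systematic means a dedicated checksum node, which never fails
    here); coeffs[j] is the weight of block A_j in that checksum.
    """
    for failed in range(num_systematic):
        survivors = [coeffs for host, coeffs in checksums if host != failed]
        # All other A_j are still available, so one surviving checksum with a
        # nonzero weight on A_failed suffices to solve for it.
        if not any(coeffs[failed] != 0 for coeffs in survivors):
            return False
    return True


out_of_node = [(2, [1, 1])]                # A0 + A1 on a dedicated node 2
in_node_one = [(0, [1, 1])]                # A0 + A1 stored on node 0
in_node_two = [(0, [1, 1]), (1, [1, 1])]   # a copy of A0 + A1 on each node

print(recoverable(2, out_of_node))  # True
print(recoverable(2, in_node_one))  # False: losing node 0 also loses the checksum
print(recoverable(2, in_node_two))  # True, at the cost of a second checksum
```

In this two-node toy, in-node storage needs a second checksum where out-of-node storage needed one, which is the kind of gap Challenge 2 asks to quantify.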

Summary of Challenges
Challenge 1: Q protection via coding?
Challenge 2: minimal number of checksums required under in-node checksum storage?
→ Our contribution: address these two challenges.

System Model
• For fault tolerance, we encode the n × n matrix A with both vertical and horizontal checksums, where $G_v$ and $G_h$ are the checksum-generator matrices.
• Out-of-node checksum storage: the checksums are distributed over a new set of checksum processors.

System Model
Coded computing: master-worker setting. An input master node distributes the data blocks (A0, A1, A2, A3), together with redundancy, to workers 1-5, and an output master node collects the results.

System Model
HPC setting: 2D block-cyclic distribution (in contrast to the master-worker setting).
• The input matrix A is distributed among the processors (systematic processors and checksum processors).
• This layout is maintained throughout the computation.
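For readers unfamiliar with the layout: in the standard 2D block-cyclic distribution (as used by ScaLAPACK), the matrix is cut into nb × nb blocks, and block (I, J) goes to process (I mod P_r, J mod P_c) of a P_r × P_c process grid. A minimal sketch, assuming zero source offsets and row-major process ranks; the function names are ours:

```python
def owner(i, j, nb, pr, pc):
    """Process-grid coordinates owning global entry (i, j) (zero source offsets)."""
    return (i // nb) % pr, (j // nb) % pc


def block_owner_ranks(n, nb, pr, pc):
    """Row-major rank of the process owning each nb x nb block of an n x n matrix."""
    ranks = []
    for i in range(0, n, nb):
        row = []
        for j in range(0, n, nb):
            p, q = owner(i, j, nb, pr, pc)
            row.append(p * pc + q)
        ranks.append(row)
    return ranks


for row in block_owner_ranks(n=8, nb=2, pr=2, pc=2):
    print(row)
```

Each process therefore owns blocks scattered across the whole matrix, so a single fail-stop failure erases a strided set of blocks rather than one contiguous panel.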

Failure Model and Real-Time Recovery in HPC
Single-node fail-stop failures:
• A failure corresponds to a systematic processor that completely stops responding and loses its part of the global data.
• The identity of the failed processor is provided by some external source.
Real-time recovery:
• A failure can occur at any point during the execution of the QR decomposition and immediately triggers the recovery process.
• Computation continues once the system has recovered from its latest failure.

QR Decomposition: Modified Gram-Schmidt (MGS) Algorithm
We consider MGS, one of the three most widely used algorithms for QR decomposition.
[Figure: the R computation and Q computation steps of MGS]
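A minimal NumPy sketch of MGS in its right-looking form: once $q_k$ is formed, its component is removed from every remaining column immediately, which is what distinguishes MGS from classical Gram-Schmidt. It assumes A has full column rank and omits reorthogonalization:

```python
import numpy as np


def mgs_qr(A):
    """QR factorization by right-looking modified Gram-Schmidt.

    Assumes A (m x n, m >= n) has full column rank; no reorthogonalization.
    """
    W = np.array(A, dtype=float)  # working copy, orthogonalized column by column
    m, n = W.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(W[:, k])
        Q[:, k] = W[:, k] / R[k, k]
        # Remove the q_k direction from all remaining columns right away.
        R[k, k + 1:] = Q[:, k] @ W[:, k + 1:]
        W[:, k + 1:] -= np.outer(Q[:, k], R[k, k + 1:])
    return Q, R


rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q, R = mgs_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)), np.allclose(R, np.triu(R)))
```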

Main Results
• Checksum preservation for MGS: checksums are preserved to facilitate fault-tolerant computation.
• Challenge 1 (Q protection via coding?) → Post-orthogonalization: post-processing to restore the degraded orthogonality.
• Challenge 2 (minimal number of checksums under in-node checksum storage?) → Optimality for the in-node checksum storage setting: the minimal number of checksums for single-node failure tolerance.

Checksum Preservation for MGS
• To facilitate real-time recovery, we want the checksums to be preserved at every iteration of MGS (or GS).
• We encode $A \to \tilde{A}$ and QR-factorize $\tilde{A}$.
• At each iteration $t = 1, \ldots, T$, the algorithm maintains the updates $Q^{(t)}$ and $R^{(t)}$, so that at the end $\tilde{A} = Q^{(T)} R^{(T)}$ is the QR decomposition of $\tilde{A}$.

Checksum Preservation for MGS
We prove that at any iteration t of MGS,
$Q^{(t)} = \begin{bmatrix} Q_1^{(t)} \\ G_v Q_1^{(t)} \end{bmatrix}, \qquad R^{(t)} = \begin{bmatrix} R_1^{(t)} & R_1^{(t)} G_h \end{bmatrix}$,
i.e., the checksums are preserved!

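Checksum Preservation for MGS
At the end, i.e., t = T, we have
$Q^{(T)} = \begin{bmatrix} Q_1 \\ G_v Q_1 \end{bmatrix}, \qquad R^{(T)} = \begin{bmatrix} R_1 & R_1 G_h \end{bmatrix}$,
so the blocks A, $A G_h$, and $G_v A$ of the encoded matrix factor as $Q_1 R_1$, $Q_1 R_1 G_h$, and $G_v Q_1 R_1$.
→ We retrieve $A = Q_1 R_1$, where $R_1$ is upper-triangular but $Q_1$ is non-orthogonal (the first challenge).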
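The preservation property can be observed numerically. The sketch below assumes the full-checksum encoding $\tilde{A} = [A, A G_h; G_v A, G_v A G_h]$ (our assumption about the corner block), runs right-looking MGS on the first n columns only, and merely projects the c checksum columns; `Gv`, `Gh`, and `mgs_columns` are hypothetical names:

```python
import numpy as np


def mgs_columns(M, k):
    """Right-looking MGS on the first k columns of M.

    Returns Q (m x k, orthonormal columns) and R (k x N) with M[:, :k] = Q R[:, :k].
    Columns k..N-1 (the checksum columns here) are only projected, never normalized.
    """
    W = np.array(M, dtype=float)
    m, N = W.shape
    Q = np.zeros((m, k))
    R = np.zeros((k, N))
    for j in range(k):
        R[j, j] = np.linalg.norm(W[:, j])
        Q[:, j] = W[:, j] / R[j, j]
        R[j, j + 1:] = Q[:, j] @ W[:, j + 1:]  # includes the checksum columns
        W[:, j + 1:] -= np.outer(Q[:, j], R[j, j + 1:])
    return Q, R


rng = np.random.default_rng(0)
n, c = 6, 2
A = rng.standard_normal((n, n))
Gv = rng.standard_normal((c, n))  # hypothetical vertical checksum generator
Gh = rng.standard_normal((n, c))  # hypothetical horizontal checksum generator

A_enc = np.block([[A, A @ Gh], [Gv @ A, Gv @ A @ Gh]])  # assumed encoding
Qt, Rt = mgs_columns(A_enc, n)
Q1, R1 = Qt[:n], Rt[:, :n]

print(np.allclose(Qt[n:], Gv @ Q1))       # vertical checksums of Q preserved
print(np.allclose(Rt[:, n:], R1 @ Gh))    # horizontal checksums of R preserved
print(np.allclose(Q1 @ R1, A))            # A = Q1 R1
print(np.allclose(Q1.T @ Q1, np.eye(n)))  # False: Q1 is not orthogonal
```

Every MGS step is a column operation, so the row relation "bottom c rows = $G_v$ × top n rows" survives each iteration; and because the checksum columns lie in the span of the first n columns, their residuals vanish and only the projections $R_1 G_h$ remain.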

Challenge 1: Degraded Orthogonality of Conventional Coding
In $A = Q_1 R_1$, obtained from the coded factorization, $Q_1$ is not orthogonal. In this work, we raise the question: "How 'non-orthogonal' is $Q_1$?"
Main idea: cheap post-processing, $Q_1 \to G_0 Q_1$, an orthogonal matrix!
→ Post-orthogonalization

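Post-orthogonalization
Question: can we always construct $G_0$ such that $G_0 Q_1$ is orthogonal?
• $Q_1$ alone is not orthogonal, but the stacked matrix $\begin{bmatrix} Q_1 \\ G_v Q_1 \end{bmatrix}$, whose last block $G_v Q_1$ is a c × n matrix, has orthonormal columns.
• So the answer depends on $G_v$, and $G_v$ is a checksum-generator matrix under our control!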

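Construction of $G_0$
Partition the c × n checksum-generator matrix as $G_v = \begin{bmatrix} G_1 & V \end{bmatrix}$, with $G_1$ of size c × c and V of size c × (n − c). Then
$G_0 = \begin{bmatrix} I_c + G_1 & V \\ V^T & -I_{n-c} \end{bmatrix}$.
$G_0$ is sparse: outside the c × c block $I_c + G_1$ and the off-diagonal blocks V and $V^T$, it is the diagonal matrix $-I_{n-c}$.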
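A short derivation of why this $G_0$ can work, reading the block matrix as $G_0 = [I_c + G_1, V; V^T, -I_{n-c}]$ and using $Q' = [Q_1; G_v Q_1]$ with $Q'^T Q' = I_n$. Multiplying out the blocks shows that, for this block form, $G_0^T G_0 = I_n + G_v^T G_v$ holds exactly when $G_1 + G_1^T + V V^T = 0$; this condition is our own derivation, offered as a sketch:

```latex
\begin{aligned}
G_0^T G_0 &= \begin{bmatrix} I_c + G_1 + G_1^T + G_1^T G_1 + V V^T & G_1^T V \\ V^T G_1 & I_{n-c} + V^T V \end{bmatrix}, \\
I_n + G_v^T G_v &= \begin{bmatrix} I_c + G_1^T G_1 & G_1^T V \\ V^T G_1 & I_{n-c} + V^T V \end{bmatrix}, \\
G_1 + G_1^T + V V^T = 0 \;&\Longrightarrow\; (G_0 Q_1)^T (G_0 Q_1) = Q_1^T (I_n + G_v^T G_v) Q_1
  = Q_1^T Q_1 + (G_v Q_1)^T (G_v Q_1) = Q'^T Q' = I_n, \\
&\phantom{\Longrightarrow}\; G_0^T G_0 = I_n + G_v^T G_v \succ 0 \;\Longrightarrow\; G_0 \text{ is invertible}.
\end{aligned}
```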

Post-orthogonalization Condition for the Checksum-Generator Matrix
Main result: we prove that if the checksum-generator matrix $G_v = \begin{bmatrix} G_1 & V \end{bmatrix}$ satisfies the post-orthogonalization condition, then:
• $G_0 Q_1$ is orthogonal;
• $G_0$ is invertible.
→ $A' = G_0 A = (G_0 Q_1) R_1$ is now the QR decomposition of A'! But would A' be useful?
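A numerical sanity check of both claims, assuming the block form of $G_0$ above and the condition $G_1 + G_1^T = -V V^T$ obtained by multiplying out $G_0^T G_0$. Choosing $G_1 = -\tfrac{1}{2} V V^T$ is just one hypothetical way to satisfy it:

```python
import numpy as np

rng = np.random.default_rng(1)
n, c = 8, 2

# Hypothetical Gv = [G1 V] with G1 = -V V^T / 2, so that G1 + G1^T = -V V^T.
V = rng.standard_normal((c, n - c))
G1 = -0.5 * V @ V.T
Gv = np.hstack([G1, V])
G0 = np.block([[np.eye(c) + G1, V], [V.T, -np.eye(n - c)]])

A = rng.standard_normal((n, n))
Qp, R = np.linalg.qr(np.vstack([A, Gv @ A]))  # [A; Gv A] = [Q1; Gv Q1] R
Q1 = Qp[:n]
Q0 = G0 @ Q1  # post-orthogonalized factor

print(np.allclose(G0.T @ G0, np.eye(n) + Gv.T @ Gv))  # the key identity
print(np.allclose(Q0.T @ Q0, np.eye(n)))              # G0 Q1 is orthogonal
print(np.linalg.matrix_rank(G0) == n)                 # G0 is invertible
print(np.allclose(Q0 @ R, G0 @ A))                    # (G0 Q1) R is a QR of A' = G0 A
```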

Post-orthogonalization for Linear Solvers
• We consider QR decomposition for solving a non-singular square system of linear equations. Since $G_0$ is invertible,
$Ax = b \iff A'x = (G_0 A)x = G_0 b$.
• The QR factorization of A' can now be used to find x; since $G_0 Q_1$ is orthogonal,
$(G_0 Q_1) R_1 x = G_0 b \iff R_1 x = (G_0 Q_1)^T (G_0 b)$.
• Finally, x can be found by a triangular solve.
Overhead of post-orthogonalization: the matrix multiplications $G_0 Q_1$ and $G_0 b$.
→ As $G_0$ is sparse, the total overhead for fault tolerance is negligible.
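The solver path end to end, as a minimal sketch under the same assumptions as before (block form of $G_0$, hypothetical choice $G_1 = -\tfrac{1}{2} V V^T$). `G0` is kept dense here for brevity, although a real implementation would exploit its sparsity:

```python
import numpy as np


def back_substitution(R, y):
    """Solve R x = y for an upper-triangular, non-singular R."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x


rng = np.random.default_rng(2)
n, c = 8, 2
V = rng.standard_normal((c, n - c))
G1 = -0.5 * V @ V.T  # assumed choice satisfying G1 + G1^T = -V V^T
Gv = np.hstack([G1, V])
G0 = np.block([[np.eye(c) + G1, V], [V.T, -np.eye(n - c)]])

A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# Coded QR: [A; Gv A] = [Q1; Gv Q1] R1, so A = Q1 R1 with Q1 not orthogonal.
Qp, R1 = np.linalg.qr(np.vstack([A, Gv @ A]))
Q1 = Qp[:n]

# Post-orthogonalization overhead: the products G0 Q1 and G0 b.
Q0 = G0 @ Q1
x = back_substitution(R1, Q0.T @ (G0 @ b))
print(np.allclose(A @ x, b))
```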

Checksum-Generator Matrices for Single-Node Failures
Note:
• Single-node failure is the most common failure scenario in HPC.
• Extending these results to multiple-node failure scenarios would be interesting future work!

Checksum-Generator Matrices for Single-Node Failures
Recap:
• R-factor protection: designing $G_h$ is straightforward, as there is no restriction; we can use an MDS code for optimality.
• Q-factor protection: $G_v = \begin{bmatrix} G_1 & V \end{bmatrix}$ must satisfy the post-orthogonalization condition.
→ Construction of $G_v$ to tolerate single-node failures.
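To make "tolerate single-node failures" concrete, here is a toy sketch under several assumptions. The n rows of $Q_1$ are spread cyclically over P nodes one row at a time, so node p holds rows p, p + P, and so on. The checksum rows $G_v Q_1$ live on separate checksum processors. $G_v$ satisfies the condition via the hypothetical choice $G_1 = -\tfrac{1}{2} V V^T$. Because all other rows survive, a node's lost rows S are recoverable exactly when the columns $G_v[:, S]$ are linearly independent. This is a generic erasure-decoding check, not the construction from this work:

```python
import numpy as np

rng = np.random.default_rng(3)
n, c, P = 8, 2, 4  # each node holds n / P = 2 rows, matching c = 2 checksum rows

V = rng.standard_normal((c, n - c))
G1 = -0.5 * V @ V.T  # hypothetical choice: G1 + G1^T = -V V^T
Gv = np.hstack([G1, V])

A = rng.standard_normal((n, n))
Qp, _ = np.linalg.qr(np.vstack([A, Gv @ A]))
Q1, S_chk = Qp[:n], Qp[n:]  # S_chk equals Gv @ Q1 and is stored off-node


def rows_of(node):
    """Rows of Q1 held by `node` under the assumed one-row cyclic layout."""
    return list(range(node, n, P))


def recover(lost):
    """Rebuild the lost rows of Q1 from the checksum rows and the surviving rows."""
    keep = [i for i in range(n) if i not in lost]
    sub = Gv[:, lost]
    if np.linalg.matrix_rank(sub) < len(lost):
        raise ValueError("checksums cannot distinguish the lost rows")
    rhs = S_chk - Gv[:, keep] @ Q1[keep]  # = Gv[:, lost] @ Q1[lost]
    return np.linalg.lstsq(sub, rhs, rcond=None)[0]


for node in range(P):
    lost = rows_of(node)
    print(node, np.allclose(recover(lost), Q1[lost]))
```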

In-node Checksum Storage

In-node Checksum Storage
Nodes 0 and 1 hold A0 and A1, and the checksum A0 + A1 is stored within these same nodes.
• This new setting could be more appealing in practice, as it does not require additional processors.
→ Can we still have some optimality guarantee like the MDS condition?
→ Challenge 2: what is the minimal number of checksums required under this setting?
