Algorithms & Techniques for Dense Linear Algebra over Small Finite Fields

Algorithms & Techniques for Dense Linear Algebra over Small Finite Fields Martin R. Albrecht (martinralbrecht+summerschool@googlemail.com) POLSYS Team, UPMC, Paris, France ECrypt II PhD Summer School

Outline ◮ $\mathbb{F}_2$: Gray Codes, Multiplication, Elimination ◮ $\mathbb{F}_p$ ◮ $\mathbb{F}_{2^e}$: Precomputation Tables, Karatsuba Multiplication, Performance ◮ $\mathbb{F}_p[x]$

The M4RI Library ◮ available under the GPL Version 2 or later (GPLv2+) ◮ provides basic arithmetic (addition, equality testing, stacking, augmenting, sub-matrices, randomisation, etc.) ◮ asymptotically fast multiplication ◮ asymptotically fast elimination ◮ some multi-core support ◮ Linux, Mac OS X (x86 and PPC), OpenSolaris (Sun Studio Express) and Windows (Cygwin) http://m4ri.sagemath.org

$\mathbb{F}_2$ ◮ field with two elements. ◮ logical bitwise XOR is addition. ◮ logical bitwise AND is multiplication. ◮ 64 (128) basic operations in at most one CPU cycle ◮ . . . arithmetic rather cheap

a  b  |  a ⊕ b  |  a ⊙ b
0  0  |    0    |    0
0  1  |    1    |    0
1  0  |    1    |    0
1  1  |    0    |    1

Memory access is the expensive operation, not arithmetic.
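As a minimal sketch of why arithmetic is cheap, vectors over the field with two elements can be packed into machine words so that one XOR or AND handles all coordinates at once. The function names `add`, `mul` and `dot` are illustrative and not taken from M4RI; Python integers stand in for 64-bit words.

```python
# Vectors over F_2 packed into integers: bit i holds coordinate i.

def add(u: int, v: int) -> int:
    """Coordinate-wise addition over F_2: a single XOR for the whole word."""
    return u ^ v

def mul(u: int, v: int) -> int:
    """Coordinate-wise multiplication over F_2: a single AND for the whole word."""
    return u & v

def dot(u: int, v: int) -> int:
    """Inner product over F_2: parity of the number of common 1 bits."""
    return bin(u & v).count("1") & 1

u = 0b1011
v = 0b0110
print(format(add(u, v), "04b"))  # 1101
print(format(mul(u, v), "04b"))  # 0010
print(dot(u, v))                 # 1
```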

Gray Codes The Gray code [Gra53], named after Frank Gray and also known as reflected binary code, is a numbering system where two consecutive values differ in only one digit.

Gray Code Examples

1 bit:  0, 1
2 bits: 00, 01, 11, 10
3 bits: 000, 001, 011, 010, 110, 111, 101, 100
4 bits: 0000, 0001, 0011, 0010, 0110, 0111, 0101, 0100, 1100, 1101, 1111, 1110, 1010, 1011, 1001, 1000

Each list is the previous one prefixed with 0, followed by the previous one in reverse order prefixed with 1.
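The reflected binary code can also be computed directly from the index. The sketch below uses the standard formula `i ^ (i >> 1)`; the helper names `gray` and `gray_codes` are illustrative.

```python
def gray(i: int) -> int:
    """Return the i-th codeword of the reflected binary Gray code."""
    return i ^ (i >> 1)

def gray_codes(n: int) -> list[str]:
    """Return all 2^n codewords of length n as bit strings, in Gray code order."""
    return [format(gray(i), f"0{n}b") for i in range(1 << n)]

print(gray_codes(2))  # ['00', '01', '11', '10']
print(gray_codes(3))  # ['000', '001', '011', '010', '110', '111', '101', '100']
```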

Applications Gray codes are used in various applications where all vectors over small finite fields need to be enumerated, such as: ◮ matrix multiplication; ◮ fast exhaustive search of Boolean polynomial systems; ◮ cube attacks on Grain-128. Gray codes are a basic part of the cryptographer's toolkit because they reduce the cost of enumerating all vectors over $\mathbb{F}_2$ of length $n$ from $n 2^{n-1}$ to $2^n - 1$.
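The saving can be sketched as follows: walking through the indices in Gray code order, each new linear combination differs from the previous one by a single row, so it costs one addition. The function name `make_table` and the returned addition counter are illustrative assumptions; rows are packed into Python integers.

```python
def make_table(rows: list[int]) -> tuple[list[int], int]:
    """Return (T, additions) where T[x] is the XOR of rows[b] over the set bits b of x.

    The table is filled in Gray code order, so every entry is obtained from
    the previous one with exactly one row addition.
    """
    k = len(rows)
    table = [0] * (1 << k)
    current, prev, additions = 0, 0, 0
    for i in range(1, 1 << k):
        g = i ^ (i >> 1)                       # i-th Gray codeword
        flipped = (g ^ prev).bit_length() - 1  # the single bit that changed
        current ^= rows[flipped]               # one addition over F_2
        additions += 1
        table[g] = current
        prev = g
    return table, additions

def naive_entry(rows: list[int], x: int) -> int:
    """Reference: combine the selected rows one by one."""
    acc = 0
    for b, row in enumerate(rows):
        if (x >> b) & 1:
            acc ^= row
    return acc

rows = [0b0011, 0b0110, 0b1100]
table, additions = make_table(rows)
print(additions)  # 7, that is 2^3 - 1
```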

M4RM [ADKF70] I Consider $C = A \cdot B$ ($A$ is $m \times \ell$ and $B$ is $\ell \times n$). $A$ can be divided into $\ell/k$ vertical "stripes" $A_0 \dots A_{(\ell-1)/k}$ of $k$ columns each. $B$ can be divided into $\ell/k$ horizontal "stripes" $B_0 \dots B_{(\ell-1)/k}$ of $k$ rows each. We have:

$$C = A \cdot B = \sum_{i=0}^{(\ell-1)/k} A_i \cdot B_i.$$

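M4RM [ADKF70] II

$$A = \begin{pmatrix} 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}, \quad
A_0 = \begin{pmatrix} 1 & 1 \\ 0 & 0 \\ 1 & 1 \\ 0 & 1 \end{pmatrix}$$

$$A_1 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \\ 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad
B_0 = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}, \quad
B_1 = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}$$

$$A_0 \cdot B_0 = \begin{pmatrix} 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}, \quad
A_1 \cdot B_1 = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$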
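The stripe identity can be checked numerically on the 4 × 4 example from this slide with $k = 2$. This is only a sketch: matrices are plain lists of 0/1 entries, and the helpers `matmul_f2` and `add_f2` are illustrative names.

```python
def matmul_f2(A, B):
    """Schoolbook matrix product over F_2 on lists of 0/1 rows."""
    return [[sum(a & b for a, b in zip(row, col)) & 1 for col in zip(*B)]
            for row in A]

def add_f2(X, Y):
    """Entry-wise addition over F_2."""
    return [[x ^ y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

A = [[1, 1, 0, 1], [0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 1, 1]]
B = [[1, 0, 1, 1], [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 0, 1]]

k = 2
A0, A1 = [r[:k] for r in A], [r[k:] for r in A]  # vertical stripes of A
B0, B1 = B[:k], B[k:]                            # horizontal stripes of B

C = add_f2(matmul_f2(A0, B0), matmul_f2(A1, B1))
print(C == matmul_f2(A, B))  # True
```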

M4RM: Algorithm $O(n^3 / \log n)$

begin
    C ← create an m × n matrix with all entries 0;
    k ← ⌊log n⌋;
    for 0 ≤ i < (ℓ/k) do
        // create table of 2^k − 1 linear combinations
        T ← MakeTable(B, i × k, 0, k);
        for 0 ≤ j < m do
            // read index for table T
            id ← ReadBits(A, j, i × k, k);
            add row id from T to row j of C;
    return C;

Algorithm 1: M4RM
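A minimal Python sketch of Algorithm 1, assuming rows are packed into integers with bit $c$ holding column $c$. The names `make_table`, `read_bits` and `m4rm` mirror the pseudocode but their signatures are simplified assumptions, not the M4RI API; a last stripe narrower than $k$ is handled by shrinking the table.

```python
def make_table(B, start, k):
    """T[x] = XOR of B[start + b] over the set bits b of x, built in Gray code order."""
    T, cur, prev = [0] * (1 << k), 0, 0
    for i in range(1, 1 << k):
        g = i ^ (i >> 1)
        cur ^= B[start + (g ^ prev).bit_length() - 1]
        T[g], prev = cur, g
    return T

def read_bits(row, start, k):
    """Return the k bits of row starting at column start, as a table index."""
    return (row >> start) & ((1 << k) - 1)

def m4rm(A, B, n, k=None):
    """Multiply A (m rows of l bits) by B (l rows of n bits) over F_2."""
    l = len(B)
    if k is None:
        k = max(1, n.bit_length() - 1)  # floor(log2 n), at least 1
    C = [0] * len(A)
    for start in range(0, l, k):
        width = min(k, l - start)       # the last stripe may be narrower
        T = make_table(B, start, width)
        for j, row in enumerate(A):
            C[j] ^= T[read_bits(row, start, width)]
    return C

def pack(bits):
    """Pack a list of 0/1 entries into an integer, column 0 in the lowest bit."""
    return sum(b << c for c, b in enumerate(bits))

A = [pack(r) for r in ([1, 1, 0, 1], [0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 1, 1])]
B = [pack(r) for r in ([1, 0, 1, 1], [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 0, 1])]
C = m4rm(A, B, n=4, k=2)
```

Each row of `A` is read once per stripe and contributes a single table lookup plus one row addition, which is where the $\log n$ saving comes from.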

Strassen-Winograd [Str69] Multiplication ◮ fastest known practical algorithm ◮ complexity: $O(n^{\log_2 7})$ ◮ linear algebra constant: $\omega = \log_2 7$ ◮ M4RM can be used as base case for small dimensions → optimisation of this base case
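To illustrate the recursion with a base case, here is a sketch of the classical Strassen scheme (seven recursive products), not the Winograd variant named on the slide. It assumes square matrices whose dimension is a power of two, rows packed into integers, and a schoolbook base case standing in for M4RM; all function names are illustrative. Over $\mathbb{F}_2$ subtraction equals addition, so every sign in Strassen's formulas becomes an XOR.

```python
def base_mul(A, B):
    """Base case: schoolbook product on bit-packed rows (stand-in for M4RM)."""
    C = []
    for row in A:
        acc, c = 0, 0
        while row:
            if row & 1:
                acc ^= B[c]
            row >>= 1
            c += 1
        C.append(acc)
    return C

def add(X, Y):
    return [x ^ y for x, y in zip(X, Y)]

def split(M, n):
    """Split an n x n matrix into quadrants (NW, NE, SW, SE)."""
    h = n // 2
    mask = (1 << h) - 1
    top, bottom = M[:h], M[h:]
    return ([r & mask for r in top], [r >> h for r in top],
            [r & mask for r in bottom], [r >> h for r in bottom])

def strassen(A, B, n, cutoff=4):
    """Product of two n x n matrices over F_2, n a power of two."""
    if n <= cutoff:
        return base_mul(A, B)
    h = n // 2
    A11, A12, A21, A22 = split(A, n)
    B11, B12, B21, B22 = split(B, n)
    M1 = strassen(add(A11, A22), add(B11, B22), h, cutoff)
    M2 = strassen(add(A21, A22), B11, h, cutoff)
    M3 = strassen(A11, add(B12, B22), h, cutoff)
    M4 = strassen(A22, add(B21, B11), h, cutoff)
    M5 = strassen(add(A11, A12), B22, h, cutoff)
    M6 = strassen(add(A21, A11), add(B11, B12), h, cutoff)
    M7 = strassen(add(A12, A22), add(B21, B22), h, cutoff)
    C11 = add(add(M1, M4), add(M5, M7))
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(M1, M2), add(M3, M6))
    return ([a | (b << h) for a, b in zip(C11, C12)] +
            [a | (b << h) for a, b in zip(C21, C22)])
```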

Cache Friendly M4RM I

begin
    C ← create an m × n matrix with all entries 0;
    for 0 ≤ i < (ℓ/k) do
        // this is cheap in terms of memory access
        T ← MakeTable(B, i × k, 0, k);
        for 0 ≤ j < m do
            // for each load of row j we take care of only k bits
            id ← ReadBits(A, j, i × k, k);
            add row id from T to row j of C;
    return C;

Cache Friendly M4RM II

begin
    C ← create an m × n matrix with all entries 0;
    for 0 ≤ start < m/b_s do
        for 0 ≤ i < (ℓ/k) do
            // we regenerate T for each block
            T ← MakeTable(B, i × k, 0, k);
            for 0 ≤ s < b_s do
                j ← start × b_s + s;
                id ← ReadBits(A, j, i × k, k);
                add row id from T to row j of C;
    return C;
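The blocked loop order can be sketched in Python as follows. This only mirrors the structure of the pseudocode (row blocks outermost, table rebuilt per block); Python will not show the cache effect itself. Rows are bit-packed integers, and `make_table`, `m4rm_blocked` and `naive_mul` are illustrative names.

```python
def make_table(B, start, k):
    """T[x] = XOR of B[start + b] over the set bits b of x, in Gray code order."""
    T, cur, prev = [0] * (1 << k), 0, 0
    for i in range(1, 1 << k):
        g = i ^ (i >> 1)
        cur ^= B[start + (g ^ prev).bit_length() - 1]
        T[g], prev = cur, g
    return T

def m4rm_blocked(A, B, k, block_size):
    """M4RM with the rows of A and C processed in blocks of block_size rows."""
    C = [0] * len(A)
    for block_start in range(0, len(A), block_size):
        block_end = min(block_start + block_size, len(A))
        for start in range(0, len(B), k):
            width = min(k, len(B) - start)
            T = make_table(B, start, width)  # regenerated for each block
            mask = (1 << width) - 1
            for j in range(block_start, block_end):
                C[j] ^= T[(A[j] >> start) & mask]
    return C

def naive_mul(A, B):
    """Reference product: add row c of B for every set bit c of a row of A."""
    return [
        sum_xor
        for sum_xor in (
            _xor_rows(B, row) for row in A
        )
    ]

def _xor_rows(B, selector):
    acc = 0
    for c, row in enumerate(B):
        if (selector >> c) & 1:
            acc ^= row
    return acc
```

The design trade-off is the one stated in the comment of the pseudocode: the table is recomputed for every block, which is cheap, while the rows of C touched between two table rebuilds stay within one block.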

t > 1 Gray Code Tables I ◮ actual arithmetic is quite cheap compared to memory reads and writes ◮ the cost of memory accesses greatly depends on where in memory data is located ◮ try to fill all of L1 with Gray code tables. ◮ Example: k = 10 and 1 Gray code table → 10 bits at a time. k = 9 and 2 Gray code tables, still the same memory for the tables but deal with 18 bits at once. ◮ The price is one extra row addition, which is cheap if the operands are all in cache.

t > 1 Gray Code Tables II

begin
    C ← create an m × n matrix with all entries 0;
    for 0 ≤ i < (ℓ/(2k)) do
        T_0 ← MakeTable(B, i × 2k, 0, k);
        T_1 ← MakeTable(B, i × 2k + k, 0, k);
        for 0 ≤ j < m do
            id_0 ← ReadBits(A, j, i × 2k, k);
            id_1 ← ReadBits(A, j, i × 2k + k, k);
            add row id_0 from T_0 and row id_1 from T_1 to row j of C;
    return C;
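A sketch of the two-table variant, assuming bit-packed rows and, for simplicity, that $\ell$ is a multiple of $2k$ (the function raises otherwise). Two tables of $2^k$ rows each cover $2k$ columns of A per pass, at the price of one extra row addition; for example, two tables with $k = 9$ hold $2 \cdot 2^9 = 2^{10}$ rows, the same as one table with $k = 10$. Names are illustrative.

```python
def make_table(B, start, k):
    """T[x] = XOR of B[start + b] over the set bits b of x, in Gray code order."""
    T, cur, prev = [0] * (1 << k), 0, 0
    for i in range(1, 1 << k):
        g = i ^ (i >> 1)
        cur ^= B[start + (g ^ prev).bit_length() - 1]
        T[g], prev = cur, g
    return T

def m4rm_two_tables(A, B, k):
    """M4RM using two Gray code tables per pass (2k columns of A at a time)."""
    l = len(B)
    if l % (2 * k) != 0:
        raise ValueError("this sketch assumes l is a multiple of 2k")
    C = [0] * len(A)
    mask = (1 << k) - 1
    for start in range(0, l, 2 * k):
        T0 = make_table(B, start, k)
        T1 = make_table(B, start + k, k)
        for j, row in enumerate(A):
            id0 = (row >> start) & mask
            id1 = (row >> (start + k)) & mask
            C[j] ^= T0[id0] ^ T1[id1]  # one extra addition per row
    return C

def naive_mul(A, B):
    """Reference product: add row c of B for every set bit c of a row of A."""
    C = []
    for selector in A:
        acc = 0
        for c, row in enumerate(B):
            if (selector >> c) & 1:
                acc ^= row
        C.append(acc)
    return C
```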

Performance: Multiplication [Figure: execution time t (1s to 31s) against matrix dimension n (2000 to 26000) for Magma and M4RI.] Figure: 2.66 GHz Intel i7, 4 GB RAM

PLE Decomposition I Definition (PLE) Let $A$ be an $m \times n$ matrix over a field $K$. A PLE decomposition of $A$ is a triple of matrices $P$, $L$ and $E$ such that $P$ is an $m \times m$ permutation matrix, $L$ is a unit lower triangular matrix, $E$ is an $m \times n$ matrix in row-echelon form, and $A = PLE$. PLE decomposition can be done in-place, that is, $L$ and $E$ are stored in $A$ and $P$ is stored as an $m$-vector.

PLE Decomposition II From the PLE decomposition we can ◮ read the rank r , ◮ read the row rank profile (pivots), ◮ compute the null space, ◮ solve y = Ax for x and ◮ compute the (reduced) row echelon form. C.-P. Jeannerod, C. Pernet, and A. Storjohann. Rank-profile revealing Gaussian elimination and the CUP matrix decomposition. arXiv:1112.5717 , 35 pages, 2012.
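As a minimal sketch of what a PLE decomposition looks like over $\mathbb{F}_2$, the following uses plain Gaussian elimination with row swaps. It is not the block recursive or block iterative algorithm of this talk and it is not in-place; rows are bit-packed integers (bit $c$ is column $c$), the permutation is returned as a vector, and all names are illustrative.

```python
def ple(A, n):
    """Naive PLE decomposition of an m x n matrix over F_2.

    Returns (perm, L, E, rank) such that row perm[i] of A equals the XOR of
    the rows E[j] selected by the set bits j of L[i].
    """
    m = len(A)
    E = list(A)
    L = [0] * m                  # strictly lower part, diagonal added at the end
    perm = list(range(m))
    r = 0                        # number of pivots found so far
    for c in range(n):
        pivot = next((i for i in range(r, m) if (E[i] >> c) & 1), None)
        if pivot is None:
            continue             # no pivot in this column
        E[r], E[pivot] = E[pivot], E[r]
        L[r], L[pivot] = L[pivot], L[r]
        perm[r], perm[pivot] = perm[pivot], perm[r]
        for i in range(r + 1, m):
            if (E[i] >> c) & 1:
                E[i] ^= E[r]     # eliminate the entry below the pivot
                L[i] |= 1 << r   # record the multiplier
        r += 1
    L = [row | (1 << i) for i, row in enumerate(L)]  # unit diagonal
    return perm, L, E, r

def reconstruct(perm, L, E):
    """Recompute A from (perm, L, E) to check that A = P L E."""
    A = [0] * len(perm)
    for i, selector in enumerate(L):
        acc = 0
        for j, row in enumerate(E):
            if (selector >> j) & 1:
                acc ^= row
        A[perm[i]] = acc
    return A

A = [0b0110, 0b0110, 0b1011, 0b0000, 0b1101]
perm, L, E, rank = ple(A, n=4)
print(reconstruct(perm, L, E) == A)  # True
```

The rank is the number of non-zero rows of `E`, and the pivot column of each of those rows is its lowest set bit.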

Block Recursive PLE Decomposition $O(n^\omega)$ I to VI [Sequence of six diagram slides; only the update steps on slides III and IV survive as text.]

III: $A_{NE} \leftarrow L_{NW}^{-1} \times A_{NE}$

IV: $A_{SE} \leftarrow A_{SE} + A_{SW} \times A_{NE}$
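The two update steps can be sketched over $\mathbb{F}_2$ as a forward substitution with a unit lower triangular matrix followed by a multiply-and-add. The function names are illustrative, rows are bit-packed integers, and the multiplication is the schoolbook one rather than M4RM or Strassen-Winograd.

```python
def solve_unit_lower(L, X):
    """Return Y with L * Y = X over F_2, i.e. Y = L^{-1} X.

    L is unit lower triangular; bit j of L[i] is the entry in row i, column j.
    """
    Y = list(X)
    for i in range(len(L)):
        below_diagonal = L[i] & ((1 << i) - 1)
        j = 0
        while below_diagonal:
            if below_diagonal & 1:
                Y[i] ^= Y[j]  # Y[j] is already final because j < i
            below_diagonal >>= 1
            j += 1
    return Y

def mul(A, B):
    """Schoolbook product on bit-packed rows."""
    C = []
    for selector in A:
        acc = 0
        for c, row in enumerate(B):
            if (selector >> c) & 1:
                acc ^= row
        C.append(acc)
    return C

def update_south_east(A_SE, A_SW, A_NE):
    """A_SE <- A_SE + A_SW * A_NE over F_2."""
    return [se ^ p for se, p in zip(A_SE, mul(A_SW, A_NE))]

L_NW = [0b001, 0b011, 0b101]
A_NE = [0b10, 0b11, 0b01]
A_NE = solve_unit_lower(L_NW, A_NE)  # A_NE <- L_NW^{-1} * A_NE
```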

Block Iterative PLE Decomposition I We need an efficient base case for PLE decomposition ◮ block recursive PLE decomposition gives rise to a block iterative PLE decomposition ◮ choose blocks of size $k = \log n$ and use M4RM for the "update" multiplications ◮ this gives a complexity of $O(n^3 / \log n)$

Block Iterative PLE Decomposition II to XI [Sequence of ten diagram slides; only the update steps on slides IV, VI, IX and X survive as text.]

IV: $A_{NE} \leftarrow L^{-1} \times A_{NE}$

VI: $A_{SE} \leftarrow A_{SE} + A_{SW} \times A_{NE}$

IX: $A_{NE} \leftarrow L^{-1} \times A_{NE}$

X: $A_{SE} \leftarrow A_{SE} + A_{SW} \times A_{NE}$

Performance: Reduced Row Echelon Form [Figure: execution time t (1s to 31s) against matrix dimension n (2000 to 26000) for Magma and M4RI, annotated with $c_{\mathrm{MAGMA}} \approx 6.8 \cdot 10^{-12}$ and $c_{\mathrm{M4RI}} \approx 4.3 \cdot 10^{-12}$.] Figure: 2.66 GHz Intel i7, 4 GB RAM

Performance: Row Echelon Form Using one core on sage.math, we can compute the echelon form of a 500,000 × 500,000 dense random matrix over $\mathbb{F}_2$ in 9711 seconds ≈ 2.7 hours ($c \approx 10^{-12}$). Using four cores, we can compute the echelon form of a random dense 500,000 × 500,000 matrix in 3806 seconds ≈ 1.06 hours.

Caveat: Sensitivity to Sparsity [Figure: execution time t (1 to 6) against non-zero elements per row (2 to 18) for Magma, M4RI and PLE.] Figure: Gaussian elimination of 10,000 × 10,000 matrices on Intel 2.33 GHz Xeon E5345 comparing Magma 2.17-12 and M4RI 20111004.
