The M4RI & M4RIE libraries for linear algebra over F 2 and small - PowerPoint PPT Presentation

The M4RI & M4RIE libraries for linear algebra over F 2 and small extensions Martin R. Albrecht Nancy, March 30, 2011

Outline M4RI Introduction Multiplication Elimination M4RIE Introduction Travolta Tables Karatsuba Multiplication Results Final

The M4RI Library ◮ available under the GPL Version 2 or later (GPLv2+) ◮ provides basic arithmetic (addition, equality testing, stacking, augmenting, sub-matrices, randomisation, etc.) ◮ implements asymptotically fast multiplication [ABH10] ◮ implements asymptotically fast decomposition [AP10] ◮ implements some multi-core support ◮ Linux, Mac OS X (x86 and PPC), OpenSolaris (Sun Studio Express) and Windows (Cygwin) http://m4ri.sagemath.org

F 2 ◮ field with two elements. ◮ logical bitwise XOR is ⊕ ⊙ addition. 0 0 0 0 ◮ logical bitwise AND is 0 1 1 0 multiplication. 1 0 1 0 ◮ 64 (128) basic operations in 1 1 0 1 at most one CPU cycle ◮ . . . arithmetic rather cheap Memory access is the expensive operation, not arithmetic.

M4RM [ADKF70] I Consider C = A · B ( A is m × l and B is l × n ). A can be divided into l / k vertical “stripes” A 0 . . . A ( l − 1) / k of k columns each. B can be divided into l / k horizontal “stripes” B 0 . . . B ( l − 1) / k of k rows each. We have: ( l − 1) / k � C = A · B = A i · B i . 0

M4RM [ADKF70] II       1 1 0 1 1 0 1 1 1 1 0 0 0 0 0 1 1 0 0 0       A =  , B =  , A 0 =       1 1 1 1 0 1 1 0 1 1     0 1 1 1 0 1 0 1 0 1   0 1 � 1 � 0 � � 0 0 0 1 1 1 1 0   A 1 =  , B 0 = , B 1 =   0 1 1 0 0 1 0 1 1 1  1 1     0 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0     A 0 · B 0 =  , A 1 · B 1 =     1 1 0 1 0 0 1 1    0 1 1 0 0 0 1 1

� n 3 / log n � M4RM: Algorithm O begin 1 C ← − create an m × n matrix with all entries 0; 2 k ← − ⌊ log n ⌋ ; 3 for 0 ≤ i < ( ℓ/ k ) do 4 // create table of 2 k − 1 linear combinations T ← MakeTable ( B , i × k , 0 , k ); 5 for 0 ≤ j < m do 6 // read index for table T id ← − ReadBits ( A , j , i × k , k ); 7 add row id from T to row j of C ; 8 return C ; 9 end 10 Algorithm 1 : M4RM

Strassen-Winograd [Str69] Multiplication ◮ fastest known pratical algorithm ◮ complexity: O � n log 2 7 � ◮ linear algebra constant: ω = log 2 7 ◮ M4RM can be used as base case for small dimensions → optimisation of this base case

Cache [Bhu99] Memory Regs L1 L2 Ram Swap 10 2 10 7 Speed (ns) 0.5 2 6 2 · 10 7 Cost (cycles) 1 4 14 200 Size 4 · 64-bit 64k 1-4M 1G 100G

Cache Friendly M4RM I begin 1 C ← − create an m × n matrix with all entries 0; 2 for 0 ≤ i < ( ℓ/ k ) do 3 // this is cheap in terms of memory access T ← MakeTable ( B , i × k , 0 , k ); 4 for 0 ≤ j < m do 5 // we load each row j to take care of k bits id ← − ReadBits ( A , j , i × k , k ); 6 add row id from T to row j of C ; 7 return C ; 8 end 9

Cache Friendly M4RM II begin 1 C ← − create an m × n matrix with all entries 0; 2 for 0 ≤ start < m / b s do 3 for 0 ≤ i < ( ℓ/ k ) do 4 // we regenerate T for each block T ← MakeTable ( B , i × k , 0 , k ); 5 for 0 ≤ s < b s do 6 j ← − start × b s + s ; 7 id ← − ReadBits ( A , j , i × k , k ); 8 add row id from T to row j of C ; 9 return C ; 10 end 11

Cache Friendly M4RM III Matrix Dimensions Plain Cache Friendly 10 , 000 × 10 , 000 4.141 2.866 16 , 384 × 16 , 384 16.434 12.214 20 , 000 × 20 , 000 29.520 20.497 32 , 000 × 32 , 000 86.153 82.446 Table: Strassen-Winograd with different base cases on 64-bit Linux, 2.33Ghz Core 2 Duo

t > 1 Gray Code Tables I ◮ actual arithmetic is quite cheap compared to memory reads and writes ◮ the cost of memory accesses greatly depends on where in memory data is located ◮ try to fill all of L1 with Gray code tables. ◮ Example: k = 10 and 1 Gray code table → 10 bits at a time. k = 9 and 2 Gray code tables, still the same memory for the tables but deal with 18 bits at once. ◮ The price is one extra row addition, which is cheap if the operands are all in cache.

t > 1 Gray Code Tables II begin 1 C ← − create an m × n matrix with all entries 0; 2 for 0 ≤ i < ( ℓ/ (2 k )) do 3 T 0 ← MakeTable ( B , i × 2 k , 0 , k ); 4 T 1 ← MakeTable ( B , i × 2 k + k , 0 , k ); 5 for 0 ≤ j < m do 6 id 0 ← − ReadBits ( A , j , i × 2 k , k ); 7 id 1 ← − ReadBits ( A , j , i × 2 k + k , k ); 8 add row id 0 from T 0 and row id 1 from T 1 to row j of C ; 9 return C ; 10 end 11

t > 1 Gray Code Tables III Matrix Dimensions t = 1 t = 2 t = 8 10 , 000 × 10 , 000 4.141 1.982 1.599 16 , 384 × 16 , 384 16.434 7.258 6.034 20 , 000 × 20 , 000 29.520 14.655 11.655 32 , 000 × 32 , 000 86.153 49.768 44.999 Table: Strassen-Winograd with different base cases on 64-bit Linux, 2.33Ghz Core 2 Duo

Results: Multiplication Multiplication 35 0.6 Magma Sage 30 0.4 Magma/Sage 25 0.2 execution time t log 2 relation 20 0.0 15 0.2 10 0.4 5 0.6 0 0.8 1000 5000 9000 13000 17000 21000 25000 29000 Figure: 2.66 Ghz Intel i7, 4GB RAM

Work-in-Progress: Small Matrices M4RI is efficient for large matrices, but not so for small matrices. But there is work under way by Carlo Wood to fix this. Emmanuel M4RI transpose 4.9064 µ s 5.3642 µ s copy 0.2019 µ s 0.2674 µ s add 0.2533 µ s 0.7503 µ s mul 0.2535 µ s 0.4472 µ s Table: 64 × 64 matrices ( matops.c ) Note One performance bottleneck is that our matrix structure is much more complicated than Emmanuel’s.

Results: Multiplication Revisited Multiplication 35 M4RI Magma 30 'optimal' 7 n M Execution time in seconds 25 7 n − 1 (7 M +15 A ) 20 15 10 5 0 0 5000 10000 15000 20000 25000 30000 35000 Figure: 2.66 Ghz Intel i7, 4GB RAM

PLE Decomposition I Definition (PLE) Let A be a m × n matrix over a field K . A PLE decomposition of A is a triple of matrices P , L and E such that P is a m × m permutation matrix, L is a unit lower triangular matrix, and E is am × n matrix in row-echelon form, and A = PLE . PLE decomposition can be in-place, that is L and E are stored in A and P is stored as an m -vector.

PLE Decomposition II From the PLE decomposition we can ◮ read the rank r , ◮ read the row rank profile (pivots), ◮ compute the null space, ◮ solve y = Ax for x and ◮ compute the (reduced) row echelon form.

Block Recursive PLE Decomposition O ( n ω ) Write � A NW � A NE � � A = A W A E = A SW A SE Main steps: 1. Call PLE on A W 2. Apply row permutation to A E 3. L NW ← the lower left triangular matrix in A NW 4. A NE ← L − 1 NW × A NE 5. A SE ← A SE + A SW × A NE 6. Call PLE on A SE 7. Apply row permutation to A SW 8. Compress L

Visualisation

Block Iterative PLE Decomposition I We need an efficient base case for PLE Decomposition ◮ block recursive PLE decomposition gives rise to a block iterative PLE decomposition ◮ choose blocks of size k = log n and use M4RM for the “update” multiplications n 3 / log n ◮ this gives a complexity O � � ◮ this is an alternative way of looking at the M4RI algorithm or its PLE decomposition equivalent (“MMPF”) ◮ M4RI is more cache friendly than straight block iterative PLE decomposition, so we adapt it PLE using M4RI idea

Visualisation

Results: Reduced Row Echelon Form Elimination 30 Magma Sage 25 Magma/Sage 2.0 20 execution time t log 2 relation 15 1.5 10 1.0 5 0 1000 5000 9000 13000 17000 21000 25000 29000 Figure: 2.66 Ghz Intel i7, 4GB RAM

Results: Row Echelon Form Using one core we can compute the echelon form of a 500 , 000 × 500 , 000 dense random matrix over F 2 in 9711 . 42 seconds = 2 . 7 hours . Using four cores decomposition we can compute the echelon form of a random dense 500 , 000 × 500 , 000 matrix in 3806 . 28 seconds = 1 . 05 hours.

Work-in-Progress: Sensitivity to Sparsity 14 M4RI M+P 0.15 12 PLS Magma 10 execution time t 8 6 4 2 0 0 10 20 30 40 50 nonzero elements per row on average

The M4RI & M4RIE libraries for linear algebra over F 2 and small - PowerPoint PPT Presentation

The M4RI & M4RIE libraries for linear algebra over F 2 and small extensions Martin R. Albrecht Nancy, March 30, 2011 Outline M4RI Introduction Multiplication Elimination M4RIE Introduction Travolta Tables Karatsuba Multiplication

Matrices Basic Linear Algebra Overview Lecture will cover why matrices and linear algebra

Chapter 1 What is Linear Algebra? Chapter 1 What is Linear Algebra? The study of linear

Graphics 2014 Linear Algebra II Linear Maps & Matrices Linear Maps & Matrices CORE

Linear Algebra Linear algebra has become as basic and as applicable as calculus, and

Lecture 14: Dense Linear Algebra David Bindel 18 Oct 2010 Where we are This week: dense

PV Math Department MCL Vision Credit Options Credit General General/Post- College Honors

MATRICES AND LINEAR ALGEBRA Linear Algebra Matrix manipulation is the original essence of

Linear algebra explained in four pages Excerpt from the N O BULLSHIT GUIDE TO LINEAR ALGEBRA by

Expressive Linear Algebra in Haskell Henning Thielemann 2019-08-21 Expressive Linear Algebra in

CS 7616 Pattern Recognition Linear, Linear, Linear Aaron Bobick School of Interactive

Libraries Jonathan Platt Head of Libraries and Heritage 22 nd July 2014 Libraries 1.

Libraries In C++ its possible to create static libraries and shared libraries Static

ManyCore ManyCore Computing: ManyCore ManyCore Computing: Computing: Computing: The Impact on

Chapter 4: Vectors, Matrices, and Linear Algebra Scott Owen & Greg Corrado Linear Algebra is

Basic Concepts in Linear Algebra Department of Mathematics Boise State University September 14,

Essence of Linear Algebra Linear combinations and Span Some cool intuitions The box game:

CS481: Bioinformatics Algorithms Can Alkan EA224 calkan@cs.bilkent.edu.tr

Computing and Processing Correspondences with Functional Maps SIGGRAPH 2017 course Maks

Probabilistic Dimensionality Reduction Neil D. Lawrence University of Sheffield Facebook, London

(Indicative) outline Introduction Multimedia Indexing and Retrieval Descriptors Georges

On APN functions EA-equivalent to permutations Valeriya Idrisova Sobolev Institute of

SUBTRACTION-FREE COMPLEXITY, CLUSTER TRANSFORMATIONS, AND SPANNING TREES SERGEY FOMIN, DIMA

Welcome Today: Basic overview of the course and objectives CS1007: Object Oriented

Welcome Today: Basic overview of the course and objectives Goal: Thing are much

The M4RI & M4RIE libraries for linear algebra over F 2 and small - PowerPoint PPT Presentation

The M4RI & M4RIE libraries for linear algebra over F 2 and small extensions Martin R. Albrecht Nancy, March 30, 2011 Outline M4RI Introduction Multiplication Elimination M4RIE Introduction Travolta Tables Karatsuba Multiplication

Matrices Basic Linear Algebra Overview Lecture will cover why matrices and linear algebra

Chapter 1 What is Linear Algebra? Chapter 1 What is Linear Algebra? The study of linear

Graphics 2014 Linear Algebra II Linear Maps &amp; Matrices Linear Maps &amp; Matrices CORE

Linear Algebra Linear algebra has become as basic and as applicable as calculus, and

Lecture 14: Dense Linear Algebra David Bindel 18 Oct 2010 Where we are This week: dense

PV Math Department MCL Vision Credit Options Credit General General/Post- College Honors

MATRICES AND LINEAR ALGEBRA Linear Algebra Matrix manipulation is the original essence of

Linear algebra explained in four pages Excerpt from the N O BULLSHIT GUIDE TO LINEAR ALGEBRA by

Expressive Linear Algebra in Haskell Henning Thielemann 2019-08-21 Expressive Linear Algebra in

CS 7616 Pattern Recognition Linear, Linear, Linear Aaron Bobick School of Interactive

Libraries Jonathan Platt Head of Libraries and Heritage 22 nd July 2014 Libraries 1.

Libraries In C++ its possible to create static libraries and shared libraries Static

ManyCore ManyCore Computing: ManyCore ManyCore Computing: Computing: Computing: The Impact on

Chapter 4: Vectors, Matrices, and Linear Algebra Scott Owen &amp; Greg Corrado Linear Algebra is

Basic Concepts in Linear Algebra Department of Mathematics Boise State University September 14,

Essence of Linear Algebra Linear combinations and Span Some cool intuitions The box game:

CS481: Bioinformatics Algorithms Can Alkan EA224 calkan@cs.bilkent.edu.tr

Computing and Processing Correspondences with Functional Maps SIGGRAPH 2017 course Maks

Probabilistic Dimensionality Reduction Neil D. Lawrence University of Sheffield Facebook, London

(Indicative) outline Introduction Multimedia Indexing and Retrieval Descriptors Georges

On APN functions EA-equivalent to permutations Valeriya Idrisova Sobolev Institute of

SUBTRACTION-FREE COMPLEXITY, CLUSTER TRANSFORMATIONS, AND SPANNING TREES SERGEY FOMIN, DIMA

Welcome Today: Basic overview of the course and objectives CS1007: Object Oriented

Welcome Today: Basic overview of the course and objectives Goal: Thing are much

Graphics 2014 Linear Algebra II Linear Maps & Matrices Linear Maps & Matrices CORE

Chapter 4: Vectors, Matrices, and Linear Algebra Scott Owen & Greg Corrado Linear Algebra is