Matrix multiplication over word-size modular rings using Bini’s approximate formula
Brice Boyer, Jean-Guillaume Dumas
JNCF, November

Motivations/Goals
– Matrix multiplication over word-size Z/pZ is a critical building block in exact linear algebra (matrix multiplication over Z, over GF(q), Chinese remaindering, …).
– Goal: perform faster matrix multiplication over Z/pZ by using fewer products.

Outline
– Introduction
– Approximate formula to Exact formula
– Implementation and Timings

Bini’s approximate formula

Facts
– Multiplication of 3 × 2 and 2 × 2 matrices (noted (3, 2, 2)) using 10 products.
– Computes $C_\varepsilon = A \times B + \varepsilon D(\varepsilon)$.
– Also (by duality): (2, 3, 2) and (2, 2, 3) multiplications.
– Complexity of $n^\omega$ with $\omega \approx 2.780$ for (12, 12, 12) (compared to Strassen’s $\omega \approx 2.807$).
– One call to Bini’s algorithm saves roughly 4.5% of the operations (vs. Strassen’s) on 3000 × 3000 matrices.
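The two exponents quoted above can be checked with a few lines of arithmetic. This is only a sketch, and it assumes the usual construction in which the (12, 12, 12) scheme combines the (3, 2, 2), (2, 3, 2) and (2, 2, 3) schemes, giving 10 × 10 × 10 = 1000 products; the variable names are illustrative.

```python
import math

# Assumption: (12, 12, 12) is built from (3,2,2), (2,3,2) and (2,2,3),
# each using 10 products, hence 10**3 products for a 12 x 12 product.
products_12 = 10 ** 3
omega_bini = math.log(products_12) / math.log(12)

# Strassen: 7 products for a 2 x 2 product.
omega_strassen = math.log(7) / math.log(2)

print(f"Bini (12,12,12): omega = {omega_bini:.3f}")
print(f"Strassen:        omega = {omega_strassen:.3f}")
```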

Bini’s approximate formula

Algorithm

S1 ← A11 + A22
T1 ← B22 + ε·B11
T2 ← B21 + B22
S3 ← A32 + ε·A31
T3 ← B11 + ε·B21
S4 ← A22 + ε·A12
T4 ← B21 − ε·B11
S5 ← A11 + ε·A12
T5 ← B22 + ε·B12
S6 ← A21 + A32
T6 ← B11 + ε·B22
T7 ← B11 + B12
S9 ← A21 + ε·A31
T9 ← B12 − ε·B22

P0 ← A11 × B22
P1 ← S1 × T1
P2 ← A22 × T2
P3 ← S3 × T3
P4 ← S4 × T4
P5 ← S5 × T5
P6 ← S6 × T6
P7 ← A21 × T7
P8 ← A32 × B11
P9 ← S9 × T9

C11 ← (P1 − P2 + P4 − P0)/ε
C12 ← (P5 − P0)/ε
C21 ← P4 − P3 + P6
C22 ← P1 − P5 + P9
C31 ← (P3 − P8)/ε
C32 ← (P6 − P7 + P9 − P8)/ε
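As a sanity check, the ten products and six output formulas above can be transcribed directly for scalar entries. The sketch below uses exact rational arithmetic so that the error term is visible; the function names `bini_322` and `classical_322` are illustrative, not taken from the authors' code.

```python
from fractions import Fraction

def bini_322(A, B, eps):
    """Approximate product of a 3x2 matrix A by a 2x2 matrix B (10 products)."""
    (a11, a12), (a21, a22), (a31, a32) = A
    (b11, b12), (b21, b22) = B
    p0 = a11 * b22
    p1 = (a11 + a22) * (b22 + eps * b11)        # S1 * T1
    p2 = a22 * (b21 + b22)                      # A22 * T2
    p3 = (a32 + eps * a31) * (b11 + eps * b21)  # S3 * T3
    p4 = (a22 + eps * a12) * (b21 - eps * b11)  # S4 * T4
    p5 = (a11 + eps * a12) * (b22 + eps * b12)  # S5 * T5
    p6 = (a21 + a32) * (b11 + eps * b22)        # S6 * T6
    p7 = a21 * (b11 + b12)                      # A21 * T7
    p8 = a32 * b11
    p9 = (a21 + eps * a31) * (b12 - eps * b22)  # S9 * T9
    return [
        [(p1 - p2 + p4 - p0) / eps, (p5 - p0) / eps],
        [p4 - p3 + p6, p1 - p5 + p9],
        [(p3 - p8) / eps, (p6 - p7 + p9 - p8) / eps],
    ]

def classical_322(A, B):
    """Reference product with the usual 12 multiplications."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(3)]

A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
eps = Fraction(1, 1000)
approx = bini_322(A, B, eps)
exact = classical_322(A, B)
# Each entry differs from the exact product by a multiple of eps.
errors = [[approx[i][j] - exact[i][j] for j in range(2)] for i in range(3)]
print(errors)
```

Expanding the products by hand shows, for instance, that the C12 entry is off by exactly ε·A12·B12, which is the kind of term collected in εD(ε).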

Bini’s approximate formula

Dependencies
[Figure: dependency graph of the algorithm]
– Symmetries!

Bini’s approximate formula

Scheduling of the Algorithm
– Only 2 temporaries! (X and Y)
– Easy to make it in place while overwriting the left or right operand

 #   operation                var
 1   C11 := A11 * B22         P0
 2   X   := A11 + ε·A12       S5
 3   Y   := ε·B12 + B22       T5
 4   C22 := X * Y             P5
 5   C12 := (C22 - C11)/ε     C12
 6   Y   := B21 + B22         T2
 7   C31 := A22 * Y           P2
 8   C11 := C11 + C31         C11
 9   X   := A11 + A22         S1
10   Y   := ε·B11 + B22       T1
11   C21 := X * Y             P1
12   C22 := C21 - C22         C22
13   C11 := C21 - C11         C11
14   X   := ε·A12 + A22       S4
15   Y   := B21 - ε·B11       T4
16   C21 := X * Y             P4
17   C11 := (C21 + C11)/ε     C11
18   X   := A21 + ε·A31       S9
19   Y   := B12 - ε·B22       T9
20   C32 := X * Y             P9
21   C22 := C22 + C32         C22
22   X   := A21 + A32         S6
23   Y   := B11 + ε·B22       T6
24   C31 := X * Y             P6
25   C21 := C21 + C31         C21
26   C32 := C32 + C31         C32
27   Y   := B11 + B12         T7
28   C31 := A21 * Y           P7
29   C32 := C32 - C31         C32
30   X   := ε·A31 + A32       S3
31   Y   := B11 + ε·B21       T3
32   C31 := X * Y             P3
33   C21 := C21 - C31         C21
34   Y   := A32 * B11         P8
35   C31 := (C31 - Y)/ε       C31
36   C32 := (C32 - Y)/ε       C32

Brice Boyer, Jean-Guillaume Dumas, Clément Pernet, and Wei Zhou. Memory efficient scheduling of Strassen-Winograd’s matrix multiplication algorithm. In Proceedings of the 2009 Internat. Symp. Symbolic Algebraic Comput., ISSAC ’09, New York, NY, USA, 2009. ACM.
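The 36-step schedule can be replayed literally to see that two temporaries suffice once the six output blocks are used as working storage. The sketch below does this for scalar entries, with the step order as read off the slide's table; the function name `bini_schedule` and the use of `Fraction` are choices made for this illustration only.

```python
from fractions import Fraction

def bini_schedule(A, B, e):
    """Replay the 36-step schedule; only X and Y are extra temporaries."""
    (A11, A12), (A21, A22), (A31, A32) = A
    (B11, B12), (B21, B22) = B
    C11 = A11 * B22          # 1  P0
    X = A11 + e * A12        # 2  S5
    Y = e * B12 + B22        # 3  T5
    C22 = X * Y              # 4  P5
    C12 = (C22 - C11) / e    # 5  C12 is final
    Y = B21 + B22            # 6  T2
    C31 = A22 * Y            # 7  P2
    C11 = C11 + C31          # 8
    X = A11 + A22            # 9  S1
    Y = e * B11 + B22        # 10 T1
    C21 = X * Y              # 11 P1
    C22 = C21 - C22          # 12
    C11 = C21 - C11          # 13
    X = e * A12 + A22        # 14 S4
    Y = B21 - e * B11        # 15 T4
    C21 = X * Y              # 16 P4
    C11 = (C21 + C11) / e    # 17 C11 is final
    X = A21 + e * A31        # 18 S9
    Y = B12 - e * B22        # 19 T9
    C32 = X * Y              # 20 P9
    C22 = C22 + C32          # 21 C22 is final
    X = A21 + A32            # 22 S6
    Y = B11 + e * B22        # 23 T6
    C31 = X * Y              # 24 P6
    C21 = C21 + C31          # 25
    C32 = C32 + C31          # 26
    Y = B11 + B12            # 27 T7
    C31 = A21 * Y            # 28 P7
    C32 = C32 - C31          # 29
    X = e * A31 + A32        # 30 S3
    Y = B11 + e * B21        # 31 T3
    C31 = X * Y              # 32 P3
    C21 = C21 - C31          # 33 C21 is final
    Y = A32 * B11            # 34 P8
    C31 = (C31 - Y) / e      # 35 C31 is final
    C32 = (C32 - Y) / e      # 36 C32 is final
    return [[C11, C12], [C21, C22], [C31, C32]]

A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
result = bini_schedule(A, B, Fraction(1, 1000))
print([[float(c) for c in row] for row in result])
```

With matrix blocks instead of scalars, X and Y would be block-sized buffers; the scalar version only illustrates the ordering and the reuse of the output entries.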

Getting an exact algorithm
– Let $d = \deg_\varepsilon(\varepsilon D(\varepsilon))$.
– Find $d + 1$ scalars $\alpha_i$ and $d + 1$ pairwise distinct scalars $\varepsilon_i$.
– Make sure $\sum_{i=1}^{d+1} \alpha_i = 1$ and, for $j = 1, \dots, d$, that $\sum_{i=1}^{d+1} \alpha_i \varepsilon_i^j = 0$.
– Then $\sum_{i=1}^{d+1} \alpha_i C_{\varepsilon_i} = A \times B$.

D. Bini. Relations between exact and approximate bilinear algorithms. Applications. Calcolo, 17:87–97, 1980.
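One way to obtain scalars satisfying these conditions is to take the Lagrange interpolation weights evaluated at 0, since they reproduce the value at 0 of any polynomial of degree at most d. The sketch below illustrates this on a toy scalar "approximate result" of degree d = 2; the helper name `combination_weights` and the toy coefficients are assumptions made purely for illustration.

```python
from fractions import Fraction

def combination_weights(eps_points):
    """Weights alpha_i with sum(alpha_i) = 1 and sum(alpha_i * eps_i**j) = 0
    for j = 1..d, where d + 1 = len(eps_points) and the points are distinct."""
    weights = []
    for i, e_i in enumerate(eps_points):
        w = Fraction(1)
        for k, e_k in enumerate(eps_points):
            if k != i:
                w *= Fraction(e_k, e_k - e_i)  # Lagrange basis value at 0
        weights.append(w)
    return weights

# Toy stand-in for C_eps = A*B + eps*D(eps), with deg(eps*D(eps)) = 2.
exact_value = 42
def approximate(eps):
    return exact_value + 5 * eps - 7 * eps ** 2

eps_points = [1, 2, 3]
alphas = combination_weights(eps_points)
recovered = sum(a * approximate(e) for a, e in zip(alphas, eps_points))
print(alphas, recovered)
```

The same weights apply entrywise to matrices, at the price of d + 1 calls to the approximate algorithm.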

Using Only One Call

Ideas
– Only “one recursive” call.
– For a double: $\varepsilon = 2^{-27}$, so $\varepsilon^2 \approx 0$.
– Modulo $p$: $\varepsilon = p = 0$, so $\varepsilon^2 = 0$.
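One possible reading of the modular idea is to run the approximate formula once over the integers with ε = p: the divisions by ε are then exact, and every error term is a multiple of p, so it disappears when the result is reduced modulo p. The sketch below follows that reading for scalar entries; it is an interpretation of the slide, not the authors' implementation, and the names `bini_322_mod_p` and `classical_322_mod_p` are illustrative.

```python
def bini_322_mod_p(A, B, p):
    """One call to the (3,2,2) approximate formula with eps = p (assumption),
    computed over the integers, then reduced modulo p."""
    eps = p
    (a11, a12), (a21, a22), (a31, a32) = A
    (b11, b12), (b21, b22) = B
    q0 = a11 * b22
    q1 = (a11 + a22) * (b22 + eps * b11)
    q2 = a22 * (b21 + b22)
    q3 = (a32 + eps * a31) * (b11 + eps * b21)
    q4 = (a22 + eps * a12) * (b21 - eps * b11)
    q5 = (a11 + eps * a12) * (b22 + eps * b12)
    q6 = (a21 + a32) * (b11 + eps * b22)
    q7 = a21 * (b11 + b12)
    q8 = a32 * b11
    q9 = (a21 + eps * a31) * (b12 - eps * b22)

    def exact_div(x):
        # The numerators are integer multiples of eps, so no remainder.
        quotient, remainder = divmod(x, eps)
        assert remainder == 0
        return quotient

    C = [
        [exact_div(q1 - q2 + q4 - q0), exact_div(q5 - q0)],
        [q4 - q3 + q6, q1 - q5 + q9],
        [exact_div(q3 - q8), exact_div(q6 - q7 + q9 - q8)],
    ]
    return [[c % p for c in row] for row in C]

def classical_322_mod_p(A, B, p):
    """Reference product modulo p."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % p for j in range(2)]
            for i in range(3)]

p = 101
A = [[12, 100], [55, 7], [99, 3]]
B = [[64, 2], [17, 88]]
print(bini_322_mod_p(A, B, p) == classical_322_mod_p(A, B, p))
```

Python integers are unbounded, so overflow is not an issue here; a word-size implementation would have to bound the intermediate values, which this sketch does not address.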
