
Matrix multiplication over word-size modular rings using Bini's approximate formula



  1. Matrix multiplication over word-size modular rings using Bini's approximate formula. Brice Boyer, Jean-Guillaume Dumas. JNCF, November.

  2. Motivations/Goals
     – Matrix multiplication over word-size Z/pZ is a critical building block in exact linear algebra (matrix multiplication over Z, over GF(q), Chinese remaindering, …).
     – Perform faster matrix multiplication over Z/pZ (using fewer products).


  4. Outline
     – Introduction
     – Approximate formula to exact formula
     – Implementation and timings


  8. Bini's approximate formula: Facts
     – Multiplication of 3 × 2 by 2 × 2 matrices (denoted (3, 2, 2)) using 10 products.
     – Also (by duality): (2, 3, 2) and (2, 2, 3) multiplications.
     – Computes C_ε = A × B + εD(ε).
     – Complexity of n^ω with ω ≈ 2.780 for (12, 12, 12) (compared to Strassen's ω ≈ 2.807).
     – One call to Bini's algorithm saves roughly 4.5% of the operations (vs. Strassen's) on 3 000 × 3 000 matrices.
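The two exponents quoted on this slide can be reproduced with a short calculation. The sketch below assumes that the (12, 12, 12) scheme is obtained by composing the three dual schemes (3, 2, 2), (2, 3, 2) and (2, 2, 3), which gives 10³ products; the slide itself does not spell this out, and the helper name `omega` is only illustrative.

```python
import math

def omega(m, k, n, products):
    """Exponent reached by recursing on an (m, k, n) scheme that uses
    `products` multiplications: omega = 3 * log(products) / log(m * k * n)."""
    return 3 * math.log(products) / math.log(m * k * n)

# Assumption: (12, 12, 12) = (3, 2, 2) x (2, 3, 2) x (2, 2, 3), hence 10**3 products.
bini_cube = omega(12, 12, 12, 10 ** 3)
# Strassen: (2, 2, 2) with 7 products.
strassen = omega(2, 2, 2, 7)

print(f"Bini (12,12,12): {bini_cube:.3f}")
print(f"Strassen:        {strassen:.3f}")
```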


  13. Bini's approximate formula: Algorithm (A is 3 × 2, B is 2 × 2, C is 3 × 2)
      Pre-additions:
        S1 ← A11 + A22        T1 ← B22 + ε·B11
                              T2 ← B21 + B22
        S3 ← A32 + ε·A31      T3 ← B11 + ε·B21
        S4 ← A22 + ε·A12      T4 ← B21 − ε·B11
        S5 ← A11 + ε·A12      T5 ← B22 + ε·B12
        S6 ← A21 + A32        T6 ← B11 + ε·B22
                              T7 ← B11 + B12
        S9 ← A21 + ε·A31      T9 ← B12 − ε·B22
      Products:
        P0 ← A11 × B22        P5 ← S5 × T5
        P1 ← S1 × T1          P6 ← S6 × T6
        P2 ← A22 × T2         P7 ← A21 × T7
        P3 ← S3 × T3          P8 ← A32 × B11
        P4 ← S4 × T4          P9 ← S9 × T9
      Results:
        C11 ← (P1 − P2 + P4 − P0)/ε
        C12 ← (P5 − P0)/ε
        C21 ← P4 − P3 + P6
        C22 ← P1 − P5 + P9
        C31 ← (P3 − P8)/ε
        C32 ← (P6 − P7 + P9 − P8)/ε
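To see the approximation C_ε = A × B + εD(ε) at work, the ten products can be evaluated on scalar blocks with exact rationals. This is a minimal sketch, not the authors' implementation: the function names `bini_322`, `classical` and `max_error` and the sample matrices are chosen here for illustration only.

```python
from fractions import Fraction

def bini_322(A, B, eps):
    """Approximate 3x2 by 2x2 product using the 10 products of the slide.
    A and B are nested lists of scalars; returns the 3x2 matrix C_eps."""
    (a11, a12), (a21, a22), (a31, a32) = A
    (b11, b12), (b21, b22) = B
    p0 = a11 * b22
    p1 = (a11 + a22) * (b22 + eps * b11)
    p2 = a22 * (b21 + b22)
    p3 = (a32 + eps * a31) * (b11 + eps * b21)
    p4 = (a22 + eps * a12) * (b21 - eps * b11)
    p5 = (a11 + eps * a12) * (b22 + eps * b12)
    p6 = (a21 + a32) * (b11 + eps * b22)
    p7 = a21 * (b11 + b12)
    p8 = a32 * b11
    p9 = (a21 + eps * a31) * (b12 - eps * b22)
    return [
        [(p1 - p2 + p4 - p0) / eps, (p5 - p0) / eps],
        [p4 - p3 + p6, p1 - p5 + p9],
        [(p3 - p8) / eps, (p6 - p7 + p9 - p8) / eps],
    ]

def classical(A, B):
    """Reference product with 12 multiplications."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(3)]

def max_error(A, B, eps):
    """Largest entry of |C_eps - A x B|, i.e. of |eps * D(eps)|."""
    approx, exact = bini_322(A, B, eps), classical(A, B)
    return max(abs(approx[i][j] - exact[i][j]) for i in range(3) for j in range(2))

A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
for eps in (Fraction(1, 10), Fraction(1, 1000)):
    print(eps, max_error(A, B, eps))
```

The printed error shrinks proportionally to ε, which is the behaviour the εD(ε) term describes.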

  14. Bini's approximate formula: Dependencies (figure). Symmetries!


  16. Bini's approximate formula: Scheduling of the Algorithm

      | #  | operation             | var | #  | operation           | var |
      |----|-----------------------|-----|----|---------------------|-----|
      | 1  | C11 := A11 * B22      | P0  | 19 | Y := B12 - e.B22    | T9  |
      | 2  | X := A11 + e.A12      | S5  | 20 | C32 := X * Y        | P9  |
      | 3  | Y := e.B12 + B22      | T5  | 21 | C22 := C22 + C32    | C22 |
      | 4  | C22 := X * Y          | P5  | 22 | X := A21 + A32      | S6  |
      | 5  | C12 := (C22 - C11)/e  | C12 | 23 | Y := B11 + e.B22    | T6  |
      | 6  | Y := B21 + B22        | T2  | 24 | C31 := X * Y        | P6  |
      | 7  | C31 := A22 * Y        | P2  | 25 | C21 := C21 + C31    | C21 |
      | 8  | C11 := C11 + C31      | C11 | 26 | C32 := C32 + C31    | C32 |
      | 9  | X := A11 + A22        | S1  | 27 | Y := B11 + B12      | T7  |
      | 10 | Y := e.B11 + B22      | T1  | 28 | C31 := A21 * Y      | P7  |
      | 11 | C21 := X * Y          | P1  | 29 | C32 := C32 - C31    | C32 |
      | 12 | C22 := C21 - C22      | C22 | 30 | X := e.A31 + A32    | S3  |
      | 13 | C11 := C21 - C11      | C11 | 31 | Y := B11 + e.B21    | T3  |
      | 14 | X := e.A12 + A22      | S4  | 32 | C31 := X * Y        | P3  |
      | 15 | Y := B21 - e.B11      | T4  | 33 | C21 := C21 - C31    | C21 |
      | 16 | C21 := X * Y          | P4  | 34 | Y := A32 * B11      | P8  |
      | 17 | C11 := (C21 + C11)/e  | C11 | 35 | C31 := (C31 - Y)/e  | C31 |
      | 18 | X := A21 + e.A31      | S9  | 36 | C32 := (C32 - Y)/e  | C32 |

      – Only 2 temporaries! (X and Y)
      – Easy to make it in-place while overwriting the left or right operand.

      Brice Boyer, Jean-Guillaume Dumas, Clément Pernet, and Wei Zhou. Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm. In Proceedings of the International Symposium on Symbolic and Algebraic Computation (ISSAC), New York, NY, USA. ACM.
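The 36-step schedule can be transcribed almost literally, which makes it easy to check that two temporaries are enough. The sketch below works on scalar blocks with exact rationals; the function name `bini_scheduled` and the test data are illustrative assumptions, and a real implementation would operate on sub-matrices rather than scalars.

```python
from fractions import Fraction

def bini_scheduled(A, B, e):
    """Run the 36 operations of the schedule; only X and Y are temporaries,
    everything else is stored in the six output blocks."""
    (A11, A12), (A21, A22), (A31, A32) = A
    (B11, B12), (B21, B22) = B
    C11 = A11 * B22            # 1  P0
    X = A11 + e * A12          # 2  S5
    Y = e * B12 + B22          # 3  T5
    C22 = X * Y                # 4  P5
    C12 = (C22 - C11) / e      # 5  C12 done
    Y = B21 + B22              # 6  T2
    C31 = A22 * Y              # 7  P2
    C11 = C11 + C31            # 8
    X = A11 + A22              # 9  S1
    Y = e * B11 + B22          # 10 T1
    C21 = X * Y                # 11 P1
    C22 = C21 - C22            # 12
    C11 = C21 - C11            # 13
    X = e * A12 + A22          # 14 S4
    Y = B21 - e * B11          # 15 T4
    C21 = X * Y                # 16 P4
    C11 = (C21 + C11) / e      # 17 C11 done
    X = A21 + e * A31          # 18 S9
    Y = B12 - e * B22          # 19 T9
    C32 = X * Y                # 20 P9
    C22 = C22 + C32            # 21 C22 done
    X = A21 + A32              # 22 S6
    Y = B11 + e * B22          # 23 T6
    C31 = X * Y                # 24 P6
    C21 = C21 + C31            # 25
    C32 = C32 + C31            # 26
    Y = B11 + B12              # 27 T7
    C31 = A21 * Y              # 28 P7
    C32 = C32 - C31            # 29
    X = e * A31 + A32          # 30 S3
    Y = B11 + e * B21          # 31 T3
    C31 = X * Y                # 32 P3
    C21 = C21 - C31            # 33 C21 done
    Y = A32 * B11              # 34 P8
    C31 = (C31 - Y) / e        # 35 C31 done
    C32 = (C32 - Y) / e        # 36 C32 done
    return [[C11, C12], [C21, C22], [C31, C32]]

A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
e = Fraction(1, 1000)
C = bini_scheduled(A, B, e)
exact = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(3)]
worst = max(abs(C[i][j] - exact[i][j]) for i in range(3) for j in range(2))
print(worst)  # the deviation from A x B is of order e
```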


  20. Getting an exact algorithm
      – Let d = deg_ε(εD(ε)).
      – Find d + 1 scalars α_i and d + 1 pairwise distinct scalars ε_i.
      – Make sure that ∑_{i=1}^{d+1} α_i = 1 and that, for j = 1, …, d, ∑_{i=1}^{d+1} α_i ε_i^j = 0.
      – Then ∑_{i=1}^{d+1} α_i C_{ε_i} = A × B.

      D. Bini. Relations between exact and approximate bilinear algorithms. Applications. Calcolo.
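The conditions on the α_i are met by Lagrange interpolation weights evaluated at 0, since they send any polynomial of degree at most d to its constant term. The sketch below assumes d = 2, which is what expanding one level of the (3, 2, 2) formula gives, and picks the points ε_i = 1, 2, 3 arbitrarily; `lagrange_weights` and `bini_322` are illustrative names, not from the slides.

```python
from fractions import Fraction

def bini_322(A, B, eps):
    """One level of the approximate (3,2,2) formula on scalar blocks."""
    (a11, a12), (a21, a22), (a31, a32) = A
    (b11, b12), (b21, b22) = B
    p0 = a11 * b22
    p1 = (a11 + a22) * (b22 + eps * b11)
    p2 = a22 * (b21 + b22)
    p3 = (a32 + eps * a31) * (b11 + eps * b21)
    p4 = (a22 + eps * a12) * (b21 - eps * b11)
    p5 = (a11 + eps * a12) * (b22 + eps * b12)
    p6 = (a21 + a32) * (b11 + eps * b22)
    p7 = a21 * (b11 + b12)
    p8 = a32 * b11
    p9 = (a21 + eps * a31) * (b12 - eps * b22)
    return [
        [(p1 - p2 + p4 - p0) / eps, (p5 - p0) / eps],
        [p4 - p3 + p6, p1 - p5 + p9],
        [(p3 - p8) / eps, (p6 - p7 + p9 - p8) / eps],
    ]

def lagrange_weights(points):
    """alpha_i = prod_{j != i} eps_j / (eps_j - eps_i), so that
    sum(alpha_i) = 1 and sum(alpha_i * eps_i**j) = 0 for j = 1..d."""
    weights = []
    for i, ei in enumerate(points):
        w = Fraction(1)
        for j, ej in enumerate(points):
            if j != i:
                w *= Fraction(ej, ej - ei)
        weights.append(w)
    return weights

A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
points = [1, 2, 3]                      # d + 1 = 3 pairwise distinct values
alphas = lagrange_weights(points)

# Combine the d + 1 approximate results into the exact product.
results = [bini_322(A, B, Fraction(e)) for e in points]
combined = [[sum(a * C[i][j] for a, C in zip(alphas, results)) for j in range(2)]
            for i in range(3)]
exact = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(3)]
print(alphas, combined == exact)
```

This costs d + 1 calls to the approximate formula, which is why the remaining slides look for a way to use only one call.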

  22. Using only one call: Ideas
      – Only one "recursive" call.
      – For a double: ε = 2^−27, so that ε² ≈ 0.
      – Modulo p: ε = p = 0, so that ε² = 0.
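One way to read these two ideas is sketched below, under stated assumptions: with doubles, ε = 2^−27 makes the εD(ε) term small enough that rounding to the nearest integer recovers A × B for small integer inputs; with integers, taking ε = p and dividing exactly by p leaves a result that equals A × B modulo p. The helper `bini_322` with its `div` parameter is hypothetical, and Python's unbounded integers hide the word-size constraints that a real implementation would have to respect.

```python
def bini_322(A, B, eps, div):
    """One call to the approximate (3,2,2) formula; `div` performs the
    division by eps (true division for floats, exact // for integers)."""
    (a11, a12), (a21, a22), (a31, a32) = A
    (b11, b12), (b21, b22) = B
    p0 = a11 * b22
    p1 = (a11 + a22) * (b22 + eps * b11)
    p2 = a22 * (b21 + b22)
    p3 = (a32 + eps * a31) * (b11 + eps * b21)
    p4 = (a22 + eps * a12) * (b21 - eps * b11)
    p5 = (a11 + eps * a12) * (b22 + eps * b12)
    p6 = (a21 + a32) * (b11 + eps * b22)
    p7 = a21 * (b11 + b12)
    p8 = a32 * b11
    p9 = (a21 + eps * a31) * (b12 - eps * b22)
    return [
        [div(p1 - p2 + p4 - p0, eps), div(p5 - p0, eps)],
        [p4 - p3 + p6, p1 - p5 + p9],
        [div(p3 - p8, eps), div(p6 - p7 + p9 - p8, eps)],
    ]

A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
exact = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(3)]

# Doubles: eps = 2**-27, then round each entry to the nearest integer.
approx = bini_322(A, B, 2.0 ** -27, lambda x, y: x / y)
rounded = [[round(v) for v in row] for row in approx]

# Integers: eps = p; the numerators are multiples of p, so // is exact,
# and the leftover p * D(p) term vanishes modulo p.
p = 11
over_z = bini_322(A, B, p, lambda x, y: x // y)
mod_p = [[v % p for v in row] for row in over_z]
expected_mod_p = [[v % p for v in row] for row in exact]

print(rounded == exact, mod_p == expected_mod_p)
```

The floating-point variant only works while the entries stay small enough for the 53-bit significand of a double; the bound depends on the matrix dimensions and is not derived here.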
