
SuperMatrix: A Multithreaded Runtime Scheduling System for Algorithms-by-Blocks. Ernie Chan, Field G. Van Zee, Robert van de Geijn, Paolo Bientinesi, Enrique S. Quintana-Ortí and Gregorio Quintana-Ortí. Software Engineering Seminar, Luc Humair


  1. SuperMatrix: A Multithreaded Runtime Scheduling System for Algorithms-by-Blocks Ernie Chan, Field G. Van Zee, Robert van de Geijn, Paolo Bientinesi, Enrique S. Quintana-Ortí and Gregorio Quintana-Ortí Software Engineering Seminar – Luc Humair

  2. Motivation
     • Multicore architectures demand concurrent algorithms
     • Complicated and error-prone linear algebra libraries

  4. Motivation
     • SuperMatrix offers a level of abstraction for algorithms-by-blocks:
       – Automatic parallelization
       – Straightforward implementation of algorithms-by-blocks
     • Work with blocked matrices (FLAME/FLASH API)
     • Dependency analysis
     • Out-of-order scheduling

  8. Inversion of a SPD Matrix
     Given a symmetric positive definite matrix $A \in \mathbb{R}^{n \times n}$:
     $$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 3 \end{pmatrix}$$
     1. Cholesky factorization (CHOL): $A = U^T U$
        $$U = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1.4 \end{pmatrix}$$
     2. Inversion of triangular matrix (TRINV): $R := U^{-1}$
        $$R = \begin{pmatrix} 1 & -1 & -0.7 \\ 0 & 1 & 0 \\ 0 & 0 & 0.7 \end{pmatrix}$$
     3. Triangular transpose matrix multiplication (TTMM): $A^{-1} := R R^T$
        $$A^{-1} = \begin{pmatrix} 2.5 & -1 & -0.5 \\ -1 & 1 & 0 \\ -0.5 & 0 & 0.5 \end{pmatrix}$$

  9. Inversion of a SPD Matrix – Proof
     SPD matrix $A \in \mathbb{R}^{n \times n}$
     1. (CHOL) $A = U^T U$
     2. (TRINV) $R := U^{-1}$
     3. (TTMM) $A^{-1} := R R^T$
     Proof:
     $$A A^{-1} \overset{1,3}{=} U^T U R R^T \overset{2}{=} U^T U U^{-1} U^{-T} = U^T U^{-T} = I$$
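
The three steps (CHOL, TRINV, TTMM) can be checked numerically on the 3x3 example matrix from the slides. The following is a minimal, unblocked sketch in plain Python. The function names `cholesky_upper`, `invert_upper`, `times_transpose` and `spd_inverse` are hypothetical and chosen for illustration only; they are not taken from the paper or from any library.

```python
import math

def cholesky_upper(A):
    """CHOL: return upper-triangular U with A = U^T U (A must be SPD)."""
    n = len(A)
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal entry: remove the contribution of the rows above.
        d = A[i][i] - sum(U[k][i] ** 2 for k in range(i))
        U[i][i] = math.sqrt(d)
        for j in range(i + 1, n):
            s = A[i][j] - sum(U[k][i] * U[k][j] for k in range(i))
            U[i][j] = s / U[i][i]
    return U

def invert_upper(U):
    """TRINV: return R = U^{-1} for upper-triangular U, column by column."""
    n = len(U)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = 1.0 / U[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(U[i][k] * R[k][j] for k in range(i + 1, j + 1))
            R[i][j] = -s / U[i][i]
    return R

def times_transpose(R):
    """TTMM: return R R^T."""
    n = len(R)
    return [[sum(R[i][k] * R[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def spd_inverse(A):
    """Invert an SPD matrix via the three steps CHOL, TRINV, TTMM."""
    return times_transpose(invert_upper(cholesky_upper(A)))

# Example matrix from the slides.
A = [[1.0, 1.0, 1.0],
     [1.0, 2.0, 1.0],
     [1.0, 1.0, 3.0]]
A_inv = spd_inverse(A)
for row in A_inv:
    print([round(x, 3) for x in row])
```

Up to rounding, the result agrees with the matrix shown for step 3. The slide values 1.4 and 0.7 are rounded versions of $\sqrt{2}$ and $1/\sqrt{2}$.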

  14. One variant of computing (CHOL). Source: Paper

  16. First iteration (4x4 matrix blocks). Source: Paper
      $$\begin{pmatrix} A_{1,1} & A_{1,2} & A_{1,3} & A_{1,4} \\ (A_{2,1}) & A_{2,2} & A_{2,3} & A_{2,4} \\ (A_{3,1}) & (A_{3,2}) & A_{3,3} & A_{3,4} \\ (A_{4,1}) & (A_{4,2}) & (A_{4,3}) & A_{4,4} \end{pmatrix}$$

  23. First iteration (4x4 matrix blocks). Source: Paper
      Computations (the blocks $(A_{2,1})$, $(A_{3,1})$, $(A_{3,2})$, $(A_{4,1})$, $(A_{4,2})$, $(A_{4,3})$ are shown in parentheses and carry no operation):
      • $\mathrm{CHOL}_0$: $A_{1,1} := \mathrm{CHOL}(A_{1,1})$
      • $\mathrm{TRSM}_1$: $A_{1,2} := \mathrm{Inv}(A_{1,1})\, A_{1,2}$
      • $\mathrm{TRSM}_2$: $A_{1,3} := \mathrm{Inv}(A_{1,1})\, A_{1,3}$
      • $\mathrm{TRSM}_3$: $A_{1,4} := \mathrm{Inv}(A_{1,1})\, A_{1,4}$
      • $\mathrm{SYRK}_4$: $A_{2,2} := A_{2,2} - A_{1,2}^T A_{1,2}$
      • $\mathrm{GEMM}_5$: $A_{2,3} := A_{2,3} - A_{1,2}^T A_{1,3}$
      • $\mathrm{GEMM}_6$: $A_{2,4} := A_{2,4} - A_{1,2}^T A_{1,4}$
      • $\mathrm{SYRK}_7$: $A_{3,3} := A_{3,3} - A_{1,3}^T A_{1,3}$
      • $\mathrm{GEMM}_8$: $A_{3,4} := A_{3,4} - A_{1,3}^T A_{1,4}$
      • $\mathrm{SYRK}_9$: $A_{4,4} := A_{4,4} - A_{1,4}^T A_{1,4}$
      Legend:
      • CHOL: Cholesky factorization
      • TRSM: Triangular solves with multiple right hand sides
      • SYRK: Symmetric rank-k update
      • GEMM: Matrix-matrix multiplication
      Dependency Graph: [figure]
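
The task numbering of the first iteration can be reproduced with a short enumeration. This is a minimal sketch, assuming the traversal order visible on the slide (CHOL on the diagonal block, TRSM along the block row, then SYRK/GEMM over the trailing blocks row by row). The function name `cholesky_tasks` and the 0-based block indices are assumptions made for illustration, not the FLAME/FLASH API.

```python
def cholesky_tasks(num_blocks, k):
    """List the block operations of iteration k (0-based) in slide order.

    Each task is (operation name, (block row, block column)) of the block
    that the operation overwrites; indices are 0-based.
    """
    tasks = [("CHOL", (k, k))]
    # Triangular solves update the remaining blocks of block row k.
    for j in range(k + 1, num_blocks):
        tasks.append(("TRSM", (k, j)))
    # Trailing update: SYRK on diagonal blocks, GEMM on off-diagonal blocks.
    for i in range(k + 1, num_blocks):
        for j in range(i, num_blocks):
            tasks.append(("SYRK" if i == j else "GEMM", (i, j)))
    return tasks

# First iteration on a 4x4 grid of blocks, printed with 1-based block indices.
for number, (name, (i, j)) in enumerate(cholesky_tasks(4, 0)):
    print(f"{name}_{number} overwrites A_{i + 1},{j + 1}")
```

For four blocks and `k = 0` this yields ten tasks in the same order as the labels 0 to 9 on the slide.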

  25. First iteration (4x4 matrix blocks). Source: Paper
      Computations (blocks not updated in an iteration are shown in parentheses on the slide and omitted here; task labels are reproduced as they appear on the slide):
      First iteration:
      • $\mathrm{CHOL}_0$: $A_{1,1} := \mathrm{CHOL}(A_{1,1})$
      • $\mathrm{TRSM}_1$: $A_{1,2} := \mathrm{Inv}(A_{1,1})\, A_{1,2}$
      • $\mathrm{TRSM}_2$: $A_{1,3} := \mathrm{Inv}(A_{1,1})\, A_{1,3}$
      • $\mathrm{TRSM}_3$: $A_{1,4} := \mathrm{Inv}(A_{1,1})\, A_{1,4}$
      • $\mathrm{SYRK}_4$: $A_{2,2} := A_{2,2} - A_{1,2}^T A_{1,2}$
      • $\mathrm{GEMM}_5$: $A_{2,3} := A_{2,3} - A_{1,2}^T A_{1,3}$
      • $\mathrm{GEMM}_6$: $A_{2,4} := A_{2,4} - A_{1,2}^T A_{1,4}$
      • $\mathrm{SYRK}_7$: $A_{3,3} := A_{3,3} - A_{1,3}^T A_{1,3}$
      • $\mathrm{GEMM}_8$: $A_{3,4} := A_{3,4} - A_{1,3}^T A_{1,4}$
      • $\mathrm{SYRK}_9$: $A_{4,4} := A_{4,4} - A_{1,4}^T A_{1,4}$
      Second iteration:
      • $\mathrm{CHOL}_0$: $A_{2,2} := \mathrm{CHOL}(A_{2,2})$
      • $\mathrm{TRSM}_1$: $A_{2,3} := \mathrm{Inv}(A_{2,2})\, A_{2,3}$
      • $\mathrm{TRSM}_2$: $A_{2,4} := \mathrm{Inv}(A_{2,2})\, A_{2,4}$
      • $\mathrm{SYRK}_3$: $A_{3,3} := A_{3,3} - A_{2,3}^T A_{2,3}$
      • $\mathrm{GEMM}_4$: $A_{3,4} := A_{3,4} - A_{2,3}^T A_{2,4}$
      • $\mathrm{SYRK}_7$: $A_{4,4} := A_{4,4} - A_{2,4}^T A_{2,4}$
      Third iteration:
      • $\mathrm{CHOL}_0$: $A_{3,3} := \mathrm{CHOL}(A_{3,3})$
      • $\mathrm{TRSM}_2$: $A_{3,4} := \mathrm{Inv}(A_{3,3})\, A_{3,4}$
      • $\mathrm{SYRK}_3$: $A_{4,4} := A_{4,4} - A_{3,4}^T A_{3,4}$
      Legend:
      • CHOL: Cholesky factorization
      • TRSM: Triangular solves with multiple right hand sides
      • SYRK: Symmetric rank-k update
      • GEMM: Matrix-matrix multiplication
      Dependency Graph: [figure]
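
The dependency graph itself is not preserved in this transcript, but the idea behind it can be sketched from the block operations listed on the slides. The sketch below records, for each task, the blocks it reads and the block it overwrites, and makes a task depend on every earlier task that wrote one of those blocks or read the block it overwrites. This is an illustrative assumption about how such an analysis could look, not the SuperMatrix implementation; the names `blocked_cholesky_tasks`, `dependencies` and `levels` are hypothetical, and the graph contains redundant (transitive) edges.

```python
def blocked_cholesky_tasks(num_blocks):
    """Return (name, blocks read, block written) for all iterations, 0-based."""
    tasks = []
    for k in range(num_blocks):
        tasks.append(("CHOL", set(), (k, k)))
        for j in range(k + 1, num_blocks):
            tasks.append(("TRSM", {(k, k)}, (k, j)))
        for i in range(k + 1, num_blocks):
            for j in range(i, num_blocks):
                name = "SYRK" if i == j else "GEMM"
                tasks.append((name, {(k, i), (k, j)}, (i, j)))
    return tasks

def dependencies(tasks):
    """Map each task index to the set of earlier task indices it must wait for."""
    deps = {t: set() for t in range(len(tasks))}
    for t, (_, reads, write) in enumerate(tasks):
        touched = reads | {write}  # the written block is also an input
        for s in range(t):
            _, s_reads, s_write = tasks[s]
            # Earlier task wrote a block we touch, or read the block we overwrite.
            if s_write in touched or write in s_reads:
                deps[t].add(s)
    return deps

def levels(deps):
    """Earliest step at which each task could run with unlimited threads."""
    level = {}
    for t in sorted(deps):  # task indices follow sequential program order
        level[t] = 1 + max((level[s] for s in deps[t]), default=-1)
    return level

tasks = blocked_cholesky_tasks(4)
deps = dependencies(tasks)
lvl = levels(deps)
for step in range(max(lvl.values()) + 1):
    ready = [f"{tasks[t][0]}{tasks[t][2]}" for t in lvl if lvl[t] == step]
    print(step, ready)
```

Tasks that share a level have no dependencies between them, so a runtime would be free to execute them concurrently and out of program order.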
