How Two-sided Matrix Transformation Algorithms Can Benefit from Task Parallelism
Mirko Myllykoski
Department of Computing Science, Umeå University
Nordic Numerical Linear Algebra Meeting KTH, Stockholm, 21-22 October, 2019
1. $Q_1^T A Q_1$ and
2. $Q_2^T H Q_2$.
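A two-sided transformation multiplies a matrix by an orthogonal matrix from both sides, $A \mapsto Q^T A Q$. This is a similarity transformation, so the eigenvalues are preserved. As a minimal illustration (our own unblocked Python/NumPy sketch, not the StarNEig code), the textbook Householder reduction to upper Hessenberg form applies each reflector first from the left and then from the right. The function names are hypothetical.

```python
import numpy as np


def householder_vector(x):
    """Return (v, beta) such that (I - beta * v v^T) x is a multiple of e_1."""
    v = np.array(x, dtype=float)
    alpha = np.linalg.norm(v)
    if alpha == 0.0:
        return v, 0.0
    # Add sign(x_0) * ||x|| to the first entry to avoid cancellation.
    v[0] += np.copysign(alpha, v[0])
    return v, 2.0 / np.dot(v, v)


def hessenberg_reduce(A):
    """Reduce A to upper Hessenberg form H = Q^T A Q (unblocked Householder)."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        v, beta = householder_vector(H[k + 1:, k])
        if beta == 0.0:
            continue
        # Left update P H: touches rows k+1..n-1.
        H[k + 1:, :] -= beta * np.outer(v, v @ H[k + 1:, :])
        H[k + 2:, k] = 0.0  # these entries are zero up to rounding
        # Right update H P: touches columns k+1..n-1 (the second "side").
        H[:, k + 1:] -= beta * np.outer(H[:, k + 1:] @ v, v)
        # Accumulate Q := Q P so that H = Q^T A Q at the end.
        Q[:, k + 1:] -= beta * np.outer(Q[:, k + 1:] @ v, v)
    return H, Q


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
H, Q = hessenberg_reduce(A)
```

Because every reflector is applied from both sides, each step touches the whole matrix, which is why such algorithms are memory-bound unless the updates are blocked or grouped.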
Figure: Relative runtime versus matrix dimension (20k to 120k), StarNEig vs. PDHSEQR¹: 1.6-2.9 fold speedup.
Figure: Relative runtime versus matrix dimension (20k to 120k), StarNEig vs. PDTRSEN: 2.8-5.0 fold speedup.
¹ https://github.com/NLAFET/SEVP-PDHSEQR-Alg953/
Figure: Bulge chasing (labels: 2 × 2 eigenvalue problem, bulge).
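Assuming the figure depicts a standard Francis double-shift step, the two shifts are the eigenvalues of a 2 × 2 eigenvalue problem taken from the trailing corner of the Hessenberg matrix $H$. The bulge is created by a 3 × 3 reflector that maps the first column of $(H - s_1 I)(H - s_2 I)$ to a multiple of $e_1$; only the first three entries of that column are nonzero. The sketch below is the textbook single-bulge version with hypothetical function names, not StarNEig's multishift implementation.

```python
import numpy as np


def householder_vector(x):
    """Return (v, beta) such that (I - beta * v v^T) x is a multiple of e_1."""
    v = np.array(x, dtype=float)
    alpha = np.linalg.norm(v)
    if alpha == 0.0:
        return v, 0.0
    v[0] += np.copysign(alpha, v[0])
    return v, 2.0 / np.dot(v, v)


def double_shift_column(H):
    """First three entries of (H - s1 I)(H - s2 I) e_1, where s1, s2 are the
    eigenvalues of the trailing 2 x 2 block (used only via their sum and product,
    so complex-conjugate shifts stay in real arithmetic)."""
    n = H.shape[0]
    tr = H[n - 2, n - 2] + H[n - 1, n - 1]
    det = H[n - 2, n - 2] * H[n - 1, n - 1] - H[n - 2, n - 1] * H[n - 1, n - 2]
    return np.array([
        H[0, 0] ** 2 + H[0, 1] * H[1, 0] - tr * H[0, 0] + det,
        H[1, 0] * (H[0, 0] + H[1, 1] - tr),
        H[1, 0] * H[2, 1],
    ])


def introduce_bulge(H):
    """Apply a 3 x 3 reflector from both sides; this creates a bulge below the subdiagonal."""
    B = np.array(H, dtype=float)
    v, beta = householder_vector(double_shift_column(B))
    B[:3, :] -= beta * np.outer(v, v @ B[:3, :])
    B[:, :3] -= beta * np.outer(B[:, :3] @ v, v)
    return B


rng = np.random.default_rng(1)
H = np.triu(rng.standard_normal((6, 6)), -1)  # random upper Hessenberg matrix
B = introduce_bulge(H)
```

Chasing the bulge then means repeatedly applying 3 × 3 reflectors one position further down until it leaves the matrix at the bottom.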
Apply locally (in L2 cache), group the transformations, and propagate them with BLAS-3 updates.
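A minimal sketch of this idea, assuming the transformations inside a window are overlapping 3 × 3 Householder reflectors as in a bulge chase. Accumulating them into one small orthogonal matrix $U$ only touches an $m \times m$ array (small enough to stay in cache); the part of the matrix outside the window is then updated with a single matrix-matrix multiply instead of many rank-1 updates. Function and variable names are hypothetical.

```python
import numpy as np


def apply_sequentially(X, reflectors):
    """Apply P_1, then P_2, ... from the left, one BLAS-2 rank-1 update at a time."""
    X = np.array(X, dtype=float)
    for pos, v, beta in reflectors:
        r = slice(pos, pos + len(v))
        X[r, :] -= beta * np.outer(v, v @ X[r, :])
    return X


def accumulate(m, reflectors):
    """Group the reflectors into one small m x m orthogonal matrix U = P_1 P_2 ... P_k."""
    U = np.eye(m)
    for pos, v, beta in reflectors:
        c = slice(pos, pos + len(v))
        U[:, c] -= beta * np.outer(U[:, c] @ v, v)
    return U


rng = np.random.default_rng(2)
m, ncols = 8, 100          # window size and width of the off-diagonal panel
reflectors = []
for pos in range(m - 2):   # overlapping 3 x 3 reflectors, as in a bulge chase
    v = rng.standard_normal(3)
    reflectors.append((pos, v, 2.0 / (v @ v)))

X = rng.standard_normal((m, ncols))   # panel to the right of the window
U = accumulate(m, reflectors)         # cheap: only an m x m matrix is touched
X_blas2 = apply_sequentially(X, reflectors)
X_blas3 = U.T @ X                     # one matrix-matrix multiply (GEMM)
```

Since $U^T = P_k \cdots P_1$, both results agree; the panel above the window would analogously be updated as `Y @ U`.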
Figure: Execution timeline (time vs. cores / ranks).
Figure: Hessenberg reduction followed by Schur reduction (labels: bulge chasing, shifts, spike, deflate, reorder).
Figure: Execution timeline (time vs. cores / ranks) with AED phases.
Figure: Tasks (labelled W, R and L) and the dependences between them.
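Task-based runtime systems commonly build the task graph from the order in which tasks are submitted and the data each task reads or writes. The sketch below illustrates this sequential-task-flow idea in plain Python; the task names and data handles are hypothetical, and it does not reproduce the exact semantics of any particular runtime system.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    reads: tuple = ()
    writes: tuple = ()
    deps: set = field(default_factory=set)


def infer_dependences(tasks):
    """Process tasks in submission order and derive RAW, WAW and WAR dependences."""
    last_writer = {}  # data handle -> name of the last task that wrote it
    readers = {}      # data handle -> tasks that read it since that write
    for t in tasks:
        for h in t.reads:
            if h in last_writer:
                t.deps.add(last_writer[h])       # read-after-write
        for h in t.writes:
            if h in last_writer:
                t.deps.add(last_writer[h])       # write-after-write
            t.deps.update(readers.get(h, ()))    # write-after-read
        for h in t.reads:
            readers.setdefault(h, []).append(t.name)
        for h in t.writes:
            last_writer[h] = t.name
            readers[h] = []
    return {t.name: t.deps for t in tasks}


tasks = [
    Task("T1", writes=("X",)),
    Task("T2", reads=("X",), writes=("Y",)),
    Task("T3", reads=("X",), writes=("Z",)),
    Task("T4", writes=("X",)),
]
deps = infer_dependences(tasks)
```

Here T2 and T3 only read X, so they do not depend on each other and can run concurrently, while T4 must wait until both have read X before overwriting it.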
Figure: An illustration of how the task graph is traversed (legend: critical path, dependences, ready for scheduling, can be scheduled).
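To make the traversal concrete, here is a toy list scheduler. A task becomes ready once every task it depends on has finished, and among the ready tasks the one with the longest remaining path to the end of the graph (its bottom level, i.e. its position on the critical path) is started first. This priority rule is one common heuristic, assumed here for illustration; the task names and costs are made up.

```python
import heapq


def bottom_levels(deps, cost):
    """Longest path (sum of costs) from each task to the end of the graph."""
    succs = {t: [] for t in deps}
    for t, preds in deps.items():
        for p in preds:
            succs[p].append(t)
    memo = {}

    def level(t):
        if t not in memo:
            memo[t] = cost[t] + max((level(s) for s in succs[t]), default=0)
        return memo[t]

    return {t: level(t) for t in deps}, succs


def simulate(deps, cost, workers):
    """Schedule tasks on `workers` identical cores; return (makespan, start order)."""
    level, succs = bottom_levels(deps, cost)
    remaining = {t: len(preds) for t, preds in deps.items()}
    ready = [(-level[t], t) for t in deps if remaining[t] == 0]
    heapq.heapify(ready)
    running = []  # heap of (finish_time, task)
    time, order, idle = 0.0, [], workers
    while ready or running:
        while ready and idle > 0:  # start the most critical ready tasks
            _, t = heapq.heappop(ready)
            heapq.heappush(running, (time + cost[t], t))
            order.append(t)
            idle -= 1
        time, t = heapq.heappop(running)  # advance to the next completion
        idle += 1
        for s in succs[t]:  # release successors whose dependences are satisfied
            remaining[s] -= 1
            if remaining[s] == 0:
                heapq.heappush(ready, (-level[s], s))
    return time, order


deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}, "e": set()}
cost = {"a": 1, "b": 3, "c": 1, "d": 1, "e": 2}
makespan, order = simulate(deps, cost, workers=2)
```

With two workers the makespan equals the critical path length a → b → d (1 + 3 + 1 = 5), which is a lower bound for any schedule; with one worker it degrades to the sum of all costs.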
12 / 85
13 / 85
14 / 85
15 / 85
16 / 85
17 / 85
18 / 85
19 / 85
20 / 85
21 / 85
22 / 85
23 / 85
24 / 85
25 / 85
26 / 85
27 / 85
28 / 85
29 / 85
30 / 85
31 / 85
32 / 85
33 / 85
34 / 85
35 / 85
36 / 85
37 / 85
38 / 85
39 / 85
40 / 85
41 / 85
42 / 85
43 / 85
44 / 85
45 / 85
46 / 85
47 / 85
48 / 85
49 / 85
50 / 85
51 / 85
52 / 85
53 / 85
54 / 85
55 / 85
56 / 85
57 / 85
58 / 85
59 / 85
60 / 85
61 / 85
62 / 85
63 / 85
64 / 85
65 / 85
66 / 85
67 / 85
68 / 85
69 / 85
70 / 85
71 / 85
72 / 85
73 / 85
74 / 85
75 / 85
76 / 85
77 / 85
78 / 85
79 / 85
80 / 85
81 / 85
82 / 85
¹ In collaboration with Carl Christian Kjelgaard Mikkelsen, Angelika Schwarz,
² https://nlafet.github.io/StarNEig
Figure: Runtime [s] versus matrix dimension (20k to 160k) on 1, 4, 9, 16 and 25 nodes.