
Comparative Study of One-Sided Factorizations with Multiple Software - PowerPoint PPT Presentation

Comparative Study of One-Sided Factorizations with Multiple Software Packages on Multi-Core Hardware. Emmanuel Agullo, Jack Dongarra, Bilel Hadri, Jakub Kurzak, Hatem Ltaief, Piotr Luszczek. Scheduling for Large-Scale Systems, Knoxville, TN, May


  1. Comparative Study of One-Sided Factorizations with Multiple Software Packages on Multi-Core Hardware. Emmanuel Agullo, Jack Dongarra, Bilel Hadri, Jakub Kurzak, Hatem Ltaief, Piotr Luszczek. Scheduling for Large-Scale Systems, Knoxville, TN, May 13-15, 2009. PLASMA group.

  2. Outline: 1. Tile Algorithms (Cholesky Factorization; QR (& LU) Factorizations). 2. Experimental environment (Libraries; Hardware; Metrics). 3. Tuning PLASMA. 4. Comparison against other libraries (Experiments on few cores; Experiments on a large number of cores; PLASMA scalability). 5. Conclusion and current work.

  3. Tile Algorithms

  4. Tile Algorithms: Cholesky Factorization

  5. Tile Algorithms: Cholesky. Tile Cholesky Factorization
     [Pseudocode listing lost in extraction.]
     ⋆ Basically identical to the block algorithm (LAPACK).
     ⋆ Input matrix stored and processed by square tiles.
     ⋆ Complex DAG.
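The slide's own listing is not readable in this transcript. As a rough illustration only, a tile Cholesky factorization can be sketched in pure Python, assuming the usual four-kernel decomposition (POTRF on the diagonal tile, TRSM on the panel, SYRK/GEMM on the trailing tiles). The helper names `potrf`, `trsm`, `update`, `to_tiles`, and `lower_from_tiles` are hypothetical, the matrix order is assumed to be a multiple of the tile size `nb`, and the input is assumed symmetric positive definite.

```python
import math

def potrf(a):
    """Unblocked Cholesky of one diagonal tile, in place (lower triangular)."""
    n = len(a)
    for j in range(n):
        a[j][j] = math.sqrt(a[j][j] - sum(a[j][p] ** 2 for p in range(j)))
        for i in range(j + 1, n):
            a[i][j] = (a[i][j] - sum(a[i][p] * a[j][p] for p in range(j))) / a[j][j]
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0  # clear the strictly upper part of the tile

def trsm(l, b):
    """Solve X * L^T = B in place, one row of B at a time."""
    for row in b:
        for j in range(len(l)):
            row[j] = (row[j] - sum(row[p] * l[j][p] for p in range(j))) / l[j][j]

def update(a, b, c):
    """C <- C - A * B^T (acts as SYRK when a is b, GEMM otherwise)."""
    for i in range(len(c)):
        for j in range(len(c[0])):
            c[i][j] -= sum(a[i][p] * b[j][p] for p in range(len(b[0])))

def tile_cholesky(t):
    """Right-looking tile Cholesky on an nt x nt grid of square tiles."""
    nt = len(t)
    for k in range(nt):
        potrf(t[k][k])
        for m in range(k + 1, nt):
            trsm(t[k][k], t[m][k])
        for n in range(k + 1, nt):
            update(t[n][k], t[n][k], t[n][n])        # SYRK on the diagonal tile
            for m in range(n + 1, nt):
                update(t[m][k], t[n][k], t[m][n])    # GEMM below the diagonal

def to_tiles(a, nb):
    """Copy a dense matrix (list of rows) into a grid of nb x nb tiles."""
    nt = len(a) // nb
    return [[[row[n * nb:(n + 1) * nb] for row in a[m * nb:(m + 1) * nb]]
             for n in range(nt)] for m in range(nt)]

def lower_from_tiles(t, nb):
    """Assemble the lower-triangular factor from the lower tiles."""
    size = len(t) * nb
    l = [[0.0] * size for _ in range(size)]
    for m in range(len(t)):
        for n in range(m + 1):
            for i in range(nb):
                for j in range(nb):
                    l[m * nb + i][n * nb + j] = t[m][n][i][j]
    return l

# Small symmetric positive definite example, split into 2 x 2 tiles.
A = [[4.0, 2.0, 2.0, 0.0],
     [2.0, 5.0, 3.0, 2.0],
     [2.0, 3.0, 6.0, 3.0],
     [0.0, 2.0, 3.0, 6.0]]
tiles = to_tiles(A, 2)
tile_cholesky(tiles)
L = lower_from_tiles(tiles, 2)
```

Each kernel call touches only a handful of tiles, which is what lets the calls be scheduled as nodes of a task DAG rather than as one monolithic loop nest.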

  6. Tile Algorithms: Cholesky. Tile Cholesky Factorization - Static pipeline
     ⋆ Work partitioned in one dimension (by block-rows).
     ⋆ Cyclic assignment of work across all steps of the factorization (pipelining of factorization steps).
     ⋆ Process tracking by a global progress table.
     ⋆ Stall on dependencies (busy waiting).
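The four points above can be illustrated with a toy threaded sketch. It is not PLASMA's implementation: to stay short it assumes 1 x 1 tiles (so each kernel collapses to a scalar operation), a symmetric positive definite input, and a hypothetical function name `static_pipeline_cholesky`. Block-rows are assigned cyclically to threads, each thread walks all factorization steps in order, a shared `ready` table records which tiles are final, and a thread spins when a tile it needs is not ready yet.

```python
import math
import threading
import time

def static_pipeline_cholesky(a, num_threads):
    """In-place lower Cholesky of `a`, each entry treated as a 1x1 tile.

    Assumes `a` is symmetric positive definite; a failing thread would
    leave the others spinning, so no error handling is attempted here.
    """
    nt = len(a)
    # Global progress table: ready[m][k] is True once L[m][k] is final.
    ready = [[False] * nt for _ in range(nt)]

    def wait_for(m, k):
        while not ready[m][k]:      # stall on the dependency (busy waiting)
            time.sleep(0)           # yield so other Python threads can run

    def worker(rank):
        for k in range(nt):                      # steps are pipelined per thread
            for m in range(k, nt):
                if m % num_threads != rank:      # 1D cyclic block-row ownership
                    continue
                if m == k:
                    a[k][k] = math.sqrt(a[k][k])         # POTRF
                    ready[k][k] = True
                    continue
                wait_for(k, k)
                a[m][k] /= a[k][k]                       # TRSM
                ready[m][k] = True
                for n in range(k + 1, m + 1):
                    wait_for(n, k)
                    a[m][n] -= a[m][k] * a[n][k]         # SYRK (n == m) or GEMM

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Return only the lower triangle; the upper part of `a` is never updated.
    return [[a[i][j] if j <= i else 0.0 for j in range(nt)] for i in range(nt)]

A = [[4.0, 2.0, 2.0, 0.0],
     [2.0, 5.0, 3.0, 2.0],
     [2.0, 3.0, 6.0, 3.0],
     [0.0, 2.0, 3.0, 6.0]]
L = static_pipeline_cholesky([row[:] for row in A], num_threads=3)
```

Every dependency of the work on row `m` at step `k` belongs to the same step and a smaller row index, or to an earlier step on the same row, so threads that process their rows in (step, row) order cannot deadlock. The price of this fixed assignment is that a thread idles whenever the tile it needs is late, which is the stall the slide refers to.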

  7. Tile Algorithms: QR (& LU) Factorizations

  8. Tile Algorithms: QR & LU. Tile QR (& LU) Factorization
     [Pseudocode listing lost in extraction.]
     ⋆ Different from the block algorithm.
     ⋆ Derived from out-of-core algorithm.
     ⋆ Input matrix stored and processed by square tiles.
     ⋆ Complex DAG.
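Since the listing on this slide is unreadable here, the following sketch only enumerates the tasks of a tile QR factorization rather than performing any arithmetic. It assumes the loop nest and kernel names commonly used in the tile QR literature (GEQRT, LARFB, TSQRT, SSRFB); the function name `tile_qr_tasks` and the tuple layout are hypothetical.

```python
from collections import Counter

def tile_qr_tasks(nt):
    """List the kernel calls of a tile QR on an nt x nt grid of tiles.

    Each task is (kernel, step k, tile row, tile column).
    """
    tasks = []
    for k in range(nt):
        tasks.append(("GEQRT", k, k, k))             # factor the diagonal tile
        for n in range(k + 1, nt):
            tasks.append(("LARFB", k, k, n))         # apply it along tile row k
        for m in range(k + 1, nt):
            tasks.append(("TSQRT", k, m, k))         # fold tile (m, k) into R
            for n in range(k + 1, nt):
                tasks.append(("SSRFB", k, m, n))     # update tiles (k, n) and (m, n)
    return tasks

def kernel_counts(nt):
    """Number of calls per kernel for an nt x nt tile grid."""
    return Counter(kernel for kernel, _, _, _ in tile_qr_tasks(nt))

counts = kernel_counts(4)
```

The SSRFB updates dominate as the tile count grows (their number grows cubically, the others at most quadratically), and each one depends on a TSQRT from the same step, which gives a sense of why the resulting DAG is described as complex.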

  9. Tile Algorithms: QR & LU. Tile QR Factorization - Static pipeline
     [Code listing lost in extraction.]
     ⋆ Work partitioned in one dimension (by block-rows).
     ⋆ Cyclic assignment of work across all steps of the factorization (pipelining of factorization steps).
     ⋆ Process tracking by a global progress table.
     ⋆ Stall on dependencies (busy waiting).

  10. Experimental environment

  11. Experimental environment: Libraries

  12. Experimental environment: Libraries
     ⋆ LAPACK:
        ◮ LAPACK 3.2 on Intel machine;
        ◮ LAPACK 3.1.1 on IBM machine;
     ⋆ ScaLAPACK:
        ◮ ScaLAPACK 1.8.0;
     ⋆ Vendor libraries:
        ◮ Intel MKL 10.1;
        ◮ IBM ESSL 4.3;
        ◮ IBM PESSL 3.3;
     ⋆ Tile algorithms:
        ◮ PLASMA;
        ◮ TBLAS.
