Communication-avoiding sparse direct solvers for linear systems


  1. Communication-avoiding sparse direct solvers for linear systems & graph problems. Piyush Sao (ORNL), Sherry Li (LBNL), Ramki Kannan (ORNL), Richard (Rich) Vuduc. February 28, 2020, CSL Student Conference @ UIUC. Slides are here right now, get ’em while they’re hot 🔦: hpcgarage.org/uiuc20

  2. “The problems are solved, not by giving new information, but by arranging what we have known since long.” — Ludwig Wittgenstein

  3. https://twitter.com/aydoz/status/1130559070627348480

  4. Sparse linear system: $Ax = b$, with $\mathrm{nnz}(A) = O(N)$ (hpcgarage.org/pp20)
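
As a small illustration of the sparsity assumption on this slide, the sketch below builds a tridiagonal matrix in compressed sparse row (CSR) form and checks that its nonzero count grows linearly with N. The tridiagonal test matrix and the helper names `tridiagonal_csr` and `csr_matvec` are assumptions chosen for illustration; they do not come from the talk.

```python
def tridiagonal_csr(n):
    """Return (values, col_idx, row_ptr) for the n-by-n matrix tridiag(-1, 2, -1)."""
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:  # skip entries that fall outside the matrix
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x, touching only the stored nonzeros."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


for n in (10, 100, 1000):
    values, _, _ = tridiagonal_csr(n)
    print(n, len(values))  # nnz = 3n - 2, which is O(N)
```

Storage and matrix-vector work both scale with the number of nonzeros rather than with N², which is why sparsity matters for large systems.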

  5. LU factorization: $PA = LU$, where $L$ = unit lower triangular, $U$ = upper triangular, $P$ = permutation (pivoting)
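
To make the factorization $PA = LU$ concrete, here is a minimal dense sketch in plain Python using partial (row) pivoting. It is a textbook illustration only, not the sparse, distributed algorithm used by SuperLU_DIST; the function names `lu_partial_pivoting` and `solve` are hypothetical.

```python
def lu_partial_pivoting(a):
    """Factor a square matrix so that P A = L U; returns (perm, L, U).

    perm[i] is the index of the row of A that becomes row i of P A.
    """
    n = len(a)
    u = [row[:] for row in a]          # working copy, becomes U
    l = [[0.0] * n for _ in range(n)]  # becomes unit lower triangular L
    perm = list(range(n))
    for k in range(n):
        # Pivot: largest-magnitude entry in column k, at or below the diagonal.
        p = max(range(k, n), key=lambda i: abs(u[i][k]))
        if u[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            u[k], u[p] = u[p], u[k]
            l[k], l[p] = l[p], l[k]
            perm[k], perm[p] = perm[p], perm[k]
        l[k][k] = 1.0
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / u[k][k]
            for j in range(k, n):
                u[i][j] -= l[i][k] * u[k][j]
    return perm, l, u


def solve(a, b):
    """Solve A x = b via P A = L U: forward substitution, then back substitution."""
    perm, l, u = lu_partial_pivoting(a)
    n = len(a)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(l[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(u[i][j] * x[j] for j in range(i + 1, n))) / u[i][i]
    return x


a = [[0.0, 2.0, 1.0], [1.0, 1.0, 0.0], [2.0, 1.0, 1.0]]
b = [7.0, 3.0, 7.0]  # chosen so that the exact solution is [1, 2, 3]
print(solve(a, b))
```

The zero in the top-left corner of `a` forces a row swap at the first step, which is the role of $P$ on the slide.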

  6. Baseline: SuperLU_DIST, X. Li & J. Demmel (ACM TOMS ’03). “2D” algorithm (strong scaling): 32x procs → 2x speedup. [Figure: performance in Teraflop/s (color scale 0.4 to 2.0); x-axis: 2D-grid MPI processes, 24 to 768 (2D process grid; 4 cores / process); y-axis: P_z, 1 to 32.]
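
The headline number on this slide can be restated as a parallel efficiency. The short sketch below uses only the two figures quoted on the slide (32x more processes, 2x speedup); the helper name `parallel_efficiency` is an assumption for illustration.

```python
def parallel_efficiency(speedup, process_ratio):
    """Strong-scaling efficiency: achieved speedup divided by the ideal speedup."""
    return speedup / process_ratio


eff = parallel_efficiency(speedup=2.0, process_ratio=32.0)
print(f"{eff:.2%}")  # 6.25%
```

In other words, under ideal strong scaling 32x the processes would give a 32x speedup; the 2D baseline delivers about one sixteenth of that.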

  7. Baseline: SuperLU_DIST, X. Li & J. Demmel (ACM TOMS ’03). Example: $P_x \times P_y = 96$ (best configuration shown). 32x procs → 2x speedup. [Figure: performance in Teraflop/s (color scale 0.4 to 2.0); x-axis: 2D-grid MPI processes, 24 to 768 (2D process grid; 4 cores / process); y-axis: P_z, 1 to 32.]

  9. Communication-avoiding idea: for matrix multiplication, $C \mathrel{+}= A \cdot B$, on $P$ processors. [Figure: matrices A, B, and C.]
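
The slide states only the problem (C += A ⋅ B on P processors). As a hedged baseline for what "communication" means here, the sketch below simulates a 2D block-distributed multiply on a q × q logical process grid (P = q²) inside a single Python process, and counts how many matrix words each simulated process would need to receive from others. The grid layout, the block-panel loop, the word-count model, and the name `simulate_2d_matmul` are illustrative assumptions, not the algorithm presented in the talk.

```python
def simulate_2d_matmul(a, b, c, q):
    """Simulate C += A @ B on a q-by-q process grid (n must be divisible by q).

    Simulated process (r, s) owns block (r, s) of A, B, and C.
    Returns a q-by-q table of words each process receives from other processes.
    """
    n = len(a)
    assert n % q == 0, "n must be divisible by q"
    nb = n // q  # block dimension
    words_received = [[0] * q for _ in range(q)]
    for r in range(q):
        for s in range(q):
            for t in range(q):
                # Process (r, s) needs A block (r, t) and B block (t, s).
                if t != s:
                    words_received[r][s] += nb * nb  # A block owned by (r, t)
                if t != r:
                    words_received[r][s] += nb * nb  # B block owned by (t, s)
                # Local block update: C[r, s] += A[r, t] @ B[t, s].
                for i in range(r * nb, (r + 1) * nb):
                    for j in range(s * nb, (s + 1) * nb):
                        acc = 0.0
                        for k in range(t * nb, (t + 1) * nb):
                            acc += a[i][k] * b[k][j]
                        c[i][j] += acc
    return words_received


n, q = 4, 2
a = [[float(i * n + j) for j in range(n)] for i in range(n)]
b = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
c = [[0.0] * n for _ in range(n)]
words = simulate_2d_matmul(a, b, c, q)
print(words[0][0])  # 2 * (q - 1) * (n // q) ** 2 = 8
```

Under this model each process receives 2(q − 1)(n/q)² words, roughly 2n²/√P, so the per-process communication volume shrinks only like 1/√P while the per-process arithmetic shrinks like 1/P. That gap is the kind of cost a communication-avoiding formulation aims to reduce.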
