Utilizing machine topology in numerical algorithms
Luke Olson
Department of Computer Science
Computational Science and Engineering
University of Illinois at Urbana-Champaign
Overview
- I use Blue Waters to solve large sparse linear systems and to study the
performance
- Why?
A x = b
[Figure: Time step, Linearization, Assembly, Solve, Adapt]
- ther
Application: Plasma-coupled Combustion
- The Center for Exascale Simulation of Plasma-Coupled Combustion
@ Illinois
XPACC
[Figure labels: Oxidizer: O₂-Ar; Air. Fuel: H₂; CH₄]
- Electric conductivity influences the electric field and current density over time
$\nabla \cdot (\sigma \nabla \phi) = g$
- The electric field is a key element in the plasma arc
- 30+ meshes
- Image credit: Kyle Mckay @ Illinois (joining LLNL)
Why Blue Waters?
- Sparse matrix operations are communication dominated, so performance models
play a key role.
- Exploiting machine layout plays an important role in addressing bottlenecks in
communication.
- Blue Waters has enabled us to develop, test, and scale codes to address these issues.
[Figure panels: Structured Multigrid; Algebraic Multigrid]
Sparsity (a.k.a. data relationships)
Multilevel solvers for solving Ax = b
- Series or hierarchy of successively smaller (and more dense) problems
- Iteratively annihilate the error in the solution through this hierarchy of problems
Problem: Ax = b
Constructing the solver: ~SpMM + several SpMVs
Applying the solver: ~many SpMVs
SpMM: A ∗ B
SpMV: A ∗ v
Sparse Matrices
- Complexity: O(nnz)
- This is CSR; other formats are similar in cost, though not in memory access
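A =
    1.1   0    2.2   0     0
    0     3.3  0     4.4   0
    5.5   0    6.6   0     7.7
    0     8.8  0     9.9   0
    0     0    0     10.0  11.1

Arowptr = [0 2 4 7 9 11]
Acol = [0 2 1 3 0 2 4 1 3 3 4]
Aval = [1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9 10.0 11.1]

    /* CSR sparse matrix-vector product. The loop accumulates into y,
       so y must start at zero to obtain y = A * x. */
    for (i = 0; i < n; i++) {
        sum = y[i];
        /* nonzeros of row i are stored at positions Arowptr[i] .. Arowptr[i+1]-1 */
        for (jj = Arowptr[i]; jj < Arowptr[i+1]; jj++) {
            sum += Aval[jj] * x[Acol[jj]];
        }
        y[i] = sum;
    }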
y = A ∗ x
Key Challenge: Parallel Efficiency of Sparse Operations
- Solid blocks: on-process portion
- Patterned blocks: off-process portion (requires communication of the input vector)
[Figure: w ← A ∗ v with the rows of w, A, and v distributed across processes P0 to P3; panels: data layout, where data is sent]
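The on-process / off-process split can be made concrete with a small serial sketch. It assumes a contiguous row partition in which the input vector is partitioned the same way as the rows; the names `spmv_split` and `split_rows` are hypothetical and not taken from any library. The CSR arrays are those of the 5x5 example matrix from the Sparse Matrices slide.

```c
#include <stdlib.h>

/* CSR structure of the 5x5 example matrix (values are not needed here) */
const int Arowptr[] = {0, 2, 4, 7, 9, 11};
const int Acol[]    = {0, 2, 1, 3, 0, 2, 4, 1, 3, 3, 4};

typedef struct {
    int on_proc;   /* nonzeros that multiply locally owned entries of x */
    int off_proc;  /* nonzeros that multiply entries of x owned elsewhere */
    int recv;      /* distinct off-process entries of x to receive (-1 on allocation failure) */
} spmv_split;

/* Classify the nonzeros of rows [row_start, row_end), assuming the owner of
   those rows also owns x[row_start .. row_end-1] and nothing else. */
spmv_split split_rows(const int *rowptr, const int *col, int n_cols,
                      int row_start, int row_end)
{
    spmv_split s = {0, 0, 0};
    char *seen = calloc((size_t)n_cols, 1);   /* marks columns already counted in recv */
    if (!seen) {
        s.recv = -1;
        return s;
    }
    for (int i = row_start; i < row_end; i++) {
        for (int jj = rowptr[i]; jj < rowptr[i + 1]; jj++) {
            int j = col[jj];
            if (j >= row_start && j < row_end) {
                s.on_proc++;
            } else {
                s.off_proc++;
                if (!seen[j]) {
                    seen[j] = 1;
                    s.recv++;
                }
            }
        }
    }
    free(seen);
    return s;
}
```

Calling `split_rows(Arowptr, Acol, 5, 2, 4)` describes a process that owns rows 2 and 3. The `recv` field is the number of vector entries that must arrive before the off-process part of the product can be computed.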
Blue Waters Case Study: Laplacian
[Figure: setup time by level in AMG hierarchy (levels 1 to 8), split into Comp, Vector Comm, and Matrix Comm]
[Figure: solve time by level in AMG hierarchy (levels 1 to 8), split into Comp and Vector Comm]
8192 processors, 512 nodes, ~200 rows per processor
How do we address this?
- 1. Remove data
- 2. Data layout
- 3. Data partition
- 4. Data traffic
- R. D. Falgout and J. B. Schroder, “Non-Galerkin coarse grids for algebraic multigrid,”
- E. Treister and I. Yavneh, “Non-Galerkin multigrid based on sparsified smoothed ag-
- A. Bienz, R. D. Falgout, W. Gropp, L. N. Olson, and J. B. Schroder, “Reducing parallel
- Reorganize communication
- Recognize limits of communication
- Recognize opportunities in the machine hierarchy
Observation 1: high volume/number of messages
[Figure, np = 16384: maximum number of messages by AMG level; maximum message size (bytes) by AMG level]
Observation 2: diminishing returns with more communicating cores
[Figure: time (seconds) vs. number of bytes communicated between node n and node m; series: Network (PPN ≥ 4), Network (PPN < 4), On-Node, On-Socket]
$T = \alpha + \dfrac{\mathrm{ppn} \cdot s}{\min(R_N,\ \mathrm{ppn} \cdot R_B)}$
- α: latency
- s: message size
- R_B: bandwidth between two processes
- R_N: node injection bandwidth
Modeling MPI Communication Performance on SMP Nodes: Is it Time to Retire the Ping Pong Test, Gropp, Olson, Samfass, EuroMPI 2016.
- Concurrency increasing
- Hierarchy of compute nodes (sockets, nodes, etc.)
- Range of compute units (POWER9, GPU, etc.)
- Blue Waters is providing a roadmap
[Figure: machine hierarchy: node, socket, die, cores]
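The model under Observation 2, T = α + (ppn · s) / min(R_N, ppn · R_B), can be evaluated directly to see where adding communicating processes stops helping. The sketch below is a minimal illustration: the function name `max_rate_time` is hypothetical, and any parameter values passed to it are placeholders rather than measured Blue Waters numbers.

```c
/* Modeled time for ppn processes on one node to each send s bytes off-node.
   alpha : latency (seconds)
   s     : message size per process (bytes)
   ppn   : number of communicating processes per node
   R_N   : node injection bandwidth (bytes/second)
   R_B   : bandwidth between two processes (bytes/second) */
double max_rate_time(double alpha, double s, int ppn, double R_N, double R_B)
{
    double rate = ppn * R_B;      /* aggregate rate if the processes did not contend */
    if (R_N < rate) {
        rate = R_N;               /* capped by what the node can inject */
    }
    return alpha + (ppn * s) / rate;
}
```

With illustrative values α = 1e-6 s, s = 1e6 bytes, R_B = 1e9 bytes/s, and R_N = 4e9 bytes/s, the modeled time is the same for ppn = 1 and ppn = 4, because each process still gets its full rate. It doubles at ppn = 8, once the injection limit R_N is reached.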
Observation 3: node locality
A node-level approach to SpMV
Six processes (P0 to P5) distributed across three nodes (N0 to N2)
Linear system distributed across the processes
[Figure: w ← A ∗ v with rows distributed across P0 to P5]
Standard Communication
[Figure: processes p and q on nodes n and m exchanging messages directly; legend: node, core]
- Duplicate data
- Many messages
3-step Communication
[Figure: nodes n and m, processes p and q]
2-step Communication
[Figure: nodes n and m]
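A rough way to compare the three schemes is to count the messages that cross the network. The sketch below rests on several assumptions: ranks are placed contiguously, so a process p sits on node p / ppn. In the 3-step scheme, all data from node n to node m travels as one message. In the 2-step scheme, each process sends one message per destination node. The types and the function `count_internode` are hypothetical and not part of any library.

```c
#include <stdlib.h>

typedef struct { int src; int dst; } msg;   /* one point-to-point message in the standard scheme */

typedef struct {
    int standard;    /* one inter-node message per (process, process) pair */
    int three_step;  /* one inter-node message per (node, node) pair */
    int two_step;    /* one inter-node message per (process, node) pair */
} msg_counts;

/* Count inter-node messages under each scheme. Returns all zeros if the
   scratch arrays cannot be allocated. */
msg_counts count_internode(const msg *msgs, int n_msgs, int n_procs, int ppn)
{
    msg_counts c = {0, 0, 0};
    int n_nodes = (n_procs + ppn - 1) / ppn;
    char *node_pair = calloc((size_t)n_nodes * n_nodes, 1);  /* node a sends to node b */
    char *proc_node = calloc((size_t)n_procs * n_nodes, 1);  /* process p sends to node b */
    if (!node_pair || !proc_node) {
        free(node_pair);
        free(proc_node);
        return c;
    }
    for (int k = 0; k < n_msgs; k++) {
        int a = msgs[k].src / ppn;   /* source node */
        int b = msgs[k].dst / ppn;   /* destination node */
        if (a == b) {
            continue;                /* on-node traffic does not cross the network */
        }
        c.standard++;
        if (!node_pair[a * n_nodes + b]) {
            node_pair[a * n_nodes + b] = 1;
            c.three_step++;
        }
        if (!proc_node[msgs[k].src * n_nodes + b]) {
            proc_node[msgs[k].src * n_nodes + b] = 1;
            c.two_step++;
        }
    }
    free(node_pair);
    free(proc_node);
    return c;
}
```

The counts capture only the number of inter-node messages. The on-node gather and redistribution steps and the message sizes, which decide which scheme wins in practice, are left out.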
[Figure: total time and strong scaling, compared with a reference SpMV]
Node aware sparse matrix-vector multiplication, Bienz, Gropp, Olson, JPDC, 2019.
Case Study: 3 step communication, Linear Elasticity
Impact on SpMM and SpMV
[Figure: time (seconds) by level in AMG hierarchy for Standard, 3-Step Node-Aware, and 2-Step Node-Aware communication; panels: Row-Wise SpGEMM (AP) and SpMV (Ax)]
Performance Results: Strong Scaling
- MFEM Grad-Div problem; Blue Waters; 356,352 rows; 14,145,024 nnz
- New algorithms + performance models = extended scaling
[Figure: time and speedup with automatic selection]
Reducing Communication in Algebraic Multigrid with Multi-step Node Aware Communication, Bienz, Gropp, Olson, https://arxiv.org/abs/1904.05838
Node-Aware MPI Library
[Figure: communication patterns between nodes n and m, processes p and q]
Impact of Blue Waters
- Amanda Bienz, PhD (2018)
reducing communication
- Andrew Reisner, PhD (2019)
scalable structured solvers
- Lukas Spies, PhD (current)
communication and multi-GPU systems
- Philipp Samfass, MS (2016)
performance modeling
- Shelby Lockhart, PhD (current)
high performance Krylov methods
- John Calhoun, PhD (2017, BW Fellow!)
fault resilience in HPC
- Blue Waters has been an invaluable
recruitment tool for both students and faculty
- Blue Waters has directly contributed
to the visibility and quality of the research
- Blue Waters has been a gateway to
developing new codes, testing new methods, and anticipating new and upcoming architectures.
Where to find more
- github.com/cedar-framework/cedar
- Scaling Structured Multigrid to 500K+ Cores through Coarse-Grid Redistribution
- github.com/raptor-library/raptor
- Node-Aware Sparse Matrix-Vector Communication
- Improving Performance Models for Irregular Point-to-Point Communication
- Reducing Communication in Algebraic Multigrid with Multi-step Node Aware Communication
- github.com/bienz2/Node_Aware_MPI
This material is based in part upon work supported by the Department of Energy, National Nuclear Security Administration, under Award Number DE-NA0002374. This research is part of the Blue Waters sustained petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.