Utilizing machine topology in numerical algorithms Luke Olson - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Utilizing machine topology in numerical algorithms

Luke Olson Department of Computer Science Computational Science and Engineering University of Illinois at Urbana-Champaign
slide-2
SLIDE 2

Overview

  • I use Blue Waters to solve large sparse linear systems and to study the performance

  • Why?

$Ax = b$

[Figure: simulation stages: Time step, Linearization, Assembly, Solve, Adapt]

  • ther
Image: David Keyes??
slide-3
SLIDE 3

Application: Plasma-coupled Combustion

  • The Center for Exascale Simulation of Plasma-Coupled Combustion (XPACC) @ Illinois

[Figure: Oxidizer: O2-Ar; Air. Fuel: H2; CH4]
slide-4
SLIDE 4

Application: Plasma-coupled Combustion

  • The Center for Exascale Simulation of Plasma-Coupled Combustion (XPACC) @ Illinois
  • Electric conductivity influences the electric field and current density over time

$\nabla \cdot \sigma \nabla \phi = g$
slide-5
SLIDE 5

Application: Plasma-coupled Combustion

  • The Center for Exascale Simulation of Plasma-Coupled Combustion (XPACC) @ Illinois
  • The electric field is a key element in the plasma arc
  • 30+ meshes
  • Image credit: Kyle Mckay @ Illinois (joining LLNL)


slide-6
SLIDE 6

Why Blue Waters?

  • Sparse matrix operations are communication dominated; performance models play a key role.
  • Exploiting machine layout plays an important role in addressing bottlenecks in communication.
  • Blue Waters has enabled us to develop, test, and scale codes to address these issues.

[Figure panels: Structured Multigrid, Algebraic Multigrid]

slide-7
SLIDE 7

Sparsity (a.k.a. data relationships)

slide-8
SLIDE 8

Multilevel solvers for solving $Ax = b$

  • Series or hierarchy of successively smaller (and more dense) problems
  • Iteratively annihilate the error in the solution through this hierarchy of problems

Problem: $A$
Constructing the solver: ~SpMM + several SpMVs
Applying the solver: ~many SpMVs

SpMM: $A * B$
SpMV: $A * v$
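
The idea of removing error through a hierarchy of smaller problems can be sketched with only two levels. The block below is a minimal illustration, not the solver used in this work: it assumes a 1D Poisson model problem (tridiagonal matrix), a weighted-Jacobi smoother, full-weighting restriction, linear interpolation, and an exact coarse solve. The grid size `NF` and every function name are hypothetical. Unlike the algebraic coarse levels described above, the coarse matrix in this geometric stand-in keeps the same sparsity as the fine one.

```c
#include <assert.h>

#define NF 63               /* fine-grid interior points (hypothetical size) */
#define NC ((NF - 1) / 2)   /* coarse-grid interior points */

/* r = f - A u, with A = tridiag(-1, 2, -1) / h^2 and zero boundary values. */
static void residual(int n, double h, const double *u, const double *f, double *r) {
    for (int i = 0; i < n; i++) {
        double left  = (i > 0)     ? u[i - 1] : 0.0;
        double right = (i < n - 1) ? u[i + 1] : 0.0;
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h);
    }
}

/* One weighted-Jacobi sweep (weight 2/3): u += (2/3) D^{-1} (f - A u). */
static void smooth(double h, double *u, const double *f) {
    double r[NF];
    residual(NF, h, u, f, r);
    for (int i = 0; i < NF; i++) u[i] += (2.0 / 3.0) * (h * h / 2.0) * r[i];
}

/* Exact coarse solve of tridiag(-1, 2, -1) / h^2 * e = r (Thomas algorithm). */
static void coarse_solve(double h, const double *r, double *e) {
    double c[NC], d[NC];
    c[0] = -0.5;
    d[0] = 0.5 * h * h * r[0];
    for (int i = 1; i < NC; i++) {
        double m = 2.0 + c[i - 1];
        c[i] = -1.0 / m;
        d[i] = (h * h * r[i] + d[i - 1]) / m;
    }
    e[NC - 1] = d[NC - 1];
    for (int i = NC - 2; i >= 0; i--) e[i] = d[i] - c[i] * e[i + 1];
}

/* One two-grid cycle: smooth, restrict the residual, solve, interpolate, smooth. */
static void two_grid_cycle(double h, double *u, const double *f) {
    double r[NF], rc[NC], ec[NC];
    smooth(h, u, f);
    residual(NF, h, u, f, r);
    for (int i = 0; i < NC; i++)   /* full-weighting restriction */
        rc[i] = 0.25 * r[2 * i] + 0.5 * r[2 * i + 1] + 0.25 * r[2 * i + 2];
    coarse_solve(2.0 * h, rc, ec);
    for (int i = 0; i < NC; i++) { /* linear interpolation of the correction */
        u[2 * i]     += 0.5 * ec[i];
        u[2 * i + 1] += ec[i];
        u[2 * i + 2] += 0.5 * ec[i];
    }
    smooth(h, u, f);
}

/* Max-norm residual after `cycles` cycles divided by the initial one (f = 1, u = 0). */
static double residual_reduction(int cycles) {
    double u[NF] = {0.0}, f[NF], r[NF], r0 = 0.0, rk = 0.0;
    double h = 1.0 / (NF + 1);
    assert(cycles >= 0);
    for (int i = 0; i < NF; i++) f[i] = 1.0;
    residual(NF, h, u, f, r);
    for (int i = 0; i < NF; i++) { double a = r[i] < 0 ? -r[i] : r[i]; if (a > r0) r0 = a; }
    for (int k = 0; k < cycles; k++) two_grid_cycle(h, u, f);
    residual(NF, h, u, f, r);
    for (int i = 0; i < NF; i++) { double a = r[i] < 0 ? -r[i] : r[i]; if (a > rk) rk = a; }
    return rk / r0;
}
```

Calling `residual_reduction(k)` for increasing `k` shows how quickly the residual shrinks. Each cycle costs a handful of sparse matrix-vector products, which is why SpMV performance dominates the solve phase.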
slide-9
SLIDE 9

Sparse Matrices

  • Complexity: O(nnz)
  • This is CSR; other formats are similar (in cost, not memory access)

A =
    [ 1.1   0     2.2   0     0    ]
    [ 0     3.3   0     4.4   0    ]
    [ 5.5   0     6.6   0     7.7  ]
    [ 0     8.8   0     9.9   0    ]
    [ 0     0     0     10.0  11.1 ]

Arowptr = [0 2 4 7 9 11]
Acol = [0 2 1 3 0 2 4 1 3 3 4]
Aval = [1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9 10.0 11.1]

    /* sum starts from y[i], so the loop accumulates A*x into y;
       set y to zero first to get y = A*x */
    for(i = 0; i < n; i++){
      sum = y[i];
      /* nonzeros of row i are stored at positions Arowptr[i] .. Arowptr[i+1]-1 */
      for(jj = Arowptr[i]; jj < Arowptr[i+1]; jj++){
        sum += Aval[jj] * x[ Acol[jj] ];
      }
      y[i] = sum;
    }

$y = A * x$
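
As a self-contained version of the loop above, the sketch below wraps it in a function and attaches the 5 x 5 example arrays from this slide. The function name `csr_spmv` is hypothetical, and, unlike the slide's loop, it starts each row sum from zero so that it computes y = A * x directly.

```c
#include <assert.h>

/* y = A * x for an n-row matrix stored in CSR form (rowptr, col, val). */
static void csr_spmv(int n, const int *rowptr, const int *col,
                     const double *val, const double *x, double *y) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        double sum = 0.0;  /* overwrite y[i] instead of accumulating into it */
        for (int jj = rowptr[i]; jj < rowptr[i + 1]; jj++)
            sum += val[jj] * x[col[jj]];  /* one multiply-add per stored nonzero */
        y[i] = sum;
    }
}

/* The 5 x 5 example from the slide. */
static const int    Arowptr[] = {0, 2, 4, 7, 9, 11};
static const int    Acol[]    = {0, 2, 1, 3, 0, 2, 4, 1, 3, 3, 4};
static const double Aval[]    = {1.1, 2.2, 3.3, 4.4, 5.5, 6.6,
                                 7.7, 8.8, 9.9, 10.0, 11.1};
```

With x set to all ones, `csr_spmv(5, Arowptr, Acol, Aval, x, y)` returns the row sums of A in y. The inner loop touches each stored value once, which is where the O(nnz) cost comes from. The indirect access `x[col[jj]]` is the part that differs between storage formats.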
slide-10
SLIDE 10

Key Challenge: Parallel Efficiency of Sparse Operations

  • Solid blocks: on-process portion
  • Patterned blocks: off-process portion (requires communication of the input vector)

[Figure: $w \leftarrow A * v$ distributed across processes; panels: Data layout, Where data is sent]
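
To make the on-process and off-process split concrete, the sketch below counts both kinds of nonzeros for the rows held by one process. It assumes each process owns a contiguous range of vector entries `[first_col, last_col)` and stores its rows in CSR with global column indices. The type `nnz_split` and the function `split_nnz` are hypothetical names.

```c
#include <assert.h>

typedef struct {
    int on_proc;   /* nonzeros whose vector entry is already local */
    int off_proc;  /* nonzeros whose vector entry must be received */
} nnz_split;

/* Classify the nonzeros of n_local CSR rows by who owns the matching
 * entry of the input vector. Columns are global indices. */
static nnz_split split_nnz(int n_local, const int *rowptr, const int *col,
                           int first_col, int last_col) {
    nnz_split s = {0, 0};
    assert(n_local >= 0 && first_col <= last_col);
    for (int i = 0; i < n_local; i++) {
        for (int jj = rowptr[i]; jj < rowptr[i + 1]; jj++) {
            if (col[jj] >= first_col && col[jj] < last_col)
                s.on_proc++;
            else
                s.off_proc++;
        }
    }
    return s;
}
```

Only the `off_proc` entries require communication of the input vector. In a real code the distinct off-process columns would also be collected, since each needed vector entry is received once no matter how many nonzeros reference it.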

slide-11
SLIDE 11

Blue Waters Case Study: Laplacian

[Figure: Setup Time versus Level in AMG Hierarchy (levels 1 to 8); series: Comp, Vector Comm, Matrix Comm]

[Figure: Solve Time versus Level in AMG Hierarchy (levels 1 to 8); series: Comp, Vector Comm]

8192 processors, 512 nodes, ~200 rows per processor

slide-12
SLIDE 12

How do we address this?

  • 1. Remove data

  • 2. Data layout

  • 3. Data partition

  • 4. Data traffic
  • R. D. Falgout and J. B. Schroder, “Non-Galerkin coarse grids for algebraic multigrid,” SISC, 2014
  • E. Treister and I. Yavneh, “Non-Galerkin multigrid based on sparsified smoothed aggregation,” SISC, 2015
  • A. Bienz, R. D. Falgout, W. Gropp, L. N. Olson, and J. B. Schroder, “Reducing parallel communication in algebraic multigrid through sparsification,” SISC, 2016
  • Metis, Scotch, Zoltan
  • Bowman, Wolf, “A Nested Dissection Partitioning Method for Parallel Sparse Matrix-Vector Multiplication”
  • Graph reorderings
  • Redistribution
  • Gahvari, Gropp, Jordan, Schulz, Meier Yang, “Systematic Reduction of Data Movement in Algebraic Multigrid Solvers,” 2014
  • Reorganize communication
  • Recognize limits of communication
  • Recognize opportunities in the machine hierarchy

slide-13
SLIDE 13

Observation 1: high volume/number of messages

[Figure: Max Number of Messages versus AMG Level, and Max Message Size (bytes) versus AMG Level; np = 16384]

slide-14
SLIDE 14

Observation 2: diminishing returns with higher communicating cores

[Figure: Time (seconds) versus Number of Bytes Communicated between node n and node m; series: Network (PPN ≥ 4), Network (PPN < 4), On-Node, On-Socket]

$$T = \alpha + \frac{\mathrm{ppn} \cdot s}{\min(R_N,\ \mathrm{ppn} \cdot R_B)}$$

where $\alpha$ is the latency, $s$ is the message size, $R_B$ is the bandwidth between two processes, and $R_N$ is the node injection bandwidth.

Modeling MPI Communication Performance on SMP Nodes: Is it Time to Retire the Ping Pong Test, Gropp, Olson, Samfass, EuroMPI 2016.
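
The model above is easy to evaluate directly. The sketch below is a plain transcription of the formula, reading ppn as the number of processes on a node that each send s bytes (an assumption about the intended usage). The function name `max_rate_time` is hypothetical, and any parameter values passed in are illustrative rather than measured.

```c
#include <assert.h>

/* T = alpha + ppn * s / min(R_N, ppn * R_B)
 * alpha: latency (seconds), s: message size (bytes),
 * R_N: node injection bandwidth (bytes/second),
 * R_B: bandwidth between two processes (bytes/second). */
static double max_rate_time(double alpha, double s, int ppn,
                            double R_N, double R_B) {
    assert(ppn > 0 && R_N > 0.0 && R_B > 0.0);
    double aggregate = ppn * R_B;                      /* what the senders could push */
    double rate = (aggregate < R_N) ? aggregate : R_N; /* capped by node injection */
    return alpha + (ppn * s) / rate;
}
```

Once `ppn * R_B` exceeds `R_N`, adding more communicating processes no longer raises the achieved rate, so the time grows linearly with ppn. That cap is the diminishing return named in the slide title.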
slide-15
SLIDE 15

Observation 3: node locality

  • Concurrency is increasing
  • Hierarchy of compute nodes (sockets, nodes, etc.)
  • Range of compute units (POWER9, GPU, etc.)
  • Blue Waters is providing a roadmap

[Figure: machine hierarchy: node, socket, die, cores]

slide-16
SLIDE 16

A node level approach to a SpMV

[Figure: processes P0 to P5 on nodes N0 to N2]

  • Six processes distributed across three nodes
  • Linear system distributed across the processes

[Figure: w, A, v partitioned across P0 to P5]
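
The layout on this slide can be written down in a few lines. The sketch below assumes a contiguous block-row partition of the linear system and consecutive ranks on each node (P0 and P1 on N0, P2 and P3 on N1, P4 and P5 on N2). Both are assumptions for illustration, and `block_rows` and `node_of` are hypothetical names.

```c
#include <assert.h>

typedef struct { int first; int last; } row_range;  /* half-open: [first, last) */

/* Rows owned by `rank` when n rows are split into nprocs contiguous blocks;
 * the first (n % nprocs) ranks get one extra row. */
static row_range block_rows(int n, int nprocs, int rank) {
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    int base = n / nprocs;
    int extra = n % nprocs;
    row_range r;
    r.first = rank * base + (rank < extra ? rank : extra);
    r.last = r.first + base + (rank < extra ? 1 : 0);
    return r;
}

/* Node that hosts `rank`, assuming ppn consecutive ranks per node. */
static int node_of(int rank, int ppn) {
    assert(rank >= 0 && ppn > 0);
    return rank / ppn;
}
```

With six processes and `ppn = 2`, `node_of` reproduces the three-node layout in the figure. Whether a message stays on a node or crosses the network then reduces to comparing `node_of` for the sender and the receiver.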

slide-17
SLIDE 17

Standard Communication

[Figure: nodes n and m, cores p and q]

  • Duplicate data
  • Many messages
slide-18
SLIDE 18

3-step Communication

[Figure: nodes n and m, cores p and q]

slide-19
SLIDE 19

2-step Communication

[Figure: nodes n and m]
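
A rough way to compare the three schemes is to count inter-node messages for a given communication pattern. The slides do not spell out the aggregation rules, so the sketch below rests on two assumptions: 3-step sends one aggregated message per ordered pair of nodes, and 2-step sends at most one message from each process to each destination node, with on-node redistribution afterwards. It counts messages only, not bytes, and ignores the extra on-node traffic. All names and the `MAX_P` limit are hypothetical.

```c
#include <assert.h>

#define MAX_P 64  /* upper bound on processes in this sketch */

typedef struct { int standard; int three_step; int two_step; } msg_counts;

/* sends[p * nprocs + q] != 0 means process p sends vector data to process q.
 * Ranks are assumed to be placed on nodes in consecutive groups of ppn. */
static msg_counts count_internode(int nprocs, int ppn, const unsigned char *sends) {
    unsigned char node_pair[MAX_P][MAX_P] = {{0}};  /* node n -> node m seen */
    unsigned char proc_node[MAX_P][MAX_P] = {{0}};  /* process p -> node m seen */
    msg_counts c = {0, 0, 0};
    assert(nprocs > 0 && nprocs <= MAX_P && ppn > 0 && nprocs % ppn == 0);
    for (int p = 0; p < nprocs; p++) {
        for (int q = 0; q < nprocs; q++) {
            int n = p / ppn, m = q / ppn;
            if (!sends[p * nprocs + q] || n == m) continue;  /* skip on-node traffic */
            c.standard++;                        /* one message per process pair */
            if (!node_pair[n][m]) { node_pair[n][m] = 1; c.three_step++; }
            if (!proc_node[p][m]) { proc_node[p][m] = 1; c.two_step++; }
        }
    }
    return c;
}
```

For six processes on three nodes, a pattern in which both processes on one node send to both processes on another yields four standard messages, two under the 2-step assumption, and one under the 3-step assumption. The trade-off is that fewer network messages are paid for with additional on-node copies.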

slide-20
SLIDE 20

Case Study: 3 step communication, Linear Elasticity

[Figure panels: Total Time, Strong Scaling; Time (seconds) versus AMG Level, and Time (seconds) versus Number of Processes (2000 to 18000); series: ref. SpMV, TAPSpMV]

Node aware sparse matrix-vector multiplication, Bienz, Gropp, Olson, JPDC, 2019.

slide-21
SLIDE 21

Impact on SpMM and SpMV

[Figure panels: Row-Wise SpGEMM: AP, and SpMV: Ax; Time (seconds) versus Level in AMG Hierarchy; series: Standard, 3-Step Node-Aware, 2-Step Node-Aware]
slide-22
SLIDE 22

Performance Results: Strong Scaling

  • MFEM Grad-Div problem; Blue Waters; 356,352 rows; 14,145,024 nnz
  • New algorithms + performance models = extended scaling

[Figure panels: Automatic selection, Time, Speedup; axes: Measured Times versus Level in AMG Hierarchy, Time versus Number of Processes, Total Speedup; series: Standard, NAP2, NAP3; RSS, NAP RSS, SAS, NAP SAS, Hypre; RS/(NAP RS), SA/(NAP SA)]

Reducing Communication in Algebraic Multigrid with Multi-step Node Aware Communication, Bienz, Gropp, Olson, https://arxiv.org/abs/1904.05838
slide-23
SLIDE 23

Node-Aware MPI Library

[Figure: nodes n and m, cores p and q]
slide-24
SLIDE 24

Impact of Blue Waters

  • Amanda Bienz, PhD (2018): reducing communication
  • Andrew Reisner, PhD (2019): scalable structured solvers
  • Lukas Spies, PhD (current): communication and multi-GPU systems
  • Philipp Samfass, MS (2016): performance modeling
  • Shelby Lockhart, PhD (current): high performance Krylov methods
  • John Calhoun, PhD (2017, BW Fellow!): fault resilience in HPC

  • Blue Waters has been an invaluable recruitment tool for both students and faculty
  • Blue Waters has directly contributed to the visibility and quality of the research
  • Blue Waters has been a gateway to developing new codes, testing new methods, and anticipating new and upcoming architectures.

slide-25
SLIDE 25

Where to find more

  • github.com/cedar-framework/cedar
  • Scaling Structured Multigrid to 500K+ Cores through Coarse-Grid Redistribution, Reisner, Olson, Moulton, SISC, 2018
  • github.com/raptor-library/raptor
  • Node-Aware Sparse Matrix-Vector Communication, Bienz, Gropp, Olson, JPDC, 2019
  • Improving Performance Models for Irregular Point-to-Point Communication, Bienz, Gropp, Olson, in review EuroMPI, 2018.
  • Reducing Communication in Algebraic Multigrid with Multi-step Node Aware Communication, https://arxiv.org/abs/1904.05838
  • github.com/bienz2/Node_Aware_MPI

This material is based in part upon work supported by the Department of Energy, National Nuclear Security Administration, under Award Number DE-NA0002374. This research is part of the Blue Waters sustained petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.