

SLIDE 1

Improving Virtual Prototyping and Certification with Implicit Finite Element Method at Scale

Seid Koric (1,2), Robert F. Lucas (3), Erman Guleryuz (1)

(1) National Center for Supercomputing Applications
(2) Mechanical Science and Engineering Department, University of Illinois
(3) Livermore Software Technology Corporation

Blue Waters Symposium 2019, June 5th

SLIDE 2

Seid Koric, Erman Guleryuz
Todd Simons, James Ong
Jef Dawson, Ting-Ting Zhu
Robert Lucas, Roger Grimes, Francois-Henry Rouet

SLIDE 3

Overview of the project

• Today: Virtual prototypes supplement physical tests in design and certification
• Vision: Further reduce cost & risk (supplement → replacement)
• Immediate goal: Increase the impact of simulation technology
• Impact of simulation = f(speed, scale, fidelity)
• Performance scaling = f(code, input, machine)
• FEM: Partial differential equations → sparse linear system
• HPC strategy: Sparse linear algebra → dense linear algebra
• Overall approach: Scale-analyze-improve with real-life models

Rolls-Royce Representative Engine Model
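The "partial differential equations → sparse linear system" step above can be sketched in a few lines of SciPy. This is an illustrative 1D Poisson assembly and direct sparse solve, not LS-DYNA's actual pipeline; the problem, mesh, and function names are all made up for the example:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def assemble_1d_poisson(n):
    """Assemble the sparse stiffness matrix K and load vector R for
    -u'' = 1 on (0, 1) with u(0) = u(1) = 0, using n linear elements."""
    h = 1.0 / n
    # Each element contributes (1/h) * [[1, -1], [-1, 1]] to the global
    # matrix, which accumulates into a tridiagonal sparse system.
    main = (2.0 / h) * np.ones(n - 1)
    off = (-1.0 / h) * np.ones(n - 2)
    K = sp.diags([off, main, off], [-1, 0, 1], format="csr")
    R = h * np.ones(n - 1)  # consistent load vector for f = 1
    return K, R

K, R = assemble_1d_poisson(64)
u = spsolve(K, R)  # sparse direct solve, the step HPC work targets
# For this model problem, linear FEM is nodally exact: u(x) = x(1-x)/2
x = np.linspace(0.0, 1.0, 65)[1:-1]
```

The assembled `K` is sparse and banded; at the 100M-DOF scale discussed in these slides, the solve step dominates, which is why the project focuses on the sparse direct solver.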

SLIDE 4

Overview of challenges

• More specific: these apply to LS-DYNA and to any other significant MCAE ISV
  – Large legacy code: cannot start from scratch, must evolve gracefully
  – General-purpose code: cannot optimize for a narrow class of problems
  – Key algorithms are NP-complete/NP-hard, so heuristics are unavoidable
• More universal: these probably apply to any significant scientific or engineering code
  – Limited number of software development tools, especially for performance engineering
  – Increasing complexity of hardware architectures, combined with frequent design updates
  – Performance-portability constraints for codes used on many systems
  – Limited HPC access, especially for ISVs

SLIDE 5

Parallel scaling at the beginning of the Blue Waters project

[Figure: Strong-scaling chart, time (seconds, 2000–14000) vs. MPI ranks (128–2048); 100M-DOF model, three implicit load steps; series: "Before, Hybrid (8 threads/MPI)"]

SLIDE 6

Improvement framework and progress highlights

• Memory management improvements
  – Dynamic allocation
• Existing algorithm improvements
  – Inter-node communication
• Previously unknown bottlenecks
  – Constraint processing
• Entirely new algorithms
  – Parallel matrix reordering
  – Parallel symbolic factorization
• Computation workflow modifications
  – Offline parsing and decomposition of the model

[Diagram: Measure → Analyze → Improve → Scale up cycle]

SLIDE 7

NCSA OVIS view of LS-DYNA execution

[Figure: Free memory (GB, 10–70) on MPI rank zero's node vs. time (minutes, 20–140) during LS-DYNA execution; annotated phases: input processing, reordering, sequential symbolic preprocessing, then assemble/redistribute/factor/solve; 2x 105M-DOF model, 256 MPI ranks, 8 threads each]
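The per-phase memory profile above came from NCSA's OVIS monitoring. As a minimal stand-in for that kind of measurement (not OVIS, and POSIX-only), one can watch a process's peak resident set size around a suspect phase:

```python
import resource

def peak_rss_mb():
    """Peak resident set size of this process, in MB.
    Note: Linux reports ru_maxrss in KB; macOS reports bytes."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

before = peak_rss_mb()
buf = bytearray(50 * 1024 * 1024)  # stand-in for a memory-hungry phase
after = peak_rss_mb()
# Peak RSS is monotonic, so 'after' can never be below 'before';
# a large jump between samples flags the phase for closer analysis.
```

Sampling a metric like this before and after each solver phase is enough to reproduce the shape of the free-memory curve, which is how the sequential symbolic-preprocessing bottleneck became visible.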

SLIDE 8

Multifrontal sparse linear solver

Multifrontal method: input processing → matrix reordering → symbolic factorization → numeric factorization → triangular solution

Assembly tree of submatrices

Sparse linear system (Newton iteration i of an implicit load step):

[ ^{t+Δt}K^{(i−1)} ] { Δu^{(i)} } = { ^{t+Δt}R^{(i−1)} }

[Figure: Multifrontal factorization parallel scaling — numeric factorization rate (Tflop/s, 5–30) and memory footprint per process (GB, 2–20) vs. number of threads]
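The factor-once/solve-many structure of a sparse direct solver can be illustrated with SciPy's `splu` (SuperLU is a supernodal solver, not LS-DYNA's multifrontal code, so this is only an analogous sketch; the model problem below is invented for the example):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Small model problem: 2D 5-point Laplacian on a 20x20 grid
n = 20
I = sp.identity(n)
T = sp.diags([-1, 4, -1], [-1, 0, 1], shape=(n, n))
A = (sp.kron(I, T)
     + sp.kron(sp.diags([-1, -1], [-1, 1], shape=(n, n)), I)).tocsc()

# Factor once: fill-reducing ordering, symbolic analysis, and numeric
# factorization all happen here, mirroring the pipeline on this slide.
lu = splu(A, permc_spec="COLAMD")

# Reuse the factors for multiple right-hand sides (triangular solves),
# as an implicit FEM code does across iterations with a frozen matrix.
b1 = np.ones(A.shape[0])
b2 = np.arange(A.shape[0], dtype=float)
x1, x2 = lu.solve(b1), lu.solve(b2)
```

The expensive numeric factorization amortizes over the cheap triangular solves, which is why the slides treat factorization rate (Tflop/s) and its memory footprint as the key scaling metrics.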

SLIDE 9

Results – Comparison with MUMPS factorization

SLIDE 10

LS-GPart nested dissection for eight processors
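Nested dissection recursively bisects the graph of the matrix and orders the separator vertices last, so that fill-in stays confined to the subdomains. A toy recursive sketch on a path graph (this is not LS-GPart's parallel algorithm, just the ordering idea):

```python
def nested_dissection_path(lo, hi, order):
    """Toy nested dissection on a path graph with vertices lo..hi-1:
    pick the middle vertex as a (size-1) separator, recursively order
    the two halves first, and append the separator last."""
    if hi - lo <= 0:
        return
    if hi - lo <= 2:                # base case: tiny subdomain
        order.extend(range(lo, hi))
        return
    mid = (lo + hi) // 2            # separator vertex splits the chain
    nested_dissection_path(lo, mid, order)       # left subdomain
    nested_dissection_path(mid + 1, hi, order)   # right subdomain
    order.append(mid)               # separators are ordered last

order = []
nested_dissection_path(0, 7, order)
```

On a general FEM graph the separators are vertex sets rather than single vertices, and finding small balanced ones is the NP-hard part that tools like LS-GPart, ParMETIS, and PT-Scotch attack with heuristics.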

SLIDE 11

Results – LS-GPart matrix reordering quality

[Figure: nnz(L+U) normalized w.r.t. METIS, log scale (0.1–10), across a suite of test matrices (torso3, Transport, memchip, atmosmodd, audikw1, boneS1, cage13, CurlCurl4, G3circuit, HV15R, kktpower, LongCoupdt6, MLGeer); orderings compared: AMD, MMD, Sparspak-ND, METIS, Scotch, Spectral, LS-GPart]

LS-GPart added to the reordering comparison presented in "Preconditioning Using Rank-Structured Sparse Matrix Factorization", Ghysels et al., SIAM PP 2018

SLIDE 12

Results – LS-GPart performance

[Figure: Reordering time (seconds, 50–350) vs. processor count (128–2048) for LS-GPart, ParMETIS, and PT-Scotch]

SLIDE 13

Results – New symbolic factorization performance scaling, 100M DOF

[Figure: mf3Sym time (seconds, 5–30) with LS-GPart vs. MPI ranks (128–2048), broken down by routine: mf3Indist, mf3Permute, mf3Locate, mf3Affinity, mf3PrmSize, mf3Redist, mf3ObjMember, wait, mf3DomSym, mf3DomSymFct, mf3SepTree, mf3SepSymFct, mf3FormTree, mf3Assign, mf3PostOrder, mf3PtrInit, mf3Symbolic, mf3Owner, mf3KObjStats, mf3Finish, mf3SNtile, mf3SMPsize; ~300 seconds in the original sequential code]

SLIDE 14

Results – Before and after Blue Waters engagement

[Figure: Time (seconds, 2000–14000) vs. MPI ranks (128–2048), 100M-DOF model, three implicit load steps; series: "Before, Hybrid (8 threads/MPI)", "After, MPP", "After, Hybrid (8 threads/MPI)"]

SLIDE 15

Results – Overall practical impact

• Finite element model with 200 million degrees of freedom
• Cumulative effect of better code and more compute resources
• Two orders of magnitude reduction in time-to-solution
• Work in progress for more practical impact

SLIDE 16

Future work and concluding remarks

• Industrial challenges are beyond the capabilities of today's hardware and software
• New design decisions based on finer-grain analyses and more benchmarks
• More scale will also couple with more physics
• The right collaboration model accelerates progress
• HPC access is critical to advancing the state of the art
• The project benefits a much broader community and sectors
• Special thanks to the Blue Waters SEAS team for technical support
