Adaptivity Through the Lens of p4est


Uniting Performance and Extensibility in Adaptive Finite Element Computations. Toby Isaac, tisaac@ices.utexas.edu, The University of Chicago at Austin. September 14, 2015, CAAM Colloquium, Rice University.


  1. Uniting Performance and Extensibility in Adaptive Finite Element Computations. Toby Isaac, tisaac@ices.utexas.edu, The University of Chicago at Austin. September 14, 2015, CAAM Colloquium, Rice University. Footer on every slide: T. Isaac (U. Chicago), Adaptivity: Performance & Extensibility, September 14, 2015.

  2. Outline: (1) Adaptivity Through the Lens of p4est; (2) Nonconforming Meshes in PETSc; (3) The Interactive Portion...

  3. Why Adaptive Mesh Refinement (AMR)? Why adaptive anything? Non-adaptive (branch-free) calculations are fast. Why bother? (1) Your non-adaptive calculations have reached the end of your resources (or the end of weak scalability), and you want to push back. (2) (Ideally) you have a performance model that predicts it can help. Example: hp-FEM theory predicts exponential convergence in N_dof. If we want zero error, it's worth it. If we have a nonzero tolerance, we must consider that hp systems require more resources per dof to solve than uniform, low-order systems. There is always a crossover.

  4. Example Applications: Mantle Convection. Subduction zone resolution globally ⇒ trillions of dofs. AMR reduces this to O(10^8 to 10^9). [Stadler et al., 2010]: stabilized [Q_1]^3 × Q_1 elements, black-box algebraic multigrid. [Rudi et al., 2015]: stable [Q_2]^3 × Q_0^disc elements, custom hybrid algebraic/geometric multigrid solver, demonstrated implicit solver weak scalability to 1.5 million BG/Q cores and O(10^11) dofs.

  5. Example Applications: Ice Sheet Dynamics. [T.I. et al., 2015c]: stable [Q_k]^3 × Q_{k-2}^disc finite elements, complex domain with variable resolution demands, Robin-type boundary conditions, domain anisotropy, unusual solver demands.

  6. Example Applications: Uncertainty Quantification. [T.I. et al., 2015b]: inversion (deterministic and Bayesian) for unknown boundary coefficient fields (O(10^5) parameters) from surface observations in the previous ice sheet model. [Not in the above work] The tools that drive adaptivity are important for quantifying model error in Bayesian inversion. Bayesian inversion requires two components: a prior distribution on the parameters, and a likelihood function of the parameters given data ∼ the probability of the data given parameters, π(d | p). This should incorporate not only the "noise" of the data, but also the uncertainty due to error in the model-to-parameter map, i.e., the a posteriori error of the finite element solution.

  7. Approaches to Meshing and Adaptivity: Which Should I Choose? Structured (grid/lattice): fast; adaptivity: uniform, (occasionally) tensor. Unstructured (adjacency graph/CW-complex): flexible; adaptivity: arbitrary. Semi-structured (explicit tree/implicit tree): dynamic; adaptivity: local, (occasionally) anisotropic.

  8. Composition of Meshing Approaches: Libraries/Frameworks for Composite Meshing. Several examples exist. Patch-based AMR (Chombo, SAMRAI, etc.): fast stencil-based computations with local refinement & unstructured trees. Hierarchical Hybrid Grids [Gmeiner et al., 2015]: fast stencil-based computations on non-Cartesian geometries.

  9. p4est: Forests of Quadtrees/Octrees. Main developers: C. Burstedde, T.I., many other contributors. [Burstedde et al., 2011, T.I. et al., 2012, 2015a], p4est.org. Backend: deal.II, PETSc (in progress). An unstructured hexahedral mesh ("the forest"), where each hexahedron contains an arbitrarily refined octree; a space-filling curve (SFC) orders the elements; philosophy: an as-simple-as-possible coarse mesh describes the geometry, refinement captures all detail.

  10. p4est: Forests of Quadtrees/Octrees (same text as the previous slide, with a figure added). [Figure labels: trees k0 and k1, coordinate axes x0, y0 and x1, y1, processes p0, p1, p2.]
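
The role of the space-filling curve can be illustrated with a small sketch. This is not p4est code: it is a toy, single-tree quadtree in Python, and the names `MAX_LEVEL`, `children`, `morton_key`, and `sfc_order` are hypothetical helpers invented for the illustration. It assumes a z-order (Morton) curve, where the key of a leaf is obtained by interleaving the bits of its lower-left corner.

```python
MAX_LEVEL = 4  # assumed finest level; coordinates live on a 2**MAX_LEVEL grid


def children(quad):
    """Split a quadrant (level, x, y) into its four children, in z-order."""
    level, x, y = quad
    h = 1 << (MAX_LEVEL - level - 1)  # side length of each child
    return [(level + 1, x + dx * h, y + dy * h) for dy in (0, 1) for dx in (0, 1)]


def morton_key(quad):
    """Interleave the bits of the lower-left corner (x, y) into one integer."""
    _, x, y = quad
    key = 0
    for bit in range(MAX_LEVEL):
        key |= ((x >> bit) & 1) << (2 * bit)
        key |= ((y >> bit) & 1) << (2 * bit + 1)
    return key


def sfc_order(leaves):
    """Order non-overlapping leaves along the curve (corner keys are unique)."""
    return sorted(leaves, key=morton_key)


# One tree: refine the root once, then refine its first child again.
first, *rest = children((0, 0, 0))
leaves = children(first) + rest

# Sorting by key recovers the depth-first (z-order) traversal of the tree.
for quad in sfc_order(reversed(leaves)):
    print(quad, morton_key(quad))
```

Because the leaves end up in a single linear order, a mesh of any refinement pattern can be stored as a flat array and split into contiguous pieces.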

  11. p4est's Refinement Cycle: Create.

  12. p4est's Refinement Cycle: Refine.

  13. p4est's Refinement Cycle: 2:1 Balance.


  15. p4est's Refinement Cycle: Repartition (load balance). Not pictured: construct FE basis and communication patterns.
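
The create, refine, 2:1 balance, and repartition steps can be sketched on a toy quadtree. This is a serial Python illustration under stated assumptions, not the p4est implementation: all function names are hypothetical, balance is enforced across faces only with a brute-force neighbor search, and "repartition" simply cuts the Morton-ordered leaf list into contiguous chunks of nearly equal size.

```python
MAX_LEVEL = 4  # assumed finest level for this toy example


def side(level):
    """Side length of a quadrant at the given level, in finest-grid units."""
    return 1 << (MAX_LEVEL - level)


def children(quad):
    level, x, y = quad
    h = side(level + 1)
    return [(level + 1, x + dx * h, y + dy * h) for dy in (0, 1) for dx in (0, 1)]


def morton_key(quad):
    _, x, y = quad
    key = 0
    for bit in range(MAX_LEVEL):
        key |= ((x >> bit) & 1) << (2 * bit)
        key |= ((y >> bit) & 1) << (2 * bit + 1)
    return key


def refine(leaves, should_refine):
    """Replace every flagged leaf by its four children."""
    return [c for q in leaves for c in (children(q) if should_refine(q) else [q])]


def share_face(a, b):
    """True if two leaves touch along an edge of positive length."""
    (la, ax, ay), (lb, bx, by) = a, b
    ox = min(ax + side(la), bx + side(lb)) - max(ax, bx)
    oy = min(ay + side(la), by + side(lb)) - max(ay, by)
    return (ox == 0 and oy > 0) or (oy == 0 and ox > 0)


def balance(leaves):
    """Refine coarse leaves until face neighbors differ by at most one level."""
    leaves = list(leaves)
    while True:
        too_coarse = {a for a in leaves for b in leaves
                      if share_face(a, b) and b[0] - a[0] > 1}
        if not too_coarse:
            return leaves
        leaves = refine(leaves, lambda q: q in too_coarse)


def partition(leaves, num_ranks):
    """Cut the curve-ordered leaves into contiguous, nearly equal chunks."""
    ordered = sorted(leaves, key=morton_key)
    n = len(ordered)
    return [ordered[n * r // num_ranks: n * (r + 1) // num_ranks]
            for r in range(num_ranks)]


leaves = [(0, 0, 0)]                                # create
leaves = refine(leaves, lambda q: True)             # refine uniformly once
leaves = refine(leaves, lambda q: q == (1, 0, 0))   # then refine locally
leaves = refine(leaves, lambda q: q == (2, 4, 4))
leaves = balance(leaves)                            # 2:1 balance
parts = partition(leaves, 3)                        # repartition
print(len(leaves), [len(p) for p in parts])
```

In this toy, the two local refinements create level-3 leaves next to level-1 leaves, so the balance step has to refine two coarse neighbors before the mesh is partitioned.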

  16. p4est's Scalability: weak scaling of the mesh refinement cycle (2:1 balance highlighted). [Figure: percentage of runtime spent in Partition, Balance, Ghost, and Nodes versus number of CPU cores, from 12 to 220320.]

  17. p4est's Scalability: weak scaling of the mesh refinement cycle (2:1 balance highlighted). [Figure: seconds per (million elements / core) for the series labeled Old and New versus number of CPU cores, from 12 to 112128.]

  18. Example: An Ice Sheet Model Built on p4est. Ice sheet thickness: ∼2 km. Ice sheet extent: O(10^3) km.

  19. The Problem with Octrees in Thin Domains. The space-filling curve does not respect column order: columns are split between processors when partitioning, and dofs are not ordered in columns for efficient preconditioning (e.g., ILU).

  20. An Anisotropic Solution: Another Layer of Mesh Composition. [Figure labels: partition 0, partition 1.] A p4est forest of quadtrees manages the columns, with each column stored as a flat, linear binary tree of layers, which guarantees column integrity. An extension to p4est: hybrid routines have the prefix "p6est_", reproduce most of the standard p4est API, and are documented on the website (p4est.github.io/api).
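
The idea of column integrity can be sketched as a weighted partition over whole columns. This is a hypothetical Python illustration, not the p6est algorithm: the function name `partition_columns` and the rule used to place the cuts are assumptions. Columns are taken in curve order, each weighted by its number of layers, and a cut may only fall between columns.

```python
def partition_columns(layers_per_column, num_ranks):
    """Assign each column (in curve order) to a rank without splitting it.

    A column goes to the rank that would own its first layer if all layers
    were divided evenly; the whole column then follows that first layer.
    Assumes at least one layer in total.
    """
    total = sum(layers_per_column)
    owners, layers_before = [], 0
    for count in layers_per_column:
        owners.append(min(num_ranks - 1, num_ranks * layers_before // total))
        layers_before += count
    return owners


# Six columns with different numbers of layers, split over two ranks.
layers = [4, 8, 8, 4, 4, 4]
owners = partition_columns(layers, 2)
print(owners)

# A per-element split of the 32 layers would cut after layer 16, which lies
# inside the third column (layers 12..19); the column-wise split keeps that
# column whole at the cost of a less even load.
```

The design trade-off is visible in the example: keeping columns intact can leave one rank with more layers than another, in exchange for column-contiguous dofs.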


  22. Another Application: NUMA, the Nonhydrostatic Unified Model of the Atmosphere. Well-suited for other climate and earth systems models. NUMA [Giraldo et al., 2013] (faculty.nps.edu/fxgirald/projects/NUMA/Introduction_to_NUMA.html) is using p6est for partitioning (adaptivity in progress). Scalability to 1M processes on Mira BG/Q [in preparation].


  24. Testing the Limits of p4est's Partitioning. Scalability to 458K BG/Q cores of JUQUEEN from [T.I. et al., 2015a]. [Figure: forest-to-mesh runtime in seconds (log scale, 10^-2 to 10^2) versus number of processes P (log scale, 10^1 to 10^6); data-point labels range from 5.7k to 510B. Process counts, 16-way: 16, 128, 1024, 8192, 65536, 458752; 32-way: 32, 256, 2048, 16384, 131072, 917504; 64-way: 64, 512, 4096, 32768, 262144.]
