Multilevel domain decomposition at extreme scales

S. Badia, A. Martin, J. Principe. Universitat Politècnica de Catalunya & CIMNE. Jeju, July 7th, 2015.

  1. Multilevel domain decomposition at extreme scales. S. Badia, A. Martin, J. Principe. Universitat Politècnica de Catalunya & CIMNE. Jeju, July 7th, 2015.

  2. Outline: 1 Motivation, 2 Multilevel framework, 3 Multilevel linear solvers, 4 Conclusions

  4. Current trends of supercomputing
     • Transition from today’s 10 Petaflop/s supercomputers (SCs) to exascale systems with 1 Exaflop/s, expected in 2020
     • ×100 performance gain based on concurrency (not higher clock frequency)
     • Future: multi-million-core (in a broad sense) SCs

  6. Weakly scalable solvers
     • This talk: one challenge, weakly scalable algorithms
     • Weak scalability: if we increase the number of processors X times, we can solve an X times larger problem
     • Key property to tackle more complex problems / increase accuracy
     (Figure sources: Dey et al., 2010; parFE project)
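
The weak-scalability definition above can be turned into a simple efficiency measure. The sketch below is only illustrative: the function name `weak_scaling_efficiency` and the timings are hypothetical, not data from the talk. It assumes run times measured with the problem size per processor held fixed.

```python
def weak_scaling_efficiency(times):
    """Return T(P_min) / T(P) for each processor count P.

    `times` maps processor count -> wall-clock time, measured with the
    problem size per processor held fixed (weak scaling). A perfectly
    weakly scalable solver keeps the time constant, i.e. efficiency 1.0.
    """
    base = times[min(times)]
    return {p: base / t for p, t in sorted(times.items())}


# Hypothetical timings in seconds; NOT measurements from the talk.
timings = {1: 10.0, 8: 10.0, 64: 12.5, 512: 20.0}
efficiency = weak_scaling_efficiency(timings)
for p, e in efficiency.items():
    print(f"P = {p:4d}  efficiency = {e:.2f}")
```

An efficiency that stays close to 1.0 as P grows is what "weakly scalable" means in practice.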

  7. Scalable linear solvers (AMG)
     • Most scalable solvers for CSE are parallel AMG (Trilinos [Lin, Shadid, Tuminaro, ...], Hypre [Falgout, Yang, ...], ...)
     • Hard to scale up to the largest SCs today (one million cores, < 10 Petaflop/s)
     • Problems: large communication/computation ratios at coarser levels, densification of coarser problems, ...

  8. Multilevel framework
     • Propose a highly scalable implementation of multilevel DD methods (MLBDDC [Mandel et al’08])
     • MLDD is based on a hierarchy of meshes/functional spaces
     • It involves local subdomain problems at all levels (L1, L2, ...)
     (Figure panels: FE mesh; Subdomains (L1); Subdomains (L2))

  11. Outline
      1. Motivation I: Develop a multilevel framework suitable for extremely scalable implementations
      2. Motivation II: Apply the multilevel framework for scalable linear algebra (MLBDDC)
      All implementations are in FEMPAR (in-house code), to be distributed as open-source software soon.*
      * Funded by Proof of Concept Grant 640957 - FEXFEM: On a free open source extreme scale finite element software

  12. Outline: 1 Motivation, 2 Multilevel framework, 3 Multilevel linear solvers, 4 Conclusions

  13. Preliminaries
      • Element-based (non-overlapping DD) distribution (+ limited ghost info)
      • Gluing info based on objects
      • Object: maximal set of interface nodes that belong to the same set of subdomains
      (Figure panels: $T_h$; $T_h^1$, $T_h^2$, $T_h^3$; $\tilde T_h^1$)
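
The object definition above amounts to grouping interface nodes by the set of subdomains that share them. The following is a minimal sketch under stated assumptions: the function name `compute_objects`, the input layout (a node-to-subdomains map), and the tiny 2D example are hypothetical and are not taken from FEMPAR.

```python
from collections import defaultdict


def compute_objects(node_to_subdomains):
    """Group interface nodes into objects.

    `node_to_subdomains` maps a node id to the set of subdomain ids that
    contain it. Nodes in a single subdomain are interior and are skipped.
    Each object is the maximal set of interface nodes sharing exactly the
    same set of subdomains.
    """
    objects = defaultdict(list)
    for node, subdomains in sorted(node_to_subdomains.items()):
        if len(subdomains) > 1:  # interface node
            objects[frozenset(subdomains)].append(node)
    return dict(objects)


# Hypothetical example: a 3x3 grid of nodes (numbered row by row) whose
# four elements are four subdomains; node 4 is the cross point.
node_to_subdomains = {
    0: {0}, 1: {0, 1}, 2: {1},
    3: {0, 2}, 4: {0, 1, 2, 3}, 5: {1, 3},
    6: {2}, 7: {2, 3}, 8: {3},
}
objects = compute_objects(node_to_subdomains)
for subdomains, nodes in objects.items():
    print(sorted(subdomains), "->", nodes)
```

Keying each object by its subdomain set makes the result independent of which subdomain computes it, which is what allows neighbours to agree on the gluing information.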

  15. Automatic hierarchical mesh generator
      Classification of objects (vef’s at the next level) in 3D:
      • Faces: objects that belong to 2 subdomains
      • Edges: objects that belong to more than 2 subdomains
      • Corners: edges and faces with cardinality 1
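
The three classification rules can be written as a short function. This is a sketch only: `classify_object` and the sample objects are hypothetical, and it assumes every object is shared by at least two subdomains.

```python
def classify_object(subdomains, nodes):
    """Classify a 3D interface object using the three rules above.

    `subdomains` is the set of subdomain ids sharing the object and
    `nodes` is the list of its interface nodes.
    """
    if len(nodes) == 1:
        return "corner"  # an edge or face reduced to a single node
    if len(subdomains) == 2:
        return "face"
    return "edge"  # shared by more than 2 subdomains


# Hypothetical objects: (subdomain set, node list).
samples = [
    ({0, 1}, [10, 11, 12, 13]),      # shared by 2 subdomains
    ({0, 1, 2, 3}, [20, 21]),        # shared by 4 subdomains
    ({0, 1, 2, 3, 4, 5, 6, 7}, [30]),  # single node
    ({0, 1}, [40]),                  # face with cardinality 1
]
labels = [classify_object(s, n) for s, n in samples]
print(labels)
```

Checking the cardinality first matters: a single-node object shared by two subdomains would otherwise be labelled a face rather than a corner.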

  16. Coarser triangulation
      • Similar to the FE triangulation object, but without a reference element
      • Instead, it carries aggregation info: an object at level 1 is an aggregation of vef’s at level 0

  19. Coarser FE space
      • On top of the coarser triangulation, we create a FE-like functional space
      • DOFs on geometrical objects at the coarser level (as in FEs)
      • Aggregation info for DOFs ($u_1^\alpha = \mathcal{F}_\alpha(u_1)$), for example
        $u_1^\alpha = \frac{1}{\#(p)} \sum_{p \in \mathcal{E}_\alpha} u_1(p)$
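
As an illustration of the aggregation formula, the sketch below computes a coarse DOF as the mean of the nodal values over the nodes of one object. It assumes that #(p) denotes the number of nodes p in the object; the function name `coarse_dof_mean` and the data are hypothetical.

```python
def coarse_dof_mean(u, object_nodes):
    """Mean of the nodal values u(p) over the nodes p of one object.

    Assumption: #(p) in the slide formula is the number of nodes in the
    object, so the coarse DOF is the arithmetic mean of the fine values.
    """
    return sum(u[p] for p in object_nodes) / len(object_nodes)


# Hypothetical fine-level nodal values and one object with four nodes.
u = {0: 1.0, 1: 2.0, 2: 3.0, 3: 6.0, 7: 100.0}
edge_alpha = [0, 1, 2, 3]
value = coarse_dof_mean(u, edge_alpha)
print(value)
```

Node 7 does not belong to the object, so its value does not enter the mean.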

  20. Hierarchical FE spaces
      • The under-assembled space $\bar V_0 = \{ v \in \tilde V_0 \mid \text{continuous } \mathcal{F}_1(v) \}$
      • $\bar V_0$ is a multiscale space
      • Compute the solution in $V_0$ using the $\bar V_0$ correction as a preconditioner (multilevel preconditioner)
      • The BDDC DD preconditioner is a particular realization of $\bar V_0$ (corners/edges/faces)
      (Figure panels: $\tilde V_0$, $\bar V_0$, $V_0$)

  23. Hierarchical FE spaces
      The under-assembled space $\bar V_0$ can be decomposed as [Dohrmann’03]:
      • Its bubble space $\bar V_0^b = \{ v \in \bar V_0 \mid \mathcal{F}(v) = 0 \}$
      • The coarser FE space $V_1 = \{ v \in \bar V_0 \mid v \perp_{\tilde A} \bar V_0^b \}$
      $\bar V_0 = V_1 \oplus \bar V_0^b$
      (Figure label: $\mathcal{F}(u_0) = 0$)

  25. Coarse corner function
      • Compute, via local problems, a basis for $V_1 = \{ \Phi_1, \dots, \Phi_{n_c} \}$
      • Every $\Phi$ is a coarse shape function related to a coarse DoF
      (Figure panels: circle domain partitioned into 9 subdomains; $V_1$ corner basis function)

  26. Coarse edge function
      • Compute, via local problems, a basis for $V_1 = \{ \Phi_1, \dots, \Phi_{n_c} \}$
      • Every $\Phi$ is a coarse shape function related to a coarse DoF
      (Figure panels: circle domain partitioned into 9 subdomains; $V_1$ edge basis function)
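
To make "compute via local problems" concrete, here is a 1D analogue rather than the method used in the talk. It assumes a Laplace problem on one subdomain with a uniform mesh, where the two end nodes are the coarse (corner) DoFs. Each coarse shape function is then the solution of a local Dirichlet problem with value 1 at its own corner and 0 at the other. All names are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (no pivoting).

    All four lists have length n; lower[0] and upper[n-1] are unused.
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def coarse_shape_function(n_interior, left_value, right_value):
    """Solve the local 1D Laplace problem with prescribed corner values."""
    n = n_interior
    rhs = [0.0] * n
    rhs[0] += left_value    # Dirichlet data moved to the right-hand side
    rhs[-1] += right_value
    interior = solve_tridiagonal([-1.0] * n, [2.0] * n, [-1.0] * n, rhs)
    return [left_value] + interior + [right_value]


phi_left = coarse_shape_function(3, 1.0, 0.0)
phi_right = coarse_shape_function(3, 0.0, 1.0)
print(phi_left)
print(phi_right)
```

In this 1D setting the two functions are linear hats and sum to one on the subdomain. Each local solve is independent of the other subdomains, which is why this step parallelizes.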

  30. Multilevel/scale concurrency
      The problem in $\bar V_0 = V_1 \oplus \bar V_0^b$:
        $\bar u_0 \in \bar V_0 : \; a(\bar u_0, \bar v_0) = (f, \bar v_0) \quad \forall \bar v_0 \in \bar V_0$
      can be decomposed as $\bar u_0 = u_0^b + u_1$ (orthogonality $V_1 \perp_{\tilde A} \bar V_0^b$):
        $u_0^b \in \bar V_0^b : \; a(u_0^b, v_0^b) = (f_0, v_0^b) \quad \forall v_0^b \in \bar V_0^b$
        $u_1 \in V_1 : \; a(u_1, v_1) = (f_1, v_1) \quad \forall v_1 \in V_1$
      • The bubble component is local to every subdomain (parallel)
      • The coarse problem is global
      Multilevel concurrency is essential for extremely scalable implementations
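
The decoupling into a bubble problem and a coarse problem follows from the orthogonality in one step. The derivation below is a sketch. It assumes that $a(\cdot,\cdot)$ is the bilinear form associated with $\tilde A$, and it splits the test function as $\bar v_0 = v_0^b + v_1$, a notation introduced here for illustration.

```latex
\begin{aligned}
a(\bar u_0, \bar v_0)
  &= a(u_0^b + u_1,\; v_0^b + v_1) \\
  &= a(u_0^b, v_0^b)
     + \underbrace{a(u_0^b, v_1) + a(u_1, v_0^b)}_{=\,0
       \text{ since } V_1 \perp_{\tilde A} \bar V_0^b}
     + a(u_1, v_1) \\
  &= a(u_0^b, v_0^b) + a(u_1, v_1).
\end{aligned}
```

Choosing $v_1 = 0$ leaves only the bubble equation, and choosing $v_0^b = 0$ leaves only the coarse equation, so the two problems can be solved independently and their solutions added.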

  31. Multilevel concurrency
      • L1 duties are fully parallel
      • L2 duties destroy scalability because:
          • # L1 processors ∼ 1000 × # L2 processors
          • L2 problem size increases with the number of processors
      (Figure: timeline of processors $P_0$, $P_1$, $P_2$ over time $t$)

  32. Multilevel concurrency
      • Every processor has duties at a single level/scale
      • Idling dramatically reduced (energy-aware solvers)
      • Overlapped communications / computations among levels
      (Figure: timeline of processors $P_0$, $P_1$, $P_2$, $P_3$ over time $t$)

  33. Multilevel concurrency
      Inter-level overlapped bulk asynchronous (MPMD) implementation in FEMPAR
      (Figure: timeline of processors $P_0$, $P_1$, $P_2$, $P_3$ over time $t$)
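
The benefit of overlapping levels can be illustrated with a deliberately crude cost model. It is not taken from the talk or from FEMPAR and it ignores communication. If fine (L1) and coarse (L2) duties run one after the other, their times add up and the L1 processors idle during the coarse solve. If they run on disjoint processors at the same time, the cost is the larger of the two. The function names and the timings are hypothetical.

```python
def time_serialized(t_fine, t_coarse):
    """L1 processors wait for the coarse (L2) duties: the costs add up."""
    return t_fine + t_coarse


def time_overlapped(t_fine, t_coarse):
    """Fine and coarse duties run concurrently on disjoint processors."""
    return max(t_fine, t_coarse)


def idle_fraction_serialized(t_fine, t_coarse):
    """Fraction of the serialized run that L1 processors spend idle."""
    return t_coarse / (t_fine + t_coarse)


# Hypothetical per-iteration costs in arbitrary time units.
t_fine = 1.0
for t_coarse in (0.1, 0.5, 1.0, 2.0):
    print(
        f"t_coarse = {t_coarse:.1f}: "
        f"serialized = {time_serialized(t_fine, t_coarse):.1f}, "
        f"overlapped = {time_overlapped(t_fine, t_coarse):.1f}, "
        f"L1 idle (serialized) = {idle_fraction_serialized(t_fine, t_coarse):.0%}"
    )
```

In this model the coarse solve is hidden for as long as it takes no longer than the fine-level work. Once it takes longer, it becomes the bottleneck in either scheme.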

  34. FEMPAR implementation
      Multilevel extension is straightforward (starting the algorithm with $V_1$ and the level-1 mesh)
      (Figure: timeline of cores grouped into 1st-, 2nd- and 3rd-level MPI communicators; labels: parallel (distributed), global communication, time)
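
One possible way to assign processes to levels is sketched below in plain Python, so that it runs without MPI. It is not FEMPAR's actual scheme: the function `level_of_rank`, the rank ordering (finest level first), and the process counts are all assumptions. In an MPI code, the returned level could serve as the `color` argument of `MPI_Comm_split` to build one communicator per level.

```python
def level_of_rank(rank, procs_per_level):
    """Return (level, local_rank) for a global rank.

    `procs_per_level[k]` is the number of processes assigned to level
    k + 1; ranks are handed out level by level, finest level first.
    """
    offset = 0
    for level, count in enumerate(procs_per_level, start=1):
        if rank < offset + count:
            return level, rank - offset
        offset += count
    raise ValueError(f"rank {rank} exceeds the {offset} available processes")


# Hypothetical layout: 8 first-level, 2 second-level, 1 third-level process.
procs_per_level = [8, 2, 1]
assignment = [level_of_rank(r, procs_per_level) for r in range(sum(procs_per_level))]
for rank, (level, local) in enumerate(assignment):
    print(f"global rank {rank:2d} -> level {level}, local rank {local}")
```

Giving each process duties at exactly one level is what lets the levels run as separate programs (MPMD) and overlap in time.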
