
DUNE on Blue Gene/P, Markus Blatt



  1. DUNE on Blue Gene/P
     Markus Blatt (Markus.Blatt@iwr.uni-heidelberg.de)
     Joint work with Olaf Ippisch and Felix Heimann
     Interdisziplinäres Zentrum für wissenschaftliches Rechnen, Universität Heidelberg
     ScicomP 15, Barcelona, May 21, 2009
     M. Blatt (IWR, Heidelberg) DUNE on BG/P ScicomP15, May 21, 2009 1 / 19

  2. Outline
     1 DUNE
     2 Parallelization Approach
     3 Porting to BG/P
     4 Scalability

  3. DUNE: Why another framework?
     • Lots of good frameworks for PDEs are out there.
     • With any one of them it might be
       • either impossible to have a particular feature,
       • or very inefficient in certain applications.
     • Extending the feature set is usually hard.
     Distributed and Unified Numerics Environment
     • Separation of data structures and algorithms by abstract interfaces.
     • Efficient implementation of these interfaces using generic programming techniques in C++.
     • Static polymorphism enables extensive optimization by the compiler.
     • Algorithms are parametrized with data structures. The interface is removed at compile time.
     • Open source, available from http://www.dune-project.org
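The idea of parametrizing an algorithm with its data structure can be illustrated with a minimal sketch in plain C++. The names `ConstantField` and `sum` are hypothetical and are not part of DUNE; the only assumption is that a container offers `size()` and `operator[]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: any type offering size() and operator[] fits.
struct ConstantField {
    double value;
    std::size_t n;
    std::size_t size() const { return n; }
    double operator[](std::size_t) const { return value; }
};

// The algorithm is parametrized by the data structure. The "interface"
// (size() and operator[]) is resolved at compile time, so the calls can be
// inlined and no virtual dispatch is involved.
template <class Container>
double sum(const Container& c) {
    double s = 0.0;
    for (std::size_t i = 0; i < c.size(); ++i) s += c[i];
    return s;
}
```

The same `sum` works for a `std::vector<double>` and for `ConstantField`, although the two types share no base class.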

  4. DUNE: DUNE is modular
     [Module diagram. DUNE modules shown: dune-common, dune-grid, dune-istl, dune-localfunctions, dune-pdelab, dune-fem, dune-grid-howto, dune-pdelab-howto. Other names shown: Metis, NeuronGrid, SuperLU, UG, Alberta, ALU, VTK, Gmsh.]
     • Grid interface: (non-)conforming, hierarchically nested, multi-element-type parallel grids in arbitrary space dimensions.
     • Iterative Solver Template Library: generic sparse and dense matrix and vector classes supporting recursive block structures. Corresponding (parallel) solvers, e.g. AMG.
     • PDELab: discretization module that is closely related to the mathematical formulation of finite element methods.
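What "recursive block structures" can mean is sketched below with standard containers only. This is not the dune-istl API: `two_norm2` is a hypothetical helper, `std::array` stands in for a small dense block and `std::vector` for a dynamically sized block level.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Forward declarations so that every nesting order of the levels resolves.
template <class T, std::size_t N> double two_norm2(const std::array<T, N>& b);
template <class B> double two_norm2(const std::vector<B>& v);

// Innermost level: squared 2-norm of a scalar.
inline double two_norm2(double x) { return x * x; }

// Fixed-size dense block (stand-in for a small field vector).
template <class T, std::size_t N>
double two_norm2(const std::array<T, N>& b) {
    double s = 0.0;
    for (const T& x : b) s += two_norm2(x);
    return s;
}

// Dynamic block level: a vector of blocks, which may themselves be blocks.
template <class B>
double two_norm2(const std::vector<B>& v) {
    double s = 0.0;
    for (const B& b : v) s += two_norm2(b);
    return s;
}
```

The recursion over block levels is resolved by the compiler, so a vector of vectors of 2-blocks needs no extra code.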

  5. DUNE: Sample Simulations
     • Flow and transport in porous media
     • Neuron network simulation
     • Density-driven flow
     • Root uptake

  6. Section 2: Parallelization Approach

  7. Parallelization Approach: Index Based Communication
     Goals
     • Allow reuse of efficient sequential data structures for computations.
     • Let the user initiate communication when needed.
     • Support
       • unstructuredness,
       • adaptivity,
       • communication of different data with the same decomposition.
     Approach
     • Keep decomposition and communication information outside of the data structures.
     • Use simple and portable index identification of items.
     • Data structures need to be augmented to contain ghost items.

  8. Parallelization Approach: Index Sets
     Index Set
     • Distributed overlapping index set $I = \bigcup_{p=0}^{P-1} I_p$.
     • Process $p$ stores and manages the mapping $I_p \to [0, n_p)$.
     • Supports adaptivity.

  9. Parallelization Approach: Index Sets
     Global Index
     • Identifies a position (index) globally.
     • Arbitrary and not consecutive (to support adaptivity).
     • Persistent.

  10. Parallelization Approach: Index Sets
     Local Index
     • Addresses a position in the local container.
     • Convertible to an integral type.
     • Consecutive index starting from 0.
     • Non-persistent.
     • Provides an attribute to partition the set.
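The relation between global index, local index, and attribute can be sketched as a small per-process lookup table. This is a minimal illustration, not the DUNE class: `IndexSet`, `LocalIndex`, and the two-valued `Attribute` are hypothetical names, and a `std::map` is assumed as storage for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

enum class Attribute { owner, ghost };

struct LocalIndex {
    std::size_t local;    // consecutive position in the local container
    Attribute attribute;  // partitions the index set
};

// Hypothetical per-process index set: maps arbitrary, persistent global
// indices to consecutive local indices in [0, n_p).
class IndexSet {
public:
    // Adds a global index and assigns the next free local index.
    // Adding a known global index returns its existing local index.
    std::size_t add(std::uint64_t global, Attribute a) {
        auto it = map_.find(global);
        if (it != map_.end()) return it->second.local;
        const std::size_t local = map_.size();
        map_.emplace(global, LocalIndex{local, a});
        return local;
    }
    const LocalIndex& at(std::uint64_t global) const { return map_.at(global); }
    bool contains(std::uint64_t global) const { return map_.count(global) != 0; }
    std::size_t size() const { return map_.size(); }

private:
    std::map<std::uint64_t, LocalIndex> map_;
};
```

Global indices need not be consecutive (1000 and 17 are fine), while the local indices handed out are 0, 1, 2, and so on, so they can address a plain local array.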

  11. Parallelization Approach: Remote Index Information
     • Communication between different distributions of the index set is possible, e.g.
       • data agglomeration onto fewer processes,
       • data redistribution for load balancing.
     • For each remote process, one needs to store all global indices that are also stored on that process, together with the corresponding attribute.
     • The remote index information can either be set up by hand (better efficiency)
     • or be computed automatically using global communication.
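What the remote index information contains can be sketched without MPI. In the hypothetical function below, `all[q]` stands for the index set of process `q`, which a real code would obtain through communication; here it is assumed to be available in memory so that the example stays self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

enum class Attribute { owner, ghost };
using IndexSet = std::map<std::uint64_t, Attribute>;  // global index -> attribute

struct RemoteIndex {
    int rank;              // process that also stores the index
    std::uint64_t global;  // the shared global index
    Attribute attribute;   // attribute of the index on that process
};

// For process `self`, list every index that is also stored on another process.
// The result is ordered by rank, then by global index.
std::vector<RemoteIndex> remote_indices(const std::vector<IndexSet>& all, int self) {
    std::vector<RemoteIndex> result;
    const IndexSet& mine = all.at(self);
    for (int q = 0; q < static_cast<int>(all.size()); ++q) {
        if (q == self) continue;
        for (const auto& entry : all[q])
            if (mine.count(entry.first) != 0)
                result.push_back({q, entry.first, entry.second});
    }
    return result;
}
```

Looping over all other processes is exactly the kind of work that grows with the number of processes P, which is why setting up this information by hand from known neighbors can be more efficient.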

  12. Parallelization Approach: Communication Interface
     • Contains information on a specific communication scheme.
     • Target and source partitions of the index set are chosen using attribute flags, e.g. from ghost to owner and ghost.
     • Still independent of the data to be communicated.
     • For each process, a list of the corresponding local indices in the source and target index set is stored.
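A communication interface of this kind can be sketched as two lists of local indices per neighboring process. The names `Interface` and `build_interface` are hypothetical. For brevity the sketch assumes a single source and a single target attribute instead of sets of flags, and it assumes that both sides list shared indices in the same order (for instance sorted by global index).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

enum class Attribute { owner, ghost };

struct LocalIndex { std::size_t local; Attribute attribute; };
struct RemoteIndex { int rank; std::uint64_t global; Attribute attribute; };

// Per neighbor rank: local indices to read from (send) and to write to (recv).
// No data is stored here, so one interface can serve many data containers.
struct Interface {
    std::map<int, std::vector<std::size_t>> send, recv;
};

Interface build_interface(const std::map<std::uint64_t, LocalIndex>& mine,
                          const std::vector<RemoteIndex>& remote,
                          Attribute source, Attribute target) {
    Interface iface;
    for (const RemoteIndex& r : remote) {
        const LocalIndex& l = mine.at(r.global);
        // Local copy is in the source partition, remote copy in the target.
        if (l.attribute == source && r.attribute == target)
            iface.send[r.rank].push_back(l.local);
        // Local copy is in the target partition, remote copy in the source.
        if (l.attribute == target && r.attribute == source)
            iface.recv[r.rank].push_back(l.local);
    }
    return iface;
}
```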

  13. Parallelization Approach: Communication
     • Communication occurs according to the previously set up interfaces.
     • Communication is possible in both directions (from source to target and vice versa).
     • Data associated with indices can either
       • be of the same size for each index,
       • or of a different size for each index.
     • Data can be manipulated either at the source or at the target (customizable by the user).
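How user-customizable manipulation at the source and target might look is sketched below with policy classes. `CopyPolicy`, `AddPolicy`, `pack`, and `unpack` are hypothetical names, the message is modeled as a plain buffer instead of an MPI transfer, and only the case of one `double` per index is covered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Policy: how data is read at the source and combined at the target.
struct CopyPolicy {
    static double gather(const std::vector<double>& v, std::size_t i) { return v[i]; }
    static void scatter(std::vector<double>& v, double x, std::size_t i) { v[i] = x; }
};
struct AddPolicy {
    static double gather(const std::vector<double>& v, std::size_t i) { return v[i]; }
    static void scatter(std::vector<double>& v, double x, std::size_t i) { v[i] += x; }
};

// Packs the entries listed in `indices` into a message buffer.
template <class Policy>
std::vector<double> pack(const std::vector<double>& data,
                         const std::vector<std::size_t>& indices) {
    std::vector<double> buffer;
    buffer.reserve(indices.size());
    for (std::size_t i : indices) buffer.push_back(Policy::gather(data, i));
    return buffer;
}

// Unpacks a received buffer into the entries listed in `indices`.
template <class Policy>
void unpack(std::vector<double>& data, const std::vector<double>& buffer,
            const std::vector<std::size_t>& indices) {
    assert(buffer.size() == indices.size());
    for (std::size_t k = 0; k < indices.size(); ++k)
        Policy::scatter(data, buffer[k], indices[k]);
}
```

The forward direction would pack with the send list and unpack with the receive list; the backward direction swaps the two lists, for example to add ghost contributions back onto the owner.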

  14. Section 3: Porting to BG/P

  15. Porting to BG/P: Porting, a piece of cake?
     Naive Assumptions
     • DUNE uses the autotools toolchain together with a custom script for managing the module dependencies.
     • Autotools support cross compilation.
     • Configure tests that need to run MPI programs can be switched off.
     • DUNE uses standard C++ (but advanced template stuff).
     • This should be really easy! It worked on other Linux clusters, too!

  16. Porting to BG/P: Porting, a piece of cake?
     The real HPC World
     • XLC lacks support for some standard template code (e.g. partial template specialization).
     • Libtool gets confused somehow and tries to link shared libraries statically.
     • A bottleneck ($O(P)$) in the communication setup becomes apparent.
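For readers unfamiliar with the language feature named above, the following is a minimal example of partial template specialization in standard C++. It is not the DUNE code that XLC rejected (the slides do not show that code); `BlockTraits` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Primary template: a generic type is treated as a single scalar entry.
template <class T>
struct BlockTraits {
    static std::size_t entries(const T&) { return 1; }
};

// Partial specialization: still generic in B, but selected only for
// std::vector<B>. It counts the entries of all nested blocks recursively.
template <class B>
struct BlockTraits<std::vector<B>> {
    static std::size_t entries(const std::vector<B>& v) {
        std::size_t n = 0;
        for (const B& b : v) n += BlockTraits<B>::entries(b);
        return n;
    }
};
```

Code in this style is common in template-heavy libraries, so a compiler that mishandles it can block a port even when the build system cooperates.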

  17. Porting to BG/P: Problem Resolutions
     Missing template support in XLC
     • Thank goodness, the GNU C++ compiler is also available!

  18. Porting to BG/P: Problem Resolutions
     Libtool problem
     • Use the special option for Darwin (-dynamic).
     • Thanks to Bernd Mohr (JSC) and Frank Ingram (IBM).

  19. Porting to BG/P: Problem Resolutions
     $O(P)$ bottleneck
     • At the time of programming we were not thinking of more than 512 processors.
     • Fortunately we use a structured tensor product grid for our simulation.
     • Therefore we do not need to send all indices in a ring!
     • Switched to asynchronous communication with just the neighboring processors. Now $O(3^d)$ for dimension $d$.
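Why the neighbor-only setup costs $O(3^d)$ instead of $O(P)$ can be seen from a small sketch that enumerates the neighbors of a process in a tensor product process grid. The function `neighbor_ranks` is hypothetical; it assumes a non-periodic grid, counts diagonal neighbors, and numbers ranks with direction 0 running fastest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ranks adjacent to `rank` (including diagonal neighbors) in a non-periodic
// tensor product process grid with dims[k] processes in direction k.
std::vector<int> neighbor_ranks(const std::vector<int>& dims, int rank) {
    const std::size_t d = dims.size();
    std::vector<int> coord(d);
    int rest = rank;
    for (std::size_t k = 0; k < d; ++k) {  // rank -> grid coordinates
        coord[k] = rest % dims[k];
        rest /= dims[k];
    }
    std::vector<int> result;
    std::vector<int> offset(d, -1);  // iterate over all offsets in {-1,0,1}^d
    while (true) {
        bool inside = true, self = true;
        int r = 0, stride = 1;
        for (std::size_t k = 0; k < d; ++k) {
            const int c = coord[k] + offset[k];
            if (c < 0 || c >= dims[k]) inside = false;
            if (offset[k] != 0) self = false;
            r += c * stride;
            stride *= dims[k];
        }
        if (inside && !self) result.push_back(r);
        std::size_t k = 0;  // odometer-style increment of the offset vector
        while (k < d && offset[k] == 1) offset[k++] = -1;
        if (k == d) break;
        ++offset[k];
    }
    return result;
}
```

A process has at most $3^d - 1$ neighbors (26 in three dimensions), independent of the total number of processes, so the amount of setup communication per process no longer grows with P.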
