MaPHyS, or the development of a parallel algebraic domain decomposition solver in the course of the Solstice project


1. MaPHyS, or the development of a parallel algebraic domain decomposition solver in the course of the Solstice project.
Emmanuel Agullo, Luc Giraud, Abdou Guermouche, Azzam Haidar, Yohan Lee-Tin-Yien, Jean Roman.
HiePACS project - INRIA Bordeaux Sud-Ouest, joint INRIA-CERFACS lab on High Performance Computing.
CERFACS Sparse Days, Toulouse, June 2010.

2. Outline

1. Motivations
2. A parallel algebraic domain decomposition solver
3. Parallel and numerical scalability on 3D academic problems
4. Parallel and numerical scalability on 3D Solstice problems
5. Prospectives

3. Motivations

Solve A x = b. The "spectrum" of linear algebra solvers:

Direct solvers:
- Robust/accurate for general problems
- BLAS-3 based implementations
- Memory/CPU prohibitive for large 3D problems
- Limited parallel scalability

Iterative solvers:
- Problem-dependent efficiency / controlled accuracy
- Only matrix-vector products required, fine-grain computation
- Less memory consumption, possible trade-off with CPU
- Attractive "built-in" parallel features

4. Overlapping Domain Decomposition: classical Additive Schwarz preconditioners

Goal: solve the linear system A x = b with an iterative method, applying the preconditioner at each step. The convergence rate deteriorates as the number of subdomains increases.

Two subdomains Ω1 and Ω2 overlapping on δ:

A = \begin{pmatrix} A_{1,1} & A_{1,\delta} & 0 \\ A_{\delta,1} & A_{\delta,\delta} & A_{\delta,2} \\ 0 & A_{2,\delta} & A_{2,2} \end{pmatrix}
\quad\Rightarrow\quad
M^{\delta}_{AS} = \begin{pmatrix} A_{1,1} & A_{1,\delta} \\ A_{\delta,1} & A_{\delta,\delta} \end{pmatrix}^{-1} + \begin{pmatrix} A_{\delta,\delta} & A_{\delta,2} \\ A_{2,\delta} & A_{2,2} \end{pmatrix}^{-1}

(each local inverse being extended by zeros to the global size).

Classical Additive Schwarz preconditioner, N-subdomain case:

M^{\delta}_{AS} = \sum_{i=1}^{N} \left( R_{\delta_i} \right)^{T} \left( A_{\delta_i} \right)^{-1} R_{\delta_i}
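To make the construction above concrete, here is a minimal NumPy sketch (not part of the original slides) of the classical Additive Schwarz preconditioner M = sum_i R_i^T (R_i A R_i^T)^{-1} R_i; the function name, the 1D Laplacian test matrix and the two overlapping index sets are illustrative assumptions.

import numpy as np

def additive_schwarz(A, subdomain_sets):
    """Classical Additive Schwarz: M = sum_i R_i^T (R_i A R_i^T)^{-1} R_i (dense sketch)."""
    n = A.shape[0]
    M = np.zeros((n, n))
    for idx in subdomain_sets:
        idx = np.asarray(list(idx))
        R = np.zeros((len(idx), n))
        R[np.arange(len(idx)), idx] = 1.0      # Boolean restriction to subdomain i
        A_i = R @ A @ R.T                      # local (overlapping) subdomain matrix A_{delta_i}
        M += R.T @ np.linalg.inv(A_i) @ R      # local inverse extended by zeros, then summed
    return M

# Example: 1D Laplacian split into two subdomains overlapping on two unknowns.
n = 10
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = additive_schwarz(A, [range(0, 6), range(4, 10)])
print(np.linalg.cond(A), np.linalg.cond(M @ A))  # compare conditioning with and without M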

5. Non-overlapping Domain Decomposition: Schur complement reduced system

Goal: solve the linear system A x = b. Apply a partial Gaussian elimination, solve the reduced system S x_Γ = f, then solve A_{i,i} x_i = b_i - A_{i,Γ} x_Γ.

Two subdomains Ω1 and Ω2 separated by the interface Γ:

A = \begin{pmatrix} A_{1,1} & 0 & A_{1,\Gamma} \\ 0 & A_{2,2} & A_{2,\Gamma} \\ A_{\Gamma,1} & A_{\Gamma,2} & A_{\Gamma,\Gamma} \end{pmatrix}

Solving A x = b thus reduces to solving the reduced system S x_\Gamma = f and then A_{i,i} x_i = b_i - A_{i,\Gamma} x_\Gamma, where

S = A_{\Gamma,\Gamma} - \sum_{i=1}^{2} A_{\Gamma,i} A_{i,i}^{-1} A_{i,\Gamma}
\quad\text{and}\quad
f = b_\Gamma - \sum_{i=1}^{2} A_{\Gamma,i} A_{i,i}^{-1} b_i .
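A small dense NumPy sketch of this reduction (not from the slides), assuming the interior and interface index sets are known; schur_solve and its arguments are illustrative names, and everything is kept dense for clarity, whereas a real solver would use sparse partial factorizations.

import numpy as np

def schur_solve(A, b, interiors, gamma):
    """Schur complement reduction: S = A_GG - sum_i A_Gi A_ii^{-1} A_iG, f = b_G - sum_i A_Gi A_ii^{-1} b_i."""
    gamma = np.asarray(gamma)
    S = A[np.ix_(gamma, gamma)].astype(float)
    f = b[gamma].astype(float)
    local = []
    for I in interiors:
        I = np.asarray(I)
        A_II = A[np.ix_(I, I)]
        A_IG = A[np.ix_(I, gamma)]
        A_GI = A[np.ix_(gamma, I)]
        S -= A_GI @ np.linalg.solve(A_II, A_IG)   # local contribution to the Schur complement
        f -= A_GI @ np.linalg.solve(A_II, b[I])   # local contribution to the reduced right-hand side
        local.append((I, A_II, A_IG))
    x = np.zeros(len(b))
    x[gamma] = np.linalg.solve(S, f)              # reduced interface system S x_G = f
    for I, A_II, A_IG in local:
        x[I] = np.linalg.solve(A_II, b[I] - A_IG @ x[gamma])  # interior back-substitution
    return x

Calling schur_solve(A, b, [I_1, I_2], Gamma) on any matrix ordered interiors-first reproduces the direct solution of A x = b.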

6. Non-overlapping Domain Decomposition: Schur complement reduced system

Distributed Schur complement. The interface is split into pieces shared by neighboring subdomains, Γ = k ∪ ℓ ∪ m ∪ n (ℓ shared by Ω_ι and Ω_{ι+1}, m shared by Ω_{ι+1} and Ω_{ι+2}).

Each subdomain holds its local Schur complement:

S^{(\iota)} = \begin{pmatrix} S^{(\iota)}_{kk} & S_{k\ell} \\ S_{\ell k} & S^{(\iota)}_{\ell\ell} \end{pmatrix},
\quad
S^{(\iota+1)} = \begin{pmatrix} S^{(\iota+1)}_{\ell\ell} & S_{\ell m} \\ S_{m\ell} & S^{(\iota+1)}_{mm} \end{pmatrix},
\quad
S^{(\iota+2)} = \begin{pmatrix} S^{(\iota+2)}_{mm} & S_{mn} \\ S_{nm} & S^{(\iota+2)}_{nn} \end{pmatrix}

In an assembled form, the diagonal blocks shared by neighbors are summed:

\bar{S}_{\ell\ell} = S^{(\iota)}_{\ell\ell} + S^{(\iota+1)}_{\ell\ell}
\quad\Rightarrow\quad
\bar{S}_{\ell\ell} = \sum_{\iota \in \mathrm{adj}} S^{(\iota)}_{\ell\ell}

7. Non-overlapping Domain Decomposition: Algebraic Additive Schwarz preconditioner [L. Carvalho, L. Giraud, G. Meurant - 01]

The assembled Schur complement is the sum of the local contributions:

S = \sum_{i=1}^{N} R_{\Gamma_i}^{T} S^{(i)} R_{\Gamma_i}

It has a block structure following the interface pieces, e.g.

S = \begin{pmatrix} S_{kk} & S_{k\ell} & & \\ S_{\ell k} & S_{\ell\ell} & S_{\ell m} & \\ & S_{m\ell} & S_{mm} & S_{mn} \\ & & S_{nm} & S_{nn} \end{pmatrix}

and the preconditioner is built from the inverses of its overlapping diagonal blocks:

M = \sum_{i=1}^{N} R_{\Gamma_i}^{T} \left( \bar{S}^{(i)} \right)^{-1} R_{\Gamma_i}

where \bar{S}^{(i)} is obtained from the local Schur complement S^{(i)} by assembling with the neighbors the diagonal blocks shared between subdomains:

S^{(i)} = \begin{pmatrix} S^{(i)}_{kk} & S_{k\ell} \\ S_{\ell k} & S^{(i)}_{\ell\ell} \end{pmatrix}
\quad\Rightarrow\quad
\bar{S}^{(i)} = \begin{pmatrix} S_{kk} & S_{k\ell} \\ S_{\ell k} & \bar{S}_{\ell\ell} \end{pmatrix},
\qquad
\bar{S}_{\ell\ell} = \sum_{\iota \in \mathrm{adj}} S^{(\iota)}_{\ell\ell}

(local Schur complement → locally assembled Schur complement). This preconditioner is similar to the Neumann-Neumann preconditioner [J.F. Bourgat, R. Glowinski, P. Le Tallec, M. Vidrascu - 89], [Y.H. de Roeck, P. Le Tallec, M. Vidrascu - 91].
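A minimal dense sketch of this algebraic Additive Schwarz preconditioner (not from the slides), under the assumption that the assembled Schur complement S is available as a dense matrix and that interface_sets lists the interface unknowns Γ_i of each subdomain; in the actual parallel solver each assembled local Schur complement is built from neighbor contributions rather than extracted from a global S.

import numpy as np

def algebraic_additive_schwarz(S, interface_sets):
    """M_AS = sum_i R_{Gamma_i}^T (Sbar^(i))^{-1} R_{Gamma_i} (dense, illustrative only).

    interface_sets[i] gives the indices of Gamma_i in the assembled Schur complement S,
    so the assembled local Schur complement Sbar^(i) is the corresponding diagonal block.
    """
    M = np.zeros_like(S, dtype=float)
    for gi in interface_sets:
        gi = np.asarray(list(gi))
        Sbar_i = S[np.ix_(gi, gi)]                   # assembled local Schur complement
        M[np.ix_(gi, gi)] += np.linalg.inv(Sbar_i)   # extend the local inverse by zeros and sum
    return M

A Krylov solver on the reduced system S x_Γ = f would then apply M as its preconditioner.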

8. Parallel preconditioning features

Local Schur complement of subdomain Ω_i, whose interface Γ_i consists of the pieces g, k, ℓ, m shared with its neighbors:

S^{(i)} = A^{(i)}_{\Gamma_i \Gamma_i} - A_{\Gamma_i I_i} A_{I_i I_i}^{-1} A_{I_i \Gamma_i}

M_{AS} = \sum_{i=1}^{\#\mathrm{domains}} R_i^{T} \left( \bar{S}^{(i)} \right)^{-1} R_i

Local Schur complement:

S^{(i)} = \begin{pmatrix} S^{(i)}_{mm} & S_{mg} & S_{mk} & S_{m\ell} \\ S_{gm} & S^{(i)}_{gg} & S_{gk} & S_{g\ell} \\ S_{km} & S_{kg} & S^{(i)}_{kk} & S_{k\ell} \\ S_{\ell m} & S_{\ell g} & S_{\ell k} & S^{(i)}_{\ell\ell} \end{pmatrix}

Assembled local Schur complement:

\bar{S}^{(i)} = \begin{pmatrix} \bar{S}_{mm} & S_{mg} & S_{mk} & S_{m\ell} \\ S_{gm} & \bar{S}_{gg} & S_{gk} & S_{g\ell} \\ S_{km} & S_{kg} & \bar{S}_{kk} & S_{k\ell} \\ S_{\ell m} & S_{\ell g} & S_{\ell k} & \bar{S}_{\ell\ell} \end{pmatrix},
\qquad
\bar{S}_{mm} = \sum_{j \in \mathrm{adj}(m)} S^{(j)}_{mm}

9. Parallel implementation

Each subdomain matrix A^{(i)} is handled by one processor:

A^{(i)} \equiv \begin{pmatrix} A_{I_i I_i} & A_{I_i \Gamma_i} \\ A_{\Gamma_i I_i} & A^{(i)}_{\Gamma\Gamma} \end{pmatrix}

Concurrent partial factorizations are performed on each processor to form the so-called "local Schur complement"

S^{(i)} = A^{(i)}_{\Gamma\Gamma} - A_{\Gamma_i I_i} A_{I_i I_i}^{-1} A_{I_i \Gamma_i}

The reduced system S x_Γ = f is solved using a distributed Krylov solver:
- one matrix-vector product per iteration: each processor computes S^{(i)} (x^{(i)}_\Gamma)_k = (y^{(i)})_k
- one local preconditioner application: (M^{(i)}) (z^{(i)})_k = (r^{(i)})_k
- local neighbor-to-neighbor communication per iteration
- global reductions (dot products)

Finally, the solution for the interior unknowns is computed simultaneously on all processors: A_{I_i I_i} x_{I_i} = b_{I_i} - A_{I_i \Gamma_i} x_{\Gamma_i}
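The two per-iteration kernels of the distributed Krylov solve can be sketched serially as follows (illustrative only: the subdomain dictionaries, the field names S_loc, Sbar_inv and gamma, and the serial summation standing in for neighbor-to-neighbor communication are assumptions, not the MaPHyS data structures).

import numpy as np

def schur_matvec(subdomains, x):
    """y = S x as in the distributed solver: each subdomain multiplies by its local
    (unassembled) Schur complement S^(i); summing the contributions on shared
    interface unknowns stands in for the neighbor-to-neighbor communication."""
    y = np.zeros_like(x)
    for dom in subdomains:
        gi = dom["gamma"]                 # global indices of this subdomain's interface
        y[gi] += dom["S_loc"] @ x[gi]     # local product with S^(i)
    return y

def precond_apply(subdomains, r):
    """z = M_AS r: each subdomain applies the inverse of its assembled local
    Schur complement; contributions on shared unknowns are again summed."""
    z = np.zeros_like(r)
    for dom in subdomains:
        gi = dom["gamma"]
        z[gi] += dom["Sbar_inv"] @ r[gi]  # local preconditioner application
    return z

The remaining per-iteration ingredient, the global reduction, corresponds to the dot products of the Krylov method; np.dot plays that role in a serial sketch, while the parallel code performs an MPI all-reduce.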

10. Algebraic Additive Schwarz preconditioner

Main characteristics in 2D:
- The ratio interface/interior is small
- Does not require a large amount of memory to store the preconditioner
- Computation and application of the preconditioner are fast: they consist in calls to LAPACK/BLAS-2 kernels

Main characteristics in 3D:
- The ratio interface/interior is large
- The storage of the preconditioner might not be affordable
- The construction of the preconditioner can be computationally expensive
⇒ Need cheaper forms of the Algebraic Additive Schwarz preconditioner

11. How to alleviate the preconditioner construction

Sparsification strategy through dropping:

\hat{s}_{k\ell} = \begin{cases} \bar{s}_{k\ell} & \text{if } |\bar{s}_{k\ell}| \ge \xi \left( |\bar{s}_{kk}| + |\bar{s}_{\ell\ell}| \right) \\ 0 & \text{otherwise} \end{cases}

Approximation through ILU [INRIA PhyLeas - A. Haidar, L. Giraud, Y. Saad - 10]:

pILU(A^{(i)}) \equiv pILU \begin{pmatrix} A_{ii} & A_{i\Gamma_i} \\ A_{\Gamma_i i} & A^{(i)}_{\Gamma_i\Gamma_i} \end{pmatrix}
\equiv
\begin{pmatrix} \tilde{L}_i & 0 \\ A_{\Gamma_i i} \tilde{U}_i^{-1} & I \end{pmatrix}
\begin{pmatrix} \tilde{U}_i & \tilde{L}_i^{-1} A_{i\Gamma_i} \\ 0 & \tilde{S}^{(i)} \end{pmatrix}

Mixed arithmetic strategy: compute and store the preconditioner in 32-bit precision arithmetic.
Remark: the backward stability result of GMRES indicates that it is hopeless to expect convergence at a backward error level smaller than the 32-bit accuracy [C. Paige, M. Rozložník, Z. Strakoš - 06].
Idea: to overcome this limitation we use FGMRES [Y. Saad - 93; Arioli, Duff - 09].

Exploit two levels of parallelism: use a parallel sparse direct solver on each subdomain/subgraph.
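A small NumPy sketch of the dropping rule above (illustrative only: sparsify_schur, the dense input Sbar and the threshold xi are assumed names, and keeping the diagonal explicitly is a safety choice of the sketch, harmless since the rule never drops it for xi ≤ 0.5).

import numpy as np

def sparsify_schur(Sbar, xi):
    """Drop small entries of the assembled local Schur complement:
    keep s_kl only if |s_kl| >= xi * (|s_kk| + |s_ll|)."""
    d = np.abs(np.diag(Sbar))
    keep = np.abs(Sbar) >= xi * (d[:, None] + d[None, :])  # entrywise dropping test
    np.fill_diagonal(keep, True)                           # keep the diagonal in any case
    return np.where(keep, Sbar, 0.0)

The sparsified matrix would then be factorized (or stored in sparse format) in place of the dense assembled local Schur complement.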

12. Academic model problems

[Figure: problem patterns and circular flow velocity field on the unit domain]

Diffusion equation (ε = 1 and v = 0) and convection-diffusion equation:

-\epsilon \, \mathrm{div}(K \cdot \nabla u) + v \cdot \nabla u = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega .

- Heterogeneous problems
- Anisotropic-heterogeneous problems
- Convection-dominated term

13. Numerical behaviour of sparse preconditioners

[Figures: convergence history (||r_k||/||b|| vs. iteration count) and time history (||r_k||/||b|| vs. time in seconds) of PCG on a 3D heterogeneous diffusion problem, comparing the dense preconditioner with sparsified variants ξ = 10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}]

3D heterogeneous diffusion problem with 43 Mdof mapped onto 1000 processors.
- For small ξ (15%), the convergence is marginally affected while the memory saving is significant.
- For large ξ (1%), a lot of resources are saved but the convergence becomes very poor.
- Even though they require more iterations, the sparsified variants converge faster in time, as the time per iteration is smaller and the setup of the preconditioner is cheaper.
