 
              Automatic Domain Decomposition for a Black-Box PDE Solver Torsten Adolph and Willi Sch¨ onauer Forschungszentrum Karlsruhe Institute for Scientific Computing Karlsruhe, Germany torsten.adolph@iwr.fzk.de willi.schoenauer@iwr.fzk.de http://www.fzk.de/iwr http://www.rz.uni-karlsruhe.de/rz/docs/FDEM/Literatur
Motivation Numerical solution of non-linear systems of Partial Differential Equations (PDEs) • Finite Difference Method (FDM) • Finite Element Method (FEM) • Finite Volume Method (FVM) Finite Difference Element Method (FDEM) Combination of advantages of FDM and FEM: FDM on unstructured FEM grid Torsten Adolph 2 DD17, July 3-7, 2006
Objectives • Elliptic and parabolic non-linear systems of PDEs • 2-D and 3-D with arbitrary geometry • Arbitrary non-linear boundary conditions (BCs) • Subdomains with different PDEs • Robustness • Black-box (PDEs/BCs and domain) • Error estimate • Order control/Mesh refinement • Efficient parallelization Torsten Adolph DD17, July 3-7, 2006 3
Difference formulas of order q on unstructured grid Polynomial approach of order q (m coefficients) 2-D: m = (q+1) · (q+2)/2 3-D: m = (q+1) · (q+2) · (q+3)/6 ✉ 3 ✉ 4 ✚ ✚ ✚ ✚ ✚ ✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✉ 2 ✚ ✚ ✚ ✚ ✚ ✚ q=2 ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ♠ ✉ 0 ✚ ✚ ✚ ✚ ✚ ✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ q=4 ✉ ✚ ✚ ✚ ✚ ✚ ✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ q=6 ✚ ✚ ✚ ✚ ✚ ✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✉ 1 ✉ ✚ ✚ ✚ ✚ ✚ ✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ 5=m-1 order q=2 ⎧ 1, node i ⎪ ⎨ Influence polynomial P q,i = → u d , u x,d , u y,d , u xx,d , u yy,d , u xy,d 0, other nodes ⎪ ⎩ Search for nodes in rings (up to order q+ ∆ q) → m+r nodes Selection of m appropriate nodes by special algorithm Torsten Adolph DD17, July 3-7, 2006 4
Discretization error estimate u x = u x,d,q + ¯ d x,q = u x,d,q +2 + ¯ e.g. for u x : d x,q +2 + ¯ � � → d x,q = u x,d,q +2 − u x,d,q d x,q +2 Error equation P u ≡ P ( t, x, y, u, u t , u x , u y , u xx , u yy , u xy ) Linearization by Newton-Raphson Discretization with error estimates d t , d x , . . . and linearization in d t , d x , . . . → ∆ u d = ∆ u P u + ∆ u D t + ∆ u D x + ∆ u D y + ∆ u D xy = (level of solution) = Q − 1 · [( P u ) d + D t + { D x + D y + D xy } ] (level of equation) d Only apply Newton correction ∆ u P u : → Q d · ∆ u P u = ( P u ) d Torsten Adolph DD17, July 3-7, 2006 5
Problem: Black-box for PDEs and domain User input: any system of PDEs Local mesh refinement any unstructured FEM grid 2-D and 3-D (Sliding) dividing lines dividing line P (1) P (2) Subdomains with P (3) different PDEs Solution: 1-D DD with overlap Torsten Adolph DD17, July 3-7, 2006 6
Re-sorting of the nodes Domain Node numbering Objective (from mesh generator) (by re-sorting for x-coordinate) 4 22 21 20 19 18 17 3 8 16 24 32 40 48 56 64 23 44 43 42 41 40 39 16 7 15 23 31 39 47 55 63 24 45 58 57 56 55 38 15 6 14 22 30 38 46 54 62 25 46 59 64 63 54 37 14 5 13 21 29 37 45 53 61 26 47 60 61 62 53 36 13 4 12 20 28 36 44 52 60 27 48 49 50 51 52 35 12 3 11 19 27 35 43 51 59 ✉ ✉ ✉ ✉ 28 29 30 31 32 33 34 11 2 10 18 26 34 42 50 58 ✉ ❤ ✉ ✉ ✉ ❤ 1 5 6 7 8 9 10 2 1 9 17 25 33 41 49 57 ✉ ✉ ✉ ✉ Difference star: 29 1 5 28 48 49 10 1 2 9 11 19 → all 4 processors involved → only 2 (neighboured) Proc. 1 2 3 4 processors involved Torsten Adolph DD17, July 3-7, 2006 7
Algorithm for global sorting of the nodes I • Needs 2 · (np-1) steps on np processors • Step i ∈ { 1, . . . , np-1 } : Sorting until first sorted nodes are received by proc. np • Step i ∈ { np, . . . , 2 · (np-1) } : Sorting, proc. np sends sorted nodes to processors 1 to np-1 • Always send to the right neighbour processor (except for processor np) • Always receive from the left neighbour processor • Up to np/2 processors active in parallel • Communication via MPI • Start with local sorting of the nodes (heapsort) • Step after receiving nodes: merging (= local sorting) Torsten Adolph DD17, July 3-7, 2006 8
Algorithm for global sorting of the nodes II • Formal description of step i (illustration on next slide) ip ≥ i+3 ∧ ip ≤ min(i,np) i odd merge 2 ip ≥ i+1 ∧ ip ≤ min(i,np) send 2 ip ≥ i+3 ∧ ip ≤ min(i,np) receive 2 i ∈ { 1, . . . , np-1 } : additionally: ip = i+1 i ∈ { np, . . . , 2 · (np-1) } : ip = i-np+1 ip ≥ i 2 + 1 ∧ ip ≤ min(i,np) i even merge ip ≥ i 2 + 1 ∧ ip ≤ min(i,np) send ip ≥ i 2 + 2 ∧ ip ≤ min(i,np) receive i ∈ { 1, . . . , np-1 } : additionally: ip = i+1 i ∈ { np, . . . , 2 · (np-1) } : ip = i-np+1 Torsten Adolph DD17, July 3-7, 2006 9
Illustration of the global re-sorting algorithm Step Proc. 1 2 3 4 5 6 7 1 S M Merge 2 R M S S S Send 3 R M S R M S S R Receive 4 R M S R M S R M S S 5 R M S R M S R M S R 6 R M S R M S R 7 R M S R np=8 R Step Proc. 8 9 10 11 12 13 14 1 R 2 R 3 R 4 R 5 M S S R 6 M S R M S R M S S R 7 M S R M S R M S R M S R M S S R np=8 M S R M S R M S R M S R M S R M S R M S Torsten Adolph DD17, July 3-7, 2006 10
Distribution of the elements mesh generator mesh file proc. ip 1 ip 2 ip np-1 ip np · · · old data . . . (static) shifted in ring to the right new data Processor that owns leftmost node of an element becomes element owner → Execution of 2 ring shifts 1 st : Determination of owner of leftmost node 2 nd : Storing of element numbers on owning processors Torsten Adolph DD17, July 3-7, 2006 11
Distribution of the boundary/DL/SDL nodes mesh generator DL: dividing line SDL: sliding dividing line mesh file proc. ip 1 ip 2 ip np-1 ip np · · · old data . . . (static) shifted in ring to the right new data Compare received node numbers of boundary/DL/SDL nodes to node numbers of own nodes → Store matching node numbers in arrays for boundary/DL/SDL nodes Send non-matching node numbers to right neighbour processor Torsten Adolph DD17, July 3-7, 2006 12
Illustration of 1-D DD (np=4) 1 2 Proc. 1 2 3 4 3 4 1 2 3 4 1 2 3 4 Torsten Adolph DD17, July 3-7, 2006 13
Overlap ⎫ Computation of the right hand side ⎬ ⎭ local (without communication) and of the matrix Q d → Store necessary nodes and elements of neighbour processors on proc. ip Proc. ip-1 ip ip+1 overlap own overlap necessary nodes on proc. ip ip-1, ip+1: overlap processors of proc. ip Width of overlap: Compute mean edge length h mean Choose safety factor a overlap Compute x overlap,1 , x overlap,2 x overlap,1 = 0.5 · a overlap · h mean · ( √ m(q+ ∆ q)-1 ) 1. criterion (enough nodes): x overlap,2 = a overlap · h mean · (q+ ∆ q) 2. criterion (enough rings): Compute x overlap = max(x overlap,1 , x overlap,2 ) Torsten Adolph DD17, July 3-7, 2006 14
Illustration of overlap x left = x min − x overlap x right = x max + x overlap ip2 ip3 ip4 ip5 ip6 ip7 x overlap x overlap x left,4 x min,4 x max,4 x right,4 Torsten Adolph DD17, July 3-7, 2006 15
Bandwidth optimization Before re-sorting After re-sorting Bandwidth: Full 2253 Before re-sorting 2154 After re-sorting 185 With SSP BO 112 SSP: own improved Cuthill-McKee Torsten Adolph DD17, July 3-7, 2006 16
Summary • Black-box PDE solver FDEM (URL: http://www.rz.uni-karlsruhe.de/rz/docs/FDEM/Literatur) • User input: any PDE system, any domain, 2-D and 3-D • Global re-sorting algorithm for nodes • Send elements in ring-shift to owning processors → 1-D DD with overlap • Computation of linear system of equations purely local • Efficient parallelization with MPI • Built-in bandwidth optimizer This 1-D DD is simple, robust and efficient! Torsten Adolph DD17, July 3-7, 2006 17
Recommend
More recommend