automatic domain decomposition for a black box pde solver
play

Automatic Domain Decomposition for a Black-Box PDE Solver Torsten - PowerPoint PPT Presentation

Automatic Domain Decomposition for a Black-Box PDE Solver Torsten Adolph and Willi Sch onauer Forschungszentrum Karlsruhe Institute for Scientific Computing Karlsruhe, Germany torsten.adolph@iwr.fzk.de willi.schoenauer@iwr.fzk.de


  1. Automatic Domain Decomposition for a Black-Box PDE Solver Torsten Adolph and Willi Sch¨ onauer Forschungszentrum Karlsruhe Institute for Scientific Computing Karlsruhe, Germany torsten.adolph@iwr.fzk.de willi.schoenauer@iwr.fzk.de http://www.fzk.de/iwr http://www.rz.uni-karlsruhe.de/rz/docs/FDEM/Literatur

  2. Motivation Numerical solution of non-linear systems of Partial Differential Equations (PDEs) • Finite Difference Method (FDM) • Finite Element Method (FEM) • Finite Volume Method (FVM) Finite Difference Element Method (FDEM) Combination of advantages of FDM and FEM: FDM on unstructured FEM grid Torsten Adolph 2 DD17, July 3-7, 2006

  3. Objectives • Elliptic and parabolic non-linear systems of PDEs • 2-D and 3-D with arbitrary geometry • Arbitrary non-linear boundary conditions (BCs) • Subdomains with different PDEs • Robustness • Black-box (PDEs/BCs and domain) • Error estimate • Order control/Mesh refinement • Efficient parallelization Torsten Adolph DD17, July 3-7, 2006 3

  4. Difference formulas of order q on unstructured grid Polynomial approach of order q (m coefficients) 2-D: m = (q+1) · (q+2)/2 3-D: m = (q+1) · (q+2) · (q+3)/6 ✉ 3 ✉ 4 ✚ ✚ ✚ ✚ ✚ ✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✉ 2 ✚ ✚ ✚ ✚ ✚ ✚ q=2 ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ♠ ✉ 0 ✚ ✚ ✚ ✚ ✚ ✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ q=4 ✉ ✚ ✚ ✚ ✚ ✚ ✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ q=6 ✚ ✚ ✚ ✚ ✚ ✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✉ 1 ✉ ✚ ✚ ✚ ✚ ✚ ✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ ✚✚ 5=m-1 order q=2 ⎧ 1, node i ⎪ ⎨ Influence polynomial P q,i = → u d , u x,d , u y,d , u xx,d , u yy,d , u xy,d 0, other nodes ⎪ ⎩ Search for nodes in rings (up to order q+ ∆ q) → m+r nodes Selection of m appropriate nodes by special algorithm Torsten Adolph DD17, July 3-7, 2006 4

  5. Discretization error estimate u x = u x,d,q + ¯ d x,q = u x,d,q +2 + ¯ e.g. for u x : d x,q +2 + ¯ � � → d x,q = u x,d,q +2 − u x,d,q d x,q +2 Error equation P u ≡ P ( t, x, y, u, u t , u x , u y , u xx , u yy , u xy ) Linearization by Newton-Raphson Discretization with error estimates d t , d x , . . . and linearization in d t , d x , . . . → ∆ u d = ∆ u P u + ∆ u D t + ∆ u D x + ∆ u D y + ∆ u D xy = (level of solution) = Q − 1 · [( P u ) d + D t + { D x + D y + D xy } ] (level of equation) d Only apply Newton correction ∆ u P u : → Q d · ∆ u P u = ( P u ) d Torsten Adolph DD17, July 3-7, 2006 5

  6. Problem: Black-box for PDEs and domain User input: any system of PDEs Local mesh refinement any unstructured FEM grid 2-D and 3-D (Sliding) dividing lines dividing line P (1) P (2) Subdomains with P (3) different PDEs Solution: 1-D DD with overlap Torsten Adolph DD17, July 3-7, 2006 6

  7. Re-sorting of the nodes Domain Node numbering Objective (from mesh generator) (by re-sorting for x-coordinate) 4 22 21 20 19 18 17 3 8 16 24 32 40 48 56 64 23 44 43 42 41 40 39 16 7 15 23 31 39 47 55 63 24 45 58 57 56 55 38 15 6 14 22 30 38 46 54 62 25 46 59 64 63 54 37 14 5 13 21 29 37 45 53 61 26 47 60 61 62 53 36 13 4 12 20 28 36 44 52 60 27 48 49 50 51 52 35 12 3 11 19 27 35 43 51 59 ✉ ✉ ✉ ✉ 28 29 30 31 32 33 34 11 2 10 18 26 34 42 50 58 ✉ ❤ ✉ ✉ ✉ ❤ 1 5 6 7 8 9 10 2 1 9 17 25 33 41 49 57 ✉ ✉ ✉ ✉ Difference star: 29 1 5 28 48 49 10 1 2 9 11 19 → all 4 processors involved → only 2 (neighboured) Proc. 1 2 3 4 processors involved Torsten Adolph DD17, July 3-7, 2006 7

  8. Algorithm for global sorting of the nodes I • Needs 2 · (np-1) steps on np processors • Step i ∈ { 1, . . . , np-1 } : Sorting until first sorted nodes are received by proc. np • Step i ∈ { np, . . . , 2 · (np-1) } : Sorting, proc. np sends sorted nodes to processors 1 to np-1 • Always send to the right neighbour processor (except for processor np) • Always receive from the left neighbour processor • Up to np/2 processors active in parallel • Communication via MPI • Start with local sorting of the nodes (heapsort) • Step after receiving nodes: merging (= local sorting) Torsten Adolph DD17, July 3-7, 2006 8

  9. Algorithm for global sorting of the nodes II • Formal description of step i (illustration on next slide) ip ≥ i+3 ∧ ip ≤ min(i,np) i odd merge 2 ip ≥ i+1 ∧ ip ≤ min(i,np) send 2 ip ≥ i+3 ∧ ip ≤ min(i,np) receive 2 i ∈ { 1, . . . , np-1 } : additionally: ip = i+1 i ∈ { np, . . . , 2 · (np-1) } : ip = i-np+1 ip ≥ i 2 + 1 ∧ ip ≤ min(i,np) i even merge ip ≥ i 2 + 1 ∧ ip ≤ min(i,np) send ip ≥ i 2 + 2 ∧ ip ≤ min(i,np) receive i ∈ { 1, . . . , np-1 } : additionally: ip = i+1 i ∈ { np, . . . , 2 · (np-1) } : ip = i-np+1 Torsten Adolph DD17, July 3-7, 2006 9

  10. Illustration of the global re-sorting algorithm Step Proc. 1 2 3 4 5 6 7 1 S M Merge 2 R M S S S Send 3 R M S R M S S R Receive 4 R M S R M S R M S S 5 R M S R M S R M S R 6 R M S R M S R 7 R M S R np=8 R Step Proc. 8 9 10 11 12 13 14 1 R 2 R 3 R 4 R 5 M S S R 6 M S R M S R M S S R 7 M S R M S R M S R M S R M S S R np=8 M S R M S R M S R M S R M S R M S R M S Torsten Adolph DD17, July 3-7, 2006 10

  11. Distribution of the elements mesh generator mesh file proc. ip 1 ip 2 ip np-1 ip np · · · old data . . . (static) shifted in ring to the right new data Processor that owns leftmost node of an element becomes element owner → Execution of 2 ring shifts 1 st : Determination of owner of leftmost node 2 nd : Storing of element numbers on owning processors Torsten Adolph DD17, July 3-7, 2006 11

  12. Distribution of the boundary/DL/SDL nodes mesh generator DL: dividing line SDL: sliding dividing line mesh file proc. ip 1 ip 2 ip np-1 ip np · · · old data . . . (static) shifted in ring to the right new data Compare received node numbers of boundary/DL/SDL nodes to node numbers of own nodes → Store matching node numbers in arrays for boundary/DL/SDL nodes Send non-matching node numbers to right neighbour processor Torsten Adolph DD17, July 3-7, 2006 12

  13. Illustration of 1-D DD (np=4) 1 2 Proc. 1 2 3 4 3 4 1 2 3 4 1 2 3 4 Torsten Adolph DD17, July 3-7, 2006 13

  14. Overlap ⎫ Computation of the right hand side ⎬ ⎭ local (without communication) and of the matrix Q d → Store necessary nodes and elements of neighbour processors on proc. ip Proc. ip-1 ip ip+1 overlap own overlap necessary nodes on proc. ip ip-1, ip+1: overlap processors of proc. ip Width of overlap: Compute mean edge length h mean Choose safety factor a overlap Compute x overlap,1 , x overlap,2 x overlap,1 = 0.5 · a overlap · h mean · ( √ m(q+ ∆ q)-1 ) 1. criterion (enough nodes): x overlap,2 = a overlap · h mean · (q+ ∆ q) 2. criterion (enough rings): Compute x overlap = max(x overlap,1 , x overlap,2 ) Torsten Adolph DD17, July 3-7, 2006 14

  15. Illustration of overlap x left = x min − x overlap x right = x max + x overlap ip2 ip3 ip4 ip5 ip6 ip7 x overlap x overlap x left,4 x min,4 x max,4 x right,4 Torsten Adolph DD17, July 3-7, 2006 15

  16. Bandwidth optimization Before re-sorting After re-sorting Bandwidth: Full 2253 Before re-sorting 2154 After re-sorting 185 With SSP BO 112 SSP: own improved Cuthill-McKee Torsten Adolph DD17, July 3-7, 2006 16

  17. Summary • Black-box PDE solver FDEM (URL: http://www.rz.uni-karlsruhe.de/rz/docs/FDEM/Literatur) • User input: any PDE system, any domain, 2-D and 3-D • Global re-sorting algorithm for nodes • Send elements in ring-shift to owning processors → 1-D DD with overlap • Computation of linear system of equations purely local • Efficient parallelization with MPI • Built-in bandwidth optimizer This 1-D DD is simple, robust and efficient! Torsten Adolph DD17, July 3-7, 2006 17

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend