Algorithms in the parallel partitioning tool GridSpiderPar - PowerPoint PPT Presentation

Keldysh Institute of Applied Mathematics (KIAM) RAS, Moscow, Russia


SLIDE 1

Algorithms in the parallel partitioning tool GridSpiderPar for large mesh decomposition

Evdokia N. Golovchenko, Marina A. Kornilina, Mikhail V. Yakobovskiy

Keldysh Institute of Applied Mathematics (KIAM) RAS, Moscow, Russia

SLIDE 2

Decomposition

  • parallel mesh-based numerical simulations in continuum mechanics, electrodynamics and other PDE problems on distributed-memory systems

Geometric parallelism

Efficient processor usage:

  • balanced mesh distribution among processors
  • reduced interprocessor communication

SLIDE 3

Use of partitions into microdomains

  • forming of subdomains from microdomains
  • large mesh storage
  • domain decomposition methods (Schwarz method)

SLIDE 4

Serial partitioning tools

METIS, Jostle, Scotch, Chaco, Party

Parallel partitioning tools

ParMETIS, Jostle, PT-Scotch, Zoltan

Research area

  • unstructured meshes with up to 10⁹ elements
SLIDE 5

Multilevel algorithm of graph partitioning
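A generic multilevel bisection cycle (coarsen by contracting a matching, partition the coarsest graph, then project back and refine level by level) can be sketched as follows. This is an illustrative sketch of the general multilevel approach, not GridSpiderPar's or METIS's actual code; the matching and refinement rules are deliberately simplified.

```python
def coarsen(adj):
    """Contract a matching: pairs of adjacent vertices are merged into
    single coarse vertices (real tools use heavy-edge matching)."""
    matched, mapping, coarse_id = set(), {}, 0
    for v in adj:
        if v in matched:
            continue
        matched.add(v)
        mapping[v] = coarse_id
        partner = next((u for u in adj[v] if u not in matched), None)
        if partner is not None:
            matched.add(partner)
            mapping[partner] = coarse_id
        coarse_id += 1
    coarse = {c: set() for c in range(coarse_id)}
    for v, nbrs in adj.items():
        for u in nbrs:
            if mapping[v] != mapping[u]:
                coarse[mapping[v]].add(mapping[u])
    return coarse, mapping

def bisect(adj):
    """Trivial initial bisection: traverse the graph and cut the
    visiting order in half."""
    order, seen = [], set()
    for s in adj:
        stack = [s]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            order.append(v)
            stack.extend(adj[v])
    half = len(order) // 2
    return {v: (0 if i < half else 1) for i, v in enumerate(order)}

def refine(adj, part):
    """One KL/FM-style pass: move a vertex to the other part when that
    strictly reduces the edge-cut and does not empty its part."""
    sizes = {0: 0, 1: 0}
    for p in part.values():
        sizes[p] += 1
    for v in adj:
        p = part[v]
        internal = sum(1 for u in adj[v] if part[u] == p)
        external = len(adj[v]) - internal
        if external > internal and sizes[p] > 1:
            sizes[p] -= 1
            sizes[1 - p] += 1
            part[v] = 1 - p
    return part

def multilevel_bisect(adj, coarsest=4):
    """Multilevel cycle: coarsen, partition the coarsest graph,
    then project the partition back and refine at each level."""
    if len(adj) <= coarsest:
        return refine(adj, bisect(adj))
    coarse, mapping = coarsen(adj)
    if len(coarse) == len(adj):  # graph could not be contracted further
        return refine(adj, bisect(adj))
    coarse_part = multilevel_bisect(coarse, coarsest)
    part = {v: coarse_part[mapping[v]] for v in adj}  # project back
    return refine(adj, part)
```

k-way partitioners apply the same cycle recursively or with a k-way refinement pass; only the two-way case is shown here.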

SLIDE 6

Shortcomings of present graph partitioning methods

  • forming of unconnected subdomains
  • generation of strongly imbalanced partitions (ParMETIS: the number of vertices in some subdomains can be twice as large as in others)
  • cannot always make partitions into a large number of microdomains

SLIDE 7

Connectivity is important for:

  • iterative linear system solving methods
  • mesh data compression
  • subdomain composition algorithm¹
  • TIM-2D code parallelizing method²

¹ Ilyushin A.I., Kolmakov A.A., Menshov I.S. Constructing a parallel numerical model by means of the composition of computational objects // Mathematical Models and Computer Simulations. 2012. Vol. 4. Issue 1. P. 118-128.

² Voropinov A.A. Data decomposition for the TIM-2D code parallelizing method and its quality evaluation criteria // Bulletin of the South Ural State University. Series «Mathematical modelling, programming & computer software». 2009. Issue 4. No. 37(170). P. 40-50.

[Figure: an unconnected subdomain]

SLIDE 8

What’s new: Partitioning tool GridSpiderPar

  • parallel incremental algorithm of graph partitioning
  • parallel geometric algorithm of mesh partitioning

SLIDE 9

Algorithms

make partitions of unstructured meshes with up to 10⁹ elements into a large number of microdomains

criteria:

  • generation of balanced partitions
  • forming of connected subdomains
  • reducing edge-cut
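These three criteria have simple formal counterparts. The sketch below gives illustrative definitions of imbalance, edge-cut, and subdomain connectivity for a partition stored as a vertex-to-subdomain map; the exact formulas used by the tools may differ.

```python
from collections import defaultdict, deque

def imbalance(part, k):
    """Imbalance: largest subdomain size relative to the ideal size
    len(part) / k, as a fraction over the ideal."""
    sizes = defaultdict(int)
    for p in part.values():
        sizes[p] += 1
    return max(sizes.values()) / (len(part) / k) - 1.0

def edge_cut(adj, part):
    """Edge-cut: number of edges whose endpoints lie in different
    subdomains (each edge is seen from both sides, hence // 2)."""
    return sum(1 for v in adj for u in adj[v] if part[v] != part[u]) // 2

def unconnected_subdomains(adj, part, k):
    """Count subdomains that are not connected, i.e. whose vertices
    split into more than one connected component."""
    bad = 0
    for p in range(k):
        verts = [v for v in adj if part[v] == p]
        if not verts:
            continue
        seen, queue = {verts[0]}, deque([verts[0]])
        while queue:  # BFS restricted to subdomain p
            v = queue.popleft()
            for u in adj[v]:
                if part[u] == p and u not in seen:
                    seen.add(u)
                    queue.append(u)
        if len(seen) < len(verts):
            bad += 1
    return bad
```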
SLIDE 10

Incremental algorithm of graph partitioning

  • incremental growth of subdomains
  • diffusion of border vertices between subdomains

(M. Yakobovskiy, 2005, KIAM RAS)

Example: mesh around an airfoil with a flap
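The two phases can be illustrated with a toy sketch (not the actual GridSpiderPar implementation; the balance control present in the real algorithm is omitted): subdomains grow from seed vertices in breadth-first layers, after which border vertices diffuse to the neighbouring subdomain holding most of their neighbours.

```python
from collections import deque

def grow_subdomains(adj, seeds):
    """Incremental growth: each subdomain expands from its seed vertex
    in breadth-first layers until every vertex is claimed."""
    part = {s: i for i, s in enumerate(seeds)}
    frontier = deque(seeds)
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if u not in part:
                part[u] = part[v]
                frontier.append(u)
    return part

def diffuse_border(adj, part, passes=2):
    """Diffusion of border vertices: a vertex migrates to the
    neighbouring subdomain that holds strictly more of its
    neighbours than its current one does."""
    for _ in range(passes):
        for v in adj:
            counts = {}
            for u in adj[v]:
                counts[part[u]] = counts.get(part[u], 0) + 1
            if not counts:
                continue  # isolated vertex: nothing to decide
            best = max(counts, key=counts.get)
            if counts[best] > counts.get(part[v], 0):
                part[v] = best
    return part
```

Growing all subdomains from a shared frontier keeps their boundaries advancing at roughly the same rate, which is what makes the initial partition approximately balanced before diffusion smooths the borders.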

SLIDE 11

Incremental algorithm

  • local refinement of subdomains
  • subdomain quality control
  • release of some part of the vertices in bad subdomains

Example: mesh around an airfoil with a flap

[Formula: subdomain quality criterion φ]

SLIDE 12

Incremental algorithm of graph partitioning: Distinctions

  • it is not based on the multilevel approach
  • it has some features similar to bubble-growing and diffusion algorithms
  • the bubble-growing algorithm does not guarantee that resulting partitions will be balanced
  • difference from diffusion algorithms: it releases some part of the vertices in subdomains and then grows new subdomains
  • new criterion for subdomain quality control (layer continuity)

SLIDE 13

Parallel incremental algorithm of graph partitioning

  • geometric distribution of vertices among processors
  • redistribution of small groups of vertices
  • local partitioning
  • collecting groups of bad subdomains and their repartitioning

Example: mesh around an airfoil with a flap

SLIDE 14

Parallel incremental algorithm of graph partitioning: Distinctions

  • working with groups of subdomains of poor quality
  • trying to decrease the edge-cut during incremental growth of subdomains
  • the number of bad subdomains and the edge-cut are taken into account in the criterion of subdomain quality control

SLIDE 15

Parallel incremental algorithm of graph partitioning: Advantages

  • aimed at forming connected subdomains
  • balance of partitions is better than that achieved by other graph partitioning methods (imbalance: 5% (60%) → 0.05%)

SLIDE 16

Parallel geometric algorithm of mesh partitioning

  • recursive coordinate bisection
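Recursive coordinate bisection itself is compact enough to sketch. The version below splits along the axis of largest extent and divides the sorted vertex list proportionally; it is a simplified serial illustration of the idea, not the parallel algorithm presented here.

```python
def rcb(points, ids, k):
    """Recursive coordinate bisection: sort the vertices along the
    axis of largest extent and split the list proportionally,
    recursing until k parts remain.  Splitting by position in the
    sorted list means part sizes differ by at most one vertex."""
    if k == 1:
        return {i: 0 for i in ids}
    dims = len(points[ids[0]])
    # axis with the largest coordinate spread
    axis = max(range(dims),
               key=lambda d: max(points[i][d] for i in ids)
                           - min(points[i][d] for i in ids))
    order = sorted(ids, key=lambda i: points[i][axis])
    k_left = k // 2
    cut = len(order) * k_left // k  # proportional share of vertices
    left = rcb(points, order[:cut], k_left)
    right = rcb(points, order[cut:], k - k_left)
    # shift the right-hand part numbers past the left-hand ones
    return {**left, **{i: p + k_left for i, p in right.items()}}
```

Because only vertex coordinates are consulted, nothing about mesh connectivity needs to be stored, which is the memory advantage claimed on the next slide.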

SLIDE 17

[Figure: cutting plane]

Parallel geometric algorithm of mesh partitioning: Distinctions

  • making cuts of the cutting plane along other coordinate axes
  • sorting only the coordinates of vertices close to the cutting plane in local recursive coordinate bisection

Advantages

  • the difference in the numbers of vertices in resulting subdomains is no more than 1 vertex
  • efficient memory usage (only coordinates are stored)

SLIDE 18

Edge-cut

[Figure: edge-cut illustration]

SLIDE 19

Tetrahedral meshes

2·10⁸ vertices, 1.46·10⁹ edges; 2.6·10⁸ vertices, 1.8·10⁹ edges; 2.8·10⁸ vertices, 1.9·10⁹ edges; 10⁸ vertices, 7.7·10⁸ edges

SLIDE 20

Partitions into microdomains: imbalance in 25600 microdomains, %

| Methods      | Mesh 1 | Mesh 2 | Mesh 3 | Mesh 4 |
| IncrDecomp   | 3.5    | 0.1    | 0.3    | 0.2    |
| PartKway     | 53.4   | 59.8   | 58.6   | 64.3   |
| PartGeomKway | 48.7   | 50.4   | 62.4   | 56.5   |
| PT-Scotch    | 8.3    | 8.3    | 8.3    | 8.3    |
| RCB          | 0.01   | 0.02   | 0.01   | 0.01   |
| GeomDecomp   | 0.01   | 0.01   | 0.02   | 0.01   |

SLIDE 21

Partitions into microdomains: number of unconnected microdomains in 25600

| Methods      | Mesh 1 | Mesh 2 | Mesh 3 | Mesh 4 |
| IncrDecomp   | 1      |        |        |        |
| PartKway     | 69     | 35     | 37     | 29     |
| PartGeomKway | 67     | 34     | 28     | 37     |
| PT-Scotch    | 7      | 2      | 4      |        |
| RCB          | 44     | 14     | 43     | 64     |
| GeomDecomp   | 62     | 38     | 16     | 33     |

SLIDE 22

Partitions into subdomains: imbalance in 512 subdomains, %

| Methods        | Mesh 1 | Mesh 2 | Mesh 3 | Mesh 4 |
| Simple average | 5.1    | 3.7    | 5.4    | 5.3    |
| PartKway       | 12.9   | 20.6   | 17.6   | 28.4   |
| PartGeomKway   | 31.1   | 35.7   | 44.2   | 51.4   |
| PT-Scotch      | 4.9    | 1.7    | 2.8    | 2.9    |
| GeomDecomp     |        |        |        |        |

SLIDE 23

MARPLE3D code (KIAM RAS)

  • Designed for multiphysics simulations in the field of radiative plasma dynamics
  • Testing of partitions obtained by the tools GridSpiderPar, ParMETIS, Zoltan, and PT-Scotch was performed using simulations of gas-dynamic problems
  • Computational performance of simulations with the MARPLE3D code run on different partitions was compared

SLIDE 24

Model simulation of turbulent plasma flow in the ITER (future Tokamak) divertor

  • complex hydrodynamics system including turbulence and conductive & radiative heat transfer
  • explicit and implicit schemes

SLIDE 25

Shock wave propagation in an extended structure (shock tube)

  • complex hydrodynamics system including turbulence
  • explicit and implicit schemes
SLIDE 26

Near-earth explosion simulation

  • full hydrodynamics system including conductive heat transfer
  • explicit and implicit schemes
SLIDE 27

Test meshes

Tokamak divertor (divertor):

  • 3D tetrahedral mesh (over 3 million tetrahedra)
  • mesh refinement in the vicinity of small objects
  • 256 subdomains

Shock tube (tube):

  • 3D tetrahedral mesh (over 25 million tetrahedra)
  • mesh refinement in the vicinity of small objects
  • 4096 subdomains
SLIDE 28

Test meshes

Near-earth explosion (boom and boomL):

  • 3D rectangular mesh: over 61 million cells for "boom", over 116 million cells for "boomL"
  • parallelepipeds with different aspect ratios
  • boom: 4096 subdomains
  • boomL: 10080 subdomains

  • Dual graphs were constructed for each test mesh, with 2.8·10⁶ – 1.2·10⁸ vertices and 2.3·10⁷ – 1.0·10⁹ edges
  • Computations were carried out on MVS-100K (227.94 TFlop/s), "Lomonosov" (1700 TFlop/s) and "Helios" (1524.1 TFlop/s)
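A dual graph of the kind described here (mesh elements become graph vertices, with an edge wherever two elements share a face) can be built with a single face-hashing pass. The sketch below assumes simplicial (tetrahedral) elements given as node-id tuples and is illustrative, not the tool's own code.

```python
from collections import defaultdict

def dual_graph(elements):
    """Build the dual graph of a simplicial mesh: elements become
    vertices, and an edge connects two elements that share a face.
    Each element is a tuple of node ids; a face of a simplex is the
    element with one node left out, stored as a sorted tuple."""
    face_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        # for a tetrahedron these are its four triangular faces
        for skip in range(len(nodes)):
            face = tuple(sorted(nodes[:skip] + nodes[skip + 1:]))
            face_to_elems[face].append(e)
    adj = defaultdict(set)
    for elems in face_to_elems.values():
        for a in elems:
            for b in elems:
                if a != b:
                    adj[a].add(b)
    return adj
```

For a conforming mesh each interior face is shared by exactly two elements, so the pass is linear in the number of element faces; it is this element-adjacency graph, not the mesh itself, that is handed to the graph partitioner.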

SLIDE 29

Imbalance in subdomains: lack of vertices (boom)

| I     | PK     | PGK    | PTScotch | G     | RCB   | RIB   | HSFC  |
| 0.06% | 76.08% | 42.51% | 5.00%    | 0.00% | 0.00% | 0.00% | 0.00% |

SLIDE 30

Imbalance in subdomains: overflow of vertices (boom)

| I     | PK    | PGK   | PTScotch | G     | RCB   | RIB   | HSFC  |
| 0.06% | 4.51% | 4.51% | 5.00%    | 0.01% | 0.01% | 0.01% | 0.01% |
SLIDE 31

Cut edges (tube)

| I        | PK       | PGK      | PTScotch | PHG      | G        | RCB      | RIB      | HSFC     |
| 1.79·10⁷ | 1.71·10⁷ | 1.71·10⁷ | 1.74·10⁷ | 1.86·10⁷ | 1.96·10⁷ | 2.23·10⁷ | 1.97·10⁷ | 6.54·10⁷ |

SLIDE 32

Cut edges (boomL)

| I       | PK      | PGK     | PTScotch | G       | RCB     | RIB     | HSFC    |
| 7.8·10⁷ | 8.2·10⁷ | 8.1·10⁷ | 8.0·10⁷  | 8.9·10⁷ | 9.3·10⁷ | 9.8·10⁷ | 1.1·10⁸ |

SLIDE 33

Number of time steps (divertor)

[Chart over methods I, PK, PGK, PTScotch, PHG, G, RCB, RIB, HSFC; charted values: 8236, 8189, 7720, 8068, 5893, 5874, 5764, 4289]

SLIDE 34

Number of time steps (tube)

[Chart over methods I, PK, PGK, PTScotch, PHG, G, RCB, RIB, HSFC; charted values: 1488, 1401, 1465, 1433, 1228, 1004, 1130, 341]

SLIDE 35

Testing of microdomain graph partitions on the near-earth explosion simulation problem

Mesh: BoomL (116 214 272 hexahedrons)

| Microdomains | Microdomains per subdomain | Imbalance, % | Cut edges  | Neighbouring subdomains (max.) | Unconnected subdomains | Time steps |
| 3072         | 1                          | 9.1          | 53 140 207 | 28                             |                        | 1107       |
| 24576        | 8                          | 62.5         | 64 611 859 | 25                             |                        | 833        |
| 49152        | 16                         | 37.5         | 66 566 874 | 25                             |                        | 880        |
| 98304        | 32                         | 18.7         | 68 841 339 | 23                             |                        | 949        |
| 196608       | 64                         | 7.9          | 68 207 798 | 21                             |                        | 999        |

SLIDE 36

Strong scaling

partitioning of the hexahedral mesh with 1.47·10⁷ cells into 1024 subdomains

  • parallel incremental algorithm of graph partitioning
  • parallel geometric algorithm of mesh partitioning
SLIDE 37

Results

1. Algorithms for parallel decomposition of large computational meshes (up to 10⁹ elements) were devised: the parallel incremental algorithm of graph partitioning and the parallel geometric algorithm of mesh partitioning.

2. The partitioning tool GridSpiderPar was developed.

3. Different partitions into microdomains, microdomain graph partitions and partitions into subdomains of several meshes (10⁸ vertices, 10⁹ elements) obtained by means of the partitioning tool GridSpiderPar and the packages ParMETIS, Zoltan and PT-Scotch were compared. The results revealed advantages of the devised algorithms in the quality of the partitions.

SLIDE 38

Results

4. GridSpiderPar, ParMETIS, Zoltan, and PT-Scotch were compared via gas-dynamic problem simulations. The test studies demonstrate the efficiency of the developed algorithms.

5. Testing of microdomain graph partitions on the near-earth explosion simulation problem revealed the potential of using this strategy for simulations.

SLIDE 39

Thank You!