Solution of dynamic solid deformation using hybrid parallelization with MPI and OpenMP
- MSc. Miguel Vargas-Félix
ISUM 2012
1/24
Problem description

We want to solve large-scale dynamic problems with linear deformation.
2/24
Schur substructuring method
[Figure: finite element domain Ω with boundaries Γd and Γf (left), domain discretization (center), partitioning (right).]
3/24
Schur substructuring method
Reordering the unknowns of each partition so that interior (I) nodes come first and boundary (B) nodes last, a three-partition example has the block structure

$$\begin{pmatrix}
K_{II}^{1} & & & K_{IB}^{1} \\
& K_{II}^{2} & & K_{IB}^{2} \\
& & K_{II}^{3} & K_{IB}^{3} \\
K_{BI}^{1} & K_{BI}^{2} & K_{BI}^{3} & K_{BB}
\end{pmatrix}$$

Figure 1. Partitioning example.
With p partitions the global system of equations becomes

$$\begin{pmatrix}
K_{II}^{1} & & & & & K_{IB}^{1} \\
& K_{II}^{2} & & & & K_{IB}^{2} \\
& & K_{II}^{3} & & & K_{IB}^{3} \\
& & & \ddots & & \vdots \\
& & & & K_{II}^{p} & K_{IB}^{p} \\
K_{BI}^{1} & K_{BI}^{2} & K_{BI}^{3} & \cdots & K_{BI}^{p} & K_{BB}
\end{pmatrix}
\begin{pmatrix}
d_{I}^{1} \\ d_{I}^{2} \\ d_{I}^{3} \\ \vdots \\ d_{I}^{p} \\ d_{B}
\end{pmatrix}
=
\begin{pmatrix}
f_{I}^{1} \\ f_{I}^{2} \\ f_{I}^{3} \\ \vdots \\ f_{I}^{p} \\ f_{B}
\end{pmatrix}.
\qquad(1)$$
4/24
Schur substructuring method
For each partition i we have the sub-system

$$\begin{pmatrix} K_{II}^{i} & K_{IB}^{i} \\ K_{BI}^{i} & K_{BB} \end{pmatrix}
\begin{pmatrix} d_{I}^{i} \\ d_{B} \end{pmatrix}
= \begin{pmatrix} f_{I}^{i} \\ f_{B} \end{pmatrix}, \quad i = 1 \ldots p. \qquad(2)$$

From the first block row we can solve for $d_{I}^{i}$ as

$$d_{I}^{i} = (K_{II}^{i})^{-1}\,(f_{I}^{i} - K_{IB}^{i}\, d_{B}). \qquad(3)$$

Substituting (3) into the last block row gives the Schur complement system for the boundary unknowns,

$$\Big(K_{BB} - \sum_{i=1}^{p} K_{BI}^{i} (K_{II}^{i})^{-1} K_{IB}^{i}\Big)\, d_{B}
= f_{B} - \sum_{i=1}^{p} K_{BI}^{i} (K_{II}^{i})^{-1} f_{I}^{i}. \qquad(4)$$

Once $d_{B}$ is known, each $d_{I}^{i}$ is recovered with (3).

Let $\bar K_{BB}^{i} = K_{BI}^{i} (K_{II}^{i})^{-1} K_{IB}^{i}$. To calculate it [Sori00], we proceed column by column, solving

$$K_{II}^{i}\, t = [K_{IB}^{i}]_{c},$$

skipping the columns $[K_{IB}^{i}]_{c}$ that are null. We then complete $\bar K_{BB}^{i}$ with

$$[\bar K_{BB}^{i}]_{c} = K_{BI}^{i}\, t.$$
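This column-by-column construction of $\bar K_{BB}^{i}$ can be sketched in a few lines of Python. This is a hypothetical, dense illustration only: a Gaussian-elimination solve stands in for the sparse Cholesky solve used in the actual implementation.

```python
# Sketch: local Schur contribution Kbb_bar = K_BI (K_II)^-1 K_IB,
# built column by column by solving K_II t = [K_IB]_c.
# Dense pure-Python illustration; the real code factorizes the
# sparse K_II once and reuses the factor for every column.

def solve(A, b):
    """Gaussian elimination with partial pivoting (on copies)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def schur_contribution(K_II, K_IB, K_BI):
    nB = len(K_IB[0])
    Kbb_bar = [[0.0] * nB for _ in range(len(K_BI))]
    for c in range(nB):
        col = [row[c] for row in K_IB]
        if all(v == 0.0 for v in col):   # null columns are skipped
            continue
        t = solve(K_II, col)             # K_II t = [K_IB]_c
        for r in range(len(K_BI)):       # [Kbb_bar]_c = K_BI t
            Kbb_bar[r][c] = sum(K_BI[r][k] * t[k] for k in range(len(t)))
    return Kbb_bar
```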
5/24
Schur substructuring method
Similarly, let $\bar f_{B}^{i} = K_{BI}^{i} (K_{II}^{i})^{-1} f_{I}^{i}$; in this case only one system has to be solved,

$$K_{II}^{i}\, t = f_{I}^{i}, \qquad \bar f_{B}^{i} = K_{BI}^{i}\, t.$$

Since $\bar K_{BB}^{i}$ and $\bar f_{B}^{i}$ hold the contribution of each partition to (4), it can be written as

$$\Big(K_{BB} - \sum_{i=1}^{p} \bar K_{BB}^{i}\Big)\, d_{B} = f_{B} - \sum_{i=1}^{p} \bar f_{B}^{i}. \qquad(5)$$

Since $K_{II}^{i}$ is sparse and has to be solved many times in (5), an efficient way to proceed is to use a Cholesky factorization of $K_{II}^{i}$. The local matrices $\bar K_{BB}^{i}$, however, are not sparse. To solve this system of equations a sparse version of the conjugate gradient method is used; the matrix $\big(K_{BB} - \sum_{i=1}^{p} \bar K_{BB}^{i}\big)$ is not assembled, but its product with a vector is computed from the contributions of each partition.
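Because CG only needs the product of the system matrix with a vector, (5) can be solved without ever assembling the Schur complement. A minimal matrix-free CG sketch follows (hypothetical Python; in the actual implementation each MPI slave would apply its own $\bar K_{BB}^{i}$ and the partial products would be reduced to the master):

```python
# Sketch: conjugate gradient where the matrix is only available
# through its action on a vector (apply_A), as in the Schur system (5).

def cg(apply_A, b, tol=1e-10, max_iter=1000):
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # residual for the initial guess x = 0
    p = r[:]
    rr = sum(v * v for v in r)
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rr / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rr_new = sum(v * v for v in r)
        if rr_new < tol * tol:
            break
        p = [r[i] + (rr_new / rr) * p[i] for i in range(n)]
        rr = rr_new
    return x
```

Here `apply_A(v)` would compute $K_{BB}\,v$ and subtract the per-partition products $\bar K_{BB}^{i}\,v$, gathered over MPI.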
6/24
Matrix storage
To store a sparse matrix A efficiently, for each row i we keep a vector $v_{i}^{A}$ that contains the non-zero values of the row, and a vector $j_{i}^{A}$ with their respective column indexes. For example, a row of A with non-zero values 9, 3, 1 in columns 2, 3 and 6 is stored as

$$v_{i}^{A} = (9, 3, 1), \qquad j_{i}^{A} = (2, 3, 6).$$

The number of non-zero entries of row i is denoted by $|v_{i}^{A}|$ or by $|j_{i}^{A}|$. Therefore the q-th non-zero value of the row is denoted by $(v_{i}^{A})_{q}$ and the index of this value by $(j_{i}^{A})_{q}$, with $q = 1, \ldots, |v_{i}^{A}|$.
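A minimal sketch of this storage scheme (hypothetical Python; the class and method names are illustrative, not the implementation's):

```python
# Row-wise sparse storage: for each row i keep the non-zero values
# v[i] and their column indexes j[i], as in v_i^A and j_i^A above.

class SparseRows:
    def __init__(self, rows):
        self.v = [[] for _ in range(rows)]   # non-zero values per row
        self.j = [[] for _ in range(rows)]   # their column indexes

    def add(self, i, col, value):
        self.v[i].append(value)
        self.j[i].append(col)

    def get(self, i, col):
        # scan the |v_i| stored entries of row i
        for q in range(len(self.v[i])):
            if self.j[i][q] == col:
                return self.v[i][q]
        return 0.0
```

Storing the example row gives `v = [9, 3, 1]` and `j = [2, 3, 6]`; entries not stored are implicitly zero.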
7/24
Cholesky factorization for sparse matrices
The factorization $A = L L^{T}$ is computed column by column with

$$L_{jj} = \sqrt{a_{jj} - \sum_{k=1}^{j-1} L_{jk}^{2}}, \qquad
L_{ij} = \frac{1}{L_{jj}}\Big(a_{ij} - \sum_{k=1}^{j-1} L_{ik} L_{jk}\Big), \quad i > j.$$
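The two column formulas for $L_{jj}$ and $L_{ij}$ translate directly into code. A dense pure-Python sketch for illustration (the sparse implementation restricts the sums to the stored non-zeros):

```python
# Dense sketch of column-by-column Cholesky (A = L L^T):
#   L_jj = sqrt(a_jj - sum_{k<j} L_jk^2)
#   L_ij = (a_ij - sum_{k<j} L_ik L_jk) / L_jj,  i > j
import math

def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L
```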
8/24
Cholesky factorization for sparse matrices
[Figure: non-zero pattern of the factor L.]
9/24
Cholesky factorization for sparse matrices
[Figure: non-zero pattern of the factor L'.]
10/24
Cholesky factorization for sparse matrices
The structure of each column of L is found with a symbolic factorization before any numeric values are computed. For example, for

$$A = \begin{pmatrix}
a_{11} & a_{12} & & & & a_{16} \\
a_{21} & a_{22} & a_{23} & a_{24} & & \\
& a_{32} & a_{33} & & a_{35} & \\
& a_{42} & & a_{44} & & \\
& & a_{53} & & a_{55} & a_{56} \\
a_{61} & & & & a_{65} & a_{66}
\end{pmatrix},
\qquad
L = \begin{pmatrix}
l_{11} & & & & & \\
l_{21} & l_{22} & & & & \\
& l_{32} & l_{33} & & & \\
& l_{42} & l_{43} & l_{44} & & \\
& & l_{53} & l_{54} & l_{55} & \\
l_{61} & l_{62} & l_{63} & l_{64} & l_{65} & l_{66}
\end{pmatrix},$$

the below-diagonal structure of column 2 grows from $a_{2} = \{3, 4\}$ in A to $l_{2} = \{3, 4, 6\}$ in L (entry $l_{62}$ is fill-in).
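A naive symbolic-factorization sketch (hypothetical Python, quadratic in the number of columns; production codes use elimination trees instead). It relies on the standard no-cancellation rule: $L_{ij} \neq 0$ iff $a_{ij} \neq 0$, or $L_{ik} \neq 0$ and $L_{jk} \neq 0$ for some $k < j$.

```python
# Symbolic Cholesky: compute, for each column j, the set of rows
# below the diagonal where L has a non-zero entry (0-based indexes).
# struct_A[j] is the below-diagonal row set of column j of A.

def symbolic_cholesky(struct_A):
    n = len(struct_A)
    struct_L = [set(s) for s in struct_A]
    for j in range(n):
        for k in range(j):
            if j in struct_L[k]:
                # column k creates fill-in in column j at rows > j
                struct_L[j] |= {i for i in struct_L[k] if i > j}
    return struct_L
```

For the example above, column 2 (1-based) of A has below-diagonal structure {3, 4}; the algorithm adds the fill-in row 6, giving $l_{2} = \{3, 4, 6\}$.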
11/24
Cholesky factorization for sparse matrices
Using the column structures $J(i) = \{k : L_{ik} \neq 0\}$, the sums run only over the stored non-zeros:

$$L_{ij} = \frac{1}{L_{jj}}\Big(a_{ij} - \sum_{\substack{k \in J(i) \cap J(j) \\ k < j}} L_{ik} L_{jk}\Big), \qquad
L_{jj} = \sqrt{a_{jj} - \sum_{\substack{k \in J(j) \\ k < j}} L_{jk}^{2}}.$$

[Figure: columns of the factorization distributed among Core 1, Core 2, …, Core N.]
12/24
Cholesky factorization for sparse matrices
| Equations | nnz(A) | nnz(L) | Cholesky [s] | CGJ [s] |
|---|---|---|---|---|
| 1,006 | 6,140 | 14,722 | 0.086 | 0.081 |
| 3,110 | 20,112 | 62,363 | 0.137 | 0.103 |
| 10,014 | 67,052 | 265,566 | 0.309 | 0.184 |
| 31,615 | 215,807 | 1,059,714 | 1.008 | 0.454 |
| 102,233 | 705,689 | 4,162,084 | 3.810 | 2.891 |
| 312,248 | 2,168,286 | 14,697,188 | 15.819 | 19.165 |
| 909,540 | 6,336,942 | 48,748,327 | 69.353 | 89.660 |
| 3,105,275 | 21,681,667 | 188,982,798 | 409.365 | 543.110 |
| 10,757,887 | 75,202,303 | 743,643,820 | 2,780.734 | 3,386.609 |

[Plot: solution time [s] vs number of equations, Cholesky vs CGJ (log-log).]
[Plot: memory usage [bytes] vs number of equations, Cholesky vs CGJ (log-log).]
13/24
Numerical experiment, building deformation
[Figure: substructuring of the domain (left), resulting deformation (right).]
14/24
Numerical experiment, building deformation

| Number of processes | Partitioning time [s] | Inversion time (Cholesky) [s] | Schur complement time (CG) [s] | CG steps | Total time [s] |
|---|---|---|---|---|---|
| 14 | 47.6 | 18,520.8 | 4,444.5 | 6,927 | 23,025.0 |
| 28 | 45.7 | 6,269.5 | 2,444.5 | 8,119 | 8,771.6 |
| 56 | 44.1 | 2,257.1 | 2,296.3 | 9,627 | 4,608.9 |

[Plot: time [s] vs number of processes, split into partitioning, inversion (Cholesky) and Schur complement (CG) time.]
[Plot: memory [GB] vs number of processes, slave processes.]

| Number of processes | Master process [GB] | Slave processes [GB] | Total memory [GB] |
|---|---|---|---|
| 14 | 1.89 | 73.00 | 74.89 |
| 28 | 1.43 | 67.88 | 69.32 |
| 56 | 1.43 | 62.97 | 64.41 |

15/24
Larger systems of equations
[Figure: test problem; legend 1°C, 2°C, 3°C, 4°C.]
16/24
Larger systems of equations
[Plot: time [min] vs number of equations, split into partitioning, inversion (Cholesky), Schur complement (CG) and total time.]

| Equations | Partitioning time [min] | Inversion time (Cholesky) [min] | Schur complement time (CG) [min] | CG steps | Total time [min] |
|---|---|---|---|---|---|
| 25,010,001 | 6.2 | 17.3 | 4.7 | 872 | 29.4 |
| 50,027,329 | 13.3 | 43.7 | 6.3 | 1,012 | 65.4 |
| 75,012,921 | 20.6 | 80.2 | 4.3 | 1,136 | 108.3 |
| 100,020,001 | 28.5 | 115.1 | 5.4 | 1,225 | 152.9 |
| 125,014,761 | 38.3 | 173.5 | 7.5 | 1,329 | 224.2 |
| 150,038,001 | 49.3 | 224.1 | 8.9 | 1,362 | 288.5 |

17/24
Larger systems of equations
[Plot: memory [GB] vs number of equations, master and slave processes.]

| Equations | Master process [GB] | Average per slave [GB] | Slave processes [GB] | Total memory [GB] |
|---|---|---|---|---|
| 25,010,001 | 4.05 | 0.41 | 47.74 | 51.79 |
| 50,027,329 | 8.10 | 0.87 | 101.21 | 109.31 |
| 75,012,921 | 12.15 | 1.37 | 158.54 | 170.68 |
| 100,020,001 | 16.20 | 1.88 | 217.51 | 233.71 |
| 125,014,761 | 20.25 | 2.38 | 276.04 | 296.29 |
| 150,038,001 | 24.30 | 2.92 | 338.29 | 362.60 |

18/24
Dynamic problem formulation
The time integration uses the HHT-α method, with parameters $\beta = (1-\alpha)^{2}/4$ and $\gamma = (1-2\alpha)/2$.
19/24
Infante Henrique bridge over the Douro river, Portugal
20/24
Infante Henrique bridge over the Douro river, Portugal
| Nodes | 332,462 |
|---|---|
| Elements | 1,381,944 |
| Element type | Tetrahedron |
| Time steps | 372 |
| HHT alpha factor | |
| Rayleigh damping a | 0.5 |
| Rayleigh damping b | 0.5 |
| Degrees of freedom | 997,386 |
| nnz(K) | 38,302,119 |
| Partitioning time | 32.9 s |
| Factorization time | 87.3 s |
| Time per step (CGJ) | 132.9 s |
| Total time | 13.7 h |

21/24
Thank you! Questions?
22/24
References
23/24
References
24/24