SLIDE 1

Parallel Iterative Poisson Solver for a Distributed Memory Architecture

Eric Dow Aerospace Computational Design Lab Department of Aeronautics and Astronautics

SLIDE 2

Motivation

  • Solving Poisson’s equation is a common sub-problem in many numerical schemes, notably the solution of the incompressible Navier-Stokes equations.

  • This step is typically the most expensive part of such schemes, so an efficient Poisson solver is essential.

SLIDE 3

Problem Description

  • Poisson’s equation is solved on an arbitrary geometry with homogeneous Dirichlet boundary conditions:

$$\nabla^2 u(\mathbf{x}) = f(\mathbf{x}) \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \partial\Omega$$

SLIDE 4

Iterative Solution Techniques

  • In 2D, Poisson’s equation can be discretized with finite differences.
  • This suggests an iterative scheme known as the Jacobi method.
  • Jacobi is rather slow to converge; convergence can be accelerated by using the updated values of the solution as soon as they are available (the Gauss-Seidel method).

Discretization (5-point stencil):

$$u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} - 4u_{i,j} = \Delta x^2 f_{i,j}$$

Jacobi method:

$$u_{i,j}^{n+1} = \frac{1}{4}\left(u_{i-1,j}^{n} + u_{i+1,j}^{n} + u_{i,j-1}^{n} + u_{i,j+1}^{n} - \Delta x^2 f_{i,j}\right)$$

Gauss-Seidel method (for a sweep in increasing $i$, $j$, the $i-1$ and $j-1$ neighbors have already been updated):

$$u_{i,j}^{n+1} = \frac{1}{4}\left(u_{i-1,j}^{n+1} + u_{i+1,j}^{n} + u_{i,j-1}^{n+1} + u_{i,j+1}^{n} - \Delta x^2 f_{i,j}\right)$$

SLIDE 5

Iterative Solution Techniques

  • For very large problems (especially in 3D), a direct solve is impractical.

  • An iterative solver can simply be stopped once the desired level of accuracy is achieved, which is not possible with direct solution techniques such as LU factorization. This offers the potential to save a great deal of computational effort.

SLIDE 6

Parallelization


  • The Jacobi method seems like a poor choice relative to the Gauss-Seidel method:
▫ Slower to converge
▫ Requires twice as much storage

  • However, parallelization of the Jacobi method is straightforward:
▫ Inherent data parallelism: the same operations are performed on each grid point, so it makes sense to distribute the data among processes.
▫ All values can be updated simultaneously.

  • We need to be more clever with the Gauss-Seidel method…
SLIDE 7

Red Black Node Ordering

  • If the sum of the row and column indices of a node is even, the node is colored red; otherwise it is colored black.
▫ Update all of the red nodes in parallel using the values at the black nodes.
▫ Then update all of the black nodes in parallel using the values at the red nodes.

  • This restores data parallelism.

SLIDE 8

Distributing the Data

  • Spectral graph partitioning: recursively divide the domain into (roughly) equal pieces.

SLIDE 9

Distributing the Data

  • This scheme does not produce an optimal partition, i.e. one that creates partitions of equal size while minimizing the number of edge cuts.

  • Result: large variation in the size of the boundary between subdomains.
▫ This creates a communication bottleneck: some processes end up waiting on others to finish communicating.

SLIDE 10

Implementation

  • Serial and parallel solvers implemented in C, with MPI used for parallelization.
▫ Each process is given a collection of nodes to update.
▫ At the end of each iteration, each process sends and receives the boundary values needed for the next iteration.
▫ A call to MPI_Barrier is required at the end of each communication block to prevent faster processes from racing ahead.

  • Solvers were run on a Beowulf cluster with 1, 2, and 4 nodes.

SLIDE 11

Results

  • The serial and parallel codes agree on the steady-state solution.

SLIDE 12

Results: Jacobi Method

[Speedup plots: 75 x 75 nodes and 150 x 150 nodes]

SLIDE 13

Results: Gauss-Seidel Method

[Speedup plots: 75 x 75 nodes and 150 x 150 nodes]

SLIDE 14

Conclusions

  • Speedup is highly dependent on problem size.
▫ Doubling the number of grid points in each dimension from 75 to 150 quadruples the workload of each process but only doubles the amount of communication required, which explains the greater speedup observed on the larger grid.
▫ This is actually good news: iterative solvers are typically used only for very large problems, and since the speedup increases with problem size, it makes sense to parallelize these solvers.

  • Jacobi outperforms Gauss-Seidel in parallel performance due to its more limited communication.

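The scaling argument above can be made concrete with an idealized strip decomposition (the actual partitions come from spectral partitioning, but the orders of growth are the same). For an $n \times n$ grid split among $p$ processes:

```latex
\text{work per process} \sim \frac{n^2}{p}, \qquad
\text{communication per process} \sim n
```

Doubling $n$ from 75 to 150 therefore multiplies the work by 4 but the communication by only 2, doubling the computation-to-communication ratio.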

SLIDE 15

Future Work

  • Multigrid: a very efficient iterative solution technique built around basic iterative solvers such as Jacobi and Gauss-Seidel.

  • The parallel component is already in place; the parallel solvers simply need to be integrated to create a parallel multigrid method.

  • Integrate the graph partitioning scheme into the solver (it is currently a collection of separate MATLAB functions).

  • Speedup ceiling: is there a maximum attainable speedup as the problem size increases (other than the obvious ideal one-to-one speedup)?

SLIDE 16

References

[1] G. Strang, Computational Science and Engineering. Wellesley, MA: Wellesley-Cambridge Press, 2007.