Multigrid for Poissons Equation Prof. Richard Vuduc Georgia - - PowerPoint PPT Presentation

multigrid for poisson s equation
SMART_READER_LITE
LIVE PREVIEW

Multigrid for Poissons Equation Prof. Richard Vuduc Georgia - - PowerPoint PPT Presentation

Multigrid for Poissons Equation Prof. Richard Vuduc Georgia Institute of Technology CSE/CS 8803 PNA, Spring 2008 [L.12] Thursday, February 14, 2008 1 Sources for todays material CS 267 (Yelick & Demmel, UCB) Applied Numerical Linear


slide-1
SLIDE 1

Multigrid for Poisson’s Equation

  • Prof. Richard Vuduc

Georgia Institute of Technology CSE/CS 8803 PNA, Spring 2008 [L.12] Thursday, February 14, 2008

1

slide-2
SLIDE 2

Sources for today’s material

CS 267 (Yelick & Demmel, UCB) Applied Numerical Linear Algebra, by Demmel Heath (UIUC)

2

slide-3
SLIDE 3

Review: Parallel FFT

3

slide-4
SLIDE 4

Exploiting structure to obtain fast algorithms for 2-D Poisson

Dense LU: Assume no structure ⇒ O(n6) Sparse LU: Sparsity ⇒ O(n3), need extra memory CG: Symmetric positive definite ⇒ O(n3), a little extra memory RB SOR: Fixed sparsity pattern ⇒ O(n3), no extra memory FFT: Eigendecomposition ⇒ O(n2 log n)

4

slide-5
SLIDE 5

Fast Fourier transform algorithm

FFT(x, ω, m) if m == 1 then return x0 else xeven ← FFT(xeven, ω2, m 2 ) xodd ← FFT(xodd, ω2, m 2 ) w ←

  • w0, w1, . . . , ω

m 2 −1

⇐ = precomputed return

  • xeven + (w. ∗ xodd), xeven − (w. ∗ xodd)
  • 5
slide-6
SLIDE 6

0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

FFT with transpose (p=4) log (m/p): No comm log p: No comm

6

slide-7
SLIDE 7

Algorithm Serial PRAM Memory # procs Dense LU Band LU Jacobi Explicit inverse Sparse LU

  • Conj. grad.

RB SOR FFT Multigrid Lower bound N3 N N2 N2 N2 (N7/3) N N3/2 (N5/3) N (N4/3) N2 (N5/3) N (N2/3) N N N2 log N N2 N2 N3/2 (N2) N1/2 N log N (N4/3) N N3/2 (N4/3) N1/2(1/3) log N N N N3/2 (N4/3) N1/2 (N1/3) N N N log N log N N N N log2 N N N N log N N Algorithms for 2-D (3-D) Poisson, N=n2 (=n3) PRAM = idealized parallel model with zero communication cost. Source: Demmel (1997)

7

slide-8
SLIDE 8

Recall: Discretizing 2-D Poisson

Graph and stencil

3 6 9 2 5 8 1 4 7 2 4 5 6 8

              4 −1 −1 −1 4 −1 −1 −1 4 −1 −1 4 −1 −1 −1 −1 4 −1 −1 −1 −1 4 −1 −1 4 −1 −1 −1 4 −1 −1 −1 4              

4uij − ui+1,j − ui−1,j − ui,j+1 − ui,j−1 = h2fi,j

8

slide-9
SLIDE 9

Recall: Jacobi’s method

Rearrange terms in (2-D) Poisson:

ui,j = 1 4

  • ui−1,j + ui+1,j + ui,j−1 + ui,j+1 + h2fi,j
  • For each (i, j), iteratively update (weighted averaging):

Motivation: Match solution to discrete Poisson exactly at nodes.

u(t+1)

i,j

= 1 4

  • u(t)

i−1,j + u(t) i+1,j + u(t) i,j−1 + u(t) i,j+1 + h2fi,j

  • 9
slide-10
SLIDE 10

Problem: Slow convergence

RHS True solution Best possible 5-step 5 steps of Jacobi

10

slide-11
SLIDE 11

Recall: Convergence of Jacobi’s method

Converges in O(N=n2) steps, so serial complexity is O(N2). Define error at each step as: For Jacobi, can show:

ǫt

  • i,j

(ut

i,j − ui,j)2

ǫt ≤

  • cos

π n + 1 t ǫ0

n→∞

  • 1 − π2

4 · t n2

  • ǫ0

11

slide-12
SLIDE 12

Consider Jacobi in 1-D

h2fi = −ui−1 + 2ui − ui+1 u(t+1)

i

= 1 2

  • u(t)

i−1 + u(t) i+1

  • + h2

2 fi ⇓ u(t+1) =

  • I − 1

2Tn

  • ≡R

·u(t) + h2 2 f ≡ R · u(t) + c

12

slide-13
SLIDE 13

Closer look at 1-D Jacobi’s convergence

Consider total error at each step

ǫ(t) ≡ u(t) − u = R · ǫ(t−1) = Rt · e0

13

slide-14
SLIDE 14

Closer look at 1-D Jacobi’s convergence

Consider slightly broader class of weighted Jacobi methods

Tnu = h2f ≡ c Rw ≡ I − w 2 Tn u(t+1) = Rwu(t) + w 2 c ⇓ ǫ(t) = Rt

wǫ(0)

14

slide-15
SLIDE 15

1-D Poisson: Eigendecomposition of Tn

Lemma: Eigenvalues and eigenvectors of Tn are

= ⇒ Tn = ZΛZT ZZT = I

Tnzj = λjzj λj = 2

  • 1 − cos

πj n + 1

  • zj(k)

=

  • 2

n + 1 sin πjk n + 1

15

slide-16
SLIDE 16

Error “frequencies”

ǫ(t) = Rt

w · ǫ(0)

= (I − w 2 ZΛZT )t · ǫ(0) = Z

  • I − w

2 Λ t ZT · ǫ(0) ⇓ ZT · ǫ(t) =

  • I − w

2 Λ t ZT · ǫ(0)

  • ZT · ǫ(t)

j

=

  • I − w

2 Λ t

jj

  • ZT · ǫ(0)

j

16

slide-17
SLIDE 17
  • 1.00
  • 0.75
  • 0.50
  • 0.25

0.25 0.50 0.75 1.00

Spectrum of R_w

w=1/2 w=2/3 w=1 For 1-D discrete Poisson, with n = 99 j=n/2

17

slide-18
SLIDE 18

Initial error “Rough” Lots of high frequency components Norm = 1.65 Error after 1 weighted Jacobi step “Smoother” Less high frequency component Norm = 1.06 Error after 2 weighted Jacobi steps “Smooth” Little high frequency component Norm = .92, won’t decrease much more

18

slide-19
SLIDE 19

Faster information propagation?

“Multigrid” idea

Approximate problem on fine grid by a coarser grid Solve the coarse grid problem approximately Interpolate coarse grid solution onto fine grid, as an initial guess Recursively solve coarse grid problems

To work, coarse grid solution must be a good approximation to fine grid

19

slide-20
SLIDE 20

Same idea applies elsewhere

Multilevel graph partitioning (METIS)

Coarsen graph via maximal independent set problem Partition coarse graph, refine using Kernighan-Lin

Barnes-Hut and fast multipole method for, say, gravity problems

Approximate particles in a region by total mass, center of gravity Divide regions recursively

20

slide-21
SLIDE 21

Divide-and-conquer in multigrid

Spatial domain

Get an initial solution for an n×n grid by solving approximately on n/2×n/2 grid Recurse

Frequency domain

Think of error as sum of eigenvectors, e.g., sine-curves of different frequencies Solving on a particular grid “smooths” (dampens) high-frequency error

21

slide-22
SLIDE 22

“Multigrids” in 1-D

P (3) P (2) P (1)

P (i) = Problem on 2i + 1 grid T (i)x(i) = b(i)

22

slide-23
SLIDE 23

“Multigrids” in 2-D

P (3)

P (i) = Problem on

  • 2i + 1
  • ×
  • 2i + 1
  • grid

T (i)x(i) = b(i)

P (1) P (2)

23

slide-24
SLIDE 24

Multigrid operators

Restrict(3) Restrict(2) Interpolate(2) Interpolate(1)

P (3) P (2) P (1) P (3) P (2) P (1)

24

slide-25
SLIDE 25

Multigrid operators

P (i) : x(i) ≡ b(i) ≡ R(i) : b(i−1) ← R(i) b(i) L(i) : x(i) ← L(i−1) x(i−1) S(i) : x(i) improved ← S(i) b(i), x(i)

Current estimated solution Right-hand side ⇐ Restriction ⇐ Interpolation ⇑ Solution operator (smoother)

25

slide-26
SLIDE 26

Multigrid “V-cycle”: Building block for the full multigrid algorithm

⇐ Returns improved x(i) for T(i)x(i) = b(i)

MGV

  • b(i), x(i)

if i == 1 then return exact solution of P (1) else x(i) ← S(i) b(i), x(i) r(i) ← T (i) · x(i) − b(i) d(i) ← L(i−1) MGV

  • R(i)

r(i) , 0

  • x(i) ← x(i) − d(i)

x(i) ← S(i) b(i), x(i) return x(i)

i time

5 4 3 2 1

Calls & returns

26

slide-27
SLIDE 27

Multigrid V-cycle

⇐ Returns improved x(i) for T(i)x(i) = b(i) Base case: 1 unknown

MGV

  • b(i), x(i)

if i == 1 then return exact solution of P (1) else x(i) ← S(i) b(i), x(i) r(i) ← T (i) · x(i) − b(i) d(i) ← L(i−1) MGV

  • R(i)

r(i) , 0

  • x(i) ← x(i) − d(i)

x(i) ← S(i) b(i), x(i) return x(i)

27

slide-28
SLIDE 28

Multigrid V-cycle

⇐ Returns improved x(i) for T(i)x(i) = b(i) Base case: 1 unknown Smooth (damps high-freq error)

MGV

  • b(i), x(i)

if i == 1 then return exact solution of P (1) else x(i) ← S(i) b(i), x(i) r(i) ← T (i) · x(i) − b(i) d(i) ← L(i−1) MGV

  • R(i)

r(i) , 0

  • x(i) ← x(i) − d(i)

x(i) ← S(i) b(i), x(i) return x(i)

28

slide-29
SLIDE 29

Multigrid V-cycle

⇐ Returns improved x(i) for T(i)x(i) = b(i) Base case: 1 unknown Smooths (damps high-freq error) Computes residual Recursively solves Corrects fine-grid solution

MGV

  • b(i), x(i)

if i == 1 then return exact solution of P (1) else x(i) ← S(i) b(i), x(i) r(i) ← T (i) · x(i) − b(i) d(i) ← L(i−1) MGV

  • R(i)

r(i) , 0

  • x(i) ← x(i) − d(i)

x(i) ← S(i) b(i), x(i) return x(i)

29

slide-30
SLIDE 30

Multigrid V-cycle

⇐ Returns improved x(i) for T(i)x(i) = b(i) Base case: 1 unknown Smooths (damps high-freq error) Computes residual Smooth again Recursively solves Corrects fine-grid solution

MGV

  • b(i), x(i)

if i == 1 then return exact solution of P (1) else x(i) ← S(i) b(i), x(i) r(i) ← T (i) · x(i) − b(i) d(i) ← L(i−1) MGV

  • R(i)

r(i) , 0

  • x(i) ← x(i) − d(i)

x(i) ← S(i) b(i), x(i) return x(i)

30

slide-31
SLIDE 31

Multigrid V-cycle

⇐ Returns improved x(i) for T(i)x(i) = b(i)

MGV

  • b(i), x(i)

if i == 1 then return exact solution of P (1) else x(i) ← S(i) b(i), x(i) r(i) ← T (i) · x(i) − b(i) d(i) ← L(i−1) MGV

  • R(i)

r(i) , 0

  • x(i) ← x(i) − d(i)

x(i) ← S(i) b(i), x(i) return x(i)

i time

5 4 3 2 1

Calls & returns

31

slide-32
SLIDE 32

Multigrid V-cycle: Correction step

T (i)d(i) = r(i) = T (i)x(i) − b(i) = ⇒ b(i) = T (i) x(i) − d(i)

Justification:

. . . r(i) ← T (i) · x(i) − b(i) d(i) ← L(i−1) MGV

  • R(i)

r(i) , 0

  • x(i) ← x(i) − d(i)

. . .

32

slide-33
SLIDE 33

Multigrid V-cycle: Complexity

i time

5 4 3 2 1

Calls & returns C(i) =

  • O
  • 4i

+ C(i − 1) i > 1 O(1) i = 1

  • C(m)

=

m

  • i=1

O(4i) = O(4m) = O(no. of unknowns) Serial complexity (2-D Poisson):

33

slide-34
SLIDE 34

Full multigrid algorithm

FMG

  • b(k), x(k)

x(1) ← Exact solution to P (1) for i = 2 to k do x(i) ← MGV

  • b(i), L(i−1)

x(i−1)

k=2 3 4 5

34

slide-35
SLIDE 35

Full multigrid algorithm: Complexity

k=2 3 4 5

C(m) =

m

  • k=1

O(4k) = O(4m) = O(no. of unknowns) PRAM =

m

  • k=1

O(k) = O(m2) = O(log2(unknowns))

35

slide-36
SLIDE 36

Full multigrid: Convergence

Theorem: After one FMG call,

error after <= .5 * (error before) Independent of no. of unknowns (!)

Corollary: Can make error < any fixed tolerance in a fixed no. of steps, independent of grid size (!)

36

slide-37
SLIDE 37

Administrivia

37

slide-38
SLIDE 38

Two joint classes with CS 8803 SC

Tues 2/19: Floating-point issues in parallel computing by me Tues 2/26: GPGPUs by Prof. Hyesoon Kim Both classes meet in Klaus 1116E

38

slide-39
SLIDE 39

Administrative stuff

New room (dumpier, but cozier?): College of Computing Building (CCB) 101 Accounts: Apparently, you already have them Front-end login node: ccil.cc.gatech.edu (CoC Unix account)

We “own” warp43—warp56 Some docs (MPI): http://www-static.cc.gatech.edu/projects/ihpcl/mpi.html Sign-up for mailing list: https://mailman.cc.gatech.edu/mailman/listinfo/ihpc-lab

39

slide-40
SLIDE 40

Homework 1: Parallel conjugate gradients

Implement a parallel solver for Ax = b (serial C version provided)

Evaluate on three matrices: 27-pt stencil, and two application matrices “Simplified:” No preconditioning Bonus: Reorder, precondition

Performance models to understand scalability of your implementation

Make measurements Build predictive models

Collaboration encouraged: Compare programming models or platforms

40

slide-41
SLIDE 41

Some details on multigrid operators

41

slide-42
SLIDE 42

Solution operator (smoother), S(i): Weighted Jacobi

xj ← 1 2 (xj−1 + xj+1 + bj) xj ← 1 3 (xj−1 + xj + xj+1 + bj)

Recall: Jacobi Weighted Jacobi

42

slide-43
SLIDE 43

Initial error “Rough” Lots of high frequency components Norm = 1.65 Error after 1 weighted Jacobi step “Smoother” Less high frequency component Norm = 1.06 Error after 2 weighted Jacobi steps “Smooth” Little high frequency component Norm = .92, won’t decrease much more

43

slide-44
SLIDE 44

Restriction operator, R(i)

x(c)

j

← 1 4x(f)

j−1 + 1

2x(f)

j

+ 1 4x(f)

j+1

In 2-D: Average with all 8 neighbors

P (c) P (f) R(f)

44

slide-45
SLIDE 45

45

slide-46
SLIDE 46

Interpolation operator, L(i)

In 2-D: Average with all 8 neighbors

P (c) P (f) L(c)

x(f)

j

=    x(c)

j

if ∃ x(c)

j 1 2

  • x(c)

left(j) + x(c) right(j)

  • therwise

  

46

slide-47
SLIDE 47

47

slide-48
SLIDE 48

Convergence (1-D)

48

slide-49
SLIDE 49

Convergence (2-D)

49

slide-50
SLIDE 50

Parallelizing multigrid

50

slide-51
SLIDE 51

51

slide-52
SLIDE 52

Performance model (2-D)

Domain: 2m+1 × 2m+1 grid Processors: p = 4k, in a 2k × 2k grid Each processor owns 2m-k × 2m-k points Cost of one level j of V-cycle If j ≥ k: O(4j-k) flops + O(1)⋅α + O(2j-k)⋅β-1

If j < k: O(1) flops + O(1)⋅α + O(1)⋅β-1

Sum over j to get total V-cycle cost

52

slide-53
SLIDE 53

Parallel Poisson solvers compared

Flops Messages Words MG FFT SOR

  • N

p + log p log N N p + log p log N log2 N √p N p N log N p N

3 2

p √ N N p

53

slide-54
SLIDE 54

Multigrid on unstructured meshes

54

slide-55
SLIDE 55

Multigrid on an unstructured mesh

How to coarsen?

Can’t just pick every other point Maximal independent sets

How to restrict? Interpolate? How to “smooth?” How to handle coarsest meshes?

Switch to fewer processors? Switch to another solver?

55

slide-56
SLIDE 56

“In conclusion…”

56