Fourier-pseudospectral method for Cahn-Hilliard Equation on GPU

SLIDE 1

Fourier-pseudospectral method for Cahn-Hilliard Equation on GPU

Kangping Zhu

Courant Institute of Mathematical Sciences kangping@cims.nyu.edu

December 18, 2012

Kangping Zhu (CIMS) CH equation on GPU December 18, 2012 1 / 16

SLIDE 2

Background

Cahn-Hilliard & Allen-Cahn Equation

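Models phase separation of a binary fluid or material, e.g. during polymer formation. Both equations can be viewed as gradient descent of the Modica-Mortola energy under different Sobolev norms ($\alpha = 0$ gives Allen-Cahn, $\alpha = 1$ gives Cahn-Hilliard).

$$E_\epsilon(u) = \frac{\epsilon}{2}|\nabla u|^2 + \frac{(u^2 - 1)^2}{\epsilon} \tag{1}$$

$$\frac{\partial u_\epsilon}{\partial t} = -(-\Delta)^\alpha(-\Delta u + u^3 - u) \tag{2}$$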

SLIDE 3

Motivation

Cahn-Hilliard Equation

Coarsening rate of the flow, i.e. how fast or slow will the binary polymer formation finish?

For the Cahn-Hilliard equation, the length scale behaves like $t^{1/3}$.

For the Allen-Cahn equation, maybe $t^{1/2}$?

SLIDE 4

Numerical Project

Fractional Cahn-Hilliard Equation

Consider the fractional Cahn-Hilliard equation, i.e. fractional $\alpha$.

$\alpha = 1/2$ corresponds to binary separation on a 2D surface of a 3D material.

$\alpha = 1/2$ is a critical point at which the behaviour of the PDE changes.

This numerical project aims to find the right time scale for $\alpha < 1/2$.

SLIDE 5

Numerical Method

Pseudo-spectral Method

  • The fractional Laplacian is easy to deal with in Fourier space.
  • Take the Fourier transform of both sides of the equation, then use the inverse Fourier transform.
  • Use implicit time stepping to get accuracy and stability.
  • Use the preconditioned conjugate gradient method to solve each time step.
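To make the first two points concrete, here is a minimal NumPy sketch of applying $(-\Delta)^\alpha$ in Fourier space and evaluating the right-hand side of equation (2). It assumes a periodic square domain of side `length` and uses illustrative names (`fractional_laplacian`, `rhs`) that are not from the slides. It does not reproduce the implicit time stepping or the preconditioned conjugate gradient solver.

```python
import numpy as np

def fractional_laplacian(u, alpha, length=2 * np.pi):
    """Apply (-Laplacian)^alpha to a periodic 2D field via the FFT."""
    n0, n1 = u.shape
    # Angular wavenumbers for a periodic box of side `length` (assumed domain).
    k0 = 2 * np.pi * np.fft.fftfreq(n0, d=length / n0)
    k1 = 2 * np.pi * np.fft.fftfreq(n1, d=length / n1)
    k_squared = k0[:, None] ** 2 + k1[None, :] ** 2
    # In Fourier space (-Laplacian)^alpha is multiplication by |k|^(2*alpha).
    symbol = k_squared ** alpha
    return np.real(np.fft.ifft2(symbol * np.fft.fft2(u)))

def rhs(u, alpha, length=2 * np.pi):
    """Right-hand side of equation (2): -(-Lap)^alpha (-Lap u + u^3 - u)."""
    mu = fractional_laplacian(u, 1.0, length) + u ** 3 - u
    return -fractional_laplacian(mu, alpha, length)
```

For a single Fourier mode such as `cos(3x)`, `fractional_laplacian(u, 0.5)` returns `3 * u`, which is why fractional powers cost no more than integer ones in this setting.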

SLIDE 6

Initial time

SLIDE 7

Later

SLIDE 8

Down to earth FINALLY

Numerical task in the project

  • 'FAST!!' Fourier Transform. Why? $10^5$ conjugate gradient steps in total, 10 FFTs per step.
  • Reduction
  • Matrix entry-wise multiplication (handled by customizing the FFT kernel to hide the calculation)

SLIDE 9

Demo

Demo

Let's see a simple demo of my several versions of the FFT.

SLIDE 10

Demo

Demo

Let's see a simple demo of my several versions of the FFT.

BTW, cuFFT achieves over 300 GFLOPS on Tesla Fermi.

SLIDE 11

Approach to get a FAST FFT

Sequential 1DFFT

Higher radix usually gives better results.

  • 1. Better complexity, but not much: up to 25% better than the original Cooley-Tukey with radix-4.

  • 2. Better memory access pattern: less data transfer, more real work!

In my implementation, going from radix-2 to radix-4 to radix-8 each gives a factor of 2 speedup. FFTW usually uses radix-16 or radix-32, depending on problem size.

  • PS. UNROLL the loops
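As an illustration of what changing the radix means, the sketch below gives recursive radix-2 and radix-4 decimation-in-time FFTs in NumPy. It is not the author's implementation (which is not shown in the slides), and the function names are made up for this example. A radix-4 transform needs roughly $(3/8) N \log_2 N$ twiddle multiplications against $(1/2) N \log_2 N$ for radix-2, which is one way to read the 25% figure.

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    x = np.asarray(x, dtype=complex)
    n = x.size
    if n == 1:
        return x
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    w = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    return np.concatenate([even + w * odd, even - w * odd])

def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    x = np.asarray(x, dtype=complex)
    n = x.size
    if n == 1:
        return x
    k = np.arange(n // 4)
    # Four sub-transforms of length n/4, with twiddles applied to three of them.
    a = fft_radix4(x[0::4])
    b = np.exp(-2j * np.pi * k / n) * fft_radix4(x[1::4])
    c = np.exp(-4j * np.pi * k / n) * fft_radix4(x[2::4])
    d = np.exp(-6j * np.pi * k / n) * fft_radix4(x[3::4])
    # Radix-4 butterfly: the factors +-1j need no real multiplications.
    return np.concatenate([
        a + b + c + d,
        a - 1j * b - c + 1j * d,
        a - b + c - d,
        a + 1j * b - c - 1j * d,
    ])
```

Both functions agree with `np.fft.fft` on inputs of suitable length. In a tuned implementation the recursion would be replaced by unrolled loops, as the PS above suggests.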

SLIDE 12

Approach to get a FAST FFT

GPU 1DFFT

  • 1. Higher radix is even better: more work between memory accesses.
  • 2. Separate the first pass from the later passes: factor of 2.
  • 3. Exchange data between local work items when possible: 30% speedup.
  • 4. Put the first three passes in a single kernel (64-point FFT using local synchronization).

However, this means we have to use a hierarchical FFT.

  • PS. Better to generate the code automatically?
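One common form of hierarchical FFT is the "four-step" decomposition, sketched below in NumPy. It is an assumption that this is the scheme meant here, and `four_step_fft` is an illustrative name. A length $N = n_1 n_2$ transform is split into short FFTs of length $n_1$ (for example the 64-point block that fits in one kernel), a twiddle multiplication, and short FFTs of length $n_2$.

```python
import numpy as np

def four_step_fft(x, n1, n2):
    """FFT of length n1*n2 built from FFTs of length n1 and of length n2."""
    x = np.asarray(x, dtype=complex)
    n = n1 * n2
    assert x.size == n
    # View the input as an (n1, n2) matrix: element [j1, j2] is x[j1 * n2 + j2].
    a = x.reshape(n1, n2)
    # Step 1: length-n1 FFTs down each column.
    a = np.fft.fft(a, axis=0)
    # Step 2: twiddle factors exp(-2*pi*i * k1 * j2 / n).
    k1 = np.arange(n1)[:, None]
    j2 = np.arange(n2)[None, :]
    a = a * np.exp(-2j * np.pi * k1 * j2 / n)
    # Step 3: length-n2 FFTs along each row.
    a = np.fft.fft(a, axis=1)
    # Step 4: transpose so that output index k = k1 + n1 * k2 is contiguous.
    return a.T.reshape(n)
```

For instance, `four_step_fft(x, 64, 16)` computes a 1024-point transform using only 64-point and 16-point FFTs. The final transpose is the data-movement cost that a hierarchical scheme has to pay.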

SLIDE 13

Approach to get a FAST FFT

GPU 2DFFT

  • 1. Don’t follow the TEXTBOOK!
  • 2. Launching multiple different kernels inside one for loop is bad.
  • 3. Put everything in one kernel if possible, i.e. write a 2D FFT kernel (don't be lazy).
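For reference, the sketch below shows the row-column scheme, assuming that this is what "the textbook" refers to: a loop of 1D FFTs over rows followed by a loop over columns. It is written in NumPy purely to show the structure, and `fft2_row_column` is an illustrative name. On a GPU each loop iteration would correspond to separate kernel work, which is the pattern the points above advise against.

```python
import numpy as np

def fft2_row_column(u):
    """Textbook 2D FFT: 1D FFTs over every row, then over every column."""
    u = np.asarray(u, dtype=complex)
    out = np.empty_like(u)
    # Pass 1: one 1D FFT per row.
    for i in range(u.shape[0]):
        out[i, :] = np.fft.fft(u[i, :])
    # Pass 2: one 1D FFT per column, reading the data with a large stride.
    for j in range(u.shape[1]):
        out[:, j] = np.fft.fft(out[:, j])
    return out
```

The result matches `np.fft.fft2(u)`. A fused 2D kernel computes the same transform but avoids the repeated launches and the strided second pass.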

SLIDE 14

Approach to get a FAST FFT

Leftover & future work

  • 1. 3D matrix transposition (hierarchical FFT).
  • 2. Complex multiplication on GPU.
  • 3. For matrices of special sizes, a better hierarchical separation.

SLIDE 15

References

Brian Wetton et al. (2012). High accuracy solutions to energy gradient flows from material science models. Journal of Computational Physics, submitted.

Naga Govindaraju et al. (2008). High performance discrete Fourier transforms on graphics processors. Supercomputing.

SLIDE 16

The End
