Experiences using 2decomp&fft to solve Partial Differential - PowerPoint PPT Presentation

Experiences using 2decomp&fft to solve Partial Differential Equations using Fourier Spectral Methods Sudarshan Balakrishnan, Abdullah H. Bargash, Gong Chen, Brandon Cloutier, Ning Li, Dave Malicke, Benson Muite, Michael Quell, Paul Rigge, Damian San Rom´ an Alerigi Mamdouh Solimani, Andre Souza, Abdulaziz S.Thiban, Mark Van Moer, Jeremy West Tartu ¨ Ulikool benson.muite@ut.ee http://math.ut.ee/˜benson http: //en.wikibooks.org/wiki/Parallel_Spectral_Numerical_Methods

Outline Project Aim Fourier Series The Heat Equation The Allen Cahn Equation The Gray-Scott Equations Nonlinear Schr¨ odinger Equation Navier-Stokes Equation Maxwell’s Equations Possible Further Work

Project Aim Teaching tool for use in mathematics and computer science courses from 1st year undergraduate to postgraduate level Research tool: investigate partial differential equations, investigate computer performance Help paper reproducibility and verifiability Reduce coding time Rely on 2decomp&fft for MPI parallelization ( http://2decomp.org ) Material has been tested in a course on Multivariable calculus and on an introduction to partial differential equations

Fourier series: Separation of Variables 1 dy = y (1) dt dy = dt (2) y � dy � = dt (3) y ln y + a = t + b (4) e ln y + a e t + b = (5) e ln y e a e t e b = (6) e b e a e t y = (7) ce t . y ( t ) = (8)

Fourier series: Separation of Variables 2 u t = u xx (9) Suppose u = X ( x ) T ( t ) d 2 X d T d t ( t ) d x 2 ( x ) T ( t ) = X ( x ) = − C , (10) Solving each of these separately and then using linearity we get a general solution � � � α n exp ( − C n t ) sin ( C n x ) + β n exp ( − C n t ) cos ( C n x ) n (11)

Fourier series: Separation of Variables 3 How do we find a particular solution? Suppose u ( x , t = 0 ) = f ( x ) Suppose u ( 0 , t ) = u ( 2 π, t ) and u x ( 0 , t ) = u x ( 2 π, t ) then recall � 2 π � π m = n sin ( nx ) sin ( mx ) = m � = n , (12) 0 0 � 2 π � π m = n cos ( nx ) cos ( mx ) = m � = n , (13) 0 0 � 2 π cos ( nx ) sin ( mx ) = 0 . (14) 0

Fourier series: Separation of Variables 4 So if � f ( x ) = α n sin ( nx ) + β n cos ( nx ) . (15) n then � 2 π f ( x ) sin ( nx ) d x 0 α n = (16) � 2 π sin 2 ( nx ) d x 0 � 2 π f ( x ) cos ( nx ) d x 0 β n = . (17) � 2 π cos 2 ( nx ) d x 0 and � exp ( − n 2 t ) [ α n sin ( nx ) + β n cos ( nx )] u ( x , t ) = (18) n The Fast Fourier Transform allows one to find good approximations to α n and β n when the solution is found at a finite number of evenly spaced grid points

The 1D Heat Equation: Finding derivatives and timestepping Let � ˆ u ( x ) = u k exp ( ikx ) (19) k then d ν u � ( ik ) ν ˆ d x ν = u k . (20) Consider u t = u xx , which is approximated by ∂ ˆ u k α ( ik ) 2 ˆ = u k (21) ∂ t u n + 1 ˆ − ˆ u n k k α ( ik ) 2 ˆ u n + 1 = (22) k h u n + 1 ( 1 − α h ( ik ) 2 ) u n ˆ ˆ = (23) k k u n ˆ u n + 1 k ˆ = ( 1 − α h ( ik ) 2 ) . (24) k

The 2D Allen Cahn Equation Consider u t = ǫ ( u xx + u yy ) + u − u 3 , which is approximated by ∂ ˆ u � ( ik x ) 2 + ( ik y ) 2 � u − ˆ u + ˆ ˆ u 3 = ǫ (25) ∂ t u n + 1 − ˆ u n ˆ � ( ik x ) 2 + ( ik y ) 2 � ˆ ˆ u n + 1 + ˆ u n − ( u n ) 3 (26) = ǫ h

The 3D Gray-Scott Equations ∂ u = D u ∆ u + α ( 1 − u ) − uv 2 , (27) ∂ t ∂ v = D v ∆ v − β v + uv 2 . (28) ∂ t Solved using a splitting method (More information on splitting for this at http://arxiv.org/abs/1310.3901 ) http://web.student.tuwien.ac.at/˜e1226394/

The 2D nonlinear Schr¨ odinger Equation i ψ t + ψ xx + ψ yy = | ψ | 2 ψ Solved using Fast Fourier Transform and splitting

The 2D nonlinear Schr¨ odinger Equation Table: Computation times in seconds for 20 time steps of 10 − 5 for a Fourier split step method for the cubic nonlinear Schr¨ odinger equation on [ − 5 π, 5 π ] 2 . Grid GPU GPU GPU Xeon Phi CPU Size (Cuf) (Cuda) (OpenACC) (61 cores) (1 core) 256 2 0.00802 0.0116 0.0130 0.0122 0.442 512 2 0.0234 0.0315 0.0369 0.0291 1.94 1024 2 0.0851 0.105 0.132 0.118 12.7 2048 2 0.334 0.415 0.527 0.422 57.2 4096 2 1.49 2.02 2.30 1.626 329

The Real Cubic Klein-Gordon Equation u tt − ∆ u + u = | u | 2 u (29) � 1 2 | u t | 2 + 1 2 | u | 2 + 1 2 |∇ u | 2 − 1 4 | u | 4 d x E ( u , u t ) = (30) u n + 1 − 2 u n + u n − 1 − ∆ u n + 1 + 2 u n + u n − 1 + u n + 1 + 2 u n + u n − 1 ( δ t ) 2 4 4 (31) � 2 u n � u n � � = ± (32) Parallelization done using 2decomp library for FFT and processing independent loops Other time stepping algorithms possible, including splitting

Simulations and Videos by Brian Leu, Albert Liu, and Parth Sheth 10 2 Measured Ideal Computation Time (s) 10 1 10 0 10 4 10 5 10 6 Number of Cores Figure: Strong scaling on Mira for a 4096 3 discretization. http://www-personal.umich.edu/˜alberliu/ http://www-personal.umich.edu/˜brianleu/ http://www-personal.umich.edu/˜pssheth/

The 2D and 3D Navier Stokes Equation Consider incompressible case only � ∂ u � ∂ t + u · ∇ u = −∇ p + µ ∆ u (33) ρ ∇ · u = 0 . (34) p pressure, µ viscosity, ρ , density 2D u ( x , y ) = ( u ( x , y ) , v ( x , y )) 3D u ( x , y , z ) = ( u ( x , y , z ) , v ( x , y , z ) , w ( x , y , z ))

2D Vorticity-Streamfunction Formulation ω = ∇ × u = ∂ v ∂ x − ∂ u ∂ y = − ∆ ψ � ∂ω � ∂ t + u ∂ω ∂ x + v ∂ω ρ = µ ∆ ω (35) ∂ y and ∆ ψ = − ω. (36)

Time Discretization � ω n + 1 , k + 1 − ω n (37) ρ δ t u n + 1 , k ∂ω n + 1 , k + v n + 1 , k ∂ω n + 1 , k + u n ∂ω n ∂ x + v n ∂ω n + 1 � �� 2 ∂ x ∂ y ∂ y = µ � ω n + 1 , k + 1 + ω n � 2 ∆ , and ∆ ψ n + 1 , k + 1 = − ω n + 1 , k + 1 , (38) u n + 1 , k + 1 = ∂ψ n + 1 , k + 1 v n + 1 , k + 1 = − ∂ψ n + 1 , k + 1 , . (39) ∂ y ∂ x Fixed point iteration used to obtain nonlinear terms

Example Videos http://www-personal.umich.edu/˜cloutbra/ research.html Simulations on a single NVIDIA Fermi GPU about 20 times faster than a 16 core CPU

3D Equivalent Formulation Simplification of equation with periodic boundary conditions � ∂ u � ρ ∂ t + u · ∇ u = −∇ p + µ ∆ u (40) ∇ · u = 0 (41) so � ∂ u � �� ∇ · ∂ t + u · ∇ u = ∇ · [ −∇ p + µ ∆ u ] (42) ρ ρ ∇ · ( u · ∇ u ) = − ∆ p (43) p = ∆ − 1 [ ∇ · ( u · ∇ u )] (44) so � ∂ u ∆ − 1 [ ∇ · ( u · ∇ u )] � � � ρ ∂ t + u · ∇ u = − ρ ∇ + µ ∆ u (45)

3D Equivalent Formulation - Implicit Midpoint Time Discretization � u n + 1 , j + 1 − u n + u n + 1 , j + u n � u n + 1 , j + u n �� ρ · ∇ δ t 2 2 ( u n + 1 , j + u n ) · ∇ ( u n + 1 , j + u n ) ∆ − 1 � � � �� = ρ ∇ ∇ · 4 + µ ∆ u n + 1 , j + 1 + u n , 2 Video of Taylor Green Vortex http://vimeo.com/87981782

3D Equivalent Formulation - Carpenter-Kennedy Discretization 1: procedure R UNGE –K UTTA ( u ) h = 0 2: u = u n 3: for k = 1 → 5 do 4: h ← g ( u ) + β k h 5: µ = 0 . 5 δ t ( α k + 1 − α k ) 6: v − µ l ( v ) = u + γ k δ t h + µ l ( u ) 7: u ← v 8: end for 9: u n + 1 = u 10: 11: end procedure

Performance δ t = 0 . 005 for 512 3 and δ t = 0 . 01 for 256 3 grid points. For IMR scheme, fixed point iteration procedure was stopped once the difference between two successive iterates was less than 10 − 10 in l ∞ norm of velocity fields. Core Hours Method Grid Size Cores Time Steps Time (s) Timestep 256 3 IMR 512 1000 4060 0.578 512 3 IMR 1024 500 9899 5.68 512 3 CK 4096 2000 7040 4.0 Table: Performance of Fourier pseudospectral code on Shaheen. IMR is an abbreviation for implicit midpoint rule and CK is an abbreviation for Carpenter–Kennedy.

Kinetic Energy Evolution Kinetic Energy 0.13 0.12 0.11 Energy 0.1 IMR 256 0.09 IMR 512 CK 512 0.08 Reference 0 2 4 6 8 10 Time Figure: KE of solutions are so close they are almost indistinguishable

Kinetic Energy Dissipation Rate Kinetic Energy Dissipation Rate 0.014 IMR 256 0.012 IMR 512 Energy Dissipation Rate CK 512 0.01 Reference 0.008 0.006 0.004 0.002 0 0 2 4 6 8 10 Time Figure: Plot during the initial stage, where flow is essentially inviscid and laminar. Fully developed turbulent flow is observed around t max ≈ 8.

Kinetic Energy Dissipation Rate Kinetic Energy Dissipation Rate − 4 15x 10 IMR 256 − Reference Energy Dissipation Rate IMR 512 − Reference 10 CK 512 − Reference 5 0 − 5 0 2 4 6 8 10 Time Figure: Difference in kinetic energy dissipation rates between the current discretizations and the reference solution.

Vorticity Figure: Square of the vorticity in the plane centered at ( π, 0 , 0 ) with normal vector ( 1 , 0 , 0 ) .

Discrete energy equality for midpoint rule � T � u ( t = T ) � 2 l 2 − � u ( t = 0 ) � 2 �∇ u � 2 l 2 = − µ l 2 d t 0 N − 1 l 2 = − µ 2 � U n + U n + 1 �� u N � 2 l 2 − � u 0 � 2 � ∇ l 2 δ t . � � 4 � n = 0

Experiences using 2decomp&fft to solve Partial Differential - PowerPoint PPT Presentation

Experiences using 2decomp&fft to solve Partial Differential Equations using Fourier Spectral Methods Sudarshan Balakrishnan, Abdullah H. Bargash, Gong Chen, Brandon Cloutier, Ning Li, Dave Malicke, Benson Muite, Michael Quell, Paul Rigge,

2DECOMP&FFT A Highly Scalable 2D Decomposition Library and FFT Interface Ning Li and

The Fast Fourier Transform - FFT Sound Design and Interactive Music - FFT Learning Objectives

FFT Application Examples and Implementation FFT Example 1: Signal Sparsity in time Frequency

FFT libraries on Cray XT: CRay Adaptive FFT (CRAFFT) Jonathan Bentz Cray Inc. Outline

Overview Partial Constituent Fronting in German The phenomenon: Partial constituent fronting

FFT analysis of DNA sequences Harvey Lab Group Meeting March 1, 2004 Russell Hanson 2 Nave

Fast Convolutions Via the Overlap- and-Save Method Using Shared Memory FFT Karel Admek , Sofia

Partial Functions and Categories of Partial Maps Science Atlantic at Acadia University Darien

Partial Orders on the integers. In this case ( a , b ) R if a b . a a so R is reflexive. a b

JUST THE MATHS SLIDES NUMBER 14.1 PARTIAL DIFFERENTIATION 1 (Partial derivatives of the

The Semantics of Partial Model Introduction Transformations Partial Models Transforming

FFT Group Update Singapore Roadshow www.fftsecurity.com Forward Looking Statements This

LOW-COMMUNICATION FFT WITH FAST MULTIPOLE METHOD Cris Cecka, Senior Research Scientist. May 11,

Analyzing fluid flows via the ergodicity defect ergodicity defect Sherry E. Scott FFT 2013

The The Fast Fourier Transform Fast Fourier Transform Basic FFT Stuff That s Good to

Advanced Digital Signal Processing Part 4: DFT and FFT Gerhard Schmidt

CS453 INTRODUCTION TO DATAFLOW ANALYSIS CS453 Lecture Register allocation using liveness

The C Programming Language J er ome Hugues ISAE/DMIA jerome.hugues@isae.fr J er

Binary search trees Dictionary ADT Binary search tree properties Search Insertion February 04,

Unit1Day4-LaBrake Monday, September 09, 2013 5:09 PM Vanden Bout/LaBrake/Crawford CH301

Outline Association Rules: Concept and Algorithms Basics of Association Rules Algorithms:

Expressions Eric McCreath Expressions on integers There is the standard set of interger

Human-Robot Interaction: Language Acquisition with Neural Network Alvin Rindra Fazrie 09.11.2015

Outline Integer representa+on and opera+ons Bit opera+ons

Experiences using 2decomp&fft to solve Partial Differential - PowerPoint PPT Presentation

Experiences using 2decomp&fft to solve Partial Differential Equations using Fourier Spectral Methods Sudarshan Balakrishnan, Abdullah H. Bargash, Gong Chen, Brandon Cloutier, Ning Li, Dave Malicke, Benson Muite, Michael Quell, Paul Rigge,

2DECOMP&amp;FFT A Highly Scalable 2D Decomposition Library and FFT Interface Ning Li and

The Fast Fourier Transform - FFT Sound Design and Interactive Music - FFT Learning Objectives

FFT Application Examples and Implementation FFT Example 1: Signal Sparsity in time Frequency

FFT libraries on Cray XT: CRay Adaptive FFT (CRAFFT) Jonathan Bentz Cray Inc. Outline

Overview Partial Constituent Fronting in German The phenomenon: Partial constituent fronting

FFT analysis of DNA sequences Harvey Lab Group Meeting March 1, 2004 Russell Hanson 2 Nave

Fast Convolutions Via the Overlap- and-Save Method Using Shared Memory FFT Karel Admek , Sofia

Partial Functions and Categories of Partial Maps Science Atlantic at Acadia University Darien

Partial Orders on the integers. In this case ( a , b ) R if a b . a a so R is reflexive. a b

JUST THE MATHS SLIDES NUMBER 14.1 PARTIAL DIFFERENTIATION 1 (Partial derivatives of the

The Semantics of Partial Model Introduction Transformations Partial Models Transforming

FFT Group Update Singapore Roadshow www.fftsecurity.com Forward Looking Statements This

LOW-COMMUNICATION FFT WITH FAST MULTIPOLE METHOD Cris Cecka, Senior Research Scientist. May 11,

Analyzing fluid flows via the ergodicity defect ergodicity defect Sherry E. Scott FFT 2013

The The Fast Fourier Transform Fast Fourier Transform Basic FFT Stuff That s Good to

Advanced Digital Signal Processing Part 4: DFT and FFT Gerhard Schmidt

CS453 INTRODUCTION TO DATAFLOW ANALYSIS CS453 Lecture Register allocation using liveness

The C Programming Language J er ome Hugues ISAE/DMIA jerome.hugues@isae.fr J er

Binary search trees Dictionary ADT Binary search tree properties Search Insertion February 04,

Unit1Day4-LaBrake Monday, September 09, 2013 5:09 PM Vanden Bout/LaBrake/Crawford CH301

Outline Association Rules: Concept and Algorithms Basics of Association Rules Algorithms:

Expressions Eric McCreath Expressions on integers There is the standard set of interger

Human-Robot Interaction: Language Acquisition with Neural Network Alvin Rindra Fazrie 09.11.2015

Outline Integer representa+on and opera+ons Bit opera+ons

2DECOMP&FFT A Highly Scalable 2D Decomposition Library and FFT Interface Ning Li and