An FPGA Implementation of Reciprocal Sums for SPME

SLIDE 1

An FPGA Implementation of Reciprocal Sums for SPME

Sam Lee and Paul Chow

Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of Toronto

SLIDE 2

Objectives

Accelerate part of Molecular Dynamics Simulation:

  • Smooth Particle Mesh Ewald

Implementation:

  • FPGA based
  • Try it and learn

Investigation:

  • Acceleration bottleneck
  • Precision requirement
  • Parallelization strategy

SLIDE 3

Presentation Outline

  • Molecular Dynamics
  • SPME
  • The Reciprocal Sum Compute Engine
  • Speedup and Parallelization
  • Precision
  • Future work

SLIDE 4

Molecular Dynamics Simulation

SLIDE 5

Molecular Dynamics

  • Combines empirical force calculations with Newton's equations of motion.
  • Predicts the time trajectory of small atomic systems.
  • Computationally demanding.

  1. Calculate interatomic forces.
  2. Calculate the net force.
  3. Integrate Newton's equations of motion.

$$\vec{a} = m^{-1} \cdot \vec{F}$$

$$\vec{r}(t + \delta t) = \vec{r}(t) + \delta t\,\vec{v}(t) + 0.5\,\delta t^{2}\,\vec{a}(t)$$

$$\vec{v}(t + \delta t) = \vec{v}(t) + 0.5\,\delta t\,\left[\vec{a}(t) + \vec{a}(t + \delta t)\right]$$
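The position and velocity updates on this slide have the form of the velocity Verlet integrator. The following is a minimal one-dimensional sketch of a single time step, not the authors' code. The harmonic spring force, unit mass, and step size are hypothetical values chosen only so the example runs.

```python
def velocity_verlet_step(r, v, a, dt, force, mass):
    """Advance one particle (1-D) by one time step dt."""
    r_new = r + dt * v + 0.5 * dt * dt * a   # r(t+dt) = r + dt*v + 0.5*dt^2*a
    a_new = force(r_new) / mass              # a = F / m at the new position
    v_new = v + 0.5 * dt * (a + a_new)       # v(t+dt) = v + 0.5*dt*[a(t) + a(t+dt)]
    return r_new, v_new, a_new


def spring_force(r):
    """Hypothetical test force: harmonic spring with k = 1."""
    return -1.0 * r


def simulate(steps=1000, dt=0.01):
    """Integrate the spring from r = 1, v = 0 and return the final (r, v)."""
    r, v = 1.0, 0.0
    a = spring_force(r) / 1.0
    for _ in range(steps):
        r, v, a = velocity_verlet_step(r, v, a, dt, spring_force, 1.0)
    return r, v
```

In a real MD code the force call would be the expensive part, since it contains the non-bonded interactions that the rest of the talk addresses.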

SLIDE 6

Molecular Dynamics

$$U = \sum_{\text{All Bonds}} k_b (l - l_0)^2 + \sum_{\text{All Angles}} k_\Theta (\Theta - \Theta_0)^2 + \sum_{\text{All Torsions}} A\,[1 + \cos(n\tau + \phi)] + \sum_{\text{All Pairs}} 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right] + \sum_{\text{All Pairs}} \frac{q_1 q_2}{r}$$
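The last two terms of the potential are the non-bonded pair interactions (Lennard-Jones and Coulomb). A minimal sketch of the two pair terms is shown here, assuming reduced units in which the Coulomb prefactor is 1; the function names are illustrative.

```python
def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones pair energy 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2):
    """Coulomb pair energy q1*q2/r (prefactor taken as 1, an assumption)."""
    return q1 * q2 / r
```

Both terms are summed over all pairs, which is the source of the quadratic cost when no cutoff or mesh method is used.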

SLIDE 7

MD Simulation

Problem scientists are facing:

SLOW! O(N^2) complexity.

30 CPU Years

SLIDE 8

Solutions

  • Parallelize to more compute engines
  • Accelerate with FPGA
  • Especially: the non-bonded calculations

To be more specific, this paper addresses:

  • Electrostatic interaction (reciprocal space)
  • Smooth Particle Mesh Ewald algorithm.

SLIDE 9

Previous Work

Software SPME Implementations:

  • Original PME Package written by Toukmaji.
  • Used in NAMD2.

Hardware Implementations:

  • No previous hardware implementation of reciprocal sums calculation.
  • MD-Grape & MD-Engine use Ewald Summation.
  • Ewald Summation is O(N^2); SPME is O(N log N)!

SLIDE 10

Smooth Particle Mesh Ewald

SLIDE 11

Electrostatic Interaction

Coulombic equation:

$$v_{coulomb} = \frac{q_1 q_2}{4\pi\varepsilon_0 r}$$

Under the Periodic Boundary Condition, the summation to calculate the electrostatic energy is only … conditionally convergent:

$$U = \frac{1}{2} {\sum_{n}}' \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{q_i q_j}{r_{ij,n}}$$
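To make the triple sum concrete, here is a brute-force sketch of the periodic Coulomb energy, truncated to image cells with |n_x|, |n_y|, |n_z| <= n_max. The cubic box, the cube-shaped truncation, and the unit prefactor are assumptions made for illustration. Because the full sum is only conditionally convergent, the result depends on how the truncation is done, which is the motivation for Ewald-type methods.

```python
import itertools
import math


def coulomb_lattice_sum(positions, charges, box, n_max):
    """U = 1/2 * sum'_n sum_i sum_j q_i*q_j / |r_i - r_j + n*box|.

    The prime on the sum means the i == j term is skipped in the home cell (n = 0).
    """
    energy = 0.0
    shifts = range(-n_max, n_max + 1)
    for nx, ny, nz in itertools.product(shifts, repeat=3):
        for i, (ri, qi) in enumerate(zip(positions, charges)):
            for j, (rj, qj) in enumerate(zip(positions, charges)):
                if (nx, ny, nz) == (0, 0, 0) and i == j:
                    continue  # no self-interaction inside the home cell
                dx = ri[0] - rj[0] + nx * box
                dy = ri[1] - rj[1] + ny * box
                dz = ri[2] - rj[2] + nz * box
                energy += qi * qj / math.sqrt(dx * dx + dy * dy + dz * dz)
    return 0.5 * energy
```

With n_max = 0 this reduces to the ordinary pair sum of an isolated system.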

SLIDE 12

Periodic Boundary Condition

To combat Surface Effect… Replication

[Figure: a simulation box holding particles 1 to 5, and its replication into nine image boxes labelled A to I.]

SLIDE 13

Ewald Summation Used For PBC

[Figure: charge q versus distance r, split into a Direct Sum part and a Reciprocal Sum part.]

To calculate the Coulombic interactions: O(N^2) Direct Sum + O(N^2) Reciprocal Sum

SLIDE 14

Smooth Particle Mesh Ewald

  • Shift the workload to the Reciprocal Sum.
  • Use Fast Fourier Transform.
  • O(N) Real + O(N log N) Reciprocal.
  • RSCE calculates the Reciprocal Sums using the SPME algorithm.

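SLIDE 15

SPME Reciprocal Contribution

Energy:

$$\tilde{E}_{rec} = \frac{1}{2} \sum_{m_1=0}^{K_1-1} \sum_{m_2=0}^{K_2-1} \sum_{m_3=0}^{K_3-1} Q(m_1,m_2,m_3)\,(\theta_{rec} \star Q)(m_1,m_2,m_3)$$

Force:

$$F_{\alpha i} = \frac{\partial \tilde{E}_{rec}}{\partial r_{\alpha i}} = \sum_{m_1=0}^{K_1-1} \sum_{m_2=0}^{K_2-1} \sum_{m_3=0}^{K_3-1} \frac{\partial Q}{\partial r_{\alpha i}}(m_1,m_2,m_3)\,(\theta_{rec} \star Q)(m_1,m_2,m_3)$$

with

$$B(m_1,m_2,m_3) = |b_1(m_1)|^2 \cdot |b_2(m_2)|^2 \cdot |b_3(m_3)|^2$$

$$b_i(m_i) = \exp\left(\frac{2\pi i (n-1) m_i}{K_i}\right) \times \left[\sum_{k=0}^{n-2} M_n(k+1) \exp\left(\frac{2\pi i\, m_i k}{K_i}\right)\right]^{-1}$$

$$C(m_1,m_2,m_3) = \frac{1}{\pi V} \frac{\exp(-\pi^2 m^2 / \beta^2)}{m^2}, \quad m \neq 0, \quad C(0,0,0) = 0$$

$$\tilde{E} = \frac{1}{2\pi V} \sum_{m \neq 0} \frac{\exp(-\pi^2 m^2 / \beta^2)}{m^2}\, B(m_1,m_2,m_3)\, F(Q)(m_1,m_2,m_3)\, F(Q)(-m_1,-m_2,-m_3)$$

Each F(Q) term is computed with an FFT.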
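As an illustration of how the B and C arrays combine with the FFT of the charge mesh, the following NumPy sketch evaluates the Fourier-space form of the reciprocal energy. It is a software sketch under stated assumptions, not the RSCE design: a cubic box of side `box`, a K x K x K mesh, an even interpolation order `n`, signed mesh indices (values above K/2 wrapped to negative) when forming m^2, and a real-valued Q so that F(Q)(m) F(Q)(-m) = |F(Q)(m)|^2. The function names are hypothetical.

```python
import numpy as np


def bspline(n, u):
    """Cardinal B-spline M_n(u) via the standard recursion (M_2 is the hat function)."""
    if n == 2:
        return max(0.0, 1.0 - abs(u - 1.0))
    return (u * bspline(n - 1, u) + (n - u) * bspline(n - 1, u - 1.0)) / (n - 1)


def b_mod2(n, K):
    """|b(m)|^2 for m = 0..K-1; the leading exponential has unit modulus."""
    m = np.arange(K)
    k = np.arange(n - 1)                                  # k = 0 .. n-2
    coeff = np.array([bspline(n, kk + 1.0) for kk in k])  # M_n(k+1)
    denom = (coeff * np.exp(2j * np.pi * np.outer(m, k) / K)).sum(axis=1)
    return 1.0 / np.abs(denom) ** 2


def reciprocal_energy(Q, box, beta, n):
    """E = 1/2 * sum_{m != 0} B(m) * C(m) * |F(Q)(m)|^2 on a cubic mesh."""
    K = Q.shape[0]
    volume = box ** 3
    b2 = b_mod2(n, K)
    B = b2[:, None, None] * b2[None, :, None] * b2[None, None, :]
    mi = np.fft.fftfreq(K, d=1.0 / K)                     # signed integer indices
    m2 = (mi[:, None, None] ** 2 + mi[None, :, None] ** 2
          + mi[None, None, :] ** 2) / box ** 2
    C = np.zeros_like(m2)                                 # C(0,0,0) = 0
    nonzero = m2 > 0
    C[nonzero] = np.exp(-np.pi ** 2 * m2[nonzero] / beta ** 2) / (np.pi * volume * m2[nonzero])
    FQ = np.fft.fftn(Q)                                   # unnormalized forward 3D FFT
    return 0.5 * float(np.sum(B * C * np.abs(FQ) ** 2))
```

A uniform mesh has only an m = 0 component, so its reciprocal energy is zero, which makes a convenient sanity check.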

SLIDE 16

Charge Interpolation

[Figure: charge interpolation, labels A to F.]
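Charge interpolation spreads each point charge onto nearby mesh points using B-spline weights. A one-dimensional sketch follows, assuming the particle position has already been scaled to mesh units (0 <= u < K) and using a hypothetical interpolation order n. The weights come from the same cardinal B-spline M_n that appears in the SPME formulas.

```python
import math


def bspline(n, u):
    """Cardinal B-spline M_n(u) via the standard recursion (M_2 is the hat function)."""
    if n == 2:
        return max(0.0, 1.0 - abs(u - 1.0))
    return (u * bspline(n - 1, u) + (n - u) * bspline(n - 1, u - 1.0)) / (n - 1)


def spread_charge_1d(q, u, K, n):
    """Distribute charge q at scaled position u onto n points of a periodic K-point mesh."""
    mesh = [0.0] * K
    base = math.floor(u)
    for j in range(n):
        k = base - j                      # mesh point receiving a share
        weight = bspline(n, u - k)        # M_n(u - k), with 0 <= u - k < n
        mesh[k % K] += q * weight         # wrap around for periodic boundaries
    return mesh
```

Because the B-spline weights sum to one, the total charge on the mesh equals the particle charge.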

SLIDE 17

Reciprocal Sum Compute Engine

SLIDE 18

RSCE Architecture

SLIDE 19

RSCE Verification Testbench

SLIDE 20

RSCE Validation Environment

SLIDE 21

Speedup Estimate

RSCE vs. Software Implementation

SLIDE 22

RSCE Speedup

RSCE @ 100 MHz vs. Intel P4 @ 2.4 GHz.

Speedup: 3x to 14x

Why so insignificant?

  • Reciprocal Sums calculations are not easily parallelizable.
  • QMM memory bandwidth limitation.

Improvement:

  • Using more QMM memories can improve the speedup.
  • Slight design modifications are required.

SLIDE 23

Parallelization Strategy

Multiple RSCE

SLIDE 24

RSCE Parallelization Strategy

  • Assume a 2-D simulation system.
  • Assume P = 2, K = 8, N = 6.
  • Assume NumP = 4.

[Figure: an 8x8x8 mesh divided into four 4x4x4 mini meshes.]

SLIDE 25

RSCE Parallelization Strategy

[Figure: two panels of the mesh (axes Kx and Ky) partitioned among P1, P2, P3 and P4; one panel shows the 1D FFT in the X direction, the other the 1D FFT in the Y direction.]

  • Mini-mesh composed -> 2D-IFFT
  • 2D-IFFT = two passes of 1D-FFT (X and Y).
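The decomposition of a 2D transform into two passes of 1D transforms can be checked in software. This NumPy sketch assumes axis 0 is the Y direction and axis 1 is the X direction, which is an arbitrary labelling; it shows the row and column passes but does not model how the RSCEs exchange data between passes.

```python
import numpy as np


def fft2_by_passes(mesh):
    """2D FFT as a pass of 1D FFTs along Y followed by a pass along X."""
    pass_y = np.fft.fft(mesh, axis=0)     # 1D FFT of every column (Y direction)
    return np.fft.fft(pass_y, axis=1)     # 1D FFT of every row (X direction)


def ifft2_by_passes(mesh):
    """2D inverse FFT built the same way from 1D inverse FFTs."""
    pass_y = np.fft.ifft(mesh, axis=0)
    return np.fft.ifft(pass_y, axis=1)
```

Within one pass every row (or column) transform is independent, which is what allows the rows or columns to be divided among several engines.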

SLIDE 26

Parallelization Strategy

$$E_{Total} = \sum_{P=0}^{3} E_P$$

2D-IFFT -> Energy Calculation -> 2D-FFT -> Force Calculation

SLIDE 27

MD Simulations RSCE + NAMD2

SLIDE 28

RSCE Precision

Precision goal: relative error bound < 10^-5.

Two major calculation steps:

  • B-Spline calculation.
  • 3D-FFT/IFFT calculation.

Due to limited logic resources and the limited-precision FFT LogiCore => the precision goal cannot be achieved.

SLIDE 29

RSCE Precision

To achieve the relative error bound of < 10^-5, the minimum calculation precision is:

  • FFT {14.30}
  • B-Spline {1.27}
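The effect of a fixed-point format on accuracy can be sketched as below. This assumes that a format written {a.b} means a integer bits and b fractional bits, which the slides do not spell out, and it models only round-to-nearest quantization of a single value, not error growth through the B-Spline or FFT stages.

```python
def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2^-frac_bits (fixed-point fraction)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def max_rounding_error(frac_bits):
    """Worst-case absolute error of round-to-nearest with frac_bits fractional bits."""
    return 2.0 ** -(frac_bits + 1)
```

Under this reading, 27 fractional bits give a per-value rounding error of at most 2^-28, a few times 10^-9, leaving margin for errors to accumulate over many operations before reaching 10^-5.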

SLIDE 30

MD Simulation with RSCE

RMS Energy Error Fluctuation:

$$\text{RMS Energy Fluctuation} = \frac{\sqrt{\langle E^2 \rangle - \langle E \rangle^2}}{|\langle E \rangle|}$$
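The RMS energy fluctuation can be computed from a series of total-energy samples as in this sketch. It takes the fluctuation to be the standard deviation of the energy divided by the magnitude of its mean; using the absolute value of the mean is an assumption made so the ratio stays positive when the energy is negative.

```python
import math


def rms_energy_fluctuation(energies):
    """sqrt(<E^2> - <E>^2) / |<E>| over a list of energy samples."""
    n = len(energies)
    mean = sum(energies) / n
    mean_sq = sum(e * e for e in energies) / n
    variance = max(mean_sq - mean * mean, 0.0)   # guard against tiny negative round-off
    return math.sqrt(variance) / abs(mean)
```

A perfectly conserved energy gives a fluctuation of zero, so smaller values indicate a more stable simulation.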

SLIDE 31

FFT Precision Vs. Energy Fluctuation

SLIDE 32

Summary

  • Implementation of the FPGA-based Reciprocal Sums Compute Engine and its SystemC model.
  • Integration of the RSCE into a widely used Molecular Dynamics program called NAMD2 for verification.
  • RSCE speedup estimate: 3x to 14x
  • Precision requirement: B-Spline {1.27} & FFT {14.30} => 10^-5 rel. error
  • Parallelization strategy

SLIDE 33

Future Work

  • More in-depth precision analysis.
  • Investigation of how to further speed up the SPME algorithm with FPGA.

SLIDE 34

Questions