Tsunami simulation on FPGA/GPU and its analysis based on Statistical Model Checking - PowerPoint PPT Presentation

  1. Tsunami simulation on FPGA/GPU and its analysis based on Statistical Model Checking. Masahiro Fujita, VLSI Design and Education Center (VDEC), University of Tokyo

  2. Outline • What Tsunami simulation means in this talk • Acceleration with FPGA/GPU – Based on stream processing (pipelining) with loop unrolling – Based on parallel processing for decomposed regions • (Formal verification of those implementations) – (Equivalence checking between the FPGA/GPU implementation and the original program in C/Fortran) – Just show our strategy • Statistical model checking – On software in Fortran – Acceleration with FPGA/GPU

  3. Motivation • Based on the values of many geographically distributed earthquake sensors (wired/wireless), compute how the Tsunami wave will propagate • Goal: realize supercomputer-level performance in Tsunami simulation with FPGA/GPU, i.e., compute/predict the Tsunami as fast and accurately as possible • Flow: earthquake sensors → generate the initial wave from sensor data → propagate the wave by numerically solving partial differential equations

  4. Tsunami simulation • Tsunami simulation algorithm: find solutions of fluid dynamics equations – Law of Conservation of Mass – Law of Conservation of Momentum, with and without bottom friction • Solved with known boundary conditions and bathymetric input of the region • Here the above is processed by numerically solving sets of partial differential equations with finite difference methods

  5. Partial differential equations to be solved • η = vertical displacement of water above still water • D = total water depth = h + η • g = acceleration due to gravity • A = horizontal eddy viscosity • τ = friction along the x or y direction • M = water flux discharge along the X direction • N = water flux discharge along the Y direction • The momentum equations along the X axis and Y axis respectively (without bottom friction) and the mass conservation equation are shown on the next slide • Reference: Tsunami Modeling Manual by Prof. Nobuo Shuto

  6. Here we use a simplified model: the linear one (valid if the sea depth is large enough) • Shallow Water Theory (Long Wave Theory) – Mass conservation: $\frac{\partial \eta}{\partial t} + \frac{\partial M}{\partial x} + \frac{\partial N}{\partial y} = 0$ – Momentum conservation: $\frac{\partial M}{\partial t} + \frac{\partial}{\partial x}\!\left(\frac{M^2}{D}\right) + \frac{\partial}{\partial y}\!\left(\frac{MN}{D}\right) + gD\frac{\partial \eta}{\partial x} + \frac{gn^2}{D^{7/3}} M \sqrt{M^2+N^2} = 0$ and $\frac{\partial N}{\partial t} + \frac{\partial}{\partial x}\!\left(\frac{MN}{D}\right) + \frac{\partial}{\partial y}\!\left(\frac{N^2}{D}\right) + gD\frac{\partial \eta}{\partial y} + \frac{gn^2}{D^{7/3}} N \sqrt{M^2+N^2} = 0$ – where η: wave height, D: depth, g: gravity, n: Manning roughness, M, N: flux of x, y • Linear Long Wave Theory: drop the advection and friction terms, keeping the same mass conservation together with $\frac{\partial M}{\partial t} + gD\frac{\partial \eta}{\partial x} = 0$ and $\frac{\partial N}{\partial t} + gD\frac{\partial \eta}{\partial y} = 0$

  7. Finite difference methods • Solution of the mass conservation equation based on the finite difference method: – Z(i,j,t+1) = Z(i,j,t) − (dt/dx) * (M(i,j,t) − M(i−1,j,t) + N(i,j,t) − N(i,j−1,t)) which discretizes $\frac{\partial \eta}{\partial t} + \frac{\partial M}{\partial x} + \frac{\partial N}{\partial y} = 0$ • Where – i, j = x, y coordinates – Z(i,j,t) = water surface level at time t – H(i,j,t) = still water depth at time t – dt = temporal step, dx = spatial step – M(i,j,t) = water flux discharge along the x-axis at time t – N(i,j,t) = water flux discharge along the y-axis at time t – Z(i,j,t+1) = water surface level at time t+dt

  8. Wave Height Computation (figure) • Alternating time-stepping scheme: at each 1-second step, mass conservation updates the wave height Z from M, N, and H, then momentum conservation updates M and N from the new Z, advancing from time T to T+1

  9. Target Tsunami Simulator • "TUNAMI N1" program in FORTRAN – Developed by Tohoku University • Inputs: standing water depth (H) and earthquake source information, from which the initial wave height (Z0) is computed • Main loop (1 iteration = 1 second): mass conservation computation (wave height Z), then momentum conservation computation (momentum M, N) • 1040 grids (1 grid: 1 km x 1 km)

  10. C Implementation (base program) • Mass and Momentum functions are computed alternately – Each function raster-scans the grids • Since there is no data dependency between the computations at different grid points, they can be parallelized – Mass Conservation (Z update) and Momentum Conservation (M, N update): no dependency between grid points (parallelizable)

  11. Speed of the N1 simulation program • The original Fortran program has been manually converted into C – C is 4 times faster than Fortran in our environment – The C version is the base simulator • Size of simulation area – Grid width: 1 km – Number of grids: 1040 * 668 • Simulated time – 1 time step = 1 sec – 7,200 steps computed (2 hours) • Tsunami simulation time on an Intel microprocessor (i7 @ 2.93 GHz, single core) → 78.7 sec

  12. Profiling of computation time of the software • Flow: Start → Input → Initial condition → Main loop (t=1; t<=T; t++): Mass Conservation, Open Boundary Condition, Momentum Conservation → Stop • Chart: simulation cycles and computation time of TUNAMI, with computation time (up to about 160 sec) plotted against simulation cycles t (3600 to 21600), broken down into the momentum conservation, mass conservation, open boundary, and initialization parts

  13. Co-execution: C on microprocessor + FPGA/GPU • Microprocessor side: read earthquake information from files, compute the initial wave, load the initial wave into the DRAM memory of the FPGA board, run the FPGA (host is idle meanwhile), read data back from DRAM, and store the results into files • FPGA/GPU side: main loop of Mass Conservation, Open Boundary Condition, and Momentum Conservation, storing results to DRAM

  14. Pipeline processing for higher throughput • Latency – After receiving an input, how many cycles are required to generate its output • Throughput – How frequently input data can be processed (the initiation interval between iterations) • With pipeline processing, iterations Iter 1 … Iter N overlap in time across the pipeline stages • Goal: faster throughput = larger number of pipeline stages

  15. Typical way for larger pipelines • Usually each loop becomes one pipeline – Multiple loops should be merged as much as possible • The number of pipeline stages depends on the length of each iteration – Better to have larger loops • Various loop optimizations have been proposed – Formal analysis becomes possible with such transformations • Example (loop fusion): the two loop nests

for (i = 0; i < N; i++)
  for (j = 0; j < N; j++)
    S0(i,j): A[i][j] += u[i] * v[j];
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++)
    S1(i,j): x[i] += A[j][i] * y[j];

are merged into

for (t1 = 0; t1 < N; t1++)
  for (t2 = 0; t2 < N; t2++) {
    A[t2][t1] += u[t2] * v[t1];
    x[t1] += A[t2][t1] * y[t2];
  }

[Pluto 08] U. Bondhugula, et al., "A Practical and Automatic Polyhedral Program Optimization System," in ACM PLDI'08, 2008

  16. Transforming latency-based to throughput-based computation • Stream-based programming – Communication/buffering becomes explicit – Easier for formal analysis as well • Works for both FPGA and GPU – And also for many-cores • Figure: an "instruction, latency-based" microprocessor (µP with instruction and data memories) versus "data, throughput-based" hardware (processing elements P1, P2, P3 with memories between them)

  17. Introducing streams • Stream = sequence of data, functions, or both combined • One stream for each region, and one stream for each time step: the data objects M, N, H, Z at times t and t+1 (the original program can be a single stream) • Formal equivalence checking with manipulation of the transformation matrix

  18. Strategy for GPU implementation • As shown earlier, a stream is based on each region – Easier and more efficient for GPU – But it depends on the memory access architecture of the target GPU systems • Essentially, the area where the Tsunami should be simulated is decomposed into a set of small regions – Each core of the GPU is in charge of one region – Straightforward parallelism – Pipelined computation inside each core
