SLIDE 1

Simple Steps for Parallelizing a FORTRAN Code Using Message Passing Interface (MPI)

Justin L. Morgan and Jason B. Gilbert

Department of Aerospace Engineering, Auburn University

Why Parallelize?

  • To decrease the overall computation time of a job.

  • To decrease the per-processor memory usage.

  • As William Gropp states in Using MPI, “To pull a bigger wagon, it is easier to add more oxen than to grow a gigantic ox.”

SLIDE 2

Physical Problem Formulation

  • Determine the temperature distribution in a flat plate with 300 K applied to three edges and 500 K applied to the fourth edge.

Governing Equation

  • Conservation of Energy (Differential Conservation Form)

  • Assumptions Made
      • Front and Back Faces are Perfectly Insulated
      • Steady Conditions
      • No Energy Transformation

$$\dot{E}_{st} = \dot{E}_{in} + \dot{E}_{g} - \dot{E}_{out}$$
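Under the stated assumptions the storage and generation terms drop out, and the balance reduces to the two-dimensional Laplace equation that the next slide discretizes:

$$\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} = 0$$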

SLIDE 3

Discretization

  • Point Jacobi Method

  • Iteratively solve for $T^{k+1}_{i,j}$:

$$\frac{T^{k}_{i-1,j} - 2T^{k+1}_{i,j} + T^{k}_{i+1,j}}{(\Delta x)^2} + \frac{T^{k}_{i,j-1} - 2T^{k+1}_{i,j} + T^{k}_{i,j+1}}{(\Delta y)^2} \cong 0$$

Implementation in FORTRAN

  • Dimension Arrays

  • Set Initial and Boundary Conditions

  • Begin Iterative Process

  • Monitor Convergence (see the sketch below)
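A minimal serial sketch of these four steps (grid size, spacing, tolerance, and the iteration cap are illustrative assumptions, not values from the presentation):

    program jacobi_serial
      implicit none
      integer, parameter :: nx = 102, ny = 102, kmax = 100000
      double precision, parameter :: tol = 1.0d-6
      double precision :: T(nx,ny), Tnew(nx,ny), dx2, dy2, eps
      integer :: i, j, k

      ! dimension arrays; set initial and boundary conditions
      T = 300.0d0              ! 300 K interior guess and three edges
      T(:,ny) = 500.0d0        ! 500 K applied to the fourth edge
      dx2 = 1.0d0              ! (delta x)**2, illustrative
      dy2 = 1.0d0              ! (delta y)**2, illustrative
      Tnew = T

      ! begin iterative process and monitor convergence
      do k = 1, kmax
         eps = 0.0d0
         do j = 2, ny-1
            do i = 2, nx-1
               ! Point Jacobi update: neighbors at iteration k give T at k+1
               Tnew(i,j) = ((T(i-1,j) + T(i+1,j))/dx2    &
                          + (T(i,j-1) + T(i,j+1))/dy2)   &
                          / (2.0d0/dx2 + 2.0d0/dy2)
               eps = max(eps, abs(Tnew(i,j) - T(i,j)))
            end do
         end do
         T = Tnew
         if (eps < tol) exit   ! converged
      end do
      print *, 'iterations:', k, '  max change:', eps
    end program jacobi_serial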

SLIDE 4

Results

  • Iterative Convergence

$$\varepsilon_{i,j} = T^{k+1}_{i,j} - T^{k}_{i,j}$$

$$L_2\,\mathrm{norm} = \left( \frac{\displaystyle\sum_{i=2}^{i_{\max}-1} \sum_{j=2}^{j_{\max}-1} \varepsilon_{i,j}^{\alpha}}{N} \right)^{1/\alpha}, \qquad \alpha = 2$$
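As a sketch, this norm (with $\alpha = 2$) can be accumulated over the interior nodes as follows; the array names and bounds are assumed to match the serial sketch on slide 3:

    ! L2 norm of the iteration-to-iteration change, alpha = 2
    function l2_change(T, Tnew, imax, jmax) result(l2norm)
      implicit none
      integer, intent(in) :: imax, jmax
      double precision, intent(in) :: T(imax,jmax), Tnew(imax,jmax)
      double precision :: l2norm
      integer :: i, j
      l2norm = 0.0d0
      do j = 2, jmax-1
         do i = 2, imax-1
            ! eps(i,j) = T(k+1) - T(k) at each interior node
            l2norm = l2norm + (Tnew(i,j) - T(i,j))**2
         end do
      end do
      ! divide by N, the number of interior nodes, then take the 1/alpha root
      l2norm = sqrt(l2norm / dble((imax-2)*(jmax-2)))
    end function l2_change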

Results

  • Temperature Distribution

SLIDE 5

Code Verification

  • Method of Manufactured Solutions (MMS)

$$\tilde{T} = C_1 + C_2 \sin\!\left(\frac{\pi x}{a_1}\right) + C_3 \cos\!\left(\frac{\pi y}{a_2}\right) + C_4 \sin\!\left(\frac{\pi x y}{a_3}\right)$$

$$\frac{\partial^2 \tilde{T}}{\partial x^2} + \frac{\partial^2 \tilde{T}}{\partial y^2} = -\left[ C_2\!\left(\frac{\pi}{a_1}\right)^{2} \sin\!\left(\frac{\pi x}{a_1}\right) + C_4\!\left(\frac{\pi y}{a_3}\right)^{2} \sin\!\left(\frac{\pi x y}{a_3}\right) \right] - \left[ C_3\!\left(\frac{\pi}{a_2}\right)^{2} \cos\!\left(\frac{\pi y}{a_2}\right) + C_4\!\left(\frac{\pi x}{a_3}\right)^{2} \sin\!\left(\frac{\pi x y}{a_3}\right) \right] = f(x, y)$$
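The manufactured residual $f(x, y)$ is then carried along as a source term: the code is run on

$$\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} = f(x, y)$$

with boundary values taken from $\tilde{T}$, so the exact solution is known at every node and the discretization error below can be evaluated directly.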

Code Verification

  • Discretization Error (DE)

The numerical solution $\tilde{T}_{\mathrm{NUMERICAL}}$ comes from the same Point Jacobi update applied to the manufactured problem:

$$\tilde{T}^{\,k+1}_{i,j} \cong \frac{\dfrac{\tilde{T}^{k}_{i-1,j} + \tilde{T}^{k}_{i+1,j}}{(\Delta x)^2} + \dfrac{\tilde{T}^{k}_{i,j-1} + \tilde{T}^{k}_{i,j+1}}{(\Delta y)^2} - f_{i,j}}{\dfrac{2}{(\Delta x)^2} + \dfrac{2}{(\Delta y)^2}}$$

$$DE = \tilde{T}_{\mathrm{NUMERICAL}} - \tilde{T}_{\mathrm{EXACT}}$$

SLIDE 6

Code Verification

  • Discretization Error (DE)

    Mesh Nodes    Maximum DE (K)
    10 x 10            13.00
    25 x 25             1.30
    50 x 50             0.34
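As a quick consistency check on the table: from the 25 x 25 to the 50 x 50 mesh the spacing halves and the maximum DE drops by a factor of $1.30/0.34 \approx 3.8$, so the observed order of accuracy is roughly

$$\hat{p} = \frac{\ln(1.30/0.34)}{\ln 2} \approx 1.9,$$

close to the formal second order of the central-difference scheme discussed next.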

Code Verification

  • Global Discretization Error

  • Formal Order of Accuracy

  • Observed Order of Accuracy

[Figure: log-log plot of the $L_2$ norm of the DE versus normalized grid spacing $h$, with a second-order slope shown for reference.]

$$h_k = \frac{\Delta x_k}{\Delta x_1} = \frac{\Delta y_k}{\Delta y_1}$$

SLIDE 7

Parallelization

Domain Decomposition for 2 Processors

[Figure: schematic of the grid split between two processors.]

  • Blue box represents information to be passed between processors after each iteration.

  • Red boxes are fixed boundary conditions.

  • Green boxes include the grid points that are initially sent to each processor.

Parallel Code Structure

  • The code is divided into three main sections: the portion performed by all processors, the portion performed by the master processor, and the portion performed by the slave processors.

    All Processors
      Declare Variables
      Dimension Arrays
      INCLUDE 'MPIF.H'
      Initialize MPI
      If I am master then
        ...
      Else (slave processors)
        ...
      End If

  • MPIF.H is the MPI header file; including it declares the constants and interfaces the code needs to call the MPI library.
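A minimal Fortran rendering of the outline above; the master and slave bodies stay elided, as in the outline:

    program heat_mpi
      implicit none
      include 'mpif.h'
      integer :: myid, numprocs, ierr

      ! every processor initializes MPI and learns its rank and the pool size
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

      if (myid == 0) then
         ! master: apply initial/boundary conditions, decompose and send the grid
      else
         ! slaves: receive a grid piece, iterate, send results back
      end if

      call MPI_FINALIZE(ierr)
    end program heat_mpi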

SLIDE 8

Parallel Code Structure

  • The job of the master processor is to initialize the grid with initial and boundary conditions, then decompose it and send each processor the information it needs.

  • Each slave processor receives its initial grid from the master node and begins to perform calculations. After each iteration, individual processors must pass the first and last columns of their respective grids to neighboring processors to update the boundary values (sketched below).

  • The slave processors iterate until an acceptable convergence has been reached and then send the new temperature values back to the master processor to reassemble the grid.
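A sketch of the per-iteration column exchange described above, using the blocking MPI_SEND/MPI_RECV calls listed under MPI Functions below. The 1-D decomposition by columns, the ghost columns 0 and jloc+1, and the tag value are illustrative assumptions; the presentation's actual routine may differ. In Fortran a column T(:,j) is contiguous in memory, so it can be passed directly as the buffer, and because the end ranks skip the missing send and post their receive, the blocking chain completes without deadlock:

    subroutine exchange_columns(T, imax, jloc, myid, numprocs)
      implicit none
      include 'mpif.h'
      integer, intent(in) :: imax, jloc, myid, numprocs
      ! owned columns 1..jloc; ghost columns 0 and jloc+1
      double precision, intent(inout) :: T(imax, 0:jloc+1)
      integer :: ierr, status(MPI_STATUS_SIZE)
      integer, parameter :: tag = 1

      ! pass the first owned column left; fill the right ghost column
      if (myid > 0)          call MPI_SEND(T(1,1), imax,               &
          MPI_DOUBLE_PRECISION, myid-1, tag, MPI_COMM_WORLD, ierr)
      if (myid < numprocs-1) call MPI_RECV(T(1,jloc+1), imax,          &
          MPI_DOUBLE_PRECISION, myid+1, tag, MPI_COMM_WORLD, status, ierr)

      ! pass the last owned column right; fill the left ghost column
      if (myid < numprocs-1) call MPI_SEND(T(1,jloc), imax,            &
          MPI_DOUBLE_PRECISION, myid+1, tag, MPI_COMM_WORLD, ierr)
      if (myid > 0)          call MPI_RECV(T(1,0), imax,               &
          MPI_DOUBLE_PRECISION, myid-1, tag, MPI_COMM_WORLD, status, ierr)
    end subroutine exchange_columns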

MPI Functions

  • MPI Functions Called By All Processors
      • MPI_INIT(IERR)
      • MPI_FINALIZE(IERR)
      • MPI_COMM_RANK(MPI_COMM_WORLD, MYID, IERR)
      • MPI_COMM_SIZE(MPI_COMM_WORLD, NUMPROCS, IERR)

  • MPI Communication Operations
      • MPI_SEND(BUFFER, COUNT, DATATYPE, DESTINATION, TAG, MPI_COMM_WORLD, IERR)
      • MPI_RECV(BUFFER, COUNT, DATATYPE, SOURCE, TAG, MPI_COMM_WORLD, STATUS, IERR)

SLIDE 9

MPI Functions

  • MPI_INIT: Initializes the MPI environment. Can be called only once in a given code.

  • MPI_FINALIZE: Closes MPI. Once MPI_FINALIZE is called, MPI cannot be restarted.

  • MPI_COMM_RANK: Assigns each processor a unique integer identifier.

  • MPI_COMM_SIZE: Returns the number of processors available.

  • MPI_SEND: Standard MPI send operation.

  • MPI_RECV: Standard MPI receive operation.

Parallel Performance

  • Timing
      • To gauge the performance of the parallelized program, a built-in MPI timing routine was used: MPI_WTIME().
      • MPI_WTIME is called like any other MPI function except that it has no arguments. It returns a double-precision floating-point number that is the time in seconds since some arbitrary time in the past.

      • The timing routine is called once by the master node at the beginning of the program and again after the solution has converged and the grid has been reassembled. The difference between the two values returned is the total runtime of the program (see the sketch below).
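A sketch of that timing pattern as it might slot into the skeleton from slide 7 (t_start and t_end are illustrative names, declared as double precision):

    ! in the declarations:  double precision :: t_start, t_end
    if (myid == 0) t_start = MPI_WTIME()

    ! ... decompose the grid, iterate to convergence, reassemble ...

    if (myid == 0) then
       t_end = MPI_WTIME()
       print *, 'total runtime (s):', t_end - t_start
    end if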

SLIDE 10

Parallel Performance

  • Parallel Speedup
      • Parallel speedup is a measure of the performance of parallel programs: $t_1$ is the time for 1 processor to finish, and $t_n$ is the time for $n$ processors.

$$\mathrm{Speedup} = \frac{t_1}{t_n}$$

  • Amdahl's Law
      • Theoretical maximum speedup of a parallel program: $P$ is the parallel fraction, $S$ is the serial fraction, and $N$ is the number of processors.
      • If 100% of the code is parallelized ($S = 0$, $P = 1$), the maximum speedup is $N$, the number of processors.

$$\mathrm{MaxSpeedup} = \frac{1}{\dfrac{P}{N} + S}$$
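To see how quickly the serial fraction bites (illustrative numbers, not from the presentation): with $S = 0.05$, $P = 0.95$, and $N = 50$ processors,

$$\mathrm{MaxSpeedup} = \frac{1}{0.95/50 + 0.05} \approx 14.5,$$

far short of the ideal factor of 50, which is consistent with the measured speedups flattening out in the results that follow.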

Results

[Figure: speedup versus number of processors for the 102x102 and 1002x102 node meshes, compared with the theoretical maximum speedup.]

SLIDE 11

Results

  • The results for speedup demonstrate the increasing influence of communication time on the overall program runtime.

  • As the amount of calculation each processor has to do decreases, the ratio of communication time to processing time increases. This causes the speedup to fall off as more processors are added to smaller jobs.

  • For example, two processors operating on the 102x102 node mesh have 50 columns of data apiece to perform calculations on, and there are only two sends and two receives per iteration.

  • Fifty processors operating on the same mesh have only two columns apiece to operate on, and there are 98 sends and 98 receives per iteration.

  • These send/receive operations begin to take longer to complete than the calculations being performed, and the processors become “starved” for data.

Results

This figure shows the dramatic decrease in runtime using 1, 2, 5, 10, 25, and 50 processors for the 1002x102 node mesh.

[Figure: runtime in seconds versus number of processors for the 1002x102 node mesh.]

SLIDE 12

Parallel Comparison

  • Temperature Distribution Compared between Parallelized and Serial Codes

  • Zero Difference at Every Node