
CS240A: Parallelism in CSE Applications
Tao Yang
Slides revised from James Demmel and Kathy Yelick
www.cs.berkeley.edu/~demmel/cs267_Spr11

Category of CSE Simulation Applications
(ordered from discrete to continuous)
• Discrete event systems
  • Time and space are discrete
• Particle systems
  • Important special case of lumped systems
• Ordinary Differential Equations (ODEs)
  • Location/entities are discrete, time is continuous
• Partial Differential Equations (PDEs)
  • Time and space are continuous

Basic Kinds of CSE Simulation
• Discrete event systems:
  • “Game of Life,” Manufacturing systems, Finance, Circuits, Pacman
• Particle systems:
  • Billiard balls, Galaxies, Atoms, Circuits, Pinball …
• Ordinary Differential Equations (ODEs):
  • Lumped variables depending on continuous parameters
  • system is “lumped” because we are not computing the voltage/current at every point in space along a wire, just endpoints
  • Structural mechanics, Chemical kinetics, Circuits, Star Wars: The Force Unleashed
• Partial Differential Equations (PDEs):
  • Continuous variables depending on continuous parameters
  • Heat, Elasticity, Electrostatics, Finance, Circuits, Medical Image Analysis, Terminator 3: Rise of the Machines
• For more on simulation in games, see
  • www.cs.berkeley.edu/b-cam/Papers/Parker-2009-RTD

Table of Contents
• ODE
• PDE
• Discrete Events and Particle Systems

Finite-Difference Method for ODE/PDE
• Discretize the domain of a function.
• Assign a variable to the unknown function value at each point of the discretized domain, and set up an equation at each point.
• The unknown values at those points form a system of equations. Then solve that system.

Euler’s method for ODE Initial-Value Problems
$$\frac{dy}{dx} = f(x, y); \quad y(x_0) = y_0$$
Straight line approximation
[Figure: straight-line approximation starting from $y_0$, with grid points $x_0, x_1, x_2, x_3$ spaced $h$ apart]

Euler Method
Approximate:
$$y'(x_0) \approx \frac{y(x_0 + h) - y(x_0)}{h}$$
Then:
$$y_{n+1} = y_n + h\,y'_n + O(h^2)$$
$$y_{n+1} = y_n + h\,f(x_n, y_n) + O(h^2)$$
Thus starting from an initial value $y_0$:
$$y_{n+1} = y_n + h\,f(x_n, y_n) \quad \text{with } O(h^2) \text{ error}$$

Example
$$\frac{dy}{dx} = x + y, \quad y(0) = 1$$
$$y_{n+1} = y_n + h\,f(x_n, y_n) = y_n + h\,(x_n + y_n), \quad h = 0.02$$

x_n     y_n       y'_n      h y'_n    Exact solution   Error
0       1.00000   1.00000   0.02000   1.00000           0.00000
0.02    1.02000   1.04000   0.02080   1.02040          -0.00040
0.04    1.04080   1.08080   0.02162   1.04162          -0.00082
0.06    1.06242   1.12242   0.02245   1.06367          -0.00126
0.08    1.08486   1.16486   0.02330   1.08657          -0.00171
0.1     1.10816   1.20816   0.02416   1.11034          -0.00218
0.12    1.13232   1.25232   0.02505   1.13499          -0.00267
0.14    1.15737   1.29737   0.02595   1.16055          -0.00318
0.16    1.18332   1.34332   0.02687   1.18702          -0.00370
0.18    1.21019   1.39019   0.02780   1.21443          -0.00425
0.2     1.23799   1.43799   0.02876   1.24281          -0.00482
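The table above can be reproduced with a few lines of Python. This is a minimal sketch, not code from the slides: the function names `euler`, `f`, and `exact` are chosen here for illustration, and the closed form $y = 2e^x - x - 1$ used for the "Exact solution" column is an assumption that can be checked by substituting it into $y' = x + y$, $y(0) = 1$.

```python
import math

def euler(f, x0, y0, h, n_steps):
    """Advance y' = f(x, y) from (x0, y0) using n_steps forward-Euler steps of size h."""
    xs, ys = [x0], [y0]
    for n in range(n_steps):
        # y_{n+1} = y_n + h * f(x_n, y_n)
        ys.append(ys[-1] + h * f(xs[-1], ys[-1]))
        xs.append(x0 + (n + 1) * h)
    return xs, ys

def f(x, y):
    """Right-hand side of the example ODE dy/dx = x + y."""
    return x + y

def exact(x):
    """Closed-form solution of y' = x + y with y(0) = 1 (assumed, see lead-in)."""
    return 2.0 * math.exp(x) - x - 1.0

xs, ys = euler(f, x0=0.0, y0=1.0, h=0.02, n_steps=10)
for x, y in zip(xs, ys):
    print(f"{x:4.2f}  {y:.5f}  {exact(x):.5f}  {y - exact(x):+.5f}")
```

Each printed row holds $x_n$, the Euler value $y_n$, the exact value, and their difference. The error grows with every step because the per-step $O(h^2)$ errors accumulate.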

ODE with boundary value
$$\frac{d^2u}{dr^2} + \frac{1}{r}\frac{du}{dr} - \frac{u}{r^2} = 0$$
$u(5) = 0.0038731$", $u(8) = 0.0030769$"
[Figure: domain from $r = 5$ to $r = 8$]
http://numericalmethods.eng.usf.edu

Solution
Using the approximations
$$\frac{d^2y}{dx^2} \approx \frac{y_{i+1} - 2y_i + y_{i-1}}{(\Delta x)^2} \quad \text{and} \quad \frac{dy}{dx} \approx \frac{y_{i+1} - y_{i-1}}{2\,\Delta x}$$
gives you
$$\frac{u_{i+1} - 2u_i + u_{i-1}}{(\Delta r)^2} + \frac{1}{r_i}\,\frac{u_{i+1} - u_{i-1}}{2\,\Delta r} - \frac{u_i}{r_i^2} = 0$$
$$\left(-\frac{1}{2 r_i\,\Delta r} + \frac{1}{(\Delta r)^2}\right) u_{i-1} + \left(-\frac{2}{(\Delta r)^2} - \frac{1}{r_i^2}\right) u_i + \left(\frac{1}{(\Delta r)^2} + \frac{1}{2 r_i\,\Delta r}\right) u_{i+1} = 0$$

Solution Cont
Step 1: At node $i = 0$, $r_0 = a = 5$
$$u_0 = 0.0038731$$
Step 2: At node $i = 1$, $r_1 = r_0 + \Delta r = 5 + 0.6 = 5.6$"
$$\left(\frac{1}{0.6^2} - \frac{1}{2(5.6)(0.6)}\right) u_0 + \left(-\frac{2}{0.6^2} - \frac{1}{5.6^2}\right) u_1 + \left(\frac{1}{0.6^2} + \frac{1}{2(5.6)(0.6)}\right) u_2 = 0$$
$$2.6290\,u_0 - 5.5874\,u_1 + 2.9266\,u_2 = 0$$
Step 3: At node $i = 2$, $r_2 = r_1 + \Delta r = 5.6 + 0.6 = 6.2$
$$\left(\frac{1}{0.6^2} - \frac{1}{2(6.2)(0.6)}\right) u_1 + \left(-\frac{2}{0.6^2} - \frac{1}{6.2^2}\right) u_2 + \left(\frac{1}{0.6^2} + \frac{1}{2(6.2)(0.6)}\right) u_3 = 0$$
$$2.6434\,u_1 - 5.5816\,u_2 + 2.9122\,u_3 = 0$$

Solution Cont
Step 4: At node $i = 3$, $r_3 = r_2 + \Delta r = 6.2 + 0.6 = 6.8$
$$\left(\frac{1}{0.6^2} - \frac{1}{2(6.8)(0.6)}\right) u_2 + \left(-\frac{2}{0.6^2} - \frac{1}{6.8^2}\right) u_3 + \left(\frac{1}{0.6^2} + \frac{1}{2(6.8)(0.6)}\right) u_4 = 0$$
$$2.6552\,u_2 - 5.5772\,u_3 + 2.9003\,u_4 = 0$$
Step 5: At node $i = 4$, $r_4 = r_3 + \Delta r = 6.8 + 0.6 = 7.4$
$$\left(\frac{1}{0.6^2} - \frac{1}{2(7.4)(0.6)}\right) u_3 + \left(-\frac{2}{0.6^2} - \frac{1}{7.4^2}\right) u_4 + \left(\frac{1}{0.6^2} + \frac{1}{2(7.4)(0.6)}\right) u_5 = 0$$
$$2.6652\,u_3 - 5.5738\,u_4 + 2.8904\,u_5 = 0$$
Step 6: At node $i = 5$, $r_5 = r_4 + \Delta r = 7.4 + 0.6 = 8$
$$u_5 = u|_{r=b} = 0.0030769$$

Solving system of equations
$$\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 2.6290 & -5.5874 & 2.9266 & 0 & 0 & 0 \\ 0 & 2.6434 & -5.5816 & 2.9122 & 0 & 0 \\ 0 & 0 & 2.6552 & -5.5772 & 2.9003 & 0 \\ 0 & 0 & 0 & 2.6652 & -5.5738 & 2.8904 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u_0 \\ u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix} = \begin{bmatrix} 0.0038731 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0.0030769 \end{bmatrix}$$
$$u_0 = 0.0038731, \quad u_1 = 0.0036115, \quad u_2 = 0.0034159$$
$$u_3 = 0.0032689, \quad u_4 = 0.0031586, \quad u_5 = 0.0030769$$
Graph and “stencil”
[Figure: graph and 3-point “stencil” of the tridiagonal system]
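The six-step assembly and solve above can be sketched in Python as follows. This is an illustrative sketch, not the slides' own code: the function name `solve_bvp` and its parameters are hypothetical, the first derivative is discretized with a central difference, and the tridiagonal system is solved with the Thomas algorithm (forward elimination followed by back substitution), which is one reasonable choice among several.

```python
def solve_bvp(a=5.0, b=8.0, ua=0.0038731, ub=0.0030769, n=5):
    """Finite-difference solve of u'' + u'/r - u/r^2 = 0 on [a, b]
    with u(a) = ua, u(b) = ub, using n equal intervals."""
    dr = (b - a) / n
    r = [a + i * dr for i in range(n + 1)]

    # One equation per interior node i = 1 .. n-1:
    # lower*u[i-1] + diag*u[i] + upper*u[i+1] = 0
    lower, diag, upper, rhs = [], [], [], []
    for i in range(1, n):
        lower.append(1.0 / dr**2 - 1.0 / (2.0 * r[i] * dr))
        diag.append(-2.0 / dr**2 - 1.0 / r[i]**2)
        upper.append(1.0 / dr**2 + 1.0 / (2.0 * r[i] * dr))
        rhs.append(0.0)

    # Move the known boundary values to the right-hand side.
    rhs[0] -= lower[0] * ua
    rhs[-1] -= upper[-1] * ub

    # Thomas algorithm: forward elimination ...
    m = n - 1
    for k in range(1, m):
        w = lower[k] / diag[k - 1]
        diag[k] -= w * upper[k - 1]
        rhs[k] -= w * rhs[k - 1]
    # ... then back substitution.
    u_int = [0.0] * m
    u_int[-1] = rhs[-1] / diag[-1]
    for k in range(m - 2, -1, -1):
        u_int[k] = (rhs[k] - upper[k] * u_int[k + 1]) / diag[k]

    return r, [ua] + u_int + [ub]

r, u = solve_bvp()
for ri, ui in zip(r, u):
    print(f"r = {ri:.1f}   u = {ui:.7f}")
```

Because each interior equation couples only three neighboring unknowns, the work is linear in the number of nodes, and the same sparsity is what the sparse matrix formats on the following slides exploit.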

Compressed Sparse Row (CSR) Format
SpMV: y = y + A*x; store and do arithmetic only on nonzero entries
[Figure: representation of A in CSR form, with the vectors x and y]
Matrix-vector multiply kernel: y(i) ← y(i) + A(i,j) × x(j)
    for each row i
        for k = ptr[i] to ptr[i + 1]-1 do
            y[i] = y[i] + val[k] * x[ind[k]]
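The kernel above translates almost line for line into Python. The sketch below assumes 0-based indexing and uses the array names `val`, `ind`, and `ptr` from the slide. The helper `dense_to_csr` and the 3 x 3 test matrix are additions made here for illustration only.

```python
def dense_to_csr(A):
    """Build CSR arrays from a dense list-of-lists matrix (illustrative helper)."""
    val, ind, ptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                val.append(a)   # nonzero value
                ind.append(j)   # its column index
        ptr.append(len(val))    # where the next row starts in val/ind
    return val, ind, ptr

def spmv_csr(val, ind, ptr, x, y):
    """y = y + A*x, touching only the stored nonzeros of A."""
    for i in range(len(ptr) - 1):
        for k in range(ptr[i], ptr[i + 1]):
            y[i] += val[k] * x[ind[k]]
    return y

A = [[4, 0, 1],
     [0, 3, 0],
     [2, 0, 5]]
val, ind, ptr = dense_to_csr(A)
y = spmv_csr(val, ind, ptr, x=[1, 2, 3], y=[0, 0, 0])
print(val, ind, ptr, y)
```

Note that `x` is accessed indirectly through `ind[k]`, so the memory access pattern on `x` depends on the sparsity structure of A.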

Parallel Sparse Matrix-vector multiplication
• y = A*x, where A is a sparse n x n matrix
• Questions
  • which processors store
    • y[i], x[i], and A[i,j]
  • which processors compute
    • y[i] = sum (for j from 1 to n) A[i,j] * x[j] = (row i of A) * x … a sparse dot product
• Partitioning
  • Partition index set {1,…,n} = N1 ∪ N2 ∪ … ∪ Np.
  • For all i in Nk, Processor k stores y[i], x[i], and row i of A
  • For all i in Nk, Processor k computes y[i] = (row i of A) * x
    • “owner computes” rule: Processor k computes the y[i]s it owns.
    • May require communication
[Figure: x, y, and the rows of A partitioned among processors P1, P2, P3, P4]
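The "owner computes" rule and the communication it may require can be illustrated with a small sequential simulation. This is a sketch under stated assumptions: indices are 0-based, the index set is split into contiguous blocks, and the names `owner_of`, `remote_x_needed`, and `owner_computes_spmv` are hypothetical. A real implementation would exchange the listed `x[j]` values with message passing rather than read them from shared memory.

```python
def owner_of(i, n, p):
    """Contiguous block partition of indices 0..n-1 over p processors."""
    block = -(-n // p)  # ceil(n / p)
    return i // block

def remote_x_needed(rows, n, p):
    """rows[i] maps column j -> A[i][j] for the nonzeros of row i.
    Returns, per processor, the set of x[j] entries owned by another processor."""
    needs = [set() for _ in range(p)]
    for i in range(n):
        k = owner_of(i, n, p)
        for j in rows[i]:
            if owner_of(j, n, p) != k:
                needs[k].add(j)
    return needs

def owner_computes_spmv(rows, x, n, p):
    """Each processor k computes only the y[i] it owns."""
    y = [0.0] * n
    for k in range(p):  # each pass stands in for one processor
        for i in range(n):
            if owner_of(i, n, p) == k:
                y[i] = sum(a * x[j] for j, a in rows[i].items())
    return y

# Tridiagonal test matrix with stencil (-1, 2, -1), n = 6, p = 3 processors.
n, p = 6, 3
rows = [{j: (2 if j == i else -1) for j in (i - 1, i, i + 1) if 0 <= j < n}
        for i in range(n)]
needs = remote_x_needed(rows, n, p)
y = owner_computes_spmv(rows, [1.0] * n, n, p)
print(needs, y)
```

For this tridiagonal example each processor needs only the one or two `x` entries adjacent to its block, which is why a partition that follows the matrix structure keeps communication small.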

Matrix-processor mapping vs graph partitioning
• Relationship between matrix and graph
[Figure: a 6 x 6 sparse matrix with unit nonzero entries and the corresponding graph on nodes 1 to 6]
• A “good” partition of the graph has
  • equal (weighted) number of nodes in each part (load and storage balance).
  • minimum number of edges crossing between parts (minimize communication).
• Reorder the rows/columns by putting all nodes in one partition together.
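The two criteria for a "good" partition can be measured directly from the nonzero pattern. The sketch below is illustrative only: the 6-node pattern is a hypothetical example (two triangles joined by one edge), not the matrix from the slide's figure, and the names `graph_of` and `partition_quality` are chosen here.

```python
def graph_of(pattern):
    """Undirected edges (i, j), i < j, one per off-diagonal nonzero of a symmetric pattern."""
    n = len(pattern)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if pattern[i][j]}

def partition_quality(pattern, part):
    """part[i] is the part that node i is assigned to.
    Returns (nodes per part, number of edges crossing between parts)."""
    sizes = {}
    for p in part:
        sizes[p] = sizes.get(p, 0) + 1
    cut = sum(1 for i, j in graph_of(pattern) if part[i] != part[j])
    return sizes, cut

# Hypothetical pattern: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
pattern = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
sizes_good, cut_good = partition_quality(pattern, [0, 0, 0, 1, 1, 1])
sizes_bad, cut_bad = partition_quality(pattern, [0, 1, 0, 1, 0, 1])
print(sizes_good, cut_good, sizes_bad, cut_bad)
```

Both partitions are equally balanced (three nodes each), but the first cuts far fewer edges, so it would require less communication in a parallel matrix-vector multiply.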

Matrix Reordering via Graph Partitioning
• “Ideal” matrix structure for parallelism: block diagonal
  • p (number of processors) blocks, can all be computed locally.
  • If no non-zeros outside these blocks, no communication needed
• Can we reorder the rows/columns to get close to this?
  • Most nonzeros in diagonal blocks, few outside
[Figure: y = A * x with y, A, and x partitioned among processors P0, P1, P2, P3, P4]
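Reordering by partition amounts to a symmetric permutation that lists all rows and columns of one part before those of the next. The sketch below is a hedged illustration: the small graph is hypothetical, the partition is given by hand rather than computed by a partitioner, and `reorder_by_partition` and `off_block_nonzeros` are names introduced here.

```python
def reorder_by_partition(pattern, part):
    """Permute rows and columns so nodes of part 0 come first, then part 1, and so on."""
    n = len(pattern)
    perm = sorted(range(n), key=lambda i: (part[i], i))
    reordered = [[pattern[i][j] for j in perm] for i in perm]
    return perm, reordered

def off_block_nonzeros(pattern, part):
    """Nonzeros A[i][j] whose row and column belong to different parts."""
    n = len(pattern)
    return sum(1 for i in range(n) for j in range(n)
               if pattern[i][j] and part[i] != part[j])

# Hypothetical graph: chains 0-2-4 and 1-3-5, joined by the single edge 4-1.
n = 6
edges = [(0, 2), (2, 4), (1, 3), (3, 5), (4, 1)]
pattern = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
for i, j in edges:
    pattern[i][j] = pattern[j][i] = 1

part = [0, 1, 0, 1, 0, 1]  # nodes 0, 2, 4 on processor 0; nodes 1, 3, 5 on processor 1
perm, reordered = reorder_by_partition(pattern, part)
for row in reordered:
    print(row)
print("off-block nonzeros:", off_block_nonzeros(pattern, part))
```

In the original ordering the nonzeros look scattered, while after the permutation nearly all of them fall inside the two 3 x 3 diagonal blocks. The number of off-block nonzeros is unchanged by the permutation; it depends only on the partition.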
