

  1. CS240A: Parallelism in CSE Applications. Tao Yang. Slides revised from James Demmel and Kathy Yelick, www.cs.berkeley.edu/~demmel/cs267_Spr11

  2. Category of CSE Simulation Applications (ordered from discrete to continuous)
     • Discrete event systems: time and space are discrete
     • Particle systems: an important special case of lumped systems
     • Ordinary Differential Equations (ODEs): locations/entities are discrete, time is continuous
     • Partial Differential Equations (PDEs): time and space are continuous
     CS267 Lecture 4

  3. Basic Kinds of CSE Simulation
     • Discrete event systems:
       • "Game of Life," manufacturing systems, finance, circuits, Pacman
     • Particle systems:
       • Billiard balls, galaxies, atoms, circuits, pinball ...
     • Ordinary Differential Equations (ODEs):
       • Lumped variables depending on continuous parameters
       • The system is "lumped" because we are not computing the voltage/current at every point in space along a wire, just at the endpoints
       • Structural mechanics, chemical kinetics, circuits, Star Wars: The Force Unleashed
     • Partial Differential Equations (PDEs):
       • Continuous variables depending on continuous parameters
       • Heat, elasticity, electrostatics, finance, circuits, medical image analysis, Terminator 3: Rise of the Machines
     • For more on simulation in games, see www.cs.berkeley.edu/b-cam/Papers/Parker-2009-RTD

  4. Table of Contents
     • ODE
     • PDE
     • Discrete Events and Particle Systems

  5. Finite-Difference Method for ODE/PDE
     • Discretize the domain of the function
     • For each point in the discretized domain, name its value with a variable and set up an equation
     • The unknown values at those points form a system of equations; then solve that system

  6. Euler's Method for ODE Initial-Value Problems
     y' = dy/dx = f(x, y),   y(x_0) = y_0
     Straight-line approximation: starting from y_0, step through x_0, x_1 = x_0 + h, x_2 = x_0 + 2h, x_3 = x_0 + 3h, ...

  7. Euler Method
     Approximate:  y'(x_0) ≈ (y(x_0 + h) - y(x_0)) / h
     Then:  y_{n+1} = y_n + h y'_n + O(h^2) = y_n + h f(x_n, y_n) + O(h^2)
     Thus, starting from an initial value y_0, step with
     y_{n+1} = y_n + h f(x_n, y_n),   with O(h^2) error per step

  8. Example
     dy/dx = x + y,   y(0) = 1
     y_{n+1} = y_n + h f(x_n, y_n) = y_n + h (x_n + y_n),   with h = 0.02

     x_n    y_n      y'_n     h*y'_n   Exact     Error
     0.00   1.00000  1.00000  0.02000  1.00000    0.00000
     0.02   1.02000  1.04000  0.02080  1.02040   -0.00040
     0.04   1.04080  1.08080  0.02162  1.04162   -0.00082
     0.06   1.06242  1.12242  0.02245  1.06367   -0.00126
     0.08   1.08486  1.16486  0.02330  1.08657   -0.00171
     0.10   1.10816  1.20816  0.02416  1.11034   -0.00218
     0.12   1.13232  1.25232  0.02505  1.13499   -0.00267
     0.14   1.15737  1.29737  0.02595  1.16055   -0.00318
     0.16   1.18332  1.34332  0.02687  1.18702   -0.00370
     0.18   1.21019  1.39019  0.02780  1.21443   -0.00425
     0.20   1.23799  1.43799  0.02876  1.24281   -0.00482
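     The table on slide 8 can be reproduced with a few lines of Python. This is a minimal sketch: the function name `euler` is mine, and the closed-form solution y = 2e^x - x - 1 (used for the Exact column) comes from solving this linear ODE, not from the slides.

```python
import math

def euler(f, x0, y0, h, steps):
    """Advance y' = f(x, y) from (x0, y0) with y_{n+1} = y_n + h*f(x_n, y_n)."""
    xs, ys = [x0], [y0]
    for _ in range(steps):
        ys.append(ys[-1] + h * f(xs[-1], ys[-1]))
        xs.append(xs[-1] + h)
    return xs, ys

f = lambda x, y: x + y                  # dy/dx = x + y, y(0) = 1
xs, ys = euler(f, 0.0, 1.0, 0.02, 10)
for x, y in zip(xs, ys):
    exact = 2 * math.exp(x) - x - 1     # closed-form solution of this linear ODE
    print(f"{x:4.2f}  {y:.5f}  {exact:.5f}  {y - exact:+.5f}")
```

     Each printed row matches a row of the table: the Euler value lags the exact solution by an amount that grows with x, as expected from the per-step O(h^2) error.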

  9. ODE with Boundary Values
     d^2u/dr^2 + (1/r) du/dr - u/r^2 = 0,   5 ≤ r ≤ 8
     u(5) = 0.0038731",   u(8) = 0.0030769"
     http://numericalmethods.eng.usf.edu

  10. Solution
      Using the central-difference approximations
      dy/dx ≈ (y_{i+1} - y_{i-1}) / (2 Δx)   and   d^2y/dx^2 ≈ (y_{i+1} - 2 y_i + y_{i-1}) / (Δx)^2
      gives
      (u_{i+1} - 2 u_i + u_{i-1}) / (Δr)^2 + (1/r_i) (u_{i+1} - u_{i-1}) / (2 Δr) - u_i / r_i^2 = 0
      [1/(Δr)^2 - 1/(2 r_i Δr)] u_{i-1} + [-2/(Δr)^2 - 1/r_i^2] u_i + [1/(Δr)^2 + 1/(2 r_i Δr)] u_{i+1} = 0

  11. Solution (cont.)
      Step 1: At node i = 0, r_0 = a = 5, so u_0 = 0.0038731"
      Step 2: At node i = 1, r_1 = r_0 + Δr = 5 + 0.6 = 5.6"
              2.6290 u_0 - 5.5874 u_1 + 2.9266 u_2 = 0
      Step 3: At node i = 2, r_2 = r_1 + Δr = 5.6 + 0.6 = 6.2"
              2.6434 u_1 - 5.5816 u_2 + 2.9122 u_3 = 0

  12. Solution (cont.)
      Step 4: At node i = 3, r_3 = r_2 + Δr = 6.2 + 0.6 = 6.8"
              2.6552 u_2 - 5.5772 u_3 + 2.9003 u_4 = 0
      Step 5: At node i = 4, r_4 = r_3 + Δr = 6.8 + 0.6 = 7.4"
              2.6651 u_3 - 5.6062 u_4 + 2.8903 u_5 = 0
      Step 6: At node i = 5, r_5 = r_4 + Δr = 7.4 + 0.6 = 8, so u_5 = u(8) = 0.0030769"

  13. Solving the System of Equations
      [ 1        0        0        0        0        0      ] [u_0]   [ 0.0038731 ]
      [ 2.6290  -5.5874   2.9266   0        0        0      ] [u_1]   [ 0         ]
      [ 0        2.6434  -5.5816   2.9122   0        0      ] [u_2] = [ 0         ]
      [ 0        0        2.6552  -5.5772   2.9003   0      ] [u_3]   [ 0         ]
      [ 0        0        0        2.6651  -5.6062   2.8903 ] [u_4]   [ 0         ]
      [ 0        0        0        0        0        1      ] [u_5]   [ 0.0030769 ]

      Solution:
      u_0 = 0.0038731   u_3 = 0.0032689
      u_1 = 0.0036115   u_4 = 0.0031586
      u_2 = 0.0034159   u_5 = 0.0030769
      [figure: graph of the solution and the three-point "stencil"]
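      The system on slide 13 is tridiagonal, so it can be solved in O(n) with the Thomas algorithm instead of general Gaussian elimination. A minimal Python sketch, rebuilding the coefficients directly from the finite-difference formula on slide 10 (the helper name `thomas_solve` is mine; because the coefficients are recomputed from the formula, the last digits can differ slightly from the slide's rounded table):

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):            # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):   # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# d2u/dr2 + (1/r) du/dr - u/r^2 = 0 on [5, 8], u(5) = 0.0038731, u(8) = 0.0030769
dr, n = 0.6, 6                             # 6 nodes: r = 5, 5.6, ..., 8
r = [5 + i * dr for i in range(n)]
a = [0.0] * n; b = [0.0] * n; c = [0.0] * n; d = [0.0] * n
b[0], d[0] = 1.0, 0.0038731                # boundary condition at r = 5
b[-1], d[-1] = 1.0, 0.0030769              # boundary condition at r = 8
for i in range(1, n - 1):                  # interior nodes from the FD formula
    a[i] = 1 / dr**2 - 1 / (2 * r[i] * dr)
    b[i] = -2 / dr**2 - 1 / r[i]**2
    c[i] = 1 / dr**2 + 1 / (2 * r[i] * dr)
u = thomas_solve(a, b, c, d)
print([round(v, 7) for v in u])
```

      The computed u values decrease monotonically from u(5) to u(8), matching the slide's solution to about four significant digits.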

  14. Compressed Sparse Row (CSR) Format
      SpMV: y = y + A*x; store, and do arithmetic on, only the nonzero entries
      Representation of A: val[] (nonzero values), ind[] (their column indices), ptr[] (start of each row)
      Matrix-vector multiply kernel: y(i) ← y(i) + A(i,j) × x(j)
      for each row i
          for k = ptr[i] to ptr[i+1] - 1 do
              y[i] = y[i] + val[k] * x[ind[k]]
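      The kernel above can be run directly in Python. The 4 x 4 matrix here is a made-up example, not from the slides; the val/ind/ptr names follow the kernel's.

```python
# CSR representation of a hypothetical 4 x 4 sparse matrix A:
# A = [[2, 0, 0, 1],
#      [0, 3, 0, 0],
#      [0, 4, 5, 0],
#      [1, 0, 0, 6]]
val = [2.0, 1.0, 3.0, 4.0, 5.0, 1.0, 6.0]   # nonzero values, row by row
ind = [0, 3, 1, 1, 2, 0, 3]                 # column index of each stored value
ptr = [0, 2, 3, 5, 7]                       # ptr[i]..ptr[i+1]-1 spans row i

def spmv_csr(ptr, ind, val, x, y):
    """y = y + A*x for A in CSR form: touch only the stored nonzeros."""
    for i in range(len(ptr) - 1):
        for k in range(ptr[i], ptr[i + 1]):
            y[i] += val[k] * x[ind[k]]
    return y

x = [1.0, 2.0, 3.0, 4.0]
y = spmv_csr(ptr, ind, val, x, [0.0] * 4)
print(y)   # → [6.0, 6.0, 23.0, 25.0]
```

      Only 7 multiply-adds are performed instead of the 16 a dense multiply would need, which is the whole point of the format.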

  15. Parallel Sparse Matrix-Vector Multiplication
      • y = A*x, where A is a sparse n x n matrix
      • Questions:
        • which processors store y[i], x[i], and A[i,j]
        • which processors compute y[i] = sum (from 1 to n) of A[i,j] * x[j] = (row i of A) * x ... a sparse dot product
      • Partitioning:
        • Partition the index set {1,...,n} = N1 ∪ N2 ∪ ... ∪ Np
        • For all i in Nk, processor k stores y[i], x[i], and row i of A
        • For all i in Nk, processor k computes y[i] = (row i of A) * x
        • "Owner computes" rule: processor k computes the y[i]s it owns
      • May require communication (for x[j] values owned by other processors)
      [figure: rows of A, x, and y blocked across processors P1-P4]
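      The owner-computes partitioning can be simulated on one machine. A small sketch with p = 2 "processors" over the same kind of CSR data as the previous slide; the matrix and all names here are illustrative assumptions, not from the slides.

```python
val = [2.0, 1.0, 3.0, 4.0, 5.0, 1.0, 6.0]
ind = [0, 3, 1, 1, 2, 0, 3]
ptr = [0, 2, 3, 5, 7]
x = [1.0, 2.0, 3.0, 4.0]

n, p = 4, 2
# N1..Np as contiguous blocks of rows
parts = [list(range(k * n // p, (k + 1) * n // p)) for k in range(p)]
y = [0.0] * n
for k, rows in enumerate(parts):
    # x[j] values referenced by "processor" k but owned by another processor
    # would have to be communicated before the local compute
    needed = {ind[t] for i in rows for t in range(ptr[i], ptr[i + 1])} - set(rows)
    print(f"processor {k}: owns rows {rows}, must receive x{sorted(needed)}")
    for i in rows:                       # "owner computes" rule
        y[i] = sum(val[t] * x[ind[t]] for t in range(ptr[i], ptr[i + 1]))
print(y)   # → [6.0, 6.0, 23.0, 25.0]
```

      The `needed` sets make the communication requirement concrete: every off-block column index forces a remote x entry to be fetched.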

  16. Matrix-Processor Mapping vs. Graph Partitioning
      • Relationship between matrix and graph: each row/column is a node; each off-diagonal nonzero A[i,j] is an edge (i,j)
      [figure: 6 x 6 symmetric sparse matrix and its corresponding 6-node graph]
      • A "good" partition of the graph has
        • an equal (weighted) number of nodes in each part (load and storage balance)
        • a minimum number of edges crossing between parts (minimize communication)
      • Reorder the rows/columns by putting all nodes of one partition together
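      Both partition-quality criteria can be checked mechanically. A sketch under assumed inputs: the 6-node graph, the two candidate partitions, and the helper `evaluate` are made-up examples, not the slide's figure.

```python
# one node per matrix row, one edge per off-diagonal nonzero (made-up graph)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

def evaluate(part, p, edges):
    """Score a partition: per-part node counts (balance) and cut edges (communication)."""
    sizes = [sum(1 for v in part if v == k) for k in range(p)]
    cut = sum(1 for u, v in edges if part[u] != part[v])
    return sizes, cut

good = [0, 0, 0, 1, 1, 1]   # balanced; only edge (2, 3) crosses
bad = [0, 1, 0, 1, 0, 1]    # balanced, but most edges cross
print(evaluate(good, 2, edges))   # → ([3, 3], 1)
print(evaluate(bad, 2, edges))    # → ([3, 3], 5)
```

      Both partitions balance the load perfectly; they differ only in edge cut, which is exactly the communication volume the slide says to minimize.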

  17. Matrix Reordering via Graph Partitioning
      • "Ideal" matrix structure for parallelism: block diagonal
        • p (number of processors) blocks, which can all be computed locally
        • If there are no nonzeros outside these blocks, no communication is needed
      • Can we reorder the rows/columns to get close to this?
        • Most nonzeros in the diagonal blocks, few outside
      [figure: y = A * x with A nearly block diagonal, blocks assigned to processors P0-P4]
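      A toy illustration of such a reordering; the 6 x 6 sparsity pattern, the permutation, and the `off_block` helper are assumptions for this sketch, not the slide's matrix.

```python
# hypothetical pattern: two tightly coupled groups {0, 2, 4} and {1, 3, 5},
# plus a single coupling between nodes 4 and 5
A = [[1, 0, 1, 0, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [1, 0, 1, 0, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [1, 0, 1, 0, 1, 1],
     [0, 1, 0, 1, 1, 1]]

perm = [0, 2, 4, 1, 3, 5]   # put each partition's nodes together
B = [[A[pi][pj] for pj in perm] for pi in perm]

def off_block(M, bs):
    """Count nonzeros outside the bs-by-bs diagonal blocks (candidate communication)."""
    n = len(M)
    return sum(M[i][j] for i in range(n) for j in range(n) if i // bs != j // bs)

print(off_block(A, 3), off_block(B, 3))   # → 8 2
```

      After permuting by partition, only the single cross-partition coupling (and its symmetric twin) remains outside the diagonal blocks, so almost all of the SpMV work becomes local.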
