
Sanjay Rajopadhye, Colorado State University: Class objectives, …



  1. Sanjay Rajopadhye, Colorado State University

  2. n Class objectives, goals, introduction n CUDA performance tuning (wrap up) n Equational Programming (intro) 2

  3. n Parallel Programming is hard n “End of the free lunch” [Sut05] n Arrival of “manycores” signals the end of “La-Z-Boy Programming” [Pat06] Becoming a parallel programming expert will get you a good job But your skills may become obsolete – new machines, new languages, … Parallelism must return to La-Z-Boy programming [Sut05] Herb Sutter. “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency,” in Software. Dr. Dobb's Journal, vol. 30, no. 3, 2005. [Pat06] David Patterson, in keynote talk at the International Workshop on Languages and Compilers For Parallel Computers LCPC 2006, New Orleans, LA. 3

  4. n Short term Become macho GPU programmer: write “heroically tuned” codes. n Medium term Do it systematically: tuning for GTX 280 vs tuning for GTX 465: learn principles, not skills n Long term Do it automatically: Learn the foundations of automatic compilation. Focus on a “regular subset” of programs n Polyhedral Equational Model 4

  5. n Big picture n Polyhedral Equations as programs: I’m loath to write C, despite the slogan “C no evil” n Equations vs (conventional) loop programs n Equations-to-code (compiling equations) n Schedule n (processor) allocation n (memory) allocation n But what about parallelism? 5

  6. 10 assignments (basic + advanced) + term project
     - CUDA performance tuning (2)
     - Equational programming: Alpha/AlphaZ (1)
     - Mathematical foundations: polyhedra, affine functions, and operations (2)
     - Alpha analysis/transformation (1)
     - Analysis: scheduling & allocation (2)
     - Code generation/tiling (2)

  7. n Assignments (30%) n Midterm (take home) (30%) n Final project (30% = 2+3+5+15+5) n Proposal n Advancement report n Final report n Quality of work n Final poster n Participation/Discussion/Quizzes (10%) 7

  8. n What are polyhedra? n Why are they useful/important n What is the polyhedral model? 8

  9. n What is a model? n A mathematical/computational/mechanical/ … abstraction of some other (physical) entity n Objects in the model must “emulate” the “natural operations” of the modeled entities – semantics 9

  10. From Feautrier’s keynote at LCPC 2009.
      [Figure: diagram of the history of the polytope model, relating dependences, scheduling, placement, code generation, tiling, array shrinking, systolic array design, automatic parallelization, locality, and HLS, with citations spanning 1966 to 2005.]

  11. n Physical entity: programs/computations n The Polyhedral Model is a “very high level” intermediate representation (IR) of “regular computations” n Polyhedral equational model: real=abstract n Amenable to: n Mathematical static analysis n Transformation within model: closure n Transformation outside model: (tiled) code generation 11

  12. n Class objectives, goals, introduction n CUDA performance tuning (wrap up) n Equational Programming (intro)? 12

  13. n Many resources on the web (NVIDIA webinars) n Coalescing (HW1a) n Challenge question: Achieve maximum bandwidth, with fewest threads-per-block n For a “strided-by-block” access pattern. n Arithmetic peak: warps and “virtualization” n Bank conflicts in shared memory 13

  14. n MAXPYrep: n Repeatedly execute Y=A*X+Y n Where A, X and Y are matrices n Matrices are small enough to fit in shared memory (ignore global memory access coalescing) n Goal: achieve machine peak n Port all previous performance to GTX 480 n And beyond … n Teach me 14

  15. n Oxford CUDA conf (CUDA webinar online) n “Identifying Performance Limiters,” Micikevicius NVIDIA/UCF (CUDA webinar) n “Roofline for Fast Math” Sam Williams, LBL 15

  16. n Wiki page for Pascal’s Triangle http://en.wikipedia.org/wiki/Pascal's_triangle � n … and also a non-standard way to compute Fibonacci numbers 16
