CSL 860: Modern Parallel Computation
Course Information
- www.cse.iitd.ac.in/~subodh/courses/CSL860
- Grading:
– Quizzes: 25
– Lab Exercises: 15 (7 + 8)
– Project: 35 (25% design, 25% presentations, 50% demo)
– Final Exam: 25
Verbal discussion of assignments is fine, but looking at someone else's work and then doing your own is not. Letting your work become available or visible to others is also cheating. For your project, you may borrow code available online, but you must clearly identify such code with due reference to the source. The first instance of cheating will invite a zero in the assignment and a letter-grade penalty. Repeat offenders will fail the course.
Course Material
- Documents posted on the course website
- Reference books:
– Introduction to Parallel Computing by Grama, Gupta, Karypis & Kumar
– An Introduction to Parallel Algorithms by Jaja
– Parallel Programming in C with MPI and OpenMP by Quinn
What is this course about?
- Learn to solve problems in parallel
– Concurrency issues
– Performance/load-balance issues
– Scalability issues
- Technical knowledge
– Theoretical models of computation
– Processor architecture features and constraints
– Programming APIs, tools, and techniques
– Standard algorithms and data structures
- Different system architecture
– Shared memory, Communication network
- Hands-on
– Lots of programming
– Multi-core, massively parallel
– OpenMP, MPI, CUDA
Programming in the ‘Parallel’
- Understand target model (Semantics)
– Implications/Restrictions of constructs/features
- Design for the target model
– Choice of granularity, synchronization primitives
– Usually more of a performance issue
- Think concurrent
– For each thread, other threads are ‘adversaries’
- At least with regard to timing
– Process launch, Communication, Synchronization
- Clearly define pre and post conditions
- Employ high-level constructs when possible
– Debugging is extra-hard
Serial vs parallel
- ATM Withdrawal
Withdraw(int accountnum, int amount) {
    curbalance = balance(accountnum);
    if (curbalance >= amount) {
        setbalance(accountnum, curbalance - amount);
        eject(amount);
    } else …
}
Some Complex Problems
- N-body simulation
– 1 million bodies: days per iteration
- Atmospheric simulation
– 1 km 3D grid, each point interacts with neighbors
– Days of simulation time
- Movie making
– A few minutes = 30 days of rendering time
- Oil exploration
– months of sequential processing of seismic data
- Financial processing
– market prediction, investing
- Computational biology
– drug design
– gene sequencing (Celera)
Why Parallel
- Can’t clock faster
- Do more per clock
– Execute complex “special-purpose” instructions
– Execute more simple instructions
Measuring Performance
- How fast does a job complete
– Elapsed time (latency)
– Compute + communicate + synchronize
- How many jobs complete in a given time
– Throughput
– Are they independent jobs?
Learning Parallel Programming
- Let compiler extract parallelism?
– Some predictive-issue has succeeded
– In general, not successful so far
– Too context-sensitive
– Many efficient serial data structures and algorithms are parallel-inefficient
– Even if the compiler extracted parallelism from serial code, it would not be what you want
- Programmer must conceptualize and code parallelism
- Understand parallel algorithms and data structures
Parallel Task Decomposition
- Data Parallel
– Perform f(x) for many x
- Task Parallel
– Perform many functions fi
- Pipeline
Fundamental Issues
- Is the problem amenable to parallelization?
– Are there (serial) dependencies?
- What machine architectures are available?
– Can they be re-configured?
– Communication network
- Algorithm
– How to decompose the problem into tasks
– How to map tasks to processors
Parallel Architectures: Flynn’s Taxonomy
- Classified by instruction streams × data streams:
– SISD: 1 instruction stream, 1 data stream
– SIMD: 1 instruction stream, many data streams
– MISD: many instruction streams, 1 data stream
– MIMD: many instruction streams, many data streams
Parallel Architectures: Components
- Processors
- Memory
– shared
– distributed
- Communication
– Hierarchical, crossbar, bus, memory
– Synchronization
- Control
– centralized
– distributed
Formal Performance Metrics
- Speedup: S(p) = T1 / Tp
– T1 = execution time using a 1-processor system
– Tp = execution time using p processors
- Efficiency: Ep = Sp / p
- Cost: Cp = p × Tp
– Optimal (cost-optimal) if Cp = T1
- Look out for slowdown:
– T1 = n^3, Tp = n^2.5 for p = n^2 ⇒ Cp = n^4.5 ≫ T1
Amdahl’s Law
- f = fraction of the problem that is sequential
- (1 − f) = fraction that is parallel
- Parallel time (normalizing T1 = 1):
– Tp = f + (1 − f)/p
- Speedup with p processors:
– Sp = 1 / (f + (1 − f)/p)
Amdahl’s Law
- Only the fraction (1 − f) is shared by p processors
– Increasing p cannot speed up the fraction f
- Upper bound on speedup at p = ∞:
– S∞ = 1/f, since the term (1 − f)/p converges to 0
– Example: f = 2%, S∞ = 1 / 0.02 = 50