
DF21500 Multicore Computing
Lecture: Foundations of Parallel Algorithms
C. Kessler, IDA, Linköpings Universitet, 2011

Foundations of parallel algorithms
  PRAM model
  Time, work, cost
  Self-simulation and Brent's Theorem
  Speedup and Amdahl's Law
  NC
  Scalability and Gustafsson's Law
  Fundamental PRAM algorithms: reduction, parallel prefix, list ranking
  PRAM variants, simulation results and separation theorems
  Survey of other models of parallel computation: Asynchronous PRAM, Delay model, BSP, LogP, LogGP

Literature

[PPP] Keller, Kessler, Träff: Practical PRAM Programming. Wiley Interscience, New York, 2000. Chapter 2.
[JaJa] JaJa: An Introduction to Parallel Algorithms. Addison-Wesley, 1992.
[CLR] Cormen, Leiserson, Rivest: Introduction to Algorithms, Chapter 30. MIT Press, 1989.
[JA] Jordan, Alaghband: Fundamentals of Parallel Processing. Prentice Hall, 2003.

Survey article (see course homepage):
C. Kessler, J. Keller: Models for Parallel Computing: Review and Perspectives.
PARS-Mitteilungen 24, Gesellschaft für Informatik, Dec. 2007, ISSN 0177-0454

Parallel computation models (1)

A model of parallel computation
+ abstracts from hardware and technology
+ specifies the basic operations, where applicable
+ specifies how data can be stored

→ analyze algorithms before implementation, independent of a particular parallel computer
→ focus on the most characteristic features (w.r.t. influence on time/space complexity) of a broader class of parallel machines

Programming model: shared memory vs. message passing; degree of synchronous execution
Cost model: key parameters, cost functions for basic operations, constraints

Parallel computation models (2)

A cost model should
+ explain available observations
+ predict future behaviour
+ abstract from unimportant details   → generalization

Simplifications to reduce model complexity:
  use an idealized machine model
  ignore hardware details: memory hierarchies, network topology, ...
  use asymptotic analysis: drop insignificant effects
  use empirical studies: calibrate parameters, evaluate the model

Flashback to DALG, Lecture 1: The RAM model

The RAM (Random Access Machine) [PPP 2.1] is the programming and cost model for the analysis of sequential algorithms.

(Figure: a CPU with ALU, registers and program counter, connected to a program memory and a data memory M[0], M[1], M[2], ... via load/store operations, driven by a clock.)

The RAM model (2)

Algorithm analysis: counting instructions.
Example: computing the global sum of N elements

   s = d(0)
   do i = 1, N-1
      s = s + d(i)
   end do

   t = t_load + t_store + Σ_{i=2}^{N} (2·t_load + t_add + t_store + t_branch) = 5N − 3 ∈ Θ(N)

(Figure: the summation over d[0..7] drawn as a linear chain of additions, or as a balanced binary tree of additions.)

→ arithmetic circuit model, directed acyclic graph (DAG) model

PRAM model [PPP 2.2]

Parallel Random Access Machine [Fortune/Wyllie'78]
  p processors, MIMD, common clock signal, arithmetic/jump: 1 clock cycle
  shared memory, uniform memory access time, latency: 1 clock cycle (!),
  concurrent memory accesses, sequential consistency
  private memory (optional), processor-local access only

(Figure: processors P0..P_{p−1} driven by a common clock, all connected to a shared memory with cells M0, M1, M2, M3, ...)

PRAM model: Variants for memory access conflict resolution

Exclusive Read, Exclusive Write (EREW) PRAM: concurrent access only to different locations in the same cycle
Concurrent Read, Exclusive Write (CREW) PRAM: simultaneous reading from, or single writing to, the same location is possible
Concurrent Read, Concurrent Write (CRCW) PRAM: simultaneous reading from or writing to the same location is possible:
  Weak CRCW, Common CRCW, Arbitrary CRCW, Priority CRCW, Combining CRCW (global sum, max, etc.)
No need for ERCW ...

(Figure: at time t, several processors simultaneously issue writes *a=0, *a=1, *a=2 to the same shared cell a; the CRCW variant determines the resulting value.)

Global sum computation on EREW and Combining-CRCW PRAM (1)

Given n numbers x_0, x_1, ..., x_{n−1} stored in an array, the global sum Σ_{i=0}^{n−1} x_i
can be computed in ⌈log2 n⌉ time steps on an EREW PRAM with n processors.

Parallel algorithmic paradigm used: parallel divide-and-conquer.

(Figure: ParSum(n) splits into two recursive calls ParSum(n/2) on the two array halves, whose results are added; unfolded, this is a balanced binary tree of additions over d[0..7].)

Divide phase: trivial, time O(1)
Recursive calls: parallel time T(n/2), with base case: load operation, time O(1)
Combine phase: addition, time O(1)
→ T(n) = T(n/2) + O(1)
Use induction or the master theorem [CLR 4]  → T(n) ∈ O(log n)

Global sum computation on EREW and Combining-CRCW PRAM (2)

Recursive parallel sum program in the PRAM programming language Fork [PPP]:

sync int parsum( sh int *d, sh int n )
{
   sh int s1, s2;
   sh int nd2 = n / 2;
   if (n==1) return d[0];       // base case
   $=rerank();                  // re-rank processors within group
   if ($<nd2)                   // split processor group:
      s1 = parsum( d, nd2 );
   else
      s2 = parsum( &(d[nd2]), n-nd2 );
   return s1 + s2;
}

(Figure: Fork95 trace of the global sum on 8 processors P0..P7 over a traced period of 6 msecs; 434 shared loads, 344 shared stores and 78 mpadd operations in total, with roughly 14-15% of each processor's time spent spinning on the 7 barriers.)
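A rough, hedged comparison (not from the slides): the same parallel divide-and-conquer sum expressed with OpenMP tasks instead of Fork's synchronous group splitting. The function name parsum and the 8-element input are chosen only for illustration; without OpenMP the pragmas are ignored and the code runs sequentially.

   #include <stdio.h>

   static int parsum(const int *d, int n)
   {
       if (n == 1) return d[0];            /* base case: one element */
       int s1, s2, nd2 = n / 2;
       #pragma omp task shared(s1)         /* divide: one task per half */
       s1 = parsum(d, nd2);
       #pragma omp task shared(s2)
       s2 = parsum(d + nd2, n - nd2);
       #pragma omp taskwait                /* combine: wait, then add */
       return s1 + s2;
   }

   int main(void)
   {
       int d[8] = {1, 2, 3, 4, 5, 6, 7, 8};
       int s;
       #pragma omp parallel
       #pragma omp single                  /* one thread starts the recursion */
       s = parsum(d, 8);
       printf("sum = %d\n", s);            /* prints sum = 36 */
       return 0;
   }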

Global sum computation on EREW and Combining-CRCW PRAM (3)

Iterative parallel sum program in Fork:

int sum( sh int a[], sh int n )
{
   int d, dd;
   int ID = rerank();
   d = 1;
   while (d<n) {
      dd = d;
      d = d*2;
      if (ID%d==0)
         a[ID] = a[ID] + a[ID+dd];
   }
}

(Figure: the additions form a balanced binary tree over a(1)..a(8); in each round half of the remaining processors become idle.)

On a Combining CRCW PRAM with addition as the combining operation, the global sum problem can be solved in a constant number of time steps using n processors:

   syncadd( &s, a[ID] );   // procs ranked ID in 0...n-1
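For contrast, a rough sketch (not from the slides) of the same iterative log-step summation with OpenMP threads and barriers. It assumes OpenMP is enabled and that all n requested threads are actually created (one per array element, n a power of two); the result ends up in a[0].

   #include <omp.h>

   void tree_sum(int a[], int n)            /* result ends up in a[0] */
   {
       #pragma omp parallel num_threads(n)  /* one thread per element */
       {
           int ID = omp_get_thread_num();   /* plays the role of rerank() */
           for (int d = 1; d < n; d *= 2) {
               #pragma omp barrier          /* emulate the synchronous PRAM step */
               if (ID % (2*d) == 0)
                   a[ID] = a[ID] + a[ID + d];
           }
       }
   }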

PRAM model: CRCW is stronger than CREW

Example: computing the logical OR of p bits.

CREW: combine the bits pairwise in a balanced binary tree of OR operations
   → time O(log p)

CRCW (Common):
   sh int a = 0;
   if (mybit == 1)  a = 1;     // else do nothing
   → time O(1): all writing processors store the same value 1 to a.

(Figure: at time t, the processors whose bit is 1 simultaneously execute *a=1 on the shared cell a; the others do nothing.)

Useful e.g. for termination detection.
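A minimal sketch (not from the slides) of the constant-time Common-CRCW OR in OpenMP: every thread whose bit is 1 writes the same value to the shared flag, so the order of the concurrent writes does not matter.

   int parallel_or(const int bit[], int p)
   {
       int a = 0;                        /* shared flag, initialized to 0 */
       #pragma omp parallel for
       for (int i = 0; i < p; i++)
           if (bit[i] == 1) {
               #pragma omp atomic write  /* concurrent writers all store 1 */
               a = 1;
           }
       return a;
   }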

Analysis of parallel algorithms

(a) Asymptotic analysis
    → estimation based on the model and pseudocode operations
    → results for large problem sizes, large numbers of processors

(b) Empirical analysis
    → measurements based on an implementation
    → for fixed (small) problem and machine sizes

Asymptotic analysis: Work and Time

Parallel work w_A(n) of algorithm A on an input of size n
= maximum number of instructions performed by all processors during execution of A,
  where in each (parallel) time step as many processors are available as needed
  to execute the step in constant time.

Parallel time t_A(n) of algorithm A on an input of size n
= maximum number of parallel time steps required under the same circumstances.

Work and time are thus worst-case measures.
t_A(n) is sometimes called the depth of A (cf. circuit model, DAG model of (parallel) computation).

Let p_i(n) = number of processors needed in time step i, 0 ≤ i < t_A(n),
to execute the step in constant time. Then

   w_A(n) = Σ_{i=0}^{t_A(n)−1} p_i(n)

Asymptotic analysis: Work and time optimality, work efficiency

A is work-optimal if w_A(n) = O(t_S(n)),
where S is the optimal or currently best known sequential algorithm for the same problem.

A is work-efficient if w_A(n) = t_S(n) · O(log^k(t_S(n))) for some constant k ≥ 1.

A is time-optimal if any other parallel algorithm for this problem requires Ω(t_A(n)) time steps.

Asymptotic analysis: Cost, cost optimality

Algorithm A needs p_A(n) = max_{0≤i<t_A(n)} p_i(n) processors.

Cost c_A(n) of A on an input of size n = processor-time product:

   c_A(n) = p_A(n) · t_A(n)

A is cost-optimal if c_A(n) = O(t_S(n)),
with S the optimal or currently best known sequential algorithm for the same problem.

Work ≤ Cost:   w_A(n) = O(c_A(n))

A is cost-effective if w_A(n) = Θ(c_A(n)).
Asymptotic analysis for global sum computation

Problem size n, number of processors p, time t(p,n), work w(p,n), cost c(p,n) = t · p.

Example: sequential sum algorithm

   s = a(1)
   do i = 2, n
      s = s + a(i)
   end do

n − 1 additions, n loads, O(n) other operations.

(Figure: with p = 1 the additions form a linear chain over a(1)..a(8); with p = n they form a balanced binary tree, with more and more processors idle towards the root; cost c = t · p.)

Sequential sum algorithm (p = 1):
   t(1,n) = t_seq(n) = O(n),   w(1,n) = O(n),   c(1,n) = t(1,n) · 1 = O(n)

Parallel sum algorithm (p = n):
   t(n,n) = O(log n),   w(n,n) = O(n),   c(n,n) = O(n log n)
→ the parallel sum algorithm is not cost-effective!

Trading concurrency for cost-effectiveness

Making the parallel sum algorithm cost-optimal:
Instead of n processors, use only n / log2 n processors.

First, each processor sequentially computes the global sum of "its" log n local elements.
This takes time O(log n).
Then the processors compute the global sum of the n / log n partial sums
using the previous parallel sum algorithm, as sketched in the code below.

Time: O(log n) for local summation, O(log n) for global summation
Cost: (n / log n) · O(log n) = O(n), i.e. linear!

This is an example of a more general technique based on Brent's theorem.
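A rough OpenMP sketch of this two-phase scheme (not from the slides); for brevity the combine phase uses an OpenMP reduction clause instead of the explicit logarithmic-time tree, and the thread count p = n / log2 n is computed directly.

   #include <math.h>

   long block_sum(const int a[], long n)
   {
       int p = (n > 1) ? (int)(n / log2((double)n)) : 1;   /* p = n / log2 n threads */
       long total = 0;
       #pragma omp parallel for reduction(+:total) num_threads(p)
       for (int i = 0; i < p; i++) {
           long lo = (long)i * n / p;                      /* my block of ~log2 n elements */
           long hi = (long)(i + 1) * n / p;
           long local = 0;
           for (long j = lo; j < hi; j++)                  /* sequential local summation */
               local += a[j];
           total += local;                                 /* combine the n/log n partial sums */
       }
       return total;
   }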

Self-simulation and Brent's Theorem

Self-simulation (aka work-time scheduling in [JaJa'92]):
A model of parallel computation is self-simulating if a p-processor machine can simulate
one time step of a q-processor machine in O(⌈q/p⌉) time steps.

All PRAM variants are self-simulating.

Proof idea for an (EREW) PRAM with p ≤ q simulating processors (see the sketch below):
  divide the q simulated processors into p chunks of size ⌈q/p⌉
  assign a chunk to each of the p simulating processors
  map the memory of the simulated PRAM to the memory of the simulating PRAM
  step-by-step simulation, with O(q/p) steps per simulated step
  take care of pending memory accesses in the current simulated step
  extra space O(q/p) for the registers and status of the simulated machine
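A rough sketch (not from the slides) of the chunk-wise simulation: each of p threads plays the ⌈q/p⌉ virtual processors of its chunk in every simulated step. virtual_step() is a made-up placeholder for whatever one virtual processor does in one PRAM step; a faithful simulation would also buffer each step's reads before applying its writes.

   #include <omp.h>

   void simulate(int p, int q, int num_steps,
                 void (*virtual_step)(int vp, int step))
   {
       #pragma omp parallel num_threads(p)
       {
           int me = omp_get_thread_num();
           int chunk = (q + p - 1) / p;               /* ceil(q/p) virtual processors each */
           for (int step = 0; step < num_steps; step++) {
               int lo = me * chunk;
               int hi = (lo + chunk < q) ? lo + chunk : q;
               for (int vp = lo; vp < hi; vp++)       /* play my chunk: O(q/p) work per step */
                   virtual_step(vp, step);
               #pragma omp barrier                    /* all writes of this simulated step done */
           }
       }
   }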

Consequences of self-simulation

A RAM (= 1-processor PRAM) simulates a p-processor PRAM in O(p) time steps per simulated step.
→ A RAM simulates A with cost c_A(n) = p_A(n) · t_A(n) in O(c_A(n)) time.
  (Actually possible in O(w_A(n)) time.)

Even with arbitrarily many processors, A cannot be simulated any faster than t_A(n).
For cost-optimal A, c_A(n) = Θ(t_S(n))   → Exercise

A p-processor PRAM can simulate one step of A requiring p_A(n) processors
in O(p_A(n)/p) time steps.

Self-simulation emulates virtual processors with significant overhead.
In practice, other mechanisms for adapting the granularity are more suitable.
How to avoid simulating the inactive processors where c_A(n) = ω(w_A(n))?
Brent's Theorem

Brent's theorem [Brent'74]:
Any PRAM algorithm A which runs in t_A(n) time steps and performs w_A(n) work
can be implemented to run on a p-processor PRAM in

   O( t_A(n) + w_A(n) / p )

time steps.

Proof: see [PPP p.41]

Algorithm design issue: balance the terms for cost-effectiveness:
→ design A with p_A(n) processors such that w_A(n) / p_A(n) = O(t_A(n))

Note: the proof is non-constructive!
→ How to determine the active processors for each time step?
→ language constructs, dependence analysis, static/dynamic scheduling, ...

Absolute Speedup

A: a parallel algorithm for problem P
S: an asymptotically optimal or best known sequential algorithm for P
t_A(p,n): worst-case execution time of A with p ≤ p_A(n) processors
t_S(n): worst-case execution time of S

The absolute speedup of a parallel algorithm A is the ratio

   SU_abs(p,n) = t_S(n) / t_A(p,n)

If S is an optimal algorithm for P, then

   SU_abs(p,n) = t_S(n) / t_A(p,n) ≤ p · t_S(n) / c_A(n) ≤ p

for any fixed input size n, since t_S(n) ≤ c_A(n).

A cost-optimal parallel algorithm A for a problem P has linear absolute speedup.
This holds for n sufficiently large. "Superlinear" speedup > p may exist only for small n.

Relative Speedup and Efficiency

Compare A with p processors to itself running on 1 processor.
The asymptotic relative speedup of a parallel algorithm A is the ratio

   SU_rel(p,n) = t_A(1,n) / t_A(p,n)

Since t_S(n) ≤ t_A(1,n), we have SU_rel(p,n) ≥ SU_abs(p,n).   [PPP p.44 typo!]

Preferably used in papers on parallelization to show "nice" performance results.

The relative efficiency of parallel algorithm A is the ratio

   EF(p,n) = t_A(1,n) / ( p · t_A(p,n) )

EF(p,n) = SU_rel(p,n) / p ∈ [0,1]

Speedup curves

Speedup curves measure the utility of parallel computing, not speed.

(Figure: speedup S versus number of processors p; curves for superlinear, linear (ideal speedup S = p), sublinear, saturating, and decreasing speedup.)

Trivially parallel (e.g., matrix product, LU decomposition, ray tracing)
   → close to ideal, S = p
Work-bound algorithms
   → linear, SU ∈ Θ(p), work-optimal
Tree-like task graphs (e.g., global sum / max)
   → sublinear, SU ∈ Θ(p / log p)
Communication-bound
   → sublinear, SU = 1/f(p)

Most papers on parallelization show only relative speedup
(as SU_abs ≤ SU_rel, and the best sequential algorithm is needed for SU_abs).
Speedup anomalies

Speedup anomaly: an implementation on p processors may execute faster than expected.

Superlinear speedup: a speedup function that grows faster than linearly, i.e., in ω(p).
Possible causes: cache effects, search anomalies.
Real-world example: moving scaffolding.

Speedup anomalies may occur only for a fixed (small) range of p.
Theorem: There is no absolute superlinear speedup for arbitrarily large p.

Amdahl's Law

Consider an execution (trace) of parallel algorithm A:
  sequential part A_s, where only 1 processor is active
  parallel part A_p, which can be sped up perfectly by p processors
→ total work w_A(n) = w_{As}(n) + w_{Ap}(n)

Amdahl's Law:
If the sequential part of A is a fixed fraction of the total work irrespective of the
problem size n, that is, if there is a constant β with

   β = w_{As}(n) / w_A(n) ≤ 1,

then the relative speedup of A with p processors is limited by

   p / (βp + (1−β)) ≤ 1/β
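A small hedged illustration (not from the slides): evaluating Amdahl's bound p / (βp + (1−β)) for a made-up sequential fraction and a few processor counts.

   #include <stdio.h>

   double amdahl(double beta, double p)
   {
       return p / (beta * p + (1.0 - beta));
   }

   int main(void)
   {
       double beta = 0.25;                          /* example sequential fraction */
       for (int p = 1; p <= 1024; p *= 4)
           printf("p = %4d  SU <= %.2f\n", p, amdahl(beta, p));
       printf("limit 1/beta = %.2f\n", 1.0 / beta); /* asymptotic bound, here 4.0 */
       return 0;
   }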

Visualization of Amdahl's Law

(Figure: speedup S(p) plotted against p for β = 0, 0.25, 0.33, 0.5 and 1.0; each curve saturates below its bound 1/β.)

   S(p) = p / (βp + (1−β)) < 1/β

Proof of Amdahl's Law

   SU_rel = T(1) / T(p) = T(1) / ( T_{As} + T_{Ap}(p) )

Assume perfect parallelizability of the parallel part A_p, that is,

   T_{Ap}(p) = (1−β) · T(1) / p.

Then

   SU_rel = T(1) / ( β·T(1) + (1−β)·T(1)/p ) = p / (βp + 1 − β) ≤ 1/β

(Figure: the sequential part β·T(1) runs on P0 alone; the parallel part (1−β)·T(1) is split evenly across the p processors P0..P_{p−1}, each taking (1−β)·T(1)/p.)

Remark: For most parallel algorithms the sequential part is not a fixed fraction.

Remarks on Amdahl's Law

Not limited to speedup by parallelization only!
Can also be applied to other optimizations,
e.g. SIMDization, instruction scheduling, data locality improvements, ...

Amdahl's Law, general formulation:
If you speed up a fraction (1−β) of a computation by a factor p,
the overall speedup is p / (βp + (1−β)), which is < 1/β.

Implications:
  Optimize for the common case: if 1−β is small, the optimization has little effect.
  Ignored optimization opportunities (also) limit the speedup.
  As p → ∞, the speedup is bounded by 1/β.

NC

Recall the complexity class P:
  P = set of all problems solvable on a RAM in polynomial time

Can all problems in P be solved fast on a PRAM?

"Nick's class" NC:
  NC = set of problems solvable on a PRAM in polylogarithmic time O(log^k n) for some constant k,
  using only n^{O(1)} processors (i.e. a polynomial number) in the size n of the input instance.

By self-simulation: NC ⊆ P.

NC - Some remarks

Are the problems in NC just the well-parallelizable problems?
Counterexample: searching for a given element in an ordered array
  is sequentially solvable in logarithmic time (thus in NC), but
  cannot be solved significantly faster in (EREW-)parallel [PPP 2.5.2].

Are NC algorithms always a good choice?
  Time log^3 n is faster than time n^{1/4} only for ca. n > 10^12.

Is NC = P?
  For some problems in P no polylogarithmic PRAM algorithm is known
  → likely that NC ≠ P
  → P-completeness [PPP p. 46]

Speedup and Efficiency w.r.t. other sequential architectures

Parallel algorithm A runs on a "real" parallel machine N with fixed size p.
Sequential algorithm S for the same problem runs on a sequential machine M.
Measure the execution times T^N_A(p,n) and T^M_S(n) in seconds.

Absolute, machine-uniform speedup of A:

   SU_abs(p,n) = T^M_S(n) / T^M_A(p,n)

Parallelization slowdown of A:

   SL(n) = T^M_A(1,n) / T^M_S(n)

Hence, SU_abs(p,n) = SU_rel(p,n) / SL(n).

Absolute, machine-nonuniform speedup:

   T^M_S(n) / T^N_A(n)

Used in the 1990s to disqualify parallel processing by comparing against newer superscalars.

Scalability

For a machine N with p ≤ p_A(n), we have t_A(p,n) = O(c_A(n)/p) and thus

   SU_abs(p,n) = p · T^M_S(n) / c^N_A(n).

→ linear speedup for cost-optimal A
→ "well scalable" (in theory) in the range 1 ≤ p ≤ p_A(n)
→ for fixed n, no further speedup beyond p_A(n)

For realistic problem sizes (small n, small p): often sublinear!
  communication costs (non-PRAM) may increase more than linearly in p
  the sequential part may increase with p; not enough work available
→ less scalable

What about scaling the problem size n with p to keep the speedup?

Isoefficiency

[Rao, Kumar'87]

Measured efficiency of parallel algorithm A on machine M for problem size n:

   EF(p,n) = T^M_A(1,n) / ( p · T^M_A(p,n) ) = SU_rel(p,n) / p

Let A solve a problem of size n_0 on M with p_0 processors with efficiency ε.
The isoefficiency function for A is a function of p which expresses the increase in
problem size required for A to retain a given efficiency ε.

If the isoefficiency function for A is linear → A is well scalable.
Otherwise (superlinear): A needs a large increase in n to keep the same efficiency.
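A hedged numerical illustration (not from the slides), assuming the simple model T_A(p,n) ~ n/p + log2 p for a tree-based sum, so that EF(p,n) ~ n / (n + p·log2 p); keeping a target efficiency ε then forces n to grow roughly like p·log2 p.

   #include <math.h>
   #include <stdio.h>

   double efficiency(double p, double n)
   {
       return n / (n + p * log2(p));            /* assumed model, for illustration only */
   }

   int main(void)
   {
       double eps = 0.8;                        /* target efficiency */
       for (double p = 2; p <= 1024; p *= 4) {
           double n = eps / (1.0 - eps) * p * log2(p);   /* solve EF(p,n) = eps for n */
           printf("p = %6.0f  n = %10.0f  EF = %.2f\n", p, n, efficiency(p, n));
       }
       return 0;
   }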

Gustafsson's Law

Revisit Amdahl's law: it assumes that the sequential work A_s is a constant fraction β of the total work.
→ when scaling up n, w_{As}(n) will scale linearly as well!

Gustafsson's Law [Gustafsson'88]:
Assume that the sequential work is constant (independent of n), given by the sequential
fraction α in an unscaled problem (e.g., of size n = 1, thus p = 1) such that

   T_{As} = α·T_1(1),   T_{Ap} = (1−α)·T_1(1),

and that w_{Ap}(n) scales linearly in n. Then the scaled speedup for n > 1 is predicted by

   SU^s_rel(n) = T_n(1) / T_n(n) = α + (1−α)·n = n − (n−1)·α.

The sequential part is assumed to be replicated over all processors.
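A small hedged illustration (not from the slides): evaluating Gustafsson's scaled speedup next to Amdahl's fixed-fraction bound for the same numeric fraction, to show how differently the two assumptions behave as n = p grows.

   #include <stdio.h>

   double gustafsson(double alpha, double n)  { return alpha + (1.0 - alpha) * n; }
   double amdahl_bound(double beta, double p) { return p / (beta * p + 1.0 - beta); }

   int main(void)
   {
       double alpha = 0.05;                     /* example sequential fraction */
       for (double n = 1; n <= 1024; n *= 4)
           printf("n = p = %5.0f  scaled SU = %8.1f  Amdahl SU <= %6.1f\n",
                  n, gustafsson(alpha, n), amdahl_bound(alpha, n));
       return 0;
   }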

Proof of Gustafsson's Law

Scaled speedup for p = n > 1:

   SU^s_rel(n) = T_n(1) / T_n(n) = ( T_{As} + w_{Ap}(n) ) / ( T_{As} + T_{Ap} )

Assuming perfect parallelizability of A_p up to p = n processors,

   SU^s_rel(n) = ( α + (1−α)·n ) / 1 = n − (n−1)·α.

(Figure: for n = 1, processor P0 executes the sequential part α·T(1) followed by the parallel part (1−α)·T(1). For n > 1, the sequential part α·T(1) is replicated on every processor P0..P_{n−1}, and the scaled parallel work n·(1−α)·T(1) is split so that each processor again executes (1−α)·T(1).)

Yields better speedup predictions for data-parallel algorithms.

Fundamental PRAM algorithms

  reduction   → see the parallel sum algorithm
  prefix sums
  list ranking

Oblivious (PRAM) algorithm [JaJa 4.4.1]: the control flow (→ execution time) does not depend on the input data.
Oblivious algorithms can be represented as arithmetic circuits whose shape depends only on the input size.
Examples: reduction, (parallel) prefix, pointer jumping; sorting networks, e.g. bitonic sort [CLR'90 ch. 28], mergesort.
Counterexamples: (parallel) quicksort.

The Prefix-sums problem

Given: a set S (e.g., the integers), a binary associative operator ⊕ on S,
and a sequence of n items x_0, ..., x_{n−1} ∈ S,
compute the sequence y of prefix sums defined by

   y_i = ⊕_{j=0}^{i} x_j    for 0 ≤ i < n

An important building block of many parallel algorithms! [Blelloch'89]

Typical operations ⊕: integer addition, maximum, bitwise AND, bitwise OR.

Example: bank account: initially 0$, daily changes x_0, x_1, ...
→ daily balances: (0,) x_0, x_0 + x_1, x_0 + x_1 + x_2, ...

Sequential prefix sums computation

void seq_prefix( int x[], int n, int y[] )
{
   int i;
   int ps;      // i'th prefix sum
   if (n>0)
      ps = y[0] = x[0];
   for (i=1; i<n; i++) {
      ps += x[i];
      y[i] = ps;
   }
}

If run in parallel on n processors: time Θ(n), work Θ(n), cost Θ(n²).

Task dependence graph: a linear chain of dependences.

(Figure: x_1..x_7 feed a chain of additions producing y_1..y_7, each y_i depending on y_{i−1}.)

→ seems to be inherently sequential; how to parallelize?

Parallel prefix sums (1)

Naive parallel implementation: apply the definition

   y_i = ⊕_{j=0}^{i} x_j    for 0 ≤ i < n

and assign one processor to compute each y_i.

→ parallel time Θ(n), work and cost Θ(n²)

But we observe a lot of redundant computation (common subexpressions).
Idea: exploit the associativity of ⊕ ...
Parallel prefix sums (2)

Algorithmic technique: parallel divide-and-conquer.
We consider the simplest variant, called upper/lower parallel prefix.

Recursive formulation: the N-prefix is computed by recursively computing the (N/2)-prefixes
of the lower half x_1..x_{N/2} and the upper half x_{N/2+1}..x_N in parallel, and then adding
the sum of the lower half, ⊕_{i=1}^{N/2} x_i, to every prefix of the upper half.

(Figure: Prefix(N) built from two Prefix(N/2) boxes plus a final row of additions.)

Parallel time: log n steps, work: (n/2)·log n additions, cost: Θ(n log n)
Not work-optimal! ... and needs concurrent read.
Parallel prefix sums (3)

Upper/lower parallel prefix, unfolded for N = 8.

(Figure: the unfolded circuit over x_1..x_8; after log2 N = 3 levels, output i holds ⊕_{j=1}^{i} x_j, so the last output is ⊕_{i=1}^{8} x_i.)

Parallel prefix sums (4)

Rework the upper/lower prefix sums algorithm for exclusive read:

(Figure: the array a_1..a_15 processed in data-parallel rounds with strides 1, 2, 4, 8; in each round, element i adds in element i−stride.)

Work: Θ(n log n)  :-(

Iterative formulation in data-parallel pseudocode:

   real a : array [0..N−1];
   int stride;
   stride := 1;
   while stride < N do
      forall i : [0..N−1] in parallel do
         if i ≥ stride then
            a[i] := a[i−stride] + a[i];
      stride := stride * 2;
   (* finally, the total sum is in a[N−1] *)
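A minimal C sketch (not from the slides) of the data-parallel loop above, simulated sequentially. On a synchronous PRAM all reads of a step happen before its writes; the scratch copy b[] plays that role here. For example, {1, 2, 3, 4} becomes {1, 3, 6, 10}.

   #include <string.h>

   void iter_prefix(int a[], int n)             /* in place, n >= 1 */
   {
       int b[n];                                /* scratch snapshot (C99 VLA) */
       for (int stride = 1; stride < n; stride *= 2) {
           memcpy(b, a, n * sizeof(int));       /* snapshot before the "step" */
           for (int i = stride; i < n; i++)     /* forall i >= stride in parallel */
               a[i] = b[i - stride] + b[i];
       }
   }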

Parallel prefix sums (5)

Odd/even parallel prefix P_oddeven(n):

(Figure: the odd/even prefix circuit for n = 8; pairwise sums of neighbouring inputs feed a recursive P(n/2), whose outputs are combined with a final row of additions to produce all prefixes y_1..y_8.)

EREW, 2·log n − 2 time steps, work 2n − log n − 2, cost Θ(n log n)

Not cost-optimal! But may use Brent's theorem...

Ladner/Fischer parallel prefix

Ladner/Fischer parallel prefix [Ladner/Fischer'80]
combines the advantages of upper/lower and odd/even parallel prefix:
EREW, time log n steps, work 4n − 4.96·n^0.69 + 1, cost Θ(n log n).

It can be made cost-optimal using Brent's theorem, using only Θ(n / log n) processors.

The prefix-sums problem can be solved on an (n / log n)-processor EREW PRAM
in Θ(log n) time steps and cost Θ(n).
Towards List Ranking

Parallel list: an (unordered) array of list items (one per processor), singly linked.
Problem: for each element, find the end of its linked list.

Algorithmic technique: recursive doubling, here "pointer jumping" [Wyllie'79].

The algorithm in pseudocode:

   forall k in [1..N] in parallel do
      chum[k] := next[k];
      while chum[k] ≠ null and chum[chum[k]] ≠ null do
         chum[k] := chum[chum[k]];
      od
   od

The lengths of the chum lists are halved in each step
⇒ ⌈log N⌉ pointer jumping steps.

(Figure: successive rounds of pointer jumping on a linked list; after each round, every node's chum pointer reaches twice as far along the next chain.)

List ranking

Extended problem: compute the rank = the distance to the end of the list.

(Figure: the rank values on a six-element list after each pointer-jumping round, converging to the final ranks 1..6.)

Pointer jumping [Wyllie'79], EREW:
In each step, I add to my own distance value the distance of my →next node, which I splice out of the list.

Every step doubles the number of lists and halves their lengths
→ ⌈log2 n⌉ steps.

Not work-efficient!

List ranking (2): Pointer jumping

NULL-checks can be avoided by marking the list end with a self-loop.

Implementation in Fork:

sync wyllie( sh LIST list[], sh int length )
{
   LIST *e;          // private pointer
   int nn;
   e = list[$$];     // $$ is my processor index
   if (e->next != e) e->rank = 1;
   else              e->rank = 0;
   nn = length;
   while (nn>1) {
      e->rank = e->rank + e->next->rank;
      e->next = e->next->next;
      nn = nn>>1;    // division by 2
   }
}

Also works for parallel prefix on a list!  → Exercise
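A rough C sketch (not from the slides) of Wyllie's pointer jumping, simulated round by round over index arrays; next[i] == i marks the list end (self-loop), and the buffers next2/rank2 stand in for the synchronous PRAM step in which all reads precede all writes.

   #include <math.h>

   void list_rank(int next[], int rank[], int n)
   {
       int next2[n], rank2[n];                       /* C99 VLAs, for the sketch */
       for (int i = 0; i < n; i++)
           rank[i] = (next[i] != i) ? 1 : 0;         /* initial distances */
       int rounds = (int)ceil(log2((double)(n > 1 ? n : 2)));
       for (int r = 0; r < rounds; r++) {            /* ceil(log2 n) rounds */
           for (int i = 0; i < n; i++) {             /* "forall i in parallel": read phase */
               rank2[i] = rank[i] + rank[next[i]];
               next2[i] = next[next[i]];
           }
           for (int i = 0; i < n; i++) {             /* write phase */
               rank[i] = rank2[i];
               next[i] = next2[i];
           }
       }
   }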
CREW is more powerful than EREW

Example problem: given a directed forest, compute for each node a pointer to the root of its tree.

CREW: with the pointer-jumping technique, ⌈log2(max. depth)⌉ steps suffice,
e.g. for a balanced binary tree: O(log log n);
an O(1) algorithm exists.

EREW: lower bound Ω(log n) steps:
per step, one given value can be copied to at most one other location,
so e.g. for a single binary tree, after k steps at most 2^k locations can contain the identity of the root.
A Θ(log n) EREW algorithm exists.

Simulating a CRCW algorithm with an EREW algorithm

A p-processor CRCW algorithm can be no more than O(log p) times faster
than the best p-processor EREW algorithm for the same problem.

Step-by-step simulation [Vishkin'83]:
For a Weak/Common/Arbitrary CRCW PRAM, handle concurrent writes with an auxiliary array A of pairs.
CRCW processor i wants to write x_i into location l_i:
  EREW processor i writes ⟨l_i, x_i⟩ to A[i].
  Sort A on p EREW processors by first coordinates in time O(log p)
  [Ajtai/Komlos/Szemeredi'83], [Cole'88].
  Processor j inspects the write requests A[j] = ⟨l_k, x_k⟩ and A[j−1] = ⟨l_q, x_q⟩
  and assigns x_k to l_k iff l_k ≠ l_q or j = 0.

For a Combining (Maximum) CRCW PRAM: see [PPP p.66/67].
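A hedged sequential sketch (not from the slides) of the write-conflict resolution step: each simulated CRCW processor posts a request (loc, val); after sorting by location, only the first request per location is applied (Arbitrary-CRCW semantics). Here the parallel O(log p) sort is replaced by qsort for brevity.

   #include <stdlib.h>

   typedef struct { int loc, val; } Req;

   static int by_loc(const void *a, const void *b)
   {
       return ((const Req*)a)->loc - ((const Req*)b)->loc;
   }

   void crcw_write_step(Req A[], int p, int mem[])
   {
       qsort(A, p, sizeof(Req), by_loc);       /* sort requests by target location */
       for (int j = 0; j < p; j++)             /* "processor j" inspects A[j-1], A[j] */
           if (j == 0 || A[j].loc != A[j-1].loc)
               mem[A[j].loc] = A[j].val;       /* first request per location wins */
   }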

Simulation summary

   EREW ≺ CREW ≺ Common CRCW ≺ Arbitrary CRCW ≺ Priority CRCW

where ≺ means "strictly weaker than" (transitive).

See [PPP p.68/69] for more separation results.

PRAM Variants [PPP 2.6]

  Broadcasting with selective reduction (BSR) PRAM
  Distributed RAM (DRAM)
  Local memory PRAM (LPRAM)
  Asynchronous PRAM
  Queued PRAM (QRQW PRAM)
  Hierarchical PRAM (H-PRAM)
  Message passing models: Delay model, BSP, LogP, LogGP   → Lecture 4
Broadcasting with selective reduction (BSR)

BSR: a generalization of the Combining CRCW PRAM [Akl/Guenther'89]

One BSR write step:
  Each processor can write a value to all memory locations (broadcast).
  Each memory location computes a global reduction (max, sum, ...) over a specified
  subset of all incoming write contributions (selective reduction).

Asynchronous PRAM

Asynchronous PRAM [Cole/Zajicek'89] [Gibbons'89] [Martel et al.'92]

(Figure: processors P1..P_{p−1}, each with a private memory module, connected via a network to a shared memory; operations include load_sh, store_sh, atomic_incr, fetch&incr on shared memory and load_pr, store_pr on private memory.)

No common clock.
No uniform memory access time.
Sequentially consistent shared memory.

Delay model

Idealized multicomputer: a point-to-point communication costs time t_msg.

Cost of communicating a larger block of n bytes:

   t_msg(n) = sender overhead + latency + receiver overhead + n / bandwidth
            =: t_startup + n · t_transfer

(Figure: message time as a linear function of the message size, with intercept t_s (startup time) and slope t_w (per-word transfer time).)

Assumption: the network is not overloaded; no conflicts occur at routing.

t_startup = startup time (the time to send a 0-byte message);
            accounts for hardware and software overhead.
t_transfer = transfer time per word sent; depends on the network bandwidth.

BSP model

Bulk-synchronous parallel programming [Valiant'90] [McColl'93]

BSP computer = abstract message passing architecture (p, L, g, s)

(Figure: a superstep: each processor P0..P9 computes locally using local data only, then communicates (message passing), then all processors meet at a global barrier before the next superstep.)

MIMD, SPMD

The h-relation models the communication pattern / volume:
  h_i [words] = communication fan-in / fan-out of P_i
  h = max_{1≤i≤p} h_i

Cost of a superstep with local work w:
  t_step = w + h·g + L

A BSP program is a sequence of supersteps, separated by (logical) barriers.
BSP example: Global maximum computation (non-optimal algorithm)

Compute the maximum of n numbers A[0..n−1] on a BSP(p, L, g, s):

// A[0..n−1] distributed block-wise across p processors
step  // local computation phase:
   m ← −∞;
   for all A[i] in my local partition of A
      m ← max( m, A[i] );
   // communication phase:
   if myPID ≠ 0
      send( m, 0 );
   else // on P0:
      for each i ∈ {1, ..., p−1}
         recv( mi, i );
step
   if myPID = 0
      for each i ∈ {1, ..., p−1}
         m ← max( m, mi );

Local work: Θ(n/p)
Communication: h = p − 1   (P0 is the bottleneck)
t_step = w + h·g + L = Θ( n/p + p·g + L )
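A hedged C/MPI sketch (not from the slides) of the same two-superstep maximum, mirroring the pseudocode above; in practice one would simply call MPI_Reduce with MPI_MAX.

   #include <limits.h>
   #include <mpi.h>

   int bsp_style_max(const int A[], int nlocal, MPI_Comm comm)
   {
       int rank, p, m = INT_MIN;
       MPI_Comm_rank(comm, &rank);
       MPI_Comm_size(comm, &p);
       for (int i = 0; i < nlocal; i++)          /* local computation phase */
           if (A[i] > m) m = A[i];
       if (rank != 0) {                          /* communication phase */
           MPI_Send(&m, 1, MPI_INT, 0, 0, comm);
       } else {
           for (int i = 1; i < p; i++) {         /* P0 is the bottleneck: h = p-1 */
               int mi;
               MPI_Recv(&mi, 1, MPI_INT, i, 0, comm, MPI_STATUS_IGNORE);
               if (mi > m) m = mi;
           }
       }
       MPI_Barrier(comm);                        /* the (logical) superstep barrier */
       return m;                                 /* result valid on rank 0 */
   }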
LogP model (1)

LogP model [Culler et al. 1993] for the cost of communicating small messages (a few bytes).

4 parameters:
  latency L
  overhead o
  gap g (models bandwidth)
  processor number P
abstracts from the network topology.

(Figure: a message from one processor to another costs send overhead o, latency L, and receive overhead o; successive sends or receives on the same processor are separated by the gap g.)

gap g = inverse network bandwidth per processor:
the network capacity is L/g messages to or from each processor.

L, o, g are typically measured as multiples of the CPU cycle time.

Transmission time for a small message: 2·o + L, if the network capacity is not exceeded.

LogP model (2)

Example: broadcast on a 2-dimensional hypercube P0, P1, P2, P3,
with example parameters P = 4, o = 2µs, g = 3µs, L = 5µs.

(Figure: time diagram of the sends and receives on P0..P3; P0 sends to two neighbours, one of which forwards to the remaining processor, and the last receive completes at t = 18µs.)

Remark: the gap constraint does not apply between a recv and the following send.

It takes at least 18µs to broadcast 1 byte from P0 to P1, P2, P3.

Remark: for determining time-optimal broadcast trees in LogP, see
[Papadimitriou/Yannakakis'89], [Karp et al.'93].

LogP model (3): the LogGP model

The LogGP model [Culler et al. '95] extends LogP by the parameter
G = gap per word, to model block communication.

Communication of an n-word block:

  with the LogP model:    t_n = (n−1)·g + L + 2o
  with the LogGP model:   t_n = o + (n−1)·G + L + o

(Figure: timing diagrams for sender and receiver; in LogP each of the n words pays the per-message gap g, while in LogGP the words of a block are spaced by the smaller per-word gap G.)
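A small hedged illustration (not from the slides): evaluating the two block-transfer formulas for made-up parameter values, to see how the per-word gap G changes the cost of large blocks.

   #include <stdio.h>

   double t_logp (double n, double L, double o, double g) { return (n - 1) * g + L + 2 * o; }
   double t_loggp(double n, double L, double o, double G) { return o + (n - 1) * G + L + o; }

   int main(void)
   {
       double L = 5, o = 2, g = 3, G = 0.1;       /* example values, e.g. in microseconds */
       for (double n = 1; n <= 1000; n *= 10)
           printf("n = %5.0f  LogP: %8.1f  LogGP: %8.1f\n",
                  n, t_logp(n, L, o, g), t_loggp(n, L, o, G));
       return 0;
   }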
Summary

Parallel computation models:
  Shared memory: PRAM, PRAM variants
    much simplified and idealized; used to study upper bounds of parallelism
  Message passing: Delay model, BSP, LogP, LogGP

Analysis: parallel time, work, cost.
Use the simpler models (PRAM, Delay, BSP) early in the design.

Parallel algorithmic paradigms (up to now):
  Parallel divide-and-conquer (includes reduction and pointer jumping / recursive doubling)
  Data parallelism

Fundamental parallel algorithms:
  Global sum
  Prefix sums
  List ranking
  Broadcast