Performance of Parallel Programs
Michelle Kuttel

Analyzing algorithms
• Like all algorithms, parallel algorithms should be:
  – Correct
  – Efficient
• First we will talk about efficiency

Work and Span
Let T_P be the running time if there are P processors available.
Two key measures of run-time:
• Work: how long it would take 1 processor = T_1
  – e.g. for divide-and-conquer, just “sequentialize” the recursive forking
• Span: how long it would take infinitely many processors = T_∞
  – The longest dependence chain
  – Example: O(log n) for summing an array with the divide-and-conquer method
    • Notice that for this case, having > n/2 processors is no additional help
  – Also called “critical path length” or “computational depth”
slide adapted from: Sophomoric Parallelism and Concurrency, Lecture 2
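The work and span of the divide-and-conquer sum can be counted with a small sketch. The cost model here is an assumption made for illustration: each addition costs one unit, a single element costs nothing, and the two recursive halves may run at the same time. The function name `sum_work_span` is hypothetical.

```python
def sum_work_span(n):
    """Return (work, span) for a divide-and-conquer sum of n elements.

    Assumed cost model: one unit per addition, zero for a single element.
    Work adds up the cost of both halves; span takes the slower half,
    because the two halves can run in parallel.
    """
    if n <= 1:
        return (0, 0)
    left = n // 2
    right = n - left
    work_left, span_left = sum_work_span(left)
    work_right, span_right = sum_work_span(right)
    work = work_left + work_right + 1          # T_1: every addition counts
    span = max(span_left, span_right) + 1      # T_inf: longest dependence chain
    return (work, span)


if __name__ == "__main__":
    for n in (8, 1024):
        work, span = sum_work_span(n)
        print(f"n={n}: work={work}, span={span}")
```

Under this model the work grows linearly with n while the span grows only logarithmically, which matches the O(log n) span quoted above.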

Speedup
Speedup is the factor by which the running time is reduced compared to a single processor:
Speedup for P processes = (time for 1 process) / (time for P processes) = T_1 / T_P
In the ideal situation, as P increases, T_P should decrease by a factor of P, i.e. T_P = T_1 / P.
Figure from “Parallel Programming in OpenMP”, by Chandra et al.
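As a minimal sketch of the definition, the helper below computes T_1 / T_P and checks the ideal case T_P = T_1 / P. The function name `speedup` and the 120-second baseline are hypothetical values chosen only for illustration, not measurements.

```python
def speedup(t1, tp):
    """Speedup = T_1 / T_P, both given in the same time unit."""
    if tp <= 0:
        raise ValueError("running time must be positive")
    return t1 / tp


if __name__ == "__main__":
    t1 = 120.0  # hypothetical single-processor running time in seconds
    for p in (1, 2, 4, 8):
        ideal_tp = t1 / p  # ideal case: time shrinks by a factor of P
        print(f"P={p}: ideal T_P={ideal_tp:.1f}s, speedup={speedup(t1, ideal_tp):.1f}")
```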

Scalability
• Scalability is the speedup of a program as the number of processors being used increases.
  – Perfect linear speedup = P
  – Perfect linear speedup means doubling P halves the running time
  – Usually our goal; hard to get in practice
• An algorithm is termed scalable if the level of parallelism increases at least linearly with the problem size.
  – Running time is inversely proportional to the number of processors used.
• In practice, few algorithms achieve linear scalability; most reach an optimal level of performance and then deteriorate, sometimes very rapidly.
slide adapted from: Sophomoric Parallelism and Concurrency, Lecture 2

Parallelism
Parallelism is the maximum possible speedup: T_1 / T_∞
  – At some point, adding processors won’t help
  – What that point is depends on the span
Designing parallel algorithms is about decreasing span without increasing work too much.
slide adapted from: Sophomoric Parallelism and Concurrency, Lecture 2
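To see where adding processors stops helping, the sketch below computes the parallelism T_1 / T_∞ and caps the speedup at min(P, T_1 / T_∞). That cap is the standard bound that follows from T_P ≥ T_1 / P and T_P ≥ T_∞; it is not stated on the slide. The function names and the array-sum figures (work n - 1, span ⌈log2 n⌉) are illustrative assumptions.

```python
import math


def parallelism(work, span):
    """Maximum possible speedup: T_1 / T_inf."""
    return work / span


def speedup_upper_bound(work, span, processors):
    """Upper bound on speedup with P processors: min(P, T_1 / T_inf)."""
    return min(processors, parallelism(work, span))


if __name__ == "__main__":
    n = 1_000_000
    work = n - 1                     # assumed: one unit per addition
    span = math.ceil(math.log2(n))   # assumed: depth of the summation tree
    print(f"parallelism = {parallelism(work, span):.2f}")
    for p in (8, 1_000, 1_000_000):
        print(f"P={p}: speedup <= {speedup_upper_bound(work, span, p):.2f}")
```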

Traditional Scaling Process
[Figure: speedup of user code on a traditional uniprocessor; time axis labeled “Moore’s law”]
slide from: Art of Multiprocessor Programming

Multicore Scaling Process
[Figure: speedup of user code on multicore]
Unfortunately, not so simple…
slide from: Art of Multiprocessor Programming

Real-World Scaling Process
[Figure: speedup of user code on multicore]
Parallelization and synchronization require great care…
slide from: Art of Multiprocessor Programming

Early Parallel Computing
Why was it not pursued more thoroughly before the 1990s?
• Because of the dramatic increase in uniprocessor speed, the need for parallelism turned out to be less than expected
• and …

Amdahl’s law
“For over a decade prophets have voiced the contention that the organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of a multiplicity of computers in such a manner as to permit cooperative solution...The nature of this overhead (in parallelism) appears to be sequential so that it is unlikely to be amenable to parallel processing techniques. Overhead alone would then place an upper limit on throughput of five to seven times the sequential processing rate, even if the housekeeping were done in a separate processor...At any point in time it is difficult to foresee how the previous bottlenecks in a sequential computer will be effectively overcome.”
Gene Amdahl, 1967
IBM, the designer of the IBM 360 series of mainframe architecture

Amdahl’s Law (mostly bad news)
So far: analyze parallel programs in terms of work and span
• In practice, we typically have parts of programs that parallelize well…
  – e.g. maps/reductions over arrays and trees
• …and parts that are inherently sequential
  – e.g. reading a linked list, getting input, doing computations where each needs the previous step, etc.
“Nine women can’t make one baby in one month”
slide adapted from: Sophomoric Parallelism and Concurrency, Lecture 2

We also have the parallelization overhead
Refers to the amount of time required to coordinate parallel tasks, as opposed to doing useful work, e.g.
• starting threads
• stopping threads
• synchronization and locks
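One rough way to observe this overhead is to time threads that do no useful work at all, so that everything measured is start-up and shutdown cost. The sketch below is only an illustration using Python's standard `threading` module; the function names are hypothetical and the measured times vary from machine to machine and run to run.

```python
import threading
import time


def do_nothing():
    """Task with no useful work, so any time measured is pure overhead."""
    pass


def thread_overhead(num_threads):
    """Return the seconds spent starting and joining num_threads idle threads."""
    start = time.perf_counter()
    threads = [threading.Thread(target=do_nothing) for _ in range(num_threads)]
    for t in threads:
        t.start()   # starting threads
    for t in threads:
        t.join()    # stopping threads: wait for each one to finish
    return time.perf_counter() - start


if __name__ == "__main__":
    for count in (1, 10, 100):
        print(f"{count} threads: {thread_overhead(count):.6f} s of overhead")
```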

Amdahl’s law*
Total running time for a program can be divided into two parts:
  Serial part (s)
  Parallel part (p)
T_1 = s + p
For n processors: T_n = s + p/n
If we set T_1 = 1, then s = 1 - p and T_n = 1 - p + p/n
* G. M. Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities”, AFIPS Proc. of the SJCC, 30, 438-485, 1967
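The normalized running time T_n = 1 - p + p/n translates directly into code. In the sketch below the function names are hypothetical, and the speedup is taken as T_1 / T_n with T_1 = 1, following the definition of speedup given earlier.

```python
def amdahl_time(p, n):
    """Normalized running time T_n = (1 - p) + p / n, with T_1 = 1.

    p is the parallel fraction of the program, n the number of processors.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("parallel fraction p must be between 0 and 1")
    if n < 1:
        raise ValueError("number of processors n must be at least 1")
    return (1.0 - p) + p / n


def amdahl_speedup(p, n):
    """Speedup T_1 / T_n, using the normalization T_1 = 1."""
    return 1.0 / amdahl_time(p, n)


if __name__ == "__main__":
    p = 0.9  # hypothetical parallel fraction, chosen for illustration
    for n in (1, 2, 4, 8, 16, 1024):
        print(f"n={n}: T_n={amdahl_time(p, n):.4f}, speedup={amdahl_speedup(p, n):.2f}")
```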

Amdahl’s Law
Speedup = 1 / (1 - p + p/n)
where p is the parallel fraction, 1 - p is the sequential fraction, and n is the number of processors.
slide from: Art of Multiprocessor Programming

Amdahl’s law*
Parallelism (infinite processors) = 1/s
* G. M. Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities”, AFIPS Proc. of the SJCC, 30, 438-485, 1967
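The 1/s limit follows from the running-time model T_n = s + p/n with the normalization T_1 = s + p = 1. A short derivation, written out as a sketch:

```latex
\[
  \lim_{n \to \infty} \frac{T_1}{T_n}
  = \lim_{n \to \infty} \frac{1}{s + p/n}
  = \frac{1}{s}
\]
```

The term p/n vanishes as n grows, so only the serial part s remains in the denominator.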

Example
• Ten processors
• 60% concurrent, 40% sequential
• How close to 10-fold speedup?
slide from: Art of Multiprocessor Programming
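Assuming the normalized form of Amdahl's law, Speedup = 1 / (1 - p + p/n), the example can be worked out with p = 0.6 and n = 10:

```latex
\[
  \text{Speedup}
  = \frac{1}{1 - p + \dfrac{p}{n}}
  = \frac{1}{0.4 + \dfrac{0.6}{10}}
  = \frac{1}{0.46}
  \approx 2.17
\]
```

So ten processors give roughly a 2.17-fold speedup, far from 10-fold. Even with unlimited processors the speedup could not exceed 1/0.4 = 2.5.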

Graphing Amdahl’s Law
[Figure: graph of Amdahl’s Law]
graphic from lecture slides: Defining Computer “Speed”: An Unsolved Challenge, Dr. John L. Gustafson, Director, Intel Labs, 30 Jan 2011
