SLIDE 1

Distributed Termination, Global Snapshots and Parallel Scientific Computing
Dr Vladimir Z. Tosic
Term 2 2020

SLIDE 2

Complete your myExperience and shape the future of education at UNSW.

Click the link in Moodle

  • or login to myExperience.unsw.edu.au (use z1234567@ad.unsw.edu.au to login)

The survey is confidential; your identity will never be released. Survey results are not released to teaching staff until after your results are published.

SLIDE 3

MAIN TOPICS IN THE LAST LECTURE… (BEN-ARI TEXTBOOK CHAPTER 12)

  • Fault tolerance and inconsistent information in distributed systems – the problem of consensus
  • Byzantine Generals algorithm explanation
  • Byzantine Generals algorithm examples and demo in DAJ
  • King algorithm explanation and examples

SLIDE 4

MAIN TOPICS IN THIS LECTURE… (BEN-ARI TEXTBOOK CHAPTER 11)

  • Global properties in a distributed system – the problem of consistency
  • Distributed termination using the Dijkstra-Scholten and credit recovery algorithms
  • Global snapshots and the Chandy-Lamport algorithm
  • (Briefly; not in our textbook) Parallel programming in scientific computing and the Gravitational N-Body Problem

SLIDE 5

WEEK 8 HW CLARIFICATIONS (ATOMICITY IN RICART-AGRAWALA)

From Chapter 10 in Ben-Ari’s Textbook

SLIDE 6

RICART-AGRAWALA ALGORITHM – COMPLETE (1/3)

SLIDE 7

RICART-AGRAWALA ALGORITHM – COMPLETE (2/3)

SLIDE 8

RICART-AGRAWALA ALGORITHM – COMPLETE (3/3)

SLIDE 9

RICART-AGRAWALA ALGORITHM – PROMELA FOR Main

SLIDE 10

RICART-AGRAWALA ALGORITHM – PROMELA FOR Receive

SLIDE 11

DISTRIBUTED TERMINATION – INTRODUCTION

From Chapter 11 in Ben-Ari’s Textbook and materials by G.R. Andrews

SLIDE 12

GLOBAL PROPERTIES IN A DISTRIBUTED SYSTEM (DS)

  • DS conundrum 1: determining time and synchronising clocks
  • DS conundrum 2: information in a node changes while “state” information is collected among multiple nodes
  • Therefore: not studying simultaneity in DS, but consistency – unambiguous accounting of the state of the system
  • 1. Distributed termination – determine whether computations in all nodes have terminated
  • 2. (Consistent) snapshot – unambiguously account each message to a particular node/channel

SLIDE 13

TERMINATION – BROADER PERSPECTIVE

  • Termination is an important liveness property of programs that are intended to terminate
  • Sequential programs do not terminate if they diverge (i.e. do not converge) and run forever
  • Concurrent programs can also deadlock (incl. livelock)
  • Thus: termination = convergence + deadlock-freedom
SLIDE 14

THE NEED FOR TERMINATION DETECTION ALGORITHMS

  • Termination is a property of the union of states of all individual processes and all message channels (“global state”)
  • As the “global state” of a distributed system is not visible to a single node, it is not easy to know when all processes have terminated
  • Even when all nodes are idle, there might be messages in transit (sent but not yet received) that will unblock receiving nodes
  • Several approaches are possible; we will study the Dijkstra-Scholten algorithm and mention some others

SLIDE 15

DISTRIBUTED TERMINATION – DIJKSTRA-SCHOLTEN ALGORITHM

From Chapter 11 in Ben-Ari’s Textbook
SLIDE 16

DIJKSTRA-SCHOLTEN ALGORITHM – ASSUMPTIONS

  • Change to previous DS assumptions: not every 2 nodes have to be connected directly; nodes only have to form a directed graph
  • Termination algorithm is additional (to regular computations) statements executed when sending/receiving messages
  • Assume a special environment node – no incoming edges, all other nodes can be accessed from it, initiates DS by sending messages (all other nodes inactive), responsible for reporting termination
  • Node begins computation after receiving 1st message (on any edge), eventually terminates, but can restart on receiving a new message

SLIDE 17

DISTRIBUTED SYSTEM WITH ENVIRONMENT NODE AND BACK EDGES

  • Assume: for every regular edge from i to j there is a back edge from j to i carrying a special type of message called a signal
  • Assume: each node is at all times able to receive, process and send signals

SLIDE 18

DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. VERS. DATA STRUCTURES

  • Requirement: for every received message, signal back to the source
  • inDeficit_i[E]: difference between number of messages received on incoming edge E of node i and number of signals sent back
  • inDeficit_i: sum of inDeficit_i[E] for ALL edges of node i
  • outDeficit_i: difference between number of messages sent on ALL outgoing edges of node i and number of signals received back
  • When a node terminates it no longer sends messages, but it can continue sending signals as long as inDeficit_i[E] ≠ 0 for any edge E
  • DS termination when, for the environment node, outDeficit_env = 0

SLIDE 19

DIJKSTRA-SCHOLTEN ALGORITHM – PRELIMINARY V. (1/3): SEND/RECEIVE

  • Additions to regular sending and receiving of ALL messages
SLIDE 20

DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. VERS. (2/3): SIGNALS

  • Additional new processes (blocked except when conditions true)
  • send signal does not send the final signal while the node is active! // note this!

SLIDE 21

DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. VERS. (3/3): ENVIRONMENT

SLIDE 22

DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. V. CORRECTNESS / LIVENESS

  • For simplicity of proofs only, assume communication is synchronous
  • Whether synchronous or asynchronous does not impact correctness, as we assumed that all asynchronous messages are received eventually
  • Lemma 11.1: Invariants inDeficit_i ≥ 0 and outDeficit_i ≥ 0 at each node i; Σ_{i∈nodes} inDeficit_i = Σ_{i∈nodes} outDeficit_i
  • Theorem 11.2: If the system terminates, the environment node eventually announces termination
  • Task for you: Try doing this proof yourself, then read the solution from the textbook (page 242)

SLIDE 23

DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. VERS. IS NOT SAFE

  • node1 sends to node2 and node3, which then send to each other
  • inDeficit2=2, inDeficit2[e2]=1, inDeficit3=2, inDeficit3[e3]=1
  • By p5 and p6, both node2 and node3 signal node1, so it will have outDeficit1=0 and will announce termination before it occurs!
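The premature-announcement scenario above can be replayed with plain counters. This is an illustrative Python sketch, not the textbook's pseudocode; the edge names are hypothetical.

```python
# Replay of the unsafe scenario: node1 sends to node2 and node3, which then
# send to each other. The preliminary version lets a node signal on ANY edge
# with a positive deficit, so node1's outDeficit can hit 0 too early.
out1 = 0                                    # outDeficit of node1
in2 = {"e_from_1": 0, "e_from_3": 0}        # inDeficit2 per incoming edge
in3 = {"e_from_1": 0, "e_from_2": 0}        # inDeficit3 per incoming edge

out1 += 2; in2["e_from_1"] += 1; in3["e_from_1"] += 1  # node1 -> node2, node3
in3["e_from_2"] += 1                                   # node2 -> node3
in2["e_from_3"] += 1                                   # node3 -> node2

# Both node2 and node3 choose to signal node1 first (allowed by p5 and p6):
in2["e_from_1"] -= 1; out1 -= 1             # node2 signals node1
in3["e_from_1"] -= 1; out1 -= 1             # node3 signals node1

assert out1 == 0                            # node1 would announce termination...
assert in2["e_from_3"] + in3["e_from_2"] == 2  # ...while deficits remain!
```

The final version below fixes exactly this: the signal on the parent edge must be sent last.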

SLIDE 24

DIJKSTRA-SCHOLTEN ALGORITHM – VIRTUAL SPANNING TREE

  • Source of 1st message to arrive at a node is this node’s parent
  • Node i waits for: signals from all its children, outDeficit_i = 0, and its own termination; then sends its last signal to its parent
  • Variable parent stores the parent edge (or -1 if it is still unknown)
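The parent rule can be sketched as a tiny hand-driven simulation. This is an illustrative Python sketch under simplifying assumptions (synchronous delivery, computation modelled as explicit calls); the names are hypothetical, not the textbook's code.

```python
from collections import defaultdict

class DSNode:
    def __init__(self, name):
        self.name = name
        self.parent = None                    # edge the 1st message arrived on
        self.in_deficit = defaultdict(int)    # per incoming edge
        self.out_deficit = 0

nodes = {n: DSNode(n) for n in ("env", "n1", "n2")}

def send(src, dst):
    nodes[src].out_deficit += 1
    if nodes[dst].parent is None:
        nodes[dst].parent = src               # first message fixes the parent
    nodes[dst].in_deficit[src] += 1

def signal(node, edge):
    nodes[node].in_deficit[edge] -= 1
    nodes[edge].out_deficit -= 1

def finish(node):
    # node's computation ended: drain non-parent deficits first, then (only
    # once outDeficit is 0) send the last signal to the parent and detach
    n = nodes[node]
    for edge in list(n.in_deficit):
        while edge != n.parent and n.in_deficit[edge] > 0:
            signal(node, edge)
    assert n.out_deficit == 0, "children must signal first"
    while n.in_deficit[n.parent] > 0:
        signal(node, n.parent)                # last signal goes to the parent
    n.parent = None

send("env", "n1"); send("n1", "n2")           # env activates n1, n1 activates n2
finish("n2")                                  # n2 signals its parent n1
finish("n1")                                  # now outDeficit_1 = 0: signal env
assert nodes["env"].out_deficit == 0          # environment announces termination
```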

SLIDE 25

DIJKSTRA-SCHOLTEN ALGORITHM – FINAL VERSION (1/2)

  • Note: no sending of messages before 1st message received
SLIDE 26

DIJKSTRA-SCHOLTEN ALGORITHM – FINAL VERSION (2/2)

// reset parent; new parent possible if re-activated
// note this!
// last signal always to parent!

SLIDE 27

DIJKSTRA-SCHOLTEN ALGORITHM – PARTIAL SCENARIO

SLIDE 28

DIJKSTRA-SCHOLTEN ALGORITHM – DATA STRUCTURES AFTER PARTIAL SCENARIO

  • 1 ⇒ 2 in the table means: node1 sends message to node2
  • (parent, inDeficit[E], outDeficit) at each node (Es in order of nodes)
  • In the figure: outDeficit within node in (), inDeficit on edges
  • Task for you: add signals and decisions to terminate (DTs)

SLIDE 29

DIJKSTRA-SCHOLTEN ALGORITHM – PARTIAL SCENARIO SOLUTION

SLIDE 30

DIJKSTRA-SCHOLTEN ALGORITHM – CORRECTNESS / SAFETY

  • For non-environment node: parent ≠ -1 ⇔ node is active
  • Lemma 11.3: inDeficit_i = 0 ⇒ outDeficit_i = 0 is invariant at each non-environment node i
  • Lemma 11.4: parent variables define a spanning tree of active nodes with the environment node at root; inDeficit_i ≠ 0 for each active node
  • Theorem 11.2: If the environment node announces termination, the system has terminated
  • Task for you: Try doing these proofs yourself, then read the solutions from the textbook (page 246)

SLIDE 31

DIJKSTRA-SCHOLTEN ALGORITHM – PERFORMANCE

  • Problem: the number of additional signals = the number of messages
  • Can be HUGE overhead when a big distributed system shuts down
  • Improvement: sending 1 signal instead of N signals on the same edge
  • Improvement: initialising all parent vars to point to the environment node
  • Task for you: Examine textbook (page 247) pseudocode for these improvements
  • Another problem: when deficit count is more than max integer
  • Solution: credit recovery algorithms
SLIDE 32

DISTRIBUTED TERMINATION – CREDIT RECOVERY ALGORITHMS

From Chapter 11 in Ben-Ari’s Textbook
SLIDE 33

CREDIT RECOVERY ALGORITHMS – MAIN IDEAS AND LIMITATIONS

  • Environment node starts with weight W=1.0, other nodes with W=0.0
  • Every time a message is sent, ½ of the weight stays at the sender and ½ of the weight is “borrowed” to the receiver
  • Active node has W>0.0; when terminated, it sends all its weight back to the environment node (which awaits W=1.0 to announce termination)
  • In Mattern’s algorithm: weight received by an active node is returned immediately to the environment node
  • Problem: W becomes very small in big distributed systems
  • Solutions: storing only negative exponent of 2; various data structures
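The weight-halving rule fits in a few lines. This is an illustrative Python sketch with hypothetical names; exact Fractions are used only so the demo sidesteps the tiny-weight problem noted above (a real system would use one of the listed solutions).

```python
from fractions import Fraction

class CRNode:
    def __init__(self, weight=Fraction(0)):
        self.weight = weight

def send(sender, receiver):
    half = sender.weight / 2      # half stays, half is "borrowed" to receiver
    sender.weight -= half
    receiver.weight += half

def terminate(node, env):
    env.weight += node.weight     # all weight goes back to the environment node
    node.weight = Fraction(0)

env, a, b = CRNode(Fraction(1)), CRNode(), CRNode()
send(env, a)          # env: 1/2, a: 1/2
send(env, b)          # env: 1/4, b: 1/4
send(a, b)            # a: 1/4,  b: 1/2
terminate(a, env)
terminate(b, env)
assert env.weight == 1            # W == 1.0: announce termination
```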
SLIDE 34

CREDIT RECOVERY ALGORITHM FOR ENVIRONMENT NODE

SLIDE 35

CREDIT RECOVERY ALGORITHM FOR NON-ENVIRONMENT NODE (1/2)

SLIDE 36

CREDIT RECOVERY ALGORITHM FOR NON-ENVIRONMENT NODE (2/2)

  • Note: non-environment node never receives signal
SLIDE 37

DISTRIBUTED TERMINATION DETECTION IN RING NETWORKS

Material from G.R. Andrews and K. Engelhardt

SLIDE 38

TERMINATION DETECTION IN A RING (NOT IN TEXTBOOK) – PRELIMINARIES

  • T[1:n] are processes (tasks) forming a ring, ch[1:n] are channels – T[i] receives from ch[i], sends to ch[i%n+1]
  • Assume messages received by neighbour in the order sent
  • Idle process – terminated or waiting at receive statement (but process is active if waiting at I/O)
  • After receiving a message, idle process becomes active
  • Distributed program has terminated if every process is idle AND no messages are in transit

SLIDE 39

TERMINATION DETECTION IN A RING (NOT IN TEXTBOOK) – MAIN IDEAS

  • 1 token is passed in special messages over the comm. channels used in regular distributed system communications
  • Process passes a token when it is idle (this continues even after a process terminated)
  • Process that receives token is also idle (otherwise it would process regular communications)
  • Thus, upon receiving token a process sends it to its neighbour and then waits to receive another message
  • After token passes the whole circle, every process is idle and no messages are in transit (due to their ordering)
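The token's full circle can be condensed into a few lines. A minimal Python sketch, assuming real processes and channels are reduced to a list of idle flags sampled as the token arrives at each process; names are hypothetical.

```python
def token_circle(idle):
    """idle[i] says whether process T[i+1] is idle when the token reaches it.
    Returns True when the token survives a full circle: by the argument above,
    every process is then idle and no messages are in transit."""
    for is_idle in idle:
        if not is_idle:          # an active process does not pass the token on
            return False
    return True

assert token_circle([True, True, True, True])       # token completes the circle
assert not token_circle([True, True, False, True])  # T[3] still active
```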

SLIDE 40

DISTRIBUTED MUTUAL EXCLUSION (REVISITED) – NIELSEN-MIZUNO

From Chapter 10 in Ben-Ari’s Textbook

SLIDE 41

NIELSEN-MIZUNO TOKEN-PASSING ALGORITHM FOR DME

  • Nielsen-Mizuno token-passing algorithm for DME is based on passing a small token in a set of virtual spanning trees implicitly constructed by the algorithm
  • Requires understanding of virtual spanning trees, explained for the Dijkstra-Scholten distributed termination algorithm
  • Optional task for you: Revisit textbook Chapter 10 and independently study the Nielsen-Mizuno token-passing algorithm for distributed mutual exclusion (DME)

SLIDE 42

NIELSEN-MIZUNO ALGORITHM – EXAMPLE DISTRIBUTED SYSTEM

  • Assumption: fully connected DS (direct links between all nodes)

SLIDE 43

NIELSEN-MIZUNO ALGORITHM EXAMPLE – SPANNING TREE

  • An arbitrary spanning tree with directed edges pointing to the root
  • Root node (double lines in fig.) has token and is possibly in its CS
  • Nielsen-Mizuno token-passing algorithm is based on passing a small token in a set of implicitly constructed virtual spanning trees

SLIDE 44

NIELSEN-MIZUNO ALGORITHM EXAMPLE – STEPS (1/3)

  • Message format: (request, senderID, originatorID)
  • Aaron wants to enter CS: sends (request, Aaron, Aaron) to Becky
  • Aaron also sets parent ← 0 to become future root node
  • Becky relays message (request, Becky, Aaron) to Chloe and sets own parent to Aaron (parent always set to senderID)
slide-45
SLIDE 45

NIELSEN-MIZUNO ALGORITHM EXAMPLE – STEPS (2/3)

45

  • Chloe is in CS, so sets deferred to Aaron and parent to Becky
  • Generally: root sets its deferred to originatorID, its parent to senderID
  • Evan wants to enter CS, but Chloe is no longer root and simply relays the message to Becky and sets its parent to Danielle
  • Chain of relays until (request, Becky, Evan) arrives at Aaron
  • Aaron is root node without token

SLIDE 46

NIELSEN-MIZUNO ALGORITHM EXAMPLE – STEPS (3/3)

  • Aaron sets deferred to Evan and parent to Becky
  • deferred vars implicitly define queue of processes waiting to enter CS
  • When Chloe exits CS, sends token to Aaron, which enters its CS
  • When Aaron exits CS, sends token to Evan, which enters its CS
SLIDE 47

NIELSEN-MIZUNO ALGORITHM – PSEUDOCODE VARIABLES

  • Very memory efficient: only 3 variables needed
  • Note: holding ← false when entering CS (true only in root outside CS)
  • Messages are very small: request has 2 parameters, token has 0
  • Relatively efficient: request messages might be relayed through many nodes, but token messages sent directly to originatorID
  • Task for you: Write statements that initialise parent fields
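The three per-node variables (parent, deferred, holding) and the request rule can be exercised on the slides' example. An illustrative Python sketch only, assuming messages are modelled as direct calls rather than channels; it is not the textbook's pseudocode.

```python
# Spanning tree from the example: Aaron -> Becky -> Chloe; Chloe is root
# and currently inside her CS (so holding is False even at the root).

class NMNode:
    def __init__(self, name):
        self.name = name
        self.parent = None          # None marks the (future) root here
        self.deferred = None        # originator queued behind this node
        self.holding = False        # True only in root outside CS

nodes = {n: NMNode(n) for n in ("Aaron", "Becky", "Chloe")}
nodes["Aaron"].parent, nodes["Becky"].parent = "Becky", "Chloe"

def request(node, sender, originator):
    if node.parent is None:                      # reached the current root
        if node.holding:
            node.holding = False                 # token would go to originator
        else:
            node.deferred = originator           # root busy: remember the waiter
    else:
        request(nodes[node.parent], node.name, originator)  # relay upwards
    node.parent = sender                         # sender becomes new parent

def want_cs(node):
    old_parent, node.parent = node.parent, None  # become the future root
    request(nodes[old_parent], node.name, node.name)

want_cs(nodes["Aaron"])                          # (request, Aaron, Aaron) to Becky
assert nodes["Becky"].parent == "Aaron"          # relays flip edges toward Aaron
assert nodes["Chloe"].parent == "Becky"
assert nodes["Chloe"].deferred == "Aaron"        # Aaron queued for the token
```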
SLIDE 48

NIELSEN-MIZUNO ALGORITHM – PSEUDOCODE FOR Main

SLIDE 49

NIELSEN-MIZUNO ALGORITHM – PSEUDOCODE FOR Receive

SLIDE 50

GLOBAL SNAPSHOTS – CHANDY-LAMPORT ALGORITHM

From Chapter 11 in Ben-Ari’s Textbook
SLIDE 51

GLOBAL SNAPSHOT – DEFINITIONS

  • Global snapshot – a consistent recording of states of all nodes and edges (channels) in a distributed system
  • Node state – values of internal variables and sequences of messages that this node sent and received
  • Edge (channel) state – sequence of messages sent on it but not yet delivered
  • For snapshot to be consistent, each message must be in exactly 1 of: sent and in transit OR received
  • It is NOT required that all info is gathered at the same time

SLIDE 52

CHANDY-LAMPORT ALG. – MESSAGES ON A CHANNEL AND SENDING A MARKER

  • Chandy-Lamport algorithm assumption: all messages are delivered in the order sent, i.e. using FIFO channels
  • Marker – additional message (on each edge), boundary between messages sent before and after snapshot
  • Snapshot if node2 records state before marker: node1 state before sending m12, node2 state after receiving m9, edge state m10, m11 (edge state recorded by receiving node)
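How the receiver recovers the edge state from a FIFO stream can be sketched directly from this picture: node2 has processed up to m9 when node1's marker, preceded by m10 and m11 and followed by m12, arrives. Illustrative Python only; the function name is hypothetical.

```python
def channel_state(stream, last_processed):
    """Messages after `last_processed` but before the marker form the edge
    state; everything after the marker belongs to the next epoch."""
    marker_pos = stream.index("MARKER")
    before_marker = stream[:marker_pos]
    start = before_marker.index(last_processed) + 1
    return before_marker[start:]

stream = ["m9", "m10", "m11", "MARKER", "m12"]
assert channel_state(stream, "m9") == ["m10", "m11"]  # recorded edge state
```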

SLIDE 53

CHANDY-LAMPORT ALGORITHM FOR GLOBAL SNAPSHOTS (1/2)

SLIDE 54

CHANDY-LAMPORT ALGORITHM FOR GLOBAL SNAPSHOTS (2/2)

  • Notes: first the environment node sends markers along all its edges; states of all nodes are sent back using an additional algorithm (not shown)

// array assignment
// array assignment

SLIDE 55

CHANDY-LAMPORT ALGORITHM – STATES OF NODES AND EDGES

  • 1st marker received initiates recording of state (union of node states)
  • For outgoing edges state is: the number of last sent message
  • For incoming edge on which 1st marker received, no additional state (all messages received before 1st marker are part of node’s state)
  • For other incoming edges, state is: difference between last message received before recording node’s state (messageAtRecord) and last message received before marker on THIS edge (messageAtMarker)
  • When marker received on ALL edges, node records state, including: stateAtRecord[E], messageAtRecord[E], messageAtRecord[E]+1 to messageAtMarker[E] if messageAtRecord[E] ≠ messageAtMarker[E]
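The messageAtRecord/messageAtMarker bookkeeping for one incoming edge E reduces to a range. A minimal Python sketch whose names mirror the slide's variables; it is not the course's code.

```python
def edge_snapshot(message_at_record, message_at_marker):
    """Edge E's recorded state: message numbers messageAtRecord[E]+1 up to
    messageAtMarker[E] (empty when the two counters are equal)."""
    return list(range(message_at_record + 1, message_at_marker + 1))

assert edge_snapshot(9, 11) == [10, 11]   # marker arrived after m10, m11
assert edge_snapshot(7, 7) == []          # nothing in transit on this edge
```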

SLIDE 56

CHANDY-LAMPORT ALG. EXAMPLE – MESSAGES AND MARKERS

  • First all 3 messages sent from node1 and received at node2
  • Then, 3 messages sent from node1 but not yet received at node3
  • Then, 3 messages sent from node2 but not yet received at node3
  • What happens next (!) is shown in the following tables
SLIDE 57

CHANDY-LAMPORT ALG. EXAMPLE – PART 1/2

ls = lastSent, lr = lastReceived, st = stateAtRecord, rc = messageAtRecord, mk = messageAtMarker (empty or unchanged data structures omitted); ⇒ send, ⇐ receive, M = marker

SLIDE 58

CHANDY-LAMPORT ALG. EXAMPLE – PART 2/2

SLIDE 59

CHANDY-LAMPORT ALGORITHM – CORRECTNESS (1/2)

  • Theorem 11.6: If the environment node initiates a snapshot, eventually the algorithm displays a consistent snapshot
  • Note: the recorded consistent snapshot need not have actually occurred during computation (as different nodes recorded state at different times)
  • Proof:
  • Every node sends marker to its children, so in a connected graph every node gets marker and records state
  • Examine program structure to check whether recording of message m from node i to j is consistent – there are 4 cases (see the next slide)

SLIDE 60

CHANDY-LAMPORT ALGORITHM – CORRECTNESS (2/2)

  • 1. m sent before marker & received before j recorded state – m in lastReceived, so m is only in state of j
  • 2. m sent before marker & received after j recorded state – m in lastReceived but not in messageAtRecord; when marker eventually received, m will be up to messageAtMarker, so m will be only in state of edge from i to j
  • 3. m sent after marker & received before j recorded state – impossible because j recorded state when it received marker on FIFO channel
  • 4. m sent after marker & received after j recorded state – m will not be recorded in state of j nor in state of edge from i to j

SLIDE 61

CHANDY-LAMPORT ALG. – EXAMPLE OF CONSISTENT BUT NOT ACTUAL SNAPSHOT

“There are two nodes, p and q. p records its state before sending any messages and then sends a marker to q followed by message 1. Before receiving the marker, q sends message 2 to p which is received. q now receives the marker and records its state, sending a marker to p. The recorded state is: p has sent no messages; q has sent message 2 and p records that message 2 is in the channel. But this state never occurred, because message 2 was not in the channel when p had yet to send a message. Nevertheless, the snapshot is consistent.”

Source: K.M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems, 3(1):63-75, 1985.

SLIDE 62

PARALLEL SCIENTIFIC COMPUTING

Material from K. Engelhardt and G.R. Andrews

SLIDE 63

PARALLEL SCIENTIFIC COMPUTING

  • Variety of applications: computational physics, weather forecasting, designing airplanes (or cars), …
  • Driving developments in high-performance computing
  • Goal for leveraging parallel systems: speedup on large problems (or solving an even larger problem)
  • Starting with a good algorithm and optimized sequential code
  • However, the problem solution must be parallelisable

SLIDE 64

SPEEDUP AND EFFICIENCY

  • T1 – time for a sequential program on 1 processor
  • TN – time for a parallel program on N processors
  • Speedup: T1 / TN
  • Efficiency: speedup / N
  • Aiming for linear speedup: speedup = N, efficiency = 1
  • Typically achieved sublinear: speedup < N, efficiency < 1
  • In rare cases (due to cache effects) superlinear: speedup > N, efficiency > 1
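The two definitions above can be computed directly (illustrative Python with hypothetical measured times):

```python
def speedup(t1, tn):
    """Speedup = T1 / TN."""
    return t1 / tn

def efficiency(t1, tn, n):
    """Efficiency = speedup / N."""
    return speedup(t1, tn) / n

# Hypothetical measurements: 100 s sequentially, 25 s on 8 processors.
assert speedup(100.0, 25.0) == 4.0          # 4x faster
assert efficiency(100.0, 25.0, 8) == 0.5    # sublinear: efficiency < 1
```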

SLIDE 65

IMPEDIMENTS TO SPEEDUP

  • Inherently sequential parts (e.g. I/O)
  • Load imbalance: some processors busy, some not
  • Synchronisation overhead: process creation, communication, critical sections, delays, fork/join, …
  • Amdahl’s law: For P – fraction that can be parallelised, maximum speedup with ∞ processors is: 1 / (1 − P)
  • Maximum speedup with N processors: 1 / ((1 − P) + P/N)
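Amdahl's law as stated above is a one-line function (illustrative Python; the numbers in the asserts are hypothetical):

```python
def amdahl(p, n):
    """Maximum speedup with parallelisable fraction p and n processors:
    1 / ((1 - p) + p / n); as n -> infinity this tends to 1 / (1 - p)."""
    return 1.0 / ((1.0 - p) + p / n)

# With 90% parallelisable work, 10 processors give only ~5.26x...
assert abs(amdahl(0.9, 10) - 1.0 / 0.19) < 1e-9
# ...and no number of processors can beat 1 / (1 - 0.9) = 10x.
assert abs(amdahl(0.9, 10**9) - 10.0) < 1e-6
```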

SLIDE 66

3 COMMON PROBLEM CLASSES IN PARALLEL SCIENTIFIC COMPUTING

  • Some deterministic vs. some stochastic problems/solutions
  • Grid method applications: weather, fluid/air flow, plasma, …
  • Particle computation applications: gravity, electrical charge, …
  • Matrix computation applications: engineering, economics, …

SLIDE 67

GRID METHOD

  • Numerical solutions to partial differential equations and in other applications (e.g. signal processing)
  • Domain decomposition: divide area into blocks or strips of points; assign a worker process to each
  • Approximating solution for continuous systems using a finite number of points and iterative numerical methods
  • E.g. the finite-difference method in Laplace’s equation
  • Iterative techniques (slow to fast): Jacobi iteration, Gauss-Seidel, successive over-relaxation (SOR, red/black), multigrid
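One Jacobi iteration for Laplace's equation makes the grid method concrete: each interior point becomes the average of its four neighbours. A sequential illustrative Python sketch; the parallel variant would assign strips of points to workers as described above.

```python
def jacobi_step(grid):
    """One Jacobi sweep over a rectangular grid (list of rows).
    Boundary points are held fixed; interior points are averaged."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]               # copy keeps boundaries intact
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = (grid[i-1][j] + grid[i+1][j] +
                         grid[i][j-1] + grid[i][j+1]) / 4.0
    return new

# Hypothetical grid: bottom boundary held at 100, the rest at 0.
g = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [100.0, 100.0, 100.0]]
assert jacobi_step(g)[1][1] == 25.0   # average of 0, 100, 0, 0
```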

SLIDE 68

PARTICLE COMPUTATION – THE GRAVITATIONAL N-BODY PROBLEM

  • Modelling discrete systems consisting of moving particles that interact by exerting forces on each other
  • A body (i.e. particle) is characterised by:
  • Gravitational force causes the bodies to accelerate and to move
  • Gravity’s magnitude is symmetric: F = G · m_i · m_j / r²
  • Gravity’s direction is a vector from one body to another
SLIDE 69

THE GRAVITATIONAL N-BODY PROBLEM – SIMULATION

  • Simulated by stepping through discrete instants of time
  • At each step, calculate total force (sum of forces by all other bodies) on each body and update its acceleration (a_i = F / m_i), velocity (using a differential equation) and position (an integral)

initialise bodies;
for [time = start to finish by DT] {
  calculate forces;
  move bodies;
}

  • Simplifying assumptions: all bodies on one spatial plane (2D), constant acceleration during the time interval, leapfrog scheme (½ change in position due to old velocity, ½ due to new velocity)
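One body of the loop above ("calculate forces; move bodies") can be sketched for a 1D two-body case. An illustrative Python sketch with G = 1 and simple Euler updates, not the course's code; it omits the leapfrog half-steps for brevity.

```python
def step(bodies, dt):
    """bodies is a list of (mass, position, velocity) tuples on a line.
    Phase 1: total force on each body from all others; phase 2: move."""
    forces = []
    for i, (m_i, x_i, _) in enumerate(bodies):
        f = 0.0
        for j, (m_j, x_j, _) in enumerate(bodies):
            if i != j:
                r = x_j - x_i
                f += m_i * m_j / r**2 * (1 if r > 0 else -1)  # attract along r
        forces.append(f)
    # move bodies: a = F / m, then update position (old v) and velocity
    return [(m, x + v * dt, v + (f / m) * dt)
            for (m, x, v), f in zip(bodies, forces)]

bodies = [(1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]   # two unit masses at distance 2
bodies = step(bodies, 0.1)
assert bodies[0][2] > 0 and bodies[1][2] < 0  # they accelerate toward each other
```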

SLIDE 70

THE GRAVITATIONAL N-BODY PROBLEM – PARALLELISATION

  • Sequential algorithm: O(n²) for n bodies
  • Various approaches to parallelisation possible: shared memory vs. message passing (manager/workers, heartbeat, pipeline, …)
  • Divide bodies among workers and use a barrier after each phase
  • Basic algorithm: Compute all pairs of forces – still O(n²)
  • Minimise overheads: Create workers once, avoid critical sections by separating calculate phase and move phase, load balancing
  • Improved algorithms: Barnes-Hut: hierarchical, O(n log n); Fast Multipole Method (FMM): hierarchical, O(n log n) or better

SLIDE 71

THE GRAVITATIONAL N-BODY PROBLEM – COMPARING PARALLEL SOLUTIONS

  • Even when implementing the same mathematical calculations, concurrent/parallel programs can differ in several ways:
  • ease of programming,
  • load balancing,
  • number of messages,
  • size of messages,
  • amount of locally stored data, …
  • G.R. Andrews’s textbook (hyperlink) critically compares 3 message passing programs (manager/workers, heartbeat, pipeline)

SLIDE 72

MATRIX COMPUTATION

  • Solving systems of linear equations
  • Dense vs. sparse matrices
  • Gaussian elimination, lower-upper (LU) decomposition
  • In general: data parallelism (processes do same thing on different parts of data, execute synchronously in lock step) instead of task parallelism (processes run independently, execute asynchronously)
  • Examples of data parallelism: transputer, GPU
SLIDE 73

NEXT TIME… (PREVIEW HIGHLIGHTS)

From additional material NOT in the textbook!

SLIDE 74

MAIN TOPICS IN THE NEXT LECTURE… (BY LIAM O’CONNOR-DAVIS)

  • Course revision
  • Additional topics clarifying some important issues
  • Exam preparation!

SLIDE 75

Complete your myExperience and shape the future of education at UNSW.

Click the link in Moodle

  • or login to myExperience.unsw.edu.au (use z1234567@ad.unsw.edu.au to login)

The survey is confidential; your identity will never be released. Survey results are not released to teaching staff until after your results are published.