Distributed Termination, Global Snapshots and Parallel Scientific Computing
Dr Vladimir Z. Tosic
Term 2 2020
Complete your myExperience and shape the future of education at UNSW.
Click the link in Moodle or log in to myExperience.unsw.edu.au (use z1234567@ad.unsw.edu.au to log in).
The survey is confidential; your identity will never be released.
Survey results are not released to teaching staff until after your results are published.
MAIN TOPICS IN THE LAST LECTURE… (BEN-ARI TEXTBOOK CHAPTER 12)
- Fault tolerance and inconsistent information in distributed systems – the problem of consensus
- Byzantine Generals algorithm explanation
- Byzantine Generals algorithm examples and demo in DAJ
- King algorithm explanation and examples
MAIN TOPICS IN THIS LECTURE… (BEN-ARI TEXTBOOK CHAPTER 11)
- Global properties in a distributed system – the problem of consistency
- Distributed termination using the Dijkstra-Scholten and credit recovery algorithms
- Global snapshots and the Chandy-Lamport algorithm
- (Briefly; not in our textbook) Parallel programming in scientific computing and the Gravitational N-Body Problem
WEEK 8 HW CLARIFICATIONS (ATOMICITY IN RICART-AGRAWALA)
From Chapter 10 in Ben-Ari’s Textbook
RICART-AGRAWALA ALGORITHM – COMPLETE (1/3)
RICART-AGRAWALA ALGORITHM – COMPLETE (2/3)
RICART-AGRAWALA ALGORITHM – COMPLETE (3/3)
RICART-AGRAWALA ALGORITHM – PROMELA FOR Main
RICART-AGRAWALA ALGORITHM – PROMELA FOR Receive
DISTRIBUTED TERMINATION – INTRODUCTION
From Chapter 11 in Ben-Ari’s Textbook and materials by G.R. Andrews
GLOBAL PROPERTIES IN A DISTRIBUTED SYSTEM (DS)
- DS conundrum 1: determining time and synchronising clocks
- DS conundrum 2: information in a node changes while “state” information is collected among multiple nodes
- Therefore: not studying simultaneity in DS, but consistency – unambiguous accounting of the state of the system
- 1. Distributed termination – determine whether computations in all nodes have terminated
- 2. (Consistent) snapshot – unambiguously attribute each message to a particular node/channel
TERMINATION – BROADER PERSPECTIVE
- Termination is an important liveness property of programs that are intended to terminate
- Sequential programs do not terminate if they diverge (i.e. do not converge) and run forever
- Concurrent programs can also deadlock (incl. livelock)
- Thus: termination = convergence + deadlock-freedom
THE NEED FOR TERMINATION DETECTION ALGORITHMS
- Termination is a property of the union of the states of all individual processes and all message channels (the “global state”)
- As the “global state” of a distributed system is not visible to a single node, it is not easy to know when all processes have terminated
- Even when all nodes are idle, there might be messages in transit (sent but not yet received) that will unblock receiving nodes
- Several approaches are possible; we will study the Dijkstra-Scholten algorithm and mention some others
DISTRIBUTED TERMINATION – DIJKSTRA-SCHOLTEN ALGORITHM
From Chapter 11 in Ben-Ari’s Textbook
DIJKSTRA-SCHOLTEN ALGORITHM – ASSUMPTIONS
- Change to previous DS assumptions: not every 2 nodes have to be connected directly; nodes only have to form a directed graph
- The termination algorithm consists of additional statements (on top of the regular computation) executed when sending/receiving messages
- Assume a special environment node – it has no incoming edges, all other nodes can be reached from it, it initiates the DS by sending messages (all other nodes are initially inactive) and it is responsible for reporting termination
- A node begins computation after receiving its 1st message (on any edge) and eventually terminates, but can restart on receiving a new message
DISTRIBUTED SYSTEM WITH ENVIRONMENT NODE AND BACK EDGES
- Assume: for every regular edge from i to j there is a back edge from j to i carrying a special type of message called a signal
- Assume: each node is at all times able to receive, process and send signals
DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. VERS. DATA STRUCTURES
- Requirement: for every received message, signal back to the source
- $\mathit{inDeficit}_i[E]$: difference between the number of messages received on incoming edge E of node i and the number of signals sent back
- $\mathit{inDeficit}_i$: sum of $\mathit{inDeficit}_i[E]$ for ALL edges of node i
- $\mathit{outDeficit}_i$: difference between the number of messages sent on ALL outgoing edges of node i and the number of signals received back
- When a node terminates it no longer sends messages, but it can continue sending signals as long as $\mathit{inDeficit}_i[E] \ne 0$ for any edge E
- The DS has terminated when, for the environment node, $\mathit{outDeficit}_{env} = 0$
DIJKSTRA-SCHOLTEN ALGORITHM – PRELIMINARY V. (1/3): SEND/RECEIVE
- Additions to regular sending and receiving of ALL messages
DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. VERS. (2/3): SIGNALS
- Additional new processes (blocked except when their conditions are true)
- send signal does not send the final signal while the node is active! // note this!
DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. VERS. (3/3): ENVIRONMENT
DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. V. CORRECTNESS / LIVENESS
- For simplicity of the proofs only, assume communication is synchronous
- Whether communication is synchronous or asynchronous does not affect correctness, as we assumed that all asynchronous messages are eventually received
- Lemma 11.1: Invariants $\mathit{inDeficit}_i \ge 0$, $\mathit{outDeficit}_i \ge 0$ at each node i; $\sum_{i \in \mathit{nodes}} \mathit{inDeficit}_i = \sum_{i \in \mathit{nodes}} \mathit{outDeficit}_i$
- Theorem 11.2: If the system terminates, the environment node eventually announces termination
- Task for you: Try doing this proof yourself, then read the solution from the textbook (page 242)
DIJKSTRA-SCHOLTEN ALGORITHM – PRELIM. VERS. IS NOT SAFE
- node1 sends to node2 and node3, which then send to each other
- $\mathit{inDeficit}_2 = 2$, $\mathit{inDeficit}_2[e_2] = 1$, $\mathit{inDeficit}_3 = 2$, $\mathit{inDeficit}_3[e_3] = 1$
- By p5 and p6, both node2 and node3 signal node1, so it will have $\mathit{outDeficit}_1 = 0$ and will announce termination before it occurs!
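The unsafe scenario can be replayed with nothing but the deficit counters. The sketch below is a minimal illustration, not the textbook pseudocode: the class name `PrelimNet`, the synchronous delivery and the exact form of the signal guard (any edge with a non-zero deficit may be signalled while inDeficit > 1) are assumptions based on the description above.

```python
from collections import defaultdict


class PrelimNet:
    """Deficit counters of the preliminary version (synchronous delivery assumed)."""

    def __init__(self):
        self.in_deficit = defaultdict(lambda: defaultdict(int))  # [node][source]
        self.out_deficit = defaultdict(int)

    def total_in(self, node):
        return sum(self.in_deficit[node].values())

    def send(self, src, dst):
        self.out_deficit[src] += 1
        self.in_deficit[dst][src] += 1

    def signal(self, node, edge, terminated=False):
        # Assumed guard: while inDeficit > 1, ANY edge with a non-zero
        # deficit may be signalled, including the edge from the environment.
        total = self.total_in(node)
        guard = total > 1 or (
            total == 1 and terminated and self.out_deficit[node] == 0
        )
        if not guard or self.in_deficit[node][edge] == 0:
            raise ValueError("signal not enabled")
        self.in_deficit[node][edge] -= 1
        self.out_deficit[edge] -= 1


def unsafe_scenario():
    net = PrelimNet()
    net.send(1, 2)
    net.send(1, 3)    # node1 (environment) starts node2 and node3
    net.send(2, 3)
    net.send(3, 2)    # node2 and node3 send to each other
    net.signal(2, 1)  # enabled because inDeficit of node2 is 2
    net.signal(3, 1)  # enabled because inDeficit of node3 is 2
    return net
```

After `unsafe_scenario()` the environment node has outDeficit 0 and would announce termination, although node2 and node3 have not terminated and each still has an unanswered message.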
DIJKSTRA-SCHOLTEN ALGORITHM – VIRTUAL SPANNING TREE
- The source of the 1st message to arrive at a node is this node’s parent
- Node i waits for: signals from all its children, $\mathit{outDeficit}_i = 0$, its own termination; then sends its last signal to its parent
- Variable parent stores the parent edge (or -1 if it is still unknown)
DIJKSTRA-SCHOLTEN ALGORITHM – FINAL VERSION (1/2)
- Note: no sending of messages before 1st message received
DIJKSTRA-SCHOLTEN ALGORITHM – FINAL VERSION (2/2)
// reset parent; new parent possible if re-activated
// note this!
// last signal always to parent!
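Since the pseudocode itself did not survive on these slides, here is a minimal sketch of the final version as described: the parent is the source of the 1st message, non-final signals never use up the parent edge, and the last signal always goes to the parent. The names `Node`, `send_message`, `send_signal`, `run_signals` and `demo` are hypothetical, delivery is assumed synchronous, and `None` stands in for parent = -1.

```python
class Node:
    def __init__(self, name, is_env=False):
        self.name = name
        self.is_env = is_env
        self.parent = None        # None plays the role of -1 on the slides
        self.in_deficit = {}      # per incoming edge, keyed by source node
        self.out_deficit = 0
        self.terminated = False

    def total_in(self):
        return sum(self.in_deficit.values())


def send_message(net, src, dst):
    sender, receiver = net[src], net[dst]
    if not sender.is_env and sender.parent is None:
        raise ValueError("no sending before the 1st message is received")
    sender.out_deficit += 1
    if receiver.parent is None:           # 1st message: its source becomes parent
        receiver.parent = src
        receiver.terminated = False
    receiver.in_deficit[src] = receiver.in_deficit.get(src, 0) + 1


def send_signal(net, name):
    """Send at most one signal from node `name`; return its target or None."""
    node = net[name]
    if node.is_env:
        return None
    if node.total_in() > 1:
        for edge, count in node.in_deficit.items():
            # Never use up the parent edge with a non-final signal.
            if count > 1 or (count == 1 and edge != node.parent):
                node.in_deficit[edge] -= 1
                net[edge].out_deficit -= 1
                return edge
    elif node.total_in() == 1 and node.terminated and node.out_deficit == 0:
        parent = node.parent
        node.in_deficit[parent] = 0       # last signal always goes to the parent
        node.parent = None                # reset parent; re-activation possible
        net[parent].out_deficit -= 1
        return parent
    return None


def run_signals(net):
    """Keep signalling until no node has an enabled signal."""
    progress = True
    while progress:
        progress = any(send_signal(net, name) is not None for name in net)


def demo():
    net = {n: Node(n, is_env=(n == "env")) for n in ("env", "n2", "n3")}
    send_message(net, "env", "n2")
    send_message(net, "env", "n3")
    send_message(net, "n2", "n3")
    send_message(net, "n3", "n2")
    run_signals(net)
    before = net["env"].out_deficit       # n2 and n3 are still active
    net["n2"].terminated = net["n3"].terminated = True
    run_signals(net)
    return before, net["env"].out_deficit
```

In this replay of the scenario that broke the preliminary version, `demo()` returns `(2, 0)`: the environment's outDeficit stays at 2 while node2 and node3 are active and reaches 0 only after both have terminated.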
DIJKSTRA-SCHOLTEN ALGORITHM – PARTIAL SCENARIO
DIJKSTRA-SCHOLTEN ALGORITHM – DATA STRUCTURES AFTER PARTIAL SCENARIO
- 1 ⇒ 2 in the table means: node1 sends a message to node2
- (parent, inDeficit[E], outDeficit) at each node (Es in order of nodes)
- In the figure: outDeficit within the node in (), inDeficit on the edges
- Task for you: add signals and decisions to terminate (DTTs)
DIJKSTRA-SCHOLTEN ALGORITHM – PARTIAL SCENARIO SOLUTION
DIJKSTRA-SCHOLTEN ALGORITHM – CORRECTNESS / SAFETY
- For a non-environment node: $\mathit{parent} \ne -1 \Leftrightarrow$ node is active
- Lemma 11.3: $\mathit{inDeficit}_i = 0 \Rightarrow \mathit{outDeficit}_i = 0$ is invariant at each non-environment node i
- Lemma 11.4: the parent variables define a spanning tree of the active nodes with the environment node at the root; $\mathit{inDeficit}_i \ne 0$ for each active node
- Theorem 11.2: If the environment node announces termination, the system has terminated
- Task for you: Try doing these proofs yourself, then read the solutions from the textbook (page 246)
DIJKSTRA-SCHOLTEN ALGORITHM – PERFORMANCE
- Problem: the number of additional signals = the number of messages
- This can be a HUGE overhead when a big distributed system shuts down
- Improvement: sending 1 signal instead of N signals on the same edge
- Improvement: initialising all parent variables to point to the environment node
- Task for you: Examine the textbook (page 247) pseudocode for these improvements
- Another problem: a deficit count can exceed the maximum integer
- Solution: credit recovery algorithms
DISTRIBUTED TERMINATION – CREDIT RECOVERY ALGORITHMS
From Chapter 11 in Ben-Ari’s Textbook
CREDIT RECOVERY ALGORITHMS – MAIN IDEAS AND LIMITATIONS
- The environment node starts with weight W=1.0, other nodes with W=0.0
- Every time a message is sent, ½ of the weight stays at the sender and ½ of the weight is “lent” to the receiver
- An active node has W>0.0; when it terminates, it sends all its weight back to the environment node (which awaits W=1.0 to announce termination)
- In Mattern’s algorithm: weight received by an active node is returned immediately to the environment node
- Problem: W becomes very small in big distributed systems
- Solutions: storing only the negative exponent of 2; various data structures
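The weight bookkeeping can be sketched in a few lines. This is an illustration under stated assumptions rather than the textbook pseudocode: the class `CreditRecovery` and its method names are hypothetical, messages and signals are delivered instantly, and exact `Fraction` values stand in for the "negative exponent" representation so that repeated halving loses no precision.

```python
from fractions import Fraction


class CreditRecovery:
    """Weights of all nodes; the total weight in the system is always 1."""

    def __init__(self, nodes, env="env"):
        self.env = env
        self.weight = {n: Fraction(0) for n in nodes}
        self.weight[env] = Fraction(1)

    def send(self, src, dst):
        if self.weight[src] == 0:
            raise ValueError("only a node holding weight may send")
        self.weight[src] /= 2                 # half stays at the sender
        lent = self.weight[src]               # the other half travels
        if self.weight[dst] > 0:
            # Receiver is already active: return the weight to the
            # environment node immediately (Mattern's variant).
            self.weight[self.env] += lent
        else:
            self.weight[dst] = lent           # receiver becomes active

    def terminate(self, node):
        if node == self.env:
            raise ValueError("the environment node only collects weight")
        self.weight[self.env] += self.weight[node]
        self.weight[node] = Fraction(0)

    def announced(self):
        return self.weight[self.env] == 1
```

For example, after `send("env", "a")`, `send("a", "b")` and `send("env", "b")` the weights are ½ (env), ¼ (a) and ¼ (b); the environment node can announce termination only once both `a` and `b` have returned their weight.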
CREDIT RECOVERY ALGORITHM FOR ENVIRONMENT NODE
CREDIT RECOVERY ALGORITHM FOR NON-ENVIRONMENT NODE (1/2)
CREDIT RECOVERY ALGORITHM FOR NON-ENVIRONMENT NODE (2/2)
- Note: non-environment node never receives signal
DISTRIBUTED TERMINATION DETECTION IN RING NETWORKS
Material from G.R. Andrews and K. Engelhardt
TERMINATION DETECTION IN A RING (NOT IN TEXTBOOK) – PRELIMINARIES
- T[1:n] are processes (tasks) forming a ring, ch[1:n] are channels – T[i] receives from ch[i], sends to ch[i%n+1]
- Assume messages are received by the neighbour in the order sent
- Idle process – terminated or waiting at a receive statement (but a process is active if waiting for I/O)
- After receiving a message, an idle process becomes active
- A distributed program has terminated if every process is idle AND no messages are in transit
TERMINATION DETECTION IN A RING (NOT IN TEXTBOOK) – MAIN IDEAS
- 1 token is passed in special messages over the comm. channels used in regular distributed system communications
- A process passes the token when it is idle (this continues even after a process has terminated)
- A process that receives the token is also idle (otherwise it would be processing regular communications)
- Thus, upon receiving the token a process sends it to its neighbour and then waits to receive another message
- After the token passes the whole circle, every process is idle and no messages are in transit (due to their ordering)
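A small simulation makes the "whole circle" argument concrete. The sketch below is an assumption-laden illustration: processes are indexed from 0, regular messages carry a hypothetical hop count that stands in for "work", and the `blue` flags (a process is blue if it has received no regular message since it last passed the token) are a refinement recalled from Andrews' presentation rather than something stated on this slide. Without such a flag the initiator could not tell whether it was re-activated while the token was travelling.

```python
from collections import deque

TOKEN = "token"


def detect_termination(n, initial_messages):
    """Simulate a ring of n >= 2 processes.

    initial_messages maps a process index to a list of hop counts; a regular
    message with hop count k > 0 makes its receiver send one with k - 1.
    Returns (token_rounds, messages_in_transit_at_announcement).
    """
    ch = [deque(initial_messages.get(i, [])) for i in range(n)]  # T[i] reads ch[i]
    blue = [False] * n
    blue[0] = True               # T[0] is idle and starts the token
    ch[1].append(TOKEN)
    rounds = 1
    while True:
        for i in range(n):
            if not ch[i]:
                continue         # T[i] is idle, waiting at its receive
            item = ch[i].popleft()
            nxt = (i + 1) % n
            if item != TOKEN:    # regular message: T[i] becomes active
                blue[i] = False
                if item > 0:
                    ch[nxt].append(item - 1)
            elif i == 0 and blue[0]:
                # Token returned and T[0] stayed idle: announce termination.
                return rounds, sum(len(c) for c in ch)
            else:
                blue[i] = True   # idle again: pass the token on
                ch[nxt].append(TOKEN)
                if i == 0:
                    rounds += 1
```

With FIFO channels the token pushes every earlier message ahead of it, so at the announcement no regular message is left in any channel; a message that re-activates the initiator simply costs another round.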
DISTRIBUTED MUTUAL EXCLUSION (REVISITED) – NIELSEN-MIZUNO
From Chapter 10 in Ben-Ari’s Textbook
NIELSEN-MIZUNO TOKEN-PASSING ALGORITHM FOR DME
- The Nielsen-Mizuno token-passing algorithm for DME is based on passing a small token in a set of virtual spanning trees implicitly constructed by the algorithm
- Requires understanding of virtual spanning trees, explained for the Dijkstra-Scholten distributed termination algorithm
- Optional task for you: Revisit textbook Chapter 10 and independently study the Nielsen-Mizuno token-passing algorithm for distributed mutual exclusion (DME)
NIELSEN-MIZUNO ALGORITHM – EXAMPLE DISTRIBUTED SYSTEM
- Assumption: fully connected DS (direct links between all nodes)
NIELSEN-MIZUNO ALGORITHM EXAMPLE – SPANNING TREE
- An arbitrary spanning tree with directed edges pointing to the root
- The root node (double lines in fig.) has the token and is possibly in its CS
- The Nielsen-Mizuno token-passing algorithm is based on passing a small token in a set of implicitly constructed virtual spanning trees
NIELSEN-MIZUNO ALGORITHM EXAMPLE – STEPS (1/3)
- Aaron wants to enter CS: sends (request, Aaron, Aaron) to Becky
- Message format: (request, senderID, originatorID)
- Aaron also sets its parent to 0 to become the future root node
- Becky relays the message (request, Becky, Aaron) to Chloe and sets its own parent to Aaron (parent is always set to senderID)
NIELSEN-MIZUNO ALGORITHM EXAMPLE – STEPS (2/3)
- Chloe is in CS, so it sets deferred to Aaron and parent to Becky
- Generally: the root sets its deferred to originatorID, its parent to senderID
- Evan wants to enter CS, but Chloe is no longer the root and simply relays the message to Becky and sets its parent to Danielle
- Chain of relays until (request, Becky, Evan) arrives at Aaron
- Aaron is a root node without the token
NIELSEN-MIZUNO ALGORITHM EXAMPLE – STEPS (3/3)
- Aaron sets deferred to Evan and parent to Becky
- deferred variables implicitly define a queue of processes waiting to enter CS
- When Chloe exits CS, it sends the token to Aaron, which enters its CS
- When Aaron exits CS, it sends the token to Evan, which enters its CS
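The example can be replayed step by step. The sketch below is a hypothetical reconstruction, not the textbook pseudocode: the class names, the single FIFO queue of in-transit messages and the initial tree (Aaron → Becky → Chloe, Evan → Danielle → Chloe, with Chloe as the root inside its CS) are assumptions inferred from the steps above. `None` stands in for parent = 0, and each request carries its destination in addition to (senderID, originatorID) so that the simulator can route it.

```python
from collections import deque


class Node:
    def __init__(self, parent):
        self.parent = parent      # None plays the role of parent = 0 (root)
        self.deferred = None      # who gets the token when this node exits its CS
        self.holding = False      # True only in a root that is outside its CS


class System:
    def __init__(self, parents, root_in_cs):
        self.nodes = {name: Node(p) for name, p in parents.items()}
        self.queue = deque()              # messages in transit, FIFO
        self.cs_order = [root_in_cs]      # order in which nodes enter their CS

    def request_cs(self, name):
        node = self.nodes[name]
        if node.holding:                  # idle root: enter directly
            node.holding = False
            self.cs_order.append(name)
        else:
            self.queue.append(("request", node.parent, name, name))
            node.parent = None            # become the future root

    def exit_cs(self, name):
        node = self.nodes[name]
        if node.deferred is not None:
            self.queue.append(("token", node.deferred))
            node.deferred = None
        else:
            node.holding = True

    def deliver_all(self):
        while self.queue:
            kind, dest, *rest = self.queue.popleft()
            node = self.nodes[dest]
            if kind == "token":
                self.cs_order.append(dest)        # receiver enters its CS
                continue
            sender, originator = rest
            if node.parent is None:               # root: grant or defer
                if node.holding:
                    self.queue.append(("token", originator))
                    node.holding = False
                else:
                    node.deferred = originator
            else:                                 # relay towards the root
                self.queue.append(("request", node.parent, dest, originator))
            node.parent = sender                  # parent always set to senderID


def slide_example():
    parents = {"Aaron": "Becky", "Becky": "Chloe", "Chloe": None,
               "Danielle": "Chloe", "Evan": "Danielle"}
    s = System(parents, root_in_cs="Chloe")
    s.request_cs("Aaron")
    s.deliver_all()
    s.request_cs("Evan")
    s.deliver_all()
    s.exit_cs("Chloe")
    s.deliver_all()
    s.exit_cs("Aaron")
    s.deliver_all()
    return s
```

Running `slide_example()` yields the CS order Chloe, Aaron, Evan, and the parent pointers end up forming a tree rooted at Evan, the current token holder.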
NIELSEN-MIZUNO ALGORITHM – PSEUDOCODE VARIABLES
- Very memory efficient: only 3 variables needed
- Note: holding is false when entering CS (true only in a root outside its CS)
- Messages are very small: request has 2 parameters, token has 0
- Relatively efficient: request messages might be relayed through many nodes, but token messages are sent directly to originatorID
- Task for you: Write statements that initialise parent fields
NIELSEN-MIZUNO ALGORITHM – PSEUDOCODE FOR Main
NIELSEN-MIZUNO ALGORITHM – PSEUDOCODE FOR Receive
GLOBAL SNAPSHOTS – CHANDY-LAMPORT ALGORITHM
From Chapter 11 in Ben-Ari’s Textbook
GLOBAL SNAPSHOT – DEFINITIONS
- Global snapshot – a consistent recording of the states of all nodes and edges (channels) in a distributed system
- Node state – values of internal variables and the sequences of messages that this node sent and received
- Edge (channel) state – sequence of messages sent on it but not yet delivered
- For a snapshot to be consistent, each message must be in exactly 1 of: sent and in transit OR received
- It is NOT required that all info is gathered at the same time
CHANDY-LAMPORT ALG. – MESSAGES ON A CHANNEL AND SENDING A MARKER
- Chandy-Lamport algorithm assumption: all messages are delivered in the order sent, i.e. using FIFO channels
- Marker – additional message (on each edge), boundary between messages sent before and after the snapshot
- Snapshot if node2 records its state before the marker: node1 state before sending m12, node2 state after receiving m9, edge state m10, m11 (edge state recorded by the receiving node)
CHANDY-LAMPORT ALGORITHM FOR GLOBAL SNAPSHOTS (1/2)
CHANDY-LAMPORT ALGORITHM FOR GLOBAL SNAPSHOTS (2/2)
- Notes: first the environment node sends markers along all its edges; the states of all nodes are sent back using an additional algorithm (not shown)
// array assignment
// array assignment
CHANDY-LAMPORT ALGORITHM – STATES OF NODES AND EDGES
- The 1st marker received initiates recording of state (union of node states)
- For outgoing edges the state is: the number of the last sent message
- For the incoming edge on which the 1st marker was received, there is no additional state (all messages received before the 1st marker are part of the node’s state)
- For other incoming edges, the state is: the difference between the last message received before recording the node’s state (messageAtRecord) and the last message received before the marker on THIS edge (messageAtMarker)
- When the marker has been received on ALL edges, the node records its state, including: stateAtRecord[E], messageAtRecord[E], and messages messageAtRecord[E]+1 to messageAtMarker[E] if messageAtRecord[E]≠messageAtMarker[E]
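The state of an incoming edge follows directly from the two recorded numbers. A minimal sketch, assuming that messages on an edge are numbered consecutively (the helper name `channel_state` is hypothetical):

```python
def channel_state(message_at_record, message_at_marker):
    """Sequence numbers of the messages in transit on one incoming edge.

    These are the messages received after the node recorded its state but
    before the marker arrived on this edge.
    """
    return list(range(message_at_record + 1, message_at_marker + 1))
```

With the earlier marker example, a node that recorded its state after receiving m9 and then received the marker after m11 gets `channel_state(9, 11)`, i.e. messages 10 and 11 on the edge. For the edge on which the 1st marker arrived both numbers are equal, so the edge state is empty.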
CHANDY-LAMPORT ALG. EXAMPLE – MESSAGES AND MARKERS
- First, 3 messages are sent from node1 and all are received at node2
- Then, 3 messages are sent from node1 but not yet received at node3
- Then, 3 messages are sent from node2 but not yet received at node3
- What happens next (!) is shown in the following tables
CHANDY-LAMPORT ALG. EXAMPLE – PART 1/2
ls = lastSent, lr = lastReceived, st = stateAtRecord, rc = messageAtRecord, mk = messageAtMarker (empty or unchanged data structures omitted)
⇒ send, ⇐ receive, M – marker
CHANDY-LAMPORT ALG. EXAMPLE – PART 2/2
CHANDY-LAMPORT ALGORITHM – CORRECTNESS (1/2)
- Theorem 11.6: If the environment node initiates a snapshot, eventually the algorithm displays a consistent snapshot
- Note: the recorded consistent snapshot need not have actually occurred during the computation (as different nodes recorded their state at different times)
- Proof:
- Every node sends a marker to its children, so in a connected graph every node gets a marker and records its state
- Examine the program structure to check whether the recording of message m from node i to j is consistent – there are 4 cases (see the next slide)
CHANDY-LAMPORT ALGORITHM – CORRECTNESS (2/2)
- 1. m sent before marker & received before j recorded state – m in lastReceived, so m is only in the state of j
- 2. m sent before marker & received after j recorded state – m in lastReceived but not in messageAtRecord; when the marker is eventually received m will be up to messageAtMarker, so m will be only in the state of the edge from i to j
- 3. m sent after marker & received before j recorded state – impossible because j recorded its state when it received the marker on a FIFO channel
- 4. m sent after marker & received after j recorded state – m will not be recorded in the state of j nor in the state of the edge from i to j
CHANDY-LAMPORT ALG. – EXAMPLE OF CONSISTENT BUT NOT ACTUAL SNAPSHOT
“There are two nodes, p and q. p records its state before sending any messages and then sends a marker to q followed by message 1. Before receiving the marker, q sends message 2 to p which is received. q now receives the marker and records its state, sending a marker to p. The recorded state is: p has sent no messages; q has sent message 2 and p records that message 2 is in the channel. But this state never occurred, because message 2 was not in the channel when p had yet to send a message. Nevertheless, the snapshot is consistent.”
Source: K.M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems, 3(1):63-75, 1985.
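The quoted scenario can be replayed with the data structures named earlier (lastSent, lastReceived, stateAtRecord, messageAtRecord, messageAtMarker). The sketch below is an illustration under assumptions, not the textbook pseudocode: the classes `SnapshotNode` and `Network` are hypothetical, and messages are numbered per edge, so the quote's "message 2" is message number 1 on the edge from q to p.

```python
from collections import deque

MARKER = "marker"


class SnapshotNode:
    def __init__(self, neighbours):
        self.last_sent = {n: 0 for n in neighbours}
        self.last_received = {n: 0 for n in neighbours}
        self.state_at_record = None      # None: state not yet recorded
        self.message_at_record = None
        self.message_at_marker = {}


class Network:
    def __init__(self, names):
        self.nodes = {n: SnapshotNode([m for m in names if m != n]) for n in names}
        self.channels = {(a, b): deque() for a in names for b in names if a != b}

    def send(self, src, dst):
        node = self.nodes[src]
        node.last_sent[dst] += 1
        self.channels[(src, dst)].append(node.last_sent[dst])

    def record(self, name):
        """Record the node state, then send a marker on every outgoing edge."""
        node = self.nodes[name]
        node.state_at_record = dict(node.last_sent)
        node.message_at_record = dict(node.last_received)
        for dst in node.last_sent:
            self.channels[(name, dst)].append(MARKER)

    def receive(self, src, dst):
        item = self.channels[(src, dst)].popleft()   # FIFO delivery
        node = self.nodes[dst]
        if item == MARKER:
            node.message_at_marker[src] = node.last_received[src]
            if node.state_at_record is None:         # 1st marker starts recording
                self.record(dst)
        else:
            node.last_received[src] = item

    def in_transit(self, src, dst):
        node = self.nodes[dst]
        low = node.message_at_record[src]
        high = node.message_at_marker[src]
        return list(range(low + 1, high + 1))


def replay_two_node_example():
    net = Network(["p", "q"])
    net.record("p")          # p records its state and sends a marker to q
    net.send("p", "q")       # "message 1"
    net.send("q", "p")       # "message 2", sent before q sees the marker
    net.receive("q", "p")    # p receives "message 2"
    net.receive("p", "q")    # q receives the marker, records, sends a marker to p
    net.receive("p", "q")    # q receives "message 1"
    net.receive("q", "p")    # p receives q's marker
    return net
```

After the replay, p has recorded that it sent nothing, q has recorded one sent message, and p attributes that message to the channel from q to p. On each edge, the number of messages recorded as sent equals the number recorded as received plus those recorded in transit, which is what makes the snapshot consistent.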
PARALLEL SCIENTIFIC COMPUTING
Material from K. Engelhardt and G.R. Andrews
PARALLEL SCIENTIFIC COMPUTING
- Variety of applications: computational physics, weather forecasting, designing airplanes (or cars), …
- Driving developments in high-performance computing
- Goal for leveraging parallel systems: speedup on large problems (or solving an even larger problem)
- Starting with a good algorithm and optimized sequential code
- However, the problem solution must be parallelisable
SPEEDUP AND EFFICIENCY
- $T_1$ – time for a sequential program on 1 processor
- $T_N$ – time for a parallel program on N processors
- Speedup: $\mathit{speedup} = \frac{T_1}{T_N}$
- Efficiency: $\mathit{efficiency} = \frac{\mathit{speedup}}{N}$
- Aiming for linear speedup: speedup = N, efficiency = 1
- Typically achieved sublinear: speedup < N, efficiency < 1
- In rare cases (due to cache effects) superlinear: speedup > N, efficiency > 1
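As a quick numerical illustration of the two definitions (the function names and the timings in the usage note are made up for the example):

```python
def speedup(t1, tn):
    """Speedup of a parallel run taking tn over a sequential run taking t1."""
    return t1 / tn


def efficiency(t1, tn, n):
    """Speedup divided by the number of processors n."""
    return speedup(t1, tn) / n
```

A hypothetical program that takes 100 s sequentially and 25 s on 8 processors has `speedup(100, 25)` equal to 4.0 and `efficiency(100, 25, 8)` equal to 0.5, i.e. sublinear speedup.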
IMPEDIMENTS TO SPEEDUP
- Inherently sequential parts (e.g. I/O)
- Load imbalance: some processors busy, some not
- Synchronisation overhead: process creation, communication, critical sections, delays, fork/join, …
- Amdahl’s law: for P, the fraction that can be parallelised, the maximum speedup with ∞ processors is: $\frac{1}{1-P}$
- Maximum speedup with N processors: $\frac{1}{(1-P) + P/N}$
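Both bounds are easy to evaluate. A minimal sketch (the function names are hypothetical; `p` is the parallelisable fraction P and `n` the number of processors N):

```python
import math


def amdahl_speedup(p, n):
    """Maximum speedup with n processors when a fraction p can be parallelised."""
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p):
    """Maximum speedup with infinitely many processors (requires p < 1)."""
    return 1.0 / (1.0 - p)
```

For P = 0.9 and N = 10 the bound is 1 / (0.1 + 0.09), about 5.26, and no number of processors can push the speedup past roughly 10, because the sequential 10% dominates.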
3 COMMON PROBLEM CLASSES IN PARALLEL SCIENTIFIC COMPUTING
- Some deterministic vs. some stochastic problems/solutions
- Grid method applications: weather, fluid/air flow, plasma, …
- Particle computation applications: gravity, electrical charge, …
- Matrix computation applications: engineering, economics, …
GRID METHOD
- Numerical solutions to partial differential equations and in other applications (e.g. signal processing)
- Domain decomposition: divide the area into blocks or strips of points; assign a worker process to each
- Approximating the solution for continuous systems using a finite number of points and iterative numerical methods
- E.g. the finite-difference method for Laplace’s equation
- Iterative techniques (slow to fast): Jacobi iteration, Gauss-Seidel, successive over-relaxation (SOR, red/black), multigrid
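As an illustration of the slowest technique in the list, here is a sequential sketch of Jacobi iteration for Laplace's equation on a rectangular grid with fixed boundary values. The function name and the list-of-lists grid layout are assumptions made for the example; a parallel version would give each worker a strip or block of interior points and place a barrier between iterations.

```python
def jacobi(grid, iterations):
    """Return a new grid after the given number of Jacobi sweeps.

    Boundary points keep their values; each interior point becomes the
    average of its four neighbours from the PREVIOUS sweep.
    """
    rows, cols = len(grid), len(grid[0])
    for _ in range(iterations):
        new = [row[:] for row in grid]
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                new[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                             + grid[i][j - 1] + grid[i][j + 1]) / 4.0
        grid = new
    return grid
```

Because every new value depends only on values from the previous sweep, all interior points can be updated independently, which is what makes Jacobi easy to parallelise despite its slow convergence.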
PARTICLE COMPUTATION – THE GRAVITATIONAL N-BODY PROBLEM
- Modelling discrete systems consisting of moving particles that interact by exerting forces on each other
- A body (i.e. particle) is characterised by:
- Gravitational force causes the bodies to accelerate and to move
- Gravity’s magnitude is symmetric: $F = G \frac{m_i m_j}{r^2}$
- Gravity’s direction is a vector from one body to another
THE GRAVITATIONAL N-BODY PROBLEM – SIMULATION
- Simulated by stepping through discrete instants of time
- At each step, calculate the total force (sum of forces by all other bodies) on each body and update its acceleration ($a_i = \frac{F}{m_i}$), velocity (using a differential equation) and position (an integral)
initialise bodies;
for [time = start to finish by DT] {
  calculate forces;
  move bodies;
}
- Simplifying assumptions: all bodies on one spatial plane (2D), constant acceleration during the time interval, leapfrog scheme (½ change in position due to old velocity, ½ due to new velocity)
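The simulation loop can be fleshed out as a sequential sketch under the stated simplifications (2D, constant acceleration within a step, leapfrog update). The function names, the list-based data layout and the parameter `g` for the gravitational constant are assumptions made for the example.

```python
import math


def calculate_forces(pos, mass, g=6.67e-11):
    """Total force on each body from all other bodies (all pairs)."""
    n = len(pos)
    force = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r = math.hypot(dx, dy)
            magnitude = g * mass[i] * mass[j] / (r * r)
            fx, fy = magnitude * dx / r, magnitude * dy / r
            force[i][0] += fx        # body i is pulled towards body j
            force[i][1] += fy
            force[j][0] -= fx        # and body j towards body i (symmetry)
            force[j][1] -= fy
    return force


def move_bodies(pos, vel, mass, force, dt):
    """Leapfrog step: half the move uses the old velocity, half the new one."""
    for i in range(len(pos)):
        for k in (0, 1):
            dv = force[i][k] / mass[i] * dt
            pos[i][k] += (vel[i][k] + dv / 2) * dt
            vel[i][k] += dv


def simulate(pos, vel, mass, dt, steps, g=6.67e-11):
    for _ in range(steps):
        force = calculate_forces(pos, mass, g)
        move_bodies(pos, vel, mass, force, dt)
```

Keeping the force calculation and the movement as two separate phases is what later allows a parallel version to place a barrier between them instead of using critical sections.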
THE GRAVITATIONAL N-BODY PROBLEM – PARALLELISATION
- Sequential algorithm: $O(n^2)$ for $n$ bodies
- Various approaches to parallelisation are possible: shared memory vs. message passing (manager/workers, heartbeat, pipeline, …)
- Divide bodies among workers and use a barrier after each phase
- Basic algorithm: Compute all pairs of forces – still $O(n^2)$
- Minimise overheads: Create workers once, avoid critical sections by separating the calculate phase and the move phase, load balancing
- Improved algorithms: Barnes-Hut: hierarchical, $O(n \log n)$; Fast Multipole Method (FMM): hierarchical, $O(n \log n)$ or better
THE GRAVITATIONAL N-BODY PROBLEM – COMPARING PARALLEL SOLUTIONS
- Even when implementing the same mathematical calculations, concurrent/parallel programs can differ in several ways:
- ease of programming,
- load balancing,
- number of messages,
- size of messages,
- amount of locally stored data, …
- G.R. Andrews’ textbook (hyperlink) critically compares 3 message passing programs (manager/workers, heartbeat, pipeline)
MATRIX COMPUTATION
- Solving systems of linear equations
- Dense vs. sparse matrices
- Gaussian elimination, lower-upper (LU) decomposition
- In general: data parallelism (processes do the same thing on different parts of the data, execute synchronously in lock step) instead of task parallelism (processes run independently, execute asynchronously)
- Examples of data parallelism: transputer, GPU
NEXT TIME… (PREVIEW HIGHLIGHTS)
From additional material NOT in the textbook!
MAIN TOPICS IN THE NEXT LECTURE… (BY LIAM O’CONNOR-DAVIS)
- Course revision
- Additional topics clarifying some important issues
- Exam preparation!