SLIDE 1

Technische Universität München

Parallel Programming and High-Performance Computing

Part 6: Dynamic Load Balancing

  • Dr. Ralf-Peter Mundani

CeSIM / IGSSE

SLIDE 2

  • Dr. Ralf-Peter Mundani - Parallel Programming and High-Performance Computing - Summer Term 2008

6 Dynamic Load Balancing

Overview

  • definitions
  • examples of load balancing strategies
  • space filling curves
  • swarm intelligence

Computers make it easier to do a lot of things, but most of the things they make it easier to do don’t need to be done. —Andy Rooney

SLIDE 3

6 Dynamic Load Balancing

Definitions

  • motivation

– central issue: fair distribution of computations across all processors / nodes in order to optimise

  • run time (user’s point of view)
  • system load (computing centre’s point of view)

– so far: division of a problem into a fixed number of processes to be executed in parallel
– problem

  • amount of work is often not known prior to execution
  • load situation changes permanently (adaptive mesh refinement within numerical simulations, I/O, searches, …)

  • different processor speeds (heterogeneous systems, e. g.)
  • different latencies for communication (grid computing, e. g.)

– objective: load distribution or load balancing strategies

SLIDE 4

6 Dynamic Load Balancing

Definitions

  • static load balancing

– to be applied before the execution of any process (in contrast to dynamic load balancing, which is applied during execution)
– usually referred to as the mapping problem or scheduling problem
– potential static load-balancing techniques

  • round robin: assigning tasks (a more general formulation than work, covering both data and function parallelism) in sequential order to processes, coming back to the first when all processes have been given a task
  • randomised: selecting processes at random to assign tasks
  • recursive bisection: recursive division of problems into smaller subproblems of equal computational effort with less communication costs
  • genetic algorithm: finding an optimal distribution of tasks according to a given objective function
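The first two techniques are simple enough to sketch directly. The following Python snippet is an illustrative sketch only; the function names and the task representation are hypothetical, not from the lecture:

```python
import random

def round_robin(tasks, num_procs):
    """Assign tasks to processes in sequential order, coming back to
    the first process once every process has been given a task."""
    assignment = {p: [] for p in range(num_procs)}
    for i, task in enumerate(tasks):
        assignment[i % num_procs].append(task)
    return assignment

def randomised(tasks, num_procs, rng=None):
    """Assign each task to a process selected uniformly at random."""
    rng = rng or random.Random()
    assignment = {p: [] for p in range(num_procs)}
    for task in tasks:
        assignment[rng.randrange(num_procs)].append(task)
    return assignment
```

Round robin guarantees that the per-process task counts differ by at most one; the randomised variant only achieves that balance in expectation.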

SLIDE 5

6 Dynamic Load Balancing

Definitions

  • static load balancing (cont’d)

– mapping should reflect the communication pattern of processes in case of static network topologies when assigning tasks, i. e. short communication paths between processors / nodes are to be preferred (NP-complete problem)
– missing knowledge about execution times of various parts of a program might lead to very inaccurate mappings
– communication delays that vary under different circumstances are difficult to incorporate with static load balancing
– algorithms might have an indeterminate number of steps to reach their solutions (traversing a graph in search algorithms, e. g.)
– hence, different approaches are needed to overcome the mentioned problems

SLIDE 6

6 Dynamic Load Balancing

Definitions

  • dynamic load balancing

– division of tasks dependent upon the execution of parts of the program as they are being executed
– entails additional overhead (to be kept as small as possible, else bureaucracy wins)
– assignment of tasks to processes can be classified as

  • centralised

– tasks are handed out from a centralised location
– within a master-slave structure, one dedicated master process directly controls each of a set of slave processes

  • decentralised

– tasks are passed between arbitrary processes
– worker processes operate upon the problem and interact among themselves; a worker process may receive tasks from other worker processes or send tasks to them

SLIDE 7

6 Dynamic Load Balancing

Definitions

  • centralised dynamic load balancing

– example: work pool

  • master process holds a collection of tasks to be performed by the slave processes
  • tasks are sent to slave processes
  • when a task is completed, a slave process requests another task from the master process
  • all slaves are the same (replicated worker), but specialised slaves capable of performing certain tasks are also possible

(figure: work pool, i. e. a master process holding a queue of tasks and serving several slave processes)
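The work pool can be simulated sequentially in a few lines. The sketch below is hypothetical (names and the `process` callback are illustrative): the master hands out one task per slave and refills each slave on request until the queue is empty:

```python
from collections import deque

def work_pool(tasks, num_slaves, process):
    """Centralised work pool: the master hands one task at a time to
    each requesting slave; a finished slave immediately requests the
    next task from the master."""
    queue = deque(tasks)
    results = {s: [] for s in range(num_slaves)}
    # initial hand-out: one task per slave (as long as tasks remain)
    active = {s: queue.popleft() for s in range(num_slaves) if queue}
    while active:
        # each active slave completes its task and requests another
        for s in list(active):
            results[s].append(process(active.pop(s)))
            if queue:
                active[s] = queue.popleft()
    return results
```

In a real implementation the requests would be messages (MPI, e. g.) and the slaves would run concurrently; the sequential simulation only illustrates the hand-out / request cycle.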

SLIDE 8

6 Dynamic Load Balancing

Definitions

  • centralised dynamic load balancing (cont’d)

– work pool techniques can also be readily applied when

  • tasks are quite different and of different size (in general, it is best to hand out larger tasks first to prevent idle waiting)
  • the amount of tasks may change during execution, i. e. execution of one task might generate new tasks (to be submitted to the master)

– computation terminates if both of the following are satisfied
  1) the task queue is empty
  2) every process has made a request for another task without any new tasks being generated
  (even if (1) is true, a still running process may provide new tasks for the task queue)
– a slave may detect the program termination condition by some local termination condition (searches, e. g.), hence it has to send a termination message to the master for closing down all others

SLIDE 9

6 Dynamic Load Balancing

Definitions

  • decentralised dynamic load balancing

– example: distributed work pool

  • drawback of the centralised model: the master might become a bottleneck in case of too many slaves
  • hence, the work pool is distributed among several masters
  • each master controls one group of slaves
  • several layers of decomposition are possible, building up a tree hierarchy with tasks being passed downwards and requests / messages being passed upwards

(figure: masters M0 … MN, each with its own work pool and group of slaves)

SLIDE 10

6 Dynamic Load Balancing

Definitions

  • decentralised dynamic load balancing (cont’d)

– example: fully distributed work pool

  • once tasks are (initially) distributed among processes (that moreover are able to generate new tasks), all processes can execute tasks from each other
  • tasks could be transferred by a

– receiver-initiated method: a process that has only few or no tasks to perform requests tasks from other processes it selects (works well at high system loads)
– sender-initiated method: a process with heavy load sends tasks to other processes it selects and that are willing to accept them (works well at light overall system loads)

  • in general, avoid passing on the same task that is received
  • which one to prefer, what kind of flaws do they have?
SLIDE 11

6 Dynamic Load Balancing

Definitions

  • load models

– decisions of any strategy about passing / requesting tasks are based on the local load
– problem: measurement of the load
– reliable load models are based upon load indices

  • simple and composite load indices (one or more quantities)
  • might refer to different functional units (CPU, bus, memory, …)
  • snapshot or integrated or averaged quantities
  • stochastic quantities to reflect external influences

– properties of a good load index

  • precisely reflects the target quantity at present
  • allows for accurate predictions concerning the future
  • smoothing behaviour to compensate peaks
  • based upon some simple formula, easy to compute
SLIDE 12

6 Dynamic Load Balancing

Definitions

  • termination detection

– recognising that the computation has come to an end can be a significant problem in decentralised dynamic load balancing
– distributed termination at time T requires the following conditions to be satisfied (BERTSEKAS and TSITSIKLIS, 1989)
  1) application-specific local termination conditions exist throughout the collection of processes at time T
  2) no messages are in transit between processes at time T
– difference to centralised termination conditions: messages in transit have to be taken into account, because a message in transit might restart an already terminated process
– problem: how to recognise a message in transit; waiting for a long enough period of time to allow any message in transit to arrive is not to be favoured

SLIDE 13

6 Dynamic Load Balancing

Definitions

  • termination detection (cont’d)

– acknowledgement messages (1)

  • each process is in one of the two states active or inactive
  • initially, without any task to perform, a process is inactive
  • upon receiving a task it changes to the active state and the sending process becomes its “parent” (it can itself become a parent by passing a task to another inactive process, thus creating a tree of process hierarchies)
  • an active process can receive more tasks from other active processes while it is in the active state, but these processes do not become its parent
  • an acknowledgement message (ACK) from another process is expected whenever passing a task to that process
  • whenever receiving a task from a process, an immediate ACK is sent, except if receiving a task from its parent process

SLIDE 14

6 Dynamic Load Balancing

Definitions

  • termination detection (cont’d)

– acknowledgement messages (2)

  • an ACK to the parent process is only sent in case a process is ready to become inactive, i. e. when
  1) a local termination condition exists (all tasks are finished)
  2) all ACKs for received tasks have been sent
  3) all ACKs for tasks passed to others have been received
  • due to (3) a process must become inactive before its parent
  • computation can terminate when the first process becomes idle

(figure: parent process passes the first task to other processes; further tasks and ACKs are exchanged until the final ACK returns to the parent; A = active, I = inactive)

SLIDE 15

6 Dynamic Load Balancing

Definitions

  • termination detection (cont’d)

– ring termination (1)

  • processes are (logically) organised in a ring structure
  • the single-pass ring termination algorithm is defined as follows
  1) when P1 terminates, it generates a token that is passed to P2
  2) when Pi (1 < i ≤ N) receives the token, it waits for its local termination condition (or has already terminated) and then passes the token onward to Pi+1 (PN passes the token to P1)
  3) when P1 receives the token, it knows all processes have terminated; a message can be sent to all processes informing them of global termination, if necessary
  • the algorithm assumes that a process cannot be reactivated after reaching its local termination condition
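Under that assumption, the token's round trip can be illustrated with a purely logical time model. The sketch below is hypothetical (the function name and the idea of representing each process by its local termination time are illustrative): the token leaves P1 at P1's termination time and each Pi forwards it only once it has both received it and terminated:

```python
def ring_termination(local_done_times):
    """Single-pass ring termination: returns the logical time at which
    the token returns to P1, i.e. when P1 learns of global termination.
    Assumes no process is reactivated after its local termination."""
    # token leaves P1 at P1's own termination time
    token_time = local_done_times[0]
    for done in local_done_times[1:]:
        # Pi forwards the token once it has received it AND terminated
        token_time = max(token_time, done)
    # token is back at P1: all processes have terminated
    return token_time
```

The result is simply the latest local termination time along the ring, which is exactly the earliest moment global termination can be detected in this model.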

SLIDE 16

6 Dynamic Load Balancing

Definitions

  • termination detection (cont’d)

– ring termination (2)

  • the dual-pass ring termination algorithm can handle processes being reactivated but requires two passes around the ring
  • here, tokens and processes are coloured white or black

– if process Pi passes tasks to Pj (j < i) it becomes a black process; otherwise it is a white process
– black processes colour a token black, white processes pass a token on in its original colour (i. e. black or white)
– P1 (when terminated) generates and passes a white token to P2; when P1 receives a black token it passes on a white token again, otherwise (white token received) all processes have terminated

(figure: ring of processes P1 … Pi … PN; Pi passing tasks back to Pj colours the process black)

SLIDE 17

6 Dynamic Load Balancing

Definitions

  • strategy selection

– not all strategies are appropriate for every problem
– crucial task: how to find the best strategy for a given problem
– main aspects to be considered

  • objective: optimisation of load or run time
  • level of integration: OS, runtime system (MPI, e. g.), application
  • units to distribute: process / thread, parts of program, data, …

– further aspects

  • static / dynamic strategies, central / decentral strategies
  • source of initiative: idle slave, overloaded slave, master, …
  • costs of the chosen strategy (computation should dominate load distribution and not vice versa)

  • placement of new processes or real process migration
SLIDE 18

6 Dynamic Load Balancing

Overview

  • definitions
  • examples of load balancing strategies
  • space filling curves
  • swarm intelligence
SLIDE 19

6 Dynamic Load Balancing

Examples of Load Balancing Strategies

  • diffusion model (a. k. a first order scheme)

– analogy to physical processes in nature (salt or ink in water, e. g.)
– original algorithm introduced by CYBENKO (1989) for static network topologies; meanwhile it has often been studied and extended (second order scheme, dynamic network topologies, e. g.)
– idea: a process Pi balances its load simultaneously with all its neighbours N(i); a ratio αij of the load difference between processes Pi and Pj is swapped between them according to

  w_i(t+1) = w_i(t) − Σ_{j ∈ N(i)} α_ij · (w_i(t) − w_j(t)),   1 ≤ i ≤ n,  −1 < α_ij < 1

  where w_i(t) is the workload done by process Pi at time t

– various methods can be found that determine the parameter α_ij, such as

  • optimal choice: needs global knowledge of the network
  • BOILLAT choice: needs only local knowledge of the neighbours

SLIDE 20

6 Dynamic Load Balancing

Examples of Load Balancing Strategies

  • diffusion model (cont’d)

– update of workload can be done
  a) after all balancing factors have been computed (JACOBI-like)
  b) during the computation of balancing factors (GAUSS-SEIDEL-like)
– example: first two iteration steps according to method a) for a 2D grid with a ratio of α = 0.25 for workload swapping

(figure: grid of workload values at the initial setup (t = 0), after the first step (t = 1), and after the second step (t = 2))
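Method a) is straightforward to sketch in code. The example below is illustrative only: it uses a 1D chain of processes instead of the slide's 2D grid (the topology and the initial load vector are assumptions), but the update rule is the same JACOBI-like diffusion step with α = 0.25:

```python
def diffusion_step(load, alpha=0.25):
    """One JACOBI-like diffusion step on a 1D chain of processes:
    every process swaps the ratio alpha of the load difference with
    each neighbour, based only on the loads of the previous step."""
    n = len(load)
    new = list(load)
    for i in range(n):
        for j in (i - 1, i + 1):      # neighbours N(i) in the chain
            if 0 <= j < n:
                new[i] -= alpha * (load[i] - load[j])
    return new

# all load initially on one process; iterate a few steps
load = [32.0, 0.0, 0.0, 0.0]
for _ in range(20):
    load = diffusion_step(load)
```

Because every swapped amount leaves one process and enters another, the total workload is conserved exactly, while the imbalance decays geometrically with the number of steps.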

SLIDE 21

6 Dynamic Load Balancing

Examples of Load Balancing Strategies

  • bidding (economic model)

– analogy to mechanisms of price fixing in markets
– idea

  • a process (with high workload) advertises tasks to its neighbours
  • neighbours submit their free resources as bids
  • the process with the highest bid (i. e. largest free resources) wins

– remarks

  • several rounds of bidding may be necessary, successively extending the range of bidders
  • in case of sudden workload peaks, a process might reject the purchased tasks
  • processes with free resources are still allowed to ask for tasks

– drawback: quite complex analysis of this model

SLIDE 22

6 Dynamic Load Balancing

Examples of Load Balancing Strategies

  • balanced allocation (balls into bins)

– basic idea: placing N balls into N bins at random choice (an extensively studied problem from probability and statistics)
– variant of the above

  • each ball comes with D possible destinations (to be placed), chosen independently and uniformly at random
  • then the ball is placed in the least full bin among the D possible destinations

– applied to load balancing: a process selects D processes at random and passes some of its workload to the least loaded one
– for temporary tasks (i. e. tasks that are finished at unpredictable times) this strategy has a competitive ratio of O(√N) compared to the optimal off-line strategy (that has global knowledge)
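The D-choice placement is easy to sketch and its effect is easy to observe. The snippet below is illustrative (function name and the comparison of D = 1 against D = 2 are assumptions, not from the slides):

```python
import random

def balanced_allocation(n_balls, n_bins, d, rng):
    """Place each ball into the least full of d bins chosen
    independently and uniformly at random (d = 1 is the classic
    random placement without any choice)."""
    bins = [0] * n_bins
    for _ in range(n_balls):
        candidates = [rng.randrange(n_bins) for _ in range(d)]
        target = min(candidates, key=lambda b: bins[b])
        bins[target] += 1
    return bins

rng = random.Random(1)
single = balanced_allocation(10000, 10000, 1, rng)   # no choice
double = balanced_allocation(10000, 10000, 2, rng)   # "power of two choices"
```

Already D = 2 typically reduces the maximum bin load noticeably compared to D = 1, which is the intuition behind passing workload to the least loaded of D randomly selected processes.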

SLIDE 23

6 Dynamic Load Balancing

Examples of Load Balancing Strategies

  • broker system

– origin of the idea: brokers at the stock exchange
– designed and especially well-suited for hierarchical topologies
– idea

  • each processor has one broker with local knowledge (about the workload in its subtree, e. g.)
  • tasks arrive at the local broker (via an application server) and are, depending on the available budget, processed locally or passed to the parent node
  • on some level (at least at the root node), a price-based decision and allocation is made
  • prices have to be paid for using remote resources as well as for the broking itself; local computations are cheaper

– flexible strategy for hierarchical and heterogeneous topologies

SLIDE 24

6 Dynamic Load Balancing

Examples of Load Balancing Strategies

  • random matching

– origin of the idea: graph theory
– principle

  • construct a matching in the topology graph G = (V, E) of the network (the set of vertices V are the processors, the set of edges E are the direct connections between processors)
  • matching: subset of edges such that no two edges share a common vertex
  • perfect load balancing along all edges of the matching

– this is an iterative strategy, hence several steps are necessary
– the matching must be found in parallel

  • start with an empty set of edges in each vertex
  • local selection (by chance) of one incident edge in each vertex
  • coordination with neighbouring vertices, resolution of conflicts
SLIDE 25

6 Dynamic Load Balancing

Examples of Load Balancing Strategies

  • precalculation of the load

– all strategies so far are based on local information only
– hence, load balancing is often quite expensive, since (from a global point of view) balancing steps do not always lead to a better load distribution among the processors
– idea

  • global determination of the workload at program start or at certain points in time
  • global determination of an appropriate load distribution
  • workload transfer with less communication

– developed and used for hierarchical network topologies; workload recording and load balancing between child and parent nodes

SLIDE 26

6 Dynamic Load Balancing

Overview

  • definitions
  • examples of load balancing strategies
  • space filling curves
  • swarm intelligence
SLIDE 27

6 Dynamic Load Balancing

Space Filling Curves

  • definition

– origin of the idea: analysis and topology (“topological monsters”)
– nice example of a construct from pure mathematics that gained practical relevance only decades later
– definition of a space filling curve (SFC)

  • curve: image of a continuous mapping f: [0,1] → [0,1]^D
  • SFC: continuous, surjective mapping f: [0,1] → [0,1]^D that covers an area (with a JORDAN content) greater than zero

– prominent representatives

  • HILBERT’s SFC (1891): most famous SFC
  • PEANO’s SFC (1890): oldest SFC
  • LEBESGUE’s SFC: most important SFC for computer science

– further reading: H. Sagan, Space-Filling Curves, Springer (1994)
– nice applet: http://www.cs.utexas.edu/users/vbb/misc/sfc/

SLIDE 28

6 Dynamic Load Balancing

Space Filling Curves

  • HILBERT’s space filling curve

– for reasons of simplicity only in 2D: f: I = [0,1] → [0,1]^2 = Q
– the construction of the SFC follows the geometric conception: if I can be mapped onto Q in the space filling sense, then each of the four congruent subintervals of I can be mapped to one of the four quadrants of Q in the space filling sense, too
– recursive application of the above preserves

  • neighbourhood relations: neighbouring subintervals in I are mapped onto neighbouring subsquares of Q
  • subset relations (inclusion): from I1 ⊆ I2 follows f(I1) ⊆ f(I2)

– border case: HILBERT’s SFC

SLIDE 29

6 Dynamic Load Balancing

Space Filling Curves

  • HILBERT’s space filling curve (cont’d)

– the correspondence of nested intervals in I and nested squares in Q provides pairs of points in I with corresponding image points in Q
– of course, the iterative steps in this generation process are of practical relevance, not the border case itself
  1) starting with a generator or “Leitmotiv” that defines the order in which the subsquares are visited
  2) applying the generator in each subsquare (with appropriate similarity transformations if necessary)
  3) connecting the open ends

(figure: generator for HILBERT’s SFC)
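The three steps above can be expressed with the standard iterative index-to-coordinate conversion for HILBERT's curve. This is a common textbook formulation, not taken from the slides; the rotation and reflection inside the loop correspond to the similarity transformations of step 2:

```python
def d2xy(n, d):
    """Map index d (0 <= d < n*n) along HILBERT's curve to the cell
    (x, y) of an n x n grid, n a power of two."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:               # reflect the quadrant
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x               # rotate the quadrant
        x += s * rx                   # move into the correct quadrant
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

Running `d2xy(8, d)` for d = 0 … 63 visits every cell of the 8×8 grid exactly once, and consecutive indices always land in grid-adjacent cells, which is precisely the neighbourhood-preservation property claimed above.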

SLIDE 30

6 Dynamic Load Balancing

Space Filling Curves

  • HILBERT’s space filling curve (cont’d)

– classical version of HILBERT

(figure: first three iterations of HILBERT’s SFC; the subsquares are visited in the orders 1–4, 1–16, and 1–64)

SLIDE 31

6 Dynamic Load Balancing

Space Filling Curves

  • HILBERT’s space filling curve (cont’d)

– variant of MOORE
– modulo symmetry, these are the only two possibilities

(figure: first three iterations of MOORE’s variant of the curve)

SLIDE 32

6 Dynamic Load Balancing

Space Filling Curves

  • HILBERT’s space filling curve (cont’d)

– all iterations are injective, but HILBERT’s SFC itself is not injective (there are image points with more than one original point in I)
– important precondition: there exists a bijective mapping between two finite-dimensional smooth manifolds (CANTOR, 1878), but it cannot be both bijective and continuous (NETTO, 1879)

SLIDE 33

6 Dynamic Load Balancing

Space Filling Curves

  • PEANO’s space filling curve

– ancestor of all SFCs
– subdivision of I and Q into nine congruent subdomains
– definition of a generator that, again, defines the order of visit

(figure: generator for PEANO’s SFC; the nine subsquares of Q are visited in the order
  3 4 9
  2 5 8
  1 6 7 )

SLIDE 34

6 Dynamic Load Balancing

Space Filling Curves

  • PEANO’s space filling curve (cont’d)

– now there are (modulo symmetry) 273 different possibilities to recursively apply the generator while preserving neighbourhood and inclusion

(figure: serpentine type (left and centre) and meander type (right))

SLIDE 35

6 Dynamic Load Balancing

Space Filling Curves

  • LEBESGUE’s space filling curve

– definition of LEBESGUE’s SFC via the CANTOR set
– CANTOR set C: repeatedly deleting the open middle thirds of [0,1]
– C is defined as the set of points not excluded, hence the measure of the remaining set can be computed from the total length removed

  1/3 + 2/9 + 4/27 + 8/81 + … = Σ_{N=0}^{∞} 2^N / 3^(N+1) = (1/3) · 1 / (1 − 2/3) = 1

– the proportion of the remaining interval thus seems to be 1 − 1 = 0, but in fact C has the same cardinality as the unit interval [0,1] (!)

SLIDE 36

6 Dynamic Load Balancing

Space Filling Curves

  • LEBESGUE’s space filling curve (cont’d)

– nested intervals of C can be represented by ternary numbers of the form 03.w1w2w3… with wi ∈ {0, 1, 2}
– example: parameter T = 2/9 can be written via the nested intervals [0,1], [03.0, 03.1], [03.02, 03.10], [03.020, 03.021], [03.0200, 03.0201], …
– since all interval borders can be written in two different ways (13.0 or 03.222…, e. g.) and the middle third ([03.1, 03.2], e. g.) is repeatedly deleted, the CANTOR set only contains ternary numbers that consist of the digits “0” and “2”

(figure: the unit interval from 03.0 to 13.0 with the removed middle third [03.1, 03.2] and the nested interval [03.0200, 03.0201])

SLIDE 37

6 Dynamic Load Balancing

Space Filling Curves

  • LEBESGUE’s space filling curve (cont’d)

– when mapping C to [0,1]^2 according to

  f( 03.w1 w2 w3 w4 … ) = ( 02.(w1/2)(w3/2)… , 02.(w2/2)(w4/2)… )

  (the ternary digits wi ∈ {0, 2} become binary digits wi/2, distributed alternately to the x and y coordinate)

and connecting the image points via linear interpolation, this results in LEBESGUE’s SFC, also referred to as “Z-order”

SLIDE 38

6 Dynamic Load Balancing

Space Filling Curves

  • LEBESGUE’s space filling curve (cont’d)

– Z-ordering is well-known from quadtrees and octrees when lineariseing a tree by a depth-first traversal (provides a common naming scheme for cells lexicographic or MORTON index) – bitwise interleaving of coordinate values (x, y) leads to Z-value – useful for multidimensional range searches, e. g.

7 42 43 46 47 58 59 62 63 6 40 41 44 45 56 57 60 61 5 34 35 38 39 50 51 54 55 4 32 33 36 37 48 49 52 53 3 10 11 14 15 26 27 30 31 2 8 9 12 13 24 25 28 29 1 2 3 6 7 18 19 22 23 1 4 5 16 17 20 21 x/y 1 2 3 4 5 6 7 x: 02.100 → 4 y: 02.110 → 6 z: 02.110100 → 52
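The bitwise interleaving can be sketched in a few lines. The snippet below reproduces the slide's example; the bit order within each pair (x before y) is inferred from that example, and the function name is illustrative:

```python
def morton(x, y, bits):
    """Z-value (MORTON index) of the point (x, y): the bits of x and y
    are interleaved from the most significant bit downwards, with the
    x bit placed before the y bit in each pair."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z
```

With three bits per coordinate, `morton(4, 6, 3)` interleaves 100 and 110 into 110100, i. e. the Z-value 52 from the slide.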

SLIDE 39

6 Dynamic Load Balancing

Space Filling Curves

  • LEBESGUE’s space filling curve (cont’d)

– compared to the SFCs studied so far, there are several differences

  • both HILBERT’s and PEANO’s SFC are nowhere differentiable, whereas LEBESGUE’s SFC is differentiable almost everywhere
  • both HILBERT’s and PEANO’s SFC are self-similar (close relation to fractals such as KOCH’s snowflake or SIERPIŃSKI’s triangle), but LEBESGUE’s SFC is not self-similar
  • the generation is less complicated than for HILBERT and PEANO

– many applications of this SFC in computer science

SLIDE 40

6 Dynamic Load Balancing

Space Filling Curves

  • applications

– sequentialisation of multidimensional data to one dimension while preserving locality

  • data are strung sequentially like pearls
  • neighbouring points in the unit interval [0,1] have neighbouring images in [0,1]^D

– important applications such as

  • efficient multidimensional range searches in databases (Oracle, e. g.)
  • multi-particle or N-body problems
  • adaptive grid refinement for partial differential equations
  • dynamic load balancing
SLIDE 41

6 Dynamic Load Balancing

Space Filling Curves

  • applications (cont’d)

– example: range search in multidimensional data

  • query range x = [1,6], y = [1,3]; highest Z-value in the range MAX = 45
  • starting from S = 22, e. g., the search goes through all Z-values between S and MAX

(figure: 8×8 grid of Z-values with the query range highlighted)

– to speed up the search, BIGMIN = 33 (the smallest Z-value within the search range above S) is computed; the search then only goes between BIGMIN and MAX
– remark: for searches in the range lower than S, a LITMAX (the highest Z-value within the search range) exists analogously
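The BIGMIN idea can be illustrated with a naive scan over Z-values. Real implementations derive BIGMIN directly from the bit patterns of S and the range corners without scanning; the linear scan below, and all function names, are purely illustrative:

```python
def morton(x, y, bits):
    """Interleave the bits of x (before y in each pair) into a Z-value."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z

def demorton(z, bits):
    """Inverse of morton: recover the coordinates (x, y) of a Z-value."""
    x = y = 0
    for i in reversed(range(bits)):
        x = (x << 1) | ((z >> (2 * i + 1)) & 1)
        y = (y << 1) | ((z >> (2 * i)) & 1)
    return x, y

def bigmin(s, x_range, y_range, bits):
    """Smallest Z-value above s whose point lies inside the query
    rectangle (naive linear scan, for illustration only)."""
    for z in range(s + 1, 1 << (2 * bits)):
        x, y = demorton(z, bits)
        if x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]:
            return z
    return None
```

For the slide's query (x = [1,6], y = [1,3], S = 22) this reproduces BIGMIN = 33, and MAX = 45 is simply the Z-value of the upper query corner (6, 3).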

SLIDE 42

6 Dynamic Load Balancing

Space Filling Curves

  • load balancing

– idea
  1) assign points qi of some iteration of an SFC (i. e. qi ∈ Q) to the points si in the D-dimensional space
  2) linearly order the points qi according to the SFC’s original points pi ∈ I
  3) simple partitioning based on this sequential order of the points pi (while preserving locality) to processors is possible
– two techniques for (1) using LEBESGUE’s SFC (2D case)

  • compute binary numbers of length K from a point si and retrieve the corresponding point pi in ternary representation

    si = ( α/2^K , β/2^K ) → ( 02.x1 x2 … xK , 02.y1 y2 … yK ) → 03.w1 w2 … w2K = pi

    (with the ternary digits w2i−1 = 2·xi and w2i = 2·yi, i. e. the interleaved binary digits doubled)

  • recursively “construct” the Kth iteration of the SFC for all points si and get all corresponding points pi in ternary representation

SLIDE 43

6 Dynamic Load Balancing

Space Filling Curves

  • load balancing (cont’d)

– in both cases, the keys (i. e. the quaternary numbers) already provide the respective positions on I
– steps left
  • sorting of the keys (step 2; the main computational task of the load balancing algorithm)
    – may be costly at the beginning
    – nevertheless, new workload can easily be inserted into the sorted list afterwards
  • partitioning of the workload (step 3)
    – simple “cutting” of the ordered list into equal parts for a fair load distribution
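The “cutting” step can be sketched as follows (`partition` is a hypothetical helper name):

```python
def partition(keys, p):
    """Cut a sorted list of SFC keys into p contiguous parts whose
    sizes differ by at most one (one part per processor)."""
    n = len(keys)
    return [keys[r * n // p:(r + 1) * n // p] for r in range(p)]
```

Since the keys are ordered along the curve, each contiguous part corresponds to a spatially local subdomain; `partition(list(range(10)), 3)` yields parts of sizes 3, 3, and 4.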

SLIDE 44

6 Dynamic Load Balancing

Space Filling Curves

  • quality considerations and costs

– locality
  • continuity guarantees that originals that are close together will be mapped to close image points
  • more important would be a “continuity of the inverse mapping”, but due to missing injectivity this is not possible
  • best properties for HILBERT’s SFC
– load distribution
  • excellent parallelisation properties, almost perfect balancing
  • good efficiency already for small problem sizes
– costs
  • communication along subdomain boundaries, but more complicated boundaries compared to recursive bisection
  • less overhead compared to other strategies
SLIDE 45

6 Dynamic Load Balancing

Overview

  • definitions
  • examples of load balancing strategies
  • space filling curves
  • swarm intelligence
SLIDE 46

6 Dynamic Load Balancing

Swarm Intelligence

  • basics

– origin of the idea: ant colonies in nature
– ants and termites (as well as some bees and wasps) belong to the class of so-called social insects
– main characteristic of the above: building colonies
– ants communicate indirectly via scent, also known as stigmergy (modification of the local environment)
– therefore, ants use pheromones
  • to be left along their paths for others to follow
  • to label bulk material inside the nest
– if an ant finds a path to some food, it will mark it; hence others can follow until they find a better path
– nevertheless, ants may by chance decide not to follow → chance of exploring alternatives

(figure: model of an ant nest)

SLIDE 47

6 Dynamic Load Balancing

Swarm Intelligence

  • basics (cont’d)

– even though a central decision maker is missing, ant colonies have a high grade of structure and organisation → self-organisation
– self-organisation is based on the following properties
  • positive / negative feedback: an ant follows a path or it does not follow a path (due to evaporation of pheromones, e. g.)
  • amplification of deviation: if one ant doesn’t follow and finds some closer food, successively all others will follow the new path
  • mutual communication: key for spreading information among ants and exploiting advantages induced by negative feedback
– hence, ant colonies are often referred to as collective or swarm intelligence
  • they can adapt to their environment and related problems
  • they can adapt their environment according to their demands
SLIDE 48

6 Dynamic Load Balancing

Swarm Intelligence

  • complex adaptive systems

– P2P systems have decentralised control and exhibit extreme dynamism in structure and workload
– problem: classical approaches cannot deal with this dynamism
– hence, a paradigm shift to self-organisation, adaptation, and resilience as fundamental properties is necessary
– complex adaptive systems (CAS), used to explain the behaviour of certain biological and social systems, might provide the basis for this shift
  • they consist of a large number of relatively simple autonomous computing units, i. e. agents
  • they exhibit so-called emergent behaviour, i. e. interaction among agents can give rise to richer and more complex patterns than those generated by single agents (in isolation)
– example of a CAS instance drawn from nature: ant colonies

SLIDE 49

6 Dynamic Load Balancing

Swarm Intelligence

  • complex adaptive systems (cont’d)

– three simple rules for artificial ant colonies (RESNICK, 1994)
  1) an ant wanders around randomly until it encounters an object
  2) if it was carrying an object, it drops the object and continues to wander randomly
  3) if it was not carrying an object, it picks the object up and continues to wander
– independent of the initial distribution of objects, a colony of such ants is able to group the objects into large clusters
– although there are no rules specific to initial conditions, unforeseen scenarios, variations in the environment, or the presence of failures → the global behaviour of large enough colonies is adaptive and resilient
– hence, CAS can achieve these properties without explicitly embedding them into the individual agents
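RESNICK’s three rules can be sketched in a minimal simulation; the one-dimensional toroidal world and all names (`simulate`, the cell/ant layout) are assumptions for illustration only:

```python
import random

def simulate(width, steps, n_objects, n_ants, seed=0):
    """Minimal 1D-torus sketch of RESNICK's rules: ants wander
    randomly (rule 1); a laden ant drops its object onto an object
    pile it encounters (rule 2), an empty-handed ant picks up an
    object it encounters (rule 3). Objects tend to cluster."""
    rng = random.Random(seed)
    cells = [0] * width                      # object count per cell
    for _ in range(n_objects):
        cells[rng.randrange(width)] += 1
    ants = [{"pos": rng.randrange(width), "load": False}
            for _ in range(n_ants)]
    for _ in range(steps):
        for ant in ants:
            ant["pos"] = (ant["pos"] + rng.choice((-1, 1))) % width
            if cells[ant["pos"]] > 0:        # encountered an object
                if ant["load"]:              # rule 2: drop it here
                    cells[ant["pos"]] += 1
                    ant["load"] = False
                else:                        # rule 3: pick it up
                    cells[ant["pos"]] -= 1
                    ant["load"] = True
    return cells, sum(a["load"] for a in ants)
```

Note that no rule refers to the initial distribution: clustering emerges solely from the pick-up/drop asymmetry, and the total number of objects (on cells plus carried) is conserved.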

SLIDE 50

6 Dynamic Load Balancing

Swarm Intelligence

  • anthill system

– proposed by MONTRESOR, MELING, and BABAOĞLU (2002)
– composed of a self-organising overlay network of interconnected nests, where each nest is capable of hosting resources and performing computations
– every node in the system is allowed to generate new tasks and to submit them to the network (tasks may remain at the originator or be transferred to other nests → load balancing)
– ants (autonomous agents) are generated by nests and travel across the nest network to detect unused computational power

(figure: overlay network of five interconnected nests)

SLIDE 51

6 Dynamic Load Balancing

Swarm Intelligence

  • load balancing

– variation of RESNICK’s artificial ant algorithm
  1) when an ant is not carrying any object, it wanders about randomly until it encounters an object and picks it up
  2) when an ant is carrying an object, it drops the object only after having wandered about randomly “for a while” without encountering other objects
– colonies of such ants try to disperse objects (i. e. the actual tasks) uniformly over their environment rather than clustering them
– ants may assume two different states
  • searchMAX: the ant wanders about until it finds an “overloaded” nest; it records the nest’s identifier and switches to searchMIN
  • searchMIN: the ant wanders about looking for an “underloaded” nest; it requests the local task manager to transfer tasks from the over- to the underloaded nest and switches back to searchMAX
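The two-state ant can be sketched as follows; this is a simplification with assumed names (`ant_balance`, `threshold`): a single ant, one task moved per searchMAX/searchMIN pair, and “overloaded”/“underloaded” taken relative to the running average of the loads seen so far:

```python
import random

def ant_balance(loads, threshold, steps, seed=0):
    """One ant alternating between searchMAX and searchMIN,
    moving a single task per located over-/underloaded pair."""
    rng = random.Random(seed)
    state, src, visited = "searchMAX", None, []
    for _ in range(steps):
        nest = rng.randrange(len(loads))     # wander to a random nest
        visited.append(loads[nest])
        avg = sum(visited) / len(visited)    # average of visited loads
        if state == "searchMAX" and loads[nest] > avg + threshold:
            src, state = nest, "searchMIN"   # record overloaded nest
        elif state == "searchMIN" and loads[nest] < avg - threshold:
            loads[src] -= 1                  # request one task transfer
            loads[nest] += 1                 # from src to this nest
            state = "searchMAX"
    return loads
```

The ant never carries tasks itself and needs no global knowledge: the decision uses only the loads of nests it has visited, and the total load is conserved.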

SLIDE 52

6 Dynamic Load Balancing

Swarm Intelligence

  • load balancing (cont’d)

– ants do not transport tasks themselves, to avoid carrying potentially large amounts of data from one node to another while wandering about
– the concepts of overloaded and underloaded nests are relative to the average load of the nests recently visited by an ant → enables ants to make decisions about task transfers between nests with unbalanced loads without the need for global knowledge
– each nest stores the collected information about an ant’s last visited nests, to be used by subsequent ants to drive their searchMAX or searchMIN phase → at each step, the ant randomly selects the next node to visit among those believed to be more overloaded or underloaded, resp.
– results: 100 idle nests and initially 10,000 tasks in one single node
  • only 15–20 iterations to transfer tasks to all other nodes
  • after 50 iterations, the workload is perfectly balanced