SPECULATIVE LOAD BALANCING
Hassan Eslami William D. Gropp
Department of Computer Science University of Illinois at Urbana Champaign
Continuous Dynamic Load Balancing
Irregular parallel applications
Irregular and unpredictable
Optimization and search problems
N-Body problems
While (t = TaskPool.get()) t.execute()
In execute(), one may call TaskPool.put()
Idle time in TaskPool.get()
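The task-pool loop above can be sketched in a few lines. This is a minimal single-threaded illustration, not the presenters' implementation: the `TaskPool` class, the `CountdownTask` task type, and the `run` helper are hypothetical names introduced only for this example.

```python
from collections import deque


class TaskPool:
    """Hypothetical, single-threaded stand-in for the TaskPool on the slide."""

    def __init__(self):
        self._tasks = deque()

    def put(self, task):
        self._tasks.append(task)

    def get(self):
        # Returns None when the pool is empty. In a parallel runtime,
        # this call is where a worker would spend its idle time.
        return self._tasks.popleft() if self._tasks else None


class CountdownTask:
    """Illustrative task that spawns one child until depth reaches 0."""

    def __init__(self, pool, depth, log):
        self.pool = pool
        self.depth = depth
        self.log = log

    def execute(self):
        self.log.append(self.depth)
        if self.depth > 0:
            # execute() may add new work to the pool.
            self.pool.put(CountdownTask(self.pool, self.depth - 1, self.log))


def run(pool):
    """Drain the pool and return the number of tasks executed."""
    executed = 0
    while (t := pool.get()) is not None:
        t.execute()
        executed += 1
    return executed
```

Because tasks can create more tasks, the amount of work is not known up front, which is why the balancing has to be continuous and dynamic.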
[Figure sequence: Thread 1 and Thread 2]
parallelism
[Figure sequence: Thread 1 and Thread 2, with an Arbitration Request followed by a Speculation Fail]
[Figure sequence: Manager with Thread 0, Thread 1, Thread 2, and Thread 3; the threads send Work Requests to the Manager]
[Figure sequence: Manager Thread and Some Worker Thread]
Arbitration Request for A (tasks shown: A)
Arbitration Request for B (tasks shown: A B)
Arbitration Request for E (tasks shown: A B C D E)
Response for A: Success, Commit (tasks shown: A B C D E)
Response for B: Success, Commit (tasks shown: B C D E)
Response for C: Fail; Roll Back, Roll Back, Roll Back, Delete (tasks shown: C D E)
Work Request
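The commit and roll-back sequence in the figure can be sketched as follows. This is a simplified illustration under stated assumptions, not the presenters' code: the function name `resolve_speculation` is hypothetical, responses are assumed to arrive in execution order, and a failed arbitration is assumed to undo the failed task and every task speculated after it, newest first.

```python
def resolve_speculation(speculated, responses):
    """Apply arbitration responses to speculatively executed tasks.

    speculated: task names in the order they were speculatively executed.
    responses:  dict mapping task name -> True (success) or False (fail).
    Returns (committed, rolled_back) as two lists of task names.
    """
    committed = []
    rolled_back = []
    pending = list(speculated)
    while pending:
        task = pending[0]
        if responses[task]:
            # Arbitration succeeded: the speculative result becomes final.
            committed.append(pending.pop(0))
        else:
            # Arbitration failed: undo this task and everything after it
            # (assumed order: newest first), then delete them all.
            rolled_back = list(reversed(pending))
            pending.clear()
    return committed, rolled_back
```

With tasks A to E and a failure on C, A and B commit, and C, D, and E are rolled back, which corresponds to the three roll-backs shown in the figure.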
tree
cryptographic random number generator
childCount = f(nodeId)
childId = SHA1(nodeId, childIndex)
geometric distribution with mean b)
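The two formulas above describe a tree whose shape is fully determined by hashing. A minimal sketch follows, assuming a stand-in for `f`: the slide does not define `f(nodeId)` here, so `child_count` below (first byte of the id modulo a small bound, with a depth cutoff) is purely hypothetical and is not the benchmark's distribution. The 4-byte encoding of `childIndex` is also an assumption.

```python
import hashlib


def child_id(node_id: bytes, child_index: int) -> bytes:
    """childId = SHA1(nodeId, childIndex); the byte encoding is an assumption."""
    return hashlib.sha1(node_id + child_index.to_bytes(4, "big")).digest()


def child_count(node_id: bytes, depth: int, max_depth: int = 4,
                max_children: int = 3) -> int:
    """Hypothetical stand-in for f(nodeId): derived from the id, cut off at max_depth."""
    if depth >= max_depth:
        return 0
    return node_id[0] % (max_children + 1)


def count_nodes(root: bytes, max_depth: int = 4, max_children: int = 3) -> int:
    """Traverse the implicit tree depth-first and count its nodes."""
    stack = [(root, 0)]
    total = 0
    while stack:
        node, depth = stack.pop()
        total += 1
        for i in range(child_count(node, depth, max_depth, max_children)):
            stack.append((child_id(node, i), depth + 1))
    return total


root = hashlib.sha1(b"root").digest()
tree_size = count_nodes(root)
```

Since every child id comes from a hash of its parent, any worker can expand any node independently, and the tree is identical on every run while its size stays unpredictable without traversing it.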
releases a chunk to the manager
If (HasSurplusWork() and NodesProcessed % release_interval == 0) { ReleaseWork() }
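The release condition above can be sketched as below. The helper names mirror the pseudocode, but their bodies are assumptions made for illustration: "surplus" is taken to mean more than one chunk of local work, and a release hands the oldest `chunk_size` items to a list standing in for the manager.

```python
def has_surplus_work(local_queue, chunk_size):
    # Assumption: surplus means the worker holds more than one chunk.
    return len(local_queue) > chunk_size


def release_work(local_queue, chunk_size):
    # Remove the oldest chunk_size items from the local queue and return them.
    chunk = local_queue[:chunk_size]
    del local_queue[:chunk_size]
    return chunk


def worker_step(local_queue, nodes_processed, release_interval, chunk_size,
                manager_pool):
    """Release one chunk to the manager if the condition holds; report whether it did."""
    if (has_surplus_work(local_queue, chunk_size)
            and nodes_processed % release_interval == 0):
        manager_pool.append(release_work(local_queue, chunk_size))
        return True
    return False
```

A small release interval shares work more often at the cost of more traffic to the manager; a large one reduces traffic but can leave other workers idle longer.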
Tree sizes (10^9 nodes)
Size     Binomial   Geometric
Small    0.111      0.102
Medium   2.79       1.64
Large    10.6       4.23
[Figure: Impact of release interval on execution time (Geometric Tree). X-axis: Chunk Size (1 to 100, log scale). Legend values: 4 to 65536 in powers of two.]
[Figure: Impact of release interval on execution time (Geometric Tree), Original vs. Speculative. X-axis: Chunk Size (1 to 100, log scale). Legend values: 16 to 4096 in powers of two.]
nodes
              Time (s) (128, 8)   Time (s) (128, 12)
Original      50.385              26.681
Speculative   18.902              18.886
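The relative improvement implied by these timings can be worked out directly. The snippet below only divides the reported times; the dictionary layout and the `speedup` helper are illustrative names, and the configuration labels are copied from the column headers without interpretation.

```python
# Reported execution times in seconds, keyed by the column labels above.
times = {
    "(128, 8)": {"Original": 50.385, "Speculative": 18.902},
    "(128, 12)": {"Original": 26.681, "Speculative": 18.886},
}


def speedup(original, speculative):
    """Ratio of original time to speculative time."""
    return original / speculative


speedups = {cfg: speedup(t["Original"], t["Speculative"])
            for cfg, t in times.items()}

for cfg, s in speedups.items():
    print(f"{cfg}: {s:.2f}x")
```

This gives roughly 2.67x for the (128, 8) column and 1.41x for the (128, 12) column.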
[Figures: two plots comparing Original and Speculative. X-axis: # of MPI Ranks (10 to 1000, log scale).]
than the time it takes to get response of an arbitration
destroy)
in its owner’s actual queue
speculative work stealing (Algorithm A)
messages with prefetching = optimized prefetch-based work stealing (Algorithm B)
percent overall)
UTS
where there is limited parallelism due to data dependence