Understanding Task Scheduling Algorithms
Kenjiro Taura
Contents
1 Introduction
2 Work stealing scheduler
3 Analyzing execution time
    Introduction
    DAG model and greedy schedulers
    Work stealing schedulers
4 Analyzing cache misses of work stealing
5 Summary
1 Introduction
    void ms(elem * a, elem * a_end,
            elem * t, int dest) {
      long n = a_end - a;
      if (n == 1) {
        ...
      } else {
        ...
        create_task(ms(a, c, t, 1 - dest));
        ms(c, a_end, t + nh, 1 - dest);
        wait_tasks;
      }
    }
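The merge sort above can be sketched as runnable C, assuming OpenMP tasks stand in for the slide's create_task and wait_tasks (`#pragma omp task` and `#pragma omp taskwait`). This is a simplified variant, not the slide's exact code: instead of alternating the destination with `dest`, it always merges into the scratch buffer `t` and copies back. The names `merge` and `ms_sort`, and `elem` being `int`, are hypothetical choices for illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef int elem; /* assumption: the slide leaves elem abstract */

/* Merge the sorted runs [a, c) and [c, a_end) into out (stable). */
static void merge(const elem *a, const elem *c, const elem *a_end, elem *out) {
  const elem *i = a, *j = c;
  while (i < c && j < a_end) *out++ = (*j < *i) ? *j++ : *i++;
  while (i < c) *out++ = *i++;
  while (j < a_end) *out++ = *j++;
}

/* Sort [a, a_end) in place; t is scratch space of the same length. */
static void ms(elem *a, elem *a_end, elem *t) {
  long n = a_end - a;
  if (n <= 1) return;
  long nh = n / 2;
  elem *c = a + nh;
  /* create_task: sort the left half as a child task (uses t[0, nh)) */
  #pragma omp task
  ms(a, c, t);
  /* the parent sorts the right half itself (uses t[nh, n)) */
  ms(c, a_end, t + nh);
  /* wait_tasks: both halves must be sorted before merging */
  #pragma omp taskwait
  merge(a, c, a_end, t);
  memcpy(a, t, (size_t)n * sizeof(elem));
}

/* Hypothetical entry point: one thread starts the root task, the team
   executes the spawned tasks. Returns 0 on success, -1 on allocation failure. */
int ms_sort(elem *a, long n) {
  elem *t = malloc((size_t)(n > 0 ? n : 1) * sizeof(elem));
  if (t == NULL) return -1;
  #pragma omp parallel
  #pragma omp single
  ms(a, a + n, t);
  free(t);
  return 0;
}
```

Compiled without OpenMP support, the pragmas are ignored and the same code runs serially, which is also the serial order the later cache analysis refers to.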
[Figure: a task tree with nodes T0 to T199]
2 Work stealing scheduler
    int fib(int n) {
      if (n < 2) return 1;
      else {
        int x, y;
        create_task({ x = fib(n - 1); }); // share x
        y = fib(n - 2);
        wait_tasks;
        return x + y;
      }
    }
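One way to make fib runnable is to map create_task and wait_tasks onto OpenMP tasks; this mapping is an assumption, not the scheduler the slides describe. The comment "share x" corresponds to OpenMP's `shared(x)` clause: without it, `x` would be firstprivate in the task and the child's result would be lost. The wrapper name `pfib` is hypothetical.

```c
/* fib with the slide's base case: fib(0) = fib(1) = 1. */
static int fib(int n) {
  if (n < 2) return 1;
  int x, y;
  /* create_task: x is shared so the parent sees the child's write */
  #pragma omp task shared(x)
  x = fib(n - 1);
  y = fib(n - 2);
  /* wait_tasks: the child must finish before x is read */
  #pragma omp taskwait
  return x + y;
}

/* Hypothetical wrapper: one thread starts the root call, and the
   whole thread team executes the tasks it spawns. */
int pfib(int n) {
  int r = 0;
  #pragma omp parallel
  #pragma omp single
  r = fib(n);
  return r;
}
```

With this base case, `pfib(10)` returns 89, serially or in parallel.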
    P1
    create_task(C1);
    P2
    create_task(C2);
    P3
    wait_tasks;
    P4
[Figure: the DAG of this code, with nodes P1, P2, P3, P4, C1, C2]
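The DAG above determines two quantities used in the execution-time analysis: the work T1 (total node count) and the critical path length T∞ (longest path). The sketch below computes both under stated assumptions: every node has unit cost, create_task adds edges from the creating node to both the child and the continuation (P1 to C1 and P2, P2 to C2 and P3), and wait_tasks adds edges from P3, C1, and C2 into P4. The function name `work_span` and the size limit are hypothetical.

```c
#define WS_MAXN 64

/* Work (T1) and span (T-infinity) of a DAG of unit-cost nodes 0..n-1.
   Edge e goes from src[e] to dst[e]. Assumption: nodes are numbered in
   topological order, so every edge goes from a lower to a higher index.
   Returns 0 on success, -1 if n is out of range for this sketch. */
static int work_span(int n, const int *src, const int *dst, int m,
                     int *work, int *span) {
  int longest[WS_MAXN]; /* nodes on the longest path ending at v */
  if (n < 0 || n > WS_MAXN) return -1;
  for (int v = 0; v < n; v++) {
    longest[v] = 1; /* the path consisting of v alone */
    /* every predecessor of v has a lower index, so its value is final */
    for (int e = 0; e < m; e++)
      if (dst[e] == v && longest[src[e]] + 1 > longest[v])
        longest[v] = longest[src[e]] + 1;
  }
  *work = n; /* unit cost per node */
  *span = 0;
  for (int v = 0; v < n; v++)
    if (longest[v] > *span) *span = longest[v];
  return 0;
}
```

Numbering the nodes P1=0, P2=1, C1=2, P3=3, C2=4, P4=5 gives work 6 and span 4 (for example along P1, P2, P3, P4).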
8 / 51
9 / 51
10 / 51
11 / 51
12 / 51
12 / 51
12 / 51
12 / 51
13 / 51
[Figure (animated, several frames): workers W0, W1, W2, ..., Wn−1, each holding a task deque with a top end and a bottom end; frames are annotated with the labels S, P, and T and step numbers 1 to 3]
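The per-worker deque in the figure can be sketched as below. This follows the common work-stealing convention (used, for example, by Chase and Lev's deque) that the owner pushes and pops at the bottom while thieves take from the top; the figure labels both ends, so treat this assignment as an assumption. It is a single-threaded sketch: a real implementation needs atomics or locks to arbitrate between the owner and concurrent thieves. All names (`deque`, `deq_push_bottom`, `deq_pop_bottom`, `deq_steal_top`) are hypothetical, and tasks are plain ints.

```c
#define DEQ_CAP 1024

/* Fixed-size task deque; NOT thread-safe (illustration only). */
typedef struct {
  int task[DEQ_CAP];
  long top;    /* index of the oldest task: thieves take from here */
  long bottom; /* one past the newest task: the owner works here */
} deque;

static void deq_init(deque *d) { d->top = d->bottom = 0; }

/* Owner: push a newly created task at the bottom. Returns 0 if full. */
static int deq_push_bottom(deque *d, int t) {
  if (d->top == d->bottom) d->top = d->bottom = 0; /* reuse space when empty */
  if (d->bottom == DEQ_CAP) return 0;
  d->task[d->bottom++] = t;
  return 1;
}

/* Owner: pop the newest task from the bottom (LIFO). Returns 0 if empty. */
static int deq_pop_bottom(deque *d, int *t) {
  if (d->bottom == d->top) return 0;
  *t = d->task[--d->bottom];
  return 1;
}

/* Thief: take the oldest task from the top (FIFO). Returns 0 if empty. */
static int deq_steal_top(deque *d, int *t) {
  if (d->bottom == d->top) return 0;
  *t = d->task[d->top++];
  return 1;
}
```

Taking the oldest task is the usual design choice because, in divide-and-conquer programs such as ms and fib, older tasks tend to represent larger pieces of work, so a single steal hands the thief a lot to do.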
3 Analyzing execution time
Introduction
DAG model and greedy schedulers
[Figure: a DAG with its ready nodes marked]
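Since T1/P and T∞ are both lower bounds on the execution time T_P of any schedule on P processors, the greedy bound T_P ≤ T1/P + T∞ shows that any greedy schedule is within a factor of two of the optimal one.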
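A greedy scheduler never idles a processor while a ready node exists: at each step it runs min(P, number of ready nodes) of them. The hypothetical simulator below counts the steps such a scheduler takes on a DAG of unit-cost nodes, so the bound T_P ≤ T1/P + T∞ can be checked on small examples. It picks the lowest-numbered ready nodes; that choice is arbitrary, and any rule that runs up to P ready nodes per step is still greedy.

```c
#define GS_MAXN 64

/* Steps taken by a greedy scheduler with P processors on a DAG of
   unit-cost nodes 0..n-1 (edge e: src[e] -> dst[e]).
   Returns the step count T_P, or -1 on bad input or a cycle. */
static int greedy_steps(int n, const int *src, const int *dst, int m, int P) {
  int indeg[GS_MAXN], done[GS_MAXN], run[GS_MAXN];
  if (n < 0 || n > GS_MAXN || P < 1) return -1;
  for (int v = 0; v < n; v++) { indeg[v] = 0; done[v] = 0; }
  for (int e = 0; e < m; e++) indeg[dst[e]]++;
  int finished = 0, steps = 0;
  while (finished < n) {
    /* choose up to P nodes that are ready at the start of this step */
    int k = 0;
    for (int v = 0; v < n && k < P; v++)
      if (!done[v] && indeg[v] == 0) run[k++] = v;
    if (k == 0) return -1; /* nothing ready: the graph has a cycle */
    /* execute them, then release their successors for later steps */
    for (int i = 0; i < k; i++) {
      done[run[i]] = 1;
      for (int e = 0; e < m; e++)
        if (src[e] == run[i]) indeg[dst[e]]--;
    }
    finished += k;
    steps++;
  }
  return steps;
}
```

For the six-node DAG of P1..P4, C1, C2 (numbered P1=0, P2=1, C1=2, P3=3, C2=4, P4=5; T1 = 6, T∞ = 4), one processor needs 6 steps and two processors need 4, well within 6/2 + 4 = 7.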
[Figure (animated): a DAG whose ready nodes are numbered 1 to 7; later frames mark nodes as terminated or ready]
Work stealing schedulers
[Figure: a P × T_P chart of the workers over time, divided into T1 (work) and steal attempts]
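The chart can be read as an accounting identity. Assuming that in every time step each of the P workers either executes a node (T_1 such steps in total) or makes a steal attempt (S in total), the P × T_P area splits exactly into these two parts:

```latex
P \, T_P = T_1 + S
\quad\Longrightarrow\quad
T_P = \frac{T_1}{P} + \frac{S}{P}
```

So the execution time is bounded once the number of steal attempts S is bounded. The classic randomized work-stealing analysis shows that the expected number of steal attempts is O(P T_∞), which gives an expected time of T_1/P + O(T_∞).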
How many steal attempts?
[Figure: a DAG with nodes marked as ready or executed]
[Figure: a create task node, annotated "≈ P steal attempts"]
[Figure (animated): two create task nodes and two wait tasks nodes, with cases (A), (B), (C), (D); later frames mark nodes x and y]
[Figure: a ready path running through a chain of create task nodes, annotated n]
4 Analyzing cache misses of work stealing
[Figure: a memory hierarchy with hardware threads (virtual cores, CPUs), L1 cache, L2 cache, L3 cache, and the memory controller]
[Figure: a cache of capacity C in front of main memory of capacity ∞; successive parts of the execution are annotated ≤ C]
    create_task({ A });
    B
[Figure: several caches, each of capacity C, in front of main memory of capacity ∞]
[Figure: two executions of identical instructions starting from different initial cache states, with hits and misses marked]
[Figure: the same instructions run from different initial cache states; the difference is annotated ≤ C transfers (misses)]
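The "≤ C transfers (misses)" annotation can be checked with a small simulator. The sketch assumes a fully associative cache of capacity C with LRU replacement, and treats each access as a cache-line id; neither the replacement policy nor the names (`lru_cache`, `lru_access`, `count_misses`) come from the slides. Under LRU the bound follows from a short argument: once C distinct lines have been accessed, the cache holds exactly the C most recently used lines whatever the initial state, and before that point each run misses at most C times, so the two miss counts can differ by at most C.

```c
#include <string.h>

#define MAX_CAP 64

/* Fully associative LRU cache: lines[0] is least recently used,
   lines[count - 1] most recently used. */
typedef struct {
  long lines[MAX_CAP];
  int count;
  int cap;
} lru_cache;

/* Access one line; returns 1 on a miss, 0 on a hit. */
static int lru_access(lru_cache *c, long line) {
  int i;
  for (i = 0; i < c->count; i++)
    if (c->lines[i] == line) break;
  int miss = (i == c->count);
  if (miss) {
    if (c->count == c->cap) { /* evict the LRU line at index 0 */
      memmove(&c->lines[0], &c->lines[1], (size_t)(c->count - 1) * sizeof(long));
      c->count--;
    }
    c->lines[c->count++] = line;
  } else { /* move the hit line to the MRU position */
    memmove(&c->lines[i], &c->lines[i + 1], (size_t)(c->count - 1 - i) * sizeof(long));
    c->lines[c->count - 1] = line;
  }
  return miss;
}

/* Misses of trace[0..n) on a cache of capacity cap, after warming it
   with init[0..n_init) (those warm-up accesses are not counted). */
static long count_misses(int cap, const long *init, int n_init,
                         const long *trace, int n) {
  lru_cache c;
  c.cap = cap < 1 ? 1 : (cap > MAX_CAP ? MAX_CAP : cap);
  c.count = 0;
  for (int i = 0; i < n_init; i++) lru_access(&c, init[i]);
  long misses = 0;
  for (int i = 0; i < n; i++) misses += lru_access(&c, trace[i]);
  return misses;
}
```

For the trace 1, 2, 3, 1, 2, 3 with C = 2, a cold cache misses 6 times and a cache preloaded with lines 1 and 2 misses 4 times; the gap of 2 does not exceed C.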
[Figure: a work stealing execution with drifted nodes marked; annotation: "would immediately precede a drifted node in the serial order"]
5 Summary