Online Task Remapping Strategies for Fault-tolerant Network-on-Chip Multiprocessors
Onur Derin, Deniz Kabakci, Leandro Fiorin
ALaRI, Faculty of Informatics, University of Lugano, Lugano, Switzerland
derino@alari.ch
NOCS'11 - Pittsburgh, May
Outline
Problem
ILP formulation of the optimal mapping problem of KPN applications onto NoC
  Minimization of the communication cost
  Minimization of the total computation time
Online Task Remapping
  Optimal Task Remapping
  Center of Gravity method (CoG)
  Nonidentical Multiprocessor Scheduling Heuristics (NMS)
  Localized NMS Heuristic (LNMS)
Case study
Results
- O. Derin, ALaRI
NOCS’11— Online Task Remapping Strategies for Fault-tolerant Network-on-Chip Multiprocessors 2/31
Introduction
Continuity of service support in the MADNESS NoC platform
Starting point
  Kahn Process Networks (KPN) as the computation model
  xpipes-based NoC from Uni. Cagliari, NORMA model, message-passing support
Main tasks
  a middleware to execute KPN on NoCs
  fault detection/masking via self-testing and self-checking
  reconfiguration via online task remapping
  task migration from faulty nodes
  fault-tolerant interconnect
Problem
Figure: A KPN application running on a 3x3 mesh (tasks t1-t12 with data dependencies e1-e14, running on nodes n1-n9, five RISC and four DSP cores, connected by links l1-l12)
Problem: Where should the tasks on the faulty core be moved?
Figure: Processing node n5 becomes faulty (same 3x3 mesh and KPN application as in the previous figure)
Approach
Solve the task mapping problem onto NoC-based heterogeneous multiprocessors optimally
Consider different fault scenarios and find new optimal remappings
Propose heuristics for the problem
Compare their performance with the optimal results
Mapping problem
Given a KPN task graph
Given an architecture graph and a deterministic routing algorithm
Find the optimal mapping such that the computation time and the amount of communication are optimized
ILP formulation of the mapping problem
A task graph $g_t = (V_t, E_t)$ is composed of tasks $t \in V_t$ and data dependencies $e \in E_t \subseteq V_t \times V_t$.
An architecture graph $g_a = (V_a, E_a)$ is composed of processing nodes $n \in V_a$ and bidirectional communication links $l \in E_a \subseteq V_a \times V_a$.
A task binding $\beta_t : V_t \to V_a$ is an assignment of tasks $t \in V_t$ to nodes $n \in V_a$.
A communication binding $\beta_c : E_t \to E_a^i$ is an assignment of data dependencies $e \in E_t$ to paths of length $i$ in the architecture graph $g_a$.
A path $p$ of length $i$ is given by the $i$-tuple $p = (l_1, l_2, \ldots, l_i)$.
ILP formulation of the mapping problem
$path : (V_a, V_a) \to E_a^i$ is a function that implements a deterministic routing algorithm and returns a path between two given nodes.
Path set $P$ is the set of paths between all pairs of distinct nodes: $P = \{p_k : p_k = path(n_i, n_j), \forall n_i, n_j \in V_a \wedge n_i \neq n_j\}$
The task graph can be annotated with demand values, where the demand $d_i$ on a data dependency $e_i \in E_t$ denotes the required bandwidth between the two tasks.
The architecture graph can be annotated with capacity values, where the capacity $c_i$ of an architectural link $l_i \in E_a$ denotes the maximum bandwidth of the communication link between two architectural nodes.
Core type set $C$ consists of core types $C_i$ and lists the types of cores available in a given NoC platform.
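The path function and the path set P can be sketched in a few lines. This is a minimal illustration only: the slides require just that routing be deterministic, so the choice of XY (dimension-ordered) routing, the (x, y) coordinate scheme, and every name below are assumptions. Links are written as directed (node, node) hops for simplicity, although the architecture graph treats links as bidirectional.

```python
from itertools import product

WIDTH, HEIGHT = 3, 3  # assumed 3x3 mesh; nodes are (x, y) coordinates

def path(src, dst):
    """Return the XY-routed path from src to dst as a tuple of links.

    XY routing is an assumption made for this sketch: travel along x
    until the column matches, then along y.
    """
    links = []
    x, y = src
    while x != dst[0]:
        nxt = (x + (1 if dst[0] > x else -1), y)
        links.append(((x, y), nxt))
        x = nxt[0]
    while y != dst[1]:
        nxt = (x, y + (1 if dst[1] > y else -1))
        links.append(((x, y), nxt))
        y = nxt[1]
    return tuple(links)

nodes = list(product(range(WIDTH), range(HEIGHT)))

# Path set P: one path for every ordered pair of distinct nodes.
P = {(a, b): path(a, b) for a in nodes for b in nodes if a != b}
```

With 9 nodes, P holds one path for each of the 9 x 8 ordered node pairs, and the length i of each path is its hop count.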
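Minimization of the communication cost: Decision variables
\[
X^{NT}_{ij} = \begin{cases} 1, & \text{if } t_j \in V_t \text{ is bound onto node } n_i \in V_a \\ 0, & \text{otherwise} \end{cases}
\]
\[
Y^{PE}_{ij} = \begin{cases} 1, & \text{if } e_j \in E_t \text{ is mapped to } p_i \in P \\ 0, & \text{otherwise} \end{cases}
\]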
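Minimization of the communication cost: Parameters
\[
M^{TE}_{ij} = \begin{cases} 1, & \text{if } \exists t_k,\ e_j = (t_i, t_k) \in E_t \\ -1, & \text{if } \exists t_k,\ e_j = (t_k, t_i) \in E_t \\ 0, & \text{otherwise} \end{cases}
\]
\[
M^{NP}_{ij} = \begin{cases} 1, & \text{if } source(p_j) = n_i \\ -1, & \text{if } sink(p_j) = n_i \\ 0, & \text{otherwise} \end{cases}
\]
\[
M^{PL}_{ij} = \begin{cases} 1, & \text{if } l_j \in p_i \\ 0, & \text{otherwise} \end{cases}
\]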
Minimization of the communication cost: Constraints
Constraint 1 (routing): Task mapping and communication binding constrain each other through the routing algorithm implemented in the NoC.
\[ X^{NT} M^{TE} = M^{NP} Y^{PE} \tag{1} \]
Constraint 2 (task mapping): Each task must be mapped onto exactly one node.
\[ X^{TN} 1_{|V_a|} = 1_{|V_t|} \tag{2} \]
Constraint 3 (communication mapping): A data dependency can be mapped onto at most one path.
\[ Y^{EP} 1_{|P|} \le 1_{|E_t|} \tag{3} \]
Minimization of the communication cost
Constraint 4 (capacity): The total bandwidth demand on a link $l_j$ should not exceed the capacity $c_j$ of the link.
\[ M^{LP} Y^{PE} d \le c \tag{4} \]
Objective 1 (communication cost): The total traffic on the links is the sum of all demands $d_i$ on the links of the paths that arise from a given mapping:
\[ \min: d^{T} Y^{EP} M^{PL} 1_{|E_a|} \tag{5} \]
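Constraints (1) to (4) and objective (5) can be checked by hand on a toy instance. The sketch below is illustrative only: it assumes that $X^{TN}$, $Y^{EP}$ and $M^{LP}$ are the transposes of $X^{NT}$, $Y^{PE}$ and $M^{PL}$, and it uses a hypothetical platform with two nodes joined by one link, two tasks joined by one data dependency, and made-up demand and capacity values.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def is_feasible(X_NT, Y_PE, M_TE, M_NP, M_PL, d, c):
    """Check constraints (1)-(4) for a candidate mapping."""
    routing = matmul(X_NT, M_TE) == matmul(M_NP, Y_PE)              # (1)
    one_node = all(sum(row) == 1 for row in transpose(X_NT))        # (2)
    one_path = all(sum(row) <= 1 for row in transpose(Y_PE))        # (3)
    load = matmul(matmul(transpose(M_PL), Y_PE), [[x] for x in d])  # (4)
    capacity = all(row[0] <= cap for row, cap in zip(load, c))
    return routing and one_node and one_path and capacity

def comm_cost(Y_PE, M_PL, d):
    """Objective (5): demand of each dependency summed over the links it uses."""
    per_link = matmul(matmul([d], transpose(Y_PE)), M_PL)[0]
    return sum(per_link)

# Hypothetical toy instance: nodes n1, n2; link l1; tasks t1 -> t2 via e1;
# paths p1 = path(n1, n2) and p2 = path(n2, n1), both using l1.
X_NT = [[1, 0], [0, 1]]      # t1 on n1, t2 on n2
M_TE = [[1], [-1]]           # e1 leaves t1 and enters t2
M_NP = [[1, -1], [-1, 1]]    # p1: n1 -> n2, p2: n2 -> n1
M_PL = [[1], [1]]            # both paths use l1
Y_PE = [[1], [0]]            # e1 is mapped to p1
d, c = [30], [100]           # illustrative demand and capacity values
```

For this instance `is_feasible(X_NT, Y_PE, M_TE, M_NP, M_PL, d, c)` holds and `comm_cost(Y_PE, M_PL, d)` is 30, whereas binding e1 to p2 instead violates the routing constraint (1).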
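Minimization of the total computation time: Parameters
\[
M^{TC}_{cap,ij} = \begin{cases} 1, & \text{if } C_j \in C \text{ is capable of realizing task } t_i \in V_t \\ 0, & \text{otherwise} \end{cases}
\]
\[
T^{TC}_{cap,ij} = \begin{cases} \text{completion time of } t_i \text{ on } C_j, & \text{if } M^{TC}_{cap,ij} = 1 \\ 0, & \text{if } M^{TC}_{cap,ij} = 0 \end{cases}
\]
\[
M^{NC}_{ij} = \begin{cases} 1, & \text{if } n_i \in V_a \text{ is of core type } C_j \in C \\ 0, & \text{otherwise} \end{cases}
\]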
Minimization of the total computation time: Constraints
Constraint 5 (capability): All tasks should be mapped on cores that are capable of implementing those tasks.
\[ M^{TC} = X^{TN} M^{NC} \le M^{TC}_{cap} \tag{6} \]
Minimization of the total computation time
Objective 2 (total execution time): We calculate the total computation time of the application as the maximum, over all cores, of the sum of the execution times of the tasks mapped on the same core.
\[ \min: \max(T^{N}) = \max\left( X^{NT} \left( \left( (X^{TN} M^{NC}) \circ T^{TC}_{cap} \right) 1_{|C|} \right) \right) \tag{7} \]
Here $\circ$ denotes the element-wise product.
We apply linearization techniques to this formula for $\max()$ and for the products $x_{ij} \cdot x_{kl}$.
The analytical model for the adopted objectives is valid
  for acyclic KPN applications
  in cases where communication is faster than computation
  when the network is not overloaded (no congestion)
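Objective (7) can be evaluated step by step for a given mapping. The following sketch is an illustration under stated assumptions: the product with $T^{TC}_{cap}$ is taken element-wise, and the two-node platform, the mapping, and all timing values are hypothetical.

```python
def node_times(X_NT, M_NC, T_cap):
    """Return the vector T^N of summed execution times per node, as in (7)."""
    n_nodes, n_tasks = len(X_NT), len(X_NT[0])
    n_types = len(M_NC[0])
    # M^TC = X^TN M^NC: the core type each task ends up running on.
    M_TC = [[sum(X_NT[n][t] * M_NC[n][k] for n in range(n_nodes))
             for k in range(n_types)]
            for t in range(n_tasks)]
    # Element-wise product with T_cap, then row sums: time of each task.
    task_time = [sum(M_TC[t][k] * T_cap[t][k] for k in range(n_types))
                 for t in range(n_tasks)]
    # X^NT times the task-time vector: accumulated time on each node.
    return [sum(X_NT[n][t] * task_time[t] for t in range(n_tasks))
            for n in range(n_nodes)]

# Hypothetical instance: n1 is a RISC core, n2 is a DSP core.
M_NC = [[1, 0], [0, 1]]
T_cap = [[2.0, 1.0],   # t1: seconds on RISC, seconds on DSP
         [3.0, 5.0],   # t2
         [1.0, 4.0]]   # t3
X_NT = [[0, 1, 1],     # t2 and t3 on n1
        [1, 0, 0]]     # t1 on n2

T_N = node_times(X_NT, M_NC, T_cap)
makespan = max(T_N)
```

Here n1 accumulates the RISC times of t2 and t3 and n2 the DSP time of t1, so the objective value is the larger of the two sums.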
The mapping tool
based on the IBM ILOG CPLEX API
multi-objective ILP problem solved with the ε-constraint method
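The idea of the ε-constraint method is to minimize one objective while bounding the other by a value ε, and to sweep ε to trace the Pareto curve. The sketch below illustrates that idea only: it replaces the CPLEX-based ILP solve with a brute-force search over a hypothetical list of already-evaluated mappings, and the choice of bounding the communication cost (rather than the computation time) is an assumption.

```python
def epsilon_constraint(candidates, epsilons):
    """Trace Pareto points from (comm_cost, comp_time) pairs.

    For each epsilon, minimize comp_time subject to comm_cost <= epsilon.
    Ties on comp_time are broken by the lower comm_cost.
    """
    front = []
    for eps in epsilons:
        feasible = [cand for cand in candidates if cand[0] <= eps]
        if not feasible:
            continue  # no mapping satisfies this bound
        best = min(feasible, key=lambda cand: (cand[1], cand[0]))
        if best not in front:
            front.append(best)
    return front

# Hypothetical (communication cost, computation time) values of four mappings.
candidates = [(10, 9.0), (20, 6.0), (30, 6.5), (40, 4.0)]
front = epsilon_constraint(candidates, epsilons=[10, 20, 30, 40])
```

The mapping with values (30, 6.5) never appears in the result because (20, 6.0) is better in both objectives.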
Optimal Task Remapping: New constraints
Constraint (faulty core): Given a faulty node $n_f$, a new constraint is added to the ILP formulation that forbids mapping of tasks on the faulty node $n_f$.
\[ \sum_{j=1}^{|V_t|} X^{NT}_{fj} = 0 \tag{8} \]
Constraint (migrate only tasks on the faulty core): Given a faulty node $n_f$ and an initial task mapping $M^{NT}$, a new constraint is added to limit the reconfiguration to just the tasks that are running on the faulty node $n_f$.
\[ X^{NT}_{ij} = M^{NT}_{ij}, \quad 1 \le i \le |V_a|,\ 1 \le j \le |V_t|,\ i \ne f \tag{9} \]
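The two remapping constraints can be expressed as a simple validity check on a candidate remapping. This is a sketch under one explicit assumption: constraint (9) is read here as applying only to tasks that were not running on the faulty node, so that the tasks on the faulty node remain free to move. The three-node instance is hypothetical.

```python
def is_valid_remapping(X_NT, M_NT, f):
    """Check a remapping X_NT against the initial mapping M_NT.

    f is the row index of the faulty node.
    """
    # Constraint (8): no task may be mapped on the faulty node.
    if sum(X_NT[f]) != 0:
        return False
    n_nodes, n_tasks = len(X_NT), len(X_NT[0])
    for j in range(n_tasks):
        if M_NT[f][j] == 1:
            continue  # task was on the faulty node, so it may migrate
        # Constraint (9), as interpreted above: other tasks stay in place.
        for i in range(n_nodes):
            if i != f and X_NT[i][j] != M_NT[i][j]:
                return False
    return True

# Hypothetical instance: t1 on n1; t2 and t3 on n2; n2 (index 1) fails.
M_NT = [[1, 0, 0],
        [0, 1, 1],
        [0, 0, 0]]
good = [[1, 1, 0],   # t2 migrates to n1
        [0, 0, 0],
        [0, 0, 1]]   # t3 migrates to n3
bad = [[0, 1, 0],
       [0, 0, 0],
       [1, 0, 1]]    # t1 was not on the faulty node but moved anyway
```

Dropping the second check corresponds to the unlimited task migration case, in which every task may be remapped.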
Optimal Task Remapping
Why not encode optimal remapping solutions for every fault scenario?
Number of scenarios, N:
\[ N = \sum_{i=1}^{|V_a|} \binom{|V_a|}{i} = 2^{|V_a|} - 1 \]
Number of bits, B:
\[ B = (2^{|V_a|} - 1) \, p \, |V_t| \, \lceil \log_2(|V_a|) \rceil \]
For $|V_a| = 9$, $|V_t| = 12$, $p = 5$, we have B = 14.97 Kbytes.
We may need heuristics!
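The storage estimate can be reproduced with a short calculation. In this sketch the meaning of p is not taken from the slides; it is assumed to be the number of solutions stored per scenario. The logarithm is assumed to be base 2 and 1 Kbyte is assumed to be 1024 bytes, which together reproduce the quoted figure.

```python
from math import ceil, log2

def scenarios(n_nodes):
    """Number of fault scenarios: all non-empty sets of faulty nodes."""
    return 2 ** n_nodes - 1

def table_bits(n_nodes, n_tasks, p):
    """Bits needed to store p mappings per scenario.

    Each mapping stores one node index per task, using
    ceil(log2(n_nodes)) bits per index.
    """
    return scenarios(n_nodes) * p * n_tasks * ceil(log2(n_nodes))

bits = table_bits(n_nodes=9, n_tasks=12, p=5)   # 511 * 5 * 12 * 4
kbytes = bits / 8 / 1024
print(f"{bits} bits = {kbytes:.2f} Kbytes")
```

Because of the $2^{|V_a|}$ factor, every additional node roughly doubles the table size, which is the motivation for online heuristics.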
Center of Gravity method (CoG): Idea
We transform the problem into finding the center of gravity of masses by considering the weights of communication as the masses of the peer tasks.
\[
coord_i = \frac{\sum_{t_j \in peers(t_i)} coord(map(t_j)) \cdot weight(t_j, t_i)}{\sum_{t_j \in peers(t_i)} weight(t_j, t_i)}, \quad t_i \in L_f
\]
\[
coord(n_i) = \lfloor coord_i + (0.5, 0.5) \rfloor
\]
Considers only communication
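The CoG formula translates almost directly into code. The sketch below is a minimal illustration: the dictionary-based data layout, the (x, y) mesh coordinates, and the example weights are all assumptions, and it does not address what happens when the rounded coordinate is itself a faulty node, which the slides do not specify.

```python
from math import floor

def center_of_gravity(task, peers, mapping, coord, weight):
    """Return the mesh coordinate closest to the weighted centroid of peers.

    peers[task]  : tasks that communicate with `task`
    mapping[t]   : node currently hosting task t
    coord[n]     : (x, y) position of node n
    weight[(a,b)]: communication weight between peer a and task b
    The total weight is assumed to be non-zero.
    """
    total = sum(weight[(p, task)] for p in peers[task])
    x = sum(coord[mapping[p]][0] * weight[(p, task)] for p in peers[task]) / total
    y = sum(coord[mapping[p]][1] * weight[(p, task)] for p in peers[task]) / total
    # Adding 0.5 before taking the floor rounds to the nearest grid point.
    return (floor(x + 0.5), floor(y + 0.5))

# Hypothetical example: task "t5" talks to "t4" (weight 1) and "t6" (weight 3).
peers = {"t5": ["t4", "t6"]}
mapping = {"t4": "n1", "t6": "n6"}
coord = {"n1": (0, 0), "n6": (2, 1)}
weight = {("t4", "t5"): 1.0, ("t6", "t5"): 3.0}

target = center_of_gravity("t5", peers, mapping, coord, weight)
```

The centroid here is (1.5, 0.75), which rounds to the position of the heavier peer, so the migrated task is pulled toward the task it exchanges the most data with.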
Nonidentical Multiprocessor Scheduling Heuristics (NMS): Idea
The objective regarding computation is equivalent to the scheduling of independent tasks on nonidentical processors in order to minimize the makespan.
Three heuristics: NMS-A, NMS-B, NMS-C
Different complexities: O(n), O(n log n), O(n^2)
No absolute winner
Considers only computation
NMS Heuristics
NMS-A
Each task $t_i \in L_f$ is scheduled on the core that minimizes its finishing time.
NMS-B
Algorithm NMS-B first orders the tasks in $L_f$ according to decreasing $\min\{T^{TN}_{cap,ij} : 1 \le j \le |V_a|\}$, and then calls Algorithm NMS-A.
NMS-C
This algorithm iteratively schedules the tasks by choosing, at each step, the task from $L_f$ that gives the least finishing time.
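The three heuristics can be sketched as follows. This is an illustrative reading of the descriptions above, not the authors' implementation: the data layout (`time[t][n]` as the completion time of task t on node n, restricted to healthy and capable nodes, and `load[n]` as the time already committed on node n) and the example values are assumptions, and the code favors clarity over the stated complexities.

```python
def nms_a(tasks, time, load):
    """Schedule tasks in the given order on the node minimizing finish time."""
    load = dict(load)  # work on a copy of the current node loads
    assignment = {}
    for t in tasks:
        best = min(time[t], key=lambda n: load[n] + time[t][n])
        assignment[t] = best
        load[best] += time[t][best]
    return assignment, load

def nms_b(tasks, time, load):
    """Order tasks by decreasing minimum completion time, then run NMS-A."""
    ordered = sorted(tasks, key=lambda t: min(time[t].values()), reverse=True)
    return nms_a(ordered, time, load)

def nms_c(tasks, time, load):
    """Repeatedly pick the (task, node) pair with the least finish time."""
    load = dict(load)
    assignment = {}
    remaining = list(tasks)
    while remaining:
        t, n = min(((t, n) for t in remaining for n in time[t]),
                   key=lambda pair: load[pair[1]] + time[pair[0]][pair[1]])
        assignment[t] = n
        load[n] += time[t][n]
        remaining.remove(t)
    return assignment, load

# Hypothetical example: two tasks from the faulty node, two candidate nodes.
time = {"a": {"n1": 2.0, "n2": 1.0},
        "b": {"n1": 5.0, "n2": 3.0}}
load = {"n1": 0.0, "n2": 0.0}
```

On this example NMS-A and NMS-C place both tasks on n2 (makespan 4.0), while NMS-B places the longer task first and reaches a makespan of 3.0. On other inputs the ranking can differ, in line with the "no absolute winner" remark.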
Localized NMS Heuristics (LNMS): Idea
We limit the region of nodes where we employ the NMS heuristics.
Considers both communication and computation
Apply NMS heuristics on a region: LNMS-A, LNMS-B, LNMS-C
The region is parametrized in size and centered around the center of gravity
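The region selection step can be sketched as a filter on mesh coordinates. The slides state only that the region is parametrized in size and centered on the center of gravity, so the square shape (Chebyshev distance), the `radius` parameter, and the function name below are assumptions made for illustration.

```python
def region(center, radius, width, height, faulty=()):
    """Return the healthy nodes within `radius` hops (per axis) of `center`.

    Nodes are (x, y) coordinates on a width x height mesh. A square
    region is an assumption; other shapes would fit the description too.
    """
    cx, cy = center
    return [(x, y)
            for x in range(width)
            for y in range(height)
            if max(abs(x - cx), abs(y - cy)) <= radius
            and (x, y) not in faulty]

# Hypothetical example on a 3x3 mesh whose central node is faulty.
candidates = region(center=(0, 0), radius=1, width=3, height=3,
                    faulty={(1, 1)})
```

An NMS heuristic would then be run with its set of candidate nodes restricted to this list, so that migrated tasks stay close to their communication peers while the computation load is still balanced.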
Case study
Figure: MPEG-2 Decoder KPN graph (tasks t1-t12, data dependencies e1-e14)
Input: 15-second video with a resolution of 704 × 576 pixels at 25 fps
\[
d = (1.0,\ 34.6,\ 28.1,\ 28.1,\ 28.1,\ 28.1,\ 65.0,\ 34.6,\ 28.1,\ 28.1,\ 28.1,\ 28.1,\ 65.0,\ 15.2)^{T} \quad \text{(in MBps)}
\]
Case study
Figure: The given 3x3 NoC architecture (nodes n1-n9, five RISC and four DSP cores, links l1-l12)
$c_i = 100$ MBps, $1 \le i \le |E_a|$
$C = \{C_1, C_2\} = \{RISC, DSP\}$
$M^{TC}_{cap} = 1_{12 \times 2}$
\[
T^{TC}_{cap} =
\begin{pmatrix}
0.13 & 0.20 \\
6.68 & 8.52 \\
0.06 & 0.04 \\
2.00 & 1.25 \\
2.00 & 1.25 \\
0.05 & 0.04 \\
0.06 & 0.04 \\
2.00 & 1.25 \\
2.00 & 1.25 \\
0.05 & 0.04 \\
12.33 & 8.51 \\
0.18 & 0.30
\end{pmatrix}
\quad \text{(in seconds; rows are tasks } t_1 \text{ to } t_{12}\text{, columns are core types RISC and DSP)}
\]
Results
Figure: Pareto curves for optimal mapping of three applications with 12, 24 and 36 tasks
Results (MPEG2 decoder - 3x3 NoC - n5 faulty)
Figure: Pareto curves for optimal mappings and optimal remappings (unlimited task migration)
Results (MPEG2 decoder - 3x3 NoC - n5 faulty)
Figure: Comparison of results between heuristics, optimal remappings and the initial mapping (limited task migration)
Results (MPEG2 decoder - 3x3 NoC - all single faults)
Figure: Degradation achieved by Pareto-optimal limited remappings for all single fault scenarios
Figure: Degradation achieved by proposed heuristics for all single fault scenarios
Conclusions
We proposed an ILP formulation of the task mapping problem onto heterogeneous NoC multiprocessors with deterministic routing, optimizing the total execution time and the total network traffic.
We proposed an ILP-based optimal solution to the remapping problem caused by faulty nodes.
We proposed heuristics and evaluated them in comparison to the optimal remapping results.
ILP is not scalable, but it makes sense to use it for the remapping problem.
Future work
Get results for larger task graphs and larger numbers of nodes
Implement on the MADNESS NoC platform