DRAIN: Distributed Recovery Architecture for Inaccessible Nodes in Multi-core Chips
Andrew DeOrio†, Konstantinos Aisopos‡§, Valeria Bertacco†, Li-Shiuan Peh§
DAC 2011
University of Michigan, Princeton University, Massachusetts
Each node: processor (µP), cache, router
Detect if a fault has occurred
Diagnose where the fault has occurred
Reconfigure the network to account for the fault
Recover and resume normal operation
[Figure: architecture overview. Labels: processor core (µP), local cache ($), router, checkpoint buffers, memory controller, Mem, primary link, DRAIN emergency link]
link failure
Fault model: faults accumulate
reconfigure interconnect
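A minimal sketch of what reconfiguring the interconnect could involve, assuming next-hop routing tables are rebuilt by breadth-first search over the surviving links. The slides do not state which reconfiguration algorithm is used; the BFS, the 2x2 mesh, and the names `mesh_links` and `next_hops` are illustrative assumptions.

```python
from collections import deque


def mesh_links(width, height):
    """Return the bidirectional links of a width x height mesh (hypothetical topology)."""
    links = set()
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                links.add(frozenset({(x, y), (x + 1, y)}))
            if y + 1 < height:
                links.add(frozenset({(x, y), (x, y + 1)}))
    return links


def next_hops(links, dest):
    """BFS outward from dest over working links.

    Returns a dict mapping every reachable node to the neighbor it should
    forward to in order to reach dest (dest maps to itself).
    """
    neighbors = {}
    for link in links:
        a, b = tuple(link)
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    hops = {dest: dest}
    queue = deque([dest])
    while queue:
        cur = queue.popleft()
        for nb in neighbors.get(cur, []):
            if nb not in hops:
                hops[nb] = cur  # nb reaches dest by forwarding to cur
                queue.append(nb)
    return hops


# Usage: fail the link between (0, 0) and (1, 0), then rebuild routes to (0, 0).
links = mesh_links(2, 2)
links.discard(frozenset({(0, 0), (1, 0)}))
table = next_hops(links, dest=(0, 0))
```

After the failure, node (1, 0) no longer reaches (0, 0) directly and is routed around the broken link through (1, 1) and (0, 1).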
link failure
Fault model: initiate DRAIN recovery when a single additional fault causes a node to become isolated
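The trigger condition can be sketched as a check on the surviving links. This is an illustration only: it assumes "isolated" means a node has no working primary link left, and the node names and the function names `working_degree` and `newly_isolated` are hypothetical.

```python
def working_degree(node, links):
    """Count the working links attached to node."""
    return sum(1 for link in links if node in link)


def newly_isolated(nodes, links, failed_link):
    """Return the nodes that lose their last working link when failed_link fails."""
    remaining = links - {failed_link}
    return [
        n for n in sorted(nodes)
        if working_degree(n, links) > 0 and working_degree(n, remaining) == 0
    ]


# Usage: a 2x2 layout in which the A-B link has already failed.
nodes = {"A", "B", "C", "D"}
links = {frozenset({"A", "C"}), frozenset({"B", "D"}), frozenset({"C", "D"})}

# A second fault on A-C leaves A with no working link, so recovery would start.
isolated = newly_isolated(nodes, links, frozenset({"A", "C"}))
```

A fault on the C-D link, by contrast, isolates no node in this layout, so under this assumption the network would only be reconfigured.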
node isolated!
drain connected nodes via primary links
drain disconnected node via emergency link
[Flowchart: only the branch labels survive (found / not found, yes / no, done)]
drain connected node again
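The three drain steps can be sketched as a small simulation. It rests on assumptions the slides do not confirm: that "draining" a node means writing its dirty cache lines back to memory, and that the isolated node's lines travel over the emergency link into a connected neighbor's cache, which is why that neighbor is drained a second time. The function name `drain_recovery` and its data layout are hypothetical.

```python
def drain_recovery(dirty, isolated, neighbor):
    """Simulate the drain order shown on the slides.

    dirty:    dict mapping node name -> list of dirty line ids
    isolated: name of the node with no working primary link
    neighbor: connected node at the other end of the emergency link
    Returns (memory, log): lines in the order they reach memory, and a log
    of (phase, node, link_type) events.
    """
    caches = {node: list(lines) for node, lines in dirty.items()}  # do not mutate input
    memory, log = [], []

    def drain_connected(phase):
        for node in sorted(caches):
            if node != isolated and caches[node]:
                memory.extend(caches[node])
                caches[node] = []
                log.append((phase, node, "primary"))

    # Phase 1: drain connected nodes via primary links.
    drain_connected(1)
    # Phase 2: drain the disconnected node via the emergency link.
    if caches[isolated]:
        caches[neighbor].extend(caches[isolated])
        caches[isolated] = []
        log.append((2, isolated, "emergency"))
    # Phase 3: drain the connected node again, now holding the forwarded lines.
    drain_connected(3)
    return memory, log


memory, log = drain_recovery(
    {"n0": ["a"], "n1": ["b"], "n2": ["c", "d"]}, isolated="n2", neighbor="n1"
)
```

In this run the isolated node `n2` never touches a primary link: its lines reach memory only after `n1` is drained the second time.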
[Figure: DRAIN-enabled local cache, between the µP and the router. Labels: set decoder (set 0 ... set M), way 0 ... way N, tag and data arrays, tag comparators (=?), DRAIN-enabled control logic, serial-to-parallel conversion on the emergency link input, parallel-to-serial conversion on the emergency link output, primary link input and output, DRAIN data and tag. Legend: existing cache logic, additional cache logic]
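The serial/parallel conversion on the emergency link can be illustrated with a short sketch. The link width and word size are not given in the slides, so `link_bits`, `total_bits`, and both function names are assumptions chosen for the example.

```python
def parallel_to_serial(value, total_bits, link_bits):
    """Split a total_bits-wide word into link_bits-wide pieces, most significant first."""
    if total_bits % link_bits != 0:
        raise ValueError("total_bits must be a multiple of link_bits")
    mask = (1 << link_bits) - 1
    return [
        (value >> shift) & mask
        for shift in range(total_bits - link_bits, -1, -link_bits)
    ]


def serial_to_parallel(pieces, link_bits):
    """Reassemble pieces received most significant first into one word."""
    value = 0
    for piece in pieces:
        value = (value << link_bits) | piece
    return value


# Usage: send an 8-bit word over a hypothetical 4-bit-wide emergency link.
pieces = parallel_to_serial(0xAB, total_bits=8, link_bits=4)
word = serial_to_parallel(pieces, link_bits=4)
```

A narrower link needs more transfers per word, which is the trade-off a low-cost emergency link would accept in exchange for fewer wires.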