Fault Tolerance of the Input/Output Ports in Massively Defective Multicore Processor Chips


The 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, Second Workshop on Dependable and Secure Nanocomputing, Friday June 27, 2008, Anchorage, AK, USA


SLIDE 1

Fault Tolerance of the Input/Output Ports in Massively Defective Multicore Processor Chips

Piotr Zając, Jacques Henri Collet, Jean Arlat, and Yves Crouzet

{firstname.lastname}@laas.fr

The 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
Second Workshop on Dependable and Secure Nanocomputing
Friday June 27, 2008, Anchorage, AK, USA

SLIDE 2

From Multi-Core Architectures to Multi-Multi-Core Architectures

  • Multi-core: performance while coping with power dissipation issues (very high clock frequency)
  • Still, the transistor sizes needed to include many such cores -> significant % of defective cores (more than 10%?)
  • Current context:
      • Chips are sorted according to frequency
      • Single-core processor = "downgraded" dual-core circuit
      • …
  • How to go further: on-line reconfiguration to cope with faults?

[Figure: multi-core chips, "Now" and "Soon". Source: Intel]

SLIDE 3

Example Target Architecture
(5x9-node network, connectivity: 4)

[Figure: grid of core processors (C) and routers (R) with I/O ports (IOP), showing a single connected zone and a disconnected zone. Legend: core processor, router, I/O port, failed core processor, inhibited inter-router link.]

Mutual diagnosis -> bad cores isolated

The I/O interface (IOP) is a hardcore and a "bottleneck"

  • P. Zając, J. H. Collet, J. Arlat, Y. Crouzet, "Resilience through Self-Configuration in Future Massively Defective Nanochips", Supplemental Volume, DSN 2007, Edinburgh, Scotland, UK, pp. 266-271, 2007
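The split between a connected zone and a disconnected zone can be illustrated with a breadth-first search from the I/O ports. This is only a minimal sketch under stated assumptions, not the authors' method: each grid node is treated as one unit (core plus router), a failed node is simply removed from the grid, and the function name `connected_zone`, the IOP position, and the failed-node positions are all hypothetical.

```python
from collections import deque

def connected_zone(rows, cols, failed, iops):
    """Return the set of fault-free nodes reachable from any I/O port.

    rows, cols: grid dimensions (5 x 9 in the slide's example)
    failed:     set of (row, col) nodes isolated after mutual diagnosis
    iops:       (row, col) nodes with an I/O port attached (hypothetical placement)
    """
    seen = {n for n in iops if n not in failed}
    queue = deque(seen)
    while queue:
        r, c = queue.popleft()
        # Connectivity 4: each node talks to its north/south/east/west neighbours.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if inside and nxt not in failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Illustrative data only: three failed nodes, one IOP on the left edge.
ROWS, COLS = 5, 9
failed = {(0, 7), (1, 8), (3, 3)}
zone = connected_zone(ROWS, COLS, failed, iops=[(2, 0)])

all_good = {(r, c) for r in range(ROWS) for c in range(COLS)} - failed
disconnected = all_good - zone
print(len(zone), sorted(disconnected))
```

In this made-up layout the corner node (0, 7)/(1, 8) failures cut off node (0, 8), which is fault-free but unreachable from the IOP. The same search also shows why the IOP itself is critical: if the list of working IOPs is empty, the zone is empty regardless of how many cores are good.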

SLIDE 4

Preliminary Analysis of Several Options

  • Increase the number of I/O ports
  • Consider redundant IOPs
  • Extend IOP connectivity with grid (adjacent nodes)
  • …

SLIDE 5

Increasing the Number of IOPs

[Figure: example of a 4-IOP grid including 14 defective cores]

SLIDE 6

Redundant IOP Architecture

Example: 4-connect RIOP with R = 3 redundant I/O modules (Mi)

Chip validation criteria?

  • At least r out of R modules are fault-free at start-up in each RIOP

Validation probability, example: case of a 4-port chip for R = 5, 6, 7 and r = 3
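The "at least r out of R" criterion can be written as a binomial tail. The expression below is a sketch under assumptions that the slides do not state: module failures are independent, every module fails at start-up with the same probability $p$ (a hypothetical symbol), and a chip with $N_{IO}$ RIOPs is validated only if every RIOP passes.

```latex
% Probability that one RIOP has at least r fault-free modules out of R
P_{RIOP}(r, R) = \sum_{i=r}^{R} \binom{R}{i} (1 - p)^{i} \, p^{R - i}

% Chip-level validation probability for N_{IO} independent RIOPs
P_{chip} = \left[ P_{RIOP}(r, R) \right]^{N_{IO}}

% Illustrative numbers only: r = 3, R = 5, p = 0.1, N_{IO} = 4
P_{RIOP}(3, 5) = 0.0729 + 0.32805 + 0.59049 \approx 0.991,
\qquad
P_{chip} \approx 0.991^{4} \approx 0.966
```

The value $p = 0.1$ is chosen only to make the arithmetic concrete; it is not a figure from the presentation.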

SLIDE 7

Modification of Grid Topology around each RIOP

Example of overhead analysis: $N = 300$; $N_{IO} = 4$; $n_c = 8$

  • VC1: To protect the communication bandwidth of each RIOP, at least 3 of its 8 neighboring nodes must be fault-free.
  • VC2: Validation yield threshold: $P_{W,IOP} \times P_L(k, n_c, p_{f,N}) \geq 80\%$, where $P_{W,IOP}$ is the RIOP term and $P_L$ is the probability that $k$ of the $n_c$ nodes adjacent to each RIOP are OK.

$Q = (R - 1) \, \dfrac{N_{IO} \, A_{IO}}{N \, A}$

[Figure panels: connectivity $n_c = 4$, connectivity $n_c = 6$, connectivity $n_c = 8$]
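The neighbor criterion for the three connectivities can be explored numerically. The sketch below makes several assumptions that are not in the slides: node failures are independent with a single probability `p_fail`, the neighbor term is read as "at least k of the nc neighbors of one RIOP are fault-free", and the RIOP term `p_w_iop` is supplied by the caller. The function names are hypothetical.

```python
from math import comb

def prob_at_least(k, n, p_fail):
    """Probability that at least k of n independent nodes are fault-free."""
    p_ok = 1.0 - p_fail
    return sum(comb(n, i) * p_ok**i * p_fail**(n - i) for i in range(k, n + 1))

def passes_threshold(p_w_iop, k, nc, p_fail, threshold=0.80):
    """Check whether p_w_iop * P(at least k of nc neighbours OK) >= threshold.

    p_w_iop is an assumed, caller-supplied probability for the RIOP term;
    the 80% default mirrors the validation yield threshold on the slide.
    """
    return p_w_iop * prob_at_least(k, nc, p_fail) >= threshold

# Illustrative sweep: k = 3 required neighbours, three connectivities,
# and a few made-up node failure probabilities.
for nc in (4, 6, 8):
    row = [round(prob_at_least(3, nc, pf), 4) for pf in (0.1, 0.2, 0.3)]
    print(f"nc = {nc}: {row}")
```

With k fixed, a larger connectivity gives more candidate neighbors, so the probability of keeping at least k fault-free ones rises with nc; the sweep makes that trade-off against the extra links visible for any chosen failure probability.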

SLIDE 8

Concluding Remarks

  • Study of the protection of the IOPs in multiport grid architectures
  • Analysis of the dependability gain and overhead induced: redundancy, connectivity and chip area
  • Grid topology and connectivity
  • Self-diagnosis and coverage
  • Application reconfiguration