Utilize Partially Faulty Links in Networks-on-Chip Changlin Chen*, Ye - - PowerPoint PPT Presentation


A Novel Flit Serialization Strategy to Utilize Partially Faulty Links in Networks-on-Chip. Changlin Chen*, Ye Lu†, Sorin D. Cotofana*. *Computer Engineering, TU Delft ({c.chen-2, S.D.Cotofana}@tudelft.nl); †ECIT, QUB (ylu10@qub.ac.uk). NOCS 2012


slide-1
SLIDE 1

NOCS 2012

A Novel Flit Serialization Strategy to Utilize Partially Faulty Links in Networks-on-Chip

Changlin Chen*, Ye Lu†, Sorin D. Cotofana*

*Computer Engineering, TU Delft
{c.chen-2, S.D.Cotofana}@tudelft.nl

†ECIT, QUB
ylu10@qub.ac.uk

slide-2
SLIDE 2

Computer Engineering NOCS 2012

Outline

  • Motivation
  • Related Works
  • Flit Serialization
  • Evaluation Results
  • Conclusion

slide-3
SLIDE 3

Motivation

  • NoC: Routers + Links + Network Interfaces
  • Links are prone to
      – Manufacturing defects
      – Chip wear-out effects
      – Process parameter variations
  • Faulty links can be isolated by a fault-tolerant routing algorithm
  • The remaining bandwidth of a partially faulty link is then wasted
  • Partially faulty links should be utilized

slide-4
SLIDE 4

Related works

  • Use pre-fabricated spare wires to replace faulty wires
      – Grecu et al.
      – Lehtonen et al.
  • Simple flit quad splitting (SFQS)
      – *Palesi et al.
      – Lehtonen et al.
  • Packet rebuilding/restoring
      – Yu et al.
  • Partially faulty link recovery mechanism (PFLRM)
      – †Vitkovskiy et al.

[Figure for the entry marked *: flit sections labelled a, b, c with indices 1 to 3 laid out on a link; graphic not recoverable from the extraction.]

slide-5
SLIDE 5


The remaining link bandwidth should be used more efficiently

slide-6
SLIDE 6

Flit Serialization

  • Proposed link fault-tolerant architecture
  • Links are diagnosed periodically, and the fault vector of a link is sent to the control logic at the TX and RX sides
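The slide does not say how the fault vector is turned into usable link sections. A minimal sketch, assuming a per-wire fault vector and assuming that a section is unusable as soon as any of its wires is faulty (the function name and the 16-wire example are hypothetical, not taken from the slides):

```python
def section_fault_mask(fault_vector, num_sections):
    """Collapse a per-wire fault vector into a per-section fault mask.

    fault_vector: list of booleans, True where a wire was diagnosed faulty.
    num_sections: number of equally sized link sections (assumed to divide
                  the link width evenly).
    Assumption: one faulty wire makes its whole section unusable.
    """
    width = len(fault_vector)
    if num_sections <= 0 or width % num_sections != 0:
        raise ValueError("link width must be a multiple of num_sections")
    wires_per_section = width // num_sections
    return [
        any(fault_vector[s * wires_per_section:(s + 1) * wires_per_section])
        for s in range(num_sections)
    ]


# Hypothetical 16-wire link split into 4 sections, with wire 5 faulty.
wire_faults = [False] * 16
wire_faults[5] = True
mask = section_fault_mask(wire_faults, 4)
usable_sections = mask.count(False)
```

In this example only the section covering wires 4 to 7 is marked faulty, so three of the four sections stay available to both the TX and RX control logic.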

slide-7
SLIDE 7


slide-8
SLIDE 8

Flit Transmission Process:

[Figure: serialization datapath and timing diagram; graphic not recoverable from the extraction. Datapath labels on the TX side: four muxes, link_reg_TX, cyclic_reg_TX, flit_serialize_ctrl (sel, fault_vector, data_acceptable, flit_type, update_0, update_1). Datapath labels on the RX side: link_reg_RX, cyclic_reg_RX, flit_deserialize_ctrl (fault_vector, flit_type). TX waveforms over clk1 to clk5: data_from_crossbar, data_on_link, cyclic_reg_TX, high_reg_state, low_reg_state, data_acceptable. RX waveforms over clk2 to clk6: data_to_input_buffer, data_on_link, cyclic_reg_RX, flit_1_recovered, flit_2_recovered.]

Example recoverable from the timing diagram: flits a, b, c, d arrive from the crossbar as four sections each (3 2 1 0). With three sections sent per cycle, data_on_link carries a3 a2 a1, then a0 b3 b2, then b1 b0 c3, then c2 c1 c0. A "wait" slot is marked on the crossbar side.
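The section ordering in the timing diagram can be reproduced with a short behavioural model. This is only a sketch under stated assumptions: it packs flit sections back to back onto however many link sections are fault-free, and it ignores which physical sections those are, the mux selection, and the data_acceptable handshake. The function name `serialize` is hypothetical.

```python
def serialize(flits, fault_free_sections):
    """Pack flit sections back to back onto the usable link sections.

    flits: list of flits, each a list of section labels (section 3 first).
    fault_free_sections: number of link sections usable per cycle.
    Returns one list of section labels per link cycle.
    """
    if fault_free_sections <= 0:
        raise ValueError("at least one fault-free section is required")
    # Flatten all flits into one section stream, preserving order.
    stream = [section for flit in flits for section in flit]
    # Cut the stream into chunks of the usable link width.
    return [
        stream[i:i + fault_free_sections]
        for i in range(0, len(stream), fault_free_sections)
    ]


# Three 4-section flits on a link with one faulty section (3 usable).
flits = [[f"{name}{i}" for i in (3, 2, 1, 0)] for name in "abc"]
cycles = serialize(flits, 3)
```

Three flits need four link cycles here instead of three, which matches the 33.3% overhead listed for one fault with four sections.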

slide-9
SLIDE 9

Flit Transmission Process:

[Figure: flit transmission walkthrough, continued. Datapath: muxes, link_reg_TX with flit_serialize_ctrl, link_reg_RX with flit_deserialize_ctrl. Timing: TX over clk1 to clk5, RX over clk2 to clk6. Graphic not recoverable from the extraction.]

slide-10
SLIDE 10

Flit Transmission Process:

[Figure: flit transmission walkthrough, continued. Datapath: muxes, link_reg_TX with flit_serialize_ctrl, link_reg_RX with flit_deserialize_ctrl. Timing: TX over clk1 to clk5, RX over clk2 to clk6. Graphic not recoverable from the extraction.]

slide-11
SLIDE 11

Flit Transmission Process:

[Figure: flit transmission walkthrough, continued. Datapath: muxes, link_reg_TX with flit_serialize_ctrl, link_reg_RX with flit_deserialize_ctrl. Timing: TX over clk1 to clk5, RX over clk2 to clk6. Graphic not recoverable from the extraction.]

slide-12
SLIDE 12

Flit Transmission Process:

[Figure: flit transmission walkthrough, final step. Datapath: muxes, link_reg_TX with flit_serialize_ctrl, link_reg_RX with flit_deserialize_ctrl. Timing: TX over clk1 to clk5, RX over clk2 to clk6, ending with flit_1_recovered and flit_2_recovered. Graphic not recoverable from the extraction.]
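The receive side of the walkthrough can be modelled in the same spirit. The sketch below is an assumption-laden illustration, not the RTL from the slides: it collects the sections that arrive each cycle and emits a flit whenever a full set of sections is available, loosely mirroring the flit_1_recovered and flit_2_recovered events. The function name `deserialize` is hypothetical.

```python
def deserialize(cycles, sections_per_flit):
    """Rebuild flits from the sections received on each link cycle.

    cycles: list of per-cycle section lists, in arrival order.
    sections_per_flit: number of sections that make up one flit.
    Returns (recovered_flits, leftover_sections).
    """
    if sections_per_flit <= 0:
        raise ValueError("sections_per_flit must be positive")
    buffer = []      # sections received but not yet forming a full flit
    recovered = []   # completed flits, in order
    for cycle in cycles:
        buffer.extend(cycle)
        # Emit every complete flit currently held in the buffer.
        while len(buffer) >= sections_per_flit:
            recovered.append(buffer[:sections_per_flit])
            buffer = buffer[sections_per_flit:]
    return recovered, buffer


# Section stream seen on a link with three usable sections.
received = [
    ["a3", "a2", "a1"],
    ["a0", "b3", "b2"],
    ["b1", "b0", "c3"],
    ["c2", "c1", "c0"],
]
flits_out, leftover = deserialize(received, 4)
```

Because the receiver only needs the section count per flit and the same fault vector as the sender, the rest of the router sees whole flits again, which is what makes the scheme transparent to it.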

slide-13
SLIDE 13

Flit Serialization

Link Latency

\[ \mathrm{Latency}_{\mathrm{proposed}} = \frac{\mathit{section\_number} \times \mathit{flit\_number}}{\mathit{fault\_free\_section\_number}} \]

\[ \mathrm{Latency}_{\mathrm{SFQS}} = \frac{\mathit{section\_number} \times \mathit{flit\_number}}{\mathit{available\_section\_number}} \]

\[ \mathrm{Latency}_{\mathrm{PFLRM}} = (\mathit{cluster\_size} + 1) \times \mathit{flit\_number} \]

Fault    Proposed     Proposed     PFLRM            PFLRM             SFQS
number   4 sections   8 sections   Best situation   Worst situation
0        0%           0%           0%               0%                0%
1        33.3%        14.3%        100%             100%              100%
2        100%         33.3%        100%             200%              100%
3        300%         60.0%        100%             300%              300%
4        -            100%         100%             400%              -
5        -            167%         100%             500%              -
6        -            300%         100%             600%              -
7        -            700%         100%             700%              -

Table 1 Link latency overheads when flits are transmitted continuously

Wire fault probability

  • Even distribution
  • Cluster faults: faults may stay in one link section

[Equation: wire fault probability P in terms of n, k, N and p_e; not recoverable from the extraction.]
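The Table 1 entries for the proposed method and for PFLRM follow directly from the latency expressions. The sketch below recomputes them, assuming that the overhead is measured against a fault-free link that moves one flit per cycle and that each fault in the "fault number" column disables a different link section. The function names are hypothetical.

```python
def proposed_overhead(section_number, faulty_sections):
    """Relative latency overhead of the proposed serialization.

    Latency = section_number * flit_number / fault_free_section_number,
    compared with flit_number cycles on a fault-free link.
    """
    fault_free = section_number - faulty_sections
    if fault_free <= 0:
        raise ValueError("no fault-free link section left")
    return section_number / fault_free - 1.0


def pflrm_overhead(cluster_size):
    """Relative latency overhead of PFLRM.

    Latency = (cluster_size + 1) * flit_number, so the overhead relative
    to flit_number cycles equals cluster_size.
    """
    return float(cluster_size)


# Overheads in percent for 8 sections and 0 to 7 faulty sections.
proposed_8 = [round(100 * proposed_overhead(8, k), 1) for k in range(8)]
```

For PFLRM, the best situation in the table corresponds to a cluster size of 1 regardless of the fault count, and the worst situation to a cluster size equal to the number of faults.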

 

              

slide-14
SLIDE 14

ECC Integration

  • Two ways to collaborate with ECC
  • The realization of other parts can be conventional

[Figure: two router-to-router link diagrams, each with TX and RX blocks around link_section_1 … link_section_N. In the first, a single ECC coder sits on the sending side and a single ECC decoder on the receiving side. In the second, several ECC coders and several ECC decoders are drawn alongside TX and RX. Graphic not recoverable from the extraction.]

slide-15
SLIDE 15

Evaluation Results

  • Platform
      – NoC topology: 8 × 8 2D mesh
      – Router:
          • 3 pipeline stages
              – Look-ahead routing & VC/switch allocation
              – Switch traversal
              – Link traversal
          • 5 physical channels with 5 virtual channels in each
          • Each VC is 4 flits deep and 32 bits wide
      – Realized at RTL level in Verilog HDL
      – Synopsys Design Compiler, TSMC 65nm, 500MHz

slide-16
SLIDE 16

Evaluation Results

  • Area and power overhead

Link fault tolerant method   Dynamic Power (mW)   Leakage Power (mW)   Area (µm²)
Basic router                 18.20 / 0%           0.5269 / 0%          69560 / 0%
Proposed, 4 sections         20.81 / 14.3%        0.7994 / 51.7%       89567 / 28.8%
Proposed, 8 sections         21.18 / 16.4%        0.9294 / 76.4%       96538 / 38.8%
Spare wires, 4               26.71 / 46.8%        0.9959 / 89%         102383 / 47.2%
Spare wires, 8               29.03 / 59.5%        1.0442 / 98.2%       116789 / 67.9%
PFLRM                        19.30 / 6.0%         0.5789 / 9.9%        83326 / 19.8%
SFQS                         20.27 / 11.4%        0.6807 / 29.1%       81288 / 16.9%

Table 2 Power and area overhead of different link fault tolerant methods
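Each cell in Table 2 pairs an absolute value with its overhead relative to the basic router. As a quick check of how the percentages appear to be derived, the sketch below recomputes them for the two proposed configurations; the dictionary layout and the helper name are hypothetical, and the numbers are copied from the table.

```python
def overhead_percent(value, baseline):
    """Overhead of value relative to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


# Values copied from Table 2: dynamic power in mW, area in the table's unit.
basic = {"dynamic": 18.20, "area": 69560}
proposed = {
    "4 sections": {"dynamic": 20.81, "area": 89567},
    "8 sections": {"dynamic": 21.18, "area": 96538},
}

# Recompute the overhead of each proposed configuration against the basic router.
overheads = {
    name: {
        metric: round(overhead_percent(values[metric], basic[metric]), 1)
        for metric in ("dynamic", "area")
    }
    for name, values in proposed.items()
}
```

The recomputed figures agree with the percentages printed next to the absolute values for these rows.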

slide-17
SLIDE 17

Evaluation Results

  • System performance (uniform traffic)

[Figure: (a) fault pattern when pe = 0.001; (b) performance when pe = 0.001; (c) fault pattern when pe = 0.01; (d) performance when pe = 0.01. Performance panels plot average latency (50 to 200) against injection rate (0.05 to 0.35) for packet_length = 4, with curves fault_free, proposed_s8, proposed_s4, pflrm and quad splitting.]

slide-18
SLIDE 18

Evaluation Results

  • System performance (uniform traffic)

[Figure: (a) fault pattern when pe = 0.05; (b) performance when pe = 0.05; (c) fault pattern when pe = 0.1; (d) performance when pe = 0.1. Performance panels plot average latency (50 to 200) against injection rate (0.05 to 0.35) for packet_length = 4. Curves at pe = 0.05: fault_free, proposed_s8, proposed_s4, pflrm, quad splitting. Curves at pe = 0.1: fault_free, proposed_s8, pflrm.]

slide-19
SLIDE 19

Conclusion

  • A novel flit serialization strategy to utilize partially faulty links
  • Flits and links are divided into several sections
  • On a defective link, flit sections are serialized and transmitted on all fault-free link sections
  • The flit serialization and deserialization modules are transparent to the other parts of the router
  • Link latency overhead is significantly reduced compared with other partially faulty link usage strategies (for one fault, 14.3% with 8 sections versus 100% for PFLRM and SFQS, Table 1)
  • Area and energy overheads are reduced compared with the spare wire replacement method, with slight performance degradation
  • Performance of the NoC can degrade gracefully

slide-20
SLIDE 20
