Fine-Grained Bandwidth Adaptivity in Networks-on-Chip Using Bidirectional Channels

Robert Hesse, Jeff Nicholls, Natalie Enright Jerger
University of Toronto, May 10, 2012


slide-1
SLIDE 1

May 10, 2012

University of Toronto

Fine-Grained Bandwidth Adaptivity in Networks-on-Chip Using Bidirectional Channels

Robert Hesse, Jeff Nicholls, Natalie Enright Jerger

Friday, 11 May, 12

slide-12
SLIDE 12

Motivation

  • NoCs are crucial for scaling CMPs
  • Problem:
    – NoC bandwidth resources are static
    – Bandwidth requirements are highly dynamic
  • Current solution:
    – Over-provisioned link BW
  • Our solution:
    – Adapt link BW to demands

[Figure: bandwidth (BW) over time. Callouts: average channel utilization < 5%; save up to 75% of BW resources]

slide-20
SLIDE 20

Motivation - Static NoC

  • Static Topology
  • Static Bandwidth
  • Static workloads for evaluation
  • Specified at design time for worst case scenario
  • Static NoCs can handle temporally and spatially stable traffic well

[Figure: network of 16 IP nodes with per-node traffic plots for synthetic patterns (Uniform Random, Transpose)]

slide-26
SLIDE 26

Motivation - Real NoC Traffic

  • Highly dynamic workloads
  • Large temporal and spatial BW variance
  • Significant area and power overhead with traditional NoC implementation
  • Channels are underutilized most of the time

[Figure: network of 16 IP nodes with per-node traffic plots for real workloads (Blackscholes, Streamcluster)]

slide-32
SLIDE 32

Channel Utilization

[Figure: channel utilization (%, axis 0 to 45) per benchmark: Blackscholes, Bodytrack, Canneal, Facesim, Ferret, Fluidanimate, Raytrace, Streamcluster, Swaptions, Vips, and Avg. Series: max. utilization and avg. utilization. Callout: 3.42%]

  • Adjust channel width (flit width): 8, 4, or 2 bytes between routers (R)

[Figure: latency (cycles, axis 0 to 80) per benchmark for channel widths of 8 bytes, 4 bytes, and 2 bytes]

Reducing flit width leads to unacceptable latency increase
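One contributor to the latency increase is serialization: a packet of fixed size needs more flits as the flit narrows. The sketch below illustrates only that effect. The 64-byte packet size and the helper name `flits_per_packet` are assumptions made for illustration (the slides do not state packet sizes), and router pipeline and contention delays are ignored.

```python
import math


def flits_per_packet(packet_bytes: int, flit_bytes: int) -> int:
    """Number of flits needed to carry one packet at a given flit width."""
    return math.ceil(packet_bytes / flit_bytes)


PACKET_BYTES = 64  # assumed for illustration; not taken from the slides

for width in (8, 4, 2):  # channel widths compared on the slide
    count = flits_per_packet(PACKET_BYTES, width)
    print(f"{width}-byte flits: {count} flits per packet")
```

Halving the flit width doubles the number of flits per packet, so the time to push one packet through a link roughly doubles as well.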

slide-38
SLIDE 38

Bidirectional Channels

  • Bidirectional channels to share channel resources:

[Figure: two routers (R) joined by one bidirectional channel that carries flows A and B (A+B) over time, in place of separate channels for A and B; all channels have width b. Callout: Keep flit size!]

  • Adding flexibility with fine-grained BW adaptivity

[Figure: routers (R) joined by several narrower channels of width b/n. Callout: Need to sub-divide flits]

slide-39
SLIDE 39

Decoupling Flit Width From Channel Width

  • Conventionally in NoC, flit width is coupled to channel width

[Figure: Router 1 sends a flit of width b to Router 2 over a channel of width b]

slide-41
SLIDE 41

Phit-Serial Communication

  • Conventionally in NoC, flit width is coupled to channel width
  • Break flits (flow control units) into phits (physical transfer units) to decouple channel width from flit width

[Animation: Router 1 serializes a flit of width b into phits of width b/n (four shown), sends them over channels of width b/n, and Router 2 deserializes them back into the flit]
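The serialize and deserialize steps can be sketched in a few lines. This is a minimal illustration, not the router's hardware logic: the function names, the most-significant-phit-first ordering, and the use of Python integers to stand in for bit vectors are all assumptions.

```python
from typing import List


def serialize(flit: int, flit_bits: int, n: int) -> List[int]:
    """Split a flit of flit_bits bits into n phits, most significant phit first."""
    if flit_bits % n != 0:
        raise ValueError("flit width must be divisible by n")
    phit_bits = flit_bits // n
    mask = (1 << phit_bits) - 1
    return [(flit >> (phit_bits * (n - 1 - i))) & mask for i in range(n)]


def deserialize(phits: List[int], flit_bits: int) -> int:
    """Reassemble a flit from phits received most significant phit first."""
    phit_bits = flit_bits // len(phits)
    flit = 0
    for phit in phits:
        flit = (flit << phit_bits) | phit
    return flit


# An 8-byte (64-bit) flit split into n = 4 phits of 16 bits each.
flit = 0x0123456789ABCDEF
phits = serialize(flit, flit_bits=64, n=4)
print([hex(p) for p in phits])
print(deserialize(phits, flit_bits=64) == flit)
```

With these inputs the flit splits into the phits 0x0123, 0x4567, 0x89AB, and 0xCDEF, and deserializing them returns the original value.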

slide-48
SLIDE 48

Microarchitecture

  • Bandwidth-Adaptive Router (BAR): Only minimal modifications to standard VC router necessary
  • Intra- & inter-router flow control is still flit-based

[Figure: BAR router microarchitecture]

slide-49
SLIDE 49

Bandwidth Allocation

  • Pressure-based allocation of channels to directions

[Animation: allocation example with phits A0 to A3 and D0 to D3]
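The transcript names the policy but does not preserve its details, so the following is only a hypothetical sketch of what a pressure-based split could look like. It assumes that "pressure" is the number of flits waiting in each direction, that lanes are divided in proportion to pressure, and that a direction with waiting traffic always keeps at least one lane. None of these choices is confirmed by the slides, and `allocate_lanes` is an illustrative name.

```python
from typing import Tuple


def allocate_lanes(pressure_ab: int, pressure_ba: int, lanes: int) -> Tuple[int, int]:
    """Split `lanes` bidirectional lanes between directions A->B and B->A.

    Hypothetical policy: proportional to pressure, with at least one lane
    for any direction that has traffic waiting.
    """
    if lanes < 2:
        raise ValueError("need at least two lanes to serve both directions")
    total = pressure_ab + pressure_ba
    if total == 0:
        half = lanes // 2  # idle link: split evenly
        return half, lanes - half
    ab = round(lanes * pressure_ab / total)
    if pressure_ab > 0:
        ab = max(ab, 1)          # never starve A->B
    if pressure_ba > 0:
        ab = min(ab, lanes - 1)  # never starve B->A
    return ab, lanes - ab


print(allocate_lanes(4, 4, lanes=4))  # balanced demand
print(allocate_lanes(4, 0, lanes=4))  # one-sided demand
print(allocate_lanes(1, 7, lanes=4))  # skewed demand
```

Under these assumptions, balanced demand splits the four lanes evenly, one-sided demand hands all lanes to the busy direction, and skewed demand still leaves one lane for the lightly loaded side.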

slide-53
SLIDE 53

Example

[Animation: flits A, B, C, and D are each broken into four phits (A0 to A3, B0 to B3, C0 to C3, D0 to D3), transferred, and reassembled into flits]

slide-72
SLIDE 72

Related Work

  • Previous Work:
    – BiNoC (Lan et al., NOCS 2009)
      • only coarse-grained BW adaptivity
    – Oblivious Routing in On-Chip Bandwidth-Adaptive Networks (Cho et al., PACT 2009)
      • Objective is different:

Design       Channel width   Link BW   XBar ports
BAR          b/n             ≤ b       P
BINOC        b               2*b       2*P
BWADAPTIVE   b               N*b       N*P

Remember: BW demands << b

slide-73
SLIDE 73

Evaluation

  • Synthetic and real workloads (PARSEC)
  • Comparing 4 router designs:
    – BAR: our BW adaptive router design
    – STANDARD: typical virtual-channel router
    – BINOC: BiNoC router
    – BWADAPTIVE: existing BW adaptive router

slide-78
SLIDE 78

Area Comparison

  • Orion 2.0 with 45 nm process

[Figure: two stacked-bar charts of router area normalized to STANDARD, broken down into Allocation, SerDes, Xbar, and Buffer, for STANDARD, BINOC, BWADAPTIVE, and BAR. The Initial Comparison chart uses a 0 to 14 scale; the Equalized Area chart uses a 0 to 1 scale.]

Initial Comparison

Architecture       STANDARD      BINOC       BWADAPT.    BAR
Total # of Buf.    5             10          20          10
Total Channels     5-in 5-out    10-inout    20-inout    20-inout
Each Buf. Size     32 flits      16 flits    8 flits     16 flits
Total Buf. Size    160 flits     160 flits   160 flits   160 flits
Crossbar           5x5           10x10       20x20       5x5
Flit width         8 byte        8 byte      8 byte      8 byte

Equalized Area

Architecture       STANDARD      BINOC       BWADAPT.    BAR
Total # of Buf.    5             10          20          10
Total Channels     5-in 5-out    10-inout    20-inout    20-inout
Each Buf. Size     32 flits      32 flits    32 flits    16 flits
Total Buf. Size    160 flits     160 flits   160 flits   160 flits
Crossbar           5x5           10x10       20x20       5x5
Flit width         8 byte        4 byte      2 byte      8 byte
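The buffer budget in the two comparison tables can be checked with a little arithmetic. The sketch below multiplies buffer count, buffer depth, and flit width for each architecture; the dictionary layout and the names `buffer_bytes` and `budget` are illustrative and do not come from the slides. Under this reading, every configuration in both tables stores 1280 bytes per router, which equals 160 flits at the 8-byte flit width of STANDARD.

```python
# (number of buffers, flits per buffer, flit width in bytes),
# transcribed from the Initial Comparison and Equalized Area tables.
INITIAL = {
    "STANDARD": (5, 32, 8),
    "BINOC": (10, 16, 8),
    "BWADAPT.": (20, 8, 8),
    "BAR": (10, 16, 8),
}
EQUALIZED_AREA = {
    "STANDARD": (5, 32, 8),
    "BINOC": (10, 32, 4),
    "BWADAPT.": (20, 32, 2),
    "BAR": (10, 16, 8),
}


def buffer_bytes(num_buffers, flits_per_buffer, flit_bytes):
    """Total buffer storage per router, in bytes."""
    return num_buffers * flits_per_buffer * flit_bytes


def budget(table):
    """Map each architecture name to its total buffer storage in bytes."""
    return {arch: buffer_bytes(*cfg) for arch, cfg in table.items()}


if __name__ == "__main__":
    for name, table in (("Initial", INITIAL), ("Equalized area", EQUALIZED_AREA)):
        print(name, budget(table))
```

Seen this way, the equalized-area configurations keep the byte budget fixed by trading flit width against buffer depth.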

slide-79
SLIDE 79

Static Network Performance

[Figure: four latency curves, latency (cycles, 0 to 100) versus injection rate (flits/node/cycle), comparing STANDARD, BINOC, BWADAPTIVE, and BAR under the Uniform Random, Transpose, Bit-Complement, and Shuffle traffic patterns.]

slide-83
SLIDE 83

Overall System Performance

  • Full-system simulation:
    – Cycle-accurate x86 simulation: FeS2 + BookSim
    – PARSEC benchmarks (16 threads)
    – 16 P4-like CPUs, 4x4 mesh NoC

slide-85
SLIDE 85

Overall System Performance

100% channel resources

[Figure: performance (IPC, axis 0.6 to 1) per router/channel configuration for the PARSEC benchmarks blackscholes, bodytrack, canneal, facesim, ferret, fluidanimate, raytrace, streamcluster, swaptions, and vips, plus the average. Plot annotation: 4 byte bandwidth.]

Average performance by configuration:

STANDARD/2x8    100%
BINOC/2x8       101.2%
BAR/4x4         99.9%
BAR/8x2         100.6%

slide-86
SLIDE 86

Overall System Performance

50% channel resources

[Figure: performance (IPC, axis 0.6 to 1) per router/channel configuration for the PARSEC benchmarks blackscholes, bodytrack, canneal, facesim, ferret, fluidanimate, raytrace, streamcluster, swaptions, and vips, plus the average. Plot annotation: 4 byte bandwidth.]

Average performance by configuration:

STANDARD/2x4    92.5%
BINOC/2x4       94.1%
BAR/4x2         98.5%
BAR/8x1         99.0%

slide-87
SLIDE 87

Overall System Performance

25% channel resources

[Figure: performance (IPC, axis 0.6 to 1) per router/channel configuration for the PARSEC benchmarks blackscholes, bodytrack, canneal, facesim, ferret, fluidanimate, raytrace, streamcluster, swaptions, and vips, plus the average. Plot annotation: 4 byte bandwidth.]

Average performance by configuration:

STANDARD/2x2    77.2%
BINOC/2x2       78.7%
BAR/4x1         91.9%
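The configuration labels on the Overall System Performance slides appear to follow a channels-by-width pattern. The sketch below assumes that a label such as BAR/4x2 means 4 channels of 2 bytes each; this is an interpretation, not something the slides state. It computes the aggregate channel width relative to the 2x8 baseline. The helper names `aggregate_bytes` and `resource_fraction` are hypothetical.

```python
def aggregate_bytes(label):
    """Parse 'ARCH/CxW' and return C * W, assuming C channels of W bytes each."""
    _, shape = label.split("/")
    channels, width = (int(part) for part in shape.split("x"))
    return channels * width


def resource_fraction(label, baseline="STANDARD/2x8"):
    """Aggregate channel width of `label` as a fraction of the baseline."""
    return aggregate_bytes(label) / aggregate_bytes(baseline)


if __name__ == "__main__":
    for label in ("STANDARD/2x8", "BAR/8x2", "BINOC/2x4", "BAR/8x1", "BAR/4x1"):
        print(f"{label}: {resource_fraction(label):.0%} of baseline channel width")
```

Under that assumption the labels line up with the 100%, 50%, and 25% channel-resource headings on the three slides.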

slide-93
SLIDE 93

Summary

  • We introduce fine-grained BW adaptivity
  • Phit-serial communication to decouple flit width from channel width
  • Improved channel utilization without significantly increasing latency
  • 50% reduction in channel resources: 99% perf.
  • 75% reduction in channel resources: 92% perf.

slide-94
SLIDE 94

Thank you!

robert.hesse@utoronto.ca

slide-95
SLIDE 95

Hardware Implementation

slide-96
SLIDE 96

Additional Delays

[Figure: latency (cycles), split into Zero Load Delay, Serialization Delay, and Congestion Delay, plotted together with Channel Utilization (100 × utilization / available bandwidth) for channel widths of 8, 4, and 2 bytes.]
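The serialization component of the delay follows from sending one flit as several narrower phits. As a minimal sketch, assuming one phit crosses a channel per cycle and the 8-byte flit from the comparison tables, the phit count and the extra cycles relative to a full-width channel can be computed as below. The function names are hypothetical, and the model deliberately ignores zero-load and congestion delay.

```python
import math

FLIT_BYTES = 8  # flit width taken from the comparison tables


def phits_per_flit(channel_bytes, flit_bytes=FLIT_BYTES):
    """Number of phits needed to carry one flit over a channel of the given width."""
    return math.ceil(flit_bytes / channel_bytes)


def serialization_delay(channel_bytes, flit_bytes=FLIT_BYTES):
    """Extra cycles per channel traversal versus a channel as wide as the flit.

    Assumes exactly one phit is transferred per cycle.
    """
    return phits_per_flit(channel_bytes, flit_bytes) - 1


if __name__ == "__main__":
    for width in (8, 4, 2):
        print(width, phits_per_flit(width), serialization_delay(width))
```

Halving the channel width doubles the phits per flit, so the serialization term grows as channels narrow while the other delay components are modelled separately.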

slide-97
SLIDE 97

Implementation Results