Dynamic Flow Regulation for IP Integration on Network-on-Chip

SLIDE 1

Dynamic Flow Regulation for IP Integration on Network-on-Chip

Zhonghai Lu and Yi Wang

Dept. of Electronic Systems
KTH Royal Institute of Technology
Stockholm, Sweden

6th Symposium on NoCS, Denmark May 9-11, 2012

SLIDE 2

Agenda

  • The IP integration problem
  • Why flow regulation?
  • Online flow characterization
  • Dynamic regulation
  • Experiments and results
  • Conclusion and future work

SLIDE 3

SoC Design

  • Design of IPs
      • Separate concerns, e.g. computation and communication
      • A divide-and-conquer approach to manage complexity
      • Done by IP vendors
  • Integration of IPs
      • Via a common interface (AHB, AXI, etc.)
      • Done by SoC integrators

SLIDE 4

The IP integration problem

  • Separating concerns helps to manage complexity and reuse expert knowledge. However, it creates performance problems (uncertainty, quality) in the IP integration phase.
  • Can we control the performance?

SLIDE 5

Flow regulation

  • Do not inject traffic as soon as possible
      • As-soon-as-possible traffic injection creates congestion problems as soon as possible
      • Disciplined traffic helps to alleviate network contention
  • A formal foundation: network calculus
      • Abstract a flow with an arrival curve
      • Abstract a server with a service curve
  • Can be viewed as a proactive (vs. reactive) congestion control scheme

You have the horse. You have the rein!

SLIDE 6

Linear arrival curve

  • An arrival curve α(t) provides an upper bound on the cumulative amount of traffic over time.
  • A linear arrival curve has the form

    $$\alpha(t) = \sigma + \rho t$$

    where σ bounds the traffic burstiness and ρ bounds the average rate.

  • Example: $\alpha(t) = 6.6 + 0.2t$

[Figure: cumulative traffic V (bits) versus time t (cycles), with time-axis markers s = 0 and s = 38; the traffic is bounded by the linear arrival curve with σ = 6.6 and ρ = 0.2.]
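To make the bound concrete, the following minimal Python sketch checks whether a recorded trace conforms to a linear arrival curve. The function names and the sample trace are hypothetical and serve only as illustration; the values σ = 6.6 and ρ = 0.2 are the ones shown on this slide.

```python
def alpha(t, sigma, rho):
    """Linear arrival curve: upper bound on traffic in any interval of length t."""
    return sigma + rho * t


def conforms(arrivals, sigma, rho):
    """Check that traffic in every interval [s, t) stays within alpha(t - s).

    arrivals[i] is the amount of traffic (bits) injected in cycle i.
    """
    n = len(arrivals)
    # cumulative[i] = total traffic injected in cycles 0 .. i-1
    cumulative = [0.0]
    for a in arrivals:
        cumulative.append(cumulative[-1] + a)
    for s in range(n):
        for t in range(s + 1, n + 1):
            # Small tolerance guards against floating-point rounding.
            if cumulative[t] - cumulative[s] > alpha(t - s, sigma, rho) + 1e-9:
                return False
    return True


# Hypothetical trace: a 6-bit burst followed by 1 bit every 5 cycles.
trace = [6] + [0, 0, 0, 0, 1] * 7
print(conforms(trace, sigma=6.6, rho=0.2))
print(conforms(trace, sigma=1.0, rho=0.2))
```

The same trace passes the looser curve but fails the tighter one, because the initial burst exceeds what σ = 1 allows.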

SLIDE 7

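Closed form results

Assume:

  • F: a flow with linear arrival curve $\alpha(t) = \sigma + \rho t$
  • S: a latency-rate server with service curve $\beta(t) = R\,(t - T)^{+}$

Then:

  • The delay bound is $D = T + \sigma / R$
  • The backlog bound is $B = \sigma + \rho T$

[Figure: two plots of V versus t, each showing α(t) and β(t) with σ, ρ, R and T marked; the first marks the delay bound D (horizontal deviation), the second the backlog bound B (vertical deviation).]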
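The two closed-form bounds can be evaluated directly. The sketch below is a minimal illustration, not the authors' tooling: the function names are hypothetical, and the server parameters R and T are assumed values chosen only to produce numbers. The check that ρ does not exceed R reflects the standard network-calculus condition for the bounds to be finite.

```python
def delay_bound(sigma, rho, R, T):
    """D = T + sigma / R for a (sigma, rho) flow at a latency-rate server (R, T)."""
    if rho > R:
        raise ValueError("bounds are finite only when rho <= R")
    return T + sigma / R


def backlog_bound(sigma, rho, R, T):
    """B = sigma + rho * T under the same assumptions."""
    if rho > R:
        raise ValueError("bounds are finite only when rho <= R")
    return sigma + rho * T


# sigma and rho come from the arrival-curve example; R and T are assumed values.
sigma, rho = 6.6, 0.2
R, T = 0.5, 4.0  # hypothetical service rate (bits/cycle) and latency (cycles)
D = delay_bound(sigma, rho, R, T)
B = backlog_bound(sigma, rho, R, T)
print(f"D = {D:.1f} cycles, B = {B:.1f} bits")
```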

SLIDE 8

Why does regulation help?

  • It reduces the traffic burstiness
  • This in turn reduces contention and buffering requirements in the interconnect.
  • Example
      • Flow without regulation (σ = 6.6, ρ = 0.2)
      • Flow with strongest regulation (σ = 1, ρ = 0.2)
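As a worked instance of this example, assume the flow crosses a latency-rate server with rate R and latency T (both left symbolic, since the slides give no values). The closed-form bounds $D = T + \sigma/R$ and $B = \sigma + \rho T$ then shrink as follows when σ drops from 6.6 to 1 at the same ρ = 0.2:

```latex
\Delta D = \frac{6.6 - 1}{R} = \frac{5.6}{R} \ \text{cycles}, \qquad
\Delta B = (6.6 + 0.2\,T) - (1 + 0.2\,T) = 5.6 \ \text{bits}
```

The rate term cancels, so the saving comes entirely from the reduced burstiness.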

SLIDE 9

Online flow characterization

  • Purpose: characterize a flow's (σ, ρ) values
  • How: through a sliding window mechanism
      • Calculate the (σ, ρ) values of the previous and current windows
      • Predict the next-window (σ, ρ) values
      • The (σ, ρ) values are updated window by window
      • The sampling window slides with overlap, ensuring continuity of the predicted values

SLIDE 10

Online flow characterization

  • Sampling window: 750
  • Prediction window: 250

SLIDE 11

Sliding window

(σ, ρ) updates

  • Sampling window: 750
  • Prediction window: 250

SLIDE 12

Sliding window

(σ, ρ) updates

Sampling window: $L_{sw} = L_w$
Prediction window: $L_{pw} = L_w / N$
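The relation between the two window lengths can be sketched as a schedule of overlapping sampling windows. This is a minimal illustration under stated assumptions: the function name is hypothetical, and it assumes that the prediction made at the end of each sampling window applies to the following prediction window of length Lpw. The values 750 and 250 are the ones annotated on the slides, which gives N = 3.

```python
def window_schedule(l_sw, n_overlap, horizon):
    """Return (start, end, valid_from, valid_to) for each sampling window.

    Windows have length l_sw and start every l_pw = l_sw / n_overlap cycles,
    so consecutive windows overlap by l_sw - l_pw cycles. The prediction made
    at the end of a window is assumed to apply to the next l_pw cycles.
    """
    if l_sw % n_overlap != 0:
        raise ValueError("l_sw must be divisible by n_overlap")
    l_pw = l_sw // n_overlap
    schedule = []
    start = 0
    while start + l_sw <= horizon:
        end = start + l_sw
        schedule.append((start, end, end, end + l_pw))
        start += l_pw
    return schedule


for window in window_schedule(l_sw=750, n_overlap=3, horizon=1500):
    print(window)
```

With N = 3, each cycle is covered by up to three sampling windows at once, which is why overlapping windows imply parallel characterization work.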

  • Sampling window: 750
  • Prediction window: 250

SLIDE 13

Sliding window

(σ, ρ) updates

  • Sampling window: 750
  • Prediction window: 250

SLIDE 14

Sliding window

(σ, ρ) updates

  • Sampling window: 750
  • Prediction window: 250

SLIDE 15

Rate ρ characterization

  • Characterize:

    $$\rho = \frac{f(L_{sw})}{L_{sw}}$$

  • Predict:

    $$\hat{\rho}_{n+1} = \rho_n + (\rho_n - \rho_{n-1})$$

      • Base value + offset value
      • Use history information
      • Exploit the continuity brought by the sliding window mechanism to avoid abrupt changes
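The characterization and prediction steps for ρ amount to a division and a linear extrapolation. The sketch below assumes that f(t) is the cumulative traffic counted from the start of the sampling window; the function names and the traffic counts are hypothetical.

```python
def characterize_rate(f_end, l_sw):
    """rho = f(L_sw) / L_sw: cumulative traffic at window end over window length."""
    return f_end / l_sw


def predict_next(current, previous):
    """Base value plus offset: x_hat[n+1] = x[n] + (x[n] - x[n-1])."""
    return current + (current - previous)


# Hypothetical cumulative traffic (bits) at the end of two consecutive windows.
l_sw = 750
rho_prev = characterize_rate(120, l_sw)   # window n-1
rho_curr = characterize_rate(150, l_sw)   # window n
rho_next = predict_next(rho_curr, rho_prev)
print(rho_prev, rho_curr, rho_next)
```

Because the predictor extrapolates the last change, a sharp drop between two windows could yield a negative prediction; whether and how the hardware clamps such values is not stated on the slides.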

SLIDE 16

Burstiness σ characterization

  • Characterize:
      • Find the critical instant, $t_c$, to calculate a σ bound per window

    $$\sigma = f(t_c) - \rho \cdot t_c = f(t_c) - \frac{f(L_{sw})}{L_{sw}} \cdot t_c$$

  • Predict:

    $$\hat{\sigma}_{n+1} = \sigma_n + (\sigma_n - \sigma_{n-1})$$
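A per-window σ can be computed from the sampled (t, f(t)) pairs once ρ is known. The slides do not define how the critical instant is found; the sketch below assumes it is the sample that maximizes f(t) − ρ·t, which gives the smallest σ that still bounds every sample in the window. The function name and the sample data are hypothetical.

```python
def characterize_sigma(samples, l_sw):
    """Return (sigma, t_c, rho) for one sampling window.

    samples: list of (t, f_t) pairs, with f_t the cumulative traffic at time t
    measured from the window start; the last pair must be (l_sw, f(l_sw)).
    """
    t_last, f_last = samples[-1]
    if t_last != l_sw:
        raise ValueError("last sample must be taken at the window end")
    rho = f_last / l_sw
    # Assumption: the critical instant maximizes f(t) - rho * t.
    t_c, f_c = max(samples, key=lambda s: s[1] - rho * s[0])
    sigma = f_c - rho * t_c
    return sigma, t_c, rho


# Hypothetical window of 40 cycles: an early burst, then a steady trickle.
samples = [(0, 0), (5, 6), (10, 6), (20, 7), (30, 8), (40, 8)]
sigma, t_c, rho = characterize_sigma(samples, l_sw=40)
print(sigma, t_c, rho)
```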

SLIDE 17

Characterizer in hardware

  • Main components: Sampling + Characterize + Predict
      • Sampling: (t, f(t))
      • Characterize: for the current profile (σ, ρ)
      • Predict: for the regulator parameters
  • Delay
      • Release the resets with an interval of Lpw
      • Overlapping execution => overlapping windows
  • MUX
      • Select results and feed them into "Predict"

2 GHz, 12 K NAND gates (45 nm)

SLIDE 18

Dynamic regulator

  • Leaky-bucket regulation mechanism
  • The incoming flow is served only when a token is available.
  • Token generation follows a linear curve.
  • The regulator's (σ, ρ) parameters are fed by the characterizer.

1.4 GHz, 2.2 K NAND gates (45 nm)

[Figure: leaky-bucket regulator. Labels: input flow, B, (σ, ρ), σ, token rate ρ, server (1 unit of data per token), regulated flow.]
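The regulation mechanism described on this slide can be sketched in software as a token bucket that releases one unit of data per token. This is a behavioural illustration only, not the RTL design: the class and method names are hypothetical, and starting with a full bucket is an assumption. `Fraction` is used for the token rate so that the token count stays exact.

```python
from fractions import Fraction


class TokenBucketRegulator:
    """Leaky-bucket regulator: one unit of data is released per token."""

    def __init__(self, sigma, rho):
        self.sigma = sigma      # bucket depth (maximum stored tokens)
        self.rho = rho          # tokens generated per cycle
        self.tokens = sigma     # assumption: start with a full bucket
        self.queue = 0          # data units waiting in the regulator

    def set_params(self, sigma, rho):
        """Reconfigure at run time, e.g. with values from the characterizer."""
        self.sigma, self.rho = sigma, rho
        self.tokens = min(self.tokens, sigma)

    def step(self, arrivals):
        """Advance one cycle; return the number of data units released."""
        self.queue += arrivals
        released = min(self.queue, int(self.tokens))
        self.queue -= released
        self.tokens -= released
        # Generate tokens for the next cycle, capped at the bucket depth.
        self.tokens = min(self.sigma, self.tokens + self.rho)
        return released


reg = TokenBucketRegulator(sigma=1, rho=Fraction(1, 5))
burst = [6] + [0] * 29           # hypothetical 6-unit burst at cycle 0
out = [reg.step(a) for a in burst]
print(out)
```

With σ = 1 and ρ = 0.2, the 6-unit burst is spread out to one unit every five cycles, which is the strongest regulation case from the earlier example.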

SLIDE 19

Experiments

  • Experiment 1: Fidelity of the sliding-window-based online flow characterization
  • Experiment 2: Effect of dynamic flow regulation vs. static regulation vs. no regulation

SLIDE 20

Experiment 1: Fidelity of characterization

  • Build a model of the online characterizer in Matlab
  • Use a two-state (on/off) MMP (Markov Modulated Process) as the traffic source
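A two-state on/off source of this kind can be approximated with a small generator. The sketch below is written in Python rather than Matlab and is not the authors' model: the function name, the per-cycle transition probabilities and the fixed emission rate in the ON state are all assumptions made for illustration.

```python
import random


def on_off_source(n_cycles, p_on_to_off, p_off_to_on, rate_on=1, seed=0):
    """Generate per-cycle arrivals from a two-state Markov-modulated source.

    In the ON state the source emits rate_on units per cycle; in the OFF
    state it emits nothing. State transitions are drawn once per cycle.
    """
    rng = random.Random(seed)
    on = False
    arrivals = []
    for _ in range(n_cycles):
        arrivals.append(rate_on if on else 0)
        if on:
            if rng.random() < p_on_to_off:
                on = False
        elif rng.random() < p_off_to_on:
            on = True
    return arrivals


# Hypothetical transition probabilities; the slides do not give the values used.
trace = on_off_source(n_cycles=8192, p_on_to_off=0.1, p_off_to_on=0.025)
print(sum(trace) / len(trace))
```

The long-run fraction of ON cycles is p_off_to_on / (p_on_to_off + p_off_to_on), so these assumed values target an average rate near 0.2.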

SLIDE 21

Effectiveness

  • Sampling window: 8192 cycles; prediction window: 2048 cycles.
  • Compared to static characterization, dynamic characterization closely reflects the traffic dynamics.

SLIDE 22

Window overlapping impact

  • The Y axis gives the violation ratio (the share of occasions when the real traffic surpasses the projected bound).
  • A performance/cost tradeoff: more overlapping gives a lower violation ratio but a higher implementation cost.
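The violation ratio can be computed by comparing measured values against the projected bound. The slides do not say exactly what counts as an occasion; the sketch below assumes one comparison per prediction window, and the function name and data are hypothetical.

```python
def violation_ratio(actual, predicted):
    """Fraction of prediction windows where the measured value exceeds the bound.

    actual[i] and predicted[i] are the measured and predicted values (for
    example sigma or rho) for prediction window i.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    violations = sum(1 for a, p in zip(actual, predicted) if a > p)
    return violations / len(actual)


# Hypothetical measured and predicted burstiness values for five windows.
measured = [4.0, 5.5, 6.0, 3.0, 7.2]
projected = [5.0, 5.0, 6.5, 4.0, 7.0]
print(violation_ratio(measured, projected))
```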

SLIDE 23

Experiment 2: Effect of dynamic regulation

  • Use RTL models for the characterizers, regulators and the network
  • The network is a deflection network, as it is more challenging to control
  • Use both synthetic traffic and Splash2 benchmark traces

SLIDE 24

Experimental setup

  • 56 masters, 8 slaves.
  • Measure regulation delay and network delay.

SLIDE 25

Experimental configuration

  • Three configurations:
      • No regulation: the characterizer is disabled and the regulator provides a bypass.
      • Static regulation: regulators are configured once with offline-profiled (σ, ρ) values.
      • Dynamic regulation: characterizers are enabled and regulators are dynamically configured.

SLIDE 26

Synthetic traffic

  • 56 masters inject on-off traffic to 8 slaves with equal probability, creating a hot-spot traffic pattern that mimics memory access scenarios.
  • Each master generates 8 flows, each targeting one slave. The 8 flows from the same master are treated as 1 aggregate.

SLIDE 27

Maximum packet delay

  • Dynamic regulation outperforms static regulation for 34 (61%) of the 56 aggregates, with maximum and average reductions of 452 cycles (16%) and 146.8 cycles (5.8%), respectively.
  • Dynamic regulation outperforms no regulation for 46 (82%) of the 56 aggregates, with maximum and average improvements of 435 cycles (17.4%) and 167.5 cycles (6.3%), respectively.

SLIDE 28

Average packet delay

  • Dynamic regulation outperforms static regulation for all 56 aggregates, with maximum and average reductions of 186 cycles (13.8%) and 108.6 cycles (14.5%), respectively.
  • Dynamic regulation outperforms no regulation for 45 (80%) of the 56 aggregates, with maximum and average improvements of 332.8 cycles (54.6%) and 147.8 cycles (17.7%), respectively.

SLIDE 29

Splash2 benchmark traces

  • Full-system simulator SIMICS together with GEMS (for the memory system).
  • According to the figure, we configured a CMP system with 56 cores (masters) and 8 slaves.
  • Each core has L1 I/D caches: 64 KB, 4-way set-associative; L2 cache: 256 KB, 4-way set-associative, 64-byte lines.
  • Total off-chip memory size is 4 GB, with each memory being 500 MB (4G/8).
  • Directory-based MOESI protocol.
  • The configured CMP system runs the Solaris 9 OS.
  • After being compiled, the benchmark programs ran on the OS and traces were recorded.

SLIDE 30

Splash2 benchmark traces

  • Compared to static regulation, the improvement in overall average packet delay ranges from 12 to 90 cycles, or from 10% to 26%.
  • Compared to no regulation, the improvement ranges from 53 to 190 cycles, or from 22% to 41%.

SLIDE 31

Conclusion

  • Online traffic profiling through a sliding window shows good fidelity and enables an efficient hardware implementation.
  • Integrating the online characterization into flow regulation allows the regulation strength to be adjusted dynamically and appropriately.
  • Compared to static regulation and no regulation, dynamic regulation is more effective at improving maximum and average packet delay.

SLIDE 32

When is delay reduced?

  • Delay reduction of dynamic vs. static regulation for FFT
  • Future work: include network status in the control loop.

SLIDE 33

Acknowledgements

Thanks for your attention!
