Dynamic Flow Regulation for IP Integration on Network-on-Chip
Zhonghai Lu and Yi Wang
Dept. of Electronic Systems
KTH Royal Institute of Technology Stockholm, Sweden
6th Symposium on NoCS, Denmark May 9-11, 2012
Agenda
The IP integration problem
Why flow regulation?
Online flow characterization
Dynamic regulation
Experiments and results
Conclusion and future work
Design of IPs
Separate concerns, e.g. in computation and
A divide-and-conquer approach to manage complexity; by IP vendors
Integration of IPs
via a common interface (AHB, AXI, etc.); by SoC integrators
Separating concerns helps to manage complexity and
Can we control the performance?
Do not inject traffic as soon as possible
As-soon-as-possible traffic injection creates congestion
Disciplined traffic helps to alleviate network contention
A formal foundation: network calculus
Abstract flow with arrival curve
Abstract server with service curve
Can be viewed as a proactive (vs. reactive)
You have the horse. You have the rein!
An arrival curve α(t) provides an upper bound on
A linear arrival curve has the form $\alpha(t) = \sigma + \rho t$, for example $\alpha(t) = 6.6 + 0.2t$
[Figure: arrival curve with σ = 6.6 and ρ = 0.2; axes: V (bits) versus t (cycle); markers at s=0 and s=38]
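The upper-bound property of a linear arrival curve can be checked on a recorded trace. The following is a minimal sketch, not taken from the slides: the function name `conforms`, the per-cycle indexing of the cumulative trace, and the example trace are all assumptions made for illustration.

```python
def conforms(cumulative, sigma, rho):
    """Return True if f(t) - f(s) <= sigma + rho * (t - s) for all s <= t.

    cumulative[t] is the total traffic (bits) injected up to cycle t
    (hypothetical trace format, one entry per cycle).
    """
    lowest = float("inf")  # smallest value of f(s) - rho * s seen so far
    for t, f in enumerate(cumulative):
        g = f - rho * t
        lowest = min(lowest, g)
        if g - lowest > sigma + 1e-9:  # small tolerance for float rounding
            return False
    return True


# Hypothetical trace: one bit per cycle for 8 cycles, then idle.
trace = list(range(9)) + [8] * 4
ok_loose = conforms(trace, sigma=6.6, rho=0.2)  # burst fits under the curve
ok_tight = conforms(trace, sigma=1.0, rho=0.2)  # burst exceeds the curve
```

The check runs in one pass because the constraint is equivalent to bounding the difference between f(t) − ρt and its running minimum by σ.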
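S: Latency-rate server, $\beta(t) = R(t - T)^{+}$
Arrival curve: $\alpha(t) = \sigma + \rho t$
The delay bound is $D = T + \sigma / R$
The backlog bound is $B = \sigma + \rho T$
[Figures: $\alpha(t)$ and $\beta(t)$ plotted as V versus t; one panel marks the delay bound D, the other the backlog bound B]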
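As a worked example, the two bounds for a (σ, ρ) flow at a latency-rate server with rate R and latency T, D = T + σ/R and B = σ + ρT, can be evaluated directly. This is a small sketch: the function name `lr_bounds` and the server parameters R = 1.0 and T = 4.0 are hypothetical, while the (σ, ρ) pairs are the ones used on the slides.

```python
def lr_bounds(sigma, rho, rate, latency):
    """Delay and backlog bounds of a (sigma, rho) flow at a latency-rate server.

    Returns (D, B) with D = T + sigma / R and B = sigma + rho * T.
    The bounds are finite only when the flow rate does not exceed the
    service rate, so that case is rejected.
    """
    if rho > rate:
        raise ValueError("bounds are finite only when rho <= R")
    delay = latency + sigma / rate
    backlog = sigma + rho * latency
    return delay, backlog


# Hypothetical server: R = 1.0 unit/cycle, T = 4.0 cycles.
d_unregulated, b_unregulated = lr_bounds(6.6, 0.2, rate=1.0, latency=4.0)
d_regulated, b_regulated = lr_bounds(1.0, 0.2, rate=1.0, latency=4.0)
```

Under these assumed server parameters, lowering σ from 6.6 to 1 reduces both bounds, which is the effect regulation aims for.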
Reduce the traffic burstiness
It in turn reduces contention and buffering
Example
Flow without regulation (σ=6.6, ρ=0.2)
Flow with strongest regulation (σ=1, ρ=0.2)
Purpose: Characterize flow’s (σ, ρ) values
How: through a sliding window mechanism
Calculate previous-window, current-window (σ, ρ) values
Predict next-window (σ, ρ) values
The (σ, ρ) values are updated window by window
The sampling window slides with overlapping,
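The sliding-window idea can be sketched in software. The code below is an illustration under stated assumptions, not the authors' hardware algorithm: it assumes ρ is the average rate over one sampling window and σ is the smallest burst for which the window's traffic stays under σ + ρt. The names `characterize`, `sliding_profiles`, `l_sw` and `n_overlap` are hypothetical.

```python
def characterize(arrivals):
    """Return (sigma, rho) for one window of per-cycle arrival counts.

    Assumption: rho is the window's average rate; sigma is the largest
    excess of any sub-interval's traffic over rho times its length.
    """
    rho = sum(arrivals) / len(arrivals)
    sigma = 0.0
    cumulative = 0.0
    lowest = 0.0  # running minimum of f(s) - rho * s, with f(0) = 0
    for t, a in enumerate(arrivals, start=1):
        cumulative += a
        g = cumulative - rho * t
        sigma = max(sigma, g - lowest)
        lowest = min(lowest, g)
    return sigma, rho


def sliding_profiles(arrivals, l_sw, n_overlap):
    """Characterize overlapping windows of length l_sw, sliding by l_sw // n_overlap."""
    step = l_sw // n_overlap
    profiles = []
    for start in range(0, len(arrivals) - l_sw + 1, step):
        sigma, rho = characterize(arrivals[start:start + l_sw])
        profiles.append((start, sigma, rho))
    return profiles


# Hypothetical on-off trace: 4 busy cycles followed by 12 idle cycles, repeated.
trace = ([1] * 4 + [0] * 12) * 4
profiles = sliding_profiles(trace, l_sw=16, n_overlap=4)
```

With a higher overlap factor the profile is refreshed more often, at the cost of running more windows concurrently.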
[Figure: (σ, ρ) updates]
Sampling window: Lsw = Lw
Prediction window: Lpw = Lw/N
Characterize:
Predict: base value + offset value
Use history information
exploit the continuity brought by the sliding
[Equations: characterization and prediction formulas indexed by windows n−1, n and n+1, with subscript sw]
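The slide's prediction formula did not survive extraction, so the following is only one plausible reading of "base value + offset value": linear extrapolation, where the base is the current-window value and the offset is its change since the previous window. The function `predict_next` and the clamping at a floor are assumptions, not the authors' formula.

```python
def predict_next(previous, current, floor=0.0):
    """Predict the next-window value by linear extrapolation.

    Assumption: base = current-window value, offset = current - previous.
    The result is clamped at `floor` because a rate or burst cannot be negative.
    """
    return max(floor, current + (current - previous))


rising = predict_next(previous=0.20, current=0.25)   # trend continues upward
falling = predict_next(previous=0.30, current=0.10)  # clamped at the floor
```

The same helper could be applied to ρ and σ separately, since both are updated window by window.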
Characterize:
Critical instant to calculate a σ bound per
Predict:
[Equations: σ characterization and prediction formulas with subscripts c and sw]
Main components:
Sampling: sample (t, f(t))
Characterize: characterize for current profile (σ, ρ)
Predict: predict for regulator parameter
Delay: release the resets with interval of Lpw
Overlapping execution =>
MUX: select results and feed them into “Predict”
2 GHz, 12K NAND gates (45 nm)
Leaky-bucket
Incoming flow is
Token generate
Regulator’s (σ, ρ)
1.4 GHz, 2.2K NAND gates (45 nm)
[Figure: leaky-bucket regulator with input flow, bucket B of size σ, token rate ρ, server (1 unit data per token), and regulated flow]
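A cycle-level software model helps to see what the leaky-bucket regulator does to a burst. This is a minimal sketch, not the RTL design from the slides: it assumes the bucket starts full, one unit of data consumes one token, and tokens are refilled at the end of each cycle. The function `regulate` is hypothetical, and `Fraction` is used only to keep token arithmetic exact.

```python
from fractions import Fraction


def regulate(arrivals, sigma, rho):
    """Shape per-cycle arrivals with a token bucket of depth sigma and rate rho.

    Returns the number of units released in each cycle. Units that find
    no whole token wait in the regulator queue (the backlog).
    """
    sigma, rho = Fraction(sigma), Fraction(rho)
    tokens = sigma  # assumption: the bucket starts full
    backlog = 0
    departures = []
    for arrived in arrivals:
        backlog += arrived
        whole_tokens = tokens.numerator // tokens.denominator
        sent = min(backlog, whole_tokens)  # 1 unit of data per token
        tokens -= sent
        backlog -= sent
        departures.append(sent)
        tokens = min(sigma, tokens + rho)  # refill, capped at bucket depth
    return departures


# A burst of 8 units under the strongest regulation (sigma = 1, rho = 0.2).
burst = [1] * 8 + [0] * 32
shaped = regulate(burst, sigma=1, rho=Fraction(1, 5))
release_cycles = [t for t, sent in enumerate(shaped) if sent]
```

In this run the burst is spread out to one unit every five cycles, which trades regulation delay for lower burstiness in the network.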
Experiment 1: Fidelity of the sliding window
Experiment 2: Effect of dynamic flow
Build a model for the online characterizer in
Use a two-state (on/off) MMP (Markov
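A two-state on/off Markov-modulated source can be sketched as follows. The transition probabilities, the one-unit-per-ON-cycle emission, the initial OFF state and the function name `on_off_traffic` are all assumptions for illustration; the slides do not give the model parameters.

```python
import random


def on_off_traffic(n_cycles, p_on_to_off, p_off_to_on, seed=0):
    """Generate per-cycle arrivals from a two-state (on/off) Markov chain.

    Assumptions: the source starts OFF, emits 1 unit per cycle while ON
    and nothing while OFF, and changes state at the end of each cycle.
    """
    rng = random.Random(seed)  # seeded for reproducible traces
    on = False
    arrivals = []
    for _ in range(n_cycles):
        arrivals.append(1 if on else 0)
        if on:
            on = rng.random() >= p_on_to_off  # stay ON unless the chain leaves
        else:
            on = rng.random() < p_off_to_on   # switch ON with this probability
    return arrivals


trace_a = on_off_traffic(1000, p_on_to_off=0.1, p_off_to_on=0.02, seed=1)
trace_b = on_off_traffic(1000, p_on_to_off=0.1, p_off_to_on=0.02, seed=1)
```

For such a chain the long-run fraction of ON cycles is p_off_to_on / (p_on_to_off + p_off_to_on), so the two probabilities set both the average rate and the burst length.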
Sampling window 8192 cycles, prediction window
Compared to static characterization, dynamic
The
A performance/cost tradeoff: Higher overlapping,
Use RTL models for characterizers, regulators
The network is a deflection network as it is
Use both synthetic traffic and Splash2
56 masters, 8 slaves. Measure regulation delay and network delay.
Three configurations:
No regulation: Characterizer is disabled, regulator
Static regulation: Regulators are configured once
Dynamic regulation: Characterizers are enabled.
56 masters inject the on-off traffic to 8 slaves with
Each master generates 8 flows, each targeting a slave.
Dynamic regulation outperforms static regulation for 34 (61%) of the 56 aggregates, with maximum and average reductions of 452 cycles (16%) and 146.8 cycles (5.8%), respectively.
Dynamic regulation outperforms no-regulation for 46 (82%) of the 56
and 167.5 cycles (6.3%).
Dynamic regulation outperforms static regulation for all 56 aggregates, with maximum and average reductions of 186 cycles (13.8%) and 108.6 cycles (14.5%), respectively.
Dynamic regulation outperforms no-regulation for 45 (80%) of the 56
and 147.8 cycles (17.7%), resp.
Full-system simulator SIMICS together with GEMS (for the
According to the figure, we configured a CMP system with 56
Each core has L1 I/D Caches: 64KB, 4 way set-associative; L2
Total off-chip memory size is 4 GB with each memory being
Directory-based MOESI protocol.
The configured CMP system runs Solaris 9 OS.
After being compiled, the benchmark programs ran on the OS
Compared to static regulation, the improvement in overall average packet delay ranges from 12 to 90 cycles (10% to 26%).
Compared to no regulation, it ranges from 53 to 190 cycles (22% to 41%).
Online traffic profiling through a sliding window
Integrating the online characterization into flow
Compared to static and no regulation, dynamic
[Figure: Delay reduction of dynamic vs. static regulation for FFT]
Future work: include network status into the control loop.