2 Amateur Photography - - PowerPoint PPT Presentation
Synthetic Full System Traffic Models Capturing Cache Coherence Behaviour
Mario Badr | Natalie Enright Jerger
2
Amateur Photography
3
http://sevennine.net/archives/2009/10/01/torontos-skyline-at-night/
Photography 101
4
Hundreds of Pictures
5
SynFull
6
Pictures for Facebook
7
What Does SynFull Do?
- Model real application traffic to the NoC
- Generate realistic traffic synthetically for the NoC
- Iterate over several NoC designs quickly
8
Tool Available for Download
SynFull’s Goals
- Generic – Current and future applications
– 16 different benchmarks
- Accurate – Comparable performance metrics
– 10.5% error
- Fast – Faster than full system and traces
– 52x speed up
9
NoC Simulation Methodologies
10
- Full System
- Traces
- Traffic Patterns
Full System Simulation
11
NoC Simulator Full System Simulator
NoC
Processor Cache Disk Other Components Application
Packets Sent Packets Arrived
Feedback! Accurate But Slow
Trace Simulation
12
NoC Simulator Trace Simulator Trace
NoC B
Packets Sent NoC A
Processor Cache Disk Other
Application
Faster But Less Accurate
Traffic Patterns
13
NoC Simulator
NoC
Synthetic Traffic Driver
Application
Traffic Pattern Uniform Random Bit Complement Bit Reverse Bit Rotation Shuffle Transpose Tornado Neighbour
Very Fast But Inaccurate
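A few of the traffic patterns listed above can be sketched as pure functions from a source node id to a destination node id. This is a minimal illustration using the common textbook bit-permutation definitions, not code from SynFull; the function names and the `bits` parameter (log2 of the node count) are assumptions made for the example.

```python
# Hypothetical helpers: each maps a source node id to a destination node id
# for a network with 2**bits nodes.

def bit_complement(src: int, bits: int) -> int:
    """Destination is the bitwise complement of the source id."""
    return ~src & ((1 << bits) - 1)

def bit_reverse(src: int, bits: int) -> int:
    """Destination is the source id with its bit order reversed."""
    dst = 0
    for i in range(bits):
        if src & (1 << i):
            dst |= 1 << (bits - 1 - i)
    return dst

def transpose(src: int, bits: int) -> int:
    """Swap the upper and lower halves of the id (bits must be even)."""
    half = bits // 2
    mask = (1 << half) - 1
    return ((src & mask) << half) | (src >> half)

def shuffle(src: int, bits: int) -> int:
    """Rotate the id left by one bit position."""
    mask = (1 << bits) - 1
    return ((src << 1) & mask) | (src >> (bits - 1))

if __name__ == "__main__":
    # 16 nodes -> 4 address bits
    for name, fn in [("complement", bit_complement), ("reverse", bit_reverse),
                     ("transpose", transpose), ("shuffle", shuffle)]:
        print(name, [fn(s, 4) for s in range(16)])
```

Each destination depends only on the source id, so these patterns carry no notion of application phases or message dependencies.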
The Opportunity
14
Speed Accuracy
The Opportunity
15
Speed Accuracy
SynFull
Achieving the Goals
- Synthetic Cache Coherence
– Dependent Messages
– Enable Research
- Time-Varying Behaviour
– Short and Long Bursts of Traffic
- Convergence
– Simulation length?
16
Accuracy Speed
Motivating Cache Coherence
Shuffle Fast Fourier Transform
17
Cache Coherence Affects Traffic Behaviour
Capturing Coherence Traffic
- Example
– MOESI Protocol
– Can be adapted
18
1 2 3 4 5 6 7 8 9
Capturing Coherence Traffic
- Initiate Transaction
- Store Miss
– Source
19
1 2 3 4 5 6 7 8 9
Store Miss 7
Capturing Coherence Traffic
- Store Miss
– Source
– Destination
20
1 2 3 4 5 6 8 9
Directory 3
7
Capturing Coherence Traffic
- Store Miss
- Forwarded Request
– Destination
21
1 2 3 4 5 6 8 9
Owner 1
7
Capturing Coherence Traffic
- Store Miss
- Forwarded Request
- Invalidations
– Quantity
– Destinations
22
1 2 3 4 5 6 8 9
Invalidate 2, 6
7
Capturing Coherence Traffic
- Store Miss
- Forwarded Request
- Invalidations
- Acknowledgements
23
1 2 3 4 5 6 8 9
ACKs 2, 6
7
Capturing Coherence Traffic
- Store Miss
- Forwarded Request
- Invalidations
- Acknowledgements
- Data Response
24
1 2 3 4 5 6 8 9
Data to 7
7
Capturing Coherence Traffic
- Store Miss
- Forwarded Request
- Invalidations
- Acknowledgements
- Data Response
- Unblock
25
1 2 3 4 5 6 8 9
Transaction Complete
7
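The store-miss walkthrough above can be written out as an ordered list of dependent messages. This is a minimal sketch, not SynFull's implementation: the `Message` type and function name are hypothetical, and the choice of sender for the invalidations, ACKs, data and unblock messages is an assumption, since the slides give only the node numbers.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Message:
    kind: str
    src: int
    dst: int

def store_miss_messages(requester: int, directory: int,
                        owner: int, sharers: List[int]) -> List[Message]:
    """Expand one store miss into its dependent coherence messages.

    Assumed flow: the directory forwards the request and sends the
    invalidations, sharers acknowledge to the requester, the owner
    supplies the data, and the requester unblocks the directory.
    """
    msgs = [Message("StoreMiss", requester, directory)]
    msgs.append(Message("ForwardedRequest", directory, owner))
    msgs += [Message("Invalidate", directory, s) for s in sharers]
    msgs += [Message("Ack", s, requester) for s in sharers]
    msgs.append(Message("Data", owner, requester))
    msgs.append(Message("Unblock", requester, directory))
    return msgs

if __name__ == "__main__":
    # Node numbers taken from the slides: requester 7, directory 3,
    # owner 1, invalidated sharers 2 and 6.
    for m in store_miss_messages(7, 3, 1, [2, 6]):
        print(f"{m.kind:17s} {m.src} -> {m.dst}")
```

In a synthetic model the directory, the owner and the set of sharers would be drawn from measured probabilities rather than passed in as fixed arguments.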
Time-Varying Behaviour
26
[Figure: execution timeline marked with two barriers]
Initiating Transactions and Sharing Patterns Can Change
Time-Varying Behaviour
27
Applications go through phases
[Figure: Fluidanimate benchmark. x-axis: Time Bin (500,000 cycles per bin); y-axis: Packets Injected; bins labelled High (H) or Low (L)]
Modelling Time-Varying Behaviour
- Create and group phases
– Clustering
- Transition from one phase to another
- Markov Chains
28
Dividing Into Intervals
29
Intervals are a fixed size
Dividing Into Intervals
30
Visually we see: High, Low + High, and Low Intervals
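Dividing a run into fixed-size intervals and grouping similar intervals can be sketched as follows. This is a simplified illustration: it uses packet count as the only feature and a tiny 1-D k-means as a stand-in, since the slides say only "Clustering". The function names and the `interval_size` and `k` parameters are assumptions for the example.

```python
from typing import List

def bin_injections(timestamps: List[int], interval_size: int) -> List[int]:
    """Count packets injected in each fixed-size interval of cycles."""
    if not timestamps:
        return []
    counts = [0] * (max(timestamps) // interval_size + 1)
    for t in timestamps:
        counts[t // interval_size] += 1
    return counts

def cluster_1d(values: List[float], k: int, iterations: int = 20) -> List[int]:
    """Tiny 1-D k-means (a stand-in clustering); one label per value."""
    lo, hi = min(values), max(values)
    if k > 1:
        # Spread the initial centres evenly between the min and max value.
        centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    else:
        centres = [sum(values) / len(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c]))
                  for v in values]
        # Move each centre to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels

if __name__ == "__main__":
    counts = bin_injections([0, 1, 2, 3, 10, 25, 26, 27, 28], interval_size=10)
    print(counts, cluster_1d(counts, k=2))
```

Intervals that share a label are treated as the same phase, so a long run collapses into a short sequence of phase ids.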
Phase Transitions: Markov Chains
31
17% 83% 100% 45% 55%
P[Next State | Current State]
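Once intervals are labelled with phases, moving from one phase to the next amounts to sampling from P[Next State | Current State]. A minimal sketch follows; the transition table in the usage example is made up for illustration and is not the chain shown on the slide.

```python
import random
from typing import Dict, List

Transitions = Dict[str, Dict[str, float]]

def next_phase(current: str, transitions: Transitions,
               rng: random.Random) -> str:
    """Sample the next phase given the current one."""
    states = list(transitions[current])
    weights = [transitions[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

def generate_phases(start: str, transitions: Transitions,
                    steps: int, seed: int = 0) -> List[str]:
    """Walk the Markov chain for a number of steps from a start phase."""
    rng = random.Random(seed)
    sequence = [start]
    for _ in range(steps):
        sequence.append(next_phase(sequence[-1], transitions, rng))
    return sequence

if __name__ == "__main__":
    # Hypothetical probabilities; each row sums to 1.
    chain = {
        "High": {"High": 0.8, "Low": 0.2},
        "Low": {"High": 0.5, "Low": 0.5},
    }
    print(generate_phases("High", chain, steps=10))
```

The generator needs only the current phase and the table, so no application trace has to be replayed.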
32
Coarse Granularity → Average Behaviour
Traffic Comparison
Actual Synthetic
- Macro Level
– 100,000s of Cycles
– Long phases
– Outer-Loops
- Micro Level
– 100s of Cycles
– Short Bursts
– Inner-Loops
- Hierarchical Model
33
Capturing Short Bursts
Modelling Parameters
- Model accuracy affected by:
– Interval Size
– Interval Similarity
– Number of Clusters
34
See Paper for Parameter Sweep & Recommendations
Creating The Models
35
Ideal Network
Processor Cache Disk Other
Application
Ideal Trace
Creating The Models
36
Ideal Network
Processor Cache Disk Other
Application
Ideal Trace SynFull Modelling Parameters Model
Creating The Models
37
Ideal Network
Processor Cache Disk Other
Application
Ideal Trace SynFull Modelling Parameters Model
NoC
Traffic Generator
NoC NoCs
Evaluation Methodology
Network            meshDOR      meshADAP         fbfly
Topology           Mesh         Mesh             Flattened Butterfly
Channel Width      8 bytes      4 bytes          4 bytes
Virtual Channels   2 per port   2 per port       4 per port
Routing            XY           Adaptive YX-XY   UGAL
- 16 Out-of-Order Cores
- MOESI Protocol
- 16 Benchmarks (Splash-2, PARSEC)
- Traces with Dependencies Comparison
38
Packet Latency Error
39
[Figure: Geomean Error Percentage (axis 0% to 100%) for meshDOR, meshADAP and fbfly; series: Trace, Dependency, SynFull]
Lower is Better
No Throttling For Initiating Transactions
Distribution Error
40
[Figure: Geomean of Hellinger Distances (axis 0.00 to 0.20) for meshDOR, meshADAP and fbfly; series: Trace, Dependency, SynFull]
Lower is Better
Captures Congestion
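The distribution error above is reported as a Hellinger distance, which compares two discrete probability distributions and ranges from 0 (identical) to 1 (no overlap). A minimal sketch of the standard formula follows; the histogram values in the usage example are made up, and the helper names are assumptions.

```python
import math
from typing import List, Sequence

def normalise(histogram: Sequence[float]) -> List[float]:
    """Turn raw bin counts into a probability distribution."""
    total = sum(histogram)
    if total <= 0:
        raise ValueError("histogram must have a positive total")
    return [h / total for h in histogram]

def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2)

if __name__ == "__main__":
    # Hypothetical packet-latency histograms over the same bins.
    full_system = normalise([10, 40, 30, 20])
    synthetic = normalise([12, 38, 28, 22])
    print(hellinger(full_system, synthetic))
```

Because it compares whole distributions, this metric reflects latency tails that an average latency figure would hide.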
What About Speed?
41
Markov Probability Matrix,
1 2
What About Speed?
42
Markov Probability Matrix, after a while… converges
State 1: 56%, State 2: 44%
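The convergence of a Markov chain to a steady state can be shown by repeatedly applying the transition matrix to a starting distribution. This is a minimal sketch; the 2x2 matrix in the usage example is hypothetical, chosen only because its steady state matches the 56% / 44% split shown here. It is not the matrix from the slide.

```python
from typing import List

Matrix = List[List[float]]

def step(dist: List[float], matrix: Matrix) -> List[float]:
    """One transition: new[j] = sum over i of dist[i] * matrix[i][j]."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def steady_state(matrix: Matrix, tol: float = 1e-12,
                 max_iters: int = 10_000) -> List[float]:
    """Iterate from a uniform distribution until it stops changing."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(max_iters):
        nxt = step(dist, matrix)
        if max(abs(a - b) for a, b in zip(dist, nxt)) < tol:
            return nxt
        dist = nxt
    raise RuntimeError("distribution did not converge")

if __name__ == "__main__":
    # Hypothetical matrix; rows are current states, columns next states.
    P = [[0.78, 0.22],
         [0.28, 0.72]]
    print([round(p, 4) for p in steady_state(P)])
```

For a 2-state chain with off-diagonal probabilities a and b, the steady state is [b / (a + b), a / (a + b)], which gives [0.56, 0.44] for a = 0.22 and b = 0.28.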
Speed Up
43
[Figure: Speed Up (axis 10 to 60); series: Trace, Dependency, SynFull, SynFull (SS); data labels: 24, 27, 52]
52x Speed Up With 11.7% Error
Conclusion
- Implemented Synthetic Traffic Models that are
– Accurate: 10.5% error
– Fast: Over 50x average speed up
– Generic: SynFull works for many applications
44
QUESTIONS & ANSWERS
Thank you for listening!
45
Try SynFull Out: http://www.eecg.toronto.edu/~enright/items/synfull_download.html
Back Up: Design Space Exploration
46
[Figure: Average Packet Latency (axis 10 to 60) against Buffer Size (Number of Flits: 2, 4, 8, 16); series: Full System, SynFull]
Same Conclusion, Less Time
Back Up: meshDOR Packet Latency
47
[Figure: Avg. Packet Latency (axis 20 to 140); series: Full System, Trace, Dependency, SynFull]
Back Up: fbfly Packet Latency
48
[Figure: Avg. Packet Latency (axis 20 to 140); series: Full System, Trace, Dependency, SynFull]
Back Up: meshADAP Packet Latency
49
[Figure: Avg. Packet Latency (axis 20 to 140); series: Full System, Trace, Dependency, SynFull]
Back Up: Average Throughput
50
[Figure: Geomean Error Percentage (axis 0% to 20%) for meshDOR, meshADAP and fbfly; data labels: 12%, 16%, 12%]
Back Up: Speed Up Per Application
51
[Figure: Average Speed Up per application (axis 20 to 160); series: Trace, Dependency, SynFull, SynFull (SS)]
Averaged Over 3 Runs (Different NoCs)
Back Up: Steady State
Before Simulation
Steady State: 42% 21% 37%
Acceptable +/-: 1.68% 0.84% 1.48%
MSE: 0.000191
During Simulation
Steady State: 42% 21% 37%
Current +/-: ?% ?% ?%
Exit when Current +/- < MSE
Current +/- Depends on State Transitions (RNG) and MSE
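The steady-state exit test can be sketched as a comparison of two mean squared errors. The numbers on this slide are consistent with an acceptable deviation of 4% of each steady-state probability (0.04 x 42% = 1.68%, and so on), whose mean square is about 0.000191. The 4% figure is inferred from those numbers rather than stated, and the function names are assumptions.

```python
from typing import List, Sequence

def mse(errors: Sequence[float]) -> float:
    """Mean of the squared errors."""
    return sum(e * e for e in errors) / len(errors)

def mse_threshold(steady: Sequence[float], rel_tol: float) -> float:
    """Threshold computed before simulation from the acceptable +/-."""
    return mse([p * rel_tol for p in steady])

def should_exit(visits: List[int], steady: Sequence[float],
                threshold: float) -> bool:
    """During simulation: exit once the observed state distribution
    is close enough to the steady state."""
    total = sum(visits)
    if total == 0:
        return False
    current = [v / total for v in visits]
    return mse([c - s for c, s in zip(current, steady)]) < threshold

if __name__ == "__main__":
    steady = [0.42, 0.21, 0.37]
    threshold = mse_threshold(steady, rel_tol=0.04)  # inferred 4% tolerance
    print(round(threshold, 6))
    print(should_exit([100, 0, 0], steady, threshold))
    print(should_exit([420, 210, 370], steady, threshold))
```

How quickly the observed distribution falls under the threshold depends on the random state transitions, which matches the note above.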
52
Back Up: Shuffle NoC
53
1 2 8 4 9 12 3 6 14 7 13 11 15 5 10
Back Up: Shuffle NoC
54
1 2 8 4 9 12 3 6 14 7 13 11 15 5 10
Ring Topology; Max. 2 Hops Needed
Triangle Score
55
Cache Coherence Time Varying Fast
NoC Simulation Methodologies
56
Cache Coherence Time Varying Fast
Full System Trace Traffic Pattern
SynFull
57