Déjà Vu Switching for Multiplane NoCs NOCS’12
Ahmed Abousamra, Rami Melhem, Alex Jones (University of Pittsburgh)
Power efficiency has become a primary concern in the design of CMPs.
The NoC of Intel’s TeraFLOPS processor consumes more than 28% of the chip’s power.
Network messages can be classified into critical and non-critical.
It may be possible to send non-critical messages on a slower plane without hurting performance.
Baseline single-plane NoC, split into a control plane and a data plane:
Control plane
- Carries control and coherence messages: data requests, invalidates, acknowledgments, …
- Operates at the baseline’s voltage & frequency
Data plane
- Carries data messages
- Operates at a lower voltage & frequency to save power
Outline
- Motivation & related work
- Importance of data messages to performance
- The optimization problem
- Proposed solution: Déjà Vu Switching; analysis of the acceptable data plane speed
- Evaluation
- Summary
[Figure: a cache line of 8 data words (1 to 8) is sent as a data message consisting of a header, the critical word first (word 5 in the example), then the non-critical words 1, 2, 3, 4, 6, 7, 8.]
A subsequent miss for another word in the line is a delayed cache hit.
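The critical-word-first ordering in the data message above (5 1 2 3 4 6 7 8) can be sketched as follows. This is only an illustration; the function name `critical_word_first` and the 0-based index are assumptions, not part of the original slides.

```python
def critical_word_first(words, critical_index):
    """Reorder a cache line so the requested (critical) word leads the payload.

    `words` is the cache line in address order; `critical_index` is the
    0-based position of the word that caused the miss (hypothetical API).
    """
    critical = words[critical_index]
    # The remaining (non-critical) words keep their original order.
    rest = words[:critical_index] + words[critical_index + 1:]
    return [critical] + rest


line = [1, 2, 3, 4, 5, 6, 7, 8]          # cache line with 8 data words
payload = critical_word_first(line, 4)   # the miss was for word 5
print(payload)                           # [5, 1, 2, 3, 4, 6, 7, 8]
```

The words after the critical one are the part of the message that may tolerate a slower data plane, as long as delayed cache hits are not delayed too much.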
If delayed cache hits are overly delayed, performance can suffer.
[Figure: percentage of L1 misses that are delayed cache hits (y-axis 0% to 50%) for barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, raytrace, specjbb, water nsquared, water spatial, and the geometric mean.]
How slow can we operate the data plane to maximize energy savings while not impacting performance?
Split the NoC physically into 2 planes: control + data.
On the data plane:
- Use circuit switching to speed up communication.
- Reduce voltage & frequency.
Send a control message to establish the circuit once a cache hit is detected.
Do not block the circuit establishment message: Déjà Vu Switching.
Analyze the acceptable slowdown of the data plane to minimize energy while maintaining performance.
[Figure: request packet and reservation packet travel on the control plane; the data packet travels on the data plane.]
1) Upon a cache miss, a data request is sent.
2) Upon a data hit in the next cache level, the circuit reservation packet is sent.
3) Finally, the data packet is sent to the requester on the reserved circuit.
[Figure: a router with three reserved circuits (East-North, West-North, West-East); reservation packets 1 to 3 on the control plane and data packets 1 to 3 on the data plane.]
Reserving the West-East connection:
[Figure: the reservation queue of the West input port and the reservation queue of the East output port, with entries E and W at the heads of the queues.]
Port names: North, West, South, East, Local
[Figure: three successive snapshots of the reservation queues of the input ports and of the output ports (North, West, South, East, Local), with the head of each queue marked.]
- Each input and output port must independently track the reserved circuits it is part of.
- Any two reservation packets that share part of their paths must traverse all the shared links in the same order.
- Data packets must be injected onto the data plane in the same order their reservation packets are injected onto the control plane.
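A minimal sketch of the per-port reservation queues described above, under stated assumptions: each input port queues the output ports it has been reserved to, each output port queues the input ports reserved to it, and a connection is realized only when both queue heads agree. The class `ReservationRouter` and its methods are hypothetical names for illustration, not the authors' router design.

```python
from collections import deque

PORTS = ("N", "W", "S", "E", "L")  # North, West, South, East, Local


class ReservationRouter:
    """Illustrative model of one data-plane router's reservation queues."""

    def __init__(self):
        # Input port -> queue of output ports, in reservation order.
        self.in_q = {p: deque() for p in PORTS}
        # Output port -> queue of input ports, in reservation order.
        self.out_q = {p: deque() for p in PORTS}

    def reserve(self, in_port, out_port):
        """Record a reservation; it is queued, never blocked."""
        self.in_q[in_port].append(out_port)
        self.out_q[out_port].append(in_port)

    def try_forward(self, in_port):
        """Return the output port for a data packet waiting at in_port,
        or None if its circuit is not yet at the head of both queues."""
        if not self.in_q[in_port]:
            return None
        out_port = self.in_q[in_port][0]
        if self.out_q[out_port][0] != in_port:
            return None  # an earlier circuit still owns this output port
        self.in_q[in_port].popleft()
        self.out_q[out_port].popleft()
        return out_port


router = ReservationRouter()
# Reservations in the order East-North, West-North, West-East.
router.reserve("E", "N")
router.reserve("W", "N")
router.reserve("W", "E")

print(router.try_forward("W"))  # None: North is first reserved for East
print(router.try_forward("E"))  # N
print(router.try_forward("W"))  # N
print(router.try_forward("W"))  # E
```

Because each port only inspects the head of its own queues, the sketch relies on the ordering rules listed above: reservations and their data packets must reach every shared port in the same order.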
[Figure: timeline over cycles 1 to 13. The reservation packet is injected early, at cycle 1; the data packet is injected at cycle 3.]
We use the functional simulator Simics to simulate cache-coherent CMPs of 16 and 64 cores.
We use Orion2 to get power numbers for the interconnect routers.
We evaluate with:
- Synthetic traces: allow varying the network load
- Execution-driven simulation of parallel benchmarks from the SPLASH-2 and PARSEC suites
Interconnect topology: mesh
Baseline NoC:
- Single plane, 16-byte links, packet switched with a 3-cycle router pipeline, clocked at 4 GHz
Evaluated NoC:
- Control plane: 6-byte links, packet switched, 4 GHz
- Data plane: 10-byte links, circuit switched
Control and coherence packets: 1 flit
Data packets:
- Baseline NoC: 5 flits
- Data plane: 7 flits
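The flit counts above can be reproduced with a short calculation. The sketch assumes a 64-byte cache line (8 words of 8 bytes), one header flit per packet-switched data packet, no header flit on the circuit-switched data plane, and one flit per cycle per link; none of these assumptions is stated on the slides, and the names `data_flits` and `serialization_ns` are hypothetical.

```python
import math

CACHE_LINE_BYTES = 64  # assumption: 8 data words x 8 bytes each


def data_flits(link_bytes, header_flits):
    """Flits needed to carry one cache line over a link of the given width."""
    return header_flits + math.ceil(CACHE_LINE_BYTES / link_bytes)


def serialization_ns(flits, freq_ghz):
    """Time to push all flits onto a link, assuming one flit per cycle."""
    return flits / freq_ghz


baseline = data_flits(16, header_flits=1)    # 1 + 64/16 = 5 flits
data_plane = data_flits(10, header_flits=0)  # ceil(64/10) = 7 flits
print(baseline, data_plane)                  # 5 7

# Serialization time on the baseline versus a data plane slowed to 2 GHz.
print(serialization_ns(baseline, 4.0))       # 1.25
print(serialization_ns(data_plane, 2.0))     # 3.5
```

Under these assumptions the narrower, slower data plane takes longer to serialize a cache line, which is the cost that circuit switching and early reservation are meant to offset.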
[Figure: normalized NoC energy consumption and normalized completion time of synthetic traces on a 64-core CMP, at traffic injection rates of 0.01, 0.03, and 0.05 requests/cycle/node. Series: 4 GHz, 4/4 GHz, 4/3 GHz, 4/2.66 GHz, 4/2 GHz. Each request receives a reply data packet.]
[Figure: normalized execution time on a 16-core CMP for barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, raytrace, specjbb, water nsquared, water spatial, and the geometric mean. Series: 4 GHz, 4/4 GHz, 4/3 GHz, 4/2.66 GHz, 4/2 GHz.]
[Figure: normalized NoC energy consumption on a 16-core CMP for barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, raytrace, specjbb, water nsquared, water spatial, and the geometric mean. Series: 4 GHz, 4/4 GHz, 4/3 GHz, 4/2.66 GHz, 4/2 GHz.]
[Figure: normalized NoC energy consumption and normalized execution time on a 64-core CMP for barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, specjbb, water nsquared, water spatial, and the geometric mean. Series: 4 GHz, 4/4 GHz, 4/3 GHz, 4/2.66 GHz, 4/2 GHz.]
[Figure: performance relative to a CMP with a single-plane NoC, on a 16-core CMP (y-axis 90% to 110%), for barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, raytrace, specjbb, water nsquared, water spatial, and the geometric mean. Series: PS 4/4, DV 4/4, PS 4/2.66, PS+CW 4/2.66, DV 4/2.66.]
L. Cheng et al. (ISCA'06) and A. Flores et al. (IEEE Trans. Computers'10): heterogeneous NoC using wires of different latency and power characteristics to improve performance and reduce NoC energy.
- Their proposal requires wide links (75 bytes); performance degrades with narrow links.
Our work differs in:
- Tying the latency of messages to performance
- Using Déjà Vu Switching to compensate for the slower data plane
Problem: saving power in the NoC by reducing the data plane’s power consumption without impacting performance.
- Delayed cache hits are important to performance.
- Operating the data plane in circuit-switched mode allows it to operate at a reduced frequency.
- Déjà Vu Switching allows reservation to proceed when resources are not currently available.
- The constraints governing the speed of the data plane can be estimated analytically.