
Déjà Vu Switching for Multiplane NoCs - Ahmed Abousamra, Rami Melhem, Alex Jones, University of Pittsburgh



  1. Ahmed Abousamra, Rami Melhem, Alex Jones. University of Pittsburgh. Déjà Vu Switching for Multiplane NoCs. NOCS’12

  2. Power efficiency has become a primary concern in the design of chip multiprocessors (CMPs). The NoC of Intel’s TeraFLOPS processor consumes more than 28% of the chip’s power. Network messages can be classified as critical or non-critical, so it may be possible to send non-critical messages on a slower plane without hurting performance.

  3. [Figure: baseline single-plane NoC; control plane; data plane.] Control plane: • Carries control and coherence messages: data requests, invalidates, acknowledgments, … • Operates at the baseline’s voltage & frequency. Data plane: • Carries data messages • Operates at a lower voltage & frequency to save power.

  4. Outline: Motivation & Related work; Importance of data messages to performance; The Optimization Problem; Proposed Solution; Déjà Vu Switching; Analysis of the acceptable data plane speed; Evaluation; Summary

  5. Critical word: a cache line has 8 data words (1 2 3 4 5 6 7 8). The data message carries a header, then the critical word (word 5 in the example), then the non-critical words (1 2 3 4 6 7 8). A subsequent miss for another word in the line → delayed cache hit.

  6. If delayed cache hits are overly delayed, performance can suffer. [Figure: Percentage of L1 misses that are delayed cache hits; y-axis 0% to 50%; benchmarks: barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, raytrace, specjbb, water nsquared, water spatial, geometric mean.]
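The critical-word-first ordering on slide 5 can be illustrated with a small sketch. The function name `critical_word_first` and the zero-based index are assumptions made for this example; the slides only show the resulting word order (5 1 2 3 4 6 7 8), not an implementation.

```python
def critical_word_first(words, critical_index):
    """Return the payload order: the critical word first, then the rest in line order."""
    if not 0 <= critical_index < len(words):
        raise IndexError("critical_index out of range")
    critical = words[critical_index]
    rest = words[:critical_index] + words[critical_index + 1:]
    return [critical] + rest


line = [1, 2, 3, 4, 5, 6, 7, 8]         # word labels as on slide 5
payload = critical_word_first(line, 4)  # word 5 sits at index 4
print(payload)                          # [5, 1, 2, 3, 4, 6, 7, 8]
```

A later miss for, say, word 7 of the same line cannot complete until that word arrives at the tail of the message, which is the delayed cache hit described on slides 5 and 6.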

  7. Outline: Motivation & Related work; Importance of data messages to performance; The Optimization Problem; Proposed Solution; Déjà Vu Switching; Analysis of the acceptable data plane speed; Evaluation; Summary

  8. How slowly can we operate the data plane to maximize energy savings without impacting performance?

  9. Split the NoC physically into 2 planes: control + data. On the data plane: use circuit switching to speed up communication; reduce voltage & frequency. Send a control message to establish the circuit once a cache hit is detected. Do not block the circuit establishment message: Déjà Vu Switching. Analyze the acceptable slowdown of the data plane to minimize energy while maintaining performance.

  10. 1) Upon a cache miss, a data request is sent. [Figure: control plane and data plane; legend: data packet, request packet, reservation packet.]

  11. 2) Upon a data hit in the next cache level, the circuit reservation packet is sent. [Figure: control plane and data plane; legend: data packet, request packet, reservation packet.]

  12. 3) Finally, the data packet is sent to the requester on the reserved circuit. [Figure: control plane and data plane; legend: data packet, request packet, reservation packet.]

  13. [Figure: a router. Control plane: reservation packets 1, 2, 3. Data plane: reserved circuits (West-East, West-North, East-North) and data packets 1, 2, 3.]

  14. Reserving the West-East connection: the reservation queue of the West input port holds E, and the reservation queue of the East output port holds W (both at the head of their queues). Port names: North, West, South, East, Local.

  15. [Figure: reservation queues of the input ports and of the output ports (North, West, South, East, Local), each holding several queued reservations, with the heads of the queues marked.] Port names: North, West, South, East, Local.

  16. [Figure: reservation queues of the input ports and of the output ports (North, West, South, East, Local) with fewer reservations remaining, with the heads of the queues marked.] Port names: North, West, South, East, Local.

  17. [Figure: reservation queues of the input ports and of the output ports (North, West, South, East, Local), same queue contents as on the previous slide, with the heads of the queues marked.] Port names: North, West, South, East, Local.

  18. Each input and output port must independently track the reserved circuits it is part of. Any two reservation packets that share part of their paths must traverse all the shared links in the same order. Data packets must be injected onto the data plane in the same order their reservation packets are injected onto the control plane.
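The per-port bookkeeping described on slides 14 to 18 can be approximated with a toy model. Everything here is an assumption for illustration: the class name `DataPlaneRouter`, its methods, and the rule that a connection is realized when the same reservation sits at the head of both its input-port queue and its output-port queue. The order in which the three connections are reserved is also chosen arbitrarily; the slides do not make it recoverable.

```python
from collections import deque

PORTS = ("N", "W", "S", "E", "L")  # North, West, South, East, Local


class DataPlaneRouter:
    """Toy model of per-port reservation queues (hypothetical names)."""

    def __init__(self):
        # Each input port queues the output ports it is reserved to reach;
        # each output port queues the input ports reserved to feed it.
        self.in_q = {p: deque() for p in PORTS}
        self.out_q = {p: deque() for p in PORTS}

    def reserve(self, in_port, out_port):
        """Record a reservation packet that passed from in_port to out_port."""
        self.in_q[in_port].append(out_port)
        self.out_q[out_port].append(in_port)

    def ready(self):
        """Connections whose reservation is at the head of both queues."""
        return [
            (i, q[0])
            for i, q in self.in_q.items()
            if q and self.out_q[q[0]][0] == i
        ]

    def release(self, in_port, out_port):
        """Remove a connection once its data packet has passed through."""
        if (in_port, out_port) not in self.ready():
            raise ValueError("connection is not at the head of both queues")
        self.in_q[in_port].popleft()
        self.out_q[out_port].popleft()


router = DataPlaneRouter()
for conn in [("W", "E"), ("W", "N"), ("E", "N")]:  # assumed arrival order
    router.reserve(*conn)

served = []
while router.ready():
    conn = router.ready()[0]
    served.append(conn)
    router.release(*conn)
print(served)  # [('W', 'E'), ('W', 'N'), ('E', 'N')]
```

In this model the East-North circuit has to wait until West-North has been served, because the North output port saw the West reservation first. This is why the ordering constraints on slide 18 matter: if two routers saw overlapping reservations in different orders, the heads of their queues could never match.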

  19. Outline: Motivation & Related work; Importance of data messages to performance; The Optimization Problem; Proposed Solution; Déjà Vu Switching; Analysis of the acceptable data plane speed; Evaluation; Summary

  20. [Figure: timeline, cycles 1 to 13.] Reservation packet injected early at cycle 1; data packet injected at cycle 3.

  21. [Figure: timeline, cycles 1 to 13.] Reservation packet injected early at cycle 1; data packet injected at cycle 3.

  22. [Slide content not captured in the transcript.]

  23. Outline: Motivation & Related work; Importance of data messages to performance; The Optimization Problem; Proposed Solution; Déjà Vu Switching; Analysis of the acceptable data plane speed; Evaluation; Summary

  24. We use the functional simulator Simics to simulate cache-coherent CMPs of 16 and 64 cores. We use Orion2 to get power numbers for the interconnect routers. We evaluate with: synthetic traces, which allow varying the network load; and execution-driven simulation of parallel benchmarks from the SPLASH-2 and PARSEC suites. Interconnect topology: mesh.

  25. Baseline NoC: single plane, 16-byte links, packet switched with a 3-cycle router pipeline, clocked at 4 GHz. Evaluated NoC: control plane with 6-byte links, packet switched, 4 GHz; data plane with 10-byte links, circuit switched. Control and coherence packets: 1 flit. Data packets: 5 flits on the baseline NoC, 7 flits on the data plane.
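The flit counts on slide 25 can be reproduced with a short calculation. The assumptions are not confirmed by the slides: a 64-byte cache line (8 words of 8 bytes), one head flit per data packet on the packet-switched baseline, and no head flit on the circuit-switched data plane. Other readings, such as a few header bytes packed into the payload, give the same counts. The helper names are hypothetical.

```python
import math

LINE_BYTES = 64  # assumption: 8 words of 8 bytes; the slides only say "8 data words"


def data_flits(link_bytes, header_flits):
    """Flits needed to move one cache line over a link of the given width."""
    return math.ceil(LINE_BYTES / link_bytes) + header_flits


def serialization_ns(flits, freq_ghz):
    """Time to push all flits onto a link at one flit per cycle."""
    return flits / freq_ghz


baseline = data_flits(16, header_flits=1)    # packet switched: assume one head flit
data_plane = data_flits(10, header_flits=0)  # circuit switched: assume no head flit
print(baseline, data_plane)                  # 5 7
print(serialization_ns(baseline, 4.0))       # 1.25
print(round(serialization_ns(data_plane, 2.66), 2))  # 2.63
```

Under these assumptions, a data packet takes roughly twice as long to serialize on a 2.66 GHz data plane as on the 4 GHz baseline. That is the kind of slowdown the deck proposes to offset by reserving circuits early.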

  26. [Figure: Normalized NoC energy and completion time of synthetic traces on a 64-core CMP. Two panels: normalized energy consumption and normalized completion time, each plotted against traffic injection rate (0.01, 0.03, 0.05 requests/cycle/node); series: 4 GHz, 4/4 GHz, 4/3 GHz, 4/2.66 GHz, 4/2 GHz. Each request receives a reply data packet.]

  27. [Figure: Normalized execution time on a 16-core CMP; series: 4 GHz, 4/4 GHz, 4/3 GHz, 4/2.66 GHz, 4/2 GHz; benchmarks: barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, raytrace, specjbb, water nsquared, water spatial, geometric mean.]

  28. [Figure: Normalized NoC energy on a 16-core CMP; series: 4 GHz, 4/4 GHz, 4/3 GHz, 4/2.66 GHz, 4/2 GHz; benchmarks: barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, raytrace, specjbb, water nsquared, water spatial, geometric mean.]

  29. [Figure: Normalized NoC energy and execution time on a 64-core CMP. Two panels: normalized energy consumption and normalized execution time; series: 4 GHz, 4/4 GHz, 4/3 GHz, 4/2.66 GHz, 4/2 GHz; benchmarks: barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, specjbb, water nsquared, water spatial, geometric mean.]

  30. [Figure: Performance relative to a CMP with a single-plane NoC, on a 16-core CMP; series: PS 4/4, DV 4/4, PS 4/2.66, PS+CW 4/2.66, DV 4/2.66; y-axis 90% to 110%, baseline at 100%; benchmarks: barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, raytrace, specjbb, water nsquared, water spatial, geometric mean.]
