SnackNoC: Processing in the Communication Layer

  1. SnackNoC: Processing in the Communication Layer
     Karthik Sangaiah, Michael Lui, Ragh Kuttappa, Baris Taskin, and Mark Hempstead
     Feb 25th, 2020, VLSI and Architecture Lab

  2. Opportunistic Resources for Graduate Students
     Free leftovers → steak dinner: opportunistically collecting snacks toward a meal.

  3-4. Opportunistic Resources in the CMP
     • "Free leftovers" in the communication interconnect (NoC routers), e.g., the Intel Skylake 8180 HCC [1]
     • Opportunistically collecting "snacks" to make a "meal"
     • What performance gain can we add by opportunistically "snacking" on CMP resources?
     [1] Intel Skylake SP HCC, Wikichip.

  5-10. Quantifying Design Slack in the NoC
     • The NoC is designed to minimize latency during heavy traffic
     • The NoC implementation can account for 60% to 75% of the miss latency [2]
     • Study of NoC resource utilization on recent NoC designs:
       - Three selected best-paper-nominated NoCs have similar performance: DAPPER [3], AxNoC [4], BiNoCHS [5]
       - Reducing resources substantially reduced performance
       - Further details of the study are in our paper
     • Opportunities in network-on-chip slack: the crossbar, network links, and internal buffers
     [2] Sanchez et al., ACM TACO, 2010. [3] Raparti et al., IEEE/ACM NOCS, 2018. [4] Ahmed et al., IEEE/ACM NOCS, 2018. [5] Mirhosseini et al., IEEE/ACM NOCS, 2017.
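To put the 60% to 75% figure in context, a quick back-of-the-envelope estimate (the 80-cycle total miss latency here is an assumed example value, not a number from the talk):

```python
# Estimate how many cycles of a cache miss are spent in the NoC,
# using the 60-75% range reported in [2]. The 80-cycle total miss
# latency is a hypothetical example value.
total_miss_cycles = 80
noc_share = (0.60, 0.75)
noc_cycles = tuple(round(total_miss_cycles * s) for s in noc_share)
# 48 to 60 of the 80 cycles would be spent in the network
```

Under that assumption, roughly 48 to 60 of the 80 cycles are network time, which is why slack in the NoC is worth reclaiming.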

  11-16. Quantifying Design Slack in the NoC
     Simulated a 16-core CMP with 4 benchmarks representing "low", "medium", "medium-high", and "high" traffic.
     • Crossbar utilization:
       - Peak utilization (Graph 500): 42%
       - Highest median utilization (Graph 500): 13.3%
       - Median utilization at Router 5: 8.6%
     • Link utilization:
       - Peak link utilization (Graph 500): 18%
       - Highest median link utilization (LULESH): 3.3%
     • Buffer utilization:
       - Raytrace: 4% of cycles have localized contention, with ~10% utilization during contention
       - For 3M of the 2.4T flits forwarded, buffer utilization reaches 30-55% of the total capacity
     [Figure: Router 5 crossbar usage (%) over time (10^8 cycles)]
     Takeaway: The SnackNoC platform improves the efficiency and performance of the CMP by offloading data-parallel workloads and "snacking" on network resources.
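Utilization statistics like those above can be derived from a per-cycle grant trace. A minimal sketch (the trace format, 5-port router radix, and window size are assumptions for illustration, not the paper's simulator):

```python
# Compute peak and median crossbar utilization from a per-cycle trace.
# Hypothetical setup: a 5-port mesh router (4 directions + local port);
# each trace entry is the number of crossbar ports granted that cycle.
# Utilization per window = granted port-cycles / available port-cycles.
from statistics import median

NUM_PORTS = 5   # assumed router radix
WINDOW = 1000   # assumed cycles per measurement window

def utilization_per_window(grants_per_cycle, num_ports=NUM_PORTS, window=WINDOW):
    """Return a list of per-window utilization fractions."""
    utils = []
    for start in range(0, len(grants_per_cycle), window):
        chunk = grants_per_cycle[start:start + window]
        utils.append(sum(chunk) / (num_ports * len(chunk)))
    return utils

# Toy trace: a mostly idle router with a short burst of traffic.
trace = [0] * 900 + [3] * 100 + [1] * 1000
utils = utilization_per_window(trace)
peak = max(utils)     # burstiness shows up in the peak
med = median(utils)   # slack shows up in the low median
```

The gap between peak and median is exactly the design slack SnackNoC targets: the crossbar is provisioned for the peak but sits mostly idle at the median.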

  17. Overview
     • "Slack" of the Communication Fabric
     • The SnackNoC Platform
     • Experimental Results
     • Conclusion and Future Considerations

  18-20. SnackNoC Platform Overview
     • Goals:
       - Opportunistically "snack" on existing network resources for additional performance
       - Limited additional overhead to the uncore
       - Minimal or zero interference with CMP traffic
     • An opportunistic NoC-based compute platform: a limited dataflow engine
     • Applications: data-parallel workloads used in scientific computing, graph analytics, and machine learning
     (Images: Celerity RISC-V SoC [6]; Google Cloud TPU [7])
     [6] S. Davidson et al., IEEE Micro, 2018. [7] Jouppi et al., IEEE/ACM ISCA, 2017.
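A "limited dataflow engine" executes an operation as soon as all of its operands have arrived. A minimal software sketch of that firing rule (the node classes, toy graph, and ready-queue scheduler are illustrative, not SnackNoC's microarchitecture):

```python
# Minimal dataflow firing rule: a node executes once all of its input
# operands have arrived. Names and the scheduler are illustrative.
from collections import deque

class Node:
    def __init__(self, name, op, inputs):
        self.name, self.op, self.inputs = name, op, inputs
        self.operands = {}  # operand values received so far

    def ready(self):
        return set(self.operands) == set(self.inputs)

def run(consumers, initial):
    """Propagate values through the graph, firing nodes as operands arrive."""
    values = dict(initial)
    queue = deque(initial)  # names of values ready to forward
    while queue:
        src = queue.popleft()
        for node in consumers.get(src, []):
            node.operands[src] = values[src]
            if node.ready() and node.name not in values:
                args = [node.operands[i] for i in node.inputs]
                values[node.name] = node.op(*args)
                queue.append(node.name)
    return values

# Toy graph computing (a + b) * c via dataflow firing.
add = Node("sum", lambda x, y: x + y, ["a", "b"])
mul = Node("out", lambda x, y: x * y, ["sum", "c"])
consumers = {"a": [add], "b": [add], "c": [mul], "sum": [mul]}
result = run(consumers, {"a": 2, "b": 3, "c": 4})
```

This firing-on-arrival model is what makes the approach a natural fit for data-parallel workloads, where independent operations can execute wherever their operands happen to land.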
