OpenSoC Fabric: An On-Chip Network Generator
Farzad Fatollahi-Fard, Dave Donofrio, George Michelogiannakis, John Shalf
2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-2016), April 17-19, 2016, Uppsala, Sweden


SLIDE 1

OpenSoC Fabric: An On-Chip Network Generator

Farzad Fatollahi-Fard, Dave Donofrio, George Michelogiannakis, John Shalf
2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-2016), April 17-19, 2016, Uppsala, Sweden.

SLIDE 2

1. Motivation
2. OpenSoC Fabric
3. What is Chisel?
4. OpenSoC Fabric Breakdown
5. Results
6. Conclusion and Future Work

SLIDE 3

Motivation: Why Are We Doing This?

  • We want to build and model candidate future HPC chip multiprocessors
  • Parallelism is growing at an exponential rate
  • Data movement dominates power costs
  • Network topology greatly affects application performance

Reference: "An analysis of on-chip interconnection networks for large-scale chip multiprocessors," ACM Transactions on Architecture and Code Optimization (TACO), April 2010

SLIDE 4

What Interconnect Provides the Performance? Is it Open Source?

What tools exist to answer these questions?

SLIDE 5

What tools exist for SoC research?

  • Software models
      • Fast to create, but plagued by long runtimes as system size increases
  • Hardware emulation
      • Fast, accurate evaluation that scales with system size, but suffers from long development time

What tools do we have to evaluate large, complex networks of cores?

Reference: "A complexity-effective architecture for accelerating full-system multiprocessor simulations using FPGAs," FPGA 2008

SLIDE 6

Comparison of NoCs: Software Tools

Tool       | Language   | Accuracy       | Verification     | Drawbacks
Booksim    | C++        | Cycle-Accurate | RTL              | Long runtimes limit simulation size
Garnet     | C++ (GEM5) | Event-Driven   | Other Simulators | Not fast enough for larger simulations (1K+ cores)
NoCTweak   | SystemC    | Cycle-Accurate | RTL              | Long runtimes limit simulation size
PhoenixSim | OMNeT++    | Event-Driven   | Other Simulators | For photonics on-chip networks
Topaz      | C++ (GEM5) | Cycle-Accurate | Other Simulators | Not fast enough for larger simulations (1K+ cores)

SLIDE 7

Comparison of NoCs: Hardware Tools

Tool                | Language               | Features                            | Open Source?        | Drawbacks
Stanford NoC Router | Verilog                | Long list of Verilog parameters     | Yes                 | Hard to configure
CONNECT             | Bluespec SystemVerilog | Completely customizable via website | Yes (noncommercial) | Designed for FPGAs
ARM CoreLink        | Pre-generated IP       | Up to clusters of 48 cores          | No                  | Designed for ARM cores (not design space exploration); for “small” designs; cache coherent
Arteris FlexNoC     | Pre-generated IP       | Tool optimized for VLSI design      | No                  | Full parameters unknown

SLIDE 9

OpenSoC Fabric: An Open-Source, Flexible, Parameterized NoC Generator

  • Part of the CoDEx tool suite
  • Written in Chisel
  • Dimensions, topology, and VCs are all configurable
  • Fast functional C++ model for functional validation
  • Verilog-based description for FPGA or ASIC
  • Synthesis path enables accurate power / energy modeling

[Figure: OpenSoC Fabric in the center, attached through AXI interfaces to CPU(s), HMC, 10GbE, and PCIe endpoints]

SLIDE 10

Current Status: Version 1.1.2 Released

  • Multiple Topologies
      • Mesh
      • Flattened Butterfly
  • Wormhole Flow Control
  • Virtual Channels
  • Run through both ASIC and FPGA tools
  • Available for download
      • www.opensocfabric.org

SLIDE 12

Chisel: A New Hardware DSL
Using Scala to construct Verilog and C++ descriptions

  • Chisel provides both software and hardware models from the same codebase
  • Object-oriented hardware development
  • Allows definition of structs and other high-level constructs
  • Powerful libraries and components ready to use
  • Working processors fabricated using Chisel

[Figure labels: Scala, Chisel, Software Compilation, C++ Simulation, SystemC Simulation, Hardware Compilation, Verilog, FPGA, ASIC]

SLIDE 13

Recent Chisel Designs: Chisel code successfully boots Linux

  • First tape-out in 2012
  • Raven core taped out in 2014 (28nm)

[Figure labels: Clock test site, SRAM test site, DCDC test site, Processor site]

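SLIDE 14

Chisel Overview: How does Chisel work?

  • Not “Scala to Gates”
  • Describe hardware functionality
  • Chisel creates a graph representation
      • Flattened
  • Each node is translated to Verilog or C++

Example: Mux(x > y, x, y)

[Figure: inputs x and y feed a ">" comparison node, whose output drives the select input of a Mux that chooses between x and y]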
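The graph idea on this slide can be illustrated with a minimal plain-Scala sketch. This is not Chisel and not how Chisel is implemented internally; the `Node`, `Input`, `Gt`, and `Mux` types and the `emitVerilog` and `eval` helpers are hypothetical names chosen for illustration. The sketch only shows how a single graph of nodes can be walked once to produce a Verilog-style expression and once to act as a software model.

```scala
// Plain-Scala illustration (not Chisel): a hardware description held as a graph of nodes.
object GraphSketch {
  sealed trait Node
  case class Input(name: String) extends Node
  case class Gt(a: Node, b: Node) extends Node              // ">" comparison node
  case class Mux(sel: Node, t: Node, f: Node) extends Node  // 2-to-1 multiplexer node

  // Hardware path: walk the graph and emit a Verilog-style expression per node.
  def emitVerilog(n: Node): String = n match {
    case Input(name)  => name
    case Gt(a, b)     => s"(${emitVerilog(a)} > ${emitVerilog(b)})"
    case Mux(s, t, f) => s"(${emitVerilog(s)} ? ${emitVerilog(t)} : ${emitVerilog(f)})"
  }

  // Software path: evaluate the same graph for concrete input values.
  def eval(n: Node, env: Map[String, Int]): Int = n match {
    case Input(name)  => env(name)
    case Gt(a, b)     => if (eval(a, env) > eval(b, env)) 1 else 0
    case Mux(s, t, f) => if (eval(s, env) != 0) eval(t, env) else eval(f, env)
  }

  // The slide's example: select the larger of x and y.
  val x = Input("x")
  val y = Input("y")
  val max = Mux(Gt(x, y), x, y)
}
```

Calling `GraphSketch.emitVerilog(GraphSketch.max)` yields the string `((x > y) ? x : y)`, and `GraphSketch.eval` on the same graph returns the larger of the two supplied values.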

SLIDE 15

OpenSoC – Top Level Diagram

SLIDE 16

OpenSoC – Functional Hierarchy

[Figure: functional hierarchy. Labels: Top-Level; Network Interface; Topology; Injection/Ejection; AXI; AHB; FIFO; Router; Routing Function; Mesh; Flattened Butterfly; Torus; Switch; Allocator; Arbiter; Round Robin; Priority; Cyclic]

SLIDE 18

Configuring: Parameters

  • OpenSoC is configured at run time through the Parameters class
  • Declared at the top level; submodules can add to or change the parameters tree
  • Not limited to just numerical values
  • Leverage Scala to pass functions to parameterize module creation
      • Example: Routing Function constructor passed as a parameter to the router
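Passing a constructor as a parameter can be sketched in plain Scala as follows. All names here (`RoutingFunction`, `ModuloRouting`, `FixedRouting`, `Router`, `makeRouting`) are hypothetical stand-ins, not the OpenSoC classes, which are Chisel modules configured through the Parameters class. The sketch only shows the Scala mechanism: the router receives a function and calls it to build its own routing logic.

```scala
// Plain-Scala illustration: a router parameterized by a routing-function constructor.
object FunctionParamSketch {
  // Hypothetical interface: map a destination ID to an output port index.
  trait RoutingFunction { def outputPort(dest: Int): Int }

  class ModuloRouting(numPorts: Int) extends RoutingFunction {
    def outputPort(dest: Int): Int = dest % numPorts
  }

  class FixedRouting(port: Int) extends RoutingFunction {
    def outputPort(dest: Int): Int = port
  }

  // The router does not know which routing function it gets;
  // it simply invokes the constructor function it was handed.
  class Router(numPorts: Int, makeRouting: Int => RoutingFunction) {
    val routing: RoutingFunction = makeRouting(numPorts)
    def route(dest: Int): Int = routing.outputPort(dest)
  }

  val modRouter   = new Router(4, n => new ModuloRouting(n))
  val fixedRouter = new Router(4, _ => new FixedRouting(0))
}
```

Swapping the routing behavior then means changing one argument at the point where the router is created, without editing the router itself.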

SLIDE 19

Configuring: Parameters

  • All OpenSoC modules take a Parameters class as a constructor argument
  • Setting parameters:
      parms.child("MySwitch", Map(
        ("numInPorts" -> Soft(8)),
        ("numOutPorts" -> Soft(3))
      ))
  • Getting a parameter:
      val numInPorts = parms.get[Int]("numInPorts")
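The set-and-get pattern above can be approximated with a small plain-Scala sketch of a scoped parameter tree. This is an assumption-laden simplification, not the OpenSoC Parameters class: the `Params` class is hypothetical, it drops the child name argument and the `Soft(...)` wrapper, and it assumes that a child's entries shadow those of its parent.

```scala
// Plain-Scala illustration of a hierarchical parameter lookup (hypothetical Params class).
object ParamsSketch {
  class Params(values: Map[String, Any], parent: Option[Params] = None) {
    // Create a child scope; its entries take precedence over the parent's (assumed behavior).
    def child(overrides: Map[String, Any]): Params = new Params(overrides, Some(this))

    // Look up a key in this scope first, then in the enclosing scopes.
    def get[T](key: String): T = values.get(key) match {
      case Some(v) => v.asInstanceOf[T]
      case None => parent match {
        case Some(p) => p.get[T](key)
        case None    => throw new NoSuchElementException(s"parameter not found: $key")
      }
    }
  }

  // Top-level parameters, then a child scope for one switch.
  val top         = new Params(Map("numVCs" -> 2, "numInPorts" -> 5))
  val switchParms = top.child(Map("numInPorts" -> 8, "numOutPorts" -> 3))
}
```

In this sketch `switchParms.get[Int]("numInPorts")` sees the child's value, while `switchParms.get[Int]("numVCs")` falls through to the top-level scope.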

SLIDE 20

Developing: Incredibly Fast Development Time

  • Modules have a standard interface that you inherit
  • Development of modules is very quick
      • Flattened Butterfly took 2 hours of development

    // Abstract base class: every virtual-channel router inherits this interface.
    abstract class VCRouter(parms: Parameters) extends Module(parms) {
      val numInChannels  = parms.get[Int]("numInChannels")
      val numOutChannels = parms.get[Int]("numOutChannels")
      val numVCs         = parms.get[Int]("numVCs")
      val io = new Bundle {
        val inChannels  = Vec.fill(numInChannels)  { new ChannelVC(parms) }
        val outChannels = Vec.fill(numOutChannels) { new ChannelVC(parms).flip() }
      }
    }

    // A concrete router only has to supply the implementation.
    class SimpleVCRouter(parms: Parameters) extends VCRouter(parms) {
      // Implementation
    }

SLIDE 22

OpenSoC – Top Level Modules: Topology

  • Stitches routers together
  • Assigns each router an individual ID
  • Assigns the Routing Function to routers
  • Passes down the Arbitration scheme
  • Connects Injection and Ejection Queues for network endpoints
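As a rough illustration of the first two bullets, the plain-Scala sketch below assigns router IDs and enumerates the links of a 2D mesh. It is not the OpenSoC topology module; the names `routerId` and `links` and the row-major ID numbering are assumptions made for this example.

```scala
// Plain-Scala illustration: ID assignment and link enumeration for a 2D mesh.
object MeshSketch {
  // Assumed numbering: router IDs in row-major order for a width x height mesh.
  def routerId(x: Int, y: Int, width: Int): Int = y * width + x

  // Pairs of router IDs joined by a link; each physical link appears once.
  def links(width: Int, height: Int): Seq[(Int, Int)] =
    for {
      y <- 0 until height
      x <- 0 until width
      (nx, ny) <- Seq((x + 1, y), (x, y + 1)) // neighbor in +x, neighbor in +y
      if nx < width && ny < height             // skip neighbors outside the mesh
    } yield (routerId(x, y, width), routerId(nx, ny, width))
}
```

For a 4x4 mesh this gives 16 router IDs (0 to 15) and 24 links, 12 along each dimension.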

SLIDE 24

Results – Traffic Patterns

Configuration: 4x4 mesh network, dimension-order routing (DOR), single concentration, dual virtual channels
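Dimension-order routing on a mesh can be sketched in plain Scala as below. This is the textbook XY variant, shown under the assumption that X is corrected before Y; the slides do not say which dimension OpenSoC resolves first, and the `Port` names and the `route` and `path` helpers are hypothetical.

```scala
// Plain-Scala illustration of dimension-order (XY) routing on a 2D mesh.
object DorSketch {
  sealed trait Port
  case object East  extends Port
  case object West  extends Port
  case object North extends Port
  case object South extends Port
  case object Local extends Port // packet has arrived; eject here

  // Pick the output port at router `cur` for a packet headed to `dest`:
  // remove the X offset first, then the Y offset.
  def route(cur: (Int, Int), dest: (Int, Int)): Port = {
    val (cx, cy) = cur
    val (dx, dy) = dest
    if (dx > cx) East
    else if (dx < cx) West
    else if (dy > cy) North
    else if (dy < cy) South
    else Local
  }

  // Follow the per-hop decisions and list every router the packet visits.
  def path(src: (Int, Int), dest: (Int, Int)): List[(Int, Int)] = {
    val (x, y) = src
    route(src, dest) match {
      case Local => List(src)
      case East  => src :: path((x + 1, y), dest)
      case West  => src :: path((x - 1, y), dest)
      case North => src :: path((x, y + 1), dest)
      case South => src :: path((x, y - 1), dest)
    }
  }
}
```

A packet from (0, 0) to (2, 1) therefore visits (0, 0), (1, 0), (2, 0), and (2, 1): all X hops come before the single Y hop.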

SLIDE 25

Results – Average Latency (compared to Booksim)

Traffic Pattern | OpenSoC Fabric (Software) | OpenSoC Fabric (Hardware)
Uniform         | +1.86%                    | +8.37%
Tornado         | +0.84%                    | +0.42%
Transpose       | +7.37%                    | +8.29%
Neighbor        | +0.84%                    | +6.28%
Bit Reverse     | +1.85%                    | +10.6%
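For readers unfamiliar with the pattern names, the plain-Scala sketch below gives commonly used definitions of three of the synthetic patterns for a 4x4 network. These are assumptions: the slides do not define the patterns, definitions vary between simulators, and the helper names and row-major node numbering are chosen only for this example.

```scala
// Plain-Scala illustration: destination functions for synthetic traffic patterns (4x4 network).
object TrafficSketch {
  val k    = 4 // nodes per dimension
  val bits = 4 // address bits for k * k = 16 nodes

  // Assumed row-major numbering of nodes.
  def coords(id: Int): (Int, Int) = (id % k, id / k)
  def id(x: Int, y: Int): Int = y * k + x

  // Transpose: node (x, y) sends to node (y, x).
  def transpose(src: Int): Int = { val (x, y) = coords(src); id(y, x) }

  // Bit reverse: the destination address is the source address with its bits reversed.
  def bitReverse(src: Int): Int =
    (0 until bits).foldLeft(0)((acc, i) => (acc << 1) | ((src >> i) & 1))

  // Neighbor (one common definition): step by one in each dimension, wrapping around.
  def neighbor(src: Int): Int = { val (x, y) = coords(src); id((x + 1) % k, (y + 1) % k) }
}
```

Under these definitions node 1 sends to node 4 in transpose traffic and to node 8 in bit-reverse traffic.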

SLIDE 26

Results – Latency and Utilization

Nearest Neighbor Traffic Pattern

SLIDE 27

Results – Application Traces (OpenSoC compared to Booksim)

Application | Avg latency | Execution time
AMR         | -2.42%      | -2.19%
MiniDFT     | -28.3%      | -5.25%
AMG         | +16.3%      | +130.8%

SLIDE 29

Future additions: Towards a full set of features

  • Upgrade OpenSoC Fabric to use Chisel 3
  • A collection of topologies and routing functions
  • Standardized interfaces at the endpoints
  • Power modeling in the C++ model

SLIDE 30

Conclusion

  • This is an open-source, community-driven infrastructure
  • We are counting on your contributions

SLIDE 31

Acknowledgements

  • UCB Chisel
  • US Dept of Energy
  • Laboratory for Physical Sciences
  • Ke Wen
  • Columbia LRL
  • John Bachan

SLIDE 32

More Information

http://opensocfabric.org