Chipyard Basics: Howie Mao, Jerry Zhao, UC Berkeley (PowerPoint presentation)

SLIDE 1

Chipyard Basics

Howie Mao, Jerry Zhao
UC Berkeley
{zhemao,jzh}@berkeley.edu

SLIDE 2

Motivation

Berkeley Architecture Research has developed and open-sourced Chisel, FIRRTL, RISC-V, the Rocket and BOOM cores, TileLink, accelerators, caches, peripherals, Diplomacy, the configuration system, FireSim, and HAMMER.

Goal: Make it easy for small teams to design, integrate, simulate, and tape out a custom SoC

SLIDE 3

Chipyard

Figure: Chipyard components.

  • Tooling: Chisel, FIRRTL, RISC-V
  • Rocket Chip Generators: Rocket Core, BOOM Core, TileLink, Accelerators, Caches, Peripherals, Diplomacy, Configuration System
  • Flows: FireSim, HAMMER, Software RTL Simulation

SLIDE 4

Chipyard SW RTL Simulation

Figure: software RTL simulation flow.

  • Custom SoC Configuration
  • RTL Generators: RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators
  • RTL Build Process: FIRRTL IR, Transforms for SW Sim, Behavioral Verilog
  • Software RTL Simulation: VCS, Verilator

SLIDE 5

Chipyard targeting FireSim

Figure: FireSim flow.

  • Custom SoC Configuration
  • RTL Generators: RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators
  • RTL Build Process: FIRRTL IR, Transforms for FireSim, FireSim Verilog
  • FireSim FPGA-Accelerated Simulation: Simulation, Debugging, Networking

SLIDE 6

Chipyard VLSI Flow

Figure: VLSI flow.

  • Custom SoC Configuration
  • RTL Generators: RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators
  • RTL Build Process: FIRRTL IR, Transforms for VLSI, VLSI Verilog
  • Automated VLSI Flow: Hammer, Tech-plugins, Tool-plugins

SLIDE 7

Chipyard Unified Flows

Figure: one Custom SoC Configuration and one set of RTL Generators (RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators) feed a shared RTL Build Process (FIRRTL IR) with three sets of transforms:

  • Transforms for SW Sim → Behavioral Verilog → Software RTL Simulation (VCS, Verilator)
  • Transforms for FireSim → FireSim Verilog → FireSim FPGA-Accelerated Simulation (Simulation, Debugging, Networking)
  • Transforms for VLSI → VLSI Verilog → Automated VLSI Flow (Hammer, Tech-plugins, Tool-plugins)

SLIDE 8

Tutorial Roadmap

Figure: tutorial roadmap.

  • Custom SoC Configuration
  • RTL Generators: RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators
  • RTL Build Process: FIRRTL IR, FIRRTL Transforms, Verilog
  • Software RTL Simulation: VCS, Verilator
  • FireSim FPGA-Accelerated Simulation: Simulation, Debugging, Networking
  • Automated VLSI Flow: Hammer, Tech-plugins, Tool-plugins
  • FireMarshal: Bare-metal & Linux, Custom Workload, QEMU & Spike

SLIDE 9

Chipyard Tooling

SLIDE 10

Chisel

  • Chisel – Hardware Construction Language built on Scala
  • What Chisel IS NOT:
    • NOT Scala-to-gates
    • NOT HLS
    • NOT a tool-oriented language
  • What Chisel IS:
    • A productive language for generating hardware
    • Leverages object-oriented and functional programming paradigms
    • Enables design of parameterized generators
    • Designer-friendly: low barrier to entry, high reward
    • Backwards-compatible: integrates with Verilog black-boxes

Figure: Chisel → FIRRTL → Verilog → VLSI.

SLIDE 11

Chisel Example

```scala
import chisel3._

// 3-point moving average implemented in the style of a FIR filter
class MovingAverage3 extends Module {
  val io = IO(new Bundle {
    val in  = Input(UInt(32.W))
    val out = Output(UInt(32.W))
  })
  val z1 = RegNext(io.in) // input delayed by one cycle
  val z2 = RegNext(z1)    // input delayed by two cycles
  io.out := io.in + z1 + z2
}
```

Figure: datapath: the 32-bit input in and registers z1 and z2 are each multiplied by 1 and summed to out.

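To make the register timing concrete, here is a plain-Scala reference model (no Chisel needed) of what `MovingAverage3` produces on each cycle. The circuit outputs the sum of the last three samples; it does not divide by 3. The model assumes both registers start at 0. A `RegNext` without an init value is not reset, so real hardware starts from an undefined value. The model also assumes each 32-bit addition wraps, because Chisel's `+` on two `UInt`s keeps the wider operand's width and drops the carry. `MovingAverage3Model` is a hypothetical helper name.

```scala
object MovingAverage3Model {
  private val Mask32 = 0xFFFFFFFFL

  // Returns io.out for each cycle, given one 32-bit input value per cycle.
  // z1 and z2 model the two RegNext stages and are assumed to start at 0.
  def run(inputs: Seq[Long]): Seq[Long] = {
    var z1 = 0L
    var z2 = 0L
    inputs.toVector.map { raw =>
      val in = raw & Mask32
      val out = (in + z1 + z2) & Mask32 // combinational sum; carry out of bit 31 is dropped
      z2 = z1                            // both registers update on the same clock edge
      z1 = in
      out
    }
  }
}
```

For the input stream 3, 6, 9, 0, 0, the model produces 3, 9, 18, 15, 9. The first two outputs reflect the assumed zero-initialized registers.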
SLIDE 12

Chisel Example

```scala
import chisel3._

// Generalized FIR filter parameterized by coefficients
class FirFilter(bitWidth: Int, coeffs: Seq[Int]) extends Module {
  val io = IO(new Bundle {
    val in  = Input(UInt(bitWidth.W))
    val out = Output(UInt(bitWidth.W))
  })
  // zs(i) holds the input delayed by i cycles
  val zs = Wire(Vec(coeffs.length, UInt(bitWidth.W)))
  zs(0) := io.in
  for (i <- 1 until coeffs.length) {
    zs(i) := RegNext(zs(i-1))
  }
  // Multiply each tap by its coefficient and sum the products
  val products = zs zip coeffs map { case (z, c) => z * c.U }
  io.out := products.reduce(_ + _)
}
```

Figure: generalized datapath: the W-bit input in and delayed taps z1, z2, ..., zN-1 are multiplied by coefficients c0, c1, c2, ..., cN-1 and summed to out.

SLIDE 13

Chisel Example

```scala
// Basic implementation
val basic3Filter = Module(new MovingAverage3)
// Parameterized implementation
val better3Filter = Module(new FirFilter(32, Seq(1, 1, 1)))
// Generator is reusable
val delayFilter = Module(new FirFilter(8, Seq(0, 1)))
val triangleFilter = Module(new FirFilter(8, Seq(1, 2, 3, 2, 1)))
```
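Because the generator is reusable, a single software model can check every instance. The sketch below is a plain-Scala golden model of `FirFilter(bitWidth, coeffs)` that could be compared against RTL simulation output. It assumes zero-initialized delay registers and non-negative coefficients. It also assumes the wider sum is truncated to `bitWidth` bits when it is connected to `io.out`. `FirModel` is a hypothetical name, not part of Chisel.

```scala
object FirModel {
  // Cycle-by-cycle model of FirFilter(bitWidth, coeffs):
  //   out(t) = (coeffs(0) * in(t) + coeffs(1) * in(t-1) + ...) mod 2^bitWidth
  // Delay-line registers are assumed to start at 0.
  def run(bitWidth: Int, coeffs: Seq[Int], inputs: Seq[Long]): Seq[Long] = {
    require(bitWidth > 0 && bitWidth <= 63, "bitWidth must fit in a Long")
    require(coeffs.nonEmpty && coeffs.forall(_ >= 0), "coefficients must be non-negative")
    val mask = (BigInt(1) << bitWidth) - 1
    // taps(i) holds the input from i cycles ago; taps(0) is the current input.
    var taps = Vector.fill(coeffs.length)(BigInt(0))
    inputs.toVector.map { in =>
      taps = (BigInt(in) & mask) +: taps.init
      val sum = taps.zip(coeffs).map { case (z, c) => z * c }.sum
      (sum & mask).toLong
    }
  }
}
```

With this model, `Seq(1, 1, 1)` reproduces `MovingAverage3`, `Seq(0, 1)` delays its input by one cycle, and feeding a single 1 into `Seq(1, 2, 3, 2, 1)` returns the triangle coefficients as the impulse response.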

SLIDE 14

FIRRTL – LLVM for Hardware

FIRRTL emits tool-friendly, synthesizable Verilog.

Figure: LLVM/FIRRTL analogy.

  • LLVM: C/C++ and Rust → LLVM IR → LLVM PassManager (dead code elimination, statistics collection, optimization) → x86 or ARM assembly
  • FIRRTL: Chisel → FIRRTL IR → FIRRTL passes (dead expression elimination, statistics collection, netlist manipulation) → Verilog for SW Sim or Verilog for FPGA Sim

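To make the "passes over an IR" analogy concrete, here is a dead-expression-elimination pass written over a tiny, hypothetical netlist IR in plain Scala. It does not use FIRRTL's actual IR classes. It only shows the shape of such a pass: start from the outputs, mark every node they depend on, and drop the rest.

```scala
object ToyNetlist {
  // A node computes a named value from other named values (its operands).
  // Names with no defining node are treated as module inputs.
  final case class Node(name: String, op: String, operands: Seq[String])

  // Dead expression elimination: keep only nodes reachable from the outputs.
  def eliminateDead(nodes: Seq[Node], outputs: Set[String]): Seq[Node] = {
    val byName = nodes.map(n => n.name -> n).toMap

    // Worklist traversal that marks every name an output depends on.
    @annotation.tailrec
    def mark(worklist: List[String], live: Set[String]): Set[String] = worklist match {
      case Nil => live
      case name :: rest if live(name) => mark(rest, live)
      case name :: rest =>
        val deps = byName.get(name).map(_.operands.toList).getOrElse(Nil)
        mark(deps ++ rest, live + name)
    }

    val live = mark(outputs.toList, Set.empty)
    nodes.filter(n => live(n.name)) // keep the original node order
  }
}
```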
SLIDE 15

Rocket Chip Generators

SLIDE 16

What is Rocket Chip?

  • A highly parameterizable and modular SoC generator
    • Replace the default Rocket core with your own core
    • Add your own coprocessor
    • Add your own SoC IP to the uncore
  • A library of reusable SoC components
    • Memory protocol converters
    • Arbiters and crossbar generators
    • Clock-crossings and asynchronous queues
  • The largest open-source Chisel codebase
  • Developed at Berkeley, now maintained by many
    • SiFive, CHIPS Alliance, Berkeley

SLIDE 17

Generating Varied SoCs

  • In academia: UCB Hurricane-1
  • In industry: SiFive Freedom E310

SLIDE 18

Used in Many Tapeouts

SLIDE 19

Structure of a Rocket Chip SoC

Tiles: unit of replication for a core

  • CPU
  • L1 Caches
  • Page-table walker

L2 banks:

  • Receive memory requests

FrontBus:

  • Connects to DMA devices

ControlBus:

  • Connects to core-complex devices

PeripheryBus:

  • Connects to other devices

SystemBus:

  • Ties everything together

SLIDE 20

The Rocket In-Order Core

  • First open-source RISC-V CPU
  • In-order, single-issue RV64GC core
  • Floating-point via the Berkeley hardfloat library
  • RISC-V Compressed (RVC)
  • Physical Memory Protection (PMP) standard
  • Supervisor ISA and Virtual Memory
  • Boots Linux
  • Supports the Rocket Chip Coprocessor (RoCC) interface
  • L1 I$ and D$
    • Caches can be configured as scratchpads

SLIDE 21

BOOM: The Berkeley Out-of-Order Machine

  • Superscalar RISC-V OoO core
  • Fully integrated in the Rocket Chip ecosystem
  • Open-source
  • Described in Chisel
  • Parameterizable generator
  • Taped out (BROOM at HC18)
  • Full RV64GC ISA support
    • FP, RVC, Atomics, PMPs, VM, Breakpoints, RoCC
  • Runs real OSes and software
  • Drop-in replacement for Rocket

Figure: a BOOMTile containing the BOOM core.

SLIDE 22

RoCC Accelerators

  • RoCC: Rocket Chip Coprocessor
  • Executes custom RISC-V instructions for a custom extension
  • Examples of RoCC accelerators:
    • Vector accelerators
    • Memcpy accelerator
    • Machine-learning accelerators
    • Java GC accelerator

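Figure: a Tile (BOOM/Rocket core with L1I$, L1D$, PTW, and TLBs) sends instructions (inst) to a decoupled RoCC accelerator, which returns writebacks (wb). The tile connects through the SystemBus to the L2, the core-complex devices, and the peripherals.
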
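A RoCC instruction uses one of the four major opcodes that the RISC-V base opcode map reserves for custom extensions. It carries R-type register fields plus three flags that say whether it reads rs1 and rs2 and writes rd. The plain-Scala sketch below packs those fields in the layout of Rocket Chip's RoCC instruction bundle as we understand it: funct7, rs2, rs1, xd, xs1, xs2, rd, opcode, from bit 31 down to bit 0. Treat the exact bit positions as an assumption to verify against the Rocket Chip source. `RoccEncoding` is a hypothetical helper.

```scala
object RoccEncoding {
  // Major opcodes reserved for custom extensions (custom-0 .. custom-3).
  val Custom0 = 0x0B
  val Custom1 = 0x2B
  val Custom2 = 0x5B
  val Custom3 = 0x7B

  // xd: the instruction writes rd; xs1/xs2: it reads rs1/rs2.
  def encode(funct7: Int, rs2: Int, rs1: Int, xd: Boolean, xs1: Boolean,
             xs2: Boolean, rd: Int, opcode: Int = Custom0): Int = {
    require(funct7 >= 0 && funct7 < 128, "funct7 is 7 bits")
    require(Seq(rs2, rs1, rd).forall(r => r >= 0 && r < 32), "register fields are 5 bits")
    def bit(b: Boolean): Int = if (b) 1 else 0
    (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
      (bit(xd) << 14) | (bit(xs1) << 13) | (bit(xs2) << 12) |
      (rd << 7) | opcode
  }
}
```

When the assembler has no mnemonic for a custom instruction, the resulting 32-bit word can be emitted directly, for example with a `.word` directive.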
SLIDE 23

L2 Cache and Memory System

  • Multi-bank shared L2
    • SiFive’s open-source IP
    • Fully coherent
    • Configurable size, associativity
    • Supports atomics, prefetch hints
  • Non-caching L2 Broadcast Hub
    • Coherence without caching
    • Bufferless design
  • Multi-channel memory system
    • Conversion to AXI4 for compatible DRAM controllers

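Size, associativity, and banking together determine how a physical address is split. The plain-Scala sketch below does that arithmetic for a generic banked, set-associative cache. It assumes power-of-two parameters and assumes the bank is selected by the address bits just above the block offset. That interleaving is an illustration, not a description of SiFive's InclusiveCache. The example's 8 ways, 64-byte lines, and 4 banks are made-up values; only the 1024 KB capacity matches the configs later in this deck. `CacheGeometry` is a hypothetical name.

```scala
object CacheGeometry {
  final case class Split(offsetBits: Int, bankBits: Int, setBits: Int, setsPerBank: Int)

  private def log2Exact(x: Int): Int = {
    require(x > 0 && (x & (x - 1)) == 0, s"$x is not a power of two")
    Integer.numberOfTrailingZeros(x)
  }

  // capacityKB: total capacity across all banks; ways: associativity;
  // blockBytes: cache line size; banks: number of independent banks.
  def split(capacityKB: Int, ways: Int, blockBytes: Int, banks: Int): Split = {
    val totalSets = capacityKB * 1024 / (ways * blockBytes)
    val setsPerBank = totalSets / banks
    Split(log2Exact(blockBytes), log2Exact(banks), log2Exact(setsPerBank), setsPerBank)
  }
}
```

For a 1024 KB, 8-way cache with 64-byte lines split over 4 banks, the sketch gives 512 sets per bank: 6 offset bits, 2 bank-select bits, and 9 set-index bits.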
SLIDE 24

Core Complex Devices

  • BootROM
    • First-stage bootloader
    • DeviceTree
  • PLIC (Platform-Level Interrupt Controller)
  • CLINT (Core-Local Interruptor)
    • Software interrupts
    • Timer interrupts
  • Debug Unit
    • DMI
    • JTAG

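The CLINT's software and timer interrupts are controlled through memory-mapped registers. The plain-Scala sketch below encodes the register offsets commonly used by Rocket Chip and SiFive CLINTs:

  • per-hart 32-bit `msip` words starting at offset 0x0
  • per-hart 64-bit `mtimecmp` registers starting at 0x4000
  • the shared 64-bit `mtime` counter at 0xBFF8

It also encodes the RISC-V privileged-spec rule that a machine timer interrupt is pending while `mtime >= mtimecmp`, compared as unsigned values. Check the offsets against the DeviceTree of a specific design. `ClintModel` is a hypothetical name.

```scala
object ClintModel {
  val MsipBase     = 0x0000L // one 32-bit msip register per hart
  val MtimecmpBase = 0x4000L // one 64-bit mtimecmp register per hart
  val MtimeOffset  = 0xBFF8L // single 64-bit mtime counter shared by all harts

  // Byte offsets of a given hart's registers from the CLINT base address.
  def msipOffset(hart: Int): Long = MsipBase + 4L * hart
  def mtimecmpOffset(hart: Int): Long = MtimecmpBase + 8L * hart

  // Machine timer interrupt pending: mtime >= mtimecmp as unsigned 64-bit values.
  def timerPending(mtime: Long, mtimecmp: Long): Boolean =
    java.lang.Long.compareUnsigned(mtime, mtimecmp) >= 0
}
```

Software that wants no further timer interrupts can write all ones to `mtimecmp`, since no unsigned `mtime` value can then exceed it.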
SLIDE 25

Other Chipyard Blocks

  • Hardfloat: Parameterized Chisel generators for hardware floating-point units
  • IceNet: Custom NIC for FireSim simulations
  • SiFive-Blocks: Open-sourced Chisel peripherals
    • GPIO, SPI, UART, etc.
  • TestchipIP: Berkeley utilities for chip testing/bringup
    • Tethered serial interface
    • Simulated block device
  • Hwacha: Decoupled vector-fetch RoCC accelerator
  • SHA3: Educational SHA3 RoCC accelerator

SLIDE 26

TileLink Interconnect

  • Free and open chip-scale interconnect standard
  • Supports multiprocessors, coprocessors, accelerators, DMA, peripherals, etc.
  • Provides a physically addressed, shared-memory system
  • Supports cache-coherent shared memory, MOESI-equivalent protocol
  • Verifiable deadlock freedom for conforming SoCs

SLIDE 27

TileLink Interconnect

  • Three different protocol levels with increasing complexity
    • TL-UL (Uncached Lightweight)
    • TL-UH (Uncached Heavyweight)
    • TL-C (Cached)
  • Rocket Chip provides a library of reusable TileLink widgets
    • Conversion to/from AXI4, AHB, APB
    • Conversion among TL-UL, TL-UH, TL-C
    • Crossbar generator
    • Width / logical size converters
    • TLMonitor conformance checker

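Because the levels are cumulative, a component's level determines which operations it may issue or receive. The plain-Scala summary below follows the TileLink specification's names for client-issued requests:

  • TL-UL: Get and the two Put variants
  • TL-UH: adds atomics (ArithmeticData, LogicalData) and Intent hints; multi-beat bursts are also allowed but not modeled here
  • TL-C: adds the Acquire/Release messages used for caching; Probe and Grant traffic is omitted

This is an illustration, not Rocket Chip's TileLink parameter classes.

```scala
object TileLinkLevels {
  sealed abstract class Level(val name: String, val rank: Int)
  case object TLUL extends Level("TL-UL", 0)
  case object TLUH extends Level("TL-UH", 1)
  case object TLC  extends Level("TL-C", 2)

  // Client-issued request messages first introduced at each conformance level.
  private val introducedAt: Map[Level, Set[String]] = Map(
    TLUL -> Set("Get", "PutFullData", "PutPartialData"),
    TLUH -> Set("ArithmeticData", "LogicalData", "Intent"),
    TLC  -> Set("AcquireBlock", "AcquirePerm", "Release", "ReleaseData")
  )

  // Levels are cumulative: a level supports everything introduced at or below its rank.
  def supports(level: Level, op: String): Boolean =
    introducedAt.exists { case (l, ops) => l.rank <= level.rank && ops(op) }

  // Lowest conformance level that allows `op`, if any level does.
  def minimumLevel(op: String): Option[Level] =
    introducedAt.toSeq.collect { case (l, ops) if ops(op) => l }.sortBy(_.rank).headOption
}
```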
SLIDE 28

Integration

Figure: the same Custom SoC Configuration (RISC-V cores, multi-level caches, custom Verilog, peripherals, accelerators) is built into a software RTL simulator, a FireSim FPGA image, or a GDS layout.

SLIDE 29

Diplomacy

Problem: Interconnects are difficult to parameterize correctly

  • Complex interconnect graph with many nodes
  • Nodes are independently parameterized

Diplomacy: Framework for negotiating parameters between Chisel generators

  • Graphical abstraction of interconnectivity
  • Diplomatic lazy modules follow two-phase elaboration
    • Phase one: nodes exchange configuration information with each other and decide final parameters
    • Phase two: Chisel RTL elaborates using the calculated parameters
  • Used extensively by Rocket Chip TileLink generators

SLIDE 30

Diplomacy Examples

Diplomatic parameters:

  • Type and size of supported operations
  • Physical memory attributes – modifiability, executability, cacheability
  • Ordering requirements on operations (ex: FIFO)
  • Presence and widths of fields in wire bundles (ex: source ID bits)

Useful applications:

  • Automatically insert TLMonitor protocol correctness checkers
  • Discover AtomicAutomata topology violations

SLIDE 31

Diplomacy Example

Figure: clients (L1D$, L1I$, and an AXI to TL adapter) with source ID ranges [0,1), [0,2), and [0,4) connect through a crossbar to managers (BootROM, L2, and a TL to AXI adapter) with address ranges [0x0, 0x1000), [0x8000, 0xA000), and [0xA000, 0xB000).

SLIDE 32

Diplomacy Example

Figure: the same graph annotated with source ID ranges: [0,1), [0,2), and [0,4) on the client edges, and [0,4), [0,4), [0,8), and [0,8) on the edges beyond the crossbar.

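One way to read the figure: each client reports how many source IDs it uses, and the crossbar gives every client a disjoint slice of a larger range, which it then advertises downstream. The plain-Scala sketch below uses a simple largest-first, power-of-two-aligned packing. It is a simplified model, not the exact allocation rule in Rocket Chip's crossbar, and `SourceIdAllocation` is a hypothetical name. With the figure's sizes 1, 2, and 4, it yields a combined range of [0,8).

```scala
object SourceIdAllocation {
  // Half-open range [start, end), matching the figure's [0,4) notation.
  final case class IdRange(start: Int, end: Int)

  private def roundUpPow2(x: Int): Int = {
    var p = 1
    while (p < x) p <<= 1
    p
  }

  // needs(i) = number of source IDs that client i uses.
  // Returns each client's slice (in input order) plus the range advertised downstream.
  def assign(needs: Seq[Int]): (Seq[IdRange], IdRange) = {
    require(needs.nonEmpty && needs.forall(_ > 0), "every client needs at least one ID")
    val sizes = needs.toVector.map(roundUpPow2)
    // Largest slices first: each start is then a multiple of the slice's own size.
    val order = sizes.zipWithIndex.sortBy { case (size, _) => -size }
    var next = 0
    val placed: Map[Int, IdRange] = order.map { case (size, client) =>
      val slice = IdRange(next, next + size)
      next += size
      client -> slice
    }.toMap
    (needs.indices.map(placed), IdRange(0, roundUpPow2(next)))
  }
}
```

Placing larger requests first keeps every slice naturally aligned. A downstream component can therefore recover which client owns a response from the high bits of the source ID.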
SLIDE 33

Diplomacy Example

Figure: the same graph annotated with address ranges: [0x0, 0x1000), [0x8000, 0xA000), and [0xA000, 0xB000) at the managers, and merged ranges such as [0x8000, 0xB000) and [0x0, 0xB000) on the edges toward the clients.

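Address information flows the other way: each manager advertises the ranges it serves, and a crossbar or adapter advertises their union to its clients. The plain-Scala sketch below merges the figure's manager ranges and derives how many address bits an edge needs. Real Diplomacy describes regions with `AddressSet` base/mask pairs rather than this simplified half-open range type. `AddressRanges` and `AddrRange` are hypothetical names.

```scala
object AddressRanges {
  // Half-open byte range [base, end).
  final case class AddrRange(base: Long, end: Long) {
    require(base < end, s"empty range [$base, $end)")
  }

  // Union of the managers' ranges: sort by base and merge ranges that touch or overlap.
  def merge(ranges: Seq[AddrRange]): List[AddrRange] =
    ranges.sortBy(_.base).foldLeft(List.empty[AddrRange]) {
      case (last :: done, r) if r.base <= last.end =>
        AddrRange(last.base, math.max(last.end, r.end)) :: done
      case (acc, r) => r :: acc
    }.reverse

  // Address bits an edge needs so that the highest byte of any range is reachable.
  def addressBits(ranges: Seq[AddrRange]): Int = {
    val highest = ranges.map(_.end - 1).max
    64 - java.lang.Long.numberOfLeadingZeros(highest)
  }
}
```

For the figure's managers, L2 [0x8000, 0xA000) and TL to AXI [0xA000, 0xB000) merge into [0x8000, 0xB000), while BootROM [0x0, 0x1000) stays separate. The highest address, 0xAFFF, needs 16 address bits.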
SLIDE 34

Diplomacy-generated Graph

Figure: graph generated by Diplomacy, showing the Tile, Front Bus, System Bus, Control Bus, L2 InclusiveCache, and Memory Bus.

SLIDE 35

Rocket Chip Configuration

```scala
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++ // 2 GiB of external memory
  new WithBlockDevice ++
  new WithGPIO ++
  new WithBootROM ++
  new hwacha.DefaultHwachaConfig ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(3) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)
```

Figure: resulting TestHarness and Top. Tiles 0, 1, and 2 each contain a 3-wide BOOM core with L1I$, L1D$, and Hwacha. The system also has SysBus, MemBus, BootROM, L2, GPIOs, SimBlockDevice, and SimAXIMem.

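The `++` chain reads right to left: `BaseConfig` supplies defaults, and each fragment to its left can override or extend what the fragments to its right provide. The plain-Scala model below mimics that precedence with string keys so it runs without Rocket Chip. `ConfigModel`, `Frag`, the key names, and the lower-case fragment helpers are hypothetical stand-ins that echo the slide's fragments. They are not the real Config/Field API, whose fragments are partial functions that can also consult `site` and `here` in addition to `up`.

```scala
object ConfigModel {
  // lookup(key, up): this fragment's answer for `key`, where `up` is the answer
  // produced by all fragments to its right (None if none of them set the key).
  final case class Frag(lookup: (String, Option[Any]) => Option[Any]) {
    // In `a ++ b`, `a` sees b's answer as `up`, so the left-hand fragment wins.
    def ++(that: Frag): Frag = Frag((key, up) => lookup(key, that.lookup(key, up)))
    // Query the fully composed configuration.
    def apply(key: String): Option[Any] = lookup(key, None)
  }

  // Override `key` regardless of what fragments to the right chose.
  def set(key: String, value: Any): Frag =
    Frag((k, up) => if (k == key) Some(value) else up)

  // Extend the "tiles" list supplied by fragments to the right.
  def addTiles(kind: String, n: Int): Frag =
    Frag { (k, up) =>
      if (k != "tiles") up
      else {
        val existing = up.map(_.asInstanceOf[List[String]]).getOrElse(Nil)
        Some(List.fill(n)(kind) ++ existing)
      }
    }

  // Hypothetical stand-ins for the slide's fragments.
  val baseConfig: Frag = set("xlen", 64) ++ set("l2CapacityKB", 512)
  val withRV32: Frag = set("xlen", 32)
  def withInclusiveCache(capacityKB: Int): Frag = set("l2CapacityKB", capacityKB)
  def withNBoomCores(n: Int): Frag = addTiles("boom", n)
  def withNBigCores(n: Int): Frag = addTiles("rocket", n)
}
```

Swapping the order, as in `baseConfig ++ withRV32`, would leave `xlen` at 64 in this model. That is why fragments that refine a setting are placed to the left of `BaseConfig`.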
SLIDE 36

Rocket Chip Configuration

```scala
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithBootROM ++
  new hwacha.DefaultHwachaConfig ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++          // now 2 BOOM tiles
  new rocketchip.subsystem.WithNBigCores(1) ++  // plus 1 Rocket tile
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)
```

Figure: Tiles 0 and 1 contain 3-wide BOOM cores and Tile 2 contains a Rocket core. Each tile has L1I$, L1D$, and Hwacha. The system also has SysBus, MemBus, BootROM, L2, GPIOs, SimBlockDevice, and SimAXIMem.

SLIDE 37

Rocket Chip Configuration

```scala
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithBootROM ++
  new WithMultiRoCCConvAccel(2) ++  // ConvNN accelerator on hart 2
  new WithMultiRoCCSha3(1) ++       // SHA3 accelerator on hart 1
  new WithMultiRoCCHwacha(0) ++     // Hwacha on hart 0
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)
```

Figure: Tile 0 (3-wide BOOM, Hwacha), Tile 1 (3-wide BOOM, SHA3), and Tile 2 (Rocket, ConvNN), each with L1I$ and L1D$. The system also has SysBus, MemBus, BootROM, L2, GPIOs, SimBlockDevice, and SimAXIMem.

SLIDE 38

Rocket Chip Configuration

```scala
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithBootROM ++
  new WithMultiRoCCConvAccel(2) ++
  new WithMultiRoCCSha3(1) ++
  new WithMultiRoCCHwacha(0) ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithRV32 ++  // Rocket tile becomes RV32
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)
```

Figure: Tile 0 (3-wide BOOM, Hwacha), Tile 1 (3-wide BOOM, SHA3), and Tile 2 (RV32 Rocket, ConvNN), each with L1I$ and L1D$. The system also has SysBus, MemBus, BootROM, L2, GPIOs, SimBlockDevice, and SimAXIMem.

SLIDE 39

Rocket Chip Configuration

```scala
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithJtagDTM ++  // JTAG debug transport module
  new WithBootROM ++
  new WithMultiRoCCConvAccel(2) ++
  new WithMultiRoCCSha3(1) ++
  new WithMultiRoCCHwacha(0) ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithRV32 ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)
```

Figure: Tile 0 (3-wide BOOM, Hwacha), Tile 1 (3-wide BOOM, SHA3), and Tile 2 (RV32 Rocket, ConvNN), each with L1I$ and L1D$. The system also has SysBus, MemBus, BootROM, L2, GPIOs, SimBlockDevice, SimAXIMem, and JTAG.

SLIDE 40

Rocket Chip Configuration

```scala
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithJtagDTM ++
  new WithBootROM ++
  new WithRenumberHarts(rocketFirst=true) ++  // Rocket tile becomes hart 0
  new WithMultiRoCCConvAccel(2) ++
  new WithMultiRoCCSha3(1) ++
  new WithMultiRoCCHwacha(0) ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithRV32 ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)
```

Figure: after renumbering, Tile 0 is the RV32 Rocket (ConvNN), Tile 1 is a 3-wide BOOM (Hwacha), and Tile 2 is a 3-wide BOOM (SHA3), each with L1I$ and L1D$. The system also has SysBus, MemBus, BootROM, L2, GPIOs, SimBlockDevice, SimAXIMem, and JTAG.

SLIDE 41

Rocket Chip Configuration

```scala
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithJtagDTM ++
  new WithBootROM ++
  new WithRenumberHarts(rocketFirst=true) ++
  new WithRationalBoomTiles ++    // rational clock crossings into the BOOM tiles
  new WithRationalRocketTiles ++  // and into the Rocket tile
  new WithMultiRoCCConvAccel(2) ++
  new WithMultiRoCCSha3(1) ++
  new WithMultiRoCCHwacha(0) ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithRV32 ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)
```

Figure: same tiles as before: Tile 0 is the RV32 Rocket (ConvNN), Tile 1 is a 3-wide BOOM (Hwacha), and Tile 2 is a 3-wide BOOM (SHA3). Each tile is now in its own clock domain (clk_0, clk_1, clk_2). The system also has SysBus, MemBus, BootROM, L2, GPIOs, SimBlockDevice, SimAXIMem, and JTAG.

SLIDE 42

Chipyard is Community-Friendly

Documentation:

  • https://chipyard.readthedocs.io/en/dev/
  • 85 pages
  • Documents components, flows
  • Links to sub-project documentation
  • Most of today’s tutorial content is covered there

Continuous Integration:

  • Cloud-hosted
  • https://circleci.com/gh/ucb-bar/chipyard/tree/master

SLIDE 43

Chipyard is Research-Friendly

  • Add new accelerators/custom instructions
  • Modify OS/driver/software
  • Perform design-space exploration across many parameters
  • Test in software and FPGA-sim before tape-out

Stay tuned for Chipyard-based research from Berkeley:

  • New chips
  • New accelerators

SLIDE 44

High-level questions?

  • Next is a hands-on tutorial led by Abe Gonzalez
