Chipyard Basics
Howie Mao, Jerry Zhao
UC Berkeley
{zhemao,jzh}@berkeley.edu
Motivation
Berkeley Architecture Research has developed and open-sourced: Chisel, FIRRTL, RISC-V, Rocket Core, BOOM Core, TileLink, Accelerators, Caches, Peripherals, Diplomacy, Configuration System, FireSim, HAMMER
Goal: Make it easy for small teams to design, integrate, simulate, and tape out a custom SoC
Chipyard
- Tooling: Chisel, FIRRTL, RISC-V
- Rocket Chip Generators: Rocket Core, BOOM Core, TileLink, Accelerators, Caches, Peripherals, Diplomacy, Configuration System
- Flows: FireSim, HAMMER, Software RTL Simulation
Chipyard SW RTL Simulation
- Custom SoC Configuration
- RTL Generators: RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators
- RTL Build Process: FIRRTL IR, Transforms for SW Sim, Behavioral Verilog
- Software RTL Simulation: VCS, Verilator
Chipyard targeting FireSim
- Custom SoC Configuration
- RTL Generators: RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators
- RTL Build Process: FIRRTL IR, Transforms for FireSim, FireSim Verilog
- FireSim FPGA-Accelerated Simulation: Simulation, Debugging, Networking
Chipyard VLSI Flow
- Custom SoC Configuration
- RTL Generators: RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators
- RTL Build Process: FIRRTL IR, Transforms for VLSI, VLSI Verilog
- Automated VLSI Flow: Hammer, Tech-plugins, Tool-plugins
Chipyard Unified Flows
- Custom SoC Configuration
- RTL Generators: RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators
- RTL Build Process: FIRRTL IR, with one set of transforms per target
- Transforms for SW Sim: Behavioral Verilog for Software RTL Simulation (VCS, Verilator)
- Transforms for FireSim: FireSim Verilog for FireSim FPGA-Accelerated Simulation (Simulation, Debugging, Networking)
- Transforms for VLSI: VLSI Verilog for the Automated VLSI Flow (Hammer, Tech-plugins, Tool-plugins)
Tutorial Roadmap
- Custom SoC Configuration
- RTL Generators: RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators
- RTL Build Process: FIRRTL Transforms, FIRRTL IR, Verilog
- Software RTL Simulation: VCS, Verilator
- FireSim FPGA-Accelerated Simulation: Simulation, Debugging, Networking
- Automated VLSI Flow: Hammer, Tech-plugins, Tool-plugins
- FireMarshal: Bare-metal & Linux, Custom Workload, QEMU & Spike
Chipyard Tooling
Chisel
- Chisel – Hardware Construction Language built on Scala
- What Chisel IS NOT:
- NOT Scala-to-gates
- NOT HLS
- NOT tool-oriented language
- What Chisel IS:
- Productive language for generating hardware
- Leverage OOP/Functional programming paradigms
- Enables design of parameterized generators
- Designer-friendly: low barrier-to-entry, high reward
- Backwards-compatible: integrates with Verilog black-boxes
[Figure: Chisel is compiled to FIRRTL, which emits Verilog for the VLSI flow]
Chisel Example
// 3-point moving average implemented in the style of a FIR filter
import chisel3._

class MovingAverage3 extends Module {
  val io = IO(new Bundle {
    val in = Input(UInt(32.W))
    val out = Output(UInt(32.W))
  })
  // z1 and z2 hold the input delayed by one and two cycles
  val z1 = RegNext(io.in)
  val z2 = RegNext(z1)
  io.out := io.in + z1 + z2
}
[Figure: in feeds registers z1 and z2; in, z1, and z2 are each multiplied by 1 and summed to produce out; all signals are 32 bits wide]
Chisel Example
// Generalized FIR filter parameterized by coefficients
import chisel3._

class FirFilter(bitWidth: Int, coeffs: Seq[Int]) extends Module {
  val io = IO(new Bundle {
    val in = Input(UInt(bitWidth.W))
    val out = Output(UInt(bitWidth.W))
  })
  // zs(i) is the input delayed by i cycles
  val zs = Wire(Vec(coeffs.length, UInt(bitWidth.W)))
  zs(0) := io.in
  for (i <- 1 until coeffs.length) {
    zs(i) := RegNext(zs(i-1))
  }
  // Multiply each tap by its coefficient, then sum the products
  val products = zs zip coeffs map { case (z, c) => z * c.U }
  io.out := products.reduce(_ + _)
}
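[Figure: in feeds a chain of registers z1, z2, ..., zN-1; each tap is multiplied by its coefficient c0, c1, c2, ..., cN-1 and the products are summed to produce out; all signals are W bits wide]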
Chisel Example
// Basic implementation
val basic3Filter = Module(new MovingAverage3)

// Parameterized implementation
val better3Filter = Module(new FirFilter(32, Seq(1, 1, 1)))

// Generator is reusable
val delayFilter = Module(new FirFilter(8, Seq(0, 1)))
val triangleFilter = Module(new FirFilter(8, Seq(1, 2, 3, 2, 1)))
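When checking a generator like FirFilter, it can help to have a plain-software reference to compare simulation output against. The sketch below is one possible golden model in ordinary Scala (no Chisel dependency). The object name FirModel and the function fir are hypothetical, and the model assumes the delay registers start at zero, which the slides do not specify.

```scala
// Software reference model for an N-tap FIR filter (a sketch, not part of Chipyard).
object FirModel {
  /** Returns one output per input sample, truncated to bitWidth bits.
    * Assumes delay registers hold zero before the first sample arrives.
    */
  def fir(bitWidth: Int, coeffs: Seq[Int], input: Seq[BigInt]): Seq[BigInt] = {
    require(bitWidth > 0, "bitWidth must be positive")
    val mask = (BigInt(1) << bitWidth) - 1
    input.indices.map { t =>
      // Sum coeffs(i) * input(t - i) over all taps that have seen data
      val acc = coeffs.zipWithIndex.map { case (c, i) =>
        if (t - i >= 0) input(t - i) * c else BigInt(0)
      }.sum
      acc & mask // model the truncation to the output width
    }
  }
}

// Feeding an impulse through the triangle filter reproduces its coefficients
val impulse = Seq[BigInt](1, 0, 0, 0, 0)
println(FirModel.fir(8, Seq(1, 2, 3, 2, 1), impulse)) // Vector(1, 2, 3, 2, 1)
```

Masking the final sum is enough to model the hardware width because truncation commutes with addition and multiplication modulo 2^bitWidth.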
FIRRTL – LLVM for Hardware
FIRRTL emits tool-friendly, synthesizable Verilog
- LLVM: C/C++ and Rust front ends produce LLVM IR; the LLVM PassManager runs passes (dead code elimination, statistics collection, optimization); back ends emit x86 assembly or ARM assembly
- FIRRTL: Chisel and Verilog front ends produce FIRRTL IR; FIRRTL Passes run (dead expression elimination, statistics collection, netlist manipulation); back ends emit Verilog for SW Sim or Verilog for FPGA Sim
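To make the compiler-pass analogy concrete, here is a toy dead-expression-elimination pass over a made-up netlist IR, written in plain Scala. It is only an illustration of what a pass over an IR does; Node, Circuit, and eliminateDead are hypothetical names and do not correspond to the FIRRTL API.

```scala
// Toy IR and pass (a sketch under stated assumptions, not the FIRRTL API).
object ToyPasses {
  // A node names a value computed from other named values.
  final case class Node(name: String, deps: Seq[String])
  // A circuit is a list of nodes plus the names that must be kept (its outputs).
  final case class Circuit(nodes: Seq[Node], outputs: Set[String])

  /** Drops every node that no output depends on, directly or transitively. */
  def eliminateDead(c: Circuit): Circuit = {
    val byName = c.nodes.map(n => n.name -> n).toMap

    @annotation.tailrec
    def reach(frontier: Set[String], seen: Set[String]): Set[String] =
      if (frontier.isEmpty) seen
      else {
        // Names with no defining node (e.g. primary inputs) simply have no deps
        val next = frontier.flatMap(n => byName.get(n).toSeq.flatMap(_.deps)) -- seen -- frontier
        reach(next, seen ++ frontier)
      }

    val live = reach(c.outputs, Set.empty)
    c.copy(nodes = c.nodes.filter(n => live(n.name)))
  }
}

val before = ToyPasses.Circuit(
  nodes = Seq(
    ToyPasses.Node("a", Seq("in")),
    ToyPasses.Node("b", Seq("a")),
    ToyPasses.Node("unused", Seq("a"))),
  outputs = Set("b"))
val after = ToyPasses.eliminateDead(before)
println(after.nodes.map(_.name)) // List(a, b)
```

Because the pass consumes and produces the same IR type, passes like this can be chained in any order, which is the property the LLVM comparison highlights.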
Rocket Chip Generators
What is Rocket Chip?
- A highly parameterizable and modular SoC generator
- Replace default Rocket core w/ your own core
- Add your own coprocessor
- Add your own SoC IP to uncore
- A library of reusable SoC components
- Memory protocol converters
- Arbiters and Crossbar generators
- Clock-crossings and asynchronous queues
- The largest open-source Chisel codebase
- Developed at Berkeley, now maintained by many
- SiFive, ChipsAlliance, Berkeley
Generating Varied SoCs
In academia: UCB Hurricane-1
In industry: SiFive Freedom E310
Used in Many Tapeouts
Structure of a Rocket Chip SoC
Tiles: unit of replication for a core
- CPU
- L1 Caches
- Page-table walker
L2 banks:
- Receive memory requests
FrontBus:
- Connects to DMA devices
ControlBus:
- Connects to core-complex devices
PeripheryBus:
- Connects to other devices
SystemBus:
- Ties everything together
The Rocket In-Order Core
- First open-source RISC-V CPU
- In-order, single-issue RV64GC core
- Floating-point via Berkeley hardfloat library
- RISC-V Compressed
- Physical Memory Protection (PMP) standard
- Supervisor ISA and Virtual Memory
- Boots Linux
- Supports Rocket Chip Coprocessor (RoCC) interface
- L1 I$ and D$
- Caches can be configured as scratchpads
BOOM: The Berkeley Out-of-Order Machine
- Superscalar RISC-V OoO core
- Fully integrated in Rocket Chip ecosystem
- Open-source
- Described in Chisel
- Parameterizable generator
- Taped-out (BROOM at HC18)
- Full RV64GC ISA support
- FP, RVC, Atomics, PMPs, VM, Breakpoints, RoCC
- Runs real OSes and software
- Drop-in replacement for Rocket
RoCC Accelerators
- RoCC: Rocket Chip Coprocessor
- Execute custom RISC-V instructions for a custom extension
- Examples of RoCC accelerators
- Vector accelerators
- Memcpy accelerator
- Machine-learning accelerators
- Java GC accelerator
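[Figure: a Tile containing a BOOM/Rocket core with L1I$, L1D$, PTW, and TLBs, plus a decoupled RoCC accelerator that receives instructions (inst) from the core and returns writeback data (wb); the tile connects to the L2, the SystemBus, and the core complex peripherals]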
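Software reaches a RoCC accelerator by issuing instructions in the RISC-V custom opcode space. The sketch below packs such an instruction word in plain Scala, assuming the commonly documented RoCC layout (funct7, rs2, rs1, xd, xs1, xs2, rd, opcode from most to least significant bits). The slides do not give this layout, so treat it as an assumption to verify against your Rocket Chip version; RoccEncoding and encode are hypothetical names.

```scala
// Packs a RoCC-style custom instruction word (a sketch; field layout is an assumption).
object RoccEncoding {
  // Major opcodes reserved for custom extensions in the RISC-V base ISA
  val Custom0 = 0x0B
  val Custom1 = 0x2B
  val Custom2 = 0x5B
  val Custom3 = 0x7B

  /** xd, xs1, xs2 flag whether rd, rs1, rs2 refer to core integer registers. */
  def encode(opcode: Int, funct7: Int, rd: Int, rs1: Int, rs2: Int,
             xd: Boolean, xs1: Boolean, xs2: Boolean): Int = {
    require(opcode >= 0 && opcode < 128, "opcode is 7 bits")
    require(funct7 >= 0 && funct7 < 128, "funct7 is 7 bits")
    require(Seq(rd, rs1, rs2).forall(r => r >= 0 && r < 32), "register indices are 5 bits")
    def bit(b: Boolean): Int = if (b) 1 else 0
    (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
      (bit(xd) << 14) | (bit(xs1) << 13) | (bit(xs2) << 12) |
      (rd << 7) | opcode
  }
}

// funct7 = 0 on custom-0, reading x11 and x12 and writing x10
val word = RoccEncoding.encode(RoccEncoding.Custom0, funct7 = 0, rd = 10, rs1 = 11, rs2 = 12,
                               xd = true, xs1 = true, xs2 = true)
println(f"0x$word%08x")
```

A word built this way can be emitted from C with an assembler `.word` directive when the toolchain has no mnemonic for the custom instruction.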
L2 Cache and Memory System
- Multi-bank shared L2
- SiFive’s open-source IP
- Fully coherent
- Configurable size, associativity
- Supports atomics, prefetch hints
- Non-caching L2 Broadcast Hub
- Coherence w/o caching
- Bufferless design
- Multi-channel memory system
- Conversion to AXI4 for compatible DRAM controllers
Core Complex Devices
- BootROM
- First-stage bootloader
- DeviceTree
- PLIC
- CLINT
- Software interrupts
- Timer interrupts
- Debug Unit
- DMI
- JTAG
Other Chipyard Blocks
- Hardfloat: Parameterized Chisel generators for hardware floating-point units
- IceNet: Custom NIC for FireSim simulations
- SiFive-Blocks: Open-sourced Chisel peripherals
- GPIO, SPI, UART, etc.
- TestchipIP: Berkeley utilities for chip testing/bringup
- Tethered serial interface
- Simulated block device
- Hwacha: Decoupled vector-fetch RoCC accelerator
- SHA3: Educational SHA3 RoCC accelerator
TileLink Interconnect
- Free and open chip-scale interconnect standard
- Supports multiprocessors, coprocessors, accelerators, DMA, peripherals, etc.
- Provides a physically addressed, shared-memory system
- Supports cache-coherent shared memory, MOESI-equivalent protocol
- Verifiable deadlock freedom for conforming SoCs
TileLink Interconnect
- Three different protocol levels with increasing complexity
- TL-UL (Uncached Lightweight)
- TL-UH (Uncached Heavyweight)
- TL-C (Cached)
- Rocket Chip provides library of reusable TileLink widgets
- Conversion to/from AXI4, AHB, APB
- Conversion among TL-UL, TL-UH, TL-C
- Crossbar generator
- Width / logical size converters
- TLMonitor conformance checker
Integration
[Figure: a Custom SoC Configuration of RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, and Accelerators can be built into a Software RTL Simulator, a FireSim FPGA Image, or a GDS]
Diplomacy
Problem: Interconnects are difficult to parameterize correctly
- Complex interconnect graph with many nodes
- Nodes are independently parameterized
Diplomacy: Framework for negotiating parameters between Chisel generators
- Graphical abstraction of interconnectivity
- Diplomatic lazy modules follow two-phase elaboration
- Phase one: nodes exchange configuration information with each other and decide final parameters
- Phase two: Chisel RTL elaborates using calculated parameters
- Used extensively by Rocket Chip TileLink generators
Diplomacy Examples
Diplomatic parameters
- Type and size of supported operations
- Physical memory attributes – modifiability, executability, cacheability
- Ordering requirements on operations (ex: FIFO)
- Presence and widths of fields in wire bundles (ex: source ID bits)
Useful applications:
- Automatically insert TLMonitor protocol correctness checkers
- Discover AtomicAutomata topology violations
Diplomacy Example
[Figure: clients L1D$, L1I$, and AXI to TL connect through crossbars to managers BootROM, L2, and TL to AXI]
- Starting parameters: Source[0,1), Source[0,2), Source[0,4); Address[0x0, 0x1000), Address[0x8000, 0xA000), Address[0xA000, 0xB000)
- Source ID ranges after negotiation: [0,1), [0,2), [0,4) at the clients; [0,4), [0,4), [0,8), [0,8) on the downstream edges
- Address ranges after negotiation: [0x0, 0x1000), [0x8000, 0xA000), [0xA000, 0xB000) at the managers; [0x8000, 0xB000), [0x8000, 0xB000), [0x0, 0xB000), [0x0, 0xB000) on the upstream edges
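The two-phase idea can be sketched without any hardware library. The toy model below negotiates two parameters for a single crossbar (how many source ID bits it needs and which address range its clients can see), then "elaborates" using the result. It is plain Scala and not the Rocket Chip Diplomacy API: every name here is hypothetical, the pairing of ID counts and address ranges with specific blocks is an assumption, and rounding the ID space up to a power of two is a simplification.

```scala
// Toy two-phase negotiation for one crossbar (a sketch, not the Diplomacy API).
object ToyDiplomacy {
  final case class AddressRange(base: Long, end: Long) // half-open [base, end)
  final case class Client(name: String, numSourceIds: Int)
  final case class Manager(name: String, range: AddressRange)
  final case class XbarParams(sourceBits: Int, visible: AddressRange)

  // ceil(log2(n)) for n >= 1
  def log2Ceil(n: Int): Int = {
    require(n >= 1, "n must be at least 1")
    32 - Integer.numberOfLeadingZeros(n - 1)
  }

  /** Phase one: combine what each neighbor advertises into final parameters. */
  def negotiate(clients: Seq[Client], managers: Seq[Manager]): XbarParams = {
    require(clients.nonEmpty && managers.nonEmpty, "need at least one client and one manager")
    val totalIds = clients.map(_.numSourceIds).sum
    // Report the enclosing range; gaps between managers are not tracked here
    val visible = AddressRange(managers.map(_.range.base).min, managers.map(_.range.end).max)
    XbarParams(log2Ceil(totalIds), visible)
  }

  /** Phase two: generate "hardware" (here, just a description) from the result. */
  def elaborate(p: XbarParams): String =
    f"xbar: ${p.sourceBits}%d source bits, addresses [0x${p.visible.base}%x, 0x${p.visible.end}%x)"
}

import ToyDiplomacy._
val params = negotiate(
  clients = Seq(Client("L1D$", 1), Client("L1I$", 2), Client("AXI to TL", 4)),
  managers = Seq(
    Manager("BootROM", AddressRange(0x0L, 0x1000L)),
    Manager("L2", AddressRange(0x8000L, 0xA000L)),
    Manager("TL to AXI", AddressRange(0xA000L, 0xB000L))))
println(elaborate(params))
```

With seven source IDs in total the crossbar needs three ID bits, which corresponds to the [0,8) range in the figure, and the enclosing address range comes out as [0x0, 0xB000).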
Diplomacy-generated Graph
[Figure: generated graph showing the Tile, Front Bus, System Bus, Control Bus, L2 InclusiveCache, and Memory Bus]
Rocket Chip Configuration
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithBootROM ++
  new hwacha.DefaultHwachaConfig ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(3) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)

[Figure: TestHarness and Top. Tiles 0, 1, and 2 each hold a 3-wide BOOM with L1I$, L1D$, and Hwacha. Also shown: SysBus, MemBus, BootROM, L2, GPIOs, SimBlockDevice, SimAXIMem]
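The order of fragments in a `++` chain matters: in Rocket Chip's configuration system a lookup consults the left-most fragment first and falls through to later ones, which is why BaseConfig sits at the end as the set of defaults. The toy model below shows that lookup rule in plain Scala using partial functions. It is not the Rocket Chip Config API (real fragments can also read other parameters while being evaluated), and all keys and fragment names are hypothetical.

```scala
// Toy model of left-most-wins config chaining (a sketch, not the Rocket Chip API).
object ToyConfig {
  // A fragment answers some keys and stays silent on the rest.
  type Fragment = PartialFunction[String, Any]

  // Chain fragments so that earlier (left-most) fragments take priority.
  def chain(fragments: Fragment*): Fragment =
    fragments.foldLeft(PartialFunction.empty[String, Any])(_ orElse _)

  // Defaults, playing the role of a base config
  val baseConfig: Fragment = {
    case "nBoomCores"   => 0
    case "nRocketCores" => 1
    case "xLen"         => 64
  }
  val withRV32: Fragment = { case "xLen" => 32 }
  def withNBoomCores(n: Int): Fragment = { case "nBoomCores" => n }

  val myCustomConfig: Fragment = chain(withNBoomCores(2), withRV32, baseConfig)
}

println(ToyConfig.myCustomConfig("xLen"))         // 32, overridden by withRV32
println(ToyConfig.myCustomConfig("nRocketCores")) // 1, falls through to baseConfig
```

Adding a feature is then a matter of placing one more fragment ahead of the defaults, without editing any existing fragment.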
Rocket Chip Configuration
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithBootROM ++
  new hwacha.DefaultHwachaConfig ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)

[Figure: TestHarness and Top. Tiles 0 and 1 each hold a 3-wide BOOM and Tile 2 holds a Rocket; every tile has L1I$, L1D$, and Hwacha. Also shown: SysBus, MemBus, BootROM, L2, GPIOs, SimBlockDevice, SimAXIMem]
Rocket Chip Configuration
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithBootROM ++
  new WithMultiRoCCConvAccel(2) ++
  new WithMultiRoCCSha3(1) ++
  new WithMultiRoCCHwacha(0) ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)

[Figure: TestHarness and Top. Tile 0: 3-wide BOOM with Hwacha. Tile 1: 3-wide BOOM with SHA3. Tile 2: Rocket with ConvNN. Every tile has L1I$ and L1D$. Also shown: SysBus, MemBus, BootROM, L2, GPIOs, SimBlockDevice, SimAXIMem]
Rocket Chip Configuration
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithBootROM ++
  new WithMultiRoCCConvAccel(2) ++
  new WithMultiRoCCSha3(1) ++
  new WithMultiRoCCHwacha(0) ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithRV32 ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)

[Figure: TestHarness and Top. Tile 0: 3-wide BOOM with Hwacha. Tile 1: 3-wide BOOM with SHA3. Tile 2: RV32 Rocket with ConvNN. Every tile has L1I$ and L1D$. Also shown: SysBus, MemBus, BootROM, L2, GPIOs, SimBlockDevice, SimAXIMem]
Rocket Chip Configuration
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithJtagDTM ++
  new WithBootROM ++
  new WithMultiRoCCConvAccel(2) ++
  new WithMultiRoCCSha3(1) ++
  new WithMultiRoCCHwacha(0) ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithRV32 ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)

[Figure: TestHarness and Top. Tile 0: 3-wide BOOM with Hwacha. Tile 1: 3-wide BOOM with SHA3. Tile 2: RV32 Rocket with ConvNN. Every tile has L1I$ and L1D$. Also shown: SysBus, MemBus, BootROM, L2, GPIOs, JTAG, SimBlockDevice, SimAXIMem]
Rocket Chip Configuration
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithJtagDTM ++
  new WithBootROM ++
  new WithRenumberHarts(rocketFirst=true) ++
  new WithMultiRoCCConvAccel(2) ++
  new WithMultiRoCCSha3(1) ++
  new WithMultiRoCCHwacha(0) ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithRV32 ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)

[Figure: TestHarness and Top, with the tiles renumbered so the Rocket comes first. Tile 0: RV32 Rocket with ConvNN. Tile 1: 3-wide BOOM with Hwacha. Tile 2: 3-wide BOOM with SHA3. Every tile has L1I$ and L1D$. Also shown: SysBus, MemBus, BootROM, L2, GPIOs, JTAG, SimBlockDevice, SimAXIMem]
Rocket Chip Configuration
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithJtagDTM ++
  new WithBootROM ++
  new WithRenumberHarts(rocketFirst=true) ++
  new WithRationalBoomTiles ++
  new WithRationalRocketTiles ++
  new WithMultiRoCCConvAccel(2) ++
  new WithMultiRoCCSha3(1) ++
  new WithMultiRoCCHwacha(0) ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithRV32 ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)

[Figure: TestHarness and Top, with each tile on its own clock (clk_0, clk_1, clk_2). Tile 0: RV32 Rocket with ConvNN. Tile 1: 3-wide BOOM with Hwacha. Tile 2: 3-wide BOOM with SHA3. Every tile has L1I$ and L1D$. Also shown: SysBus, MemBus, BootROM, L2, GPIOs, JTAG, SimBlockDevice, SimAXIMem]
Chipyard is Community-Friendly
Documentation:
- https://chipyard.readthedocs.io/en/dev/
- 85 pages
- Documents components, flows
- Links to sub-project documentation
- Most of today’s tutorial content is covered there
Continuous Integration:
- Cloud-hosted
- https://circleci.com/gh/ucb-bar/chipyard/tree/master
Chipyard is Research-Friendly
- Add new accelerators/custom instructions
- Modify OS/driver/software
- Perform design-space exploration across many parameters
- Test in software and FPGA-sim before tape-out
Stay tuned for Chipyard-based research from Berkeley
- New chips
- New accelerators
High-level questions?
- Next is a hands-on tutorial led by Abe Gonzalez