DESIGNING DIGITAL SIGNAL PROCESSORS WITH ROCKETCHIP

slide-1
SLIDE 1

DESIGNING DIGITAL SIGNAL PROCESSORS WITH ROCKETCHIP

Paul Rigge, Borivoje Nikolić

{rigge,bora}@eecs.berkeley.edu

UC Berkeley CARRV 2018 June 2, 2018

slide-2
SLIDE 2

SoCs Combine Programmability + Efficiency

https://telecomtalk.info/qualcomm-announces-64-bit-snapdragon-808-and-snapdragon-810-high-end-mobile-processors/115805/

slide-3
SLIDE 3

Digital Signal Processing

  • SoCs integrate a lot of signal processing
      – Cellular + WiFi
      – Audio
      – Image processing
      – GPS
  • Signal processing strongly benefits from custom hardware
      – Parallelism
      – High locality
      – Trim unneeded bits

slide-4
SLIDE 4

Designing SoCs is Hard

  • Long development cycle
  • High cost of tools, respins
  • Reuse limited
  • High NRE only justifiable in high volume
slide-5
SLIDE 5

Chisel, FIRRTL, Rocketchip

  • Chisel: domain-specific language (DSL) for writing programs that generate circuits
  • FIRRTL: Flexible Intermediate Representation for RTL (LLVM for hardware)
  • RocketChip: open-source RISC-V implementation in Chisel

[Figure: Chisel 3, FIRRTL, transformations, backends]

slide-6
SLIDE 6

Growing Infrastructure

  • Stable cores
  • Compilers
  • Software Infrastructure
  • Accelerators

      – DMA
      – Hwacha
      – etc.

  • Interconnect Generators
slide-7
SLIDE 7

Outline

  • DSP Generators in Chisel
  • AXI-4 Stream + Diplomacy
  • Memory-mapped DSP Peripherals

– Useful building blocks

  • Verification
  • OFDM Baseband Example
slide-8
SLIDE 8

DSP Generators in Chisel

slide-9
SLIDE 9

dsptools

https://github.com/ucb-bar/dsptools

Paul Rigge, Angie Wang, Stevo Bailey, Chick Markley, Adam Izraelevitz

slide-10
SLIDE 10

Floating Point

  • Start implementing hardware without worrying about precision
      – Validate IO, control logic, algorithm
  • Validate floating point hardware implementation against floating point golden model
  • Uses Verilog “real” types (non-synthesizable)

[Figure: Bundle (UInt<64>), black box with $bitstoreal, operation (e.g. +), $rtoi, Bundle (UInt<64>)]
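The idea of carrying a real number through a 64-bit integer bundle can be illustrated in software. The following is a minimal plain-Scala sketch, not dsptools code: `toBits` and `fromBits` are hypothetical helper names that play the role of Verilog's `$realtobits` and `$bitstoreal`, and `addBits` stands in for one black-boxed operation.

```scala
object RealBits {
  // Pack a double into a 64-bit word (software analogue of $realtobits).
  def toBits(x: Double): Long = java.lang.Double.doubleToRawLongBits(x)

  // Reinterpret a 64-bit word as a double (software analogue of $bitstoreal).
  def fromBits(bits: Long): Double = java.lang.Double.longBitsToDouble(bits)

  // One "operation" box: unpack both operands, add them as reals, repack.
  def addBits(a: Long, b: Long): Long = toBits(fromBits(a) + fromBits(b))

  def main(args: Array[String]): Unit = {
    val sum = fromBits(addBits(toBits(1.5), toBits(2.25)))
    println(sum) // 3.75
  }
}
```

Because the word is only reinterpreted, never rounded, control logic and IO can be validated before any fixed-point precision is chosen.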

slide-11
SLIDE 11

Fixed Point

  • Fixed point types in Chisel and FIRRTL
  • Width inference like UInt, SInt
  • Binary point inference

    val sel = Wire(Bool())
    val a = Wire(FixedPoint(width = 10.W, binaryPoint = 9.BP))
    val b = Wire(FixedPoint(width = 12.W, binaryPoint = 10.BP))
    val reg = Reg(FixedPoint())
    when (sel) {
      reg := a
    } .otherwise {
      reg := b
    }
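To make the inference concrete, here is a small plain-Scala model of what a register fed by both `a` and `b` would need. It assumes the rule "keep the larger binary point and enough integer bits for either input"; the actual FIRRTL inference passes may differ in detail, and `FixedType` and `merge` are hypothetical names used only for this sketch.

```scala
object FixedPointInference {
  // width counts every bit (sign included); binaryPoint counts fractional bits.
  final case class FixedType(width: Int, binaryPoint: Int) {
    def integerBits: Int = width - binaryPoint
  }

  // Smallest type that holds any value of either input without dropping bits
  // (assumed rule, see the lead-in).
  def merge(a: FixedType, b: FixedType): FixedType = {
    val bp = math.max(a.binaryPoint, b.binaryPoint)
    val intBits = math.max(a.integerBits, b.integerBits)
    FixedType(intBits + bp, bp)
  }

  def main(args: Array[String]): Unit = {
    // Same shapes as a and b in the slide's snippet.
    println(merge(FixedType(10, 9), FixedType(12, 10))) // FixedType(12,10)
  }
}
```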

slide-12
SLIDE 12

Complex Numbers

  • dsptools defines a Complex type
  • Generic as to underlying type, e.g. DspComplex[SInt] or DspComplex[FixedPoint]
  • DspComplex[FloatingPoint] is OK too
  • Can choose between 3 or 4 real multiplies for a complex multiply
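The 3-versus-4 multiply trade-off comes from a standard algebraic rearrangement that swaps one multiplier for extra adders. The sketch below is plain Scala on integers, not the dsptools implementation; `Cplx`, `mul4`, and `mul3` are hypothetical names.

```scala
object ComplexMultiply {
  final case class Cplx(re: Long, im: Long)

  // Direct form: 4 real multiplies, 2 adds.
  def mul4(x: Cplx, y: Cplx): Cplx =
    Cplx(x.re * y.re - x.im * y.im, x.re * y.im + x.im * y.re)

  // Rearranged form: 3 real multiplies, 5 adds.
  def mul3(x: Cplx, y: Cplx): Cplx = {
    val k1 = y.re * (x.re + x.im)
    val k2 = x.re * (y.im - y.re)
    val k3 = x.im * (y.re + y.im)
    Cplx(k1 - k3, k1 + k2)
  }

  def main(args: Array[String]): Unit = {
    // (3 + 2i)(1 + 4i) = -5 + 14i
    println(mul4(Cplx(3, 2), Cplx(1, 4))) // Cplx(-5,14)
    println(mul3(Cplx(3, 2), Cplx(1, 4))) // Cplx(-5,14)
  }
}
```

In hardware the 3-multiply form saves a multiplier at the cost of wider adders and a longer adder path, which is why it is useful to have it as a generator option.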

slide-13
SLIDE 13

DspContext

  • Automatic pipeline register insertion for adds, multiplies
  • Rounding
  • Precision for literals
  • Override global defaults with scope, e.g.

    DspContext.alter(myContext) {
      val prod = a context_* b // auto-pipelined
    }
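Scoped overrides of this kind can be modeled with the Scala standard library's `scala.util.DynamicVariable`. The sketch below is a generic illustration of the pattern, not the dsptools source: `Settings`, its fields, and `mulLatency` are hypothetical.

```scala
import scala.util.DynamicVariable

object ScopedContext {
  // Hypothetical settings record; the field names are illustrative only.
  final case class Settings(pipelineStagesPerMul: Int, roundToNearest: Boolean)

  // Global default, visible wherever no override is active.
  private val current = new DynamicVariable[Settings](Settings(0, false))

  def get: Settings = current.value

  // Run body with s in effect, then restore the previous settings.
  def alter[A](s: Settings)(body: => A): A = current.withValue(s)(body)

  // A generator step that consults the ambient settings.
  def mulLatency(): Int = get.pipelineStagesPerMul

  def main(args: Array[String]): Unit = {
    val inside = alter(Settings(2, true)) { mulLatency() }
    println((inside, mulLatency())) // (2,0)
  }
}
```

The override applies only while the block runs, so a library component can be pipelined differently from its surroundings without threading parameters through every call.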

slide-14
SLIDE 14

Polymorphic Generators

[Figure: Generic Algorithm Description, Floating Point Implementation, Fixed Point Implementation]

  • Implement basic functionality
  • Test against golden model
  • Integrate with top level
  • Tune rounding + precision
  • Pipeline
  • Area
  • Optimizations
slide-15
SLIDE 15

Numeric Type Classes

  • Type classes: support ad hoc polymorphism by adding constraints to type variables
  • Support using type-generic generators with user-defined types
  • Use type classes from Spire numeric library
      – Add new type classes for hardware constructs, especially Bool
      – Hide expensive operations, e.g. division, sqrt

slide-16
SLIDE 16

Numeric Type Classes

  • Ring

– +, -, *, zero, and one

  • Eq

– === and =!=

  • Order (extends Eq)

– <, >, <=, >=, max, min

  • Real (extends Ring with Order with Sign)

– ceil, floor, round, isWhole

  • Integer (extends Real)

– mod
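A type class lets one generic description run over several number types. The sketch below is plain Scala with a cut-down `Ring` written for this example; Spire's real hierarchy and the dsptools hardware instances are richer, and `dot` is a hypothetical stand-in for a generator body.

```scala
object RingExample {
  // Simplified Ring: only the operations listed on the slide.
  trait Ring[A] {
    def zero: A
    def one: A
    def plus(x: A, y: A): A
    def minus(x: A, y: A): A
    def times(x: A, y: A): A
  }

  implicit val intRing: Ring[Int] = new Ring[Int] {
    def zero = 0
    def one = 1
    def plus(x: Int, y: Int) = x + y
    def minus(x: Int, y: Int) = x - y
    def times(x: Int, y: Int) = x * y
  }

  implicit val doubleRing: Ring[Double] = new Ring[Double] {
    def zero = 0.0
    def one = 1.0
    def plus(x: Double, y: Double) = x + y
    def minus(x: Double, y: Double) = x - y
    def times(x: Double, y: Double) = x * y
  }

  // Written once against Ring; works for any type with an instance in scope.
  def dot[A](xs: Seq[A], ys: Seq[A])(implicit r: Ring[A]): A =
    xs.zip(ys).foldLeft(r.zero) { case (acc, (x, y)) => r.plus(acc, r.times(x, y)) }

  def main(args: Array[String]): Unit = {
    println(dot(Seq(1, 2, 3), Seq(4, 5, 6)))   // 32
    println(dot(Seq(0.5, 1.5), Seq(2.0, 4.0))) // 7.0
  }
}
```

Adding support for a user-defined type means supplying one more `Ring` instance; `dot` itself does not change.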

slide-17
SLIDE 17

DspTester

  • Verification needs to be as parameterizable as hardware generators
  • Type-generic PeekPokeTester
  • Assert output is within epsilon (set by type)

[Figure: Generator, poke(io.in, 3.0), DspTester, UInt Instance, Floating Point Instance, Fixed Point Instance]
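An epsilon that depends on the type can be sketched as follows. This is a plain-Scala illustration that assumes a tolerance of one least-significant bit for a fixed-point value; it is not the DspTester API, and `quantize`, `epsilon`, and `withinEpsilon` are hypothetical names.

```scala
object ToleranceCheck {
  // Round x onto a fixed-point grid with the given number of fractional bits.
  def quantize(x: Double, fractionalBits: Int): Double = {
    val scale = math.pow(2.0, fractionalBits)
    math.round(x * scale) / scale
  }

  // Assumed tolerance for this sketch: one least-significant bit.
  def epsilon(fractionalBits: Int): Double = math.pow(2.0, -fractionalBits)

  def withinEpsilon(expected: Double, actual: Double, eps: Double): Boolean =
    math.abs(expected - actual) <= eps

  def main(args: Array[String]): Unit = {
    val observed = quantize(0.3, 4) // what a 4-fractional-bit output could hold
    println(withinEpsilon(0.3, observed, epsilon(4))) // true
  }
}
```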

slide-18
SLIDE 18

AXI-4 Stream + Diplomacy

slide-19
SLIDE 19

Diplomacy Background

  • Generator runs in two phases

      – Negotiate parameters
      – Elaborate hardware

  • Parameters flow both “in” and “out”
slide-20
SLIDE 20

AXI-4 Stream

  • AMBA standard for streaming data
  • Defines ready/valid handshake semantics
  • Most fields optional
      – Even TDATA!

[Figure: Master and Slave with clock and reset; signals TREADY, TVALID, TDATA (8n bits), TSTRB (n bits), TKEEP (n bits), TID (i bits), TDEST (d bits), TUSER (u bits), TLAST]
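The handshake rule is that a beat moves only in a cycle where TVALID and TREADY are both high. A minimal plain-Scala trace model of that rule follows; `Cycle` and `transferred` are hypothetical names, and only TDATA is modeled.

```scala
object ReadyValid {
  // One clock cycle: the master drives valid and data, the slave drives ready.
  final case class Cycle(valid: Boolean, ready: Boolean, data: Int)

  // Keep the data of every cycle in which both valid and ready are high.
  def transferred(trace: Seq[Cycle]): Seq[Int] =
    trace.collect { case Cycle(true, true, d) => d }

  def main(args: Array[String]): Unit = {
    val trace = Seq(
      Cycle(valid = true,  ready = false, data = 7), // slave stalls, master holds 7
      Cycle(valid = true,  ready = true,  data = 7), // 7 transfers
      Cycle(valid = false, ready = true,  data = 0), // master idle
      Cycle(valid = true,  ready = true,  data = 9)  // 9 transfers
    )
    println(transferred(trace)) // List(7, 9)
  }
}
```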

slide-21
SLIDE 21

AXI-4 Stream Diplomacy

  • Nodes exchange parameters
  • Edge resolves parameters, chooses bundle parameters

[Figure: a Master Node and a Slave Node joined by an Edge that yields Bundle Parameters. Parameters shown: width of TDATA and TUSER, # masters, always ready, has TDATA/TSTRB/TKEEP, # endpoints]
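The two phases can be sketched in plain Scala: first both sides' parameters are combined into bundle parameters, then hardware is built from the result. Every name below (`MasterParams`, `SlaveParams`, `BundleParams`, `resolve`, `signals`) is hypothetical, the split of fields between master and slave is an assumption, and master and endpoint counts are left out.

```scala
object StreamNegotiation {
  // What the master side offers (assumed fields).
  final case class MasterParams(dataBytes: Int, userBits: Int)
  // What the slave side accepts (assumed fields).
  final case class SlaveParams(hasData: Boolean, hasStrb: Boolean,
                               hasKeep: Boolean, alwaysReady: Boolean)
  // Outcome of the negotiation: the shape of the wires on this edge.
  final case class BundleParams(dataBits: Int, strbBits: Int, keepBits: Int,
                                userBits: Int, readyTiedHigh: Boolean)

  // Phase 1: the edge sees both sides and settles the bundle shape.
  def resolve(m: MasterParams, s: SlaveParams): BundleParams = BundleParams(
    dataBits = if (s.hasData) 8 * m.dataBytes else 0,
    strbBits = if (s.hasStrb) m.dataBytes else 0,
    keepBits = if (s.hasKeep) m.dataBytes else 0,
    userBits = m.userBits,
    readyTiedHigh = s.alwaysReady
  )

  // Phase 2 stand-in: list the signals that would be elaborated, with widths.
  def signals(b: BundleParams): Seq[String] =
    Seq("TVALID" -> 1, "TREADY" -> 1, "TDATA" -> b.dataBits,
        "TSTRB" -> b.strbBits, "TKEEP" -> b.keepBits, "TUSER" -> b.userBits)
      .collect { case (name, width) if width > 0 => s"$name[$width]" }

  def main(args: Array[String]): Unit = {
    val bundle = resolve(MasterParams(dataBytes = 4, userBits = 0),
                         SlaveParams(hasData = true, hasStrb = false,
                                     hasKeep = true, alwaysReady = false))
    signals(bundle).foreach(println)
  }
}
```

Zero-width fields simply disappear from the elaborated bundle, which is how optional signals stay optional.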

slide-22
SLIDE 22

Memory-mapped DSP Peripherals

slide-23
SLIDE 23
  • Connect DSP accelerators to Rocket via Periphery Bus

[Figure: DSP (Block 1, Block 2, Block 3, Block 4), DFT Bus, Periphery Bus, Vector, Rocket Tile, DMA, Bus, L2 Cache]

slide-24
SLIDE 24

DspBlock

  • Basic building block of DSP functionality
  • Streaming inputs and outputs (any number)
  • Optional memory interface

[Figure: DSP Block containing CSR, Memory, Unpack, DSP, and Pack, with AXI-S on both sides]

slide-25
SLIDE 25

Type-generic DSP Blocks

  • Define DSP functionality and interconnect separately
  • Treat type of memory interface as parameter

[Class diagram: DspBlock[T] has members streamNode: AXI4StreamNode, mem: Option[T], and module: Module. MyDspBlock[T] is bound to TLMyDspBlock (T -> TileLink), AXI4MyDspBlock (T -> AXI4), APBMyDspBlock (T -> APB), and AHBMyDspBlock (T -> AHB).]
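Treating the memory interface as a type parameter can be sketched in plain Scala. The port types, the `Gain` block, and the base address below are all hypothetical stand-ins; the real DspBlock carries Diplomacy nodes rather than simple values.

```scala
object GenericBlocks {
  // Hypothetical stand-ins for two bus port types.
  final case class TileLinkPort(base: Long)
  final case class AXI4Port(base: Long)

  // The block is written once; T is the memory interface type.
  trait DspBlock[T] {
    def mem: Option[T] // None would mean a stream-only block
    def process(sample: Int): Int
  }

  // DSP functionality, independent of the bus.
  abstract class Gain[T](factor: Int) extends DspBlock[T] {
    def process(sample: Int): Int = sample * factor
  }

  // Binding T to a concrete bus only supplies the memory port.
  class TLGain(factor: Int) extends Gain[TileLinkPort](factor) {
    val mem: Option[TileLinkPort] = Some(TileLinkPort(0x2000L))
  }
  class AXI4Gain(factor: Int) extends Gain[AXI4Port](factor) {
    val mem: Option[AXI4Port] = Some(AXI4Port(0x2000L))
  }

  def main(args: Array[String]): Unit = {
    println(new TLGain(3).process(5))   // 15
    println(new AXI4Gain(3).process(5)) // 15
  }
}
```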

slide-26
SLIDE 26

DspChain

  • DspBlock composed of many internal DspBlocks
  • Generate memory interconnect, connections between blocks
  • Add design-for-test (DFT) structures

[Figure: DUT, DFT, Bus, Pattern Generator, Logic Analyzer, AXI4-Str Master Model, CPU, C Test]

slide-27
SLIDE 27

Synchronous Data Flow

  • Represent computation as digraph
  • Samples produced/consumed by each node known a priori

Lee and Messerschmitt (1987).
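Because the rates are known up front, the number of times each node must fire per schedule period follows from the balance equations r(src) · produced = r(dst) · consumed. The plain-Scala sketch below solves them for a connected, consistent graph; `Edge` and `repetitions` are hypothetical names, and no consistency checking is done.

```scala
object SdfBalance {
  // src emits `produced` samples per firing; dst absorbs `consumed` per firing.
  final case class Edge(src: Int, produced: Int, dst: Int, consumed: Int)

  private def gcd(a: Long, b: Long): Long = if (b == 0) a else gcd(b, a % b)

  // Smallest positive integer firing counts for actors 0 until n.
  // Assumes the graph is connected and the rates are consistent.
  def repetitions(n: Int, edges: Seq[Edge]): Vector[Long] = {
    // Firing rate of each actor as a reduced fraction relative to actor 0.
    val rate = scala.collection.mutable.Map[Int, (Long, Long)](0 -> (1L, 1L))
    var changed = true
    while (changed) {
      changed = false
      for (e <- edges) {
        if (rate.contains(e.src) && !rate.contains(e.dst)) {
          val (num, den) = rate(e.src)
          val (nn, dd) = (num * e.produced, den * e.consumed)
          val g = gcd(nn, dd)
          rate(e.dst) = (nn / g, dd / g)
          changed = true
        } else if (rate.contains(e.dst) && !rate.contains(e.src)) {
          val (num, den) = rate(e.dst)
          val (nn, dd) = (num * e.consumed, den * e.produced)
          val g = gcd(nn, dd)
          rate(e.src) = (nn / g, dd / g)
          changed = true
        }
      }
    }
    // Scale by the least common multiple of the denominators.
    val lcmDen = rate.values.map(_._2).foldLeft(1L)((l, d) => l / gcd(l, d) * d)
    Vector.tabulate(n)(i => rate(i)._1 * (lcmDen / rate(i)._2))
  }

  def main(args: Array[String]): Unit = {
    // Actor 0 produces 2 per firing, actor 1 consumes 3 and produces 1, actor 2 consumes 2.
    println(repetitions(3, Seq(Edge(0, 2, 1, 3), Edge(1, 1, 2, 2)))) // Vector(3, 2, 1)
  }
}
```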

slide-28
SLIDE 28

DspRegister, DspQueue

  • Building blocks for SDF-style design
  • DspRegister
      – Register with programmable vector length
      – Stream in and out simultaneously
      – Processor has visibility into contents
  • DspQueue
      – Raise an interrupt when the number of entries exceeds a programmable threshold
      – Support real-time
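The threshold interrupt can be modeled behaviorally in a few lines of plain Scala. This is only a software sketch of the idea, not the DspQueue hardware; `Model`, `depth`, and `threshold` are hypothetical names.

```scala
object ThresholdQueue {
  // Bounded FIFO whose interrupt line is high while occupancy exceeds threshold.
  final class Model(depth: Int, var threshold: Int) {
    private val entries = scala.collection.mutable.Queue.empty[Int]

    // Returns false when the queue is full and the sample is dropped.
    def enqueue(x: Int): Boolean =
      if (entries.size < depth) { entries += x; true } else false

    def dequeue(): Option[Int] =
      if (entries.nonEmpty) Some(entries.dequeue()) else None

    def interrupt: Boolean = entries.size > threshold
  }

  def main(args: Array[String]): Unit = {
    val q = new Model(depth = 4, threshold = 2)
    Seq(10, 11, 12).foreach(q.enqueue)
    println(q.interrupt) // true: 3 entries exceed the threshold of 2
    q.dequeue()
    println(q.interrupt) // false: 2 entries do not exceed it
  }
}
```

The processor can then drain the queue in bursts once the interrupt fires instead of polling it.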

slide-29
SLIDE 29

Verification

slide-30
SLIDE 30

Unit Tests

[Figure: Scala Test, DUT, TileLink Master Model, AXI4-Stream Master Model, AXI4-Stream Slave Model]

  • PeekPokeTester
      – Type-generic with DspTester
  • FIRRTL Interpreter: very fast for small tests

slide-31
SLIDE 31

Integration Tests

  • Write C programs to run on Rocket, use Rocket test harness
  • Generate design-for-test (DFT) structures
      – Load test vectors, set muxes, store outputs
  • Same binary for simulation and bring-up

[Figure: DUT, DFT, Bus, Pattern Generator, Logic Analyzer, AXI4-Str Master Model, CPU, C Test]

slide-32
SLIDE 32

IPXact

  • XML schema describing circuit metadata
      – Port mappings
      – Interface types
      – Generator parameters
  • Use external tools
      – Python test vector generation + Verification Workbench

[Figure: Chisel Generator, FIRRTL/Verilog, IPXact, C API]

slide-33
SLIDE 33

OFDM Baseband Example

slide-34
SLIDE 34

OFDM Background

  • Frequency domain equalization
  • Uses FFT
  • Relax time domain synchronization

[Figure: OFDM symbol preceded by a cyclic prefix (CP) guard interval; axes: time, frequency]
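The cyclic prefix is what relaxes timing: if the receive window starts a few samples early, it still contains a circular rotation of the whole symbol, which the FFT turns into a per-subcarrier phase shift that equalization can absorb. A plain-Scala sketch on sample indices follows; `add`, `strip`, and `window` are hypothetical names.

```scala
object CyclicPrefix {
  // Prepend the last cpLen samples of the symbol to its front.
  def add[A](symbol: Vector[A], cpLen: Int): Vector[A] =
    symbol.takeRight(cpLen) ++ symbol

  // Perfect timing: drop exactly the prefix.
  def strip[A](rx: Vector[A], cpLen: Int): Vector[A] = rx.drop(cpLen)

  // A window of n samples that starts `early` samples too soon (early <= cpLen).
  def window[A](rx: Vector[A], cpLen: Int, early: Int, n: Int): Vector[A] =
    rx.slice(cpLen - early, cpLen - early + n)

  def main(args: Array[String]): Unit = {
    val tx = add(Vector(1, 2, 3, 4), cpLen = 2)
    println(tx)                                         // Vector(3, 4, 1, 2, 3, 4)
    println(strip(tx, cpLen = 2))                       // Vector(1, 2, 3, 4)
    println(window(tx, cpLen = 2, early = 1, n = 4))    // Vector(4, 1, 2, 3)
  }
}
```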

slide-35
SLIDE 35

OFDM Baseband

  • Transmitter and receiver

[Figure: baseband block diagram. Blocks shown: Splitter; Sync (Autocorr, Peak Detect, Interrupt); Receiver (From ADC, CFO Correct, CP Strip, FFT, Channel Eq); Transmitter (Add Pilot, IFFT, Add CP, To DAC); Transmission Scheduler; one DspQueue and several DspRegisters]

slide-36
SLIDE 36

Conclusion

  • Chisel + FIRRTL + dsptools help build DSP hardware
  • RocketChip is not just a processor, but a library of useful components
      – Diplomacy
      – Interconnect
      – Utilities
  • Can build and verify complex SoCs with RocketChip

slide-37
SLIDE 37

Thank You

  • Collaborators
      – Stevo Bailey, Angie Wang, Adam Izraelevitz, Chick Markley, Colin Schmidt, Timo Joas, and Jim Lawson
      – UCB BAR
  • Support
      – NSF GRFP
      – Adept and BWRC