Untethered lowRISC, Memory Mapped IO and TileLink/AXI (Wei Song)

SLIDE 1

Untethered lowRISC, Memory Mapped IO and TileLink/AXI

Wei Song 27/07/2015

SLIDE 2

Time Line

  • Nov. 2014

Rocket-Chip release from Berkeley.

  • Apr. 2015

First lowRISC release. Initial tagged memory support.

– Added tags in L1 D$, L2.
– Added a tag cache.
– Added 2 instructions to load/store tag.
– A tutorial about Rocket-chip.

  • Now

Memory Mapped IO.

  • Oct. 2015 (expected)

Untethered lowRISC release.

– Untethered SoC.
– Support Kintex KC705.
– Support MMIO.
– Support SD, UART, DDRAM.
– Open simulation environment.

SLIDE 3

Rocket-Chip Release (Berkeley)

[Block diagram. Components shown: three Rocket Tiles (each with a Rocket Core, I$ and D$), TileLink connections, L2 & Coherence Manager banks, an Arbiter, a MemIO Converter, the Memory Controller, a Host Interface, an ARM core, and UART, SD and Ethernet peripherals.]

SLIDE 4

lowRISC Release (tagged memory)

[Block diagram. Components shown: three Rocket Tiles (each with a Rocket Core, I$ and D$), TileLink connections, L2 & Coherence Manager banks, an Allocator, Tracker & Converter blocks, a Data Array, a MetaData Array, an Arbiter, the Tag Cache, the Memory Controller, a Host Interface, an ARM core, and UART, SD and Ethernet peripherals.]

  • Tag in L1 D$, L2 $
  • Tag Cache
  • LTAG/STAG instructions

SLIDE 5

Latest Rocket-Chip (Berkeley)

[Block diagram. Components shown: three Rocket Tiles (each with a Rocket Core, I$ and D$), L2 & Coherence Manager banks, the L2 Bus, an Arbiter, TileLink/AXI and AXI/MemIO converters, the AXI Bus, the Memory Controller, a Host Interface, an ARM core, and UART, SD and Ethernet peripherals. Legend: Cached TileLink, Uncached TileLink, AXI, MemIO.]

  • Multi-beat TileLink
  • Standardize TileLink transactions
  • Possible coherence support of L3
  • Code refactoring
  • AXI/AXI interface (NASTI)

SLIDE 6

Untethered lowRISC SoC (First Version)

[Block diagram. Components shown: three Rocket Tiles (each with a Rocket Core, I$ and D$), L2 & Coherence Manager banks, the L2 Cache Bus and L2 IO Bus, TileLink/AXI and TileLink/AXI-Lite bridges, the AXI Bus and AXI-Lite bus, the Tag Cache, an Arbiter, the Memory Controller, On-FPGA Boot RAM, UART, SD, Ethernet, two DMA blocks and a Boot Minion. Legend: Cached TileLink, Uncached TileLink, AXI; coherent and incoherent paths.]

SLIDE 7

Current Status

[Block diagram with the same components as the Slide 6 diagram of the untethered lowRISC SoC (first version).]

SLIDE 8

Memory Mapped IO

  • Target

– IO load/store (B/HW/W/DW)
– In-order uncached load/store
– Side effects
    • None for any write, at byte granularity
    • None for any read, at word granularity (32-bit AXI-Lite)
– No change to the current L2 coherence manager
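The byte-granular write target can be pictured with AXI-Lite write strobes. The sketch below is a software model written for illustration, not lowRISC code: the function name `axi_lite_write_beats` and its return shape are invented here. It assumes a 32-bit AXI-Lite data bus, naturally aligned accesses, and that an 8-byte store is split into two word beats, lower address first.

```python
def axi_lite_write_beats(addr, size):
    """Split an aligned store of `size` bytes into 32-bit AXI-Lite beats.

    Returns a list of (word_address, wstrb) pairs, where bit i of wstrb
    enables byte lane i of the 32-bit data bus.
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    if addr % size != 0:
        raise ValueError("access must be naturally aligned")

    beats = []
    for offset in range(0, size, 4):
        beat_addr = addr + offset
        lane = beat_addr % 4                 # first byte lane used in this word
        nbytes = min(size - offset, 4)       # bytes carried by this beat
        wstrb = ((1 << nbytes) - 1) << lane  # one strobe bit per written byte
        beats.append((beat_addr - lane, wstrb))
    return beats
```

Because only the enabled byte lanes are written, a byte store to a device register in this model leaves the other three bytes of the word untouched.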

SLIDE 9

Untethered lowRISC SoC (First Version)

[Block diagram as on Slide 6, with an AXI/AXI-Lite bridge added next to the TileLink/AXI-Lite bridge.]

SLIDE 10

L1 Data Cache

[Pipeline diagram of the L1 data cache across Stage 1 to Stage 4. Blocks and the source files that define them:]

  • data: DataArray (rocket/nbdcache.scala)
  • meta: MetadataArray (uncore/cache.scala)
  • mshrs: MSHRFile (rocket/nbdcache.scala)
  • mshr: MSHR (rocket/nbdcache.scala)
  • wb: WriteBack (rocket/nbdcache.scala)
  • prober: ProbeUnit (rocket/nbdcache.scala)
  • dtlb: TLB (rocket/tlb.scala)
  • code: DecodeLogic (rocket/decode.scala)
  • amoalu: AMOALU (rocket/nbdcache.scala)

[External interfaces shown: cpu.req, cpu.resp.valid, cpu.resp.bits.data, cpu.ptw, dtlb.ptw, mem.req, mem.grant, mem.probe, mem.release, mem.finish. Internal signals shown include s1_req, s1_addr, s1_tag_eq_way, s1_data, s1_recycled, s2_req, s2_hit, s2_tag_eq_way, s2_data (uncorrected and corrected), s2_data_correctable, s2_recycle, s3_req, mshrs.request, mshrs.replay, mshrs.meta_write, mshrs.wb_req, wb.meta/data_read, prober.meta/data_read, prober.meta_write and prober.release.]

SLIDE 11

L1 Data Cache (simplified)

[Simplified pipeline diagram across Stage 1 to Stage 4. Blocks shown: data, meta, mshrs, mshr, dtlb, amoalu and arbiters. Interfaces shown: cpu.req, cpu.resp, mem.req, mem.grant. Signals shown include s1_req, s1_addr, s1_tag_eq_way, s1_data, s2_req, s2_data, s2_hit, mshrs.request, mshrs.replay and mshrs.meta_write.]

SLIDE 12

L1 Data Cache with IO Handler

[The simplified L1 data cache pipeline diagram, extended with an IO path. Elements added: ioaddr, iomshr, io.req, io.grant, iomshr.replay, io_data, s1_io_data, s2_io_data and s2_io_replay. The mem.req and mem.grant interfaces are still shown.]
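One reading of the IO handler diagram is that an address check (`ioaddr`) steers IO requests to a separate `iomshr` while ordinary requests keep using the data array and `mshrs`. The sketch below models only that routing decision. The address range (`IO_BASE`, `IO_SIZE`) and the function names are hypothetical values chosen for illustration; the slides do not give the real memory map.

```python
IO_BASE = 0x8000_0000   # hypothetical start of the IO region
IO_SIZE = 0x1000_0000   # hypothetical size of the IO region


def is_io_addr(paddr):
    """Model of the `ioaddr` check: does this physical address hit IO space?"""
    return IO_BASE <= paddr < IO_BASE + IO_SIZE


def route_request(paddr, cache_hit):
    """Pick the path a data-cache request takes in this simplified model."""
    if is_io_addr(paddr):
        return "iomshr"      # uncached IO access, never looked up in the cache
    if cache_hit:
        return "hit"         # served from the data array
    return "mshrs"           # cache miss, handled by the regular MSHRs
```

Keeping IO requests out of the regular miss path matches the stated target of leaving the L2 coherence manager unchanged.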

SLIDE 13

TileLink Channels

  • Manager/Client

– Manager: coherence manager or next-level cache/device
– Client: upper-level cache

  • 5 Channels

– Acquire: [C -> M]
    • Read, uncached write (write-through, IO), permission update
– Grant: [M -> C]
    • Ack to Acquire (with data when read)
– Finish: [C -> M]
    • Finish a transaction
– Probe: [M -> C]
    • Coherence probe (snoop, invalidate)
– Release: [C -> M]
    • Write-back (replace or invalidate)
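The channel directions above can be written down as a small table, and a simple read then becomes an Acquire, a Grant carrying data, and a Finish. This is a minimal sketch for illustration only: `CHANNEL_DIRECTION` and `read_transaction` are names made up here, and real TileLink messages carry many more fields.

```python
# Direction of each TileLink channel, as listed on the slide.
CHANNEL_DIRECTION = {
    "Acquire": ("client", "manager"),
    "Grant":   ("manager", "client"),
    "Finish":  ("client", "manager"),
    "Probe":   ("manager", "client"),
    "Release": ("client", "manager"),
}


def read_transaction(client, manager):
    """Message sequence for a simple read: Acquire, Grant with data, Finish."""
    names = {"client": client, "manager": manager}
    steps = []
    for channel in ("Acquire", "Grant", "Finish"):
        src_role, dst_role = CHANNEL_DIRECTION[channel]
        steps.append((channel, names[src_role], names[dst_role]))
    return steps
```

Calling `read_transaction("L1", "L2")` lists the three messages with their sender and receiver, which makes the client-to-manager and manager-to-client directions easy to check against the slide.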

SLIDE 14

Untethered lowRISC SoC (First Version)

[Block diagram as on Slide 6, with an AXI/AXI-Lite bridge added next to the TileLink/AXI-Lite bridge.]

SLIDE 15

TileLink Crossbar

[Diagram: two L1 $ clients and two L2 Bank managers connected through the TileLink Crossbar. Each of the four ports has its own Acquire, Grant, Finish, Probe and Release channels.]

SLIDE 16

Shared TileLink Crossbar

[Diagram: two L1 $ clients and two L2 Bank managers connected through the Shared TileLink Crossbar. Each of the four ports has Acquire, Grant, Finish, Probe and Release channels.]

Use a SuperChannel to store all types of TileLink channels.
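One way to picture a SuperChannel is as a single message type that tags its payload with the channel it belongs to, so that all five channel types can share one link and be separated again at the far end. The sketch below is an interpretation under that assumption, not the lowRISC implementation. `SuperMessage`, `pack` and `demux` are hypothetical names.

```python
from collections import namedtuple

CHANNELS = ("Acquire", "Grant", "Finish", "Probe", "Release")

# One record type for every message: a channel tag plus an opaque payload.
SuperMessage = namedtuple("SuperMessage", ["channel", "payload"])


def pack(channel, payload):
    """Wrap a channel-specific message so it can share one physical link."""
    if channel not in CHANNELS:
        raise ValueError("unknown TileLink channel: " + channel)
    return SuperMessage(channel, payload)


def demux(messages):
    """Sort messages from the shared link back into per-channel queues."""
    queues = {name: [] for name in CHANNELS}
    for msg in messages:
        queues[msg.channel].append(msg.payload)
    return queues
```

In this model the order of messages within each channel is preserved, while the shared link is free to interleave messages from different channels.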

SLIDE 17

Current Status of TileLink/AXI

  • TileLink/AXI (Berkeley, Rocket-chip)

– only a whole cache line

  • TileLink/AXI-Lite (lowRISC)

– 1,2,4,8 byte write; 4,8 byte read

  • AHB/APB (Berkeley, Z-Scale)
  • Still needed:

– AXI/AXI-Lite compatible, auto width SerDes switch

  • The AXI-Node from PULP
  • May be in Chisel for its parameterization capability

– AXI/Wishbone, TileLink/Wishbone
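The "auto width SerDes switch" item refers to converting between buses of different data widths. As a rough illustration of what such serialization involves, the sketch below splits a 64-bit beat into 32-bit beats and rebuilds it. The function names, the default widths and the least-significant-first ordering are assumptions made for this example; the slides do not specify them.

```python
def serialize(value, wide_bits=64, narrow_bits=32):
    """Split one wide beat into narrow beats, least significant beat first."""
    if wide_bits % narrow_bits != 0:
        raise ValueError("wide width must be a multiple of narrow width")
    if not 0 <= value < (1 << wide_bits):
        raise ValueError("value does not fit in wide_bits")
    mask = (1 << narrow_bits) - 1
    return [(value >> shift) & mask for shift in range(0, wide_bits, narrow_bits)]


def deserialize(beats, narrow_bits=32):
    """Rebuild the wide beat from narrow beats, least significant beat first."""
    value = 0
    for i, beat in enumerate(beats):
        value |= beat << (i * narrow_bits)
    return value
```

A hardware switch would do the same thing one beat per cycle; parameterizing the two widths is the kind of flexibility the slide attributes to Chisel.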


SLIDE 18

Remaining Issues

  • Interrupt controller
  • Open-source, license-compatible IPs

– UART (Flexpret, BSD)
– SD host controller
– Ethernet controller (Xilinx IP for now)
– Memory controller (difficult to get)

  • Open-source EDA tools

– Current environment:
    • VCS (DRAMSim, Front-end server, DirectC)
    • Vivado+SDK (SDK not available for Kintex)
– Target environment:
    • Verilator (SystemVerilog 2009, SystemC, VPI, DPI)
    • Vivado only

SLIDE 19

After the Untethered SoC

  • Implementing the hierarchical tag cache (hardware)
  • Debug interface
  • Integrating minions (PULP)
  • Tag support in Rocket cores (Lucas)