Intro to SKARAB for programmers
(and how to use HMC!)
Jason Manley
2017 CASPER workshop
Hardware
Virtex 7, 690T FPGA
4 Mezzanine sites per SKARAB
○ 2 in front, 2 in back
○ 16 SERDES links per site
○ Designed to the early PowerMX standard.
around 20% - 30% rated speed.
Mezzanine cards allow trading off of memory vs IO capacity. Four cards per SKARAB.
○ HMC replaces QDR/SRAM and also DRAM found on previous CASPER boards.
○ No more complicated, flaky PHY chips that need firmware loaded to function properly.
card: 2x “half-width” (8 lane) links at 10Gbps per lane.
Quad 40G QSFP Ethernet card
○ PHY-less (purely passive).
○ Does have a small microprocessor for SFP management (power, temperature, etc.).
○ Able to drive optics directly.
○ Tested with passive cables up to 7m. AOC (Active Optical Cables) recommended for anything 5m and over.
○ Does not currently work in “breakout” mode with spider/octopus cables (turning one 40G port into 4x10G ports).
Logic cells: 53K, 94K, 476K, 693K
DSP slices: 232, 640, 2016, 3600
BRAM capacity: 4.2Mb, 8.8Mb, 38Mb, 53Mb
SRAM capacity: 2x18Mb, 2x36Mb, 4x144Mb, HMC < 8x 32Gib 8x 30Gbps R+W
SRAM bandwidth: 9Gbps, 43Gbps, 200Gbps
DDR capacity (max): 1x16Gb
DDR bandwidth (total): 50Gbps
Ethernet ports: 2x10G, 4x10G, 8x10G, < 16x40G
Uses the JASPER flow, not the traditional CASPER flow. Python now forms the backend for managing:
Backend is Xilinx VIVADO, not ISE (hard break at Virtex-6/ROACH-2; no overlapping tool support). (Recall Wesley’s JASPER/VIVADO talk on Monday.)
SKARAB incorporates all the lessons learnt from SKA-SA’s sizable deployments of iBOB/BEE2, ROACH-1 and ROACH-2s.
After compiling a bitstream, interacting with a SKARAB from a network-attached control computer using any of the standard tools is the same as working with any previous CASPER hardware. But it is quite different under the hood...
Previous CASPER boards (iBOBs, BEE2s, ROACH1s, ROACH2s) all had out-of-band management ports (100Mbps or 1G Ethernet ports, separate from the 10G data ports). SKARAB can do everything in-band: data, management and (re)programming.
SKARAB does not have a separate management processor.
Simpler setup and maintenance:
○ “Golden Image” and “Multiboot Image”
○ Exactly the same bitstream.
○ Tries to boot the multiboot image quickly. If that fails, falls back to the golden image more slowly.
○ You can load your own images here, if you want, but that’s not the idea…
SKARAB is designed to work in this environment.
board on network, and can load whichever DSP gateware image, configure registers and set it to work.
○ (SKARAB wants a DHCP server. Hard-coding IP addresses in your bitstreams is no longer so easy.)
○ Hostname support, for example, skarab020394-01.
○ LLDP support (boards announce themselves to switches)
○ First 40G port has hostname skarab020302-01, with MAC 06:50:02:03:02:01
your DHCP server and network (switch), can take a few seconds to bring link back up.
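As a rough illustration of the naming above, the sketch below derives a hostname and MAC address from a board serial number and port index. It generalises from the single example on the slide (serial 020302, first 40G port). The function name `skarab_port_identity`, the fixed `06:50` prefix, and the decimal two-digit port suffix are assumptions for illustration only, not a documented rule.

```python
from typing import Tuple


def skarab_port_identity(serial: str, port: int) -> Tuple[str, str]:
    """Return (hostname, MAC) for one Ethernet port.

    ASSUMPTION: the pattern is inferred from one slide example
    (skarab020302-01 <-> 06:50:02:03:02:01); it is not an official spec.
    """
    if len(serial) != 6 or not serial.isdigit():
        raise ValueError("expected a six-digit serial such as '020302'")
    if not 0 <= port <= 99:
        raise ValueError("port index must fit in two decimal digits")

    suffix = f"{port:02d}"
    hostname = f"skarab{serial}-{suffix}"
    # Prefix 06:50, then the serial split into three pairs, then the port.
    octets = ["06", "50", serial[0:2], serial[2:4], serial[4:6], suffix]
    return hostname, ":".join(octets)


hostname, mac = skarab_port_identity("020302", 1)
print(hostname, mac)
```

A mapping like this can be handy for pre-populating DHCP server entries, provided the pattern is first checked against real boards.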
Working:
○ Basic JASPER toolflow
○ Polling sensors (power, temp, fans etc)
○ HMC Mezzanine cards
○ First 40G ethernet port
○ 1G ethernet port
○ Remote reprogramming and control
○ Remote updates (flash firmware)
○ DHCP, LLDP, ARP, PING and other network services
○ Python casperfpga interfaces (mostly; WIP)
Not (yet) working:
○ Legacy CASPER toolflow (and never will)
○ Automatic fan speed control
○ Retrieval of logs for hardware errors
○ Arbitrary combinations of Ethernet and HMC cards
○ Onboard USB JTAG bridge
○ Fast (~1 second) remote reloading of FPGA gateware
○ Large wishbone bus (timing implications; WIP)
○ Comprehensive DRC during compile
○ Else, can overwhelm microblaze with traffic; especially problematic while trying to reprogram.
○ Yellowblock default is to use 7148 (SPEAD default at SKA-SA).
○ Don’t ever use:
■ 7778 decimal (0x1e62); that’s for controlling the microblaze.
■ 29000 decimal (0x7148); that’s used for reprogramming.
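The decimal/hex distinction is easy to trip over: the default data port is 7148 decimal, while the reprogramming port is 0x7148 (29000 decimal). A minimal sketch of a guard that a control script could use when choosing a data port follows. The constant and function names are hypothetical, and only the two reserved ports named on the slide are checked.

```python
# Port numbers taken from the slide; all names here are illustrative.
MICROBLAZE_CONTROL_PORT = 0x1E62  # 7778 decimal
REPROGRAMMING_PORT = 0x7148       # 29000 decimal
DEFAULT_DATA_PORT = 7148          # decimal; the yellowblock default

RESERVED_PORTS = {
    MICROBLAZE_CONTROL_PORT: "microblaze control",
    REPROGRAMMING_PORT: "reprogramming",
}


def check_data_port(port: int) -> int:
    """Return the port unchanged if it is safe for data, else raise ValueError."""
    if not 0 < port < 65536:
        raise ValueError(f"{port} is not a valid UDP port")
    if port in RESERVED_PORTS:
        raise ValueError(
            f"port {port} (0x{port:04x}) is reserved for {RESERVED_PORTS[port]}"
        )
    return port


print(check_data_port(DEFAULT_DATA_PORT))
```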
What is Hybrid Memory Cube?
○ Don’t have to deal with refreshes, bank management etc in FPGA controller anymore.
○ That’s up to 1.9Tbps. It’s FAST!
for details).
○ Yellowblock makes HMC look like a conventional memory interface.
○ One clock cycle per read and/or write request.
○ No need for burst reads or writes: truly random access possible.
[Figure: address bit fields. a26 ... a8 a7 a6 a5 a4 a3 a2 a1 a0; D19 ... D0, B3 B2 B1 B0, V3 V2 V1 V0]
located on other links.
○ Link 1: vaults 0,1,2,3
○ Link 2: vaults 4,5,6,7
○ Link 3: vaults 8,9,10,11
○ Link 4: vaults 12,13,14,15
incurring minimum latency.
additional latency, but the switching network is full crossbar (no reduction in bandwidth).
○ NB for matrix transpose (corner-turner).
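The link-to-vault grouping listed above can be expressed as a small lookup, which is useful when reasoning about whether an access pattern stays on a link's local vaults. This is only a software sketch of the table on the slide. The function names are hypothetical and nothing here models the yellowblock or the actual address-to-vault mapping.

```python
VAULTS_PER_LINK = 4
NUM_VAULTS = 16


def local_link(vault: int) -> int:
    """Return the link (1..4) whose local group contains this vault (0..15)."""
    if not 0 <= vault < NUM_VAULTS:
        raise ValueError("vault must be in the range 0..15")
    return vault // VAULTS_PER_LINK + 1


def is_local(link: int, vault: int) -> bool:
    """True if a request on `link` to `vault` stays within that link's own vaults."""
    return local_link(vault) == link


# Print the grouping for all four links.
for link in range(1, 5):
    vaults = [v for v in range(NUM_VAULTS) if is_local(link, v)]
    print(f"Link {link}: vaults {vaults}")
```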
○ Latency, throughput and order of operations are thus not guaranteed.
○ You can issue a request to vault 1 and then another to vault 2, and get the response from vault 2 first and the reply from vault 1 some time later.
○ Performance is heavily dependent upon your access patterns.
○ Responses contain your tags so you can sort them out again.
○ This can complicate things enormously.
back very quickly, and possibly before many earlier read requests.
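To make the tag-based reordering concrete, here is a minimal software model of a reorder buffer: requests are tagged in issue order, responses may arrive in any order, and data is released only once every earlier request has completed. It is a sketch of the general technique, not the yellowblock's interface. The class name, method names and tag values are hypothetical, and in gateware this would be built from BRAM and counters rather than Python containers.

```python
from collections import deque


class ReorderBuffer:
    """Release out-of-order tagged responses in the order they were requested."""

    def __init__(self):
        self._pending = deque()  # tags, oldest request first
        self._done = {}          # tag -> data for responses already received

    def issue(self, tag):
        """Record that a read request with this tag has been sent."""
        if tag in self._pending:
            raise ValueError(f"tag {tag!r} is already in flight")
        self._pending.append(tag)

    def complete(self, tag, data):
        """Store a response and return all data that can now be released in order."""
        if tag not in self._pending:
            raise KeyError(f"unexpected response tag {tag!r}")
        if tag in self._done:
            raise KeyError(f"duplicate response for tag {tag!r}")
        self._done[tag] = data
        released = []
        # Drain from the head while the oldest outstanding request has its data.
        while self._pending and self._pending[0] in self._done:
            released.append(self._done.pop(self._pending.popleft()))
        return released


rob = ReorderBuffer()
for tag in (0, 1, 2):
    rob.issue(tag)
print(rob.complete(1, "from vault 2"))  # nothing released yet
print(rob.complete(0, "from vault 1"))  # both released, in request order
```

The buffer depth needed in practice depends on how far responses can overtake one another, which in turn depends on the access pattern.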
○ flit (SERDES comms) errors
○ ECC in DRAM core
○ Buffer overruns
○ Internal logic errors
○ Wideband, programmable delay line
○ Corner-Turner (matrix transpose)
○ Vector-accumulator (buffered, with backpressure)
○ but with 256b interfaces instead of 64b interfaces.
○ Microblaze softcore manages all network services.
○ DHCP with auto-renew and hostname support based on serial number
○ LLDP reporting and discovery
○ ARP
○ Ping
○ Multicast TX and RX, including subscription to multiple sequential addresses. IGMPv2 signalling.
○ Can only subscribe to contiguous chunks of 2^N addresses.
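One practical consequence of the 2^N restriction is that a request for an arbitrary run of multicast groups has to be rounded up to a power-of-two block. The sketch below finds the smallest such block covering a requested range. It additionally assumes the block must be aligned to its own size (as it would be if the subscription is expressed as a base address plus a mask); the slide does not state that, so treat the alignment rule and the function name `covering_block` as assumptions.

```python
import ipaddress


def covering_block(first: str, count: int):
    """Return (base_address, size) of the smallest aligned 2^N block that
    contains `count` consecutive IPv4 addresses starting at `first`.

    ASSUMPTION: blocks must be aligned to their size; not confirmed by the slides.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    start = int(ipaddress.IPv4Address(first))
    end = start + count - 1

    size = 1
    # Grow the block until the aligned block containing `start` also reaches `end`.
    while (start & ~(size - 1)) + size - 1 < end:
        size *= 2
    base = start & ~(size - 1)
    return str(ipaddress.IPv4Address(base)), size


print(covering_block("239.2.0.64", 4))
print(covering_block("239.2.0.66", 4))
```

Under this assumption, a range that starts on an unaligned address ends up subscribed to more groups than requested, so it is worth allocating multicast addresses on power-of-two boundaries.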
○ At the moment, the 40G yellowblock is hard-coded for the first QSFP port on the third mezzanine site.
○ The 40G yellowblock currently pulls in the microblaze infrastructure, so all designs must contain a 40G core, even if you’re not using it!
          Total available    Per 40G port    Per HMC mezzanine card
Slices    108300             9448 (3.1%)     14173 (13.1%)
BRAM      1470               24.5 (1.7%)     116 (7.9%)
DSP48     3600               0 (0%)          4 (0.1%)