Atomic COTS
22nd September 2020
Accelerating astronomy using
- G. Hampson, D. Humphrey, G. Jourjon, J. Bunton,
- K. Bengston, A. Bolin, Y. Chen, E. Troup
Astronomy Instrumentation in general

Astronomy instruments can be crudely divided into:
○ The bespoke antenna part - RF components specific to the instrument
○ The bespoke receiver part - RF amplification and filtering, ADCs and possibly the first-stage filterbank, producing coarse channels across the band
○ The digital signal processing stage - this presentation focuses on this last stage and the use of COTS FPGA accelerators

[Figure: astronomy instrument signal chain - Array of Antennas → ADC & Filterbank (N antennas) → Digital Signal Processing (N antennas, M channels) → Output products]
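As a rough illustration of that first digital stage, a coarse filterbank can be sketched as a block FFT that turns each antenna's time series into coarse channels. This is a simplified stand-in (real instruments use windowed polyphase filterbanks), and all array sizes here are illustrative:

```python
import numpy as np

def coarse_filterbank(samples: np.ndarray, m_channels: int) -> np.ndarray:
    """Split each antenna's time series into m_channels coarse channels.

    samples: (n_antennas, n_samples) real-valued ADC output.
    Returns: (n_antennas, n_blocks, m_channels) complex spectra.
    A plain block FFT is enough to show the data-shape transformation;
    a real instrument would add a polyphase window for channel isolation.
    """
    n_antennas, n_samples = samples.shape
    block = 2 * m_channels                      # real FFT of length 2M yields M usable bins
    n_blocks = n_samples // block
    blocks = samples[:, :n_blocks * block].reshape(n_antennas, n_blocks, block)
    return np.fft.rfft(blocks, axis=-1)[..., :m_channels]

rng = np.random.default_rng(0)
adc = rng.standard_normal((4, 4096))            # 4 antennas, 4096 time samples
spectra = coarse_filterbank(adc, m_channels=64)
print(spectra.shape)                            # (4, 32, 64): antennas, blocks, channels
```

The point is the shape change: N antennas of raw samples in, N antennas × M channels out — the data layout everything downstream works with.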
Custom SKA Gemini Solution
Gemini is a custom system of subracks with backplanes carrying power, liquid cooling and fibre, and a powerful FPGA card with HBM memory and optics. Five boards have been produced.
Did we reinvent the wheel?
Gemini development has taken many years. COTS boards in the PCIe form factor now exist. These boards have a higher TRL because they are sold to many customers in high volumes.
Does a COTS solution exist for Low.CBF?
Can SKA use a COTS product? The original down-select said no - is that still true? Can the signal flow be modified so that a standard COTS product could be used?

Xilinx components sell into a high-margin military/medical market, but Xilinx is moving into commercial markets like video and 5G - that means competition and high volume.
Why Change? 1st reason - Cost
"Atomic" Operations

[Figure: the "coarse" filterbank and channel-select stage produces 384 coarse channels per station, across 512 stations. One FPGA computes all output products for one coarse channel, so ×384 FPGAs cover the band - and the FPGAs are all independent!]
SKA Low has 512 stations - if we can get them all into one FPGA for one coarse channel, then each FPGA becomes atomic: independent of any other FPGA's processing or data flow.
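The atomic partitioning is purely a data-reshuffle: the filterbank output, indexed by (station, coarse channel), is regrouped so each of the 384 FPGAs receives all 512 stations for exactly one coarse channel. A sketch with the slide's figures (the trailing axis is a made-up stand-in for time samples):

```python
import numpy as np

N_STATIONS, N_COARSE = 512, 384                 # SKA Low figures from the slides

# Filterbank output: one spectrum per (station, coarse channel); the trailing
# axis of 16 is an illustrative number of time samples within a channel.
data = np.zeros((N_STATIONS, N_COARSE, 16), dtype=np.complex64)

# "Going atomic": FPGA c gets every station's data for coarse channel c only,
# so no FPGA ever needs to exchange data with another FPGA.
fpga_inputs = [data[:, c, :] for c in range(N_COARSE)]

print(len(fpga_inputs), fpga_inputs[0].shape)   # 384 FPGAs, each seeing (512, 16)
```

Because each FPGA's input is a disjoint slice, a failed board loses one coarse channel's products and nothing else — which is what makes the scheme fault-tolerant and scalable.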
2nd Reason - Shorter Schedule - less to do
[Block diagrams: the two Atomic pipelines. Correlator pipeline: 100GbE interface with P4 in-network processor → SPEAD decoder → HBM → Doppler & coarse delay → correlator filterbank → fine delay → quality % / RFI flag → HBM → correlator → HBM → SDP packetiser, with PCIe/AXI monitoring and control. PSS/PST pipeline: 100GbE interface with P4 in-network processor → SPEAD decoder → HBM → coarse delay → PSS & PST filterbanks → fine delay → PSS & PST Jones, relative weight % and RFI flag → HBM → PSS & PST beamformers → HBM → PSS/PST packetisers, again with PCIe/AXI monitoring and control.]
Going atomic removes the majority of the communications inside the FPGA. What remains is all about astronomy, giving a significant reduction in coding and testing, and hence in schedule.
A key part of the DSP chain is HBM memory - it enables stages of DSP to be separated (and asynchronous) and to implement large memory buffers and corner turns.
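The corner turn that HBM makes affordable is essentially a large transpose between two buffer orderings - for example from time-major (the order packets arrive in) to station-major (the order the correlator consumes). A minimal sketch, with illustrative sizes for one coarse channel:

```python
import numpy as np

# Incoming order: data arrives time-major, one row per time step across stations.
n_time, n_stations = 1024, 512                  # illustrative sizes
inbound = np.arange(n_time * n_stations, dtype=np.int64).reshape(n_time, n_stations)

# The HBM buffer lets the downstream stage read the same data station-major,
# decoupling (and making asynchronous) the stages on either side of the buffer.
corner_turned = inbound.T.copy()                # shape (n_stations, n_time)

# Same sample, reachable by either ordering.
assert corner_turned[5, 100] == inbound[100, 5]
print(corner_turned.shape)                      # (512, 1024)
```

In firmware the "transpose" is realised by writing with one address pattern and reading with another, which is exactly why a large, fast buffer like HBM is needed.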
Alveo uses HBM enabled FPGAs
Communications moves out
[Chart: Gemini effort split between Comms, M&C and DSP]

For Gemini, a third of our effort is about getting the right data to the right place at the right time.
COTS Communications
A new technology called P4 is now available - it directly controls the data plane of the switch and provides guaranteed line-rate performance using the Tofino ASIC.
3rd Reason - Adaptable and Flexible comms
[Figure: the existing Gemini method - a fixed 6×8×6 cube of 288 FPGAs between LFAA and SDP/PSS/PST, with correlator and beamformers evenly distributed across the cube.]
The optical data interconnect for Gemini is fixed. Using a P4 in-network processor enables an adaptable, flexible, programmable data flow - it copes with failure and it scales.
P4 in-network processor
P4 = Programming Protocol-independent Packet Processors. Some call it a switch; we call it an in-network processor.
The P4 match-action tables are key - the device can look into the packet itself to decode custom protocols such as SPEAD. Here the beam index directs a packet to a particular output port.
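The match-action idea can be emulated in a few lines of plain Python standing in for P4's parser, table and actions. The field layout here is hypothetical - real SPEAD headers differ - but the pattern (parse a header field, match it against control-plane-installed table entries, apply a forwarding action) is the same:

```python
import struct

# Table entries installed by the control plane: beam index -> output port.
beam_to_port = {0: 1, 1: 1, 2: 2, 3: 7}

def process(packet: bytes) -> int:
    """Toy match-action pipeline: parse a 16-bit beam index from a
    SPEAD-like header (offset 0 is made up for illustration), then
    forward to the matching port, defaulting to port 0 on a table miss."""
    (beam_index,) = struct.unpack_from(">H", packet, 0)   # "parser" stage
    return beam_to_port.get(beam_index, 0)                # "match-action" stage

pkt = struct.pack(">H", 3) + b"payload"
print(process(pkt))                             # 7
```

On a Tofino the table lookup happens in fixed pipeline stages, which is why the performance guarantee holds at line rate regardless of table contents.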
4th reason - COTS servers, cooling and power
A COTS server houses the FPGAs and uses standard software. Tango executes adjacent to the FPGA and superfast memory transfers are possible across the PCIe bus.
Standard OpenCL API for accessing Alveo Kernels
[Diagram: Alveo FPGA with the Xilinx shell (100GbE interface and PCIe monitoring & control interface) surrounding the Low.CBF kernel, connected via AXI to two 4GB HBM stacks.]
Alveo applications are developed using accelerator concepts. Xilinx provides a standardised OpenCL software stack to talk to the kernel, enabling applications to be developed quickly.
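The shell/kernel split can be caricatured in plain Python - no real OpenCL here, the class and method names are hypothetical stand-ins. The host copies input into an "HBM" buffer over "PCIe", the kernel works entirely out of HBM, and the host reads the result back:

```python
import numpy as np

class AlveoModel:
    """Toy stand-in for the OpenCL host flow (buffer in, kernel, buffer out).
    Real host code would use clCreateBuffer / clEnqueueMigrateMemObjects /
    clEnqueueTask-style calls from the Xilinx runtime."""

    def __init__(self):
        self.hbm = {}                           # bank name -> ndarray

    def write(self, bank: str, host_data: np.ndarray):
        self.hbm[bank] = host_data.copy()       # "PCIe" transfer: host -> HBM

    def run_kernel(self, src: str, dst: str):
        # The kernel touches HBM only; here it just scales the data.
        self.hbm[dst] = 2.0 * self.hbm[src]

    def read(self, bank: str) -> np.ndarray:
        return self.hbm[bank].copy()            # "PCIe" transfer: HBM -> host

dev = AlveoModel()
dev.write("hbm0", np.ones(4, dtype=np.float32))
dev.run_kernel("hbm0", "hbm1")
print(dev.read("hbm1"))                         # [2. 2. 2. 2.]
```

The design point this models is that the developer writes only the kernel and a short host sequence; the shell, DMA and bus plumbing come from the vendor.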
5th reason - Compact Solution
[Rack layout: four Low.CBF racks, each fed by a dual 3-phase PDU. Across the racks: servers #1-#20, M&C servers #1-#2 and a spare server; intermediate switches #1-#11; I/O switches #1-#9; M&C switches #1-#2.]
Atomic COTS uses a relatively small number of components. There are fewer cables, there is hot redundancy, it scales easily, and spares are readily available.
Summary and Conclusions
○ No hardware to develop - Xilinx has done it already (and it is very low cost)
○ Use standard PCIe servers, power supplies and air cooling
○ The standard OpenCL software stack enables developers to focus on the astronomy, not the communications
○ The P4 in-network processor provides line-rate performance with a minimal amount of coding
○ High-speed 100GbE data goes directly into the FPGA, with configuration, monitoring and control over PCIe
○ HBM memory enables greater freedom in the firmware design
○ Code can be easily migrated between Alveo boards

Low.CBF could now be low cost, scalable, compact and quick to develop. ○ Prototyping is very promising … no show stoppers identified so far!