Atomic COTS G. Hampson, D. Humphrey, G.Jourjon, J. Bunton, K. - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Accelerating astronomy using Atomic COTS

22nd September 2020

G. Hampson, D. Humphrey, G. Jourjon, J. Bunton, K. Bengston, A. Bolin, Y. Chen, E. Troup
slide-2
SLIDE 2
Astronomy Instrumentation in general

  • Astronomy instruments can be crudely divided into three stages:
    ○ The bespoke antenna part - RF components specific to the instrument
    ○ The bespoke receiver part - RF amplification and filtering, ADCs and possibly the first stage filterbank - coarse channels across the band
    ○ This presentation focuses on the last stage - and the use of COTS FPGA accelerators
  • It will be shown in the context of SKA Low but could be applied to almost any astronomy instrument

[Diagram: Array of Antennas (N-antennas) → ADC & Filterbank (N-antennas, M-channels) → Digital Signal Processing → Output products]

slide-3
SLIDE 3

Custom SKA Gemini Solution

Gemini is a custom system of subracks with backplanes carrying power/liquid and fibre, and a powerful FPGA card with HBM memory and optics. 5 boards have been produced.

slide-4
SLIDE 4

Did we reinvent the wheel?

Gemini development has taken many years. COTS boards in the standard PCIe form factor now exist. These boards have a higher TRL because they are sold to many customers in high quantity.

slide-5
SLIDE 5

Does a COTS solution exist for Low.CBF?

Can SKA use a COTS product? Original down select said no – is this true now? Can the signal flow be modified such that a standard COTS product could be used?

slide-6
SLIDE 6

Why Change? 1st reason - Cost

Xilinx components sell to a high margin military/medical market - but Xilinx are moving into commercial markets like video and 5G – that means competition and high volume.

slide-7
SLIDE 7

“Atomic” Operations

[Diagram: x512 “Coarse” Filterbanks (384-coarse, 1-station each) → Channel Select (1-coarse, 512-stations) → x384 FPGAs → Output products. One FPGA can compute all output products for 1-coarse. FPGAs are all independent!]

SKA Low has 512 stations – if we can get them all into one FPGA for 1-coarse channel, then each FPGA becomes Atomic – independent of any other FPGA's processing or data flow.
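
The routing rule behind "Atomic" can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the choice of numbering FPGAs by coarse channel are hypothetical, and only the counts (512 stations, 384 coarse channels, one FPGA per coarse channel) come from the slide.

```python
# Hypothetical sketch: names and numbering are illustrative, not Low.CBF code.
N_STATIONS = 512   # SKA Low stations (from the slide)
N_COARSE = 384     # coarse channels, one FPGA each (from the slide)


def fpga_for(station, coarse):
    """Return the FPGA index that receives this (station, coarse channel) stream.

    The station number does not affect the result: every station's copy of a
    given coarse channel is sent to the same FPGA.
    """
    if not 0 <= station < N_STATIONS:
        raise ValueError("station out of range")
    if not 0 <= coarse < N_COARSE:
        raise ValueError("coarse channel out of range")
    return coarse


def inputs_per_fpga():
    """Group all (station, coarse) streams by destination FPGA."""
    inputs = {}
    for station in range(N_STATIONS):
        for coarse in range(N_COARSE):
            inputs.setdefault(fpga_for(station, coarse), set()).add(station)
    return inputs


routing = inputs_per_fpga()
```

Under this rule each of the 384 FPGAs sees all 512 stations for its own coarse channel and nothing else, so no FPGA needs data from another.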

slide-8
SLIDE 8

2nd Reason - Shorter Schedule - less to do

[Block diagrams of two FPGA designs, fed from the In-network Processor. Correlator blocks shown include: 100GbE Interface, SPEAD decoder, Doppler & Coarse Delay, Correlator Filterbank, Fine Delay, RFI Flag, Correlator, SDP Packetiser, HBM buffers, PCIe/AXI, Quality %. PSS & PST blocks shown include: 100GbE Interface, SPEAD decoder, Coarse Delay, PSS & PST Filterbanks, Fine Delay, RFI Flag, PSS Jones, PSS & PST Jones, PSS & PST Beamformers, PSS/PST Packetisers, HBM buffers, PCIe/AXI, Relative Weight %.]

Going Atomic removes a majority of the communications in the FPGA. What remains is all about astronomy, giving a significant reduction in coding and testing and hence schedule.

slide-9
SLIDE 9

Alveo uses HBM enabled FPGAs

A key part of the DSP chain is HBM memory - it allows stages of DSP to be separated (and run asynchronously), and it provides the large memory buffers needed for corner turns.
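
As a minimal illustration of what a corner turn does, the Python sketch below reorders a small buffer from time-major to channel-major order. The function name and the toy 3 x 4 buffer are assumptions for illustration only; in the FPGA this reordering would happen in HBM, since the whole block has to be held before it can be read out in the other order.

```python
def corner_turn(time_major):
    """Reorder samples[time][channel] into samples[channel][time]."""
    return [list(per_channel) for per_channel in zip(*time_major)]


# Three time steps of four channels, labelled (t, c) so the reordering is visible.
buffer_in = [[(t, c) for c in range(4)] for t in range(3)]
buffer_out = corner_turn(buffer_in)
```

After the turn, `buffer_out[c]` holds every time sample for channel `c`, which is the order a per-channel processing stage would want to read.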

slide-10
SLIDE 10

Communications moves out

[Chart segments: Comm, M&C, DSP]

For Gemini, a third of our effort is about getting the right data at the right place at the right time. For Atomic COTS, communication moves out of the FPGA – but where did it go?
slide-11
SLIDE 11

COTS Communications

using now

A new technology is available called P4 – it directly controls the data plane of the switch and provides guaranteed performance at line rate using the Tofino ASIC.

slide-12
SLIDE 12

Programmable Communications

3rd Reason - Adaptable and Flexible comms

[Diagram, Existing Gemini Method: LFAA → Fixed FPGA Cube (288 FPGAs, 6 x 8 x 6) → SDP/PSS/PST, with Correlator and Beamformers evenly distributed]

The optical data interconnect for Gemini is fixed. Using a P4 in-network processor enables an adaptable/flexible programmable data flow - it copes with failure & scales.

slide-13
SLIDE 13

P4 in-network processor

P4 = Programming Protocol-Independent Packet Processors. Some call it a switch; we call it an in-network processor.

The P4 match-action tables are key - the processor can look into the packet itself to decode custom protocols such as SPEAD. Here the beam index directs a packet to a particular output port.
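
The match-action idea can be modelled in plain Python. This is a conceptual sketch, not P4 code: the 4-byte header layout, the table contents and the function names are all hypothetical and do not reflect the real SPEAD format or the Low.CBF tables. It only shows a field being read from inside the packet and used as the key that selects an output port.

```python
import struct

# Hypothetical 4-byte header: 16-bit beam index, 16-bit payload length (big-endian).
# This is NOT the real SPEAD layout; it stands in for "a field inside the packet".
HEADER = struct.Struct(">HH")

# Match-action table: match on beam index, action is "forward to this port".
BEAM_TO_PORT = {0: 1, 1: 1, 2: 5, 3: 7}
DROP = None  # default action when no table entry matches


def make_packet(beam, payload):
    """Build a toy packet: header followed by the payload bytes."""
    return HEADER.pack(beam, len(payload)) + payload


def output_port(packet):
    """Parse the beam index from the header and look up the output port."""
    beam, _length = HEADER.unpack_from(packet)
    return BEAM_TO_PORT.get(beam, DROP)


port = output_port(make_packet(2, b"abc"))
```

Because the routing lives in a table rather than in fixed wiring, changing where a beam goes is a table update (for example `BEAM_TO_PORT[2] = 6`) rather than a hardware change.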

slide-14
SLIDE 14

4th reason - COTS servers

  • Integrated redundant cooling and power
  • 20 PCIe slots per 4U
  • Local CPU & BMC

A COTS server houses the FPGAs and uses standard software. Tango executes adjacent to the FPGA and superfast memory transfers are possible across the PCIe bus.

slide-15
SLIDE 15

Standard OpenCL API for accessing Alveo Kernels

[Diagram: Alveo FPGA containing the Shell and the Low.CBF Kernel, connected over AXI, with 100GbE, PCIe, a Monitoring & Control Interface and 4GB HBM blocks]

Alveo applications are developed using accelerator concepts. Xilinx uses a standardised OpenCL SW stack to talk to the kernel and enables applications to be developed quickly.

slide-16
SLIDE 16

5th reason - Compact Solution

[Rack layout diagram: Low.CBF Racks #1 to #4, each with a Dual 3-phase PDU. Units shown across the racks: Servers #1 to #20 plus a Spare Server, M&C Servers #1 and #2, Intermediate #1 to #11, I/O #1 to #9, M&C Switches #1 and #2.]

Atomic COTS uses a relatively small number of components. There are fewer cables, there is hot redundancy, the system scales easily, and spares are readily available.

slide-17
SLIDE 17

Summary and Conclusions

  • The Atomic COTS evolution has begun!
    ○ No hardware to develop - Xilinx has done it already (and it's very low cost)
    ○ Use standard PCIe servers, power supplies and air cooling
    ○ Standard OpenCL software stack enables developers to focus on the astronomy not the communications
    ○ P4 in-network processor which provides line-rate performance with a minimal amount of coding
    ○ High speed 100GbE data direct into the FPGA, with configuration using PCIe monitoring and control
    ○ HBM memory enables greater freedom in the firmware design
    ○ Code can be easily migrated between Alveo boards
  • Many potential projects being investigated currently … what was difficult before could now be low cost, scalable, compact and quick to develop
    ○ Prototyping is very promising … no show stoppers identified so far!