Parsa, Andrew Siemion, Dan Werthimer, Mel Wright Outline What is a - - PowerPoint PPT Presentation


Jason Manley, Aaron Parsons, Don Backer, Henry Chen, Terry Filiba, David MacMahon, Peter McMahon, Arash Parsa, Andrew Siemion, Dan Werthimer, Mel Wright Outline What is a correlator? Scalable packetized correlators: The architecture


slide-1
SLIDE 1

Jason Manley, Aaron Parsons, Don Backer, Henry Chen, Terry Filiba, David MacMahon, Peter McMahon, Arash Parsa, Andrew Siemion, Dan Werthimer, Mel Wright

slide-2
SLIDE 2

Outline

  • What is a correlator?
  • Scalable packetized correlators:

    – The architecture
    – The hardware
    – The software
    – The cost

  • Closing thoughts
  • Walk through actual design
  • Questions and comments
slide-3
SLIDE 3

Interferometry…

slide-4
SLIDE 4
slide-5
SLIDE 5

Basic idea

[Figure: correlator block diagram. Labels: antenna signals Vi, Vj, Vk (amplitude vs. time); delays Z^-n; 90° phase shifts; summations ∑; outputs Vii, Vjj, Vkk (auto-correlations) and Vij, Vik, Vjk (cross-correlations).]

slide-6
SLIDE 6

“Actual” FX Correlator

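[Figure: FX correlator block diagram. Each input passes through a delay (Z^-n) and an FFT; the products are then accumulated (∑).]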
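The FX idea on this slide (transform first, then cross-multiply and accumulate) can be sketched in a few lines of NumPy. This is a minimal illustration, not the CASPER design: the function name `fx_correlate` is hypothetical, the delay stage (Z^-n) is left out, and a plain FFT stands in for whatever channelizer the hardware uses.

```python
import numpy as np

def fx_correlate(voltages, n_chan):
    """Toy FX correlator.

    voltages: real array of shape (n_ant, n_samples).
    Returns complex visibilities of shape (n_ant, n_ant, n_chan),
    accumulated over all complete spectra in the input.
    """
    n_ant, n_samp = voltages.shape
    n_fft = 2 * n_chan                      # real input: 2*n_chan samples per spectrum
    n_spec = n_samp // n_fft
    blocks = voltages[:, :n_spec * n_fft].reshape(n_ant, n_spec, n_fft)

    # "F" stage: one spectrum per antenna per block (Nyquist bin dropped).
    spectra = np.fft.rfft(blocks, axis=-1)[..., :n_chan]

    # "X" stage: V_ij = sum over time of S_i * conj(S_j), per channel.
    return np.einsum("itc,jtc->ijc", spectra, np.conj(spectra))

rng = np.random.default_rng(0)
vis = fx_correlate(rng.standard_normal((3, 4096)), n_chan=64)
print(vis.shape)
```

The diagonal entries `vis[i, i]` are the auto-correlations (Vii) and the off-diagonal entries are the cross-correlations (Vij), with `vis[j, i]` the complex conjugate of `vis[i, j]`.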

slide-7
SLIDE 7

CASPER DSP backend concept

[Figure: CASPER DSP backend block diagram. ADCs feed polyphase filter banks (PFB) on FPGA DSP modules. A commercial off-the-shelf multicast 10 Gbps (10GE or InfiniBand) switch connects them to a reconfigurable compute cluster of FPGA DSP modules (correlator, beamformers/spectrometers, pulsar timer) and to general-purpose CPUs.]

slide-8
SLIDE 8

Design Philosophy

  • Standardized processing hardware
  • Commercial interconnect
  • Asynchronous compute engines
  • Synchronization using common 1PPS
  • UDP output delivery over Ethernet network

  • Correlator scales with your array
slide-9
SLIDE 9

CASPER FX Architecture

[Figure: F Engine 0, F Engine 1, ..., F Engine N-1 connect through a 10GbE switch to X Engine 0, X Engine 1, ..., X Engine N-1.]
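One way to picture the switch's role: every F engine splits its spectrum and sends each X engine the slice of channels that engine is responsible for. The sketch below assumes a contiguous, equal split of channels across X engines; that split, and the names `x_engine_for_channel` and `packets_from_f_engine`, are illustrative assumptions and are not taken from the slides.

```python
def x_engine_for_channel(chan, n_chan, n_xeng):
    """Index of the X engine that handles `chan` (assumed contiguous split)."""
    if n_chan % n_xeng:
        raise ValueError("n_chan must divide evenly across X engines")
    return chan // (n_chan // n_xeng)

def packets_from_f_engine(f_id, n_chan, n_xeng):
    """Describe the packets one F engine would send, one per X engine."""
    if n_chan % n_xeng:
        raise ValueError("n_chan must divide evenly across X engines")
    per_engine = n_chan // n_xeng
    return [
        {
            "f_engine": f_id,
            "x_engine": x,
            "channels": range(x * per_engine, (x + 1) * per_engine),
        }
        for x in range(n_xeng)
    ]

for pkt in packets_from_f_engine(f_id=0, n_chan=2048, n_xeng=8)[:2]:
    print(pkt["x_engine"], pkt["channels"])
```

With this kind of split each X engine sees all antennas but only a fraction of the band, which is what lets the X stage be spread over many independent boards.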

slide-10
SLIDE 10

Implementation

[Figure: the same F engine / 10GbE switch / X engine diagram as the previous slide, shown as implemented.]

slide-11
SLIDE 11

Architecture to hardware mapping

Example 8 Antenna system

[Figure: hardware mapping for 8 antennas. Four iBOBs each host two F engines; a 10GbE switch connects them to four BEE2 user FPGAs, each hosting two X engines.]

slide-12
SLIDE 12

F Engine Operations

[Figure: F engine signal chain. Block labels: ADC, DDC, Channelize, Quantize, Reformat, X Engine.]

  • Two F engines per iBOB
  • Dual polarization design
  • Currently uses ASTRO library
  • Currently processes data at native clock rate (<200 MHz iBOB or <400 MHz ROACH)

slide-13
SLIDE 13

Setup and Control

  • Clocks:
    – X engines each run off an independent clock
    – Sampling synchronized at F engines, but clock not distributed to X engines
  • Synchronized using global 1PPS signal at ADCs
    – Propagated to X engines using out-of-band signaling on XAUI links
    – Headers labeling 10GbE Ethernet packet data

  • System control: separate 100Mbps Ethernet network on BEE2
  • F engines configured from BEEs through XAUI links
  • Control packets: CASPER UDP framework on BEE2 control FPGA
  • Execute Python scripts for configuration, control and debugging
slide-14
SLIDE 14

F engine development

  • 2008:
    – Coarse delays (cable length compensation)
    – Fringe-stopping & fine delays
    – Walsh code generation and phase switching
    – Real sampling (low bandwidth)
    – Parallel streams (high bandwidth)
  • Future:
    – Ability to output subset of band
    – Spectral zoom modes
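For readers unfamiliar with the Walsh code item above, here is a minimal sketch of how such codes can be generated and why they help with phase switching. It uses the standard Sylvester construction of a Hadamard matrix; the function name `walsh_codes` and the toy signal values are illustrative assumptions, and nothing here describes the actual F engine firmware.

```python
import numpy as np

def walsh_codes(n):
    """Return an n x n matrix of +1/-1 codes (Hadamard order); n must be a power of 2."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])   # Sylvester doubling step
    return h

codes = walsh_codes(8)

# Phase switching, toy version: flip the sign of the wanted signal with a code
# before a spurious constant offset is picked up, then undo the flips afterwards.
signal, offset = 1.5, 0.7
code = codes[1]                            # any row except the all-ones row 0
received = signal * code + offset          # offset is not modulated
demodulated = received * code              # signal restored, offset now alternates
print(demodulated.mean())                  # offset averages away over a full code
```

Rows are mutually orthogonal, so antennas switched with different codes do not correlate with each other's switching pattern over a full code period.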

slide-15
SLIDE 15

X Engine Operations

  • Using CASPER library
  • Scales with 2^N antennas
  • Fit as many X engines on an FPGA as possible (2x 16 ant on BEE2 usr)

[Figure: X engine data path. F Engine → 10GbE → Buffer → X Eng → Accum.]

slide-16
SLIDE 16

Backend Software

  • Output arrives as UDP packets
  • Currently received, parsed and saved in MIRIAD file format by a single computer
  • Computing requirements depend on the experiment
  • A single computer is usually sufficient: 128 antennas, 1 sec integrations, 2k chan = 512 MB/s
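The 512 MB/s figure can be roughly reproduced with a back-of-envelope calculation. The slide does not state its assumptions, so the ones below are guesses: auto-correlations are included in the baseline count, dual polarization gives 4 products per baseline, and each complex visibility occupies 8 bytes. The function name is hypothetical.

```python
def output_rate_bytes_per_s(n_ant, n_chan, t_int_s,
                            n_pol_products=4, bytes_per_vis=8):
    """Estimated correlator output rate in bytes per second.

    n_pol_products and bytes_per_vis are assumptions, not values from the slides.
    """
    n_baselines = n_ant * (n_ant + 1) // 2   # cross-correlations plus autos
    bytes_per_integration = n_baselines * n_pol_products * n_chan * bytes_per_vis
    return bytes_per_integration / t_int_s

rate = output_rate_bytes_per_s(n_ant=128, n_chan=2048, t_int_s=1.0)
print(f"{rate / 2**20:.0f} MiB/s")   # 516 MiB/s under these assumptions
```

Under these assumptions the result is about 541 million bytes per second, close to the quoted 512 MB/s; different word sizes or polarization products would shift it proportionally.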

slide-17
SLIDE 17

Pending systems

  • Bench sys: 8ant, DP, 200MHz, 2k ch
  • PAPER: 128ant, DP, 100MHz, 2k ch
  • KAT-7: 8ant, DP, 256MHz, 2k ch
  • meerKAT: 80ant, DP, 1GHz, 16k ch
  • Bologna: 32ant, SP, 32MHz, 1k ch
  • GMRT: 32ant, DP, 400MHz, 4k-8k ch
slide-18
SLIDE 18

How does it scale

[Figure: logarithmic plot (axis ticks from 1 to 1,000,000) comparing F engines and X engines.]
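The data behind this plot is not recoverable from the transcript, but the general scaling of an FX design can be sketched under a simple cost model. The model is an assumption: F-stage work grows linearly with the number of antennas (one FFT of roughly n_chan·log2(n_chan) operations each), while X-stage work grows with the number of baselines, N(N+1)/2. The function names are hypothetical.

```python
import math

def f_stage_ops(n_ant, n_chan):
    """Assumed F-stage cost per spectrum: one FFT per antenna."""
    return n_ant * n_chan * math.log2(n_chan)

def x_stage_ops(n_ant, n_chan):
    """Assumed X-stage cost per spectrum: one multiply-accumulate per baseline per channel."""
    n_baselines = n_ant * (n_ant + 1) // 2
    return n_baselines * n_chan

for n_ant in (8, 32, 128, 512):
    print(n_ant, int(f_stage_ops(n_ant, 2048)), x_stage_ops(n_ant, 2048))
```

Doubling the antenna count doubles the F-stage figure but roughly quadruples the X-stage figure, which is why the X engines dominate for large arrays.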

slide-19
SLIDE 19

FPGA Roadmap

  • Processing power doubling every two years
  • V4 (Virtex-4) = ½ the power requirements of V2Pro (Virtex-II Pro)*

* Manufacturer's claim (Xilinx Inc.)

[Figure: Xilinx Virtex family, logic cells (thousands) by year. 2000: 43; 2002: 100; 2004: 200; 2006: 330.]

slide-20
SLIDE 20

Coming soon…

  • 10Gbps output optionally gives integrations ~10ms
  • More efficient use of hardware DSP slices
  • High speed, scalable, distributed data capture software
  • Walsh codes and phase switching
  • Phase rotation
  • 64 antenna design
  • Upgrade to 4096 channels
  • ROACH hardware:
    – <400 MHz bandwidth
    – 16 384 channels
    – 128 antennas
    – No architectural changes

slide-21
SLIDE 21

Questions and Comments

Visit the CASPER correlator page:

http://casper.berkeley.edu/wiki/index.php?title=Correlator

Add your own requirements:

http://casper.berkeley.edu/wiki/index.php?title=International_Correlator_Collaboration

Email me: jason_manley@hotmail.com

slide-22
SLIDE 22
slide-23
SLIDE 23

PFB-FFT response
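The plot for this slide is missing from the transcript. As background, a polyphase filter bank (PFB) front end can be sketched as a windowed-sinc weighting of several FFT-lengths of data, summed down to one FFT length before the transform. The tap count, the Hamming window, and the name `pfb_spectrum` are illustrative choices, not the parameters of the CASPER library blocks.

```python
import numpy as np

def pfb_spectrum(x, n_fft, n_taps=4):
    """One PFB output spectrum from the first n_fft * n_taps samples of x."""
    win_len = n_fft * n_taps
    if len(x) < win_len:
        raise ValueError("need at least n_fft * n_taps samples")
    # Prototype low-pass filter: sinc one channel wide, tapered by a Hamming window.
    t = np.arange(win_len) / n_fft - n_taps / 2
    coeffs = np.sinc(t) * np.hamming(win_len)
    # Weight the data, fold the n_taps segments on top of each other, then FFT.
    folded = (np.asarray(x[:win_len]) * coeffs).reshape(n_taps, n_fft).sum(axis=0)
    return np.fft.fft(folded)

n_fft, k = 64, 10
n = np.arange(n_fft * 4)
tone = np.exp(2j * np.pi * k * n / n_fft)        # tone centred on channel k
print(np.argmax(np.abs(pfb_spectrum(tone, n_fft))))
```

Compared with a plain FFT of n_fft samples, the extra taps give each channel a flatter passband and steeper skirts, at the cost of reading n_taps times more input per spectrum.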

slide-24
SLIDE 24

Current uses: Pocket Spectrometer

  • Uses ATMEL ADCs at 2 Gsamples/sec
  • Performs 4 real FFTs in 1 (complex) biplex pipelined FFT module
  • 2048 channels
  • Uses just 1 ADC, 1 iBOB, and your laptop
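Packing several real FFTs into a complex FFT module presumably relies on the standard trick of carrying two real sequences as the real and imaginary parts of one complex input, then separating the two spectra using conjugate symmetry. A minimal NumPy sketch of that trick follows; the function name `two_real_ffts` is hypothetical and this is not the FPGA implementation.

```python
import numpy as np

def two_real_ffts(a, b):
    """FFTs of two equal-length real sequences using a single complex FFT."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    z = np.fft.fft(a + 1j * b)
    # conj(Z[(-k) mod n]) for every k: reverse, rotate by one, conjugate.
    z_rev = np.conj(np.roll(z[::-1], 1))
    fa = 0.5 * (z + z_rev)        # spectrum of a
    fb = -0.5j * (z - z_rev)      # spectrum of b
    return fa, fb

rng = np.random.default_rng(0)
a, b = rng.standard_normal(16), rng.standard_normal(16)
fa, fb = two_real_ffts(a, b)
print(np.allclose(fa, np.fft.fft(a)), np.allclose(fb, np.fft.fft(b)))
```

A module that processes two complex streams at once could, with this packing, serve four real inputs, which would match the count on the slide.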

slide-25
SLIDE 25

ROACH block diagram