A new correlator for LOFAR. Chris Broekema, on behalf of the COBALT team



slide-1
SLIDE 1

Netherlands Institute for Radio Astronomy

COBALT: A new correlator for LOFAR

Chris Broekema

On behalf of the COBALT team

ASTRON / RUG / DELL NL

slide-2
SLIDE 2

ASTRON

The Netherlands Institute for Radio Astronomy


slide-3
SLIDE 3

ASTRON Mission Statement

To make discoveries in radio astronomy happen, via the development of novel and innovative technologies, the operation of world-class radio astronomy facilities, and the pursuit of fundamental astronomical research.

slide-4
SLIDE 4
slide-5
SLIDE 5
slide-6
SLIDE 6

Introduction to radio astronomy (slight Dutch bias)

  • First observations in 1932 by Karl Jansky
  • First purpose-built telescope in 1937 by Grote Reber
  • 21 cm emission line of neutral hydrogen
      • Predicted in 1944 by van de Hulst
      • Detected in 1951 by Ewen and Purcell (Harvard)
      • Published after confirmation by Muller and Oort
  • Opening of the Dwingeloo radio telescope in 1956
  • Doppler effect (redshift) of fast-moving objects shows the structure of the local galaxy (1950s)

slide-7
SLIDE 7

Introduction to radio astronomy

slide-8
SLIDE 8
slide-9
SLIDE 9
slide-10
SLIDE 10
slide-11
SLIDE 11
slide-12
SLIDE 12

LOFAR: A distributed radio telescope

slide-13
SLIDE 13

The LOFAR “Superterp”

slide-14
SLIDE 14
slide-15
SLIDE 15

Phased Arrays

slide-16
SLIDE 16

IBM Blue Gene/P

To be retired early 2014

slide-17
SLIDE 17
slide-18
SLIDE 18

Hardware design – Tasks

  • 1. Receive LOFAR antenna field data
      • 10 GbE; ~3 Gbps/station
  • 2. Transpose data (ref. MPI_Alltoallv())
  • 3. Compute (correlate, beamform, filter, flag, etc.)
      • Single-precision floating point
      • Complex multiply-add
  • 4. Forward results to storage
      • Storage cluster >100 m away, single-mode (SM) fibre
      • 10GbE or QDR InfiniBand
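
As a rough illustration of tasks 2 and 3, the sketch below regroups per-station sample blocks so that each worker holds one frequency channel for all stations (the role MPI_Alltoallv() plays across nodes), then correlates every station pair with a complex multiply-add. It is a single-process toy in plain Python, not the COBALT implementation: the function names, the data layout and the sample values are all assumptions made for illustration.

```python
from itertools import combinations_with_replacement


def transpose(per_station):
    """Regroup data[station][channel] into data[channel][station].

    Stands in for the all-to-all exchange: afterwards each 'worker'
    (one per channel) holds that channel's samples from every station.
    """
    n_channels = len(per_station[0])
    return [[station[ch] for station in per_station] for ch in range(n_channels)]


def correlate(channel_data):
    """Accumulate visibilities for one channel.

    channel_data[station] is a list of complex samples. For each station
    pair (i, j) with i <= j, sum x_i[t] * conj(x_j[t]) over time.
    """
    visibilities = {}
    for i, j in combinations_with_replacement(range(len(channel_data)), 2):
        acc = 0j
        for a, b in zip(channel_data[i], channel_data[j]):
            acc += a * b.conjugate()  # complex multiply-add
        visibilities[(i, j)] = acc
    return visibilities


# Hypothetical input: 3 stations, 2 channels, 2 time samples per channel.
stations = [
    [[1 + 1j, 2 + 0j], [0 + 1j, 1 + 0j]],
    [[1 - 1j, 0 + 2j], [1 + 0j, 1 + 0j]],
    [[0 + 0j, 1 + 1j], [2 + 0j, 0 - 1j]],
]

by_channel = transpose(stations)
result = [correlate(ch) for ch in by_channel]
```

Each entry of result maps a station pair to its accumulated visibility for that channel. The pair loop grows quadratically with the number of stations, which is why the complex multiply-add rate dominates the compute task.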

slide-19
SLIDE 19

NVIDIA Tesla K10

slide-20
SLIDE 20

COBALT Preliminary design (Feb 2013)

Strawman node

  • Dual Xeon E5
  • 2x NVIDIA K10
  • 4x 10GbE
  • 2x FDR IB

slide-21
SLIDE 21

First prototype Dell PowerEdge R720

slide-22
SLIDE 22

First prototype Dell PowerEdge R720

PCIe

slide-23
SLIDE 23

Second prototype Dell PowerEdge T620

slide-24
SLIDE 24

Second prototype Dell PowerEdge T620

slide-25
SLIDE 25

GPU idle temperatures

+------------------------------------------------------+
| NVIDIA-SMI 5.319.12   Driver Version: 319.12         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K10.G2.8GB    Off  | 0000:04:00.0     Off |                  N/A |
| N/A   75C    P0    43W / ERR! |   0%    9MB / 3583MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K10.G2.8GB    Off  | 0000:05:00.0     Off |                  N/A |
| N/A   76C    P0    42W / ERR! |   0%    9MB / 3583MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K10.G2.8GB    Off  | 0000:45:00.0     Off |                  N/A |
| N/A   62C    P0    42W / ERR! |   0%    9MB / 3583MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K10.G2.8GB    Off  | 0000:46:00.0     Off |                  N/A |
| N/A   46C    P0    36W / ERR! |   0%    9MB / 3583MB |      0%      Default |
+-------------------------------+----------------------+----------------------+

slide-26
SLIDE 26

Prototype airflow guides

Note: temperatures are under full load

slide-27
SLIDE 27

3D-printed prototype designed and produced by ASTRON

slide-28
SLIDE 28

GPU temperatures with 3D-printed airflow guides

+------------------------------------------------------+
| NVIDIA-SMI 5.319.12   Driver Version: 319.12         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K10.G2.8GB    Off  | 0000:04:00.0     Off |                  N/A |
| N/A   48C    P0    92W / ERR! |   2%   54MB / 3583MB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K10.G2.8GB    Off  | 0000:05:00.0     Off |                  N/A |
| N/A   52C    P0    91W / ERR! |   2%   54MB / 3583MB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K10.G2.8GB    Off  | 0000:45:00.0     Off |                  N/A |
| N/A   51C    P0    92W / ERR! |   2%   54MB / 3583MB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K10.G2.8GB    Off  | 0000:46:00.0     Off |                  N/A |
| N/A   49C    P0    95W / ERR! |   2%   54MB / 3583MB |     99%      Default |
+-------------------------------+----------------------+----------------------+
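
The two nvidia-smi readouts in this deck (idle, and full load with the airflow guides) can be compared numerically. The sketch below pulls temperature, power draw and utilisation out of the per-GPU status rows with a regular expression and prints the change per GPU. The row strings are copied from the two slides; the regex and the helper name parse_status_rows are assumptions for illustration, not part of any NVIDIA tool, and it is assumed that the idle readout was taken before the guides were fitted.

```python
import re

# Matches the status row of each GPU entry, e.g.
# "| N/A 75C P0 43W / ERR! | 0% 9MB / 3583MB | 0% Default |"
STATUS_ROW = re.compile(
    r"(?P<temp>\d+)C\s+P\d+\s+(?P<power>\d+)W\s*/.*?\|.*?\|\s*(?P<util>\d+)%"
)


def parse_status_rows(rows):
    """Return (temp_C, power_W, util_percent) for each matching row."""
    readings = []
    for row in rows:
        match = STATUS_ROW.search(row)
        if match:
            readings.append(tuple(int(match[k]) for k in ("temp", "power", "util")))
    return readings


# Status rows from the "GPU idle temperatures" slide.
idle_rows = [
    "| N/A 75C P0 43W / ERR! | 0% 9MB / 3583MB | 0% Default |",
    "| N/A 76C P0 42W / ERR! | 0% 9MB / 3583MB | 0% Default |",
    "| N/A 62C P0 42W / ERR! | 0% 9MB / 3583MB | 0% Default |",
    "| N/A 46C P0 36W / ERR! | 0% 9MB / 3583MB | 0% Default |",
]

# Status rows from the "with 3D-printed airflow guides" slide.
load_rows = [
    "| N/A 48C P0 92W / ERR! | 2% 54MB / 3583MB | 99% Default |",
    "| N/A 52C P0 91W / ERR! | 2% 54MB / 3583MB | 100% Default |",
    "| N/A 51C P0 92W / ERR! | 2% 54MB / 3583MB | 99% Default |",
    "| N/A 49C P0 95W / ERR! | 2% 54MB / 3583MB | 99% Default |",
]

idle = parse_status_rows(idle_rows)
load = parse_status_rows(load_rows)

for gpu, ((t0, p0, u0), (t1, p1, u1)) in enumerate(zip(idle, load)):
    print(f"GPU {gpu}: {t0}C at {u0}% util -> {t1}C at {u1}% util ({t1 - t0:+d}C)")
```

On these figures, the hottest idle GPU (76C) is warmer than any GPU under full load with the guides fitted (52C at most), even though the loaded cards draw roughly twice the power.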

slide-29
SLIDE 29

Final COBALT system

slide-30
SLIDE 30

Current status

  • 9 COBALT nodes operational (testing phase)
  • “Mass-produced” airflow ducts/guides in place
  • Software development effort on schedule
  • Commissioning proceeding

COBALT project passed performance review on 30 August
COBALT Operational Readiness Review in early December

slide-31
SLIDE 31

First fringes with COBALT

November 1st 2013

slide-32
SLIDE 32

Summary

Or: problems faced

  • R720 PCIe imbalance
  • 40 GbE ≠ 4x 10GbE
  • R720 doesn't fit 2x dual 10GbE
  • Dual port ConnectX3 IB PCIe bottleneck
  • Cooling issues T620 + Nvidia K10
  • Software optimizations → MPI stack
  • Accurately measuring performance/load
  • BUT: we are well on track to build a completely new correlator within 12 months

slide-33
SLIDE 33

Questions?