SLIDE 1

May 20, 2009, ScicomP/SP-XXL'09

Processing Real-Time LOFAR Telescope Data on a Blue Gene/P

John W. Romein

Stichting ASTRON (Netherlands Institute for Radio Astronomy), Dwingeloo, the Netherlands

SLIDE 2

LOw Frequency ARray

• radio telescope
• 10–240 MHz
• unexplored
• dishes infeasible
• ionospheric disturbance
• new design

SLIDE 3

A New Design

• distributed sensor network
• no dishes
• O(10,000) antennas
• omni-directional
• concurrent observations
• software telescope
• flexible
• requires supercomputer

SLIDE 4

LOFAR Structure

• hierarchical
• receiver
• (tile)
• station
• telescope
• central core
• Exloo
• central processing
• Groningen
• real time
• off-line

SLIDE 5

LOFAR Science

• Epoch of Re-ionization
• cosmic rays
• extragalactic surveys
• transients
• pulsars

SLIDE 6

Outline

• from wave to image
• basics
• receivers
• stations
• real-time Blue Gene/P processing
• performance
• off-line processing
• image

SLIDE 7

Reflectors vs. Phased Arrays

[Figure: reflector vs. phased array; labels: receiver, receiving array, physical delay, artificial delay, combiner, output]

SLIDE 8

Beam Forming

[Figure labels: receiving array, physical delay, artificial delay, combiner, output]

• delay determines observation direction
• beam forming = delayed addition
• diameter determines FoV
• use earth rotation
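
The "delayed addition" on this slide can be sketched in a few lines of Python. This is only an illustration, not the LOFAR implementation: the function name `beam_form`, the whole-sample delays, and the toy pulse data are all assumptions made for the example.

```python
def beam_form(signals, delays):
    """Delay-and-sum: shift each receiver's samples by its delay, then add.

    signals: list of equal-length sample lists, one per receiver.
    delays:  whole-sample delay per receiver (hypothetical integer delays;
             a real system also has to handle the sub-sample remainder).
    """
    max_delay = max(delays)
    length = len(signals[0]) - max_delay
    output = []
    for t in range(length):
        # Receiver r contributes the sample it recorded delays[r] samples later.
        output.append(sum(sig[t + d] for sig, d in zip(signals, delays)))
    return output


# A pulse that reaches receiver 0 at t=2, receiver 1 at t=3, receiver 2 at t=4.
signals = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
aligned = beam_form(signals, [0, 1, 2])    # delays match the arrival times
unaligned = beam_form(signals, [0, 0, 0])  # delays point somewhere else
```

With matching delays the three pulses add up in one output sample; with the wrong delays they stay spread out, which is how the choice of delays selects the observation direction.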

SLIDE 9

LOFAR Antennas

• two antenna types
• Low-Band Antenna (10–80 MHz)
• High-Band Antenna (110–240 MHz)
• FM radio range not covered

SLIDE 10

Low-Band Antennas

• 10–80 MHz
• dual polarized

SLIDE 11

LBA Field

SLIDE 12

HBA Tiles

• 110–240 MHz
• dual polarized
• 4×4 receivers = 1 tile
• analogue beam forming

SLIDE 13

A Station

• 48–96 LBAs
• 48–96 HBA tiles

SLIDE 14

Station Cabinet

• station processing

SLIDE 15

Remote Control Unit

• 2 LBAs + 1 HBA tile
• filter
• 200 (or 160) MHz A→D conversion

SLIDE 16

Remote Station Processing Boards

• FPGAs
• PPF: creates 512 × 195 kHz subbands
• select up to 164 subbands
• beam form LBAs/tiles
• UDP packets over WAN to correlator
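
The subband figures on this slide can be checked with a little arithmetic. The sketch below assumes real-valued sampling at the 200 MHz A/D clock (so a 100 MHz band) and critically sampled subbands (one complex sample per subband per 1/width seconds); neither assumption is stated on the slides, and the constant names are made up for the example.

```python
SAMPLE_RATE_HZ = 200e6      # A/D clock from the slides (160 MHz is the alternative)
NUM_SUBBANDS = 512          # produced by the station polyphase filter (PPF)
SELECTED_SUBBANDS = 164     # maximum number sent to the correlator

# Assumption: real-valued sampling at 200 MHz covers a 100 MHz band (Nyquist).
band_hz = SAMPLE_RATE_HZ / 2
subband_width_hz = band_hz / NUM_SUBBANDS                     # 195312.5 Hz
# Assumption: critically sampled subbands, one sample every 1/width seconds.
sample_period_s = 1 / subband_width_hz                        # 5.12 microseconds
selected_bandwidth_hz = SELECTED_SUBBANDS * subband_width_hz  # about 32 MHz
```

Under these assumptions the numbers agree with the 195 kHz subbands here, the 5.12 μs samples on the circular-buffer slide, and the 32 MHz bandwidth on the observation-characteristics slide.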

SLIDE 17

Transient Buffer Boards

• 4 s of raw antenna data stored in TBB
• trigger → freeze → dump → post analysis
• not possible with dishes!

SLIDE 18

Stations

• ≤ 2009: prototypes
• building real stations now
• 18–25 core
• 18–25 remote
• 8–20 European
• dedicated fibers to correlator

SLIDE 19

Observation Characteristics

• 2 polarizations
• 32 MHz bandwidth from 1 mode
• select 164 × 195 kHz subbands
• up to 8 concurrent observations
• trade bandwidth for beams

SLIDE 20

LOFAR Processing

SLIDE 21

Central Processing Pipelines

• standard imaging mode
• pulsar survey mode
• known pulsar mode
• transients mode
• very/ultra high-energy modes
• ...

SLIDE 22

Blue Gene History

• 6 racks Blue Gene/L (2005–2008)
• 2½ racks Blue Gene/P (2008–)

SLIDE 23

The Blue Gene/P

• 850 MHz PPC
• 4 cores × 2 FPUs × 1 FMA/cycle
• complex numbers
• 3-D torus, collective, barrier, 10 GbE, JTAG networks
• 2½ racks = 10,880 cores = 37 TFLOP/s + 160 × 10 Gb/s
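
The 37 TFLOP/s figure follows from the other numbers on this slide if a fused multiply-add is counted as two floating-point operations. The sketch below spells out that arithmetic; the constant names are invented for the example.

```python
CLOCK_GHZ = 0.85            # 850 MHz PowerPC core
FLOPS_PER_CYCLE = 2 * 2     # 2 FPUs x 1 fused multiply-add (counted as 2 flops)
CORES = 10_880              # core count quoted on the slide for 2.5 racks

peak_per_core_gflops = CLOCK_GHZ * FLOPS_PER_CYCLE         # 3.4 GFLOP/s
peak_total_tflops = CORES * peak_per_core_gflops / 1000    # about 37 TFLOP/s
```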

SLIDE 24

BG/P Pset

• I/O Nodes (ION) & Compute Nodes (CN)
• ION handles I/O requests of CN
• transparent
• ION:CN = 1:16
• 64 IONs/rack

SLIDE 25

The BG/P Correlator

• three distributed applications/platforms
• BG/P I/O nodes (ION)
• BG/P compute nodes (CN)
• external storage nodes

SLIDE 26

Application Software on I/O Node

• unorthodox
• more efficient & flexible
• BG/L: saved costs; for input cluster
• BG/L: major system software changes (ZOID) (thanks ANL!) [PPoPP'08]
• BG/P: better support

SLIDE 27

I/O Node Processing

• two sections
• input
• output
• multi-threaded

SLIDE 28

I/O Node Input Section

• ION receives from 1 station
• 48,828 pkt/s
• handles missing packets

SLIDE 29

Circular Buffer

• circular buffer (~2.5 s)
• WAN delays
• delay stream
• handle hiccups

$\Delta t = 22\,\mu\text{s} \approx 4 \times 5.12\,\mu\text{s}$ samples
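
A circular buffer that tolerates lost packets and applies a whole-sample delay on read-out might look like the sketch below. It is not the correlator's code: the class `CircularBuffer`, its methods, and the tiny capacity are hypothetical, chosen only to illustrate the idea.

```python
class CircularBuffer:
    """Fixed-size ring of samples addressed by absolute sample number."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = [0] * capacity
        self.valid = [False] * capacity   # False marks data that never arrived
        self.stamp = [None] * capacity    # sample number currently in each slot

    def write(self, sample_number, value):
        slot = sample_number % self.capacity
        self.samples[slot] = value
        self.valid[slot] = True
        self.stamp[slot] = sample_number

    def read(self, first, count, delay=0):
        """Return count (value, is_valid) pairs, starting delay samples early."""
        out = []
        for n in range(first - delay, first - delay + count):
            slot = n % self.capacity
            ok = self.valid[slot] and self.stamp[slot] == n
            out.append((self.samples[slot] if ok else 0, ok))
        return out


buf = CircularBuffer(capacity=8)
for n in (0, 1, 2, 4, 5, 6, 7):     # sample 3 was lost on the WAN
    buf.write(n, 10 * n)
chunk = buf.read(first=4, count=4, delay=2)   # yields samples 2..5
```

Reading with an offset is what delays the stream; the missing sample comes back flagged as invalid instead of stalling the reader.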

SLIDE 30

I/O Node → Compute Node

• ION sends data to CN
• wall-clock time trigger
• chunk = 196,608 samples (1.007 s), 1 subband, 2 pols, 1 station
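
The chunk duration can be checked against the 5.12 μs sample period quoted on the circular-buffer slide. The byte count below additionally assumes complex samples with 16-bit real and imaginary parts, which the slide does not state; all names are illustrative.

```python
SAMPLES_PER_CHUNK = 196_608
SAMPLE_PERIOD_S = 5.12e-6     # one subband sample
POLARIZATIONS = 2
BITS_PER_VALUE = 16           # assumption: 16-bit real and imaginary parts

chunk_duration_s = SAMPLES_PER_CHUNK * SAMPLE_PERIOD_S
# samples x polarizations x (real, imaginary) x bits, converted to bytes
chunk_bytes = SAMPLES_PER_CHUNK * POLARIZATIONS * 2 * BITS_PER_VALUE // 8
```

Under that assumption one chunk is about 1.007 s of data and 1.5 MiB per station and subband.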

SLIDE 31

Compute Node Processing

SLIDE 32

Exchange

• hundreds of Gb/s
• asynchronous

SLIDE 33

PolyPhase Filter

• splits subband into channels
• time vs. frequency resolution
• FIR filter + FFT
• allows narrow-band RFI removal
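
The "FIR filter + FFT" structure can be illustrated with a small polyphase filter bank in plain Python. This is a sketch under stated assumptions, not the production kernel: the function name, the flat prototype filter, and the channel and tap counts are made up, and the transform is a naive DFT where real code would use an FFT.

```python
import cmath
import math


def polyphase_filter(samples, coeffs, num_channels):
    """Split a complex sample stream into num_channels frequency channels.

    coeffs holds num_channels * taps prototype-filter weights. Each output
    spectrum is a weighted sum of `taps` consecutive blocks (the FIR step)
    followed by a DFT over the result (naive here; real code uses an FFT).
    """
    taps = len(coeffs) // num_channels
    spectra = []
    for start in range(0, len(samples) - len(coeffs) + 1, num_channels):
        # FIR: combine the same position from `taps` consecutive blocks.
        block = [
            sum(coeffs[t * num_channels + k] * samples[start + t * num_channels + k]
                for t in range(taps))
            for k in range(num_channels)
        ]
        # DFT: turn the filtered block into one value per channel.
        spectra.append([
            sum(block[k] * cmath.exp(-2j * math.pi * c * k / num_channels)
                for k in range(num_channels))
            for c in range(num_channels)
        ])
    return spectra


# A tone centred on channel 3 of 8 should put all its power in that channel.
channels, taps = 8, 4
coeffs = [1.0 / taps] * (channels * taps)      # hypothetical flat prototype filter
tone = [cmath.exp(2j * math.pi * 3 * n / channels) for n in range(64)]
spectra = polyphase_filter(tone, coeffs, channels)
powers = [abs(v) ** 2 for v in spectra[0]]
```

More channels give finer frequency resolution but fewer spectra per second, which is the time vs. frequency trade-off on the slide.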

SLIDE 34

Phase Correction

• correct observation direction
• samples already shifted; correct the rest
• interpolate

$\Delta t = 22\,\mu\text{s} = 4 \times 5.12\,\mu\text{s} + 1.52\,\mu\text{s}$: the four whole samples are shifted, and the remaining $1.52\,\mu\text{s}$ is applied as the phase factor $e^{-2\pi i f \cdot 1.52\,\mu\text{s}}$
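
The split of a delay into whole samples plus a residual phase rotation can be sketched as follows. The helper names and the 60 MHz example frequency are assumptions for illustration; the sign convention follows the exponent shown on the slide.

```python
import cmath
import math

SAMPLE_PERIOD_US = 5.12


def split_delay(delay_us):
    """Split a delay into whole samples and the leftover time in microseconds."""
    whole = int(delay_us // SAMPLE_PERIOD_US)
    return whole, delay_us - whole * SAMPLE_PERIOD_US


def phase_factor(freq_mhz, residual_us):
    """Phase rotation e^(-2*pi*i*f*t); MHz times microseconds is dimensionless."""
    return cmath.exp(-2j * math.pi * freq_mhz * residual_us)


whole, residual = split_delay(22.0)          # 4 samples plus about 1.52 us
corrected = 1 + 0j
corrected *= phase_factor(60.0, residual)    # 60 MHz is an illustrative frequency
```

The rotation changes only the phase of a sample, never its magnitude, so the correction does not alter the measured power.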

SLIDE 35

Band Pass Correction

• channel powers unequal
• caused by station PPF
• correct channel power

SLIDE 36

Beam Forming

• add group of stations to form “superstation”
• optional

SLIDE 37

Correlate

• filters noise
• multiply samples of all pairs of stations
• integrate over time
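
A minimal sketch of the correlation step, assuming the usual convention of multiplying one station's sample by the complex conjugate of the other's (the slide only says "multiply"). The function name and toy data are invented for the example.

```python
def correlate(station_samples):
    """Multiply samples of every station pair and integrate over time.

    station_samples: one list of complex samples per station (same length).
    Returns {(i, j): sum over t of x_i[t] * conj(x_j[t])} for i <= j.
    """
    n = len(station_samples)
    visibilities = {}
    for i in range(n):
        for j in range(i, n):
            visibilities[(i, j)] = sum(
                a * b.conjugate()
                for a, b in zip(station_samples[i], station_samples[j])
            )
    return visibilities


# Stations 0 and 1 see the same signal (station 1 shifted by 90 degrees);
# station 2 sees something unrelated.
signal = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]
stations = [signal, [s * 1j for s in signal], [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]]
vis = correlate(stations)
```

The related pair integrates to a large value whose phase encodes the 90-degree shift, while the unrelated pair averages to zero, which is the sense in which correlation filters noise. With n stations there are n(n+1)/2 pairs including autocorrelations, so the work grows quadratically.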

SLIDE 38

Correlator Output

• correlations between two stations
• color = phase, intensity = power
• combined contribution of (strong) sources
• earth rotation changes phase

[Figure: correlator output; axes: frequency (MHz), around 59.9–60.1, and time (h)]

SLIDE 39

Work Distribution

• process subbands independently
• stations must be combined
• chunk needs > 1 second processing time
• round-robin distribution
• receive, process, send, idle
• OVERLY SIMPLIFIED!
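
Because a chunk arrives roughly every second but takes longer than that to process, consecutive chunks have to go to different cores. The sketch below shows the round-robin idea in its simplest form; the function names, the exact 1.0 s interval, and the 2.5 s processing time are made-up values, and the real scheduling is more involved, as the slide itself warns.

```python
def assign_chunks(num_chunks, num_cores):
    """Round-robin: chunk c (one per interval per subband) goes to core c mod N."""
    return [chunk % num_cores for chunk in range(num_chunks)]


def min_cores(processing_time_s, chunk_interval_s=1.0):
    """Smallest core count so a core is free again before its next chunk arrives."""
    cores = 1
    while cores * chunk_interval_s < processing_time_s:
        cores += 1
    return cores


schedule = assign_chunks(num_chunks=8, num_cores=3)
cores_needed = min_cores(processing_time_s=2.5)   # 2.5 s is a made-up figure
```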

SLIDE 40

I/O Node Output Section

• (adds correlations)
• best-effort queue
• ensures real-time continuation of correlator

SLIDE 41

I/O Node Real-Time Scheduling

• use Linux RT scheduler

SLIDE 42

I/O Node Memory

• PPC 450: software TLB-miss handler [P2S2'09]
• Linux: slows down applications by 40%–300%
• modified kernel to provide 6 × 256 MiB “fast” pages (thanks ANL!)

SLIDE 43

Storage

• correlations saved on disk
• external cluster
• ~1 PB
• post-processed within week

SLIDE 44

Pulsar Pipelines

• find & observe pulsars
• beam form instead of correlate
• 5 pipeline flavors
• functional; needs optimizations
• correlate & beam form concurrently

SLIDE 45

Communication

SLIDE 46

Fast Collective Network Protocol

• ION ↔ CN bandwidth insufficient
• socket overhead
• core hardly keeps up with network
• new ION ↔ CN protocol [PDPTA'09]
• low overhead
• user space
• simultaneous send & receive
• uses free virtual channel (thanks IBM!)
• supports interrupts (thanks IBM!)

SLIDE 47

FCNP Performance

[Figure: FCNP performance; panels: ION → CN, CN → ION]

• approaches link speed

SLIDE 48

Correlator Performance

SLIDE 49

Optimizations

• correlator, beam former, FIR filter, FFT written in assembly
• goal: 4 FLOPs/cycle
• minimize memory accesses
• use L2 prefetch units
• influence cache behavior
• concurrent loads/stores & FPU ops
• hide load & FPU latencies
• ~10x faster than C++

SLIDE 50

FPU Efficiency

• one chunk, 64 stations
• 256-point FFT: 8262 ops (< 5n log n)

            GFLOP   time (s)   efficiency
FIR         1.61    0.553      86%
FFT         0.812   0.553      43%
Correlate   12.9    3.96       96%
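
The efficiency column appears to be the achieved rate divided by the peak of a single core. The sketch below reproduces it under that assumption (850 MHz × 4 flops per cycle, as on the Blue Gene/P and Optimizations slides); the variable names are invented.

```python
PEAK_GFLOPS_PER_CORE = 0.85 * 4   # assumption: 850 MHz x 4 flops/cycle, one core

measurements = {            # kernel: (GFLOP, seconds), copied from the table
    "FIR": (1.61, 0.553),
    "FFT": (0.812, 0.553),
    "Correlate": (12.9, 3.96),
}

efficiency = {
    name: round(100 * gflop / seconds / PEAK_GFLOPS_PER_CORE)
    for name, (gflop, seconds) in measurements.items()
}
fft_bound = 5 * 256 * 8     # 5 n log2(n) for n = 256
```

The rounded results match the table, and the 8262 operations quoted for the 256-point FFT are indeed below the 5 n log2 n bound.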

SLIDE 51

How Fast Can We Go?

             required               possible
station      32 MHz ~ 2.05 Gb/s     48.4 MHz ~ 3.1 Gb/s
WAN          2.05 Gb/s              10 Gb/s
correlator   32 MHz ~ 2.05 Gb/s     ???

• goal: process 50% more data using 40% of the hardware

SLIDE 52

Correlator Performance

• test setup
• 1 rack generates data
• 1 rack correlates
• ½ rack “stores” data
• realistic simulation
• up to 64 stations

SLIDE 53

Compute Node Scaling

• 1 chunk, ≤64 stations
• correlate: O(n²)

SLIDE 54

I/O Node Scaling

• increase station bandwidth
• ≤3.1 Gb/s in; ≤1.2 Gb/s out
• IP stack expensive
• >84% load: data loss

SLIDE 55

Observation Mode A

• standard mode
• 50% more subbands

[Figure labels: CN, ION]

observation mode    A
#stations           64
#bits/sample        16
#subbands           248
ION I/O (Gb/s)      3.1+0.58
CPU load CN         35%
CPU load ION        67%

SLIDE 56

Three (Future) Station Modes

mode   bits/sample   #subbands   Gb/s
A      16            248         3.1
B      8             496         3.1
C      4             992         3.1

• trade accuracy for subbands
• station data rate unaffected
• correlator: 2x #subbands ⇒ 2x work; 2x output!
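
Why the station data rate stays at 3.1 Gb/s can be checked numerically. The sketch assumes that "bits/sample" means bits per real and per imaginary part, that there are two polarizations, and that each subband delivers 195,312.5 complex samples per second; these assumptions are not spelled out on the slide but reproduce its figures. The names are illustrative.

```python
SUBBAND_RATE_HZ = 195_312.5   # assumed complex samples per second per subband
POLARIZATIONS = 2

modes = {"A": (16, 248), "B": (8, 496), "C": (4, 992)}   # (bits, subbands)


def station_rate_gbps(bits, subbands):
    # Each complex sample has a real and an imaginary part of `bits` bits.
    return subbands * SUBBAND_RATE_HZ * POLARIZATIONS * 2 * bits / 1e9


rates = {name: station_rate_gbps(*cfg) for name, cfg in modes.items()}
current = station_rate_gbps(16, 164)   # the 164 subbands used today
```

Halving the bits while doubling the subbands leaves the product, and hence the rate, unchanged; the same formula gives the 2.05 Gb/s quoted for 164 subbands at 16 bits.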

SLIDE 57

Observation Mode B

• halved bits/sample
• doubled #subbands
• 275 Gb/s

[Figure labels: CN, ION]

observation mode    B
#stations           64
#bits/sample        8
#subbands           496
ION I/O (Gb/s)      3.1+1.2
CPU load CN         70%
CPU load ION        81%

SLIDE 58

Observation Mode C

• Epoch of Reionization
• reduced #stations
• >9.3 GFLOP/s

[Figure labels: CN, ION]

observation mode    C
#stations           48
#bits/sample        4
#subbands           992
ION I/O (Gb/s)      3.1+1.3
CPU load CN         85%
CPU load ION        80%

SLIDE 59

Performance Conclusions

• can process all foreseeable modes
• at 50% more bandwidth
• using 1 rack only!
• changed the specs!

SLIDE 60

BG/P: The Right Choice?

• compared correlator performance of BG/P, Cell BE, GTX 280, RV770, Core i7 [ICS'09]
• written in assembly
• compiler quality unimportant

SLIDE 61

Many-Core Comparison

SLIDE 62

Many-Core Comparison (2)

                            Intel     IBM     ATI     NVIDIA   STI
Architecture                Core i7   BG/P    4870    C1060    Cell
measured gflops             48        13.1    171     243      187
achieved efficiency         67%       96%     14%     26%      92%
measured bandwidth (GB/s)   19        6.6     47      94       50
bandwidth efficiency        73%       48%     41%     93%      192%
achieved gflops/Watt        0.37      0.54    1.07    1.00     3.74

• Cell BE wins, due to software-managed cache
• GPUs are I/O bound
• BG/P: built-in interconnect; densely packed

SLIDE 63

Off-Line Processing

• flagging
• self calibration
• imaging

SLIDE 64

Flagging

• invalidate RFI
• mostly narrow band
• several algorithms

SLIDE 65

Self-Calibration

• newly developed algorithm
• correct instrumental, environmental errors & sky parameters
• Global Sky Model
• pos, flux, pol of O(100,000,000) sky objects
• continuously refined
• subtract bright sources
• compare predicted & measured data
• solve
• need another supercomputer ...

SLIDE 66

Imaging

• Fourier transform (U,V) plane → (X,Y) image
• several algorithms being considered
• special attention to GPU, Cell BE, etc.

SLIDE 67

An All-Sky Image

SLIDE 68

And Another One

• ~1,000 sources
• 1:20,000 dynamic range
• resolution limited

SLIDE 69

Pulsar

SLIDE 70

Conclusions

• LOFAR promises interesting, new science
• Blue Gene/P:
  • very high computational performance
  • very high bandwidth
• bandwidth increase makes LOFAR 50% more efficient

SLIDE 71

Acknowledgments

• ASTRON: Chris Broekema, Martin Gels, Jan David Mol, Rob van Nieuwpoort
• ANL: Kamil Iskra, Kazutomo Yoshii
• IBM: Bruce Elmegreen, Todd Inglett, Tom Liebsch, Andrew Taufener

SLIDE 72

References

John W. Romein, P. Chris Broekema, Jan David Mol, and Rob V. van Nieuwpoort, Processing Real-Time LOFAR Telescope Data on a Blue Gene/P SuperComputer, Under review

Kazutomo Yoshii, Kamil Iskra, P. Chris Broekema, H. Naik, and Pete Beckman, Characterizing the Performance of Big Memory on Blue Gene Linux, International Workshop on Parallel Programming Models and System Software for High-End Computing (P2S2'09), Vienna, Austria, September, 2009

John W. Romein, FCNP: Fast I/O on the Blue Gene/P, Parallel and Distributed Processing Techniques and Applications (PDPTA'09), Las Vegas, NV, July, 2009

Rob V. van Nieuwpoort and John W. Romein, Using Many-Core Hardware to Correlate Radio Astronomy Signals, ACM International Conference on SuperComputing (ICS'09), New York, NY, June, 2009

Kamil Iskra, John W. Romein, Kazutomo Yoshii, and Pete Beckman, ZOID: I/O-Forwarding Infrastructure for Peta-Scale Architectures, ACM Symposium on Principles and Practice of Parallel Programming (PPoPP'08), Salt Lake City, UT, February, 2008

John W. Romein, P. Chris Broekema, Ellen van Meijeren, Kjeld van der Schaaf, and Walther H. Zwart, Astronomical Real-Time Signal Processing on a Blue Gene/L SuperComputer, ACM Symposium on Parallel Algorithms and Architectures (SPAA'06), Cambridge, MA, July, 2006