The ALMA archive Mark Lacy Data Services Lead, NAASC, NRAO NA ALMA - - PowerPoint PPT Presentation



SLIDE 1

The ALMA archive

Mark Lacy

Data Services Lead, NAASC, NRAO

NA ALMA Development 2016

SLIDE 2

Motivation

  • Papers from archival data are important: publications using archival data account for half of the HST and Chandra publications each year.

  • ALMA archival papers picking up, some examples (in high-z extragalactic science):

    – Fujimoto et al. 2016: faint end of ALMA source counts to 0.02 mJy.
    – Silva et al. 2015: excess of submm sources around bright WISE-selected AGN.
    – Oteo et al. 2016: number counts from calibrator fields.
    – Can save many hours of observation time spent looking at deep fields.
    – Removes issues of field-to-field variations.

  • Can point the way for some future observations without needing the TAC to be “brave” and approve an ambitious/risky proposal.

  • Could allow entire projects to be constructed, or supplemented with small amounts of new ALMA data, e.g. for a student thesis.

SLIDE 3

Goals of the Archive

  • Provide access to data.

    – Traditional search/download limited by bandwidth.
    – Need to move to server-side tools for the largest ALMA datasets.

  • Provide rich metadata to allow complicated queries/data mining

    – ALMA has a lot of data (~200 TB today), but with very low information content (~10⁻⁴%).
    – To change it to “Big Data” that we can mine using data science techniques we need to extract the information from the noise. (Specific examples: source and line lists.)
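
As a rough illustration of the figures quoted on this slide, the sketch below converts the stated information fraction into a data volume. It takes the ~200 TB and ~10⁻⁴% values at face value and assumes decimal units (1 TB = 10^12 bytes); the function and variable names are illustrative only.

```python
# Back-of-the-envelope estimate using the figures quoted on the slide.
ARCHIVE_SIZE_TB = 200.0        # ~200 TB of ALMA data (from the slide)
INFO_FRACTION_PERCENT = 1e-4   # ~10^-4 % information content (from the slide)

BYTES_PER_TB = 1e12            # assumption: decimal terabytes
BYTES_PER_MB = 1e6             # assumption: decimal megabytes


def information_volume_mb(size_tb: float, fraction_percent: float) -> float:
    """Return the volume of 'information' in megabytes."""
    fraction = fraction_percent / 100.0  # convert percent to a plain fraction
    return size_tb * BYTES_PER_TB * fraction / BYTES_PER_MB


info_mb = information_volume_mb(ARCHIVE_SIZE_TB, INFO_FRACTION_PERCENT)
print(f"Approximate information volume: {info_mb:.0f} MB")
```

Under these assumptions the "information" amounts to a few hundred megabytes, which is why compact products such as source and line lists are natural targets for data mining.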

SLIDE 4

Short-term Archive Developments

  • You may have noticed “collapsed rows” in the latest release.

    – These were needed as a prerequisite for the ingest of individual pipeline products in Cycle 4.
    – Tests of this will begin in September.
    – Current product “tar blobs” will be phased out.

  • Coming soon (during Cycle 4):

    – Footprints via Aladin Lite
    – RMS values
    – Upload of target lists to search on.
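
To make the idea of searching on an uploaded target list concrete, here is a minimal sketch of a positional match between a target list and a set of observed pointings. It is not the archive's implementation: the function names, the tuple layout, the coordinates, and the search radius are all invented for illustration, and the separation uses the standard haversine formula.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding just above the valid asin range.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def match_targets(targets, observations, radius_deg):
    """Map each target name to the ids of observations within radius_deg."""
    matches = {}
    for name, ra, dec in targets:
        matches[name] = [
            obs_id
            for obs_id, obs_ra, obs_dec in observations
            if angular_separation_deg(ra, dec, obs_ra, obs_dec) <= radius_deg
        ]
    return matches


# Hypothetical inputs: (name, RA in deg, Dec in deg). Not real archive entries.
targets = [("target_A", 150.0, 2.2), ("target_B", 10.0, -30.0)]
observations = [
    ("obs_1", 150.001, 2.2005),
    ("obs_2", 200.0, 45.0),
    ("obs_3", 10.0, -30.002),
]

matches = match_targets(targets, observations, radius_deg=0.005)
print(matches)
```

A production service would more likely use a spatial index than this brute-force loop, but the matching criterion would be the same.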

SLIDE 5

Current projects

  • Access to data – even very large files

    – CARTA: remote visualization capability (Erik Rosolowsky)
    – Pipeline Processing Interface (PPI): remote pipeline runs to deliver calibrated measurement sets and/or images (NRAO initiative; this talk).

  • From “a lot of data” to “Big Data”

    – ADMIT enhanced metadata production (Peter Teuben)

SLIDE 6

PPI

  • The Pipeline Processing Interface will allow reruns of the ALMA pipeline. Two modes:

    – Apply existing archival calibration tables to ALMA raw data to produce calibrated measurement sets.
        • Useful if you just need calibrated uv-data.
    – Run the current ALMA pipeline version to produce calibrated measurement sets and/or images.
        • Useful for running data with a new version of the pipeline.
        • Will also allow for some pipeline parameter tweaks (the set of tunable parameters will expand with time).

  • Initially available as part of the new NRAO archive access tool; will also be made available to all ARCs as an add-on to the request handler.

SLIDE 7

RH/OODT infrastructure for the PPI

  • The PPI uses a modified ALMA request handler and the “Object Oriented Data Technology” (OODT) framework to kick off jobs on the cluster.

  • This can be generalized to other tasks, e.g. analysis tools, visualization tasks etc.

  • It therefore provides a framework for server-side deployment of software from the ALMA Development program.

SLIDE 8

Creating rich metadata

  • Accurate prior information is crucial

    – Source positions & spatial extents
    – Source velocities/redshifts
    – (Targeted lines)

  • Some of this is supplied by the PI in the OT, some not

    – Need to supplement with third-party sources (SIMBAD, NED)
    – But there is the problem of validation, and of how to update
    – We are working on it, but slowly.

  • Source/clump finders in 2D and 3D (Jeff’s talk).

SLIDE 9

What else might we need/want?

  • Predictive searches (because you selected this dataset you may also be interested in…) (Barrientos, JAO).

  • Cutout server (for extracting small pieces of large cubes).

  • VO interoperability, e.g.

    – Search multiwavelength archives for other data on an ALMA field.
    – Upload ALMA pointings/footprints to e.g. DS9 or TOPCAT (SAMP).
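
The cutout server mentioned above comes down to selecting a small index range along each axis of a large cube. The sketch below shows one way that bookkeeping could look; it is an illustration only, and the function names, the (channel, y, x) axis order, and the example cube dimensions are assumptions rather than anything specified in the talk.

```python
from typing import Sequence, Tuple


def cutout_bounds(center: int, half_width: int, axis_length: int) -> Tuple[int, int]:
    """Return [start, stop) pixel indices for one axis, clamped to the axis."""
    start = max(0, center - half_width)
    stop = min(axis_length, center + half_width + 1)
    if start >= stop:
        raise ValueError("cutout lies entirely outside the axis")
    return start, stop


def cutout_slices(
    shape: Sequence[int], center: Sequence[int], half_widths: Sequence[int]
) -> Tuple[slice, ...]:
    """Build one slice per axis for a cutout around `center`."""
    return tuple(
        slice(*cutout_bounds(c, h, n))
        for c, h, n in zip(center, half_widths, shape)
    )


# Hypothetical cube: 4000 channels, 2048 x 2048 pixels, axis order (chan, y, x).
cube_shape = (4000, 2048, 2048)
slices = cutout_slices(cube_shape, center=(100, 1024, 2040), half_widths=(50, 32, 32))

# Number of voxels in the cutout versus the full cube.
cutout_voxels = 1
for s in slices:
    cutout_voxels *= s.stop - s.start
full_voxels = cube_shape[0] * cube_shape[1] * cube_shape[2]
print(slices, cutout_voxels, full_voxels)
```

The resulting tuple can index any array-like object that supports multidimensional slicing, so only the requested sub-cube needs to be read and transferred.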

SLIDE 10

Summary

  • We strongly encourage development proposals for the ALMA archive.

  • Proposers do need to work closely with the Archive working group and the software development team in Garching to ensure successful integration of products/services.
