Easing the Pain of Astronomical Database Access Disclaimer: I - - PowerPoint PPT Presentation

easing the pain of astronomical database access disclaimer
SMART_READER_LITE
LIVE PREVIEW

Easing the Pain of Astronomical Database Access Disclaimer: I - - PowerPoint PPT Presentation

Easing the Pain of Astronomical Database Access Disclaimer: I usually present this software to non VO-savvy people What is VODb? A desktop application that aims to simplify the process of querying astronomical databases A little history


slide-1
SLIDE 1

Easing the Pain of Astronomical Database Access

slide-2
SLIDE 2

Disclaimer:

I usually present this software to non VO-savvy people

slide-3
SLIDE 3

What is VODb?

A desktop application that aims to simplify the process

  • f querying astronomical databases
slide-4
SLIDE 4

A little history

  • Started life as part of the Pan-STARRS project
  • Pan-STARRS has a lot of data* and a complex schema
  • Difficult for the novice to retrieve any meaningful data
  • (Difficult for experts to retrieve any meaningful data)
  • Many queries would never complete

* time-domain; 150 billion detections; 5.5 billion sources; >10xSDSS in first year

slide-5
SLIDE 5

Existing web interface

  • Pan-STARRS has an SDSS-like web interface
  • Offers a query box and a basic query builder

but...

  • Web-pages offer limited functionality
  • Results need to be manually downloaded...
  • ...then loaded into a desktop application for analysis
slide-6
SLIDE 6

Objectives of VODb

  • To make it easier to generate SQL
  • To utilise existing VO software to analyse the results
  • To provide the ability to work on data offline

In short: to enable science

slide-7
SLIDE 7

Aren't there exisiting VO database access tools?

slide-8
SLIDE 8

Yes, but...

Tools like (the excellent) TOPCAT enable database querying as an extra. VODb is a dedicated astronomical database access tool that tries to apply the UNIX philosophy of: "Do one thing and do it well"

slide-9
SLIDE 9

Nearly ideal, but...

  • Can't connect to Pan-STARRS (or other non-TAP Dbs)
  • Doesn't provide the higher level abstraction we wanted
  • We wanted a more 'fancy' drag 'n drop query builder
slide-10
SLIDE 10

What's so difficult about SQL?

slide-11
SLIDE 11

Lots of things are difficult about SQL

  • SQL is a programming language (an ancient one)
  • Not all astronomers are programmers
  • Standardised, but too late: dozens of dialects
  • Most vendors don't support the entire standard
  • Most vendors diverge from the standard (eg date/time)
  • Vendors have an incentive to remain non-standard
slide-12
SLIDE 12
  • In theory we should just ask for what we want
  • We should not have to specify how to get it

but...

  • Bad performance often enforces an imperative approach
  • Requires users to have knowledge of the implementation

This is definitely bad

But it's declarative, which is good...

slide-13
SLIDE 13

But we're stuck with SQL so...

slide-14
SLIDE 14
  • Whenever possible, create an abstraction layer so that

knowledge of the schema, indexes, keys, partitioning etc is not necessary

  • When it is necessary to write SQL directly, provide as

much help as possible (user-friendly GUIs)

Simplify the process whenever we can

slide-15
SLIDE 15
slide-16
SLIDE 16

What data can you access?

  • Any local JDBC-supported database
  • Any remote TAP database (SDSS, 2MASS etc)
  • Pan-STARRS (registered users only)
  • A local Derby database to store results
  • Design allows new connection types to be added easily
slide-17
SLIDE 17

VO compliant

  • SAMP
  • VOTable
  • Table Access Protocol (TAP)
slide-18
SLIDE 18

Some features

slide-19
SLIDE 19

A user-friendly double-click desktop application

slide-20
SLIDE 20

A straightforward, searchable connection manager

slide-21
SLIDE 21

SQL syntax highlighting and other text editing conveniences

slide-22
SLIDE 22

Drag 'n drop graphical query builder

slide-23
SLIDE 23

Astronomical date/time and coordinate converters

slide-24
SLIDE 24

Mask generator e.g. for SDSS data quality flags

slide-25
SLIDE 25

SAMP communication (pass data to TOPCAT etc)

slide-26
SLIDE 26

Local Derby database including full query history

slide-27
SLIDE 27

Query diagnostics to help users understand why they waited so long

slide-28
SLIDE 28

But those features still require writing SQL

slide-29
SLIDE 29

What are plug-in queries?

  • Think of them as sophisticated sample queries
  • (SDSS and PS: the “20 queries” design methodology)
  • Curated online (in XML) for instant deployment
  • Available through menus
  • Delivered as user-friendly GUIs
  • Constrained parameters reduce errors
  • Default values enable quick, successful queries
slide-30
SLIDE 30

Plug-in queries

slide-31
SLIDE 31

Why plug-in queries?

  • Using a query builder requires a thorough knowledge of

the schema

  • Sample queries are helpful, but cumbersome (copy-and-

paste, edit)

slide-32
SLIDE 32
slide-33
SLIDE 33

Example plug-in query XML

<query> <author>Millenium</author> <title>Tully-Fisher relation </title> <shortDescription> Find the Tully-Fisher relation... </shortDescription> <longDescription> Find the Tully-Fisher relation, Mag vs Vvir, for galaxies with a bulge/total mass ratio less than a given value </longDescription> <param> <name> PARAM_BULGE_MASS_RATIO </name> <description> Bulge/total mass ratio </description> <type> float</type> <max> 1</max> <min> 0</min> <default> 0.1</default> </param> <param> <name> PARAM_SNAPNUM </name> <description> Snapnum (redshift) </description> <type> int</type> <max> 63</max> <min> 0</min> <default> 41</default> </param> <sql>SELECT vVir, mag_b, mag_v, mag_i, mag_r, mag_k FROM millimil.DeLucia2006a WHERE (bulgeMass &lt; PARAM_BULGE_MASS_RATIO*stellarMass OR bulgeMass IS NULL) AND snapnum = PARAM_SNAPNUM </sql> </query>

slide-34
SLIDE 34

Learning by example

  • It's difficult to write good SQL
  • Plug-ins allow quick, successful querying
  • Allow users to adapt, enhance and learn by example
  • Help users improve SQL skills
  • Help users and become familiar with schema
slide-35
SLIDE 35

VODb key principle: Abstraction

Astronomers shouldn't have to worry about:

  • Database implementation (indexing, partitioning etc)
  • File formats (FITS, csv, VOTable etc)
  • Connection types (TAP, JDBC, Casjobs etc)

These are distractions from the science...

slide-36
SLIDE 36

Instead:

  • Users given a list of databases with meaningful names
  • Users do not interact with any files
  • Reduced need to understand the peculiarities of SQL

VODb key principle: Abstraction

slide-37
SLIDE 37

Some users

  • Popular with Pan-STARRS users
  • Some interest from LSST and VAO
  • Interest from Millennium simulation database
  • Possibility to use with Euclid prototype archive
slide-38
SLIDE 38

http://goo.gl/4X147