VIRTUAL OBSERVATORY TECHNOLOGIES
Tamás Budavári / The Johns Hopkins University
7/30/2010
Moore’s Law, Big Data!
Outline
SQL for Big Data
Computing where the bytes are
Database and GPU integration
CUDA from SQL
Data intensive Web services
Behind the scenes
Working examples
Sloan Digital Sky Survey
Virtual Observatory tools and services
How it works
How to build it
How to use it
What’s next
Atomic services
Access to observations, simulations; access to models
Higher level services
Combine for more functionality
User and analysis tools
Can be a high level service, too
Blobs: images, spectra, etc.
Access, transfer
Catalogs
Fast searches, indexes
SQL’92 standard
Almost in English: SELECT <columns> FROM <table> WHERE <conditions>
Astronomical Data Query Language
An extended subset; GIS-like spatial
Almost in English: SELECT RA, Dec FROM Stars WHERE r < 15
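The query above can be tried without a survey database. The sketch below runs it against an in-memory SQLite table from Python; the `Stars` table layout and the three sample rows are assumptions made up for illustration, not data from any archive.

```python
import sqlite3

# In-memory database; the Stars table and its rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stars (RA REAL, Dec REAL, r REAL)")
conn.executemany(
    "INSERT INTO Stars VALUES (?, ?, ?)",
    [
        (180.10, 2.50, 14.2),  # brighter than r = 15, should be returned
        (180.25, 2.61, 17.8),  # fainter than r = 15, should be filtered out
        (181.02, 3.05, 12.9),  # brighter than r = 15, should be returned
    ],
)

# The query from the slide, unchanged.
bright = conn.execute("SELECT RA, Dec FROM Stars WHERE r < 15").fetchall()
for ra, dec in bright:
    print(f"RA={ra:.2f}  Dec={dec:.2f}")
```

Magnitudes are smaller for brighter objects, so `r < 15` keeps the two bright stars and drops the faint one.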
Sources in observation fields: 2 tables
Computed columns
Use J-H in SELECT and/or WHERE; similarly functions, e.g., POWER(10,-0.4*Rmag)
Grouping
SELECT FieldID, AVG(J), STDEV(J) FROM Sources GROUP BY FieldID
Can be used for histogramming, etc.; e.g., SDSS Catalog Archive
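A minimal sketch of computed columns, grouping, and histogramming, again using SQLite from Python. The `Sources` rows are invented sample data. SQLite has no `STDEV` and may lack `POWER` depending on the build, so both are registered here as user-defined functions; that registration is an assumption of this sketch, not something a server with these functions built in would need.

```python
import sqlite3
import statistics

class Stdev:
    """Sample standard deviation aggregate, standing in for a built-in STDEV."""
    def __init__(self):
        self.values = []
    def step(self, value):
        if value is not None:
            self.values.append(value)
    def finalize(self):
        return statistics.stdev(self.values) if len(self.values) > 1 else None

conn = sqlite3.connect(":memory:")
conn.create_aggregate("STDEV", 1, Stdev)
conn.create_function("POWER", 2, lambda base, exp: float(base) ** exp)

# Made-up sample rows: (FieldID, J, H, Rmag).
conn.execute("CREATE TABLE Sources (FieldID INTEGER, J REAL, H REAL, Rmag REAL)")
conn.executemany("INSERT INTO Sources VALUES (?, ?, ?, ?)", [
    (1, 14.0, 13.5, 15.0),
    (1, 15.0, 14.2, 16.0),
    (1, 16.0, 15.9, 17.5),
    (2, 13.0, 12.4, 14.0),
    (2, 13.5, 13.1, 15.0),
])

# Computed columns: the J-H color in both SELECT and WHERE, plus a function call.
red = conn.execute(
    "SELECT FieldID, J - H AS JH, POWER(10, -0.4 * Rmag) AS flux "
    "FROM Sources WHERE J - H > 0.5"
).fetchall()

# Grouping: per-field mean and scatter (ORDER BY added for a stable row order).
grouped = conn.execute(
    "SELECT FieldID, AVG(J), STDEV(J) FROM Sources "
    "GROUP BY FieldID ORDER BY FieldID"
).fetchall()

# Histogramming: group by a computed bin index (1-magnitude-wide bins in J).
histogram = conn.execute(
    "SELECT CAST(J AS INTEGER) AS Jbin, COUNT(*) FROM Sources "
    "GROUP BY Jbin ORDER BY Jbin"
).fetchall()

print(red, grouped, histogram, sep="\n")
```

The histogram is just a `GROUP BY` on a computed bin index, which is why the same pattern scales to catalog-sized tables when the database does the counting.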
Sloan Digital Sky Survey
2001-2008; 8TB Catalog Archive Server; custom tools and indices
Upcoming Surveys
PanSTARRS: 100TB, 2010-; LSST: 1PB+, 201?
In the number of cores
Faster than ever (for now)
100s of cores – 27k parallel threads per GPU
Running a billion threads a second
Forget the fancy old algorithms
Built on wrong assumptions
Today CPU is free, RAM is slow
GPU has >50GB/s bandwidth
Still difficult to occupy the cores
[Diagram: launch, run, sync]
Dedicated service for direct access
Shared memory IPC w/ on-the-fly data transform
Correlation functions
From pair-counts
State of the art
Dual-tree traversal
High resolution bins?
Just like brute force
8 bins
800×800 bins
Pair counts computed on GPU
Returns 2D histogram as a table (i, j, cts)
Calculate the correlation function in SQL
Can also do async parallel GPU jobs
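The division of labor described above can be sketched on the CPU. Here a brute-force Python loop stands in for the GPU kernel and returns a 2D histogram as `(i, j, cts)` rows; SQLite then computes a correlation estimate per bin. The binning by |dx| and |dy|, the uniform random points, and the choice of the simple DD/RR "natural" estimator are all assumptions of this sketch; the slides do not say which binning or estimator is used.

```python
import random
import sqlite3
from collections import Counter
from itertools import combinations

def pair_counts(points, bin_width, n_bins):
    """Brute-force 2D histogram of pair separations: (i, j) -> count."""
    counts = Counter()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        i = int(abs(x1 - x2) / bin_width)
        j = int(abs(y1 - y2) / bin_width)
        if i < n_bins and j < n_bins:
            counts[(i, j)] += 1
    return counts

# Made-up inputs: a "data" set and a larger "random" comparison set.
rng = random.Random(42)
data = [(rng.random(), rng.random()) for _ in range(200)]
rand = [(rng.random(), rng.random()) for _ in range(400)]
n_bins, bin_width = 8, 0.05

# Store each histogram as a table of (i, j, cts) rows.
conn = sqlite3.connect(":memory:")
for name, pts in (("DD", data), ("RR", rand)):
    conn.execute(f"CREATE TABLE {name} (i INTEGER, j INTEGER, cts INTEGER)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (?, ?, ?)",
        [(i, j, c) for (i, j), c in pair_counts(pts, bin_width, n_bins).items()],
    )

# Natural estimator per bin: xi = (DD / RR) * N_R(N_R - 1) / (N_D(N_D - 1)) - 1.
norm = (len(rand) * (len(rand) - 1)) / (len(data) * (len(data) - 1))
xi = conn.execute(
    """
    SELECT DD.i, DD.j, (1.0 * DD.cts / RR.cts) * ? - 1.0 AS xi
    FROM DD JOIN RR ON DD.i = RR.i AND DD.j = RR.j
    ORDER BY DD.i, DD.j
    """,
    (norm,),
).fetchall()
print(xi[:3])
```

Only the pair counting is expensive (quadratic in the number of points); the per-bin arithmetic is a small join, which is why it can stay in SQL.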
Exponential growth
Projects last 3-5 years, data sent upwards at the end; data will never be centralized
Most data at projects
More responsibility on projects; bring analysis close to the data
Metcalfe’s Law
Utility of computer networks grows as the
The Virtual Observatory
The federation of N astronomy archives has
Metadata standards
Data discovery
Data requests
Data delivery
Units
Database queries
Distributed applications
Authentication and authorization
NVO Research 2002-2007
NSF ITR Program: $10M for 5 years; 17 organizations: Astro, CS, IT
VAO Facility 2010-
NSF $20M for 5 years; operational phase!
VOTable
Universal container for tables (in XML); first VO standard (from the DTD era)
ConeSearch
Simple catalog access based on location; first VO standard interface (HTTP GET)
Many implemented them!
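Both standards above fit in a few lines of Python. The sketch builds a ConeSearch request, whose `RA`, `DEC` and `SR` parameters are in decimal degrees, and reads rows out of a VOTable. The service address `http://example.org/cone` and the sample document are hypothetical, and the sample omits the XML namespace that real VOTables usually declare, so the element lookups here would need adjusting for real responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def cone_search_url(base_url, ra, dec, sr):
    """Build a ConeSearch HTTP GET URL; RA, DEC and SR are in decimal degrees."""
    return base_url + "?" + urlencode({"RA": ra, "DEC": dec, "SR": sr})

def read_votable(xml_text):
    """Return (column names, rows) from the first TABLE of a VOTable document."""
    root = ET.fromstring(xml_text)
    table = root.find(".//TABLE")
    names = [field.get("name") for field in table.findall("FIELD")]
    rows = [[td.text for td in tr.findall("TD")] for tr in table.iter("TR")]
    return names, rows

# Hand-written sample response (no namespace, two made-up sources).
SAMPLE = """<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE>
      <FIELD name="id" datatype="char" arraysize="*"/>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA><TABLEDATA>
        <TR><TD>star-1</TD><TD>180.10</TD><TD>2.50</TD></TR>
        <TR><TD>star-2</TD><TD>180.12</TD><TD>2.48</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

url = cone_search_url("http://example.org/cone", 180.1, 2.5, 0.05)
names, rows = read_votable(SAMPLE)
print(url)
print(names, rows)
```

In practice the URL would be fetched with any HTTP client and the response body passed to `read_votable`; the sketch keeps the two steps separate so it runs offline.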
Simple Image Access Protocol (SIAP)
HTTP request, similar to opening a web page; returns links to the matching images in a VOTable; assumes we know how to deal with FITS images
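A short sketch of the SIAP round trip described above: build the request, then pull the image links out of the returned table. The `POS`, `SIZE` and `FORMAT` parameters and the `VOX:Image_AccessReference` UCD follow version 1 of the protocol; the service address, the sample response, and its image URLs are hypothetical, and the sample again omits the XML namespace.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# UCD that marks the image-link column in a SIAP v1 response.
ACCESS_UCD = "VOX:Image_AccessReference"

def siap_url(base_url, ra, dec, size, fmt="image/fits"):
    """Build a SIAP query: POS is 'ra,dec' and SIZE is the region width, in degrees."""
    params = {"POS": f"{ra},{dec}", "SIZE": size, "FORMAT": fmt}
    return base_url + "?" + urlencode(params, safe=",/")

def image_links(xml_text):
    """Return the access URLs listed in the first TABLE of the response."""
    table = ET.fromstring(xml_text).find(".//TABLE")
    ucds = [field.get("ucd") for field in table.findall("FIELD")]
    col = ucds.index(ACCESS_UCD)  # locate the link column by its UCD, not its name
    return [tr.findall("TD")[col].text for tr in table.iter("TR")]

# Hand-written sample response with two made-up images.
SAMPLE = """<VOTABLE version="1.1">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="title" ucd="VOX:Image_Title" datatype="char" arraysize="*"/>
      <FIELD name="url" ucd="VOX:Image_AccessReference" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA>
        <TR><TD>field-a</TD><TD>http://example.org/img/a.fits</TD></TR>
        <TR><TD>field-b</TD><TD>http://example.org/img/b.fits</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

query = siap_url("http://example.org/siap", 180.1, 2.5, 0.2)
links = image_links(SAMPLE)
print(query)
print(links)
```

Looking the column up by UCD rather than by name is what lets one client work against archives that label their columns differently.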
Universal Content Descriptor (UCD)
Crystallized set of keywords from literature; for data discovery, not queries
Discovery
Directory, sky coverage
Access
Tables, catalogs; images, spectra; events
Distributed Storage
VOSpace; authentication
Distributed Computing
Web & Grid services; VOStat
Messaging
SAMP, VOPipe
User Interfaces
Aladin; Topcat; Mirage, etc.
Collect info in VO
On a particular object, or a part of the sky; GRBs, transients, etc.
VO plotting tools
FITS images; catalog data
And more…
Public repository
Search by keyword or eff; extract in various formats; register & submit yours
Web site
On-the-fly plotting; easy access to all
Web services
To code against
Public repository
SDSS, 2dF spectra, etc.; spatial and SQL search; register & submit yours
Web site
On-the-fly plotting; building composites; de-reddening; line analysis
Web services
SkyNode interface to archives
Implements ADQL, returns VOTable; basic node understands “REGION”; full node understands “XMATCH”
SkyQuery portal
Knows the SkyNodes from the Registry; understands federated queries
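To make the idea of a cross-match between two archives concrete, the sketch below pairs sources from two small catalogs by position. It is not the algorithm behind “XMATCH”: it swaps in a plain nearest-neighbor match within a fixed radius, and the catalogs, identifiers, and 3 arcsecond radius are made up for illustration.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))

def crossmatch(cat_a, cat_b, radius_arcsec):
    """For each source in cat_a, find the nearest cat_b source within the radius.

    Returns (id_a, id_b, separation in arcsec) tuples; unmatched sources are dropped.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        best = None
        for id_b, ra_b, dec_b in cat_b:
            sep = angular_sep_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1] * 3600.0))
    return matches

# Made-up catalogs of (id, RA, Dec) in degrees, as two archives might return them.
optical = [("o1", 180.1000, 2.5000), ("o2", 180.2500, 2.6100)]
infrared = [("i1", 180.1002, 2.5001), ("i2", 181.0000, 3.0000)]

matches = crossmatch(optical, infrared, radius_arcsec=3.0)
print(matches)
```

The double loop is quadratic, which is fine for a handful of rows; a federated service would instead restrict each archive's query to a sky region first and rely on spatial indexes.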
Web Enabled Source-Identification with Crossmatching
Higher level astronomy services built on other existing VO services: SExtractor service and Open SkyQuery. Results can be sent to a plotting tool for quick inspection.
http://nvogre.astro.washington.edu:8080/wesix/
Enabling R
For VO data
Discovery
Simple HTTP requests
ConeSearch; Simple Image Access
Standard SOAP and REST
Interoperable across platforms; IVOA compliant XML messages; programming toolkits exist
VOTool
Storage instances soon everywhere
Save intermediate data products; arrange for their transfer to other places
VOPipe
Chain VOSpaces for data flow between services; async execution of custom processing steps
More and Moore data: new opportunities
No central data store but at projects; on-site processing: CPU + GPU
Hierarchical Services
Standardized interfaces; data federation
New “VxOs”
VaO: Virtual Astronomical Observatory VsO,