SLIDE 1

VIRTUAL OBSERVATORY TECHNOLOGIES

Tamás Budavári / The Johns Hopkins University

7/30/2010

SLIDE 2

Moore’s Law, Big Data!

SLIDE 3

Outline

- SQL for Big Data
  - Computing where the bytes are
- Database and GPU integration
  - CUDA from SQL
- Data-intensive Web services
  - Behind the scenes
- Working examples
  - Sloan Digital Sky Survey
  - Virtual Observatory tools and services

SLIDE 4

The Virtual Observatory

“The Virtual Observatory is a framework that enables new astronomical research by greatly enhancing access to worldwide data and computing resources.”
http://us-vo.org/

- How it works
- How to build it
- How to use it
- What’s next

SLIDE 5

Hierarchy of Services

- Atomic services
  - Access to observations and simulations
  - Access to models
- Higher-level services
  - Combine atomic services for more functionality
- User and analysis tools
  - Can themselves be high-level services, too

SLIDE 6

Heterogeneous Datasets

- Blobs: images, spectra, etc.
  - Access, transfer
- Catalogs
  - Fast searches, indexes
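
As a minimal sketch of the "fast searches, indexes" point for catalogs, the snippet below uses Python's standard-library `sqlite3` module as a stand-in for a production archive database. The `Stars` table, its synthetic rows, and the index name `ix_stars_r` are hypothetical. It asks SQLite for the query plan of a magnitude cut to check that the index, rather than a full table scan, serves the query.

```python
import sqlite3


def magnitude_cut_plan(limit: float = 15.0) -> str:
    """Return SQLite's query-plan text for a magnitude cut on an indexed column."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical catalog table with synthetic rows, for illustration only.
    conn.execute(
        "CREATE TABLE Stars (ObjID INTEGER PRIMARY KEY, RA REAL, Dec REAL, r REAL)"
    )
    conn.executemany(
        "INSERT INTO Stars (RA, Dec, r) VALUES (?, ?, ?)",
        [(10.0 + 0.01 * i, -5.0 + 0.01 * i, 12.0 + 0.1 * (i % 100)) for i in range(1000)],
    )
    # A B-tree index on the magnitude lets a range predicate skip most rows.
    conn.execute("CREATE INDEX ix_stars_r ON Stars (r)")
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT RA, Dec FROM Stars WHERE r < ?", (limit,)
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable description.
    return " | ".join(str(row[-1]) for row in plan)


print(magnitude_cut_plan())
```

Without the `CREATE INDEX` statement, SQLite has no option but to scan every row, which is why catalogs at survey scale depend on indexes.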

SLIDE 7

Structured Query Language

- SQL-92 standard
- Almost in English: SELECT <columns> FROM <table> WHERE <conditions>
- Astronomical Data Query Language (ADQL)
  - An extended subset of SQL
  - GIS-like spatial extensions

SLIDE 8

Structured Query Language

- SQL-92 standard
- Almost in English: SELECT RA, Dec FROM Stars WHERE r < 15
- Astronomical Data Query Language (ADQL)
  - An extended subset of SQL
  - GIS-like spatial extensions
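
The example query can be tried end to end with Python's built-in `sqlite3` module. This is a sketch only: the three rows of the hypothetical `Stars` table are invented, SQLite stands in for the archive's database server, and `ORDER BY RA` is added just to make the output deterministic.

```python
import sqlite3


def bright_stars(limit: float = 15.0) -> list[tuple[float, float]]:
    """Run SELECT RA, Dec FROM Stars WHERE r < limit on a toy table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Stars (RA REAL, Dec REAL, r REAL)")
    # Hypothetical rows: (RA [deg], Dec [deg], r-band magnitude).
    conn.executemany(
        "INSERT INTO Stars VALUES (?, ?, ?)",
        [(150.1, 2.2, 14.3), (150.4, 2.5, 17.8), (151.0, 1.9, 12.9)],
    )
    # The slide's query, with the magnitude limit passed as a bound parameter.
    rows = conn.execute(
        "SELECT RA, Dec FROM Stars WHERE r < ? ORDER BY RA", (limit,)
    ).fetchall()
    conn.close()
    return rows


print(bright_stars())  # [(150.1, 2.2), (151.0, 1.9)]
```

Passing the limit as a bound parameter (`?`) instead of pasting it into the SQL string keeps the query reusable and safe from injection.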

SLIDE 9

Joining Tables

- Sources in observation fields: 2 tables

SELECT f.FieldID, … s.ObjID, s.RA, s.Dec, …
FROM Fields AS f
INNER JOIN Sources AS s ON s.FieldID = f.FieldID
WHERE f.ExpTime > 1000 AND s.Rmag > 16
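
A runnable version of the join, again with SQLite as an illustrative stand-in. The `Fields` and `Sources` rows are hypothetical, and the slide's elided columns are reduced to a few named ones. Only the first source passes both the exposure-time cut on its field and the magnitude cut on the source itself.

```python
import sqlite3


def deep_faint_sources() -> list[tuple]:
    """Join sources to their fields and apply cuts on both tables."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and rows, for illustration only.
    conn.executescript(
        """
        CREATE TABLE Fields (FieldID INTEGER PRIMARY KEY, ExpTime REAL);
        CREATE TABLE Sources (ObjID INTEGER PRIMARY KEY, FieldID INTEGER,
                              RA REAL, Dec REAL, Rmag REAL);
        INSERT INTO Fields VALUES (1, 1200.0), (2, 600.0);
        INSERT INTO Sources VALUES
            (101, 1, 180.01, 0.52, 17.2),  -- long exposure, faint: kept
            (102, 1, 180.03, 0.55, 15.1),  -- long exposure, bright: dropped
            (201, 2, 181.10, 0.40, 18.0);  -- short exposure: dropped
        """
    )
    rows = conn.execute(
        """
        SELECT f.FieldID, s.ObjID, s.RA, s.Dec
        FROM Fields AS f
        INNER JOIN Sources AS s ON s.FieldID = f.FieldID
        WHERE f.ExpTime > 1000 AND s.Rmag > 16
        """
    ).fetchall()
    conn.close()
    return rows


print(deep_faint_sources())  # [(1, 101, 180.01, 0.52)]
```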

SLIDE 10

Calculations in SQL

- Computed columns
  - Use J-H in SELECT and/or WHERE
  - Similarly functions, e.g., POWER(10, -0.4*Rmag)
- Grouping

SELECT FieldID, AVG(J), STDEV(J) FROM Sources GROUP BY FieldID

  - Can be used for histogramming, etc.
  - E.g., in the SDSS Catalog Archive
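
The sketch below exercises computed columns, grouping, and a GROUP BY histogram in SQLite. SQLite may not ship `POWER` and has no `STDEV`, so the snippet registers Python stand-ins for both; `SampleStdev` is a hypothetical helper computing the sample standard deviation, and the table contents are invented.

```python
import math
import sqlite3
import statistics


class SampleStdev:
    """Hypothetical SQLite aggregate playing the role of STDEV (sample std. dev.)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:  # ignore NULLs, like built-in aggregates
            self.values.append(value)

    def finalize(self):
        # A sample standard deviation needs at least two values; otherwise NULL.
        return statistics.stdev(self.values) if len(self.values) > 1 else None


def run_examples():
    conn = sqlite3.connect(":memory:")
    # Register Python implementations under the SQL names used on the slide.
    conn.create_function("POWER", 2, math.pow)
    conn.create_aggregate("STDEV", 1, SampleStdev)

    # Hypothetical sources in two fields.
    conn.execute("CREATE TABLE Sources (FieldID INTEGER, J REAL, H REAL, Rmag REAL)")
    conn.executemany(
        "INSERT INTO Sources VALUES (?, ?, ?, ?)",
        [(1, 15.0, 14.5, 16.0), (1, 15.4, 14.8, 17.0),
         (2, 16.0, 15.0, 18.0), (2, 16.6, 15.9, 18.5)],
    )

    # Computed columns: a color (J-H) and a relative flux, used in SELECT and WHERE.
    red = conn.execute(
        "SELECT J - H, POWER(10, -0.4 * Rmag) FROM Sources "
        "WHERE J - H > 0.55 ORDER BY Rmag"
    ).fetchall()

    # Grouping: per-field mean and scatter of J.
    per_field = conn.execute(
        "SELECT FieldID, AVG(J), STDEV(J) FROM Sources GROUP BY FieldID ORDER BY FieldID"
    ).fetchall()

    # Histogram: GROUP BY a bin index (1-mag bins; CAST truncates toward zero).
    hist = conn.execute(
        "SELECT CAST(Rmag / 1.0 AS INTEGER) AS bin, COUNT(*) FROM Sources "
        "GROUP BY bin ORDER BY bin"
    ).fetchall()

    conn.close()
    return red, per_field, hist


red, per_field, hist = run_examples()
print(hist)  # [(16, 1), (17, 1), (18, 2)]
```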

SLIDE 11

Surveys in Astronomy

- Sloan Digital Sky Survey
  - 2001-2008
  - 8 TB Catalog Archive Server
  - Custom tools and indices
- Upcoming surveys
  - Pan-STARRS: 100 TB, 2010-
  - LSST: 1 PB+, 201?

SLIDE 12

New Moore’s Law

- In the number of cores
- Faster than ever (for now)

SLIDE 13

New Programming Paradigm

- 100s of cores, 27k parallel threads per GPU
  - Running a billion threads a second
- Forget the fancy old algorithms
  - Built on wrong assumptions
- Today CPU is free, RAM is slow
  - GPU has >50 GB/s bandwidth
  - Still difficult to occupy the cores

SLIDE 14

Hybrid Architecture

[Diagram: timeline with "launch", "run", and "sync" steps]
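
The launch/run/sync timeline can be mimicked on the CPU with Python's `concurrent.futures`. This is only an analogy: in CUDA, a kernel launch returns control to the host immediately and a call such as `cudaDeviceSynchronize()` waits for outstanding work, whereas here a thread pool plays the GPU and `fake_kernel` is a hypothetical placeholder for a real kernel.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(chunk):
    """Hypothetical stand-in for a GPU kernel: sum of squares over one chunk."""
    return sum(x * x for x in chunk)


def hybrid_run(data, n_tasks=4):
    """Launch work asynchronously, keep the host busy, then synchronize."""
    chunks = [data[k::n_tasks] for k in range(n_tasks)]
    with ThreadPoolExecutor(max_workers=n_tasks) as device:
        # "launch": submit() returns a future immediately, without waiting.
        futures = [device.submit(fake_kernel, chunk) for chunk in chunks]
        # "run": the host thread continues with its own work in the meantime.
        host_result = len(data)
        # "sync": result() blocks until each launched task has finished.
        device_result = sum(f.result() for f in futures)
    return host_result, device_result


print(hybrid_run(list(range(10))))  # (10, 285)
```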

SLIDE 15

Extending SQL Server

- Dedicated service for direct access
- Shared-memory IPC with on-the-fly data transform

[Diagram: SQL Server and the service connected via IPC]
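
A minimal sketch of shared-memory IPC with an on-the-fly layout change, using Python's `multiprocessing.shared_memory` (Python 3.8+). It assumes the transform is a row-to-column reshuffle, which suits the contiguous arrays GPUs prefer; the function names are hypothetical, and for brevity the reader attaches from the same process, whereas the dedicated service on the slide runs alongside SQL Server as a separate process.

```python
import struct
from multiprocessing import shared_memory


def publish_columns(rows):
    """Writer (hypothetical): transform (ra, dec) rows into two contiguous float64
    columns, all RA values then all Dec values, inside a new shared-memory segment."""
    n = len(rows)
    shm = shared_memory.SharedMemory(create=True, size=max(1, 16 * n))
    struct.pack_into(f"<{n}d", shm.buf, 0, *(ra for ra, _ in rows))
    struct.pack_into(f"<{n}d", shm.buf, 8 * n, *(dec for _, dec in rows))
    return shm


def read_column(name, n, column):
    """Reader (hypothetical): attach by name and copy out column 0 (RA) or 1 (Dec)."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return list(struct.unpack_from(f"<{n}d", shm.buf, 8 * n * column))
    finally:
        shm.close()


rows = [(10.5, -1.25), (11.0, 2.5), (12.25, 0.75)]
segment = publish_columns(rows)
try:
    decs = read_column(segment.name, len(rows), column=1)
finally:
    segment.close()
    segment.unlink()  # the creator removes the segment when done
print(decs)  # [-1.25, 2.5, 0.75]
```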

SLIDE 17

Spatial Statistics

- Correlation functions
  - From pair counts
- State of the art
  - Dual-tree traversal
- High-resolution bins?
  - Just like brute force

[Figure: 8 bins]
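
The brute-force baseline is easy to state in code: every pair is visited once, so the cost is quadratic in the number of points no matter how many bins are used. The sketch below assumes flat 2D coordinates (a small-field approximation) and, since the slide does not name an estimator, uses the common Landy-Szalay form ξ = (DD − 2DR + RR)/RR with each pair count normalized by its total number of pairs. It is a plain CPU illustration, not the dual-tree or GPU code from the talk.

```python
import math
from itertools import combinations, product


def pair_counts(a, b=None, edges=(0.0, 1.0, 2.0)):
    """Brute-force histogram of pair separations in bins [edges[k], edges[k+1]).
    With b=None, count each unordered pair within `a` once (auto pairs);
    otherwise count every (a, b) pair (cross pairs). Assumes flat 2D positions."""
    pairs = combinations(a, 2) if b is None else product(a, b)
    counts = [0] * (len(edges) - 1)
    for (x1, y1), (x2, y2) in pairs:
        d = math.hypot(x1 - x2, y1 - y2)
        for k in range(len(counts)):
            if edges[k] <= d < edges[k + 1]:
                counts[k] += 1
                break
    return counts


def landy_szalay(data, rand, edges):
    """xi = (DD - 2DR + RR) / RR with normalized counts; NaN where RR is empty."""
    nd, nr = len(data), len(rand)
    dd = pair_counts(data, edges=edges)
    dr = pair_counts(data, rand, edges=edges)
    rr = pair_counts(rand, edges=edges)
    xi = []
    for k in range(len(dd)):
        ddn = dd[k] / (nd * (nd - 1) / 2)
        drn = dr[k] / (nd * nr)
        rrn = rr[k] / (nr * (nr - 1) / 2)
        xi.append((ddn - 2 * drn + rrn) / rrn if rrn > 0 else float("nan"))
    return xi


# Hypothetical toy data and random catalogs.
data = [(0.0, 0.0), (0.5, 0.0)]
rand = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
print(landy_szalay(data, rand, edges=(0.0, 1.0, 4.0)))
```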

SLIDE 18

Sloan DR7

[Figure: 800×800 bins]

SLIDE 19

All Done Inside the Database

- Pair counts computed on the GPU
- Returns a 2D histogram as a table (i, j, cts)
- Calculate the correlation function in SQL
- Can also run asynchronous parallel GPU jobs
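
Once pair counts come back as (i, j, cts) tables, the estimator is a join away. This sketch assumes three such tables named `DD`, `DR`, and `RR` (hypothetical names) and evaluates the Landy-Szalay estimator per 2D bin in SQL, with SQLite standing in for SQL Server. Bins with no random pairs are skipped, and missing data counts are treated as zero.

```python
import sqlite3


def xi_table(dd, dr, rr, nd, nr):
    """Landy-Szalay xi per 2D bin from (i, j, cts) pair-count tables.
    nd, nr: numbers of data and random points, used for normalization."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables DD, DR, RR, shaped like a GPU pair counter's output.
    for name, rows in (("DD", dd), ("DR", dr), ("RR", rr)):
        conn.execute(f"CREATE TABLE {name} (i INTEGER, j INTEGER, cts INTEGER, PRIMARY KEY (i, j))")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    sql = """
        SELECT r.i, r.j,
               (COALESCE(d.cts, 0) / :ndd
                - 2.0 * COALESCE(c.cts, 0) / :ndr
                + r.cts / :nrr) / (r.cts / :nrr) AS xi
        FROM RR AS r
        LEFT JOIN DD AS d ON d.i = r.i AND d.j = r.j
        LEFT JOIN DR AS c ON c.i = r.i AND c.j = r.j
        WHERE r.cts > 0
        ORDER BY r.i, r.j
    """
    # Total numbers of DD, DR and RR pairs, as floats so SQLite divides in floating point.
    params = {"ndd": nd * (nd - 1) / 2.0, "ndr": float(nd * nr), "nrr": nr * (nr - 1) / 2.0}
    result = conn.execute(sql, params).fetchall()
    conn.close()
    return result


rows = xi_table(
    dd=[(0, 0, 1), (0, 1, 2)],
    dr=[(0, 0, 6), (0, 1, 6), (1, 1, 6)],
    rr=[(0, 0, 3), (0, 1, 3), (1, 0, 0), (1, 1, 6)],
    nd=2, nr=3,
)
print(rows)  # [(0, 0, 0.0), (0, 1, 1.0), (1, 1, 0.0)]
```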

SLIDE 21

Distributed Data

SLIDE 22

Data at the Projects

- Exponential growth
  - Projects last 3-5 years; data are sent upward at the end
  - Data will never be centralized
- Most data stay at the projects
  - More responsibility on the projects
  - Bring the analysis close to the data

SLIDE 23

SLIDE 24

Data Federation

- Metcalfe’s Law
  - The utility of a computer network grows as the number of possible connections: O(N²)
- The Virtual Observatory
  - A federation of N astronomy archives has utility O(N²), measured in possibilities for making discoveries
  - The whole is more than the sum of its parts
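
Reading "possible connections" as unordered pairs of archives makes the O(N²) scaling concrete; this pair-counting interpretation is an assumption about how the law is applied here.

```latex
C(N) = \binom{N}{2} = \frac{N(N-1)}{2} = O(N^2),
\qquad C(10) = 45, \qquad C(20) = 190
```

Doubling the number of federated archives from 10 to 20 thus roughly quadruples the number of possible pairings, while the number of archives only doubles.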

SLIDE 25

Interoperability Challenges

- Metadata standards
- Data discovery
- Data requests
- Data delivery
- Units
- Database queries
- Distributed applications
- Authentication and authorization

SLIDE 26

US National Virtual Observatory

- NVO research, 2002-2007
  - NSF ITR program: $10M for 5 years
  - 17 organizations: astronomy, CS, IT
- VAO facility, 2010-
  - NSF: $20M for 5 years
  - Operational phase!

http://us-vo.org/

SLIDE 27

http://ivoa.net/

SLIDE 29

IVOA Specifications

SLIDE 30

First Standards

- VOTable
  - Universal container for tables (in XML)
  - First VO standard (from the DTD era)
- ConeSearch
  - Simple catalog access based on location
  - First VO standard interface (HTTP GET)
- Many implemented them!
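
A minimal client-side sketch of both standards, using only Python's standard library. The cone-search endpoint `https://example.org/scs` is a placeholder, the parser handles only the plain TABLEDATA serialization of a VOTable (not BINARY or FITS), and it ignores XML namespaces so that older and newer VOTable versions read alike.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Simple Cone Search request: HTTP GET with RA, DEC and SR, all in degrees."""
    sep = "&" if "?" in base_url else "?"
    return base_url + sep + urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})


def _local(tag):
    """Strip an XML namespace, e.g. '{http://...}TABLE' -> 'TABLE'."""
    return tag.rsplit("}", 1)[-1]


def votable_rows(xml_text):
    """Read the first TABLE of a VOTable (TABLEDATA only) into a list of dicts
    keyed by FIELD name; values stay as strings."""
    root = ET.fromstring(xml_text)
    table = next(e for e in root.iter() if _local(e.tag) == "TABLE")
    names = [f.get("name") for f in table if _local(f.tag) == "FIELD"]
    rows = []
    for tr in (e for e in table.iter() if _local(e.tag) == "TR"):
        cells = [td.text or "" for td in tr if _local(td.tag) == "TD"]
        rows.append(dict(zip(names, cells)))
    return rows


# Hypothetical one-row response, for illustration only.
SAMPLE = """<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE>
    <TABLE>
      <FIELD name="id" datatype="char" arraysize="*"/>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA><TABLEDATA>
        <TR><TD>src1</TD><TD>180.01</TD><TD>0.52</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

print(cone_search_url("https://example.org/scs", 180.0, 0.5, 0.1))
# https://example.org/scs?RA=180.0&DEC=0.5&SR=0.1
print(votable_rows(SAMPLE))  # [{'id': 'src1', 'ra': '180.01', 'dec': '0.52'}]
```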

SLIDE 31

Early Standards

- Simple Image Access Protocol (SIAP)
  - HTTP request, similar to opening a web page
  - Returns links to the matching images in a VOTable
  - Assumes we know how to deal with FITS images
- Unified Content Descriptor (UCD)
  - Crystallized set of keywords from the literature
  - For data discovery, not queries
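
The sketch below builds a SIAP 1.0 request (POS and SIZE in degrees, plus FORMAT) and then reads the returned VOTable, locating the image-URL column by its UCD rather than by a server-specific column name. The endpoint and the response contents are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# In SIAP 1.0 results, the column holding each image's URL carries this UCD.
ACCESS_UCD = "VOX:Image_AccessReference"


def siap_url(base_url, ra_deg, dec_deg, size_deg, fmt="image/fits"):
    """SIAP 1.0 query: POS=ra,dec and SIZE in degrees, plus a FORMAT filter."""
    params = {"POS": f"{ra_deg},{dec_deg}", "SIZE": size_deg, "FORMAT": fmt}
    return base_url + ("&" if "?" in base_url else "?") + urlencode(params)


def _local(tag):
    """Strip an XML namespace, if any."""
    return tag.rsplit("}", 1)[-1]


def image_links(votable_xml):
    """Return the access URLs, finding the column by its UCD rather than its name."""
    root = ET.fromstring(votable_xml)
    table = next(e for e in root.iter() if _local(e.tag) == "TABLE")
    fields = [f for f in table if _local(f.tag) == "FIELD"]
    col = next(k for k, f in enumerate(fields) if f.get("ucd") == ACCESS_UCD)
    links = []
    for tr in (e for e in table.iter() if _local(e.tag) == "TR"):
        cells = [td for td in tr if _local(td.tag) == "TD"]
        links.append(cells[col].text)
    return links


# Hypothetical SIAP response with one image, for illustration only.
RESPONSE = """<VOTABLE version="1.1">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="title" ucd="VOX:Image_Title" datatype="char" arraysize="*"/>
      <FIELD name="link" ucd="VOX:Image_AccessReference" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA>
        <TR><TD>r-band cutout</TD><TD>https://example.org/img/1.fits</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

print(siap_url("https://example.org/sia", 180.0, 0.5, 0.2))
print(image_links(RESPONSE))  # ['https://example.org/img/1.fits']
```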

SLIDE 32

Components

- Discovery: directory, sky coverage
- Access: tables, catalogs; images, spectra; events
- Distributed storage: VOSpace
- Authentication
- Distributed computing: Web & Grid services, VOStat
- Messaging: SAMP, VOPipe
- User interfaces: Aladin, Topcat, Mirage, etc.

SLIDE 33

VO Examples

VO Applications and Services

SLIDE 34

NVO Quick Start

SLIDE 35

Ready, Steady…

SLIDE 36

DataScope

- Collects information available in the VO
  - On a particular object
  - Or on a part of the sky
  - GRBs, transients, etc.
- VO plotting tools
  - FITS images
  - Catalog data
- And more…

SLIDE 37

Bandpass Services

- Public repository
  - Search by keyword or effective wavelength
  - Extract in various formats
  - Register and submit yours
- Web site
  - On-the-fly plotting
  - Easy access to all
- Web services
  - To code against
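
Searching by effective wavelength presupposes a definition; a simple one is the transmission-weighted mean wavelength, ∫λT(λ)dλ / ∫T(λ)dλ. The sketch below evaluates it with the trapezoid rule on a sampled transmission curve. This is one of several conventions (others weight by photon counts or by a source spectrum), so it is not necessarily the definition the bandpass service uses.

```python
def effective_wavelength(wavelengths, transmission):
    """Transmission-weighted mean wavelength of a sampled bandpass curve,
    integral(lambda * T) / integral(T), via the trapezoid rule.
    (One assumed convention among several.)"""
    if len(wavelengths) != len(transmission) or len(wavelengths) < 2:
        raise ValueError("need at least two (wavelength, transmission) samples")
    num = den = 0.0
    for k in range(len(wavelengths) - 1):
        step = wavelengths[k + 1] - wavelengths[k]
        num += 0.5 * step * (wavelengths[k] * transmission[k]
                             + wavelengths[k + 1] * transmission[k + 1])
        den += 0.5 * step * (transmission[k] + transmission[k + 1])
    return num / den


# Hypothetical triangular filter peaking at 6000 (units follow the input, e.g. Angstrom).
print(effective_wavelength([5000.0, 6000.0, 7000.0], [0.0, 1.0, 0.0]))  # 6000.0
```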

SLIDE 38

Spectrum Services

- Public repository
  - SDSS, 2dF spectra, etc.
  - Spatial and SQL search
  - Register and submit yours
- Web site
  - On-the-fly plotting
  - Building composites
  - De-reddening
  - Line analysis
- Web services

SLIDE 39

Open SkyQuery

- SkyNode interface to archives
  - Implements ADQL, returns VOTable
  - A basic node understands “REGION”
  - A full node understands “XMATCH”
- SkyQuery portal
  - Knows the SkyNodes from the Registry
  - Understands federated queries

http://openskyquery.net/
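
As a rough illustration of what REGION and XMATCH compute, the sketch below implements the simplest positional cross-match: for each source, the nearest counterpart within a fixed radius, using great-circle separations. It is a brute-force nearest-neighbour match over hypothetical catalogs, not the SkyQuery XMATCH algorithm, which runs as a distributed query across SkyNodes. A REGION-style cone cut is the same separation test against a single center.

```python
import math


def ang_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, accurate at small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def crossmatch(cat_a, cat_b, radius_arcsec):
    """Brute-force match: for each (id, ra, dec) in cat_a, the id of the nearest
    cat_b source within radius_arcsec, or None if there is none."""
    radius_deg = radius_arcsec / 3600.0
    matches = {}
    for id_a, ra_a, dec_a in cat_a:
        best_id, best_sep = None, radius_deg
        for id_b, ra_b, dec_b in cat_b:
            sep = ang_sep_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_id, best_sep = id_b, sep
        matches[id_a] = best_id
    return matches


# Hypothetical catalogs: (id, RA [deg], Dec [deg]).
optical = [("a1", 180.0, 0.0), ("a2", 10.0, 45.0)]
infrared = [("b1", 180.0, 0.0002), ("b2", 180.0, 0.01), ("b3", 90.0, 0.0)]
print(crossmatch(optical, infrared, radius_arcsec=1.0))  # {'a1': 'b1', 'a2': None}
```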

SLIDE 40

WESIX

Web Enabled Source-Identification with Crossmatching

A higher-level astronomy service built on other existing VO services: a SExtractor service and Open SkyQuery. Results can be sent to a plotting tool for quick inspection.

http://nvogre.astro.washington.edu:8080/wesix/

SLIDE 41

VOStat

- Enabling R
  - For VO data

SLIDE 42

Sky Coverage

- Discovery

SLIDE 43

Transients: VOEvent

SLIDE 44

Help!

SLIDE 45

VO for Developers

Automated tools for analysis; advanced services

SLIDE 46

Web Services

- Simple HTTP requests
  - ConeSearch, Simple Image Access
- Standard SOAP and REST
  - Interoperable across platforms
  - IVOA-compliant XML messages
  - Programming toolkits exist

SLIDE 47

Command Line: VO-CLI

- VOTool

SLIDE 49

Future

New features; better integration

SLIDE 50

VOSpace 2.0

- Storage instances soon everywhere
  - Save intermediate data products
  - Arrange for their transfer to other places
- VOPipe
  - Chain VOSpaces for data flow between services
  - Async execution of custom processing steps
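
A toy sketch of chaining processing steps through storage nodes, assuming a plain dictionary as a stand-in for VOSpace. The real VOSpace interface is a web service, which this does not model. Each step reads the previous step's node and writes its own, `asyncio.to_thread` (Python 3.9+) runs the work off the event loop, and two independent chains run concurrently; all names are hypothetical.

```python
import asyncio


async def run_step(func, space, src, dst):
    """One processing step: read node `src`, compute in a worker thread, write node `dst`."""
    data = space[src]
    space[dst] = await asyncio.to_thread(func, data)


async def pipeline(space, steps):
    """Chain the steps so each one consumes the node the previous one produced."""
    src = "input"
    for k, func in enumerate(steps):
        dst = f"step{k}"
        await run_step(func, space, src, dst)
        src = dst
    return space[src]


async def main():
    # Plain dicts stand in for two storage spaces (hypothetical, not the VOSpace API).
    space_a = {"input": [3.0, 1.0, 2.0]}
    space_b = {"input": [9.0, 7.0]}
    # The two chains are independent, so they run concurrently on one event loop.
    return await asyncio.gather(pipeline(space_a, STEPS), pipeline(space_b, STEPS))


STEPS = [sorted, lambda xs: [2 * x for x in xs]]
results = asyncio.run(main())
print(results)  # [[2.0, 4.0, 6.0], [14.0, 18.0]]
```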

SLIDE 51

Summary

- More and Moore data: new opportunities
  - No central data store; data stay at the projects
  - On-site processing: CPU + GPU
- Hierarchical services
  - Standardized interfaces
  - Data federation
- New “VxOs”
  - VaO: Virtual Astronomical Observatory
  - VsO, …

SLIDE 52

Sites to Explore

slide-53
SLIDE 53

Tamás Budavári 7/30/2010

53