VIRTUAL OBSERVATORY TECHNOLOGIES Tamás Budavári / The Johns Hopkins University

  1. VIRTUAL OBSERVATORY TECHNOLOGIES Tamás Budavári / The Johns Hopkins University 7/30/2010

  2. Moore’s Law, Big Data!

  3. Outline
     • SQL for Big Data
     • Computing where the bytes are
     • Database and GPU integration
     • CUDA from SQL
     • Data intensive Web services
     • Behind the scenes
     • Working examples
     • Sloan Digital Sky Survey
     • Virtual Observatory tools and services

  4. The Virtual Observatory
     “The Virtual Observatory is a framework that enables new astronomical research by greatly enhancing access to worldwide data and computing resources.” http://us-vo.org/
     • How it works
     • How to build it
     • How to use it
     • What’s next

  5. Hierarchy of Services
     • Atomic services
     • Access to observations, simulations
     • Access to models
     • Higher level services
     • Combine for more functionality
     • User and analysis tools
     • Can be a high level service, too

  6. Heterogeneous Datasets
     • Blobs: images, spectra, etc.
     • Access, transfer
     • Catalogs
     • Fast searches, indexes

  7. Structured Query Language
     • SQL-92 standard
     • Almost in English:
         SELECT <columns> FROM <table> WHERE <conditions>
     • Astronomical Data Query Language (ADQL)
     • An extended subset
     • GIS-like spatial

  8. Structured Query Language
     • SQL-92 standard
     • Almost in English:
         SELECT RA, Dec FROM Stars WHERE r < 15
     • Astronomical Data Query Language (ADQL)
     • An extended subset
     • GIS-like spatial
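The query on this slide can be tried out locally. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the survey database; the function name `bright_stars` and the sample rows are made up for illustration and are not part of the talk.

```python
import sqlite3

# Hypothetical sample rows: (RA in degrees, Dec in degrees, r-band magnitude)
SAMPLE_STARS = [(10.5, -1.2, 14.3), (11.0, 0.4, 17.9), (12.25, 2.0, 12.1)]


def bright_stars(rows, limit=15.0):
    """Return (RA, Dec) for stars with r magnitude below `limit`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Stars (RA REAL, Dec REAL, r REAL)")
    conn.executemany("INSERT INTO Stars VALUES (?, ?, ?)", rows)
    # Same query as on the slide; the magnitude cut is passed as a parameter
    # and ORDER BY is added only to make the output order predictable.
    result = conn.execute(
        "SELECT RA, Dec FROM Stars WHERE r < ? ORDER BY RA", (limit,)
    ).fetchall()
    conn.close()
    return result
```

Smaller magnitudes mean brighter objects, so `r < 15` keeps the two brighter stars in the sample and drops the faint one.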

  9. Joining Tables
     • Sources in observation fields: 2 tables
         SELECT f.FieldID, … s.ObjID, s.RA, s.Dec, …
         FROM Fields AS f
         INNER JOIN Sources AS s ON s.FieldID = f.FieldID
         WHERE f.ExpTime > 1000 AND s.Rmag > 16
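A minimal sketch of the join above, again assuming sqlite3 as a stand-in database. The table layouts contain only the columns named on the slide (the columns elided there with "…" are left out), and the sample data is invented for illustration.

```python
import sqlite3

# Hypothetical sample data.
SAMPLE_FIELDS = [(1, 1500.0), (2, 600.0)]          # (FieldID, ExpTime)
SAMPLE_SOURCES = [                                  # (ObjID, FieldID, RA, Dec, Rmag)
    (101, 1, 10.0, 1.0, 17.2),
    (102, 1, 10.1, 1.1, 15.0),
    (103, 2, 20.0, -3.0, 18.5),
]


def faint_sources_in_long_exposures(fields, sources, min_exptime=1000, min_rmag=16):
    """Join Sources to Fields and keep faint sources from long exposures."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Fields (FieldID INTEGER PRIMARY KEY, ExpTime REAL)")
    conn.execute(
        "CREATE TABLE Sources (ObjID INTEGER PRIMARY KEY, FieldID INTEGER, "
        "RA REAL, Dec REAL, Rmag REAL)"
    )
    conn.executemany("INSERT INTO Fields VALUES (?, ?)", fields)
    conn.executemany("INSERT INTO Sources VALUES (?, ?, ?, ?, ?)", sources)
    rows = conn.execute(
        """
        SELECT f.FieldID, s.ObjID, s.RA, s.Dec
        FROM Fields AS f
        INNER JOIN Sources AS s ON s.FieldID = f.FieldID
        WHERE f.ExpTime > ? AND s.Rmag > ?
        ORDER BY s.ObjID
        """,
        (min_exptime, min_rmag),
    ).fetchall()
    conn.close()
    return rows
```

Only source 101 survives in the sample: source 102 is too bright, and source 103 sits in a field with a short exposure.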

  10. Calculations in SQL
      • Computed columns
      • Use J-H in SELECT and/or WHERE
      • Similarly functions, e.g., POWER(10,-0.4*Rmag)
      • Grouping:
          SELECT FieldID, AVG(J), STDEV(J) FROM Sources GROUP BY FieldID
      • Can use for histogramming, etc.
      • E.g., the SDSS Catalog Archive
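The computed columns, grouping and histogramming listed above can be sketched as follows, assuming sqlite3 as a stand-in for SQL Server. SQLite has no built-in STDEV and may lack POWER depending on how it was compiled, so both are registered here as Python functions; this is a workaround for the sketch, not how the SDSS archive does it. The helper names and sample rows are hypothetical.

```python
import math
import sqlite3

# Hypothetical sample rows: (FieldID, J, H, Rmag)
SAMPLE_SOURCES = [(1, 14.0, 13.5, 15.0), (1, 16.0, 15.0, 17.5), (2, 15.0, 14.8, 16.0)]


class Stdev:
    """Sample standard deviation (n - 1 in the denominator) as a SQL aggregate."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        n = len(self.values)
        if n < 2:
            return None  # undefined for fewer than two values
        mean = sum(self.values) / n
        return math.sqrt(sum((v - mean) ** 2 for v in self.values) / (n - 1))


def open_catalog(sources):
    conn = sqlite3.connect(":memory:")
    conn.create_function("POWER", 2, lambda base, exponent: math.pow(base, exponent))
    conn.create_aggregate("STDEV", 1, Stdev)
    conn.execute("CREATE TABLE Sources (FieldID INTEGER, J REAL, H REAL, Rmag REAL)")
    conn.executemany("INSERT INTO Sources VALUES (?, ?, ?, ?)", sources)
    return conn


def red_sources(conn, min_color=0.5):
    """Computed columns: the J-H color in SELECT and WHERE, plus a relative flux."""
    return conn.execute(
        "SELECT J - H AS color, POWER(10, -0.4 * Rmag) AS flux "
        "FROM Sources WHERE J - H > ?",
        (min_color,),
    ).fetchall()


def field_stats(conn):
    """Grouping: per-field mean and scatter of J, as on the slide."""
    return conn.execute(
        "SELECT FieldID, AVG(J), STDEV(J) FROM Sources GROUP BY FieldID ORDER BY FieldID"
    ).fetchall()


def magnitude_histogram(conn):
    """Histogram of J in 1-magnitude bins, built with GROUP BY."""
    return conn.execute(
        "SELECT CAST(J AS INTEGER) AS bin, COUNT(*) FROM Sources GROUP BY bin ORDER BY bin"
    ).fetchall()
```

The histogram uses integer truncation as the binning rule, which is adequate for positive magnitudes; a different bin width would replace the CAST with an expression such as `CAST(J / width AS INTEGER)`.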

  11. Surveys in Astronomy
      • Sloan Digital Sky Survey 2001-2008
      • 8TB Catalog Archive Server
      • Custom tools and indices
      • Upcoming Surveys
      • PanSTARRS: 100TB 2010-
      • LSST: 1PB+ 201?

  12. New Moore’s Law
      • In the number of cores
      • Faster than ever (for now)

  13. New Programming Paradigm
      • 100s of cores, 27k parallel threads per GPU
      • Running a billion threads a second
      • Forget the fancy old algorithms
      • Built on wrong assumptions
      • Today CPU is free, RAM is slow
      • GPU has >50GB/s bandwidth
      • Still difficult to occupy the cores

  14. Hybrid Architecture
      [Figure; labels: run, launch, sync]

  15. Extending SQL Server
      • Dedicated service for direct access
      • Shared memory IPC with on-the-fly data transform
      [Figure; labels: IPC, SQL]

  17. Spatial Statistics
      • Correlation functions
      • From pair-counts, 8 bins
      • State of the art
      • Dual-tree traversal
      • High resolution bins?
      • Just like brute force
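To make "from pair-counts" concrete, here is a minimal brute-force sketch in plain Python: it counts every pair of points once and bins the separations. It illustrates the O(N²) baseline the slide refers to, not the dual-tree method and not the GPU kernel used in the talk. The natural estimator ξ = DD/RR − 1 (with pair-count normalization) is an assumption made for this sketch; the slides do not say which estimator was used.

```python
import math


def pair_count_histogram(points, r_max, n_bins):
    """Count pairs of 2D points by separation, in n_bins equal-width bins below r_max."""
    counts = [0] * n_bins
    width = r_max / n_bins
    for a in range(len(points)):
        xa, ya = points[a]
        for b in range(a + 1, len(points)):  # each unordered pair once
            xb, yb = points[b]
            r = math.hypot(xa - xb, ya - yb)
            if r < r_max:
                # min() guards against rounding pushing r / width up to n_bins
                counts[min(int(r / width), n_bins - 1)] += 1
    return counts


def natural_estimator(dd, rr, n_data, n_random):
    """xi = (DD / RR) * norm - 1 per bin; None where the random counts are zero.

    dd, rr: pair counts for the data and for a random comparison catalog.
    """
    norm = (n_random * (n_random - 1)) / (n_data * (n_data - 1))
    return [(d / r) * norm - 1.0 if r > 0 else None for d, r in zip(dd, rr)]
```

Every pair is visited regardless of the binning, so the cost does not depend on the number of bins. That is why the brute-force approach stays competitive when the bins become very fine and a tree can no longer discard large groups of pairs at once.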

  18. Sloan DR7: 800 × 800 bins

  19. All Done Inside the Database
      • Pair counts computed on GPU
      • Returns 2D histogram as a table (i, j, cts)
      • Calculate the correlation function in SQL
      • Can also do async parallel GPU jobs
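Once the pair counts sit in a table of (i, j, cts) rows, the correlation function is a join and a division. The sketch below assumes sqlite3 as a stand-in database, hypothetical table names `DD` (data-data counts) and `RR` (random-random counts), and the natural estimator DD/RR − 1; none of these details come from the slides.

```python
import sqlite3


def correlation_from_histograms(dd_rows, rr_rows, n_data, n_random):
    """Return (i, j, xi) per bin from two (i, j, cts) histogram tables."""
    conn = sqlite3.connect(":memory:")
    for name in ("DD", "RR"):
        conn.execute(
            f"CREATE TABLE {name} (i INTEGER, j INTEGER, cts INTEGER, PRIMARY KEY (i, j))"
        )
    conn.executemany("INSERT INTO DD VALUES (?, ?, ?)", dd_rows)
    conn.executemany("INSERT INTO RR VALUES (?, ?, ?)", rr_rows)
    # Ratio of the total number of random pairs to data pairs.
    norm = (n_random * (n_random - 1)) / (n_data * (n_data - 1))
    rows = conn.execute(
        """
        SELECT d.i, d.j, (1.0 * d.cts / r.cts) * ? - 1.0 AS xi
        FROM DD AS d
        INNER JOIN RR AS r ON r.i = d.i AND r.j = d.j
        WHERE r.cts > 0
        ORDER BY d.i, d.j
        """,
        (norm,),
    ).fetchall()
    conn.close()
    return rows
```

Bins with no random pairs are skipped rather than divided by zero, and the `1.0 *` factor forces floating-point division on the integer counts.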

  21. Distributed Data

  22. Data at the Projects
      • Exponential growth
      • Projects last 3-5 years, data sent upwards at the end
      • Data will never be centralized
      • Most data at projects
      • More responsibility on projects
      • Bring analysis close to the data

  24. Data Federation
      • Metcalfe’s Law
      • Utility of computer networks grows as the number of possible connections: O(N²)
      • The Virtual Observatory
      • The federation of N astronomy archives has utility O(N²), i.e., possibilities for making discoveries
      The whole is more than the sum of the parts
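The O(N²) scaling comes from counting pairs: N archives can be combined two at a time in N(N−1)/2 ways. A small sketch of that arithmetic, with a hypothetical function name:

```python
def possible_connections(n_archives):
    """Number of distinct pairs among n archives: n * (n - 1) / 2."""
    return n_archives * (n_archives - 1) // 2


# Doubling the number of federated archives roughly quadruples the pairings.
for n in (5, 10, 20, 40):
    print(n, possible_connections(n))
```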

  25. Interoperability Challenges
      • Metadata standards
      • Data discovery
      • Data requests
      • Data delivery
      • Units
      • Database queries
      • Distributed applications
      • Authentication and authorization

  26. US National Virtual Observatory
      • NVO Research 2002-2007
      • NSF ITR Program: $10M for 5 years
      • 17 organizations: Astro, CS, IT
      • VAO Facility 2010-
      • NSF $20M for 5 years
      • Operational phase!
      http://us-vo.org/

  27. http://ivoa.net/

  29. IVOA Specifications

  30. First Standards
      • VOTable
      • Universal container for tables (in XML)
      • First VO standard (from the DTD era)
      • ConeSearch
      • Simple catalog access based on location
      • First VO standard interface (HTTP GET)
      • Many implemented them!
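The two standards fit together as a request and a response: a cone search is an HTTP GET with the sky position and search radius as the RA, DEC and SR parameters (decimal degrees), and the answer is a VOTable document. The sketch below builds such a URL and reads rows out of a small VOTable using only the Python standard library. The base URL, the helper names and the sample table are hypothetical, no request is actually sent, and the reader handles only the plain TABLEDATA serialization.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Append the RA, DEC and SR query parameters to a service base URL."""
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    if base_url.endswith(("?", "&")):
        return base_url + query
    return base_url + ("&" if "?" in base_url else "?") + query


def _local(tag):
    """Strip an XML namespace, if any: '{ns}FIELD' -> 'FIELD'."""
    return tag.rsplit("}", 1)[-1]


def read_votable(xml_text):
    """Return the rows of a TABLEDATA VOTable as dicts keyed by FIELD name."""
    root = ET.fromstring(xml_text.strip())
    names = [el.get("name") for el in root.iter() if _local(el.tag) == "FIELD"]
    rows = []
    for tr in root.iter():
        if _local(tr.tag) == "TR":
            cells = [td.text for td in tr if _local(td.tag) == "TD"]
            rows.append(dict(zip(names, cells)))
    return rows


# Hypothetical response, reduced to the elements the reader above needs.
SAMPLE_VOTABLE = """
<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE>
      <FIELD name="id" datatype="char" arraysize="*"/>
      <FIELD name="ra" datatype="double"/>
      <FIELD name="dec" datatype="double"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>star-1</TD><TD>180.01</TD><TD>-0.49</TD></TR>
          <TR><TD>star-2</TD><TD>179.98</TD><TD>-0.52</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""
```

Cell values come back as strings; a fuller reader would convert them using each FIELD's `datatype` attribute.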

  31. Early Standards
      • Simple Image Access Protocol (SIAP)
      • HTTP request, similar to opening a web page
      • Returns links to the matching images in VOTable
      • Assumes we know how to deal with FITS images
      • Universal Content Descriptor (UCD)
      • Crystallized set of keywords from literature
      • For data discovery, not queries
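A minimal sketch of the SIAP round trip described above, using only the Python standard library. The request carries the position and size as the POS, SIZE and FORMAT parameters; in the returned VOTable, the column holding the image links is marked with the UCD `VOX:Image_AccessReference` in version 1 of the protocol. The base URL, helper names and sample response are hypothetical, and no request is actually sent.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ACCESS_UCD = "VOX:Image_AccessReference"


def siap_url(base_url, ra_deg, dec_deg, size_deg, image_format="image/fits"):
    """Build an image query URL with the POS, SIZE and FORMAT parameters."""
    params = {"POS": f"{ra_deg},{dec_deg}", "SIZE": size_deg, "FORMAT": image_format}
    separator = "&" if "?" in base_url else "?"
    # Keep ',' and '/' unescaped so the URL stays readable.
    return base_url + separator + urlencode(params, safe=",/")


def _local(tag):
    """Strip an XML namespace, if any."""
    return tag.rsplit("}", 1)[-1]


def image_links(votable_xml):
    """Return the access URLs found in the column tagged with ACCESS_UCD."""
    root = ET.fromstring(votable_xml.strip())
    ucds = [el.get("ucd") for el in root.iter() if _local(el.tag) == "FIELD"]
    if ACCESS_UCD not in ucds:
        return []
    column = ucds.index(ACCESS_UCD)
    links = []
    for tr in root.iter():
        if _local(tr.tag) == "TR":
            cells = [td for td in tr if _local(td.tag) == "TD"]
            if column < len(cells):
                links.append(cells[column].text)
    return links


# Hypothetical response with one title column and one link column.
SAMPLE_RESPONSE = """
<VOTABLE version="1.1">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="title" ucd="VOX:Image_Title" datatype="char" arraysize="*"/>
      <FIELD name="url" ucd="VOX:Image_AccessReference" datatype="char" arraysize="*"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>field 1</TD><TD>http://example.org/img/1.fits</TD></TR>
          <TR><TD>field 2</TD><TD>http://example.org/img/2.fits</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""
```

Looking the column up by UCD rather than by name is the point of the descriptors: each archive is free to name its columns differently, while the UCD says what the column contains.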

  32. Components
      • Discovery: Directory, Sky coverage
      • Access: Tables, Catalogs; Images, Spectra; Events
      • Distributed Storage: VOSpace
      • Authentication
      • Distributed Computing: Web & Grid services; VOStat
      • Messaging: SAMP, VOPipe
      • User Interfaces: Aladin, Topcat, Mirage, etc.

  33. VO Examples: VO Applications and Services

  34. NVO Quick Start

  35. Ready, Steady…

  36. DataScope
      • Collect info in VO
      • On a particular object
      • Or a part of the sky
      • GRBs, transients, etc.
      • VO plotting tools
      • FITS images
      • Catalog data
      • And more…

  37. Bandpass Services
      • Public repository
      • Search by keyword or λeff
      • Extract in various formats
      • Register & submit yours
      • Web site
      • On-the-fly plotting
      • Easy access to all
      • Web services
      • To code against

  38. Spectrum Services
      • Public repository
      • SDSS, 2dF spectra, etc.
      • Spatial and SQL search
      • Register & submit yours
      • Web site
      • On-the-fly plotting
      • Building composites
      • De-reddening
      • Line analysis
      • Web services

  39. Open SkyQuery
      • SkyNode interface to archives
      • Implements ADQL, returns VOTable
      • Basic node understands “REGION”
      • Full node understands “XMATCH”
      • SkyQuery portal
      • Knows the SkyNodes from Registry
      • Understands federated query
      http://openskyquery.net/
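To illustrate what a cross-match between two archives computes, here is a minimal brute-force sketch: for each source in one catalog it finds the nearest source in the other within an angular tolerance. This is a simple nearest-neighbor match written for illustration; the slides do not describe how XMATCH works inside SkyQuery, and the catalog layout and function names are hypothetical.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def crossmatch(catalog_a, catalog_b, tolerance_arcsec):
    """Pair each (id, ra, dec) in catalog_a with its nearest neighbor in catalog_b.

    Returns (id_a, id_b) tuples for neighbors within the tolerance.
    """
    tolerance_deg = tolerance_arcsec / 3600.0
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        best = None  # (id_b, separation) of the closest match so far
        for id_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= tolerance_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0]))
    return matches
```

The double loop costs O(N·M) comparisons, which is fine for a sketch but not for survey-sized catalogs; real services rely on spatial indexing to avoid comparing every pair.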

  40. WESIX
      Web Enabled Source-Identification with Crossmatching. A higher level astronomy service built on other existing VO services: the SExtractor service and Open SkyQuery. The result can be sent to a plotting tool for quick inspection.
      http://nvogre.astro.washington.edu:8080/wesix/

  41. VOStat
      • Enabling R for VO data

  42. Sky Coverage
      • Discovery

  43. Transients: VOEvent

  44. Help!

  45. VO for Developers
      • Automated tools for analysis
      • Advanced services

  46. Web Services
      • Simple HTTP requests
      • ConeSearch
      • Simple Image Access
      • Standard SOAP and REST
      • Interoperable across platforms
      • IVOA compliant XML messages
      • Programming toolkits exist

  47. Command Line: VO-CLI
      • VOTool

  49. Future
      • New features
      • Better integration

  50. VOSpace 2.0
      • Storage instances soon everywhere
      • Save intermediate data products
      • Arrange for their transfer to other places
      • VOPipe
      • Chain VOSpaces for data flow between services
      • Async execution of custom processing steps

  51. Summary
      • More and Moore data: new opportunities
      • No central data store but at projects
      • On-site processing: CPU + GPU
      • Hierarchical Services
      • Standardized interfaces
      • Data federation
      • New “VxOs”
      • VaO: Virtual Astronomical Observatory
      • VsO,

  52. Sites to Explore
