Qserv: parallel, distributed SQL query service for the LSST sky catalog
Daniel L. Wang
SLAC National Accelerator Laboratory, Menlo Park, CA, USA
27 January 2015
D.L.Wang (SLAC) 27 January 2015 1 / 27
Outline
1 Background
2 Qserv Overview
Image credit: LSST Corp / NOAO
Table name                     # rows        row size   footprint
Object                         38 × 10^9     1.7 kB     64 TB
Object Extra                   38 × 10^9     26 kB      1 PB
Source (detections)            6.3 × 10^12   0.56 kB    3.5 PB
ForcedSource (expected det.)   38 × 10^12    40 B       1.5 PB
Object ⋈ Object → O(1.4 × 10^21) [1.4 zetta!]
Object ⋈ ForcedSource → O(1.4 × 10^24) [1.4 yotta!]
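The footprints and join sizes above follow from simple multiplication. The sketch below reproduces that arithmetic; it assumes decimal units (1 kB = 1000 B), and the names `TABLES`, `footprint_bytes`, and `naive_join_pairs` are illustrative only.

```python
# Back-of-the-envelope check of the catalog sizes quoted on the slide.
# Row counts and row sizes come from the table; decimal units are assumed.
TABLES = {
    "Object":       (38e9,   1.7e3),   # (rows, bytes per row)
    "Object Extra": (38e9,   26e3),
    "Source":       (6.3e12, 0.56e3),
    "ForcedSource": (38e12,  40.0),
}

def footprint_bytes(rows, row_size):
    """Raw table size in bytes, ignoring indexes and replication."""
    return rows * row_size

def naive_join_pairs(left, right):
    """Number of row pairs an unpartitioned join would have to consider."""
    return TABLES[left][0] * TABLES[right][0]

if __name__ == "__main__":
    for name, (rows, size) in TABLES.items():
        print(f"{name:13s} {footprint_bytes(rows, size) / 1e12:8.1f} TB")
    print(f"Object x Object:       {naive_join_pairs('Object', 'Object'):.2e} pairs")
    print(f"Object x ForcedSource: {naive_join_pairs('Object', 'ForcedSource'):.2e} pairs")
```

The pair counts come out at roughly 1.4 × 10^21 and 1.4 × 10^24, which is why a naive all-pairs join is not an option at this scale.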
[Figure: Qserv architecture diagram. Labels: supervisor, mysql client, czar, mysqlproxy, mysqld, daemon, executor, xrootd client (ssi), Lua script, XRootD mgr, xrootd, cmsd, worker, mysqld, ssi/ofs, xrootd, cmsd, pathpublish service, Central state system, zookeeper]
Small area → interactive, avoid involving many workers
Large area → spread load over many workers
SELECT * FROM Object o1, Object o2
WHERE scisql_angSep(o1.ra, o1.decl, o2.ra, o2.decl) < R;
Avoid all-to-all communication and quadratic scaling.
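One way to see how spatial partitioning avoids the all-to-all comparison is a toy near-neighbor search. The sketch below is not Qserv's partitioner: it assumes a simplified scheme that cuts the sky into declination stripes only, and copies rows within R of a stripe edge into the neighboring stripe's overlap set. It relies on the fact that the angular separation of two points is never smaller than their declination difference. All function names are hypothetical.

```python
import math
from collections import defaultdict

def ang_sep(ra1, dec1, ra2, dec2):
    """Angular separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def stripe_of(dec, height):
    """Index of the declination stripe containing dec."""
    return int(math.floor((dec + 90.0) / height))

def neighbors_brute(objects, radius):
    """Reference answer: compare every object with every other one."""
    return {(a[0], b[0]) for a in objects for b in objects
            if a[0] != b[0] and ang_sep(a[1], a[2], b[1], b[2]) < radius}

def neighbors_partitioned(objects, radius, height):
    """objects: list of (id, ra, dec). Each stripe is processed on its own."""
    assert radius <= height, "overlap must not span more than one stripe"
    own = defaultdict(list)      # rows that belong to each stripe
    overlap = defaultdict(list)  # copies of nearby rows from adjacent stripes
    for obj in objects:
        s = stripe_of(obj[2], height)
        own[s].append(obj)
        lo_edge = s * height - 90.0
        hi_edge = lo_edge + height
        if obj[2] - lo_edge <= radius:
            overlap[s - 1].append(obj)
        if hi_edge - obj[2] <= radius:
            overlap[s + 1].append(obj)
    pairs = set()
    for s, rows in own.items():
        # A stripe needs only its own rows plus its overlap set.
        candidates = rows + overlap.get(s, [])
        for a in rows:
            for b in candidates:
                if a[0] != b[0] and ang_sep(a[1], a[2], b[1], b[2]) < radius:
                    pairs.add((a[0], b[0]))
    return pairs

if __name__ == "__main__":
    sky = [(1, 10.0, 4.9), (2, 10.0, 5.1), (3, 200.0, -40.0)]
    print(sorted(neighbors_partitioned(sky, radius=0.5, height=5.0)))
```

Because each stripe reads only local data plus a thin overlap band, stripes can be evaluated on different workers without exchanging rows at query time.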
(invalid resource)
(different queries on one resource)
(slow queries)
(results only on “write” server)
requests
value), mediate input/output transfer
XrdSsiService::Resource : request to a service
XrdSsiRequest : request payload and result callback
XrdSsiService : answer resource requests over the wire
XrdSsiSession : request processor for requests, callbacks
XrdSsiResponder : transport abstraction
reliability/fault-recovery
parallelism, distribution
data: sequential access
asynchronous operations
See Shared Scans
Parse/analyze/generate chunk queries, dispatch queries
Accumulate results in mysqld as they are ready
Perform aggregation (as appropriate)
Signal result ready
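The dispatch, accumulate, and aggregate steps above can be sketched as a small scatter-gather loop. This is a toy model under stated assumptions: an in-memory `sqlite3` database stands in for the result-collecting mysqld, a thread pool stands in for remote workers, and `CHUNKS`, `run_chunk_query`, and `run_query` are hypothetical names with made-up data.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical per-chunk data: chunk id -> list of (objectId, rFlux).
CHUNKS = {
    101: [(1, 0.01), (2, 0.05)],
    102: [(3, 0.02)],
    103: [(4, 0.002), (5, 0.04), (6, 0.01)],
}

def run_chunk_query(chunk_id, max_flux):
    """Stand-in worker: return a partial COUNT and SUM for one chunk."""
    fluxes = [flux for _, flux in CHUNKS[chunk_id] if flux < max_flux]
    return chunk_id, len(fluxes), sum(fluxes)

def run_query(max_flux):
    """Dispatch one query per chunk, accumulate partials, then aggregate."""
    result_db = sqlite3.connect(":memory:")  # stand-in for the result mysqld
    result_db.execute("CREATE TABLE partial (chunk INTEGER, n INTEGER, s REAL)")
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_chunk_query, c, max_flux) for c in CHUNKS]
        # Accumulate each partial result as soon as it is ready.
        for fut in as_completed(futures):
            result_db.execute("INSERT INTO partial VALUES (?, ?, ?)", fut.result())
    # Aggregation: AVG is recombined from per-chunk COUNT and SUM.
    n, s = result_db.execute("SELECT SUM(n), SUM(s) FROM partial").fetchone()
    result_db.close()
    return n, (s / n if n else None)

if __name__ == "__main__":
    count, avg_flux = run_query(max_flux=0.03)
    print(count, avg_flux)
```

Averages cannot be merged directly from per-chunk averages, so the sketch ships counts and sums and divides once at the end.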
SELECT * FROM Object
WHERE areaspec_circle(3,4,0.02) AND rFlux < 0.03 LIMIT 2;
[Figure: query rewriting pipeline. Stages: SQL query string; parsed query structure; parallel/merge query structures; table substitution; partitioned query strings 1, 2, ..., n]
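The table-substitution stage can be illustrated with a few lines of string rewriting. This is a deliberately naive sketch: it works on the query text with a regular expression, whereas the slides describe operating on a parsed query structure. The `Object_<chunk>` naming is an assumption modeled on the per-chunk table names used later in the deck, and the chunk ids are invented.

```python
import re

def substitute_chunk(query, table, chunk_id):
    """Replace whole-word uses of a partitioned table with its per-chunk name."""
    pattern = r"\b" + re.escape(table) + r"\b"
    return re.sub(pattern, f"{table}_{chunk_id}", query)

def chunk_queries(query, table, chunk_ids):
    """Produce one partitioned query string per chunk."""
    return [substitute_chunk(query, table, c) for c in chunk_ids]

if __name__ == "__main__":
    # Assumes the spatial restrictor has already been used to pick the chunks.
    user_query = "SELECT * FROM Object WHERE rFlux < 0.03 LIMIT 2"
    for q in chunk_queries(user_query, "Object", [7, 8]):
        print(q)
```

A text-level rewrite like this would also touch string literals that happen to contain the table name, which is one reason to substitute on a parsed structure instead.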
scisql_ptInConvexPoly(ra_PS, decl_PS, 2,2,3,5,4,10,4,0,3,1) = 1;
IN (2112525,123125)
Object_<chunk#>_<subchunk#>
SELECT * FROM LSST.Object_<chunk#> WHERE subChunkId=<subchunk#>;
SELECT * FROM Object_<chunk#>_<subchunk#> o1, Object_<chunk#>_<subchunk#> o2 WHERE ...
SELECT * FROM Object_<chunk#>_<subchunk#> o1, ObjectFullOverlap_<chunk#>_<subchunk#> o2 WHERE ...
... (each subchunk)
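The per-subchunk pattern above (one self-join plus one join against the overlap table, repeated for each subchunk) is mechanical enough to generate. The sketch below assumes underscore-separated table names of the form `Object_<chunk>_<subchunk>`; the function name and the example predicate are hypothetical.

```python
def subchunk_join_queries(chunk_id, subchunk_ids, where):
    """Generate the two near-neighbor queries for every subchunk of a chunk."""
    queries = []
    for sc in subchunk_ids:
        own = f"Object_{chunk_id}_{sc}"
        overlap = f"ObjectFullOverlap_{chunk_id}_{sc}"
        # First pair rows within the subchunk, then rows against its overlap.
        for right in (own, overlap):
            queries.append(f"SELECT * FROM {own} o1, {right} o2 WHERE {where};")
    return queries

if __name__ == "__main__":
    predicate = "scisql_angSep(o1.ra, o1.decl, o2.ra, o2.decl) < 0.01"
    for q in subchunk_join_queries(12, [0, 1], predicate):
        print(q)
```

Each generated query touches only one subchunk and its overlap, so the quadratic cost is paid per subchunk rather than across the whole table.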
SELECT * FROM Object_<chunk#>_<subchunk#> o, RefMatch_<chunk#>_<subchunk#> m, RefObject_<chunk#>_<subchunk#> r
WHERE o.objectId = m.objectId AND m.refObjectId = r.refObjectId;
SELECT * FROM Object_<chunk#>_<subchunk#> o, RefMatch_<chunk#>_<subchunk#> m, RefObjectFullOverlap_<chunk#>_<subchunk#> r
WHERE [...] r.refObjectId;