
Network Query Engines
Craig Knoblock
USC Information Sciences Institute

Overview
• Network query engines
  • Tukwila, Telegraph, Niagara
• Dataflow & pipelining similar to Theseus
• Execution system with support for efficient query execution over remote data sources
• Automatically generate query plans from XML queries
• No support for loops, conditionals, or external interactions
• Designed for querying only, not monitoring (except for NiagaraCQ)

Tukwila (Ives et al. 1999)
• Adaptive network query processing for XML data
• Interleaved execution and optimization
• Inter-operator adaptivity
  • Dynamic operator re-ordering based on events
    • Memory overflow, wrapper timeout
• Notable new operators
  • X-SCAN: efficient querying of streaming XML documents
  • JOIN: double pipelined hash join (a tuple arriving on either the LHS or the RHS probes the other side)
  • DYNAMIC COLLECTOR: efficient union of sources

Tukwila: Interleaved Planning and Execution (from Ives et al., SIGMOD’99)
• Generates an initial plan
• Can generate partial plans and expand them later
• Uses rules to decide when to reoptimize
[Figure: a plan split into Fragment 0 and Fragment 1, built from Hash Join operators over Orders, FedEx, and East, with a Materialize & Test step between the fragments]
Example rule: WHEN end_of_fragment(0) IF card(result) > 100,000 THEN re-optimize
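
The rule above reads as an event-condition-action trigger checked at a fragment boundary. A minimal Python sketch of that control flow follows, assuming each fragment is a function that returns its materialized result and reducing "re-optimize" to a flag that stops execution. `Rule` and `run_plan` are hypothetical names, not Tukwila's API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class Rule:
    """Event-condition-action rule, modeled on WHEN ... IF ... THEN ..."""
    event: str                          # e.g. "end_of_fragment(0)"
    condition: Callable[[State], bool]  # IF part
    action: Callable[[State], None]     # THEN part


def run_plan(fragments: List[Callable[[], list]], rules: List[Rule]) -> State:
    """Execute fragments in order, firing matching rules at each fragment boundary."""
    state: State = {"card": 0, "completed": [], "reoptimize": False}
    for i, fragment in enumerate(fragments):
        result = fragment()             # materialize fragment i
        state["card"] = len(result)     # card(result)
        state["completed"].append(i)
        event = f"end_of_fragment({i})"
        for rule in rules:
            if rule.event == event and rule.condition(state):
                rule.action(state)
        if state["reoptimize"]:
            break                       # hand control back to the (hypothetical) optimizer
    return state


# WHEN end_of_fragment(0) IF card(result) > 100,000 THEN re-optimize
reopt_rule = Rule(
    event="end_of_fragment(0)",
    condition=lambda s: s["card"] > 100_000,
    action=lambda s: s.update(reoptimize=True),
)
```

Checking the condition only at fragment boundaries keeps the sketch simple: the materialized result of fragment 0 is the point where the actual cardinality is known and the rest of the plan can still be changed.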

Tukwila: Adaptive Double Pipelined Hash Join (from Ives et al., SIGMOD’99)
Hybrid hash join
• No output until the inner input has been read
• Asymmetric (inner vs. outer)
Double pipelined hash join
• Outputs data immediately
• Symmetric
• Requires more memory
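
A minimal sketch of the symmetric idea behind the double pipelined hash join, assuming both inputs fit in memory and arrive as one interleaved stream of tuples tagged "L" or "R" (a hypothetical encoding). Tukwila's operator also adapts to memory overflow, which this sketch omits.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

Row = Tuple[Any, ...]


def double_pipelined_hash_join(
    arrivals: Iterable[Tuple[str, Row]],
    left_key: Callable[[Row], Hashable],
    right_key: Callable[[Row], Hashable],
) -> Iterator[Tuple[Row, Row]]:
    """Symmetric equi-join over interleaved arrivals tagged "L" or "R".

    Each side keeps its own hash table. An arriving tuple is inserted into
    its own side's table and then probes the other side's table, so a match
    is emitted as soon as the second of the two tuples arrives.
    """
    tables: Dict[str, Dict[Hashable, List[Row]]] = {
        "L": defaultdict(list),
        "R": defaultdict(list),
    }
    for side, row in arrivals:
        if side == "L":
            key, other = left_key(row), "R"
        elif side == "R":
            key, other = right_key(row), "L"
        else:
            raise ValueError(f"unknown side {side!r}")
        tables[side][key].append(row)             # build into own table
        for match in tables[other].get(key, []):  # probe the other table
            yield (row, match) if side == "L" else (match, row)
```

Because every arriving tuple probes immediately, the first match is produced as soon as its partner arrives rather than after one whole input has been read, as in a hybrid hash join; the price is holding a hash table for both inputs.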

Tukwila: Dynamic Collector Operator (from Ives et al., SIGMOD’99)
• Smart union operator
• Supports
  • timeouts
  • slow sources
  • overlapping sources
[Figure: a dynamic collector over the sources CustReviews, NYTimes, and alt.books]
Example rule: WHEN timeout(CustReviews) DO activate(NYTimes), activate(alt.books)
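
The timeout rule above can be sketched with `asyncio`. This assumes each source is an async function returning a list of hashable rows; the function names, the simulated latencies, and the fixed fallback policy are illustrative assumptions, whereas Tukwila's collector is driven by general rules rather than this hard-coded behavior.

```python
import asyncio
from typing import Awaitable, Callable, Hashable, List, Sequence

# A source is an async function returning rows (hashable, e.g. tuples).
Source = Callable[[], Awaitable[List[Hashable]]]


async def dynamic_collector(
    primary: Source, alternates: Sequence[Source], timeout: float
) -> List[Hashable]:
    """Union with a timeout-triggered fallback.

    Mirrors WHEN timeout(primary) DO activate(alternate) for each alternate:
    if the primary does not answer within `timeout` seconds it is cancelled,
    the alternates are queried concurrently, and rows that appear in more
    than one (overlapping) alternate are kept once, in first-seen order.
    """
    try:
        return list(await asyncio.wait_for(primary(), timeout))
    except asyncio.TimeoutError:
        batches = await asyncio.gather(*(alt() for alt in alternates))
    seen = set()
    union: List[Hashable] = []
    for batch in batches:
        for row in batch:
            if row not in seen:
                seen.add(row)
                union.append(row)
    return union


# Hypothetical stand-ins for the slide's sources; latency is simulated.
async def cust_reviews() -> List[Hashable]:
    await asyncio.sleep(1.0)  # a slow source
    return [("review-1",)]


async def ny_times() -> List[Hashable]:
    return [("review-2",), ("review-3",)]


async def alt_books() -> List[Hashable]:
    return [("review-3",), ("review-4",)]


if __name__ == "__main__":
    print(asyncio.run(dynamic_collector(cust_reviews, [ny_times, alt_books], 0.05)))
```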

Niagara (Naughton, DeWitt, et al. 2000)
• Adaptive network query processing for XML data
• Interleaved execution and document search
• Supports streaming over blocking operators
• Synchronization by re-evaluating operators or by propagating differential results

Execution with partial results [Shanmugasundaram et al. 2000]
• Niagara uses partial results to reduce the effects of blocking operators
  • Reduces the blocking nature of aggregations and joins
• Basic idea
  • Execute later (downstream) operators as data streams in, and refine their output as slow operators catch up
  • Execution is driven by the availability of real data
  • Results are refined as additional data are processed

Approaches to Refining Results
• Re-evaluation
  • As new data becomes available, operators re-output their results and the downstream operators are re-executed
  • Can be costly, but simple to implement
• Differential algorithm
  • Each operator must support insertions, deletions, and updates
  • Changed results must then be propagated to downstream operators
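
To contrast the two approaches, here is a sketch using a grouped COUNT, a typical blocking operator. The choice of COUNT and the names `reevaluate_count` and `DifferentialCount` are assumptions for illustration; they are not Niagara's operators.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple

# A delta is (group key, new count); a new count of 0 means the group vanished.
Delta = Tuple[Hashable, int]


def reevaluate_count(rows: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Re-evaluation: recompute the whole aggregate from every row seen so far."""
    return dict(Counter(rows))


class DifferentialCount:
    """Differential: keep per-group state and emit only the groups that changed."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def insert(self, key: Hashable) -> List[Delta]:
        self.counts[key] += 1
        return [(key, self.counts[key])]

    def delete(self, key: Hashable) -> List[Delta]:
        if self.counts[key] == 0:  # Counter reports 0 for absent keys
            raise KeyError(key)
        self.counts[key] -= 1
        if self.counts[key] == 0:
            del self.counts[key]
            return [(key, 0)]
        return [(key, self.counts[key])]

    def update(self, old: Hashable, new: Hashable) -> List[Delta]:
        # An update is a delete of the old value plus an insert of the new one.
        return self.delete(old) + self.insert(new)
```

Re-evaluation needs the full row history and recomputes every group each time; the differential version touches one or two groups per change and passes only those deltas downstream, at the cost of every operator implementing insert, delete, and update.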

Telegraph (Hellerstein et al. 2000)
• Tuple-level adaptivity
• Rivers (optimize horizontal parallelism)
  • Adaptive dataflow on clusters (i.e., data partitioning)
• Eddies (optimize vertical parallelism)
  • Leverage the commutativity of query operators to route tuples dynamically for processing

Adaptable Joins, Issue 1
• Synchronization barriers
  • One input frozen, waiting for the other
  • Can’t adapt while waiting for the barrier!
• So, favor joins that have:
  • no barriers
  • at worst, adaptable barriers

Adaptable Joins, Issue 2
• Would like to reorder in-flight (pipelined) joins
• Base case: swap the inputs to a join
  • What about per-input state?
• Moment of symmetry:
  • inputs can be swapped without state management
  • E.g.
    • Nested loops: at the end of each inner loop
    • Merge join: any time*
    • Hybrid or Grace hash: never!
• More frequent moments of symmetry → more frequent adaptivity
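
The nested-loops case can be checked with a small sketch (hypothetical names, assuming in-memory inputs). It keeps the invariant that the work still to do is exactly outer × inner, which is why swapping the two lists at the end of an inner loop needs no extra state.

```python
from typing import Any, Callable, List, Sequence, Tuple


def nested_loops_with_swaps(
    r: Sequence[Any],
    s: Sequence[Any],
    pred: Callable[[Any, Any], bool],
    should_swap: Callable[[], bool],
) -> List[Tuple[Any, Any]]:
    """Nested-loops join of r and s that may swap its inputs mid-run.

    Invariant: the pairs still to be examined are exactly outer x inner.
    Finishing an inner loop removes one outer tuple from that product, and
    swapping the two lists leaves the product unchanged, so every end of an
    inner loop is a moment of symmetry.
    """
    outer, inner = list(r), list(s)
    outer_is_r = True
    out: List[Tuple[Any, Any]] = []
    while outer:
        o = outer.pop(0)
        for i in inner:
            pair = (o, i) if outer_is_r else (i, o)  # always report (r, s)
            if pred(*pair):
                out.append(pair)
        # Moment of symmetry: o has now met every tuple of inner.
        if outer and should_swap():
            outer, inner = inner, outer
            outer_is_r = not outer_is_r
    return out
```

The `should_swap` callback stands in for whatever policy decides to reorder; any sequence of swaps yields the same set of result pairs, each exactly once.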

Ripple Joins: Prime for Adaptivity
• Ripple joins
  • Pipelined hash join (a.k.a. hash ripple, XJoin)
    • No synchronization barriers
    • Continuous symmetry
    • Good for equi-joins
  • Simple (or block) ripple join
    • Synchronization barriers at “corners”
    • Moments of symmetry at “corners”
    • Good for non-equi-joins
  • Index nested loops
    • Short barriers
    • No symmetry
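
A simple (square) ripple join, sketched under the assumption of in-memory iterables and an arbitrary predicate; the names are hypothetical. Because it never hashes, it works for non-equi-joins, and the end of each step is a “corner” where the examined region is the full rectangle of tuples seen so far.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple

_END = object()  # sentinel for an exhausted input


def ripple_join(
    r: Iterable[Any], s: Iterable[Any], pred: Callable[[Any, Any], bool]
) -> Iterator[Tuple[Any, Any]]:
    """Simple (square) ripple join for an arbitrary predicate.

    Each step reads one new tuple from each input and joins it with every
    tuple already seen on the other side, so each pair is examined exactly
    once, by whichever of its two tuples arrives later.
    """
    it_r, it_s = iter(r), iter(s)
    seen_r: List[Any] = []
    seen_s: List[Any] = []
    r_live = s_live = True
    while r_live or s_live:
        if r_live:
            x = next(it_r, _END)
            if x is _END:
                r_live = False
            else:
                seen_r.append(x)
                for y in seen_s:
                    if pred(x, y):
                        yield (x, y)
        if s_live:
            y = next(it_s, _END)
            if y is _END:
                s_live = False
            else:
                seen_s.append(y)
                for x in seen_r:
                    if pred(x, y):
                        yield (x, y)
        # Corner: every pair in seen_r x seen_s has now been examined.
```

The barrier is visible in the loop: a step cannot reach its corner until both inputs have produced their next tuple, so a slow input stalls the join.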

Beyond Binary Joins
• Think of swapping “inners”
  • Can be done at a global moment of symmetry
• Intuition: like an n-ary join
  • Except that each pair can be joined by a different algorithm!
• So…
  • Need to introduce n-ary joins to a traditional query engine

Telegraph: Beyond Reordering Joins (from Avnur & Hellerstein, SIGMOD 2000)
• Eddy: a pipelining tuple-routing iterator (just like join or sort)
• Adjusts flow adaptively
  • Tuples flow in different orders
  • Each tuple visits each op once before output
• Naïve routing policy:
  • All ops fetch from the eddy as fast as possible
  • Previously-seen tuples precede new tuples
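
A simplified, single-threaded eddy over selection predicates only (no joins), assuming the filters are commutative so any visiting order gives the same output. Each tuple carries the set of operators it has visited and is emitted once that set is complete; tuples returning from an operator are queued ahead of new input. The greedy pass-rate routing policy is an assumption for illustration, and the sketch does not model operators fetching concurrently from the eddy.

```python
from collections import deque
from typing import Any, Callable, Deque, FrozenSet, Iterable, List, Sequence, Tuple

Predicate = Callable[[Any], bool]


class Eddy:
    """Routes each tuple through commutative filters in an adaptive order."""

    def __init__(self, ops: Sequence[Predicate]) -> None:
        self.ops = list(ops)
        # Per-operator counters for an observed pass rate (start at 1 to avoid 0/0).
        self.routed = [1] * len(self.ops)
        self.passed = [1] * len(self.ops)

    def _choose(self, done: FrozenSet[int]) -> int:
        # Assumed policy: send the tuple to the unvisited op with the lowest
        # observed pass rate, i.e. the one most likely to drop it early.
        pending = [i for i in range(len(self.ops)) if i not in done]
        return min(pending, key=lambda i: self.passed[i] / self.routed[i])

    def run(self, source: Iterable[Any]) -> List[Any]:
        queue: Deque[Tuple[Any, FrozenSet[int]]] = deque()
        out: List[Any] = []
        for item in source:
            queue.append((item, frozenset()))  # new tuples join at the back
            while queue:
                tup, done = queue.popleft()
                if len(done) == len(self.ops):
                    out.append(tup)            # visited every op: emit
                    continue
                i = self._choose(done)
                self.routed[i] += 1
                if self.ops[i](tup):
                    self.passed[i] += 1
                    queue.appendleft((tup, done | {i}))  # seen tuples go first
                # A tuple that fails a filter is simply dropped.
        return out
```

Because the pass rates are updated as tuples flow, the order in which a tuple visits the filters can change during the run, while the set of output tuples does not.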

Discussion
• Theseus, Tukwila, Telegraph, and Niagara are all:
  • Streaming dataflow systems
  • Targeting network-based query processing
    • Large source latencies
    • Unknown characteristics of sources
• They proposed various techniques for improving the efficiency of processing data
  • More efficient operators (e.g., double-pipelined join)
  • Tuple-level adaptivity
  • Partial results for blocking operators
  • Speculative execution
