Coordination-free query evaluation and multi-query optimization in parallel and distributed systems


  1. Formal approaches to: Coordination-free query evaluation and multi-query optimization in parallel and distributed systems. Bas Ketsman

  2. Outline. Coordination-free evaluation: CALM Formalization, CALM Revision 1, Conclusion. Multi-Query optimization: Parallel-Correctness, Transferability, Conclusion.

  7. Introduction. Context: Declarative Networking, where Datalog-based languages are used for parallel and distributed computing in clusters with disordered communication. CALM-conjecture: No-coordination ≟ Monotonicity [Hellerstein, 2010]. [Ameloot, Neven, Van den Bussche, 2011]: TRUE ▶ for a setting where nodes have no information about the horizontal distribution of records. [Zinn, Green, Ludäscher, 2012]: FALSE ▶ for settings where nodes have information about the horizontal distribution of records.

  8. Goal: To clarify the relation between monotonicity and coordination in asynchronous systems and to reveal the more complete picture

  11. Monotonicity. Definition: A query Q is monotone if Q(I) ⊆ Q(I ∪ J) for all database instances I and J. Notation: M = class of monotone queries. Example: ▶ Q∆: select triangles in a graph (∈ M) ▶ Q<: select open triangles in a graph (∉ M)
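
The monotonicity definition can be checked on a small instance. The sketch below is illustrative only: it assumes a directed edge relation, reads a triangle as edges a→b, b→c, c→a, and reads an open triangle as edges a→b, b→c with no edge c→a. The function names are hypothetical and not taken from the slides.

```python
def triangles(edges):
    """Q∆ (assumed reading): all (a, b, c) with edges a->b, b->c, c->a."""
    E = set(edges)
    return {(a, b, c) for (a, b) in E for (b2, c) in E
            if b2 == b and (c, a) in E}

def open_triangles(edges):
    """Q< (assumed reading): edges a->b and b->c, but no edge c->a."""
    E = set(edges)
    return {(a, b, c) for (a, b) in E for (b2, c) in E
            if b2 == b and (c, a) not in E}

I = {(1, 2), (2, 3)}   # instance I
J = {(3, 1)}           # extra facts J

# Monotone: adding J can only add answers.
print(triangles(I) <= triangles(I | J))            # True
# Not monotone: adding J retracts the answer (1, 2, 3).
print(open_triangles(I) <= open_triangles(I | J))  # False
```

A single counterexample pair (I, J) suffices to show that a query is not monotone, whereas membership in M requires the inclusion to hold for all I and J.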

  15. Relational Transducer Networks [Ameloot, Neven, Van den Bussche, 2011] ▶ Network N = {x, y, u, z} ▶ Transducer Π ▶ Messages can be arbitrarily delayed but never get lost. Semantics are defined in terms of runs over a transition system.

  16. Eventually Consistent Query Evaluation. Definition: A transducer Π computes a query Q if ▶ for all networks N (network independent), ▶ for all databases I, ▶ for all horizontal distributions H (distribution independent), and ▶ for every run of Π, out(Π) = Q(I) (consistency requirement).

  21. Example: Q∆: select all triangles. Algorithm: ▶ Broadcast all data ▶ Output triangles whenever new data arrives. Extremely naive, but works... and is coordination-free!
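
The broadcast algorithm can be sketched as a toy simulation. This is not the relational transducer formalism itself: the function `run_broadcast`, the dictionary encoding of a horizontal distribution, and the use of a random shuffle to model arbitrary message delays are all assumptions made for illustration.

```python
import random

def triangles(edges):
    """All (a, b, c) with directed edges a->b, b->c, c->a (assumed reading)."""
    E = set(edges)
    return {(a, b, c) for (a, b) in E for (b2, c) in E
            if b2 == b and (c, a) in E}

def run_broadcast(distribution, seed=0):
    """Simulate one run: every node sends each local fact to every other
    node; messages arrive in a random order but are never lost."""
    rng = random.Random(seed)
    nodes = list(distribution)
    local = {n: set(distribution[n]) for n in nodes}
    # One pending message per (fact, receiver) pair.
    pending = [(fact, dst) for src in nodes for fact in distribution[src]
               for dst in nodes if dst != src]
    rng.shuffle(pending)                 # models arbitrary delays
    output = set()
    for n in nodes:                      # output on the initial local data
        output |= triangles(local[n])
    for fact, dst in pending:            # deliver one message at a time
        local[dst].add(fact)
        output |= triangles(local[dst])  # output only ever grows
    return output

I = {(1, 2), (2, 3), (3, 1), (3, 4)}
H = {"x": {(1, 2)}, "y": {(2, 3), (3, 4)}, "z": {(3, 1)}}
# Every simulated run ends with the same output, namely Q∆(I).
results = {frozenset(run_broadcast(H, seed)) for seed in range(20)}
print(results == {frozenset(triangles(I))})  # True
```

No node ever has to retract an answer here: because the query is monotone, a triangle found in a partial view is also a triangle of the full instance.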

  25. Example: Q<: select all open triangles. [Figure: network nodes labeled "no" and "?"] Coordination is needed to reason about the absence of records.
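
A small sketch of why absence is the problem: a node that evaluates the open-triangle query on its local fragment alone may output an answer that the full instance refutes. The reading of "open triangle" (edges a→b and b→c with no edge c→a) and all names below are assumptions for illustration.

```python
def open_triangles(edges):
    """Edges a->b and b->c, but no edge c->a (assumed reading of Q<)."""
    E = set(edges)
    return {(a, b, c) for (a, b) in E for (b2, c) in E
            if b2 == b and (c, a) not in E}

I = {(1, 2), (2, 3), (3, 1)}   # global instance: a closed triangle
local_x = {(1, 2), (2, 3)}     # node x has not seen the edge (3, 1)

premature = open_triangles(local_x)  # what x would output right now
correct = open_triangles(I)          # the intended answer Q<(I)

print(premature)  # {(1, 2, 3)}
print(correct)    # set()
```

Node x cannot tell locally whether (3, 1) is absent from the database or merely stored elsewhere, so it would have to wait for confirmation from the other nodes before answering.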

  27. Coordination-freeness. Goal: separate data-communication from coordination-communication. Definition: Π is coordination-free if for all inputs I there is a distribution on which Π computes Q(I) without having to do communication. [Ameloot, Neven, Van den Bussche, 2011]

  30. Example: Ideal Distribution. Q∆: select all triangles. Algorithm: ▶ (Broadcast all data) ▶ Output triangles whenever new data arrives.
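
To illustrate the idea of an ideal distribution, the sketch below compares what nodes can output from local data alone under two distributions. It assumes that full replication (every node holds all of I) counts as an ideal distribution; the helper `output_without_communication` is hypothetical.

```python
def triangles(edges):
    """All (a, b, c) with directed edges a->b, b->c, c->a (assumed reading)."""
    E = set(edges)
    return {(a, b, c) for (a, b) in E for (b2, c) in E
            if b2 == b and (c, a) in E}

def output_without_communication(distribution):
    """Union of what each node outputs from its local data alone."""
    out = set()
    for local in distribution.values():
        out |= triangles(local)
    return out

I = {(1, 2), (2, 3), (3, 1)}
ideal = {"x": set(I), "y": set(I)}              # assumed ideal: full replication
split = {"x": {(1, 2)}, "y": {(2, 3), (3, 1)}}  # data spread over two nodes

print(output_without_communication(ideal) == triangles(I))  # True
print(output_without_communication(split) == triangles(I))  # False
```

On the replicated distribution the broadcast step is redundant, which matches the parenthesized first bullet: communication there only moves data and is not needed to agree on the result.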

  31. CALM-conjecture [Ameloot, Neven, Van den Bussche, 2011]: A query has a coordination-free and eventually consistent execution strategy iff the query is monotone. Definition: F0 = set of queries that are distributedly computed by coordination-free transducers. Theorem: F0 = M.

  33. Policy-aware Transducers. [Figure: network with a "Distribution policy"]

  35. Policy-aware Transducers. Deduction rules: ▶ in local database ⇒ in global database ▶ not in local database + in scope ⇒ not in global database ▶ not in local database + not in scope ⇒ unknown
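
The three deduction rules amount to a three-valued lookup at each node. The sketch below is an illustration under assumptions: "in scope" is read as "the policy assigns the fact to this node", and `Status`, `deduce`, and `toy_policy` are hypothetical names rather than anything defined in the slides.

```python
from enum import Enum

class Status(Enum):
    PRESENT = "in global database"
    ABSENT = "not in global database"
    UNKNOWN = "unknown"

def deduce(fact, node, local_db, policy):
    """Apply the three deduction rules at `node`.
    `policy(fact)` returns the set of nodes responsible for `fact`."""
    if fact in local_db:
        return Status.PRESENT   # in local database
    if node in policy(fact):
        return Status.ABSENT    # not local, but in this node's scope
    return Status.UNKNOWN       # not local and not in scope

def toy_policy(fact):
    """Hypothetical policy: node x gets facts whose first value is even."""
    return {"x"} if fact[0] % 2 == 0 else {"y"}

local_x = {(2, 3)}
print(deduce((2, 3), "x", local_x, toy_policy))  # Status.PRESENT
print(deduce((4, 1), "x", local_x, toy_policy))  # Status.ABSENT
print(deduce((1, 2), "x", local_x, toy_policy))  # Status.UNKNOWN
```

The second rule is what lets a policy-aware node conclude absence without communication, which is exactly the kind of reasoning a policy-unaware node cannot do.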

  38. Policy-aware Transducers [Zinn, Green, Ludäscher, 2012]. Definition: A distribution policy P for σ and N is a total function from facts(σ) to the power set of N. Definition: A policy-aware transducer is a transducer with access to P restricted to its active domain. Definition: F1 = set of queries that are distributedly computed by policy-aware coordination-free transducers.

  39. Domain-distinct-monotonicity. Definition: A fact f is domain distinct from instance I when adom(f) ⊈ adom(I). Example: [Figure: facts f and f′ relative to instance I]

  40. Domain-distinct-monotonicity. Definition: A query Q is domain-distinct-monotone if Q(I) ⊆ Q(I ∪ J) for all I and J, with J having only domain-distinct facts. Notation: M_distinct = domain-distinct-monotone queries. [Figure: M contained in M_distinct] Remark: M_distinct is the class of queries preserved under extensions.
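
The two definitions can be tried out on a small graph. The sketch assumes facts are tuples of values, reads an open triangle as edges a→b, b→c with no edge c→a, and uses hypothetical helper names. It checks one instance only and is not a proof that the query belongs to M_distinct.

```python
def adom(facts):
    """Active domain: all values occurring in a set of facts."""
    return {v for fact in facts for v in fact}

def is_domain_distinct(fact, instance):
    """True when adom(fact) is not a subset of adom(instance)."""
    return not set(fact) <= adom(instance)

def open_triangles(edges):
    """Edges a->b and b->c, but no edge c->a (assumed reading of Q<)."""
    E = set(edges)
    return {(a, b, c) for (a, b) in E for (b2, c) in E
            if b2 == b and (c, a) not in E}

I = {(1, 2), (2, 3)}
J = {(3, 4), (5, 1)}   # each fact introduces a value that is new to I
closing = (3, 1)       # uses only values already in adom(I)

print(all(is_domain_distinct(f, I) for f in J))    # True
print(is_domain_distinct(closing, I))              # False
# Adding only domain-distinct facts keeps the old answer (1, 2, 3).
print(open_triangles(I) <= open_triangles(I | J))  # True
```

The only fact that could invalidate the answer (1, 2, 3) is the closing edge (3, 1), and that fact is not domain distinct from I, so it is excluded from J by definition.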
