Engineering Aggregation Operators for Relational In-Memory Database Systems
Ingo Müller – PhD defense – February 11, 2016
Institute of Theoretical Informatics, Algorithmics II, Department of Informatics
In cooperation with SAP SE
KIT – The Research University in the Helmholtz Association, www.kit.edu

Introduction – The Race of Database Systems
[Figures: "Data growth" [RG12], size of the digital universe [EiB] over time; "Hardware evolution" [Bui12], relative performance over time. Plot annotations: +60%/yr, +80%/yr, +9%/yr, "gap", "Database systems".]
Trend 1: data volumes increase exponentially (or faster)
Trend 2: compute power increases exponentially
  But hardware also becomes more and more complex, for example memory access
Database systems are in a continuous race to translate Moore's law.

Introduction – Grouping with Aggregation
What is the sum of the prices of all sold items per store?

Input (Sales)
Store    Item    Price
Berlin   pen     1.00 €
Berlin   paper   3.00 €
Paris    ruler   2.00 €
Berlin   pen     1.00 €
Paris    pen     1.00 €
Vienna   paper   3.00 €

SELECT Store, SUM(Price) AS Sum FROM Sales GROUP BY Store

Output
Store    Sum
Paris    3.00 €
Vienna   3.00 €
Berlin   5.00 €
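The query on this slide can be mimicked in a few lines of Python. This is only an illustrative sketch: the function name `sum_price_per_store` and the tuple layout are assumptions made for the example, not something taken from the talk.

```python
from collections import defaultdict

# Rows of the Sales table from the slide: (store, item, price in euros).
sales = [
    ("Berlin", "pen", 1.00),
    ("Berlin", "paper", 3.00),
    ("Paris", "ruler", 2.00),
    ("Berlin", "pen", 1.00),
    ("Paris", "pen", 1.00),
    ("Vienna", "paper", 3.00),
]


def sum_price_per_store(rows):
    """Group rows by store and sum the prices (hypothetical helper)."""
    totals = defaultdict(float)
    for store, _item, price in rows:
        totals[store] += price
    return dict(totals)


result = sum_price_per_store(sales)
```

The totals per store match the Output table of the slide.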

Challenges and Overview
Cache efficiency * → lower bound + (optimal) recursive algorithm
Optimizer independence * → adaptive execution strategy
Memory constraint * [SIGMOD15] → intra-operator pipelining
CPU friendliness * → low-level tuning of inner loops
Parallelism * → work stealing
Skewed data distribution * → robust algorithm design
Communication efficiency → adaptive pre-aggregation
System integration * → compatible with major DB architectures
Result: up to 3.7x faster and robust enough for use in production.

Challenge: Cache Efficiency – Motivation
Two textbook algorithms:
Hash-Aggregation
  Insert every row into hash map with grouping attributes as key
  Aggregate to existing intermediate result
Sort-Aggregation
  Sort input by grouping attributes
  Aggregate consecutive rows in a single pass
M = cache size, B = block size, N = input size, K = output size
Can we do better? Long-standing conjecture: no!
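The two textbook algorithms can be sketched as follows. This is a minimal illustration with SUM as the aggregate; the function names and the (key, value) row layout are assumptions for the example and say nothing about how a real database engine implements either operator.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """Hash-Aggregation: insert each row into a hash map keyed by the group."""
    table = {}
    for key, value in rows:
        # Aggregate into the existing intermediate result, if there is one.
        table[key] = table.get(key, 0) + value
    return table


def sort_aggregate(rows):
    """Sort-Aggregation: sort by group, then aggregate consecutive rows."""
    ordered = sorted(rows, key=itemgetter(0))
    return {
        key: sum(value for _key, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


rows = [("Berlin", 1), ("Paris", 2), ("Berlin", 3), ("Vienna", 3), ("Paris", 1)]
by_hash = hash_aggregate(rows)
by_sort = sort_aggregate(rows)
```

Both functions return the same groups; they differ only in their memory access pattern, which is what the cache-efficiency question on this slide is about.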

External Memory Model – Proof Techniques
[Figure: external memory model with N input records and K output records in "external" memory, blocks of B records, and a cache of M records]
Known lower bounds for Aggregation
  Based on comparisons [MR91,AK+93]
  → Do not hold for Hashing!
Proof technique [AV88,Gre12]
  Count the number of possible permutations after t transfers
  Compare with possible number of input permutations
Modifications for Aggregation
  Allow semi-group operation in cache
  Count "permutations" as before

External Memory Model – Result
Lower bound* for Aggregation: $\Omega\left(\frac{N}{B} \log_{M/B} \frac{K}{B}\right)$ block transfers
*simplified asymptotic worst case
Same bound as for Sorting Multisets [AK+93]
M = cache size, B = block size, N = input size, K = output size
We confirm: Aggregation is as hard as Sorting! → Use as guideline.
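To get a feel for the magnitudes, the sketch below evaluates an expression of the multiset-sorting form (N/B) · log_{M/B}(K/B), ignoring constant factors. That this simplified form is the one meant on the slide is an assumption, and so are all parameter values except N = 2^32, which appears in the evaluation later in the talk.

```python
import math


def simplified_block_transfers(n, k, m, b):
    """Evaluate (N/B) * log_{M/B}(K/B), ignoring constant factors.

    n = input size, k = output size, m = cache size, b = block size,
    all measured in records (assumed units for this illustration).
    """
    passes = math.log(k / b, m / b)  # number of "levels" over the data
    return (n / b) * passes


# Hypothetical parameters: 2^24 groups, cache of 2^18 records, blocks of 2^4 records.
n, k, m, b = 2**32, 2**24, 2**18, 2**4
transfers = simplified_block_transfers(n, k, m, b)
```

With these numbers the logarithm is 20/14, so the data is moved between one and two times, which is the kind of guideline the slide refers to.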

Outline
Cache efficiency
  → lower bound
  → (optimal) recursive algorithm
Optimizer independence
  → adaptive execution strategy
Memory constraint
  → intra-operator pipelining

Challenge: Adaptivity – Motivation
Traditional approach [Gra93]
  Implement HashAggregation and SortAggregation
  Optimizer selects implementation based on statistics beforehand
Problem
  Wrong statistics may lead to suboptimal performance
M = cache size, B = block size, N = input size, K = output size
Our goal: adaptively switch between Hashing and Sorting during execution.

Adaptivity – Mixing Hashing and Sorting
Recursive algorithm:
  In each level of recursion: mix Hashing and Sorting adaptively
  Partitioning recurses when necessary
  Hashing ends the recursion as soon as this is possible efficiently
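The recursion described on this slide can be illustrated with a strongly simplified sketch. It is not the algorithm from the talk: it only decides per recursion level (not adaptively within a level), treats partitioning by hash digits as the sorting-like step, and uses a tiny `table_capacity` as a stand-in for the cache size. All names and parameters are assumptions made for the example.

```python
from collections import defaultdict

MAX_LEVEL = 32  # assumed guard against keys whose hash values collide entirely


def aggregate(rows, table_capacity=4, fanout=4, level=0):
    """Sum values per key; rows is a list of (key, value) pairs."""
    table = {}
    overflow = False
    for key, value in rows:
        if key in table:
            table[key] += value
        elif len(table) < table_capacity or level >= MAX_LEVEL:
            table[key] = value
        else:
            overflow = True  # too many groups for the "cache-sized" table
            break
    if not overflow:
        return table  # hashing ends the recursion

    # Partitioning step: split by one digit of the hash value and recurse.
    partitions = defaultdict(list)
    for key, value in rows:
        digit = (hash(key) // fanout**level) % fanout
        partitions[digit].append((key, value))

    result = {}
    for part in partitions.values():
        # Partitions hold disjoint key sets, so their results can be merged.
        result.update(aggregate(part, table_capacity, fanout, level + 1))
    return result


rows = [(i % 10, 1) for i in range(100)]
totals = aggregate(rows)
```

In this example the ten groups do not fit into a table of four entries, so the input is partitioned once and each partition is then finished by hashing.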

Adaptivity – Mixing Hashing and Sorting
Our mechanism achieves the best of Hashing and Sorting.

Evaluation – Comparison with Prior Work
[Figure: comparison with prior work; plot annotation: 3.7x]
2 Xeon E7-8870 CPUs (each 10 cores)
N = 2^32, uniform distribution
Original implementation of [CR07,YR+11]
Efficient recursive processing is crucial for large outputs.

Memory Constraint – Intra-Operator Pipelining
Split work into blocks
Recycle free blocks
Limit number of blocks
Interleave/Overlap processing levels
Pipelining makes it possible to limit the amount of intermediate memory.
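The first three points (blocks, recycling, a limit on the number of blocks) can be pictured as a bounded block pool. The sketch below is a hypothetical illustration; the class `BlockPool` and its methods are not taken from the talk.

```python
class BlockPool:
    """Hands out at most max_blocks blocks and recycles released ones."""

    def __init__(self, max_blocks):
        self.max_blocks = max_blocks
        self.allocated = 0  # blocks created so far
        self.free = []      # released blocks waiting to be reused

    def acquire(self):
        """Return an empty block, or None if the memory limit is reached."""
        if self.free:
            return self.free.pop()  # recycle a free block
        if self.allocated < self.max_blocks:
            self.allocated += 1
            return []               # a block is modelled as a Python list
        return None                 # caller must first process and release blocks

    def release(self, block):
        """Give a fully processed block back to the pool."""
        block.clear()
        self.free.append(block)


pool = BlockPool(max_blocks=2)
first = pool.acquire()
second = pool.acquire()
third = pool.acquire()   # limit reached, nothing available
pool.release(first)
reused = pool.acquire()  # the released block is handed out again
```

When `acquire` returns None, a producer has to switch to consuming intermediate data, which is where interleaving the processing levels comes in.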

Memory Constraint – Intra-Operator Scheduling
[Figure: processing levels with priority queues (PQ) and a balance between the levels]
In which level to work? → Heuristic: target 50% memory usage
On which partition to work? → Priority queue on partition length
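The two scheduling questions can be sketched as below. The slide does not say in which direction the 50% target steers the work or whether long or short partitions come first, so both choices in this sketch (drain intermediate data when usage is above the target, longest partition first) are assumptions, as are all names.

```python
import heapq


def choose_action(used_blocks, max_blocks, target=0.5):
    """Pick the level to work on from the current memory usage (assumed rule)."""
    if used_blocks / max_blocks > target:
        return "process_intermediate"  # free blocks by consuming partitions
    return "read_input"                # produce more intermediate data


def longest_partition_first(partition_lengths):
    """Yield (name, length) pairs from a priority queue on partition length."""
    # heapq is a min-heap, so lengths are negated to pop the longest first.
    heap = [(-length, name) for name, length in partition_lengths.items()]
    heapq.heapify(heap)
    while heap:
        neg_length, name = heapq.heappop(heap)
        yield name, -neg_length


order = [name for name, _ in longest_partition_first({"p0": 3, "p1": 7, "p2": 5})]
```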

Memory Constraint – Evaluation
[Figure: input size = 16 GiB, memory constraint = 256 MiB]
[Figure: input size = 16 GiB, K = 2^23]
Plot annotations: 2x, 1.2% of unconstrained
Performance basically preserved (for moderate result sizes)
Trade-off between memory usage and performance
Cache efficiency can be achieved under memory constraint.

Summary
Cache efficiency * → lower bound + (optimal) recursive algorithm
Optimizer independence * → adaptive execution strategy
Memory constraint * [SIGMOD15] → intra-operator pipelining
CPU friendliness * → low-level tuning of inner loops
Parallelism * → work stealing
Skewed data distribution * → robust algorithm design
Communication efficiency → adaptive pre-aggregation
System integration * → compatible with major DB architectures
Thank you! Questions?
