SLIDE 1

Query Execution in Column-Stores

Atte Hinkka

Seminar on Columnar Databases, Fall 2012

SLIDE 2

Central concepts

  • Column (query) operators
  • Compression considerations
  • Materialization strategies
  • Vectorized operations

SLIDE 3

Query what?

  • Query operator trees
  • Models for query execution
  • Architectural models
  • Roots in the transactional world

SLIDE 4

Query operator tree

SLIDE 5

Query operator tree

scan.next() => {“Virtanen”, “Veijo”, 2011-02-01, 3}

SLIDE 6

Query operator tree

scan.next() => {“Virtanen”, “Veijo”, 2011-02-01, 3}
scan.next() => {“Meikäläinen”, “Matti”, 2012-06-01, 3}

SLIDE 7

Query operator tree

scan.next() => {“Virtanen”, “Veijo”, 2011-02-01, 3}
scan.next() => {“Meikäläinen”, “Matti”, 2012-06-01, 3}
select.next() => {“Meikäläinen”, “Matti”, 2012-06-01, 3}

SLIDE 8

Volcano model

  • Each query operator provides an iterator interface
  • The iterator returns tuples from disk
  • Conceptually simple and elegant
  • Optimizations focus on the query-plan level: avoid full table scans, minimize the number of tuples processed
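The iterator model above can be sketched in a few lines of Python (the table contents follow the earlier slides' example; the class and predicate names are my own, not from the presentation):

```python
# Minimal sketch of the Volcano iterator model: every operator exposes
# next(), returning one tuple at a time (None when exhausted).

class Scan:
    """Leaf operator: yields tuples straight from a stored table."""
    def __init__(self, table):
        self._it = iter(table)

    def next(self):
        return next(self._it, None)

class Select:
    """Filter operator: pulls from its child until a tuple matches."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def next(self):
        while (t := self.child.next()) is not None:
            if self.predicate(t):
                return t
        return None

table = [("Virtanen", "Veijo", "2011-02-01", 3),
         ("Meikäläinen", "Matti", "2012-06-01", 3)]
select = Select(Scan(table), lambda t: t[2] >= "2012-01-01")
print(select.next())  # ('Meikäläinen', 'Matti', '2012-06-01', 3)
print(select.next())  # None
```

Note how each tuple travels through a virtual call per operator per row; that per-tuple overhead is exactly what the next slide criticizes.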

SLIDE 9

Problems with Volcano

  • A query heavy in aggregation operators (AVG, SUM, ...) may spend only 10% of its time averaging and summing, i.e. doing actual work
  • A MIPS R12000 can do a double multiplication in 3 cycles; MySQL takes 49 cycles for the same operation — no loop pipelining!

SLIDE 10

Column-oriented processing

  • Predicates in Scan operators
  • Late tuple materialization
  • Invisible joins
  • Operations on compressed data

SLIDE 11

Predicates on Scan operators

  • Possible to do exact matches on heavily-compressed data (LZ encoding)
  • Can avoid dictionary lookups in a similar fashion
  • Operating on run-length or bit-vector-encoded columns is possible when the predicate matcher knows about the compression used
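As a sketch of the run-length case (illustrative function and data, not from the presentation), a predicate matcher that knows the column is RLE-encoded can compare once per run instead of once per value:

```python
# Evaluating an equality predicate directly on a run-length-encoded
# column: the matcher understands the encoding, so it emits matching
# row positions without decompressing the runs.

def rle_select_eq(runs, value):
    """runs: list of (value, run_length) pairs; returns matching positions."""
    positions, pos = [], 0
    for v, length in runs:
        if v == value:                      # one comparison covers the whole run
            positions.extend(range(pos, pos + length))
        pos += length
    return positions

# Column [3, 3, 3, 7, 7, 3] stored as three runs:
runs = [(3, 3), (7, 2), (3, 1)]
print(rle_select_eq(runs, 3))  # [0, 1, 2, 5]
```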

SLIDE 12

Late tuple materialization

  • Operators operate on position lists
  • Join position lists and materialize tuples at the very end
  • Position lists are trivial to produce from sorted columns (<, >, ==)
  • Position lists can be encoded as bitmaps for CPU efficiency
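A minimal sketch of late materialization (column names and data are made up for illustration): each predicate produces a position list over its own column, the lists are intersected, and full tuples are only stitched together at the end.

```python
# Late materialization: operate on row positions per column; build
# ("materialize") complete tuples only for the rows that survive.

def positions_where(column, pred):
    return {i for i, v in enumerate(column) if pred(v)}

# Columns of one table, stored separately:
surname = ["Virtanen", "Meikäläinen", "Nieminen"]
year    = [2011, 2012, 2012]
grade   = [3, 3, 5]

hits = positions_where(year, lambda y: y >= 2012) & \
       positions_where(grade, lambda g: g >= 5)

# Materialize only the surviving rows:
rows = [(surname[i], year[i], grade[i]) for i in sorted(hits)]
print(rows)  # [('Nieminen', 2012, 5)]
```

Using Python sets here stands in for the bitmap representation the slide mentions; the intersection is the same operation either way.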

SLIDE 13

Invisible join

  • Compute a bitmap (position list) for each select predicate
  • The join result is the intersection of the bitmaps
  • Results can be calculated efficiently with bitmap operations
  • Useful in column-stores and data warehouses, where queries join facts & dimensions
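A toy sketch of the idea (table contents and key sets are invented for illustration): dimension predicates are turned into sets of qualifying keys, each foreign-key column of the fact table yields a bitmap, and ANDing the bitmaps selects the joined rows without ever building join tuples.

```python
# Invisible join: dimension predicates become bitmaps over fact-table
# rows via the foreign-key columns; the join result is their intersection.

# Fact table: foreign keys into two dimension tables.
fact_cust = [0, 1, 0, 2]      # customer key per fact row
fact_date = [10, 11, 10, 11]  # date key per fact row

# Dimension predicates already evaluated -> sets of qualifying keys.
good_customers = {0, 2}       # e.g. customers in Finland
good_dates = {11}             # e.g. year 2012

def bitmap(fk_column, good_keys):
    return [fk in good_keys for fk in fk_column]

b1 = bitmap(fact_cust, good_customers)
b2 = bitmap(fact_date, good_dates)
result = [i for i, (x, y) in enumerate(zip(b1, b2)) if x and y]
print(result)  # fact rows satisfying both dimensions: [3]
```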

SLIDE 14

Operating on compressed data

  • Push the predicate down to the Scan operator
  • Don’t decompress when not needed
  • Dictionary encoding only needs to decompress once
  • Keep a cache of decompressed values
  • Makes it possible to store columns in multiple sort orders

SLIDE 15

Performance benefits of...

  • Late materialization
  • Compression
  • Invisible join

SLIDE 16

SLIDE 17

Alternative design

  • C-Store
      • Column-optimized query operators
      • Late tuple materialization
      • Modified Scan operators
  • MonetDB/X100
      • Query execution as array manipulation
      • Emphasis on vectorized processing and high CPU efficiency

SLIDE 18

MonetDB/X100, solving memory bottleneck

  • Operators work with chunks of data that fit in the CPU cache (~1024 values)
  • Operators are vectorized and have low degrees of freedom (they are simple and don’t handle arbitrary predicates, etc.) so that the compiler can do loop pipelining
  • Decompress pages into the CPU cache, not into RAM
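A rough sketch of the chunked style (chunk size follows the slide's ~1024 figure; the primitive and data are invented): a simple per-vector primitive runs in a tight loop with no per-value interpretation, and the driver walks the columns one cache-sized chunk at a time.

```python
# X100-style vectorized execution: process one vector of ~1024 values
# per primitive call instead of interpreting one value at a time.

VECTOR_SIZE = 1024

def vec_mul(a, b, out, n):
    """Vectorized primitive: out[0:n] = a[0:n] * b[0:n], tight loop only."""
    for i in range(n):
        out[i] = a[i] * b[i]

def total_revenue(prices, quantities):
    out = [0] * VECTOR_SIZE
    total = 0
    for start in range(0, len(prices), VECTOR_SIZE):   # chunk fits in cache
        n = min(VECTOR_SIZE, len(prices) - start)
        vec_mul(prices[start:start + n], quantities[start:start + n], out, n)
        total += sum(out[:n])
    return total

print(total_revenue([2.0, 3.0, 4.0], [1, 2, 1]))  # 2.0 + 6.0 + 4.0 = 12.0
```

In a real engine the primitive would be compiled C over contiguous arrays; Python only illustrates the control structure.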

SLIDE 19

X100 query tree

  • Still a pull model, but based on vectors, not tuples or values
  • Emphasis on CPU-cache efficiency
  • Enables Single-Instruction-Multiple-Data (SIMD) instructions
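The vector-at-a-time pull model can be contrasted with the earlier Volcano sketch (class names and the tiny vector size are mine, chosen for a readable demo): next() still drives execution top-down, but each call returns a batch of values, amortizing the interpretation overhead over the whole vector.

```python
# Pull-based execution over vectors: same next() protocol as Volcano,
# but a call moves a whole batch of values between operators.

class VectorScan:
    def __init__(self, column, vector_size=4):   # tiny vectors for the demo
        self.column, self.vs, self.pos = column, vector_size, 0

    def next(self):
        """Return the next vector of values, or None when exhausted."""
        if self.pos >= len(self.column):
            return None
        v = self.column[self.pos:self.pos + self.vs]
        self.pos += self.vs
        return v

class VectorSelect:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred

    def next(self):
        v = self.child.next()
        if v is None:
            return None
        return [x for x in v if self.pred(x)]    # filter a whole vector at once

op = VectorSelect(VectorScan(list(range(10))), lambda x: x % 2 == 0)
batches = []
while (v := op.next()) is not None:
    batches.append(v)
print(batches)  # [[0, 2], [4, 6], [8]]
```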

SLIDE 20
Recap

  • Operator changes
      • A Scan operator that knows about compression and can handle predicates
  • Late tuple materialization
  • Invisible joins
  • Operations on compressed data
  • Vectorized processing