ADVANCED DATABASE SYSTEMS: Query Execution & Processing

Lecture #15
ADVANCED DATABASE SYSTEMS: Query Execution & Processing
@Andy_Pavlo // 15-721 // Spring 2019

CMU 15-721 (Spring 2019)

ARCHITECTURE OVERVIEW
A SQL query passes through these layers:
→ Networking Layer
→ SQL Parser
→ Planner: Binder, Rewriter, Optimizer / Cost Models
→ Compiler
→ Scheduling / Placement
→ Execution Engine (We Are Here): Concurrency Control, Operator Execution, Indexes
→ Storage Manager: Storage Models, Logging / Checkpoints

OPERATOR EXECUTION
→ Query Plan Processing
→ Application Logic Execution (UDFs)
→ Parallel Join Algorithms
→ Vectorized Operators
→ Query Compilation

QUERY EXECUTION
A query plan is composed of operators. An operator instance is an invocation of an operator on some segment of data. A task is the execution of a sequence of one or more operator instances.
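The three terms above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how any particular DBMS defines them: operators are plain functions over lists of tuples, a "segment" is a slice of the input table, and the class names `Operator`, `OperatorInstance`, `Task` and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical row type: a plain tuple of column values.
Row = tuple


@dataclass
class Operator:
    """A plan operator (e.g., a filter), defined independently of any data."""
    name: str
    fn: Callable[[List[Row]], List[Row]]


@dataclass
class OperatorInstance:
    """One invocation of an operator on a specific segment of data."""
    op: Operator
    segment: List[Row]

    def run(self) -> List[Row]:
        return self.op.fn(self.segment)


@dataclass
class Task:
    """Executes a sequence of one or more operator instances over one segment."""
    ops: List[Operator]
    segment: List[Row]

    def execute(self) -> List[Row]:
        data = self.segment
        for op in self.ops:
            # Each step creates a new operator instance over the previous output.
            data = OperatorInstance(op, data).run()
        return data


# Example: a two-operator pipeline (filter, then project) over two segments.
table = [(i, i * 10) for i in range(8)]
filter_op = Operator("filter", lambda rows: [r for r in rows if r[1] > 30])
project_op = Operator("project", lambda rows: [(r[0],) for r in rows])

segments = [table[:4], table[4:]]
tasks = [Task([filter_op, project_op], seg) for seg in segments]
results = [task.execute() for task in tasks]  # tasks could run on separate threads
```

Here the same two operators are instantiated once per segment, so each segment gets its own task; that separation is what lets tasks be scheduled independently.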

EXECUTION OPTIMIZATION
We now start discussing ways to improve the DBMS's query execution performance for data sets that fit entirely in memory. Once the disk is removed, other bottlenecks become the ones to target.

OPTIMIZATION GOALS
Approach #1: Reduce Instruction Count
→ Use fewer instructions to do the same amount of work.
Approach #2: Reduce Cycles per Instruction
→ Execute more CPU instructions in fewer cycles.
→ This means reducing cache misses and stalls due to memory loads/stores.
Approach #3: Parallelize Execution
→ Use multiple threads to compute each query in parallel.

TODAY'S AGENDA
→ MonetDB/X100 Analysis
→ Processing Models
→ Parallel Execution

MONETDB/X100
Low-level analysis of execution bottlenecks for in-memory DBMSs on OLAP workloads.
→ Shows how DBMSs are designed incorrectly for modern CPU architectures.
Based on these findings, the authors proposed a new DBMS called MonetDB/X100.
→ Renamed to Vectorwise and acquired by Actian in 2010.
→ Rebranded as Vector and Avalanche.
Reference: MonetDB/X100: Hyper-Pipelining Query Execution, CIDR 2005.

CPU OVERVIEW
CPUs organize instructions into pipeline stages. The goal is to keep all parts of the processor busy at each cycle by masking delays from instructions that cannot complete in a single cycle.
Super-scalar CPUs support multiple pipelines.
→ They execute multiple instructions in parallel in a single cycle if the instructions are independent.
→ Flynn's Taxonomy: Single Instruction stream, Single Data stream (SISD).

DBMS / CPU PROBLEMS
Problem #1: Dependencies
→ If one instruction depends on another instruction, then it cannot be pushed immediately into the same pipeline.
Problem #2: Branch Prediction
→ The CPU tries to predict which branch the program will take and fills the pipeline with that branch's instructions.
→ If the prediction is wrong, the CPU must throw away any speculative work and flush the pipeline.

BRANCH MISPREDICTION
Because of long pipelines, CPUs speculatively execute branches. This potentially hides the long stalls between dependent instructions.
The most frequently executed branching code in a DBMS is the filter operation during a sequential scan. But this branch is (nearly) impossible to predict correctly, because its outcome depends on the data values being scanned.

SELECTION SCANS
    SELECT * FROM table
     WHERE key >= $(low)
       AND key <= $(high)

Scalar (Branching):
    i = 0
    for t in table:
        key = t.key
        if (key ≥ low) && (key ≤ high):
            copy(t, output[i])
            i = i + 1

Scalar (Branchless):
    i = 0
    for t in table:
        copy(t, output[i])
        key = t.key
        m = (key ≥ low ? 1 : 0) && (key ≤ high ? 1 : 0)
        i = i + m

Source: Bogdan Raducanu
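The branching and branchless scans above can be made runnable as the Python sketch below, assuming rows carry an integer `key`; the `Row` class, function names, and sample data are hypothetical. The branchless version always copies the tuple into the next output slot and advances the cursor by 0 or 1, so there is no data-dependent `if` to mispredict. It uses a bitwise `&` on the 0/1 values, which gives the same result as the slide's `&&` without short-circuit evaluation. Note that CPython itself branches inside the interpreter, so this only shows the shape of the control flow; any speedup from the branchless form would only show up in compiled code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    key: int
    payload: str


def scan_branching(table: List[Row], low: int, high: int) -> List[Row]:
    """Copies a tuple only when the predicate holds (a data-dependent branch)."""
    output: List[Row] = []
    for t in table:
        key = t.key
        if key >= low and key <= high:
            output.append(t)
    return output


def scan_branchless(table: List[Row], low: int, high: int) -> List[Optional[Row]]:
    """Always copies the tuple; advances the output cursor only on a match."""
    # The cursor i never exceeds the number of tuples seen so far,
    # so one output slot per input tuple is enough.
    output: List[Optional[Row]] = [None] * len(table)
    i = 0
    for t in table:
        output[i] = t  # unconditional copy; overwritten if t does not qualify
        key = t.key
        m = int(key >= low) & int(key <= high)  # 1 if t qualifies, else 0
        i += m
    return output[:i]


table = [Row(k, f"row{k}") for k in [7, 2, 9, 4, 5, 1]]
selected_branching = scan_branching(table, 3, 7)
selected_branchless = scan_branchless(table, 3, 7)
```

Both functions return the rows with keys 7, 4, and 5 in input order; the branchless one trades a few wasted copies for the absence of a branch on the predicate.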

[Figure omitted: SELECTION SCANS. Source: Bogdan Raducanu]

EXCESSIVE INSTRUCTIONS
The DBMS needs to support different data types, so it must check a value's type before it performs any operation on that value.
→ This is usually implemented as giant switch statements.
→ It also creates more branches that can be difficult for the CPU to predict reliably.
Example: Postgres' addition for NUMERIC types.
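The per-value type check described above can be sketched as follows. This is a hypothetical illustration of the pattern, not Postgres' actual NUMERIC code: the `TypeId` enum, `Value` class, `add` function, and the 32-bit overflow check are assumptions chosen to show how every single addition pays for a type dispatch plus extra branches before doing any arithmetic.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum, auto
from typing import Any

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


class TypeId(Enum):
    INTEGER = auto()
    DOUBLE = auto()
    DECIMAL = auto()


@dataclass(frozen=True)
class Value:
    """A runtime value tagged with its SQL type (hypothetical representation)."""
    type_id: TypeId
    data: Any


def add(a: Value, b: Value) -> Value:
    # Every call re-checks types at runtime before doing any arithmetic.
    if a.type_id != b.type_id:
        raise TypeError("implicit casts are not handled in this sketch")
    # This if/elif chain plays the role of a C switch on the type tag.
    if a.type_id is TypeId.INTEGER:
        result = a.data + b.data
        if not INT32_MIN <= result <= INT32_MAX:  # yet another branch
            raise OverflowError("integer out of range")
        return Value(TypeId.INTEGER, result)
    elif a.type_id is TypeId.DOUBLE:
        return Value(TypeId.DOUBLE, float(a.data) + float(b.data))
    elif a.type_id is TypeId.DECIMAL:
        return Value(TypeId.DECIMAL, Decimal(a.data) + Decimal(b.data))
    raise TypeError(f"addition not supported for {a.type_id}")


# Summing a column pays the dispatch cost once per value, not once per query.
column = [Value(TypeId.DECIMAL, Decimal("1.10")), Value(TypeId.DECIMAL, Decimal("2.20"))]
total = column[0]
for v in column[1:]:
    total = add(total, v)
```

Since every value in a column usually has the same type, the dispatch repeats the same decision over and over; that redundancy is the instruction overhead the slide is pointing at.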

PROCESSING MODEL
A DBMS's processing model defines how the system executes a query plan.
→ Different trade-offs for different workloads.
Approach #1: Iterator Model
Approach #2: Materialization Model
Approach #3: Vectorized / Batch Model

ITERATOR MODEL
Each query plan operator implements a next function.
→ On each invocation, the operator returns either a single tuple or a null marker if there are no more tuples.
→ The operator implements a loop that calls next on its children to retrieve their tuples and then processes them.
Also called the Volcano or Pipeline Model.

ITERATOR MODEL
    SELECT A.id, B.value
      FROM A, B
     WHERE A.id = B.id
       AND B.value > 100

Query plan, from the root down. The numbers give the order in which control reaches each operator:

(1) π A.id, B.value
    for t in child.Next():
        emit(projection(t))

(2) ⨝ A.id=B.id
    for t1 in left.Next():
        buildHashTable(t1)
    for t2 in right.Next():
        if probe(t2): emit(t1 ⨝ t2)

(3) A (left child of the join)
    for t in A:
        emit(t)

(4) σ value>100 (right child of the join)
    for t in child.Next():
        if evalPred(t): emit(t)

(5) B (child of σ)
    for t in B:
        emit(t)

Tuples move up the plan one at a time (a single tuple per Next call).
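The plan above can be sketched in the iterator style as runnable Python. This is an illustration under assumptions, not the lecture's implementation: rows are dicts keyed by qualified column names, and the classes `Scan`, `Filter`, `HashJoin`, `Project` and the sample tables are hypothetical. Each `next()` returns a single tuple or `None` as the end-of-data marker; the hash join drains its left child to build a hash table on its first call, then probes with tuples pulled from its right child.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional

Row = Dict[str, int]


class Operator:
    """Iterator-model interface: next() returns one tuple, or None when exhausted."""

    def next(self) -> Optional[Row]:
        raise NotImplementedError


class Scan(Operator):
    def __init__(self, rows: List[Row]):
        self._it = iter(rows)

    def next(self) -> Optional[Row]:
        return next(self._it, None)  # None plays the role of the null marker


class Filter(Operator):
    def __init__(self, child: Operator, pred: Callable[[Row], bool]):
        self.child, self.pred = child, pred

    def next(self) -> Optional[Row]:
        while (t := self.child.next()) is not None:
            if self.pred(t):
                return t
        return None


class HashJoin(Operator):
    def __init__(self, left: Operator, right: Operator, left_key: str, right_key: str):
        self.left, self.right = left, right
        self.left_key, self.right_key = left_key, right_key
        self.table: Optional[Dict[int, List[Row]]] = None
        self.pending: Deque[Row] = deque()  # matches for the current probe tuple

    def next(self) -> Optional[Row]:
        if self.table is None:  # build phase: drain the left child on the first call
            self.table = {}
            while (t1 := self.left.next()) is not None:
                self.table.setdefault(t1[self.left_key], []).append(t1)
        while not self.pending:  # probe phase: pull right tuples until one matches
            t2 = self.right.next()
            if t2 is None:
                return None
            for t1 in self.table.get(t2[self.right_key], []):
                self.pending.append({**t1, **t2})
        return self.pending.popleft()


class Project(Operator):
    def __init__(self, child: Operator, columns: List[str]):
        self.child, self.columns = child, columns

    def next(self) -> Optional[Row]:
        t = self.child.next()
        return None if t is None else {c: t[c] for c in self.columns}


A = [{"A.id": 1}, {"A.id": 2}, {"A.id": 3}]
B = [{"B.id": 1, "B.value": 50}, {"B.id": 2, "B.value": 150},
     {"B.id": 3, "B.value": 300}, {"B.id": 4, "B.value": 500}]

plan = Project(
    HashJoin(Scan(A), Filter(Scan(B), lambda t: t["B.value"] > 100), "A.id", "B.id"),
    ["A.id", "B.value"],
)

results = []
while (row := plan.next()) is not None:  # the root is driven one tuple at a time
    results.append(row)
```

In this sketch the join's build phase must consume all of A before the first output tuple appears, while tuples from B flow through the filter, probe, and projection one at a time.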
