DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ
LECTURE #17: QUERY EXECUTION & SCHEDULING
TODAY’S AGENDA
Process Models
Query Parallelization
Data Placement
Scheduling
QUERY EXECUTION
A query plan is composed of operators. An operator instance is an invocation of an operator on some segment of data.
A task is the execution of a sequence of one or more operator instances.
PROCESS MODEL
A DBMS’s process model defines how the system is architected to support concurrent requests from a multi-user application. A worker is the DBMS component that is responsible for executing tasks on behalf of the client and returning the results.
Reference: Architecture of a Database System (Foundations and Trends in Databases, 2007)
PROCESS MODELS
Approach #1: Process per DBMS Worker
Approach #2: Process Pool
Approach #3: Thread per DBMS Worker
PROCESS PER WORKER
Each worker is a separate OS process.
→ Relies on the OS scheduler.
→ Uses shared memory for global data structures.
→ A process crash doesn't take down the entire system.
→ Examples: IBM DB2, Postgres, Oracle
(Diagram: a dispatcher routing client requests to separate worker processes)
PROCESS POOL
A worker uses any free process from a pool.
→ Still relies on the OS scheduler and shared memory.
→ Bad for CPU cache locality.
→ Examples: IBM DB2, Postgres (2015)
(Diagram: a dispatcher handing requests to a pool of worker processes)
THREAD PER WORKER
Single process with multiple worker threads.
→ The DBMS has to manage its own scheduling.
→ May or may not use a dispatcher thread.
→ A thread crash (may) kill the entire system.
→ Examples: IBM DB2, MSSQL, MySQL, Oracle (2014)
(Diagram: worker threads inside a single DBMS process)
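As a toy sketch of the thread-per-worker idea (not any particular DBMS's implementation), a dispatcher hands each incoming request to its own thread; the client ids and SQL strings are invented:

```python
import queue
import threading

results = queue.Queue()

def worker(client_id, sql):
    # Each client request runs in its own thread ("worker").
    # A real DBMS would parse, plan, and execute; here we just echo a result.
    results.put((client_id, f"executed: {sql}"))

# Dispatcher: spawn one thread per incoming request.
requests = [(1, "SELECT 1"), (2, "SELECT 2")]
threads = [threading.Thread(target=worker, args=req) for req in requests]
for t in threads:
    t.start()
for t in threads:
    t.join()

collected = {}
while not results.empty():
    cid, res = results.get()
    collected[cid] = res
```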
PROCESS MODELS
Using a multi-threaded architecture has several advantages:
→ Less overhead per context switch.
→ Don't have to manage shared memory.
The thread-per-worker model does not mean that the DBMS has intra-query parallelism. I am not aware of any new DBMS built in the last 7-8 years that doesn't use threads.
SCHEDULING
For each query plan, the DBMS has to decide where, when, and how to execute it.
→ How many tasks should it use?
→ How many CPU cores should it use?
→ What CPU core should the tasks execute on?
→ Where should a task store its output?
The DBMS always knows more than the OS.
INTER-QUERY PARALLELISM
Improve overall performance by allowing multiple queries to execute simultaneously.
→ Provide the illusion of isolation through a concurrency control scheme.
The difficulty of implementing a concurrency control scheme is not significantly affected by the DBMS's process model.
INTRA-QUERY PARALLELISM
Improve the performance of a single query by executing its operators in parallel.
Approach #1: Intra-Operator (Horizontal)
→ Operators are decomposed into independent instances that perform the same function on different subsets of data.
Approach #2: Inter-Operator (Vertical)
→ Operations are overlapped in order to pipeline data from one stage to the next without materialization.
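Intra-operator (horizontal) parallelism can be sketched as follows; the data, the three-way split, and the even-number predicate are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(10))
# Decompose the input into independent fragments, one per operator instance.
partitions = [data[0:4], data[4:7], data[7:10]]

def scan_filter(part):
    # One operator instance: the same predicate applied to a subset of data.
    return [x for x in part if x % 2 == 0]

# Run the instances in parallel and concatenate their outputs.
with ThreadPoolExecutor(max_workers=3) as pool:
    pieces = pool.map(scan_filter, partitions)
result = [x for piece in pieces for x in piece]
```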
INTRA-OPERATOR PARALLELISM

SELECT A.id, B.value
  FROM A, B
 WHERE A.id = B.id
   AND A.value < 99
   AND B.value > 100

(Animation: the plan applies σ to A and to B, joins (⨝) on id, then projects (π) A.id and B.value. Table A is split into fragments A1, A2, A3 assigned to workers 1, 2, 3; each worker applies σ to its fragment and builds a hash table (Build HT). An exchange operator sits between the build and probe phases. Table B is likewise split into B1, B2, B3; workers apply σ, probe the hash tables (Probe HT), apply π, and a final exchange combines the results.)
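The partitioned hash join above can be sketched in miniature; the rows of A and B, the two-way partitioning, and the use of Python's built-in hash are invented stand-ins for a real exchange operator:

```python
from collections import defaultdict

A = [(1, 50), (2, 120), (3, 80)]   # (id, value)
B = [(1, 150), (2, 90), (3, 200)]  # (id, value)
N = 2                              # number of partitions / workers

def partition(rows, n):
    # Exchange: route each tuple to a partition by hash of the join key.
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[0]) % n].append(row)
    return parts

# Build side: σ(A.value < 99), then one hash table per partition.
a_parts = partition([r for r in A if r[1] < 99], N)
tables = {p: {rid: val for rid, val in rows} for p, rows in a_parts.items()}

# Probe side: σ(B.value > 100), probe the matching partition's hash table.
out = []
for p, rows in partition([r for r in B if r[1] > 100], N).items():
    ht = tables.get(p, {})
    for rid, val in rows:
        if rid in ht:
            out.append((rid, val))  # π(A.id, B.value)
```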
INTER-OPERATOR PARALLELISM

SELECT A.id, B.value
  FROM A, B
 WHERE A.id = B.id
   AND A.value < 99
   AND B.value > 100

(Animation: the plan's operators run concurrently as pipelined stages)
Stage 1 (⨝): for r1 ∊ outer: for r2 ∊ inner: emit(r1 ⨝ r2)
Stage 2 (π): for r ∊ incoming: emit(π(r))
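Inter-operator (vertical) parallelism can be illustrated with generator-based pipelining, where each stage pulls tuples from the stage below it without materializing intermediate results; the toy table and predicate are invented:

```python
def scan(table):
    for row in table:
        yield row

def select(rows, pred):
    for row in rows:
        if pred(row):
            yield row  # pipelined: tuples flow up without materialization

def project(rows, cols):
    for row in rows:
        yield tuple(row[c] for c in cols)

table = [(1, 50), (2, 120), (3, 200)]
# Compose the stages; nothing runs until the consumer pulls.
plan = project(select(scan(table), lambda r: r[1] > 100), (0,))
out = list(plan)
```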
OBSERVATION
Coming up with the right number of workers to use for a query plan depends on the number of CPU cores, the size of the data, and the functionality of the operators.
WORKER ALLOCATION
Approach #1: One Worker per Core
→ Each core is assigned one thread that is pinned to that core in the OS.
→ See sched_setaffinity.
Approach #2: Multiple Workers per Core
→ Use a pool of workers per core (or per socket).
→ Allows CPU cores to be fully utilized in case one worker at a core blocks.
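On Linux, pinning can be done with sched_setaffinity; a minimal sketch using Python's wrapper for the same call (the guard is there because the call does not exist on macOS or Windows):

```python
import os

def pin_to_core(core: int) -> bool:
    """Pin the calling process to one CPU core (Linux only)."""
    if hasattr(os, "sched_setaffinity"):   # not available on macOS/Windows
        os.sched_setaffinity(0, {core})    # 0 = the calling process
        return os.sched_getaffinity(0) == {core}
    return False
```

A one-worker-per-core DBMS would call this once in each worker thread at startup.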
TASK ASSIGNMENT
Approach #1: Push
→ A centralized dispatcher assigns tasks to workers and monitors their progress.
→ When a worker notifies the dispatcher that it is finished, it is given a new task.
Approach #2: Pull
→ Workers pull the next task from a queue, process it, and then return to get the next task.
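A minimal sketch of pull-based task assignment: workers loop, pulling from a shared queue until it is empty; the task payloads (plain integers) are invented:

```python
import queue
import threading

tasks = queue.Queue()
for i in range(8):
    tasks.put(i)

done = []
lock = threading.Lock()

def worker():
    while True:
        try:
            t = tasks.get_nowait()  # pull the next task; no dispatcher needed
        except queue.Empty:
            return
        with lock:
            done.append(t)          # stand-in for executing the task

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```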
OBSERVATION
Regardless of what worker allocation or task assignment policy the DBMS uses, it's important that workers operate on local data. The DBMS's scheduler has to be aware of its underlying hardware's memory layout.
→ Uniform vs. Non-Uniform Memory Access
UNIFORM MEMORY ACCESS
(Diagram: CPUs, each with a private cache, sharing one memory bus)
NON-UNIFORM MEMORY ACCESS
(Diagram: NUMA sockets, each with CPUs and caches and local memory, linked by an interconnect)
Examples of NUMA interconnects:
→ Intel (2008): QuickPath Interconnect
→ Intel (2017): UltraPath Interconnect
→ AMD (??): HyperTransport
→ AMD (2017): Infinity Fabric
DATA PLACEMENT
The DBMS can partition memory for a database and assign each partition to a CPU. By controlling and tracking the location of partitions, it can schedule operators to execute on workers at the closest CPU core.
→ See Linux's move_pages.
MEMORY ALLOCATION
What happens when the DBMS calls malloc?
→ Assume that the allocator doesn't already have a chunk of memory that it can give out.
Actually, almost nothing:
→ The allocator will extend the process's data segment.
→ But this new virtual memory is not immediately backed by physical memory.
→ The OS only allocates physical memory when there is a page fault.
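This lazy behavior can be observed with an anonymous mmap, which reserves virtual address space much like an allocator extending the data segment would; the 256 MiB size is arbitrary:

```python
import mmap

# Reserve 256 MiB of anonymous virtual memory. The OS does not back this
# with physical pages yet; it only installs mappings in the page table.
buf = mmap.mmap(-1, 1 << 28)

# First touch: a page fault makes the OS allocate a single physical page.
buf[0] = 1
```

After the write, only one page (typically 4 KiB) of the 256 MiB region is actually resident.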
MEMORY ALLOCATION LOCATION
Now, after a page fault, where does the OS allocate physical memory in a NUMA system?
Approach #1: Interleaving
→ Distribute allocated memory uniformly across CPUs.
Approach #2: First-Touch
→ At the CPU of the thread that accessed the memory location that caused the page fault.
DATA PLACEMENT – OLTP
(Chart: throughput in txn/sec, from 4000 to 12000, for the Spread, Group, Mix, and OS placement policies)
Workload: TPC-C Payment using 4 workers
Processor: NUMA with 4 sockets (6 cores each)
Source: Danica Porobic
DATA PLACEMENT – OLAP
(Chart: tuples read per second (M), up to 30000, vs. number of threads from 8 to 152, for Random Partition and Local Partition Only)
Database: 10 million tuples
Workload: Sequential Scan
Processor: 8 sockets, 10 cores per node (2x HT)
Source: Haibin Lin
PARTITIONING VS. PLACEMENT
A partitioning scheme is used to split the database based on some policy.
→ Round-robin
→ Attribute Ranges
→ Hashing
→ Partial/Full Replication
A placement scheme then tells the DBMS where to put those partitions.
→ Round-robin
→ Interleave across cores
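The partitioning policies can be sketched as small functions; the row set and range bounds are invented, and a real DBMS would partition on an attribute rather than the whole row:

```python
def round_robin(rows, n):
    # Assign row i to partition i mod n.
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def by_hash(rows, n, key=lambda r: r):
    # Assign each row by hash of its partitioning key.
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(key(row)) % n].append(row)
    return parts

def by_range(rows, bounds, key=lambda r: r):
    # bounds = upper bounds of each partition except the last.
    parts = [[] for _ in range(len(bounds) + 1)]
    for row in rows:
        for i, b in enumerate(bounds):
            if key(row) < b:
                parts[i].append(row)
                break
        else:
            parts[-1].append(row)
    return parts

rows = list(range(10))
```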
OBSERVATION
We have the following so far:
→ Process Model
→ Worker Allocation Model
→ Task Assignment Model
→ Data Placement Policy
But how do we decide how to create a set of tasks from a logical query plan?
→ This is relatively easy for OLTP queries.
→ Much harder for OLAP queries…
STATIC SCHEDULING
The DBMS decides how many threads to use to execute the query when it generates the plan. This does not change while the query executes.
→ The easiest approach is to use the same number of tasks as the number of cores.
MORSEL-DRIVEN SCHEDULING
Dynamic scheduling of tasks that operate over horizontal partitions called “morsels” that are distributed across cores.
→ One worker per core
→ Pull-based task assignment
→ Round-robin data placement
Supports parallel, NUMA-aware operator implementations.
Reference: Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age (SIGMOD 2014)
HYPER – ARCHITECTURE
No separate dispatcher thread. The threads perform cooperative scheduling for each query plan using a single task queue.
→ Each worker tries to select tasks that will execute on morsels that are local to it.
→ If there are no local tasks, then the worker pulls the next task from the global work queue.
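A toy sketch of this local-first pull policy (not HyPer's actual lock-free implementation): each worker drains its own "local" morsel queue before falling back to the global queue. The morsel payloads are invented:

```python
import queue
import threading

NUM_WORKERS = 2
# One queue of NUMA-local morsels per worker, plus a global fallback queue.
local = [queue.Queue() for _ in range(NUM_WORKERS)]
global_q = queue.Queue()

for w in range(NUM_WORKERS):
    for m in range(3):
        local[w].put((w, m))        # morsel m, local to worker w
global_q.put(("global", 0))

processed = [[] for _ in range(NUM_WORKERS)]

def worker(wid):
    while True:
        try:
            task = local[wid].get_nowait()    # prefer local morsels
        except queue.Empty:
            try:
                task = global_q.get_nowait()  # fall back to the global queue
            except queue.Empty:
                return
        processed[wid].append(task)

threads = [threading.Thread(target=worker, args=(w,)) for w in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```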
HYPER – DATA PARTITIONING

SELECT A.id, B.value
  FROM A, B
 WHERE A.id = B.id
   AND A.value < 99
   AND B.value > 100

(Animation: the data table for A, with columns id, a1, a2, a3, is split into morsels A1, A2, A3, and each morsel is assigned to a worker 1, 2, 3)
HYPER – EXECUTION EXAMPLE

SELECT A.id, B.value
  FROM A, B
 WHERE A.id = B.id
   AND A.value < 99
   AND B.value > 100

(Animation: workers 1, 2, and 3, each with their own morsels and output buffer, repeatedly pull tasks from the global task queue until the plan completes)
MORSEL-DRIVEN SCHEDULING
Because there is only one worker per core, the system has to use work stealing; otherwise, threads could sit idle waiting for stragglers. It uses a lock-free hash table to maintain the global work queues.
→ We will discuss hash tables next class…
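Work stealing can be sketched with per-worker deques and a single coarse lock; HyPer's real structures are lock-free, so this only illustrates the policy (take from the front of your own queue, steal from the back of a victim's):

```python
import collections
import threading

NUM = 2
deques = [collections.deque(range(w * 4, w * 4 + 4)) for w in range(NUM)]
lock = threading.Lock()
done = [[] for _ in range(NUM)]

def worker(wid):
    while True:
        with lock:
            if deques[wid]:
                task = deques[wid].popleft()      # own queue: front
            else:
                victim = next((d for d in deques if d), None)
                if victim is None:
                    return                        # nothing left anywhere
                task = victim.pop()               # steal from a victim's back
        done[wid].append(task)                    # stand-in for executing it

threads = [threading.Thread(target=worker, args=(w,)) for w in range(NUM)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```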
SAP HANA – NUMA-AWARE SCHEDULER
Pull-based scheduling with multiple worker threads that are organized into groups (pools).
→ Each CPU can have multiple groups.
→ Each group has a soft and a hard priority queue.
Uses a separate "watchdog" thread to check whether groups are saturated and can reassign tasks dynamically.
Reference: Scaling Up Concurrent Main-Memory Column-Store Scans: Towards Adaptive NUMA-Aware Data and Task Placement (VLDB 2015)
SAP HANA – THREAD GROUPS
Each thread group has a soft and a hard priority task queue.
→ Threads are allowed to steal tasks from other groups' soft queues.
Four different pools of threads per group:
→ Working: actively executing a task.
→ Inactive: blocked inside the kernel due to a latch.
→ Free: sleeps for a little while, then wakes up to see whether there is a new task to execute.
→ Parked: like Free, but doesn't wake up on its own.
SAP HANA – NUMA-AWARE SCHEDULER
Can dynamically adjust thread pinning based on whether a task is CPU-bound or memory-bound. Found that work stealing was not as beneficial for systems with a larger number of sockets. Using thread groups allows cores to execute other tasks instead of only queries.
PARTING THOUGHTS
A DBMS is a beautiful, strong-willed independent piece of software. But it has to make sure that it uses its underlying hardware correctly.
→ Data location is an important aspect of this.
→ Tracking memory location in a single-node DBMS is the same as tracking shards in a distributed DBMS.
Don't let the OS ruin your life.
NEXT CLASS
Concurrency Control
Reminder: Project updates are due after Spring break (Mar 26).