CS 764: Topics in Database Management Systems
Lecture 12: Parallel DBMSs


SLIDE 1

Xiangyao Yu 10/14/2020

CS 764: Topics in Database Management Systems Lecture 12: Parallel DBMSs


SLIDE 2

Announcement


Class schedule

  • 10/21: Last lecture included in exam
  • 10/26: Guest lecture from Ippokratis Pandis (AWS)
  • 10/28 and 11/2: Lectures become office hours
  • 11/9 – 12/2: Lectures on state-of-the-art research in databases
  • 12/7 and 12/9: DAWN workshop
SLIDE 3

Today’s Paper: Parallel DBMSs

David DeWitt and Jim Gray. Parallel Database Systems: The Future of High Performance Database Systems. Communications of the ACM, 1992

SLIDE 4

Agenda

  • Parallelism metrics
  • Parallel architecture
  • Parallel OLAP operators

SLIDE 5

Parallel Database History


1980’s: database machines

  • Specialized hardware to make databases run fast
  • Specialized hardware could not keep pace with Moore’s Law improvements in commodity processors

1980’s – 2010’s: shared-nothing architecture

  • Connecting machines using a network

2010’s – future?

  • Cloud databases: storage disaggregation

SLIDE 6

Scaling in Parallel Systems


Linear speedup

  • Twice as much hardware can perform the task in half the elapsed time
  • Speedup = (small system elapsed time) / (big system elapsed time)
  • Ideally speedup = N, where the big system is N times larger than the small system

Linear scaleup

  • Twice as much hardware can perform twice as large a task in the same elapsed time
  • Scaleup = (small system elapsed time on small problem) / (big system elapsed time on big problem)
  • Ideally scaleup = 1
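In code, the two metrics are just ratios of elapsed times (a minimal illustrative sketch; the function and parameter names are not from the paper):

```python
def speedup(small_sys_time, big_sys_time):
    """Speedup = (small system elapsed time) / (big system elapsed time)."""
    return small_sys_time / big_sys_time

def scaleup(small_time_small_problem, big_time_big_problem):
    """Scaleup = (small system time on small problem) / (big system time on big problem)."""
    return small_time_small_problem / big_time_big_problem

# A 4x larger system running the same job in a quarter of the time: linear speedup
print(speedup(100.0, 25.0))   # 4.0
# A 4x larger system running a 4x larger job in the same elapsed time: ideal scaleup
print(scaleup(100.0, 100.0))  # 1.0
```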
SLIDE 7

Scaling in Parallel Systems

[Figure: speedup vs. number of processors & disks — ideal speedup (linear), in practice (sublinear), no speedup (flat)]

SLIDE 8

Threats to Parallelism

Startup: the time needed to start parallel tasks and to collect their results

[Figure: ideal vs. non-ideal speedup as processors & disks increase]

SLIDE 9

Threats to Parallelism

[Figure: ideal vs. non-ideal speedup as processors & disks increase; threats so far: startup, interference]

Examples of interference

  • Shared hardware resources (e.g., memory, disk, network)
  • Synchronization (e.g., locking)
SLIDE 10

Threats to Parallelism

[Figure: ideal vs. non-ideal speedup as processors & disks increase; threats: startup, interference, skew]

Skew: some nodes take more time to execute their assigned tasks, e.g.,

  • More tasks assigned
  • More computationally intensive tasks assigned
  • Node has slower hardware

SLIDE 11

Design Spectrum

Three architectures: shared-memory, shared-disk, shared-nothing

[Figure: shared-memory (CPUs share a common memory and all disks over an interconnect), shared-disk (each CPU has private memory, all CPUs access all disks over a network), shared-nothing (each node owns its CPU, memory, and disk, connected only by a network)]

SLIDE 12

Design Spectrum – Shared Memory (SM)


All processors share direct access to a common global memory and to all disks

  • Does not scale beyond a single server

Example: multicore processors

[Figure: shared-memory architecture]

SLIDE 13

Design Spectrum – Shared Disk (SD)


Each processor has a private memory but has direct access to all disks

  • Does not scale beyond tens of servers

Example: Network attached storage (NAS) and storage area network (SAN)

[Figure: shared-disk architecture]

SLIDE 14

Design Spectrum – Shared Nothing (SN)


Each memory and disk is owned by some processor that acts as a server for that data

  • Scales to thousands of servers and beyond

Important optimization goal: minimize network data transfer

[Figure: shared-nothing architecture]

SLIDE 15

Legacy Software

Old uni-processor software must be rewritten to benefit from parallelism

Most database programs are written in the relational language SQL

  • SQL can be made to work on parallel hardware without rewriting applications
  • Benefits of a high-level programming interface
SLIDE 16

Pipelined Parallelism

Pipelined parallelism: a pipeline of operators, each running on a different processor

Advantages

  • Avoids writing intermediate results back to disk

Disadvantages

  • Small number of stages in a query
  • Blocking operators, e.g., sort and aggregation
  • Different speeds: scan is faster than join; the slowest operator becomes the bottleneck

[Figure: a two-stage operator pipeline across Processor 1 and Processor 2]
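The streaming-vs-blocking distinction can be illustrated with a toy Python generator pipeline (illustrative only, not DBMS code): the scan and filter emit tuples one at a time, while the sort must buffer its entire input before producing anything.

```python
# Toy operator pipeline: scan and filter stream tuples one at a time,
# while sort is a blocking operator that must buffer its entire input.
def scan(table):
    for row in table:
        yield row                      # streaming: emit rows as they are read

def filter_op(rows, pred):
    for row in rows:
        if pred(row):
            yield row                  # streaming: no buffering needed

def sort_op(rows, key):
    yield from sorted(rows, key=key)   # blocking: consumes all input first

table = [{"id": 3}, {"id": 1}, {"id": 2}]
pipeline = sort_op(filter_op(scan(table), lambda r: r["id"] != 2),
                   key=lambda r: r["id"])
print([r["id"] for r in pipeline])     # [1, 3]
```

Because `sort_op` cannot emit its first tuple until the scan and filter have finished, it caps the benefit of running the stages on different processors.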

SLIDE 17

Partitioned Parallelism

Round-robin partitioning

  • map tuple i to disk (i mod n)

Hash partitioning

  • map each tuple to a disk based on a hash function of an attribute

Range partitioning

  • map contiguous attribute ranges to disks
  • benefits from clustering but suffers from skew

[Figure: a table partitioned across Processors 1–4]
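The three partitioning schemes can be sketched in a few lines of Python (illustrative helper names; a real DBMS partitions across disks and nodes, not in-memory lists):

```python
def round_robin_partition(tuples, n):
    """Tuple i goes to partition i mod n."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts

def hash_partition(tuples, n, key):
    """Each tuple goes to partition hash(key) mod n."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[hash(key(t)) % n].append(t)
    return parts

def range_partition(tuples, boundaries, key):
    """Contiguous key ranges map to partitions; `boundaries` are inclusive upper bounds."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        p = 0
        while p < len(boundaries) and key(t) > boundaries[p]:
            p += 1
        parts[p].append(t)
    return parts

rows = list(range(10))
print(round_robin_partition(rows, 3))                   # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
print(range_partition(rows, [3, 6], key=lambda x: x))   # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Round-robin always balances load; range partitioning keeps each partition clustered on the attribute (good for range scans) but becomes imbalanced when the key distribution is skewed.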

SLIDE 18

Parallelism within Relational Operators

Parallel data streams allow sequential operator code to run unmodified

  • Each operator has a set of input and output ports
  • Partition and merge these ports into sequential ports so that an operator is not aware of parallelism

SLIDE 20

Specialized Parallel Operators


Parallel join algorithms

  • Parallel sort-merge join
  • Parallel hash join (e.g., radix join)

[Figure: parallel join of relations R and S]
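A sketch of the partitioned parallel hash join idea under simple assumptions (a sequential loop stands in for per-node execution; all names are illustrative, and this is plain hash partitioning rather than the multi-pass radix variant):

```python
from collections import defaultdict

def hash_partition(rel, n, key):
    parts = [[] for _ in range(n)]
    for t in rel:
        parts[hash(key(t)) % n].append(t)
    return parts

def local_hash_join(r_part, s_part, r_key, s_key):
    # Classic build/probe hash join within a single partition.
    build = defaultdict(list)
    for r in r_part:
        build[r_key(r)].append(r)
    return [(r, s) for s in s_part for r in build[s_key(s)]]

def parallel_hash_join(R, S, r_key, s_key, n=4):
    # Hash-partition both relations on the join key: matching tuples are
    # guaranteed to land in the same partition, so partitions join independently.
    r_parts = hash_partition(R, n, r_key)
    s_parts = hash_partition(S, n, s_key)
    out = []
    for rp, sp in zip(r_parts, s_parts):   # each pair could run on its own node
        out.extend(local_hash_join(rp, sp, r_key, s_key))
    return out

R = [("r1", 1), ("r2", 2), ("r3", 2)]
S = [("s1", 2), ("s2", 3)]
print(sorted(parallel_hash_join(R, S, r_key=lambda t: t[1], s_key=lambda t: t[1])))
# [(('r2', 2), ('s1', 2)), (('r3', 2), ('s1', 2))]
```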

SLIDE 21

Specialized Parallel Operators


Semi-join: ship only the join-attribute values of one relation to filter the other, reducing network transfer

  • Example: SELECT * FROM T1, T2 WHERE T1.A = T2.C

* Source: Sattler KU. (2009) Semijoin. Encyclopedia of Database Systems.
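A minimal Python sketch of the semi-join idea for the query above (not code from the paper; the table layouts and helper names are hypothetical):

```python
# Hypothetical tables for: SELECT * FROM T1, T2 WHERE T1.A = T2.C
def semi_join(T1, T2, a_key, c_key):
    # Step 1: ship only the distinct values of T1.A to T2's site (a small message)
    a_values = {a_key(t) for t in T1}
    # Step 2: T2's site ships back only the tuples that can possibly match
    t2_reduced = [t for t in T2 if c_key(t) in a_values]
    # Step 3: the final join runs at T1's site against the reduced T2
    return [(t1, t2) for t1 in T1 for t2 in t2_reduced if a_key(t1) == c_key(t2)]

T1 = [("x", 1), ("y", 2)]   # (payload, A)
T2 = [(1, "p"), (3, "q")]   # (C, payload)
print(semi_join(T1, T2, a_key=lambda t: t[1], c_key=lambda t: t[0]))
```

Only the A-values and the matching T2 tuples cross the network, instead of all of T2.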

SLIDE 22

2010’s – Future


Cloud databases – Storage disaggregation

  • Lower management cost
  • Independent scaling of computation and storage

[Figure: shared-nothing and shared-disk vs. storage disaggregation, where compute nodes (CPU + Mem) access a shared pool of storage nodes (HDD) over the network]

SLIDE 23

Q/A – Parallel DBMSs


  • Parallel vs. distributed vs. cloud DBMS?
  • Is the paper still valid for modern databases?
  • Batch processing for OLTP workloads?
  • Does the change of storage technology affect OLTP performance?
  • Will things change with the end of Moore’s law?
  • Extra challenges in the cloud?

SLIDE 24

Discussion


SQL, as a simple and high-level interface, enables database optimization across the hardware and software layers. Can you think of other examples of such high-level interfaces that enable flexible optimizations?

Can you think of any optimization opportunities for the storage-disaggregation architecture for OLTP or OLAP workloads?

SLIDE 25

Before Next Lecture

Look for teammates for the course project :)

Submit discussion summary to https://wisc-cs764-f20.hotcrp.com

  • Title: Lecture 12 discussion. group ##
  • Authors: Names of students who joined the discussion
  • Deadline: Thursday 11:59pm

Submit review before next lecture

  • Michael Stonebraker, et al. Mariposa: A Wide-Area Distributed Database System. VLDB 1996