B a n Parallel DBMS d 1 Terabyte w 1 Terabyte i d Chapter - PDF document

Why Parallel Access To Data? At 10 MB/s 1,000 x parallel 1.2 days to scan 1.5 minute to scan. B a n Parallel DBMS d 1 Terabyte w 1 Terabyte i d Chapter 21, Part A t h Parallelism: 10 MB/s Slides by Joe Hellerstein, UCB, with some material from Jim Gray, divide a big problem Microsoft Research. See also: into many smaller ones http://www.research.microsoft.com/research/BARC/Gray/PDB95.ppt to be solved in parallel. Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke 1 2 Parallel DBMS: Intro DBMS: The || Success Story ❖ Parallelism is natural to DBMS processing ❖ DBMSs are the most (only?) successful – Pipeline parallelism: many machines each doing one application of parallelism. step in a multi-step process. – Teradata, Tandem vs. Thinking Machines, KSR.. – Partition parallelism: many machines doing the – Every major DBMS vendor has some || server same thing to different pieces of data. – Workstation manufacturers now depend on || DB – Both are natural in DBMS! server sales. Any Any Sequential Sequential ❖ Reasons for success: Pipeline Program Program – Bulk-processing (= partition ||-ism). – Natural pipelining. Sequential Any Partition Any Sequential Sequential Sequential Sequential Sequential – Inexpensive hardware can do the trick! Program Program outputs split N ways, inputs merge M ways – Users/app-programmers don’t need to think in || Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke 3 4 Some || Terminology Architecture Issue: Shared What? Ideal (throughput) Xact/sec. Shared Memory Shared Nothing Shared Disk (SMP) (network) ❖ Speed-Up – More resources means CLIENTS CLIENTS CLIENTS proportionally less time degree of ||-ism for given amount of data. Processors Memory Ideal (response time) ❖ Scale-Up sec./Xact – If resources increased in Hard to program Easy to program proportion to increase in Cheap to build Expensive to build data size, time is constant. Easy to scaleup Difficult to scaleup Sequent, SGI, Sun degree of ||-ism VMScluster, Sysplex Tandem, Teradata, SP2 Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke 5 6

What Systems Work This Way Different Types of DBMS ||-ism (as of 9/1995) Shared Nothing Teradata: 400 nodes CLIENTS ❖ Intra-operator parallelism Tandem: 110 nodes – get all machines working to compute a given IBM / SP2 / DB2: 128 nodes Informix/SP2 48 nodes operation (scan, sort, join) ATT & Sybase ? nodes ❖ Inter-operator parallelism CLIENTS – each operator may run concurrently on a different Shared Disk site (exploits pipelining) Oracle 170 nodes DEC Rdb 24 nodes ❖ Inter-query parallelism – different queries run on different sites Shared Memory CLIENTS ❖ We’ll focus on intra-operator ||-ism Informix 9 nodes Processors RedBrick ? nodes Memory Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke 7 8 Automatic Data Partitioning Parallel Scans Partitioning a table: Range Hash Round Robin ❖ Scan in parallel, and merge. ❖ Selection may not require all sites for range or hash partitioning. ❖ Indexes can be built at each partition. A...E F...J K...N O...S T...Z A...E F...J K...N O...S T...Z A...E F...J K...N O...S T...Z ❖ Question: How do indexes differ in the Good for equijoins, Good for equijoins Good to spread load different schemes? range queries – Think about both lookups and inserts! group-by – What about unique indexes? Shared disk and memory less sensitive to partitioning, Shared nothing benefits from "good" partitioning Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke 9 10 Parallel Sorting Parallel Aggregates ❖ For each aggregate function, need a decomposition: ❖ Current records: – count (S) = Σ count (s(i)), ditto for sum () – 8.5 Gb/minute, shared-nothing; Datamation – avg (S) = ( Σ sum (s(i))) / Σ count (s(i)) benchmark in 2.41 secs (UCB students! http://now.cs.berkeley.edu/NowSort/ ) – and so on... ❖ For groups: ❖ Idea: – Scan in parallel, and range-partition as you go. – Sub-aggregate groups close to the source. – As tuples come in, begin “local” sorting on each – Pass each sub-aggregate to its group’s site. – Resulting data is sorted, and range-partitioned. ◆ Chosen via a hash fn. Count – Problem: skew! Count Count Count Count Count – Solution: “sample” the data at start to determine partition points. A Table A...E F...J K...N O...S T...Z Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke 11 12 Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey

Parallel Joins Parallel Hash Join Partitions OUTPUT 1 1 Phase 1 INPUT 2 ❖ Nested loop: 2 hash function . . . Original Relations – Each outer tuple must be compared with each h (R then S) B-1 B-1 inner tuple that might join. Disk B main memory buffers Disk – Easy for range partitioning on join cols, hard otherwise! ❖ In first phase, partitions get distributed to ❖ Sort-Merge (or plain Merge-Join): different sites: – Sorting gives range-partitioning. – A good hash function automatically distributes work evenly! ◆ But what about handling 2 skews? – Merging partitioned tables is local. ❖ Do second phase at each site. ❖ Almost always the winner for equi-join. Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke 13 14 Dataflow Network for || Join Complex Parallel Query Plans ❖ Complex Queries: Inter-Operator parallelism – Pipelining between operators: ◆ note that sort and phase 1 of hash-join block the pipeline!! – Bushy Trees Sites 1-8 Sites 1-4 Sites 5-8 A B R S ❖ Good use of split/merge makes it easier to build parallel versions of sequential join code. Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke 15 16 N × M-way Parallelism Observations Merge Merge Merge ❖ It is relatively easy to build a fast parallel Sort Sort Sort Sort Sort query executor Join Join Join Join Join – S.M.O.P. ❖ It is hard to write a robust and world-class A...E F...J K...N O...S T...Z parallel query optimizer. – There are many tricks. N inputs, M outputs, no bottlenecks. – One quickly hits the complexity barrier. – Still open research! Partitioned Data Partitioned and Pipelined Data Flows Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke 17 18

Parallel Query Optimization What’s Wrong With That? ❖ Best serial plan != Best || plan! Why? ❖ Common approach: 2 phases ❖ Trivial counter-example: – Pick best sequential plan (System R algorithm) – Table partitioned with local secondary index at – Pick degree of parallelism based on current two nodes system parameters. – Range query: all of node 1 and 1% of node 2. ❖ “Bind” operators to processors – Node 1 should do a scan of its partition. – Take query tree, “decorate” as in previous picture. Index – Node 2 should use secondary index. Table Scan Scan ❖ SELECT * FROM telephone_book WHERE name < “NoGood”; A..M N..Z Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke 19 20 Parallel DBMS Summary || DBMS Summary, cont. ❖ ||-ism natural to query processing: ❖ Data layout choices important! – Both pipeline and partition ||-ism! ❖ Most DB operations can be done partition-|| ❖ Shared-Nothing vs. Shared-Mem – Sort. – Shared-disk too, but less standard – Sort-merge join, hash-join. – Shared-mem easy, costly. Doesn’t scaleup. ❖ Complex plans. – Shared-nothing cheap, scales well, harder to – Allow for pipeline-||ism, but sorts, hashes block implement. the pipeline. ❖ Intra-op, Inter-op, & Inter-query ||-ism all – Partition ||-ism acheived via bushy trees. possible. Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke 21 22 || DBMS Summary, cont. ❖ Hardest part of the equation: optimization. – 2-phase optimization simplest, but can be ineffective. – More complex schemes still at the research stage. ❖ We haven’t said anything about Xacts, logging. – Easy in shared-memory architecture. – Takes some care in shared-nothing. Database Management Systems, 2 nd Edition. Raghu Ramakrishnan and Johannes Gehrke 23

B a n Parallel DBMS d 1 Terabyte w 1 Terabyte i d Chapter - PDF document

Why Parallel Access To Data? At 10 MB/s 1,000 x parallel 1.2 days to scan 1.5 minute to scan. B a n Parallel DBMS d 1 Terabyte w 1 Terabyte i d Chapter 21, Part A t h Parallelism: 10 MB/s Slides by Joe Hellerstein, UCB, with some

Status of Grid Activities in Pakistan FAWAD SAEED National Centre For Physics, Pakistan 1

Bandwidth 1 Terabyte 1 Terabyte Parallelism: 10 MB/s divide a big problem into many smaller

61A Lecture 36 Announcements Unix Computer Systems Systems research enables application

SARA Computing & Networking Services Ronald van der Pol rvdp@sara.nl TF-NOC Preparation

DM207 I/O-Efficient Algorithms and Data Structures Fall 2009 Rolf Fagerberg IOEADS Fall 2009

Introduction to Assurance Chapter 19 Computer Security: Art and Science , 2 nd Edition Version

6 th December 2019 Chaired by Claire Murdoch, National Director for Mental Health NHS England and

QPR Question Persuade Refer Jon Mattleman Website: www.jonmattleman.com Email:

Sponsored by Supported in part by grant No. 90ADPI0011-01-00 from the U.S. Administration for

Accurately Determining Intermediate and Terminal Plan States Using Bayesian Goal Recognition

JAVASCRIPT DEVELOPMENT Sasha Vodnik, Instructor JAVASCRIPT DEVELOPMENT THE COMMAND LINE THE

Outline CS 188: Artificial Intelligence Markov Decision Processes (MDPs) Formalism

Applied Machine Learning Spring 2019, CS 519 Prof. Liang Huang School of EECS Oregon State

ACCT 420: ML and AI for visual data Session 11 Dr. Richard M. Crowley 1 Front matter 2 . 1

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 21: Reinforcement

Survivor: CSCI 135 Interfacing with your computer GUI (graphical user interfaces) Today:

Introduction to MATLAB CS534 Fall 2016 Contact Qisi Wang Office: 1308CS E-mail:

Price- -cap cap regulation regulation in port in port pricing pricing: : Price a DEA

TDDE18 & 726G77 Course Introductjon Christofger Holm Department of Computer and informatjon

4303 Ed/Serious Gamification Case Study Gamifiction Gamification is a proven method to

Data Management Systems Query Processing Execution models Views and Schemas

Computational Semantics and Pragmatics Autumn 2011 Raquel Fernndez Institute for Logic,

CMPSC 311- Introduction to Systems Programming Module: Build Processing Professor Patrick

I/O 2 / Filesystems 1 1 Changelog Changes made in this version not seen in fjrst lecture: 13

B a n Parallel DBMS d 1 Terabyte w 1 Terabyte i d Chapter - PDF document

Why Parallel Access To Data? At 10 MB/s 1,000 x parallel 1.2 days to scan 1.5 minute to scan. B a n Parallel DBMS d 1 Terabyte w 1 Terabyte i d Chapter 21, Part A t h Parallelism: 10 MB/s Slides by Joe Hellerstein, UCB, with some

Status of Grid Activities in Pakistan FAWAD SAEED National Centre For Physics, Pakistan 1

Bandwidth 1 Terabyte 1 Terabyte Parallelism: 10 MB/s divide a big problem into many smaller

61A Lecture 36 Announcements Unix Computer Systems Systems research enables application

SARA Computing &amp; Networking Services Ronald van der Pol rvdp@sara.nl TF-NOC Preparation

DM207 I/O-Efficient Algorithms and Data Structures Fall 2009 Rolf Fagerberg IOEADS Fall 2009

Introduction to Assurance Chapter 19 Computer Security: Art and Science , 2 nd Edition Version

6 th December 2019 Chaired by Claire Murdoch, National Director for Mental Health NHS England and

QPR Question Persuade Refer Jon Mattleman Website: www.jonmattleman.com Email:

Sponsored by Supported in part by grant No. 90ADPI0011-01-00 from the U.S. Administration for

Accurately Determining Intermediate and Terminal Plan States Using Bayesian Goal Recognition

JAVASCRIPT DEVELOPMENT Sasha Vodnik, Instructor JAVASCRIPT DEVELOPMENT THE COMMAND LINE THE

Outline CS 188: Artificial Intelligence Markov Decision Processes (MDPs) Formalism

Applied Machine Learning Spring 2019, CS 519 Prof. Liang Huang School of EECS Oregon State

ACCT 420: ML and AI for visual data Session 11 Dr. Richard M. Crowley 1 Front matter 2 . 1

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 21: Reinforcement

Survivor: CSCI 135 Interfacing with your computer GUI (graphical user interfaces) Today:

Introduction to MATLAB CS534 Fall 2016 Contact Qisi Wang Office: 1308CS E-mail:

Price- -cap cap regulation regulation in port in port pricing pricing: : Price a DEA

TDDE18 &amp; 726G77 Course Introductjon Christofger Holm Department of Computer and informatjon

4303 Ed/Serious Gamification Case Study Gamifiction Gamification is a proven method to

Data Management Systems Query Processing Execution models Views and Schemas

Computational Semantics and Pragmatics Autumn 2011 Raquel Fernndez Institute for Logic,

CMPSC 311- Introduction to Systems Programming Module: Build Processing Professor Patrick

I/O 2 / Filesystems 1 1 Changelog Changes made in this version not seen in fjrst lecture: 13

SARA Computing & Networking Services Ronald van der Pol rvdp@sara.nl TF-NOC Preparation

TDDE18 & 726G77 Course Introductjon Christofger Holm Department of Computer and informatjon