ADM INISTRIVIA Homework #1 is due September 11 th @ 11:59pm Project - PowerPoint PPT Presentation

03 Database Storage Part I Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Fall 2019 Carnegie Mellon University

5 ADM INISTRIVIA Homework #1 is due September 11 th @ 11:59pm Project #1 will be released on September 11 th CMU 15-445/645 (Fall 2019)

7 OVERVIEW We now understand what a database looks like at a logical level and how to write queries to read/write data from it. We will next learn how to build software that manages a database. CMU 15-445/645 (Fall 2019)

8 COURSE OUTLINE Relational Databases Query Planning Storage Operator Execution Execution Concurrency Control Access Methods Recovery Buffer Pool Manager Distributed Databases Disk Manager Potpourri CMU 15-445/645 (Fall 2019)

9 DISK- O RIEN TED ARCHITECTURE The DBMS assumes that the primary storage location of the database is on non-volatile disk. The DBMS's components manage the movement of data between non-volatile and volatile storage. CMU 15-445/645 (Fall 2019)

10 STORAGE HIERARCH Y Faster Smaller CPU Registers Expensive CPU Caches Volatile Random Access Byte-Addressable DRAM Non-Volatile SSD Sequential Access Block-Addressable HDD Slower Larger Network Storage Cheaper CMU 15-445/645 (Fall 2019)

10 STORAGE HIERARCH Y Faster Smaller CPU Registers Expensive CMU 15-721 CPU Caches Memory DRAM SSD Disk HDD Slower Larger Network Storage Cheaper CMU 15-445/645 (Fall 2019)

10 STORAGE HIERARCH Y Faster Smaller CPU Registers Expensive CMU 15-721 CPU Caches Memory DRAM Non-volatile Memory SSD Disk HDD Slower Larger Network Storage Cheaper CMU 15-445/645 (Fall 2019)

11 ACCESS TIM ES 0.5 sec 0.5 ns L1 Cache Ref 7 sec 7 ns L2 Cache Ref 100 sec 100 ns DRAM 1.7 days 150, 0,000 ns ns SSD 16.5 weeks 10, 0,000, 0,000 ns HDD 11.4 months ~30, 0,000, 0,000 ns Network Storage 1,000, 0,000, 0,000 ns Tape Archives 31.7 years [Source] CMU 15-445/645 (Fall 2019)

12 SYSTEM DESIGN GOALS Allow the DBMS to manage databases that exceed the amount of memory available. Reading/writing to disk is expensive, so it must be managed carefully to avoid large stalls and performance degradation. CMU 15-445/645 (Fall 2019)

13 DISK- O RIEN TED DBM S Database File Directory Header Header Header Header Header … 1 2 3 4 5 Pages Disk CMU 15-445/645 (Fall 2019)

13 DISK- O RIEN TED DBM S Buffer Pool Memory Database File Directory Header Header Header Header Header … 1 2 3 4 5 Pages Disk CMU 15-445/645 (Fall 2019)

13 DISK- O RIEN TED DBM S Get page # 2 Execution Engine Buffer Pool Memory Database File Directory Header Header Header Header Header … 1 2 3 4 5 Pages Disk CMU 15-445/645 (Fall 2019)

13 DISK- O RIEN TED DBM S Get page # 2 Execution Engine Buffer Pool Directory Memory Database File Directory Header Header Header Header Header … 1 2 3 4 5 Pages Disk CMU 15-445/645 (Fall 2019)

13 DISK- O RIEN TED DBM S Get page # 2 Execution Engine Buffer Pool Directory Header 2 Memory Database File Directory Header Header Header Header Header … 1 2 3 4 5 Pages Disk CMU 15-445/645 (Fall 2019)

13 DISK- O RIEN TED DBM S Get page # 2 Execution Engine Buffer Pool Directory Header Interpret the layout 2 Pointer to page # 2 Memory Database File Directory Header Header Header Header Header … 1 2 3 4 5 Pages Disk CMU 15-445/645 (Fall 2019)

13 DISK- O RIEN TED DBM S Get page # 2 Execution Engine Lecture 5 Buffer Pool Directory Header Interpret the layout 2 Pointer to page # 2 Lecture 6 Memory Database File Lectures 3-4 Directory Header Header Header Header Header … 1 2 3 4 5 Pages Disk CMU 15-445/645 (Fall 2019)

15 WHY NOT USE THE OS? One can use memory mapping ( mmap ) Virtual Physical Memory Memory to store the contents of a file into a process' address space. page1 page2 The OS is responsible for moving data page3 for moving the files' pages in and out page4 of memory. page1 page2 page3 page4 On-Disk File CMU 15-445/645 (Fall 2019)

15 WHY NOT USE THE OS? One can use memory mapping ( mmap ) Virtual Physical Memory Memory to store the contents of a file into a process' address space. page1 page1 page2 The OS is responsible for moving data page3 for moving the files' pages in and out page4 of memory. page1 page2 page3 page4 On-Disk File CMU 15-445/645 (Fall 2019)

15 WHY NOT USE THE OS? One can use memory mapping ( mmap ) Virtual Physical Memory Memory to store the contents of a file into a process' address space. page1 page1 page1 page2 The OS is responsible for moving data page3 for moving the files' pages in and out page4 of memory. page1 page2 page3 page4 On-Disk File CMU 15-445/645 (Fall 2019)

15 WHY NOT USE THE OS? One can use memory mapping ( mmap ) Virtual Physical Memory Memory to store the contents of a file into a process' address space. page1 page1 page1 page2 page3 The OS is responsible for moving data page3 for moving the files' pages in and out page4 of memory. page1 page2 page3 page4 On-Disk File CMU 15-445/645 (Fall 2019)

15 WHY NOT USE THE OS? One can use memory mapping ( mmap ) Virtual Physical Memory Memory to store the contents of a file into a process' address space. page1 page1 page1 page2 page3 The OS is responsible for moving data page3 page3 for moving the files' pages in and out page4 of memory. page1 page2 page3 page4 On-Disk File CMU 15-445/645 (Fall 2019)

15 WHY NOT USE THE OS? One can use memory mapping ( mmap ) Virtual Physical Memory Memory to store the contents of a file into a process' address space. page1 page1 page1 ??? page2 page3 The OS is responsible for moving data page3 page3 for moving the files' pages in and out page4 of memory. page1 page2 page3 page4 On-Disk File CMU 15-445/645 (Fall 2019)

16 WHY NOT USE THE OS? What if we allow multiple threads to access the mmap files to hide page fault stalls? This works good enough for read-only access. It is complicated when there are multiple writers… CMU 15-445/645 (Fall 2019)

17 WHY NOT USE THE OS? Full Usage There are some solutions to this problem: → madvise : Tell the OS how you expect to read certain pages. → mlock : Tell the OS that memory ranges cannot be paged out. → msync : Tell the OS to flush memory ranges out to disk. Partial Usage CMU 15-445/645 (Fall 2019)

18 WHY NOT USE THE OS? DBMS (almost) always wants to control things itself and can do a better job at it. → Flushing dirty pages to disk in the correct order. → Specialized prefetching. → Buffer replacement policy. → Thread/process scheduling. The OS is not your friend. CMU 15-445/645 (Fall 2019)

19 DATABASE STORAGE Problem #1: How the DBMS represents the ← Today database in files on disk. Problem #2: How the DBMS manages its memory and move data back-and-forth from disk. CMU 15-445/645 (Fall 2019)

20 TODAY'S AGENDA File Storage Page Layout Tuple Layout CMU 15-445/645 (Fall 2019)

21 FILE STORAGE The DBMS stores a database as one or more files on disk. → The OS doesn't know anything about the contents of these files. Early systems in the 1980s used custom filesystems on raw storage. → Some "enterprise" DBMSs still support this. → Most newer DBMSs do not do this. CMU 15-445/645 (Fall 2019)

22 STORAGE M ANAGER The storage manager is responsible for maintaining a database's files. → Some do their own scheduling for reads and writes to improve spatial and temporal locality of pages. It organizes the files as a collection of pages. → Tracks data read/written to pages. → Tracks the available space. CMU 15-445/645 (Fall 2019)

23 DATABASE PAGES A page is a fixed-size block of data. → It can contain tuples, meta- data, indexes, log records… → Most systems do not mix page types. → Some systems require a page to be self-contained. Each page is given a unique identifier. → The DBMS uses an indirection layer to map page ids to physical locations. CMU 15-445/645 (Fall 2019)

24 DATABASE PAGES 4KB There are three different notions of "pages" in a DBMS: → Hardware Page (usually 4KB) → OS Page (usually 4KB) → Database Page (512B-16KB) 8KB By hardware page, we mean at what level the device can guarantee a "failsafe write". 16KB CMU 15-445/645 (Fall 2019)

25 PAGE STORAGE ARCHITECTURE Different DBMSs manage pages in files on disk in different ways. → Heap File Organization → Sequential / Sorted File Organization → Hashing File Organization At this point in the hierarchy we don't need to know anything about what is inside of the pages. CMU 15-445/645 (Fall 2019)

26 DATABASE HEAP A heap file is an unordered collection of pages where tuples that are stored in random order. → Create / Get / Write / Delete Page → Must also support iterating over all pages. Need meta-data to keep track of what pages exist and which ones have free space. Two ways to represent a heap file: → Linked List → Page Directory CMU 15-445/645 (Fall 2019)

ADM INISTRIVIA Homework #1 is due September 11 th @ 11:59pm Project - PowerPoint PPT Presentation

03 Database Storage Part I Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Fall 2019 Carnegie Mellon University 5 ADM INISTRIVIA Homework #1 is due September 11 th @ 11:59pm Project #1 will be released on

ADM INISTRIVIA Project #1 is due Fri Sept 27 th @ 11:59pm Homework #2 is due Mon Sept 30 th @

2013 PRESENTATION ADM Services A division of ADM Capital, a wholly-owned subsidiary of

ADM Nutrition Connecting The Harvest To The Home Confidential and Proprietary Business

Nicolas Poyade General Manager ADM, a Premier Player

UKA ENDURANCE ADM V1.2 INTRODUCTION Many coaches have requested guidelines surrounding long-term

ADM Crisis Center 15 Frederick Ave., Akron, Ohio Central Assessment, Drop-in Center &

Evaluating the Treatment Effect in the ADM Study and Lords Paradox Perman Gochyyev GSE, UC

The idea Outline Definition Definition ADM: an IR effectiveness measure based on

The Adm inistrative Procedure Act governs the way federal agencies can propose and establish

I EEE St I EEE Standards d d NesCom Overview 2 0 1 0 Lisa Yacone NesCom Adm inistrator

Digesting Recent Treatment Guidelines ERIN RANEY, PHARM.D., BCPS, BC-ADM PROFESSOR OF PHARMACY

ADVANCED DATABASE SYSTEMS Query Compilation & Code Generation @ Andy_Pavlo // 15- 721 //

23 Databases Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Fall

ADVANCED DATABASE SYSTEMS Networking @ Andy_Pavlo // 15- 721 // Spring 2019 CMU 15-721

04 Part II Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Fall

03 Part I Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Fall

Enterprise Storage Architecture Fall 2018 Hard disks, SSDs, and the I/O subsystem Tyler Bletsch

Unified Address Translation for Memory Mapped SSDs with FlashMap Jian Huang Anirudh Badam

CSE 232A Graduate Database Systems Arun Kumar Topic 1: Data Storage Chapters 8 and 9 of Cow

Disks and RAID CS 4410 Opera5ng Systems Spring 2017 Cornell University Lorenzo Alvisi Anne

ADVANCED DATABASE SYSTEMS Larger-than-Memory Databases @ Andy_Pavlo // 15- 721 // Spring 2020

Today: Secondary Storage To read or write a disk block: Seek: (latency) position head over

Operating Systems Operating Systems CMPSC 473 CMPSC 473 Input/Output Input/Output April 22,

hard drives / fjlesystems 2 1 last time direct memory access write directy to device driver

ADM INISTRIVIA Homework #1 is due September 11 th @ 11:59pm Project - PowerPoint PPT Presentation

03 Database Storage Part I Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Fall 2019 Carnegie Mellon University 5 ADM INISTRIVIA Homework #1 is due September 11 th @ 11:59pm Project #1 will be released on

ADM INISTRIVIA Project #1 is due Fri Sept 27 th @ 11:59pm Homework #2 is due Mon Sept 30 th @

2013 PRESENTATION ADM Services A division of ADM Capital, a wholly-owned subsidiary of

ADM Nutrition Connecting The Harvest To The Home Confidential and Proprietary Business

Nicolas Poyade General Manager ADM, a Premier Player

UKA ENDURANCE ADM V1.2 INTRODUCTION Many coaches have requested guidelines surrounding long-term

ADM Crisis Center 15 Frederick Ave., Akron, Ohio Central Assessment, Drop-in Center &amp;

Evaluating the Treatment Effect in the ADM Study and Lords Paradox Perman Gochyyev GSE, UC

The idea Outline Definition Definition ADM: an IR effectiveness measure based on

The Adm inistrative Procedure Act governs the way federal agencies can propose and establish

I EEE St I EEE Standards d d NesCom Overview 2 0 1 0 Lisa Yacone NesCom Adm inistrator

Digesting Recent Treatment Guidelines ERIN RANEY, PHARM.D., BCPS, BC-ADM PROFESSOR OF PHARMACY

ADVANCED DATABASE SYSTEMS Query Compilation &amp; Code Generation @ Andy_Pavlo // 15- 721 //

23 Databases Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Fall

ADVANCED DATABASE SYSTEMS Networking @ Andy_Pavlo // 15- 721 // Spring 2019 CMU 15-721

04 Part II Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Fall

03 Part I Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Fall

Enterprise Storage Architecture Fall 2018 Hard disks, SSDs, and the I/O subsystem Tyler Bletsch

Unified Address Translation for Memory Mapped SSDs with FlashMap Jian Huang Anirudh Badam

CSE 232A Graduate Database Systems Arun Kumar Topic 1: Data Storage Chapters 8 and 9 of Cow

Disks and RAID CS 4410 Opera5ng Systems Spring 2017 Cornell University Lorenzo Alvisi Anne

ADVANCED DATABASE SYSTEMS Larger-than-Memory Databases @ Andy_Pavlo // 15- 721 // Spring 2020

Today: Secondary Storage To read or write a disk block: Seek: (latency) position head over

Operating Systems Operating Systems CMPSC 473 CMPSC 473 Input/Output Input/Output April 22,

hard drives / fjlesystems 2 1 last time direct memory access write directy to device driver

ADM Crisis Center 15 Frederick Ave., Akron, Ohio Central Assessment, Drop-in Center &

ADVANCED DATABASE SYSTEMS Query Compilation & Code Generation @ Andy_Pavlo // 15- 721 //