03 Part I Intro to Database Systems Andy Pavlo AP AP - - PowerPoint PPT Presentation

03
SMART_READER_LITE
LIVE PREVIEW

03 Part I Intro to Database Systems Andy Pavlo AP AP - - PowerPoint PPT Presentation

Database Storage 03 Part I Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Fall 2020 Carnegie Mellon University 2 ADM INISTRIVIA Homework #1 is due September 13 th @ 11:59pm Project #0 is due September 13 th @


slide-1
SLIDE 1

Intro to Database Systems 15-445/15-645 Fall 2020 Andy Pavlo Computer Science Carnegie Mellon University

AP AP

03

Database Storage Part I

slide-2
SLIDE 2

15-445/645 (Fall 2020)

ADM INISTRIVIA

Homework #1 is due September 13th @ 11:59pm Project #0 is due September 13th @ 11:59pm Project #1 will be released on September 14th

2

slide-3
SLIDE 3

15-445/645 (Fall 2020)

OVERVIEW

We now understand what a database looks like at a logical level and how to write queries to read/write data from it. We will next learn how to build software that manages a database.

3

slide-4
SLIDE 4

15-445/645 (Fall 2020)

COURSE OUTLINE

Relational Databases Storage Execution Concurrency Control Recovery Distributed Databases Potpourri

4

Query Planning Operator Execution Access Methods Buffer Pool Manager Disk Manager

slide-5
SLIDE 5

15-445/645 (Fall 2020)

DISK- O RIEN TED ARCHITECTURE

The DBMS assumes that the primary storage location of the database is on non-volatile disk. The DBMS's components manage the movement

  • f data between non-volatile and volatile storage.

5

slide-6
SLIDE 6

15-445/645 (Fall 2020)

STORAGE HIERARCH Y

6 CPU Registers

CPU Caches

DRAM SSD HDD Network Storage Faster Smaller Expensive Slower Larger Cheaper

Volatile Random Access Byte-Addressable Non-Volatile Sequential Access Block-Addressable

slide-7
SLIDE 7

15-445/645 (Fall 2020)

STORAGE HIERARCH Y

6

Memory Disk CMU 15-721

CPU Registers

CPU Caches

DRAM SSD HDD Network Storage Faster Smaller Expensive Slower Larger Cheaper

slide-8
SLIDE 8

15-445/645 (Fall 2020)

STORAGE HIERARCH Y

6

Memory Disk CMU 15-721

CPU Registers

CPU Caches

DRAM SSD HDD Network Storage Faster Smaller Expensive Slower Larger Cheaper Non-volatile Memory Fast Network Storage

slide-9
SLIDE 9

15-445/645 (Fall 2020)

ACCESS TIM ES

7

0.5 ns L1 Cache Ref 7 ns L2 Cache Ref 100 ns DRAM 150, 0,000 ns ns SSD 10, 0,000, 0,000 ns HDD ~30, 0,000, 0,000 ns Network Storage 1,000, 0,000, 0,000 ns Tape Archives

0.5 sec 7 sec 100 sec 1.7 days 16.5 weeks 11.4 months 31.7 years

[Source]

slide-10
SLIDE 10

15-445/645 (Fall 2020)

SEQ UENTIAL VS. RANDOM ACCESS

Random access on non-volatile storage is usually much slower than sequential access. DBMS will want to maximize sequential access.

→ Algorithms try to reduce number of writes to random pages so that data is stored in contiguous blocks. → Allocating multiple pages at the same time is called an extent.

8

slide-11
SLIDE 11

15-445/645 (Fall 2020)

SYSTEM DESIGN GOALS

Allow the DBMS to manage databases that exceed the amount of memory available. Reading/writing to disk is expensive, so it must be managed carefully to avoid large stalls and performance degradation. Random access on disk is usually much slower than sequential access, so the DBMS will want to maximize sequential access.

9

slide-12
SLIDE 12

15-445/645 (Fall 2020)

DISK- O RIEN TED DBM S

10

Disk

Database File

slide-13
SLIDE 13

15-445/645 (Fall 2020)

DISK- O RIEN TED DBM S

10

Disk Memory

Database File

1

Header

Directory

2

Header

3

Header

Pages Buffer Pool

4

Header

5

Header

Execution Engine

Get page # 2

Directory

slide-14
SLIDE 14

15-445/645 (Fall 2020)

DISK- O RIEN TED DBM S

10

Disk Memory

Database File

1

Header

Directory

2

Header

3

Header

Pages Buffer Pool

2

Header

4

Header

5

Header

Execution Engine

Get page # 2

Directory

Interpret the layout Pointer to page # 2

slide-15
SLIDE 15

15-445/645 (Fall 2020)

DISK- O RIEN TED DBM S

10

Disk Memory

Database File

1

Header

Directory

2

Header

3

Header

Pages Buffer Pool

2

Header

4

Header

5

Header

Execution Engine

Get page # 2

Directory

Interpret the layout Pointer to page # 2

Lectures 3-4 Lecture 5 Lecture 6 Lectures 10-13

slide-16
SLIDE 16

15-445/645 (Fall 2020)

WHY NOT USE THE OS?

One can use memory mapping (mmap) to store the contents of a file into a process' address space. The OS is responsible for moving data for moving the files' pages in and out

  • f memory.

11

page1 page2 page3 page4

On-Disk File Virtual Memory

page1 page2 page3 page4

Physical Memory

slide-17
SLIDE 17

15-445/645 (Fall 2020)

WHY NOT USE THE OS?

One can use memory mapping (mmap) to store the contents of a file into a process' address space. The OS is responsible for moving data for moving the files' pages in and out

  • f memory.

11

page1 page2 page3 page4

On-Disk File Virtual Memory

page1 page2 page3 page4

Physical Memory

page1 page1

slide-18
SLIDE 18

15-445/645 (Fall 2020)

WHY NOT USE THE OS?

One can use memory mapping (mmap) to store the contents of a file into a process' address space. The OS is responsible for moving data for moving the files' pages in and out

  • f memory.

11

page1 page2 page3 page4

On-Disk File Virtual Memory

page1 page2 page3 page4

Physical Memory

page1 page1

slide-19
SLIDE 19

15-445/645 (Fall 2020)

WHY NOT USE THE OS?

One can use memory mapping (mmap) to store the contents of a file into a process' address space. The OS is responsible for moving data for moving the files' pages in and out

  • f memory.

11

page1 page2 page3 page4

On-Disk File Virtual Memory

page1 page2 page3 page4

Physical Memory

page1 page3 page1 page3

slide-20
SLIDE 20

15-445/645 (Fall 2020)

WHY NOT USE THE OS?

One can use memory mapping (mmap) to store the contents of a file into a process' address space. The OS is responsible for moving data for moving the files' pages in and out

  • f memory.

11

page1 page2 page3 page4

On-Disk File Virtual Memory

page1 page2 page3 page4

Physical Memory

page1 page3

???

page1 page3

slide-21
SLIDE 21

15-445/645 (Fall 2020)

WHY NOT USE THE OS?

What if we allow multiple threads to access the mmap files to hide page fault stalls? This works good enough for read-only access. It is complicated when there are multiple writers…

12

slide-22
SLIDE 22

15-445/645 (Fall 2020)

WHY NOT USE THE OS?

There are some solutions to this problem:

→ madvise: Tell the OS how you expect to read certain pages. → mlock: Tell the OS that memory ranges cannot be paged out. → msync: Tell the OS to flush memory ranges out to disk.

13

Full Usage Partial Usage

slide-23
SLIDE 23

15-445/645 (Fall 2020)

WHY NOT USE THE OS?

DBMS (almost) always wants to control things itself and can do a better job at it.

→ Flushing dirty pages to disk in the correct order. → Specialized prefetching. → Buffer replacement policy. → Thread/process scheduling.

The OS is not your friend.

14

slide-24
SLIDE 24

15-445/645 (Fall 2020)

DATABASE STORAGE

Problem #1: How the DBMS represents the database in files on disk. Problem #2: How the DBMS manages its memory and move data back-and-forth from disk.

15

← Today

slide-25
SLIDE 25

15-445/645 (Fall 2020)

TODAY'S AGENDA

File Storage Page Layout Tuple Layout

16

slide-26
SLIDE 26

15-445/645 (Fall 2020)

FILE STORAGE

The DBMS stores a database as one or more files

  • n disk typically in a proprietary format.

→ The OS doesn't know anything about the contents of these files.

Early systems in the 1980s used custom filesystems

  • n raw storage.

→ Some "enterprise" DBMSs still support this. → Most newer DBMSs do not do this.

17

slide-27
SLIDE 27

15-445/645 (Fall 2020)

STORAGE M ANAGER

The storage manager is responsible for maintaining a database's files.

→ Some do their own scheduling for reads and writes to improve spatial and temporal locality of pages.

It organizes the files as a collection of pages.

→ Tracks data read/written to pages. → Tracks the available space.

18

slide-28
SLIDE 28

15-445/645 (Fall 2020)

DATABASE PAGES

A page is a fixed-size block of data.

→ It can contain tuples, meta-data, indexes, log records… → Most systems do not mix page types. → Some systems require a page to be self-contained.

Each page is given a unique identifier.

→ The DBMS uses an indirection layer to map page ids to physical locations.

19

slide-29
SLIDE 29

15-445/645 (Fall 2020)

DATABASE PAGES

There are three different notions of "pages" in a DBMS:

→ Hardware Page (usually 4KB) → OS Page (usually 4KB) → Database Page (512B-16KB)

A hardware page is the largest block

  • f data that the storage device can

guarantee failsafe writes.

20

slide-30
SLIDE 30

15-445/645 (Fall 2020)

DATABASE PAGES

There are three different notions of "pages" in a DBMS:

→ Hardware Page (usually 4KB) → OS Page (usually 4KB) → Database Page (512B-16KB)

A hardware page is the largest block

  • f data that the storage device can

guarantee failsafe writes.

20

16KB 8KB 4KB

slide-31
SLIDE 31

15-445/645 (Fall 2020)

DATABASE HEAP

A heap file is an unordered collection of pages where tuples that are stored in random order.

→ Create / Get / Write / Delete Page → Must also support iterating over all pages.

Two ways to represent a heap file:

→ Linked List → Page Directory

22

slide-32
SLIDE 32

15-445/645 (Fall 2020)

DATABASE HEAP

It is easy to find pages if there is only a single heap file. Need meta-data to keep track of what pages exist in multiple files and which

  • nes have free space.

23

Database File

Page0 Page1 Page2 Page3 Page4 …

slide-33
SLIDE 33

15-445/645 (Fall 2020)

DATABASE HEAP

It is easy to find pages if there is only a single heap file. Need meta-data to keep track of what pages exist in multiple files and which

  • nes have free space.

23

Database File

Page0 Page1 Page2 Page3 Page4 …

Get Page # 2

Offset = Page# × PageSize

slide-34
SLIDE 34

15-445/645 (Fall 2020)

DATABASE HEAP

It is easy to find pages if there is only a single heap file. Need meta-data to keep track of what pages exist in multiple files and which

  • nes have free space.

23

Get Page # 2

slide-35
SLIDE 35

15-445/645 (Fall 2020)

HEAP FILE: LINKED LIST

Maintain a header page at the beginning of the file that stores two pointers:

→ HEAD of the free page list. → HEAD of the data page list.

Each page keeps track of how many free slots they currently have.

24

Header

Page1

Data

Page0

Data

Page4

Data

Page2

Data

… …

Free Page List Data Page List

slide-36
SLIDE 36

15-445/645 (Fall 2020)

HEAP FILE: PAGE DIRECTORY

The DBMS maintains special pages that tracks the location of data pages in the database files. The directory also records the number

  • f free slots per page.

The DBMS must make sure that the directory pages are in sync with the data pages.

25

Directory

Page0

Data

Page1

Data

Page100

Data

slide-37
SLIDE 37

15-445/645 (Fall 2020)

TODAY'S AGENDA

File Storage Page Layout Tuple Layout

26

slide-38
SLIDE 38

15-445/645 (Fall 2020)

PAGE HEADER

Every page contains a header of meta- data about the page's contents.

→ Page Size → Checksum → DBMS Version → Transaction Visibility → Compression Information

Some systems require pages to be self- contained (e.g., Oracle).

27

Data

Page

Header

slide-39
SLIDE 39

15-445/645 (Fall 2020)

PAGE LAYOUT

For any page storage architecture, we now need to decide how to organize the data inside of the page.

→ We are still assuming that we are only storing tuples.

Two approaches:

→ Tuple-oriented → Log-structured

28

slide-40
SLIDE 40

15-445/645 (Fall 2020)

TUPLE STORAGE

How to store tuples in a page? Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.

29

Page

Num Tuples = 0

slide-41
SLIDE 41

15-445/645 (Fall 2020)

TUPLE STORAGE

How to store tuples in a page? Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.

29

Page

Num Tuples = 0 Tuple #1 Tuple #2 Tuple #3 Num Tuples = 3

slide-42
SLIDE 42

15-445/645 (Fall 2020)

TUPLE STORAGE

How to store tuples in a page? Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.

→ What happens if we delete a tuple?

29

Page

Num Tuples = 0 Tuple #1 Tuple #3 Num Tuples = 3 Num Tuples = 2

slide-43
SLIDE 43

15-445/645 (Fall 2020)

TUPLE STORAGE

How to store tuples in a page? Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.

→ What happens if we delete a tuple?

29

Page

Num Tuples = 0 Tuple #1 Tuple #3 Tuple #4 Num Tuples = 3

slide-44
SLIDE 44

15-445/645 (Fall 2020)

TUPLE STORAGE

How to store tuples in a page? Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.

→ What happens if we delete a tuple? → What happens if we have a variable- length attribute?

29

Page

Num Tuples = 0 Tuple #1 Tuple #3 Tuple #4 Num Tuples = 3

slide-45
SLIDE 45

15-445/645 (Fall 2020)

SLOTTED PAGES

The most common layout scheme is called slotted pages. The slot array maps "slots" to the tuples' starting position offsets. The header keeps track of:

→ The # of used slots → The offset of the starting location of the last slot used.

30

Header Tuple #4 Tuple #2 Tuple #3 Tuple #1

Fixed- and Var-length Tuple Data Slot Array

slide-46
SLIDE 46

15-445/645 (Fall 2020)

SLOTTED PAGES

The most common layout scheme is called slotted pages. The slot array maps "slots" to the tuples' starting position offsets. The header keeps track of:

→ The # of used slots → The offset of the starting location of the last slot used.

30

Header Tuple #4 Tuple #2 Tuple #3 Tuple #1

Fixed- and Var-length Tuple Data Slot Array

slide-47
SLIDE 47

15-445/645 (Fall 2020)

SLOTTED PAGES

The most common layout scheme is called slotted pages. The slot array maps "slots" to the tuples' starting position offsets. The header keeps track of:

→ The # of used slots → The offset of the starting location of the last slot used.

30

Header Tuple #4 Tuple #2 Tuple #3 Tuple #1

Fixed- and Var-length Tuple Data Slot Array

slide-48
SLIDE 48

15-445/645 (Fall 2020)

SLOTTED PAGES

The most common layout scheme is called slotted pages. The slot array maps "slots" to the tuples' starting position offsets. The header keeps track of:

→ The # of used slots → The offset of the starting location of the last slot used.

30

Header Tuple #4 Tuple #2 Tuple #1

Fixed- and Var-length Tuple Data Slot Array

slide-49
SLIDE 49

15-445/645 (Fall 2020)

SLOTTED PAGES

The most common layout scheme is called slotted pages. The slot array maps "slots" to the tuples' starting position offsets. The header keeps track of:

→ The # of used slots → The offset of the starting location of the last slot used.

30

Header Tuple #4 Tuple #2 Tuple #1

Fixed- and Var-length Tuple Data Slot Array

slide-50
SLIDE 50

15-445/645 (Fall 2020)

RECORD IDS

The DBMS needs a way to keep track

  • f individual tuples.

Each tuple is assigned a unique record identifier.

→ Most common: page_id + offset/slot → Can also contain file location info.

An application cannot rely on these ids to mean anything.

31

CTID (6-bytes) ROWID (10-bytes) ROWID (8-bytes)

slide-51
SLIDE 51

15-445/645 (Fall 2020)

TODAY'S AGENDA

File Storage Page Layout Tuple Layout

32

slide-52
SLIDE 52

15-445/645 (Fall 2020)

TUPLE LAYOUT

A tuple is essentially a sequence of bytes. It's the job of the DBMS to interpret those bytes into attribute types and values.

33

slide-53
SLIDE 53

15-445/645 (Fall 2020)

Tuple

TUPLE HEADER

Each tuple is prefixed with a header that contains meta-data about it.

→ Visibility info (concurrency control) → Bit Map for NULL values.

We do not need to store meta-data about the schema.

34

Header Attribute Data

slide-54
SLIDE 54

15-445/645 (Fall 2020)

TUPLE DATA

Attributes are typically stored in the

  • rder that you specify them when you

create the table. This is done for software engineering reasons. We re-order attributes automatically in CMU's new DBMS…

35

Tuple

Header a b c d e

CREATE TABLE foo ( a INT PRIMARY KEY, b INT NOT NULL, c INT, d DOUBLE, e FLOAT );

slide-55
SLIDE 55

15-445/645 (Fall 2020)

DENORM ALIZED TUPLE DATA

Can physically denormalize (e.g., "pre join") related tuples and store them together in the same page.

→ Potentially reduces the amount of I/O for common workload patterns. → Can make updates more expensive.

36

CREATE TABLE foo ( a INT PRIMARY KEY, b INT NOT NULL, ); CREATE TABLE bar ( c INT PRIMARY KEY, a INT ⮱REFERENCES foo (a), );

slide-56
SLIDE 56

15-445/645 (Fall 2020)

DENORM ALIZED TUPLE DATA

Can physically denormalize (e.g., "pre join") related tuples and store them together in the same page.

→ Potentially reduces the amount of I/O for common workload patterns. → Can make updates more expensive.

36

foo

Header c a Header c a Header c a

bar

Header a b

slide-57
SLIDE 57

15-445/645 (Fall 2020)

DENORM ALIZED TUPLE DATA

Can physically denormalize (e.g., "pre join") related tuples and store them together in the same page.

→ Potentially reduces the amount of I/O for common workload patterns. → Can make updates more expensive.

Not a new idea.

→ IBM System R did this in the 1970s. → Several NoSQL DBMSs do this without calling it physical denormalization.

36

foo

c c c …

foo bar

Header a b

slide-58
SLIDE 58

15-445/645 (Fall 2020)

CONCLUSIO N

Database is organized in pages. Different ways to track pages. Different ways to store pages. Different ways to store tuples.

37

slide-59
SLIDE 59

15-445/645 (Fall 2020)

NEXT CLASS

Log-Structured Storage Value Representation Storage Models

38