Database Storage Part I Lecture # 03 Database Systems Andy Pavlo - - PowerPoint PPT Presentation

database storage
SMART_READER_LITE
LIVE PREVIEW

Database Storage Part I Lecture # 03 Database Systems Andy Pavlo - - PowerPoint PPT Presentation

Database Storage Part I Lecture # 03 Database Systems Andy Pavlo AP AP Computer Science 15-445/15-645 Carnegie Mellon Univ. Fall 2018 2 ADM IN ISTRIVIA Homework #1 is due Monday September 10 th @ 11:59pm Project #1 will be released on


slide-1
SLIDE 1

Database Systems 15-445/15-645 Fall 2018 Andy Pavlo Computer Science Carnegie Mellon Univ.

AP AP

Lecture # 03

Database Storage

Part I

slide-2
SLIDE 2

CMU 15-445/645 (Fall 2018)

ADM IN ISTRIVIA

Homework #1 is due Monday September 10th @ 11:59pm Project #1 will be released on Wednesday September 12th

2

slide-3
SLIDE 3

CMU 15-445/645 (Fall 2018)

UPCO M IN G DATABASE EVEN TS

Kinetica Talk

→ Thursday Sep 6th @ 12pm → CIC 4th Floor

SalesForce Talk

→ Friday Sep 7th @ 12pm → CIC 4th Floor

Relational AI Talk

→ Wednesday @ Sep 12th @ 4:00pm → GHC 8102

3

slide-4
SLIDE 4

CMU 15-445/645 (Fall 2018)

OVERVIEW

We now understand what a database looks like at a logical level and how to write queries to read/write data from it. We will next learn how to build software that manages a database.

4

slide-5
SLIDE 5

CMU 15-445/645 (Fall 2018)

CO URSE O UTLIN E

Relational Databases Storage Execution Concurrency Control Recovery Distributed Databases Potpourri

5

Query Planning Operator Execution Access Methods Buffer Pool Manager Disk Manager

slide-6
SLIDE 6

CMU 15-445/645 (Fall 2018)

DISK- O RIEN TED ARCH ITECTURE

The DBMS assumes that the primary storage location of the database is on non-volatile disk. The DBMS's components manage the movement

  • f data between non-volatile and volatile storage.

6

slide-7
SLIDE 7

CMU 15-445/645 (Fall 2018)

STO RAGE H IERARCH Y

7 CPU Registers

CPU Caches

DRAM SSD HDD Network Storage

Faster Smaller Slower Larger

Volatile Random Access Byte-Addressable Non-Volatile Sequential Access Block-Addressable

slide-8
SLIDE 8

CMU 15-445/645 (Fall 2018)

STO RAGE H IERARCH Y

7

Memory Disk CMU 15-721 (Spring 2019)

CPU Registers

CPU Caches

DRAM SSD HDD Network Storage

Faster Smaller Slower Larger

slide-9
SLIDE 9

CMU 15-445/645 (Fall 2018)

STO RAGE H IERARCH Y

7

Memory Disk CMU 15-721 (Spring 2019)

CPU Registers

CPU Caches

DRAM SSD HDD Network Storage

Faster Smaller Slower Larger

Non-volatile Memory

slide-10
SLIDE 10

CMU 15-445/645 (Fall 2018)

ACCESS TIM ES

8

0.5 ns L1 Cache Ref 7 ns L2 Cache Ref 100 ns DRAM 150,000 ns SSD 10,000,000 ns HDD ~30,000,000 ns Network Storage 1,000,000,000 ns Tape Archives

0.5 sec 7 sec 1 00 sec 1 .7 days 1 6.5 weeks 1 1 .4 months 31 .7 years

[Source]

slide-11
SLIDE 11

CMU 15-445/645 (Fall 2018)

SYSTEM DESIGN GOALS

Allow the DBMS to manage databases that exceed the amount of memory available. Reading/writing to disk is expensive, so it must be managed carefully to avoid large stalls and performance degradation.

9

slide-12
SLIDE 12

CMU 15-445/645 (Fall 2018)

SEQ UEN TIAL VS. RAN DO M ACCESS

Random access on an HDD is much slower than sequential access. Traditional DBMSs are designed to maximize sequential access.

→ Algorithms try to reduce number of writes to random pages so that data is stored in contiguous blocks. → Allocating multiple pages at the same time is called an extent.

10

slide-13
SLIDE 13

CMU 15-445/645 (Fall 2018)

WH Y N OT USE TH E O S?

One can use mmap to map the contents

  • f a file into a process' address space.

The OS is responsible for moving data for moving the files' pages in and out

  • f memory.

11

page1 page2 page3 page4

On-Disk File

slide-14
SLIDE 14

CMU 15-445/645 (Fall 2018)

WH Y N OT USE TH E O S?

One can use mmap to map the contents

  • f a file into a process' address space.

The OS is responsible for moving data for moving the files' pages in and out

  • f memory.

11

page1 page2 page3 page4

On-Disk File Virtual Memory

page1 page2 page3 page4

Physical Memory

slide-15
SLIDE 15

CMU 15-445/645 (Fall 2018)

WH Y N OT USE TH E O S?

One can use mmap to map the contents

  • f a file into a process' address space.

The OS is responsible for moving data for moving the files' pages in and out

  • f memory.

11

page1 page2 page3 page4

On-Disk File Virtual Memory

page1 page2 page3 page4

Physical Memory

slide-16
SLIDE 16

CMU 15-445/645 (Fall 2018)

WH Y N OT USE TH E O S?

One can use mmap to map the contents

  • f a file into a process' address space.

The OS is responsible for moving data for moving the files' pages in and out

  • f memory.

11

page1 page2 page3 page4

On-Disk File Virtual Memory

page1 page2 page3 page4

Physical Memory

page1 page1

slide-17
SLIDE 17

CMU 15-445/645 (Fall 2018)

WH Y N OT USE TH E O S?

One can use mmap to map the contents

  • f a file into a process' address space.

The OS is responsible for moving data for moving the files' pages in and out

  • f memory.

11

page1 page2 page3 page4

On-Disk File Virtual Memory

page1 page2 page3 page4

Physical Memory

page1 page1 page3 page3

slide-18
SLIDE 18

CMU 15-445/645 (Fall 2018)

WH Y N OT USE TH E O S?

One can use mmap to map the contents

  • f a file into a process' address space.

The OS is responsible for moving data for moving the files' pages in and out

  • f memory.

11

page1 page2 page3 page4

On-Disk File Virtual Memory

page1 page2 page3 page4

Physical Memory

page1 page1 page3 page3

???

slide-19
SLIDE 19

CMU 15-445/645 (Fall 2018)

WH Y N OT USE TH E O S?

What if we allow multiple threads to access the mmap files to hide page fault stalls? This works good enough for read-only access. It is complicated when there are multiple writers…

12

slide-20
SLIDE 20

CMU 15-445/645 (Fall 2018)

WH Y N OT USE TH E O S?

There are some solutions to this problem:

→ madvise: Tell the OS how you expect to read certain pages. → mlock: Tell the OS that memory ranges cannot be paged out. → msync: Tell the OS to flush memory ranges out to disk.

13

Full Usage Partial Usage

slide-21
SLIDE 21

CMU 15-445/645 (Fall 2018)

WH Y N OT USE TH E O S?

DBMS (almost) always wants to control things itself and can do a better job at it.

→ Flushing dirty pages to disk in the correct order. → Specialized prefetching. → Buffer replacement policy. → Thread/process scheduling.

The OS is not your friend.

14

slide-22
SLIDE 22

CMU 15-445/645 (Fall 2018)

DATABASE STO RAGE

Problem #1: How the DBMS represents the database in files on disk. Problem #2: How the DBMS manages its memory and move data back-and-forth from disk.

15

slide-23
SLIDE 23

CMU 15-445/645 (Fall 2018)

TO DAY'S AGEN DA

File Storage Page Layout Tuple Layout

16

slide-24
SLIDE 24

CMU 15-445/645 (Fall 2018)

FILE STO RAGE

The DBMS stores a database as one or more files

  • n disk.

The OS doesn't know anything about these files.

→ All of the standard filesystem protections are used. → Early systems in the 1980s used custom "filesystems" on raw storage.

17

slide-25
SLIDE 25

CMU 15-445/645 (Fall 2018)

STO RAGE M AN AGER

The storage manager is responsible for maintaining a database's files. It organizes the files as a collection of pages.

→ Tracks data read/written to pages. → Tracks the available space.

18

slide-26
SLIDE 26

CMU 15-445/645 (Fall 2018)

DATABASE PAGES

A page is a fixed-size block of data.

→ It can contain tuples, meta-data, indexes, log records… → Most systems do not mix page types. → Some systems require a page to be self-contained.

Each page is given a unique identifier.

→ The DBMS uses an indirection layer to map page ids to physical locations.

19

slide-27
SLIDE 27

CMU 15-445/645 (Fall 2018)

DATABASE PAGES

There are three different notions of "pages" in a DBMS:

→ Hardware Page (usually 4KB) → OS Page (usually 4KB) → Database Page (1-16KB)

By hardware page, we mean at what level the device can guarantee a "failsafe write".

20

4KB 16KB 8KB 1KB

slide-28
SLIDE 28

CMU 15-445/645 (Fall 2018)

PAGE STO RAGE ARCH ITECTURE

Different DBMSs manage pages in files on disk in different ways.

→ Heap File Organization → Sequential / Sorted File Organization → Hashing File Organization

At this point in the hierarchy we don't need to know anything about what is inside of the pages.

21

slide-29
SLIDE 29

CMU 15-445/645 (Fall 2018)

DATABASE H EAP

A heap file is an unordered collection of pages where tuples that are stored in random order.

→ Get / Delete Page → Must also support iterating over all pages.

Need meta-data to keep track of what pages exist and which ones have free space. Two ways to represent a heap file:

→ Linked List → Page Directory

22

slide-30
SLIDE 30

CMU 15-445/645 (Fall 2018)

H EAP FILE: LIN KED LIST

Maintain a header page at the beginning of the file that stores two pointers:

→ HEAD of the free page list. → HEAD of the data page list.

Each page keeps track of the number

  • f free slots in itself.

23

Header Page Data Page Data Page Data Page Data

… …

Free Page List Data Page List

slide-31
SLIDE 31

CMU 15-445/645 (Fall 2018)

H EAP FILE: PAGE DIRECTO RY

The DBMS maintains special pages that tracks the location of data pages in the database files. The directory also records the number

  • f free slots per page.

The DBMS has to make sure that the directory pages are in sync with the data pages.

24

Directory

Page Data Page Data Page Data

slide-32
SLIDE 32

CMU 15-445/645 (Fall 2018)

TO DAY'S AGEN DA

File Storage Page Layout Tuple Layout

27

slide-33
SLIDE 33

CMU 15-445/645 (Fall 2018)

PAGE H EADER

Every page contains a header of meta- data about the page's contents.

→ Page Size → Checksum → DBMS Version → Transaction Visibility → Compression Information

Some systems require pages to be self- contained (e.g., Oracle).

28

Data

Page

Header

slide-34
SLIDE 34

CMU 15-445/645 (Fall 2018)

PAGE LAYO UT

For any page storage architecture, we now need to understand how to organize the data stored inside

  • f the page.

→ We are still assuming that we are only storing tuples.

Two approaches:

→ Tuple-oriented → Log-structured

29

slide-35
SLIDE 35

CMU 15-445/645 (Fall 2018)

TUPLE STO RAGE

How to store tuples in a page? Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.

30

Page

Num Tuples = 0

slide-36
SLIDE 36

CMU 15-445/645 (Fall 2018)

TUPLE STO RAGE

How to store tuples in a page? Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.

30

Page

Num Tuples = 0 Tuple #1 Tuple #2 Tuple #3 Num Tuples = 3

slide-37
SLIDE 37

CMU 15-445/645 (Fall 2018)

TUPLE STO RAGE

How to store tuples in a page? Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.

→ What happens if we delete a tuple?

30

Page

Num Tuples = 0 Tuple #1 Tuple #3 Num Tuples = 3 Num Tuples = 2

slide-38
SLIDE 38

CMU 15-445/645 (Fall 2018)

TUPLE STO RAGE

How to store tuples in a page? Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.

→ What happens if we delete a tuple?

30

Page

Num Tuples = 0 Tuple #1 Tuple #3 Tuple #4 Num Tuples = 3

slide-39
SLIDE 39

CMU 15-445/645 (Fall 2018)

TUPLE STO RAGE

How to store tuples in a page? Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.

→ What happens if we delete a tuple? → What happens if we have a variable- length attribute?

30

Page

Num Tuples = 0 Tuple #1 Tuple #3 Tuple #4 Num Tuples = 3

slide-40
SLIDE 40

CMU 15-445/645 (Fall 2018)

SLOTTED PAGES

The most common layout scheme is called slotted pages. The slot array maps "slots" to the tuples' starting position offsets. The header keeps track of:

→ The # of used slots → The offset of the starting location of the last slot used.

31

Header Tuple #4 Tuple #2 Tuple #3 Tuple #1

Fixed/Var-length Tuple Data Slot Array

slide-41
SLIDE 41

CMU 15-445/645 (Fall 2018)

SLOTTED PAGES

The most common layout scheme is called slotted pages. The slot array maps "slots" to the tuples' starting position offsets. The header keeps track of:

→ The # of used slots → The offset of the starting location of the last slot used.

31

Header Tuple #4 Tuple #2 Tuple #3 Tuple #1

Fixed/Var-length Tuple Data Slot Array

slide-42
SLIDE 42

CMU 15-445/645 (Fall 2018)

LO G- STRUCTURED FILE O RGAN IZATIO N

Instead of storing tuples in pages, the DBMS only stores log records. The system appends log records to the file of how the database was modified:

→ Inserts store the entire tuple. → Deletes mark the tuple as deleted. → Updates contain the delta of just the attributes that were modified.

32

New Entries

INSERT id=1,val=a INSERT id=2,val=b DELETE id=4 UPDATE val=X (id=3) UPDATE val=Y (id=4) INSERT id=3,val=c

Page

slide-43
SLIDE 43

CMU 15-445/645 (Fall 2018)

LO G- STRUCTURED FILE O RGAN IZATIO N

To read a record, the DBMS scans the log backwards and "recreates" the tuple to find what it needs. Build indexes to allow it to jump to locations in the log.

33

INSERT id=1,val=a INSERT id=2,val=b DELETE id=4 UPDATE val=X (id=3) UPDATE val=Y (id=4) INSERT id=3,val=c

Reads Page

slide-44
SLIDE 44

CMU 15-445/645 (Fall 2018)

LO G- STRUCTURED FILE O RGAN IZATIO N

To read a record, the DBMS scans the log backwards and "recreates" the tuple to find what it needs. Build indexes to allow it to jump to locations in the log.

33

INSERT id=1,val=a INSERT id=2,val=b DELETE id=4 UPDATE val=X (id=3) UPDATE val=Y (id=4) INSERT id=3,val=c

id=1 id=2 id=3 id=4

Page

slide-45
SLIDE 45

CMU 15-445/645 (Fall 2018)

LO G- STRUCTURED FILE O RGAN IZATIO N

To read a record, the DBMS scans the log backwards and "recreates" the tuple to find what it needs. Build indexes to allow it to jump to locations in the log. Periodically compact the log.

33

id=1,val=a id=2,val=b id=3,val=X id=4,val=Y

Page

slide-46
SLIDE 46

CMU 15-445/645 (Fall 2018)

LO G- STRUCTURED CO M PACTIO N

Compaction coalesces larger log files into smaller files by removing unnecessary records.

34 Sorted Log File

Level 0

Level Compaction

slide-47
SLIDE 47

CMU 15-445/645 (Fall 2018)

LO G- STRUCTURED CO M PACTIO N

Compaction coalesces larger log files into smaller files by removing unnecessary records.

34 Sorted Log File

Level 0

Compaction

Sorted Log File

Level Compaction

slide-48
SLIDE 48

CMU 15-445/645 (Fall 2018)

LO G- STRUCTURED CO M PACTIO N

Compaction coalesces larger log files into smaller files by removing unnecessary records.

34 Sorted Log File Sorted Log File

Level 0 Level 1

Compaction

Sorted Log File

Level Compaction

slide-49
SLIDE 49

CMU 15-445/645 (Fall 2018)

LO G- STRUCTURED CO M PACTIO N

Compaction coalesces larger log files into smaller files by removing unnecessary records.

34 Sorted Log File Sorted Log File

Level 0 Level 1

Compaction

Sorted Log File Sorted Log File

Level Compaction

slide-50
SLIDE 50

CMU 15-445/645 (Fall 2018)

LO G- STRUCTURED CO M PACTIO N

Compaction coalesces larger log files into smaller files by removing unnecessary records.

34 Sorted Log File Sorted Log File Sorted Log File

Level 0 Level 1 Level 2

Compaction

Sorted Log File Sorted Log File

Level Compaction

slide-51
SLIDE 51

CMU 15-445/645 (Fall 2018)

LO G- STRUCTURED CO M PACTIO N

Compaction coalesces larger log files into smaller files by removing unnecessary records.

34 Sorted Log File Sorted Log File Sorted Log File

Level 0 Level 1 Level 2

Compaction

Sorted Log File Sorted Log File

Level Compaction Universal Compaction

Sorted Log File Sorted Log File Sorted Log File Sorted Log File

slide-52
SLIDE 52

CMU 15-445/645 (Fall 2018)

LO G- STRUCTURED CO M PACTIO N

Compaction coalesces larger log files into smaller files by removing unnecessary records.

34 Sorted Log File Sorted Log File Sorted Log File

Level 0 Level 1 Level 2

Compaction

Sorted Log File Sorted Log File

Level Compaction Universal Compaction

Sorted Log File Sorted Log File Sorted Log File Sorted Log File Sorted Log File Sorted Log File Sorted Log File

slide-53
SLIDE 53

CMU 15-445/645 (Fall 2018)

TO DAY'S AGEN DA

File Storage Page Layout Tuple Layout

35

slide-54
SLIDE 54

CMU 15-445/645 (Fall 2018)

TUPLE LAYO UT

A tuple is essentially a sequence of bytes. It's the job of the DBMS to interpret those bytes into attribute types and values.

36

slide-55
SLIDE 55

CMU 15-445/645 (Fall 2018)

Tuple

TUPLE H EADER

Each tuple is prefixed with a header that contains meta-data about it.

→ Visibility info (concurrency control) → Bit Map for NULL values.

We do not need to store meta-data about the schema.

37

Header Attribute Data

slide-56
SLIDE 56

CMU 15-445/645 (Fall 2018)

TUPLE DATA

Attributes are typically stored in the

  • rder that you specify them when you

create the table. This is done for software engineering reasons. We re-order attributes automatically in CMU's new DBMS…

38

Tuple

Header a b c d e

CREATE TABLE foo ( a INT PRIMARY KEY, b INT NOT NULL, c INT, d DOUBLE, e FLOAT );

slide-57
SLIDE 57

CMU 15-445/645 (Fall 2018)

DEN O RM ALIZED TUPLE DATA

Can physically denormalize (e.g., "pre join") related tuples and store them together in the same page.

→ Potentially reduces the amount of I/O for common workload patterns. → Can make updates more expensive.

39

CREATE TABLE foo ( a INT PRIMARY KEY, b INT NOT NULL, ); CREATE TABLE bar ( c INT PRIMARY KEY, a INT ⮱REFERENCES foo (a), );

slide-58
SLIDE 58

CMU 15-445/645 (Fall 2018)

DEN O RM ALIZED TUPLE DATA

Can physically denormalize (e.g., "pre join") related tuples and store them together in the same page.

→ Potentially reduces the amount of I/O for common workload patterns. → Can make updates more expensive.

39

foo

Header c a Header c a Header c a

bar

Header a b

slide-59
SLIDE 59

CMU 15-445/645 (Fall 2018)

DEN O RM ALIZED TUPLE DATA

Can physically denormalize (e.g., "pre join") related tuples and store them together in the same page.

→ Potentially reduces the amount of I/O for common workload patterns. → Can make updates more expensive.

39

foo

c c c …

foo bar

Header a b

slide-60
SLIDE 60

CMU 15-445/645 (Fall 2018)

DEN O RM ALIZED TUPLE DATA

Can physically denormalize (e.g., "pre join") related tuples and store them together in the same page.

→ Potentially reduces the amount of I/O for common workload patterns. → Can make updates more expensive.

Not a new idea.

→ IBM System R did this in the 1970s. → Several NoSQL DBMSs do this without calling it physical denormalization.

39

foo

c c c …

foo bar

Header a b

slide-61
SLIDE 61

CMU 15-445/645 (Fall 2018)

RECO RD IDS

The DBMS needs a way to keep track

  • f individual tuples.

Each tuple is assigned a unique record identifier.

→ Most common: page_id + offset/slot → Can also contain file location info.

An application cannot rely on these ids to mean anything.

40

CTID (4-bytes) ROWID (10-bytes) ROWID (8-bytes)

slide-62
SLIDE 62

CMU 15-445/645 (Fall 2018)

CO N CLUSIO N

Database is organized in pages. Different ways to track pages. Different ways to store pages. Different ways to store tuples.

41

slide-63
SLIDE 63

CMU 15-445/645 (Fall 2018)

N EXT CLASS

Value Representation Storage Models

42