15-721 DATABASE SYSTEMS Lecture #22 Larger-than-Memory Databases - - PowerPoint PPT Presentation

15 721
SMART_READER_LITE
LIVE PREVIEW

15-721 DATABASE SYSTEMS Lecture #22 Larger-than-Memory Databases - - PowerPoint PPT Presentation

15-721 DATABASE SYSTEMS Lecture #22 Larger-than-Memory Databases Andy Pavlo / / Carnegie Mellon University / / Spring 2016 2 ADMINISTRIVIA Final Exam: April 27 th @ 12:00pm Three short-essay questions. I will provide sample


slide-1
SLIDE 1

Andy Pavlo / / Carnegie Mellon University / / Spring 2016

Lecture #22 – Larger-than-Memory Databases

DATABASE SYSTEMS

15-721

slide-2
SLIDE 2

CMU 15-721 (Spring 2016)

ADMINISTRIVIA

Final Exam: April 27th @ 12:00pm

→ Three short-essay questions. → I will provide sample questions this week.

“Final” Presentations: May 6th @ 1:00pm

→ 10 minutes per group → Food and prizes for everyone!

Code Reviews: May 8th @ 11:59pm

→ I will announce group assignments and guidelines next class.

2

slide-3
SLIDE 3

CMU 15-721 (Spring 2016)

QUICKSTEP

Pivotal has submitted QuickStep to be an Apache Project. They are also working on a distributed version

  • f the system.

https://wiki.apache.org/incubator/QuickstepProposal

3

slide-4
SLIDE 4

CMU 15-721 (Spring 2016)

TODAY’S AGENDA

Background Implementation Issues Real-world Examples Evaluation

4

slide-5
SLIDE 5

CMU 15-721 (Spring 2016)

MOTIVATION

DRAM is expensive, son. It would be nice if our in-memory DBMS could use cheaper storage.

5

slide-6
SLIDE 6

CMU 15-721 (Spring 2016)

LARGER-THAN-MEMORY DATABASES

Allow an in-memory DBMS to store/access data on disk without bringing back all the slow parts of a disk-oriented DBMS. Need to be aware of hardware access methods

→ In-memory Storage = Tuple-Oriented → Disk Storage = Block-Oriented

6

slide-7
SLIDE 7

CMU 15-721 (Spring 2016)

OLAP

OLAP queries generally access the entire table. Thus, there isn’t anything about the workload for the DBMS to exploit that a disk-oriented buffer pool can’t handle.

7

slide-8
SLIDE 8

CMU 15-721 (Spring 2016)

OLAP

OLAP queries generally access the entire table. Thus, there isn’t anything about the workload for the DBMS to exploit that a disk-oriented buffer pool can’t handle.

7

A

slide-9
SLIDE 9

CMU 15-721 (Spring 2016)

OLAP

OLAP queries generally access the entire table. Thus, there isn’t anything about the workload for the DBMS to exploit that a disk-oriented buffer pool can’t handle.

7

Disk Data

A

In-Memory

Pre-Computed (A)

MIN=## MAX=## SUM=## COUNT=## AVG=### STDEV=###

A

slide-10
SLIDE 10

CMU 15-721 (Spring 2016)

OLTP

OLTP workloads almost always have hot and cold portions of the database.

→ We can assume that txns will almost always access hot tuples.

The DBMS needs a mechanism to move cold data out to disk and then retrieve it if it is ever needed again.

8

slide-11
SLIDE 11

CMU 15-721 (Spring 2016)

LARGER-THAN-MEMORY DATABASES

9

In-Memory Table Heap

Tuple #01 Tuple #03 Tuple #04 Tuple #00 Tuple #02

Cold-Data Storage In-Memory Index

slide-12
SLIDE 12

CMU 15-721 (Spring 2016)

LARGER-THAN-MEMORY DATABASES

9

In-Memory Table Heap

Tuple #01 Tuple #03 Tuple #04 Tuple #00 Tuple #02

Cold-Data Storage In-Memory Index

slide-13
SLIDE 13

CMU 15-721 (Spring 2016)

LARGER-THAN-MEMORY DATABASES

9

In-Memory Table Heap

Tuple #00 Tuple #02

Cold-Data Storage

header

Tuple #01 Tuple #03 Tuple #04

In-Memory Index

slide-14
SLIDE 14

CMU 15-721 (Spring 2016)

LARGER-THAN-MEMORY DATABASES

9

In-Memory Table Heap

Tuple #00 Tuple #02

Cold-Data Storage

header

Tuple #01 Tuple #03 Tuple #04

In-Memory Index

Evicted Tuple Block

slide-15
SLIDE 15

CMU 15-721 (Spring 2016)

LARGER-THAN-MEMORY DATABASES

9

In-Memory Table Heap

Tuple #00 Tuple #02

Cold-Data Storage

header

Tuple #01 Tuple #03 Tuple #04

In-Memory Index

??? ??? ???

Evicted Tuple Block

slide-16
SLIDE 16

CMU 15-721 (Spring 2016)

LARGER-THAN-MEMORY DATABASES

9

In-Memory Table Heap

Tuple #00 Tuple #02

Cold-Data Storage

header

Tuple #01 Tuple #03 Tuple #04

In-Memory Index

??? ??? ??? ???

Evicted Tuple Block

slide-17
SLIDE 17

CMU 15-721 (Spring 2016)

LARGER-THAN-MEMORY DATABASES

9

In-Memory Table Heap

Tuple #00 Tuple #02

Cold-Data Storage

header

Tuple #01 Tuple #03 Tuple #04

In-Memory Index

SELECT * FROM table WHERE id = <Tuple #01>

??? ??? ??? ???

Evicted Tuple Block

slide-18
SLIDE 18

CMU 15-721 (Spring 2016)

LARGER-THAN-MEMORY DATABASES

9

In-Memory Table Heap

Tuple #00 Tuple #02

Cold-Data Storage

header

Tuple #01 Tuple #03 Tuple #04

In-Memory Index

SELECT * FROM table WHERE id = <Tuple #01>

??? ??? ??? ???

Evicted Tuple Block

slide-19
SLIDE 19

CMU 15-721 (Spring 2016)

LARGER-THAN-MEMORY DATABASES

9

In-Memory Table Heap

Tuple #00 Tuple #02

Cold-Data Storage

header

Tuple #01 Tuple #03 Tuple #04

In-Memory Index

SELECT * FROM table WHERE id = <Tuple #01>

??? ??? ??? ???

Evicted Tuple Block

slide-20
SLIDE 20

CMU 15-721 (Spring 2016)

LARGER-THAN-MEMORY DATABASES

9

In-Memory Table Heap

Tuple #00 Tuple #02

Cold-Data Storage

header

Tuple #01 Tuple #03 Tuple #04

In-Memory Index

SELECT * FROM table WHERE id = <Tuple #01>

??? ??? ??? ???

Evicted Tuple Block

???

slide-21
SLIDE 21

CMU 15-721 (Spring 2016)

LARGER-THAN-MEMORY DATABASES

9

In-Memory Table Heap

Tuple #00 Tuple #02

Cold-Data Storage

header

Tuple #01 Tuple #03 Tuple #04

In-Memory Index

SELECT * FROM table WHERE id = <Tuple #01>

??? ??? ??? ???

Evicted Tuple Block

??? ???

slide-22
SLIDE 22

CMU 15-721 (Spring 2016)

OLTP ISSUES

Run-time Operations

→ Cold Tuple Identification

Eviction Policies

→ Timing → Evicted Tuple Metadata

Data Retrieval Policies

→ Granularity → Retrieval Mechanism → Merging back to memory

10

slide-23
SLIDE 23

CMU 15-721 (Spring 2016)

COLD TUPLE IDENTIFICATION

Choice #1: On-line

→ The DBMS monitors txn access patterns and tracks how often tuples are used. → Embed the tracking meta-data directly in tuples.

Choice #2: Off-line

→ Maintain a tuple access log during txn execution. → Process in background to compute frequencies.

11

slide-24
SLIDE 24

CMU 15-721 (Spring 2016)

EVICTION TIMING

Choice #1: Threshold

→ The DBMS monitors memory usage and begins evicting tuples when it reaches a threshold. → The DBMS has to manually move data.

Choice #2: OS Virtual Memory

→ The OS decides when it wants to move data out to

  • disk. This is done in the background.

12

slide-25
SLIDE 25

CMU 15-721 (Spring 2016)

EVICTED TUPLE METADATA

Choice #1: Tombstones

→ Leave a marker that points to the on-disk tuple. → Update indexes to point to the tombstone tuples.

Choice #2: Bloom Filters

→ Use approximate data structure for each index. → Check both index + filter for each query.

Choice #3: OS Virtual Memory

→ The OS tracks what data is on disk. The DBMS does not need to maintain any additional metadata.

13

slide-26
SLIDE 26

CMU 15-721 (Spring 2016)

EVICTED TUPLE METADATA

14

In-Memory Table Heap

Tuple #01 Tuple #03 Tuple #04 Tuple #00 Tuple #02

Cold-Data Storage In-Memory Index

slide-27
SLIDE 27

CMU 15-721 (Spring 2016)

EVICTED TUPLE METADATA

14

In-Memory Table Heap

Tuple #01 Tuple #03 Tuple #04 Tuple #00 Tuple #02

Cold-Data Storage In-Memory Index

Access Frequency

Tuple #00 Tuple #01 Tuple #02 Tuple #03 Tuple #04 Tuple #05

slide-28
SLIDE 28

CMU 15-721 (Spring 2016)

EVICTED TUPLE METADATA

14

In-Memory Table Heap

Tuple #01 Tuple #03 Tuple #04 Tuple #00 Tuple #02

Cold-Data Storage In-Memory Index

Access Frequency

Tuple #00 Tuple #01 Tuple #02 Tuple #03 Tuple #04 Tuple #05

slide-29
SLIDE 29

CMU 15-721 (Spring 2016)

EVICTED TUPLE METADATA

14

In-Memory Table Heap

Tuple #00 Tuple #02

Cold-Data Storage

header

Tuple #01 Tuple #03 Tuple #04

In-Memory Index

Access Frequency

Tuple #00 Tuple #01 Tuple #02 Tuple #03 Tuple #04 Tuple #05

slide-30
SLIDE 30

CMU 15-721 (Spring 2016)

EVICTED TUPLE METADATA

14

In-Memory Table Heap

Tuple #00 Tuple #02

Cold-Data Storage

header

Tuple #01 Tuple #03 Tuple #04

In-Memory Index

Access Frequency

Tuple #00 Tuple #01 Tuple #02 Tuple #03 Tuple #04 Tuple #05

slide-31
SLIDE 31

CMU 15-721 (Spring 2016)

EVICTED TUPLE METADATA

14

In-Memory Table Heap

Tuple #00 Tuple #02

Cold-Data Storage

header

Tuple #01 Tuple #03 Tuple #04

In-Memory Index

slide-32
SLIDE 32

CMU 15-721 (Spring 2016)

EVICTED TUPLE METADATA

14

In-Memory Table Heap

Tuple #00 Tuple #02

Cold-Data Storage

header

Tuple #01 Tuple #03 Tuple #04

In-Memory Index

<Block,Offset> <Block,Offset> <Block,Offset>

slide-33
SLIDE 33

CMU 15-721 (Spring 2016)

EVICTED TUPLE METADATA

14

In-Memory Table Heap

Tuple #00 Tuple #02

Cold-Data Storage

header

Tuple #01 Tuple #03 Tuple #04

In-Memory Index Bloom Filter Index

slide-34
SLIDE 34

CMU 15-721 (Spring 2016)

DATA RETRIEVAL GRANULARITY

Choice #1: Only Tuples Needed

→ Only merge the tuples that were accessed by a query back into the in-memory table heap. → Requires additional bookkeeping to track holes.

Choice #2: All Tuples in Block

→ Merge all the tuples retrieved from a block regardless of whether they are needed. → More CPU overhead to update indexes. → Tuples are likely to be evicted again.

15

slide-35
SLIDE 35

CMU 15-721 (Spring 2016)

RETRIEVAL MECHANISM

Choice #1: Abort-and-Restart

→ Abort the txn that accessed the evicted tuple. → Retrieve the data from disk and merge it into memory with a separate background thread. → Restart the txn when the data is ready. → Cannot guarantee consistency for large queries.

Choice #2: Synchronous Retrieval

→ Stall the txn when it accesses an evicted tuple while the DBMS fetches the data and merges it back into memory.

16

slide-36
SLIDE 36

CMU 15-721 (Spring 2016)

MERGING THRESHOLD

Choice #1: Always Merge

→ Retrieved tuples are always put into table heap.

Choice #2: Merge Only on Update

→ Retrieved tuples are only merged into table heap if they are used in an UPDATE query. → All other tuples are put in a temporary buffer.

Choice #3: Selective Merge

→ Keep track of how often each block is retrieved. → If a block’s access frequency is above some threshold, merge it back into the table heap.

17

slide-37
SLIDE 37

CMU 15-721 (Spring 2016)

REAL-WORLD IMPLEMENTATIONS

H-Store – Anti-Caching Hekaton – Project Siberia EPFL’s VoltDB Prototype Apache Geode – Overflow Tables MemSQL – Columnar Tables

18

slide-38
SLIDE 38

CMU 15-721 (Spring 2016)

H-STORE – ANTI-CACHING

On-line Identification Administrator-defined Threshold Tombstones Abort-and-restart Retrieval Block-level Granularity Always Merge

19

ANTI-CACHING: A NEW APPROACH TO DATABASE MANAGEMENT SYSTEM ARCHITECTURE VLDB 2013

slide-39
SLIDE 39

CMU 15-721 (Spring 2016)

HEKATON – PROJECT SIBERIA

Off-line Identification Administrator-defined Threshold Bloom Filters Synchronous Retrieval Tuple-level Granularity Always Merge

20

TREKKING THROUGH SIBERIA: MANAGING COLD DATA IN A MEMORY-OPTIMIZED DATABASE VLDB 2014

slide-40
SLIDE 40

CMU 15-721 (Spring 2016)

EPFL VOLTDB

Off-line Identification OS Virtual Memory Synchronous Retrieval Page-level Granularity Always Merge

21

ENABLING EFFICIENT OS PAGING FOR MAIN- MEMORY OLTP DATABASES DAMON 2013

slide-41
SLIDE 41

CMU 15-721 (Spring 2016)

In-Memory Table Heap Cold-Data Storage

EPFL VOLTDB

22

Tuple #00 Tuple #02 Tuple #01

slide-42
SLIDE 42

CMU 15-721 (Spring 2016)

In-Memory Table Heap Cold-Data Storage

EPFL VOLTDB

22

Tuple #00 Tuple #02

Hot Tuples Cold Tuples

Tuple #01

slide-43
SLIDE 43

CMU 15-721 (Spring 2016)

In-Memory Table Heap Cold-Data Storage

EPFL VOLTDB

22

Tuple #00 Tuple #02

Hot Tuples Cold Tuples

Tuple #01

slide-44
SLIDE 44

CMU 15-721 (Spring 2016)

In-Memory Table Heap Cold-Data Storage

EPFL VOLTDB

22

Tuple #00 Tuple #02

Hot Tuples Cold Tuples

Tuple #01

slide-45
SLIDE 45

CMU 15-721 (Spring 2016)

In-Memory Table Heap Cold-Data Storage

EPFL VOLTDB

22

Tuple #00 Tuple #02

Hot Tuples Cold Tuples

Tuple #01 Tuple #03

slide-46
SLIDE 46

CMU 15-721 (Spring 2016)

In-Memory Table Heap Cold-Data Storage

EPFL VOLTDB

22

Tuple #00 Tuple #02

Hot Tuples Cold Tuples

Tuple #01 Tuple #03

slide-47
SLIDE 47

CMU 15-721 (Spring 2016)

In-Memory Table Heap Cold-Data Storage

EPFL VOLTDB

22

Tuple #00 Tuple #02

Hot Tuples Cold Tuples

Tuple #03 Tuple #01

slide-48
SLIDE 48

CMU 15-721 (Spring 2016)

In-Memory Table Heap Cold-Data Storage

EPFL VOLTDB

22

Tuple #00 Tuple #02

Hot Tuples Cold Tuples

Tuple #03 Tuple #01

slide-49
SLIDE 49

CMU 15-721 (Spring 2016)

In-Memory Table Heap Cold-Data Storage

EPFL VOLTDB

22

Tuple #00 Tuple #02

Hot Tuples Cold Tuples

Tuple #03

slide-50
SLIDE 50

CMU 15-721 (Spring 2016)

In-Memory Table Heap Cold-Data Storage

EPFL VOLTDB

22

Tuple #00 Tuple #02

Hot Tuples Cold Tuples

Tuple #03

slide-51
SLIDE 51

CMU 15-721 (Spring 2016)

APACHE GEODE – OVERFLOW TABLES

On-line Identification Administrator-defined Threshold Tombstones (?) Synchronous Retrieval Tuple-level Granularity Merge Only on Update (?)

23

Source: Apache Geode Documentation

slide-52
SLIDE 52

CMU 15-721 (Spring 2016)

MEMSQL – COLUMNAR TABLES

Administrator manually declares a table as a distinct disk-resident columnar table.

→ Appears as a separate logical table to the application. → Uses mmap to manage buffer pool. → Pre-computed aggregates per block always in memory.

Manual Identification No Evicted Metadata is needed. Synchronous Retrieval Always Merge

24

Source: MemSQL Documentation

slide-53
SLIDE 53

CMU 15-721 (Spring 2016)

EVALUATION

Compare different design decisions in H-Store with anti-caching. Storage Devices:

→ Hard-Disk Drive (HDD) → Shingled Magnetic Recording Drive (SMR) → Solid-State Drive (SSD) → 3D XPoint (3DX) → Non-volatile Memory (NVRAM)

25

LARGER-THAN-MEMORY DATA MANAGEMENT ON MODERN STORAGE HARDWARE FOR IN- MEMORY OLTP DATABASE SYSTEMS UNDER SUBMISSION

slide-54
SLIDE 54

CMU 15-721 (Spring 2016)

MICROBENCHMARK

26

1E+00 1E+02 1E+04 1E+06 1E+08

HDD SMR SSD 3D XPoint NVRAM DRAM Latency (nanosec)

1KB Read 1KB Write 64KB Read 64KB Write

102 100 104 108 106

10m Tuples – 1KB each 50% Reads / 50% Writes – Synchronization Enabled

slide-55
SLIDE 55

CMU 15-721 (Spring 2016)

MICROBENCHMARK

26

1E+00 1E+02 1E+04 1E+06 1E+08

HDD SMR SSD 3D XPoint NVRAM DRAM Latency (nanosec)

1KB Read 1KB Write 64KB Read 64KB Write

102 100 104 108 106

10m Tuples – 1KB each 50% Reads / 50% Writes – Synchronization Enabled

slide-56
SLIDE 56

CMU 15-721 (Spring 2016)

MICROBENCHMARK

26

1E+00 1E+02 1E+04 1E+06 1E+08

HDD SMR SSD 3D XPoint NVRAM DRAM Latency (nanosec)

1KB Read 1KB Write 64KB Read 64KB Write

102 100 104 108 106

10m Tuples – 1KB each 50% Reads / 50% Writes – Synchronization Enabled

slide-57
SLIDE 57

CMU 15-721 (Spring 2016)

MERGING THRESHOLD

27

50000 100000 150000 200000 250000 HDD (AR) HDD (SR) SMR (AR) SMR (SR) SSD 3DX NVMRAM Throughput (txn/sec)

Merge (Update-Only) Merge (Top-5%) Merge (Top-20%) Merge (All)

YCSB Workload – 90% Reads / 10% Writes 10GB Database using 1.25GB Memory

slide-58
SLIDE 58

CMU 15-721 (Spring 2016)

MERGING THRESHOLD

27

50000 100000 150000 200000 250000 HDD (AR) HDD (SR) SMR (AR) SMR (SR) SSD 3DX NVMRAM Throughput (txn/sec)

Merge (Update-Only) Merge (Top-5%) Merge (Top-20%) Merge (All)

YCSB Workload – 90% Reads / 10% Writes 10GB Database using 1.25GB Memory

slide-59
SLIDE 59

CMU 15-721 (Spring 2016)

MERGING THRESHOLD

27

50000 100000 150000 200000 250000 HDD (AR) HDD (SR) SMR (AR) SMR (SR) SSD 3DX NVMRAM Throughput (txn/sec)

Merge (Update-Only) Merge (Top-5%) Merge (Top-20%) Merge (All)

YCSB Workload – 90% Reads / 10% Writes 10GB Database using 1.25GB Memory

DRAM

slide-60
SLIDE 60

CMU 15-721 (Spring 2016)

CONFIGURATION COMPARISON

Generic Configuration

→ Abort-and-Restart Retrieval → Merge (All) Threshold → 1024 KB Block Size

Optimized Configuration

→ Synchronous Retrieval → Top-5% Merge Threshold → Block Sizes (HDD/SMR - 1024 KB) (SSD/3DX - 16 KB)

28

slide-61
SLIDE 61

CMU 15-721 (Spring 2016)

TATP BENCHMARK

29

80000 160000 240000 320000 HDD SMR SSD 3D XPoint NVRAM Throughput (txn/sec)

Generic Optimized

Optimal Configuration per Storage Device 1.25GB Memory

DRAM

slide-62
SLIDE 62

CMU 15-721 (Spring 2016)

VOTER BENCHMARK

30

50000 100000 150000 HDD SMR SSD 3DX NVRAM Throughput (txn/sec)

Generic Optimized

Optimal Configuration per Storage Device 1.25GB Memory

DRAM

slide-63
SLIDE 63

CMU 15-721 (Spring 2016)

PARTING THOUGHTS

Today was about working around the block-

  • riented access and slowness of secondary

storage. Fast & cheap byte-addressable NVM will make this lecture unnecessary.

31

slide-64
SLIDE 64

CMU 15-721 (Spring 2016)

NEXT CLASS

Non-Volatile Memory Project #3 Code Reviews

32