

SLIDE 1

Databases on New Hardware

@Andy_Pavlo // 15-721 // Spring 2018

ADVANCED DATABASE SYSTEMS Lecture #24

SLIDE 2 CMU 15-721 (Spring 2018)

ADMINISTRIVIA

Snowflake Guest: May 2nd @ 3:00pm
Final Exam Handout: May 2nd
Code Review #2: May 2nd @ 11:59pm

→ We will use the same group pairings as before.

Final Presentations: May 14th @ 8:30am

→ GHC 4303 (ignore schedule!)
→ 12 minutes per group
→ Food and prizes for everyone!

SLIDE 3

ADMINISTRIVIA

Course Evaluation

→ Please tell me what you really think of me.
→ I actually take your feedback into consideration.
→ Take revenge on next year's students.

https://cmu.smartevals.com/

SLIDE 5

DATABASE HARDWARE

People have been thinking about using hardware to accelerate DBMSs for decades.

→ 1980s: Database Machines
→ 2000s: FPGAs + Appliances
→ 2010s: FPGAs + GPUs

DATABASE MACHINES: AN IDEA WHOSE TIME HAS PASSED? A CRITIQUE OF THE FUTURE OF DATABASE MACHINES University of Wisconsin 1983

SLIDE 6

Non-Volatile Memory
GPU Acceleration
Hardware Transactional Memory

SLIDE 7

NON-VOLATILE MEMORY

Emerging storage technology that provides low-latency reads/writes like DRAM, but with persistent writes and large capacities like SSDs.

→ aka Storage-class Memory, Persistent Memory

First devices will be block-addressable (NVMe).
Later devices will be byte-addressable.

SLIDE 8

FUNDAMENTAL ELEMENTS OF CIRCUITS


Capacitor (ca. 1745) Resistor (ca. 1827) Inductor (ca. 1831)

SLIDE 9

FUNDAMENTAL ELEMENTS OF CIRCUITS

In 1971, Leon Chua at Berkeley predicted the existence of a fourth fundamental element: a two-terminal device whose resistance depends on the voltage applied to it, but when that voltage is turned off it permanently remembers its last resistive state.

TWO CENTURIES OF MEMRISTORS Nature Materials 2012

SLIDE 10

FUNDAMENTAL ELEMENTS OF CIRCUITS


Capacitor (ca. 1745) Resistor (ca. 1827) Inductor (ca. 1831) Memristor (ca. 1971)

SLIDE 11

MEMRISTORS

A team at HP Labs led by Stanley Williams stumbled upon a nano-device that had weird properties that they could not understand. It wasn’t until they found Chua’s 1971 paper that they realized what they had invented.

HOW WE FOUND THE MISSING MEMRISTOR IEEE Spectrum 2008

SLIDE 12

TECHNOLOGIES

Phase-Change Memory (PRAM) Resistive RAM (ReRAM) Magnetoresistive RAM (MRAM)

SLIDE 13

PHASE-CHANGE MEMORY

A storage cell is composed of two metal electrodes separated by a resistive heater and the phase-change material (chalcogenide). The value of the cell is changed based on how the material is heated.

→ A short pulse changes the cell to a '0'.
→ A long, gradual pulse changes the cell to a '1'.

PHASE CHANGE MEMORY ARCHITECTURE AND THE QUEST FOR SCALABILITY Communications of the ACM 2010

[Figure: storage cell with bitline, access device, heater, and chalcogenide]

SLIDE 14

RESISTIVE RAM

Two metal layers with two TiO2 layers in between. Running a current one direction moves electrons from the top TiO2 layer to the bottom, thereby changing the resistance. May be programmable storage fabric…

→ Bertrand Russell’s Material Implication Logic

HOW WE FOUND THE MISSING MEMRISTOR IEEE Spectrum 2008

[Figure: platinum electrodes sandwiching TiO2 and TiO2-x layers]

SLIDE 15

MAGNETORESISTIVE RAM

Stores data using magnetic storage elements instead of electric charge or current flows. Spin-Transfer Torque (STT-MRAM) is the leading technology for this type of NVM.

→ Supposedly able to scale to very small sizes (10nm) and have SRAM latencies.

[Figure: fixed FM layer, oxide layer, free FM layer]

SPIN MEMORY SHOWS ITS MIGHT IEEE Spectrum 2014

SLIDE 16

WHY THIS IS FOR REAL THIS TIME

Industry has agreed on standard technologies and form factors.
Linux and Microsoft have added support for NVM in their kernels (DAX).
Intel has added new instructions for flushing cache lines to NVM (CLFLUSH, CLWB).

SLIDE 17

NVM DIMM FORM FACTORS

NVDIMM-F (2015)

→ Flash only. Has to be paired with DRAM DIMM.

NVDIMM-N (2015)

→ Flash and DRAM together on the same DIMM.
→ Appears as volatile memory to the OS.

NVDIMM-P (2018)

→ True persistent memory. No DRAM or flash.

SLIDE 18

NVM CONFIGURATIONS

[Figure: three configurations: (1) NVM as Persistent Memory, (2) NVM Next to DRAM, (3) DRAM as Hardware-Managed Cache; components shown include the NVM/disk filesystems, the DBMS address space, buffer pool, and virtual memory subsystem]

Source: Ismail Oukid
SLIDE 19

NVM FOR DATABASE SYSTEMS

Block-addressable NVM is not that interesting. Byte-addressable NVM will be a game changer but will require some work to use correctly.

→ In-memory DBMSs will be better positioned to use byte-addressable NVM.
→ Disk-oriented DBMSs will initially treat NVM as just a faster SSD.

SLIDE 20

STORAGE & RECOVERY METHODS

Understand how a DBMS will behave on a system that only has byte-addressable NVM.
Develop NVM-optimized implementations of standard DBMS architectures.
Based on the N-Store prototype DBMS.

LET'S TALK ABOUT STORAGE & RECOVERY METHODS FOR NON-VOLATILE MEMORY DATABASE SYSTEMS SIGMOD 2015

SLIDE 21

SYNCHRONIZATION

Existing programming models assume that any write to memory is non-volatile.

→ CPU decides when to move data from caches to DRAM.

The DBMS needs a way to ensure that data is flushed from caches to NVM.

[Figure, three builds: a STORE travels through the L1/L2 caches to the memory controller; later builds add CLWB and ADR]

SLIDE 24

NAMING

If the DBMS process restarts, we need to make sure that all of the pointers for in-memory data point to the same data.

[Figure, three builds: an index and a table heap holding Tuple #00, Tuple #01, Tuple #02, and Tuple #00 (v2); after a restart the pointers from the index into the heap are crossed out as invalid]

SLIDE 27

NVM-AWARE MEMORY ALLOCATOR

Feature #1: Synchronization

→ The allocator writes back CPU cache lines to NVM using the CLFLUSH instruction.
→ It then issues a SFENCE instruction to wait for the data to become durable on NVM.

Feature #2: Naming

→ The allocator ensures that virtual memory addresses assigned to a memory-mapped region never change even after the OS or DBMS restarts.
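The two allocator features can be sketched together. This is a minimal illustration, not the paper's implementation: the cache-line write-back is stubbed out on non-x86 targets, and the mapping address and function names are arbitrary.

```c
#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>

#define CACHELINE 64

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
static void flush_line(const void *p) { _mm_clflush(p); }  /* CLFLUSH */
static void drain(void)               { _mm_sfence(); }    /* SFENCE  */
#else
static void flush_line(const void *p) { (void)p; }         /* stub    */
static void drain(void)               { __sync_synchronize(); }
#endif

/* Feature #1 (synchronization): write back every cache line in
 * [addr, addr+len), then fence so the data is durable on NVM
 * before execution continues. */
static void persist(const void *addr, size_t len) {
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
    for (; p < (uintptr_t)addr + len; p += CACHELINE)
        flush_line((const void *)p);
    drain();
}

/* Feature #2 (naming): map the NVM-backed region at the same virtual
 * address on every start so that persisted pointers remain valid after
 * an OS or DBMS restart. 0x600000000000 is an arbitrary illustrative
 * address, not something the paper prescribes. */
static void *map_named_region(int fd, size_t len) {
    return mmap((void *)0x600000000000ULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_FIXED, fd, 0);
}
```

A DBMS built on such an allocator would call persist() after every tuple or metadata write that must survive a crash.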

SLIDE 28

DBMS ENGINE ARCHITECTURES

Choice #1: In-place Updates

→ Table heap with a write-ahead log + snapshots.
→ Example: VoltDB

Choice #2: Copy-on-Write

→ Create a shadow copy of the table when updated.
→ No write-ahead log.
→ Example: LMDB

Choice #3: Log-structured

→ All writes are appended to a log. No table heap.
→ Example: RocksDB

SLIDE 29

IN-PLACE UPDATES ENGINE

[Figure, five builds: an in-memory table heap (Tuple #00, #01, #02) and an in-memory index; durable storage holds the write-ahead log and snapshots. (1) A tuple delta is appended to the write-ahead log, (2) and (3) the updated tuple version (Tuple #01 (!)) is written out, eventually reaching the snapshots. Problems: duplicate data, recovery latency]

SLIDE 34

NVM-OPTIMIZED ARCHITECTURES

Leverage the allocator's non-volatile pointers to only record what changed rather than how it changed.

The DBMS only has to maintain a transient UNDO log for a txn until it commits.

→ Dirty cache lines from an uncommitted txn can be flushed by hardware to the memory controller.
→ No REDO log because we flush all the changes to NVM at the time of commit.
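A toy sketch of this commit protocol, with a transient undo copy in DRAM and a `persist` stand-in for the CLWB/SFENCE write-back. The function names and the bank-balance tuple are illustrative, not from the paper.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { int64_t balance; } Tuple;

/* Stand-in for a CLWB/SFENCE cache-line write-back; a real engine
 * issues the x86 instructions here. */
static void persist(void *p, size_t n) {
    (void)p; (void)n;
    __sync_synchronize();   /* compiler/CPU fence only in this sketch */
}

/* In-place update with only a transient UNDO copy: the new value is
 * written directly to the NVM-resident tuple and flushed at commit.
 * No REDO log is needed because all changes reach NVM by commit. */
int64_t update_and_commit(Tuple *nvm_tuple, int64_t delta, int abort_txn) {
    Tuple undo = *nvm_tuple;          /* transient UNDO record (DRAM) */
    nvm_tuple->balance += delta;      /* modify the tuple in place    */
    if (abort_txn) {                  /* roll back from the UNDO copy */
        *nvm_tuple = undo;
        persist(nvm_tuple, sizeof *nvm_tuple);
        return nvm_tuple->balance;
    }
    persist(nvm_tuple, sizeof *nvm_tuple);  /* durable at commit */
    return nvm_tuple->balance;
}
```

On commit the new version is already in NVM, so there is nothing to redo; on abort the DRAM undo copy is written back and persisted.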

SLIDE 35

NVM IN-PLACE UPDATES ENGINE

[Figure, three builds: the table heap, index, and write-ahead log all reside on NVM storage. (1) Tuple pointers are recorded in the write-ahead log, (2) the updated tuple version (Tuple #01 (!)) is written directly to the NVM table heap]

SLIDE 38

COPY-ON-WRITE ENGINE

[Figure, five builds: a master record points to the current directory, whose leaves (Leaf 1, Leaf 2) reference Page #00 and Page #01. (1) An update copies Leaf 1 and Page #00 into an updated leaf, (2) a dirty directory is created over it, (3) the master record is switched to the new directory. Problem: expensive copies]
SLIDE 43

NVM COPY-ON-WRITE ENGINE

[Figure, three builds: the directory leaves reference tuples (Tuple #00, Tuple #01) directly. (1) An update creates an updated leaf holding the new version (Tuple #00 (!)) and only copies pointers, (2) a dirty directory is created, (3) the master record is switched]

SLIDE 46

LOG-STRUCTURED ENGINE

[Figure, four builds: a MemTable with a bloom filter, plus SSTables and a write-ahead log on durable storage. (1) A tuple delta is appended to the write-ahead log, (2) the delta is inserted into the MemTable, (3) the MemTable is flushed as tuple data to an SSTable. Problems: duplicate data, compactions]
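The read path implied by this design checks the MemTable first and consults the SSTable's bloom filter before searching it. A toy sketch with a one-byte bloom filter; all structure and function names here are hypothetical, not RocksDB's API.

```c
#include <stdint.h>

#define CAP 64

typedef struct { int32_t key; int32_t val; int used; } Entry;

typedef struct {
    Entry memtable[CAP];   /* recent writes, searched first       */
    Entry sstable[CAP];    /* frozen, immutable run on storage    */
    int nsstable;
    uint8_t bloom;         /* tiny bloom filter over sstable keys */
} Store;

static uint8_t bloom_bit(int32_t key) { return (uint8_t)(1u << (key & 7)); }

/* Lookup: MemTable first; skip the SSTable entirely when the bloom
 * filter proves the key is absent. */
int store_get(const Store *s, int32_t key, int32_t *val) {
    for (int i = 0; i < CAP; i++)
        if (s->memtable[i].used && s->memtable[i].key == key) {
            *val = s->memtable[i].val;
            return 1;
        }
    if (!(s->bloom & bloom_bit(key)))
        return 0;                        /* definitely not present */
    for (int i = 0; i < s->nsstable; i++)
        if (s->sstable[i].key == key) {
            *val = s->sstable[i].val;
            return 1;
        }
    return 0;
}
```

The MemTable-first order is what makes the duplicate copies in the diagram safe: the newest version always shadows older ones.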

SLIDE 50

NVM LOG-STRUCTURED ENGINE

[Figure, three builds: starting from the log-structured design, the SSTable is crossed out; what remains is a MemTable and write-ahead log on NVM, with (1) the tuple delta appended once]

SLIDE 53

NVM SUMMARY

Storage Optimizations

→ Leverage byte-addressability to avoid unnecessary data duplication.

Recovery Optimizations

→ NVM-optimized recovery protocols avoid the overhead of processing a log.
→ Non-volatile data structures ensure consistency.

SLIDE 54

GPU ACCELERATION

GPUs excel at performing (relatively simple) repetitive operations on large amounts of data over multiple streams of data.

Target operations that do not require blocking for input or branches:

→ Good: Sequential scans with predicates → Bad: B+Tree index probes

GPU memory is (usually) not cache coherent with CPU memory.

SLIDE 55-57

GPU ACCELERATION

[Figure, three builds: CPU/GPU topology with interconnect bandwidths: PCIe Bus (~16 GB/s), DDR4 (~40 GB/s), NVLink (~25 GB/s)]

SLIDE 58

GPU ACCELERATION

Choice #1: Entire Database

→ Store the database in the GPU(s) VRAM.
→ All queries perform massively parallel seq scans.

Choice #2: Important Columns

→ Return the offsets of records that match the portion of the query that accesses GPU-resident columns.
→ Have to materialize full results in the CPU.

Choice #3: Streaming

→ Transfer data from CPU to GPU on the fly.
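The division of labor in Choice #2 can be illustrated from the host side. A hypothetical sketch: assume a GPU kernel has already scanned its resident column and returned the offsets of matching records; the CPU then materializes the full rows.

```c
#include <stddef.h>
#include <stdint.h>

/* Full rows live in CPU memory; the GPU holds only the hot column. */
typedef struct { int32_t key; int32_t val; } Row;

/* Materialize the final result set from the offsets produced by the
 * (not shown) GPU-side predicate scan. */
size_t materialize(const Row *table, const uint32_t *offsets,
                   size_t noffsets, Row *out) {
    for (size_t i = 0; i < noffsets; i++)
        out[i] = table[offsets[i]];
    return noffsets;
}
```

The cost of this gather on the CPU is the price paid for keeping only the important columns in VRAM.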

SLIDE 59

https://db.cs.cmu.edu/seminar2018

SLIDE 60

HARDWARE TRANSACTIONAL MEMORY

Create critical sections in software that are managed by hardware.

→ Leverages the same cache coherency protocol to detect transaction conflicts.
→ Intel x86: Transactional Synchronization Extensions (TSX)

Read/write set of transactions must fit in L1 cache.

→ This means that it is not useful for general-purpose txns.
→ It can be used to create latch-free indexes.

TO LOCK, SWAP OR ELIDE: ON THE INTERPLAY OF HARDWARE TRANSACTIONAL MEMORY AND LOCK-FREE INDEXING VLDB 2015

SLIDE 61

HTM PROGRAMMING MODEL

Hardware Lock Elision (HLE)

→ Optimistically execute a critical section by eliding the write to a lock so that it appears to be free to other threads.
→ If there is a conflict, re-execute the code but actually take the locks the second time.

Restricted Transactional Memory (RTM)

→ Like HLE but with an optional fallback codepath that the CPU jumps to if the txn aborts.

SLIDE 62

HTM LATCH ELISION

[Figure, ten builds: inserting key 25 into a B+Tree with nodes A through G (keys 6, 10, 12, 20, 23, 35, 38, 44); concurrent accesses that conflict (marked X) abort the hardware transaction, which retries until the insert of 25 commits]

The latch acquisitions along the path are executed inside the elided region:

TSX-START {
  LATCH A
  Read A
  LATCH C
  UNLATCH A
  Read C
  LATCH F
  UNLATCH C
} TSX-COMMIT
Insert 25
UNLATCH F
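The latch-coupling sequence inside the TSX region can be written out with ordinary latches. A simplified sketch: pthread mutexes stand in for the latches that TSX would elide, and the B+Tree is reduced to a chain of parent/child nodes, so this only shows the coupling pattern, not a real index.

```c
#include <pthread.h>

/* Simplified "tree": each node guards its keys with a latch.
 * Latch coupling: hold the parent latch only until the child is
 * latched. Under HLE/RTM these latch writes are elided so that
 * non-conflicting threads descend concurrently. */
typedef struct Node {
    pthread_mutex_t latch;
    int nkeys;
    int keys[8];
    struct Node *child;   /* NULL at the leaf level */
} Node;

void insert_key(Node *root, int key) {
    pthread_mutex_lock(&root->latch);           /* LATCH A         */
    Node *cur = root;
    while (cur->child) {                        /* Read, descend   */
        pthread_mutex_lock(&cur->child->latch); /* LATCH child     */
        pthread_mutex_unlock(&cur->latch);      /* UNLATCH parent  */
        cur = cur->child;
    }
    cur->keys[cur->nkeys++] = key;              /* Insert 25       */
    pthread_mutex_unlock(&cur->latch);          /* UNLATCH leaf    */
}
```

Because the elided latches are only read, not written, two such inserts into different leaves do not touch the same cache lines and both hardware transactions can commit.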

SLIDE 72

PARTING THOUGHTS

Designing for NVM is important.

→ Non-volatile data structures provide higher throughput and faster recovery.

Byte-addressable NVM is going to be a game changer when it comes out.

SLIDE 73

NEXT CLASS

Final Exam Handout
