(Computer Science @ CMU) Presenter: Devesh Kumar Singh Background - - PowerPoint PPT Presentation



Authors : Joy Arulraj, Matthew Peron, Andrew Pavlo (Computer Science @ CMU) Presenter: Devesh Kumar Singh Background Storage Devices Write Ahead Protocol Write Behind Protocol Evaluation Durability of updates: Persist


slide-1
SLIDE 1

Authors : Joy Arulraj, Matthew Peron, Andrew Pavlo (Computer Science @ CMU) Presenter: Devesh Kumar Singh

slide-2
SLIDE 2
  • Background
  • Storage Devices
  • Write Ahead Protocol
  • Write Behind Protocol
  • Evaluation
slide-3
SLIDE 3
slide-4
SLIDE 4

  • Durability of updates: persist committed transactions
  • Failure atomicity: dispose of aborted transactions

Example, starting from A = 1: a Tx: A = A + 1 that commits leaves A = 2; a Tx: A = A + 1 that crashes or aborts leaves A = 1.

slide-5
SLIDE 5

  • Media failure: data loss, storage corruption
  • Transaction failure: aborted by the DBMS/application
  • System failure: hardware failure, bugs in the DBMS/OS

slide-6
SLIDE 6
  • Steal
      • Grab buffer-pool frames from uncommitted transactions
      • Can lose dirty writes, but better performance
  • No Force
      • Don’t force transaction updates to disk before committing
      • Difficult to guarantee durability, but better performance

Policy matrix (Force/No Force vs. Steal/No Steal): Steal + No Force is the desired combination; No Steal + Force is the trivial one.
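As a rough illustration of why Steal + No Force is harder than the trivial corner, the sketch below maps each buffer policy to the kind of logging it requires. The function name `logs_required` is hypothetical; the mapping itself is the standard textbook argument (stolen frames can put uncommitted data on disk, so undo is needed; unforced commits can leave committed data off disk, so redo is needed).

```python
def logs_required(steal: bool, force: bool) -> set:
    """Return which log types a buffer-management policy needs for recovery."""
    needed = set()
    if steal:
        # Uncommitted changes may already be on disk: must be reversible.
        needed.add("undo")
    if not force:
        # Committed changes may not be on disk yet: must be replayable.
        needed.add("redo")
    return needed


# Desired corner of the matrix: Steal + No Force needs both logs.
print(logs_required(steal=True, force=False))
# Trivial corner: No Steal + Force needs neither.
print(logs_required(steal=False, force=True))
```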

slide-7
SLIDE 7

Changes are first appended to a log on durable storage, and only then sent to the database on durable storage

  • Redo log
      • Reapplies updates of committed transactions
  • Undo log
      • Reverses updates by failed transactions
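The redo and undo roles above can be sketched with a toy recovery routine. This is a minimal sketch, not the recovery algorithm of any particular DBMS: `LogRecord` and `recover` are hypothetical names, the database is a plain dict, and it assumes no two transactions touch the same key.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    tx_id: int
    kind: str        # "update" or "commit"
    key: str = ""
    before: int = 0  # undo image
    after: int = 0   # redo image


def recover(disk: dict, log: list) -> dict:
    """Redo committed updates, then undo uncommitted ones, on a copy of disk."""
    db = dict(disk)
    committed = {r.tx_id for r in log if r.kind == "commit"}
    # Redo pass: reapply updates of committed transactions in log order.
    for r in log:
        if r.kind == "update" and r.tx_id in committed:
            db[r.key] = r.after
    # Undo pass: reverse updates of failed transactions in reverse log order.
    for r in reversed(log):
        if r.kind == "update" and r.tx_id not in committed:
            db[r.key] = r.before
    return db


log = [
    LogRecord(1, "update", "A", before=1, after=2),
    LogRecord(1, "commit"),
    LogRecord(2, "update", "B", before=5, after=6),  # tx 2 never commits
]
# Steal let tx 2's dirty B reach disk; no-force left tx 1's A stale.
disk_after_crash = {"A": 1, "B": 6}
recovered = recover(disk_after_crash, log)
print(recovered)
```

After recovery, A carries the committed value and B is back to its value from before the failed transaction.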
slide-8
SLIDE 8
slide-9
SLIDE 9
  • Based on magnetic storage platters
  • High data density / low storage price per unit of capacity
  • Random access slower than sequential access
  • Slowest speeds, due to mechanical design
slide-10
SLIDE 10
  • Based on NAND flash memory
  • Reads/writes 100-1000x faster than HDD
  • Each storage cell endures only a fixed number of writes
  • 3-10x more expensive than HDD
slide-11
SLIDE 11
  • Low-latency, byte-sized reads/writes of DRAM
  • Persistent writes and large storage capacity of HDDs/SSDs
  • Cache-line granularity, high bandwidth, low latency to CPUs
slide-12
SLIDE 12

Figure: Synchronized file write throughput to a 64 GB file, in IOPS (K) on a log scale from 0.01 to 1000, for sequential and random writes on HDD, SSD, and NVM. Bar labels: 0.1, 0.02, 1, 0.5, 100, 100.

slide-13
SLIDE 13
slide-14
SLIDE 14

WAL Record: LSN, Log Record Type, Transaction Commit Timestamp, Table ID, Insert Location, Delete Location, Before/After Images

Dirty Page Table: activeTxId, latestLSN

Active Transaction Table: TxId, lastLSN, status
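The WAL record fields and the Active Transaction Table columns listed on this slide can be written down as plain data structures. This is a sketch under assumptions: the field types, the record-type strings, and the `WriteAheadLog.append` helper are all hypothetical, and the log lives in a Python list rather than on durable storage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WalRecord:
    lsn: int
    rec_type: str                          # assumed values: "update", "commit", ...
    commit_ts: Optional[int] = None
    table_id: Optional[int] = None
    insert_location: Optional[int] = None
    delete_location: Optional[int] = None
    before_image: Optional[bytes] = None
    after_image: Optional[bytes] = None


@dataclass
class AttEntry:
    tx_id: int
    last_lsn: int
    status: str                            # "Active" or "Commit"


class WriteAheadLog:
    def __init__(self) -> None:
        self.records: list = []            # stands in for the durable log
        self.att: dict = {}                # tx_id -> AttEntry
        self._next_lsn = 1

    def append(self, tx_id: int, rec_type: str, **fields) -> WalRecord:
        """Append a record, assign its LSN, and update the ATT entry."""
        rec = WalRecord(lsn=self._next_lsn, rec_type=rec_type, **fields)
        self._next_lsn += 1
        self.records.append(rec)
        status = "Commit" if rec_type == "commit" else "Active"
        self.att[tx_id] = AttEntry(tx_id, rec.lsn, status)
        return rec


wal = WriteAheadLog()
wal.append(1, "update", table_id=7, insert_location=40, delete_location=12,
           before_image=b"A=1", after_image=b"A=2")
wal.append(1, "commit", commit_ts=100)
print(wal.att[1])
```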

slide-15
SLIDE 15

Traditional DBMS: logs all TX to storage; maintains DPT/ATT

In-memory DBMS: logs committed TX to storage; maintains ATT

slide-16
SLIDE 16

Figure: WAL during a transaction. The ATT entry (txId, lastLSN, status) for transaction 1 starts as Active and, after log records rec1, rec2, rec3, becomes (1, 28, Commit). Diagram labels: Database, Checkpoints, steps 1-3, DRAM, NVM, Data.

slide-17
SLIDE 17

An in-memory DBMS skips the undo phase

slide-18
SLIDE 18
slide-19
SLIDE 19
slide-20
SLIDE 20

WBL Record: LSN, Log Record Type, Persisted Commit Timestamp, Dirty Commit Timestamp

Dirty Tuple Table: TX ID, Table ID, Tuple Location
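The two structures on this slide are small enough to sketch directly. The field names follow the slide; the types, the `DirtyTupleTable` methods, and the idea of dropping a transaction's entries when it finishes are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WblRecord:
    lsn: int
    rec_type: str
    persisted_commit_ts: int   # Cp
    dirty_commit_ts: int       # Cd


@dataclass(frozen=True)
class DirtyTuple:
    tx_id: int
    table_id: int
    tuple_location: int


class DirtyTupleTable:
    """In-memory table of tuples modified by in-flight transactions."""

    def __init__(self) -> None:
        self._entries: dict = {}   # tx_id -> list of DirtyTuple

    def record(self, tx_id: int, table_id: int, tuple_location: int) -> None:
        entry = DirtyTuple(tx_id, table_id, tuple_location)
        self._entries.setdefault(tx_id, []).append(entry)

    def finish(self, tx_id: int) -> list:
        """Remove and return the entries of a finished transaction."""
        return self._entries.pop(tx_id, [])

    def dirty_tuples(self) -> list:
        return [e for entries in self._entries.values() for e in entries]


dtt = DirtyTupleTable()
dtt.record(tx_id=1, table_id=7, tuple_location=40)
dtt.record(tx_id=2, table_id=7, tuple_location=41)
finished = dtt.finish(1)
print(finished, dtt.dirty_tuples())
```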

slide-21
SLIDE 21

Diagram labels: Operation, Finish TX, changes DRAM, Tuple changes, DTT

  • Cp: commit timestamp of the latest committed transaction
  • Cd: commit timestamp that will not be assigned to any transaction before the next group commit finishes
  • Group commit: flushes a batch of log records in a single write to durable storage
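One way to read the Cp/Cd definitions is as a timestamp reservation made at each group commit. The sketch below is an interpretation under stated assumptions, not the presented system's code: `WriteBehindLogger`, its `reserve` parameter, and the Python list standing in for the durable log are all hypothetical, and the flush of dirty tuples is only marked by a comment.

```python
class WriteBehindLogger:
    def __init__(self, reserve: int = 100) -> None:
        self.reserve = reserve     # assumed size of the timestamp window
        self.durable_log = []      # stands in for the log on durable storage
        self.pending = []          # commit timestamps since the last group commit
        self.cp = 0
        self.cd = 0
        self.next_ts = 1
        self.group_commit()        # log an initial (Cp, Cd) pair

    def commit(self) -> int:
        """Assign a commit timestamp that stays below the logged Cd."""
        if self.next_ts >= self.cd:
            self.group_commit()
        ts = self.next_ts
        self.next_ts += 1
        self.pending.append(ts)
        return ts

    def group_commit(self) -> tuple:
        # Dirty tuples of the pending transactions would be flushed here.
        if self.pending:
            self.cp = max(self.pending)
        # Cd: no timestamp at or above it is handed out before the next group commit.
        self.cd = self.next_ts + self.reserve
        self.pending.clear()
        self.durable_log.append((self.cp, self.cd))  # one write per batch
        return self.cp, self.cd


logger = WriteBehindLogger(reserve=100)
stamps = [logger.commit() for _ in range(3)]
pair = logger.group_commit()
print(stamps, pair, logger.durable_log)
```

The whole batch of three commits costs a single log write, which is the point of group commit.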

slide-22
SLIDE 22

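Figure: WBL during a transaction. The DTT holds dirty tuples (dt1, dt2); the log holds Cp, then (Cp, Cd), and also accounts for long-running transactions. Diagram labels: Database, Log, steps 1-3, DRAM, NVM, Meta Data, Data, Checkpoints.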

slide-23
SLIDE 23

Figure: Dirty ranges over time across group commits and garbage collection (steps 1-5). The set of dirty ranges shown goes through { }, {(101,199)}, {(101,199)}, { }, and {(101,199) (301,399)}.
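The dirty ranges in this figure can be read as commit-timestamp gaps whose effects must be ignored until garbage collection clears them. The sketch below assumes each range is an exclusive (Cp, Cd) gap; `is_visible` and `garbage_collect` are hypothetical helper names.

```python
def is_visible(commit_ts: int, dirty_ranges: set) -> bool:
    """A tuple version is visible unless its timestamp falls inside a dirty range."""
    return not any(cp < commit_ts < cd for cp, cd in dirty_ranges)


def garbage_collect(dirty_ranges: set, cleaned: tuple) -> set:
    """Drop a range once the tuples written inside it have been cleaned up."""
    return dirty_ranges - {cleaned}


dirty_ranges = {(101, 199), (301, 399)}
print(is_visible(100, dirty_ranges))   # before the first gap
print(is_visible(150, dirty_ranges))   # inside (101, 199)
print(is_visible(250, dirty_ranges))   # between the two gaps

after_gc = garbage_collect(dirty_ranges, (101, 199))
print(after_gc, is_visible(150, after_gc))
```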

slide-24
SLIDE 24
slide-25
SLIDE 25
slide-26
SLIDE 26

  • Intel PMEP hardware emulator
  • 128 GB DRAM
  • 128 GB NVM emulated from DRAM
  • 3 TB Seagate Barracuda HDD
  • 400 GB Intel DC S3700 SSD

slide-27
SLIDE 27

Yahoo’s YCSB: 1 table with 2 million tuples (2 GB)

  • Read-heavy: 90% reads, 10% updates
  • Balanced: 50% reads, 50% updates
  • Write-heavy: 10% reads, 90% updates

TPC-C: 5 Tx types, 88% reads, 12% updates, 100k tuples (1 GB)
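The three YCSB mixes differ only in their update fraction, which a small operation generator makes concrete. This is an illustrative sketch, not the YCSB client: `generate_ops` is a hypothetical helper, and keys are drawn uniformly, which is an assumption the slide does not state.

```python
import random

# Update fraction for each mix listed on the slide.
WORKLOADS = {"read-heavy": 0.10, "balanced": 0.50, "write-heavy": 0.90}


def generate_ops(update_fraction: float, n: int,
                 num_tuples: int = 2_000_000, seed: int = 0) -> list:
    """Return n (kind, key) pairs with the given share of updates."""
    rng = random.Random(seed)
    ops = []
    for _ in range(n):
        kind = "update" if rng.random() < update_fraction else "read"
        ops.append((kind, rng.randrange(num_tuples)))
    return ops


ops = generate_ops(WORKLOADS["balanced"], n=1000)
updates = sum(1 for kind, _ in ops if kind == "update")
print(f"{updates} updates out of {len(ops)} operations")
```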

slide-28
SLIDE 28

Figure: Throughput (Tx/sec, log scale with ticks at 1, 100, 10000) of WAL and WBL on HDD, SSD, and NVM.

slide-29
SLIDE 29

Figure: Recovery time (log scale from 0.1 to 1000) of WAL and WBL on HDD, SSD, and NVM.

slide-30
SLIDE 30