Transaction Logging Unleashed with NVRAM, by Tianzheng Wang and Ryan Johnson



SLIDE 1

Transaction Logging Unleashed with NVRAM

Tianzheng Wang, Ryan Johnson

SLIDE 2

No, I’m not talking about the syslog

SLIDE 3

Write-ahead logging

  • Used by most transactional systems
    • Databases, file systems…
  • Reliability
    • Everything goes to the log first, then the real place
    • Replay winners, rollback losers
  • Performance
    • Buffer log records in DRAM
    • Disk/storage friendly long, sequential writes

Figure: an update (Bal=500) is applied in two steps: 1. Log it. 2. Really change it (the data).

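
The two-step rule on this slide (log it, then really change it) and the recovery rule (replay winners, roll back losers) can be sketched as follows. This is a minimal in-memory illustration, not the design of any particular system: `TinyWAL`, its record layout and the redo-only recovery are assumptions made for the example.

```python
class TinyWAL:
    """Toy write-ahead log: hypothetical, in-memory, redo-only."""

    def __init__(self):
        self.log = []    # stands in for the durable log
        self.data = {}   # stands in for the "real place"

    def update(self, txn, key, value):
        self.log.append(("update", txn, key, value))  # 1. log it
        self.data[key] = value                        # 2. really change it

    def commit(self, txn):
        self.log.append(("commit", txn))

    def recover(self):
        """Rebuild the data from the log alone: replay winners, skip losers."""
        winners = {rec[1] for rec in self.log if rec[0] == "commit"}
        recovered = {}
        for rec in self.log:
            if rec[0] == "update" and rec[1] in winners:
                _, _, key, value = rec
                recovered[key] = value
        return recovered


wal = TinyWAL()
wal.update("T1", "Bal", 500)
wal.commit("T1")
wal.update("T2", "Bal", 900)   # T2 never commits: a loser
print(wal.recover())           # {'Bal': 500}
```

Because recovery here starts from an empty state, losers are "rolled back" simply by never replaying them. A real system recovering in place would also need undo information.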
SLIDE 4

Write-ahead logging

All was good until we had massively parallel hardware

SLIDE 5

Centralized log: a serious bottleneck

Figure: transaction threads insert into a single log buffer in DRAM, which is flushed to storage on commit.

Figure: CPU cycle breakdown (log work, log contention, other), reality vs. ideal; log contention accounts for 46%.

Why not distribute the log?

SLIDE 6

Sure!

But we need the help of byte-addressable, non-volatile memory (NVRAM).

SLIDE 7

The (impractical) distributed log

  • Log space partitioning
    • By page or by transaction (xct)?
    • Impacts locality and recovery
  • Dependency tracking
    • Direct xct deps: T4 → T2
    • Direct page deps: T4 → T3
    • Transitive deps: T4 → {T3, T2} → T1
    • Easily end up flushing all logs
  • Storage is slow
    • System becomes I/O bound
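
To see why dependency tracking easily ends up flushing all logs, the sketch below computes the transitive dependencies of a committing transaction and the set of logs that would have to be flushed first. The edges out of T4 follow the slide. The edges into T1 are one assumed reading of the transitive example, and the one-log-per-transaction assignment (`log_of`) is hypothetical.

```python
def transitive_deps(txn, direct):
    """Return every transaction that txn depends on, directly or not."""
    seen, stack = set(), [txn]
    while stack:
        for dep in direct.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# Direct dependencies; the edges into T1 are an assumed reading of the slide.
direct = {"T4": ["T2", "T3"], "T2": ["T1"], "T3": ["T1"]}
# Hypothetical by-transaction partitioning: one log per transaction.
log_of = {"T1": "Log 1", "T2": "Log 2", "T3": "Log 3", "T4": "Log 4"}

deps = transitive_deps("T4", direct)
to_flush = sorted({log_of[t] for t in deps | {"T4"}})
print(sorted(deps))   # ['T1', 'T2', 'T3']
print(to_flush)       # ['Log 1', 'Log 2', 'Log 3', 'Log 4']
```

In this example, committing T4 alone already touches every log, which is the situation the slide warns about.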

Figure: log records a to h of transactions T1 to T4 placed in four logs (Log 1 to Log 4) under two alternative partitionings.

SLIDE 8

The (impractical) distributed log

* R. Johnson et al., “Aether: a scalable approach to logging”, PVLDB 2010

SLIDE 9

The (impractical) distributed log

Heavy dep. tracking + slow I/O = showstoppers

SLIDE 10

NVRAM to the rescue

  • NVRAM as log buffers for distributed logging
    • Log records are durable once written
    • No dependency tracking or flush-before-commit

Heavy dep. tracking + slow I/O = (SOLVED)

SLIDE 11

System architecture

Before: Log buffer (DRAM)
  • Contend on a single log buffer
  • Flush on commit or timeout

After: Log buffers (NVRAM)
  • Less or no contention
  • Flush when buffers are full or timeout
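
The "after" side of this comparison can be sketched as one buffer per partition that is flushed only when it fills up. This is a toy model under stated assumptions: `PartitionedLog` and its fields are hypothetical names, plain Python lists stand in for NVRAM buffers and log files, and the timeout-based flush is left out.

```python
class PartitionedLog:
    """Toy model: one log buffer per partition instead of one shared buffer."""

    def __init__(self, partitions, capacity):
        self.capacity = capacity
        self.buffers = [[] for _ in range(partitions)]  # stand-ins for NVRAM buffers
        self.storage = [[] for _ in range(partitions)]  # stand-ins for log files

    def insert(self, partition, record):
        buf = self.buffers[partition]
        buf.append(record)             # treated as durable once written
        if len(buf) >= self.capacity:  # flush when the buffer is full
            self.flush(partition)

    def flush(self, partition):
        self.storage[partition].extend(self.buffers[partition])
        self.buffers[partition].clear()


log = PartitionedLog(partitions=2, capacity=2)
log.insert(0, "a")
log.insert(1, "b")
log.insert(0, "c")     # partition 0 reaches capacity and is flushed
print(log.storage)     # [['a', 'c'], []]
print(log.buffers)     # [[], ['b']]
```

Threads writing to different partitions never touch the same buffer, which is where the reduced contention comes from.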

SLIDE 12

Challenges

  • NUMA effects
  • Durability: processor cache is volatile
  • Database system implications
    • Ordering
    • Uniqueness of log records
    • Recovery
    • Checkpointing

SLIDE 13

Problem #1: NUMA effects

  • Partition-by-page => easier/simpler recovery
  • Threads prefer to access local NVM node

Figure: pages P1 and P2 accessed from NUMA node 1 and NUMA node 2, with logs partitioned at two levels.
  • Transaction level: NUMA-friendly
  • Page level: crosses NUMA boundaries

Prefer to partition by xct

SLIDE 14

Problem #2: LSN gives partial order

  • Log sequence numbers (LSNs) are only meaningful within a single log
  • Recovery needs a total order within each log, xct and page

Figure: transaction threads write to separate log buffers, each numbering its records 1, 2, …, while modifying the same page. Same LSNs, who goes first?

Recovery manager: smaller ≠ earlier!

By-xct d-log needs global ordering of log records

SLIDE 15

Solution #2: global sequence number

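Figure: worked example of transaction (Tx), page (Pg) and log-buffer GSNs advancing as transactions modify pages and insert log records.

How? Bump GSNs when the transaction latches pages and inserts log records

  • Based on Lamport’s clock, no extra contention

GSN update rules ("/" = not updated):

Event      Page GSN                     Transaction GSN              Log GSN
EX-latch   max(pg’s, tx’s) + 1          max(pg’s, tx’s) + 1          /
SH-latch   /                            max(pg’s, tx’s)              /
Log ins.   max(pg’s, tx’s, log’s) + 1   max(pg’s, tx’s, log’s) + 1   max(pg’s, tx’s, log’s) + 1

GSN gives a partial order globally and a total order within each page, tx and log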
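
The GSN rules on this slide can be sketched as three small functions. This is an illustration only: it assumes the rule table is read as EX-latch updating both page and transaction, SH-latch updating only the transaction, and log insertion updating all three. `Clocked` and the function names are hypothetical, and real latching and concurrency control are not modelled.

```python
class Clocked:
    """Anything that carries a GSN: a page, a transaction or a log."""

    def __init__(self, gsn=0):
        self.gsn = gsn


def latch_exclusive(tx, page):
    # EX-latch: page and transaction both move past the larger GSN.
    tx.gsn = page.gsn = max(page.gsn, tx.gsn) + 1


def latch_shared(tx, page):
    # SH-latch: only the reading transaction catches up; the page is unchanged.
    tx.gsn = max(page.gsn, tx.gsn)


def insert_log_record(tx, page, log):
    # Log insert: page, transaction and log all take max(...) + 1.
    gsn = max(page.gsn, tx.gsn, log.gsn) + 1
    tx.gsn = page.gsn = log.gsn = gsn
    return gsn


# Two transactions, each with its own log, modify the same page in turn.
t1, t2, page = Clocked(), Clocked(), Clocked()
log1, log2 = Clocked(), Clocked()

latch_exclusive(t1, page)
g1 = insert_log_record(t1, page, log1)
latch_exclusive(t2, page)
g2 = insert_log_record(t2, page, log2)
print(g1, g2)   # 2 4
```

Although the two records sit in different logs, the record written second carries the larger GSN, so recovery can order the two updates to the page.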

SLIDE 16

Problem #3: Volatile CPU caches

  • Log records must leave CPU cache before commit, preferably without dependency-tracking
  • The ultimate solution: durable processor cache
    • Candidates: FeRAM, SRAM + Supercapacitor…
      • Kiln [MICRO-46]
      • Whole system persistence [ASPLOS ’12]
      • Rohm nonvolatile CPU

But not available on the market

SLIDE 17

Problem #3: Volatile CPU caches

  • Log records must leave CPU cache before commit, preferably without dependency-tracking
  • Stop-gap solution: passive group commit

On commit:
  1. Flush local caches
  2. Update local dGSN
  3. Enqueue transaction

Passive group commit daemon:
  • Get min dGSN (8 in the example)
  • Dequeue xct with dGSN <= 8

Commit queue:

TXN     dGSN
Xct 1   5
Xct 2   10
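
The commit steps and the daemon's pass can be sketched as below. This is a simplified model under stated assumptions: `PassiveGroupCommit` and its methods are hypothetical names, the cache flush in step 1 is hardware-specific and not modelled, and the per-log dGSN values other than the minimum of 8 are made up for the example.

```python
from collections import deque


class PassiveGroupCommit:
    def __init__(self, dgsn):
        self.dgsn = list(dgsn)   # durable GSN (dGSN) per log
        self.queue = deque()     # (txn, commit GSN) in arrival order

    def on_commit(self, txn, log_id, gsn):
        # 1. Flushing local caches is not modelled here.
        self.dgsn[log_id] = max(self.dgsn[log_id], gsn)  # 2. update local dGSN
        self.queue.append((txn, gsn))                    # 3. enqueue transaction

    def daemon_pass(self):
        """Release transactions whose GSN is covered by every log's dGSN."""
        min_dgsn = min(self.dgsn)
        done = [txn for txn, gsn in self.queue if gsn <= min_dgsn]
        self.queue = deque((t, g) for t, g in self.queue if g > min_dgsn)
        return min_dgsn, done


# Three logs; only the minimum (8) comes from the slide, 12 and 9 are assumed.
pgc = PassiveGroupCommit([8, 12, 9])
pgc.queue.extend([("Xct 1", 5), ("Xct 2", 10)])
print(pgc.daemon_pass())   # (8, ['Xct 1'])
print(list(pgc.queue))     # [('Xct 2', 10)]
```

Xct 2 stays queued until every log's dGSN has reached 10. An idle log would hold the minimum down indefinitely in this sketch, which a real implementation has to handle.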

SLIDE 18

Evaluation

  • Setup
    • 4-socket, 6-core Xeon E7-4807 @ 1.8GHz
    • 24 physical cores, 48 “CPUs” with hyper-threading
    • 64GB DRAM
    • NVM: flash/super-capacitor backed DRAM
  • Workloads
    • Shore-MT, with Aether*
    • TPC-C: online transaction processing
    • TATP: telecom database applications

* R. Johnson et al., “Aether: a scalable approach to logging”, PVLDB 2010

SLIDE 19

TATP – write intensive

  • Distributed vs. centralized logging

SLIDE 20

TATP – write intensive

  • Passive group commit

SLIDE 21

TPC-C – full transaction mix

  • Distributed vs. centralized logging

SLIDE 22

TPC-C – full transaction mix

  • Passive group commit

SLIDE 23

Conclusion

  • Centralized logging is a serious bottleneck
  • NVRAM resurrects d-log to scale databases
  • Practical distributed log today
    • Passive group commit
    • Flash/super-capacitor backed DRAM (NVDIMM)

Find out more in our VLDB paper, “Scalable Logging through Emerging Non-Volatile Memory”: http://www.vldb.org/pvldb/vol7/p865-wang.pdf

Thank you!