
Transaction Logging Unleashed with NVRAM, by Tianzheng Wang and Ryan Johnson (PowerPoint presentation)



  1. Transaction Logging Unleashed with NVRAM. Tianzheng Wang, Ryan Johnson

  2. No, I’m not talking about the syslog

  3. Write-ahead logging  Used by most transactional systems  Databases , file systems…  Reliability  Everything goes to the log first, then the real place  Replay winners, rollback losers Data Data Update Update Bal=500 Bal=500 1. Log it 2. Really change it 1. Log it 2. Really change it  Performance  Buffer log records in DRAM  Disk/storage friendly long, sequential writes 2

  4. Write-ahead logging  Used by most transactional systems  Databases , file systems…  Reliability  Everything goes to the log first, then the real place All was good until we had  Replay winners, rollback losers massively parallel hardware Data Data Update Update Bal=500 Bal=500 1. Log it 2. Really change it 1. Log it 2. Really change it  Performance  Buffer log records in DRAM  Disk/storage friendly long, sequential writes 3

  5. Centralized log: a serious bottleneck
     - Ideal: transaction threads insert records into a single DRAM log buffer, which is flushed to log storage on commit.
     - Reality: 46% of CPU cycles go to log contention; the rest is other work.
     - Why not distribute the log?

  6. Sure! But we need the help of byte-addressable, non-volatile memory (NVRAM).

  7. The (impractical) distributed log  Log space partitioning T1 T2 T3 T4  by page or xct?  Impacts locality and recovery c d e f g h a b  Dependency tracking  Direct xct deps: T4  T2 Log 1 Log 2 Log 3 Log 4 a d c e f g  Direct page deps: T4  T3  Transitive deps: T4  { T3 , T2 }  T1 T1 T2 T3 T4  Easily end up flushing all logs  Storage is slow a e g d f c Log 1 Log 2 Log 3 Log 4  System becomes I/O bound 6

  8. The (impractical) distributed log*
     * R. Johnson et al., “Aether: a scalable approach to logging”, PVLDB 2010

  9. The (impractical) distributed log: heavy dependency tracking + slow I/O = showstoppers*
     * R. Johnson et al., “Aether: a scalable approach to logging”, PVLDB 2010

  10. NVRAM to the rescue  NVRAM as log buffers for distributed logging  Log records durable once written  No dep tracking or flush-before-commit Heavy dep. tracking + slow I/O = ( SOLVED ) 9

  11. System architecture
     - Before: a single log buffer in DRAM. Threads contend on that one buffer, and it is flushed on commit or timeout.
     - After: log buffers in NVRAM. There is little or no contention, and buffers are flushed only when full or on timeout.
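A minimal sketch of the "after" design, assuming a per-thread log buffer that is durable as soon as a record is written (the NVRAM property). `NvramLogBuffer` and `worker` are hypothetical names; Python lists stand in for both the NVRAM buffer and the log device, and the timeout is only checked on append, whereas a real system would presumably also flush from a background thread.

```python
import threading
import time


class NvramLogBuffer:
    """Hypothetical per-thread log buffer, assumed to live in NVRAM.

    A record counts as durable once append() stores it, so committing does
    not wait for storage. Records are moved to the (slow) log device only
    when the buffer is full or the timeout has expired.
    """

    def __init__(self, capacity: int, timeout_s: float) -> None:
        self.capacity = capacity
        self.timeout_s = timeout_s
        self.records: list = []  # stand-in for the NVRAM buffer
        self.storage: list = []  # stand-in for this partition's log device
        self.last_flush = time.monotonic()

    def append(self, record) -> None:
        self.records.append(record)  # durable once written (NVRAM assumption)
        full = len(self.records) >= self.capacity
        timed_out = time.monotonic() - self.last_flush >= self.timeout_s
        if full or timed_out:
            self.flush()

    def flush(self) -> None:
        self.storage.extend(self.records)
        self.records.clear()
        self.last_flush = time.monotonic()


def worker(buf: NvramLogBuffer, txn_ids) -> None:
    for t in txn_ids:
        buf.append(("update", t))
        buf.append(("commit", t))  # commit returns without flushing to storage


if __name__ == "__main__":
    # One buffer per thread: no shared log buffer, so inserts do not contend.
    buffers = [NvramLogBuffer(capacity=4, timeout_s=60.0) for _ in range(2)]
    threads = [threading.Thread(target=worker, args=(b, range(5))) for b in buffers]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    for i, b in enumerate(buffers):
        print(f"buffer {i}: {len(b.storage)} flushed, {len(b.records)} still in NVRAM")
```

Each thread owns its buffer, so no lock is needed on the insert path; the slow device is touched only in batches.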

  12. Challenges  NUMA effects  Durability – processor cache is volatile  Database system implications  Ordering  Uniqueness of log records  Recovery  Checkpointing  … 11

  13. Problem #1: NUMA effects  Partition-by-page => easier/simpler recovery  Threads prefer to access local NVM node Transaction level: Page level: NUMA NUMA NUMA NUMA P1 node 1 node 2 node 1 node 2 P2 P2  NUMA-friendly  Cross NUMA boundary Prefer to partition by xct 12

  14. Problem #2: LSNs give only a partial order
     - A log sequence number (LSN) is only meaningful within one log.
     - Recovery needs a total order within any log, transaction, or page.
     - Example: two transaction threads modify the same page through different log buffers, whose records carry the same LSNs (1, 2, … in each buffer). Which came first? A smaller LSN does not mean earlier!
     - A distributed log partitioned by transaction therefore needs a global ordering of log records.

  15. Solution #2: global sequence number  Based on Lamport’s clock, no extra contention How? Bump GSNs when the Page: transaction latches pages and inserts log records Pg GSN: 1 – 2 – 3 3 – 8 – 9 0 7 GSN: Page Transaction Log Tx GSN: 2 8 EX-latch max(pg’s, tx’s) + 1 / 3 9 SH-latch / max(pg’s, tx’s ) / 2 3 … 8 9 … Log bufs: Log ins. max (pg’s, tx’s, log’s) + 1 GSN gives a partial, global order in each page, tx and log 14

  16. Problem #3: Volatile CPU caches  Log records must leave CPU cache before commit, preferably without dependency-tracking  The ultimate solution: durable processor cache  Candidates: FeRAM, SRAM + Supercapacitor …  Kiln [MICRO-46]  Whole system persistence [ASPLOS ’12]  Rohm nonvolatile CPU But not available on the market 15

  17. Problem #3: Volatile CPU caches  Log records must leave CPU cache before commit, preferably without dependency-tracking  Stop-gap solution: passive group commit Passive group commit daemon Get min dGSN: 8 TXN dGSN dGSN dGSN dGSN Xct 1 5 on commit: 1. Flush local caches Xct 2 10 Dequeue xct with 2. Update local dGSN dGSN <= 8 Commit queue 3. Enqueue transaction 16

  18. Evaluation  Setup  4-socket, 6-core Xeon E7- 4807 @ 1.8GHz  24 physical cores, 48 “CPUs” with hyper threading  64GB DRAM  NVM: flash/super-capacitor backed DRAM  Workloads  Shore-MT, with Aether*  TPC-C: online transaction processing  TATP: telecom database applications * R. Johnson etc., “Aether: a scalable approach to logging”, PVLDB 2010 17

  19. TATP – write intensive  Distributed vs. centralized logging 18

  20. TATP – write intensive  Passive group commit 19

  21. TPC-C – full transaction mix  Distributed vs. centralized logging 20

  22. TPC-C – full transaction mix  Passive group commit 21

  23. Conclusion  Centralized logging is a serious bottleneck  NVRAM resurrects d-log to scale databases  Practical distributed log today  Passive group commit  Flash/super-capacitor backed DRAM (NVDIMM) Find out more in our VLDB paper: Scalable Logging through Emerging Non-Volatile Memory http://www.vldb.org/pvldb/vol7/p865-wang.pdf Thank you! 22
