SLIDE 1

Enabling System Transactions via Lightweight Kernel Extensions

R.P. Spillane, S. Gaikwad, M. Chinni, C.P. Wright, E. Zadok
Stony Brook University
http://www.fsl.cs.sunysb.edu/

SLIDE 2

2/28/2009 FAST 2009 - Enabling System Transactions 2

Summary

• What is the design complexity of system transactions implemented in the VFS?
  • Low
    • 100 lines of code added to page writeback
    • 4000 lines of module code (log implementation)
• What is the performance?
  • Valor: 35% overhead on top of the theoretical best, compared to 104% overhead for an efficient user-level alternative

SLIDE 3

System Transaction

[Figure: Process 1 issues the system calls TID←sys_tbegin(...), write(TID,...), unlink(TID,...), and sys_tabort(TID) against a file tree containing f1 and f2. Labels: "FS State: foo", "FS State: foo’".]
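
The begin/write/unlink/abort sequence in the figure can be imitated in user space. What follows is a minimal sketch, not Valor's implementation: `TinyTxnFS` and its methods are hypothetical names, files are plain dictionary entries, and abort simply replays the saved old values in reverse order.

```python
class TinyTxnFS:
    """Toy in-memory file namespace with begin/abort; every name here is hypothetical."""

    _MISSING = object()  # marks "file did not exist before this operation"

    def __init__(self, files):
        self.files = dict(files)
        self.undo = {}       # tid -> list of (name, old contents or _MISSING)
        self.next_tid = 1

    def tbegin(self):
        tid = self.next_tid
        self.next_tid += 1
        self.undo[tid] = []
        return tid

    def write(self, tid, name, data):
        # Remember the old contents before changing anything.
        self.undo[tid].append((name, self.files.get(name, self._MISSING)))
        self.files[name] = data

    def unlink(self, tid, name):
        self.undo[tid].append((name, self.files[name]))
        del self.files[name]

    def tabort(self, tid):
        # Restore old values, newest change first.
        for name, old in reversed(self.undo.pop(tid)):
            if old is self._MISSING:
                self.files.pop(name, None)
            else:
                self.files[name] = old


fs = TinyTxnFS({"f1": b"foo", "f2": b"bar"})
before = dict(fs.files)
tid = fs.tbegin()
fs.write(tid, "f1", b"foo'")
fs.unlink(tid, "f2")
during = dict(fs.files)
fs.tabort(tid)
after = dict(fs.files)
```

After `tabort`, `after` equals `before`: the write and the unlink are both rolled back as a unit.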

SLIDE 4

The Design Spectrum

[Figure: a spectrum between "Design Feasibility" and "Transparency & Performance". Systems shown: Quicksilver, TxF; Berkeley DB, Stasis; Valor, Amino, KBDB.]

• Valor side-steps the traditional trade-off by working with the kernel’s page cache in a general way.

SLIDE 5

Valor’s Process Txn Model

• Transactional Model
  • Supported Operations:
    • dirtying a page
    • appending to a file, modifying an inode
    • modifying a directory
  • Locking:
    • directory locks, inode locks
    • page range locks for overwrites
    • intent locks for directory renames
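
The page range locks listed above can be sketched as an overlap check. This is an illustration only: the slides do not describe Valor's lock structures, so `PageRangeLocks`, its methods, and the choice of purely exclusive locks are all assumptions.

```python
class PageRangeLocks:
    """Toy exclusive lock table over inclusive page ranges (hypothetical API)."""

    def __init__(self):
        self.held = []  # entries are (tid, first_page, last_page)

    def try_lock(self, tid, first_page, last_page):
        """Grant the range unless another transaction holds an overlapping one."""
        for owner, lo, hi in self.held:
            overlaps = first_page <= hi and lo <= last_page
            if overlaps and owner != tid:
                return False
        self.held.append((tid, first_page, last_page))
        return True

    def release_all(self, tid):
        """Drop every range held by tid, as would happen at commit or abort."""
        self.held = [entry for entry in self.held if entry[0] != tid]


locks = PageRangeLocks()
granted_t1 = locks.try_lock(1, 0, 15)     # T1 overwrites pages 0..15
conflict_t2 = locks.try_lock(2, 8, 23)    # overlaps T1 on pages 8..15
disjoint_t2 = locks.try_lock(2, 16, 31)   # does not overlap T1
locks.release_all(1)
retry_t2 = locks.try_lock(2, 8, 15)       # T1 no longer holds anything
```

Two transactions overwriting disjoint ranges of one file proceed independently; only overlapping ranges conflict.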

SLIDE 6

Asynchronous By Default

• ACI by default (no durability without tsync)
• Similar to asynchronous write(2) with fsync(2)
• Same purpose (performance increase)
• Requires page cache for files updated transactionally
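
The write(2)/fsync(2) analogy can be shown with standard calls. The sketch below uses only Python's `os` wrappers around those two system calls. It does not call tsync, whose interface the slides do not show, and the temporary file is just a stand-in.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    # os.write returns once the bytes are handed to the kernel
    # (normally the page cache); nothing is guaranteed durable yet.
    written = os.write(fd, b"page contents")

    # os.fsync asks the kernel to flush this file's dirty pages
    # to the storage device before returning.
    os.fsync(fd)

    size_after_sync = os.stat(path).st_size
finally:
    os.close(fd)
    os.unlink(path)
```

The fast path is the `os.write` call alone; durability is paid for only at the explicit sync point, which is the same trade the slide describes for transactions.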

SLIDE 7

Valor Design

• Modify page writeback to support simple write ordering
• Implement an ARIES-style undo/redo log module for FS operations

SLIDE 8

Page Dirtying: No Txns

[Figure: Process 1 issues two write(TID,…) calls with no logging. Legend: Old Page, New Page. Annotations: "OK", "bad", "Uh-oh…".]

SLIDE 9

Page Dirtying: With Txns

[Figure: Process 1 calls log_append(TID,…) before each write(TID,…). Legend: Old Page, New Page, U/R Page.]
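
The log-before-write pattern in the figure can be sketched as follows. This is an assumption-laden illustration: the `URRecord` layout, the in-memory `log` and `pages` structures, and the `txn_write` helper are hypothetical; only the name `log_append` and the idea of an undo/redo (U/R) record come from the slides.

```python
from dataclasses import dataclass


@dataclass
class URRecord:
    lsn: int       # position in the log
    tid: int       # owning transaction
    page_no: int
    undo: bytes    # page contents before the write
    redo: bytes    # page contents after the write


log = []
pages = {0: b"old"}


def log_append(tid, page_no, new_data):
    """Record both the old and the new page image before the page changes."""
    rec = URRecord(len(log), tid, page_no, pages[page_no], new_data)
    log.append(rec)
    return rec


def txn_write(tid, page_no, new_data):
    """Dirty a page only after its U/R record has been appended."""
    log_append(tid, page_no, new_data)
    pages[page_no] = new_data


txn_write(1, 0, b"new")
```

Because the record holds both images, the same log entry can roll the page back (undo) or bring it forward (redo).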

SLIDE 10

Current Kernel Design

[Figure: Process 2 issuing write(TID,…) and log_append(TID,…), followed by page writeback from the Page Cache to Ext3, Ext2, XFS, and ZFS. Legend: Old Page, New Page, U/R Page. Annotation: "Uh-oh…".]

SLIDE 11

What DBs Do

[Figure: layers labeled "Page Cache II: The Wrath of Khan (fsync)", "Page Cache", the file systems Ext2, XFS, and ZFS, and "Disk Cache Flush".]

SLIDE 12

Simple Write Ordering

[Figure: diagram labels are Valor, Page Cache, and file systems FS1, FS2, FS3, and FS4. Legend: Old Page, New Page, U/R Page.]
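
One way to picture simple write ordering is a writeback pass that always flushes log (U/R) pages before data pages. The slides do not show how Valor's roughly 100-line writeback change works, so this sketch captures only the invariant; the `writeback` function and the tuple layout are hypothetical.

```python
def writeback(dirty_pages, disk):
    """Flush log pages before data pages.

    dirty_pages is a list of (kind, key, data) tuples where kind is
    "log" or "data"; disk is a list standing in for the device.
    """
    # sorted() is stable, so pages of the same kind keep their relative order.
    ordered = sorted(dirty_pages, key=lambda page: 0 if page[0] == "log" else 1)
    for page in ordered:
        disk.append(page)
    dirty_pages.clear()


dirty = [
    ("data", 7, b"new contents of page 7"),
    ("log", 0, b"U/R record for page 7"),
    ("data", 9, b"new contents of page 9"),
]
disk = []
writeback(dirty, disk)
flush_order = [kind for kind, _, _ in disk]
```

With this ordering, a data page never reaches the disk ahead of the record needed to undo it.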

SLIDE 13

Log Module

[Figure: the Valor module and its record maps, a Log File holding U/R Pages 1 through 5, and a State File with six records (U/R,1 five times, then C,1), all stored on Disk. Process 2 timeline:
  1. tbegin(TID,…)
  2. tlog(TID,…)
  3. write(TID,…)
  4. page writeback
  5. tlog(TID,…)
  6. write(TID,…)
  7. tresolve(TID,…)
  8. page writeback
  9. page writeback]

SLIDE 14

Atomicity Argument

• The transition from pre-writeback to post-writeback disk state is atomic iff:
  • All writes are preceded by sys_log_append
  • Simple write ordering is implemented
  • Writes to a single sector are atomic
• Valor satisfies the first two constraints
• A supported hard disk satisfies the third

SLIDE 15

Performing Recovery

• Two kinds of recovery are supported:
  • System Recovery
  • Application Recovery (per-process abort)
• Standard recovery process:
  • Reconstruct RAM state from log
  • In reverse LSN order, commit/abort landed transactions
  • Perform a page writeback
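
The recovery steps above follow the general ARIES pattern, which can be sketched as: find the committed transactions, repeat history in LSN order, then undo uncommitted work in reverse LSN order. The record layout and the `recover` function are hypothetical, and how each step maps onto Valor's actual code is an assumption.

```python
def recover(log, disk):
    """Toy ARIES-style recovery over a list of log records.

    Each record is a dict with "type" set to "U/R" (undo/redo) or "C" (commit).
    disk maps page numbers to contents and is updated in place.
    """
    # Step 1: reconstruct in-memory state from the log.
    committed = {rec["tid"] for rec in log if rec["type"] == "C"}
    updates = [rec for rec in log if rec["type"] == "U/R"]

    # Step 2: repeat history in LSN order so every logged write is present.
    for rec in updates:
        disk[rec["page"]] = rec["redo"]

    # Step 3: roll back uncommitted transactions in reverse LSN order.
    for rec in reversed(updates):
        if rec["tid"] not in committed:
            disk[rec["page"]] = rec["undo"]

    return committed


log = [
    {"type": "U/R", "tid": 1, "page": 0, "undo": b"a", "redo": b"b"},
    {"type": "U/R", "tid": 2, "page": 1, "undo": b"c", "redo": b"d"},
    {"type": "C", "tid": 1},
]
# Crash state: T1's page was never written back, T2's page was.
disk = {0: b"a", 1: b"d"}
committed = recover(log, disk)
```

After recovery the committed transaction's write is on disk and the uncommitted one is gone, regardless of which pages happened to be written back before the crash.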

SLIDE 16

Evaluation

• We must compare against traditional asynchronous FSes
  • benchmark against asynchronous ext3
  • do serial transfer benchmarks for large files
• We turn off synchronous transactions for two other controls (for fairness)
  • FS built on top of Stasis
  • FS built on top of Berkeley DB

SLIDE 17

Mock ARIES Benchmark

• Important lower bound (not tight)

[Figure: three configurations, each with a Log and a Disk: MT-ow-noread, MT-ow, MT-ow-finite.]

SLIDE 18

Mock ARIES Benchmark

[Figure: bar chart of Elapsed Time (sec), split into Wait, User, and System time. Annotations: 2x, 2%, 16%, 104%, 66%, 35%.]

SLIDE 19

Serial Overwrite

[Figure: Elapsed Time (sec) versus Size of Serial Transfer (MiB; 256, 512, 1024, 2048) for BDB, Stasis, Valor, and Ext3. Annotations: 2.75 x Ext3, 5.0 x Ext3, 22.75 x Ext3. Transaction size: 16 pages.]

SLIDE 20

Transaction Throughput

[Figure: Elapsed Time (sec) versus Size of Transaction (pages; 1, 4, 16, 64, 256) for BDB, Stasis, Valor, and Ext3. Annotations: Stasis Heel, Valor Heel, BDB Heel; 2.9 x Ext3, 4.2 x Ext3, 23.0 x Ext3.]

SLIDE 21

Conclusions

• System transactions are feasible
• Valor achieves low overhead
• Minimal changes to existing kernels

SLIDE 22

Limitations/Future Work

• Limitations
  • Locking slows interleaved writes to the same page
  • Some FSes/disks do not fsync() when asked to
• Future Work
  • Explore use of logging device as a coordinator in a transactional disk array

SLIDE 23

Q&A

SLIDE 24

TxF

• TxF is Microsoft’s transactional file system
  • Motivation: program installation, system updates, website updates
• Pros
  • Backed by Microsoft
• Cons
  • Specific to NTFS

SLIDE 25

Isolation

• Extended mandatory locking
  • Allows locking of directories
  • Do not have to set group exec/setgid bits
• Locking permissions
  • Let users decide if a file can be locked
• All processes acquire locks
  • Regular processes hold only for the syscall
• Lock inheritance
  • Allow multi-process transactions

SLIDE 26

Valor != Journaling

• Journaling FSes are good at fast recovery
• …but are too special-purpose:
  • No-Steal Caching
    • all state modified by a txn. must remain in memory until commit/abort
  • Non-Modular Design
    • does not handle rollback of VFS and page caches, just disk-state on boot