Hope for the Best, Expect the Worst, or what happens when E[ f(good event) ] > E[ f(bad event) ]



SLIDE 1

Hope for the Best, Expect the Worst

or what happens when E[ f(good event) ] > E[ f(bad event) ]

Lukas Kroc
October 12, 2006

SLIDE 2

Outline

  • Overview of file systems
  • The basic idea: speculation
  • Applying the idea to file systems:
      – local file systems
      – distributed file systems
  • Implementation issues
      – performance results
  • Conclusion

SLIDE 3

File Systems: What They Are

  • allow and organize access to data
      – operations: create, write, read, delete
  • physical scenarios:
      – local file systems
      – distributed file systems
  • goal:
      – provide durability and performance given physical limitations (latency, bandwidth)
  • consistency added for distributed systems

SLIDE 4

File Systems: How They Work

stolen from Paul Francis' CS414 lecture notes

SLIDE 5

File Systems: How They Work

stolen from Paul Francis' CS414 lecture notes

SLIDE 6

Main Issues

  • a trade-off between durability and performance
      – durability calls for immediate access to the medium
          • synchronous access
      – performance calls for caching
          • asynchronous access
  • file system speedups:
      – local: use memory cache and disk buffer to delay access
      – distributed: cache fetched files on clients
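
The trade-off above can be made concrete with a minimal Python sketch. It is an illustration only, not material from the talk: the helper names `write_async`, `write_sync`, and `demo` are hypothetical. `os.fsync` asks the operating system to flush the file's data to the storage device before returning, whereas a plain write may return while the data still sits in a memory cache.

```python
import os
import tempfile

def write_async(path, data):
    """Fast path: returns once the data is handed to the OS cache."""
    with open(path, "wb") as f:
        f.write(data)          # may still be only in memory after return

def write_sync(path, data):
    """Durable path: returns only after the OS reports the data flushed."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # drain Python's user-space buffer
        os.fsync(f.fileno())   # ask the kernel to flush to the device

def demo():
    """Write the same payload both ways and report the resulting file sizes."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "async.bin")
        s = os.path.join(d, "sync.bin")
        write_async(a, b"x" * 4096)
        write_sync(s, b"x" * 4096)
        return os.path.getsize(a), os.path.getsize(s)
```

Both calls leave the same bytes in the file; they differ only in when they return relative to the data reaching the medium, which is exactly the durability versus performance choice.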

SLIDE 7

Papers for Discussion

  • Nightingale et al.: Speculative Execution in a Distributed File System (SOSP'05)
      – a new way of dealing with the issues of distributed file systems
  • Nightingale et al.: Rethink the Sync (OSDI'06)
      – applies the ideas from above to the issues of local file systems
  • same basic idea, different scenarios
      – will reverse the order of presentation, easier first

SLIDE 8

Basic Idea

“Expect the best, be prepared for the worst”

  • best = no power failure, cached data is valid
  • worst = power fails, cached data is invalid
  • prepared = able to recover a consistent state after a bad event has happened
  • expect = speculate that the best case will happen

SLIDE 9

Conditions for the Basic Idea to Work

  • highly predictable results of speculations
      – a crash will most likely not occur in the next 5 seconds
      – data in the cache is most likely valid
  • computers have spare CPU cycles
      – to perform “free” speculative computation
  • local overhead of speculation is lower than the cost of remote I/O
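
These conditions amount to an expected-value comparison, in the spirit of the talk's title. The sketch below is a simple cost model under stated assumptions, not a formula from the papers: `p_success` is the probability that the speculation turns out right, `t_spec` is the local overhead of speculating, `t_rollback` is the cost of undoing a wrong guess, and `t_sync` is the time a blocking operation would take. All names and the model itself are hypothetical.

```python
def expected_speculative_cost(p_success, t_spec, t_rollback, t_sync):
    """Expected time when speculating.

    The local overhead is always paid; with probability (1 - p_success)
    the guess is wrong, so we also pay for the rollback and then for the
    blocking operation we tried to avoid.
    """
    return t_spec + (1.0 - p_success) * (t_rollback + t_sync)

def speculation_pays_off(p_success, t_spec, t_rollback, t_sync):
    """True when speculating is cheaper, on average, than always blocking."""
    return expected_speculative_cost(p_success, t_spec, t_rollback, t_sync) < t_sync
```

With arbitrary illustrative units, `speculation_pays_off(0.99, 0.1, 1.0, 10.0)` holds because outcomes are highly predictable and the local overhead is small, while `speculation_pays_off(0.5, 6.0, 1.0, 10.0)` does not.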

SLIDE 10

Outline

  • Overview of file systems
  • The basic idea: speculation
  • Applying the idea to file systems:
      – local file systems
      – distributed file systems
  • Implementation issues
      – performance results
  • Conclusion

SLIDE 11

Local File Systems: Traditional Approach (ext3)

  • i-node based
  • added journaling for increased durability
      – meta-data only, for performance reasons
  • 2 modes of operation:
      – synchronous: system call returns only after the operation is done
      – asynchronous: system call returns immediately

stolen from Paul Francis' CS414 lecture notes

SLIDE 12

Problems of Traditional Approach

  • synchronous mode:
      – durable (but only if using write barriers, or with the disk buffer disabled), but very slow
  • asynchronous mode:
      – not durable, but fast

SLIDE 13

Local File Systems: New Approach

  • shift of paradigm: don't promise anything to the application, promise it to the user
      – the promise = synchronous guarantees
      – the user = any external entity observing the process
      ⇒ external synchrony
      – asynchronous internal workings, synchronous external guarantees
      – combines performance and durability benefits of both modes

SLIDE 14

External Synchrony

  • Idea:
      – speculate that everything will be properly written to disk
  • Overview:
      – immediately return from the write call (asynchrony)
      – buffer all external output of the application until the write successfully happens
      – if the write fails, discard the buffers
  • Result:
      – better guarantees AND performance than ext3
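
The three overview steps can be sketched as a toy model. This is a user-level illustration under assumptions, not the kernel mechanism from the paper: the class `ExternallySynchronousFile` and its methods are hypothetical, "disk" is a Python list, and the `disk_ok` flag stands in for whether the write actually reached the medium.

```python
class ExternallySynchronousFile:
    """Toy model: writes return at once; external output waits for the commit."""

    def __init__(self):
        self.durable = []       # data that has reached "disk"
        self.pending = []       # writes accepted but not yet durable
        self.held_output = []   # external output buffered in the meantime
        self.visible = []       # output an outside observer has seen

    def write(self, data):
        # Return immediately (asynchrony); durability comes later.
        self.pending.append(data)

    def emit(self, message):
        # Output that may depend on uncommitted writes is held back.
        if self.pending:
            self.held_output.append(message)
        else:
            self.visible.append(message)

    def commit(self, disk_ok=True):
        # On success the writes become durable and the output is released.
        if disk_ok:
            self.durable.extend(self.pending)
            self.visible.extend(self.held_output)
        # On failure the buffered output is discarded and never observed.
        self.pending.clear()
        self.held_output.clear()
        return disk_ok
```

An observer therefore never sees a message such as "saved" unless the data it refers to is durable, even though `write` itself never blocks.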

SLIDE 15

External Synchrony: Schema

SLIDE 16

External Synchrony: Performance

SLIDE 17

Distributed File Systems: Traditional Approach (NFS)

  • client-server approach
  • synchronous I/O operations required for coherence
      – using RPC
  • offers close-to-open consistency
      – weaker than local file systems

SLIDE 18

Problems of Traditional Approach

  • at least 2 round-trip times required per close
      – very slow
  • close-to-open consistency isn't very good
      – for how slow it is

SLIDE 19

Distributed File Systems: New Approach

  • Idea:
      – speculate that close is successful, that cached data is valid, ...
  • Overview:
      – use asynchronous RPCs, immediately returning
      – checkpoint the application (store its state) and buffer all subsequent output
      – on success: release the buffered output; on failure: roll back
  • Result:
      – better guarantee AND performance than NFS
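
The checkpoint, buffer, and roll-back cycle can be sketched as follows. This is a simplified single-process illustration, not the papers' implementation: `run_speculatively`, `step`, and `fetch_authoritative` are hypothetical names, `copy.deepcopy` stands in for checkpointing the application, and the server check is called sequentially here, whereas a real system would have the RPC in flight while the application keeps running.

```python
import copy

def run_speculatively(state, step, cached_value, fetch_authoritative):
    """Run `step` on a cached value; roll back if the cache was stale.

    state: mutable dict standing in for the application state
    step(state, value, out): mutates state, appends external output to `out`
    fetch_authoritative(): stands in for the slow RPC to the server
    Returns (final state, released output, whether the speculation held).
    """
    checkpoint = copy.deepcopy(state)     # store the state before speculating
    buffered = []                         # external output is held back
    step(state, cached_value, buffered)   # proceed without waiting

    actual = fetch_authoritative()        # the reply eventually arrives
    if actual == cached_value:
        return state, buffered, True      # success: release buffered output

    # Failure: drop speculative state and output, redo with the real value.
    state = checkpoint
    output = []
    step(state, actual, output)
    return state, output, False
```

When the cached value is valid, the caller gets its result without ever waiting on the server; when it is stale, the only cost is the discarded work plus the re-execution.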

SLIDE 20

Speculative NFS: Schema

SLIDE 21

Speculative NFS: Performance

SLIDE 22

Overview of the Technique

Speculate on...
    power failure not occurring, cache being valid
...by means of...
    buffering externalized output, checkpointing the process
...in order to...
    improve performance, increase consistency

SLIDE 23

Outline

  • Overview of file systems
  • The basic idea: speculation
  • Applying the idea to file systems:
      – local file systems
      – distributed file systems
  • Implementation issues
      – performance results
  • Conclusion

SLIDE 24

Implementation: Buffering Externalized Output

  • any kernel object with commit dependencies is uncommitted
      – any process that accesses an uncommitted object is marked uncommitted, and vice versa
      – any external output of such a process is buffered by the kernel
      – logs are used to track dependencies
  • once commit dependencies are removed, the buffers are output to external devices
      – also allows commits to be grouped
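
The dependency tracking described above can be sketched as a small bookkeeping structure. This is a toy model under assumptions, not the kernel data structures from the paper: `CommitTracker` and its methods are hypothetical, processes and objects are plain string names, and a "transaction" is just an integer id. For simplicity, output is released as soon as its own dependencies clear, without enforcing ordering against output that is still buffered.

```python
from collections import defaultdict

class CommitTracker:
    """Tracks which processes and objects depend on uncommitted transactions."""

    def __init__(self):
        self.deps = defaultdict(set)  # name -> ids of uncommitted transactions
        self.buffered = []            # [(remaining deps, message)]
        self.released = []            # output that reached the outside world

    def modify(self, proc, obj, txn):
        """proc changes obj as part of the not-yet-committed transaction txn."""
        self.deps[proc].add(txn)
        self.deps[obj] |= self.deps[proc]   # the object inherits the process's deps

    def read(self, proc, obj):
        """proc accesses obj and inherits its commit dependencies."""
        self.deps[proc] |= self.deps[obj]

    def output(self, proc, message):
        """External output is buffered while proc is uncommitted."""
        if self.deps[proc]:
            self.buffered.append((set(self.deps[proc]), message))
        else:
            self.released.append(message)

    def commit(self, txn):
        """Transaction txn is durable: drop the dependency, release freed output."""
        for d in self.deps.values():
            d.discard(txn)
        still_waiting = []
        for deps, message in self.buffered:
            deps.discard(txn)
            if deps:
                still_waiting.append((deps, message))
            else:
                self.released.append(message)
        self.buffered = still_waiting
```

For example, if process A modifies a file in transaction 1 and process B then reads that file, B's output is held until `commit(1)`, while an unrelated process C is never delayed.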

SLIDE 25

Buffering Externalized Output (1)

SLIDE 26

Buffering Externalized Output (2)

SLIDE 27

Result: xsyncfs

  • adapted ext3 file system to use external synchrony
      – internally works asynchronously, but looks synchronous
  • commits journal transaction when:
      – journal space exhausted, journal old, ...
      – user calls fsync()
      – buffered output is waiting on it (output-triggered commit)
  • adapts for throughput/latency optimization

SLIDE 28

xsyncfs: Performance

[Figure panels: PostMark benchmark; Apache build]

SLIDE 29

Implementation: Checkpointing a Process

  • checkpoint: a state-image of a process
      – copy-on-write fork of the process
      – not placed on the run queue
  • output of the running process is buffered while the process is speculative (with a checkpoint)
  • depending on the result of the speculation:
      – success: the checkpoint is discarded
      – failure: the process is terminated; the checkpoint assumes its identity and is placed on the run queue

SLIDE 30

Propagating Causal Dependencies

SLIDE 31

Result: SpecNFS

  • preserves existing NFS semantics
      – including close-to-open consistency
  • offers much better performance than NFS
  • implemented using the same RPCs
      – but in an asynchronous, speculative manner
  • follows the external-synchrony paradigm
      – what is observed has been committed

SLIDE 32

Result: BlueFS

  • strong consistency and safety guarantees
      – single-copy file semantics (shared local disk)
  • still good performance
      – still outperforms NFS
  • prior to read/write, cached versions are speculated to be valid
      – in case of access conflict, roll-back occurs

SLIDE 33

SpecNFS & BlueFS: Performance

[Figure panels: PostMark benchmark; Apache build]

SLIDE 34

SpecNFS & BlueFS: Performance

[Figure panel: Apache build]

SLIDE 35

Conclusions

  • Concept of speculation/roll-back introduced
      – known in fault tolerance research already
      – applicable to general I/O issues
      – “Expecting the best, being prepared for the worst”
  • Might help resolve the tension between performance and durability in file systems
      – not “proven by time” yet, but looks good
  • The idea is applicable in a broader context
      – distributed simulations, processor cache warm-up