1
Hope for the Best, Expect the Worst
or what happens when E[ f(good event) ] > E[ f(bad event) ]
Lukas Kroc
October 12, 2006
2
Outline
Overview of file systems
The basic idea: speculation
Applying the idea to file systems:
– local file systems
– distributed file systems
– performance results
3
– operations: create, write, read, delete
– local file systems
– distributed file systems
– provide durability and performance given physical
4
stolen from Paul Francis' CS414 lecture notes
5
stolen from Paul Francis' CS414 lecture notes
6
– durability calls for immediate access to the medium
– performance calls for caching
– local: use memory cache and disk buffer to delay
– distributed: cache fetched files on clients
7
– new way of dealing with issues of distributed file
– applies ideas from above to issues of local file
– will reverse the order of presentation, easier first
9
– crash will most likely not occur in the next 5
– data in the cache is most likely valid
– to perform “free” speculative computation
10
– local file systems
– distributed file systems
– performance results
11
– meta-data only for
– synchronous: system call returns only after the operation is done
– asynchronous: system call returns immediately
stolen from Paul Francis' CS414 lecture notes
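The synchronous/asynchronous distinction can be illustrated from user space. The following is a minimal sketch, not taken from the slides: the helper names `write_async` and `write_sync` are hypothetical, and `os.fsync` only asks the kernel to flush, so whether the data truly reaches the medium still depends on the disk's own write cache.

```python
import os
import tempfile


def write_async(path: str, data: bytes) -> None:
    """Return once the data is handed to the OS; it may still be only in cache."""
    with open(path, "wb") as f:
        f.write(data)


def write_sync(path: str, data: bytes) -> None:
    """Return only after the OS reports the data as written to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move data from Python's buffer into the kernel
        os.fsync(f.fileno())  # block until the kernel has flushed it


with tempfile.TemporaryDirectory() as tmp:
    fast = os.path.join(tmp, "fast.txt")
    safe = os.path.join(tmp, "safe.txt")
    write_async(fast, b"fast, but may be lost in a crash\n")
    write_sync(safe, b"slower, but flushed before returning\n")
    print(os.path.getsize(fast), os.path.getsize(safe))
```

Both calls leave the same bytes in the file; they differ only in when control returns to the caller and in what survives a crash shortly afterwards.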
12
– durable (but only if using write barriers, or with disk
– not durable, but fast
13
– the promise = synchronous guarantees
– the user = any external entity observing the process
– asynchronous internal workings, synchronous
– combines performance and durability benefits of
14
– speculate that everything will be properly written to
– immediately return from write call (asynchrony)
– buffer all external output of the application until the
– if write fails, discard the buffers
– better guarantees AND performance than ext3
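The bullets above can be sketched as a toy model. This is an illustration under stated assumptions, not the actual kernel mechanism: the class `SpeculativeFile`, its methods, and the `commit_fn` callback (standing in for "the data reached the disk") are all hypothetical names.

```python
class SpeculativeFile:
    """Toy model: writes return at once; external output is held until commit."""

    def __init__(self, commit_fn):
        self._commit_fn = commit_fn  # hypothetical: True if the data reached disk
        self._pending_writes = []    # accepted but not yet durable
        self._held_output = []       # output the user must not see yet
        self.visible_output = []     # what an external observer has seen

    def write(self, data):
        # Asynchronous: no disk access here, the call returns immediately.
        self._pending_writes.append(data)

    def emit(self, message):
        # Output that depends on uncommitted data is buffered, not shown.
        if self._pending_writes:
            self._held_output.append(message)
        else:
            self.visible_output.append(message)

    def commit(self):
        ok = self._commit_fn(list(self._pending_writes))
        if ok:
            # Speculation held: release the buffered output in order.
            self.visible_output.extend(self._held_output)
        # On failure the held output is simply discarded.
        self._pending_writes.clear()
        self._held_output.clear()
        return ok


f = SpeculativeFile(commit_fn=lambda writes: True)
f.write(b"record 1")
f.emit("saved record 1")   # held back: the write is not durable yet
f.commit()                 # now the message becomes visible
print(f.visible_output)
```

The observer never sees "saved record 1" unless the write it refers to has committed, which is the synchronous promise, while `write` itself never waits for the disk.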
17
– using RPC
– weaker than local file
18
– very slow
– for how slow it is
19
– speculate that close is successful, that a cached
– use asynchronous RPCs, immediately returning
– checkpoint the application (store its state) and
– on success: release the buffered output; on failure: roll back
– better guarantee AND performance than NFS
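The checkpoint, speculate, then commit-or-roll-back cycle can be sketched in user space. This is a rough illustration under assumptions: `run_speculatively`, `fetch_from_server` and `compute` are hypothetical names, a thread stands in for the asynchronous RPC, a deep copy stands in for the checkpoint, and re-running the computation after roll-back is an assumption of this sketch.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def run_speculatively(state, cached_value, fetch_from_server, compute):
    """Run compute() on the cached value while the server is being asked.

    Returns (final_state, output). The caller should use the returned state,
    since a roll-back replaces the object that was passed in.
    """
    checkpoint = copy.deepcopy(state)           # store the application state
    with ThreadPoolExecutor(max_workers=1) as pool:
        reply = pool.submit(fetch_from_server)  # asynchronous "RPC"
        output = compute(state, cached_value)   # keep going, output is buffered
        actual = reply.result()                 # the server's answer arrives
    if actual == cached_value:
        return state, output                    # success: release the output
    state = checkpoint                          # failure: roll back
    return state, compute(state, actual)        # assumption: redo with real value


def add_to_total(state, value):
    state["total"] += value
    return [f"total={state['total']}"]


print(run_speculatively({"total": 0}, 5, lambda: 5, add_to_total))
print(run_speculatively({"total": 0}, 5, lambda: 7, add_to_total))
```

When the cached value is valid, the computation overlaps with the round trip to the server; when it is stale, the only cost is the wasted speculative work.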
23
– local file systems
– distributed file systems
– performance results
24
– any process that accesses uncommitted object is
– any external output of such process is buffered by
– logs are used to track dependencies
– also allows commits to be grouped
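The dependency rules above can be sketched with a small tracker. This is a toy model under assumptions, not the real implementation: `DependencyTracker` and its method names are hypothetical, and plain dictionaries stand in for the logs.

```python
class DependencyTracker:
    """Toy model of tracking which processes depend on uncommitted objects."""

    def __init__(self):
        self.uncommitted = set()  # objects modified but not yet committed
        self.deps = {}            # process id -> uncommitted objects it touched
        self.buffered = {}        # process id -> output held back
        self.released = []        # output visible to the outside world

    def modify(self, pid, obj):
        self.uncommitted.add(obj)
        self.deps.setdefault(pid, set()).add(obj)

    def access(self, pid, obj):
        # Touching an uncommitted object makes the process uncommitted too.
        if obj in self.uncommitted:
            self.deps.setdefault(pid, set()).add(obj)

    def output(self, pid, message):
        if self.deps.get(pid):
            self.buffered.setdefault(pid, []).append(message)
        else:
            self.released.append(message)

    def commit(self, objs):
        # Group commit: several objects become durable in one step.
        done = set(objs)
        self.uncommitted -= done
        for pid, pending in self.deps.items():
            pending -= done
            if not pending:
                self.released.extend(self.buffered.pop(pid, []))


t = DependencyTracker()
t.modify("A", "file1")
t.access("B", "file1")     # B now depends on A's uncommitted write
t.output("B", "B read file1")
t.commit(["file1"])        # B's output is released only now
print(t.released)
```

Process B never wrote anything itself, yet its output is held because it observed uncommitted state; a single commit then releases everything that depended on it.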
27
– internally works asynchronously, but looks
– journal space exhausted, journal old...
– user calls fsync()
– output-triggered by buffered output
28
[Figures: PostMark benchmark; Apache build]
29
– copy-on-write fork of the process
– not placed on the run queue
– success: the checkpoint is discarded
– failure: process terminated and checkpoint assumes
31
– including close-to-open consistency
– but in an asynchronous, speculative manner
– what is observed has been committed
32
– single-copy file semantics (shared local disk)
– still outperforms NFS
– in case of access conflict, roll-back occurs
33
[Figures: PostMark benchmark; Apache build]
34
[Figure: Apache build]
35
– known in fault tolerance research already
– applicable to general I/O issues
– “Expecting the best, being prepared for the worst”
– not “proven by time” yet, but looks good
– distributed simulations, processor cache warm-up