Causality-Based Versioning
Kiran-Kumar Muniswamy-Reddy and David A. Holland
Harvard School of Engineering and Applied Sciences
FAST'09, 2/25/2009

Consider this scenario
I installed a piece of software
But... that broke a few other tools!
Uninstall not good enough
The config files were still corrupt

But which files were modified?
Versioning
Maintains old data that you can recover to

Causality
Tracks propagation of data and lets you find which files were modified
Too bad I don’t have those old versions

Causality + Versioning

Applications of Versioning + Causality
System Configuration Management
  Causal data identifies files modified
  Version data allows you to recover the files modified
Intrusion Recovery
IP Compliance
Reproduce Research Results

Apache split-logfile Vulnerability
Vulnerability in Apache 1.3
Vulnerability allows attacker to overwrite any file with a .log extension

Scenario
[Timeline, 08AM to 12PM: Open DB.log; Write tx; Write tx; Detect Corruption]

Open-close
[Timeline, 08AM to 12PM: V1;DB.log; Detect Corruption at 12PM]
Can only recover to 8 AM

Version-on-every write
[Timeline, 08AM to 10AM: V1;DB.log, V2;DB.log, ..., Vn;DB.log, Vn+1;DB.log, Vn+2;DB.log]
Can recover to 10 AM, but expensive

Goal
Combine versioning and causality, taking advantage of causality information to create versions at just the right time

Contributions
Two algorithms that create useful versions
  Cycle Avoidance
  Graph Finesse
Evaluate efficacy and efficiency of these two algorithms in the context of versioning

Outline
Introduction
Background on PASS
Versioning Algorithms
Implementation
Evaluation
Conclusion

PASS Architecture: P reads A
[Diagram: USER and KERNEL regions containing User process P, Waldo, Syscall Layer, VFS Layer, Lasagna, Interceptor, Observer, Analyzer, Distributor, and log. Callouts: filters events; generates record ‘P -> A’; version?; cache ‘P -> A’]

PASS Architecture: P writes B
[Diagram: USER and KERNEL regions containing User process P, Waldo, Syscall Layer, VFS Layer, Lasagna, Interceptor, Observer, Analyzer, Distributor, and log. Callouts: generates record ‘B -> P’; version?; cache ‘P -> A’; graph with P -> A and B -> P; ‘B -> P’]

Intuition for new algorithms
The creation of a cycle is an indicator that a version created at that instant could be useful later
Cycles are violations of causality
  Implies that past depends on future!

Open-Close Versioning
1. On the last close of a file, issue a “freeze” operation
   Freeze declares end of a version
2. The next open and write triggers a new version

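The two rules above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the Lasagna implementation: the class `OpenCloseFile`, its `open_count` and `frozen` fields, and the method names are all hypothetical.

```python
class OpenCloseFile:
    """Hypothetical model of one file under open-close versioning."""

    def __init__(self):
        self.version = 1      # current version number
        self.open_count = 0   # number of outstanding opens
        self.frozen = False   # True once the current version has been ended

    def open(self):
        self.open_count += 1

    def close(self):
        self.open_count -= 1
        if self.open_count == 0:
            # Rule 1: the last close issues a freeze, ending the version.
            self.frozen = True

    def write(self):
        if self.frozen:
            # Rule 2: the next open and write starts a new version.
            self.version += 1
            self.frozen = False
        return self.version


f = OpenCloseFile()
f.open()
f.write()
f.write()          # same open session: still version 1
f.close()          # last close: version 1 is frozen
f.open()
print(f.write())   # first write after the freeze starts version 2
```

Note that a read-only open and close after a freeze does not create a version in this sketch; only a write does.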
Example scenario
[Two columns of operations, time flowing downward]
P          Q
read A     read B
write B    write A
read A     read B
Each read/write is enclosed by an open and close

Open-Close
[Dependency graph built step by step over the example scenario; final nodes: A1, A2, B1, B2, P, Q]
Open-Close allows cycles to happen. Violates Causality

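To see why the example scenario ends in a cycle under open-close versioning, the following sketch replays it and searches the resulting graph for a cycle. It rests on several assumptions that are not spelled out on the slides: processes are not versioned, each write starts a new file version (because every read/write is enclosed by an open and close), the two columns interleave as P, Q, P, Q, P, Q, and an edge X -> Y means "X depends on Y". The function names are hypothetical.

```python
def open_close_graph(ops):
    """Build a dependency graph; an edge X -> Y means X depends on Y."""
    file_version = {}
    graph = {}
    for proc, action, name in ops:
        if action == "write":
            # Assumption: each write is its own open/close, so it starts a new version.
            file_version[name] = file_version.get(name, 0) + 1
            graph.setdefault((name, file_version[name]), set()).add(proc)
        else:
            version = file_version.setdefault(name, 1)
            graph.setdefault(proc, set()).add((name, version))
    return graph


def has_cycle(graph):
    """Depth-first search with three colors; a gray-to-gray edge is a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY or (state == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


SCENARIO = [
    ("P", "read", "A"), ("Q", "read", "B"),
    ("P", "write", "B"), ("Q", "write", "A"),
    ("P", "read", "A"), ("Q", "read", "B"),
]

print(has_cycle(open_close_graph(SCENARIO[:4])))  # False: no cycle yet
print(has_cycle(open_close_graph(SCENARIO)))      # True: P -> A2 -> Q -> B2 -> P
```

The cycle appears only after the final two reads, because an unversioned P and Q each end up depending on a file version that in turn depends on the other process.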
Version-on-every write
Pros:
  Preserves causality: there are no cycles
    Every read creates a new version of the process
    Every write creates a new version of the file
  There are no duplicates either
Disadvantage: most versions are unnecessary

Cycle Avoidance Algorithm
Preserves causality by avoiding cycles
Uses local per-object information to make decisions
Similar to timestamp ordering in databases
Intuition:
  Freeze an object when we add a dependency that did not previously exist, i.e., new causality

Cycle Avoidance Example
On receiving record A1 -> B2
If no B in A’s history, then freeze A
Else if B in A’s history, then
  If A’s history has B2, discard record (duplicate)
  If A’s history has B3 (version > 2), discard record
  If A’s history has B1 (version < 2), freeze A

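The decision rule above can be sketched in a few lines. This is a hedged illustration rather than the PASS code, and it makes assumptions the slide does not state: each object's history persists across its versions and stores the highest version seen per ancestor, freezing simply increments the object's version number, and the accepted record is attached to the new version. The class and method names are hypothetical.

```python
class CycleAvoidance:
    """Hypothetical sketch of the per-object Cycle Avoidance rule."""

    def __init__(self):
        self.version = {}  # object name -> current version number
        self.history = {}  # object name -> {ancestor name: highest version seen}
        self.records = []  # accepted ((obj, ver), (src, ver)) dependency records

    def current(self, name):
        return self.version.setdefault(name, 1)

    def add_dependency(self, obj, src):
        """Handle the record 'current obj depends on current src'."""
        src_version = self.current(src)
        self.current(obj)
        seen = self.history.setdefault(obj, {})
        known = seen.get(src)
        if known is not None and known >= src_version:
            return None  # same version (duplicate) or newer already known: discard
        # src is absent from the history, or only an older version is known: freeze obj.
        self.version[obj] += 1
        seen[src] = src_version
        record = ((obj, self.version[obj]), (src, src_version))
        self.records.append(record)
        return record


# Example scenario as (dependent, source) pairs: a read makes the process
# depend on the file, a write makes the file depend on the process.
SCENARIO = [("P", "A"), ("Q", "B"), ("B", "P"), ("A", "Q"), ("P", "A"), ("Q", "B")]

ca = CycleAvoidance()
for obj, src in SCENARIO:
    ca.add_dependency(obj, src)
print(sorted(ca.version.items()))
```

Under these assumptions the scenario ends with A and B at version 2 and P and Q at version 3: every one of the six records introduces new causality, so every one triggers a freeze.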
Cycle Avoidance
[Dependency graph built step by step over the example scenario; final nodes: A1, A2, B1, B2, P2, P3, Q2, Q3]
Cycle Avoidance prevents cycles, but creates more versions

Graph Finesse Algorithm
Uses global knowledge
Intuition:
  Check every new record against a global dependency graph.
  If it forms a cycle, just freeze that one node
Subsumes open-close algorithm

Graph Finesse Example
On receiving record A1 -> B2
If B2 is already in A’s history, discard record
Else check for a path from B2 to A1
  If yes, this is a cycle: freeze A1 and change the record to A2 -> B2
  If no cycle, add A1 -> B2 to the graph

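The check described above can be sketched with a global graph and a reachability search. This is an illustration under stated assumptions, not the PASS implementation: "A's history" is approximated by A's direct edges, a frozen object's new version depends on its previous version, and, because Graph Finesse subsumes open-close and every operation in the example scenario is its own open/close, each write first starts a new file version. All class and method names are hypothetical.

```python
class GraphFinesse:
    """Hypothetical sketch of Graph Finesse over a global dependency graph."""

    def __init__(self):
        self.version = {}  # object name -> current version number
        self.graph = {}    # (name, version) -> set of (name, version) it depends on

    def node(self, name):
        """Return the current version of an object, creating version 1 if needed."""
        current = (name, self.version.setdefault(name, 1))
        self.graph.setdefault(current, set())
        return current

    def freeze(self, name):
        """End the current version; the new version depends on the old one."""
        old = self.node(name)
        self.version[name] += 1
        new = self.node(name)
        self.graph[new].add(old)
        return new

    def has_path(self, start, goal):
        stack, seen = [start], set()
        while stack:
            current = stack.pop()
            if current == goal:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.graph[current])
        return False

    def add_record(self, obj, src):
        a, b = self.node(obj), self.node(src)
        if b in self.graph[a]:
            return None              # duplicate record: discard
        if self.has_path(b, a):
            a = self.freeze(obj)     # the record would close a cycle: freeze obj only
        self.graph[a].add(b)
        return a, b

    def read(self, proc, name):
        return self.add_record(proc, name)

    def write(self, proc, name):
        self.freeze(name)            # assumption: open-close starts a new file version
        return self.add_record(name, proc)


gf = GraphFinesse()
gf.read("P", "A"); gf.read("Q", "B")
gf.write("P", "B"); gf.write("Q", "A")
gf.read("P", "A"); gf.read("Q", "B")
print(sorted(gf.version.items()))
```

Under these assumptions only the last read closes a cycle, so only Q is frozen: the scenario ends with P at version 1, Q at version 2, and A and B at version 2. The price is a graph search on every new record.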
Graph Finesse
[Dependency graph over the example scenario; final nodes: A1, A2, B1, B2, P1, Q1, Q2]

Graph Finesse vs. Cycle Avoidance
[Two dependency graphs side by side. Graph Finesse nodes: A1, A2, B1, B2, P1, Q1, Q2. Cycle Avoidance nodes: A1, A2, B1, B2, P2, P3, Q2, Q3]
Graph Finesse prevents cycles, but creates fewer versions than Cycle Avoidance

Cycle Avoidance                       Graph Finesse
Uses local state                      Uses global state
Creates a few unnecessary versions    Creates fewer versions
Has lower runtime overhead            Can have high runtime overheads

Implementation
Implemented on Linux 2.6.23.17
Lasagna is a stackable file system derived from eCryptfs
Versioning file system
  Redo log that keeps track of file versioning (deltas)
  Redo log for directory modifications (deltas)

Evaluation Goals
What are the run-time overheads a user might see?
What are the space overheads?
How do the algorithms compare during recovery?

Test platform
Linux 2.6.23.17
3 GHz Pentium 4
512 MB of RAM
80 GB 7200 RPM IDE Disk
All results are averages of 5 runs
  Less than 5% Std. Dev.

Modes
Without causal data
  Ext2: Baseline (Lasagna was stacked on Ext2)
  VER: plain versioning (open-close)
With causal data
  OC: open-close
  CA: Cycle-Avoidance
  GF: Graph-Finesse
  ALL: Version-on-every write

Linux Compile: Elapsed Time
[Stacked bar chart: Time (s), axis 0 to 3000, split into Wait, User, and System, for EXT2, VER, OC, CA, GF, ALL]
Overhead labels, in bar order after the EXT2 baseline: VER 11.9%, OC 17.1%, CA 18.3%, GF 21.3%, ALL 57.4%

Linux Compile: Space Overheads
[Bar chart: Space (GB), axis 0.0 to 3.0, for EXT2, VER, OC, CA, GF, ALL]
Overhead labels, in bar order after the EXT2 baseline: VER 2.9%, OC 15.8%, CA 17.6%, GF 15.8%, ALL 121.6%

Mercurial Activity: Elapsed Time
[Stacked bar chart: Time (s), axis 0 to 1400, split into Wait, User, and System, for EXT2, VER, OC, CA, GF, ALL]
Overhead labels, in the order they appear on the slide: 25.9%, 28.8%, 27.9%, 89.6%, 61.3%

Mercurial Activity: Space Overheads
[Bar chart: Space (GB), axis 0.0 to 1.4, for EXT2, VER, OC, CA, GF, ALL]
Overhead labels, in bar order after the EXT2 baseline: VER 26.6%, OC 31.6%, CA 30.2%, GF 31.9%, ALL 53.7%

Recovery Benchmarks
How the algorithms perform in the scenario where open-close is not sufficient
Microbenchmark
  Models the Apache split-logfile scenario

Recovery Microbenchmark
[Diagram: processes P and Q, related by a fork and connected by a pipe, with write and read operations]

Recovery Microbenchmark: Space Util.
       Causal Data    Version Data
OC     60KB           12KB
CA     176KB          470.5MB
GF     184KB          470.5MB
ALL    76.9MB         1.97GB

Recovery Times
[Bar chart: Recovery Time (s), axis 0 to 30, for Rollback 1, Rollback 5, and Rollback 9; series CA and GF]
[Bar chart: Recovery Time (s), axis 0 to 800, for Rollback 1, Rollback 5, and Rollback 9; series CA, GF, and ALL; labels in order: 25.1x, 17.9x, 9.3x]

Conclusions
Combining Versioning and Causality enables novel functionality
New algorithms for Causal Versioning
  Cycle Avoidance
    Comparable to open-close
    May create more versions
  Graph Finesse
    Provides greater control on versioning
    Can be inefficient at times