*-Box (star-box)
Towards Reliability and Consistency in Dropbox-like File Synchronization Services
Yupu Zhang, Chris Dragga, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau University of Wisconsin - Madison
6/27/2013 1
Cloud-Based File Synchronization Services
Exploding in popularity
– Numerous providers: Dropbox, Google Drive, SkyDrive …
– Large user base: Dropbox has more than 100 million users
– Automatic synchronization across clients/devices
– Reliable data storage on the server through replication
after reboot sync client thinks everything is in sync
– Close the gap between the local file system and the cloud
– Provide * without too many infrastructure changes
– e.g., reliable, consistent, fast, private …
– Reliable: data corruption
– Consistent: crash consistency
– Comes from disk media, firmware, controllers
[Bairavasundaram07, Anderson03]
– Remains local w/o synchronization
– Corruption may propagate and pollute other copies
– Make sure synchronized data is good
[Figure: how the Dropbox client synchronizes foo. Components: Dropbox Client (Local Database with Name and Attributes columns, Dedup, Rsync), File System (Inotify; foo: inode, 4KB data blocks), Disk, Dropbox Server (foo [v0] and foo [v1] as 4MB data chunks C1, C2, C3). Annotations: NO offline changes; foo was modified; read foo (chunk by chunk); duplicate / NOT a duplicate; Changed Multiple Times.]
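The chunk-by-chunk read and the dedup check in the diagram can be illustrated with a minimal sketch. It is not Dropbox's actual protocol: the use of SHA-256, the in-memory `known` set standing in for the server's chunk index, and the function names are all assumptions made for illustration. Only the 4MB chunk size comes from the slide.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB chunks, as labeled on the slide


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return one SHA-256 digest per chunk."""
    return [
        hashlib.sha256(data[off:off + chunk_size]).hexdigest()
        for off in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(data: bytes, known: set[str],
                     chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return indices of chunks whose digest is not in the (hypothetical) server index."""
    return [
        i for i, digest in enumerate(chunk_hashes(data, chunk_size))
        if digest not in known
    ]


# Usage with tiny 4-byte chunks so the example stays small.
v0 = b"AAAABBBBCCCC"
server_known = set(chunk_hashes(v0, chunk_size=4))
v1 = b"AAAAXXXXCCCC"  # only the middle chunk changed
print(chunks_to_upload(v1, server_known, chunk_size=4))  # [1]
```

Any change to a chunk's bytes, whether a legitimate write or a corrupted block, produces a new digest, so a content-hash dedup check of this kind reports the chunk as not a duplicate.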
[Figure: data block D1 of foo is corrupted on disk. Components as before: Dropbox Client (Local Database, Dedup, Rsync), File System (Inotify; foo: inode, 4KB data blocks), Disk, Dropbox Server (foo [v0], 4MB data chunks C1, C2, C3). Annotations: Corrupt D1; NO offline changes; NOT a duplicate.]
[Figure: foo is modified while D1 is corrupt (D2 becomes D2’, the database entry for foo changes from I to I’). Annotations: foo was modified; read foo (chunk by chunk); NOT a duplicate; foo [v1] on the server contains chunk C1’.]
[Figure: the same scenario with a metadata-only change (the database entry for foo changes from I to I’). Annotations: touch -m; foo’s metadata was changed; read foo (chunk by chunk); foo [v1] on the server contains chunk C1’.]
                                                 Metadata Changes
FS                Service          Data Writes   mtime   ctime   atime
ext4 (Linux)      Dropbox          L G           L G     L G     L
                  (name missing)   L G           L G     L       L
                  FileRock         L G           L G     L       L
HFS+ (Mac OS X)   Dropbox          L G           L G     L       L
                  (name missing)   L G           L G     L       L
                  GoogleDrive      L G           L G     L       L
                  SugarSync        L G           L       L       L
                  Syncplicity      L G           L G     L       L

L: Local corruption G: Global corruption
– ALL copies polluted
– Cloud copies protected by checksum
– FS monitoring services only provide file-level notification
– Sync clients cannot tell legitimate changes from corruption
– If corruption can be detected, local FS can recover from corruption using cloud copies
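The recovery idea above can be sketched as follows, assuming the local side keeps a checksum per 4KB block and that a good copy of the file is available from the cloud. CRC32, the in-memory byte strings, and the function names are illustrative assumptions, not the mechanism the talk proposes. The sketch also assumes the local and cloud copies have the same length.

```python
import zlib

BLOCK_SIZE = 4096  # 4KB data blocks, as on the slides


def block_checksums(data: bytes) -> list[int]:
    """One CRC32 per 4KB block."""
    return [
        zlib.crc32(data[off:off + BLOCK_SIZE])
        for off in range(0, len(data), BLOCK_SIZE)
    ]


def corrupt_blocks(data: bytes, stored: list[int]) -> list[int]:
    """Indices of blocks whose current checksum no longer matches the stored one."""
    current = block_checksums(data)
    return [i for i, (now, old) in enumerate(zip(current, stored)) if now != old]


def repair_from_cloud(local: bytes, cloud: bytes, stored: list[int]) -> bytes:
    """Replace every corrupt local block with the same block from the cloud copy."""
    fixed = bytearray(local)
    for i in corrupt_blocks(local, stored):
        off = i * BLOCK_SIZE
        fixed[off:off + BLOCK_SIZE] = cloud[off:off + BLOCK_SIZE]
    return bytes(fixed)


# Usage: two blocks, then flip one byte in the first block.
good = bytes(BLOCK_SIZE) + b"\x01" * BLOCK_SIZE
checksums = block_checksums(good)
damaged = bytearray(good)
damaged[10] ^= 0xFF
damaged = bytes(damaged)

print(corrupt_blocks(damaged, checksums))                     # [0]
print(repair_from_cloud(damaged, good, checksums) == good)    # True
```

The same check run before upload would let a client refuse to synchronize a block that fails verification, which is what separates a legitimate change from corruption in this sketch.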
– Always roll back to a consistent version
– Data journaling mode
– Ordered journaling mode
1. Write dirty data blocks to home locations
2. Write metadata blocks to journal
3. Write journal commit block to the journal
4. Checkpoint journaled metadata blocks to home locations
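The four steps can be modeled with a toy simulation that shows why a crash in ordered mode leaves metadata consistent while data blocks may be a mix of old and new. The dictionaries standing in for the disk and the journal, and every function name here, are assumptions for illustration. This is not how ext4 is implemented.

```python
from itertools import islice


def ordered_write(disk, journal, new_data, new_inode):
    """Perform the four steps in order, yielding after every individual write."""
    for block, value in new_data.items():   # step 1: data blocks to home locations
        disk["data"][block] = value
        yield f"data:{block}"
    journal["metadata"] = new_inode          # step 2: metadata blocks to journal
    yield "journal-metadata"
    journal["committed"] = True              # step 3: journal commit block
    yield "journal-commit"
    disk["inode"] = journal["metadata"]      # step 4: checkpoint metadata
    journal.clear()
    yield "checkpoint"


def run_until_crash(writes_before_crash, disk, journal, new_data, new_inode):
    """Let only the first `writes_before_crash` writes happen, then stop."""
    steps = ordered_write(disk, journal, new_data, new_inode)
    for _ in islice(steps, writes_before_crash):
        pass


def recover(disk, journal):
    """Replay a committed journal entry; discard an uncommitted one."""
    if journal.get("committed"):
        disk["inode"] = journal["metadata"]
    journal.clear()


# Usage: crash after only the first data block has been written.
disk = {"inode": "I", "data": {"D1": "old", "D2": "old"}}
journal = {}
run_until_crash(1, disk, journal, {"D1": "new", "D2": "new"}, "I'")
recover(disk, journal)
print(disk)  # {'inode': 'I', 'data': {'D1': 'new', 'D2': 'old'}}
```

After recovery the inode is still the old one, yet D1 already holds new contents while D2 does not. The file's metadata is consistent but its data is not.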
[Figure: ordered-mode journaling of an update to foo (inode I to I’, 4KB data blocks D1, D2 to D1’, D2’), showing the File System, the Disk, and the Journal / Log through steps 1 to 4 with commit block C. Annotations: Consistent Data; Inconsistent Data.]
[Figure: crash while foo is being updated, case 1. Components: Dropbox Client (Local Database, Dedup, Rsync), File System (Inotify; foo: inode, 4KB data blocks), Disk, Dropbox Server (foo [v0], 4MB data chunks C1, C2, C3). Annotations: foo was modified; crash AFTER database is changed; database entry for foo (I’) NOT fully updated; inconsistent data; Sync!; inconsistent data D1’ D2 on cloud as foo [v1].]
[Figure: crash while foo is being updated, case 2. Same components. Annotations: (O_SYNC); crash BEFORE database is changed; consistent data D1’ D2’; NO offline changes; Server and other devices still have v0; This machine has v1; NO sync!]
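One way to see why the client reports no offline changes in the second case is a sketch of attribute-based change detection. The slides only show that the local database maps a name to its attributes. Which attributes are compared (size and mtime here), the `Attrs` type, and the function name are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attrs:
    size: int
    mtime: int


def offline_changes(database, file_system):
    """Names whose current attributes differ from what the client last recorded."""
    return sorted(
        name for name, attrs in file_system.items()
        if database.get(name) != attrs
    )


# The database still records the old attributes for foo.
database = {"foo": Attrs(size=8192, mtime=100)}

# File contents may have changed, but the attributes did not.
unchanged_inode = {"foo": Attrs(size=8192, mtime=100)}
# The attributes changed as well.
updated_inode = {"foo": Attrs(size=8192, mtime=250)}

print(offline_changes(database, unchanged_inode))  # []
print(offline_changes(database, updated_inode))    # ['foo']
```

A check of this kind never looks at file contents, so new data behind unchanged attributes is not uploaded, and the client considers everything in sync.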
– Propagation of corrupt data and inconsistent state
– Synchronized files are out-of-sync
– Solve problems by reducing the semantic gap between existing local FS and cloud storage
Advanced Systems Lab (ADSL), University of Wisconsin-Madison: http://www.cs.wisc.edu/adsl
Wisconsin Institute on Software-defined Datacenters in Madison: http://wisdom.cs.wisc.edu/