

SLIDE 1

MetaSync

File Synchronization Across Multiple Untrusted Storage Services

Seungyeop Han, Haichen Shen, Taesoo Kim*, Arvind Krishnamurthy, Thomas Anderson, and David Wetherall

University of Washington *Georgia Institute of Technology

SLIDE 2

File sync services are popular

Dropbox reached 400 million users in June 2015

SLIDE 3

Many sync service providers

Dropbox (2GB), Google Drive (15GB), MS OneDrive (15GB), Box.net (10GB), Baidu (2TB)

SLIDE 4

Can we rely on any single service?

SLIDE 5

Existing Approaches

  • Encrypt files to protect their contents

– Boxcryptor

  • Rewrite the file sync service to reduce trust

– SUNDR (Li et al., ’04), Depot (Mahajan et al., ’10)

SLIDE 6

MetaSync: Can we build a better file synchronization system across multiple existing services?

Higher availability, greater capacity, higher performance
Stronger confidentiality & integrity

slide-7
SLIDE 7

Goals

  • Higher availability
  • Stronger confidentiality & integrity
  • Greater capacity and higher performance
  • No service-to-service or client-to-client communication

  • No additional server
  • Open source software

SLIDE 8

Overview

  • Motivation & Goals
  • MetaSync Design
  • Implementation
  • Evaluation
  • Conclusion

SLIDE 9

Key Challenges

  • Maintain a globally consistent view of the synchronized files across multiple clients
  • Using only the service providers’ unmodified APIs, without any centralized server
  • Even in the presence of service failures

SLIDE 10

Overview of the Design

[Architecture diagram: the MetaSync stack (Synchronization, Replication, Object Store) sits on backend abstractions over Local Storage and Remote Services (Dropbox, Google Drive, OneDrive)]

  • 1. File Management
SLIDE 11

Object Store

  • Similar data structures to version control systems (e.g., git)
  • Content-based addressing

– File name = hash of the contents
– De-duplication
– Simple integrity checks

  • Directories form a hash tree

– Independent & concurrent updates

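To make the content-based addressing concrete, here is a minimal Python sketch (hypothetical names; SHA-1 is assumed, following git, and the real object store also chunks large files and groups small ones into blobs, as the next slides show):

```python
import hashlib

def object_name(data: bytes) -> str:
    # Content-based addressing: an object's name is the hash of its bytes.
    return hashlib.sha1(data).hexdigest()

class ObjectStore:
    def __init__(self):
        self.objects = {}  # name -> contents

    def put(self, data: bytes) -> str:
        name = object_name(data)
        # De-duplication falls out for free: identical contents hash
        # to the same name, so each object is stored exactly once.
        self.objects.setdefault(name, data)
        return name

    def get(self, name: str) -> bytes:
        data = self.objects[name]
        # Simple integrity check: re-hash and compare with the name,
        # so a tampering storage service cannot substitute contents.
        if object_name(data) != name:
            raise ValueError("object %s failed integrity check" % name)
        return data
```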
SLIDE 12

Object Store

[Diagram: hash tree rooted at head = f12…, with entries Dir1 (abc…), Dir2 (4c0…), and Large.bin (20e…); small files small1 and small2 are grouped into blobs]

  • Files are chunked or grouped into blobs
  • The root hash = f12… uniquely identifies a snapshot
slide-13
SLIDE 13

Object Store

[Diagram: after Large.bin is modified, a new blob (1ae…) and a new root (head = 07c…) are created; the old root f12… still names the previous snapshot, and the unchanged Dir1 and Dir2 subtrees are shared by both]

  • Files are chunked or grouped into blobs
  • Each root hash uniquely identifies a snapshot

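A small sketch of the hash tree these two slides depict, assuming a git-style directory encoding (helper names are hypothetical). Modifying Large.bin re-hashes only the path to the root; the untouched Dir1 and Dir2 subtrees are shared between snapshots, which is what enables independent, concurrent updates:

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def dir_hash(entries: dict) -> str:
    # A directory's hash covers its children's names and hashes,
    # so directories form a hash (Merkle) tree.
    listing = "\n".join("%s %s" % (entries[name], name)
                        for name in sorted(entries))
    return h(listing.encode())

dir1 = dir_hash({"small1": h(b"..."), "small2": h(b",,,")})
dir2 = dir_hash({"notes.txt": h(b"hello")})

# Snapshot 1: the root hash (f12... in the slides) names the snapshot.
root_old = dir_hash({"Dir1": dir1, "Dir2": dir2,
                     "Large.bin": h(b"old contents")})

# Modify Large.bin: only its blob and the root change (07c... in the
# slides); the Dir1 and Dir2 subtrees are reused as-is.
root_new = dir_hash({"Dir1": dir1, "Dir2": dir2,
                     "Large.bin": h(b"new contents")})
assert root_old != root_new
```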
SLIDE 14

Overview of the Design

[Architecture diagram: the MetaSync stack (Synchronization, Replication, Object Store) sits on backend abstractions over Local Storage and Remote Services (Dropbox, Google Drive, OneDrive)]

  • 2. Consistent update
SLIDE 15

Updating Global View

[Diagram: the global view is the master list of versions, currently v0 = ab1…; Client1 and Client2 each keep a Prev pointer (previously synchronized point) and a Head pointer (current root hash)]

SLIDE 16

Updating Global View

[Diagram: one client appends v1 = c10… to the master list and advances its Head; the other client's Prev and Head still point at v0]

SLIDE 19

Updating Global View

[Diagram: concurrent updates; the master list has advanced to v2 = 7b3… while the other client proposes a competing v2 = f13…]

SLIDE 21

Updating Global View

[Diagram: the losing client merges its changes with v2 = 7b3… and proposes the merged snapshot as the next version, v3 = a31…]

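The update rule that the preceding diagrams walk through can be summarized in Python-flavored pseudocode (a sketch: the names and the merge stub are hypothetical, and in MetaSync the append to the master list is really a pPaxos proposal, described next):

```python
from dataclasses import dataclass

@dataclass
class Client:
    prev: str  # root hash at the last successful sync ("Prev")
    head: str  # root hash of the current local snapshot ("Head")

def merge(theirs: str, base: str, ours: str) -> str:
    # Stub: a real three-way merge walks both hash trees from the
    # common base and combines non-conflicting changes.
    return ours

def sync(client: Client, master: list):
    """master is the global view: an ordered list of (version, root_hash)."""
    version, latest = master[-1]
    if client.prev == latest:
        # Fast path: nothing changed upstream since our last sync, so
        # propose our Head as the next version (e.g., v1 = c10...).
        proposal = client.head
    else:
        # Concurrent update: another client won an intervening version
        # (v2 = 7b3... beat our v2 = f13...); merge and propose the
        # merged snapshot as the next version (v3 = a31...).
        proposal = merge(latest, client.prev, client.head)
    master.append((version + 1, proposal))
    client.prev = client.head = proposal
```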
SLIDE 22

Consistent Update of Global View

  • Need to handle concurrent updates and unavailable services, using only the existing APIs

[Diagram: MetaSync clients sharing state replicated across Dropbox, Google Drive, and OneDrive; two clients momentarily disagree on the root (root = f12… vs. root = b05…)]

SLIDE 23

Paxos

  • Multi-round, non-blocking consensus algorithm

– Safe regardless of failures
– Makes progress if a majority of acceptors is alive

[Diagram: a proposer exchanging messages with acceptors]

SLIDE 24

MetaSync: Simulate Paxos

  • Use an append-only list to log Paxos messages

– A client sends normal Paxos messages
– Upon arrival of a message, the service appends it to the list
– Clients can fetch the list of ordered messages

  • Each service provider has APIs to build an append-only list

– Google Drive, OneDrive, Box: comments on a file
– Dropbox: revision list of a file
– Baidu: files in a directory

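A uniform append-only log abstraction over these per-service primitives might look like the sketch below (hypothetical class; an in-memory list stands in for the service-side state):

```python
class AppendOnlyLog:
    """One log per service. Real backends realize it with unmodified
    APIs: comments on a file (Google Drive, OneDrive, Box), a file's
    revision list (Dropbox), or files in a directory (Baidu). The
    service never interprets Paxos messages; it merely appends them
    and returns them in arrival order."""

    def __init__(self):
        self._entries = []

    def append(self, msg):
        # e.g., post a comment / upload a revision / create a file
        self._entries.append(msg)

    def fetch(self):
        # e.g., list comments / revisions / directory contents
        return list(self._entries)
```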
SLIDE 25

MetaSync: Passive Paxos (pPaxos)

  • Backend services work as passive acceptors
  • Acceptor decisions are delegated to clients

[Diagram: proposer P1 sends propose(3) to passive storage services S1, S2, S3]

SLIDE 26

MetaSync: Passive Paxos (pPaxos)

  • Backend services work as passive acceptors
  • Acceptor decisions are delegated to clients

[Diagram: proposer P2 concurrently sends propose(2) to the same services]

SLIDE 27

MetaSync: Passive Paxos (pPaxos)

  • Backend services work as passive acceptors
  • Acceptor decisions are delegated to clients

[Diagram: clients fetch the logged messages from S1, S2, and S3 with fetch(S1), fetch(S2), fetch(S3) to compute each acceptor's state]

SLIDE 28

MetaSync: Passive Paxos (pPaxos)

  • Backend services work as passive acceptors
  • Acceptor decisions are delegated to clients

[Diagram: the winning proposer appends accept(3, v1); other clients learn the outcome by fetching the logs]

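Because each service returns its log in a fixed order, every client that replays the same log deterministically reconstructs the same acceptor state, so clients never disagree about the outcome. A toy sketch of this client-side simulation (message format is an assumption; ballot retries and failure handling are omitted):

```python
def simulate_acceptor(entries):
    """Replay one service's log and derive the acceptor state a real
    Paxos acceptor would have reached: the highest ballot promised,
    and the (ballot, value) last accepted."""
    promised, accepted = -1, None
    for kind, ballot, value in entries:
        if kind == "propose" and ballot > promised:
            promised = ballot
        elif kind == "accept" and ballot >= promised:
            promised, accepted = ballot, (ballot, value)
    return promised, accepted

def chosen(logs):
    """A value is chosen once a majority of the services' simulated
    acceptors have accepted the same (ballot, value)."""
    votes = {}
    for log in logs:
        _, accepted = simulate_acceptor(log.fetch())
        if accepted is not None:
            votes[accepted] = votes.get(accepted, 0) + 1
    for (ballot, value), n in votes.items():
        if n > len(logs) // 2:
            return value
    return None
```

With the AppendOnlyLog sketch above, a proposer appends ("propose", b, None) to every log, fetches the logs back, and appends ("accept", b, v) once its ballot is the highest promised at a majority.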
SLIDE 29

Disk Paxos

[Diagram: processes P1, P2, P3 write proposals (Propose) to their blocks on Disk 1, Disk 2, Disk 3]

SLIDE 30

Disk Paxos

[Diagram: each process then reads back (Fetch) every client's block from Disk 1, Disk 2, Disk 3 to learn the outcome]

SLIDE 31

Paxos vs. Disk Paxos vs. pPaxos

  • Disk Paxos maintains a block per client; pPaxos needs only an append-only list

Paxos: acceptors perform computation; Propose / Accept; # msgs O(acceptors)
Disk Paxos (Gafni & Lamport ’02): acceptors are disks holding one block per client; Propose / Check; # msgs O(clients × acceptors)
pPaxos: acceptors expose an append-only list API; Propose / Check; # msgs O(acceptors)

SLIDE 32

Overview of the Design

[Architecture diagram: the MetaSync stack (Synchronization, Replication, Object Store) sits on backend abstractions over Local Storage and Remote Services (Dropbox, Google Drive, OneDrive)]

  • 3. Replicate objects
SLIDE 33

Stable Deterministic Mapping

  • MetaSync replicates objects R times across S storage providers (R < S)
  • Requirements

– Share minimal information among services/clients
– Support variation in storage size
– Minimize realignment upon configuration changes

  • Deterministic mapping

– E.g., map(7a1…) = Dropbox, Google

SLIDE 34

Deterministic Mapping Example

  • Services = {A(1), B(2), C(2), D(1)}, with capacity in parentheses
  • N = {A1, B1, B2, C1, C2, D1} (normalized into per-capacity nodes)
  • Map(i) = Sorted(N, key = md5(i, serviceID, vID))

With H = 20 partitions and R = 2 replicas:

map[0] = [A1, C2, D1, B1, B2, C1] = [A, C]
map[1] = [B2, B1, C1, C2, A1, D1] = [B, C]
…
map[19] = [C2, B1, D1, A1, B2, C1] = [C, B]

Example: bc1… mod 20 = 1 => replicate onto B and C

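The construction is essentially consistent hashing with capacity-weighted virtual nodes. A Python sketch (the exact string fed to md5 is an assumption; the real implementation's encoding may differ):

```python
import hashlib

def build_map(services, H, R, vid=0):
    """services: {name: capacity}, e.g., {"A": 1, "B": 2, "C": 2, "D": 1}.
    Returns, for each of the H hash-space partitions, the R distinct
    services that store replicas of objects in that partition."""
    # Normalize by capacity: B with capacity 2 becomes nodes B1 and B2.
    nodes = [(s, k) for s in sorted(services)
                    for k in range(1, services[s] + 1)]
    mapping = []
    for i in range(H):
        ranked = sorted(nodes, key=lambda n: hashlib.md5(
            ("%d/%s%d/%d" % (i, n[0], n[1], vid)).encode()).digest())
        picked = []
        for s, _ in ranked:  # first R distinct services in sorted order
            if s not in picked:
                picked.append(s)
            if len(picked) == R:
                break
        mapping.append(picked)
    return mapping

mapping = build_map({"A": 1, "B": 2, "C": 2, "D": 1}, H=20, R=2)
# An object whose hash is bc1... falls into partition int("bc1...", 16) % 20;
# its replicas go to the two services in mapping[partition].
```

Removing a service, as on the next slide, merely filters its virtual nodes out of each ranked list; the relative order of the survivors is unchanged, which is why few objects need to move.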
SLIDE 35

Deterministic Mapping Example

  • When C is removed

Before (H = 20, R = 2):
map[0] = [A1, C2, D1, B1, B2, C1] = [A, C]
map[1] = [B2, B1, C1, C2, A1, D1] = [B, C]
…
map[19] = [C2, B1, D1, A1, B2, C1] = [C, B]

After (H = 20):
map[0] = [A1, D1, B1, B2] = [A, D]
map[1] = [B2, B1, A1, D1] = [B, A]
…
map[19] = [B1, D1, A1, B2] = [B, D]

The sorted order of the surviving nodes is maintained => realignment is minimized

SLIDE 36

Implementation

  • Prototyped in Python

– ~8k lines of code

  • Currently supports 5 backend services

– Dropbox, Google Drive, OneDrive, Box.net, Baidu

  • Two front-end clients

– Command-line client
– Sync daemon

SLIDE 37

Evaluation

  • How is the end-to-end performance?
  • What are the performance characteristics of pPaxos?
  • How quickly does MetaSync reconfigure mappings?


SLIDE 39

End-to-End Performance

Synchronize the target between two computers (S = 4, R = 2):

                                     Dropbox   Google    MetaSync
Linux kernel (920 directories,
15k files, 166MB)                    2h 45m    > 3hrs    12m 18s
Pictures (50 files, 193MB)           415s      143s      112s

Performance gains come from:

  • Parallel upload/download with multiple providers
  • Combining small files into blobs

SLIDE 40

Latency of pPaxos

Latency is not degraded by increasing the number of concurrent proposers, nor by adding a slow backend storage service

[Plot: pPaxos latency (s) vs. number of proposers (1-5), one curve per backend: Google, Dropbox, OneDrive, Box, Baidu]

SLIDE 41

Latency of pPaxos

Latency is not degraded by increasing the number of concurrent proposers, nor by adding a slow backend storage service

[Plot: as above, with an additional "All" curve for pPaxos running across all five backends]

SLIDE 42

Conclusion

  • MetaSync provides a secure, reliable, and performant file sync service on top of popular cloud providers

– To achieve consistent updates, we devise a new client-based Paxos (pPaxos)
– To minimize redistribution, we present a stable deterministic mapping

  • Source code is available:

– http://uwnetworkslab.github.io/metasync/
