Practical Web-based Delta Sync for Cloud Storage Services


He Xiao, Zhenhua Li, Ennan Zhai, Tianyin Xu (xiaoh16@gmail.com). July 10, 2017, HotStorage’17.


SLIDE 1

Practical Web-based Delta Sync for Cloud Storage Services

He Xiao, Zhenhua Li, Ennan Zhai, Tianyin Xu

xiaoh16@gmail.com
July 10, 2017, HotStorage’17

SLIDE 2

Network Traffic is Overwhelming in Cloud Storage

Cloud traffic has a 30% CAGR (compound annual growth rate).

[Figure: file sync between client and server; labels: network traffic, users, vendors]

SLIDE 3

Delta Sync Improves Network Efficiency

Delta sync is crucial for reducing cloud storage network traffic.

[Figure: full sync vs. delta sync of a 10 MB file (old file to new file) with a 1 B change. Full sync transfers the full file (10 MB); delta sync transfers only the delta data.]

[Table: delta sync support in nine state-of-the-art cloud storage services]

SLIDE 4

No Web-based Delta Sync

Why is web-based delta sync not supported by today’s cloud storage services?

  • The web is the most pervasive and OS-independent cloud storage access method.
  • Web apps with local storage or log files need web-based delta sync.

Web-based delta sync is essential for cloud storage web clients and web apps.

SLIDE 5

Contribution

  • We quantitatively study why web-based delta sync is not offered by today’s cloud storage services.
  • We build a practical web-based delta sync solution for cloud storage services.
      • By reversing the traditional delta sync process, we make the overhead affordable at the web client side.
      • By exploiting the locality of users’ edits and trading off hash algorithms, we make the computation overhead affordable at the server side.

SLIDE 6

WebRsync: Implement Delta Sync on Web

  • Implement rsync on real cloud storage with native web tech: JavaScript + HTML5 + WebSocket
  • rsync is the de facto solution for delta sync in cloud storage

[Figure: WebRsync architecture. Web browser: JavaScript implementation of rsync, local file system, HTML5 File API. Web server: C implementation of rsync. Browser and server are connected by WebSocket; the storage backend (Aliyun OSS / OpenStack Swift) is reached over a high-speed internal network.]

SLIDE 7

WebRsync vs. rsync

[Figures: sync time of WebRsync vs. rsync; average client CPU utilization]

SLIDE 8

Stagnation due to JavaScript’s Single-thread Event Loop Model

StagMeter

const timestamp = () => Date.now();
const print = console.log;

// print a timestamp every 100 ms
setInterval(() => print(timestamp()), 100);

// print the timestamp of every milestone (start or end of a task)
function on_start(task) { print(task.id, timestamp()); }
function on_finish(task) { print(task.id, timestamp()); }
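The heartbeat idea behind StagMeter can be sketched as follows. This is a minimal illustration, not the authors' tool: the function names `startHeartbeat` and `findStagnations` and the 50 ms tolerance are assumptions introduced here. Because the event loop is single-threaded, a long-running task delays the timer callback, so a gap between consecutive heartbeat timestamps that is well above the interval marks a stagnation period.

```javascript
// Hypothetical helper names; the slide only shows the two pseudocode probes.

// Record a timestamp every `intervalMs` milliseconds into `ticks`.
// Returns a function that stops the heartbeat.
function startHeartbeat(intervalMs, ticks) {
  const id = setInterval(() => ticks.push(Date.now()), intervalMs);
  return () => clearInterval(id);
}

// Report every gap between consecutive ticks that exceeds the expected
// interval by more than `toleranceMs` (an assumed threshold).
function findStagnations(ticks, intervalMs, toleranceMs = 50) {
  const gaps = [];
  for (let i = 1; i < ticks.length; i++) {
    const delta = ticks[i] - ticks[i - 1];
    if (delta > intervalMs + toleranceMs) {
      gaps.push({ from: ticks[i - 1], to: ticks[i], durationMs: delta });
    }
  }
  return gaps;
}
```

Calling `findStagnations` on the recorded ticks after a sync run lists the periods in which the page could not respond.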

SLIDE 9

StagMeter on WebRsync

[Figure: StagMeter output over the sync process (x-axis: seconds), annotated with the client phases below]

  • 1. Send metadata (then wait for the server)
  • 2. Checksum search and comparison
  • 3. Send tokens and literal bytes (then wait for the server)

While the client is computing, CPU utilization is high, timestamp printing is suspended, and the web page is in a stagnation state.

SLIDE 10

WebR2sync: Client-side Optimization (Reverse Computation Process)

[Figure: WebRsync message flow between client and server. Labels: request for syncing file f’; checksum list of f; segmentation and fingerprinting; searching and comparing; generate tokens and literal bytes; construct new file f; ACK.]

SLIDE 11

WebR2sync: Client-side Optimization (Reverse Computation Process)

  • Web Reverse Rsync: reverse the sync process so that the complicated computation (searching and comparing) moves from the client to the server.

[Figure: WebR2sync message flow between client and server. Labels: request for syncing file f’; segmentation and fingerprinting; generate tokens and literal bytes; construct new file f; ACK; searching and comparing; checksum list of f.]
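The segmentation and fingerprinting step in the flow above can be sketched as a function that splits a file into fixed-size blocks and records a weak and a strong checksum per block. This is a sketch under assumptions: the function name `buildChecksumList`, the record layout, and the caller-supplied `weakHash` and `strongHash` parameters are hypothetical, and the slides do not specify the block size.

```javascript
// Hypothetical sketch: `weakHash` and `strongHash` are caller-supplied
// functions (for example an Adler32-style checksum and a stronger hash).
function buildChecksumList(bytes, blockSize, weakHash, strongHash) {
  const list = [];
  for (let offset = 0; offset < bytes.length; offset += blockSize) {
    // subarray returns a view without copying; the last block may be shorter.
    const block = bytes.subarray(offset, offset + blockSize);
    list.push({
      index: list.length,
      length: block.length,
      weak: weakHash(block),
      strong: strongHash(block),
    });
  }
  return list;
}
```

In a browser, `bytes` could be a `Uint8Array` built from a file read through the HTML5 File API, and the resulting list could be serialized and sent over WebSocket. The cost is one linear pass over the file, which is light compared with searching and comparing.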

SLIDE 12

Performance of WebR2sync

[Figures: two panels of sync time (seconds) vs. edit size (bytes)]

Issue: the server incurs a severely heavy overhead.

SLIDE 13

Server-side Overhead Profiling

Checksum searching and block comparison occupy 80% of the computing time.

[Figure: breakdown of server computing time; labels: MD5 computing, checksum search]

  • Use faster hash functions to replace MD5
  • Reduce checksum searching overhead

SLIDE 14

Replacing MD5 with SipHash in Chunk Comparison

A comparison of pseudorandom hash functions

Hash Function   Collision Probability   Cycles per Byte
MD5             Low                     5.58
Murmur3         High                    0.33
Spooky          High                    0.14
SipHash         Low                     1.13

SipHash keeps the collision probability low while running much faster than MD5 (1.13 vs. 5.58 cycles per byte).

SLIDE 15

Solve Possible Hash Collision

  • Replacing MD5 with SipHash may cause collisions (probability p), but so may MD5.
  • Our solution: also use Spooky (the fastest method, collision probability p’).
      • The probability of a collision is then p*p’.
  • Alternative: use MD5 or another strong hash function as a global verification.
      • Computing MD5 over the whole file is expensive.
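The product p*p’ relies on an assumption the slide leaves implicit: the two hash functions must collide independently of each other, so that a mismatch goes undetected only when both collide on the same data. Under that assumption:

```latex
\Pr[\text{undetected collision}]
  = \Pr[\text{SipHash collides}] \cdot \Pr[\text{Spooky collides}]
  = p \cdot p'
```

As a purely illustrative calculation with made-up values, p = p' = 10^{-9} would give p \cdot p' = 10^{-18}; the slides do not report actual values for p or p’.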

SLIDE 16

Reduce Chunk Searching by Exploiting Locality of File Edits

[Figure: Block1 to Block4, each with an Adler32 checksum (Adler32-1 to Adler32-4) and an MD5 checksum (MD5-1 to MD5-4); labels: hash table, checksum search, compare]

95% of synchronized files have fewer than 10 edits.
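One way to read the locality idea is sketched below: when a block matches block j of the other file, the next block is first compared directly against block j+1, and the hash table is searched only if that prediction fails. This is a simplified illustration, not the authors' implementation: it compares block-aligned checksum lists instead of sliding a byte-level window, and the name `matchBlocks` and its return shape are assumptions.

```javascript
// Hypothetical sketch: block-aligned matching with a locality shortcut.
// `oldHashes[j]` and `newHashes[i]` stand for per-block checksums.
function matchBlocks(oldHashes, newHashes) {
  // Hash table: checksum -> first block index in the old file.
  const table = new Map();
  oldHashes.forEach((h, j) => {
    if (!table.has(h)) table.set(h, j);
  });

  const matches = [];   // matches[i] = matching old block index, or -1
  let tableLookups = 0; // how often the hash table had to be searched
  let expected = -1;    // old block predicted by the previous match

  for (let i = 0; i < newHashes.length; i++) {
    let j;
    if (expected >= 0 && expected < oldHashes.length &&
        oldHashes[expected] === newHashes[i]) {
      j = expected; // locality hit: no table search needed
    } else {
      tableLookups++;
      j = table.has(newHashes[i]) ? table.get(newHashes[i]) : -1;
    }
    matches.push(j);
    expected = j >= 0 ? j + 1 : -1;
  }
  return { matches, tableLookups };
}
```

With few edits per file, most blocks follow a matched predecessor, so most table searches are replaced by a single direct comparison.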

SLIDE 17

Evaluation Setup

[Figure: basic experiment setup visualized on a map of China]

SLIDE 18

Sync Time

[Figure: sync time (seconds, log scale from 10^-1 to 10^1) vs. edit size (bytes: 1, 10, 100, 1K, 10K, 100K) for WebRsync, WebR2sync, WebR2sync+, and rsync]

WebR2sync+ is 2-3 times faster than WebR2sync and 15-20 times faster than WebRsync.

SLIDE 19

Throughput

[Figure: x-axis: number of concurrent users (2000 to 8000); series: NoWebRsync, WebRsync, WebR2sync, WebR2sync+, rsync]

The throughput of WebR2sync+ is 4 times that of WebR2sync/rsync and 9 times that of NoWebRsync.

SLIDE 20

Future Work

  • Evaluate our approach under different edit modes
      • delete, insert, append
  • Evaluate traffic efficiency
      • all the methods should have similar traffic efficiency
  • Understand the effects of the three optimizations
      • evaluate them separately

SLIDE 21

Discussion

  • Probability of collisions of file checksums
  • Characteristics of file operations in real-world scenarios from the perspective of sync
  • Locality measure for deciding whether to apply locality-based optimization

SLIDE 22

Conclusion

  • WebR2sync+ is a practical solution for web-based delta sync
      • lightweight computation at the client side
      • optimized overhead at the server side
      • the server-side optimizations can be adopted in the traditional cloud storage architecture

SLIDE 23

Thanks!

Discussion

SLIDE 24

WebRsync Detailed Description

[Figure: WebRsync matching flowchart between client and server. Each block (Block1, Block2, Block3, …) has an Adler32 weak checksum and an MD5 strong checksum. Flow labels: weak checksum search (YES/NO); strong checksum compare (YES/NO); 1 block offset; 1 byte offset; matched tokens; literal bytes; construct new file.]

Rolling Adler32 is O(1): Adler(i) => Adler(i+1)
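The O(1) rolling update can be sketched as follows. The slide only names Adler32; this sketch assumes the rsync-style variant of the weak checksum (two 16-bit sums with modulus 2^16), and the function names `weakChecksum` and `roll` are hypothetical.

```javascript
const MOD = 65536; // assumed rsync-style modulus (2^16)

// Weak checksum of bytes[start .. start+len-1], computed from scratch in O(len).
function weakChecksum(bytes, start, len) {
  let a = 0;
  let b = 0;
  for (let i = 0; i < len; i++) {
    a = (a + bytes[start + i]) % MOD;
    b = (b + (len - i) * bytes[start + i]) % MOD;
  }
  return { a, b };
}

// Slide the window one byte to the right in O(1):
// `outByte` leaves on the left, `inByte` enters on the right.
function roll(sum, len, outByte, inByte) {
  const a = (((sum.a - outByte + inByte) % MOD) + MOD) % MOD;
  const b = (((sum.b - len * outByte + a) % MOD) + MOD) % MOD;
  return { a, b };
}
```

Each one-byte shift costs a constant number of operations regardless of block size, which is what makes the byte-by-byte weak checksum search affordable.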

SLIDE 25

WebR2sync: Flowchart and Data Structure

[Figure: WebR2sync flowchart between client and server. Labels: weak checksum search (YES/NO); strong checksum compare (YES/NO); 1 byte offset; no further operation; construct new files; Block 1 to Block 4 shown twice.]

When a match is found, the associated index is recorded.

SLIDE 26

Sync Time Decomposed

[Figure: sync time (seconds; ticks 0.05 to 0.2) vs. edit size (bytes: 1, 10, 100, 1K, 10K, 100K), decomposed into server, network, and client time]

The WebR2sync+ client takes a stable and shorter time. Because of the server-side optimization, computing time is much shorter on both the client and the server.