He Xiao Zhenhua Li Ennan Zhai Tianyin Xu
Practical Web-based Delta Sync for Cloud Storage Services
xiaoh16@gmail.com July 10, 2017 HotStorage’17
Network Traffic is Overwhelming in Cloud Storage
Cloud traffic has a 30% CAGR (compound annual growth rate).
[Figure: file sync between server and client. Labels: Network Traffic, Users, Vendors.]
Delta Sync is crucial for reducing cloud storage network traffic.
[Figure: Delta Sync vs. Full Sync of a new file against an old file. Delta sync transfers only the delta data (labeled 1 B); full sync transfers the full file (labeled 10 MB).]
[Table: delta sync support in nine state-of-the-art cloud storage services.]
Why is web-based delta sync not supported by today’s cloud storage services?
Web apps with local storage or log files need web-based delta sync.
The web is the most pervasive and OS-independent cloud storage access method.
Web-based delta sync is essential for cloud storage web clients and web apps
is not offered by today’s cloud storage services.
for cloud storage services.
hash algorithms, we make the computation overhead affordable at the server side.
tech: JavaScript + HTML5 + WebSocket
[Figure: system architecture. Web browser (JavaScript implementation) with the local file system and HTML5 File API; web server (C implementation); WebSocket between browser and server; storage backend (Aliyun OSS / OpenStack Swift) reached over a high-speed internal network.]
Sync time of WebRsync vs rsync
Average Client CPU utilization
// Print a timestamp every 100 ms.
setInterval(() => console.log(Date.now()), 100);
// Print the timestamp of every keystone (the start or end of a task).
console.log(task.id, Date.now()); // at the start of the task
console.log(task.id, Date.now()); // at the end of the task
StagMeter
[Figure: StagMeter timeline and comparison; axis: sync process (seconds). Labels: "Wait server", high CPU utilization when computing. While the client is computing, timestamp printing is suspended and the web page is in a stagnation state.]
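The measurement idea behind StagMeter can be sketched as follows: a timer is asked to fire every 100 ms, and any much longer gap between two consecutive ticks suggests that the page's JavaScript thread was busy in between. This is a minimal sketch under that assumption, not the authors' tool; `findStagnation` and its `toleranceMs` parameter are hypothetical names introduced here.

```javascript
// Hypothetical helper: given the timestamps (ms) actually printed by a
// timer scheduled every `intervalMs`, return the periods during which
// the timer could not fire, i.e. the page was probably stagnating.
function findStagnation(timestamps, intervalMs = 100, toleranceMs = 50) {
  const gaps = [];
  for (let i = 1; i < timestamps.length; i++) {
    const elapsed = timestamps[i] - timestamps[i - 1];
    if (elapsed > intervalMs + toleranceMs) {
      gaps.push({ start: timestamps[i - 1], end: timestamps[i], lengthMs: elapsed });
    }
  }
  return gaps;
}

// Example: ticks arrive at 0, 100 and 200 ms, then nothing until 1200 ms.
const gaps = findStagnation([0, 100, 200, 1200, 1300]);
console.log(gaps);
```

The tolerance exists because browser timers are never perfectly punctual; the value of 50 ms is an arbitrary choice for illustration.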
[Figure: sync message flow between Client and Server. Labels: request for syncing file f’; segmentation and fingerprinting; checksum list of f; searching and comparing; generating tokens and literal bytes; constructing the new file f; ACK.]
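The steps named in the flow (segmentation and fingerprinting, searching and comparing, generating tokens and literal bytes, constructing the new file) can be sketched end to end as below. This is an illustrative sketch, not the WebRsync code: the block size is tiny, a plain byte sum stands in for Adler32, 32-bit FNV-1a stands in for MD5, and all function names are hypothetical. It deliberately does not say which side is the client and which is the server.

```javascript
const BLOCK = 4; // tiny block size, for illustration only

// Weak checksum: plain byte sum (a stand-in for Adler32).
function weak(bytes, start, end) {
  let s = 0;
  for (let i = start; i < end; i++) s = (s + bytes[i]) & 0xffff;
  return s;
}

// Strong checksum: 32-bit FNV-1a (a stand-in for MD5).
function strong(bytes, start, end) {
  let h = 0x811c9dc5;
  for (let i = start; i < end; i++) {
    h ^= bytes[i];
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Side holding the old file: segmentation and fingerprinting.
function checksumList(oldBytes) {
  const table = new Map(); // weak checksum -> [{ index, strong }]
  for (let i = 0; i + BLOCK <= oldBytes.length; i += BLOCK) {
    const w = weak(oldBytes, i, i + BLOCK);
    if (!table.has(w)) table.set(w, []);
    table.get(w).push({ index: i / BLOCK, strong: strong(oldBytes, i, i + BLOCK) });
  }
  return table;
}

// Side holding the new file: search, compare, emit tokens and literals.
function makeDelta(newBytes, table) {
  const delta = [];
  let literal = [];
  let pos = 0;
  while (pos < newBytes.length) {
    let hit = null;
    if (pos + BLOCK <= newBytes.length) {
      const candidates = table.get(weak(newBytes, pos, pos + BLOCK)) || [];
      if (candidates.length > 0) {
        const s = strong(newBytes, pos, pos + BLOCK);
        hit = candidates.find((c) => c.strong === s) || null;
      }
    }
    if (hit) {
      if (literal.length) { delta.push({ literal }); literal = []; }
      delta.push({ token: hit.index });
      pos += BLOCK; // matched: advance by one block
    } else {
      literal.push(newBytes[pos]);
      pos += 1; // no match: advance by one byte
    }
  }
  if (literal.length) delta.push({ literal });
  return delta;
}

// Side holding the old file: rebuild the new file from tokens and literals.
function applyDelta(oldBytes, delta) {
  const out = [];
  for (const op of delta) {
    if ("token" in op) out.push(...oldBytes.slice(op.token * BLOCK, (op.token + 1) * BLOCK));
    else out.push(...op.literal);
  }
  return out;
}

const toBytes = (s) => Array.from(s, (c) => c.charCodeAt(0));
const oldFile = toBytes("abcdefghijkl");
const newFile = toBytes("abcdZefghijkl"); // one byte inserted
const delta = makeDelta(newFile, checksumList(oldFile));
const rebuilt = applyDelta(oldFile, delta);
console.log(JSON.stringify(delta));
```

In this example only the single inserted byte travels as a literal; the three unchanged blocks are sent as tokens.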
WebRsync
server to client.
[Figure: sync message flow between Client and Server. Labels: request for syncing file f’; segmentation and fingerprinting; generating tokens and literal bytes; constructing the new file f; ACK; searching and comparing; checksum list of f.]
[Figures: sync time (seconds) vs. edit size (bytes).]
Checksum searching and block comparison occupy 80% of the computing time
[Figure: breakdown of computing time. Labels: MD5 computing, checksum search.]
A comparison of pseudorandom hash functions

Hash Function   Collision Probability   Cycles per Byte
MD5             Low                     5.58
Murmur3         High                    0.33
Spooky          High                    0.14
SipHash         Low                     1.13

SipHash keeps a low collision probability at a much faster speed than MD5.
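To give a feel for the cycles-per-byte figures in the comparison, the sketch below converts them into an estimated time to hash one file. The 10 MB file size and the 2 GHz clock are assumptions chosen for illustration, and the result is a rough estimate only; actual times in a browser or on a loaded server will differ.

```javascript
// Cycles-per-byte figures taken from the hash function comparison.
const cyclesPerByte = { MD5: 5.58, Murmur3: 0.33, Spooky: 0.14, SipHash: 1.13 };

// Assumptions for illustration only: a 10 MB file and a 2 GHz core.
const FILE_BYTES = 10 * 1024 * 1024;
const CLOCK_HZ = 2e9;

// Estimated time (ms) to hash the whole file once.
function hashTimeMs(name) {
  return ((cyclesPerByte[name] * FILE_BYTES) / CLOCK_HZ) * 1000;
}

for (const name of Object.keys(cyclesPerByte)) {
  console.log(name, hashTimeMs(name).toFixed(2), "ms");
}

// Relative cost of MD5 versus SipHash; this ratio does not depend on
// the assumed file size or clock speed.
const speedup = cyclesPerByte.MD5 / cyclesPerByte.SipHash;
console.log("MD5 / SipHash:", speedup.toFixed(2));
```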
collisions (Probability p), so does MD5.
probability p’).
functions as a global verification.
[Figure: hash table of per-block checksums. Block1 to Block4 each have an Adler32 checksum (Adler32-1 to Adler32-4) and an MD5 checksum (MD5-1 to MD5-4). Steps: checksum search, then compare.]
95% of synchronized files have fewer than 10 edits.
Basic experiment setup visualized in a map of China
[Figure: sync time (seconds, log scale from 10^-1 to 10^1) vs. edit size (bytes, 1 to 100K) for WebRsync, WebR2sync, WebR2sync+, and rsync.]
WebR2sync+ is 2-3 times faster than WebR2sync and 15-20 times faster than WebRsync
[Figure: results vs. number of concurrent users (2000 to 8000) for NoWebRsync, WebRsync, WebR2sync, WebR2sync+, and rsync.]
This throughput is 4 times that of WebR2sync/rsync and 9 times that of NoWebRsync.
scenarios from the perspective of sync
locality-based optimization.
the traditional cloud storage architecture
discussion
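[Figure: block matching between Client and Server. Each block (Block1, Block2, Block3, …) has an Adler32 and an MD5 checksum. Steps: weak checksum search, then strong checksum compare, each with YES and NO branches. A match produces a matched token and a 1-block offset; a miss produces literal bytes and a 1-byte offset. The new file is constructed from matched tokens and literal bytes. Rolling Adler32 is O(1): Adler(i) => Adler(i+1).]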
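The O(1) rolling step, Adler(i) => Adler(i+1), can be sketched as below. This is the rsync-style rolling checksum that Adler32 inspired, using a modulus of 2^16; true Adler-32 uses the modulus 65521 and a different initial value, so the numbers here are not Adler-32 values. The function names are hypothetical.

```javascript
const M = 1 << 16; // rsync-style modulus; true Adler-32 uses 65521

// Checksum of bytes[start, start + len) computed from scratch: O(len).
function weakInit(bytes, start, len) {
  let a = 0, b = 0;
  for (let i = 0; i < len; i++) {
    a = (a + bytes[start + i]) % M;
    b = (b + a) % M;
  }
  return { a, b };
}

// Slide the window one byte to the right in O(1):
// drop `outByte` on the left, take in `inByte` on the right.
function weakRoll({ a, b }, outByte, inByte, len) {
  const a2 = (a - outByte + inByte + M) % M;
  const b2 = (((b - len * outByte + a2) % M) + M) % M;
  return { a: a2, b: b2 };
}

// Pack both halves into one unsigned 32-bit value.
function digest({ a, b }) {
  return (b * M + a) >>> 0;
}

// Check that rolling agrees with recomputing at every offset.
const data = Array.from({ length: 64 }, (_, i) => (i * 37 + 11) % 256);
const LEN = 8;
let state = weakInit(data, 0, LEN);
let allEqual = true;
for (let i = 1; i + LEN <= data.length; i++) {
  state = weakRoll(state, data[i - 1], data[i - 1 + LEN], LEN);
  if (digest(state) !== digest(weakInit(data, i, LEN))) allEqual = false;
}
console.log(allEqual);
```

The rolling update is what makes the 1-byte offset affordable: after a miss, the checksum at the next byte position costs a few arithmetic operations instead of a pass over the whole block.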
[Figure: block matching between Client and Server over Block 1 to Block 4 on each side. Steps: weak checksum search, then strong checksum compare, with YES and NO branches; labels include "1 byte offset" and "No further operation". When a match is found, the associated index is recorded. Construct new files.]
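One plausible reading of recording the index of a match is a locality shortcut: if the previous block of the new file matched block j of the old file, the next block is first compared directly against block j+1, and the table search is skipped when that comparison succeeds. The sketch below illustrates that assumption only; it works on block-aligned checksums, uses strings in place of real checksums, and `matchWithLocality` is a hypothetical name.

```javascript
// oldSums[j] is the strong checksum of block j of the old file;
// newSums[i] is the strong checksum of block i of the new file.
// Strings stand in for real checksums in this illustration.
function matchWithLocality(newSums, oldSums) {
  const table = new Map(oldSums.map((sum, j) => [sum, j])); // search structure
  const matches = [];
  let searches = 0;
  let prev = -1; // old-file index of the previous match, or -1 if none
  for (const sum of newSums) {
    let j;
    if (prev >= 0 && oldSums[prev + 1] === sum) {
      j = prev + 1; // locality hit: no table search needed
    } else {
      searches++;
      j = table.has(sum) ? table.get(sum) : -1;
    }
    matches.push(j);
    prev = j;
  }
  return { matches, searches };
}

const oldSums = ["A", "B", "C", "D", "E"];
const newSums = ["A", "B", "X", "C", "D", "E"]; // one new block inserted
const result = matchWithLocality(newSums, oldSums);
console.log(result);
```

With a single edit in the middle of the file, only three of the six blocks need a table search; the rest are resolved by the direct comparison.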
[Figure: sync time (seconds, 0.05 to 0.2) vs. edit size (bytes, 1 to 100K), broken down into server, network, and client.]
The WebR2sync+ client takes a stable and shorter time. Because of the server-side optimization, computing time is much shorter on both the client and the server.