SLIDE 1

Content Replication in I2-DSI using Rsync+

Bert J Dempsey, Debra Weiss

University of North Carolina at Chapel Hill
dempsey@ils.unc.edu

Multiple-site replication in I2-DSI

http://dsi.internet2.edu/

SLIDE 2

Replicating Channels

[Diagram: channel provider, master node, replication sites S1, S2, S3, and clients]

Content import from channel provider to master I2-DSI node

Content replication to all replication sites that carry the channel

Rsync+ for Content Replication

[Diagram: channel provider, master node M, replication sites S1, S2, S3, and clients; updates travel from M to the sites via (m)ftp]

At M: master-side rsync+ -F srcLatest/ src/ produces updates

At each of S1, S2, S3: slave-side rsync+ -f src/ applies updates

Rsync is a popular filesystem synchronization tool

Rsync+ is our modification of rsync that captures update information locally for store-and-forward communication
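The store-and-forward idea can be illustrated with a toy model. This is only a sketch under stated assumptions, not the rsync+ implementation: in-memory dicts stand in for the src/ and srcLatest/ directories, whole files are shipped rather than rsync-style block deltas, and the function names (snapshot, capture_updates, apply_updates) are hypothetical.

```python
import hashlib

def snapshot(tree):
    """Map each path to a digest of its content (tree: dict of path -> bytes)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in tree.items()}

def capture_updates(src_latest, src):
    """Master side: record what must change to turn src into src_latest."""
    old, new = snapshot(src), snapshot(src_latest)
    changed = {p: src_latest[p] for p in new if old.get(p) != new[p]}
    deleted = sorted(p for p in old if p not in new)
    return {"write": changed, "delete": deleted}

def apply_updates(replica, updates):
    """Slave side: replay a captured update against a local copy."""
    for path in updates["delete"]:
        replica.pop(path, None)
    replica.update(updates["write"])
    return replica

# Hypothetical channel content before and after a change at the master.
master_old = {"a.txt": b"v1", "b.txt": b"same", "c.txt": b"gone"}
master_new = {"a.txt": b"v2", "b.txt": b"same", "d.txt": b"new"}

# Captured once at the master, then replayed independently at each site.
updates = capture_updates(master_new, master_old)
replicas = [dict(master_old) for _ in range(3)]  # stand-ins for S1, S2, S3
replicas = [apply_updates(r, updates) for r in replicas]
```

The point of the sketch is that the update record is computed once, without contacting any replica, and can then be shipped by any file transfer mechanism.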

SLIDE 3

Server Experiment

Instrumented Mirror

Active Linux repository (8 GB, 25,000 files)

Twice-daily synchronization

On dsi.ncni.net:

rsync+ -F: perform master-side rsync+ processing between two local directories to create the updates file

rsync+ -f: use the updates file to perform slave-side rsync+ processing

Content Change Patterns

Data from a 1-month Linux mirror

Updates per 12-hour period: no files changed in 13 of 60 periods (21%)

Average size of updated data (all periods): 0.144% of aggregate archive; 0.104% under rsync+

Maximum size of updated data: 2.42% of mirror
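To give a feel for the absolute volumes, the percentages above can be converted to megabytes. This is a rough sketch that assumes the 8 GB archive size quoted for the experiment applies to every figure and that 1 GB = 1000 MB; the helper name update_mb is hypothetical.

```python
ARCHIVE_GB = 8.0  # archive size quoted for the instrumented mirror

def update_mb(percent, archive_gb=ARCHIVE_GB):
    """Convert a percentage of the archive into megabytes (1 GB = 1000 MB)."""
    return archive_gb * 1000 * percent / 100

avg_plain = update_mb(0.144)       # average update, as % of aggregate archive
avg_rsync_plus = update_mb(0.104)  # average update under rsync+
max_update = update_mb(2.42)       # largest update observed

print(f"average update:        {avg_plain:.2f} MB")
print(f"average under rsync+:  {avg_rsync_plus:.2f} MB")
print(f"maximum update:        {max_update:.1f} MB")
```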

SLIDE 4

Rsync+ processing cost

[Chart: run time of rsync+ -F and rsync+ -f as % of rsync run time, per mirror update (one per 12 hours)]

Rsync+ Local Throughput

[Chart: runtime (sec), unnormalized tput, and normalized tput (Mbits/sec), per mirror update (12 hours/update)]

SLIDE 5

Network Throughput: ttcp experiment parameters

Parameter                        Values
File size                        5.45 MB
Network path (100 Mbit/s min)    dsi.ncni.net to ils.unc.edu
Concurrent ttcp connections      1, 2, 4, 8, 16, 24, 32
Receiver socket buffer size      240 KB
Buffer policy                    240 KB / 240 KB shared

Network Throughput: concurrent ttcp transmits

[Chart: Tput (Mbits/s) vs. number of concurrent ttcp transmits; data labels listed in chart order]

Concurrent ttcp transmits    Tput (Mbits/s)
1                            27.03944
2                            8.23088
4                            9.62592
8                            6.29856
16                           4.50688
24                           2.8464
32                           2.24
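A quick way to read these figures is to compare per-connection and aggregate throughput. The sketch below rests on two assumptions that the slide does not state outright: that the data labels map, in order, to 1, 2, 4, 8, 16, 24, and 32 concurrent transmits, and that each label is a per-connection rate, so the aggregate is simply the count times that rate.

```python
# Per-connection throughput (Mbits/s) taken from the chart's data labels.
# ASSUMPTION: labels map, in order, to 1, 2, 4, 8, 16, 24, 32 concurrent transmits.
per_connection = {
    1: 27.03944,
    2: 8.23088,
    4: 9.62592,
    8: 6.29856,
    16: 4.50688,
    24: 2.8464,
    32: 2.24,
}

def aggregate_tput(n, per_conn_mbps):
    """ASSUMPTION: aggregate rate is n connections times the per-connection rate."""
    return n * per_conn_mbps

aggregate = {n: aggregate_tput(n, t) for n, t in per_connection.items()}

for n, total in aggregate.items():
    print(f"{n:2d} transmits: {per_connection[n]:7.3f} per connection, "
          f"{total:6.2f} aggregate (Mbits/s)")
```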

SLIDE 6

Network experiments: setting socket buffer sizes

[Chart: dsi2ils sept14 (1 ttcp, avg. over 6 runs); Tput (Kbytes/s) vs. buffer size (bytes)]

Network Throughput: concurrent ttcp tputs

[Chart: Tput (Mbits/s, log scale) vs. concurrent ttcp transmits (avg over runs); series: Tput and aggregate Tput under Buffer Policy 1 and Buffer Policy 2]

SLIDE 7

Baseline Scalability Analysis using empirical inputs

Update of content: 0.1% avg, 2.4% maximum

Network tput: 8 servers, thus 6.2 Mbits/sec to each

Server tput (local rsync actions): master 11.4 Mbits/sec, slave 8.18 Mbits/sec

Baseline Scalability Analysis: end-to-end update latency

Each cell gives avg / max.

Content        Updates            Master processing      Network               Slave processing       End-to-end
channel size                      latency                latency               latency                update latency
10 GB          10 MB / 240 MB     7 sec / 168 sec        12.9 sec / 309 sec    9.7 sec / 233 sec      29.6 sec / 710 sec
100 GB         100 MB / 2.4 GB    70 sec / 28 min        129 sec / 51.5 min    97 sec / 38.8 min      296 sec / 118.3 min
1 TB           1 GB / 24 GB       11.7 min / 280.8 min   21.5 min / 516 min    16.1 min / 386.4 min   49.3 min / 19.72 hrs
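The latencies in this analysis appear to follow from dividing the update size by each stage's throughput and summing the three stages. The sketch below reproduces that arithmetic under assumptions of our own: stages run strictly one after another, 1 MB = 8 Mbits with decimal units, and the function names are hypothetical. It matches the slide's figures to within rounding, with small differences in the slave column.

```python
MASTER_MBPS = 11.4   # master-side local rsync+ throughput (Mbits/sec)
NETWORK_MBPS = 6.2   # per-server network throughput with 8 servers (Mbits/sec)
SLAVE_MBPS = 8.18    # slave-side local rsync+ throughput (Mbits/sec)
AVG_PCT, MAX_PCT = 0.1, 2.4  # update size as % of channel content

def stage_seconds(update_mb, mbps):
    """Time to push update_mb megabytes through a stage running at mbps."""
    return update_mb * 8 / mbps

def end_to_end(update_mb):
    """Per-stage and total latency in seconds, assuming sequential stages."""
    stages = {
        "master": stage_seconds(update_mb, MASTER_MBPS),
        "network": stage_seconds(update_mb, NETWORK_MBPS),
        "slave": stage_seconds(update_mb, SLAVE_MBPS),
    }
    stages["total"] = sum(stages.values())
    return stages

results = {}
for channel_gb in (10, 100, 1000):
    channel_mb = channel_gb * 1000
    results[channel_gb] = {
        "avg": end_to_end(channel_mb * AVG_PCT / 100),
        "max": end_to_end(channel_mb * MAX_PCT / 100),
    }
    avg_s = results[channel_gb]["avg"]["total"]
    max_s = results[channel_gb]["max"]["total"]
    print(f"{channel_gb:5d} GB channel: avg {avg_s:8.1f} sec, max {max_s / 60:7.1f} min")
```

Under these assumptions, a maximum-size update to a 100 GB channel completes in about two hours, well inside the 12-hour synchronization period used in the mirror experiment, while a maximum-size update to a 1 TB channel takes longer than 12 hours.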

SLIDE 8

Conclusions

Our work creates a scalable design for a filesystem-level tool for data synchronization

Current systems, without tuning, suggest that O(100 GB) of content can be handled for the initial server set

For TB content, system advances will need to provide speed-ups:

Tuning
Hardware
Distributed processing