Characterizing Performance and Fairness of Big Data Transfer Protocols on Long-haul Networks Nevil Brownlee <n.brownlee@auckland.ac.nz> Se-young Yu <syu051@aucklanduni.ac.nz> Aniket Mahanti <a.mahanti@auckland.ac.nz>


SLIDE 1

Characterizing Performance and Fairness of Big Data Transfer Protocols on Long-haul Networks

Nevil Brownlee <n.brownlee@auckland.ac.nz>
Se-young Yu <syu051@aucklanduni.ac.nz>
Aniket Mahanti <a.mahanti@auckland.ac.nz>

SLIDE 2

Transferring over Long Fat Pipes

  • SKA, LHC, ITER, etc., generate petabytes (PB) of data!
  • Transferring data over long distances is hard
  • TCP is not efficient, UDP is not reliable
      – TCP uses small buffers by default
      – UDP doesn't provide congestion control
      – Neither protocol uses parallel transfer streams
  • We tested systems that address these limitations

SLIDE 3

Our 'International' Testbed

  • 10 Gb/s, 320 ms RTT international link
  • Tested GridFTP, FDT (TCP-based) and UDT (UDP-based)
  • Measured goodput, RTT and bytes in-flight
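
To see why default buffers are a problem on this link, it helps to work out the bandwidth-delay product (BDP) from the testbed figures above. The sketch below is illustrative only: the link rate and RTT come from the slide, while the 64 KiB window is an assumption (roughly the largest window TCP can advertise without window scaling), not a value measured on the testbed.

```python
# Link parameters taken from the testbed slide.
LINK_RATE_BPS = 10e9   # 10 Gb/s
RTT_S = 0.320          # 320 ms

# Assumption for illustration only: a 64 KiB window, not a testbed setting.
ASSUMED_WINDOW_BYTES = 64 * 1024


def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return rate_bps * rtt_s / 8


def window_limited_rate_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on the rate of a flow that sends at most one window per RTT."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    bdp_mb = bdp_bytes(LINK_RATE_BPS, RTT_S) / 1e6
    rate_mbps = window_limited_rate_bps(ASSUMED_WINDOW_BYTES, RTT_S) / 1e6
    print(f"BDP: {bdp_mb:.0f} MB")
    print(f"Rate with a 64 KiB window: {rate_mbps:.2f} Mb/s")
```

With these figures the BDP is about 400 MB, whereas a 64 KiB window would cap a single flow at roughly 1.6 Mb/s, far below the 10 Gb/s link rate.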

SLIDE 4

Disk vs Memory

  • Multiple flows improve performance for FDT (which uses TCP)
  • Disk speed limits performance
  • UDT does not improve mem-mem transfer rate

[Figure: goodput (Gb/s) of GridFTP, FDT and UDT for disk-to-disk and memory-to-memory transfers]

[Figure: goodput (Gb/s) versus number of flows for GridFTP and FDT]

SLIDE 5

What Happens in the Network

  • Multiple flows increase the aggregate congestion window (cwnd)
  • Recovery after loss is faster with multiple flows

[Figure: FDT bytes in transit (MB) and packet resends versus elapsed time (s); panels: single flow, five flows]
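
One common explanation for faster recovery with multiple flows is that a single loss makes only one flow back off, so the aggregate window shrinks less. The sketch below illustrates that reasoning under loudly stated assumptions that are not from the slides: equal per-flow windows, a Reno-style multiplicative decrease (`beta = 0.5`) and additive increase of one segment per RTT. CUBIC and Scalable TCP behave differently, and the function names are hypothetical.

```python
def aggregate_fraction_after_loss(n_flows: int, beta: float = 0.5) -> float:
    """Fraction of the aggregate window left after ONE flow backs off once.

    Assumes n_flows equal windows; the flow that sees the loss multiplies
    its window by beta, the others are unaffected.
    """
    if n_flows < 1:
        raise ValueError("n_flows must be at least 1")
    return 1 - (1 - beta) / n_flows


def rtts_to_recover(total_window_segments: float, n_flows: int,
                    beta: float = 0.5) -> float:
    """RTTs for the backed-off flow to regain its old window.

    Assumes Reno-style additive increase of one segment per RTT.
    """
    per_flow_window = total_window_segments / n_flows
    return (1 - beta) * per_flow_window


if __name__ == "__main__":
    for n in (1, 5):
        kept = aggregate_fraction_after_loss(n)
        rtts = rtts_to_recover(1000, n)
        print(f"{n} flow(s): {kept:.0%} of aggregate window kept, "
              f"{rtts:.0f} RTTs to recover")
```

Under these assumptions one loss leaves 50% of the aggregate window with a single flow and 90% with five flows, and the affected flow has a fifth as much window to regain.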

SLIDE 6

Multiple Flows have Side Effects

  • RTT increases for all flows
  • More packets are lost
  • Plots are for FDT; GridFTP plots are similar

[Figure: FDT RTT (ms) and packet resends versus elapsed time (s); panels: single flow, 5 flows]

SLIDE 7

Recommendations

  • Avoid UDP-based protocols
  • Allocate enough socket buffer space in the OS kernel, and force applications to use larger socket buffers
  • Keep the number of parallel flows low
  • Use a TCP-based protocol with a recent congestion control algorithm, e.g. CUBIC or Scalable TCP

– There are many possible definitions of network “fairness”
– We want to use most of the available capacity
– We consider our file transfers to be “fair”
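
As a rough illustration of the socket-buffer and congestion-control recommendations, an application can request larger buffers and a specific algorithm per socket. This is a minimal sketch, not how GridFTP or FDT do it: the 64 MiB request and the helper name `open_tuned_socket` are hypothetical, `socket.TCP_CONGESTION` is only available on some platforms (e.g. Linux), and the kernel may grant less than requested (on Linux the request is capped by `net.core.wmem_max` and `net.core.rmem_max`).

```python
import socket

# Hypothetical request; in practice size this toward the path's BDP.
REQUESTED_BUFFER_BYTES = 64 * 1024 * 1024


def open_tuned_socket(buffer_bytes: int = REQUESTED_BUFFER_BYTES,
                      congestion_control: str = "cubic") -> socket.socket:
    """Create a TCP socket and ask for larger buffers and a given algorithm.

    Every request is best-effort: the OS may clamp or reject it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    for option in (socket.SO_SNDBUF, socket.SO_RCVBUF):
        try:
            sock.setsockopt(socket.SOL_SOCKET, option, buffer_bytes)
        except OSError:
            pass  # Some systems reject oversized requests outright.
    if hasattr(socket, "TCP_CONGESTION"):
        try:
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION,
                            congestion_control.encode())
        except OSError:
            pass  # Algorithm not loaded or not permitted on this host.
    return sock


if __name__ == "__main__":
    s = open_tuned_socket()
    granted_snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    granted_rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    print(f"send buffer granted: {granted_snd} bytes")
    print(f"receive buffer granted: {granted_rcv} bytes")
    s.close()
```

Reading the values back with `getsockopt` shows what the kernel actually granted, which is why the first half of the recommendation (raising the OS limits) has to come before the second.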