
To Relay or Not to Relay for Inter-Cloud Transfers? Fan Lai, Mosharaf Chowdhury, Harsha Madhyastha



  1. To Relay or Not to Relay for Inter-Cloud Transfers? Fan Lai, Mosharaf Chowdhury, Harsha Madhyastha

  2. Background
     • Over 40 data centers (DCs) on EC2, Azure, and Google Cloud
     • A geographically denser set of DCs across clouds
     • Cloud apps are hosted on multiple DCs
       - Web search, interactive multimedia
       - Low-latency access, privacy regulations
     • Massive data across geo-distributed DCs

  3. WAN is Crucial for Geo-distributed Services
     • Bandwidth-intensive transfers
       - Geo-distributed replication: web search, cloud storage
       - Inter-DC routing: SWAN [SIGCOMM’13], Pretium [SIGCOMM’16], etc.
       - Big data analytics: Iridium [SIGCOMM’15], Clarinet [OSDI’16], …
       - …
     • Latency-sensitive traffic
       - Interactive services: Skype, Hangouts
       - Transaction processing: SPANStore [SOSP’13], Carousel [SIGMOD’18], etc.
       - …

  4. Prior Efforts: WAN Bandwidth Varies Spatially
     • WAN bandwidth (b/w) varies significantly between regions
     • Close regions have more than 12× the b/w of distant regions [1]
     [Figure: Bandwidth measurement across 11 EC2 regions [1]; example: from Sao Paulo to Singapore, relaying through a VM in Virginia (WAN → VM → WAN) gives ≈3x the b/w of the direct VM → WAN → VM path]
     [1] “Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds.” NSDI’17

  5. WAN Bandwidth Varies Spatially
     • Reproduce prior measurements
       - 11 EC2 regions, 110 inter-DC pairs
       - Tool: iperf (TCP)
     • Heterogeneous link capacity
       - Varies even when using the same VM type
       - Lower b/w between distant regions
     • Relay should work pretty well
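The measurement setup on slide 5 can be sketched as a small driver. This is a minimal sketch, not the authors' tooling: it assumes iperf3 (the slide says only "iperf"), uses hypothetical region names, and only builds commands and parses results rather than launching VMs. It also shows where the 110 pairs come from: 11 regions give 11 × 10 ordered (source, destination) pairs, since each direction is measured separately.

```python
import itertools
import json

# Hypothetical region names; substitute the 11 EC2 regions actually measured.
REGIONS = [f"region-{i}" for i in range(11)]


def ordered_pairs(regions):
    """All (src, dst) pairs with src != dst; each direction is measured separately."""
    return list(itertools.permutations(regions, 2))


def iperf3_client_cmd(server_host, seconds=10, streams=1):
    """Build an iperf3 TCP client command.

    -c: server to connect to, -t: duration in seconds,
    -P: number of parallel TCP streams, -J: JSON output.
    """
    return ["iperf3", "-c", server_host, "-t", str(seconds),
            "-P", str(streams), "-J"]


def received_mbps(iperf3_json_text):
    """Receiver-side throughput in Mbit/s from iperf3 JSON output."""
    report = json.loads(iperf3_json_text)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


print(len(ordered_pairs(REGIONS)))  # 110
print(iperf3_client_cmd("10.0.0.5", seconds=30))
```

Running a returned command with `subprocess.run(cmd, capture_output=True, text=True)` against an `iperf3 -s` server in the destination region, and passing its stdout to `received_mbps`, would fill in one cell of the bandwidth matrix.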

  6. About 40% of transfers between EC2 regions can get more than a 1.5x bandwidth increase via relay
     [Figure: Bandwidth improvement via best relay on EC2]
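Slides 4 and 6 measure the gain from the best single relay. A minimal sketch of that computation, assuming a relayed transfer is pipelined so its rate is the slower of its two hops; this bottleneck model and the function names are assumptions, not the authors' code. The bandwidth numbers are hypothetical, picked so the Sao Paulo → Singapore case mirrors the ≈3x example on slide 4.

```python
def best_relay_gain(bw, src, dst):
    """Return (best_relay, gain) for one src -> dst transfer.

    bw[a][b] is the measured direct bandwidth from a to b. A relay path
    src -> r -> dst is assumed to run at min(bw[src][r], bw[r][dst]).
    best_relay is None when no relay beats the direct path (gain 1.0).
    """
    direct = bw[src][dst]
    best_relay, best_rate = None, direct
    for r in bw:
        if r in (src, dst):
            continue
        rate = min(bw[src][r], bw[r][dst])
        if rate > best_rate:
            best_relay, best_rate = r, rate
    return best_relay, best_rate / direct


def fraction_improved(bw, threshold=1.5):
    """Fraction of ordered pairs whose best relay beats direct by more than threshold."""
    pairs = [(s, d) for s in bw for d in bw if s != d]
    hits = sum(1 for s, d in pairs if best_relay_gain(bw, s, d)[1] > threshold)
    return hits / len(pairs)


# Hypothetical direct bandwidths in Mbit/s (not measurements).
bw = {
    "sao": {"sgp": 100, "vir": 400},
    "sgp": {"sao": 100, "vir": 300},
    "vir": {"sao": 400, "sgp": 300},
}

print(best_relay_gain(bw, "sao", "sgp"))  # ('vir', 3.0)
print(fraction_improved(bw))
```

Applied to a full 110-pair matrix, `fraction_improved` is the kind of statistic behind the "about 40% of transfers" figure, though the talk's exact methodology may differ.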

  7. How to identify and tackle this complicated WAN?
     - Heterogeneous across regions
     - Dynamic runtime environment
     - Great complexity in system design

  8. How to identify and tackle this complicated WAN?
     - Heterogeneous across regions
     - Dynamic runtime environment
     - Great complexity in system design
     Assumptions in prior measurements:
     - Default TCP settings work well
     - A single TCP connection is representative enough of the available b/w

  9. What if we break down these assumptions?
     - Default TCP settings work well
     - A single TCP connection is representative enough of the available b/w
     #1: Does the b/w still vary spatially?
     #2: Does the b/w still vary temporally?
     #3: How much room is there for WAN improvement via relay?

  10. Default TCP Settings May Be Sub-optimal
     • B/w varies across regions
       - Lower b/w between distant regions
     • RTT varies across regions
       - Max TCP window is bounded
       - TCP throughput is RTT-based
     [Figure: Google: Bandwidth to Iowa]
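Slide 10's point that a bounded window makes throughput RTT-based follows from the classic bound: a TCP connection can have at most one window of unacknowledged data in flight per round trip, so its rate is at most window / RTT. The sketch below assumes a hypothetical 1 MiB maximum window; actual limits depend on the OS and its TCP settings.

```python
def max_tcp_throughput_bps(window_bytes, rtt_s):
    """Upper bound on one TCP connection's rate: one window per round trip."""
    return window_bytes * 8 / rtt_s


def window_needed_bytes(target_bps, rtt_s):
    """Bandwidth-delay product: window needed to sustain target_bps at this RTT."""
    return target_bps * rtt_s / 8


# Hypothetical 1 MiB max window, near region (20 ms) vs. distant region (200 ms).
for rtt_ms in (20, 200):
    mbps = max_tcp_throughput_bps(1 << 20, rtt_ms / 1000) / 1e6
    print(f"RTT {rtt_ms} ms -> at most {mbps:.1f} Mbit/s")
```

With these inputs the ceiling drops from about 419 Mbit/s at 20 ms to about 42 Mbit/s at 200 ms: a 10x longer RTT gives a 10x lower bound, which is why distant regions look slower to a single connection with a bounded window.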

  11. Default TCP Settings Are Sub-optimal
     • B/w varies across regions
       - Lower b/w between distant regions
     • RTT varies across regions
       - Max TCP window is bounded
       - TCP throughput is RTT-based
     • Per-TCP rate limit on the WAN
     [Figure: Google: Bandwidth to Iowa]

  12. A Single TCP Connection is Not Representative
     • A single TCP connection underutilizes the b/w
       - Use multiple TCP connections
     • Per-VM cap on the outbound rate
       - Per-TCP rate limit < per-VM cap
       - Aggregate b/w is homogeneous
       - The VM-cap applies across all connections
     [Figure: Google: Bandwidth to Iowa]
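Slides 11 and 12 combine three limits: window / RTT, a per-TCP rate limit, and a per-VM outbound cap shared by all connections. A toy model of how these interact, with hypothetical limits (1 MiB window, 200 Mbit/s per connection, 2 Gbit/s per VM) that are not measured values:

```python
def per_connection_bps(window_bytes, rtt_s, per_tcp_limit_bps):
    """One TCP connection: bounded by window / RTT and by a per-TCP rate limit."""
    return min(window_bytes * 8 / rtt_s, per_tcp_limit_bps)


def aggregate_bps(n_conns, window_bytes, rtt_s, per_tcp_limit_bps, vm_cap_bps):
    """n parallel connections from one VM, all sharing its outbound cap."""
    return min(n_conns * per_connection_bps(window_bytes, rtt_s, per_tcp_limit_bps),
               vm_cap_bps)


# Hypothetical limits: 1 MiB window, 200 Mbit/s per TCP, 2 Gbit/s per VM.
WINDOW, PER_TCP, VM_CAP = 1 << 20, 200e6, 2e9

for n in (1, 64):
    near = aggregate_bps(n, WINDOW, 0.020, PER_TCP, VM_CAP) / 1e6
    far = aggregate_bps(n, WINDOW, 0.200, PER_TCP, VM_CAP) / 1e6
    print(f"{n:>2} connection(s): near {near:.0f} Mbit/s, far {far:.0f} Mbit/s")
```

With one connection, the near and distant destinations differ roughly fivefold (200 vs. about 42 Mbit/s); with 64 connections both saturate the 2 Gbit/s cap, which is the homogeneity slide 12 describes.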

  13. What if we break down these assumptions?
     - Default TCP settings work well
     - A single TCP connection is representative enough of the available b/w
     #1: Does the b/w still vary spatially? Often homogeneous
     #2: Does the b/w still vary temporally?
     #3: How much room is there for WAN improvement via relay?

  14. Available B/w is Often Stable
     • Measurement setup
       - Create/terminate connections
       - Inter-DC connections share the VM-cap
     [Figure: Google: Throughput from Iowa; annotation: "Create new connections"]

  15. Available B/w is Often Stable
     • Measurement setup
       - Create/terminate connections
       - Inter-DC connections share the VM-cap
     [Figure: Google: Throughput from Iowa; annotation: "Terminate connections"]

  16. Available B/w is Often Stable
     • Measurement setup
       - Create/terminate connections
       - Inter-DC connections share the VM-cap
     • Max b/w (VM-cap) is stable
     [Figure: Google: Throughput from Iowa; annotation: "Aggregate b/w is stable"]
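One way to check a claim like "aggregate b/w is stable" on slide 16 is to summarize a throughput time series by its coefficient of variation (standard deviation divided by mean). This metric and the 5% threshold are illustrative choices, not the methodology from the talk, and the sample series are made up.

```python
import statistics


def coefficient_of_variation(samples):
    """Population std. deviation divided by the mean; 0.0 means a flat series."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


def is_stable(samples, max_cv=0.05):
    """Call a series stable if its CV is at most max_cv (illustrative threshold)."""
    return coefficient_of_variation(samples) <= max_cv


# Made-up samples (Mbit/s): aggregate VM throughput while connections come and go,
# versus the rate of one connection as it shares the VM-cap with more peers.
aggregate = [1990, 2000, 2005, 1995, 2010]
single_connection = [2000, 1000, 500, 250]

print(is_stable(aggregate), is_stable(single_connection))  # True False
```

The contrast mirrors slides 14 to 16: individual connections change rate as peers are created or terminated, while their sum stays pinned near the VM-cap.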

  17. Homogeneous Bandwidth
     Maximum available bandwidth:
     - Homogeneous across regions
     - Stable over time
     - Varies with VM instance type
     - Performance can be predictable without great system complexity
     What will happen if the b/w is homogeneous?

  18. Little Scope for Optimization via Inter-DC Relay
     • Homogeneous bandwidth
     [Figure: Latency measurement across 40 DCs]
     What will happen if the b/w is homogeneous?

  19. Takeaway
     • Intra-DC relay from poorly performing VMs to high-performance VMs
       - Gains more inter-DC bandwidth without extra transfer costs
     • Routing through a third DC takes your money away
     [Figure: intra-DC relay from DC 1 to DC 2 costs 0 + $ + 0 = $; inter-DC routing through DC 3 costs $ + $ = 2$]
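The cost argument on slide 19 counts billed inter-DC hops. A minimal sketch of that accounting, assuming (as the slide does) that traffic between VMs in the same DC is free; real cloud pricing has more cases, and the VM names and unit price here are hypothetical.

```python
def transfer_cost(path, price_per_gb, gigabytes):
    """Cost of moving `gigabytes` along a path of (dc, vm) hops.

    Simplified model from the slide: a hop between VMs in the same DC is free,
    and each hop that crosses DCs is charged price_per_gb.
    """
    inter_dc_hops = sum(
        1 for (src_dc, _), (dst_dc, _) in zip(path, path[1:]) if src_dc != dst_dc
    )
    return inter_dc_hops * price_per_gb * gigabytes


# Hypothetical VM names; one price unit per GB stands in for the slide's "$".
intra_dc_relay = [("DC1", "slow-vm"), ("DC1", "fast-vm"),
                  ("DC2", "fast-vm"), ("DC2", "slow-vm")]
via_third_dc = [("DC1", "vm"), ("DC3", "vm"), ("DC2", "vm")]

print(transfer_cost(intra_dc_relay, 1.0, 1), transfer_cost(via_third_dc, 1.0, 1))  # 1.0 2.0
```

The intra-DC relay path has four hops but only one crosses DCs, so it costs the same as a direct transfer; the third-DC route crosses twice and doubles the bill.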

  20. Takeaway
     • Turn to optimizing bandwidth contention inside VMs
       - VM-cap vs. link-level optimizations used in existing geo-distributed analytics (GDA) work
       - VM-aware vs. WAN-aware
     • Bandwidth measurements are far from complete
       - More than 40 VM instance types
     [Figure: a VM sending flows b_1, b_2, …, b_n to other VMs, subject to $\sum_{i=1}^{n} b_i \le \text{VM-cap}$]
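Slide 20's constraint (the sum of all outbound flow rates b_i is at most the VM-cap) means a VM's flows compete for one shared budget. As one illustration of VM-aware allocation, here is max-min fair water-filling over that budget; it is a stand-in technique chosen for the example, not the scheme proposed in the talk, and the demand values are hypothetical.

```python
def max_min_fair(demands, vm_cap):
    """Split a VM's outbound cap across flows by max-min fairness (water-filling).

    demands[i] is flow i's desired rate; the result b satisfies sum(b) <= vm_cap.
    """
    alloc = [0.0] * len(demands)
    # Visit flows from smallest to largest demand.
    remaining = sorted(range(len(demands)), key=lambda i: demands[i])
    capacity = float(vm_cap)
    while remaining:
        share = capacity / len(remaining)
        i = remaining[0]
        if demands[i] <= share:
            # Smallest demand fits in an equal share: satisfy it fully.
            alloc[i] = float(demands[i])
            capacity -= demands[i]
            remaining.pop(0)
        else:
            # Every remaining flow wants more than an equal share: split evenly.
            for j in remaining:
                alloc[j] = share
            break
    return alloc


print(max_min_fair([100, 500, 1500], 1200))  # [100.0, 500.0, 600.0]
```

Small flows get their full demand and the leftover cap is split evenly among the rest, so the total never exceeds the VM-cap no matter how many inter-DC destinations a VM talks to.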

  21. #1: Does the b/w still vary spatially? Often homogeneous
      #2: Does the b/w still vary temporally? Often stable
      #3: How much room is there for WAN improvement via relay? Case by case
      Thank you! Questions? fanlai@umich.edu
