Application Migration for Geographically Shifting Workloads Robbert - - PowerPoint PPT Presentation



SLIDE 1

Follow the Sun through the Clouds: Application Migration for Geographically Shifting Workloads

Robbert van Renesse Cornell University Joint work with Zhiming Shen, Qin Jia, Gur-Eyal Sela, Ben Rainero Weijia Song, Hakim Weatherspoon


SLIDE 2

Infrastructure as a Service (IaaS) Clouds

  • Offer on-demand virtual machines (VMs)
  • Pay-as-you-go: charged by hours of use
  • Provide useful services such as auto-scaling and failure recovery


SLIDE 3

Handling Geographically Shifting Workloads

Follow the sun


SLIDE 4

Handling Geographically Shifting Workloads

Follow the sun


  • Lack of homogeneous interface
  • Lack of privileged control
  • Lack of infrastructure support
  • Lack of common resource management

SLIDE 5

Supercloud Overview

  • Application migration as a service across cloud providers and availability zones
  • Support all major virtualization platforms and all major public cloud providers
  • Live migration without changing IP addresses or breaking TCP connections
  • Automatic scheduling framework
  • Optimize metrics such as average perceived latency
  • Provide a cross-cloud storage and networking solution


SLIDE 6

Supercloud Overview

  • Computation
    • Nested hypervisor: Xen-Blanket
    • Supports all major platforms
  • Network
    • SDN overlay
    • Supports migration with public IPs
  • Storage
    • Geo-replicated storage
    • Optimized for serving VM images
  • Resource management
    • OpenStack platform

[Figure: Supercloud architecture. A Software Defined Network (SDN) and a geo-replicated image store span Cornell Red Cloud, Google Compute Engine, and Microsoft Azure. In each cloud, a first-layer provider hypervisor (Xen/PV-on-HVM, KVM/virtio, or Xen/Hyper-V) hosts a second layer consisting of Xen-Blanket, OpenStack, and the user VMs.]

SLIDE 7


SLIDE 8

Nested Virtualization

Xen-Blanket

  • Second Layer Hypervisor
  • Uniformity

[Figure: guest VMs (Dom0, DomU) run on Xen-Blanket, the second-layer hypervisor, which itself runs as a guest of the first-layer provider hypervisor: Xen, KVM, or Hyper-V.]

SLIDE 9

SLIDE 10

Supercloud Networking

  • Goals
    • Inter-connection
    • Optimized routing
    • Support for migration
  • VPN overlay
    • Full-mesh tunnels
    • Frenetic SDN controller
  • Transparent VM migration
    • Public IP address support


[Figure: VMs in Cloud 1 and Cloud 2 attach to local vSwitches; the vSwitches are interconnected by full-mesh tunnels.]
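The full-mesh overlay and migration-transparent routing above can be sketched as a toy model. This is illustrative only, not the Frenetic-based Supercloud controller: the class, its methods, and the region names are all made up. The key idea it captures is that migration only updates the controller's VM-to-vSwitch mapping; the VM's overlay IP never changes, so TCP connections survive.

```python
from itertools import combinations

class OverlayController:
    """Toy stand-in for the SDN controller: tracks which vSwitch
    each VM is attached to and answers routing queries."""

    def __init__(self, vswitches):
        # Full mesh: a direct tunnel between every pair of vSwitches.
        self.tunnels = set(combinations(sorted(vswitches), 2))
        self.location = {}  # VM overlay IP -> current vSwitch

    def attach(self, vm_ip, vswitch):
        self.location[vm_ip] = vswitch

    def migrate(self, vm_ip, new_vswitch):
        # Only the controller's mapping changes; the VM keeps its IP.
        self.location[vm_ip] = new_vswitch

    def next_hop(self, src_vswitch, dst_vm_ip):
        dst = self.location[dst_vm_ip]
        if dst == src_vswitch:
            return "local"
        return dst  # full mesh: every remote vSwitch is one tunnel hop away

ctl = OverlayController(["redcloud", "azure", "gce"])
ctl.attach("10.0.0.5", "redcloud")
print(ctl.next_hop("azure", "10.0.0.5"))   # redcloud
ctl.migrate("10.0.0.5", "gce")
print(ctl.next_hop("azure", "10.0.0.5"))   # gce
```

Note that a full mesh over n vSwitches needs n*(n-1)/2 tunnels, which is why the slides pair it with a central controller rather than per-switch configuration.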

SLIDE 11

VM Migration with Public IP Address


[Figure: a public-IP front-end accepts traffic for 54.172.26.213 and forwards it to the VM.]

SLIDE 12

VM Migration with Public IP Address


[Figure: after migration the VM keeps its public IP 54.172.26.213; a second public-IP front-end (52.69.94.195) in the destination cloud now forwards its traffic.]
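The front-end mechanism on these two slides can be modeled in a few lines. This is a toy sketch, not the Supercloud implementation: the class and method names are invented, and a real front-end would rewrite and tunnel packets rather than update a dict. What it shows is the invariant: clients always target the same public IP; only the front-end currently serving that IP changes on migration.

```python
class PublicIPRouter:
    """Toy model of public-IP preservation: maps a VM's fixed
    public IP to the front-end that currently serves it."""

    def __init__(self):
        self.route = {}  # VM public IP -> front-end address

    def register(self, vm_pub_ip, frontend_addr):
        self.route[vm_pub_ip] = frontend_addr

    def migrate(self, vm_pub_ip, new_frontend_addr):
        # Clients keep using the same public IP; only the front-end
        # that tunnels traffic to the VM changes.
        self.route[vm_pub_ip] = new_frontend_addr

router = PublicIPRouter()
router.register("54.172.26.213", "54.172.26.213")  # front-end co-located
router.migrate("54.172.26.213", "52.69.94.195")    # VM moved clouds
print(router.route["54.172.26.213"])  # 52.69.94.195
```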

SLIDE 13

Centralized VM Image Storage


[Figure: all clouds fetch VM images from a single centralized store over the WAN, suffering long latency and low throughput.]

SLIDE 14

Geo-Replicated VM Image Storage


[Figure: the VM image is replicated at each cloud.]

Challenges:

  • Strong consistency requirement
  • Long latency and low throughput in the WAN
SLIDE 15

Decouple Consistency and Data Propagation


[Figure: in each cloud, VMs access storage through NFS/iSCSI. A Data View Layer holds local and global meta-data; a Data Store Layer holds the back-end storage and a propagation manager.]

Consistency (strong):
  • Version number
  • Location of the latest block
  • Local version number

Data propagation (eventual):
  • On-demand fetch
  • Pro-active data propagation
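The decoupling on this slide can be illustrated with a toy model. Everything here is an illustrative stand-in, not the Supercloud storage system: the shared dict plays the role of the strongly consistent meta-data (version number plus location of the latest block), while block data itself is only moved lazily, fetched on demand when a replica discovers its local copy is stale.

```python
class GeoReplica:
    """Toy model of one cloud's image store: strongly consistent
    meta-data decoupled from eventually consistent data."""

    def __init__(self, name, global_meta):
        self.name = name
        self.global_meta = global_meta  # shared: block -> (version, owner)
        self.blocks = {}                # local:  block -> (version, data)

    def write(self, block, data):
        version = self.global_meta.get(block, (0, None))[0] + 1
        self.global_meta[block] = (version, self.name)  # strong consistency
        self.blocks[block] = (version, data)            # data stays local

    def read(self, block, replicas):
        version, owner = self.global_meta[block]
        local = self.blocks.get(block)
        if local and local[0] == version:
            return local[1]  # local copy is already the latest version
        # On-demand fetch from the replica holding the latest block.
        data = replicas[owner].blocks[block][1]
        self.blocks[block] = (version, data)  # cache for later reads
        return data

meta = {}
us, asia = GeoReplica("us", meta), GeoReplica("asia", meta)
replicas = {"us": us, "asia": asia}
us.write("b0", "image-bytes")
print(asia.read("b0", replicas))  # image-bytes (fetched on demand)
```

A pro-active propagation manager, as in the figure, would simply perform these fetches in the background before a read needs them.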

SLIDE 16

Global Meta-Data Propagation

  • Challenge: long latency
  • Observations
    • Single writer
    • No read-write sharing
  • Relaxed consistency model: close-to-open consistency
  • Propagation policy
    • Commit locally
    • Flush to the centralized controller on close


[Figure: each cloud's local meta-data is backed by global meta-data kept strongly consistent at a centralized controller; VMs reach the Data View and Data Store Layers through NFS/iSCSI, and propagation managers move data between the clouds' back-end stores.]
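The close-to-open policy can be sketched as a toy model (the class is invented and the controller here is just a dict, not the actual Supercloud controller): every write commits to local meta-data with no WAN round trip, and the global view is updated exactly once, when the image is closed.

```python
class CloseToOpenImage:
    """Toy close-to-open consistency for the meta-data layer."""

    def __init__(self, controller, name):
        self.controller = controller  # stand-in for the central controller
        self.name = name
        self.local_version = controller.get(name, 0)

    def write(self):
        self.local_version += 1  # commit locally; no WAN traffic

    def close(self):
        # One flush per close amortizes the long WAN latency.
        self.controller[self.name] = self.local_version

controller = {}
img = CloseToOpenImage(controller, "disk.img")
img.write(); img.write(); img.write()
print(controller.get("disk.img", 0))  # 0: nothing flushed yet
img.close()
print(controller["disk.img"])         # 3: flushed once, on close
```

This relaxation is safe here precisely because of the observations on the slide: a VM image has a single writer and no read-write sharing, so no other reader can observe the window between write and close.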

SLIDE 17

Evaluation: ZooKeeper Migration


  • Application level vs. VM level migration

Code complexity
  • ZooKeeper dynamic reconfiguration: add/remove nodes requires 6700+ lines of code change; leader rotation not supported yet
  • Supercloud VM migration: no code change
Transparency
  • ZooKeeper dynamic reconfiguration: clients need to be notified
  • Supercloud VM migration: completely transparent
Performance
  • ZooKeeper dynamic reconfiguration: several seconds of downtime due to state synchronization and leader election
  • Supercloud VM migration: little performance impact

SLIDE 18

Comparing ZooKeeper Migration Mechanisms


  • Initially: Asia 1, US 2
  • 2-step reconfiguration: Asia +1, US -1
  • 3-step reconfiguration: Asia +2, US -2; then Asia -1, US +1
  • Supercloud: migrate the leader from US to Asia

[Figure: when the leader is separated from the majority, there is a 20-second performance degradation.]

SLIDE 19

Follow the Sun

  • Experimental setup
    • Global ZooKeeper deployment in the US and Asia
    • MSN trace
  • Compared deployments
    • US Ensemble: all ZooKeeper nodes in the US
    • Global Ensemble: majority in the US, one node in Asia
    • Dynamic Ensemble: using Supercloud VM migration


SLIDE 20

Follow the Sun


SLIDE 21

Supercloud Scheduler

  • Decides placement and migration automatically
  • Requires run-time monitoring and performance models for cloud resources

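As a toy illustration of what such a scheduler optimizes (the function, site names, RTTs, and request shares below are all made up, and a real scheduler would also model migration cost and resource constraints), one metric from the earlier slides, average perceived latency, can drive placement directly:

```python
def best_placement(sites, clients):
    """Pick the site minimizing request-weighted average latency.
    sites:   site -> {client_region: latency_ms}
    clients: client_region -> share of requests (sums to 1)."""
    def avg_latency(site):
        return sum(sites[site][region] * share
                   for region, share in clients.items())
    return min(sites, key=avg_latency)

# Hypothetical RTTs (ms) and a workload that has shifted toward Asia.
sites = {
    "us-east": {"us": 20, "asia": 180},
    "asia":    {"us": 180, "asia": 20},
}
clients = {"us": 0.2, "asia": 0.8}
print(best_placement(sites, clients))  # asia
```

Re-evaluating this as the monitored request mix shifts over the day is what lets the scheduler "follow the sun": when the Asian share drops at night, the same function picks the US site again.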

SLIDE 22


Memory Performance Measurements / Anomalies

SLIDE 23

Partners in crime

  • NIST ANTD (Advanced Network Technologies Division): monitoring and security

  • Abdella Battou
  • Fred de Vaulx
  • Lotfi Benmohamed
  • Charif Mahmoudi
  • Cornell Aristotle Project and XSEDE: academic cloud sharing and bursting
    • David Lifka (Cornell CIO)


SLIDE 24

Conclusion

  • Supercloud: application migration for geographically shifting workloads
    • Crossing heterogeneous cloud providers
    • Automatic scheduling
    • Geo-replicated image storage
    • Wide-area SDN
  • Visit our workshop tomorrow morning (Thursday)
    • We’ll also present exciting cloud performance comparison studies
  • More at http://supercloud.cs.cornell.edu


Thank You. Questions?