Live Migration @Alibaba Cloud: issues settled & challenges remain

SLIDE 1

Live Migration @Alibaba Cloud: issues settled & challenges remain

Chao Zhang Email: zhuoxi.zc@alibaba-inc.com

SLIDE 2

  1. Challenges of Live Migration @Alibaba Cloud
  2. Performance Tuning & Robustness Improvements
  3. LM Application @Alibaba Cloud
  4. Future challenges

SLIDE 3

Traditional Live Migration in Virtualization

[Figure: SRC/DST timeline of a traditional live migration]

SRC: MEM pre-copy, Storage Shutdown, Network Shutdown, VM State Save, last-copy, Cleanup
DST: init, Reservation, MEM pre-copy, last-copy, VM State Restore, Storage Reopen, Network reconnect, VM Start

Timeline: VM Running on SRC Host, VM Downtime, VM Breaktime, VM Running on DST Host
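
The MEM pre-copy and last-copy phases named on this slide can be illustrated with a small simulation. This is a minimal sketch of the generic pre-copy idea, not the implementation used at Alibaba Cloud: the function name, its parameters, and the constant dirty-rate model are all assumptions made for illustration.

```python
def simulate_precopy(total_pages, dirty_pages_per_sec, pages_per_sec,
                     max_last_copy_pages, max_rounds=30):
    """Simulate iterative memory pre-copy (hypothetical model).

    Returns (rounds, last_copy_pages, estimated_downtime_sec).
    """
    to_send = total_pages  # the first round sends all guest memory
    rounds = 0
    while to_send > max_last_copy_pages and rounds < max_rounds:
        # Pages dirtied while this round was in flight must be re-sent.
        # Integer arithmetic: dirty_rate * (to_send / bandwidth).
        dirtied = dirty_pages_per_sec * to_send // pages_per_sec
        to_send = min(total_pages, dirtied)
        rounds += 1
    # The VM is paused for the last copy, so its size bounds the downtime.
    downtime = to_send / pages_per_sec
    return rounds, to_send, downtime


if __name__ == "__main__":
    print(simulate_precopy(1_000_000, 10_000, 100_000, 1_000))
```

When the dirty rate approaches the transfer rate, the loop stops converging and exits at `max_rounds`, which is why a round limit (or a forced pause) is needed in practice.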

SLIDE 4

Challenges of Live Migration @Alibaba Cloud

  • Migration must be transparent to the whole cloud system
  • Hardware & software backward compatibility
  • Robustness of live migration
  • Why/When/Which?

Not Just a Virtualized Instance Migrating

Cloud Services: VM, Security, SLB, Cloud Disk, VPC, Control System, Virtualization

SLIDE 5

Migration Operations Required @Alibaba Cloud

Planes: Control Plane, Virtualization Plane, Other Cloud Services

Operations shown in the figure:

  • Start Migration
  • Relay Forwarding
  • VM Status Manager
  • VM Pause
  • Migration Notify
  • Storage Pause
  • Storage RD_ONLY
  • Last Copy
  • VM-NC Switch
  • Storage Reopen
  • VM configuration
  • Install Flow Rules
  • Network Switch
  • Device Relocation
  • SRC VM destroy
  • Session Copy
  • Preparation
  • VM Start
  • Status Notify

SLIDE 6

Decoupling Migration by Defining a Standard Status Entrance

Components: Control System, Network, Storage, Virtualization

Stages: Migration Preparation, Pre Migration, Post Migration, VM Start, Resource Cleanup

Operations shown in the figure:

  • Start Migration
  • Migration Prepare
  • Relay Forwarding
  • Status Notify
  • VM Pause
  • Pre MEM COPY
  • SRC PAUSE
  • RD_only Open
  • Last Copy
  • VM-NC switch
  • Reopen
  • Session ReCreate
  • VM network Switch
  • SRC VM destroy
  • SESSION copy
  • SESSION Last Copy
  • Flow Rules Install
  • VM Start
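
One way to read the "standard status entrance" idea is that each component (control system, network, storage, virtualization) registers its own work against a shared set of stages, and a coordinator only walks the stages. The sketch below illustrates that reading. The class and method names are hypothetical, and the stage order is taken from the order in which the stage names appear on the slide, which is an assumption.

```python
from collections import defaultdict
from enum import Enum


class Stage(Enum):
    # Names follow the slide; the ordering is an assumption.
    PREPARATION = "Migration Preparation"
    PRE_MIGRATION = "Pre Migration"
    POST_MIGRATION = "Post Migration"
    VM_START = "VM Start"
    RESOURCE_CLEANUP = "Resource Cleanup"


class MigrationCoordinator:
    """Hypothetical coordinator: components plug in per-stage handlers."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, stage, component, handler):
        # A component only needs to know the stage, not the other components.
        self._handlers[stage].append((component, handler))

    def run(self, vm_id):
        # Walk the stages in definition order and call every handler.
        log = []
        for stage in Stage:
            for component, handler in self._handlers[stage]:
                handler(vm_id)
                log.append((stage.value, component))
        return log


if __name__ == "__main__":
    coordinator = MigrationCoordinator()
    coordinator.register(Stage.VM_START, "virtualization", lambda vm: None)
    coordinator.register(Stage.PRE_MIGRATION, "storage", lambda vm: None)
    coordinator.register(Stage.PRE_MIGRATION, "network", lambda vm: None)
    for entry in coordinator.run("vm-1"):
        print(entry)
```

Because handlers are keyed by stage, a new cloud service can join the migration flow by registering itself without any change to the virtualization code.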

SLIDE 7

Optimization of Live Migration in Virtualization

  • Critical path parallelism
  • Dismantling heavy operations
  • Rearrangement: Lazy/Pre
  • Push time-sensitive operations down from the control system to the virtualization plane

Figure labels: VM Last Copy, Compression, BDRV Flush, Pre Heavy Operation, BDRV flush, Add Pre Last Copy, SESSION COPY, Lazy Heavy Operation

Critical Path: SESSION Copy, Storage Reopen

Relay Forwarding
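
"Critical path parallelism" can be sketched as running independent steps of the downtime window concurrently instead of one after another. The example below is illustrative only: the two step functions are stand-ins with artificial delays, and treating SESSION copy and storage reopen as independent of each other is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def session_copy(vm_id):
    time.sleep(0.05)  # stand-in for the real work
    return f"{vm_id}: session copied"


def storage_reopen(vm_id):
    time.sleep(0.05)  # stand-in for the real work
    return f"{vm_id}: storage reopened"


def run_critical_path(vm_id, steps):
    """Run independent steps concurrently; results keep submission order."""
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = [pool.submit(step, vm_id) for step in steps]
        # result() re-raises any exception from a step, so failures surface.
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(run_critical_path("vm-1", [session_copy, storage_reopen]))
```

With this arrangement the window is bounded by the slowest step rather than the sum of all steps, which is the point of moving work off the serial path.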

SLIDE 8

Cloud Disk Optimization

  • Critical Path Optimization
  • Light Weight Pause Operation

Pre Optimization: Open Fd by RD_ONLY, SRC: Close Fd, DST: ReOpen Fd

After Optimization: Open Fd by RD_ONLY, SRC: (1) Pause Fd, DST: ReOpen Fd, SRC: (2) Destroy Fd

Each sequence marks its Critical Path.

SLIDE 9

VPC/SDN Live Migration

  • Copy SESSION table
  • Relay Forwarding
  • SESSION table update

Figure labels: Network Manager, VM1, VM1`, VM2, switch (x3), SESSION table, Relay Forwarding, Install Flow Rules
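
The three steps on this slide (copy the SESSION table, relay forwarding, then update via flow rules) can be sketched with a toy model. Everything below is hypothetical: the `VSwitch` class, the shape of the session entries, and the `flow_table` dictionary are illustration devices, not the actual VPC/SDN data structures.

```python
class VSwitch:
    """Toy per-host virtual switch (hypothetical)."""

    def __init__(self, host):
        self.host = host
        self.local_vms = set()
        self.sessions = {}  # vm -> list of session entries
        self.relay = {}     # vm -> VSwitch that now hosts the vm

    def deliver(self, vm, packet):
        """Return the name of the host that finally receives the packet."""
        if vm in self.local_vms:
            return self.host
        if vm in self.relay:
            # Relay forwarding: senders with stale rules still reach the VM.
            return self.relay[vm].deliver(vm, packet)
        raise LookupError(f"{vm} is unknown on {self.host}")


def switch_over(vm, src, dst):
    dst.sessions[vm] = list(src.sessions.get(vm, []))  # copy SESSION table
    dst.local_vms.add(vm)
    src.local_vms.discard(vm)
    src.relay[vm] = dst  # keep forwarding until peers are updated


def install_flow_rules(flow_table, vm, src, dst):
    flow_table[vm] = dst        # peers now send directly to the new host
    src.relay.pop(vm, None)     # the relay is no longer needed
    src.sessions.pop(vm, None)  # drop the stale SESSION entries


if __name__ == "__main__":
    src, dst = VSwitch("src-host"), VSwitch("dst-host")
    src.local_vms.add("vm1")
    src.sessions["vm1"] = [("10.0.0.2", 80)]
    flow_table = {"vm1": src}
    switch_over("vm1", src, dst)
    print(flow_table["vm1"].deliver("vm1", b"payload"))
    install_flow_rules(flow_table, "vm1", src, dst)
    print(flow_table["vm1"].deliver("vm1", b"payload"))
```

The relay covers the gap between the switch-over and the moment every peer has the new flow rules, so existing connections are not dropped in between.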

SLIDE 10

Add-on Cloud Services Stay Intact

  • Indirect VM-Host relationship
  • Direct VM-Host relationship
  • Live Migration friendly cloud ecosystem

Figure labels: VM, SRC Host, DST Host, Cloud Service VM, NIC, DPDK, ……, Add-on Cloud Services (1) (2)

SLIDE 11

Control System

Figure labels: Manager, Virtualization, Storage, Network, Cloud Service, Migration CORE, Configuration

  • Migration trigger point
  • Query migration status
  • Cancel migration
  • Push time-critical operations down from the control system to the virtualization plane
  • Migration procedure control
  • Cluster/Host Configuration
  • Control Policy
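
The first three bullets (trigger, query status, cancel) describe a small control interface. A minimal sketch of such an interface follows. The class, its method names, and the status values are assumptions for illustration and do not reflect the real control system's API.

```python
import itertools
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    CANCELLED = "cancelled"
    DONE = "done"


class MigrationManager:
    """Hypothetical entry point for trigger / query / cancel."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._tasks = {}

    def trigger(self, vm_id, dst_host):
        task_id = next(self._ids)
        self._tasks[task_id] = {
            "vm": vm_id, "dst": dst_host, "status": Status.RUNNING,
        }
        return task_id

    def query(self, task_id):
        return self._tasks[task_id]["status"]

    def finish(self, task_id):
        self._tasks[task_id]["status"] = Status.DONE

    def cancel(self, task_id):
        # Only a migration that is still running can be cancelled.
        task = self._tasks[task_id]
        if task["status"] is Status.RUNNING:
            task["status"] = Status.CANCELLED
            return True
        return False


if __name__ == "__main__":
    manager = MigrationManager()
    task = manager.trigger("vm-1", "host-b")
    print(manager.query(task), manager.cancel(task), manager.query(task))
```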

SLIDE 12

Migration Test Data

VM Stress Type   VM Type   Total Migration Time   VM Downtime
idle             4u4g      ~1 min                 70~80 ms
idle             16c32g    1~2 min                70~90 ms
mem_stress       4u4g      1~2 min                90~120 ms
fio              4u4g      1~2 min                90~120 ms

Environment: Generation III instance
mem_stress: 512M dirty memory
fio: iodepth=32, bs=512, randread
Downtime may vary for different VM/hardware/software/stress types.

SLIDE 13

Application of Live Migration: Server Maintenance

HOST Maintenance Procedure, HOST Fault-Migration

[Figure: two hosts, each drawn as Hypervisor/Host (CPU, IO, MEM) running VM, VM, ……]

Flow labels: Fault, Can migrate?, Cold/Live Migration, Offline, Repair, Online

SLIDE 14

Application of Live Migration: Kernel/Firmware Upgrading

Alibaba Maintenance System: Upgrading Entrance, Rolling System, Migration Manager, VM Live Migration, NC Upgrading

Improvements of the Whole Cluster

Kernel/Firmware Upgrading    Before    After     Improvement
Memory Bandwidth (MB/s)      27873     30179     8.27%
SPECjbb                      120552    128655    6.72%
Packet Forwarding (MB/s)     570       610       7.02%
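
The improvement percentages on this slide are reproduced when the smaller figure in each row is taken as the pre-upgrade baseline. That reading is an assumption inferred from the arithmetic, and the helper name below is hypothetical.

```python
def improvement_pct(before, after):
    """Relative gain over the baseline, in percent, rounded to 2 places."""
    return round((after - before) / before * 100, 2)


if __name__ == "__main__":
    rows = {
        "Memory Bandwidth (MB/s)": (27873, 30179),
        "SPECjbb": (120552, 128655),
        "Packet Forwarding (MB/s)": (570, 610),
    }
    for name, (before, after) in rows.items():
        print(name, improvement_pct(before, after))
```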

SLIDE 15

Application of Live Migration: Cloud Scheduling

  • Doing
    a) Resource defragmentation
    b) Resource balance
  • To Do
    a) Power Management
    b) other

(a) Resource Fragments
Host: 16C 32G, 32C 32G, 16C 32G, ……

(b) Power & Resource Management
Host: 16C 32G
Host: 16C 32G, 16C 32G
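
Defragmentation and power management both come down to choosing which VMs to live-migrate so that a host can be emptied. The sketch below uses a simple greedy first-fit plan for that choice. It is not the scheduler described in the talk: the function names, the tuple layout, and the host capacity are assumptions for illustration.

```python
def free_capacity(vms, capacity):
    """Remaining (cores, mem_gb) on a host given its VMs."""
    cores = capacity[0] - sum(vm[1] for vm in vms)
    mem = capacity[1] - sum(vm[2] for vm in vms)
    return cores, mem


def plan_evacuation(hosts, capacity):
    """Try to empty the least-loaded host.

    hosts: dict of host name -> list of (vm_name, cores, mem_gb).
    capacity: (cores, mem_gb) per host, assumed identical for all hosts.
    Returns [(vm_name, src, dst), ...], or [] if the host cannot be emptied.
    """
    src = min(hosts, key=lambda h: sum(vm[1] for vm in hosts[h]))
    placement = {h: list(vms) for h, vms in hosts.items() if h != src}
    moves = []
    # Place the largest VMs first so they are not blocked by small ones.
    for vm in sorted(hosts[src], key=lambda v: v[1], reverse=True):
        for dst, vms in placement.items():
            cores, mem = free_capacity(vms, capacity)
            if vm[1] <= cores and vm[2] <= mem:
                vms.append(vm)
                moves.append((vm[0], src, dst))
                break
        else:
            return []  # no host can take this VM, so give up on src
    return moves


if __name__ == "__main__":
    hosts = {
        "host-1": [("vm-a", 16, 32)],
        "host-2": [("vm-b", 16, 32), ("vm-c", 16, 32)],
    }
    print(plan_evacuation(hosts, capacity=(64, 128)))
```

Once the plan succeeds, the emptied host can be powered down or offered to a larger instance type, which matches the two uses listed on the slide.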

SLIDE 16

Future Challenges

SLIDE 17

SR-IOV/PassThrough Live Migration

Challenges:

  • IO Register migration
  • in-flight IO
  • Guest aware

[Figure: three IO virtualization models on Hardware]
  Traditional: VM, Hypervisor (emulate), IO Device
  SR-IOV: VM, IO Device VF
  PassThrough: VM, IO Device

SLIDE 18

Ways to Start a Live Migration

  • A variety of Instance types
  • Navigate through heterogeneous architecture
  • Enable more application practices

Figure labels: General instance, Compute enhanced instance, Credit instance; Performance, Robustness, Price; GPU, XEN, KVM, FPGA, VIRT 2.0, PASS-Through, SR-IOV, ……

SLIDE 19

FAQ