SLIDE 1

Migration v2

Andrew Cooper

Citrix XenServer

17th August 2015

Andrew Cooper (Citrix XenServer) Migration v2 17th August 2015 1 / 12

SLIDE 5

Why Migration v2

XenServer 6.2

◮ 64bit Xen, 32bit Dom0
◮ Inertia, More efficient to virtualise

XenServer 6.5

◮ 64bit Xen, 64bit Dom0
◮ High MMIO regions above 2^44 bits

Rolling Pool Upgrade tests

◮ Migrate VM from XS6.2 to XS6.5
◮ Error on the receiving side:

    xc: detail: xc_domain_restore: starting restore of new domid 1
    xc: detail: xc_domain_restore: p2m_size = ffffffff00010000
    xc: error: Couldn't allocate p2m_frame_list array: Internal error

SLIDE 6

Legacy Migration

int xc_domain_restore(xc_interface *xch, ...

    if ( RDEXACT(io_fd, &dinfo->p2m_size, sizeof(unsigned long)) )
    {
        PERROR("read: p2m_size");
        goto out;
    }
    DPRINTF("%s: p2m_size = %lx\n", __func__, dinfo->p2m_size);
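The width dependence in this read can be reproduced in miniature. The sketch below is not libxc code: `toy_stream` and `toy_read_field` are hypothetical names, and the four `0xff` bytes following the 32bit `p2m_size` are an assumption chosen only so that the result matches the value in the error log. It shows how reading `sizeof(unsigned long)` bytes gives a different answer depending on the ABI the reader was compiled for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stream prefix as a 32bit little-endian sender might write it:
 * a 4-byte p2m_size of 0x10000, followed by four 0xff bytes (assumed here).
 */
static const unsigned char toy_stream[8] = {
    0x00, 0x00, 0x01, 0x00, 0xff, 0xff, 0xff, 0xff
};

/*
 * Read a little-endian field of `width` bytes, as a reader whose
 * sizeof(unsigned long) equals `width` would see it.
 */
static uint64_t toy_read_field(const unsigned char *buf, size_t width)
{
    uint64_t value = 0;

    assert(width <= sizeof(value));
    for (size_t i = 0; i < width; i++)
        value |= (uint64_t)buf[i] << (8 * i);

    return value;
}
```

With these bytes, `toy_read_field(toy_stream, 4)` yields 0x10000, while `toy_read_field(toy_stream, 8)` yields 0xffffffff00010000, the `p2m_size` reported on the receiving side.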

SLIDE 10

Legacy Migration

No format written down

◮ Subsequently reverse engineered from existing code

No header information at all

Hard to extend

◮ Written mostly as two monolithic functions
◮ goto tangle
◮ PV MSR support too complicated to implement

Asymmetry with Qemu handling

◮ Save side's caller puts Qemu blob into the stream
◮ Restore side pulls Qemu blob out and saves in magic path

Stream contents depend on compilation ABI

◮ Different between 32bit and 64bit

SLIDE 13

VM Serialisation

Information (Currently x86 specific)

Common: Page Data, TSC
HVM: Params, Context (Xen serialised state)
PV: Width, Levels, P2M, VCPU State, Shared Info

Suspend

◮ Pause VM
◮ Copy all memory

Migrate

◮ Enable logdirty
◮ Copy all memory
    - Query logdirty bitmap
    - Copy dirty memory
    - Loop
◮ Pause VM
◮ Copy remaining memory

SLIDE 17

Solution for XenServer

Redesigned completely from scratch

Specification written down

◮ docs/specs/libxc-migration-stream.pandoc
◮ Describes exact binary layout
◮ Extensible

Reimplemented completely from scratch

◮ Common save and restore algorithms
◮ Per-guest-type hooks to implement

Legacy conversion needed

◮ tools/python/scripts/convert-legacy-stream
◮ Reads in legacy stream
◮ Writes out v2 stream

SLIDE 18

Stream format – libxc

  0     1     2     3     4     5     6     7 octet
+-------------------------------------------------+  +
| marker (0xffffffffffffffff)                     |  |
+-----------------------+-------------------------+  | Image
| id ("XENF" in ASCII)  | version (2)             |  | Header
+-----------+-----------+-------------------------+  |
| options   | (reserved)                          |  |
+-----------+-------------------------------------+  +

+-----------------------+-----------+-------------+  +
| type (PV, HVM, etc)   | page_shift| (reserved)  |  | Domain
+-----------------------+-----------+-------------+  | Header
| xen_major (4)         | xen_minor (6)           |  |
+-----------------------+-------------------------+  +

+-----------------------+-------------------------+  +
| type                  | body_length             |  |
+-----------+-----------+-------------------------+  |
| body...                                         |  | Record
...                                                  |
|           | padding (0 to 7 octets)             |  |
+-----------+-------------------------------------+  +
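A reader for the image header might look like the sketch below. The field widths are read off the diagram (8-octet marker, 4-octet id and version, 2-octet options, 6 reserved octets). The big-endian byte order used for these fields is an assumption of this sketch and should be checked against docs/specs/libxc-migration-stream.pandoc; all `toy_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_IHDR_SIZE 24            /* three 8-octet rows in the diagram */
#define TOY_IHDR_ID   0x58454E46u   /* "XENF" in ASCII */

struct toy_ihdr {
    uint64_t marker;
    uint32_t id;
    uint32_t version;
    uint16_t options;
};

/* Read an n-octet big-endian integer (byte order is an assumption here). */
static uint64_t toy_be(const unsigned char *p, size_t n)
{
    uint64_t v = 0;

    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Parse and validate an image header; returns 0 on success, -1 otherwise. */
static int toy_parse_ihdr(const unsigned char *buf, struct toy_ihdr *h)
{
    h->marker  = toy_be(buf, 8);
    h->id      = (uint32_t)toy_be(buf + 8, 4);
    h->version = (uint32_t)toy_be(buf + 12, 4);
    h->options = (uint16_t)toy_be(buf + 16, 2);
    /* Octets 18..23 are reserved and ignored in this sketch. */

    if (h->marker != UINT64_MAX || h->id != TOY_IHDR_ID || h->version != 2)
        return -1;
    return 0;
}

/* Padding octets needed after a record body of `body_length` octets. */
static size_t toy_record_padding(uint32_t body_length)
{
    return (8 - body_length % 8) % 8;
}

/* Example header bytes matching the values shown in the diagram. */
static const unsigned char toy_example_ihdr[TOY_IHDR_SIZE] = {
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,   /* marker */
    'X',  'E',  'N',  'F',  0x00, 0x00, 0x00, 0x02,   /* id, version */
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00    /* options, reserved */
};

/* Demo: returns the parsed version, or -1 if validation fails. */
static int toy_parse_example(void)
{
    struct toy_ihdr h;

    return toy_parse_ihdr(toy_example_ihdr, &h) == 0 ? (int)h.version : -1;
}
```

Because every record is padded to a multiple of 8 octets, a reader can skip a record type it does not understand using only `body_length`, which is one way a layout like this stays extensible.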

SLIDE 23

Upstreaming

Problems with libxl

◮ No participation in stream
◮ 'Toolstack Data' depends on compilation ABI

Design from scratch

Specification written down

◮ docs/specs/libxl-migration-stream.pandoc
◮ Describes exact binary layout
◮ Extensible

Write from scratch

Compatibility script extended

◮ Able to write libxl migration v2 streams
◮ 'Qemu' and 'Toolstack data' layered appropriately

SLIDE 26

Framing

[Figure: stream framing for three formats, each listed in stream order. Key: xl, libxl, libxc]

Legacy Migration: header, optional data, ..., toolstack, ..., qemu

Migration v2: header, optional data, header, libxc content, image header, domain header, ..., end, emulator xenstore, emulator context, end

Remus Migration v2: header, optional data, header, libxc content, image header, domain header, ..., checkpoint, ..., checkpoint end, ..., checkpoint, ..., checkpoint end, ...

SLIDE 29

General Notes

Issues fixed

◮ PV VCPU state corruption when racing with vcpu actions
◮ PV guests with superpages abort on save, rather than failing to reconstruct pagetables on restore
◮ More efficient handling of page data

Issues still present

◮ Guests which balloon
◮ PV P2M structure changes
◮ HVM guests with PoD pages

Areas for further work

◮ Live migrate looping parameters
◮ Linear P2M support

SLIDE 30

Status – Xen 4.6

All committed
Fully enabled (and tested)
xl save/restore/migrate/remus function as before
Legacy migration removed
No noticeable difference to users

SLIDE 31

Migration v2

Any Questions?
