Ganeti: The Cluster-based Virtualization Management Software

Helga Velroyen (helgav@google.com), Klaus Aehlig (aehlig@google.com)
August 24, 2013

Outline: Overview, Architecture, Customization, In Production, Current development, Community, Conclusion



SLIDES 2-5

Virtualization

[Figure: instances running on three nodes]

To build your VMs (“instances”), you would take ...

  • a bunch of physical machines (“nodes”)
  • some hypervisor, say Xen
  • some way to replicate storage, say DRBD

SLIDES 6-18

Enter Ganeti

[Figure: instances distributed across three nodes]

While all this works on its own, Ganeti helps

  • to get there
    • uniform interface
      • Hypervisors: Xen, kvm, ...
      • Storage: drbd, lvm, file, ...
      • Network
    • policies, balanced allocation
      • Instance memory/disk size
      • CPU oversubscription
      • tag-exclusion: “Don’t put both name servers on the same node!”
      • keeping N + 1 redundancy
  • and to stay there
    • failover instances and evacuate nodes
    • rebalance
    • Restart instances after power outage
    • ...
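
The two placement policies named above, tag-exclusion and N + 1 redundancy, can be illustrated with a small check. This is only a sketch under simplifying assumptions, not Ganeti's own implementation: the `Instance` record, its fields, and both function names are hypothetical, and only memory is considered for N + 1.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    memory_mb: int
    primary: str      # node currently running the instance
    secondary: str    # node holding the replicated storage
    tags: set = field(default_factory=set)


def exclusion_conflicts(instances, exclusion_tags):
    """Return {(node, tag): [names]} wherever two or more instances that
    carry the same exclusion tag share a primary node."""
    placed = defaultdict(list)
    for inst in instances:
        for tag in inst.tags & exclusion_tags:
            placed[(inst.primary, tag)].append(inst.name)
    return {key: names for key, names in placed.items() if len(names) > 1}


def n_plus_one_failures(instances, free_memory_mb):
    """Return the nodes that could not take over their secondaries if any
    single other node failed (memory only)."""
    # needed[secondary][primary] = memory arriving on `secondary`
    # when `primary` fails.
    needed = defaultdict(lambda: defaultdict(int))
    for inst in instances:
        needed[inst.secondary][inst.primary] += inst.memory_mb
    return sorted(
        node for node, per_primary in needed.items()
        if max(per_primary.values()) > free_memory_mb.get(node, 0)
    )
```

With two name servers tagged `service:dns` on the same primary node, `exclusion_conflicts` reports that node; `n_plus_one_failures` lists every node whose free memory is smaller than the worst single-node failover it would have to absorb.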

SLIDES 19-22

Basic Interaction: Cluster creation

  • gnt-cluster init -s 192.0.2.1 clusterA.example.com
  • gnt-node add -s 192.0.2.2 node2.example.com
  • ...
  • gnt-instance add -t drbd -o debootstrap -s 2G --tags=foo,bar instance1.example.com

SLIDES 23-28

Basic Interaction: Node maintenance

Evacuating a node

  • gnt-node modify --drained=yes node2.example.com
  • gnt-node migrate -f node2.example.com
  • gnt-node evacuate -f -s node2.example.com
  • gnt-node modify --offline=yes node2.example.com

Using the node again

  • gnt-node modify --online=yes node2.example.com
  • hbal -L -X
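
The maintenance sequence above lends itself to a small runbook script. The sketch below only wraps the commands exactly as listed on the slide; the function names are hypothetical, and the runner defaults to a dry run so nothing is executed unless requested.

```python
import shlex
import subprocess


def evacuation_commands(node):
    """Commands from the slide for taking `node` out of service."""
    return [
        ["gnt-node", "modify", "--drained=yes", node],
        ["gnt-node", "migrate", "-f", node],
        ["gnt-node", "evacuate", "-f", "-s", node],
        ["gnt-node", "modify", "--offline=yes", node],
    ]


def return_commands(node):
    """Commands from the slide for bringing `node` back and rebalancing."""
    return [
        ["gnt-node", "modify", "--online=yes", node],
        ["hbal", "-L", "-X"],
    ]


def run_plan(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False.
    With check=True the plan stops at the first failing command."""
    lines = []
    for argv in commands:
        line = shlex.join(argv)
        lines.append(line)
        print(line)
        if not dry_run:
            subprocess.run(argv, check=True)
    return lines
```

Calling `run_plan(evacuation_commands("node2.example.com"))` prints the four commands in order; passing `dry_run=False` on the master node would actually run them.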

SLIDES 29-35

Jobs

[Diagram, built up step by step:]

  • cli: gnt-cluster, gnt-node, gnt-instance, ...
  • masterd: lock, queue, config
  • LUXI: connects the cli to masterd
  • RAPI: REST, JSON over HTTP
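
Since operations are queued as jobs and RAPI exposes them as JSON over HTTP, a client typically submits a request and then polls the job. The following is a minimal sketch of the polling half only. The resource path `/2/jobs/<id>`, the `status` field, and the final state names are assumptions based on my reading of the RAPI documentation and should be verified against the installed version; authentication and TLS setup are left out.

```python
import json
import time
import urllib.request

# Assumed names of the states in which a job no longer changes.
FINAL_STATES = {"success", "error", "canceled"}


def http_get_json(url):
    """Fetch a URL and decode the JSON body (no auth or TLS setup here)."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def job_url(base_url, job_id):
    """Build the (assumed) RAPI resource URL for one job."""
    return f"{base_url.rstrip('/')}/2/jobs/{job_id}"


def wait_for_job(base_url, job_id, get_json=http_get_json,
                 poll_seconds=1.0, max_polls=60):
    """Poll a job until it reaches a final state and return that state."""
    for _ in range(max_polls):
        status = get_json(job_url(base_url, job_id))["status"]
        if status in FINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still not finished")
```

The `get_json` parameter exists so the polling logic can be exercised without a cluster by passing a stand-in function.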

SLIDES 36-38

RPC

[Diagram, built up step by step:]

  • master node: cli, RAPI, masterd (lock, queue, config)
  • vm-capable node: noded
  • noded backends: hypervisors, bdev, ...
  • RPC: connects masterd to noded

SLIDES 39-45

Configuration

[Diagram, built up step by step:]

  • master node: cli, RAPI, masterd (lock, queue, config), confd, ssconf
  • master candidate: queue, config, confd, ssconf
  • vm-capable node: noded, ssconf
  • config
    • hosts w/ roles, status, ...
    • instances w/ hosts, disks, ...
    • policies
  • confd protocol
    • UDP
    • ask all, take best answer
    • configurations time stamped
  • ssconf
    • flat file
    • "static" information
    • nodes
    • instance, w/o location
    • ...
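
The confd idea of "ask all, take best answer" with time-stamped configurations can be sketched as a selection over whatever replies arrive. This is an illustration of the principle only, not the confd wire format: the `Reply` record and its `serial` field (standing in for the configuration's time stamp) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Reply:
    server: str   # which confd instance answered
    serial: int   # stand-in for the configuration's time stamp
    answer: str   # payload of the reply


def best_answer(replies: Iterable[Optional[Reply]]) -> Optional[Reply]:
    """Ignore servers that did not answer (None) and keep the reply
    that was produced from the most recent configuration."""
    received = [r for r in replies if r is not None]
    if not received:
        return None
    return max(received, key=lambda r: r.serial)
```

A client that queried three master candidates and got one timeout and two answers would simply trust the answer carrying the newer stamp.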

SLIDES 46-49

Roles and Statuses

Nodes can serve different roles. (Nodes can, and usually do, take both roles.)

  • VM-hosting nodes
    • VM-capable
    • grouped in “node groups”
  • Administrative nodes
    • master capable (policy decision)
    • master candidate (has a full copy of the live configuration)
    • master (manages all operations on the cluster)

Independently of their role, nodes can be in different statuses:

  • online, drained, offline

SLIDE 50

Guest OS Interface

Ganeti is agnostic about the guest OSes; it just expects information to be provided (one directory per guest OS).

  • executables: create, import, export, rename, verify
  • text files: ganeti_api_version, variants.list

Executables are provided with information via the environment.

  • OS_VARIANT
  • HYPERVISOR
  • DISK_COUNT, DISK_0_PATH, DISK_1_PATH, ...
  • ...
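
To make the environment-based interface concrete, here is a sketch of how a `create` executable might read its inputs. It uses only the variable names listed on the slide; the function names are hypothetical, and a real OS definition would go on to partition and populate the disks.

```python
import os


def disks_from_environment(env=None):
    """Collect DISK_<n>_PATH for n in 0 .. DISK_COUNT - 1."""
    env = os.environ if env is None else env
    count = int(env["DISK_COUNT"])
    return [env[f"DISK_{i}_PATH"] for i in range(count)]


def describe_request(env=None):
    """Summarise what the script was asked to install, and where."""
    env = os.environ if env is None else env
    disks = ", ".join(disks_from_environment(env))
    return (f"variant={env.get('OS_VARIANT', '')} "
            f"hypervisor={env['HYPERVISOR']} disks={disks}")
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to try out away from a cluster.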

SLIDES 51-54

Available OS Definitions

There exist quite a few implementations of the guest OS interface.

  • debootstrap (git://git.ganeti.org/instance-debootstrap.git)
    glorified call of debootstrap(8); sfdisk, mkswap, mke2fs, ...; /etc/{hostname, ...}
  • snf-image (http://www.synnefo.org/docs/synnefo/latest/snf-image.html)
    Installation done by a helper VM
    • target disk, with base image, as additional disk
    • floppy with customization
  • ganeti-instance-image (https://code.osuosl.org/projects/ganeti-image)
    image-based; images created with tar(1) or dump(8)
  • ganeti-os-defs (http://sourceforge.net/p/ganeti-os-defs/home/Home/)
  • ...

SLIDE 55

Ways to customize Ganeti

  • Hooks
  • Allocator
  • . . .

SLIDE 56

Hooks

  • hook scripts to customize cluster operations
  • useful for synching with external systems
  • pre phase: e.g. for authorization
  • post phase: e.g. for logging, billing, setting passwords
  • examples: cluster-verify-post.d, node-add-pre.d
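
A pre-phase hook used for authorization, as mentioned above, could look roughly like the sketch below. It assumes that hooks receive their context through `GANETI_`-prefixed environment variables and that a non-zero exit status in the pre phase aborts the operation; the variable names `GANETI_HOOKS_PHASE` and `GANETI_NODE_NAME` follow my reading of the hooks documentation and should be checked, and the naming policy itself is purely hypothetical.

```python
import os
import sys

# Hypothetical local policy: only hosts under this domain may join.
ALLOWED_SUFFIX = ".example.com"


def authorize(env):
    """Return (exit_code, message) for a node-add pre hook."""
    phase = env.get("GANETI_HOOKS_PHASE")
    node = env.get("GANETI_NODE_NAME", "")
    if phase != "pre":
        return 0, "not a pre-phase call, nothing to check"
    if node.endswith(ALLOWED_SUFFIX):
        return 0, f"{node}: allowed"
    return 1, f"{node}: refused by local policy"


def main():
    """Evaluate the policy against the real environment."""
    code, message = authorize(os.environ)
    print(message, file=sys.stderr)
    return code
```

Installed as an executable under a directory such as node-add-pre.d, the script would end with `sys.exit(main())` so that the exit status reaches Ganeti.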

SLIDE 57

Allocation

  • Where to put an instance?
  • protocol:
    • JSON over pipes
    • input: cluster’s state + request-specific info
    • output: suggestions where to place which instance
  • supported requests: allocate, relocate, change-group, node-evacuate, multi-allocate
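
The allocator protocol described above (cluster state plus a request in, placement suggestions out, all as JSON) can be sketched with a deliberately naive allocator. The document structure and field names used here (`request`, `type`, `memory`, `required_nodes`, `nodes`, `free_memory`, `success`, `info`, `result`) are simplified assumptions for illustration and do not claim to match the real protocol field for field; the "most free memory first" strategy is likewise just an example.

```python
import json


def allocate(request_doc):
    """Suggest the node(s) with the most free memory for an 'allocate'
    request, or report failure."""
    request = request_doc["request"]
    if request["type"] != "allocate":
        return {"success": False, "info": "unsupported request", "result": []}
    wanted = request["required_nodes"]
    nodes = request_doc["nodes"]
    # Keep nodes with enough free memory, largest first.
    candidates = sorted(
        (name for name, node in nodes.items()
         if node["free_memory"] >= request["memory"]),
        key=lambda name: nodes[name]["free_memory"],
        reverse=True,
    )
    if len(candidates) < wanted:
        return {"success": False, "info": "not enough capacity", "result": []}
    return {"success": True, "info": "ok", "result": candidates[:wanted]}


def run(input_text):
    """JSON text in, JSON text out, as it would travel over the pipe."""
    return json.dumps(allocate(json.loads(input_text)))
```

Because the exchange is plain JSON over pipes, an allocator can be written in any language and tested by feeding it a saved request document.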

SLIDE 58

Ganeti in Production

What should you add?

  • Monitoring:
    • Check host disks, memory, load
  • Automation:
    • Trigger events (evacuate, send to repairs, readd node, rebalance)
  • Configuration Management:
    • Automated host installation / setup
  • Self service use
    • Graphical interface (e.g. Ganeti Web Manager) (http://ganeti-webmgr.readthedocs.org/en/latest/)
    • Instance creation and resize
    • Instance console access

SLIDE 59

Production Cluster

As we use it in a Google Datacenter

SLIDE 60

Fleet at Google

SLIDE 61

2.7 (Current Release)

  • Network management (contributed by grnet.gr)
  • Exclusive storage
  • Opportunistic locking
  • Restricted commands
  • Monitoring agent

SLIDE 62

Monitoring Agent

  • integrated monitoring service
  • implemented in 2.7, 2.8, 2.9
  • new daemon, runs on all nodes, speaks HTTP
  • provides information about the cluster’s status
  • collectors for: DRBD, disk status, LVM, instance status (Xen)
  • Google Summer of Code: CPU load monitoring

SLIDE 63

2.8 (Beta)

  • Improved support of non-lvm storage
  • Downgrading
  • More work on monitoring daemon
  • Autorepair tool
  • Hroller

SLIDE 64

Hroller

  • Scheduler for rolling reboots
  • Partitions cluster into groups of nodes that can be rebooted simultaneously
  • various modes: default, full evacuation, offline-maintenance
  • options for non-redundant instances
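
Partitioning a cluster into groups of nodes that can be rebooted together is essentially a graph-colouring problem: two nodes conflict when one is the primary and the other the secondary of the same instance. The sketch below uses a simple greedy colouring to show the idea; it is not hroller's actual algorithm or data model, and the function name and the `(primary, secondary)` input format are hypothetical.

```python
from collections import defaultdict


def reboot_groups(nodes, instances):
    """Greedy colouring: nodes that share an instance (as primary and
    secondary) never land in the same reboot group."""
    conflicts = defaultdict(set)
    for primary, secondary in instances:
        conflicts[primary].add(secondary)
        conflicts[secondary].add(primary)

    groups = []
    # Place the most constrained nodes first; names break ties.
    for node in sorted(nodes, key=lambda n: (-len(conflicts[n]), n)):
        for group in groups:
            if not (conflicts[node] & group):
                group.add(node)
                break
        else:
            # No existing group is conflict-free for this node.
            groups.append({node})
    return [sorted(group) for group in groups]
```

For four nodes whose instances chain node1-node2, node2-node3 and node3-node4, two groups suffice, so the whole cluster can be rebooted in two rounds while every instance keeps one of its nodes up.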

SLIDE 65

2.9 (Alpha)

  • DRBD 8.4 support
  • Improved support of non-lvm storage handling
  • Improvements of monitoring agent
  • Improvements of hroller

SLIDE 66

Future

Just plans, no promises!

  • Hot-plugging
  • Automatic updates
  • More fine-grained job-queue management
  • Storage pools

SLIDE 67

Open Source Ganeti

  • Ganeti has been open source since 2007
  • Relatively big community of external users and contributors
  • People running Ganeti:
    • Google (Corporate Computing Infrastructure) (https://www.youtube.com/watch?v=TELArK6SmyY)
    • grnet.gr (Greek Research & Technology Network)
    • osuosl.org (Oregon State University Open Source Lab)
    • fsffrance.org (Free Software Foundation France)

SLIDE 68

Ganeti Development Process

  • Time-based release process, one freeze every 3 months
  • Code reviews over the mailing list
  • Discussion of design documents publicly on the mailing list
  • Video-conferences with bigger contributors
  • Public continuous build system (machines provided by grnet.gr, run by Google)
  • QA scripts public to be re-used

SLIDE 69

Recent Events 2013

  • FOSDEM 2013 (https://archive.fosdem.org/2013/)
  • Xen Hackathon in Dublin, May 2013 (http://www.xenproject.org/component/content/article/97-event-details/126-xen-hackathon-dublin-2013.html)
  • Google Summer of Code, 2013
    • Better Openvswitch support
    • CPU load monitoring

SLIDE 70

Upcoming Events

  • GanetiCon, Athens, Sep 2013 (https://sites.google.com/site/ganeticon/)
  • LinuxCon North America, New Orleans, Sep 2013, introductory talk (http://events.linuxfoundation.org/events/linuxcon-north-america/program/schedule)
  • LinuxCon Europe, Edinburgh, UK, Oct 2013, introductory talk (http://events.linuxfoundation.org/events/linuxcon-europe)
  • LISA, Washington D.C., Nov 2013, workshop / class (https://www.usenix.org/conference/lisa13)

A list of publications from previous events (slides, recordings) can be found in our wiki. (https://code.google.com/p/ganeti/wiki/Publications)

SLIDE 71

Conclusion

  • Check us out at https://code.google.com/p/ganeti/
  • Or just search for “Ganeti”
  • We are around at FrOSCon today and tomorrow!

Questions? Feedback? Ideas? Flames?

© 2010-2013 Google. Use under GPLv2+ or CC-by-SA. Some images borrowed / modified from Lance Albertson and Guido Trotter.