Ganeti, the New and Arcane



  1. Ganeti, the New and Arcane
     Ganeti's best kept secrets, and exciting new developments
     Ganeti Eng Team - Google
     LinuxCon Japan 2014 - 2 Feb 2014

  2. Introduction to Ganeti: a cluster virtualization manager, in one slide

  3. What is Ganeti?
     · Manage clusters of 1-200 physical machines, divided into nodegroups
     · Deploy Xen/KVM/LXC virtual machines on them
       - Live migration
       - Resiliency to failure (DRBD, Ceph, SAN/NAS, ...)
       - Cluster balancing
       - Ease of repairs and hardware swaps
     · Controlled via command line, REST, web interfaces

  4. Newest features: development status

  5. 2.10: the very stable release
     · Improved upgrade procedure: "gnt-cluster upgrade"
     · CPU load in hail/hbal (GSOC project)
     · Hotplug support (KVM)
     · RBD storage direct access (KVM)
     · Better Open vSwitch support (GSOC project)

  6. 2.11: the latest stable release
     · Faster instance moves
     · GlusterFS support
     · hsqueeze (achieve maximum cluster compaction)

  7. 2.12 and future: the next stable release(s)
     · Jobs as processes
     · New install model
     · More secure master candidates
     · Better container support (GSOC)
     · Resource reservation/Extra parallelization
     · Generic conversion between disk templates (GSOC)

  8. Monitoring daemon: what's going on in your cluster?

  9. Monitoring a cluster: the old school way
     [Diagram labels: Other Systems, Cluster, Monitoring System, NICs, Instance, Master Node, Storage]

  10. Monitoring a cluster: using the monitoring daemon
      [Diagram labels: Other Systems, Cluster, Monitoring System, Monitoring Daemons]

  11. What is the monitoring daemon?
      Provides information:
      · about the cluster state/health
      · live
      · read-only
      Design doc: design-monitoring-agent.rst

  12. More details
      · HTTP daemon
      · Replying to REST-like queries
        - Actually, GET only
      · Providing JSON replies
        - Easy to parse in any language
        - Already used in all the rest of Ganeti
      · Running on every node (not only on master candidates or VM-enabled nodes)
      · Additionally: mon-collector, a quick 'n dirty CLI tool

  13. Data collectors
      · provide data to the daemon
      · one collector, one report
      · one collector, one category:
        - storage, hypervisor, daemon, instance
      · two kinds: performance reporting, status reporting
      · new feature: stateful data collectors

  14. Data collectors: what data can be retrieved right now?
      Now:
      · instance status (Xen only) (category: instance)
      · diskstats information (storage)
      · LVM logical volumes information (storage)
      · DRBD status information (storage)
      · Node OS CPU load average (no category, default)
      Soon(-ish):
      · instance status for KVM (instance)
      · Ganeti daemons status (daemon)
      · Hypervisor resources (hypervisor)
      · Node OS resources report (default)

  15. The report format
      JSON:
        {
          "name" : "TheCollectorIdentifier",
          "version" : "1.2",
          "format_version" : 1,
          "timestamp" : 1351607182000000000,
          "category" : null,
          "kind" : 0,
          "data" : { "plugin_specific_data" : "go_here" }
        }
      · name: the name of the plugin. Unique string.
      · version: the version of the plugin. A string.
      · format_version: the version of the data format of the plugin. Incremental integer.
      · timestamp: when the report was produced. Nanoseconds. Can be zero-padded.
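
A report in this format can be consumed with nothing more than a JSON parser. The following is a minimal sketch, not code from Ganeti: the helper names `parse_report` and `report_time` are hypothetical, and the only assumptions are the field names and the nanosecond timestamp shown on the slide.

```python
import json
from datetime import datetime, timezone

# Top-level fields taken from the slide's sample report.
REQUIRED_FIELDS = ("name", "version", "format_version",
                   "timestamp", "category", "kind", "data")


def parse_report(text):
    """Decode one collector report and check the top-level fields."""
    report = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in report]
    if missing:
        raise ValueError("report is missing fields: " + ", ".join(missing))
    return report


def report_time(report):
    """Convert the nanosecond timestamp to a UTC datetime (second precision)."""
    seconds = report["timestamp"] // 1_000_000_000
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


SAMPLE = """
{ "name" : "TheCollectorIdentifier", "version" : "1.2",
  "format_version" : 1, "timestamp" : 1351607182000000000,
  "category" : null, "kind" : 0,
  "data" : { "plugin_specific_data" : "go_here" } }
"""

report = parse_report(SAMPLE)
print(report["name"], report_time(report).isoformat())
```

Integer division keeps the conversion exact; going through floating point would lose the low-order nanosecond digits.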

  16. Status reporting collectors: report
      They introduce a mandatory part inside the data section.
      JSON:
        "data" : {
          ...
          "status" : {
            "code" : <value>,
            "message" : "some summary goes here"
          }
        }
      · <value>: by increasing criticality level
        - 0: working as intended
        - 1: temporarily wrong. Being auto-repaired
        - 2: unknown. Potentially dangerous state
        - 4: problems. External intervention required
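
Because the codes grow with criticality, a consumer can summarize a node by taking the maximum code across all status reports. A minimal sketch, assuming reports have already been decoded into dictionaries; `worst_status` and the collector names in the sample data are made up for illustration.

```python
# Meanings of the status codes, as listed on the slide.
STATUS_MEANINGS = {
    0: "working as intended",
    1: "temporarily wrong, being auto-repaired",
    2: "unknown, potentially dangerous state",
    4: "problems, external intervention required",
}


def worst_status(reports):
    """Return (code, collector name, message) for the most critical report.

    Reports without a "status" entry in their data section (for example
    performance reporting collectors) are skipped. Returns None if no
    report carries a status.
    """
    worst = None
    for report in reports:
        status = report.get("data", {}).get("status")
        if status is None:
            continue
        candidate = (status["code"], report["name"], status.get("message", ""))
        if worst is None or candidate[0] > worst[0]:
            worst = candidate
    return worst


# Hypothetical decoded reports.
reports = [
    {"name": "collector-a", "data": {"status": {"code": 0, "message": "ok"}}},
    {"name": "collector-b", "data": {"status": {"code": 2, "message": "no data"}}},
    {"name": "collector-c", "data": {"throughput": 10}},
]

code, name, message = worst_status(reports)
print(name, code, STATUS_MEANINGS[code], message)
```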

  17. How to use the daemon?
      · Accepts HTTP connections on node.example.com:1815
      · Not authenticated: read only
      · Just firewall, or bind on local address only
      · GET requests to specific addresses
      · Each address returns different info according to the API
        /  (returns the list of supported protocol versions)
        /1/list/collectors
        /1/report/all
        /1/report/[category]/[collector_name]
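
The addresses above map directly onto plain GET requests. The sketch below uses only the Python standard library; `report_url` and `fetch_json` are hypothetical helper names, plain HTTP on port 1815 is taken from the slide, and the category and collector name in the example call are placeholders rather than confirmed identifiers.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

MOND_PORT = 1815  # port given on the slide


def report_url(host, category=None, collector=None, port=MOND_PORT):
    """Build the URL for all reports, or for one collector's report."""
    base = "http://%s:%d" % (host, port)
    if category is None and collector is None:
        return base + "/1/report/all"
    if category is None or collector is None:
        raise ValueError("category and collector must be given together")
    return "%s/1/report/%s/%s" % (base, quote(category, safe=""),
                                  quote(collector, safe=""))


def fetch_json(url, timeout=5.0):
    """Perform a GET request and decode the JSON reply."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Only URL construction is exercised here; no network access is needed.
print(report_url("node.example.com"))
print(report_url("node.example.com", "storage", "some-collector"))
```

On a node where the daemon is reachable, `fetch_json(report_url("node.example.com"))` would return the decoded reports.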

  18. Configuration Daemon (confd): what is your cluster supposed to look like?

  19. Before confd
      · Configuration only available on master candidates
      · Few selected values replicated with ssconf
      · Small pieces of config in text files on all the nodes
      · Doesn't scale
      · Need for a way to access config from other nodes
      · Scalable
      · No single point of failure (so, no RAPI)

  20. What does confd do?
      · Provides information from config.data
      · Read-only
      · Distributed
      · Multiple daemons running on master candidates
      · Accessible from all the nodes through confd protocol
      · Resilient to failures
      · Optional

  21. What info does it provide?
      Replies to simple queries:
      · Ping
      · Master IP
      · Node role
      · Node primary IP
      · Master candidates primary IPs
      · Instance IPs
      · Node primary IP from Instance primary IP
      · Node DRBD minors
      · Node instances

  22. confd protocol: general description
      · UDP (port 1814)
      · keyed-Hash Message Authentication Code (HMAC) authentication
        - Pre-shared, cluster wide key
        - Generated at cluster-init
        - Root-only readable
      · Timestamp
        - Checked (± 2.5 mins) to prevent replay attacks
        - Used as HMAC salt
      · Queries made to any subset of master candidates
        - Timeout
        - Maximum number of expected replies
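
The replay protection described above comes down to comparing the timestamp carried in the message with the local clock. A minimal sketch of that check, assuming the timestamp travels as a decimal string of Unix seconds (as in the sample salt "1249637704"); the function name is hypothetical.

```python
import time

MAX_CLOCK_SKEW = 150  # seconds, i.e. 2.5 minutes as stated on the slide


def timestamp_is_fresh(salt, now=None, max_skew=MAX_CLOCK_SKEW):
    """Return True if the timestamp salt is within max_skew of the clock."""
    if now is None:
        now = time.time()
    try:
        sent = int(salt)
    except (TypeError, ValueError):
        # A salt that is not a number cannot be a valid timestamp.
        return False
    return abs(now - sent) <= max_skew


print(timestamp_is_fresh("1249637704", now=1249637704 + 60))
print(timestamp_is_fresh("1249637704", now=1249637704 + 600))
```

The check is symmetric, so a sender whose clock runs slightly ahead of the receiver is accepted as well.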

  23. confd protocol: request/reply
      [Diagram labels: request (x5)]

  24. confd protocol: request/reply
      [Diagram labels: timeout, reply (v: 57) (x3), (enough replies), reply (v: 56)]
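
One way to read the diagram: the client stops waiting once it has seen enough replies and prefers the answer with the newest configuration version. The sketch below illustrates that idea only. How the real clients treat stale versions is not stated in the slides, so "highest serial wins" is an assumption, and `select_answer` is a hypothetical name.

```python
def select_answer(replies, expected):
    """Pick the reply with the highest config serial.

    replies  -- decoded reply dictionaries, in arrival order
    expected -- stop after this many replies have been seen
    Replies whose status is not 0 (ok) are counted but never selected.
    """
    best = None
    seen = 0
    for reply in replies:
        seen += 1
        if reply.get("status") == 0:
            if best is None or reply["serial"] > best["serial"]:
                best = reply
        if seen >= expected:
            break
    return best


arrived = [
    {"status": 0, "answer": 0, "serial": 57},
    {"status": 0, "answer": 1, "serial": 56},
    {"status": 0, "answer": 0, "serial": 57},
]
print(select_answer(arrived, expected=3))
```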

  25. confd protocol: request
      CONFD:
        plj0{
          "msg": "{\"type\": 1, \"rsalt\": \"9aa6ce92-8336-11de-af38-001d093e835f\", \"protocol\": 1, \"query\": \"node1.example.com\"}\n",
          "salt": "1249637704",
          "hmac": "4a4139b2c3c5921f7e439469a0a45ad200aead0f"
        }
      · plj0: fourcc detailing the message content (PLain Json 0)
      · hmac: HMAC signature of salt+msg with the cluster hmac key

  26. confd protocol: request
      CONFD:
        plj0{
          "msg": "{\"type\": 1, \"rsalt\": \"9aa6ce92-8336-11de-af38-001d093e835f\", \"protocol\": 1, \"query\": \"node1.example.com\"}\n",
          "salt": "1249637704",
          "hmac": "4a4139b2c3c5921f7e439469a0a45ad200aead0f"
        }
      · msg: JSON-encoded query
      · protocol: confd protocol version (=1)
      · type: what to ask for (CONFD_REQ_* constants)
      · query: additional parameters
      · rsalt: response salt == UUID identifying the request
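
Putting the request fields together might look like the sketch below. It is not the code from lib/confd/client.py. The digest algorithm is assumed to be SHA-1 because the sample signature is 40 hex characters long, the payload is assumed to start directly with the fourcc, and `build_request` is a hypothetical name.

```python
import hashlib
import hmac
import json
import time
import uuid

FOURCC = "plj0"  # PLain Json 0, as on the slide


def build_request(hmac_key, req_type, query, now=None, rsalt=None):
    """Return (payload bytes, rsalt) for one confd query.

    hmac_key -- the cluster HMAC key, as bytes
    req_type -- integer request type (a CONFD_REQ_* value)
    query    -- additional parameters for that request type
    """
    if now is None:
        now = time.time()
    if rsalt is None:
        rsalt = str(uuid.uuid4())  # any UUID identifies the request
    msg = json.dumps({"protocol": 1, "type": req_type,
                      "query": query, "rsalt": rsalt}) + "\n"
    salt = str(int(now))  # the timestamp doubles as the HMAC salt
    # Assumption: HMAC-SHA1 over salt followed by msg.
    digest = hmac.new(hmac_key, (salt + msg).encode("utf-8"),
                      hashlib.sha1).hexdigest()
    envelope = json.dumps({"msg": msg, "salt": salt, "hmac": digest})
    return (FOURCC + envelope).encode("utf-8"), rsalt


payload, rsalt = build_request(b"example-key", 1, "node1.example.com",
                               now=1249637704)
print(payload.decode("utf-8"))
```

The resulting bytes would be sent as a single UDP datagram to port 1814 on one or more master candidates; the returned `rsalt` is kept to match replies against the request.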

  27. confd protocol: reply
      CONFD:
        plj0{
          "msg": "{\"status\": 0, \"answer\": 0, \"serial\": 42, \"protocol\": 1}\n",
          "salt": "9aa6ce92-8336-11de-af38-001d093e835f",
          "hmac": "aaeccc0dff9328fdf7967cb600b6a80a6a9332af"
        }
      · salt: the rsalt of the query
      · hmac: HMAC signature of salt+msg

  28. confd protocol: reply
      CONFD:
        plj0{
          "msg": "{\"status\": 0, \"answer\": 0, \"serial\": 42, \"protocol\": 1}\n",
          "salt": "9aa6ce92-8336-11de-af38-001d093e835f",
          "hmac": "aaeccc0dff9328fdf7967cb600b6a80a6a9332af"
        }
      · msg: JSON-encoded answer
      · protocol: protocol version (=1)
      · status: 0=ok; 1=error
      · answer: query-specific reply
      · serial: version of config.data
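
On the receiving side, a client has to check the signature, match the salt against the rsalt it sent, and only then trust the answer. The following is a sketch under the same assumptions as the reply format above suggests (fourcc prefix, HMAC over salt+msg) plus one that is not confirmed by the slides: SHA-1 as the digest, inferred from the 40-character signature. `sign` and `parse_reply` are hypothetical names.

```python
import hashlib
import hmac
import json

FOURCC = "plj0"


def sign(hmac_key, salt, msg):
    """HMAC over salt followed by msg (SHA-1 is an assumption)."""
    return hmac.new(hmac_key, (salt + msg).encode("utf-8"),
                    hashlib.sha1).hexdigest()


def parse_reply(payload, hmac_key, expected_rsalt):
    """Validate a reply datagram and return (answer, serial)."""
    text = payload.decode("utf-8")
    if not text.startswith(FOURCC):
        raise ValueError("unknown message type")
    envelope = json.loads(text[len(FOURCC):])
    expected = sign(hmac_key, envelope["salt"], envelope["msg"])
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, envelope["hmac"]):
        raise ValueError("bad HMAC signature")
    if envelope["salt"] != expected_rsalt:
        raise ValueError("reply does not belong to this request")
    reply = json.loads(envelope["msg"])
    if reply.get("protocol") != 1:
        raise ValueError("unsupported protocol version")
    if reply.get("status") != 0:
        raise ValueError("server reported an error: %r" % (reply.get("answer"),))
    return reply["answer"], reply["serial"]


# Build a signed sample reply locally so the sketch runs without a cluster.
key = b"example-key"
rsalt = "9aa6ce92-8336-11de-af38-001d093e835f"
msg = json.dumps({"status": 0, "answer": 0, "serial": 42, "protocol": 1}) + "\n"
sample = (FOURCC + json.dumps({"msg": msg, "salt": rsalt,
                               "hmac": sign(key, rsalt, msg)})).encode("utf-8")
print(parse_reply(sample, key, rsalt))
```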

  29. Ready-made clients: the protocol is simple, but clients are simpler
      · Ready to use confd clients
      · Python
        - lib/confd/client.py
      · Haskell
        - Since Ganeti 2.7
        - src/Ganeti/ConfD/Client.hs
        - src/Ganeti/ConfD/ClientFunctions.hs

  30. Expanding confd capabilities
      · Currently only a few queries are supported
      · Easy to add new ones
      · Just add a new query type in the constants list
      · ...and extend the buildResponse function (src/Ganeti/Confd/Server.hs) to reply to it in the appropriate way

  31. Ganeti and Networks: how do your instances talk to the world?
      · Some slides contributed by Dimitris Aragiorgis <dimara@grnet.gr>

  32. Current NICs: MAC + IP + link + mode
      NIC configuration
      · mode=bridged uses brctl addif
      · Hooks can deal with firewall rules, and more
      · External systems needed for DHCP, IPv6, etc.
      Management
      · Which VMs are on the same collision domain?
      · Which IP is free for a new VM to use?

  33. gnt-network overview
      · manage collision domains for your instances
      · easy way to assign IPs to instances
        - If resources are shared across multiple clusters, allocation must be done externally
      · keep existing per-NIC flexibility
      · hide underlying infrastructure
      · better networking overview
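
The bookkeeping behind "which IP is free for a new VM?" can be illustrated with a small address pool. This is a toy sketch and not how gnt-network is implemented; the class `AddressPool` and its methods are invented for the example, and it tracks a single network within one cluster only.

```python
import ipaddress


class AddressPool:
    """Track which host addresses of one network are in use."""

    def __init__(self, cidr, gateway=None):
        self.network = ipaddress.ip_network(cidr)
        self.used = set()
        if gateway is not None:
            self.reserve(gateway)  # never hand out the gateway address

    def reserve(self, address):
        """Mark a specific address as taken."""
        ip = ipaddress.ip_address(address)
        if ip not in self.network:
            raise ValueError("%s is outside %s" % (ip, self.network))
        if ip in self.used:
            raise ValueError("%s is already in use" % ip)
        self.used.add(ip)

    def allocate(self):
        """Return the lowest free host address and mark it as taken."""
        for ip in self.network.hosts():
            if ip not in self.used:
                self.used.add(ip)
                return str(ip)
        raise RuntimeError("no free addresses in %s" % self.network)

    def release(self, address):
        """Return an address to the pool."""
        self.used.discard(ipaddress.ip_address(address))


pool = AddressPool("192.0.2.0/29", gateway="192.0.2.1")
first = pool.allocate()
second = pool.allocate()
print(first, second)
```

A pool like this only works when one party owns the whole network, which is why allocation has to move to an external system once several clusters share the same address range.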
