WHO? Sten Spans Schuberg Philis @sspans (github, etc)



SLIDE 1

CUSTOMER

WHO?
Sten Spans
Schuberg Philis
@sspans (GitHub, etc.)

SLIDE 2

TOPIC
Going from 100 to 10000 systems
Orchestrating a Zone
Not Google-scale

SLIDE 3

WHY?
New Zone
Rethink principles
Automate
Comments on CentOS 7/KVM
Conceptual or Technical?

SLIDE 4

WHAT?

SLIDE 5

SUDO MAKE CLOUD
Networking
Hypervisors
Storage
Orchestration

SLIDE 6

TOYS

Source: https://www.flickr.com/photos/rfc1036/406675831/

SLIDE 7

STAFF

SLIDE 8

GOAL

SLIDE 9

GOAL

SLIDE 10

CLOUDY

https://www.flickr.com/photos/versageek/493800514

SLIDE 11

MISTAKES
Artisanal / Pets
Network not Scalable / Redundant
Stretching Failure-domains
Other technical downsides
Lack of Automation

SLIDE 12

WHAT IS ARTISANAL?
People tracking MAC addresses
Tweaking settings for each system
Multiple sources of truth
Validation / Acceptance test
Naming - individual servers

SLIDE 13

NAMING?
Impacts automation
Impacts labeling
Impacts replacements
Go for location-based identities!
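
A minimal sketch of what a location-based identity could look like. The zone/pod/rack/unit layout, the field names and the hostname format are all assumptions made for illustration; the slides do not specify a scheme.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Physical position of a host (hypothetical scheme, not from the talk)."""
    zone: str
    pod: int
    rack: int
    unit: int

    def hostname(self) -> str:
        # The name encodes where the machine sits, not what it does,
        # so a replacement in the same slot inherits the same identity.
        return f"{self.zone}-p{self.pod:02d}-r{self.rack:02d}-u{self.unit:02d}"


_NAME_RE = re.compile(
    r"^(?P<zone>[a-z0-9]+)-p(?P<pod>\d+)-r(?P<rack>\d+)-u(?P<unit>\d+)$"
)


def parse_hostname(name: str) -> Location:
    """Recover the location from a hostname without a lookup table."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a location-based hostname: {name!r}")
    return Location(m["zone"], int(m["pod"]), int(m["rack"]), int(m["unit"]))
```

Because the name round-trips to a location, labeling and automation can work from the hostname alone.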

SLIDE 14

NETWORKING?
Large layer2 domains
Sharing networks between zones
Manual configuration
Not redundant (enough)?
Or more failures due to redundancy?

SLIDE 15

FAILURE DOMAINS
Do you really want twin-datacenter?
Clustering is complicated…
Way more complicated failures…
Have you actually tested failures?

SLIDE 16

GOAL
Manage zone as one unit
Capture design / logic in config-management
Versioned Iterations
Think about naming
Think about how you identify hosts
Simplify…

SLIDE 17

GOAL
Stop managing individual servers (cattle)
Stop being Artisanal
Start scaling
Start Orchestrating
Think Terraform/CloudFormation/Heat

SLIDE 18

BUILDING BLOCKS
Isolated Networking
Isolated Pods
Worry-free Storage
Optional: Dedicated SDN Clusters
Fully orchestrated zones

SLIDE 19

BOOTSTRAP NETWORK CORE
Core Switches
LoM switch
Hypervisors
SDN?

SLIDE 20

CORE SWITCHES
Linux based
Bootstrap via DHCP/HTTP
Chef/Ansible/Puppet supported!
Capture design in cookbooks/playbooks
Can run additional services

SLIDE 21

SDN
Cluster per (availability) Zone
Failure Domain
Features vs. Lock-in
Complicated? Expensive?
Accept tunnels between zones
Customers will accept trade-offs!

SLIDE 22

BOOTSTRAP A POD
ToR Switch Pair
LoM switch
Hypervisors
Storage

SLIDE 23

TOR SWITCHES
Linux Based
Bootstrap via DHCP/HTTP
Chef/Ansible/Puppet supported!
Capture design in cookbooks/playbooks
Can run DHCP/DNS per Pod
Move pod services into the Pod

SLIDE 24

LOM SWITCHES
Can bootstrap via ToR switch
Config via ToR
Manage iLOs via DHCP Hooks
Would love a Linux box here too

SLIDE 25

HYPERVISORS
Linux Based
Automated Firmware Updates
Bootstrap via DHCP/HTTP
HTTP Bootstrap via Chef
TFTP Proxy on ToR
Location based DHCP (Option 82)
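
To illustrate how location-based DHCP can work, here is a sketch that decodes the DHCP Relay Agent Information option (Option 82, RFC 3046) and maps the relay's switch and port to a host identity. The sub-option codes 1 (Agent Circuit ID) and 2 (Agent Remote ID) come from the RFC; the `port_map` table, its contents and the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

CIRCUIT_ID = 1  # RFC 3046 Agent Circuit ID (typically the switch port)
REMOTE_ID = 2   # RFC 3046 Agent Remote ID (typically the relaying switch)


def parse_option82(payload: bytes) -> Dict[int, bytes]:
    """Split an Option 82 payload into {sub-option code: raw value}."""
    subopts: Dict[int, bytes] = {}
    i = 0
    while i < len(payload):
        if i + 2 > len(payload):
            raise ValueError("truncated sub-option header")
        code, length = payload[i], payload[i + 1]
        value = payload[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated sub-option value")
        subopts[code] = value
        i += 2 + length
    return subopts


def location_for(
    payload: bytes, port_map: Dict[Tuple[bytes, bytes], str]
) -> Optional[str]:
    """Look up the host identity for (remote-id, circuit-id), if known."""
    subopts = parse_option82(payload)
    key = (subopts.get(REMOTE_ID, b""), subopts.get(CIRCUIT_ID, b""))
    return port_map.get(key)
```

With this approach the identity follows the switch port rather than the MAC address, so swapping a machine does not require updating an inventory.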

SLIDE 26

HYPERVISOR HARDWARE
Machines are extremely scalable
Calculate cost per VM
Waiting for 25G Ethernet
Has anybody solved EFI PXE? Please?
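
One way to "calculate cost per VM" is sketched below. The model (capacity limited by CPU or RAM, hardware price spread over its lifetime plus a monthly running cost) and every parameter name are assumptions, not figures or a method from the talk.

```python
import math


def vms_per_host(host_cores, host_ram_gb, vm_vcpus, vm_ram_gb,
                 cpu_overcommit=1.0, reserved_ram_gb=0.0):
    """Number of VMs that fit, limited by whichever resource runs out first."""
    by_cpu = math.floor(host_cores * cpu_overcommit / vm_vcpus)
    by_ram = math.floor((host_ram_gb - reserved_ram_gb) / vm_ram_gb)
    return max(0, min(by_cpu, by_ram))


def cost_per_vm_month(host_price, lifetime_months, monthly_opex, vms):
    """Monthly cost per VM: amortized hardware plus running cost, per VM."""
    if vms <= 0:
        raise ValueError("host cannot run any VMs of this size")
    return (host_price / lifetime_months + monthly_opex) / vms
```

Comparing this number across hardware configurations shows whether a larger machine actually lowers the price per VM or merely moves the bottleneck from CPU to RAM.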

SLIDE 27

PROVISIONING
Bootstrap via DHCP/HTTP
Nekopan - Golang webserver
Interfaces with Chef (or Ansible/Puppet)
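
The HTTP half of a DHCP/HTTP bootstrap can be sketched as a small web service that hands each booting machine a script based on its identity. This is not Nekopan's actual behavior or API: the inventory, the `mac` query parameter, the `boot.example` URL and the iPXE-style script format are all placeholders chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical inventory: MAC address -> role assigned by config management.
INVENTORY = {"52:54:00:12:34:56": "hypervisor"}


def render_boot_script(mac: str, inventory: dict) -> str:
    """Return an iPXE-style script for a known MAC, or one that just exits."""
    role = inventory.get(mac.lower())
    if role is None:
        # Unknown machines are never installed automatically.
        return "#!ipxe\necho unknown host\nexit\n"
    base = f"http://boot.example/{role}"  # placeholder URL
    return (
        "#!ipxe\n"
        f"kernel {base}/vmlinuz\n"
        f"initrd {base}/initrd.img\n"
        "boot\n"
    )


class BootHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect requests such as /boot?mac=52:54:00:12:34:56
        query = parse_qs(urlparse(self.path).query)
        mac = query.get("mac", [""])[0]
        body = render_boot_script(mac, INVENTORY).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the bootstrap endpoint until interrupted."""
    HTTPServer(("127.0.0.1", port), BootHandler).serve_forever()
```

In a real setup the inventory lookup would presumably be answered by the config-management system (Chef, Ansible or Puppet) rather than a hard-coded dictionary.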

SLIDE 28

STORAGE
Stable NFS – For now…
API Driven
No fancy replication / clustering

SLIDE 29

DONE?
Let's add all of this to CloudStack…

SLIDE 30

CLOUDSTACK
SDN providers need work
cloudstack-setup-agent is … horrible
RouterVM/SystemVM
Small networking issues
And I bet there is more…

SLIDE 31

THE HORROR:

SLIDE 32

WHAT IS GOING ON?
All Ubuntu is the same…
Fedora == Red Hat 6
CentOS == Red Hat 5
Or you may have Red Hat 7
Really? WTF?
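
For contrast with the coarse matching described above, a setup script can read the distribution and major version from `/etc/os-release`, where the `ID` and `VERSION_ID` fields are standardized. This is a sketch with made-up function names, not the logic cloudstack-setup-agent uses.

```python
def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines from os-release content into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        info[key] = value.strip().strip("\"'")
    return info


def distro_and_major(info: dict) -> tuple:
    """Return (distribution id, major version) as strings."""
    distro = info.get("ID", "unknown")
    major = info.get("VERSION_ID", "").split(".")[0]
    return distro, major
```

A caller would pass in the contents of `/etc/os-release` and branch on the exact pair, for example `("centos", "7")`, instead of assuming that one distribution behaves like an older release of another.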

SLIDE 33

RESULTS ON CENTOS 7
SELinux is disabled (revert broken)
Firewall changes don’t work for firewalld
Cgroup changes are not that cool really
Workarounds for old bugs result in breakage on newer systems
So I reinstalled the box

SLIDE 34

CENTOS 7 STATUS
SELinux seems to work
Labeled NFS is still bleeding edge
No need to mess with cgroups
Firewalld is pretty nice really
CloudStack should perhaps audit the config
But please don’t change it…

SLIDE 35

ROUTERVM
We run Ansible to hotfix/manage RouterVMs
But IP / kernel command line not available on KVM
qemu-guest-agent solves that and more…
LibVMI – not sure
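
As a sketch of how qemu-guest-agent can supply the missing IP information, the snippet below builds the agent's `guest-network-get-interfaces` command and extracts IPv4 addresses from a reply. Only the JSON handling is shown; how the command reaches the guest (for instance through libvirt's `virsh qemu-agent-command`) is left out, and the helper names are hypothetical.

```python
import json


def build_command() -> str:
    """JSON request asking the guest agent for its network interfaces."""
    return json.dumps({"execute": "guest-network-get-interfaces"})


def ipv4_addresses(reply: str) -> dict:
    """Map interface name -> list of IPv4 addresses from an agent reply."""
    result = {}
    for iface in json.loads(reply).get("return", []):
        addrs = [
            addr["ip-address"]
            for addr in iface.get("ip-addresses", [])
            if addr.get("ip-address-type") == "ipv4"
        ]
        if addrs:
            result[iface["name"]] = addrs
    return result
```

The resulting addresses could then feed an Ansible inventory for the RouterVMs.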