OpenStack on Kubernetes: Make OpenStack and Kubernetes Fail-Safe
SLIDE 1

OpenStack on Kubernetes: Make OpenStack and Kubernetes Fail-Safe

Seungkyu Ahn (ahnsk@sk.com)
 Jaesuk Ahn (jay.ahn@sk.com)

Open System Lab Network IT Convergence R&D Center SK Telecom

Wil Reichert (wil@solinea.com)

Solinea

SLIDE 2

What will happen

  • Introduction
  • Kubernetes/OpenStack
  • Demo Starts
  • CI
  • Demo Ends
SLIDE 3

Introducing Our Company

SKT

  • No. 1 mobile service provider in Korea, with 50% market share
  • We have been at the forefront of developing and commercializing new wireless technologies (most recently, 4G LTE 5-band CA at up to 700 Mbps)
  • We are exploring more than the network, especially AI and media
  • We actively participate in open source projects: OCP, TIP, ONOS, Ceph, OpenStack, etc.

Solinea

  • Professional services partner that accelerates enterprise cloud adoption
  • Technology agnostic, always working in the best interest of our clients
  • Our clients are primarily Global Fortune 1000 organizations in multiple industry verticals

SLIDE 4

This is Totally Community Effort

  • Wil Reichert: CI/CD & K8S
  • Seungkyu Ahn: OpenStack & K8S
  • Jaesuk Ahn: OpenStack & K8S
  • Robert Choi: OpenStack & Automation
  • Dan Kim: OpenStack & K8S
  • Jawon Choo: OpenStack & Kolla

Containers! Cloud Native!
Large Contributing OpenStack Operator WG

SLIDE 5

Current (previous) Way

[Diagram: OpenStack delivery pipeline]

  • SPEC → DEV → TEST: upstream community code + OpenStack package + configuration management
  • Requirements / deployment architecture: network, storage, appliance integration, configuration tuning
  • Hardware/appliance purchase → deployment → QA
  • OpenStack production: deployment, operation, development
  • Operation: triage, monitoring, upgrade, tuning, capacity mgmt., scale-out analysis, patch, deployment automation, flexible configuration, troubleshooting

SLIDE 6

Previous Product Pain-Points

  • Even updates (patches) are challenging
  • Upgrades: gosh, what can I say.
  • Deployment issues: snowflake environments vs. cattle
  • Not a single huge OpenStack, but many small/medium OpenStacks
  • Lack of flexible configuration management capability in a “standardized manner”
  • Very difficult to integrate with our own components (Ceph, SDN controller, datacenter operation platform, etc.)

SLIDE 7

More to Do

[Diagram: OpenStack delivery pipeline, with TEST added at the end]

  • SPEC → DEV → TEST: upstream community code + OpenStack package + configuration management
  • Requirements / deployment architecture: network, storage, appliance integration, configuration tuning
  • Hardware/appliance purchase → deployment → QA
  • OpenStack production: deployment, operation, development
  • Operation: triage, monitoring, upgrade, tuning, capacity mgmt., scale-out analysis, patch, deployment automation, flexible configuration, troubleshooting → TEST

SLIDE 8

Continuous Loop

[Diagram: the delivery pipeline again, now drawn as a continuous loop]

  • SPEC → DEV → TEST: upstream community code + OpenStack package + configuration management
  • Requirements / deployment architecture: network, storage, appliance integration, configuration tuning
  • Hardware/appliance purchase → deployment → QA
  • OpenStack production: deployment, operation, development
  • Operation: triage, monitoring, upgrade, tuning, capacity mgmt., scale-out analysis, patch, deployment automation, flexible configuration, troubleshooting

SLIDE 9

Why OpenStack on Kubernetes?

A better way to deliver OpenStack and manage its lifecycle

  • Reducing Overhead: Dependency Management
  • “Easy and Fast” Multiple Deployment in “Standardized” way
  • Upgrade/Update/Rollback
  • Easy Scaling and Healing
SLIDE 10

Key Technologies

  • Kubernetes (Control Plane Orchestration)
  • Helm (Application Lifecycle Management Automation)
  • CI/CD Pipeline leveraging Jenkins
  • OpenStack-helm (Managing OpenStack on Kubernetes)
  • Kolla (Containerizing OpenStack)
  • ONOS/SONA (OpenStack Network Management)
  • Ceph (Storage)
SLIDE 11

Our Plan

  • Production-Ready by the end of 2017
  • First Production within 2017 (IT Infrastructure)
  • Expanding to more deployments (Media, NFV) in 2018, and putting more apps on this “streamline”

SLIDE 12

Overall Architecture

SLIDE 13

Today’s Demo System

[Diagram: demo system]

  • git → Helm repo
  • Helm CLI / Tiller / kubectl
  • Jenkins master → Jenkins slave
  • Kubernetes master → Kubernetes node

SLIDE 14

What is the HA target?

Kubernetes Master

  • Etcd
  • API server (load balance)
  • Scheduler (leader election)
  • Controller manager (leader election)


OpenStack Controller (Keystone, Glance, Nova, Cinder, Neutron)

  • API server (load balance)
  • Scheduler (Nova, Cinder)
  • MariaDB
  • RabbitMQ
  • Neutron network node (SONA)
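The scheduler and controller manager above use Kubernetes leader election: only one replica acts at a time, and the current leader is recorded in an annotation on an Endpoints object. A minimal sketch of inspecting it; the kubectl command targets a live cluster, and the sample annotation value below is a made-up example:

```shell
# On a live cluster (not run here), the current scheduler leader is visible via:
#   kubectl -n kube-system get endpoints kube-scheduler \
#     -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'
# The annotation value is a small JSON record; extracting the holder from a sample:
leader='{"holderIdentity":"kube-master02","leaseDurationSeconds":15}'
echo "$leader" | sed -n 's/.*"holderIdentity":"\([^"]*\)".*/\1/p'
```

The same check works for `kube-controller-manager`; the API servers need no such lock because they are stateless and simply load balanced.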
SLIDE 15

Kubernetes 3-Masters

[Diagram: three Kubernetes masters (kube-master01/02/03), each running apiserver, scheduler, controller manager, etcd, kubelet, kube-proxy, and flanneld; worker nodes node00 (with Ceph) and node01 run kubelet, kube-proxy, and flanneld]

SLIDE 16

Kubelet

KUBELET_OPTS="--kubeconfig=/etc/kubernetes/kubelet.conf \
  --require-kubeconfig=true \
  --hostname-override=kube-master01 \
  --logtostderr=false \
  --log-dir=/var/log/kubernetes \
  --pod-manifest-path=/etc/kubernetes/manifests \
  --allow-privileged=true \
  --v=0 \
  --register-schedulable=false \
  --cluster-dns=10.96.0.10 \
  --cluster-domain=cluster.local"

[Diagram: kubelet and flanneld running on the master]

SLIDE 17

etcd yaml

--name kube-master01
--initial-advertise-peer-urls http://192.168.30.13:2380
--listen-peer-urls http://192.168.30.13:2380
--advertise-client-urls http://192.168.30.13:4001
--listen-client-urls http://192.168.30.13:2379,http://127.0.0.1:2379,http://192.168.30.13:4001,http://127.0.0.1:4001
--data-dir /var/etcd/data
--initial-cluster-token 5d3903915c2cda30174970d784075f0a
--initial-cluster kube-master01=http://192.168.30.13:2380,kube-master02=http://192.168.30.14:2380,kube-master03=http://192.168.30.15:2380
--initial-cluster-state new

[Diagram: etcd added alongside kubelet and flanneld]
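Given the initial-cluster flag above, a quick sanity check is that all three members report healthy. The etcdctl call is a sketch against a live master; the member parsing below runs on the flag value itself:

```shell
# On a live master (not run here), the v2 API health check would be:
#   etcdctl --endpoints http://192.168.30.13:2379 cluster-health
# Listing the expected member names from the --initial-cluster value:
initial_cluster="kube-master01=http://192.168.30.13:2380,kube-master02=http://192.168.30.14:2380,kube-master03=http://192.168.30.15:2380"
echo "$initial_cluster" | tr ',' '\n' | cut -d= -f1
```

Three members is the useful minimum: etcd needs a majority (two of three) alive to keep accepting writes.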

SLIDE 18

kube-apiserver yaml

/usr/local/bin/kube-apiserver
--etcd-servers=http://127.0.0.1:2379
--storage-backend=etcd3
--insecure-bind-address=127.0.0.1
--insecure-port=8080
--secure-port=6443
--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota
--service-cluster-ip-range=10.96.0.0/16
--tls-cert-file=/etc/kubernetes/pki/apiserver.crt
--tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem
--token-auth-file=/etc/kubernetes/pki/kube-token
--service-account-key-file=/etc/kubernetes/pki/apiserver-key.pem
--allow-privileged
--anonymous-auth=false

[Diagram: kube-apiserver added alongside etcd, kubelet, and flanneld]
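Because the API servers are load balanced rather than leader elected, each of the three masters should answer health checks independently. A sketch generating the per-master probes, using the master IPs from these slides (on a live cluster, each probe should return "ok"):

```shell
# Emit one health probe per master; the curl commands target a live cluster
# and are only printed here, not executed.
for ip in 192.168.30.13 192.168.30.14 192.168.30.15; do
  echo "curl -sk https://$ip:6443/healthz"
done
```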

SLIDE 19

kube-controller-manager yaml

kube-controller-manager
--master=127.0.0.1:8080
--cluster-cidr=172.16.0.0/16
--cluster-name=kubernetes
--allocate-node-cidrs=true
--service-account-private-key-file=/etc/kubernetes/pki/apiserver-key.pem
--root-ca-file=/etc/kubernetes/pki/ca.crt
--cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
--cluster-signing-key-file=/etc/kubernetes/pki/ca-key.pem
--v=0
--leader-elect=true

[Diagram: controller manager added alongside etcd, kube-apiserver, kubelet, and flanneld]

SLIDE 20

kube-scheduler yaml

/usr/local/bin/kube-scheduler
--master=127.0.0.1:8080
--v=0
--leader-elect=true

[Diagram: scheduler added alongside controller manager, etcd, kube-apiserver, kubelet, and flanneld]

SLIDE 21

kube-proxy yaml

securityContext:
  privileged: true
command:
  - /bin/sh
  - -c
  - /usr/local/bin/kube-proxy --kubeconfig=/run/kubeconfig --cluster-cidr=10.96.0.0/16 --v=0

[Diagram: kube-proxy added; the master now runs kubelet, flanneld, etcd, kube-apiserver, controller manager, scheduler, and kube-proxy]

SLIDE 22

OpenStack Controller & Compute

[Diagram: Kubernetes worker nodes labeled "controller" run MariaDB, RabbitMQ, Keystone, Glance, Cinder, Nova, and Neutron; worker nodes labeled "compute" run Nova and Neutron alongside the VMs]

SLIDE 23

OpenStack Controller & Compute

[Diagram: three nova-api processes (OpenStack Process 1–3) spread across the controller-labeled worker nodes; compute-labeled nodes run Nova and Neutron alongside the VMs]

SLIDE 24

OpenStack Controller & Compute

[Diagram: one nova-api process (OpenStack Process 3) is rescheduled onto a surviving controller-labeled node, so three nova-api replicas keep running]

SLIDE 25

Database clustering (3 node)

jobs/mariadb-seed → po/mariadb-0, po/mariadb-1, po/mariadb-2; the joiner (third MariaDB) starts with:

  • --wsrep_cluster_address=gcomm://172.16.56.7,172.16.75.5,172.16.8.15
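The joiner finds the existing Galera members through that gcomm address, which lists the MariaDB pod IPs. A small sketch splitting the address into members, using the IPs shown on the slide; the joiner needs at least one of these alive to sync state:

```shell
# Split the Galera cluster address into its member IPs.
wsrep="gcomm://172.16.56.7,172.16.75.5,172.16.8.15"
members=${wsrep#gcomm://}
echo "$members" | tr ',' '\n'
```

With three members, the cluster keeps quorum (two of three) through the loss of any single pod.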
SLIDE 26

Neutron network (1 nic)

[Diagram: Neutron networking with a single NIC. On the network node (192.168.30.33), eth0 attaches to br-data; a veth pair (veth0/veth1) links br-data to br-ex; br-ex connects to br-int via the phy-br-ex/int-br-ex patch ports, and br-int connects to br-tun via patch-tun/patch-int. The qrouter-xxx namespace (qr-xxx/qg-xxx) and qdhcp-xxx namespace (tapxxx) hang off br-int, while br-tun carries vxlan-xxx ports whose flows are keyed by local IP, remote IP, and VNI. The compute node (192.168.30.34) mirrors this layout: VM → tapxxx → qbrxxx → qvbxxx/qvoxxx → br-int → br-tun → eth0 via br-data/br-ex. Legend: OVS internal interface vs. OVS patch port vs. OVS VXLAN port vs. Linux virtual interface.]

SLIDE 27

OpenStack-Helm Neutron chart

…
network:
  interface:
    external: veth0
    default: br-data
ml2:
  agent:
    tunnel_types: vxlan
  type_drivers:
    - flat
    - vxlan
ovs:
  auto_bridge_add: null
  bridge_mappings: null
neutron:
  default:
    l3_ha_network_type: vxlan
    debug: 'True'
…
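Values like these are normally supplied as an override file at install time rather than edited in the chart. A sketch, assuming the chart is published locally as local/neutron (the release and repo names are hypothetical); the install command targets a live cluster and is only shown as a comment:

```shell
# Hypothetical install with site-specific overrides:
#   helm install local/neutron --name neutron --values neutron-values.yaml
# Write the override file with just the values that differ from the chart defaults:
cat > neutron-values.yaml <<'EOF'
network:
  interface:
    external: veth0
    default: br-data
EOF
grep -c 'veth0' neutron-values.yaml
```

Keeping overrides in a file (instead of --set flags) makes the deployment configuration reviewable and versionable in git, which is what the CI pipeline later relies on.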

SLIDE 28

Simplified Overlay Network Architecture

[Diagram: simplified overlay network with SONA (ONOS). SONA sets switching/routing flow rules on each compute node's br-int (OVS), proxies ARP and DHCP, and sets NAT flow rules on the gateway node group's br-router (OVS). Quagga on the gateway nodes controls external connectivity via BGP/OSPF multipath. Nova provisions the virtual machines on the hypervisors. East-west traffic rides VXLAN tunnels between compute nodes; north-south traffic exits through the gateway vRouter.]

SLIDE 29

Live Demo 1

SLIDE 30

CI Presentation - Wil

SLIDE 31

VS

SLIDE 32

Moving parts - some numbers

  • Git repositories: 34 local, 10 upstream
  • Deployment configurations: 4
  • Supported Environments: 4
  • Charts per deployment: 12
  • Unique docker images in each deployment: 23
  • Pods & jobs in a single deployment: 34
  • Pods & jobs in a 3 node HA deployment: 85
SLIDE 33

Kinds of Workflows

  • Build Kolla Container
  • Build Helm Chart
  • Test
  • Deploy / upgrade
SLIDE 34

Kinds of Jobs

  • Commit / PR builds - validate internal changes
  • Live system deploys - validate component upgrades
  • Nightly builds - validate merge CI / upstream
  • Nightly redeploy - validate clean slate deploys
  • Upstream builds - identify incoming breaking changes
SLIDE 35

When you’ve got twice as many virtualization technologies, things fall over twice as fast.

SLIDE 36

Managing the Clutter

  • No upstream forks
  • Isolated CI build & test environments
  • Incremental upgrades
  • Constant rebuilds
SLIDE 37

Testing starts from the ground up: Deployed Kubernetes

  • Kubernetes e2e
  • Custom
  • Heapster*
SLIDE 38

Testing starts from the ground up: Kolla Container Builds

  • Bats
  • Clair
SLIDE 39

Testing starts from the ground up: Helm charts

  • Helm test
  • Partial tempest runs
  • Built in (e.g. horizon)
SLIDE 40

Testing starts from the ground up: Deployed OpenStack

  • Full tempest runs
  • Rally
SLIDE 41

Deployment is only the Beginning

  • Manual validations
  • Launch custom application on OpenStack
  • Additional security tests on running containers
  • Performance baselines and thresholds
SLIDE 42

Failure testing

  • Risk: many pods run --privileged and with host networking
  • Layered solutions present many more interesting failure cases
  • HA rules change
  • OpenStack HA behaviors are mapped onto Kubernetes HA behavior
SLIDE 43

Upgrades Become Boring

  • All changes can be simulated multiple times prior to live rollout
  • Minor component patches become trivial
  • OpenStack upgrades become predictable
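The predictability comes from Helm's per-release revision history: every upgrade bumps a revision number, and rollback just names an earlier one. A sketch, assuming a release called nova (the release and chart names are hypothetical); the live commands are shown only as comments:

```shell
# On a live cluster (not run here):
#   helm upgrade nova local/nova     # creates revision N+1
#   helm history nova                # lists the numbered revisions
#   helm rollback nova <revision>    # reverts to any earlier revision
# Picking the previous revision as the rollback target:
current=3
echo "helm rollback nova $((current - 1))"
```

Because every change is a numbered revision, the same upgrade can be rehearsed in CI and replayed or reverted in production with identical commands.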
SLIDE 44

Live Demo 2

SLIDE 45

Challenges (pitfalls)

  • Operational Burden
  • Limitations
  • OpenStack as Cloud Native Apps? Feasible? Needed? Gaps and Improvements?
  • Kubernetes Stability (fast-moving projects)
  • Security
SLIDE 46

THE END