TUT1131 - Best Practices in Deploying SUSE CaaS Platform

SLIDE 1

TUT1131 - Best Practices in Deploying SUSE CaaS Platform

Martin Weiss Senior Architect Infrastructure Solutions Martin.Weiss@SUSE.com Juan Utande Herrera Senior Architect Infrastructure Solutions Juan.Herrera@suse.com

SLIDE 2

AGENDA

  1. What is SUSE CaaS Platform
  2. Requirements
  3. Planning and Sizing
  4. Deployment Best Practices
  5. Testing
  6. Operations

SLIDE 3

What is SUSE CaaS Platform 3

SLIDE 4

SUSE: Underpinning Digital Transformation

Business-critical Applications, Machine Learning, Business Analytics, High Performance Computing, Traditional IT & Applications, Internet of Things

Diagram layers: Application Delivery, Software-defined Infrastructure

  • Platform as a Service: SUSE Cloud Application Platform
  • Container Management: SUSE CaaS Platform
  • Private Cloud / IaaS: SUSE OpenStack Cloud
  • Storage: SUSE Enterprise Storage
  • Networking: SDN and NFV
  • Compute: Virtual Machine & Container
  • Multimodal Operating System: SUSE Linux Enterprise Server
  • Infrastructure & Lifecycle Management: SUSE Manager
  • Public Cloud: SUSE Cloud Service Provider Program
  • Physical Infrastructure: Multi-platform Servers, Switches, Storage
  • Services: SUSE Global Services (Consulting Services, Select Services, Premium Support Services)

Open, Secure, Proven

SLIDE 5

What is SUSE CaaS Platform 3?

  • Kubernetes
  • MicroOS with Transactional Updates
  • Simple deployment
  • SUSE supported
  • LDAP / Active Directory Integration
  • Caching Registry Integration
  • Air Gapped Implementation Support
  • registry.suse.com
  • Helm
  • Docker or CRI-O (tech preview), Flannel
  • Multiple deployment methods
SLIDE 6

Requirements

SLIDE 7

General requirements

Where to deploy
  • Deploy on physical hardware or on your virtualization infrastructure
  • Ready to run on public and private clouds

What do I need
  • SUSE CaaS Platform subscriptions
  • SLES for infrastructure nodes

Who can help me
  • Sales and pre-/post-sales consulting:
      – Help choosing the right hardware
      – Architect the solution
      – Initial implementation

Support options
  • Included 24/7 priority support in case of issues
  • Consulting for maintenance and proactive support to scale, upgrade, review and fix

SLIDE 8

Use Case Specific Requirements

Application Requirements (Sizing)

  • Number of Pods
  • Memory, CPU
  • Storage requirements (file, block, object, single- or multi-writer, capacity, static or dynamic provisioning)
  • Specific hardware / CPU / GPU requirements
  • Network entry points / services / bandwidth

Security Requirements

  • Images (source and size)
  • Isolation
  • Integration into existing identity sources

Availability Requirements

  • Single or multi data center
  • Distance / latency

$$$ BUDGET $$$ Politics, Religion, Philosophy, Processes ;-)

SLIDE 9

Planning and Sizing

SLIDE 10

Kubernetes SUSE CaaS Platform – CLUSTER 1

[Diagram: 1 Admin, 3 Master and 3 Worker nodes]

Planning and Sizing

  • 1 Admin: LDAP, Salt, Velum, SQL
  • 3 Masters (more based on number of pods): fault tolerance, etcd cluster
  • 3 or more Workers (more based on number of pods and resource requirements); workers as VM or physical
  • Second cluster:
      – Fault tolerance
      – Disaster recovery
  • Disk space for each Worker:
      – 50 GB for the OS (BTRFS, minimum for the OS)
      – 100 GB for /var/lib/docker (BTRFS for images and containers)
      – Space really depends on image sizes, versions and changes
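As a rough illustration, the sizing rules above can be expressed as a small calculator. This is a hypothetical sketch, not a SUSE sizing tool: the function names, the example worker shape (16 cores, 64 GiB, 110 pods, which is the kubelet's default per-node pod limit) and the 10% headroom are assumptions. It uses etcd's majority-quorum rule to show why three masters survive one failure, takes the largest of the CPU-, memory- and pod-count-driven worker estimates, and totals worker disk with the 50 GB + 100 GB baseline from the slide.

```python
import math
from dataclasses import dataclass


def etcd_failures_tolerated(members: int) -> int:
    """Members that can fail while etcd keeps quorum (a strict majority)."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    quorum = members // 2 + 1
    return members - quorum


@dataclass
class WorkerShape:
    """Capacity of one worker node (example values are assumptions)."""
    cpu_cores: float
    memory_gib: float
    max_pods: int = 110  # kubelet default pod limit per node


def workers_needed(pods: int, cpu_per_pod: float, mem_gib_per_pod: float,
                   shape: WorkerShape, headroom: float = 0.1,
                   minimum: int = 3) -> int:
    """Estimate the worker count from aggregate pod requests.

    `headroom` reserves a fraction of each node for the OS and system pods
    (an assumed value). The result never goes below `minimum`, which
    mirrors the "3 or more workers" guidance on the slide.
    """
    usable_cpu = shape.cpu_cores * (1 - headroom)
    usable_mem = shape.memory_gib * (1 - headroom)
    by_cpu = math.ceil(pods * cpu_per_pod / usable_cpu)
    by_mem = math.ceil(pods * mem_gib_per_pod / usable_mem)
    by_pods = math.ceil(pods / shape.max_pods)
    # The tightest resource decides how many workers are required.
    return max(minimum, by_cpu, by_mem, by_pods)


def worker_disk_gb(workers: int, os_gb: int = 50, docker_gb: int = 100) -> int:
    """Baseline disk for all workers: OS plus /var/lib/docker on each node."""
    return workers * (os_gb + docker_gb)


if __name__ == "__main__":
    shape = WorkerShape(cpu_cores=16, memory_gib=64)
    n = workers_needed(pods=200, cpu_per_pod=0.5, mem_gib_per_pod=1.0, shape=shape)
    print("3 masters tolerate", etcd_failures_tolerated(3), "failure(s)")
    print("workers:", n, "baseline worker disk (GB):", worker_disk_gb(n))
```

With these example inputs CPU is the limiting resource (7 workers, 1050 GB of baseline worker disk). Note that 4 masters tolerate no more failures than 3, which is why odd master counts are the norm.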

SLIDE 11

Deployment Best Practices

SLIDE 12

Deployment - Processes and People

Prepare the Team (DevOps?)
  • Server
  • Storage
  • Network
  • Application
  • Security
  • User
  • Other

SLIDE 13

Deployment Stages

  1. Infrastructure Preparation
  2. Base Software Installation
  3. Infrastructure Verification
  4. SUSE CaaS Platform Installation
  5. Kubernetes Addons

SLIDE 14

Deployment

Review the Design
  • Depending on the requirements, adjust the design before implementation

Hardware Installation
  • Ensure that hardware installation and cabling are correct
  • Update firmware
  • Adjust firmware / BIOS settings
      – Disable everything not required (e.g. serial ports, network boot, power saving)
      – Configure HW date/time

VM Preparation
  • Use paravirtual SCSI

Preparation of Time Synchronization
  • Have a fault-tolerant group of time providers

Name Resolution
  • Ensure that every server address has its own, distinct name
  • Add all addresses to DNS with forward and reverse lookup
  • Ensure DNS is fault tolerant
  • /etc/HOSTNAME must contain the name in the public network
  • Define and create DNS entries for internal and external Velum and API targets (CNAME, load balancer, no round robin)
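A quick way to verify the name resolution rules is to check that every name resolves forward to an address and that this address resolves back to the same name. The sketch below is an assumption-laden helper, not part of the product: it expects the list of per-address host names as input and uses the system resolver through Python's `socket` module by default (IPv4 only, matching the "disable IPv6" guidance); the resolver functions are parameters so the logic can be exercised without DNS.

```python
import socket
from typing import Callable, Dict, List


def _reverse_name(ip: str) -> str:
    # gethostbyaddr returns (hostname, aliases, addresses)
    return socket.gethostbyaddr(ip)[0]


def check_forward_reverse(
    hostnames: List[str],
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], str] = _reverse_name,
) -> Dict[str, str]:
    """Return {hostname: problem} for every name whose DNS records disagree.

    A name passes when it resolves to an address and that address
    reverse-resolves to the same name (case and trailing dot ignored).
    """
    problems: Dict[str, str] = {}
    for name in hostnames:
        try:
            ip = forward(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems[name] = f"forward lookup failed: {exc}"
            continue
        try:
            back = reverse(ip)
        except OSError as exc:  # socket.herror is a subclass of OSError
            problems[name] = f"reverse lookup of {ip} failed: {exc}"
            continue
        if back.rstrip(".").lower() != name.rstrip(".").lower():
            problems[name] = f"{ip} reverse-resolves to {back}"
    return problems
```

Run against the real resolver, e.g. `check_forward_reverse(["admin.example.lan", "master1.example.lan"])` with your own (here hypothetical) names; an empty dict means every forward and reverse entry agrees.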

SLIDE 15

Deploy On-Premise Registry (docker-distribution-registry)

  • Implement Portus to Secure the On-Premise Registry
  • Create DNS entry for Registry
  • Create Namespaces and Users on Registry
  • Optional: Integrate Portus into existing LDAP or Active-Directory

Put all required images into the registry, in the right namespace

  • Dashboard, Prometheus, Grafana, etc.

Optional: Set up caching registries

Deployment
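When images are copied into the on-premise registry, each upstream reference needs a predictable target name. The sketch below shows one possible convention (an assumption, not a Portus requirement): keep the upstream repository path and prefix it with the on-premise registry and a Portus namespace. `registry.example.lan` and the namespace names are hypothetical. Registry detection follows Docker's naming rule, where the first path component is a host only if it contains `.` or `:` or is `localhost`.

```python
from typing import Tuple


def split_registry(image: str) -> Tuple[str, str]:
    """Split an image reference into (registry, repository[:tag]).

    Follows Docker's naming rule: the first path component is a registry
    host only if it contains '.' or ':' or is 'localhost'; anything else
    is a Docker Hub image.
    """
    first, sep, rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    if not sep:
        # Official Docker Hub images live under the implicit "library/" path.
        return "docker.io", "library/" + image
    return "docker.io", image


def mirror_reference(image: str, target_registry: str, namespace: str) -> str:
    """Name of the copy of `image` inside <target_registry>/<namespace>/."""
    _, repository = split_registry(image)
    return f"{target_registry}/{namespace}/{repository}"
```

On the docker host, each image would then go through `docker pull <upstream>`, the image scan, `docker tag <upstream> <mirror>` and `docker push <mirror>`.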

SLIDE 16

Prepare Load Balancer Endpoints for API and DEX

  • Ports 6443 and 32000

Storage network setup and connectivity

Prepare on-premise Helm chart repository

Prepare a Docker host to pull images from the internet, scan them, and push them to the on-premise registry

Prepare Git for storing all manifests / YAML files

Deployment
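Before bootstrapping, a plain TCP connect to ports 6443 (API) and 32000 (DEX) on the load balancer address confirms that the endpoints are listening. This is a minimal sketch under stated limits: it only proves that a TCP handshake succeeds, it does not validate TLS, and depending on the load balancer the handshake may succeed even when no backend is up yet. `api.example.lan` is a placeholder name.

```python
import socket
from typing import Dict, Iterable


def check_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Try a TCP connect to each port; True means the connection was accepted."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            # create_connection resolves the name and completes the handshake
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, unreachable or timed out
            results[port] = False
    return results
```

Usage: `check_ports("api.example.lan", [6443, 32000])`, run from a node on each network that must reach the endpoints.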

SLIDE 17

Software Staging
  • Subscription Management Tool (SMT), SUSE Manager, RMT (limited)
  • Ensure staging of patches to guarantee the same patch level on existing and newly installed servers

General
  • Use BTRFS for the OS
  • Disable Firewall / AppArmor / IPv6

AutoYaST
  • Ensure that all servers are installed 100% identically
  • Consulting solution available (see https://github.com/Martin-Weiss/cif)

Configuration Management
  • Templates
  • Salt

Deployment
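To check that all servers really are identical and on the same patch level, the installed package lists can be compared against one reference node. This sketch assumes the lists were collected beforehand, for example with `rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n'` on every node (via Salt or SSH); the node names and package versions in the test data are purely illustrative.

```python
from typing import Dict, List, Set


def package_set(rpm_output: str) -> Set[str]:
    """One entry per installed package line, e.g. "salt 2018.3.0-1"."""
    return {line.strip() for line in rpm_output.splitlines() if line.strip()}


def package_drift(nodes: Dict[str, str], reference: str) -> Dict[str, Dict[str, List[str]]]:
    """Compare each node's package list with the reference node.

    Returns {node: {"missing": [...], "extra": [...]}} for nodes that differ;
    a different version shows up as one "missing" and one "extra" entry.
    Working on whole lines keeps multi-version packages (e.g. kernels) intact.
    """
    ref = package_set(nodes[reference])
    drift: Dict[str, Dict[str, List[str]]] = {}
    for node, output in nodes.items():
        if node == reference:
            continue
        pkgs = package_set(output)
        missing, extra = sorted(ref - pkgs), sorted(pkgs - ref)
        if missing or extra:
            drift[node] = {"missing": missing, "extra": extra}
    return drift
```

An empty result means every node matches the reference package for package; any entry points at a node that missed a staging step.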

SLIDE 18

Deployment

  • ONLY USE STATIC IP configs
  • Verify time synchronization
  • Verify name resolution
  • Test all network connections
      – Bandwidth
      – Latency
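For the verification step, the sketch below times TCP handshakes as a rough latency probe and flags nodes whose clock offset exceeds a limit. It is a hypothetical helper: the 0.5 s limit, the host names and how the per-node offsets are collected (from whatever your time service reports) are assumptions. A TCP handshake approximates one round trip; bandwidth needs a dedicated tool such as iperf3.

```python
import math
import socket
import statistics
import time
from typing import Dict, List


def tcp_connect_latency_ms(host: str, port: int, samples: int = 10,
                           timeout: float = 2.0) -> List[float]:
    """Time `samples` TCP handshakes to host:port, in milliseconds."""
    results: List[float] = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            results.append((time.perf_counter() - start) * 1000.0)
    return results


def summarize_ms(values: List[float]) -> Dict[str, float]:
    """Min, median, nearest-rank 95th percentile and max of the samples."""
    ordered = sorted(values)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {"min": ordered[0], "median": statistics.median(ordered),
            "p95": ordered[p95_index], "max": ordered[-1]}


def clock_skew(offsets_s: Dict[str, float], max_skew_s: float = 0.5) -> Dict[str, float]:
    """Return the nodes whose offset from the reference clock exceeds the limit."""
    return {node: off for node, off in offsets_s.items() if abs(off) > max_skew_s}
```

Running the latency probe between every pair of nodes, on both the cluster and the public network, gives a baseline to compare against later during fault-tolerance tests.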
SLIDE 19

Deployment

  • Install all Servers (Admin, Master, Worker) via AutoYaST
  • Ensure that all the patches available are installed at this point in time
  • AutoYaST configures Salt to ensure all Master/Worker connect to Salt-Master on the Admin host
  • Access Velum web-interface and create admin user
  • Specify Internal Dashboard FQDN (CNAME)
  • Enable Tiller (for later Helm usage)
  • Configure the overlay network
  • Add the SSL certificate of the CA signing the registry and external LDAP certificates
  • Accept Nodes, Assign Roles
  • Specify External API FQDN (load balancer for API and DEX)
  • Specify External Velum FQDN (CNAME)
  • Run Bootstrap (and now have a cup of coffee ;-))
SLIDE 20

Deployment

Create required Namespaces

Create required Users / Groups in LDAP, or connect to Active Directory

Create Roles and Role Assignments

Deploy Basic Services

  • K8s Dashboard
  • Persistent Storage / Storage Classes
  • Ingress
  • Monitoring
  • Logging

Deploy Application

  • Application based scripts
  • CI/CD
  • Helm
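Namespaces and role assignments are ordinary Kubernetes objects, so they can be generated, versioned in Git and applied with `kubectl apply -f`. The sketch below builds a Namespace and a RoleBinding that grants an LDAP/Active Directory group the built-in `edit` ClusterRole in that namespace. The namespace and group names are hypothetical, and it assumes the group name matches what Dex passes on as the group claim.

```python
import json
from typing import Any, Dict


def namespace_manifest(name: str) -> Dict[str, Any]:
    """A plain Namespace object."""
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}


def group_role_binding(namespace: str, group: str,
                       cluster_role: str = "edit") -> Dict[str, Any]:
    """RoleBinding that grants `group` the given ClusterRole in one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{group}-{cluster_role}", "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": cluster_role,
        },
        "subjects": [
            {"apiGroup": "rbac.authorization.k8s.io", "kind": "Group", "name": group}
        ],
    }


if __name__ == "__main__":
    items = [namespace_manifest("team-a"),
             group_role_binding("team-a", "team-a-developers")]
    # kubectl accepts JSON as well as YAML, including a v1 List of objects.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Redirect the output to a file in the Git repository and apply it with `kubectl apply -f <file>`. Binding a ClusterRole through a namespaced RoleBinding keeps the group's rights limited to that namespace.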
SLIDE 21

Testing

SLIDE 22

Testing - Preparation

Create a test plan

For every test describe

  • Starting point
  • Test details
  • Expected result

When executing the test

  • Prepare and verify starting point
  • Execute test
  • Document the test execution
  • Document the test results
  • Compare test results with expectation
  • Repeat the test several times
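The test-plan structure above maps naturally onto a small record type. This is a hypothetical sketch of one way to keep each test's starting point, details and expected result together with its documented executions. Comparing the observed result to the expectation as plain text and requiring three matching runs are simplifying assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Execution:
    executed_at: datetime
    notes: str      # how the test was actually executed
    observed: str   # the documented result


@dataclass
class PlannedTest:
    name: str
    starting_point: str
    details: str
    expected: str
    executions: List[Execution] = field(default_factory=list)

    def record(self, notes: str, observed: str) -> bool:
        """Document one execution; return True if it matched the expectation."""
        self.executions.append(Execution(datetime.now(), notes, observed))
        return observed == self.expected

    def passed(self, min_runs: int = 3) -> bool:
        """Pass only after at least `min_runs` executions that all matched."""
        return (len(self.executions) >= min_runs
                and all(e.observed == self.expected for e in self.executions))
```

A single deviating run marks the test as failed, which enforces the "repeat the test several times" rule rather than accepting one lucky pass.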


SLIDE 23

Ensure all fault tolerance tests are done with load on the system

Network failure

  • Single / Multiple NIC
  • Single / Multiple Switches
  • Cluster / Public Network

Node failure

  • Admin
  • Master
  • Worker

Testing - Fault Tolerance
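The failure dimensions on this slide multiply quickly, so it can help to expand them into an explicit checklist. The sketch below takes the Cartesian product of the network-failure scopes and networks and appends the node-failure cases. Pairing every scope with every network is an assumption; trim or extend the lists to match the actual design.

```python
from itertools import product
from typing import List

# Dimensions taken from the slide; extend them to match your environment.
FAILURE_SCOPES = ["single NIC", "multiple NICs", "single switch", "multiple switches"]
NETWORKS = ["cluster network", "public network"]
NODE_ROLES = ["admin", "master", "worker"]


def fault_tolerance_scenarios() -> List[str]:
    """Expand the slide's failure dimensions into one checklist entry each."""
    network = [f"{scope} failure on the {net}"
               for scope, net in product(FAILURE_SCOPES, NETWORKS)]
    nodes = [f"{role} node failure" for role in NODE_ROLES]
    # Every scenario is meant to run while the system is under load.
    return [f"{s} (under load)" for s in network + nodes]
```

This yields 11 scenarios (8 network, 3 node), each of which becomes one entry in the test plan.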

SLIDE 24

Operations

SLIDE 25

Life Cycle

  • New Patches
  • Create new Stage on Staging System
  • Assign new Stage to Admin and Nodes
  • Wait until the next day or run “transactional-update dup reboot”
  • Access Velum - reboot admin
  • Ensure NO Single Pod application runs in the cluster*
  • Access Velum - reboot all
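Before "reboot all", single-pod applications can be spotted from the output of `kubectl get deployments,statefulsets,pods --all-namespaces -o json`. The sketch below is a hypothetical helper that flags Deployments and StatefulSets with fewer than two replicas, plus bare Pods that no controller owns. It deliberately ignores DaemonSets and ReplicaSets (the latter are normally owned by Deployments).

```python
import json
from typing import List


def single_pod_workloads(kubectl_json: str) -> List[str]:
    """List workloads that would be fully down while their node reboots."""
    data = json.loads(kubectl_json)
    findings: List[str] = []
    for item in data.get("items", []):
        kind = item.get("kind")
        meta = item.get("metadata", {})
        ref = f"{kind} {meta.get('namespace', 'default')}/{meta.get('name')}"
        if kind in ("Deployment", "StatefulSet"):
            # spec.replicas defaults to 1 when omitted; 0 means scaled down.
            replicas = item.get("spec", {}).get("replicas", 1)
            if replicas < 2:
                findings.append(f"{ref} has {replicas} replica(s)")
        elif kind == "Pod" and not meta.get("ownerReferences"):
            findings.append(f"{ref} is a bare pod without a controller")
    return findings
```

Pipe the kubectl output into this check and review every finding before rebooting the whole cluster.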
SLIDE 26
  • Old: cAdvisor, Heapster, InfluxDB, Grafana
  • New: cAdvisor with Prometheus and Grafana
  • Alertmanager
  • Logfile collection and cleanup
  • Disk space usage
  • Application Specific Monitoring?

Monitoring and Logging
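In the setup on this slide, disk space is watched through Prometheus and Alertmanager. The sketch below is only a stand-alone spot check that can serve as a stopgap, for example on the admin node. The 80% threshold and the checked paths are assumptions. On BTRFS, `df`-style numbers are approximate; `btrfs filesystem usage` gives the authoritative view.

```python
import shutil
from typing import Dict, Iterable


def disk_usage_alerts(paths: Iterable[str], threshold_pct: float = 80.0) -> Dict[str, float]:
    """Return {path: percent used} for every path at or above `threshold_pct`.

    Uses used / (used + free), the same ratio `df` reports, because `total`
    also counts blocks reserved for root.
    """
    alerts: Dict[str, float] = {}
    for path in paths:
        usage = shutil.disk_usage(path)
        capacity = usage.used + usage.free
        if capacity == 0:  # e.g. pseudo file systems
            continue
        percent = 100.0 * usage.used / capacity
        if percent >= threshold_pct:
            alerts[path] = round(percent, 1)
    return alerts
```

Usage on a worker: `disk_usage_alerts(["/", "/var/lib/docker"])`.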

SLIDE 27
  • Don't do backup and recovery
  • Everything that is deployed to the cluster must be 100% reproducible
  • Use a second cluster for disaster recovery and deploy the application twice
  • Have proper staging for the application
  • For persistent data, the application MUST support consistent backup and restore; this cannot be done on the Kubernetes side
  • Recommendation: use Git or a similar source code management system
  • Disaster recovery: delete the whole cluster, re-deploy and re-configure the cluster, re-deploy the application and restore the application's data via application functionality

Backup and Recovery (1)

SLIDE 28

Backup and Recovery (2)

  • Backup ETCD
  • LDIF export of openLDAP
  • Snapshot of Admin VM
  • Power off everything and snapshot
  • Kubectl export
  • GIT / Helm / Yaml File backup and versioning
  • Backup of Persistent Volumes
  • Single object restore?
  • Create an alias for kubectl --record
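For the "Backup ETCD" item, etcd's own v3 `etcdctl snapshot save` command is the usual tool. The sketch below only assembles and runs that command with a timestamped file name. The endpoint, CA, certificate and key paths are placeholders, since the slide does not name them; use the ones configured on your masters. The snapshot covers etcd only, and the other items on this slide still need their own backups.

```python
import os
import subprocess
from datetime import datetime, timezone
from typing import List


def etcd_snapshot_command(endpoint: str, cacert: str, cert: str, key: str,
                          backup_dir: str) -> List[str]:
    """Build an etcdctl (API v3) snapshot command with a timestamped file name."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = os.path.join(backup_dir, f"etcd-snapshot-{stamp}.db")
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={cacert}",
        f"--cert={cert}",
        f"--key={key}",
        "snapshot", "save", target,
    ]


def run_snapshot(cmd: List[str]) -> None:
    """Run the command with the v3 API selected; raises on a non-zero exit."""
    env = dict(os.environ, ETCDCTL_API="3")
    subprocess.run(cmd, env=env, check=True)
```

Copy the resulting `.db` file off the cluster; a snapshot that only lives on a master is lost together with that master.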
SLIDE 29

Questions?

SLIDE 30

Questions?

Requirements | Planning and Sizing | Deployment Best Practices | Testing | Operations

SLIDE 31
SLIDE 32

Backup slides

SLIDE 33

General Requirements

  • Hardware / Virtualization Infrastructure
      – Where to deploy?
      – On premise or public / private cloud?
  • Software
      – CaaS Platform subscriptions (plus SLES for infrastructure services)
  • Sales and Pre-/Post-Sales Consulting
      – For architecture and to buy the right hardware
      – For the initial implementation
  • Support
      – 24/7 in case of issues
  • Maintenance and pro-active support
      – Scale, upgrade, review and fix

SLIDE 34

BENEFIT / SAVINGS

  • Hardware / Virtualization Infrastructure
      – Where to deploy?
      – On premise or public / private cloud?
  • Improve operational efficiency, reduce costs, keep developers focused on development
  • Get to market faster, with fewer disruptions
  • Eliminate surprises between development and production environments; innovate faster
  • Foster agile development and business opportunities
  • Integrate development and operations
  • Manage container lifecycles

SLIDE 35

Use Case Specific Requirements

  • Application Requirements (Sizing)
      – Number of Pods
      – Memory, CPU
      – Storage requirements (file, block, S3?, single or multi-writer, capacity, static or dynamic provisioning)
      – Specific hardware / CPU / GPU requirements
      – Network entry points / services / bandwidth
  • Security Requirements
      – Images (source and size)
      – Isolation
      – Integration into existing identity sources
      – Certificate authorities
  • Availability Requirements
      – Single or multi data center
      – Distance / latency
  • Budget
  • Politics, Religion, Philosophy, Processes ;-)
SLIDE 36

Planning and Sizing

  • 1 Admin VM
      – LDAP, Salt, Velum, SQL
  • 3 Master VMs (more based on number of pods)
      – Fault tolerance
      – etcd cluster
  • 3 or more Workers (more based on number of pods and resource requirements)
      – Workers as VM or physical
  • Second cluster for fault tolerance / disaster recovery
  • Disk space for each Worker
      – 50 GB for the OS (BTRFS, minimum for the OS)
      – 100 GB for /var/lib/docker (BTRFS for images and containers)
      – (really depends on image sizes and image versions / image changes)

SLIDE 37

Use Case Specific Requirements

SLIDE 38

Deployment Stages

  1. Infrastructure Preparation
  2. Base Software Installation
  3. Infrastructure Verification
  4. CaaS Platform Installation
  5. Kubernetes Addons

SLIDE 39


SLIDE 40

[Diagram: deployment stages 1 to 4 (Infrastructure Preparation, Base Software Installation, Infrastructure Verification, CaaS Platform Installation)]

SLIDE 41
SLIDE 42

[Diagram: DATA CENTER 1, monitoring network (L2): Graylog, Prometheus, Grafana]

SLIDE 43

Deployment - Infrastructure Preparation

  • Deploy On-Premise Registry (docker-distribution-registry)
      – Implement Portus to secure the on-premise registry
      – Create DNS entry for registry
      – Create namespaces and users on registry
      – Optional: Integrate Portus into existing LDAP or Active Directory
  • Put all required images into the registry, in the right namespace
      – Dashboard, Prometheus, Grafana, etc.
  • Optional: Set up caching registries
  • Prepare Load Balancer Endpoints for API and DEX
      – Ports 6443 and 32000
  • Storage network setup and connectivity
SLIDE 44

Deployment - Infrastructure Preparation

  • Prepare on-premise Helm chart repository
  • Prepare a Docker host to pull images from the internet, scan them, and push them to the on-premise registry
  • Prepare Git for storing all YAML files
  • ToDo: Monitoring/Logging/Backup???
SLIDE 45

Deployment - Software Installation

  • Software Staging
      – Subscription Management Tool (SMT), SUSE Manager, RMT (limited)
      – Ensure staging of patches to guarantee the same patch level on existing and newly installed servers
  • General
      – Use BTRFS for the OS
      – Disable Firewall / AppArmor / IPv6
  • AutoYaST
      – Ensure that all servers are installed 100% identically
      – Consulting solution available (see https://github.com/Martin-Weiss/cif)
  • Configuration Management
      – Templates
      – Salt

SLIDE 46

Deployment – Infrastructure Verification

  • ONLY USE STATIC IP Configs
  • Verify Time Synchronization
  • Verify Name Resolution
  • Verify repository sources are OK (staging)
  • Test all Network Connections
      – Bandwidth, Latency

SLIDE 47

Deployment - Step by Step

  • Install all Servers (Admin, Master, Worker) via AutoYaST
  • Ensure that all the patches available are installed at this point in time
  • AutoYaST configures Salt to ensure all Master/Worker connect to Salt-Master on the Admin host

  • Access Velum web-interface and create admin user
  • Specify Internal Dashboard FQDN (CNAME)
  • Enable Tiller (for later Helm usage)
  • Configure the overlay network
  • Add the SSL certificate of the CA signing the registry and external LDAP certificates
  • Accept Nodes, Assign Roles
  • Specify External API FQDN (load balancer for API and DEX)
  • Specify External Velum FQDN (CNAME)
  • Run Bootstrap (and now hope and pray ;-))