TUT1131 - Best Practices in Deploying SUSE CaaS Platform


  1. TUT1131 - Best Practices in Deploying SUSE CaaS Platform
     Martin Weiss, Senior Architect Infrastructure Solutions, Martin.Weiss@SUSE.com
     Juan Utande Herrera, Senior Architect Infrastructure Solutions, Juan.Herrera@suse.com

  2. AGENDA
     1 What is SUSE CaaS Platform
     2 Requirements
     3 Planning and Sizing
     4 Deployment Best Practices
     5 Testing
     6 Operations

  3. What is SUSE CaaS Platform 3

  4. SUSE: Underpinning Digital Transformation
     [Portfolio diagram, top to bottom]
     • Workloads: Business-critical Applications, Machine Learning, Business Analytics, High Performance Computing, Traditional IT & Applications, Internet of Things
     • Application Delivery
       - Container Management: SUSE CaaS Platform
       - Platform as a Service: SUSE Cloud Application Platform
     • Software-defined Infrastructure
       - Private Cloud / IaaS: SUSE OpenStack Cloud
       - Compute: Virtual Machine & Container
       - Storage: SUSE Enterprise Storage
       - Networking: SDN and NFV
       - Infrastructure & Lifecycle Management: SUSE Manager
       - Public Cloud: SUSE Cloud Service Provider Program
     • Multimodal Operating System: SUSE Linux Enterprise Server
     • Physical Infrastructure: Multi-platform Servers, Switches, Storage
     • Services: SUSE Global Services (Consulting Services, Select Services, Premium Support Services)
     Open, Secure, Proven

  5. What is SUSE CaaS Platform 3?
     • Kubernetes
     • MicroOS with Transactional Updates
     • Simple deployment
     • SUSE supported
     • LDAP / Active Directory Integration
     • Caching Registry Integration
     • Air Gapped Implementation Support
     • Registry.suse.com
     • Helm
     • Docker or CRI-O (tech preview), Flannel
     • Multiple deployment methods

  6. Requirements

  7. General requirements
     Where to deploy
     • Deploy on physical Hardware or on your Virtualization infrastructure
     • Ready to Run on Public and Private Clouds
     What do I need
     • SUSE CaaS Platform subscriptions
     • SLES for infrastructure nodes
     Who can help me
     • Sales and Pre/Post Sales Consulting:
       - Help choosing the right Hardware
       - Architect the solution
       - Initial implementation
     Support options
     • Included 24/7 priority support in case of issues
     • Consulting for maintenance and proactive support to scale, upgrade, review and fix

  8. Use Case Specific Requirements
     Application Requirements (Sizing)
     • Number of Pods
     • Memory, CPU
     • Storage requirements (file, block, object, single or multi-writer, capacity, static or dynamic provisioning)
     • Specific Hardware / CPU / GPU requirements
     • Network Entry points / Services / Bandwidth
     Security Requirements
     • Images (source and size)
     • Isolation
     • Integration into existing Identity Sources
     Availability Requirements
     • Single or multi data-center
     • Distance / Latency
     $$$ BUDGET $$$
     Politics, Religion, Philosophy, Processes ;-)

  9. Planning and Sizing

  10. Planning and Sizing
      SUSE CaaS Platform – CLUSTER 1 (Kubernetes)
      • Admin: LDAP, Salt, Velum, SQL
      • Master, Master, Master: ETCD cluster, based on number of pods
      • Worker, Worker, Worker + fault tolerance: as VM or physical, based on number of pods and resource requirements
      Disk space for each Worker:
      • 50 GB for OS (BTRFS minimum for OS)
      • 100 GB for /var/lib/docker (BTRFS for Images and Containers)
      • Space really depends on image sizes, versions and changes
      Second cluster:
      • Fault tolerance
      • Disaster recovery
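
The per-worker disk figures on this slide reduce to simple arithmetic. The following is a minimal sketch that takes the slide's 50 GB and 100 GB values as starting points; the function name and the three-worker example are illustrative assumptions, not part of the presentation.

```python
# Starting values taken from the slide; the real image space depends on
# image sizes, versions and changes.
OS_DISK_GB = 50        # BTRFS volume for the OS
IMAGE_DISK_GB = 100    # BTRFS volume for /var/lib/docker


def worker_disk_gb(workers: int, image_gb: int = IMAGE_DISK_GB) -> int:
    """Return the total disk space in GB for the given number of workers."""
    if workers < 0:
        raise ValueError("workers must not be negative")
    return workers * (OS_DISK_GB + image_gb)


if __name__ == "__main__":
    # Hypothetical cluster with three workers.
    print(worker_disk_gb(3))  # 450
```

Raising `image_gb` is the obvious knob when images are large or change often.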

  11. Deployment Best Practices

  12. Deployment - Processes and People
      Prepare the Team (DevOps?)
      – Server
      – Storage
      – Network
      – Application
      – Security
      – User
      – Other

  13. Deployment Stages
      1 Infrastructure Preparation
      2 Base Software Installation
      3 Infrastructure Verification
      4 SUSE CaaS Platform Installation
      5 Kubernetes Addons

  14. Deployment
      Review the Design
      • Depending on the requirements adjust before implementation
      Hardware Installation
      • Ensure that hardware installation and cabling is correct
      • Update Firmware
      • Adjust Firmware / BIOS settings
        - Disable everything not required (e.g. serial ports, network boot, power saving)
        - Configure HW date/time
      VM Preparation
      • Use paravirtual SCSI
      Preparation of Time Synchronization
      • Have a fault tolerant time provider group
      Name Resolution
      • Ensure that all addresses of the servers have different names
      • Add all addresses to DNS with forward and reverse lookup
      • Ensure DNS is fault tolerant
      • /etc/HOSTNAME must be the name in the public network
      • Define and create DNS Entries for internal and external Velum and API targets (CNAME, Load Balancer, no round robin)
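
The forward and reverse lookup requirement above can be checked per host. This is a minimal sketch using Python's standard `socket` module; the function name is an assumption, and the resolver functions are passed in as parameters only so the check can be exercised without a live DNS server.

```python
import socket
from typing import Callable, Tuple


def check_forward_reverse(
    hostname: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], Tuple[str, list, list]] = socket.gethostbyaddr,
) -> bool:
    """Return True if hostname resolves to an address that maps back to it."""
    try:
        address = forward(hostname)
        reverse_name = reverse(address)[0]
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return False
    # Compare case-insensitively and ignore a trailing dot.
    return reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()
```

Called as `check_forward_reverse("master1.example.com")` (a hypothetical name), it uses the system resolver. Note that local files such as /etc/hosts can influence the answer, so a pass here does not prove that the DNS servers themselves are consistent.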

  15. Deployment
      Deploy On-Premise Registry (docker-distribution-registry)
      • Implement Portus to Secure the On-Premise Registry
      • Create DNS entry for Registry
      • Create Namespaces and Users on Registry
      • Optional: Integrate Portus into existing LDAP or Active-Directory
      Put all required images into the registry, in the right namespace
      • Dashboard, Prometheus, Grafana, etc.
      Optional: Setup caching registries
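
Placing images "into the right namespace" amounts to rewriting each upstream image reference before it is pushed. The sketch below shows one possible mapping; the registry host, the namespace and the function name are hypothetical, and the slides do not prescribe a naming scheme.

```python
def target_reference(source: str, registry: str, namespace: str) -> str:
    """Map an upstream image reference to the on-premise registry namespace."""
    # Keep only the last path component, which carries the image name and tag.
    name_and_tag = source.rsplit("/", 1)[-1]
    return f"{registry}/{namespace}/{name_and_tag}"


if __name__ == "__main__":
    # Hypothetical registry host and namespace.
    print(target_reference("docker.io/grafana/grafana:5.3.4",
                           "registry.example.com", "monitoring"))
    # registry.example.com/monitoring/grafana:5.3.4
```

Dropping the upstream path means two images with the same final name would collide, so the namespace has to keep them apart.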

  16. Deployment
      Prepare Load Balancer Endpoints for API and DEX
      • Port 6443 and 32000
      Storage Network setup and connectivity
      Prepare on-premise helm chart repository
      Prepare docker host to pull from internet, scan images, push to on-premise registry
      Prepare Git for storing all manifests / yaml files
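
Once the load balancer endpoints exist, a plain TCP connect shows whether ports 6443 and 32000 are reachable. This is a minimal sketch with the standard `socket` module; the function name and the host name in the usage note are assumptions.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `[port_is_open("api.caasp.example.com", p) for p in (6443, 32000)]` checks both ports against a hypothetical external API name. A successful connect only proves that something is listening, not that the API or DEX behind it is healthy.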

  17. Deployment
      Software Staging
      • Subscription Management Toolkit, SUSE Manager, RMT (limited)
      • Ensure staging of patches to guarantee same patch level on existing servers and newly installed servers
      AutoYaST
      • Ensure that all servers are installed 100% identical
      • Consulting solution available (see https://github.com/Martin-Weiss/cif)
      Configuration Management
      • Templates
      • Salt
      General
      • Use BTRFS for the OS
      • Disable Firewall / AppArmor / IPv6

  18. Deployment
      ONLY USE STATIC IP Configs
      Verify Time Synchronization
      Verify Name Resolution
      Test all Network Connections
      • Bandwidth
      • Latency
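
For the latency part of the network test, timing repeated TCP connects gives a rough figure without extra tooling. The sketch below is an approximation under stated assumptions: connect time is roughly one network round trip plus some overhead, the function name is hypothetical, and bandwidth needs a dedicated measurement tool that is not covered here.

```python
import socket
import statistics
import time


def connect_latency_ms(host: str, port: int, samples: int = 5,
                       timeout: float = 3.0) -> float:
    """Return the median TCP connect time to host:port in milliseconds."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # The connection is closed again as soon as it is established.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

Using the median rather than the mean keeps a single slow sample from dominating the result.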

  19. Deployment
      • Install all Servers (Admin, Master, Worker) via AutoYaST
      • Ensure that all the patches available are installed at this point in time
      • AutoYaST configures Salt to ensure all Master/Worker connect to Salt-Master on the Admin host
      • Access Velum web-interface and create admin user
      • Specify Internal Dashboard FQDN (CNAME)
      • Enable Tiller (for later Helm usage)
      • Configure the overlay network
      • Add the SSL certificate of the CA signing the registry and external LDAP certificates
      • Accept Nodes, Assign Roles
      • Specify External API FQDN (load balancer for API and DEX)
      • Specify External Velum FQDN (CNAME)
      • Run Bootstrap (and now have a cup of coffee ;-))

  20. Deployment
      Create required Namespaces
      Create required Users / Groups in LDAP or Connect to Active Directory
      Create Roles and Role-Assignments
      Deploy Basic Services
      • K8s Dashboard
      • Persistent Storage / Storage Classes
      • Ingress
      • Monitoring
      • Logging
      Deploy Application
      • Application based scripts
      • CI/CD
      • Helm
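
Namespaces and role assignments are ordinary Kubernetes objects, so they can be generated and kept under version control. The following sketch builds a Namespace and a RoleBinding as JSON, which kubectl accepts alongside YAML. The names `team-a` and `team-a-developers` are hypothetical, and binding the built-in `edit` ClusterRole is just one possible choice.

```python
import json


def namespace_manifest(name: str) -> dict:
    """Build a Kubernetes Namespace object."""
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}


def group_role_binding(namespace: str, group: str,
                       cluster_role: str = "edit") -> dict:
    """Bind a group (for example from LDAP) to a ClusterRole in one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{group}-{cluster_role}", "namespace": namespace},
        "subjects": [
            {"kind": "Group", "name": group,
             "apiGroup": "rbac.authorization.k8s.io"}
        ],
        "roleRef": {
            "kind": "ClusterRole",
            "name": cluster_role,
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace and group names.
    for manifest in (namespace_manifest("team-a"),
                     group_role_binding("team-a", "team-a-developers")):
        print(json.dumps(manifest, indent=2))
```

Writing each printed object to its own file gives manifests that can be committed and re-applied, which fits the reproducibility goal stressed later in the deck.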

  21. Testing

  22. Testing - Preparation
      Create a test plan
      For every test describe
      • Starting point
      • Test details
      • Expected result
      When executing the test
      • Prepare and verify starting point
      • Execute test
      • Document the test execution
      • Document the test results
      • Compare test results with expectation
      • Repeat the test several times
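
The test plan structure above maps naturally onto a small record type. This is a minimal sketch, assuming results can be captured as strings; the class name `PlannedTest`, its fields and the sample values are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PlannedTest:
    name: str
    starting_point: str
    details: str
    expected: str
    results: List[str] = field(default_factory=list)

    def run(self, execute: Callable[[], str], repetitions: int = 3) -> bool:
        """Execute the test several times and compare every result with the expectation."""
        for _ in range(repetitions):
            # Each result is recorded so the execution is documented.
            self.results.append(execute())
        return all(result == self.expected for result in self.results)


if __name__ == "__main__":
    # Hypothetical fault tolerance test; the lambda stands in for the real procedure.
    case = PlannedTest(
        name="worker failure",
        starting_point="all nodes Ready, load running",
        details="power off one worker",
        expected="pods rescheduled",
    )
    print(case.run(lambda: "pods rescheduled"))  # True
    print(len(case.results))  # 3
```

Keeping the recorded results next to the expectation makes it easy to see which repetition deviated.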

  23. Testing - Fault Tolerance
      Ensure all fault tolerance tests are done with load on the system
      Network failure
      • Single / Multiple NIC
      • Single / Multiple Switches
      • Cluster / Public Network
      Node failure
      • Admin
      • Master
      • Worker

  24. Operations

  25. Life Cycle
      • New Patches
      • Create new Stage on Staging System
      • Assign new Stage to Admin and Nodes
      • Wait until next day or “transactional-update dup reboot”
      • Access Velum - reboot admin
      • Ensure NO Single Pod application runs in the cluster*
      • Access Velum - reboot all
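
Before rebooting all nodes, it helps to list workloads that run as a single pod, since they become unavailable while their node restarts. The sketch below assumes its input is the parsed JSON from `kubectl get deployments --all-namespaces -o json`; the function name is hypothetical, and only Deployments are covered, not StatefulSets or bare Pods.

```python
from typing import List


def single_replica_deployments(deployment_list: dict) -> List[str]:
    """Return 'namespace/name' for every Deployment with fewer than two replicas."""
    flagged = []
    for item in deployment_list.get("items", []):
        # Kubernetes defaults a Deployment to one replica when none is set.
        replicas = item.get("spec", {}).get("replicas", 1)
        if replicas < 2:
            meta = item["metadata"]
            flagged.append(f"{meta['namespace']}/{meta['name']}")
    return flagged
```

An empty result is a reasonable precondition for the "reboot all" step, at least as far as Deployments are concerned.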

  26. Monitoring and Logging
      • Old: cAdvisor, Heapster, InfluxDB, Grafana
      • New: cAdvisor with Prometheus and Grafana
      • Alertmanager
      • Logfile collection and cleanup
      • Disk space usage
      • Application Specific Monitoring?

  27. Backup and Recovery (1)
      Don't do backup and recovery
      • Everything that is deployed to the cluster must be 100% reproducible
      • Use a second cluster for disaster recovery and deploy the application twice
      • Have proper staging for the application
      • For persistent data, the application MUST support consistent backup and restore; this cannot be done on the k8s side of things
      • Recommendation: use Git or a similar source code management system
      • Disaster Recovery: delete the whole cluster, re-deploy and re-configure the cluster, re-deploy the application and restore the application's data via application functionality

  28. Backup and Recovery (2)
      • Backup ETCD
      • LDIF export of openLDAP
      • Snapshot of Admin VM
      • Power off everything and snapshot
      • kubectl export
      • Git / Helm / Yaml File backup and versioning
      • Backup of Persistent Volumes
      • Single object restore?
      • Create an alias for kubectl --record

  29. Questions?

  30. Questions?
      • Requirements
      • Planning and Sizing
      • Deployment Best Practices
      • Testing
      • Operations

  31. Backup slides
