More Containers, More Problems
Ed Rooth (@sym3tri)

slide-1
SLIDE 1

Ed Rooth

@sym3tri | ed.rooth@coreos.com | coreos.com

More Containers, More Problems

slide-2
SLIDE 2
Agenda

  • 1. Define the problems
  • 2. Define a vision of the solution
  • 3. How CoreOS is building solutions
  • 4. How you can get started

slide-3
SLIDE 3

It all started with... a server

slide-4
SLIDE 4

Then we got... many servers

slide-5
SLIDE 5

Then we got... VMs on our servers

slide-6
SLIDE 6

Then we got... APIs around hosted VMs (the cloud)

slide-7
SLIDE 7

Which led to... even more servers

slide-8
SLIDE 8

Too Many Servers!

The cloud made booting servers really easy. Also… Moore’s law is still a thing.

slide-9
SLIDE 9

More Servers, More Problems

  • Patching is hard
  • Dependency management is hard
  • Managing access is hard
  • Managing workloads is hard
  • App lifecycle management is hard
  • Identifying security issues is hard

slide-10
SLIDE 10

More Servers == More Sysadmins

[Chart: number of servers vs. number of sysadmins; values shown: 1000, 500]

slide-11
SLIDE 11

More Servers, More Problems

[Chart: number of servers vs. number of sysadmins; values shown: 1000, 500]

slide-12
SLIDE 12

Google needed more servers… before the rest of us did. They solved many of these problems internally and published some great papers.

slide-13
SLIDE 13

We started building it

CoreOS, Google, and the community... are building the open-source version.

slide-14
SLIDE 14

#GIFEE

slide-15
SLIDE 15

What is #GIFEE?

Google’s Infrastructure For Everyone Else

slide-16
SLIDE 16

"Fundamentally, it's what happens when you ask a software engineer to design an operations function."

(Ben Treynor Sloss, Vice President, Google Engineering; founder of Google SRE)

Google’s Infrastructure

slide-17
SLIDE 17
slide-18
SLIDE 18

What is #GIFEE?

  • Servers are not your pets
  • Servers are the new CPU cores
  • Clusters are the new servers

slide-19
SLIDE 19

Evolution of Servers

slide-20
SLIDE 20

Clusters

[Diagram: a single server alongside a cluster]

slide-21
SLIDE 21

Clusters

[Diagram: a process alongside an app]

slide-22
SLIDE 22

  • Operating System: Custom Linux
  • Distributed Consensus: Chubby
  • Cluster Manager: Borg
  • Monitoring: Borgmon
  • RPC framework: Stubby
  • Auth: private

slide-23
SLIDE 23

Open Source (Google → open-source counterpart)

  • Operating System: Custom Linux → CoreOS Linux
  • Distributed Consensus: Chubby → etcd
  • Cluster Manager: Borg → Kubernetes
  • Monitoring: Borgmon → Prometheus
  • RPC framework: Stubby → gRPC
  • Auth: private → Dex

slide-24
SLIDE 24

“cluster operating system”

slide-25
SLIDE 25

OS for Clusters

  • Orchestration
  • State
  • Scheduler: gets work to the servers

slide-26
SLIDE 26

What is #GIFEE?

  • Software manages servers
  • Software manages workloads
  • Declare what you want, and it will become so
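
Declaring what you want and letting software make it so boils down to a reconciliation loop: compare desired state with observed state and act only on the difference. The sketch below is a minimal illustration under stated assumptions, not Kubernetes code: `State` is a hypothetical in-memory map from app name to running instances, and `reconcile` just reports the actions a controller would take.

```go
package main

import (
	"fmt"
	"sort"
)

// State is a hypothetical snapshot of a cluster: app name -> running instances.
type State map[string]int

// reconcile compares the desired state with the observed state and returns
// the actions that would close the gap. It only looks at the difference,
// not at how the cluster got there, which is what makes it declarative.
func reconcile(desired, actual State) []string {
	// Collect every app mentioned in either state.
	seen := map[string]bool{}
	for name := range desired {
		seen[name] = true
	}
	for name := range actual {
		seen[name] = true
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order for the action list

	var actions []string
	for _, name := range names {
		diff := desired[name] - actual[name] // missing keys read as 0
		switch {
		case diff > 0:
			actions = append(actions, fmt.Sprintf("start %d x %s", diff, name))
		case diff < 0:
			actions = append(actions, fmt.Sprintf("stop %d x %s", -diff, name))
		}
	}
	return actions
}

func main() {
	desired := State{"web": 4, "redis": 1}
	// Observed: one web instance died, redis never started, and a leftover
	// batch job is still running even though nobody declared it.
	actual := State{"web": 3, "batch": 2}
	for _, a := range reconcile(desired, actual) {
		fmt.Println(a)
	}
}
```

A real controller would run this comparison repeatedly (or whenever it is notified of a change) and carry out the actions; once the cluster has converged, the same call returns an empty list.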

slide-27
SLIDE 27

[Diagram: API + scheduler managing seven workers, each running a kubelet]

slide-28
SLIDE 28

[Diagram: API + scheduler managing a single worker running a kubelet]

slide-29
SLIDE 29

[Diagram: API + scheduler + worker together on one node]

Works on 1 node too

slide-30
SLIDE 30

Kubernetes

  • Primary component of the cluster OS
  • Fits our vision
  • Started by Google, building on more than 10 years of experience running Borg

slide-31
SLIDE 31

What is #GIFEE?

  • Centralized administration & orchestration
  • No more SSH (yes, that even means your favorite config management tool)

slide-32
SLIDE 32

What is #GIFEE?

$ scp myapp host:/opt
$ ssh host systemd-run /opt/myapp

Don’t say HOW

slide-33
SLIDE 33

What is #GIFEE?

$ kubectl run myapp --image=quay.io/sym3tri/hello --replicas=1

$ kubectl get pods
POD          IP
myapp-97wt8  10.2.29.3

say WHAT

slide-34
SLIDE 34

What is #GIFEE?

$ kubectl scale rc myapp --replicas=4

$ kubectl get pods
POD          IP
myapp-97wt8  10.2.29.3
myapp-f839d  10.2.29.4
myapp-98b35  10.2.29.5
myapp-e40ee  10.2.29.8

say WHAT again

slide-35
SLIDE 35

What is #GIFEE?

$ kubectl run myapp --image=quay.io/sym3tri/hello --replicas=1

$ kubectl get pods
POD          IP
myapp-97wt8  10.2.29.3

say WHAT one more time

slide-36
SLIDE 36
slide-37
SLIDE 37

[Diagram: RC web-prod, select(env=prod,app=web), count=1, managing 1 Pod labeled env=prod, app=web]

slide-38
SLIDE 38

[Diagram: RC web-prod, select(env=prod,app=web), count=4, managing 4 Pods, each labeled env=prod, app=web]
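
The replication controller finds its pods by label rather than by name: any pod carrying env=prod and app=web counts toward the target. A minimal sketch of equality-based selector matching follows; `Labels`, `matches`, and `replicaDelta` are hypothetical names for illustration, not the Kubernetes labels package, and the `track: canary` label is an invented example of an extra label.

```go
package main

import "fmt"

// Labels are the key/value tags attached to a pod (or used as a selector).
type Labels map[string]string

// matches reports whether the pod's labels satisfy an equality-based
// selector: every selector key must be present with the same value.
// Extra labels on the pod do not matter.
func matches(selector, labels Labels) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// replicaDelta returns how many pods a replication controller would need
// to start (positive) or stop (negative) so that exactly count pods match.
func replicaDelta(selector Labels, count int, pods []Labels) int {
	running := 0
	for _, p := range pods {
		if matches(selector, p) {
			running++
		}
	}
	return count - running
}

func main() {
	selector := Labels{"env": "prod", "app": "web"}
	pods := []Labels{
		{"env": "prod", "app": "web"},
		{"env": "prod", "app": "web", "track": "canary"}, // extra label: still matches
		{"env": "staging", "app": "web"},                  // different env: not counted
	}
	fmt.Println(replicaDelta(selector, 4, pods)) // 2 pods match, so start 2 more
}
```

Because membership is defined by labels, scaling from count=1 to count=4 is just a change to one number; the controller works out the rest.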

slide-39
SLIDE 39

automated != automatic

slide-40
SLIDE 40

What is #GIFEE?

  • Dependencies are isolated per app
  • Apps automatically migrate throughout the cluster

slide-41
SLIDE 41

What is #GIFEE?

  • All apps are “12-factor”
  • Configuration/secret management

[Diagram: the same app deployed with a prod config and with a staging config]
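
Twelve-factor apps store config in the environment, so one build can run as prod or staging depending only on what the platform injects (on Kubernetes, such variables can be populated from ConfigMaps and Secrets). A minimal sketch, assuming hypothetical variable names `APP_ENV` and `DATABASE_URL`:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config holds the settings that differ between deploys (prod vs staging).
// The variable names below are hypothetical.
type Config struct {
	Env         string // APP_ENV, e.g. "prod" or "staging"
	DatabaseURL string // DATABASE_URL; may carry credentials, so inject it as a secret
}

// loadConfig reads configuration from environment variables. getenv is a
// parameter so the function can be tested without touching the real process
// environment; pass os.Getenv in production.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		Env:         getenv("APP_ENV"),
		DatabaseURL: getenv("DATABASE_URL"),
	}
	if cfg.Env == "" {
		return Config{}, errors.New("APP_ENV is not set")
	}
	if cfg.DatabaseURL == "" {
		return Config{}, errors.New("DATABASE_URL is not set")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	// Print only the environment name; the database URL may contain a secret.
	fmt.Printf("env=%s\n", cfg.Env)
}
```

Failing fast on missing variables keeps a misconfigured deploy from starting half-working.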

slide-42
SLIDE 42

What is #GIFEE?

  • Consistent deployment API
  • Deploy canary builds and experiments
  • Rolling updates

slide-43
SLIDE 43

Load Balanced Service

app v1 app v1 app v1 app v1

slide-44
SLIDE 44

Load Balanced Service

app v1 app v1 app v1 app v1 app v2

slide-45
SLIDE 45

Load Balanced Service

app v1 app v1 app v1 app v1 app v2

slide-46
SLIDE 46

Load Balanced Service

app v1 app v1 app v1 app v1 app v2

slide-47
SLIDE 47

Load Balanced Service

app v1 app v1 app v1 app v2 app v2

slide-48
SLIDE 48

Load Balanced Service

app v1 app v1 app v2 app v2 app v2

slide-49
SLIDE 49

Load Balanced Service

app v2 app v2 app v2 app v2
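
The frames above show a rolling update: a new instance joins the load-balanced service before an old one is retired, so capacity never drops. The sketch below simulates that under the assumption of a surge of one and no planned unavailability (Kubernetes Deployments expose comparable `maxSurge` and `maxUnavailable` settings); `rollingUpdate` is a hypothetical helper, not Kubernetes code.

```go
package main

import (
	"fmt"
	"strings"
)

// rollingUpdate replaces every instance of oldVersion with newVersion, one
// at a time: start one new instance (a surge of 1), then retire one old one.
// It returns a snapshot of the pool after every step. Because the new
// instance joins before an old one leaves, the pool never shrinks below its
// starting size.
func rollingUpdate(start []string, oldVersion, newVersion string) [][]string {
	pool := append([]string(nil), start...) // never mutate the caller's slice
	snapshot := func() []string { return append([]string(nil), pool...) }

	steps := [][]string{snapshot()}
	if oldVersion == newVersion {
		return steps // nothing to replace
	}
	for {
		// Find the next old instance; stop when none are left.
		idx := -1
		for i, v := range pool {
			if v == oldVersion {
				idx = i
				break
			}
		}
		if idx == -1 {
			return steps
		}
		pool = append(pool, newVersion) // surge: new instance joins the service
		steps = append(steps, snapshot())
		pool = append(pool[:idx], pool[idx+1:]...) // retire one old instance
		steps = append(steps, snapshot())
	}
}

func main() {
	for _, s := range rollingUpdate([]string{"v1", "v1", "v1", "v1"}, "v1", "v2") {
		fmt.Println(strings.Join(s, " "))
	}
}
```

Several of the printed snapshots (for example "v1 v1 v1 v2 v2" and "v1 v1 v2 v2 v2") correspond to the frames above; the simulation also prints the in-between states the slides skip.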

slide-50
SLIDE 50

What is #GIFEE?

  • Mixed workloads (staging + prod)
  • Logically partitioned resources

[Diagram: cluster resources partitioned among A Team, B Team, and C Team]

slide-51
SLIDE 51

What is #GIFEE?

  • Trusted & secure from the bottom up*
  • Only trusted code is executed

[Diagram: trust stack, top to bottom: Cluster OS, Container Runtime, OS, Firmware & TPM]
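
Bottom-up trust is commonly built on measured boot: each layer hashes the next before handing over control and "extends" a TPM Platform Configuration Register (PCR) with that measurement. Extend is one-way and order-sensitive, so the final PCR value only matches a known-good value if every layer matched. The sketch below simulates a SHA-256 PCR bank in software; the layer contents are placeholder byte strings, and a real system would read PCRs from the TPM and verify a signed quote rather than recompute them.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// extend mimics a TPM 2.0 PCR extend on a SHA-256 bank:
// newPCR = SHA-256(oldPCR || digest), where digest is the measurement
// (here, the SHA-256 of the component being loaded).
func extend(pcr [32]byte, component []byte) [32]byte {
	digest := sha256.Sum256(component)
	return sha256.Sum256(append(pcr[:], digest[:]...))
}

// measureBoot starts from the all-zero reset value and extends the PCR with
// each layer in boot order, bottom of the stack first.
func measureBoot(layers [][]byte) [32]byte {
	var pcr [32]byte
	for _, layer := range layers {
		pcr = extend(pcr, layer)
	}
	return pcr
}

func main() {
	// Placeholder contents standing in for real images, bottom to top.
	good := [][]byte{[]byte("firmware"), []byte("os"), []byte("container-runtime"), []byte("cluster-os")}
	expected := measureBoot(good) // recorded once from a known-good machine

	tampered := [][]byte{[]byte("firmware"), []byte("os-with-rootkit"), []byte("container-runtime"), []byte("cluster-os")}
	fmt.Println("good boot trusted:", measureBoot(good) == expected)
	fmt.Println("tampered boot trusted:", measureBoot(tampered) == expected)
}
```

Measurement alone only records what ran; refusing to run workloads on a node whose values do not match is a separate policy decision, which the final comparison stands in for.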

slide-52
SLIDE 52

What is #GIFEE?

  • Every {human, machine, process} is authenticated & authorized
  • All communication is encrypted

[Diagram: API + scheduler communicating with a worker running a kubelet]

slide-53
SLIDE 53

Failure is expected and handled for…

  • Services / Apps
  • Machines
  • Storage
  • Clusters
  • Regions

What is #GIFEE?

slide-54
SLIDE 54

What is #GIFEE?

  • Logging
  • Monitoring / alerting

slide-55
SLIDE 55

#GIFEE vs Google Infra?

  • Compatibility with existing tools
  • Works with other projects (Docker, Calico, Prometheus)
  • Incorporates lessons learned

slide-56
SLIDE 56

Why?

  • Build for scale
  • Manage your apps, not servers
  • High availability
  • A new paradigm of infrastructure/development

slide-57
SLIDE 57

We believe: As #GIFEE becomes ubiquitous, the Internet becomes more secure overall

#GIFEE and Security

slide-58
SLIDE 58

Secure the Internet

CoreOS Mission

slide-59
SLIDE 59

Journey to #GIFEE

slide-60
SLIDE 60

Getting Started

Leverage prior work + standards:

  • Raft
  • Omaha Protocol
  • OIDC
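
Raft, the consensus algorithm behind etcd, treats a log entry as committed only once a majority of members store it; that is why a 5-member cluster keeps working with 2 members down. The sketch computes the quorum size and the highest index replicated on a majority from each member's match index. It is a simplification: real Raft additionally requires the entry at that index to come from the leader's current term.

```go
package main

import (
	"fmt"
	"sort"
)

// quorum returns the number of members that make up a majority.
func quorum(members int) int { return members/2 + 1 }

// commitIndex returns the highest log index that is stored on a majority of
// members, given each member's match index (the leader counts itself).
// matchIndex must be non-empty. Simplification: real Raft also requires the
// entry at that index to come from the leader's current term.
func commitIndex(matchIndex []int) int {
	sorted := append([]int(nil), matchIndex...) // leave the caller's slice alone
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	// In descending order, the value at position quorum-1 is held by at
	// least quorum members.
	return sorted[quorum(len(sorted))-1]
}

func main() {
	// Five members: the leader and one follower are at index 7, the others lag.
	match := []int{7, 7, 5, 3, 2}
	fmt.Println("quorum:", quorum(len(match)), "commit index:", commitIndex(match))
	fmt.Println("failures tolerated:", len(match)-quorum(len(match)))
}
```

Adding a sixth member raises the quorum to 4 without raising the number of tolerated failures, which is why odd cluster sizes are the usual recommendation.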

slide-61
SLIDE 61

Securing The Internet

Start from the bottom: the operating system

slide-62
SLIDE 62

Securing The Internet

A minimal server OS + automatic updates requires:

  • Distributed consensus
  • Containers
  • Cluster computing

slide-63
SLIDE 63

In this new world we containerize all the things…

Containerize

slide-64
SLIDE 64

but…

Containerize

slide-65
SLIDE 65

More Containers, More Problems

“Every solution breeds new problems” (Arthur Bloch)

Solve one problem → another problem appears

slide-66
SLIDE 66

More Containers, More Problems

Problem #1

  • Secure & controlled container distribution

slide-67
SLIDE 67

More Containers, More Problems

Problem #1

  • Secure & controlled container distribution

Solution

slide-68
SLIDE 68

More Containers, More Problems

Problem #2

  • Docker security model
  • Docker coupling of components

slide-69
SLIDE 69

More Containers, More Problems

Problem #2

  • Docker security model
  • Docker coupling of components

Solution

slide-70
SLIDE 70

More Containers, More Problems

[Diagram: process trees for an app run directly under systemd and for `docker run redis`, where the container runs under the docker engine daemon]

slide-71
SLIDE 71

Side Note: Spec vs Implementation

Implementation:

slide-72
SLIDE 72

Side Note: Spec vs Implementation

Specification:

https://en.wikipedia.org/wiki/ISO_668 (ISO 668 is the standard that classifies freight containers and specifies their dimensions)

slide-73
SLIDE 73

More Containers, More Problems

Problem #3

  • User Authentication
slide-74
SLIDE 74

More Containers, More Problems

Problem #3

  • User Authentication

Solution

  • Dex
slide-75
SLIDE 75

More Containers, More Problems

Problem #4

  • Really big containers
slide-76
SLIDE 76

More Containers, More Problems

Problem #4

  • Really big containers

Solution

  • Go
  • Buildroot
  • acbuild for ACIs
slide-77
SLIDE 77

Your container is 500 MB?!

NOOOOOOOOO!!!

github.com/brianredbeard/minimal_containers

slide-78
SLIDE 78

Problems #5-11

  • Co-locating Containers
  • Intelligent Scheduling
  • Port Management
  • Segmenting workloads
  • Configuration Management
  • Secrets Management
  • Inconsistent Deployments

More Containers, More Problems

slide-79
SLIDE 79

Problems #5-11

  • Co-locating Containers
  • Intelligent Scheduling
  • Port Management
  • Segmenting workloads
  • Configuration Management
  • Secrets Management
  • Inconsistent Deployments

More Containers, More Problems

Solution

slide-80
SLIDE 80

More Containers, More Problems

Problem #12 Networking

  • Too many types of SDNs
  • IP per POD
slide-81
SLIDE 81

More Containers, More Problems

Problem #12 Networking

  • Too many types of SDNs
  • IP per POD

Solution

  • CNI
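
"IP per pod" means every pod gets its own address instead of sharing the host's IP and juggling ports. CNI defines how a container runtime calls network plugins, and an IPAM plugin chooses each address. The sketch below is a hypothetical sequential allocator over a node's pod subnet, not CNI's host-local plugin; it uses `net/netip` (Go 1.18+) and never releases addresses.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// allocator hands out one IP address per pod from a node's pod subnet.
// Hypothetical and simplified: addresses are never released or reused.
type allocator struct {
	prefix netip.Prefix
	next   netip.Addr
	byPod  map[string]netip.Addr
}

func newAllocator(cidr string) (*allocator, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	p = p.Masked() // normalize, e.g. 10.2.29.7/24 -> 10.2.29.0/24
	// Skip the network address and the first host, which is commonly
	// reserved for the node's bridge/gateway.
	first := p.Addr().Next().Next()
	return &allocator{prefix: p, next: first, byPod: map[string]netip.Addr{}}, nil
}

// allocate returns the pod's existing address, or assigns the next free one.
func (a *allocator) allocate(pod string) (netip.Addr, error) {
	if ip, ok := a.byPod[pod]; ok {
		return ip, nil
	}
	// Never hand out the last address in the subnet (the IPv4 broadcast address).
	if !a.prefix.Contains(a.next) || !a.prefix.Contains(a.next.Next()) {
		return netip.Addr{}, errors.New("pod subnet exhausted")
	}
	ip := a.next
	a.next = ip.Next()
	a.byPod[pod] = ip
	return ip, nil
}

func main() {
	alloc, err := newAllocator("10.2.29.0/24")
	if err != nil {
		fmt.Println("bad subnet:", err)
		return
	}
	for _, pod := range []string{"pod-a", "pod-b", "pod-a"} {
		ip, err := alloc.allocate(pod)
		if err != nil {
			fmt.Println(pod, "error:", err)
			continue
		}
		fmt.Println(pod, ip)
	}
}
```

Giving each node its own pod subnet keeps allocation local to the node, while routing between subnets is left to the network plugin.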
slide-82
SLIDE 82

More Containers, More Problems

Problem #13

  • Metrics
  • Monitoring
  • Alerting
slide-83
SLIDE 83

More Containers, More Problems

Problem #13

  • Metrics
  • Monitoring
  • Alerting

Solution

  • Prometheus
slide-84
SLIDE 84

More Containers, More Problems

Problem #14

  • Vulnerabilities inside containers

slide-85
SLIDE 85

More Containers, More Problems

Problem #14

  • Vulnerabilities inside containers

Solution

slide-86
SLIDE 86
slide-87
SLIDE 87

More Containers, More Problems

Problem #15

  • Visualize & configure clusters

slide-88
SLIDE 88

More Containers, More Problems

Problem #15

  • Visualize & configure clusters

Solution

  • Tectonic Console
slide-89
SLIDE 89
slide-90
SLIDE 90

More Containers, More Problems

Problem #16

  • Running on Bare Metal
slide-91
SLIDE 91

More Containers, More Problems

Problem #16

  • Running on Bare Metal

Solution

  • Ignition
  • coreos-baremetal
  • Tectonic baremetal installer

slide-92
SLIDE 92

More Containers, More Problems

Problem #17

  • Inability to verify node trust

slide-93
SLIDE 93

More Containers, More Problems

Problem #17

  • Inability to verify node trust

Solution

  • Distributed Trusted Computing (DTC)

slide-94
SLIDE 94

More Containers, More Problems

Problem #18

  • Persistent storage
slide-95
SLIDE 95

More Containers, More Problems

Problem #18

  • Persistent storage

Solution

  • Torus

slide-96
SLIDE 96

Kubernetes is the kernel, Tectonic is the distro.

tectonic.com @tectonic

slide-97
SLIDE 97
Off-the-shelf #GIFEE

slide-98
SLIDE 98

Kubernetes Contributions

  • OIDC authentication
  • RBAC authorization
  • TLS bootstrapping
  • rktnetes
  • 2x scheduler performance
  • etcd 3 support
  • coreos-kubernetes
  • Bootstrap/upgrade simplification
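
RBAC grants permissions through roles: a role lists rules (API groups, resources, verbs), and a binding attaches a role to users. The sketch below models that check loosely on the Kubernetes RBAC rule shape, where `"*"` is a wildcard and `""` is the core API group. The Go types are hypothetical, not the `k8s.io/api` types, and the sketch omits namespaces, resource names, groups, and service accounts.

```go
package main

import "fmt"

// Rule mirrors the core fields of a Kubernetes RBAC PolicyRule.
type Rule struct {
	APIGroups []string // "" is the core API group
	Resources []string
	Verbs     []string
}

// Role is a named set of rules.
type Role struct {
	Name  string
	Rules []Rule
}

// Binding grants a role to the listed users.
type Binding struct {
	Role  Role
	Users []string
}

// covers reports whether a rule field allows value; "*" matches anything.
func covers(field []string, value string) bool {
	for _, f := range field {
		if f == "*" || f == value {
			return true
		}
	}
	return false
}

// allowed reports whether any binding grants user the verb on the resource.
// RBAC is purely additive (there are no deny rules), so one matching rule is
// enough, and anything not explicitly granted is refused.
func allowed(bindings []Binding, user, verb, apiGroup, resource string) bool {
	for _, b := range bindings {
		bound := false
		for _, u := range b.Users {
			if u == user {
				bound = true
				break
			}
		}
		if !bound {
			continue
		}
		for _, r := range b.Role.Rules {
			if covers(r.APIGroups, apiGroup) && covers(r.Resources, resource) && covers(r.Verbs, verb) {
				return true
			}
		}
	}
	return false
}

func main() {
	podReader := Role{Name: "pod-reader", Rules: []Rule{{
		APIGroups: []string{""},
		Resources: []string{"pods"},
		Verbs:     []string{"get", "list", "watch"},
	}}}
	bindings := []Binding{{Role: podReader, Users: []string{"alice"}}}

	fmt.Println(allowed(bindings, "alice", "list", "", "pods"))   // true
	fmt.Println(allowed(bindings, "alice", "delete", "", "pods")) // false: verb not granted
	fmt.Println(allowed(bindings, "bob", "list", "", "pods"))     // false: no binding
}
```

In a real cluster, the user name being checked is whatever the authenticator established, for example the subject of an OIDC token.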

slide-99
SLIDE 99

Future

  • More management tools
  • Expand platform support
  • Prometheus enhancements
  • Federated clusters

slide-100
SLIDE 100

Summary

  • Open source is key
  • Security is key
  • Updates are key
  • Containers
  • Orchestration
  • Automatic systems

slide-101
SLIDE 101

Ed Rooth

@sym3tri | ed.rooth@coreos.com | coreos.com

More Containers, More Problems

slide-102
SLIDE 102

CoreOS is Running the World’s Containers

We’re hiring in all departments!
Email: careers@coreos.com
Positions: coreos.com/careers

OPEN SOURCE
90+ projects on GitHub, 1,000+ contributors
CoreOS.com - @coreoslinux - github/coreos

ENTERPRISE
Secure solutions, support plans, training + more
sales@coreos.com - tectonic.com - quay.io