DevOps: Where is My Pod



SLIDE 1

DevOps: Where is My PodPod

SLIDE 2

Hello!

I am smalltown

MaiCoin Site Reliability Engineer Taipei HashiCorp UG Organizer AWS UG Taiwan Staff

SLIDE 3
Pets vs Cattle

  Pets              Cattle
  GUI Driven        API Driven
  Ticket Based      Self Service
  Hand Crafted      Automated
  Reserved          On Demand
  Scale-Up          Scale-Out
  Smart Hardware    Smart Apps
  Proprietary       Open Source
  Waterfall Ops     Agile DevOps
  ...               ...

SLIDE 4

Kubernetes = Cattle Pattern

SLIDE 5

After Using Kubernetes?

SLIDE 6

Livestock Industry Requires Expertise

  • Feeding
  • Breeding
  • Animal Health
  • Range of Species
  • Product
  • System

SLIDE 7

The Same Thing Happened in K8S

  • Pod is Pending
  • Node Not Ready
  • App Not Redundant
  • Out of Resource
  • Pod Not on the Right Node
  • Pods Interfere with Each Other

SLIDE 8

Yes, You are Involved in Livestock Industry Now!

We really have become "code farmers" (碼農) now...

SLIDE 9

  • Cluster Pattern
  • Resource Management
  • Pod Arrangement


SLIDE 11

How to Arrange Application Workload?

  • If There are 3 Applications and 3 Environments (Alpha, Beta, Production)...
  • Run All Application Instances on a Single Cluster?
  • A Separate Cluster for Each Application Instance?
  • A Combination of the Above?
SLIDE 12

One Large Shared Cluster

👍 Efficient Resource Usage
👍 Cheap
👍 Efficient Administration
👎 Single Point of Failure
👎 No Hard Security Isolation
👎 No Hard Multi-Tenancy
👎 Many Users
👎 Clusters Can't Grow Infinitely Large


SLIDE 13

Many Small Single-Use Clusters

👍 Reduced Blast Radius
👍 Isolation
👍 Few Users
👎 Inefficient Resource Usage
👎 Expensive
👎 Complex Administration


SLIDE 14

Cluster per Application

👍 Cluster Can be Customised for an App
👎 Different Environments in the Same Cluster


SLIDE 15

Cluster per Environment

👍 Isolation of the Prod Environment
👍 Cluster Can be Customised for an Environment
👍 Lock Down Access to the Prod Cluster
👎 Lack of Isolation Between Apps
👎 App Requirements are Not Localised


SLIDE 16

Which One is Better?

  • Depends on Your Use Case
  • Trade-Off the Pros and Cons of the Different Approaches
  • The Choice is Not Limited to the Above Examples
  • It can be Any Combination of Them!


SLIDE 17

Multiple (Availability) Zones

  • Multiple, Isolated Locations Within Each Region
  • Protect Your Application Against (Availability) Zone Disruption

SLIDE 18

Network Latency

  • Take AWS for Example: Inter-AZ Network Latency Depends on the Region, Generally Below 10 ms
  • Does It Matter?
SLIDE 19

Persistent Volume

  • High-Efficiency Storage and the Pod Need to Stay in the Same (Availability) Zone
  • What is the Problem?
SLIDE 20

Extra Cost

  • AWS/Azure/GCP Regional Data Transfer is Charged at $0.01/GB
  • Large Amounts of Data Transfer will Lead to Huge Cost (GitLab)

SLIDE 21

  • Cluster Pattern
  • Resource Management
  • Pod Arrangement

SLIDE 22

How to Put Pod in the Right Node

  • Dedicated Nodes
  • Nodes with Special Hardware
  • Taint-Based Evictions
SLIDE 23

Node Selector

apiVersion: v1
kind: Pod
...
spec:
  containers:
  - name: cattle
    image: cattle
    imagePullPolicy: IfNotPresent
  nodeSelector:
    land: grass

(Diagram: the pod lands on nodes labeled land:grass ❤)
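For the nodeSelector above to match anything, a node has to carry that label first. A minimal sketch with kubectl; the node name node-1 is a placeholder:

```shell
# Label a node so pods with "nodeSelector: land: grass" can be scheduled onto it.
kubectl label nodes node-1 land=grass

# Confirm the label is present.
kubectl get nodes --show-labels
```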

SLIDE 24

Node Affinity - Required

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/land
            operator: In
            values:
            - pasture-1
            - pasture-2
...

(Diagram: the pod may only land on nodes labeled kubernetes.io/land: pasture-1 or pasture-2 ❤)

SLIDE 25

Node Affinity - Preferred

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/land
            operator: In
            values:
            - pasture-1
            - pasture-2
...

(Diagram: the pod prefers nodes labeled kubernetes.io/land: pasture-1 or pasture-2; when none fit ╮(╯_╰)╭, it still schedules onto kubernetes.io/land: pasture-3 😣)

SLIDE 26

Taint

apiVersion: v1
kind: Pod
metadata:
  name: cattle
  labels:
    env: test
spec:
  containers:
  - name: cattle
    image: cattle
    imagePullPolicy: IfNotPresent

(Diagram: the node is tainted land=mud:NoSchedule; this pod has no matching toleration, so it is not scheduled there)
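The land=mud:NoSchedule taint shown on the node is applied with kubectl. A sketch; the node name node-1 is a placeholder:

```shell
# Taint a node; pods without a matching toleration will not be scheduled onto it.
kubectl taint nodes node-1 land=mud:NoSchedule

# A trailing minus removes the taint again.
kubectl taint nodes node-1 land=mud:NoSchedule-
```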

SLIDE 27

Toleration

...
spec:
  containers:
  - name: pig
    image: pig
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "land"
    operator: "Equal"
    value: "mud"
    effect: "NoSchedule"

(Diagram: the pig pod tolerates the land=mud:NoSchedule taint, so it may be scheduled onto the tainted node)

SLIDE 28

Inter-Pod Affinity

apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: species
            operator: In
            values:
            - cattle
        topologyKey: failure-domain.beta.kubernetes.io/land

(Diagram: nodes labeled land:grass and land:mud; the pod is co-located with species=cattle pods in the same land domain)

SLIDE 29

Inter-Pod Anti-Affinity

apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: species
            operator: In
            values:
            - cattle
        topologyKey: failure-domain.beta.kubernetes.io/land

(Diagram: nodes labeled land:grass and land:mud; the pod is kept away from land domains that already run species=cattle pods)

SLIDE 30

Why Need PodTopologySpread?

apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: species
            operator: In
            values:
            - cattle
        topologyKey: failure-domain.beta.kubernetes.io/land

(Diagram: nodes labeled land:grass and land:mud)

SLIDE 31

How Does PodTopologySpread Work?

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: land
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        species: cattle

(Diagram: a placement giving skew=3 across the land domains is rejected ❌; a placement giving skew=0 is allowed ✅)

SLIDE 32

  • Cluster Pattern
  • Resource Management
  • Pod Arrangement

SLIDE 33

Why Need Resource Management?

  • Avoid Out-of-Control Applications Affecting Others
  • Give Applications the Ability to Scale Out
  • Make Overall Cluster Capacity Easy to Plan
  • Ensure the Most Important Applications Survive and Stay Safe
  • ...
SLIDE 34

Everyone Knows Resource Request & Limit

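As a refresher, requests and limits are declared per container. A minimal sketch; the names and values are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cattle
spec:
  containers:
  - name: cattle
    image: cattle
    resources:
      requests:          # what the scheduler reserves for the container
        memory: "64Mi"
        cpu: "250m"
      limits:            # hard ceiling enforced at runtime
        memory: "128Mi"
        cpu: "500m"
```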

SLIDE 35
  • Default Memory Requests and Limits for a Namespace
  • Default CPU Requests and Limits for a Namespace
  • Minimum and Maximum Memory Constraints for a Namespace
  • Minimum and Maximum CPU Constraints for a Namespace
  • Memory and CPU Quotas for a Namespace
  • Pod Quota for a Namespace

When K8S Users Ignore You 😇
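When users simply omit requests and limits, a LimitRange can inject namespace-wide defaults. A sketch; the name and values are illustrative:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-cpu-defaults
spec:
  limits:
  - type: Container
    default:             # applied as the limit when a container sets none
      memory: "512Mi"
      cpu: "500m"
    defaultRequest:      # applied as the request when a container sets none
      memory: "256Mi"
      cpu: "250m"
```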

SLIDE 36

But Do You Know Pod QoS?

  • Guaranteed: Every Container in the Pod Must Have a Memory/CPU Limit and a Memory/CPU Request, and They Must be the Same
  • Burstable: Does Not Meet the Criteria for QoS Class Guaranteed, and At Least One Container in the Pod Has a Memory or CPU Request
  • BestEffort: Does Not Have Any Memory or CPU Limits or Requests
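For example, a pod only qualifies as Guaranteed when every container's requests equal its limits. A sketch; names and values are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-cattle
spec:
  containers:
  - name: cattle
    image: cattle
    resources:
      requests:
        memory: "200Mi"
        cpu: "700m"
      limits:            # identical to requests => QoS class Guaranteed
        memory: "200Mi"
        cpu: "700m"
```

Kubernetes records the computed class in pod status; it can be read back with kubectl get pod guaranteed-cattle -o jsonpath='{.status.qosClass}'.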

SLIDE 37

When Out of Resource...

From first evicted to last:

  • BestEffort Pods
  • Burstable Pods Whose Resource Usage Exceeds Their Request
  • Burstable Pods Whose Resource Usage is Beneath Their Request
  • Guaranteed Pods

SLIDE 38

Pod Disruptions

  • Voluntary and Involuntary Disruptions
  • Dealing with Disruptions
    ○ Ensure Pods Request Appropriate Resources
    ○ Replicate Your Application
    ○ Spread Applications Across Racks (Using Anti-Affinity) or Across Zones (if Using a Multi-Zone Cluster)
SLIDE 39

Perform a Disruptive Action on All the Nodes

  • Accept Downtime
  • Failover to Another Complete Replica Cluster
  • Use Pod Disruption Budget
SLIDE 40

Pod Disruption Budget

PDB = At Least 2 of The 3 Pods to be Available at All Times
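The "at least 2 of the 3 pods available" rule maps to minAvailable in a PodDisruptionBudget. A sketch, assuming the pods carry a species=cattle label:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cattle-pdb
spec:
  minAvailable: 2        # voluntary evictions are blocked if fewer would remain
  selector:
    matchLabels:
      species: cattle
```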

SLIDE 46

Pod Priority and Preemption

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting
value: 1000000
preemptionPolicy: Never          # or PreemptLowerPriority
globalDefault: false             # or true
description: "Pod Priority and Preemption"
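A pod opts into a priority class via priorityClassName. A minimal sketch; the pod name is illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: important-cattle
spec:
  priorityClassName: high-priority-nonpreempting   # the PriorityClass defined above
  containers:
  - name: cattle
    image: cattle
```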

SLIDE 47

THANKS!

ANY QUESTIONS? You can find me at my office (we are hiring):

  • Frontend Engineer
  • Backend Engineer