Managing Containers with Helix - Kanak Biscuitwala, Jason Zhang - PowerPoint PPT Presentation


slide-1
SLIDE 1

Managing Containers with Helix

Kanak Biscuitwala Jason Zhang Apache Helix Committers @ LinkedIn

helix.apache.org @apachehelix

slide-2
SLIDE 2

Intersection of Job Types

Oracle

DB

Oracle

DB

slide-3
SLIDE 3

Intersection of Job Types

Oracle

DB

Oracle

DB

Backup Backup

slide-4
SLIDE 4

Intersection of Job Types

Oracle

DB

Oracle

DB

Backup Backup HDFS ETL ETL

slide-5
SLIDE 5

Intersection of Job Types

Oracle

DB

Oracle

DB

Backup Backup HDFS ETL ETL

Long-running and batch jobs running together!

slide-6
SLIDE 6

Cloud Deployment

A (online)   B (nearline)   C (batch)

A1 A1 A2 A3 B1 C1 C2 C3 B2 B3 C2 B4 B5 C2 C4

Applications with diverse requirements running together in a datacenter

slide-7
SLIDE 7

Cloud Deployment

A B C A1 A1 A2 A3 B1 C1 C2 C3 B2 B3 C2 B4 B5 C2 C4

Applications with diverse requirements running together in a datacenter

DB Backup ETL

slide-8
SLIDE 8

Processes on Machines

Machine

Container Process VM

slide-9
SLIDE 9

Processes on Machines

Task Task Process No Isolation

Machine

Container Process VM

slide-10
SLIDE 10

Processes on Machines

Task Task Process 128 MB 128 MB 128 MB Process Process Process No Isolation VM-based Isolation

Machine

Container Process VM

slide-11
SLIDE 11

Processes on Machines

Task Task Process 256 MB 64 MB 128 MB 128 MB 128 MB Process Process Process Process Process No Isolation VM-based Isolation Container-based Isolation

Machine

Container Process VM

slide-12
SLIDE 12
  • Run as individual processes

– Poor isolation or poor utilization

  • Virtual machines

– Better isolation – Xen, Hyper-V, ESX, KVM

  • Containers

– cgroup – YARN, Mesos – Super lightweight, dynamic based on application requirements

Processes on Machines

slide-13
SLIDE 13

Processes on Machines

Virtualization and containerization significantly improve process isolation and open up possibilities for efficient utilization of physical resources

slide-14
SLIDE 14

Container-Based Solution

slide-15
SLIDE 15

Container-Based Solution

System Requirements

A B C

64 MB 64 MB 64 MB 128 MB 128 MB 256 MB

slide-16
SLIDE 16

Container-Based Solution

Allocation

64 MB 64 MB 128 MB 256 MB 128 MB 64 MB

Machine

Container

slide-17
SLIDE 17

Container-Based Solution

Allocation

64 MB 64 MB 128 MB 256 MB 128 MB 64 MB

Machine

Container A A A B B C Process

slide-18
SLIDE 18

Container-Based Solution

Allocation

64 MB 64 MB 128 MB 256 MB 128 MB 64 MB

Containerization is powerful!

Machine

Container A A A B B C Process

slide-19
SLIDE 19

Container-Based Solution

Allocation

64 MB 64 MB 128 MB 256 MB 128 MB 64 MB

Containerization is powerful!

Machine

Container A A A B B C Process

But do processes always fit so nicely?

slide-20
SLIDE 20

Over-Utilization

256 MB

Container-Based Solution

Machine

Container Process

slide-21
SLIDE 21

Over-Utilization

256 MB Process 1

Container-Based Solution

Machine

Container Process

slide-22
SLIDE 22

Over-Utilization Outcome: Preemption and relaunch

256 MB Process 1

Container-Based Solution

Machine

Container Process

slide-23
SLIDE 23

Over-Utilization Outcome: Preemption and relaunch

Container-Based Solution

384 MB

Machine

Container Process

slide-24
SLIDE 24

Over-Utilization Outcome: Preemption and relaunch

Container-Based Solution

384 MB Process 1

Machine

Container Process

slide-25
SLIDE 25

Under-Utilization

384 MB 128 MB

Container-Based Solution

Machine

Container Process

slide-26
SLIDE 26

Under-Utilization Outcome: Over-provisioned until restart

384 MB Process 1 128 MB

Container-Based Solution

Machine

Container Process Process 2

slide-27
SLIDE 27

Container-Based Solution

Failure

64 MB 64 MB 128 MB 256 MB 128 MB 64 MB

Machine

Container A A A B B C Process

slide-28
SLIDE 28

Container-Based Solution

Failure

64 MB 64 MB 128 MB 128 MB

Machine

Container A A B B Process

slide-29
SLIDE 29

Container-Based Solution

Failure

64 MB 64 MB 128 MB 128 MB

Outcome: Launch containers elsewhere

Machine

Container A A B B Process 256 MB C 64 MB A

What about stateful systems?

slide-30
SLIDE 30

Container-Based Solution

Failure

64 MB 64 MB 128 MB 256 MB 128 MB 64 MB

Machine

Container SLAVE SLAVE MASTER B B C Process

slide-31
SLIDE 31

Container-Based Solution

Failure

64 MB 64 MB 128 MB 128 MB

Without additional information, the master is unavailable until restart

Machine

Container SLAVE SLAVE B B Process

slide-32
SLIDE 32

Scaling

Container-Based Solution

Machine

Container Process 256 MB 50% 256 MB 50%

slide-33
SLIDE 33

Scaling

Container-Based Solution

Machine

Container Process

slide-34
SLIDE 34

Scaling

Container-Based Solution

Machine

Container Process 128 MB 33% 128 MB 33% 128 MB 33%

Outcome: Relaunch with new sharding

slide-35
SLIDE 35

Container-Based Solution

Utilization: Application requirements define container size
Fault Tolerance: New container is started
Scaling: Workload is repartitioned and new containers are brought up
Discovery: Existence

slide-36
SLIDE 36

Container-Based Solution

We need something finer-grained: the container model provides flexibility within machines, but assumes homogeneity of tasks within containers

slide-37
SLIDE 37

Task-Based Solution

slide-38
SLIDE 38

Task-Based Solution

System Requirements

A: complete in less than 5 hours
B: always have 2 containers running
C: response time should be less than 50 ms

slide-39
SLIDE 39

Task-Based Solution

Allocation

Machine

Container A A B Task B C C

slide-40
SLIDE 40

Over-Utilization

Task-Based Solution

Machine

Container Task

slide-41
SLIDE 41

Over-Utilization

Task-Based Solution

Task 1

Machine

Container Task

slide-42
SLIDE 42

Over-Utilization

Task-Based Solution

Task 1

Machine

Container Task

slide-43
SLIDE 43

Over-Utilization

Task-Based Solution

Task 1

Machine

Container Task Task 1

slide-44
SLIDE 44

Over-Utilization

Task-Based Solution

Hide the overhead of a container restart

Machine

Container Task Task 1

slide-45
SLIDE 45

Under-Utilization

384 MB 128 MB

Task-Based Solution

Machine

Container Task

slide-46
SLIDE 46

Under-Utilization

384 MB Task 1 128 MB Task 2

Task-Based Solution

Machine

Container Task

slide-47
SLIDE 47

Under-Utilization Optimize container allocations based on usage

384 MB Task 1 Task 2

Task-Based Solution

Machine

Container Task

slide-48
SLIDE 48

Task-Based Solution

Failure

Task 1 Leader Task 2 Leader Task 3 Leader Task 2 Standby Task 3 Standby Task 1 Standby Task 2 Standby Task 1 Standby Task 3 Standby

Machine

Container

slide-49
SLIDE 49

Task-Based Solution

Failure

Task 1 Leader Task 2 Leader Task 2 Standby Task 3 Standby Task 1 Standby Task 3 Standby Task 3 Leader

Machine

Container

slide-50
SLIDE 50

Task-Based Solution

Failure Some systems cannot wait for new containers to start

Task 1 Leader Task 2 Leader Task 2 Standby Task 3 Standby Task 1 Standby Task 3 Standby Task 3 Leader

Machine

Container

slide-51
SLIDE 51

Task-Based Solution

Discovery

Task 1 Leader Task 2 Leader Task 2 Standby

Machine

Container

Task 1: Leader at N1 Standby at N2

Task 1 Standby

Task 2: Leader at N2 Standby at N1 N1 N2

slide-52
SLIDE 52

Task-Based Solution

Discovery

Task 1 Leader Task 2 Leader Task 2 Standby

Machine

Container

Learn where everything runs, and what state each task is in

Task 1: Leader at N1 Standby at N2

Task 1 Standby

Task 2: Leader at N2 Standby at N1 N1 N2

slide-53
SLIDE 53

Scaling

Task-Based Solution

T4 T5 T6 T1 T2 T3

Machine

Container Task

slide-54
SLIDE 54

Scaling

Task-Based Solution

T4 T5 T6 T1 T2 T3

Machine

Container Task

slide-55
SLIDE 55

Scaling

Task-Based Solution

T4 T5 T6 T1 T2 T3

Machine

Container Task

slide-56
SLIDE 56

Scaling

Task-Based Solution

T4 T5 T6 T1 T2 T3

Machine

Container Task

slide-57
SLIDE 57

Comparing Solutions

Utilization
  Container solution: Application requirements define container size
  Task + container solution: Tasks are distributed as needed to a minimal container set as per SLA

Fault Tolerance
  Container solution: New container is started
  Task + container solution: Existing task can assume a new state while waiting for the new container

Scaling
  Container solution: Workload is repartitioned and new containers are brought up
  Task + container solution: Tasks are moved across containers

Discovery
  Container solution: Existence
  Task + container solution: Existence and state

slide-58
SLIDE 58

Benefits of a Task-Based Solution

Comparing Solutions

  • Container reuse
  • Minimize overhead of container relaunch
  • Fine-grained scheduling

slide-59
SLIDE 59

Benefits of a Task-Based Solution

Comparing Solutions

  • Container reuse
  • Minimize overhead of container relaunch
  • Fine-grained scheduling

Task : Container :: Thread : Process
The task is the right level of abstraction

slide-60
SLIDE 60

Working at task granularity is powerful. We need a reactive approach to resource assignment.

Comparing Solutions

slide-61
SLIDE 61

Working at task granularity is powerful. We need a reactive approach to resource assignment. How can Helix help?

Comparing Solutions

slide-62
SLIDE 62

Working at task granularity is powerful. We need a reactive approach to resource assignment. How can Helix help?

Comparing Solutions

YARN/Mesos: containers bring flexibility in a machine Helix: tasks bring flexibility in a container

slide-63
SLIDE 63

Task Management with Helix

slide-64
SLIDE 64

Application Lifecycle

Capacity Planning: Allocating physical resources for your load
Provisioning: Deploying and launching tasks
Fault Tolerance: Staying available, ensuring success
State Management: Determining what code should be running and where

slide-65
SLIDE 65

Helix Overview

Cluster Roles: Controller, Nodes (Participants), and Spectators; controllers manage the tasks running on nodes

slide-66
SLIDE 66

Helix Controller

High-Level Overview

The Rebalancer takes the Nodes and the Constraints ("single master", "no more than 3 tasks per machine") and produces a Task Assignment

slide-67
SLIDE 67

Helix Controller

Rebalancer

ResourceAssignment computeResourceMapping(
    RebalancerConfig rebalancerConfig,
    ResourceAssignment prevAssignment,
    Cluster cluster,
    ResourceCurrentState currentState);

Based on the current nodes in the cluster and constraints, find an assignment of task to node
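To make that job concrete, here is a minimal, self-contained sketch of what a rebalancer computes. This is not the Helix API; the class name, method name, and the simple string-based task/node model are all illustrative. It greedily assigns tasks to nodes under a "no more than N tasks per machine" constraint:

```java
import java.util.*;

public class SimpleRebalancer {
    // Greedy assignment: walk the task list, placing each task on the first
    // node that still has capacity under the per-node constraint.
    public static Map<String, String> computeMapping(
            List<String> tasks, List<String> nodes, int maxTasksPerNode) {
        Map<String, String> assignment = new LinkedHashMap<>();
        Map<String, Integer> load = new HashMap<>();
        for (String task : tasks) {
            for (String node : nodes) {
                int current = load.getOrDefault(node, 0);
                if (current < maxTasksPerNode) {
                    assignment.put(task, node);
                    load.put(node, current + 1);
                    break; // task placed; move on to the next task
                }
            }
        }
        return assignment; // tasks that fit nowhere are simply left unassigned
    }
}
```

The real Helix rebalancer additionally honors state constraints and tries to minimize movement relative to the previous assignment; this sketch only captures the capacity side.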

slide-68
SLIDE 68

Helix Controller

Rebalancer

ResourceAssignment computeResourceMapping(
    RebalancerConfig rebalancerConfig,
    ResourceAssignment prevAssignment,
    Cluster cluster,
    ResourceCurrentState currentState);

Based on the current nodes in the cluster and constraints, find an assignment of task to node

What else do we need?

slide-69
SLIDE 69

Helix Controller

What is Missing?

  • Dynamic container allocation
  • Container isolation
  • Automated service deployment
  • Resource utilization monitoring

slide-70
SLIDE 70

Helix Controller

Target Provider

Based on some constraints, determine how many containers are required in this system

Fixed CPU Memory Bin Packing

We’re working on integrating with monitoring systems in order to query for usage information

slide-71
SLIDE 71

Helix Controller

Target Provider

Based on some constraints, determine how many containers are required in this system

TargetProviderResponse evaluateExistingContainers(
    Cluster cluster,
    ResourceId resourceId,
    Collection<Participant> participants);

class TargetProviderResponse {
  List<ContainerSpec> containersToAcquire;
  List<Participant> containersToRelease;
  List<Participant> containersToStop;
  List<Participant> containersToStart;
}

Fixed CPU Memory Bin Packing

We’re working on integrating with monitoring systems in order to query for usage information
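As an illustration of the "Fixed" policy, the following toy sketch compares a fixed target count against the currently running containers and decides what to acquire or release. All names here (FixedTargetProvider, Response, the string container ids) are assumptions for illustration, not the actual Helix interfaces:

```java
import java.util.*;

public class FixedTargetProvider {
    // Simplified stand-in for TargetProviderResponse.
    public static class Response {
        public int containersToAcquire;
        public List<String> containersToRelease = new ArrayList<>();
    }

    // Compare the running set against a fixed target count.
    public static Response evaluate(List<String> running, int target) {
        Response r = new Response();
        if (running.size() < target) {
            // Short of the target: ask for the difference.
            r.containersToAcquire = target - running.size();
        } else {
            // At or over the target: release the surplus from the tail.
            r.containersToRelease =
                new ArrayList<>(running.subList(target, running.size()));
        }
        return r;
    }
}
```

The CPU, Memory, and Bin Packing policies would make the same acquire/release decision, but driven by usage metrics rather than a fixed count.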

slide-72
SLIDE 72

Helix Controller

Adding a Target Provider

Rebalancer Task Assignment

Constraints Nodes

Target Provider

slide-73
SLIDE 73

Helix Controller

Adding a Target Provider

Rebalancer Task Assignment

Constraints Nodes

Target Provider

How do we use the target provider response?

slide-74
SLIDE 74

Helix Controller

Container Provider

Given the container requirements, ensure that number of containers is running

YARN Mesos Local

slide-75
SLIDE 75

Helix Controller

Container Provider

Given the container requirements, ensure that number of containers is running

ListenableFuture<ContainerId> allocateContainer(ContainerSpec spec);

ListenableFuture<Boolean> deallocateContainer(ContainerId containerId);

ListenableFuture<Boolean> startContainer(ContainerId containerId, Participant participant);

ListenableFuture<Boolean> stopContainer(ContainerId containerId);

YARN Mesos Local
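A toy "Local" provider can make the allocate/start/stop/deallocate lifecycle concrete. This sketch is a stand-in, not the Helix interface: it swaps Guava's ListenableFuture for the JDK's CompletableFuture so it runs without dependencies, and tracks containers in an in-memory map rather than launching processes:

```java
import java.util.*;
import java.util.concurrent.CompletableFuture;

public class LocalContainerProvider {
    // id -> started? (present in the map means allocated)
    private final Map<String, Boolean> containers = new HashMap<>();
    private int nextId = 0;

    // memoryMb stands in for a full ContainerSpec.
    public CompletableFuture<String> allocateContainer(int memoryMb) {
        String id = "container-" + (nextId++);
        containers.put(id, false); // allocated but not yet started
        return CompletableFuture.completedFuture(id);
    }

    public CompletableFuture<Boolean> startContainer(String id) {
        if (!containers.containsKey(id)) {
            return CompletableFuture.completedFuture(false); // never allocated
        }
        containers.put(id, true);
        return CompletableFuture.completedFuture(true);
    }

    public CompletableFuture<Boolean> stopContainer(String id) {
        if (!Boolean.TRUE.equals(containers.get(id))) {
            return CompletableFuture.completedFuture(false); // not running
        }
        containers.put(id, false);
        return CompletableFuture.completedFuture(true);
    }

    public CompletableFuture<Boolean> deallocateContainer(String id) {
        return CompletableFuture.completedFuture(containers.remove(id) != null);
    }
}
```

A YARN or Mesos implementation would back the same four calls with Resource Manager container requests or Mesos offer responses; the futures let the controller continue reacting to cluster events while allocation is in flight.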

slide-76
SLIDE 76

Helix Controller

Adding a Container Provider

Rebalancer Task Assignment

Constraints Nodes

Target Provider Container Provider

Target Provider + Container Provider = Provisioner

slide-77
SLIDE 77

Application Lifecycle

Capacity Planning: Target Provider
Provisioning: Container Provider
Fault Tolerance: Existing Helix controller (enhanced by Provisioner)
State Management: Existing Helix controller (enhanced by Provisioner)

With Helix and the Task Abstraction

slide-78
SLIDE 78

System Architecture

slide-79
SLIDE 79

System Architecture

Resource Provider

slide-80
SLIDE 80

System Architecture

submit job Resource Provider Client

slide-81
SLIDE 81

System Architecture

submit job Resource Provider Controller Container

Provisioner Rebalancer

Client

App Launcher

slide-82
SLIDE 82

System Architecture

submit job Resource Provider Controller Container

Provisioner Rebalancer

Client container request

App Launcher

slide-83
SLIDE 83

System Architecture

submit job Resource Provider Controller Container

Provisioner Rebalancer

Client container request Participant Container

Participant Launcher Helix Participant App App Launcher

slide-84
SLIDE 84

System Architecture

submit job Resource Provider Controller Container

Provisioner Rebalancer

Client container request Participant Container

Participant Launcher Helix Participant App App Launcher

assign tasks

slide-85
SLIDE 85

HDFS/Common Area

Helix + YARN

YARN Architecture

Client Resource Manager Application Master Container Node Manager Node Manager submit job node status node status container request assign work status App Package grab package

slide-86
SLIDE 86

HDFS/Common Area

Helix + YARN

Helix + YARN Architecture

Client Resource Manager Application Master Container Node Manager Node Manager submit job node status node status container request assign tasks status

Helix Controller Rebalancer Helix Participant App

App Package grab package

slide-87
SLIDE 87

HDFS/Common Area

Scheduler Slave

Helix + Mesos

Mesos Architecture

Scheduler Mesos Master Slave Machine Slave Machine Mesos Slave Mesos Slave

offer resources

node status node status Mesos Executor grab executor Executor Package

offer response
slide-88
SLIDE 88

Scheduler Slave

Helix Controller

Helix + Mesos

Helix + Mesos Architecture

Scheduler Mesos Master Slave Machine Slave Machine Mesos Slave Mesos Slave

offer resources

node status node status assign tasks

HDFS/Common Area

Mesos Executor grab executor Helix Executor Package

offer response

Helix Participant/App

slide-89
SLIDE 89

Example

slide-90
SLIDE 90

Distributed Document Store

Overview

Oracle

Partition 0 Partition 1 Partition 2

Oracle

Partition 0 Partition 1 Partition 2

P1 Backup P2 Backup HDFS ETL ETL

Master Slave

Oracle

Partition 0 Partition 1 Partition 2

P0 Backup ETL

slide-91
SLIDE 91

Distributed Document Store

Overview

Oracle

Partition 0 Partition 1 Partition 2

Oracle

Partition 0 Partition 1 Partition 2

P1 Backup P2 Backup HDFS ETL ETL

Master Slave

P0 Backup

Partition 0 Partition 1 Partition 2

slide-92
SLIDE 92

Distributed Document Store

YARN Example

Client Resource Manager submit job container request assign work status node status Application Master Node Manager

Helix Controller Rebalancer

Container Node Manager node status

Helix Participant

Oracle Partition 0 Partition 1

P1 Backup ETL

slide-93
SLIDE 93

YAML Specification

appConfig: { config: { k1: v1 } }
appPackageUri: 'file://path/to/myApp-pkg.tar'
appName: myApp
services: [DB, ETL]  # the task containers
serviceConfigMap: { DB: { num_containers: 3, memory: 1024 }, ...
                    ETL: { time_to_complete: 5h, ... }, ... }
servicePackageURIMap: { DB: 'file://path/to/db-service-pkg.tar', ... }
...

Distributed Document Store

slide-94
SLIDE 94

YAML Specification

appConfig: { config: { k1: v1 } }
appPackageUri: 'file://path/to/myApp-pkg.tar'
appName: myApp
services: [DB, ETL]  # the task containers
serviceConfigMap: { DB: { num_containers: 3, memory: 1024 }, ...
                    ETL: { time_to_complete: 5h, ... }, ... }
servicePackageURIMap: { DB: 'file://path/to/db-service-pkg.tar', ... }
...

Distributed Document Store

TargetProvider specification

slide-95
SLIDE 95

Service/Container Implementation

public class MyQueuerService extends StatelessParticipantService {
  @Override
  public void init() { ... }

  @Override
  public void onOnline() { ... }

  @Override
  public void onOffline() { ... }
}

Distributed Document Store

slide-96
SLIDE 96

Task Implementation

public class BackupTask extends Task {
  @Override
  public ListenableFuture<Status> start() { ... }

  @Override
  public ListenableFuture<Status> cancel() { ... }

  @Override
  public ListenableFuture<Status> pause() { ... }

  @Override
  public ListenableFuture<Status> resume() { ... }
}

Distributed Document Store
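The start/cancel contract above can be sketched with plain JDK futures. This is a hypothetical stand-in (the SketchBackupTask name, Status enum, and chunk loop are all illustrative, and the real interface also has pause/resume): a task that checks a cancellation flag between units of work so it can stop promptly without being killed:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

public class SketchBackupTask {
    public enum Status { COMPLETED, CANCELLED }

    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    // start() returns a future that completes when the work finishes
    // or when cancellation is observed between chunks.
    public CompletableFuture<Status> start(int chunks) {
        return CompletableFuture.supplyAsync(() -> {
            for (int i = 0; i < chunks; i++) {
                if (cancelled.get()) {
                    return Status.CANCELLED; // honor cancellation between chunks
                }
                // ... copy one chunk of the backup here ...
            }
            return Status.COMPLETED;
        });
    }

    public void cancel() {
        cancelled.set(true);
    }
}
```

Cooperative cancellation like this is what lets the controller move a task to another container without waiting for a hard kill.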

slide-97
SLIDE 97

Distributed Document Store

State Model-Style Callbacks

public class StoreStateModel extends StateModel {
  public void onBecomeMasterFromSlave() { ... }

  public void onBecomeSlaveFromMaster() { ... }

  public void onBecomeSlaveFromOffline() { ... }

  public void onBecomeOfflineFromSlave() { ... }
}
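These callbacks imply an OFFLINE/SLAVE/MASTER state machine in which only the listed transitions are legal (e.g. a replica must pass through SLAVE on its way to MASTER). A minimal sketch of enforcing that transition table, with illustrative names rather than the Helix StateModel API:

```java
import java.util.*;

public class MasterSlaveStateModel {
    // Legal transitions, mirroring the four onBecome* callbacks above.
    private static final Map<String, Set<String>> LEGAL = new HashMap<>();
    static {
        LEGAL.put("OFFLINE", new HashSet<>(Arrays.asList("SLAVE")));
        LEGAL.put("SLAVE", new HashSet<>(Arrays.asList("MASTER", "OFFLINE")));
        LEGAL.put("MASTER", new HashSet<>(Arrays.asList("SLAVE")));
    }

    private String state = "OFFLINE";

    public String getState() { return state; }

    // Moves and returns true if the transition is legal; otherwise the
    // state is left unchanged (the controller would never request it).
    public boolean transitionTo(String target) {
        if (LEGAL.getOrDefault(state, Collections.emptySet()).contains(target)) {
            state = target;
            return true;
        }
        return false;
    }
}
```

It is this table that lets the controller promote an existing SLAVE to MASTER on failure instead of waiting for a fresh container.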

slide-98
SLIDE 98

class RoutingLogic {
  public void write(Request request) {
    partition = getPartition(request.key);
    List<Participant> nodes =
        routingTableProvider.getInstance(partition, "MASTER");
    nodes.get(0).write(request);
  }

  public void read(Request request) {
    partition = getPartition(request.key);
    List<Participant> nodes =
        routingTableProvider.getInstance(partition);
    random(nodes).read(request);
  }
}

Spectator (for Discovery)

Distributed Document Store

slide-99
SLIDE 99

Helix at LinkedIn

slide-100
SLIDE 100

Helix at LinkedIn

Oracle Oracle Oracle

DB

Change Capture

Change Consumers Index Search Index User Writes

Data Replicator

Backup/Restore

In Production

ETL

HDFS

Analytics

slide-101
SLIDE 101

Helix at LinkedIn

In Production

Over 1000 instances covering over 30000 database partitions
Over 1000 instances for change capture consumers
As many as 500 instances in a single Helix cluster
(All numbers are per datacenter)

slide-102
SLIDE 102

Summary

  • The container abstraction has become a huge win
  • With Helix, we can go a step further and make tasks the unit of work
  • With the TargetProvider and ContainerProvider abstractions, any popular provisioner can be plugged in

slide-103
SLIDE 103

Questions?

Jason: zzhang@apache.org
Kanak: kanak@apache.org
Website: helix.apache.org
Dev Mailing List: dev@helix.apache.org
User Mailing List: user@helix.apache.org
Twitter: @apachehelix

slide-104
SLIDE 104