ECS & Docker: Secure Async Execution, by Brennan Saeta (presentation slides)



slide-1
SLIDE 1

ECS & Docker: Secure Async Execution @

Brennan Saeta

slide-2
SLIDE 2
slide-3
SLIDE 3

The Beginnings: 2012

  • 10 courses
  • 1 million learners worldwide
  • 4 partners

slide-4
SLIDE 4

Education at Scale

  • 1,800 courses
  • 18 million learners worldwide
  • 140 partners

slide-5
SLIDE 5

Outline

  • Evolution of Coursera’s nearline execution systems
  • Next-generation execution framework: Iguazú
  • Iguazú application deep dive: GrID, evaluating programming assignments

slide-6
SLIDE 6

Key Takeaways

  • What nearline execution is, and why it is useful
  • Best practices for running containers in production in the cloud
  • Hardening techniques for securely operating container infrastructure at scale

slide-7
SLIDE 7

A history of nearline execution

slide-8
SLIDE 8
slide-9
SLIDE 9

Coursera Architecture (2012)

PHP Monolith

slide-10
SLIDE 10

Early days - Requirements

  • Video re-encoding for distribution
  • Grade computation for 100,000+ learners
  • Pedagogical data exports for courses
slide-12
SLIDE 12

Cascade Architecture

[Diagram components: PHP Monolith (shown twice), Cascade]

slide-13
SLIDE 13

Cascade Architecture

[Diagram components: PHP Monolith (shown twice), Cascade, Queue]

slide-14
SLIDE 14

Upgrading to Scala

Re-architecting delayed execution for our 2nd generation learning platform.

slide-15
SLIDE 15

Upgrading to the JVM

  • Leverage mature Scala & JVM ecosystems for code sharing
  • JVM much more reliable (no memory leaks)
  • New job model: scheduled recurring jobs
  • Named: Saturn
slide-16
SLIDE 16

Saturn Architecture

[Diagram: online serving on a Scala micro-service architecture. Components: Service A, Service B, Service C, and two C* (Cassandra) stores]

slide-17
SLIDE 17

Saturn Architecture

[Diagram: online serving on a Scala micro-service architecture. Components: Service A, Service B, Service C, Saturn, and two C* (Cassandra) stores]

slide-18
SLIDE 18

Saturn Architecture

[Diagram components: Service A, Service B, Service C, Saturn, two C* (Cassandra) stores, and a ZK (ZooKeeper) Ensemble]

slide-19
SLIDE 19

Saturn Architecture

[Diagram components: Saturn Leader, ZK (ZooKeeper) Ensemble, Service A, Service B, Service C, and two C* (Cassandra) stores]

slide-20
SLIDE 20

Problems with Saturn

  • A single master meant the naïve implementation ran all jobs in the same JVM
  • Huge CPU contention at the top of the hour
  • OOM exceptions & GC issues
slide-21
SLIDE 21

Enter: Docker

Containers allow for resource isolation!

CC-by-2.0 https://www.flickr.com/photos/photohome_uk/1494590209

slide-22
SLIDE 22

Supported Features

Platform                   Saturn   Docker   Amazon ECS   Iguazú
Run code                     ✅       ✅         ✅          ✅
Resource Isolation           ❌       ✅         ✅          ✅
Clusters / HA                ☑        ❌         ✅          ✅
Great developer workflow     ✅       ❌         ❌          ✅
Scheduled Jobs               ✅       ❌         ❌          ✅

slide-25
SLIDE 25

Supported Features

Platform Saturn Docker Amazon ECS Iguazú Run code

✅ ✅ ✅ ✅

Resource Isolation

❌ ✅ ✅ ✅

Clusters / HA

✅ ❌ ✅ ✅

Great developer workflow

✅ ❌ ❌ ✅

Scheduled Jobs

✅ ❌ ❌ ✅

slide-26
SLIDE 26

Supported Features

Platform                   Saturn   Docker   Amazon ECS   ???
Run code                     ✅       ✅         ✅         ✅
Resource Isolation           ❌       ✅         ✅         ✅
Clusters / HA                ✅       ❌         ✅         ✅
Great developer workflow     ✅       ❌         ❌         ✅
Scheduled Jobs               ✅       ❌         ❌         ✅

slide-27
SLIDE 27

Solution: Iguazú

Marissa Strniste (https://www.flickr.com/photos/mstrniste/5999464924) CC-BY-2.0

slide-28
SLIDE 28

Solution: Iguazú

  • Framework & service for asynchronous execution
  • Optimized Scala developer experience for Coursera
  • Unified scheduler supports:
      • Immediate execution (nearline)
      • Scheduled recurring execution (cron-like)
      • Deferred execution (run once @ time X)
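
The three execution modes listed above can be modeled as a small algebraic data type with a single "when is this due next?" function. This is only a sketch: the names Schedule, Immediate, Recurring, Deferred, and UnifiedScheduler are hypothetical and are not taken from Iguazú, and a fixed interval stands in for a real cron expression.

```scala
import java.time.{Duration, Instant}

// Hypothetical model of the three execution modes; not Iguazú's actual API.
sealed trait Schedule
case object Immediate extends Schedule                                        // nearline: run as soon as possible
final case class Recurring(first: Instant, every: Duration) extends Schedule  // cron-like, simplified to a fixed interval
final case class Deferred(at: Instant) extends Schedule                       // run once at time X

object UnifiedScheduler {
  /** Next time the job is due at or after `now`, or None if it will not run again. */
  def nextRun(schedule: Schedule, now: Instant): Option[Instant] = schedule match {
    case Immediate => Some(now)
    // Assumption: a one-shot job whose time has already passed is treated as handled.
    case Deferred(at) => if (at.isBefore(now)) None else Some(at)
    case Recurring(first, every) =>
      require(every.toMillis > 0, "interval must be positive")
      if (!first.isBefore(now)) Some(first)
      else {
        val elapsed = Duration.between(first, now).toMillis
        val period = every.toMillis
        val periods = (elapsed + period - 1) / period // ceiling division
        Some(first.plusMillis(periods * period))
      }
  }
}
```

One function over all three cases is what makes the scheduler "unified" in this sketch: the caller does not need to know which kind of job it is dispatching.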

Marissa Strniste (https://www.flickr.com/photos/mstrniste/5999464924) CC-BY-2.0

slide-29
SLIDE 29

Iguazú Architecture

[Diagram components: Devs, Users, Services, Iguazú Admin, Iguazú Frontend, Iguazú Scheduler, Iguazú Backend, Cassandra, SQS, Iguazú Workers, ECS API]

slide-32
SLIDE 32

Iguazú Architecture

[Diagram components: Devs, Users, Services, Iguazú Admin, Iguazú Frontend, Iguazú Scheduler, Iguazú Backend, Cassandra, SQS Queue, Iguazú Workers, ECS API, ZK Ensemble]

slide-34
SLIDE 34

Autoscale, autoscale, autoscale!

slide-35
SLIDE 35

Autoscaling ⇄ Iguazú ⇆ ECS

[Diagram components: Iguazú, ECS API, Autoscaling, EC2 Workers. Labeled steps: Shutdown Lifecycle Notification; Poll Worker Job Status; All finished; Proceed; Terminate EC2 Worker]
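
The drain sequence named in this diagram (receive a shutdown notification, poll the worker's job status, proceed once everything has finished) can be sketched as a bounded polling loop. WorkerDrain, awaitDrain, and the two decision values are hypothetical names; the job-count lookup and the sleep are injected as functions so that the sketch makes no assumption about how Iguazú actually talks to the ECS API or to Auto Scaling.

```scala
import scala.annotation.tailrec

object WorkerDrain {
  sealed trait Decision
  case object Proceed extends Decision  // all jobs finished: the instance may be terminated
  case object TimedOut extends Decision // still busy after the last poll: the caller decides

  /**
   * Polls `runningJobs` up to `maxPolls` times, calling `sleep` between polls.
   * Returns Proceed as soon as a poll reports zero running jobs.
   */
  def awaitDrain(runningJobs: () => Int, maxPolls: Int, sleep: () => Unit): Decision = {
    require(maxPolls >= 1, "need at least one poll")
    @tailrec
    def loop(remaining: Int): Decision =
      if (runningJobs() == 0) Proceed
      else if (remaining <= 1) TimedOut
      else { sleep(); loop(remaining - 1) }
    loop(maxPolls)
  }
}
```

In a real deployment, Proceed would presumably map to completing the Auto Scaling lifecycle action so that termination can continue; that wiring is left out here.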

slide-36
SLIDE 36

Failure in Nearline Systems

  • Most jobs are non-idempotent
  • Iguazú: At most once execution
      • Time-bounded delay
  • Future: At least once execution
      • With caveats
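
At-most-once execution is commonly obtained by claiming an invocation before running it and never retrying afterwards, which is the safe choice when jobs are not idempotent. The sketch below illustrates that general technique and is not a description of Iguazú's internals: AtMostOnce and runOnce are hypothetical names, and an in-memory set stands in for whatever durable store a real system would use.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.util.Try

object AtMostOnce {
  // In-memory stand-in for a durable record of claimed invocation IDs (assumption).
  private val claimed = ConcurrentHashMap.newKeySet[String]()

  /**
   * Claims `invocationId` and then runs `job` exactly one time.
   * Returns None if the ID was already claimed (the job is skipped),
   * otherwise Some(result). A failed run is reported but never retried.
   */
  def runOnce(invocationId: String)(job: => Unit): Option[Try[Unit]] =
    if (claimed.add(invocationId)) Some(Try(job)) else None
}
```

Because the claim is recorded before the job body runs, a crash in the middle of a run loses that run instead of repeating it. At-least-once execution would reverse the order, recording completion only after success, and would need idempotent jobs.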
slide-37
SLIDE 37

Iguazú adoption by the numbers

  • ~100 jobs in production
  • >1000 runs per day
  • >100 different job schedules

slide-38
SLIDE 38

Iguazú Applications

Nearline Jobs

  • Pedagogical Instructor Data Exports

  • System Integrations
  • Course Migrations

Scheduled Recurring Jobs

  • Course Reminders
  • System Integrations
  • Payment reconciliation
  • Course translations
  • Housekeeping
  • Build artifact archival
  • A/B Experiments
slide-39
SLIDE 39

While containers may help you on your journey, they are not themselves a destination.

CC-by-2.0 https://www.flickr.com/photos/usoceangov/5369581593

slide-40
SLIDE 40

Writing an Iguazu Job

class AbReminderJob @Inject() (abClient: AbClient, email: EmailAPI)
    extends AbstractJob {

  override val reservedCpu = 1024    // 1 CPU core
  override val reservedMemory = 1024 // 1 GB RAM

  def run(parameters: JsValue) = {
    val experiments = abClient.findForgotten()
    logger.info(s"Found ${experiments.size} forgotten experiments.")
    experiments.foreach { experiment =>
      sendReminder(experiment.owners, experiment.description)
    }
  }
}

slide-45
SLIDE 45

Testing an Iguazu job

slide-46
SLIDE 46

The Hollywood Principle applies to distributed systems.

CC-by-2.0 https://www.flickr.com/photos/raindog808/354080327

slide-47
SLIDE 47

Deploying a new Iguazu Job

  • Developer
      • Merge into master… done
  • Jenkins Build Steps
      • Compile & package job JAR
      • Prepare Docker image
      • Push image into registry
      • Register updated job with Amazon ECS API

slide-48
SLIDE 48

Invoking an Iguazú Job

// Invoking a job with one function call
// from another service via REST framework RPC
val invocationId = iguazuJobInvocationClient
  .create(IguazuJobInvocationRequest(
    jobName = "exportQuizGrades",
    parameters = quizParams))

slide-49
SLIDE 49

A clean environment increases reliability.

CC-by-2.0 https://www.flickr.com/photos/raindog808/354080327

slide-50
SLIDE 50

Evaluating Programming Assignments

An application of Iguazú

slide-51
SLIDE 51
slide-52
SLIDE 52
slide-53
SLIDE 53

Design Goals

  • Elastic Infrastructure
  • No Maintenance
  • Near Real-time
  • Secure Infrastructure

slide-56
SLIDE 56

Solution: GrID

Patrick Hoesly (https://www.flickr.com/photos/zooboing/5665221326/) CC-BY-2.0

  • Service + framework for grading programming assignments

  • Builds on Iguazú
  • Named for Tron’s “digital frontier”
  • Backronym: Grading Inside Docker
slide-57
SLIDE 57

High-level GrID Architecture

[Diagram components: Learners, GrID, Iguazú, S3 Bucket, ECS APIs, Grading Machines, VPC Firewalls. Two accounts are shown: the Coursera Production Account and the Coursera GrID Grading Account]

slide-61
SLIDE 61

Design Goals

  • Elastic Infrastructure
  • No Maintenance
  • Near Real-time
  • Secure Infrastructure

slide-62
SLIDE 62

Programming Assignments

slide-63
SLIDE 63

The Security Challenge

Compiling and running untrusted, arbitrary code on our cluster in near real time.

Would you like to compile and run C code from random people on the Internet on your servers?

slide-64
SLIDE 64

FROM redis
FROM ubuntu:latest
FROM jane’s-image

slide-65
SLIDE 65

Security Assumptions

  • Run arbitrary binaries
  • Instructor grading scripts may have vulnerabilities
  • ∴ Grading code is untrusted
  • Unknown vulnerabilities in Docker and Linux name-spacing and/or container implementation

slide-66
SLIDE 66

Security Goals

Prevent submitted code from:

  • impacting the evaluation of other submissions.
  • disrupting the grading environment (e.g., DoS)
  • affecting the rest of the Coursera learning platform
slide-67
SLIDE 67

Grading assignment submissions

CC-by-2.0 https://www.flickr.com/photos/dherholz/4367511580/

slide-68
SLIDE 68
slide-69
SLIDE 69

[Diagram: Alice’s, Bob’s, and Mallory’s containers, each holding that learner’s submission and a grader, on one host with four CPUs, RAM, Kernel, and Disk]

slide-71
SLIDE 71

[Diagram: Alice’s, Bob’s, and Mallory’s containers, each with a submission and a grader. Host resources and their annotations: CPU (cgroups), RAM (cgroups), Kernel, Disk]

slide-73
SLIDE 73

[Diagram: Alice’s, Bob’s, and Mallory’s containers, each with a submission and a grader. Host resources and their annotations: CPU (cgroups), RAM (cgroups), Kernel, Disk (blkio limits & btrfs quotas)]

slide-75
SLIDE 75

Attacks: Kernel Resource Exhaustion

  • Open file limits per container (nofile)
  • Process limits (nproc)
  • Limit kernel memory per cgroup
  • Limit execution time
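
Several of these limits correspond to standard `docker run` flags (`--cpu-shares`, `--memory`, `--ulimit`). As a sketch of how a grading supervisor might assemble them, the helper below builds the argument list for one container. GraderLimits, Limits, and dockerRunCommand are hypothetical names, and every numeric default is an illustrative assumption, not a Coursera production setting. Kernel-memory and execution-time limits are left out of this sketch.

```scala
object GraderLimits {
  // All defaults are illustrative assumptions.
  final case class Limits(
      cpuShares: Int = 1024, // relative CPU weight (cgroups)
      memoryMb: Int = 1024,  // hard RAM cap (cgroups)
      openFiles: Int = 256,  // nofile ulimit
      processes: Int = 64)   // nproc ulimit

  /** Builds the `docker run` argument list for one grading container. */
  def dockerRunCommand(image: String, limits: Limits): Seq[String] = Seq(
    "docker", "run", "--rm",
    "--cpu-shares", limits.cpuShares.toString,
    "--memory", s"${limits.memoryMb}m",
    "--ulimit", s"nofile=${limits.openFiles}:${limits.openFiles}",
    "--ulimit", s"nproc=${limits.processes}:${limits.processes}",
    image)
}
```

For example, `GraderLimits.dockerRunCommand("grader:latest", GraderLimits.Limits())` yields a command that can be handed to a process launcher. Note that Linux counts nproc per user, not per container, so this limit is most useful when each container runs under its own UID. An execution-time limit would need a separate supervisor step, such as stopping the container once a deadline passes.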
slide-76
SLIDE 76
slide-77
SLIDE 77

[Diagram: Alice’s, Bob’s, and Mallory’s containers, each with a submission and a grader. Host resources and their annotations: CPU (cgroups), RAM (cgroups), Kernel (cgroups, ulimits), Disk (blkio limits & btrfs quotas), Network]

slide-78
SLIDE 78

Attacks: Network attacks

Attacks:

  • Bitcoin mining
  • DoS attacks on other systems
  • Access Amazon S3 and other AWS APIs

Defense:

  • Deny network access
slide-79
SLIDE 79

Docker Network Modes

NetworkDisabled too restrictive

  • Some graders require local loopback
  • Feature also deprecated
  • --net=none + deny net_admin + audit network
  • Isolation via Docker creating an independent network stack for each container

github.com/coursera/amazon-ecs-agent

slide-80
SLIDE 80

CC-by-2.0 https://www.flickr.com/photos/valentinap/253659858

slide-81
SLIDE 81

CC-by-2.0 https://www.flickr.com/photos/jessicafm/2834658255/

slide-82
SLIDE 82

CC-by-2.0 https://www.flickr.com/photos/donnieray/11501178306/in/photostream/

slide-83
SLIDE 83

Defense in Depth

  • Mandatory Access Control (AppArmor)
      • Allows auditing or denying access to a variety of subsystems
  • Drop capabilities from bounding set
      • No need for NET_BIND_SERVICE, CAP_FOWNER, MKNOD
  • Deny root within container
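
These layers also have direct `docker run` counterparts: `--security-opt` for an AppArmor profile, `--cap-drop` for capabilities, and `--user` for a non-root UID, alongside the `--net=none` mode from the Docker Network Modes slide. The sketch below only assembles those flags. GraderHardening and hardeningFlags are hypothetical names, the profile name and UID/GID are placeholders supplied by the caller, and the `apparmor=` form of `--security-opt` is the newer syntax (older Docker releases used a colon).

```scala
object GraderHardening {
  // Capabilities the slide lists as unnecessary for graders.
  val droppedCapabilities: Seq[String] = Seq("NET_BIND_SERVICE", "FOWNER", "MKNOD")

  /** Hardening flags to splice into a `docker run` command line. */
  def hardeningFlags(apparmorProfile: String, uid: Int, gid: Int): Seq[String] =
    Seq("--net=none") ++
      droppedCapabilities.flatMap(cap => Seq("--cap-drop", cap)) ++
      Seq(
        "--security-opt", s"apparmor=$apparmorProfile", // mandatory access control
        "--user", s"$uid:$gid")                         // run as a non-root user
}
```

Keeping each layer as a separate flag reflects the defense-in-depth idea: if one control is misconfigured or bypassed, the others still apply.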
slide-84
SLIDE 84

Deny Root Escalations

  • We modify instructor grader images before allowing them to be run
      • Clears setuid
      • Inserts C wrapper to drop privileges from root and redirect stdin/stdout/stderr
  • Run cleaning job on another Iguazú cluster
      • Run Docker in Docker!
  • Docker 1.10 adds User Namespaces
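
The "clears setuid" step comes down to masking permission bits on every file in the image. The sketch below shows only that bit arithmetic on numeric Unix modes; it does not walk a filesystem or touch a Docker image. SetuidCleaner and its methods are hypothetical names, and clearing the setgid bit as well is an added assumption, since the slide mentions only setuid.

```scala
object SetuidCleaner {
  private val SetUid = Integer.parseInt("4000", 8) // octal 04000
  private val SetGid = Integer.parseInt("2000", 8) // octal 02000 (assumption: cleared too)

  /** Returns `mode` with the setuid and setgid bits cleared. */
  def clearSetId(mode: Int): Int = mode & ~(SetUid | SetGid)

  /** Paths whose mode would change, i.e. files that currently carry setuid or setgid. */
  def needsCleaning(modes: Map[String, Int]): Set[String] =
    modes.collect { case (path, mode) if clearSetId(mode) != mode => path }.toSet
}
```

With a setuid binary at mode 4755, `clearSetId` returns 755, so the file stays executable but can no longer run with its owner's privileges.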
slide-85
SLIDE 85

If all else fails…

  • Utilizes VPC security measures to further restrict network access
      • No public internet access
      • Security group to restrict inbound/outbound access
      • Network flow logs for auditing
  • Separate AWS account
  • Run in an Auto Scaling group
      • Regularly terminate all grading EC2 instances

slide-86
SLIDE 86

Other Security Measures

  • Utilize AWS CloudTrail for audit logs
  • Third-party security monitoring (Threat Stack)
      • No one should log in, so any TTY is an alert
  • Penetration testing by third-party red team (Synack)

slide-87
SLIDE 87

Lessons Learned - GrID

  • Building a platform for code execution is hard!
  • Carefully monitor disk usage
  • Run the latest kernels
      • Latest security patches
      • btrfs wedging on older kernels
      • Default Ubuntu 14.04 kernel not new enough!

slide-88
SLIDE 88

Reliable deploy tooling pays for itself.

slide-89
SLIDE 89

Thank you!

Brennan Saeta

github/saeta @bsaeta saeta@coursera.org

Frank Chen

github/frankchn @frankchn frankchn@coursera.org

GrID lead Iguazú Lead

slide-90
SLIDE 90

Questions?
