ECS & Docker: Secure Async Execution @
Brennan Saeta
The Beginnings (2012)
10 courses
1 million learners worldwide
4 partners

Education at Scale
1,800 courses
18 million learners worldwide
140 partners
GrID — evaluating programming assignments
in the cloud
container infrastructure at scale
[Diagram: PHP Monolith, Cascade, Queue]
Re-architecting delayed execution for our 2nd generation learning platform.
sharing
[Diagram: Online Serving, Scala/micro-service architecture: Service A, Service B, Service C, each backed by Cassandra (C*); Saturn with its own C*]
[Diagram: Saturn Leader, ZK Ensemble, Service A, Service B, Service C, C*]
jobs in same JVM
Containers allow for resource isolation!
CC-by-2.0 https://www.flickr.com/photos/photohome_uk/1494590209
Platform: Saturn | Docker | Amazon ECS | Iguazú
Run code
Resource Isolation
Clusters / HA
Great developer workflow
Scheduled Jobs
Marissa Strniste (https://www.flickr.com/photos/mstrniste/5999464924) CC-BY-2.0
asynchronous execution
experience for Coursera
(cron-like)
time X)
[Architecture diagram: Devs, Users, Services; Iguazú Frontend, Iguazú Scheduler, Iguazú Backend, Iguazú Admin, Iguazú Workers; Cassandra; SQS Queue; ECS API; ZK Ensemble]
[Sequence diagram: Iguazú, ECS API, Autoscaling, EC2 Worker]
Autoscaling: Shutdown Lifecycle Notification
Iguazú: Poll Worker Job Status (ECS API)
ECS API: All finished
Iguazú: Proceed
Autoscaling: Terminate EC2 Worker
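The shutdown sequence above can be sketched as a small polling loop. This is a minimal sketch, not Coursera's implementation: `WorkerStatus`, `decide`, and `drain` are hypothetical names, `poll` stands in for a call to the ECS API, and `pause` stands in for a sleep between polls.

```scala
object DrainSketch {
  // Hypothetical model of what the ECS API reports for one worker instance.
  final case class WorkerStatus(instanceId: String, runningJobs: Int)

  sealed trait Decision
  case object KeepWaiting extends Decision
  case object ProceedWithTermination extends Decision

  // Pure decision step: termination may proceed only once no jobs remain.
  def decide(status: WorkerStatus): Decision =
    if (status.runningJobs == 0) ProceedWithTermination else KeepWaiting

  // Poll until the worker is idle or the attempt budget runs out.
  // `poll` is an assumed stand-in for the ECS API call; `pause` for a sleep.
  @annotation.tailrec
  def drain(poll: () => WorkerStatus, maxAttempts: Int, pause: () => Unit = () => ()): Decision =
    if (maxAttempts <= 0) KeepWaiting
    else decide(poll()) match {
      case ProceedWithTermination => ProceedWithTermination
      case KeepWaiting =>
        pause()
        drain(poll, maxAttempts - 1, pause)
    }
}
```

Keeping the decision step separate from the polling makes the "all finished, proceed" rule easy to test without any AWS calls.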
~100 jobs in production
>1000 runs per day
>100 different job schedules
Nearline Jobs
Data Exports
Scheduled Recurring Jobs
CC-by-2.0 https://www.flickr.com/photos/usoceangov/5369581593
class AbReminderJob @Inject() (abClient: AbClient, email: EmailAPI) extends AbstractJob {
  def run(parameters: JsValue) = {
    val experiments = abClient.findForgotten()
    logger.info(s"Found ${experiments.size} forgotten experiments.")
    experiments.foreach { experiment =>
      sendReminder(experiment.owners, experiment.description)
    }
  }
}
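To try the shape of AbReminderJob outside the platform, the following self-contained sketch replaces the injected dependencies with in-memory stand-ins. Everything here is an assumption made for illustration: the `Experiment` fields, the `AbClient` and `EmailAPI` method signatures, the `Map`-based parameters (the slide uses `JsValue`), and the `RecordingEmail` fake are not taken from Coursera's code.

```scala
object JobSketch {
  // Hypothetical stand-ins for the types the slide's job depends on.
  final case class Experiment(owners: Seq[String], description: String)

  trait AbClient { def findForgotten(): Seq[Experiment] }
  trait EmailAPI { def send(to: String, body: String): Unit }

  // Assumed shape of the base class: a job is one `run` over its parameters.
  abstract class AbstractJob {
    def run(parameters: Map[String, String]): Unit
  }

  // Same structure as AbReminderJob, with dependencies passed by constructor.
  class ReminderJob(abClient: AbClient, email: EmailAPI) extends AbstractJob {
    def run(parameters: Map[String, String]): Unit = {
      val experiments = abClient.findForgotten()
      println(s"Found ${experiments.size} forgotten experiments.")
      experiments.foreach { experiment =>
        experiment.owners.foreach(owner => email.send(owner, experiment.description))
      }
    }
  }

  // In-memory email fake that records what would have been sent.
  class RecordingEmail extends EmailAPI {
    val sent = scala.collection.mutable.ListBuffer.empty[(String, String)]
    def send(to: String, body: String): Unit = sent += ((to, body))
  }
}
```

Because the job only talks to its constructor arguments, it can be exercised locally with fakes before it is ever scheduled.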
CC-by-2.0 https://www.flickr.com/photos/raindog808/354080327
Amazon ECS API
// Invoking a job with one function call
// from another service via REST framework RPC
val invocationId = iguazuJobInvocationClient
  .create(IguazuJobInvocationRequest(
    jobName = "exportQuizGrades",
    parameters = quizParams))
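The call above returns an invocation ID rather than a result, which fits asynchronous execution through a queue. The sketch below illustrates that contract with an in-memory queue. It is an assumption-laden illustration, not the Iguazú client: `JobInvocationRequest`, `InMemoryInvocationClient`, and `nextPending` are hypothetical, and a `mutable.Queue` stands in for SQS.

```scala
import java.util.UUID
import scala.collection.mutable

object InvocationSketch {
  // Hypothetical request shape, modeled on the call shown above.
  final case class JobInvocationRequest(jobName: String, parameters: Map[String, String])

  // In-memory stand-in for a job queue plus invocation bookkeeping.
  class InMemoryInvocationClient {
    private val queue = mutable.Queue.empty[(String, JobInvocationRequest)]

    // Enqueue the request and return an ID the caller can use to track it.
    def create(request: JobInvocationRequest): String = {
      val invocationId = UUID.randomUUID().toString
      queue.enqueue((invocationId, request))
      invocationId
    }

    // What a worker would do: take the oldest pending invocation, if any.
    def nextPending(): Option[(String, JobInvocationRequest)] =
      if (queue.isEmpty) None else Some(queue.dequeue())
  }
}
```

The caller gets its ID back immediately; the job itself runs later, whenever a worker picks the request up.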
An application of Iguazú
Elastic Infrastructure, No Maintenance, Near Real-time, Secure Infrastructure
Patrick Hoesly (https://www.flickr.com/photos/zooboing/5665221326/) CC-BY-2.0
programming assignments
[Architecture diagram: Learners, GrID, Iguazú, S3 Bucket, ECS APIs, Grading Machines, VPC Firewalls; two AWS accounts: Coursera Production Account and Coursera GrID Grading Account]
Elastic Infrastructure, No Maintenance, Near Real-time, Secure Infrastructure
Compiling and running untrusted, arbitrary code on
Would you like to compile and run C code from random people on the Internet on your servers?
name-spacing and/or container implementation
Prevent submitted code from:
CC-by-2.0 https://www.flickr.com/photos/dherholz/4367511580/
[Diagram: Alice's, Bob's, and Mallory's containers, each holding a submission and a grader, share one host]
Shared resources and how each is limited:
CPU: cgroups
RAM: cgroups
Disk: blkio limits & btrfs quotas
Kernel: cgroups, ulimits (nofile)
Network
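As a rough illustration of how such limits are applied per container, the sketch below builds a `docker run` argument list using standard Docker CLI flags (`--cpu-shares`, `--memory`, `--ulimit`, `--blkio-weight`). The `Limits` type, the function name, and any values passed in are hypothetical; the talk does not state which flags or limits were actually used, and btrfs quotas are not covered here.

```scala
object LimitsSketch {
  // Illustrative container limits; field names are assumptions for this sketch.
  final case class Limits(
    cpuShares: Int,
    memory: String,
    maxOpenFiles: Int,
    blkioWeight: Int
  )

  // Build `docker run` arguments that apply the limits to one container.
  def dockerRunArgs(image: String, limits: Limits): Seq[String] = Seq(
    "docker", "run", "--rm",
    "--cpu-shares", limits.cpuShares.toString,     // CPU cgroup weight
    "--memory", limits.memory,                     // memory cgroup cap
    "--ulimit", s"nofile=${limits.maxOpenFiles}:${limits.maxOpenFiles}", // soft:hard
    "--blkio-weight", limits.blkioWeight.toString, // block IO weight (10 to 1000)
    image
  )
}
```

Building the arguments as data keeps the limits in one reviewable place and lets them be checked without starting a container.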
Attacks:
Defense:
NetworkDisabled too restrictive
network
independent network stack for each container github.com/coursera/amazon-ecs-agent
CC-by-2.0 https://www.flickr.com/photos/valentinap/253659858
CC-by-2.0 https://www.flickr.com/photos/jessicafm/2834658255/
CC-by-2.0 https://www.flickr.com/photos/donnieray/11501178306/in/photostream/
variety of subsystems
CAP_FOWNER, CAP_MKNOD
before allowing them to be run
root and redirect stdin/stdout/stderr
cluster
further restrict network access
inbound/outbound access
instances
(Threat Stack)
team (Synack)
execution is hard!
enough!
Brennan Saeta
github/saeta @bsaeta saeta@coursera.org
Frank Chen
github/frankchn @frankchn frankchn@coursera.org
GrID lead Iguazú Lead