Laura Frank Engineer, Codeship Agenda 1. Parallel Testing Goals - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Efficient Parallel Testing with Docker

Laura Frank

Engineer, Codeship

slide-2
SLIDE 2
slide-3
SLIDE 3
slide-4
SLIDE 4
  • 1. Parallel Testing Goals
  • 2. DIY with LXC
  • 3. Using Docker and the Docker Ecosystem

Agenda

slide-5
SLIDE 5

Parallel Testing

slide-6
SLIDE 6

Create a customizable, flexible test environment that enables us to run tests in parallel

GOAL

slide-7
SLIDE 7
  • Deploy new code faster
  • Find out quickly when automated steps fail

If you’re still not sure why testing is important, please talk to me in the Codeship booth.

Why?

slide-8
SLIDE 8
  • For local testing, e.g. unit and integration tests run by a development team

  • On internal CI/CD systems
  • As part of a hosted CI/CD solution

Where?

slide-9
SLIDE 9
  • Performance optimization for serial testing tasks is limited

  • Split up testing tasks
  • Run tasks in parallel

How?

Serial optimizations only go so far: optimize the services themselves, use smart finders in integration testing (waiting vs. non-waiting finders), or use a bigger build machine.
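One way to "split up testing tasks" when you know roughly how long each task takes is a greedy longest-processing-time (LPT) split. The sketch below is a hypothetical illustration, not Codeship's scheduler: `split_tasks` and the timing numbers are made up, and it assumes per-task durations from a previous serial run are available.

```python
from __future__ import annotations

import heapq


def split_tasks(durations: dict[str, float], workers: int) -> list[list[str]]:
    """Split tasks into `workers` groups with roughly equal total runtime.

    Greedy LPT heuristic: hand each task, longest first, to whichever
    group currently has the smallest total runtime.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Min-heap of (total seconds assigned so far, group index).
    heap = [(0.0, i) for i in range(workers)]
    groups: list[list[str]] = [[] for _ in range(workers)]
    # Longest first; ties broken by name so the split is deterministic.
    for name, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)
        groups[i].append(name)
        heapq.heappush(heap, (load + seconds, i))
    return groups


# Hypothetical per-task timings (seconds) from a previous serial run.
timings = {
    "spec/features": 420,
    "spec/models": 300,
    "spec/requests": 240,
    "rubocop": 60,
    "haml-lint": 30,
}
groups = split_tasks(timings, workers=2)
print(groups)  # [['spec/features', 'rubocop', 'haml-lint'], ['spec/models', 'spec/requests']]
print([sum(timings[t] for t in g) for g in groups])  # [510, 540]
```

With two workers, 1050 seconds of serial work becomes two groups of 510 and 540 seconds, so the build finishes when the slower group does. LPT is a heuristic and does not guarantee the best possible split, but it is simple and usually close.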

slide-10
SLIDE 10

Run tasks across multiple processors in parallel computing environments

TASK PARALLELISM

Parallel task assignment in heterogeneous distributed computing systems is totally a thing, I promise. Heterogeneity is enabled by containerization; hardware specs are all but irrelevant here.

slide-11
SLIDE 11

Distributed Task Parallelism

A distributed system of containerized computing environments takes the place of a single multiprocessor machine. A container is a process, not a small VM.

The introductory way of thinking, that a container is just a lightweight VM, needs to be replaced with a more accurate depiction of a container as a process. Distributed computing with containers can't evolve as long as we think in terms of a container:VM mapping.

slide-12
SLIDE 12

Serial test execution

slide-13
SLIDE 13

parallel execution
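The difference between serial and parallel execution comes down to simple arithmetic: a serial run costs the sum of the task durations, while a parallel group costs only its slowest member. A minimal sketch, assuming every task in a parallel group gets its own machine; `Task`, `Group`, and the durations are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    seconds: float


@dataclass
class Group:
    kind: str  # "serial" or "parallel"
    children: list[Task | Group] = field(default_factory=list)


def wall_time(node: Task | Group) -> float:
    """Elapsed time for a task tree, assuming each parallel child has its own machine."""
    if isinstance(node, Task):
        return node.seconds
    times = [wall_time(child) for child in node.children]
    if node.kind == "serial":
        return sum(times)  # each child waits for the previous one
    if node.kind == "parallel":
        return max(times, default=0.0)  # the slowest child decides
    raise ValueError(f"unknown group kind: {node.kind!r}")


tests = [Task("unit", 300), Task("integration", 420), Task("lint", 60)]
deploy = Task("deploy", 45)

serial = Group("serial", [*tests, deploy])
parallel = Group("serial", [Group("parallel", tests), deploy])
print(wall_time(serial), wall_time(parallel))  # 825 465
```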

slide-14
SLIDE 14

Spend less time waiting around for your builds to finish

  • Ship newest code to production faster
  • Be alerted sooner when tests fail
  • Allow multiple developers to run builds simultaneously

Goal: Shorter Feedback Cycles

slide-15
SLIDE 15

Developers should have full autonomy over testing environments, and the way tests are executed.

  • Move testing commands to separate pipelines
  • Designate commands to be run serially or in parallel
  • Declare specific dependencies for each service

Goal: More User Control

slide-16
SLIDE 16

Why not VMs?

  • Isolation of running builds and Codeship infrastructure
  • Challenges with dependency management
  • Not straightforward to impose resource limits
  • Infrastructure is underutilized, which makes it expensive

Distributed systems need not include containers

slide-17
SLIDE 17

Containers, duh!

✓ Impose resource limits and utilize infrastructure at higher capacity
✓ Isolation and security of customer code
✓ Consistent build environment across many build runs
✓ Enable simultaneous testing jobs

We want a separation between Codeship infrastructure and customer builds. Containers also make it easier and more cost-effective to introduce multi-processor build configurations that enable parallelism.
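As an illustration of imposing resource limits while running simultaneous jobs, here is a sketch that builds a `docker run` command line for one capped, throwaway job. `--rm`, `--memory`, `--cpuset-cpus`, and `--name` are real `docker run` flags; the helper name, the image, and the limit values are hypothetical, and this is not how Codeship allocates resources.

```python
from __future__ import annotations


def docker_run_argv(image: str, command: list[str], *, memory: str = "2g",
                    cpuset: str = "0,1", name: str | None = None) -> list[str]:
    """Build a `docker run` command line for one isolated, resource-capped job."""
    argv = [
        "docker", "run",
        "--rm",                   # remove the container when the job exits
        "--memory", memory,       # hard memory limit for the job
        "--cpuset-cpus", cpuset,  # pin the job to specific host CPUs
    ]
    if name is not None:
        argv += ["--name", name]
    return argv + [image, *command]


# Two hypothetical jobs pinned to different cores on the same host.
rspec = docker_run_argv("ruby:2.3", ["bundle", "exec", "rspec"],
                        cpuset="0,1", name="build-42-rspec")
lint = docker_run_argv("ruby:2.3", ["rubocop"], cpuset="2,3")
```

Pinning the two jobs to `"0,1"` and `"2,3"` lets them share one host without competing for the same cores. Passing either list to `subprocess.run(argv, check=True)` would start the job on a machine with Docker installed.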

slide-18
SLIDE 18

DIY with LXC

slide-19
SLIDE 19

Codeship has been powered by containers since the very beginning.

slide-20
SLIDE 20

  • Flowing salty water on Mars
  • International Year of Forests
  • Preparations for Ubuntu 12.04 (Precise) with LXC improvements
  • Codeship was founded
  • Green Bay Packers won Super Bowl XLV

2011: A Brief History Lesson

MRO (Mars Reconnaissance Orbiter): first photographic evidence of salty water (confirmed later in the year). Precise shipped in early 2012.

slide-21
SLIDE 21

I should use LXC…

slide-22
SLIDE 22

Why LXC?

  • Impose resource limits
  • Isolation and security of customer code
  • Consistent build environment and experience across multiple build runs
  • Can programmatically automate creation and deletion

We want to guarantee that

  • Customer code is isolated
  • All build runs are identical
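A sketch of that guarantee with the LXC userspace tools: every build gets a freshly created container that is destroyed afterwards, even when the build fails. It assumes LXC 1.x commands (`lxc-create`, `lxc-start -d`, `lxc-wait`, `lxc-attach`, `lxc-stop`, `lxc-destroy`), root privileges, and the `ubuntu` template; `run_isolated_build` and its `runner` parameter are hypothetical and not Checkbot's code. The injectable `runner` lets the command sequence be exercised without LXC installed.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence


def run_isolated_build(name: str, command: Sequence[str], template: str = "ubuntu",
                       runner: Callable[..., object] = subprocess.run) -> None:
    """Create a throwaway LXC container, run one build command in it, then destroy it."""
    # Every build starts from a freshly created container, so runs start identical.
    runner(["lxc-create", "-n", name, "-t", template], check=True)
    try:
        runner(["lxc-start", "-n", name, "-d"], check=True)
        # Block until the container reports RUNNING before attaching to it.
        runner(["lxc-wait", "-n", name, "-s", "RUNNING"], check=True)
        runner(["lxc-attach", "-n", name, "--", *command], check=True)
    finally:
        # Tear down even when the build fails so nothing leaks into the next run.
        runner(["lxc-stop", "-n", name], check=False)
        runner(["lxc-destroy", "-n", name], check=True)
```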
slide-23
SLIDE 23

Checkbot (Codeship Classic)

  • Still running in production as our classic infrastructure
  • Well-suited for users who want 1-click test environments without much customization
  • Compromise ease of use for flexibility
slide-24
SLIDE 24

Checkbot

39K builds per day
7.8M builds per year

Chart: Peak builds per day

slide-25
SLIDE 25

Architecture

  • Universal Container with provided dependencies
  • Run builds in isolation from one another
  • Implement parallel testing pattern using pipelines
  • Users can have N pipelines, also run in isolation during a build

M*N containers in this system (M containers per pipeline, N pipelines), but M is always 1

slide-26
SLIDE 26

Diagram: multiple pipelines, each running User Commands in a Universal Container, followed by a Deployment Provider (Heroku, Capistrano, AppEngine, Elastic Beanstalk, etc.)

slide-27
SLIDE 27

Limitations

  • Parity between dev and test
  • Can’t really debug locally
  • No usable interface between user and container
  • Have to compromise ease of use for flexibility
  • Resource consumption is too high

No easy workflow for running a customized service in a lightweight container

slide-28
SLIDE 28

While using straight-up LXC solved some of our technical problems, it didn't solve any of our workflow problems.

slide-29
SLIDE 29

We weren’t able to provide the best, most efficient product to our customers (or ourselves).
slide-30
SLIDE 30

Using Docker and the Docker Ecosystem

slide-31
SLIDE 31

Create a customizable, flexible test environment that enables us to run tests in parallel

GOAL

slide-32
SLIDE 32

Big Wins with Docker

Even before 1.0, Docker was a clear choice

  • Support and tooling
  • Standardization
  • Community of motivated developers
slide-33
SLIDE 33

Using Docker allowed us to build a much better testing platform than with LXC alone.

slide-34
SLIDE 34

Codeship Jet


slide-35
SLIDE 35

Codeship Jet

2.3K builds per day
~250K total builds

Chart: Peak builds per day

slide-36
SLIDE 36

A Docker-based Testing Platform

  • Development started in 2014
  • First beta in 2015
  • Official launch February 2016
slide-37
SLIDE 37

A Docker-based Testing Platform

Built with Docker in order to support Docker workflows

slide-38
SLIDE 38

Why Docker?

  • Docker Compose: service and step definition syntax
  • Docker Registry: storage for images; remote caching*
  • Docker for Mac and Windows: give users the ability to reproduce CI environments locally

slide-39
SLIDE 39
slide-40
SLIDE 40

Docker Compose

  • Provides simplicity and a straightforward interface
  • Developers can use existing docker-compose.yml files with Codeship
  • Use similar syntax for testing step definitions to get users up and running faster
  • Ensure parity in dev, test, and production

Reusing a docker-compose.yml file is possible but typically not optimal. Basic idea: reuse services across dev, test, and production. Reduce the barrier to entry.

slide-41
SLIDE 41

The workflow tools provided by Docker are indispensable.

slide-42
SLIDE 42

Parallel Testing with Docker

slide-43
SLIDE 43

Managing containers with Docker allowed us to improve our parallel testing workflow

slide-44
SLIDE 44

A New Parallel Workflow

  • Loosen coupling between steps and services: execute N steps against M services
  • Parallel and serial steps can be grouped and ordered in any way
  • Introducing services adds an additional layer of flexibility

N*M is still true, but we don't control for M anymore

slide-45
SLIDE 45

Services

  • Pull image from any registry or build from Dockerfile
  • Optimize service for testing tasks
  • Fully customizable by the user

Components of the app are expressed as services. Support for multiple services is the biggest change between Checkbot and Jet.

slide-46
SLIDE 46

Steps

  • Each step is executed in an independent environment
  • Can be nested in serial and parallel groups
  • Two functions:
    • Run: execute a command against a service
    • Push: push image to registry
  • Tag regex matching to run steps on certain branches

Tasks are expressed as steps
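A minimal sketch of branch matching, assuming a step's tag is a regular expression searched against the branch name and that an untagged step runs on every branch. The `^master$` pattern is the one used by the push step later in this deck; `step_runs_on` is a hypothetical helper, not Jet's implementation.

```python
from __future__ import annotations

import re


def step_runs_on(tag: str | None, branch: str) -> bool:
    """Decide whether a step applies to a build of `branch`.

    A step without a tag runs on every branch; otherwise the tag is treated
    as a regular expression that must match somewhere in the branch name.
    """
    if tag is None:
        return True
    return re.search(tag, branch) is not None


for branch in ["master", "master-hotfix", "feature/login"]:
    print(branch, step_runs_on("^master$", branch))
```

Anchoring the pattern with `^` and `$` is what keeps `master-hotfix` from triggering a push; an unanchored `master` would match it.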

slide-47
SLIDE 47

Diagram: three pipelines, each running User Commands in its own Universal Container (labeled T1, T1, T1)


slide-48
SLIDE 48

Diagram: four independent steps across tasks T1, T2, and T3. Each step runs a command against the web service; three of them also start postgres and redis containers, while one starts web alone.

Different from having one testing service with multiple pipelines. Enabled by the engine (Dockerfiles/links), Registry, and Compose.
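Each step needs its service plus everything that service links to, with dependencies started first. The sketch resolves that order with a depth-first topological sort over a `links` mapping shaped like the `links:` entries in codeship-services.yml. `startup_order` is a hypothetical helper, not Jet's code; it assumes links form a directed acyclic graph and reports a cycle otherwise.

```python
from __future__ import annotations


def startup_order(service: str, links: dict[str, list[str]]) -> list[str]:
    """Containers a step needs for `service`, dependencies first."""
    order: list[str] = []
    visiting: set[str] = set()  # on the current DFS path, used to detect cycles
    done: set[str] = set()      # already placed in `order`

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"link cycle involving {name!r}")
        visiting.add(name)
        for dep in links.get(name, []):
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)  # appended only after all of its links

    visit(service)
    return order


links = {"web": ["postgres", "redis"], "postgres": [], "redis": []}
print(startup_order("web", links))  # ['postgres', 'redis', 'web']
```

Because every step gets its own set of containers, two steps that both need `web` each start their own `postgres`, `redis`, and `web` rather than sharing them.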

slide-49
SLIDE 49

codeship-services.yml

db:
  image: postgres:9.5
app:
  encrypted_dockercfg_path: dockercfg.encrypted
  build:
    image: user/some-image
    dockerfile: Dockerfile.test
  cached: true
  links:
    - db
deploy:
  encrypted_dockercfg_path: dockercfg.encrypted
  build:
    dockerfile: Dockerfile.deploy

quick explanation of YAML for those who are not familiar

slide-50
SLIDE 50

codeship-steps.yml

- type: serial
  steps:
    - type: parallel
      steps:
        - name: rspec
          service: app
          command: bin/ci spec
        - name: rubocop
          service: app
          command: rubocop
        - name: haml-lint
          service: app
          command: haml-lint app/views
        - name: rails_best_practices
          service: app
          command: bin/railsbp
    - service: deploy
      type: push
      image_name: rheinwein/notes-app
      tag: ^master$
      registry: https://index.docker.io/v1/
      encrypted_dockercfg_path: dockercfg.encrypted

slide-51
SLIDE 51

Serial Steps

  • Maintain order within the CI process
  • A failing step will stop and fail the build, and prevent any other steps from executing
  • Only one serial step or step group can execute at a time

slide-52
SLIDE 52

Parallel Steps

  • Optimize groupings for speed
  • First-to-fail will stop and fail the build
  • It’s possible for other steps to be running while a failing step is also running
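The serial and parallel rules above can be sketched as a small recursive executor. This is a hypothetical model, not Jet's implementation: each step is a Python callable that returns True on success, and threads stand in for containers.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[], bool]  # True if the step's command succeeded


@dataclass
class Group:
    kind: str  # "serial" or "parallel"
    steps: list[Step | Group] = field(default_factory=list)


def execute(node: Step | Group) -> bool:
    """Run a step tree; return False as soon as the tree is known to have failed."""
    if isinstance(node, Step):
        return node.run()
    if node.kind == "serial":
        # One child at a time, in order; all() stops at the first failure,
        # so later serial steps never start.
        return all(execute(child) for child in node.steps)
    if node.kind == "parallel":
        with ThreadPoolExecutor(max_workers=max(len(node.steps), 1)) as pool:
            futures = [pool.submit(execute, child) for child in node.steps]
            for future in as_completed(futures):
                if not future.result():
                    # First to fail fails the group. Siblings that have not
                    # started are cancelled; ones already running are left to
                    # finish, because leaving the with-block waits for them.
                    for other in futures:
                        other.cancel()
                    return False
        return True
    raise ValueError(f"unknown group kind: {node.kind!r}")
```

Keeping the push as its own serial step after the parallel group means it only starts once every step in the group has passed, which matches the pro tip about push steps and parallel groups.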

slide-53
SLIDE 53

Pro Tip: Your push step should never be part of a parallel step group.

slide-54
SLIDE 54

Demo!

slide-55
SLIDE 55

Docker for Mac and Windows

  • All users can test locally
  • Jet CLI is available at https://codeship.com/documentation/docker/installation/
  • Don’t have a Docker for Mac/Windows invitation yet? Totally cool, Docker Toolbox also rocks
  • HUGE advantage over our previous LXC implementation

I’m demoing using Docker for Mac (I’ll have a docker machine backup, as well as a video of my demo). Users not running on Linux are able to locally test and debug in an environment identical to CI.

slide-56
SLIDE 56

Engineering Challenges

slide-57
SLIDE 57

Infrastructure

Build allocation

  • Customers can choose specs for their build machines
  • Machine provisioning used to be part of the build process

  • Now we pool build machines
  • Allocation time is ~1 second!
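Pooling can be sketched as keeping warm machines per spec, so allocation is usually a dequeue rather than a boot. `MachinePool`, the `provision` callback, and the spec name are hypothetical; the sketch ignores health checks, refilling the pool, and tenancy, and is not Codeship's allocator.

```python
from __future__ import annotations

from collections import defaultdict, deque
from typing import Callable


class MachinePool:
    """Pre-provisioned build machines, grouped by machine spec."""

    def __init__(self, provision: Callable[[str], str]) -> None:
        self._provision = provision  # slow path: boots a machine, returns its id
        self._idle: defaultdict[str, deque[str]] = defaultdict(deque)

    def warm(self, spec: str, count: int) -> None:
        """Boot `count` machines of `spec` ahead of time, before any build asks."""
        for _ in range(count):
            self._idle[spec].append(self._provision(spec))

    def acquire(self, spec: str) -> str:
        """Hand out a warm machine if one is idle, otherwise provision on demand."""
        if self._idle[spec]:
            return self._idle[spec].popleft()
        return self._provision(spec)

    def idle(self, spec: str) -> int:
        return len(self._idle[spec])
```

The build only pays the provisioning cost when the pool for its spec has run dry; keeping the pool topped up is what moves that cost out of the build.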
slide-58
SLIDE 58

Performance

Image Caching

  • Old way: rely on the registry for caching
  • A pull would give us access to each parent layer, and then a rebuild of the service image used the cache
  • Docker 1.10: the switch to content-addressable storage was a breaking change
slide-59
SLIDE 59

Performance

Image Caching

  • Great news: 1.11 restores the parent/child relationship when you save images via docker save

  • ETA: 1 month
  • Double-edged sword of relying on external tools

¯\_(ツ)_/¯
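Why parent links matter for caching can be shown with a toy model: a cached layer is reused only when both its parent and the instruction that created it match. This is a deliberate simplification, not Docker's actual cache lookup; `layer_id`, `build`, and the instructions are hypothetical. The empty cache in the last check stands in for images whose parent chain is not available locally.

```python
from __future__ import annotations

import hashlib


def layer_id(parent: str, instruction: str) -> str:
    """Toy content-derived ID: a hash of the parent ID plus the instruction."""
    return hashlib.sha256(f"{parent}\n{instruction}".encode()).hexdigest()[:12]


def build(base: str, instructions: list[str],
          cache: dict[tuple[str, str], str]) -> tuple[str, int]:
    """Replay instructions on top of `base`, reusing a cached layer whenever the
    (parent, instruction) pair has been seen before. Returns (final id, cache hits)."""
    current, hits = base, 0
    for instruction in instructions:
        key = (current, instruction)
        if key in cache:
            hits += 1
        else:
            cache[key] = layer_id(current, instruction)
        current = cache[key]
    return current, hits


dockerfile = ["COPY Gemfile /app/", "RUN bundle install", "COPY . /app"]
cache: dict[tuple[str, str], str] = {}
first, first_hits = build("ruby:2.3", dockerfile, cache)   # cold: nothing reused
again, again_hits = build("ruby:2.3", dockerfile, cache)   # warm: every layer reused
```

Once one link in the parent chain is unknown, every layer after it misses, which is why losing parent/child relationships turns a cached rebuild into a full one.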

slide-60
SLIDE 60

What’s Next?

slide-61
SLIDE 61

Docker Swarm

  • Jet was born pre-Swarm
  • We manage build machines on AWS via our own service
  • Previous concerns about security: single tenancy
  • Swarm (and services like Carina) are promising for the future

slide-62
SLIDE 62

libcompose

  • Currently use APIs directly for container-level operations (Jet was also born before Fig was popular)
  • Minimal change for users and builds, but much easier for us

  • Have already completed preliminary work
slide-63
SLIDE 63

Task parallelism is cool and you can implement it with LXC alone, but using Docker tools makes it better.

TL;DR

slide-64
SLIDE 64

Thank you!