Migrating HealthCare.gov to Terraform: Lessons Learned



slide-1
SLIDE 1

Christian Monaghan

@monaghan_a_gram Cofounder, Nava PBC

Migrating HealthCare.gov to Terraform: Lessons Learned

slide-2
SLIDE 2

What is Terraform?

slide-3
SLIDE 3

A tool for building, changing, and versioning infrastructure

slide-4
SLIDE 4

Manage cloud providers

slide-5
SLIDE 5

Infrastructure as Code

  • Declarative syntax
  • Source control
  • Variable support
slide-6
SLIDE 6

Execution plans

  • Developer reviews plan before proceeding

slide-7
SLIDE 7

Resource graph

  • Resources created in dependency order
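
A minimal sketch of how such a dependency arises, assuming the AWS provider and Terraform 0.12+ expression syntax. The resource names and CIDR ranges are placeholders, not taken from the HealthCare.gov configuration. Because the subnet refers to the VPC's id, Terraform adds an edge to the resource graph and creates the VPC first.

```hcl
# Hypothetical example: names and CIDR ranges are placeholders.
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "example" {
  # Referencing aws_vpc.example.id creates an implicit dependency,
  # so the VPC is created before the subnet.
  vpc_id     = aws_vpc.example.id
  cidr_block = "10.0.1.0/24"
}
```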

slide-8
SLIDE 8

Resource graph

  • Resources created in dependency order

slide-9
SLIDE 9

Our project history

slide-10
SLIDE 10

AWS CloudFormation

  • JSON interface
  • 3,000+ lines for 1 Virtual Private Cloud (VPC)
  • Managing dozens of VPCs

slide-11
SLIDE 11

Custom tooling to interact with CloudFormation

YAML config → Custom script → AWS CloudFormation

slide-12
SLIDE 12

Challenges we faced with our existing tooling

slide-13
SLIDE 13

Maintaining custom code :(

  • Complex
  • Not unit tested
  • Limited documentation, quickly out of date
  • Increasing bloat
  • Hard to understand
  • Hard to debug

slide-14
SLIDE 14

Unable to incorporate manual changes

Past examples:

  • Horizontally scaling NATs (Network Address Translation)
  • Adding a temporary second Elastic Load Balancer
  • Scaling down from 3 availability zones to 1
  • Swapping in new Elastic IPs
slide-15
SLIDE 15

Uncertain client demands

  • Must build atop partially provisioned VPC infrastructure
  • Client frequently requesting custom architecture changes
  • Client might make manual changes that would be unrecoverable in CloudFormation

slide-16
SLIDE 16

Proliferating use cases

  • Load testing resources
  • Continuous Integration clusters
  • Custom monitoring
  • Graphite/Grafana
  • Nessus scanning clusters

slide-17
SLIDE 17

We were trying to shoehorn all these new use cases into our existing tooling

slide-18
SLIDE 18

Engineering goal

slide-19
SLIDE 19

Manage all infrastructure with a single tool that is flexible, extensible, fast, and well-supported

slide-20
SLIDE 20

Choosing the right tool

slide-21
SLIDE 21

Tools we considered

slide-22
SLIDE 22

Chef, Puppet, Ansible, SaltStack

  • These are configuration management tools
  • Install and manage software on existing machines
slide-23
SLIDE 23

Why we chose Terraform

  • Incorporate manual changes
  • Declarative syntax, easy to read, understand, extend
  • Supports multiple providers
  • Separates planning and execution
  • Well-supported, open-source
  • Modular

slide-24
SLIDE 24

Some Terraform basics

slide-25
SLIDE 25

How it knows what to provision

Desired state vs. Actual state → Changes required

slide-26
SLIDE 26

Desired state looks like this
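
The slide's image did not survive extraction. As a stand-in, here is a minimal sketch of what a desired-state definition can look like, assuming the AWS provider and Terraform 0.12+ syntax. The region, resource name, CIDR range, and tag are all hypothetical placeholders.

```hcl
# Hypothetical desired state for one VPC; all values are placeholders.
provider "aws" {
  region = "us-east-1"
}

resource "aws_vpc" "app_test" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true

  tags = {
    Name = "app-test"
  }
}
```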

slide-27
SLIDE 27

Actual state looks like this

slide-28
SLIDE 28

Prototyping

slide-29
SLIDE 29

Greenfield approach

Define → Diff → Apply → State updated

slide-30
SLIDE 30

Reverse engineering approach

Define · Diff · Apply · Import State

slide-31
SLIDE 31

Refactor to use variables

Hardcoded → Variables
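
A minimal sketch of this refactor, assuming Terraform 0.12+ syntax. The variable name `vpc_cidr` and its default are hypothetical; the hardcoded form is kept as a comment for comparison.

```hcl
# Hypothetical variable; the name and default value are placeholders.
variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

resource "aws_vpc" "main" {
  # Hardcoded form, before the refactor:
  #   cidr_block = "10.0.0.0/16"
  cidr_block = var.vpc_cidr
}
```

With the value lifted into a variable, the same configuration can be reused for another VPC by supplying a different `vpc_cidr`.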

slide-32
SLIDE 32

Testing

1. Successfully provision a new VPC
2. Application functional
   a. Passes health checks
   b. Passes smoke testing
3. Infrastructure security scan
   a. AWS Trusted Advisor

slide-33
SLIDE 33

End result

  • A configuration file (.tf) that represents one complete VPC configuration
  • A state file (.tfstate) that represents one existing VPC

slide-34
SLIDE 34

Design

slide-35
SLIDE 35

How can we design this for reuse?

AppA Test    AppA Staging    AppA Prod    AppA ...
AppB Test    AppB Staging    AppB Prod    AppB ...
...          ...             ...          ...

slide-36
SLIDE 36

Existing design

  • Variable inputs
  • Assemble building blocks
  • Building blocks

slide-37
SLIDE 37

Implementation

slide-38
SLIDE 38

Build new VPCs & cut over traffic

slide-39
SLIDE 39

Learnings

slide-40
SLIDE 40

Use shared modules sparingly

slide-41
SLIDE 41

Use shared modules sparingly

Sharing modules within applications worked well

slide-42
SLIDE 42

Use shared modules sparingly

Sharing modules across applications did not work well

slide-43
SLIDE 43

Use shared modules sparingly

Change the Elastic Load Balancer module

slide-44
SLIDE 44

Use shared modules sparingly

slide-45
SLIDE 45

Use shared modules sparingly

cots

slide-46
SLIDE 46

Migrating infrastructure in place

It's possible, but time-consuming

slide-47
SLIDE 47

Importing existing state

  • Native terraform import CLI utility
      ○ Only imports one resource at a time
      ○ Requires manually finding each resource ID relevant to a particular VPC
  • Third-party open source terraforming CLI
      ○ Imports all resources in a region
      ○ Cannot narrow scope to a specific VPC
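
The slide refers to the `terraform import` CLI command. As an aside, and as an assumption about tooling newer than this talk: Terraform 1.5 and later also accept an `import` block in configuration, which still maps one existing resource ID to one resource address. The sketch below uses a placeholder VPC ID and a hypothetical resource name.

```hcl
# Assumes Terraform 1.5+; the ID below is a placeholder, not a real VPC.
import {
  to = aws_vpc.existing
  id = "vpc-0123456789abcdef0"
}

# The resource block the imported VPC is attached to. Its arguments
# must be written to match the real VPC, or the plan will show a diff.
resource "aws_vpc" "existing" {
  cidr_block = "10.0.0.0/16"
}
```

Either way, each resource ID still has to be found by hand, which is the cost the slide describes.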

slide-48
SLIDE 48

Lock resources to a particular Terraform version
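
One way to enforce this in configuration is the `required_version` setting, which makes Terraform refuse to run under any other release. The sketch below is not necessarily how the team did it, and the version number is a hypothetical placeholder.

```hcl
terraform {
  # Hypothetical version; pin to whichever release the team has tested.
  required_version = "= 0.11.7"
}
```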

slide-49
SLIDE 49

Terraform needs to be managed in CI/CD

Otherwise:

  • Risk losing internet connection mid-apply
  • No record of who changed what, and when
  • Developers bump versions unintentionally
slide-50
SLIDE 50

Semantically version modules with git tags

Good Bad
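
The slide's Good and Bad examples were images that did not survive extraction. One plausible reading, sketched here as an assumption: pin each module source to a semantic version tag rather than a moving branch. The repository URL, module path, and tag are hypothetical, and the module's own input arguments are omitted.

```hcl
# Good: the module source is pinned to a semantic version git tag.
module "elb" {
  source = "git::https://example.com/org/terraform-modules.git//elb?ref=v1.2.0"
}

# Bad: the module source tracks a branch, so its contents can change
# between runs without any change to this file.
module "elb_unpinned" {
  source = "git::https://example.com/org/terraform-modules.git//elb?ref=master"
}
```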

slide-51
SLIDE 51

Terraform utilities

slide-52
SLIDE 52

terraforming

Export existing AWS resources to Terraform

slide-53
SLIDE 53

tfenv

Terraform version manager inspired by rbenv

slide-54
SLIDE 54

terraform fmt

Before After
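
The Before and After panels were images that did not survive extraction. As a small illustration of the kind of change `terraform fmt` makes (two-space indentation and aligned equals signs), here is a hypothetical resource written both ways:

```hcl
# Before: inconsistent indentation and spacing (hypothetical input).
resource "aws_vpc" "before" {
cidr_block="10.0.0.0/16"
    enable_dns_hostnames   =   true
}

# After: the same arguments in the canonical style that terraform fmt writes.
resource "aws_vpc" "after" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
}
```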

slide-55
SLIDE 55

terraform-docs

Generate docs from Terraform modules

slide-56
SLIDE 56

Thank you

@monaghan_a_gram