Terraform Earth: Secure Infrastructure for Developers (Chase Evans)

slide-1
SLIDE 1

Terraform Earth

Secure Infrastructure for Developers

Chase Evans

slide-2
SLIDE 2

Timeline

  • 1. Where we were before May
  • 2. Where we are today
  • 3. Where we are going
slide-3
SLIDE 3

Timeline

  • 1. Where we were before May
slide-4
SLIDE 4

GeoEngineer

  • Builds Terraform state files by fetching remote resources, think `$ terraform refresh`
  • Manual and distributed changes easily reconciled when AWS is the source of truth
  • Looks like HCL
  • github.com/coinbase/geoengineer

slide-5
SLIDE 5

Applying Resources

slide-6
SLIDE 6

Terraform Mars

slide-7
SLIDE 7

The Problem

slide-8
SLIDE 8

The Problem (Bottlenecking)

slide-9
SLIDE 9

The Problem (Bottlenecking)

slide-10
SLIDE 10

The Problem (Bottlenecking)

slide-11
SLIDE 11

The Problem (Business units)

slide-12
SLIDE 12

The Problem (Platform vs Operations)

slide-13
SLIDE 13

The Problem (Did you remember to pull?)

slide-14
SLIDE 14

The Problem (Credential proliferation)

slide-15
SLIDE 15

The Problem (VPC proliferation)

slide-16
SLIDE 16

Timeline

  • 1. Where we were before May
  • 2. Where we are today
slide-17
SLIDE 17

Introducing Terraform Earth

slide-18
SLIDE 18

Heimdall

  • Records PR approvals with MFA
  • Provides a clean API
  • Not vulnerable to administrative GitHub tampering
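
As a rough illustration of what recording approvals against a specific commit makes possible, the sketch below checks quorum over a list of approval records. The `Approval` struct, its fields, the `quorum?` helper, and the rule that authors cannot approve their own change are all assumptions made for this example; the slides do not describe Heimdall's actual API or data model.

```ruby
# Hypothetical approval record; field names are assumptions for this sketch.
Approval = Struct.new(:sha, :approver, :mfa_verified)

# True when enough distinct, MFA-verified reviewers other than the author
# approved exactly this commit SHA.
def quorum?(approvals, sha:, author:, required: 1)
  approvers = approvals
              .select { |a| a.sha == sha && a.mfa_verified && a.approver != author }
              .map(&:approver)
              .uniq
  approvers.size >= required
end

approvals = [
  Approval.new('abc123', 'alice', true),
  Approval.new('abc123', 'bob', false),   # approved without MFA: ignored
  Approval.new('def456', 'carol', true)   # approval for a different commit
]

puts quorum?(approvals, sha: 'abc123', author: 'dave')               # true
puts quorum?(approvals, sha: 'abc123', author: 'dave', required: 2)  # false
```

Keying approvals by SHA rather than by branch or PR number means an approval cannot silently carry over to code that was pushed after the review.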

slide-19
SLIDE 19

Terraform Earth

slide-20
SLIDE 20

Single Production Deployment

  • One deployment makes updates easier
  • New VPCs work without deployment
slide-21
SLIDE 21

Flow Diagram

slide-22
SLIDE 22

Flow Diagram

slide-23
SLIDE 23

Why bother locking?

  • Concurrent changes are usually safe
  • Sometimes multiple PRs pile up and need to modify a resource in order
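
One simple way to get that ordering is to funnel every apply through a single FIFO worker. The sketch below is only an illustration of that idea, not the locking scheme used by Terraform Earth; the `ApplyQueue` class and the placeholder SHAs are hypothetical.

```ruby
# Hypothetical sketch: serialize applies so changes land in arrival order.
class ApplyQueue
  attr_reader :applied

  def initialize
    @jobs = Queue.new               # thread-safe FIFO from Ruby's core library
    @applied = []
    @worker = Thread.new { run }    # a single worker means one apply at a time
  end

  # Queue an apply for a commit; the block stands in for the real apply step.
  def enqueue(sha, &apply)
    @jobs << [sha, apply]
  end

  # Signal the worker to stop and wait for the queued applies to finish.
  def drain
    @jobs << :stop
    @worker.join
  end

  private

  def run
    while (job = @jobs.pop) != :stop
      sha, apply = job
      apply.call
      @applied << sha
    end
  end
end

queue = ApplyQueue.new
queue.enqueue('sha-a') { sleep(0.05) }   # a slow apply still finishes first
queue.enqueue('sha-b') {}
queue.enqueue('sha-c') {}
queue.drain
puts queue.applied.inspect               # ["sha-a", "sha-b", "sha-c"]
```

A single global queue trades throughput for simplicity; since concurrent changes are usually safe, a finer-grained lock per resource or per project would also be a reasonable design.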

slide-24
SLIDE 24

Flow Diagram

slide-25
SLIDE 25

Why SHAs and not ‘master’?

  • Master is just a label and moves frequently
  • Code has quorum, not labels
  • Something could be merged to the repo between quorum check and clone
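
The race in the last bullet can be shown with a small in-memory model. Everything here is a hypothetical stand-in: `FakeRemote`, `checkout_for_apply`, `UnapprovedCommitError`, and the placeholder SHAs are invented for the example and do not come from the talk.

```ruby
# Hypothetical in-memory stand-in for a Git remote whose branch label can move.
class FakeRemote
  def initialize(head_sha)
    @branches = { 'master' => head_sha }
  end

  # A branch name resolves to its current tip; anything else is treated as a SHA.
  def resolve(ref)
    @branches.fetch(ref, ref)
  end

  # Simulate another PR being merged, which moves the branch label.
  def merge(new_sha)
    @branches['master'] = new_sha
  end
end

class UnapprovedCommitError < StandardError; end

# Resolve the ref we are about to clone and refuse anything but the approved SHA.
def checkout_for_apply(remote, ref, approved_sha)
  resolved = remote.resolve(ref)
  unless resolved == approved_sha
    raise UnapprovedCommitError, "#{ref} is at #{resolved}, but quorum was for #{approved_sha}"
  end
  resolved
end

remote = FakeRemote.new('abc123')
approved = 'abc123'        # the quorum check passed for this commit
remote.merge('def456')     # something else merges before the clone

puts checkout_for_apply(remote, approved, approved)   # abc123
begin
  checkout_for_apply(remote, 'master', approved)
rescue UnapprovedCommitError => e
  puts "refused: #{e.message}"
end
```

Passing the approved SHA through every step means the code that is applied is exactly the code that was reviewed, regardless of where the branch label points by then.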

slide-26
SLIDE 26

Flow Diagram

slide-27
SLIDE 27

Handling Failure

  • Retry the GeoEngineer apply with backoff
      • AWS rate limits heavily
      • AWS has failures

  • Queue and retry
  • Replay the webhook using GitHub administration
  • Add an endpoint to manually intervene
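
A minimal sketch of the first option, retrying with exponential backoff. The `with_backoff` helper, the `TransientError` class, and the default attempt count and delay are assumptions for illustration; the block passed in stands for whatever actually runs the GeoEngineer apply.

```ruby
# Stand-in for a throttling or transient AWS error; real error classes differ.
class TransientError < StandardError; end

# Run the block, retrying on TransientError with exponentially growing delays.
# `sleeper` is injectable so the example and tests do not actually wait.
def with_backoff(max_attempts: 5, base_delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TransientError
    raise if attempt >= max_attempts          # give up and surface the error
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

delays = []
result = with_backoff(sleeper: ->(seconds) { delays << seconds }) do |attempt|
  raise TransientError, 'rate limited' if attempt < 3
  "applied on attempt #{attempt}"
end

puts result          # applied on attempt 3
puts delays.inspect  # [1.0, 2.0]
```

When several applies can retry at once, adding random jitter to each delay is a common refinement so they do not all hit the rate limit again at the same moment.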
slide-28
SLIDE 28

Handling Failure

These are not great solutions. If you have ideas, let me know.

slide-29
SLIDE 29

Staging Deploys

  • Set up a bot with limited privileges
      • You can test the flow without breaking everything
      • We have a separate repository that defines 1 S3 bucket
  • Make a periodic cleaner that cleans up test resources
      • We use Lambdas to do this
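
The selection logic of such a cleaner might look like the sketch below. It is not the Lambda described in the talk: the `TestResource` struct, the `purpose` tag, the one-hour TTL, and the sample timestamps are all assumptions, and the actual deletion calls are left out.

```ruby
# Hypothetical resource record; the tag name and TTL below are assumptions.
TestResource = Struct.new(:name, :tags, :created_at)

# Pick resources that are tagged as test fixtures and older than the TTL.
def expired_test_resources(resources, now:, max_age_seconds: 3600)
  resources.select do |resource|
    resource.tags['purpose'] == 'staging-test' &&
      (now - resource.created_at) > max_age_seconds
  end
end

now = Time.utc(2020, 1, 1, 12, 0, 0)   # arbitrary fixed time for the example
resources = [
  TestResource.new('test-bucket-old', { 'purpose' => 'staging-test' }, now - 7200),
  TestResource.new('test-bucket-new', { 'purpose' => 'staging-test' }, now - 60),
  TestResource.new('prod-bucket', {}, now - 86_400)
]

puts expired_test_resources(resources, now: now).map(&:name).inspect  # ["test-bucket-old"]
```

Requiring an explicit tag before anything is eligible for deletion keeps the cleaner from touching resources it did not create, even if it runs with broader privileges than intended.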

slide-30
SLIDE 30

Timeline

  • 1. Where we were before May
  • 2. Where we are today
  • 3. Where we are going
slide-31
SLIDE 31

Team Scaling

slide-32
SLIDE 32

Team Scaling

slide-33
SLIDE 33

Team Scaling

slide-34
SLIDE 34

Resource Configuration Today

slide-35
SLIDE 35

Ownership

slide-36
SLIDE 36

Ownership

slide-37
SLIDE 37

Resource Configuration Today

  • `project = Project.new('infra/heimdall', aws_accounts)`
  • `project.service_with_elb('api', configuration)`
  • `project.rds_instance('db', configuration)`
slide-38
SLIDE 38

What’s Wrong?

  • Uses language the Infrastructure team knows
  • Developer’s mental model of deploys is not represented
  • Too many options, very little opinion
  • Code is too flexible
slide-39
SLIDE 39

Resource Configuration Tomorrow

name: 'developers/my-service'
services:
  - api:
      load_balanced: true
      accessible_by: ['developers/my-other-service']
databases:
  - postgres:
      size: medium
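
To show how a declarative file like this could drive provisioning, the sketch below parses it with Ruby's standard YAML library and expands it into a flat list of planned steps. The `plan_resources` helper and the wording of each step are hypothetical; the slide shows only the configuration, not how it would be interpreted.

```ruby
require 'yaml'

CONFIG = <<~YAML
  name: 'developers/my-service'
  services:
    - api:
        load_balanced: true
        accessible_by: ['developers/my-other-service']
  databases:
    - postgres:
        size: medium
YAML

# Expand the developer-facing config into a flat list of planned steps.
# The step wording is illustrative only.
def plan_resources(config)
  project = config.fetch('name')
  steps = []
  config.fetch('services', []).each do |entry|
    entry.each do |service, options|
      steps << "#{project}/#{service}: service"
      steps << "#{project}/#{service}: load balancer" if options['load_balanced']
      options.fetch('accessible_by', []).each do |peer|
        steps << "#{project}/#{service}: allow ingress from #{peer}"
      end
    end
  end
  config.fetch('databases', []).each do |entry|
    entry.each do |engine, options|
      steps << "#{project}/#{engine}: database (#{options.fetch('size')})"
    end
  end
  steps
end

puts plan_resources(YAML.safe_load(CONFIG))
```

The developer states intent (`load_balanced`, `size: medium`) and the expansion step owns the opinions about what that means in AWS terms, which is the opposite of the flexible Ruby calls in the current configuration.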

slide-40
SLIDE 40

Ownership

slide-41
SLIDE 41

Ownership

slide-42
SLIDE 42

The Future

slide-43
SLIDE 43

Design Considerations

  • Mono-repo or multi-repo
  • Automated workflows (PR bots)
  • Exposing the information to outside services
slide-44
SLIDE 44

The Other Half

  • Provisioning and management are now easy
  • Operation is not
slide-45
SLIDE 45

Account Stewardship Today

slide-46
SLIDE 46

Account Stewardship Today

slide-47
SLIDE 47

Account Stewardship Tomorrow

slide-48
SLIDE 48

Complications

  • Managing connectivity between many VPCs is hard
  • Like microservices, finding the right domain is difficult
  • How much access is enough access?
slide-49
SLIDE 49

Team Scaling

slide-50
SLIDE 50

Team Scaling

slide-51
SLIDE 51

The Future

slide-52
SLIDE 52

The Future

slide-53
SLIDE 53

The Future

slide-54
SLIDE 54

The Future

slide-55
SLIDE 55

Secure Infrastructure for Developers

Or: Infrastructure with Vacation

slide-56
SLIDE 56

We’re Hiring!

careers.coinbase.com

slide-57
SLIDE 57

Questions?

chase.evans@coinbase.com