You build it, you run it: Matthias Rampke, SoundCloud



SLIDE 1

You build it, you run it

Matthias Rampke, SoundCloud

SLIDE 2
SLIDE 3

You build it, you run it

Operating SoundCloud's microservice architecture

GOTO Berlin 2016

SLIDE 4

Engineer in Production Engineering

(platform, monitoring, availability)

previously in Systems Engineering

(ops remnant catch-all)

Intro: me

Who I am and where I work

SLIDE 5

a cloud full of sounds
135M tracks, 12M artists, 175M listeners
300+ employees
no ops team

Intro: SoundCloud

Who I am and where I work

SLIDE 6

Where we came from
Where we are today
Why we did it
How you can do it
How does this compare to …?

Intro: Agenda

SLIDE 7

Where we came from

SLIDE 8

One team
One table
One codebase

In the beginning …

the early days

SLIDE 9

2009/2010

20-50 engineers
hired an ops team, 24/7 on-call
app team deploys the monolith
first separate "micro"services

growing pains

SLIDE 10

2011/2012

more microservices
deployment platform
SRE/platforms team
multiple on-call rotations

the fork in the road

SLIDE 11

2013-2015

Cambrian explosion of microservices
feature teams and collectives
client-specific APIs
shared components & libraries
continuous delivery

maturing

SLIDE 12

Where we are today

SLIDE 13

simplified

Org chart

SLIDE 14

every feature • service • codebase is owned by a team

Ownership

You own it, you run it

SLIDE 15
owners are on call for what they own

groups of teams work together to
reduce load • remove alerts • write documentation

On Call
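
As a rough illustration of the ownership and on-call slides above (not SoundCloud's actual tooling), the rule that owners are on call for what they own can be read as a lookup from a service to its owning team. A minimal Python sketch, in which the `Team` type, the `OWNERS` table, the service and people names, and `page_for` are all hypothetical:

```python
# Hypothetical sketch: every name and value here is illustrative only.
from dataclasses import dataclass


@dataclass(frozen=True)
class Team:
    name: str
    on_call: str  # whoever currently carries the pager for this team


# Every service maps to exactly one owning team.
OWNERS = {
    "search-api": Team("search", "alice"),
    "stream-service": Team("playback", "bob"),
}


def page_for(service: str) -> str:
    """Return who gets paged for an alert on `service`."""
    team = OWNERS.get(service)
    if team is None:
        # No owner means nobody is on call, so fail loudly instead of guessing.
        raise LookupError(f"service {service!r} has no owning team")
    return team.on_call
```

Raising on an unowned service, rather than falling back to a shared rotation, mirrors the slides' premise that there is no separate ops team to catch unowned alerts.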

SLIDE 16

avoid shared infrastructure
be flexible
don't duplicate work

Shared Components

SLIDE 17

run the systems that run systems
monitoring & availability
internal consulting

Production Engineering

SLIDE 18

Why we did it

SLIDE 19

autonomy
predictability
velocity

Delivery

get more done, consistently

SLIDE 20

learn something new every day
no pure specialists
internal mobility

Personal growth

SLIDE 21

simple
resilient
operable

Better systems

SLIDE 22

How you can do it

SLIDE 23

basic automation
openness
pride
trust

Prerequisites

SLIDE 24

testing & deployment
on-call
provisioning
dependencies

Expanding ownership

SLIDE 25

internal moves
escalation paths
documentation
tooling

Checks & Balances

SLIDE 26

learn
improve
commiserate

Postmortems

SLIDE 27

How does this compare to …?

SLIDE 28

no assignment to SWE teams
no on-call handoff
no deploy blocks

Site Reliability Engineering

as Google describes it

SLIDE 29

more shared code
more communication
infrastructure & core teams

Radical agility

as Zalando describe it

SLIDE 30

no Ops team
less shared infrastructure
less standardization
deploys spread in a different dimension

DevOps

as described by Etsy

SLIDE 31

soundcloud.com
Berlin, New York, San Francisco, London

Slides: https://bit.ly/gotober16-sc
Please rate!

SLIDE 32