Matthias Rampke, SoundCloud
You build it, you run it
Operating SoundCloud's microservice architecture
GOTO Berlin 2016
Engineer in Production Engineering
(platform, monitoring, availability)
previously in Systems Engineering
(ops remnant catch-all)
Intro: me
Who I am and where I work
a cloud full of sounds • 135M tracks, 12M artists, 175M listeners • 300+ employees • no ops team
Intro: SoundCloud
Where we came from • Where we are today • Why we did it • How you can do it • How does this compare to …?
Intro: Agenda
Where we came from
One team • One table • One codebase
In the beginning …
the early days
2009/2010
20-50 engineers • hired an ops team, 24/7 on-call • app team deploys the monolith • first separate "micro"services
growing pains
2011/2012
more microservices • deployment platform • SRE/platforms team • multiple on-call rotations
the fork in the road
2013-2015
Cambrian explosion of microservices • feature teams and collectives • client-specific APIs • shared components & libraries • continuous delivery
maturing
Where we are today
simplified
Org chart
every feature • service • codebase is owned by a team
Ownership
You own it, you run it
owners are on call for what they own
groups of teams work together to reduce load • remove alerts • write documentation
On Call
avoid shared infrastructure • be flexible • don't duplicate work
Shared Components
run the systems that run systems • monitoring & availability • internal consulting
Production Engineering
Why we did it
autonomy • predictability • velocity
Delivery
get more done, consistently
learn something new every day • no pure specialists • internal mobility
Personal growth
simple • resilient • operable
Better systems
How you can do it
basic automation • openness • pride • trust
Prerequisites
testing & deployment • on-call • provisioning • dependencies
Expanding ownership
internal moves • escalation paths • documentation • tooling
Checks & Balances
learn • improve • commiserate
Postmortems
How does this compare to …?
no assignment to SWE teams • no on-call handoff • no deploy blocks
Site Reliability Engineering
as Google describes it
more shared code • more communication • infrastructure & core teams
Radical agility
as Zalando describes it
no Ops team • less shared infrastructure • less standardization • deploys spread in a different dimension
DevOps
as described by Etsy
soundcloud.com • Berlin • New York • San Francisco • London