Spotify Lessons: Learning to Let Go of Machines
IO Tribe
James Wen, Site Reliability Engineer at Spotify
ALF Squad, Infrastructure & Operations Tribe
Let's control how feature developers think about what their code is actually
Stockholm San Jose Rack 2 Rack 1
Historical: Feature Developer’s Context for Service’s Capacity
lon-1-d lon-1-b lon-1-c lon-1-a
keys updated
Rack 2 lon-1-f lon-1-e
updated
disk, etc.)
memory)
Unbound v1.6.3, ash2-metadata-a.ash2.spotify.net, OpenSSL v1.0.0f, 2 cores, 8 GB RAM, tarred logs, in Virginia, 3 years
How to get? How many? Specs? How long? How to talk to it? Where? Up to date? How to track? What tools
Maintenance?
What to put
Available? Service + Business
Current: Feature Developer’s Context for Service’s Capacity
GCP - europe-west-1 Pool: 2 instances x (n1-standard-32)
Stockholm Pool: 4 instances x (High Mem)
How long? How to talk to it?
Maintenance?
Available? Service + Business How to track? How to get? Where? Up to date? What tools
What to put
How many? Specs?
Future: Feature Developer’s Context for Service’s Capacity
GCP - asia-east-1 Service Pool
GCP - europe-west-1 Service Pool
GCP - us-central-1 Service Pool
How long? How to talk to it?
Maintenance?
Available? Service + Business How to track? How to get? Where? Up to date? What tools
What to put
How many? Specs?
iterate
many
paradigms
sledgehammers, and/or limos to change
enough for feature teams to handle the edge cases