Preparing for the Unexpected
Samuel Parkinson, samuel.parkinson@ft.com
#qconlondon #prepfortheunexpected

Photo by Hush Naidoo on Unsplash

Let’s start with a story

What’s the worst thing that could happen?

*************

The FT.com zone was missing

FT.com has over 5,100 subdomains 😭

This impacted the whole company

😲

We have never prepared for such an incident

It’s a classic data loss situation

Our provider had a partial backup

But critical records we used for DNS load balancing were missing 👼

About 10 people worked to resolve the incident

And over 30 people were online to follow along

Most were not called, but still volunteered their time

4h 30m. The first hour was a total outage.

Lack of panic in the moment

It was a slick operation and we recovered

It took restoring from a backup and manual entry to get there

We were focused on recovery, not what happened

People were joining the incident to learn

This is where we are today

Photo by Victor Garcia on Unsplash

Photo by Markus Spiske on Unsplash

0. How do we do on-call?
1. Our incident management challenges
2. Making out-of-hours sustainable
3. The results and takeaways

Internal FT Core Products Enterprise Services Customer Products Operations & Reliability FT Group Products

We are Customer Products

45 engineers and counting 📉

And we own about 180 systems

Split into 9 teams

Operations monitor our entire estate 24/7

Our systems are a drop in the pond

You build it, you run it

Supporting our systems out-of-hours

This is our approach to DevOps

Our engineers wear many hats

Photo by Joshua Coleman on Unsplash

We’re putting on our incident management hat

How do we do support out-of-hours?

Our engineers volunteer to be part of the out-of-hours team

We don’t have shifts

Which means, we could all be unavailable

What do we care about?

We’re talking about our business capabilities

What is an incident at the FT?

Customer Products has two really important business capabilities

1. Our users can always read the news

2. Journalists must be able to publish the news

If either of these goes wrong, we declare an incident

What were our challenges?

We were not immediately productive on call

We were not immediately productive on call: we had an engineering mindset in an operations situation

We were not immediately productive on call, because we don’t have any SRE or DevOps specialists

“I always start with the impact and the comms, they kinda jump in at the Tech.”

We were not immediately productive on call: our incident management process wasn’t second nature

We had very few incidents in the first half of the year

And we were down to 5 people on the out-of-hours support team

So we needed to make the out-of-hours team sustainable

We surveyed engineers about helping out during an incident

There were many people on the fence

[Chart: survey responses split into groups of 7, 3 and 6 people]

And they told us why

“I will need much more confidence in systems and domains knowledge.”

“If I were to have a better understanding of how it works and what I would need to do, I would very likely join.”

We set out to convince people to join our out-of-hours team

We built and ran incident workshops

So our engineers are better prepared to take on incidents

And we wrote a generic runbook for our microservices

So engineers know what they can do, and can apply it to our ~180 systems

We set out in the last 6 months of 2019 to address the situation

Building your incident workshop: Don’t Panic!
