heretical resilience

heretical resilience (to repair is human), Ryn Daniels


  1. heretical resilience (to repair is human) Ryn Daniels - @rynchantress 
 QCon New York 2018

  3. my side of the story AKA: A Dramatic blargh Retelling of The Time I Nearly Broke Etsy Dot Com

  5. apache versions

  6. apache versions

  9. blargh

  10. blargh

  14. blargh

  15. blargh

  26. blargh

  27. blargh

  28. The Post-mortem aka: What the heck actually just happened?

  29. The Post-mortem aka: What the heck actually just happened? aka: what did we learn?

  30. how did the site stay up?

  33. Lesson 1: Always keep 7 servers out of config management, just in case.

  34. Lesson 1: Consider fallbacks for automation

  35. distrusting your automation
      • How will you detect problems?
      • How easily can you test your automation?
      • Can you turn the automation off?
      • Do you remember how to do the thing manually?

  36. How did we respond so fast?

  38. blargh

  39. Lesson 2: Create a Slack Team in charge of maintaining a proper amount of slack in case of incidents.

  40. Lesson 2: maintain adaptive capacity

  41. twiddling your thumbs
      • How do people ask each other for help?
      • Which teams have more or less slack?
      • What happens after work gets rearranged?

  42. what couldn't we see?

  47. Lesson 3: Buy a couple botnets to DDoS your monitoring tools every now and then.

  48. Lesson 3: understand the dependencies in your tooling

  49. watching the world burn
      • What do your monitoring/automation/orchestration tools depend on?
      • Who watches the watchers?
      • How do you communicate internally and externally?
      • Do you have backup tools?

  50. what actually went wrong with chef?

  52. Lesson 4: Always label your dragons.

  53. Lesson 4: make informed decisions about which yaks to shave.

  54. choosing your yaks wisely
      • Which teams have sufficient slack?
      • Can a problem be avoided if not solved?
      • What are the tradeoffs and opportunity costs?
      • Who has the precision yak razors?

  55. who digs into the weird things?

  56. Lesson 4.5: Hire the person who created the primary language your site is written in. (This always scales.)

  57. Lesson 4.5: Develop depth of inter-team relationships

  58. finding your own rasmus
      • Which areas only have one (or two) people who understand them?
      • How is information shared within your organization?
      • What behaviors are rewarded?

  59. what happened afterwards?

  61. Lesson 5: Give people ill-fitting clothing when they mess up.

  62. Lesson 5: encourage organizational learning

  63. a warning to others
      • How do people respond to incidents?
      • What happens after an incident?
      • How are remediation items prioritized?
      • What happens to the bandaid solutions?

  65. technology can be robust.* only humans can be resilient.
      *for some already-known, pre-defined subset of problems

  67. 1. understand your automation
      2. maintain adaptive capacity
      3. know your dependencies
      4. build cross-team relationships
      5. always be learning

  68. 1. understand your automation
      2. maintain adaptive capacity
      3. know your dependencies
      4. build cross-team relationships
      5. always be learning

  69. Thank you!
