
Building resilience: How outages shaped Etsy’s systems



  1. Building resilience: How outages shaped Etsy’s systems

  2. Act 1

  3. Quick! Be resilient! http://www.flickr.com/photos/niaid/11854196633/sizes/l/

  4. Quick! Be resilient! • Actually, it’s a slow process • Iterative • Introspective • Horizontal and vertical development

  5. Quick! Be resilient! http://www.flickr.com/photos/ogcodes/6091644301/sizes/l/

  6. Quick! Be resilient! http://www.flickr.com/photos/studio360/1150744342/sizes/o/

  7. Quick! Be resilient! http://www.flickr.com/photos/studio360/1150744368/sizes/o/

  8. Quick! Be resilient! http://www.flickr.com/photos/ogcodes/6091644301/sizes/l/

  9. Quick! Be resilient! Next generation Current generation

  10. Quick! Be resilient! http://www.flickr.com/photos/jurvetson/8671257096/

  11. Quick! Be resilient! http://cudebi.wordpress.com/2012/09/19/tah-pagh-tahbe-o-el-reconocimiento-de-william-shakespeare-en-el-universo-de-star-trek/

  12. Resilience Engineering http://www.flickr.com/photos/freefoto/728651045/sizes/o/

  13. Resilience Engineering • “To Engineer is Human” and “To Forgive Design” by Henry Petroski • “The Field Guide to Understanding Human Error” and “Just Culture” by Sidney Dekker

  14. Act 2

  15. Building resilience at Etsy • Continuous deployment • Metrics, metrics, metrics • Peer review • Postmortems

  16. Building resilience at Etsy • Postmortems • Continuous deployment • Metrics, metrics, metrics • Peer review } Culture

  17. Postmortems Or: How to win at failing

  18. Constructive cultures • No blame • Open discussion • Focus on improvements

  19. Constructive cultures • No blame • Open discussion • Focus on improvements } Culture

  20. Destructive cultures “The nail that sticks up gets hammered down” (Japanese proverb)

  21. The result?

  22. • #23: Fortune’s “Top 50 best small and medium businesses to work for” • Rapid code iterations and deploys • Lasting relationships • Generosity of spirit • …and much more

  23. Act 3

  24. Doing postmortems? Get Morgue http://github.com/etsy/morgue

  25. Morgue

  26. Morgue

  27. Morgue

  28. Forkistan • Mean time to detect: 0 min • Mean time to recover: 10 mins

  29. Yo Dawg, I Heard You Like Errors.. • Mean time to detect: 2 mins • Mean time to recover: 15 mins

  30. Smashing INT for Fun and Profit • Mean time to detect: 0 min • Mean time to recover: 4 hrs 52 mins

  31. Apache Amnesia • Mean time to detect: 2 hours • Mean time to recover: 5 mins

  32. Continuously Upgrading Databases • Mean time to detect: 2 mins • Mean time to recover: 1 hour (but, not really..)

  33. Q & A • Avleen Vig, Staff Operations Engineer, Etsy, Inc. • @avleen
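Slides 28 to 32 each give one postmortem’s time to detect and time to recover. Aggregating those five figures shows how much a single outlier can move a mean. The sketch below is a minimal illustration under stated assumptions: the figures are copied from the slides, “4 hrs 52 mins” is converted to 292 minutes, and the “1 hour (but, not really..)” caveat on slide 32 is ignored by taking 60 minutes at face value. The `INCIDENTS` list and the `summarize` function are hypothetical names for this example, not part of Morgue.

```python
from statistics import mean, median

# Hypothetical data table: (postmortem title, minutes to detect, minutes to recover),
# copied from slides 28-32. Slide 32's "1 hour (but, not really..)" is taken as 60.
INCIDENTS = [
    ("Forkistan", 0, 10),
    ("Yo Dawg, I Heard You Like Errors..", 2, 15),
    ("Smashing INT for Fun and Profit", 0, 4 * 60 + 52),
    ("Apache Amnesia", 2 * 60, 5),
    ("Continuously Upgrading Databases", 2, 60),
]


def summarize(incidents):
    """Return {metric: (mean, median)} in minutes for detection and recovery."""
    detect = [d for _, d, _ in incidents]
    recover = [r for _, _, r in incidents]
    return {
        "detect": (mean(detect), median(detect)),
        "recover": (mean(recover), median(recover)),
    }


if __name__ == "__main__":
    for metric, (avg, mid) in summarize(INCIDENTS).items():
        print(f"time to {metric}: mean {avg:.1f} min, median {mid} min")
```

With these numbers the mean time to detect is 24.8 minutes, while the median is 2 minutes. The mean time to recover is 76.4 minutes, while the median is 15 minutes. In each case one incident dominates the mean: Apache Amnesia’s two-hour detection and the nearly five-hour recovery for Smashing INT.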
