The SRE I aspire to be Yaniv Aknin // @aknin #VelocityConf San Jose

  1. The SRE I aspire to be Yaniv Aknin // @aknin #VelocityConf San Jose 2019

  2. Who is this guy
     ● Google SRE since 2013
       Most recently GCP's Quantitative Reliability Lead
     ● Jack of all trades
       Equal parts SRE, dev, and /pro(duct|ject) manager/
     ● Opinions my own
       But I owe a lot here to others

  3. Who is this guy
     ● Google SRE since 2013
       Most recently GCP's Quantitative Reliability Lead
     ● Jack of all trades *
       Equal parts SRE, dev, and /pro(duct|ject) manager/
     ● Opinions my own
       But I owe a lot here to others
     * NB: what does "SRE" really mean?

  4. Wikipedia says Engineering is "using scientific principles to design and build $THINGS" (https://en.wikipedia.org/wiki/Engineering)

  5. Wikipedia says Engineering is "using scientific principles to design and build $THINGS" (https://en.wikipedia.org/wiki/Engineering)
     Imagine THINGS="Reliability" ... how do we apply science to that?

  6. Innovation (engineering, proactive, change)
     Reliability (support, reactive, preserve)

  7. Innovation (engineering, proactive, change)
     ?
     Reliability (support, reactive, preserve)

  8. The Error Budget
     Innovation (engineering, proactive, change)
     Reliability (engineering, proactive, change)
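
Slide 8 names the error budget as the thing that puts reliability on the same proactive footing as innovation. The arithmetic behind a request-based budget can be sketched as below. This is a minimal illustration, not code from the talk: the `error_budget` function, the `BudgetReport` type, and the idea of counting failed requests against a target success ratio are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BudgetReport:
    allowed_failures: float  # failures the SLO tolerates in the window
    remaining: float         # allowed failures not yet spent (negative if overspent)
    exhausted: bool          # True once the whole budget has been spent


def error_budget(slo: float, total_requests: int, failed_requests: int) -> BudgetReport:
    """Hypothetical helper: how much of a request-based error budget is left.

    slo is the target success ratio, e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    if total_requests < 0 or failed_requests < 0:
        raise ValueError("request counts must be non-negative")
    # The budget is the share of requests that is allowed to fail.
    allowed = (1.0 - slo) * total_requests
    remaining = allowed - failed_requests
    return BudgetReport(allowed, remaining, remaining <= 0)
```

For instance, `error_budget(0.999, 1_000_000, 400)` reports roughly 1000 allowed failures with about 600 still unspent. One plausible reading of the slide is that unspent budget can go to change, while an exhausted budget argues for shifting effort to reliability.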

  9. Measurably optimise reliability vs cost

  10. “When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, your knowledge is of a meagre and unsatisfactory kind.”
      William Thomson (Lord Kelvin), President of the Royal Society
      Lecture on "Electrical Units of Measurement", published in "Popular Lectures", Vol. 1, 1883 (abridged to fit slide)

  11. [Diagram labels: MTBF, MTTR, 99.9%, 99.99%]
      ● MTBF/MTTR
      ● "9s" (e.g. "99.95% uptime")
      ● Challenge: fungible definition of "failure"
      ● Challenge: aggregating individual events into business-credible 9s
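
Slide 11 puts MTBF/MTTR and "9s" side by side. The two are linked by the standard steady-state formula, availability = MTBF / (MTBF + MTTR). The sketch below shows that relationship and what a given number of nines allows in downtime. The function names and the 30-day default window are assumptions chosen for illustration; the slide does not specify them.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def nines(avail: float) -> float:
    """Express availability as a count of nines, e.g. 0.999 is about 3."""
    if not 0.0 <= avail < 1.0:
        raise ValueError("avail must be in [0, 1)")
    return -math.log10(1.0 - avail)


def allowed_downtime_minutes(avail: float, window_days: float = 30.0) -> float:
    """Minutes of downtime a target availability permits over the window."""
    if not 0.0 <= avail <= 1.0:
        raise ValueError("avail must be in [0, 1]")
    return (1.0 - avail) * window_days * 24 * 60
```

With these definitions, a 30-day window at 99.9% allows about 43.2 minutes of downtime and 99.99% allows about 4.32. Neither function says what counts as a "failure" or how separate events roll up into one number, which are the two challenges the slide lists.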

  12. Why is this hard?
      ● Scope
      ● Difficulty
      ● Cost++
      ● Misconceptions

  13. Why is this hard?
      ● Scope
      ● Difficulty
      ● Cost++
      ● Misconceptions
      And why is it good?
      ● Leverage
      ● Precision
      ● Cost--

  14. On ops, user harm, and tradeoffs
      [Chart labels: "Ops"; "User happiness"; "Your product is here."]

  18. You need "better quality" 9s!
      [Chart, vertical axis] 99% ("Whatever I happened to ship") up to 99.999% ("I spent time making my metrics hit certain thresholds")
      [Chart, horizontal axis] Misaligned ("Whatever I happened to measure") across to Aligned ("I spent time ensuring 9s correlate with customer pain")

  19. First move right, then move up
      [Chart, vertical axis] 99% ("Whatever I happened to ship") up to 99.999% ("I spent time making my metrics hit certain thresholds")
      [Chart, horizontal axis] Misaligned ("Whatever I happened to measure") across to Aligned ("I spent time ensuring 9s correlate with customer pain")
      [Quadrants] top left: Wasted Effort; top right: Happy Customers; bottom left: Unknown Problem; bottom right: Known Problem
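
Slides 18 and 19 define "aligned" 9s as ones that correlate with customer pain. One simple way to check that is a Pearson correlation between a daily reliability metric and a daily pain signal. The sketch below is an assumption-laden illustration, not the speaker's method: the choice of Pearson, the `pearson` helper, and the two sample series (daily error ratio and support ticket counts) are all made up for the example.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative data: one value per day.
daily_error_ratio = [0.001, 0.002, 0.010, 0.001, 0.020]
daily_support_tickets = [3, 5, 21, 2, 40]

r = pearson(daily_error_ratio, daily_support_tickets)
print(f"correlation between error ratio and tickets: {r:.3f}")
```

A coefficient near 1 would suggest the metric moves with customer pain (the "aligned" side of the chart). A coefficient near 0 would suggest the metric reflects "whatever I happened to measure". A real analysis would need far more data and care about confounders than this toy example has.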

  20. SRE team: a recipe
      Obvious: Monitoring, Alerting, Capacity planning, CI/CD & Rollouts, Load Balancing

  21. SRE team: a recipe
      Obvious: Monitoring, Alerting, Capacity planning, CI/CD & Rollouts, Load Balancing
      Less Obvious: System Architecture, Distributed Algorithms, Networking, Operating Systems

  22. SRE team: a recipe
      Obvious: Monitoring, Alerting, Capacity planning, CI/CD & Rollouts, Load Balancing
      Less Obvious: System Architecture, Distributed Algorithms, Networking, Operating Systems
      Least Obvious: Product Management, Data Science, Business Acumen, (nose for) UX, Research

  23. Litmus test of SRE
      ● Have a measurement of reliability
      ● When unreliable, resource allocation changes
      ● When reliable, you don't do ops

  24. Litmus test of SRE *
      ● Have a measurement of reliability
      ● When unreliable, resource allocation changes
      ● When reliable, you don't do ops
      * Please remember this is my litmus test... tell me yours?

  25. Thank you! Yaniv Aknin // @aknin
      Art credits: "Lord Kelvin", Messrs. Dickinson, London, goo.gl/RHF61Z [cropped]; Yin Yang, https://openclipart.org/detail/276316/ying-yang
