Automating Chaos Experiments In Production
Ali Basiri - Chaos Team

  1. Automating Chaos Experiments In Production Ali Basiri - Chaos Team @abasiri

  2. Netflix

  3. Control Plane: Website, Apps, Signup, Login, Browsing, Search, Playback control, Bookmarks, ... CDN: Movie Bits

  4. Ali Basiri, Software Engineer @ Netflix ● Chaos Engineer ● Distributed Systems Engineer ● Co-author of Principles of Chaos

  5. Chaos Monkey

  6. Service Availability

  7. Anatomy of a Failure

  8. Diagram: API, Movie Info, CDN Selection

  9. Diagram: API, Movie Info, CDN Selection

  10. Diagram: API, Movie Info, CDN Selection, Fallback

  11. Diagram: API, Movie Info, CDN Selection, Fallback

  12. Diagram: Fallback, API, Movie Info, CDN Selection

  13. Diagram: Fallback, API, Movie Info, CDN Selection

  14. FIT

  15. Request Level Failure Injection

  16. Request Level Failure Injection. Diagram: API, Movie Info, CDN Selection

  17. Is API resilient to failure of Personalization? Diagram: Gateway, API, Personalization

  18. Randomly select 10% of requests to participate in experiment. Diagram: Gateway, API, Personalization

  19. Diagram: Gateway, API, Personalization

  20. Diagram: Gateway, API, Personalization. if (shouldFail == true)

  21. Diagram: Gateway, API, Personalization. if (shouldFail == true)

  22. Even More FIT Availability

  23. CH∀OS ENGINEERING: Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.

  24. Stream Starts Per Second (SPS)

  25. Principles Of Chaos Engineering http://principlesofchaos.org ● Build a Hypothesis around Steady State Behavior ● Vary Real-world Events ● Run Experiments in Production ● Automate Experiments to Run Continuously

  26. Stream Starts Per Second (SPS)

  27. ChAP

  28. Goal: Chaos All The Things

  29. Diagram: Gateway, API, Personalization

  30. Diagram: Gateway, API, API Control, API Exp, Personalization

  31. Diagram: Gateway, API, API Control, API Exp, Personalization

  32. Diagram: Gateway, API, API Control, API Exp, Personalization

  33. Diagram: Gateway, API, API Control, API Exp, Personalization. Select 1% of requests for control. Select 1% of requests for experiment.

  34. Diagram: Gateway, API, API Control, API Exp, Personalization

  35. Diagram: Gateway, API, API Control, API Exp, Personalization. if (shouldRoute == true)

  36. Diagram: Gateway, API (98%), API Control (1%), API Exp (1%), Personalization

  37. Diagram: Gateway, API, API Control, API Exp, Personalization. if (shouldFail == true)

  38. Diagram: Gateway, API, API Control, API Exp, Personalization

  39. Stream Starts Per Second (SPS)

  40. Fallback Metrics

  41. Fallback Metrics

  42. Fallback Metrics

  43. CPU Utilization

  44. Future Work on ChAP

  45. Automated Canary Analysis

  46. Detect divergence and stop early

  47. Integrate with continuous delivery system

  48. Clone multiple services to run an experiment. Diagram: services A, B, C, D, with clones B Con, C Con, B Exp, C Exp

  49. http://principlesofchaos.org http://chaos.community

  50. Questions?
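
Slides 29 to 37 describe the ChAP routing scheme: 1% of requests are selected for a control cluster, 1% for an experiment cluster, and failure is injected only into the experiment requests, which should then be served by a fallback. The flow can be sketched roughly as below. This is a minimal illustration in Python, not Netflix's implementation: the function names, the use of a random draw for sampling, the exception type, and the fallback payload are all assumptions made for the example. Only the percentages and the `shouldFail` style check come from the slides.

```python
import random

CONTROL, EXPERIMENT, BASELINE = "control", "experiment", "baseline"


def assign_cluster(rng, control_pct=0.01, experiment_pct=0.01):
    """Pick a cluster for one request (hypothetical sampling rule).

    With the defaults, about 1% of requests go to control, 1% to
    experiment, and the remaining 98% stay on the regular API cluster.
    """
    draw = rng.random()  # uniform in [0, 1)
    if draw < control_pct:
        return CONTROL
    if draw < control_pct + experiment_pct:
        return EXPERIMENT
    return BASELINE


def call_personalization(should_fail):
    """Stand-in for the Personalization call; fails when told to."""
    if should_fail:
        raise RuntimeError("injected failure")
    return ["personalized rows"]


def handle_request(cluster):
    """Serve one request, injecting failure only in the experiment cluster."""
    should_fail = cluster == EXPERIMENT
    try:
        rows = call_personalization(should_fail)
        used_fallback = False
    except RuntimeError:
        rows = ["non-personalized rows"]  # assumed fallback payload
        used_fallback = True
    return {"cluster": cluster, "rows": rows, "used_fallback": used_fallback}


def run_experiment(num_requests, seed=0):
    """Route requests and count fallbacks per cluster."""
    rng = random.Random(seed)
    counts = {CONTROL: 0, EXPERIMENT: 0, BASELINE: 0}
    fallbacks = {CONTROL: 0, EXPERIMENT: 0, BASELINE: 0}
    for _ in range(num_requests):
        result = handle_request(assign_cluster(rng))
        counts[result["cluster"]] += 1
        if result["used_fallback"]:
            fallbacks[result["cluster"]] += 1
    return counts, fallbacks


if __name__ == "__main__":
    counts, fallbacks = run_experiment(100_000)
    print("requests per cluster:", counts)
    print("fallbacks per cluster:", fallbacks)
```

Because control and experiment receive the same share of traffic, their metrics (for example fallback counts, or SPS in the talk) can be compared directly, while the baseline cluster is left untouched.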
