Automating Chaos Experiments In Production
Ali Basiri - Chaos Team, @abasiri
Netflix
Control Plane: Website, Apps, Signup, Login, Browsing, Search, Playback control, Bookmarks, ...
CDN: Movie Bits
Ali Basiri, Software Engineer @ Netflix
● Chaos Engineer
● Distributed Systems Engineer
● Co-author of Principles of Chaos
Chaos Monkey
Service Availability
Anatomy of a Failure
Diagram: API, Movie Info, CDN Selection
Diagram: API, Movie Info, CDN Selection, Fallback
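The fallback shown on these slides can be sketched roughly as follows. This is a minimal illustration, not Netflix's code: the function names, the `fail` switch, and the returned values are all hypothetical, and the slides do not say which dependency fails.

```python
def fetch_dependency(movie_id: str, fail: bool = False) -> dict:
    """Hypothetical downstream call; `fail=True` simulates an outage."""
    if fail:
        raise ConnectionError("dependency unavailable")
    return {"movie_id": movie_id, "source": "live"}


def fetch_with_fallback(movie_id: str, fail: bool = False) -> dict:
    """Call the dependency and serve a degraded default if it fails."""
    try:
        return fetch_dependency(movie_id, fail=fail)
    except ConnectionError:
        # Assumed fallback: a static default instead of an error to the user.
        return {"movie_id": movie_id, "source": "fallback"}
```

The point of the pattern is that the API keeps answering when a dependency is down, which is exactly the behavior a failure experiment is meant to verify.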
FIT
Request Level Failure Injection
Diagram: API, Movie Info, CDN Selection
Is API resilient to failure of Personalization?
Diagram: Gateway, API, Personalization
Randomly select 10% of requests to participate in experiment
if (shouldFail == true)
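One way to picture request-level failure injection is the sketch below. It assumes the flag is attached to the request at the gateway and checked where API calls Personalization; the slides only show the 10% selection and the `shouldFail` check, so the function names, the dict-based request, and the fallback value are hypothetical.

```python
import random


class InjectedFailure(Exception):
    """Raised in place of a real Personalization outage."""


def tag_request(request: dict, rng: random.Random, percent: float = 10.0) -> dict:
    """At the gateway: mark roughly `percent`% of requests for failure."""
    tagged = dict(request)
    tagged["shouldFail"] = rng.random() < percent / 100.0
    return tagged


def call_personalization(request: dict) -> list:
    """At the injection point: fail only if the request carries the flag."""
    if request.get("shouldFail"):
        raise InjectedFailure("Personalization failure injected")
    return ["personalized row"]


def api_handle(request: dict) -> list:
    """API under test: it should degrade to a fallback, not error out."""
    try:
        return call_personalization(request)
    except InjectedFailure:
        return ["generic row"]  # assumed fallback content
```

Because the decision travels with the request, only the selected requests see the failure while the rest of the traffic is served normally.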
Even More FIT Availability
CHAOS ENGINEERING
Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.
Stream Starts Per Second (SPS)
Principles Of Chaos Engineering
http://principlesofchaos.org
● Build a Hypothesis around Steady State Behavior
● Vary Real-world Events
● Run Experiments in Production
● Automate Experiments to Run Continuously
Stream Starts Per Second (SPS)
ChAP
Goal: Chaos All The Things
Diagram: Gateway, API, Personalization
Diagram: Gateway, API, Personalization, plus API Control and API Exp clusters
Select 1% of requests for control
Select 1% of requests for experiment
if (shouldRoute == true)
Traffic split: 98% API, 1% API Control, 1% API Exp
if (shouldFail == true)
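The 98% / 1% / 1% split and the two flags can be sketched as below. This is an illustration under stated assumptions: hashing a request ID into 100 buckets is my choice, not something the slides specify, and the cluster names and `route` function are hypothetical.

```python
import hashlib


def bucket_for(request_id: str) -> int:
    """Map a request ID to a stable bucket in 0..99 (assumed hashing scheme)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def route(bucket: int) -> dict:
    """Decide the target cluster and flags for one bucket."""
    if bucket == 0:
        # 1% goes to the control cluster: rerouted, but no failure injected.
        return {"cluster": "api-control", "shouldRoute": True, "shouldFail": False}
    if bucket == 1:
        # 1% goes to the experiment cluster: rerouted and failure injected.
        return {"cluster": "api-exp", "shouldRoute": True, "shouldFail": True}
    # The remaining 98% stay on the regular API cluster, untouched.
    return {"cluster": "api", "shouldRoute": False, "shouldFail": False}
```

Keeping control and experiment the same size means their metrics can be compared directly, with the injected failure as the only intended difference between them.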
Stream Starts Per Second (SPS)
Fallback Metrics
CPU Utilization
Future Work on ChAP
Automated Canary Analysis
Detect divergence and stop early
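A very simple form of "detect divergence and stop early" is sketched below. It is not the automated canary analysis named in the talk: comparing mean SPS between control and experiment, the 5% threshold, and the minimum sample count are all assumptions made for illustration.

```python
def should_stop(control_sps, experiment_sps,
                max_relative_drop=0.05, min_samples=3):
    """Return True if the experiment's mean SPS trails control by too much."""
    n = min(len(control_sps), len(experiment_sps))
    if n < min_samples:
        return False  # not enough data to judge yet
    control_mean = sum(control_sps[:n]) / n
    experiment_mean = sum(experiment_sps[:n]) / n
    if control_mean == 0:
        return False  # avoid dividing by zero; nothing to compare against
    drop = (control_mean - experiment_mean) / control_mean
    return drop > max_relative_drop
```

A monitoring loop could call `should_stop` after each new sample and end the experiment as soon as it returns `True`, limiting the impact on the affected requests.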
Integrate with continuous delivery system
Clone multiple services to run an experiment
Diagram: services A, B, C, D, with control clones (B Con, C Con) and experiment clones (B Exp, C Exp)
http://principlesofchaos.org http://chaos.community
Questions?