Automating Chaos Experiments In Production
Ali Basiri - Chaos Team @abasiri
Netflix
CDN: Movie Bits
Control Plane: Website, Apps, Signup, Login, Browsing, Search, Playback control, Bookmarks, ...
Ali Basiri
Software Engineer @ Netflix
Movie Info API → CDN Selection
Movie Info API → CDN Selection, with Fallback
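The fallback idea can be sketched in Python. This is a minimal illustration, not Netflix's code: `movie_info`, the `select_cdns` callable, and the static `DEFAULT_CDNS` list are hypothetical names. When CDN Selection raises, the Movie Info API returns a degraded but usable response instead of an error.

```python
from typing import Callable, Dict, List

# Hypothetical static default used when CDN Selection is unavailable.
DEFAULT_CDNS: List[str] = ["default-cdn"]


def movie_info(movie_id: str, select_cdns: Callable[[str], List[str]]) -> Dict[str, object]:
    """Build a movie-info response, degrading gracefully if CDN Selection fails."""
    try:
        cdns = select_cdns(movie_id)
        used_fallback = False
    except Exception:
        # Any failure of the dependency triggers the fallback instead of an error.
        cdns = list(DEFAULT_CDNS)
        used_fallback = True
    return {"movie_id": movie_id, "cdns": cdns, "fallback": used_fallback}
```

For example, `movie_info("m1", lambda m: {}[m])` (the lookup raises `KeyError`, standing in for an outage) returns the default CDN list with `fallback` set to `True`.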
API Gateway → Personalization
Is the API resilient to failure of Personalization?
Randomly select 10% of requests to participate in the experiment
if (shouldFail == true)
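A minimal sketch of this experiment at the API Gateway, assuming per-request random selection. The names `should_fail`, `get_rows`, `fetch_personalization`, and `FALLBACK_ROWS` are hypothetical; only the 10% figure comes from the slide. The injected error is raised inside the same `try` that guards the real call, so the experiment exercises exactly the fallback path a genuine Personalization outage would.

```python
import random
from typing import Callable, List

EXPERIMENT_FRACTION = 0.10  # 10% of requests participate, as on the slide
FALLBACK_ROWS = ("Popular", "Trending")  # hypothetical non-personalized rows


def should_fail(rng: random.Random, fraction: float = EXPERIMENT_FRACTION) -> bool:
    """Randomly decide whether this request takes part in the failure experiment."""
    return rng.random() < fraction


def get_rows(
    user_id: str,
    fetch_personalization: Callable[[str], List[str]],
    rng: random.Random,
) -> List[str]:
    """Fetch personalized rows, failing on purpose for selected requests."""
    try:
        if should_fail(rng):
            # Injected failure: looks like a real outage to the handler below.
            raise ConnectionError("chaos: injected Personalization failure")
        return fetch_personalization(user_id)
    except ConnectionError:
        # The fallback under test: degrade to non-personalized rows.
        return list(FALLBACK_ROWS)
```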
Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.
Steady State Behavior
Continuously
http://principlesofchaos.org
Stream Starts Per Second (SPS)
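SPS appears in the deck as the headline health metric. As an illustration only, assuming stream-start events arrive as epoch-second timestamps (a hypothetical input format), per-second counts and a window average can be computed like this:

```python
import math
from collections import Counter
from typing import Dict, Iterable


def sps_per_second(start_timestamps: Iterable[float]) -> Dict[int, int]:
    """Count stream-start events in each whole second (epoch seconds)."""
    return dict(Counter(math.floor(ts) for ts in start_timestamps))


def mean_sps(start_timestamps: Iterable[float], window_seconds: float) -> float:
    """Average stream starts per second over a window of the given length."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return sum(1 for _ in start_timestamps) / window_seconds
```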
API Gateway → Personalization
API Control / API Exp
Select 1% of requests for control
Select 1% of requests for experiment
API Gateway → Personalization
API Control / API Exp
if (shouldRoute == true)
Traffic split: 1% API Control, 1% API Exp, 98% unaffected
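The 1% / 1% / 98% split can be sketched with a single random draw per request. `Cluster`, `route`, and the cluster names are hypothetical. Keeping the control group the same size as the experiment group means both receive comparable traffic, so their metrics can be compared directly rather than against the much larger production cluster.

```python
import random
from enum import Enum


class Cluster(Enum):
    CONTROL = "api-control"
    EXPERIMENT = "api-exp"
    PRODUCTION = "api"


CONTROL_FRACTION = 0.01     # 1% of requests to the control cluster
EXPERIMENT_FRACTION = 0.01  # 1% of requests to the experiment cluster


def route(rng: random.Random) -> Cluster:
    """Decide where the gateway sends a request (shouldRoute is true for the first two)."""
    draw = rng.random()
    if draw < CONTROL_FRACTION:
        return Cluster.CONTROL
    if draw < CONTROL_FRACTION + EXPERIMENT_FRACTION:
        return Cluster.EXPERIMENT
    return Cluster.PRODUCTION  # the remaining 98% are untouched
```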
if (shouldFail == true)
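Once traffic is split, the failure is injected only for requests that landed in the experiment cluster; the control cluster serves the same kind of traffic with no failure, so a metric gap between the two can be attributed to the experiment. A minimal sketch with hypothetical names (`handle`, the `"api-exp"` cluster label, and the fallback value):

```python
from typing import Callable, List, Tuple

FALLBACK_ROWS = ["Popular", "Trending"]  # hypothetical non-personalized default


def should_fail(cluster: str) -> bool:
    """Fail Personalization calls only inside the experiment cluster."""
    return cluster == "api-exp"


def handle(
    cluster: str,
    user_id: str,
    fetch_personalization: Callable[[str], List[str]],
) -> Tuple[List[str], bool]:
    """Return (rows, used_fallback) for one request served by `cluster`."""
    try:
        if should_fail(cluster):
            raise ConnectionError("chaos: injected Personalization failure")
        return fetch_personalization(user_id), False
    except ConnectionError:
        return list(FALLBACK_ROWS), True
```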
Stream Starts Per Second (SPS)
Fallback Metrics
CPU Utilization
Automated Canary Analysis
Detect divergence and stop early
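The early-stop idea can be sketched as a per-interval comparison of control and experiment SPS. This is a deliberately simple threshold rule, not the statistical judgement an automated canary analysis system performs; the 5% tolerance and the three-interval requirement are hypothetical values.

```python
from typing import Iterable, Optional, Tuple

MAX_RELATIVE_DROP = 0.05  # hypothetical: tolerate up to a 5% SPS deficit
CONSECUTIVE_BREACHES = 3  # hypothetical: require 3 bad intervals in a row


def stop_interval(samples: Iterable[Tuple[float, float]]) -> Optional[int]:
    """Return the index of the interval at which to abort, or None to keep running.

    Each sample is (control_sps, experiment_sps) measured over the same interval.
    """
    breaches = 0
    for i, (control, experiment) in enumerate(samples):
        if control <= 0:
            continue  # no baseline traffic this interval; nothing to compare
        drop = (control - experiment) / control
        breaches = breaches + 1 if drop > MAX_RELATIVE_DROP else 0
        if breaches >= CONSECUTIVE_BREACHES:
            return i
    return None
```

Requiring consecutive breaches trades a little detection latency for resistance to a single noisy interval.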
Integrate with continuous delivery system
Clone multiple services to run an experiment
Diagram: services A, B, C and D; B is cloned into B Con and B Exp, C into C Con and C Exp
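When an experiment spans several services, each request has to stay in its group across hops, so that a request in the experiment group reaches B Exp and C Exp rather than the regular B and C. One way to sketch that, assuming a hypothetical `x-chaos-group` header that every service forwards:

```python
from typing import Dict

GROUP_HEADER = "x-chaos-group"  # hypothetical header carrying "con" or "exp"
CLONED_SERVICES = {"B", "C"}    # services cloned for this experiment


def target_cluster(service: str, headers: Dict[str, str]) -> str:
    """Pick the cluster a call to `service` should go to for this request."""
    group = headers.get(GROUP_HEADER)
    if group in ("con", "exp") and service in CLONED_SERVICES:
        return f"{service} {group.capitalize()}"  # e.g. "B Exp"
    return service  # not in an experiment, or service not cloned


def outgoing_headers(incoming: Dict[str, str]) -> Dict[str, str]:
    """Forward the group header so every downstream hop sees the same group."""
    if GROUP_HEADER in incoming:
        return {GROUP_HEADER: incoming[GROUP_HEADER]}
    return {}
```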