The Forces That Disrupt Netflix
- Nov. 7, 2016
Haley Tucker
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."
ACROBAT FLEA
parallel world computing
ENGINEER
[Diagram: Devices, Proxy/Routing, Edge Service instances, Playback Service instances, and other Netflix services; labels: Edge Service Traffic, Playback Service Traffic]
Notes on Distributed Systems for Young Bloods
CHAPTER 1: THE WEIRD DATA IN THE CATALOG
CHAPTER 2: THE VANISHING OF CRITICAL SERVICES
CHAPTER 3: THE THROTTLE
Whoops, something went wrong…
Netflix Streaming Error
We’re having trouble playing this title right now. Please try again later or select a different title.
Clock, by heyyobecky4lyfe, Tumblr
[Diagram: Source Systems, Video Metadata Service, Amazon S3, and Netflix services receiving traffic]
[Diagram: Amazon S3, Netflix Playback Service]
{ String msg = "This should never happen!"; throw new IllegalStateException(msg); }
Explosion, CC BY 2.0, Andrew Kuznetsov 2008, Flickr
Amazon WS Global Infrastructure
Pager Diagnosis?
Canary, CC BY 2.0, Steve P2008, 2014, Flickr
[Diagram: Canary (New Code) and Baseline (Old Code) instances, each receiving traffic; Source Systems, Video Metadata Service, Amazon S3, and Netflix services]
[Diagram: Netflix services, Data Canary Service, Data Tester, traffic]
Australia with AAT, CC BY-SA 2.0, Ssolbergj 2010, Wikimedia
…one tool is a data canary.
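One way to picture a data canary is as a gate that runs checks against a new data snapshot before any consuming service loads it. The sketch below is a minimal illustration under stated assumptions, not the mechanism from the talk: the `DataCanary` and `Check` names, the snapshot shape (video id mapped to title), and the two checks are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: validate a catalog snapshot before publishing it widely.
class DataCanary {
    // A named check over a snapshot (video id -> title). Illustrative only.
    static final class Check {
        final String name;
        final Predicate<Map<String, String>> test;

        Check(String name, Predicate<Map<String, String>> test) {
            this.name = name;
            this.test = test;
        }
    }

    private final List<Check> checks;

    DataCanary(List<Check> checks) {
        this.checks = checks;
    }

    // Returns the names of failed checks; an empty list means "safe to publish".
    List<String> validate(Map<String, String> snapshot) {
        return checks.stream()
                .filter(c -> !c.test.test(snapshot))
                .map(c -> c.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        DataCanary canary = new DataCanary(Arrays.asList(
                new Check("non-empty", s -> !s.isEmpty()),
                new Check("no-blank-titles",
                        s -> s.values().stream().noneMatch(t -> t.trim().isEmpty()))));

        Map<String, String> good = Collections.singletonMap("v1", "Some Title");
        Map<String, String> bad = Collections.singletonMap("v1", "  ");

        System.out.println(canary.validate(good)); // prints []
        System.out.println(canary.validate(bad));  // prints [no-blank-titles]
    }
}
```

A publisher would call `validate` on the candidate snapshot and keep serving the previous snapshot whenever the returned list is non-empty.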
CHAPTER 2: THE VANISHING OF CRITICAL SERVICES
[Diagram: Devices, Proxy/Routing, Edge Service instances, Playback Service instances, Log Data Service, and Cassandra; labels: Playback Service Traffic, Log Data Service Traffic]
{ throw new OutOfMemoryError(); }
[Diagram: Playback Service, Log Data Service, Cassandra]
Whatever you ask, CC BY-SA 2.0, Kreg Steppe 2008, Flickr
Astronomical Clock, CC BY 2.0, Andrew Fleming 2011, Flickr
Keep Only Dependencies which are Necessary
Magic, CC BY-ND 2.0, Daniel Lee 2013, Flickr
Medusa Kill Switch, CC BY-NC-ND 2.0, Scott Hart 2013, Flickr
[Diagram: Playback Service deployed in DEV, TEST, and PROD]
try {
    remoteService.call();
} catch (Throwable t) {
    // Oops!
    System.exit(1);
}
[Diagram: Proxy/Routing, Playback Service, Log Data Service, Cassandra, traffic]
It's Electric, CC BY-ND 2.0, Alan Hochberg 2008, Flickr
Wrecking Ball in Building, CC BY 2.0, Jason Eppink 2008, Flickr
[Diagram: Devices, Proxy/Routing, Playback Service, Log Data Service, Cassandra, traffic]
Automating Chaos Experiments in Production, by Ali Basiri
Applying Failure Testing Research @Netflix, by Kolton Andrus and Peter Alvaro
Leverage circuit breakers and rigorously test failures.
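To make the circuit-breaker idea concrete, here is a minimal hand-rolled sketch. It is not the API of any particular library and not necessarily what the playback service uses. The class name `SimpleCircuitBreaker`, the failure threshold, the open interval, and the injectable clock are all assumptions chosen for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of a circuit breaker with a fallback.
class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to short-circuit once open
    private final LongSupplier clock;   // injectable so failures can be tested
    private int consecutiveFailures = 0;
    private long openedAt = -1;         // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // Simplification: the lock is held during the remote call, which
    // serializes callers. A production breaker would avoid that.
    synchronized <T> T call(Supplier<T> remote, Supplier<T> fallback) {
        boolean open = openedAt >= 0 && clock.getAsLong() - openedAt < openMillis;
        if (open) {
            return fallback.get(); // short-circuit: do not touch the dependency
        }
        try {
            T result = remote.get(); // closed, or a trial call after the interval
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = clock.getAsLong(); // open (or re-open) the circuit
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker =
                new SimpleCircuitBreaker(2, 1000, System::currentTimeMillis);
        Supplier<String> failing = () -> {
            throw new IllegalStateException("remote down");
        };
        // The first two calls hit the failing dependency; the third is short-circuited.
        for (int i = 0; i < 3; i++) {
            System.out.println(breaker.call(failing, () -> "fallback"));
        }
    }
}
```

Injecting the clock keeps the open and re-close transitions testable without sleeping, which fits the advice to test failure paths rigorously.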
[Diagram: Devices, Proxy/Routing, Edge Service instances, Playback Service, URL Service, traffic]
URL Service / URL Client
Circuit breakers and fallbacks; metrics; retries and timeouts; RPC; service discovery
[Chart: traffic, concurrent requests, and throttled requests (HTTP 503)]
} System.gc(); }
[Diagram: traffic, Proxy/Routing, Edge Service, Playback Service, URL Service]
URL Service / URL Client
Circuit breakers; metrics; retries and timeouts; RPC; service discovery
Heavy Fallback
With 100% Fallback, CPU held at 90%
No fallback, CPU held at 90%
Siege: https://github.com/JoeDog/siege
CACHE STATIC FALLBACK SERVICE
} return Response.status(503).build(); }
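The 503 fragment above and the throttled-requests chart suggest shedding load once too many requests are in flight. The following is a minimal sketch of that idea in plain Java, assuming a fixed concurrency limit. `ConcurrencyThrottle`, its limit, and the bare integer status codes are hypothetical stand-ins for whatever HTTP framework the service actually uses.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: reject requests beyond a fixed concurrency limit.
class ConcurrencyThrottle {
    static final int OK = 200;
    static final int SERVICE_UNAVAILABLE = 503;

    private final Semaphore permits;

    ConcurrencyThrottle(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the handler if a permit is free; otherwise sheds the request
    // immediately with 503 instead of queueing it.
    int handle(Runnable handler) {
        if (!permits.tryAcquire()) {
            return SERVICE_UNAVAILABLE;
        }
        try {
            handler.run();
            return OK;
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        ConcurrencyThrottle throttle = new ConcurrencyThrottle(1);
        int outer = throttle.handle(() -> {
            // Simulates a second request arriving while the first is in flight.
            System.out.println("nested: " + throttle.handle(() -> {})); // prints nested: 503
        });
        System.out.println("outer: " + outer);                      // prints outer: 200
        System.out.println("after: " + throttle.handle(() -> {})); // prints after: 200
    }
}
```

Rejecting with `tryAcquire` rather than blocking keeps the rejection path cheap, so the server spends little work on requests it cannot serve.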
Experience or Performance Impact
Customer Streaming Impact
Fire Buckets at Oakworth Station, CC BY 2.0, Tim Green 2015, Flickr
[Diagram: Devices, Proxy/Routing, Edge Service instances, Critical Playback Service, Non-Critical Playback Service, URL Service, Non-Critical URL Service, traffic]
Country Road at Sunrise, CC BY-SA 2.0, Susanne Nilsson 2014, Flickr
Traffic, CC BY-NC 2.0, jonbgem 2008, Flickr
[Diagram: Devices, Proxy/Routing, Edge Service instances, Critical Playback Service, Non-Critical Playback Service, URL Service, Non-Critical URL Service, traffic]
Shard your application based on
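The slides above label separate Critical and Non-Critical instances of the Playback Service and URL Service. A minimal sketch of routing along that split follows, assuming requests can be classified by endpoint. The `CriticalityRouter` class and the `/play` and `/log` endpoint names are hypothetical and are not taken from the talk.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: send critical and non-critical requests to separate shards
// so that trouble in non-critical work cannot starve critical work.
class CriticalityRouter {
    enum Shard { CRITICAL, NON_CRITICAL }

    private final Set<String> criticalEndpoints;

    CriticalityRouter(Set<String> criticalEndpoints) {
        // Defensive copy so later changes by the caller do not alter routing.
        this.criticalEndpoints = new HashSet<>(criticalEndpoints);
    }

    // Anything not explicitly listed as critical goes to the non-critical shard.
    Shard route(String endpoint) {
        return criticalEndpoints.contains(endpoint) ? Shard.CRITICAL : Shard.NON_CRITICAL;
    }

    public static void main(String[] args) {
        // "/play" and "/log" are illustrative endpoint names only.
        CriticalityRouter router =
                new CriticalityRouter(new HashSet<>(Arrays.asList("/play")));
        System.out.println(router.route("/play")); // prints CRITICAL
        System.out.println(router.route("/log"));  // prints NON_CRITICAL
    }
}
```

Defaulting unknown endpoints to the non-critical shard is one possible design choice. It keeps the critical shard's workload explicit and small.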
CHAPTER 1: THE WEIRD DATA IN THE CATALOG
CHAPTER 2: THE VANISHING OF CRITICAL SERVICES
CHAPTER 3: THE THROTTLE