Pragmatic Evolution of Super 6 and Sky Bet for Resiliency
Michael Maibaum, Sky Betting & Gaming, @mmaibaum
Pragmatic and Achievable
Focus is on pragmatic, achievable improvements in availability
[Chart: Monthly Transactions (millions); y-axis 50M to 350M, x-axis 2010 to 2016]
behind
the updates
services
[Diagram: Web Tier, API Service, MySQL, Score Service]
[Diagram: Web Tier, League Service, MySQL, Score Service]
Score updates and leagues held in memory
DB updates, sorts for every change
[Diagram: Web Tier, League Service, MySQL, Score Service]
But what happens when the league service crashes?
[Diagram: Web Tier, three League Service instances, MySQL, Score Service]
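As a rough illustration of why several League Service instances help when one crashes, the sketch below keeps a league table in memory in each replica and applies the same score updates to all of them. The class and function names are hypothetical, and rebuilding a restarted replica by replaying a log of updates is an assumption, not something the slides state.

```python
from dataclasses import dataclass, field


@dataclass
class LeagueReplica:
    """One in-memory league replica (hypothetical, for illustration only)."""
    points: dict[str, int] = field(default_factory=dict)

    def apply_score_update(self, player: str, delta: int) -> None:
        # Update the in-memory total; no database write on this path.
        self.points[player] = self.points.get(player, 0) + delta

    def standings(self) -> list[tuple[str, int]]:
        # Highest points first; ties broken by name so replicas agree.
        return sorted(self.points.items(), key=lambda kv: (-kv[1], kv[0]))


def broadcast(replicas: list[LeagueReplica], player: str, delta: int) -> None:
    """Deliver one score update to every live replica."""
    for replica in replicas:
        replica.apply_score_update(player, delta)


def rebuild(update_log: list[tuple[str, int]]) -> LeagueReplica:
    """Assumed recovery path: replay past updates into a fresh replica."""
    replica = LeagueReplica()
    for player, delta in update_log:
        replica.apply_score_update(player, delta)
    return replica


# Usage: three replicas stay in step; a replacement can be rebuilt from the log.
log = [("alice", 5), ("bob", 7), ("alice", 3)]
replicas = [LeagueReplica() for _ in range(3)]
for player, delta in log:
    broadcast(replicas, player, delta)
replacement = rebuild(log)
```

With more than one replica, the web tier can keep reading standings from a surviving instance while a crashed one is rebuilt.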
[Diagram: OpenBet Stored Procedures & DB, OXI XML, Login UI, Payment Router, Sidebar UI, OpenBet Payments App, SSO & Identity API, Account API, SSO Consumers, Other Products, Payment Services]
>4.5 THz CPU, >3 TB RAM, >300 VMs
Can one kind of slow request consume all the resources in a critical tier of the application?
impact on other services
saturate, other requests fail quickly
different requests once separated out. Easier to manage and scale
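One common way to stop a single kind of slow request from consuming a whole tier is to give each request type its own bounded pool of concurrency, so that when one pool saturates its requests fail quickly and the others are unaffected. The sketch below illustrates that idea with a semaphore per request type; the `Bulkhead` and `BulkheadFull` names and the limits are hypothetical, not taken from the talk.

```python
import threading


class BulkheadFull(RuntimeError):
    """Raised when a request type has used all of its reserved slots."""


class Bulkhead:
    """Caps concurrent calls for one request type (illustrative sketch)."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Do not queue: if the pool is saturated, fail fast instead.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull(f"{self.name} is saturated")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()


# Hypothetical pools: slow requests get a small, separate allowance.
slow_requests = Bulkhead("slow-requests", max_concurrent=1)
fast_requests = Bulkhead("fast-requests", max_concurrent=1)


def while_slow_pool_is_busy() -> str:
    # Runs while the only slow slot is held: a second slow call is rejected,
    # but the fast pool still serves requests.
    try:
        slow_requests.call(lambda: "should not run")
    except BulkheadFull:
        return fast_requests.call(lambda: "fast request served")
    return "slow pool was not saturated"


outcome = slow_requests.call(while_slow_pool_is_busy)
```

Sizing each pool separately also makes it easier to scale one request type without touching the others.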
threshold = 25
access_code = 'a3fd3d2df4'

banner_cookie = get_cookie('smart_banners')
if ( banner_cookie IS NULL ) {
    banner_cookie = set_cookie( 'bucket' = random_number(1, 100) )
}

customer_bucket = banner_cookie.get_value('bucket')
customer_access_code = banner_cookie.get_value('access_code')

if ( access_code == customer_access_code ) {
    route_request('service')
} else if ( customer_bucket <= threshold ) {
    set_cookie( 'access_code' = access_code )
    route_request('service')
} else {
    route_request('banner')
}
Pseudocode
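The pseudocode can be sketched as runnable Python to make the routing rule easier to test. This is an illustration only: the cookie is modelled as a plain dictionary, the function returns the destination instead of routing a real request, and the `rng` parameter is an assumption added so the random bucket can be controlled in tests.

```python
import random

THRESHOLD = 25               # percentage of customers admitted to the service
ACCESS_CODE = "a3fd3d2df4"   # current access code, as on the slide


def route_request(cookie, threshold=THRESHOLD, access_code=ACCESS_CODE, rng=random):
    """Return (destination, updated_cookie) for one incoming request."""
    cookie = dict(cookie or {})

    # First visit: assign the customer a random bucket from 1 to 100.
    if "bucket" not in cookie:
        cookie["bucket"] = rng.randint(1, 100)

    # Customers already holding the current access code stay on the service.
    if cookie.get("access_code") == access_code:
        return "service", cookie

    # Customers in a low enough bucket are admitted and given the code.
    if cookie["bucket"] <= threshold:
        cookie["access_code"] = access_code
        return "service", cookie

    # Everyone else sees the banner.
    return "banner", cookie


admitted = route_request({"bucket": 10})
held_back = route_request({"bucket": 90})
returning = route_request({"bucket": 90, "access_code": ACCESS_CODE})
first_visit = route_request(None)
```

In this reading, raising `threshold` admits a larger share of customers, and a customer who already holds the current access code keeps access even if the threshold is later lowered.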
[Diagram: Web Pages, Bet API, Couchbase, Core API; ~60,000 req/min; circuit breaker with global state; circuit breaker with local state]
Circuit breakers used to protect higher level services from underlying failures
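A minimal circuit breaker with local (per-process) state might look like the sketch below. It is a generic illustration of the pattern, not the implementation used at Sky Betting & Gaming: the class name, the failure threshold, the reset timeout and the injectable `clock` are all assumptions. A breaker with global state would keep the same counters in a store shared by every instance.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is not attempted."""


class CircuitBreaker:
    """Fails fast after repeated errors from an underlying service."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


# Usage with a fake clock so the timeout can be stepped by hand.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])


def failing_backend():
    raise ValueError("backend down")


events = []
for _ in range(2):
    try:
        breaker.call(failing_backend)
    except ValueError:
        events.append("failed")
try:
    breaker.call(lambda: "ok")
except CircuitOpenError:
    events.append("rejected")
now[0] = 11.0
events.append(breaker.call(lambda: "ok"))
```

While the circuit is open, callers get an immediate error they can handle, for example by serving a degraded page, instead of waiting on a failing dependency.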
– You’ve had one big failure and then they care (briefly?)
– or
– Proactive: they set targets and provide time and budget to achieve them?
– big failures leading to a massive focus on reliability
– generally good performance leading to a lack of maintenance
Products | Total Revenue Loss | Error Budget Used | Monthly Budget
         | £40k               | 75%               | £50k
         | £500               | 5%                | £10k
         | £35k               | 87.5%             | £40k
         | £1.5k              | 5%                | £30k
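The table can be read as a simple ratio, assuming "Error Budget Used" is total revenue loss divided by the monthly budget. That assumption reproduces the £500, £35k and £1.5k rows exactly; the slide does not state the formula, and the function name below is hypothetical.

```python
def error_budget_used(revenue_loss: float, monthly_budget: float) -> float:
    """Percentage of the monthly error budget consumed by revenue loss."""
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    return 100 * revenue_loss / monthly_budget


# Rows from the slide, in pounds: (total revenue loss, monthly budget).
small_loss = error_budget_used(500, 10_000)      # 5.0
large_loss = error_budget_used(35_000, 40_000)   # 87.5
other_loss = error_budget_used(1_500, 30_000)    # 5.0
```

Expressing reliability as money lost against an agreed budget gives a product team a concrete figure to weigh against feature work.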
degraded
more important – Incident Command, Roles & Responsibilities