John Adams Twitter Operations <jna@twitter.com>
Fixing Twitter
... and Finding your own Fail Whale
Operations
Small team, growing rapidly.
What do we do?
Software Performance (back-end)
Availability
Capacity Planning
high in existing cloud offerings
computer science problems.
2008 Growth
[Chart: Unique Visitors (in Millions), Dec 07 to Dec 08]
That was only the beginning...
previous graph!
Not slowing down, despite what outsiders say. Hard for outsiders to measure API usage!
+ an appreciation for Institutionalized Fear
Find Weakest Point
Metrics + Logs + Science = Analysis
Take Corrective Action
Move to Next Weakest Point
Process Repeatability
real time as possible
HTTP 200s/SEO
managed services
software deploy
deploys
[Chart: status_id growth with fitted curve (r² = 0.99), marking the signed int (32 bit) Twitpocalypse and the unsigned int (32 bit) Twitpocalypse]
Curve-fitting for capacity planning (R, fityk, Mathematica, CurveFit)
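
A minimal sketch of the curve-fitting idea, assuming a straight-line model and made-up sample points (the real status_id series and the model behind the slide's fit are not given here). It fits a least-squares line to (day, status_id) pairs, reports r², and extrapolates the day on which the signed 32-bit limit would be crossed. The names `fit_line` and `day_limit_reached` are hypothetical helpers, not part of any of the tools listed above.

```python
# Least-squares line fit using only the standard library.
# The sample data below is synthetic and for illustration only.

SIGNED_INT32_MAX = 2**31 - 1


def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


def day_limit_reached(slope, intercept, limit):
    """Day (on the x axis) at which the fitted line reaches the limit."""
    return (limit - intercept) / slope


# Synthetic observations: day number, highest status_id seen that day.
days = [0, 30, 60, 90]
status_ids = [1.0e9, 1.2e9, 1.4e9, 1.6e9]

slope, intercept, r2 = fit_line(days, status_ids)
crossing_day = day_limit_reached(slope, intercept, SIGNED_INT32_MAX)
print(f"r^2 = {r2:.4f}, signed 32-bit limit reached near day {crossing_day:.1f}")
```

Real growth is rarely linear, so in practice the same steps would be repeated with whichever model (polynomial, exponential) gives the best fit in a tool such as R or fityk.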
CPU and Latency
last deploy times
daemon / www logs
disable computationally or IO-Heavy site function
EARLY in your company.
the log message doesn’t include ‘reviewed’
what changed via email
www.review-board.org
Campfire
Many limiting factors in the request pipeline
Apache: MPM Model, MaxClients, TCP Listen queue depth
Rails (mongrel): 2:1 oversubscribed to cores
Varnish (search): # threads
Memcached: # connections
MySQL: # db connections
Symptom      | Bottleneck | Vector       | Solution
Bandwidth    | Network    | HTTP Latency | Servers++
Timeline     | Database   | Update Delay | Better algorithm
Search       | Database   | Delays       | DBs++
Code Updates | Algorithm  | Latency      | Algorithms
and quad core machines with 8 core
gain
month.
Capital expenditure = hard to realize new technology gains.
resulting in slow queries against the db
many O(n^y) operations.
(>60 s)
types to prevent eviction
FNV Hash instead of Ruby + MD5
libmemcached
50% decrease in load with Native C gem + libmemcached
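
To illustrate why FNV is cheap compared with MD5 for picking a cache server, here is a minimal sketch of the 32-bit FNV-1a hash (one XOR and one multiply per byte). Which FNV variant the slides refer to is not stated, so FNV-1a is an assumption; `pick_server`, the modulo distribution, and the server names are hypothetical and are not libmemcached's API.

```python
# 32-bit FNV-1a, written out by hand for illustration.
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte                            # mix in the next byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF   # multiply, keep the low 32 bits
    return h


def pick_server(key: str, servers):
    """Hypothetical helper: map a cache key to a server by simple modulo."""
    return servers[fnv1a_32(key.encode("utf-8")) % len(servers)]


# Hypothetical server list and key, for illustration only.
servers = ["cache1:11211", "cache2:11211", "cache3:11211"]
print(pick_server("user:12345:timeline", servers))
```

Simple modulo distribution remaps most keys when a server is added or removed; that trade-off is outside the scope of this sketch.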
difficult.
important configuration data (loss of darkmode flags, for example)
inputs causing SEGV and unexpected behavior
(failover to another backend if Varnish dies)
applying hash algorithm
specifying a callback= parameter - big win.
market
used in ‘durable’ mode.
(RabbitMQ), but you need in house Erlang experience.
Falco tinnunculus
handle 3rd party communications or back-end work.
request cycle
tree traversal
scheduling algorithm)
produce inconsistent results to the end user.
popular data
tables
the write DBs. Reading from master = slow death
designed queries
you (mkill)
available.
Twitter Open Source (Apache License):
http://github.com/nkallen/cache-money/tree/master
http://tangent.org/552/libmemcached.html
http://github.com/robey/kestrel
http://github.com/netik/mod_memcache_block