The 1 Year and 1 hour Capacity Plan in the Drupal World
About me
- Principal SRE @Acquia (Cloud Data Team)
- Joined in December 2011
- Location: Lisbon, Portugal
- Co-authored Seeking SRE w/ Machine Learning for SRE (O’Reilly)
- Founder and Lead of the Portuguese Drupal Association
- Fun Facts:
○ Presented at DevOps events including DrupalCons.
○ Dedicated father of 2 kids and still manages to study and write.
○ First Linux installation: Slackware in 1994.
○ Former theatre actor.
Agenda
- The problem
- What is Capacity
- Why do Capacity Planning
- Relation to Site Reliability Engineering
- Budget & Capacity Planning
- Load Testing
- Performance Tuning vs. Capacity Planning
- What to measure
- How to measure
- How to track capacity
- Forecasting
- First Easy Steps
- Conclusions
The Problem
Site Launch & User Expectations
Falcon Heavy launch, SpaceX
Typical Drupal Site Launch
What about Capacity Planning??
- Disable devel
- Configure cron
- Check The Upload Sizes & Execution Time
- Check Recipient Email Addresses
- Set The File Permissions
- Protect Your Root Account
- Check Permissions
- Turn Off Error Reporting
- Handle 404 Errors Gracefully
- Check Robots.txt
- Combine Pathauto With Global Redirect
- Create A Maintenance Page
- Configure Caching
- CSS And JavaScript Optimisation
- Check Unpublished Content Is Not Visible
- Configure Statistics
- Monitor the Site
- ** Plan for Failure **
User Expectations
Drupal click screenshot
- The end goal of capacity planning is a smooth and speedy experience for the users
- What that means varies with the type of application and the portion of the application users interact with
No silver bullet
- Plenty of capacity does not rule out a slow or unavailable website
- Capacity is only one part of making the end-user experience fast
- We want to measure and track so we can make forecasts
- An intolerable amount of latency should raise a flag
What is Capacity
resources required to run your services in the context you have chosen to run them
Carbon Fiber Tank, SpaceX
Capacity in Site Reliability Engineering (SRE)
- Capacity: The maximum amount of output a product deployment is capable of completing in a given period of time
- Capacity planning: Process that determines the resources needed, like people, instances, CPU, memory, time and more, for the company to meet changing demands for its services
- In the Drupal World we focus mostly on serving web capacity
Resource management
The Art of Capacity Planning, by Arun Kejariwal and John Allspaw (O'Reilly Media, Inc.)
- Ensure proper resources are available to handle load
- Define procurement and an approval process
- Justify capital needs
- Manage resources after deployment
Why do Capacity Planning
Kroger grocery store, Lexington Kentucky, 1947, by Brett Streutket
Quick and Dirty Math
- Only spend as much as you actually need
- Be ahead of sharp growth
- Avoid emergencies
Stay Fast and Reliable
Site Reliability Engineering
Rocket Laboratory, 1952 NASA/William A. Bowles
Ben Treynor - Google
“...an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s)...”
Demand Forecasting and Capacity Planning
- Ensuring that there is sufficient capacity and redundancy
- Serve projected future demand with the required availability
- Ensure the required capacity is in place by the time it is needed
- Take both organic and inorganic growth into account
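A rough sketch of the arithmetic behind these points, not a definitive sizing method. The function name, its parameters and the sample numbers are all hypothetical; the per-node ceiling is assumed to come from a load test.

```python
import math

def nodes_needed(peak_demand, organic_growth, inorganic_demand,
                 per_node_ceiling, spare_nodes=1):
    """Return the node count that serves projected demand with redundancy.

    peak_demand       current peak load (e.g. requests/sec)
    organic_growth    expected natural growth over the period (0.20 = 20%)
    inorganic_demand  extra load from planned events (launches, campaigns)
    per_node_ceiling  load a single node sustains (assumed known)
    spare_nodes       nodes that may fail without breaching the plan
    """
    projected = peak_demand * (1 + organic_growth) + inorganic_demand
    return math.ceil(projected / per_node_ceiling) + spare_nodes

if __name__ == "__main__":
    # Hypothetical inputs: 600 req/sec today, 20% organic growth,
    # 300 req/sec from a planned campaign, 80 req/sec per node.
    print(nodes_needed(600, 0.20, 300, 80))
```

Keeping organic and inorganic demand as separate inputs makes it easy to revisit the plan when Product announces a new launch.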
https://unsplash.com/photos/mexeVPlTB6k
How SRE advocates for Capacity Planning
- Perform regular load testing
- Incorporate SLOs on Capacity
- Capacity is critical to availability, therefore the SRE team leads capacity planning initiatives and provisioning
https://unsplash.com/photos/DX9X0g0Cg88
Budget & Capacity Planning
Vintage Grow Your Money by Chris Potter, ccPixs.com
Keeping the costs low
- Meet with Finance, Engineering and Product
- Gather Systems and Application metrics
- Use that data to justify the investment
Three forces that impact Capacity Planning: Product, Finance and Engineering
Plan
Load Testing
“Hope is not a strategy”
St. Margrethen - Load Test by Kecko
Load testing a Drupal stack
- How to load test? “Hit it until it breaks”
- Include the points of failure in the calculations
- Determining backend limits can be tricky
- Use those resource ceilings as a basis while predicting future growth
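One possible way to turn "hit it until it breaks" into a number, sketched under assumptions. The tuple layout, the thresholds and the sample results are hypothetical and do not come from any particular load testing tool.

```python
def find_ceiling(steps, max_latency_ms, max_error_rate=0.01):
    """Return the highest request rate that stayed within limits, or None.

    steps is a list of (requests_per_sec, latency_ms, error_rate) tuples,
    one per load level of a stepped load test (assumed input format).
    """
    ceiling = None
    for rps, latency_ms, error_rate in sorted(steps):
        if latency_ms > max_latency_ms or error_rate > max_error_rate:
            break  # the stack "broke" at this level, so stop here
        ceiling = rps
    return ceiling

if __name__ == "__main__":
    # Hypothetical results of a stepped load test.
    results = [(50, 120, 0.0), (100, 180, 0.0),
               (150, 400, 0.002), (200, 1900, 0.08)]
    print(find_ceiling(results, max_latency_ms=500))
```

The returned rate is the resource ceiling to carry into the forecasts.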
https://docs.acquia.com/acquia-cloud/arch/
A Few Load testing Tools
simulate
- Loadrunner
○ http://bit.ly/microfocus-loadrunner
- Iago
○ https://github.com/twitter/iago
- JMeter
○ http://jmeter.apache.org/
collect
- Prometheus
○ http://www.prometheus.io/
- Signalfx
○ http://www.signalfx.com/
- Cacti
○ http://cacti.net
- Ganglia
○ http://ganglia.info
- Nagios
○ http://nagios.org/
https://www.gocomics.com/calvinandhobbes/1986/11/26
Performance Tuning vs. Capacity Planning
(different goals)
Top Speed by Alexander Nie
What to measure
defining the metrics
End-of-life by Dennis van Zuijlekom
Divide & Conquer
- Splitting nodes
- Understand capacity demands of each node
- Measure more distinctly
- How requests or queries per second affect resources
Identifying the key resources to measure
- Disk space (MB)
- Disk throughput (IOPS)
- CPU performance (FLOPS)
- RAM memory (MB)
- Network bandwidth (Mbps)
- Network IP pool (Netmask)
- Others
How to measure
Living Computer Museum, Seattle
Tools to measure on Linux servers
http://www.brendangregg.com/Perf/linux_perf_tools_full.png
Collecting resources on web servers
- Example script that sends metrics to statsd
- Low footprint using /proc, df and ps
- For a constant, reliable monitoring service use collectd: https://collectd.org
- or Telegraf: https://www.influxdata.com/time-series-platform/telegraf/
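A minimal sketch of such a collector, not the script from the original slide. It reads /proc directly and uses Python's shutil.disk_usage in place of df. The metric names, the "web01" prefix and the statsd address 127.0.0.1:8125 are assumptions.

```python
import shutil
import socket

def parse_meminfo(text):
    """Parse the content of /proc/meminfo into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values

def collect_metrics(disk_path="/"):
    """Collect a few gauges from /proc and the filesystem (Linux only)."""
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    disk = shutil.disk_usage(disk_path)
    return {
        "load.1min": load1,
        "memory.available_kb": mem.get("MemAvailable", 0),
        "disk.used_mb": disk.used // (1024 * 1024),
    }

def to_statsd_lines(metrics, prefix="web01"):
    """Format metrics as statsd gauges: <name>:<value>|g"""
    return [f"{prefix}.{name}:{value}|g"
            for name, value in sorted(metrics.items())]

def send_metrics(metrics, host="127.0.0.1", port=8125, prefix="web01"):
    """Send each gauge as one UDP datagram to a statsd daemon."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in to_statsd_lines(metrics, prefix):
            sock.sendto(line.encode("ascii"), (host, port))
```

Calling `send_metrics(collect_metrics())` from cron once a minute would be one way to run it. UDP keeps the footprint low because the sender never waits for the daemon.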
How to track Capacity
Store and display time-series
- Signalfx
- Cacti
- Ganglia
- Graphite
- Datadog
- Ruxit
- LogicMonitor
- Sematext
- CoScale
- Riemann
- Prometheus
- Sensu
- Idera
- Bijk
- X-Pack
- vRealize Hyperic HQ
A couple of load testing tips
Load testing tutorials:
https://www.tutorialspoint.com/jmeter
https://www.blazemeter.com/load-testing
Docker app for Grafana:
https://github.com/kamon-io/docker-grafana-graphite
Forecasting
(predicting trends)
Numbers And Finance by SeniorLiving.org
Predict the future?
- Use Context & Math
- Make educated guesses
- Long-term view is generally steady
- Generate estimates to sustain growth
- Use an adjustable process
- Forecast guides autoscaling policies
Ceilings and Historical data
- Daily storage consumption example
- Metric: total available disk space
- Cumulative total provides a historical perspective
- We can predict future needs
- Storage will probably be exhausted where the trend line meets the ceiling
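A small sketch of that prediction, assuming one disk usage sample per day and a straight-line trend. The sample values and the ceiling are made up for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def days_until_ceiling(daily_used, ceiling):
    """Days from the last sample until the trend line reaches the ceiling.

    Returns None when usage is flat or shrinking.
    """
    days = list(range(len(daily_used)))
    slope, intercept = linear_fit(days, daily_used)
    if slope <= 0:
        return None
    day_full = (ceiling - intercept) / slope
    return max(0.0, day_full - days[-1])

if __name__ == "__main__":
    used_gb = [100, 110, 120, 130, 140]  # hypothetical daily samples
    print(days_until_ceiling(used_gb, ceiling=200))
```

A straight line is only the simplest choice. If the history bends, a curve fit is the better tool.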
Curve fitting
- Curve fitting
- Creative & Scientific
- Stay ahead of growth
- Use time-series data
- Forecast by constructing new data points beyond the known
- Reconciliation of what we know and the best fit equation
- Consider context before math
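As an illustration of constructing data points beyond the known ones, here is a plain least-squares polynomial fit written without external libraries. It is a sketch, not the method used in the talk, and the sample series is invented.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def polyfit(xs, ys, degree=2):
    """Least-squares polynomial fit; coefficients, lowest power first."""
    size = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)]
         for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve(a, b)

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x (may lie beyond the known data)."""
    return sum(c * x ** power for power, c in enumerate(coeffs))

if __name__ == "__main__":
    days = [0, 1, 2, 3, 4, 5]
    usage = [2.0, 5.5, 10.0, 15.5, 22.0, 29.5]  # hypothetical series
    print(predict(polyfit(days, usage), 10))
```

The further the forecast runs past the last known point, the more context should outweigh the equation.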
Forecasting Peak-Driven Resource Usage
- Track how the peaks change over time
- Extrapolate from that data to predict future needs
- Identify the server resource ceilings
- Find a relation between resources and application-level work
- Decide if we should scale vertically or horizontally and perform proactive autoscaling
- Fityk is open source software for nonlinear fitting of analytical functions to data.
- Incorporate cfityk scripts into automated curve fitting, like:
    cfityk ricardo-disk.fit
    @0 < ricardo-disk.csv
    guess Quadratic
    fit
    info formula
    quit
Returns the formula:
    4888.18 + 363.063 * x + 8.91132 + -1.55119*x + 0.0660771*x^2
Homepage: https://fityk.nieto.pl/
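Once a formula like this is available, it can be evaluated in a script to see where the fitted curve crosses a ceiling. The sketch below copies the formula from the slide as-is; the ceiling of 10000 and the meaning of x (assumed here to be a sample index such as a day number) are assumptions, since the slide does not state the units.

```python
def fitted(x):
    """The formula reported by `info formula`, copied unchanged."""
    return 4888.18 + 363.063 * x + 8.91132 + -1.55119 * x + 0.0660771 * x ** 2

def first_x_above(ceiling, start=0, limit=10_000):
    """Return the first integer x whose fitted value exceeds the ceiling.

    Scans forward from `start`; returns None if nothing below `limit` does.
    """
    for x in range(start, limit):
        if fitted(x) > ceiling:
            return x
    return None

if __name__ == "__main__":
    # Hypothetical ceiling, in the same unit as the fitted data.
    print(first_x_above(10000))
```

A forward scan is crude but works for any fitted function, not only quadratics.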
Automating Forecasts with fityk & cfityk
Small demo: https://youtube.com/watch?v=EZnyq1Hr_7I
Forecasting with Machine Learning
Seeking SRE: Conversations About Running Production Systems at Scale
Publisher: O'Reilly Media
- Most popular method for curve-fitting in fityk is Levenberg-Marquardt
- ML is also an option for forecasting (book I co-authored)
- Code examples and guides: https://github.com/ricardoamaro/MachineLearning4SRE
Start with Easy Steps
Get Started
- 1. Select a process owner.
- 2. Identify the resources to be measured.
- 3. Measure these resources.
- 4. Compare to maximum capacity.
- 5. Collect workload forecasts.
- 6. Use forecasts for IT resource requirements.
- 7. Map requirements onto existing utilizations.
- 8. Predict when the system will be out of capacity.
- 9. Update forecasts and utilizations.
Set a Goal!
- Two Classes:
○ Load: usually expressed in arrival rate or peak rate of requests hitting the service
- e.g. target 10,000 concurrent authenticated Drupal users
○ Performance: usually expressed in the form of Service Level Objectives
- e.g. 99th percentile of all requests should return in less than 500 ms
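A performance goal like this can be checked mechanically. The sketch below uses the nearest-rank definition of a percentile, which is one of several common definitions; the latency samples are invented.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def meets_slo(latencies_ms, pct=99, threshold_ms=500):
    """True when the pct-th percentile latency is under the threshold."""
    return percentile(latencies_ms, pct) < threshold_ms

if __name__ == "__main__":
    # Hypothetical request latencies in milliseconds.
    latencies = [100] * 98 + [450, 2000]
    print(percentile(latencies, 99), meets_slo(latencies))
```

One slow outlier in a hundred requests does not break this objective, while two do.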
Be proactive
(plan & document ahead)
Picasso drawing with Paloma and Claude at Villa la Galloise, 1953. By Edward Quinn, EdwardQuinn.com.
Capacity Planning Dashboard
- Support your conclusions with metrics in a dashboard
- Both manual scaling and autoscaling decisions should be based on real data
- When to scale?
○ date and time (be alerted if needed)
- How to scale?
○ vertical, horizontal or diagonal scaling
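To answer "when to scale?" with a date, a dashboard row can be derived from the current peak, the total limit and a growth rate. This is a sketch under a linear-growth assumption; the growth rate and the start date are hypothetical, and the 145 and 160 figures are borrowed from the Web row of the example dashboard.

```python
from datetime import date, timedelta

def dashboard_row(current_peak, total_limit, growth_per_day, today):
    """Return (peak %, days left, date the limit is reached).

    Days left and the date are None when there is no growth.
    """
    peak_pct = int(current_peak * 100 // total_limit)
    if growth_per_day <= 0:
        return peak_pct, None, None
    days_left = int((total_limit - current_peak) / growth_per_day)
    return peak_pct, days_left, today + timedelta(days=days_left)

if __name__ == "__main__":
    # 145 of 160 busy calls, growing by an assumed 1.25 calls per day.
    print(dashboard_row(145, 160, 1.25, date(2019, 4, 1)))
```

The returned date is what an alert should be tied to, well before the limit itself is reached.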
(Example) Drupal Cluster Dashboard
type | value | limit/node | ceiling units | limit (total) | current (peak) | peak % | estimated days left
Varnish cache | 28 | 1024 | req/sec | 2048 | 600 | 29% | 830
Web | 31 | 80 | busy calls | 160 | 145 | 90% | 12
Database | 15 | 60 | connections | 120 | 96 | 80% | 36
Storage | 14 | 30 | TB | 30 | 14 | 46% | 21
Conclusions
Drive the system to the appropriate level of risk for the lowest cost.
Questions?
Join us for contribution opportunities
- Mentored Contribution
- First Time Contributor Workshop
- General Contribution
#DrupalContributions
What did you think?
https://events.drupal.org/node/22330
https://www.surveymonkey.com/r/DrupalConSeattle