Beyond The Numbers Baron Schwartz Who Am I? baron@percona.com - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Beyond The Numbers

Baron Schwartz

slide-2
SLIDE 2

Who Am I?

  • baron@percona.com
  • @xaprb
  • linkedin.com/in/xaprb
  • xaprb.com/blog
slide-3
SLIDE 3

Who Am I?

  • Maatkit
  • Innotop
  • Aspersa
  • JavaScript Libraries
  • Percona Toolkit
  • Monitoring Plugins
  • Online Tools
slide-4
SLIDE 4
  • Consulting
  • Support
  • Remote DBA
  • Engineering
  • Conferences & Training
  • Percona Server
  • Percona XtraBackup
  • Percona XtraDB Cluster
  • Percona Toolkit
  • Many More
slide-5
SLIDE 5

Today's Agenda

  • Benchmarks
  • Aggregation and Distributions
  • Performance, Capacity & Utilization
  • Rules of Thumb
  • Queueing Theory and Scalability
slide-6
SLIDE 6

Benchmarks

slide-7
SLIDE 7

What's Missing?

  • Distribution
  • Time Series
  • Response Times
  • Parameters
  • Goals
  • System Specs
slide-8
SLIDE 8

What's Misleading?

  • Logarithmic X-Axis
  • Interpolation
slide-9
SLIDE 9

What's Good?

  • Y-Axis Reaches 0
  • No Fake-Smoothing
slide-10
SLIDE 10

Behind a Single Dot

slide-11
SLIDE 11

Look At All That Data...

slide-12
SLIDE 12

What's With The Grid Lines?!?!?

slide-13
SLIDE 13

Better Benchmarks

What does an ideal benchmark report look like?

slide-14
SLIDE 14

Clear Benchmark Goals

  • Validating hardware configuration
  • Comparing two systems
  • Checking for regressions
  • Capacity planning
  • Reproducing bad behavior to solve it
  • Stress-testing to find bottlenecks
slide-15
SLIDE 15

Hardware and Software

  • Specs for CPU, disk, memory, network
  • Software versions (OS, SUT, benchmark)
  • Filesystem, RAID controller
  • Disk queue scheduler
slide-16
SLIDE 16

Presenting Results

  • Ideally, make raw results available
  • Include metrics from OS (CPU, RAM, IO, network)
  • Generate some plots to summarize
  • This is where the rubber meets the road!
slide-17
SLIDE 17

Better Aggregate Measures

  • Average
  • Percentiles
  • 95th
  • 99th
  • Maximum
  • Observation Duration
  • Question: how bad can 95th percentile be?
slide-18
SLIDE 18

More Aggregate Measures

  • Median (50th Percentile)
  • Standard Deviation
  • Index of Dispersion
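
As a rough illustration of the measures on the last two slides, here is a minimal Python sketch. The sample data is invented for this example, the nearest-rank percentile is only one of several common definitions, and the index of dispersion is taken as the variance-to-mean ratio.

```python
import math
import statistics

def percentile(sorted_values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]

def summarize(response_times):
    """Compute the aggregate measures listed on the slides for a list of response times."""
    values = sorted(response_times)
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values)
    return {
        "average": mean,
        "median": percentile(values, 50),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
        "maximum": values[-1],
        "stddev": math.sqrt(variance),
        "index_of_dispersion": variance / mean,  # variance-to-mean ratio
    }

# Hypothetical sample (milliseconds): 95 fast requests and 5 very slow ones.
sample = [1.0] * 95 + [500.0] * 5
summary = summarize(sample)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With this made-up sample the 95th percentile looks healthy even though one request in twenty is hundreds of times slower, which is one way to read the question about how bad the 95th percentile can be.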
slide-19
SLIDE 19

Better...

slide-20
SLIDE 20

Better Still...

slide-21
SLIDE 21

Keep It Coming...

slide-22
SLIDE 22

Throughput AND Response Time

slide-23
SLIDE 23

Performance

  • What is Performance?
  • Two Metrics
  • Response Time (time per task)
  • Throughput (tasks per time)
  • They're not reciprocals
  • More on this later
slide-24
SLIDE 24

What Performance Isn't

  • CPU Usage
  • Load Average
  • Other metrics of resource consumption
slide-25
SLIDE 25

Performance

  • I often focus on response time
  • It represents user experience
  • Throughput indicates capacity rather than performance
  • For benchmarking, throughput is primary
slide-26
SLIDE 26

Utilization

  • The portion of time during which the resource is busy
  • i.e. there is at least one thing in progress
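
A small Python sketch of that definition, assuming we have hypothetical (start, end) timestamps for each request. Overlapping requests are counted once, because the resource is simply "busy" whenever at least one thing is in progress.

```python
def utilization(intervals, window_start, window_end):
    """Fraction of the window during which at least one request was in progress."""
    busy = 0.0
    counted_until = window_start
    for start, end in sorted(intervals):
        start = max(start, counted_until)  # skip time that is already counted as busy
        end = min(end, window_end)
        if end > start:
            busy += end - start
            counted_until = end
    return busy / (window_end - window_start)

# Hypothetical requests over a 10-second window; the first two overlap.
requests = [(0, 4), (2, 6), (8, 9)]
busy_fraction = utilization(requests, 0, 10)
print(busy_fraction)
```

Summing the individual durations here would give 9 seconds of work in a 10-second window, but the resource was only busy for 7 of them.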
slide-27
SLIDE 27

Utilization is Confusing

  • Be very careful with tools that report utilization
  • From the Linux iostat man page:
  • “%util: Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.”
  • Can you parse that? Is it true?
slide-28
SLIDE 28

Capacity

  • What is Capacity?
slide-29
SLIDE 29

Capacity

slide-30
SLIDE 30

Capacity – My Definition

Capacity is the maximum throughput ... at achievable concurrency ... with acceptable performance ... as defined by response time ... meeting specified constraints ... over specified observation intervals.

slide-31
SLIDE 31

Capacity Example

  • What is the capacity of the system at a concurrency of 32, with 10-second 95th-percentile response time not to exceed 2ms, over a 60-minute duration?
  • To determine this, we need goal-seeking benchmark software
  • Most benchmark software can't do this
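
One way to picture "goal-seeking" is a search over offered load. The sketch below is an assumption-laden toy, not a real benchmark tool: `modeled_p95` stands in for an actual benchmark run and uses an M/M/1 model (response time exponential with rate mu minus lambda), `SERVICE_RATE` is invented, and the concurrency and observation-interval constraints from the example are ignored.

```python
import math

SERVICE_RATE = 5000.0  # hypothetical: requests/second the modeled system can complete

def modeled_p95(arrival_rate):
    """Stand-in for a benchmark run: 95th-percentile response time in seconds
    for an M/M/1 queue at the given arrival rate."""
    if arrival_rate >= SERVICE_RATE:
        return math.inf
    return math.log(20) / (SERVICE_RATE - arrival_rate)

def seek_capacity(measure_p95, limit_seconds, low=0.0, high=SERVICE_RATE, tolerance=0.01):
    """Bisect for the highest arrival rate whose p95 stays within the limit.
    Assumes p95 only gets worse as the arrival rate increases."""
    while high - low > tolerance:
        mid = (low + high) / 2
        if measure_p95(mid) <= limit_seconds:
            low = mid   # goal met: try pushing harder
        else:
            high = mid  # goal missed: back off
    return low

capacity = seek_capacity(modeled_p95, limit_seconds=0.002)
print(f"max throughput with p95 <= 2ms: about {capacity:.0f} requests/second")
```

A real tool would replace `modeled_p95` with an actual measurement at each step, which is what makes this kind of benchmark slow and why few tools attempt it.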
slide-32
SLIDE 32

Benchmarks, etc. Recap

  • Most benchmarks reveal very little
  • Benchmark reports reveal even less
  • It's good to go beyond the surface
slide-33
SLIDE 33

Amdahl's Law

  • “The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program.” - Wikipedia
  • It's basically a law of diminishing returns.
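
The usual statement of Amdahl's Law as a formula is speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction and n the number of processors. A short Python sketch, with the 95% figure chosen arbitrarily for illustration:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's Law for a given parallelizable fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# Hypothetical program that is 95% parallelizable.
for n in (1, 2, 8, 64, 1024):
    print(f"{n:5d} processors: {amdahl_speedup(0.95, n):6.2f}x")
```

However many processors are added, the speedup in this example can never reach 20x, because the 5% sequential part always remains.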
slide-34
SLIDE 34

Should I Defragment My Disk?

  • Method 1: Google “defragment”
  • Method 2: Try it and see
  • Method 3: Measure if the disk is a bottleneck
slide-35
SLIDE 35

Spolsky -vs- Millsap

slide-36
SLIDE 36

Spolsky -vs- Millsap

slide-37
SLIDE 37

Amdahl's Law

  • Don't try to optimize little things.
slide-38
SLIDE 38

Little's Law

  • N = XR
  • That is,
  • Concurrency = Throughput * Response Time
  • This holds regardless of queueing, arrival rate distribution, response time distribution, etc.
slide-39
SLIDE 39

Little's Law Example

  • If disk IOs average 4ms...
  • And there are 280 IOs per second...
  • Then the disk's average concurrency is:
  • N = 280 * .004
  • N = 1.12
  • Do you believe this?
  • When might it not be true?
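
The arithmetic on this slide, written as a tiny Python sketch. The function names are made up for this example; the numbers are the ones from the slide.

```python
def littles_concurrency(throughput_per_sec, response_time_sec):
    """Little's Law: N = X * R."""
    return throughput_per_sec * response_time_sec

def littles_response_time(concurrency, throughput_per_sec):
    """Little's Law rearranged: R = N / X."""
    return concurrency / throughput_per_sec

# 280 IOs per second at an average of 4ms each.
disk_concurrency = littles_concurrency(280, 0.004)
print(f"average concurrency: {disk_concurrency:.2f}")
```

Note that both inputs must be averages over the same observation interval for the result to mean anything.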
slide-40
SLIDE 40

Little's Law Example #2

  • If disk utilization is 98%
  • And there are 280 IOs per second
  • What do we know?
slide-41
SLIDE 41

Utilization Law

  • U = SX
  • Also independent of distributions, etc...
  • That is,
  • Utilization = Service Time * Throughput
  • Utilization = 98% and Throughput = 280
  • S = U/X
  • Service Time = .98 / 280 = .0035
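
The same calculation as a short Python sketch (function names are invented for this example):

```python
def service_time(utilization, throughput_per_sec):
    """Utilization Law rearranged: S = U / X."""
    return utilization / throughput_per_sec

def utilization_from(service_time_sec, throughput_per_sec):
    """Utilization Law: U = S * X."""
    return service_time_sec * throughput_per_sec

# 98% utilization at 280 IOs per second.
s = service_time(0.98, 280)
print(f"service time: {s * 1000:.1f} ms")
```

So each IO keeps the disk busy for about 3.5ms on average; by itself this says nothing about how long requests spend waiting in a queue.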
slide-42
SLIDE 42

Queueing Theory

  • How can we predict the amount of queueing in a system?
  • How can we predict its response times?
  • How can we predict capacity?
slide-43
SLIDE 43

Erlang Queueing

  • Erlang's formulas model the probability of queueing for a given arrival rate, service time, and number of servers.
  • A “server” is anything capable of serving a request.
  • CPUs
  • Disks
slide-44
SLIDE 44

CPU -vs- Disk Queueing

  • Scenario: 4-CPU, 4-disk (RAID0) server
  • Thought experiment:
  • How do processes queue for CPU?
  • How do I/O requests queue on disks?
slide-45
SLIDE 45

Notation

  • Typically see something like M/M/1
  • Each letter is a placeholder in A/S/n
  • A = Arrival distribution
  • S = Service-time distribution
  • n = Number of servers
  • A and S can be one of:
  • Markov
  • Deterministic
  • General
slide-46
SLIDE 46

CPUs -vs- Disks

  • CPUs: M/M/4
  • Disks: 4 x {M/M/1}
slide-47
SLIDE 47

M/M/1 Queueing

cmg.org

slide-48
SLIDE 48

M/M/n Queueing

cmg.org

slide-49
SLIDE 49

Erlang C Function

  • M/M/n queueing is modeled by Erlang C
  • See http://en.wikipedia.org/wiki/Erlang_(unit)
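
For reference, a minimal Python sketch of the standard Erlang C formula, used here to compare the two models from the CPU -vs- Disk slides: one shared queue feeding 4 servers (M/M/4) against 4 separate single-server queues (M/M/1 each). The service time and arrival rate are arbitrary illustration values, and the result is only as good as the Markovian assumptions behind it.

```python
from math import factorial

def erlang_c(servers, offered_load):
    """Probability that an arriving request has to queue in an M/M/n system.
    offered_load = arrival rate * service time, and must be below the server count."""
    if offered_load >= servers:
        raise ValueError("unstable: offered load must be below the server count")
    top = offered_load ** servers / factorial(servers) * servers / (servers - offered_load)
    bottom = sum(offered_load ** k / factorial(k) for k in range(servers)) + top
    return top / bottom

def mean_response_time(servers, arrival_rate, service_time):
    """Mean response time (service plus queueing) for an M/M/n system."""
    offered_load = arrival_rate * service_time
    wait = erlang_c(servers, offered_load) * service_time / (servers - offered_load)
    return service_time + wait

# Hypothetical load: service time of 1 time unit, 3 requests per time unit in total.
shared_queue = mean_response_time(4, 3.0, 1.0)      # M/M/4
separate_queues = mean_response_time(1, 0.75, 1.0)  # each of 4 x M/M/1 gets a quarter
print(f"M/M/4: {shared_queue:.2f}   4 x M/M/1: {separate_queues:.2f}")
```

At the same 75% utilization, the shared queue gives a much lower mean response time than the four separate queues in this model.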
slide-50
SLIDE 50

What's Wrong With Erlang C?

  • You must validate your arrival times.
  • You must validate your service times.
  • The equation is hard to work with.
  • In practice, it's hard to use Erlang C.
slide-51
SLIDE 51

Scalability

  • Queueing causes non-linear scaling.
  • But first, let's talk about linearity.
slide-52
SLIDE 52

System Scalability

[Plot axes: Concurrency, Throughput]

Why?

slide-53
SLIDE 53

Universal Scalability Law

[Plot axes: Concurrency, Throughput. Curves: Linear, Amdahl, USL]
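
The three curves named on this slide can be sketched with Gunther's Universal Scalability Law, C(N) = N / (1 + sigma (N - 1) + kappa N (N - 1)). With kappa = 0 it reduces to Amdahl scaling, and with both coefficients at 0 it is linear. The coefficient values below are invented for illustration, not fitted to any real system.

```python
import math

def usl_relative_capacity(n, sigma, kappa):
    """Relative throughput at concurrency n under the USL.
    sigma: contention (serialization) coefficient; kappa: coherency (crosstalk) coefficient."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

def usl_peak_concurrency(sigma, kappa):
    """Concurrency at which USL throughput peaks (requires kappa > 0)."""
    return math.sqrt((1 - sigma) / kappa)

SIGMA, KAPPA = 0.05, 0.001  # hypothetical coefficients
for n in (1, 8, 32, 64):
    linear = usl_relative_capacity(n, 0.0, 0.0)
    amdahl = usl_relative_capacity(n, SIGMA, 0.0)
    usl = usl_relative_capacity(n, SIGMA, KAPPA)
    print(f"N={n:3d}  linear={linear:6.2f}  amdahl={amdahl:6.2f}  usl={usl:6.2f}")
```

Unlike the Amdahl curve, which only flattens, the USL curve reaches a maximum and then declines as concurrency keeps rising.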

slide-54
SLIDE 54

Amdahl Scalability

slide-55
SLIDE 55

USL Scalability

slide-56
SLIDE 56

USL Scalability Modeling

slide-57
SLIDE 57

USL Performance Modeling

slide-58
SLIDE 58

Scalability Limitations

  • Locks
  • Synchronization points
  • Shared resources
  • Duplicated data to be kept in sync
  • Weakest-link problems
slide-59
SLIDE 59

RAID10 On EBS

  • Which is faster?
  • RAID 10 over 10 EBS volumes
  • RAID 10 over 20 EBS volumes
  • Hint: http://goo.gl/Xm92Y
  • Also, http://goo.gl/fAEIL
slide-60
SLIDE 60

Debunking “Linear”

  • Ask to see the actual numbers.
  • They shouldn't be rounded off suspiciously.
  • They must be truly linear.
  • They must intersect the point (0, 0).
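
A quick way to apply these checks to a table of numbers, sketched in Python. If scaling is truly linear through (0, 0), throughput divided by concurrency is the same at every point. The data and the 5% tolerance are made up for this example.

```python
def per_unit_throughput(points):
    """Throughput per unit of concurrency for each (concurrency, throughput) point."""
    return [throughput / concurrency for concurrency, throughput in points]

def looks_linear(points, tolerance=0.05):
    """True if per-unit throughput varies by no more than the tolerance (an arbitrary 5% here)."""
    ratios = per_unit_throughput(points)
    return max(ratios) - min(ratios) <= tolerance * max(ratios)

# Hypothetical "linear scaling" claim: (concurrency, throughput) pairs.
claimed = [(1, 1000), (2, 1950), (4, 3700), (8, 6600)]
print(per_unit_throughput(claimed))
print("linear?", looks_linear(claimed))
```

Plotted on a chart these points could pass for a straight line, but the per-unit numbers fall steadily from 1000 to 825.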
slide-61
SLIDE 61

Debunking, Example #1

slide-62
SLIDE 62

Is it Linear?

slide-63
SLIDE 63

It's Not Linear

slide-64
SLIDE 64

Resources

  • Naomi Robbins' Blog
  • http://blogs.forbes.com/naomirobbins/
  • Percona White Papers
  • http://www.percona.com/
  • Neil J. Gunther
  • Guerrilla Capacity Planning
  • http://www.contextneeded.com/
slide-65
SLIDE 65

Questions?

slide-66
SLIDE 66

baron@percona.com @xaprb