Experimental Algorithmics
Dagstuhl Seminar: Empirical Evaluation for Graph Drawing, January 2015
Catherine C. McGeoch, Amherst College and D-Wave Systems Inc.
Algorithms and Statistics Mismatch
Algorithm-type question: Is the algorithm cost C(n) a member of the set O(n log n)? An upper bound on the leading term of the cost function, in terms of problem size n.
Statistics-type answer: With certainty at least 95%, C(100) < C(200), assuming cost is normally distributed and variance is constant in n.
Categories of Statistics
- Confirmatory: Assume a model, form a hypothesis about parameters of the model, measure confidence in the conclusions.
- Descriptive: Find a concise summary of data: location, spread, distribution, trend lines.
- Exploratory (EDA): Find patterns and relationships, generate hypotheses & conjectures about the model.
- Graphical EDA: Emphasizes visualization tools.
Experimental Algorithmics: Mostly Exploratory
- Comparison: bigger/smaller, location/spread. Confirmatory + descriptive methods.
- Trend lines: apply a concise model; interpolate. Confirmatory + descriptive methods.
- Modeling: find the ``true'' underlying model; extrapolate. Exploratory + graphical methods.
Find the Model
First Fit algorithm for Bin Packing (ccm 1989). e(u,N) = empty space in bins, as a function of item size u, item count N. Contrary to previous conjectures: e(u, N) is not linear in u for large N. e(u, N) is not linear in N for all u < 1.
Find the Model
Min cuts in randomly weighted grid graphs. The mean occurs with probability zero. The result: discovery of bimodality, and an explanation of the observed mean runtime.
Find the Model
Paired data points (x, y). Correlation statistic r: .83 < r < .87, a fairly strong positive correlation. But the model is quite different in each case!
Algorithmic Experiments: The Good News
Near total control of the experiment. Tons of data. Fairly simple causal models (input-output). Relatively strong theory (for runtime analysis anyway).
Algorithmic Experiments: The Bad News
High expectations from data analysis. Unusual and complex relationships. Standard statistical techniques do not address common questions.
Methods for Algorithmic Experiments*
*Emphasizing descriptive, exploratory & graphical techniques. Focus on exploratory (vs designed) experiments.
- I. How not to do it (anonymized).
- II. One methodological question: What should I measure?
How Not to Do It 1
``As conjectured, algorithm cost is asymptotically O(n), except for a slight increase at large n. [1]''
[1] The CPU time increase is probably due to paging at large problem sizes.
CPU Times Are Noisy
One program, 9 identical instances (ordered by size), run on 12 common platforms. (11 other platforms couldn't run all instances.) src: DIMACS TSP Challenge
What Not To Do 1
- Don't choose the wrong cost metric: CPU time ≠ dominant operation.
- Don't fit regression lines to CPU times: step functions, increasing constants, skew, bimodality, etc. violate the model assumptions...
- Don't naively extrapolate from regression fits to asymptopia.
How Not to Do It 2
My experiments, using Culberson's Iterated Greedy code for graph coloring. Look at: variations in the Vertex rule. Inputs: from a DIMACS Challenge (1993).
G = (input graph); C = (initial coloring)
Loop:
    Reorder colors (Color rule)
    Reorder vertices (Vertex rule)
    C = GreedyColor[G]    // save best
    C = Kempe reduction (k iterations)
    C = Revert rule (r iterations)
until Stopping rule (r, max)
Report C*, the best coloring found.
DIMACS Challenge Graph Coloring Inputs. (REG) = register interference graphs from a compiler application. Optimal colorings not known.
Culberson's Iterated Greedy Algorithm, scores for one input.
Compare five Vertex rules. Score = color count + `niceness' term. Report scores after every 100 iterations.
Two rules look like random walks. Three rules converge after 1 iteration. Why do they converge so quickly?
Why do they converge so quickly?
- Because the REG graphs are trivial to solve.
- Optimality of one pass of simple greedy can be verified by eye!
Adjacency matrix of a REG input: two cliques chained together.
DIMACS Challenge Graph Coloring Inputs. (REG) = register interference graphs from a compiler application. Optimal colorings not known... because nobody tried running the simple greedy algorithm on these problems.
How Not to Do It 2
- Don't assume that good performance is due to your wonderful new algorithm. It could be due to easy inputs.
- Don't overlook simple/baseline algorithms.
- Don't just look at the final result; look at the mechanics.
- Don't pull parameters (iteration count) out of your hat. My runtimes were 700x bigger than necessary.
How Not to Do It 3
The First Fit Heuristic for Bin Packing.
- FF: Consider weights in order of arrival; pack each into the first (leftmost) bin that can contain it.
- L = a list of n weights uniformly distributed on (0, u).
How well does FF pack them into unit-capacity bins, as a function of u, asymptotically in n?
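A minimal Python sketch of the First Fit rule and one trial of this kind of experiment (the function and variable names here are illustrative, not from the slides):

import random

def first_fit(weights):
    # Pack weights into unit-capacity bins: each weight goes into the
    # first (leftmost) open bin that can still hold it, else a new bin.
    bins = []                          # remaining capacity of each open bin
    for w in weights:
        for i, cap in enumerate(bins):
            if w <= cap:
                bins[i] = cap - w
                break
        else:
            bins.append(1.0 - w)
    return len(bins)

n, u = 1000, 0.5                       # one trial: n weights uniform on (0, u)
ws = [random.uniform(0, u) for _ in range(n)]
b = first_fit(ws)
print("bins:", b, "bins/weightsum:", b / sum(ws), "empty space:", b - sum(ws))

The naive scan is O(n^2) per packing; a large-n experiment would locate the leftmost feasible bin with a tree-based search, but the scan is enough for a small sketch.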
First Fit Bin Packing
Experiments circa 1978: pack n=1000 weights, u = .25, .5, .75, 1; 20 trials each. Measure B: bins used in the packing. Cost = Bins / OPT; Cost ≤ Bins / Weights (since OPT ≥ weight sum). Known: FF is optimal at u=1. Conjecture: FF is optimal for all u.
First Fit Bin Packing
Ten years later: n=100,000 weights, u = .2, .22, ..., .98, 1.0. Measure empty space: E = B - Weightsum. Observation: empty space is not linear in u! The ``peak'' grows as n increases; the valley disappears. New conjecture: FF is optimal at u=1, but nowhere else.
How Not To Do It 3
- Don't assume you are looking at asymptopia.
- Don't look at only one cost metric. Here, a difference gives a better view than a ratio: Cost = Bins / Weights; EmptySpace = Bins - Weights.
Don’t Be That Guy: What to Avoid
- Looking at the wrong metric.
- Looking at just one metric.
- Looking at summarized instead of raw data.
- Reasoning incorrectly about cause and effect.
- Thinking the model is a true description of the data.
Use exploratory experiments to build understanding.
Experimental Algorithm Evaluation
- I. How not to do it.
- II. What should I measure?
Matching goals to choices. Reducing design complexity. Variance reduction techniques.
Factors that affect experimental outcomes in algorithmics
- Metrics. Quantities that are measured as performance indicators: e.g. time, solution quality.
- Input. Category, provenance, size n, more.
- Algorithm/code. Data structure, tuning, rules.
- Environment. Language, compiler, memory hierarchy, etc.
Today's focus: metrics.
What Should I Measure?
Theory: the ocean is at most O(n^3) feet deep when n feet from shore. Practice: New York Harbor is 24.1532 feet deep at its entrance. The Atlantic Ocean is 12,881.82892 feet deep in one spot. It is 102.03901 feet deep in another...
What Should I Measure?
Match the performance indicator to the research goals. There is usually a tradeoff: Theory = accurate (always true); Practice = precise (many decimal places).
Flavors of Experiments
Field experiments:
- Observe real-world phenomena
- Describe results
- Classify outcomes
Laboratory experiments:
- Isolate components
- Manipulate parameters
- Cause/effect
- Build models
Levels of Instantiation
Algorithm → Program → Process

Quicksort A[lo, hi]:
    x is an element of A
    Partition A around x
    Recur to left of x
    Recur to right of x

void Qsort(A, lo, hi) {
    if (lo >= hi) return;
    int p = Partition(A, lo, hi);
    Qsort(A, lo, p-1);
    Qsort(A, p+1, hi);
}

... Paradigm ... Algorithm ... Data Structures ... Source ... Object ... Process ...
What Should I Measure?
Lab experiment: isolated components of the algorithm, abstract costs, simple generated inputs. Field experiment: instantiated code, whole costs, realistic inputs.
Accurate = dominant operation counts. Precise = CPU times.
Time Performance Indicators
- Theory's dominant op
- Data structure ops
- Function calls
- Main loop iterations
- Code block counts
- CPU time
- Memory accesses
- Cache & page misses
- Wall clock time
- ...
Experimenters have many choices!
How to Choose a PI
- Enough precision to distinguish outcomes among competing algorithms.
- Choose PIs that are directly comparable across several algorithms.
- Lab: the PI should isolate the interesting factors and ignore irrelevant factors.
- Field: the PI should measure the whole cost.
- Choose indicators that match the literature.
- OK to use more than one.
Multiple Performance Indicators
x = number of edge crossings; y = aspect ratio; z = user task score. Which algorithm is best?
Multiple Performance Indicators
Two strategies for reducing dimensionality in data:
1. Transform numerical data to categorical data (e.g. low / med / high) and stratify.
2. Project to a lower dimension.
Categorization+Stratification
- Two algorithms: red and blue.
- Each has a solid cost and a dashed cost.
- Which is better, as f(x)?
Categorization + Stratification
- In areas where the dashed cost is below the threshold, which algorithm has the best solid curve?
Project to a Lower Dimension
a = A(x,y,z); b = B(x,y,z); c = C(x,y,z). Which is lowest: a, b, or c?
Variance Reduction Techniques
How to create experiments that produce analysis-friendly data: less variance in the data, in the same number of trials.
What is Analysis-Friendly?
Is the mean of y asymptotically positive or negative?
Case Study: Self-organizing Sequential Search
Sequence of m random requests for items 1..n, with request probabilities p1, p2, ... Organizing rules M and T. Cost = location of the requested item in the list. What is the expected difference in cost between rules M and T?
L: 3, 5, 7, 1, 2, 4, 9, 8, 6
Request r = 2 (probability p2); Cost = 5.
M (Move to Front rule): 2, 3, 5, 7, 1, 4, 9, 8, 6
T (Transpose rule): 3, 5, 7, 2, 1, 4, 9, 8, 6
Comparison of MTF and TR (Transpose).
VRT: Common Random Numbers
- If measuring the difference of two algorithms, and
- outcomes are positively correlated with respect to the input, then
- compare using common inputs and paired differences.
When comparing M & T, use the same random request list.
D[L] = M[L] - T[L]
E[D] = E[M - T] = E[M] - E[T]
Var(D) = Var(M) + Var(T) - 2 Cov(M, T)
If the covariance is positive, the variance of D is smaller than that of unpaired differences.
Common Random Numbers: cost difference D = MTF - TR. St. Dev. = 7.50 vs. St. Dev. = 5.03. n=20, m=1000, t=50 random trials, P = Zipf's Law.
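A Python sketch of the common-random-numbers comparison (illustrative names; parameters follow the caption above, though the original experiment may differ in details):

import random
from statistics import mean, stdev

def search_cost(start, rule, requests):
    # Total access cost of a request sequence under a self-organizing rule:
    # 'MTF' moves the requested item to the front, 'TR' swaps it one position forward.
    lst = list(start)
    total = 0
    for r in requests:
        i = lst.index(r)
        total += i + 1                       # cost = position in the list
        if rule == 'MTF':
            lst.insert(0, lst.pop(i))
        elif rule == 'TR' and i > 0:
            lst[i-1], lst[i] = lst[i], lst[i-1]
    return total

n, m, trials = 20, 1000, 50
items = list(range(1, n + 1))
zipf = [1.0 / k for k in items]              # Zipf's-law request weights
diffs = []
for _ in range(trials):
    reqs = random.choices(items, weights=zipf, k=m)   # one common request list...
    start = random.sample(items, n)                   # ...and one common start order
    diffs.append(search_cost(start, 'MTF', reqs) - search_cost(start, 'TR', reqs))
print("paired differences: mean", mean(diffs), "st.dev", stdev(diffs))

Drawing independent request lists for the two rules would estimate the same mean difference, but with a larger standard deviation; that contrast is the point of the slide.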
VRT: Control Variates
If cost C is correlated with a variate Y, and the expectation E[Y] = y is known, then measure C and look at the difference D = C - Y. Y is a ``control variate'' for C.
Cov(C, Y) > 0; E[Y] = y (known); E[C] = c (unknown).
E[C - Y] = c - y
Var[C - Y] = Var(C) + Var(Y) - 2 Cov(C, Y)
Measure Di = Ci - Yi; estimate E[C] with the mean of the Di plus y.
Control Variates
The cost of request r in the Transpose list is positively correlated with the cost of r in the Optimal list. Cost of r in the Optimal list = r. Measure D = T - r for each request. Estimate E[T] as (mean of D) + E[r]. r is a control variate for T.
T: 3, 6, 2, 5, 1, 4, 7
Opt: 1, 2, 3, 4, 5, 6, 7
Request r = 2: Cost = 3; D = 3 - 2 = 1.
Figure: cost of TR, raw (T) vs. control-variate adjusted (D + r); n=20, m=1000, t=50, P = Zipf's Law.
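A small Python sketch of this control-variate adjustment for the Transpose rule (illustrative names; the control variate is Y = r, whose expectation under the request distribution is known exactly):

import random
from statistics import mean, stdev

n, m = 20, 1000
items = list(range(1, n + 1))
zipf = [1.0 / k for k in items]
total = sum(zipf)
p = [w / total for w in zipf]                # normalized request probabilities
E_r = sum(r * pr for r, pr in zip(items, p)) # known expectation of the control variate Y = r

lst = random.sample(items, n)                # random initial list, Transpose rule
raw, adjusted = [], []
for _ in range(m):
    r = random.choices(items, weights=p, k=1)[0]
    i = lst.index(r)
    cost = i + 1                             # C: cost of this request under TR
    raw.append(cost)
    adjusted.append(cost - r + E_r)          # (C - Y) + E[Y]: same mean, less variance
    if i > 0:                                # Transpose: swap r one position forward
        lst[i-1], lst[i] = lst[i], lst[i-1]
print("raw:", mean(raw), stdev(raw), " adjusted:", mean(adjusted), stdev(adjusted))

When Cov(C, Y) > 0, the adjusted values have the same expectation as the raw costs but a smaller spread.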
FF Bin Packing
- Consider weights in order of arrival; pack each into the first (leftmost) bin that can contain it.
Given a list of n weights uniform on (0, u), pack into unit-sized bins using the First Fit heuristic. What is the expected cost (bins used) of FF(n, u)?
FF Control Variates
Bins = Weight Sum + Empty Space. Bins is positively correlated with the weight sum. E[W] = n*u/2. Measure empty space; estimate mean bins with E[B] = E[S] + n*u/2.
The weight sum is a control variate for the number of bins. Empty space S = B - W has less variance than B. S + n*u/2 is friendlier to analyze than B.
FF Control Variates Bins vs Empty Space
Sources of Control Variates
- Optimal solution (Transpose).
- Lower bound on the solution (bin packing).
- Initial random solution in an iterative algorithm.
- Solution to a simpler problem.
- ...
1. Is the variate X correlated with the solution cost C? (Check it experimentally.)
2. Do I know the expectation x of this variate X? If YES: measure the difference D = C - X, then substitute x = E[X] to estimate the mean cost: E[C] = E[D] + x.
VRT: Conditional Monte Carlo
- If cost C depends on another variate Z, and
- the conditional expectation of C with respect to Z can be calculated directly, then
- generate Z and calculate c = E[C | Z].
- E[c] = E[C].
MTF: Conditional Monte Carlo
- The cost of request r at time t depends on the list order L.
- The expected cost of the list, f(L), can be calculated.
- Generate the list M, then calculate f(M).
M: 3, 5, 2, 1, 4
Instead of recording one request (r = 2, cost = 3), calculate the expected cost over all possible requests for this list:
E[c] = 1·p3 + 2·p5 + 3·p2 + 4·p1 + 5·p4
Sequential Search Cost With/Without Variance Reduction
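A minimal Python sketch of conditional Monte Carlo for Move-to-Front (illustrative names): record the exact expected cost of each list state instead of the cost of the single sampled request, while the sampled requests still drive the list's evolution.

import random
from statistics import mean, stdev

n, m = 20, 1000
items = list(range(1, n + 1))
zipf = [1.0 / k for k in items]
total = sum(zipf)
p = [w / total for w in zipf]                    # request probabilities p_1..p_n

lst = random.sample(items, n)
sampled, conditional = [], []
for _ in range(m):
    # E[c | current list] = sum over positions i of (i+1) * p(item at position i)
    conditional.append(sum((i + 1) * p[item - 1] for i, item in enumerate(lst)))
    r = random.choices(items, weights=p, k=1)[0] # sample one request to evolve the list
    i = lst.index(r)
    sampled.append(i + 1)                        # the usual sampled cost, for comparison
    lst.insert(0, lst.pop(i))                    # Move to Front
print("sampled:    ", mean(sampled), stdev(sampled))
print("conditional:", mean(conditional), stdev(conditional))

Both columns estimate the same mean search cost; the conditional estimates typically show much less run-to-run noise.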
Ideas for Conditional Monte Carlo
Iterative, data-structure, and random-walk algorithms: generate a random state S at time t. Instead of sampling one random cost in state S, compute the expected cost of S.
Splitting: if E[S] can't be calculated, sample the cost many times from S.
Statistical Methods for Algorithm Evaluation
- I. How not to do it.
- II. What should I measure?
- Matching metrics to experimental goals.
- Reducing dimensionality of multiple objectives.
- Variance reduction techniques.
Some Papers on Methodology
- CCM, Analyzing Algorithms by Simulation: Variance Reduction Techniques and Simulation Speedups, ACM Computing Surveys, June 1992.
- D.S. Johnson, A Theoretician's Guide to the Experimental Analysis of Algorithms, in Data Structures, Near Neighbor Searches, and Methodology: Fifth and Sixth DIMACS Implementation Challenges, Goldwasser, Johnson, McGeoch, eds. (Or just google it.)
Some Useful Books
- Cohen, Empirical Methods for Artificial Intelligence
- Chambers et al., Graphical Methods for Data Analysis
- Bartz-Beielstein et al. (eds), Experimental Methods for the Analysis of Optimization Algorithms, Springer 2010
- Müller-Hannemann and Schirra (eds), Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice, Springer 2010 (Dagstuhl Meeting)
- McGeoch, A Guide to Experimental Algorithmics, Cambridge University Press 2012
Theory vs Practice
- In theory there is no difference between theory and practice. In practice there is.
- It may be alright in practice, but it will never work in theory.
- There is nothing so practical as a good theory.