

SLIDE 1

Experimental Algorithmics

Dagstuhl Seminar: Empirical Evaluation for Graph Drawing, January 2015

Catherine C. McGeoch, Amherst College and D-Wave Systems Inc.

SLIDE 2

Algorithms and Statistics Mismatch

Algorithm-type question: Is the algorithm cost C(n) a member of the set O(n log n)?
  • an upper bound,
  • on the leading term of the cost function,
  • on problem size n.

Statistics-type answer: With certainty at least 95%, C(100) < C(200), assuming cost is normally distributed and variance is constant in n.
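A minimal sketch, in Python, of what such a statistics-type answer looks like in practice; the costs, sample sizes, and distribution below are synthetic assumptions for illustration only (requires NumPy and SciPy):

# Sketch: a "statistics-type" comparison of C(100) and C(200) on synthetic data,
# assuming costs are roughly normal with constant variance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
c100 = 50 + rng.normal(0, 5, size=30)   # hypothetical measured costs at n = 100
c200 = 60 + rng.normal(0, 5, size=30)   # hypothetical measured costs at n = 200

t, p_two_sided = stats.ttest_ind(c100, c200, equal_var=True)
p_one_sided = p_two_sided / 2 if t < 0 else 1 - p_two_sided / 2   # one-sided: C(100) < C(200)
print("mean C(100) =", round(c100.mean(), 2), " mean C(200) =", round(c200.mean(), 2))
print("one-sided p-value for C(100) < C(200):", p_one_sided)
# p < 0.05 corresponds to the slide's "with certainty at least 95%".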

SLIDE 3

Categories of Statistics

l Confirmatory: Assume a model, form a hypothesis

about parameters of the model, measure confidence in the conclusions.

l Descriptive: Find a concise summary of data:

location, spread, distribution, trend lines.

l Exploratory (EDA): Find patterns and relationships,

generate hypotheses & conjectures about the model.

l Graphical EDA: Emphasizes visualization tools.

SLIDE 4

Experimental Algorithmics: Mostly Exploratory

[Three scatter plots of the same data set: x from 1 to 6, y from 0.8 to 2.2]

Comparison: Bigger/smaller, location/spread. Confirmatory + descriptive methods.
Trend lines: Apply a concise model. Interpolate. Confirmatory + descriptive methods.
Modeling: Find the ``true'' underlying model. Extrapolate. Exploratory + graphical methods.

SLIDE 5

Find the Model

First Fit algorithm for Bin Packing (ccm 1989). e(u, N) = empty space in bins, as a function of the item-size bound u and item count N.
Contrary to previous conjectures:
  • e(u, N) is not linear in u for large N.
  • e(u, N) is not linear in N for all u < 1.

SLIDE 6

Find the Model

Min cuts in randomly weighted grid graphs. The mean occurs with probability zero. The result: discovery of bimodality, and an explanation of the observed mean runtime.

SLIDE 7

Find the Model

Paired data points (x, y). Correlation statistic r: in each data set, .83 < r < .87, a fairly strong positive correlation. But the model is quite different in each case!
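A minimal sketch of the point (invented data, Python): two data sets with a fairly strong positive r, but very different underlying models:

# Two synthetic (x, y) data sets: similar positive correlation, different models.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 50)
y_linear = 2 * x + rng.normal(0, 3, x.size)          # linear trend plus noise
y_curved = 0.5 * x ** 2 + rng.normal(0, 8, x.size)   # quadratic trend plus noise

print("r, linear model:   ", round(np.corrcoef(x, y_linear)[0, 1], 2))
print("r, quadratic model:", round(np.corrcoef(x, y_curved)[0, 1], 2))
# The single statistic r cannot distinguish the two models -- plot the data.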

SLIDE 8

Algorithmic Experiments: The Good News

Near total control of the experiment. Tons of data. Fairly simple causal models (input-output). Relatively strong theory (for runtime analysis anyway).

SLIDE 9

Algorithmic Experiments: The Bad News

High expectations from data analysis. Unusual and complex relationships. Standard statistical techniques do not address common questions.

SLIDE 10

Methods for Algorithmic Experiments*

*Emphasizing descriptive, exploratory & graphical techniques. Focus on exploratory (vs. designed) experiments.

  • I. How not to do it (anonymized).
  • II. One methodological question: What should I measure?

SLIDE 11

How Not to Do It 1

``As conjectured, algorithm cost is asymptotically O(n), except for a slight increase at large n.¹''

1: The CPU time increase is probably due to paging at large problem sizes.

SLIDE 12

CPU Times Are Noisy

One program, the same 9 instances (ordered by size), run on 12 common platforms. (11 other platforms couldn't run all instances.) Source: DIMACS TSP Challenge.

SLIDE 13

What Not To Do 1

l Don't choose the wrong cost metric:

CPU time ≠ dominant operation

l Don’t fit regression lines to CPU times: step

functions, increasing constants, skew, bimodality, etc. violate the model assumptions...

l Don’t naively extrapolate from regression fits to

asymptopia.
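A small illustration of the last point, with synthetic "CPU times" and an invented step (e.g. paging), not real measurements:

# Fit a regression line to small-n "CPU times", then extrapolate past a step.
import numpy as np

n = np.arange(1000, 21000, 1000)
cpu = 0.002 * n + np.where(n > 12000, 15.0, 0.0)   # hypothetical cost with a paging step

small = n <= 10000
slope, intercept = np.polyfit(n[small], cpu[small], 1)   # looks like a clean linear fit

print("extrapolated time at n = 20000:", round(slope * 20000 + intercept, 1))
print("measured time at n = 20000:    ", round(float(cpu[-1]), 1))
# The in-sample fit is fine; the extrapolation misses the step entirely.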

SLIDE 14

How Not to Do It 2

My experiments using Culberson's Iterated Greedy code for graph coloring. Look at: variations in the Vertex Rule. Inputs: from a DIMACS Challenge (1993).

G = (input graph)
C = (initial coloring)
loop
    Reorder colors (Color rule)
    Reorder vertices (Vertex rule)
    C = GreedyColor[G]              // save best coloring so far
    C = Kempe reduction (k iterations)
    C = Revert rule (r iterations)
until Stopping rule (r, max)
Report C* = best coloring found.

SLIDE 15

DIMACS Challenge Graph Coloring Inputs. (REG) = register interference graphs from a compiler application. Optimal colorings not known.

SLIDE 16

Culberson's Iterated Greedy Algorithm, scores for one input.

Compare five Vertex Rules. Score = color count + `niceness' term. Report scores after every 100 iterations.

Two rules look like random walks. Three rules converge after 1 iteration. Why do they converge so quickly?

SLIDE 17

Why do they converge so quickly?

l Because the REG

graphs are trivial to solve.

l Optimality of one

pass of simple greedy can be verified by eye!

x y

2 cliques chained together

Adjacency Matrix
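A small sketch of the "verify by eye" claim. The exact REG structure is not reproduced here, so this uses a stand-in: two k-cliques joined by a single edge. One pass of simple greedy matches the clique-size lower bound:

# Greedy coloring of two k-cliques chained together by one edge (a stand-in
# for the REG structure, which the slides do not fully specify).
k = 5
vertices = range(2 * k)
edges = set()
for part in (list(range(k)), list(range(k, 2 * k))):           # two k-cliques
    edges |= {(u, v) for i, u in enumerate(part) for v in part[i + 1:]}
edges.add((k - 1, k))                                          # the chaining edge

adj = {v: set() for v in vertices}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

color = {}
for v in vertices:                        # one pass of simple greedy, in index order
    used = {color[u] for u in adj[v] if u in color}
    color[v] = next(c for c in range(len(adj)) if c not in used)

print("colors used:", len(set(color.values())), "| clique-size lower bound:", k)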

SLIDE 18

DIMACS Challenge Graph Coloring Inputs. (REG) = register interference graphs from a compiler application. Optimal colorings not known...

...because nobody tried running the simple greedy algorithm on these problems.

SLIDE 19

How Not to Do It 2

l Don't assume that good performance is due to

your wonderful new algorithm. It could be due to easy inputs.

l Don’t overlook simple/baseline algorithms. l Don't just look at the final result – look at the

mechanics.

l Don't pull parameters (iteration count) out of

your hat. My runtimes were 700x bigger than necessary.

SLIDE 20

How Not to Do It 3

The First Fit Heuristic for Bin Packing.

FF: Consider weights in order of arrival; pack each into the first (leftmost) bin that can contain it.

L = list of n weights uniformly distributed on (0, u). How well does FF pack them into unit-capacity bins, as a function of u, asymptotically in n?
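A minimal First Fit sketch following the rule above; n and u are illustrative, and a serious study would use an O(n log n) tree-based implementation rather than this quadratic scan:

# First Fit: place each weight, in arrival order, in the leftmost bin that fits it.
import random

def first_fit(weights, capacity=1.0):
    bins = []                            # bins[i] = current load of bin i
    for w in weights:
        for i, load in enumerate(bins):
            if load + w <= capacity:
                bins[i] += w
                break
        else:
            bins.append(w)               # no open bin fits: open a new one
    return bins

random.seed(0)
n, u = 10_000, 0.6                       # illustrative parameters
weights = [random.uniform(0, u) for _ in range(n)]
bins = first_fit(weights)
print("bins used B:      ", len(bins))
print("weight sum W:     ", round(sum(weights), 1))
print("empty space B - W:", round(len(bins) - sum(weights), 1))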

SLIDE 21

First Fit Bin Packing

Experiments circa 1978: Pack n = 1000 weights, u = .25, .5, .75, 1; 20 trials each. Measure B = bins used in the packing. Cost = Bins / OPT; Cost ≤ Bins / WeightSum. Known: FF is optimal at u = 1. Conjecture: FF is optimal for all u.

SLIDE 22

First Fit Bin Packing

Ten years later: n = 100,000 weights, u = .2, .22, ..., .98, 1.0. Measure Empty Space: E = B – WeightSum. Observation: Empty space is not linear in u! The ``peak'' grows as n increases; the valley disappears. New Conjecture: FF is optimal at u = 1, but nowhere else.

SLIDE 23

How Not To Do It 3

l Don’t assume you are looking at asymptopia. l Don’t look at one cost metric. Here, a difference

gives a better view than a ratio. Cost = Bins / Weights EmptySpace = Bins – Weights

SLIDE 24

Don’t Be That Guy: What to Avoid

  • Looking at the wrong metric.
  • Looking at just one metric.
  • Looking at summarized instead of raw data.
  • Reasoning incorrectly about cause and effect.
  • Thinking the model is a true description of the data.

Use exploratory experiments to build understanding.

SLIDE 25

Experimental Algorithm Evaluation

  • I. How not to do it.
  • II. What should I measure?

Matching goals to choices. Reducing design complexity. Variance reduction techniques.

SLIDE 26

Factors that affect experimental outcomes in algorithmics:

  • Metrics. Quantities that are measured as performance indicators: e.g. time, solution quality.

  • Input. Category, provenance, size n, more.

  • Algorithm/code. Data structure, tuning, rules.

  • Environment. Language, compiler, memory hierarchy, etc.

Today!

SLIDE 27

What Should I Measure?

Theory: The ocean is at most O(n³) feet deep when n feet from shore. Practice: New York Harbor is 24.1532 feet deep at its entrance. The Atlantic Ocean is 12,881.82892 feet deep in one spot. It is 102.03901 feet deep in another...

SLIDE 28

What Should I Measure?

Match the performance indicator to the research goals. There is usually a tradeoff: Theory = accurate (always true); Practice = precise (many decimal places).

SLIDE 29

Flavors of Experiments

Field experiments:
  • Observe real-world phenomena
  • Describe results
  • Classify outcomes

Laboratory experiments:
  • Isolate components
  • Manipulate parameters
  • Cause/effect
  • Build models

SLIDE 30

Levels of Instantiation

void Qsort(int A[], int lo, int hi) {
    if (lo >= hi) return;
    int p = Partition(A, lo, hi);   // partition A[lo..hi] around a pivot; p = pivot's final index
    Qsort(A, lo, p - 1);
    Qsort(A, p + 1, hi);
}

Algorithm → Program → Process

Algorithm level: Quicksort A[lo..hi]: x is an element of A; partition A around x; recur to the left of x; recur to the right of x.

... Paradigm ... Algorithm ... Data Structures ... Source ... Object ... Process ...

SLIDE 31

What Should I Measure?

Lab experiment: isolated components of the algorithm, abstract costs, simple generated inputs.
Field experiment: instantiated code, whole costs, realistic inputs.

Accurate = dominant operation counts. Precise = CPU times.
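A small sketch of the two kinds of indicator on one run (insertion sort chosen only as a simple example): an accurate, platform-independent operation count next to a precise but platform-dependent timing:

# Count the dominant operation (key comparisons) and time the same run.
import random, time

def insertion_sort_comparisons(a):
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1                         # dominant operation
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
            else:
                break
    return comparisons

random.seed(0)
data = [random.random() for _ in range(2000)]
start = time.perf_counter()
comps = insertion_sort_comparisons(data)
elapsed = time.perf_counter() - start
print("comparisons (accurate, machine-independent):", comps)
print("elapsed time (precise, machine-dependent):  %.4f s" % elapsed)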

SLIDE 32

Time Performance Indicators

l Theory's dominant op. l Data structure ops. l Function calls. l Main loop iterations. l Code block counts. l CPU time l Memory accesses l Cache & page misses l Wall clock time l ... experimenters have many

choices!

SLIDE 33

How to Choose a PI

l Enough precision to distinguish outcomes

among competing algorithms.

l Choose PIs that are directly comparable

across several algorithms.

l Lab: PI should isolate the interesting factors

and ignore irrelevant factors.

l Field: PI should measure the whole cost. l Choose indicators that match the literature. l OK to use more than one.

.

SLIDE 34

Multiple Performance Indicators

x = number of edge crossings; y = aspect ratio; z = user task score. Which algorithm is best?

[Figure: algorithms A, B, C ranked from best to worst on each of the three measures]

SLIDE 35

Multiple Performance Indicators

Two strategies for reducing dimensionality in data:

  • 1. Transform numerical to categorical data and stratify.

  • 2. Project to a lower dimension.

[Figure: a numerical scale cut into low / med / high categories]

SLIDE 36

Categorization+Stratification

l Two

algorithms: red and blue.

l Each has solid

cost and a dashed cost.

l Which is

better, as f(x)?

. parameter x best worst

SLIDE 37

Categorization + Stratification

  • In areas where the dashed cost is below the threshold, which algorithm has the best solid curve?

[Plot: cost (best to worst) vs. parameter x, with regions labeled none / blue / red / blue and a yes/no threshold]

SLIDE 38

Project to a Lower Dimension

a = A(x, y, z); b = B(x, y, z); c = C(x, y, z). Which is lowest: a, b, or c?

[Figure: algorithms A, B, C ranked from best to worst on the projected score]
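One common projection is a weighted sum of the indicators. A minimal sketch with invented metric values; the weights encode priorities and are pure assumptions here:

# Project three performance indicators to one scalar score per algorithm.
algorithms = {                       # invented example values
    "A": {"crossings": 12, "aspect_ratio": 1.8, "task_score": 0.71},
    "B": {"crossings": 30, "aspect_ratio": 1.1, "task_score": 0.83},
    "C": {"crossings": 18, "aspect_ratio": 1.4, "task_score": 0.90},
}
weights = {"crossings": -0.02, "aspect_ratio": -0.3, "task_score": 1.0}   # higher score = better

def project(metrics):
    return sum(weights[key] * value for key, value in metrics.items())

for name in sorted(algorithms, key=lambda a: project(algorithms[a]), reverse=True):
    print(name, round(project(algorithms[name]), 3))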

SLIDE 39

Variance Reduction Techniques

How to create experiments that produce analysis-friendly data: less variance in the data, for the same number of trials.

SLIDE 40

What is Analysis-Friendly?

Is the mean of y asymptotically positive or negative?

[Two plots of y versus x]

SLIDE 41

Case Study: Self-organizing Sequential Search

Sequence of m random requests for items 1..n. Request probabilities p1, p2, ... Organizing rules M, T. Cost = location of the requested item in the list. What is the expected difference in cost of rules M and T?

L: 3, 5, 7, 1, 2, 4, 9, 8, 6

Request r = 2 (probability p2). Cost = 5.

M: 2, 3, 5, 7, 1, 4, 9, 8, 6      (M: Move-to-Front rule)

T: 3, 5, 7, 2, 1, 4, 9, 8, 6      (T: Transpose rule)

SLIDE 42

Comparison of MTF and TR (Transpose)

SLIDE 43

VRT: Common Random Numbers

l If measuring

difference of two algorithms, and

l Outcomes are

positively correlated with respect to input, then

l Compare using

common inputs, paired differences.

When comparing M & T, use the same random request list. D[L] = M[L] - T[L] E[D] = E[ M – T ] = E[M] – E[T] Var(D) = Var(M) + Var(T) – 2Cov(M, T) If covariance > 0, variance in D is smaller than in unpaired differences.
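A simulation sketch of the technique; the parameters below follow the n=20, m=1000, t=50, Zipf's-Law setup used in these experiments, but the code and exact numbers are an illustrative reconstruction, not the original study:

# Common random numbers: run MTF and Transpose on the SAME request sequences
# and measure the paired difference, versus differencing independent runs.
import random
import statistics as st

def serve(requests, rule, n):
    lst, cost = list(range(1, n + 1)), 0
    for r in requests:
        i = lst.index(r)
        cost += i + 1                                  # cost = position in the list
        if rule == "MTF":
            lst.insert(0, lst.pop(i))                  # move to front
        elif rule == "TR" and i > 0:
            lst[i - 1], lst[i] = lst[i], lst[i - 1]    # transpose one position forward
    return cost

n, m, trials = 20, 1000, 50
zipf = [1.0 / r for r in range(1, n + 1)]              # Zipf-like request weights
random.seed(0)

paired, indep = [], []
for _ in range(trials):
    reqs = random.choices(range(1, n + 1), weights=zipf, k=m)
    paired.append(serve(reqs, "MTF", n) - serve(reqs, "TR", n))        # common inputs
    reqs_a = random.choices(range(1, n + 1), weights=zipf, k=m)
    reqs_b = random.choices(range(1, n + 1), weights=zipf, k=m)
    indep.append(serve(reqs_a, "MTF", n) - serve(reqs_b, "TR", n))     # independent inputs

print("st. dev., paired difference (CRN):", round(st.stdev(paired), 1))
print("st. dev., unpaired difference:    ", round(st.stdev(indep), 1))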

SLIDE 44

Common Random Numbers Cost Difference D = MTF - TR

  • St. Dev = 7.50
  • St. Dev = 5.03

n=20, m=1000, t=50 random trials, P = Zipf's Law

SLIDE 45

VRT: Control Variates

If cost C is correlated with a variate Y, and the expectation E[Y] = y is known, then measure C and look at the difference D = C – Y. Y is a ``control variate'' for C.

Cov(C, Y) > 0
E[Y] = y (known)
E[C] = c (unknown)
E[C – Y] = c – y
Var[C – Y] = Var(C) + Var(Y) – 2 Cov(C, Y)
Measure Di = Ci – Yi; estimate E[C] with mean(Di) + y.

SLIDE 46

Control Variates

The cost of request r in the Transpose list is positively correlated with the cost of r in the Optimal list (items in decreasing probability order). Cost of r in the Optimal list = r. Measure D = T – r for each request; estimate E[T] as mean(D) + E[r]. r is a control variate for T.

T:   3, 6, 2, 5, 1, 4, 7
Opt: 1, 2, 3, 4, 5, 6, 7

Request r = 2: Cost in T = 3, so D = 3 – 2 = 1.

SLIDE 47

Control Variates: Cost of TR

[Plot comparing the raw estimate T with the control-variate estimate D + r; n=20, m=1000, t=50, P = Zipf's Law]

SLIDE 48

FF Bin Packing

Consider weights in order of arrival; pack each into the first (leftmost) bin that can contain it.

Given a list of n weights uniform on (0, u), pack them into unit-sized bins using the First Fit heuristic. What is the expected cost (bins used) of FF(n, u)?

SLIDE 49

FF Control Variates

Bins = Weight Sum + Empty Space. Bins is positively correlated with the weight sum. E[W] = n·u/2. Measure empty space; estimate mean bins with E[B] = E[S] + n·u/2.

The weight sum is a control variate for the number of bins. Empty space S = B – W has less variance than B. S + n·u/2 is friendlier to analyze than B.
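A sketch of this control variate in code (illustrative n, u and trial count): estimate E[B] directly from B, and again from S + n·u/2, and compare the spread of the two estimators:

# Control variate for First Fit: W has known mean n*u/2, so estimate E[bins]
# from the lower-variance quantity S = bins - W.
import random
import statistics as st

def first_fit_bins(weights, capacity=1.0):
    bins = []
    for w in weights:
        for i, load in enumerate(bins):
            if load + w <= capacity:
                bins[i] += w
                break
        else:
            bins.append(w)
    return len(bins)

n, u, trials = 1000, 0.6, 25                   # illustrative parameters
random.seed(0)
B, S = [], []
for _ in range(trials):
    ws = [random.uniform(0, u) for _ in range(n)]
    b = first_fit_bins(ws)
    B.append(b)
    S.append(b - sum(ws))                      # empty space S = B - W

print("direct estimate of E[B]:          mean", round(st.mean(B), 1), " st. dev.", round(st.stdev(B), 2))
print("control-variate estimate of E[B]: mean", round(st.mean(S) + n * u / 2, 1), " st. dev.", round(st.stdev(S), 2))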

SLIDE 50

FF Control Variates Bins vs Empty Space

SLIDE 51

Sources of Control Variates

l Optimal solution

(Transpose).

l Lower bound on solution

(bin packing).

l Initial random solution in

iterative algorithm.

l Solution to a simpler

problem.

l ....

  • 1. Is the variate X

correlated with solution cost C? (Check it experimentally.)

  • 2. Do I know the

expectation x of this variate X? If YES: Measure the difference D = C – X. . Substitute x = E[X] to estimate mean cost. E[C] = E[D] + x

SLIDE 52

VRT: Conditional Monte Carlo

l If cost C depends on another

variate Z, and

l the conditional expectation of

C with respect to Z can be calculated directly, then

l Generate Z, and calculate c =

f(C|Z).

l E[c] = E[C,Z]

SLIDE 53

MTF: Conditional Monte Carlo

l Cost of request r at

time t depends on list order L.

l Expected cost of

the list f(L) can be calculated.

l Generate list M,

then calculate f(M)

M: 3, 5, 2, 1, 4

Instead of one request r = 2, cost = 3, Calculate the expected cost of all requests for this list. E[c] = 1p3 + 2p5+ 3p2 + 4p1 + 5p4
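A sketch of the same idea in code (small n, Zipf-like probabilities; illustrative only): at each step, score the whole current list by its exact expected cost instead of by one sampled request:

# Conditional Monte Carlo for MTF: replace the sampled cost of a single request
# by the exact expected cost of the current list order.
import random

n, m = 5, 1000
w = [1.0 / r for r in range(1, n + 1)]
probs = [x / sum(w) for x in w]                   # request probability p_r for item r

def expected_cost(lst):
    # E[cost | list order] = sum over positions of (position) * p(item at that position)
    return sum((i + 1) * probs[item - 1] for i, item in enumerate(lst))

random.seed(0)
lst = list(range(1, n + 1))
sampled = conditional = 0.0
for _ in range(m):
    conditional += expected_cost(lst)             # smoothed: uses the whole list
    r = random.choices(range(1, n + 1), weights=probs, k=1)[0]
    sampled += lst.index(r) + 1                   # plain estimate: one sampled request
    lst.insert(0, lst.pop(lst.index(r)))          # MTF update

print("plain sampled mean cost:    ", round(sampled / m, 3))
print("conditional (smoothed) mean:", round(conditional / m, 3))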

SLIDE 54

Sequential Search Cost With/Without Variance Reduction

SLIDE 55

Ideas for Conditional Monte Carlo

Iterative, data-structure, and random-walk algorithms: generate a random state S at time t. Instead of sampling a random cost in S, compute the expected cost of S.


Splitting: if E[S] can't be calculated, sample the cost many times from S.

SLIDE 56

Statistical Methods for Algorithm Evaluation

  • I. How not to do it.
  • II. What should I measure?
  • Matching metrics to experimental goals.
  • Reducing dimensionality of multiple objectives.
  • Variance reduction techniques.
SLIDE 57

Some Papers on Methodology

l CCM, Analyzing Algorithms by Simulation: Variance

Reduction Techniques and Simulation Speedups, ACM Computing Surveys, June 1992.

l D.S. Johnson, A theoretician's guide to experimental analysis

  • f algorithms, in Data Structures, Near Neighbor Searches,

and Methodology, Fifth and Sixth DIMACS Implementation Challenges, Goldwasser, Johnson, McGeoch, Eds. (Or just google it).

SLIDE 58

Some Useful Books

l Cohen, Empirical Methods for Artificial Intelligence l Chambers et al., Graphical Methods for Data Analysis l Bartz-Beielstein et al. (eds), Experimental Methods for the

Analysis of Optimization Algorithms, Springer 2010

l Muller-Hannemann and Schirra (eds), Algorithm Engineering

– Bridging the Gap Between Algorithm Theory and Practice, Springer 2010 (Dagstuhl Meeting)

l McGeoch, A Guide to Experimental Algorithmics, Cambridge

Press 2012

SLIDE 59

Theory vs Practice

  • In theory there is no difference between theory and practice. In practice there is.

  • It may be alright in practice, but it will never work in theory.

  • There is nothing so practical as a good theory.

SLIDE 60

Thanks for Your Attention! Any Questions?