Core bench: micro-benchmarking for OCaml


Christopher S. Hardin and Roshan P. James, Jane Street. September 24, 2013, OUD Workshop.


SLIDE 1

Overview Implementation

Core bench: micro-benchmarking for OCaml

Christopher S. Hardin and Roshan P. James

Jane Street

September 24, 2013, OUD Workshop

Christopher S. Hardin and Roshan P. James Core bench: micro-benchmarking for OCaml

SLIDE 2

Micro-benchmarking

Precise measurement is essential for writing performance-sensitive code. Objective: measure the execution cost of functions that are relatively cheap.

Functions with execution times ranging from nanoseconds to tens or hundreds of milliseconds. A 3.4 GHz CPU runs several simple instructions per nanosecond.

SLIDE 3

Micro-benchmarking : Timing

    let t1 = Time.now () in
    f ();
    let t2 = Time.now () in
    report (Time.diff t2 t1)

Time.now is often too imprecise (precise only to about 1 microsecond). Asking for the current time also takes time.

SLIDE 5

Micro-benchmarking : Batch sizes

    let t1 = Time.now () in
    for i = 1 to batch_size do
      f ()
    done;
    let t2 = Time.now () in
    report batch_size (Time.diff t2 t1)

Compute a batch size large enough to account for the timer's cost and imprecision. Criterion for Haskell takes this approach. Use the mean and standard deviation to account for system noise.
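The batch-and-summarize approach can be sketched in plain OCaml. This is a minimal illustration, not Core bench's implementation: it uses `Sys.time` from the standard library as a stand-in clock (an assumption made to avoid dependencies), and the names `time_batch`, `per_run_estimates`, `mean`, and `std_dev` are hypothetical.

```ocaml
(* Stand-in clock: Sys.time (CPU seconds) from the standard library,
   used here only so the sketch has no dependencies. *)
let now () = Sys.time ()

(* Time [batch_size] consecutive calls of [f]; returns elapsed seconds. *)
let time_batch ~batch_size f =
  let t1 = now () in
  for _i = 1 to batch_size do
    f ()
  done;
  let t2 = now () in
  t2 -. t1

let mean xs =
  List.fold_left ( +. ) 0.0 xs /. float_of_int (List.length xs)

(* Sample standard deviation (n - 1 in the denominator). *)
let std_dev xs =
  let m = mean xs in
  let ss = List.fold_left (fun acc x -> acc +. ((x -. m) ** 2.0)) 0.0 xs in
  sqrt (ss /. float_of_int (List.length xs - 1))

(* Per-call estimates (seconds) from [samples] batches of [batch_size]. *)
let per_run_estimates ~samples ~batch_size f =
  List.init samples (fun _ ->
    time_batch ~batch_size f /. float_of_int batch_size)

let () =
  let f () = ignore (Sys.opaque_identity (Array.make 300 0)) in
  let est = per_run_estimates ~samples:20 ~batch_size:10_000 f in
  Printf.printf "mean %.1f ns, std dev %.1f ns\n"
    (mean est *. 1e9) (std_dev est *. 1e9)
```

The printed numbers vary from run to run and machine to machine, which is exactly the noise the mean and standard deviation are meant to summarize.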

SLIDE 7

Micro-benchmarking : Noise

System noise from other processes and OS activity. More importantly, there are delayed costs due to GC. Variance in execution times is influenced by batch size.

[Figure: plot with axes runtime (ms) and batch size.]

SLIDE 8

Core bench : Linear regression

Treats micro-benchmarking as a linear regression.

Simple case: fit of execution time to batch size.

Measurements at larger batch sizes have a smaller percentage error.

Geometric sampling of batch sizes to get a better linear fit.

[Figure: plot with axes runtime (ms) and batch size.]
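The simple case above, fitting execution time to batch size, can be sketched with ordinary least squares in plain OCaml. This is an illustration under assumptions, not Core bench's code: the helper names `geometric_batch_sizes` and `fit_line` are hypothetical, the growth factor is arbitrary, and the data is synthetic (500 ns of constant overhead plus 12 ns per run) so that the result is easy to check.

```ocaml
(* Batch sizes growing geometrically by [scale], always by at least 1. *)
let geometric_batch_sizes ~scale ~count =
  let rec go acc n k =
    if k = 0 then List.rev acc
    else
      let next = max (n + 1) (int_of_float (float_of_int n *. scale)) in
      go (n :: acc) next (k - 1)
  in
  go [] 1 count

(* Ordinary least squares for y = intercept + slope * x.
   Returns (intercept, slope). *)
let fit_line points =
  let n = float_of_int (List.length points) in
  let mx = List.fold_left (fun a (x, _) -> a +. x) 0.0 points /. n in
  let my = List.fold_left (fun a (_, y) -> a +. y) 0.0 points /. n in
  let sxy =
    List.fold_left (fun a (x, y) -> a +. ((x -. mx) *. (y -. my))) 0.0 points in
  let sxx =
    List.fold_left (fun a (x, _) -> a +. ((x -. mx) ** 2.0)) 0.0 points in
  let slope = sxy /. sxx in
  (my -. (slope *. mx), slope)

let () =
  (* Synthetic measurements: 500 ns constant overhead + 12 ns per run. *)
  let points =
    geometric_batch_sizes ~scale:1.5 ~count:20
    |> List.map (fun n ->
         let x = float_of_int n in
         (x, 500.0 +. (12.0 *. x)))
  in
  let intercept, slope = fit_line points in
  Printf.printf "overhead ~ %.1f ns, cost per run ~ %.1f ns\n" intercept slope
```

The slope is the per-run cost estimate; the intercept absorbs whatever cost does not scale with the batch size.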

SLIDE 9

Core bench : Linear regression

No need to estimate the clock and other constant errors:

Constant overheads are accounted for in the y-intercept.

Predict other costs in the same way.

Estimate memory allocations and promotions using batch size. Estimate garbage collection using batch size.

User specifies how much sampling time is allowed.

More data allows better estimates. Error estimation and goodness of fit by bootstrapping and R².
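As a rough illustration of the two quality measures named above, the sketch below computes R² for a one-predictor fit and a percentile bootstrap interval for the slope. The slides do not say which bootstrap variant Core bench uses, so resampling data points with replacement is an assumption here, and the names `fit_line`, `r_squared`, `bootstrap_slopes`, and `percentile_interval` are hypothetical.

```ocaml
(* Ordinary least squares for y = intercept + slope * x. *)
let fit_line points =
  let n = float_of_int (List.length points) in
  let mx = List.fold_left (fun a (x, _) -> a +. x) 0.0 points /. n in
  let my = List.fold_left (fun a (_, y) -> a +. y) 0.0 points /. n in
  let sxy =
    List.fold_left (fun a (x, y) -> a +. ((x -. mx) *. (y -. my))) 0.0 points in
  let sxx =
    List.fold_left (fun a (x, _) -> a +. ((x -. mx) ** 2.0)) 0.0 points in
  let slope = sxy /. sxx in
  (my -. (slope *. mx), slope)

(* R^2 = 1 - (residual sum of squares / total sum of squares). *)
let r_squared points =
  let a, b = fit_line points in
  let n = float_of_int (List.length points) in
  let my = List.fold_left (fun acc (_, y) -> acc +. y) 0.0 points /. n in
  let ss_res =
    List.fold_left
      (fun acc (x, y) -> acc +. ((y -. (a +. (b *. x))) ** 2.0)) 0.0 points in
  let ss_tot =
    List.fold_left (fun acc (_, y) -> acc +. ((y -. my) ** 2.0)) 0.0 points in
  1.0 -. (ss_res /. ss_tot)

(* Refit the slope on [trials] resamples drawn with replacement. *)
let bootstrap_slopes ~trials points =
  let arr = Array.of_list points in
  let n = Array.length arr in
  List.init trials (fun _ ->
    let sample = List.init n (fun _ -> arr.(Random.int n)) in
    snd (fit_line sample))

(* Values at fractions [lo] and [hi] of the sorted estimates. *)
let percentile_interval ~lo ~hi xs =
  let sorted = Array.of_list (List.sort compare xs) in
  let at p =
    sorted.(int_of_float (p *. float_of_int (Array.length sorted - 1))) in
  (at lo, at hi)

let () =
  Random.init 42;
  (* Synthetic data: 12 ns per run plus a small alternating disturbance. *)
  let points =
    List.init 30 (fun i ->
      let x = float_of_int (i + 1) in
      let noise = if i mod 2 = 0 then 3.0 else -3.0 in
      (x, 500.0 +. (12.0 *. x) +. noise))
  in
  let lo, hi =
    percentile_interval ~lo:0.025 ~hi:0.975
      (bootstrap_slopes ~trials:1000 points) in
  Printf.printf "R^2 = %.4f, slope 95%% interval = [%.2f, %.2f]\n"
    (r_squared points) lo hi
```

A narrow interval and an R² close to 1 suggest the linear model explains the measurements well; a wide interval suggests more sampling time is needed.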

SLIDE 10

Example source (basic)

    open Core.Std
    open Core_bench.Std

    let t1 = Bench.Test.create ~name:"id"
      (fun () -> ())

    let t2 = Bench.Test.create ~name:"Time.now"
      (fun () -> ignore (Time.now ()))

    let t3 = Bench.Test.create ~name:"Array.create300"
      (fun () -> ignore (Array.create ~len:300 0))

    let () = Command.run (Bench.make_command [t1; t2; t3])

Output

Name               Time/Run   Minor   Major
---------------- ---------- ------- -------
id                     3.08
Time.now                843    2.00
Array.create300       3_971             301

SLIDE 11

Some functions have strange execution times

    let benchmark = Bench.Test.create ~name:"List.init"
      (fun () -> ignore (List.init 100_000 ~f:id))

[Figure: plot with axes runtime (ms) and batch size; series: observed, 1-predictor model.]

SLIDE 12

Multiple predictors

[Figure: plot with axes milliseconds and batch size; series: observed runtime, runs, promoted words, compactions.]

SLIDE 13

Multiple predictors: fit

Using runs, compactions, promoted as predictors.

[Figure: plot with axes runtime (ms) and batch size; series: observed, 1-predictor model, 3-predictor model.]
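A fit with several predictors can be sketched as a least-squares solve of the normal equations. This is a minimal illustration, not Core bench's implementation: the function names `solve` and `least_squares` are hypothetical, the model has no intercept term, and the data is synthetic, generated from known per-unit costs so the recovered coefficients can be checked.

```ocaml
(* Solve the square system [a] * x = [b] by Gaussian elimination with
   partial pivoting. [a] and [b] are modified in place. *)
let solve a b =
  let n = Array.length b in
  for col = 0 to n - 1 do
    (* Move the row with the largest pivot into position [col]. *)
    let pivot = ref col in
    for row = col + 1 to n - 1 do
      if abs_float a.(row).(col) > abs_float a.(!pivot).(col) then pivot := row
    done;
    let tmp_row = a.(col) in
    a.(col) <- a.(!pivot);
    a.(!pivot) <- tmp_row;
    let tmp_b = b.(col) in
    b.(col) <- b.(!pivot);
    b.(!pivot) <- tmp_b;
    (* Eliminate the column below the pivot. *)
    for row = col + 1 to n - 1 do
      let factor = a.(row).(col) /. a.(col).(col) in
      for k = col to n - 1 do
        a.(row).(k) <- a.(row).(k) -. (factor *. a.(col).(k))
      done;
      b.(row) <- b.(row) -. (factor *. b.(col))
    done
  done;
  (* Back substitution. *)
  let x = Array.make n 0.0 in
  for row = n - 1 downto 0 do
    let s = ref b.(row) in
    for k = row + 1 to n - 1 do
      s := !s -. (a.(row).(k) *. x.(k))
    done;
    x.(row) <- !s /. a.(row).(row)
  done;
  x

(* Least-squares coefficients: solve (X^T X) beta = X^T y. *)
let least_squares xs ys =
  let p = Array.length xs.(0) in
  let xtx = Array.make_matrix p p 0.0 in
  let xty = Array.make p 0.0 in
  Array.iteri
    (fun i row ->
      for j = 0 to p - 1 do
        xty.(j) <- xty.(j) +. (row.(j) *. ys.(i));
        for k = 0 to p - 1 do
          xtx.(j).(k) <- xtx.(j).(k) +. (row.(j) *. row.(k))
        done
      done)
    xs;
  solve xtx xty

let () =
  (* Each row: runs, compactions, promoted words (synthetic values). *)
  let xs =
    [| [| 1.0; 0.0; 3.0 |]; [| 2.0; 0.0; 10.0 |]; [| 4.0; 1.0; 12.0 |];
       [| 8.0; 1.0; 40.0 |]; [| 16.0; 3.0; 50.0 |] |] in
  (* Runtime built from assumed costs: 10 per run, 2000 per compaction,
     0.5 per promoted word. *)
  let ys =
    Array.map (fun r -> (10.0 *. r.(0)) +. (2000.0 *. r.(1)) +. (0.5 *. r.(2))) xs in
  let beta = least_squares xs ys in
  Array.iter (fun b -> Printf.printf "%.3f " b) beta;
  print_newline ()
```

Solving the normal equations directly is the shortest route for a sketch; a production implementation would more likely use a numerically sturdier factorization.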

SLIDE 14

Runtime cost decomposition example

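X = [batch size x, minor GCs, compactions], y = runtime (ns). Solve $X\beta = y$ and $x\gamma = X$. Suppose we get

$$\beta = \begin{pmatrix} 1.06 \times 10^{4} \\ 1.04 \times 10^{6} \\ 2.25 \times 10^{6} \end{pmatrix}, \qquad \gamma = \begin{pmatrix} 1 & 0.00299 & 0.00149 \end{pmatrix}.$$

Then the (predicted) runtime is

$$\gamma\beta = \underbrace{(1.06 \times 10^{4})(1)}_{\text{nominal}} + \underbrace{\overbrace{(1.04 \times 10^{6})}^{\text{ns/mGC}}\,\overbrace{(0.00299)}^{\text{mGCs/run}}}_{\text{minor GC cost}} + \underbrace{\overbrace{(2.25 \times 10^{6})}^{\text{ns/cmp}}\,\overbrace{(0.00149)}^{\text{cmps/run}}}_{\text{compaction cost}}$$

$$= 10.6\,\mu\text{s} + 3.1\,\mu\text{s} + 3.4\,\mu\text{s} \approx 17.1\,\mu\text{s}.$$

(Note: Just solving $xm = y$ gives 17.4 µs.)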
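The arithmetic of the decomposition can be checked directly. The short sketch below multiplies each per-event cost in β by the matching per-run rate in γ, using the values from this slide; the names `beta`, `gamma`, `labels`, `contributions`, and `total` are only illustrative.

```ocaml
(* Cost per unit of each predictor, in ns: per run, per minor GC,
   per compaction (values from the slide). *)
let beta = [| 1.06e4; 1.04e6; 2.25e6 |]

(* Amount of each predictor per run: runs, minor GCs, compactions. *)
let gamma = [| 1.0; 0.00299; 0.00149 |]

let labels = [| "nominal"; "minor GC"; "compaction" |]

(* Contribution of each predictor to the runtime of one run, in ns. *)
let contributions = Array.map2 (fun g b -> g *. b) gamma beta

let total = Array.fold_left ( +. ) 0.0 contributions

let () =
  Array.iteri
    (fun i c -> Printf.printf "%-10s %8.2f us\n" labels.(i) (c /. 1000.0))
    contributions;
  Printf.printf "%-10s %8.2f us\n" "total" (total /. 1000.0)
```

With these inputs the three products are 10600 ns, about 3110 ns, and about 3353 ns, for a total of roughly 17.06 µs.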

SLIDE 15

Conclusion and Future Work

opam install core_bench

Expose more predictors

Measure the effect of live words on performance. Counters for major collection work per minor GC.

Accuracy of results

Ordinary least-squares is susceptible to outliers. Incorporate the fact that measurement error is heavy-tailed (on the positive side). Automatically select execution time based on error.

Automatically pick predictors from a set.

SLIDE 16

Thank you.
