Tricks and Traps for Young Players Ray D Brownrigg Statistical - PowerPoint PPT Presentation

Tricks and Traps for Young Players Ray D Brownrigg Statistical Computing Manager School of Mathematics, Statistics and Computer Science Victoria University of Wellington Wellington, New Zealand ray@mcs.vuw.ac.nz UseR! 2008 Dortmund, August 2008

CONTENTS 1. Background 2. Introduction 3. sort(), order() and rank() 4. Reproducible random numbers for grid computing 5. Resolution of pdf graphs 6. Local versions of standard functions 7. Vectorisation • user-defined functions using curve() • pseudo vectorisation • multi-dimensional 8. get() 1

CONTENTS (continued) 9. Using a matrix to index an array 10. Matrices, lists and dataframes, which are more efficient? 11. Using .Rhistory 12. [Windows] file.choose() 2

1. Background Items encountered during a simulation research project using a computation grid of approximately 150 unix workstations. 2. Introduction Calculate the distribution function of the supremum of a nor- malised two-dimensional independent poisson process. This simulates Brownian Motion, which appears as a limiting process in goodness-of-fit studies. – throw down N points on unit square – calculate difference between density and expected density at every point on the square – find supremum 3

E.g.: Example plot of ξ n (s, t) Example plot of ξ n (s, t) − nst * z − nst z * * * * * * * * * * t t s s 4

– goal is to have N as large as computationally possible, given we need large number of repetitions – basic exhaustive search algorithm is O ( N 3 ) – Fortran gives > 1 order of magnitude improvement (12-40x) – restructuring to single loop using cumsum() and order() is generally faster than the initial Fortran – now O ( N 2 ) – further improvements save another factor of 3 – now Fortran saves another 1.5 orders of magnitude (i.e. 30x) – overall 5 orders of magnitude speed improvement 5

Algorithm performance 1e+04 1e+02 CPU time a A f h 1e+00 k m V 1e−02 5000 1e+02 1e+03 1e+04 1e+05 1e+06 length 6

3. sort(), order() and rank() – sort ( x ) == x [ order ( x )] • in fact it is defined that way (for “objects” - with class) – rank ( x ) == order ( order ( x )) – order ( order ( x )) is generally faster than rank ( x ) – for small vector lengths x [ order ( x )] can be faster than sort ( x ) • but see later 7

4. Reproducible random numbers for grid computing – generally need to be able to rerun a task – can generate .Random.seed for each task, keep in table, lookup table when required – or generate random sequence ’on the fly’ • don’t need to pass R data to each task • each task can be ’text only’ • but do need to know how many random numbers are used for each task 8

5. Resolution of pdf graphs – specify width= and height= to suit eventual size – e.g. small diagram in paper postscript() postscript(width=6, height=4) Histogram of normals, small size Histogram of normals, standard size 150 150 Frequency 100 Frequency 100 50 50 0 −3 −2 −1 0 1 2 3 0 rnorm(1000) −3 −2 −1 0 1 2 3 rnorm(1000) 9

– e.g. fine detail in downloadable file pdf() 2D edfs for m = 50K 1.0 0.8 0.6 n=50 edf(x) n=100 n=1000 n=10000 0.4 n=50000 n=100000 n=100K, m=100K 0.2 0.0 0 1 2 3 4 x 10

pdf(width=12, height=12) 2D edfs for m = 50K 1.00 0.95 n=50 n=100 n=1000 n=10000 n=50000 n=100000 n=100K, m=100K 0.90 edf(x) 0.85 0.80 0.75 2.0 2.2 2.4 2.6 2.8 3.0 x 11

6. Local versions of standard functions – once algorithm and data are known to be ’clean’ – extract just the ’active’ part of primary function – savings are dependent on the format of the data – e.g. rank() > x <- runif(50000) > system.time(for(i in 1:1000) rank(x)) user system elapsed 22.698 0.550 23.257 > system.time(for(i in 1:1000) .Internal(rank(x, "min"))) user system elapsed 20.356 0.160 20.575 > 12

– e.g. sort() > system.time(for(i in 1:1000) sort(x)) user system elapsed 11.189 0.119 11.349 > system.time(for(i in 1:1000) .Internal(qsort(x, FALSE))) user system elapsed 5.237 0.070 5.321 > all.equal(sort(x), .Internal(qsort(x, FALSE))) [1] TRUE > 13

– e.g. order() > system.time(for(i in 1:1000) order(x)) user system elapsed 18.948 0.010 18.986 > system.time(for(i in 1:1000) .Internal(qsort(x, TRUE))$ix) user system elapsed 7.105 0.050 7.170 > all.equal(order(x), .Internal(qsort(x, TRUE))$ix) [1] TRUE > 14

7. Vectorisation • user-defined functions using curve() – curve() requires a vectorised expression – e.g. a ( x ) = φ ( x ) / (1 − Φ( x )) φ is standard normal density Φ is standard normal df g 1( x ) = a ( x ) / (1 + x.a ( x ) − a 2 ( x )) x G 1( x ) = −∞ g 1( y ) dy � – want G1() to be vectorised 15

‘G1‘ <- function(z) { lz <- length(z) oz <- order(z) z <- c(-Inf, z[oz]) result <- rep(NA, lz) for (i in 1:lz) { result[i] <- integrate(g1, z[i], z[i + 1])$value } return(cumsum(result)[order(oz)]) } – check vectorisation: ... 16

> x <- qnorm(runif(10)) > x [1] 1.2629543 -0.6264538 -0.3262334 0.1836433 1.3297993 [6] -0.8356286 1.2724293 1.5952808 0.4146414 0.3295078 > for (i in 1:10) cat(G1(x[i]), "\t") 8.17856 0.4691605 0.7979545 1.829099 8.883072 0.3174469 8.27546 12.20594 2.591307 2.283419 > print(G1(x)) [1] 8.1785600 0.4691605 0.7979545 1.8290994 8.8830715 [6] 0.3174469 8.2754602 12.2059365 2.5913071 2.2834192 > 17

– check timing: > x <- qnorm(runif(100000)) > system.time(for (i in 1:length(x)) G1(x[i])) user system elapsed 24.251 -0.001 24.270 > system.time(G1(x)) user system elapsed 11.496 0.000 11.501 > 18

– curve() is extremely useful when tracking down numerical instability – e.g. > G1(7) Error in integrate(g1, z[i], z[i + 1]) : maximum number of subdivisions reached > 19

> g1 function(y) { ay <- a(y) return(ay/(1 + y*ay - ay^2)) } > a function(y) return(dnorm(y)/(1 - pnorm(y))) > > curve(g1, 6, 7) > curve(a, 6, 7) > 20

380 340 g1 (x) 300 260 6.0 6.2 6.4 6.6 6.8 7.0 x 7.0 6.8 a (x) 6.6 6.4 6.2 6.0 6.2 6.4 6.6 6.8 7.0 21 x

> curve(x*a(x) - a(x)**2, 6, 7) x * a(x) − a(x)^2 −0.982 −0.981 −0.980 −0.979 −0.978 −0.977 −0.976 6.0 6.2 6.4 x 6.6 6.8 7.0 22

> a function(y) return(dnorm(y)/(1 - pnorm(y))) > > a1 function(y) return(dnorm(y)/pnorm(y, lower=FALSE)) > > curve(x*a(x) - a(x)**2, 6, 7) > curve(x*a1(x) - a1(x)**2, add=T, col=2) > 23

x * a(x) − a(x)^2 −0.982 −0.981 −0.980 −0.979 −0.978 −0.977 −0.976 6.0 6.2 6.4 x 6.6 6.8 24 7.0

• pseudo vectorisation – if val is a scalar, then val [1] is defined – use a loop to generate a vector result – e.g. ‘stepfun2D‘ <- function(u1, u2, s, t) { # vectorised in t res <- numeric(lt <- length(t)) for (i in 1:lt) res[i] <- sum(u1 <= s & u2 <= t[i]) return(res) } 25

• multi-dimensional – a function of two parameters may give the correct result when one of the parameters is supplied as a vector, but fail if both are – e.g. two-dimensional linear interpolation (achieves vectorisation through the use of recycling) ‘jointpc0.5‘ <- function(s, t) { # Only one of s, t can be vector. DI <- DJ <- 0.001 # granularity of table i <- s/DI + 1 j <- t/DJ + 1 26

di <- i %% 1 dj <- j %% 1 i <- trunc(i) j <- trunc(j) res <- (1 - dj)* ((1 - di)*jpt0.5[i, j] + di*jpt0.5[i + 1, j]) + dj* ((1 - di)*jpt0.5[i, j + 1] + di*jpt0.5[i + 1, j + 1]) return(res) } 27

‘jointpc0.5m‘ <- function(s, t) { # Either or both can be a vector. DI <- DJ <- 0.001 # granularity of table i <- s/DI + 1; j <- t/DJ + 1 di <- i %% 1; dj <- j %% 1 i <- trunc(i); j <- trunc(j) res <- t( (1 - dj)* t((1 - di)*jpt0.5[i, j] + di*jpt0.5[i + 1, j]) + dj* t((1 - di)*jpt0.5[i, j + 1] + di*jpt0.5[i + 1, j + 1])) return(res) } 28

8. get() – useful when using paste to construct an object name – can be used as if it was an object of the type retrieved – e.g. get("+")(3, 5) get("x")[4] thisobj <- get(paste("myobject", myval, sep="")) for(i in ls()) cat(i, "\t", object.size(get(i)), "\n") 29

9. Using a matrix to index an array – general format is m*n • m is the number of elements to be matched • n is the number of dimensions of the array – can generate the matrix using matrix() – or use which(..., arr.ind = TRUE) – e.g. ... 30

> arr <- sample(1:24) > dim(arr) <- 4:2 > arr , , 1 [,1] [,2] [,3] [1,] 5 14 7 [2,] 19 16 4 [3,] 8 24 1 [4,] 18 20 6 , , 2 [,1] [,2] [,3] [1,] 22 13 12 [2,] 15 9 21 [3,] 17 10 23 [4,] 2 3 11 > 31

> toolarge <- which(arr > 20, arr.ind = TRUE) > toolarge dim1 dim2 dim3 [1,] 3 2 1 [2,] 1 1 2 [3,] 2 3 2 [4,] 3 3 2 > > arr[toolarge] <- NA > arr , , 1 [,1] [,2] [,3] [1,] 5 14 7 [2,] 19 16 4 [3,] 8 NA 1 [4,] 18 20 6 , , 2 [,1] [,2] [,3] [1,] NA 13 12 [2,] 15 9 NA [3,] 17 10 NA [4,] 2 3 11 > 32

Tricks and Traps for Young Players Ray D Brownrigg Statistical - PowerPoint PPT Presentation

Tricks and Traps for Young Players Ray D Brownrigg Statistical Computing Manager School of Mathematics, Statistics and Computer Science Victoria University of Wellington Wellington, New Zealand ray@mcs.vuw.ac.nz UseR! 2008 Dortmund, August

TRICKS AND TRAPS OF THE CO-OPS ACT TRICKS AND TRAPS OF THE CO-OPS ACT THE NAME TRAP a

Sensitivity Of Quake3 Players Sensitivity Of Quake3 Players Sensitivity Of Quake3 Players

NFS Tricks and Benchmarking Traps Daniel Ellard and Margo Seltzer FREENIX 2003 - June 12, 2003

Traps and Faults Traps and Faults Review: Mode and Space Review: Mode and Space C A B data

Avoiding Sand Traps and Moguls: Avoiding Sand Traps and Moguls: A Refresher Course for In A

The History of the Inner Solar System According to the Lunar Cold Traps D. H. Crider Catholic

Positrons traps Samuel Niang C.E.A. Saclay / GBAR Experiment 28/11/2018 Positrons traps 1 35

Tasman Valley Predator Control 2005 - present Kill traps Leghold traps Cat captures 40 35 30

Dont Get Short Changed: Tricks and Traps to Keep in Mind When Setting up Contracts and

Strategic Games. Famous because? Proving Nash. C D N players. n players. C (3,3) (0,5) Each

Unpacking tips and tricks Protector Techniques Conclusion Samuel Chevet w4kfu@lse.epita.fr

Introduction and applications Bending & twisting rigidity of DNA with Magnetic Traps. What

Cute Tricks with Cute Tricks with Virtual Memory Virtual Memory A short history of VM A short

Teaching old type systems Teaching old type systems new tricks with type providers new tricks

Image Processing Tricks in Image Processing Tricks in OpenGL OpenGL Simon Green Simon Green

The Agile PMP: Teaching an Old Dog New Tricks The Agile PMP: Teaching an Old Dog New Tricks

Who is that guy? Sanne Grinovero From this planet T eam Hibernate

What Youll Learn Today Why do they call it a motion picture? What is digital

BOF: Painless kernel - Removing the HZ Josh Triplett September 23, 2009 Problem Tickless

The B D ( ) & B decays, Theoretical status thereof Marat

Introduction MPEG: A Video Compression Standard 1980s technology made possible full

Apache Cassandra for Big Data Applications Christof Roduner Java User Group Switzerland COO and

A Tutorial By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Outline

GHDL and the economy of EDA FOSS Tristan Gingold - FSiC 2019 1 / 21 What is GHDL ? A Free

Tricks and Traps for Young Players Ray D Brownrigg Statistical - PowerPoint PPT Presentation

Tricks and Traps for Young Players Ray D Brownrigg Statistical Computing Manager School of Mathematics, Statistics and Computer Science Victoria University of Wellington Wellington, New Zealand ray@mcs.vuw.ac.nz UseR! 2008 Dortmund, August

TRICKS AND TRAPS OF THE CO-OPS ACT TRICKS AND TRAPS OF THE CO-OPS ACT THE NAME TRAP a

Sensitivity Of Quake3 Players Sensitivity Of Quake3 Players Sensitivity Of Quake3 Players

NFS Tricks and Benchmarking Traps Daniel Ellard and Margo Seltzer FREENIX 2003 - June 12, 2003

Traps and Faults Traps and Faults Review: Mode and Space Review: Mode and Space C A B data

Avoiding Sand Traps and Moguls: Avoiding Sand Traps and Moguls: A Refresher Course for In A

The History of the Inner Solar System According to the Lunar Cold Traps D. H. Crider Catholic

Positrons traps Samuel Niang C.E.A. Saclay / GBAR Experiment 28/11/2018 Positrons traps 1 35

Tasman Valley Predator Control 2005 - present Kill traps Leghold traps Cat captures 40 35 30

Dont Get Short Changed: Tricks and Traps to Keep in Mind When Setting up Contracts and

Strategic Games. Famous because? Proving Nash. C D N players. n players. C (3,3) (0,5) Each

Unpacking tips and tricks Protector Techniques Conclusion Samuel Chevet w4kfu@lse.epita.fr

Introduction and applications Bending &amp; twisting rigidity of DNA with Magnetic Traps. What

Cute Tricks with Cute Tricks with Virtual Memory Virtual Memory A short history of VM A short

Teaching old type systems Teaching old type systems new tricks with type providers new tricks

Image Processing Tricks in Image Processing Tricks in OpenGL OpenGL Simon Green Simon Green

The Agile PMP: Teaching an Old Dog New Tricks The Agile PMP: Teaching an Old Dog New Tricks

Who is that guy? Sanne Grinovero From this planet T eam Hibernate

What Youll Learn Today Why do they call it a motion picture? What is digital

BOF: Painless kernel - Removing the HZ Josh Triplett September 23, 2009 Problem Tickless

The B D ( ) &amp; B decays, Theoretical status thereof Marat

Introduction MPEG: A Video Compression Standard 1980s technology made possible full

Apache Cassandra for Big Data Applications Christof Roduner Java User Group Switzerland COO and

A Tutorial By Sarita Adve &amp; Kourosh Gharachorloo Slides by Jim Larson Outline

GHDL and the economy of EDA FOSS Tristan Gingold - FSiC 2019 1 / 21 What is GHDL ? A Free

Introduction and applications Bending & twisting rigidity of DNA with Magnetic Traps. What

The B D ( ) & B decays, Theoretical status thereof Marat

A Tutorial By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Outline