Stochastic simulation and resampling methods Statistical modelling: - PowerPoint PPT Presentation

Stochastic simulation and resampling methods Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk October 22, 2013 Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 1 / 32

Simulations, what for? 1 Pseudo uniform random number generation 2 Beyond uniform numbers on [0 , 1] 3 Random number generation in R 4 An application: the bootstrap 5 Exercises 6 Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 2 / 32

Introduction Analysis of a stochastic model A stochastic model involving D : something observed (or observable) about the system (data) θ : a known or unknown parameter P ( θ, D ) : a probability model ( not necessarilly known as a nice function) Three questions One has observed some data D , what does it say about θ ? (inference) If θ is known to be equal to θ 0 , what does it say about D ? (prediction) One has observed some data D , what does it say about future data value D ∗ ? (joint inference and prediction) Stochastic simulations used to produce some “fake data” used to understand/learn something about the data generating process. Much more in June course Stochastic Simulation 02443. Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 3 / 32

Introduction Stochastic simulations as a tool in ordinary numerical analysis Integration in large dimension Optimization and equation solving Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 4 / 32

Pseudo uniform random numbers Pseudo uniform random number generation Computer = most deterministic thing on earth Idea here: produce deterministic sequence that mimics randomness Methods originate in von Neuman’s work in theoretical physics during WWII coined “Monte Carlo” simulations Today’s lecture: basic idea used to produce uniform numbers in [0 , 1] a statistical application: the bootstrap method Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 5 / 32

Pseudo uniform random numbers Some examples of more or less “random uniform” sequences in { 0 , 1 } (not to be mixed up with [0 , 1] !!) Some examples of sequences in { 0 , 1 } : 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 ... 0 0 1 0 1 0 0 0 1 0 1 0 0 1 1 0 1 0 1 0 1... To look random and uniform on { 0 , 1 } , a sequence must have the correct proportion of 0 s and 1 s. Pairs of outcome should have the correct proportion of 0 , 0 and 0 , 1 and 1 , 1 . Randomness relates to complexity: length of the shortest algorithm required to produce this sequence (Kolmogorov, Martin-Löf). Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 6 / 32

Pseudo uniform random numbers (Pseudo) Uniform random number generator on { 0 , 1 } Definition A (uniform) RNG on { 0 , 1 } is an algorithm that returns numbers such that #0 n − → 1 / 2 as n − → ∞ And #(0 , 0) − → 1 / 4 as n − → ∞ n #(0 , 1) − → 1 / 4 as n − → ∞ n #(1 , 0) − → 1 / 4 as n − → ∞ n #(1 , 1) − → 1 / 4 as n − → ∞ n And so on for any order... Such a sequence is also said to be k-uniform on { 0 , 1 } for any k and mimics what we expect of a sequence of i.i.d b (1 / 2) random variables. Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 7 / 32

Pseudo uniform random numbers (Pseudo) Uniform random number generator on [0 , 1] Definition A (uniform) RNG on [0 , 1] is an algorithm that returns numbers such that # x i ∈ [ a, b ] − → ( b − a ) as n − → ∞ n #( x i , x j ) ∈ [ a, b ] × [ c, d ] − → ( b − a ) × ( d − c ) as n − → ∞ n for a, b, c, d ∈ [0 , 1] And so on for any order... Such a sequence is also said to be k-uniform on [0 , 1] for any k and mimics what we expect of a sequence of i.i.d U ([0 , 1]) random variables. Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 8 / 32

Pseudo uniform random numbers Pseudo random uniform numbers on [0 , 1] : the congruential method. The congruential method takes advantage of the erratic occurence of prime numbers. It produce sequences in a discrete set { 0 , M } . To lie in [0 , 1] , the sequence has to be re-scaled. Congruential method 1 Choose a, b, M in N 2 Init x 1 at some arbitrary value in { 0 , ..., M − 1 } 3 Iterate x i := ax i − 1 + b ( mod M ) 4 y i = x i /M (rescaling) Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 9 / 32

Pseudo uniform random numbers Some R code for the congruential method require(gmp); require(gplots) a <- 1000 b <- 0 M <- 2001179 isprime(M) n <- 10000 #length of sequence x <- rep(NA,n) x[1] <- 220472 # init for(i in 2:n) { x[i] <- (a*x[i-1]+b) %%M } y <- x/M # rescaling par(mfrow=c(1,2)) hist(y,prob=TRUE) plot(y[-1],y[-n],pch=".") Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 10 / 32

Pseudo uniform random numbers Remarks on the congurential method Method given to produce numbers on { 0 , M } RNGs on [0 , 1] obtained by re-scaling Note the analogy with a roulette Algorithm is deterministic: same initial state produces same sequence Algorithm is periodic Fast Efficiency (ability to look random) depends strongly on a , b and M Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 11 / 32

Pseudo uniform random numbers Assessing the quality of a random number generator Histogram Bidimensional histogram Auto-correlation function: ρ ( h ) = c ( h ) /V ar ( X ) where c ( h ) = Cov ( X k , X k + h ) Formal statistical test: Kolmogorov-Smirnov (KS) test: based on max | F ( x ) − F ⋆ ( x ) | returns a p-value measures how much the data conflict with H 0 a small p-value indicates a conflict between data and H 0 a large p-value indicates the absence of conflict between data and H 0 Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 12 / 32

Pseudo uniform random numbers A state-of-the-art method: the Mersenne twister Previous method in use untill late 80’s Mersenne twister: algorithm proposed by Matsumoto and Nishimura in 1998 Iterative method Based on a matrix linear recurrence Has period 2 19937 − 1 when using 32-bit words Name derives from the fact that period length is chosen to be a Mersenne prime number Best pseudo-RNG to date Default algorithm in R (cf ? RNG ), Splus and Matlab Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 13 / 32

Beyond uniform numbers on [0 , 1] Simulating non-uniform random numbers Most common families of methods include: Rejection methods Transformation methods Markov chain methods Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 14 / 32

Random number generation in R R functions for random number simulation All common distributions are implemented Arguments include the desired sample size and distribution parameters (sometimes several parametrizations available) Examples: Continuous uniform U ([ a, b ]) : runif(n,min=a,max=b) Univariate normal N ( µ, σ ) : rnorm(n,mean=mu,sd=sigma) Exponential E ( λ ) : rexp(n,rate=lambda) Discrete with arbitrary weights (including uniform):: sample(size,x,replace,prob) See on-line help: ? Distributions Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 15 / 32

The bootstrap The bootstrap Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 16 / 32

The bootstrap Introductory example: estimating the accuracy of an estimator of the mean Consider the problem of estimating the expectation µ of a distribution with unknown variance σ 2 from an i.i.d sample X 1 , ..., X n . µ = ¯ A natural estimator of µ is ˆ X n The accuracy of ˆ µ can be assessed by computing MSE (ˆ µ ) = V (ˆ µ ) (since E (ˆ µ ) = µ ) µ ) = σ 2 /n We know that V (ˆ σ 2 can be estimated by the sample variance � ( X i − ¯ X n ) 2 / ( n − 1) µ ) = � ( X i − ¯ � X n ) 2 / ( n 2 − n ) Hence MSE (ˆ Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 17 / 32

The bootstrap µ ) = σ 2 /n Now imagine for a while that we do not know that V (ˆ σ 2 = � ( X i − ¯ or that � X n ) 2 / ( n − 1) Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 18 / 32

The bootstrap A fairly general situation: An unknown parameter θ An estimator ˆ θ No known formula for the variance (or MSE) of ˆ θ Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 19 / 32

The bootstrap Back to the variance of ˆ µ By definition: variance = “expected square of the difference with expectation” V ( ¯ X n ) can be thought of as the number obtained as follows X (1) Take a 1st i.i.d sample of size n , compute ¯ n X (2) Take a 2nd i.i.d sample of size n , compute ¯ n . . . . . . Take a B-th i.i.d sample of size n , compute ¯ X ( B ) n   2 B B � � X n ) = 1 Compute � V ( ¯  ¯ X ( j ) X ( j ) ¯  − 1 /B n n B j =1 j =1 Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 20 / 32

Stochastic simulation and resampling methods Statistical modelling: - PowerPoint PPT Presentation

Stochastic simulation and resampling methods Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk October 22, 2013 Gilles Guillot (gigu@dtu.dk) Stochastic simulation and resampling methods October 22, 2013 1 / 32

Introduction to resampling methods Tushar Shanker Data Scientist DataCamp Statistical

Estimating the Performance of Predictive Models with Resampling Methods Florian Pargent

Introduction to Machine Learning Tuning: Nested Resampling compstat-lmu.github.io/lecture_i2ml

Cross-validation and the Bootstrap In the section we discuss two resampling methods:

An Introduction to Stochastic Simulation Stephen Gilmore Laboratory for Foundations of Computer

Resampling for nonstationary stochastic models Jacek Le skow Anna Dudek ukasz Lenart

Outline Narcisse Ngada DESY, MKK 1) What is simulation ? 14.05.2014 2) Why simulation ? 3)

The EXQUIRES (EXtensible QUantitative Image RESampling) Test Suite: Impact of the Downsampler,

Introduction to Machine Learning Evaluation: Resampling compstat-lmu.github.io/lecture_i2ml

Subsampling versus bootstrap in resampling-based model selection for multivariable regression

Controlling Adaptive Resampling Fons Adriaensen Casa della Musica, Parma Linux Audio Conference

Parametric bootstrap August 30, 2017 Resampling from the data or from distribution Simple

Simulation Monte Carlo Monte Carlo simulation Outcome of a single stochastic simulation run

DYNAMIC RESAMPLING FOR GUIDED EVOLUTIONARY MULTI-OBJECTIVE OPTIMIZATION OF STOCHASTIC SYSTEMS

Grid simulation (AliEn) Outline GRID simulation Simulation tool Ptolemy (Berkeley)

Stochastic Simulation Methods: Variance reduction methods Antithetic variables Bo Friis

Ekaterina Nosova DMI Dept of Mathematics and Informatics, University of Salerno, Italy

Variance Estimation for Survey-Weighted Data Using Bootstrap Resampling Methods: 2013

Background Data Resampling for Outlier-Aware Classification Out-of-distribution Detection

Challenge of Reproducible Pipelines Pjotr Prins 11th Biohackathon 2018 Matsue, Japan, December

Synthe'c data in US Census CompSci 590.03 Instructor: Ashwin

Update of noise filtering in protoDUNE Wenqiang Gu (BNL) Carlos Sarasty (University of

Advanced Simulation - Lecture 13 Patrick Rebeschini February 26th, 2018 Patrick Rebeschini

Last lecture Configuration Space Free-Space and C-Space Obstacles Minkowski Sums 1