Optimization and Simulation: Statistical analysis and bootstrapping


SLIDE 1

Optimization and Simulation

Statistical analysis and bootstrapping
Michel Bierlaire

Transport and Mobility Laboratory, School of Architecture, Civil and Environmental Engineering, École Polytechnique Fédérale de Lausanne

M. Bierlaire (TRANSP-OR ENAC EPFL), Optimization and Simulation, 1 / 21

SLIDE 2

Introduction

The outputs of the simulator are random variables. Running the simulator provides one realization of these r.v. We have no access to the pdf or CDF of these r.v.; indeed, that is precisely why we rely on simulation. How can we derive statistics about a r.v. when only realizations are known? And how can we measure the quality of such a statistic?

SLIDE 3

Sample mean and variance

Consider $X_1, \ldots, X_n$ i.i.d. r.v. with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$.

The sample mean
$$\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$$
is an unbiased estimator of the population mean $\mu$, as $E[\bar{X}] = \mu$.

The sample variance
$$S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2$$
is an unbiased estimator of the population variance $\sigma^2$, as $E[S^2] = \sigma^2$. (See proof: Ross, Chapter 7.)
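These two estimators can be sketched directly in Python (a minimal sketch; the function names are mine, not from the slides):

```python
def sample_mean(x):
    # Unbiased estimator of the population mean mu
    return sum(x) / len(x)

def sample_variance(x):
    # Unbiased estimator of the population variance sigma^2
    # (note the n - 1 denominator)
    xbar = sample_mean(x)
    return sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1)

# For instance, for the draws 1, 2, 3:
# sample_mean([1, 2, 3]) -> 2.0, sample_variance([1, 2, 3]) -> 1.0
```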

SLIDE 4

Sample mean and variance

Recursive computation

1. Initialize $\bar{X}_0 = 0$, $S_1^2 = 0$.

2. Update the mean:
$$\bar{X}_{k+1} = \bar{X}_k + \frac{X_{k+1} - \bar{X}_k}{k+1}.$$

3. Update the variance:
$$S_{k+1}^2 = \left(1 - \frac{1}{k}\right) S_k^2 + (k+1)\left(\bar{X}_{k+1} - \bar{X}_k\right)^2.$$
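A minimal Python sketch of this recursive computation (the function name is mine; it reproduces the updates above in a single pass over the draws):

```python
def recursive_mean_variance(draws):
    # One-pass computation of the sample mean and sample variance,
    # following the recursive updates above: no earlier draw is stored.
    xbar, s2 = 0.0, 0.0            # bar{X}_0 = 0, S^2_1 = 0
    for k, x in enumerate(draws):  # k = 0, 1, 2, ...
        xbar_new = xbar + (x - xbar) / (k + 1)
        if k >= 1:                 # variance update is defined for k >= 1
            s2 = (1.0 - 1.0 / k) * s2 + (k + 1) * (xbar_new - xbar) ** 2
        xbar = xbar_new
    return xbar, s2

# recursive_mean_variance([1, 2, 3]) -> (2.0, 1.0), matching the
# direct formulas for the sample mean and sample variance
```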

SLIDE 5

Mean Square Error

Consider $X_1, \ldots, X_n$ i.i.d. r.v. with CDF $F$. Consider a parameter $\theta(F)$ of the distribution (mean, quantile, mode, etc.), and an estimator $\theta(X_1, \ldots, X_n)$ of $\theta(F)$. The mean square error of the estimator is defined as
$$\text{MSE}(F) = E_F\left[\left(\theta(X_1, \ldots, X_n) - \theta(F)\right)^2\right],$$
where the subscript $F$ emphasizes that the expectation is taken under the assumption that the r.v. all have distribution $F$. If $F$ is unknown, it is not immediate to find an estimator of the MSE.

SLIDE 6

How many draws must be used?

Let $X$ be a r.v. with mean $\theta$ and variance $\sigma^2$. We want to estimate the mean $\theta$ of the simulated distribution. The estimator used is the sample mean $\bar{X}$. The mean square error is
$$E[(\bar{X} - \theta)^2] = \frac{\sigma^2}{n}.$$
By the central limit theorem, the sample mean $\bar{X}$ is approximately normally distributed with mean $\theta$ and variance $\sigma^2/n$. So we can stop generating data when $\sigma/\sqrt{n}$ is small, where $\sigma$ is approximated by the sample standard deviation $S$. For the normal approximation to be reasonable, at least 100 draws (say) should be used. See Ross, p. 121, for details.
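The resulting stopping rule can be sketched as follows (a sketch under my naming; `generate` stands for one run of the simulator, and `d` is the required precision, neither of which is named on the slides):

```python
import math

def simulate_until_precise(generate, d, min_draws=100):
    # Generate draws until the estimated standard error S / sqrt(n)
    # falls below the required precision d, with at least min_draws
    # draws so that the normal approximation is reasonable.
    data = [generate() for _ in range(min_draws)]
    while True:
        n = len(data)
        xbar = sum(data) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
        if s / math.sqrt(n) < d:
            return xbar, n
        data.append(generate())
```

For example, `simulate_until_precise(lambda: random.gauss(0, 1), d=0.01)` keeps drawing standard normals until the standard error of the sample mean is below 0.01.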

SLIDE 7

Mean Square Error

Indicators other than the mean may be desired. Theoretical results about the MSE cannot always be derived. Solution: rely on simulation. Method: bootstrapping.

SLIDE 8

Empirical distribution function

Consider $X_1, \ldots, X_n$ i.i.d. r.v. with CDF $F$, and a realization $x_1, \ldots, x_n$ of these r.v. The empirical distribution function is defined as
$$F_e(x) = \frac{1}{n} \sum_{i=1}^n I\{x_i \leq x\},$$
where $I\{x_i \leq x\} = 1$ if $x_i \leq x$, and 0 otherwise. It is the CDF of a r.v. that can take any value $x_i$ with equal probability.
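A minimal sketch of $F_e$ in Python (the function name is mine):

```python
def empirical_cdf(data):
    # Fe(x) = (1/n) * number of observations x_i <= x
    n = len(data)
    return lambda x: sum(1 for xi in data if xi <= x) / n

# With the realization 0.5, 1.0, 2.0, 3.5:
# Fe = empirical_cdf([0.5, 1.0, 2.0, 3.5])
# Fe(0.4) -> 0.0, Fe(1.0) -> 0.5, Fe(10.0) -> 1.0
```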

SLIDE 9

Empirical CDF

[Figure: empirical CDF $F_e(x)$ for $n = 10$ draws, plotted against the true CDF $F(x)$.]

SLIDE 10

Empirical CDF

[Figure: empirical CDF $F_e(x)$ for $n = 100$ draws, plotted against the true CDF $F(x)$.]

SLIDE 11

Empirical CDF

[Figure: empirical CDF $F_e(x)$ for $n = 1000$ draws, plotted against the true CDF $F(x)$.]

SLIDE 12

From reality to data

                   Reality                     Data
Random variable    X                           X^e
CDF                F                           F_e
True parameter     θ(F)                        θ(F_e)
Sample             X_1, ..., X_n ~ F           X^e_1, ..., X^e_n ~ F_e
Estimate           θ(X_1, ..., X_n)            θ(X^e_1, ..., X^e_n)

SLIDE 13

Mean Square Error

We use the empirical distribution function $F_e$. We can approximate
$$\text{MSE}(F) = E_F\left[\left(\theta(X_1, \ldots, X_n) - \theta(F)\right)^2\right]$$
by
$$\text{MSE}(F_e) = E_{F_e}\left[\left(\theta(X_1^e, \ldots, X_n^e) - \theta(F_e)\right)^2\right],$$
where $\theta(F_e)$ can be computed directly from the data (mean, variance, etc.).

SLIDE 14

Mean Square Error

We want to compute
$$\text{MSE}(F_e) = E_{F_e}\left[\left(\theta(X_1^e, \ldots, X_n^e) - \theta(F_e)\right)^2\right].$$
The $X_i^e$ are r.v. that can take any value $x_i$ with equal probability. Therefore,
$$\text{MSE}(F_e) = \frac{1}{n^n} \sum_{i_1=1}^n \cdots \sum_{i_n=1}^n \left(\theta(x_{i_1}, \ldots, x_{i_n}) - \theta(F_e)\right)^2.$$
This is clearly impossible to compute when $n$ is large. Solution: simulation.
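For tiny $n$ the $n^n$-term sum can actually be enumerated, which makes the formula concrete (a sketch; the names are mine, not from the slides):

```python
from itertools import product

def exact_mse_fe(data, theta):
    # Exact MSE(Fe) for the estimator theta, by enumerating all n^n
    # equally likely resamples of the data. Only feasible for tiny n.
    n = len(data)
    theta_fe = theta(list(data))
    total = sum((theta(list(resample)) - theta_fe) ** 2
                for resample in product(data, repeat=n))
    return total / n ** n

# For the sample mean and data [0, 1] (n = 2), the four resamples
# (0,0), (0,1), (1,0), (1,1) give exact_mse_fe -> 0.125.
```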

SLIDE 15

Bootstrapping

For $r = 1, \ldots, R$:

Draw $x_1^r, \ldots, x_n^r$ from $F_e$, that is, draw from the data. Each draw is obtained as follows:

1. Let $s$ be a draw from $U[0, 1]$.
2. Set $j = \lfloor ns \rfloor$.
3. Return $x_j$ (indexing the data from 0 to $n - 1$).

Compute
$$M_r = \left(\theta(x_1^r, \ldots, x_n^r) - \theta(F_e)\right)^2.$$

Estimate of $\text{MSE}(F_e)$, and therefore of $\text{MSE}(F)$:
$$\frac{1}{R} \sum_{r=1}^R M_r.$$

Typical value for $R$: 100.
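The loop above can be sketched as follows (a sketch with my own names; the resampling step matches the Python code shown later in the slides):

```python
import random

def bootstrap_sample(x):
    # One resample from Fe: s ~ U[0,1], j = floor(n*s), return x_j
    # (0-based indexing of the data).
    n = len(x)
    return [x[int(n * random.random())] for _ in range(n)]

def bootstrap_mse(x, theta, R=100):
    # Average of M_r = (theta(resample) - theta(Fe))^2 over R resamples.
    theta_fe = theta(x)
    return sum((theta(bootstrap_sample(x)) - theta_fe) ** 2
               for _ in range(R)) / R
```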

SLIDE 16

Bootstrap: simple example

Data: 0.636, −0.643, 0.183, −1.67, 0.462. Mean = −0.206. Theoretical MSE = $E[(\bar{X} - \theta)^2] = S^2/n$ = 0.1817.

 r   bootstrap sample x^r_1, ..., x^r_5            θ̂        θ(F_e)    M_r
 1   −0.643, −0.643, −0.643,  0.462,  0.462      −0.201    −0.206    2.544e-05
 2   −0.643,  0.183,  0.636,  0.636,  0.636       0.2896   −0.206    0.2456
 3   −1.67,  −1.67,   0.183,  0.462,  0.636      −0.411    −0.206    0.04204
 4   −1.67,  −0.643,  0.183,  0.183,  0.636      −0.2617   −0.206    0.003105
 5   −0.643,  0.462,  0.462,  0.636,  0.636       0.3105   −0.206    0.2667
 6   −1.67,  −1.67,   0.183,  0.183,  0.183      −0.5573   −0.206    0.1234
 7   −0.643,  0.183,  0.183,  0.462,  0.636       0.1642   −0.206    0.137
 8   −1.67,  −1.67,  −0.643,  0.183,  0.183      −0.7225   −0.206    0.2667
 9    0.183,  0.462,  0.462,  0.636,  0.636       0.4756   −0.206    0.4646
10   −0.643,  0.183,  0.183,  0.462,  0.636       0.1642   −0.206    0.137

Bootstrap estimate of the MSE: $\frac{1}{10} \sum_{r=1}^{10} M_r = 0.1686$.
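The headline numbers of this example can be checked directly (a sketch; the small discrepancies with the slide come from the data being printed to only three digits):

```python
data = [0.636, -0.643, 0.183, -1.67, 0.462]
n = len(data)

xbar = sum(data) / n                               # -0.2064, reported as -0.206
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # sample variance S^2
theoretical_mse = s2 / n                           # ~0.182, reported as 0.1817

# The first bootstrap resample in the table, with its M_r:
resample_1 = [-0.643, -0.643, -0.643, 0.462, 0.462]
theta_hat = sum(resample_1) / n                    # -0.201
m_1 = (theta_hat - xbar) ** 2                      # ~2.9e-05; the table shows
                                                   # 2.544e-05, computed with the
                                                   # rounded mean -0.206
```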

SLIDE 17

Python code

from random import random

def bootstrap(data):
    # Draw a bootstrap resample: n draws from the data,
    # uniformly and with replacement.
    n = len(data)
    b = []
    for i in range(0, n):
        r = random()
        index = int(n * r)   # j = floor(n * s), 0-based
        b.append(data[index])
    return b

def percentile(dataSorted, p):
    # p-th percentile of a sorted sample.
    i = max(int(round(p * len(dataSorted) + 0.5)), 2)
    return dataSorted[i - 2]

SLIDE 18

95% percentile: Python code

from math import sqrt

# drawNormal(N) is assumed to return N draws from the standard normal
# distribution; its implementation is not shown on the slides.
N = 10000
data = drawNormal(N)
data.sort()
theP = 0.975
quantile = float(percentile(data, theP))
print("{}% quantile={:.4f}".format(theP, quantile))
R = 100
sum = 0.0
for l in range(0, R):
    b = bootstrap(data)
    b.sort()
    q = float(percentile(b, theP))
    sum += (quantile - q) ** 2
print("MSE: {:.4f} sqrt(MSE): {:.4f}".format(sum / R, sqrt(sum / R)))

SLIDE 19

Results

N = 10000: 0.975 quantile = 1.9472, MSE: 0.0009, sqrt(MSE): 0.0303
N = 1000:  0.975 quantile = 1.8808, MSE: 0.0098, sqrt(MSE): 0.0988
N = 100:   0.975 quantile = 1.4078, MSE: 0.0300, sqrt(MSE): 0.1732

SLIDE 20

Summary

The number of draws is determined by the required precision. In some cases, the precision is derived from theoretical results. If not, rely on bootstrapping. Idea: use simulation to estimate the Mean Square Error.

SLIDE 21

Appendix: MSE for the mean

Consider $X_1, \ldots, X_n$ i.i.d. r.v. Denote $\theta = E[X_i]$ and $\sigma^2 = \text{Var}(X_i)$, and consider $\bar{X} = \sum_{i=1}^n X_i / n$.

Unbiasedness:
$$E[\bar{X}] = \sum_{i=1}^n E[X_i]/n = \theta.$$

MSE:
$$E[(\bar{X} - \theta)^2] = \text{Var}(\bar{X}) = \text{Var}\left(\sum_{i=1}^n X_i/n\right) = \sum_{i=1}^n \text{Var}(X_i)/n^2 = \sigma^2/n,$$
where the third equality uses the independence of the $X_i$.
