SLIDE 1

Optimization and Simulation

Markov Chain Monte Carlo Methods Michel Bierlaire

Transport and Mobility Laboratory, School of Architecture, Civil and Environmental Engineering, École Polytechnique Fédérale de Lausanne

  • M. Bierlaire (TRANSP-OR ENAC EPFL)

Optimization and Simulation 1 / 50

SLIDE 2

Motivation

Outline

1. Motivation
2. Introduction to Markov chains
3. Stationary distributions
4. Metropolis-Hastings
5. Gibbs sampling
6. Simulated annealing

SLIDE 3

Motivation

The knapsack problem

Patricia prepares a hike in the mountains. She has a knapsack with capacity W kg. She considers carrying a list of n items. Each item has a utility u_i and a weight w_i. Which items should she take to maximize the total utility, while fitting in the knapsack?

SLIDE 4

Motivation

Knapsack problem

Simulation
Let X be the set of all possible configurations (there are 2^n of them). Define a probability distribution:

P(x) = U(x) / Σ_{y∈X} U(y)

Question: how to draw from this discrete random variable?

SLIDE 5

Introduction to Markov chains

Outline

1. Motivation
2. Introduction to Markov chains
3. Stationary distributions
4. Metropolis-Hastings
5. Gibbs sampling
6. Simulated annealing

SLIDE 6

Introduction to Markov chains

Markov Chains

Andrey Markov, 1856–1922, Russian mathematician.

SLIDE 7

Introduction to Markov chains

Markov Chains: glossary

Stochastic process: X_t, t = 0, 1, …, a collection of r.v. with the same support, or state space {1, …, i, …, J}.
Markov process (short memory): Pr(X_t = i | X_0, …, X_{t−1}) = Pr(X_t = i | X_{t−1}).
Homogeneous Markov process: Pr(X_t = j | X_{t−1} = i) = Pr(X_{t+k} = j | X_{t−1+k} = i) = P_ij, ∀t ≥ 1, k ≥ 0.

SLIDE 8

Introduction to Markov chains

Markov Chains

Transition matrix P ∈ R^{J×J}. Properties:

Σ_{j=1}^J P_ij = 1, i = 1, …, J,  and  P_ij ≥ 0, ∀i, j.

Ergodicity
If state j can be reached from state i with nonzero probability, and i from j, we say that i communicates with j. Two states that communicate belong to the same class. A Markov chain is irreducible (or ergodic) if it contains only one class. With an ergodic chain, it is possible to reach every state from any state.

SLIDE 9

Introduction to Markov chains

Markov Chains

Aperiodicity
P^t_ij is the probability that the process reaches state j from i after t steps. Consider all t such that P^t_ii > 0. The greatest common divisor d of these t is called the period of state i. A state with period 1 is aperiodic. If P_ii > 0, state i is aperiodic. The period is the same for all states in the same class. Therefore, if the chain is irreducible and one state is aperiodic, they all are.
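The period can be computed directly from this definition: collect the return times t with P^t_ii > 0 and take their greatest common divisor. A minimal sketch (the function name `period` and the truncation at `t_max` are my own choices; the example is the two-state oscillating chain discussed later in these slides):

```python
from functools import reduce
from math import gcd

import numpy as np

def period(P, i, t_max=50):
    """GCD of the return times t <= t_max with (P^t)_{ii} > 0."""
    times = []
    Pt = np.eye(P.shape[0])
    for t in range(1, t_max + 1):
        Pt = Pt @ P  # Pt now holds P^t
        if Pt[i, i] > 1e-12:
            times.append(t)
    return reduce(gcd, times)

# Two-state oscillating chain: the process alternates 1 -> 2 -> 1 -> ...
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(period(P, 0))  # → 2, so the chain is periodic
```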

SLIDE 10

Introduction to Markov chains

A periodic chain

[Transition diagram on states 1–5, with edge probabilities 1/2, 1/2, 1/3, 2/3 and 1; the corresponding 5×5 transition matrix P is shown on the slide.]

d = 3. P^t_ii > 0 for t = 3, 6, 9, 12, 15, …

SLIDE 11

Introduction to Markov chains

Another periodic chain

[Transition diagram on states 1–6, with edge probabilities 1 and 1/2; the corresponding 6×6 transition matrix P is shown on the slide.]

d = 2. P^t_ii > 0 for t = 4, 6, 8, 10, 12, …

SLIDE 12

Introduction to Markov chains

Intuition

Oscillation

P =
0 1
1 0

  • The chain will not “converge” to something stable.
SLIDE 13

Introduction to Markov chains

An aperiodic chain

[Transition diagram on states 1–6, with edge probabilities 1 and 1/3; the corresponding 6×6 transition matrix P is shown on the slide.]

d = 1. P^t_ii > 0 for t = 3, 4, 6, 7, 8, 9, 10, 11, 12, …

SLIDE 14

Stationary distributions

Outline

1. Motivation
2. Introduction to Markov chains
3. Stationary distributions
4. Metropolis-Hastings
5. Gibbs sampling
6. Simulated annealing

SLIDE 15

Stationary distributions

Markov Chains

Stationary probabilities

Pr(j) = Σ_{i=1}^J Pr(j|i) Pr(i)

Stationary probabilities: unique solution of the system

π_j = Σ_{i=1}^J P_ij π_i, ∀j = 1, …, J,  (1)

Σ_{j=1}^J π_j = 1.

A solution exists for any irreducible chain.
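System (1), together with the normalization constraint, is a small linear system: replace one of the balance equations by Σ_j π_j = 1 and solve. A sketch, using an arbitrary two-state chain of my own as illustration:

```python
import numpy as np

def stationary(P):
    """Solve pi = pi P together with sum(pi) = 1."""
    J = P.shape[0]
    # (P^T - I) pi = 0, with the last equation replaced by the normalization.
    A = P.T - np.eye(J)
    A[-1, :] = 1.0
    b = np.zeros(J)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
print(stationary(P))  # → [5/6, 1/6]
```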

SLIDE 16

Stationary distributions

Example

A machine can be in 4 states with respect to wear:

perfect condition, partially damaged, seriously damaged, completely useless.

The degradation process can be modeled by an irreducible aperiodic homogeneous Markov process, with the following transition matrix:

P =
0.95 0.04 0.01 0.00
0.00 0.90 0.05 0.05
0.00 0.00 0.80 0.20
1.00 0.00 0.00 0.00

SLIDE 17

Stationary distributions

Example

Stationary distribution: (5/8, 1/4, 3/32, 1/32). Indeed,

(5/8, 1/4, 3/32, 1/32) ·
0.95 0.04 0.01 0.00
0.00 0.90 0.05 0.05
0.00 0.00 0.80 0.20
1.00 0.00 0.00 0.00
= (5/8, 1/4, 3/32, 1/32).

  • The machine is in perfect condition 5 days out of 8, on average.
  • A repair occurs on average every 32 days.

From now on: Markov process = irreducible aperiodic homogeneous Markov process.
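The claim can be checked numerically: π is stationary precisely when πP = π. A short sketch with the matrix and vector from this slide:

```python
import numpy as np

P = np.array([[0.95, 0.04, 0.01, 0.00],
              [0.00, 0.90, 0.05, 0.05],
              [0.00, 0.00, 0.80, 0.20],
              [1.00, 0.00, 0.00, 0.00]])
pi = np.array([5/8, 1/4, 3/32, 1/32])

# pi is the stationary distribution iff pi P = pi.
print(np.allclose(pi @ P, pi))  # → True
```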

SLIDE 18

Stationary distributions

Markov Chains

Detailed balance equations
Consider the following system of equations:

x_i P_ij = x_j P_ji, i ≠ j,  Σ_{i=1}^J x_i = 1.  (2)

Summing over i:

Σ_{i=1}^J x_i P_ij = x_j Σ_{i=1}^J P_ji = x_j.

If (2) has a solution, it is also a solution of (1). As π is the unique solution of (1), x = π:

π_i P_ij = π_j P_ji, i ≠ j.

The chain is then said to be time reversible.

SLIDE 19

Stationary distributions

Stationary distributions

Property

π_j = lim_{t→∞} Pr(X_t = j), j = 1, …, J.

Ergodicity
Let f be any function on the state space. Then, with probability 1,

lim_{T→∞} (1/T) Σ_{t=1}^T f(X_t) = Σ_{j=1}^J π_j f(j).

Computing the expectation of a function of the stationary states is the same as taking the average of the values along a trajectory of the process.
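The ergodic theorem can be illustrated by simulating one long trajectory and comparing the time average of f with Σ_j π_j f(j). A sketch on a small two-state chain of my own (its stationary distribution is (5/6, 1/6); f is the indicator of the second state):

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = np.array([5/6, 1/6])
f = np.array([0.0, 1.0])  # f(j): indicator of the second state

# Simulate one trajectory of the chain and accumulate f along it.
T = 200_000
x, total = 0, 0.0
for _ in range(T):
    x = rng.choice(2, p=P[x])
    total += f[x]

print(total / T)  # time average, close to 1/6
print(pi @ f)     # expectation under pi, exactly 1/6
```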

SLIDE 20

Stationary distributions

Example: T = 100

[Plot of Pr(X_t = j) against t, for t up to 100.]

SLIDE 21

Stationary distributions

Example: T = 1000

[Plot of Pr(X_t = j) against t, for t up to 1000.]

SLIDE 22

Stationary distributions

Example: T = 10000

[Plot of Pr(X_t = j) against t, for t up to 10000.]

SLIDE 23

Stationary distributions

A periodic example

It does not work for periodic chains:

P =
0 1
1 0

Pr(X_t = 1) = 1 if t is odd, 0 if t is even.

lim_{t→∞} Pr(X_t = 1) does not exist.

Stationary distribution: π = (0.5, 0.5).

SLIDE 24

Metropolis-Hastings

Outline

1. Motivation
2. Introduction to Markov chains
3. Stationary distributions
4. Metropolis-Hastings
5. Gibbs sampling
6. Simulated annealing

SLIDE 25

Metropolis-Hastings

Simulation

Motivation
Sample from very large discrete sets (e.g., sample paths between an origin and a destination). Full enumeration of the set is infeasible.

Procedure
We want to simulate a r.v. X with pmf Pr(X = j) = p_j. We generate a Markov process with limiting probabilities p_j (how?). We simulate the evolution of the process.

p_j = π_j = lim_{t→∞} Pr(X_t = j), j = 1, …, J.

SLIDE 26

Metropolis-Hastings

Simulation

Assume that we are interested in simulating

E[f(X)] = Σ_{j=1}^J f(j) p_j.

We use ergodicity to estimate it with

(1/T) Σ_{t=1}^T f(X_t).

Drop the early states (see the example above). Better estimate:

(1/T) Σ_{t=1+k}^{T+k} f(X_t).

SLIDE 27

Metropolis-Hastings

Metropolis-Hastings

Nicholas Metropolis (1915–1999) and W. Keith Hastings (1930–).

SLIDE 28

Metropolis-Hastings

Metropolis-Hastings

Context
Let b_j, j = 1, …, J be positive numbers. Let B = Σ_j b_j. If J is huge, B cannot be computed. Let π_j = b_j/B. We want to simulate a r.v. with pmf π_j.

Explore the set
Consider a Markov process on {1, …, J} with transition probability Q, designed to explore the space {1, …, J} efficiently:
  • not too fast (and miss important points to sample),
  • not too slowly (and take forever to reach important points).

SLIDE 29

Metropolis-Hastings

Metropolis-Hastings

Define another Markov process
Based on the exact same states {1, …, J} as the previous one. Assume the process is in state i, that is, X_t = i. Simulate the (candidate) next state j according to Q. Define

X_{t+1} = j with probability α_ij, and X_{t+1} = i with probability 1 − α_ij.

SLIDE 30

Metropolis-Hastings

Metropolis-Hastings

Transition probability P

P_ij = Q_ij α_ij if i ≠ j,
P_ii = Q_ii α_ii + Σ_{ℓ≠i} Q_iℓ (1 − α_iℓ) otherwise.

Must verify the property

1 = Σ_j P_ij
  = P_ii + Σ_{j≠i} P_ij
  = Q_ii α_ii + Σ_{ℓ≠i} Q_iℓ (1 − α_iℓ) + Σ_{j≠i} Q_ij α_ij
  = Q_ii α_ii + Σ_{ℓ≠i} Q_iℓ − Σ_{ℓ≠i} Q_iℓ α_iℓ + Σ_{j≠i} Q_ij α_ij
  = Q_ii α_ii + Σ_{ℓ≠i} Q_iℓ.

As Σ_j Q_ij = 1, we have α_ii = 1.

SLIDE 31

Metropolis-Hastings

Metropolis-Hastings

Time reversibility

π_i P_ij = π_j P_ji, i ≠ j, that is, π_i Q_ij α_ij = π_j Q_ji α_ji, i ≠ j.

It is satisfied if

α_ij = π_j Q_ji / (π_i Q_ij) and α_ji = 1,

or

α_ji = π_i Q_ij / (π_j Q_ji) and α_ij = 1.

SLIDE 32

Metropolis-Hastings

Metropolis-Hastings

As α_ij is a probability:

α_ij = min( π_j Q_ji / (π_i Q_ij), 1 ).

Simplification
Remember: π_j = b_j/B. Therefore

α_ij = min( (b_j/B) Q_ji / ((b_i/B) Q_ij), 1 ) = min( b_j Q_ji / (b_i Q_ij), 1 ).

The normalization constant B does not play a role in the computation of α_ij.

SLIDE 33

Metropolis-Hastings

Metropolis-Hastings

In summary
Given Q and the b_j, defining α as above creates a Markov process characterized by P, with stationary distribution π.

SLIDE 34

Metropolis-Hastings

Metropolis-Hastings

Algorithm

1. Choose a Markov process characterized by Q.
2. Initialize the chain with a state i: t = 0, X_0 = i.
3. Simulate the (candidate) next state j based on Q.
4. Let r be a draw from U[0, 1).
5. Compare r with α_ij = min( b_j Q_ji / (b_i Q_ij), 1 ). If r < b_j Q_ji / (b_i Q_ij), then X_{t+1} = j; else X_{t+1} = i.
6. Increase t by one.
7. Go to step 3.
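The steps above can be sketched in a few lines. With a symmetric proposal Q (e.g., uniform, as in the example that follows), the acceptance ratio reduces to b_j/b_i; the function name `metropolis_hastings` is my own:

```python
import numpy as np

def metropolis_hastings(b, n_iter, burn_in, seed=0):
    """MH on {0, ..., J-1} with uniform proposal Q_ij = 1/J.

    Targets pi_j = b_j / sum(b); the normalization B is never computed.
    """
    rng = np.random.default_rng(seed)
    J = len(b)
    counts = np.zeros(J)
    i = 0
    for t in range(n_iter):
        j = rng.integers(J)                       # candidate from uniform Q
        if rng.random() < min(b[j] / b[i], 1.0):  # Q symmetric: ratio is b_j/b_i
            i = j                                 # accept the candidate
        if t >= burn_in:                          # drop the early states
            counts[i] += 1
    return counts / counts.sum()

freq = metropolis_hastings([20, 8, 3, 1], n_iter=100_000, burn_in=1_000)
print(freq)  # close to the target (0.625, 0.25, 0.09375, 0.03125)
```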

SLIDE 35

Metropolis-Hastings

Example

b = (20, 8, 3, 1), π = (5/8, 1/4, 3/32, 1/32).

Q: 4×4 matrix with all entries equal to 1/4.

Run MH for 10000 iterations. Collect statistics after 1000.
Accept: [2488, 1532, 801, 283]
Reject: [0, 952, 1705, 2239]
Simulated: [0.627, 0.250, 0.095, 0.028]
Target: [0.625, 0.250, 0.09375, 0.03125]

SLIDE 36

Gibbs sampling

Outline

1. Motivation
2. Introduction to Markov chains
3. Stationary distributions
4. Metropolis-Hastings
5. Gibbs sampling
6. Simulated annealing

SLIDE 37

Gibbs sampling

Gibbs sampling

Motivation
Draw from multivariate distributions. Main difficulty: dealing with correlations.

Metropolis-Hastings
Let X = (X^1, X^2, …, X^n) be a random vector with pmf (or pdf) p(x). Assume we can draw from the full conditionals: Pr(X^i | X^j = x^j, j ≠ i), i = 1, …, n.

Markov process. Assume the current state is x:
  • Draw randomly (with equal probability) a coordinate i.
  • Draw r from the ith conditional.
  • New state: y = (x^1, …, x^{i−1}, r, x^{i+1}, …, x^n).

SLIDE 38

Gibbs sampling

Gibbs sampling

Transition probability

Q_xy = (1/n) Pr(X^i = r | X^j = x^j, j ≠ i) = p(y) / (n Pr(X^j = x^j, j ≠ i)).

The denominator is independent of X^i, so Q_xy is proportional to p(y).

Metropolis-Hastings

α_xy = min( p(y) Q_yx / (p(x) Q_xy), 1 ) = min( p(y) p(x) / (p(x) p(y)), 1 ) = 1.

The candidate state is always accepted.

SLIDE 39

Gibbs sampling

Example: bivariate normal distribution

(X, Y) ∼ N( (μ_X, μ_Y), Σ ), with Σ = ( σ_X², ρσ_Xσ_Y; ρσ_Xσ_Y, σ_Y² ).

Conditional distribution:

Y | (X = x) ∼ N( μ_Y + (σ_Y/σ_X) ρ (x − μ_X), (1 − ρ²) σ_Y² ).

Apply Gibbs sampling to draw from:

N( (0, 0), ( 1, 0.9; 0.9, 1 ) ).

  • Note: just for illustration. One should use the Cholesky factor.
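A Gibbs sampler for this target alternates draws from the two conditionals above (by symmetry, X | (Y = y) has the same form). The sketch below uses a systematic scan over the two coordinates rather than picking one at random, a common variant:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.9
sd = np.sqrt(1.0 - rho**2)  # conditional std dev, since sigma_X = sigma_Y = 1

n_draws = 50_000
xs = np.empty(n_draws)
ys = np.empty(n_draws)
x, y = 0.0, 0.0
for k in range(n_draws):
    # Draw each coordinate from its conditional given the other one.
    x = rng.normal(rho * y, sd)  # X | Y = y ~ N(rho*y, 1 - rho^2)
    y = rng.normal(rho * x, sd)  # Y | X = x ~ N(rho*x, 1 - rho^2)
    xs[k], ys[k] = x, y

print(np.corrcoef(xs, ys)[0, 1])  # close to the target correlation 0.9
```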
SLIDE 40

Gibbs sampling

Example: pdf

[Contour plot of the bivariate normal pdf on [−4, 4] × [−4, 4].]

SLIDE 41

Gibbs sampling

Example: draws from Gibbs sampling

[Scatter plot of draws from Gibbs sampling on [−4, 4] × [−4, 4].]

SLIDE 42

Simulated annealing

Outline

1. Motivation
2. Introduction to Markov chains
3. Stationary distributions
4. Metropolis-Hastings
5. Gibbs sampling
6. Simulated annealing

SLIDE 43

Simulated annealing

Simulated annealing

Combinatorial optimization

min_{x∈F} f(x),

where the feasible set F is a large finite set of vectors.

Set of optimal solutions:

X* = {x ∈ F | f(x) ≤ f(y), ∀y ∈ F}, and f(x*) = f*, ∀x* ∈ X*.

Probability mass function on F:

p_λ(x) = e^{−λf(x)} / Σ_{y∈F} e^{−λf(y)}, λ > 0.
SLIDE 44

Simulated annealing

Simulated annealing

p_λ(x) = e^{−λf(x)} / Σ_{y∈F} e^{−λf(y)}.

Equivalently,

p_λ(x) = e^{λ(f* − f(x))} / Σ_{y∈F} e^{λ(f* − f(y))}.

As f* − f(x) ≤ 0, when λ → ∞ we have

lim_{λ→∞} p_λ(x) = δ(x ∈ X*) / |X*|,

where δ(x ∈ X*) = 1 if x ∈ X*, 0 otherwise.
SLIDE 45

Simulated annealing

Example

F = {1, 2, 3}, f(F) = {0, 1, 0}.

p_λ(1) = 1 / (2 + e^{−λ}),  p_λ(2) = e^{−λ} / (2 + e^{−λ}),  p_λ(3) = 1 / (2 + e^{−λ}).
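The closed-form probabilities on this slide can be checked numerically (a sketch; `p_lambda` is my own helper name):

```python
import numpy as np

f = np.array([0.0, 1.0, 0.0])  # F = {1, 2, 3}, with f(1) = f(3) = 0, f(2) = 1

def p_lambda(lam):
    """p_lambda(x) = e^{-lam f(x)} / sum_y e^{-lam f(y)}."""
    w = np.exp(-lam * f)
    return w / w.sum()

print(p_lambda(0.0))   # uniform: each probability is 1/3
print(p_lambda(10.0))  # mass concentrates on the two minimizers 1 and 3
```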

SLIDE 46

Simulated annealing

Example

[Plot of p_λ(1) = p_λ(3) and p_λ(2) as functions of λ ∈ [0, 6].]

SLIDE 47

Simulated annealing

Simulated annealing

If λ is large, we generate a Markov chain with stationary distribution p_λ(x): the mass is concentrated on optimal solutions. As the normalizing constant is not needed, only e^{λ(f* − f(x))} is used. The Markov process is constructed through the concept of neighborhood: a neighbor y of x is obtained by simple modifications of x, and the process proceeds from neighbor to neighbor. The neighborhood structure must be designed so that the chain is irreducible, that is, so that the whole space F is covered. It must also be designed so that the size of the neighborhood is reasonably small.

SLIDE 48

Simulated annealing

Neighborhood

Metropolis-Hastings
Denote by N(x) the set of neighbors of x. Define a Markov process where the next state is a randomly drawn neighbor. Transition probability:

Q_xy = 1 / |N(x)|.

Metropolis-Hastings:

α_xy = min( p(y) Q_yx / (p(x) Q_xy), 1 ) = min( e^{−λf(y)} |N(x)| / (e^{−λf(x)} |N(y)|), 1 ).

SLIDE 49

Simulated annealing

Neighborhood

Notes
The neighborhood structure can always be arranged so that each vector has the same number of neighbors. In this case,

α_xy = min( e^{−λf(y)} / e^{−λf(x)}, 1 ).

If y is better than x, the next state is automatically accepted. Otherwise, it is accepted with a probability that depends on λ: if λ is high, the probability is small; when λ is small, it is easy to escape from local optima.
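Putting the pieces together on the knapsack motivation from the first slides: states are 0/1 configurations, a neighbor flips one item, and a degradation d < 0 in utility (here we maximize utility, i.e. f = −utility) is accepted with probability e^{λd}, with λ increased over time. A minimal sketch on a toy instance (the instance and the cooling schedule are my own illustration):

```python
import numpy as np

# Toy knapsack instance: utilities, weights, capacity.
u = np.array([10.0, 7.0, 3.0])
w = np.array([5.0, 4.0, 3.0])
W = 8.0

def utility(x):
    """Total utility of configuration x; -inf if the capacity is exceeded."""
    return float(u @ x) if w @ x <= W else -np.inf

rng = np.random.default_rng(0)
x = np.zeros(3, dtype=int)   # start from the empty knapsack
best = x.copy()
for t in range(2_000):
    lam = 0.01 * (t + 1)     # slowly increasing lambda ("cooling schedule")
    y = x.copy()
    i = rng.integers(3)      # neighbor: flip one randomly chosen item
    y[i] = 1 - y[i]
    d = utility(y) - utility(x)
    # Accept improvements always; degradations with probability e^{lam * d}.
    if d >= 0 or rng.random() < np.exp(lam * d):
        x = y
    if utility(x) > utility(best):
        best = x.copy()

print(best, utility(best))   # should find items {1, 3} (utility 13) on this tiny instance
```

Infeasible neighbors get utility −∞, so they are never accepted; alternatively, a penalty term can keep the chain inside a larger but still irreducible state space.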

SLIDE 50

Simulated annealing

Heuristic

Issue
The number of iterations needed to reach a stationary state and draw an optimal solution may exceed the number of feasible solutions in the set, because the acceptance probability becomes very small; a complete enumeration would then work better. The method is therefore used as a heuristic.
