SLIDE 1

Monte Carlo Methods

CS60077: Reinforcement Learning

Abir Das

IIT Kharagpur

Oct 05 and 06, 2020

SLIDE 2

Agenda

§ Understand how to evaluate policies in a model-free setting using Monte Carlo methods
§ Understand Monte Carlo methods in a model-free setting for control of Reinforcement Learning problems

SLIDE 3

Resources

§ Reinforcement Learning by David Silver [Link]
§ Reinforcement Learning by Balaraman Ravindran [Link]
§ Monte Carlo Simulation by Nando de Freitas [Link]
§ SB: Chapter 5

SLIDE 4

Model-Free Setting

§ Like the previous few lectures, here also we will deal with prediction and control problems, but this time in a model-free setting.
§ In the model-free setting we do not have full knowledge of the MDP.
§ Model-free prediction: estimate the value function of an unknown MDP.
§ Model-free control: optimise the value function of an unknown MDP.
§ Model-free methods require only experience - sample sequences of states, actions, and rewards (S1, A1, R2, · · · ) from actual or simulated interaction with an environment.
§ Actual experience requires no knowledge of the environment's dynamics.
§ Simulated experience requires a model only to generate samples; no knowledge of the complete probability distributions of state transitions is required. In many cases this is easy to do.

SLIDE 5

Monte Carlo

§ What is the probability that a dart thrown uniformly at random in the unit square (corners (0,0), (1,0), (0,1), (1,1)) will hit the red area?
§ For a simple red region the answer is analytic: the probability equals the area of the region (e.g., for a red disc of radius ρ fitting inside the square, P(area) = πρ²).
§ For an irregular red region, estimate the probability by throwing many darts and counting:
P(area) ≈ (# darts in red area) / (# darts)
§ In the illustrated throw of 19 darts, 8 land in the red area, giving P(area) ≈ 8/19.
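
The counting estimator on this slide is easy to try out. Below is a minimal sketch, assuming the red region is a disc of radius 0.5 centred at (0.5, 0.5), so the true answer is π/4 ≈ 0.785; the slide's actual region is not recoverable, and any indicator function can be substituted.

```python
import random

def estimate_hit_probability(n_darts=100_000, seed=0):
    """Estimate P(dart hits red area) by uniform sampling of the unit square.

    Assumption for illustration: the 'red area' is a disc of radius 0.5
    centred at (0.5, 0.5), so the true probability is pi/4.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_darts):
        x, y = rng.random(), rng.random()            # dart lands uniformly in [0,1)^2
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:  # inside the disc?
            hits += 1
    return hits / n_darts                            # (# darts in red area) / (# darts)

print(estimate_hit_probability())  # ~0.785 = pi/4
```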

SLIDE 12

History of Monte Carlo

§ The bomb and ENIAC

Image taken from: www.livescience.com
Image taken from: www.digitaltrends.com
SLIDE 13

Monte Carlo for Expectation Calculation

§ Let's say we want to compute E[f(x)] = ∫ f(x)p(x)dx
§ Draw i.i.d. samples {x(i)}_{i=1}^{N} from the probability density p(x)
§ Approximate p(x) ≈ (1/N) Σ_{i=1}^{N} δ_{x(i)}(x)   [δ_{x(i)}(x) is an impulse at x(i) on the x axis]
§ E[f(x)] = ∫ f(x)p(x)dx ≈ ∫ f(x) (1/N) Σ_{i=1}^{N} δ_{x(i)}(x) dx = (1/N) Σ_{i=1}^{N} ∫ f(x)δ_{x(i)}(x)dx = (1/N) Σ_{i=1}^{N} f(x(i))

Image taken from: Nando de Freitas: MLSS 08
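
The sample-mean estimator above is a one-liner in code. A minimal sketch, with the illustrative choice p = N(0,1) and f(x) = x², so the true value E[f(x)] = 1:

```python
import random

def mc_expectation(f, sampler, n=100_000, seed=0):
    """Approximate E[f(x)] by (1/N) * sum_i f(x_i) with x_i ~ p drawn i.i.d."""
    rng = random.Random(seed)
    return sum(f(sampler(rng)) for _ in range(n)) / n

# Example: p = N(0,1), f(x) = x^2, so E[f(x)] = Var(x) = 1.
est = mc_expectation(lambda x: x * x, lambda rng: rng.gauss(0.0, 1.0))
print(est)  # close to 1.0
```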

SLIDE 14

Monte Carlo Policy Evaluation

§ Learn vπ from episodes of experience under policy π: S1, A1, R2, S2, A2, R3, · · · , ST ∼ π
§ Recall that the return is the total discounted reward: Gt = R_{t+1} + γR_{t+2} + · · · + γ^{T−t−1}R_T
§ Recall that the value function is the expected return: vπ(s) = E[Gt | St = s]
§ Monte Carlo policy evaluation uses the empirical mean return instead of the expected return
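
The returns Gt for every step of one episode can be computed in a single backward pass, since Gt = R_{t+1} + γG_{t+1}. A small sketch (the reward list and γ are illustrative):

```python
def returns_from_rewards(rewards, gamma=0.9):
    """rewards[t] holds R_{t+1}; the result[t] holds G_t = R_{t+1} + gamma*G_{t+1}."""
    G = 0.0
    returns = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):  # walk backwards through the episode
        G = rewards[t] + gamma * G
        returns[t] = G
    return returns

print(returns_from_rewards([0, 0, 1], gamma=0.9))  # [0.81, 0.9, 1.0]
```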

SLIDE 15

First-Visit Monte Carlo Policy Evaluation

§ To evaluate state s, i.e. to learn vπ(s):
§ The first time-step t that state s is visited in an episode,
§ Increment counter N(s) ← N(s) + 1
§ Increment total return S(s) ← S(s) + Gt
§ Value is estimated by the mean return V(s) = S(s)/N(s)
§ By the law of large numbers, V(s) → vπ(s) as N(s) → ∞

SLIDE 16

Every-Visit Monte Carlo Policy Evaluation

§ To evaluate state s, i.e. to learn vπ(s):
§ Every time-step t that state s is visited in an episode,
§ Increment counter N(s) ← N(s) + 1
§ Increment total return S(s) ← S(s) + Gt
§ Value is estimated by the mean return V(s) = S(s)/N(s)
§ By the law of large numbers, V(s) → vπ(s) as N(s) → ∞
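
A compact sketch of both procedures, assuming a hypothetical generate_episode(policy) helper that plays one episode to termination; deleting the `seen` check turns first-visit MC into every-visit MC:

```python
from collections import defaultdict

def first_visit_mc(generate_episode, policy, n_episodes=10_000, gamma=1.0):
    """First-visit MC prediction: average the return following the FIRST
    visit to each state.  `generate_episode(policy)` is a hypothetical
    helper returning one terminated episode as [(S_0, R_1), (S_1, R_2), ...]."""
    N = defaultdict(int)      # visit counter N(s)
    S = defaultdict(float)    # total return  S(s)
    V = {}                    # value estimate V(s) = S(s)/N(s)
    for _ in range(n_episodes):
        episode = generate_episode(policy)
        # backward pass: G accumulates the return from step t onwards
        G, returns = 0.0, [0.0] * len(episode)
        for t in reversed(range(len(episode))):
            G = episode[t][1] + gamma * G
            returns[t] = G
        seen = set()
        for t, (s, _) in enumerate(episode):
            if s in seen:          # drop this check for every-visit MC
                continue
            seen.add(s)
            N[s] += 1
            S[s] += returns[t]
            V[s] = S[s] / N[s]     # V(s) -> v_pi(s) as N(s) -> infinity
    return V
```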

SLIDE 17

Blackjack Example

States (200 of them):
◮ Current sum (12-21)
◮ Dealer's showing card (ace-10)
◮ Do I have a "useable" ace? (yes-no)

Action stick: stop receiving cards (and terminate)
Action twist: take another card (no replacement)

Reward for stick:
◮ +1 if sum of cards > sum of dealer cards
◮ 0 if sum of cards = sum of dealer cards
◮ -1 if sum of cards < sum of dealer cards

Reward for twist:
◮ -1 if sum of cards > 21 (and terminate)
◮ 0 otherwise

Transitions: automatically twist if sum of cards < 12

Slide courtesy: David Silver [Deepmind]

SLIDE 18

Blackjack Example

Policy: stick if sum of cards ≥ 20, otherwise twist

Slide courtesy: David Silver [Deepmind]
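
To make the example concrete, here is a simplified simulation sketch of this fixed policy. It is not the full example: the usable-ace part of the state is ignored (aces count as 1), the dealer's showing card is not tracked, and the dealer is assumed to hit below 17 (the standard rule); card values 1-9 appear once and 10 four times, as in a real deck.

```python
import random

CARDS = list(range(1, 10)) + [10, 10, 10, 10]  # 1..9 once; 10, J, Q, K all count 10

def play_episode(rng):
    """One hand under the slide's policy: stick iff sum >= 20; returns the reward."""
    player = rng.choice(CARDS) + rng.choice(CARDS)
    while player < 12:                  # per the slides: automatically twist below 12
        player += rng.choice(CARDS)
    while player < 20:                  # policy: twist until the sum reaches 20
        player += rng.choice(CARDS)
        if player > 21:
            return -1                   # bust: reward -1 and terminate
    dealer = rng.choice(CARDS) + rng.choice(CARDS)
    while dealer < 17:                  # assumed fixed dealer rule: hit below 17
        dealer += rng.choice(CARDS)
    if dealer > 21 or player > dealer:
        return 1
    return 0 if player == dealer else -1

rng = random.Random(0)
# Monte Carlo estimate of the policy's expected reward from the start
print(sum(play_episode(rng) for _ in range(100_000)) / 100_000)
```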

SLIDE 19

Monte Carlo Control

§ We will now see how Monte Carlo estimation can be used in control.
§ This is mostly like generalized policy iteration (GPI), where one maintains both an approximate policy and an approximate value function.
§ Policy evaluation is done as Monte Carlo evaluation.
§ Then, we can do greedy policy improvement.
§ What is the problem?
§ π′(s) ≐ arg max_{a∈A} [ r(s, a) + γ Σ_{s′∈S} p(s′|s, a) vπ(s′) ]

SLIDE 23

Monte Carlo Control

§ Greedy policy improvement over v(s) requires a model of the MDP:
π′(s) ≐ arg max_{a∈A} [ r(s, a) + γ Σ_{s′∈S} p(s′|s, a) vπ(s′) ]
§ Greedy policy improvement over q(s, a) is model-free:
π′(s) ≐ arg max_{a∈A} q(s, a)
§ How can we do Monte Carlo policy evaluation for q(s, a)?
§ Essentially the same as Monte Carlo evaluation for state values: start at a state s, pick an action a and then follow the policy.
§ After a few such episodes, average the returns to get an estimate of q(s, a).

SLIDE 27

Monte Carlo Control

§ What are some concerns?
§ First visit/every visit!
§ Suppose you start at a state s and take action a. You reach a state s1 and then, following the policy π at s1, you take the action a1 = π(s1). Can you take the rest of the trajectory as a sample to estimate q(s1, a1)?
§ Practically you can, but convergence cannot be guaranteed. The reason is that this strategy draws a disproportionately large number of actions corresponding to π. So, each sample is considered only for the starting s and a.
§ How do we make sure we have q(s, a) estimates for all s and a? Because of the above, 'exploring starts' becomes important.

SLIDE 29

Monte Carlo Control

§ Many state-action pairs may never be visited.
§ For a deterministic policy, with no returns to average, the Monte Carlo estimates of many actions will not improve with experience.
§ This is the general problem of maintaining exploration.
§ One way to handle it is to specify that episodes start in a state-action pair, and that every pair has a nonzero probability of being selected as the start.
§ This assumption is called 'exploring starts' (a control sketch using it follows below).
§ Monte Carlo with Exploring Starts is an 'on-policy' method. On-policy methods evaluate or improve the policy by drawing samples from that same policy.
§ Off-policy methods evaluate or improve a policy different from the one used to generate the samples.
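
A compressed sketch of Monte Carlo control with exploring starts (first-visit, with the incremental-mean update). Here `states`, `actions`, and the `generate_episode(policy, s0, a0)` helper are hypothetical: the helper starts in state s0, takes action a0, follows `policy` afterwards, and returns one terminated episode.

```python
from collections import defaultdict
import random

def mc_control_es(states, actions, generate_episode, n_episodes=50_000,
                  gamma=1.0, seed=0):
    """Monte Carlo control with exploring starts (first-visit).
    `generate_episode(policy, s0, a0)` is a hypothetical helper returning
    [(S_0, A_0, R_1), (S_1, A_1, R_2), ...] for one terminated episode."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    N = defaultdict(int)
    policy = {s: actions[0] for s in states}   # arbitrary initial deterministic policy
    for _ in range(n_episodes):
        s0, a0 = rng.choice(states), rng.choice(actions)   # exploring start
        episode = generate_episode(policy, s0, a0)
        G = 0.0
        for t in reversed(range(len(episode))):            # backward pass over the episode
            s, a, r = episode[t]
            G = r + gamma * G
            if all((s, a) != (s_, a_) for s_, a_, _ in episode[:t]):  # first visit only
                N[(s, a)] += 1
                Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]   # incremental mean of returns
                policy[s] = max(actions, key=lambda b: Q[(s, b)])  # greedy improvement
    return policy, Q
```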

SLIDE 31

Monte Carlo Control

§ Before going to off-policy methods, let us look into an on-policy Monte Carlo control method that does not use exploring starts.
§ The assumption of exploring starts is sometimes useful, but it cannot be relied upon in general, particularly when learning directly from actual interaction with an environment.
§ The easiest alternative is to consider stochastic policies with a nonzero probability of selecting all actions in each state.
§ Instead of getting a greedy policy in the policy improvement step, an ε-greedy policy is obtained.
§ It means most of the time the action with the maximum estimated action value is chosen, but sometimes (with probability ε) an action is chosen at random.
§ Each nongreedy action is chosen with probability ε/|A(s)|, whereas the remaining bulk of the probability, 1 − ε + ε/|A(s)|, is given to the greedy action (see the sketch after this slide).
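
A minimal sketch of drawing an action from exactly this distribution, assuming Q is a dict keyed by (state, action):

```python
import random

def epsilon_greedy(Q, s, actions, eps, rng=random):
    """Sample an action: with probability eps pick uniformly over A(s)
    (so every nongreedy action gets probability eps/|A(s)|); otherwise pick
    the greedy action, which thus ends up with 1 - eps + eps/|A(s)|."""
    if rng.random() < eps:
        return rng.choice(actions)                 # explore: uniform over A(s)
    return max(actions, key=lambda a: Q[(s, a)])   # exploit: greedy action
```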

SLIDE 33

Monte Carlo Control

§ The ε-greedy policy is an example of a bigger class of policies known as ε-soft policies, where π(a|s) ≥ ε/|A(s)| for all states and actions, for some ε > 0.
§ Among ε-soft policies, the ε-greedy policy is, in some sense, closest to greedy.
§ By using the ε-greedy policy improvement strategy, we achieve only the best policy among ε-soft policies, but we eliminate the assumption of 'exploring starts'.

SLIDE 34

Off-policy Methods

§ All methods trying to learn control face a dilemma:
◮ They seek to learn action values conditional on subsequent optimal behavior.
◮ But they need to behave non-optimally in order to explore all actions (to find the optimal actions).
§ The on-policy approach is actually a compromise: it learns action values not for the optimal policy, but for a near-optimal policy that still explores.
§ Off-policy methods address this by using two policies for two different purposes:
◮ one that is learned about and that becomes the optimal policy - the target policy.
◮ one that is more exploratory and is used to generate behavior - the behavior policy.

SLIDE 37

Off-policy Prediction

§ Estimate vπ or qπ of the target policy π, given only episodes generated by another policy µ, the behavior policy.
§ Almost all off-policy methods utilize concepts from sampling theory for such operations.

SLIDE 38

Rejection Sampling

Set i = 1. Repeat until i = N:
1. Sample x(i) ∼ q(x) and u ∼ U(0,1).
2. If u < p(x(i)) / (M q(x(i))), then accept x(i) and increment the counter i by 1. Otherwise, reject.
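
A direct sketch of this loop. The target and proposal are illustrative choices: p(x) = 2x on [0,1], proposal q = U(0,1), and envelope constant M = 2 so that p(x) ≤ M q(x) everywhere.

```python
import random

def rejection_sample(p, q_sample, q_pdf, M, N, seed=0):
    """Draw N samples from p using proposal q; requires p(x) <= M*q(x) for all x."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < N:
        x = q_sample(rng)                 # 1. sample x ~ q and u ~ U(0,1)
        u = rng.random()
        if u < p(x) / (M * q_pdf(x)):     # 2. accept with probability p(x)/(M q(x))
            samples.append(x)
    return samples                        # otherwise the draw is simply rejected

# Example: target p(x) = 2x on [0,1], proposal q = U(0,1), envelope M = 2.
xs = rejection_sample(lambda x: 2 * x, lambda rng: rng.random(), lambda x: 1.0, 2.0, 10_000)
print(sum(xs) / len(xs))   # ~2/3, the mean of p
```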

SLIDE 39

Importance Sampling

§ What is bad about rejection sampling?
§ Many wasted samples! Why? Every rejected draw costs a sample from q but contributes nothing to the estimate.
§ Importance sampling is a classical way to address this. You keep all the samples from the proposal/behavior distribution; you just weigh them.
§ Let's say we want to compute E_{x∼p(·)}[f(x)] = ∫ f(x)p(x)dx. Then
E_{x∼p(·)}[f(x)] = ∫ f(x)p(x)dx = ∫ f(x) (p(x)/q(x)) q(x)dx = E_{x∼q(·)}[f(x)p(x)/q(x)] ≈ (1/N) Σ_{i=1}^{N} f(x(i)) p(x(i))/q(x(i)),   x(i) ∼ q(·)
§ p(x(i))/q(x(i)) is called the importance weight.
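
A minimal sketch of the plain importance-sampling estimator. The densities are illustrative: we estimate E[x] under p = N(1,1) using samples drawn only from q = N(0,1).

```python
import random, math

def importance_sampling(f, p_pdf, q_pdf, q_sample, N=100_000, seed=0):
    """Estimate E_{x~p}[f(x)] as (1/N) * sum_i f(x_i) * p(x_i)/q(x_i), x_i ~ q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        x = q_sample(rng)
        total += f(x) * p_pdf(x) / q_pdf(x)   # every sample kept, weighted by p/q
    return total / N

# Example: estimate E[x] under p = N(1,1) using samples from q = N(0,1).
norm = lambda x, m: math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2 * math.pi)
est = importance_sampling(lambda x: x, lambda x: norm(x, 1.0), lambda x: norm(x, 0.0),
                          lambda rng: rng.gauss(0.0, 1.0))
print(est)  # close to 1.0
```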

SLIDE 43

Normalized Importance Sampling

To avoid numerical instability, the denominator is changed in the following way (the 1/N normalizer is replaced by the sum of the importance weights):

E_{x∼p(·)}[f(x)] ≈ [ Σ_{x(i)∼q(·)} f(x(i)) p(x(i))/q(x(i)) ] / [ Σ_{x(i)∼q(·)} p(x(i))/q(x(i)) ]
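
The same estimator in code; a minimal sketch reusing samples already drawn from q:

```python
def normalized_is(f, p_pdf, q_pdf, samples):
    """Self-normalized IS: sum_i f(x_i) w_i / sum_i w_i, with w_i = p(x_i)/q(x_i).
    Unlike plain IS, this also works when p and q are known only up to
    normalizing constants, since those constants cancel in the ratio."""
    w = [p_pdf(x) / q_pdf(x) for x in samples]
    return sum(f(x) * wi for x, wi in zip(samples, w)) / sum(w)
```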

SLIDE 44

MC Control with Importance Sampling

§ What are the samples x(i)? What are p(·) and q(·) in our case? And what is f(x(i))?
E_{x∼p(·)}[f(x)] ≈ [ Σ_{x(i)∼q(·)} f(x(i)) p(x(i))/q(x(i)) ] / [ Σ_{x(i)∼q(·)} p(x(i))/q(x(i)) ]
§ x(i) are the trajectories.
§ p(x(i)) is the probability of the trajectory x(i) given that the trajectory follows the target policy.
§ q(x(i)) is the probability of the trajectory x(i) given that the trajectory follows the behavior policy.
§ f(x(i)) is the return.

SLIDE 49

MC Control with Importance Sampling

§ How is a trajectory represented?
§ Refresher from the very first lecture - the goal in the RL problem is to maximize the total reward "in expectation" over the long run:
τ ≝ (s1, a1, s2, a2, · · · ),   p(τ) = p(s1) ∏_t π(a_t|s_t) p(s_{t+1}|s_t, a_t),   max_π E_{τ∼p(τ)}[ Σ_t r(s_t, a_t) ]
§ Let some trajectory x(i) be (s1, a1, s2, a2, · · · )
§ p(x(i)) = p(s1)π(a1|s1)p(s2|s1, a1)π(a2|s2)p(s3|s2, a2) · · ·
§ q(x(i)) = p(s1)µ(a1|s1)p(s2|s1, a1)µ(a2|s2)p(s3|s2, a2) · · ·
§ p(x(i))/q(x(i)) = [p(s1)π(a1|s1)p(s2|s1, a1)π(a2|s2) · · · ] / [p(s1)µ(a1|s1)p(s2|s1, a1)µ(a2|s2) · · · ] = [π(a1|s1)π(a2|s2) · · · ] / [µ(a1|s1)µ(a2|s2) · · · ] = ∏_{t=1}^{T_i} π(a_t|s_t)/µ(a_t|s_t)
§ The initial-state probability and all the state-transition probabilities cancel, so the importance ratio requires no model of the MDP.

SLIDE 53

MC Control with Importance Sampling

E_{x∼π}[f(x)] ≈ [ Σ_{x(i)∼µ} f(x(i)) p(x(i))/q(x(i)) ] / [ Σ_{x(i)∼µ} p(x(i))/q(x(i)) ]

vπ(s) = E[G | S1 = s] ≈ [ Σ_{i=1}^{N} G(i) ∏_{t=1}^{T_i} π(a_t(i)|s_t(i))/µ(a_t(i)|s_t(i)) ] / [ Σ_{i=1}^{N} ∏_{t=1}^{T_i} π(a_t(i)|s_t(i))/µ(a_t(i)|s_t(i)) ]

§ This was the evaluation step; then do the greedy policy improvement.
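
A sketch of this weighted (normalized) importance-sampling evaluation step. The inputs are assumptions for illustration: `episodes` were collected under the behavior policy µ, each as a list of (s_t, a_t, r_{t+1}) triples, and pi(a, s) and mu(a, s) are hypothetical callables returning action probabilities.

```python
from collections import defaultdict

def off_policy_mc_v(episodes, pi, mu, gamma=1.0):
    """Weighted IS estimate of v_pi from episodes generated under mu,
    grouping episodes by their start state (matching the slide's formula)."""
    num = defaultdict(float)   # per start state: sum_i rho(i) * G(i)
    den = defaultdict(float)   # per start state: sum_i rho(i)
    for ep in episodes:
        G, rho = 0.0, 1.0
        for t, (s, a, r) in enumerate(ep):
            G += (gamma ** t) * r          # return of the whole episode
            rho *= pi(a, s) / mu(a, s)     # transition probabilities cancel
        s0 = ep[0][0]
        num[s0] += rho * G
        den[s0] += rho
    return {s: num[s] / den[s] for s in num if den[s] > 0}
```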