Monte Carlo Methods


  1. Monte Carlo Methods Prof. Kuan-Ting Lai 2020/4/17

  2. Monte Carlo Methods • Learn directly from episodes of experience • Model-free: no knowledge of MDP transitions / rewards • Learn from complete episodes (episodic MDP): no bootstrapping • Use the simplest idea: value = mean return
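
In symbols (a short sketch using the standard textbook notation, not shown on the slide itself): the return is the discounted sum of rewards until the end of the episode, and the Monte Carlo value estimate is simply the average of the returns observed from a state:

G_t = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{T-t-1} R_T
V_\pi(s) \approx \frac{1}{N(s)} \sum_{i=1}^{N(s)} G_i(s),  where N(s) is the number of (first) visits to s across episodes.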

  3. Sutton, Richard S.; Barto, Andrew G., Reinforcement Learning (Adaptive Computation and Machine Learning series), p. 189

  4. Monte Carlo Prediction • First-visit MC vs. Every-visit MC: first-visit MC averages returns only from the first time a state is visited in each episode, while every-visit MC averages returns from all visits
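
A minimal first-visit Monte Carlo prediction sketch in Python; the environment interface env.reset() / env.step(action) and the policy callable are assumptions for illustration, not part of the slides:

from collections import defaultdict

def first_visit_mc_prediction(env, policy, num_episodes, gamma=1.0):
    """Estimate V_pi by averaging first-visit returns."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    V = defaultdict(float)

    for _ in range(num_episodes):
        # Generate one complete episode following the policy.
        episode = []                              # list of (state, reward) pairs
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)   # assumed 3-tuple interface
            episode.append((state, reward))
            state = next_state

        # Walk backwards through the episode, accumulating the return G.
        G = 0.0
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            G = gamma * G + r
            # First-visit check: update only if s does not occur earlier in the episode.
            if s not in (e[0] for e in episode[:t]):
                returns_sum[s] += G
                returns_count[s] += 1
                V[s] = returns_sum[s] / returns_count[s]
    return V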

  5. Blackjack (21) https://www.imdb.com/title/tt0478087/

  6. Rules of Blackjack • Goal: Each player tries to beat the dealer by getting a count as close to 21 as possible • Lose if total > 21 (bust) • The game begins with two cards dealt to both dealer and player • One of the dealer's cards is face up and the other is face down • Actions − Hit: requests an additional card − Stick: stops getting cards • Dealer sticks when his sum ≥ 17

  7. Reinforcement Learning of Blackjack • States − Player's current sum (12 ~ 21) − Dealer's showing card (ace, 2 ~ 10) − Ace counts as 1 or 11 (usable ace) − Total states: 10*10*2 = 200 • Reward − +1: winning − -1: losing − 0: drawing • ** Automatically hit if sum < 12
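
One possible encoding of this state and reward design in Python (hypothetical helper names, for illustration only):

def make_state(player_sum, dealer_showing, usable_ace):
    """State = (player sum 12-21, dealer's showing card 1-10, usable-ace flag):
    10 * 10 * 2 = 200 states in total."""
    assert 12 <= player_sum <= 21
    assert 1 <= dealer_showing <= 10          # the ace is encoded as 1
    return (player_sum, dealer_showing, bool(usable_ace))

def final_reward(player_total, dealer_total):
    """+1 for a win, -1 for a loss, 0 for a draw; all intermediate rewards are 0."""
    if player_total > 21:                     # player busts
        return -1
    if dealer_total > 21 or player_total > dealer_total:
        return +1
    return -1 if player_total < dealer_total else 0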

  8. State-value function of Blackjack • Policy: stick if the sum of cards is 20 or 21, otherwise hit
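
The fixed policy evaluated on this slide can be written as a one-line function (a sketch; the action encoding 0 = stick, 1 = hit is an assumption):

def fixed_policy(state):
    """Stick only when the player's sum is 20 or 21, otherwise hit."""
    player_sum, dealer_showing, usable_ace = state
    return 0 if player_sum >= 20 else 1       # 0 = stick, 1 = hit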

  9. Monte Carlo Control

  10. Exploring Starts for Monte Carlo • Many state-action pairs may never be visited • Randomly choose starting state-action pairs and run a lot of episodes
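
A compact sketch of Monte Carlo control with exploring starts (MC ES); the generate_episode_from(state, action, policy) helper, which starts an episode from the given state-action pair and returns a list of (state, action, reward) triples, is an assumption rather than something defined on the slides:

import random
from collections import defaultdict

def mc_es_control(all_states, all_actions, generate_episode_from, num_episodes, gamma=1.0):
    """MC control with exploring starts: every state-action pair can be chosen
    as the start of an episode, so all pairs keep being visited."""
    Q = defaultdict(float)
    counts = defaultdict(int)
    policy = {s: random.choice(all_actions) for s in all_states}

    for _ in range(num_episodes):
        # Exploring start: pick a random state-action pair to begin the episode.
        s0 = random.choice(all_states)
        a0 = random.choice(all_actions)
        episode = generate_episode_from(s0, a0, policy)    # [(s, a, r), ...]

        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = gamma * G + r
            if (s, a) not in [(e[0], e[1]) for e in episode[:t]]:        # first visit
                counts[(s, a)] += 1
                Q[(s, a)] += (G - Q[(s, a)]) / counts[(s, a)]            # incremental mean
                # Greedy policy improvement at the visited state.
                policy[s] = max(all_actions, key=lambda act: Q[(s, act)])
    return policy, Q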

  11. Optimal Policy Learnt by MC ES

  12. Monte Carlo Control without Exploring Starts • On-policy − ε-greedy • Off-policy − Importance sampling

  13. On-policy first-visit MC Control (for ε-greedy policies)
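
A sketch of the ε-greedy action selection and the on-policy first-visit update step; the environment interface env.reset() / env.step(action) is an assumption for illustration:

import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon explore uniformly, otherwise act greedily w.r.t. Q."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def on_policy_mc_control(env, actions, num_episodes, gamma=1.0, epsilon=0.1):
    """On-policy first-visit MC control for epsilon-soft policies."""
    Q = defaultdict(float)
    counts = defaultdict(int)

    for _ in range(num_episodes):
        # Generate an episode following the current epsilon-greedy policy.
        episode, state, done = [], env.reset(), False
        while not done:
            action = epsilon_greedy(Q, state, actions, epsilon)
            next_state, reward, done = env.step(action)
            episode.append((state, action, reward))
            state = next_state

        # First-visit updates of the action-value estimates.
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = gamma * G + r
            if (s, a) not in [(e[0], e[1]) for e in episode[:t]]:
                counts[(s, a)] += 1
                Q[(s, a)] += (G - Q[(s, a)]) / counts[(s, a)]    # incremental mean
    return Q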

  14. Off-policy Prediction via Importance Sampling • Use two policies − Target policy: the optimal policy we want to learn − Behavior policy: more exploratory, used to generate behavior • How do we update the target policy using the behavior policy? − Importance sampling

  15. Importance Sampling • Probability of a state-action trajectory • Relative trajectory probability under the target and behavior policies
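
The equations on this slide are images; reconstructed from the standard Sutton and Barto definitions, the trajectory probability under a policy and the importance-sampling ratio between the target policy π and the behavior policy b are:

\Pr\{A_t, S_{t+1}, \dots, S_T \mid S_t, A_{t:T-1} \sim \pi\} = \prod_{k=t}^{T-1} \pi(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k)

\rho_{t:T-1} = \prod_{k=t}^{T-1} \frac{\pi(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k)}{b(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k)} = \prod_{k=t}^{T-1} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)}

The transition probabilities p cancel, so the ratio depends only on the two policies and not on the unknown MDP dynamics.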

  16. Update using the Importance-sampling Ratio • Simple average (ordinary importance sampling) • Weighted average (weighted importance sampling)
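
Written out (again following the textbook definitions, since the slide's equations are images; \mathcal{T}(s) is the set of time steps at which s is visited and T(t) is the end of the episode containing t):

Ordinary (simple average):  V(s) = \frac{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1} G_t}{|\mathcal{T}(s)|}

Weighted average:  V(s) = \frac{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1} G_t}{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1}}

Ordinary importance sampling is unbiased but its variance can be unbounded, while weighted importance sampling is biased but has much lower variance, which is why the ordinary estimator looks unstable on the next slide.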

  17. Ordinary Importance Sampling is Unstable

  18. References • David Silver, Lecture 4: Model-Free Prediction • Chapter 5, Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction,” 2nd edition, Nov. 2018
