Reinforcement learning: Yifeng Tao, School of Computer Science


Apr 15, 2023


SLIDE 1

Reinforcement learning

Yifeng Tao School of Computer Science Carnegie Mellon University Slides adapted from Matt Gormley, Eric Xing

Carnegie Mellon University 1 Yifeng Tao

Introduction to Machine Learning

SLIDE 2

Learning Paradigms


[Slide from Matt Gormley]

SLIDE 3

Examples of Reinforcement Learning


[Slide from Matt Gormley]

SLIDE 4

Robot in a room


[Slide from Eric Xing]

SLIDE 5

History of Reinforcement Learning

Roots in the psychology of animal learning (Thorndike, 1911).

Another independent thread was the problem of optimal control and its solution using dynamic programming (Bellman, 1957).

Idea of temporal difference learning (an on-line method), e.g., playing board games (Samuel, 1959).

A major breakthrough was the discovery of Q-learning (Watkins, 1989).


[Slide from Eric Xing]

SLIDE 6

What is special about RL?


[Slide from Eric Xing]

SLIDE 7

Elements of RL


[Slide from Eric Xing]

SLIDE 8

Policy

Reward for each step: -0.1
Reward for each step: -2


[Slide from Eric Xing]

SLIDE 9

The Precise Goal


[Slide from Eric Xing]

SLIDE 10

Reinforcement Learning

Train a policy to maximize the discounted, cumulative reward R_{t_0}:
γ: the discount factor, a constant between 0 and 1 that ensures the sum converges
Bellman equation (deterministic):
Bellman equation (stochastic):
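
For reference, one standard way to write the three quantities named on this slide. This is a sketch under assumed notation, not necessarily the slide's own: r_t is the reward at step t, R(s, a) the immediate reward, f(s, a) a hypothetical deterministic transition function, p(s' | s, a) the transition probability, and V^* the optimal value function.

```latex
% Discounted, cumulative reward starting at time t_0
R_{t_0} = \sum_{t = t_0}^{\infty} \gamma^{\, t - t_0} \, r_t

% Bellman equation (deterministic transitions, s' = f(s, a))
V^*(s) = \max_a \left[ R(s, a) + \gamma \, V^*\big(f(s, a)\big) \right]

% Bellman equation (stochastic transitions)
V^*(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} p(s' \mid s, a) \, V^*(s') \right]
```

The deterministic form is the special case of the stochastic one in which p(s' | s, a) puts all of its mass on the single state f(s, a).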


[Slide from Matt Gormley]

SLIDE 11

Value Iteration


[Slide from Matt Gormley]
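
A minimal sketch of value iteration, assuming a tiny hand-made MDP. The 3-state chain, the action names, and the helper names value_iteration and greedy_policy are illustrative assumptions, not taken from the slides; only the per-step reward of -0.1 echoes the Policy slide.

```python
# Hypothetical 3-state chain MDP (an illustration, not the slides' example).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"left": [(1.0, 0, -0.1)], "right": [(1.0, 1, -0.1)]},
    1: {"left": [(1.0, 0, -0.1)], "right": [(1.0, 2, 1.0)]},
    2: {},  # terminal state: no actions, value fixed at 0
}


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Repeat Bellman optimality backups until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {}
        for s, actions in P.items():
            if not actions:  # terminal state
                new_V[s] = 0.0
                continue
            new_V[s] = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            return V


def greedy_policy(P, V, gamma=0.9):
    """Pick, in every non-terminal state, the action with the best backup."""
    return {
        s: max(
            actions,
            key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in actions[a]),
        )
        for s, actions in P.items()
        if actions
    }


V = value_iteration(P)
policy = greedy_policy(P, V)
print(V, policy)
```

On this chain the greedy policy moves right in both non-terminal states, since the only positive reward sits on the transition into state 2.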

SLIDE 12

Value Iteration Convergence


[Slide from Matt Gormley]

SLIDE 13

Example: Robot Localization


[Slide from Matt Gormley]

SLIDE 14

Value Iteration Variants

Variant 1: w/ Q(s,a) table

→

Variant 2: w/o Q(s,a) table


[Slide from Matt Gormley]

SLIDE 15

Synchronous vs. Asynchronous Value Iteration


[Slide from Matt Gormley]
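
A rough sketch of the distinction in this slide's title, under the usual reading: a synchronous sweep backs up every state from a frozen copy of the value table, while an asynchronous sweep updates the table in place so later states see fresh values. The toy MDP and all function names are assumptions made for illustration.

```python
# Hypothetical 3-state chain MDP (an illustration, not the slides' example).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"left": [(1.0, 0, -0.1)], "right": [(1.0, 1, -0.1)]},
    1: {"left": [(1.0, 0, -0.1)], "right": [(1.0, 2, 1.0)]},
    2: {},  # terminal state
}


def backup(P, V, s, gamma):
    """One Bellman optimality backup for state s."""
    if not P[s]:
        return 0.0
    return max(
        sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
        for outcomes in P[s].values()
    )


def sweep_sync(P, V, gamma):
    # Every state is backed up from the same frozen copy of V.
    return {s: backup(P, V, s, gamma) for s in P}


def sweep_async(P, V, gamma):
    # States are updated in place, so later backups see fresh values.
    V = dict(V)  # copy so the caller's table is left untouched
    for s in P:
        V[s] = backup(P, V, s, gamma)
    return V


def solve(sweep, P, gamma=0.9, tol=1e-8):
    """Apply the given sweep until the largest change falls below tol."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = sweep(P, V, gamma)
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V


V_sync = solve(sweep_sync, P)
V_async = solve(sweep_async, P)
print(V_sync, V_async)
```

Both variants reach the same fixed point here; how many sweeps each needs depends on the order in which states are visited.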

SLIDE 16

Value Iteration Convergence


[Slide from Matt Gormley]

SLIDE 17

Policy Iteration


[Slide from Matt Gormley]

SLIDE 18

Policy Iteration


[Slide from Matt Gormley]
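
A minimal sketch of policy iteration, alternating policy evaluation with greedy policy improvement until the policy stops changing. The toy MDP, the iterative (rather than exact linear-solve) evaluation step, and the function names are assumptions for illustration.

```python
# Hypothetical 3-state chain MDP (an illustration, not the slides' example).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"left": [(1.0, 0, -0.1)], "right": [(1.0, 1, -0.1)]},
    1: {"left": [(1.0, 0, -0.1)], "right": [(1.0, 2, 1.0)]},
    2: {},  # terminal state
}


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P, policy, gamma, tol=1e-10):
    """Iteratively compute the value of a fixed policy."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:  # terminal state keeps value 0
                continue
            v = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def policy_iteration(P, gamma=0.9):
    # Arbitrary starting policy: the first listed action in each state.
    policy = {s: next(iter(acts)) for s, acts in P.items() if acts}
    while True:
        V = evaluate_policy(P, policy, gamma)
        improved = {
            s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
            for s in policy
        }
        if improved == policy:  # no state changed its action: done
            return policy, V
        policy = improved


policy, V = policy_iteration(P)
print(policy, V)
```

Starting from "left" everywhere, the policy changes a couple of times before settling on "right" in both non-terminal states.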

SLIDE 19

Value Iteration vs. Policy Iteration


[Slide from Matt Gormley]

SLIDE 20

Deep Q-Learning


[Slide from Matt Gormley]

SLIDE 21

TD Gammon → Alpha Go


[Slide from Matt Gormley]

SLIDE 22

Playing Atari with Deep RL


[Slide from Matt Gormley]

SLIDE 23

Deep Q-Network (DQN) algorithm

Goal: train Q(s, a) to approximate the unknown optimal action-value function Q*(s, a), i.e. the expected return after taking action a in state s.
Then, best policy:
Bellman equation:
Temporal difference error:
Huber loss:
B: a batch of transitions, sampled from the replay memory
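
For reference, the quantities listed above as they are commonly written, following the notation of the PyTorch DQN tutorial cited on this slide. Treat this as a sketch: the symbols (s' for the next state, r for the reward, δ for the temporal difference error) are assumed here.

```latex
% Best policy, given the optimal action-value function
\pi^*(s) = \arg\max_a Q^*(s, a)

% Bellman equation for the action-value function of a policy \pi
Q^{\pi}(s, a) = r + \gamma \, Q^{\pi}\big(s', \pi(s')\big)

% Temporal difference error
\delta = Q(s, a) - \Big( r + \gamma \max_{a'} Q(s', a') \Big)

% Huber loss, averaged over a batch B of transitions
\mathcal{L} = \frac{1}{|B|} \sum_{(s, a, s', r) \in B} \mathcal{L}(\delta),
\qquad
\mathcal{L}(\delta) =
\begin{cases}
  \tfrac{1}{2} \delta^2       & \text{if } |\delta| \le 1, \\
  |\delta| - \tfrac{1}{2}     & \text{otherwise.}
\end{cases}
```

The Huber loss behaves like squared error for small δ and like absolute error for large δ, which makes training less sensitive to outliers when the Q estimates are noisy.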


[Slide from https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html]
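
A small plain-Python sketch of the temporal difference error and Huber loss named on this slide, assuming scalar Q-values in place of a neural network. The function names td_error, huber, and batch_loss, and the tuple layout of a transition, are hypothetical.

```python
def td_error(q_sa, reward, gamma, next_q_values):
    """delta = Q(s, a) - (r + gamma * max_a' Q(s', a')).

    next_q_values is an empty list when s' is terminal, in which case
    the target is just the reward.
    """
    target = reward + gamma * max(next_q_values) if next_q_values else reward
    return q_sa - target


def huber(delta):
    """Quadratic for small errors, linear for large ones."""
    return 0.5 * delta ** 2 if abs(delta) <= 1.0 else abs(delta) - 0.5


def batch_loss(batch, gamma=0.99):
    """Mean Huber loss over a batch of (q_sa, reward, next_q_values) tuples."""
    return sum(huber(td_error(q, r, gamma, nq)) for q, r, nq in batch) / len(batch)


# Two made-up transitions: one ordinary, one ending in a terminal state.
example_batch = [(1.0, 1.0, [2.0, 4.0]), (0.5, 1.0, [])]
loss = batch_loss(example_batch, gamma=0.5)
print(loss)
```

In a real DQN these values would come from a network evaluated on a batch of tensors; the arithmetic per transition is the same.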

SLIDE 24

Experience Replay


[Slide from Matt Gormley]
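
A minimal sketch of a replay memory in the spirit of the PyTorch DQN tutorial cited in the references: a fixed-size buffer of past transitions from which random batches are drawn for training. The class and field names here are assumptions, and the integer "states" are placeholders.

```python
import random
from collections import deque, namedtuple

# One stored step of interaction with the environment.
Transition = namedtuple("Transition", ("state", "action", "next_state", "reward"))


class ReplayMemory:
    """Fixed-capacity buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, next_state, reward):
        self.memory.append(Transition(state, action, next_state, reward))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive steps of the same episode.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.push(state=step, action=0, next_state=step + 1, reward=-0.1)

batch = memory.sample(2)
print(len(memory), batch)
```

With a capacity of 3 and 5 pushes, only the three most recent transitions remain available for sampling.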

SLIDE 25

Alpha Go


[Slide from Matt Gormley]

SLIDE 26

Constructing Genetic Association Database


[Slide from Wang et al.]

SLIDE 27

Constructing Genetic Association Database


[Slide from Wang et al.]

SLIDE 28

Take home message

Reward, value, and policy in reinforcement learning
Value iteration and convergence guarantee
Policy iteration
Deep Q-learning uses a neural network to approximate the Q-function


SLIDE 29

References

Matt Gormley. 10601 Introduction to Machine Learning: http://www.cs.cmu.edu/~mgormley/courses/10601/index.html

Eric Xing, Tom Mitchell. 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701-06f/

Adam Paszke. Reinforcement Learning (DQN) Tutorial: https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html

Haohan Wang et al. 2019. Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning.
