Reinforcement learning Yifeng Tao School of Computer Science - - PowerPoint PPT Presentation



SLIDE 1

Introduction to Machine Learning

Reinforcement learning

Yifeng Tao
School of Computer Science, Carnegie Mellon University
Slides adapted from Matt Gormley, Eric Xing

SLIDE 2

Learning Paradigms

[Slide from Matt Gormley]

SLIDE 3

Examples of Reinforcement Learning

[Slide from Matt Gormley]

SLIDE 4

Robot in a room

[Slide from Eric Xing]

SLIDE 5

History of Reinforcement Learning

  • Roots in the psychology of animal learning (Thorndike, 1911).
  • Another independent thread was the problem of optimal control, and its solution using dynamic programming (Bellman, 1957).
  • Idea of temporal difference learning (on-line method), e.g., playing board games (Samuel, 1959).
  • A major breakthrough was the discovery of Q-learning (Watkins, 1989).

[Slide from Eric Xing]

SLIDE 6

What is special about RL?

[Slide from Eric Xing]

SLIDE 7

Elements of RL

[Slide from Eric Xing]

SLIDE 8

Policy

  • Reward for each step: -0.1
  • Reward for each step: -2

[Slide from Eric Xing]

SLIDE 9

The Precise Goal

[Slide from Eric Xing]

SLIDE 10

Reinforcement Learning

  • Train a policy to maximize the discounted, cumulative reward R_{t_0} = Σ_{t=t_0}^{∞} γ^{t-t_0} r_t
  • γ: the discount factor, a constant between 0 and 1 that ensures the sum converges
  • Bellman equation (deterministic):
  • Bellman equation (stochastic):
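
In one common notation, the two Bellman equations for the optimal value function can be written as below. The symbols are assumptions introduced here for illustration, not taken from the slide: R(s, a) is the immediate reward, T(s, a) is the deterministic successor state, and p(s' | s, a) is the transition probability.

```latex
% Deterministic transitions: the next state is s' = T(s, a)
V^*(s) = \max_a \left[ R(s, a) + \gamma \, V^*\big(T(s, a)\big) \right]

% Stochastic transitions: average over next states s'
V^*(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} p(s' \mid s, a) \, V^*(s') \right]
```

The deterministic form is the special case of the stochastic form in which p(s' | s, a) puts all its mass on a single next state.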

[Slide from Matt Gormley]

SLIDE 11

Value Iteration
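
A minimal sketch of synchronous value iteration for a deterministic MDP, assuming the standard Bellman backup V(s) ← max_a [R(s, a) + γ V(next state)]. The three-state chain, the table names `transition` and `reward`, and the stopping tolerance are hypothetical choices made for this example only.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-8):
    """Synchronous value iteration for a deterministic MDP.

    transition[s][a] is the next state, reward[s][a] the immediate reward.
    Returns the value table V and a greedy policy with respect to V.
    """
    V = {s: 0.0 for s in states}
    while True:
        # Bellman backup for every state, using only the previous sweep's values
        new_V = {
            s: max(reward[s][a] + gamma * V[transition[s][a]] for a in actions)
            for s in states
        }
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            break
    # Greedy policy: pick the action with the best one-step lookahead
    policy = {
        s: max(actions, key=lambda a: reward[s][a] + gamma * V[transition[s][a]])
        for s in states
    }
    return V, policy


# Hypothetical toy problem: a chain 0 - 1 - 2 where state 2 is absorbing
# and stepping right from state 1 into state 2 pays reward 1.
states = [0, 1, 2]
actions = ["left", "right"]
transition = {
    0: {"left": 0, "right": 1},
    1: {"left": 0, "right": 2},
    2: {"left": 2, "right": 2},
}
reward = {
    0: {"left": 0.0, "right": 0.0},
    1: {"left": 0.0, "right": 1.0},
    2: {"left": 0.0, "right": 0.0},
}

V, policy = value_iteration(states, actions, transition, reward, gamma=0.9)
print(V, policy)
```

With γ = 0.9 the values settle at V(1) = 1 and V(0) = 0.9, and the greedy policy moves right from states 0 and 1.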

[Slide from Matt Gormley]

SLIDE 12

Value Iteration Convergence

[Slide from Matt Gormley]

SLIDE 13

Example: Robot Localization

[Slide from Matt Gormley]

SLIDE 14

Value Iteration Variants

  • Variant 1: with Q(s,a) table
  • Variant 2: without Q(s,a) table

[Slide from Matt Gormley]

SLIDE 15

Synchronous vs. Asynchronous Value Iteration

[Slide from Matt Gormley]

SLIDE 16

Value Iteration Convergence

[Slide from Matt Gormley]

SLIDE 17

Policy Iteration

[Slide from Matt Gormley]

SLIDE 18

Policy Iteration
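
A minimal sketch of policy iteration for a deterministic MDP, assuming the usual two alternating steps: evaluate the current policy until its values stop changing, then improve it greedily. The toy chain, the table names, and the tolerances are hypothetical and chosen only for illustration.

```python
def policy_iteration(states, actions, transition, reward, gamma=0.9, eval_tol=1e-10):
    """Policy iteration for a deterministic MDP.

    transition[s][a] is the next state, reward[s][a] the immediate reward.
    Returns the value table of the final policy and the policy itself.
    """
    policy = {s: actions[0] for s in states}  # arbitrary initial policy
    while True:
        # Policy evaluation: repeat V(s) <- R(s, pi(s)) + gamma * V(next) to a fixed point
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                a = policy[s]
                v = reward[s][a] + gamma * V[transition[s][a]]
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < eval_tol:
                break
        # Policy improvement: switch to a strictly better action where one exists
        stable = True
        for s in states:
            def q(a):
                return reward[s][a] + gamma * V[transition[s][a]]
            best = max(actions, key=q)
            if q(best) > q(policy[s]) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return V, policy


# Hypothetical toy problem: a chain 0 - 1 - 2 where state 2 is absorbing
# and stepping right from state 1 into state 2 pays reward 1.
states = [0, 1, 2]
actions = ["left", "right"]
transition = {
    0: {"left": 0, "right": 1},
    1: {"left": 0, "right": 2},
    2: {"left": 2, "right": 2},
}
reward = {
    0: {"left": 0.0, "right": 0.0},
    1: {"left": 0.0, "right": 1.0},
    2: {"left": 0.0, "right": 0.0},
}

V, policy = policy_iteration(states, actions, transition, reward, gamma=0.9)
print(V, policy)
```

On this toy problem the loop stops after three evaluate-improve rounds, once no state has a strictly better action.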

[Slide from Matt Gormley]

SLIDE 19

Value Iteration vs. Policy Iteration

[Slide from Matt Gormley]

SLIDE 20

Deep Q-Learning

[Slide from Matt Gormley]

SLIDE 21

TD Gammon → Alpha Go

[Slide from Matt Gormley]

SLIDE 22

Playing Atari with Deep RL

[Slide from Matt Gormley]

SLIDE 23

Deep Q-Network (DQN) algorithm

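  • Goal: train Q(s, a) to approximate the unknown optimal action-value function Q*(s, a), the expected discounted return from taking action a in state s and acting optimally afterwards.
  • Then, best policy: π*(s) = argmax_a Q*(s, a)
  • Bellman equation: Q^π(s, a) = r + γ Q^π(s', π(s'))
  • Temporal difference error: δ = Q(s, a) - (r + γ max_a' Q(s', a'))
  • Huber loss: L = (1/|B|) Σ_{(s, a, s', r) ∈ B} L(δ), where L(δ) = ½ δ² if |δ| ≤ 1 and |δ| - ½ otherwise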
  • B: a batch of transitions, sampled from the replay memory
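
As a rough illustration of the temporal difference error and Huber loss named above, the sketch below evaluates both on a small batch. It is not the tutorial's code: a plain lookup table `q` stands in for the network, and the `done` flag marking terminal transitions is an assumption added for this example.

```python
def huber(delta, kappa=1.0):
    """Huber loss: quadratic for |delta| <= kappa, linear beyond that."""
    if abs(delta) <= kappa:
        return 0.5 * delta ** 2
    return kappa * (abs(delta) - 0.5 * kappa)


def td_error(q, transition, gamma=0.99):
    """Temporal difference error Q(s, a) - (r + gamma * max_a' Q(s', a'))."""
    s, a, r, s_next, done = transition
    # No bootstrap term when the transition ends the episode (assumed convention)
    target = r if done else r + gamma * max(q[s_next])
    return q[s][a] - target


def batch_loss(q, batch, gamma=0.99):
    """Mean Huber loss of the TD errors over a batch B of transitions."""
    return sum(huber(td_error(q, t, gamma)) for t in batch) / len(batch)


# Hypothetical Q-table: q[state] lists one value per action
q = {"s0": [0.0, 1.0], "s1": [2.0, 0.5]}
# Transitions are (state, action, reward, next_state, done)
batch = [
    ("s0", 1, 0.0, "s1", False),  # TD error: 1.0 - (0.0 + 0.5 * 2.0) = 0.0
    ("s1", 0, 1.0, "s0", True),   # TD error: 2.0 - 1.0 = 1.0
]
loss = batch_loss(q, batch, gamma=0.5)
print(loss)  # 0.25
```

The linear tails make the loss less sensitive to occasional large TD errors than a pure squared error would be.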

[Slide from https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html]

SLIDE 24

Experience Replay
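
A minimal sketch of a replay memory, assuming the common design of a fixed-capacity buffer that drops its oldest transitions and is sampled uniformly at random to form a training batch. The class name, field names, and capacity are illustrative choices, not taken from the slides.

```python
import random
from collections import deque, namedtuple

# One stored interaction with the environment (field names are illustrative)
Transition = namedtuple("Transition", ("state", "action", "next_state", "reward"))


class ReplayMemory:
    """Fixed-size buffer of past transitions."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest item once capacity is reached
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, next_state, reward):
        """Store one transition."""
        self.memory.append(Transition(state, action, next_state, reward))

    def sample(self, batch_size):
        """Draw a batch uniformly at random, without replacement."""
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.push(state=step, action=0, next_state=step + 1, reward=1.0)

batch = memory.sample(2)
print(len(memory), batch)
```

Sampling at random rather than in arrival order breaks up the correlation between consecutive transitions within a batch.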

[Slide from Matt Gormley]

SLIDE 25

Alpha Go

[Slide from Matt Gormley]

SLIDE 26

Constructing Genetic Association Database

[Slide from Wang et al.]

SLIDE 27

Constructing Genetic Association Database

[Slide from Wang et al.]

SLIDE 28

Take home message

  • Reward, value, and policy in reinforcement learning
  • Value iteration and convergence guarantee
  • Policy iteration
  • Deep Q-learning uses a neural network to approximate the Q-function

SLIDE 29

References

  • Matt Gormley. 10601 Introduction to Machine Learning: http://www.cs.cmu.edu/~mgormley/courses/10601/index.html
  • Eric Xing, Tom Mitchell. 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701-06f/
  • Adam Paszke. Reinforcement Learning (DQN) Tutorial: https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
  • Haohan Wang et al. 2019: Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning
