Department of Computer Science CSCI 5622: Machine Learning, Chenhao Tan

SLIDE 1

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 21: Reinforcement learning I Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

SLIDE 2

Administrivia

  • Poster printing
  • Email your poster to inkspot.umc@colorado.edu with the subject “Tan Poster Project” by Thursday noon
  • Poster size: A1
  • Check Piazza for details
  • Light refreshments will be provided; invite your friends
  • Poster session: DLC 1B70 on Dec 13

SLIDE 3

Learning objectives

  • Understand the formulation of reinforcement learning
  • Understand the definition of a policy and the optimal policy
  • Learn about value iteration
  • Most of these two lectures are based on Richard S. Sutton and Andrew G. Barto’s book

SLIDE 4

Supervised learning

  • Data: X
  • Labels: Y

Unsupervised learning

  • Data: X
  • Latent structure: Z

SLIDE 5

SLIDE 6


An agent learns to behave in an environment

SLIDE 7

Reinforcement learning examples

  • Mnih et al. 2013
  • https://www.youtube.com/watch?v=V1eYniJ0Rnk

SLIDE 8

Reinforcement learning examples

SLIDES 9-10

Reinforcement learning

SLIDES 11-13

Markov decision processes
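The MDP slides here are pictorial, so as a reference, the standard formulation (following Sutton and Barto's book, which these lectures are based on) is:

```latex
% A finite Markov decision process is the tuple (S, A, P, R, \gamma):
% \mathcal{S}: set of states, \mathcal{A}: set of actions,
% P: transition probabilities, R: rewards, \gamma \in [0, 1): discount factor
(\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
P(s' \mid s, a) = \Pr(S_{t+1} = s' \mid S_t = s,\, A_t = a)
% Markov property: the next state depends only on the current state and action,
% not on the earlier history.
```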

SLIDE 14

A few examples


  • Grid world
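A grid world like the one above can be sketched as a tiny MDP. The layout and numbers below are hypothetical choices for illustration (`GRID_SIZE`, `GOAL`, `STEP_REWARD`, and `GOAL_REWARD` are assumptions, not values from the slides):

```python
# A minimal grid-world sketch: states are (row, col) cells, actions
# move the agent one cell, and reaching the goal pays a reward.
GRID_SIZE = 3          # 3x3 grid (assumption for illustration)
GOAL = (2, 2)          # terminal state
STEP_REWARD = -0.04    # small negative reward per step (a common choice)
GOAL_REWARD = 1.0

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Deterministic transition: move if the target cell is inside the
    grid, otherwise stay put; return (next_state, reward)."""
    if state == GOAL:
        return state, 0.0  # absorbing terminal state
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < GRID_SIZE and 0 <= c < GRID_SIZE:
        state = (r, c)
    return state, GOAL_REWARD if state == GOAL else STEP_REWARD
```

For example, stepping right from (2, 1) reaches the goal and earns the goal reward, while stepping off the edge leaves the state unchanged and pays the step penalty.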
SLIDE 15

A few examples

  • Atari game (Bonus: try a Google image search for “atari breakout”)

SLIDE 16

A few examples

  • Go

SLIDE 17

Goal

  • Episodes: tasks ending at a terminal state, e.g., a play of a game
  • Continuing tasks: tasks that keep going, with an infinite number of steps

SLIDE 18

Policy

  • The agent’s strategy for selecting actions in each state
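In the standard notation of Sutton and Barto's book, which these lectures follow, the definition can be written as:

```latex
% A stochastic policy: the probability of taking action a in state s
\pi(a \mid s) = \Pr(A_t = a \mid S_t = s)
% A deterministic policy is the special case \pi : \mathcal{S} \to \mathcal{A}
```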

SLIDE 19

Value function
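The defining equation on this slide is not preserved; the standard definition (Sutton and Barto) is the expected discounted return when starting in state s and following policy π:

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s\right]
```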

SLIDE 20

Action-value function (Q-function)
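Analogously to the value function, the standard definition of the Q-function (Sutton and Barto) conditions on the first action as well: the expected discounted return starting from s, taking action a, and following π thereafter:

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s,\, A_t = a\right]
```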

SLIDE 21

Optimal policy and optimal value function
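The equations from these slides are not preserved; the standard statements (Sutton and Barto) are the optimal value function, its Bellman optimality equation, and the greedy optimal policy derived from it:

```latex
% Optimal value function: the best achievable value in each state
V^{*}(s) = \max_{\pi} V^{\pi}(s)
% Bellman optimality equation
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma V^{*}(s')\bigr]
% An optimal policy acts greedily with respect to V^{*}
\pi^{*}(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma V^{*}(s')\bigr]
```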

SLIDE 22

Optimal policy and optimal value function

SLIDE 23

Optimal policy and optimal value function

SLIDE 24

A concrete grid example


  • Grid world
SLIDE 25

A concrete grid example

  • Rewards can be positive or negative
  • Delayed reward: you might not get any reward until you reach the goal
  • You might receive negative rewards until you reach the goal

SLIDES 26-32

A concrete grid example (worked example continued across slides 26-32)

Take-away: the optimal policy depends heavily on the details of the reward function.

SLIDE 33

Value Iteration

Punchline: discounting makes the infinite-horizon return finite, which is great because we can actually compare the values of different reward sequences.
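The reason the discounted return stays finite is a geometric series bound: if every reward is bounded by some R_max and the discount factor is strictly less than 1, then

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
\;\le\; R_{\max} \sum_{k=0}^{\infty} \gamma^{k}
= \frac{R_{\max}}{1 - \gamma}, \qquad 0 \le \gamma < 1
```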

SLIDES 34-44

Value Iteration (worked example continued across slides 34-44)
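The step-by-step walkthrough on these slides is pictorial, so here is a sketch of the algorithm itself: repeated Bellman optimality backups until the values stop changing. The transition format, the two-state MDP `P`, and the `GAMMA`/`THETA` values are illustrative assumptions, not taken from the slides:

```python
# Value iteration for a finite MDP (following Sutton & Barto, Ch. 4).
# P[s][a] is a list of (probability, next_state, reward) triples.
GAMMA = 0.9   # discount factor (illustrative choice)
THETA = 1e-8  # convergence threshold (illustrative choice)

def value_iteration(P, gamma=GAMMA, theta=THETA):
    """Repeat Bellman optimality backups until the largest value
    change in a sweep falls below theta; return the value table."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_old = V[s]
            # Backup: best action's expected one-step reward plus the
            # discounted value of the successor state.
            V[s] = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values()
            )
            delta = max(delta, abs(v_old - V[s]))
        if delta < theta:
            return V

# A made-up two-state MDP: state 0 moves to the absorbing terminal
# state 1 for a reward of 1; state 1 loops forever with reward 0.
P = {
    0: {"go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 0.0)]},
}
V = value_iteration(P)
# V[0] converges to 1.0 and V[1] to 0.0
```

Once the values converge, an optimal policy can be read off greedily: in each state, pick the action whose backup achieves the maximum.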