COMP 138: Reinforcement Learning (Instructor: Jivko Sinapov)



slide-1
SLIDE 1

COMP 138: Reinforcement Learning

Instructor: Jivko Sinapov
Webpage: https://www.eecs.tufts.edu/~jsinapov/teaching/comp150_RL_Fall2020/

slide-2
SLIDE 2

BE a reinforcement learner

  • You, as a class, will act as the learning agent
slide-3
SLIDE 3

BE a reinforcement learner

  • You, as a class, will act as the learning agent
  • Actions: wave, clap, or nod
slide-4
SLIDE 4

BE a reinforcement learner

  • You, as a class, will act as the learning agent
  • Actions: wave, clap, or nod
  • Observations: color, reward
slide-5
SLIDE 5

BE a reinforcement learner

  • You, as a class, will act as the learning agent
  • Actions: wave, clap, or nod
  • Observations: color, reward
  • Goal: find an optimal policy
slide-6
SLIDE 6

BE a reinforcement learner

  • You, as a class, will act as the learning agent
  • Actions: wave, clap, or stand
  • Observations: color, reward
  • Goal: find an optimal policy

– What is a policy? What makes a policy optimal?
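
The classroom exercise can be sketched in code. This is a minimal illustration, not the rule actually used in class: the color set, the hidden reward rule (`HIDDEN_BEST`), and the epsilon-greedy learner are all assumptions made for the example. A policy here is simply a mapping from the observed color to an action, and it is optimal when it picks the highest-reward action for every color.

```python
import random

ACTIONS = ["wave", "clap", "stand"]   # actions listed on the slide
COLORS = ["red", "green", "blue"]     # assumed; the slide only says "color"

# Hypothetical hidden rule: exactly one action is rewarded per color.
HIDDEN_BEST = {"red": "wave", "green": "clap", "blue": "stand"}


def reward(color, action):
    """Return 1.0 for the rewarded action under the assumed rule, else 0.0."""
    return 1.0 if HIDDEN_BEST[color] == action else 0.0


def learn_policy(episodes=2000, epsilon=0.1, step_size=0.1, seed=0):
    """Estimate action values by trial and error, then act greedily."""
    rng = random.Random(seed)
    # One value estimate per (color, action) pair, all starting at zero.
    q = {(c, a): 0.0 for c in COLORS for a in ACTIONS}
    for _ in range(episodes):
        color = rng.choice(COLORS)  # assumed: colors appear at random
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)                          # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(color, a)])    # exploit
        r = reward(color, action)
        # Move the estimate a small step toward the observed reward.
        q[(color, action)] += step_size * (r - q[(color, action)])
    # The learned policy maps each color to its highest-valued action.
    return {c: max(ACTIONS, key=lambda a: q[(c, a)]) for c in COLORS}


policy = learn_policy()
```

Under these assumptions the learner only ever sees colors and rewards, never `HIDDEN_BEST` itself, which mirrors the position the class is placed in during the exercise.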

slide-7
SLIDE 7

How did you do it?

  • What is your policy, and how is it represented?
  • What does the world look like?
slide-8
SLIDE 8

What actually happened...

slide-9
SLIDE 9
slide-10
SLIDE 10
slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14

What actually happened...

slide-15
SLIDE 15

Now, let’s formalize this

(board or writing projector)

slide-16
SLIDE 16

About this course

  • Reinforcement Learning theory & practice
  • Theory at the start and practice towards the end
  • Syllabus = the course web page:

https://www.eecs.tufts.edu/~jsinapov/teaching/comp150_RL/

slide-17
SLIDE 17

Where does RL fall within the field of Artificial Intelligence?

slide-18
SLIDE 18

Where does RL fall within the field of Artificial Intelligence?

  • AI → ML → RL
slide-19
SLIDE 19

Where does RL fall within the field of Artificial Intelligence?

  • AI → ML → RL
  • Types of Machine Learning:

– Supervised: learn from labeled examples
– Unsupervised: learn from unlabeled examples
– Reinforcement: learn through interaction

slide-20
SLIDE 20

Reduced Formalism

slide-21
SLIDE 21

Reduced Formalism

(board or writing projector)

slide-22
SLIDE 22

Take-home Message

  • Agent’s perspective: only the policy is under control
  • State representation and reward function are given
  • Focus on policy algorithms
  • Appeal: program agents by just specifying goals
  • Practice: the designer still needs to pick the state representation and reward function
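
The split between what is given and what the agent controls can be sketched as follows. The corridor world, its reward values, and the names `transition`, `reward_fn`, and `evaluate` are hypothetical choices for illustration; the point is only that the environment functions stay fixed while the policy is the one piece that changes.

```python
from typing import Callable

State = int
Action = str

GOAL = 4  # assumed toy world: a 5-cell corridor with the goal in cell 4


def transition(state: State, action: Action) -> State:
    """Given to the agent: move one cell left or right, clipped to the corridor."""
    move = 1 if action == "right" else -1
    return min(GOAL, max(0, state + move))


def reward_fn(state: State, action: Action, next_state: State) -> float:
    """Given to the agent: +1 for reaching the goal, a small cost otherwise."""
    return 1.0 if next_state == GOAL else -0.1


def evaluate(policy: Callable[[State], Action], start: State = 0,
             max_steps: int = 20) -> float:
    """Run one episode and return the total (undiscounted) reward."""
    state, total = start, 0.0
    for _ in range(max_steps):
        action = policy(state)            # the only part the agent controls
        next_state = transition(state, action)
        total += reward_fn(state, action, next_state)
        state = next_state
        if state == GOAL:
            break
    return total


def always_right(state: State) -> Action:
    return "right"


def always_left(state: State) -> Action:
    return "left"
```

Calling `evaluate(always_right)` and `evaluate(always_left)` compares two policies in the same world. Changing `reward_fn` instead would change what counts as a good policy, which is why picking the reward function matters in practice.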

slide-23
SLIDE 23

Example Applications

slide-24
SLIDE 24

Example Applications

slide-25
SLIDE 25

Reading Assignment

  • Chapters 1 and 2 of Sutton and Barto
  • Reading response on Canvas due 9/11 before class starts

slide-26
SLIDE 26

Programming Assignments

  • Students are required to complete 4 minor programming assignments of their choosing
  • Default options: programming exercises from Sutton and Barto (let’s look at some examples)

slide-27
SLIDE 27

Discussion Moderation

  • Each student will lead a reading discussion once during the semester
  • Students can team up in pairs
  • Sign-up sheet will be posted to Canvas tonight
  • Extra credit for anyone who volunteers for slots in the next week
  • Presentation materials / notes or a description of what will be discussed should be emailed to me 48 hours before the class

slide-28
SLIDE 28

Next time...

slide-29
SLIDE 29
slide-30
SLIDE 30
slide-31
SLIDE 31

COMP 150: Reinforcement Learning

slide-32
SLIDE 32

Domains and Applications

slide-33
SLIDE 33

Curriculum Learning

(Figure: example QuickChess game variants)

slide-34
SLIDE 34

The Curriculum Learning Problem

Target task

Diagram labels: Environment, Agent, Action, State, Reward

Task = MDP

Transfer Learning, Sequencing, Task Creation

[Narvekar et al. 2016]
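
One way to read "Task = MDP" is as a data structure. The sketch below is an assumption-laden illustration, not the representation used in the cited work: the `MDP` class, its field names, and the `corridor` task family are hypothetical. A curriculum is then an ordered list of such tasks that ends in the target task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    """A tabular task: states, actions, transition model, rewards, discount."""
    states: tuple
    actions: tuple
    transitions: dict   # (state, action) -> {next_state: probability}
    rewards: dict       # (state, action, next_state) -> reward
    gamma: float = 0.9

    def is_well_formed(self, tol=1e-9):
        """Check that every (state, action) has a distribution summing to 1."""
        for s in self.states:
            for a in self.actions:
                dist = self.transitions.get((s, a))
                if dist is None or abs(sum(dist.values()) - 1.0) > tol:
                    return False
        return True


def corridor(length):
    """Hypothetical task family: a corridor with an absorbing goal at the end."""
    states = tuple(range(length))
    actions = ("left", "right")
    goal = length - 1
    transitions, rewards = {}, {}
    for s in states:
        for a in actions:
            if s == goal:
                nxt = s  # the goal state is absorbing
            else:
                nxt = min(goal, max(0, s + (1 if a == "right" else -1)))
            transitions[(s, a)] = {nxt: 1.0}
            # Reward only on the step that first enters the goal.
            rewards[(s, a, nxt)] = 1.0 if (nxt == goal and s != goal) else 0.0
    return MDP(states, actions, transitions, rewards)


# An assumed curriculum: easier (shorter) corridors first, target task last.
curriculum = [corridor(3), corridor(5), corridor(10)]
target_task = curriculum[-1]
```

In this framing, task creation corresponds to generating candidates such as `corridor(3)`, sequencing to choosing their order in `curriculum`, and transfer learning to carrying what was learned on one task into the next. The transfer step itself is not implemented here.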

slide-35
SLIDE 35

Textbook

The authors have made the book available: http://incompleteideas.net/book/bookdraft2017nov5.pdf

slide-36
SLIDE 36

Course Organization

  • Taught as a seminar: students take turns presenting the readings
  • Will cover both theory and practice
  • Final projects – you will complete a project in which you ask (and then answer) a relevant RL research question

slide-37
SLIDE 37