SLIDE 1

CS 378: Autonomous Intelligent Robotics FRI-II

Instructor: Jivko Sinapov

http://www.cs.utexas.edu/~jsinapov/teaching/cs378_fall2016/

SLIDE 2

Reinforcement Learning (Part 2)

SLIDE 3

Announcements

SLIDE 4

Volunteers needed for robot study

Sign up sheet here: https://docs.google.com/spreadsheets/d/1Gr2GqlPt8kdTJwlZ3FerxU0J8oIoGt0pEbeR37iCHqY/edit#gid=0

Further details will be made available on Canvas via an announcement.

SLIDE 5

FAI Talk this Friday

“Turning Assistive Machines into Assistive Robots”
Brenna Argall, Northwestern University
Friday, Sept. 9th, 11 am @ GDC 6.302

[ https://www.cs.utexas.edu/~ai-lab/fai/ ] or google “fai ut cs”

SLIDE 6

Robotics Seminar Series Talk

“Learning from and about humans using an autonomous multi-robot mobile platform”
Jivko Sinapov, UT Austin
Wed., Sept. 7th, 3 pm @ GDC 5.302

SLIDE 7

Robot Training

  • Sign up for a robot training session next week at: https://docs.google.com/spreadsheets/d/1kz6QMPa-xkdFQNyV0Biif913JKf73GIUGx9bVSLo_R8/edit?usp=sharing
  • Link will be posted as an Announcement on Canvas
SLIDE 8

Preliminary Project “Presentations”

  • Date: September 13th
  • Form groups of 2-3 prior to the date
  • Be prepared to talk about 2-3 project ideas for 5-10 minutes
  • Email me your group info, i.e., who is in it
SLIDE 9

Project Ideas

SLIDE 10

Project Idea: Improve the robot's grasping ability

  • Currently, the robot does not “remember” which grasps succeeded and which failed
  • If the robot were to log the context of the grasp (e.g., the position of the gripper relative to the object's point cloud) and the outcome, it could incrementally learn a model to predict the outcome given the context (a sketch of this idea follows below)
  • The robot's current grasping software is described in http://wiki.ros.org/agile_grasp
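A hedged sketch of what that log-and-learn loop could look like; the feature choice (the gripper's offset from the object's point-cloud centroid) and the use of scikit-learn's SGDClassifier are illustrative assumptions, not the robot's actual software:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Incremental grasp-outcome predictor (all names and features hypothetical).
clf = SGDClassifier(loss="log_loss")   # logistic model; supports partial_fit

def record_grasp(context, succeeded):
    """Log one grasp attempt and update the model incrementally."""
    clf.partial_fit(np.array([context]), [int(succeeded)], classes=[0, 1])

def predict_success(context):
    """Predicted probability that a grasp in this context succeeds."""
    return clf.predict_proba(np.array([context]))[0, 1]

# Invented example: 3-D offset of the gripper from the object centroid
record_grasp([0.02, -0.01, 0.10], succeeded=True)
record_grasp([0.15,  0.20, 0.05], succeeded=False)
print(predict_success([0.03, 0.00, 0.09]))
```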

SLIDE 11

Project Idea: Object Handover

  • Currently, the arm can let go of an object or close its fingers upon sufficient contact using haptic feedback
  • Can you make it so that it can move towards an object held by a human and grasp it based on visual and haptic feedback?

SLIDE 12

Project Idea: Learning about objects from humans

  • The robot is currently able to grasp an object from a table and navigate to an office
  • Can we use the GUI to ask humans questions about objects and store this information in a database that can be used for learning recognition models?

SLIDE 13

Project Idea: Large-Scale 3D object mapping

  • Can we combine 3D plane detection and clustering to detect and map objects in the environment?

SLIDE 14

Project Idea: Learning an object manipulation skill

  • Example: Pressing a button
SLIDE 15

Project Idea: Enhance Virtour

www.cs.utexas.edu/~larg/bwi_virtour

SLIDE 16
SLIDE 17

Reinforcement Learning (Part 2)

SLIDE 18

Markov Decision Process (MDP)

SLIDE 19

Markov Decision Process (MDP)

The reward and state transition observed at time t after taking action a in state s are independent of anything that happened before time t.

SLIDE 20

Maze World

[slide credit: David Silver]

SLIDE 21

Maze World

[slide credit: David Silver]

State Representation: Factored vs. Tabula Rasa

SLIDE 22

Maze Example: Policy

[slide credit: David Silver]

SLIDE 23

Maze Example: Value Function

[slide credit: David Silver]

SLIDE 24

Maze Example: Policy

[slide credit: David Silver]

SLIDE 25

Maze Example: Model

[slide credit: David Silver]

SLIDE 26

Sparse vs. Dense Reward

SLIDE 27

Notation and Problem Formulation

  • Overview of notation in TEXPLORE paper
SLIDE 28

Notation

Set of states: S
Set of actions: A
Transition function: T(s, a, s') = P(s' | s, a)
Reward function: R(s, a)
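To make the notation concrete, here is a minimal toy two-state MDP in Python; the states, actions, probabilities, and rewards are invented for illustration:

```python
# A toy MDP spelled out with the notation above (all values invented).
S = ["s0", "s1"]            # set of states
A = ["stay", "go"]          # set of actions

# Transition function T(s, a, s') = P(s' | s, a)
T = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s0": 0.8, "s1": 0.2},
}

# Reward function R(s, a)
R = {
    ("s0", "stay"): 0.0, ("s0", "go"): -1.0,
    ("s1", "stay"): 1.0, ("s1", "go"): -1.0,
}
```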

SLIDE 29

Action-Value Function

SLIDE 30

Action-Value Function

Q(s, a) = R(s, a) + γ · Σ_s' [ P(s' | s, a) · max_a' Q(s', a') ]

where:

  • Q(s, a): the value of taking action a in state s
  • R(s, a): the reward received after taking action a in state s
  • P(s' | s, a): the probability of going to state s' from s after a
  • γ: the discount factor (between 0 and 1)
  • a': the action with the highest action-value in state s'
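As a sketch, this equation can be applied repeatedly to compute Q over the toy MDP defined earlier (this assumes the S, A, T, R dictionaries from that sketch; the loop is value iteration on the action-value function):

```python
gamma = 0.9                               # discount factor
Q = {(s, a): 0.0 for s in S for a in A}   # initialize all action-values

# Repeatedly apply the equation above until the values settle.
for _ in range(100):
    Q = {
        (s, a): R[(s, a)] + gamma * sum(
            p * max(Q[(s2, a2)] for a2 in A)
            for s2, p in T[(s, a)].items()
        )
        for s in S for a in A
    }

print(Q)   # the action-value of every (state, action) pair
```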

SLIDE 31

Action-Value Function

Common algorithms for learning the action-value function include Q-Learning and SARSA. The policy consists of always taking the action that maximizes the action-value function.
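A minimal sketch of the Q-learning update itself; SARSA would differ only in using the action actually taken in s' rather than the max:

```python
# One model-free Q-learning step: only the observed reward r and next
# state s_next are needed, not the T or R functions themselves.
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # alpha = learning rate

# The greedy policy then picks the action with the highest value:
def greedy_action(Q, s, actions):
    return max(actions, key=lambda a: Q[(s, a)])
```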

SLIDE 32

Q-Learning Grid World Example

https://www-s.acm.illinois.edu/sigart/docs/QLearning.pdf

SLIDE 33

RL in a nutshell

SLIDE 34

RL in a nutshell

SLIDE 35

Q-Learning

  • Guest Slides
SLIDE 36

Pac Man Example

SLIDE 37

Linear Function Approximator of Q*

Q*(Φ(s,a)) = w1 * x1 + w2 * x2 + … + wn * xn

where Φ(s,a) = x is an n-dimensional feature vector
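In code, the linear form is just a dot product. A sketch (the weights and features below are invented; in the Pac-Man example, Φ would be computed from the game state):

```python
import numpy as np

def q_approx(w, x):
    """Linear approximation: Q*(Φ(s,a)) = w1*x1 + w2*x2 + ... + wn*xn."""
    return np.dot(w, x)

# Invented 3-dimensional example, where x = Φ(s, a):
w = np.array([0.5, -1.0, 2.0])   # weight vector to be learned
x = np.array([1.0,  0.2, 0.0])   # feature vector for some (s, a)
print(q_approx(w, x))            # 0.5*1.0 + (-1.0)*0.2 + 2.0*0.0 = 0.3
```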

SLIDE 38

How does Pac-Man “see” the world?

SLIDE 39

How does Pac-Man “see” the world?

SLIDE 40

How does Pac-Man “see” the world?

SLIDE 41

Linear Function Approximator of Q*

Q*(Φ(s,a)) = w1 * x1 + w2 * x2 + … + wn * xn

where Φ(s,a) = x is an n-dimensional feature vector

The task now is to find the optimal weight vector w
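One standard way to search for w is a gradient-style update on the temporal-difference error; the sketch below shows that rule for a linear approximator (an assumed method for illustration, not necessarily what the Pac-Man code uses):

```python
import numpy as np

def update_weights(w, x, r, max_q_next, alpha=0.01, gamma=0.9):
    """Nudge w to reduce the TD error on one observed transition.
    For a linear approximator, the gradient of Q with respect to w
    is just the feature vector x = Φ(s, a)."""
    td_error = (r + gamma * max_q_next) - np.dot(w, x)
    return w + alpha * td_error * x
```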

SLIDE 42

Can RL learn directly from images?

  • Yes it can: http://karpathy.github.io/2016/05/31/rl/
SLIDE 43

Video on Updating a NN's Weights

Neural Networks Demystified [Part 3: Gradient Descent] https://www.youtube.com/watch?v=5u0jaA3qAGk

SLIDE 44

Video of TAMER

http://labcast.media.mit.edu/?p=300

SLIDE 45

Using RL: Essential Steps

1) Specify the state space or the state-action space
   – Are the states and/or actions discrete or continuous?
2) Specify the reward function
   – If you have control over this, dense reward is better than sparse reward
3) Specify the environment (e.g., a simulator or perhaps the real world)
4) Pick your favorite RL algorithm that can handle the state and action representation (a minimal loop tying these steps together is sketched below)
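A skeleton tying the four steps together, as a sketch only: the env.reset()/env.step() interface is an assumption modeled on common RL simulators, and Q can start as a collections.defaultdict(float):

```python
import random

def train(env, Q, actions, episodes=1000, alpha=0.1, gamma=0.9, eps=0.1):
    for _ in range(episodes):
        s = env.reset()                       # step 3: the environment
        done = False
        while not done:
            # explore with probability eps, otherwise act greedily (step 1)
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a2: Q[(s, a2)])
            s_next, r, done = env.step(a)     # step 2: the reward signal
            # step 4: the chosen algorithm's update (Q-learning here)
            best_next = max(Q[(s_next, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```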

SLIDE 46

Resources

  • BURLAP: Java RL Library: http://burlap.cs.brown.edu/
  • Reinforcement Learning: An Introduction: http://people.inf.elte.hu/lorincz/Files/RL_2006/SuttonBook.pdf