Reinforcement Learning of Affordance Cues (PowerPoint Presentation)


SLIDE 1

MACS YEAR 3 REVIEW MEETING, Sankt Augustin, February 15, 2008 Lucas PALETTA - Computational Perception Group (CAPE)

Reinforcement Learning of Affordance Cues

Final Status of Work

Lucas Paletta & Gerald Fritz

Computational Perception Group, Institute of Digital Image Processing, JOANNEUM RESEARCH Forschungsgesellschaft mbH, Graz, Austria

SLIDE 2

Feature detectors; learning of affordance cues

Perception Module

Architecture

SLIDE 3

Perception and Learning

Supervised Learning of Affordance Cues

[Figure: learned decision tree over the features top, circular, and size; branches labeled Y/N, leaves labeled liftable / non-liftable / unknown, with pruning]

SLIDE 4

tb = bottom: unknown (1086.0)
tb = top:
|  structure = circ: nonliftable (552.0)
|  structure = rect:
|  |  size > 1426: liftable (402.0)
|  |  size <= 1426:
|  |  |  size <= 1410: liftable (72.0)
|  |  |  size > 1410: nonliftable (6.0)

P(A_liftable | circ) ≈ 0.00,  P(A_nonliftable | circ) ≈ 1.00
P(A_liftable | rect) ≈ 0.99,  P(A_nonliftable | rect) ≈ 0.01
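The learned tree above can be read off directly as a rule function. A minimal sketch, with feature names and thresholds taken from the tree printout; the function itself is illustrative and not part of the project's codebase:

```python
def classify_liftable(tb, structure, size):
    """Apply the learned decision tree to one feature vector.

    tb:        'top' or 'bottom' (where the region is seen)
    structure: 'circ' or 'rect' (region shape descriptor)
    size:      region size
    Returns 'liftable', 'nonliftable', or 'unknown'.
    """
    if tb == "bottom":
        return "unknown"
    # tb == 'top'
    if structure == "circ":
        return "nonliftable"
    # structure == 'rect'
    if size > 1426:
        return "liftable"
    if size <= 1410:
        return "liftable"
    return "nonliftable"  # 1410 < size <= 1426
```

The leaf counts in the printout (e.g. 402.0 liftable vs. 6.0 nonliftable under the rect branch) are what yield the near-deterministic affordance probabilities above.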

Fritz et al., SAB 2006; Fritz et al., IROS 2006

Perception and Learning

Affordance Hypotheses

SLIDE 5

Entity Memory, Feature Detectors, Outcome Classifier, 'Interaction'

Modules: PM, BM, EM, LM

Affordance Cue Classifier, DT Estimator

SUPERVISED (DECISION TREE) LEARNING

Integration of Perception Module and Learning Module

SLIDE 6

Decision Processes, Rewards & Affordance Cues

[Diagram: recognition of the perceptual state (spatiotemporal) triggers the affordance cue state; actions lead to the affordance outcome, which delivers the reward]

SLIDE 7

State Estimation, Final State Recognition, MDP Decision Maker

[Diagram: closed loop over state s_t, reward R_t, and action a_t, with modules
ATTENTION: Curiosity Drive
FEATURE RECOGNITION: Image Analysis; Entities, Attributes
LEARNING & CONTROL]

Closed-Loop Learning

Paletta & Fritz, ICDL 2007; Paletta & Fritz, KI 2007

SLIDE 8

Reinforcement (Q-)Learning

[Diagram: state sequence s → s' → s'' with rewards R = 0, R = +1; states: Object LOW & Gripper LOW, Object MID & Gripper MID, Object HIGH & Gripper HIGH (final state); actions: gripper up, gripper down]

  • States: multi-modal & proprioceptive features
  • Actions: motor, crane, camera (a1, ..., aA)
  • Rewards: the reward function is specific to the affordance; outcome-driven reward:

  • R(t+1) = 1 if outcome occurs
  • R(t+1) = 0 otherwise
  • Predicted Reward:

    Q(s_t, a_t) = U(s_t, a_t) = E[ Σ_n R(s_{t+n}, a_{t+n}) ]

  • Update Rule:

    Q(s, a) ← (1 − α) Q(s, a) + α ( R + γ max_{a'} Q(s', a') )

    (predicted reward of the early trigger state)
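The update rule can be exercised on a toy version of the crane lifting task from this slide. A minimal tabular Q-learning sketch, where the state names, the deterministic transition model, and all parameter values are illustrative stand-ins rather than the project's implementation:

```python
import random

# Toy lifting task: the gripper moves the object LOW -> MID -> HIGH.
# Reaching HIGH (the affordance outcome) yields reward +1; all else 0.
STATES = ["LOW", "MID", "HIGH"]
ACTIONS = ["up", "down"]

def step(state, action):
    """Deterministic toy transitions; HIGH is terminal."""
    i = STATES.index(state)
    i = min(i + 1, 2) if action == "up" else max(i - 1, 0)
    next_state = STATES[i]
    reward = 1.0 if next_state == "HIGH" else 0.0
    return next_state, reward

def q_learn(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = "LOW"
        while s != "HIGH":
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r = step(s, a)
            target = r + gamma * max(Q[(s2, a2)] for a2 in ACTIONS)
            # Q(s,a) <- (1 - alpha) Q(s,a) + alpha (R + gamma max_a' Q(s',a'))
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
            s = s2
    return Q

Q = q_learn()
# Reward propagates backwards: Q(MID, up) approaches 1,
# Q(LOW, up) approaches gamma * Q(MID, up) -- the "early trigger state".
```

The value of the early trigger state (LOW) is exactly the backed-up, discounted prediction of the lifting outcome, which is what the backtracking of reward to a perceptual cue state relies on.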

SLIDE 9

Generalized Trigger Features and Execution of Interaction

[Diagram: agent-environment loop with state s_t, action a_t, reward R_t; full state description fed to affordance classifiers 1-4 with outputs b1, b2, b3, b4]

  • Classification tree (generalization on predictive perceptual features)
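The full state description can thus be compressed into the binary outputs of the affordance classifiers, which the learner then treats as its state. A minimal sketch of this state abstraction, where the classifier functions and feature names are hypothetical stand-ins:

```python
def abstract_state(observation, classifiers):
    """Map a full observation to a compact RL state: the tuple of
    binary affordance-classifier outputs (b1, ..., bn)."""
    return tuple(int(c(observation)) for c in classifiers)

# Hypothetical stand-in classifiers over a feature dictionary:
classifiers = [
    lambda o: o["structure"] == "rect",  # b1: rectangular region
    lambda o: o["size"] > 1426,          # b2: large region
    lambda o: o["tb"] == "top",          # b3: seen from the top
    lambda o: o["gripper"] == "down",    # b4: gripper lowered
]

obs = {"structure": "rect", "size": 1500, "tb": "top", "gripper": "up"}
state = abstract_state(obs, classifiers)  # (1, 1, 1, 0)
```

Distinct observations that agree on all classifier outputs collapse into the same state, which is what generalizes the learned trigger features across object instances.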

SLIDE 10

Integration of Perception Module and Learning Module

Feature Detectors, Reward Estimator, Affordance Cue Classifier, Outcome Classifier, 'Interaction', Parameter Estimator

Reward estimate; parameters; outcome (actual reward)

Modules: PM, BM, EM, LM

EXPLORATORY (REINFORCEMENT) LEARNING

SLIDE 11

Predicted Reward & Perceptual States

Demonstration

[Annotations:]
← est. future cumulative reward
← classification: affordance cue
← proprioceptive features
← observation of own actions
← observation of environment

SLIDE 12

prospective states and affordance cues

Perception and Learning

Reinforcement Learning of Affordance-based Cues

SLIDE 13

Perception and Learning

Learning on Real World Imagery

  • Real World Images with Image Analysis
  • Simulation of Magnet Signals and Crane's Rope Tension
SLIDE 14

Perception and Learning

Video on Real-World Data

SLIDE 15

Perception and Learning

Key Contributions

  • Reinforcement Learning Methodology for Affordance Cueing
    ⇒ Exploratory Learning without Supervision
    ⇒ Reward Signal Determines Outcome Events
    ⇒ Backtracking Determines Perceptual 'Cue' State
    ⇒ Enables Largely Autonomous Learning of Cueing

  • Perception-Action Framework for Affordance Recognition
    ⇒ Implicit Learning of Affordance Relations

SLIDE 16

Perception and Learning

Directions of Future Work

  • Generalisation Towards Higher Order Features
  • Unsupervised Segmentation of Affordance Processing Stages (C, B, O)
  • Integration into an Affordance Selection Framework