Reinforcement Learning of Affordance Cues (PowerPoint Presentation)


SLIDE 1

MACS YEAR 3 REVIEW MEETING, Sankt Augustin, February 15, 2008 Lucas PALETTA - Computational Perception Group (CAPE)

Reinforcement Learning of Affordance Cues

Final Status of Work

Lucas Paletta & Gerald Fritz

Computational Perception Group, Institute of Digital Image Processing, JOANNEUM RESEARCH Forschungsgesellschaft mbH, Graz, Austria

SLIDE 2

Feature detectors; learning of affordance cues

Perception Module

Architecture

SLIDE 3

Perception and Learning

Supervised Learning of Affordance Cues

[Figure: learned decision tree over the features top, circular, and size; branches labeled Y/N, leaves labeled liftable / non-liftable / unknown, with pruning]

SLIDE 4

tb = bottom: unknown (1086.0)
tb = top:
|  structure = circ: nonliftable (552.0)
|  structure = rect:
|  |  size > 1426: liftable (402.0)
|  |  size <= 1426:
|  |  |  size <= 1410: liftable (72.0)
|  |  |  size > 1410: nonliftable (6.0)

P(A_liftable | circ) ≈ 0.00,  P(A_nonliftable | circ) ≈ 1.00
P(A_liftable | rect) ≈ 0.99,  P(A_nonliftable | rect) ≈ 0.01
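The learned tree above can be read off directly as a rule function. A minimal sketch, with feature names and thresholds taken from the tree printout; the function itself is illustrative and not part of the project's codebase:

```python
def classify_liftable(tb, structure, size):
    """Apply the learned decision tree to one feature vector.

    tb:        'top' or 'bottom' (where the region is seen)
    structure: 'circ' or 'rect' (region shape descriptor)
    size:      region size
    Returns 'liftable', 'nonliftable', or 'unknown'.
    """
    if tb == "bottom":
        return "unknown"
    # tb == 'top'
    if structure == "circ":
        return "nonliftable"
    # structure == 'rect'
    if size > 1426:
        return "liftable"
    if size <= 1410:
        return "liftable"
    return "nonliftable"  # 1410 < size <= 1426
```

The leaf counts in the printout (e.g. 402.0 liftable vs. 6.0 nonliftable under the rect branch) are what yield the near-deterministic affordance probabilities above.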

Fritz et al., SAB 2006; Fritz et al., IROS 2006

Perception and Learning

Affordance Hypotheses

SLIDE 5

Entity Memory, Feature Detectors, Outcome Classifier, 'Interaction'

Modules: PM, BM, EM, LM

Affordance Cue Classifier, DT Estimator

SUPERVISED (DECISION TREE) LEARNING

Integration of Perception Module and Learning Module

SLIDE 6

Decision Processes, Rewards & Affordance Cues

[Diagram: recognition of the perceptual state (spatiotemporal) triggers the affordance cue state; actions lead to the affordance outcome, which delivers the reward]

SLIDE 7

State Estimation, Final State Recognition, MDP Decision Maker

[Diagram: closed loop over state s_t, reward R_t, and action a_t, with modules
ATTENTION: Curiosity Drive
FEATURE RECOGNITION: Image Analysis; Entities, Attributes
LEARNING & CONTROL]

Closed-Loop Learning

Paletta & Fritz, ICDL 2007; Paletta & Fritz, KI 2007

SLIDE 8

Reinforcement (Q-)Learning

[Diagram: state sequence s → s' → s'' with rewards R = 0, R = +1; states: Object LOW & Gripper LOW, Object MID & Gripper MID, Object HIGH & Gripper HIGH (final state); actions: gripper up, gripper down]

  • States: multi-modal & proprioceptive features
  • Actions: motor, crane, camera (a1, ..., aA)
  • Rewards: the reward function is specific to the affordance; outcome-driven reward:

  • R(t+1) = 1 if outcome occurs
  • R(t+1) = 0 otherwise
  • Predicted Reward:

    Q(s_t, a_t) = U(s_t, a_t) = E[ Σ_n R(s_{t+n}, a_{t+n}) ]

  • Update Rule:

    Q(s, a) ← (1 − α) Q(s, a) + α ( R + γ max_{a'} Q(s', a') )

    (predicted reward of the early trigger state)
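The update rule can be exercised on a toy version of the crane lifting task from this slide. A minimal tabular Q-learning sketch, where the state names, the deterministic transition model, and all parameter values are illustrative stand-ins rather than the project's implementation:

```python
import random

# Toy lifting task: the gripper moves the object LOW -> MID -> HIGH.
# Reaching HIGH (the affordance outcome) yields reward +1; all else 0.
STATES = ["LOW", "MID", "HIGH"]
ACTIONS = ["up", "down"]

def step(state, action):
    """Deterministic toy transitions; HIGH is terminal."""
    i = STATES.index(state)
    i = min(i + 1, 2) if action == "up" else max(i - 1, 0)
    next_state = STATES[i]
    reward = 1.0 if next_state == "HIGH" else 0.0
    return next_state, reward

def q_learn(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = "LOW"
        while s != "HIGH":
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r = step(s, a)
            target = r + gamma * max(Q[(s2, a2)] for a2 in ACTIONS)
            # Q(s,a) <- (1 - alpha) Q(s,a) + alpha (R + gamma max_a' Q(s',a'))
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
            s = s2
    return Q

Q = q_learn()
# Reward propagates backwards: Q(MID, up) approaches 1,
# Q(LOW, up) approaches gamma * Q(MID, up) -- the "early trigger state".
```

The value of the early trigger state (LOW) is exactly the backed-up, discounted prediction of the lifting outcome, which is what the backtracking of reward to a perceptual cue state relies on.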

SLIDE 9

Generalized Trigger Features and Execution of Interaction

[Diagram: agent-environment loop with state s_t, action a_t, reward R_t; full state description fed to affordance classifiers 1-4 with outputs b1, b2, b3, b4]

  • Classification tree (generalization on predictive perceptual features)
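The full state description can thus be compressed into the binary outputs of the affordance classifiers, which the learner then treats as its state. A minimal sketch of this state abstraction, where the classifier functions and feature names are hypothetical stand-ins:

```python
def abstract_state(observation, classifiers):
    """Map a full observation to a compact RL state: the tuple of
    binary affordance-classifier outputs (b1, ..., bn)."""
    return tuple(int(c(observation)) for c in classifiers)

# Hypothetical stand-in classifiers over a feature dictionary:
classifiers = [
    lambda o: o["structure"] == "rect",  # b1: rectangular region
    lambda o: o["size"] > 1426,          # b2: large region
    lambda o: o["tb"] == "top",          # b3: seen from the top
    lambda o: o["gripper"] == "down",    # b4: gripper lowered
]

obs = {"structure": "rect", "size": 1500, "tb": "top", "gripper": "up"}
state = abstract_state(obs, classifiers)  # (1, 1, 1, 0)
```

Distinct observations that agree on all classifier outputs collapse into the same state, which is what generalizes the learned trigger features across object instances.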

SLIDE 10

Integration of Perception Module and Learning Module

Feature Detectors, Reward Estimator, Affordance Cue Classifier, Outcome Classifier, 'Interaction', Parameter Estimator

Reward estimate; parameters; outcome (actual reward)

Modules: PM, BM, EM, LM

EXPLORATORY (REINFORCEMENT) LEARNING

SLIDE 11

Predicted Reward & Perceptual States

Demonstration

[Annotations:]
← est. future cumulative reward
← classification: affordance cue
← proprioceptive features
← observation of own actions
← observation of environment

SLIDE 12

prospective states and affordance cues

Perception and Learning

Reinforcement Learning of Affordance-based Cues

SLIDE 13

Perception and Learning

Learning on Real World Imagery

  • Real World Images with Image Analysis
  • Simulation of Magnet Signals and Crane's Rope Tension
SLIDE 14

Perception and Learning

Video on Real-World Data

SLIDE 15

Perception and Learning

Key Contributions

  • Reinforcement Learning Methodology for Affordance Cueing
    ⇒ Exploratory Learning without Supervision
    ⇒ Reward Signal Determines Outcome Events
    ⇒ Backtracking Determines Perceptual 'Cue' State
    ⇒ Enables Largely Autonomous Learning of Cueing

  • Perception-Action Framework for Affordance Recognition
    ⇒ Implicit Learning of Affordance Relations

SLIDE 16

Perception and Learning

Directions of Future Work

  • Generalisation Towards Higher Order Features
  • Unsupervised Segmentation of Affordance Processing Stages (C, B, O)
  • Integration into an Affordance Selection Framework