Category-Based Intrinsic Motivation - Lisa Meeden, Rachel Lee, Ryan Walker - PowerPoint PPT Presentation



SLIDE 1

Category-Based Intrinsic Motivation

Lisa Meeden, Rachel Lee, Ryan Walker

Swarthmore College, USA

James Marshall

Sarah Lawrence College, USA

9th International Conference on Epigenetic Robotics, Venice, Italy, November 12-14, 2009

SLIDE 2
Research goals

  • Design a robot control architecture that implements an ongoing, autonomous developmental learning process
  • Test it on a physical robot
  • Essential components
    – Categorization
    – Prediction
    – Intrinsic motivation

SLIDE 3
  • “Categorization is of such fundamental importance for cognition and intelligent behavior that a natural organism incapable of forming categories does not have much chance of survival”

—Lungarella et al., Developmental robotics: A survey, Connection Science, 2003

  • “The ability to make predictions is part of the core initial knowledge on top of which human cognition is built”

—Spelke, Core knowledge, American Psychologist, 2000

  • “Intrinsic motivation is the inherent tendency to seek out novelty and challenges, to extend and exercise one's capacities, to explore, and to learn”

—Ryan and Deci, American Psychologist, 2000

SLIDE 4
Implementation

  • Categorization
    – Growing Neural Gas (Fritzke, 1995)
  • Prediction
    – Artificial neural networks
  • Intrinsic motivation
    – Intelligent Adaptive Curiosity (Oudeyer et al., 2007)
  • Physical robot
    – Rovio
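Growing Neural Gas builds the robot's categories by incrementally adding prototype units where prediction of the input distribution is worst. A compact sketch of the core update, assuming illustrative hyperparameters (not the paper's values) and omitting the pruning of isolated units from the full algorithm:

```python
import random

class GNG:
    """Minimal Growing Neural Gas sketch (after Fritzke, 1995).

    Hyperparameters are illustrative defaults, not the values used in
    the presented system; isolated-unit pruning is omitted for brevity.
    """

    def __init__(self, dim, eps_b=0.2, eps_n=0.006, age_max=50,
                 lambda_=100, alpha=0.5, d=0.995):
        self.dim, self.eps_b, self.eps_n = dim, eps_b, eps_n
        self.age_max, self.lambda_, self.alpha, self.d = age_max, lambda_, alpha, d
        # start with two randomly placed units
        self.units = [[random.random() for _ in range(dim)] for _ in range(2)]
        self.error = [0.0, 0.0]
        self.edges = {}          # frozenset({i, j}) -> age
        self.steps = 0

    def _dist2(self, a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def nearest(self, x):
        """Index of the best-matching unit (the 'category') for input x."""
        return min(range(len(self.units)), key=lambda i: self._dist2(x, self.units[i]))

    def adapt(self, x):
        self.steps += 1
        order = sorted(range(len(self.units)), key=lambda i: self._dist2(x, self.units[i]))
        s1, s2 = order[0], order[1]           # two closest units
        self.error[s1] += self._dist2(x, self.units[s1])
        # move the winner and its topological neighbours toward x
        for k in range(self.dim):
            self.units[s1][k] += self.eps_b * (x[k] - self.units[s1][k])
        for e in list(self.edges):
            if s1 in e:
                j = next(i for i in e if i != s1)
                for k in range(self.dim):
                    self.units[j][k] += self.eps_n * (x[k] - self.units[j][k])
                self.edges[e] += 1            # age edges emanating from the winner
        self.edges[frozenset((s1, s2))] = 0   # refresh/create the winner-runner-up edge
        self.edges = {e: a for e, a in self.edges.items() if a <= self.age_max}
        # periodically grow a new unit between the highest-error region's units
        if self.steps % self.lambda_ == 0:
            q = max(range(len(self.units)), key=lambda i: self.error[i])
            nbrs = [next(i for i in e if i != q) for e in self.edges if q in e]
            if nbrs:
                f = max(nbrs, key=lambda i: self.error[i])
                r = len(self.units)
                self.units.append([(a + b) / 2 for a, b in zip(self.units[q], self.units[f])])
                self.error[q] *= self.alpha
                self.error[f] *= self.alpha
                self.error.append(self.error[q])
                self.edges.pop(frozenset((q, f)), None)
                self.edges[frozenset((q, r))] = 0
                self.edges[frozenset((f, r))] = 0
        self.error = [e * self.d for e in self.error]
```

In the presented system the inputs are the 7-dimensional sensorimotor vectors SM(t), so `dim` would be 7; the network grows only as fast as its accumulated error warrants.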

SLIDE 5

Environment

SLIDE 6
Overview of experiment

  • Robot observes its surroundings visually and decides on its own what actions to perform
  • Categorizes its sensory input based on similarity to previous inputs
  • Predicts what it will see on the next time step as a result of performing a particular action
  • Learns by comparing its prediction to what it actually observes
  • Intrinsically motivated to choose actions that maximize learning progress

SLIDE 7

Perception-action loop

[Diagram: GNG categories SMa-SMe, each paired with its own prediction expert]

SLIDE 8

Perceive current sensory input S(t)

[Diagram: the camera image provides sensory input S(t) to the GNG categories SMa-SMe and their experts]

SLIDE 9

Consider possible motor actions

[Diagram: candidate motor actions M1(t), M2(t), M3(t) are considered together with S(t)]

SLIDE 10

Find best matching categories

[Diagram: each candidate action is combined with S(t) into a sensorimotor input SM(t) and matched to the closest GNG category]


SLIDE 13

Select expert with maximal learning progress

[Diagram: the expert with maximal learning progress determines the selected action M(t) and outputs its prediction S'(t+1)]
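Intelligent Adaptive Curiosity measures learning progress per region as the recent decrease in prediction error. A sketch of that bookkeeping, with illustrative window sizes and hypothetical `Expert`/`select_action` names (not the paper's code):

```python
from collections import deque

class Expert:
    """Per-category error bookkeeping in the style of Intelligent
    Adaptive Curiosity (Oudeyer et al., 2007). The window size is
    illustrative, not a value from the presented system."""

    def __init__(self, window=20):
        self.window = window
        self.errors = deque(maxlen=2 * window)   # rolling error history

    def record(self, prediction_error):
        self.errors.append(prediction_error)

    def learning_progress(self):
        """Drop in mean prediction error: older half minus recent half.
        Positive means this expert is currently improving."""
        if len(self.errors) < 2 * self.window:
            return 0.0
        old = list(self.errors)[: self.window]
        recent = list(self.errors)[self.window :]
        return sum(old) / self.window - sum(recent) / self.window


def select_action(candidates):
    """candidates: list of (action, expert) pairs, one per best-matching
    category; pick the action whose expert shows maximal progress."""
    return max(candidates, key=lambda c: c[1].learning_progress())[0]
```

An expert whose errors are falling reports positive progress and attracts the action selection; experts that are already perfect or hopelessly noisy report roughly zero progress, which is what steers the robot away from both trivial and unlearnable regions.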

SLIDE 14

Perform selected action and observe outcome

[Diagram: the selected action is performed; the prediction S'(t+1) is compared with the observed outcome S(t+1)]

SLIDE 15

Update expert based on prediction error

[Diagram: the prediction error between S'(t+1) and S(t+1) trains the selected expert's network]

SLIDE 16

Adjust GNG categories

[Diagram: the GNG category units are adjusted to better match the chosen sensorimotor input SM(t)]

SLIDE 17
Perceptions

  • Camera images
  • Red, green, and/or blue can be detected
  • Robot chooses which color to focus on
  • Sensory vector S(t) = (red, green, blue, area, position)
  • Example: (0, 1, 1, 0.12, 0.5)
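The encoding implied by the examples on these slides can be sketched as follows; the `blobs` detector output is a hypothetical structure assumed here (a map from each visible color to its normalized blob area and position), not something the slides specify:

```python
def sensory_vector(focus, blobs):
    """Build S(t) = (red, green, blue, area, position).

    `blobs` is a hypothetical detector output mapping each visible
    color to its (area, position), both normalized to [0, 1]. The
    first three entries flag which colors are present; area and
    position describe the blob of the currently focused color, and
    are zero when that color is not visible.
    """
    red = 1 if "red" in blobs else 0
    green = 1 if "green" in blobs else 0
    blue = 1 if "blue" in blobs else 0
    area, position = blobs.get(focus, (0.0, 0.0))
    return (red, green, blue, area, position)
```

This reproduces the slide's example (0, 1, 1, 0.12, 0.5): green and blue visible, red absent, with the focused blob covering 12% of the image at the center.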

SLIDE 18

Example: Robot chooses to look at red

S(t) = (red, green, blue, area, position) = (1, 1, 1, 0.23, 1.0)

SLIDE 19

Example: Robot chooses to look at blue

S(t) = (red, green, blue, area, position) = (1, 1, 0, 0, 0)

SLIDE 20
Motor actions

  • Which color to focus on
  • How much to rotate
  • Motor vector M(t) = (colorFocus, rotation)
    – colorFocus ∈ [0...1] → [Red...Green...Blue]
    – rotation ∈ [0...1] → [Left...Right]
  • Example: (0.8, 0.2) = focus on blue, turn left
  • Sensorimotor vectors SM(t) are 7-dimensional
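The slides give only the interval endpoints of the motor encoding; a minimal decoder consistent with the (0.8, 0.2) example might look like this, where splitting colorFocus into thirds is an assumption, not something the slides state:

```python
def decode_motor(m):
    """Interpret a motor vector M(t) = (colorFocus, rotation).

    Assumption: colorFocus is split into thirds for red/green/blue,
    and rotation below 0.5 means turning left; the slides only give
    the endpoints Red...Green...Blue and Left...Right.
    """
    color_focus, rotation = m
    if color_focus < 1 / 3:
        color = "red"
    elif color_focus < 2 / 3:
        color = "green"
    else:
        color = "blue"
    direction = "left" if rotation < 0.5 else "right"
    return color, direction
```

Under these assumptions, `decode_motor((0.8, 0.2))` yields the slide's reading: focus on blue, turn left.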

SLIDE 21
Learning opportunities

  • Green walls offer a constant background, which is easy to predict
  • Red static robot is also predictable, but is only visible from a few positions
  • Blue moving robot is smaller and harder to predict

SLIDE 22

GNG growth over time

SLIDE 23

Categories formed over time

SLIDE 29
Results: Random controller

  • Choosing actions at random causes the robot to focus on blue about 33% of the time, regardless of whether blue is actually present in the image
  • The presence or absence of blue is relatively uncorrelated with the robot's choice of color channel
  • This is to be expected
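The r values reported on the following slides are presumably Pearson correlation coefficients between a per-timestep "blue present" indicator and a "robot focused on blue" indicator; the computation itself is standard:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences, e.g. a
    0/1 blue-present flag per time step vs. a 0/1 blue-focused flag.

    Assumes neither sequence is constant (otherwise the denominator
    is zero and the correlation is undefined)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A random controller should yield r near zero (the slide reports 0.17), while a controller that tracks blue when blue is visible pushes r well above that.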

SLIDE 30

Random controller

r = 0.17

SLIDE 31
Results: Intrinsically motivated controller

  • Choosing intrinsically-motivated actions causes the robot to focus on blue much more often when blue is present in the image
  • The correlation coefficient increases from 0.17 to 0.57
  • This correlation becomes progressively stronger over time, showing that the Rovio is learning to track the smaller blue robot

SLIDE 32

Intrinsically motivated controller

r = 0.57

SLIDE 33

Intrinsically motivated controller

r = 0.42

SLIDE 34

Intrinsically motivated controller

r = 0.59

SLIDE 35

Intrinsically motivated controller

r = 0.65

SLIDE 36
Results: Developmental trajectory

  • By comparing the results for the red color channel and the blue color channel, evidence for a developmental trajectory can be seen
  • In the early stages of learning, the Rovio focuses more closely on the predictable red robot
  • Later on, the Rovio shifts to tracking the harder-to-predict blue robot more closely

SLIDE 37

Developmental trajectory

Phase 1 Phase 2 Phase 3

SLIDE 38
Conclusions

  • Combining GNG's categorization with IAC's measure of learning progress allows the robot to develop an effective set of categories adapted to its environment
  • GNG grows only as much as necessary to capture the significant relationships within the sensorimotor data
  • The robot gradually shifts from learning about features of its world that are easy to predict to learning about features that are harder to predict

SLIDE 39

SLIDE 40
Results

  • The green color channel is active on nearly every time step, and is the easiest to predict, yet always has the lowest overall focus
  • In 10 experiments performed with intrinsically-motivated action selection, the robot consistently focused on blue and/or red more often than green
  • In 5 experiments performed with random action selection, there was no significant difference in focus between red, green, or blue

SLIDE 41
Perception-Action Loop

  • Perceive current sensory input S(t)
  • Consider possible motor actions M1(t), M2(t), M3(t), ...
  • Find best matching category in memory for each sensorimotor combination SM1(t), SM2(t), SM3(t), ...
  • Choose action M(t) associated with the category with maximal learning progress, and predict outcome S'(t+1)
  • Do M(t) and observe actual outcome S(t+1)
  • Use prediction error to update neural network weights
  • Adjust GNG categories to better match chosen SM(t)