Embodied Machines The Grounding (binding) Problem Real cognizers - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Embodied Machines

  • The Grounding (binding) Problem

– Real cognizers form multiple associations between concepts

  • Affordances - how an object is interacted with
  • Frames - background structure against which a concept is understood
    - sometimes highly complex (educational system, family relationships)
  • Emotions - witnessing an event or seeing an object conjures up emotional states
  • Mental simulation - comprehending language may trigger imagistic modeling of an event based on experience

slide-2
SLIDE 2

Embodied Machines

– Mouse

  • Mammal, small, furry, grey to brown, long whiskers, cats like to play with them and then eat them, they’re used in experiments, ladies stand on chairs when they’re around, they squeak, they’re prolific breeders, they’re sold live as snake food, they’re one kind of rodent, they look a lot like rats, they are sometimes pets, they like to run on a wheel…

– Play

  • The opposite of work, it’s fun, kids do it, scheduled in during grade school, you play games, you play with words, …

slide-3
SLIDE 3

Embodied Machines

– Approaches to meaning construction

  • NLP

– Text/speech is considered comprehended when it has been parsed syntactically and word meanings have been assigned
– Meaning is pre-determined by humans in some way

  • Embodied approach

– World has no structure until a body begins to interact in it
» Need goals & sensorimotor system
– Experience --> meaning
– Words map onto meaning

slide-4
SLIDE 4

Embodied Machines

– Steels’ Talking Heads

  • Simple robots

– Auditory & visual systems
– Motivating goal = language game

  • Simple environment

– 2-dimensional world containing objects

  • Robots determine their own categories for objects
  • Robots determine their own labels for categories
  • Robots and environment are real physical entities
slide-5
SLIDE 5

Embodied Machines

– Cangelosi & Parisi

  • Virtual agents, virtual world
  • A kind of embodied learning

– Agents have physical location, orientation, and movement capabilities within their environment
– Agents consume mushrooms, which affects their energy status
– Agents (collectively) have a motivating task --> increase fitness of species
– They sense perceptual characteristics, not mushrooms --> they learn which characteristics describe edible vs. poisonous mushrooms
– Agents (collectively) learn to categorize and label mushrooms

slide-6
SLIDE 6

Embodied Machines

– CELL (Deb Roy)

  • Cross-channel Early Lexical Learning
  • Models embodied language learning using input that approximates input to human infants

– Instantiated in a robot body with microphone/camera

  • CELL learns to form word-meaning correspondences from raw (unsegmented) audio and visual input

slide-7
SLIDE 7

Embodied Machines

– First Task

  • Segmentation

– Audio stream parsing into segments
– Video stream parsing into objects
– Segmentation process produces a channel of ‘words’ and a channel of shapes

– Second Task

  • Build a lexicon by identifying frequently co-occurring pairs of audio & visual segments
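
The second task can be sketched as simple pair counting. This is a minimal illustration, not CELL's actual implementation: the `episodes` structure, the function name, and the toy labels are all assumptions, and real input would be acoustic and visual segments rather than clean strings.

```python
from collections import Counter


def count_cooccurrences(episodes):
    """Count how often each (word, shape) pair appears in the same episode.

    `episodes` is a hypothetical list of (words, shapes) tuples, one per
    stretch of input, as it might come out of the segmentation step.
    """
    pair_counts = Counter()
    for words, shapes in episodes:
        # Use sets so that a pair is counted at most once per episode.
        for word in set(words):
            for shape in set(shapes):
                pair_counts[(word, shape)] += 1
    return pair_counts


# Toy data (invented for illustration, not taken from the study).
episodes = [
    (["the", "ball"], ["BALL"]),
    (["the", "cat"], ["CAT"]),
    (["throw", "the", "ball"], ["BALL", "CAT"]),
]
counts = count_cooccurrences(episodes)
print(counts[("ball", "BALL")], counts[("the", "BALL")], counts[("the", "CAT")])
```

Raw counts alone favor frequent words such as ‘the’, which is why some normalization by overall frequency is needed before pairs are accepted into a lexicon.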

slide-8
SLIDE 8

Embodied Machines

  • Illustrative example (not from actual data)
  • Imagine an utterance:

“…don’t throw the ball at the cat…”

Uttered in a scene containing these identified objects (Noise present)

slide-9
SLIDE 9

Embodied Machines

  • Objects not necessarily identified in same order as named in utterance
  • Time delays between utterance and object recognition highly likely

…throw the ball at the cat

slide-10
SLIDE 10

Embodied Machines

– Short term memory (STM) – look at a temporal window surrounding each word
– Aim is to go back or forward far enough in time to have the word and referent in the same window

…throw the ball at the cat

Short term memory

slide-11
SLIDE 11

Embodied Machines

– Window marches through data stream collecting segmented objects and words for possible mapping

…throw the ball at the cat

Short term memory
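
The windowing idea can be sketched as follows. This is an assumption-laden toy, not CELL's code: the event tuples, the millisecond timestamps, and the function name `stm_pairs` are hypothetical, and the window here is centered on each word.

```python
def stm_pairs(events, window_ms):
    """Return (word, object) candidates whose timestamps lie within window_ms.

    `events` is a hypothetical list of (time_ms, channel, label) tuples,
    where channel is either "word" or "object".
    """
    words = [(t, label) for t, channel, label in events if channel == "word"]
    objects = [(t, label) for t, channel, label in events if channel == "object"]
    pairs = []
    for t_word, word in words:
        for t_obj, obj in objects:
            # Look both backward and forward in time around each word.
            if abs(t_word - t_obj) <= window_ms:
                pairs.append((word, obj))
    return pairs


# Toy stream: the ball is recognized long after the word 'ball' is heard.
events = [
    (0, "word", "throw"),
    (300, "word", "the"),
    (500, "word", "ball"),
    (800, "object", "CAT"),
    (1100, "word", "cat"),
    (2400, "object", "BALL"),
]
narrow = stm_pairs(events, window_ms=400)
wide = stm_pairs(events, window_ms=2000)
print(("ball", "BALL") in narrow, ("ball", "BALL") in wide)
```

With the narrow window the word ‘ball’ never shares a window with its referent; the wider window captures it, at the cost of more spurious pairs such as (‘throw’, CAT).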

slide-12
SLIDE 12

Embodied Machines

…throw the ball at the cat

Short term memory

slide-13
SLIDE 13

Embodied Machines

…throw the ball at the cat

Short term memory

slide-14
SLIDE 14

Embodied Machines

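  • Audio and visual segments that have a high degree of mutual information are likely semantically linked and should be saved in long term memory (LTM)

[Table: co-occurrence counts of words (‘the’, ‘cat’, ‘ball’, …) against visual objects, with sums of unique occurrences per word and per object. Values surviving extraction: cells 50, 40, 6, 5; object sums 116, 59; word sums 90,000, 100, 57.]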

slide-15
SLIDE 15

Embodied Machines

  • Mutual information

\[
\mathrm{MI} = \frac{P(a \,\&\, b)}{P(a)\,P(b)} \cong \frac{\text{co-occurrence}(a \,\&\, b)}{\text{occurrence}(a) \times \text{occurrence}(b)}
\]

MI (‘the’ & [object image]) ≅ 40 / (90,000 × 59) ≈ 0.0000075
MI (‘cat’ & [object image]) ≅ 40 / (100 × 59) ≈ 0.0068

Words like ‘the’ are promiscuous. They co-occur with so many categories that they lack predictive power.
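
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the count-based approximation shown above; the function and variable names are assumptions, and the counts are the slide's illustrative numbers.

```python
def mi_score(cooccurrence, occurrence_a, occurrence_b):
    """Count-based approximation: co-occurrence / (occurrence(a) * occurrence(b))."""
    return cooccurrence / (occurrence_a * occurrence_b)


# Occurrences of the visual category used in the slide's example.
OBJECT_OCCURRENCES = 59

the_score = mi_score(40, 90_000, OBJECT_OCCURRENCES)
cat_score = mi_score(40, 100, OBJECT_OCCURRENCES)

print(f"'the': {the_score:.7f}")
print(f"'cat': {cat_score:.4f}")
print(f"ratio: {cat_score / the_score:.0f}")
```

Both words co-occur with the object 40 times, yet the score for ‘cat’ is 900 times higher, because ‘the’ occurs 90,000 times overall against 100 for ‘cat’.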

slide-16
SLIDE 16

Embodied Machines

  • Two implementations of CELL

– Robot
– Learning from observing infant/caregiver interaction

slide-17
SLIDE 17

Embodied Machines

  • Robot

– Input: spoken utterances and images of objects acquired from a video camera mounted on the robot
– Experimenter places objects in front of the robot and describes them
– Acquisition of lexicon

  • Robot gathers visual information about environment while listening to speech (discovers high MI pairs)

– Speech generation

  • Search for objects in environment then describe

– Speech understanding (maps word to object)
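
Once high-MI pairs are stored, understanding and generation can both be viewed as lookups in the same lexicon, in opposite directions. The sketch below assumes a plain dictionary with invented entries; the function names are hypothetical and the real system works over acoustic and visual prototypes, not strings.

```python
# Hypothetical lexicon of high-MI (word -> visual category) pairs.
lexicon = {"ball": "BALL", "cat": "CAT"}


def understand(word):
    """Speech understanding: map a heard word to a visual category, if known."""
    return lexicon.get(word)


def describe(category):
    """Speech generation: find a word for an object seen in the environment."""
    for word, known_category in lexicon.items():
        if known_category == category:
            return word
    return None


print(understand("cat"), describe("BALL"), understand("the"))
```

A word such as ‘the’ that never passed the MI criterion has no entry, so `understand` returns `None` for it.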

slide-18
SLIDE 18

Embodied Machines

  • Learning from infant-caregiver interaction

– Infants played with 7 classes of objects

  • Balls, shoes, keys, toy cars, trucks, dogs, horses
  • Caregiver/infant interaction was natural

– CELL attempted to build up lexicon from observing these interactions

  • Segmentation accuracy (do segment boundaries correspond to word boundaries?)
  • Word discovery (do segments correspond to single words?)
  • Semantic accuracy (if a word is segmented properly, is it properly mapped to an object?)

slide-19
SLIDE 19

Embodied Machines

  • Segmentation accuracy – 28% (compared to 7% for acoustic-only model)
  • Word discovery – 72% of segmented items were single words (compared to 31% for acoustic-only model)
  • Semantic accuracy – 57% of hypothesized lexical candidates were both valid words and linked to semantically relevant visual categories