COMP 150: Developmental Robotics Instructor: Jivko Sinapov


slide-1
SLIDE 1

COMP 150: Developmental Robotics

Instructor: Jivko Sinapov www.cs.tufts.edu/~jsinapov

slide-2
SLIDE 2

Language Acquisition

slide-3
SLIDE 3

Something fun...

slide-4
SLIDE 4

Announcements

slide-5
SLIDE 5

Project Deadlines

  • Project Presentations: Dec 5 and 7
  • Final Report + Deliverables: Dec 11
  • Deliverables:
    – Presentation slides + videos
    – Final report (PDF)
    – Source code (link to GitHub repositories)

slide-6
SLIDE 6

Presentation Guidelines

  • 10-minute talk + 5 minutes for questions
  • Practice! Time your presentation when you practice, and use a timer during the actual presentation as well
  • My advice: find another group and practice presenting to each other
  • Format: Google Slides (so that we don’t have to switch computers)

slide-7
SLIDE 7

Presentation Schedule – Tue Dec 5

  • Raina Galbiati, Doo-yun Her, and Cassie Collins
  • Azmina Karukappadath, Sam Weiss, and Yuelin Liu
  • Timi Dayo-Kayode, Michael Edegware, and Jong Seo Yoon
  • Matt Ryan
  • Meghan O'Brien, Tooba Ahsen, and Elizabeth Lanzilla

slide-8
SLIDE 8

Presentation Schedule – Thu Dec 7

  • Julia Novakoff, Teddy Laurita, and George Pesmazoglou
  • Eric Chen, Matt Shenton, and Avi Block
  • Ari Brown and Julie Jiang
  • Christopher Hylwa, Sonal Chatter, and Brett Gurman
  • Brad Oosterveld, Tyler Frascav
slide-9
SLIDE 9

Final Report Guidelines

  • Approximately 8 pages + 1 page for references
  • Default Google Doc template or default Overleaf LaTeX template
  • A bit about the structure…
  • May include an appendix if you have a lot of visual results

slide-10
SLIDE 10

Last Readings and Homework

  • See course website
slide-11
SLIDE 11

Language Acquisition

slide-12
SLIDE 12

The Turing Test

slide-13
SLIDE 13

The Turing Test

slide-14
SLIDE 14

The Turing Test

slide-15
SLIDE 15

The First ChatBot (~1966)

slide-16
SLIDE 16

ELIZA

  • http://psych.fullerton.edu/mbirnbaum/psych101/Eliza.htm
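
To make the upcoming discussion concrete, here is a minimal sketch of ELIZA-style pattern matching. The rules and the reflection table are hypothetical illustrations written for this example, not Weizenbaum's original script. Note that nothing in the program connects words to anything outside the text.

```python
import re

# Hypothetical rules for illustration only (not the original ELIZA script).
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

# Swap first- and second-person words so the echoed fragment reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}


def reflect(fragment: str) -> str:
    """Strip trailing punctuation and swap pronouns word by word."""
    words = fragment.rstrip(".!?").split()
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in words)


def respond(utterance: str) -> str:
    """Return the template of the first matching rule, or a stock reply."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."


if __name__ == "__main__":
    print(respond("I need my coffee."))
```

The reply is produced purely by string substitution: the program has no representation of what "coffee" is.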

slide-17
SLIDE 17

Discussion: what is missing from programs like ELIZA?

slide-18
SLIDE 18

Natural Language Processing

  • The study of algorithms and data structures used to manipulate text and text-like data
  • Applications in information retrieval, web search, dialogue agents, text mining, etc.
  • Traditionally, not concerned with connecting semantic representations to the real world

slide-19
SLIDE 19

Example: Computing Parse Trees

slide-20
SLIDE 20

Example: Document Classification

https://abbyy.technology/_media/en:features:classification-scheme.png
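
As one possible illustration of document classification, the sketch below trains a tiny multinomial Naive Bayes classifier with add-one smoothing. The choice of Naive Bayes, the training sentences, and the labels are all assumptions made for this example; the slide does not name a specific method.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """labeled_docs: list of (text, label) pairs. Returns the model as a tuple."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of documents
    vocab = set()
    for text, label in labeled_docs:
        words = text.lower().split()
        word_counts[label].update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior under the model."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_logp = None, -math.inf
    for label in doc_counts:
        logp = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # out-of-vocabulary words are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                logp += math.log((word_counts[label][word] + 1)
                                 / (total_words + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Made-up training data for illustration.
DOCS = [
    ("invoice payment due", "finance"),
    ("payment received invoice", "finance"),
    ("robot arm sensor", "robotics"),
    ("sensor robot grasp", "robotics"),
]
MODEL = train_naive_bayes(DOCS)
```

Like the other NLP examples in this part of the lecture, the classifier operates on word counts alone.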

slide-21
SLIDE 21

Example: Word Embeddings

https://image.slidesharecdn.com/introductiontowordembeddings-160405062343/95/a-simple-introduction-to-word-embeddings-5-638.jpg?cb=1494520542
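
Word embeddings represent each word as a vector so that related words lie close together. The sketch below compares words by cosine similarity; the three-dimensional vectors are made up for illustration, whereas real embeddings are learned from large text corpora and have many more dimensions.

```python
import math

# Toy vectors, invented for this example.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word):
    """Return the other vocabulary word whose vector is closest to `word`."""
    candidates = (w for w in EMBEDDINGS if w != word)
    return max(candidates,
               key=lambda w: cosine_similarity(EMBEDDINGS[word], EMBEDDINGS[w]))
```

The similarity is defined entirely in terms of other word vectors, which is relevant to the grounding question raised next.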

slide-22
SLIDE 22

The Symbol Grounding Problem

“How can the semantic interpretation of a formal symbol system be made intrinsic to the system, rather than just parasitic on the meanings in our heads? How can the meanings of the meaningless symbol tokens, manipulated solely on the basis of their (arbitrary) shapes, be grounded in anything but other meaningless symbols?”

– Stevan Harnad, 1990
slide-23
SLIDE 23

Deb Roy, “Grounding Language in the World: Schema Theory Meets Semiotics” (2005)

slide-24
SLIDE 24

Circular Definitions

slide-25
SLIDE 25

Grounding

slide-26
SLIDE 26

Sensor Projections

slide-27
SLIDE 27

Sensor Projections

Input image → color histogram

slide-28
SLIDE 28

Transformer Projection

slide-29
SLIDE 29

Transformer Projection

Color histogram → entropy of histogram

slide-30
SLIDE 30

Categorizer

Entropy of histogram → “Multicolored”
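
The three projection steps on the preceding slides (sensor projection, transformer projection, categorizer) can be sketched as a small pipeline. This is a simplified illustration: pixels are assumed to be already quantized into color labels, and the entropy threshold of 1.0 bit and the "not multicolored" label are hypothetical choices.

```python
import math
from collections import Counter


def color_histogram(pixels):
    """Sensor projection: list of color labels -> normalized histogram."""
    counts = Counter(pixels)
    total = len(pixels)
    return {color: n / total for color, n in counts.items()}


def histogram_entropy(hist):
    """Transformer projection: Shannon entropy of the histogram, in bits."""
    return -sum(p * math.log2(p) for p in hist.values() if p > 0)


def categorize(entropy, threshold=1.0):
    """Categorizer: map the continuous entropy value to a discrete label.

    The threshold is an assumed value chosen for this example.
    """
    return "multicolored" if entropy > threshold else "not multicolored"


def describe(pixels):
    """Run the full chain: image -> histogram -> entropy -> category."""
    return categorize(histogram_entropy(color_histogram(pixels)))
```

A single-color image has zero entropy, while an image split evenly among four colors has an entropy of two bits.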

slide-31
SLIDE 31

Action Projector

slide-32
SLIDE 32
slide-33
SLIDE 33
slide-34
SLIDE 34

Schemas for Actions

slide-35
SLIDE 35

Schemas for Objects

slide-36
SLIDE 36

Spatial Relations

slide-37
SLIDE 37

Deb Roy’s Definition of Grounding

  • “I define grounding as a causal-predictive cycle by which an agent maintains beliefs about its world.” (p. 8)
  • “An agent’s basic grounding cycle cannot require mediation by another agent.” (p. 9)
  • “An autonomous robot simply cannot afford to have a human in the loop interpreting sensory data on its behalf.” (p. 9)

slide-38
SLIDE 38
  • “Cyclic interactions between robots and their environment, when well designed, enable a robot to learn, verify, and use world knowledge to pursue goals. I believe we should extend this design philosophy to the domain of language and intentional communication.” (p. 5)

slide-39
SLIDE 39
  • “causality alone is not a sufficient basis for grounding beliefs. Grounding also requires prediction of the future with respect to the agent’s own actions.” (p. 10)
  • “The problem with ignoring the predictive part of the grounding cycle has sometimes been called the ‘homunculus problem’.”

slide-40
SLIDE 40
slide-41
SLIDE 41

Take Home Message

Language should be grounded in terms of the robot’s own perceptual and sensorimotor capabilities

slide-42
SLIDE 42

Thomason, J., Sinapov, J., Svetlik, M., Stone, P., and Mooney, R. (2016). Learning Multi-Modal Grounded Linguistic Semantics by Playing I, Spy. In Proceedings of the 2016 International Joint Conference on Artificial Intelligence (IJCAI).

slide-43
SLIDE 43

Motivation: Grounded Language Learning

Robot, fetch me the green empty bottle

slide-44
SLIDE 44

Exploratory Behaviors in our Robot

slide-45
SLIDE 45

Video

slide-46
SLIDE 46

Video

slide-47
SLIDE 47

Video

slide-48
SLIDE 48

Sensorimotor Feature Extraction

[Figure: joint efforts (haptics) plotted over time]

slide-49
SLIDE 49

Sensorimotor Contexts

Behaviors: grasp, lift, hold, lower, drop, push, press, look
Modalities: proprioception, haptics, audio, shape, color, VGG

slide-50
SLIDE 50

Sensorimotor Contexts

Behaviors: grasp, lift, hold, lower, drop, push, press, look
Modalities: proprioception, haptics, audio, shape, color, VGG

slide-51
SLIDE 51

Feature Extraction: Color

Object segmentation → color histogram (4 x 4 x 4 = 64 bins)
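
A 4 x 4 x 4 color histogram can be computed by quantizing each color channel into 4 ranges and counting pixels per combination. The sketch below assumes 8-bit RGB pixels that have already been segmented from the background; the color space and normalization actually used in the experiments are not stated on the slide.

```python
def rgb_histogram(pixels, bins_per_channel=4):
    """Normalized color histogram with bins_per_channel**3 bins.

    pixels: iterable of (r, g, b) tuples with values in 0..255 (assumed),
    taken only from the segmented object region.
    """
    pixels = list(pixels)
    bin_width = 256 // bins_per_channel  # 64 intensity levels per bin by default
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Flatten the three per-channel bin indices into one index.
        index = ((r // bin_width) * bins_per_channel
                 + (g // bin_width)) * bins_per_channel + (b // bin_width)
        hist[index] += 1.0
    total = len(pixels)
    return [count / total for count in hist] if total else hist
```

With the defaults, the result is a 64-dimensional feature vector whose entries sum to 1 for any non-empty object region.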

slide-52
SLIDE 52

Feature Extraction: Shape

3D object point cloud → histogram of shape features

slide-53
SLIDE 53

Feature Extraction: Haptics

Joint-torque values for all joints → joint-torque features
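
One simple way to turn a variable-length recording of joint torques into a fixed-length feature vector is to average each joint's signal over a fixed number of temporal bins. The sketch below illustrates that idea; the binning scheme and the default of 10 bins are assumptions made for this example and are not taken from the slide.

```python
def temporal_bin_features(torque_series, num_bins=10):
    """Average each joint's torque over `num_bins` equal time windows.

    torque_series: non-empty list of time steps, each a list of per-joint torques.
    Returns a flat list of num_joints * num_bins values, grouped by joint.
    """
    num_steps = len(torque_series)
    num_joints = len(torque_series[0])
    features = []
    for joint in range(num_joints):
        for b in range(num_bins):
            start = b * num_steps // num_bins
            end = (b + 1) * num_steps // num_bins
            if end <= start:  # fewer time steps than bins: reuse one sample
                end = start + 1
            window = [torque_series[t][joint] for t in range(start, end)]
            features.append(sum(window) / len(window))
    return features
```

Because the output length depends only on the number of joints and bins, recordings of different durations become directly comparable.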

slide-54
SLIDE 54

Feature Extraction: Audio

Audio spectrogram → spectro-temporal features

slide-55
SLIDE 55

Feature Extraction: VGG

slide-56
SLIDE 56

Feature Extraction: VGG

slide-57
SLIDE 57

Data from a single exploratory trial

Behaviors: grasp, lift, hold, lower, drop, push, press, look
Modalities: proprioception, haptics, audio, shape, color, VGG

x 5 per object

slide-58
SLIDE 58

Category Recognition Overview

Interaction with object → sensorimotor feature extraction → category recognition models → category estimates (Empty? Red? Container? . . .)

Sinapov, J., Schenck, C., and Stoytchev, A. (2014). Learning Relational Object Categories Using Behavioral Exploration and Multimodal Perception. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA).
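
With one recognition model per sensorimotor context (a behavior paired with a modality), the per-context outputs have to be merged into a single category estimate. The sketch below shows a reliability-weighted vote as one plausible way to do this; the weighting scheme, the context names, and the numbers are illustrative assumptions, not a description of the cited system.

```python
def combine_context_estimates(estimates, weights):
    """Reliability-weighted vote over per-context decisions.

    estimates: dict mapping context -> decision in [-1, 1]
               (+1 means the category applies, -1 means it does not).
    weights:   dict mapping context -> non-negative reliability,
               for example an accuracy estimated by cross-validation.
    Returns a combined score in [-1, 1]; a positive score means "applies".
    """
    total_weight = sum(weights[context] for context in estimates)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(weights[context] * estimates[context]
                       for context in estimates)
    return weighted_sum / total_weight


# Hypothetical example for the category "red".
ESTIMATES = {("look", "color"): 1.0, ("lift", "haptics"): -1.0, ("drop", "audio"): -1.0}
WEIGHTS = {("look", "color"): 0.8, ("lift", "haptics"): 0.1, ("drop", "audio"): 0.1}
```

In this example the single reliable visual context outvotes two unreliable non-visual ones, which is the intended behavior for a color word.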

slide-59
SLIDE 59

Key Questions

  • How can the robot learn object-related words from everyday human users?
  • Do human users use non-visual object descriptors when referring to objects?

slide-60
SLIDE 60

Object Exploration Dataset

  • 32 common household and office items
  • Each object was explored a total of 5 times with 7 different behaviors
  • The robot perceived objects using the visual, auditory, and haptic sensory modalities

Thomason, J., Sinapov, J., Svetlik, M., Stone, P., and Mooney, R. (2016). Learning Multi-Modal Grounded Linguistic Semantics by Playing I, Spy. In Proceedings of the 2016 International Joint Conference on Artificial Intelligence (IJCAI).

slide-61
SLIDE 61

Our attempt: I-Spy game

slide-62
SLIDE 62

Learning Words via Game-play

Human: “an empty metallic aluminum container”
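
When the human describes an object, the robot has to decide which object to guess. A minimal sketch of one way to do that: sum the per-word classifier scores for every candidate object and guess in descending order. The scoring rule, the data layout, and the example numbers are assumptions for illustration; splitting the description into words stands in for the semantic parsing step.

```python
def rank_objects(description_words, word_scores):
    """Rank candidate objects by summed per-word scores, best first.

    description_words: list of words from the human's description.
    word_scores: dict mapping word -> {object_id: score in [-1, 1]}.
    Words the robot has no classifier for are skipped.
    """
    known_words = [w for w in description_words if w in word_scores]
    objects = set()
    for word in known_words:
        objects.update(word_scores[word])
    totals = {
        obj: sum(word_scores[word].get(obj, 0.0) for word in known_words)
        for obj in objects
    }
    return sorted(totals, key=lambda obj: totals[obj], reverse=True)


# Hypothetical classifier outputs for two objects.
WORD_SCORES = {
    "empty": {"can": 0.9, "bottle": -0.5},
    "metallic": {"can": 0.8, "bottle": -0.7},
}
```

Here "aluminum" would be an unknown word, so it contributes nothing to the ranking until the robot has collected labeled examples for it.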

slide-63
SLIDE 63

Semantic Parsing

slide-64
SLIDE 64

Example Words for an Object

slide-65
SLIDE 65

Learning Words via Game-play

slide-66
SLIDE 66

Learning Words via Game-play

Human: “a tall blue cylindrical container”

slide-67
SLIDE 67

Learning Words via Game-play

Robot: “open half-full container”

slide-68
SLIDE 68

Asking Verification Questions

slide-69
SLIDE 69

Results

slide-70
SLIDE 70

[Table: F-measure improvement per word as a result of adding non-visual modalities. Words shown: “can”, “tall”, “half-full”, “pink”, . . . Values shown: 0.857, 0.516, 0.463, . . .]
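
For reference, the F-measure combines precision and recall into a single score. The sketch below computes it from counts of true positives, false positives, and false negatives; the slide does not say which variant was used, so the balanced F1 (beta = 1) is assumed as the default.

```python
def f_measure(true_pos, false_pos, false_neg, beta=1.0):
    """F-measure from raw counts; beta=1 gives the balanced F1 score."""
    if true_pos == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```

An improvement for a word is then the difference between its F-measure with all modalities and its F-measure with vision only.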

slide-71
SLIDE 71

Summary of Experiment

  • The robot learned over 80 words through interactive game play
  • The robot's word representations were grounded in multiple behaviors and sensory modalities
  • Future Work:
    – Active action selection when classifying a new object
    – Active action selection when learning a new word
    – Actively seek humans out for help with learning about objects
slide-72
SLIDE 72

“Opportunistic” Active Learning

Thomason, J., Padmakumar, A., Sinapov, J., Hart, J., Stone, P., and Mooney, R. (2017). Opportunistic Active Learning for Grounding Natural Language Descriptions. In Proceedings of the 1st Annual Conference on Robot Learning (CoRL 2017).

slide-73
SLIDE 73

“Opportunistic” Active Learning

Thomason, J., Padmakumar, A., Sinapov, J., Hart, J., Stone, P., and Mooney, R. (2017). Opportunistic Active Learning for Grounding Natural Language Descriptions. In Proceedings of the 1st Annual Conference on Robot Learning (CoRL 2017).

slide-74
SLIDE 74

What actions should the robot perform when learning a new word?

  • Baseline: perform all actions on a set of labeled objects and estimate which ones work well
  • But can we do better?
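
The baseline can be sketched as follows: for each behavior, estimate how well a classifier trained on that behavior's features predicts the word's labels, then prefer the behaviors with the highest estimates. The leave-one-out 1-nearest-neighbor classifier and the toy data are assumptions chosen to keep the example short.

```python
def leave_one_out_accuracy(examples):
    """1-nearest-neighbor leave-one-out accuracy.

    examples: list of (feature_vector, label) pairs, at least two of them.
    """
    correct = 0
    for i, (features, label) in enumerate(examples):
        others = [ex for j, ex in enumerate(examples) if j != i]
        nearest = min(
            others,
            key=lambda ex: sum((a - b) ** 2 for a, b in zip(features, ex[0])),
        )
        correct += int(nearest[1] == label)
    return correct / len(examples)


def score_behaviors(data):
    """data: dict mapping behavior -> labeled examples recorded with it."""
    return {behavior: leave_one_out_accuracy(examples)
            for behavior, examples in data.items()}


# Made-up one-dimensional features for the word "heavy".
HEAVY_DATA = {
    "lift": [([0.9], True), ([0.8], True), ([0.1], False), ([0.2], False)],
    "look": [([0.5], True), ([0.1], True), ([0.45], False), ([0.15], False)],
}
```

The cost of this baseline is that every behavior must be performed on every labeled object before any estimate is available, which motivates the question of doing better.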
slide-75
SLIDE 75

Behavior Scores for Words

slide-76
SLIDE 76

Word Embeddings

Thomason, J., Sinapov, J., Stone, P., and Mooney, R. (2018). Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions. To appear in Proceedings of the 32nd Conference of the Association for the Advancement of Artificial Intelligence (AAAI).

slide-77
SLIDE 77

Word Embeddings

Thomason, J., Sinapov, J., Stone, P., and Mooney, R. (2018). Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions. To appear in Proceedings of the 32nd Conference of the Association for the Advancement of Artificial Intelligence (AAAI).

slide-78
SLIDE 78

Word Embeddings

Thomason, J., Sinapov, J., Stone, P., and Mooney, R. (2018). Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions. To appear in Proceedings of the 32nd Conference of the Association for the Advancement of Artificial Intelligence (AAAI).
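
One way behavior scores and word embeddings could fit together: for a new word with no labeled objects, borrow behavior scores from known words, weighted by embedding similarity. The sketch below is an illustrative assumption about how such guidance might work, not the method of the cited paper; the vectors and scores are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))


def predict_behavior_scores(new_vec, known_vecs, known_scores):
    """Similarity-weighted average of known words' behavior scores.

    new_vec:      embedding of the new word.
    known_vecs:   dict mapping word -> embedding.
    known_scores: dict mapping word -> {behavior: score}.
    Negative similarities are clipped to zero so dissimilar words are ignored.
    """
    totals = {}
    weight_sum = 0.0
    for word, vec in known_vecs.items():
        weight = max(0.0, cosine(new_vec, vec))
        weight_sum += weight
        for behavior, score in known_scores[word].items():
            totals[behavior] = totals.get(behavior, 0.0) + weight * score
    if weight_sum == 0:
        return {}
    return {behavior: total / weight_sum for behavior, total in totals.items()}


# Hypothetical two-dimensional embeddings and behavior scores.
KNOWN_VECS = {"heavy": [1.0, 0.0], "red": [0.0, 1.0]}
KNOWN_SCORES = {"heavy": {"lift": 0.9, "look": 0.1},
                "red": {"lift": 0.1, "look": 0.9}}
```

A new word whose embedding is close to "heavy" would inherit a high score for lifting, so the robot could try that behavior first instead of performing all of them.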

slide-79
SLIDE 79

Results

slide-80
SLIDE 80

Results

slide-81
SLIDE 81

Discussion

  • What are some of the limitations of these approaches?
  • When will they fail?
slide-82
SLIDE 82

Project Breakout

slide-83
SLIDE 83
slide-84
SLIDE 84