
SLIDE 1

Teaching a robot to interpret natural language navigation instructions

Ryan Eloff

Supervisor: Dr. H. Kamper Department of Electrical & Electronic Engineering Maties Machine Learning Stellenbosch University

23 February 2018

UNIVERSITEIT • STELLENBOSCH • UNIVERSITY (jou kennisvennoot / your knowledge partner)
SLIDE 2

Natural language for human-robot interaction

The Gambit platform, which learns to identify and pick up objects from a user's speech and gestures.1

1 C. Matuszek, L. Bo, L. Zettlemoyer, et al., “Learning from unscripted deictic gesture and language for human-robot interactions,” in Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.

SLIDE 3

Natural language for human-robot interaction

A robotic wheelchair that learns about the environment from a narrated tour.2

2 M. R. Walter, S. Hemachandra, B. Homberg, et al., “A framework for learning semantic maps from grounded natural language descriptions,” The International Journal of Robotics Research, 2014.

SLIDE 4

Natural language for human-robot interaction

◮ We still face challenges in building robots that can understand our written natural language instructions.
◮ Most existing systems are available only in the most commonly used languages: English, German, Spanish, ...
◮ These technologies are not available for low-resource languages.
◮ This is a problem in South Africa!

SLIDE 7

Natural language instructions for route direction

[Map of the virtual environment, with labelled landmarks including a sofa and a hatrack.]

◮ “Walk forward once”
◮ “Move to the sofa”
◮ “Go down till the chair”
◮ “Go forward one segment to the intersection with a bare concrete floor”

Can we build a language-independent system for interpreting written natural language navigation instructions that is also less dependent on data?

SLIDE 10

Evaluation, virtual environments, & web application

SLIDE 11

Example instruction data

◮ Example 1: “Walk two steps forward”

SLIDE 12

Example instruction data

◮ Example 2: “Move forward and turn right”

SLIDE 13

Example instruction data

◮ Example 3: “Move to the easel”

SLIDE 14

Unseen instruction problem

◮ Unseen: “Walk three steps to the easel”

SLIDE 15

General model

[Diagram: during training, each instruction x(i) is mapped to a sentence embedding and, together with the world state w(i) and the observed action sequence a(i), is fed to a navigation plan constructor (a “black box” learning algorithm) that produces a navigation plan p(i). At test time, an unseen instruction x(j) is embedded and, with the world state w(j), passed to a navigation plan executor that outputs an action sequence a(j).]

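The train/test flow in the general-model diagram can be sketched as a toy pipeline. Everything here is a stand-in for illustration: the bag-of-words embedding, the memorising "plan constructor", and the lookup-style "executor" are not the components used in the project.

```python
from collections import Counter
import math

def embed(sentence):
    """Bag-of-words sentence embedding (stand-in for the real embeddings)."""
    return Counter(sentence.lower().split())

def distance(a, b):
    """Euclidean distance between two bag-of-words vectors."""
    words = set(a) | set(b)
    return math.sqrt(sum((a[w] - b[w]) ** 2 for w in words))

class NavigationModel:
    def train(self, examples):
        # "Navigation plan constructor": memorise (embedding, plan) pairs.
        self.memory = [(embed(x), plan) for x, plan in examples]

    def predict(self, instruction):
        # "Navigation plan executor": run the plan of the nearest instruction.
        q = embed(instruction)
        _, plan = min(self.memory, key=lambda m: distance(q, m[0]))
        return plan

model = NavigationModel()
model.train([("walk two steps forward", ["FORWARD", "FORWARD"]),
             ("turn right", ["RIGHT"])])
print(model.predict("walk forward"))  # nearest stored instruction wins
```

The point of the sketch is the interface: training consumes (instruction, world state, actions) observations, while testing maps an unseen instruction to an executable action sequence.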

SLIDE 22

Proposed models

k-nearest neighbours (kNN) models:

[Diagram: query instructions q1, q2, q3 plotted in a feature space (f1, f2) among training points with class labels Turn(LEFT) and Turn(RIGHT); the nearest neighbours under the chosen distance metric determine the predicted class.]

Proposed distance metrics:
◮ Levenshtein distance
◮ Euclidean distance

Basic one-vs-rest (OVR) logistic regression model:

[Diagram: the encoded instruction x(i) and world state w(i)_{n−1} form the input features; OVR parameter vectors θ(0), ..., θ(m) produce output class probabilities y0, y1, ..., ym, ..., yk, and the index k of the maximum probability gives the output plan label p(i).]
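The kNN model with Levenshtein distance on this slide can be sketched directly over raw instruction strings. The value of k and the toy training set below are illustrative only, not the project's data.

```python
def levenshtein(s, t):
    """Edit distance between strings s and t via dynamic programming."""
    n = len(t)
    prev = list(range(n + 1))
    for i in range(1, len(s) + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution / match
        prev = curr
    return prev[n]

def knn_predict(query, training, k=3):
    """Majority vote among the k training instructions closest to the query."""
    nearest = sorted(training, key=lambda ex: levenshtein(query, ex[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

training = [("turn left", "TURN_LEFT"), ("go left", "TURN_LEFT"),
            ("turn right", "TURN_RIGHT"), ("go right", "TURN_RIGHT"),
            ("rotate left", "TURN_LEFT")]
print(knn_predict("take a left", training, k=3))
```

Swapping in Euclidean distance simply means replacing `levenshtein` with a vector distance computed over sentence embeddings of the instructions.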

SLIDE 24

Results: Overall model performance

◮ Overall performance of the proposed models and that of previous work:

  Model                          F1     Accuracy
  kNN with Levenshtein distance  77.10  55.25
  kNN with Euclidean distance    75.44  51.65
  Basic logistic regression      78.18  57.32
  Stepwise logistic regression   72.58  44.36
  Chen and Mooney (2011)         68.37  54.40
  Chen (2012)                    69.43  57.28
  Mei et al. (2016)              n/a    69.98

SLIDE 25

Results: Performance on small datasets

◮ Average strict accuracy of the proposed models when trained on different sample sizes n:

[Plot: accuracy (%) versus number of training samples n (64 to 640) for kNN with Euclidean distance, kNN with Levenshtein distance, basic logistic regression, and stepwise logistic regression.]

SLIDE 26

Results: Performance on an Afrikaans corpus

◮ Proposed models evaluated on an Afrikaans translation of the benchmark English corpus*:

[Bar charts: F1-score (%) and strict accuracy (%) on the English versus Afrikaans corpora for kNN with Levenshtein distance, kNN with Euclidean distance, basic logistic regression, and stepwise logistic regression. Performance drops only slightly on the Afrikaans corpus in each case, e.g. F1 falls from 77.1 to 75.05 for kNN with Levenshtein distance, and accuracy from 57.32 to 54.9 for basic logistic regression.]

* English corpus translated to Afrikaans using Google’s state-of-the-art Neural Machine Translation system.

SLIDE 27

Contributions

◮ Simple machine learning models applied to the novel purpose of learning from limited datasets, shown to be competitive with the complex systems of previous work.
◮ A web application with data collection capabilities, ideal for future research on human-robot interaction.
◮ These models are a first step and a foundation for future research on more complex models for limited data (and speech!).
◮ The first Afrikaans system for robotic instruction following. The system may be trained in any language, so it is ideal for low-resource languages.

SLIDE 31

My current work

◮ One-shot learning on small multimodal datasets consisting of images paired with spoken captions.
◮ One-shot learning example:3

[Figure: “This is a ‘dax’.” with an example image, followed by three candidate images (1), (2), (3) and the question “Which of these is a ‘dax’?”]

◮ Goal: develop systems that learn and think like people.

3 R. Feinman and B. M. Lake, “Learning inductive biases with simple neural networks,” arXiv:1802.02745, 2018.


SLIDE 34

Some other slides ...

SLIDE 35

Word embeddings with latent semantic analysis

Example instructions:
◮ d1: Move forward two steps
◮ d2: Travel forward
◮ d3: Travel two steps
◮ d4: Turn left
◮ d5: Turn right and move to the chair

[Plot: 2-D word embeddings (move, forward, two, steps, travel, turn, left, right, chair) and sentence embeddings for d1–d5, obtained with latent semantic analysis.]

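Latent semantic analysis on these five example instructions can be sketched with a truncated SVD of the word-document count matrix. This is a simplification for illustration (raw counts, no TF-IDF weighting, rank fixed at 2, and no stop-word removal, so the vocabulary is slightly larger than the nine words shown on the slide).

```python
import numpy as np

docs = ["move forward two steps", "travel forward", "travel two steps",
        "turn left", "turn right and move to the chair"]
vocab = sorted({w for d in docs for w in d.split()})

# Count matrix C: rows = words, columns = documents (instructions).
C = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        C[vocab.index(w), j] += 1

# Truncated SVD: C ~ U_k S_k V_k^T with k = 2 latent dimensions.
U, S, Vt = np.linalg.svd(C, full_matrices=False)
k = 2
word_emb = U[:, :k] * S[:k]      # one 2-D embedding per word
sent_emb = Vt[:k, :].T * S[:k]   # one 2-D embedding per instruction

print(word_emb.shape, sent_emb.shape)
```

Words that co-occur in the same instructions (e.g. "two" and "steps") end up close together in the latent space, which is what the scatter plot on the slide visualises.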

SLIDE 37

Stepwise logistic regression

Stepwise one-vs-rest (OVR) logistic regression model:

[Diagram: at each step n, the input features are the encoded instruction x(i), the world state w(i)_{n−1}, and the previous plan step s(i)_{n−1}. The OVR parameter vectors θ(0), ..., θ(m) yield output class probabilities y0, y1, ..., ym, ..., yk, and the index k of the maximum gives the next plan step s(i)_n. Starting from s(i)_0 = Start action, the predicted step is executed, the world state is updated after each prediction, and prediction repeats until the Stop action, giving the output sequence p(i) = {s(i)_1, ..., s(i)_N}.]
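The stepwise decoding loop in the diagram can be sketched as follows. The classifier and world model here are hypothetical lookup tables standing in for the trained OVR logistic regression and the simulated environment; only the loop structure mirrors the slide.

```python
START, STOP = "START", "STOP"

def predict_step(instruction, world_state, prev_step):
    """Stand-in for the OVR logistic regression classifier."""
    table = {(START, "at_door"): "FORWARD",
             ("FORWARD", "at_sofa"): "TURN_LEFT",
             ("TURN_LEFT", "at_sofa"): STOP}
    return table.get((prev_step, world_state), STOP)

def execute(world_state, step):
    """Stand-in world model: executing a step updates the observed state."""
    transitions = {("at_door", "FORWARD"): "at_sofa"}
    return transitions.get((world_state, step), world_state)

def stepwise_decode(instruction, world_state, max_steps=20):
    """Predict, execute, and repeat until the Stop action is predicted."""
    plan, prev = [], START
    for _ in range(max_steps):  # guard against non-termination
        step = predict_step(instruction, world_state, prev)
        if step == STOP:
            break
        plan.append(step)
        world_state = execute(world_state, step)
        prev = step
    return plan

print(stepwise_decode("move to the sofa and turn left", "at_door"))
```

Conditioning each prediction on the previous step and the updated world state is what distinguishes this stepwise model from the basic OVR model, which predicts the whole plan label in one shot.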