The Extended SPaRKy Restaurant Corpus: designing a corpus with variable information density



SLIDE 1

The Extended SPaRKy Restaurant Corpus

designing a corpus with variable information density

David M. Howcroft, Dietrich Klakow, Vera Demberg

Department of Language Science and Technology, Saarland Informatics Campus, Saarland University, Germany

Interspeech 2017

@_dmh

Howcroft, Klakow, Demberg (UdS) Extended SPaRKy Restaurant Corpus Interspeech 2017 1 / 11

SLIDE 5

Spoken Dialogue Systems

User → Automatic Speech Recognition → Natural Language Understanding → Dialogue Management → Natural Language Generation → Speech Synthesis → User

name          price   cuisine
Due Fratelli  $$      Italian
Andalucia     $$$     Spanish, Seafood

Due Fratelli serves average-priced Italian food, while Andalucia is a Spanish, seafood restaurant with moderately high prices.

SLIDE 9

Spoken Dialogue Systems

User → Automatic Speech Recognition → Natural Language Understanding → Dialogue Management → Natural Language Generation → Speech Synthesis → User

Adapting linguistic complexity (specifically, information density)

Due Fratelli serves average-priced Italian food, while Andalucia is a Spanish, seafood restaurant with moderately high prices.

Due Fratelli is an Italian restaurant. Its price is average. On the other hand, Andalucia is somewhat expensive. They serve Spanish, seafood there.

...

SLIDE 18

Traditional & end-to-end approaches to NLG

name          price   cuisine
Due Fratelli  $$      Italian
Andalucia     $$$     Spanish, Seafood

Traditionally: we start writing rules...

assert_cuisine(NAME, CUISINE)
  → “NAME is a CUISINE restaurant”
  → “NAME serves CUISINE food”

Machine learning on meaning representations paired with output texts

◮ Semantic Parsing (Zettlemoyer & Collins 2005, inter alia)
◮ End-to-end Generation (Mairesse et al. 2010, Angeli et al. 2010, Konstas & Lapata 2013, Wen et al. 2015, i.a.)

Either way, we need a corpus with meaning representations!
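The rule-writing approach on this slide can be sketched in a few lines of Python. This is only an illustration, not the system behind the corpus: the `TEMPLATES` table, the `realize` function, and the `indefinite_article` heuristic are hypothetical names introduced here. Only the predicate `assert_cuisine` and its two surface forms come from the slide.

```python
# Hypothetical rule table: one predicate, several surface templates.
TEMPLATES = {
    "assert_cuisine": [
        "{name} is {article} {cuisine} restaurant",
        "{name} serves {cuisine} food",
    ],
}


def indefinite_article(word):
    """Crude heuristic (an assumption): 'an' before a written vowel, else 'a'."""
    return "an" if word[:1].lower() in "aeiou" else "a"


def realize(predicate, variant=0, **slots):
    """Fill the selected template for a predicate with the given slot values."""
    templates = TEMPLATES[predicate]
    template = templates[variant % len(templates)]
    # str.format ignores unused keyword arguments, so 'article' is harmless
    # for templates that do not mention it.
    return template.format(article=indefinite_article(slots["cuisine"]), **slots)


if __name__ == "__main__":
    for variant in (0, 1):
        print(realize("assert_cuisine", variant, name="Due Fratelli", cuisine="Italian"))
```

Even this toy version needs an extra rule for "a" versus "an", which hints at how quickly hand-written rules grow.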

SLIDE 24

Discourse-level meaning representations

Due Fratelli is an Italian restaurant, while Andalucia is a Spanish seafood restaurant. However, Due Fratelli’s price is average, while Andalucia’s price is more expensive.

◮ The SPaRKy Restaurant Corpus (Walker et al. 2007)
  ◮ 1800 texts from an NLG system
  ◮ discourse semantics, but limited variation
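One way to picture a discourse-level meaning representation for the example text is as a small tree: relation nodes over assertion leaves. The sketch below is an assumption for illustration only. The class names, the `contrast` label, and the `assert_price` predicate are hypothetical and are not taken from the SPaRKy annotation scheme.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Assertion:
    """Leaf node: one fact about one restaurant."""
    predicate: str
    entity: str
    value: str


@dataclass
class Relation:
    """Internal node: a discourse relation holding over its children."""
    name: str
    children: List[Union["Relation", Assertion]]


def leaves(node):
    """Return all assertions under a node, left to right."""
    if isinstance(node, Assertion):
        return [node]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def depth(node):
    """Number of relation levels above the deepest assertion."""
    if isinstance(node, Assertion):
        return 0
    return 1 + max(depth(child) for child in node.children)


# Hypothetical plan for the two-restaurant example text.
plan = Relation("contrast", [
    Relation("contrast", [
        Assertion("assert_cuisine", "Due Fratelli", "Italian"),
        Assertion("assert_cuisine", "Andalucia", "Spanish, Seafood"),
    ]),
    Relation("contrast", [
        Assertion("assert_price", "Due Fratelli", "$$"),
        Assertion("assert_price", "Andalucia", "$$$"),
    ]),
])

if __name__ == "__main__":
    print(len(leaves(plan)), "assertions at depth", depth(plan))
```

A flat dialogue act would keep only the leaves. The tree also records which facts are being set against each other.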

SLIDE 29

Crowdsourced Corpora

BAGEL Corpus (Mairesse et al. 2010)

◮ 404 utterances for 202 dialogue acts
◮ e.g. inform(name=DueFratelli;price=$$;cuisine=Italian)

SFX-restaurants (Wen et al. 2015)

◮ 5k utterances+DAs

Novikova et al. 2016

◮ 1243 utterances+DAs
◮ increased variation (image-based elicitation)

E2E Challenge Dataset (Novikova et al. 2017)

◮ 50k utterances+DAs
◮ same (image-based) elicitation
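A flat dialogue act such as the `inform(...)` example above is easy to turn into a dictionary. The following is a minimal sketch, assuming the semicolon-separated `key=value` syntax shown on the slide. The function name `parse_dialogue_act` is hypothetical, and the actual file formats of these corpora may differ.

```python
import re


def parse_dialogue_act(da):
    """Split 'type(key=value;key=value)' into (type, {key: value})."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", da)
    if match is None:
        raise ValueError(f"not a dialogue act: {da!r}")
    act_type, body = match.groups()
    slots = {}
    # Assumption: slot values never contain ';' or '='.
    for pair in filter(None, (part.strip() for part in body.split(";"))):
        key, _, value = pair.partition("=")
        slots[key.strip()] = value.strip()
    return act_type, slots


if __name__ == "__main__":
    print(parse_dialogue_act("inform(name=DueFratelli;price=$$;cuisine=Italian)"))
```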

SLIDE 36

Building the corpus

Objective: the best of both worlds

1. discourse-level semantic representation
2. with a good amount of variation
  ◮ esp. with respect to information density

Method: collect paraphrases

◮ already have discourse-level semantics
◮ more variation than in the original SPaRKy corpus

2 conditions: default vs. elderly audience

We are adding variety to an existing dialogue system and we need your help! In this task, you will be given a text about one or more restaurants written by our existing system. Your job is to express the same facts, describing the restaurant(s) as you would describe them to your...

default: ...friends or family.
elderly: ...85-year-old grandmother.

SLIDE 39

Corpus Statistics

◮ about 5k texts, with discourse-level semantic annotations
◮ significantly lower information density in the elderly condition

[Figure: "Average surprisal across texts". Histogram of frequency by avg. surprisal (in bits; 30 bins) for the default and elderly conditions. Panel title: "Subjects use lower-surprisal sentences addressing grandma".]
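Surprisal of a word is its negative log probability in bits, -log2 P(w | context), and a text's average surprisal is the mean over its words. The slide does not say which language model produced the figures. The sketch below therefore uses an add-one-smoothed unigram model purely as a stand-in, and the function names are hypothetical.

```python
import math
from collections import Counter


def train_unigram(texts):
    """Count lowercased, whitespace-separated tokens over the training texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return counts, sum(counts.values())


def average_surprisal(text, counts, total):
    """Mean of -log2 P(w) over the words of a text, with add-one smoothing."""
    words = text.lower().split()
    if not words:
        raise ValueError("cannot score an empty text")
    vocab_size = len(counts) + 1  # one extra type reserved for unseen words
    bits = [-math.log2((counts[w] + 1) / (total + vocab_size)) for w in words]
    return sum(bits) / len(bits)


if __name__ == "__main__":
    counts, total = train_unigram(["its price is average", "the price is high"])
    print(average_surprisal("the price is average", counts, total))
```

A real replication would swap in a context-sensitive language model. Only the averaging step would carry over unchanged.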

SLIDE 42

Examples (1)

One Italian restaurant is called Caffe Buon Gusto. However, John’s Pizzeria is an Italian pizza restaurant. Choose Caffe Buon Gusto if you desire a traditional Italian restaurant. Otherwise, try out John’s Pizzeria.

cf. Caffe Buon Gusto is an Italian restaurant. John’s Pizzeria, on the other hand, is an Italian, Pizza restaurant.

SLIDE 44

Examples (2)

Chez Joesphine is the best choice because of food quality, service and decor. Hands down, Chez Josephine has the best quality food out of all of these restaurants. Employees are always happy to help you and the atmosphere is fantastic.

cf. Chez Josephine has the best overall quality among the selected restaurants. It has very good service, with very good decor. It has very good food quality.

SLIDE 53

Summary

We built a corpus which includes:

◮ variation with respect to information density, and
◮ a hierarchical semantic annotation.

Next step:

◮ Learning NLG rules from this corpus

This work was supported by the DFG through SFB 1102 “Information Density and Linguistic Encoding”.

Corpus release coming in September! Watch http://bit.ly/howcroft_interspeech_2017

Thank you!

SLIDE 54

Did we achieve greater lexical variety?

corpus            # texts   Vocabulary
BAGEL             404       74
SFX-restaurant    5192      353
Novikova et al.   1243      238
Original SRC      1760      99
Extended SRC      5356      577

Table: Vocabulary diversity and corpus size
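The vocabulary column counts distinct word types. A rough version of that count is sketched below. The tokenization (lowercasing, keeping runs of letters, digits, and apostrophes) is an assumption, so numbers from this sketch will not necessarily match the table, and `vocabulary_size` is a hypothetical helper name.

```python
import re


def vocabulary_size(texts):
    """Number of distinct lowercased word types across a list of texts."""
    types = set()
    for text in texts:
        types.update(re.findall(r"[a-z0-9']+", text.lower()))
    return len(types)


if __name__ == "__main__":
    sample = [
        "Due Fratelli is an Italian restaurant.",
        "Andalucia is a Spanish restaurant.",
    ]
    print(vocabulary_size(sample))
```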