A Context-aware Natural Language Generation Dataset for Dialogue Systems

Ondřej Dušek and Filip Jurčíček

Institute of Formal and Applied Linguistics, Charles University in Prague

May 28, 2016, LREC RE-WOCHAT workshop

1/ 13 Ondřej Dušek & Filip Jurčíček A Context-aware NLG Dataset for Dialogue Systems

Introduction

  • A new NLG dataset for dialogue systems
  • English public transport domain
  • “Ordinary” NLG dataset (in our setting):
      • input DA (dialogue act, i.e. the meaning) + natural language sentence(s)
  • Our set:
      • input DA + natural language sentences + preceding context
      • If the generator knows how the user asked, it should be able to produce a more natural response

Example of an “ordinary” NLG instance (input DA, then response):

inform(from_stop="Fulton Street", vehicle=bus, direction="Rector Street", departure_time=9:13pm, line=M21)
Go by the 9:13pm bus on the M21 line from Fulton Street directly to Rector Street

Example from our set (preceding context, input DA, then context-aware response):

I'm headed to Rector Street
inform(from_stop="Fulton Street", vehicle=bus, direction="Rector Street", departure_time=9:13pm, line=M21)
Heading to Rector Street from Fulton Street, take a bus line M21 at 9:13pm.
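The instances shown above could be held in code roughly as follows. This is only a minimal Python sketch, not the dataset's actual schema: the `Instance` class, its field names and the `parse_da` helper are hypothetical, and the parser assumes nothing beyond the `type(slot=value, ...)` notation used on the slides.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Instance:
    """One data item (hypothetical layout): context, input DA, response."""
    context: str   # preceding user utterance
    da: str        # input dialogue act in the slide notation
    response: str  # natural language response


def parse_da(da: str) -> tuple[str, dict[str, str | None]]:
    """Split 'type(slot=value, ...)' into the DA type and a slot -> value dict.

    Slots without a value, as in 'request(to_stop)', map to None.
    """
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", da)
    if match is None:
        raise ValueError(f"not a dialogue act: {da!r}")
    da_type, body = match.groups()
    slots: dict[str, str | None] = {}
    # A value is either a double-quoted string or a run of non-comma characters.
    for item in re.finditer(r'(\w+)(?:=(?:"([^"]*)"|([^,]*)))?', body):
        name, quoted, bare = item.groups()
        if quoted is not None:
            slots[name] = quoted
        elif bare is not None:
            slots[name] = bare.strip()
        else:
            slots[name] = None
    return da_type, slots


example = Instance(
    context="I'm headed to Rector Street",
    da='inform(from_stop="Fulton Street", vehicle=bus, '
       'direction="Rector Street", departure_time=9:13pm, line=M21)',
    response="Heading to Rector Street from Fulton Street, "
             "take a bus line M21 at 9:13pm.",
)
print(parse_da(example.da))
```

Keeping the DA as a string and parsing it on demand keeps the sketch close to the notation on the slides; a real loader would follow whatever structure the released files use.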

Outline of this talk

  1. Why should we look at preceding context: entrainment
  2. How to obtain natural-looking contextual data
      • collecting our set
  3. A summary of the collected set

Motivation

Why context? Entrainment

Entrainment/alignment/adaptation in dialogue

  • “Mutual linguistic convergence”
  • Speakers are primed (influenced) by what was previously said
  • Reusing words and syntax
  • Occurs naturally, subconsciously
  • Found to help dialogue success (Friedberg et al. ’12)

Entrainment in dialogue systems

  • Several successful experiments (Lopes et al. ’13, ’15; Hu et al. ’14)
  • Limited, partially or completely rule-based

Examples (user utterance, then possible system responses):

how bout the next ride
  • Sorry, I did not find a later option.
  • I'm sorry, the next ride was not found.

what is the distance of this trip
  • The trip covers a distance of 10.4 miles.
  • It is around 10.4 miles.
  • The distance is 10.4 miles.
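One rough way to see lexical reuse in these examples is to count the words a response shares with the user's utterance. The Python sketch below is only an illustrative proxy under simple assumptions (lowercasing, naive tokenization, no stop-word handling); the helper names are hypothetical and it is not a measure taken from the talk.

```python
from __future__ import annotations

import re


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def reused_words(context: str, response: str) -> set[str]:
    """Return the set of words that occur in both context and response."""
    return set(tokens(context)) & set(tokens(response))


context = "how bout the next ride"
for response in (
    "Sorry, I did not find a later option.",
    "I'm sorry, the next ride was not found.",
):
    # Show which words of the user's utterance each response picks up.
    print(response, sorted(reused_words(context, response)))
```

A plain word overlap like this also counts function words such as "the", so it can only hint at entrainment; it says nothing about reused syntax.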

What we aim for

  • Fully trainable NLG that allows entrainment
      • let the system adapt to users’ words and syntax
      • let the data handle the rules
  • We hope for:
      • more natural system responses
      • possibly higher task success
      • applicability to other domains, chat-based systems
  • We need training data
      • …that is why we collected this dataset!

Collecting the set

Our domain

  • Alex: an English spoken dialogue system (SDS) for NYC public transport
      • https://github.com/UFAL-DSG/alex
  • Bus/subway services in Manhattan
      • Alex can do more; it was limited just for this set
  • Users ask for a schedule, may request details/modify search
  • 13 slots, including:
      • from_stop, to_stop
      • departure_time
      • vehicle
      • duration

Collecting the set

Getting natural utterances cheap and fast

  • Using crowdsourcing (CrowdFlower)

Addressing data sparsity

  • Delexicalization (places, times etc. → “X”)
      • both context and response
  • Limiting context to previous sentence
      • likely to have the strongest entrainment impact

Collection progress

  1. Get natural user utterances in calls to a live dialogue system
  2. Generate response DA
  3. Collect natural language paraphrases
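Delexicalization as described above can be sketched as a plain string replacement. The Python snippet below is a minimal illustration, not the procedure actually used for the dataset: the `delexicalize` helper is hypothetical, the choice of which slot values to replace (stops, time, line) is an assumption, and the single placeholder "X" simply follows the slide.

```python
from __future__ import annotations

import re


def delexicalize(text: str, slot_values: dict[str, str]) -> str:
    """Replace every given slot value found in the text with the placeholder 'X'."""
    # Replace longer values first so a short value cannot break up a longer one.
    for value in sorted(slot_values.values(), key=len, reverse=True):
        text = re.sub(re.escape(value), "X", text, flags=re.IGNORECASE)
    return text


# Assumed set of values to delexicalize (places, times, line number).
values = {
    "from_stop": "Fulton Street",
    "direction": "Rector Street",
    "departure_time": "9:13pm",
    "line": "M21",
}

# The same replacement is applied to both the context and the response.
print(delexicalize("I'm headed to Rector Street", values))
print(delexicalize(
    "Heading to Rector Street from Fulton Street, take a bus line M21 at 9:13pm.",
    values,
))
```

Replacing concrete stops and times this way makes many different utterances collapse into the same pattern, which is what reduces sparsity.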

Getting natural user requests

  1. Record calls to live Alex SDS
      • assign tasks to people on CrowdFlower
      • varying synonyms in task description
      • people unaware that wording is important
  2. Manually transcribe on CrowdFlower
  3. Parse using Alex's handcrafted SLU
      • parsing transcriptions gives better results than ASR n-best lists

Example task descriptions:

You want a connection – your departure stop is Marble Hill, and you want to go to Roosevelt Island. Ask how long the journey will take. Ask about a schedule afterwards. Then modify your query: Ask for a ride at six o'clock in the evening. Ask for a connection by bus. Do as if you changed your mind: Say that your destination stop is City Hall.

You are searching for transit options leaving from Houston Street with the destination of Marble Hill. When you are offered a schedule, ask about the time of arrival at your destination. Then ask for a connection after that. Modify your query: Request information about an alternative at six p.m. and state that you prefer to go by bus.

Tell the system that you want to travel from Park Place to Inwood. When you are offered a trip, ask about the time needed. Then ask for another alternative. Change your search: Ask about a ride at 6 o'clock p.m. and tell the system that you would rather use the bus.

Generating response DA

  • Handcrafted simple rule-based bigram policy
  • All possible replies for a single context utterance
      • confirmation
      • answer
      • apology
      • request for additional information
  • In a real dialogue, the correct reply would depend on longer history, but here we try them all

Example (context utterance, then the generated response DAs):

what about a connection by bus
  • iconfirm(vehicle=bus)
  • inform(from_stop="Dyckman Street", direction="Park Place", vehicle=bus, line=M103, departure_time=7:05pm)
  • inform_no_match(vehicle=bus)
  • request(to_stop)
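The idea of emitting every reply type for one context utterance could look roughly like the Python sketch below. It is a hypothetical illustration only: the actual Alex policy is not shown in the talk, so `format_da` and `candidate_reply_das`, their parameters, and the pairing of the four reply types with the DA types `iconfirm`, `inform`, `inform_no_match` and `request` are assumptions read off the example above.

```python
from __future__ import annotations


def format_da(da_type: str, slots: dict[str, str | None]) -> str:
    """Render a DA in the slide notation, quoting values that contain spaces."""
    parts = []
    for name, value in slots.items():
        if value is None:
            parts.append(name)                 # slot without a value
        elif " " in value:
            parts.append(f'{name}="{value}"')  # multi-word value
        else:
            parts.append(f"{name}={value}")
    return f"{da_type}({', '.join(parts)})"


def candidate_reply_das(
    user_slots: dict[str, str],
    answer_slots: dict[str, str],
    missing_slot: str,
) -> list[str]:
    """Return one DA per reply type for a single parsed user utterance."""
    return [
        format_da("iconfirm", user_slots),             # confirmation
        format_da("inform", answer_slots),             # answer
        format_da("inform_no_match", user_slots),      # apology
        format_da("request", {missing_slot: None}),    # request for more information
    ]


# Values copied from the example; how the answer is looked up is left out.
replies = candidate_reply_das(
    user_slots={"vehicle": "bus"},
    answer_slots={
        "from_stop": "Dyckman Street",
        "direction": "Park Place",
        "vehicle": "bus",
        "line": "M103",
        "departure_time": "7:05pm",
    },
    missing_slot="to_stop",
)
for reply in replies:
    print(reply)
```

Generating all four reply types regardless of dialogue history mirrors the "try them all" choice on the slide: each DA is then paired with the same context for paraphrase collection.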

Collecting natural language responses

  • CrowdFlower interface
  • Context displayed at hand
  • Minimal slot name description
  • Short instructions
  • Checks: contents, spelling; automatic + manual
  • ca. 20% overhead (repeated submissions)

Collected Dataset

Dataset summary

Size

  total response paraphrases                          5,577
  unique (delex.) context + response DA               1,859
  unique (delex.) context                               552
  unique (delex.) context with min. 2 occurrences       119
  unique response DA                                     83
  unique response DA types                                6
  unique slots                                           13

Entrainment

  Syntactic   ∼59%
  Lexical     ∼31%
  Both        ∼19%

  • subjective, based on word & phrase reuse, word order, pronouns

Thank you for your attention

Dataset available for download

  • JSON + CSV
  • CC BY-SA 4.0
  • GitHub: bit.ly/nlgdata (link given in the paper)

Contact us

Ondřej Dušek & Filip Jurčíček Charles University in Prague

  • dusek@ufal.mff.cuni.cz

References

Friedberg et al. (2012). Lexical entrainment and success in student engineering groups. SLT, pp. 404–409.
Hu et al. (2014). Entrainment in pedestrian direction giving: How many kinds of entrainment. IWSDS, pp. 90–101.
Lopes et al. (2013). Automated two-way entrainment to improve spoken dialog system performance. ICASSP, pp. 8372–8376.
Lopes et al. (2015). From rule-based to data-driven lexical entrainment models in spoken dialog systems. Computer Speech & Language, 31(1):87–112.