Error Analysis Applied to End-to-End Spoken Language Understanding - PowerPoint PPT Presentation



SLIDE 1

Error Analysis Applied to End-to-End Spoken Language Understanding

Antoine Caubrière, Sahar Ghannay, Natalia Tomashenko, Renato De Mori, Antoine Laurent, Emmanuel Morin, Yannick Estève ICASSP - May 2020


SLIDE 3

Context
Analysis of our End-to-End (E2E) Spoken Language Understanding (SLU) system
This system reaches state-of-the-art performance on a French SLU task

Goal
Analyze the errors produced by the system
Understand the weaknesses of this E2E system
From these weaknesses, discover how to improve our approach

Introduction


SLIDE 6

Deep Speech 2 (DS2) [Amodei et al., 2016]
End-to-end speech recognition system
Connectionist Temporal Classification (CTC) allows the system to learn the alignment between the speech and the output sequence to be produced

End-to-End Spoken Language Understanding (SLU) [Ghannay et al., 2018]

Curriculum-based transfer learning (CTL) [Caubrière et al., 2019]
Train the same model through a sequence of training processes and transfer learning
Keep all parameters except those of the top layer
Use different tasks, sorted from the most generic to the most specific

Analysed system

Tag boundary injection
ASR: The sculptor Caesar died yesterday in Paris at the age of seventy-seven years
NER: The sculptor <pers Caesar > died <time yesterday > in <loc Paris > at the age of <amount seventy-seven years >
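The tag boundary injection above can be sketched as a small preprocessing step: a minimal, hypothetical helper (not the paper's actual code) that wraps annotated word spans with "<tag" and ">" markers, so the CTC model emits tags as extra characters in its output sequence.

```python
# Hypothetical sketch of tag-boundary injection: the SLU target is the
# transcript with entity spans wrapped as "<tag ... >", so the model
# produces tags as ordinary characters in the output sequence.
def inject_tags(words, spans):
    """words: list of tokens; spans: list of (start, end, tag), end exclusive.
    Assumes spans are non-overlapping."""
    out = []
    opens = {s: t for s, _, t in spans}
    closes = {e for _, e, _ in spans}
    for i, w in enumerate(words):
        if i in opens:
            out.append(f"<{opens[i]}")
        out.append(w)
        if i + 1 in closes:
            out.append(">")
    return " ".join(out)

words = "The sculptor Caesar died yesterday in Paris".split()
spans = [(2, 3, "pers"), (4, 5, "time"), (6, 7, "loc")]
print(inject_tags(words, spans))
# -> The sculptor <pers Caesar > died <time yesterday > in <loc Paris >
```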


SLIDE 11

Automatic Speech Recognition (ASR)

Named Entity Recognition (NER)
Annotation according to 8 entity types (pers, loc, amount, etc.)

Merged semantic concept extraction (SC_mer)
MEDIA: French hotel booking task
PORTMEDIA: French theater ticket booking task
Annotation according to 76 semantic concepts (location-town, stay-nbNight, nb-reservation, etc.)

Semantic concept extraction on MEDIA (M)
Our target task

Order of learned tasks
We define the following order of specificity: Speech > Named Entities > Semantic Concepts

Learned tasks


SLIDE 15

Data

French data sets
Use of as much data as possible at our disposal
Broadcast news, telephone, and human-human dialogue
Speech ~ 360h, NE ~ 300h, SC ~ 25h


SLIDE 19

CTL approach (network: CNN -> bLSTM -> FC -> softmax)

Training stages, from generic to specific concepts:
(ASR) Character sequence
(NER) Character sequence & named entities
(SC_mer) Character sequence & merged semantic concepts
(M) Character sequence & target semantic concepts
Between stages, all layers are kept except the softmax layer, which is reset
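The keep/reset rule between CTL stages can be sketched framework-agnostically. This is an illustrative toy (parameter names and values are assumptions, not the paper's implementation): every parameter group is carried over to the next task, except the softmax head, which is re-initialised because the output vocabulary changes between tasks.

```python
# Toy sketch of CTL weight transfer between curriculum stages:
# keep CNN, bLSTM and FC weights; reset only the task-specific softmax head.
def transfer(prev_params, new_softmax):
    next_params = dict(prev_params)        # keep all lower-layer parameters
    next_params["softmax"] = new_softmax   # reset the output layer
    return next_params

# Illustrative stage chain: ASR -> NER (vocabulary grows with entity tags).
asr = {"cnn": "w_cnn", "blstm": "w_blstm", "fc": "w_fc", "softmax": "chars"}
ner = transfer(asr, new_softmax="chars+entity_tags")
print(ner["blstm"], ner["softmax"])
# -> w_blstm chars+entity_tags
```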


SLIDE 22

Error distribution

System outputs for MEDIA concepts (development dataset)
The thirty most common mistakes

Error characteristics
Mainly concept deletion errors
Represented by a few concepts
Frequent errors correspond to concepts supported by short words (connectprop, lienref-coref, ...)

> "I'd like to know <lienref-coref the > <object price > of the night <connectprop and > if there are any <object rooms > left"

SLIDE 27

Transcription problem

Cases of concept deletions (MEDIA development dataset)

Reference of an example:
> "From <time-date nineteen > to <time-date twenty-two > october <connectprop and > in <location-town Périgueux >"

Correct automatic transcription:
> "From <time-date nineteen > to <time-date twenty-two > october and in <location-town Périgueux >"

Incorrect automatic transcription:
> "From <time-date nineteen > to <time-date twenty-two > october <connectprop and > par lieu "

Correct automatic transcription, but the value is nested in another concept:
> "From <time-date nineteen > to <time-date twenty-two > october <location-town and in Périgueux >"

SLIDE 29

Transcription problem

Cases of concept deletions (MEDIA development dataset):

Focused concept | Nb deletions | Correct ASR | Wrong ASR | Nested
connectprop     | 39           | 28          | 6         | 5
lienref-coref   | 33           | 19          | 10        | 4
object          | 38           | 31          | 4         | 3

Extra observation
Ended tags regularly appear without any associated started tag:
> "I'd like to know the > <object price > of the night"
A concept segmentation issue


SLIDE 31

Segmentation problem

Tackling the concept segmentation issue
Split the final MEDIA task into two tasks in the CTL approach
First, train the system to retrieve the boundaries of concepts only (M_seg)
Second, train the system to specify the concepts (classical M task)
The CTL approach becomes: ASR -> NER -> SC_mer -> M_seg -> M

Outputs to produce for M_seg
Replace each starting tag with a generic '<'
M : "I'd like to know <lienref-coref the > <object price > of the night <connectprop and > if there are any <object rooms > left"
M_seg : "I'd like to know < the > < price > of the night < and > if there are any < rooms > left"
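The M to M_seg target conversion described above amounts to one substitution. A minimal sketch, assuming concept names are made of word characters and hyphens:

```python
import re

# Sketch of the M_seg target construction: every opening concept tag
# ("<concept-name") is replaced by a bare '<', so the model only has to
# learn concept boundaries, not concept identities.
def to_mseg(annotated):
    return re.sub(r"<[\w-]+", "<", annotated)

m = ("I'd like to know <lienref-coref the > <object price > of the night "
     "<connectprop and > if there are any <object rooms > left")
print(to_mseg(m))
# -> I'd like to know < the > < price > of the night < and > if there are any < rooms > left
```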


SLIDE 33

Segmentation problem

Concept Error Rate (CER): evaluates concepts only
Concept Value Error Rate (CVER): evaluates concepts and values (the words within the concepts)

System                             | CER*  | CVER*
ASR -> NER -> SC_mer -> M          | 21.6  | 27.7
ASR -> NER -> SC_mer -> M_seg -> M | 20.7  | 27.2
Relative gain                      | +4.1% | +1.0%

* Scores on the MEDIA test set

ASR: Automatic Speech Recognition
NER: Named Entity Recognition
SC_mer: Merged Semantic Concept extraction
M_seg: MEDIA segmentation task
M: MEDIA task
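A sketch of how a CER of this kind is typically computed, assuming the usual edit-distance definition (substitutions + deletions + insertions over the reference concept sequence); CVER would apply the same distance to concept/value pairs instead. The numbers below are illustrative, not the paper's.

```python
# Edit distance between two symbol sequences (concepts here), then the
# Concept Error Rate: distance divided by the number of reference concepts.
def edit_distance(ref, hyp):
    d = [[i + j if i * j == 0 else 0 for j in range(len(hyp) + 1)]
         for i in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                         # deletion
                          d[i][j - 1] + 1,                         # insertion
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    return d[-1][-1]

def cer(ref_concepts, hyp_concepts):
    return edit_distance(ref_concepts, hyp_concepts) / len(ref_concepts)

ref = ["time-date", "time-date", "connectprop", "location-town"]
hyp = ["time-date", "time-date", "location-town"]   # connectprop deleted
print(cer(ref, hyp))  # -> 0.25
```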


SLIDE 36

NER task contribution

Unseen Concept/Value pairs (UCV)
Examples seen in the development dataset which do not appear in the training dataset
A total of 533 UCV

System                    | Correct Concept/Value | Correct Value
ASR -> NER -> SC_mer -> M | 132                   | 38
ASR -> SC_mer -> M        | 124                   | 36

Only a small number of correct values (around a quarter)
Incorrect speech transcription for the major part of the UCV
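The UCV bookkeeping above reduces to a set difference between the pairs observed in the two datasets. A minimal sketch with made-up pairs (the data here is illustrative, not from the paper):

```python
# Unseen Concept/Value pairs: (concept, value) pairs that occur in the
# development set but never in the training set.
def unseen_pairs(train_pairs, dev_pairs):
    return set(dev_pairs) - set(train_pairs)

train = {("location-town", "paris"), ("stay-nbNight", "two")}
dev = {("location-town", "paris"), ("location-town", "perigueux"),
       ("stay-nbNight", "three")}
ucv = unseen_pairs(train, dev)
print(len(ucv))  # -> 2
```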


SLIDE 38

NER task contribution

Delta of concept recognition errors, with and without the NER task during training

Observations
Strongly positive evolution
Improved concepts are close to named entities: 'location-town', 'time-day-month', 'stay-nbcouple', ...


SLIDE 40

Embedding extraction (MEDIA development dataset)
From the last bLSTM layer of DS2 (network: CNN -> bLSTM -> FC -> softmax)
DS2 trained with the CTC loss function
One embedding per input frame
One character per input frame

Word and concept representations
Represented by more than one character
Use the sum of each frame's embeddings
Use a t-SNE transformation for a 2D representation

Embeddings visualisation
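The sum-of-frame-embeddings idea can be sketched in a few lines. This is an illustration with tiny pure-Python vectors (the real embeddings come from the bLSTM layer and are then projected to 2D with t-SNE):

```python
# Sketch of the sub-sequence embedding: a word or concept spanning several
# input frames is represented by the element-wise sum of its per-frame
# embedding vectors.
def span_embedding(frame_embeddings, start, end):
    """Sum frames [start, end) element-wise into one vector."""
    dim = len(frame_embeddings[0])
    acc = [0.0] * dim
    for frame in frame_embeddings[start:end]:
        for k in range(dim):
            acc[k] += frame[k]
    return acc

frames = [[1, 2], [3, 4], [5, 6]]   # 3 frames, 2-dim embeddings
print(span_embedding(frames, 0, 2))  # -> [4.0, 6.0]
```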

SLIDE 41

Embeddings visualisation

Observations
Each color represents a semantic class
Concepts of the same class are clustered
Some very clear clusters
An area with mixed concepts

SLIDE 42

Embeddings visualisation

Green: well-recognized concepts
Red: badly recognized concepts

Observations
Main errors are in the mixed area
Concept errors seem to be related to an insufficiently discriminative internal representation


SLIDE 48

Conclusion

We presented a qualitative study of errors produced by an end-to-end SLU system
We observed that most of the errors concern generic and domain-independent concepts
We detected a concept segmentation issue in our SLU system
We proposed an intermediate segmentation training task, which yields a 4.1% relative gain
We proposed a way to compute embeddings of sub-sequences
We observed that output concept errors appear to be related to an insufficiently discriminative internal representation

SLIDE 49

Perspectives

Take advantage of this cartography
How can it be used to improve performance?
How can we force the system to represent the concepts in a more relevant space?
Exploit the position of embeddings in the continuous space

SLIDE 50

Thank you

Contact: antoine.caubriere@univ-lemans.fr