SLIDE 1

Introduction Error classification schemes Our approach The SMT systems Error analysis results Fixing possibilities with SSS Conclusions

SMT error analysis and mapping to syntactic, semantic and structural fixes

Nora Aranberri

IXA Group, University of the Basque Country

SSST-9 2015 1

SLIDE 2

Index

1 Introduction
2 Error classification schemes
3 Our approach
4 The SMT systems
5 Error analysis results
6 Fixing possibilities with SSS
7 Conclusions

SLIDE 4

Error analysis for SMT

Does error analysis make sense for SMT?
Which aspects should it cover?

SLIDE 7

Dynamic Quality Framework - TAUS

Dimensions: attributes, grammatical and localization issues

Disadvantages
  High-level annotation only
  Mixed dimensions

SLIDE 9

Multidimensional Quality Metric - QTLaunchPad

MQM Core: a hierarchy of 22 issues
Dimensions: quality attributes, grammar/linguistic and edit-types
Top level dichotomy: accuracy vs fluency
Lower-levels: grammatical/linguistic and edit-type errors

Disadvantages
  Mixed dimensions
  Issues often too broad to identify SSS solutions

SLIDE 11

SMT-oriented classification - Vilar et al. 2006

Dimensions: edit-types and linguistic issues
Top level: edit-types
Lower-levels: edit-types, spans and grammatical issues

Disadvantages
  Not informative enough linguistically for SSS solutions

SLIDE 14

Proposed bidimensional error scheme

dynamic, extensible bidimensional scheme

Linguistic dimension (top-level categories, each with subclasses):
  Lexis, Morphosyntax, Verbs, Order, Punctuation, Untranslated
Edit-type dimension:
  Incorrect, Missing, Additional

Complementary dimensions: linguistic issues and edit-types
  Six top linguistic categories
  Dynamic extensible hierarchy
  Three edit-type categories

To be considered
  Further dimensions (e.g. severity)
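One way to picture the bidimensional scheme is as a small data structure: every annotation pairs a path in the linguistic hierarchy with one of the three edit types. The sketch below is only an illustration under that reading. The class and function names, and the subclass name "preposition", are assumptions and do not come from the slides.

```python
from dataclasses import dataclass, field
from enum import Enum


class EditType(Enum):
    """The three edit-type categories named on the slide."""
    INCORRECT = "incorrect"
    MISSING = "missing"
    ADDITIONAL = "additional"


@dataclass
class Category:
    """A node in the extensible linguistic hierarchy (hypothetical structure)."""
    name: str
    subclasses: dict = field(default_factory=dict)

    def add(self, name: str) -> "Category":
        # Create the subclass on first use, so the hierarchy can grow as needed.
        return self.subclasses.setdefault(name, Category(name))


# The six top-level linguistic categories from the slide.
TOP_LEVEL = ["Lexis", "Morphosyntax", "Verbs", "Order", "Punctuation", "Untranslated"]
scheme = {name: Category(name) for name in TOP_LEVEL}


@dataclass(frozen=True)
class ErrorTag:
    """One annotation: a path in the linguistic hierarchy plus an edit type."""
    path: tuple
    edit_type: EditType


def tag(path, edit_type):
    """Register the path in the hierarchy (extending it if new) and return a tag."""
    node = scheme[path[0]]       # the top-level category must already exist
    for name in path[1:]:
        node = node.add(name)    # subclasses are added dynamically
    return ErrorTag(tuple(path), edit_type)


# "preposition" is an illustrative subclass name, not one taken from the slides.
example = tag(["Morphosyntax", "preposition"], EditType.INCORRECT)
print(example.path, example.edit_type.value)
```

Keeping the two dimensions as separate fields means a further dimension such as severity could be added as one more field without touching the hierarchy.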

SLIDE 16

Description of the SMT systems

English-Spanish SMT
  Standard phrase-based Moses system
  Training data:
    Bilingual: Europarl, UN corpus, News Commentary and Common Crawl (∼335M words)
    Monolingual: Spanish part of Europarl, News Commentary and Common Crawl (∼60M words)
  In-domain tuning data: 1,000 QA interactions
  BLEU score (in-domain): 45.86

English-Basque SMT
  Phrase-based Moses system
  Alignment at lemma-level
  Training data:
    Bilingual: academic books, software manuals and UI strings, web-crawled data (∼13.5M words)
    Monolingual: Basque part of the above + administrative text (∼21M words)
  In-domain tuning data: 1,000 QA interactions
  BLEU score: 20.24

SLIDE 18

Error analysis for the English-Spanish pair

137 sentences evaluated with a total of 169 errors.
Lexis: 31% of the total errors

Example of a lexical error
  Source: Click run where it says vulnerabilities.
  MT output: Pulse correr donde dice vulnerabilidades.

SLIDE 19

Error analysis for the English-Spanish pair

Morphosyntax: 29%

Example of a morphosyntactic error
  Source: Yes, you can share files and folders with one or more users on MEO Cloud.
  MT output: Sí, puede compartir archivos y carpetas con uno o más usuarios sobre MEO Cloud.

SLIDE 20

Error analysis for the English-Spanish pair

Verbs: 18%

Example of a verb error
  Source: Connect your computer to the ZON HUB via Ethernet cable.
  MT output: Conectar su ordenador a la HUB af a través de cable Ethernet. (infinitive, "to connect")

SLIDE 21

Error analysis for the English-Spanish pair

Order: 11%

Example of an ordering error
  Source: Tap Import to copy your Android Browser Favourites.
  MT output: Toca Importar para copiar su navegador de Android favoritos.

SLIDE 22

Error analysis for the English-Spanish pair

Punctuation: 6%

Example of a punctuation error
  Source: If I buy a computer abroad, will it work in Portugal?
  MT output: Si compro un ordenador en el extranjero, funcionará en Portugal? (missing ¿)

SLIDE 23

Error analysis for the English-Spanish pair

Untranslated: 5%

Example of an untranslated unit
  Source: Then click on the yellow disc with a green tick.
  MT output: Then haga clic en el disco de color amarillo con una marca verde.

SLIDE 24

Error analysis for the English-Spanish pair

137 sentences evaluated with a total of 169 errors.
  Lexis: 31% of the total errors
  Morphosyntax: 29%
  Verbs: 18%
  Order: 11%
  Punctuation: 6%
  Untranslated: 5%

SLIDE 25

Error analysis for the English-Basque pair

140 sentences evaluated with a total of 393 errors.
Lexis: 23% of the total errors

Example of a lexical error
  Source: Go to WhatsApp > ’Menu Button’ > ’Status’.
  MT output: Joan menu botoia WhatsApp > ” ” > ” egoera ”.

SLIDE 26

Error analysis for the English-Basque pair

Morphosyntax: 39%

Example of a morphosyntactic error
  Source: simply by dragging the profile of the person concerned to the various circles.
  MT output: besterik gabe, arrastatu pertsonaren profila hainbat nahia zirkulu.

SLIDE 27

Error analysis for the English-Basque pair

Verbs: 18%

Example of a verb error
  Source: Choose a standard status or personalize one.
  MT output: Egoera estandar bat edo pertsonalizatu bat.

SLIDE 28

Error analysis for the English-Basque pair

Order: 14%

Example of an ordering error
  Source: You can use the app iPP Podcast Player you find on Google Play.
  MT output: Aplikazioa erabil dezakezu IPP podcast Player aurkitu duzu Google erreproduzitu.

SLIDE 29

Error analysis for the English-Basque pair

Punctuation: 4%

Example of a punctuation error
  Source: How can I change the language to of Mega to Portuguese?
  MT output: Nola aldatu hizkuntza of Mega, portugesa?

SLIDE 30

Error analysis for the English-Basque pair

Untranslated: 1%

Example of an untranslated unit
  Source: How much space do I have for free on Mega?
  MT output: Zenbat leku ditut doan on Mega?

SLIDE 31

Error analysis for the English-Basque pair

140 sentences evaluated with a total of 393 errors.
  Lexis: 23% of the total errors
  Morphosyntax: 39%
  Verbs: 18%
  Order: 14%
  Punctuation: 4%
  Untranslated: 1%
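The figures for the two language pairs can be put side by side with a little arithmetic. The sketch below uses only the numbers reported on the slides. The per-category counts it derives are approximations, since the slides give rounded percentages and not raw counts, and the function names are made up for this illustration.

```python
# Figures copied from the slides: sentences evaluated, total errors, and the
# percentage share of each top-level category.
RESULTS = {
    "English-Spanish": {
        "sentences": 137,
        "errors": 169,
        "shares": {"Lexis": 31, "Morphosyntax": 29, "Verbs": 18,
                   "Order": 11, "Punctuation": 6, "Untranslated": 5},
    },
    "English-Basque": {
        "sentences": 140,
        "errors": 393,
        "shares": {"Lexis": 23, "Morphosyntax": 39, "Verbs": 18,
                   "Order": 14, "Punctuation": 4, "Untranslated": 1},
    },
}


def errors_per_sentence(pair):
    """Average number of annotated errors per evaluated sentence."""
    data = RESULTS[pair]
    return data["errors"] / data["sentences"]


def approximate_counts(pair):
    """Estimate per-category error counts from the rounded percentages."""
    data = RESULTS[pair]
    return {cat: round(data["errors"] * pct / 100)
            for cat, pct in data["shares"].items()}


for pair in RESULTS:
    print(pair, round(errors_per_sentence(pair), 2), approximate_counts(pair))
```

On these numbers the English-Spanish output averages about 1.23 errors per sentence and the English-Basque output about 2.81. The Basque shares sum to 99%, presumably because of rounding on the slides.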

SLIDE 33

Fixing possibilities with syntax, semantics and structure

Main sources for errors
  use-scenario-specific features
    Q&A chat scenario in the information technology domain
  language pair-specific features

SLIDE 41

Fixing possibilities with syntax, semantics and structure

IT names not identified: (CR)NER
UIs not identified: NER
domain-specific terminology: WSD
Basque postpositions: predicate-argument structures and semantic roles
incorrectly ordered phrases: predicate-argument structures and semantic roles
split phrases: syntax
incorrect phrase-internal order: syntax
incorrectly generated verb phrases
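The pairing of error sources with candidate fixes reads naturally as a lookup table. The sketch below encodes the pairs exactly as listed on the slide and groups the error sources by the kind of fix proposed. The last item has no fix in the slide text, so it is stored as `None` and left out of the grouping. The names `FIXES` and `group_by_fix` are assumptions for illustration.

```python
# Error source -> proposed kind of fix, as listed on the slide.
# None marks the one item for which the slide text gives no fix.
FIXES = {
    "IT names not identified": "(CR)NER",
    "UIs not identified": "NER",
    "domain-specific terminology": "WSD",
    "Basque postpositions": "predicate-argument structures and semantic roles",
    "incorrectly ordered phrases": "predicate-argument structures and semantic roles",
    "split phrases": "syntax",
    "incorrect phrase-internal order": "syntax",
    "incorrectly generated verb phrases": None,
}


def group_by_fix(fixes):
    """Invert the mapping: for each kind of fix, list the error sources it targets."""
    grouped = {}
    for source, fix in fixes.items():
        if fix is None:
            continue  # no fix is given on the slide for this error source
        grouped.setdefault(fix, []).append(source)
    return grouped


for fix, sources in group_by_fix(FIXES).items():
    print(f"{fix}: {', '.join(sources)}")
```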

SLIDE 43

Conclusions

proposed a bidimensional error classification
  dynamic, extensible linguistically-informed dimension
  edit-types

evaluated English-Spanish and English-Basque SMT systems
  domain-related issues:
    terminology and UI strings
  language-pair-specific issues:
    English prepositions and subordinate markers for Basque
    verb phrase generation
    ordering at different levels

revealed potential of SSS to solve errors

SLIDE 44

Thanks! nora.aranberri@ehu.eus
