Robust Multilingual Statistical Morphology Generation Models

Ondřej Dušek and Filip Jurčíček
Institute of Formal and Applied Linguistics
Charles University in Prague
August 6, 2013

Morphology in NLG
• Last step of the whole NLG pipeline
• Usually does not get a lot of attention, but is necessary
What we do (Flect)
[Figure: the natural language generation pipeline Semantics → Syntax → Morphology → Text. We solve the Morphology step, in these languages: EN, DE, ES, CA, JA, CS (English, German, Spanish, Catalan, Japanese, Czech).]

The need for morphology in generation
• English: not so much; hard-coded solutions often work well enough
• Languages with more inflection (e.g. Czech): even for the simplest things
  Toto se líbí uživateli Jana Nováková. ("This is liked by user (name).")
  The noun uživateli is masculine [masc] and dative [dat], but the inserted name is feminine [fem] and nominative [nom]. The name must be Janě Novákové (and a female user is uživatelce).
  Děkujeme, Jan Novák, vaše hlasování bylo vytvořeno. ("Thank you, (name), your poll has been created.")
  The name is inserted in the nominative [nom]; direct address requires the vocative Jane Nováku.

The task at hand
  word + NNS → words
  Wort + NN, Neut,Pl,Dat → Wörtern
  be + VBZ → is
  ser + V, gen=c,num=s,person=3,mood=indicative,tense=present → es
• Input: lemma (base form) or stem + morphological properties (POS, case, gender, etc.)
• Output: inflected word form
• The inverse of POS tagging

Possible solutions
Dictionary?
• Works well, but has limited size
• Not many large-coverage openly available ones
Hand-written rules?
• Work well, but are hard to maintain
Machine learning!
• Obtain the rules automatically
• Plenty of treebanks of sufficient size available
• Only work known to us: Bohnet et al. 2010
[Figures: a hand-written rule mapping x to y; a learned model with inputs x1 … xn, weights w1 … wn and output σ]

Casting inflection patterns as multi-class classification
Our inflection rules: edit scripts
• A kind of diff: how to modify the lemma to get the form
• Based on Levenshtein distance
Examples:
  be → is: *is [replace the whole word]
  fly → flies: >1-ies [at the end, delete one letter and add these]
  Mutter → Mütter: 5:1-ü [5 letters from the end, delete one letter and add this]
  sparen → gespart: >2-t, <ge [at the end, delete two letters and add this; add this at the beginning]
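The edit-script notation above can be sketched in Python. This is an illustration under stated assumptions, not Flect's code: the op syntax (*S, <S, >N-S, P:N-S, joined by commas) is read off the slide's examples, and the hypothetical derive_edit_script uses a simple common-prefix/common-suffix heuristic plus a search for a short added prefix, instead of the Levenshtein-based alignment the slide mentions. All function names are hypothetical.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def common_suffix_len(a: str, b: str, limit: int) -> int:
    """Number of trailing characters shared by a and b, capped at limit."""
    n = 0
    while n < limit and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def single_edit(lemma: str, form: str):
    """One suffix edit '>N-S' or inner edit 'P:N-S'; None if nothing is shared."""
    p = common_prefix_len(lemma, form)
    # Cap the suffix so it never overlaps the shared prefix.
    s = common_suffix_len(lemma, form, min(len(lemma), len(form)) - p)
    if p == 0 and s == 0:
        return None
    deleted = lemma[p:len(lemma) - s]
    inserted = form[p:len(form) - s]
    if s == 0:
        # Change at the end: delete N letters, add S.
        return f">{len(deleted)}-{inserted}"
    # Change inside the word, P letters from the end of the lemma.
    return f"{len(lemma) - p}:{len(deleted)}-{inserted}"


def derive_edit_script(lemma: str, form: str, max_prefix: int = 3) -> str:
    """Heuristic (assumed, not Levenshtein-based) lemma-to-form edit script."""
    if common_prefix_len(lemma, form) > 0:
        return single_edit(lemma, form)
    # Try peeling off a short added prefix, such as German ge-.
    for k in range(1, max_prefix + 1):
        rest = form[k:]
        if common_prefix_len(lemma, rest) >= 2:
            return f"{single_edit(lemma, rest)}, <{form[:k]}"
    edit = single_edit(lemma, form)
    # Nothing in common: replace the whole word.
    return edit if edit is not None else "*" + form


def apply_edit_script(lemma: str, script: str) -> str:
    """Apply comma-separated ops; positions count from the end of the word."""
    word = lemma
    for op in (part.strip() for part in script.split(",")):
        if op.startswith("*"):
            word = op[1:]                          # replace the whole word
        elif op.startswith("<"):
            word = op[1:] + word                   # add at the beginning
        elif op.startswith(">"):
            n, added = op[1:].split("-", 1)        # delete n letters at the end, add these
            word = word[:len(word) - int(n)] + added
        else:
            pos, rest = op.split(":", 1)           # pos letters from the end ...
            n, added = rest.split("-", 1)          # ... delete n letters, add these
            start = len(word) - int(pos)
            word = word[:start] + added + word[start + int(n):]
    return word


for lemma, form in [("be", "is"), ("fly", "flies"), ("Mutter", "Mütter"), ("sparen", "gespart")]:
    script = derive_edit_script(lemma, form)
    assert apply_edit_script(lemma, script) == form
    print(lemma, form, script)
```

On the slide's pairs this reproduces *is, >1-ies, 5:1-ü and >2-t, <ge. Counting positions from the end means an added prefix never shifts the other ops, so >2-t and <ge can be applied in either order. For Wort → Wörtern the heuristic gives the clumsy but still correct >3-örtern, because it treats the umlaut and the new ending as one change; a real alignment would separate them.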
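Once each training pair (lemma + tags, form) is reduced to its edit script, generation becomes multi-class classification: predict the script from features of the lemma and its tags, then apply it. The machine-learning diagram in the slides (inputs x1 … xn, weights w1 … wn, σ) points to a weighted linear model, but the slides do not say which learner or features Flect uses. This sketch therefore assumes a plain (non-averaged) multiclass perceptron over three hypothetical features (the tag and the last one and two letters of the lemma), trained on six hand-labelled English pairs; the data and names are illustrative only.

```python
from collections import defaultdict


def features(lemma: str, tag: str) -> list[str]:
    """Hypothetical feature set: the tag plus the lemma's last one and two letters."""
    return [f"tag={tag}", f"suf1={lemma[-1:]}", f"suf2={lemma[-2:]}"]


class EditScriptPerceptron:
    """Multiclass perceptron in which every distinct edit script is one class."""

    def __init__(self, classes):
        self.classes = sorted(classes)  # fixed order makes tie-breaking deterministic
        self.w = {c: defaultdict(float) for c in self.classes}

    def score(self, cls, feats):
        return sum(self.w[cls][f] for f in feats)

    def predict(self, lemma, tag):
        feats = features(lemma, tag)
        # max() returns the first class with the highest score.
        return max(self.classes, key=lambda c: self.score(c, feats))

    def train(self, data, epochs=10):
        for _ in range(epochs):
            mistakes = 0
            for lemma, tag, gold in data:
                pred = self.predict(lemma, tag)
                if pred != gold:
                    mistakes += 1
                    for f in features(lemma, tag):
                        self.w[gold][f] += 1.0  # pull the correct class up
                        self.w[pred][f] -= 1.0  # push the wrong guess down
            if mistakes == 0:  # every training pair is classified correctly
                break


def apply_suffix_script(lemma: str, script: str) -> str:
    """Apply a suffix edit script '>N-S' (the only kind in this toy data)."""
    n, added = script[1:].split("-", 1)
    return lemma[:len(lemma) - int(n)] + added


TRAIN = [
    ("fly", "NNS", ">1-ies"),
    ("word", "NNS", ">0-s"),
    ("city", "NNS", ">1-ies"),
    ("box", "NNS", ">0-es"),
    ("try", "VBD", ">1-ied"),
    ("walk", "VBD", ">0-ed"),
]
model = EditScriptPerceptron({label for _, _, label in TRAIN})
model.train(TRAIN)


def inflect(lemma: str, tag: str) -> str:
    """Predict the edit script for (lemma, tag) and apply it."""
    return apply_suffix_script(lemma, model.predict(lemma, tag))


for lemma, tag in [("ply", "NNS"), ("cry", "VBD"), ("fox", "NNS")]:
    print(lemma, tag, inflect(lemma, tag))
# prints:
# ply NNS plies
# cry VBD cried
# fox NNS foxes
```

With training data this small, the model only generalizes to lemmas whose final letters it has seen; a practical system would be trained on a treebank, as the slides suggest.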
