Induction of Multilingual Morphology with only Minimal Supervision (PowerPoint presentation)


SLIDE 1

Introduction Task Definition Contextual Similarity Model Combination

Induction of Multilingual Morphology with only Minimal Supervision

Richard Wicentowski

Computer Science Department Swarthmore College

November 15, 2006

SLIDE 2

Outline

1. Introduction
2. Task Definition
3. Contextual Similarity
4. Model Combination


SLIDE 4

Motivation: Machine Translation

Saint-Exupéry, Le Petit Prince, 1943

Bien sûr, dit le renard. Tu n’es pas encore pour moi qu’un petit garçon tout semblable à cent mille petits garçons. Et je n’ai pas besoin de toi. Et tu n’as pas besoin de moi non plus. Je ne suis pour toi qu’un renard semblable à cent mille renards. Mais, si tu m’apprivoises, nous aurons besoin l’un de l’autre. Tu seras pour moi unique au monde. Je serai pour toi unique au monde... Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...

SLIDE 6

Motivation: Machine Translation

Saint-Exupéry, Le Petit Prince, 1943

Of course, known as the fox. You are not yet for me that a little boy very similar to a hundred and thousand small boys. And I do not need you. And you do not need me either. I am for you only one fox similar to a hundred and thousand foxes. But, if you tame me, we will need one the other. You will be for me single in the world. I will be for you single in the world... I start to include/understand, known as the small prince. It there be a flower... I believe that it me have tame...

SLIDE 8

Native Language       Speakers (millions)
Mandarin Chinese      867
Hindi                 400
Spanish               390
English               310
Standard Arabic       206
Indonesian            222
Bengali               194
Portuguese            177
Russian               145
Japanese              121
Persian               101
Punjabi               104
Javanese              76
German                75
Vietnamese            70
Telugu                70
Marathi               68
Tamil                 68
Korean                67
French                64
Urdu                  61
Italian               61
Turkish               60
Yoruba                47
Gujarati              46
Polish                46
Ukrainian             39
Malayalam             36
Kannada               35
Oriya                 32
Burmese               32
Thai                  31

SLIDE 9

Resources Needed for Machine Translation

What resources are needed to translate from Hindi to Bengali?
- A Hindi/Bengali dictionary
- Word translation in context (lexical choice)
- Morphological analyzers and generators
- Syntactic parsers / knowledge of grammar
And, if we wanted to do this translation from speech rather than written text, we’d also need speech recognizers...

SLIDE 11

Morphology and Lexical Choice in Machine Translation

Saint-Exupéry, Le Petit Prince, 1943

Bien sûr, dit le renard. Tu n’es pas encore pour moi qu’un petit garçon tout semblable à cent mille petits garçons. Et je n’ai pas besoin de toi. Et tu n’as pas besoin de moi non plus. Je ne suis pour toi qu’un renard semblable à cent mille renards. Mais, si tu m’apprivoises, nous aurons besoin l’un de l’autre. Tu seras pour moi unique au monde. Je serai pour toi unique au monde... Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...

SLIDE 12

Dictionary coverage vs. Inflectional Degree

[Chart: dictionary coverage by type (roughly 10% to 45%) vs. average number of inflections per root (log scale, 1 to 100), for Turkish, Italian, Portuguese, French, Swedish, English, Spanish]

SLIDE 13

Morphology and Lexical Choice in Machine Translation

Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...

Morphological Analysis

[Diagram: morphological analysis maps crois to the root croire, distinguishing inflections of croire (crussiez, croyez, crût, croyant) from orthographically similar verbs (croître, croquer, croiser, crotter, critiquer, croasser, ...)]

SLIDE 14

Morphology and Lexical Choice in Machine Translation

Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...

Lexical Choice

[Diagram: after morphological analysis maps crois to croire, lexical choice selects among English translations of croire and its orthographic neighbors: believe, cross, suppose, consider, conceive, grow, criticize, ...]

SLIDE 15

Morphology and Lexical Choice in Machine Translation

Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...

Morphological Generation

[Diagram: after morphological analysis (crois → croire) and lexical choice (croire → believe), morphological generation produces the English inflections believe, believes, believed, believing]

SLIDE 16

Outline

1. Introduction
2. Task Definition
3. Contextual Similarity
4. Model Combination

SLIDE 17

Task definition

Morphological Analysis
  Input: inflection
  Output: root, optional part of speech

Morphological Generation
  Input: root, part of speech
  Output: inflection

SLIDE 18

Task definition

Morphological Analysis
  Input: crois
  Output: croire, 2S Imperative; croire, 1S Present; croire, 2S Present

Morphological Generation
  Input: croire, Present Participle
  Output: croyant

SLIDE 19

Task definition

Morphological Analysis
  Input: burned
  Output: burn, Past Indicative; burn, Past Participle

Morphological Generation
  Input: burn, Past Indicative
  Output: burnt; burned
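The analysis/generation interface above can be sketched as a pair of lookup tables. This is an illustrative toy (the lexicon entries are typed in by hand, where the talk's system induces them), not the actual analyzer:

```python
# Toy analysis/generation interface; the class and method names are illustrative.
from collections import defaultdict

class Morphology:
    def __init__(self):
        self.analyses = defaultdict(set)   # inflection -> {(root, tag), ...}
        self.inflections = defaultdict(set)  # (root, tag) -> {inflection, ...}

    def add(self, inflection, root, tag):
        self.analyses[inflection].add((root, tag))
        self.inflections[(root, tag)].add(inflection)

    def analyze(self, inflection):
        """Analysis: inflection -> set of (root, part of speech)."""
        return self.analyses[inflection]

    def generate(self, root, tag):
        """Generation: root + part of speech -> set of inflections."""
        return self.inflections[(root, tag)]

m = Morphology()
m.add("burned", "burn", "Past Indicative")
m.add("burnt", "burn", "Past Indicative")
m.add("burned", "burn", "Past Participle")

m.analyze("burned")                    # {("burn", "Past Indicative"), ("burn", "Past Participle")}
m.generate("burn", "Past Indicative")  # {"burned", "burnt"}
```

Note how the burned/burnt example makes both directions one-to-many, which is why both tables map to sets.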

SLIDE 20

Inflectional morphological phenomena

affixation
  prefixation: geuza → mligeuza (Swahili)
  suffixation: adhair → adhairim (Irish)
  circumfixation: mischen → gemischt (German)
  infixation: palit → pumalit (Tagalog)
point-of-affixation: placer → plaça (French)
stem changes
  elision: close → closing (English)
  gemination: stir → stirred (English)
  voicing: zwerft → zwerven (Dutch)
vowel harmony
  abartmak → abartmasanız (Turkish)
  addetmek → addetmeseniz (Turkish)
internal vowel shift
  afbryde → afbrød (Danish)
  skrike → skreik (Norwegian)

SLIDE 21

Inflectional morphological phenomena

reduplication and agglutination
  reduplication: gupit → gugupit (Tagalog)
  agglutination: gupit → igugupit; gupit → ipagugupit; gupit → ipinagugupit
  agglutination: ev → evde (Turkish); evde → evdeki; evdeki → evdekiler
reduplication
  rumah → rumahrumah (Malay)
  ibu → ibuibu
root and pattern
  ktb → kateb (Arabic); ktb → kattab
highly irregular forms
  fi → erai (Romanian)
  jānā → gayā (Hindi)
  eiga → áttum (Icelandic)

SLIDE 22

Task definition

In order to perform morphological analysis, we must design an algorithm which can predict the root forms of inflections. There are three ways to approach the task using a machine-learning framework:

1. Supervised Learning: The algorithm is provided with training data, e.g. crois → croire.

2. Minimally Supervised Learning: The algorithm is provided some explicit information, but not in the form of training pairs, e.g. “This language is suffixal”, or “-ing is a productive suffix in this language”.

3. Unsupervised Learning: The algorithm is not provided with any explicit information; rather, information must be extracted from other sources, e.g. a large text corpus.

SLIDE 23

Supervised Machine Learning Algorithms

A class of algorithms designed to form generalizations from “training data” in order to make predictions about previously unseen data. For example, given this training data...

inflected verb    citation form
jumping           jump
singing           sing
burning           burn
...               ...

...we want to predict the citation form of an inflected verb:

inflected verb    citation form
fishing           ?
carting           ?
soaring           ?
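As a toy illustration of the supervised setting, one might learn suffix-rewrite rules from the training pairs and apply the longest matching rule to an unseen verb. This sketch assumes pure suffixation and is not the system described in the talk:

```python
# Learn suffix-rewrite rules (e.g. "ing" -> "") from (inflected, citation) pairs;
# an illustrative sketch that handles suffixation only.
def learn_rules(pairs):
    rules = {}
    for inflected, citation in pairs:
        # strip the longest common prefix; what remains is the suffix rewrite
        i = 0
        while i < min(len(inflected), len(citation)) and inflected[i] == citation[i]:
            i += 1
        rules[inflected[i:]] = citation[i:]
    return rules

def predict(word, rules):
    # prefer the longest inflected-suffix rule that matches
    for suffix in sorted(rules, key=len, reverse=True):
        if suffix and word.endswith(suffix):
            return word[: -len(suffix)] + rules[suffix]
    return word

rules = learn_rules([("jumping", "jump"), ("singing", "sing"), ("burning", "burn")])
predict("fishing", rules)   # "fish"
predict("soaring", rules)   # "soar"
```

All three training pairs collapse to the single rule "ing" → "", which then generalizes to fishing, carting, and soaring.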

SLIDE 24

Unsupervised Machine Learning Algorithms

A class of algorithms designed to make predictions about new data without explicit training data. For example, given a large text corpus...

... John wanted to go to the park so he could jump and sing. His mom told him “Don’t forget to put on sun tan lotion to keep from burning.” So John put on sun tan lotion and went off to the park. While jumping and singing in the park, the sun was very strong, but John didn’t burn because he’d remembered to put on sun tan lotion. ...

...we want to predict the citation form of an inflected verb:

inflected verb (inflection)    citation form (root)
fishing                        ?
carting                        ?
soaring                        ?

SLIDE 26

Iterative Retraining

[Flowchart: initial alignments → unsupervised models (model, parameters) → noisy alignments → supervised models (re-estimated model, parameters) → model combination → final stand-alone morphological analyzer]

SLIDE 27

(Representative) Previous work

Hand-built

Koskenniemi (1983): Finite-state transducers

Supervised Approaches

Rumelhart and McClelland (1986): Neural Networks
Mooney and Califf (1995): Inductive Logic Programming

Unsupervised Approaches

Kazakov (1997): Bootstrapping IDL
Brent (1993, 1999), Creutz (2005): Segmentation
Goldsmith (2001), Snover et al. (2002): Suffix learning

1. All of these approaches look for string changes, e.g. +ing
2. ...have limited support for stem changes, e.g. sleep → slept
3. ...and have very limited capabilities to handle highly irregular forms, e.g. be → is, was, were

SLIDE 28

Prior approaches to computational morphology

“...[O]ne might pose the question, does the young language learner – who has access not only to the spoken language, but perhaps also to the rudiments of syntax and to the intended meaning of the words and sentences – does the young learner have access to additional information that simplifies the task of morpheme identification? ...I think that such a belief is very likely mistaken. Knowledge of semantics and even grammar is unlikely to make the problem of morphology discovery significantly easier.”

• J. Goldsmith, “Unsupervised Learning of the Morphology of a Natural Language”, Computational Linguistics 27(2), 2001.

SLIDE 29

Prior approaches to computational morphology

John Goldsmith ’72 (Philosophy, Math, Economics)

SLIDE 30

Outline

1. Introduction
2. Task Definition
3. Contextual Similarity
4. Model Combination

SLIDE 31

Semantic Similarity

Motivation
- Most inflectional variants are semantically similar to their citation form.
- Many semantically related words are not morphologically related: e.g. drink, sip, guzzle, quaff, etc.
- Many orthographically similar words are not semantically related: e.g. impact, impart, import, impose, improve, etc.
- Though it does happen: e.g. flare, flash, flame.
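A quick way to see the orthography/semantics mismatch is plain edit distance: impact and impart are a single edit apart yet unrelated in meaning, while drink and sip are close in meaning but far apart as strings. A minimal sketch (standard Levenshtein distance, not anything from the talk):

```python
# Levenshtein edit distance via the usual rolling-row dynamic program.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

edit_distance("impact", "impart")   # 1: orthographically close, semantically unrelated
edit_distance("drink", "sip")       # 4: semantically related, orthographically distant
```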

SLIDE 32

Use (a very crude approximation of) semantics as part of an unsupervised solution to morphological analysis.

We will judge a word by the company it keeps: the semantics of a word can be determined by the words with which it co-occurs. Measure the (cosine of the) angle between context vectors: smaller angles indicate more semantic similarity than larger angles.

         hands   head   faith   violently   himself   kill   away
shook      128    103      21          17         …      …      …
shake      151     98       8          12         …      …      …
shoot        …      …       …           …         …      …      …
shoo         …      …       …           …         …      …      …
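The cosine measure itself is straightforward. A minimal sketch using the recoverable counts from the co-occurrence table above (illustrative only, not the talk's code):

```python
# Cosine of the angle between two co-occurrence vectors.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# counts over the context words hands, head, faith, violently
shook = [128, 103, 21, 17]
shake = [151, 98, 8, 12]
cosine(shook, shake)   # ~0.99: shook and shake keep very similar company
```

A cosine near 1 means a small angle between the vectors, i.e. the two words occur in very similar contexts; orthogonal vectors give cosine 0.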
SLIDE 33

[Plot: cos(θ) as θ ranges from 15° to 90°]
SLIDE 34

Contextual Similarity

Visualizing this similarity (repeated-bisection clustering)

SLIDE 35

Performance

We do not need (or expect) an inflection to be most contextually similar to its root. What do we expect out of the contextual clustering?

- Positive evidence for correct pairings
- Negative evidence for incorrect pairings

Show how often the single most similar word to the inflection was the root (“Top 1”) and how often the root was one of the top ten most similar words (“Top 10”).
SLIDE 37

Performance of Contextual Similarity

Language     Corpus Size (millions)   Top 1     Top 10
Russian      34                       17.93%    47.39%
Estonian     43                       18.87%    45.68%
French       16.6                     17.72%    37.41%
Danish       14                       13.66%    31.52%
Portuguese   4.5                      11.59%    30.34%
English      118                      11.68%    26.87%
Icelandic    25                       9.58%     26.13%
Spanish      58                       9.69%     24.01%
Basque       0.7                      8.08%     23.12%

SLIDE 38

Performance of Contextual Similarity

Language     Corpus Size (millions)   Top 1    Top 10
Polish       23                       5.99%    21.99%
Romanian     0.1                      6.56%    19.62%
Dutch        1.3                      4.95%    18.77%
German       19                       5.09%    14.62%
Finnish      0.6                      3.38%    13.53%
Czech        1.3                      3.47%    12.36%
Swedish      1.0                      3.32%    12.34%
Italian      46                       3.52%    12.33%
Swahili      0.5                      2.20%    10.06%

SLIDE 39

Context model sensitivity

The performance of the context model is sensitive to a number of parameters, including:
- The size of the window, e.g. should it include 5 words, 10 words, 20 words?
- The position of the window, e.g. should the window be centered on the word, or should it be off-center to the right or left?
- The words to include/exclude in the window, e.g. should we exclude punctuation?
- The choice of our corpus
Of importance here is not that the model is sensitive to these factors... What is important is: can we learn these parameters automatically?
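A sketch of how those window parameters might enter a context-vector builder; the function name, signature, and defaults are illustrative, not the talk's implementation:

```python
# Count co-occurrences of a target word within a (left x right) token window,
# optionally excluding stop tokens such as punctuation.
from collections import Counter

def context_vector(tokens, target, left, right, exclude=frozenset()):
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        counts.update(w for w in window if w not in exclude)
    return counts

tokens = "he shook his head and then he shook my hand".split()
ctx = context_vector(tokens, "shook", left=1, right=2)
# Counter({'he': 2, 'his': 1, 'head': 1, 'my': 1, 'hand': 1})
```

Varying `left` and `right` gives exactly the window shapes compared in the tables that follow: a left-only window (6x0), a centered one (3x3), or a right-only one (0x6).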

SLIDE 43

S-V-O        Left (6x0)   Center (3x3)   Right (0x6)
Spanish      6.37%        20.31%         29.38%
Portuguese   12.05%       26.58%         32.87%
French       9.08%        38.40%         45.60%
Italian      3.69%        9.99%          14.98%
Romanian     10.42%       18.71%         20.86%
English      13.25%       21.98%         25.67%
Danish       7.21%        24.61%         34.59%
Swedish      2.09%        10.36%         18.69%
Icelandic    10.93%       23.43%         29.98%
Estonian     31.87%       42.20%         44.60%   (1x5 window: 32.21%)
Finnish      5.40%        12.09%         12.15%
Tagalog      10.10%       15.08%         17.08%
Swahili      8.63%        8.68%          11.02%

SLIDE 44

Language      Left (6x0)   Center (3x3)   Right (0x6)
Free / S-V-O
Czech         3.30%        11.05%         11.02%
Polish        8.16%        18.91%         20.87%
Russian       19.91%       40.91%         47.35%
Verb Second (V2)
German        9.97%        14.78%         9.96%
Dutch         11.78%       16.32%         15.50%
S-O-V
Turkish       52.66%       44.40%         25.28%
Basque        25.88%       19.75%         6.44%

SLIDE 45

Outline

1. Introduction
2. Task Definition
3. Contextual Similarity
4. Model Combination

SLIDE 52

Choosing a final analysis

             Iteration
Language     0       1       2       3       4       5
English      88.5%   94.9%   95.8%   93.1%   97.5%   99.1%
Portuguese   96.0%   96.7%   97.0%   97.5%   97.6%   98.2%
German       93.1%   93.0%   94.4%   92.3%   93.3%   94.8%
Basque       86.3%   89.9%   90.8%   90.5%   93.8%   95.1%
Russian      79.7%   80.8%   81.6%   76.8%   78.9%   84.4%
Estonian     81.1%   83.8%   84.9%   85.9%   86.4%   88.2%
Turkish      85.7%   97.2%   97.7%   98.7%   99.1%   99.2%

Columns 0-4 show the precision of the supervised model, trained using the iteratively refined unsupervised models. Column 5 shows the precision of the final model, a combination of the unsupervised and supervised models.

SLIDE 53

How did we do?

It looks like we did pretty well, but without digging through the data to see the mistakes we’re making, it’s hard to know. We do, however, have one point of reference that might serve to show how well we are doing...

SLIDE 54

How did we do?

In English and Portuguese, we do better using unsupervised methods than using the fully supervised method! In Basque and Turkish, we were pretty close, too.

Language     Final Iteration   Fully Supervised
English      99.1%             99.1%
Portuguese   98.2%             97.9%
German       94.8%             97.9%
Basque       95.1%             96.0%
Russian      84.4%             90.8%
Estonian     88.2%             96.8%
Turkish      99.2%             99.5%

SLIDE 55

How did we do?

Notice that even the supervised learner gets a boost from the backoff to the unsupervised methods.

Language     Final Iteration   Fully Supervised   Supervised with Backoff
English      99.1%             99.1%              99.5%
Portuguese   98.2%             97.9%              98.9%
German       94.8%             97.9%              98.3%
Basque       95.1%             96.0%              97.4%
Russian      84.4%             90.8%              93.3%
Estonian     88.2%             96.8%              98.3%
Turkish      99.2%             99.5%              99.8%
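The backoff idea can be sketched very simply: use the supervised model's answer when it has one, and otherwise fall back to the unsupervised model. The function and the data here are hypothetical, chosen only to illustrate the combination:

```python
# Supervised-with-backoff combination: prefer the supervised analysis,
# fall back to the unsupervised one when the word was never seen in training.
def combined_analyze(word, supervised, unsupervised):
    return supervised.get(word) or unsupervised.get(word)

supervised = {"burning": "burn"}      # learned from labeled training pairs
unsupervised = {"fishing": "fish"}    # induced from raw text

combined_analyze("burning", supervised, unsupervised)   # "burn"  (supervised hit)
combined_analyze("fishing", supervised, unsupervised)   # "fish"  (backoff)
```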

SLIDE 56

How did we do?

Accuracy after final iteration on different types of inflections (where classification labels were available):

Language     All       Regular   Semi-Regular   Irregular   Obsolete/Other
English      99.05%    99.41%    99.50%         54.84%      100.0%
Portuguese   98.20%    98.31%    -              83.33%      20.00%
German       94.76%    97.83%    96.31%         75.60%      96.81%
Basque       95.14%    -         -              -           -
Russian      84.42%    -         -              -           -
Estonian     88.18%    -         -              -           -
Turkish      99.19%    99.96%    97.33%         19.35%      -

SLIDE 57

Recap

- Motivated the need for morphology using machine translation
- Highlighted the lack of MT resources
- Described the task of morphological analysis
- Presented a way of approaching the problem which can be applied to any language with large (electronically available) text sources
- Showed the success of the approach

SLIDE 58

Future Work

- Revise the supervised models (extend to new phenomena)
- Grammatical category (part of speech) acquisition for use in morphological generation
- Leverage the unsupervised models for lexical choice
- Release the source code (so others can improve on it)

SLIDE 59

Thanks

- Swarthmore College, for providing second-semester support through the James A. Michener Faculty Fellowship
- Connie Li ’06 and Scott Blaha ’07 (unsupervised models)
- Adrian Packel ’04 (acquisition of Hebrew data via OCR)
- Phil Katz ’07 and Matthew Singleton ’07 (lexical choice)