
A Non-Parametric Model for the Discovery of Inflectional Paradigms from Plain Text using Graphical Models over Strings. Markus Dreyer, Center for Language and Speech Processing (CLSP), Human Language Technology Center of Excellence (HLTCOE), Johns Hopkins University.


  1. String Pairs. Example: s1 = breaking, s2 = broke. Pr(s1, s2) = 1/Z · F(s1, s2), where F(s1, s2) sums over all monotonic alignments of the pair and scores each one log-linearly (ε marks an empty alignment slot, # the word boundary):
      F(s1, s2) = exp Σi θi fi(#breaking#, #brεokeεε#)
                + exp Σi θi fi(#breaking#, #broεkeεε#)
                + exp Σi θi fi(#breaεking#, #brεεokeεε#)
                + exp Σi θi fi(#breakεing#, #broεkeεεε#)
                + ...

  2.–7. [Animation frames highlighting one alignment.] The alignment #breakεing# / #broεkeεεε# contributes the term exp Σi θi fi(#breakεing#, #broεkeεεε#) to the sum.

  8. Focus on one alignment: #breakεing# / #broεkeεεε#.
  9. A feature looks at a window of the alignment: eak / oεk (the full window).
  10. Back the window off to vowel/consonant classes: VVC / VεC (full vowels, consonants).
  11. Also use a window over the target language only: oεk.
  12. And a "collapsed" target window with the epsilons removed: ok.
  13. Edit-operation features mark each alignment column as subst, del, ins, or ident.
  14. Also add versions of these features that are backed off to bigrams (?ak / ?εk, ?VC / ?εC, ?k, ...)!
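The feature templates above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the use of "~" for ε, and the exact feature tuples are my assumptions, not the thesis implementation.

```python
VOWELS = set("aeiou")

def char_class(c):
    # Back a character off to a coarse class: V(owel), C(onsonant), or epsilon ("~").
    if c == "~":
        return "~"
    return "V" if c in VOWELS else "C"

def window_features(top, bot, i, n=3):
    """Features for one n-character window of an aligned pair.

    top/bot are equal-length aligned strings, with "~" as epsilon,
    e.g. top="break~ing", bot="bro~ke~~~".
    """
    t, b = top[i:i + n], bot[i:i + n]
    feats = [
        ("full", t, b),                           # full window, e.g. eak / o~k
        ("vc", "".join(map(char_class, t)),
               "".join(map(char_class, b))),      # vowel/consonant backoff: VVC / V~C
        ("target", b),                            # target-language window only
        ("collapsed", b.replace("~", "")),        # target window with epsilons removed
    ]
    # edit-operation features, one per alignment column of the window
    for x, y in zip(t, b):
        if x == y:
            feats.append(("ident",))
        elif x == "~":
            feats.append(("ins",))
        elif y == "~":
            feats.append(("del",))
        else:
            feats.append(("subst",))
    return feats
```

For example, `window_features("break~ing", "bro~ke~~~", 2)` yields the window from the slides: the full window ("eak", "o~k"), its vowel/consonant backoff ("VVC", "V~C"), and the per-column operations subst, del, ident.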

  15. String Pairs. To compute such feature-based scores for two string variables S1 and S2, we construct a weighted finite-state transducer F. It can assign a score to any string pair s1, s2: Pr(s1, s2) = 1/Z · F(s1, s2).

  16. Background: finite-state machines. What is a finite-state acceptor (FSA)? An automaton with a finite number of states and arcs; it can be used to assign a score to any string. What is a finite-state transducer (FST)? The same as an FSA, but it assigns a score to any string pair (e.g., evaluating how well the two strings go together).
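As a toy illustration of the FSA idea (not the machinery used in the talk), a weighted acceptor can be stored as a transition table and scored by dynamic programming, summing the weights of all accepting paths; the states, arcs, and weights below are made up:

```python
from collections import defaultdict

# A tiny weighted FSA over {a, b}: arcs map (state, symbol) -> [(next_state, weight)].
arcs = {
    (0, "a"): [(0, 0.5), (1, 0.5)],
    (0, "b"): [(0, 1.0)],
    (1, "b"): [(1, 2.0)],
}
start, final = 0, {1: 1.0}  # final-state weights

def score(string):
    """Sum of path weights over all accepting paths for `string`."""
    weights = defaultdict(float)
    weights[start] = 1.0
    for sym in string:
        nxt = defaultdict(float)
        for state, w in weights.items():
            for dest, arc_w in arcs.get((state, sym), []):
                nxt[dest] += w * arc_w
        weights = nxt
    # weight of each reached state times its final weight
    return sum(w * final.get(s, 0.0) for s, w in weights.items())
```

An FST works the same way, except each arc carries a pair of symbols (input:output), so the machine scores string pairs rather than single strings.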

  17. Finite-state machines: a specific kind of grammar that describes and scores one or more strings. They have closure properties under many useful operations (we will use composition, intersection, and projection) and are useful for many tasks in natural language processing.

  18.–20. String Pairs. S1 = b r e c h e n and S2 = b r a c h t, connected by a finite-state transducer F.
  21. Arcs have weights, determined by their features.
  22. Transducer F computes the score by looking at all alignments.
  23. The score is the sum over all paths in the finite-state transducer: here F = 13.26.
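The "sum over all paths" can be sketched as a forward-style dynamic program over monotonic alignments. This is a simplified stand-in for the transducer F: it uses one hypothetical log-weight per edit operation (the θ values below are made up; in the model they come from the learned features).

```python
import math

# made-up per-operation log-weights
theta = {"ident": 1.0, "subst": -1.0, "del": -1.5, "ins": -1.5}

def F(s1, s2):
    """Sum over all monotonic alignments of exp(sum of per-edit theta)."""
    n, m = len(s1), len(s2)
    # chart[i][j] = total weight of all alignments of s1[:i] with s2[:j]
    chart = [[0.0] * (m + 1) for _ in range(n + 1)]
    chart[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            w = chart[i][j]
            if w == 0.0:
                continue
            if i < n and j < m:  # align s1[i] with s2[j]
                op = "ident" if s1[i] == s2[j] else "subst"
                chart[i + 1][j + 1] += w * math.exp(theta[op])
            if i < n:            # delete s1[i] (align with epsilon)
                chart[i + 1][j] += w * math.exp(theta["del"])
            if j < m:            # insert s2[j] (align with epsilon)
                chart[i][j + 1] += w * math.exp(theta["ins"])
    return chart[n][m]
```

For instance, F("a", "a") sums three alignment paths: one identity match plus the two delete-then-insert orders, i.e. exp(1.0) + 2·exp(-3.0).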

  24. String Pairs. The alignment between the string pair is a latent variable. We add more latent variables to the model: change regions and conjugation classes. For details, see my thesis and Dreyer, Smith & Eisner, 2008.

  25. Inflection (on German verbs). [Bar chart, accuracy 60–95%, over four form-pair tasks: 13SIA–13SKE, 2PIE–13PKE, 2PKE–z, rP–pA; systems: Moses (baseline), FST, and FST+latent (this talk).] See my thesis and Dreyer, Smith & Eisner, 2008.

  26. Lemmatization. [Bar chart, accuracy 70–100%, on Basque, English, Irish, and Tagalog; systems: Wicentowski (2002) and this talk.] See my thesis and Dreyer, Smith & Eisner, 2008.

  27. Transliteration competition, NEWS 2009. [Bar chart: accuracy on English-to-Russian for the compared systems: 61.3, 60.5, 60.0, 54.5; this system uses basic features.]

  28. Conclusions / Contributions. • Presented a novel, well-defined probability model over string pairs (or single strings) • General enough to model many string-to-string problems in NLP (and neighboring disciplines) • Achieved high-scoring results on different tasks (inflection, lemmatization, transliteration) in multiple languages (German, Basque, English, Irish, Tagalog, Russian)

  29. Conclusions / Contributions 1 • Linguistic properties and soft constraints can be expressed and learned (prefer certain vowel/consonant sequences, prefer identities, ...) • Arbitrary-length output is handled elegantly (eliminates need for limiting structure insertion) • Much information does not need to be annotated; it is inferred as hidden variables (alignments, conjugation classes, regions)

  30. Overview: (1) string pairs; (2) multiple strings (paradigms); (3) text and paradigms.

  31. Multiple Strings. We've seen how to model 2 strings, using feature-based finite-state machines. But we have bigger goals ...

  32. Multiple Strings 2

  33.–35. Example applications: inflectional paradigms, with some cells known and many unknown (?).
  36.–38. Known forms predict the unknown cells.
  39. Predictions for different cells reinforce each other.

  40. Example applications: transliteration (using phonology). English orthography "ice cream" ↔ English phonology ↔ Japanese phonology ↔ Japanese orthography アイスクリーム.

  41. Example applications: spelling correction. Misspelling ("egg sample") ↔ pronunciation ↔ correct spelling ("example").

  42. Example applications 2 ... and all other tasks where word forms and representations interact: • Cognate modeling • Multiple-string alignment • System combination

  43. Multiple Strings 2 • Let’s build a general probability model over multiple strings • It extends the string-pair model we saw in the last part. • We will later be able to use it to learn how to inflect verbs.

  44.–47. Model: factor graph examples. Pr(s1, s2) = 1/Z · F1(s1, s2). S1 and S2 are random variables, each ranging over any string; F1 is a potential function that can score any string pair.
  48.–52. Growing the factor graph, one factor at a time:
      Pr(s1, s2, s3, s4) = 1/Z · F1(s1, s2) · F2(s1, s3) · F3(s1, s4) · F4(s2, s3) · F5(s3, s4) · F6(s2, s4)
  53.–54. Each potential function F can score any string pair and is computed by a finite-state transducer.
  55. A formal description of such a model ...
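A brute-force version of this factor graph over a tiny string domain makes the product-of-potentials definition concrete. Everything here is a toy assumption: the candidate strings, the common-prefix potential, and the enumeration of Z; only the six-edge graph shape comes from the slide.

```python
import itertools

DOMAIN = ["brechen", "breche", "bricht", "brach"]  # toy candidate strings

def F(x, y):
    # toy pairwise potential: a longer common prefix gives a higher score
    k = 0
    while k < min(len(x), len(y)) and x[k] == y[k]:
        k += 1
    return 2.0 ** k

# the six pairwise factors over S1..S4 from the slide's factor graph
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (1, 3)]

def unnormalized(assign):
    p = 1.0
    for i, j in EDGES:
        p *= F(assign[i], assign[j])
    return p

# Z sums the product of potentials over every joint assignment of the 4 variables
Z = sum(unnormalized(a) for a in itertools.product(DOMAIN, repeat=4))

def prob(assign):
    return unnormalized(assign) / Z
```

Of course, the real model cannot enumerate assignments, since each variable ranges over all strings; that is exactly why the factors are finite-state transducers and inference uses message passing instead of brute force.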

  56. Model summary. It is formally an undirected graphical model (a.k.a. Markov Random Field, MRF), in which the variables are string-valued and the factors (potential functions) are finite-state transducers. (Dreyer & Eisner, 2009)

  57. Model. Less formal description 2 To model multiple strings and their various interactions, I ... • use many finite-state transducers , • have each of them look at a different string pair, • plug them together into a big network, • and coordinate them to predict all strings jointly (also: train the transducers jointly).

  58.–59. Model: comparison with a k-tape FSM. Could we instead model k strings with a single k-tape finite-state machine (one tape per variable, e.g. aligned forms breεchenε / brεachεεt / brεachenε / brεachεεε)? No: such a machine needs >26^k arcs, which is intractable (cf. multiple-sequence alignment). The factored model is also more powerful: ☺ it can encode swaps and other useful models, ☹ though it can also encode undecidable models.

  60.–61. Inference overview. Run Belief Propagation (BP) on the factor graph.
  62. BP is a message-passing algorithm, a generalization of forward-backward.
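For intuition, sum-product message passing on a small chain of string variables (a tree, where BP is exact) fits in a few lines. The domain and the common-prefix potential are toy assumptions; in the actual model each message is itself a weighted finite-state machine over all strings.

```python
DOMAIN = ["break", "broke", "breaks"]  # toy string domain

def F(x, y):
    # toy pairwise potential: 1 + length of the common prefix
    k = 0
    while k < min(len(x), len(y)) and x[k] == y[k]:
        k += 1
    return float(1 + k)

def message(incoming):
    """Pass a belief over the sending variable through factor F to the receiver."""
    return {y: sum(incoming[x] * F(x, y) for x in DOMAIN) for y in DOMAIN}

# chain S1 -- F -- S2 -- F -- S3: marginal of S3 by two message-passing steps
uniform = {x: 1.0 for x in DOMAIN}
m12 = message(uniform)  # message S1 -> S2
m23 = message(m12)      # message S2 -> S3
Z = sum(m23.values())
marginal_S3 = {y: m23[y] / Z for y in DOMAIN}
```

On this chain the two message steps compute exactly the nested sums Σ_{x1,x2} F(x1, x2) · F(x2, y), which is the forward pass of forward-backward; BP generalizes this to arbitrary factor graphs.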
