  1. An Unsupervised Method for Uncovering Morphological Chains
     Karthik Narasimhan, Regina Barzilay, Tommi Jaakkola
     CSAIL, Massachusetts Institute of Technology

  2. Morphological Chains
     Chains to model the formation of words: paint → painting → paintings
     A richer representation than traditional scenarios: segmentation, paradigms.

  3. Our Approach
     Core idea: an unsupervised discriminative model over pairs of words in the chain: paint → painting
     • Orthographic features: Morfessor (Goldwater and Johnson, 2004; Creutz and Lagus, 2007); Poon et al., 2009; Dreyer and Eisner, 2009; Sirts and Goldwater, 2013
     • Semantic features: Schone and Jurafsky, 2000; Baroni et al., 2002
     • Handles transformations (plan → planning)

  4. Textual Cues
     Orthographic: patterns in the characters forming words.
       paint, paints, painted mirror pain, pains, pained
       but orthography alone also suggests pain → paint and ran → rant
     Semantic: meaning embedded as vectors.
       A      B        cos(A, B)
       paint  paints   0.68
       paint  painted  0.60
       pain   pains    0.60
       pain   paint    0.11
       ran    rant     0.09
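To make the semantic cue concrete, here is a minimal sketch of the cosine computation, assuming pretrained word vectors are available in a `vectors` lookup (the table's numbers come from the authors' vectors, which this toy stand-in will not reproduce):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two word vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical lookup of pretrained vectors (e.g., from word2vec).
rng = np.random.default_rng(0)
vectors = {w: rng.standard_normal(100) for w in ("paint", "paints", "ran", "rant")}

# High similarity (paint, paints) supports a morphological relation;
# low similarity (ran, rant) argues against one.
print(cosine(vectors["paint"], vectors["paints"]))
print(cosine(vectors["ran"], vectors["rant"]))
```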

  5. Task Setup
     Training: an unannotated word list with frequencies
       a        395134
       ability   17793
       able      56802
       about    524355
     Word vector learning: a large text corpus (Wikipedia)
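The deck does not name a toolkit for the word-vector step; a minimal sketch, assuming word2vec via gensim over tokenized Wikipedia text (the parameters are illustrative, not the authors' exact setup):

```python
from gensim.models import Word2Vec

# `sentences` would stream tokenized Wikipedia; a toy stand-in is shown here.
sentences = [["the", "painter", "paints", "a", "painting"],
             ["they", "painted", "the", "wall"]]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)

vec = model.wv["paints"]  # dense vector later used for the cosine cue
```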


  6. Multiple chains are possible for a word:
     nation → national → international → internationally
     nation → national → nationally → internationally
     Different chains can share word pairs:
     nation → national → international → internationally
     nation → national → nationalize
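To make the chain structure concrete: once a model can propose a parent for a word (or decide to stop), a chain is read off by following parent predictions. `predict_parent` below is a hypothetical stand-in for the model introduced next; the deck itself does not show this readout.

```python
def build_chain(word: str, predict_parent) -> list[str]:
    """Follow predicted parents until a stop decision (None)."""
    chain = [word]
    parent = predict_parent(word)
    while parent is not None:
        chain.append(parent)
        parent = predict_parent(parent)
    return list(reversed(chain))  # base form first

# e.g. build_chain("internationally", model) might yield
# ["nation", "national", "international", "internationally"]
```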

  7. Independence Assumption
     Treat word-parent pairs separately.
     Word (w): national    Parent (p): nation    Type (t): Suffix
     Candidate (z): the (parent, type) pair
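A minimal sketch of how the (parent, type) candidates z might be enumerated for a word by splitting off affixes; the minimum stem length and the stop candidate are illustrative choices, not the authors' exact settings:

```python
def candidates(word: str, min_stem: int = 3) -> list[tuple[str, str]]:
    """Enumerate (parent, type) candidates z for `word`."""
    cands = [(word, "stop")]  # the word may itself be a base form
    for split in range(min_stem, len(word)):
        cands.append((word[:split], "suffix"))              # drop a suffix
        cands.append((word[len(word) - split:], "prefix"))  # drop a prefix
    return cands

# candidates("national") includes ("nation", "suffix"): national = nation + "al"
```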

  8. For word w = national with candidate z = (nation, Suffix):
     P(w, z) ∝ e^{θ·φ(w, z)}
     Types: Prefix, Suffix, Transformations, Stop.
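A sketch of the log-linear form: in the model the normalizer runs over all strings, but at prediction time one can score a single word's candidates and normalize over just those. `phi` is a hypothetical feature function returning a sparse dict.

```python
import math

def candidate_distribution(word, cands, theta, phi):
    """P(z | w) over one word's candidates, with P(w, z) ∝ exp(θ·φ(w, z))."""
    scores = [math.exp(sum(theta.get(name, 0.0) * value
                           for name, value in phi(word, z).items()))
              for z in cands]
    total = sum(scores)
    return [s / total for s in scores]
```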

  9. Transformations
     • Templates for handling changes in the stem during addition of affixes.
     • Repetition template: PQ → PQQR (for each Q in the alphabet). Ex.: plan → planning, with P = "pla", Q = "n", R = "ing".
     • A feature template for each transformation.

  10. Transformation types
      3 different transformations:
      • Repetition (plan → planning)
      • Deletion (decide → deciding)
      • Modification (carry → carried)
      There is a trade-off between the variety of transformation types and computational tractability: these three do well for a range of languages and remain tractable, at most O(|Σ|²) for alphabet Σ.
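A sketch of undoing the three transformation types to recover parent candidates once a suffix has been split off; the helper name and heuristics are illustrative, not the paper's exact procedure.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def transformation_parents(word: str, suffix: str) -> list[tuple[str, str]]:
    """Propose parents of `word` = stem + `suffix` under the three templates."""
    if not word.endswith(suffix):
        return []
    stem = word[: len(word) - len(suffix)]
    parents = []
    if len(stem) >= 2 and stem[-1] == stem[-2]:
        parents.append((stem[:-1], "repeat"))           # planning -> plan
    for ch in ALPHABET:
        parents.append((stem + ch, "delete"))           # deciding -> decide
        if stem:
            parents.append((stem[:-1] + ch, "modify"))  # carried -> carry
    return parents

# ("plan", "repeat") in transformation_parents("planning", "ing")  -> True
```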

  11. Features φ(w, z)
      Orthographic:
      • Affixes: indicator feature for top affixes
      • Affix correlation: pairs of affixes sharing a set of stems, e.g. (inter-, re-), (under-, over-)
      • Word frequency of the parent
      • Transformation types with character bigrams
      Semantic:
      • Cosine similarity between the word vectors of the word and its parent
      [Chart: cosine similarity with "player"]
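A sketch of a sparse feature map φ(w, z) combining the cues above, reusing the `cosine` helper from the earlier sketch; `freq` and `vectors` are assumed lookups, and the feature names are illustrative.

```python
import math

def phi(word, parent, affix_type, affix, freq, vectors):
    """Sparse features for the candidate z = (parent, affix_type)."""
    feats = {f"{affix_type}={affix}": 1.0}              # affix indicator
    feats["parent_log_freq"] = math.log(freq.get(parent, 1))
    if word in vectors and parent in vectors:
        feats["cos(word,parent)"] = cosine(vectors[word], vectors[parent])
    return feats
```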

  12. Learning
      • Objective:
        ∏_w P(w) = ∏_w ∑_z P(w, z) = ∏_w ∑_z e^{θ·φ(w, z)} / ∑_{w′ ∈ Σ*, z′} e^{θ·φ(w′, z′)}
      • Optimize the likelihood using convex optimization: L-BFGS-B (with regularization).
      • Not tractable as written: computing the normalization constant Z requires summing over all possible strings over the alphabet.

  13. Contrastive Estimation
      • Instead, we use contrastive estimation (Smith and Eisner, 2005):
      • a neighborhood of invalid words for each word to take probability mass from.
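The deck does not define the neighborhood; one standard choice from Smith and Eisner (2005) is TRANS1, all strings obtained by transposing one adjacent character pair. A sketch under that assumption:

```python
def trans1_neighborhood(word: str) -> set[str]:
    """All single adjacent-character transpositions of `word`."""
    neighbors = set()
    for i in range(len(word) - 1):
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped != word:
            neighbors.add(swapped)
    return neighbors

# During training, each word's probability is normalized against its (mostly
# invalid) neighbors instead of all strings, making the objective tractable.
# trans1_neighborhood("paint") -> {"apint", "piant", "panit", "paitn"}
```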
