  1. Exploiting multilingual lexical resources to predict the compositionality of MWEs
     Paul Cook, University of New Brunswick

  2. Compositionality
     • Many MWEs exhibit semantic idiomaticity
     • Compositionality: the extent to which the meaning of an MWE is predictable from the meanings of its components
     • Compositionality lies on a continuum
     • Compositionality predictions: binary (Bannard et al., 2003), multi-way (Fazly and Stevenson, 2007), continuous (Reddy et al., 2011)

  3. Compositionality
     • An MWE component word is compositional if its meaning is reflected in the meaning of the expression
     • Compositionality predictions: the MWE as a whole, and individual component words (Bannard et al., 2003; Reddy et al., 2011)

  4. In this talk…
     • Compositionality prediction for many languages and many kinds of MWEs via a multilingual lexical resource
     • Token-level MWE identification is important for type-level compositionality prediction
     • Compositionality scores are not the end of the story: the case of English VPCs

  5. Compositionality prediction: String similarity

  6. String similarity
     • Compositionality prediction based on string similarity of the MWE and its component words, under translation (Salehi and Cook, 2013)
     • Applicable to many types of MWEs in many languages
     • Does not require:
       • Language- or construction-specific properties (e.g., Fazly et al., 2009)
       • A parallel corpus (e.g., de Caseli et al., 2010; Salehi et al., 2012)
       • A monolingual corpus (e.g., Lin, 1999; Reddy et al., 2011)
     • Requires a multilingual dictionary

  7. Motivation

     Source language      Target-language translation
     kick the bucket      mord
     kick                 zad
     the                  ---
     bucket               satl

     Source language      Target-language translation
     make a decision      tasmim gereftan
     make                 sakht
     a                    yek
     decision             tasmim

  8. Motivation

     Source language      Target-language translation
     public service       khadamaat omumi
     public               omumi
     service              khedmat

  9. Approach (1)
     [Diagram: the MWE and its component words are translated with PanLex (e.g., "khadamaat omumi", "omumi", "khedmat"); the MWE's translation is compared against each component's translation using string similarity measures (LCS, LEV1, LEV2, SW), yielding similarity scores s1 and s2; a sketch of the comparison step follows below]
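
To make the comparison step concrete, here is a minimal Python sketch of two of the measures. LCS is taken here to be longest common substring and LEV a length-normalised Levenshtein similarity; these are illustrative formulations under those assumptions, not necessarily the exact ones used in the paper.

```python
def lcs_similarity(a: str, b: str) -> float:
    """Longest common substring length, normalised by the mean string length."""
    best = 0
    # Dynamic programming over all prefix pairs.
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                best = max(best, table[i][j])
    return best / ((len(a) + len(b)) / 2)

def levenshtein(a: str, b: str) -> int:
    """Standard edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def lev_similarity(a: str, b: str) -> float:
    """Edit distance turned into a [0, 1] similarity score."""
    return 1 - levenshtein(a, b) / max(len(a), len(b))

# Compare the MWE's translation with each component's translation:
print(lcs_similarity("khadamaat omumi", "omumi"))    # component "public"
print(lev_similarity("khadamaat omumi", "khedmat"))  # component "service"
```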

  10. Translations
      • Translate into 54 languages! (not just one)
      • PanLex
        • Free online translation resource
        • Combines many translation dictionaries
        • 20M lemmas; 9k language varieties; 1.1B translations
      • Select the top-10 languages via cross-validation

  11. Approach (2)
      [Diagram: in each of the best 10 languages, compare the translations of "public service" vs. "public" and of "public service" vs. "service"; the mean of each set of per-language scores gives s1 and s2]
      • Compositionality score: α·s1 + (1 − α)·s2 (see the sketch below)
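
A minimal sketch of this combination step: the per-language similarity scores are assumed to come from measures like those sketched above, the example numbers are invented, and α would be tuned on development data.

```python
from statistics import mean

def compositionality_score(scores_c1, scores_c2, alpha=0.5):
    """Combine per-language similarity scores for the two components.

    scores_c1 / scores_c2: similarity scores, one per selected language,
    between the MWE's translation and each component word's translation.
    """
    s1 = mean(scores_c1)
    s2 = mean(scores_c2)
    return alpha * s1 + (1 - alpha) * s2

# e.g., scores for "public service" vs. "public" and vs. "service"
# across the selected languages (illustrative numbers only):
score = compositionality_score([0.42, 0.55, 0.31], [0.61, 0.58, 0.49])
print(f"compositionality score: {score:.3f}")
```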

  12. Results: ENC

      Method                                    Correlation (r)
      Reddy et al. (2011)                       0.714
      String similarity                         0.649
      String similarity: Best single language   0.497
      String similarity + Reddy et al.          0.742

      • 90 English noun compounds (ENC; Reddy et al., 2011)
      • Nested 10-fold cross-validation

  13. Results: EVPC

      Method                  Accuracy
      Bannard et al. (2003)   0.600
      String similarity       0.693

      • 160 English verb-particle constructions (EVPC; Bannard, 2006)
      • Binary compositionality prediction for the verb component

  14. Results: GNC

      Method                                    Correlation (r)
      String similarity                         0.372
      String similarity: Best single language   0.320
      Schulte im Walde et al. (2013)            0.450

      • 246 German noun compounds (GNC; von der Heide and Borgwaldt, 2009; Schulte im Walde et al., 2013)

  15. Most-selected languages

      ENC                      EVPC (verb)              GNC
      Language     Family      Language     Family      Language     Family
      Czech        Slavic      Basque       Basque      Polish       Slavic
      Norwegian    Germanic    Lithuanian   Baltic      Lithuanian   Baltic
      Portuguese   Romance     Slovenian    Slavic      Finnish      Uralic

  16. Compositionality prediction: Distributional similarity

  17. Compositionality via distributional similarity
      [Diagram: vector space containing "kick", "bucket", "kick the bucket", and "die"; the idiomatic "kick the bucket" lies near "die" but far from its components "kick" and "bucket"]
      (Katz and Giesbrecht, 2006; Reddy et al., 2011)

  18. Compositionality via distributional similarity
      [Diagram: the same vector space with "pail" and "kick the pail" added; the compositional "kick the pail" lies near its components, while the idiomatic "kick the bucket" lies near "die"]
      (Katz and Giesbrecht, 2006; Reddy et al., 2011)

  19. Distributional similarity of multi-way translations
      • Compositionality based on distributional similarity under translation into many languages (Salehi, Cook and Baldwin, 2014)
      • Still applicable to many languages and kinds of MWEs
      • Additional requirement: many monolingual corpora
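
To make the distributional step concrete, here is a minimal sketch using cosine similarity over toy context vectors. The vectors below are invented stand-ins; a real system would build them from co-occurrence counts in each language's monolingual corpus.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two context vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical context-count vectors over some fixed context vocabulary:
vec_mwe    = np.array([0.1, 0.0, 2.0, 3.0])  # "kick the bucket"
vec_kick   = np.array([2.5, 1.0, 0.2, 0.1])  # "kick"
vec_bucket = np.array([0.3, 2.0, 0.1, 0.0])  # "bucket"

# A simple compositionality estimate: similarity of the MWE's vector to
# its components' vectors. Low similarity suggests a non-compositional MWE.
score = (cosine(vec_mwe, vec_kick) + cosine(vec_mwe, vec_bucket)) / 2
print(f"compositionality estimate: {score:.3f}")
```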

  20. Approach (1)
      [Diagram]

  21. Approach (2)
      [Diagram: in each of the best N languages, compare the translations of "public service" vs. "public" and of "public service" vs. "service"; the mean of each set of per-language scores gives s1 and s2]
      • Compositionality score: α·s1 + (1 − α)·s2

  22. Results

      Method                           ENC     EVPC    GNC
      1. Source language               0.700   0.177   0.141
      2. Best-N target languages       0.434   0.398   0.113
      3. 1 + 2                         0.725   0.312   0.178
      4. 1 + 2 + String similarity     0.732   0.417   0.364

      Correlation (r) on each dataset

  23. Compositionality prediction and token-level MWE identification

  24. MWE identification: VPC vs. V+PP
      • VPCs occur in two configurations and are ambiguous with V+PPs:
        1. Look up the number (VPC: joined)
        2. Look the number up (VPC: split)
        3. Look up the chimney (V+PP)
      • Token-level MWE identification strategy: full-token n-gram match (see the sketch below)
      • For EVPC this led to poor performance of English distributional similarity
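
A minimal sketch of full-token n-gram matching for the VPC "look up", using a simple regular-expression approximation rather than the authors' actual identification code. Note it cannot distinguish the V+PP reading "look up the chimney", which is exactly the ambiguity this slide discusses.

```python
import re

# Joined configuration: verb immediately followed by the particle.
JOINED = re.compile(r"\blook(?:s|ed|ing)?\s+up\b", re.IGNORECASE)
# Split configuration: allow a short noun phrase (one to three tokens)
# between the verb and the particle.
SPLIT = re.compile(r"\blook(?:s|ed|ing)?\s+(?:\w+\s+){1,3}up\b", re.IGNORECASE)

def find_vpc_candidates(sentence: str) -> list[str]:
    """Return surface matches for joined and split 'look up' candidates."""
    return ([m.group(0) for m in JOINED.finditer(sentence)] +
            [m.group(0) for m in SPLIT.finditer(sentence)])

print(find_vpc_candidates("Look up the number."))   # joined
print(find_vpc_candidates("Look the number up."))   # split
print(find_vpc_candidates("Look up the chimney."))  # false positive: V+PP
```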

  25. MWE identification: MWEs vs. literal combinations
      • Expressions can be ambiguous between MWEs and literal combinations:
        1. I think Paula might hit the roof if you start ironing
        2. When the blood hit the roof of the car I realised it was serious
      • Many verb-noun idiomatic combinations are primarily used literally! (Fazly et al., 2009)
        • blow (the) whistle (65%); pull (one's) leg (78%); see star(s) (92%)
      • Type-level compositionality prediction based on distributional similarity can be influenced by the (possibly predominant!) literal usages of an expression

  26. Beyond compositionality predictions

  27. Limitations
      • Compositionality predictions don't indicate which meaning of a component word is contributed
      • Kangaroo court
        • Court is mostly compositional (4.4/5; Reddy et al., 2011)
        • Place for legal trials? Area for playing sports?
      • Stir up
        • 16% of annotators judged up to be entailed (Bannard, 2006)
        • Particles are highly polysemous
        • Is some other meaning contributed?

  28. Cognitive grammar
      • Cognitive linguistics: many so-called idiomatic expressions in fact draw on the meaning of their component words
      • Cognitive grammar: represent non-spatial concepts as spatial relations
        • Trajector (TR): object that is conceptually foregrounded
        • Landmark (LM): object against which the TR is foregrounded
        • Schema: abstract conceptualization of TR and LM in some initial and final configuration, as communicated by an expression
      • Lindner (1981) identifies 4 senses for up in VPCs

  29. Vertical up
      • TR moves away from LM in the direction of increase along a vertically-oriented axis
      • Prototypical upward movement: The balloon floated up
      • Movement along an abstract vertical axis: The price of gas jumped up
      • Metaphorical extensions: up as a path into…
        • Perceptual field: show up, spring up
        • Mental field: dream up, think up
        • State of activity: get up, start up

  30. Goal-oriented up
      • TR approaches a goal LM; movement is not necessarily vertical
      • Prototypical examples:
        • The bus drew up to the stop
        • He walked up to the bar
      • Metaphorical extensions:
        • Social domain: The intern kissed up to his boss
        • Domain of time: The deadline is coming up quickly

  31. Completive up
      • A sub-sense of goal-oriented up; shares its schema
      • LM represents an action being done to completion
      • Corresponds to up as an aspectual marker
      • Examples:
        • Clean up your room!
        • Penelope drank up all her milk
        • I filled the car up

  32. Reflexive up
      • Also a sub-sense of goal-oriented up
      • Sub-parts of the TR approach each other
      • Examples:
        • The CEO bottled up her anger until she burst
        • He crumpled up the piece of paper
        • Tie up your skates!

  33. Schematic network for up
      [Diagram]

  34. Schemas vs. compositionality
      • Compared annotations from Bannard (2006) and Cook and Stevenson (2006)
      • Particles judged compositional correspond to vertical up
        • E.g., spring up, stay up
      • Non-compositional particles include all 4 senses, e.g.:
        • Vertical: speed up
        • Goal-oriented: back up
        • Completive: beat up
        • Reflexive: roll up

  35. Experiments
      • Type-level classification of up VPCs by sense (Cook and Stevenson, 2006)
      • Supervised learning approach: SVM (see the sketch below)
      • Features:
        • Verb: relative frequencies of syntactic slots: subject, direct object, indirect object, object of preposition
        • Particle:
          • Frequency of split vs. joined construction
          • Frequency of the verb with other particles
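
A minimal sketch of this classification setup, assuming scikit-learn. The feature values, example VPCs, and sense labels below are invented for illustration, not drawn from Cook and Stevenson's (2006) data.

```python
import numpy as np
from sklearn.svm import SVC

# Columns: P(subject), P(direct object), P(indirect object), P(obj. of prep.),
#          P(split construction), relative freq. of verb with other particles
X_train = np.array([
    [0.50, 0.20, 0.05, 0.25, 0.10, 0.30],  # e.g., "float up" (vertical)
    [0.20, 0.55, 0.05, 0.20, 0.60, 0.10],  # e.g., "drink up" (completive)
    [0.30, 0.40, 0.10, 0.20, 0.45, 0.20],  # e.g., "crumple up" (reflexive)
])
y_train = ["vertical", "goal-oriented/completive", "reflexive"]  # 3-way task

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# Predict the sense of a new up-VPC type from its feature vector:
X_test = np.array([[0.45, 0.25, 0.05, 0.25, 0.15, 0.25]])
print(clf.predict(X_test))
```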

  36. Results

      Features          % accuracy
                        3-way   2-way
      Baseline          33      50
      Verb              51      67
      Particle          33      47
      Verb + Particle   54      63

      • 3-way: merge goal-oriented and completive up
      • 2-way: vertical up vs. the rest
      • Dataset: 180 up VPCs annotated for sense, evenly split into train/dev/test sets

  37. Summary
      • Compositionality predictions via a multilingual lexical resource are applicable to many languages and kinds of MWEs
      • MWE identification is important for compositionality prediction based on distributional similarity
      • Cognitive grammar provides an alternative way to describe the semantics of English VPCs
