Exploiting multilingual lexical resources to predict the compositionality of MWEs
Paul Cook University of New Brunswick
Compositionality
Many MWEs exhibit semantic idiomaticity
Compositionality: The extent to which the meaning of an MWE is predictable from the meanings of its components
way (Fazly and Stevenson 2007), continuous (Reddy et al. 2011)
meaning is reflected in the meaning of the expression
individual component words (Bannard et al., 2003; Reddy et al. 2011)
and many kinds of MWEs via a multilingual lexical resource
level compositionality prediction
story: The case of English VPCs
component words, under translation (Salehi and Cook, 2013)
2009)
2012)
Source language → Translation → Target language
kick the bucket → mord; zad
make a decision → tasmim gereftan; sakht yek tasmim
Source language → Translation → Target language
public service → khadamaat omumi; khedmat
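[Diagram: translate the MWE and its components using PanLex (e.g. khadamaat omumi, khedmat), then compare the translations with string similarity measures (LCS, LEV1, LEV2, SW) to obtain the scores s1 and s2]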
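The comparison step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes LCS stands for longest common substring and LEV for Levenshtein edit distance, and the normalisations shown are assumptions. The slides do not define LEV1, LEV2, or SW, so those variants are not reproduced. All function names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common (contiguous) substring of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common substring that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """LCS length divided by the shorter string's length (assumed normalisation)."""
    shortest = min(len(a), len(b))
    return lcs_length(a, b) / shortest if shortest else 0.0


def lev_similarity(a: str, b: str) -> float:
    """1 minus edit distance over the longer string's length (assumed normalisation)."""
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0


# Compare the MWE translation with one component translation from the slide.
print(lcs_similarity("khadamaat omumi", "khedmat"))
print(lev_similarity("khadamaat omumi", "khedmat"))
```

Both similarity functions return values in [0, 1], which makes scores comparable across languages whose translations differ in length.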
translations
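Compositionality score for "public service", using the scores from the best 10 languages:
$s_1$ = mean of the scores for public service vs. public
$s_2$ = mean of the scores for public service vs. service
Compositionality score = $\alpha s_1 + (1 - \alpha) s_2$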
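The weighted combination of the two component scores can be sketched as below. The function names, the language labels, and the per-language numbers are hypothetical placeholders, and the value of alpha is left to the caller because the slides do not state it.

```python
from statistics import mean


def component_score(scores_by_language, best_languages):
    """Mean similarity score over the selected (best) languages."""
    return mean(scores_by_language[lang] for lang in best_languages)


def compositionality_score(s1, s2, alpha):
    """Weighted combination alpha * s1 + (1 - alpha) * s2."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * s1 + (1 - alpha) * s2


# Hypothetical per-language similarity scores (not data from the talk).
public_scores = {"lang_a": 0.8, "lang_b": 0.6, "lang_c": 0.1}
service_scores = {"lang_a": 0.4, "lang_b": 0.2, "lang_c": 0.9}
best = ["lang_a", "lang_b"]  # stands in for the "best 10 languages"

s1 = component_score(public_scores, best)   # public service vs. public
s2 = component_score(service_scores, best)  # public service vs. service
print(compositionality_score(s1, s2, alpha=0.5))
```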
Method                                    Correlation (r)
Reddy et al. (2011)                       0.714
String similarity                         0.649
String similarity: Best single language   0.497
String similarity + Reddy et al.          0.742
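Assuming r here is Pearson's correlation between predicted scores and human compositionality judgements, it could be computed as sketched below. The function name and the toy numbers are hypothetical and are not data from the talk.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy example: predicted scores vs. hypothetical human ratings.
predicted = [0.9, 0.4, 0.7, 0.1]
human = [4.5, 2.0, 3.0, 1.0]
print(pearson_r(predicted, human))
```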
Method                   Accuracy
Bannard et al. (2003)    0.600
String similarity        0.693
Method                                    Correlation (r)
String similarity                         0.372
String similarity: Best single language   0.320
Schulte im Walde et al. (2013)            0.450
246 German noun compounds (GNC, von der Heide and Borgwaldt, 2009; Schulte im Walde et al. 2013)
ENC                   EVPC (verb)           GNC
Language    Family    Language    Family    Language    Family
Czech       Slavic    Basque      Basque    Polish      Slavic
Norwegian   Germanic  Lithuanian  Baltic    Lithuanian  Baltic
Portuguese  Romance   Slovenian   Slavic    Finnish     Uralic
[Figure: "die", "kick the bucket", "kick", "bucket" (Katz and Giesbrecht, 2006; Reddy et al., 2011)]
[Figure: "die", "kick the bucket", "kick", "bucket", "pail", "kick the pail" (Katz and Giesbrecht, 2006; Reddy et al., 2011)]
under translation into many languages (Salehi, Cook and Baldwin, 2014)
MWEs
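Compositionality score for "public service", using the scores from the best N languages:
$s_1$ = mean of the scores for public service vs. public
$s_2$ = mean of the scores for public service vs. service
Compositionality score = $\alpha s_1 + (1 - \alpha) s_2$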
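If the per-language scores in this variant come from distributional similarity rather than string similarity (an assumption: the deck mentions distributional similarity, but the defining slide text is missing), one common way to compare an expression with a component word is the cosine between their context-count vectors. The sketch below uses hypothetical counts and a hypothetical function name; it is not the authors' method.

```python
from collections import Counter
from math import sqrt


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(count * v.get(word, 0) for word, count in u.items())
    norm_u = sqrt(sum(count * count for count in u.values()))
    norm_v = sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector shares no contexts with anything
    return dot / (norm_u * norm_v)


# Hypothetical co-occurrence counts for an MWE and one of its components.
mwe_contexts = Counter({"government": 3, "worker": 2, "sector": 1})
component_contexts = Counter({"government": 1, "sector": 2, "opinion": 4})
print(cosine(mwe_contexts, component_contexts))
```

With non-negative counts the result lies in [0, 1], so it can be averaged over languages and combined with the weight alpha in the same way as the other scores.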
Correlation (r) on each dataset
Method ENC EVPC GNC
0.700 0.177 0.141
0.434 0.398 0.113
0.725 0.312 0.178
0.732 0.417 0.364
with V+PPs:
gram match
distributional similarity
combinations
(Fazly et al., 2009)
can be influenced by the (possibly predominant!) literal usages of an expression
component word is contributed
fact draw on meaning of component words
relations
and final configuration, as communicated by an expression
increase along a vertically-oriented axis
The balloon floated up
axis: The price of gas jumped up
into…
movement is not necessarily vertical
kissed up to his boss
is coming up quickly
each other
anger until she burst
piece of paper
Stevenson (2006)
Stevenson, 2006)
direct object, indirect object, object of preposition
Features          % accuracy (3-way)   % accuracy (2-way)
Baseline          33                   50
Verb              51                   67
Particle          33                   47
Verb + Particle   54                   63
into train/dev/test sets
lexical resource are applicable to many languages and kinds of MWEs
prediction based on distributional similarity
describe the semantics of English VPCs
Timothy Baldwin, Afsaneh Fazly, Bahar Salehi, and Suzanne Stevenson
Natural Sciences and Engineering Research Council of Canada (NSERC), NICTA, the University of Toronto, and The University of Melbourne for funding this research