

SLIDE 1

Fixed Similes: Measuring Aspects of the Relation between MWE Idiomatic Semantics and Syntactic Flexibility

PANAGIOTIS KOURIS SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING, NATIONAL TECHNICAL UNIVERSITY OF ATHENS, ATHENS, GREECE

YANIS MAISTROS SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING, NATIONAL TECHNICAL UNIVERSITY OF ATHENS, ATHENS, GREECE

STELLA MARKANTONATOU INSTITUTE FOR LANGUAGE AND SPEECH PROCESSING, ATHENA RIC, GREECE

SLIDE 2

We acknowledge support of this work by the project “Computational Sciences and Technologies for Data, Content and Interaction” (MIS 5002437), which is implemented under the Action “Reinforcement of the Research and Innovation Infrastructure”, funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund).

SLIDE 3

SYNTACTIC FLEXIBILITY vs IDIOMATICITY

Fixed similes (FS): adjective + connector + (article) + noun, e.g. sweet like honey. Modern Greek.
We find that idiomatic semantics is related to a subset of FS syntactic alternatives.
Idiomaticity: the degree of similarity between FS semantics and the semantics of their free adjective.
We identify and measure two types of similarity, one of which allows us to make predictions about the syntactic flexibility of FS.
Measurements were made on a web-retrieved corpus of 4900 FS usage examples.

SLIDE 4

PARTS OF A FIXED SIMILE (FS)

Terminology adopted from Hanks (2005).

Εγώ κόκκινη σαν (την) παπαρούνα
I red as (the) poppy

Structure: ADJECTIVE + σαν + (DETERMINER) + NOUN
Parts: TENOR (Εγώ ‘I’), PROPERTY (κόκκινη ‘red’), VEHICLE (παπαρούνα ‘poppy’)

SLIDE 5

FIXED SIMILES

Simile is a figure of speech. Unlike metaphor, it draws attention to the likeness between the tenor and the vehicle, which are implied to share certain properties (Veale & Hao, 2007).
Similes draw on conventional beliefs about likeness and effectively convey the speaker’s superlative evaluation of the tenor (Israel, Harding & Tobin, 2004).
Many conventional similes tend to be fixed and have idiom status (Hanks, 2005).
Greek: κόκκινος σαν αστακός (kokinos san astakos) ‘red as lobster’ applies mainly to PERSONs or BODY(parts). 96.4% of its occurrences in our data denote blushing or sunburned people.
FS share a good part of simile usage in English (Niculae & Danescu-Niculescu-Mizil, 2014).

SLIDE 6

Modern Greek FS structures

SIMILATIVE STATEMENTS: adjective + σαν + (determiner) + noun (the “normative” form)
γλυκός σαν (το) μέλι ‘sweet like (the) honey’

COMPARATIVE: πιο + adjective + (και) + από + (determiner) + noun
πιο γλυκός (και) από (το) μέλι ‘more sweet (and) than (the) honey’

EQUATIVE: τόσο + adjective + όσο + determiner + noun
τόσο γλυκός όσο το μέλι ‘as sweet as the honey’

adjective + punctuation mark + σαν + (determiner) + noun
Ήταν ο ύπνος γλυκός. Σαν το μέλι της κηρήθρας. ‘Sleep was sweet. Like the honey of the honeycomb.’

(Chila-Markopoulou, 1986; Israel, Harding & Tobin, 2004; Mpouli, 2015)

SLIDE 7

SYNTACTIC FUNCTIONS OF AN ADJECTIVE (1)

SCOMP (copula): Της Παλιαχώρας ο χορός ήταν γλυκός σαν μέλι. ‘Paliachora’s dance was sweet like honey.’

SCON (verb adjunct controlled by the sentence subject): Ελαφρύς σαν πούπουλο, πήδηξε στο μαξιλάρι. ‘As light as a feather, he jumped on the pillow.’

AJSIM (adjectival modifier): Ένας γλυκός σαν μέλι ύπνος τον τύλιξε.

  • Lit. A sweet like honey sleep wrapped him.

SS (verbless period): Ύπνος γλυκός σαν μέλι. ‘A sleep as sweet as honey.’

SLIDE 8

SYNTACTIC FUNCTIONS OF AN ADJECTIVE (2)

DIR (subject): λευκό σαν το χιόνι, μαύρο..μπορούν να παίξουν παιχνίδι στο σπίτι σας

  • Lit. as white as the snow, black…can play a game in your place

INDIR (prepositional complement): χρώματα που κυμαίνονται από χρυσοκίτρινο ως λευκό σαν χιόνι!

  • Lit. colours ranging from golden-yellow to white like snow!

INTENSIFYING ADVERB: «Δεν ήμασταν ιδιαίτερα μακριά, αλλά ήταν πάρα πολύ γρήγοροι - σαν αστραπή», δήλωσε.

  • Lit. “(We) were not particularly far, but (they) were very much fast - like lightning”, he declared.

SLIDE 9

DATA (1) - FS identification

Sources: the Hellenic National Corpus (HNC), http://hnc.ilsp.gr/, and a corpus of 100 million words collected with crawlers.
Pattern adjective + σαν + (det) + noun: the 152 similes that appeared more than once were retained.
260 native speakers of Modern Greek specified, through a tailor-made FB application, which similes they would use in their everyday exchanges.
85 similes were used by a critical number of speakers; 20 of these FS are used for this presentation.
The 20 FS represent the classes defined in two different simile classifications by the semantics of the vehicle and of the property, one of Modern Greek FS (Mpolla-Mavridou, 1996) and one of English similes (Hanks, 2005).

SLIDE 10

DATA (2) - web retrieved data - morphology and determiner

«άσπρος» «σαν πανί»: white.SG.MASC.NOM cloth.SG.NEUT.NOM/ACC
«άσπρου» «σαν πανί»: white.SG.MASC.GEN/SG.NEUT.GEN cloth.SG.NEUT.NOM/ACC
«άσπρες» «σαν πανί»: white.PL.FEM.NOM/PL.FEM.ACC cloth.SG.NEUT.NOM/ACC
«άσπρες» «σαν πανιά»: white.PL.FEM.NOM/PL.FEM.ACC cloth.PL.NEUT.NOM/ACC

To make sure that the required structures would be retrieved (+ morphological variation):
Inverted word order: «σαν το πανί άσπρος» “like the cloth white”
Equative: «τόσο άσπρος» «σαν το πανί» “as white” “like the cloth”
Comparative: «πιο άσπρος» «από το πανί» “more white” “than the cloth”
«σαν το πανί» “like the cloth”

Modern Greek has 3 genders, 3 cases and 2 numbers, and relatively free word order.

SLIDE 11

DATA (3) - web retrieved lexical variation

Replacement of the adjective or the noun with possible alternatives (synonyms, diminutives, etc.; search with the “σαν + VEHICLE” part).

άσπρος σαν το πανί Lit. white as the cloth
κάτασπρος ‘very white’, λευκός ‘white’, πάλλευκος* ‘very white’, κατάλευκος ‘very white’, ωχρός ‘pale’, κάτωχρος ‘very pale’, κίτρινος ‘yellow’, κατακίτρινος ‘very yellow’
* πάλλευκος (an ancient form of “very white” that is still in use) did not return any FS hits

οπλισμένος/αρματωμένος σαν αστακός

  • Lit. armed like a lobster

πιστός σαν σκυλί/σκύλος faithful like dog.SG.NEUT/dog.SG.MASC
κόκκινος/ερυθρός σαν παπαρούνα/παπαρουνίτσα red/red.ANCIENT like poppy/poppy.DIMINUTIVE

Google searches were applied to the lexical variants as before. Fewer than 10 FS gave lexical variants.

SLIDE 12

4900 usage examples were selected, on the basis of uniqueness and originality, from about 20 times as much material.

SLIDE 13

DATA (4)-web retrieved syntactic variation

FS as adjectives and not as sentential structures of the type Tenor is FS (e.g., This music is as sweet as honey.):

  • FS support a wide range of syntactic functions apart from that of the complement of the copula (Slides 6 & 7)
  • Retrieved complements of various copulas (not only of BE): 18.8%-77%
  • Retrieved complements of various copulas + verbless structures: 18.8%-81.5%

The syntactic variation of the FS structures (sweet as honey, more sweet than honey, as sweet as honey) was checked.

SLIDE 14

Labels for morpho-syntactic variation

SLIDE 15

Semantic annotation

Annotating tenors with WordNet supersenses (Schneider et al., 2013).

No sufficient Modern Greek WordNets are available, so Greek tenors were translated into English. In case of pronouns/pro-drop, tenors were induced from the context.

Inter-annotator agreement: 2400 instances (49% of the data), representing 4 FS (FS20, FS19, FS18, FS1), were annotated by 5 linguists. Krippendorff’s alpha = 0.95.
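The agreement figure above can be illustrated with a minimal sketch of Krippendorff's alpha for nominal labels. This is not the authors' code: the function name `krippendorff_alpha_nominal`, the input layout (one list of labels per annotated instance) and the toy supersense labels are assumptions made for illustration only.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: one list per annotated item, holding the labels that the
    annotators assigned to it (missing annotations are simply left out).
    """
    # Coincidence counts: every ordered pair of labels given to the same
    # item by two different annotators contributes 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # items with a single label cannot be paired
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return 1.0  # only one label was ever used
    return 1.0 - observed / expected


# Hypothetical toy annotations: 3 tenors, 5 annotators each.
perfect = [["PERSON"] * 5, ["BODY"] * 5, ["FOOD"] * 5]
one_slip = [["PERSON"] * 5, ["BODY"] * 5, ["PERSON"] * 4 + ["BODY"]]

alpha_perfect = krippendorff_alpha_nominal(perfect)
alpha_one_slip = krippendorff_alpha_nominal(one_slip)
print(alpha_perfect, alpha_one_slip)
```

Full agreement gives an alpha of 1, and any disagreement lowers it. A value of 0.95 therefore indicates that the five annotators rarely diverged on the supersense of a tenor.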

SLIDE 16

Fixedness is related to semantic idiomaticity

Pearson correlation coefficient: 0.84 with p-value = 3.2 × 10^-6

Shannon entropy is used to measure syntactic/semantic diversity.
morphosyntactic entropy: FS instances as vectors (presence or absence of a feature)
semantic entropy: number of occurrences of each WordNet supersense

constr, comp, toso, ixp-punc, ixp-creative, ixp-n, empp, mwo increased the Pearson correlation.
iwo, ixp-w, mod, var, det, empm decreased the Pearson correlation.

Result reminiscent of the Kay & Sag (2012) observation: not all syntactic contexts interfere with MWE semantics.
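The two measures on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the slides do not say exactly how entropy is taken over the feature vectors, so the sketch assumes it is computed over the distribution of distinct vectors. The names `shannon_entropy` and `pearson`, the FS identifiers and all toy data are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(observations):
    """Shannon entropy (in bits) of the empirical distribution of observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy corpus: for each FS, one binary feature vector per usage
# example (presence/absence of two syntactic features) and one tenor
# supersense per usage example.
toy_fs = {
    "FS_a": {
        "syntax": [(0, 0), (0, 0), (0, 0), (0, 0)],
        "tenors": ["PERSON", "PERSON", "PERSON", "PERSON"],
    },
    "FS_b": {
        "syntax": [(0, 0), (0, 0), (1, 0), (1, 0)],
        "tenors": ["PERSON", "PERSON", "PERSON", "BODY"],
    },
    "FS_c": {
        "syntax": [(0, 0), (0, 1), (1, 0), (1, 1)],
        "tenors": ["PERSON", "BODY", "FOOD", "ARTIFACT"],
    },
}

syntactic_entropy = [shannon_entropy(fs["syntax"]) for fs in toy_fs.values()]
semantic_entropy = [shannon_entropy(fs["tenors"]) for fs in toy_fs.values()]
r = pearson(syntactic_entropy, semantic_entropy)
print(syntactic_entropy, semantic_entropy, r)
```

In the toy data the FS with the most varied syntax also has the most varied tenors, so the coefficient is close to 1. Dropping or adding a feature column changes the syntactic entropies and hence the coefficient, which is how individual features can be said to increase or decrease the correlation.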

SLIDE 17

Clustering (1)

FS instance clusters, not an FS classification by syntactic flexibility.
Logistic Principal Component Analysis (LPCA): dimensionality reduction, visualization of FS instances in two dimensions.
k-Means algorithm: clusters of FS instances with clear limits.

All syntactic features: six clusters with most frequent features det, var, comp, constr (constr’s and comp’s syntactic diversity correlates with tenor's semantic diversity).

Geeraert, Baayen & Newman (2017), using eye-tracking, identified lexical variation (corresponding to our var) as one of the “easiest” MWE variations for English in terms of comprehension.

Hypothesis: “easy” MWE variations in terms of comprehension are independent of MWE idiomaticity and more frequent than other syntactic structures in the speakers' output.
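A minimal sketch of the clustering step, assuming FS instances are encoded as binary vectors over syntactic features. It covers plain k-means only. The LPCA projection is left out, and the slides do not state whether clustering was run before or after that reduction. The feature order, the toy instances, the farthest-first initialisation and the helper names `k_means` and `most_frequent_feature` are all illustrative assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100):
    """Plain k-means with a deterministic farthest-first initialisation."""
    centroids = [tuple(float(v) for v in points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(tuple(float(v) for v in farthest))

    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no instance changed cluster
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return assignment, centroids


def most_frequent_feature(points, assignment, cluster, names):
    """Name of the feature that is present most often in one cluster."""
    members = [p for p, a in zip(points, assignment) if a == cluster]
    sums = [sum(col) for col in zip(*members)]
    return names[sums.index(max(sums))]


# Hypothetical feature order and toy FS instances (1 = feature present).
FEATURES = ["constr", "comp", "var", "det"]
instances = [
    (1, 1, 0, 0),
    (1, 1, 0, 0),
    (1, 0, 0, 0),
    (0, 0, 1, 1),
    (0, 0, 1, 1),
    (0, 0, 0, 1),
]

assignment, centroids = k_means(instances, k=2)
for cluster in range(2):
    print(cluster, most_frequent_feature(instances, assignment, cluster, FEATURES))
```

Reading off the most frequent feature per cluster mirrors how the clusters are characterised on these slides. The clusters group usage examples rather than whole FS.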

SLIDE 18

Clustering (2)

When only the syntactic flexibility features that contribute positively to the flexibility-idiomatic semantics correlation are used, four clusters are obtained with most frequent features constr, comp, punc, toso.

constr is the most frequent feature in cluster 1 and the second most frequent feature in clusters 2, 3 & 4, where the features comp, punc, toso are most frequent.

SLIDE 19

A quantitative definition of idiomaticity

FS as adjectives: FS idiomaticity can be modeled as the similarity between the tenor semantics and the semantics of the NPs selected by the free property of the FS.

E.g., the similarity between the semantics of the tenors of άσπρος σαν το πανί Lit. white as cloth and the semantics of the NPs selected by the free occurrences of the adjective άσπρος ‘white’.

Development of a corpus of free properties

For each property of the 20 FS, the first 200 unique instances from the HNC were retrieved. MWEs involving the adjectives were excluded, e.g. κόκκινη γραμμή (kokini grami) ‘red line’: kokini grami denotes a boundary or limit which should not be crossed. The NPs selected by the adjectives in question were annotated with the WordNet supersenses.

SLIDE 20

A quantitative definition of idiomaticity

Semantic similarity measure 1 (SSM1): Cosine similarity based on frequency. The vectors of the two sets of supersenses include the frequency* of each supersense, e.g., the frequency of the supersense PERSON in the semantics of Lit. white as cloth and in the semantics of the NPs selected by the FS's free property.

Semantic similarity measure 2 (SSM2): Cosine similarity based on binary vectors: each supersense vector encodes the presence or absence (i.e. 1 or 0, respectively) of the supersense, e.g., Lit. white as cloth selects 7 semantic categories and the respective free adjective selects 17; 5 semantic categories occur in both sets.

* Frequencies normalized in [0,1]
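The two measures can be sketched as below, as an illustration rather than the authors' code. The placeholder labels S1 to S19 and all counts are hypothetical and chosen only to reproduce the 7 / 17 / 5 category figures given for Lit. white as cloth. "Normalized in [0,1]" is assumed here to mean relative frequencies.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def ssm1(fs_counts, free_counts):
    """Frequency-based similarity (relative frequencies per supersense)."""
    keys = sorted(set(fs_counts) | set(free_counts))
    fs_total = sum(fs_counts.values())
    free_total = sum(free_counts.values())
    u = [fs_counts.get(k, 0) / fs_total for k in keys]
    v = [free_counts.get(k, 0) / free_total for k in keys]
    return cosine(u, v)


def ssm2(fs_counts, free_counts):
    """Binary similarity (1 if the supersense occurs at all, else 0)."""
    keys = sorted(set(fs_counts) | set(free_counts))
    u = [1 if fs_counts.get(k, 0) > 0 else 0 for k in keys]
    v = [1 if free_counts.get(k, 0) > 0 else 0 for k in keys]
    return cosine(u, v)


# Hypothetical supersense counts. The FS tenors use 7 categories (S1..S7),
# dominated by S1; the free adjective uses 17 (S1..S5 and S8..S19);
# 5 categories (S1..S5) are shared.
fs_counts = {f"S{i}": 1 for i in range(1, 8)}
fs_counts["S1"] = 94
free_counts = {f"S{i}": 10 for i in list(range(1, 6)) + list(range(8, 20))}

ssm1_value = ssm1(fs_counts, free_counts)
ssm2_value = ssm2(fs_counts, free_counts)
print(ssm1_value, ssm2_value)
```

For binary vectors the cosine reduces to the number of shared categories divided by the square root of the product of the two set sizes, so the slide's example gives 5 / sqrt(7 × 17) ≈ 0.46. SSM1 additionally depends on how often each category occurs, which is why the two measures can diverge.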

SLIDE 21

SSM2 correlates with syntactic/semantic diversity

Pearson correlation coefficient: (i) 0.61 with p-value = 4.4 × 10^-4, (ii) 0.85 with p-value = 2 × 10^-6

SSM2 is a predictor of FS syntactic flexibility.

SLIDE 22

Quantitative study: new aspects of the relation “FS (MWE) syntactic flexibility - idiomatic semantics”

  • cosine similarity by the semantic categories selected by the FS and its free property (SSM2), but not cosine similarity by frequency (SSM1), correlates with FS syntactic and semantic diversity; a possible predictor of FS syntactic flexibility and idiomaticity.
  • syntactic deviations from the normative form can be split into “FS semantics sensitive” and “FS semantics insensitive” ones;
  • the clustering of FS instances as vectors of syntactic flexibility features showcases lexical variation, a “semantics insensitive syntactic deviation”, as a dominant feature;
  • lexical variation has been shown by V+N MWE comprehension studies to require less comprehension effort than other syntactic constructions. If future research shows that a correlation exists, it may be inferred that syntactic flexibility depends on certain cognitive factors/skills.

SLIDE 23

Acknowledgements

We thank Katerina Selimi, Dimitra Stasinou, Vasiliki Moutzouri and Maria Chantou for their help at the annotation phase of this work.

SLIDE 24

EYΧΑΡΙΣΤΟΥΜΕ THANK.1PL

HOMELAND, FTERI, KEFALONIA

SLIDE 25

[...] adjectival and adverbial comparatives. PhD thesis, National and Kapodistrian University of Athens, 1986. (In Greek).

Afsaneh Fazly and Suzanne Stevenson. Distinguishing subtypes of multiword expressions using linguistically-motivated statistical measures. In Proceedings of the Workshop on A Broader Perspective on Multiword Expressions, 9–16, June 2007.

Kristina Geeraert, R. Harald Baayen, and John Newman. Understanding idiomatic variation. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), 80–90, April 2017.

Patrick Hanks. Similes and sets: The English preposition “like”. In Languages and Linguistics: Festschrift for Professor Fr. Čermák. Philosophy Faculty of the Charles University, Prague, 2005.

Michael Israel, Jennifer Riddle Harding, and Vera Tobin. On simile. In Michel Achard and Suzanne Kemmer, editors, Language, Culture and Mind, 123–135. CSLI Publications, 2004.

Paul Kay and Ivan A. Sag. A lexical theory of phrasal idioms. Available at: www1.icsi.berkeley.edu/_kay/idioms-submitted.pdf, 2012.

Vasileia Mpolla-Mavridou. A contrastive study of the fixed similes of the Greek and English languages. PhD thesis, Aristotle University of Thessaloniki, 1996. (In Greek).

Suzanne Mpouli and Jean-Gabriel Ganascia. “pale as death” or “pâle comme la mort”: Frozen similes used as literary clichés. In EUROPHRAS 2015: Computerised and Corpus-Based Approaches to Phraseology: Monolingual and Multilingual Perspectives, 2015.

Vlad Niculae and Cristian Danescu-Niculescu-Mizil. Brighter than gold: Figurative language in user generated comparisons. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2008–2018, October 2014.

Nathan Schneider, Behrang Mohit, Kemal Oflazer, and Noah A. Smith. Coarse lexical semantic annotation with supersenses: An Arabic case study. In Proceedings of NAACL-HLT 2013, pages 661–667. Association for Computational Linguistics, June 2013.

Tony Veale and Yanfen Hao. Learning to understand figurative language: From similes to metaphor to irony. In Proceedings of the 29th Annual Meeting of the Cognitive Science Society (CogSci 2007), 2007.
