A Cohesion Graph Based Approach for Unsupervised Recognition of - PowerPoint PPT Presentation

A Cohesion Graph Based Approach for Unsupervised Recognition of Literal and Nonliteral Use of Multiword Expressions Linlin Li and Caroline Sporleder MMCI / Computational Linguistics, Saarland University { linlin,csporleder } @coli.uni-sb.de TextGraphs 2009, Singapore August 7 Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 1/ 15

Why is Non-Literal Language a Problem? Examples of Non-Literal Language Dissanayake said that Kumaratunga was ”playing with fire” after she accused military’s top brass of interfering in the peace process. Kumaratunga has said in an interview she would not tolerate attempts by the army high command to sabotage her peace moves. A defence analyst close to the government said Kumaratunga had spoken a ”load of rubbish” and the security forces would not take kindly to her disparaging comments about them. Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 2/ 15

Why is Non-Literal Language a Problem? Examples of Non-Literal Language Dissanayake said that Kumaratunga was ”playing with fire” after she accused military’s top brass of interfering in the peace process. Kumaratunga has said in an interview she would not tolerate attempts by the army high command to sabotage her peace moves. A defence analyst close to the government said Kumaratunga had spoken a ”load of rubbish” and the security forces would not take kindly to her disparaging comments about them. Non-Literal Expressions (idioms, metaphors etc.) occur frequently in language often behave idiosyncratically have to be recognised automatically to be analysed and interpreted in an appropriate way Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 2/ 15

Dealing with Idioms Most previous research: automatic idiom extraction methods (type-based classification) But: doesn’t work for creative language use potentially idiomatic expressions can be used in literal sense Literal Usage (1) Somehow I always end up spilling the beans all over the floor and looking foolish when the clerk comes to sweep them up. (2) Grilling outdoors is much more than just another dry-heat cooking method. It’s the chance to play with fire, satisfying a primal urge to stir around in coals. Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 3/ 15

Dealing with Idioms Most previous research: automatic idiom extraction methods (type-based classification) But: doesn’t work for creative language use potentially idiomatic expressions can be used in literal sense Literal Usage (1) Somehow I always end up spilling the beans all over the floor and looking foolish when the clerk comes to sweep them up. (2) Grilling outdoors is much more than just another dry-heat cooking method. It’s the chance to play with fire, satisfying a primal urge to stir around in coals. ⇒ Idioms have to be recognised in discourse context! (token-based classification) Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 3/ 15

Token-based Idiom Classification Previous Approaches: Katz and Giesbrecht (2006): supervised machine learning (k-nn), vector space model Birke and Sarkar (2006): bootstrapping from seed lists Cook et al. (2007), Fazly et al. (to appear): unsupervised, predict non-literal if idiom is in canonical form ( ≈ dictionary form) An idiomatic VNC (verb+noun combination) tends to have one (or at most a small number of) canonical form(s), which are its most preferred syntactic patterns (Fazly and Stevenson (2006)) This method determines the canonical form of an expression to be those forms whose frequency is much higher than the average frequency of all its forms ⇒ limited consideration of discourse context Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 4/ 15

How do you know whether an expression is used idiomatically? Literal Usage Grilling outdoors is much more than just another dry-heat cooking method. It’s the chance to play with fire, satisfying a primal urge to stir around in coals. Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 5/ 15

How do you know whether an expression is used idiomatically? Literal Usage Grilling outdoors is much more than just another dry-heat cooking method. It’s the chance to play with fire, satisfying a primal urge to stir around in coals. Literally used expressions typically exhibit lexical cohesion with the surrounding discourse (e.g. participate in lexical chains of semanti- cally related words). Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 5/ 15

How do you know whether an expression is used idiomatically? Non-Literal Usage Dissanayake said that Kumaratunga was ”playing with fire” after she accused military’s top brass of interfering in the peace process. Kumaratunga has said in an interview she would not tolerate attempts by the army high command to sabotage her peace moves. A defence analyst close to the government said Kumaratunga had spoken a ”load of rubbish” and the security forces would not take kindly to her disparaging comments about them. Non-Literally used expressions typically do not participate in cohesive chains. Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 5/ 15

A Cohesion-based Approach to Idiom Detection Identifying Idiomatic Usage Are there (strong) cohesive ties between the component words of the idiom and the context? Yes ⇒ literal usage No ⇒ non-literal usage We need: a measure of semantic relatedness a method for modelling lexical cohesion: cohesion graph Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 6/ 15

Modelling Semantic Relatedness We have to model non-classical relations (e.g. fire - coals , sweep up - spill , ice - freeze ) and world knowledge ( Wayne Rooney - ball ). ⇒ distributional approaches better suited than WordNet-based ones ⇒ ideally, we need loads of up-to-date data Normalised Google Distance (NGD) (Cilibrasi and Vitanyi, 2007) use search engine page counts (here: Yahoo) as proxies for word co-occurrence NGD ( x , y ) = max { log f ( x ) , log f ( y ) } − log f ( x , y ) log M − min { log f ( x ) , log f ( y ) } ( x , y : target words, M : total number of pages indexed) Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 7/ 15

Modelling Cohesion: Cohesion Graph We played v 1 a couple of party v 2 games v 3 to break v 4 the ice v 5 . Graph-based Classifier (∆ c > 0 ⇒ literal ): ′ ) ∆ c = c ( G ) − c ( G ′ : { v 1 , v 2 , v 3 } ) ( G : { v 1 , v 2 , v 3 , v 4 , v 5 } , G Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 8/ 15

Weighting the Graph: edges The further two tokens occur from each other, the more likely it is that their relatedness is accidental Low Weight Edge Next week the two diplomats will meet in an attempt to break the ice between the two nations. A crucial issue in the talks will be the long-running water dispute. Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 9/ 15

Weighting the Graph: edges The further two tokens occur from each other, the more likely it is that their relatedness is accidental Low Weight Edge Next week the two diplomats will meet in an attempt to break the ice between the two nations. A crucial issue in the talks will be the long-running water dispute. defined in terms of the inverse of the distance δ between the two token positions id i and id j : δ ( id i , id j ) λ ij = � δ ( id i , id j ) j Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 9/ 15

Weighting the Graph: nodes Less important tokens should be assigned less weight when modelling discourse connectivity Low Weight Node “Gujral will meet Sharif on Monday and discuss bilateral relations,” the Press Trust of India added. The minister said Sharif and Gujral would be able to “break the ice” over Kashmir. Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 10/ 15

Weighting the Graph: nodes Less important tokens should be assigned less weight when modelling discourse connectivity Low Weight Node “Gujral will meet Sharif on Monday and discuss bilateral relations,” the Press Trust of India added. The minister said Sharif and Gujral would be able to “break the ice” over Kashmir. the salience of a token for the semantic context of the text is defined on a tf . idf -based weighting scheme: | D | salience ( t i ) = log |{ d : t i ∈ d }| Linlin Li, Caroline Sporleder Recognition of Literal and Nonliteral Use of MWEs 10/ 15

A Cohesion Graph Based Approach for Unsupervised Recognition of - PowerPoint PPT Presentation

A Cohesion Graph Based Approach for Unsupervised Recognition of Literal and Nonliteral Use of Multiword Expressions Linlin Li and Caroline Sporleder MMCI / Computational Linguistics, Saarland University { linlin,csporleder } @coli.uni-sb.de

5 th Cohesion Report 5 th Cohesion Report and and 5 th Cohesion Forum 5 th Cohesion Forum

UNSUPERVISED LEARNING, CLUSTERING UNSUPERVISED LEARNING UNSUPERVISED LEARNING Supervised

Unsupervised Learning and Clustering l In unsupervised learning you are given a data set with no

4CSLL5 Parameter Estimation (Supervised and Unsupervised) Unsupervised Maximum Likelihood

EU COHESION POLICY IN THE PUBLIC SPHERE: HOW DO THE MEDIA FRAME EU COHESION POLICY? Results from

Cohesion Policy support for Renewable Energy Camelia M. Kovcs European Commission

Discussion with Cohesion region stakeholders on Cohesion policy for 2021-2027 in Slovenia

Is Social Cohesion the missing link in overcoming violence, inequality and poverty? Laboratory

Se Session Outline Defining Cohesion Carrons Conceptual Model of Cohesion Measuring

8 Chapter 8: Group Cohesion Defining Cohesion A dynamic process reflected in the tendency

The manifestation of Hilberts Nullstellensatz in Lawveres Axiomatic Cohesion Mat as

CSSE 220 Coupling and Cohesion Scoping Please checkout VideoStore from your SVN The plan

Logical Topology and Axiomatic Cohesion David Jaz Myers Johns Hopkins University March 12, 2019

GRAPH MINING AND GRAPH KERNELS Part I: Graph Mining Karsten Borgwardt^ and Xifeng Yan*

A Bayesian Approach to A Bayesian Approach to Unsupervised One- Unsupervised One -Shot Shot

On the Limitations of Unsupervised Bilingual Dictionary Induction Anders Sgaard Sebastian

Outline Morning program Preliminaries Modeling user behavior Semantic matching Learning to

Learning Machines Seminars 2020-11-05 Uncertainty in deep learning Olof Mogren, PhD RISE

Invariant neural networks and probabilistic symmetry Benjamin Bloem-Reddy , University of Oxford

Computational Semantics and Pragmatics Autumn 2012 Raquel Fernndez Institute for Logic,

Economic Freedom and Public Policy: Economics as a Moral Discipline Lord Turner Chairman of the

Abstract Stone Duality Paul Taylor University of Manchester Funded by UK EPSRC GR/S58522

Iconicity in the grammar: Pluractionality in French Sign Language Jeremy Kuhn Valentina

PType System : A Featherweight Parallelizability Detector Dana N. Xu National University of

Sambuz

Useful Links

Newsletter

Mail Us