Metaphor Detection through Term Relevance (Marc Schulder, Eduard Hovy)


slide-1
SLIDE 1

Metaphor Detection through Term Relevance

Marc Schulder (Saarland University)
Eduard Hovy (Carnegie Mellon University)

slide-2
SLIDE 2

Metaphor Detection

“And yet we stand together as we did two centuries ago.”

slide-8
SLIDE 8

Metaphor Detection

Challenges

Required Knowledge

  • Conceptual Mappings: Target Domain ⨉ Mappings ⨉ Source Domains
  • Selectional Preference Violation: Preferences ⨉ Argument Domains

Typical Restrictions

  • Low Coverage
  • POS Limitations

slide-13
SLIDE 13

Metaphor Detection

Term Relevance

  • Simple
  • Robust
  • POS independent
  • Target Domain only

slide-14
SLIDE 14

Term Relevance

Hypothesis

If a word does not fit in the context, then it is probably not meant literally.

See also Sporleder & Li (2009) and Li & Sporleder (2010).

slide-15
SLIDE 15

Term Relevance

Overview

  • Relevance Metric
  • Domain Data (Web Corpus)
  • Evaluation (Metaphor Corpus)
  • Basic Classifier
  • Multi-Feature Classifier

slide-16
SLIDE 16

Term Relevance

Metric

Is word common in all domains?
  yes: literal
  no: Is word typical for this domain?
    yes: literal
    no: metaphor

slide-19
SLIDE 19

Term Relevance

TF-IDF

Term Frequency

How often does a term appear in a document?

Document Frequency

In how many documents does a term appear?

TF-IDF

Impact of a term on a document: term frequency ⨉ inverse document frequency

Domain Relevance

TF-IDF with domains taking the place of documents
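A minimal sketch of domain relevance as TF-IDF over domains, assuming each domain corpus is treated as one document. The slides do not state the exact weighting, so the relative term frequency and the logarithmic inverse document frequency used here are assumptions, and the three tiny corpora are invented for illustration.

```python
import math
from collections import Counter


def domain_relevance(term, domain, corpora):
    """TF-IDF of `term`, with each domain corpus playing the role of a document.

    Assumed variant: relative term frequency times log(N / document frequency).
    """
    tokens = corpora[domain]
    tf = Counter(tokens)[term] / len(tokens)
    # Document frequency: number of domains in which the term occurs at all.
    df = sum(1 for other in corpora.values() if term in other)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpora) / df)


# Invented toy corpora, one token list per domain.
corpora = {
    "legislative": "parliament will debate the law and pass the law".split(),
    "economy": "the budget will tax and spend".split(),
    "pseudo": "the weather will stay cold".split(),
}

law_score = domain_relevance("law", "legislative", corpora)
the_score = domain_relevance("the", "legislative", corpora)
print(f"law: {law_score:.3f}, the: {the_score:.3f}")
```

A term that occurs in every domain ("the") gets an inverse document frequency of zero, so it is never typical for one domain, while "law" scores high for the legislative domain only.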

slide-22
SLIDE 22

Term Relevance

Metric

document frequency > δ?
  yes: literal
  no: domain relevance > 𝜹?
    yes: literal
    no: metaphor

slide-24
SLIDE 24

Domain Data

Web Corpus

ClueWeb-09

  • 1 Billion Web Documents
  • 500 Million English Web Documents
  • Segment en0000: 3 Million Documents
  • 1.8 Million Documents without Spam

slide-30
SLIDE 30

Domain Data

Domain Clustering

Components: ClueWeb-09, Lucene Database, Domain Seeds, Domain Data, Pseudo-Domain Data

Domain Seeds

  • Legislative: pass, law, regulate, debate, parliament
  • Economy: budget, tax, spend, plan, finances

Domain Data

  • Legislative: 10,000 docs
  • Economy: 10,000 docs

Pseudo-Domain Data

  • Pseudo 1: 10,000 docs
  • Pseudo 2: 10,000 docs
  • Pseudo 3: 10,000 docs
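The seed-based collection of domain documents can be sketched as follows. The slides use a Lucene database for retrieval; this stand-in simply counts seed-term occurrences per document, which is an assumption and not Lucene's scoring. The function name `retrieve_domain` and the four example documents are hypothetical.

```python
def retrieve_domain(documents, seeds, n_docs):
    """Return up to n_docs documents, ranked by how often they mention a seed term."""
    seed_set = {seed.lower() for seed in seeds}

    def score(doc):
        # Count every token occurrence that matches one of the seed terms.
        return sum(1 for token in doc.lower().split() if token in seed_set)

    ranked = sorted(documents, key=score, reverse=True)
    # Drop documents that contain no seed term at all.
    return [doc for doc in ranked[:n_docs] if score(doc) > 0]


# Invented example documents.
documents = [
    "Parliament will debate the law",
    "The budget raises tax and tax again",
    "A new budget plan",
    "Rain is expected tomorrow",
]

economy_docs = retrieve_domain(
    documents, ["budget", "tax", "spend", "plan", "finances"], n_docs=2
)
legislative_docs = retrieve_domain(
    documents, ["pass", "law", "regulate", "debate", "parliament"], n_docs=2
)
print(economy_docs)
print(legislative_docs)
```

In the setup on the slide, `n_docs` would be 10,000 per domain.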

slide-36
SLIDE 36

Evaluation

Experimental Setup

Metrics

  • F-Measure
  • Precision
  • Recall
  • Accuracy

“And yet we stand together as we did two centuries ago.”

Word        Gold   Baseline
And         X      M
yet         X      M
we          X      M
stand       M      M
together    M      M
as          X      M
we          X      M
did         X      M
two         X      M
centuries   X      M
ago         X      M
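As a worked example of the four metrics, the sketch below scores the all-metaphor baseline against the gold labels of the example sentence. The label sequences are taken from the slide; the function name `evaluate` is an illustrative choice.

```python
def evaluate(gold, predicted, positive="M"):
    """Token-level precision, recall, F1 and accuracy for the `positive` label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    tn = sum(1 for g, p in pairs if g != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Gold labels for "And yet we stand together as we did two centuries ago."
gold = "X X X M M X X X X X X".split()
# The baseline labels every token as a metaphor.
baseline = ["M"] * len(gold)

scores = evaluate(gold, baseline)
print({name: round(value, 3) for name, value in scores.items()})
```

Labelling everything as metaphor guarantees full recall, but precision and accuracy fall to the share of metaphor tokens (2 of 11 here).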

slide-39
SLIDE 39

Evaluation

Gold Corpus

MICS Governance Corpus
2510 Sentences

Examples

  • “And yet we stand together as we did two centuries ago.”
  • “Many Jewish voters will find themselves at a crossroads.”

23 % 60 % 17 %

0 metaphors 1 metaphor 2+ metaphors

slide-44
SLIDE 44

Basic Classifier

Seeds & Thresholds

document frequency > δ
domain relevance > 𝜹

Economy seeds: budget, tax, spend, plan, finances

Economy: 10,000 docs

slide-46
SLIDE 46

Basic Classifier

Seeds & Thresholds

document frequency > δ?
  yes: literal
  no: domain relevance > 𝜹?
    yes: literal
    no: metaphor

Seed Set 1: Manual

  • 8 Subdomains
  • 4-14 manual seeds per subdomain
  • 8 ⨉ 10,000 docs
  • 𝜹=0.02 ; δ=0.1

slide-47
SLIDE 47

Basic Classifier

Seeds & Thresholds

document frequency > δ?
  yes: literal
  no: legislative relevance > 𝜹?
    yes: literal
    no: economy relevance > 𝜹?
      yes: literal
      no: ... (remaining subdomains)
        yes: literal
        no: metaphor

Seed Set 1: Manual

  • 8 Subdomains
  • 4-14 manual seeds per subdomain
  • 8 ⨉ 10,000 docs
  • 𝜹=0.02 ; δ=0.1
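The decision procedure can be sketched as a small function. The threshold values are the Seed Set 1 settings from the slide (δ=0.1 for document frequency, 𝜹=0.02 for domain relevance); the score tables are invented toy values, and the assumption that both scores are precomputed per term is mine.

```python
def classify(term, document_frequency, domain_relevance,
             df_threshold=0.1, relevance_threshold=0.02):
    """Label a term 'literal' or 'metaphor' with the two-threshold rule.

    document_frequency: term -> score (assumed to be precomputed)
    domain_relevance:   subdomain -> {term -> score} (assumed to be precomputed)
    """
    # Words that are common everywhere are treated as literal.
    if document_frequency.get(term, 0.0) > df_threshold:
        return "literal"
    # Words that are typical for at least one subdomain are literal as well.
    for scores in domain_relevance.values():
        if scores.get(term, 0.0) > relevance_threshold:
            return "literal"
    # Neither common nor typical for the domain: probably not meant literally.
    return "metaphor"


# Invented toy scores for three words.
document_frequency = {"we": 0.9, "tax": 0.05, "stand": 0.03}
domain_relevance = {
    "legislative": {"tax": 0.01, "stand": 0.001},
    "economy": {"tax": 0.2, "stand": 0.002},
}

labels = {
    word: classify(word, document_frequency, domain_relevance)
    for word in ["we", "tax", "stand"]
}
print(labels)
```

Checking the subdomains one after another is equivalent to asking whether the word is relevant to any of them, which is why a single loop covers the whole chain.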

slide-49
SLIDE 49

Basic Classifier

Seeds & Thresholds

document frequency > δ?
  yes: literal
  no: domain relevance > 𝜹?
    yes: literal
    no: metaphor

Seed Set 1: Manual

  • 8 Subdomains
  • 4-14 manual seeds per subdomain
  • 8 ⨉ 10,000 docs
  • 𝜹=0.02 ; δ=0.1

Seed Set 2: Gold

  • 1 Domain
  • 50 best gold metaphors
  • 80,000 docs
  • 𝜹=0.01 ; δ=0.1

slide-50
SLIDE 50

Basic Classifier

Evaluation

System         F1     Precision   Recall
All Metaphor   .249   .142        1.000
Manual Seeds   .350   .276        .478
Gold Seeds     .346   .245        .591

slide-53
SLIDE 53

Multi-Feature Classifier

Conditional Random Fields

Setup

  • Bigram model
  • 10-fold cross validation

Features

  • Part of Speech
  • Lexicographer Sense
  • Relevance Weights (𝜹=0.02 ; δ=0.79)
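A rough sketch of how the per-token features and the 10-fold split might be prepared before handing them to a CRF toolkit. The slides do not show the feature encoding, so the dictionary layout, the function names and the hand-written tag values are all assumptions; the CRF training itself is left out.

```python
def token_features(token, pos, lex_sense, relevance):
    """Feature dictionary for one token (layout is an assumption)."""
    return {
        "word": token.lower(),
        "pos": pos,              # part-of-speech tag
        "lex_sense": lex_sense,  # lexicographer sense category
        "relevance": relevance,  # output of the term relevance thresholds
    }


def sentence_features(tokens, pos_tags, lex_senses, relevance_labels):
    """One feature dictionary per token, in sentence order."""
    rows = zip(tokens, pos_tags, lex_senses, relevance_labels)
    return [token_features(*row) for row in rows]


def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists for k-fold cross validation."""
    folds = [list(range(start, n_items, k)) for start in range(k)]
    for held_out in folds:
        held_out_set = set(held_out)
        train = [i for i in range(n_items) if i not in held_out_set]
        yield train, held_out


# Hand-written placeholder tags, not produced by a real tagger.
features = sentence_features(
    ["we", "stand"],
    ["PRP", "VBP"],
    ["none", "verb.stative"],
    ["literal", "metaphor"],
)

# 2510 is the sentence count of the gold corpus.
splits = list(k_fold_indices(2510, k=10))
print(features[1])
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```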

slide-54
SLIDE 54

Multi-Feature Classifier

Evaluation

System                F1     Precision   Recall
CRF: Basic            .187   .706        .108
CRF: Relev            .219   .683        .130
CRF: PosLex           .340   .654        .230
CRF: PosLex + Relev   .373   .640        .263

Also plotted for comparison: All Metaphor, Manual Seeds

slide-55
SLIDE 55

Multi-Feature Classifier

Training Size

[Figure: F1 (axis 0.05 to 0.4) against Number of Training Sentences (200 to 1800). Models: CRF Basic, CRF Basic + Relevance, CRF PosLex, CRF PosLex + Relevance, Threshold, Baseline]

slide-56
SLIDE 56

Multi-Feature Classifier

Relative Gain

[Figure: Relative Gain (axis 0 % to 150 %) against Number of Training Sentences (200 to 1800). Models: CRF Basic, CRF PosLex]

slide-57
SLIDE 57

Conclusion

Pro

  • Term Relevance is a cheap Metaphor Heuristic
  • Useful in Low Resource Scenarios

Con

  • Performance still too low

Future Work

  • Term Relevance: Semantic Vector Space
  • Domain Corpora: Topic Modelling
  • Application: Other non-literal devices