SLIDE 1

Modeling and Representing Negation in Data-driven Machine Learning-based Sentiment Analysis

Robert Remus

rremus@informatik.uni-leipzig.de

Natural Language Processing Group Department of Computer Science University of Leipzig, Germany

ESSEM-2013 — December 3rd, 2013

SLIDE 2

Negation Modeling — Introduction I

In sentiment analysis (SA), negation plays a special role

[Wiegand et al., 2010]:

(1) They are comfortable to wear. (positive)

(2) They are not comfortable to wear. (negative, although the phrase "comfortable to wear" is positive)

SLIDE 3

Negation Modeling — Introduction II

Negations . . .

  • are expressed via negation words/signals, e.g. "don't x", "no findings of x", "rules out x", . . .
  • are expressed via morphology, e.g. "un-x", "x-free", "x-less", . . .
  • have a negation scope, i.e. the words that are negated, e.g.

(1) They are not comfortable to wear. (scope: "comfortable to wear")

SLIDE 4

Negation Modeling — Introduction III

In compositional semantic approaches to SA, negation is usually captured via some ad hoc rule(s), e.g.

"Polarity(not [arg1]) = ¬Polarity(arg1)" [Choi & Cardie, 2008]

But what about

(1) The stand doesn't work.

(2) The stand doesn't work well. ?
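A minimal sketch of such an ad hoc flip rule, shown here only to make the problem concrete; the two-entry prior-polarity lexicon and the polarity helper are illustrative assumptions, not part of the talk:

```python
# Toy version of the flip rule "Polarity(not [arg1]) = ¬Polarity(arg1)".
# The prior polarities below are illustrative assumptions.
PRIOR_POLARITY = {"work": +1, "work well": +1}

def polarity(phrase):
    """Negate the argument's prior polarity whenever it is negated."""
    prefix = "doesn't "
    if phrase.startswith(prefix):
        return -polarity(phrase[len(prefix):])
    return PRIOR_POLARITY.get(phrase, 0)

print(polarity("doesn't work"))       # -1
print(polarity("doesn't work well"))  # -1
# Both counterexamples receive the same flipped value; the flat rule offers
# no way to treat "doesn't work" and "doesn't work well" differently.
```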

How to model and represent negation in a data-driven machine learning-based approach to SA

  • . . . based solely on word n-grams and
  • . . . w/o lexical resources, such as SentiWordNet [Esuli & Sebastiani, 2006]?

SLIDE 5

Negation Modeling — Implicitly

Implicit negation modeling via higher order word n-grams:

  • bigrams ("*n't return")
  • trigrams ("lack of padding")
  • tetragrams ("denied sending wrong size")
  • . . .

So, we don't need to incorporate extra knowledge of negation into our model, that's convenient!

But what about long negation scopes (length ≥ 4) as in

(1) The leather straps have never worn out or broken. ?

  • Long negation scopes are the rule, not the exception! (>70%)
  • Word n-grams (n < 5) don't capture such long negation scopes
  • Learning models that use word n-grams (n ≥ 3) is usually backed by almost no occurrences of those n-grams in the training data
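A quick sketch of the scope-length problem; whitespace tokenization and the short first example sentence are assumptions made only for illustration:

```python
# Word n-grams capture a negation only when the negation signal and the
# negated word fall into the same n-token window.
def word_ngrams(text, n):
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

short_scope = "i don't return them"
long_scope = "the leather straps have never worn out or broken"

print(word_ngrams(short_scope, 2))  # contains "don't return": negation captured
print(word_ngrams(long_scope, 4))   # no 4-gram contains both "never" and "broken",
                                    # so the negation of "broken" is lost for n < 5
```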

SLIDE 6

Negation Modeling — Explicitly I

Let’s incorporate some knowledge of negation into our model and

model negation explicitly!

Vital: negation scope detection (NSD)

(1) They don’t stand up to laundering very wellstand up to laundering very well, in that they shrink up quite a bit. e.g. via

NegEx1 — regular expression-based = “baseline” LingScope2 — CRF-based = “state-of-the-art” 1http://code.google.com/p/negex/ 2http://sourceforge.net/projects/lingscope/ 6 · 13
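To make the idea of regular expression-based NSD tangible, here is a toy sketch loosely in the spirit of NegEx; it is not the NegEx implementation, and the trigger list and scope heuristic are illustrative assumptions:

```python
import re

NEGATION_TRIGGERS = r"\b(not|n't|never|no|without|lack of|denied)\b"

def detect_scope(sentence, max_scope=6):
    """Return (trigger, scope tokens) pairs: everything from a negation trigger
    up to the next punctuation mark, capped at max_scope tokens."""
    scopes = []
    for match in re.finditer(NEGATION_TRIGGERS, sentence, flags=re.IGNORECASE):
        tail = sentence[match.end():]
        tail = re.split(r"[,.;!?]", tail, maxsplit=1)[0]
        scopes.append((match.group(), tail.split()[:max_scope]))
    return scopes

print(detect_scope("They don't stand up to laundering very well, "
                   "in that they shrink up quite a bit."))
# [("n't", ['stand', 'up', 'to', 'laundering', 'very', 'well'])]
```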

SLIDE 7

Negation Modeling — Explicitly II

Once negation scopes are detected, negated and non-negated word n-grams need to be explicitly represented in feature space:

  • W = {w_i}, i = 1, . . . , d : word n-grams
  • X = {0, 1}^d : feature space of size d, where for x_j ∈ X
      • x_jk = 1 denotes the presence of w_k
      • x_jk = 0 denotes the absence of w_k
  • For each feature x_jk: an additional feature x̆_jk
      • x̆_jk = 1 encodes that w_k appears negated
      • x̆_jk = 0 encodes that w_k appears non-negated
  • Result: augmented feature space X̆ = {0, 1}^(2d)

In X̆ we are now able to represent whether a word n-gram w

  • is present ([1, 0]),
  • is absent ([0, 0]),
  • is present and negated ([0, 1]) or
  • is present both negated and non-negated ([1, 1]).
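A minimal sketch of this doubled representation; the function name and the plain Python lists are illustrative, not the talk's implementation:

```python
def augmented_features(ngrams, negated_ngrams, vocabulary):
    """Build a vector in the augmented feature space X̆ = {0, 1}^(2d):
    for each vocabulary entry w_k, one component encodes non-negated
    presence and one component encodes negated presence."""
    vector = []
    for w in vocabulary:
        vector.append(1 if w in ngrams else 0)          # w_k appears non-negated
        vector.append(1 if w in negated_ngrams else 0)  # w_k appears negated
    return vector
```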

SLIDE 8

Negation Modeling — Explicitly III

Example: explicit negation modeling for word unigrams in

(1) They don’t stand up to laundering very well, in that they shrink up quite a bit.

  • Naïve tokenization that splits at white spaces
  • Ignore punctuation characters
  • Vocabulary W_uni = {"bit", "don't", "down", "laundering", "quite", "shrink", "stand", "up", "very", "well"}

Scheme   bit   don't   down   laundering   quite   shrink   stand   up    very   well
w/       1,0   1,0     0,0    0,1          1,0     1,0      0,1     1,1   0,1    0,1
w/o      1     1       0      1            1       1        1       1     1      1

Table: Stylized feature vectors of example (1). In the "w/" scheme each cell is the pair [present non-negated, present negated]; "up" occurs both negated ("stand up") and non-negated ("shrink up").
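For illustration, the table's "w/" row can be reproduced with the augmented_features sketch from the previous slide; the two token sets below are read off the slide, not produced by an actual NSD system:

```python
vocabulary = ["bit", "don't", "down", "laundering", "quite",
              "shrink", "stand", "up", "very", "well"]

non_negated = {"they", "don't", "in", "that", "shrink", "up", "quite", "a", "bit"}
negated = {"stand", "up", "to", "laundering", "very", "well"}  # detected scope

print(augmented_features(non_negated, negated, vocabulary))
# [1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1]
```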

SLIDE 9

Negation Modeling — Evaluation I

3 SA subtasks:

  • 1. In-domain document-level polarity classification on 10 domains from [Blitzer et al., 2007]'s Multi-Domain Sentiment Dataset v2.0
  • 2. Cross-domain document-level polarity classification on 90 source domain–target domain pairs from the same data set
  • 3. Sentence-level polarity classification on [Pang & Lee, 2005]'s sentence polarity dataset v1.0

SLIDE 10

Negation Modeling — Evaluation II

  • Standard setup: SVMs, linear kernel, fixed C = 2.0
  • Implicit negation modeling/features: word {uni,bi,tri}-grams
  • Explicit negation modeling: word {uni,bi,tri}-grams; NSD: NegEx & LingScope
  • Evaluation measure: accuracy averaged over 10-fold cross validations
  • For cross-domain experiments: 3 domain adaptation methods

= lots & lots & lots of combinations . . . [3]

[3] Summarized evaluation results can be found in the paper corresponding to this talk. Additionally, full evaluation results are available at http://asv.informatik.uni-leipzig.de/staff/Robert_Remus
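A hedged sketch of this setup (linear-kernel SVM, fixed C = 2.0, word {uni,bi,tri}-gram features, accuracy via 10-fold cross validation); scikit-learn and the toy inputs are assumptions made for illustration, the talk does not state which SVM implementation was used:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

texts = ["They are comfortable to wear.",       # toy stand-ins for the
         "They are not comfortable to wear."]   # Multi-Domain Sentiment Dataset
labels = [1, 0]

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 3), binary=True),  # word {uni,bi,tri}-grams
    SVC(kernel="linear", C=2.0),
)

# With a real data set: accuracy averaged over 10-fold cross validation.
# scores = cross_val_score(model, texts, labels, cv=10, scoring="accuracy")
# print(scores.mean())
```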

SLIDE 11

Negation Modeling — Results “in a nutshell”

  • Explicitly modeling negation always yields statistically significantly better results than modeling it only implicitly
  • Explicitly modeling negation not only of word unigrams, but also of higher order word n-grams is beneficial
  • Discriminative data-driven word n-gram models + explicit negation modeling = competitive: outperforms several state-of-the-art models
  • LingScope performs better than NegEx

SLIDE 12

Negation Modeling — Future Work

Given appropriate scope detection methods, our approach is easily extensible to model

  • other valence shifters [Polanyi & Zaenen, 2006], e.g. intensifiers like "very" or "many"
  • hedges [Lakoff, 1973], e.g. "may" or "might"

Accounting for negation scopes in the scope of other negations:

(1) I don't care that they are not really leather.

SLIDE 13

Thanks!

Any questions or suggestions?

SLIDE 14

Appendix — Literature I

Blitzer, J., Dredze, M., & Pereira, F. C. (2007). Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 440–447).

Choi, Y. & Cardie, C. (2008). Learning with compositional semantics as structural inference for subsentential sentiment analysis. In Proceedings of the 13th Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 793–801).

SLIDE 15

Appendix — Literature II

Esuli, A. & Sebastiani, F. (2006). SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC) (pp. 417–422).

Lakoff, G. (1973). Hedges: A study in meaning criteria and the logic of fuzzy concepts. Journal of Philosophical Logic, 2, 458–508.

Pang, B. & Lee, L. (2005). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 115–124).

SLIDE 16

Appendix — Literature III

Polanyi, L. & Zaenen, A. (2006). Contextual valence shifters. In J. G. Shanahan, Y. Qu, & J. Wiebe (editors), Computing Attitude and Affect in Text: Theory and Applications, volume 20 of The Information Retrieval Series (pp. 1–9). Dordrecht: Springer.

Wiegand, M., Balahur, A., Roth, B., Klakow, D., & Montoyo, A. (2010). A survey on the role of negation in sentiment analysis. In Proceedings of the 2010 Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP) (pp. 60–68).