SLIDE 1

Dependency parses for NLU

Christopher Potts
CS 224U: Natural language understanding
April 21

SLIDE 2

Syntactic structure: My dog will not go in the lake.

Three views: Treebank-style parse tree, basic dependencies, and collapsed dependencies.

Basic dependencies:
  root(ROOT, go), nsubj(go, dog), poss(dog, My), aux(go, will), neg(go, not), prep(go, in), pobj(in, lake), det(lake, the)

Collapsed dependencies:
  root(ROOT, go), nsubj(go, dog), poss(dog, My), aux(go, will), neg(go, not), prep_in(go, lake), det(lake, the)

SLIDE 3

Simplified relationships, easier feature extraction

Phrase structure:   (S (NP (NNP Gerald)) (VP (VBD gave) (NP (NNS awards)) (PP (TO to) (NP (NNS puppies)))))
Collapsed dependencies:   nsubj(gave, Gerald), dobj(gave, awards), prep_to(gave, puppies)

Phrase structure:   (S (NP (NNP Gerald)) (VP (VBD gave) (NP (NNS puppies)) (NP (NNS awards))))
Dependencies:   nsubj(gave, Gerald), iobj(gave, puppies), dobj(gave, awards)

SLIDES 4–6

Plan and goals

Goals

  • Make the case for Stanford dependency structures (de Marneffe et al. 2006; de Marneffe and Manning 2008a,b; de Marneffe et al. 2013).
  • Highlight some of the ways that semantic information is passed around inside sentences.
  • Engage with other topics: VSMs, classifiers, and semantic parsing.

Not covered here

The theory of parsing, the theory of semantic dependencies, or the details of mapping from phrase structure trees to dependencies. In short, we’re going to be consumers of dependencies, seeking to use them to get ahead in NLU.

Plan

  1. Get a feel for Stanford dependencies
  2. Case study: advmod-based VSMs
  3. Case study: dependencies as classifier features
  4. Case study: capturing the semantic influence of negation

SLIDE 7

Dependency structures in NLU

Dependencies as the basis for features:

  • Word-sense disambiguation (Lin 1998)

[last year’s slides on WSD]

  • Relation extraction (Snow et al. 2005; Mintz et al. 2009)
  • Semantic role labeling (Surdeanu et al. 2008; Johansson and Nugues 2008)
  • Semantic parsing (Liang et al. 2013)
  • Detecting speaker commitment (hedging, etc.; de Marneffe et al. 2012)
  • Forecasting public opinion (Lerman et al. 2008)
  • Analysis of political debates (Balahur et al. 2009)
  • Drug interactions (Percha et al. 2012)
  • . . .

SLIDES 8–12

Stanford dependencies relation hierarchy

[Figure: the Stanford dependencies relation hierarchy, rooted at dep. Major branches: aux (auxpass, cop); arg (agent; subj: nsubj, csubj, nsubjpass; comp: obj: dobj, iobj, pobj; ccomp, xcomp, acomp); conj; cc; ref; expl; mod (advcl, tmod, rcmod, amod, infmod, partmod, num, number, appos, mwe, nn, advmod, poss, possessive, prt, det, prep, vmod, neg); discourse; goeswith; vocative; punct; sdep (list). Annotations in the figure: attr and rel abandoned; compl collapsed to mark; purpcl collapsed to advcl; abbrev collapsed with appos; amod extended to include parenthetical ages; mwe extended; punct now handles hyphenation differently.]

Updates from de Marneffe et al. 2013:

  • New relations are boxed.
  • Changed/deleted relations are in red, with notes.

SLIDES 13–16

Stanford dependency construction

Rule-based mapping from phrase structure trees to dependency graphs:

1. Dependency extraction: for each constituent, identify its semantic head and project the head upwards:

   (VP[escaped] (MD[might] might) (VP[escaped] (VB[have] have) (VP[escaped] (VBN[escaped] escaped))))

2. Dependency typing: label each dependency pair with the most specific appropriate relation in terms of the dependency hierarchy.

  • relation: aux
  • parent: VP
  • Tregex pattern: VP < VP < /ˆ(?:TO|MD|VB.*|AUXG?)$/=target

Relations determined: aux(escaped, might), aux(escaped, have). Rules might also deliver dep(escaped, might); always favor the most specific.

SLIDE 17

Stanford dependencies: basic and collapsed

Quoting from the javadocs, trees/EnglishGrammaticalRelations.java: The “collapsed” grammatical relations primarily differ as follows:

  • Some multiword conjunctions and prepositions are treated as single words, and then processed as below.
  • Prepositions do not appear as words but are turned into new "prep" or "prepc" grammatical relations, one for each preposition.
  • Conjunctions do not appear as words but are turned into new "conj" grammatical relations, one for each conjunction.
  • The possessive "'s" is deleted, leaving just the relation between the possessor and possessum.
  • Agents of passive sentences are recognized and marked as agent and not as prep by.

SLIDE 18

Stanford tools

The Stanford parser is distributed with starter Java code for parsing your own data. It also has a flexible command-line interface. Some relevant commands:

# Map plain text to dependency structures:
java -mx3000m -cp stanford-parser.jar edu.stanford.nlp.parser.lexparser.LexicalizedParser \
  -outputFormat "typedDependencies" englishPCFG.ser.gz textFile

# Map tagged data to dependency structures:
java -mx3000m -cp stanford-parser.jar edu.stanford.nlp.parser.lexparser.LexicalizedParser \
  -outputFormat "typedDependencies" -tokenized -tagSeparator / englishPCFG.ser.gz taggedFile

# Map phrase-structure trees to Stanford collapsed dependencies
# (change -collapsed to -basic for the basic versions):
java -cp stanford-parser.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure \
  -treeFile treeFile -collapsed

Software/docs: http://nlp.stanford.edu/software/lex-parser.shtml
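The typedDependencies output lists one relation per line, in the form rel(governor-i, dependent-j). A minimal Python sketch for reading that format into triples; the regex and the file name are illustrative assumptions, not part of the Stanford distribution:

import re

# Matches lines like: nsubj(go-5, dog-2) or prep_in(go-5, lake-9)
DEP_LINE = re.compile(r"^(\S+)\((.+)-(\d+'*), (.+)-(\d+'*)\)$")

def read_typed_dependencies(path):
    """Yield (relation, governor, dependent) triples from parser output."""
    with open(path) as f:
        for line in f:
            m = DEP_LINE.match(line.strip())
            if m:
                rel, gov, _, dep, _ = m.groups()
                yield rel, gov, dep

# Example (hypothetical file name):
# from collections import Counter
# print(Counter(rel for rel, _, _ in read_typed_dependencies("deps.txt")))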

SLIDE 19

Graphviz

Graphviz is free graph-drawing software that makes it easy to visualize dependency structures: http://www.graphviz.org/

Dependencies for "Al said that it was raining.": nsubj(said, Al), ccomp(said, raining), complm(raining, that), nsubj(raining, it), aux(raining, was)

digraph g {
  /* Nodes */
  "Al-1" [label="Al"];
  "said-2" [label="said"];
  "that-3" [label="that"];
  "it-4" [label="it"];
  "was-5" [label="was"];
  "raining-6" [label="raining"];
  /* Edges */
  "said-2" -> "Al-1" [label="nsubj"];
  "raining-6" -> "that-3" [label="complm"];
  "raining-6" -> "it-4" [label="nsubj"];
  "raining-6" -> "was-5" [label="aux"];
  "said-2" -> "raining-6" [label="ccomp"];
}
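To render the graph, save the digraph to a file and run Graphviz's dot layout engine; the file names here are just examples:

dot -Tpng deps.dot -o deps.png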

SLIDE 20

Argument structure

  • This section reviews the way basic constituents are represented in Stanford dependency structures.
  • I concentrate on the most heavily used relations.
  • To understand the less-used ones, consult the dependencies manual (de Marneffe and Manning 2008a) and play around with examples using the online parser demo: http://nlp.stanford.edu:8080/parser/index.jsp

SLIDE 21

Verbal structures

[Figure: the verbal portion of the relation hierarchy: aux (auxpass, cop) and arg (agent; subj: nsubj, csubj, nsubjpass; comp: obj: dobj, iobj, pobj; ccomp, xcomp, acomp), plus advcl, tmod, rcmod.]

SLIDE 22

Verbal structures: intransitive and transitive

Intransitive

  Al escaped.  →  nsubj(escaped, Al)
  Al might escape.  →  nsubj(escape, Al), aux(escape, might)
  Al might have escaped.  →  nsubj(escaped, Al), aux(escaped, might), aux(escaped, have)

Transitive

  Sue saw stars.  →  nsubj(saw, Sue), dobj(saw, stars)
  Gerald gave puppies awards.  →  nsubj(gave, Gerald), iobj(gave, puppies), dobj(gave, awards)
  Gerald gave awards to puppies (basic)  →  nsubj(gave, Gerald), dobj(gave, awards), prep(gave, to), pobj(to, puppies)
  Gerald gave awards to puppies (collapsed)  →  nsubj(gave, Gerald), dobj(gave, awards), prep_to(gave, puppies)

SLIDE 23

Verbal structures: sentential complements

Tensed

  Al said that it was raining.  →  nsubj(said, Al), ccomp(said, raining), complm(raining, that), nsubj(raining, it), aux(raining, was)

Infinitival

  Kim wants to win.
    Basic:  nsubj(wants, Kim), xcomp(wants, win), aux(win, to)
    Collapsed:  nsubj(wants, Kim), xcomp(wants, win), aux(win, to), xsubj(win, Kim)

SLIDE 24

Nominals

[Figure: the nominal portion of the relation hierarchy: amod (extended to include parenthetical ages), infmod, partmod, num, number, appos, mwe (extended), nn, abbrev (collapsed with appos), advmod, poss, possessive, prt, det, prep, neg, vmod, plus discourse, goeswith, and vocative. New relations are boxed; changed/deleted relations are in red.]

SLIDE 25

Nominal structures

Basic

  Proper name: Sam
  Quantifier: Everyone
  Determiner: the student  →  det(student, the)
  Possessive (basic): Sam 's bike  →  poss(bike, Sam), possessive(Sam, 's)
  Possessive (collapsed): Sam bike  →  poss(bike, Sam)

Modified

  Adjective: the happy student  →  det(student, the), amod(student, happy)
  Prepositional (basic): the happy student of linguistics  →  det(student, the), amod(student, happy), prep(student, of), pobj(of, linguistics)
  Prepositional (collapsed): the happy student of linguistics  →  det(student, the), amod(student, happy), prep_of(student, linguistics)
  Relative clause: the student who won  →  det(student, the), rcmod(student, won), nsubj(won, who)

SLIDE 26

Modification

Predicative constructions

  Basic: Edna is happy  →  nsubj(happy, Edna), cop(happy, is)
  Lexical pred: Edna seems happy  →  nsubj(happy, Edna), cop(happy, seems)
  Lexical: Edna looked happy  →  nsubj(looked, Edna), acomp(looked, happy)
  Small clause: Edna considers Sam happy  →  nsubj(considers, Edna), xcomp(considers, happy), nsubj(happy, Sam)

Adverbs

  wonderfully happy  →  advmod(happy, wonderfully)
  surprisingly amazingly happy  →  advmod(happy, surprisingly), advmod(happy, amazingly)
  not surprisingly happy  →  neg(happy, not), advmod(happy, surprisingly)
  Edna is in no way happy  →  nsubj(happy, Edna), cop(happy, is), advmod(happy, way), dep(way, no)

SLIDE 27

Coordination: conj and cc

Nominals (here, nsubj)

  Ivan and Penny left.
    Basic:  cc(Ivan, and), conj(Ivan, Penny), nsubj(left, Ivan)
    Collapsed:  conj_and(Ivan, Penny), nsubj(left, Ivan), nsubj(left, Penny)

Verb phrases

  Nobody sang and danced.
    Basic:  nsubj(sang, Nobody), cc(sang, and), conj(sang, danced)
    Collapsed:  nsubj(sang, Nobody), conj_and(sang, danced), nsubj(danced, Nobody)

SLIDE 28

advmod dependencies

Scale types, with example adjectives:

  • totally open: tall, short
  • lower closed: wet, bent
  • upper closed: pure, straight
  • totally closed: opaque, open

Adverbs for distinguishing scales

  • Maximality: completely, fully, totally, absolutely, 100%, perfectly, ...
  • Proportion: half, mostly, most of the way, two-thirds, three-sevenths, ...
  • Minimality: slightly, somewhat, partially, ...

[Table: Summary of adverb patterns: which adverb classes (maximality, proportion, minimality) combine with which scale types (totally open, totally closed, upper closed, lower closed).]

(Kennedy and McNally 2005; Kennedy 2007; Syrett and Lidz 2010)

SLIDE 29

Gigaword NYT (h/t to Nate Chambers for the parsing!)

Available in list format (tab-separated values):

http://www.stanford.edu/class/cs224u/restricted/data/gigawordnyt-advmod.tsv.zip
Or: /afs/ir/class/cs224u/WWW/restricted/data/gigawordnyt-advmod.tsv.zip

Pairs advmod(X, Y) with counts:

        1  end here      98434
        2  well as       84031
        3  longer no     74486
        4  far so        71853
        5  much so       71460
        6  now right     66373
        7  much too      66264
        8  much how      64794
        9  said also     62588
       10  year earlier  60290
      ...
  3211133  scuff how         1

SLIDE 30

Gigaword NYT (h/t to Nate Chambers for the parsing!)

Dependent × parent matrix: raw counts

         when   also   just    now   more     so   even    how  where     as
is      17663  21310  10853  46433   2094   8204   8388  14546  22985   2039
have    20657  20156  18757  31288   2162   7508  13003   4184  12573   1572
was     26976  10634   8253   3014   1265   4025   5644   6554  11818   1920
said    19695  62588   3984   4953    923   4933   6198    575   4209    608
much      207    145   4184    474  10079  71460    421  64794    140  46174
are     11546  14212   4929  23470   2418   7591   4779   7952  19832   1214
get     19342   4004   8474   5811   1401   2657   5930  14477   6840    718
do       8299   1550   7908   9899   2733  37339   2915  14474   2376    598
’s       7811   9488   8815  13779   1371   3949   4293   1690   6281   1500
had     16854  16247   7039   3128   1512   1703   7930   1735   6936   1742

Dependent × parent matrix: positive PMI with contextual discounting

        when  also  just   now  more    so  even   how  where    as
is      0.00  0.04  0.00  1.12  0.00  0.00  0.00  0.16   0.65  0.00
have    0.00  0.30  0.48  1.05  0.00  0.00  0.38  0.00   0.36  0.00
was     0.23  0.00  0.00  0.00  0.00  0.00  0.00  0.00   0.40  0.00
said    0.00  1.56  0.00  0.00  0.00  0.00  0.00  0.00   0.00  0.00
much    0.00  0.00  0.00  0.00  0.11  2.01  0.00  2.09   0.00  1.80
are     0.00  0.17  0.00  0.98  0.00  0.00  0.00  0.09   1.04  0.00
get     0.32  0.00  0.21  0.00  0.00  0.00  0.12  1.00   0.28  0.00
do      0.00  0.00  0.14  0.42  0.00  1.77  0.00  1.00   0.00  0.00
’s      0.00  0.07  0.25  0.75  0.00  0.00  0.00  0.00   0.20  0.00
had     0.22  0.65  0.06  0.00  0.00  0.00  0.45  0.00   0.34  0.00
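For concreteness, here is a minimal numpy sketch of positive PMI with a contextual discounting factor. The discount follows the common Pantel-and-Lin-style formulation; I am assuming it matches the weighting used for the table above:

import numpy as np

def ppmi_discounted(counts):
    """Positive PMI with contextual discounting over a co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)    # row marginals, shape (n, 1)
    col = counts.sum(axis=0, keepdims=True)    # column marginals, shape (1, m)
    # PMI = log( P(i,j) / (P(i) * P(j)) ), computed directly from counts.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0
    ppmi = np.maximum(pmi, 0.0)
    # Contextual discounting: shrink cells with small counts or rare rows/columns.
    minmarg = np.minimum(row, col)             # broadcasts to (n, m)
    discount = (counts / (counts + 1.0)) * (minmarg / (minmarg + 1.0))
    return ppmi * discount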

SLIDE 31

Some neighbors (cosine distance, PPMI+discounting matrix)

Adverbs

absolutely certainly never recently somewhat quickly utterly definitely not subsequently slightly swiftly totally surely maybe ago considerably soon truly probably either since decidedly gradually completely obviously ever later extremely rapidly equally undoubtedly yes shortly terribly slowly quite necessarily why previously very eventually obviously indeed would first markedly immediately really clearly simply when equally promptly whatsoever therefore pray already more fast

Adjectives

happy sad tall full straight closed excited painful large empty largest closing pleased frustrating wide tight straightforward shut nice tragic steep complete twice sealed comfortable depressing strong crowded best halted silly ugly thin over certain corp. proud embarrassing lucky solid steady suspended good beautiful quick smooth ordinary retired nervous dumb good dark decent canceled uncomfortable unfortunate high filled smooth ending

SLIDES 32–33

Latent Semantic Analysis

1. Apply singular value decomposition to the PPMI+discounting matrix.
2. Inspect the singular values (scree plot of value by rank); settle on 25 dimensions.
3. For rows (dependents): R[ , 1:25] × S[1:25, 1:25]
4. For columns (parents): S[1:25, 1:25] × C[ , 1:25]^T
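A minimal numpy sketch of that procedure, assuming the weighted matrix X factors as X = R · S · C^T (numpy's svd returns exactly these pieces) and k = 25:

import numpy as np

def lsa(X, k=25):
    """Truncated SVD of X; returns k-dimensional vectors for rows and columns."""
    R, s, Ct = np.linalg.svd(X, full_matrices=False)   # X = R @ diag(s) @ Ct
    S = np.diag(s[:k])
    row_vecs = R[:, :k] @ S            # step 3: one k-dim vector per row
    col_vecs = (S @ Ct[:k, :]).T       # step 4: one k-dim vector per column
    return row_vecs, col_vecs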

SLIDE 34

Some adverb neighbors (cosine distance, PPMI + discounting + LSA)

Adverbs without LSA (repeated from earlier)

absolutely certainly never recently somewhat quickly utterly definitely not subsequently slightly swiftly totally surely maybe ago considerably soon truly probably either since decidedly gradually completely obviously ever later extremely rapidly equally undoubtedly yes shortly terribly slowly quite necessarily why previously very eventually obviously indeed would first markedly immediately really clearly simply when equally promptly whatsoever therefore pray already more fast

Adverbs with LSA (25 dimensions)

absolutely certainly never recently somewhat quickly utterly surely you subsequently palpably swiftly truly definitely maybe later decidedly soon totally probably just d.calif seeming prematurely manifestly doubt yes ago any instantly wholly undoubtedly ok r.ohio slightly immediately patently necessarily q shortly congenitally speedily hardly importantly pray first distinctly eventually indisputably doubtless hey d.mo visibly gradually flat.out secondly anyway since sufficiently slowly

SLIDE 35

Some adjective neighbors (cosine distance, PPMI + discounting + LSA)

Adjectives without LSA (repeated from earlier)

happy sad tall full straight closed excited painful large empty largest closing pleased frustrating wide tight straightforward shut nice tragic steep complete twice sealed comfortable depressing strong crowded best halted silly ugly thin over certain corp. proud embarrassing lucky solid steady suspended good beautiful quick smooth ordinary retired nervous dumb good dark decent canceled uncomfortable unfortunate high filled smooth ending

Adjectives with LSA (25 dimensions)

happy sad tall full straight closed nice ugly thick light normal suspended terrible scary deep flat free shut strange weird loud calm flat retired cute strange bright dry natural halted scary tragic cheap smooth certain replaced wild nasty tight quiet conventional stopped excited dumb fast cool routine cleared cool boring hot soft benign locked special odd quick steady reasonable sealed

SLIDES 36–39

t-SNE (van der Maaten and Hinton 2008) 2d embedding of the PPMI+discounting matrix: adverbs

[Figure: 2d t-SNE layout of the adverb vectors.]

SLIDES 40–43

t-SNE (van der Maaten and Hinton 2008) 2d embedding of the PPMI+discounting matrix: dependents

[Figure: 2d t-SNE layout of the dependent vectors.]
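A minimal sklearn sketch of how such a 2d layout can be produced from the word vectors; the stand-in matrix and the perplexity setting are illustrative, not the slides' actual configuration:

import numpy as np
from sklearn.manifold import TSNE

# X stands in for the PPMI+discounting matrix (one row per word); a small
# random matrix is used here only so the sketch runs on its own.
X = np.abs(np.random.RandomState(0).randn(100, 50))
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
# coords[i] is the 2d position of row i; plot with matplotlib and label the points.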

SLIDE 44

Adverbial constructions

From a large collection of online product reviews:

Modifiers             Count
much more              4724
even more              4334
not very               2723
far more               2490
not too                2458
just plain             2117
just too               1938
very very              1819
not only               1771
way too                1594
little more            1508
not really             1422
...
just not very           216
just too damn            89
really not very          82
not only very            79
only slightly less       66
still not very           65
actually not too         58
still pretty darn        49

Examples:

  not very happy  →  neg(happy, not), advmod(happy, very)
  only slightly less happy  →  advmod edges among only, slightly, less, and happy
  really not too happy  →  advmod(happy, really), neg(happy, not), advmod(happy, too)

SLIDE 45

Classifier hypothesis: dependency edges beat bigrams

Dependency features:
  det(movie, This) → 1, nsubj(good, movie) → 1, aux(good, does) → 1, neg(good, not) → 1, cop(good, seem) → 1
Bigram features:
  ‘<s> This’ → 1, ‘This movie’ → 1, ‘movie does’ → 1, ‘does not’ → 1, ‘not seem’ → 1, ‘seem good’ → 1, ‘good </s>’ → 1

Figure: This movie does not seem good

Dependency features:
  det(scenery, the) → 1, nsubj(spectacular, scenery) → 1, cop(spectacular, was) → 1, conj_but(spectacular, distracting) → 1
Bigram features:
  ‘<s> The’ → 1, ‘The scenery’ → 1, ‘scenery was’ → 1, ‘was spectacular’ → 1, ‘spectacular but’ → 1, ‘but distracting’ → 1, ‘distracting </s>’ → 1

Figure: The scenery was spectacular but distracting
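A minimal sketch of the two feature extractors being compared, assuming tokens arrive as a list and dependency edges as (relation, governor, dependent) triples; the function names are mine, not the course code's:

from collections import Counter

def bigram_features(tokens):
    """Bigram counts over a token list, with sentence boundary markers."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return Counter(" ".join(pair) for pair in zip(padded, padded[1:]))

def dependency_features(triples):
    """Counts of dependency edges, each rendered as rel(governor, dependent)."""
    return Counter("%s(%s, %s)" % (rel, gov, dep) for rel, gov, dep in triples)

# Example:
# bigram_features("This movie does not seem good".split())
# dependency_features([("neg", "good", "not"), ("cop", "good", "seem")])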

SLIDE 46

Positive/negative sentiment with IMDB reviews

20K positive and 20K negative reviews from this collection: http://ai.stanford.edu/~amaas/data/sentiment/

<sentence>
  <str>honestly , this is the worst franchise exploitation train wreck since ...</str>
  <dep>[advmod(wreck-10, honestly-1), nsubj(wreck-10, this-3), ... ]</dep>
</sentence>
<sentence>
  <str>predator : requiem disaster .</str>
  <dep>[nn(disaster-4, requiem-3), dep(predator-1, disaster-4)]</dep>
</sentence>
...

Data and my code (using Python/sklearn): http://www.stanford.edu/class/cs224u/code/depvsbigram.zip

SLIDES 47–54

Experimental set-up

Logistic Regression (MaxEnt) classifier. For each feature set:

  1. Feature extraction: texts to vectors of feature counts.
  2. Randomly split the data:
       • 50% dev-set
       • 50% eval-set
  3. With the dev-set, find the top 5000 most informative features (using a χ2 test of association) and the best regularization regime (L1 vs. L2, regularization strength in [0.1, 2]).
  4. With the eval-set, evaluate the best model via 10-fold cross-validation.
  5. F1 as the primary evaluation statistic; non-parametric Wilcoxon rank-sums test to compare differences for statistical significance.

A sketch of this pipeline is given below.

Data and my code (using Python/sklearn): http://www.stanford.edu/class/cs224u/code/depvsbigram.zip
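A minimal sklearn sketch of steps 1–4; the stand-in data, pipeline objects, and parameter grid are illustrative assumptions, not the actual depvsbigram code:

import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline

# Stand-in feature-count matrix and labels so the sketch runs on its own;
# in practice these come from the feature-extraction step.
rng = np.random.RandomState(0)
X = rng.poisson(0.3, size=(200, 6000))
y = rng.randint(0, 2, size=200)

X_dev, X_eval, y_dev, y_eval = train_test_split(X, y, test_size=0.5, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(chi2, k=5000)),            # top 5000 features by chi-squared
    ("clf", LogisticRegression(solver="liblinear")),  # liblinear supports both L1 and L2
])
grid = GridSearchCV(
    pipe,
    {"clf__penalty": ["l1", "l2"], "clf__C": np.linspace(0.1, 2.0, 5)},
    scoring="f1", cv=5)
grid.fit(X_dev, y_dev)               # choose penalty and strength on the dev-set

# Evaluate the selected model on the eval-set with 10-fold cross-validation (F1).
scores = cross_val_score(grid.best_estimator_, X_eval, y_eval, cv=10, scoring="f1")
print(scores.mean(), scores.std())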

SLIDE 55

Results

unigrams: 91.2    bigrams: 90.3    dependencies: 88.6

Figure: Results of 10-fold cross-validation. Error bars are standard errors. All pairs of models are statistically different (p < 0.001).

Features       Penalty   Prior
Unigrams       L2        0.1
Bigrams        L2        0.2
Dependencies   L2        0.2

SLIDE 56

Discussion

  • Ceiling effect?
  • Loss of information as a result of the dependencies' tokenization?
  • Sparsity induced by the interlocking dependency relations?
  • . . .

SLIDE 57

Negation

  • Negation is frequent, systematic, and semantically potent.
  • Let’s see if we can use dependencies to get a grip on what it means and how it interacts with its fellow constituents.
  • The lessons learned should generalize to a wide range of semantic relations and operations, many of which we will study during the unit on semantic composition.

SLIDE 58

Tracking the influence of negation: semantic scope

A few examples (of many):

  I didn't enjoy it.  →  nsubj(enjoy, I), aux(enjoy, did), neg(enjoy, n't), dobj(enjoy, it)
  I never enjoy it.  →  nsubj(enjoy, I), neg(enjoy, never), dobj(enjoy, it)
  I don't think I will enjoy it.  →  nsubj(think, I), aux(think, do), neg(think, n't), ccomp(think, enjoy), nsubj(enjoy, I), aux(enjoy, will), dobj(enjoy, it)

SLIDE 59

Scope domains

[Figure: schematic parse trees showing the scope domain for an operator Op inside S, NP, and PP constituents, alongside the corresponding dependency schema, in which the scope domain is what Op reaches via 'rel' edges.]

  • Dependencies: 'rel' should exclude certain non-scope relations (the figure lists {det, amod}).

(Danescu-Niculescu-Mizil et al. 2009; Danescu-Niculescu-Mizil and Lee 2010)

SLIDES 60–68

Negation generalized: downward monotonicity

Definition (Upward monotonicity)

An operator δ is upward monotone iff for all expressions α, β in the domain of δ: if α ⊆ β, then (δα) ⊆ (δβ)

Definition (Downward monotonicity)

An operator δ is downward monotone iff for all expressions α, β in the domain of δ: if α ⊆ β, then (δβ) ⊆ (δα)

Examples:

  A student smoked.      A Swedish student smoked.      A student smoked cigars.
  No student smoked.     No Swedish student smoked.     No student smoked cigars.
  Every student smoked.  Every Swedish student smoked.  Every student smoked cigars.
  Few students smoked.   Few Swedish students smoked.   Few students smoked cigars.

SLIDE 69

Marking the scope of negation

A few examples (of many):

  the movie was not very good .  →  det(movie, the), nsubj(good, movie), cop(good, was), neg(good, not), advmod(good, very)
  i rarely enjoy horror movies .  →  dep(enjoy, i), advmod(enjoy, rarely), dobj(enjoy, movies), nn(movies, horror)
  i do n't think that is a good idea .  →  nsubj(think, i), aux(think, do), neg(think, n't), ccomp(think, idea), complm(idea, that), cop(idea, is), det(idea, a), amod(idea, good)

SLIDE 70

Approximation with tokenized strings

I’d be remiss if I didn’t point out that the effects of negation can be nicely approximated by a string-level operation (Das and Chen 2001; Pang et al. 2002).

1. Tokenize in a way that isolates and preserves clause-level punctuation. Starter Python tokenizer: http://sentiment.christopherpotts.net/code-data/happyfuntokenizing.py

2. Append a NEG suffix to every word appearing between a negation and a clause-level punctuation mark.

3. A negation is any word matching this regex:

   (?:
     ^(?:never|no|nothing|nowhere|noone|none|not|
         havent|hasnt|hadnt|cant|couldnt|shouldnt|
         wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint
      )$
   )
   |
   n't
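A minimal Python sketch of steps 2–3, assuming the text has already been tokenized as in step 1; the regex follows the slide, while the _NEG suffix spelling and the punctuation inventory are my assumptions:

import re

NEGATION = re.compile(
    r"""(?:^(?:never|no|nothing|nowhere|noone|none|not|
               havent|hasnt|hadnt|cant|couldnt|shouldnt|
               wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)$)
        |n't""",
    re.VERBOSE)
CLAUSE_PUNCT = re.compile(r"^[.:;!?]$")    # assumed clause-level punctuation marks

def mark_negation(tokens):
    """Append _NEG to every token between a negation and clause-level punctuation."""
    out, in_scope = [], False
    for tok in tokens:
        if CLAUSE_PUNCT.match(tok):
            in_scope = False
            out.append(tok)
        elif NEGATION.search(tok):
            in_scope = True
            out.append(tok)
        else:
            out.append(tok + "_NEG" if in_scope else tok)
    return out

# mark_negation("i did n't enjoy it at all .".split())
# => ['i', 'did', "n't", 'enjoy_NEG', 'it_NEG', 'at_NEG', 'all_NEG', '.']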

SLIDES 71–72

Predicting the effects of negation using IMDB user-supplied reviews

[Figures: for each word, its distribution across the review rating categories, with fitted linear (Cat) and quadratic (Cat^2) coefficients.]

Outside the scope of negation

  good        732,963 tokens    Cat = 0.01 (p = 0.152)     Cat^2 = -0.02 (p < 0.001)
  bad         254,146 tokens    Cat = -0.2 (p < 0.001)     Cat^2 = 0.01 (p < 0.001)
  excellent   136,404 tokens    Cat = 0.22 (p < 0.001)
  terrible     45,470 tokens    Cat = -0.28 (p < 0.001)    Cat^2 = 0.02 (p < 0.001)

In the scope of negation

  neg(good)        169,772 tokens    Cat = -0.06 (p < 0.001)    Cat^2 = -0.01 (p < 0.001)
  neg(bad)         113,865 tokens    Cat = -0.14 (p < 0.001)    Cat^2 = -0.02 (p = 0.011)
  neg(excellent)    10,393 tokens    Cat = 0.15 (p < 0.001)
  neg(terrible)      9,936 tokens    Cat = -0.25 (p < 0.001)

SLIDE 73

Generalizing further still: commitment and perspective

Overview

  • Whereas neg(p) entails that p is not factual,
  • speech and attitude predicates are semantically consistent with p and its negation,
  • though the pragmatics is a lot more complicated (de Marneffe et al. 2012).

Examples

  1. The dictator claimed that no citizens were injured.
  2. The Red Cross claimed that no citizens were injured.
  3. They said it would be horrible, but they were wrong: I loved it!!!

How might we get a grip on the semantic effects of these predicates?

SLIDES 74–75

References

Balahur, Alexandra; Zornitsa Kozareva; and Andrés Montoyo. 2009. Determining the polarity and source of opinions expressed in political debates. In Alexander Gelbukh, ed., Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing, 468–480. Berlin: Springer. doi:10.1007/978-3-642-00382-0_38.

Danescu-Niculescu-Mizil, Cristian and Lillian Lee. 2010. Don’t ‘have a clue’? Unsupervised co-learning of downward-entailing operators. In Proceedings of the ACL 2010 Conference Short Papers, 247–252. Uppsala, Sweden: Association for Computational Linguistics.

Danescu-Niculescu-Mizil, Cristian; Lillian Lee; and Richard Ducott. 2009. Without a ‘doubt’? Unsupervised discovery of downward-entailing operators. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 137–145. Association for Computational Linguistics.

Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the 8th Asia Pacific Finance Association Annual Conference.

de Marneffe, Marie-Catherine; Bill MacCartney; and Christopher D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, 449–454. ACL.

de Marneffe, Marie-Catherine and Christopher D. Manning. 2008a. Stanford Typed Dependencies Manual. Stanford University.

de Marneffe, Marie-Catherine and Christopher D. Manning. 2008b. The Stanford typed dependencies representation. In Proceedings of the COLING 2008 Workshop on Cross-Framework and Cross-Domain Parser Evaluation, 1–8. ACL.

de Marneffe, Marie-Catherine; Christopher D. Manning; and Christopher Potts. 2012. Did it happen? The pragmatic complexity of veridicality assessment. Computational Linguistics 38(2):301–333.

de Marneffe, Marie-Catherine; Miriam Connor; Natalia Silveira; Samuel R. Bowman; Timothy Dozat; and Christopher D. Manning. 2013. More constructions, more genres: Extending Stanford Dependencies. In Eva Hajičová; Kim Gerdes; and Leo Wanner, eds., Proceedings of the Second International Conference on Dependency Linguistics, 187–196. Prague.

Johansson, Richard and Pierre Nugues. 2008. Dependency-based semantic role labeling of PropBank. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, 69–78. Honolulu, Hawaii: Association for Computational Linguistics.

Kennedy, Christopher. 2007. Vagueness and grammar: The semantics of relative and absolute gradable adjectives. Linguistics and Philosophy 30(1):1–45.

Kennedy, Christopher and Louise McNally. 2005. Scale structure and the semantic typology of gradable predicates. Language 81(2):345–381.

Lerman, Kevin; Ari Gilder; Mark Dredze; and Fernando Pereira. 2008. Reading the markets: Forecasting public opinion of political candidates by news analysis. In Proceedings of the 22nd International Conference on Computational Linguistics, 473–480. Manchester, UK: Association for Computational Linguistics.

Liang, Percy; Michael I. Jordan; and Dan Klein. 2013. Learning dependency-based compositional semantics. Computational Linguistics 39(2):389–446. doi:10.1162/COLI_a_00127.

Lin, Dekang. 1998. Automatic retrieval and clustering of similar words. In Proceedings of COLING-ACL, 768–774. Montreal: ACL.

van der Maaten, Laurens and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9:2579–2605.

Mintz, Mike; Steven Bills; Rion Snow; and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 1003–1011. Suntec, Singapore: Association for Computational Linguistics.

Pang, Bo; Lillian Lee; and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 79–86. Philadelphia: Association for Computational Linguistics.

Percha, Bethany; Yael Garten; and Russ B. Altman. 2012. Discovery and explanation of drug-drug interactions via text mining. In Pacific Symposium on Biocomputing, 410–421.

Snow, Rion; Daniel Jurafsky; and Andrew Y. Ng. 2005. Learning syntactic patterns for automatic hypernym discovery. In Lawrence K. Saul; Yair Weiss; and Léon Bottou, eds., Advances in Neural Information Processing Systems 17, 1297–1304. Cambridge, MA: MIT Press.

Surdeanu, Mihai; Richard Johansson; Adam Meyers; Lluís Màrquez; and Joakim Nivre. 2008. The CoNLL 2008 shared task on joint parsing of syntactic and semantic dependencies. In CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning, 159–177. Manchester: Coling 2008 Organizing Committee.

Syrett, Kristen and Jeffrey Lidz. 2010. 30-month-olds use the distribution and meaning of adverbs to interpret novel adjectives. Language Learning and Development 6(4):258–282.