Semantic Roles, Frames, and Expectations
CMSC 473/673 UMBC

SLIDE 1

Semantic Roles, Frames, and Expectations

CMSC 473/673 UMBC November 27th and 29th, 2017

SLIDE 2

Course Announcement 1: Assignment 4

Due Monday December 11th (~2 weeks) Any questions?

SLIDE 3

Course Announcement 2: Final Exam

No mandatory final exam December 20th, 1pm-3pm: optional second midterm/final Averaged into first midterm score No practice questions Register by Monday 12/11: https://goo.gl/forms/aXflKkP0BIRxhOS83

SLIDE 4

Recap from last time…

SLIDE 5

Probabilistic Context Free Grammar (PCFG) Tasks

  • Find the most likely parse (for an observed sequence)
  • Calculate the (log) likelihood of an observed sequence w1, …, wN
  • Learn the grammar parameters

SLIDE 6

CKY Algorithms

Algorithm | Weights | ⊕ | ⊗ | ⓪ | ①
Recognizer | Boolean (True/False) | or | and | False | True
Viterbi | [0,1] | max | * | 0 | 1
Inside | [0,1] | + | * | 0 | 1

Outside? Not really (“Semiring Parsing,” Goodman, 1998). But there is a connection between inside-outside and backprop! (“Inside-Outside and Forward-Backward Algorithms are Just Backprop,” Eisner, 2016)

Adapted from Jason Eisner
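The point of the table above, that one CKY routine yields the recognizer, Viterbi, and inside algorithms just by swapping the semiring operations (⊕, ⊗, ⓪), can be sketched in code. This is a minimal illustration: the toy CNF grammar, its rule weights, and the function names are invented for this example, not from any library.

```python
# Semiring-parameterized CKY (sketch). chart[i][k] maps a nonterminal to its
# accumulated weight over the span (i, k). Plugging in (+, *, 0) gives the
# inside algorithm; (max, *, 0) gives Viterbi; Boolean (or, and, False) gives
# the recognizer.
def cky(words, unary, binary, plus, times, zero):
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    # width-1 spans: A -> terminal rules
    for i, w in enumerate(words):
        for (A, b), q in unary.items():
            if b == w:
                chart[i][i + 1][A] = plus(chart[i][i + 1].get(A, zero), q)
    # wider spans: A -> B C rules, combining adjacent subspans
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for (A, B, C), q in binary.items():
                    if B in chart[i][j] and C in chart[j][k]:
                        w2 = times(times(q, chart[i][j][B]), chart[j][k][C])
                        chart[i][k][A] = plus(chart[i][k].get(A, zero), w2)
    return chart[0][n]

# Toy deterministic grammar for "Papa ate the caviar" (illustrative weights)
unary = {("NP", "Papa"): 1.0, ("V", "ate"): 1.0,
         ("D", "the"): 1.0, ("N", "caviar"): 1.0}
binary = {("NP", "D", "N"): 1.0, ("VP", "V", "NP"): 1.0,
          ("S", "NP", "VP"): 1.0}
inside = cky("Papa ate the caviar".split(), unary, binary,
             plus=lambda a, b: a + b, times=lambda a, b: a * b, zero=0.0)
print(inside)  # → {'S': 1.0}
```

Swapping in `plus=max` gives the Viterbi score of the best parse with the same chart code, which is the table's point.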

SLIDE 7

Expectation Maximization (EM)

Two-step, iterative algorithm:

  • 0. Assume some value for your parameters
  • 1. E-step: count under uncertainty, assuming these parameters
  • 2. M-step: maximize log-likelihood, assuming these uncertain counts

For a PCFG, the estimated counts of each rule p(X → Y Z) come from the “inside-outside” algorithm:

F[Y → Z a | x_1 ⋯ x_O] = (q(Y → Z a) / M(x_1 ⋯ x_O)) · Σ_{0 ≤ j < l < k ≤ O} β(Y, j, k) γ(Z, j, l) γ(a, l, k)

F[Y → b | x_1 ⋯ x_O] = (q(Y → b) / M(x_1 ⋯ x_O)) · Σ_{0 ≤ j < O : x_j = b} β(Y, j, j+1)

SLIDE 8

Projective Dependency Trees

No crossing arcs

SLP3: Figs 14.2, 14.3

✔ Projective ✖ Not projective

Non-projective parses capture:
  • certain long-range dependencies
  • free word order
SLIDE 9

Are CFGs for Naught?

Nope! Simple algorithm from Xia and Palmer (2001)

  • 1. Mark the head child of each node in the phrase structure, using “appropriate” head rules.
  • 2. In the dependency structure, make each non-head child depend on the head of the head-child.

Papa ate the caviar with a spoon

(figure: the phrase-structure tree over the sentence, with the head word of each constituent marked: ate, spoon, caviar, …)

SLIDE 10

(Some) Dependency Parsing Algorithms

  • Dynamic programming: Eisner algorithm (Eisner 1996)
  • Transition-based: shift-reduce, arc standard
  • Graph-based: maximum spanning tree

SLIDE 11

Shift-Reduce Dependency Parsing

Tools: input words, a special root symbol ($), and a stack to hold partially processed words

Shift:
– move a token onto the stack
– decide if the top two elements of the stack form a valid (good) grammatical dependency

Reduce:
– if there’s a valid relation, place the head on the stack

Decide how? Search problem! What is valid? Learn it! What are the possible actions?

SLIDE 12

Arc Standard Parsing

state  {[root], [words], [] } while state ≠ {[root], [], [(deps)]} { t ← ORACLE(state) state ← APPLY(t, state) } return state

Possibility | Action Name | Action Meaning
Assign the current word as the head of some previously seen word | LEFTARC | Assert a head-dependent relation between the word at the top of the stack and the word directly beneath it; remove the lower word from the stack
Assign some previously seen word as the head of the current word | RIGHTARC | Assert a head-dependent relation between the second word on the stack and the word at the top; remove the word at the top of the stack
Wait on processing the current word; add it for later | SHIFT | Remove the word from the front of the input buffer and push it onto the stack

SLIDE 13

Papa ate the caviar

Action | Stack | Word Buffer | New Dep
(start) | $ | Papa ate the caviar |
SHIFT | $ Papa | ate the caviar |
SHIFT | $ Papa ate | the caviar |
LEFTARC | $ ate | the caviar | ate → Papa
SHIFT | $ ate the | caviar |
SHIFT | $ ate the caviar | |
LEFTARC | $ ate caviar | | caviar → the
RIGHTARC | $ ate | | ate → caviar
RIGHTARC | $ | | $ → ate

Final deps: ate → Papa, caviar → the, ate → caviar, $ → ate
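The trace for “Papa ate the caviar” can be reproduced with a few lines of code. This is an illustrative sketch (the function name and data layout are invented): the oracle is replaced by a canned action sequence, whereas a real parser would predict each action.

```python
# Minimal arc-standard transition system (sketch). The stack grows to the
# right, so stack[-1] is the top and stack[-2] the word beneath it.
def arc_standard(words, actions):
    stack = ["$"]          # root symbol at the bottom of the stack
    buffer = list(words)   # input word buffer
    deps = []              # collected (head, dependent) arcs
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFTARC":
            # top of stack heads the word directly beneath it
            dep = stack.pop(-2)
            deps.append((stack[-1], dep))
        elif act == "RIGHTARC":
            # second word on the stack heads the top; pop the top
            dep = stack.pop()
            deps.append((stack[-1], dep))
    return deps

actions = ["SHIFT", "SHIFT", "LEFTARC", "SHIFT", "SHIFT",
           "LEFTARC", "RIGHTARC", "RIGHTARC"]
print(arc_standard("Papa ate the caviar".split(), actions))
# → [('ate', 'Papa'), ('caviar', 'the'), ('ate', 'caviar'), ('$', 'ate')]
```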

SLIDE 14

Arc Standard Parsing

state  {[root], [words], [] } while state ≠ {[root], [], [(deps)]} { t ← ORACLE(state) state ← APPLY(t, state) } return state

Q: What is the time complexity?

SLIDE 15

Arc Standard Parsing

state  {[root], [words], [] } while state ≠ {[root], [], [(deps)]} { t ← ORACLE(state) state ← APPLY(t, state) } return state

Q: What is the time complexity? A: Linear

SLIDE 16

Arc Standard Parsing

state  {[root], [words], [] } while state ≠ {[root], [], [(deps)]} { t ← ORACLE(state) state ← APPLY(t, state) } return state

Q: What is the time complexity? A: Linear Q: What’s potentially problematic?

SLIDE 17

Arc Standard Parsing

state  {[root], [words], [] } while state ≠ {[root], [], [(deps)]} { t ← ORACLE(state) state ← APPLY(t, state) } return state

Q: What is the time complexity? A: Linear Q: What’s potentially problematic? A: This is a greedy algorithm

SLIDE 18

Learning An Oracle (Predictor)

Training data: dependency treebank Input: configuration Output: {LEFTARC, RIGHTARC, SHIFT}

t ← ORACLE(state)

  • Choose LEFTARC if it produces a correct head-dependent relation, given the reference parse and the current configuration
  • Choose RIGHTARC if it produces a correct head-dependent relation, given the reference parse, and all of the dependents of the word at the top of the stack have already been assigned
  • Otherwise, choose SHIFT
SLIDE 19

Training the Predictor

Predict action t given configuration s: t = φ(s)

Extract features of the configuration. Examples: word forms, lemmas, POS tags, morphological features.

How? Perceptron, maxent, support vector machines, multilayer perceptrons, neural networks

Take CMSC 478 (678) to learn more about these
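As a concrete (toy) version of the t = φ(s) predictor, here is a multiclass perceptron over configuration features. Everything below is invented for illustration: the feature templates, the hand-made “oracle” training transitions, and the number of epochs; a real system would extract these from a dependency treebank.

```python
# A minimal multiclass perceptron that predicts the next parser action from
# simple word-form features of the (stack, buffer) configuration (sketch).
from collections import defaultdict

ACTIONS = ["LEFTARC", "RIGHTARC", "SHIFT"]

def features(stack, buffer):
    # stack top, word beneath it, and front of the buffer
    return {f"s1={stack[-1] if stack else '<empty>'}",
            f"s2={stack[-2] if len(stack) > 1 else '<empty>'}",
            f"b1={buffer[0] if buffer else '<empty>'}"}

def predict(weights, feats):
    scores = {a: sum(weights[a][f] for f in feats) for a in ACTIONS}
    return max(ACTIONS, key=lambda a: scores[a])

def train(data, epochs=10):
    weights = {a: defaultdict(float) for a in ACTIONS}
    for _ in range(epochs):
        for stack, buffer, gold in data:
            feats = features(stack, buffer)
            guess = predict(weights, feats)
            if guess != gold:          # standard perceptron update
                for f in feats:
                    weights[gold][f] += 1.0
                    weights[guess][f] -= 1.0
    return weights

# Invented oracle transitions for "Papa ate the caviar"
data = [(["$", "Papa", "ate"], ["the", "caviar"], "LEFTARC"),
        (["$", "ate"], ["the", "caviar"], "SHIFT"),
        (["$", "ate", "the"], ["caviar"], "SHIFT"),
        (["$", "ate", "the", "caviar"], [], "LEFTARC"),
        (["$", "ate", "caviar"], [], "RIGHTARC"),
        (["$", "ate"], [], "RIGHTARC")]
w = train(data)
print(predict(w, features(["$", "Papa", "ate"], ["the", "caviar"])))
```

A maxent or neural classifier would replace `predict`/`train` but keep the same configuration-to-action interface.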

SLIDE 20

Becoming Less Greedy

Beam search: a breadth-first search strategy (CMSC 471/671). At each stage, keep K options open.

SLIDE 21

Evaluation

  • Exact match (per-sentence accuracy)
  • Unlabeled attachment score (UAS)
  • Labeled attachment score (LS, LAS)
  • Recall/precision/F1 for particular relation types
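Attachment scores are simple ratios. A minimal sketch of UAS (the example head lists are invented, using the running “Papa ate the caviar” parse):

```python
# Unlabeled attachment score: fraction of words whose predicted head
# matches the gold head.
def uas(gold_heads, pred_heads):
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# Gold heads of [Papa, ate, the, caviar]: ate, $, caviar, ate
gold = ["ate", "$", "caviar", "ate"]
pred = ["ate", "$", "ate", "ate"]   # "the" attached to the wrong head
print(uas(gold, pred))  # → 0.75
```

LAS is the same ratio but also requires the dependency label to match.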

SLIDE 22

From Dependencies to Shallow Semantics

SLIDE 23

From Syntax to Shallow Semantics

Angeli et al. (2015)

“Open Information Extraction”

SLIDE 24

From Syntax to Shallow Semantics

http://corenlp.run/ (constituency & dependency) https://github.com/hltcoe/predpatt http://openie.allenai.org/ http://www.cs.rochester.edu/research/knext/browse/ (constituency trees) http://rtw.ml.cmu.edu/rtw/

Angeli et al. (2015)

“Open Information Extraction”: a sampling of efforts

SLIDE 25

Semantic Role Labeling

Who did what to whom at where?

The police officer | detained | the suspect | at the scene of the crime
ARG0 | V | ARG2 | AM-loc
Agent | Predicate | Theme | Location

Following slides adapted from SLP3

SLIDE 26

Predicate Alternations

XYZ corporation bought the stock.

SLIDE 27

Predicate Alternations

XYZ corporation bought the stock. They sold the stock to XYZ corporation.

SLIDE 28

Predicate Alternations

XYZ corporation bought the stock. They sold the stock to XYZ corporation. The stock was bought by XYZ corporation. The purchase of the stock by XYZ corporation... The stock purchase by XYZ corporation...

SLIDE 29

A Shallow Semantic Representation: Semantic Roles

Predicates (bought, sold, purchase) represent a situation. Semantic roles express the abstract role that arguments of a predicate can take in the event.

Role granularity, from more specific to more general: buyer → proto-agent → agent (of the event)

SLIDE 30

Thematic roles

Sasha broke the window. Pat opened the door. Subjects of break and open: Breaker and Opener, specific to each event.

SLIDE 31

Thematic roles

Sasha broke the window. Pat opened the door. Subjects of break and open: Breaker and Opener, specific to each event. But Breaker and Opener have something in common!

Volitional actors Often animate Direct causal responsibility for their events

Thematic roles are a way to capture this semantic commonality between Breakers and Eaters.

SLIDE 32

Thematic roles

Sasha broke the window. Pat opened the door. Subjects of break and open: Breaker and Opener, specific to each event. But Breaker and Opener have something in common!

Volitional actors Often animate Direct causal responsibility for their events

Thematic roles are a way to capture this semantic commonality between Breakers and Eaters. They are both AGENTS. The BrokenThing and OpenedThing are THEMES: prototypically inanimate objects affected in some way by the action.

SLIDE 33

Thematic roles

Sasha broke the window. Pat opened the door. Subjects of break and open: Breaker and Opener, specific to each event. But Breaker and Opener have something in common!

Volitional actors Often animate Direct causal responsibility for their events

Thematic roles are a way to capture this semantic commonality between Breakers and Eaters. They are both AGENTS. The BrokenThing and OpenedThing are THEMES: prototypically inanimate objects affected in some way by the action.

Modern formulation from Fillmore (1966, 1968) and Gruber (1965).

Fillmore was influenced by Lucien Tesnière’s (1959) Éléments de Syntaxe Structurale, the book that introduced dependency grammar.

SLIDE 34

Typical Thematic Roles

SLIDE 35

Verb Alternations (Diathesis Alternations)

Break: AGENT, INSTRUMENT, or THEME as subject Give: THEME and GOAL in either order

SLIDE 36

Verb Alternations (Diathesis Alternations)

Levin (1993): 47 semantic classes (“Levin classes”) for 3100 English verbs and their alternations, available in the online resource VerbNet.

Break: AGENT, INSTRUMENT, or THEME as subject
Give: THEME and GOAL in either order

SLIDE 37

Issues with Thematic Roles

Hard to create (define) a standard set of roles Role fragmentation

SLIDE 38

Issues with Thematic Roles

Hard to create (define) a standard set of roles Role fragmentation

Levin and Rappaport Hovav (2015): two kinds of INSTRUMENTS.

Intermediary instruments can appear as subjects:
  The cook opened the jar with the new gadget.
  The new gadget opened the jar.

Enabling instruments cannot:
  Shelly ate the sliced banana with a fork.
  *The fork ate the sliced banana.

SLIDE 39

Alternatives to Thematic Roles

  • 1. Fewer roles: generalized semantic roles, defined as prototypes (Dowty 1991): PROTO-AGENT, PROTO-PATIENT
  • 2. More roles: define roles specific to a group of predicates: FrameNet, PropBank

SLIDE 40

PropBank Frame Files

Palmer, Martha, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, 31(1):71–106

SLIDE 41

View Commonalities Across Sentences

SLIDE 42

Human Annotated PropBank Data

Penn English TreeBank, OntoNotes 5.0.

Total ~2 million words

Penn Chinese TreeBank Hindi/Urdu PropBank Arabic PropBank

2013 verb frames coverage: count of word senses (lexical units)

Language | Final Count
English | 10,615*
Chinese | 24,642
Arabic | 7,015

From Martha Palmer 2013 Tutorial

SLIDE 43

FrameNet

Baker et al. 1998, Fillmore et al. 2003, Fillmore and Baker 2009, Ruppenhofer et al. 2006

Roles in PropBank are specific to a verb; roles in FrameNet are specific to a frame: a background knowledge structure that defines a set of frame-specific semantic roles, called frame elements. Frames can be related (inheritance, alternations, etc.)

SLIDE 44

The “Change position on a scale” Frame

This frame consists of words that indicate the change of an ITEM’s position on a scale (the ATTRIBUTE) from a starting point (INITIAL VALUE) to an end point (FINAL VALUE)

SLIDE 45

Lexical Triggers

The “Change position on a scale” Frame

SLIDE 46

Frame Roles (Elements)

The “Change position on a scale” Frame

SLIDE 47

FrameNet and PropBank representations

SLIDE 48

FrameNet and PropBank representations

SLIDE 49

Automatic Semantic Parses

 | English Gigaword, v5 | Annotated NYT | English Wikipedia | Total
Documents | 8.74M | 1.81M | 5.06M | 15.61M
Sentences | 170M | 70M | 154M | 422M
Tokens | 4.3B | 1.4B | 2.3B | 8B
Vocabulary (≥ 100) | 225K | 120K | 264K | 91K
Semantic Frames | 2.6B | 780M | 1.1B | 4.4B

Ferraro et al. (2014)

https://goo.gl/BrsG4x (or Globus: talk to me)

2x FrameNet 1x PropBank

SLIDE 50

Semantic Role Labeling (SRL)

Find the semantic roles of each argument of each predicate in a sentence.

SLIDE 51

Why Semantic Role Labeling

A useful shallow semantic representation. Improves NLP tasks:

  • question answering (Shen and Lapata 2007, Surdeanu et al. 2011)
  • machine translation (Liu and Gildea 2010, Lo et al. 2013)

SLIDE 52

A Simple Parse-Based Algorithm

Input: sentence. Output: labeled parse tree.

parse = GETPARSE(sentence)
for each predicate in parse {
  for each node in parse {
    fv = EXTRACTFEATURES(node, predicate, parse)
    CLASSIFYNODE(node, fv, parse)
  }
}
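The parse-based loop can be sketched in Python. The stand-in components below (a toy “parse,” a trivial position feature, and a rule-based “classifier”) are invented for illustration; a real system would use a parser and a trained classifier.

```python
# Parse-based SRL loop (sketch): classify every node against every predicate.
def label_roles(sentence, get_parse, extract_features, classify_node):
    parse = get_parse(sentence)
    labels = {}
    for predicate in parse["predicates"]:
        for node in parse["nodes"]:
            fv = extract_features(node, predicate, parse)
            labels[(predicate, node)] = classify_node(node, fv, parse)
    return labels

# Toy stand-ins for "The police officer detained the suspect"
get_parse = lambda s: {"predicates": ["detained"],
                       "nodes": ["the police officer", "the suspect"]}
extract_features = lambda node, pred, parse: {
    "before_pred": parse["nodes"].index(node) == 0}
classify_node = lambda node, fv, parse: "ARG0" if fv["before_pred"] else "ARG1"

print(label_roles("The police officer detained the suspect",
                  get_parse, extract_features, classify_node))
```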

SLIDE 53

Simple Predicate Prediction

PropBank: choose all verbs FrameNet: choose every word that was labeled as a target in training data

SLIDE 54

SRL Features

Feature | Example
Headword of constituent | Examiner
Headword POS | NNP
Voice of the clause | Active
Subcategorization of pred | VP → VBD NP PP
Named entity type of constituent | ORGANIZATION
First and last words of constituent | The, Examiner
Linear position re: predicate | before
Path features |

SLIDE 55

Path Features

Path in the parse tree from the constituent to the predicate

SLIDE 56

Path Features

Path in the parse tree from the constituent to the predicate

SLIDE 57

Frequent Path Features

Palmer, Gildea, Xue (2010)

SLIDE 58

3-step SRL

  • 1. Pruning: use simple heuristics to prune unlikely constituents.
  • 2. Identification: a binary classification of each node as an argument to be labeled or NONE.
  • 3. Classification: a 1-of-N classification of all the constituents that were labeled as arguments by the previous stage.

SLIDE 59

3-step SRL

1. Pruning: use simple heuristics to prune unlikely constituents.
2. Identification: a binary classification of each node as an argument to be labeled or NONE.
3. Classification: a 1-of-N classification of all the constituents that were labeled as arguments by the previous stage.

Pruning & Identification

Prune the very unlikely constituents first, then use a classifier to get rid of the rest. Very few of the nodes in the tree could possibly be arguments of any one predicate, so there is an imbalance between:

positive samples (constituents that are arguments of the predicate)
negative samples (constituents that are not arguments of the predicate)

SLIDE 60

Features for Frame Identification

Das et al. (2014)

SLIDE 61

Joint-Inference SRL: Reranking

Stage 1: SRL system produces multiple possible labels for each constituent Stage 2: Find the best global label for all constituents

SLIDE 62

Joint-Inference SRL: Factor Graph

Make a large, probabilistic factor graph Run (loopy) belief propagation Take CMSC 678 (478) to learn more

SLIDE 63

Joint-Inference SRL: Neural/Deep SRL

Make a large (deep) neural network Run back propagation Take CMSC 678 (478) to learn more

SLIDE 64

Not Just English

SLIDE 65

Not Just Verbs: NomBank

Meyers et al. (2004); figure from Jiang and Ng (2006)

SLIDE 66

Additional Issues for Nouns

Features:
  • nominalization lexicon (employment → employ)
  • morphological stem

Different positions:
  • most arguments of nominal predicates occur inside the NP
  • others are introduced by support verbs, especially light verbs: “X made an argument”, “Y took a nap”

SLIDE 67

Logical Forms of Sentences

SLIDE 68

Logical Forms of Sentences

Papa ate the caviar

(figure: phrase-structure tree of the sentence, annotated with head words)

SLIDE 69

Logical Forms of Sentences

Papa ate the caviar

(figure: phrase-structure tree of the sentence, annotated with head words)

SLIDE 70

Logical Forms of Sentences

Papa ate the caviar

(figure: phrase-structure tree of the sentence, annotated with head words)

SLIDE 71

Selectional Restrictions

I want to eat someplace nearby.

SLIDE 72

Selectional Restrictions

I want to eat someplace nearby.


SLIDE 73

Selectional Restrictions

I want to eat someplace nearby.


SLIDE 74

Selectional Restrictions

I want to eat someplace nearby.

There are two candidate parses, (a) and (b). How do we know the speaker didn’t mean (b), the parse where “someplace nearby” is the thing eaten?

SLIDE 75

Selectional Restrictions

I want to eat someplace nearby.

How do we know the speaker didn’t mean (b)? The THEME of eating tends to be something edible.

SLIDE 76

Selectional Restrictions and Word Senses

The restaurant serves green-lipped mussels.

THEME is some kind of food

Which airlines serve Denver?

THEME is an appropriate location

SLIDE 77

Selectional Restrictions Vary in Specificity

I often ask the musicians to imagine a tennis game. To diagonalize a matrix is to find its eigenvalues. Radon is an odorless gas that can’t be detected by human senses.

SLIDE 78

One Way to Represent Selectional Restrictions

but do we have a large knowledge base of facts about edible things?! (do we know a hamburger is edible? sort of)

SLIDE 79

WordNet

Knowledge graph containing concept relations

(graph: hamburger, hero, and gyro each is-a sandwich)

SLIDE 80

WordNet

Knowledge graph containing concept relations

(graph: hamburger, hero, and gyro each is-a sandwich)

hypernym: specific to general (a hamburger is-a sandwich)

SLIDE 81

WordNet

Knowledge graph containing concept relations

(graph: hamburger, hero, and gyro each is-a sandwich)

hyponym: general to specific (a hamburger is-a sandwich)

SLIDE 82

WordNet

Knowledge graph containing concept relations

(graph: hamburger, hero, and gyro each is-a sandwich)

Other relationships too:
  • meronymy, holonymy (part of whole, whole of part)
  • troponymy (describing the manner of an event)
  • entailment (what else must happen in an event)

SLIDE 83

WordNet Knows About Hamburgers

hamburger → sandwich → snack food → dish → nutriment → food → substance → matter → physical entity → entity

SLIDE 84

WordNet Synsets for Selectional Restrictions

“The THEME of eat must be WordNet synset {food, nutrient}”

Similarly:
  THEME of imagine: synset {entity}
  THEME of lift: synset {physical entity}
  THEME of diagonalize: synset {matrix}

Allows: imagine a hamburger and lift a hamburger

Correctly rules out: *diagonalize a hamburger
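Checking a synset-based restriction amounts to walking up hypernym links. A minimal sketch, using a tiny hand-built chain that mirrors the hamburger path above rather than the real WordNet API:

```python
# Toy hypernym chain (mirroring the slide's WordNet path, not real WordNet).
HYPERNYM = {"hamburger": "sandwich", "sandwich": "snack food",
            "snack food": "dish", "dish": "nutriment",
            "nutriment": "food", "food": "substance",
            "substance": "matter", "matter": "physical entity",
            "physical entity": "entity"}

def is_a(word, concept):
    # climb hypernym links from word; succeed if we reach concept
    while word is not None:
        if word == concept:
            return True
        word = HYPERNYM.get(word)
    return False

print(is_a("hamburger", "food"))    # True: eat a hamburger is fine
print(is_a("hamburger", "matrix"))  # False: *diagonalize a hamburger
```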

SLIDE 85

Selectional Preferences

Initially: strict constraints (Katz and Fodor 1963)

Eat [+FOOD]

which turned into preferences (Wilks 1975)

“But it fell apart in 1931, perhaps because people realized you can’t eat gold for lunch if you’re hungry.”

SLIDE 86

Computing Selectional Association (Resnik 1993)

A probabilistic measure of the strength of association between a predicate and a semantic class of its argument:

  • Parse a corpus
  • Count all the times each predicate appears with each argument word
  • Assume each word is a partial observation of all the WordNet concepts associated with that word

Some high and low associations:

SLIDE 87

A Simpler Model of Selectional Association (Brockmann and Lapata, 2003)

Model just the association of predicate v with a single noun n

Parse a huge corpus Count how often a noun n occurs in relation r with verb v:

log count(n,v,r)

(or the probability)

SLIDE 88

A Simpler Model of Selectional Association (Brockmann and Lapata, 2003)

Model just the association of predicate v with a single noun n

Parse a huge corpus Count how often a noun n occurs in relation r with verb v:

log count(n,v,r)

(or the probability)

See: Bergsma, Lin, Goebel (2008) for evaluation/comparison
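The count-based score is easy to sketch. The parsed (verb, relation, noun) triples below are invented stand-ins for what a huge parsed corpus would produce:

```python
# Brockmann & Lapata-style association: log count(n, v, r) over parsed
# triples (sketch; the triple counts here are made up for illustration).
import math
from collections import Counter

triples = ([("eat", "dobj", "pasta")] * 50
           + [("eat", "dobj", "hamburger")] * 30
           + [("eat", "dobj", "gold")] * 1)
counts = Counter(triples)

def association(v, r, n):
    c = counts[(v, r, n)]
    return math.log(c) if c > 0 else float("-inf")

# "eat pasta" should score higher than "eat gold"
print(association("eat", "dobj", "pasta") > association("eat", "dobj", "gold"))
```

Dividing the counts by a total turns the same quantity into a (log) probability, as the slide notes.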

SLIDE 89

Revisiting the PropBank Theory

  • 1. Fewer roles: generalized semantic roles, defined as prototypes (Dowty 1991): PROTO-AGENT, PROTO-PATIENT
  • 2. More roles: define roles specific to a group of predicates: FrameNet, PropBank

SLIDE 90

Revisiting the PropBank Theory

  • 1. Fewer roles: generalized semantic roles, defined as prototypes (Dowty 1991): PROTO-AGENT, PROTO-PATIENT
  • 2. More roles: define roles specific to a group of predicates: FrameNet, PropBank

Exploring semantic expectations

SLIDE 91

Dowty (1991)’s Properties

Properties: instigated, volitional, awareness, sentient, moved, physically existed, existed before, existed during, existed after, changed possession, changed state, stationary

SLIDE 92

Dowty (1991)’s Properties

Property | Description
instigated | Arg caused the Pred to happen
volitional | Arg chose to be involved in the Pred
awareness | Arg was/were aware of being involved in the Pred
sentient | Arg was sentient
moved | Arg changes/changed location during the Pred
physically existed | Arg existed as a physical object
existed before | Arg existed before the Pred began
existed during | Arg existed during the Pred
existed after | Arg existed after the Pred stopped
changed possession | Arg changed possession during the Pred
changed state | Arg was/were altered or changed by the end of the Pred
stationary | Arg was stationary during the Pred

SLIDE 93

Dowty (1991)’s Properties

Property | Description | Proto-Agent | Proto-Patient
instigated | Arg caused the Pred to happen | ✔ |
volitional | Arg chose to be involved in the Pred | ✔ |
awareness | Arg was/were aware of being involved in the Pred | ✔ | ?
sentient | Arg was sentient | ✔ | ?
moved | Arg changes/changed location during the Pred | ✔ |
physically existed | Arg existed as a physical object | ✔ |
existed before | Arg existed before the Pred began | ? |
existed during | Arg existed during the Pred | ? |
existed after | Arg existed after the Pred stopped | ? |
changed possession | Arg changed possession during the Pred | | ?
changed state | Arg was/were altered or changed by the end of the Pred | | ✔
stationary | Arg was stationary during the Pred | | ✔
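Dowty's argument-selection idea, that the argument with more Proto-Agent entailments tends to surface as the subject, can be sketched as a toy scoring function. The property groupings and the example property sets below are illustrative assumptions, not annotated data:

```python
# Toy Dowty-style scoring (sketch): count Proto-Agent entailments minus
# Proto-Patient entailments; the higher-scoring argument is the more
# agent-like one. Property sets invented for "Sasha broke the window".
PROTO_AGENT = {"instigated", "volitional", "awareness", "sentient", "moved"}
PROTO_PATIENT = {"changed state", "changed possession", "stationary"}

def proto_agent_score(props):
    return len(props & PROTO_AGENT) - len(props & PROTO_PATIENT)

sasha = {"instigated", "volitional", "awareness", "sentient"}
window = {"changed state", "stationary"}
print(proto_agent_score(sasha) > proto_agent_score(window))  # True: Sasha is subject
```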

SLIDE 94

Annotating for Dowty (1991)’s Properties

Property | Q: How likely is it that…
instigated | Arg caused the Pred to happen?
volitional | Arg chose to be involved in the Pred?
awareness | Arg was/were aware of being involved in the Pred?
sentient | Arg was sentient?
moved | Arg changes/changed location during the Pred?
physically existed | Arg existed as a physical object?
existed before | Arg existed before the Pred began?
existed during | Arg existed during the Pred?
existed after | Arg existed after the Pred stopped?
changed possession | Arg changed possession during the Pred?
changed state | Arg was/were altered or changed by the end of the Pred?
stationary | Arg was stationary during the Pred?

Reisinger et al. (2015)

SLIDE 95

Annotating for Dowty (1991)’s Properties

Reisinger et al. (2015)

SLIDE 96

Semantic Proto-Roles

Reisinger et al. (2015)

SLIDE 97

Semantic Proto-Role Labeling

independent logistic regression classifiers with verb embeddings

Reisinger et al. (2015)

SLIDE 98

Question Answer Semantic Role Labeling

He et al. (2015)

SLIDE 99

Question Answer Semantic Role Labeling

He et al. (2015)

Mechanical Turk & align to PropBank

SLIDE 100

Semantic Expectations

Answers can be given by “ordinary” humans, and they correlate with linguistically complex theories (semantic role theories).