Advanced Natural Language Processing Lecture 25: Lexical Semantics (PowerPoint Presentation)



SLIDE 1

Advanced Natural Language Processing Lecture 25 Lexical Semantics 2: Question Answering and Word Sense Disambiguation

Johanna Moore (some slides by Philipp Koehn) 20 November 2012

SLIDE 2

Question Answering

  • We would like to build

– a machine that answers questions in natural language
– may have access to knowledge bases, dictionaries, thesauri
– may have access to vast quantities of English text

  • Basically, a smarter Google
  • This task is typically called Question Answering
  • What will we need to be able to do this?

SLIDE 3

Example Question

  • Question

When was Barack Obama born?

  • Text available to the machine

Barack Obama was born on August 4, 1961

  • This is easy.

– just phrase a Google query properly: "Barack Obama was born on *"
– syntactic rules that convert questions into statements are straightforward
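A minimal sketch of such a question-to-statement rewriting in Python; the single regex pattern and the wildcard query format are illustrative assumptions, not the lecture's actual rule set:

```python
import re

def question_to_query(question):
    """Rewrite a wh-question into a statement-shaped search query.

    One hand-written pattern for 'When was X born?'; a real system
    would need a rule per question type.
    """
    m = re.match(r'When was (.+) born\?', question)
    if m:
        return f'"{m.group(1)} was born on *"'
    return None

print(question_to_query("When was Barack Obama born?"))
# -> "Barack Obama was born on *"
```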

SLIDE 4

Example Question (2)

  • Question

What kind of plants grow in Scotland?

  • Text available to the machine

A new chemical plant was opened in Scotland. Heather is just one of the many plants that grow in Scotland.

  • What is hard?

– words may have different meanings
– we need to be able to disambiguate them

SLIDE 5

Example Question (3)

  • Question

Do the police use dogs to sniff for drugs?

  • Text available to the machine

The police use canines to sniff for drugs.

  • What is hard?

– words may have the “same” meaning (synonyms, hyponyms)
– we need to be able to match them
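WordNet supports this kind of matching. A small sketch using NLTK's WordNet interface; the synset names dog.n.01 and canine.n.02 (the animal senses) are assumptions about the installed WordNet version:

```python
# Requires: pip install nltk, then nltk.download('wordnet')
from nltk.corpus import wordnet as wn

dog = wn.synset('dog.n.01')
# Collect every ancestor on dog's is-a (hypernym) chain.
ancestors = set(dog.closure(lambda s: s.hypernyms()))

# 'canine' (animal sense) should be among them, so a query about
# dogs can match a text that says "canines".
print(wn.synset('canine.n.02') in ancestors)   # True
```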

SLIDE 6

Example Question (4)

  • Question

Which animals love to swim?

  • Text available to the machine

Ice bears love to swim in the freezing waters of the Arctic.

  • What is hard?

– some words belong to groups which are referred to by other words
– we need a database of such A is-a B relationships, such as the WordNet object hierarchy
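WordNet's hypernym hierarchy is exactly such a database. A sketch of the is-a check for this example (WordNet stores the "ice bear" concept under the polar-bear synset; the synset names polar_bear.n.01 and animal.n.01 are assumptions about the installed WordNet version):

```python
from nltk.corpus import wordnet as wn

bear = wn.synset('polar_bear.n.01')    # 'ice bear' is a polar bear
animal = wn.synset('animal.n.01')

# A is-a B holds if B appears in the transitive hypernym closure of A.
is_a = animal in set(bear.closure(lambda s: s.hypernyms()))
print(is_a)   # True: a polar bear is an animal, so this text can
              # answer "Which animals love to swim?"
```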

SLIDE 7

Example Question (5)

  • Question

What is the name of George Bush’s poodle?

  • Text available to the machine

President George Bush has a terrier called Barney.

  • What is hard?

– we need to know that poodle and terrier are related: they share a common ancestor in a taxonomy such as the WordNet object hierarchy
– words need to be grouped together into semantically related classes
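NLTK exposes this grouping directly through lowest common hypernyms; a short sketch (synset names as in standard WordNet):

```python
from nltk.corpus import wordnet as wn

poodle = wn.synset('poodle.n.01')
terrier = wn.synset('terrier.n.01')

# The most specific shared ancestor in the is-a taxonomy:
print(poodle.lowest_common_hypernyms(terrier))
# -> [Synset('dog.n.01')]: both are kinds of dog
```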

SLIDE 8

Example Question (6)

  • Question

Has Poland reduced its carbon emissions since 1989?

  • Text available to the machine

Due to the collapse of the industrial sector after the end of communism in 1989, all countries in Central Europe saw a fall in carbon emissions. Poland is a country in Central Europe.

  • What is hard?

– we need to do logical inference to relate the two sentences

SLIDE 9

Word Sense Disambiguation (WSD)

An important capability for automated question answering is word sense disambiguation, i.e., the ability to select the correct sense for each word in a given context:

What types of plants grow in Scotland?

There are many approaches to this problem:

  • Constraint satisfaction approaches
  • Dictionary approaches
  • Supervised ML
  • Unsupervised ML

SLIDE 10

Constraint Satisfaction

Three cases:

  • Disambiguate an argument by using the selectional restrictions from an unambiguous predicate.
  • Disambiguate a predicate by using the selectional restrictions from an unambiguous argument.
  • Mutual disambiguation of an argument and a predicate.

SLIDE 11

Constraint Satisfaction Examples

Disambiguating arguments using predicates:

“In our house, everybody has a career and none of them includes washing dishes,” he says.

In her tiny kitchen at home, Ms. Chen works efficiently, stir-frying several simple dishes, including braised pig’s ears and chicken livers with green peppers.

Disambiguate dishes using the selectional restrictions that the predicates washing and stir-fry place on their arguments.

SLIDE 12

Constraint Satisfaction Examples

Disambiguating predicates using arguments:

  1. Well, there was the time they served green-lipped mussels from New Zealand.
  2. Which airlines serve Denver?
  3. Which ones serve breakfast?

Sense of serve in 1 requires its patient to be edible.
Sense of serve in 2 requires its patient to be a geographical entity.
Sense of serve in 3 requires its patient to be a meal designator.

SLIDE 13

Constraint Satisfaction Examples

Mutual disambiguation:

I’m looking for a restaurant that serves vegetarian dishes.

Assuming 3 senses of serve and 2 of dishes gives 6 possible sense combinations, but only 1 satisfies all selectional restrictions.
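A toy sketch of this mutual disambiguation as constraint satisfaction; the sense inventories and semantic-type labels below are hypothetical stand-ins for real selectional restrictions:

```python
from itertools import product

# Hypothetical senses for "serve" with the semantic type each
# requires of its patient, and for "dishes" with the type each
# sense denotes.
serve_senses = {'serve-food': 'edible',
                'serve-flight': 'geographic',
                'serve-meal': 'meal'}
dish_senses = {'dish-food': 'edible',
               'dish-crockery': 'artifact'}

# Enumerate all 6 combinations; keep only those where the argument's
# type satisfies the predicate's selectional restriction.
valid = [(s, d) for s, d in product(serve_senses, dish_senses)
         if serve_senses[s] == dish_senses[d]]
print(valid)   # [('serve-food', 'dish-food')]: only 1 of 6 survives
```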

SLIDE 14

Problems with Constraint Satisfaction Approach

  • The need to parse to get the verb-argument information needed to make it work
  • Scaling up to large numbers of words (WordNet helps with this)
  • Getting details of all selectional restrictions correct
  • Wider context can sanction violation of selectional restrictions:

    But it fell apart in 1931, perhaps because people realized you can’t eat gold for lunch.

  • Dealing with metaphorical uses that violate the constraints:

    If you want to kill the Soviet Union, get it to try to eat Afghanistan.

SLIDE 15

WSD as a Classification Problem

Assume a corpus of texts with words labeled with their senses:

  • She pays 3% interest/INTEREST-MONEY on the loan.
  • He showed a lot of interest/INTEREST-CURIOSITY in the painting.

Similar to POS tagging

  • given a corpus tagged with senses
  • identify features that indicate one sense over another
  • learn a model that predicts the correct sense given the features

We can apply similar supervised learning methods

  • Naive Bayes
  • Decision lists
  • Decision trees, etc.

SLIDE 16

What are useful features for WSD?

“If one examines the words in a book, one at a time as through an opaque mask with a hole in it one word wide, then it is obviously impossible to determine, one at a time, the meaning of the words. . . But if one lengthens the slit in the opaque mask, until one can see not only the central word in question but also say N words on either side, then if N is large enough one can unambiguously decide the meaning of the central word. . . The practical question is: ‘What minimum value of N will, at least in a tolerable fraction of cases, lead to the correct choice of meaning for the central word?’”

Warren Weaver, “Translation” memorandum, 1949 (reprinted 1955)

SLIDE 17

Feature Extraction: Collocational Features

Collocational Features: information about words in specific positions to the left or right of the target word, e.g., for the ambiguous target plant:

  • plant life
  • plant closure
  • manufacturing plant
  • assembly plant

Features extracted for context words:

  • word itself
  • root form
  • POS

SLIDE 18

Example

An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.

Collocational feature vector extracted from a window of 2 words (+ POS tags) to the right and left of the target word:

[guitar, NN1, and, CJC, player, NN1, stand, VVB]
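A minimal sketch of this feature extraction, assuming the sentence has already been POS-tagged (the tags below are copied from the slide's BNC-style example):

```python
def collocational_features(tagged, i, window=2):
    """Word and POS features for the positions within +/-window of
    the target word at index i, skipping the target itself."""
    feats = []
    for offset in range(-window, window + 1):
        j = i + offset
        if offset != 0 and 0 <= j < len(tagged):
            feats.extend(tagged[j])   # (word, POS) pair
    return feats

# "... guitar and bass player stand ..." with bass as the target:
sent = [('guitar', 'NN1'), ('and', 'CJC'), ('bass', 'NN1'),
        ('player', 'NN1'), ('stand', 'VVB')]
print(collocational_features(sent, 2))
# -> ['guitar', 'NN1', 'and', 'CJC', 'player', 'NN1', 'stand', 'VVB']
```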

SLIDE 19

Feature Extraction: Bag of Words Features

Bag of Words Features: all content words in an N-word window

E.g., a vector of binary features indicating whether word w, from vocabulary V, occurs in the context window:

An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.

V = [fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band]

Window = 10

Bag of Words Feature Vector: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
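The same vector can be reproduced in a few lines of Python (a sketch; tokenisation is simplified to whitespace splitting and stop words are not removed):

```python
def bow_vector(tokens, target_index, vocab, window=10):
    """1 if the vocabulary word occurs within +/-window tokens of
    the target word, else 0."""
    lo = max(0, target_index - window)
    hi = target_index + window + 1
    nearby = set(tokens[lo:target_index] + tokens[target_index + 1:hi])
    return [1 if w in nearby else 0 for w in vocab]

V = ['fishing', 'big', 'sound', 'player', 'fly', 'rod', 'pound',
     'double', 'runs', 'playing', 'guitar', 'band']
sent = ('an electric guitar and bass player stand off to one side '
        'not really part of the scene just as a sort of nod to '
        'gringo expectations perhaps').split()
print(bow_vector(sent, sent.index('bass'), V))
# -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
```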

SLIDE 20

Other useful features

Of course, many other features may be included:

  • Syntactically related words
  • Syntactic role in sentence
  • Topic of the text

SLIDE 21

Supervised Learning Approaches to WSD

Learn a WSD model from a representative set of labeled instances from the same distribution as the test set

  • input is a training set consisting of feature-encoded inputs labeled with the appropriate sense
  • output is a classifier that assigns labels to new, unseen feature-encoded inputs

SLIDE 22

Naive Bayes Classifiers

Choose the most probable sense, ŝ, from the possible senses, S, for a given feature vector, V = (v_1, v_2, . . . , v_n):

    \hat{s} = \arg\max_{s \in S} P(s \mid V)

Rewriting with Bayes’ rule and assuming independent features yields:

    \hat{s} = \arg\max_{s \in S} P(s) \prod_{j=1}^{n} P(v_j \mid s)

I.e., we can estimate the probability of an entire vector given a sense by the product of the probabilities of its individual features given that sense.
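A compact sketch of this classifier with add-one smoothing; the toy training data and sense labels are invented for illustration:

```python
from collections import Counter, defaultdict
import math

def train_nb(examples):
    """Collect P(s) and P(v|s) counts from (features, sense) pairs."""
    sense_counts = Counter(sense for _, sense in examples)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in examples:
        feat_counts[sense].update(feats)
        vocab.update(feats)
    return sense_counts, feat_counts, vocab, len(examples)

def classify(feats, sense_counts, feat_counts, vocab, n):
    """argmax_s log P(s) + sum_j log P(v_j | s), add-one smoothed."""
    best_sense, best_lp = None, float('-inf')
    for sense, count in sense_counts.items():
        lp = math.log(count / n)                      # log P(s)
        total = sum(feat_counts[sense].values())
        for v in feats:                               # log P(v_j | s)
            lp += math.log((feat_counts[sense][v] + 1) /
                           (total + len(vocab)))
        if lp > best_lp:
            best_sense, best_lp = sense, lp
    return best_sense

train = [(['fishing', 'river'], 'bass-fish'),
         (['guitar', 'player'], 'bass-music'),
         (['salmon', 'river'], 'bass-fish')]
model = train_nb(train)
print(classify(['guitar', 'band'], *model))           # -> 'bass-music'
```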

SLIDE 23

Naive Bayes Classifiers

Where do the numbers come from? From a tagged corpus. For example, the probability of guitar occurring one position to the right of each sense of the word bass is computed from the corpus.

One problem with Naive Bayes is that it’s hard for humans to understand how it makes its decisions.

SLIDE 24

Decision List Classifiers

Decision List classifiers are simple to learn/train and are often effective. They are equivalent to case statements in programming languages, i.e., they consist of an ordered set of conditions with simple conclusions:

  • A sequence of tests is applied to each target-word feature vector
  • Each test is indicative of a particular sense
  • If a test succeeds, that sense is returned; otherwise the next test in the sequence is applied
  • At the end of the sequence, the majority class is returned

Easier for people to understand
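In code, the case-statement view is just an ordered scan over (test, sense) rules; a minimal sketch, using a few rules from the example decision list on the next slide:

```python
def decision_list_classify(context_words, rules, majority_sense):
    """Return the sense of the first rule whose test word occurs in
    the context; fall back to the majority sense."""
    for test_word, sense in rules:
        if test_word in context_words:
            return sense
    return majority_sense

# A few bass rules (bass1 = fish sense, bass2 = music sense):
rules = [('fish', 'bass1'), ('guitar', 'bass2'),
         ('piano', 'bass2'), ('river', 'bass1')]
print(decision_list_classify({'guitar', 'band'}, rules, 'bass1'))
# -> 'bass2'
```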

SLIDE 25

Example Decision List

Rule                       Sense
fish within window      ⇒  bass1
striped bass            ⇒  bass1
guitar within window    ⇒  bass2
bass player             ⇒  bass2
piano within window     ⇒  bass2
tenor within window     ⇒  bass2
sea bass                ⇒  bass1
play/V bass             ⇒  bass2
river within window     ⇒  bass1
violin within window    ⇒  bass2
salmon within window    ⇒  bass1
on bass                 ⇒  bass2
bass are                ⇒  bass1

SLIDE 26

Learning Decision Lists

Generate and order the sequence of tests based on characteristics of the training data Yarowsky’s (1994) approach on a binary sense distinction task: Every feature-value pair is considered a test Rank order the tests based on their individual accuracy on the training data, defined as:

  • log

✓P(Sense1|fi) P(Sense2|fi) ◆

  • 95% on binary decision tasks.
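A sketch of this ranking step, with add-one smoothing to avoid division by zero; Yarowsky ranks by the magnitude of the log ratio, and the toy data below is invented:

```python
import math
from collections import Counter

def rank_tests(examples):
    """Rank feature tests for a binary sense task by the magnitude of
    the smoothed log ratio log(P(sense1|f) / P(sense2|f))."""
    counts = {1: Counter(), 2: Counter()}
    for feats, sense in examples:
        counts[sense].update(feats)
    tests = []
    for f in set(counts[1]) | set(counts[2]):
        c1, c2 = counts[1][f] + 1, counts[2][f] + 1   # add-one smoothing
        score = abs(math.log(c1 / c2))
        tests.append((score, f, 1 if c1 > c2 else 2))
    return sorted(tests, reverse=True)

train = [({'fishing', 'river'}, 1), ({'river', 'salmon'}, 1),
         ({'guitar', 'player'}, 2)]
for score, f, sense in rank_tests(train):
    print(f'{f} => sense {sense}  (score {score:.2f})')
```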

SLIDE 27

Evaluation

WSD systems are usually developed and evaluated intrinsically, i.e., treated as if they were stand-alone systems. The evaluation metric is the percentage of target words that are tagged correctly, i.e., there is an exact match with the hand-labeled tags. Two major types:

  • Fine-grained tagging to a dictionary (e.g., WordNet) sense
  • Coarse-grained binary tagging, e.g., musical vs. fish sense of bass

And what about partial credit? E.g., confusing a particular musical sense of bass (e.g., instrument) with a fish sense is clearly worse than confusing it with another musical sense (e.g., singer)
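The exact-match metric itself is a one-liner; a sketch (note it gives no partial credit, which is precisely the concern above):

```python
def wsd_accuracy(predicted, gold):
    """Fraction of target words whose predicted sense exactly
    matches the hand-labeled sense."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

print(wsd_accuracy(['bass-music', 'bass-fish', 'bass-music'],
                   ['bass-music', 'bass-fish', 'bass-fish']))  # -> 0.66...
```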

SLIDE 28

Evaluation

How do you choose which you want to do? You choose based on your application. A text-to-speech system needs to know if it is a [baes] or a [beys]. It doesn’t need to know if it’s a singer or a bass fiddle, or a fresh- or saltwater fish.

SLIDE 29

Evaluation

Many aspects of WSD evaluation have been standardized by the Senseval and SemEval efforts, which provide shared tasks with training and testing materials, along with sense inventories, for WSD tasks in a variety of languages.

SLIDE 30

Evaluation

The bottom line is that classifiers all perform essentially the same on similar tasks in this domain: naive Bayes, decision lists, decision trees, neural nets. They differ in their end product (how inspectable is it?), how much data they need, and how long they take to run.

A big drawback is that they all need labelled data.

Next time: unsupervised methods for WSD.
