SLIDE 1

Taaltheorie en Taalverwerking

BSc Artificial Intelligence

Raquel Fernández
Institute for Logic, Language, and Computation

Winter 2012, lecture 3b

SLIDE 2

Plan for Today

Theoretical session:

  • PCFGs
  • Exam

Practical session:

  • Projects: teams & topics
  • Work with tutors on problematic homework.

SLIDE 3

Ambiguity

Ambiguity is pervasive in natural language. Some NLP tasks may do without disambiguation, but most natural language understanding tasks need to disambiguate to get at the intended interpretation.

SLIDE 4

Probabilistic Parsing

  • The abstract parsers we looked at can represent ambiguities (by returning more than one parse tree) but cannot resolve them.
  • Main idea behind probabilistic parsing: compute the probability of each possible tree given a sentence and choose the most probable one.
  • At first glance, computing the probability of a parse tree seems difficult: trees are complex structures, and the set of all possible trees generated by a grammar will most likely be infinite.
  • Probabilistic CFGs allow us to compute the probability of a parse tree from the probabilities of the grammar rules used to derive it.

SLIDE 5

Probabilistic CFGs

A PCFG is a CFG where each rule is augmented with a probability:

  • Σ: a finite alphabet of terminal symbols
  • N : a finite set of non-terminal symbols
  • S: a special symbol S ∈ N called the start symbol
  • R: a set of rules, each of the form A → β [p], where
      ∗ A is a non-terminal symbol
      ∗ β is any sequence of terminal or non-terminal symbols, including ε
      ∗ p is a number between 0 and 1 expressing the probability that A will be expanded to the sequence β, which we can write as P(A → β)
      ∗ for any non-terminal A, the sum of the probabilities for all rules A → β must be one: Σ_β P(A → β) = 1

P(A → β) is a conditional probability P(β | A): the probability of observing a β once we have observed an A.
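
A minimal sketch of this definition in Python (the rule set and all probabilities below are invented for illustration, not taken from the lecture): a PCFG can be stored as a mapping from each non-terminal to its expansions, with the sum-to-one constraint checked per non-terminal.

    # Toy PCFG as {lhs: {rhs_tuple: probability}}; all values are invented.
    rules = {
        "S":  {("NP", "VP"): 1.0},
        "NP": {("DT", "NN"): 0.7, ("NP", "PP"): 0.3},
        "VP": {("Vt", "NP"): 0.6, ("VP", "PP"): 0.4},
    }

    def check_pcfg(rules):
        """For every non-terminal A, the probabilities of all rules
        A -> beta must sum to one (the constraint stated above)."""
        for lhs, expansions in rules.items():
            total = sum(expansions.values())
            if abs(total - 1.0) > 1e-9:
                raise ValueError(f"rules for {lhs} sum to {total}, not 1")

    check_pcfg(rules)  # passes silently for this toy grammar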

SLIDE 6

An Example PCFG

Each probability is constrained to be non-negative, and for any non-terminal A, the probabilities of all rules with that non-terminal on the left-hand side must sum to 1.
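
The example grammar on the original slide did not survive extraction; a representative toy PCFG, written in NLTK's grammar notation (all probabilities invented), might look like this:

    import nltk

    # Illustrative toy PCFG; for each non-terminal, the probabilities
    # of its rules sum to 1, as required above.
    toy_pcfg = nltk.PCFG.fromstring("""
        S  -> NP VP   [1.0]
        NP -> DT NN   [0.7]
        NP -> NP PP   [0.3]
        VP -> Vt NP   [0.6]
        VP -> VP PP   [0.4]
        PP -> IN NP   [1.0]
        DT -> 'the'   [1.0]
        NN -> 'man' [0.4] | 'dog' [0.4] | 'telescope' [0.2]
        Vt -> 'saw'   [1.0]
        IN -> 'with'  [1.0]
    """)
    print(toy_pcfg)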

SLIDE 7

Probability of a Parse Tree

The probability of a parse tree for a given sentence P(t, S) is the product of the probabilities of all the grammar rules used in the derivation of the sentence.

[S [NP [DT the] [NN man]] [VP [Vt saw] [NP [DT the] [NN dog]]]]

P(t, S) = p(S → NP VP) × p(NP → DT NN) × p(VP → Vt NP) × p(NP → DT NN)
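
A small sketch of this computation, reusing the illustrative toy_pcfg defined above (the tree is the slide's example; the probabilities are the invented ones):

    import nltk

    tree = nltk.Tree.fromstring(
        "(S (NP (DT the) (NN man)) (VP (Vt saw) (NP (DT the) (NN dog))))")

    def tree_prob(tree, grammar):
        """P(t, S): the product of the probabilities of all rules
        used in the derivation of the tree."""
        p = 1.0
        for used in tree.productions():
            # find the matching rule in the grammar, use its probability
            rule = next(g for g in grammar.productions(lhs=used.lhs())
                        if g.rhs() == used.rhs())
            p *= rule.prob()
        return p

    print(tree_prob(tree, toy_pcfg))  # 0.04704 with the toy probabilities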

SLIDE 8

Probability of a Parse Tree

[S [NP [DT the] [NN man]] [VP [Vt saw] [NP [DT the] [NN dog]]]]

What’s the probability of this tree?

And the probability of

[S [NP the man] [VP saw [NP the dog [PP with the telescope]]]] and
[S [NP the man] [VP saw [NP the dog] [PP with the telescope]]] ?

SLIDE 9

Disambiguation with PCFGs

These probabilities can provide a criterion for disambiguation: they give us a ranking over possible parses for any sentence – we can simply choose the parse tree with the highest probability.
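
NLTK's ViterbiParser implements exactly this choice: it returns the highest-probability parse under a given PCFG. A sketch, again with the invented toy_pcfg from above:

    import nltk

    parser = nltk.ViterbiParser(toy_pcfg)
    sentence = "the man saw the dog with the telescope".split()
    for tree in parser.parse(sentence):  # yields the most probable tree
        print(tree.prob())
        tree.pretty_print()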

SLIDE 10

Learning PCFGs: Treebanks

How do we know the probability of each rule? We can compute them from a corpus of parsed sentences – a treebank.

  • the best-known treebank is the Penn Treebank, which includes parse trees of sentences from different corpora and in different languages: http://www.cis.upenn.edu/~treebank/
  • a treebank is typically built by automatic parsing plus manual correction of the parse trees.
  • standardization of the types of grammar rules allowed, the POS tags, and generally all non-terminal symbols is of course critical.
  • the parsed sentences in a treebank implicitly constitute a grammar of the language: we can extract the CFG rules used to derive them.
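
NLTK ships a small sample of the Penn Treebank, which is an easy way to look at such parse trees and the rules behind them (assuming the corpus sample has been downloaded):

    import nltk
    nltk.download("treebank")  # fetch the Penn Treebank sample once

    from nltk.corpus import treebank

    tree = treebank.parsed_sents()[0]  # first parsed sentence
    tree.pretty_print()
    print(tree.productions()[:5])      # some CFG rules used to derive it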

SLIDE 12

Treebanks

Treebanks are useful tools to study syntactic phenomena.

(S (NP (NNP John))
   (VP (VBZ loves)
       (NP (NNP Mary)))
   (. .))

( (CODE SpeakerA4 .))
( (S (INTJ Well) ,
     (EDITED (RM [) (NP-SBJ I) , (IP +))
     (NP-SBJ I) (RS ])
     (VP think
         (SBAR 0
               (S (NP-SBJ it)
                  (VP ’s
                      (NP-PRD a (ADJP pretty good) idea)))))
     . E_S))

The second bracketed tree is from the Penn Treebank release of the Switchboard corpus. For more resources, check out http://en.wikipedia.org/wiki/Treebank
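
As a sketch of such a study (the choice of phenomenon is an assumption, not from the slides), one can count how often each expansion of NP occurs in NLTK's Penn Treebank sample, e.g. to see how common PP attachment to NP is:

    from collections import Counter
    from nltk.corpus import treebank

    np_expansions = Counter()
    for tree in treebank.parsed_sents():
        for prod in tree.productions():
            if str(prod.lhs()) == "NP":  # keep only rules that expand NP
                np_expansions[str(prod)] += 1

    for rule, count in np_expansions.most_common(5):
        print(count, rule)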

SLIDE 13

Learning PCFGs from Treebanks

For each non-terminal A, we want to compute the probability of each rule A → β that expands A.

  • we count how often a rule A → β occurs in the treebank, and divide that by the total number of rules that expand A (the total number of occurrences of A in the treebank):

P(A → β) = Total(A → β) / Total(A)

For example, if the rule VP → Vt NP is seen 105 times in our corpus, and the non-terminal VP is seen 1000 times, then

P(VP → Vt NP) = 105 / 1000 = 0.105
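
This relative-frequency estimate is what nltk.induce_pcfg computes from a collection of parse trees; a sketch over the Penn Treebank sample used above:

    import nltk
    from nltk.corpus import treebank

    # collect every production used in the sample's parse trees ...
    productions = []
    for tree in treebank.parsed_sents():
        productions.extend(tree.productions())

    # ... and estimate P(A → β) = Total(A → β) / Total(A)
    grammar = nltk.induce_pcfg(nltk.Nonterminal("S"), productions)
    for prod in grammar.productions(lhs=nltk.Nonterminal("NP"))[:5]:
        print(prod)  # each rule is printed with its estimated probability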

SLIDE 14

Problem: Insensitivity to Lexical Information

We said before. . .

  • These probabilities can provide a criterion for disambiguation: they give us a ranking over possible parses for any sentence – we can simply choose the parse tree with the highest probability.

Does it always work?

  • the frequency of syntactic structures alone doesn’t seem enough to disambiguate. . .

He eats soup with a spoon / He eats soup with potatoes

  • choosing between the two possible parse trees for these sentences comes down to the choice between the rule NP → NP PP and the rule VP → VP PP.

  • the probability of these rules depends on the training corpus.

Let’s try the Stanford Probabilistic Parser: http://nlp.stanford.edu:8080/parser/
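
The problem can also be seen with a toy grammar (entirely invented, like the earlier ones): a plain PCFG never looks at the words, so its most probable tree uses the same PP attachment for both sentences, even though "with a spoon" should attach to the verb and "with potatoes" to the noun.

    import nltk

    # rule probabilities are blind to the words, so the preferred
    # attachment below is the same for both sentences
    g = nltk.PCFG.fromstring("""
        S  -> NP VP                                                 [1.0]
        VP -> V NP [0.6] | VP PP [0.4]
        NP -> Det N [0.3] | NP PP [0.2] | N [0.2] | 'he' [0.15] | 'soup' [0.15]
        PP -> P NP                                                  [1.0]
        V  -> 'eats'                                                [1.0]
        Det -> 'a'                                                  [1.0]
        N  -> 'spoon' [0.5] | 'potatoes' [0.5]
        P  -> 'with'                                                [1.0]
    """)

    parser = nltk.ViterbiParser(g)
    for s in ["he eats soup with a spoon", "he eats soup with potatoes"]:
        best = next(parser.parse(s.split()))
        print(s)
        best.pretty_print()  # same VP-attachment structure both times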

SLIDE 15

Problem: Insensitivity to Lexical Information

A possible way out is to allow for so-called lexicalized rules, where non-terminal symbols in a tree are annotated with their lexical head.
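
A simplified sketch of what such annotation does (the head-child table is an invented approximation of real head-finding rules): each non-terminal gets the head word of its subtree, so VP becomes VP(saw), and rule probabilities can then be conditioned on the words involved.

    import nltk

    # invented head-child table: which child supplies a phrase's head word
    HEAD = {"S": "VP", "VP": "Vt", "NP": "NN", "PP": "IN"}

    def lexicalize(tree):
        """Return (annotated tree, head word): each label A becomes
        A(head) - a simplified sketch of lexicalized rules."""
        if isinstance(tree, str):            # a word is its own head
            return tree, tree
        children, heads = [], {}
        for child in tree:
            new_child, head = lexicalize(child)
            children.append(new_child)
            label = child.label() if isinstance(child, nltk.Tree) else None
            heads[label] = head
        head = heads.get(HEAD.get(tree.label())) or next(iter(heads.values()))
        return nltk.Tree(f"{tree.label()}({head})", children), head

    t = nltk.Tree.fromstring(
        "(S (NP (DT the) (NN man)) (VP (Vt saw) (NP (DT the) (NN dog))))")
    lexicalize(t)[0].pretty_print()  # e.g. VP(saw) -> Vt(saw) NP(dog)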

SLIDE 16

Human Parsing

One factor in the preferences humans display for particular parses seems to be lexicalized syntactic probabilities.

SLIDE 17

Summary

  • Ambiguity is a big problem for parsing and NLU in general.
  • We can extend CFGs with probabilities – PCFGs.
  • The probability of a parse tree is computed by multiplying the probabilities of the rules used in the derivation.
  • PCFG probabilities can be learned from a treebank.
  • PCFGs suffer from several problems, e.g. lexical independence – one way to deal with this problem is to include lexicalized rules.
  • Experimental evidence suggests that humans use probabilistic information when interpreting sentences.

SLIDE 18

Exam

SLIDE 19

Project Topics

Some options that came up on the first day of the course:

  • irony recognition
  • information retrieval
  • Siri
  • sentiment analysis
  • speech synthesis
  • automatic speech recognition
  • spell checking
  • grammar correction
  • automatic summarization
  • chat bots
  • navigation interfaces
  • question-answering
  • human-robot communication
  • . . .
