SLIDE 1

Taaltheorie en Taalverwerking

BSc Artificial Intelligence

Raquel Fernández
Institute for Logic, Language, and Computation

Winter 2012, lecture 3b

SLIDE 2

Plan for Today

Theoretical session:

  • PCFGs
  • Exam

Practical session:

  • Projects: teams & topics
  • Work with tutors on problematic homework.

SLIDE 3

Ambiguity

Ambiguity is pervasive in natural language. Some NLP tasks may do without disambiguation, but most natural language understanding tasks need to disambiguate to get at the intended interpretation.

SLIDE 4

Probabilistic Parsing

  • The abstract parsers we looked at can represent ambiguities (by returning more than one parse tree) but cannot resolve them.
  • Main idea behind probabilistic parsing: compute the probability of each possible tree given a sentence and choose the most probable one.
  • At first glance, computing the probability of a parse tree seems difficult: trees are complex structures, and the set of all possible trees generated by a grammar will most likely be infinite.
  • Probabilistic CFGs allow us to compute the probability of a parse tree from the probabilities of the grammar rules used to derive it.

SLIDE 5

Probabilistic CFGs

A PCFG is a CFG where each rule is augmented with a probability:

  • Σ: a finite alphabet of terminal symbols
  • N : a finite set of non-terminal symbols
  • S: a special symbol S ∈ N called the start symbol
  • R: a set of rules, each of the form A → β [p], where
      ∗ A is a non-terminal symbol
      ∗ β is any sequence of terminal or non-terminal symbols, including ε
      ∗ p is a number between 0 and 1 expressing the probability that A will be expanded to the sequence β, which we can write as P(A → β)
      ∗ for any non-terminal A, the sum of the probabilities for all rules A → β must be one: Σ_β P(A → β) = 1

P(A → β) is a conditional probability P(β | A): the probability of observing a β once we have observed an A.
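
A minimal sketch of this definition in Python (the rule set and all probabilities below are invented for illustration, not taken from the lecture): a PCFG can be stored as a mapping from each non-terminal to its expansions, with the sum-to-one constraint checked per non-terminal.

    # Toy PCFG as {lhs: {rhs_tuple: probability}}; all values are invented.
    rules = {
        "S":  {("NP", "VP"): 1.0},
        "NP": {("DT", "NN"): 0.7, ("NP", "PP"): 0.3},
        "VP": {("Vt", "NP"): 0.6, ("VP", "PP"): 0.4},
    }

    def check_pcfg(rules):
        """For every non-terminal A, the probabilities of all rules
        A -> beta must sum to one (the constraint stated above)."""
        for lhs, expansions in rules.items():
            total = sum(expansions.values())
            if abs(total - 1.0) > 1e-9:
                raise ValueError(f"rules for {lhs} sum to {total}, not 1")

    check_pcfg(rules)  # passes silently for this toy grammar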

SLIDE 6

An Example PCFG

Each probability is constrained to be non-negative, and for any non-terminal A, the probabilities of all rules with that non-terminal on the left-hand side must sum to 1.
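
The example grammar on the original slide did not survive extraction; a representative toy PCFG, written in NLTK's grammar notation (all probabilities invented), might look like this:

    import nltk

    # Illustrative toy PCFG; for each non-terminal, the probabilities
    # of its rules sum to 1, as required above.
    toy_pcfg = nltk.PCFG.fromstring("""
        S  -> NP VP   [1.0]
        NP -> DT NN   [0.7]
        NP -> NP PP   [0.3]
        VP -> Vt NP   [0.6]
        VP -> VP PP   [0.4]
        PP -> IN NP   [1.0]
        DT -> 'the'   [1.0]
        NN -> 'man' [0.4] | 'dog' [0.4] | 'telescope' [0.2]
        Vt -> 'saw'   [1.0]
        IN -> 'with'  [1.0]
    """)
    print(toy_pcfg)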

SLIDE 7

Probability of a Parse Tree

The probability of a parse tree for a given sentence P(t, S) is the product of the probabilities of all the grammar rules used in the derivation of the sentence.

[S [NP [DT the] [NN man]] [VP [Vt saw] [NP [DT the] [NN dog]]]]

P(t, S) = p(S → NP VP) × p(NP → DT NN) × p(VP → Vt NP) × p(NP → DT NN)
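
A small sketch of this computation, reusing the illustrative toy_pcfg defined above (the tree is the slide's example; the probabilities are the invented ones):

    import nltk

    tree = nltk.Tree.fromstring(
        "(S (NP (DT the) (NN man)) (VP (Vt saw) (NP (DT the) (NN dog))))")

    def tree_prob(tree, grammar):
        """P(t, S): the product of the probabilities of all rules
        used in the derivation of the tree."""
        p = 1.0
        for used in tree.productions():
            # find the matching rule in the grammar, use its probability
            rule = next(g for g in grammar.productions(lhs=used.lhs())
                        if g.rhs() == used.rhs())
            p *= rule.prob()
        return p

    print(tree_prob(tree, toy_pcfg))  # 0.04704 with the toy probabilities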

SLIDE 8

Probability of a Parse Tree

[S [NP [DT the] [NN man]] [VP [Vt saw] [NP [DT the] [NN dog]]]]

What’s the probability of this tree?

And the probability of

[S [NP the man] [VP saw [NP the dog [PP with the telescope]]]] and
[S [NP the man] [VP saw [NP the dog] [PP with the telescope]]] ?

SLIDE 9

Disambiguation with PCFGs

These probabilities can provide a criterion for disambiguation: they give us a ranking over possible parses for any sentence – we can simply choose the parse tree with the highest probability.
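
NLTK's ViterbiParser implements exactly this choice: it returns the highest-probability parse under a given PCFG. A sketch, again with the invented toy_pcfg from above:

    import nltk

    parser = nltk.ViterbiParser(toy_pcfg)
    sentence = "the man saw the dog with the telescope".split()
    for tree in parser.parse(sentence):  # yields the most probable tree
        print(tree.prob())
        tree.pretty_print()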

SLIDE 10

Learning PCFGs: Treebanks

How do we know the probability of each rule? We can compute them from a corpus of parsed sentences – a treebank.

  • the best-known treebank is the Penn Treebank, which includes parse trees of sentences from different corpora and in different languages: http://www.cis.upenn.edu/~treebank/
  • a treebank is typically built by automatic parsing plus manual correction of the parse trees.
  • standardization of the types of grammar rules allowed, the POS tags, and generally all non-terminal symbols is of course critical.
  • the parsed sentences in a treebank implicitly constitute a grammar of the language: we can extract the CFG rules used to derive them.
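
NLTK ships a small sample of the Penn Treebank, which is an easy way to look at such parse trees and the rules behind them (assuming the corpus sample has been downloaded):

    import nltk
    nltk.download("treebank")  # fetch the Penn Treebank sample once

    from nltk.corpus import treebank

    tree = treebank.parsed_sents()[0]  # first parsed sentence
    tree.pretty_print()
    print(tree.productions()[:5])      # some CFG rules used to derive it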

SLIDE 12

Treebanks

Treebanks are useful tools to study syntactic phenomena.

(S (NP (NNP John))
   (VP (VBZ loves)
       (NP (NNP Mary)))
   (. .))

( (CODE SpeakerA4 .))
( (S (INTJ Well) ,
     (EDITED (RM [) (NP-SBJ I) , (IP +))
     (NP-SBJ I) (RS ])
     (VP think
         (SBAR 0
               (S (NP-SBJ it)
                  (VP ’s
                      (NP-PRD a (ADJP pretty good) idea)))))
     . E_S))

The second bracketed tree is from the Penn Treebank release of the Switchboard corpus. For more resources, check out http://en.wikipedia.org/wiki/Treebank
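
As a sketch of such a study (the choice of phenomenon is an assumption, not from the slides), one can count how often each expansion of NP occurs in NLTK's Penn Treebank sample, e.g. to see how common PP attachment to NP is:

    from collections import Counter
    from nltk.corpus import treebank

    np_expansions = Counter()
    for tree in treebank.parsed_sents():
        for prod in tree.productions():
            if str(prod.lhs()) == "NP":  # keep only rules that expand NP
                np_expansions[str(prod)] += 1

    for rule, count in np_expansions.most_common(5):
        print(count, rule)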

SLIDE 13

Learning PCFGs from Treebanks

For each non-terminal A, we want to compute the probability of each rule A → β that expands A.

  • we count how often a rule A → β occurs in the treebank, and divide that by the total number of rules that expand A (the total number of occurrences of A in the treebank):

P(A → β) = Total(A → β) / Total(A)

For example, if the rule VP → Vt NP is seen 105 times in our corpus, and the non-terminal VP is seen 1000 times, then

P(VP → Vt NP) = 105 / 1000 = 0.105
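
This relative-frequency estimate is what nltk.induce_pcfg computes from a collection of parse trees; a sketch over the Penn Treebank sample used above:

    import nltk
    from nltk.corpus import treebank

    # collect every production used in the sample's parse trees ...
    productions = []
    for tree in treebank.parsed_sents():
        productions.extend(tree.productions())

    # ... and estimate P(A → β) = Total(A → β) / Total(A)
    grammar = nltk.induce_pcfg(nltk.Nonterminal("S"), productions)
    for prod in grammar.productions(lhs=nltk.Nonterminal("NP"))[:5]:
        print(prod)  # each rule is printed with its estimated probability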

SLIDE 14

Problem: Insensitivity to Lexical Information

We said before. . .

  • These probabilities can provide a criterion for disambiguation: they give us a ranking over possible parses for any sentence – we can simply choose the parse tree with the highest probability.

Does it always work?

  • the frequency of syntactic structures alone doesn’t seem enough to disambiguate. . .

He eats soup with a spoon / He eats soup with potatoes

  • choosing between the two possible parse trees for these sentences comes down to the choice between the rule NP → NP PP and the rule VP → VP PP.

  • the probability of these rules depends on the training corpus.

Let’s try the Stanford Probabilistic Parser: http://nlp.stanford.edu:8080/parser/
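
The problem can also be seen with a toy grammar (entirely invented, like the earlier ones): a plain PCFG never looks at the words, so its most probable tree uses the same PP attachment for both sentences, even though "with a spoon" should attach to the verb and "with potatoes" to the noun.

    import nltk

    # rule probabilities are blind to the words, so the preferred
    # attachment below is the same for both sentences
    g = nltk.PCFG.fromstring("""
        S  -> NP VP                                                 [1.0]
        VP -> V NP [0.6] | VP PP [0.4]
        NP -> Det N [0.3] | NP PP [0.2] | N [0.2] | 'he' [0.15] | 'soup' [0.15]
        PP -> P NP                                                  [1.0]
        V  -> 'eats'                                                [1.0]
        Det -> 'a'                                                  [1.0]
        N  -> 'spoon' [0.5] | 'potatoes' [0.5]
        P  -> 'with'                                                [1.0]
    """)

    parser = nltk.ViterbiParser(g)
    for s in ["he eats soup with a spoon", "he eats soup with potatoes"]:
        best = next(parser.parse(s.split()))
        print(s)
        best.pretty_print()  # same VP-attachment structure both times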

SLIDE 15

Problem: Insensitivity to Lexical Information

A possible way out is to allow for so-called lexicalized rules, where non-terminal symbols in a tree are annotated with their lexical head.
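
A simplified sketch of what such annotation does (the head-child table is an invented approximation of real head-finding rules): each non-terminal gets the head word of its subtree, so VP becomes VP(saw), and rule probabilities can then be conditioned on the words involved.

    import nltk

    # invented head-child table: which child supplies a phrase's head word
    HEAD = {"S": "VP", "VP": "Vt", "NP": "NN", "PP": "IN"}

    def lexicalize(tree):
        """Return (annotated tree, head word): each label A becomes
        A(head) - a simplified sketch of lexicalized rules."""
        if isinstance(tree, str):            # a word is its own head
            return tree, tree
        children, heads = [], {}
        for child in tree:
            new_child, head = lexicalize(child)
            children.append(new_child)
            label = child.label() if isinstance(child, nltk.Tree) else None
            heads[label] = head
        head = heads.get(HEAD.get(tree.label())) or next(iter(heads.values()))
        return nltk.Tree(f"{tree.label()}({head})", children), head

    t = nltk.Tree.fromstring(
        "(S (NP (DT the) (NN man)) (VP (Vt saw) (NP (DT the) (NN dog))))")
    lexicalize(t)[0].pretty_print()  # e.g. VP(saw) -> Vt(saw) NP(dog)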

SLIDE 16

Human Parsing

One factor in the preferences humans display for particular parses seems to be lexicalized syntactic probabilities.

SLIDE 17

Summary

  • Ambiguity is a big problem for parsing and NLU in general.
  • We can extend CFGs with probabilities – PCFGs.
  • The probability of a parse tree is computed by multiplying the probabilities of the rules used in the derivation.
  • PCFG probabilities can be learned from a treebank.
  • PCFGs suffer from several problems, e.g. lexical independence – one way to deal with this problem is to include lexicalized rules.
  • Experimental evidence suggests that humans use probabilistic information when interpreting sentences.

SLIDE 18

Exam

SLIDE 19

Project Topics

Some options that came up on the first day of the course:

  • irony recognition
  • information retrieval
  • Siri
  • sentiment analysis
  • speech synthesis
  • automatic speech recognition
  • spell checking
  • grammar correction
  • automatic summarization
  • chat bots
  • navigation interfaces
  • question-answering
  • human-robot communication
  • . . .
