SLIDE 1

Taaltheorie en Taalverwerking

BSc Artificial Intelligence

Raquel Fernández Institute for Logic, Language, and Computation

Winter 2012, lecture 3a

Raquel Fernández TtTv 2012 - lecture 3a 1 / 19

SLIDE 2

Plan for Today

Theoretical session:

  • Parsing (part of ch. 13 of J&M)
  • Guest lecture on Machine Translation

Practical session:

  • Questions/problems regarding HW#2 (I’ll be there 17-18h).
  • Time to finish HW#3.
  • Review project groups.

SLIDE 3

Parsing

Syntactic parsing is the task of computing a parse tree for a sentence given a grammar.

  • When we use grammars as recognizers, the recognizer also parses the sentence (goes through a derivation), but we are not interested in the resulting structure.
  • When we use grammars as parsers, we are interested in the tree structure assigned to a particular sentence.

Parsing can be viewed as a search problem:

  • the parser searches through the space of possible parse trees allowed by a grammar to find the right tree for a given sentence.
  • note: recognition/parsing of regular languages can also be viewed as a search problem, but since any non-deterministic FSA is equivalent to a deterministic FSA, the search ‘problem’ is not a problem in theory.

SLIDE 4

Parsing as a Search Problem

A grammar defines a search space of possible trees – each state in this space corresponds to a tree. The space includes:

  • all the complete trees a grammar can generate (trees whose leaves correspond to words and cannot be further expanded), and
  • all the partial trees (where some node can still be expanded by a rule), which can be seen as intermediate steps towards the generation of complete trees.

⇒ the search space of natural language grammars can be huge!

SLIDE 5

Parsing as a Search Problem

How does a parser assign a parse tree to a sentence? Given a sentence and a grammar, the parser navigates the search space following two constraints:

  • the complete parse tree of a given sentence must have leaves that correspond to the words in it.
  • the root of the complete parse tree must be the start symbol S of the grammar.

These two constraints give rise to the two search strategies underlying most parsers: bottom-up and top-down.

SLIDE 6

Bottom-Up Parsing

  • The starting point of a bottom-up parser is the words of the input sentence.
  • The parser proceeds by building up structure from the bottom to the top of the tree.
  • It does so by looking at the grammar rules right-to-left.
  • At each stage, it considers as many (partial) trees as can be built by matching the right-hand side of a rule with the current input.
  • The parser succeeds if it is able to build a tree that covers all the input and whose root is the start symbol of the grammar.
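The steps above can be sketched as a backtracking shift-reduce recognizer. This is an illustrative toy implementation, not the slides’ own code; the grammar is the small one used in the search-space examples later on.

```python
# Toy CFG as {lhs: list of right-hand sides}; terminals are plain words.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V"], ["V", "A"]],
    "D":  [["the"]],
    "N":  [["dog"], ["cat"]],
    "V":  [["runs"]],
    "A":  [["fast"]],
}

def recognize_bottom_up(tokens, stack=None):
    """Shift words onto a stack and reduce with rules read right-to-left;
    succeed when the whole input has been reduced to a single S."""
    stack = stack or []
    if stack == ["S"] and not tokens:
        return True
    # reduce: if the top of the stack matches some rule's right-hand side,
    # replace that material by the rule's left-hand side
    for lhs, rhss in GRAMMAR.items():
        for rhs in rhss:
            if stack[-len(rhs):] == rhs:
                if recognize_bottom_up(tokens, stack[:-len(rhs)] + [lhs]):
                    return True
    # shift: move the next input word onto the stack and keep searching
    return bool(tokens) and recognize_bottom_up(tokens[1:], stack + [tokens[0]])

print(recognize_bottom_up("the dog runs fast".split()))  # True
print(recognize_bottom_up("dog the runs".split()))       # False
```

The backtracking (trying every reduction before every shift) is what makes this an exhaustive search rather than a deterministic parser.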

SLIDE 7

Bottom-Up Parsing

SLIDE 8

Top-Down Parsing

  • The starting point of a top-down parser is the start symbol of the grammar.
  • The parser starts by assuming that the input is indeed a well-formed sentence and tries to prove this by building up structure from the top of the tree down to the leaves.
  • It does so by looking at the grammar rules left-to-right.
  • At each stage, it considers as many (partial) trees as can be built by matching the left-hand side of a rule with the currently available non-terminal nodes.
  • The parser succeeds if the leaves of at least one of the trees it has constructed match the words of the input sentence.
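The top-down strategy can be sketched as a recursive-descent recognizer (illustrative toy code, using the same small grammar as the search-space examples; note that this naive version would loop forever on a left-recursive grammar):

```python
# Toy CFG as {lhs: list of right-hand sides}; terminals are plain words.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V"], ["V", "A"]],
    "D":  [["the"]],
    "N":  [["dog"], ["cat"]],
    "V":  [["runs"]],
    "A":  [["fast"]],
}

def recognize_top_down(symbols, tokens):
    """`symbols` is the frontier of the partial tree, leftmost symbol first;
    start the search with symbols=["S"]."""
    if not symbols:
        return not tokens              # success iff every input word was consumed
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:               # non-terminal: try each rule left-to-right
        return any(recognize_top_down(rhs + rest, tokens) for rhs in GRAMMAR[first])
    # terminal: it must match the next input word
    return bool(tokens) and tokens[0] == first and recognize_top_down(rest, tokens[1:])

print(recognize_top_down(["S"], "the cat runs fast".split()))  # True
print(recognize_top_down(["S"], "runs the cat".split()))       # False
```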

SLIDE 9

Top-Down Parsing

SLIDE 10

Bottom-Up vs. Top-Down

These two basic strategies have advantages and disadvantages:

  • Top-down parsers never explore illegal parse trees that cannot form an S – but waste time on trees that can never match the input words.
  • Bottom-up parsers never explore trees that are inconsistent with the input sentence – but waste time exploring illegal parse trees that will never lead to an S root.

Actual parsing algorithms may combine these two strategies.

SLIDE 11

A Grammar’s Search Space

How can we define the search space of a given grammar? For simplicity, let us focus on the top-down approach (the same considerations apply to the bottom-up approach). Let’s assume that the states in the search space are created by:

  • applying the grammar rules in the order in which they appear in the grammar, and
  • expanding the nodes at a given level in a tree from left to right.

We can define the search space of a given grammar following one of two strategies: depth-first or breadth-first.

  • depth-first: we work vertically – priority is given to nodes that are lower or deeper in the tree
  • breadth-first: we work horizontally – priority is given to nodes that are higher up in the tree

SLIDE 12

Search Space: Depth-first

  1. S → NP VP
  2. NP → D N
  3. VP → V
  4. VP → V A
  5. D → the
  6. N → dog
  7. N → cat
  8. V → runs
  9. A → fast

For simplicity, sequences of states where there is no branching are collapsed into one single state.

Depth-first exploration (one bracketed tree per state):

  [S [NP [D the] N] VP]
  [S [NP [D the] [N dog]] VP]
  [S [NP [D the] [N cat]] VP]
  [S [NP [D the] [N dog]] [VP [V runs]]]
  [S [NP [D the] [N dog]] [VP [V runs] [A fast]]]
  [S [NP [D the] [N cat]] [VP [V runs]]]
  [S [NP [D the] [N cat]] [VP [V runs] [A fast]]]

SLIDE 13

Search Space: Breadth-first

  1. S → NP VP
  2. NP → D N
  3. VP → V
  4. VP → V A
  5. D → the
  6. N → dog
  7. N → cat
  8. V → runs
  9. A → fast

For simplicity, sequences of states where there is no branching are collapsed into one single state.

Breadth-first exploration (one bracketed tree per state):

  [S [NP D N] VP]
  [S [NP D N] [VP V]]
  [S [NP D N] [VP V A]]
  [S [NP [D the] [N dog]] [VP [V runs]]]
  [S [NP [D the] [N cat]] [VP [V runs]]]
  [S [NP [D the] [N dog]] [VP [V runs] [A fast]]]
  [S [NP [D the] [N cat]] [VP [V runs] [A fast]]]
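In code, the difference between the two strategies comes down to the data structure holding the frontier of unexplored states: a stack gives depth-first search, a queue breadth-first. A sketch (illustrative, using the nine rules above; each state is a sentential form):

```python
from collections import deque

# Toy CFG as {lhs: list of right-hand sides}; terminals are plain words.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V"], ["V", "A"]],
    "D":  [["the"]],
    "N":  [["dog"], ["cat"]],
    "V":  [["runs"]],
    "A":  [["fast"]],
}

def generate(strategy):
    """Enumerate every complete sentence the grammar generates, exploring
    states depth-first (stack) or breadth-first (queue)."""
    frontier = deque([("S",)])
    while frontier:
        # the only difference between the strategies is which state we take next
        state = frontier.pop() if strategy == "depth-first" else frontier.popleft()
        i = next((k for k, sym in enumerate(state) if sym in GRAMMAR), None)
        if i is None:                  # no non-terminal left: a complete tree
            yield " ".join(state)
            continue
        for rhs in GRAMMAR[state[i]]:  # expand the leftmost non-terminal
            frontier.append(state[:i] + tuple(rhs) + state[i + 1:])

# Both strategies reach the same four sentences, only in a different order.
print(sorted(generate("depth-first")) == sorted(generate("breadth-first")))  # True
```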

SLIDE 14

Realistic Search

  • Since the search space of a realistic grammar can be huge, parsing algorithms do not actually build the full space of parse trees that a grammar allows and then search for the tree that corresponds to a given sentence.
  • Instead, they expand the search space incrementally by systematically exploring one state at a time.
  • When parsing a given sentence, parsers explore paths in a theoretical search space.

SLIDE 15

Exploring Paths

(two search-tree diagrams; the numbers give the order in which states are explored)

breadth-first: 1 2 3 4 5 6 7

depth-first: 1 2 5 3 4 6 7

SLIDE 16

Top-Down depth-first with bottom-up filtering

We can combine top-down and bottom-up parsing by adding the following constraint: the parser should not consider any grammar rule that leads to words which are not part of the input sentence.

Input: The cat runs fast

  [S [NP [D the] N] VP]
  [S [NP [D the] [N dog]] VP]   ← pruned by the bottom-up filter (‘dog’ is not in the input)
  [S [NP [D the] [N cat]] VP]
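One way to implement such a filter, sketched on top of a naive top-down recognizer (illustrative code; the call counter is only there to make the pruning effect visible):

```python
# Toy CFG as {lhs: list of right-hand sides}; terminals are plain words.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V"], ["V", "A"]],
    "D":  [["the"]],
    "N":  [["dog"], ["cat"]],
    "V":  [["runs"]],
    "A":  [["fast"]],
}

def recognize(symbols, tokens, use_filter, calls):
    """Top-down recognizer; `calls` is a one-element list counting visited states."""
    calls[0] += 1
    if not symbols:
        return not tokens
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        for rhs in GRAMMAR[first]:
            # bottom-up filter: skip lexical rules whose word does not
            # occur anywhere in the remaining input
            if (use_filter and len(rhs) == 1
                    and rhs[0] not in GRAMMAR and rhs[0] not in tokens):
                continue
            if recognize(rhs + rest, tokens, use_filter, calls):
                return True
        return False
    return bool(tokens) and tokens[0] == first and recognize(rest, tokens[1:], use_filter, calls)

plain, filtered = [0], [0]
recognize(["S"], "the cat runs fast".split(), False, plain)
recognize(["S"], "the cat runs fast".split(), True, filtered)
print(plain[0], filtered[0])   # the filtered search visits fewer states
```

The filter is sound: it only discards a lexical rule when its word is absent from the remaining input, in which case that branch could never succeed anyway.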

SLIDE 17

Structural Ambiguity

There are several types of structural or syntactic ambiguity:

  • attachment ambiguity: one constituent can appear in more than one location in the parse tree (we have already seen this kind of ambiguity).

    The tourist saw the astronomer with the telescope
    I shot an elephant in my pajamas
    We saw the Eiffel Tower flying to Paris

  • coordination ambiguity: different sets of phrases can be conjoined together (a variant of attachment ambiguity)

    old men and women → old [men & women] / [old men] & women
    nationwide television and radio → nationwide [t & r] / [nationwide t] & r
    the light blue chair → the [light [blue chair]] / the [[light blue] chair]

  • local ambiguity: a part of a sentence is ambiguous (has more than one parse tree) even though the whole sentence may not be.

    book that flight → POS ambiguity of ‘book’ (V or N)
    the robber knew Vincent shot Marsellus → the grammar may be able to assign a sentential structure to the sub-string ‘the robber knew Vincent’.
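The attachment ambiguity can be made concrete by counting derivations. A sketch with a small illustrative grammar (assumed for this example, not taken from the slides), in which the PP can attach either to the VP or to the object NP:

```python
# Illustrative toy grammar with PP attachment at both VP and NP level.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"], ["D", "N", "PP"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
    "D":  [["the"]],
    "N":  [["tourist"], ["astronomer"], ["telescope"]],
    "V":  [["saw"]],
    "P":  [["with"]],
}

def count_parses(symbols, tokens):
    """Number of distinct derivations expanding `symbols` into exactly `tokens`."""
    if not symbols:
        return 0 if tokens else 1
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:   # non-terminal: sum over all its rules
        return sum(count_parses(rhs + rest, tokens) for rhs in GRAMMAR[first])
    # terminal: must match the next input word
    return count_parses(rest, tokens[1:]) if tokens and tokens[0] == first else 0

# PP attaches to the VP (seeing with a telescope) or the NP (astronomer who has one)
print(count_parses(["S"], "the tourist saw the astronomer with the telescope".split()))  # 2
```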

SLIDE 18

Structural Ambiguity

  • We have been referring to ambiguous sentences.
  • We say that a grammar is ambiguous if it can generate more than one parse tree for a given sentence.
  • Note that local ambiguity is possible with grammars that are not ambiguous. For instance, this grammar is not ambiguous even though it gives rise to local ambiguity:

    S → NP VP
    S → VP
    NP → Det N
    VP → V NP
    Det → the | that
    N → book | flight
    V → book

SLIDE 19

Syntactic Disambiguation

Ambiguity is perhaps the worst enemy of parsers.

  • Syntactic disambiguation is the task of choosing one parse tree among the possible parses of an ambiguous sentence.
  • This task is critical because structure guides how we assign meaning to a given sentence.
  • Parsing by itself does not offer tools for syntactic disambiguation – a parser can at most return all possible parse trees.
  • On Friday we’ll look into basic probabilistic techniques for syntactic disambiguation (PCFGs).
