Introduction Context-Free Grammars Regular Grammars The CYK Algorithm

Radboud University Nijmegen

Formal Languages, Grammars and Automata Lecture 5

Helle Hvid Hansen

helle@cs.ru.nl http://www.cs.ru.nl/~helle/

Foundations Group – Intelligent Systems Section Institute for Computing and Information Sciences Radboud University Nijmegen

6 June 2014

Helle Hvid Hansen 6 June 2014 FLGA 1 / 19

Midterm Survey Results

Lecture level       Count
0 (very slow)       -
1                   4
2 (good)            7
3                   5
4 (very fast)       3
Average: 2.4

Exercise level      Count
0 (very easy)       -
1                   3
2 (appropriate)     9
3                   6
4 (very hard)       -
Average: 2.2

Comments:

  • Sometimes too fast, sometimes too slow (3 students).
  • Solutions online (2 students).

Overview

  • Applications of finite automata: text search, natural language processing, lexical analysis (parsing), biology, video games (PacMan), internet protocols (TCP), ... (see notes on webpage).
  • Programming languages (like Java etc.) and natural languages are generally not regular.

Formal languages: grammars (generators), automata (acceptors).

Today

Topics:

  • Context-free grammars and context-free languages
  • Regular grammars.

Motivation/Application: Compilation of programming languages.

Compilation and Parsing

source code (ASCII)
    ↓ lexical analysis (DFA)
token string (context-free language)
    ↓ parsing
parse tree (data structure)
    ↓ code generation
executable code

Generating Strings with Production Rules

Example:

    S → O | E
    E → λ | aEa | bEb
    O → a | b | aOa | bOb

Derivations (always start with S):

    S → E → aEa → abEba → abba
    S → E → bEb → baEab → babEbab → babaEabab → babaabab
    S → O → bOb → bab
    S → O → bOb → baOab → babab

We can generate exactly the set of words w with w = wᴿ, where wᴿ is the reverse of w (w is a palindrome).
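The palindrome claim can be checked experimentally for short words. The following is a minimal sketch, not part of the lecture: the names RULES, words_up_to and palindromes_up_to are made up for illustration, uppercase letters stand for non-terminals, and the empty string plays the role of λ. The pruning by terminal count is enough to make the search finite here because every form of this grammar contains at most one non-terminal.

```python
from itertools import product

# Grammar from the slide; "" stands for the empty word λ.
RULES = {
    "S": ["O", "E"],
    "E": ["", "aEa", "bEb"],
    "O": ["a", "b", "aOa", "bOb"],
}

def words_up_to(max_len, start="S"):
    """Return all terminal words of length <= max_len derivable from start."""
    words, seen, todo = set(), {start}, [start]
    while todo:
        form = todo.pop()
        # Position of the first non-terminal (uppercase letter), if any.
        pos = next((i for i, c in enumerate(form) if c.isupper()), None)
        if pos is None:
            words.add(form)
            continue
        for rhs in RULES[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Discard forms that already contain too many terminals.
            if sum(c.islower() for c in new) <= max_len and new not in seen:
                seen.add(new)
                todo.append(new)
    return words

def palindromes_up_to(max_len):
    """All words over {a, b} of length <= max_len that equal their reverse."""
    return {
        w
        for n in range(max_len + 1)
        for w in ("".join(t) for t in product("ab", repeat=n))
        if w == w[::-1]
    }

print(words_up_to(5) == palindromes_up_to(5))  # True
```

This only tests words up to a fixed length; it illustrates the claim but does not prove it.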

Context-Free Grammar

  • Def. A context-free grammar (CFG) G = (V, Σ, S, P) consists of

    V   a set of non-terminal symbols
    Σ   a set of terminal symbols
    S   a start symbol, S ∈ V
    P   a set of production rules of the form X → w, where X ∈ V and w ∈ (V ∪ Σ)∗

Notation (Backus-Naur Form or BNF): Group together rules for the same non-terminal:

    E → λ | aEa | bEb

is shorthand for three rules: E → λ, E → aEa, E → bEb
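To make the BNF shorthand concrete, here is a small sketch that expands one grouped line into its individual rules. The function name expand_bnf and the representation of a rule as a (head, body) pair are assumptions made for this example, with the empty string standing for λ.

```python
def expand_bnf(line):
    """Split 'X → w1 | w2 | ...' into a list of (X, w) rules; 'λ' becomes ''."""
    head, body = line.split("→")
    head = head.strip()
    rules = []
    for alternative in body.split("|"):
        alternative = alternative.strip()
        rules.append((head, "" if alternative == "λ" else alternative))
    return rules

print(expand_bnf("E → λ | aEa | bEb"))
# [('E', ''), ('E', 'aEa'), ('E', 'bEb')]
```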

Derivations and Language

Let G = (V, Σ, S, P) be a context-free grammar, and let u, v, w ∈ (V ∪ Σ)∗ be arbitrary.

  • Given a string uXv ∈ (V ∪ Σ)∗, we can apply a rule X → w:

    uXv → uwv

  • A derivation of u from v is a sequence of rule applications:

    v → v′ → v′′ → · · · → u

    We say that u can be derived from v in G if there is a derivation of u from v in G, and write v ⇒ u.

  • Why “context-free”? A rule X → w can be applied in any context to replace X with w. (There are also context-sensitive grammars.)
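A single rule application uXv → uwv is just a string replacement at one position. The sketch below, with the illustrative names one_step and is_derivation, computes all strings reachable in one step and uses that to check a derivation in the palindrome grammar from the earlier example. Rules are assumed to be (head, body) pairs over single-character symbols, with "" for λ.

```python
def one_step(form, rules):
    """All strings obtained from form by applying one rule X → w at one position."""
    results = set()
    for i, symbol in enumerate(form):
        for head, body in rules:
            if symbol == head:
                # form = u X v  becomes  u w v
                results.add(form[:i] + body + form[i + 1:])
    return results

def is_derivation(steps, rules):
    """Check that each string in steps follows from the previous one in one step."""
    return all(b in one_step(a, rules) for a, b in zip(steps, steps[1:]))

# Palindrome grammar: S → O | E, E → λ | aEa | bEb, O → a | b | aOa | bOb.
RULES = [("S", "O"), ("S", "E"), ("E", ""), ("E", "aEa"), ("E", "bEb"),
         ("O", "a"), ("O", "b"), ("O", "aOa"), ("O", "bOb")]

print(is_derivation(["S", "E", "aEa", "abEba", "abba"], RULES))  # True
```

Note that one_step never looks at the neighbours u and v of the rewritten symbol, which is exactly the sense in which the grammar is context-free.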

Language and Parsing

  • The language generated by G is

    L(G) = {w ∈ Σ∗ | S ⇒ w}

  • A parser is an algorithm that determines, for a given G and w, whether w ∈ L(G).

    Note: in general there can be many derivations of a word w in G.

  • Def. A derivation is leftmost if in each step a rule is applied to the leftmost non-terminal. (Think of reading from left to right.) Rightmost derivations are defined analogously.

Lemma: For a CFG G and word w, w ∈ L(G) iff there is a leftmost derivation of w in G. (So we can restrict to searching for a leftmost derivation.)
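The lemma suggests a naive parser: search only among leftmost derivations. The following is a rough sketch of such a search, not an efficient parsing algorithm. The name leftmost_derivation and the max_len cut-off are assumptions of this example; because forms longer than max_len are dropped, the search is only complete when some leftmost derivation of the word stays within that bound.

```python
from collections import deque

def leftmost_derivation(rules, start, word, max_len=None):
    """Breadth-first search for a leftmost derivation of word; None if not found.

    rules maps each non-terminal (an uppercase letter) to its right-hand sides,
    with "" standing for λ.
    """
    if max_len is None:
        max_len = 2 * len(word) + 2
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        pos = next((i for i, c in enumerate(form) if c.isupper()), None)
        if pos is None:
            if form == word:
                return path
            continue
        # The terminal prefix before the leftmost non-terminal never changes again.
        if not word.startswith(form[:pos]):
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(path + [new])
    return None

# Palindrome grammar from the earlier example.
G = {"S": ["O", "E"], "E": ["", "aEa", "bEb"], "O": ["a", "b", "aOa", "bOb"]}
print(leftmost_derivation(G, "S", "abba"))
# ['S', 'E', 'aEa', 'abEba', 'abba']
```

The prefix test is what makes leftmost search attractive: everything to the left of the leftmost non-terminal is already final, so a mismatch with the input rules out the whole branch.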

Another Example

G : S → aSB | λ, B → b

Derivations of aabb:

    d1 : S → aSB → aaSBB → aaBB → aabB → aabb
    d2 : S → aSB → aSb → aaSBb → aaSbb → aabb
    d3 : S → aSB → aSb → aaSBb → aaBb → aabb

(Beware of typo in derivations in [Silva], Example 5.2.4)

Derivation d1 is leftmost, d2 is rightmost, d3 is neither.

Derivation Trees (on blackboard).
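The classification of d1, d2 and d3 can be checked mechanically by looking at which non-terminal is rewritten in each step. This is a small sketch with illustrative names (rewritten_positions, classify); a derivation is given as a list of sentential forms, and "" stands for λ.

```python
RULES = {"S": ["aSB", ""], "B": ["b"]}

def rewritten_positions(before, after):
    """Indices i such that rewriting the non-terminal at before[i] yields after."""
    return {
        i
        for i, symbol in enumerate(before)
        for rhs in RULES.get(symbol, [])
        if before[:i] + rhs + before[i + 1:] == after
    }

def classify(derivation):
    """Return 'leftmost', 'rightmost', 'both' or 'neither' for a list of forms."""
    leftmost = rightmost = True
    for before, after in zip(derivation, derivation[1:]):
        nonterminals = [i for i, c in enumerate(before) if c in RULES]
        positions = rewritten_positions(before, after)
        if not positions:
            raise ValueError(f"{before} → {after} is not a rule application")
        leftmost = leftmost and nonterminals[0] in positions
        rightmost = rightmost and nonterminals[-1] in positions
    if leftmost and rightmost:
        return "both"
    return "leftmost" if leftmost else "rightmost" if rightmost else "neither"

d1 = ["S", "aSB", "aaSBB", "aaBB", "aabB", "aabb"]
d2 = ["S", "aSB", "aSb", "aaSBb", "aaSbb", "aabb"]
d3 = ["S", "aSB", "aSb", "aaSBb", "aaBb", "aabb"]
print(classify(d1), classify(d2), classify(d3))
# leftmost rightmost neither
```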

Ambiguity

  • Def. A CFG G is unambiguous if for each w ∈ L(G) there exists a unique leftmost derivation of w in G. Otherwise, G is ambiguous.
  • A syntactically correct string can have two different meanings. Examples:
    – “Time flies like an arrow; fruit flies like a banana”
    – “The peasants are revolting” (“revolting” = disgusting / in rebellion)
  • A programming language should be given by an unambiguous grammar.
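Ambiguity can be made visible by counting leftmost derivations. The sketch below does this for grammars whose rules all have the form X → YZ or X → a, where each leftmost derivation corresponds to exactly one derivation tree. The grammar S → SS | a is a hypothetical example chosen for this illustration and does not come from the lecture; the names BINARY, UNARY and count_leftmost_derivations are likewise made up.

```python
from functools import lru_cache

# Hypothetical example grammar (not from the lecture): S → SS | a.
BINARY = {"S": [("S", "S")]}   # rules X → YZ
UNARY = {"S": ["a"]}           # rules X → a

def count_leftmost_derivations(word, start="S"):
    """Count derivation trees of word, i.e. its leftmost derivations from start."""

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of ways symbol derives the substring word[i:j].
        total = 0
        if j - i == 1 and word[i] in UNARY.get(symbol, []):
            total += 1
        for left, right in BINARY.get(symbol, []):
            for k in range(i + 1, j):
                total += count(left, i, k) * count(right, k, j)
        return total

    return count(start, 0, len(word)) if word else 0

print(count_leftmost_derivations("a"), count_leftmost_derivations("aaa"))
# 1 2
```

Since aaa has two leftmost derivations, this grammar is ambiguous, even though the language it generates (all non-empty strings of a's) also has an unambiguous grammar such as S → aS | a.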

Regular and Context-Free Languages

  • Def. A language L is context-free if there is a CFG G such that L = L(G).

Theorem: Every regular language is context-free.

Proof: Let M = (Q, q0, δ, F) be a DFA that accepts L ⊆ Σ∗. Define a context-free grammar GM as follows:

    V = Q   (non-terminals are states)
    Σ       (terminal symbols are letters of the alphabet)
    S = q0
    P = {q → aq′ | δ(q)(a) = q′} ∪ {q → λ | q ∈ F}

L(GM) = L(M) since computations correspond to derivations:

    M  : q0 −a1→ q1 −a2→ · · · −an→ qn ∈ F
    GM : q0 → a1q1 → a1a2q2 → · · · → a1 · · · anqn → a1 · · · anλ = a1 · · · an
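The construction in the proof is short enough to write out directly. The sketch below builds the rules of GM from a transition table and then follows the derivation that corresponds to the run of the DFA. The DFA used here (even number of a's over {a, b}) is a hypothetical example, and the names dfa_to_grammar and derive, as well as the encoding of rule bodies as tuples with () for λ, are assumptions of this illustration.

```python
def dfa_to_grammar(delta, final_states):
    """Build the rules q → a q' and q → λ from a DFA transition table.

    delta maps (state, letter) to the next state. Rules are returned as
    (head, body) pairs where body is a tuple of symbols; () stands for λ.
    """
    rules = [(q, (a, q2)) for (q, a), q2 in delta.items()]
    rules += [(q, ()) for q in final_states]
    return rules

def derive(word, start, rules):
    """Follow the derivation for word; return True if it can end with q → λ."""
    state = start
    for letter in word:
        bodies = [body for head, body in rules
                  if head == state and body[:1] == (letter,)]
        if not bodies:
            return False
        state = bodies[0][1]   # the non-terminal q' in q → a q'
    return (state, ()) in rules

# Hypothetical DFA (not from the lecture): even number of a's over {a, b}.
DELTA = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q0", ("q1", "b"): "q1"}
RULES = dfa_to_grammar(DELTA, final_states={"q0"})
print(derive("abab", "q0", RULES), derive("ab", "q0", RULES))
# True False
```

Because M is deterministic, each sentential form has at most one applicable rule per input letter, which is why derive can simply take the first match.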

Regular Grammars

  • Def. A regular grammar is a CFG G = (V, Σ, S, P) in which all rules have the form

    X → uY   or   X → u

    where X, Y ∈ V and u ∈ Σ∗ (u consists only of terminals).

Example:

    S → abaX | Y
    X → bX | aY | λ
    Y → X | aX | bb

Theorem: Every regular language is generated by a regular grammar (by the previous construction).

Note: In some texts, a regular grammar is defined as having rules only of the form X → aY or X → λ. This corresponds to the difference between DFA and NFA-λ.

Regular Grammars and Regular Languages

Theorem: If G is a regular grammar, then L(G) is regular.

Proof: Build an NFA-λ MG = (Q, q0, δ, F) as follows:

  • State set Q = V plus some extra states (see below), q0 = S,
  • F = {X ∈ V | X → λ is a rule} plus some extra states (see below)
  • Transitions:
    • For each rule of the form X → a1a2 · · · anY, add new states q1, q2, . . . , qn−1 to Q and transitions

        X −a1→ q1 −a2→ · · · −an−1→ qn−1 −an→ Y

    • For each rule of the form X → a1a2 · · · an with n ≥ 1, add new states q1, q2, . . . , qn to Q and transitions

        X −a1→ q1 −a2→ · · · −an−1→ qn−1 −an→ qn

      and add qn to F.
    • For each rule of the form X → Y add a λ-transition X −λ→ Y.

Regular Grammars and Regular Languages II

...Proof continued: Again, derivations correspond to computations, e.g. in

    G : S → abaX | Y
        X → bX | aY | λ
        Y → X | aX | bb

we have:

    G  : S → Y → aX → aaY → aabb
    MG : S −λ→ Y −a→ X −a→ Y −b→ q1 −b→ q2 ∈ F

Hence we have: L(MG) = L(G).
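The NFA-λ construction can be sketched in a few lines. In the code below a rule is encoded as a triple (X, u, Y), where u is a string of terminals and Y is a non-terminal or None; this encoding, the function names grammar_to_nfa and accepts, and the fresh state names q1, q2, ... are assumptions of this example (the fresh names must not clash with non-terminals of the grammar).

```python
from itertools import count

def grammar_to_nfa(rules):
    """Return (transitions, finals) of the NFA-λ built from a regular grammar.

    transitions is a set of (state, label, state) triples, where the
    label "" marks a λ-transition.
    """
    fresh = (f"q{i}" for i in count(1))          # names for the extra states
    transitions, finals = set(), set()
    for head, u, tail in rules:
        if u == "" and tail is None:             # X → λ: X is accepting
            finals.add(head)
            continue
        if u == "":                              # X → Y: λ-transition
            transitions.add((head, "", tail))
            continue
        state = head
        for letter in u[:-1]:                    # intermediate states
            nxt = next(fresh)
            transitions.add((state, letter, nxt))
            state = nxt
        target = tail if tail is not None else next(fresh)
        transitions.add((state, u[-1], target))
        if tail is None:                         # X → a1...an: last state accepts
            finals.add(target)
    return transitions, finals

def accepts(word, start, transitions, finals):
    """Simulate the NFA-λ on word using sets of states and λ-closures."""
    def closure(states):
        states = set(states)
        while True:
            more = {t for s, a, t in transitions if s in states and a == ""}
            if more <= states:
                return states
            states |= more
    current = closure({start})
    for letter in word:
        current = closure({t for s, a, t in transitions
                           if s in current and a == letter})
    return bool(current & finals)

# Example grammar from the slide.
RULES = [("S", "aba", "X"), ("S", "", "Y"),
         ("X", "b", "X"), ("X", "a", "Y"), ("X", "", None),
         ("Y", "", "X"), ("Y", "a", "X"), ("Y", "bb", None)]
T, F = grammar_to_nfa(RULES)
print(accepts("aabb", "S", T, F))  # True
```

The accepting run found for aabb corresponds to the derivation S → Y → aX → aaY → aabb shown above.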

Parsing Algorithm

Given a CFG G and a word w, how do we determine whether w ∈ L(G)? With the Cocke-Younger-Kasami (CYK) algorithm, which requires G to be in Chomsky normal form.

  • Def. A CFG G is in Chomsky normal form if all its productions are of the form

    X → YZ   or   X → a

    where X, Y, Z ∈ V and a ∈ Σ. (Note: λ ∉ L(G).)

Lemma: Every CFG G with λ ∉ L(G) can be transformed into an equivalent CFG in Chomsky normal form. (See e.g. Sudkamp.)
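Whether a grammar is in Chomsky normal form is a purely syntactic check on its rules. A minimal sketch, assuming rules are (head, body) pairs with single-character symbols and "" for λ; the function name is_chomsky_normal_form is made up for this example. It is applied to the palindrome grammar from earlier in the lecture and to the grammar used in the lecture's CYK example.

```python
def is_chomsky_normal_form(rules, nonterminals, terminals):
    """True if every rule (X, body) has body = YZ or body = a."""
    for head, body in rules:
        binary = len(body) == 2 and all(s in nonterminals for s in body)
        single = len(body) == 1 and body[0] in terminals
        if head not in nonterminals or not (binary or single):
            return False
    return True

# Palindrome grammar: S → O | E, E → λ | aEa | bEb, O → a | b | aOa | bOb.
PALINDROME = [("S", "O"), ("S", "E"), ("E", ""), ("E", "aEa"), ("E", "bEb"),
              ("O", "a"), ("O", "b"), ("O", "aOa"), ("O", "bOb")]

# CYK example grammar: S → AB | BA | SS | AC | BD, A → a, C → SB, B → b, D → SA.
CYK_EXAMPLE = [("S", "AB"), ("S", "BA"), ("S", "SS"), ("S", "AC"), ("S", "BD"),
               ("A", "a"), ("C", "SB"), ("B", "b"), ("D", "SA")]

print(is_chomsky_normal_form(PALINDROME, set("SEO"), set("ab")))     # False
print(is_chomsky_normal_form(CYK_EXAMPLE, set("SABCD"), set("ab")))  # True
```

The palindrome grammar fails on several counts: it has unit rules (S → O), a λ-rule (E → λ) and bodies that mix terminals with non-terminals (aEa).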

The CYK Algorithm

Idea: For each substring u of input word w, compute the set of non-terminals that can produce u.

Example:

    G : S → AB | BA | SS | AC | BD,
        A → a, C → SB,
        B → b, D → SA.

Run algorithm on w = aabbab. (continue on blackboard, see notes on webpage)
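Since the run on aabbab was done on the blackboard, here is a compact sketch of the table-filling idea for the example grammar. It is one common way to organise the computation and not necessarily the presentation used in the lecture notes; the names BINARY, TERMINAL, cyk and in_language are made up for this example. The entry table[(i, j)] holds the non-terminals that can produce the substring word[i:j].

```python
# Example grammar from the slide, split into rules X → YZ and rules X → a.
BINARY = [("S", ("A", "B")), ("S", ("B", "A")), ("S", ("S", "S")),
          ("S", ("A", "C")), ("S", ("B", "D")),
          ("C", ("S", "B")), ("D", ("S", "A"))]
TERMINAL = [("A", "a"), ("B", "b")]

def cyk(word):
    """Return a dict mapping (i, j) to the set of non-terminals deriving word[i:j]."""
    n = len(word)
    table = {}
    # Substrings of length 1: use the rules X → a.
    for i, letter in enumerate(word):
        table[(i, i + 1)] = {x for x, a in TERMINAL if a == letter}
    # Longer substrings: try every split point k and every rule X → YZ.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):
                for x, (y, z) in BINARY:
                    if y in table[(i, k)] and z in table[(k, j)]:
                        cell.add(x)
            table[(i, j)] = cell
    return table

def in_language(word, start="S"):
    """w ∈ L(G) iff the start symbol can produce the whole word."""
    return len(word) > 0 and start in cyk(word)[(0, len(word))]

print(in_language("aabbab"), in_language("aab"))
# True False
```

With three nested loops over positions, the running time is cubic in the length of the word (times the number of rules), which is the usual bound quoted for CYK.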

Context-Free Art: http://www.contextfreeart.org
