
slide-1
SLIDE 1

CS/ECE 374: Algorithms & Models of Computation, Fall 2018

Context Free Languages and Grammars

Lecture 7

September 18, 2018

Nikita Borisov (UIUC) CS/ECE 374 1 Fall 2018 1 / 37


slide-5
SLIDE 5

Regular Languages

Regular expressions allow us to describe a class of languages compactly and precisely.
Equivalence with DFAs shows the following: given any regular expression r, there is a very efficient algorithm for solving the language recognition problem for L(r): given w ∈ Σ∗, is w ∈ L(r)?
In fact, for a fixed r, the running time of the algorithm is linear in |w|.
Disadvantage of regular expressions/languages: they are too simple and cannot express interesting features, such as balanced parentheses, that we need in programming languages. No recursion is allowed, even in limited form.
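The linear-time claim can be illustrated with a minimal sketch of DFA simulation. The transition table below is a hypothetical example (a DFA for binary strings ending in 1) and is not taken from the lecture; the names dfa_accepts, DELTA, START and ACCEPTING are illustrative only.

```python
def dfa_accepts(delta, start, accepting, w):
    """Simulate a DFA on w: one table lookup per input symbol."""
    state = start
    for ch in w:
        state = delta[(state, ch)]
    return state in accepting


# Hypothetical DFA for L((0+1)*1): binary strings that end in 1.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q1",
}
START = "q0"
ACCEPTING = {"q1"}

print(dfa_accepts(DELTA, START, ACCEPTING, "0101"))
print(dfa_accepts(DELTA, START, ACCEPTING, "10"))
```

The loop does a constant amount of work per symbol, which is where the bound linear in |w| comes from once the DFA for r has been built.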


slide-6
SLIDE 6

Language classes: Chomsky Hierarchy

Generative models for languages based on grammars.

Regular ⊂ Context Free ⊂ Context Sensitive ⊂ Recursively Enumerable ⊂ All

slide-7
SLIDE 7

Chomsky Hierarchy and Machines

For each class one can define a corresponding class of machines.

Regular: DFA
Context Free: PDA
Context Sensitive: LBA
Recursively Enumerable: TM

slide-8
SLIDE 8

Programming Language Design

Question: What is a valid C program? Or a Python program?
Question: Given a string w, what is an algorithm to check whether w is a valid C program? This is the parsing problem.

slide-9
SLIDE 9

Context Free Languages and Grammars

Programming Language Specification
Parsing
Natural language understanding
Generative model giving structure
. . .
CFLs provide a good balance between expressivity and tractability. Limited form of recursion.

slide-10
SLIDE 10

Programming Languages


slide-11
SLIDE 11

Natural Language Processing


slide-12
SLIDE 12

Models of Growth

L-systems http://www.kevs3d.co.uk/dev/lsystems/


slide-13
SLIDE 13

Kolam drawing generated by grammar


slide-17
SLIDE 17

Context Free Grammar (CFG) Definition

Definition

A CFG is a quadruple G = (V, T, P, S) where
V is a finite set of non-terminal symbols
T is a finite set of terminal symbols (alphabet)
P is a finite set of productions, each of the form A → α where A ∈ V and α is a string in (V ∪ T)∗. Formally, P ⊂ V × (V ∪ T)∗.
S ∈ V is a start symbol
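As a minimal sketch, the quadruple can be written down directly as a data structure. The class name CFG, its field names, and the is_well_formed check are assumptions made for illustration; productions are stored as pairs (A, α) with α a tuple of symbols, mirroring P ⊂ V × (V ∪ T)∗.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset  # V
    terminals: frozenset     # T
    productions: frozenset   # P: pairs (A, alpha), alpha a tuple of symbols
    start: str               # S

    def is_well_formed(self):
        """Check S ∈ V and that every production lies in V × (V ∪ T)∗."""
        symbols = self.nonterminals | self.terminals
        return self.start in self.nonterminals and all(
            head in self.nonterminals and all(x in symbols for x in body)
            for head, body in self.productions
        )


# Example grammar with the single non-terminal S over terminals a, b.
palindromes = CFG(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    productions=frozenset({
        ("S", ()),                 # S → ε
        ("S", ("a",)),
        ("S", ("b",)),
        ("S", ("a", "S", "a")),
        ("S", ("b", "S", "b")),
    }),
    start="S",
)
print(palindromes.is_well_formed())
```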


slide-20
SLIDE 20

Example

V = {S}
T = {a, b}
P = {S → ε | a | b | aSa | bSb} (abbrev. for S → ε, S → a, S → b, S → aSa, S → bSb)
S ⇒ aSa ⇒ abSba ⇒ abbSbba ⇒ abbbba
What strings can S generate like this?
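One way to explore the question is to enumerate short strings mechanically. The sketch below is an illustration, not part of the lecture: it expands the leftmost non-terminal breadth-first and discards any form that already has more than max_len terminals. That pruning is enough for this particular grammar; it is not a general-purpose enumerator. The names PRODUCTIONS and generate are hypothetical.

```python
from collections import deque

# S → ε | a | b | aSa | bSb, with ε written as the empty string.
PRODUCTIONS = {"S": ["", "a", "b", "aSa", "bSb"]}


def generate(productions, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    seen = {start}
    queue = deque([start])
    results = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, if any.
        idx = next((i for i, ch in enumerate(form) if ch in productions), None)
        if idx is None:
            results.add(form)
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(ch not in productions for ch in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


print(sorted(generate(PRODUCTIONS, "S", 3), key=lambda w: (len(w), w)))
```

Every string printed reads the same forwards and backwards.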

slide-21
SLIDE 21

Palindromes

Madam in Eden I’m Adam
Dog doo? Good God!
Dogma: I am God.
A man, a plan, a canal, Panama
Are we not drawn onward, we few, drawn onward to new era?
Doc, note: I dissent. A fast never prevents a fatness. I diet on cod.
http://www.palindromelist.net

slide-23
SLIDE 23

Example

L = {0^n 1^n | n ≥ 0}
S → ε | 0S1
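The two productions translate almost word for word into a recursive membership test. This is a sketch for illustration (the function name in_language is hypothetical): the base case corresponds to S → ε and the recursive case peels off one 0 and one 1, corresponding to S → 0S1.

```python
def in_language(w):
    """True if w is in {0^n 1^n | n >= 0}, following S → ε | 0S1."""
    if w == "":
        return True  # S → ε
    # S → 0S1: w must start with 0, end with 1, and the middle must be in L.
    return len(w) >= 2 and w[0] == "0" and w[-1] == "1" and in_language(w[1:-1])


print(in_language("000111"))
print(in_language("0101"))
```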

slide-24
SLIDE 24

Notation and Convention

Let G = (V, T, P, S). Then
a, b, c, d, . . . in T (terminals)
A, B, C, D, . . . in V (non-terminals)
u, v, w, x, y, . . . in T∗ for strings of terminals
α, β, γ, . . . in (V ∪ T)∗
X, Y, Z in V ∪ T

slide-25
SLIDE 25

“Derives” relation

Formalism for how strings are derived/generated

Definition

Let G = (V, T, P, S) be a CFG. For strings α1, α2 ∈ (V ∪ T)∗ we say α1 derives α2, denoted by α1 ⇒_G α2, if there exist strings β, γ, δ in (V ∪ T)∗ such that
α1 = βAδ
α2 = βγδ
A → γ is in P.
Examples: S ⇒ ε, S ⇒ 0S1, 0S1 ⇒ 00S11, 0S1 ⇒ 01.
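The definition can be read as a procedure: pick an occurrence of a non-terminal A in α1 and replace it by the right-hand side of some production A → γ. A minimal sketch, assuming single-character symbols and a dictionary from non-terminals to right-hand sides (both are representation choices made here, not from the slides):

```python
def one_step(productions, form):
    """Return every α2 such that form ⇒ α2 in exactly one step."""
    results = set()
    for i, symbol in enumerate(form):
        # form = β + A + δ with β = form[:i], A = symbol, δ = form[i+1:]
        for gamma in productions.get(symbol, []):
            results.add(form[:i] + gamma + form[i + 1:])
    return results


# S → ε | 0S1, with ε written as the empty string.
G = {"S": ["", "0S1"]}

print(one_step(G, "S"))
print(one_step(G, "0S1"))
```

The two calls reproduce the four example derivations listed in the definition.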


slide-28
SLIDE 28

“Derives” relation continued

Definition

For integer k ≥ 0, α1 ⇒^k α2 is defined inductively:
α1 ⇒^0 α2 if α1 = α2
α1 ⇒^k α2 if α1 ⇒ β1 and β1 ⇒^(k−1) α2 for some β1.
Alternative defn: α1 ⇒^k α2 if α1 ⇒^(k−1) β1 and β1 ⇒ α2 for some β1.
⇒∗ is the reflexive and transitive closure of ⇒.
α1 ⇒∗ α2 if α1 ⇒^k α2 for some k.
Examples: S ⇒∗ ε, 0S1 ⇒∗ 0000011111.
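The inductive definition can be sketched as repeated application of the one-step relation. The code below is an illustration under the same assumptions as before (single-character symbols, hypothetical function names); derives_in_k follows the first form of the definition by advancing a frontier of sentential forms k times.

```python
def one_step(productions, form):
    """All forms reachable from form by applying one production once."""
    return {
        form[:i] + gamma + form[i + 1:]
        for i, symbol in enumerate(form)
        for gamma in productions.get(symbol, [])
    }


def derives_in_k(productions, alpha1, alpha2, k):
    """True if alpha1 ⇒^k alpha2, i.e. in exactly k steps."""
    frontier = {alpha1}  # forms reachable in 0 steps
    for _ in range(k):
        frontier = {nxt for form in frontier for nxt in one_step(productions, form)}
    return alpha2 in frontier


G = {"S": ["", "0S1"]}  # S → ε | 0S1

print(derives_in_k(G, "S", "", 1))
print(derives_in_k(G, "0S1", "0000011111", 5))
```

The second example takes four applications of S → 0S1 followed by one of S → ε, so k = 5.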


slide-30
SLIDE 30

Context Free Languages

Definition

The language generated by CFG G = (V, T, P, S) is denoted by L(G) where L(G) = {w ∈ T∗ | S ⇒∗ w}.

Definition

A language L is context free (CFL) if it is generated by a context free grammar. That is, there is a CFG G such that L = L(G).


slide-35
SLIDE 35

Examples

L = {0^n 1^n | n ≥ 0}
L = {0^n 1^m | m > n}
L = {0^n 1^m | m < n}
L = {w ∈ {(, )}∗ | w is a properly nested string of parentheses}
L = {w ∈ {0, 1}∗ | w has twice as many 1s as 0s}
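A candidate grammar for one of these languages can be sanity-checked by brute force before attempting a proof. The grammar S → 0S1 | S1 | 1 for {0^n 1^m | m > n} below is a hypothetical candidate, not one given on the slides, and the check only covers strings up to a small length, so it is evidence rather than a proof.

```python
from collections import deque

# Hypothetical candidate grammar for {0^n 1^m | m > n}; not from the slides.
CANDIDATE = {"S": ["0S1", "S1", "1"]}


def generate(productions, start, max_len):
    """Terminal strings of length <= max_len derivable from start.

    Assumes there are no ε-productions, so sentential forms never shrink
    and any form longer than max_len can be discarded.
    """
    seen, queue, results = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        idx = next((i for i, ch in enumerate(form) if ch in productions), None)
        if idx is None:
            results.add(form)
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


def expected(max_len):
    """The strings 0^n 1^m with m > n and total length <= max_len."""
    return {
        "0" * n + "1" * m
        for n in range(max_len + 1)
        for m in range(n + 1, max_len + 1)
        if n + m <= max_len
    }


print(generate(CANDIDATE, "S", 6) == expected(6))
```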


slide-40
SLIDE 40

Closure Properties of CFLs

G1 = (V1, T, P1, S1) and G2 = (V2, T, P2, S2) Assumption: V1 ∩ V2 = ∅, that is, non-terminals are not shared

Theorem

CFLs are closed under union. L1, L2 CFLs implies L1 ∪ L2 is a CFL.

Theorem

CFLs are closed under concatenation. L1, L2 CFLs implies L1·L2 is a CFL.

Theorem

CFLs are closed under Kleene star. L CFL implies L∗ is a CFL.
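The slides state the three theorems without the constructions. A sketch of the standard ones, under the stated assumption that non-terminals are not shared: add a fresh start symbol S0 with S0 → S1 | S2 for union, S0 → S1 S2 for concatenation, and S0 → ε | S S0 for Kleene star. The representation (a pair of a production dictionary and a start symbol, with right-hand sides as tuples) and all function names are assumptions made for this example.

```python
def _merge(g1, g2, new_start):
    """Combine the productions of two grammars with disjoint non-terminals."""
    (p1, _), (p2, _) = g1, g2
    assert not set(p1) & set(p2), "non-terminals must not be shared"
    assert new_start not in p1 and new_start not in p2, "start symbol must be fresh"
    return {**p1, **p2}


def union(g1, g2, new_start="S0"):
    """Grammar for L1 ∪ L2: S0 → S1 | S2."""
    prods = _merge(g1, g2, new_start)
    prods[new_start] = [(g1[1],), (g2[1],)]
    return prods, new_start


def concat(g1, g2, new_start="S0"):
    """Grammar for L1·L2: S0 → S1 S2."""
    prods = _merge(g1, g2, new_start)
    prods[new_start] = [(g1[1], g2[1])]
    return prods, new_start


def star(g, new_start="S0"):
    """Grammar for L∗: S0 → ε | S S0."""
    prods, start = g
    assert new_start not in prods, "start symbol must be fresh"
    new_prods = dict(prods)
    new_prods[new_start] = [(), (start, new_start)]
    return new_prods, new_start


G1 = ({"A": [(), ("0", "A", "1")]}, "A")  # 0^n 1^n
G2 = ({"B": [("1",), ("1", "B")]}, "B")   # 1^m with m >= 1

print(union(G1, G2))
print(concat(G1, G2))
print(star(G1))
```

Each construction leaves the original productions untouched, which is why disjoint non-terminals matter: a shared non-terminal would let derivations cross from one grammar into the other.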


slide-41
SLIDE 41

Exercise

Prove that every regular language is context-free using previous closure properties.
Prove the set of regular expressions over an alphabet Σ forms a non-regular language which is context-free.

slide-42
SLIDE 42

Closure Properties of CFLs continued

Theorem

CFLs are not closed under complement or intersection.

Theorem

If L1 is a CFL and L2 is regular then L1 ∩ L2 is a CFL.


slide-43
SLIDE 43

Canonical non-CFL

Theorem

L = {a^n b^n c^n | n ≥ 0} is not context-free.
Proof based on pumping lemma for CFLs. Technical and outside the scope of this class.


slide-45
SLIDE 45

Parse Trees or Derivation Trees

A tree to represent the derivation S ⇒∗ w.
Rooted tree with root labeled S
Non-terminals at each internal node of tree
Terminals at leaves
Children of internal node indicate how non-terminal was expanded using a production rule
A picture is worth a thousand words

slide-46
SLIDE 46

Example

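S → aSb | bSa | SS | ab | ba | ε

A corresponding derivation of abbaab:
S ⇒ aSb ⇒ abSab ⇒ abSSab ⇒ abbaSab ⇒ abbaab

A derivation tree for abbaab (also called “parse tree”):
[Figure: parse tree with root S expanded to a S b; that S expanded to b S a; that S expanded to S S; the left S expanded to b a and the right S to ε]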
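The tree for abbaab can be written down as a small data structure. This is a sketch for illustration: the Node class and tree_yield function are hypothetical names, and the tree is reconstructed from the derivation S ⇒ aSb ⇒ abSab ⇒ abSSab ⇒ abbaSab ⇒ abbaab. Reading the leaves left to right (the yield) recovers the derived string.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def tree_yield(node):
    """Concatenate the leaf labels left to right, treating ε as empty."""
    if not node.children:
        return "" if node.label == "ε" else node.label
    return "".join(tree_yield(child) for child in node.children)


# Each internal node's children spell out the production used there.
TREE = Node("S", [                               # S → aSb
    Node("a"),
    Node("S", [                                  # S → bSa
        Node("b"),
        Node("S", [                              # S → SS
            Node("S", [Node("b"), Node("a")]),   # S → ba
            Node("S", [Node("ε")]),              # S → ε
        ]),
        Node("a"),
    ]),
    Node("b"),
])

print(tree_yield(TREE))
```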


slide-47
SLIDE 47

Ambiguity in CFLs

Definition

A CFG G is ambiguous if there is a string w ∈ L(G) with two different parse trees. If there is no such string then G is unambiguous.
Example: S → S − S | 1 | 2 | 3
[Figure: two parse trees for 3 − 2 − 1, one grouping as 3 − (2 − 1) and the other as (3 − 2) − 1]

slide-48
SLIDE 48

Ambiguity in CFLs

Original grammar: S → S − S | 1 | 2 | 3
Unambiguous grammar:
S → S − C | 1 | 2 | 3
C → 1 | 2 | 3
[Figure: the only parse tree for 3 − 2 − 1, grouping as (3 − 2) − 1]

The grammar forces a parse corresponding to left-to-right evaluation.
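The difference between the two grammars can be made concrete by counting parse trees. The sketch below hard-codes the two grammars as recursive counting functions rather than implementing a general parser; the function names are hypothetical, and the ASCII hyphen stands in for the minus sign used in the productions.

```python
from functools import lru_cache

DIGITS = {"1", "2", "3"}


@lru_cache(maxsize=None)
def trees_original(tokens):
    """Number of parse trees for tokens under S → S - S | 1 | 2 | 3."""
    if len(tokens) == 1:
        return 1 if tokens[0] in DIGITS else 0
    total = 0
    for i, tok in enumerate(tokens):
        if tok == "-":
            # Treat this '-' as the top-level production S → S - S.
            total += trees_original(tokens[:i]) * trees_original(tokens[i + 1:])
    return total


@lru_cache(maxsize=None)
def trees_left_assoc(tokens):
    """Number of parse trees under S → S - C | 1 | 2 | 3, C → 1 | 2 | 3."""
    if len(tokens) == 1:
        return 1 if tokens[0] in DIGITS else 0
    # Only the last '-' can be the top-level one, because C is a single digit.
    if len(tokens) >= 3 and tokens[-2] == "-" and tokens[-1] in DIGITS:
        return trees_left_assoc(tokens[:-2])
    return 0


expr = tuple("3-2-1")
print(trees_original(expr), trees_left_assoc(expr))
```

For 3-2-1 the original grammar admits two trees and the rewritten grammar exactly one.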


slide-51
SLIDE 51

Inherently ambiguous languages

Definition

A CFL L is inherently ambiguous if there is no unambiguous CFG G such that L = L(G).
There exist inherently ambiguous CFLs. Example: L = {a^n b^m c^k | n = m or m = k}
Given a grammar G it is undecidable to check whether L(G) is inherently ambiguous. No algorithm!

slide-53
SLIDE 53

Inductive proofs for CFGs

Question: How do we formally prove that L(G) = L for a CFG G?
Example: S → ε | a | b | aSa | bSb

Theorem

L(G) = {palindromes} = {w | w = w^R}
Two directions:
L(G) ⊆ L, that is, if S ⇒∗ w then w = w^R
L ⊆ L(G), that is, if w = w^R then S ⇒∗ w


slide-55
SLIDE 55

L(G) ⊆ L

Show that if S ⇒∗ w then w = w^R.
By induction on the length of the derivation, meaning: for all k ≥ 1, S ⇒^k w implies w = w^R.
If S ⇒^1 w then w = ε or w = a or w = b. In each case w = w^R.
Assume that for all k < n, if S ⇒^k w then w = w^R.
Let S ⇒^n w (with n > 1). Wlog w begins with a.
Then S ⇒ aSa ⇒^(n−1) aua where w = aua. And S ⇒^(n−1) u, hence by the IH, u = u^R.
Therefore w^R = (aua)^R = (ua)^R a = a u^R a = aua = w.

slide-56
SLIDE 56

L ⊆ L(G)

Show that if w = w^R then S ⇒∗ w.
By induction on |w|. That is, for all k ≥ 0, |w| = k and w = w^R implies S ⇒∗ w.
Exercise: Fill in proof.
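As a hint for the exercise, the induction on |w| can be mirrored by a procedure that builds a derivation for a given palindrome: strip the matching first and last symbols (using S → aSa or S → bSb) until at most one symbol is left, then finish with S → ε, S → a or S → b. The function name and the choice to return the list of sentential forms are assumptions of this sketch, and it is not a substitute for the written proof.

```python
def derive_palindrome(w):
    """Return the sentential forms S ⇒ ... ⇒ w for a palindrome w over {a, b}."""
    if w != w[::-1] or set(w) - {"a", "b"}:
        raise ValueError("w must be a palindrome over {a, b}")
    forms = ["S"]
    left, right, rest = "", "", w
    while len(rest) >= 2:
        # rest = x u x with x in {a, b}: apply S → xSx and continue with u.
        left, right = left + rest[0], rest[-1] + right
        rest = rest[1:-1]
        forms.append(left + "S" + right)
    # Now |rest| <= 1: apply S → ε, S → a or S → b.
    forms.append(left + rest + right)
    return forms


print(derive_palindrome("abbba"))
```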

slide-57
SLIDE 57

Mutual Induction

Situation is more complicated with grammars that have multiple non-terminals. See Section 5.3.2 of the notes for an example proof.


slide-59
SLIDE 59

Normal Forms

Normal forms are a way to restrict form of production rules
Advantage: Simpler/more convenient algorithms and proofs
Two standard normal forms for CFGs
Chomsky normal form
Greibach normal form

slide-61
SLIDE 61

Normal Forms

Chomsky Normal Form: Productions are all of the form A → BC or A → a. If ε ∈ L then S → ε is also allowed.
Every CFG G can be converted into CNF via an efficient algorithm.
Advantage: parse tree of constant degree.
Greibach Normal Form: Only productions of the form A → aβ are allowed.
All CFLs without ε have a grammar in GNF. Efficient algorithm.
Advantage: Every derivation step adds exactly one terminal.
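The CNF condition as stated above is easy to check mechanically. A minimal sketch, assuming grammars are dictionaries from non-terminals to lists of right-hand-side tuples and that every non-terminal appears as a key; the function name is_cnf and the example grammars are illustrative. Some textbooks also forbid S on a right-hand side when S → ε is present; that extra condition is not stated on the slide and is not checked here.

```python
def is_cnf(productions, start):
    """Check that every production is A → BC, A → a, or start → ε."""
    nonterminals = set(productions)
    for head, bodies in productions.items():
        for body in bodies:
            if len(body) == 0:
                if head != start:          # only S → ε is allowed
                    return False
            elif len(body) == 1:
                if body[0] in nonterminals:  # A → a needs a terminal
                    return False
            elif len(body) == 2:
                if not all(x in nonterminals for x in body):  # A → BC
                    return False
            else:
                return False
    return True


PALINDROMES = {"S": [(), ("a",), ("b",), ("a", "S", "a"), ("b", "S", "b")]}

# A CNF grammar for {0^n 1^n | n >= 1}.
ZEROS_ONES_CNF = {
    "S": [("A", "B"), ("A", "C")],
    "C": [("S", "B")],
    "A": [("0",)],
    "B": [("1",)],
}

print(is_cnf(PALINDROMES, "S"), is_cnf(ZEROS_ONES_CNF, "S"))
```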


slide-63
SLIDE 63

Language recognition for CFLs

Algorithmic question: Given CFG G and string w ∈ Σ∗, is w ∈ L(G)?
Later in course: algorithm for the above problem that runs in O(|w|^3) time for any fixed grammar G. Via dynamic programming.
Hence the parsing problem for programming languages is solvable. However, a cubic time algorithm is too slow!
For this reason grammars for PLs are restricted even further to make the parsing algorithm faster (essentially linear time); see CS 421 and compiler courses.
In programming languages some amount of “context” may be necessary. But CSL recognition is PSPACE-complete, so no efficient algorithm is known! Hence people use ad hoc methods for the limited needs in PLs.
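The slides do not name the dynamic programming algorithm. A common choice with the stated O(|w|^3) bound for a fixed grammar is CYK, which requires the grammar to be in Chomsky normal form; the sketch below assumes that is the kind of algorithm meant. The function name cyk, the grammar representation (dictionary from non-terminals to lists of right-hand-side tuples) and the example grammar are assumptions for illustration.

```python
def cyk(productions, start, w):
    """Decide w ∈ L(G) for a grammar in Chomsky normal form."""
    n = len(w)
    if n == 0:
        return () in productions.get(start, [])
    # table[i][l] = set of non-terminals that derive the substring w[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        for head, bodies in productions.items():
            if (ch,) in bodies:            # A → a
                table[i][1].add(head)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for head, bodies in productions.items():
                    for body in bodies:    # A → BC
                        if (len(body) == 2
                                and body[0] in table[i][split]
                                and body[1] in table[i + split][length - split]):
                            table[i][length].add(head)
    return start in table[0][n]


# A CNF grammar for {0^n 1^n | n >= 1}.
G = {
    "S": [("A", "B"), ("A", "C")],
    "C": [("S", "B")],
    "A": [("0",)],
    "B": [("1",)],
}

print(cyk(G, "S", "0011"), cyk(G, "S", "001"))
```

The three nested loops over length, start position and split point give the cubic dependence on |w|; the inner loops over productions depend only on the grammar.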

slide-64
SLIDE 64

Things to know: Pushdown Automata

PDA: an NFA coupled with a stack
PDAs and CFGs are equivalent: PDAs accept, and CFGs generate, exactly the CFLs.
PDA is a machine-centric view of CFLs.
Helps prove that the intersection of a CFL and a regular language is a CFL.
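To give a feel for what the stack adds, here is a sketch of a stack machine for {0^n 1^n | n ≥ 0}. It is a deterministic special case written directly as code; general PDAs are nondeterministic and this is not a full PDA simulator. The function name and the two phase names are illustrative.

```python
def stack_machine_accepts(w):
    """Accept 0^n 1^n: push one marker per 0, pop one marker per 1."""
    stack = []
    phase = "push"  # finite control: still reading 0s, or already reading 1s
    for ch in w:
        if phase == "push" and ch == "0":
            stack.append("X")
        elif ch == "1" and stack:
            phase = "pop"
            stack.pop()
        else:
            return False  # a 0 after a 1, an unmatched 1, or a foreign symbol
    return not stack      # accept only if every 0 was matched


print(stack_machine_accepts("000111"), stack_machine_accepts("0010"))
```

The finite control alone has only two states; the unbounded count of pending 0s lives entirely on the stack, which is what a DFA lacks.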

slide-65
SLIDE 65

Chomsky Hierarchy

See the Wikipedia article for more on the Chomsky Hierarchy, including the grammar rules for Context Sensitive Languages etc.
https://en.wikipedia.org/wiki/Chomsky_hierarchy