
Lecture Slides for MAT-73006 Theoretical computer science PART Ib



  1. Lecture Slides for MAT-73006 Theoretical computer science PART Ib: Automata and Languages. Context-Free languages. Henri Hansen, January 26, 2015

  2. Context-free languages
     • There are several very simple languages that are not regular, such as $\{0^n 1^n \mid n \ge 0\}$.
     • They are "simple" to describe mathematically, but computationally the situation is different: recognizing them requires unbounded memory (for example, a count of the 0s read so far), which no finite automaton has.
     • An important class of languages is the context-free languages.
     • We shall explore a way of describing these languages called context-free grammars.

  3. • An important area of application for these grammars is programming languages.
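The computational side of the claim about $\{0^n 1^n \mid n \ge 0\}$ can be made concrete with a short example. The following is a minimal Python sketch. The function name `is_zeros_then_ones` is hypothetical. The sketch decides membership with a single integer counter, and that counter can grow without bound. A finite automaton, with its fixed finite set of states, lacks exactly this kind of memory.

```python
def is_zeros_then_ones(s: str) -> bool:
    """Decide membership in {0^n 1^n | n >= 0} using one unbounded counter."""
    count = 0
    i = 0
    while i < len(s) and s[i] == "0":    # count the leading 0s
        count += 1
        i += 1
    while i < len(s) and s[i] == "1":    # each 1 cancels one 0
        count -= 1
        if count < 0:                    # more 1s than 0s so far
            return False
        i += 1
    # Accept only if the whole input was consumed and the counts matched.
    return i == len(s) and count == 0


for w in ["", "01", "0011", "001", "0101"]:
    print(repr(w), is_zeros_then_ones(w))
```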

  4. Context-free grammar
     • Let us start with an example of a grammar:
         A → 0A1
         A → B
         B → #
     • These three rules are substitution rules. The left-hand side of each rule contains a variable, and the right-hand side contains a string consisting of variables and terminal symbols.

  5. • Terminal symbols are the symbols of the language being defined, i.e., Σ is the set of terminal symbols.
     • A grammar describes a language by generating the strings in the language. This happens by the following procedure:
         1. Write down the start variable. Unless otherwise stated, it is the left-hand side of the topmost rule.
         2. Find a variable that has been written down and a rule that has this variable as its left-hand side. Replace the written-down variable with the right-hand side of the rule.
         3. Repeat step 2 until no variables remain.
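The generation procedure above can be sketched in Python. This is an illustration under simplifying assumptions, not a general tool:

- Every variable and terminal is a single character.
- The grammar is the example grammar A → 0A1 | B, B → #.
- The sketch always replaces the leftmost variable, whereas the procedure itself allows any variable.

The names `GRAMMAR` and `derive` are hypothetical. The caller supplies the rule choices that the procedure leaves open.

```python
# Example grammar A -> 0A1 | B, B -> #; single-character symbols assumed.
GRAMMAR = {
    "A": ["0A1", "B"],
    "B": ["#"],
}


def derive(grammar, start, choices):
    """Run the generation procedure and return every string written down.

    choices[k] is the index of the rule applied at step k. Simplification:
    the leftmost variable is always the one replaced.
    """
    current = start                       # step 1: write down the start variable
    steps = [current]
    for choice in choices:                # step 2, repeated as in step 3
        pos = next((i for i, ch in enumerate(current) if ch in grammar), None)
        if pos is None:
            raise ValueError("no variables left, but more choices were given")
        variable = current[pos]
        current = current[:pos] + grammar[variable][choice] + current[pos + 1:]
        steps.append(current)
    if any(ch in grammar for ch in current):
        raise ValueError("variables remain; more choices are needed")
    return steps


print(derive(GRAMMAR, "A", [0, 0, 0, 1, 0]))
```

For instance, the choices `[0, 0, 0, 1, 0]` write down A, 0A1, 00A11, 000A111, 000B111 and 000#111 in turn.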

  6. • For example, the example grammar can generate the string 000#111:
         A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
     • The sequence of substitutions that results in the string is called a derivation.
     • A derivation can also be represented graphically as a parse tree.
     • The set of strings that can be generated by a given grammar is called the language of the grammar.
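A parse tree for the example grammar can be built by a small recursive-descent sketch. It assumes the grammar A → 0A1 | B, B → #. In this grammar the next input character always determines which rule applies, so no backtracking is needed. The tree is represented as a nested tuple: the first element is the variable at that node, and leaves are terminal characters. The function names are hypothetical.

```python
def parse_A(s, i=0):
    """Parse an A starting at s[i]; return (tree, index after the A)."""
    if i < len(s) and s[i] == "0":
        # Rule A -> 0 A 1
        sub, j = parse_A(s, i + 1)
        if j < len(s) and s[j] == "1":
            return ("A", "0", sub, "1"), j + 1
        raise ValueError(f"expected '1' at position {j}")
    if i < len(s) and s[i] == "#":
        # Rule A -> B followed by B -> #
        return ("A", ("B", "#")), i + 1
    raise ValueError(f"unexpected input at position {i}")


def parse(s):
    """Parse the whole string s as an A, rejecting trailing input."""
    tree, end = parse_A(s)
    if end != len(s):
        raise ValueError("trailing input")
    return tree


def tree_yield(tree):
    """Concatenate the leaves of a parse tree from left to right."""
    if isinstance(tree, str):
        return tree
    return "".join(tree_yield(child) for child in tree[1:])


print(parse("0#1"))
print(tree_yield(parse("000#111")))
```

Here `parse("0#1")` returns `("A", "0", ("A", ("B", "#")), "1")`. Reading the leaves of any such tree from left to right gives back the derived string.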

  7. A more complicated example
         ⟨SENTENCE⟩ → ⟨NOUN-PHRASE⟩⟨VERB-PHRASE⟩
         ⟨NOUN-PHRASE⟩ → ⟨CMPLX-NOUN⟩ | ⟨CMPLX-NOUN⟩⟨PREP-PHRASE⟩
         ⟨VERB-PHRASE⟩ → ⟨CMPLX-VERB⟩ | ⟨CMPLX-VERB⟩⟨PREP-PHRASE⟩
         ⟨PREP-PHRASE⟩ → ⟨PREP⟩⟨CMPLX-NOUN⟩
         ⟨CMPLX-NOUN⟩ → ⟨ARTICLE⟩⟨NOUN⟩
         ⟨CMPLX-VERB⟩ → ⟨VERB⟩ | ⟨VERB⟩⟨NOUN-PHRASE⟩
         ⟨ARTICLE⟩ → a | the
         ⟨NOUN⟩ → boy | girl | flower
         ⟨VERB⟩ → likes | sees | touches
         ⟨PREP⟩ → with
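One way to get a feel for the language of this grammar is to generate random sentences from it. The sketch below is a hypothetical illustration. The dictionary `RULES` encodes the rules above, and `generate` picks a rule uniformly at random whenever a variable must be replaced. No variable in this grammar can derive a string that contains the variable itself, so the expansion always terminates.

```python
import random

# The rules above; tokens in angle brackets are variables, others terminals.
RULES = {
    "<SENTENCE>": [["<NOUN-PHRASE>", "<VERB-PHRASE>"]],
    "<NOUN-PHRASE>": [["<CMPLX-NOUN>"], ["<CMPLX-NOUN>", "<PREP-PHRASE>"]],
    "<VERB-PHRASE>": [["<CMPLX-VERB>"], ["<CMPLX-VERB>", "<PREP-PHRASE>"]],
    "<PREP-PHRASE>": [["<PREP>", "<CMPLX-NOUN>"]],
    "<CMPLX-NOUN>": [["<ARTICLE>", "<NOUN>"]],
    "<CMPLX-VERB>": [["<VERB>"], ["<VERB>", "<NOUN-PHRASE>"]],
    "<ARTICLE>": [["a"], ["the"]],
    "<NOUN>": [["boy"], ["girl"], ["flower"]],
    "<VERB>": [["likes"], ["sees"], ["touches"]],
    "<PREP>": [["with"]],
}


def generate(symbol, rng):
    """Expand symbol with randomly chosen rules until only terminals remain."""
    if symbol not in RULES:              # a terminal is emitted as-is
        return [symbol]
    rhs = rng.choice(RULES[symbol])      # choose one rule for this variable
    words = []
    for s in rhs:
        words.extend(generate(s, rng))
    return words


rng = random.Random(0)                   # fixed seed for repeatable output
for _ in range(3):
    print(" ".join(generate("<SENTENCE>", rng)))
```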

  8. Formal definition of CFG
     • A context-free grammar is a 4-tuple (V, Σ, R, S), where
         1. V is a finite set whose elements are called variables,
         2. Σ is a finite set, disjoint from V, whose elements are called terminals (AKA the alphabet),
         3. R is a finite set of rules, a rule being a pair (v, σ) where v is a variable and σ is a string of variables and terminals; a rule is also written v → σ,
         4. S ∈ V is the start variable.
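The 4-tuple definition translates directly into a data structure. The Python sketch below stores (V, Σ, R, S); the class and field names are hypothetical. On construction, it checks the side conditions of the definition:

- Σ is disjoint from V.
- S is an element of V.
- Every left-hand side is a variable.
- Every right-hand side is a string over V ∪ Σ.

Right-hand sides are tuples of symbols, so the empty tuple stands for ε.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V
    terminals: frozenset   # Σ
    rules: tuple           # R: pairs (v, rhs), rhs a tuple of symbols
    start: str             # S

    def __post_init__(self):
        if self.variables & self.terminals:
            raise ValueError("V and Σ must be disjoint")
        if self.start not in self.variables:
            raise ValueError("the start variable must belong to V")
        symbols = self.variables | self.terminals
        for lhs, rhs in self.rules:
            if lhs not in self.variables:
                raise ValueError(f"left-hand side {lhs!r} is not a variable")
            for sym in rhs:
                if sym not in symbols:
                    raise ValueError(f"unknown symbol {sym!r} in a rule for {lhs!r}")


# The example grammar A -> 0A1 | B, B -> # written as a 4-tuple.
G = CFG(
    variables=frozenset({"A", "B"}),
    terminals=frozenset({"0", "1", "#"}),
    rules=(("A", ("0", "A", "1")), ("A", ("B",)), ("B", ("#",))),
    start="A",
)
print(len(G.rules), "rules, start variable", G.start)
```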

  9. • If u, v and w are strings of variables and terminals, and A → w is a rule of the grammar, then uAv yields the string uwv, written $uAv \Rightarrow uwv$.
     • We say that u derives v, written $u \Rightarrow^* v$, if u = v or if there is some sequence $u \Rightarrow u_1 \Rightarrow u_2 \Rightarrow \cdots \Rightarrow u_k \Rightarrow v$ with $k \ge 0$.
     • The language of the grammar is the set $\{ w \in \Sigma^* \mid S \Rightarrow^* w \}$.
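The relations ⇒ and ⇒* can be explored mechanically. The sketch below has two parts; the function names are hypothetical. `yields` computes all one-step yields of a sentential form. `language_up_to` then searches breadth-first from S and lists the strings of the language up to a given length.

The sketch assumes that no rule has an empty right-hand side. Under that assumption, sentential forms never get shorter, so forms longer than the bound can be discarded. With ε-rules this pruning would not be sound.

```python
from collections import deque


def yields(form, rules):
    """All forms w' with form => w' in one step (forms are tuples of symbols)."""
    result = set()
    for i, sym in enumerate(form):
        for lhs, rhs in rules:
            if lhs == sym:               # replace this occurrence by the rule's RHS
                result.add(form[:i] + rhs + form[i + 1:])
    return result


def language_up_to(rules, start, variables, max_len):
    """All w over the terminals with start =>* w and |w| <= max_len.

    Assumes no rule has an empty right-hand side (no ε-rules).
    """
    seen = {(start,)}
    queue = deque(seen)
    words = set()
    while queue:
        form = queue.popleft()
        if not any(s in variables for s in form):
            words.add("".join(form))     # only terminals: a string of the language
            continue
        for nxt in yields(form, rules):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words


RULES = [("A", ("0", "A", "1")), ("A", ("B",)), ("B", ("#",))]
print(sorted(language_up_to(RULES, "A", {"A", "B"}, 5), key=len))
```

For the example grammar and bound 5, the search finds exactly #, 0#1 and 00#11.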

  10. Examples of CFGs.
     • Often we write a CFG simply by giving the rules; the variables are the symbols that appear on left-hand sides, and the others are terminals.
     • S → aSb | SS | ε (think of a as "(" and b as ")")
     • E → E + T | T
       T → T × F | F
       F → ( E ) | n

  11. • Here the alphabet is {n, +, ×, (, )}.
     • A compiler translates code written in a programming language into another form; CFGs are used, for instance, to describe programming-language syntax.
     • The process of finding the structure, and hence the meaning, of a string by relating it to a grammar is known as parsing.
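Parsing with the expression grammar E → E + T | T, T → T × F | F, F → ( E ) | n can be sketched as a recursive-descent parser with one method per variable. A direct recursive call for E → E + T would never terminate, so the sketch handles the left-recursive rules with loops. The loops preserve the left-to-right grouping the grammar describes. The class name and the fully parenthesized output format are illustrative choices, not part of the slides.

```python
class ExprParser:
    """Recursive-descent parser for E -> E+T | T, T -> T×F | F, F -> (E) | n.

    parse() returns a fully parenthesized copy of the input that makes the
    grouping of the parse tree visible. Spaces are ignored.
    """

    def __init__(self, text):
        self.text = text.replace(" ", "")
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def expect(self, ch):
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise ValueError(f"unexpected {self.peek()!r} at position {self.pos}")
        return tree

    def expr(self):                      # E -> E + T | T, read as T ('+' T)*
        left = self.term()
        while self.peek() == "+":
            self.pos += 1
            left = f"({left}+{self.term()})"
        return left

    def term(self):                      # T -> T × F | F, read as F ('×' F)*
        left = self.factor()
        while self.peek() == "×":
            self.pos += 1
            left = f"({left}×{self.factor()})"
        return left

    def factor(self):                    # F -> ( E ) | n
        if self.peek() == "(":
            self.pos += 1
            inner = self.expr()
            self.expect(")")
            return inner
        self.expect("n")
        return "n"


print(ExprParser("n + n × n").parse())
```

For example, parsing n+n×n gives (n+(n×n)). This shows that the grammar makes × bind more tightly than +.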

  12. Ambiguity
     • Consider the grammar E → E + E | E × E | ( E ) | a. Strings such as a + a × a have several parse trees: one groups the string as a + (a × a), another as (a + a) × a.
     • Definition: A grammar is ambiguous if some string in its language has two or more different parse trees (equivalently, two or more different leftmost derivations).
     • Ambiguity makes unique parsing impossible, so one should strive to describe languages unambiguously whenever possible.
     • Some languages are inherently ambiguous, i.e., every grammar that generates them is ambiguous.
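The ambiguity of E → E + E | E × E | ( E ) | a can be quantified by counting parse trees. The hypothetical function below uses dynamic programming over substrings. A substring derived from E is either a, a parenthesized E, or two E's around an operator. Each choice of the top-level operator gives different parse trees.

```python
from functools import lru_cache


def count_parse_trees(s):
    """Count the parse trees of s under E -> E+E | E×E | (E) | a."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving the substring s[i:j] from E.
        if j <= i:
            return 0
        if j - i == 1:
            return 1 if s[i] == "a" else 0               # E -> a
        total = 0
        if s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)                 # E -> ( E )
        for k in range(i + 1, j - 1):
            if s[k] in "+×":                             # E -> E op E, op at k
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(s))


print(count_parse_trees("a+a×a"))
```

The function reports 2 parse trees for a+a×a and 5 for a+a+a+a. An unambiguous grammar would give at most 1 for every string.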

  13. Pushdown automata
     • Regular languages were defined as the languages that are recognized by some finite automaton.
     • Context-free languages can similarly be recognized by a certain kind of automaton; because of the recursive nature of context-free languages, some form of unbounded memory is needed.
     • Informally, pushdown automata are like nondeterministic finite automata, but instead of simply moving from one state to another, they use a stack to store information about what the automaton has done in the past, and this information affects what the automaton does next.

  14. • When a pushdown automaton is in a given state, it responds to the symbol read from the input and to the symbol on top of the stack.
     • Let us write $\Sigma_\varepsilon$ for the set $\Sigma \cup \{\varepsilon\}$ (and similarly $\Gamma_\varepsilon$ for $\Gamma \cup \{\varepsilon\}$).
     • Formally: A pushdown automaton is a 6-tuple $(Q, \Sigma, \Gamma, \delta, q_0, F)$, where
         1. Q is the (finite) set of states,
         2. Σ is the (finite) input alphabet,
         3. Γ is the (finite) stack alphabet,

  15.   4. $\delta : Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to 2^{Q \times \Gamma_\varepsilon}$ is the nondeterministic transition function,
         5. $q_0 \in Q$ is the start state,
         6. $F \subseteq Q$ is the set of accept states.
     • A pushdown automaton (PDA) $M = (Q, \Sigma, \Gamma, \delta, q_0, F)$ accepts an input w if and only if w can be written as $a_1 \cdots a_n$ (where each $a_i \in \Sigma_\varepsilon$) and there are a sequence of states $q_0 q_1 \cdots q_n$, beginning with the start state $q_0$, and a sequence of strings $g_0, g_1, \dots, g_n \in \Gamma^*$ such that the following conditions are met:
         1. $g_0 = \varepsilon$, i.e., the automaton starts with an empty stack,

  16.   2. for $0 \le i \le n - 1$ we have $(q_{i+1}, x) \in \delta(q_i, a_{i+1}, y)$, $g_i = yt$ and $g_{i+1} = xt$ for some $x, y \in \Gamma_\varepsilon$ and $t \in \Gamma^*$; i.e., the part t of the stack below the top is unchanged by the move, and only the topmost element is popped, pushed or replaced,
         3. $q_n \in F$.
     • To understand the transition function: if $(q_{i+1}, x) \in \delta(q_i, a_{i+1}, y)$, then this transition can be executed when the automaton is in state $q_i$, y is on top of the stack, and the next input symbol is $a_{i+1}$ (if $a_{i+1} = \varepsilon$, no input is read; if $y = \varepsilon$, nothing is popped). After it is executed, y has been removed from the stack, x has been put on top, and the automaton has moved to state $q_{i+1}$.
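The acceptance conditions can be checked mechanically for a proposed run. The sketch below makes these assumptions (all names are hypothetical):

- δ is a Python dictionary from (state, input, popped) to a set of (state, pushed) pairs.
- ε is the empty string.
- Stack symbols are single characters.
- Each $g_i$ is written with the top of the stack first, so $g_i = yt$ becomes `g.startswith(y)`.

The example PDA for $\{0^n 1^n \mid n \ge 0\}$ is an assumption made for illustration; it is not taken from the slides.

```python
EPS = ""  # ε is represented by the empty string


def move_ok(delta, q, a, g, q_next, g_next):
    """Condition 2 for one move: (q_next, x) in δ(q, a, y), g = yt, g_next = xt."""
    for (p, b, y), targets in delta.items():
        if p != q or b != a or not g.startswith(y):
            continue
        t = g[len(y):]                   # the rest of the stack below y
        for r, x in targets:
            if r == q_next and g_next == x + t:
                return True
    return False


def is_accepting_run(delta, start, accept, inputs, states, stacks):
    """Check conditions 1-3; inputs[i] plays the role of a_{i+1}."""
    n = len(inputs)
    if len(states) != n + 1 or len(stacks) != n + 1 or states[0] != start:
        return False
    if stacks[0] != EPS:                                 # condition 1
        return False
    for i in range(n):                                   # condition 2
        if not move_ok(delta, states[i], inputs[i], stacks[i],
                       states[i + 1], stacks[i + 1]):
            return False
    return states[-1] in accept                          # condition 3


# Hypothetical PDA for {0^n 1^n | n >= 0}; "$" marks the bottom of the stack.
DELTA = {
    ("q0", EPS, EPS): {("q1", "$")},
    ("q1", "0", EPS): {("q1", "0")},
    ("q1", "1", "0"): {("q2", EPS)},
    ("q2", "1", "0"): {("q2", EPS)},
    ("q2", EPS, "$"): {("q3", EPS)},
}
ACCEPT = {"q0", "q3"}

# A run on the input 01, split as a_1..a_4 = ε, 0, 1, ε.
print(is_accepting_run(DELTA, "q0", ACCEPT,
                       [EPS, "0", "1", EPS],
                       ["q0", "q1", "q1", "q2", "q3"],
                       ["", "$", "0$", "$", ""]))
```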

  17. Example
     • Consider the language $\{a^i b^j c^k \mid i, j, k \ge 0 \text{ and } (i = j \text{ or } i = k)\}$, i.e., either the number of b's or the number of c's is the same as the number of a's.
     • Informally, it is relatively easy to design a PDA that accepts the language: first read all the a's, pushing one symbol per a onto the stack as a counter. Then nondeterministically choose to count either the b's or the c's, and match their number against the a's on the stack.

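  18. [Figure: state diagram of the PDA for $\{a^i b^j c^k \mid i = j \text{ or } i = k\}$ with states q0 to q6; an edge label "x, y → z" means: read x, pop y, push z]
         q0 → q1: ε, ε → $
         q1 → q1: a, ε → a
         q1 → q2: ε, ε → ε
         q2 → q2: b, a → ε
         q2 → q3: ε, $ → ε
         q3 → q3: c, ε → ε
         q1 → q4: ε, ε → ε
         q4 → q4: b, ε → ε
         q4 → q5: ε, ε → ε
         q5 → q5: c, a → ε
         q5 → q6: ε, $ → ε
         Accept states: q3 and q6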
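The nondeterministic choices of this PDA can be explored by a breadth-first search over configurations (state, input position, stack). In the Python sketch below, the names are hypothetical, ε is the empty string, and stacks are strings with the top first.

`DELTA` encodes the example PDA. It pushes $ and then one a per input a in q1. It then either matches b's against the a's in q2 and accepts in q3, or skips the b's in q4, matches c's against the a's in q5, and accepts in q6.

The stack-size bound `max_stack` is an added assumption that guarantees termination. The bound is safe for this PDA, whose stack never holds more than the a's read plus the bottom marker $. A general PDA with ε-moves that push may need a different treatment.

```python
from collections import deque

EPS = ""  # ε is represented by the empty string

# (state, input, pop) -> {(next state, push)}
DELTA = {
    ("q0", EPS, EPS): {("q1", "$")},
    ("q1", "a", EPS): {("q1", "a")},
    ("q1", EPS, EPS): {("q2", EPS), ("q4", EPS)},
    ("q2", "b", "a"): {("q2", EPS)},
    ("q2", EPS, "$"): {("q3", EPS)},
    ("q3", "c", EPS): {("q3", EPS)},
    ("q4", "b", EPS): {("q4", EPS)},
    ("q4", EPS, EPS): {("q5", EPS)},
    ("q5", "c", "a"): {("q5", EPS)},
    ("q5", EPS, "$"): {("q6", EPS)},
}
ACCEPT = {"q3", "q6"}


def accepts(w, delta=DELTA, start="q0", accept=ACCEPT):
    """Breadth-first search over every nondeterministic branch of the PDA."""
    max_stack = len(w) + 2        # assumption: enough for this PDA, ensures termination
    first = (start, 0, EPS)       # configuration: (state, input position, stack)
    seen, queue = {first}, deque([first])
    while queue:
        q, i, g = queue.popleft()
        if i == len(w) and q in accept:
            return True
        for (p, a, y), targets in delta.items():
            if p != q or not g.startswith(y):
                continue                          # wrong state or wrong stack top
            if a != EPS and (i >= len(w) or w[i] != a):
                continue                          # required input symbol not next
            j = i + 1 if a != EPS else i
            for r, x in targets:
                h = x + g[len(y):]                # pop y, then push x
                cfg = (r, j, h)
                if len(h) <= max_stack and cfg not in seen:
                    seen.add(cfg)
                    queue.append(cfg)
    return False


for word in ["abc", "aabcc", "aabc"]:
    print(word, accepts(word))
```

The loop at the end should report that abc and aabcc are accepted and aabc is rejected.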

  19. Equivalence
     • Pushdown automata and context-free grammars are equivalent in the same way as regular expressions and finite automata are:
     • Theorem: A language is context-free if and only if there is a pushdown automaton that recognizes it.
     • First we explain how to prove the "only if" direction. Let A be a context-free language. By definition, it has a CFG, say G, that generates it.

  20. • The idea of the proof is as follows: we construct a nondeterministic PDA that, when reading an input, "guesses" which substitutions are needed to derive it.
         1. Initially, the PDA puts the start variable on the stack.
         2. After this, the automaton always looks at the top symbol of the stack. If it is a variable, the automaton nondeterministically chooses a rule for it, removes the variable, and replaces it with the right-hand side of the rule, pushed in reverse order so that the leftmost symbol ends up on top.
         3. If the top symbol is a terminal, the automaton compares it to the next input symbol. If the symbols differ, this branch rejects; otherwise the top symbol is removed and the input symbol is consumed.

  21.   4. If the stack is empty when the input ends, the automaton accepts.
     • Please verify that the automaton accepts exactly the strings that are generated by the grammar!
     • The other direction is proved by constructing a context-free grammar from the transition relation of a PDA.
     • Given a PDA P, three modifications are made first:
         1. P has only one accept state, $q_a$. This is not a problem, because nondeterminism is allowed: ε-transitions can lead from each old accept state to $q_a$.

  22.   2. P accepts only after it has emptied its stack. This is not a restriction either.
         3. Every transition either pushes a symbol onto the stack (without popping) or pops a symbol from the stack (without pushing). Again, this is not a restriction: a transition that does both can be split into two, and a transition that does neither can be replaced by a push of a new dummy symbol followed by its pop.
     • The PDA is then used as a recipe for a grammar that generates exactly the language accepted by the PDA. For each pair of states p and q, the grammar has a variable $A_{pq}$ that generates the strings taking P from p with an empty stack to q with an empty stack; the start variable is $A_{q_0 q_a}$.
     • When P computes on a string, say x, from p to q in this way, conditions 2 and 3 imply that the first move pushes a symbol onto the stack and the last move pops one.
  23. • If the symbol popped by the last move is not the one pushed by the first move, then the stack must have been empty at some point in between (why?).
     • If the symbols are the same, we create the rule $A_{pq} \to a A_{rs} b$, where a is the input read at the first move, b is the input read at the last move, r is the state after the first move, and s is the state before the last move.
     • If the symbols are not the same, then there is some state r in which the stack is empty, and we create the rule $A_{pq} \to A_{pr} A_{rq}$.
     • To formalize the proof, let $(Q, \Sigma, \Gamma, \delta, q_0, \{q_a\})$ be a PDA (after the modification).
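The rule construction can be sketched in Python. Besides the two kinds of rules described above, the sketch also adds $A_{pp} \to \varepsilon$ for every state p, as in the standard textbook version of this construction. That rule accounts for computations that make no moves.

The PDA is assumed to be already modified as described:

- It has a single accept state.
- Its stack is empty at acceptance.
- Every transition either pushes exactly one symbol or pops exactly one.

The function name, the variable naming scheme A[p,q], and the example PDA for $\{0^n 1^n \mid n \ge 0\}$ are all hypothetical.

```python
from itertools import product

EPS = ""  # ε is represented by the empty string


def pda_to_cfg(states, delta, start, accept):
    """Build the rules of a grammar for the language of a modified PDA.

    delta maps (state, input, pop) to a set of (state', push), and every
    transition pushes exactly one symbol or pops exactly one symbol.
    A rule is (lhs, rhs) with rhs a tuple of symbols; () stands for ε.
    """
    def var(p, q):
        return f"A[{p},{q}]"                             # the variable A_pq

    pushes = [(p, a, r, u) for (p, a, y), ts in delta.items() if y == EPS
              for (r, u) in ts if u != EPS]
    pops = [(s, b, u, q) for (s, b, u), ts in delta.items() if u != EPS
            for (q, x) in ts if x == EPS]
    rules = set()
    # First move p -> r pushes u, last move s -> q pops the same u.
    for (p, a, r, u), (s, b, u2, q) in product(pushes, pops):
        if u == u2:
            rhs = tuple(sym for sym in (a, var(r, s), b) if sym != EPS)
            rules.add((var(p, q), rhs))                  # A_pq -> a A_rs b
    # The stack is empty at some intermediate state r.
    for p, q, r in product(states, repeat=3):
        rules.add((var(p, q), (var(p, r), var(r, q))))   # A_pq -> A_pr A_rq
    for p in states:
        rules.add((var(p, p), ()))                       # A_pp -> ε
    return rules, var(start, accept)


# Hypothetical modified PDA for {0^n 1^n | n >= 0}; "$" marks the stack bottom.
STATES = ["q0", "q1", "q2", "q3"]
DELTA = {
    ("q0", EPS, EPS): {("q1", "$")},   # push $
    ("q1", "0", EPS): {("q1", "0")},   # push 0
    ("q1", "1", "0"): {("q2", EPS)},   # pop 0
    ("q2", "1", "0"): {("q2", EPS)},   # pop 0
    ("q1", EPS, "$"): {("q3", EPS)},   # pop $ (case n = 0)
    ("q2", EPS, "$"): {("q3", EPS)},   # pop $
}
rules, start_var = pda_to_cfg(STATES, DELTA, "q0", "q3")
for lhs, rhs in sorted(rules):
    if lhs == start_var:
        print(lhs, "->", " ".join(rhs) or "ε")
```

With these rules, 0011 can be derived as A[q0,q3] ⇒ A[q1,q2] ⇒ 0 A[q1,q2] 1 ⇒ 00 A[q1,q1] 11 ⇒ 0011.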
