
COMP3630/6360: Theory of Computation, Semester 1, 2020



  1. COMP3630/6360: Theory of Computation, Semester 1, 2020. The Australian National University. Regular Expressions. (Slide 1 of 21)

  2. This Lecture Covers Chapter 3 of HMU: Regular Expressions and Languages
     - Introduction to regular expressions and regular languages
     - Equivalence of the class of regular languages and the class of languages accepted by finite automata
     - Algebraic laws of (abstract) regular expressions
     Additional Reading: Chapter 3 of HMU.

  3. Regular Expressions and Languages. Regular Expressions: Overview
     - So far, DFAs and NFAs were given a machine-like description.
     - Regular expressions are a user-friendly, declarative formulation.
     - Regular expressions find extensive use:
       - Searching, finding strings, pattern matching, or conformance checking in text-formatting systems (e.g., UNIX grep, egrep, fgrep)
       - Lexical analyzers (in compilers) use regular expressions to identify tokens (e.g., Lex, Flex)
       - Web forms use them to (structurally) validate entries (passwords, dates, email IDs)
     - A regular expression over an alphabet $\Sigma$ is a string consisting of:
       - symbols from $\Sigma$
       - constants: $\emptyset$, $\epsilon$
       - operators: $+$, $*$
       - parentheses: (, )
     - Regular expressions are defined inductively.

  4. Regular Expressions: Definition
     - Regular expressions are defined inductively as follows:
       - Basis:
         B1. $\emptyset$ and $\epsilon$ are regular expressions.
         B2. For each $a \in \Sigma$, $a$ is a regular expression.
       - Induction: If $E$ and $F$ are regular expressions, then:

         Option 1: HMU Approach   | Option 2: A 'Precise' Approach
         I1. so is $E^*$          | I1'. so is $(E^*)$
         I2. so is $E + F$        | I2'. so is $(E + F)$
         I3. so is $EF$           | I3'. so is $(EF)$
         I4. so is $(E)$          |

     - Only expressions generated by the above induction are regular expressions.
     - Remark 1: Some authors and texts use $|$ instead of $+$. HMU uses $+$.
     - Remark 2: All expressions generated by Option 2 are also generated by Option 1, since I1 + I4 $\Rightarrow$ I1'; I2 + I4 $\Rightarrow$ I2'; I3 + I4 $\Rightarrow$ I3'.
     - Remark 3: Some expressions are regular according to Option 1 but not Option 2, e.g., $((0))$ and $0 + 11^*$.
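The inductive definition maps naturally onto a small tree data type. The following is a minimal sketch, not part of the slides: the class names (`Empty`, `Eps`, `Sym`, `Star`, `Plus`, `Cat`) and the `show` function are hypothetical names chosen for illustration. `show` prints an expression in the fully parenthesized Option 2 form.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:      # the constant ∅ (basis B1)
    pass


@dataclass(frozen=True)
class Eps:        # the constant ε (basis B1)
    pass


@dataclass(frozen=True)
class Sym:        # a single symbol a ∈ Σ (basis B2)
    a: str


@dataclass(frozen=True)
class Star:       # (E*), rule I1'
    e: "Regex"


@dataclass(frozen=True)
class Plus:       # (E+F), rule I2'
    e: "Regex"
    f: "Regex"


@dataclass(frozen=True)
class Cat:        # (EF), rule I3'
    e: "Regex"
    f: "Regex"


Regex = Union[Empty, Eps, Sym, Star, Plus, Cat]


def show(r: Regex) -> str:
    """Render r with the explicit parentheses required by Option 2."""
    if isinstance(r, Empty):
        return "∅"
    if isinstance(r, Eps):
        return "ε"
    if isinstance(r, Sym):
        return r.a
    if isinstance(r, Star):
        return f"({show(r.e)}*)"
    if isinstance(r, Plus):
        return f"({show(r.e)}+{show(r.f)})"
    if isinstance(r, Cat):
        return f"({show(r.e)}{show(r.f)})"
    raise TypeError(f"not a regular expression: {r!r}")


# Built bottom-up: (0+1), then ((0+1)1), then its star, then a final 0.
example = Cat(Star(Cat(Plus(Sym("0"), Sym("1")), Sym("1"))), Sym("0"))
print(show(example))
```

Because every constructor corresponds to exactly one basis or induction rule, any value of this type is a regular expression by construction.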

  5. Regular Expressions: Examples
     - Let $\Sigma = \{0, 1\}$.
     - $((((0+1)1)^*)0)$ is a regular expression (derived with Option 2):

         Expression          | Rule
         $0$                 | B2
         $1$                 | B2
         $(0+1)$             | I2'
         $((0+1)1)$          | I3'
         $(((0+1)1)^*)$      | I1'
         $((((0+1)1)^*)0)$   | I3'

     - $0 + 11^*0$ is a regular expression (derived with Option 1):

         Expression          | Rule
         $0$                 | B2
         $1$                 | B2
         $0+1$               | I2
         $0+11$              | I3
         $0+11^*$            | I1
         $0+11^*0$           | I3

  6. What do Regular Expressions Stand for?
     - Each properly parenthesized regular expression $E$ (i.e., a regular expression that is generated by Option 2) is a shorthand for a language $L(E)$.
     - A language is said to be regular if it corresponds to a regular expression. This correspondence is defined by the following induction:
       - Basis:
         B1. $L(\emptyset) = \emptyset$ (empty language); $L(\epsilon) = \{\epsilon\}$ (language with only the empty string)
         B2. $L(a) = \{a\}$, $a \in \Sigma$ (language with only the symbol $a$)
       - Induction: For any regular expressions $E$ and $F$,
         I1'. $L((E^*)) = (L(E))^* = \{\epsilon\} \cup L(E) \cup L(E)^2 \cup \cdots$ (Kleene-$*$ closure of $L(E)$)
         I2'. $L((E+F)) = L(E) \cup L(F)$ (union)
         I3'. $L((EF)) = L(E)L(F)$ (concatenation)
     - What if a regular expression is generated by Option 1?
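Since $L(E)$ is usually infinite, the induction above can be tried out by computing only the strings of length at most $n$. The following is a minimal sketch under assumptions that are not in the slides: expressions are encoded as nested tuples with the hypothetical tags `"empty"`, `"eps"`, `"sym"`, `"star"`, `"plus"` and `"cat"`, and `lang` is an illustrative name.

```python
def lang(r, n):
    """Return the set of all strings of length <= n in L(r)."""
    tag = r[0]
    if tag == "empty":                      # B1: L(∅) = ∅
        return set()
    if tag == "eps":                        # B1: L(ε) = {ε}
        return {""}
    if tag == "sym":                        # B2: L(a) = {a}
        return {r[1]} if n >= 1 else set()
    if tag == "plus":                       # I2': union
        return lang(r[1], n) | lang(r[2], n)
    if tag == "cat":                        # I3': concatenation
        return {u + v
                for u in lang(r[1], n)
                for v in lang(r[2], n)
                if len(u) + len(v) <= n}
    if tag == "star":                       # I1': {ε} ∪ L(E) ∪ L(E)^2 ∪ ...
        base = lang(r[1], n)
        result = {""}
        frontier = {""}
        while frontier:
            # Extend the newest strings by one more non-empty factor.
            frontier = {u + v
                        for u in frontier
                        for v in base
                        if v and len(u) + len(v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown tag: {tag!r}")


# (0 + (1(1*))), the properly parenthesized form of 0 + 11*
E = ("plus", ("sym", "0"), ("cat", ("sym", "1"), ("star", ("sym", "1"))))
print(sorted(lang(E, 3)))
```

The length bound is what makes the star case terminate: each pass only keeps strings that are new and still short enough.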

  7. What if an Expression is not Bracketed Properly?
     - Improperly parenthesized expressions lead to confusion, e.g., is $0+11$ the same as $(0+1)1$ or $(0+(11))$?
     - Improperly parenthesized regular expressions (generated by Option 1) must be converted to properly parenthesized expressions.
     - We remove unwanted parentheses by replacing $((E))$ by $(E)$ inductively.
     - Additionally, if $E$ is a symbol or a constant, we replace $(E)$ by $(E+\emptyset)$, e.g., $((0)) \equiv (0+\emptyset)$, $((0+1)) \equiv (0+1)$.
     - Apply precedence rules:
       - First: $*$ applies to the smallest (properly bracketable) expression preceding it, e.g., $01^* \equiv (0(1^*))$
       - Second: concatenation applies from left to right, e.g., $010 \equiv ((01)0)$
       - Third: $+$ applies from left to right, e.g., $a+b+c \equiv ((a+b)+c)$
     - Examples:
       - $0+11^* \equiv (0+(1(1^*)))$
       - $((0))+11^* \equiv ((0+\emptyset)+(1(1^*)))$
       - $L(0+11^*) = L(((0))+11^*) = L(0) \cup (L(1)(L(1))^*) = \{0, 1, 11, 111, 1111, \ldots\}$
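The precedence rules can be sketched as a small recursive-descent parser that turns an Option 1 expression into a properly parenthesized one. This is an illustrative sketch, not the procedure from the slides: the function name `parenthesize` is hypothetical, symbols are assumed to be single characters, the ASCII `*` stands for the star operator, and redundant parentheses are simply dropped (so `((0))` becomes `0` rather than `(0+∅)`, which is also a valid Option 2 expression).

```python
def parenthesize(src: str) -> str:
    """Rewrite an Option 1 expression in fully parenthesized form."""
    tokens = [c for c in src if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def union():
        # Third: '+' binds loosest and groups from left to right.
        left = concat()
        while peek() == "+":
            take()
            right = concat()
            left = f"({left}+{right})"
        return left

    def concat():
        # Second: concatenation groups from left to right.
        left = star()
        while peek() is not None and peek() not in "+)*":
            right = star()
            left = f"({left}{right})"
        return left

    def star():
        # First: '*' applies to the smallest expression preceding it.
        e = atom()
        while peek() == "*":
            take()
            e = f"({e}*)"
        return e

    def atom():
        if peek() is None:
            raise ValueError("unexpected end of expression")
        tok = take()
        if tok == "(":
            e = union()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return e            # redundant parentheses are dropped
        if tok in "+*)":
            raise ValueError(f"unexpected {tok!r}")
        return tok              # a symbol or a constant

    result = union()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result


for expr in ["0+11*", "01*", "010", "a+b+c", "((0+1))"]:
    print(expr, "≡", parenthesize(expr))
```

Each parsing function handles one precedence level, so the nesting of the calls (union calls concat, which calls star) is what encodes the order star, then concatenation, then union.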

  8. DFAs and Regular Languages. Regular Languages: Some Basic Properties
     - Theorem 3.2.1: Let $w \in \Sigma^*$. Then $\{w\}$ is regular.
     - Proof of Theorem 3.2.1: The languages $\{\epsilon\}$ and $\{a\}$ for $a \in \Sigma$ are regular (B1, B2). By a straightforward induction argument, we can show that for any $k \in \mathbb{N}$ and $w = s_1 \cdots s_k \in \Sigma^k$, $\{w\} = L(s_1 s_2 \cdots s_k)$.
     - Theorem 3.2.2: Let $L_1$ and $L_2$ be regular languages. Then $L_1^*$, $L_1 \cup L_2$ and $L_1 L_2$ are also regular.
     - Proof of Theorem 3.2.2: Let $L_i = L(E_i)$ for $i = 1, 2$. Then $L_1^* = L((E_1^*))$, $L_1 \cup L_2 = L((E_1 + E_2))$ and $L_1 L_2 = L((E_1 E_2))$. Since $(E_1^*)$, $(E_1 + E_2)$ and $(E_1 E_2)$ are regular expressions, the claim holds.
     - Corollary 1: The class of regular languages is closed under finite union and concatenation, i.e., if $L_1, \ldots, L_k$ are regular languages for any $k \in \mathbb{N}$, then $L_1 \cup \cdots \cup L_k$ and $L_1 \cdots L_k$ are also regular languages.
     - Corollary 2: Any finite language is regular.
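Theorem 3.2.1 and Corollary 2 are constructive: a word becomes a left-nested concatenation of its symbols, and a finite language becomes a left-nested union of such words. The sketch below illustrates this under stated assumptions: the function names `word_regex` and `finite_regex` are hypothetical, symbols are single characters, and words are taken in sorted order only to make the output deterministic.

```python
from functools import reduce


def word_regex(w: str) -> str:
    """Regular expression whose language is {w} (Theorem 3.2.1)."""
    if w == "":
        return "ε"
    # Fold left to right: s1, (s1 s2), ((s1 s2) s3), ...
    return reduce(lambda e, s: f"({e}{s})", w[1:], w[0])


def finite_regex(words) -> str:
    """Regular expression for a finite set of words (Corollary 2)."""
    parts = [word_regex(w) for w in sorted(set(words))]
    if not parts:
        return "∅"
    # Finite union, grouped from left to right (Corollary 1).
    return reduce(lambda e, f: f"({e}+{f})", parts)


print(word_regex("010"))
print(finite_regex({"01", "1"}))
```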

  9. DFAs and Regular Languages
     - Theorem 3.2.3: For every regular language $M$, there exists a DFA $A$ such that $M = L(A)$.
     - Proof of Theorem 3.2.3:
       - WLOG, let $\Sigma = \{0, 1\}$. Let $M$ be a regular language. Then $M = L(E)$ for some regular expression $E$.
       - For each regular expression, we will devise an $\epsilon$-NFA (an equivalent DFA then exists, since every $\epsilon$-NFA can be converted to a DFA).
       - Basis:
         [Figure: $\epsilon$-NFAs for the basis expressions $\emptyset$, $\epsilon$, $0$ and $1$]

  10. Proof of Theorem 3.2.3 (Cont'd)
     - Induction I1': $(E^*)$
       [Figure: $\epsilon$-NFA for $(E^*)$, constructed from the $\epsilon$-NFA for $E$ with added $\epsilon$-transitions]

  11. Proof of Theorem 3.2.3 (Cont'd)
     - Induction I2': $(E + F)$
       [Figure: $\epsilon$-NFA for $(E + F)$, constructed from the $\epsilon$-NFAs for $E$ and $F$ with added $\epsilon$-transitions]

  12. Proof of Theorem 3.2.3 (Cont'd)
     - Induction I3': $(EF)$
       [Figure: $\epsilon$-NFA for $(EF)$, constructed from the $\epsilon$-NFAs for $E$ and $F$]
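The inductive proof can be turned into code that builds an $\epsilon$-NFA fragment for each rule and then simulates it. The following is a sketch only: it uses a Thompson-style construction with one start and one accepting state per fragment, which may differ in detail from the automata drawn on the slides. The class `NFA`, the functions `build`, `closure` and `accepts`, and the tuple encoding of expressions are all hypothetical names chosen for illustration.

```python
from itertools import count


class NFA:
    def __init__(self):
        self.eps = {}       # state -> set of states reached by an ε-move
        self.step = {}      # (state, symbol) -> set of successor states
        self._ids = count()

    def new_state(self):
        q = next(self._ids)
        self.eps[q] = set()
        return q

    def add(self, p, label, q):
        """Add a transition p -> q; label None means an ε-transition."""
        if label is None:
            self.eps[p].add(q)
        else:
            self.step.setdefault((p, label), set()).add(q)


def build(nfa, r):
    """Add a fragment for r to nfa and return its (start, accept) states."""
    s, t = nfa.new_state(), nfa.new_state()
    tag = r[0]
    if tag == "empty":
        pass                                  # accept state is unreachable
    elif tag == "eps":
        nfa.add(s, None, t)
    elif tag == "sym":
        nfa.add(s, r[1], t)
    elif tag == "plus":                       # I2': branch into E or F
        for sub in (r[1], r[2]):
            a, b = build(nfa, sub)
            nfa.add(s, None, a)
            nfa.add(b, None, t)
    elif tag == "cat":                        # I3': E followed by F
        a1, b1 = build(nfa, r[1])
        a2, b2 = build(nfa, r[2])
        nfa.add(s, None, a1)
        nfa.add(b1, None, a2)
        nfa.add(b2, None, t)
    elif tag == "star":                       # I1': skip E or repeat it
        a, b = build(nfa, r[1])
        nfa.add(s, None, a)
        nfa.add(s, None, t)
        nfa.add(b, None, a)
        nfa.add(b, None, t)
    else:
        raise ValueError(f"unknown tag: {tag!r}")
    return s, t


def closure(nfa, states):
    """All states reachable from `states` using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p in nfa.eps[q]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def accepts(r, w):
    """Decide whether w ∈ L(r) by simulating the ε-NFA built for r."""
    nfa = NFA()
    start, accept = build(nfa, r)
    current = closure(nfa, {start})
    for ch in w:
        moved = set()
        for q in current:
            moved |= nfa.step.get((q, ch), set())
        current = closure(nfa, moved)
    return accept in current


# (0 + (1(1*)))
E = ("plus", ("sym", "0"), ("cat", ("sym", "1"), ("star", ("sym", "1"))))
print([w for w in ["", "0", "1", "01", "111"] if accepts(E, w)])
```

The simulation tracks a set of current states, which is the same idea as the subset construction that converts the $\epsilon$-NFA into a DFA, carried out on the fly for one input string.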

  13. So Far...
     [Figure: nested classes of languages: finite languages $\subseteq$ regular languages $\subseteq$ languages accepted by DFAs, NFAs, $\epsilon$-NFAs]
     - Is the inclusion strict?
     - Are there languages accepted by DFAs that are not regular?

  14. DFAs and Regular Languages
     - Theorem 3.2.4: For every DFA $A$, there is a regular expression $E$ such that $L(A) = L(E)$.
     - Proof of Theorem 3.2.4:
       - Let DFA $A = (Q, \Sigma, \delta, q_0, F)$ be given.
       - Let us rename the states so that $Q = \{q_0, q_1, q_2, \ldots, q_{n-1}\}$.
       - For any string $s_1 \cdots s_k \in L(A)$, there is a path $q_0 \xrightarrow{s_1} q_{i_1} \xrightarrow{s_2} q_{i_2} \cdots \xrightarrow{s_k} q_{i_k} \in F$.
       - Define $R(i, j, k)$ to be the set of all input strings that move the internal state of $A$ from $q_i$ to $q_j$ using paths whose intermediate nodes consist only of states $q_\ell$ with $\ell < k$.
         [Figure: a path from $q_i$ to $q_j$ that passes only through states $q_0, \ldots, q_{k-1}$ and avoids states $q_k, \ldots, q_{n-1}$]
       - Idea: prove that (a) each $R(i, j, k)$ is regular, and (b) $L(A)$ is a union of $R(i, j, k)$'s.

  15. Proof of Theorem 3.2.4 (Cont'd)
     - Note that $L(A) = \bigcup_{j : q_j \in F} R(0, j, n)$ (i.e., paths that start in $q_0$ and end in an accepting state, with intermediate nodes among $q_0, q_1, \ldots, q_{n-1}$, that is, all nodes).
     - $L(A)$ will be regular if each $R(i, j, k)$ is regular. We now proceed by induction to show that each $R(i, j, k)$ is regular.
     - Basis: Consider $R(i, j, 0)$ for $i, j \in \{0, 1, \ldots, n-1\}$.
       - $R(i, j, 0)$ consists of strings whose corresponding paths start in $q_i$ and end in $q_j$ with intermediate nodes $q_\ell$, $\ell < 0$.
         $\Rightarrow$ No intermediate nodes
         $\Rightarrow$ $R(i, j, 0)$ contains only strings that change state $q_i$ to $q_j$ directly
         $\Rightarrow$ $R(i, j, 0) \subseteq \{\epsilon\} \cup \Sigma$, so $R(i, j, 0)$ is finite
         $\Rightarrow$ $R(i, j, 0)$ is a regular language [Corollary 2]
     - Induction: Let $R(i, j, \ell)$ be regular for $i, j \in \{0, \ldots, n-1\}$ and $0 \le \ell < k$. Consider $R(i, j, k)$ for $i, j \in \{0, \ldots, n-1\}$.

  16. Proof of Theorem 3.2.4 (Cont'd)
     - The strings in $R(i, j, k)$ correspond to paths whose intermediate nodes belong to $\{q_0, \ldots, q_{k-1}\}$.
     - Partition $R(i, j, k)$ as follows:
       - Case (a): Strings whose paths do not have $q_{k-1}$ as an intermediate node.
       - Case (b): Strings whose paths do pass through $q_{k-1}$ as an intermediate node.
       [Figure: a case (b) path from $q_i$ to $q_j$ that passes through $q_{k-1}$, with all other intermediate states among $q_0, \ldots, q_{k-2}$]
     - $R(i, j, k) = \{\text{Case (a) strings}\} \cup \{\text{Case (b) strings}\}$.
     - Case (a) strings are exactly those in $R(i, j, k-1)$.
     - Hence, $R(i, j, k) = R(i, j, k-1) \cup \{\text{Case (b) strings}\}$.
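The induction on $k$ can be run as a dynamic program over a table of expressions. The slides above stop before decomposing the case (b) strings, so the sketch below assumes the standard decomposition from HMU, $R(i, k-1, k-1)\,R(k-1, k-1, k-1)^*\,R(k-1, j, k-1)$: reach $q_{k-1}$ for the first time, loop on it any number of times, then leave it for the last time. Everything else is also an assumption made for illustration: the function names are hypothetical, states are the integers $0, \ldots, n-1$ with start state $0$, expressions are written in Python `re` syntax (with `|` in place of $+$), and `None` stands for $\emptyset$.

```python
import re
from itertools import product


def union(e, f):
    if e is None:
        return f
    if f is None:
        return e
    return e if e == f else f"(?:{e}|{f})"


def cat(e, f):
    if e is None or f is None:       # concatenating with ∅ gives ∅
        return None
    return e + f


def star(e):
    if e is None or e == "":         # ∅* = ε* = {ε}
        return ""
    return f"(?:{e})*"


def dfa_to_regex(n, alphabet, delta, accepting):
    """Pattern for L(A); delta maps (state, symbol) to a state."""
    # Basis, k = 0: no intermediate states, so only ε and single symbols.
    R = [[None] * n for _ in range(n)]
    for i in range(n):
        R[i][i] = ""
        for a in alphabet:
            j = delta[(i, a)]
            R[i][j] = union(R[i][j], re.escape(a))
    # Induction: allow state m = k - 1 as an extra intermediate state.
    for k in range(1, n + 1):
        m = k - 1
        loop = star(R[m][m])
        R = [[union(R[i][j], cat(cat(R[i][m], loop), R[m][j]))
              for j in range(n)]
             for i in range(n)]
    # L(A) is the union of R(0, j, n) over accepting states q_j.
    result = None
    for j in sorted(accepting):
        result = union(result, R[0][j])
    return result


# Hypothetical example DFA: binary strings with an even number of 1s.
delta = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
pattern = dfa_to_regex(2, "01", delta, {0})
for w in map("".join, product("01", repeat=3)):
    print(w, re.fullmatch(pattern, w) is not None)
```

The resulting expressions grow quickly with the number of states, since each round can roughly quadruple their length, so this is suited to small automata only.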
