
COMP3630/6360: Theory of Computation, Semester 1, 2020
The Australian National University
Regular Expressions

This Lecture Covers Chapter 3 of HMU: Regular Expressions and Languages
- Introduction to regular expressions and regular languages
- Equivalence of the class of regular languages and the class of languages accepted by finite automata
- Algebraic laws of (abstract) regular expressions
Additional Reading: Chapter 3 of HMU.

Regular Expressions and Languages

Regular Expressions: Overview
- So far, DFAs and NFAs have been given a machine-like description.
- Regular expressions offer a user-friendly, declarative formulation.
- Regular expressions are used extensively:
  - searching for strings, pattern matching, or checking conformance in text-processing tools (e.g., UNIX grep, egrep, fgrep);
  - lexical analyzers in compilers, which use regular expressions to identify tokens (e.g., Lex, Flex);
  - Web forms, to validate the structure of entries (passwords, dates, email IDs).
- A regular expression over an alphabet $\Sigma$ is a string consisting of:
  - symbols from $\Sigma$;
  - the constants $\emptyset$ and $\epsilon$;
  - the operators $+$ and $^*$ (concatenation is written by juxtaposition);
  - the parentheses ( and ).
- Regular expressions are defined inductively.
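
As a rough illustration of the lexical-analysis use case, here is a minimal tokenizer sketch in Python. The token classes, `TOKEN_SPEC`, `MASTER` and `tokenize` are hypothetical names, not from the slides. Note that Python's `re` module writes union as `|` (the slides use +), uses `+` for one-or-more repetition ($EE^*$), and supports extensions that go beyond regular expressions; the patterns below stick to union, concatenation, repetition and character classes.

```python
import re

# Hypothetical token classes for a toy language (not from the slides).
# In Python's re, | is union (the slides' +), + means "one or more" (EE*),
# and [...] is a character class, i.e. a union of single symbols.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("OP", r"[+*=()]"),
    ("SKIP", r"[ \t]+"),
]
# One master pattern: the union of all token classes, each in a named group.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, lexeme) pairs, discarding whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("x = 12 + y1"))
```

Unlike Lex, which prefers the longest match, Python's alternation takes the first alternative that matches at the current position, so the order of `TOKEN_SPEC` matters in general.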

Regular Expressions: Definition
- Regular expressions are defined inductively as follows:
  - Basis:
    B1. $\emptyset$ and $\epsilon$ are regular expressions.
    B2. For each $a \in \Sigma$, $a$ is a regular expression.
  - Induction: if $E$ and $F$ are regular expressions, then:

    Option 1 (HMU approach)      Option 2 (a 'precise' approach)
    I1. so is $E^*$              I1′. so is $(E^*)$
    I2. so is $E + F$            I2′. so is $(E + F)$
    I3. so is $EF$               I3′. so is $(EF)$
    I4. so is $(E)$

- Only expressions generated by the above induction are regular expressions.
- Remark 1: Some authors/texts use | instead of +; HMU uses +.
- Remark 2: Every expression generated by Option 2 is also generated by Option 1: applying I1 and then I4 yields I1′; likewise I2 then I4 yields I2′, and I3 then I4 yields I3′.
- Remark 3: Some expressions are regular according to Option 1 but not Option 2, e.g., $((0))$ and $0 + 11^*$.

Regular Expressions: Examples
- Let $\Sigma = \{0, 1\}$.
- $((((0 + 1)1)^*)0)$ is a regular expression (Option 2), and $0 + 11^*0$ is a regular expression (Option 1). Both are derived by the same sequence of steps:

  Rule    Expression (Option 2)       Rule   Expression (Option 1)
  (B2)    $0$                         (B2)   $0$
  (B2)    $1$                         (B2)   $1$
  (I2′)   $(0 + 1)$                   (I2)   $0 + 1$
  (I3′)   $((0 + 1)1)$                (I3)   $0 + 11$
  (I1′)   $(((0 + 1)1)^*)$            (I1)   $0 + 11^*$
  (I3′)   $((((0 + 1)1)^*)0)$         (I3)   $0 + 11^*0$
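
To make the contrast between the two options concrete, here is a small Python sketch that prints one derivation tree in both styles. It assumes a hypothetical tuple encoding of trees (`("sym", a)`, `("eps",)`, `("empty",)`, `("star", E)`, `("plus", E, F)`, `("cat", E, F)`); `option2` applies I1′ to I3′ and brackets every compound expression, while `option1` applies only I1 to I3 and never I4. ASCII `*` stands for $^*$.

```python
def option2(r):
    """Render with rules I1', I2', I3': every compound expression is bracketed."""
    kind = r[0]
    if kind == "sym":                      # B2
        return r[1]
    if kind == "eps":                      # B1
        return "ε"
    if kind == "empty":                    # B1
        return "∅"
    if kind == "star":                     # I1'
        return f"({option2(r[1])}*)"
    if kind == "plus":                     # I2'
        return f"({option2(r[1])} + {option2(r[2])})"
    if kind == "cat":                      # I3'
        return f"({option2(r[1])}{option2(r[2])})"
    raise ValueError(f"unknown node {kind!r}")


def option1(r):
    """Render with rules I1, I2, I3 only (I4 is never applied): no brackets."""
    kind = r[0]
    if kind in ("sym", "eps", "empty"):
        return option2(r)
    if kind == "star":                     # I1
        return f"{option1(r[1])}*"
    if kind == "plus":                     # I2
        return f"{option1(r[1])} + {option1(r[2])}"
    if kind == "cat":                      # I3
        return f"{option1(r[1])}{option1(r[2])}"
    raise ValueError(f"unknown node {kind!r}")


# The derivation in the table: (0 + 1), then ((0 + 1)1), then its star, then concatenate 0.
tree = ("cat",
        ("star", ("cat", ("plus", ("sym", "0"), ("sym", "1")), ("sym", "1"))),
        ("sym", "0"))
print(option2(tree))  # ((((0 + 1)1)*)0)
print(option1(tree))  # 0 + 11*0
```

The second output shows why Option 1 strings need disambiguation: the flat string no longer records which subexpression each operator was applied to.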

What Do Regular Expressions Stand For?
- Each properly parenthesized regular expression $E$ (i.e., one generated by Option 2) is shorthand for a language $L(E)$.
- A language is said to be regular if it equals $L(E)$ for some regular expression $E$. The correspondence $E \mapsto L(E)$ is defined by the following induction:
  - Basis:
    B1. $L(\emptyset) = \emptyset$ (the empty language); $L(\epsilon) = \{\epsilon\}$ (the language containing only the empty string).
    B2. $L(a) = \{a\}$ for $a \in \Sigma$ (the language containing only the symbol $a$).
  - Induction: for any regular expressions $E$ and $F$,
    I1′. $L((E^*)) = (L(E))^* = \{\epsilon\} \cup L(E) \cup L(E)^2 \cup \cdots$ (Kleene closure of $L(E)$);
    I2′. $L((E + F)) = L(E) \cup L(F)$ (union);
    I3′. $L((EF)) = L(E)L(F)$ (concatenation).
- What if a regular expression is generated by Option 1?
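
A small sketch of this inductive definition in Python, assuming the same kind of hypothetical tuple encoding of expressions (`("sym", a)`, `("eps",)`, `("empty",)`, `("star", E)`, `("plus", E, F)`, `("cat", E, F)`). Since $L((E^*))$ is usually infinite, the illustrative function `lang` returns only the strings of length at most `n`, which is enough to check small examples by hand.

```python
def lang(r, n):
    """The strings of L(r) of length at most n, following B1, B2 and I1' to I3'."""
    kind = r[0]
    if kind == "empty":                      # B1: L(∅) = ∅
        return set()
    if kind == "eps":                        # B1: L(ε) = {ε}
        return {""}
    if kind == "sym":                        # B2: L(a) = {a}
        return {r[1]} if n >= 1 else set()
    if kind == "plus":                       # I2': union
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":                        # I3': concatenation
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n) if len(x + y) <= n}
    if kind == "star":                       # I1': {ε} ∪ L ∪ L^2 ∪ ...
        base = lang(r[1], n) - {""}          # ε adds nothing beyond L^0
        result, frontier = {""}, {""}
        while frontier:                      # each round appends one more factor
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")


r = ("plus", ("sym", "0"), ("cat", ("sym", "1"), ("star", ("sym", "1"))))  # 0 + 11*
print(sorted(lang(r, 4), key=lambda w: (len(w), w)))  # ['0', '1', '11', '111', '1111']
```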

What If an Expression Is Not Bracketed Properly?
- Improperly parenthesized expressions lead to confusion: e.g., is $0 + 11$ the same as $(0 + 1)1$ or as $(0 + (11))$?
- Improperly parenthesized regular expressions (generated by Option 1) must be converted to properly parenthesized expressions:
  - Remove redundant parentheses by replacing $((E))$ with $(E)$, inductively.
  - Additionally, if $E$ is a symbol or a constant, replace $(E)$ by $(E + \emptyset)$; e.g., $((0)) \equiv (0 + \emptyset)$ and $((0 + 1)) \equiv (0 + 1)$.
  - Apply the precedence rules:
    First: $^*$ applies to the smallest (properly bracketable) expression preceding it, e.g., $01^* \equiv (0(1^*))$.
    Second: concatenation applies from left to right, e.g., $010 \equiv ((01)0)$.
    Third: $+$ applies from left to right, e.g., $a + b + c \equiv ((a + b) + c)$.
- Examples:
  - $0 + 11^* \equiv (0 + (1(1^*)))$, and $L(0 + 11^*) = L(((0)) + 11^*) = L(0) \cup L(1)L(1)^* = \{0, 1, 11, 111, 1111, \ldots\}$.
  - $((0)) + 11^* \equiv ((0 + \emptyset) + (1(1^*)))$.
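
The precedence rules amount to a recursive-descent parser with one function per precedence level ($^*$, then concatenation, then $+$, both binary operators grouping to the left). The Python sketch below, with the hypothetical name `to_option2`, turns an Option 1 expression into a fully parenthesized one. One assumption departs from the slide: redundant parentheses are simply dropped, so `((0))` becomes `0` rather than $(0 + \emptyset)$; both denote $\{0\}$.

```python
def to_option2(s: str) -> str:
    """Parse an Option 1 expression with the precedence rules and return a
    fully parenthesized (Option 2 style) string. Any character other than
    + * ( ) and whitespace is treated as a symbol, including ε and ∅."""
    toks = [c for c in s if not c.isspace()]
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def eat(c):
        nonlocal pos
        if peek() != c:
            raise SyntaxError(f"expected {c!r} at position {pos}")
        pos += 1

    def atom():                       # a symbol, a constant, or ( expr )
        nonlocal pos
        c = peek()
        if c == "(":
            eat("(")
            e = expr()
            eat(")")
            return e                  # redundant brackets are dropped here
        if c is None or c in "+*)":
            raise SyntaxError(f"unexpected {c!r} at position {pos}")
        pos += 1
        return c

    def factor():                     # first: * binds to the preceding atom
        e = atom()
        while peek() == "*":
            eat("*")
            e = f"({e}*)"
        return e

    def term():                       # second: concatenation, left to right
        e = factor()
        while peek() is not None and peek() not in "+)":
            e = f"({e}{factor()})"
        return e

    def expr():                       # third: +, left to right
        e = term()
        while peek() == "+":
            eat("+")
            e = f"({e}+{term()})"
        return e

    result = expr()
    if pos != len(toks):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return result


for s in ["0 + 11*", "01*", "010", "a + b + c", "0 + 11*0"]:
    print(s, "->", to_option2(s))
```

For instance, `0 + 11*0` parses as `(0+((1(1*))0))`, which is not the expression $((((0 + 1)1)^*)0)$ from which that string was derived in the examples table.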

DFAs and Regular Languages

Regular Languages: Some Basic Properties
Theorem 3.2.1. Let $w \in \Sigma^*$. Then $\{w\}$ is regular.
Proof of Theorem 3.2.1
- The languages $\{\epsilon\}$ and $\{a\}$ for $a \in \Sigma$ are regular (B1, B2). A straightforward induction shows that for any $k \in \mathbb{N}$ and $w = s_1 \cdots s_k \in \Sigma^k$, $\{w\} = L(s_1 s_2 \cdots s_k)$.
Theorem 3.2.2. Let $L_1$ and $L_2$ be regular languages. Then $L_1^*$, $L_1 \cup L_2$ and $L_1 L_2$ are also regular.
Proof of Theorem 3.2.2
- Let $L_i = L(E_i)$ for $i = 1, 2$. Then $L_1^* = L((E_1^*))$, $L_1 \cup L_2 = L((E_1 + E_2))$ and $L_1 L_2 = L((E_1 E_2))$. Since $(E_1^*)$, $(E_1 + E_2)$ and $(E_1 E_2)$ are regular expressions, the claim holds.
- Corollary 1: The class of regular languages is closed under finite union and concatenation, i.e., if $L_1, \ldots, L_k$ are regular languages for any $k \in \mathbb{N}$, then $L_1 \cup \cdots \cup L_k$ and $L_1 \cdots L_k$ are also regular.
- Corollary 2: Any finite language is regular: a nonempty finite language $\{w_1, \ldots, w_k\}$ equals $\{w_1\} \cup \cdots \cup \{w_k\}$, which is regular by Theorem 3.2.1 and Corollary 1, and the empty language is $L(\emptyset)$.
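
Corollary 2's construction can be sketched directly: write each word as the concatenation of its symbols (or $\epsilon$ for the empty word) and join the words with +. The Python below is an illustration with hypothetical function names. To check the result, it also builds the corresponding pattern for Python's `re` module, where union is written `|`, an empty alternative plays the role of $\epsilon$, and the never-matching `(?!)` plays the role of $\emptyset$.

```python
import re


def finite_language_regex(words):
    """A regular expression (slides' notation) for a finite language:
    each word s1...sk becomes s1 s2 ... sk (Theorem 3.2.1), or ε if it is
    empty; the words are joined with + (Corollary 1); no words gives ∅."""
    ws = sorted(set(words))
    if not ws:
        return "∅"
    return " + ".join(w if w else "ε" for w in ws)


def finite_language_pattern(words):
    """The same union written for Python's re module, where union is |."""
    ws = sorted(set(words))
    if not ws:
        return "(?!)"                          # never matches, like ∅
    return "|".join(re.escape(w) for w in ws)  # the empty word gives an empty alternative


M = {"01", "1", ""}
print(finite_language_regex(M))  # ε + 01 + 1
pat = finite_language_pattern(M)
print([w for w in ["", "0", "1", "01", "10", "011"] if re.fullmatch(pat, w)])  # ['', '1', '01']
```

The expression produced is not fully bracketed; under the precedence rules for Option 1 expressions, `ε + 01 + 1` reads as $((\epsilon + (01)) + 1)$.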

Theorem 3.2.3. For every regular language $M$, there exists a DFA $A$ such that $M = L(A)$.
Proof of Theorem 3.2.3
- WLOG, let $\Sigma = \{0, 1\}$. Let $M$ be a regular language. Then $M = L(E)$ for some regular expression $E$.
- For each regular expression, we will devise an $\epsilon$-NFA accepting its language; since every $\epsilon$-NFA can be converted into an equivalent DFA, this suffices.
- Basis:
[Figure: basis automata $A$ for the expressions $\emptyset$, $\epsilon$, $0$ and $1$]

Proof of Theorem 3.2.3 (Cont’d)
- Induction I1′: $(E^*)$
[Figure: $\epsilon$-NFA for $(E^*)$, built from the $\epsilon$-NFA for $E$ by adding $\epsilon$-transitions]

Proof of Theorem 3.2.3 (Cont’d)
- Induction I2′: $(E + F)$
[Figure: $\epsilon$-NFA for $(E + F)$, combining the $\epsilon$-NFAs for $E$ and $F$ in parallel via $\epsilon$-transitions]

Proof of Theorem 3.2.3 (Cont’d)
- Induction I3′: $(EF)$
[Figure: $\epsilon$-NFA for $(EF)$, with the $\epsilon$-NFA for $E$ followed by the $\epsilon$-NFA for $F$]
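
The construction figures did not survive extraction, so the following is only a sketch of the standard Thompson-style construction in Python, which may differ in detail from the wiring drawn on the slides. It assumes a hypothetical tuple encoding of expressions (`("sym", a)`, `("eps",)`, `("empty",)`, `("star", E)`, `("plus", E, F)`, `("cat", E, F)`); `build`, `eclose` and `accepts` are illustrative names. `accepts` keeps the set of states the $\epsilon$-NFA could be in, which is the subset construction carried out on the fly.

```python
from itertools import count

EPS = None  # label used for ε-transitions


def build(r, fresh, delta):
    """Add an ε-NFA for expression r to the edge list delta and return its
    (start, accept) pair; fresh supplies new state numbers."""
    kind = r[0]
    s, t = next(fresh), next(fresh)
    if kind == "empty":                    # basis ∅: no path from s to t
        pass
    elif kind == "eps":                    # basis ε
        delta.append((s, EPS, t))
    elif kind == "sym":                    # basis: a symbol a
        delta.append((s, r[1], t))
    elif kind == "star":                   # I1': (E*)
        es, et = build(r[1], fresh, delta)
        delta.extend([(s, EPS, es), (et, EPS, es), (et, EPS, t), (s, EPS, t)])
    elif kind == "plus":                   # I2': (E + F), E and F in parallel
        es, et = build(r[1], fresh, delta)
        fs, ft = build(r[2], fresh, delta)
        delta.extend([(s, EPS, es), (s, EPS, fs), (et, EPS, t), (ft, EPS, t)])
    elif kind == "cat":                    # I3': (EF), E followed by F
        es, et = build(r[1], fresh, delta)
        fs, ft = build(r[2], fresh, delta)
        delta.extend([(s, EPS, es), (et, EPS, fs), (ft, EPS, t)])
    else:
        raise ValueError(f"unknown node {kind!r}")
    return s, t


def eclose(states, delta):
    """All states reachable from `states` using ε-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p, a, t in delta:
            if p == q and a is EPS and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def accepts(r, w):
    """Run the ε-NFA for r on w, tracking the set of current states."""
    delta = []
    start, accept = build(r, count(), delta)
    current = eclose({start}, delta)
    for c in w:
        current = eclose({t for p, a, t in delta if p in current and a == c}, delta)
    return accept in current


r = ("plus", ("sym", "0"), ("cat", ("sym", "1"), ("star", ("sym", "1"))))  # 0 + 11*
print([w for w in ["", "0", "1", "01", "111"] if accepts(r, w)])
```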

So Far...
Finite languages $\subseteq$ Regular languages $\subseteq$ Languages accepted by DFAs, NFAs, $\epsilon$-NFAs
- Is the inclusion strict?
- Are there languages accepted by DFAs that are not regular?

Theorem 3.2.4. For every DFA $A$, there is a regular expression $E$ such that $L(A) = L(E)$.
Proof of Theorem 3.2.4
- Let a DFA $A = (Q, \Sigma, \delta, q_0, F)$ be given.
- Rename the states so that $Q = \{q_0, q_1, q_2, \ldots, q_{n-1}\}$.
- For any string $s_1 \cdots s_k \in L(A)$, there is a path $q_0 \xrightarrow{s_1} q_{i_1} \xrightarrow{s_2} q_{i_2} \cdots \xrightarrow{s_k} q_{i_k} \in F$.
- Define $R(i, j, k)$ to be the set of all input strings that move the internal state of $A$ from $q_i$ to $q_j$ along paths whose intermediate nodes consist only of states $q_\ell$ with $\ell < k$.
[Figure: a path from $q_i$ to $q_j$ whose intermediate nodes lie in $\{q_0, \ldots, q_{k-1}\}$, with no intermediate node among $q_k, \ldots, q_{n-1}$]
- Idea: prove that (a) each $R(i, j, k)$ is regular, and (b) $L(A)$ is a union of such $R(i, j, k)$'s.

Proof of Theorem 3.2.4 (Cont’d)
- Note that $L(A) = \bigcup_{j : q_j \in F} R(0, j, n)$, i.e., the strings whose paths start in $q_0$ and end in an accepting state, with intermediate nodes among $q_0, q_1, \ldots, q_{n-1}$ (all nodes).
- Hence $L(A)$ is regular if each $R(i, j, k)$ is regular (Corollary 1). We now show by induction on $k$ that each $R(i, j, k)$ is regular.
- Basis: Consider $R(i, j, 0)$ for $i, j \in \{0, 1, \ldots, n-1\}$.
  - $R(i, j, 0)$ consists of the strings whose paths start in $q_i$ and end in $q_j$ with intermediate nodes $q_\ell$, $\ell < 0$
    ⇒ there are no intermediate nodes
    ⇒ $R(i, j, 0)$ contains only strings that take $q_i$ to $q_j$ directly, namely each $a \in \Sigma$ with $\delta(q_i, a) = q_j$, together with $\epsilon$ if $i = j$
    ⇒ $R(i, j, 0) \subseteq \{\epsilon\} \cup \Sigma$
    ⇒ $R(i, j, 0)$ is a regular language [Corollary 2].
- Induction: Suppose $R(i, j, \ell)$ is regular for all $i, j \in \{0, \ldots, n-1\}$ and $0 \le \ell < k$. Consider $R(i, j, k)$ for $i, j \in \{0, \ldots, n-1\}$.
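
The definition of $R(i, j, k)$ can be checked by brute force on a small example. The Python sketch below uses a hypothetical three-state DFA (not from the slides) that accepts the strings over $\{0, 1\}$ ending in 01, and enumerates all strings up to a fixed length `MAXLEN`. It is only a sanity check of the basis case and of $L(A) = \bigcup_{j : q_j \in F} R(0, j, n)$ on that DFA, not a construction of the regular expressions themselves.

```python
from itertools import product

SIGMA = "01"
# Hypothetical DFA (not from the slides): states q0, q1, q2, start q0,
# accepting F = {q2}; it accepts exactly the strings that end in 01.
DELTA = {(0, "0"): 1, (0, "1"): 0,
         (1, "0"): 1, (1, "1"): 2,
         (2, "0"): 1, (2, "1"): 0}
N, START, FINAL = 3, 0, {2}
MAXLEN = 6  # only strings up to this length are enumerated


def path(i, w):
    """The unique sequence of states A visits when reading w from q_i."""
    states = [i]
    for a in w:
        states.append(DELTA[(states[-1], a)])
    return states


def strings(maxlen):
    """All strings over SIGMA of length at most maxlen."""
    return ["".join(p) for m in range(maxlen + 1) for p in product(SIGMA, repeat=m)]


def R(i, j, k, maxlen):
    """R(i, j, k) restricted to strings of length <= maxlen: the path from q_i
    ends in q_j and every intermediate node q_ell (endpoints excluded) has ell < k."""
    result = set()
    for w in strings(maxlen):
        p = path(i, w)
        if p[-1] == j and all(ell < k for ell in p[1:-1]):
            result.add(w)
    return result


# Basis: R(i, j, 0) = {a in SIGMA : delta(q_i, a) = q_j}, plus the empty string if i = j.
for i, j in product(range(N), repeat=2):
    direct = {a for a in SIGMA if DELTA[(i, a)] == j}
    assert R(i, j, 0, MAXLEN) == direct | ({""} if i == j else set())

# L(A) is the union of R(0, j, n) over accepting states q_j, where n = N.
accepted = {w for w in strings(MAXLEN) if path(START, w)[-1] in FINAL}
assert accepted == set().union(*(R(START, j, N, MAXLEN) for j in FINAL))
```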

Proof of Theorem 3.2.4 (Cont’d)
- The strings in $R(i, j, k)$ correspond to paths whose intermediate nodes belong to $\{q_0, \ldots, q_{k-1}\}$. (Since $A$ is deterministic, each string determines a unique path from $q_i$.)
- Partition $R(i, j, k)$ as follows:
  - Case (a): strings whose paths do not have $q_{k-1}$ as an intermediate node;
  - Case (b): strings whose paths do pass through $q_{k-1}$ as an intermediate node.
[Figure: a case (b) path from $q_i$ to $q_j$ through $q_{k-1}$, with its other intermediate nodes in $\{q_0, \ldots, q_{k-2}\}$]
- $R(i, j, k) = \{\text{Case (a) strings}\} \cup \{\text{Case (b) strings}\}$.
- Case (a) strings are exactly those in $R(i, j, k-1)$.
- Hence $R(i, j, k) = R(i, j, k-1) \cup \{\text{Case (b) strings}\}$.
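
The partition into Case (a) and Case (b) can be checked the same way. This self-contained Python sketch uses another hypothetical DFA (not from the slides) whose state counts the 1s read so far modulo 3, restricts every $R(i, j, k)$ to strings of length at most 6, and asserts that Case (a) is exactly $R(i, j, k-1)$ for every $i, j$ and $1 \le k \le n$; the helper names are illustrative.

```python
from itertools import product

SIGMA = "01"
N = 3  # number of states q0, q1, q2


def step(q, a):
    """Hypothetical DFA (not from the slides): q counts the 1s read so far, mod 3."""
    return (q + 1) % N if a == "1" else q


def walk(i, w):
    """Intermediate nodes and final state of the unique path followed on w from q_i."""
    states = [i]
    for a in w:
        states.append(step(states[-1], a))
    return states[1:-1], states[-1]


WORDS = ["".join(p) for m in range(7) for p in product(SIGMA, repeat=m)]


def R(i, j, k):
    """R(i, j, k) restricted to strings of length at most 6."""
    result = set()
    for w in WORDS:
        mids, end = walk(i, w)
        if end == j and all(ell < k for ell in mids):
            result.add(w)
    return result


for i, j, k in product(range(N), range(N), range(1, N + 1)):
    full = R(i, j, k)
    case_a = {w for w in full if k - 1 not in walk(i, w)[0]}  # avoids q_{k-1}
    case_b = full - case_a                                   # passes through q_{k-1}
    assert case_a == R(i, j, k - 1)           # Case (a) is exactly R(i, j, k-1)
    assert full == R(i, j, k - 1) | case_b    # R(i, j, k) = R(i, j, k-1) ∪ Case (b)
print("partition verified")
```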
