 
Regular Expressions

Regular Expressions
- Another means to describe the languages accepted by finite automata.
- In some books, regular languages are, by definition, the languages described using regular expressions.

Specifying Languages
- Recall: how do we specify languages?
  - If the language is finite, you can list all of its strings: L = {a, aa, aba, aca}
  - Descriptive: L = {x | n_a(x) = n_b(x)}, where n_a(x) is the number of a's in x
  - Using basic language operations: L = {aa, ab}* ∪ {b}{bb}*
- Regular languages are described using this last method.

Regular Languages
- A regular expression describes a language using only the set operations of:
  - Union
  - Concatenation
  - Kleene Star

Kleene Star Operation
- The set of strings that can be obtained by concatenating any number of elements of a language L is called the Kleene Star of L, written L*:
  L* = L^0 ∪ L^1 ∪ L^2 ∪ L^3 ∪ L^4 ∪ … (the union of L^i over all i ≥ 0)
- Note that since L* contains L^0 = {λ}, λ is an element of L*.

Regular Expressions
- Regular expressions are the mechanism by which regular languages are described. Take the "set operation" definition of the language and:
  - Replace ∪ with +
  - Replace {} with ()
- And you have a regular expression. For example, {aa, ab}* ∪ {b}{bb}* becomes (aa + ab)* + (b)(bb)*.
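Because the union defining L* has infinitely many terms, a program can only list a finite slice of it. The sketch below assumes languages are represented as Python sets of strings and cuts L* off at a length bound `max_len`. The names `concat` and `kleene_star` are illustrative, not part of any library.

```python
def concat(A, B):
    """Concatenation of two languages: every string of A followed by every string of B."""
    return {a + b for a in A for b in B}

def kleene_star(L, max_len):
    """Strings of L* = L^0 ∪ L^1 ∪ L^2 ∪ ... whose length is at most max_len."""
    result = {""}                      # L^0 = {λ}, so λ is always in L*
    while True:
        # Append one more string of L to everything found so far (L^i -> L^(i+1)).
        grown = result | {w for w in concat(result, L) if len(w) <= max_len}
        if grown == result:            # no new short strings: the slice is complete
            return result
        result = grown

print(sorted(kleene_star({"a", "bb"}, 3), key=lambda w: (len(w), w)))
print(kleene_star(set(), 5))           # ∅* contains only λ
```

The loop stops at a fixpoint instead of a fixed number of powers. That way it also terminates when λ ∈ L, because in that case the powers L^i overlap.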
Regular Expressions
  Language                       Regular expression
  {λ}                            λ
  {011}                          011
  {0, 1}                         0 + 1
  {0, 01}                        0 + 01
  {110}*{0, 1}                   (110)*(0 + 1)
  {10, 11, 01}*                  (10 + 11 + 01)*
  {0, 11}*({11}* ∪ {101, λ})     (0 + 11)*((11)* + 101 + λ)

Regular Expression
- Recursive definition of regular languages / regular expressions over Σ:
  1. ∅ is a regular language and its regular expression is ∅.
  2. {λ} is a regular language and λ is its regular expression.
  3. For each a ∈ Σ, {a} is a regular language and its regular expression is a.
  4. If L₁ and L₂ are regular languages with regular expressions r₁ and r₂, then
     -- L₁ ∪ L₂ is a regular language with regular expression (r₁ + r₂)
     -- L₁L₂ is a regular language with regular expression (r₁r₂); the book uses (r₁•r₂)
     -- L₁* is a regular language with regular expression (r₁*)
     -- Regular expressions can be parenthesized to indicate operator precedence, i.e. (r₁)
- Only languages obtainable by using rules 1-4 are regular languages.

Regular Expressions
- Some shorthand
  - If we assign precedence to the operators, we can relax the fully parenthesized definition:
    - Kleene star has the highest precedence
    - Concatenation has middle precedence
    - + has the lowest precedence
  - Thus a + b*c is the same as (a + ((b*)c))
  - (a + b)* is not the same as a + b*

Regular Expressions
- More shorthand
  - Equating regular expressions: two regular expressions are considered equal if they describe the same language.
  - 1*1* = 1*
  - (a + b)* ≠ a + b*

Regular Expressions
- Even more shorthand
  - Sometimes you might see in the book:
    - r^n, where n indicates the number of concatenations of r (e.g. r^6)
    - r^+ to indicate one or more concatenations of r
  - Note that this is only shorthand! r^6 and r^+ are not regular expressions; they abbreviate rrrrrr and rr*.
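The four rules can be mirrored as a small data type with one node type per rule. This is a minimal sketch: the class names and the `lang` helper are hypothetical. `lang(r, n)` lists only the strings of length at most n, so agreement up to a bound is evidence for an equality such as 1*1* = 1*, not a proof.

```python
from dataclasses import dataclass

# One node type per rule of the recursive definition (illustrative names).
@dataclass(frozen=True)
class Empty:            # rule 1: ∅
    pass

@dataclass(frozen=True)
class Lam:              # rule 2: λ
    pass

@dataclass(frozen=True)
class Sym:              # rule 3: a single symbol a ∈ Σ
    a: str

@dataclass(frozen=True)
class Union:            # rule 4: (r1 + r2)
    r1: object
    r2: object

@dataclass(frozen=True)
class Cat:              # rule 4: (r1 r2)
    r1: object
    r2: object

@dataclass(frozen=True)
class Star:             # rule 4: (r1*)
    r: object

def lang(r, n):
    """Strings of length <= n in the language described by r."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Lam):
        return {""}
    if isinstance(r, Sym):
        return {r.a} if len(r.a) <= n else set()
    if isinstance(r, Union):
        return lang(r.r1, n) | lang(r.r2, n)
    if isinstance(r, Cat):
        return {x + y for x in lang(r.r1, n) for y in lang(r.r2, n) if len(x + y) <= n}
    if isinstance(r, Star):
        base = lang(r.r, n)
        result = {""}                      # L^0 = {λ}
        while True:                        # grow until no new short strings appear
            grown = result | {x + y for x in result for y in base if len(x + y) <= n}
            if grown == result:
                return result
            result = grown
    raise TypeError(f"not a regular expression node: {r!r}")

one, a, b = Sym("1"), Sym("a"), Sym("b")
# 1*1* and 1* agree on every string up to length 6.
assert lang(Cat(Star(one), Star(one)), 6) == lang(Star(one), 6)
# (a + b)* contains ab, but a + b* does not.
assert "ab" in lang(Star(Union(a, b)), 4)
assert "ab" not in lang(Union(a, Star(b)), 4)
```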
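Most programming-language regex libraries use a different concrete syntax. In Python's `re` module, union is written `|` and `+` means one or more repetitions (the r^+ shorthand), so the textbook + has to be translated to `|`. Assuming that translation, `re.fullmatch` shows the precedence rules at work.

```python
import re

# Textbook a + b*c  ->  a|b*c, which re parses as a|((b*)c).
r1 = re.compile(r"a|b*c")
# Textbook (a + b)*  ->  (a|b)*.
r2 = re.compile(r"(a|b)*")

for w in ["a", "c", "bbc", "ab", "ba", ""]:
    # fullmatch requires the whole string to match, like testing membership w ∈ L(r).
    print(repr(w), bool(r1.fullmatch(w)), bool(r2.fullmatch(w)))
```

Star binds tighter than concatenation, which binds tighter than `|`. For that reason r1 accepts c and bbc but not ab, while r2 accepts ab, ba, and the empty string.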
Regular Expressions
- Important thing to remember
  - A regular expression is not a language.
  - A regular expression is used to describe a language.
  - It is incorrect to say that for a language L, L = (a + b + c)*.
  - But it's okay to say that L is described by (a + b + c)*.

Regular Expressions
- Questions?

Examples of Regular Languages
- All finite languages can be described using regular expressions.
- Can anyone tell me why?

Examples of Regular Languages
- All finite languages can be described by regular expressions.
- A finite language L can be expressed as the union of languages, each containing exactly one string of L.
- Example:
  - L = {a, aa, aba, aca}
  - L = {a} ∪ {aa} ∪ {aba} ∪ {aca}
  - Regular expression: (a + aa + aba + aca)

Examples of Regular Languages
- L = {x ∈ {0,1}* | |x| is even}
  - Any string of even length can be obtained by concatenating strings of length 2.
  - Any concatenation of strings of length 2 has even length.
  - L = {00, 01, 10, 11}*
- Regular expressions describing L:
  - (00 + 01 + 10 + 11)*
  - ((0 + 1)(0 + 1))*

Examples of Regular Languages
- L = {x ∈ {0,1}* | x does not end in 01}
  - If x does not end in 01, then either
    - |x| < 2, or
    - x ends in 00, 10, or 11
  - A regular expression that describes L is:
    λ + 0 + 1 + (0 + 1)*(00 + 10 + 11)
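The union construction is mechanical enough to automate. The sketch below assumes Python's `re` syntax (`|` for +) and uses a hypothetical helper name, `finite_language_regex`. It escapes each string so that characters with a special meaning in `re` are matched literally.

```python
import re
from itertools import product

def finite_language_regex(strings):
    """Union of one-string languages: {w1, ..., wk} becomes w1|...|wk."""
    if not strings:
        # ∅ has no strings to list; (?!) is a Python pattern that matches nothing.
        return "(?!)"
    # re.escape protects metacharacters; sorting only makes the output deterministic.
    return "|".join(re.escape(w) for w in sorted(strings))

L = {"a", "aa", "aba", "aca"}
pattern = re.compile(finite_language_regex(L))
print(pattern.pattern)   # a|aa|aba|aca

# Membership agrees with L for every string over {a, b, c} of length <= 4.
for n in range(5):
    for chars in product("abc", repeat=n):
        w = "".join(chars)
        assert bool(pattern.fullmatch(w)) == (w in L)
```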
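Claims like these can be spot-checked by brute force: enumerate every binary string up to some length and compare the regular expression against the set-builder definition. Below is a sketch in Python. It assumes the textbook expressions are translated into `re` syntax, where + becomes `|` and λ becomes an empty alternative. Passing up to length 10 is a sanity check, not a proof.

```python
import re
from itertools import product

def all_strings(alphabet, max_len):
    """Every string over the alphabet with length <= max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

even = re.compile(r"(00|01|10|11)*")             # (00 + 01 + 10 + 11)*
even_alt = re.compile(r"((0|1)(0|1))*")          # ((0 + 1)(0 + 1))*
not_01 = re.compile(r"|0|1|(0|1)*(00|10|11)")    # λ + 0 + 1 + (0 + 1)*(00 + 10 + 11)

for w in all_strings("01", 10):
    assert bool(even.fullmatch(w)) == (len(w) % 2 == 0)
    assert bool(even_alt.fullmatch(w)) == (len(w) % 2 == 0)
    assert bool(not_01.fullmatch(w)) == (not w.endswith("01"))
print("checked all strings up to length 10")
```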
Examples of Regular Languages
- L = {x ∈ {0,1}* | x contains an odd number of 0s}
  - Express x = yz, where
    - y is a string of the form y = 1^i 0 1^j (everything up to and including the first 0, plus all the 1s that follow it)
    - z contains an even number of additional 0s, so z has the form (0 1^k 0 1^m)*, where k and m may differ in each repetition
  - x can be described by (1*01*)(01*01*)*
- Questions?

Useful properties of regular expressions
- Commutative
  - L + M = M + L (concatenation is not commutative: ab ≠ ba)
- Associative
  - (L + M) + N = L + (M + N)
  - (LM)N = L(MN)
- Identities
  - ∅ + L = L + ∅ = L
  - λL = Lλ = L
  - ∅L = L∅ = ∅

Useful properties of regular expressions
- Distributive
  - L(M + N) = LM + LN
  - (M + N)L = ML + NL
- Idempotent
  - L + L = L

Useful properties of regular expressions
- Closures
  - (L*)* = L*
  - ∅* = λ
  - λ* = λ
  - L^+ = LL*
  - L* = L^+ + λ
- Questions?

Practical uses for regular expressions
- grep
  - Global (search for) Regular Expression and Print
  - Finds patterns of characters in a text file.
  - grep man foo.txt
  - grep -E '[ab]*c[de]?' foo.txt (quote the pattern so the shell does not expand it; -E selects extended syntax, in which ? means "optional")

Practical uses for regular expressions
- How a compiler works
  - Source file → lexer → stream of tokens → parser → parse tree → codegen → object code
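For finite languages the laws can be checked directly with Python sets, taking + as set union and writing concatenation as a small hypothetical helper, `cat`. The particular sets L, M, N are arbitrary choices, so the checks confirm the laws only on these examples. Star-based laws such as (L*)* = L* involve infinite sets and are left out.

```python
def cat(A, B):
    """Concatenation of languages: {ab | a ∈ A, b ∈ B}."""
    return {a + b for a in A for b in B}

EMPTY = set()        # ∅
LAMBDA = {""}        # {λ}

L = {"a", "ab"}
M = {"b", ""}
N = {"ba"}

assert L | M == M | L                                # commutative (+)
assert cat(L, M) != cat(M, L)                        # concatenation is not commutative
assert (L | M) | N == L | (M | N)                    # associative (+)
assert cat(cat(L, M), N) == cat(L, cat(M, N))        # associative (concatenation)
assert EMPTY | L == L | EMPTY == L                   # ∅ + L = L + ∅ = L
assert cat(LAMBDA, L) == cat(L, LAMBDA) == L         # λL = Lλ = L
assert cat(EMPTY, L) == cat(L, EMPTY) == EMPTY       # ∅L = L∅ = ∅
assert cat(L, M | N) == cat(L, M) | cat(L, N)        # distributive (left)
assert cat(M | N, L) == cat(M, L) | cat(N, L)        # distributive (right)
assert L | L == L                                    # idempotent
```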
Practical uses for regular expressions
- How a compiler works
  - The lexical analyzer (lexer) reads source code and generates a stream of tokens.
  - What is a token?
    - Identifier
    - Keyword
    - Number
    - Operator
    - Punctuation

Practical uses for regular expressions
- How a compiler works
  - Tokens can be described using regular expressions!

Examples of Regular Languages
- L = set of valid C identifiers
  - A valid C identifier begins with a letter or _.
  - A valid C identifier contains letters, digits, and _.
  - If we let:
    - l = {a, b, …, z, A, B, …, Z}
    - d = {1, 2, …, 9, 0}
  - Then a regular expression for L is (l + _)(l + d + _)*, where l stands for the union of its letters and d for the union of its digits.

Examples of Regular Languages
- L = set of valid C keywords
  - This is a finite set.
  - L can be described by:
    if + else + while + do + goto + break + switch + …

Summary
- Regular languages can be expressed using only the set operations of union, concatenation, and Kleene star.
- Regular languages
  - Means of describing: regular expressions
  - Machine for accepting: finite automata
- Practical uses
  - Text search (grep)
  - Compilers / lexical analysis (lex)
- Questions?

Practical uses for regular expressions
- lex
  - A program that creates a lexical analyzer.
  - Input: the set of valid tokens
  - Tokens are given by regular expressions.
- Questions?
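In Python's `re` syntax, the sets l and d can be written as character classes, which gives a direct transcription of (l + _)(l + d + _)*. This sketch follows the slide's simplified rules. As a result it also accepts keywords such as `while`, which a real C lexer has to handle separately.

```python
import re

# (l + _)(l + d + _)* with l = [A-Za-z] and d = [0-9].
c_identifier = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

for s in ["x", "_tmp", "count2", "2fast", "a-b", ""]:
    print(repr(s), bool(c_identifier.fullmatch(s)))
```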
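A lex-style lexer can be approximated by combining one regular expression per token class into a single pattern with named groups. This is a hypothetical sketch, not lex itself. The token names and the short keyword list are assumptions. Python's alternation also works differently from lex: it picks the first alternative that matches at each position, whereas lex picks the longest match and breaks ties by rule order. The `\b` after the keywords keeps `whilex` from being split into a keyword and an identifier.

```python
import re

# Hypothetical token classes for a tiny C-like language; order matters because
# Python tries the alternatives left to right.
TOKEN_SPEC = [
    ("SKIP",        r"\s+"),
    ("KEYWORD",     r"(?:if|else|while|do|goto|break|switch)\b"),
    ("IDENTIFIER",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",      r"[0-9]+"),
    ("OPERATOR",    r"==|[+\-*/=<>]"),
    ("PUNCTUATION", r"[(){};,]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Yield (kind, text) pairs; raise SyntaxError on a character no pattern matches."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":          # whitespace separates tokens but is dropped
            yield m.lastgroup, m.group()
        pos = m.end()

print(list(tokenize("while (x1 < 10) x1 = x1 + 1;")))
```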
For next time
- Chicken or the egg?
  - Which came first, the regular expression or the finite automaton?
  - McCulloch/Pitts -- used finite automata to model neural networks (1943)
  - Kleene (mid 1950s) -- applied them to regular sets
  - Ken Thompson/Bell Labs folk (1970s) -- QED / ed / grep / lex / awk / …
  - Recall: Princeton dudes (1937)

The bottom line
- Regular expressions and finite automata are equivalent in their ability to describe languages.
  - Every regular expression has an FA that accepts the language it describes.
  - The language accepted by an FA can be described by some regular expression.
- The Kleene Theorem! (1956)
- But that's next time…
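As a preview of that equivalence, here is a two-state DFA for the earlier language of binary strings with an odd number of 0s. It is checked against the regular expression (1*01*)(01*01*)* from that example. This is a sketch in Python with hypothetical names (`TRANSITIONS`, `dfa_accepts`). Agreement on all strings up to length 12 illustrates that the two describe the same language but does not prove it; Kleene's theorem is what guarantees such a pair exists in general.

```python
import re
from itertools import product

# Two-state DFA over {0,1}: the state records the parity of the 0s read so far.
TRANSITIONS = {
    ("even", "0"): "odd",
    ("even", "1"): "even",
    ("odd", "0"): "even",
    ("odd", "1"): "odd",
}
START, ACCEPTING = "even", {"odd"}

def dfa_accepts(w):
    """Run the DFA on w and report whether it halts in an accepting state."""
    state = START
    for ch in w:
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING

# The regular expression from the "odd number of 0s" example, in Python re syntax.
odd_zeros = re.compile(r"(1*01*)(01*01*)*")

for n in range(13):
    for chars in product("01", repeat=n):
        w = "".join(chars)
        assert dfa_accepts(w) == bool(odd_zeros.fullmatch(w))
print("DFA and regular expression agree on all strings up to length 12")
```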