taaltheorie en taalverwerking

Taaltheorie en Taalverwerking
BSc Artificial Intelligence
Raquel Fernández, Institute for Logic, Language, and Computation
Winter 2014, lecture 1a

TTTV: Practical Matters
Lecturer: Raquel


  2. Formal Languages: strings and alphabets
A formal language is a set of strings, each string composed of symbols from a finite set called an alphabet (or a vocabulary).
Examples
• Let Σ₁ = {0, 1} be an alphabet. Then all binary numbers are strings over Σ₁. For instance: 01101, 000001, 1101.
• Let Σ₂ = {a, b, c, d, e, f, g} be an alphabet. Then bee, dad, cabbage, and face are strings over Σ₂, as are fffff and agagag.
• Let Σ₃ = {ba, ca, fa, ce, fe, ge} be an alphabet. Then face is a string over Σ₃, but bee, dad, and cabbage are not.
• Let Σ₄ = {♠, △, ♣} be an alphabet. Then ♠♠ and ♣△♣ are strings over Σ₄.
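The Σ₃ example shows that symbols need not be single characters, so deciding whether something is a string over an alphabet means finding a way to split it into symbols. The following is a minimal sketch of that check; the function names `segmentations` and `is_string_over` are illustrative and not part of the lecture, and the alphabet is assumed not to contain the empty string.

```python
def segmentations(s, alphabet):
    """Yield every way of splitting s into symbols drawn from alphabet.

    Assumes no symbol in alphabet is the empty string.
    """
    if s == "":
        yield []
        return
    for symbol in alphabet:
        if s.startswith(symbol):
            for rest in segmentations(s[len(symbol):], alphabet):
                yield [symbol] + rest


def is_string_over(s, alphabet):
    """True if s can be written as a sequence of symbols from alphabet."""
    return next(segmentations(s, alphabet), None) is not None


SIGMA_2 = {"a", "b", "c", "d", "e", "f", "g"}
SIGMA_3 = {"ba", "ca", "fa", "ce", "fe", "ge"}

print(is_string_over("cabbage", SIGMA_2))  # True
print(is_string_over("face", SIGMA_3))     # True (fa followed by ce)
print(is_string_over("cabbage", SIGMA_3))  # False
```

Trying every matching symbol, rather than only the first one that fits, keeps the check correct for alphabets where a greedy choice could lead to a dead end.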

  10. Strings and Substrings
The length of a string is the number of symbol occurrences (tokens) from the alphabet it contains.
Examples
• the length of face over Σ₂ = {a, b, c, d, e, f, g} is 4
• the length of face over Σ₃ = {ba, ca, fa, ce, fe, ge} is 2
The string of length 0 is called the empty string, denoted ε.
Given a string s, a substring of s is a string formed by taking contiguous symbols of s in the order in which they occur in s. An initial substring is called a prefix, and a final substring a suffix.
Examples
Let unthinkable be a string over Σ = {a, b, c, ..., x, y, z}. Then ε, un, unth, and unthinkable are prefixes, while ε, e, able, thinkable, and unthinkable are suffixes. Other substrings include nthi, inka, and bl.
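As a rough illustration of these definitions, the sketch below computes the length of a string relative to an alphabet and lists prefixes, suffixes, and substrings. The helper names are hypothetical. `length_over` assumes the string splits into symbols in only one way (true for the alphabets above), and the prefix/suffix helpers assume single-character symbols.

```python
def length_over(s, alphabet):
    """Number of alphabet symbols in s, or None if s is not a string over alphabet.

    Assumes the split into symbols is unique, as for the alphabets on the slides.
    """
    if s == "":
        return 0
    for symbol in alphabet:
        if s.startswith(symbol):
            rest = length_over(s[len(symbol):], alphabet)
            if rest is not None:
                return 1 + rest
    return None


def prefixes(s):
    """All initial substrings of s, from the empty string up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def suffixes(s):
    """All final substrings of s, from s itself down to the empty string."""
    return [s[i:] for i in range(len(s) + 1)]


def substrings(s):
    """All contiguous substrings of s, including the empty string."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


SIGMA_2 = {"a", "b", "c", "d", "e", "f", "g"}
SIGMA_3 = {"ba", "ca", "fa", "ce", "fe", "ge"}

print(length_over("face", SIGMA_2))  # 4
print(length_over("face", SIGMA_3))  # 2
print("unth" in prefixes("unthinkable"))   # True
print("able" in suffixes("unthinkable"))   # True
print("inka" in substrings("unthinkable")) # True
```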

  14. Some Operations on Strings
• Concatenation: two strings s₁ and s₂ over Σ can be concatenated (written one after the other) to form a new string s₁ · s₂ over Σ.
    Σ = {a, b}    a · b = ab
• Exponent: we can apply an exponent operator n to a string s. The resulting string sⁿ is obtained by concatenating n copies of s.
    a⁰ = ε, a¹ = a, a² = aa, a³ = aaa, ...
• Kleene star: a special exponent operator * which, applied to a string s, denotes any string obtained by concatenating any number of copies of s (including zero).
    a* = ε or a or aa or aaa ...
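These three operations map directly onto ordinary string operations. The sketch below is one possible rendering in Python; the function names are illustrative. Since a* stands for infinitely many strings, `kleene_star` is written as a lazy generator and only a finite slice is printed.

```python
from itertools import count, islice


def concat(s1, s2):
    """s1 · s2: write s2 directly after s1."""
    return s1 + s2


def power(s, n):
    """s^n: n copies of s concatenated; s^0 is the empty string."""
    return s * n


def kleene_star(s):
    """Lazily generate s^0, s^1, s^2, ... (an infinite sequence)."""
    for n in count(0):
        yield power(s, n)


print(concat("a", "b"))                   # ab
print([power("a", n) for n in range(4)])  # ['', 'a', 'aa', 'aaa']
print(list(islice(kleene_star("a"), 4)))  # ['', 'a', 'aa', 'aaa']
```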

  24. Formal Languages
Σ* denotes the set of all strings over an alphabet Σ.
→ note that Σ* is always infinite, regardless of the number of symbols Σ contains (as long as Σ is not empty).
We may now define a formal language over an alphabet Σ as any subset of Σ*.
Examples of formal languages
Let Σ = {a, b, c, ..., x, y, z}. Then Σ* is the set of strings over the Latin alphabet, and the following subsets of Σ* are possible formal languages:
• the set of strings consisting of consonants only
• the set of strings containing at least one vowel and one consonant
• the set of strings whose length is less than 9 symbols
• the set {one, two, three, four, five, six, seven, eight, nine, ten}
• the set of all English words
• the empty set
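One way to picture "a language is a subset of Σ*" is to enumerate Σ* in order of increasing length and keep only the strings that satisfy some property. The sketch below does this for a small alphabet. The names `sigma_star` and `consonants_only` are illustrative, and the vowel set {a, e, i, o, u} is an assumption made for the example.

```python
from itertools import count, islice, product

VOWELS = set("aeiou")  # assumption: every other letter counts as a consonant


def sigma_star(alphabet):
    """Yield all strings over alphabet (single-character symbols), shortest first."""
    symbols = sorted(alphabet)
    for n in count(0):
        for combo in product(symbols, repeat=n):
            yield "".join(combo)


def consonants_only(s):
    """Membership test for the language of strings without vowels."""
    return all(ch not in VOWELS for ch in s)


# The first few elements of {a, b}*
print(list(islice(sigma_star({"a", "b"}), 7)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']

# A language over {a, b, c}: the subset of strings made of consonants only
language = (s for s in sigma_star({"a", "b", "c"}) if consonants_only(s))
print(list(islice(language, 5)))
# ['', 'b', 'c', 'bb', 'bc']
```

Both generators are infinite, which is why the example only ever takes a finite slice of them.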

  27. Formal Languages
How can we characterise the language(s) we are interested in?
• given an alphabet Σ and the infinite set Σ* of strings it gives rise to, how can we select a particular formal language, that is, a particular subset of Σ*?
For instance:
  ∗ can we distinguish the set of strings of letters that constitute proper English words?
  ∗ can we distinguish the set of strings of words that count as well-formed sentences of Dutch?
We have two formal mechanisms at our disposal:
• formalisms (formal expressions and grammars): sets of rules
• automata: computational devices for computing languages

  29. Formalisms and Automata
• Formalisms and automata allow us to distinguish a formal language of interest (a set of strings) from other possible languages over a given alphabet.
  ∗ they capture the patterns that characterise a language
  ∗ as such, they act as a definition of the language they capture
• From an abstract point of view, a natural language, like Dutch or English, is a set of strings (of sounds/letters, of words, etc.).
• Therefore, formalisms and automata can help us to model aspects of natural languages.
This week: we’ll look into one formalism for defining formal languages, regular expressions, and into one type of automaton, finite state automata.

  37. Regular Expressions
Regular expressions are a formal notation for characterising sets of strings that follow a fairly simple regular pattern.
We can construct regular expressions over an alphabet Σ as follows:

                               Regular expression   Languages
  empty set:                   ∅                    {}
  empty string:                ε                    {ε}
  symbol (∀a ∈ Σ):             a                    {a}
  If a and b are reg exp, so are:
  concatenation:               a · b                {ab}
  disjunction (or union):      (a | b)              {a, b}
  Kleene star (or closure):    a*                   {ε, a, aa, aaa, aaaaa, ...}

• we often omit the dot in concatenation (a · b) and simply write ab
• disjunction (or union) may be written as a | b, a + b, or a ∪ b
• a⁺ is the set of a-strings with at least one a; it can be used to abbreviate a*a or aa*
• the notation Σ* can be seen as abbreviating (a | b | ...)* over all symbols a, b, ... in Σ
• Σⁿ can be seen as abbreviating the concatenation of n copies of (a | b | ...)
• aⁿ can be used to abbreviate the concatenation of n copies of a

  52. Regular Expressions: Examples
What kind of strings would the language defined by each of the following regular expressions contain?
Let Σ = {a, b, c, d, ..., x, y, z}

  Regular expression   Language
  (ab)*c               {c, abc, ababc, abababc, ...}
  (a | b)*c            {c, ac, bc, aac, abc, bac, bbc, babc, ...}
  (a*c) | (b*c)        {c, ac, aac, aaac, ..., bc, bbc, bbbc, ...}
  me(o)*w              {mew, meow, meoow, meooow, meooooooooow, ...}
  ba(a)⁺               {baa, baaaaaa, baaaaaaaaa, ...}
  sunΣ*                {sun, sunglasses, sunset, sunz, sunaaaaa, sunyxjshiksr, ...}
  (co)²Σ*              {coco, cocoa, coconut, cocoz, coconjsbfx, cocococovuyfvco, ...}
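The examples can be checked mechanically with Python's `re` module, using `re.fullmatch` so that the whole string must belong to the language. This is only a sketch: the translation into Python syntax is mine, with Σ* written as `[a-z]*` and (co)² as `(co){2}`, and the helper name `in_language` is illustrative.

```python
import re

# Slide expression (in Python syntax) -> a few strings listed as members
EXAMPLES = {
    r"(ab)*c": ["c", "abc", "ababc"],
    r"(a|b)*c": ["c", "ac", "bc", "babc"],
    r"(a*c)|(b*c)": ["c", "aac", "bbbc"],
    r"meo*w": ["mew", "meow", "meooow"],
    r"baa+": ["baa", "baaaaaa"],
    r"sun[a-z]*": ["sun", "sunset", "sunz"],
    r"(co){2}[a-z]*": ["coco", "cocoa", "coconut"],
}


def in_language(pattern, s):
    """True if the entire string s is matched by pattern."""
    return re.fullmatch(pattern, s) is not None


for pattern, members in EXAMPLES.items():
    print(pattern, all(in_language(pattern, s) for s in members))

# Some strings that fall outside the languages
print(in_language(r"(ab)*c", "abac"))      # False
print(in_language(r"(a*c)|(b*c)", "abc"))  # False
print(in_language(r"baa+", "ba"))          # False
```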

  55. Regular Expressions in Programming Languages
Many programming languages, such as Perl, Python, and Java, and Unix tools like grep include ways to specify regular expressions. For instance:

                  Perl notation   Underlying regular expression
  ranges:         [a-z]           (a | b | c | d | ... | z)
  optionality:    colo(u)?r       colo(u | ε)r
  digits:         \d              (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)

There are many other options, such as negation, upper-/lower-case, white space, etc.
These operators are “syntactic sugar”: all regular expressions can be constructed with the basic operations we have seen earlier: concatenation, disjunction, and Kleene star, plus the empty string.
This syntactic sugar, however, is very useful! Regular expressions are used all over the place for string search. This won’t be covered in class: see the book and practise in the werkcolleges (tutorial sessions).
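To make the "syntactic sugar" point concrete, the sketch below pairs each sugared pattern with an expansion that uses only symbols, disjunction, and an empty alternative, and compares their verdicts on a few sample strings. It uses Python's `re` rather than Perl. The sample list is arbitrary, so agreement on it illustrates the equivalence rather than proving it. `re.ASCII` is passed because Python's `\d` otherwise also matches non-ASCII digits.

```python
import re
import string

# Sugared pattern -> equivalent pattern without ranges, ? or \d
SUGAR_TO_BASIC = {
    r"[a-z]": "(" + "|".join(string.ascii_lowercase) + ")",
    r"colo(u)?r": r"colo(u|)r",  # the empty alternative plays the role of ε
    r"\d": "(" + "|".join(string.digits) + ")",
}

SAMPLES = ["", "a", "z", "A", "5", "ab", "color", "colour", "colouur"]


def matches(pattern, s):
    """True if the entire string s is matched by pattern (ASCII semantics)."""
    return re.fullmatch(pattern, s, flags=re.ASCII) is not None


for sugar, basic in SUGAR_TO_BASIC.items():
    agree = all(matches(sugar, s) == matches(basic, s) for s in SAMPLES)
    print(sugar, "agrees with its expansion on the samples:", agree)
```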

  59. Regular Expressions and Automata
• A regular expression allows us to characterise a formal language declaratively.
• We can characterise the same language procedurally by means of an automaton that specifies how the language is computed.

  regular expression: meo*w                    regular expression: (b | c)ar
  language: {mew, meow, meoow, meooow, ...}    language: {bar, car}

[Figure: two FSA diagrams, one per regular expression, each with states q₀, q₁, q₂, q₃. The arcs of the first are labelled m, e, o, w; the arcs of the second are labelled b, c, a, r.]

  63. Finite State Automata: Formal Definition
We can formally specify an FSA by the following 5 parameters:
• Q: a finite set of states
• Σ: a finite input alphabet
• q₀ ∈ Q: the start state
• F: a set of final or accepting states (F ⊆ Q)
• δ: a transition function between states that maps pairs ⟨q, i⟩ of a state and an input symbol to a new state q′ (Q × Σ → Q).

[Figure: FSA diagram with states q₀, q₁, q₂, q₃ and arcs labelled b, c, a, r]

Q = {q₀, q₁, q₂, q₃}
Σ = {a, b, c, r}
start state: q₀
F = {q₃}
δ = {(⟨q₀, c⟩, q₁), (⟨q₀, b⟩, q₁), (⟨q₁, a⟩, q₂), (⟨q₂, r⟩, q₃)}
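The five parameters translate almost literally into a small data structure. The sketch below encodes the example automaton and runs it on a few strings. The class and field names are illustrative. Because the δ listed on the slide does not define a transition for every ⟨state, symbol⟩ pair, the code treats a missing transition as rejection, which is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class FSA:
    states: set    # Q
    alphabet: set  # Σ
    start: str     # q0
    finals: set    # F
    delta: dict    # δ as a dict: (state, symbol) -> state, possibly partial

    def accepts(self, s):
        """Run the automaton on s; accept if it ends in a final state."""
        state = self.start
        for symbol in s:
            if (state, symbol) not in self.delta:
                return False  # assumption: undefined transition means reject
            state = self.delta[(state, symbol)]
        return state in self.finals


BAR_CAR = FSA(
    states={"q0", "q1", "q2", "q3"},
    alphabet={"a", "b", "c", "r"},
    start="q0",
    finals={"q3"},
    delta={
        ("q0", "c"): "q1",
        ("q0", "b"): "q1",
        ("q1", "a"): "q2",
        ("q2", "r"): "q3",
    },
)

print(BAR_CAR.accepts("bar"))  # True
print(BAR_CAR.accepts("car"))  # True
print(BAR_CAR.accepts("ba"))   # False
print(BAR_CAR.accepts("barr")) # False
```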

  67. Finite State Automata with ε-transitions
We may extend the transition function δ with the empty string symbol ε, so that δ : Q × (Σ ∪ {ε}) → Q

regular expression: colo(u | ε)r
[Figure: FSA diagram with states q₀, q₁, q₂, q₃ and arcs labelled colo, u, ε, r]

... but note that every FSA with ε-transitions is equivalent to one without them:

regular expression: colo(u | ε)r
[Figure: FSA diagram with states q₀, q₁, q₂, q₃ and arcs labelled colo, u, r, r]
(more on this when we discuss non-deterministic FSAs)
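A common way to run an automaton with ε-transitions is to track the set of states reachable so far, following ε-arcs for free. The sketch below does this for colo(u | ε)r. Several things are assumptions of the sketch rather than the slide's construction: the arc labelled colo is spelled out as four single-letter arcs, states are numbered 0 to 6 instead of q₀ to q₃, ε is represented by the empty string, and δ maps to sets of states.

```python
EPSILON = ""  # stands in for the empty-string label ε

# (state, label) -> set of next states
DELTA = {
    (0, "c"): {1},
    (1, "o"): {2},
    (2, "l"): {3},
    (3, "o"): {4},
    (4, "u"): {5},
    (4, EPSILON): {5},  # skip the optional u
    (5, "r"): {6},
}
START, FINALS = 0, {6}


def epsilon_closure(states, delta):
    """All states reachable from states using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in delta.get((state, EPSILON), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def accepts(s, delta=DELTA, start=START, finals=FINALS):
    """Simulate the automaton by tracking every state it could be in."""
    current = epsilon_closure({start}, delta)
    for symbol in s:
        moved = set()
        for state in current:
            moved |= delta.get((state, symbol), set())
        current = epsilon_closure(moved, delta)
    return bool(current & finals)


print(accepts("color"))    # True
print(accepts("colour"))   # True
print(accepts("colouur"))  # False
```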

  69. Regular Expressions and FSAs
Regular expressions and FSAs capture the same class of languages: any language that is the denotation of some regular expression can be computed by an FSA, and vice versa.
Let’s see how we can build an FSA from any regular expression.

                               Reg. exp.     Languages
  empty set:                   ∅             {}
  empty string:                ε             {ε}
  symbol (∀a ∈ Σ):             a             {a}
  If a and b are reg exp, so are:
  concatenation:               ab            {ab}
  disjunction (or union):      (ab | ba)     {ab, ba}
  Kleene star (or closure):    a*            {ε, a, aa, aaa, aaaaa, ...}

Strategy:
• Base case: build an automaton for simple expressions
• Inductive step: show how to reproduce each of the operations on regular expressions with an automaton

  71. From Reg Exp to FSA: Base Case

  Regular expression   Corresponding FSAs
  a                    q₀ --a--> q₁
  ε                    q₀ --ε--> q₁
