SLIDE 1

Taaltheorie en Taalverwerking

BSc Artificial Intelligence

Raquel Fernández
Institute for Logic, Language, and Computation

Winter 2014, lecture 1a

Raquel Fernández TTTV 2014 - lecture 1a 1

SLIDE 4

TTTV: Practical Matters

  • Lecturer:

    ∗ Raquel Fernández – raquel.fernandez@uva.nl
      Institute for Logic, Language & Computation (ILLC)
    ∗ Building SP 107, room F1.07 (meeting by appointment)
    ∗ http://www.illc.uva.nl/~raquel

  • Teaching Assistants: meet them today at the practical session

    ∗ Group A: Lewis Zwart
    ∗ Group B: Mart van Baalen
    ∗ Group C: Nick de Wolf
    ∗ Group D: Ysbrand Galama

  • Course information: visit the Blackboard site for the course

    ∗ studiewijzer (study guide)
    ∗ weekly materials such as readings and lecture slides
    ∗ exercises and assignments, including submissions

  • Book: Jurafsky & Martin, Speech and Language Processing

SLIDE 8

What is this course about?

Taaltheorie en Taalverwerking ≈ Computational Linguistics ≈ Natural Language Processing

“Computational linguistics (CL) is a discipline at the interface of linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence, a branch of computer science aiming at computational models of human cognition.”
(Hans Uszkoreit)

SLIDE 10

Levels of Language Processing

Language is very easy for us, but very difficult for computers. What kind of knowledge do we need to be able to process, understand, and properly react to language?

  • phonetics and phonology

the sequences of sounds that make up a language

  • morphology

the meaningful components of words

  • syntax

the structural relationships between words in a sentence

  • semantics

the meaning of words and sentences

  • pragmatics

the relation between meaning and the intentions of the speakers

SLIDE 20

Levels of Language Processing

  • Most often, we are interested in semantics:

“Mary phoned the president”
who did what (to whom) (when, where ... why)
∃x[President(x) ∧ Phone(Mary, x)]

morphology & syntax help us to get to the meaning we are after

  • Sometimes, we need less than deep semantic representations

bag of words approach: {phone, president} ≈ {call, leader}

  • Sometimes we need more... pragmatics

“Mary has a gun”: a statement? a warning?
“Can you pass me the salt?”: a question? a command?

SLIDE 21

In this course we’ll focus on...

  • syntax, with a bit of morphology (part 1)
  • semantics (part 2)

See the list of readings on Blackboard (BB) and the learning objectives in the studiewijzer (study guide).

SLIDE 23

Related courses in the AI curriculum

We will build on knowledge and skills you have acquired during the first semester of the 1st year:

  • Logisch Programmeren en Zoektechnieken
  • Inleiding Logica

Other language-related courses in subsequent years:

  • 2nd year: Natuurlijke Taalmodellen en Interfaces
  • 3rd year: Discourse

Possibility to do your Bachelor’s thesis on language-related topics.

SLIDE 24

Let’s get started

We will start by looking into the structural properties of language. In particular, this week we’ll look into Formal Language Theory, which studies the structural regularities of “languages” as abstract entities.

SLIDE 31

Formal Languages: strings and alphabets

A formal language is a set of strings, each string composed of symbols from a finite set called an alphabet (or a vocabulary).

Examples

  • Let Σ1 = {0, 1} be an alphabet. Then all binary numbers are strings over Σ1.
    For instance: 01101, 000001, 1101.

  • Let Σ2 = {a, b, c, d, e, f, g} be an alphabet. Then bee, dad, cabbage, and face are
    strings over Σ2, as are fffff and agagag.

  • Let Σ3 = {ba, ca, fa, ce, fe, ge} be an alphabet. Then face is a string over Σ3, but
    bee, dad, and cabbage are not.

  • Let Σ4 = {♠, △, ♣} be an alphabet. Then ♠♠ and ♣△♣ are strings over Σ4.
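
The examples above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the helper name `segment` and the constants `SIGMA2` and `SIGMA3` are illustrative assumptions. It tries to split a string into symbols of a given alphabet, which matters for alphabets such as Σ3 whose symbols are more than one character long.

```python
from typing import List, Optional, Set


def segment(string: str, alphabet: Set[str]) -> Optional[List[str]]:
    """Return one split of `string` into symbols of `alphabet`, or None if there is none.

    Hypothetical helper for illustration; it uses simple backtracking.
    """
    if string == "":
        return []  # the empty string is a string over every alphabet
    for symbol in alphabet:
        if symbol and string.startswith(symbol):
            rest = segment(string[len(symbol):], alphabet)
            if rest is not None:
                return [symbol] + rest
    return None  # no symbol sequence spells out this string


SIGMA2 = set("abcdefg")
SIGMA3 = {"ba", "ca", "fa", "ce", "fe", "ge"}

print(segment("face", SIGMA2))     # ['f', 'a', 'c', 'e']
print(segment("face", SIGMA3))     # ['fa', 'ce']
print(segment("cabbage", SIGMA3))  # None
```

For an alphabet in which a string can be split in more than one way, this sketch simply returns whichever split it finds first.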

SLIDE 39

Strings and Substrings

The length of a string is the number of symbol tokens (occurrences of symbols from the alphabet) it contains.

Examples

  • the length of face over Σ2 = {a, b, c, d, e, f, g} is 4
  • the length of face over Σ3 = {ba, ca, fa, ce, fe, ge} is 2

The string of length 0 is called the empty string, denoted ε.

Given a string s, a substring of s is a string formed by taking contiguous symbols of s in the order in which they occur in s. An initial substring is called a prefix and a final substring a suffix.

Examples

Let unthinkable be a string over Σ = {a, b, c, ..., x, y, z}. Then ε, un, unth, and unthinkable are prefixes, while ε, e, able, thinkable, and unthinkable are suffixes. Other substrings include nthi, inka, and bl.
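
As a small illustration of these definitions, here is a Python sketch under the assumption that every symbol of the alphabet is a single character, so that a Python `str` can stand in for a string over Σ. The function names `prefixes`, `suffixes`, and `substrings` are hypothetical, chosen for this example only.

```python
from typing import List, Set


def prefixes(s: str) -> List[str]:
    """All initial substrings of s, from the empty string up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def suffixes(s: str) -> List[str]:
    """All final substrings of s, from s itself down to the empty string."""
    return [s[i:] for i in range(len(s) + 1)]


def substrings(s: str) -> Set[str]:
    """All contiguous stretches of s, including the empty string."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


word = "unthinkable"
print(len(word))                   # 11
print("unth" in prefixes(word))    # True
print("able" in suffixes(word))    # True
print("inka" in substrings(word))  # True
print("ukl" in substrings(word))   # False: the symbols are not contiguous in word
```

Every prefix and every suffix is also a substring, and the empty string (written `""` in Python) is a prefix, suffix, and substring of any string.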

SLIDE 43

Some Operations on Strings

  • Concatenation: two strings s1 and s2 over Σ can be concatenated
    (written one after the other) to form a new string s1 · s2 over Σ.

    Σ = {a, b}    a · b = ab

  • Exponent: we can apply an exponent operator n to a string s.
    The resulting string s^n is the concatenation of n copies of s.

    a^0 = ε, a^1 = a, a^2 = aa, a^3 = aaa, ...

  • Kleene star: a special exponent operator ∗ which, applied to a
    string s, denotes any string obtained by concatenating zero or more copies of s.

    a∗ = ε or a or aa or aaa ...
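
The three operations can be sketched in Python as follows. This is an illustration only, again assuming single-character symbols so that Python strings model strings over Σ; the names `concat`, `power`, and `star` are hypothetical. Because the Kleene star allows any number of copies, `star` is written as a generator and we only look at its first few results.

```python
from itertools import count, islice
from typing import Iterator


def concat(s1: str, s2: str) -> str:
    """Concatenation s1 · s2: write s2 directly after s1."""
    return s1 + s2


def power(s: str, n: int) -> str:
    """Exponent: n copies of s concatenated; zero copies give the empty string."""
    if n < 0:
        raise ValueError("the exponent must be non-negative")
    return s * n


def star(s: str) -> Iterator[str]:
    """Kleene star: yields the strings of zero, one, two, ... copies of s, without end."""
    for n in count(0):
        yield power(s, n)


print(concat("a", "b"))            # ab
print(power("a", 3))               # aaa
print(list(islice(star("a"), 4)))  # ['', 'a', 'aa', 'aaa']
```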

SLIDE 53

Formal Languages

Σ∗ denotes the set of all strings over an alphabet Σ.

→ note that Σ∗ is infinite for every non-empty alphabet Σ, however few symbols Σ contains.

We may now define a formal language over an alphabet Σ as any subset of Σ∗.

Examples of formal languages

Let Σ = {a, b, c, ..., x, y, z}. Then Σ∗ is the set of strings over the Latin alphabet, and the following subsets of Σ∗ are possible formal languages:

  • the set of strings consisting of consonants only
  • the set of strings containing at least one vowel and one consonant
  • the set of strings whose length is less than 9 symbols
  • the set {one, two, three, four, five, six, seven, eight, nine, ten}
  • the set of all English words
  • the empty set
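
One way to make "a language is a subset of Σ∗" concrete is to represent a language either as an explicit finite set or as a membership test. The Python sketch below is illustrative only: all names are hypothetical, and treating y as a consonant is an assumption made for this example. The generator `sigma_star` lists Σ∗ in order of increasing length, which shows why the set never runs out for a non-empty alphabet.

```python
from itertools import islice, product
from typing import Iterable, Iterator

SIGMA = "abcdefghijklmnopqrstuvwxyz"
VOWELS = set("aeiou")  # assumption: y counts as a consonant here


def sigma_star(alphabet: Iterable[str]) -> Iterator[str]:
    """Enumerate all strings over `alphabet`, shortest first."""
    symbols = list(alphabet)
    n = 0
    while True:
        for combo in product(symbols, repeat=n):
            yield "".join(combo)
        n += 1


def over_sigma(s: str) -> bool:
    """True if s is a string over SIGMA, i.e. a member of SIGMA*."""
    return all(ch in SIGMA for ch in s)


def consonants_only(s: str) -> bool:
    return over_sigma(s) and all(ch not in VOWELS for ch in s)


def vowel_and_consonant(s: str) -> bool:
    has_vowel = any(ch in VOWELS for ch in s)
    has_consonant = any(ch not in VOWELS for ch in s)
    return over_sigma(s) and has_vowel and has_consonant


def shorter_than_9(s: str) -> bool:
    return over_sigma(s) and len(s) < 9


NUMBER_WORDS = {"one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"}
EMPTY_LANGUAGE: set = set()

print(list(islice(sigma_star("ab"), 7)))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(consonants_only("rhythm"))          # True
print(vowel_and_consonant("face"))        # True
print(shorter_than_9("unthinkable"))      # False
```

The set of all English words is left out of the sketch because it would need a word list, which the slides do not provide.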

SLIDE 56

Formal Languages

How can we characterise the language(s) we are interested in?

  • given an alphabet Σ and the many formal languages (subsets of the infinite set Σ∗) it can
    give rise to, how can we select a particular formal language? For instance:

    ∗ can we distinguish the set of strings of letters that constitute proper English words?
    ∗ can we distinguish the set of strings of words that count as well-formed sentences of Dutch?

We have two formal mechanisms at our disposal:

  • formalisms (formal expressions and grammars): sets of rules
  • automata: computational devices for computing languages

SLIDE 58

Formalisms and Automata

  • Formalisms and automata allow us to distinguish a formal
    language of interest (a set of strings) from other possible languages over a given alphabet.

    ∗ they capture the patterns that characterise a language
    ∗ as such, they act as a definition of the language they capture

  • From an abstract point of view, a natural language – like Dutch
    or English – is a set of strings (of sounds/letters, of words, etc.)
  • Therefore, formalisms and automata can help us to model
    aspects of natural languages.

This week: we’ll look into one formalism to define formal languages, regular expressions, and into one type of automaton, finite state automata.

SLIDE 66

Regular Expressions

Regular expressions are a formal notation for characterising sets of strings that follow a fairly simple regular pattern.

We can construct regular expressions over an alphabet Σ as follows:

                               Regular expression    Language
  empty set:                   ∅                     {}
  empty string:                ε                     {ε}
  symbol (∀a ∈ Σ):             a                     {a}

  If a and b are regular expressions, so are:

  concatenation:               a · b                 {ab}
  disjunction (or union):      (a|b)                 {a, b}
  Kleene star (or closure):    a∗                    {ε, a, aa, aaa, aaaa, ...}

  • we often omit the dot in concatenation (a · b) and simply write ab
  • disjunction (or union) may be written as a|b, a + b or a ∪ b
  • a^+ is the set of a-strings with at least one a; it can be used to abbreviate a∗a or aa∗
  • the notation Σ∗ can be seen as abbreviating (a|b|...)∗, where a, b, ... are all the symbols in Σ
  • Σ^n can be seen as abbreviating the concatenation of n copies of (a|b|...)
  • a^n can be used to abbreviate the concatenation of n copies of a
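
The table of constructions can be read as operations on sets of strings. The Python sketch below takes that reading literally; it is an illustration under stated assumptions, not an implementation from the course. All names are hypothetical, and because a∗ is an infinite set, `star` takes an extra parameter `max_copies` (an assumption of this sketch) that cuts the language off after that many copies.

```python
from typing import Set

Language = Set[str]

EMPTY_SET: Language = set()  # the language of ∅
EPSILON: Language = {""}     # the language of the empty string


def symbol(a: str) -> Language:
    """The language of a single symbol a."""
    return {a}


def concat(l1: Language, l2: Language) -> Language:
    """Every string of l1 followed by every string of l2."""
    return {x + y for x in l1 for y in l2}


def union(l1: Language, l2: Language) -> Language:
    """Disjunction: strings that are in l1 or in l2."""
    return l1 | l2


def star(l: Language, max_copies: int) -> Language:
    """Bounded stand-in for Kleene star: zero up to max_copies copies."""
    result: Language = {""}
    layer: Language = {""}
    for _ in range(max_copies):
        layer = concat(layer, l)
        result |= layer
    return result


a, b = symbol("a"), symbol("b")
print(sorted(concat(a, b)))  # ['ab']
print(sorted(union(a, b)))   # ['a', 'b']
print(sorted(star(a, 3)))    # ['', 'a', 'aa', 'aaa']
# a∗a and aa∗ give the same strings, which is why a^+ can abbreviate either one
print(concat(star(a, 2), a) == concat(a, star(a, 2)))  # True
```

Concatenating anything with `EMPTY_SET` gives the empty set again, while concatenating with `EPSILON` leaves a language unchanged.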

SLIDE 78

Regular Expressions: Examples

What kind of strings would the language defined by each of the following regular expressions contain?

Let Σ = {a, b, c, d...x, y, z} Regular expression Language (ab)∗c {c, abc, ababc, abababc, . . .} (a|b)∗c {c, ac, bc, aac, abc, bac, bbc, babc . . . } (a∗c)|(b∗c) {c, ac, aac, aaac, . . . bc, bbc, bbbc, . . .} me(o)∗w {mew, meow, meoow, meooow, meooooooooow, . . .} ba(a)+ {baa, baaaaaa, baaaaaaaaa, . . .} sunΣ∗

Raquel Fernández TTTV 2014 - lecture 1a 16

slide-79
SLIDE 79

Regular Expressions: Examples

What kind of strings would the language defined by each of the following regular expressions contain?

Let Σ = {a, b, c, d...x, y, z} Regular expression Language (ab)∗c {c, abc, ababc, abababc, . . .} (a|b)∗c {c, ac, bc, aac, abc, bac, bbc, babc . . . } (a∗c)|(b∗c) {c, ac, aac, aaac, . . . bc, bbc, bbbc, . . .} me(o)∗w {mew, meow, meoow, meooow, meooooooooow, . . .} ba(a)+ {baa, baaaaaa, baaaaaaaaa, . . .} sunΣ∗ {sun, sunglasses, sunset, sunz, sunaaaaa, sunyxjshiksr . . .}

Raquel Fernández TTTV 2014 - lecture 1a 16

slide-80
SLIDE 80

Regular Expressions: Examples

What kind of strings would the language defined by each of the following regular expressions contain?

Let Σ = {a, b, c, d...x, y, z} Regular expression Language (ab)∗c {c, abc, ababc, abababc, . . .} (a|b)∗c {c, ac, bc, aac, abc, bac, bbc, babc . . . } (a∗c)|(b∗c) {c, ac, aac, aaac, . . . bc, bbc, bbbc, . . .} me(o)∗w {mew, meow, meoow, meooow, meooooooooow, . . .} ba(a)+ {baa, baaaaaa, baaaaaaaaa, . . .} sunΣ∗ {sun, sunglasses, sunset, sunz, sunaaaaa, sunyxjshiksr . . .} (co)2Σ∗

Raquel Fernández TTTV 2014 - lecture 1a 16

slide-81
SLIDE 81

Regular Expressions: Examples

What kind of strings would the language defined by each of the following regular expressions contain?

Let Σ = {a, b, c, d, ..., x, y, z}

Regular expression   Language
(ab)∗c               {c, abc, ababc, abababc, ...}
(a|b)∗c              {c, ac, bc, aac, abc, bac, bbc, babc, ...}
(a∗c)|(b∗c)          {c, ac, aac, aaac, ..., bc, bbc, bbbc, ...}
me(o)∗w              {mew, meow, meoow, meooow, meooooooooow, ...}
ba(a)+               {baa, baaaaaa, baaaaaaaaa, ...}
sunΣ∗                {sun, sunglasses, sunset, sunz, sunaaaaa, sunyxjshiksr, ...}
(co)²Σ∗              {coco, cocoa, coconut, cocoz, coconjsbfx, cocococovuyfvco, ...}
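The examples above can be checked mechanically. The following is a minimal sketch using Python's standard `re` module, assuming that Σ is written as `[a-z]` and the squared group as `{2}`. The names `EXAMPLES`, `in_language` and `mismatches`, and the non-member strings, are illustrative and do not come from the slides.

```python
import re

# Slide notation -> Python `re` syntax. Sigma is written as [a-z] and the
# superscript in (co)² as {2}. Each entry lists sample members and
# non-members (the non-members are added here for illustration only).
EXAMPLES = {
    "(ab)∗c": (r"(ab)*c", ["c", "abc", "ababc"], ["ab", "ac", "abab"]),
    "(a|b)∗c": (r"(a|b)*c", ["c", "ac", "bc", "babc"], ["ca", "abd"]),
    "(a∗c)|(b∗c)": (r"(a*c)|(b*c)", ["c", "aac", "bbbc"], ["abc", "bac"]),
    "me(o)∗w": (r"me(o)*w", ["mew", "meow", "meooow"], ["meo", "mw"]),
    "ba(a)+": (r"ba(a)+", ["baa", "baaaaaa"], ["ba", "b"]),
    "sunΣ∗": (r"sun[a-z]*", ["sun", "sunset", "sunz"], ["su", "moon"]),
    "(co)²Σ∗": (r"(co){2}[a-z]*", ["coco", "coconut"], ["co", "cocktail"]),
}


def in_language(pattern, string):
    """True if the WHOLE string matches the pattern (not just a substring)."""
    return re.fullmatch(pattern, string) is not None


def mismatches():
    """Return every (expression, string) pair that behaves unexpectedly."""
    bad = []
    for name, (pattern, members, non_members) in EXAMPLES.items():
        bad += [(name, s) for s in members if not in_language(pattern, s)]
        bad += [(name, s) for s in non_members if in_language(pattern, s)]
    return bad
```

`re.fullmatch` is used rather than `re.search` because membership in a formal language concerns the whole string, whereas `re.search` would also succeed on a matching substring.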

slide-84
SLIDE 84

Regular Expressions in Programming Languages

Many programming languages, such as Perl, Python, and Java, as well as Unix tools like grep, include ways to specify regular expressions. For instance:

               Perl notation   Underlying regular expression
ranges:        [a-z]           (a|b|c|d| ... |z)
optionality:   colo(u)?r       colo(u|ε)r
digits:        \d              (0|1|2|3|4|5|6|7|8|9)

There are many other options, such as negation, upper-/lower-case, white space, etc.

These operators are “syntactic sugar”: all regular expressions can be constructed with the basic operations we have seen earlier (concatenation, disjunction, and Kleene star), plus the empty string.

This syntactic sugar, however, is very useful! Regexes are used all over the place for string search. This won’t be covered in class: see the book and practice in the tutorial sessions (werkcolleges).
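To illustrate the "syntactic sugar" point, here is a small sketch in Python's `re` module that compares each shorthand with a spelled-out pattern using only grouping and `|`. The names `SUGAR` and `agree` are made up for this example. The `re.ASCII` flag is an assumption added so that `\d` means exactly the ten digits 0 to 9, since Python's `\d` otherwise also matches other Unicode digits.

```python
import re
import string

# (shorthand, equivalent pattern using only grouping and |).
# The empty alternative in (u|) plays the role of ε.
SUGAR = {
    "ranges": (r"[a-z]", "(" + "|".join(string.ascii_lowercase) + ")"),
    "optionality": (r"colo(u)?r", r"colo(u|)r"),
    "digits": (r"\d", r"(0|1|2|3|4|5|6|7|8|9)"),
}


def agree(shorthand, basic, text):
    """True if both patterns give the same verdict on the whole of `text`."""
    with_sugar = re.fullmatch(shorthand, text, re.ASCII) is not None
    without_sugar = re.fullmatch(basic, text, re.ASCII) is not None
    return with_sugar == without_sugar
```

For example, `agree(*SUGAR["optionality"], "colour")` compares `colo(u)?r` with `colo(u|)r` on the string `colour`.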

slide-88
SLIDE 88

Regular Expressions and Automata

  • A regular expression allows us to characterise a formal language declaratively.
  • We can characterise the same language procedurally by means of an automaton that specifies how the language is computed.

regular expression: meo∗w
language: {mew, meow, meoow, meooow, ...}
[Diagram: FSA with states q0, q1, q2, q3 and arcs labelled m, e, o, w]

regular expression: (b|c)ar
language: {bar, car}
[Diagram: FSA with states q0, q1, q2, q3 and arcs labelled c, b, a, r]

slide-92
SLIDE 92

Finite State Automata: Formal Definition

We can formally specify an FSA by the following 5 parameters:

  • Q: a finite set of states
  • Σ: a finite input alphabet
  • q0 ∈ Q: the start state
  • F: a set of final or accepting states (F ⊆ Q)
  • δ: a transition function that maps a pair (q, i) of a state and an input symbol to a new state q′ (δ: Q × Σ → Q).

[Diagram: FSA with states q0, q1, q2, q3 and arcs labelled c, b, a, r]

Q = {q0, q1, q2, q3}
Σ = {a, b, c, r}
start state: q0
F = {q3}
δ = {(q0, c, q1), (q0, b, q1), (q1, a, q2), (q2, r, q3)}
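The five parameters translate almost directly into code. Below is a minimal sketch in Python for the example automaton, with δ stored as a dictionary. The names `DELTA`, `START`, `FINAL` and `accepts` are illustrative. Treating a missing (state, symbol) entry as immediate rejection is an assumption of this sketch, since the δ listed above does not define a transition for every pair.

```python
# Transition function δ as a dictionary: (state, symbol) -> next state.
DELTA = {
    ("q0", "c"): "q1",
    ("q0", "b"): "q1",
    ("q1", "a"): "q2",
    ("q2", "r"): "q3",
}
START = "q0"
FINAL = {"q3"}


def accepts(string, delta=DELTA, start=START, final=FINAL):
    """Run the automaton on `string`; accept if it ends in a final state."""
    state = start
    for symbol in string:
        if (state, symbol) not in delta:
            return False  # no transition defined: reject (assumption)
        state = delta[(state, symbol)]
    return state in final
```

With these definitions, `accepts("bar")` and `accepts("car")` succeed, while strings such as `"ba"` or `"barr"` are rejected.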

slide-96
SLIDE 96

Finite State Automata with ε-transitions

We may extend the transition function δ with the empty string symbol ε, so that δ: Q × (Σ ∪ {ε}) → Q

regular expression: colo(u|ε)r
[Diagram: FSA with states q0, q1, q2, q3 and arcs labelled colo, u, ε, r]

... but note that every FSA with ε-transitions is equivalent to one without them:

regular expression: colo(u|ε)r
[Diagram: FSA with states q0, q1, q2, q3 and arcs labelled colo, u, r, r]

(more on this when we discuss non-deterministic FSAs)
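The equivalence can be tried out on the colo(u|ε)r example. The sketch below, in Python, simulates both versions of the automaton and makes some assumptions of its own: it uses one state per character (0 to 6) instead of the single "colo" arc in the diagrams, it stores transitions as sets of target states, and it uses the empty string as the label for ε. All names (`WITH_EPS`, `WITHOUT_EPS`, `eps_closure`, `accepts`) are illustrative.

```python
EPS = ""  # label used for ε-transitions (a convention of this sketch)

# colo(u|ε)r, one state per character: (state, label) -> set of next states.
WITH_EPS = {
    (0, "c"): {1}, (1, "o"): {2}, (2, "l"): {3}, (3, "o"): {4},
    (4, "u"): {5}, (4, EPS): {5},
    (5, "r"): {6},
}
# Equivalent automaton without ε: the ε-arc is replaced by a direct r-arc.
WITHOUT_EPS = {
    (0, "c"): {1}, (1, "o"): {2}, (2, "l"): {3}, (3, "o"): {4},
    (4, "u"): {5}, (4, "r"): {6},
    (5, "r"): {6},
}
FINAL = {6}


def eps_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for nxt in delta.get((q, EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def accepts(string, delta, start=0, final=FINAL):
    """Track the set of states the automaton could be in after each symbol."""
    current = eps_closure({start}, delta)
    for symbol in string:
        moved = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = eps_closure(moved, delta)
    return bool(current & final)
```

Both `accepts("color", WITH_EPS)` and `accepts("color", WITHOUT_EPS)` follow the same idea: the ε-arc lets the first automaton skip the u for free, while the second achieves the same effect with an extra r-arc.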

slide-98
SLIDE 98

Regular Expressions and FSAs

Regular expressions and FSAs capture the same class of languages: any language that is the denotation of some regular expression can be computed by an FSA, and vice versa. Let’s see how we can build an FSA from any regular expression.

                              Reg. exp.   Language
empty set:                    ∅           {}
empty string:                 ε           {ε}
symbol (∀a ∈ Σ):              a           {a}

If a and b are regular expressions, so are:
concatenation:                ab          {ab}
disjunction (or union):       (ab|ba)     {ab, ba}
Kleene star (or closure):     a∗          {ε, a, aa, aaa, aaaaa, ...}

Strategy:

  • Base case: build an automaton for simple expressions
  • Inductive step: show how to reproduce each of the operations on regular expressions with an automaton

slide-102
SLIDE 102

From Reg Exp to FSA: Base Case

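Regular expression   Corresponding FSA
a                    [Diagram: states q0 and q1 joined by an arc labelled a]
ε                    [Diagram: states q0 and q1 joined by an arc labelled ε]
∅                    [Diagram: states q0 and q1 with no arc between them], or simply [Diagram: state q0 alone]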

slide-105
SLIDE 105

From Reg Exp to FSA: Concatenation

Regular expression:

  • If a and b are regular expressions, so is ab.

Corresponding FSA:

  • build FSA for a (A1), build FSA for b (A2)
  • link all final states in A1 to the initial state of A2 with ε-transitions. The result starts in A1’s initial state and its final states are those of A2.

This gives us a general procedure regardless of the properties of A1 and A2. We may be able to simplify the ε-transitions at the end.

slide-107
SLIDE 107

From Reg Exp to FSA: Disjunction

Regular expression:

  • If a and b are regular expressions, so is (a|b).

Corresponding FSA:

  • build FSA for a (A1), build FSA for b (A2)
  • add a new initial state and link it to A1’s initial state and to A2’s initial state with ε-transitions.

Again: this gives us a general procedure regardless of the properties of A1 and A2. We may be able to simplify the ε-transitions at the end.

slide-110
SLIDE 110

From Reg Exp to FSA: Kleene * or Closure

Regular expression:

  • a∗: the concatenation of a with itself any number of times.

Corresponding FSA:

  • build FSA for a (A)
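  • add a new final state and link all of A’s final states to it with ε-transitions
  • link A’s initial state to the new final state with an ε-transition in both directions.

Note: this is only safe if no transition leads back into A’s initial state; otherwise, first add a new initial state with an ε-transition to A’s old initial state.

In the simplest case (a∗ for a single symbol a), after simplification this amounts to:
[Diagram: a single state q0 with a loop arc labelled a]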
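The base cases and the three inductive steps (concatenation, disjunction, Kleene star) can be put together in one short program. The following Python sketch is one possible rendering, not the course's reference implementation. Regular expressions are written as nested tuples such as `("cat", ("sym", "m"), ("sym", "e"))`, an encoding invented for this example, and all names are illustrative. For the star it adds a fresh initial state as well as a fresh final state, a small variation on the wording above that keeps the construction correct even when A's initial state has incoming arcs.

```python
from itertools import count

EPS = None  # label standing for an ε-transition (a convention of this sketch)


class Builder:
    """Builds one ε-FSA; states are integers, arcs are kept in self.delta."""

    def __init__(self):
        self._ids = count()
        self.delta = {}  # (state, label) -> set of next states

    def new_state(self):
        return next(self._ids)

    def arc(self, src, label, dst):
        self.delta.setdefault((src, label), set()).add(dst)

    def build(self, rx):
        """Return (initial state, set of final states) for the tuple `rx`."""
        kind = rx[0]
        if kind == "empty":  # ∅: a lone initial state, nothing is accepted
            return self.new_state(), set()
        if kind in ("eps", "sym"):  # base cases: q0 --label--> q1
            q0, q1 = self.new_state(), self.new_state()
            self.arc(q0, EPS if kind == "eps" else rx[1], q1)
            return q0, {q1}
        if kind == "cat":  # final states of A1 --ε--> initial state of A2
            s1, f1 = self.build(rx[1])
            s2, f2 = self.build(rx[2])
            for q in f1:
                self.arc(q, EPS, s2)
            return s1, f2
        if kind == "alt":  # new initial state --ε--> both old initial states
            s1, f1 = self.build(rx[1])
            s2, f2 = self.build(rx[2])
            start = self.new_state()
            self.arc(start, EPS, s1)
            self.arc(start, EPS, s2)
            return start, f1 | f2
        if kind == "star":
            s1, f1 = self.build(rx[1])
            start, final = self.new_state(), self.new_state()
            self.arc(start, EPS, s1)     # enter A
            self.arc(start, EPS, final)  # zero repetitions
            self.arc(final, EPS, s1)     # go round again
            for q in f1:
                self.arc(q, EPS, final)  # one pass through A is complete
            return start, {final}
        raise ValueError(f"unknown node: {kind!r}")


def compile_regex(rx):
    """Return (initial state, final states, transitions) for `rx`."""
    builder = Builder()
    start, finals = builder.build(rx)
    return start, finals, builder.delta


def closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for nxt in delta.get((q, EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def accepts(fsa, string):
    """Simulate the ε-FSA by tracking the set of states it could be in."""
    start, finals, delta = fsa
    current = closure({start}, delta)
    for symbol in string:
        moved = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = closure(moved, delta)
    return bool(current & finals)
```

As a usage example, meo∗w would be encoded as `("cat", ("sym", "m"), ("cat", ("sym", "e"), ("cat", ("star", ("sym", "o")), ("sym", "w"))))` and passed to `compile_regex`; the resulting automaton can then be queried with `accepts`. The automata produced this way contain many ε-transitions that could be simplified away afterwards.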

slide-118
SLIDE 118

Summing Up

  • Engaging in linguistic behaviour requires different kinds of knowledge about language: syntax, semantics, pragmatics...
  • We started to look into how to formally characterise structural properties of languages using tools from formal language theory:
    ∗ a formal language is a set of strings built up from a finite alphabet or vocabulary
    ∗ regular expressions are a formalism for characterising languages that follow simple regular patterns
    ∗ many programming languages include regex functions and notation
    ∗ regular languages can be characterised procedurally by means of finite state automata
    ∗ regex and FSAs capture the same class of languages

slide-119
SLIDE 119

To do by tomorrow

  • Read the study guide (Studiewijzer)
  • Familiarise yourself with the Blackboard (BB) site
  • In particular, see “Weekly Materials > Week 1”
