Computational Linguistics II: Parsing. Formal Languages: Regular Languages II



slide-1
SLIDE 1

Computational Linguistics II: Parsing

Formal Languages: Regular Languages II

Frank Richter & Jan-Philipp Söhn

fr@sfs.uni-tuebingen.de, jp.soehn@uni-tuebingen.de

Computational Linguistics II: Parsing – p.1

slide-2
SLIDE 2

Reminder: The Big Picture

hierarchy   grammar                machine          other
type 3      reg. grammar           DFA, NFA         reg. expressions
det. cf.    LR(k) grammar          DPDA
type 2      CFG                    PDA
type 1      CSG                    LBA
type 0      unrestricted grammar   Turing machine

DFA: Deterministic finite state automaton
(D)PDA: (Deterministic) Pushdown automaton
CFG: Context-free grammar
CSG: Context-sensitive grammar
LBA: Linear bounded automaton


slide-3
SLIDE 3

Form of Grammars of Type 0–3

For i ∈ {0, 1, 2, 3}, a grammar ⟨N, T, P, S⟩ of Type i, with N the set of non-terminal symbols, T the set of terminal symbols (N and T disjoint, Σ = N ∪ T), P the set of productions, and S the start symbol (S ∈ N), obeys the following restrictions:

  • T3: Every production in P is of the form A → aB or A → ε, with A, B ∈ N, a ∈ T.
  • T2: Every production in P is of the form A → x, with A ∈ N and x ∈ Σ*.
  • T1: Every production in P is of the form x₁Ax₂ → x₁yx₂, with x₁, x₂ ∈ Σ*, y ∈ Σ⁺, A ∈ N, and the possible exception of C → ε in case C does not occur on the right-hand side of a rule in P.
  • T0: No restrictions.
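The restrictions above can be checked mechanically. The following is a minimal sketch, assuming productions are encoded as pairs of symbol tuples; the names `is_type3`, `is_type2`, `is_type1` and `grammar_type` are illustrative and not part of the lecture. It reports the first form, checked in the order 3, 2, 1, that every production satisfies.

```python
from typing import List, Set, Tuple

# A production is a pair (left-hand side, right-hand side) of symbol tuples.
Production = Tuple[Tuple[str, ...], Tuple[str, ...]]


def is_type3(N: Set[str], T: Set[str], P: List[Production]) -> bool:
    """T3: every production is A -> aB or A -> epsilon."""
    for lhs, rhs in P:
        if len(lhs) != 1 or lhs[0] not in N:
            return False
        is_epsilon = len(rhs) == 0
        is_step = len(rhs) == 2 and rhs[0] in T and rhs[1] in N
        if not (is_epsilon or is_step):
            return False
    return True


def is_type2(N: Set[str], T: Set[str], P: List[Production]) -> bool:
    """T2: every production is A -> x with x in Sigma*."""
    sigma = N | T
    return all(
        len(lhs) == 1 and lhs[0] in N and all(s in sigma for s in rhs)
        for lhs, rhs in P
    )


def _has_context_split(N: Set[str], lhs, rhs) -> bool:
    """Is there a split lhs = x1 A x2, rhs = x1 y x2 with y non-empty?"""
    if len(rhs) < len(lhs):  # y would have to be empty
        return False
    for p, symbol in enumerate(lhs):
        x1, x2 = lhs[:p], lhs[p + 1:]
        if symbol in N and rhs[:p] == x1 and rhs[len(rhs) - len(x2):] == x2:
            return True
    return False


def is_type1(N: Set[str], T: Set[str], P: List[Production]) -> bool:
    """T1: context-sensitive form, with the C -> epsilon exception."""
    sigma = N | T
    rhs_symbols = {s for _, rhs in P for s in rhs}
    for lhs, rhs in P:
        if not all(s in sigma for s in lhs + rhs):
            return False
        if len(lhs) == 1 and lhs[0] in N and not rhs:
            # C -> epsilon is allowed only if C never occurs on a right-hand side.
            if lhs[0] in rhs_symbols:
                return False
            continue
        if not _has_context_split(N, lhs, rhs):
            return False
    return True


def grammar_type(N: Set[str], T: Set[str], P: List[Production]) -> int:
    """Return the first of the forms 3, 2, 1 that all productions satisfy, else 0."""
    if is_type3(N, T, P):
        return 3
    if is_type2(N, T, P):
        return 2
    if is_type1(N, T, P):
        return 1
    return 0


# Example grammars (chosen for illustration only).
a_star = [(("S",), ("a", "S")), (("S",), ())]
anbn = [(("S",), ("a", "S", "b")), (("S",), ())]
print(grammar_type({"S"}, {"a", "b"}, a_star))
print(grammar_type({"S"}, {"a", "b"}, anbn))
```

Note that the check concerns the form of a grammar, not the language it generates: a Type 2 grammar may still generate a regular language.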

slide-8
SLIDE 8

Regular Languages

Regular grammars, deterministic finite state automata, nondeterministic finite state automata, and regular expressions characterize the same class of languages, viz. Type 3 languages.


slide-9
SLIDE 9

Reminder: DFA

Definition 1 (DFA) A deterministic FSA (DFA) is a quintuple (Σ, Q, i, F, δ) where

  • Σ is a finite set called the alphabet,
  • Q is a finite set of states,
  • i ∈ Q is the initial state,
  • F ⊆ Q is the set of final states, and
  • δ is the transition function from Q × Σ to Q.


slide-10
SLIDE 10

Reminder: Acceptance

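Definition 3 (Acceptance) Given a DFA M = (Σ, Q, i, F, δ), the language L(M) accepted by M is

L(M) = {x ∈ Σ* | δ̂(i, x) ∈ F},

where δ̂ is the extension of δ from single symbols to strings: δ̂(q, ε) = q and δ̂(q, xa) = δ(δ̂(q, x), a) for x ∈ Σ*, a ∈ Σ.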
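The definition of acceptance translates almost literally into code. The sketch below assumes a DFA given as the quintuple (Σ, Q, i, F, δ) with δ stored as a dictionary; the class name `DFA`, the method names, and the example automaton (an even number of a's) are illustrative choices, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class DFA:
    alphabet: Set[str]                   # Sigma
    states: Set[str]                     # Q
    initial: str                         # i
    finals: Set[str]                     # F
    delta: Dict[Tuple[str, str], str]    # total function Q x Sigma -> Q

    def delta_hat(self, state: str, word: str) -> str:
        """Extended transition function: follow delta symbol by symbol."""
        for symbol in word:
            if symbol not in self.alphabet:
                raise ValueError(f"symbol {symbol!r} is not in the alphabet")
            state = self.delta[(state, symbol)]
        return state

    def accepts(self, word: str) -> bool:
        """x is in L(M) iff delta_hat(i, x) is a final state."""
        return self.delta_hat(self.initial, word) in self.finals


# Example: words over {a, b} with an even number of a's.
even_a = DFA(
    alphabet={"a", "b"},
    states={"even", "odd"},
    initial="even",
    finals={"even"},
    delta={
        ("even", "a"): "odd", ("even", "b"): "even",
        ("odd", "a"): "even", ("odd", "b"): "odd",
    },
)
print(even_a.accepts("aba"))
```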


slide-11
SLIDE 11

Nondeterministic Finite-state Automata

Definition 4 (NFA) A nondeterministic finite-state automaton is a quintuple (Σ, Q, S, F, δ) where

  • Σ is a finite set called the alphabet,
  • Q is a finite set of states,
  • S ⊆ Q is the set of initial states,
  • F ⊆ Q is the set of final states, and
  • δ is the transition function from Q × Σ to Pow(Q).


slide-12
SLIDE 12

Theorem (Rabin/Scott)

For every language accepted by an NFA there is a DFA which accepts the same language.
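The usual proof of this theorem is the subset (powerset) construction: each DFA state is a set of NFA states. The following sketch assumes an NFA without ε-transitions, as in Definition 4, with δ stored as a dictionary from (state, symbol) to a set of states; missing entries are read as the empty set. The function names and the example NFA are illustrative.

```python
from collections import deque
from typing import Dict, FrozenSet, Set, Tuple


def determinize(alphabet: Set[str], initials: Set[int], finals: Set[int],
                delta: Dict[Tuple[int, str], Set[int]]):
    """Subset construction; only subsets reachable from the start set are built."""
    start = frozenset(initials)
    dfa_states = {start}
    dfa_delta: Dict[Tuple[FrozenSet[int], str], FrozenSet[int]] = {}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for a in alphabet:
            # Union of all NFA successors of the states in the current subset.
            target = frozenset(r for q in current for r in delta.get((q, a), ()))
            dfa_delta[(current, a)] = target
            if target not in dfa_states:
                dfa_states.add(target)
                queue.append(target)
    # A subset is final iff it contains at least one final NFA state.
    dfa_finals = {s for s in dfa_states if s & finals}
    return dfa_states, start, dfa_finals, dfa_delta


def dfa_accepts(start, dfa_finals, dfa_delta, word: str) -> bool:
    state = start
    for a in word:
        state = dfa_delta[(state, a)]
    return state in dfa_finals


# Example NFA: words over {a, b} whose second-to-last symbol is a.
nfa_delta = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "a"): {2}, (1, "b"): {2},
}
dfa_states, start, dfa_finals, dfa_delta = determinize({"a", "b"}, {0}, {2}, nfa_delta)
print(len(dfa_states), dfa_accepts(start, dfa_finals, dfa_delta, "aab"))
```

In the worst case the resulting DFA has exponentially many states; building only the reachable subsets keeps small examples small.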


slide-13
SLIDE 13

Regular Expressions

Given an alphabet Σ of symbols, the following are all and only the regular expressions over the alphabet Σ ∪ {Ø, 0, |, *, [, ]}:

  Ø          the empty set
  0          the empty string (also written ε, [])
  σ          for all σ ∈ Σ
  [α | β]    union (for α, β reg. ex.) (also written α ∪ β, α + β)
  [α β]      concatenation (for α, β reg. ex.)
  [α*]       Kleene star (for α reg. ex.)


slide-14
SLIDE 14

Meaning of Regular Expressions

  L(Ø) = ∅                      the empty language
  L(0) = {ε}                    the empty-string language
  L(σ) = {σ}
  L([α | β]) = L(α) ∪ L(β)
  L([α β]) = L(α) ◦ L(β)
  L([α*]) = (L(α))*

Σ* is called the universal language. Note that the universal language is given relative to a particular alphabet.
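The semantic clauses can be read as a recursive function from expressions to languages. Since L(α) may be infinite, the sketch below only computes the members of L(α) up to a given length. The nested-tuple encoding of expressions and the function name `language` are assumptions made for this illustration.

```python
from typing import Set, Tuple

# Expressions as nested tuples (an encoding assumed for this sketch):
#   ("empty",)            Ø
#   ("eps",)              0, the empty string
#   ("sym", "a")          a symbol of the alphabet
#   ("union", e1, e2)     [e1 | e2]
#   ("concat", e1, e2)    [e1 e2]
#   ("star", e)           [e*]
Expr = Tuple


def language(expr: Expr, max_len: int) -> Set[str]:
    """All strings of length <= max_len in the language denoted by expr."""
    kind = expr[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {expr[1]} if len(expr[1]) <= max_len else set()
    if kind == "union":
        return language(expr[1], max_len) | language(expr[2], max_len)
    if kind == "concat":
        left = language(expr[1], max_len)
        right = language(expr[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":
        # Dropping the empty string from the base makes every step strictly
        # longer, so the loop stops once max_len is exceeded.
        base = language(expr[1], max_len) - {""}
        result = {""}
        frontier = {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind!r}")


# Example: [[a b]*], restricted to strings of length at most 4.
ab_star = ("star", ("concat", ("sym", "a"), ("sym", "b")))
print(sorted(language(ab_star, 4)))
```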


slide-15
SLIDE 15

Theorem (Kleene)

The set of languages which can be described by regular expressions is the set of regular languages.


slide-16
SLIDE 16

Pumping Lemma for Regular Languages

uvw theorem:

For each regular language L there is an integer n such that for each x ∈ L with |x| ≥ n there are u, v, w with x = uvw such that

  • 1. |v| ≥ 1,
  • 2. |uv| ≤ n,
  • 3. for all i ∈ ℕ₀: uv^i w ∈ L.


slide-17
SLIDE 17

A Non-regular Language

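Corollary

Let Σ be {a, b}. L = {a^n b^n | n ∈ ℕ} is not regular.

Proof

Assume k ∈ ℕ. For each decomposition a^k b^k = uvw with v ≠ ε, one of the following holds:

  • 1. v = a^l, 0 < l ≤ k, or
  • 2. v = a^(l1) b^(l2), 0 < l1, l2 ≤ k, or
  • 3. v = b^l, 0 < l ≤ k.

In case 1, uv^2 w contains more a's than b's; in case 2, it contains a b followed by an a; in case 3, it contains more b's than a's. In each case we have uv^2 w ∉ L. The result follows with the Pumping Lemma.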
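The case analysis can be spot-checked by brute force for small k. The sketch below enumerates every decomposition a^k b^k = uvw with v ≠ ε and confirms that uv^2 w is never in L. It is a finite sanity check, not a replacement for the proof; the helper names are illustrative, and ℕ is assumed to start at 1.

```python
from typing import List, Tuple


def in_L(x: str) -> bool:
    """Membership in L = {a^n b^n | n >= 1} (assuming N starts at 1)."""
    half = len(x) // 2
    return len(x) > 0 and len(x) % 2 == 0 and x == "a" * half + "b" * half


def pumpable_splits(x: str, i: int = 2) -> List[Tuple[str, str, str]]:
    """All splits x = u v w with v non-empty such that u v^i w is still in L."""
    found = []
    for start in range(len(x)):
        for end in range(start + 1, len(x) + 1):
            u, v, w = x[:start], x[start:end], x[end:]
            if in_L(u + v * i + w):
                found.append((u, v, w))
    return found


# For k = 1..6, no decomposition of a^k b^k survives pumping with i = 2.
for k in range(1, 7):
    word = "a" * k + "b" * k
    assert in_L(word)
    assert pumpable_splits(word) == []
print("no pumpable decomposition found for k = 1..6")
```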


slide-18
SLIDE 18

Natural and Regular Languages

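Corollary German is not a regular language.

Proof Consider

L1 = {Ein Spion (der einen Spion)^k observiert^l wird meist selbst observiert}

(for k = l = 1: "A spy who observes a spy is usually observed himself"). L1 is regular.

L1 ∩ German = {Ein Spion (der einen Spion)^k observiert^k wird meist selbst observiert}

is not regular. Since regular languages are closed under intersection, German cannot be regular.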


slide-19
SLIDE 19

Theorem (Myhill/Nerode)

The following three statements are equivalent:

  • 1. The set L ⊆ Σ* is accepted by some DFA.
  • 2. L is the union of some of the equivalence classes of a right invariant equivalence relation of finite index.
  • 3. Let the equivalence relation R_L be defined by: x R_L y iff for all z ∈ Σ*, xz ∈ L iff yz ∈ L. Then R_L is of finite index.


slide-20
SLIDE 20

Minimization

For every nondeterministic finite-state automaton there exists an equivalent deterministic automaton with a minimal number of states.
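One way to obtain such an automaton is to determinize first and then merge indistinguishable states. The sketch below shows only the second step, using partition refinement in the style of Moore's algorithm on a complete DFA. The slides do not say which algorithm the lecture uses, so this choice, the function name `minimize`, and the example automaton are assumptions.

```python
from typing import Dict, Hashable, Set, Tuple


def minimize(alphabet: Set[str], initial: Hashable, finals: Set[Hashable],
             delta: Dict[Tuple[Hashable, str], Hashable]):
    """Return (states, initial, finals, delta) of an equivalent minimal DFA."""
    symbols = sorted(alphabet)

    # Step 1: discard states that cannot be reached from the initial state.
    reachable = {initial}
    stack = [initial]
    while stack:
        q = stack.pop()
        for a in symbols:
            r = delta[(q, a)]
            if r not in reachable:
                reachable.add(r)
                stack.append(r)

    # Step 2: start with the split final / non-final and refine until stable.
    block = {q: int(q in finals) for q in reachable}
    while True:
        ids: Dict[tuple, int] = {}
        refined = {}
        for q in reachable:
            # Two states stay together only if they agree on their own block
            # and on the blocks of all their successors.
            signature = (block[q], tuple(block[delta[(q, a)]] for a in symbols))
            refined[q] = ids.setdefault(signature, len(ids))
        stable = len(ids) == len(set(block.values()))
        block = refined
        if stable:
            break

    # Step 3: the blocks are the states of the minimal automaton.
    min_states = set(block.values())
    min_finals = {block[q] for q in reachable if q in finals}
    min_delta = {(block[q], a): block[delta[(q, a)]]
                 for q in reachable for a in symbols}
    return min_states, block[initial], min_finals, min_delta


# Example: a 4-cycle over {a} accepting words of even length, plus an
# unreachable state 4. The minimal DFA needs only two states.
cycle_delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 0, (4, "a"): 0}
min_states, min_initial, min_finals, min_delta = minimize(
    {"a"}, 0, {0, 2, 4}, cycle_delta)
print(len(min_states))
```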

slide-24
SLIDE 24

Closure Properties of Regular Languages

Regular languages are closed under

  • union (regular expression)
  • intersection (e.g. by construction)
  • complement (DFA)
  • product (regular expression)
  • Kleene star (regular expression)
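The two closure properties that are not immediate from regular expressions can be sketched on complete DFAs: complement swaps final and non-final states, and intersection runs two automata in parallel (the product construction). The `DFA` tuple, the function names and the two example automata are illustrative assumptions.

```python
from itertools import product
from typing import NamedTuple


class DFA(NamedTuple):
    alphabet: frozenset
    states: frozenset
    initial: object
    finals: frozenset
    delta: dict  # total function: (state, symbol) -> state


def accepts(m: DFA, word: str) -> bool:
    q = m.initial
    for a in word:
        q = m.delta[(q, a)]
    return q in m.finals


def complement(m: DFA) -> DFA:
    """Swap final and non-final states (requires a total delta)."""
    return m._replace(finals=m.states - m.finals)


def intersection(m1: DFA, m2: DFA) -> DFA:
    """Product construction: states are pairs, both components must accept."""
    if m1.alphabet != m2.alphabet:
        raise ValueError("both automata must share one alphabet")
    states = frozenset(product(m1.states, m2.states))
    delta = {((p, q), a): (m1.delta[(p, a)], m2.delta[(q, a)])
             for (p, q) in states for a in m1.alphabet}
    finals = frozenset((p, q) for (p, q) in states
                       if p in m1.finals and q in m2.finals)
    return DFA(m1.alphabet, states, (m1.initial, m2.initial), finals, delta)


sigma = frozenset({"a", "b"})
# Words with an even number of a's.
even_a = DFA(sigma, frozenset({0, 1}), 0, frozenset({0}),
             {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1})
# Words ending in b.
ends_b = DFA(sigma, frozenset({"n", "y"}), "n", frozenset({"y"}),
             {("n", "a"): "n", ("y", "a"): "n", ("n", "b"): "y", ("y", "b"): "y"})

both = intersection(even_a, ends_b)
print(accepts(both, "aab"), accepts(complement(even_a), "a"))
```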

slide-29
SLIDE 29

Decidable Problems for Reg. Languages

  • 1. Word problem
  • 2. Emptiness
  • 3. Finiteness
  • 4. Intersection
  • 5. Equivalence
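Each of these problems reduces to a graph search on a DFA. The slide does not spell out the decision procedures, so the following is a sketch of standard ones for emptiness, finiteness and equivalence on complete DFAs. The word problem is just running the automaton, and the intersection problem can be decided by an emptiness test on a product automaton. All function names and the example automata are illustrative.

```python
from collections import deque


def reachable(start, alphabet, delta):
    """All states reachable from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return seen


def is_empty(alphabet, initial, finals, delta) -> bool:
    """L(M) is empty iff no final state is reachable from the initial state."""
    return not (reachable(initial, alphabet, delta) & set(finals))


def is_finite(alphabet, initial, finals, delta) -> bool:
    """L(M) is finite iff no cycle passes through a useful state, i.e. one
    that is reachable and from which a final state can be reached."""
    useful = {q for q in reachable(initial, alphabet, delta)
              if reachable(q, alphabet, delta) & set(finals)}
    # Repeatedly delete states without a successor among the remaining
    # states; anything left over lies on or leads into a cycle.
    remaining = set(useful)
    changed = True
    while changed:
        changed = False
        for q in list(remaining):
            if not any(delta[(q, a)] in remaining for a in alphabet):
                remaining.remove(q)
                changed = True
    return not remaining


def equivalent(alphabet, m1, m2) -> bool:
    """m1, m2 are triples (initial, finals, delta). The languages differ iff
    some jointly reachable pair of states disagrees on being final."""
    (i1, f1, d1), (i2, f2, d2) = m1, m2
    seen = {(i1, i2)}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        if (p in f1) != (q in f2):
            return False
        for a in alphabet:
            nxt = (d1[(p, a)], d2[(q, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


# Examples over the alphabet {a}.
only_a = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 2}                # accepts just "a"
cycle4 = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 0}  # even length
cycle2 = {(0, "a"): 1, (1, "a"): 0}                             # even length
print(is_empty({"a"}, 0, {1}, only_a), is_finite({"a"}, 0, {1}, only_a))
print(equivalent({"a"}, (0, {0, 2}, cycle4), (0, {0}, cycle2)))
```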
