SLIDE 1

Context-Free Grammars

  • Z. Sawa (TU Ostrava)
  • Introd. to Theoretical Computer Science

April 20, 2020 1 / 63

SLIDE 2

Context-Free Grammars

Example: We would like to describe a language of arithmetic expressions, containing expressions such as:

  175
  (9+15)
  (((10-4)*((1+34)+2))/(3+(-37)))

For simplicity we assume that:

  • Expressions are fully parenthesized.
  • The only arithmetic operations are “+”, “-”, “*”, “/” and unary “-”.
  • Values of operands are natural numbers written in decimal; a number is represented as a non-empty sequence of digits.

Alphabet: Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, -, *, /, (, )}

SLIDE 3

Context-Free Grammars

Example (cont.): A description by an inductive definition:

Digit is any of the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

Number is a non-empty sequence of digits, i.e.:

  • If α is a digit then α is a number.
  • If α is a digit and β is a number then αβ is also a number.

Expression is a sequence of symbols constructed according to the following rules:

  • If α is a number then α is an expression.
  • If α is an expression then (-α) is also an expression.
  • If α and β are expressions then (α+β) is also an expression.
  • If α and β are expressions then (α-β) is also an expression.
  • If α and β are expressions then (α*β) is also an expression.
  • If α and β are expressions then (α/β) is also an expression.

SLIDE 4

Context-Free Grammars

Example (cont.): The same information that was described by the previous inductive definition can be represented by a context-free grammar.

New auxiliary symbols, called nonterminals, are introduced:

  • D stands for an arbitrary digit
  • C stands for an arbitrary number
  • E stands for an arbitrary expression

Rules:

  D → 0    D → 1    D → 2    D → 3    D → 4
  D → 5    D → 6    D → 7    D → 8    D → 9
  C → D    C → DC
  E → C    E → (-E)    E → (E+E)    E → (E-E)    E → (E*E)    E → (E/E)

SLIDE 5

Context-Free Grammars

Example (cont.): Written in a more succinct way:

  D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  C → D | DC
  E → C | (-E) | (E+E) | (E-E) | (E*E) | (E/E)
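The three rule groups for D, C and E can be turned fairly directly into a recognizer. The following is only a minimal sketch in Python, not part of the lecture; the function names `parse_expression` and `is_expression` are hypothetical. It relies on the observation that an expression never starts with “-”, so after “(” a “-” can only come from the rule E → (-E).

```python
DIGITS = "0123456789"
OPERATORS = "+-*/"

def parse_expression(text, pos=0):
    """Return the position just after one expression E starting at pos, or None."""
    if pos < len(text) and text[pos] in DIGITS:
        # E -> C, where C -> D | DC is a non-empty run of digits
        while pos < len(text) and text[pos] in DIGITS:
            pos += 1
        return pos
    if pos < len(text) and text[pos] == "(":
        pos += 1
        if pos < len(text) and text[pos] == "-":
            # E -> (-E): an expression cannot start with "-", so this is unary minus
            end = parse_expression(text, pos + 1)
            if end is not None and end < len(text) and text[end] == ")":
                return end + 1
            return None
        # E -> (E+E) | (E-E) | (E*E) | (E/E)
        end = parse_expression(text, pos)
        if end is None or end >= len(text) or text[end] not in OPERATORS:
            return None
        end = parse_expression(text, end + 1)
        if end is not None and end < len(text) and text[end] == ")":
            return end + 1
    return None

def is_expression(text):
    """True if the whole text is derivable from E."""
    return parse_expression(text) == len(text)
```

For instance, `is_expression("(9+15)")` accepts, while `is_expression("9+15")` rejects because the grammar requires full parenthesization.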

SLIDE 6

Context-Free Grammars

Example: A language where words are (possibly empty) sequences of expressions described in the previous example, where individual expressions are separated by commas (the alphabet must be extended with the symbol “,”):

  S → T | ε
  T → E | E,T
  D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  C → D | DC
  E → C | (-E) | (E+E) | (E-E) | (E*E) | (E/E)

SLIDE 7

Context-Free Grammars

Example: Statements of some programming language (a fragment of a grammar):

  S → E; | T | if (E) S | if (E) S else S | while (E) S | do S while (E); | for (F; F; F) S | return F;
  T → { U }
  U → ε | SU
  F → ε | E
  E → . . .

Remark:

  • S: statement
  • T: block of statements
  • U: sequence of statements
  • E: expression
  • F: optional expression that can be omitted

SLIDE 8

Context-Free Grammars

Formally, a context-free grammar is a tuple G = (Π, Σ, S, P) where:

  • Π is a finite set of nonterminal symbols (nonterminals)
  • Σ is a finite set of terminal symbols (terminals), where Π ∩ Σ = ∅
  • S ∈ Π is an initial nonterminal
  • P ⊆ Π × (Π ∪ Σ)∗ is a finite set of rewrite rules

SLIDE 9

Context-Free Grammars

Remarks:

  • We will use uppercase letters A, B, C, . . . to denote nonterminal symbols.
  • We will use lowercase letters a, b, c, . . . or digits 0, 1, 2, . . . to denote terminal symbols.
  • We will use lowercase Greek letters α, β, γ, . . . to denote strings from (Π ∪ Σ)∗.
  • Instead of (A, α) we will write rules as

      A → α

    where A is the left-hand side of the rule and α is the right-hand side of the rule.

SLIDE 10

Context-Free Grammars

Example: Grammar G = (Π, Σ, S, P) where

  • Π = {A, B, C}
  • Σ = {a, b}
  • S = A
  • P contains the rules

      A → aBBb
      A → AaA
      B → ε
      B → bCA
      C → AB
      C → a
      C → b
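One possible way to hold such a tuple in a program is sketched below. This is an illustration only: the class name `Grammar`, its field names, and the convention that every symbol is a single character (with the empty string standing for ε) are assumptions made here, not something prescribed by the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # Π
    terminals: frozenset     # Σ
    start: str               # S
    rules: tuple             # P as pairs (A, α); α is a string, "" stands for ε

    def is_well_formed(self):
        """Check the side conditions of the formal definition."""
        symbols = self.nonterminals | self.terminals
        return (
            not (self.nonterminals & self.terminals)      # Π ∩ Σ = ∅
            and self.start in self.nonterminals           # S ∈ Π
            and all(lhs in self.nonterminals and set(rhs) <= symbols
                    for lhs, rhs in self.rules)           # P ⊆ Π × (Π ∪ Σ)∗
        )

# The example grammar with Π = {A, B, C}, Σ = {a, b}, S = A
G = Grammar(
    nonterminals=frozenset("ABC"),
    terminals=frozenset("ab"),
    start="A",
    rules=(("A", "aBBb"), ("A", "AaA"), ("B", ""), ("B", "bCA"),
           ("C", "AB"), ("C", "a"), ("C", "b")),
)
```

Storing P as a tuple of pairs mirrors the definition P ⊆ Π × (Π ∪ Σ)∗ literally; a dictionary from nonterminals to lists of right-hand sides would work equally well.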

SLIDE 11

Context-Free Grammars

Remark: If we have several rules with the same left-hand side, for example

  A → α1    A → α2    A → α3

we can write them more succinctly as

  A → α1 | α2 | α3

For example, the rules of the grammar from the previous slide can be written as

  A → aBBb | AaA
  B → ε | bCA
  C → AB | a | b

SLIDE 34

Context-Free Grammars

Grammars are used for generating words.

Example: G = (Π, Σ, A, P) where Π = {A, B, C}, Σ = {a, b}, and P contains the rules

  A → aBBb | AaA
  B → ε | bCA
  C → AB | a | b

For example, the word abbabb can be generated in grammar G as follows:

  A ⇒ aBBb ⇒ abCABb ⇒ abCaBBbBb ⇒ abCaBbBb ⇒ abbaBbBb ⇒ abbaBbb ⇒ abbabb

SLIDE 36

Context-Free Grammars

On strings from (Π ∪ Σ)∗ we define a relation ⇒ ⊆ (Π ∪ Σ)∗ × (Π ∪ Σ)∗ such that

  α ⇒ α′ iff α = β1Aβ2 and α′ = β1γβ2 for some β1, β2, γ ∈ (Π ∪ Σ)∗ and A ∈ Π where (A → γ) ∈ P.

Example: If (B → bCA) ∈ P then

  aCBbA ⇒ aCbCAbA

Remark: Informally, α ⇒ α′ means that it is possible to derive α′ from α in one step, where an occurrence of some nonterminal A in α is replaced with the right-hand side of some rule A → γ with A on the left-hand side.
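The one-step relation can be computed mechanically: try every occurrence of a nonterminal and every rule for it. The sketch below assumes single-character symbols and the empty string for ε; the names `RULES` and `one_step` are hypothetical, and the rule set is the A, B, C example grammar used earlier in these slides.

```python
RULES = [("A", "aBBb"), ("A", "AaA"), ("B", ""), ("B", "bCA"),
         ("C", "AB"), ("C", "a"), ("C", "b")]

def one_step(alpha, rules, nonterminals):
    """Return the set of all strings alpha' such that alpha => alpha'."""
    results = set()
    for i, symbol in enumerate(alpha):
        if symbol not in nonterminals:
            continue
        for lhs, gamma in rules:
            if lhs == symbol:
                # alpha = beta1 A beta2 becomes beta1 gamma beta2
                results.add(alpha[:i] + gamma + alpha[i + 1:])
    return results
```

With these rules, `one_step("aCBbA", RULES, set("ABC"))` contains `"aCbCAbA"`, which is the step shown in the example, and a string without nonterminals has no successors.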

SLIDE 37

Context-Free Grammars

A derivation of length n is a sequence β0, β1, β2, . . . , βn, where βi ∈ (Π ∪ Σ)∗ and βi−1 ⇒ βi for all 1 ≤ i ≤ n. It can be written more succinctly as

  β0 ⇒ β1 ⇒ β2 ⇒ . . . ⇒ βn−1 ⇒ βn

The fact that for given α, α′ ∈ (Π ∪ Σ)∗ and n ∈ ℕ there exists some derivation β0 ⇒ β1 ⇒ β2 ⇒ . . . ⇒ βn−1 ⇒ βn, where α = β0 and α′ = βn, is denoted

  α ⇒^n α′

The fact that α ⇒^n α′ for some n ≥ 0 is denoted

  α ⇒∗ α′

Remark: Relation ⇒∗ is the reflexive and transitive closure of relation ⇒ (i.e., the smallest reflexive and transitive relation containing relation ⇒).

SLIDE 38

Context-Free Grammars

Sentential forms are those α ∈ (Π ∪ Σ)∗, for which S ⇒∗ α where S is the initial nonterminal.

SLIDE 39

Context-Free Grammars

A language L(G) generated by a grammar G = (Π, Σ, S, P) is the set of all words over alphabet Σ that can be derived by some derivation from the initial nonterminal S using rules from P, i.e., L(G) = {w ∈ Σ∗ | S ⇒∗ w}

SLIDE 42

Context-Free Grammars

Example: We want to construct a grammar generating the language

  L = {a^n b^n | n ≥ 0}

Grammar G = (Π, Σ, S, P) where Π = {S}, Σ = {a, b}, and P contains

  S → ε | aSb

Derivations:

  S ⇒ ε
  S ⇒ aSb ⇒ ab
  S ⇒ aSb ⇒ aaSbb ⇒ aabb
  S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
  S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaaaSbbbb ⇒ aaaabbbb
  · · ·
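The derivations listed here can be enumerated by a breadth-first search over sentential forms. The sketch below is an illustration under stated assumptions: nonterminals are uppercase letters, terminals are not, and the function name `generate_words` is hypothetical. The bound on the length of intermediate forms is a heuristic chosen here to guarantee termination; for other grammars it could cut off some derivations.

```python
from collections import deque

def generate_words(rules, start, max_len):
    """Collect words of length <= max_len found by a bounded breadth-first search."""
    words, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        if not any(ch.isupper() for ch in form):
            words.add(form)          # no nonterminal left: form is a word
            continue
        for i, symbol in enumerate(form):
            if not symbol.isupper():
                continue
            for lhs, rhs in rules:
                if lhs != symbol:
                    continue
                new_form = form[:i] + rhs + form[i + 1:]
                terminal_count = sum(1 for ch in new_form if not ch.isupper())
                # prune forms that already have too many terminals or are too long
                if (terminal_count <= max_len
                        and len(new_form) <= 2 * max_len + 2
                        and new_form not in seen):
                    seen.add(new_form)
                    queue.append(new_form)
    return words

ANBN_RULES = [("S", ""), ("S", "aSb")]
```

Calling `generate_words(ANBN_RULES, "S", 6)` yields the four words ε, ab, aabb and aaabbb, matching the first four derivations above.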

SLIDE 45

Context-Free Grammars

Example: We want to construct a grammar generating the language consisting of all palindromes over the alphabet {a, b}, i.e.,

  L = {w ∈ {a, b}∗ | w = w^R}

Remark: w^R denotes the reverse of a word w, i.e., the word w written backwards.

Solution:

  S → ε | a | b | aSa | bSb

  S ⇒ aSa ⇒ abSba ⇒ abaSaba ⇒ abaaaba
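Each rule of this grammar corresponds to one case of a recursive membership test. The following sketch (the function name `derives_palindrome` is hypothetical) mirrors the rules one by one rather than comparing the word with its reverse.

```python
def derives_palindrome(word):
    """Check whether S =>* word for S -> ε | a | b | aSa | bSb."""
    if any(ch not in "ab" for ch in word):
        return False                  # not a word over {a, b}
    if len(word) <= 1:
        return True                   # S -> ε | a | b
    # S -> aSa | bSb: outer symbols must agree, the inside must derive from S
    return word[0] == word[-1] and derives_palindrome(word[1:-1])
```

The recursion peels off one matching pair of outer symbols per call, which is the derivation S ⇒ aSa ⇒ abSba ⇒ . . . read from the outside in.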

SLIDE 48

Context-Free Grammars

Example: We want to construct a grammar generating the language L consisting of all correctly parenthesised sequences of symbols ‘(’ and ‘)’.

For example (()())(()) ∈ L but )()) ∉ L.

Solution:

  S → ε | (S) | SS

  S ⇒ SS ⇒ (S)S ⇒ (S)(S) ⇒ (SS)(S) ⇒ ((S)S)(S) ⇒ (()S)(S) ⇒ (()(S))(S) ⇒ (()())(S) ⇒ (()())((S)) ⇒ (()())(())
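For checking membership in this language, a program does not have to search for a derivation. The sketch below uses the standard counting characterisation of correctly parenthesised sequences instead of the grammar itself: no prefix may close more parentheses than it opened, and the totals must match. The function name `is_balanced` is hypothetical.

```python
def is_balanced(word):
    """Counting test for correctly parenthesised sequences of '(' and ')'."""
    depth = 0
    for ch in word:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False      # a ')' without a matching '(' before it
        else:
            return False          # symbol outside the alphabet
    return depth == 0             # every '(' has been closed
```

On the two words from the example, `is_balanced("(()())(())")` accepts and `is_balanced(")())")` rejects at the very first symbol.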

SLIDE 51

Context-Free Grammars

Example: We want to construct a grammar generating the language L consisting of all correctly constructed arithmetic expressions where operands are always of the form ‘a’ and where the symbols + and ∗ can be used as operators.

For example (a + a) ∗ a + (a ∗ a) ∈ L.

Solution:

  E → a | E + E | E ∗ E | (E)

  E ⇒ E + E ⇒ E ∗ E + E ⇒ (E) ∗ E + E ⇒ (E + E) ∗ E + E ⇒ (a + E) ∗ E + E ⇒ (a + a) ∗ E + E ⇒ (a + a) ∗ a + E ⇒ (a + a) ∗ a + (E) ⇒ (a + a) ∗ a + (E ∗ E) ⇒ (a + a) ∗ a + (a ∗ E) ⇒ (a + a) ∗ a + (a ∗ a)

SLIDE 74

Derivation Tree

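Grammar:

  A → aBBb | AaA
  B → ε | bCA
  C → AB | a | b

Derivation:

  A ⇒ aBBb ⇒ abCABb ⇒ abCaBBbBb ⇒ abCaBbBb ⇒ abbaBbBb ⇒ abbaBbb ⇒ abbabb

Derivation tree (children are indented under their parent and listed from left to right):

  A
    a
    B
      b
      C
        b
      A
        a
        B
          ε
        B
          ε
        b
    B
      ε
    b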

SLIDE 75

Derivation Tree

For each derivation there is some derivation tree:

  • Nodes of the tree are labelled with terminals and nonterminals.
  • The root of the tree is labelled with the initial nonterminal.
  • The leaves of the tree are labelled with terminals or with symbols ε.
  • The remaining nodes of the tree are labelled with nonterminals.
  • If a node is labelled with some nonterminal A then its children are labelled with the symbols from the right-hand side of some rewriting rule A → α.
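These conditions can be checked on a concrete tree. The sketch below is only an illustration: a tree node is represented as a pair (label, list of children), the names `leaf`, `tree_yield` and `is_valid_tree` are hypothetical, and the sample tree is the one for the word abbabb in the A, B, C grammar from the earlier slides.

```python
RULES = {("A", "aBBb"), ("A", "AaA"), ("B", ""), ("B", "bCA"),
         ("C", "AB"), ("C", "a"), ("C", "b")}
NONTERMINALS = set("ABC")

def leaf(label):
    return (label, [])

EPS = leaf("ε")

# Derivation tree of abbabb
TREE = ("A", [
    leaf("a"),
    ("B", [leaf("b"),
           ("C", [leaf("b")]),
           ("A", [leaf("a"), ("B", [EPS]), ("B", [EPS]), leaf("b")])]),
    ("B", [EPS]),
    leaf("b"),
])

def tree_yield(node):
    """Concatenate the leaf labels from left to right, treating ε as the empty word."""
    label, children = node
    if not children:
        return "" if label == "ε" else label
    return "".join(tree_yield(child) for child in children)

def is_valid_tree(node, rules, nonterminals):
    """Leaves carry terminals or ε; inner nodes expand by some rule A -> α."""
    label, children = node
    if not children:
        return label not in nonterminals
    rhs = "".join(child[0] for child in children)
    if rhs == "ε":
        rhs = ""                  # a single ε child stands for the rule A -> ε
    return (label, rhs) in rules and all(
        is_valid_tree(child, rules, nonterminals) for child in children)
```

Reading the leaves of `TREE` from left to right with `tree_yield` gives back the derived word, and `is_valid_tree` confirms that every inner node corresponds to a rule of the grammar.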

SLIDE 76

Left and Right Derivation

Grammar:

  E → a | E + E | E ∗ E | (E)

A left derivation is a derivation where in every step we always replace the leftmost nonterminal.

  E ⇒ E + E ⇒ E ∗ E + E ⇒ a ∗ E + E ⇒ a ∗ a + E ⇒ a ∗ a + a

A right derivation is a derivation where in every step we always replace the rightmost nonterminal.

  E ⇒ E + E ⇒ E + a ⇒ E ∗ E + a ⇒ E ∗ a + a ⇒ a ∗ a + a

A derivation need not be left or right:

  E ⇒ E + E ⇒ E ∗ E + E ⇒ E ∗ a + E ⇒ E ∗ a + a ⇒ a ∗ a + a
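The difference between the two strategies is only which occurrence of a nonterminal gets rewritten. The sketch below replays a list of chosen right-hand sides either leftmost-first or rightmost-first. The names `replace_nonterminal` and `derive` are hypothetical, the symbol ∗ is written as `*` without spaces, and the code assumes the caller only supplies right-hand sides of actual rules (it does not check this).

```python
def replace_nonterminal(form, rhs, nonterminals, leftmost=True):
    """Rewrite the leftmost (or rightmost) nonterminal of form with rhs."""
    positions = [i for i, symbol in enumerate(form) if symbol in nonterminals]
    if not positions:
        raise ValueError("no nonterminal left to replace")
    i = positions[0] if leftmost else positions[-1]
    return form[:i] + rhs + form[i + 1:]

def derive(start, choices, nonterminals, leftmost=True):
    """Return the list of sentential forms obtained by applying choices in order."""
    forms = [start]
    for rhs in choices:
        forms.append(replace_nonterminal(forms[-1], rhs, nonterminals, leftmost))
    return forms

LEFT = derive("E", ["E+E", "E*E", "a", "a", "a"], {"E"}, leftmost=True)
RIGHT = derive("E", ["E+E", "a", "E*E", "a", "a"], {"E"}, leftmost=False)
```

Both `LEFT` and `RIGHT` end in `a*a+a` and reproduce the left and right derivations shown above step by step.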

SLIDE 77

Left and Right Derivation

There can be several different derivations corresponding to one derivation tree. For every derivation tree, there is exactly one left and exactly one right derivation corresponding to the tree.

SLIDE 78

Equivalence of Grammars

Grammars G1 and G2 are equivalent if they generate the same language, i.e., if L(G1) = L(G2). Remark: The problem of equivalence of context-free grammars is algorithmically undecidable. It can be shown that it is not possible to construct an algorithm that would decide for any pair of context-free grammars if they are equivalent or not. Even the problem to decide if a grammar generates the language Σ∗ is algorithmically undecidable.

SLIDE 79

Ambiguous Grammars

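A grammar G is ambiguous if there is a word w ∈ L(G) that has two different derivation trees or, equivalently, two different left derivations or two different right derivations.

Example (grammar E → a | E + E | E ∗ E | (E), word a ∗ a + a):

  E ⇒ E + E ⇒ E ∗ E + E ⇒ a ∗ E + E ⇒ a ∗ a + E ⇒ a ∗ a + a
  E ⇒ E ∗ E ⇒ E ∗ E + E ⇒ a ∗ E + E ⇒ a ∗ a + E ⇒ a ∗ a + a

The corresponding derivation trees (children are indented under their parent and listed from left to right):

  E
    E
      E
        a
      ∗
      E
        a
    +
    E
      a

  E
    E
      a
    ∗
    E
      E
        a
      +
      E
        a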
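For a small grammar like E → a | E + E | E ∗ E | (E), the number of derivation trees of a word can be counted directly. The sketch below is specific to this one grammar and is not a general ambiguity test; the function name `count_trees` is hypothetical and ∗ is written as `*` without spaces. It counts, for every substring, the trees rooted in E by trying each rule.

```python
from functools import lru_cache

def count_trees(word):
    """Number of derivation trees of word from E in E -> a | E+E | E*E | (E)."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # number of derivation trees of word[i:j]
        total = 0
        if j - i == 1 and word[i] == "a":
            total += 1                                  # E -> a
        if j - i >= 3 and word[i] == "(" and word[j - 1] == ")":
            total += count(i + 1, j - 1)                # E -> (E)
        for k in range(i + 1, j - 1):
            if word[k] in "+*":
                total += count(i, k) * count(k + 1, j)  # E -> E+E | E*E split at k
        return total
    return count(0, len(word))
```

A result greater than 1 for some word witnesses ambiguity: `count_trees("a*a+a")` is 2, corresponding to the two trees of the example, while `count_trees("(a+a)*a")` is 1.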

SLIDE 80

Ambiguous Grammars

Sometimes it is possible to replace an ambiguous grammar with an unambiguous grammar generating the same language.

Example: The grammar

  E → a | E + E | E ∗ E | (E)

can be replaced with the equivalent grammar

  E → T | T + E
  T → F | F ∗ T
  F → a | (E)

Remark: If there is no unambiguous grammar equivalent to a given ambiguous grammar, we say that the language it generates is inherently ambiguous.
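The grammar with the nonterminals E, T and F lends itself to a recursive-descent parser with one function per nonterminal. The following is a minimal sketch, not the lecture's own parser: the names `parse`, `parse_e`, `parse_t` and `parse_f` are hypothetical, ∗ is written as `*`, input is assumed to contain no spaces, and the result is a nested tuple rather than a full derivation tree.

```python
def parse(word):
    """Parse word with E -> T | T+E, T -> F | F*T, F -> a | (E)."""
    pos = 0

    def peek():
        return word[pos] if pos < len(word) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_e():
        left = parse_t()
        if peek() == "+":            # E -> T + E
            expect("+")
            return ("+", left, parse_e())
        return left                  # E -> T

    def parse_t():
        left = parse_f()
        if peek() == "*":            # T -> F * T
            expect("*")
            return ("*", left, parse_t())
        return left                  # T -> F

    def parse_f():
        if peek() == "(":            # F -> (E)
            expect("(")
            inner = parse_e()
            expect(")")
            return inner
        expect("a")                  # F -> a
        return "a"

    tree = parse_e()
    if pos != len(word):
        raise ValueError(f"unexpected input at position {pos}")
    return tree
```

Because every word has exactly one parse in this grammar, `parse("a*a+a")` returns the single structure `("+", ("*", "a", "a"), "a")`, in which ∗ binds more tightly than +.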

SLIDE 81

Context-Free Languages

Definition

A language L is context-free if there exists some context-free grammar G such that L = L(G).

The class of context-free languages is closed with respect to:

  • concatenation
  • union
  • iteration

The class of context-free languages is not closed with respect to:

  • complement
  • intersection

SLIDE 82

Context-Free Languages

We have two grammars G1 = (Π1, Σ, S1, P1) and G2 = (Π2, Σ, S2, P2), and can assume that Π1 ∩ Π2 = ∅ and S ∉ Π1 ∪ Π2.

Grammar G such that L(G) = L(G1) · L(G2):

  G = (Π1 ∪ Π2 ∪ {S}, Σ, S, P1 ∪ P2 ∪ {S → S1S2})

Grammar G such that L(G) = L(G1) ∪ L(G2):

  G = (Π1 ∪ Π2 ∪ {S}, Σ, S, P1 ∪ P2 ∪ {S → S1, S → S2})

Grammar G such that L(G) = L(G1)∗:

  G = (Π1 ∪ {S}, Σ, S, P1 ∪ {S → ε, S → S1S})
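The three constructions only add a fresh initial nonterminal and one or two rules, which makes them easy to write down as functions. In the sketch below a grammar is a triple (nonterminals, start, rules) over a shared terminal alphabet, a right-hand side is a tuple of symbols (the empty tuple stands for ε), and the function names `concat`, `union` and `star` are hypothetical.

```python
def concat(g1, g2, new_start="S"):
    """Grammar for L(G1) · L(G2): add S -> S1 S2."""
    (n1, s1, p1), (n2, s2, p2) = g1, g2
    assert not (n1 & n2) and new_start not in (n1 | n2)
    return (n1 | n2 | {new_start}, new_start,
            p1 + p2 + [(new_start, (s1, s2))])

def union(g1, g2, new_start="S"):
    """Grammar for L(G1) ∪ L(G2): add S -> S1 and S -> S2."""
    (n1, s1, p1), (n2, s2, p2) = g1, g2
    assert not (n1 & n2) and new_start not in (n1 | n2)
    return (n1 | n2 | {new_start}, new_start,
            p1 + p2 + [(new_start, (s1,)), (new_start, (s2,))])

def star(g1, new_start="S"):
    """Grammar for L(G1)∗: add S -> ε and S -> S1 S."""
    n1, s1, p1 = g1
    assert new_start not in n1
    return (n1 | {new_start}, new_start,
            p1 + [(new_start, ()), (new_start, (s1, new_start))])

# Two one-rule grammars generating {a} and {b}
G1 = ({"S1"}, "S1", [("S1", ("a",))])
G2 = ({"S2"}, "S2", [("S2", ("b",))])
```

The assertions inside the functions enforce the two assumptions stated above: the nonterminal sets are disjoint and the new initial nonterminal is fresh.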

SLIDE 88

A Context-Free Grammar for a Regular Expression

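Example: The construction of a context-free grammar for regular expression ((a + b) · b)∗. Each node of the syntax tree of the expression is assigned a nonterminal, starting from the leaves:

    ∗          S5
      ·        S4
        +      S3
          a    S1
          b    S2
        b      S2

The resulting rules:

    S5 → ε | S4S5
    S4 → S3S2
    S3 → S1 | S2
    S2 → b
    S1 → a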
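The bottom-up construction can be sketched as a recursive function over the syntax tree of the regular expression. This is only an illustration under assumptions: the nested-tuple encoding of regular expressions and the name `regex_to_cfg` are hypothetical, and identical subexpressions share one nonterminal, as the two leaves b do in the example.

```python
def regex_to_cfg(regex):
    """Build grammar rules for a regular expression given as nested tuples.

    Node shapes (an assumed encoding): ("sym", "a"), ("union", r1, r2),
    ("concat", r1, r2), ("star", r).
    Returns (start nonterminal, list of (lhs, rhs) rules).
    """
    rules = []
    names = {}  # subexpression -> nonterminal, so equal subtrees are shared

    def build(node):
        if node in names:
            return names[node]
        kind, *args = node
        # Children are processed first, so leaves get the lowest numbers.
        kids = [build(a) for a in args] if kind != "sym" else []
        name = f"S{len(names) + 1}"
        names[node] = name
        if kind == "sym":
            rules.append((name, (args[0],)))
        elif kind == "union":
            rules.append((name, (kids[0],)))
            rules.append((name, (kids[1],)))
        elif kind == "concat":
            rules.append((name, (kids[0], kids[1])))
        elif kind == "star":
            rules.append((name, ()))  # the ε-rule
            rules.append((name, (kids[0], name)))
        else:
            raise ValueError(f"unknown node kind: {kind}")
        return name

    return build(regex), rules

# ((a + b) · b)∗
expr = ("star", ("concat", ("union", ("sym", "a"), ("sym", "b")), ("sym", "b")))
start, rules = regex_to_cfg(expr)
for lhs, rhs in rules:
    print(lhs, "→", " ".join(rhs) or "ε")
```

For the expression above the numbering of nonterminals coincides with the one used in the example.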


slide-89
SLIDE 89

Lexical and Syntactic Analysis — an example

Example: We would like to recognize a language of arithmetic expressions containing expressions such as:

    34
    x+1
    -x * 2 + 128 * (y - z / 3)

  • The expressions can contain number constants: sequences of digits 0, 1, . . . , 9.
  • The expressions can contain names of variables: sequences consisting of letters, digits, and symbol “_”, which do not start with a digit.
  • The expressions can contain basic arithmetic operations: “+”, “-”, “*”, “/”, and unary “-”.
  • It is possible to use parentheses “(” and “)”, and the standard priority of arithmetic operations applies.


slide-90
SLIDE 90

Lexical and Syntactic Analysis — an example

The problem we want to solve:

  • Input: a sequence of characters (e.g., a string, a text file, etc.)
  • Output: an abstract syntax tree representing the structure of the given expression, or information about a syntax error in the expression


slide-91
SLIDE 91

Lexical and Syntactic Analysis — an example

It is convenient to decompose this problem into several parts:

  • Lexical analysis: recognizing lexical elements (so-called tokens) such as identifiers, number constants, operators, etc.
  • Syntactic analysis: determining whether a given sequence of tokens corresponds to an allowed structure of expressions; basically, it means finding a corresponding derivation (or derivation tree) for a given word in a context-free grammar representing the given language (in our case, the language of all well-formed expressions).
  • Construction of an abstract syntax tree: this phase is usually combined with syntactic analysis. The result actually produced by the program is typically not a derivation tree itself but rather some kind of abstract syntax tree, or the performing of some actions connected with the rules of the given grammar.


slide-92
SLIDE 92

Lexical and Syntactic Analysis — an example

Terminals for the grammar representing well-formed expressions:

  • ident: identifier, e.g. “x”, “q3”, “count_r12”
  • num: number constant, e.g. “5”, “42”, “65535”
  • “(”: left parenthesis
  • “)”: right parenthesis
  • “+”: plus
  • “-”: minus
  • “*”: star
  • “/”: slash

Remark: Recognizing the sequences of symbols that correspond to individual terminals is the goal of lexical analysis.


slide-93
SLIDE 93

Lexical and Syntactic Analysis — an example

Example: Expression -x * 2 + 128 * (y - z / 3) is represented by the following sequence of symbols:

    - x * 2 + 1 2 8 * ( y - z / 3 )

The following sequence of tokens corresponds to this sequence of symbols; these tokens are terminal symbols of the given context-free grammar:

    - ident * num + num * ( ident - ident / num )

slide-95
SLIDE 95

Lexical and Syntactic Analysis — an example

The context-free grammar for the given language, the first try:

    E → ident | num | ( E ) | - E | E + E | E - E | E * E | E / E

This grammar is ambiguous: for example, ident + ident * ident has two different derivation trees.


slide-96
SLIDE 96

Lexical and Syntactic Analysis — an example

The context-free grammar for the given language, the second try:

    E → T | T + E | T - E
    T → F | F * T | F / T
    F → ident | num | ( E ) | - F

Different levels of priority are represented by different nonterminals:

  • E: expression
  • T: term
  • F: factor

This grammar is unambiguous.


slide-97
SLIDE 97

Lexical and Syntactic Analysis — an example

The context-free grammar for the given language, the third try:

    E → T | T A E
    A → + | -
    T → F | F M T
    M → * | /
    F → ident | num | ( E ) | - F

We create separate nonterminals for operators on different levels of priority:

  • A: additive operator
  • M: multiplicative operator


slide-98
SLIDE 98

Lexical and Syntactic Analysis — an example

The context-free grammar for the given language, the fourth try:

    S → E eof
    E → T | T A E
    A → + | -
    T → F | F M T
    M → * | /
    F → ident | num | ( E ) | - F

It is useful to introduce a special terminal eof representing the end of input. Moreover, in this grammar the initial nonterminal S does not occur on the right-hand side of any rule.

slide-99
SLIDE 99

Implementation of Lexical Analysis

Enumerated type Token_kind representing different kinds of tokens:

  • T_EOF: the end of input
  • T_Ident: identifier
  • T_Number: number constant
  • T_LParen: “(”
  • T_RParen: “)”
  • T_Plus: “+”
  • T_Minus: “-”
  • T_Star: “*”
  • T_Slash: “/”


slide-100
SLIDE 100

Implementation of Lexical Analysis

Variable c: the currently processed character (or a special value eof representing the end of input):

  • at the beginning, the first character of the input is read into variable c
  • function next-char() returns the next character from the input

Some helper functions:

  • error(): outputs information about a syntax error and aborts the processing of the expression
  • is-ident-start-char(c): tests whether c is a character that can occur at the beginning of an identifier
  • is-ident-normal-char(c): tests whether c is a character that can occur in an identifier (at positions other than the beginning)
  • is-digit(c): tests whether c is a digit


slide-101
SLIDE 101

Implementation of Lexical Analysis

Some other helper functions:

  • create-ident(s): creates an identifier from a given string s
  • create-number(s): creates a number from a given string s

Auxiliary variables:

  • last-ident: the last processed identifier
  • last-num: the last processed number constant

Function next-token() is the main part of the lexical analyser; it returns the next token from the input.


slide-102
SLIDE 102

Implementation of Lexical Analysis

next-token ():
    while c ∈ {“ ”, “\t”} do
        c := next-char()
    if c == eof then
        return T_EOF
    else
        switch c do
            case “(”: do c := next-char(); return T_LParen
            case “)”: do c := next-char(); return T_RParen
            case “+”: do c := next-char(); return T_Plus
            case “-”: do c := next-char(); return T_Minus
            case “*”: do c := next-char(); return T_Star
            case “/”: do c := next-char(); return T_Slash
            otherwise do
                if is-ident-start-char(c) then return scan-ident()
                else if is-digit(c) then return scan-number()
                else error()


slide-103
SLIDE 103

Implementation of Lexical Analysis

scan-ident ():
    s := c
    c := next-char()
    while is-ident-normal-char(c) do
        s := s · c
        c := next-char()
    last-ident := create-ident(s)
    return T_Ident


slide-104
SLIDE 104

Implementation of Lexical Analysis

scan-number ():
    s := c
    c := next-char()
    while is-digit(c) do
        s := s · c
        c := next-char()
    last-num := create-number(s)
    return T_Number
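The lexical analyser described by next-token(), scan-ident() and scan-number() could look roughly as follows in Python. This is a sketch under assumptions, not the course's implementation: it returns the whole list of tokens at once instead of one token per call, it stores the value of an identifier or number in the token itself rather than in last-ident and last-num, and it treats ASCII letters and “_” as identifier start characters.

```python
import string
from enum import Enum, auto

class TokenKind(Enum):
    T_EOF = auto()
    T_Ident = auto()
    T_Number = auto()
    T_LParen = auto()
    T_RParen = auto()
    T_Plus = auto()
    T_Minus = auto()
    T_Star = auto()
    T_Slash = auto()

SINGLE_CHAR = {
    "(": TokenKind.T_LParen, ")": TokenKind.T_RParen,
    "+": TokenKind.T_Plus, "-": TokenKind.T_Minus,
    "*": TokenKind.T_Star, "/": TokenKind.T_Slash,
}
IDENT_START = set(string.ascii_letters + "_")  # assumed identifier alphabet
IDENT_CHAR = IDENT_START | set(string.digits)

def tokenize(text):
    """Return a list of (kind, value) pairs; the last one is T_EOF."""
    tokens, i, n = [], 0, len(text)
    while True:
        while i < n and text[i] in " \t":  # skip spaces and tabs
            i += 1
        if i == n:
            tokens.append((TokenKind.T_EOF, None))
            return tokens
        c = text[i]
        if c in SINGLE_CHAR:
            tokens.append((SINGLE_CHAR[c], None))
            i += 1
        elif c in IDENT_START:  # corresponds to scan-ident()
            j = i + 1
            while j < n and text[j] in IDENT_CHAR:
                j += 1
            tokens.append((TokenKind.T_Ident, text[i:j]))
            i = j
        elif c in string.digits:  # corresponds to scan-number()
            j = i + 1
            while j < n and text[j] in string.digits:
                j += 1
            tokens.append((TokenKind.T_Number, int(text[i:j])))
            i = j
        else:
            raise SyntaxError(f"unexpected character {c!r} at offset {i}")

for kind, value in tokenize("-x * 2 + 128 * (y - z / 3)"):
    print(kind.name, value)
```

Raising SyntaxError plays the role of error() here; any other way of aborting the analysis would serve equally well.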


slide-105
SLIDE 105

Implementation of Syntactic Analysis

Variable t: the last processed token

A helper function init-scanner():

  • initializes the lexical analyser
  • reads the first character from the input into variable c, so that it is ready for the subsequent calls of function next-token()

Reading the next token with next-token():

  • this is the previously described main function of the lexical analyser
  • by repeatedly calling this function we read the tokens
  • variable c always contains the symbol that has been read last


slide-106
SLIDE 106

Implementation of Syntactic Analysis

One of the often used methods of syntactic analysis is recursive descent:

  • For each nonterminal there is a corresponding function: the function corresponding to nonterminal A implements all rules with nonterminal A on the left-hand side.
  • In a given function, the next token is used to select between the corresponding rules.
  • Instructions in the body of a function correspond to processing of the right-hand sides of the rules:
      • an occurrence of nonterminal B: the function corresponding to nonterminal B is called
      • an occurrence of terminal a: it is checked that the following token corresponds to terminal a; when it does, the next token is read, otherwise an error is reported

slide-107
SLIDE 107

Implementation of Syntactic Analysis

The previously described grammar is not very suitable for recursive descent: for nonterminals E and T, it is not possible to choose deterministically between the two rules by looking at just one following symbol:

    S → E eof
    E → T | T A E
    A → + | -
    T → F | F M T
    M → * | /
    F → ident | num | ( E ) | - F

For example, if we want to rewrite nonterminal T and we know that the following terminal in the input is num, this terminal can be generated by use of either of the rules

    T → F
    T → F M T
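The conflict can be made explicit by computing, for every rule, the set of terminals that can start a word derived from its right-hand side. The sketch below is an illustration only; the function names are hypothetical, and the computation is simplified by the fact that this grammar has no ε-rules, so only the first symbol of each right-hand side matters.

```python
GRAMMAR = {
    "S": [["E", "eof"]],
    "E": [["T"], ["T", "A", "E"]],
    "A": [["+"], ["-"]],
    "T": [["F"], ["F", "M", "T"]],
    "M": [["*"], ["/"]],
    "F": [["ident"], ["num"], ["(", "E", ")"], ["-", "F"]],
}

def first_of_body(body, first, grammar):
    """Terminals that can start a word derived from `body` (no ε-rules)."""
    head = body[0]
    return first[head] if head in grammar else {head}

def first_sets(grammar):
    """Fixpoint computation of the starting terminals of each nonterminal."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of_body(body, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def conflicting_nonterminals(grammar):
    """Nonterminals with two rules that can start with the same terminal."""
    first = first_sets(grammar)
    bad = set()
    for nt, bodies in grammar.items():
        for i in range(len(bodies)):
            for j in range(i + 1, len(bodies)):
                a = first_of_body(bodies[i], first, grammar)
                b = first_of_body(bodies[j], first, grammar)
                if a & b:
                    bad.add(nt)
    return bad

print(sorted(first_sets(GRAMMAR)["T"]))
print(sorted(conflicting_nonterminals(GRAMMAR)))
```

Both rules for T start with F, so every terminal that can start F, including num, is shared by them; the same holds for the two rules for E.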


slide-108
SLIDE 108

Implementation of Syntactic Analysis

The following modified grammar does not have this problem:

    S → E eof
    E → T G
    G → ε | A T G
    A → + | -
    T → F U
    U → ε | M F U
    M → * | /
    F → - F | ( E ) | ident | num


slide-109
SLIDE 109

Implementation of Syntactic Analysis

Parse ():
    init-scanner()
    t := next-token()
    Parse-S()

S → E eof

Parse-S ():
    Parse-E()
    if t ≠ T_EOF then
        error()


slide-110
SLIDE 110

Implementation of Syntactic Analysis

E → T G

Parse-E ():
    Parse-T()
    Parse-G()

G → ε | A T G

Parse-G ():
    if t ∈ {T_Plus, T_Minus} then
        Parse-A()
        Parse-T()
        Parse-G()


slide-111
SLIDE 111

Implementation of Syntactic Analysis

T → F U

Parse-T ():
    Parse-F()
    Parse-U()

U → ε | M F U

Parse-U ():
    if t ∈ {T_Star, T_Slash} then
        Parse-M()
        Parse-F()
        Parse-U()


slide-112
SLIDE 112

Implementation of Syntactic Analysis

A → + | -

Parse-A ():
    switch t do
        case T_Plus do
            t := next-token()
        case T_Minus do
            t := next-token()
        otherwise do
            error()

slide-113
SLIDE 113

Implementation of Syntactic Analysis

M → * | /

Parse-M ():
    switch t do
        case T_Star do
            t := next-token()
        case T_Slash do
            t := next-token()
        otherwise do
            error()

slide-114
SLIDE 114

Implementation of Syntactic Analysis

F → ident | num | ( E ) | - F

Parse-F ():
    switch t do
        case T_Ident do
            t := next-token()
        case T_Number do
            t := next-token()
        case T_LParen do
            t := next-token()
            Parse-E()
            if t ≠ T_RParen then
                error()
            t := next-token()
        case T_Minus do
            t := next-token()
            Parse-F()
        otherwise do
            error()

slide-115
SLIDE 115

Implementation of Syntactic Analysis

If a function ends with a recursive call of itself, as for example function Parse-G(), it is possible to replace this recursion with an iteration. Functions Parse-E() and Parse-G() can be merged into one function. Similarly, it is possible to replace a recursion with an iteration in function Parse-U(), and functions Parse-T() and Parse-U() can be merged into one function.


slide-116
SLIDE 116

E → T G
G → ε | A T G

Parse-E ():
    Parse-T()
    while t ∈ {T_Plus, T_Minus} do
        Parse-A()
        Parse-T()

T → F U
U → ε | M F U

Parse-T ():
    Parse-F()
    while t ∈ {T_Star, T_Slash} do
        Parse-M()
        Parse-F()


slide-117
SLIDE 117

Implementation of Syntactic Analysis

The implementation described above just finds out whether the given input corresponds to some word that can be generated by the given grammar. If this is the case, it reads the whole input and finishes successfully. If it is not the case, function error() is called.

In a real implementation, it is useful to provide function error() with a description of the kind of error together with information about the position in the input where the error occurred (e.g., the line and column where the currently processed token starts). Function error() can use this information to create error messages that are displayed to a user.
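One possible way to attach a position to an error is sketched below. It is an illustration only, under the assumption that the lexical analyser remembers the character offset at which the current token starts; the names `ParseError`, `line_and_column` and this signature of `error` are hypothetical.

```python
class ParseError(Exception):
    """A syntax error that carries the position where it occurred."""
    def __init__(self, message, line, column):
        super().__init__(f"{line}:{column}: {message}")
        self.line = line
        self.column = column

def line_and_column(text, offset):
    """Convert a character offset into a 1-based (line, column) pair."""
    line = text.count("\n", 0, offset) + 1
    last_newline = text.rfind("\n", 0, offset)  # -1 on the first line
    return line, offset - last_newline

def error(message, text, offset):
    """Abort the analysis with a message that includes the position."""
    raise ParseError(message, *line_and_column(text, offset))

source = "1 +\n  * 2"
try:
    error("unexpected token '*'", source, source.index("*"))
except ParseError as exc:
    print(exc)
```

Storing the offset with each token and converting it to a line and column only when an error is reported keeps the lexical analyser itself simple.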


slide-118
SLIDE 118

Implementation of Syntactic Analysis

Typically, we do not want to use syntactic analysis just to check that the input is correct but also to create an abstract syntax tree or to perform some other types of actions connected with individual rules of the grammar.

The previously presented code can be used as a base that can be extended with other actions such as construction of an abstract syntax tree, modifications of the read expressions, and possibly some other types of computation.

When the functions that correspond to nonterminals should create the corresponding abstract syntax tree, they can return the constructed subtree, corresponding to the part of the expression generated from the given nonterminal, as a return value.


slide-119
SLIDE 119

Implementation of Syntactic Analysis

Construction of an abstract syntax tree:

An enumerated type representing binary arithmetic operations:

    enum Bin_op { Add, Sub, Mul, Div }

An enumerated type representing unary arithmetic operations:

    enum Un_op { Un_minus }

Functions for creation of different kinds of nodes of an abstract syntax tree:

  • mk-var(ident): creates a leaf representing a variable
  • mk-num(num): creates a leaf representing a number constant
  • mk-unary(op, e): creates a node with one child e, on which a unary operation op (of type Un_op) is applied
  • mk-binary(op, e1, e2): creates a node with two children e1 and e2, on which a binary operation op (of type Bin_op) is applied

slide-120
SLIDE 120

Implementation of Syntactic Analysis

S → E eof

Parse ():
    init-scanner()
    t := next-token()
    e := Parse-E()
    if t ≠ T_EOF then
        error()
    return e


slide-121
SLIDE 121

Implementation of Syntactic Analysis

E → T G
G → ε | A T G

Parse-E ():
    e1 := Parse-T()
    while t ∈ {T_Plus, T_Minus} do
        op := Parse-A()
        e2 := Parse-T()
        e1 := mk-binary(op, e1, e2)
    return e1


slide-122
SLIDE 122

Implementation of Syntactic Analysis

A → + | -

Parse-A ():
    switch t do
        case T_Plus do
            t := next-token()
            return Add
        case T_Minus do
            t := next-token()
            return Sub
        otherwise do
            error()

slide-123
SLIDE 123

Implementation of Syntactic Analysis

T → F U
U → ε | M F U

Parse-T ():
    e1 := Parse-F()
    while t ∈ {T_Star, T_Slash} do
        op := Parse-M()
        e2 := Parse-F()
        e1 := mk-binary(op, e1, e2)
    return e1


slide-124
SLIDE 124

Implementation of Syntactic Analysis

M → * | /

Parse-M ():
    switch t do
        case T_Star do
            t := next-token()
            return Mul
        case T_Slash do
            t := next-token()
            return Div
        otherwise do
            error()

slide-125
SLIDE 125

F → ident | num | ( E ) | - F

Parse-F ():
    switch t do
        case T_Ident do
            e := mk-var(last-ident)
            t := next-token()
            return e
        case T_Number do
            e := mk-num(last-num)
            t := next-token()
            return e
        case T_LParen do
            t := next-token()
            e := Parse-E()
            if t ≠ T_RParen then
                error()
            t := next-token()
            return e
        case T_Minus do
            t := next-token()
            e := Parse-F()
            return mk-unary(Un_minus, e)
        otherwise do
            error()
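Put together, the functions Parse, Parse-E, Parse-T and Parse-F could be written in Python roughly as follows. This is a sketch under assumptions rather than the course's implementation: a compact regular-expression tokenizer stands in for the lexical analyser, operators are represented by their characters instead of the enumerated types, dataclasses take the place of mk-var, mk-num, mk-unary and mk-binary, and Parse-A and Parse-M are folded into the loops.

```python
import re
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Num:
    value: int

@dataclass
class Unary:
    op: str
    operand: object

@dataclass
class Binary:
    op: str
    left: object
    right: object

TOKEN_RE = re.compile(
    r"[ \t]*(?:(?P<num>[0-9]+)|(?P<ident>[A-Za-z_][A-Za-z0-9_]*)|(?P<op>[-+*/()]))")

def tokenize(text):
    """Return (kind, value) pairs; operator tokens use the character as kind."""
    text = text.rstrip(" \t")
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        value = m.group(m.lastgroup)
        out.append((value if m.lastgroup == "op" else m.lastgroup, value))
        pos = m.end()
    out.append(("eof", ""))
    return out

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    @property
    def kind(self):
        return self.tokens[self.i][0]

    def advance(self):
        """Consume the current token and return its value."""
        value = self.tokens[self.i][1]
        self.i += 1
        return value

    def parse(self):  # S → E eof
        e = self.parse_e()
        if self.kind != "eof":
            raise SyntaxError("expected end of input")
        return e

    def parse_e(self):  # E → T G, G → ε | A T G
        e1 = self.parse_t()
        while self.kind in ("+", "-"):
            op = self.advance()
            e1 = Binary(op, e1, self.parse_t())
        return e1

    def parse_t(self):  # T → F U, U → ε | M F U
        e1 = self.parse_f()
        while self.kind in ("*", "/"):
            op = self.advance()
            e1 = Binary(op, e1, self.parse_f())
        return e1

    def parse_f(self):  # F → ident | num | ( E ) | - F
        if self.kind == "ident":
            return Var(self.advance())
        if self.kind == "num":
            return Num(int(self.advance()))
        if self.kind == "(":
            self.advance()
            e = self.parse_e()
            if self.kind != ")":
                raise SyntaxError("expected ')'")
            self.advance()
            return e
        if self.kind == "-":
            self.advance()
            return Unary("-", self.parse_f())
        raise SyntaxError(f"unexpected token {self.kind!r}")

print(Parser("-x * 2 + 128 * (y - z / 3)").parse())
```

Because each loop folds the new operand into the tree built so far, operators of the same priority group to the left, so that 1 - 2 - 3 is parsed as (1 - 2) - 3.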