Context-Free Grammars (OSU CSE, 19 March 2019)



SLIDE 1

Context-Free Grammars

19 March 2019 OSU CSE 1

SLIDE 2

BL Compiler Structure

string of characters (source code)
    -> Tokenizer      -> string of tokens (“words”)
    -> Parser         -> abstract program
    -> Code Generator -> string of integers (object code)

The parser is arguably the most interesting, and most difficult, piece of the BL compiler.

SLIDE 3

Plan for the BL Parser

  • Design a context-free grammar (CFG) to specify syntactically valid BL programs
  • Use the grammar to implement a recursive-descent parser (i.e., an algorithm to parse a BL program and construct the corresponding Program object)

SLIDE 4

Plan for the BL Parser

  • Design a context-free grammar (CFG) to specify syntactically valid BL programs
  • Use the grammar to implement a recursive-descent parser (i.e., an algorithm to parse a BL program and construct the corresponding Program object)

A grammar is a set of formation rules for strings in a language.

SLIDE 5

Plan for the BL Parser

  • Design a context-free grammar (CFG) to specify syntactically valid BL programs
  • Use the grammar to implement a recursive-descent parser (i.e., an algorithm to parse a BL program and construct the corresponding Program object)

A grammar is context-free if it satisfies certain technical conditions described herein.

SLIDE 6

Languages

  • A language is a set of strings over some alphabet Σ
  • If L is a language, then mathematically it is a set of strings over Σ, i.e., its mathematical type is: set of string of Σ

SLIDE 7

Aside: Characters vs. Tokens

  • In the following examples of CFGs, we deal with languages over the alphabet of individual characters (e.g., Java’s char values):

Σ = character

  • In the BL project, we deal with languages over an alphabet of tokens (to be explained later)

SLIDE 8

Example: Real-Number Constants

  • Some syntactically valid real-number constants (i.e., some strings in the “language of valid real-number constants”):

37.044    615.22E16    99241.    18.E-93

SLIDE 9

CFG Rewrite Rules

real-const → digit-seq . digit-seq
           | digit-seq . digit-seq exponent
           | digit-seq .
           | digit-seq . exponent
exponent   → E digit-seq
           | E + digit-seq
           | E - digit-seq
digit-seq  → digit digit-seq
           | digit
digit      → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

SLIDE 10

CFG Rewrite Rules

real-const → digit-seq . digit-seq
           | digit-seq . digit-seq exponent
           | digit-seq .
           | digit-seq . exponent
exponent   → E digit-seq
           | E + digit-seq
           | E - digit-seq
digit-seq  → digit digit-seq
           | digit
digit      → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Each entry of the form “name → ...” is a rewrite rule (a replacement rule), which describes how strings in the language may be formed.

SLIDE 11

CFG Rewrite Rules

real-const → digit-seq . digit-seq
           | digit-seq . digit-seq exponent
           | digit-seq .
           | digit-seq . exponent
exponent   → E digit-seq
           | E + digit-seq
           | E - digit-seq
digit-seq  → digit digit-seq
           | digit
digit      → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

A name on the left of a rewrite rule is called a non-terminal symbol.

SLIDE 12

CFG Rewrite Rules

real-const → digit-seq . digit-seq
           | digit-seq . digit-seq exponent
           | digit-seq .
           | digit-seq . exponent
exponent   → E digit-seq
           | E + digit-seq
           | E - digit-seq
digit-seq  → digit digit-seq
           | digit
digit      → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The special CFG symbol → means “can be rewritten as” or “can be replaced by”.

SLIDE 13

CFG Rewrite Rules

real-const → digit-seq . digit-seq
           | digit-seq . digit-seq exponent
           | digit-seq .
           | digit-seq . exponent
exponent   → E digit-seq
           | E + digit-seq
           | E - digit-seq
digit-seq  → digit digit-seq
           | digit
digit      → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The special CFG symbol | means “or”, i.e., there are multiple possible “rewrites” for the same non-terminal.

SLIDE 14

CFG Rewrite Rules

real-const → digit-seq . digit-seq
           | digit-seq . digit-seq exponent
           | digit-seq .
           | digit-seq . exponent
exponent   → E digit-seq
           | E + digit-seq
           | E - digit-seq
digit-seq  → digit digit-seq
           | digit
digit      → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

So this (the real-const rule, written with four alternatives separated by |) ...

SLIDE 15

CFG Rewrite Rules

real-const → digit-seq . digit-seq
real-const → digit-seq . digit-seq exponent
real-const → digit-seq .
real-const → digit-seq . exponent
exponent   → E digit-seq
           | E + digit-seq
           | E - digit-seq
digit-seq  → digit digit-seq
           | digit
digit      → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

... means exactly the same thing as these four separate rewrite rules.
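
The equivalence between one rule with | alternatives and several separate rules can be sketched in a few lines of Python. This is only an illustration: the function name expand_alternatives and the plain-string rule format are assumptions made for this example, and the sketch presumes that | and → never occur as terminal symbols.

```python
def expand_alternatives(rule: str) -> list[str]:
    """Split 'lhs → alt1 | alt2 | ...' into one rewrite rule per alternative.

    Assumes '→' and '|' are used only as grammar notation, never as terminals.
    """
    lhs, rhs = rule.split("→", 1)
    return [f"{lhs.strip()} → {alt.strip()}" for alt in rhs.split("|")]


combined = (
    "real-const → digit-seq . digit-seq | digit-seq . digit-seq exponent "
    "| digit-seq . | digit-seq . exponent"
)
separate = expand_alternatives(combined)
for rule in separate:
    print(rule)
```

Running it prints four real-const rules, one per alternative, in the same order as on the slide.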

SLIDE 16

CFG Rewrite Rules

real-const → digit-seq . digit-seq
           | digit-seq . digit-seq exponent
           | digit-seq .
           | digit-seq . exponent
exponent   → E digit-seq
           | E + digit-seq
           | E - digit-seq
digit-seq  → digit digit-seq
           | digit
digit      → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

One non-terminal symbol (normally in the first rewrite rule) is called the start symbol.

SLIDE 17

CFG Rewrite Rules

real-const → digit-seq . digit-seq
           | digit-seq . digit-seq exponent
           | digit-seq .
           | digit-seq . exponent
exponent   → E digit-seq
           | E + digit-seq
           | E - digit-seq
digit-seq  → digit digit-seq
           | digit
digit      → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

A symbol from the alphabet on the right-hand side of a rewrite rule is called a terminal symbol.

SLIDE 18

CFG Rewrite Rules

real-const → digit-seq . digit-seq
           | digit-seq . digit-seq exponent
           | digit-seq .
           | digit-seq . exponent
exponent   → E digit-seq
           | E + digit-seq
           | E - digit-seq
digit-seq  → digit digit-seq
           | digit
digit      → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

To remember the name: terminal symbols are what you end up with when generating strings in the language (see below).

SLIDE 19

Four Components of a CFG

  • Non-terminal symbols for this CFG:
    – real-const, exponent, digit-seq, digit
  • Terminal symbols for this CFG:
    – ., E, +, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
  • Start symbol for this CFG:
    – real-const
  • Rewrite rules for this CFG:
    – (see previous slides)
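
The four components listed above can be written down as plain data. The following Python sketch is one possible encoding, not something taken from the course materials: the names RULES, START, non_terminals and terminals are assumptions for illustration. It stores the rewrite rules in a dictionary and then recovers the non-terminal and terminal sets from them.

```python
# Rewrite rules: each non-terminal maps to a list of alternatives,
# and each alternative is a list of symbols.
RULES = {
    "real-const": [
        ["digit-seq", ".", "digit-seq"],
        ["digit-seq", ".", "digit-seq", "exponent"],
        ["digit-seq", "."],
        ["digit-seq", ".", "exponent"],
    ],
    "exponent": [
        ["E", "digit-seq"],
        ["E", "+", "digit-seq"],
        ["E", "-", "digit-seq"],
    ],
    "digit-seq": [["digit", "digit-seq"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}

# Start symbol.
START = "real-const"

# Non-terminals are exactly the names on the left of some rule.
non_terminals = set(RULES)

# Terminals are the right-hand-side symbols that are not non-terminals.
terminals = {
    symbol
    for alternatives in RULES.values()
    for alternative in alternatives
    for symbol in alternative
} - non_terminals

print(sorted(non_terminals))
print(sorted(terminals))
```

Computing the two symbol sets from the rules, rather than listing them by hand, keeps the four components consistent with each other.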

SLIDE 20

Derivations

  • A derivation of a string of terminal symbols consists of a sequence of specific rewrite-rule applications that begin with the start symbol and continue until only terminal symbols remain
    – A string is in the language of the CFG iff there is a derivation that leads to it
  • The symbol ⇒ indicates a derivation step, i.e., a specific rewrite-rule application
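
Because a string is in the language exactly when some derivation leads to it, strings of the language can be produced mechanically by trying derivations. The sketch below is an illustration under assumptions (the names RULES and generate are made up here, and the breadth-first strategy is one choice among many): it starts from the start symbol, always rewrites the leftmost non-terminal, and yields every form that consists only of terminal symbols.

```python
from collections import deque
from itertools import islice

RULES = {
    "real-const": [
        ["digit-seq", ".", "digit-seq"],
        ["digit-seq", ".", "digit-seq", "exponent"],
        ["digit-seq", "."],
        ["digit-seq", ".", "exponent"],
    ],
    "exponent": [
        ["E", "digit-seq"],
        ["E", "+", "digit-seq"],
        ["E", "-", "digit-seq"],
    ],
    "digit-seq": [["digit", "digit-seq"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}


def generate(start="real-const"):
    """Yield strings of the language, shortest derivations first."""
    queue = deque([[start]])
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, or None if none is left.
        index = next((i for i, s in enumerate(form) if s in RULES), None)
        if index is None:
            yield "".join(form)  # only terminal symbols remain
            continue
        # One derivation step per alternative of that non-terminal.
        for alternative in RULES[form[index]]:
            queue.append(form[:index] + alternative + form[index + 1:])


first_three = list(islice(generate(), 3))
print(first_three)
```

The language is infinite, so the generator never finishes on its own; islice is used to take only the first few strings.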

SLIDE 21

Example: Derivation of 5.6E10

  • Begin with the start symbol:

real-const ⇒


SLIDE 22

Example: Derivation of 5.6E10

  • Begin with the start symbol:

real-const ⇒

  • ... and pick one possible rewrite:

real-const → digit-seq . digit-seq
           | digit-seq . digit-seq exponent
           | digit-seq .
           | digit-seq . exponent

Which rewrite is appropriate to derive 5.6E10?

SLIDE 23

Example: Derivation of 5.6E10

  • This is the first step of the derivation:

real-const ⇒ digit-seq . digit-seq exponent


SLIDE 24

Example: Derivation of 5.6E10

  • Choose a non-terminal to rewrite:

real-const ⇒ digit-seq . digit-seq exponent


SLIDE 25

Example: Derivation of 5.6E10

  • Choose a non-terminal to rewrite:

real-const ⇒ digit-seq . digit-seq exponent

  • ... and pick one possible rewrite:

digit-seq → digit digit-seq
          | digit

Which rewrite is appropriate to derive 5.6E10?

SLIDE 26

Example: Derivation of 5.6E10

  • This is the second step of the derivation:

real-const ⇒ digit-seq . digit-seq exponent
           ⇒ digit . digit-seq exponent

SLIDE 27

Example: Derivation of 5.6E10

  • Choose a non-terminal to rewrite:

real-const ⇒ digit-seq . digit-seq exponent
           ⇒ digit . digit-seq exponent

SLIDE 28

Example: Derivation of 5.6E10

  • Choose a non-terminal to rewrite:

real-const ⇒ digit-seq . digit-seq exponent
           ⇒ digit . digit-seq exponent

  • ... and pick one possible rewrite:

digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

SLIDE 29

Example: Derivation of 5.6E10

  • This is the third step of the derivation:

real-const ⇒ digit-seq . digit-seq exponent
           ⇒ digit . digit-seq exponent
           ⇒ 5 . digit-seq exponent

SLIDE 30

Example: Derivation of 5.6E10

  • Choose a non-terminal to rewrite:

real-const ⇒ digit-seq . digit-seq exponent
           ⇒ digit . digit-seq exponent
           ⇒ 5 . digit-seq exponent

SLIDE 31

Example: Derivation of 5.6E10

  • Choose a non-terminal to rewrite:

real-const ⇒ digit-seq . digit-seq exponent
           ⇒ digit . digit-seq exponent
           ⇒ 5 . digit-seq exponent

  • ... and pick one possible rewrite:

digit-seq → digit digit-seq
          | digit

SLIDE 32

One Derivation of 5.6E10

real-const ⇒ digit-seq . digit-seq exponent
           ⇒ digit . digit-seq exponent
           ⇒ 5 . digit-seq exponent
           ⇒ 5 . digit exponent
           ⇒ 5 . 6 exponent
           ⇒ 5 . 6 E digit-seq
           ⇒ 5 . 6 E digit digit-seq
           ⇒ 5 . 6 E 1 digit-seq
           ⇒ 5 . 6 E 1 digit
           ⇒ 5 . 6 E 1 0

SLIDE 33

One Derivation of 5.6E10

real-const ⇒ digit-seq . digit-seq exponent
           ⇒ digit . digit-seq exponent
           ⇒ 5 . digit-seq exponent
           ⇒ 5 . digit exponent
           ⇒ 5 . 6 exponent
           ⇒ 5 . 6 E digit-seq
           ⇒ 5 . 6 E digit digit-seq
           ⇒ 5 . 6 E 1 digit-seq
           ⇒ 5 . 6 E 1 digit
           ⇒ 5 . 6 E 1 0

Note that a derivation is used in this way to generate a string in the language of the CFG.

SLIDE 34

Another Derivation of 5.6E10

real-const ⇒ digit-seq . digit-seq exponent
           ⇒ digit-seq . digit-seq E digit-seq
           ⇒ digit-seq . digit-seq E digit digit-seq
           ⇒ digit-seq . digit-seq E digit digit
           ⇒ digit-seq . digit-seq E digit 0
           ⇒ digit-seq . digit-seq E 1 0
           ⇒ digit-seq . digit E 1 0
           ⇒ digit-seq . 6 E 1 0
           ⇒ digit . 6 E 1 0
           ⇒ 5 . 6 E 1 0
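
The two derivations of 5.6E10 differ only in which non-terminal is rewritten at each step: the first always rewrites the leftmost one, the second always the rightmost one. The Python sketch below replays both. It is illustrative only: the names RULES and derive, and the encoding of each step as an index into the list of alternatives, are assumptions made for this example.

```python
RULES = {
    "real-const": [
        ["digit-seq", ".", "digit-seq"],
        ["digit-seq", ".", "digit-seq", "exponent"],
        ["digit-seq", "."],
        ["digit-seq", ".", "exponent"],
    ],
    "exponent": [
        ["E", "digit-seq"],
        ["E", "+", "digit-seq"],
        ["E", "-", "digit-seq"],
    ],
    "digit-seq": [["digit", "digit-seq"], ["digit"]],
    "digit": [[d] for d in "0123456789"],  # alternative k rewrites to digit k
}


def derive(choices, leftmost=True):
    """Replay a derivation; each choice picks an alternative for the
    leftmost (or rightmost) non-terminal of the current form."""
    form = ["real-const"]
    steps = [" ".join(form)]
    for choice in choices:
        positions = [i for i, s in enumerate(form) if s in RULES]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = RULES[form[i]][choice]  # one rewrite-rule application
        steps.append(" ".join(form))
    return steps


left = derive([1, 1, 5, 1, 6, 0, 0, 1, 1, 0])
right = derive([1, 0, 0, 1, 0, 1, 1, 6, 1, 5], leftmost=False)

print(" ⇒ ".join(left))
print(" ⇒ ".join(right))
```

Both runs take ten steps and end in the same string of terminal symbols, 5 . 6 E 1 0, even though the intermediate forms differ.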

SLIDE 35

Derivation Trees

  • A derivation tree depicts a derivation (such as those above) in a tree
  • Note that the order in which rewrites are done is sometimes arbitrary
    – A tree captures the required temporal order of rewrites from top-to-bottom
    – A tree captures the required spatial order among terminal symbols from left-to-right

SLIDE 36

A Derivation Tree for 5.6E10

(Tree shown as an outline: children are indented under their parent and listed left to right.)

real-const
  digit-seq
    digit
      5
  .
  digit-seq
    digit
      6
  exponent
    E
    digit-seq
      digit
        1
      digit-seq
        digit
          0

SLIDE 37

A Derivation Tree for 5.6E10

[Figure: the same derivation tree for 5.6E10 as on the previous slide]

This tree captures both derivations previously illustrated (and all others) for 5.6E10.
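
A derivation tree can be sketched as nested data. In the Python illustration below, the tuple layout (a non-terminal followed by its children) and the function name leaves are assumptions chosen for this example. Reading the leaves from left to right recovers the derived string, whichever order the rewrites were applied in.

```python
# An internal node is a tuple: (non-terminal, child, child, ...).
# A leaf is a plain string holding one terminal symbol.
tree = (
    "real-const",
    ("digit-seq", ("digit", "5")),
    ".",
    ("digit-seq", ("digit", "6")),
    (
        "exponent",
        "E",
        ("digit-seq", ("digit", "1"), ("digit-seq", ("digit", "0"))),
    ),
)


def leaves(node):
    """Return the terminal symbols of the tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:  # node[0] is the non-terminal's name
        result.extend(leaves(child))
    return result


derived = "".join(leaves(tree))
print(derived)
```

The tree records which rule was applied to each non-terminal but not the order of application, which is why a single tree stands for many derivations.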

SLIDE 38

Other Examples

  • Can you find a derivation tree for 5.E3?
    – If so, it’s in the language of the CFG; otherwise it’s not in that language
  • Can you find a derivation tree for .6E10?
    – If so, it’s in the language of the CFG; otherwise it’s not in that language
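
Membership questions like these can also be checked mechanically. The following Python sketch is a hand-written recognizer in the recursive-descent style for the real-const grammar. It is an illustration, not the BL project's parser: the function names are made up, and the four real-const alternatives are folded into the single pattern "digit-seq, then a period, then an optional digit-seq, then an optional exponent", which accepts the same strings.

```python
DIGITS = "0123456789"


def is_real_const(s: str) -> bool:
    """Return True iff s is in the language of the real-const grammar."""
    pos = 0

    def digit_seq() -> bool:
        # digit-seq → digit digit-seq | digit  (one or more digits)
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] in DIGITS:
            pos += 1
        return pos > start

    def exponent() -> bool:
        # exponent → E digit-seq | E + digit-seq | E - digit-seq
        nonlocal pos
        if pos < len(s) and s[pos] == "E":
            pos += 1
            if pos < len(s) and s[pos] in "+-":
                pos += 1
            return digit_seq()
        return False

    # Every real-const alternative begins with: digit-seq .
    if not digit_seq():
        return False
    if pos >= len(s) or s[pos] != ".":
        return False
    pos += 1
    digit_seq()  # the digit-seq after the period is optional
    if pos < len(s) and not exponent():  # anything left must be an exponent
        return False
    return pos == len(s)


for text in ["5.E3", ".6E10", "37.044", "18.E-93"]:
    print(text, is_real_const(text))
```

The recognizer accepts 5.E3 (a digit-seq, a period, and an exponent) and rejects .6E10, because every alternative for real-const begins with a digit-seq, which needs at least one digit.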

SLIDE 39

A Famous CFG

expr      → expr add-op term | term
term      → term mult-op factor | factor
factor    → ( expr ) | digit-seq
add-op    → + | -
mult-op   → * | DIV | REM
digit-seq → digit digit-seq | digit
digit     → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

SLIDE 40

Example: 4+6*2

  • Find a derivation tree for 4+6*2


SLIDE 41

A Derivation Tree for 4+6*2

(Tree shown as an outline: children are indented under their parent and listed left to right.)

expr
  expr
    term
      factor
        digit-seq
          digit
            4
  add-op
    +
  term
    term
      factor
        digit-seq
          digit
            6
    mult-op
      *
    factor
      digit-seq
        digit
          2

SLIDE 42

Example: (4+6)*2

  • Find a derivation tree for (4+6)*2
  • How is it different from the previous one?
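
The difference in grouping between 4+6*2 and (4+6)*2 can be seen by parsing both with a small parser modeled on the expr/term/factor grammar. This Python sketch rests on several assumptions: the names parse, expr, term and factor are illustrative; the left-recursive rules (expr → expr add-op term, term → term mult-op factor) are replaced by loops, a common substitution because a recursive-descent function cannot call itself before consuming any input; and the result is a simplified tree of the form (operator, left, right) rather than the full derivation tree.

```python
import re


def parse(text):
    """Parse an arithmetic expression into nested (op, left, right) tuples."""
    tokens = re.findall(r"\d+|DIV|REM|[()+\-*]", text)
    # Reject input containing characters the tokenizer silently skipped.
    if "".join(tokens) != "".join(text.split()):
        raise SyntaxError("unrecognized characters in input")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():
        # expr → expr add-op term | term, written as a loop
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        # term → term mult-op factor | factor, written as a loop
        node = factor()
        while peek() in ("*", "DIV", "REM"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        # factor → ( expr ) | digit-seq
        if peek() == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
            return node
        token = peek()
        if token is None or not token.isdigit():
            raise SyntaxError("expected a number or (")
        return int(advance())

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected token: " + tokens[pos])
    return tree


print(parse("4+6*2"))
print(parse("(4+6)*2"))
```

In the first result the multiplication sits below the addition; in the second the parentheses force the addition below the multiplication.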

SLIDE 43

A Simpler CFG for Expressions

expr      → expr op expr | ( expr ) | digit-seq
op        → + | - | * | DIV | REM
digit-seq → digit digit-seq | digit
digit     → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

SLIDE 44

One Derivation Tree for 4+6*2

(Tree shown as an outline: children are indented under their parent and listed left to right.)

expr
  expr
    digit-seq
      digit
        4
  op
    +
  expr
    expr
      digit-seq
        digit
          6
    op
      *
    expr
      digit-seq
        digit
          2

SLIDE 45

Another Derivation Tree for 4+6*2

(Tree shown as an outline: children are indented under their parent and listed left to right.)

expr
  expr
    expr
      digit-seq
        digit
          4
    op
      +
    expr
      digit-seq
        digit
          6
  op
    *
  expr
    digit-seq
      digit
        2

SLIDE 46

Ambiguity

  • The second (simpler) CFG for arithmetic expressions is ambiguous because some strings in the language of the CFG have more than one derivation tree
  • As is often the case, ambiguity is bad
    – If you want to use the derivation tree as the basis for evaluating the expression, only one of the derivation trees shown above results in the right answer (which one?)
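
To see why the ambiguity matters, the two derivation trees for 4+6*2 can be evaluated bottom-up. The Python sketch below uses a simplified (left, op, right) tuple form of each tree; that form and the names evaluate, tree_a and tree_b are assumptions for illustration, and only the two operators needed here are covered.

```python
import operator

# Only the operators that occur in 4+6*2.
OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree):
    """Evaluate a tree given as an int or a (left, op, right) tuple."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))


tree_a = (4, "+", (6, "*", 2))  # multiplication nested under the addition
tree_b = ((4, "+", 6), "*", 2)  # addition nested under the multiplication

print(evaluate(tree_a))
print(evaluate(tree_b))
```

The first tree evaluates to 16, which matches the usual convention that multiplication binds tighter than addition; the second evaluates to 20. The same string therefore gets two different values depending on which tree is used.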

SLIDE 47

Resources

  • Wikipedia: Context-Free Grammar
    – http://en.wikipedia.org/wiki/Context-free_grammar