
Lecture 2: Formal Grammars, CS631, Fall 2000

Grammars and Parse Trees I

Defining Language Syntax


Related Reading

Chapter 2 of Programming Languages: Concepts and Constructs, by Ravi Sethi


Overview

We need a way to describe programming languages.

– Grammars
– Parse Trees
– Equivalent grammar notations
    – Context-Free Grammars
    – Backus-Naur Form
    – Extended Backus-Naur Form

Note: On Wednesday we will expand on these concepts.


A Grammar

A sentence is a noun phrase, a verb, and a noun phrase.
A noun phrase is an article and a noun.
A verb is…
An article is…
A noun is...

<S> ::= <NP> <V> <NP>
<NP> ::= <A> <N>
<V> ::= eats | loves | hates
<A> ::= a | the
<N> ::= dog | cat | rat


A Parse Tree

<S>
    <NP>
        <A>  the
        <N>  dog
    <V>  loves
    <NP>
        <A>  the
        <N>  cat
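A minimal sketch of the same tree as a data structure, assuming a node is a (symbol, children) pair and a leaf is a plain string. The representation and the helper name leaves are illustrative choices, not part of the lecture.

```python
# A parse tree node is (symbol, children); a leaf is a plain string (a terminal).
tree = (
    "<S>",
    [
        ("<NP>", [("<A>", ["the"]), ("<N>", ["dog"])]),
        ("<V>", ["loves"]),
        ("<NP>", [("<A>", ["the"]), ("<N>", ["cat"])]),
    ],
)


def leaves(node):
    """Return the terminals at the leaves of the tree, left to right."""
    if isinstance(node, str):  # a terminal: nothing below it
        return [node]
    _symbol, children = node
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


sentence = " ".join(leaves(tree))
print(sentence)  # the dog loves the cat
```

Reading the leaves from left to right recovers the sentence that the tree derives.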


Basic Concepts

• We use context-free grammars to define language syntax.
• The grammar defines how to build parse trees; the language is the set of strings derived by some parse tree.


Formal Definition

A grammar consists of four parts:

– the set of terminals (also called tokens): the atomic symbols that make up the language
– the set of nonterminals: the variables representing language constructs
– the set of productions: tree-building rules that define possible children for each nonterminal
– the start symbol: the nonterminal that forms the root of any parse tree for the grammar

<S> ::= <NP> <V> <NP>
<NP> ::= <A> <N>
<V> ::= eats | loves | hates
<A> ::= a | the
<N> ::= dog | cat | rat

In this grammar, <S> is the start symbol; <NP> ::= <A> <N> is a production; <S>, <NP>, <V>, <A>, and <N> are the nonterminals; and eats, loves, hates, a, the, dog, cat, and rat are the terminals.
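The four parts can be written down directly as a small data structure. This is a sketch under assumptions: the names Grammar and make_grammar are hypothetical, and terminals are inferred as the symbols that never appear on a left-hand side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple  # pairs (left-hand side, right-hand side tuple)
    start: str


def make_grammar(rules, start):
    """Build a Grammar from {nonterminal: [alternative, ...]}."""
    productions = tuple(
        (lhs, tuple(rhs)) for lhs, alternatives in rules.items() for rhs in alternatives
    )
    nonterminals = frozenset(rules)
    # Any symbol that never appears on a left-hand side is a terminal.
    terminals = frozenset(
        sym for _lhs, rhs in productions for sym in rhs if sym not in nonterminals
    )
    if start not in nonterminals:
        raise ValueError("the start symbol must be a nonterminal")
    return Grammar(terminals, nonterminals, productions, start)


ENGLISH = make_grammar(
    {
        "<S>": [["<NP>", "<V>", "<NP>"]],
        "<NP>": [["<A>", "<N>"]],
        "<V>": [["eats"], ["loves"], ["hates"]],
        "<A>": [["a"], ["the"]],
        "<N>": [["dog"], ["cat"], ["rat"]],
    },
    start="<S>",
)

print(sorted(ENGLISH.terminals))
print(len(ENGLISH.productions))  # 10: each alternative counts as one production
```

Keying the rules by a single nonterminal means every production automatically has exactly one nonterminal on its left-hand side.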


Context-free Grammars

• Such grammars are sometimes called context-free grammars (CFGs): the left-hand side of each production is one nonterminal.
• We can use any production for a given nonterminal to decide what children to give it, without looking at the rest of the tree.

(Note: Other kinds of grammars exist: regular grammars (weaker), context-sensitive grammars (stronger), etc.)


Note on CFG Formal Notation

• If you take CS517 you will see one way of expressing CFGs:

S → aSb | X
X → cX | ε

• But in programming language studies there is a different notation for the same idea...
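To get a feel for the language this small grammar defines, the productions can be applied mechanically. The sketch below is an illustration rather than course material: it assumes single-character symbols, uses the empty Python string for the empty string ε, and the names derive and in_language are hypothetical.

```python
import re


def derive(max_steps):
    """Collect every terminal string derivable in at most max_steps rewrites."""
    rules = {"S": ["aSb", "X"], "X": ["cX", ""]}  # "" plays the role of ε
    forms = {"S"}    # sentential forms that still contain a nonterminal
    strings = set()  # finished strings (terminals only)
    for _ in range(max_steps):
        next_forms = set()
        for form in forms:
            # Every form here contains exactly one nonterminal; find and rewrite it.
            pos = next(i for i, ch in enumerate(form) if ch in rules)
            for rhs in rules[form[pos]]:
                new = form[:pos] + rhs + form[pos + 1:]
                if any(ch in rules for ch in new):
                    next_forms.add(new)
                else:
                    strings.add(new)
        forms = next_forms
    return strings


def in_language(s):
    """Check the pattern: n a's, then any number of c's, then n b's."""
    m = re.fullmatch(r"(a*)(c*)(b*)", s)
    return m is not None and len(m.group(1)) == len(m.group(3))


print(sorted(derive(4), key=lambda s: (len(s), s)))
```

Every string produced has the same number of a's and b's around a run of c's, and allowing more steps keeps producing longer strings.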


Backus-Naur Form (BNF)

Conventions:

– nonterminals are enclosed in angle brackets
– the symbol ::= separates the two sides of a production, and | separates alternatives on the right-hand side
– the special nonterminal <empty> represents the zero-length string

BNF Example

<exp> ::= <exp> + <exp> | <exp> * <exp> | (<exp>) | a | b | c

Note that there are six productions in this grammar. It is equivalent to this:

<exp> ::= <exp> + <exp>
<exp> ::= <exp> * <exp>
<exp> ::= (<exp>)
<exp> ::= a
<exp> ::= b
<exp> ::= c
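The equivalence of the two forms can be checked mechanically. This is a rough sketch with simplifying assumptions: the helper parse_bnf is hypothetical, symbols must be separated by spaces (so the parentheses are written with spaces around them), and a grammar that used | as a terminal would need quoting that is not handled here.

```python
def parse_bnf(text):
    """Split BNF text into (left-hand side, right-hand side symbols) pairs."""
    productions = []
    for line in text.strip().splitlines():
        lhs, rhs = line.split("::=")
        for alternative in rhs.split("|"):
            productions.append((lhs.strip(), tuple(alternative.split())))
    return productions


compact = "<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c"
expanded = """
<exp> ::= <exp> + <exp>
<exp> ::= <exp> * <exp>
<exp> ::= ( <exp> )
<exp> ::= a
<exp> ::= b
<exp> ::= c
"""

assert parse_bnf(compact) == parse_bnf(expanded)
print(len(parse_bnf(compact)))  # 6
```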


Parse Trees

To build a parse tree:

• Put the start symbol at the root.
• Add children to every nonterminal, following any one of the productions for that nonterminal in the grammar.
• Done when all the leaves are terminal.
• Read off leaves from left to right; that’s the string derived by the tree.
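The four steps above can be sketched as a short recursive procedure. The example assumes the sentence grammar from the earlier slides, picks productions at random, and uses hypothetical names (RULES, build_tree, read_leaves). Random choice is safe here only because that grammar has no recursion; a recursive grammar would need a depth limit.

```python
import random

RULES = {
    "<S>": [["<NP>", "<V>", "<NP>"]],
    "<NP>": [["<A>", "<N>"]],
    "<V>": [["eats"], ["loves"], ["hates"]],
    "<A>": [["a"], ["the"]],
    "<N>": [["dog"], ["cat"], ["rat"]],
}
START = "<S>"


def build_tree(symbol, rng):
    """Expand symbol into a (symbol, children) tree; terminals stay as strings."""
    if symbol not in RULES:  # a terminal: it becomes a leaf
        return symbol
    production = rng.choice(RULES[symbol])  # any one production for this nonterminal
    return (symbol, [build_tree(child, rng) for child in production])


def read_leaves(node):
    """Read the leaves from left to right."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node[1] for leaf in read_leaves(child)]


rng = random.Random(0)
tree = build_tree(START, rng)  # step 1: the start symbol goes at the root
print(" ".join(read_leaves(tree)))
```

Each run with a different seed yields another string of the language, since every tree built this way is a valid parse tree.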


Example: Parse tree for (a + b * c)

<exp> ::= <exp> + <exp> | <exp> * <exp> | (<exp>) | a | b | c

<exp>
    (
    <exp>
        <exp>
            a
        +
        <exp>
            <exp>
                b
            *
            <exp>
                c
    )


Practice Exercise

Show a parse tree for each of these strings:

a+b
a*b+c
(a+b)
(a+(b))
((a+b)*c

<exp> ::= <exp> + <exp> | <exp> * <exp> | (<exp>) | a | b | c


Compiler Note

• What you just did is parsing: trying to find a parse tree for a given string.
• That’s what compilers do for every program you try to compile: try to build a parse tree for your program, using the grammar for whatever language you used.
• Take CS654 to learn about algorithms for doing this efficiently.
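As a small illustration of what "trying to find a parse tree" involves, here is a brute-force recognizer for the <exp> grammar used in the examples. It is only a sketch: it answers yes or no instead of building the tree, it tries every split point with memoization, and it is not one of the efficient algorithms a compiler would use. The function name derives_exp is hypothetical.

```python
from functools import lru_cache


def derives_exp(text):
    """Return True if some parse tree rooted at <exp> derives text."""
    s = text.replace(" ", "")  # spaces are not part of the grammar

    @lru_cache(maxsize=None)
    def exp(i, j):
        # Does <exp> derive the substring s[i:j]?
        if j - i == 1:
            return s[i] in "abc"  # <exp> ::= a | b | c
        if j - i >= 3:
            if s[i] == "(" and s[j - 1] == ")" and exp(i + 1, j - 1):
                return True  # <exp> ::= (<exp>)
            for k in range(i + 1, j - 1):  # candidate operator positions
                if s[k] in "+*" and exp(i, k) and exp(k + 1, j):
                    return True  # <exp> ::= <exp> + <exp> | <exp> * <exp>
        return False

    return exp(0, len(s))


print(derives_exp("(a + b * c)"))  # True
print(derives_exp("a+"))           # False
```

Splitting on an operator always leaves two strictly shorter substrings, which is why the left-recursive productions do not cause infinite recursion here.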


Language Definition

• We use grammars to define the syntax of programming languages.
• The language defined by a grammar is the set of all strings that can be derived by some parse tree for the grammar.
• The set of strings is often infinite although grammars are finite.


Practice Exercise

Give a BNF grammar for each of the following languages:

1. The set of all strings consisting of 0 or more concatenated copies of the string ab.
2. The set of all strings consisting of 0 or more a’s followed by 0 or more b’s.


Practice Exercise

Give a BNF grammar for each of the following languages:

1. The set of all strings consisting of 0 or more a’s with a semicolon after each one.
2. The set of all strings consisting of 1 or more a’s separated by semicolons (but not before the first or after the last).
3. The set of all strings consisting of 0 or more a’s separated by semicolons (but not before the first or after the last).


EBNF

• Additional syntax to simplify some grammar chores:

– {x} to mean zero or more repetitions of x
– [x] to mean x is optional (i.e. x | <empty>)
– () for grouping
– | to mean a choice among alternatives
– quotes around terminals, if necessary, to distinguish from all these meta-symbols
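EBNF adds convenience, not power: each [x] or {x} can be rewritten as ordinary BNF by introducing a fresh nonterminal. The sketch below shows one such rewriting under assumptions: the function desugar, the tuple encoding of [x] and {x}, the generated nonterminal names, and the <number> rule are all made up for illustration, and nested brackets are not handled.

```python
import itertools


def desugar(name, items):
    """Translate one EBNF sequence into plain BNF productions.

    items holds symbols (str), ("opt", [symbols]) for [x],
    or ("rep", [symbols]) for {x}.
    """
    productions = []
    counter = itertools.count(1)
    rhs = []
    for item in items:
        if isinstance(item, str):
            rhs.append(item)
            continue
        kind, body = item
        fresh = f"<{name.strip('<>')}-{next(counter)}>"  # hypothetical helper nonterminal
        rhs.append(fresh)
        if kind == "opt":    # [x] becomes  fresh ::= x | <empty>
            productions.append((fresh, body))
        elif kind == "rep":  # {x} becomes  fresh ::= x fresh | <empty>
            productions.append((fresh, body + [fresh]))
        else:
            raise ValueError(f"unknown EBNF construct: {kind}")
        productions.append((fresh, ["<empty>"]))
    return [(name, rhs)] + productions


# Illustrative EBNF rule:  <number> ::= [ - ] <digit> { <digit> }
bnf = desugar("<number>", [("opt", ["-"]), "<digit>", ("rep", ["<digit>"])])
for lhs, symbols in bnf:
    print(lhs, "::=", " ".join(symbols))
```

The repetition is expressed with a recursive production, which is the same trick needed when writing such languages in plain BNF.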


Practice Exercise

Give an EBNF grammar for each of these languages. Use the EBNF extensions where possible to simplify the grammars.

1. All the languages from the previous set of exercises.
2. The language of legal Pascal compound statements: the keyword begin, followed by 0 or more statements separated by semicolons, followed by end. (Don’t worry about productions for the <statement> nonterminal.)
3. The language of legal C iteration statements using while, and do. (Don’t worry about productions for the <expression> and <statement> nonterminals.)


Many Other Variations

• BNF and EBNF ideas are widely used.
• Exact notation differs, in spite of occasional efforts to get uniformity.
    – Niklaus Wirth. What Can We Do About the Unnecessary Diversity of Notation for Syntactic Definitions? Communications of the ACM, November 1977.
• But as long as you understand the ideas, differences in notation are easy to pick up.


Example: Java Grammar Excerpt

WhileStatement:
    while ( Expression ) Statement

WhileStatementNoShortIf:
    while ( Expression ) StatementNoShortIf

DoStatement:
    do Statement while ( Expression ) ;

ForStatement:
    for ( ForInit_opt ; Expression_opt ; ForUpdate_opt ) Statement


Example: Java Grammar continued

ForInit:
    StatementExpressionList
    LocalVariableDeclaration

ForUpdate:
    StatementExpressionList

StatementExpressionList:
    StatementExpression
    StatementExpressionList , StatementExpression
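In the Java grammar's notation, the subscript "opt" after a symbol marks it as optional, so a right-hand side with an optional symbol stands for two right-hand sides, one with the symbol and one without. A quick sketch of that expansion for ForStatement follows; the ("opt", symbol) encoding and the helper names opt and expand_optional are assumptions made for illustration.

```python
from itertools import product


def opt(symbol):
    """Mark a symbol as optional (the subscript 'opt' in the Java grammar)."""
    return ("opt", symbol)


def expand_optional(rhs):
    """Expand a right-hand side into every present/absent combination."""
    choices = []
    for item in rhs:
        if isinstance(item, tuple):         # optional: keep it or leave it out
            choices.append([[item[1]], []])
        else:                               # required: always kept
            choices.append([[item]])
    return [[s for part in combo for s in part] for combo in product(*choices)]


FOR_STATEMENT = [
    "for", "(", opt("ForInit"), ";", opt("Expression"), ";", opt("ForUpdate"), ")",
    "Statement",
]

alternatives = expand_optional(FOR_STATEMENT)
print(len(alternatives))  # 8
for alternative in alternatives:
    print(" ".join(alternative))
```

Three optional symbols give 2 * 2 * 2 = 8 plain right-hand sides, from the full form down to for ( ; ; ) Statement.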


Compiler Issues: Abstract Syntax Tree (AST)

• A tree structure used by compilers.
• A parse tree with nonterminals removed, containing only what the compiler needs for code generation.
• Usually, each node is an operator and each subtree of that node is an operand...


AST Example

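[Figure: a parse tree for an expression over a, b, and c that uses +, *, and parentheses, shown beside the corresponding abstract syntax tree, which keeps only the operators + and * and the operands a, b, and c.]

But there’s no standard definition for this. It depends on the compiler.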
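One possible way to turn a parse tree into an AST is sketched here, assuming the <exp> grammar from the earlier examples and the expression a+(b*c), which is chosen for illustration and may not be the one in the slide's figure. The tuple encodings and the name to_ast are hypothetical, and a real compiler may well choose a different AST shape.

```python
# Parse tree for a+(b*c): nodes are (symbol, children), leaves are strings.
parse_tree = (
    "<exp>",
    [
        ("<exp>", ["a"]),
        "+",
        (
            "<exp>",
            ["(", ("<exp>", [("<exp>", ["b"]), "*", ("<exp>", ["c"])]), ")"],
        ),
    ],
)


def to_ast(node):
    """Drop nonterminals and parentheses; keep operators and operands."""
    _symbol, children = node
    if len(children) == 1:  # <exp> ::= a | b | c
        return children[0]
    if children[0] == "(":  # <exp> ::= (<exp>): parentheses only group
        return to_ast(children[1])
    left, operator, right = children  # <exp> ::= <exp> op <exp>
    return (operator, to_ast(left), to_ast(right))


ast = to_ast(parse_tree)
print(ast)  # ('+', 'a', ('*', 'b', 'c'))
```

Each AST node is an operator with its operands as subtrees; the grouping that the parentheses expressed survives in the shape of the tree.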

Compilers and Interpreters

Stages: Source Code, Scanner, Parser, AST, Static Analyzer, then either Code Generator (for the Physical Machine) or Virtual Machine (Interpreter).

– Scanner: converts input file into a stream of tokens for parsing.
– Parser: parses tokens using a grammar; produces Abstract Syntax Tree.
– Static Analyzer: checks things like type correctness.
– Virtual Machine (Interpreter): executes the program using a simulated machine (like the Java VM).
– Code Generator: generates code for physical machine.
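The scanner stage can be illustrated with a very small tokenizer for the <exp> language used in the earlier examples. This is a sketch under assumptions: the token kinds name, op, and paren and the function scan are invented for the example, and real scanners handle far richer token sets.

```python
import re

# One alternative per token kind; whitespace is matched so it can be skipped.
TOKEN_PATTERN = re.compile(
    r"(?P<name>[abc])|(?P<op>[+*])|(?P<paren>[()])|(?P<space>\s+)"
)


def scan(source):
    """Convert source text into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "space":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(scan("(a + b * c)"))
```

The parser then works on this token stream instead of raw characters, which keeps whitespace handling and character-level errors out of the grammar.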


Summary

• We use context-free grammars to define language syntax.
• The grammar defines how to build parse trees; the language is the set of strings derived by some parse tree.
• Different notations, same ideas:

– formal grammars
– Backus-Naur Form (BNF)
– Extended BNF (EBNF)


Review Questions

Look at questions 2.4, 2.6, 2.9