
Related Reading Chapter 2 Grammars and Parse Trees I Programming - PDF document



  1. Grammars and Parse Trees I: Defining Language Syntax
     Lecture 2 Formal Grammars, CS631, Fall 2000

     Related Reading
     Chapter 2 of Programming Languages: Concepts and Constructs, Ravi Sethi

     Overview
     We need a way to describe programming languages.
     - Grammars
     - Parse Trees
     - Equivalent grammar notations
       - Context-Free Grammars
       - Backus-Naur Form
       - Extended Backus-Naur Form
     Note: On Wed we will expand on these concepts.

     A Grammar
     <S>  ::= <NP> <V> <NP>         A sentence is a noun phrase, a verb, and a noun phrase.
     <NP> ::= <A> <N>               A noun phrase is an article and a noun.
     <V>  ::= eats | loves | hates  A verb is...
     <A>  ::= a | the               An article is...
     <N>  ::= dog | cat | rat       A noun is...

     A Parse Tree

                     <S>
              /       |       \
          <NP>       <V>       <NP>
         /    \       |       /    \
       <A>    <N>   loves   <A>    <N>
        |      |             |      |
       the    dog           the    cat

     Basic Concepts
     - We use context-free grammars to define language syntax.
     - The grammar defines how to build parse trees; the language is the set of strings derived by some parse tree.
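The idea that "the language is the set of strings derived by some parse tree" can be made concrete for the small sentence grammar above, because that grammar has no recursion and so derives only finitely many sentences. The following is a minimal sketch, not part of the original slides; the dictionary layout and the name `derive` are assumptions chosen for illustration.

```python
from itertools import product

# The sentence grammar from the slide: nonterminal -> list of alternatives.
# (This dictionary representation is an assumption made for the sketch.)
GRAMMAR = {
    "<S>": [["<NP>", "<V>", "<NP>"]],
    "<NP>": [["<A>", "<N>"]],
    "<V>": [["eats"], ["loves"], ["hates"]],
    "<A>": [["a"], ["the"]],
    "<N>": [["dog"], ["cat"], ["rat"]],
}


def derive(symbol):
    """Return every word sequence derivable from symbol.

    Terminates only because this particular grammar is not recursive.
    """
    if symbol not in GRAMMAR:  # a terminal derives only itself
        return [[symbol]]
    results = []
    for alternative in GRAMMAR[symbol]:
        # Combine every possible derivation of each right-hand-side symbol.
        for parts in product(*(derive(s) for s in alternative)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in derive("<S>")]
print(len(sentences))                        # 108
print("the dog loves the cat" in sentences)  # True
```

Each noun phrase has 2 x 3 = 6 forms and there are 3 verbs, which gives 6 x 3 x 6 = 108 sentences. A recursive grammar, such as the `<exp>` grammar later in the lecture, derives infinitely many strings, so this exhaustive approach would not terminate there.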

  2. Formal Definition
     A grammar consists of four parts:
     - the set of terminals (also called tokens): the atomic symbols that make up the language
     - the set of nonterminals: the variables representing language constructs
     - the set of productions: tree-building rules that define possible children for each nonterminal
     - the start symbol: the nonterminal that forms the root of any parse tree for the grammar

     In the example grammar:
     <S>  ::= <NP> <V> <NP>         (<S> is the start symbol)
     <NP> ::= <A> <N>               (a production)
     <V>  ::= eats | loves | hates
     <A>  ::= a | the               (<A>, <N>, and the other bracketed names are nonterminals)
     <N>  ::= dog | cat | rat       (dog, cat, rat, and the other plain words are terminals)

     Context-free Grammars
     - Such grammars are sometimes called context-free grammars (CFG’s): the left-hand side of each production is one nonterminal.
     - We can use any production for a given nonterminal to decide what children to give it, without looking at the rest of the tree.
     (Note: Other kinds of grammars exist: regular grammars (weaker), context-sensitive grammars (stronger), etc.)

     Note on CFG Formal Notation
     - If you take CS517 you will see one way of expressing CFG’s:
         S → aSb | X
         X → cX | ε
     - But in programming language studies there is a different notation for the same idea...

     Backus-Naur Form (BNF)
     Conventions:
     - nonterminals are enclosed in angle brackets
     - the symbol ::= separates the two sides of a production, and | separates alternatives on the right-hand side.
     - The special nonterminal <empty> represents the zero-length string.

     BNF Example
     <exp> ::= <exp> + <exp>
             | <exp> * <exp>
             | ( <exp> )
             | a | b | c
     Note that there are six productions in this grammar. It is equivalent to this:
     <exp> ::= <exp> + <exp>
     <exp> ::= <exp> * <exp>
     <exp> ::= ( <exp> )
     <exp> ::= a
     <exp> ::= b
     <exp> ::= c
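The four-part definition and the "six productions" remark can be checked mechanically. The sketch below is an illustration rather than anything from the slides: the `Grammar` class and `parse_bnf` function are hypothetical names, each rule is assumed to sit on one line with symbols separated by spaces, and the start symbol is assumed to be the left-hand side of the first rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """The four parts of a grammar, as listed on the slide."""
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple  # pairs of (left-hand side, right-hand-side tuple)
    start: str


def parse_bnf(text):
    """Split simple BNF text into one production per alternative.

    Assumptions: one rule per line, symbols separated by whitespace,
    and "|" never used as a terminal.
    """
    productions = []
    for line in text.strip().splitlines():
        lhs, rhs = line.split("::=")
        for alternative in rhs.split("|"):
            productions.append((lhs.strip(), tuple(alternative.split())))
    nonterminals = frozenset(lhs for lhs, _ in productions)
    symbols = {s for _, rhs in productions for s in rhs}
    return Grammar(
        terminals=frozenset(symbols - nonterminals),
        nonterminals=nonterminals,
        productions=tuple(productions),
        start=productions[0][0],  # assumed convention: first rule's left side
    )


g = parse_bnf("<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c")
print(len(g.productions))   # 6
print(sorted(g.terminals))  # ['(', ')', '*', '+', 'a', 'b', 'c']
print(g.start)              # <exp>
```

Splitting on `|` is what turns the compact rule into its six single-alternative productions, which mirrors the equivalence shown in the BNF example.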

  3. Parse Trees
     To build a parse tree:
     - Put the start symbol at the root.
     - Add children to every nonterminal, following any one of the productions for that nonterminal in the grammar.
     - Done when all the leaves are terminal.
     - Read off leaves from left to right; that’s the string derived by the tree.

     Example: Parse tree for (a + b * c)
     <exp> ::= <exp> + <exp>
             | <exp> * <exp>
             | ( <exp> )
             | a | b | c

                <exp>
              /   |   \
             (  <exp>  )
              /   |   \
         <exp>    +    <exp>
           |          /  |  \
           a     <exp>   *   <exp>
                   |           |
                   b           c

     Practice Exercise
     <exp> ::= <exp> + <exp>
             | <exp> * <exp>
             | ( <exp> )
             | a | b | c
     Show a parse tree for each of these strings:
       a+b
       a*b+c
       (a+b)
       (a+(b))
       ((a+b)*c

     Compiler Note
     - What you just did is parsing: trying to find a parse tree for a given string.
     - That’s what compilers do for every program you try to compile: try to build a parse tree for your program, using the grammar for whatever language you used.
     - Take CS654 to learn about algorithms for doing this efficiently.

     Language Definition
     - We use grammars to define the syntax of programming languages.
     - The language defined by a grammar is the set of all strings that can be derived by some parse tree for the grammar.
     - The set of strings is often infinite although grammars are finite.

     Practice Exercise
     Give a BNF grammar for each of the following languages:
     1. The set of all strings consisting of 0 or more concatenated copies of the string ab.
     2. The set of all strings consisting of 0 or more a’s followed by 0 or more b’s.
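Parsing, as described in the Compiler Note, asks whether some parse tree exists for a given string. One way to sketch that question for the `<exp>` grammar is a memoized search over substrings, trying each production in turn. This is a brute-force illustration and not one of the efficient algorithms the note alludes to; the function name `in_language` is hypothetical, and ignoring spaces in the input is an assumption.

```python
from functools import lru_cache


def in_language(text):
    """Return True if text can be derived from <exp> in the slide's grammar."""
    s = text.replace(" ", "")  # assumption: spaces are not significant

    @lru_cache(maxsize=None)
    def is_exp(i, j):
        # Can the substring s[i:j] be derived from <exp>?
        if j - i == 1:
            return s[i] in "abc"          # <exp> ::= a | b | c
        if j - i < 3:
            return False                  # no production yields length 0 or 2
        # <exp> ::= ( <exp> )
        if s[i] == "(" and s[j - 1] == ")" and is_exp(i + 1, j - 1):
            return True
        # <exp> ::= <exp> + <exp> | <exp> * <exp>
        # Try every position k as the operator that splits the substring.
        return any(
            s[k] in "+*" and is_exp(i, k) and is_exp(k + 1, j)
            for k in range(i + 1, j - 1)
        )

    return is_exp(0, len(s))


print(in_language("(a + b * c)"))  # True
print(in_language("a +"))          # False
```

The function only answers yes or no. Recording which production succeeded at each step would yield an actual parse tree, and for a string such as `a+b*c` more than one split succeeds, since this grammar allows several parse trees for the same string.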

  4. Practice Exercise
     Give a BNF grammar for each of the following languages:
     1. The set of all strings consisting of 0 or more a’s with a semicolon after each one.
     2. The set of all strings consisting of 1 or more a’s separated by semicolons (but not before the first or after the last).
     3. The set of all strings consisting of 0 or more a’s separated by semicolons (but not before the first or after the last).

     EBNF
     Additional syntax to simplify some grammar chores:
     - {x} to mean zero or more repetitions of x
     - [x] to mean x is optional (i.e. x | <empty>)
     - () for grouping
     - | to mean a choice among alternatives
     - quotes around terminals, if necessary, to distinguish from all these meta-symbols

     Practice Exercise
     Give an EBNF grammar for each of these languages. Use the EBNF extensions where possible to simplify the grammars.
     1. All the languages from the previous set of exercises.
     2. The language of legal Pascal compound statements: the keyword begin, followed by 0 or more statements separated by semicolons, followed by end. (Don’t worry about productions for the <statement> nonterminal.)
     3. The language of legal C iteration statements using while and do. (Don’t worry about productions for the <expression> and <statement> nonterminals.)

     Many Other Variations
     - BNF and EBNF ideas are widely used.
     - Exact notation differs, in spite of occasional efforts to get uniformity.
       - Niklaus Wirth. What Can We Do About the Unnecessary Diversity of Notation for Syntactic Definitions. Communications of the ACM, November, 1977.
     - But as long as you understand the ideas, differences in notation are easy to pick up.
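The EBNF extensions map naturally onto control flow in a hand-written recognizer: an optional part `[x]` becomes an `if`, and a repetition `{x}` becomes a `while` loop. The sketch below shows this for a hypothetical rule that is not taken from the slides or the exercises, `<list> ::= "[" [ x { "," x } ] "]"`, where the quoted brackets are terminals and the unquoted ones are EBNF meta-symbols. The function name and the single-character tokens are assumptions.

```python
def matches_list(text):
    """Recognize the hypothetical rule  <list> ::= "[" [ x { "," x } ] "]"."""
    tokens = list(text.replace(" ", ""))  # assumption: one character per token
    pos = 0

    def accept(token):
        """Consume the next token if it equals token; report whether it did."""
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == token:
            pos += 1
            return True
        return False

    if not accept("["):            # quoted "[" is a terminal
        return False
    if accept("x"):                # [ ... ]: the optional part becomes an if
        while accept(","):         # { ... }: the repetition becomes a while
            if not accept("x"):
                return False
    return accept("]") and pos == len(tokens)


print(matches_list("[]"))         # True
print(matches_list("[x, x, x]"))  # True
print(matches_list("[x,]"))       # False
```

Writing the same language in plain BNF would need an extra recursive nonterminal for the comma-separated tail, which is the kind of chore the EBNF extensions are meant to remove.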

     Example: Java Grammar Excerpt
     WhileStatement:
         while ( Expression ) Statement
     WhileStatementNoShortIf:
         while ( Expression ) StatementNoShortIf
     DoStatement:
         do Statement while ( Expression ) ;
     ForStatement:
         for ( ForInit opt ; Expression opt ; ForUpdate opt ) Statement

     Example: Java Grammar continued
     ForInit:
         StatementExpressionList
         LocalVariableDeclaration
     ForUpdate:
         StatementExpressionList
     StatementExpressionList:
         StatementExpression
         StatementExpressionList , StatementExpression

  5. Compiler Issues: Abstract Syntax Tree (AST)
     - A tree structure used by compilers.
     - A parse tree with nonterminals removed, containing only what the compiler needs for code generation.
     - Usually, each node is an operator and each subtree of that node is an operand...
     But there’s no standard definition for this. It depends on the compiler.

     AST Example
     Parse tree:

                <exp>
              /   |   \
             (  <exp>  )
              /   |   \
         <exp>    +    <exp>
           |          /  |  \
           a     <exp>   *   <exp>
                   |           |
                   b           c

     AST:

           +
          / \
         a   *
            / \
           b   c

     Compilers and Interpreters
     Source Code -> Scanner -> Parser -> AST -> Static Analyzer -> Code Generator -> Physical Machine
                                                               -> Virtual Machine (Interpreter)
     - Scanner: Converts input file into a stream of tokens for parsing.
     - Parser: Parses tokens using a grammar; produces Abstract Syntax Tree.
     - Static Analyzer: Checks things like type correctness.
     - Code Generator: Generates code for physical machine.
     - Virtual Machine (Interpreter): Executes the program using a simulated machine (like the Java VM).

     Summary
     - We use context-free grammars to define language syntax.
     - The grammar defines how to build parse trees; the language is the set of strings derived by some parse tree.
     - Different notations, same ideas:
       - formal grammars
       - Backus-Naur Form (BNF)
       - Extended BNF (EBNF)

     Review Questions
     Look at questions 2.4, 2.6, 2.9
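The step from parse tree to AST in the example can be sketched as a small tree transformation: drop the nonterminal labels and the parentheses, and keep each operator with its operands. Since the slides stress that there is no standard AST definition, the tuple representation here is only one possible choice, and the helper names `exp` and `to_ast` are hypothetical.

```python
def exp(*children):
    """Build a parse-tree node labeled <exp> (representation is an assumption)."""
    return ("<exp>", list(children))


def to_ast(node):
    """Collapse a parse tree for the <exp> grammar into a nested-tuple AST."""
    if isinstance(node, str):        # a terminal leaf such as "a"
        return node
    _, children = node               # discard the nonterminal label
    if len(children) == 1:           # <exp> ::= a | b | c
        return to_ast(children[0])
    if children[0] == "(":           # <exp> ::= ( <exp> ): drop the parentheses
        return to_ast(children[1])
    left, op, right = children       # <exp> ::= <exp> + <exp> | <exp> * <exp>
    return (op, to_ast(left), to_ast(right))


# The parse tree for (a + b * c) from the example.
parse_tree = exp("(", exp(exp("a"), "+", exp(exp("b"), "*", exp("c"))), ")")
print(to_ast(parse_tree))  # ('+', 'a', ('*', 'b', 'c'))
```

Each tuple is an operator followed by its operand subtrees, matching the "each node is an operator and each subtree is an operand" description. The parentheses disappear because the tree shape already records the grouping they expressed.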
