Compiler Development (CMPSC 401)
Syntax Analysis Janyl Jumadinova February 14, 2019
After lexical analysis (scanning), we have a series of tokens. In syntax analysis (or parsing), we want to interpret what those tokens mean.
Goal: Recover the structure described by that series of tokens.
Goal: Report errors if those tokens do not properly encode a structure.
An alphabet Σ is a set of symbols that act as letters. A language over Σ is a set of strings made from symbols in Σ.
When scanning, our alphabet was ASCII or Unicode characters. We produced tokens.
When parsing, our alphabet is the set of tokens produced by the scanner.
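To make the change of alphabet concrete, here is a minimal sketch of a scanner whose input alphabet is characters and whose output alphabet is a small set of token names. The token names (`var`, `const`), the patterns, and the function name `scan` are assumptions chosen for illustration; they are not taken from the course materials.

```python
import re

# Hypothetical token definitions: (token name, regular expression).
TOKEN_SPEC = [
    ("const", r"\d+"),
    ("var", r"[A-Za-z_]\w*"),
    ("op", r"[+\-*/()]"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))


def scan(text):
    """Turn a string of characters into a list of tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = PATTERN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup == "op":
            tokens.append(match.group())    # operators stand for themselves
        elif match.lastgroup != "skip":
            tokens.append(match.lastgroup)  # keep only the token name
        pos = match.end()
    return tokens


print(scan("x1 + 42 * (y)"))
# ['var', '+', 'const', '*', '(', 'var', ')']
```

The parser never sees the characters `x1` or `42`; its alphabet is the set of symbols that appear in the printed list.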
When scanning, we used regular expressions to define each token. Unfortunately, regular expressions are (usually) too weak to define programming languages:
– No regular expression matches all expressions with properly balanced parentheses.
– No regular expression matches all functions with properly nested block structure.
We need a more powerful formalism.
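A short sketch of why balanced parentheses need more than a regular expression: checking them requires a counter that can grow without bound, while a finite automaton has only a fixed number of states. The function name `is_balanced` is hypothetical.

```python
def is_balanced(text: str) -> bool:
    """Return True if every '(' is closed by a later ')' and vice versa."""
    depth = 0  # number of currently open parentheses; has no fixed upper bound
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '('
                return False
    return depth == 0


print(is_balanced("((a+b)*c)"))  # True
print(is_balanced("(a+b))("))    # False
```

Any fixed nesting limit could be handled by a regular expression, but the language of all balanced strings has no such limit.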
A context-free grammar (or CFG) is a formalism for defining languages. CFGs can define the context-free languages, a strict superset of the regular languages. Unlike in regular grammars, the right-hand sides of production rules are unrestricted.
One possible CFG for describing all legal arithmetic expressions using addition, subtraction, multiplication, and division
Formally, a context-free grammar (like a regular grammar) is a collection of four objects:
– A set of nonterminal symbols (or variables),
– A set of terminal symbols,
– A set of production rules saying how each nonterminal can be replaced by a string of terminals and nonterminals, and
– A start symbol that begins the derivation.
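The four objects map directly onto a small data structure. The sketch below is one possible encoding, not a standard API: the class name `Grammar`, its field names, and the toy grammar (which generates strings of a's followed by the same number of b's) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset
    terminals: frozenset
    productions: tuple  # pairs: (nonterminal, tuple of right-hand-side symbols)
    start: str

    def is_well_formed(self) -> bool:
        """Check that the four objects fit together as the definition requires."""
        symbols = self.nonterminals | self.terminals
        return (
            self.start in self.nonterminals
            and not (self.nonterminals & self.terminals)
            and all(
                lhs in self.nonterminals and all(s in symbols for s in rhs)
                for lhs, rhs in self.productions
            )
        )


# Hypothetical toy grammar; the empty tuple encodes an empty right-hand side.
toy = Grammar(
    nonterminals=frozenset({"S", "A", "B"}),
    terminals=frozenset({"a", "b"}),
    productions=(
        ("S", ("A", "S", "B")),
        ("S", ()),
        ("A", ("a",)),
        ("B", ("b",)),
    ),
    start="S",
)
print(toy.is_well_formed())  # True
```

Note that each left-hand side is a single nonterminal, while a right-hand side may be any string of terminals and nonterminals, including the empty string.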
A context-free grammar is said to be ambiguous if some string has more than one parse tree (equivalently, more than one leftmost derivation). Consider:
1 S → ASB
2 S → ε
3 A → a
4 B → b
This grammar is not ambiguous: each string it generates has exactly one parse tree, even though the nonterminals can be expanded in different orders.
Consider:
1 Expr → Expr + Expr
2 Expr → Expr * Expr
3 Expr → ( Expr )
4 Expr → var
5 Expr → const
There are two different derivation trees for the string var+var*var.
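The claim about var+var*var can be checked mechanically. The sketch below counts the parse trees of a token list under the five Expr rules by trying every rule and every split point; the function name `count_trees` and the list-of-strings token representation are assumptions for illustration.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count the parse trees of tokens under the ambiguous Expr grammar."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways to derive tokens[i:j] from Expr.
        if j - i == 1:
            # Expr -> var  and  Expr -> const
            return 1 if tokens[i] in ("var", "const") else 0
        total = 0
        # Expr -> ( Expr )
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        # Expr -> Expr + Expr  and  Expr -> Expr * Expr, split at operator k
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_trees(["var", "+", "var", "*", "var"]))            # 2
print(count_trees(["(", "var", "+", "var", ")", "*", "var"]))  # 1
```

The two trees for var+var*var correspond to grouping as (var+var)*var and as var+(var*var); adding explicit parentheses leaves only one.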
We need unambiguous grammars for parsing, because the parse yields the syntax tree, which in turn determines meaning.
If a grammar can be made unambiguous at all, it is usually made unambiguous through layering:
– Have exactly one way to build each piece of the string.
– Have exactly one way of combining those pieces back together.
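As a sketch of layering, one common way to layer an expression grammar with + and * is Expr → Term (+ Term)*, Term → Factor (* Factor)*, Factor → ( Expr ) | var | const. This layered grammar is an assumption for illustration, not taken from the slides. Each layer builds one kind of piece, and there is one way to combine the pieces, so a simple recursive-descent parser yields a single tree.

```python
def parse(tokens):
    """Parse a token list with the layered grammar; return a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, found {peek()!r}")
        pos += 1

    def expr():  # Expr -> Term ('+' Term)*
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    def term():  # Term -> Factor ('*' Factor)*
        node = factor()
        while peek() == "*":
            expect("*")
            node = ("*", node, factor())
        return node

    def factor():  # Factor -> '(' Expr ')' | var | const
        nonlocal pos
        tok = peek()
        if tok == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        if tok in ("var", "const"):
            pos += 1
            return tok
        raise SyntaxError(f"unexpected token {tok!r} at position {pos}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree


print(parse("var + var * var".split()))
# ('+', 'var', ('*', 'var', 'var'))
```

Because multiplication lives in a lower layer than addition, var * var is built as one piece before the addition combines it with the first var.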
With grammar: if you can redesign the language, you can avoid the problem entirely, e.g., by adding an explicit end keyword that closes the nearest if.
With tools: most parser tools can cope with ambiguous grammars, so an ambiguous grammar can serve as the basis for a generated parser without creating problems.
If we leave the world of pure CFGs, we can often resolve ambiguities through precedence declarations, e.g., declaring that multiplication has lower precedence than exponentiation. This allows for unambiguous parsing of ambiguous grammars.
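A minimal sketch of the idea, using precedence climbing: the grammar stays in the ambiguous form Expr → Expr op Expr, and a separate table decides how operators group. The table values, the `^` symbol for exponentiation, and the function name `parse_expr` are assumptions for illustration; the sketch handles neither parentheses nor malformed input.

```python
# Hypothetical precedence declarations: a higher number binds tighter.
PRECEDENCE = {
    "+": (1, "left"),
    "*": (2, "left"),
    "^": (3, "right"),
}


def parse_expr(tokens, min_prec=1):
    """Precedence climbing; consumes tokens from the front of the list."""
    left = tokens.pop(0)  # an operand such as 'var' or 'const'
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]][0] >= min_prec:
        op = tokens.pop(0)
        prec, assoc = PRECEDENCE[op]
        # Left-associative operators must not reappear at the same level on the right.
        next_min = prec + 1 if assoc == "left" else prec
        right = parse_expr(tokens, next_min)
        left = (op, left, right)
    return left


print(parse_expr("var + var * var".split()))
# ('+', 'var', ('*', 'var', 'var'))
print(parse_expr("var * var ^ var ^ var".split()))
# ('*', 'var', ('^', 'var', ('^', 'var', 'var')))
```

Changing the table changes the tree without touching the grammar, which is the appeal of precedence declarations in parser generators.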