CSE443 Compilers
- Dr. Carl Alphonce
alphonce@buffalo.edu 343 Davis Hall
http://www.cse.buffalo.edu/faculty/alphonce/SP17/CSE443/index.php
https://piazza.com/class/iybn4ndqa1s3ei
Phases of a compiler: syntactic structure
Figure 1.6, page 5 of text
5
(from Sebesta (10th ed), p. 115)
some finite set of symbols (called the alphabet of the language).
descriptions of the lowest-level syntactic units […] called lexemes.”
two parts:
– regular grammar for token structure (e.g. structure of identifiers)
– context-free grammar for sentence structure
6
Lexemes   Tokens
foo       identifier
i         identifier
sum       identifier
[…]       integer_literal
10        integer_literal
1         integer_literal
;         statement_separator
=         assignment_operator
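The lexeme/token pairing above can be sketched as a small regular-expression scanner. This is only an illustration: the token names come from the table, but the regular expressions and the function name `tokenize` are assumptions made for this sketch, not part of the course material.

```python
import re

# Token names follow the table above; the patterns are assumptions for this sketch.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("integer_literal", r"[0-9]+"),
    ("statement_separator", r";"),
    ("assignment_operator", r"="),
    ("skip", r"\s+"),  # whitespace separates lexemes but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (lexeme, token) pairs; raise ValueError on unknown input."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in tokenize("sum = 10;"):
    print(f"{lexeme:5} {token}")
```

Each token class is described by a regular expression, which matches the slide's point that token structure needs only a regular grammar.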
7
– Invented by John Backus to describe ALGOL 58, modified by Peter Naur for ALGOL 60
– BNF is equivalent to context-free grammar
– BNF is a metalanguage used to describe another language, the object language
– Extended BNF: adds syntactic sugar to produce more readable descriptions
8
<assign> → <var> = <expression>
<if_stmt> → if <logic_expr> then <stmt>
<if_stmt> → if <logic_expr> then <stmt> else <stmt>

<if_stmt> → if <logic_expr> then <stmt>
          | if <logic_expr> then <stmt> else <stmt>
9
side (RHS), and consists of terminal and nonterminal symbols
(terminal and non-terminal sets are implicit in rules, as is start symbol)
10
programming language allows a list of items (e.g. parameter list, argument list).
11
identifiers, whose minimum length is one:
<ident_list> -> ident | ident , <ident_list>
language being described by the grammar).
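The recursive rule for <ident_list> can be read almost directly as a recursive function. The sketch below assumes the input has already been split into tokens and that every identifier is represented by the string "ident", as in the rule; the function name `is_ident_list` is hypothetical.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> ident | ident , <ident_list> over a token list."""
    if not tokens or tokens[0] != "ident":
        return False
    if len(tokens) == 1:
        return True  # first alternative: a single ident
    # second alternative: ident , <ident_list>
    return tokens[1] == "," and is_ident_list(tokens[2:])


print(is_ident_list(["ident", ",", "ident", ",", "ident"]))
print(is_ident_list(["ident", ","]))
```

The recursion bottoms out at a single ident, which is why the shortest list the rule accepts has length one.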
12
A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)
13
G2 = ({a, the, dog, cat, chased},
      {S, NP, VP, Det, N, V},
      {S → NP VP, NP → Det N, Det → a | the, N → dog | cat, VP → V | VP NP, V → chased},
      S)
14
S => NP VP
  => Det N VP
  => the N VP
  => the dog VP
  => the dog VP NP
  => the dog V NP
  => the dog chased NP
  => the dog chased Det N
  => the dog chased a N
  => the dog chased a cat
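A derivation like this one can be replayed mechanically: at each step, pick a production for the leftmost nonterminal and substitute its right-hand side. The sketch below encodes the productions of G2 as a dictionary; the encoding, the function name `leftmost_derivation`, and the list of production indices are assumptions made for illustration.

```python
# Productions of G2; the dictionary keys are the nonterminals.
G2 = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "Det": [["a"], ["the"]],
    "N": [["dog"], ["cat"]],
    "VP": [["V"], ["VP", "NP"]],
    "V": [["chased"]],
}


def leftmost_derivation(grammar, start, choices):
    """Apply one production per entry in choices to the leftmost nonterminal.

    Each choice is an index into the alternatives of that nonterminal.
    Returns the list of sentential forms, starting with [start].
    """
    form = [start]
    steps = [form]
    for choice in choices:
        # locate the leftmost nonterminal in the current sentential form
        index = next(i for i, symbol in enumerate(form) if symbol in grammar)
        rhs = grammar[form[index]][choice]
        form = form[:index] + rhs + form[index + 1:]
        steps.append(form)
    return steps


# VP -> VP NP (index 1) is applied before VP -> V (index 0).
for form in leftmost_derivation(G2, "S", [0, 0, 1, 0, 1, 0, 0, 0, 0, 1]):
    print(" ".join(form))
```

The last form printed contains only terminal symbols, so it is a sentence of the language generated by G2.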
15
L3 = { 0, 1, 00, 11, 000, 111, 0000, 1111, … }
G3 = ({0, 1},
      {S, ZeroList, OneList},
      {S → ZeroList | OneList, ZeroList → 0 | 0 ZeroList, OneList → 1 | 1 OneList},
      S)
16
S => ZeroList => 0 ZeroList => 0 0 ZeroList => 0 0 0 ZeroList => 0 0 0 0
S => OneList => 1 OneList => 1 1 OneList => 1 1 1
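To see that G3 generates exactly the strings listed in L3, one can enumerate every sentence up to a length bound by searching over leftmost derivations. This is a minimal sketch: the dictionary encoding and the name `sentences` are assumptions, and the pruning step relies on G3 having no empty right-hand sides and no cyclic unit productions.

```python
from collections import deque

G3 = {
    "S": [["ZeroList"], ["OneList"]],
    "ZeroList": [["0"], ["0", "ZeroList"]],
    "OneList": [["1"], ["1", "OneList"]],
}


def sentences(grammar, start, max_len):
    """Breadth-first search over leftmost derivations.

    Returns all sentences with at most max_len symbols, shortest first.
    """
    found = set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:
            found.add("".join(form))  # only terminals left: a sentence
            continue
        for rhs in grammar[form[index]]:
            new_form = form[:index] + tuple(rhs) + form[index + 1:]
            if len(new_form) <= max_len:  # safe here: no rule shrinks a form
                queue.append(new_form)
    return sorted(found, key=lambda s: (len(s), s))


print(sentences(G3, "S", 4))
```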
17
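Every string of symbols in a derivation is a sentential form.
A sentence is a sentential form that has only terminal symbols.
A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded.
A derivation may be leftmost, rightmost, or neither.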
18
<program> -> <stmt-list>
<stmt-list> -> <stmt> | <stmt> ; <stmt-list>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term> | <term> - <term>
<term> -> <var> | const
19
<program> => <stmt-list>
          => <stmt>
          => <var> = <expr>
          => a = <expr>
          => a = <term> + <term>
          => a = <var> + <term>
          => a = b + <term>
          => a = b + const
20
derivation:
<program>
  <stmt-list>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
21
(or for different parts of a program).
parse tree from a given input, it reports a compilation error.
semantic interpretation/translation of the program.
22
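Optional parts are placed in brackets [ ]
<proc_call> -> ident [(<expr_list>)]
Alternative parts of RHSs are placed inside parentheses and separated via vertical bars
<term> -> <term> (+|-) const
Repetitions (0 or more) are placed inside braces { }
<ident> -> letter {letter|digit}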
23
BNF:
<expr> -> <expr> + <term> | <expr> - <term> | <term>
<term> -> <term> * <factor> | <term> / <factor> | <factor>

EBNF:
<expr> -> <term> {(+ | -) <term>}
<term> -> <factor> {(* | /) <factor>}
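The EBNF form maps naturally onto a hand-written parser: each { ... } repetition becomes a loop. The sketch below builds nested tuples as parse results and folds them to the left, so the loops reproduce the left associativity of the left-recursive BNF rules. The function names are hypothetical, and because <factor> is not defined on the slide, it is assumed here to be any single non-operator token.

```python
OPERATORS = ("+", "-", "*", "/")


def parse_expr(tokens):
    """<expr> -> <term> {(+ | -) <term>}; returns (tree, remaining_tokens)."""
    tree, rest = parse_term(tokens)
    while rest and rest[0] in ("+", "-"):  # the { ... } repetition
        op = rest[0]
        right, rest = parse_term(rest[1:])
        tree = (op, tree, right)  # fold to the left
    return tree, rest


def parse_term(tokens):
    """<term> -> <factor> {(* | /) <factor>}; returns (tree, remaining_tokens)."""
    tree, rest = parse_factor(tokens)
    while rest and rest[0] in ("*", "/"):
        op = rest[0]
        right, rest = parse_factor(rest[1:])
        tree = (op, tree, right)
    return tree, rest


def parse_factor(tokens):
    """Assumption: a <factor> is a single operand token."""
    if not tokens or tokens[0] in OPERATORS:
        raise SyntaxError("operand expected")
    return tokens[0], tokens[1:]


print(parse_expr("a - b - c".split())[0])
print(parse_expr("a + b * c".split())[0])
```

In the first result the left subtraction is nested inside the right one, and in the second the multiplication sits below the addition.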
24
A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees
Operator precedence and operator associativity are two examples of ways in which a grammar can provide an unambiguous interpretation.
25
The following grammar is ambiguous:
<expr> -> <expr> <op> <expr> | const
<op> -> / | -
The grammar treats the '/' and '-' operators equivalently.
26
<expr> -> <expr> <op> <expr> | const
<op> -> / | -
[Figure: two distinct parse trees for the same sentence, each with three const leaves and two <op> nodes]
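The ambiguity can be checked by brute force: try every operator in the token list as the root <op> and recursively parse both sides. The sketch below enumerates all parse trees under <expr> -> <expr> <op> <expr> | const. The function name `parse_trees` and the choice of const - const / const as the test sentence are assumptions made for illustration.

```python
def parse_trees(tokens):
    """All parse trees for tokens under <expr> -> <expr> <op> <expr> | const."""
    if tokens == ["const"]:
        return ["const"]
    trees = []
    # try every operator position as the root <op>
    for i in range(1, len(tokens) - 1):
        if tokens[i] in ("/", "-"):
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees


for tree in parse_trees(["const", "-", "const", "/", "const"]):
    print(tree)
```

More than one tree for a single sentence is exactly what makes the grammar ambiguous, and the trees differ in which operator ends up nearer the root.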
28
<expr> -> <expr> - <term> | <term>
<term> -> <term> / const | const
[Figure: parse tree for const - const / const under this grammar; the / subtree sits below the - node]
29
Below are some links to grammars for real programming languages. Look at how the grammars are expressed.
– http://www.schemers.org/Documents/Standards/R5RS/
– http://www.sics.se/isl/sicstuswww/site/documentation.html
In the ones listed below, find the parts of the grammar that deal with operator precedence.
– http://java.sun.com/docs/books/jls/index.html
– http://www.lykkenborg.no/java/grammar/JLS3.html
– http://www.enseignement.polytechnique.fr/profs/informatique/Jean-Jacques.Levy/poly/mainB/node23.html
– http://www.lrz-muenchen.de/~bernhard/Pascal-EBNF.html
30
Derivation of 2+5*3 using C grammar
<expression>
 <assignment-expression>
  <conditional-expression>
   <logical-OR-expression>
    <logical-AND-expression>
     <inclusive-OR-expression>
      <exclusive-OR-expression>
       <AND-expression>
        <equality-expression>
         <relational-expression>
          <shift-expression>
           <additive-expression>
            <additive-expression>
             <multiplicative-expression>
              <cast-expression>
               <unary-expression>
                <postfix-expression>
                 <primary-expression>
                  <constant>
                   2
            +
            <multiplicative-expression>
             <multiplicative-expression>
              <cast-expression>
               <unary-expression>
                <postfix-expression>
                 <primary-expression>
                  <constant>
                   5
             *
             <cast-expression>
              <unary-expression>
               <postfix-expression>
                <primary-expression>
                 <constant>
                  3
31
so that + is higher in the tree than *.
multiplication we must use parentheses, as in (2+3)*4.
expression, as in the following grammar fragment:
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → <variable> | <constant> | “(” <expr> “)”
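A small evaluator shows how this grammar fragment fixes precedence and how parentheses override it. This is only a sketch under stated assumptions: the left-recursive rules are implemented as loops, <variable> is left out, constants are non-negative integers, characters that are not digits, operators, or parentheses are silently skipped by the scanner, and the name `evaluate` is hypothetical.

```python
import re


def evaluate(source):
    """Evaluate source under the rules
       <expr>   -> <expr> + <term> | <term>
       <term>   -> <term> * <factor> | <factor>
       <factor> -> <constant> | "(" <expr> ")"
    """
    tokens = re.findall(r"\d+|[+*()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():
        value = term()
        while peek() == "+":  # <expr> + <term>, applied left to right
            advance()
            value += term()
        return value

    def term():
        value = factor()
        while peek() == "*":  # <term> * <factor>, applied left to right
            advance()
            value *= factor()
        return value

    def factor():
        token = peek()
        if token == "(":
            advance()
            value = expr()  # a parenthesized <expr> acts as a single <factor>
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if token is None or not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        advance()
        return int(token)

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("2+5*3"))
print(evaluate("(2+3)*4"))
```

Because `term` is called from inside `expr`, every multiplication is completed before the surrounding addition, unless parentheses wrap the addition into a <factor>.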
There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to read a syntax description to be able to write well-formed programs in the language. Understanding at least a little of what a compiler does in translating a program from high-level to low-level forms deepens your understanding of why programming languages are designed the way they are, and equips you to better diagnose subtle bugs in programs. The next slide shows the “evaluation order” remark in the C++ language reference, which alludes to the order being left unspecified to allow a compiler to optimize the code during translation.
32
33
C++ Programming Language, 3rd edition. Bjarne Stroustrup. (c) 1997. Page 122.
A compiler translates high level language statements into a much larger number of low-level statements, and then applies
program. The next slides show that different phases of compilation can apply different types of optimizations (some target-independent, some target-dependent). By not specifying the order in which subexpressions are evaluated (left-to-right or right-to-left) a C++ compiler can potentially re-
34
35
Compilers: principles, techniques, and tools (Aho et al) (c) 2007, page 5.
Given a regular language L we can always construct a context-free grammar G such that L = ℒ(G).
For every regular language L there is an NFA M = (S, Σ, δ, F, s0) such that L = ℒ(M).
Build G = (N, T, P, S0) as follows:
  N = { Ns | s ∈ S }
  T = { t | t ∈ Σ }
  If δ(i, a) = j, then add Ni → a Nj to P
  If i ∈ F, then add Ni → ε to P
  S0 = Ns0
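The construction can be sketched in a few lines. The representation below is an assumption made for illustration: the NFA's transitions are given as a set of (i, a, j) triples, productions are (lhs, rhs) pairs, and an empty right-hand side stands for the empty string. The helper `generates` is a hypothetical membership test that only works for grammars of this right-linear shape.

```python
def nfa_to_grammar(transitions, final_states, start_state):
    """Build productions from an NFA following the construction above.

    transitions: set of (i, a, j) triples, meaning state i can move to j on a.
    Returns (productions, start_symbol).
    """
    def nonterminal(state):
        return f"N{state}"

    productions = set()
    for i, a, j in transitions:
        productions.add((nonterminal(i), (a, nonterminal(j))))  # Ni -> a Nj
    for i in final_states:
        productions.add((nonterminal(i), ()))  # Ni -> empty string
    return productions, nonterminal(start_state)


def generates(productions, start, word):
    """Membership test for grammars built by nfa_to_grammar."""
    current = {start}
    for symbol in word:
        current = {rhs[1] for lhs, rhs in productions
                   if lhs in current and rhs and rhs[0] == symbol}
    return any((nt, ()) in productions for nt in current)


# Hypothetical NFA accepting binary strings that end in 1.
TRANSITIONS = {(0, "0", 0), (0, "1", 0), (0, "1", 1)}
PRODUCTIONS, START = nfa_to_grammar(TRANSITIONS, {1}, 0)
print(sorted(PRODUCTIONS))
print(generates(PRODUCTIONS, START, "0101"))
```

Every production has at most one nonterminal, at the right end, so a derivation in the grammar tracks a single run of the automaton.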
Proof (sketch):
L ∈ CFL: S → aSb | ab
L ∉ RL (by contradiction): Assume L is regular. In this case there exists a DFA D = (S, Σ, δ, F, s0) such that ℒ(D) = L. Let k = |S|. Consider a^i b^i, where i > k. Suppose δ(s0, a^i) = sr. Since i > k, not all of the states between s0 and sr are distinct. Hence, writing sn for the state reached after reading a^n, there are v and w, 0 ≤ v < w ≤ k, such that sv = sw. In other words, there is a loop. This DFA can certainly recognize a^i b^i, but it can also recognize a^j b^i, where i ≠ j, by following the loop a different number of times.
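The loop argument can be tried out on any concrete DFA. The sketch below uses a made-up three-state transition table (`EXAMPLE_DELTA` is purely hypothetical and recognizes nothing in particular). It finds v < w with the same state after a^v and a^w, then checks that a^i b^i and a^(i + w - v) b^i end in the same state, so the DFA must accept both or reject both.

```python
def run(delta, state, word):
    """Follow the DFA transition table delta from state over word."""
    for symbol in word:
        state = delta[(state, symbol)]
    return state


def find_loop(delta, start, num_states):
    """Return (v, w) with v < w <= num_states and equal states after a^v and a^w."""
    seen = {}
    state = start
    for count in range(num_states + 1):
        if state in seen:
            return seen[state], count
        seen[state] = count
        state = delta[(state, "a")]
    raise AssertionError("unreachable: pigeonhole principle")


# Hypothetical DFA with k = 3 states over the alphabet {a, b}.
EXAMPLE_DELTA = {
    (0, "a"): 1, (1, "a"): 2, (2, "a"): 1,
    (0, "b"): 0, (1, "b"): 2, (2, "b"): 0,
}

v, w = find_loop(EXAMPLE_DELTA, 0, 3)
i = 4  # any i > k works
same = run(EXAMPLE_DELTA, 0, "a" * i + "b" * i) == run(EXAMPLE_DELTA, 0, "a" * (i + w - v) + "b" * i)
print(v, w, same)
```

Whatever the set of accepting states is, the two strings receive the same verdict, which is the contradiction the proof sketch relies on.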
"REGULAR GRAMMARS CANNOT COUNT"