CSE443 Compilers: Dr. Carl Alphonce, alphonce@buffalo.edu, 343 Davis Hall

SLIDE 1

CSE443 Compilers

  • Dr. Carl Alphonce

alphonce@buffalo.edu 343 Davis Hall

SLIDE 2

Phases of a compiler

Figure 1.6, page 5 of text

Syntactic structure

SLIDE 3

Recap

  • Lexical analysis: LEX/FLEX (regex -> lexer)
  • Syntactic analysis: YACC/BISON (grammar -> parser)

SLIDE 4

Continuing from Friday

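With a precedence rule forcing an expression like 2+3*4 to be interpreted as 2+(3*4), how can we modify the grammar to allow (2+3)*4 as a valid expression? Add an alternative to <factor> that wraps a complete <expr> in parentheses. Because the parenthesized <expr> sits at the <factor> level, it binds more tightly than *:

  <expr> -> <expr> + <term> | <term>
  <term> -> <term> * <factor> | <factor>
  <factor> -> <variable> | <constant> | '(' <expr> ')'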
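A minimal Python sketch of how this grammar treats the two expressions. It assumes single-digit constants, no whitespace, and no <variable> alternative. These simplifications and the names `Parser` and `evaluate` are hypothetical. A recursive-descent procedure cannot start a left-recursive rule by calling itself, so <expr> and <term> are implemented as loops, which keeps + and * left-associative.

```python
class Parser:
    """Recursive-descent evaluator for
         <expr>   -> <expr> + <term> | <term>
         <term>   -> <term> * <factor> | <factor>
         <factor> -> <constant> | '(' <expr> ')'
    Constants are single digits (a simplifying assumption)."""

    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def peek(self):
        # Current character, or None at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else None

    def eat(self, ch: str):
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def expr(self) -> int:
        # <expr> -> <term> { + <term> }   (loop instead of left recursion)
        value = self.term()
        while self.peek() == "+":
            self.eat("+")
            value += self.term()
        return value

    def term(self) -> int:
        # <term> -> <factor> { * <factor> }
        value = self.factor()
        while self.peek() == "*":
            self.eat("*")
            value *= self.factor()
        return value

    def factor(self) -> int:
        ch = self.peek()
        if ch == "(":
            # '(' <expr> ')' restarts at the lowest-precedence level.
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        if ch is not None and ch.isdigit():
            self.pos += 1
            return int(ch)
        raise SyntaxError(f"unexpected {ch!r} at position {self.pos}")


def evaluate(text: str) -> int:
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at position {parser.pos}")
    return value


print(evaluate("2+3*4"))    # 14: * binds tighter than +
print(evaluate("(2+3)*4"))  # 20: parentheses group 2+3 first
```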

SLIDE 5

Lecture discussion

There are many reasons to study the syntax of programming languages. When learning a new language, you need to be able to read a syntax description in order to write well-formed programs in that language. Understanding at least a little of what a compiler does when translating a program from high-level to low-level forms deepens your understanding of why programming languages are designed the way they are, and equips you to better diagnose subtle bugs in programs. The next slide shows the “evaluation order” remark in Stroustrup's C++ reference, which alludes to the order being left unspecified so that a compiler can optimize the code during translation.

SLIDE 6

Shown on Visualizer


The C++ Programming Language, 3rd edition. Bjarne Stroustrup. (c) 1997. Page 122.

SLIDE 7

A compiler translates high-level language statements into a much larger number of low-level statements, and then applies optimizations. The entire translation process, including optimizations, must preserve the semantics of the original high-level program. The next slide shows that different phases of compilation can apply different types of optimizations (some target-independent, some target-dependent). By not specifying the order in which subexpressions are evaluated (left-to-right or right-to-left), a C++ compiler can potentially reorder the resulting low-level instructions to give a “better” result.

SLIDE 8

Returning to an earlier question

A few lectures back, the question was asked whether there are context-free languages that are not regular.

SLIDE 9

SOURCE: https://openi.nlm.nih.gov/detailedresult.php?img=PMC3367694_rstb20120103-g2&req=4
AUTHORS: Fitch WT, Friederici AD - Philos. Trans. R. Soc. Lond., B, Biol. Sci. (2012)
LICENSE: http://creativecommons.org/licenses/by/3.0/

(Figure labels: Lexical structure, Syntactic structure)

SLIDE 10

RL ⊆ CFL

proof sketch

Given a regular language L, we can always construct a context-free grammar G such that L = 𝓛(G).

For every regular language L there is an NFA M = (S, Σ, δ, F, s0) such that L = 𝓛(M). Build G = (N, T, P, S0) as follows:
  • N = { Ns | s ∈ S }
  • T = { t | t ∈ Σ }
  • If j ∈ δ(i,a), then add Ni → a Nj to P (an ε-move from i to j becomes Ni → Nj)
  • If i ∈ F, then add Ni → ε to P
  • S0 = Ns0
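The construction above is mechanical enough to code directly. Here is a minimal Python sketch, assuming a hypothetical encoding: `delta` maps `(state, symbol)` to a set of states, the empty string `""` stands for ε, and a production is a `(lhs, rhs)` pair whose right-hand side `()` means ε. The function name `nfa_to_grammar` is illustrative.

```python
def nfa_to_grammar(states, alphabet, delta, finals, start):
    """Build G = (N, T, P, S0) from an NFA M = (S, Σ, δ, F, s0)."""
    N = {f"N{s}" for s in states}
    T = set(alphabet)
    P = set()
    for (i, a), targets in delta.items():
        for j in targets:
            if a == "":
                P.add((f"N{i}", (f"N{j}",)))    # ε-move: Ni -> Nj
            else:
                P.add((f"N{i}", (a, f"N{j}")))  # j in δ(i,a): Ni -> a Nj
    for i in finals:
        P.add((f"N{i}", ()))                     # i in F: Ni -> ε
    return N, T, P, f"N{start}"


# NFA for (a|b)*abb: state 0 loops on a and b; 0 -a-> 1 -b-> 2 -b-> 3.
N, T, P, S0 = nfa_to_grammar(
    states={0, 1, 2, 3},
    alphabet={"a", "b"},
    delta={(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}},
    finals={3},
    start=0,
)
for lhs, rhs in sorted(P):
    print(lhs, "->", " ".join(rhs) if rhs else "ε")
```

Up to renaming Ni as Ai, the printed productions are those of the grammar G given for (a|b)*abb.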

SLIDE 11

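(a|b)*abb

G = ( {A0, A1, A2, A3}, {a, b}, {A0 → a A0, A0 → b A0, A0 → a A1, A1 → b A2, A2 → b A3, A3 → ε}, A0 )

(Figure: NFA for (a|b)*abb with states 0, 1, 2, 3: state 0 loops on a and b, 0 goes to 1 on a, 1 goes to 2 on b, and 2 goes to 3 on b; state 3 is accepting.)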
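The grammar G can also be used directly as a recognizer. Because every production has the form A → a B or A → ε, a derivation can be tracked by keeping the set of nonterminals it could currently be at, which is the same subset idea used to simulate an NFA. A minimal Python sketch; the names `PRODUCTIONS` and `in_language` are hypothetical, and ε is encoded as the empty tuple.

```python
# Right-linear grammar G for (a|b)*abb, copied from the slide.
PRODUCTIONS = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [()],  # A3 -> ε
}


def in_language(s: str, start: str = "A0") -> bool:
    """True if the start symbol of G derives s."""
    current = {start}  # nonterminals reachable after the prefix read so far
    for ch in s:
        current = {
            rhs[1]
            for A in current
            for rhs in PRODUCTIONS[A]
            if len(rhs) == 2 and rhs[0] == ch
        }
    # Accept if some reachable nonterminal can finish with A -> ε.
    return any(() in PRODUCTIONS[A] for A in current)


for s in ["abb", "aabb", "babb", "ab", "abba"]:
    print(s, in_language(s))
```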

SLIDE 12

RL ⊊ CFL

proof sketch

Show that not all CF languages are regular. To do this we only need to demonstrate that there exists a CFL that is not regular.

Consider L = { aⁿbⁿ | n ≥ 1 }

Claim: L ∈ CFL, L ∉ RL

SLIDE 13

RL ⊊ CFL

proof sketch

L ∈ CFL: S → aSb | ab

L ∉ RL (by contradiction): Assume L is regular. In this case there exists a DFA D = (S, Σ, δ, F, s0) such that 𝓛(D) = L. Let k = |S|. Consider aⁱbⁱ, where i > k. Let s0, s1, ..., si be the states D passes through while reading aⁱ, so δ(s0, aⁱ) = si. Since i > k, the k+1 states s0, ..., sk cannot all be distinct (D has only k states). Hence, there are v and w, 0 ≤ v < w ≤ k, such that sv = sw. In other words, there is a loop. This DFA can certainly recognize aⁱbⁱ, but it can also recognize aʲbⁱ with j = i + (w - v) ≠ i, by following the loop one extra time. Since aʲbⁱ ∉ L, this contradicts 𝓛(D) = L.

"REGULAR GRAMMARS CANNOT COUNT"
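The pigeonhole step can be checked mechanically on any concrete DFA. A minimal Python sketch, assuming a hypothetical encoding of a DFA as a transition dictionary; the example DFA recognizes a*b*, which accepts every aⁱbⁱ. For any DFA, `find_loop` returns v < w with δ(s0, aᵛ) = δ(s0, aʷ). The DFA then gives the same verdict on aⁱbⁱ and on aʲbⁱ with j = i + (w - v).

```python
def find_loop(delta, start, i):
    """Return (v, w) with 0 <= v < w <= i such that reading a^v and a^w
    from start reach the same state; succeeds whenever i >= number of states."""
    first_seen = {}
    state = start
    for step in range(i + 1):
        if state in first_seen:
            return first_seen[state], step
        first_seen[state] = step
        if step < i:
            state = delta[(state, "a")]
    raise ValueError("no repeated state: i must exceed the number of states")


def accepts(delta, start, finals, s):
    state = start
    for ch in s:
        state = delta[(state, ch)]
    return state in finals


# Example DFA for a*b* (state 2 is a dead state), so k = 3.
DELTA = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 2, (2, "b"): 2,
}
START, FINALS, K = 0, {0, 1}, 3

i = K + 1
v, w = find_loop(DELTA, START, i)
j = i + (w - v)  # go around the loop one extra time
print(v, w, j)   # 0 1 5
# Same verdict for a^i b^i and a^j b^i, although only the first is in L.
print(accepts(DELTA, START, FINALS, "a" * i + "b" * i))  # True
print(accepts(DELTA, START, FINALS, "a" * j + "b" * i))  # True
```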

SLIDE 14

Relevance? Nested '{' and '}'

    public class Foo {
        public static void main(String[] args) {
            for (int i=0; i<args.length; i++) {
                if (args[i].length() < 3) { … }
                else { … }
            }
        }
    }
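Checking that braces like these nest properly requires remembering how many are still open, with no fixed bound on the depth. That is exactly the counting a finite automaton cannot do. A minimal Python sketch using an unbounded counter; the name `braces_balanced` is hypothetical, and it assumes braces never appear inside string literals or comments. With several bracket kinds, a stack would replace the counter.

```python
def braces_balanced(source: str) -> bool:
    """True if the '{' and '}' in source nest properly."""
    depth = 0
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # a '}' with no matching open '{'
                return False
    return depth == 0


JAVA = ("public class Foo { public static void main(String[] args) { "
        "for (int i=0; i<args.length; i++) { if (args[i].length() < 3) { } "
        "else { } } } }")
print(braces_balanced(JAVA))     # True
print(braces_balanced("{ { }"))  # False
```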

SLIDE 15

Context Free Grammars and parsing

Algorithms that parse any CFG in O(n³) time, where n is the length of the input, exist (e.g., CYK and Earley's algorithm).

Programming language constructs can generally be parsed in O(n).
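CYK (Cocke-Younger-Kasami) is one such general algorithm. It requires the grammar in Chomsky normal form, where every production is A -> B C or A -> a. Its three nested loops over span length, span start, and split point give the O(n³) bound. A minimal Python sketch, using a hypothetical CNF grammar for { aⁿbⁿ | n ≥ 1 } equivalent to S → aSb | ab.

```python
# Hypothetical CNF grammar for { a^n b^n | n >= 1 }:
#   S -> A B | A C,   C -> S B,   A -> a,   B -> b
BINARY = [("S", "A", "B"), ("S", "A", "C"), ("C", "S", "B")]
TERMINAL = [("A", "a"), ("B", "b")]


def cyk(word: str, start: str = "S") -> bool:
    n = len(word)
    if n == 0:
        return False  # this grammar has no ε-production
    # table[i][l - 1] = nonterminals deriving the substring of length l at i
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = {lhs for lhs, t in TERMINAL if t == ch}
    for length in range(2, n + 1):           # span length
        for i in range(n - length + 1):      # span start
            for split in range(1, length):   # length of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, b, c in BINARY:
                    if b in left and c in right:
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]


for w in ["ab", "aabb", "aab", "abab"]:
    print(w, cyk(w))
```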

SLIDE 16

Top-down & bottom-up

A top-down parser builds a parse tree from the root to the leaves
  • easier to construct by hand
A bottom-up parser builds a parse tree from the leaves to the root
  • handles a larger class of grammars
  • tools (yacc/bison) build bottom-up parsers

SLIDE 17

Our presentation: first top-down, then bottom-up

Present top-down parsing first. Introduce necessary vocabulary and data structures. Move on to bottom-up parsing second.

SLIDE 18

vocab: look-ahead

The current symbol being scanned in the input is called the lookahead symbol.

(Figure: a stream of tokens flowing into the PARSER.)
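In code, the lookahead is simply the token at the parser's current position. A minimal Python sketch; the class `TokenStream` and its `lookahead` and `match` members are hypothetical names. `None` stands in for an end-of-input marker.

```python
class TokenStream:
    """A token list with a cursor; `lookahead` is the token being scanned."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    @property
    def lookahead(self):
        # None signals that the input is exhausted.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        """Advance past the lookahead if it is `expected`; otherwise fail."""
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1


ts = TokenStream(["id", "+", "id"])
print(ts.lookahead)  # id
ts.match("id")
print(ts.lookahead)  # +
```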

SLIDE 19

Top-down parsing

SLIDE 20

Top-down parsing

  • Start from the grammar's start symbol
  • Build the parse tree so that its yield matches the input
  • Predictive parsing: a simple form of recursive-descent parsing
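A minimal predictive parser in Python for the hypothetical grammar S → a S b | c. Here FIRST(a S b) = {a} and FIRST(c) = {c} share no terminal, so the lookahead alone picks the production. The parser builds the tree from the root down, and `tree_yield` confirms that the leaves read left to right spell the input. Trees are `(label, children)` tuples, an illustrative encoding.

```python
def parse_S(tokens, pos=0):
    """Parse S -> a S b | c starting at pos; return (tree, next_pos)."""
    lookahead = tokens[pos] if pos < len(tokens) else None
    if lookahead == "a":                     # S -> a S b
        sub, pos = parse_S(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != "b":
            raise SyntaxError(f"expected 'b' at position {pos}")
        return ("S", ["a", sub, "b"]), pos + 1
    if lookahead == "c":                     # S -> c
        return ("S", ["c"]), pos + 1
    raise SyntaxError(f"unexpected {lookahead!r} at position {pos}")


def parse(text):
    tree, pos = parse_S(list(text))
    if pos != len(text):
        raise SyntaxError(f"trailing input at position {pos}")
    return tree


def tree_yield(tree):
    """Concatenate the leaves of a parse tree from left to right."""
    if isinstance(tree, str):
        return tree
    _, children = tree
    return "".join(tree_yield(child) for child in children)


tree = parse("aacbb")
print(tree)              # ('S', ['a', ('S', ['a', ('S', ['c']), 'b']), 'b'])
print(tree_yield(tree))  # aacbb
```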

SLIDE 21

FIRST(𝛽)

If 𝛽 ∈ (N ∪ T)*, then FIRST(𝛽) is "the set of terminals that appear as the first symbols of one or more strings of terminals generated from 𝛽." [p. 64]

  • Ex. If A -> a 𝛾 is the only production for A, then FIRST(A) = {a}
  • Ex. If A -> a 𝛾 | B, then FIRST(A) = {a} ∪ FIRST(B)

SLIDE 22

FIRST(𝛽)

First sets are considered when there are two (or more) productions to expand A ∈ N:
  A -> 𝛽 | 𝛾
Predictive parsing requires that FIRST(𝛽) ∩ FIRST(𝛾) = ∅.
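FIRST sets can be computed by iterating to a fixpoint, and the disjointness requirement can then be checked for every pair of alternatives. A minimal Python sketch; the grammar encoding and the function names are hypothetical. Alternatives are tuples of symbols, `()` is an ε-alternative, and the marker "ε" in a FIRST set means "can derive the empty string". This checks only the FIRST condition stated above; grammars with ε-productions also need FOLLOW sets for a complete predictive-parsing test.

```python
EPS = "ε"


def first_of_string(symbols, first, terminals):
    """FIRST of a sequence of symbols, given FIRST sets for nonterminals."""
    result = set()
    for X in symbols:
        if X in terminals:
            result.add(X)
            return result
        result |= first[X] - {EPS}
        if EPS not in first[X]:
            return result
    result.add(EPS)  # every symbol can vanish (or the sequence is empty)
    return result


def first_sets(grammar, terminals):
    """Grow FIRST(A) for every nonterminal A until nothing changes."""
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first, terminals)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first


def predictive_conflicts(grammar, terminals):
    """(A, alt1, alt2, shared) for alternatives whose FIRST sets overlap."""
    first = first_sets(grammar, terminals)
    conflicts = []
    for A, alts in grammar.items():
        for x in range(len(alts)):
            for y in range(x + 1, len(alts)):
                shared = (first_of_string(alts[x], first, terminals)
                          & first_of_string(alts[y], first, terminals))
                if shared:
                    conflicts.append((A, alts[x], alts[y], shared))
    return conflicts


TERMS = {"+", "id"}
LEFT_RECURSIVE = {"expr": [("expr", "+", "term"), ("term",)], "term": [("id",)]}
REWRITTEN = {"expr": [("term", "R")], "R": [("+", "term", "R"), ()], "term": [("id",)]}
print(predictive_conflicts(LEFT_RECURSIVE, TERMS))  # one conflict on {'id'}
print(predictive_conflicts(REWRITTEN, TERMS))       # []
```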

SLIDE 23

ε-productions

If the lookahead symbol is not in the FIRST set of any other alternative, use the ε-production: do not advance the lookahead symbol, but instead "discard" the nonterminal (it derives the empty string):

  • optexpr -> expr | ε

"While parsing optexpr, if the lookahead symbol is not in FIRST(expr), then the ε production is used" [p. 66]

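A minimal Python sketch of this rule inside a predictive parser. The statement grammar is an assumption for illustration: `expr` and `other` are treated as single tokens. The class name `StmtParser` and its `choices` log are hypothetical. When `optexpr` sees a lookahead outside FIRST(expr), it consumes nothing and leaves the lookahead for the caller's next `match`.

```python
class StmtParser:
    """Predictive parser for the assumed grammar
         stmt    -> for ( optexpr ; optexpr ; optexpr ) stmt | other
         optexpr -> expr | ε"""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0
        self.choices = []  # which optexpr production was used, in order

    @property
    def lookahead(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, t):
        if self.lookahead != t:
            raise SyntaxError(f"expected {t!r}, found {self.lookahead!r}")
        self.pos += 1

    def stmt(self):
        if self.lookahead == "for":
            self.match("for")
            self.match("(")
            self.optexpr()
            self.match(";")
            self.optexpr()
            self.match(";")
            self.optexpr()
            self.match(")")
            self.stmt()
        elif self.lookahead == "other":
            self.match("other")
        else:
            raise SyntaxError(f"unexpected {self.lookahead!r}")

    def optexpr(self):
        if self.lookahead == "expr":  # lookahead is in FIRST(expr)
            self.match("expr")
            self.choices.append("expr")
        else:
            # ε-production: nothing is consumed; the lookahead stays put.
            self.choices.append("ε")


p = StmtParser("for ( ; expr ; ) other".split())
p.stmt()
print(p.choices)               # ['ε', 'expr', 'ε']
print(p.pos == len(p.tokens))  # True
```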
SLIDE 24

Left recursion

Grammars with left recursion are problematic for top-down parsers, as they lead to infinite regress.

SLIDE 25

Left recursion example

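Grammar:
  expr -> expr + term | term
  term -> id
FIRST sets for the rule alternatives are not disjoint: FIRST(expr + term) = FIRST(expr) = {id} and FIRST(term) = {id}.

(Figure: a parse tree in which expr repeatedly expands to expr + term down its left side, until the lowest expr expands to term.)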
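Since both alternatives start with id, the lookahead cannot choose between them. A parser that tries expr + term first calls itself again before consuming any input. A minimal Python sketch of this naive translation; the function names and the `visited` log are hypothetical. Python's recursion limit turns the infinite regress into a `RecursionError`.

```python
visited = []  # the input position passed to each call of parse_expr


def parse_term(tokens, pos):
    # term -> id
    if pos < len(tokens) and tokens[pos] == "id":
        return "id", pos + 1
    raise SyntaxError(f"expected id at position {pos}")


def parse_expr(tokens, pos):
    """Naive recursive descent for  expr -> expr + term | term,
    always trying the left-recursive alternative first."""
    visited.append(pos)
    # expr -> expr + term: the first step is to parse an expr at the SAME
    # position, so no token is consumed before the next recursive call.
    left, pos = parse_expr(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_term(tokens, pos + 1)
        return ("+", left, right), pos
    return left, pos


try:
    parse_expr(["id", "+", "id"], 0)
except RecursionError:
    print(f"infinite regress: {len(visited)} nested calls at positions {set(visited)}")
```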

SLIDE 26

Left recursion example

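Grammar:
  expr -> expr + term | term
  term -> id
FIRST sets for the rule alternatives are not disjoint: FIRST(expr + term) = FIRST(expr) = {id} and FIRST(term) = {id}.

(Figure: the same parse tree, with the lowest term labeled 𝛾 and each + term labeled 𝛽, so the derived string has the form 𝛾 𝛽 𝛽 𝛽.)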

SLIDE 27

Rewriting grammar to remove left recursion

The expr rule is of the form
  A -> A 𝛽 | 𝛾
Rewrite it as two rules:
  A -> 𝛾 R
  R -> 𝛽 R | ε
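The rewrite is mechanical. A minimal Python sketch that generalizes it in the standard way to several 𝛽 and 𝛾 alternatives: A -> A 𝛽1 | ... | A 𝛽m | 𝛾1 | ... | 𝛾n. The function name and the encoding (alternatives as tuples of symbols, `()` for ε) are hypothetical. It handles only immediate left recursion and assumes no alternative is just A itself.

```python
def remove_immediate_left_recursion(A, alternatives, R=None):
    """Rewrite  A -> A β1 | ... | A βm | γ1 | ... | γn  as
         A -> γ1 R | ... | γn R
         R -> β1 R | ... | βm R | ε
    R defaults to A followed by a prime."""
    R = R or A + "'"
    betas = [alt[1:] for alt in alternatives if alt and alt[0] == A]
    gammas = [alt for alt in alternatives if not alt or alt[0] != A]
    if not betas:
        return {A: list(alternatives)}  # not left-recursive: unchanged
    return {
        A: [gamma + (R,) for gamma in gammas],
        R: [beta + (R,) for beta in betas] + [()],
    }


print(remove_immediate_left_recursion(
    "expr", [("expr", "+", "term"), ("term",)], R="R"))
# {'expr': [('term', 'R')], 'R': [('+', 'term', 'R'), ()]}
```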

SLIDE 28

Back to example

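The grammar is rewritten as
  expr -> term R
  R -> + term R | ε
(term -> id is unchanged)

(Figure: the new parse tree: expr expands to term R, each R expands to + term R, and the last R expands to ε, so the tree now grows down its right side; the pieces are again labeled 𝛾 𝛽 𝛽 𝛽.)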
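With left recursion removed, each nonterminal becomes a procedure that either consumes a token or takes the ε-production, so the parser always makes progress. A minimal Python sketch that builds the parse tree as nested `(label, children)` tuples, with the ε leaf written as "ε". The function names are hypothetical.

```python
def parse(tokens):
    """Recursive-descent parser for
         expr -> term R
         R    -> + term R | ε
         term -> id"""
    pos = 0

    def lookahead():
        return tokens[pos] if pos < len(tokens) else None

    def match(t):
        nonlocal pos
        if lookahead() != t:
            raise SyntaxError(f"expected {t!r} at position {pos}, found {lookahead()!r}")
        pos += 1
        return t

    def expr():
        return ("expr", [term(), R()])

    def R():
        if lookahead() == "+":   # R -> + term R
            plus = match("+")
            return ("R", [plus, term(), R()])
        return ("R", ["ε"])      # R -> ε: consume nothing

    def term():
        return ("term", [match("id")])

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"trailing input at position {pos}")
    return tree


print(parse(["id", "+", "id"]))
```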