Compiler Construction Lecture 5: Syntax Analysis I (Introduction)

Compiler Construction Lecture 5: Syntax Analysis I (Introduction) Winter Semester 2018/19 Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ws-1819/cc/

Conceptual Structure of a Compiler

Source code
→ Lexical analysis (Scanner), producing a token sequence such as (id, x1)(gets, )(id, y2)(plus, )(int, 1)(sem, )
→ Syntax analysis (Parser), based on context-free grammars and pushdown automata, producing a syntax tree (node labels in the figure: Asg, Var, Exp, Sum, Var, Con)
→ Semantic analysis
→ Generation of intermediate code
→ Code optimisation
→ Generation of target code
→ Target code

Problem Statement: Syntactic Structures

From Merriam-Webster’s Online Dictionary: "Syntax: the way in which linguistic elements (as words) are put together to form constituents (as phrases or clauses)"

• Starting point: sequence of symbols as produced by the scanner
  – here: ignore attribute information
  – Σ: (finite) set of tokens (= syntactic atoms/terminal symbols), e.g., {id, if, int, ...}
  – w ∈ Σ*: token sequence (obviously, not every w ∈ Σ* forms a valid program)
• Syntactic units:
  – atomic: keywords, variable/type/procedure/... identifiers, numerals, arithmetic/Boolean operators, ...
  – composite: declarations, arithmetic/Boolean expressions, statements, ...
• Observation: the hierarchical structure of (composite) syntactic units can be described by context-free grammars

Problem Statement: Syntax Analysis

Definition 5.1
The goal of syntax analysis is to determine the syntactic structure of a program, given by a token sequence, according to a context-free grammar. The corresponding program is called a parser.

[Figure: components Scanner, Parser, Semantic analyser, and Symbol table. The parser requests input from the scanner ("get next token"), the scanner delivers (token [, attribute]) pairs to the parser, and the parser passes a syntax tree to the semantic analyser.]

Example: the scanner turns the program fragment ... x1 := y2 + 1; ... into the token sequence ... (id, p1)(gets, )(id, p2)(plus, )(int, 1)(sem, ) ..., from which the parser builds a syntax tree (node labels in the figure: Asg, Var, Exp, Sum, Var, Con).
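The scanner side of this example can be sketched in a few lines of Python. This is only an illustration under assumptions: the regular expressions for identifiers and numerals are invented here, and the attribute of an id token is the lexeme itself rather than a symbol-table pointer such as p1.

```python
import re

# Hypothetical token classes; the names mirror the tokens on the slide.
TOKEN_SPEC = [
    ("id",   r"[A-Za-z][A-Za-z0-9]*"),
    ("int",  r"[0-9]+"),
    ("gets", r":="),
    ("plus", r"\+"),
    ("sem",  r";"),
    ("ws",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source):
    """Return (token, attribute) pairs; the attribute is None where the slide leaves it empty."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        kind = match.lastgroup
        if kind != "ws":  # whitespace separates tokens but is not a token itself
            attribute = match.group() if kind in ("id", "int") else None
            tokens.append((kind, attribute))
        pos = match.end()
    return tokens


print(scan(" x1 := y2+ 1 ; "))
```

The parser would then consume this list one element at a time, which corresponds to the "get next token" edge in the figure.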

Context-Free Grammars and Languages: Context-Free Grammars I

Definition 5.2 (Syntax of context-free grammars)
A context-free grammar (CFG) (over Σ) is a quadruple G = ⟨N, Σ, P, S⟩ where
• N is a finite set of nonterminal symbols,
• Σ is a (finite) alphabet of terminal symbols (disjoint from N),
• P is a finite set of production rules of the form A → α where
  – A ∈ N and
  – α ∈ X* for X := N ∪ Σ,
• S ∈ N is a start symbol.
The set of all context-free grammars over Σ is denoted by CFG_Σ.

Remarks: as denotations we generally use
• A, B, C, ... ∈ N for nonterminal symbols
• a, b, c, ... ∈ Σ for terminal symbols
• u, v, w, x, y, ... ∈ Σ* for terminal words
• α, β, γ, ... ∈ X* for sentences
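As a minimal sketch, the quadruple of Definition 5.2 can be written down as a Python data structure that checks the side conditions of the definition. The class name CFG and its field names are chosen here for illustration; the sample instance uses the productions S → aSb | ε.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """G = <N, Sigma, P, S>; a production A -> alpha is stored as (A, alpha)."""
    nonterminals: frozenset
    terminals: frozenset
    productions: tuple  # tuple of (lhs, rhs) pairs, rhs being a tuple of symbols
    start: str

    def __post_init__(self):
        if self.nonterminals & self.terminals:
            raise ValueError("N and Sigma must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a nonterminal")
        symbols = self.nonterminals | self.terminals  # X := N ∪ Sigma
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a nonterminal")
            if any(symbol not in symbols for symbol in rhs):
                raise ValueError(f"right-hand side of {lhs!r} uses an unknown symbol")


# S -> aSb | epsilon; the empty tuple stands for the empty word.
G = CFG(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    productions=(("S", ("a", "S", "b")), ("S", ())),
    start="S",
)
```

Storing right-hand sides as tuples of symbols rather than strings keeps the representation usable for multi-character tokens such as id or int.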

Context-Free Grammars and Languages: Context-Free Grammars II

Context-free grammars generate context-free languages:

Definition 5.3 (Semantics of context-free grammars)
Let G = ⟨N, Σ, P, S⟩ be a context-free grammar.
• The derivation relation ⇒ ⊆ X⁺ × X* of G is defined by α ⇒ β iff there exist α₁, α₂ ∈ X* and A → γ ∈ P such that α = α₁Aα₂ and β = α₁γα₂.
• If additionally α₁ ∈ Σ* or α₂ ∈ Σ*, then we respectively write α ⇒_l β or α ⇒_r β (leftmost/rightmost derivation).
• The language generated by G is given by L(G) := {w ∈ Σ* | S ⇒* w}.
• If a language L ⊆ Σ* is generated by some G ∈ CFG_Σ, then L is called context-free. The set of all context-free languages over Σ is denoted by CFL_Σ.

Remark: obviously, L(G) = {w ∈ Σ* | S ⇒_l* w} = {w ∈ Σ* | S ⇒_r* w}
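The derivation relation of Definition 5.3 can be made concrete with a short Python sketch. It assumes that sentences are tuples of symbols and uses the productions S → aSb | ε as sample data; the function names are illustrative.

```python
# Sample grammar: S -> aSb | epsilon (the empty tuple is the empty word).
PRODUCTIONS = [("S", ("a", "S", "b")), ("S", ())]
NONTERMINALS = {"S"}


def derive(form, productions):
    """All beta with form => beta: replace any one occurrence of A by gamma for A -> gamma."""
    result = []
    for pos, symbol in enumerate(form):
        for lhs, rhs in productions:
            if symbol == lhs:
                # form = alpha1 A alpha2  becomes  alpha1 gamma alpha2
                result.append(form[:pos] + rhs + form[pos + 1:])
    return result


def derive_leftmost(form, productions, nonterminals):
    """All beta with form =>_l beta: alpha1 must be terminal, so only the leftmost nonterminal is replaced."""
    for pos, symbol in enumerate(form):
        if symbol in nonterminals:
            return [form[:pos] + rhs + form[pos + 1:]
                    for lhs, rhs in productions if lhs == symbol]
    return []  # a terminal word has no successors


print(derive(("S",), PRODUCTIONS))
print(derive_leftmost(("a", "S", "b"), PRODUCTIONS, NONTERMINALS))
```

For a sentence with a single nonterminal both functions agree; they differ as soon as several nonterminals occur, because derive may rewrite any of them.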

Context-Free Grammars and Languages: Context-Free Languages

Example 5.4
The grammar G = ⟨N, Σ, P, S⟩ ∈ CFG_Σ over Σ := {a, b}, given by the productions
  S → aSb | ε,
generates the context-free (and non-regular) language L = {aⁿbⁿ | n ∈ ℕ}.
The example derivation
  S ⇒ aSb ⇒ aaSbb ⇒ aabb
can be represented by the following syntax tree for aabb:

      S
    / | \
   a  S  b
    / | \
   a  S  b
      |
      ε

Context-Free Grammars and Languages: Syntax Trees, Derivations, and Words

Observations
1. Every syntax tree yields exactly one word (= concatenation of terminal leaves).
2. Every syntax tree corresponds to exactly one leftmost derivation, and vice versa.
3. Every syntax tree corresponds to exactly one rightmost derivation, and vice versa.

Thus: syntax trees are uniquely representable by leftmost/rightmost derivations.
But: a word can have several syntax trees (see next slide).
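Observations 1 and 2 can be illustrated with a small Python sketch. It assumes a tree encoding chosen here for illustration: an inner node is a pair (label, children), a terminal leaf is a plain string, and a node with no children stands for an ε-production. The sample tree is the syntax tree for aabb in the grammar S → aSb | ε.

```python
# Syntax tree for aabb: S(a, S(a, S(), b), b)
TREE = ("S", ["a", ("S", ["a", ("S", []), "b"]), "b"])


def tree_yield(tree):
    """Observation 1: the word of a tree is the concatenation of its terminal leaves."""
    if isinstance(tree, str):
        return tree
    _, children = tree
    return "".join(tree_yield(child) for child in children)


def render(frontier):
    """Write a list of subtrees and terminals as a sentence, showing nodes by their label."""
    return "".join(item if isinstance(item, str) else item[0] for item in frontier)


def leftmost_derivation(tree):
    """Observation 2: read off the leftmost derivation by always expanding the leftmost inner node."""
    frontier = [tree]
    forms = [render(frontier)]
    while True:
        index = next((i for i, item in enumerate(frontier) if not isinstance(item, str)), None)
        if index is None:
            return forms
        _, children = frontier[index]
        frontier[index:index + 1] = children  # replace the node by its children
        forms.append(render(frontier))


print(tree_yield(TREE))           # aabb
print(leftmost_derivation(TREE))  # ['S', 'aSb', 'aaSbb', 'aabb']
```

Expanding the rightmost inner node instead would give the rightmost derivation of Observation 3; for this tree both coincide because every sentence contains at most one nonterminal.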

Context-Free Grammars and Languages: Ambiguity of CFGs and CFLs I

Definition 5.5 (Ambiguity)
• A context-free grammar G ∈ CFG_Σ is called unambiguous if every word w ∈ L(G) has exactly one syntax tree. Otherwise it is called ambiguous.
• A context-free language L ∈ CFL_Σ is called inherently ambiguous if every grammar G ∈ CFG_Σ with L(G) = L is ambiguous.

Example 5.6
on the board

Corollary 5.7
A grammar G ∈ CFG_Σ is unambiguous
iff every word w ∈ L(G) has exactly one leftmost derivation
iff every word w ∈ L(G) has exactly one rightmost derivation.
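Since Example 5.6 is not reproduced on the slide, the following Python sketch uses a stand-in grammar chosen here as an assumption: E → E+E | a. It counts the syntax trees of a word by trying every + as the operator at the root; by Corollary 5.7 the same number counts the leftmost derivations.

```python
from functools import lru_cache


def count_trees(word):
    """Number of syntax trees of `word` in the assumed grammar E -> E+E | a."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives word[i:j].
        total = 1 if word[i:j] == "a" else 0            # production E -> a
        for k in range(i + 1, j - 1):                   # production E -> E+E with '+' at position k
            if word[k] == "+":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(word))


for w in ["a", "a+a", "a+a+a", "a+a+a+a"]:
    print(w, count_trees(w))
```

The word a+a+a has two trees, one grouping it as (a+a)+a and one as a+(a+a), so this grammar is ambiguous, while a word outside the language such as a+ has none.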

Context-Free Grammars and Languages: Ambiguity of CFGs and CFLs II

Theorem 5.8
It is undecidable in general whether a given CFG is ambiguous.

Proof (idea).
Reduction from the Post Correspondence Problem (PCP): given an instance (x⃗, y⃗) of PCP, construct a CFG G with two “branches” S → X | Y that respectively enumerate all x⃗-concatenations and all y⃗-concatenations (plus corresponding index information).
Result: G is ambiguous iff (x⃗, y⃗) has a solution (see [Hopcroft, Motwani, Ullman: Introduction to Automata Theory, Languages, and Computation, 2011, Section 9.5.2] for details).

Remark: resolution of ambiguities by parser (later)
• yacc: operator precedences and associativities
• ANTLR: predicates
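The construction behind the proof idea can be sketched in Python. The slide does not spell out the productions, so the encoding below is an assumption following the usual textbook shape: branch X has the rules X → xᵢ X #i | xᵢ #i (and likewise Y with yᵢ), where #i is a hypothetical terminal that records the index i. The sample PCP instance is also invented for illustration.

```python
def pcp_grammar(xs, ys):
    """Productions (lhs, rhs) of the CFG with branches S -> X | Y for the PCP instance (xs, ys)."""
    productions = [("S", ("X",)), ("S", ("Y",))]
    for branch, words in (("X", xs), ("Y", ys)):
        for i, word in enumerate(words, start=1):
            index = f"#{i}"  # hypothetical terminal encoding the index i
            productions.append((branch, tuple(word) + (branch, index)))
            productions.append((branch, tuple(word) + (index,)))
    return productions


def branch_word(words, indices):
    """Terminal word derived in one branch when the words are chosen in the order `indices`."""
    letters = [c for i in indices for c in words[i - 1]]
    markers = [f"#{i}" for i in reversed(indices)]  # inner rules contribute their index first
    return tuple(letters + markers)


# Assumed sample instance: x = (a, ab), y = (aa, b); the index sequence 1, 2 solves it.
xs, ys = ["a", "ab"], ["aa", "b"]
print(branch_word(xs, [1, 2]) == branch_word(ys, [1, 2]))  # True: one word, two syntax trees
print(branch_word(xs, [1]) == branch_word(ys, [1]))        # False: 1 alone is not a solution
```

A solution of the PCP instance yields a word that both branches derive, hence two syntax trees; the index markers ensure that, conversely, a word with two syntax trees encodes a solution.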

Parsing Context-Free Languages: The Word Problem for Context-Free Languages

Problem 5.9 (Word problem for context-free languages)
Given G ∈ CFG_Σ and w ∈ Σ*, decide whether w ∈ L(G) (and determine a corresponding syntax tree).

This problem is decidable for arbitrary CFGs:
• [for CFGs in Chomsky Normal Form] Using the tabular method by Cocke, Younger, and Kasami (“CYK Algorithm”; time/space complexity O(|w|³)/O(|w|²))
• Using the predecessor method:
    w ∈ L(G) ⟺ S ∈ pre*({w})
  where pre*(M) := {α ∈ X* | α ⇒* β for some β ∈ M}
  (polynomial [non-linear] time complexity)
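A minimal Python sketch of the CYK table construction follows. The grammar is an assumption made for illustration: a Chomsky Normal Form grammar for {aⁿbⁿ | n ≥ 1} with rules S → AB | AT, T → SB, A → a, B → b; the empty word is treated as outside the language here and would need separate handling.

```python
# Assumed CNF grammar for { a^n b^n | n >= 1 }.
UNARY = [("A", "a"), ("B", "b")]                                  # rules A -> a
BINARY = [("S", ("A", "B")), ("S", ("A", "T")), ("T", ("S", "B"))]  # rules A -> B C


def cyk(word, unary, binary, start):
    """Decide w in L(G) for a CNF grammar; table[i][l] holds the nonterminals deriving word[i:i+l]."""
    n = len(word)
    if n == 0:
        return False  # assumption: epsilon is not generated
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][1] = {lhs for lhs, terminal in unary if terminal == symbol}
    for length in range(2, n + 1):            # substring length
        for i in range(n - length + 1):       # start position
            for split in range(1, length):    # length of the left part
                left = table[i][split]
                right = table[i + split][length - split]
                for lhs, (first, second) in binary:
                    if first in left and second in right:
                        table[i][length].add(lhs)
    return start in table[0][n]


for w in ["ab", "aabb", "aab", "ba"]:
    print(w, cyk(w, UNARY, BINARY, "S"))
```

The three nested loops over length, start position, and split point account for the cubic running time in |w|, and the table of substrings for the quadratic space.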

Parsing Context-Free Languages: Parsing Context-Free Languages

Goal: exploit the special syntactic structures as present in programming languages (usually: no ambiguities) to devise parsing methods which are based on deterministic pushdown automata with linear space and time complexity

Two approaches:
• Top-down parsing: construction of syntax tree from the root towards the leaves, representation as leftmost derivation
• Bottom-up parsing: construction of syntax tree from the leaves towards the root, representation as (reversed) rightmost derivation

Parsing Context-Free Languages: Leftmost/Rightmost Analysis I

Goal: compact representation of left-/rightmost derivations by index sequences

Definition 5.10 (Leftmost/rightmost analysis)
Let G = ⟨N, Σ, P, S⟩ ∈ CFG_Σ where P = {π_1, ..., π_p}.
• If i ∈ [p], π_i = A → γ, w ∈ Σ*, and α ∈ X*, then we write wAα ⇒_l^i wγα and αAw ⇒_r^i αγw.
• If z = i_1 ... i_n ∈ [p]*, we write α ⇒_l^z β if there exist α_0, ..., α_n ∈ X* such that α_0 = α, α_n = β, and α_{j−1} ⇒_l^{i_j} α_j for every j ∈ [n] (analogously for ⇒_r^z).
• An index sequence z ∈ [p]* is called a leftmost analysis (rightmost analysis) of α if S ⇒_l^z α (S ⇒_r^z α), respectively.

Parsing Context-Free Languages: Leftmost/Rightmost Analysis II

Example 5.11
Grammar for arithmetic expressions:
  G_AE:  E → E+T | T       (1, 2)
         T → T*F | F       (3, 4)
         F → (E) | a | b   (5, 6, 7)

Leftmost derivation of (a)*b:
  E ⇒_l^2 T ⇒_l^3 T*F ⇒_l^4 F*F ⇒_l^5 (E)*F ⇒_l^2 (T)*F ⇒_l^4 (F)*F ⇒_l^6 (a)*F ⇒_l^7 (a)*b
  ⟹ leftmost analysis: 23452467

Rightmost derivation of (a)*b:
  E ⇒_r^2 T ⇒_r^3 T*F ⇒_r^7 T*b ⇒_r^4 F*b ⇒_r^5 (E)*b ⇒_r^2 (T)*b ⇒_r^4 (F)*b ⇒_r^6 (a)*b
  ⟹ rightmost analysis: 23745246
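The two analyses of Example 5.11 can be replayed mechanically. The Python sketch below assumes that every symbol of G_AE is a single character and that production indices are single digits, which holds for this grammar but not in general; the function name replay is illustrative.

```python
# Productions of G_AE, indexed as in Example 5.11.
PRODUCTIONS = {
    1: ("E", "E+T"), 2: ("E", "T"),
    3: ("T", "T*F"), 4: ("T", "F"),
    5: ("F", "(E)"), 6: ("F", "a"), 7: ("F", "b"),
}
NONTERMINALS = set("ETF")


def replay(analysis, leftmost=True, start="E"):
    """Apply the productions named by the index sequence and return all sentences visited."""
    form = start
    forms = [form]
    for digit in analysis:
        lhs, rhs = PRODUCTIONS[int(digit)]
        positions = [i for i, symbol in enumerate(form) if symbol in NONTERMINALS]
        if not positions:
            raise ValueError(f"no nonterminal left in {form!r}")
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != lhs:
            raise ValueError(f"production {digit} does not apply to {form[pos]!r} in {form!r}")
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(form)
    return forms


print(replay("23452467", leftmost=True))
print(replay("23745246", leftmost=False))
```

Both calls end in (a)*b. Swapping the two index sequences raises a ValueError, which shows that an index sequence is only meaningful together with the derivation direction.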
