Compiler Design Spring 2018 3.5 Limitations of context-free - - PowerPoint PPT Presentation

SLIDE 1

Compiler Design

Spring 2018

3.5 Limitations of context-free grammars
4.0 Semantic analysis

Thomas R. Gross Computer Science Department ETH Zurich, Switzerland

SLIDE 2

Context-free grammars

§ Efficient parsers exist for context-free languages
§ Should we look at other language classes?
  § Context-sensitive grammars
  § Unrestricted grammars

§ Grammars are about checking properties

SLIDE 3

Compiler structure

§ Parser builds parse tree (concrete syntax tree)
  § Checks input for compliance with language spec

§ Parse tree can be turned into abstract syntax tree (AST)
  § Remove unnecessary detail
  § Most details related to grammar symbols are not critical

§ From parse tree / AST to code generation
  § We did this (in part) in Homework 1
  § (Will do more in later homework)

SLIDE 4

Extra step: Error detection

§ Parse tree construction
  § Parser finds some kinds of errors (“syntax errors”) but not all
  § Efficient parsing algorithms known for Type-2 (context-free) grammars

§ Some kinds of errors can be detected only at runtime

§ Not always desirable: find errors with the parser
  § Limitations of context-free grammars

SLIDE 5

A useful property: Variables declared

§ Consider a language like Java(Li)
§ The spec requires that all variables used have been declared:

    int x;
    x = x + 1;

SLIDE 6

Using parsing to check property

§ How could we express this property so that it can be checked by the parser?

§ A parse tree is constructed only for those programs that maintain this property (variables declared before use)
§ Otherwise an error is signaled

§ Can we find a language LJ to model this property?
§ Then we can think about a grammar GJ such that L(GJ) = LJ

SLIDE 7

LJ

§ $L_J = \{\, \alpha\, c\, \alpha \mid \alpha \in \{a, b\}^* \,\}$

§ Terminals: a, b, c

§ Example words from LJ
  § aacaa
  § abcab
  § aabacaaba

§ Not in LJ
  § ca
  § acb

§ How does LJ relate to our problem?

SLIDE 8

    void fct1() { int x; { x = x + 1; } }
    void fct2() { int x; { x = y + 1; } }   // error: y not declared

§ Could use $L_J = \{\, \alpha\, c\, \alpha\, d \mid \alpha \in \{a, b\}^* \,\}$

SLIDE 9

LJ

§ LJ allows us to model the following constraint: any variable that appears in the program/function/method has been declared previously

§ Terminal c defines a separation between the “body” of a unit and the definition block.

§ Useful property to check before code generation
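Although no context-free grammar generates LJ (see the theorem on the next slide), a program decides membership in a few lines. A minimal Python sketch; the function name `in_lj` is hypothetical. It splits the word at its single separator `c` and compares the declaration part with the body part.

```python
def in_lj(word: str) -> bool:
    """Return True iff word = alpha c alpha for some alpha in {a, b}*."""
    # Exactly one separator c; every other symbol must be a or b.
    if word.count("c") != 1 or not set(word) <= {"a", "b", "c"}:
        return False
    declarations, body = word.split("c")
    # The body must repeat the declaration part exactly.
    return declarations == body


for w in ["aacaa", "abcab", "aabacaaba", "ca", "acb"]:
    print(w, in_lj(w))
```

The easy check by a program versus the impossibility for a context-free grammar is exactly the gap the following slides exploit.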

SLIDE 10

LJ

§ LJ allows us to model the following constraint: any variable that appears in the program/function/method has been declared previously
§ Bad news (theorem): there exists no context-free grammar G such that LJ = L(G)
§ Proof:

SLIDE 11

Another useful property: Matching parameters

§ Consider a language like Java(Li)
§ The spec requires that for all methods/functions, the number of formal parameters (at the place of method definition) matches the number of actual parameters (at the call site):

    int fct (int a, float b, xref c) { … }
    x = fct(a, b, c);

SLIDE 12

Another useful property

§ How could we express this property so that it can be checked by the parser?

§ A parse tree is constructed only for those programs that maintain this property (actuals and formals match)
§ Otherwise an error is signaled

§ Can we find a language LP to model this property?
§ Then we can think about a grammar GP such that L(GP) = LP

SLIDE 13

LP

§ $L_P = \{\, a^n b^m c^n d^m \mid n, m \ge 1 \,\}$

§ a, b, c, d: terminals
§ Integers n, m ≥ 1

§ Example words from LP
  § aabccd
  § aaabbcccdd
  § abbbbcdddd

§ Not in LP
  § aabcd

§ Why would we care about LP?

SLIDE 14

LP

§ LP allows us to model the following constraint: for all methods/functions, the number of formal parameters (at the place of method definition) matches the number of actual parameters (at the call site)

§ Can be extended to deal with matching types
  § Tricky if type conversions are an option

§ Useful property to check before code generation
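As with LJ, a short program decides LP even though no context-free grammar does. In this minimal sketch (the name `in_lp` is hypothetical), a regular expression checks the shape a+ b+ c+ d+, and comparing run lengths enforces the crossed counts #a = #c and #b = #d.

```python
import re


def in_lp(word: str) -> bool:
    """Return True iff word = a^n b^m c^n d^m for some n, m >= 1."""
    # Shape check: one or more a's, then b's, then c's, then d's.
    match = re.fullmatch(r"(a+)(b+)(c+)(d+)", word)
    if match is None:
        return False
    na, nb, nc, nd = (len(group) for group in match.groups())
    # Counting check: the crossed dependencies #a = #c and #b = #d.
    return na == nc and nb == nd


for w in ["aabccd", "aaabbcccdd", "abbbbcdddd", "aabcd"]:
    print(w, in_lp(w))
```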

SLIDE 15

LP

§ LP allows us to model the following constraint: for all methods/functions, the number of formal parameters (at the place of method definition) matches the number of actual parameters (at the call site)
§ Bad news (theorem): there exists no context-free grammar G such that LP = L(G)
§ Proof:

SLIDE 16

Comments

§ Context-free grammars cannot express all desirable constraints

§ Switching to context-sensitive grammars is not productive
  § Use an “unrestricted grammar” instead…

§ Use a program to perform additional checks
  § Complete flexibility
  § Can be (and often is) an additional step in the compiler
    § After parsing
    § Before code generation

§ Recall: some checks must wait until run time
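As an illustration of such an additional step, here is a minimal sketch of the parameter-count check (the LP property) run after parsing. It assumes, hypothetically, that the parser has already collected each method's formal parameters and each call site's actual arguments; the names `check_arity`, `formals`, and `calls`, and the second call `fct(a, b)`, are invented for this example.

```python
def check_arity(formals: dict[str, list[str]],
                calls: list[tuple[str, list[str]]]) -> list[str]:
    """Compare the number of actual and formal parameters for every call site."""
    errors = []
    for name, actuals in calls:
        if name not in formals:
            errors.append(f"call to undeclared method {name}")
        elif len(actuals) != len(formals[name]):
            errors.append(f"{name}: expected {len(formals[name])} argument(s), "
                          f"got {len(actuals)}")
    return errors


# int fct (int a, float b, xref c) { … }  with calls  fct(a, b, c)  and  fct(a, b)
formals = {"fct": ["a", "b", "c"]}
calls = [("fct", ["a", "b", "c"]), ("fct", ["a", "b"])]
print(check_arity(formals, calls))   # ['fct: expected 3 argument(s), got 2']
```

A program can report every mismatch with a precise message, which a grammar-based rejection could not do.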

SLIDE 17

More comments

§ Note: Parsing also used in (natural) language processing

§ No (complete) (context-free) grammar exists for English, German, …
  § More expensive approaches are needed
§ Ambiguity is part of reality
  § May need to obtain (multiple, all) parse trees

§ “The food is here!” vs. “The food is here?”

§ Interesting topic but not part of this class

SLIDE 18

4.0 Semantic analysis

§ Idea: before proceeding to code generation, the compiler checks program properties

§ Early feedback (while source information is still available)
§ Avoid subsequent complications

SLIDE 19

Semantic analysis

§ Idea: before proceeding to code generation, the compiler checks program properties
§ Also the time to transform the program

§ Often done at the time the parse tree is transformed into an AST
§ Example transformations
  § Type casts
  § Add default parameters to method/function calls
  § Construct initializer

SLIDE 20

4.1 Syntax-directed translation

§ Parsing: control table M decides which production to use
§ So far: recorded production (as “action”)
§ General: attach code to production
  § E.g., add node to syntax tree
  § E.g., keep track of definitions

§ As the parser recognizes a word
  § It produces an AST (or other desired data structure)
  § And/or computes a predicate
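A minimal sketch of attaching code to productions, assuming a toy grammar E → T E', E' → + T E' | ε, T → num (not taken from these slides). For brevity it uses a recursive-descent parser instead of a table-driven one: the choice of branch in each procedure plays the role of the control table M, and each production's action builds an AST node as a nested tuple. The parameter `left` of the hypothetical `parse_e_rest` carries the tree built so far down into the subtree, an example of an inherited attribute.

```python
def parse(tokens: list[str]):
    """Parse tokens per E -> T E', E' -> + T E' | eps, T -> num; return an AST."""
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else "$"   # "$" marks end of input

    def parse_t():
        # T -> num    action: create a leaf node
        nonlocal pos
        tok = peek()
        if not tok.isdigit():
            raise SyntaxError(f"number expected, got {tok!r}")
        pos += 1
        return ("num", int(tok))

    def parse_e_rest(left):
        # `left` is the tree built so far (an inherited attribute)
        nonlocal pos
        if peek() == "+":
            # E' -> + T E'    action: add a '+' node over `left`
            pos += 1
            return parse_e_rest(("+", left, parse_t()))
        # E' -> eps    action: hand the finished tree back up
        return left

    def parse_e():
        # E -> T E'
        return parse_e_rest(parse_t())

    tree = parse_e()
    if peek() != "$":
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(["1", "+", "2", "+", "3"]))
# ('+', ('+', ('num', 1), ('num', 2)), ('num', 3))
```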

SLIDE 21

Attribute grammars

§ Context free grammar extended with (context-sensitive) information

§ “Attributes”
  § Attached to non-terminals

§ Attributes have values
  § Value assigned during parsing
  § Value evaluated in a conditional statement (see later)

SLIDE 22

Attribute grammars

§ Types of attributes

1. Synthesized attributes

§ Value obtained from attributes of children of non-terminal

2. Inherited attributes

§ Value obtained from attribute of parent of non-terminal
§ Or from attribute(s) of sibling(s) of non-terminal

SLIDE 23

Example

§ Example (expression evaluation)

§ E → E + T

§ Production: E0 → E1 + T
§ Attribute
  § Integer value
  § E0.Value := E1.Value + T.Value

§ Note: use E1 vs. E0 to distinguish the two occurrences of E in the production
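The rule E0.Value := E1.Value + T.Value defines a synthesized attribute: a node's value comes from its children's values, so a post-order walk evaluates it. A minimal Python sketch with hypothetical node classes `Num` (for an assumed production T → num) and `Add` (for E0 → E1 + T):

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    """Leaf for T -> num (assumed production)."""
    value: int


@dataclass
class Add:
    """Node for E0 -> E1 + T."""
    left: "Expr"   # E1
    right: Num     # T


Expr = Union[Num, Add]


def value(node: Expr) -> int:
    """Evaluate the synthesized attribute Value bottom-up (post-order)."""
    if isinstance(node, Num):
        return node.value
    # E0.Value := E1.Value + T.Value: children first, then the parent
    return value(node.left) + value(node.right)


tree = Add(Add(Num(1), Num(2)), Num(3))   # 1 + 2 + 3
print(value(tree))  # 6
```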

SLIDE 24

Attributes

§ Consider $L = \{\, a^n b^n c^n \mid n \ge 1 \,\}$

§ Terminals: a, b, c
§ n integer ≥ 1

§ L cannot be produced by a context-free grammar
§ We would like to use a context-free grammar (and parser) to recognize L

§ Idea: use attributes to deal with aspects the parser cannot handle
  § Attribute domain: integers
  § Result predicate: “true” if $w = a^k b^k c^k$ for some k

SLIDE 25

Example (cont’d)

§ Consider G19

    S → A B C
    A → a A | a
    B → b B | b
    C → c C | c

§ Start symbol is S
§ $L = \{\, a^n b^n c^n \,\} \subset L(G_{19})$

SLIDE 26

Rules

§ Attach a rule to each production
§ Rules for A productions

    A0 → a A1     <A0>.Na := <A1>.Na + 1
    A → a         <A>.Na := 1

§ Rules for B, C productions similar
§ Condition for S → A B C
  § <A>.Na == <B>.Nb == <C>.Nc

SLIDE 27

Rules

    Productions     Rules
    S → A B C       if and only if <A>.Na == <B>.Nb == <C>.Nc
    A0 → a A1       <A0>.Na := <A1>.Na + 1
    A → a           <A>.Na := 1
    B0 → b B1       <B0>.Nb := <B1>.Nb + 1
    B → b           <B>.Nb := 1
    C0 → c C1       <C0>.Nc := <C1>.Nc + 1
    C → c           <C>.Nc := 1
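The rules above can be sketched as a small recognizer that parses with G19 and evaluates the count attributes. Because the A, B, and C productions have the same shape, one helper (hypothetical name `count_run`) implements X0 → t X1 and X → t for each terminal t. Using a recursive-descent recognizer here is our choice for brevity, not part of the slides.

```python
def accepts(w: str) -> bool:
    """Parse w with G19 and check the condition on S -> A B C."""
    pos = 0

    def count_run(t: str) -> int:
        # X0 -> t X1 : <X0>.N := <X1>.N + 1
        # X  -> t    : <X>.N  := 1
        nonlocal pos
        if pos >= len(w) or w[pos] != t:
            raise SyntaxError(f"expected {t!r} at position {pos}")
        pos += 1
        if pos < len(w) and w[pos] == t:   # another t follows: use X0 -> t X1
            return count_run(t) + 1
        return 1                           # otherwise: X -> t

    try:
        na = count_run("a")   # attribute <A>.Na
        nb = count_run("b")   # attribute <B>.Nb
        nc = count_run("c")   # attribute <C>.Nc
    except SyntaxError:
        return False          # w is not even in L(G19)
    if pos != len(w):
        return False          # leftover input
    # Condition attached to S -> A B C
    return na == nb == nc


for w in ["aabbcc", "aabbc", "abcc", "abc"]:
    print(w, accepts(w))
```

The parser alone accepts all of L(G19); only the attribute condition narrows the result to L.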

SLIDE 28

aabbcc

    Stack    Input      Action
    $        aabbcc$
    a$       abbcc$
    aa$      bbcc$      A → a; <A>.Na := 1
    Aa$      bbcc$      A0 → a A1; <A0>.Na := <A1>.Na + 1 = 2
    A$       bbcc$
    bA$      bcc$
    bbA$     cc$        B → b; <B>.Nb := 1
    BbA$     cc$        B0 → b B1; <B0>.Nb := <B1>.Nb + 1 = 2
    BA$      cc$

SLIDE 29

aabbcc

    Stack    Input    Action
    BA$      cc$
    cBA$     c$
    ccBA$    $        C → c; <C>.Nc := 1
    CcBA$    $        C0 → c C1; <C0>.Nc := <C1>.Nc + 1 = 2
    CBA$     $        S → A B C; Na == Nb == Nc? True
    S$       $        ACCEPT

SLIDE 30

aabbcc – tree view

    S                 condition: true
    ├── A             Na = 2
    │   ├── a
    │   └── A         Na = 1
    │       └── a
    ├── B             Nb = 2
    │   ├── b
    │   └── B         Nb = 1
    │       └── b
    └── C             Nc = 2
        ├── c
        └── C         Nc = 1
            └── c

SLIDE 31

Question

What type of parser (top-down or bottom-up) did we use to parse w (and to implement the checks)? Why? (Hint: Top-of-stack arbitrarily picked to be on the left, that is, position of top-of-stack does not convey any information.)

SLIDE 32

Syntax(-based) analysis

§ Powerful tool
§ Easy to get carried away
§ Once a topic of active research

SLIDE 33

Semantic analysis

§ Goal: Identify problems early on

    float f;
    int [] iarray;
    int j;
    iarray = new int [10];
    iarray [f] = j;

§ Idea: check AST
  § Either report error
  § Or modify AST:

        int j;
        float f;
        j = f;    // replace with: j = round(f)
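A minimal sketch of the two options (report or modify), assuming hypothetical node classes `Var`, `Call`, and `Assign` and a symbol map from names to type names. Which conversions a language permits is a design decision; this sketch assumes only the int := float case from the slide is repaired, and every other mismatch is reported as an error.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str


@dataclass
class Call:
    func: str
    arg: object


@dataclass
class Assign:
    target: Var
    expr: object


def type_of(expr: object, symbols: dict[str, str]) -> str:
    if isinstance(expr, Var):
        return symbols[expr.name]
    if isinstance(expr, Call) and expr.func == "round":
        return "int"   # assumption: round(float) yields int
    raise TypeError(f"unsupported expression {expr!r}")


def check_assign(stmt: Assign, symbols: dict[str, str]) -> Assign:
    """Return the (possibly rewritten) assignment, or raise on a type error."""
    lhs = symbols[stmt.target.name]
    rhs = type_of(stmt.expr, symbols)
    if lhs == rhs:
        return stmt
    if lhs == "int" and rhs == "float":
        # Modify AST: j = f  becomes  j = round(f)
        return Assign(stmt.target, Call("round", stmt.expr))
    # Otherwise: report error
    raise TypeError(f"cannot assign {rhs} to {lhs} variable {stmt.target.name}")


symbols = {"j": "int", "f": "float"}
print(check_assign(Assign(Var("j"), Var("f")), symbols))
# Assign(target=Var(name='j'), expr=Call(func='round', arg=Var(name='f')))
```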

SLIDE 34

4.2 Symbol table

§ Symbol table: central repository of information about program symbols
§ Checks must exploit structure of program

SLIDE 35

Symbol table

§ Many checks require gathering/retrieving information about symbols

§ Function/method names
§ Class names
§ Variable/field names
§ Function/method types
§ Class types
§ Variable/field types

SLIDE 36

Symbol table interface

§ What is
  § Type(Symbol X)
  § Defined(Symbol X)
  § Kindof(Symbol X)
  § …

§ JavaLi programs consist of classes
  § With methods
  § Variables/fields can be declared in both contexts

§ Symbol table must mirror the structure of the program
§ Retrieval (significantly) more frequent than insertion

§ Should we support deletion?

SLIDE 37

Symbol table structure

§ Nesting can hide symbols

§ E.g., name conflict between method-local symbol and class field

§ No nesting of method or class definitions
§ One symbol table for each class

SLIDE 38

Example program

    class A {
      int a, n;
      int foo (int j) {
        int k, n;
        a = j * 2;
        return n + 1;
      }
    }
    class B {
      int a, j;
      void bar (int j) { … }
    }
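One way to realize the query interface and the hiding rule is one table per class and per method. Each method table links to its class table, and lookups search the innermost table first. A minimal Python sketch populated with the declarations from the example program; the class `SymbolTable`, its method names, and the kind labels are hypothetical.

```python
from typing import Optional


class SymbolTable:
    """One table per class or method; `parent` links a method table to its class table."""

    def __init__(self, name: str, parent: Optional["SymbolTable"] = None):
        self.name = name
        self.parent = parent
        self.entries: dict[str, tuple[str, str]] = {}   # symbol -> (kind, type)

    def enter(self, symbol: str, kind: str, typ: str) -> None:
        if symbol in self.entries:
            raise NameError(f"{symbol} already declared in {self.name}")
        self.entries[symbol] = (kind, typ)

    def lookup(self, symbol: str) -> Optional[tuple[str, str]]:
        # Search the innermost table first, so a local n hides the field n.
        table = self
        while table is not None:
            if symbol in table.entries:
                return table.entries[symbol]
            table = table.parent
        return None

    def defined(self, symbol: str) -> bool:
        return self.lookup(symbol) is not None

    def kind_of(self, symbol: str) -> str:
        return self._get(symbol)[0]

    def type_of(self, symbol: str) -> str:
        return self._get(symbol)[1]

    def _get(self, symbol: str) -> tuple[str, str]:
        entry = self.lookup(symbol)
        if entry is None:
            raise NameError(f"{symbol} is not declared")
        return entry


# Declarations from the example program
class_a = SymbolTable("class A")
for sym, kind, typ in [("a", "field", "int"), ("n", "field", "int"), ("foo", "method", "int")]:
    class_a.enter(sym, kind, typ)
foo = SymbolTable("method A::foo", parent=class_a)
for sym, kind, typ in [("j", "param", "int"), ("k", "local", "int"), ("n", "local", "int")]:
    foo.enter(sym, kind, typ)

class_b = SymbolTable("class B")
for sym, kind, typ in [("a", "field", "int"), ("j", "field", "int"), ("bar", "method", "void")]:
    class_b.enter(sym, kind, typ)
bar = SymbolTable("method B::bar", parent=class_b)
bar.enter("j", "param", "int")

print(foo.kind_of("n"), foo.kind_of("a"))        # local field
print(bar.kind_of("j"), class_b.kind_of("j"))    # param field
print(foo.defined("x"))                          # False
```

Retrieval walks at most two tables here, since classes and methods do not nest.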

SLIDE 39

One possible setup

[Figure: four symbol tables, each with columns Name and Type: class A, class B, method A::foo, method B::bar]

SLIDE 40

Symbol table

§ Should record more than name and type
§ Information recorded at the class level
  § Field names
    § Size (bytes)
    § Location (offset from “this” reference)
    § Alignment constraints
  § Method names
    § Location of code
  § Base class
  § Constants (if supported by language)
    § Location: absolute in a segment
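To illustrate how field offsets from “this” follow from sizes and alignment constraints, here is a minimal sketch. The sizes and alignments in `SIZE_ALIGN` are assumptions for a hypothetical 64-bit target, not values from any JavaLi specification. The parameter `start` lets the layout begin after, say, base-class fields; the function name `layout_fields` is also hypothetical.

```python
# Assumed (size, alignment) in bytes for a hypothetical 64-bit target.
SIZE_ALIGN = {"boolean": (1, 1), "int": (4, 4), "float": (4, 4), "ref": (8, 8)}


def layout_fields(fields: list[tuple[str, str]],
                  start: int = 0) -> tuple[dict[str, int], int]:
    """Assign each field an offset from 'this', respecting alignment.

    Returns (name -> offset, total size rounded up to the largest field alignment).
    """
    offsets: dict[str, int] = {}
    offset = start
    max_align = 1
    for name, typ in fields:
        size, align = SIZE_ALIGN[typ]
        offset = (offset + align - 1) // align * align   # round up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


print(layout_fields([("flag", "boolean"), ("a", "int"), ("next", "ref")]))
# ({'flag': 0, 'a': 4, 'next': 8}, 16)
```

The 3 bytes of padding after `flag` come from aligning `a` to a multiple of 4; reordering fields by decreasing alignment would avoid such gaps.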

SLIDE 41

Symbol table (cont’d)

§ Information recorded at the method level

§ Number and types of formal parameters
§ Local variables and parameters
  § Size in bytes
  § Alignment constraints
  § Location (offset from frame pointer)
  § Storage class: register or special resources

§ Debugging information
  § Line number/byte offset for definition

SLIDE 42

Symbol table construction

§ Walk over AST
  § Process class/method/variable definitions
  § Create symbol tables as appropriate
  § Enter information, check for duplicates
§ Can be done after AST is constructed or during AST construction
  § Rules for attribute grammar
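The construction walk can be sketched as follows, assuming hypothetical AST classes `ClassDecl` and `MethodDecl` built after parsing. Tables are plain dictionaries. Following the table layout used later in these slides, fields and methods share one namespace per class (Java itself keeps them separate). Duplicates are collected as error messages rather than aborting, and a global table maps class names to their class tables.

```python
from dataclasses import dataclass


@dataclass
class MethodDecl:
    name: str
    ret: str
    params: list[tuple[str, str]]       # (name, type)
    local_vars: list[tuple[str, str]]   # (name, type)


@dataclass
class ClassDecl:
    name: str
    fields: list[tuple[str, str]]
    methods: list[MethodDecl]


def declare(table: dict, name: str, entry: tuple, where: str, errors: list[str]) -> None:
    """Enter name into table, or record a duplicate-definition error."""
    if name in table:
        errors.append(f"duplicate definition of {name} in {where}")
    else:
        table[name] = entry


def build_tables(classes: list[ClassDecl]) -> tuple[dict, list[str]]:
    """Walk the (already built) AST and create global, class and method tables."""
    errors: list[str] = []
    global_table: dict = {}
    for cls in classes:
        class_table: dict = {}
        declare(global_table, cls.name, ("class", class_table), "global scope", errors)
        for fname, ftype in cls.fields:
            declare(class_table, fname, ("field", ftype), f"class {cls.name}", errors)
        for m in cls.methods:
            method_table: dict = {}
            declare(class_table, m.name, ("method", m.ret, method_table),
                    f"class {cls.name}", errors)
            where = f"method {cls.name}::{m.name}"
            for pname, ptype in m.params:
                declare(method_table, pname, ("param", ptype), where, errors)
            for vname, vtype in m.local_vars:
                declare(method_table, vname, ("local", vtype), where, errors)
    return global_table, errors


program = [
    ClassDecl("A", [("a", "int"), ("n", "int")],
              [MethodDecl("foo", "int", [("j", "int")], [("k", "int"), ("n", "int")])]),
    ClassDecl("B", [("a", "int"), ("j", "int")],
              [MethodDecl("bar", "void", [("j", "int")], [])]),
]
tables, errors = build_tables(program)
print(sorted(tables), errors)     # ['A', 'B'] []

_, errors = build_tables(program + [ClassDecl("A", [], [])])
print(errors)                     # ['duplicate definition of A in global scope']
```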

SLIDE 43

Program structure

§ Combine symbol tables for all classes

§ Global Symbol Table

§ Check for uniqueness of class definition

SLIDE 44

Symbol table -- other issues

§ User-defined constants
§ Pre-defined constants or symbols
  § Boolean “true”, “false”
  § “this” reference

§ Can go into method symbol tables
§ Can go into class symbol tables

SLIDE 45

More details

Global Symbol Table
    Name    Type
    A       class
    B       class
    true    boolean constant

Symbol table class A
    Name    Type
    a       int
    n       int
    foo     int func

Symbol table class B
    Name    Type
    a       int
    j       int
    bar     void func