The Structure of a Compiler - PowerPoint PPT Presentation (11/8/2012)
Compiler Design and Construction Semantic Analysis

Slides modified from Louden Book, Dr. Scherger, & Y Chung (NTHU), and Fischer, Leblanc


The Structure of a Compiler (1)

 Any compiler must perform two major tasks
 Analysis of the source program
 Synthesis of a machine-language program

[Diagram: Compiler = Analysis + Synthesis]

The Structure of a Compiler (2)

[Diagram: Source Program (character stream) → Scanner → Tokens → Parser → Syntactic Structure → Semantic Routines → Intermediate Representation → Optimizer → Code Generator → Target machine code. Symbol and Attribute Tables are used by all phases of the compiler.]

Compiler Stages

[Diagram: Source Code → Scanner → Tokens → Parser → Syntax Tree → Semantic Analyzer → Annotated Tree → Source Code Optimizer → Intermediate Code → Code Generator → Target Code → Target Code Optimizer → Target Code. The Literal Table, Symbol Table, and Error Handler are shared by all phases.]

Semantic Processing

April, 2011 Chapter 6:Semantic Analysis 5

 Semantic routines interpret meaning based on syntactic

structure of input (modern compilers do this)

 This makes the compilation syntax-directed

 Semantic routines finish the analysis

 Verify static semantics are followed

 Variables declared, compatible operands (type and #), etc.

 Semantic routines also start the synthesis

 Generate either IR or target machine code

 The semantic actions are attached to the productions (or to subtrees of a syntax tree).

Abstract Syntax Tree

 1st step in semantic processing is to build a syntax tree

representing input program

 Don't need a literal parse tree
 Intermediate nodes for precedence and associativity, and ε-rules, can be omitted
 Just enough info to drive semantic processing
 Or even to recreate the input
 Semantic processing performed by traversing the tree one or more times

 Attributes attached to nodes aid semantic processing


7.1.1 Using a Syntax Tree Representation of a Parse (1)

  • Parsing: build the parse tree
  • Non-terminals for operator precedence and associativity are included.
  • Semantic processing: build and decorate the Abstract Syntax Tree (AST)
  • Non-terminals used only for ease of parsing may be omitted in the abstract syntax tree.


parse tree vs. abstract syntax tree

[Figure: a parse tree for an assignment id := id + id * const, with <assign>, <target>, <exp>, <term>, and <factor> nodes; beside it, the much smaller abstract syntax tree with := at the root, the target id on the left, and + over * on the right.]

Abstract Syntax Tree

[Tree: := at the root; left child Id; right child + with children * (over Const and Id) and Id.]

 Abstract syntax tree for Y := 3*X + I

Abstract Syntax Tree

[Tree: := at the root; left child Id(Y); right child + with children * (over Const(3) and Id(X)) and Id(I).]

 Abstract syntax tree for Y := 3*X + I with initial values

Abstract Syntax Tree

 Initially, attributes only at leaves
 Attributes propagate during the static semantic checking
  Processing declarations to build symbol table
  Find symbols in ST to get attributes to attach
  Determining expression/operand types
 Declarations propagate top-down
 Expressions propagate bottom-up
 A tree is decorated after sufficient info for code generation has propagated.

Abstract Syntax Tree

[Tree: :=(itof) at the root; left child Id(Y) of type f; right child +(i) with children *(i) (over Const(3,i) and Id(X,i)) and Id(I,i).]

 Abstract syntax tree for Y := 3*X + I with propagated values

7.1.1 Using a Syntax Tree Representation of a Parse (2)

  • Semantic routines traverse (post-order) the AST, computing attributes of the nodes of the AST.
  • Initially, only leaves (i.e. terminals, e.g. const, id) have attributes
  • Ex. Y := 3*X + I

[Tree: := at the root; left child id(Y); right child + with children * (over const(3) and id(X)) and id(I).]


7.1.1 Using a Syntax Tree Representation of a Parse (3)

  • The attributes are then propagated to other nodes using some functions, e.g.
  • build symbol table
  • attach attributes to nodes
  • check types, etc.
  • bottom-up / top-down propagation

[Figure: a <program> node whose declaration subtree feeds the symbol table, and a <stmt> subtree := (id, + (* (id, const), id)); expression types propagate bottom-up while checking, e.g., whether * is an integer or floating multiply. The symbol table must be consulted for the types of the id's.]

7.1.1 Using a Syntax Tree Representation of a Parse (4)

  • After attribute propagation is done, the tree is decorated and ready for code generation; another pass over the decorated AST generates code.
  • Actually, these can be combined in a single pass:
  • Build the AST
  • Decorate the AST
  • Generate the target code
  • What we have described is essentially Attribute Grammars (AG) (details in chap. 14)

Static Semantic Checks

 Static semantics can be checked at compile time
 Check only propagated attributes
 Type compatibility across assignment
  int B;
  B := 5.2;   (illegal)
  B := 3;     (legal)
 Use attributes and structure
 Correct number and types of parameters
  procedure foo(int a, float b, int c, float d);
  int C; float D;
  call foo(C, D, 3, 2.9)     (legal)
  call foo(C, D, 3.3, 2.9)   (illegal)
  call foo(1, 2, 3, 4, 5)    (illegal)
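The parameter checks above can be sketched as a small routine. This is an illustrative sketch, not from the slides; the Type enum and the name checkCall are invented for the example, and implicit conversions are ignored for simplicity.

```c
typedef enum { T_INT, T_FLOAT } Type;

/* Check a call site against a declared parameter list: the number
   of arguments must match, and each argument type must equal the
   declared type (no implicit conversions in this sketch). */
int checkCall(const Type *params, int nparams,
              const Type *args, int nargs)
{
    if (nparams != nargs) return 0;          /* wrong arity: illegal */
    for (int i = 0; i < nargs; ++i)
        if (params[i] != args[i]) return 0;  /* type mismatch: illegal */
    return 1;                                /* legal */
}
```

For the foo example, the call with arguments (int, float, int, float) passes, while a float third argument or a fifth argument fails.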

Dynamic Semantic Checks

 Some checks can’t be done at compile time

 Array bounds, arithmetic errors, valid addresses of pointers,

variables initialized before use.

 Some languages allow explicit dynamic semantic checks

 e.g. assert denominator ≠ 0

 These are handled by the semantic routines inserting code to check for these semantics

 Violating dynamic semantics results in exceptions

Translation

 Translation task uses attributes as data, but it is driven by

the structure

 Translation output can be several forms

 Machine code  Intermediate representation

 Decorated tree itself  Sent to optimizer or code generator

Compiler Organization

 one-pass compiler

 Single pass used for both analysis and synthesis

 Scanning, parsing, checking, & translation all interleaved,  No explicit IR generated

 Semantic routines must generate machine code
 Only simple optimizations can be performed
 Tends to be less portable


7.1.2 Compiler Organization Alternatives (2)

  • We prefer that the code generator completely hide machine details and that the semantic routines be independent of the machine.
  • This can be violated to produce better code.
  • Suppose there are several classes of registers, each for a different purpose.
  • It is better for register allocation to be performed by the semantic routines than by the code generator, since the semantic routines have a broader view of the AST.

Compiler Organization

 one-pass with peephole optimization

 Optimizer makes a pass over generated machine code, looking

at a small number of instructions at a time

 Allows for simple code generation

 Peephole: looking at only a few instructions at a time

 Effectively a separate pass
 Simple but effective
 Simplifies the code generator, since there is a pass of post-processing.

Compiler Organization

 one-pass analysis and IR synthesis plus a code generation

pass

 Adds flexibility  Explicit IR created & sent to code generator  IR typically simple  Optimization can examine as much of IR as wanted  Less machine-dependent analysis

 So easier to retarget

Compiler Organization

 Multi-pass analysis
  Scan, then parse, then check declarations, then static semantics
  Usually used to save space (memory usage of the compiler)
 Multi-pass synthesis
  Separate out machine dependence
  Better optimization
  Generate IR
  Do machine-independent optimization
  Generate machine code
  Machine-dependent optimization
 Many complicated optimization and code generation algorithms require multiple passes
  e.g. optimizations that need a more global view:
    for i = 1 to N
      foo = 35*bar(i) + 16;
    bar(i) { return 3; }

7.1.2 Compiler Organization Alternatives (7)

  • Multi-language and multi-target compilers
  • Components may be shared and parameterized.
  • Ex: Ada uses Diana (language-dependent IR)
  • Ex: GCC uses two IRs.
  • one is high-level tree-oriented
  • the other (RTL) is more machine-oriented

[Figure: front ends for FORTRAN, PASCAL, ADA, C, ... feed language- and machine-independent IRs and a machine-independent optimizer, which feed back ends for SUN, PC, main-frame, ...]

7.1.3 Single Pass (1)

  • In Micro of chap. 2, scanning, parsing and semantic processing are interleaved in a single pass.
  • (+) simple front-end
  • (+) less storage if no explicit trees
  • (-) immediately available information is limited since no complete tree is built.
  • Relationships

[Figure: the parser calls the scanner for tokens and calls semantic routines 1..k, which exchange semantic records.]


7.1.3 Single Pass (2)

  • Each terminal and non-terminal has a semantic record.
  • Semantic records may be considered

as the attributes of the terminals and non-terminals.

  • Terminals
  • the semantic records are created by the scanner.
  • Non-terminals
  • the semantic records are created by a semantic routine when a production is recognized.

  • Semantic records are transmitted

among semantic routines via a semantic stack.

  • Ex: A → B C D #SR

  • 1 pass = 1 post-order traversal of the parse tree
  • parsing actions -- build parse trees
  • semantic actions -- post-order traversal

7.1.3 Single Pass (3)

[Figure: parse tree for A := B + 1; a post-order traversal invokes gencode(+, B, 1, tmp1) when <exp> → <exp> + <term> is recognized, and gencode(:=, A, tmp1) when <assign> → ID := <exp> is recognized.]

Fall, 2002 CS 153 - Chapter 6

Chapter 6 - Semantic Analysis

 Parser verifies that a program is

syntactically correct and constructs a syntax tree (or other intermediate representation).

 Semantic analyzer checks that the

program satisfies all other static language requirements (is “meaningful”) and collects and computes information needed for code generation.


Important Semantic Information

 Symbol table: collects declaration

and scope information to satisfy “declaration before use” rule, and to establish data type and other properties of names in a program.

 Data types and type checking:

compute data types for all typed language entities and check that language rules on types are satisfied.


How to build the symbol table and check types:

 Analyze the scope rules for the

language and determine an appropriate table structure for maintaining this information.

 Analyze the type requirements and

translate them into rules that can be applied recursively on a syntax tree.


Theoretical framework for semantic analysis

 Focus on attributes: computable

properties of language constructs that are needed to satisfy language requirements and/or generate code

 Describe the computation of attributes

using equations or algorithms.

 Associate these equations to grammar

rules and/or kinds of nodes in a syntax tree.


 Analyze the structure of the equations to determine an order in which the attributes can be computed. (Tree traversals of the syntax tree: preorder, postorder, inorder, or some combination of them.)


 Such a set of equations as described

is called an attribute grammar.

 While much can be done without a

formal framework, the formality of equations can help the process considerably.

 Nevertheless, there is currently no

tool in standard use that allows this process to be automated (languages differ too much in their requirements).


Example of an attribute grammar

Grammar:

  exp → exp + term | exp - term | term
  term → term * factor | factor
  factor → ( exp ) | number

Attribute Grammar:

  GRAMMAR RULE               SEMANTIC RULES
  exp1 → exp2 + term         exp1.val = exp2.val + term.val
  exp1 → exp2 - term         exp1.val = exp2.val - term.val
  exp → term                 exp.val = term.val
  term1 → term2 * factor     term1.val = term2.val * factor.val
  term → factor              term.val = factor.val
  factor → ( exp )           factor.val = exp.val
  factor → number            factor.val = number.val
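The .val rules above define a purely synthesized attribute, so one post-order pass computes it. A minimal sketch in C; the ENode type and the function name val are illustrative, not from the slides.

```c
/* Hypothetical node for the expression grammar above:
   kind is '+', '-', '*', or 'n' (number leaf). */
typedef struct ENode {
    char kind;
    int num;                     /* valid when kind == 'n' */
    struct ENode *left, *right;  /* valid for operator nodes */
} ENode;

/* Post-order computation of the synthesized attribute val,
   mirroring the semantic rules in the table above. */
int val(const ENode *e)
{
    switch (e->kind) {
    case 'n': return e->num;
    case '+': return val(e->left) + val(e->right);
    case '-': return val(e->left) - val(e->right);
    case '*': return val(e->left) * val(e->right);
    }
    return 0; /* unreachable for well-formed trees */
}
```

Each case corresponds line-for-line to one semantic rule, which is why these rules translate so directly to a yacc-style bottom-up computation.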


Notes:

 Different instances of same

nonterminal must be subscripted to distinguish them.

 Some attributes must have been

precomputed (by scanner or parser), e.g. number.val.

 These particular attribute equations

look a lot like a yacc specification, because they represent a bottom-up attribute computation.


A Second Example

Grammar:

  decl → type var-list
  type → int | float
  var-list → id , var-list | id

Attribute Grammar:

  GRAMMAR RULE                  SEMANTIC RULES
  decl → type var-list          var-list.dtype = type.dtype
  type → int                    type.dtype = integer
  type → float                  type.dtype = real
  var-list1 → id , var-list2    id.dtype = var-list1.dtype
                                var-list2.dtype = var-list1.dtype
  var-list → id                 id.dtype = var-list.dtype


Notes

 Data type typically propagates down

a syntax tree via declarations.

 No longer something yacc can

handle directly.

 Such an attribute is called inherited,

while bottom-up calculation is called synthesized.

 Syntax tree is a standard synthesized

attribute computable by yacc; other attributes computed on the tree.


Dependency graph

 Indicates order in which attributes must

be computed.

 Synthesized attributes always flow from

children to parents, and can always be computed by a postorder traversal.

 Inherited attributes can flow any other

way.

 L-attributed: a left-to-right traversal suffices to compute attributes. However, this may involve a combination of preorder, inorder, and postorder traversal.

Data type dependencies (by grammar rule):

[Figure: for decl → type var-list, dtype is synthesized from type and passed into var-list. For var-list → id , var-list, both id.dtype and var-list2.dtype come from var-list1.dtype.]

L-attributed dependencies have three basic mechanisms:

(a) Inheritance from parent to siblings
(b) Inheritance from sibling to sibling via the parent
(c) Sibling inheritance via sibling pointers

[Each shown as attribute a flowing among a parent A and children B and C.]

Sample tree structure:

typedef enum {decl,type,id} nodekind;
typedef enum {integer,real} typekind;
typedef struct treeNode
{ nodekind kind;
  struct treeNode * lchild, * rchild, * sibling;
  typekind dtype;  /* for type and id nodes */
  char * name;     /* for id nodes only */
} * SyntaxTree;

Sample tree instance:

String: float x, y

[Tree: decl with left child type (dtype = real) and right child id(x); id(x) has sibling id(y).]

Traversal code:

void evalType (SyntaxTree t)
{ switch (t->kind)
  { case decl:
      t->rchild->dtype = t->lchild->dtype;
      evalType(t->rchild);
      break;
    case id:
      if (t->sibling != NULL)
      { t->sibling->dtype = t->dtype;
        evalType(t->sibling);
      }
      break;
  } /* end switch */
} /* end evalType */


Attributes need not be kept in the syntax tree:

GRAMMAR RULE                  SEMANTIC RULES
decl → type var-list
type → int                    dtype = integer
type → float                  dtype = real
var-list1 → id , var-list2    insert(id.name, dtype)
var-list → id                 insert(id.name, dtype)

dtype is global.
Use a symbol table to store the type of each identifier.

New traversal code:

typekind dtype; /* global */

void evalType (SyntaxTree t)
{ switch (t->kind)
  { case decl:
      dtype = t->lchild->dtype;
      evalType(t->rchild);
      break;
    case id:
      insert(t->name,dtype);
      if (t->sibling != NULL) evalType(t->sibling);
      break;
  } /* end switch */
} /* end evalType */

Even better, use a parameter instead of a global variable:

void evalDecl(SyntaxTree t)
{ evalType(t->rchild, t->lchild->dtype);
}

void evalType(SyntaxTree t, typekind dtype)
{ insert(t->name,dtype);
  if (t->sibling != NULL)
    evalType(t->sibling,dtype);
}

Note: inherited attributes can often be turned into parameters to recursive traversal functions, while synthesized attributes can be turned into returned values.
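As a sketch of the synthesized half of that note: a result type computed bottom-up and returned from the traversal function. The TNode type and the name exprType are hypothetical, invented for this illustration; the "real wins over integer" rule is a common convention, not something the slides specify.

```c
typedef enum { integer, real } typekind;

/* Hypothetical expression node: either a typed leaf or an
   arithmetic operator with two children. */
typedef struct TNode {
    int isleaf;
    typekind leaftype;           /* valid for leaves */
    struct TNode *left, *right;  /* valid for operator nodes */
} TNode;

/* Synthesized attribute as a return value: the result type of an
   operator node is real if either operand is real. */
typekind exprType(const TNode *t)
{
    if (t->isleaf) return t->leaftype;
    typekind l = exprType(t->left);
    typekind r = exprType(t->right);
    return (l == real || r == real) ? real : integer;
}
```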


Alternative to a difficult inherited situation (not recommended):

Theorem (Knuth [1968]). Given an attribute grammar, all inherited attributes can be changed into synthesized attributes by suitable modification of the grammar, without changing the language of the grammar.


Example:

New grammar for types:

  decl → var-list id
  var-list → var-list id , | type
  type → int | float

New tree for float x, y might be:

[Tree: dtype (= real) now appears as a synthesized attribute at the type node and at each var-list node along the list, reaching id(x) and id(y) bottom-up.]


Our approach:

 Compute inherited stuff first (symbol

table) in a separate pass

 Then type inference and type

checking turns into a purely synthesized attribute computation, since all uses of names have their types already computed.

 Next:

– Symbol table structure – Synthesized type rules


7.2.2 LR(1) - (1)

  • Semantic routines
  • are invoked only when a structure is recognized.
  • LR parsing
  • a structure is recognized when the RHS is reduced to LHS.
  • Therefore, action symbols must be placed at the end.

Ex: <stmt> → if <cond> then <stmt> end               #ifThen
    <stmt> → if <cond> then <stmt> else <stmt> end   #ifThenElse

7.2.2 LR(1) - (2)

  • After shifting "if <cond>"
  • the parser cannot decide which of #ifThen and #ifThenElse should be invoked.

  • cf. In LL parsing,
  • The structure is recognized when a non-terminal is

expanded.


7.2.2 LR(1) - (3)

  • However, sometimes we do need to perform semantic

actions in the middle of a production.

Ex: <stmt> → if <exp> then <stmt> end
    After generating code for <exp>, a conditional jump is needed here, before generating code for <stmt>.

Solution: use two productions:
    <stmt> → <if head> then <stmt> end   #finishIf
    <if head> → if <exp>                 #startIf
<if head> serves as a semantic hook (only for semantic processing).

  • Another problem
  • What if the action is not at the end?
  • Ex:
  • <prog> → #start begin <stmt> end
  • We need to call #start.
  • Solution: Introduce a new non-terminal.
  • <prog> → <head> begin <stmt> end
  • <head> → #start
  • YACC automatically performs such transformations.

7.2.2 LR(1) - (4)


7.2.3 Semantic Record Representation - (1)

  • Since we need to use a stack to store semantic records,

all semantic records must have the same type.

  • variant record in Pascal
  • union type in C
  • Ex:

enum kind {OP, EXP, STMT, ERROR};
typedef struct
{ enum kind tag;
  union
  { op_rec_type   OP_REC;
    exp_rec_type  EXP_REC;
    stmt_rec_type STMT_REC;
    ......
  };
} sem_rec_type;

  • How to handle errors?
  • Ex.
  • A semantic routine

needs to create a record for each identifier in an expression.

  • What if the identifier is not declared?
  • The solution at next page…….


7.2.3 Semantic Record Representation - (2)



  • Solution 1: make a bogus record
  • This method may create a chain of

meaningless error messages due to this bogus record.

  • Solution 2: create an ERROR semantic record
  • No error message will be printed

when ERROR record is encountered.

  • WHO controls the semantic stack?
  • action routines
  • parser


7.2.3 Semantic Record Representation - (3)


7.2.4 Action-controlled semantic stack - (1)

  • Action routines take parameters from
  • the semantic stack directly and push results onto the stack.
  • Implementing stacks:
  • 1. array
  • 2. linked list
  • Usually, the stack is transparent - any records

in the stack may be accessed by the semantic routines.

  • (-) difficult to change


  • Two other disadvantages:

  • (-) Action routines need to manage the stack.
  • (-) Control of the stack is distributed among the action routines.
  • Each action routine pops some records and pushes 0 or 1 record.
  • If any action routine makes a mistake, the whole stack is corrupted.
  • The solution is on the next page...


7.2.4 Action-controlled semantic stack - (2)


  • Solution 1: Let the parser control the stack
  • Solution 2: Introduce additional stack routines
  • Ex: Parser → Stack routines → Parameter-driven action routines
  • If action routines do not control the stack, we can use an opaque (or abstract) stack: only push() and pop() are provided.
  • (+) clean interface
  • (-) less efficient
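A minimal sketch of such an opaque stack interface in C. SemRec stands in for the tagged-union semantic record described earlier; the fixed array size and the names are arbitrary choices for illustration, and a real implementation would hide the array behind a separate translation unit.

```c
#include <assert.h>

/* Opaque semantic stack: clients see only push() and pop(),
   never the representation (a fixed-size array here). */
typedef struct { int tag; /* ... union of record kinds ... */ } SemRec;

static SemRec sem_stack[256];
static int sem_top = 0;

void push(SemRec r) { assert(sem_top < 256); sem_stack[sem_top++] = r; }
SemRec pop(void)    { assert(sem_top > 0);   return sem_stack[--sem_top]; }
```

Because no routine can index into the middle of the stack, a mistake in one action routine cannot silently corrupt records belonging to another, at the cost of the extra call overhead noted above.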


7.2.4 Action-controlled semantic stack - (3)


7.2.5 parser-controlled stack - (1)

  • LR
  • The semantic stack and parse stack operate in parallel (they shift and reduce in the same way).
  • Ex:
  • <stmt> → if <exp> then <stmt> end
  • Ex:
  • YACC generates such a parser-controlled semantic stack.
  • <exp> → <exp> + <term>  { $$.value = $1.value + $3.value; }

[Figure: the parser stack holds if <exp> then <stmt> while the semantic stack holds matching records; the two stacks may be combined.]

  • LL parser-controlled semantic stack
  • Every time a production A → B C D is predicted, records for B, C, and D appear above the record for A on the semantic stack.

[Figure: parse stack holding B C D ...; semantic stack slots 7-12 holding ..., A, B, C, D, with four pointers (left, right, current, top) into the semantic stack.]

Need four pointers for the semantic stack (left, right, current, top).

7.2.5 parser-controlled stack - (2)


  • However, when a new production B → E F G is predicted, the four pointers will be overwritten.
  • Therefore, create a new EOP record holding the four pointers on the parse stack.
  • When the EOP record appears on the stack top, restore the four pointers, which essentially pops the records off the semantic stack.
  • An example is on the next page...

7.2.5 parser-controlled stack - (3)

[Figure: after predicting B → E F G, the parse stack holds E F G EOP(7,9,9,12) C D EOP(...); the semantic stack grows from slots 7-12 (A, B, C, D) to slots 7-15 (A, B, C, D, E, F, G). Popping the EOP record restores the saved left/right/current/top pointers.]

7.2.5 parser-controlled stack - (4)

  • Note
  • All push() and pop() are done by the parser,
  • not by the action routines.
  • Semantic records
  • are passed to the action routines as parameters.
  • Example
  • <primary> → ( <exp> )  #copy($2, $$)


7.2.5 parser-controlled stack - (5)


  • Initial information
  • is stored in the semantic record of LHS.
  • After the RHS is processed the resulting information
  • is stored back in the semantic record of LHS.

7.2.5 parser-controlled stack - (6)

[Figure: initially the semantic stack holds the record for A; while the RHS is processed it holds B, C, D above A; finally only A remains. Attribute information flows from the RHS records into A's record.]

  • (-) The semantic stack may grow very big.
  • <fix>
  • Certain non-terminals never use semantic records,
  • e.g. <stmt list> and <id list>.
  • We may insert #reuse before the last non-terminal in each of their productions.
  • Example
  • <stmt list> → <stmt> #reuse <stmt tail>
  • <stmt tail> → <stmt> #reuse <stmt tail>
  • <stmt tail> → ε
  • Evaluation
  • A parser-controlled semantic stack is easy with LR, but not so with LL.


7.2.5 parser-controlled stack - (7)


7.3 Intermediate representation and code generation

Two possibilities:

1. semantic routines (performing code generation) → machine code

  • (+) no extra pass for code generation
  • (+) allows simple 1-pass compilation

2. semantic routines → IR → code generation → machine code

  • Target machine is abstracted to some virtual machine
  • Allows language-oriented primitives
  • Code generation separated from semantic routines
  • Semantic routines don't care about temporary registers.
  • Reduces machine dependence (isolated to code generation)
  • Optimization can be done at intermediate level
  • Optimization independent of target machine
  • Simpler and better optimization (IR more high-level)
  • (+) allows higher-level operations e.g. open block, call procedures.



IR vs Machine Code

 Generating machine code advantages:

 No overhead of extra pass to translate IR  Conceptually simple compilation model

 Bottom line

 IR valuable if optimization or portability is an important issue  Machine code much simpler

Forms of IR – Postfix Notation

 Concise  Simple translation  Useful for interpreters and target machines with a stack

architecture

 Not particularly good for optimization or code generation  Example:

Code           Postfix
a+b            ab+
a+b*c          abc*+
(a+b)*c        ab+c*
a:=b*c+b*d     abc*bd*+:=
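Postfix's fit for stack architectures shows in how small an evaluator it needs: each operand is pushed, each operator pops two values and pushes the result. This sketch uses single-digit operands instead of the identifiers above, and the function name is invented for illustration.

```c
#include <ctype.h>

/* Evaluate a postfix string over single-digit operands and the
   operators + and *, using an explicit operand stack. */
int evalPostfix(const char *p)
{
    int stack[64], top = 0;
    for (; *p; ++p) {
        if (isdigit((unsigned char)*p)) {
            stack[top++] = *p - '0';              /* push operand */
        } else {                                  /* apply operator */
            int b = stack[--top], a = stack[--top];
            stack[top++] = (*p == '+') ? a + b : a * b;
        }
    }
    return stack[0];  /* one result remains for a well-formed input */
}
```

For example, "23+4*" evaluates (2+3)*4, matching the (a+b)*c → ab+c* row of the table.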

Forms of IR – Three-Address Codes

 Virtual machine having operations with 3 operands: 2 source, 1 destination

 Explicitly reference intermediates
 Triples: op, arg1, arg2
  More concise
  Position dependency makes moving/removing triples hard, such as during optimization
 Quadruples: op, arg1, arg2, arg3 (the result)
  More convenient for code generation than postfix
 Expression oriented, not so good for other uses

a := b*c + b*d

  Triples:               Quadruples:
  (1) ( *  b   c  )      (1) ( *  b  c  t1 )
  (2) ( *  b   d  )      (2) ( *  b  d  t2 )
  (3) ( +  (1) (2) )     (3) ( +  t1 t2 t3 )
  (4) ( := (3) a  )      (4) ( := t3 a  _  )

In triples, intermediate results are referenced by instruction number; quadruples use temporary names.
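A quadruple sequence like the one above can be executed directly by a tiny interpreter. In this sketch (the Quad type, the single-character name encoding, and runQuads are all illustrative assumptions), variables and temporaries are single characters indexing a value store, and '=' copies its first operand to the destination.

```c
/* A quadruple: operator, two source names, one destination name.
   Names are single chars ('a'..'z' for variables, '1'..'9' for
   temporaries) used as indices into the store. */
typedef struct { char op, a1, a2, dst; } Quad;

void runQuads(const Quad *q, int n, int store[128])
{
    for (int i = 0; i < n; ++i) {
        int x = store[(int)q[i].a1];
        int y = store[(int)q[i].a2];
        switch (q[i].op) {
        case '*': store[(int)q[i].dst] = x * y; break;
        case '+': store[(int)q[i].dst] = x + y; break;
        case '=': store[(int)q[i].dst] = x;     break; /* a2 unused */
        }
    }
}
```

Running the four quadruples for a := b*c + b*d with b=2, c=3, d=4 leaves a = 14; note that because quadruples name their temporaries explicitly, the sequence could be reordered or have instructions removed without renumbering anything, unlike triples.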

Forms of IR – Three-Address Codes

float a, d;
int b, c;
a := b*c + b*d

  Triples:                         Quadruples:
  (1) (MULTI, Addr(b), Addr(c))    (1) (MULTI, Addr(b), Addr(c), t1)
  (2) (FLOAT, Addr(b), -)          (2) (FLOAT, Addr(b), t2, -)
  (3) (MULTF, (2), Addr(d))        (3) (MULTF, t2, Addr(d), t3)
  (4) (FLOAT, (1), -)              (4) (FLOAT, t1, t4, -)
  (5) (ADDF, (4), (3))             (5) (ADDF, t4, t3, t5)
  (6) (:=, (5), Addr(a))           (6) (:=, t5, Addr(a), -)

 Can also add more detail, such as type or address.  These forms translate input, other 3 forms transform it

Forms of IR – Tuples

 Tuples allow variable number of operands  A generalization of quadruples

a := b*c + b*d

  (1) (MULTI, Addr(b), Addr(c), t1)
  (2) (FLOAT, Addr(b), t2)
  (3) (MULTF, t2, Addr(d), t3)
  (4) (FLOAT, t1, t4)
  (5) (ADDF, t4, t3, t5)
  (6) (:=, t5, Addr(a))

Forms of IR – Trees

 Syntax trees can also be used  Directed acyclic graph (DAG) is an option  Can use an abstract syntax tree  More complex and more powerful  Tree Transformations for optimizations

Ex: a := b*c + b*d

[Figure: the syntax tree := (a, + ( * (b, c), * (b, d) )), and the corresponding DAG in which the shared operand b appears only once.]

  • Ex. Ada uses Diana.

Fall, 2002 CS 153 - Chapter 6 - Part 2 73

Symbol Table

 Major data structure after syntax tree.  An inherited attribute that may be kept

globally.

 May be needed before semantic

analysis (or some form of it, as in C), but makes sense to put off computing it until necessary.

 Stores declaration information using

name as primary key.


 Specific information stored in

symbol table depends heavily on language, but generally includes:

– Data type – Scope (see below) – Size (bytes, array length) – Potential or actual location information (addresses, offsets - see later)


 One way to finesse the issue of what information to put into the table is to just keep pointers in the table that point to declaration nodes in the syntax tree. Then symbol table code doesn't need to be changed when changing the information, since it is stored in the node, not directly in the table. This is the approach taken in the TINY compiler, and should be carried over to C-Minus.


Scope Information

 Requires that symbol table have some

kind of “delete” operation in addition to lookup and insert, since exiting a scope requires that declarations be removed from view (that is, lookups no longer find them, though they may still be referenced elsewhere).

 Delete operation should not in general

re-process individual declarations: exitScope() should do them all in O(1).


C has simple scope structure:

 All names must be declared before use

(although multiple declarations are possible).

 Scopes are nested in a stack-like fashion,

and cannot be re-entered after exit (simple delete is possible).

 Scope information can be kept simply as a

number: the nesting level (needed during semantic analysis because redeclaration in same scope is illegal in C).


Example:

typedef int z;     /* "external" (global) scope: nest level 0 */
int y; /* this is legal C! */
void x(double x)   /* nest level 1 begins with the parameters */
{ char* x;         /* nest level 2 begins with the function body */
  { char x;        /* nest level 3 */
  }
}



Not all compilers get it right that parameters have a separate scope from the function body in C. But gcc does:

C:\classes\cs153\f02>gcc -c scope.c
scope.c: In function `x':
scope.c:6: warning: declaration of `x' shadows a parameter

At least all names occupy a single “namespace” in C, so one symbol table is enough (compare to Java).


Java has 5 “namespaces”, depending on type of declaration:

package A; // legal Java!!!
class A {
  A A(A A) {
    A: for(;;) {
      if (A.A(A) == A) break A;
    }
    return A;
  }
}


Further complication in Java: local redeclaration even in nested scopes is illegal:

class A {
  A A(A A) {
    for(;;) {
      A A; // oops, now illegal!
      if (A.A(A) == A) break;
    }
    return A;
  }
}


Symbol table data structure properties:

 All operations should be very fast

(preferably O(1)).

 Must be able to disambiguate overloaded name use (depending on language): add type, scope, nesting info to lookup.

 Must not be affected by typical

programmer “clustered” names: x1, x11, x12, etc.


Best bet:

 Use a hash table (or a list or tree or

hash table of hash tables).

 Separate chains better than a closed

array (chains handled as little stacks, insertions and deletions always at the front).

 Hash function needs to use all

characters in a name (to avoid collisions), and involve character position too!


Example:

[Figure: a hash table with indices 1-4; each bucket points to a linked list of items such as j, i, size, temp.]



Sample hash function code:

#define SIZE 211 // typically a prime number
#define SHIFT 4

int hash ( char * key )
{ int temp = 0;
  int i = 0;
  while (key[i] != '\0')
  { temp = ((temp << SHIFT) + key[i]) % SIZE;
    ++i;
  }
  return temp;
}
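One way to see that this hash uses character position (so "clustered" programmer names like x1, x11, x12 spread out rather than collide) is to reproduce it and compare a few values; earlier characters are shifted left more times, so reordering characters changes the result.

```c
/* The shift-and-add hash above, reproduced so its behavior on
   clustered and reordered names can be checked directly. */
#define SIZE 211
#define SHIFT 4

int hash(const char *key)
{
    int temp = 0;
    for (int i = 0; key[i] != '\0'; ++i)
        temp = ((temp << SHIFT) + key[i]) % SIZE;
    return temp;
}
```

Collisions are still possible, of course; the point is only that neither shared prefixes nor anagrams systematically map to the same bucket.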


Easy way to get O(1) behavior when exiting a scope: use a linked list (or tree or ...) of hash tables, one hash table for each scope:

[Figure: a chain of per-scope hash tables; the innermost holds j (int), size (int), i (int), and outer tables hold i (char), temp (char), f (function), j (char *).]

Some structure similar to the previous slide is actually required in C++, Ada, and other languages where scopes can be arbitrarily re-entered (C++ has the scope resolution operator ::), since individual scopes must be attached to names, allowing them to be "called":

class A { void f(); };
...
void A::f() // go back inside A
{ ...
}


Two additional scope issues (of many):

 Recursion: insertion into table must occur before processing is complete:
    // lookup of f in body must work:
    void f() { … f() … }

 Relaxation of declaration-before-use rule (C++ and Java class scopes): all insertions must occur before all lookups (two passes required):
    class A { int f() { return x; } int x; }


One more scope issue: dynamic scope

 Some languages use a run-time

version of scope that does not follow the layout of the program on the page, but the execution path: LISP, perl.

 Symbol table then must be part of

runtime system, providing lookup of names during execution (it better be really fast in this case).


 Called “dynamic scope” (vs. the more usual lexical or static scope).

 A questionable design choice for any but the most dynamic, interpreted languages, since there can then be no static semantic analysis (no static type checking, for example).

 Running the symbol table during execution also slows down execution speed substantially.


Example of dynamic scope (C syntax):

int i = 1;
void f(void) { printf("%d\n", i); }

main()
{ int i = 2;
  /* the following call prints 1 using normal lexical
     scoping, but prints 2 (the value of the local i)
     using dynamic scope */
  f();
  return 0;
}


TINY symbol table:

 All names are global: there are no scopes.
 Declaration is by use: if a lookup fails, perform an insert.
 Virtually no information has to be kept (all names are int vars), so I had to invent something to store in the symbol table (line numbers).
 No deletes!


TINY symtab.h:

/* Insert line numbers and memory locs
   into the symbol table */
void st_insert( char * name, int lineno, int loc );

/* Return the memory location of a variable
   or -1 if not found */
int st_lookup ( char * name );

/* Procedure printSymTab prints a formatted listing
   of the symbol table contents to the listing file */
void printSymTab(FILE * listing);


Sample TINY code building the symbol table:

case AssignK:
case ReadK:
  if (st_lookup(t->attr.name) == -1)
    /* not yet in table, so treat as new definition */
    st_insert(t->attr.name, t->lineno, location++);
  else
    /* already in table, so ignore location,
       add line number of use only */
    st_insert(t->attr.name, t->lineno, 0);
  break;


C-Minus Symbol Table

 Use basic structure of TINY
 Store tree pointers
 Add enterScope() and exitScope()
 List of tables structure helpful (slide 15)
 Add nesting level to tree nodes
 Add pointer to declaration in all ID nodes (found by lookup)
 Use best ADT methods (hide all details of actual symtab structure)


Sample C-Minus symtab.h:

/* Start a new scope;
   return 0 if malloc fails, else 1 */
int st_enterScope(void);

/* Remove all declarations in the current scope */
void st_exitScope(void);

/* Insert def nodes from the syntax tree;
   return 0 if malloc fails, else 1 */
int st_insert( TreePtr );

/* Return the defnode of a variable, parameter,
   or function, or NULL if not found */
TreePtr st_lookup ( char * name );


Data types and type checking

 A data type is constructed recursively out of simple or base types (int, char, double, etc.) and type constructors that create “new” types out of a group of existing ones: struct, union, * (“pointer to”), enum, [ ] (“array of”), etc.

 Types in code are checked by examining the “compatibility” of the types of the components, and by determining a “result” type, if any, from these.


C Example

 Suppose a function is declared as

char * f(double d)

 Data type of f is then

char*()(double)   (function from double to char*)

 The call f(2) type checks because f is a function, 2 is an int, and int is compatible in C with double (can be silently converted). The result then must be of type char*.


In terms of syntax tree:

[Figure: syntax tree of the call — a call node with children id: f (type char*()(double)) and num: 2 (type int); checks: (1) f is a function, (2) int is compatible with the declared double parameter, (3) the result has type char*]


Type compatibility of constructed types

 Generally depends on a notion of when two types are “equal” (equivalent), or at least closely related.

 C example:

struct {} x,z;
struct {} y;
y = x; // illegal! (different types)
z = x; // ok! Same types


 On the other hand:

struct A {} x;
struct A y;
y = x; // now it’s ok!

 struct A {} x; declares a type (with name “struct A”) and a variable x
 Reusing name struct A gives same type
 Writing struct {} defines a type with a hidden internal name (so it can’t be referred to).


Type Equivalence Algorithm

 Structural equivalence: as long as the types have the same structure, they are equivalent.
 Name equivalence: types are equivalent only if they are identical as names.
 Declaration equivalence: types are equivalent if they lead back (through renaming) to the same original use of a type constructor.


Equivalence Example (C syntax)

 struct A {};
 typedef struct A A;
 typedef struct {} B;
 struct A x; A y; B z;
 x, y, z all structurally equivalent
 x, y declaration equivalent, but z is not declaration equivalent to these
 none are name equivalent


C uses a combination of structural and declaration equivalence:

 Declaration equivalence for struct and union
 Structural equivalence for arrays, pointers, and functions
 enum isn’t even a type constructor, but constructs a named subrange of int (unlike C++ - see next slide)


Digression: Enums in C and C++

 An enum in C is not a real type constructor:

enum A {one,two,three} x;
enum B {four,five,six} y;
x = y; /* ok in C */

 In C++ this assignment is an error:

C:\classes\cs153\f02>gxx enum.cpp
enum.cpp: In function `int main()':
enum.cpp:7: cannot convert `B' to `A' in assignment

 Note how the error message implies that C++ automatically generates a typedef enum A A!


Representing types internally in a compiler

 Since types are built up recursively, a tree structure must be used (syntax tree gets another major node kind: datatype).

 Some languages (FORTRAN, TINY, C-Minus) have flat type spaces, so that an enum can be used: int, intarray, function.


 Functions generally are type constructors too, but their types do not have to be built explicitly, since the return type and parameter types are available in the syntax tree for checking (unless, of course, function types can be explicitly written, as in C: typedef char*(F)(double) - see the next slide).


Digression on C function types

 There are two kinds of function types in C that are almost identical (and that can almost be used interchangeably) - function constants and function pointers:

typedef char* F(double);
typedef char* (*G)(double);

 F is a “constant” function type (a prototype), while G is a “pointer to function” type, or function variable:

F f;      // a prototype for a func f
G g = f;  // g is var init’ed to f
f = g;    // illegal - f is const


 In many ways, this mirrors the close relationship in C between pointers and arrays:

int x[10];
int* y = x; // ok
x = y;      // illegal

 In calls and params it really doesn’t matter which type you use or assume: f(2), (*f)(2) and (&f)(2) all work fine, and void p(F ff) and void p(G gg) are identical in effect.


Recursive types

 Present special problems:

struct A
{ int x;
  struct A next;
};

is illegal, because it would represent an “infinite” type (just as void f(void) { f(); } represents an “infinite” call).

 In C must interpose a pointer:

struct A
{ int x;
  struct A* next;
};

 Some languages use a union instead.
 Others (like Java) have implicit pointers.


Other issues (a sample)

 Should array size be part of its type? (C says no)
 How far should compatibility of types go? (Should any two pointers be compatible?)
 Dynamic typing: constructing types during execution.


Type checking in TINY

 Only two types: int and bool
 Only need to check if statement, while statement, assignment, and a few other cases
 Type errors may create a “void” type. Suppress error messages in the presence of void.


Sample TINY type checking code

switch (t->kind.exp)
{ case OpK:
    if ((t->child[0]->type != Integer) ||
        (t->child[1]->type != Integer))
      typeError(t, "Op applied to non-integer");
    if ((t->attr.op == EQ) || (t->attr.op == LT))
      t->type = Boolean;
    else
      t->type = Integer;
    break;


Type Checking in C-Minus

 Go through Appendix A carefully, writing out all type rules

 As in TINY, there are only a few types (other than functions). And there are no explicit function types, or function variables or parameters. Also no recursive types. And no typedefs.

 Answer questions such as: is x = y legal if x and y are both arrays?


Example from Appendix A

18. expression → var = expression | simple-expression
19. var → ID | ID [ expression ]

An expression is a variable reference followed by an assignment (=) and an expression, or just a simple expression. The assignment has the usual storage semantics: the location of the variable represented by var is found, then the subexpression to the right of the assignment is evaluated, and the value of the subexpression is stored at the given location. This value is also returned as the value of the entire expression. A var is either a simple (integer) variable or a subscripted array variable. A negative subscript causes the program to halt (unlike C). However, upper bounds of subscripts are not checked.


Making syntax tree traversals easy: use “generic” traversal function:

static void traverse( TreeNode * t,
                      void (* preProc) (TreeNode *),
                      void (* postProc) (TreeNode *) )
{ if (t != NULL)
  { preProc(t);
    { int i;
      for (i = 0; i < MAXCHILDREN; i++)
        traverse(t->child[i], preProc, postProc);
    }
    postProc(t);
    traverse(t->sibling, preProc, postProc);
  }
}


// builds symtab in preorder:
traverse(syntaxTree, insertNode, nullProc);

// checks types in postorder:
traverse(syntaxTree, nullProc, checkNode);

void nullProc( TreeNode * t ) {}

etc . . .


Analyze.h - a two-step process:

/* Function buildSymtab constructs the symbol
 * table by preorder traversal of the syntax tree */
void buildSymtab(TreeNode *);

/* Procedure typeCheck performs type checking
 * by a postorder syntax tree traversal */
void typeCheck(TreeNode *);


What should C-Minus Print under TraceAnalyze?

 Possibly a representation of the symbol table, as in TINY
 But also another representation of the tree with types added
 PrintTree could be modified to do this, or a new PrintTypes function added to util.h/util.c

An Example of C-Minus Symbol Table Construction and the use of the symbol table to link uses of names to their defs.

CS 153 - Fall, 2002 - K. Louden - 11/10/02


The Example:

int a;      /*d1*/
int b[10];  /*d2*/

int c /*d3*/ (int a[] /*d4*/, int c /*d5*/)
{ /* Position 1 */
  if (c)
  { int d; /*d6*/
    /* Position 2 */
    d = a[c] + b[c];
    return d;
  }
  return 0;
}

void main(void) /*d7*/
{ /* Position 3 */
  output(c(b,a));
}


Syntax tree:

[Figure: syntax tree for the example — the top-level declaration chain is a (d1), b (d2), c (d3), main (d7); the function c has parameters c (d5) and a (d4) and a block containing an if and a return; the if's block declares d (d6) and contains the assignment d = a[c] + b[c] (with subscript nodes a[c] and b[c]) and a return of d; main's block contains the call output(c(b,a)).]

Symbol Table at Position 1:

nestLevel 2: a → d4, c → d5
nestLevel 1: (empty)
nestLevel 0: input, output, a → d1, b → d2, c → d3


Lookup of c after position 1 produces the following tree with link:

[Figure: the same syntax tree, now with the use of c in the if condition linked back to its declaration d5.]

Symbol Table at Position 2:

nestLevel 3: d → d6
nestLevel 2: a → d4, c → d5
nestLevel 1: (empty)
nestLevel 0: input, output, a → d1, b → d2, c → d3


Lookups of a, b, c, and d after position 2 produce the following tree with links:

[Figure: the same syntax tree, with links added for the uses after position 2 — a and the subscript c link to d4 and d5, b links to d2, and d links to d6.]


Symbol Table at Position 3:

nestLevel 2: (empty)
nestLevel 1: (empty)
nestLevel 0: input, output, a → d1, b → d2, c → d3, main → d7


Lookups of output, a, b, and c after pos. 3 produce the following tree with links:

[Figure: the same syntax tree, with links added for the call output(c(b,a)) — output links to its predefined entry, c to d3, b to d2, and a to d1.]