SLIDE 1

CS502: Compiler Design Semantic Analysis (Cont.) Manas Thakur

Fall 2020

SLIDE 2

Recap

  • Syntax analysis can only find, well, syntax errors.
  • We are interested in being able to find various other kinds of errors:

    bar(int a, char* s) {...}
    int foo() {
        int f[3];
        int i, j, k;
        char q, *p;
        float k;
        bar(f[6], 10, x);
        break;
        i->val = 5;
        q = k + p;
        printf("%s, %s.\n", p, k);
        goto label2;
    }

SLIDE 3

Program checking

  • When are checks performed?
  • Static checking
    – At compile-time
    – Detect and report errors by analyzing the program offline
  • Dynamic checking
    – At run-time
    – Detect and report/handle errors as they occur
  • Pros and cons?
    – Efficiency?
    – Completeness?
    – Developer and user experience?
    – Language flexibility?

SLIDE 4

What all can be checked statically?

  • Uniqueness checks
    – Certain names must be unique
    – Many languages require variable declarations
  • Control-flow checks
    – Match control-flow operators with structures
    – Example: break applies to innermost loop/switch
  • Type checks
    – Check compatibility of operators and operands
    – Example: Does 3.5 + "foobar" make sense?
  • What kind of check is "array bounds"?

SLIDE 5

Uniqueness checks

  • What does a name in a program denote?
    – Variable
    – Function
    – Class
    – Label
  • Information maintained in bindings
    – A binding from the name to the corresponding entity
    – Bindings have scope: the region of the program in which they are valid
  • Uniqueness checks
    – Analyze the bindings
    – Make sure they obey the rules

SLIDE 6

Namespace abstractions

  • What is a function/procedure/method? What is a class?
    – Do they exist at the machine-code level?
    – Not really!
  • Functions/procedures/methods and classes essentially define namespaces.
  • Helpful in
    – Identifying scopes
    – Defining bindings

SLIDE 7

Procedures as namespaces

  • Each procedure creates its own namespace
    – Names can be declared locally
    – Local names hide identical non-local (global) names (shadowing)
    – Local names cannot be seen outside the procedure
  • Such a set of rules is called lexical (or static) scoping.
    – There must then exist a dynamic scoping!
      • Ask those who have taken CS302!
  • e.g., C has global, static, local, and block scopes
    – Blocks can be nested, procedures cannot.

SLIDE 8

Lexical scoping

  • Why is it good?
    – Flexibility for programmer (reuse of variable names)
    – Easy to "see" a binding!
  • It is the compiler's headache to differentiate same-name variables at different points
    – Implementation: Lexically scoped symbol tables

    {
        for (int i = 0; i < 100; ++i) { ... }
        for (Iterator i = list.iterator(); i.hasNext();) { ... }
    }

The two i's are different variables because of lexical scoping.

SLIDE 9

Symbol Table

[Figure: phases of a compiler. Character stream → Lexical Analyzer → Token stream → Syntax Analyzer → Syntax tree → Semantic Analyzer → Syntax tree → Intermediate Code Generator → Intermediate representation → Machine-Independent Code Optimizer → Intermediate representation → Code Generator → Target machine code → Machine-Dependent Code Optimizer → Target machine code. The Symbol Table is shared by all phases.]

SLIDE 10

Lexically scoped symbol tables

  • Tasks at hand
    – Keep track of names
    – At the use of a name, find its information (e.g., which one?)
  • The problem
    – Compiler needs a distinct entry for each declaration
    – Nested lexical scopes allow duplicate entries
  • Let's see an example.

SLIDE 11

Scopes

    class p {
        int a, b, c;
        method q {
            int v, b, x, w;
            for (r = 0; ...) {
                int x, y, z;
                …
            }
            while (s) {
                int x, a, v;
                …
            }
            … r … s
        }
        … q …
    }

    Sp: { int a, b, c;
          Sq: { int v, b, x, w;
                Sr: { int x, y, z; ... }
                Ss: { int x, a, v; ... }
              }
        }

SLIDE 12

Chained implementation

  • Create a new table for each scope
  • Chain tables together for lookup

[Figure: the chain of tables for the example, innermost first: r {x, y, z} → q {v, b, x, w} → p {a, b, c}; the current scope is r]

  • enter() creates a new table
  • insert() adds at the current level
  • lookup() walks the chain of tables and returns the first occurrence of the name
  • exit() throws away the table for the current level
  • How would one implement the individual tables?
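Here is a minimal Java sketch of the four operations above. It assumes each individual table is a `java.util.HashMap` and that the chain is a stack of those maps, with the innermost scope on top. The names `ChainedSymbolTable` and `Symbol` (and its `level` field) are hypothetical and come from no particular compiler. Records need Java 16 or later.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical entry: what we remember about one declaration.
record Symbol(String name, String type, int level) {}

// Chained, lexically scoped symbol table: one HashMap per open scope.
class ChainedSymbolTable {
    // The innermost scope is at the head of the deque.
    private final Deque<Map<String, Symbol>> chain = new ArrayDeque<>();

    // enter(): open a new, empty table for a nested scope.
    void enter() {
        chain.push(new HashMap<>());
    }

    // insert(): add a declaration at the current level. A duplicate in the
    // same scope is a uniqueness error; shadowing an outer name is allowed.
    void insert(String name, String type) {
        Map<String, Symbol> current = chain.peek();
        if (current == null) throw new IllegalStateException("no open scope");
        if (current.containsKey(name))
            throw new IllegalArgumentException("duplicate declaration: " + name);
        current.put(name, new Symbol(name, type, chain.size() - 1));
    }

    // lookup(): walk the chain from innermost to outermost and return the
    // first occurrence of the name, or null if it is undeclared.
    Symbol lookup(String name) {
        for (Map<String, Symbol> table : chain) {   // iterates head (innermost) first
            Symbol s = table.get(name);
            if (s != null) return s;
        }
        return null;
    }

    // exit(): throw away the table for the current level.
    void exit() {
        chain.pop();
    }

    public static void main(String[] args) {
        ChainedSymbolTable t = new ChainedSymbolTable();
        t.enter(); for (String n : new String[] {"a", "b", "c"}) t.insert(n, "int");      // p
        t.enter(); for (String n : new String[] {"v", "b", "x", "w"}) t.insert(n, "int"); // q
        t.enter(); for (String n : new String[] {"x", "y", "z"}) t.insert(n, "int");      // r
        System.out.println(t.lookup("x").level()); // 2 (r's x)
        System.out.println(t.lookup("b").level()); // 1 (q's b)
        t.exit();
        System.out.println(t.lookup("y"));         // null (r is gone)
    }
}
```

The `main` method replays scopes p, q and r from the example. `lookup("x")` finds r's x at level 2, and `lookup("b")` finds q's b at level 1. After `exit()`, y is no longer visible and x resolves to q's copy. One hash table per scope keeps insert and lookup within a scope at expected constant time. The chain walk then costs one probe per enclosing scope.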

SLIDE 13

Tomorrow

  • Extensions to symbol tables for OO languages
    – Classes
    – Objects
    – Object fields
    – Inheritance
  • Implementation:
    – Your compiler is taking shape now.
  • Poll on Teams for doubt session.

SLIDE 15

Virtual White Board

  • Designing a symbol table
  • Extending for new scopes
  • Classes and inheritance
  • Assignment 2: Not overweight, but under-tall
    – Try feeding lasagne to Garfield
    – Deadline: Oct 18th

SLIDE 17

Uniqueness checks: More complications

  • Forward references
    – need multiple passes
  • includes, packages, modules, interfaces
    – need to import/export
  • Various coding conveniences
    – int a = sizeof(a);
      • Declare "a" in the namespace before parsing the initializer
    – int b, c[sizeof(b)];
      • Declare "b" with a type before parsing "c"
  • Multiple inheritance?
  • Summary: Language features complicate the life of compiler designers even for a seemingly simple check!
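This small Java sketch shows why forward references need more than one pass. It works over a hypothetical flattened program: a list of `Item`s, each of which either declares or uses a name. Pass 1 only records declarations. Pass 2 checks every use against the complete set. `Item` and `ForwardRefChecker` are made-up names for illustration (records need Java 16 or later).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy program item: either a declaration or a use of a name.
record Item(boolean isDecl, String name) {}

class ForwardRefChecker {
    // Returns the names that are used but never declared anywhere.
    static List<String> undeclaredUses(List<Item> program) {
        // Pass 1: collect every declared name, regardless of position.
        Set<String> declared = new HashSet<>();
        for (Item it : program)
            if (it.isDecl()) declared.add(it.name());

        // Pass 2: a use is legal if the name is declared anywhere,
        // even later in the program (a forward reference).
        List<String> errors = new ArrayList<>();
        for (Item it : program)
            if (!it.isDecl() && !declared.contains(it.name()))
                errors.add(it.name());
        return errors;
    }

    public static void main(String[] args) {
        List<Item> program = List.of(
                new Item(false, "g"),   // use of g before its declaration
                new Item(true, "f"),
                new Item(true, "g"),
                new Item(false, "h"));  // h is never declared
        System.out.println(undeclaredUses(program)); // [h]
    }
}
```

For the sequence "use g, declare f, declare g, use h", only h is reported. A single left-to-right pass that checked each use as it went would also have flagged g, which is declared later.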

SLIDE 18

Type checking

  • Big topic
    – Type expressions
    – Type equivalence
    – Type systems
    – Type inference
  • What is a type?
    – A collection of values and the set of operations on those values.
    – Remember why you said a door can't kick or a ship can't die?
  • Types define capabilities.

SLIDE 19

Purpose of types

  • Identify and prevent errors
    – Avoid meaningless or harmful computations
    – Meaningless: (x < 6) + 1 - "bathtub"
    – Harmful?
  • Program organization and documentation
    – Separate types for separate concepts
    – Types indicate programmers' intent
  • Support implementation
    – Allocate right amount of space for variables
    – Select right machine operands
    – Optimization: e.g., use fewer bits when possible
  • Key idea: types can be checked
SLIDE 20

Type errors

  • Problem:
    – Underlying memory has no concept of type
    – Everything is just a string of bits. The bits

        0100 0000 0101 1000 0000 0000 0000 0000

      could be:
      • The floating point number: 3.375
      • The 32-bit integer: 1,079,508,992
      • Two 16-bit integers: 16472 and 0
      • Four ASCII characters: @, X, NULL and NULL
  • Without type checking:
    – Machine will let you store 3.375 and later load 1,079,508,992
    – Violates the intended semantics of the program
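Java is strongly typed and will not reinterpret a variable's storage directly. This sketch therefore goes through a `java.nio.ByteBuffer`. It stores 3.375 as a 4-byte float, then reads the same bytes back as a float, an int, two shorts and four individual bytes. `SameBits` and `store` are hypothetical names. `ByteBuffer` uses big-endian order by default, which matches the bit pattern above.

```java
import java.nio.ByteBuffer;

// The same 32 bits, read back under different "types".
class SameBits {
    // Store a float into a fresh 4-byte buffer (big-endian, the default order).
    static ByteBuffer store(float f) {
        return ByteBuffer.allocate(4).putFloat(0, f);
    }

    public static void main(String[] args) {
        ByteBuffer buf = store(3.375f);
        System.out.println(buf.getFloat(0));                          // 3.375
        System.out.println(buf.getInt(0));                            // 1079508992
        System.out.println(buf.getShort(0) + " " + buf.getShort(2));  // 16472 0
        // Byte by byte: 64 ('@'), 88 ('X'), 0 (NUL), 0 (NUL).
        for (int i = 0; i < 4; i++)
            System.out.println(buf.get(i));
    }
}
```

`Float.floatToIntBits(3.375f)` yields the same integer, 0x40580000 = 1,079,508,992. The bits never change; only the type used to read them does.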

SLIDE 21

Type system

  • Idea:
    – Provide clear interpretation for bits in memory
    – Impose constraints on the use of variables and data
    – Expressed as a set of rules
    – Automatically check the rules
    – Report errors to programmers
  • Key questions:
    – What types are built into the language?
    – Can the programmer build new types?
    – What are the typing rules?
    – When does type checking occur?
    – How strictly are the rules enforced?

SLIDE 22

When are checks performed?

  • Statically typed languages
    – Types of all the variables are determined ahead of time
    – Examples?
      • C, C++, Java
  • Dynamically typed languages
    – Type of a variable can vary at run-time
    – Examples?
      • Python, JavaScript, bash, Scheme
  • Our focus:
    – Static typing, which corresponds to standard static compilation

SLIDE 23

Expressiveness

  • Consider this Scheme function:

    (define myfunc
      (lambda (x)
        (if (list? x)
            (myfunc (car x))
            (+ x 1))))

  • What is the type of x?
    – Sometimes a list, sometimes an atom
    – Downside?
  • What would happen in static typing?
    – Cannot assign a type to x at compile-time
    – Cannot write this function
    – Static typing is conservative

SLIDE 24

Types and Compilers

  • Suppose the task is to generate code for:

    a = b + c * d;
    arr[i] = *p + 2;

    – What does the compiler need to know?
  • Duties of a compiler:
    – Enforce type rules of the language
    – Choose operations to be performed
      • Can a certain computation be done in one machine instruction?
    – Provide concrete representation (bits)
  • What if a check can't be performed at compile-time?

SLIDE 25

Strong vs weak typing

  • A strongly typed language does not allow variables to be used in a way inconsistent with their types (no loopholes).
    – Example: Java.
  • A weakly typed language allows many ways to bypass/violate the type system.
    – Classic example: C. How?
      • Pointer arithmetic.
      • C's motto: just trust the programmer!

SLIDE 26

Interesting cases in type checking

  • What is the type of "x+i" if x is float and i is int?
  • Is this an error?
  • Compiler fixes the problem
    – Convert into compatible types
    – Automatic conversions are called coercions
    – Rules can be complex:
      • In C, large set of rules for integral promotions
    – Goal is to preserve information.
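Here is a sketch of how a checker might type "x+i" and make the coercion explicit. It assumes a simplified widening order int < long < float < double, modelled on Java's binary numeric promotion. It does not follow C's more involved integral promotion and usual arithmetic conversion rules. `Coercion`, `resultType` and `coerce` are hypothetical names.

```java
import java.util.List;

class Coercion {
    // Simplified widening order; byte/short/char are omitted for brevity.
    static final List<String> RANK = List.of("int", "long", "float", "double");

    // Result type of a binary arithmetic operator: the wider operand type.
    static String resultType(String t1, String t2) {
        int r1 = RANK.indexOf(t1), r2 = RANK.indexOf(t2);
        if (r1 < 0 || r2 < 0)
            throw new IllegalArgumentException("not arithmetic: " + t1 + ", " + t2);
        return RANK.get(Math.max(r1, r2));
    }

    // Wrap an operand in an explicit conversion when its type differs from
    // the target; this is the coercion the compiler inserts implicitly.
    static String coerce(String expr, String from, String to) {
        return from.equals(to) ? expr : "(" + to + ") " + expr;
    }

    public static void main(String[] args) {
        String t = resultType("float", "int");   // x is float, i is int
        System.out.println(coerce("x", "float", t) + " + " + coerce("i", "int", t) + " : " + t);
        // prints: x + (float) i : float
    }
}
```

Even a "widening" int-to-float conversion can lose precision for large int values, because a float has only 24 significand bits. Preserving information is therefore a goal of these rules, not a guarantee.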

SLIDE 27

More interesting cases

  • What about printf()?
    – printf(const char* format, ...)
    – Implemented with varargs
    – Format specifies which arguments should follow
    – Who checks?
  • Array bounds
    – Array sizes rarely provided in the declaration
    – Cannot check statically (in general)
    – There are fancy-dancy systems that try to do this
    – Java: check at run-time.

SLIDE 28

Tomorrow

  • How do we actually perform static type checking?

  • Quiz date?
  • A2?
  • Kill me!
SLIDE 30

Type systems

  • From language specifications:

"The result of a unary & operator is a pointer to the object referred to by the operand. If the type of the operand is 'T', the type of the result is 'pointer to T'."

"If both operands of the arithmetic operators addition, subtraction and multiplication are integers, then the result is an integer."

SLIDE 31

Properties of types

  • The excerpts on the previous slide imply:
    – Types have structure
      • "pointer to T"; similarly "array of pointers to T"
    – Expressions have types
      • Types are derived from operands by rules
  • Goal of type checking:
    – Determine types for all parts of a program.
    – If the whole program type checks, you are good to go.
  • Type safety:
    – A well-typed program is sound.

SLIDE 32

Type expressions

  • Build a description of a type from:
    – Basic types, also called primitive types
      • Vary across languages: int, char, float, double
    – Type constructors
      • Functions over types that build more complex types
    – Type variables
      • Unspecified parts of a type: polymorphism, generics
    – Type names
      • An alias for a type expression: typedef in C

SLIDE 33

Type constructors

  • Arrays
    – If T is a type, then array(T) is a type denoting an array with elements of type T.
    – May have a size component: array(I, T)
      • e.g., array(1..10, T)
  • Products or records
    – If T1 and T2 are types, then T1 × T2 is a type denoting pairs whose first component has type T1 and whose second component has type T2.
    – May have labels for records/structures
      • e.g., ("name", char*) × ("age", int)

SLIDE 34

Type constructors (cont.)

  • Pointers
    – If T is a type, then pointer(T) denotes a pointer to T.
  • Functions or function signatures
    – If D and R are types, then D → R is a type denoting a function from domain type D to range type R.
    – In programming terms:
      • domain → arguments
      • range → return value
    – For multiple inputs, the domain is a product.
    – Example: The type of int m(int, int) is int × int → int.

SLIDE 35

Examples of Type rules

  • Implementation?
    – SDTs!!

Expression     Type rule
E1 + E2        if type(E1) is int and type(E2) is int then result type is int

Expression     Type rule
E = E1 + E2    if E1.type == int and E2.type == int then E.type = int

PCQ: How do we get E1.type and E2.type?

SLIDE 36

More interesting cases

  • Type checking nicely fits as a JavaCC/JTB assignment!

Expression     Type rule
E = E1[E2]     if E2.type == "int" && E1.type == "array(T)" then E.type = "T" else error
E = *E1        if E1.type == "pointer(T)" then E.type = "T" else error

SLIDE 37

Examples (cont.)

  • What about function calls?

Expression     Type rule
E = E1(E2)     if E1.type == D → R and E2.type == D then E.type = R else error

  • How do we perform these checks?
  • What is the core type-checking operation?
  • In other words, how to determine "if type(E2) is D"?
    – Next class!
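This answers the earlier question of where E1.type and E2.type come from. In an SDT they are synthesized attributes: the children are checked first and their types flow up to the parent. The sketch below is a hand-written recursive checker in plain Java (records and `instanceof` patterns, Java 16 or later). It is not JavaCC/JTB-generated visitor code. The AST and type classes are hypothetical, and a `Map` stands in for the symbol table.

```java
import java.util.Map;

// Type expressions (a hypothetical minimal set of constructors).
interface Type {}
record IntT() implements Type {}
record ArrayT(Type elem) implements Type {}              // array(T)
record PointerT(Type target) implements Type {}          // pointer(T)
record FunT(Type domain, Type range) implements Type {}  // D -> R

// Expression AST nodes.
interface Expr {}
record Num(int value) implements Expr {}
record Var(String name) implements Expr {}
record Add(Expr e1, Expr e2) implements Expr {}    // E1 + E2
record Index(Expr e1, Expr e2) implements Expr {}  // E1[E2]
record Deref(Expr e1) implements Expr {}           // *E1
record Call(Expr e1, Expr e2) implements Expr {}   // E1(E2)

class TypeError extends RuntimeException {
    TypeError(String msg) { super(msg); }
}

class TypeChecker {
    private final Map<String, Type> env; // stands in for the symbol table

    TypeChecker(Map<String, Type> env) { this.env = env; }

    // Returns E.type. Operand types are computed first by recursive calls,
    // i.e., they are synthesized attributes of the children.
    Type check(Expr e) {
        if (e instanceof Num) return new IntT();
        if (e instanceof Var v) {
            Type t = env.get(v.name());
            if (t == null) throw new TypeError("undeclared " + v.name());
            return t;
        }
        if (e instanceof Add a) {
            if (check(a.e1()) instanceof IntT && check(a.e2()) instanceof IntT)
                return new IntT();
            throw new TypeError("+ expects int operands");
        }
        if (e instanceof Index ix) {
            if (check(ix.e1()) instanceof ArrayT arr && check(ix.e2()) instanceof IntT)
                return arr.elem();
            throw new TypeError("[] expects array(T) and int");
        }
        if (e instanceof Deref d) {
            if (check(d.e1()) instanceof PointerT p) return p.target();
            throw new TypeError("* expects pointer(T)");
        }
        if (e instanceof Call c) {
            // Record equals() compares components, so this compares D with
            // type(E2) structurally.
            if (check(c.e1()) instanceof FunT f && f.domain().equals(check(c.e2())))
                return f.range();
            throw new TypeError("call expects D -> R applied to D");
        }
        throw new TypeError("unknown expression");
    }

    public static void main(String[] args) {
        TypeChecker tc = new TypeChecker(Map.of(
                "a", new ArrayT(new IntT()),
                "p", new PointerT(new IntT()),
                "f", new FunT(new IntT(), new PointerT(new IntT()))));
        // a[1] + *p
        System.out.println(tc.check(new Add(new Index(new Var("a"), new Num(1)), new Deref(new Var("p"))))); // IntT[]
        // *f(3)
        System.out.println(tc.check(new Deref(new Call(new Var("f"), new Num(3))))); // IntT[]
    }
}
```

The call rule compares D with type(E2) using record equality, which happens to be structural. Which notion of equality to use is exactly the "core type-checking operation" asked about above.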

SLIDE 39

Type equivalence

  • Problem:
    – Find if two given types are equivalent.
      • What do we mean by "equivalent"?
  • Different notions of type equivalence:
    – Name equivalence
    – Structural equivalence

SLIDE 40

Name equivalence

  • Types are equivalent if they have the same name.
  • Examples:
    – int a; int b;
      • The types of a and b are equivalent.
    – Example below:
      • x, y are type-equivalent; r, s as well
      • but not x, r or y, s, etc.
  • What does this mean?
    – x = y; would type check successfully.
    – but r = y; would fail.

    typedef struct { int data[100]; int count; } Stack;
    typedef struct { int data[100]; int count; } Set;
    Stack x, y;
    Set r, s;

SLIDE 41

Structural equivalence

  • Forget about names!
  • Two types are structurally equivalent iff one of the following conditions is true:
    – They are the same basic type.
    – They are formed by applying the same constructor to structurally equivalent types.
    – One is a type name that denotes the other (typedef).
  • int a[2][3] is not equivalent to int b[3][2].
  • int a is not equivalent to char b[4].
  • But Stack and Set are now equivalent!

    typedef struct { int data[100]; int count; } Stack;
    typedef struct { int data[100]; int count; } Set;
    Stack x, y;
    Set r, s;
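Both notions can be sketched side by side in Java. The sketch assumes a hypothetical type representation in which `Named` wraps a typedef name around its definition. `nameEquiv` compares names only. `structEquiv` follows the three rules above and looks through names. Requiring struct field labels to match is an assumption of this sketch, not something the slide specifies. All class names are illustrative (Java 16 or later).

```java
import java.util.List;

// Hypothetical type representation.
interface Ty {}
record Basic(String name) implements Ty {}            // int, char, float, ...
record Arr(int size, Ty elem) implements Ty {}        // array(I, T)
record Field(String label, Ty type) {}
record Struct(List<Field> fields) implements Ty {}    // labelled product
record Named(String name, Ty def) implements Ty {}    // typedef name

class Equivalence {
    // Name equivalence: the same basic type, or the very same declared name.
    static boolean nameEquiv(Ty a, Ty b) {
        if (a instanceof Basic ba && b instanceof Basic bb) return ba.name().equals(bb.name());
        if (a instanceof Named na && b instanceof Named nb) return na.name().equals(nb.name());
        return false;
    }

    // Structural equivalence, following the three rules above.
    static boolean structEquiv(Ty a, Ty b) {
        // Rule 3: a type name (typedef) stands for its definition.
        if (a instanceof Named na) return structEquiv(na.def(), b);
        if (b instanceof Named nb) return structEquiv(a, nb.def());
        // Rule 1: the same basic type.
        if (a instanceof Basic ba && b instanceof Basic bb) return ba.name().equals(bb.name());
        // Rule 2: the same constructor applied to structurally equivalent types.
        if (a instanceof Arr aa && b instanceof Arr ab)
            return aa.size() == ab.size() && structEquiv(aa.elem(), ab.elem());
        if (a instanceof Struct sa && b instanceof Struct sb) {
            List<Field> fa = sa.fields(), fb = sb.fields();
            if (fa.size() != fb.size()) return false;
            for (int i = 0; i < fa.size(); i++) {
                if (!fa.get(i).label().equals(fb.get(i).label())) return false;
                if (!structEquiv(fa.get(i).type(), fb.get(i).type())) return false;
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Ty body = new Struct(List.of(
                new Field("data", new Arr(100, new Basic("int"))),
                new Field("count", new Basic("int"))));
        Ty stack = new Named("Stack", body), set = new Named("Set", body);
        System.out.println(nameEquiv(stack, set));   // false
        System.out.println(structEquiv(stack, set)); // true
    }
}
```

With the Stack/Set declarations, `nameEquiv` rejects Stack vs Set, but `structEquiv` accepts them. int[2][3], modelled as array(2, array(3, int)), is still not equivalent to int[3][2]. This naive recursion assumes the type graph has no cycles.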

SLIDE 42

Type graphs for structural equivalence

  • Represent types as graphs:
    – Node for each type and relation
    – Share the structure when possible

[Figure: type graph for the function type (char × int) → int*, drawn once as a tree and once with the int node shared]

SLIDE 43

Recursive types

  • Why is the following a problem?

    struct cell {
        int info;
        struct cell *next;
    };

  • Cycle in the type graph!
  • Problem with structural equivalence, but name equivalence is fine!
  • Type equivalence in C:
    – Structural equivalence for everything, except?
      • structures and unions.
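One common way to make structural equivalence terminate on cyclic type graphs is to track which node pairs are currently being compared. When the walk reaches such a pair again, it assumes the pair is equivalent. Here is a minimal Java sketch of that idea. `TypeNode` is a hypothetical mutable graph node. Its `kind` string (e.g. "pointer", or a struct kind that lists its field labels) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A node in a possibly cyclic type graph (identity-based equals/hashCode).
class TypeNode {
    final String kind;                                  // e.g. "int", "pointer", "struct{info,next}"
    final List<TypeNode> children = new ArrayList<>();
    TypeNode(String kind) { this.kind = kind; }
}

class CyclicEquiv {
    // Structural equivalence that terminates on cycles: a pair already under
    // comparison is assumed equivalent when the walk comes back to it.
    static boolean equiv(TypeNode a, TypeNode b) {
        return equiv(a, b, new HashSet<>());
    }

    private static boolean equiv(TypeNode a, TypeNode b, Set<List<TypeNode>> assumed) {
        if (a == b) return true;
        if (!a.kind.equals(b.kind) || a.children.size() != b.children.size()) return false;
        if (!assumed.add(List.of(a, b))) return true;   // pair seen before: assume equal
        for (int i = 0; i < a.children.size(); i++)
            if (!equiv(a.children.get(i), b.children.get(i), assumed)) return false;
        return true;
    }

    // Builds struct cell { <infoKind> info; struct cell *next; } as a cyclic graph.
    static TypeNode cell(String infoKind) {
        TypeNode s = new TypeNode("struct{info,next}");
        TypeNode p = new TypeNode("pointer");
        s.children.add(new TypeNode(infoKind));
        s.children.add(p);
        p.children.add(s);   // the cycle: next points back to the struct
        return s;
    }

    public static void main(String[] args) {
        System.out.println(equiv(cell("int"), cell("int")));   // true
        System.out.println(equiv(cell("int"), cell("float"))); // false
    }
}
```

Two separately built graphs for struct cell compare equal, while a variant whose info field is float does not. Without the `assumed` set, the walk would cycle through struct → pointer → struct forever. Name equivalence, which the slide says C uses for structures and unions, sidesteps the problem altogether.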

SLIDE 44

Type equivalence in Java

    class Foo { int x; float y; }
    class Bar { int x; float y; }

  • Can we pass Bar objects to a method taking a parameter of type Foo?
  • No.
  • Java uses name equivalence for classes.
  • What can we do in C that we can't do in Java?

SLIDE 45

Types in OOLs

    class Animal { ... }
    class Monkey extends Animal { ... }

  • What is the relationship between Animal and Monkey?
    – Monkey is a subtype of Animal.
    – Any code that accepts an Animal object can also accept a Monkey object.
  • Remember this whenever you write programs in Java!
  • How to write type rules in this case?
    – To check an assignment, check the subtype (<=) relationship.
    – Also for formal parameters.

Expression     Type rule
E1 = E2;       if type(E2) <= type(E1) then result type is type(E1) else error
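Checking `<=` between classes requires the class hierarchy. A minimal sketch follows. It assumes single inheritance and a hypothetical class table that maps each class to its direct superclass, with `Object` as the root. `SubtypeChecker`, `isSubtype` and `checkAssign` are illustrative names.

```java
import java.util.Map;

class SubtypeChecker {
    // Hypothetical class table: class name -> direct superclass name.
    // "Object" is the root and has no entry.
    private final Map<String, String> superOf;

    SubtypeChecker(Map<String, String> superOf) { this.superOf = superOf; }

    // S <= T iff T is S itself or one of S's ancestors (single inheritance;
    // assumes the hierarchy is acyclic).
    boolean isSubtype(String s, String t) {
        for (String c = s; c != null; c = superOf.get(c))
            if (c.equals(t)) return true;
        return false;
    }

    // Type rule for E1 = E2: legal iff type(E2) <= type(E1); result is type(E1).
    String checkAssign(String typeOfE1, String typeOfE2) {
        if (isSubtype(typeOfE2, typeOfE1)) return typeOfE1;
        throw new IllegalStateException("cannot assign " + typeOfE2 + " to " + typeOfE1);
    }

    public static void main(String[] args) {
        SubtypeChecker sc = new SubtypeChecker(Map.of("Animal", "Object", "Monkey", "Animal"));
        System.out.println(sc.checkAssign("Animal", "Monkey")); // Animal
        System.out.println(sc.isSubtype("Animal", "Monkey"));   // false
    }
}
```

Monkey <= Animal holds, so assigning a Monkey to an Animal variable passes. The reverse assignment is rejected. The same `isSubtype` test applies when matching an actual argument against a formal parameter.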

SLIDE 46

Before we say goodbye to semantic analysis...

  • Understanding program semantics is an interesting, tough, and active area of research.
  • Finding out what a program does is, in general, an undecidable problem (read about Rice's theorem).
  • That doesn't mean approximate solutions can't exist.
  • There are sound static analyses that try to understand programs.
  • There also are unsound machine-learning techniques that try to gauge program behavior.

SLIDE 47

An end leads to another beginning...

  • There is a lot more to program semantics and type theory:
    – Dynamic typing
    – Type inference
    – Gradual typing
    – Dependent types
  • And it's all very interesting.
    – Right?
  • Let's generate some (intermediate) code next week (ICG)!