CS502: Compiler Design
Semantic Analysis (Cont.)
Manas Thakur
Fall 2020
Recap
- Syntax analysis can only find, well, syntax errors.
- We are interested in being able to find various other kinds of errors:
    bar(int a, char* s) {...}

    int foo() {
        int f[3];
        int i, j, k;
        char q, *p;
        float k;
        bar(f[6], 10, x);
        break;
        i->val = 5;
        q = k + p;
        printf("%s, %s.\n", p, k);
        goto label2;
    }
Program checking
- When are checks performed?
- Static checking
– At compile-time
– Detect and report errors by analyzing the program offline
- Dynamic checking
– At run-time
– Detect and report/handle errors as they occur
- Pros and cons?
– Efficiency?
– Completeness?
– Developer and user experience?
– Language flexibility?
What all can be checked statically?
- Uniqueness checks
– Certain names must be unique
– Many languages require variable declarations
- Control-flow checks
– Match control-flow operators with structures
– Example: break applies to innermost loop/switch
- Type checks
– Check compatibility of operators and operands
– Example: Does 3.5 + "foobar" make sense?
- What kind of check is “array bounds”?
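The control-flow check for break can be sketched as a tree walk that tracks how many loops or switches enclose the current node. This is only a minimal sketch: the tuple-based tree encoding and the name check_breaks are assumptions made for illustration, not something the slides prescribe.

```python
def check_breaks(node, depth=0, errors=None):
    """Report every 'break' that is not nested inside a loop or switch.

    A node is a tuple (kind, child, child, ...); this encoding is hypothetical.
    """
    if errors is None:
        errors = []
    kind, *children = node
    if kind == "break" and depth == 0:
        errors.append("break outside loop/switch")
    # Entering a loop or switch makes 'break' legal for everything nested inside.
    inner = depth + 1 if kind in ("while", "for", "switch") else depth
    for child in children:
        check_breaks(child, inner, errors)
    return errors


ok = ("func", ("while", ("block", ("break",))))
bad = ("func", ("block", ("break",)))
print(check_breaks(ok))
print(check_breaks(bad))
```

The same walk, carrying a set of declared names instead of a depth counter, would be one way to implement a uniqueness check.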
Uniqueness checks
- What does a name in a program denote?
– Variable
– Function
– Class
– Label
- Information maintained in bindings
– A binding from the name to the corresponding entity
– Bindings have scope: the region of the program in which they are valid
- Uniqueness checks
– Analyze the bindings
– Make sure they obey the rules
Namespace abstractions
- What is a function/procedure/method? What is a class?
– Do they exist at the machine-code level?
– Not really!
- Functions/procedures/methods and classes essentially define namespaces.
- Helpful in
– Identifying scopes
– Defining bindings
Procedures as namespaces
- Each procedure creates its own namespace
– Names can be declared locally
– Local names hide identical non-local (global) names (shadowing)
– Local names cannot be seen outside the procedure
- Such a set of rules is called lexical (or static) scoping.
– There must then exist a dynamic scoping!
- Ask those who have taken CS302!
- e.g., C has global, static, local, and block scopes
– Blocks can be nested, procedures cannot.
Lexical scoping
- Why is it good?
– Flexibility for programmer (reuse of variable names)
– Easy to “see” a binding!
- Compiler’s headache to differentiate same-name variables at different points
– Implementation: Lexically scoped symbol tables
    {
        for (int i = 0; i < 100; ++i) { ... }
        for (Iterator i = list.iterator(); i.hasNext();) { ... }
    }
– The two declarations of i are different variables because of lexical scoping
Symbol Table
[Figure: phases of a compiler. Character stream → Lexical Analyzer → token stream → Syntax Analyzer → syntax tree → Semantic Analyzer → syntax tree → Intermediate Code Generator → intermediate representation → Machine-Independent Code Optimizer → intermediate representation → Code Generator → target machine code → Machine-Dependent Code Optimizer → target machine code. The Symbol Table is shown alongside the phases.]
Lexically scoped symbol tables
- Tasks at hand
– Keep track of names
– At the use of a name, find its information (e.g., which one?)
- The problem
– Compiler needs a distinct entry for each declaration
– Nested lexical scopes allow duplicate entries
- Let’s see an example.
Scopes
    class p {
        int a, b, c;
        method q {
            int v, b, x, w;
            for (r = 0; ...) {
                int x, y, z;
                …
            }
            while (s) {
                int x, a, v;
                …
            }
            … r … s
        }
        … q …
    }

    Sp: {
        int a, b, c;
        Sq: {
            int v, b, x, w;
            Sr: { int x, y, z; ... }
            Ss: { int x, a, v; ... }
        }
    }
Chained implementation
- Create a new table for each scope
- Chain tables together for lookup
[Figure: chain of tables. r: {x, y, z} → q: {v, b, x, w} → p: {a, b, c} → ...]
- enter() creates a new table
- insert() adds at current level
- lookup() walks chain of tables and returns first occurrence of name
- exit() throws away the table for the current level
- How would one implement the individual tables?
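The four operations of the chained implementation can be sketched as follows. This is a minimal sketch, assuming each individual table is a hash map (a Python dict), which is one common answer to the question of how to implement the individual tables; the class name SymbolTable and the stored "info" strings are illustrative only.

```python
class SymbolTable:
    """Lexically scoped symbol table: one dict per scope, innermost last."""

    def __init__(self):
        self.scopes = []                 # the chain of tables

    def enter(self):
        self.scopes.append({})           # new table for a new scope

    def exit(self):
        self.scopes.pop()                # throw away the current level

    def insert(self, name, info):
        current = self.scopes[-1]
        if name in current:              # uniqueness check within one scope
            raise KeyError(f"duplicate declaration of {name!r}")
        current[name] = info

    def lookup(self, name):
        for table in reversed(self.scopes):   # walk the chain, innermost first
            if name in table:
                return table[name]
        return None                      # undeclared name


# Mirrors the scopes p, q and r from the example slide.
st = SymbolTable()
st.enter()
for n in "abc":
    st.insert(n, "declared in p")
st.enter()
for n in "vbxw":
    st.insert(n, "declared in q")
st.enter()
for n in "xyz":
    st.insert(n, "declared in r")
print(st.lookup("x"), "|", st.lookup("b"), "|", st.lookup("a"))
```

Inside r, the name x resolves to r's own declaration, b to q's, and a to p's; after exit(), x resolves to q's declaration again.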
Tomorrow
- Extensions to symbol tables for OO languages
– Classes
– Objects
– Object fields
– Inheritance
- Implementation:
– Your compiler is taking shape now.
- Poll on Teams for doubt session.
Virtual White Board
- Designing a symbol table
- Extending for new scopes
- Classes and inheritance
- Assignment 2: Not overweight, but under-tall
– Try feeding lasagne to Garfield
– Deadline: Oct 18th
Uniqueness checks: More complications
- Forward references
– need multiple passes
- includes, packages, modules, interfaces
– need to import/export
- Various coding conveniences
– int a = sizeof(a);
- Declare “a” in the namespace before parsing the initializer
– int b, c[sizeof(b)];
- Declare “b” with a type before parsing “c”
- Multiple inheritance?
- Summary: Language features complicate the life of compiler designers even for a seemingly simple check!
Type checking
- Big topic
– Type expressions
– Type equivalence
– Type systems
– Type inference
- What is a type?
– A collection of values and the set of operations on those values.
– Remember why you said a door can’t kick or a ship can’t die?
- Types define capabilities.
Purpose of types
- Identify and prevent errors
– Avoid meaningless or harmful computations
– Meaningless: (x < 6) + 1 - "bathtub"
– Harmful?
- Program organization and documentation
– Separate types for separate concepts
– Types indicate programmers’ intent
- Support implementation
– Allocate right amount of space for variables
– Select right machine operands
– Optimization: e.g., use fewer bits when possible
- Key idea: types can be checked
Type errors
- Problem:
– Underlying memory has no concept of type
– Everything is just a string of bits:
    0100 0000 0101 1000 0000 0000 0000 0000
– The floating point number: 3.375
– The 32-bit integer: 1,079,508,992
– Two 16-bit integers: 16472 and 0
– Four ASCII characters: @, X, NUL and NUL
- Without type checking:
– Machine will let you store 3.375 and later load 1,079,508,992
– Violates the intended semantics of the program
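The four readings of that bit string can be reproduced with Python's standard struct module. The sketch assumes a big-endian, IEEE 754 single-precision layout; the variable names are illustrative.

```python
import struct

raw = struct.pack(">f", 3.375)            # 4 bytes: big-endian IEEE 754 single
as_int32 = struct.unpack(">I", raw)[0]    # the same bytes read as one 32-bit integer
as_int16s = struct.unpack(">HH", raw)     # ... as two 16-bit integers
as_chars = raw.decode("latin-1")          # ... as four 8-bit characters
bits = format(as_int32, "032b")           # the raw bit string

print(bits)
print(as_int32, as_int16s, repr(as_chars))
```

Nothing in the bytes themselves says which reading is the intended one; that information lives only in the types the program declares.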
Type system
- Idea:
– Provide clear interpretation for bits in memory
– Impose constraints on the use of variables and data
– Expressed as a set of rules
– Automatically check the rules
– Report errors to programmers
- Key questions:
– What types are built into the language?
– Can the programmer build new types?
– What are the typing rules?
– When does type checking occur?
– How strictly are the rules enforced?
When are checks performed?
- Statically typed languages
– Types of all the variables are determined ahead of time
– Examples? C, C++, Java
- Dynamically typed languages
– Type of a variable can vary at run-time
– Examples? Python, JavaScript, bash, Scheme
- Our focus:
– Static typing
– corresponds to standard static compilation
Expressiveness
- Consider this Scheme function:
    (define myfunc
      (lambda (x)
        (if (list? x)
            (myfunc (car x))
            (+ x 1))))
- What is the type of x?
– Sometimes a list, sometimes an atom
– Downside?
- What would happen in static typing?
– Cannot assign a type to x at compile-time
– Cannot write this function
– Static typing is conservative
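For readers less familiar with Scheme, here is a rough Python transliteration of myfunc (Python is also dynamically typed). It is a sketch for illustration, not part of the original slides.

```python
def myfunc(x):
    # x is sometimes a list and sometimes a number; the test happens at run time.
    if isinstance(x, list):
        return myfunc(x[0])
    return x + 1


print(myfunc(4))
print(myfunc([[7], 2]))
```

The downside is visible as soon as the argument is neither a list nor a number: a call such as myfunc("seven") is accepted without complaint and only fails when the addition is actually executed.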
Types and Compilers
- Suppose the task is to generate code for:
    a = b + c * d;
    arr[i] = *p + 2;
– What does the compiler need to know?
- Duties of a compiler:
– Enforce type rules of the language
– Choose operations to be performed
- Can a certain computation be done in one machine instruction?
– Provide concrete representation (bits)
- What if a check can’t be performed at compile-time?
Strong vs weak typing
- A strongly typed language does not allow variables to be used in a way inconsistent with their types (no loopholes).
– Example: Java.
- A weakly typed language allows many ways to bypass/violate the type system.
– Classic example: C. How? Pointer arithmetic.
- C’s motto: just trust the programmer!
Interesting cases in type checking
- What is the type of “x+i” if x is float and i is int?
- Is this an error?
- Compiler fixes the problem
– Convert into compatible types
– Automatic conversions are called coercions
– Rules can be complex: in C, a large set of rules for integral promotions
– Goal is to preserve information.
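A coercion rule for “x+i” can be sketched as picking the wider of the two operand types and recording the conversion the compiler has to insert. The ranking below is a deliberate simplification and an assumption of this sketch: C's actual integral promotions and usual arithmetic conversions have many more cases (for example, char operands are promoted to int before arithmetic).

```python
# Simplified widening order; NOT the full set of C conversion rules.
RANK = {"char": 0, "int": 1, "float": 2, "double": 3}


def coerce(t1, t2):
    """Return the common type both arithmetic operands are converted to."""
    return t1 if RANK[t1] >= RANK[t2] else t2


def type_of_add(t1, t2):
    common = coerce(t1, t2)
    # List the implicit conversions the compiler would have to insert.
    casts = [f"({common}) operand{i}"
             for i, t in enumerate((t1, t2), 1) if t != common]
    return common, casts


print(type_of_add("float", "int"))
```

With x of type float and i of type int, the sketch yields float as the result type and a single inserted conversion on the second operand.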
More interesting cases
- What about printf()?
– printf(const char* format, ...)
– Implemented with varargs
– Format specifies which arguments should follow
– Who checks?
- Array bounds
– Array sizes rarely provided in the declaration
– Cannot check statically (in general)
– There are fancy-dancy systems that try to do this
– Java: check at run-time.
Tomorrow
- How do we actually perform static type checking?
- Quiz date?
- A2?
- Kill me!
Type systems
- From language specifications:
– The result of a unary & operator is a pointer to the object referred to by the operand. If the type of the operand is “T”, the type of the result is “pointer to T”.
– If both operands of the arithmetic operators addition, subtraction and multiplication are integers, then the result is an integer.
Properties of types
- The excerpts on the previous slide imply:
– Types have structure: “pointer to T”; similarly “array of pointers to T”
– Expressions have types: types are derived from operands by rules
- Goal of type checking:
– Determine types for all parts of a program.
– If the whole program type checks, you are good to go.
- Type safety:
– A well-typed program is sound.
Type expressions
- Build a description of a type from:
– Basic types, also called primitive types: vary across languages (int, char, float, double)
– Type constructors: functions over types that build more complex types
– Type variables: unspecified parts of a type (polymorphism, generics)
– Type names: an alias for a type expression (typedef in C)
Type constructors
- Arrays
– If T is a type, then array(T) is a type denoting an array with elements of type T.
– May have a size component: array(I, T), e.g., array(1..10, T)
- Products or records
– If T1 and T2 are types, then T1 × T2 is a type denoting pairs whose first component has type T1 and whose second component has type T2.
– May have labels for records/structures, e.g., (“name”, char*) × (“age”, int)
Type constructors (cont.)
- Pointers
– If T is a type, then pointer(T) denotes a pointer to T.
- Functions or function signatures
– If D and R are types, then D → R is a type denoting a function from domain type D to range type R.
– In programming terms: domain ≡ arguments; range ≡ return value
– For multiple inputs, domain is a product.
– Example: The type of int m(int,int) is int × int → int.
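Type expressions built from basic types and constructors map naturally onto a small tree data structure. The following is a sketch under assumptions: the class names (Basic, Array, Pointer, Product, Function), the field names, and the show helper are invented for illustration, and the optional array size component is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Basic:
    name: str            # e.g. "int", "char"


@dataclass(frozen=True)
class Array:
    elem: object         # array(T)


@dataclass(frozen=True)
class Pointer:
    target: object       # pointer(T)


@dataclass(frozen=True)
class Product:
    parts: tuple         # T1 x T2 x ...


@dataclass(frozen=True)
class Function:
    domain: object       # D
    ret: object          # R


def show(t):
    """Render a type expression in (ASCII) slide notation."""
    if isinstance(t, Basic):
        return t.name
    if isinstance(t, Array):
        return f"array({show(t.elem)})"
    if isinstance(t, Pointer):
        return f"pointer({show(t.target)})"
    if isinstance(t, Product):
        return " x ".join(show(p) for p in t.parts)
    if isinstance(t, Function):
        return f"{show(t.domain)} -> {show(t.ret)}"
    raise TypeError(f"not a type expression: {t!r}")


INT = Basic("int")
m = Function(Product((INT, INT)), INT)    # the type of int m(int, int)
print(show(m))
print(show(Array(Pointer(INT))))
```

Because the dataclasses are frozen and compare field by field, two separately built copies of the same type expression compare equal.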
Examples of Type rules
Expression      Type rule
E1 + E2         if type(E1) is int and type(E2) is int then result type is int
- Implementation?
– SDTs!!
Expression      Type rule
E = E1 + E2     if E1.type == int and E2.type == int then E.type = int
- PCQ: How do we get E1.type and E2.type?
More interesting cases
- Type checking nicely fits as a JavaCC/JTB assignment!
Expression      Type rule
E = E1[E2]      if E2.type == "int" && E1.type == "array(T)" then E.type = "T" else error
E = *E1         if E1.type == "pointer(T)" then E.type = "T" else error
Examples (cont.)
- What about function calls?
Expression      Type rule
E = E1(E2)      if E1.type == D → R and E2.type == D then E.type = R else error
- How do we perform these checks?
- What is the core type-checking operation?
- In other words, how to determine “if type(E2) is D”?
– Next class!
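The type rules for +, indexing, dereference, and function calls can be sketched as one recursive function that computes E.type from the types of the subexpressions, much like a synthesized attribute in an SDT. The tuple encodings of expressions and types, the name type_of, and the use of plain == as the type comparison are all assumptions of this sketch.

```python
INT = ("int",)


class TypeCheckError(Exception):
    pass


def type_of(e, env):
    """Return the type of expression e; env maps names to their declared types."""
    op = e[0]
    if op == "num":
        return INT
    if op == "var":
        if e[1] not in env:
            raise TypeCheckError(f"undeclared name {e[1]}")
        return env[e[1]]                  # comes from the symbol table
    if op == "add":                       # E = E1 + E2
        if type_of(e[1], env) == INT and type_of(e[2], env) == INT:
            return INT
    elif op == "index":                   # E = E1[E2]
        t1, t2 = type_of(e[1], env), type_of(e[2], env)
        if t2 == INT and t1[0] == "array":
            return t1[1]
    elif op == "deref":                   # E = *E1
        t1 = type_of(e[1], env)
        if t1[0] == "pointer":
            return t1[1]
    elif op == "call":                    # E = E1(E2)
        t1, t2 = type_of(e[1], env), type_of(e[2], env)
        if t1[0] == "func" and t1[1] == t2:
            return t1[2]
    raise TypeCheckError(f"type error in {op}")


env = {"a": ("array", INT), "p": ("pointer", INT), "f": ("func", INT, INT)}
expr = ("add", ("index", ("var", "a"), ("num", 0)), ("deref", ("var", "p")))
print(type_of(expr, env))                 # type of a[0] + *p
```

The comparison t1[1] == t2 in the call rule is where the question “is type(E2) equal to D?” gets answered; here it is plain tuple equality, which is only one possible notion of type equivalence.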
Type equivalence
- Problem:
– Find if two given types are equivalent.
- Different notions of type equivalence:
– Name equivalence
– Structural equivalence
- What do we mean by it?
Name equivalence
- Types are equivalent if they have the same name.
- Examples:
– int a; int b;
- The types of a and b are equivalent.
– Example:
    typedef struct { int data[100]; int count; } Stack;
    typedef struct { int data[100]; int count; } Set;
    Stack x, y;
    Set r, s;
- x, y are type-equivalent; r, s as well
- but not x, r or y, s, etc.
- What does this mean?
– x = y; would type check successfully.
– but r = y; would fail.
Structural equivalence
- Forget about name!
- Two types are structurally equivalent iff one of the following conditions is true:
– They are the same basic type.
– They are formed by applying the same constructor to structurally equivalent types.
– One is a type name that denotes the other (typedef).
- int a[2][3] is not equivalent to int b[3][2].
- int a is not equivalent to char b[4].
- But the Stack and Set types from the name-equivalence example are the same!
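The two notions of equivalence can be compared side by side on the Stack and Set types. This is a sketch with assumptions: the Struct class and the tuple encoding of arrays are invented here, and the sketch chooses to require matching field names, which is a design decision that languages make differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Struct:
    name: str        # declared name, e.g. "Stack"
    fields: tuple    # ((field_name, field_type), ...)


def name_equiv(s, t):
    """Name equivalence: same declared name."""
    return s.name == t.name


def struct_equiv(s, t):
    """Structural equivalence: same constructor applied to equivalent parts."""
    if isinstance(s, Struct) and isinstance(t, Struct):
        return len(s.fields) == len(t.fields) and all(
            f1 == f2 and struct_equiv(t1, t2)
            for (f1, t1), (f2, t2) in zip(s.fields, t.fields))
    return s == t    # basic types and arrays are compared as plain values


FIELDS = (("data", ("array", 100, "int")), ("count", "int"))
Stack = Struct("Stack", FIELDS)
Set = Struct("Set", FIELDS)

print(name_equiv(Stack, Set), struct_equiv(Stack, Set))
```

The array examples from the slide fall out of the same comparison: an encoding of int a[2][3] as ("array", 2, ("array", 3, "int")) differs from that of int b[3][2].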
Type graphs for structural equivalence
- Represent types as graphs:
– Node for each type and relation
– Share the structure when possible
- Example: the function type (char × int) → int *
[Figure: type graphs for this function type, with nodes →, ×, pointer, char, and int]
Recursive types
- Why is the following a problem?
    struct cell {
        int info;
        struct cell *next;
    };
- Cycle in the type graph!
- Problem with structural equivalence, but name equivalence is fine!
- Type equivalence in C:
– Structural equivalence for everything, except? Structures and unions.
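A naive recursive comparison never terminates on a cyclic type graph, because comparing two cell types requires comparing their next pointers, which requires comparing the cell types again. One standard way around this is to remember the pairs already under comparison and treat a repeated pair as equal. The sketch below assumes a hypothetical encoding in which a string is either a basic type or the name of a node in defs.

```python
def equiv(a, b, defs, assumed=None):
    """Structural equivalence on possibly cyclic type graphs."""
    if assumed is None:
        assumed = set()
    if a == b or (a, b) in assumed:
        return True                       # identical, or the cycle has closed
    assumed.add((a, b))                   # assume equal while comparing the parts
    a, b = defs.get(a, a), defs.get(b, b) # expand named nodes
    if isinstance(a, str) or isinstance(b, str):
        return a == b                     # at least one side is a basic type
    return (a[0] == b[0] and len(a) == len(b)
            and all(equiv(x, y, defs, assumed) for x, y in zip(a[1:], b[1:])))


defs = {
    "cell":  ("struct", "int", ("pointer", "cell")),
    "cell2": ("struct", "int", ("pointer", "cell2")),
    "other": ("struct", "int", ("pointer", "int")),
}
print(equiv("cell", "cell2", defs), equiv("cell", "other", defs))
```

Name equivalence sidesteps all of this by comparing only the declared names.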
Type equivalence in Java
    class Foo { int x; float y; }
    class Bar { int x; float y; }
- Can we pass Bar objects to a method that takes a Foo?
- No.
- Java uses name equivalence for classes.
- What can we do in C that we can’t do in Java?
Types in OOLs
    class Animal { ... }
    class Monkey extends Animal { ... }
- What is the relationship between Animal and Monkey?
– Monkey is a subtype of Animal.
– Any code that accepts an Animal object can also accept a Monkey object.
- Remember this whenever you write programs in Java!
- How to write type rules in this case?
– To check an assignment, check subtype (<=) relationship.
– Also for formal parameters.
Expression      Type rule
E1 = E2;        if type(E2) <= type(E1) then result type is type(E1) else error
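The subtype test in the assignment rule can be sketched as a walk up the inheritance chain. The PARENT table, the Object root, and the function names are assumptions of this sketch; a real compiler would read the superclass from the class entries in its symbol table.

```python
# class -> direct superclass (single inheritance, as in Java)
PARENT = {"Monkey": "Animal", "Animal": "Object"}


def is_subtype(sub, sup):
    """True if sub <= sup: sub is sup itself or inherits from it transitively."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT.get(sub)             # move one step up the hierarchy
    return False


def check_assign(lhs_type, rhs_type):
    # E1 = E2 type checks when type(E2) <= type(E1).
    if not is_subtype(rhs_type, lhs_type):
        raise TypeError(f"cannot assign {rhs_type} to {lhs_type}")
    return lhs_type


print(check_assign("Animal", "Monkey"))
```

Assigning a Monkey to an Animal variable passes, while the reverse direction is rejected. The same is_subtype test would apply when matching actual arguments against formal parameters.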
Before we say goodbye to semantic analysis...
- Understanding program semantics is an interesting, tough, and active area of research.
- Finding out what a program does is an undecidable problem (read about Rice’s theorem).
- That doesn’t mean approximate solutions can’t exist.
- There are sound static analyses that try to understand programs.
- There also are unsound machine-learning techniques that try to gauge program behavior.
An end leads to another beginning...
- There is a lot more to program semantics and type theory:
– Dynamic typing
– Type inference
– Gradual typing
– Dependent types
- And it’s all very interesting.
– Right?
- Let’s generate some (intermediate) code next week (ICG)!