SLIDE 1

Compiler Construction Chapter 1: Introduction

Slides modified from Louden Book and Dr. Scherger

SLIDE 2

Terminology

January, 2010 Chapter 1: Introduction

• Compiler
• Interpreter
• Translator
• Assembler
• Linker
• Loader
• Preprocessor
• Editor
• Debugger
• Profiler
• Source Language
• Target Language
• Target Platform
• Relocatable
• Macro substitution
• IDE
• Cross Compiler
• Disassembler
• Front End
• Back End

SLIDE 3

Compiler Stages

Source Code
  -> Scanner -> Tokens
  -> Parser -> Syntax Tree
  -> Semantic Analyzer -> Annotated Tree
  -> Source Code Optimizer -> Intermediate Code
  -> Code Generator -> Target Code
  -> Target Code Optimizer -> Target Code

Auxiliary components: Literal Table, Symbol Table, Error Handler

Phase groupings: Analysis, Synthesis

SLIDE 4

Files Used by Compilers

• A source code text file (.c, .cpp, .java, etc. file extensions).
• Intermediate code files: transformations of source code during compilation, usually kept in temporary files rarely seen by the user.
• An assembly code text file containing symbolic machine code, often produced as the output of a compiler (.asm, .s file extensions).

SLIDE 5

Files Used by Compilers (cont.)

• One or more binary object code files: machine instructions, not yet linked or executable (.obj, .o file extensions).
• A binary executable file: linked, independently executable (well, not always…) code (.exe, .out extensions, or no extension).

SLIDE 6

Compiler Execution

• What is the O() of a compiler, i.e. how does compile time grow with the size of the source program?

SLIDE 7

Extended Example

• Source code:
  • a[index] = 4 + 2
• Tokens:
  • ID Lbracket ID Rbracket AssignOp Num AddOp Num
• Parse tree (syntax tree with all steps of the parser in gory detail):

SLIDE 8

Parse Tree

expression
  assign-expression
    expression
      subscript-expression
        expression
          identifier: a
        [
        expression
          identifier: index
        ]
    =
    expression
      additive-expression
        expression
          number: 4
        +
        expression
          number: 2

SLIDE 9

Syntax Tree

A "trimmed" version of the parse tree with only essential information:

assign-expression
  subscript-expression
    identifier: a
    identifier: index
  additive-expression
    number: 4
    number: 2

SLIDE 10

Annotated Syntax Tree (with attributes)

assign-expression: integer
  subscript-expression: integer
    identifier a: array of integer
    identifier index: integer
  additive-expression: integer
    number 4: integer
    number 2: integer

SLIDE 11

Intermediate Code

• Syntax tree: very abstract
• Machine code: too specific
• Something in between may make optimization much easier
• One such representation is three-address code
  • Each instruction has at most three addresses (variables)

    t = 4 + 2
    a[index] = t
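
A minimal sketch in C of how three-address code like the above might be generated from an expression tree. The Expr and Tac types and the gen_expr and gen_assign functions are hypothetical names chosen for illustration, not code from the text. Temporaries are numbered t1, t2, ... instead of the single t used on the slide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression node: a leaf holds a name or number,
   an interior node holds a binary operator. */
typedef struct Expr {
    char op;                        /* '+', '-', '*', '/', or 0 for a leaf */
    const char *leaf;               /* operand text when op == 0 */
    const struct Expr *lhs, *rhs;   /* operands when op != 0 */
} Expr;

/* Hypothetical output buffer plus a counter for fresh temporaries. */
typedef struct {
    char text[256];   /* emitted code, one instruction per line */
    int next_temp;    /* last temporary number handed out */
} Tac;

/* Emits code for e and writes the name holding its value into place.
   Names longer than 15 characters are truncated in this sketch. */
void gen_expr(Tac *tac, const Expr *e, char place[16]) {
    assert(tac != NULL && e != NULL);
    if (e->op == 0) {                       /* leaf: value already has a name */
        snprintf(place, 16, "%s", e->leaf);
        return;
    }
    char l[16], r[16];
    gen_expr(tac, e->lhs, l);               /* operands first (post-order) */
    gen_expr(tac, e->rhs, r);
    snprintf(place, 16, "t%d", ++tac->next_temp);
    size_t used = strlen(tac->text);
    snprintf(tac->text + used, sizeof tac->text - used,
             "%s = %s %c %s\n", place, l, e->op, r);
}

/* Emits code for the assignment target = e. */
void gen_assign(Tac *tac, const char *target, const Expr *e) {
    char place[16];
    gen_expr(tac, e, place);
    size_t used = strlen(tac->text);
    snprintf(tac->text + used, sizeof tac->text - used,
             "%s = %s\n", target, place);
}
```

Each emitted line names at most three addresses: the result and two operands. A real code generator would also break a[index] into address arithmetic; here it is treated as an opaque target name.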

SLIDE 12

Target Code

(edited & modified for this presentation):

    mov eax, 6
    mov ecx, DWORD PTR _index$[ebp]
    mov DWORD PTR _a$[ebp+ecx*4], eax

(Note the source-level constant folding optimization: 4 + 2 has become 6.)

Source code:
    a[index] = 4 + 2

Tokens:
    ID Lbracket ID Rbracket AssignOp Num AddOp Num
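
The constant folding noted on this slide can be sketched as a bottom-up pass over the syntax tree. This is an illustrative sketch only: the Node type and the fold function are hypothetical, and integer overflow is ignored, which a real compiler could not do.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NODE_NUM, NODE_VAR, NODE_ADD, NODE_MUL } NodeKind;

/* Hypothetical syntax tree node for integer expressions. */
typedef struct Node {
    NodeKind kind;
    int value;                  /* meaningful only for NODE_NUM */
    struct Node *left, *right;  /* meaningful only for NODE_ADD and NODE_MUL */
} Node;

/* Folds constant subtrees in place, bottom-up.
   Returns 1 if n is a number afterwards, 0 otherwise. */
int fold(Node *n) {
    assert(n != NULL);
    if (n->kind == NODE_NUM) return 1;
    if (n->kind == NODE_VAR) return 0;      /* value unknown at compile time */

    /* Fold both children before deciding, so each side is simplified
       even when the other side is not constant. */
    int left_const = fold(n->left);
    int right_const = fold(n->right);
    if (!(left_const && right_const)) return 0;

    int a = n->left->value;
    int b = n->right->value;
    n->value = (n->kind == NODE_ADD) ? a + b : a * b;  /* overflow not checked */
    n->kind = NODE_NUM;                                 /* operator becomes a literal */
    n->left = n->right = NULL;
    return 1;
}
```

Applied to the tree for 4 + 2, the additive node is rewritten into the single number 6, which is why the target code loads 6 directly.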

SLIDE 13

The Big Picture

Source Code:
    a[index] = 4 + 2

Scanner -> Tokens:
    ID Lbracket ID Rbracket AssignOp Num AddOp Num

Parser -> Syntax Tree:
    assign-expression
      subscript-expression
        identifier: a
        identifier: index
      additive-expression
        number: 4
        number: 2

Semantic Analyzer -> Annotated Tree:
    assign-expression: integer
      subscript-expression: integer
        identifier a: array of integer
        identifier index: integer
      additive-expression: integer
        number 4: integer
        number 2: integer

Source Code Optimizer -> Intermediate Code:
    t = 4 + 2
    a[index] = t

Code Generator -> Target Code, Target Code Optimizer -> Target Code:
    mov eax, 6
    mov ecx, DWORD PTR _index$[ebp]
    mov DWORD PTR _a$[ebp+ecx*4], eax

Auxiliary components: Literal Table, Symbol Table, Error Handler

SLIDE 14

Algorithmic Tools

• Tokens: defined using regular expressions. (Chapter 2)
• Scanner:
  • an implementation of a finite state machine (deterministic automaton) that recognizes the token regular expressions. (Chapter 2)
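
A minimal sketch of a scanner written as an explicit finite state machine, covering only the token kinds of the extended example (ID, Num, brackets, = and +). The State enum and the emit and scan functions are hypothetical names, not the scanner from the text.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef enum { START, IN_ID, IN_NUM } State;

/* Appends one token name to out, separated by single spaces. */
void emit(char *out, size_t cap, const char *name) {
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "%s%s", used ? " " : "", name);
}

/* Scans src and writes the token names to out.
   Returns 0 on success, -1 on a character no token can start with. */
int scan(const char *src, char *out, size_t cap) {
    assert(src != NULL && out != NULL && cap > 0);
    State state = START;
    out[0] = '\0';
    for (const char *p = src; ; p++) {
        unsigned char c = (unsigned char)*p;
        if (state == IN_ID) {
            if (isalnum(c) || c == '_') continue;  /* stay in IN_ID */
            emit(out, cap, "ID");                  /* c ends the identifier */
            state = START;
        } else if (state == IN_NUM) {
            if (isdigit(c)) continue;              /* stay in IN_NUM */
            emit(out, cap, "Num");                 /* c ends the number */
            state = START;
        }
        /* In START, c either begins a new token or is skipped. */
        if (c == '\0') return 0;
        else if (isspace(c)) continue;
        else if (isalpha(c) || c == '_') state = IN_ID;
        else if (isdigit(c)) state = IN_NUM;
        else if (c == '[') emit(out, cap, "Lbracket");
        else if (c == ']') emit(out, cap, "Rbracket");
        else if (c == '=') emit(out, cap, "AssignOp");
        else if (c == '+') emit(out, cap, "AddOp");
        else return -1;
    }
}
```

Calling scan on "a[index] = 4 + 2" yields the token sequence ID Lbracket ID Rbracket AssignOp Num AddOp Num. Scanner generators build the same kind of machine from the regular expressions automatically, usually as a transition table rather than hand-written branches.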

SLIDE 15

Algorithmic Tools (cont.)

• Parser:
  • A push-down automaton (i.e. uses a stack), based on grammar rules in a standard format (BNF, Backus-Naur Form). (Chapters 3, 4, 5)
• Semantic Analyzer and Code Generator:
  • Recursive evaluators based on semantic rules for attributes (properties of language constructs). (Chapters 6, 7, 8)
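
A minimal sketch of a recursive attribute evaluator, here computing the type attribute for the syntax tree of a[index] = 4 + 2. The Tree, Kind and Type definitions and the annotate function are hypothetical; the declared field stands in for a symbol table lookup.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_ERROR, T_INT, T_INT_ARRAY } Type;
typedef enum { K_NUM, K_ID, K_SUBSCRIPT, K_ADD, K_ASSIGN } Kind;

/* Hypothetical syntax tree node carrying one attribute, its type. */
typedef struct Tree {
    Kind kind;
    Type declared;              /* for K_ID: stand-in for a symbol table lookup */
    struct Tree *left, *right;  /* NULL for leaves */
    Type type;                  /* attribute filled in by annotate() */
} Tree;

/* Computes the type attribute bottom-up: children first, then the
   semantic rule for this node's kind. Returns the node's type. */
Type annotate(Tree *t) {
    assert(t != NULL);
    Type l = t->left ? annotate(t->left) : T_ERROR;
    Type r = t->right ? annotate(t->right) : T_ERROR;
    switch (t->kind) {
    case K_NUM:
        t->type = T_INT;
        break;
    case K_ID:
        t->type = t->declared;
        break;
    case K_SUBSCRIPT:           /* array[int] has the element type */
        t->type = (l == T_INT_ARRAY && r == T_INT) ? T_INT : T_ERROR;
        break;
    case K_ADD:                 /* both operands must be integers */
    case K_ASSIGN:              /* both sides must be integers */
        t->type = (l == T_INT && r == T_INT) ? T_INT : T_ERROR;
        break;
    }
    return t->type;
}
```

Run on the example tree, this reproduces the annotations of the annotated syntax tree slide: a is an array of integer and every other node is integer. A mismatch such as assigning a string to an int would surface as T_ERROR and be reported as a semantic error.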

SLIDE 16

Other Phase Features

• Parser and scanner together typically operate as a unit (parser calls scanner repeatedly to generate tokens).
• Front end:
  • Parser, scanner, semantic analyzer and source code optimizer depend primarily on the source language.
• Back end:
  • Code generator and target code optimizer depend primarily on the target language (machine architecture).

SLIDE 17

Other Classifications

• Logical unit: phase
• Physical unit: separately compiled code file (see later)
• Temporal unit: pass
  • Passes: trips through the source code (or intermediate code). A pass is not the same thing as a phase, although one pass can correspond to one phase.

SLIDE 18

Data Structure Tools

• Syntax tree:
  • see the trees on the previous slides.
• Literal table:
  • "Hello, world!", 3.141592653589793, etc.
  • If a literal is used more than once (as literals often are in a program), we still want to store it only once.
  • So we use a table (almost always a hash table or table of hash tables).
• Symbol table:
  • all names (variables, functions, classes, typedefs, constants, namespaces).
  • Again, a hash table or set of hash tables is the most likely data structure.
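
A minimal sketch of a literal table that stores each literal only once, using a fixed-size hash table with linear probing. The LiteralTable type, TABLE_SIZE, hash_text and intern are hypothetical names; a production table would grow and would own copies of its strings.

```c
#include <assert.h>
#include <string.h>

#define TABLE_SIZE 64   /* hypothetical fixed capacity */

typedef struct {
    const char *slots[TABLE_SIZE];  /* NULL marks an empty slot */
    int count;                      /* number of distinct literals stored */
} LiteralTable;

/* djb2-style string hash. */
unsigned hash_text(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Returns the slot index of text, inserting it only if it is not already
   present. Returns -1 if the table is full. The table keeps the pointer,
   so the caller must keep the string alive. */
int intern(LiteralTable *t, const char *text) {
    assert(t != NULL && text != NULL);
    unsigned start = hash_text(text) % TABLE_SIZE;
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        unsigned slot = (start + i) % TABLE_SIZE;   /* linear probing */
        if (t->slots[slot] == NULL) {               /* not seen before: store it */
            t->slots[slot] = text;
            t->count++;
            return (int)slot;
        }
        if (strcmp(t->slots[slot], text) == 0)      /* seen before: reuse entry */
            return (int)slot;
    }
    return -1;
}
```

Interning "Hello, world!" twice returns the same index both times and leaves count at 1. A symbol table can use the same lookup scheme, with each entry carrying attributes such as type and scope instead of just the text.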

SLIDE 19

Error Handler

• One of the more difficult parts of a compiler to design.
• Must handle a wide range of errors.
• Must handle multiple errors.
• Must not get stuck.
• Must not get into an infinite loop (typical simple-minded strategy: count errors, stop if count gets too high).
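
The simple-minded strategy above (count errors, stop when the count gets too high) might look like this in C. ErrorHandler, MAX_ERRORS and report_error are hypothetical names, and the limit of 10 is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_ERRORS 10   /* hypothetical limit */

typedef struct {
    int count;          /* errors reported so far */
} ErrorHandler;

/* Reports one error on stderr and counts it.
   Returns 1 if compilation may continue, 0 once the limit is reached. */
int report_error(ErrorHandler *h, int line, const char *message) {
    assert(h != NULL && message != NULL);
    fprintf(stderr, "line %d: %s\n", line, message);
    h->count++;
    return h->count < MAX_ERRORS;
}
```

Each phase would check the return value and abandon the compilation when it is 0, which guarantees termination even if error recovery keeps producing follow-on errors.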

SLIDE 20

Kinds of Errors

• Syntax:
  • iff (x == 0) y + = z + r; }
• Semantic:
  • int x = "Hello, world!";
• Runtime:
  • int x = 2;
  • ...
  • double y = 3.14159 / (x - 2);

SLIDE 21

Errors (cont.)

• A compiler must handle syntax and semantic errors, but not runtime errors (whether a runtime error will occur is, in general, an undecidable question).
• Sometimes a compiler is required to generate code to catch runtime errors and handle them in some graceful way (either with or without exception handling).
• This, too, is often difficult.

SLIDE 22

Sample Compilers in This Class ("Toys")

• TINY: a 4-pass compiler for the TINY language, based on Pascal (see text, pages 22-26).
• C-Minus: a project language given in the text (see text, pages 26-27 and Appendix A). Based on C.
• SIL: Simple Island Language:

SLIDE 23

TINY Example

read x;
if x > 0 then
  fact := 1;
  repeat
    fact := fact * x;
    x := x - 1
  until x = 0;
  write fact
end

SLIDE 24

C-Minus Example

int fact( int x )
{
  if (x > 1)
    return x * fact(x-1);
  else
    return 1;
}

void main( void )
{
  int x;
  x = read();
  if (x > 0) write( fact(x) );
}

SLIDE 25

Structure of the TINY Compiler

globals.h    main.c
util.h       util.c
scan.h       scan.c
parse.h      parse.c
symtab.h     symtab.c
analyze.h    analyze.c
code.h       code.c
cgen.h       cgen.c

SLIDE 26

Conditional Compilation Options

• NO_PARSE:
  • Builds a scanner-only compiler.
• NO_ANALYZE:
  • Builds a compiler that parses and scans only.
• NO_CODE:
  • Builds a compiler that performs semantic analysis, but generates no code.
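
A sketch of how flags like these could gate the phases through the C preprocessor. This is not the TINY compiler's actual driver: phases_in_build and the RAN_ constants are hypothetical, and the flags are assumed here to be macros that default to 0.

```c
#include <assert.h>

/* Assumed defaults: define any flag as 1 to build a reduced compiler. */
#ifndef NO_PARSE
#define NO_PARSE 0
#endif
#ifndef NO_ANALYZE
#define NO_ANALYZE 0
#endif
#ifndef NO_CODE
#define NO_CODE 0
#endif

enum { RAN_SCAN = 1, RAN_PARSE = 2, RAN_ANALYZE = 4, RAN_CODEGEN = 8 };

/* Stand-in for the driver: returns a bitmask of the phases that this
   build would run. The nesting mirrors the dependency between phases. */
int phases_in_build(void) {
    int ran = RAN_SCAN;             /* every build scans */
#if !NO_PARSE
    ran |= RAN_PARSE;
#if !NO_ANALYZE
    ran |= RAN_ANALYZE;
#if !NO_CODE
    ran |= RAN_CODEGEN;
#endif
#endif
#endif
    /* A later phase never runs without the phase before it. */
    assert(!(ran & RAN_CODEGEN) || (ran & RAN_ANALYZE));
    assert(!(ran & RAN_ANALYZE) || (ran & RAN_PARSE));
    return ran;
}
```

Because the #if blocks are nested, setting NO_PARSE to 1 also drops analysis and code generation, matching the idea that a scanner-only build has nothing to analyze.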

SLIDE 27

Listing Options (built in - not flags)

• EchoSource:
  • Echoes the TINY source program to the listing, together with line numbers.
• TraceScan:
  • Displays information on each token as the scanner recognizes it.
• TraceParse:
  • Displays the syntax tree in a linearized format.
• TraceAnalyze:
  • Displays summary information on the symbol table and type checking.
• TraceCode:
  • Prints code generation-tracing comments to the code file.

SLIDE 28

Terminology Review

• Compiler
• Interpreter
• Translator
• Assembler
• Linker
• Loader
• Preprocessor
• Editor
• Debugger
• Profiler
• Source Language
• Target Language
• Target Platform
• Relocatable
• Macro substitution
• IDE
• Cross Compiler
• Disassembler
• Front End
• Back End