Data Layouts: Data Structures For a Simple Compiler, Joseph Bergin



slide-1
SLIDE 1

Joseph Bergin 1/12/99 1

Data Layouts

Data Structures For a Simple Compiler

slide-2
SLIDE 2

Joseph Bergin 1/12/99 2

Symbol Tables

Information about user defined names

slide-3
SLIDE 3

Joseph Bergin 1/12/99 3

Symbol Table

  • Symbol Tables are organized for fast lookup.
    - Items are typically entered once and then looked up several times.
    - Hash Tables and Balanced Binary Search Trees are commonly used.
    - Each record contains a "name" (symbol) and information describing it.

slide-4
SLIDE 4

Joseph Bergin 1/12/99 4

Simple Hash Table

  • Hasher translates "name" into an integer in a fixed range: the hash value.
  • Hash value indexes into an array of lists.
    - The entry with that symbol is in that list or is not stored at all.
    - Items with the same hash value form a bucket.
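The scheme on this slide can be sketched as a small chained hash table. This is only an illustration: the class name `SymbolTable`, the method names, and the multiply-by-31 hash function are assumptions, not taken from the slides.

```python
class SymbolTable:
    """Hash table with chaining: an array of lists (buckets)."""

    def __init__(self, max_buckets=8):
        self.max = max_buckets
        self.buckets = [[] for _ in range(max_buckets)]

    def hasher(self, name):
        # Translate the name into an integer in the fixed range 0..max-1.
        h = 0
        for ch in name:
            h = (h * 31 + ord(ch)) % self.max
        return h

    def enter(self, name, info):
        # The entry lives in the bucket selected by its hash value.
        bucket = self.buckets[self.hasher(name)]
        for i, (stored_name, _) in enumerate(bucket):
            if stored_name == name:
                bucket[i] = (name, info)  # replace an existing entry
                return
        bucket.append((name, info))

    def lookup(self, name):
        # Either the entry is in this one bucket or it is not stored at all.
        for stored_name, info in self.buckets[self.hasher(name)]:
            if stored_name == name:
                return info
        return None
```

Only one bucket is ever searched, so lookup cost depends on bucket length rather than on the total number of symbols.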

slide-5
SLIDE 5

Joseph Bergin 1/12/99 5

Simple Hash Table

[Figure: anObject -> hasher -> index into an array of max buckets]

slide-6
SLIDE 6

Joseph Bergin 1/12/99 6

Self Organizing Hash Table

  • Can achieve constant average time lookup if buckets have bounded average length.
  • Can guarantee this if we periodically double the number of hash buckets and re-hash all elements.
    - Can be done so as to minimize movement of items.
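One way to realize the doubling step is sketched below, assuming the index is computed as a raw hash value modulo the bucket count. Under that assumption an item in bucket n can only stay in n or move to n + max when the table doubles, which keeps movement small. The names `GrowingTable`, `raw_hash`, and `max_avg_len` are illustrative, not from the slides.

```python
def raw_hash(name):
    # Unbounded non-negative hash; the table reduces it modulo its size.
    h = 0
    for ch in name:
        h = h * 31 + ord(ch)
    return h


class GrowingTable:
    def __init__(self, max_buckets=4, max_avg_len=2):
        self.max = max_buckets
        self.max_avg_len = max_avg_len  # bound on average bucket length
        self.count = 0
        self.buckets = [[] for _ in range(self.max)]

    def index(self, name):
        return raw_hash(name) % self.max

    def enter(self, name, info):
        bucket = self.buckets[self.index(name)]
        for i, (stored_name, _) in enumerate(bucket):
            if stored_name == name:
                bucket[i] = (name, info)
                return
        bucket.append((name, info))
        self.count += 1
        if self.count > self.max_avg_len * self.max:
            self._double()

    def _double(self):
        old_max = self.max
        self.max = 2 * old_max
        self.buckets.extend([] for _ in range(old_max))
        for n in range(old_max):
            stay, move = [], []
            for entry in self.buckets[n]:
                # With the doubled size the new index is n or n + old_max.
                (stay if self.index(entry[0]) == n else move).append(entry)
            self.buckets[n] = stay
            self.buckets[n + old_max] = move

    def lookup(self, name):
        for stored_name, info in self.buckets[self.index(name)]:
            if stored_name == name:
                return info
        return None
```

Each old bucket is split in place, so no item is compared against buckets other than its own two candidates.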

slide-7
SLIDE 7

Joseph Bergin 1/12/99 7

Self Organizing Hash Table

[Figure: the table grows from max to 2 * max buckets. An object that hasher mapped to index n is mapped by newhasher to index n or index n + max.]

slide-8
SLIDE 8

Joseph Bergin 1/12/99 8

Balanced Binary Search Tree

  • Binary search trees work if they are kept balanced.
  • Can achieve logarithmic lookup time.
  • Algorithms are somewhat complex.
    - Red-black trees and AVL trees are used.
    - No leaf is much farther from the root than any other.

slide-9
SLIDE 9

Joseph Bergin 1/12/99 9

Balanced Binary Search Tree

slide-10
SLIDE 10

Joseph Bergin 1/12/99 10

Symbol Tables + Blocks

  • If a language is block structured then each block (scope) needs to be represented separately in the symbol table.
  • If the hash table buckets are "stack-like" this is automatic.
  • Can use a stack of balanced trees with one entry per scope.
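A minimal sketch of the stack-of-tables idea follows. It uses a Python `dict` per scope as a stand-in for the balanced tree the slide mentions; the class and method names are assumptions made for the example.

```python
class ScopedSymbolTable:
    """A stack of per-scope tables; the innermost scope is on top."""

    def __init__(self):
        self.scopes = [{}]  # the outermost (global) scope

    def open_scope(self):
        self.scopes.append({})

    def close_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot close the outermost scope")
        self.scopes.pop()

    def enter(self, name, info):
        # New names always go into the innermost scope.
        self.scopes[-1][name] = info

    def lookup(self, name):
        # Search from the innermost scope outward, so inner
        # declarations hide outer ones with the same name.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None
```

Closing a scope discards all of its names at once, which is what makes the stack arrangement convenient for block-structured languages.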

slide-11
SLIDE 11

Joseph Bergin 1/12/99 11

Special Cases

  • Some languages partition names into different classes: keywords, variable and function names, struct names...
  • Separate symbol tables can then be used for each kind of name. The different symbol tables might have different characteristics.
    - hash table, sorted list, binary tree...

slide-12
SLIDE 12

Joseph Bergin 1/12/99 12

Parsing Information

slide-13
SLIDE 13

Joseph Bergin 1/12/99 13

Parse Trees

  • The structure of a modern computer language is tree-like.
  • Trees represent recursion well.
  • A grammatical structure is a node with its parts as child nodes.
  • Interior nodes are nonterminals.
  • The tokens of the language are leaves.

slide-14
SLIDE 14

Joseph Bergin 1/12/99 14

Parse Trees

<statement> ::= <variable> ":=" <expression>

x := a + 5

statement
  variable
    x
  :=
  expression
    a + 5

slide-15
SLIDE 15

Joseph Bergin 1/12/99 15

Parse Trees

  • There are different node types in the same tree.
  • Variant records or type unions are typically used. Object-orientation is also useful here.
  • Each node has a tag that distinguishes it, permitting testing on node type.
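As an illustration of tagged nodes, the sketch below builds a parse tree for the statement x := a + 5 and walks it. The `Node` layout and the tag names (`"identifier"`, `"assign"`, `"addoperator"`, `"numeral"`) are assumptions chosen for the example; the slides do not specify a representation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str                                       # distinguishes the node type
    children: list = field(default_factory=list)   # empty for leaves
    text: str = ""                                 # spelling, used by token leaves


def leaf(tag, text):
    return Node(tag, [], text)


def leaves(node):
    """Return the spellings of the tokens at the leaves, left to right."""
    if not node.children:
        return [node.text]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


# Interior nodes are nonterminals; tokens are leaves.
tree = Node("statement", [
    Node("variable", [leaf("identifier", "x")]),
    leaf("assign", ":="),
    Node("expression", [
        leaf("identifier", "a"),
        leaf("addoperator", "+"),
        leaf("numeral", "5"),
    ]),
])
```

Reading the leaves left to right recovers the original token sequence, and code that processes the tree can branch on `node.tag`.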
slide-16
SLIDE 16

Joseph Bergin 1/12/99 16

Parse Stack

  • Parsing is often accomplished with a stack. (Not in this version of GCL)
  • The stack holds values representing tokens, nonterminals and semantic symbols from the grammar.
    - It can either hold what is expected next (LL parsing) or what has already been seen (LR parsing).

slide-17
SLIDE 17

Joseph Bergin 1/12/99 17

Parse Stack

  • A stack is used because most languages and their grammars are recursive. Stacks can accomplish much of what trees can.
  • The contents of the stack are usually numeric encodings of the symbols for compactness of representation and speed of processing.

slide-18
SLIDE 18

Joseph Bergin 1/12/99 18

Parse Stack

Grammar fragment: <statement> ::= <variable> ":=" <expression> #doAssign

Example being scanned: max := max + 1;

Parse stack: <var> ":=" <expr> #doAs ...
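A small LL-style sketch of such a parse stack is shown below. The stack holds numeric codes for tokens, nonterminals and one semantic symbol. The numeric values, the extra nonterminals (`MORE`, `OPERAND`), the trailing semicolon in the statement rule, and the parse table are all assumptions invented for this example; they are not the GCL grammar.

```python
# Numeric encodings (hypothetical): tokens < 100, nonterminals 100..199,
# semantic symbols >= 200.
IDENT, NUMERAL, ASSIGN, PLUS, SEMI = 1, 2, 3, 4, 5
STATEMENT, VARIABLE, EXPRESSION, MORE, OPERAND = 100, 101, 102, 103, 104
DO_ASSIGN = 200

# (nonterminal, lookahead token) -> right-hand side to expect next.
TABLE = {
    (STATEMENT, IDENT): [VARIABLE, ASSIGN, EXPRESSION, DO_ASSIGN, SEMI],
    (VARIABLE, IDENT): [IDENT],
    (EXPRESSION, IDENT): [OPERAND, MORE],
    (EXPRESSION, NUMERAL): [OPERAND, MORE],
    (OPERAND, IDENT): [IDENT],
    (OPERAND, NUMERAL): [NUMERAL],
    (MORE, PLUS): [PLUS, OPERAND, MORE],
    (MORE, SEMI): [],
}


def parse(tokens):
    """Parse a list of token codes; return the semantic symbols encountered."""
    stack = [STATEMENT]   # what is expected next (LL parsing)
    pos = 0
    actions = []
    while stack:
        top = stack.pop()
        if top >= 200:                       # semantic symbol
            actions.append(top)
        elif top >= 100:                     # nonterminal: expand it
            look = tokens[pos] if pos < len(tokens) else None
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise SyntaxError(f"unexpected token at position {pos}")
            stack.extend(reversed(rhs))      # leftmost symbol ends up on top
        else:                                # token: must match the input
            if pos >= len(tokens) or tokens[pos] != top:
                raise SyntaxError(f"token {top} expected at position {pos}")
            pos += 1
    if pos != len(tokens):
        raise SyntaxError("input continues after the statement")
    return actions
```

For max := max + 1; the token codes are `[IDENT, ASSIGN, IDENT, PLUS, NUMERAL, SEMI]`. After the first expansion the stack holds the codes for variable, ":=", expression and the semantic symbol, matching the stack contents shown on the slide.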

slide-19
SLIDE 19

Joseph Bergin 1/12/99 19

Stack vs Parameters

  • In recursive descent parsing, no stack is needed.
  • This is because the semantic records can be passed directly to the semantic routines as parameters.
  • Semantic records can also be returned from the parsing functions.
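The sketch below shows the idea for a tiny assignment statement: each parsing function returns a semantic record, and the records are handed to semantic routines as ordinary parameters. The record shapes (plain tuples), the token format, and the routine names `do_add` and `do_assign` are assumptions for illustration only.

```python
def do_add(left, op, right):
    # Semantic routine: combines two operand records into a new record.
    return ("binary", op, left, right)


def do_assign(target, value):
    # Semantic routine: receives its records directly as parameters.
    return ("assign", target, value)


def parse_operand(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("operand expected at end of input")
    kind, text = tokens[pos]
    if kind == "identifier":
        return ("variable", text), pos + 1
    if kind == "numeral":
        return ("const", int(text)), pos + 1
    raise SyntaxError(f"operand expected, found {text!r}")


def parse_expression(tokens, pos):
    left, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos][0] == "addoperator":
        op = tokens[pos][1]
        right, pos = parse_operand(tokens, pos + 1)
        left = do_add(left, op, right)
    return left, pos          # the record is returned, not pushed on a stack


def parse_statement(tokens, pos=0):
    if pos >= len(tokens) or tokens[pos][0] != "identifier":
        raise SyntaxError("variable expected")
    target = ("variable", tokens[pos][1])
    if pos + 1 >= len(tokens) or tokens[pos + 1][0] != "assign":
        raise SyntaxError("':=' expected")
    value, pos = parse_expression(tokens, pos + 2)
    return do_assign(target, value), pos
```

The call stack of the host language plays the role that an explicit parse or semantic stack would otherwise play.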

slide-20
SLIDE 20

Joseph Bergin 1/12/99 20

Tokens

Information produced by the Scanner

slide-21
SLIDE 21

Joseph Bergin 1/12/99 21

Token Records

  • Token records pass information about symbols scanned. This varies by token type.
  • Variant records or type unions are typically used.
  • Each value contains a tag (the token type) and additional information.
    - The tag is usually an integer.

slide-22
SLIDE 22

Joseph Bergin 1/12/99 22

Token Examples

  • Simple tokens
    - No additional info
    - Only the tag field
    - Example: endNum
  • Others are more complex
    - Tag plus other info
    - Example: numeralNum, 35
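A token record of this kind can be sketched as an integer tag plus an optional value. The numeric tag values, the extra `IDENTIFIER_NUM` tag, and the whitespace-splitting `scan` helper are assumptions for the example and do not describe the real scanner.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical integer tags; the slides name endNum and numeralNum
# but do not give their values.
END_NUM, NUMERAL_NUM, IDENTIFIER_NUM = 1, 2, 3


@dataclass(frozen=True)
class Token:
    tag: int                              # the token type
    value: Union[int, str, None] = None   # extra info, only for some tags


def scan(text):
    """Toy scanner: splits on whitespace and classifies each word."""
    tokens = []
    for word in text.split():
        if word == "end":
            tokens.append(Token(END_NUM))                 # tag only
        elif word.isdigit():
            tokens.append(Token(NUMERAL_NUM, int(word)))  # tag plus value
        else:
            tokens.append(Token(IDENTIFIER_NUM, word))    # tag plus spelling
    return tokens
```

A keyword such as end needs nothing beyond its tag, while a numeral also carries its numeric value.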

slide-23
SLIDE 23

Joseph Bergin 1/12/99 23

Handling Strings

  • Strings are variable length and therefore present some problems.
  • In C we can allocate a free-store object to hold the spelling, BUT allocation is expensive in time.
  • In Pascal, allocating fixed length strings is wasteful.
  • Spell buffers are an alternative.
slide-24
SLIDE 24

Joseph Bergin 1/12/99 24

Strings in the Free Store

write "The answer is: ", x;

strval = new char[16];

The answer is: \0

The string is represented by the value of the pointer, which can be passed around the compiler.

slide-25
SLIDE 25

Joseph Bergin 1/12/99 25

Strings in a Spell Buffer

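write "The answer is: ", x;

[Figure: the spell buffer before and after the string is scanned. Before, it holds N a m e; after, it holds N a m e followed by T h e  a n s w e r  i s : . The positions 3 and 18 are marked.]

The string is represented as (3,15) = (start, length).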
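A spell buffer can be sketched as one growing character array in which every spelling is identified by a (start, length) pair. The class below is an illustration under assumed names; it uses 0-based positions, so its start values need not match the numbering used in the slide's figure.

```python
class SpellBuffer:
    """One shared character buffer; strings are (start, length) pairs."""

    def __init__(self):
        self.chars = []

    def add(self, spelling):
        # Append the characters and return a reference to them.
        start = len(self.chars)
        self.chars.extend(spelling)
        return (start, len(spelling))

    def spelling(self, ref):
        # Recover the text from a (start, length) reference.
        start, length = ref
        return "".join(self.chars[start:start + length])
```

The pair is small and fixed in size, so it can be stored in token and symbol table records without a separate allocation per string.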

slide-26
SLIDE 26

Joseph Bergin 1/12/99 26

Semantic Information

slide-27
SLIDE 27

Joseph Bergin 1/12/99 27

Semantic Information

  • Parsing and semantic routines need to share information.
  • This information can be passed as function parameters or a semantic stack can be used.
  • There are different kinds of semantic information.
    - Variant Records/Type Unions/Objects

slide-28
SLIDE 28

Joseph Bergin 1/12/99 28

Semantic Records

  • Each record needs a tag to distinguish its kind. We need to test the tag types.
  • Depending on the tag there will be additional information.
  • Sometimes the additional information must itself be a tagged union/variant record.

slide-29
SLIDE 29

Joseph Bergin 1/12/99 29

Simple Semantic Records

identifier    maximum  7
addoperator   +
reloperator   <=
ifentry       J35  J36

slide-30
SLIDE 30

Joseph Bergin 1/12/99 30

Complex Semantic Records

typeentry   integer  2
exprentry   const  33
exprentry   variable  0, 6  false

* see types (later)

slide-31
SLIDE 31

Joseph Bergin 1/12/99 31

Semantic Stack

In some compilers semantic records are stored in a semantic stack. In others, they are passed as parameters.

typeentry   integer  2
identifier  maximum  7
identifier  value    5    <- stacktop
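The stack pictured on this slide can be sketched with tagged records and an ordinary list. The `SemanticRecord` layout and the `declare_variables` routine are assumptions made for the example; the second field of each identifier record (7, 5) is simply copied from the slide, where it appears to be the length of the spelling.

```python
from dataclasses import dataclass


@dataclass
class SemanticRecord:
    tag: str          # kind of record: "typeentry", "identifier", ...
    info: tuple = ()  # additional information; its shape depends on the tag


def declare_variables(stack):
    """Pop identifier records down to the nearest typeentry.

    Returns (name, type name, size) triples in declaration order.
    """
    names = []
    while stack and stack[-1].tag == "identifier":
        names.append(stack.pop().info[0])
    if not stack or stack[-1].tag != "typeentry":
        raise ValueError("typeentry expected below the identifiers")
    type_name, size = stack.pop().info
    return [(name, type_name, size) for name in reversed(names)]


def example_stack():
    # The stack from the slide; the last element is the stack top.
    return [
        SemanticRecord("typeentry", ("integer", 2)),
        SemanticRecord("identifier", ("maximum", 7)),
        SemanticRecord("identifier", ("value", 5)),
    ]
```

The routine tests each record's tag before using its contents, which is the reason every record carries one.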

slide-32
SLIDE 32

Joseph Bergin 1/12/99 32

Type Information

slide-33
SLIDE 33

Joseph Bergin 1/12/99 33

Type Information

  • Type information must be maintained for variables and parameters.
  • There are different kinds of types.
    - Variant Records/Type Unions/Objects
  • There are different typing rules in different languages.
    - Pointers to records/structs are a simple representation.

slide-34
SLIDE 34

Joseph Bergin 1/12/99 34

Type Information

  • Types describe variables.
    - size of a variable of this type (in bytes)
    - kind (tag)
    - additional information for some types.
  • There are also recursive types.
slide-35
SLIDE 35

Joseph Bergin 1/12/99 35

Simple Types

integer    2
Boolean    2
character  1

The tag and the size are enough.

slide-36
SLIDE 36

Joseph Bergin 1/12/99 36

Tuple Type

[integer, Boolean]

tuple 4
  integer 2
  Boolean 2

slide-37
SLIDE 37

Joseph Bergin 1/12/99 37

Recursive Types

[integer, [integer, Boolean]]

tuple 6
  integer 2
  tuple 4
    ...
    ...
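The tuple descriptors can be sketched as records holding a tag, a size, and component descriptors, with the size of a tuple computed as the sum of its components. The `TypeDescriptor` class and `tuple_type` helper are illustrative names, and the 2-byte integer and Boolean sizes are the ones used on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class TypeDescriptor:
    tag: str          # kind of type
    size: int         # size in bytes of a variable of this type
    components: list = field(default_factory=list)  # only for some kinds


# Simple types: the tag and the size are enough.
INTEGER = TypeDescriptor("integer", 2)
BOOLEAN = TypeDescriptor("Boolean", 2)
CHARACTER = TypeDescriptor("character", 1)


def tuple_type(*components):
    # A tuple is as large as its components laid end to end
    # (assuming no padding between them).
    return TypeDescriptor("tuple", sum(c.size for c in components),
                          list(components))


pair = tuple_type(INTEGER, BOOLEAN)    # [integer, Boolean]
nested = tuple_type(INTEGER, pair)     # [integer, [integer, Boolean]]
```

Because a component may itself be a tuple descriptor, the same structure covers the nested case without special handling.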

slide-38
SLIDE 38

Joseph Bergin 1/12/99 38

Range Types

integer range[1..10]

range 2  1, 10  ...
  integer 2

slide-39
SLIDE 39

Joseph Bergin 1/12/99 39

Array Types

Boolean array[1..10][0..4]

array 100  1, 10
  array 10  0, 4
    Boolean 2
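The sizes on this slide follow from multiplying the number of elements by the element size at each level. The sketch below checks that arithmetic; the `ArrayType` and `SimpleType` classes are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SimpleType:
    tag: str
    size: int


@dataclass
class ArrayType:
    low: int            # lower bound of the index range
    high: int           # upper bound of the index range
    element: object     # descriptor of the element type
    tag: str = "array"

    @property
    def size(self):
        # number of elements times the size of one element
        return (self.high - self.low + 1) * self.element.size


BOOLEAN = SimpleType("Boolean", 2)

# Boolean array[1..10][0..4]: an array of 10 rows, each an array of 5 Booleans.
row = ArrayType(0, 4, BOOLEAN)
matrix = ArrayType(1, 10, row)
```

The inner array takes 5 * 2 = 10 bytes and the outer one 10 * 10 = 100 bytes, matching the figures in the descriptor tree.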

slide-40
SLIDE 40

Joseph Bergin 1/12/99 40

Array Types (alternate)

Boolean array [range1] [range2]

array 100
  range 2  1, 10
    integer 2
  array 10
    range 2  0, 4
      integer 2
    Boolean 2

slide-41
SLIDE 41

Joseph Bergin 1/12/99 41

Record Types

record [integer x, Boolean y]

record 4
  x: integer 2
  y: Boolean 2

Note similarity to tuple types.

slide-42
SLIDE 42

Joseph Bergin 1/12/99 42

Pointer Types

pointer [integer, Boolean]

pointer 2
  tuple 4
    integer 2
    Boolean 2

slide-43
SLIDE 43

Joseph Bergin 1/12/99 43

Procedure Types

proc (integer, Boolean)

proc 2 ...
  integer 2
  Boolean 2

Note: Not all languages have procedure types even when they have procedures.

slide-44
SLIDE 44

Joseph Bergin 1/12/99 44

Function Types

func (integer returns [integer, Boolean])

func 2 ...
  parameter: integer 2
  result: tuple 4
    integer 2
    Boolean 2

Note: Not all languages have function types even when they have functions.

slide-45
SLIDE 45

Joseph Bergin 1/12/99 45

Self Recursive Types

Some languages (Java, Modula-3) permit a type to reference itself:

class node { int value; node next; }

class 8
  value: int 4
  next: refers back to this class descriptor

The internal representation is a pointer (4 bytes)
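A self-referencing descriptor can be sketched by letting a field's type point back at the descriptor that contains it. Since such a field is stored as a pointer, computing the size never needs to follow the cycle. The class names and the `field_size` helper are assumptions; the 4-byte int and pointer sizes are the ones given on the slide.

```python
POINTER_SIZE = 4  # bytes, as stated on the slide


class Primitive:
    def __init__(self, name, size):
        self.name = name
        self.size = size


class ClassType:
    def __init__(self, name):
        self.name = name
        self.fields = []  # (field name, field type descriptor) pairs

    def add_field(self, field_name, field_type):
        self.fields.append((field_name, field_type))

    @property
    def size(self):
        return sum(field_size(t) for _, t in self.fields)


def field_size(field_type):
    # A field of class type holds a pointer to the object, not the object,
    # so its size is fixed even when the class refers to itself.
    if isinstance(field_type, ClassType):
        return POINTER_SIZE
    return field_type.size


INT = Primitive("int", 4)

# class node { int value; node next; }
node = ClassType("node")
node.add_field("value", INT)
node.add_field("next", node)   # the descriptor refers to itself
```

The descriptor graph contains a cycle, but the size calculation terminates because it stops at every pointer.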

slide-46
SLIDE 46

Joseph Bergin 1/12/99 46

Recursive Types Again

[ record [integer array[0..4] x, Boolean y],
  integer range [1..10],
  pointer [integer, integer],
  func(integer, Boolean returns integer array[1..5]) ]

Left as an exercise. :-)