
Data Layouts: Data Structures For a Simple Compiler, by Joseph Bergin



  1. Data Layouts: Data Structures For a Simple Compiler (Joseph Bergin, 1/12/99)

  2. Symbol Tables
     Information about user defined names

  3. Symbol Table
     ● Symbol Tables are organized for fast lookup.
       - Items are typically entered once and then looked up several times.
       - Hash Tables and Balanced Binary Search Trees are commonly used.
       - Each record contains a "name" (symbol) and information describing it.

  4. Simple Hash Table
     ● Hasher translates "name" into an integer in a fixed range: the hash value.
     ● Hash Value indexes into an array of lists.
       - Entry with that symbol is in that list or is not stored at all.
       - Items with same hash value = bucket.

  5. Simple Hash Table
     (Diagram: hasher maps anObject to an index between 0 and max; each index holds a bucket.)
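The hasher-plus-buckets arrangement on the two slides above can be sketched as follows. This is a minimal illustration, not code from the presentation: the class name `SymbolTable`, the methods `enter` and `lookup`, and the particular hash function are all assumptions.

```python
class SymbolTable:
    """Chained hash table: an array of buckets, each a list of (name, info)."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _hasher(self, name):
        # Translate the name into an integer in the fixed range 0..size-1.
        h = 0
        for ch in name:
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return h

    def enter(self, name, info):
        bucket = self.buckets[self._hasher(name)]
        for i, (existing, _) in enumerate(bucket):
            if existing == name:
                bucket[i] = (name, info)  # replace an existing entry
                return
        bucket.append((name, info))

    def lookup(self, name):
        # The entry is in this one bucket or it is not stored at all.
        for existing, info in self.buckets[self._hasher(name)]:
            if existing == name:
                return info
        return None
```

A lookup only ever scans one bucket, which is why keeping buckets short keeps lookups fast.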

  6. Self Organizing Hash Table
     ● Can achieve constant average time lookup if buckets have bounded average length.
     ● Can guarantee this if we periodically double number of hash buckets and re-hash all elements.
       - Can be done so as to minimize movement of items.

  7. Self Organizing Hash Table
     (Diagram: hasher maps anObject to an index n between 0 and max; after doubling, newhasher maps it to n or n + max in a table of 2 * max entries.)
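One way to get the "stay at n or move to n + max" behaviour is to compute a wide hash once and reduce it modulo the current table size. The sketch below assumes that approach; `full_hash`, `GrowingTable`, and the `max_load` threshold are hypothetical names and parameters, not taken from the slides.

```python
def full_hash(name):
    # A wide hash value that does not depend on the table size.
    h = 0
    for ch in name:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


class GrowingTable:
    def __init__(self, size=4, max_load=2.0):
        self.buckets = [[] for _ in range(size)]
        self.count = 0
        self.max_load = max_load  # bound on the average bucket length

    def enter(self, name, info):
        bucket = self.buckets[full_hash(name) % len(self.buckets)]
        for entry in bucket:
            if entry[0] == name:
                entry[1] = info
                return
        bucket.append([name, info])
        self.count += 1
        if self.count > self.max_load * len(self.buckets):
            self._double()

    def _double(self):
        old = self.buckets
        size = len(old)
        self.buckets = [[] for _ in range(2 * size)]
        for n, bucket in enumerate(old):
            for entry in bucket:
                new_index = full_hash(entry[0]) % (2 * size)
                # Each item either stays at n or moves to n + size.
                assert new_index in (n, n + size)
                self.buckets[new_index].append(entry)

    def lookup(self, name):
        for existing, info in self.buckets[full_hash(name) % len(self.buckets)]:
            if existing == name:
                return info
        return None
```

Because an old bucket only ever splits between two new buckets, re-hashing never scatters items across the whole table.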

  8. Balanced Binary Search Tree
     ● Binary search trees work if they are kept balanced.
     ● Can achieve logarithmic lookup time.
     ● Algorithms are somewhat complex.
       - Red-black trees and AVL trees are used.
       - No leaf is much farther from root than any other.

  9. Balanced Binary Search Tree

  10. Symbol Tables + Blocks
      ● If a language is block structured then each block (scope) needs to be represented separately in the symbol table.
      ● If the hash table buckets are "stack-like" this is automatic.
      ● Can use a stack of balanced trees with one entry per scope.
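The "one table per scope, kept on a stack" idea can be sketched like this. Note the substitution: the slide suggests a stack of balanced trees, while this sketch uses Python dictionaries (hash tables) for each scope to stay short. The names `ScopedSymbolTable`, `open_scope`, and `close_scope` are assumptions.

```python
class ScopedSymbolTable:
    def __init__(self):
        self.scopes = [{}]  # the outermost (global) scope

    def open_scope(self):
        self.scopes.append({})

    def close_scope(self):
        self.scopes.pop()  # every name declared in the block disappears

    def enter(self, name, info):
        self.scopes[-1][name] = info

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None
```

An inner declaration hides an outer one with the same name until its block is closed.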

  11. Special Cases
      ● Some languages partition names into different classes: keywords, variable and function names, struct names...
      ● Separate symbol tables can then be used for each kind of name. The different symbol tables might have different characteristics.
        - hash table, sorted list, binary tree...

  12. Parsing Information

  13. Parse Trees
      ● The structure of a modern computer language is tree-like.
      ● Trees represent recursion well.
      ● A grammatical structure is a node with its parts as child nodes.
      ● Interior nodes are nonterminals.
      ● The tokens of the language are leaves.

  14. Parse Trees
      Rule: <statement> ::= <variable> ":=" <expression>
      Example: x := a + 5
      Tree: statement, with children variable (x), := and expression (a + 5)

  15. Parse Trees
      ● There are different node types in the same tree.
      ● Variant records or type unions are typically used. Object-orientation is also useful here.
      ● Each node has a tag that distinguishes it, permitting testing on node type.
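A tagged tree node for the `x := a + 5` example might look like the sketch below. The `Node` class, the tag spellings, and the `frontier` helper are illustrative assumptions; the slide's tree leaves `a + 5` unexpanded, while this sketch splits it into three token leaves.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str                                       # distinguishes the node type
    children: list = field(default_factory=list)   # empty for leaves
    text: str = ""                                 # spelling, used by token leaves


def leaf(tag, text):
    return Node(tag, text=text)


def frontier(node):
    """Collect the token spellings at the leaves, left to right."""
    if not node.children:
        return [node.text]
    result = []
    for child in node.children:
        result.extend(frontier(child))
    return result


tree = Node("statement", [
    Node("variable", [leaf("identifier", "x")]),
    leaf("assign", ":="),
    Node("expression", [
        leaf("identifier", "a"),
        leaf("plus", "+"),
        leaf("numeral", "5"),
    ]),
])
```

Interior nodes carry nonterminal tags and the leaves carry tokens, so reading the leaves left to right gives back the source text.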

  16. Parse Stack
      ● Parsing is often accomplished with a stack. (Not in this version of GCL)
      ● The stack holds values representing tokens, nonterminals and semantic symbols from the grammar.
        - It can either hold what is expected next (LL parsing) or what has already been seen (LR parsing).

  17. Parse Stack
      ● A stack is used because most languages and their grammars are recursive. Stacks can accomplish much of what trees can.
      ● The contents of the stack are usually numeric encodings of the symbols for compactness of representation and speed of processing.

  18. Parse Stack
      Grammar fragment: <statement> ::= <variable> ":=" <expression> #doAssign
      Stack contents: <var>, ":=", <expr>, #doAs
      Example being scanned: max := max + 1; ...
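An LL-style parse stack for the grammar fragment above can be sketched as follows. Several things here are assumptions made for illustration: the rules for `<variable>`, `<expression>`, `<operand>`, and `<rest>` are invented to make the fragment complete, tokens are `(kind, text)` pairs, and symbols are kept as strings for readability where a production compiler would use numeric encodings.

```python
def parse_statement(tokens):
    """Predictive parse of one statement; returns (semantic actions, tokens consumed)."""
    stack = ["<statement>"]   # holds what is expected next
    pos = 0
    actions = []
    while stack:
        top = stack.pop()
        kind = tokens[pos][0] if pos < len(tokens) else "eof"
        if top.startswith("#"):
            actions.append(top)            # semantic symbol: record the action
        elif top.startswith("<"):
            # Nonterminal: replace it with the right-hand side of a rule.
            if top == "<statement>":
                rhs = ["<variable>", ":=", "<expression>", "#doAssign"]
            elif top == "<variable>":
                rhs = ["id"]
            elif top == "<expression>":
                rhs = ["<operand>", "<rest>"]
            elif top == "<operand>":
                if kind not in ("id", "num"):
                    raise SyntaxError("operand expected")
                rhs = [kind]
            else:  # <rest>
                rhs = ["+", "<operand>", "<rest>"] if kind == "+" else []
            stack.extend(reversed(rhs))    # leftmost symbol ends up on top
        else:
            # Terminal: it must match the current token.
            if kind != top:
                raise SyntaxError(f"expected {top}, found {kind}")
            pos += 1
    return actions, pos
```

For the slide's example `max := max + 1;` the parser consumes the five tokens before the semicolon and emits the single `#doAssign` action.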

  19. Stack vs Parameters
      ● In recursive descent parsing, no explicit stack is needed; the call stack of the parsing functions takes its place.
      ● This is because the semantic records can be passed directly to the semantic routines as parameters.
      ● Semantic records can also be returned from the parsing functions.
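A recursive descent version of the same assignment statement shows semantic records travelling as return values and parameters. This is a sketch under assumptions: records are plain dictionaries with a `tag` key, tokens are `(kind, text)` pairs, and `do_assign` is a stand-in for a real semantic routine.

```python
def parse_operand(tokens, pos):
    kind, text = tokens[pos]
    if kind not in ("id", "num"):
        raise SyntaxError(f"operand expected, found {text!r}")
    # The semantic record is returned to the caller along with the new position.
    return {"tag": kind, "text": text}, pos + 1


def parse_expression(tokens, pos):
    left, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos][0] == "+":
        right, pos = parse_operand(tokens, pos + 1)
        left = {"tag": "add", "left": left, "right": right}
    return left, pos


def do_assign(target, value, code):
    # Semantic routine: receives its records directly as parameters.
    code.append(("assign", target["text"], value))


def parse_statement(tokens, pos, code):
    target, pos = parse_operand(tokens, pos)
    if target["tag"] != "id":
        raise SyntaxError("variable expected")
    if tokens[pos][0] != ":=":
        raise SyntaxError("':=' expected")
    value, pos = parse_expression(tokens, pos + 1)
    do_assign(target, value, code)
    return pos
```

The nesting of calls mirrors the nesting of the grammar, so the language's own call stack does the bookkeeping.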

  20. Tokens
      Information produced by the Scanner

  21. Token Records
      ● Token records pass information about symbols scanned. This varies by token type.
      ● Variant records or type unions are typically used.
      ● Each value contains a tag (the token type) and additional information.
        - The tag is usually an integer.

  22. Token Examples
      ● Simple tokens
        - No additional info
        - Only the tag field
        - endNum
      ● Others are more complex
        - Tag plus other info
        - numeralNum 3 5
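A token record with an integer tag and optional extra information could be sketched like this. The tag names `endNum` and `numeralNum` come from the slide; the `Token` class, the `value` field, the integer codes, and the `describe` helper are assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class TokenType(IntEnum):
    # The tag is an integer; the names follow the slide.
    endNum = 0
    numeralNum = 1


@dataclass
class Token:
    tag: TokenType
    value: Optional[int] = None   # only meaningful for some token types


def describe(token):
    # Test the tag to decide which fields are valid.
    if token.tag == TokenType.numeralNum:
        return f"numeral {token.value}"
    return token.tag.name
```

A simple token such as `Token(TokenType.endNum)` carries only its tag, while `Token(TokenType.numeralNum, 42)` also carries the scanned value.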

  23. Handling Strings
      ● Strings are variable length and therefore present some problems.
      ● In C we can allocate a free-store object to hold the spelling, but allocation is expensive in time.
      ● In Pascal, allocating fixed length strings is wasteful.
      ● Spell buffers are an alternative.

  24. Strings in the Free Store
      Source: write "The answer is: ", x;
      Allocation: strval = new char[16];
      Contents: The answer is: \0
      The string is represented by the value of the pointer, which can be passed around the compiler.

  25. Strings in a Spell Buffer
      Source: write "The answer is: ", x;
      Before: N a m e (position 3)
      After: N a m e T h e a n s w e r i s : (position 18)
      The string is represented as (3,15) = (start, length).
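A spell buffer can be sketched as one growing character array plus (start, length) pairs. The class and method names below are assumptions, and indices are 0-based, which may differ from the numbering used in the slide's diagram.

```python
class SpellBuffer:
    def __init__(self):
        self.chars = []   # one shared buffer for every spelling

    def add(self, text):
        """Append a spelling and return its (start, length) reference."""
        start = len(self.chars)
        self.chars.extend(text)
        return (start, len(text))

    def spelling(self, ref):
        start, length = ref
        return "".join(self.chars[start:start + length])
```

With three characters already in the buffer, adding the 15-character string `"The answer is: "` yields the reference `(3, 15)`, and no per-string allocation is needed.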

  26. Semantic Information

  27. Semantic Information
      ● Parsing and semantic routines need to share information.
      ● This information can be passed as function parameters or a semantic stack can be used.
      ● There are different kinds of semantic information.
        - Variant Records/Type Unions/Objects

  28. Semantic Records
      ● Each record needs a tag to distinguish its kind. We need to test the tag types.
      ● Depending on the tag there will be additional information.
      ● Sometimes the additional information must itself be a tagged union/variant record.

  29. Simple Semantic Records
      - identifier: maximum, 7
      - addoperator: +
      - reloperator: <=
      - ifentry: J35, J36

  30. Complex Semantic Records
      - typeentry: integer, 2 (* see types, later)
      - exprentry: const, 33
      - exprentry: variable, 0, 6, false

  31. Semantic Stack
      In some compilers semantic records are stored in a semantic stack. In others, they are passed as parameters.
      Stack contents, from the stacktop down:
      - identifier: value, 5
      - identifier: maximum, 7
      - typeentry: integer, 2
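A semantic stack holding the three records from the slide above can be sketched with a plain list. The scenario is an assumption: the slide does not say what the records are for, so this sketch supposes they come from a declaration of two integer variables and that a hypothetical routine `declare_all` consumes them. The field names are also invented, including `length` for the second identifier field.

```python
def declare_all(stack, symbols):
    """Pop identifier records down to the type record and declare each name."""
    names = []
    while stack[-1]["tag"] == "identifier":   # test the tag of the stack top
        names.append(stack.pop()["name"])
    type_rec = stack.pop()
    if type_rec["tag"] != "typeentry":
        raise ValueError("typeentry expected below the identifiers")
    for name in reversed(names):              # restore source order
        symbols[name] = (type_rec["kind"], type_rec["size"])


# Records pushed in the order they would be seen; the last one is the stack top.
stack = [
    {"tag": "typeentry", "kind": "integer", "size": 2},
    {"tag": "identifier", "name": "maximum", "length": 7},
    {"tag": "identifier", "name": "value", "length": 5},
]
symbols = {}
declare_all(stack, symbols)
```

After the call the stack is empty and both names are recorded with the integer type.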

  32. Type Information

  33. Type Information
      ● Type information must be maintained for variables and parameters.
      ● There are different kinds of types.
        - Variant Records/Type Unions/Objects
      ● There are different typing rules in different languages.
        - Pointers to records/structs are a simple representation.

  34. Type Information
      ● Types describe variables.
        - size of a variable of this type (in bytes)
        - kind (tag)
        - additional information for some types.
      ● There are also recursive types.

  35. Simple Types
      - integer: size 2
      - Boolean: size 2
      - character: size 1
      The tag and the size are enough.

  36. Tuple Type
      [integer, Boolean]
      - tuple: size 4
        - integer: size 2
        - Boolean: size 2

  37. Recursive Types
      [integer, [integer, Boolean]]
      - tuple: size 6
        - integer: size 2
        - tuple: size 4 ...
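The tag-plus-size descriptors for simple and tuple types can be sketched as below, using the sizes shown on the slides. The class name `TypeDesc`, the `components` field, and the `tuple_of` helper are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TypeDesc:
    tag: str
    size: int                                        # in bytes
    components: list = field(default_factory=list)   # empty for simple types


# Simple types: the tag and the size are enough.
INTEGER = TypeDesc("integer", 2)
BOOLEAN = TypeDesc("Boolean", 2)
CHARACTER = TypeDesc("character", 1)


def tuple_of(*components):
    # A tuple's size is the sum of its components' sizes.
    return TypeDesc("tuple", sum(c.size for c in components), list(components))


pair = tuple_of(INTEGER, BOOLEAN)     # [integer, Boolean]
nested = tuple_of(INTEGER, pair)      # [integer, [integer, Boolean]]
```

The nested tuple refers to the existing `pair` descriptor, which is how recursive types fall out of ordinary pointers between descriptors.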

  38. Range Types
      integer range[1..10]
      - range: size 2, bounds 1, 10
        - integer: size 2

  39. Array Types
      Boolean array[1..10][0..4]
      - array: size 100, bounds 1, 10
        - array: size 10, bounds 0, 4
          - Boolean: size 2

  40. Array Types (alternate)
      Boolean array [range1] [range2]
      - array: size 100
        - range: size 2, bounds 1, 10, over integer (size 2)
        - array: size 10
          - range: size 2, bounds 0, 4, over integer (size 2)
          - Boolean: size 2
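The array sizes on the slides follow from element size times the number of index values. The sketch below checks that arithmetic with bounds stored directly in the array descriptor, as in the first layout; the class names and the `size` property are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SimpleType:
    tag: str
    size: int


@dataclass
class ArrayType:
    element: object   # descriptor of the element type
    low: int
    high: int
    tag: str = "array"

    @property
    def size(self):
        # element size times the number of index values
        return self.element.size * (self.high - self.low + 1)


BOOLEAN = SimpleType("Boolean", 2)
inner = ArrayType(BOOLEAN, 0, 4)    # Boolean array[0..4]: 5 * 2 bytes
outer = ArrayType(inner, 1, 10)     # Boolean array[1..10][0..4]: 10 * 10 bytes
```

The alternate layout would replace `low` and `high` with a reference to a separate range descriptor, at the cost of one more indirection.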

  41. Record Types
      record [integer x, boolean y]
      - record: size 4
        - x: integer, size 2
        - y: Boolean, size 2
      Note similarity to tuple types.

  42. Pointer Types
      pointer [integer, Boolean]
      - pointer: size 2
        - tuple: size 4
          - integer: size 2
          - Boolean: size 2

  43. Procedure Types
      proc (integer, Boolean)
      - proc: size 2 ...
        - integer: size 2
        - Boolean: size 2
      Note: Not all languages have procedure types even when they have procedures.

  44. Function Types
      func (integer returns [integer, Boolean])
      - func: size 2 ...
        - parameter: integer, size 2
        - returns: tuple, size 4
          - integer: size 2
          - Boolean: size 2
      Note: Not all languages have function types even when they have functions.

  45. Self Recursive Types
      Some languages (Java, Modula-3) permit a type to reference itself:
      class node { int value; node next; }
      - class: size 8
        - value: int, size 4
        - next: refers back to the class
      The internal representation is a pointer (4 bytes).
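A self-referencing type descriptor is finite because the `next` field is described by a fixed-size pointer back to the class descriptor. The sketch below uses the 4-byte sizes from this slide; the class names and the `fields` dictionary are assumptions.

```python
class SimpleType:
    def __init__(self, tag, size):
        self.tag = tag
        self.size = size


class PointerType:
    tag = "pointer"
    size = 4   # fixed, whatever the target type is

    def __init__(self, target):
        self.target = target


class ClassType:
    tag = "class"

    def __init__(self, name):
        self.name = name
        self.fields = {}   # field name -> type descriptor

    @property
    def size(self):
        return sum(t.size for t in self.fields.values())


INT = SimpleType("int", 4)
node = ClassType("node")
node.fields["value"] = INT
node.fields["next"] = PointerType(node)   # the descriptor refers to itself
```

Computing `node.size` terminates because the pointer contributes 4 bytes without descending into its target.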

  46. Recursive Types Again
      [ record [integer array[0..4] x, Boolean y],
        integer range [1..10],
        pointer [integer, integer],
        func(integer, Boolean returns integer array[1..5]) ]
      Left as an exercise. :-)
