
Compiler Construction (Mayer Goldberg)

Compiler Construction. Mayer Goldberg \ Ben-Gurion University, Saturday 14th December, 2019. Chapter 6 Roadmap, Code Generation: Constants


  1. The constants-table
      ▶ The constants-table is a compile-time data structure:
         ▶ It exists until your compiler is done generating code
         ▶ It does not exist when the [generated] code is running
      ▶ The constants-table serves several purposes:
         ▶ Lays out constants where constants shall reside in memory when your program executes
         ▶ Helps to pre-compute the locations of the constants in your program
            ▶ The locations are needed to lay out other constants in memory (constants that contain other constants)
            ▶ The locations are needed during code generation
      ▶ Constants are compiled into a single mov instruction
         ▶ The size/depth/complexity of the constant is of no significance
         ▶ The run-time behaviour for constants is always the same, always efficient

  2. The constants-table (continued)
      Issue: Constants can be nested
      ▶ A sub-constant is also a constant
         ▶ It must be allocated at compile-time
         ▶ Its address needs to be known at compile-time
      ▶ Relevant data types:
         ▶ Pairs
         ▶ Vectors
         ▶ Symbols
      ▶ Symbols are special; we shall discuss symbols later

  3. Example in C: The linked list (4 9 6 3 5 1)
            typedef struct LL {
              int value;
              const struct LL *next;  /* const: the nodes pointed to are constants too */
            } LL;

            /* Each sub-constant is defined before the constant that points to it. */
            const LL c1 = {1, (const struct LL *)0};
            const LL c51 = {5, &c1};
            const LL c351 = {3, &c51};
            const LL c6351 = {6, &c351};
            const LL c96351 = {9, &c6351};
            const LL c496351 = {4, &c96351};
      ☞ The constants c1, c51, c351, c6351, c96351 are all sub-constants of c496351
      ▶ They need to be defined, laid out in memory, and their address known before we can define c496351

  4. The constants-table (continued)
      Issue: Sharing sub-constants
      ▶ Since constants are, by definition, immutable, we can save space by factoring out & sharing common sub-constants:
         ▶ That side-effects on constants are undefined means that we cannot assume any specific behaviour when performing side effects on them
         ▶ This gives us license to share sub-constants
      ▶ Most Scheme implementations do not factor-out & share sub-constants
      ☞ Our implementation shall factor-out & share sub-constants

  5. The constants-table (continued)
      Interactivity
      ▶ Most Scheme systems are interactive
      ▶ Interactivity is not the same as being interpreted
         ▶ Chez Scheme is interactive
         ▶ Chez Scheme has no interpreter
         ▶ Expressions are compiled & executed on-the-fly

  6. The constants-table (continued)
      Interactivity
      ▶ Interactive systems are conversational
      ▶ The “conversation” takes place at the REPL
      ▶ REPL stands for Read-Eval-Print-Loop
         ▶ Read: Read an expression from an input channel (scan, read, tag-parse)
         ▶ Eval: Compute the value of the expression, either by interpreting it, or by compiling & executing it on-the-fly
         ▶ Print: Print the value of the expression (unless it’s #<void>)
         ▶ Loop: Return to the start of the REPL
      ▶ The REPL executes until the end-of-file is reached or (exit) is evaluated

  7. The constants-table (continued)
      Interactivity (continued)
      ▶ Interactive systems need to create constants on-the-fly, as expressions are entered at the REPL
      ▶ Creating constants on-the-fly is not conducive to sharing sub-constants:
         ▶ There would be a great performance penalty to “looking up constants” at run-time
         ▶ Some constants would/should be garbage-collected, which would make this process imperfect
      ▶ So interactive systems do not factor & share constants

  8. The constants-table (continued)
      Interactivity (continued)
      ☞ But we are not writing an interactive compiler!
      ▶ Writing interactive compilers requires generating machine-code on-the-fly, which is harder than generating and writing assembly-instructions (which are just text) to a text file
      ▶ Interactive compilers are harder to debug, since we cannot invoke a system debugger (such as gdb) on an executable
      ▶ While interactive compilers are fun, the exercise would be time-consuming, and would not offer a great added benefit to the course

  9. The constants-table (continued)
      ▶ We are writing an offline/batch compiler
         ▶ It’s not conversational
         ▶ We see all the source code at compile-time
      ▶ We won’t be implementing the load procedure
         ▶ So code cannot be loaded during run-time
         ▶ This would require the compiler to be available during run-time too, which would be about as difficult as writing an interactive compiler
      ▶ In particular, we get to see all the constants in our code, ahead of time
      ▶ So it makes sense that we factor/share sub-constants

  10. The constants-table (continued)
      Constructing the constants-table
      ① Scan the AST (one recursive pass) & collect the sexprs in all Const records
         ▶ The result is a list of sexprs
      ② Convert the list to a set (removing duplicates)
      ③ Expand the list to include all sub-constants
         ▶ The list should be sorted topologically
         ▶ Each sub-constant should appear in the list before the const of which it is a part
         ▶ For example, (2 3) should appear before (1 2 3)
      ④ Convert the resulting list into a set (remove all duplicates, again)

  11. The constants-table (continued)
      Constructing the constants-table
      ⑤ Go over the list, from first to last, and create the constants-table:
         ① For each sexpr in the list, create a 3-tuple:
            ▶ The address of the constant sexpr
            ▶ The constant sexpr itself
            ▶ The representation of the constant sexpr as a list of bytes
         ② The first constant should start at address zero (0)
            ▶ The TAs will instruct you how to make use of this address in your code
         ③ The constant sexpr is used as a key for looking up constants in the constants-table
         ④ The representation of a constant is a list of numbers:
            ▶ Each number is a byte, that is, a non-negative integer that is less than 256

  12. The constants-table (continued)
      Constructing the constants-table
      ⑤ Go over the list, from first to last, and create the constants-table:
         ⑤ As you construct the constants-table, you shall need to consult it, in its intermediate state, to look-up the addresses of sub-constants
            ▶ The list of 3-tuples contains all the information needed to lookup & extend the constants-table
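
The construction steps above can be sketched as follows. This is only an illustrative sketch in Python, not the course's implementation: the `Pair` class, the helper names, and above all the byte layout (one tag byte followed by little-endian 8-byte payloads) are assumptions made for the example, since the actual layout is the one the TAs provide.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Pair:
    """An immutable, hashable Scheme pair (stand-in for a constant sexpr)."""
    car: Any
    cdr: Any


NIL = None  # stands for the empty list ()


def scheme_list(*items):
    """Build a proper list out of nested pairs, e.g. (1 2 3)."""
    result = NIL
    for item in reversed(items):
        result = Pair(item, result)
    return result


def dedupe(items):
    """Convert a list to a 'set' while keeping first-occurrence order."""
    seen, out = set(), []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def expand(sexpr):
    """Return sexpr preceded by all of its sub-constants (topological order)."""
    if isinstance(sexpr, Pair):
        return expand(sexpr.car) + expand(sexpr.cdr) + [sexpr]
    return [sexpr]


# Hypothetical layout: one tag byte followed by a little-endian payload.
T_NIL, T_INT, T_PAIR = 1, 2, 3


def encode(sexpr, address_of):
    """Represent a constant as a list of bytes, consulting known addresses."""
    if sexpr is NIL:
        return [T_NIL]
    if isinstance(sexpr, int):
        return [T_INT] + list(sexpr.to_bytes(8, "little", signed=True))
    if isinstance(sexpr, Pair):
        car = address_of[sexpr.car].to_bytes(8, "little")
        cdr = address_of[sexpr.cdr].to_bytes(8, "little")
        return [T_PAIR] + list(car) + list(cdr)
    raise TypeError(f"unsupported constant: {sexpr!r}")


def build_constants_table(constants):
    """Steps 2-5: dedupe, expand, dedupe again, then assign addresses."""
    ordered = dedupe(constants)
    ordered = dedupe([sub for c in ordered for sub in expand(c)])
    table, address_of, address = [], {}, 0
    for sexpr in ordered:
        representation = encode(sexpr, address_of)  # looks up sub-constants
        table.append((address, sexpr, representation))
        address_of[sexpr] = address
        address += len(representation)
    return table
```

Because `dedupe` keeps the first occurrence of each sexpr, the topological order produced by `expand` survives the second conversion to a set, so every sub-constant already has an address by the time a constant that contains it is encoded.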

  13. The constants-table (continued)
      How the constants-table is used
      ① The representations of the constants initialize the memory of your program
         ▶ They are laid out in memory by the code-generator
         ▶ They are allocated in assembly-language by the compiler
         ▶ They are assembled into data stored in a data segment
         ▶ They are loaded by the system loader when you run your program
         ▶ They are available in memory before the program starts to run
      ② The addresses of the constants are used to determine the representation of other constants that contain them
      ③ The addresses of the constants are used by the code generator to create and issue the mov instructions that evaluate constants at run-time

  14. The constants-table (continued)
      How/where sharing of sub-constants takes place in the constants-table
      ▶ When constructing the constants-table, we twice converted lists to sets, i.e., removed duplicates
      ▶ This means that any constant sexpr S will appear only once
      ▶ All sexprs that contain S will use the same address of the one and only occurrence of the constant sexpr S
      ▶ So S has been “factored out” of all constant sexprs in which it appeared, and is now shared by all of them

  15. The constants-table (continued)
      You still need some information…
      The code to generate the constants-table is straightforward to write, but please don’t start on it just yet. The TAs will give you some additional information:
      ▶ The TAs will give you the layout, i.e., the schema for representing the various constants in memory
      ▶ In particular, you need to know how to encode the RTTI for the various data types
      ▶ For Strings, Pairs, Vectors, you need to know how to handle sub-constants
      ▶ Symbols are complicated (will be covered later on)

  16. Chapter 6 Roadmap
      Code Generation:
      🗹 Constants
      ▶ Symbols & Free Variables
      ▶ The Code Generator

  17. Symbols & Free Vars
      ▶ Just as the implementation of constants is different in interactive systems vs batch systems, the implementation of symbols & free variables is different too
      ▶ The implementation of symbols & free variables in Scheme (and similar languages) developed over decades, and is by now a fundamental aspect of these languages, so understanding the implementation is essential
      ▶ We first consider how symbols & free variables are implemented in a standard, interactive system
      ▶ Then we consider how batch systems are different
      ▶ Finally, we detail what you should implement in your system

  18. Symbols & Free Vars (continued)
      Interactive Systems
      ▶ Symbols are hashed strings
      ▶ The hash table is also known as the symbol table
      ▶ Each symbol has a representative string that serves as a key and is known as its print name
      ▶ To see the print names, use the procedure symbol->string:
            > (symbol->string 'moshe)
            "moshe"

  19. Symbols & Free Vars (continued)
      Interactive Systems (continued)
      ▶ Symbols are hashed strings
         ▶ Dr Racket returns a duplicate of the representative string
         ▶ Chez Scheme returns the exact, identical string
         ▶ This is one area where Chez’s behaviour is problematic:
            ▶ If the original representative string is returned, it can be modified using string-set!
            ▶ This shall render the symbol inaccessible, since the hash function will no longer map to it
            ▶ This [mis-]behaviour was intentional in Chez, and was used as a hackish way to “hide” data. Today it’s an unnecessary anachronism…

  20. Symbols & Free Vars (continued)
      Interactive Systems (continued)
      ▶ In interactive Scheme, new symbols are added all the time as new expressions get typed at the REPL or loaded from files
      ▶ The scanner is in charge of
         ▶ Recognizing the symbol token
         ▶ Hashing the symbol string to obtain a bucket (whether original or pre-existing)
         ▶ Creating the symbol object: A symbol is a tagged object containing the address of the corresponding bucket
      ▶ The bucket contains 2 cells:
         ▶ The print cell, pointing to the representative string
         ▶ The value cell, holding the value of the free variable by the same name

  21. Symbols & Free Vars (continued)
      Interactive Systems (continued)
      ▶ The symbol-table serves two purposes:
         ▶ Managing the symbol data structure as a collection of hashed strings
         ▶ Managing the global-variable bindings via the top-level
      ▶ These two purposes may appear unrelated, but, in fact, they are closely related in interactive systems:
         ▶ Every free variable was once a symbol…
         ▶ Every symbol is hashed
         ▶ Free variables and symbols can be loaded during run-time
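
To make the bucket structure concrete, here is a small sketch in Python of a symbol table in the style of an interactive system. It is a toy model under stated assumptions: the class names, the `intern` method, and the `UNDEFINED` sentinel for an unbound value cell are all hypothetical, and a Python dict stands in for the hash table.

```python
UNDEFINED = object()  # hypothetical sentinel for a value cell with no binding


class Bucket:
    """A hash bucket with the two cells described above."""

    def __init__(self, name):
        self.print_cell = name        # the representative string
        self.value_cell = UNDEFINED   # value of the free variable by that name


class Symbol:
    """A tagged object that refers to its bucket."""

    def __init__(self, bucket):
        self.bucket = bucket


class SymbolTable:
    def __init__(self):
        self._buckets = {}  # a dict plays the role of the hash table

    def intern(self, name):
        """What the scanner does: find or create the bucket, wrap it in a symbol."""
        bucket = self._buckets.get(name)
        if bucket is None:
            bucket = Bucket(name)
            self._buckets[name] = bucket
        return Symbol(bucket)

    def print_names(self):
        return list(self._buckets)


def symbol_eq(a, b):
    """Two symbols are the same when they refer to the same bucket."""
    return a.bucket is b.bucket
```

Reading the same name twice yields symbols that share one bucket, so a definition made through one of them is visible through the other: the table serves both as the collection of hashed strings and as the store of global bindings.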

  22. The value-cell
      ▶ The view of the symbol-table across the dimension of the value cells is known as the top-level
      ▶ The top-level holds the global bindings in Scheme
         ▶ For example, the procedures car, cdr, cons, and other builtins are defined at the top-level
      ▶ Modern versions of Scheme (R6RS) & modern dialects of LISP come with namespaces, packages, modules, as ways of aggregating groups of functions and variables
      ▶ The top-level, in such systems, is just a system namespace that is exported by default

  23. n-LISP
      ▶ Scheme buckets come with a name-cell and a value-cell
      ▶ Some dialects of LISP come with more cells
         ▶ A value-cell & a function-cell
         ▶ Such systems are known as 2-LISP systems, because beyond the name-cell, they contain 2 additional cells
      ▶ In this sense, Scheme is a 1-LISP

  24. n-LISP (continued)
      What does it mean to have a value-cell & function-cell?
      ▶ The same variable name can refer both to a procedure and a value:
            (x x)
      ▶ This does not mean you cannot store a procedure in a value cell
         ▶ To apply a procedure in a value cell, you need to use the procedure funcall
         ▶ To obtain the closure in the function-cell (to be passed as data), you need to use the special form function (which has the reader-macro form #')

  25. n-LISP (continued)
      ▶ What are the advantages of 2-LISP languages?
         ☞ There are NONE!
         ▶ A long time ago, some people thought the ability to overload a name adds power to the language
      ▶ So why bother with 2-LISP languages??
         ▶ Well, there’s this hardly-known, esoteric, programming language by the name of Perl, which is a 5-LISP language… 😊
         ▶ Every name in Perl can be used for
            ▶ A function
            ▶ A scalar
            ▶ An array
            ▶ A hash table
            ▶ A file handle
         ▶ So knowing about this nonsense might reduce your learning curve if you ever need to learn Perl!

  26. Symbols & Free Vars (continued)
      Part of the symbol-table & top-level for the code:
            > (define x 34)
            > (define foo 'foo)
      [Figure: The Symbol Table & Top-Level. The hash table holds one hash bucket per symbol, each with a print cell and a value cell. For x, the print cell points to the string "x" (length 1) and the value cell holds the integer 34. For foo, the print cell points to the string "foo" (length 3) and the value cell holds the symbol foo.]

  27. Symbols & Free Vars (continued)
      ▶ There is a strict grammar for literal symbols, but any string can become the print name for a symbol:
            > (string->symbol "a234")
            a234
            > (string->symbol "A234")
            A234
            > (string->symbol "A 234")
            A\x20;234
            > (string->symbol "this is a bad symbol!")
            this\x20;is\x20;a\x20;bad\x20;symbol!

  28. Symbols & Free Vars (continued)
      ▶ Because a symbol can be created from any string, symbols that do not resemble literal symbols are printed in peculiar ways, using hexadecimal character escapes, so as to avoid confusion
      ▶ The string->symbol procedure may be thought of as the API to the hash function for the symbol-table
      ▶ As of R6RS, Scheme supports hash tables as first-class objects, so programmers may use them as freely as dictionaries in other languages

  29. Symbols & Free Vars (continued)
      ▶ When a new symbol is hashed, a bucket for it is created, and the initial value is #<undefined> (a special object that signifies that the global variable hasn’t been defined & holds no value)
      ▶ Global variables are defined by means of the define-expression
      ▶ An attempt to assign to an undefined variable is defined to be an error, although Chez Scheme is tolerant of this, and tacitly defines the variable before setting it
      ▶ Re-defining a variable changes its value & is somewhat similar to set!

  30. Symbols & Free Vars (continued)
      ▶ When expressions are read, either at the REPL, or from a file, either in textual or compiled form, each expr is first read as an sexpr:
         ▶ At this stage, symbols are hashed & the corresponding symbol objects are created
         ▶ If, upon parsing, such a symbol turns out to denote a free variable, the variable cell is accessible for definition/set/get via the hash bucket of the symbol
         ▶ Thus free variable access is but a pointer dereference away from the symbol & the hash bucket

  31. Symbols & Free Vars (continued)
      ▶ Because symbols are hashed by the scanner, it makes no sense to ask whether a given symbol is in the symbol-table: The answer is always affirmative.
      ▶ The list of print-names from the symbol-table is available via the procedure oblist:
            > (oblist)
            (entry-row-set! entry-col-set! entry-screen-cols-set!
             record-constructor ...
             entry-top-line asmop-add entry-bot-line
             entry-mark Effect)
            > (length (oblist))
            9420

  32. Symbols & Free Vars (continued)
      ▶ Symbols for which there are buckets in the hash-table are said to be interned symbols (in the sense that they are internal to the hash table)
      ▶ Another kind of symbols are the uninterned symbols, which are a special kind of symbols that are not hashed
      ▶ The vast majority of symbols in the system are interned and hashed
      ▶ Uninterned symbols are a hack that was added intentionally to break the definition of symbols:
         ▶ Uninterned symbols are used in situations where we require a fresh symbol that does not appear anywhere in the system, and is not equal to any other symbol (in the sense of eq?)
         ▶ Such symbols are used when we need unique names, such as for variable names in hygienic macro-expanders
         ▶ Uninterned symbols are created by means of the procedure gensym, and are also known as gensyms

  33. Symbols & Free Vars (continued)
      Uninterned symbols are supported via the following API:
      ▶ gensym generates uninterned symbols, usually with numbered names, such as g1, g2, g3, etc
            > (gensym)
            g0
      ▶ The symbol? predicate returns #t for an uninterned symbol
      ▶ The eq? predicate returns #f whenever an uninterned symbol is compared to anything but itself:
         ▶ Either an interned symbol, including a symbol by the same name:
               > (eq? 'g1 (gensym))
               #f
         ▶ Or another gensym:
               > (eq? (gensym) (gensym))
               #f

  34. Symbols & Free Vars (continued)
      Uninterned symbols are supported via the following API:
      ▶ The symbol->string procedure will generate a string of the form "g1", "g2", etc., that may look like the one generated for an uninterned symbol:
            > (list (symbol->string 'g1) (symbol->string (gensym)))
            ("g1" "g1")
      ▶ Uninterned symbols can be identified via the gensym? procedure
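
The difference between interned and uninterned symbols can be modelled in a few lines. The sketch below is a hypothetical Python model, not how any Scheme system is implemented: `intern`, `gensym`, and `is_gensym` are illustrative names, Python's `is` plays the role of `eq?`, and the numbering scheme for generated names is an assumption.

```python
import itertools


class Symbol:
    def __init__(self, name, interned):
        self.name = name          # the print name
        self.interned = interned  # True if the symbol lives in the table


_table = {}                   # print name -> interned symbol
_counter = itertools.count()  # source of numbered gensym names


def intern(name):
    """Return the one and only interned symbol with this print name."""
    if name not in _table:
        _table[name] = Symbol(name, interned=True)
    return _table[name]


def gensym():
    """Create a fresh symbol that is never entered into the table."""
    return Symbol(f"g{next(_counter)}", interned=False)


def is_gensym(symbol):
    return not symbol.interned
```

An uninterned symbol may share its print name with an interned one and still be a distinct object, which is why comparing it to anything but itself fails.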

  35. Symbols & Free Vars (continued)
      Our implementation of symbols
      ▶ The implementation of symbols is simplified by the fact that ours is a static compiler, and therefore symbols shall not be loaded during run-time
      ▶ To simplify matters further, you should not implement the procedure string->symbol, so that new symbol objects cannot be created, at run-time, from strings
      ▶ This means that all symbols in our system are static, literal constants, and this simplifies matters considerably
      ▶ All symbols shall have the respective representative string as a sub-constant
      ☞ This affects the way you construct the constants-table

  36. Symbols & Free Vars (continued)
      Our implementation of symbols
      ▶ The symbol data-structure is a tagged data structure that points to the representative string, which itself is a tagged, constant, string-object
      ▶ The eq? procedure should compare the address fields of the two symbol objects
      ▶ The symbol->string procedure shall create and return a copy of the representative string of the symbol

  37. Symbols & Free Vars (continued)
      Our implementation of free variables
      ▶ Just as before, our implementation of free variables is simplified by the fact that ours is a static compiler, so free variables are not loaded during run-time
      ▶ Global variables in our system are not much more than names that serve as shorthand for assembly-language labels that point to global storage in the data section
      ▶ Your goal is to create a free-variables-table to serve during the code-generation phase of the compiler pipeline

  38. Symbols & Free Vars (continued)
      Our implementation of free variables (continued)
      ▶ Just as you collected a list of constants, by traversing the AST of the user-code, so you collect a list of strings that are the names of all the free variables that occur in the AST of the user code
      ▶ Create a set from the above list of strings, by removing duplicate strings
      ▶ Create a list of pairs, based on the above set, by associating with each name-string of a free variable a unique, indexed string for a label in the x86/64 assembly language: "v1", "v2", "v3", etc

  39. Symbols & Free Vars (continued)
      Our implementation of free variables (continued)
      ▶ The list of pairs is the free-variables-table:
         ▶ This table must be available to the code generator
      ▶ Here is how the code-generator uses it:
         ▶ For a get to a free variable, the code-generator issues a mov instruction from the respective label/variable v_n
         ▶ For a set to a free variable, the code-generator issues a mov instruction to the respective label/variable v_n
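
A minimal sketch of the free-variables-table and its use, written in Python for illustration. It assumes the names of the free variables have already been collected from the AST into a plain list; the function names are hypothetical, and the `value_code` argument stands for whatever assembly text the code generator produced for the right-hand side of the assignment.

```python
def make_fvar_table(names):
    """Remove duplicate names, then pair each name with a label v1, v2, ..."""
    unique = list(dict.fromkeys(names))  # keeps first-occurrence order
    return [(name, f"v{index}") for index, name in enumerate(unique, start=1)]


def label_in_fvar_table(table, name):
    """Look up the assembly label that stands for a free variable."""
    for candidate, label in table:
        if candidate == name:
            return label
    raise KeyError(f"free variable not in table: {name}")


def gen_free_get(table, name):
    """A get is a mov from the variable's label into rax."""
    return f"mov rax, qword [{label_in_fvar_table(table, name)}]\n"


def gen_free_set(table, name, value_code):
    """A set evaluates the value into rax, then issues a mov to the label.

    As with the other set forms in this deck, rax ends up holding #<void>.
    """
    label = label_in_fvar_table(table, name)
    return value_code + f"mov qword [{label}], rax\n" + "mov rax, sob_void\n"
```

For example, `make_fvar_table(["x", "car", "x"])` pairs `x` with `v1` and `car` with `v2`, and `gen_free_get` on `car` then reads from `v2`.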

  40. Chapter 6 Roadmap
      Code Generation:
      🗹 Constants
      🗹 Symbols & Free Variables
      ▶ The Code Generator

  41. Code Generation
      [Figure: diagram of a compiler as a compiling program that translates a source language (src) into a destination language (dst) and itself runs on some language. Labelled “Compilër*”, with the footnote “*Some assembly required”.]

  42. Code Generation
      ▶ The code generator is a function expr′ → string
      ▶ We look at expr′ after the semantic analysis phase is complete
      ▶ After the constants-table and free-vars-table have been set up
      ▶ The string returned is x86/64 assembly language code, line by line…

  43. Code Generation (continued)
      Assumptions about the code-generator
      We make several assumptions concerning our code-generator, that we shall have to satisfy:
      ▶ Notation: The notation 〚·〛 stands for the code-generator
      ▶ The induction hypothesis of the code-generator: For any expression E, 〚E〛 is a string of instructions in x86/64 assembly language, that evaluate E, and place its value in register rax
         ▶ We need this assumption to convince ourselves that for each node in the AST of expr′, we generate code that has the correct behaviour
         ▶ The relative correctness of each part of the code-generator is then combined to form a proof of correctness for the entire code-generator, and consequently, for the compiler

  44. Code Generation (continued)
      Assumptions about the code-generator (continued)
      ▶ The calling conventions on the x86/64 architecture specify that the first 6 non-floating-point arguments are passed through 6 general-purpose registers, 8 floating point arguments are passed through 8 SSE registers, and any additional arguments are passed on the system-stack.
         ▶ This calling convention is very nice for C, because most procedures take far fewer than 6 arguments
         ▶ This calling convention is not very nice for Scheme, because of the extensive use of apply, variadic procedures & procedures with optional arguments, and the relatively little use of floating-point numbers

  45. Code Generation (continued)
      Assumptions about the code-generator (continued)
      ▶ The calling conventions on the x86/64 architecture specify that the first 6 non-floating-point arguments are passed through 6 general-purpose registers, 8 floating point arguments are passed through 8 SSE registers, and any additional arguments are passed on the system-stack.
      ☞ The code we generate shall not adhere to these calling conventions, but shall use the system-stack, organized into activation frames, to pass all the arguments, regardless of their number, type, & size

  46. Code Generation (continued)
      Assumptions about the code-generator (continued)
      ▶ We shall assume the availability of four singleton, literal constants, that are always present in the run-time system, even if they are not present statically in the code:
         ▶ The void object #<void>
            ▶ located at label sob_void
         ▶ The empty list ()
            ▶ located at label sob_nil
         ▶ The Boolean value false #f
            ▶ located at label sob_false
         ▶ The Boolean value true #t
            ▶ located at label sob_true

  47. Code Generation (continued)
      Assumptions about the code-generator (continued)
      ▶ We assume the following structure for all activation frames:
            System Stack (addresses decrease downwards; the lex env, ret addr, old rbp of the previous frame lie above)
               ⋯
               An-1
               ⋯
               A0          qword [rbp + 8 * 4]
               n           qword [rbp + 8 * 3]
               lex env     qword [rbp + 8 * 2]
               ret addr    qword [rbp + 8 * 1]
               old rbp     qword [rbp]

  48. Code Generation (continued)
      Assumptions about the code-generator (continued)
      ▶ We shall assume there is always at least one activation frame
         ▶ This means that the code-generator assumes that we are within the body of some lambda-expression that has been applied. For example, within the body of a null let-expression: (let () ... )
         ▶ We will need to support/maintain this assumption by setting up an initial dummy frame at the start of the program

  49. Code Generation (continued)
      We shall now go over each of the nodes in the expr′ AST, and describe in pseudo-code, what the code-generator returns for each and every node.

  50. Code Generation (continued)
      Constants
            〚Const'(c)〛 =
               mov rax, AddressInConstTable(c)

  51. Code Generation (continued)
      Parameters / get
            〚Var'(VarParam'(_, minor))〛 =
               mov rax, qword [rbp + 8 * (4 + minor)]

  52. Code Generation (continued)
      Parameters / set
            〚Set(Var'(VarParam'(_, minor)), E)〛 =
               〚E〛
               mov qword [rbp + 8 * (4 + minor)], rax
               mov rax, sob_void
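
The two rules for parameters translate almost directly into string-building functions. The following Python sketch is illustrative only: the function names are hypothetical, and `value_code` stands for the already generated text of 〚E〛.

```python
def param_operand(minor):
    """Memory operand of the parameter at index `minor` in the current frame.

    Slot 0 holds old rbp, 1 the return address, 2 the lexical environment,
    3 the argument count, so argument `minor` sits at slot 4 + minor.
    """
    return f"qword [rbp + 8 * (4 + {minor})]"


def gen_param_get(minor):
    """Load the parameter into rax, where every expression leaves its value."""
    return f"mov rax, {param_operand(minor)}\n"


def gen_param_set(minor, value_code):
    """Evaluate the new value into rax, store it, and leave #<void> in rax."""
    return (
        value_code
        + f"mov {param_operand(minor)}, rax\n"
        + "mov rax, sob_void\n"
    )
```

Keeping the operand computation in one helper means the get and set rules cannot drift apart in how they address the frame.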

  53. Code Generation (continued)
      Bound vars / get
            〚Var'(VarBound'(_, major, minor))〛 =
               mov rax, qword [rbp + 8 * 2]
               mov rax, qword [rax + 8 * major]
               mov rax, qword [rax + 8 * minor]

  54. Code Generation (continued)
      Bound vars / set
            〚Set(Var'(VarBound'(_, major, minor)), E)〛 =
               〚E〛
               mov rbx, qword [rbp + 8 * 2]
               mov rbx, qword [rbx + 8 * major]
               mov qword [rbx + 8 * minor], rax
               mov rax, sob_void
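
Access to a bound variable is a double indirection through the lexical environment: the major index selects a vector, the minor index selects a slot in it. The Python sketch below shows both the emitted instructions and a plain-data model of what they compute; the function names are hypothetical and `value_code` stands for the generated text of 〚E〛.

```python
def gen_bound_get(major, minor):
    """Follow lex env -> vector `major` -> slot `minor`, result in rax."""
    return (
        "mov rax, qword [rbp + 8 * 2]\n"
        f"mov rax, qword [rax + 8 * {major}]\n"
        f"mov rax, qword [rax + 8 * {minor}]\n"
    )


def gen_bound_set(major, minor, value_code):
    """Evaluate into rax, walk the environment in rbx, store, return #<void>."""
    return (
        value_code
        + "mov rbx, qword [rbp + 8 * 2]\n"
        + f"mov rbx, qword [rbx + 8 * {major}]\n"
        + f"mov qword [rbx + 8 * {minor}], rax\n"
        + "mov rax, sob_void\n"
    )


def lookup_bound(lex_env, major, minor):
    """What the three mov instructions compute, on a list-of-lists model."""
    return lex_env[major][minor]
```

The set rule uses rbx for the walk because rax already holds the value of E and must not be overwritten before the store.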

  55. Code Generation (continued)
      Free vars / get
            〚Var'(VarFree'(v))〛 =
               mov rax, qword [LabelInFVarTable(v)]

  56. Code Generation (continued)
      Free vars / set
            〚Set(Var'(VarFree'(v)), E)〛 =
               〚E〛
               mov qword [LabelInFVarTable(v)], rax
               mov rax, sob_void

  57. Code Generation (continued)
      Sequences
            〚Seq([E1; E2; ⋯; En])〛 =
               〚E1〛
               〚E2〛
               ⋯
               〚En〛

  58. Code Generation (continued)
      Or
            〚Or'([E1; E2; ⋯; En])〛 =
               〚E1〛
               cmp rax, sob_false
               jne Lexit
               〚E2〛
               cmp rax, sob_false
               jne Lexit
               ⋯
               〚En〛
               Lexit:
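
One practical detail the pseudo-code leaves implicit is that a program may contain many or-expressions, so each needs its own exit label. The Python sketch below shows one way to handle this; the counter-based `fresh_label` helper is an assumption of the example, and `codes` is the list of already generated strings 〚E1〛 … 〚En〛.

```python
import itertools

_label_ids = itertools.count(1)


def fresh_label(prefix):
    """Return a label that has not been handed out before, e.g. Lexit1."""
    return f"{prefix}{next(_label_ids)}"


def gen_or(codes):
    """Emit each operand; after all but the last, exit early if rax is not #f."""
    if not codes:
        raise ValueError("this sketch assumes at least one operand")
    exit_label = fresh_label("Lexit")
    early_exit = f"cmp rax, sob_false\njne {exit_label}\n"
    # join() places the test between consecutive operands, never after the last
    return early_exit.join(codes) + f"{exit_label}:\n"
```

No test follows the last operand: whatever it leaves in rax is the value of the whole expression.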

  59. Code Generation (continued)
      If
            〚If'(Q, T, E)〛 =
               〚Q〛
               cmp rax, sob_false
               je Lelse
               〚T〛
               jmp Lexit
               Lelse:
               〚E〛
               Lexit:
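
The rule for if-expressions can be sketched the same way. In this Python illustration the function name and the numbering of labels are assumptions; the three arguments are the generated strings for 〚Q〛, 〚T〛 and 〚E〛, and a shared counter keeps the labels of different if-expressions apart.

```python
import itertools

_if_ids = itertools.count(1)


def gen_if(test_code, then_code, else_code):
    """Branch to the else part only when the test evaluated to #f."""
    n = next(_if_ids)
    else_label, exit_label = f"Lelse{n}", f"Lexit{n}"
    return (
        test_code
        + "cmp rax, sob_false\n"
        + f"je {else_label}\n"
        + then_code
        + f"jmp {exit_label}\n"
        + f"{else_label}:\n"
        + else_code
        + f"{exit_label}:\n"
    )
```

Because the arguments are plain strings, a nested if-expression is handled by generating the inner one first and passing its text as `then_code` or `else_code`.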

  60. Code Generation (continued)
      Boxes
      ▶ Boxes provide one extra level of indirection to the value
      ▶ Boxes can be implemented as untagged arrays of size 1
         ▶ That they are untagged means that boxes do not contain RTTI
         ▶ This is probably the simplest implementation, but nevertheless, only one of many possible implementations

  61. Code Generation (continued)
      Box / get
            〚BoxGet'(Var'(v))〛 =
               〚Var'(v)〛
               mov rax, qword [rax]

  62. Code Generation (continued)
      Box / set
            〚BoxSet'(Var'(v), E)〛 =
               〚E〛
               push rax
               〚Var'(v)〛
               pop qword [rax]
               mov rax, sob_void
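
Both box rules reuse the code for the variable that holds the box. The Python sketch below is illustrative, with hypothetical function names: `var_code` is the generated text of 〚Var'(v)〛, which leaves the address of the box in rax, and `value_code` is the generated text of 〚E〛.

```python
def gen_box_get(var_code):
    """Fetch the box's address into rax, then dereference it once."""
    return var_code + "mov rax, qword [rax]\n"


def gen_box_set(var_code, value_code):
    """Park the value on the stack while rax is reused for the box address."""
    return (
        value_code
        + "push rax\n"
        + var_code
        + "pop qword [rax]\n"
        + "mov rax, sob_void\n"
    )


def new_box(value):
    """Plain-data model of a box: an untagged array of size 1."""
    return [value]
```

The push and pop are needed because evaluating the variable overwrites rax, which at that point still holds the value of E.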

  63. Code Generation (continued): LambdaSimple
      〚LambdaSimple'([p1; ···; pm], body)〛 in pseudo-code:
      ▶ Allocate the ExtEnv (the size of which is known statically, and is 1 + |Env|)
      ▶ Copy pointers of minor vectors from Env (on the stack) to ExtEnv (with offset of 1):
          for (i = 0, j = 1; i < |Env|; ++i, ++j) {
            ExtEnv[j] = Env[i];
          }
      Outline:
        Closure-Creation Code
          Create ExtEnv
          Allocate closure object
          Closure → Env ≔ ExtEnv
          Closure → Code ≔ Lcode
          jmp Lcont
        Lcode: (closure body)
          push rbp
          mov rbp, rsp
          〚body〛
          leave
          ret
        Lcont:

  64. Code Generation (continued): LambdaSimple (continued)
      〚LambdaSimple'([p1; ···; pm], body)〛 in pseudo-code:
      ▶ Allocate ExtEnv[0] to point to a vector where to store the parameters
      ▶ Copy the parameters off of the stack:
          for (i = 0; i < n; ++i)
            ExtEnv[0][i] = Param_i;
      ▶ Allocate the closure object; Address in rax
      ▶ Set rax → env = ExtEnv
      ▶ Set rax → code = Lcode
      ▶ jmp Lcont
      Outline: Closure-Creation Code (Create ExtEnv; Allocate closure object; Closure → Env ≔ ExtEnv; Closure → Code ≔ Lcode; jmp Lcont), then Lcode: closure body (push rbp; mov rbp, rsp; 〚body〛; leave; ret), then Lcont:
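
The closure-creation steps can be modelled in a few lines of Python. This is a sketch of the data movement only, under the assumption that the environment is a vector of "minor" vectors; Python lists stand in for heap-allocated vectors, and the `Closure` class and function names are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Closure:
    env: list       # the extended environment (ExtEnv)
    code: str       # stands in for the address of Lcode

def extend_env(env, params):
    """Build ExtEnv from the current Env and the current frame's parameters."""
    ext_env = [None] * (1 + len(env))       # size is 1 + |Env|
    for i in range(len(env)):
        ext_env[i + 1] = env[i]             # copy pointers, shifted by one
    ext_env[0] = list(params)               # fresh vector for the parameters
    return ext_env

def make_closure(env, params, code_label):
    """Allocate the closure object and fill in its env and code fields."""
    return Closure(env=extend_env(env, params), code=code_label)
```

Note that only pointers are copied: `ext_env[1]` is the very same vector as `env[0]`, so the cost of creating a closure grows with the lexical depth, not with the total number of variables in scope.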

  65. Code Generation (continued): LambdaSimple (continued)
      〚LambdaSimple'([p1; ···; pm], body)〛 in pseudo-code:
      ▶ Lcode:
          push rbp
          mov rbp, rsp
          〚body〛
          leave
          ret
        Lcont:
      Outline: Closure-Creation Code (Create ExtEnv; Allocate closure object; Closure → Env ≔ ExtEnv; Closure → Code ≔ Lcode; jmp Lcont), then Lcode: closure body (push rbp; mov rbp, rsp; 〚body〛; leave; ret), then Lcont:

  66. Code Generation (continued): LambdaSimple (continued)
      ▶ During the creation of the closure, we perform only the code in blue (the closure-creation code)
      ▶ During the application of the closure, only the code in orange (the closure body, from Lcode up to Lcont) executes
      ▶ The code in orange is embedded within the code in blue
      ▶ This makes our code-generator compositional
        ▶ We can combine the output of the code-generator
      ▶ The downside is that we need to pay with the jmp Lcont instruction
      Outline: Closure-Creation Code (Create ExtEnv; Allocate closure object; Closure → Env ≔ ExtEnv; Closure → Code ≔ Lcode; jmp Lcont), then Lcode: closure body (push rbp; mov rbp, rsp; 〚body〛; leave; ret), then Lcont:

  67. Code Generation (continued): LambdaSimple (continued)
      ▶ We could have saved the jmp Lcont instruction at the expense of making our code-generator non-compositional
      ▶ We could not have combined the output of the code-generator
      ▶ We would have needed some place, “out of the way”, where to place the code generated for the body of a lambda-expression, so that the normal program-flow would not reach it by mistake, and it would not execute unless the closure was applied
      Outline: Closure-Creation Code (Create ExtEnv; Allocate closure object; Closure → Env ≔ ExtEnv; Closure → Code ≔ Lcode; jmp Lcont), then Lcode: closure body (push rbp; mov rbp, rsp; 〚body〛; leave; ret), then Lcont:
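
Compositionality here means that the code for a lambda-expression is one contiguous chunk that can be spliced wherever an expression is expected, including inside another closure body. A minimal Python sketch follows; the creation steps are reduced to assembly comments, and the numbered labels are an assumption of this sketch so that nested lambdas do not clash.

```python
from itertools import count

_labels = count(1)      # fresh-label counter

def gen_lambda_simple(body_lines):
    """Emit creation code with the closure body embedded behind jmp Lcont."""
    n = next(_labels)
    return [
        "; create ExtEnv, allocate the closure object (address in rax)",
        f"; rax->env = ExtEnv, rax->code = Lcode{n}",
        f"jmp Lcont{n}",        # creation must skip over the embedded body
        f"Lcode{n}:",
        "push rbp",
        "mov rbp, rsp",
        *body_lines,            # the body is spliced in unchanged
        "leave",
        "ret",
        f"Lcont{n}:",
    ]
```

Generating `gen_lambda_simple(gen_lambda_simple(["mov rax, c_42"]))`, with `c_42` a made-up constant label, simply nests one chunk inside the other. Each level pays for one `jmp`, which is the cost the slides mention.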

  68. Code Generation (continued): Application
      〚Applic'(proc, [Arg1; ···; Argn])〛 in pseudo-code:
        〚Argn〛
        push rax
        ⋮
        〚Arg1〛
        push rax
        push n
        〚proc〛
        Verify that rax has type closure
        push rax → env
        call rax → code

  69. Code Generation (continued): Application (continued)
      〚Applic'(proc, [Arg1; ···; Argn])〛 in pseudo-code:
        add rsp, 8*1 ; pop env
        pop rbx      ; pop arg count
        shl rbx, 3   ; rbx = rbx * 8
        add rsp, rbx ; pop args
      ▶ Notice that upon return, we consult the argument count on the stack before popping off the arguments
      ▶ This takes into account the fact that the number we need to pop might be different from the number originally pushed:
        ▶ lambda-expressions with optional arguments
        ▶ The tail-call optimization
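
Putting the call sequence and the clean-up together gives the following Python sketch of an Applic' generator. The slides write the closure accesses as pseudo-code (`rax → env`, `rax → code`); the `CLOSURE_ENV` and `CLOSURE_CODE` macros emitted below are hypothetical stand-ins, since the closure layout is not given here, and the type check is left as a comment.

```python
def gen_applic(proc_lines, args_lines):
    """Emit NASM-style lines for applying proc to the given arguments.

    proc_lines: code that leaves the procedure in rax.
    args_lines: one list of code lines per argument, in source order.
    """
    out = []
    for arg in reversed(args_lines):    # Arg n is pushed first, Arg 1 last
        out += arg + ["push rax"]
    out.append(f"push {len(args_lines)}")
    out += proc_lines
    out += [
        "; verify that rax has type closure (check elided)",
        "CLOSURE_ENV rbx, rax    ; hypothetical macro: rbx = rax->env",
        "push rbx",
        "CLOSURE_CODE rbx, rax   ; hypothetical macro: rbx = rax->code",
        "call rbx",
        "add rsp, 8*1            ; pop env",
        "pop rbx                 ; pop arg count (as found on the stack)",
        "shl rbx, 3              ; rbx = rbx * 8",
        "add rsp, rbx            ; pop args",
    ]
    return out
```

The clean-up deliberately reads the count back from the stack instead of reusing the compile-time `len(args_lines)`, because the callee may have changed the frame.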

  70. Code Generation (continued): Lambda with optional args
      〚LambdaOpt'([p1; ···; pm], opt, body)〛 in pseudo-code:
      ▶ The code is essentially the same as for LambdaSimple'
      ▶ The difference occurs in the body of the procedure, i.e., at Lcode
      Outline:
        ClosureOpt-Creation Code
          Create ExtEnv
          Allocate closure object
          Closure → Env ≔ ExtEnv
          Closure → Code ≔ Lcode
          jmp Lcont
        Lcode: (closure body)
          Adjust stack for opt args
          push rbp
          mov rbp, rsp
          〚body〛
          leave
          ret
        Lcont:

  71. Code Generation (continued): Lambda with optional args
      〚LambdaOpt'([p1; ···; pm], opt, body)〛 in pseudo-code:
        Lcode:
          Adjust the stack for the optional arguments
          push rbp
          mov rbp, rsp
          〚body〛
          leave
          ret
        Lcont:
      Outline: ClosureOpt-Creation Code (Create ExtEnv; Allocate closure object; Closure → Env ≔ ExtEnv; Closure → Code ≔ Lcode; jmp Lcont), then Lcode: closure body (Adjust stack for opt args; push rbp; mov rbp, rsp; 〚body〛; leave; ret), then Lcont:

  72. Code Generation (continued): Lambda with optional args
      ▶ Suppose (lambda (a b c . d) ...) is applied to the arguments 1, 1, 2, 3, 5, 8
      ▶ Six arguments are passed
      ▶ The body of the procedure expects 4 arguments
      ▶ The last argument, d, is supposed to point to the list (3 5 8)
      ▶ The stack needs to be adjusted
      ▶ Notice that the number of arguments must change too!
      The stack as it is    The stack as expected
      8                     (3 5 8)
      5                     2
      3                     1
      2                     1
      1                     4
      1                     env
      6                     ret
      env
      ret

  73. Code Generation (continued): Lambda with optional args
      ▶ Suppose (lambda (a b c . d) ...) is applied to the arguments 1, 1, 2
      ▶ Three arguments are passed
      ▶ The body of the procedure expects 4 arguments
      ▶ The last argument, d, is supposed to point to the empty list ()
      ▶ The stack needs to be adjusted
      ▶ Notice that the number of arguments must change too!
      The stack as it is    The stack as expected
                            ()
      2                     2
      1                     1
      1                     1
      3                     4
      env                   env
      ret                   ret

  74. Code Generation (continued): Lambda with optional args
      ▶ As you can see
        ▶ Sometimes we need to shrink the top frame
        ▶ Sometimes we need to enlarge the top frame by one
          ▶ When the number of arguments matches precisely the number of required parameters, there is no room in the frame to place the empty list
          ▶ We shift the contents of the frame down by one [8-byte] word to make room for opt
      ▶ We can test during run-time and decide what to do
        ▶ This is the basic approach
      ▶ We can also use magic to save us from having to test and shift down… 🧚
        ▶ To use magic, we need to change the structure of our activation frame:
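
The effect of the basic approach on the frame can be modelled in Python. This is a sketch of the result of the adjustment, not of the assembly that performs it: the frame is a Python list ordered from low to high addresses (`ret`, `env`, count, then the arguments), a Python list stands in for the heap-allocated Scheme list, and `adjust_for_opt` is a made-up name.

```python
def adjust_for_opt(frame, required):
    """Return the frame as the body of a lambda with optional args expects it.

    frame:    [ret, env, n, A0, ..., An-1], low addresses first.
    required: number of required parameters (3 for (lambda (a b c . d) ...)).
    """
    ret, env, n, *args = frame
    if n != len(args) or n < required:
        raise ValueError("malformed frame or too few arguments")
    opt = list(args[required:])     # surplus arguments, possibly none
    # The body always sees required + 1 arguments; the last one is the list.
    return [ret, env, required + 1, *args[:required], opt]
```

With six arguments the frame shrinks from nine words to seven; with exactly three it grows from six to seven. On the real stack that second case is the one that needs the shift down by one word, which the Python list hides.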

  75. Code Generation (continued): The frame, without and with magic
      Without Magic (System Stack)        With Magic (System Stack)
      ⋯                                   ⋯
                                          magic
      An-1                                An-1
      ⋯                                   ⋯
      A0        qword [rbp + 8 * 4]       A0        qword [rbp + 8 * 4]
      n         qword [rbp + 8 * 3]       n         qword [rbp + 8 * 3]
      lex env   qword [rbp + 8 * 2]       lex env   qword [rbp + 8 * 2]
      ret addr  qword [rbp + 8 * 1]       ret addr  qword [rbp + 8 * 1]
      old rbp   qword [rbp]               old rbp   qword [rbp]

  76. Code Generation (continued): Lambda with optional args with magic
      ▶ Suppose (lambda (a b c . d) ...) is applied to the arguments 1, 1, 2, 3, 5, 8
      ▶ Six arguments are passed
      ▶ The body of the procedure expects 4 arguments
      ▶ The last argument, d, is supposed to point to the list (3 5 8)
      ▶ The stack needs to be adjusted
      ▶ Notice that the number of arguments must change too!
      The stack as it is    The stack as expected
      magic                 magic
      8                     (3 5 8)
      5                     2
      3                     1
      2                     1
      1                     4
      1                     env
      6                     ret
      env
      ret

  77. Code Generation (continued): Lambda with optional args with magic
      ▶ Suppose (lambda (a b c . d) ...) is applied to the arguments 1, 1, 2
      ▶ Three arguments are passed
      ▶ The body of the procedure expects 4 arguments
      ▶ The last argument, d, is supposed to point to the empty list ()
      ▶ The stack needs to be adjusted
      ▶ Notice that the number of arguments must change too!
      The stack as it is    The stack as expected
      magic                 ()
      2                     2
      1                     1
      1                     1
      3                     3
      env                   env
      ret                   ret

  78. Code Generation (continued): Lambda with optional args (continued)
      To summarize:
      ▶ Using magic means reserving a word at the start of each frame
      ▶ All frames grow by one word, regardless of whether or not they belong to procedures with optional arguments!
      ▶ We do not include magic in the argument count on the stack!
      ▶ If you choose to use magic you need to remember to remove it from the frame after returning from an application
      ☞ You are free to use either the basic approach or magic, depending on your taste/style…
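
Read literally, the summary suggests three changes on the caller's side: push one extra word before the arguments, leave it out of the count, and pop it after the arguments on return. The Python sketch below emits such a call sequence. It is one possible reading, not the course's reference code: the pushed value `0` is an arbitrary placeholder, and `CLOSURE_ENV` / `CLOSURE_CODE` are hypothetical macros standing in for `rax → env` and `rax → code`.

```python
def gen_applic_magic(proc_lines, args_lines):
    """Emit an application whose frame reserves one magic word."""
    out = ["push qword 0           ; magic: reserved word, placeholder value"]
    for arg in reversed(args_lines):    # Arg n first, Arg 1 last
        out += arg + ["push rax"]
    out.append(f"push {len(args_lines)}")   # the count excludes magic
    out += proc_lines
    out += [
        "; verify that rax has type closure (check elided)",
        "CLOSURE_ENV rbx, rax    ; hypothetical macro: rbx = rax->env",
        "push rbx",
        "CLOSURE_CODE rbx, rax   ; hypothetical macro: rbx = rax->code",
        "call rbx",
        "add rsp, 8*1            ; pop env",
        "pop rbx                 ; pop arg count",
        "shl rbx, 3              ; rbx = rbx * 8",
        "add rsp, rbx            ; pop args",
        "add rsp, 8*1            ; pop magic",
    ]
    return out
```

Under this reading the clean-up always pops the counted arguments plus one word, which matches both "as expected" frames in the magic examples: four arguments plus magic in the first, three arguments plus the former magic slot holding () in the second.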
