Compiler Construction
Mayer Goldberg \ Ben-Gurion University Saturday 14th December, 2019
Mayer Goldberg \ Ben-Gurion University Compiler Construction 1 / 114
Chapter 6
Roadmap
Code Generation:
▶ Constants
▶ Symbols & Free Variables
▶ The Code Generator
▶ Constants are static, allocated during compile-time, and loaded
▶ Constants must be allocated before the user-code executes: It
▶ It might not be obvious that there is a real and serious issue at
▶ The following example should help convince you of this
▶ The anomaly of quote is the name given to a phenomenon that
▶ The anomaly was discovered & defined in the world of
▶ However, the anomaly is not unique to LISP/Scheme, and
▶ The key idea is to trace the inner loop in last-pair:
▶ Within foo the list remains the same length (2)
▶ Within goo the list keeps growing by 2 with each application of
▶ Since (eq? (foo) (foo)) returns #f, we realize that a new
▶ Since (eq? (goo) (goo)) returns #t, we realize that the same
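The same behaviour can be reproduced outside of Scheme. The sketch below is a Python analogy, not the Scheme code from the lecture: a default argument is allocated once, when the function is defined, which makes it behave much like a quoted constant, whereas a list built inside the body is allocated afresh on every call. The bodies of foo and goo here are assumptions chosen to mirror the observations above.

```python
def foo():
    s = list(("he", "said:"))   # like (list 'he 'said:): a fresh list per call
    s.extend(["ha", "ha"])
    return s

def goo(s=["he", "said:"]):     # like '(he said:): one list, allocated once
    s.extend(["ha", "ha"])      # mutates that single, shared list
    return s

print(len(foo()), len(foo()))   # the same length on every call
print(len(goo()), len(goo()))   # grows by 2 with each call
print(foo() is foo())           # False: a new list each time
print(goo() is goo())           # True: the same list each time
```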
▶ In foo we have (let ((s (list 'he 'said:))) ... )
▶ In goo we have (let ((s '(he said:))) ... )
▶ Each time we call foo a new list is allocated afresh, and is then
▶ With each call to foo, the variable s gets assigned a new value,
▶ Each time we call goo the same list gets extended further
▶ If the original list exists at address L, the expression (let ((s
▶ You might think that the anomaly of quote is caused by
▶ There are no side-effects in either foo or g
▶ When you define a string as char *s = "hello"; you’ve
▶ The type of "hello" is const char *
▶ The type of s is char *
▶ You just told the compiler not to worry about it…
▶ Just like LISP/Scheme, C/C++ will let you intermix static and
▶ But the static data has been marked .section .rodata by gcc
▶ .rodata means read only
▶ .rwdata means read or write
▶ If you try to change your mixed data,
▶ changes to the dynamic data will work just fine
▶ changes to the static data will generate a segmentation fault
▶ If you try to compile your code with -g (for debugging) and
▶ The debugger will ignore the page-protection bits and load your
▶ So your data is read only in the shell, but read, write, execute
▶ This is one situation in which the debugger will not reflect the
▶ This bug is “the monster that really does live under the bed!”
▶ It’s never there when you inspect (using a debugger)
▶ But you just know it’s there!
▶ The only way to fix this bug is to look for places in your code
▶ Allocation-time is crucial
▶ Constants must be created/allocated before program execution
▶ Side-effects on constants are defined to be undefined in most
▶ While testing for identity (e.g., using eq?) is not a side-effect, it
▶ The constants-table is a compile-time data structure:
▶ It exists until your compiler is done generating code
▶ It does not exist when the [generated] code is running
▶ The constants-table serves several purposes:
▶ Lays out constants where constants shall reside in memory
▶ Helps to pre-compute the locations of the constants in your
▶ The locations are needed to lay out other constants in memory
▶ The locations are needed during code generation
▶ Constants are compiled into a single mov instruction
▶ The size/depth/complexity of the constant are of no significance
▶ The run-time behaviour for constants is always the same, always
▶ A sub-constant is also a constant
▶ It must be allocated at compile-time
▶ Its address needs to be known at compile-time
▶ Relevant data types:
▶ Pairs
▶ Vectors
▶ Symbols
▶ Symbols are special; we shall discuss symbols later
▶ They need to be defined, laid out in memory, and their address
▶ Since constants are, by definition, immutable, we can save space
▶ That side-effects on constants are undefined means that we
▶ This gives us license to share sub-constants
▶ Most Scheme implementations do not factor-out & share
▶ Most Scheme systems are interactive
▶ Interactivity is not the same as being interpreted
▶ Chez Scheme is interactive
▶ Chez Scheme has no interpreter
▶ Expressions are compiled & executed on-the-fly
▶ Interactive systems are conversational
▶ The “conversation” takes place at the REPL
▶ REPL stands for Read-Eval-Print-Loop
▶ Read: Read an expression from an input channel (scan, read,
▶ Eval: Compute the value of the expression, either by
▶ Print: Print the value of the expression (unless it’s #<void>)
▶ Loop: Return to the start of the REPL
▶ The REPL executes until the end-of-file is reached or the
▶ Interactive systems need to create constants on-the-fly, as
▶ Creating constants on-the-fly is not conducive to sharing
▶ There would be a great performance penalty to “looking up
▶ Some constants would/should be garbage-collected, which
▶ So interactive systems do not factor & share constants
▶ Writing interactive compilers requires generating machine-code
▶ Interactive compilers are harder to debug, since we cannot
▶ While interactive compilers are fun, the exercise would be
▶ We are writing an offline/batch compiler
▶ It’s not conversational
▶ We see all the source code at compile-time
▶ We won’t be implementing the load procedure
▶ So code cannot be loaded during run-time
▶ This would require the compiler to be available during run-time
▶ In particular, we get to see all the constants in our code, ahead
▶ So it makes sense that we factor/share sub-constants
▶ The result is a list of sexprs
▶ The list should be sorted topologically
▶ Each sub-constant should appear in the list before the const of
▶ For example, (2 3) should appear before (1 2 3)
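One way to obtain such an ordering is to expand every constant into its sub-constants, children first, and then remove duplicates while keeping the first occurrence. The sketch below is only an illustration in Python: modelling a pair as a 2-tuple and the empty list as `()` is an assumption, and the helper names (`cons`, `scheme_list`, `expand`, `topo_constants`) are hypothetical.

```python
NIL = ()                                  # stand-in for the empty list

def cons(car, cdr):
    return (car, cdr)                     # a pair is modelled as a 2-tuple

def scheme_list(*items):
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

def expand(sexpr):
    """Return the sub-constants of sexpr, followed by sexpr itself."""
    if isinstance(sexpr, tuple) and len(sexpr) == 2:
        car, cdr = sexpr
        return expand(car) + expand(cdr) + [sexpr]
    return [sexpr]

def topo_constants(constants):
    """Expand all constants, then drop duplicates, keeping first occurrences."""
    seen, ordered = set(), []
    for constant in constants:
        for sub in expand(constant):
            if sub not in seen:
                seen.add(sub)
                ordered.append(sub)
    return ordered

order = topo_constants([scheme_list(1, 2, 3), scheme_list(2, 3)])
print(order.index(scheme_list(2, 3)) < order.index(scheme_list(1, 2, 3)))
```

Because a constant is appended only after its car and cdr have been expanded, every sub-constant precedes the constants that contain it.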
▶ The address of the constant sexpr
▶ The constant sexpr itself
▶ The representation of the constant sexpr as a list of bytes
▶ The TAs will instruct you how to make use of this address in
▶ Each number is a byte, that is, a non-negative integer that is
▶ The list of 3-tuples contains all the information needed to
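A minimal sketch of building such a list of 3-tuples follows. The byte layout used here (a one-byte tag, 8-byte little-endian fields, and the tag values themselves) is purely hypothetical, since the real schema is supplied separately; pairs are modelled as Python 2-tuples and the empty list as `()`, which is also an assumption. The input is expected to be topologically sorted already, so the addresses of the car and cdr are known by the time a pair is laid out.

```python
import struct

T_NIL, T_INT, T_PAIR = 1, 2, 3            # hypothetical RTTI tags

def build_table(sorted_consts):
    """Return a list of (address, sexpr, bytes) for topologically sorted constants."""
    table, address_of, addr = [], {}, 0
    for c in sorted_consts:
        if c == ():
            data = bytes([T_NIL])
        elif isinstance(c, int):
            data = bytes([T_INT]) + struct.pack("<q", c)
        else:                             # a pair; its parts are already in the table
            car, cdr = c
            data = bytes([T_PAIR]) + struct.pack("<QQ", address_of[car], address_of[cdr])
        table.append((addr, c, list(data)))
        address_of[c] = addr
        addr += len(data)                 # the next constant starts right after this one
    return table

for entry in build_table([1, (), (1, ())]):
    print(entry)
```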
▶ They are laid out in memory by the code-generator
▶ They are allocated in assembly-language by the compiler
▶ They are assembled into data stored in a data segment
▶ They are loaded by the system loader when you run your
▶ They are available in memory before the program starts to run
▶ When constructing the constants-table, we twice converted lists
▶ This means that any constant sexpr S will appear only once
▶ All sexprs that contain S will use the same address of the one
▶ So S has been “factored out” of all constant sexprs in which it
▶ The TAs will give you the layout, i.e., the schema for
▶ In particular, you need to know how to encode the RTTI for the
▶ For Strings, Pairs, Vectors, you need to know how to handle
▶ Symbols are complicated (will be covered later on)
▶ Symbols & Free Variables
▶ The Code Generator
▶ Just as the implementation of constants is different in
▶ The implementation of symbols & free variables in Scheme (and
▶ We first consider how symbols & free variables are implemented
▶ Then we consider how batch systems are different
▶ Finally, we detail what you should implement in your system
▶ Symbols are hashed strings
▶ The hash table is also known as the symbol table
▶ Each symbol has a representative string that serves as a key
▶ To see the print names, use the procedure symbol->string:
▶ Symbols are hashed strings
▶ Dr Racket returns a duplicate of the representative string
▶ Chez Scheme returns the exact, identical string
▶ This is one area where Chez’s behaviour is problematic:
▶ If the original representative string is returned, it can be
▶ This shall render the symbol inaccessible, since the hash
▶ This [mis-]behaviour was intentional in Chez, and was used as a
▶ In interactive Scheme, new symbols are added all the time as
▶ The scanner is in charge of
▶ Recognizing the symbol token
▶ Hashing the symbol string to obtain a bucket (whether original
▶ Creating the symbol object: A symbol is a tagged object
▶ The bucket contains 2 cells:
▶ The print cell, pointing to the representative string
▶ The value cell, holding the value of the free variable by the
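The bucket structure described above can be sketched as follows. This is an illustration only: a Python dict stands in for the hash table, and the class and function names (`Bucket`, `Symbol`, `string_to_symbol`, `symbol_to_string`) are hypothetical.

```python
class Bucket:
    def __init__(self, name):
        self.print_cell = name      # the representative string
        self.value_cell = None      # value of the free variable by this name, if any

class Symbol:
    def __init__(self, bucket):
        self.bucket = bucket        # a symbol is a tagged pointer to its bucket

symbol_table = {}                   # representative string -> Symbol

def string_to_symbol(name):
    """Hash the string; create the bucket and symbol only on first use."""
    if name not in symbol_table:
        symbol_table[name] = Symbol(Bucket(name))
    return symbol_table[name]

def symbol_to_string(sym):
    return sym.bucket.print_cell

x = string_to_symbol("x")
x.bucket.value_cell = 34            # roughly what defining the global x to be 34 does
print(string_to_symbol("x") is x, string_to_symbol("x").bucket.value_cell)
```

Since every occurrence of the same name yields the same symbol object, comparing symbols reduces to comparing addresses, and reading a global variable is one dereference of the value cell.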
▶ The symbol-table serves two purposes:
▶ Managing the symbol data structure as a collection of hashed
▶ Managing the global-variable bindings via the top-level
▶ These two purposes may appear unrelated, but, in fact, they are
▶ Every free variable was once a symbol…
▶ Every symbol is hashed
▶ Free variables and symbols can be loaded during run-time
▶ The view of the symbol-table across the dimension of the value
▶ The top-level holds the global bindings in Scheme
▶ For example, the procedures car, cdr, cons, and other builtins
▶ Modern versions of Scheme (R6RS) & modern dialects of LISP
▶ The top-level, in such systems, is just a system namespace that
▶ Scheme buckets come with a name-cell and a value-cell
▶ Some dialects of LISP come with more cells
▶ A value-cell and a function-cell
▶ Such systems are known as 2-LISP systems, because beyond the
▶ In this sense, Scheme is a 1-LISP
▶ The same variable name can refer both to a procedure and a
▶ This does not mean you cannot store a procedure in a value cell
▶ To apply a procedure in a value cell, you need to use the
▶ To obtain the closure in the function-cell (to be passed as
▶ What are the advantages of 2-LISP languages?
▶ A long time ago, some people thought the ability to overload a
▶ So why bother with 2-LISP languages??
▶ Well, there’s this hardly-known, esoteric, programming
▶ Every name in Perl can be used for
▶ A function
▶ A scalar
▶ An array
▶ A hash table
▶ A file handle
▶ So knowing about this nonsense might reduce your learning
[Figure: The Symbol Table & Top-Level. A hash table of hash buckets, each holding a print cell and a value cell. The symbols shown point to buckets whose print cells refer to the strings "foo" (length 3) and "x" (length 1); a value cell refers to the integer 34.]
▶ There is a strict grammar for literal symbols, but any string can
▶ Because a symbol can be created from any string, symbols that
▶ The string->symbol procedure may be thought of as the API
▶ As of R6RS, Scheme supports hash tables as first-class objects,
▶ When a new symbol is hashed, a bucket for it is created, and
▶ Global variables are defined by means of the define-expression
▶ Attempting to assign an undefined variable is defined to be an
▶ Re-defining a variable changes its value & is somewhat similar
▶ When expressions are read, either at the REPL, or from a file,
▶ At this stage, symbols are hashed & the corresponding symbol
▶ If, upon parsing, such a symbol turns out to denote a free
▶ Thus free variable access is but a pointer dereference away from
▶ Because symbols are hashed by the scanner, it makes no sense
▶ The list of print-names from the symbol-table is available via the
▶ Symbols for which there are buckets in the hash-table are said to
▶ Another kind of symbols are the uninterned symbols, which are
▶ The vast majority of symbols in the system are interned and
▶ Uninterned symbols are a hack that was added intentionally to
▶ Uninterned symbols are used in situations where we require a
▶ Such symbols are used when we need unique names, such as for
▶ Uninterned symbols are created by means of the procedure
▶ gensym generates uninterned symbols, usually with numbered
▶ The symbol? predicate returns #t for an uninterned symbol
▶ The eq? predicate returns #f whenever an uninterned symbol is
▶ Either an interned symbol, including a symbol by the same
▶ Or another gensym:
▶ The symbol->string procedure will generate a string of the
▶ Uninterned symbols can be identified via the gensym? procedure
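The difference between interned and uninterned symbols can be modelled in a few lines. This is a sketch under simplifying assumptions, not how any particular Scheme implements gensym: the `Symbol` class, the `g<number>` naming scheme, and the helper names are all hypothetical.

```python
import itertools

class Symbol:
    def __init__(self, name, interned):
        self.name = name
        self.interned = interned

_table = {}                          # holds interned symbols only
_counter = itertools.count()

def intern(name):
    """Return the unique interned symbol with this print name."""
    if name not in _table:
        _table[name] = Symbol(name, True)
    return _table[name]

def gensym():
    """Create a fresh symbol that is never entered into the table."""
    return Symbol(f"g{next(_counter)}", False)

def is_gensym(sym):
    return not sym.interned

g = gensym()
print(g is intern(g.name))           # False, even though the names coincide
print(gensym() is gensym())          # False: every gensym is distinct
```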
▶ The implementation of symbols is simplified by the fact that
▶ To simplify matters further, you should not implement the
▶ This means that all symbols in our system are static, literal
▶ All symbols shall have the respective representative string as a
▶ The symbol data-structure is a tagged data structure that
▶ The eq? procedure should compare the address fields of the
▶ The symbol->string procedure shall create and return a copy
▶ Just as before, our implementation of free variables is simplified
▶ Global variables in our system are not much more than names
▶ Your goal is to create a free-variables-table to serve during the
▶ Just as you collected a list of constants, by traversing the AST
▶ Create a set from the above list of strings, by removing
▶ Create a list of pairs, based on the above set, by associating with
▶ The list of pairs is the free-variables-table:
▶ This table must be available to the code generator
▶ Here is how the code-generator uses it:
▶ For a get to a free variable, the code-generator issues a mov
▶ For a set to a free variable, the code-generator issues a mov
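The construction and use of the free-variables-table can be sketched as follows. Several details are assumptions made for illustration: the names are taken as an already-collected list of strings, each variable is given an index into a table at a hypothetical label `fvar_tbl` with 8-byte cells, and values are assumed to travel in rax.

```python
WORD = 8                                         # assumed cell size in bytes

def make_fvar_table(names):
    """Remove duplicates (keeping order) and pair each name with an index."""
    unique = list(dict.fromkeys(names))
    return [(name, index) for index, name in enumerate(unique)]

def gen_get(table, name):
    index = dict(table)[name]
    return f"mov rax, qword [fvar_tbl + {WORD} * {index}]"

def gen_set(table, name):
    index = dict(table)[name]                    # the new value is assumed to be in rax
    return f"mov qword [fvar_tbl + {WORD} * {index}], rax"

table = make_fvar_table(["car", "x", "car"])
print(table)
print(gen_get(table, "x"))
print(gen_set(table, "car"))
```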
▶ The Code Generator
[Figure: compiler diagram with the labels "compiling program", "compiler", "src lang", "dst lang", and "runs".]
▶ The code generator is a function expr′ → string
▶ We look at expr′ after the semantic analysis phase is complete
▶ After the constants-table and free-vars-table have been set up
▶ The string returned is x86/64 assembly language code, line by
▶ Notation: The notation 〚·〛 stands for the code-generator
▶ The induction hypothesis of the code-generator: For any
▶ We need this assumption to convince ourselves that for each
▶ The relative correctness of each part of the code-generator is
▶ The calling conventions on the x86/64 architecture specify that
▶ This calling convention is very nice for C, because most
▶ This calling convention is not very nice for Scheme, because of
▶ We shall assume the availability of four singleton, literal
▶ The void object #<void>
▶ located at label sob_void
▶ The empty list ()
▶ located at label sob_nil
▶ The Boolean value false #f
▶ located at label sob_false
▶ The Boolean value true #t
▶ located at label sob_true
▶ We assume the following structure for all activation frames:
    (higher addresses)
    An-1
    ⋯
    A0           qword [rbp + 8 * 4]
    n            qword [rbp + 8 * 3]
    lex env      qword [rbp + 8 * 2]
    ret addr     qword [rbp + 8 * 1]
    saved rbp    qword [rbp]
    (lower addresses, top of the System Stack)
▶ We shall assume there is always at least one activation frame
▶ This means that the code-generator assumes that we are within
▶ We will need to support/maintain this assumption by setting up
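To make the idea of a recursive code-generator concrete, here is a small sketch that handles constants, sequences, if-expressions and or-expressions. It is not the course's reference implementation: the tuple-based AST, the label `const_tbl`, the label-naming scheme, and the convention that every fragment leaves its result in rax are assumptions made for the example. Each case relies on the recursive calls producing correct code for the sub-expressions, which is the induction hypothesis at work.

```python
import itertools

_labels = itertools.count()                  # source of unique label suffixes

def gen(e):
    """Return assembly text that leaves the value of e in rax."""
    kind = e[0]
    if kind == "Const":                      # a single mov, whatever the constant is
        return f"mov rax, const_tbl + {e[1]}\n"
    if kind == "Seq":                        # the value of the last expression remains
        return "".join(gen(x) for x in e[1])
    if kind == "If":
        n = next(_labels)
        return (gen(e[1])
                + "cmp rax, sob_false\n"
                + f"je Lelse{n}\n"
                + gen(e[2])
                + f"jmp Lexit{n}\n"
                + f"Lelse{n}:\n"
                + gen(e[3])
                + f"Lexit{n}:\n")
    if kind == "Or":                         # stop at the first value that is not #f
        n = next(_labels)
        test = f"cmp rax, sob_false\njne Lexit{n}\n"
        return test.join(gen(x) for x in e[1]) + f"Lexit{n}:\n"
    raise ValueError(f"unsupported node: {kind}")

print(gen(("If", ("Const", 0), ("Const", 9), ("Const", 18))))
```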
▶ Boxes provide one extra level of indirection to the value
▶ Boxes can be implemented as untagged arrays of size 1
▶ That they are untagged means that boxes do not contain RTTI
▶ This is probably the simplest implementation, but nevertheless,
▶ Allocate the ExtEnv (the size of which is
▶ Copy pointers of minor vectors from Env
Closure-Creation Code:
    Create ExtEnv
    Allocate closure object
    Closure → Env ≔ ExtEnv
    Closure → Code ≔ Lcode
    jmp Lcont
  Lcode:                        (closure body)
    push rbp
    mov rbp, rsp
    〚 body 〛
    leave
    ret
  Lcont:
▶ Allocate ExtEnv[0] to point to a vector
▶ Copy the parameters off of the stack:
▶ Allocate the closure object; Address in rax
▶ Set rax → env = ExtEnv
▶ Set rax → code = Lcode
▶ jmp Lcont
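The steps for creating a closure can be simulated at a high level. The sketch below models the environment as a Python list of minor vectors and the closure as a dict; it assumes that the extended environment has exactly one more slot than the current one and that existing minor vectors are shifted by one position, which is the usual reading of these steps. The name `make_closure` is hypothetical.

```python
def make_closure(env, params, code):
    """env: list of minor vectors; params: arguments of the current frame."""
    ext_env = [None] * (len(env) + 1)     # assumed size: one slot more than env
    for i, minor in enumerate(env):
        ext_env[i + 1] = minor            # copy pointers only, shifted by one
    ext_env[0] = list(params)             # fresh vector holding the parameters
    return {"env": ext_env, "code": code}

outer = [[10, 20]]
closure = make_closure(outer, [1, 2, 3], "Lcode")
print(closure["env"])
print(closure["env"][1] is outer[0])      # minor vectors are shared, not copied
```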
▶ Lcode:
▶ During the creation of the closure, we
▶ During the application of the closure, only
▶ The code in orange is embedded within
▶ This makes our code-generator
▶ We can combine the output of the
▶ The downside is that we need to pay
▶ We could have saved the jmp Lcont
▶ We could not have combined the output
▶ We would have needed some place, “out
▶ Notice that upon return, we consult the argument count on the
▶ This takes into account the fact that the number we need to
▶ lambda-expressions with optional arguments
▶ The tail-call optimization
▶ The code is essentially the same as for
▶ The difference occurs in the body of the
ClosureOpt-Creation Code:
    Create ExtEnv
    Allocate closure object
    Closure → Env ≔ ExtEnv
    Closure → Code ≔ Lcode
    jmp Lcont
  Lcode:                        (closure body)
    Adjust stack for
    push rbp
    mov rbp, rsp
    〚 body 〛
    leave
    ret
  Lcont:
▶ Suppose (lambda (a b c . d)
▶ Six arguments are passed
▶ The body of the procedure expects
▶ The last argument, d, is
▶ The stack needs to be adjusted
▶ Notice that the number of
    The stack as it is    The stack as expected
    8
    5
    3                     (3 5 8)
    2                     2
    1                     1
    1                     1
    6                     4
    env                   env
    ret                   ret
▶ Suppose (lambda (a b c . d)
▶ Three arguments are passed
▶ The body of the procedure expects
▶ The last argument, d, is
▶ The stack needs to be adjusted
▶ Notice that the number of
    The stack as it is    The stack as expected
                          ()
    2                     2
    1                     1
    1                     1
    3                     4
    env                   env
    ret                   ret
▶ As you can see
▶ Sometimes we need to shrink the top frame
▶ Sometimes we need to enlarge the top frame by one
▶ When the number of arguments matches precisely the number
▶ We shift the contents of the frame down by one [8-byte] word
▶ We can test during run-time and decide what to do
▶ This is the basic approach
▶ We can also use magic to save us from having to test and shift
▶ To use magic, we need to change the structure of our
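The effect of the basic approach (test at run-time, then shrink or enlarge the frame) can be expressed on an abstract argument list. This sketch ignores the actual shifting of stack words and shows only the resulting arguments and argument count; the name `adjust_frame` is hypothetical, and a Python tuple stands in for the Scheme list of optional arguments.

```python
def adjust_frame(args, n_required):
    """args: A0..A(n-1) as passed. Returns (new_args, new_count)."""
    if len(args) < n_required:
        raise TypeError("too few arguments")
    rest = tuple(args[n_required:])           # extras are packed into one list
    new_args = list(args[:n_required]) + [rest]
    return new_args, len(new_args)

# (lambda (a b c . d) ...) has three required parameters
print(adjust_frame([1, 1, 2, 3, 5, 8], 3))    # the frame shrinks
print(adjust_frame([1, 1, 2], 3))             # the frame grows by one
```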
[Figure: two activation frames on the System Stack, side by side. Left: the frame as before (An-1 ⋯ A0, n, lex env, ret addr, addressed via qword [rbp] through qword [rbp + 8 * 4]). Right: the same frame with one extra word, magic, reserved at the start of the frame.]
▶ Suppose (lambda (a b c . d)
▶ Six arguments are passed
▶ The body of the procedure expects
▶ The last argument, d, is
▶ The stack needs to be adjusted
▶ Notice that the number of
    The stack as it is    The stack as expected
    magic                 magic
    8
    5
    3                     (3 5 8)
    2                     2
    1                     1
    1                     1
    6                     4
    env                   env
    ret                   ret
▶ Suppose (lambda (a b c . d)
▶ Three arguments are passed
▶ The body of the procedure expects
▶ The last argument, d, is
▶ The stack needs to be adjusted
▶ Notice that the number of
    The stack as it is    The stack as expected
    magic                 ()
    2                     2
    1                     1
    1                     1
    3                     3
    env                   env
    ret                   ret
▶ Using magic means reserving a word at the start of each frame
▶ All frames grow by one word, regardless of whether or not they
▶ We do not include magic in the argument count on the stack!
▶ If you choose to use magic you need to remember to remove it
[Figure: Setting up the stack for a tail-call, in three panels.
Panel 1: A [non-tail] call to (g A-0 ⋯ A-n-1) from within the body. The frame of the call (g A-0 ⋯ A-n-1) holds A-n-1, A-n-2, ⋯, A-0, n, lex-env-g, ret-to-f, rbp-in-f.
Panel 2: Setting up the stack for a tail-call (h B-0 ⋯ B-m-1) from within the body. B-m-1, B-m-2, B-m-3, ⋯, B-0, m, lex-env-h, ret-to-f are pushed on top of the frame of g.
Panel 3: The stack set up for the tail-call (h B-0 ⋯ B-m-1) after overwriting the frame from the call to g, and before jumping to f. The frame of the call (h B-0 ... B-m-1) holds B-m-1, B-m-2, B-m-3, ⋯, B-0, m, lex-env-h, ret-to-f. Once called, h shall push the rbp...]
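The frame overwriting shown in the figure can be simulated with a Python list standing in for the stack (the top of the stack is the end of the list). This is a model under stated assumptions, not assembly: `rbp` is the index of the saved rbp inside the frame of g, the frame layout follows the one used throughout (arguments, count, environment, return address, saved rbp), and the function name is hypothetical.

```python
def tail_call_frame(stack, rbp, new_args, env_h):
    """Replace g's frame by a frame for h, in place. Returns the rbp to restore."""
    saved_rbp = stack[rbp]                    # rbp-in-f
    ret_addr = stack[rbp - 1]                 # ret-to-f, reused by h
    n = stack[rbp - 3]                        # argument count of g's frame
    frame_start = rbp - 3 - n                 # position of A-(n-1)
    # push h's frame on top of g's frame
    new_frame = list(reversed(new_args)) + [len(new_args), env_h, ret_addr]
    stack.extend(new_frame)
    # slide the new frame down over the old one
    del stack[frame_start:len(stack) - len(new_frame)]
    return saved_rbp

stack = ["f-stuff", "A1", "A0", 2, "env_g", "ret_to_f", "rbp_in_f"]
restored = tail_call_frame(stack, 6, ["B0", "B1", "B2"], "env_h")
print(stack, restored)
```

After the call, the stack holds only the frame for h with the original return address, so h returns directly to f and the stack does not grow across tail-calls.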
▶ The code-generator is a recursive function expr′ → string
▶ The return string is an assembly-language code-fragment
▶ To convert this fragment into a standalone program, we need to
▶ The prologue
▶ defines the various segments
▶ lays out constants in the data segment
▶ lays out the free variables in the data segment
▶ sets up the initial dummy frame
▶ calls the user-code
▶ The epilogue contains the code for the primitive procedures
▶ The builtin procedures are the procedures that come with the
▶ There are two kinds of builtin procedures:
▶ Low-level builtins, or primitives, which are low-level
▶ Such procedures include car, pair?, apply and others
▶ Higher-level builtins, which are procedures that could either be
▶ Such procedures include map, length, list, etc
▶ We shall supply you with a Scheme source file containing
▶ The general format of the apply procedure is
▶ s is a proper list
▶ x0 ⋯ xn−1 is a possibly-empty sequence of Scheme expressions
▶ proc is an expression the value c of which is a closure
▶ Let w ≡ `(,x0 ⋯ ,xn−1 ,@s). The closure c should be such that
▶ The closure c is applied to the arguments in w in tail-position
▶ An implementation of apply must duplicate the frame-recycling
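The way apply assembles its argument list w can be shown in a few lines. This sketch captures only the argument handling: Python has no tail-calls, so the frame-recycling aspect is not modelled, and the name `scheme_apply` is hypothetical.

```python
def scheme_apply(proc, *args):
    """Model of (apply proc x0 ... xn-1 s); the last argument s is a proper list."""
    if not args:
        raise TypeError("apply needs a final list argument")
    *xs, s = args
    w = list(xs) + list(s)          # w = `(,x0 ... ,xn-1 ,@s)
    return proc(*w)

print(scheme_apply(lambda *a: sum(a), 1, 2, [3, 4]))
print(scheme_apply(max, [3, 9, 2]))
```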
▶ Start with template code for the code-generator. It should do
▶ Compose the procedures in the assignments so far:
▶ You are given code for opening and reading a textfile
▶ Open the Scheme source file, read in the text
▶ Apply the reader to the list of characters
▶ Your grammar should be for ⟨sexpr⟩∗
▶ You should have a usable read_sexprs procedure for doing
▶ Map over the list of sexprs a procedure that
▶ tag-parses the sexpr
▶ performs semantic analysis on the parsed-expression
▶ Build the constants-table & free-variable-table
▶ Apply the code-generator to each expr' in the list
▶ Append to each of the resulting strings a call to an x86/64
▶ examines the contents of rax, and
▶ prints the Scheme object if not void
▶ Catenate all the strings together; This is your code-fragment
▶ Sandwich the code-fragment between a prologue and epilogue
▶ Write the resulting string into a text-file using the code we shall
▶ Then run nasm to assemble the assembly file
▶ Then run gcc to link the resulting object file
▶ If you link with Linux standard C libraries, you will find it very
▶ You are using gcc as a “smart linker” 😊
▶ Run the executable and examine the output to stdout
▶ By the end of all these steps, you will have completed the cycle
▶ You can now go from Scheme source code to a Linux executable
▶ The only problem is that your code generator supports nothing
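The final assembly of the program text can be sketched as plain string manipulation. Everything named here is an assumption for illustration: the printing routine `write_sob_if_not_void`, the helper names, and the exact nasm and gcc command lines (shown only as argument lists, not executed); the actual prologue, epilogue and build commands are the ones you are given.

```python
def build_program(fragments, prologue, epilogue,
                  printer="write_sob_if_not_void"):   # hypothetical routine name
    """Append a call to the printer after each fragment, then add prologue/epilogue."""
    body = "".join(f"{frag}\ncall {printer}\n" for frag in fragments)
    return f"{prologue}\n{body}{epilogue}\n"

def build_commands(stem):
    """Assumed command lines: assemble stem.s with nasm, then link with gcc."""
    return [["nasm", "-f", "elf64", "-o", f"{stem}.o", f"{stem}.s"],
            ["gcc", "-static", "-m64", "-o", stem, f"{stem}.o"]]

program = build_program(["mov rax, sob_true", "mov rax, sob_nil"],
                        "; prologue goes here", "; epilogue goes here")
print(program)
print(build_commands("foo"))
```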
▶ Add support for constants, and test!
▶ Add support for Seq', and test!
▶ Add support for If', and test!
▶ Test support for and (which macro-expands to
▶ Add support for Or', and test!
▶ Add support for defining/setting/getting free variables, and test!
▶ Add support for LambdaSimple'
▶ Add support for Applic', and test thoroughly!
▶ Implement some primitives and test thoroughly!
▶ Start with type-predicates such as pair?, null?, number?,
▶ Throw in car, cdr, cons, etc
▶ Add support for ApplicTP', and test thoroughly!
▶ Add support for the rest of the primitives, and test thoroughly!
▶ Add support for LambdaOpt', and test!
▶ Include the Scheme code we shall provide you, and test
▶ When you read a Scheme source file, be sure to append it to
▶ This will make it appear that the Scheme code we provide was
▶ Your compiler will compile all the code together
▶ By now you have a working compiler! Congratulations!
▶ Run your compiler at the Linux labs
▶ Make sure everything builds properly
▶ Submit your compiler according to the instructions we provide
▶ Share/show code to others
▶ Include code from students from previous years, from your
▶ Slack off, and leave most/all of the work to your partner
▶ Make your code public (e.g., put it on GitHub)
▶ Share tests with your classmates
▶ Share scripts to automate testing with your classmates
▶ Test after the tiniest changes/additions to your code
▶ Gloat, boast, be proud, & happy when your compiler finally