CS502: Compiler Design
Runtime Environments
Manas Thakur
Fall 2020
Going backstage
- Front end:
– Character stream → Lexical Analyzer → Token stream
– Token stream → Syntax Analyzer → Syntax tree
– Syntax tree → Semantic Analyzer → Syntax tree
– Syntax tree → Intermediate Code Generator → Intermediate representation
- Back end:
– Intermediate representation → Machine-Independent Code Optimizer → Intermediate representation
– Intermediate representation → Code Generator → Target machine code
– Target machine code → Machine-Dependent Code Optimizer → Target machine code
- Symbol Table (used across the phases)
What all from the runtime interests a compiler?
- Memory
– holds data and code
– Our interest: Storage layouts
- Processor(s)
– perform(s) computations in registers
– Our interest: Register allocation
- Instruction set
– defines the primitives available for execution
– Our interest: Code generation
- Ultimate aim: Performance
– in terms of time and memory
– Our interest: Optimization
Typical memory subdivision while executing a program
- Code: instructions
- Static: data across all procedures (globals, constants)
- Heap: data that outlives procedures (malloc, new)
- Free memory
- Stack: data local to procedures (variables, parameters, temporaries)
The compiler's responsibility: reserving space for all these kinds of memory
Procedure abstraction
- A namespace for locals and parameters
- Also the return value
- Compiler passes (recall ICG) introduce temporaries
- We need to reserve space for all of them
- Operations:
– Call another procedure
- caller vs callee
– Return from the current procedure
How do we call and return from a procedure?
- In the caller:
– Save state of current procedure
- Program counter (where to resume)
- Registers (holding current computations)
– Store arguments in a callee-accessible location
– Transfer control flow
- In the callee:
– Collect parameters
– Declare variables
– Perform computations (perhaps in temporaries)
- May involve accessing globals
– Return to caller
- Store return value in a caller-accessible location
Some of these tasks can be performed either by the caller or by the callee, e.g., caller-save vs. callee-save registers.
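A minimal Python sketch of one possible calling sequence, assuming a caller-save convention. The list `stack`, the dict `registers`, the frame keys, the register name `r1`, and the procedure `square_plus_one` are all hypothetical; real calling sequences are fixed by the target ABI.

```python
# Toy machine: `stack` models the control stack, `registers` the registers.
stack = []
registers = {"r1": 0}

def call(callee, args, return_address):
    # Caller: save state, store arguments where the callee can find them,
    # reserve a slot for the return value, then transfer control.
    stack.append({
        "saved_registers": dict(registers),   # caller-save convention
        "return_address": return_address,     # where to resume
        "args": list(args),
        "return_value": None,
    })
    callee()
    # Back in the caller: read the result and restore the saved state.
    frame = stack.pop()
    registers.clear()
    registers.update(frame["saved_registers"])
    return frame["return_value"], frame["return_address"]

def square_plus_one():
    frame = stack[-1]              # callee: collect parameters from its frame
    (x,) = frame["args"]
    registers["r1"] = x * x        # the computation clobbers a register
    temp = registers["r1"] + 1     # a temporary
    frame["return_value"] = temp   # store the result where the caller can read it

registers["r1"] = 42               # the caller has live data in r1
value, resume_at = call(square_plus_one, [5], return_address=100)
print(value, resume_at, registers["r1"])   # 26 100 42
```

Because the caller saved `r1` before transferring control, the callee was free to overwrite it; under a callee-save convention the callee would save and restore it instead.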
Supporting procedure calls
- Only one procedure runs at a time
– Unless?
- If foo calls bar, bar returns before foo
– bar comes last but goes first
– Last In, First Out!
- Procedure calls are modelled using a stack
– Called control-stack or “the stack”
- Each active procedure has an activation record (or frame) on the stack
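The LIFO discipline can be traced with a small Python sketch, in which each stack entry stands in for one activation record. The procedures `foo`, `bar`, and `fact` and the helpers `push`/`pop` are hypothetical.

```python
control_stack = []   # one entry per active procedure
events = []
max_depth = 0

def push(proc):
    global max_depth
    control_stack.append({"proc": proc})            # call: push a frame
    max_depth = max(max_depth, len(control_stack))
    events.append(f"call {proc}")

def pop():
    frame = control_stack.pop()                     # return: pop the frame
    events.append(f"return {frame['proc']}")

def bar():
    push("bar")
    pop()

def foo():
    push("foo")
    bar()       # bar is activated after foo ...
    pop()       # ... but its frame was already popped before foo's

def fact(n):
    push(f"fact({n})")   # recursion: several activations of fact coexist
    result = 1 if n <= 1 else n * fact(n - 1)
    pop()
    return result

foo()
print(events)              # ['call foo', 'call bar', 'return bar', 'return foo']
result = fact(3)
print(result, max_depth)   # 6 3
```

The recursive `fact` also shows how several activations of the same procedure can sit on the control stack at once.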
Activation record (a general structure), between the previous frame and the next frame:
- Actual parameters
- Return value(s)
- Control/access link (points to the caller's frame or to other frames)
- Saved machine status (e.g., register values while transferring control flow)
- Local data
- Temporaries
- Frame pointer: marks the boundary of the current frame
- Stack pointer: used to access the other items
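The fields above can be written down as a Python dataclass. This is an illustrative model, not a real memory layout; the class name, field types, the procedures `main`/`foo`/`bar`, and the return addresses are hypothetical. Following control links from any frame walks back through the chain of callers.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ActivationRecord:
    proc: str
    actual_params: list
    control_link: Optional["ActivationRecord"]        # the caller's frame
    access_link: Optional["ActivationRecord"]         # frame of the enclosing procedure
    saved_status: dict = field(default_factory=dict)  # e.g., return address, registers
    return_value: object = None
    local_data: dict = field(default_factory=dict)
    temporaries: list = field(default_factory=list)

def dynamic_chain(ar):
    """Follow control links back to the outermost frame."""
    names = []
    while ar is not None:
        names.append(ar.proc)
        ar = ar.control_link
    return names

main_ar = ActivationRecord("main", [], None, None)
foo_ar = ActivationRecord("foo", [1, 2], main_ar, main_ar, {"return_address": 17})
bar_ar = ActivationRecord("bar", [3], foo_ar, main_ar, {"return_address": 42})
print(dynamic_chain(bar_ar))   # ['bar', 'foo', 'main']
```

Here `bar` is called by `foo` (control link) but is assumed to be nested directly inside `main` (access link), so the two links can differ.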
Addressing items in activation records
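- Frame layout: actual parameters, return value(s), control/access link, saved machine status, local data, temporaries
- Each item is addressed at a fixed offset from SP: items on one side of SP at SP - offset, items on the other side at SP + offset (addresses grow across the frame)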
Activation records: Design decisions
- Items communicated between caller and callee are placed near the caller's AR
– Parameters
– Return value
– Advantage?
- Fixed-length items are placed together
– Parameters
– Return value
– Control link
- The space requirement of locals/temporaries is sometimes not known early
(Figure: the activation record, from actual parameters through temporaries, laid out next to the caller's AR.)
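One way to see the advantage is to compute offsets. The Python sketch below assumes hypothetical field sizes (in words) and assumes SP sits at the boundary between the fixed-length part and the locals/temporaries. Fixed-length items get negative offsets that do not depend on the callee's locals, so the caller can address parameters and the return value without knowing them.

```python
# Hypothetical sizes in words; real sizes depend on the target and ABI.
FIXED = [("actual_params", 2), ("return_value", 1),
         ("control_link", 1), ("saved_status", 4)]

def layout(locals_and_temps):
    """Return {item: offset relative to SP}, assuming SP marks the boundary
    between the fixed-length part and the locals/temporaries."""
    offsets = {}
    # Fixed-length items lie on the caller's side of SP: SP - offset.
    pos = -sum(size for _, size in FIXED)
    for name, size in FIXED:
        offsets[name] = pos
        pos += size
    # Variable-length items lie on the other side: SP + offset.
    pos = 0
    for name, size in locals_and_temps:
        offsets[name] = pos
        pos += size
    return offsets

small = layout([("i", 1)])
large = layout([("buf", 100), ("t1", 1)])
print(small)   # {'actual_params': -8, 'return_value': -6, 'control_link': -5, 'saved_status': -4, 'i': 0}
print(all(small[n] == large[n] for n, _ in FIXED))   # True
```

Placing the variable-sized part last means only its own offsets change when a procedure's locals grow.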
Complications
- Access to non-local data
– Store globals at a “globally known” location (recall the Static area of the memory layout)
- Nested procedures
– Similar to, yet different from, nested blocks
– Store the nesting depth with each variable
– Use access links to point to the frames of enclosing procedures
- Passing procedures as arguments
– Or as return values
– Functional languages
- Challenge:
– Doing all this efficiently
Some other time!
Referencing variables with access links
- An access link points to the most recent activation of the procedure that lexically contains the current procedure
– When can we have multiple activations of a procedure on the control stack?
- Suppose
– Np is the nesting depth of procedure p, which refers to the non-local variable a
– Na is the nesting depth of the procedure, say q, that defines a
- Then, when in procedure p, Np - Na access links have to be traversed to get to the activation record of q
- Can we make this more efficient?
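A Python sketch of access links, assuming a hypothetical nesting in which `sort` (depth 1) contains `quicksort` (depth 2), which contains `partition` (depth 3). `call` sets the access link by following `caller.depth - depth + 1` links from the caller. `lookup` follows Np - Na links to reach the defining frame.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Frame:
    proc: str
    depth: int                      # nesting depth of proc
    access_link: Optional["Frame"]  # most recent AR of the lexically enclosing procedure
    local_vars: dict = field(default_factory=dict)

def call(caller, proc, depth, **local_vars):
    """Create a frame for `proc` (nested at `depth`) called from `caller`.
    A visible callee satisfies depth <= caller.depth + 1."""
    link = caller
    for _ in range(caller.depth - depth + 1):
        link = link.access_link
    return Frame(proc, depth, link, dict(local_vars))

def lookup(frame, var_depth, name):
    """Read `name`, declared by the procedure at depth var_depth (Na), from
    code running in `frame` (depth Np): follow Np - Na access links."""
    hops = 0
    while frame.depth > var_depth:
        frame = frame.access_link
        hops += 1
    return frame.local_vars[name], hops

sort_ar = Frame("sort", 1, None, {"a": [3, 1, 2]})
qs1 = call(sort_ar, "quicksort", 2)
qs2 = call(qs1, "quicksort", 2)   # recursive call: link points to sort, not qs1
part = call(qs2, "partition", 3)
print(lookup(part, 1, "a"))       # ([3, 1, 2], 2)
```

The recursive `quicksort` activation shows that the access link follows lexical nesting, not the caller.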
Displays as an alternative to access links
- Traversing access links one by one may be costly when the nesting-depth difference for the variable to be accessed is high
- Idea:
– Use a global array that holds, at index i, a pointer to the most recently active procedure with nesting depth i
– The array is called a display (say d)
– Advantage:
- If I am a procedure m with nesting depth k, and I want to access a variable a with nesting depth l ≤ k, I only have to follow at most two pointers:
– One to d[l], which gives the AR defining a
– Another for the offset of a from the SP of the obtained AR
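A Python sketch of a display. It assumes a bound `MAX_DEPTH` on nesting depth and uses the hypothetical procedures from the access-link example. Entering a procedure at depth i saves the old `display[i]` in the new frame. Leaving restores it, so `display[i]` always names the most recent activation at depth i.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_DEPTH = 8                          # assumed bound on nesting depth
display = [None] * (MAX_DEPTH + 1)     # display[i]: most recent AR at depth i

@dataclass
class Frame:
    proc: str
    depth: int
    saved_entry: Optional["Frame"]     # previous display[depth], restored on exit
    local_vars: dict = field(default_factory=dict)

def enter(proc, depth, **local_vars):
    frame = Frame(proc, depth, display[depth], dict(local_vars))
    display[depth] = frame
    return frame

def leave(frame):
    display[frame.depth] = frame.saved_entry

def lookup(var_depth, name):
    # One pointer to the defining AR via d[l], then the variable's slot in it.
    return display[var_depth].local_vars[name]

s = enter("sort", 1, a=[3, 1, 2])
q1 = enter("quicksort", 2, lo=0)
q2 = enter("quicksort", 2, lo=1)         # the newer activation shadows q1 in d[2]
p = enter("partition", 3)
print(lookup(1, "a"), lookup(2, "lo"))   # [3, 1, 2] 1
leave(p)
leave(q2)
print(lookup(2, "lo"))                   # 0: d[2] points to q1 again
```

The lookup cost no longer depends on the nesting-depth difference. The price is saving and restoring one display entry per call.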
Next class: Heap management.
CS502: Compiler Design
Runtime Environments (Cont.)
Manas Thakur
Fall 2020
Heap
- A chunk of memory usually used for dynamically allocated data
– using malloc, calloc, new, etc.
- Goal:
– Have as much space as possible to serve allocation requests
- Challenge:
– When to deallocate a previously allocated chunk
– Why didn’t this challenge exist with a stack?
- Memory associated with a frame gets popped automatically once the corresponding procedure finishes execution.
Memory allocation
- Simple task
- Keep a pointer to the first available memory location
- Allocate the requested block when a request comes
– Well, there are again multiple ways to do this:
- First fit
- Best fit
- Read OS books for more!
- Move the pointer to the next free location
- Challenge:
– Memory eventually fills up!
– Need deallocations.
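A Python sketch of the allocation steps above. It models a heap as integer addresses and uses hypothetical function names. `BumpAllocator` keeps a pointer to the first available location and moves it past each block. `allocate_from_holes` picks a free block by first fit or best fit from a list of `(start, length)` holes.

```python
class BumpAllocator:
    """Keeps a pointer to the first available location and moves it past
    each allocated block; memory is never reused."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.next_free = 0

    def allocate(self, size):
        if self.next_free + size > self.capacity:
            return None                  # memory has filled up
        start = self.next_free
        self.next_free += size
        return start

def allocate_from_holes(holes, size, strategy="first"):
    """holes: list of (start, length) free blocks, sorted by start address.
    Picks a hole by first fit or best fit, shrinks it in place, and returns
    its start address, or None if no hole is large enough."""
    fits = [i for i, (_, length) in enumerate(holes) if length >= size]
    if not fits:
        return None
    if strategy == "first":
        i = fits[0]                                   # lowest address that fits
    elif strategy == "best":
        i = min(fits, key=lambda j: holes[j][1])      # smallest hole that fits
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    start, length = holes[i]
    if length == size:
        del holes[i]
    else:
        holes[i] = (start + size, length - size)
    return start

bump = BumpAllocator(100)
print(bump.allocate(60), bump.allocate(30), bump.allocate(20))   # 0 60 None

holes_ff = [(0, 50), (100, 20), (200, 30)]
holes_bf = list(holes_ff)
print(allocate_from_holes(holes_ff, 25, "first"))   # 0
print(allocate_from_holes(holes_bf, 25, "best"))    # 200
```

Best fit leaves a 5-word sliver at address 225 but keeps the 50-word hole intact. First fit splits the first large hole instead.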
Explicit deallocation
- Programmer’s task to deallocate memory
- Most languages until the 1990s had explicit deallocation
– Exception: Lisp had garbage collection far back in 1958!
- Examples:
– free in C
– delete in C++
- Problem:
– Often difficult to visualize when to free memory
– Deleting too conservatively or too aggressively may lead to memory-related issues
Problems with bad explicit deallocation
- Too conservative:
– Memory leaks
- Memory fills up while running applications
- Buy the next smartphone with more GBs of RAM!
– What if it’s a high-end server at a government institute?
– What if it’s an iPhone? :-)
- Too aggressive:
– Dangling pointers
- A pointer to freed memory
- Using such pointers might lead to weird (and harmful) behaviour
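Both failure modes can be made visible with a hypothetical `ToyHeap` in Python that records which blocks are still allocated. In real C or C++, reading through a dangling pointer is undefined behaviour rather than a clean error. This toy raises an exception only to make the misuse observable.

```python
class ToyHeap:
    """A hypothetical heap that tracks live blocks, so misuse of explicit
    deallocation is detected instead of silently corrupting memory."""
    def __init__(self):
        self.live = {}          # address -> contents of blocks not yet freed
        self.next_addr = 0

    def malloc(self, contents):
        addr = self.next_addr
        self.next_addr += 1
        self.live[addr] = contents
        return addr

    def free(self, addr):
        del self.live[addr]     # raises KeyError on a double free

    def load(self, addr):
        if addr not in self.live:
            raise RuntimeError(f"dangling pointer: block {addr} was freed")
        return self.live[addr]

heap = ToyHeap()

# Too aggressive: free through p while the alias q is still in use.
p = heap.malloc("config")
q = p
heap.free(p)
try:
    heap.load(q)
    error = None
except RuntimeError as e:
    error = str(e)
print(error)     # dangling pointer: block 0 was freed

# Too conservative: overwrite r without freeing its block first.
r = heap.malloc("buffer")
r = heap.malloc("buffer2")
# r is the only variable still in use here, so any other live block is leaked.
leaked = [addr for addr in heap.live if addr != r]
print(leaked)    # [1]
```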
Implicit deallocation of memory
- Also called garbage collection
- Motto:
– Don’t trust the programmer
Instead:
– Trust the compiler writer!
- Idea:
– Memory that is no longer in use should be reclaimed automatically
- Examples:
– OO: Java, Smalltalk
– Functional: Lisp, ML, Haskell
– Logic: Prolog
– Scripting: Awk, Perl
Garbage collection schemes
- One shot:
– Pause the program
– Give full control to a GC pass
– Hope the situation improves once GC is over
- On-the-fly (aka incremental):
– Perform some GC actions periodically
- say after each call to new, and/or every time a procedure returns
– Sometimes a one-shot GC may be kept as backup
- Concurrent:
– Separate thread for GC
– Relatively complicated, but gaining popularity
Garbage collection algorithms
- Reference counting
- Mark and sweep
- Baker’s
- Lieberman’s
- Generational
- Region-based
- Parallel
- Many in the JVM itself:
– G1, Parallel, Concurrent mark and sweep (CMS), Serial, Shenandoah, ZGC ... the list keeps growing.
We will get a glimpse of reference counting, mark and sweep, and generational GC.
Reference counting GC
- With each allocated chunk (from now on, object) obj:
– maintain the count of references (or pointers) that point to obj
- Operations and actions:
– Allocate obj (e.g., q = new T(), such that the allocated chunk is named obj):
- Initialize obj.rc to one
– Copy a reference to obj (e.g., p = q):
- ++obj.rc
– A reference to obj changes to point to obj’ (e.g., q = r, where r points to obj’):
- --obj.rc and ++obj’.rc
– obj.rc becomes zero:
- Reclaim obj (and decrement the counts of the objects obj points to)
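A Python sketch of these rules. The variable table `variables` and the helpers `assign`/`set_field` are hypothetical stand-ins for compiler-inserted count updates. Each store increments the new target before decrementing the old one, so that `x = x` is safe. The last part builds a two-object cycle whose counts never reach zero, which is the cyclic-data-structure problem discussed next.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.fields = {}            # outgoing references to other objects

reclaimed = []                      # names of objects reclaimed so far

def inc(obj):
    if obj is not None:
        obj.rc += 1

def dec(obj):
    if obj is None:
        return
    obj.rc -= 1
    if obj.rc == 0:                 # no references left: reclaim
        reclaimed.append(obj.name)
        for child in obj.fields.values():
            dec(child)              # obj's own references disappear with it
        obj.fields.clear()

variables = {}                      # reference variables such as p, q, r

def assign(var, obj):
    """var = obj, maintaining counts."""
    inc(obj)
    dec(variables.get(var))
    variables[var] = obj

def set_field(src, name, obj):
    """src.name = obj, maintaining counts."""
    inc(obj)
    dec(src.fields.get(name))
    src.fields[name] = obj

assign("q", Obj("A"))               # q = new T(): A.rc = 1
assign("p", variables["q"])         # p = q: A.rc = 2
assign("r", Obj("B"))               # r = new T(): B.rc = 1
assign("q", variables["r"])         # q = r: A.rc = 1, B.rc = 2
assign("p", None)                   # p = null: A.rc = 0, A reclaimed
print(reclaimed)                    # ['A']

c1, c2 = Obj("C1"), Obj("C2")       # a cycle reachable only through x
assign("x", c1)                     # C1.rc = 1
set_field(c1, "next", c2)           # C2.rc = 1
set_field(c2, "next", c1)           # C1.rc = 2
assign("x", None)                   # C1.rc = 1: never reaches zero
print(reclaimed, c1.rc, c2.rc)      # ['A'] 1 1
```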
Reference counting: Disadvantages
- Expensive to maintain counts with each object
– Extra storage
– Extra computation with each statement that updates references
- Cyclic data structures
– References keep pointing to each other, so counts never drop to zero, though no external reference may be pointing to the data structure as a whole
- Memory fragmentation
– No inbuilt compaction
- Still a simple technique
– Example: Used by the UNIX kernel to recover file descriptors
Mark and sweep GC
- Two phases:
– Mark
– Sweep
- Mark phase:
– Marks all objects that are reachable from the roots (references held in globals, stack frames, and registers)
- Sweep phase:
– Reclaims all objects that were not marked in the mark phase
- Advantages:
– Cost is incurred only during the GC phase(s)
– Compaction can be performed before actually reclaiming the memory during the sweep phase
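A Python sketch of the two phases over a toy heap, modelled as a list of objects. The root set is passed in explicitly; this is an assumption, since a real collector finds roots in globals, stack frames, and registers. The mark phase uses an explicit worklist. The sweep phase keeps marked objects and clears their marks for the next collection. Compaction is not modelled.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Obj:
    name: str
    refs: list = field(default_factory=list)   # outgoing pointers
    marked: bool = False

def mark(roots):
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj.marked:
            continue
        obj.marked = True
        worklist.extend(obj.refs)

def sweep(heap):
    """Keep marked objects (clearing their marks for the next cycle);
    return the names of the reclaimed ones."""
    survivors, reclaimed = [], []
    for obj in heap:
        if obj.marked:
            obj.marked = False
            survivors.append(obj)
        else:
            reclaimed.append(obj.name)
    heap[:] = survivors
    return reclaimed

def collect(heap, roots):
    mark(roots)
    return sweep(heap)

a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)       # a -> b
c.refs.append(d)       # c -> d -> c: an unreachable cycle
d.refs.append(c)
heap = [a, b, c, d]
freed = collect(heap, roots=[a])
print(freed, [o.name for o in heap])   # ['c', 'd'] ['a', 'b']
```

Unlike reference counting, the unreachable cycle `c <-> d` is reclaimed, because only reachability from the roots matters.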
Generational GC
- (Statistical) Idea:
– Most objects fall out of use very quickly (high infant mortality!)
– Conversely, an old object might stay in business for some more time
- Divide objects into two or more classes (generations):
– Older generation
– Younger generation
- Each GC pass checks only objects in the younger generation for reclamation, and keeps moving surviving objects to the older generation
- Trigger full GC (of any kind) less frequently
- Overall, reduces GC cost while still being correct
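A Python sketch of a minor (young-generation-only) collection with promotion. The threshold `promote_after` and all names are hypothetical. The sketch treats pointers from old objects into the young generation as extra roots by scanning the old generation; production collectors usually track such pointers incrementally, e.g., with a remembered set. Old objects are never reclaimed here; that is left to the less frequent full GC.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Obj:
    name: str
    refs: list = field(default_factory=list)   # outgoing pointers
    age: int = 0                               # minor collections survived

class GenerationalHeap:
    def __init__(self, promote_after=2):
        self.young, self.old = [], []
        self.promote_after = promote_after

    def new(self, name):
        obj = Obj(name)
        self.young.append(obj)                 # new objects start young
        return obj

    def minor_gc(self, roots):
        """Collect only the young generation; return the reclaimed names."""
        young = {id(o) for o in self.young}
        live = set()
        # Old -> young pointers act as extra roots, so the pass stays correct.
        worklist = list(roots) + [r for o in self.old for r in o.refs]
        while worklist:
            obj = worklist.pop()
            if id(obj) in young and id(obj) not in live:   # only trace young objects
                live.add(id(obj))
                worklist.extend(obj.refs)
        reclaimed = [o.name for o in self.young if id(o) not in live]
        survivors = [o for o in self.young if id(o) in live]
        self.young = []
        for o in survivors:
            o.age += 1
            if o.age >= self.promote_after:
                self.old.append(o)             # promote to the older generation
            else:
                self.young.append(o)
        return reclaimed

heap = GenerationalHeap(promote_after=2)
session = heap.new("session")
for i in range(3):
    heap.new(f"temp{i}")                       # short-lived garbage
print(heap.minor_gc(roots=[session]))          # ['temp0', 'temp1', 'temp2']
heap.minor_gc(roots=[session])                 # session survives again: promoted
print([o.name for o in heap.old])              # ['session']

cache = heap.new("cache")
session.refs.append(cache)                     # old -> young pointer
print(heap.minor_gc(roots=[]))                 # []: cache is kept alive via session
```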
Swachch Heap Abhiyaan (Clean Heap Campaign)
- Motto: Help the garbage collector
- Why?
– GC is costly; pauses can be noticed while running large programs
- How?
– Program analyses:
- Escape analysis (identify objects local to a procedure and allocate them on the stack)
– Programmer-guided:
- Assign null to no-longer needed reference variables
- Mix explicit and implicit deallocation
- Easier said than done; popular subjects of research.
What next?
- Code Generation and Optimization (CGO)
– A very interleaved and interesting topic
– Final outcome:
- Target code that is also efficient