CS356: Discussion #11 Dynamic Memory, Allocation Lab and Linking

SLIDE 1

CS356: Discussion #11

Dynamic Memory, Allocation Lab and Linking

Illustrations from CS:APP3e textbook

SLIDE 2

malloc and free in action

Each square represents 4 bytes. malloc returns addresses that are multiples of 8 bytes (64 bits). On failure, malloc returns NULL and sets errno. malloc does not initialize the memory it returns; use calloc to get zero-filled memory.
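
A minimal sketch of these points, using only standard library calls. The helper names make_zeroed_ints and is_8_byte_aligned are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n ints that are guaranteed to be zero.
 * calloc zero-fills; malloc would leave the payload uninitialized. */
int *make_zeroed_ints(size_t n) {
    assert(n > 0);
    int *p = calloc(n, sizeof *p);
    if (p == NULL) {                          /* request failed */
        fprintf(stderr, "calloc: %s\n", strerror(errno));
        return NULL;
    }
    return p;
}

/* Returns 1 if the address is a multiple of 8, 0 otherwise. */
int is_8_byte_aligned(const void *p) {
    return ((uintptr_t)p % 8) == 0;
}
```

Callers should check the returned pointer for NULL before use and release the memory with free when done.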

SLIDE 3

Explicit Heap Allocators

Explicit allocators like malloc

  • Must handle arbitrary sequences of allocate/free requests
  • Must respond immediately (no buffering of requests)
  • Helper data structures must be stored in the heap itself
  • Payloads must be aligned to 8-bytes boundaries
  • Allocated blocks cannot be moved/modified

Goal 1. Maximize throughput: (# completed requests) / second

  • Simple to implement malloc with running time O(# free blocks) and free with running time O(1)

Goal 2. Maximize peak utilization: max{ allocated(t) : t ⩽ T} / heapsize(T)

  • If the heap can shrink, use max{ heapsize(t) : t ⩽ T} as the denominator
  • Problem: fragmentation, either internal (e.g., larger block allocated for alignment) or external (free space between allocated blocks)
  • Severity of external fragmentation also depends on future requests
SLIDE 4

Implementation: Implicit Free Lists

  • Block size includes header/padding and is always a multiple of 8 bytes.
  • The list can be scanned using headers, but a scan is O(# blocks), not O(# free blocks).
  • Special terminating header: zero size, allocated bit set (will not be merged).
  • With a 1-word header and 2-word alignment, the minimum block size is 2 words.

Each square represents 4 bytes. Header format: (block size) / (allocated bit).
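
Because block sizes are multiples of 8, the three low bits of the size are always zero and the lowest one can hold the allocated flag. A small sketch of that encoding follows; the function names pack, block_size, and is_allocated are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a block size (a multiple of 8) and an allocated bit into one 4-byte header. */
uint32_t pack(uint32_t size, uint32_t alloc) {
    assert(size % 8 == 0 && alloc <= 1);
    return size | alloc;
}

/* Clear the low three bits to recover the block size in bytes. */
uint32_t block_size(uint32_t header) {
    return header & ~0x7u;
}

/* The lowest bit says whether the block is allocated. */
uint32_t is_allocated(uint32_t header) {
    return header & 0x1u;
}
```

For example, pack(16, 1) yields 0x11, and pack(0, 1) is the terminating header.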

SLIDE 5

Exercise

Assume:

  • 8-byte alignment
  • Block sizes multiples of 8 bytes
  • Implicit free list with 4-byte header (format from previous slide)

Give the block size (in bytes) and the block header (in hex) for the blocks allocated by the following sequence: malloc(1), malloc(5), malloc(12), malloc(13).

Each block holds the payload plus the 4-byte header, rounded up to a multiple of 8; the low header bit is set because the block is allocated.

  • malloc(1): 8 bytes, block header 0x00000009 == 0...01001
  • malloc(5): 16 bytes, block header 0x00000011 == 0...10001
  • malloc(12): 16 bytes, block header 0x00000011 == 0...10001
  • malloc(13): 24 bytes, block header 0x00000019 == 0...11001
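
The answers can be reproduced with a few lines of arithmetic. This sketch assumes the exercise's 4-byte header and 8-byte alignment; block_size_for and allocated_header are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define HEADER_BYTES 4u   /* header size assumed by the exercise */
#define ALIGNMENT    8u   /* block sizes are multiples of 8 bytes */

/* Block size for a request: payload plus header, rounded up to a multiple of 8. */
uint32_t block_size_for(uint32_t payload) {
    assert(payload > 0);
    uint32_t raw = payload + HEADER_BYTES;
    return (raw + ALIGNMENT - 1) & ~(ALIGNMENT - 1);
}

/* Header of an allocated block: the size with the low (allocated) bit set. */
uint32_t allocated_header(uint32_t payload) {
    return block_size_for(payload) | 1u;
}
```

For malloc(13), the raw size is 17 bytes, which rounds up to 24, so the header is 24 | 1 = 0x19.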
SLIDE 6

Block Placing and Splitting of Free Blocks

  • If multiple blocks are available in the list, which one to pick?

    ○ First Fit: first block with enough space. ⇒ retains large blocks at the end of the list, but must skip many small blocks at the start
    ○ Next Fit: first block with enough space, starting from the last position. ⇒ no need to skip small blocks at the start, but worse memory utilization
    ○ Best Fit: smallest block with enough space. ⇒ generally better utilization, but slower (must check all blocks)

  • If available space is larger than required, what to do?

    ○ Assign the entire block ⇒ internal fragmentation (ok for a good fit)
    ○ Split the block at a 2-word boundary and add another header
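
First-fit placement with splitting can be sketched on a toy implicit list. The heap here is an array of 4-byte words with headers only (no footers), and the names size_of, alloc_of, first_fit, and place are assumptions for illustration rather than the lab's required interface.

```c
#include <assert.h>
#include <stdint.h>

#define WORD 4u                           /* bytes per heap word */

uint32_t size_of(uint32_t hdr)  { return hdr & ~0x7u; }
uint32_t alloc_of(uint32_t hdr) { return hdr & 0x1u; }

/* First fit: index of the first free block header with size >= need, or -1.
 * `start` is the index of the first header; a zero-size header ends the list. */
int first_fit(const uint32_t *heap, int start, uint32_t need) {
    for (int i = start; size_of(heap[i]) != 0; i += (int)(size_of(heap[i]) / WORD)) {
        if (!alloc_of(heap[i]) && size_of(heap[i]) >= need)
            return i;
    }
    return -1;
}

/* Place `need` bytes (header included) in the free block at index i.
 * Split when the remainder can hold a minimum block of 8 bytes (2 words). */
void place(uint32_t *heap, int i, uint32_t need) {
    uint32_t total = size_of(heap[i]);
    assert(!alloc_of(heap[i]) && need % 8 == 0 && need <= total);
    if (total - need >= 8) {
        heap[i] = need | 1u;                          /* allocated front part */
        heap[i + (int)(need / WORD)] = total - need;  /* header of the free remainder */
    } else {
        heap[i] = total | 1u;                         /* assign the entire block */
    }
}
```

Next fit would keep the index where the previous search stopped and start from there; best fit would scan the whole list and remember the smallest block that is large enough.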

SLIDE 7

Getting additional memory

What to do when no free block is large enough?

  • Extend the heap by calling sbrk(intptr_t increment)

    ○ Returns a pointer to the start of the new area

  • Coalesce adjacent free blocks

    ○ Can coalesce with both the previous and the following block
    ○ Coalescing when freeing blocks is O(1) but allows thrashing
    ○ Boundary tags: use a “footer” at the end of each (free) block to locate the previous block

SLIDE 8

Coalescing Cases
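
The four cases depend on whether the neighbors of the freed block are allocated: (1) both allocated, (2) only the next block free, (3) only the previous block free, (4) both free. The sketch below is one way to handle them with boundary tags on a toy heap of 4-byte words. The layout (an allocated block before the first real block and a 0/1 terminator after the last) and the names set_block and free_and_coalesce are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WORD 4u                           /* bytes per heap word */

uint32_t size_of(uint32_t tag)  { return tag & ~0x7u; }
uint32_t alloc_of(uint32_t tag) { return tag & 0x1u; }

/* Write a matching header and footer for the block whose header is at index h. */
void set_block(uint32_t *heap, int h, uint32_t size, uint32_t alloc) {
    heap[h] = size | alloc;
    heap[h + (int)(size / WORD) - 1] = size | alloc;
}

/* Free the block at header index h and merge it with any free neighbor.
 * Returns the header index of the resulting free block. */
int free_and_coalesce(uint32_t *heap, int h) {
    uint32_t size = size_of(heap[h]);
    assert(alloc_of(heap[h]) && size >= 8);
    uint32_t prev_tag = heap[h - 1];                   /* footer of the previous block */
    uint32_t next_tag = heap[h + (int)(size / WORD)];  /* header of the next block */

    if (!alloc_of(next_tag))                           /* cases 2 and 4: absorb next */
        size += size_of(next_tag);
    if (!alloc_of(prev_tag)) {                         /* cases 3 and 4: absorb previous */
        size += size_of(prev_tag);
        h -= (int)(size_of(prev_tag) / WORD);
    }
    set_block(heap, h, size, 0);                       /* case 1: only the flag changes */
    return h;
}
```

Each step reads a constant number of tags, which is why coalescing at free time is O(1).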

SLIDE 9

Explicit Free Lists

Allocation time is O(#blocks) for implicit lists...

  • Idea. Organize free blocks as a doubly-linked list (pointer inside free blocks)
  • LIFO ordering, first-fit placement ⇒ O(1) freeing/coalescing (boundary tags)
  • Address order, first-fit placement ⇒ O(#free blocks) freeing

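Much faster than implicit lists when memory is nearly full, but free blocks must hold two pointers, so the larger minimum block size can lower memory utilization.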

SLIDE 10

Explicit Free Lists: Example

[Figure: example heap with an explicit free list; tags read size/allocated: 16/0 16/0 24/1 24/1 16/0 ⨯ 16/0 16/0 ⨯ 16/0 0/1]

  • Still need boundary tags for coalescing
  • The next free block can be anywhere in the heap
  • Freeing with LIFO policy

    ○ No coalescing: just add the element to the list head
    ○ Coalescing: remove contiguous blocks from the free list, merge them, add the new free block to the list head

[Figure label: Free List Root]
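
The two list operations used by the LIFO policy can be sketched as follows. The struct free_node and the functions push_front and unlink_node are hypothetical; in a real allocator the two pointers would live inside the payload area of each free block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout: the links stored inside a free block. */
typedef struct free_node {
    struct free_node *prev;
    struct free_node *next;
} free_node;

/* LIFO policy: push a newly freed block at the list head in O(1).
 * Returns the new list root. */
free_node *push_front(free_node *root, free_node *blk) {
    assert(blk != NULL);
    blk->prev = NULL;
    blk->next = root;
    if (root != NULL)
        root->prev = blk;
    return blk;
}

/* Unlink a block in O(1), e.g., before merging it with a neighbor.
 * Returns the new list root. */
free_node *unlink_node(free_node *root, free_node *blk) {
    assert(blk != NULL);
    if (blk->prev != NULL) blk->prev->next = blk->next;
    else                   root = blk->next;
    if (blk->next != NULL) blk->next->prev = blk->prev;
    blk->prev = blk->next = NULL;
    return root;
}
```

Freeing with coalescing then becomes: unlink each free neighbor, merge the blocks using their boundary tags, and push the merged block at the head.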

SLIDE 11

Segregated Free Lists

To reduce allocation time:

  • Partition free blocks into size classes, e.g., (2^i, 2^(i+1)]
  • Keep a list of free blocks for each class
  • Add freed blocks to the appropriate class
  • Search for a block of size n in the appropriate class, then in the following (larger) ones

Example size classes: 1-8, 9-16, 17-32, 33-64, 65-∞
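
Mapping a size to its class is a short loop over power-of-two bounds. This sketch uses the five example classes above; size_class and NUM_CLASSES are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CLASSES 5

/* Map a block size to a class index for the classes
 * 1-8, 9-16, 17-32, 33-64, 65-inf (indices 0 to 4). */
int size_class(size_t size) {
    assert(size > 0);
    int idx = 0;
    size_t upper = 8;                     /* inclusive upper bound of class 0 */
    while (idx < NUM_CLASSES - 1 && size > upper) {
        upper *= 2;
        idx++;
    }
    return idx;
}
```

An allocator would keep one free-list root per index and, when the list for size_class(n) has no fit, continue with the lists at higher indices.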

SLIDE 12

Segregated Free Lists: Implementation

Simple Segregated

  • Allocate the maximum class size (e.g., 2^(i+1)), never split free blocks
  • Can assign first block from list and add freed blocks to front
  • If list is empty, extend the heap and add blocks to list
  • No header/footer required for allocated blocks, singly-linked free list

Segregated Fits

  • First-fit search in appropriate class, split and add remaining to some class
  • To free a block, coalesce and place result in appropriate class

Advantages

  • Higher throughput: log-time search for power-of-two size classes
  • Higher utilization: first-fit with segregated lists is closer to best fit
SLIDE 13

Allocation Lab

  • Goal. Write your own version of malloc, free, realloc.
  • Understand their specification!
  • Check the throughput / utilization effects of different strategies.

You only need to modify mm.c

#include <stdio.h>

int   mm_init(void);                       // init heap: 0 if successful, -1 if not
void *mm_malloc(size_t size);              // 8-byte aligned non-overlapping heap region
void  mm_free(void *ptr);                  // frees a pointer returned by mm_malloc
void *mm_realloc(void *ptr, size_t size);

/* mm_realloc(ptr, size)
 *   • if ptr is NULL, equivalent to mm_malloc(size)
 *   • if ptr is not NULL and size == 0, equivalent to mm_free(ptr)
 *   • if ptr is not NULL and size != 0, return pointer to area with new size
 *     (contract: old data unchanged, new data uninitialized)
 */
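
The three realloc rules can be sketched with a simple "allocate, copy, free" strategy. To stay self-contained, this example does not use the lab's mm_* functions: toy_malloc, toy_free, toy_size, and toy_realloc are hypothetical stand-ins built on the standard malloc, with the payload size stored in a 16-byte prefix. It is a baseline, not an efficient solution.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HDR 16   /* assumed prefix size; keeps the payload alignment of malloc */

/* Stand-in allocator: remember the payload size in front of the payload. */
void *toy_malloc(size_t size) {
    assert(sizeof size <= HDR);
    char *raw = malloc(HDR + size);
    if (raw == NULL) return NULL;
    memcpy(raw, &size, sizeof size);
    return raw + HDR;
}

void toy_free(void *ptr) {
    if (ptr != NULL) free((char *)ptr - HDR);
}

size_t toy_size(const void *ptr) {
    size_t size;
    memcpy(&size, (const char *)ptr - HDR, sizeof size);
    return size;
}

/* realloc following the three rules listed in the comment above. */
void *toy_realloc(void *ptr, size_t size) {
    if (ptr == NULL) return toy_malloc(size);          /* rule 1 */
    if (size == 0) { toy_free(ptr); return NULL; }     /* rule 2 */

    void *fresh = toy_malloc(size);                    /* rule 3 */
    if (fresh == NULL) return NULL;                    /* old block stays valid */
    size_t old = toy_size(ptr);
    memcpy(fresh, ptr, old < size ? old : size);       /* keep the old data */
    toy_free(ptr);
    return fresh;
}
```

A better mm_realloc can often avoid the copy, for example by growing in place when the following block is free.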

SLIDE 14

Support Routines

The memlib.c package simulates the OS memory system.

void *mem_sbrk(int incr);     // extend heap, return pointer to new area
void mem_reset_brk(void);     // reset brk, release all heap memory
void *mem_heap_lo(void);      // address of first heap byte
void *mem_heap_hi(void);      // address of last heap byte == brk-1
size_t mem_heapsize(void);    // heap size in bytes
size_t mem_pagesize(void);    // system page size

Note that mem_sbrk accepts only a positive integer (cannot shrink the heap).
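
To make the idea of a simulated heap concrete, here is a minimal sketch of an sbrk-like routine over a fixed array. It is not the lab's memlib.c: the sim_* names, the 4096-byte capacity, and the (void *)-1 failure value (borrowed from the sbrk convention) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_HEAP_BYTES 4096               /* hypothetical capacity */

static char  sim_heap[SIM_HEAP_BYTES];    /* backing storage for the toy heap */
static char *sim_brk = sim_heap;          /* first byte past the current heap */

/* Extend the simulated heap by incr bytes. Returns the old break (the start
 * of the new area), or (void *)-1 if incr is negative or does not fit. */
void *sim_sbrk(int incr) {
    assert(sim_brk >= sim_heap);
    if (incr < 0 || incr > SIM_HEAP_BYTES - (sim_brk - sim_heap))
        return (void *)-1;                /* cannot shrink or overflow */
    char *old = sim_brk;
    sim_brk += incr;
    return old;
}

size_t sim_heapsize(void) { return (size_t)(sim_brk - sim_heap); }
void  *sim_heap_lo(void)  { return sim_heap; }
void  *sim_heap_hi(void)  { return sim_brk - 1; }
```

An allocator that finds no fitting free block would request more space this way and turn the returned area into a new free block.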

SLIDE 15

Recommended: Heap Checker

To debug, scan the heap

  • Do allocated blocks overlap?
  • Are blocks in the free list marked as free?
  • Are all free blocks in the free list?
  • Do pointers in the heap point to valid heap addresses?
  • Invent your own… but add comments!

Save the checks inside int mm_check(void) (return 0 if inconsistent)

  • While debugging, call mm_check() and quit if it returns 0
  • Remove the calls from the final submission (they would decrease throughput)
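
A checker is mostly a heap walk with a handful of invariants. The sketch below checks a toy implicit list with a header and footer per block; the layout and the name toy_check are assumptions, and a real mm_check would add checks specific to the chosen free-list design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD 4u                           /* bytes per heap word */

/* Walk a toy implicit list (sizes in bytes, header and footer per block).
 * `first` is the index of the first header, `nwords` the heap length.
 * Returns 1 if the heap looks consistent and 0 otherwise. */
int toy_check(const uint32_t *heap, int first, int nwords) {
    assert(heap != NULL && first >= 0);
    int prev_free = 0;
    int i = first;
    while (i < nwords && (heap[i] & ~0x7u) != 0) {
        uint32_t size  = heap[i] & ~0x7u;
        uint32_t alloc = heap[i] & 0x1u;
        int footer = i + (int)(size / WORD) - 1;
        if (footer >= nwords)        return 0;   /* block runs past the heap */
        if (heap[footer] != heap[i]) return 0;   /* header/footer mismatch */
        if (!alloc && prev_free)     return 0;   /* two adjacent free blocks */
        prev_free = !alloc;
        i = footer + 1;
    }
    /* The walk must stop on the terminating header: size 0, allocated. */
    return i < nwords && heap[i] == 1u;
}
```

The adjacent-free-blocks check only makes sense for allocators that coalesce immediately on free.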
SLIDE 16

Evaluation: Trace Simulator

Trace Driver: mtest.c

  • Compile with make
  • Checks correctness, space utilization, throughput
  • ./mtest -r <num> to repeat throughput measurements <num> times
  • ./mtest -f <trace> to simulate a single trace

Trace File Format

  • Header

<num_ids>           /* number of request id's */
<num_ops>           /* number of requests (operations) */

  • Requests

a <id> <bytes>      /* ptr_<id> = malloc(<bytes>) */
r <id> <bytes>      /* realloc(ptr_<id>, <bytes>) */
f <id>              /* free(ptr_<id>) */

SLIDE 17

Trace Example

2
5
a 0 512
a 1 128
r 0 640
f 1
f 0

Meaning

  • 2 distinct request IDs
  • 5 requests
  • Allocate 512 bytes, allocate 128, resize from 512 to 640, free all areas
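
Reading request lines in this format takes one sscanf call per line. The sketch below is not the course's trace driver; the trace_op struct and parse_request function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One request from a trace: 'a' (malloc), 'r' (realloc), or 'f' (free). */
typedef struct {
    char op;
    int  id;
    long bytes;                           /* 0 for 'f' requests */
} trace_op;

/* Parse one request line. Returns 1 on success and 0 on a malformed line. */
int parse_request(const char *line, trace_op *out) {
    assert(line != NULL && out != NULL);
    char op = 0;
    int  id = 0;
    long bytes = 0;
    int n = sscanf(line, " %c %d %ld", &op, &id, &bytes);
    if (n == 3 && (op == 'a' || op == 'r')) {
        out->op = op; out->id = id; out->bytes = bytes;
        return 1;
    }
    if (n >= 2 && op == 'f') {            /* free carries no byte count */
        out->op = op; out->id = id; out->bytes = 0;
        return 1;
    }
    return 0;
}
```

A driver would keep an array of <num_ids> pointers and use the parsed id as an index into it when issuing the malloc, realloc, or free call.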
SLIDE 18

Assigned Points

100 points for performance

    ○ memory utilization = peak memory usage / heap size (at most 1)
    ○ throughput = operations / second
    ○ performance index (w = 0.6)

SLIDE 19

Implementations

Implementations include but are not limited to

  • Implicit free list (Best version: 59 + 2 = 61 points)
  • Explicit free list (Naïve version: 49 + 3 = 52 points; Best version: 100 points)
  • Segregated lists (Best version: 100 points)

Additional rules

  • You are not allowed to define global structs, arrays, lists, trees.
  • Only global scalar values (integers, floats, pointers) are allowed.
  • Returned pointers must be aligned to 8-byte boundaries.
SLIDE 20

Linking

Storage Class Specifiers

  • extern ⇒ to declare a global variable/function defined in another unit
  • static ⇒ to define a global variable/function with internal linkage

Function prototypes are extern by default; local variables can also be declared static (bad practice).

gcc -c main.c swap.c
gcc -o prog main.o swap.o
./prog
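
The specifiers can be seen side by side in a single file. This is a sketch with hypothetical names (shared_total, calls, next_ticket, add_to_total, call_count); in a multi-file program the definition of shared_total would sit in another .c file.

```c
#include <assert.h>

extern int shared_total;          /* declaration only: the definition is elsewhere */

static int calls = 0;             /* internal linkage: invisible to other units */

/* Static local variable: keeps its value between calls. */
int next_ticket(void) {
    static int ticket = 0;
    return ++ticket;
}

/* Externally visible function that updates both variables. */
void add_to_total(int amount) {
    assert(amount >= 0);
    calls++;
    shared_total += amount;
}

int call_count(void) { return calls; }

/* The definition that satisfies the extern declaration above. */
int shared_total = 0;
```

Another unit could declare extern int shared_total; and use it, but it could not name calls, because static gives that variable internal linkage.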

SLIDE 21

Global Variables (Avoid If Possible)

With extern specifier

  • Cannot initialize the variable (another unit will)
  • Expected during linking as a global variable in another unit

With Initialization (Strong Symbol)

  • Initialized to the given value
  • Exported during linking

    ○ Linking error if another unit initializes a variable with the same name
    ○ No error if the other unit defines a weak symbol (no initialization)

Without Initialization (Weak Symbol)

  • Initialized to zero if no strong symbol is present
  • Exported during linking in “common mode”

    ○ Shared if another unit defines a variable with the same name

No checks on global variable types: data types may not match (bad!)

    ○ Also no checks on function prototypes of external functions...
    ○ Checks on types/prototypes if link-time optimization (-flto) is enabled
    ○ Common strategy: each unit includes its own prototypes/externs

SLIDE 22

External Symbols: When types don’t match...

$ gcc -Wall -Wextra -std=c99 main.c swap.c -o prog
$ ./prog
z = 11223344
x = 11227788
y = 55663344
z = 11220000

Because swap.c treats x, y, and z as short, only their low 16 bits are swapped or cleared (on a little-endian machine).

/* main.c */
#include <stdio.h>

int z = 0x11223344;
void swap(int *x, int *y);

int main() {
    int x = 0x11223344;
    int y = 0x55667788;
    printf("z = %x\n", z);
    swap(&x, &y);
    printf("x = %x\n", x);
    printf("y = %x\n", y);
    printf("z = %x\n", z);
}

/* swap.c */
short z;

void swap(short *x, short *y) {
    z = 0;
    short tmp = *x;
    *x = *y;
    *y = tmp;
}

SLIDE 23

External Symbols: With -flto

$ gcc -Wall -Wextra -std=c99 -flto main.c swap.c -o prog
main.c:4:6: warning: type of ‘swap’ does not match original declaration [..]
swap.c:1:7: warning: type of ‘z’ does not match original declaration [..]
main.c:3:5: note: type ‘int’ should match type ‘short int’

/* main.c */
#include <stdio.h>

int z = 0x11223344;
void swap(int *x, int *y);

int main() {
    int x = 0x11223344;
    int y = 0x55667788;
    printf("z = %x\n", z);
    swap(&x, &y);
    printf("x = %x\n", x);
    printf("y = %x\n", y);
    printf("z = %x\n", z);
}

/* swap.c */
short z;

void swap(short *x, short *y) {
    z = 0;
    short tmp = *x;
    *x = *y;
    *y = tmp;
}

SLIDE 24

External Symbols: Using Headers

$ gcc -Wall -Wextra -std=c99 \
      main.c swap.c -o prog
$ ./prog
z = 11223344
x = 55667788
y = 11223344
z = 0

Strategy: Each unit includes its own prototypes/declarations.

  • Non-matching types now result in compile errors within a unit
  • Header guards are used to avoid double (or recursive) inclusion of headers
  • Adding int z = 42 to swap.c results in a linking error

/* main.c */
#include "main.h"

int z = 0x11223344;

int main() {
    int x = 0x11223344;
    int y = 0x55667788;
    printf("z = %x\n", z);
    swap(&x, &y);
    printf("x = %x\n", x);
    printf("y = %x\n", y);
    printf("z = %x\n", z);
}

/* swap.c */
#include "swap.h"

void swap(int *x, int *y) {
    z = 0;
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

/* main.h */
#ifndef MAIN_H
#define MAIN_H
#include <stdio.h>
#include "swap.h"
extern int z;
#endif /* MAIN_H */

/* swap.h */
#ifndef SWAP_H
#define SWAP_H
#include "main.h"
void swap(int *x, int *y);
#endif /* SWAP_H */

SLIDE 25

Linking

  • Phase 1: Symbol Resolution

    ○ Global Symbols: non-static global variables and functions
    ○ External Global Symbols: used but not defined in a unit
    ○ Local Symbols: static variables and functions (used only in this unit)

      ■ Local variables are not local symbols! (Not involved in linking)
      ■ Errors for duplicate definitions of local symbols and for duplicate global symbols with initializations (strong symbols)

  • Phase 2: Relocation
SLIDE 26

Symbol Resolution: Global, External, Local

SLIDE 27

Symbol Resolution: Strong and Weak

SLIDE 28

Examples

SLIDE 29

Examples

SLIDE 30

Object Files

Three kinds of object files:

  • Relocatable: code/data (e.g., .o produced by gcc -c)
  • Executable: binary ready for execution (e.g., ./prog produced by gcc -o)
  • Shared: ready to be used as dynamic library (.so on Linux)

On Linux, executable files use the ELF format (Executable and Linkable Format).

SLIDE 31

ELF Format

  • .text: binary code
  • .rodata: constants like strings
  • .data: initialized global/static variables
  • .bss: uninitialized global/static variables (no space in .o)
  • .symtab: symbol table (functions and global variables)
  • .rel.text: relocation info
  • .debug, .line: symbol table for locals and other definitions (included with -g)
  • .strtab: table of all the strings used by other sections and headers
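
The comments in the following sketch show where each definition typically ends up. Exact placement depends on the compiler and its flags, so treat the section names in the comments as the usual case rather than a guarantee; all identifiers are hypothetical.

```c
#include <assert.h>
#include <string.h>

int         initialized_global = 42;    /* typically .data */
int         zeroed_global;              /* typically .bss (zero at program start) */
static int  file_local_counter = 7;     /* typically .data, recorded as a local symbol */
const char *greeting = "hello";         /* pointer in a data section, string bytes in .rodata */

/* The machine code of functions goes to .text. */
int sum_globals(void) {
    int on_stack = 1;                   /* local variable: no symbol, lives on the stack */
    assert(strlen(greeting) == 5);
    return initialized_global + zeroed_global + file_local_counter + on_stack;
}
```

After compiling with gcc -c, tools such as readelf -S (section headers) and nm (symbols) can be used to inspect the resulting object file.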

SLIDE 32

Relocation

SLIDE 33

Static Libraries

  • Binary code from library functions is added to the executable at link time
  • Only needed functions are included
  • No dependencies at runtime
  • Large executable
  • Each program loads its own copy of the same library code into memory
  • Need to run the linker again to use a new library version (e.g., with bug fixes)

SLIDE 34

Shared Libraries

  • Use an indirection: look up the address of code/data at runtime
  • OS loader fills the Global Offset Table with the runtime location of code
  • Many processes can share the same code (as read-only areas)
  • If system libraries are updated, the new version is used
  • Compile with -shared