CS356: Discussion #10
Dynamic Memory and Cache Lab
Illustrations from CS:APP3e textbook
Cache Lab
Goal
○ Write a small C simulator of caching strategies.
○ Expect about 200-300 lines of code.
○ The starting point is in your repository.
Traces
For example: "I 0400d7d4,8", "M 0421c7f0,4", "L 04f6b868,8"
○ Instruction load: I (ignore these)
○ Data load: L (hit, miss, miss/eviction)
○ Data store: S (hit, miss, miss/eviction)
○ Data modify: M (load+store: hit/hit, miss/hit, miss/eviction/hit)
http://bytes.usc.edu/cs356/assignments/cachelab.pdf
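A trace line of this shape can be split into its three fields with sscanf. The sketch below is only one way to do it; struct Access, parse_trace_line and accesses_for_op are hypothetical names, not part of the lab handout.

```c
#include <stdio.h>

/* Hypothetical record for one trace line (names are assumptions). */
struct Access {
    char op;                  /* 'I', 'L', 'S', or 'M' */
    unsigned long long addr;  /* address, parsed from hexadecimal */
    unsigned size;            /* number of bytes accessed */
};

/* Parse a line such as "L 04f6b868,8".
   Returns 1 if all three fields were read, 0 otherwise. */
int parse_trace_line(const char *line, struct Access *out)
{
    /* The leading space in the format skips indentation before the op. */
    return sscanf(line, " %c %llx,%u", &out->op, &out->addr, &out->size) == 3;
}

/* Number of cache accesses per operation:
   I is ignored, L and S access once, M is a load followed by a store. */
int accesses_for_op(char op)
{
    switch (op) {
    case 'L':
    case 'S':
        return 1;
    case 'M':
        return 2;
    default:
        return 0;
    }
}
```

In a simulator, parse_trace_line would typically be called on each line returned by fgets, and lines that fail to parse or have op 'I' would be skipped.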
./csim-ref [-hv] -S <S> -K <K> -B <B> -p <P> -t <tracefile>
$ ./csim-ref -S 16 -K 1 -B 16 -p LRU -t traces/yi.trace
hits:4 misses:5 evictions:3
$ ./csim-ref -S 16 -K 1 -B 16 -p LRU -v -t traces/yi.trace
L 10,1 miss
M 20,1 miss hit
...
...
M 12,1 miss eviction hit
hits:4 misses:5 evictions:3
(See https://usc-cs356.github.io/assignments/cachelab.html)
$ ./csim-ref -S 2 -K 2 -B 2 -p LRU -v -t traces/simple_policy.trace
L 0,1 miss
L 1,1 hit
L 2,1 miss
L 6,1 miss
L 2,1 hit
L a,1 miss eviction
L 2,1 hit
hits:3 misses:4 evictions:1
$ ./csim-ref -S 2 -K 2 -B 2 -p FIFO -v -t traces/simple_policy.trace
L 0,1 miss
L 1,1 hit
L 2,1 miss
L 6,1 miss
L 2,1 hit
L a,1 miss eviction
L 2,1 miss eviction
hits:2 misses:5 evictions:2
LRU and FIFO
$ ./csim-ref -S 2 -K 1 -B 4 -p LRU -v -t traces/simple_size.trace
L 0,2 miss
L 1,2 hit
L 3,1 hit
L 6,6 miss miss eviction
L 3,1 miss eviction
hits:2 misses:4 evictions:2
Memory accesses that cross a line boundary
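In the trace above with B = 4, the access "L 6,6" covers bytes 6 through 11, which lie in two different blocks, so it reports two outcomes. A minimal sketch of how the number of touched blocks could be computed, assuming size is at least 1 (blocks_touched is a hypothetical helper name):

```c
/* Count how many cache blocks an access of `size` bytes starting at `addr`
   touches when each block holds B bytes. Assumes size >= 1 and B >= 1. */
unsigned long long blocks_touched(unsigned long long addr, unsigned size,
                                  unsigned long long B)
{
    unsigned long long first = addr / B;               /* block of the first byte */
    unsigned long long last = (addr + size - 1) / B;   /* block of the last byte  */
    return last - first + 1;
}
```

A simulator could then perform one cache lookup per touched block, from the first block to the last.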
[Figure: each cache line is labeled "Metadata"]
?? bits per line to support eviction policy
Not needed in this lab
Example 1: flat array for S × K struct Line
struct Line {
    // include valid bit, tag, and metadata
};
struct Line *cache;
cache = (struct Line *)malloc(S * K * sizeof(struct Line));
Example 2: array for S struct Line * (type struct Line **), each pointing to an array for K struct Line per set
struct Line {
    // include valid bit, tag, and metadata
};
struct Line **cache;
cache = (struct Line **)malloc(...);
for i = 0, 1, ..., S-1 do
    cache[i] = (struct Line *)malloc(...);
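The two layouts can be compared side by side in compilable form. This is a sketch under assumptions: the fields of struct Line and the helper names alloc_flat, alloc_nested and free_nested are illustrative choices, and calloc is used instead of malloc so that every line starts with valid == 0.

```c
#include <stdlib.h>

/* Hypothetical field choices; the slides leave the contents open. */
struct Line {
    int valid;
    unsigned long long tag;
    unsigned long long stamp;  /* metadata for the eviction policy */
};

/* Example 1 style: one flat array of S * K lines.
   Line k of set i is cache[i * K + k]. */
struct Line *alloc_flat(size_t S, size_t K)
{
    return calloc(S * K, sizeof(struct Line));
}

/* Example 2 style: an array of S pointers, each to an array of K lines. */
struct Line **alloc_nested(size_t S, size_t K)
{
    struct Line **cache = malloc(S * sizeof(struct Line *));
    if (cache == NULL)
        return NULL;
    for (size_t i = 0; i < S; i++) {
        cache[i] = calloc(K, sizeof(struct Line));
        if (cache[i] == NULL) {   /* undo the partial allocation on failure */
            while (i > 0)
                free(cache[--i]);
            free(cache);
            return NULL;
        }
    }
    return cache;
}

/* Release a cache built by alloc_nested: inner arrays first, then the outer one. */
void free_nested(struct Line **cache, size_t S)
{
    for (size_t i = 0; i < S; i++)
        free(cache[i]);
    free(cache);
}
```

The flat layout needs a single free call, while the nested layout allows the cache[i][k] indexing at the cost of S + 1 allocations.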
Fill in the csim.c file to:
Rules
⇒ How to deal with this?
printSummary(hit_count, miss_count, eviction_count)
3 test suites:
You only need to output the correct number of cache hits, misses, evictions.
○ int s = atoi(arg_str); int S = pow(2, s);
You must pass all tests in a test suite to receive its points.
○ fopen (open a file), fgets (read a line), sscanf (parse a line), fclose
○ Cache = S sets
○ Each set = K cache lines (called E in the textbook)
○ Use malloc and free
○ Valid bit, tag, and what else?
○ How to keep track of statistics for LRU and FIFO policies?
○ How to extract tag / set / block bits from an input address?
○ How to select the correct set? And how to look for a hit?
○ What to update in case of hit (in addition to hit counter)?
○ What to do in case of miss?
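One possible set of answers to these questions is sketched below; it is not the reference solution. It assumes S and B are powers of two given as counts, uses division and remainder instead of bit shifts to split the address, and keeps a per-line timestamp as metadata: LRU refreshes the timestamp on every hit, FIFO only sets it on insertion. All names (struct Cache, cache_access, run_trace, ...) are hypothetical.

```c
#include <stdlib.h>

enum Policy { LRU, FIFO };
enum Outcome { HIT, MISS, MISS_EVICTION };

struct Line {
    int valid;
    unsigned long long tag;
    unsigned long long stamp;   /* time of last use (LRU) or insertion (FIFO) */
};

struct Cache {
    unsigned long long S, K, B;
    enum Policy policy;
    unsigned long long clock;   /* incremented on every access */
    struct Line *lines;         /* flat array of S * K lines */
};

int cache_init(struct Cache *c, unsigned long long S, unsigned long long K,
               unsigned long long B, enum Policy policy)
{
    c->S = S; c->K = K; c->B = B;
    c->policy = policy;
    c->clock = 0;
    c->lines = calloc(S * K, sizeof(struct Line));
    return c->lines ? 0 : -1;
}

enum Outcome cache_access(struct Cache *c, unsigned long long addr)
{
    unsigned long long block = addr / c->B;   /* drop the block offset */
    unsigned long long set = block % c->S;    /* set index */
    unsigned long long tag = block / c->S;    /* remaining high part */
    struct Line *lines = &c->lines[set * c->K];
    struct Line *victim = &lines[0];

    c->clock++;
    for (unsigned long long k = 0; k < c->K; k++) {
        if (lines[k].valid && lines[k].tag == tag) {
            if (c->policy == LRU)
                lines[k].stamp = c->clock;    /* FIFO ignores hits */
            return HIT;
        }
    }
    /* Miss: prefer an invalid line, otherwise the smallest timestamp. */
    for (unsigned long long k = 0; k < c->K; k++) {
        if (!lines[k].valid) { victim = &lines[k]; break; }
        if (lines[k].stamp < victim->stamp) victim = &lines[k];
    }
    enum Outcome result = victim->valid ? MISS_EVICTION : MISS;
    victim->valid = 1;
    victim->tag = tag;
    victim->stamp = c->clock;
    return result;
}

/* counts[0] = hits, counts[1] = misses, counts[2] = evictions. */
void run_trace(struct Cache *c, const unsigned long long *addrs, size_t n,
               int counts[3])
{
    counts[0] = counts[1] = counts[2] = 0;
    for (size_t i = 0; i < n; i++) {
        enum Outcome r = cache_access(c, addrs[i]);
        if (r == HIT) { counts[0]++; continue; }
        counts[1]++;
        if (r == MISS_EVICTION) counts[2]++;
    }
}
```

Running the addresses 0, 1, 2, 6, 2, a, 2 with S = 2, K = 2, B = 2 through this sketch gives the same counts as the csim-ref output shown for traces/simple_policy.trace: 3 hits, 4 misses, 1 eviction under LRU and 2 hits, 5 misses, 2 evictions under FIFO.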
Low-level memory allocation in Linux
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t off)
int munmap(void *addr, size_t length)
void *sbrk(intptr_t increment)
Portable alternatives in stdlib
void *malloc(size_t size)
void *calloc(size_t nmemb, size_t size)
void *realloc(void *ptr, size_t size)
void free(void *ptr)
Check man pages for these functions!
$ man malloc
Each square represents 4 bytes.
malloc returns addresses that are multiples of 8 bytes (64 bits).
If there is a problem, malloc returns NULL and sets errno.
malloc does not initialize the memory it returns; use calloc if you need it zeroed.
#include <stdlib.h>
#include <stdio.h>

/* Comparison callback for qsort: orders ints in ascending order. */
int compare(const void *x, const void *y) {
    int xval = *(const int *)x;
    int yval = *(const int *)y;
    if (xval < yval) return -1;
    else if (xval == yval) return 0;
    else return 1;
}

int main() {
    int count = 0;
    printf("How many numbers to sort? ");
    if (scanf("%d", &count) != 1 || count <= 0) {
        fprintf(stderr, "Invalid input\n");
        return 1;
    }
    int *numbers = (int *) malloc(count * sizeof(int));
    if (numbers == NULL) {              /* malloc can fail */
        fprintf(stderr, "Out of memory\n");
        return 1;
    }
    printf("Please input %d numbers: ", count);
    for (int i = 0; i < count; i++) {
        if (scanf("%d", &numbers[i]) != 1) {
            fprintf(stderr, "Invalid input\n");
            free(numbers);              /* do not leak on the error path */
            return 1;
        }
    }
    qsort(numbers, count, sizeof(int), compare);
    printf("Sorted numbers:");
    for (int i = 0; i < count; i++) {
        printf(" %d", numbers[i]);
    }
    printf("\n");
    free(numbers);
    return 0;
}

$ gcc sort.c -o sort
$ ./sort
How many numbers to sort? 4
Please input 4 numbers: 2 3 1 -1
Sorted numbers: -1 1 2 3
/* Return y = Ax */
int *matvec(int **A, int *x, int n) {
    int i, j;
    int *y = (int *)malloc(n * sizeof(int));
    /* should set y[i] = 0 (or use calloc) */
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            y[i] += A[i][j] * x[j];
        }
    }
    return y;
}

void buffer_overflow() {
    char buf[64];
    gets(buf); /* use fgets instead */
    return;
}

void bad_pointer() {
    int val;
    scanf("%d", val); /* use &val */
}

int *stackref() {
    int val;
    return &val; /* val is a local variable */
}

void leak(int n) {
    int *x = (int *)malloc(n * sizeof(int));
    return; /* x is garbage at this point */
}

int *search(int *p, int val) {
    while (*p && *p != val)
        p += sizeof(int); /* should be p++ */
    return p;
}
/* Create an nxm array */
int **makeArray1(int n, int m) {
    int i;
    /* should be sizeof(int*) */
    int **A = (int **)malloc(n * sizeof(int));
    for (i = 0; i < n; i++) {
        A[i] = (int *)malloc(m * sizeof(int));
    }
    return A;
}

/* Create an nxm array */
int **makeArray2(int n, int m) {
    int i;
    int **A = (int **)malloc(n * sizeof(int *));
    /* should be .. i < n .. */
    for (i = 0; i <= n; i++) {
        A[i] = (int *)malloc(m * sizeof(int));
    }
    return A;
}

int *binheapDelete(int **binheap, int *size) {
    int *packet = binheap[0];
    binheap[0] = binheap[*size - 1];
    *size--; /* This should be (*size)-- */
    heapify(binheap, *size, 0);
    return(packet);
}

int *heapref(int n, int m) {
    int i;
    int *x, *y;
    x = (int *)malloc(n * sizeof(int));
    /* ... */
    /* Other calls to malloc and free here */
    free(x);
    y = (int *)malloc(m * sizeof(int));
    for (i = 0; i < m; i++) {
        y[i] = x[i]++; /* x[i] in freed block */
    }
    return y;
}
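For contrast, here is how three of the buggy functions above might look once the fixes named in their comments are applied. This is a sketch: the _fixed suffixes are hypothetical names, and the added NULL checks go beyond what the comments ask for.

```c
#include <stdlib.h>

/* y = Ax, with y zero-initialized by calloc. Returns NULL on failure. */
int *matvec_fixed(int **A, const int *x, int n)
{
    int *y = calloc((size_t)n, sizeof(int));
    if (y == NULL)
        return NULL;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            y[i] += A[i][j] * x[j];
    return y;
}

/* Pointer arithmetic already scales by sizeof(int), so advance with p++.
   Like the original, this expects a 0-terminated array. */
int *search_fixed(int *p, int val)
{
    while (*p && *p != val)
        p++;
    return p;
}

/* n x m array: the outer array holds pointers, hence sizeof(int *),
   and the loop stops at i < n. */
int **make_array_fixed(int n, int m)
{
    int **A = malloc((size_t)n * sizeof(int *));
    if (A == NULL)
        return NULL;
    for (int i = 0; i < n; i++) {
        A[i] = calloc((size_t)m, sizeof(int));
        if (A[i] == NULL) {          /* undo the partial allocation */
            while (i > 0)
                free(A[--i]);
            free(A);
            return NULL;
        }
    }
    return A;
}
```

The caller remains responsible for freeing every row and then the outer array.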
Explicit allocators like malloc
Goal 1. Maximize throughput: (# completed requests) / second
and free with running time O(1)
Goal 2. Maximize peak utilization: max{ allocated(t) : t ⩽ T } / heapsize(T)
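The peak utilization formula can be evaluated directly from a log of allocated bytes. In this sketch, allocated[t] is assumed to hold the total payload in use after request t, and heapsize is the heap size after the last request; peak_utilization is a hypothetical helper name.

```c
#include <stddef.h>

/* max{ allocated(t) : t <= T } / heapsize(T), with n = T + 1 samples.
   Returns 0.0 for an empty heap to avoid dividing by zero. */
double peak_utilization(const size_t *allocated, size_t n, size_t heapsize)
{
    size_t peak = 0;
    for (size_t t = 0; t < n; t++)
        if (allocated[t] > peak)
            peak = allocated[t];
    return heapsize == 0 ? 0.0 : (double)peak / (double)heapsize;
}
```

For example, with 8, 24 and then 16 bytes allocated over three requests and a 32-byte heap, the peak is 24 bytes and the utilization is 24 / 32 = 0.75.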
Each square represents 4 bytes Header: (block size) / (allocated)
Assume:
Block size (in bytes) and block header (in hex) for blocks allocated by the following sequence:
malloc(1)
malloc(5)
malloc(12)
malloc(13)
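The exercise can be worked through in code once the assumptions are fixed. The ones used here are not taken from the slide and are stated as assumptions: a 4-byte header holding (block size) / (allocated) with the allocated flag in the lowest bit, no footer on allocated blocks, and block sizes rounded up to a multiple of 8 bytes. block_size and make_header are hypothetical helper names.

```c
#include <stdint.h>

#define ALIGNMENT   8u   /* assumed: block sizes are multiples of 8 bytes */
#define HEADER_SIZE 4u   /* assumed: one 4-byte header word, no footer   */

/* Smallest aligned block that holds the header plus `payload` bytes. */
uint32_t block_size(uint32_t payload)
{
    uint32_t need = payload + HEADER_SIZE;
    return (need + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
}

/* Header word: size in the upper bits, allocated flag in bit 0. */
uint32_t make_header(uint32_t size, int allocated)
{
    return size | (allocated ? 1u : 0u);
}
```

Under these assumptions, malloc(1) yields an 8-byte block with header 0x00000009, malloc(5) and malloc(12) both yield 16-byte blocks with header 0x00000011, and malloc(13) yields a 24-byte block with header 0x00000019. Different assumptions (for example a footer or a larger minimum block size) would change these values.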
○ First Fit: First block with enough space.
  ⇒ retains large blocks at the end of the list, but must skip many
○ Next Fit: First block with enough space, start from last position.
  ⇒ no need to skip small blocks at start, but worse memory utilization
○ Best Fit: Smallest block with enough space.
  ⇒ generally better utilization, but slower (must check all blocks)
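The three placement policies can be compared on a simplified model in which the free blocks are just an array of sizes. This is a sketch of the search logic only, not an allocator; all function names and the NOT_FOUND constant are hypothetical.

```c
#include <stddef.h>

#define NOT_FOUND ((size_t)-1)

/* First fit: index of the first block with enough space. */
size_t first_fit(const size_t *free_sizes, size_t n, size_t request)
{
    for (size_t i = 0; i < n; i++)
        if (free_sizes[i] >= request)
            return i;
    return NOT_FOUND;
}

/* Next fit: like first fit, but the search starts at *rover (where the
   previous search ended) and wraps around. */
size_t next_fit(const size_t *free_sizes, size_t n, size_t request,
                size_t *rover)
{
    for (size_t step = 0; step < n; step++) {
        size_t i = (*rover + step) % n;
        if (free_sizes[i] >= request) {
            *rover = i;
            return i;
        }
    }
    return NOT_FOUND;
}

/* Best fit: index of the smallest block with enough space;
   every block must be inspected. */
size_t best_fit(const size_t *free_sizes, size_t n, size_t request)
{
    size_t best = NOT_FOUND;
    for (size_t i = 0; i < n; i++)
        if (free_sizes[i] >= request &&
            (best == NOT_FOUND || free_sizes[i] < free_sizes[best]))
            best = i;
    return best;
}
```

With free sizes 32, 8, 16, 64 and a 16-byte request, first fit picks the 32-byte block at index 0, while best fit picks the exact 16-byte block at index 2.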
○ Assign entire block ⇒ internal fragmentation (ok for good fit)
○ Split the block at 2-word boundary, add another header
What to do when no free block is large enough?
○ Returns a pointer to the start of the new area
○ Can coalesce both previous and following block
○ Coalescing when freeing blocks is O(1) but allows thrashing
○ Boundary tag: use a "footer" at the end of each (free) block to fetch previous block
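Constant-time coalescing with boundary tags can be sketched on a heap modeled as an array of 4-byte words. The model makes simplifying assumptions: every block, allocated or free, carries both a header and a footer of the form size | allocated; sizes are multiples of 8 bytes; and the heap is bracketed by allocated marker words (a 0/1 word before the first block and a 0/1 epilogue after the last). put_block and free_block are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

#define WORD 4u                        /* bytes per heap word */
#define SIZE_OF(tag)  ((tag) & ~7u)    /* block size in bytes */
#define IS_ALLOC(tag) ((tag) & 1u)     /* allocated flag */

/* Write matching header and footer for the block whose header is heap[hdr]. */
void put_block(uint32_t *heap, size_t hdr, uint32_t size, uint32_t alloc)
{
    heap[hdr] = size | alloc;
    heap[hdr + size / WORD - 1] = size | alloc;
}

/* Mark a block free and merge it with free neighbors in O(1).
   Returns the header index of the resulting free block. */
size_t free_block(uint32_t *heap, size_t hdr)
{
    uint32_t size = SIZE_OF(heap[hdr]);
    uint32_t next_tag = heap[hdr + size / WORD];   /* header of next block     */
    uint32_t prev_tag = heap[hdr - 1];             /* footer of previous block */

    if (!IS_ALLOC(next_tag))
        size += SIZE_OF(next_tag);                 /* absorb the next block */
    if (!IS_ALLOC(prev_tag)) {
        size += SIZE_OF(prev_tag);                 /* absorb the previous block */
        hdr -= SIZE_OF(prev_tag) / WORD;           /* new block starts earlier  */
    }
    put_block(heap, hdr, size, 0);
    return hdr;
}
```

The footer of the previous block is what makes the backward merge O(1): without it, finding the previous header would require walking the list from the start.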
Allocation time is O(#blocks) for implicit lists...
Much faster when memory is full, but lower memory utilization.
[Figure: heap blocks with header/footer tags 16/0 16/0 24/1 24/1 16/0 ⨯ 16/0 16/0 ⨯ 16/0 0/1]
○ No coalescing: just add element to list head
○ Coalescing: remove contiguous blocks from free list, merge them, add the new free block to list head
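The two list operations these bullets rely on, adding at the head and removing an arbitrary block, are both O(1) with a doubly linked list. A minimal sketch, assuming the links are stored in a struct FreeBlock (a hypothetical layout; a real allocator would place them inside the free block's payload):

```c
#include <stddef.h>

/* Hypothetical free-list node. */
struct FreeBlock {
    size_t size;
    struct FreeBlock *prev, *next;
};

/* Add a freed block at the head of the list. */
void list_push(struct FreeBlock **root, struct FreeBlock *b)
{
    b->prev = NULL;
    b->next = *root;
    if (*root != NULL)
        (*root)->prev = b;
    *root = b;
}

/* Unlink a block, e.g. a free neighbor that is about to be merged. */
void list_remove(struct FreeBlock **root, struct FreeBlock *b)
{
    if (b->prev != NULL)
        b->prev->next = b->next;
    else
        *root = b->next;          /* b was the head */
    if (b->next != NULL)
        b->next->prev = b->prev;
    b->prev = b->next = NULL;
}
```

Freeing with coalescing would then call list_remove on each free neighbor, merge the blocks, and call list_push on the merged block.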
Free List Root
To reduce allocation time:
1-8 9-16 17-32 33-64 65-∞
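Mapping a request size to one of these five size classes takes only a short loop. The sketch below hard-codes the classes listed above (1-8, 9-16, 17-32, 33-64, 65-∞); size_class and NUM_CLASSES are hypothetical names.

```c
#include <stddef.h>

#define NUM_CLASSES 5

/* Returns the index of the size class for `size`:
   0 for 1-8, 1 for 9-16, 2 for 17-32, 3 for 33-64, 4 for 65 and above.
   A size of 0 is mapped to class 0. */
size_t size_class(size_t size)
{
    size_t limit = 8;   /* upper bound of the current class */
    size_t cls = 0;
    while (cls < NUM_CLASSES - 1 && size > limit) {
        limit *= 2;
        cls++;
    }
    return cls;
}
```

An allocator with segregated free lists would keep one list root per class and start its search in the list at index size_class(request).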
Simple Segregated
Segregated Fits
Advantages