
CS356: Discussion #10 Dynamic Memory and Cache Lab



  1. CS356: Discussion #10
     Dynamic Memory and Cache Lab
     Illustrations from the CS:APP3e textbook

  2. Cache Lab
     Goal
     ● Write a small C simulator of caching strategies.
     ● Expect about 200-300 lines of code.
     ● The starting point is in your repository.
     Traces
     The traces directory contains program traces generated by valgrind.
     ● The format of each line is: <operation> <address>,<size>
     ● For example: "I 0400d7d4,8", "M 0421c7f0,4", "L 04f6b868,8"
     ● Operations:
       ○ Instruction load: I (ignore these)
       ○ Data load: L (hit, miss, miss/eviction)
       ○ Data store: S (hit, miss, miss/eviction)
       ○ Data modify: M (load+store: hit/hit, miss/hit, miss/eviction/hit)
     http://bytes.usc.edu/cs356/assignments/cachelab.pdf
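The trace format above can be parsed with a single sscanf call. The following is only a sketch: the names `struct TraceOp` and `parse_trace_line` are hypothetical and not part of the lab's starter code.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed trace record (hypothetical type, for illustration only). */
struct TraceOp {
    char op;                  /* 'I', 'L', 'S', or 'M' */
    unsigned long long addr;  /* address, parsed from hexadecimal */
    unsigned size;            /* number of bytes accessed */
};

/* Parse a line such as " M 0421c7f0,4".
 * Returns 1 if all three fields were read, 0 otherwise. */
int parse_trace_line(const char *line, struct TraceOp *out) {
    assert(line != NULL && out != NULL);
    /* The leading space in the format skips optional leading whitespace,
     * so lines with or without an initial blank are both accepted. */
    return sscanf(line, " %c %llx,%u", &out->op, &out->addr, &out->size) == 3;
}
```

A caller would typically read each line with fgets, pass it to this function, and skip records whose `op` is 'I'.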

  3. Reference Cache Simulator
     ./csim-ref [-hv] -S <S> -K <K> -B <B> -p <P> -t <tracefile>
       -h              Optional help flag that prints usage information
       -v              Optional verbose flag that displays trace information
       -S <S>          Number of sets (s = log2(S) is the number of bits used for the set index)
       -K <K>          Number of lines per set (associativity)
       -B <B>          Block size in bytes (B = 2^b bytes per block)
       -p <P>          Eviction policy, either LRU or FIFO
       -t <tracefile>  Trace file to replay

     $ ./csim-ref -S 16 -K 1 -B 16 -p LRU -t traces/yi.trace
     hits:4 misses:5 evictions:3

     $ ./csim-ref -S 16 -K 1 -B 16 -p LRU -v -t traces/yi.trace
     L 10,1 miss
     M 20,1 miss hit
     ...
     M 12,1 miss eviction hit
     hits:4 misses:5 evictions:3

     (See https://usc-cs356.github.io/assignments/cachelab.html)

  4. Reference Cache Simulator
     LRU and FIFO

     $ ./csim-ref -S 2 -K 2 -B 2 -p LRU -v -t traces/simple_policy.trace
     L 0,1 miss
     L 1,1 hit
     L 2,1 miss
     L 6,1 miss
     L 2,1 hit
     L a,1 miss eviction
     L 2,1 hit
     hits:3 misses:4 evictions:1

     $ ./csim-ref -S 2 -K 2 -B 2 -p FIFO -v -t traces/simple_policy.trace
     L 0,1 miss
     L 1,1 hit
     L 2,1 miss
     L 6,1 miss
     L 2,1 hit
     L a,1 miss eviction
     L 2,1 miss eviction
     hits:2 misses:5 evictions:2
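The difference between the two runs above comes down to when a line's timestamp is refreshed. The sketch below is one possible way to model it, not the reference implementation: every name (`simulate`, `struct Stats`, the `stamp` field) is hypothetical, and it handles single-byte loads only. LRU refreshes the stamp on every hit, FIFO only on insertion, and the victim is the line with the smallest stamp.

```c
#include <assert.h>
#include <stdlib.h>

enum Policy { LRU, FIFO };

struct Line {
    int valid;
    unsigned long long tag;
    unsigned long long stamp;  /* LRU: time of last use; FIFO: time of insertion */
};

struct Stats { int hits, misses, evictions; };

/* Replay n single-byte loads on a cache with S sets, K lines per set,
 * and B-byte blocks. */
struct Stats simulate(const unsigned long long *addrs, int n,
                      int S, int K, int B, enum Policy policy) {
    struct Stats st = {0, 0, 0};
    assert(S > 0 && K > 0 && B > 0);
    struct Line *cache = calloc((size_t)S * (size_t)K, sizeof *cache);
    assert(cache != NULL);
    for (int t = 0; t < n; t++) {
        unsigned long long block = addrs[t] / B;
        unsigned long long tag = block / S;
        struct Line *set = &cache[(block % S) * K];
        int hit = -1, empty = -1, oldest = 0;
        for (int i = 0; i < K; i++) {
            if (!set[i].valid) { if (empty < 0) empty = i; continue; }
            if (set[i].tag == tag) hit = i;
            if (!set[oldest].valid || set[i].stamp < set[oldest].stamp) oldest = i;
        }
        if (hit >= 0) {
            st.hits++;
            if (policy == LRU) set[hit].stamp = t + 1;  /* FIFO ignores hits */
            continue;
        }
        st.misses++;
        int slot = empty;
        if (slot < 0) { slot = oldest; st.evictions++; }  /* set is full */
        set[slot].valid = 1;
        set[slot].tag = tag;
        set[slot].stamp = t + 1;
    }
    free(cache);
    return st;
}
```

Replaying the addresses 0, 1, 2, 6, 2, a, 2 from the verbose output with S = 2, K = 2, B = 2 gives the same counters as the two runs shown on the slide.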

  5. Reference Cache Simulator
     Memory accesses that cross a line boundary

     $ ./csim-ref -S 2 -K 1 -B 4 -p LRU -v -t traces/simple_size.trace
     L 0,2 miss
     L 1,2 hit
     L 3,1 hit
     L 6,6 miss miss eviction
     L 3,1 miss eviction
     hits:2 misses:4 evictions:2
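In the output above, `L 6,6` produces two results because bytes 6 through 11 span two 4-byte blocks. One way to handle this, offered here as an assumption rather than the required approach, is to compute the first and last block touched and perform one cache lookup per block. The helper name `blocks_touched` is hypothetical.

```c
#include <assert.h>

/* Number of distinct cache blocks touched by an access of `size` bytes
 * starting at `addr`, for a block size of B bytes. */
unsigned long long blocks_touched(unsigned long long addr, unsigned size, unsigned B) {
    assert(size > 0 && B > 0);
    unsigned long long first = addr / B;               /* block of the first byte */
    unsigned long long last = (addr + size - 1) / B;   /* block of the last byte */
    return last - first + 1;
}
```

With B = 4, the accesses `L 0,2`, `L 1,2`, and `L 3,1` each stay inside one block, while `L 6,6` touches blocks 1 and 2.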

  6. Your Simulator
     [Figure: cache line layout. Labels on the slide: "Not needed in this lab" and "?? bits per line to support eviction policy"; each line carries a Metadata field.]

  7. Your Simulator
     struct Line {
         // include valid bit, tag, and metadata
     };

     Example 1: flat array of S × K struct Line
     struct Line *cache;
     cache = (struct Line *)malloc(S * K * sizeof(struct Line));
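With the flat layout of Example 1, line `way` of set `set` lives at index `set * K + way`. The sketch below assumes a particular choice of fields for `struct Line` (the `stamp` field is a hypothetical metadata field) and uses calloc so that every valid bit starts at 0; the helper names are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

struct Line {
    int valid;
    unsigned long long tag;
    unsigned long long stamp;  /* hypothetical metadata for LRU/FIFO */
};

/* Allocate S * K zeroed lines in one block; returns NULL on failure. */
struct Line *cache_create(int S, int K) {
    assert(S > 0 && K > 0);
    return calloc((size_t)S * (size_t)K, sizeof(struct Line));
}

/* Address of line `way` within set `set` in the flat array. */
struct Line *cache_line(struct Line *cache, int K, int set, int way) {
    return &cache[(size_t)set * (size_t)K + (size_t)way];
}
```

A single free(cache) releases the whole structure, which is the main convenience of this layout.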

  8. Your Simulator
     struct Line {
         // include valid bit, tag, and metadata
     };

     Example 2: array of S pointers (struct Line *), each pointing to an array of K struct Line (one array per set)
     struct Line **cache;
     cache = (struct Line **)malloc(...);
     for i = 0, 1, ..., S-1 do
         cache[i] = (struct Line *)malloc(...);
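The pseudocode of Example 2 leaves the malloc arguments open. One possible completion is sketched below, under the assumption that each set holds K zero-initialized lines; the field choice and the names `cache_create` and `cache_destroy` are hypothetical. Every per-set array needs its own free before the outer array is freed.

```c
#include <assert.h>
#include <stdlib.h>

struct Line {
    int valid;
    unsigned long long tag;
    unsigned long long stamp;  /* hypothetical metadata for LRU/FIFO */
};

/* Release the S per-set arrays, then the array of pointers. */
void cache_destroy(struct Line **cache, int S) {
    if (cache == NULL) return;
    for (int i = 0; i < S; i++) free(cache[i]);  /* free(NULL) is a no-op */
    free(cache);
}

/* Allocate S pointers, each to an array of K zeroed lines.
 * Returns NULL (and frees any partial allocation) on failure. */
struct Line **cache_create(int S, int K) {
    assert(S > 0 && K > 0);
    struct Line **cache = calloc((size_t)S, sizeof(struct Line *));
    if (cache == NULL) return NULL;
    for (int i = 0; i < S; i++) {
        cache[i] = calloc((size_t)K, sizeof(struct Line));
        if (cache[i] == NULL) {
            cache_destroy(cache, S);
            return NULL;
        }
    }
    return cache;
}
```

Lines are then addressed as `cache[set][way]`.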

  9. Your Simulator
     Fill in the csim.c file to:
     ● Accept the same command-line options.
     ● Produce identical output.
     Rules
     ● Include name and username in the header.
     ● Use only C code (must compile with gcc -std=c11).
     ● Use malloc to allocate data structures for arbitrary S, K, B.
     ● Implement both LRU and FIFO policies.
     ● Ignore instruction cache accesses (starting with I).
     ● Memory accesses can cross block boundaries: how to deal with this?
     ● At the end of your main function, call: printSummary(hit_count, miss_count, eviction_count)

  10. Evaluation
      3 test suites:
      ● Direct Mapped: K = 1; no need to implement an eviction policy
      ● Policy Tests: check that LRU and FIFO policies work correctly
      ● Size Tests: include memory accesses that cross a line boundary
      You only need to output the correct number of cache hits, misses, and evictions.
      ● You can run csim-ref -v to check the expected behavior.
      ● Start from small traces such as traces/dave.trace
      ● Use the getopt library to parse command-line arguments.
        ○ int s = atoi(arg_str); int S = pow(2, s);
      You must pass all tests in a test suite to receive its points.
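A getopt loop for the flags listed on the reference simulator slide might look like the sketch below. It makes several assumptions: `struct Options` and `parse_options` are hypothetical names, LRU is taken as the default when -p is absent, and -h is simply reported as "print usage" through the return value. The feature-test macro is there because getopt is a POSIX function and strict `-std=c11` hides its declaration otherwise.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getopt under gcc -std=c11 */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct Options {
    int verbose;
    int S, K, B;
    int use_fifo;            /* 0 = LRU (assumed default), 1 = FIFO */
    const char *tracefile;
};

/* Fill *o from the command line. Returns 0 on success and -1 when the
 * caller should print usage (missing/invalid option, or -h). */
int parse_options(int argc, char *argv[], struct Options *o) {
    int c;
    assert(o != NULL);
    memset(o, 0, sizeof *o);
    optind = 1;  /* restart scanning, so the function can be called again */
    while ((c = getopt(argc, argv, "hvS:K:B:p:t:")) != -1) {
        switch (c) {
        case 'v': o->verbose = 1; break;
        case 'S': o->S = atoi(optarg); break;
        case 'K': o->K = atoi(optarg); break;
        case 'B': o->B = atoi(optarg); break;
        case 't': o->tracefile = optarg; break;
        case 'p':
            if (strcmp(optarg, "FIFO") == 0) o->use_fifo = 1;
            else if (strcmp(optarg, "LRU") != 0) return -1;
            break;
        default:  /* 'h' or an unknown flag */
            return -1;
        }
    }
    if (o->S <= 0 || o->K <= 0 || o->B <= 0 || o->tracefile == NULL) return -1;
    return 0;
}
```

In main, a -1 result would typically lead to printing a usage message and exiting with a nonzero status.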

  11. Problems
      ● How to parse the input traces?
        ○ fopen (open a file), fgets (read a line), sscanf (parse a line), fclose
      ● How to represent the cache? How to allocate memory for any s, E, b?
        ○ Cache = S sets
        ○ Each set = E cache lines (E is the associativity, given by the -K option)
        ○ Use malloc and free
      ● What needs to be stored in a cache line?
        ○ Valid bit, tag, and what else?
        ○ How to keep track of statistics for LRU and FIFO policies?
      ● How to retrieve data at a memory address?
        ○ How to extract tag / set / block bits from an input address?
        ○ How to select the correct set? And how to look for a hit?
        ○ What to update in case of hit (in addition to hit counter)?
        ○ What to do in case of miss?
      ● Useful: print the content of the cache after each request in a trace
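For the question about extracting tag, set, and block bits, a common approach is shifting and masking. The sketch below assumes S = 2^s sets and B = 2^b bytes per block, so the low b bits are the block offset, the next s bits select the set, and the rest is the tag. `struct AddrParts` and `split_address` are hypothetical names.

```c
#include <assert.h>

struct AddrParts {
    unsigned long long tag;
    unsigned long long set;
    unsigned long long offset;
};

/* Split an address given s set-index bits and b block-offset bits. */
struct AddrParts split_address(unsigned long long addr, unsigned s, unsigned b) {
    struct AddrParts p;
    assert(s + b < 64);                          /* keep every shift well-defined */
    p.offset = addr & ((1ULL << b) - 1);         /* low b bits */
    p.set = (addr >> b) & ((1ULL << s) - 1);     /* next s bits */
    p.tag = addr >> (s + b);                     /* remaining high bits */
    return p;
}
```

For example, with S = 16 and B = 16 (s = 4, b = 4), address 0x12 has offset 2, set 1, and tag 0.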

  12. Dynamic Memory Allocation in C
      Low-level memory allocation in Linux
          void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t off)
          int munmap(void *addr, size_t length)
          void *sbrk(intptr_t increment)
      Portable alternatives in stdlib
          void *malloc(size_t size)
          void *calloc(size_t nmemb, size_t size)
          void *realloc(void *ptr, size_t size)
          void free(void *ptr)
      Check the man pages for these functions!
          $ man malloc

  13. malloc and free in action
      ● Each square in the illustration represents 4 bytes.
      ● In this illustration, malloc returns addresses that are multiples of 8 bytes (64 bits).
      ● If there is a problem, malloc returns NULL and sets errno.
      ● malloc does not initialize the memory to 0; use calloc when zeroed memory is needed.
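The points above translate into two habits: check the return value of malloc, and use calloc when the block must start out zeroed. A minimal sketch follows, with hypothetical helper names `alloc_ints` and `alloc_zeroed_ints`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate n uninitialized ints; report errno and return NULL on failure. */
int *alloc_ints(size_t n) {
    assert(n > 0 && n <= SIZE_MAX / sizeof(int));  /* avoid size overflow */
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        fprintf(stderr, "malloc failed: %s\n", strerror(errno));
    return p;  /* contents are indeterminate until written */
}

/* Allocate n ints that are guaranteed to read as 0; NULL on failure. */
int *alloc_zeroed_ints(size_t n) {
    return calloc(n, sizeof(int));
}
```

Both results are released with free, exactly once.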

  14. Using malloc and free
      #include <stdlib.h>
      #include <stdio.h>

      int compare(const void *x, const void *y) {
          int xval = *(int *)x;
          int yval = *(int *)y;
          if (xval < yval)
              return -1;
          else if (xval == yval)
              return 0;
          else
              return 1;
      }

      int main() {
          int count = 0;
          printf("How many numbers to sort? ");
          if (scanf("%d", &count) != 1) {
              fprintf(stderr, "Invalid input\n");
              return 1;
          }
          int *numbers = (int *)malloc(count * sizeof(int));
          printf("Please input %d numbers: ", count);
          for (int i = 0; i < count; i++) {
              if (scanf("%d", &numbers[i]) != 1) {
                  fprintf(stderr, "Invalid input\n");
                  return 1;
              }
          }
          qsort(numbers, count, sizeof(int), compare);
          printf("Sorted numbers:");
          for (int i = 0; i < count; i++) {
              printf(" %d", numbers[i]);
          }
          printf("\n");
          free(numbers);
          return 0;
      }

      $ gcc sort.c -o sort
      $ ./sort
      How many numbers to sort? 4
      Please input 4 numbers: 2 3 1 -1
      Sorted numbers: -1 1 2 3

  15. Memory-Related Bugs
      (Each function below is intentionally buggy; the comments name the fix.)

      /* Return y = Ax */
      int *matvec(int **A, int *x, int n) {
          int i, j;
          int *y = (int *)malloc(n * sizeof(int));
          /* should set y[i] = 0 (or use calloc) */
          for (i = 0; i < n; i++) {
              for (j = 0; j < n; j++) {
                  y[i] += A[i][j] * x[j];
              }
          }
          return y;
      }

      void leak(int n) {
          int *x = (int *)malloc(n * sizeof(int));
          return;
          /* x is garbage at this point */
      }

      void buffer_overflow() {
          char buf[64];
          gets(buf); /* use fgets instead */
          return;
      }

      void bad_pointer() {
          int val;
          scanf("%d", val); /* use &val */
      }

      int *search(int *p, int val) {
          while (*p && *p != val)
              p += sizeof(int); /* should be p++ */
          return p;
      }

      int *stackref() {
          int val;
          return &val; /* val is a local variable */
      }
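For comparison, here is a sketch of two of the functions above with the fixes named in their comments applied: matvec uses calloc so that y starts at zero, and search advances the pointer by one element. The NULL check in matvec is an addition not shown on the slide.

```c
#include <assert.h>
#include <stdlib.h>

/* Return y = Ax for an n x n matrix; the caller frees the result. */
int *matvec(int **A, int *x, int n) {
    assert(n > 0);
    int *y = calloc((size_t)n, sizeof(int));  /* zero-initialized accumulator */
    if (y == NULL) return NULL;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            y[i] += A[i][j] * x[j];
        }
    }
    return y;
}

/* Scan a zero-terminated int array for val; returns a pointer to the
 * match, or to the terminating 0 if val is not present. */
int *search(int *p, int val) {
    while (*p && *p != val)
        p++;  /* pointer arithmetic already scales by sizeof(int) */
    return p;
}
```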

  16. Memory-Related Bugs
      (Each function below is intentionally buggy; the comments name the fix.)

      /* Create an nxm array */
      int **makeArray1(int n, int m) {
          int i;
          /* should be sizeof(int*) */
          int **A = (int **)malloc(n * sizeof(int));
          for (i = 0; i < n; i++) {
              A[i] = (int *)malloc(m * sizeof(int));
          }
          return A;
      }

      /* Create an nxm array */
      int **makeArray2(int n, int m) {
          int i;
          int **A = (int **)malloc(n * sizeof(int *));
          /* should be .. i < n .. */
          for (i = 0; i <= n; i++) {
              A[i] = (int *)malloc(m * sizeof(int));
          }
          return A;
      }

      int *binheapDelete(int **binheap, int *size) {
          int *packet = binheap[0];
          binheap[0] = binheap[*size - 1];
          *size--; /* This should be (*size)-- */
          heapify(binheap, *size, 0);
          return (packet);
      }

      int *heapref(int n, int m) {
          int i;
          int *x, *y;
          x = (int *)malloc(n * sizeof(int));
          /* ... */
          /* Other calls to malloc and free here */
          free(x);
          y = (int *)malloc(m * sizeof(int));
          for (i = 0; i < m; i++) {
              y[i] = x[i]++; /* x[i] in freed block */
          }
          return y;
      }
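The array-creation and precedence bugs above can be corrected as sketched below. The names `makeArray`, `freeArray`, and `shrink` are illustrative only; zero-initializing the rows and cleaning up on a failed allocation are additions beyond what the slide asks for.

```c
#include <assert.h>
#include <stdlib.h>

/* Create an n x m array of zeroed ints; returns NULL on failure. */
int **makeArray(int n, int m) {
    assert(n > 0 && m > 0);
    int **A = malloc((size_t)n * sizeof(int *));  /* n pointers, not n ints */
    if (A == NULL) return NULL;
    for (int i = 0; i < n; i++) {                 /* i < n, not i <= n */
        A[i] = calloc((size_t)m, sizeof(int));
        if (A[i] == NULL) {
            while (i > 0) free(A[--i]);           /* undo earlier rows */
            free(A);
            return NULL;
        }
    }
    return A;
}

/* Free every row, then the array of row pointers. */
void freeArray(int **A, int n) {
    for (int i = 0; i < n; i++) free(A[i]);
    free(A);
}

/* Decrement the counter that size points to and return the new value. */
int shrink(int *size) {
    (*size)--;  /* *size-- would decrement the pointer, not the counter */
    return *size;
}
```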

  17. Explicit Heap Allocators
      Explicit allocators like malloc
      ● Must handle arbitrary sequences of allocate/free requests
      ● Must respond immediately (no buffering of requests)
      ● Helper data structures must be stored in the heap itself
      ● Payloads must be aligned to 8-byte boundaries
      ● Allocated blocks cannot be moved/modified
      Goal 1. Maximize throughput: (# completed requests) / second
      ● Simple to implement malloc with running time O(# free blocks) and free with running time O(1)
      Goal 2. Maximize peak utilization: max{ allocated(t) : t ⩽ T } / heapsize(T)
      ● If the heap can shrink, use max{ heapsize(t) : t ⩽ T } as the denominator
      ● Problem: fragmentation, either internal (e.g., larger block allocated for alignment) or external (free space between allocated blocks)
      ● Severity of external fragmentation also depends on future requests
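As a small worked instance of Goal 2, the sketch below computes peak utilization from two recorded series, using the "heap can shrink" form of the formula (maximum allocated payload divided by maximum heap size). The function name and the idea of sampling both quantities after each request are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Peak utilization over n samples: allocated[t] is the total payload in
 * use after request t, heapsize[t] the heap size at that moment. */
double peak_utilization(const size_t *allocated, const size_t *heapsize, size_t n) {
    size_t max_alloc = 0, max_heap = 0;
    assert(n > 0);
    for (size_t t = 0; t < n; t++) {
        if (allocated[t] > max_alloc) max_alloc = allocated[t];
        if (heapsize[t] > max_heap) max_heap = heapsize[t];
    }
    assert(max_heap > 0);
    return (double)max_alloc / (double)max_heap;
}
```

For example, payload totals of 8, 24, 16 bytes against heap sizes of 16, 32, 32 bytes give a peak utilization of 24 / 32 = 0.75.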

  18. Implementation: Implicit Free Lists
      Each square represents 4 bytes.
      Header: (block size) / (allocated)
      ● Block size includes header/padding, always a multiple of 8 bytes.
      ● Can scan the list using headers, but in O(# blocks), not O(# free blocks)
      ● Special terminating header: zero size, allocated bit set (will not be merged)
      ● With 1-word header and 2-word alignment, minimum block size is 2 words
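Because block sizes are multiples of 8, the low bits of a header are free to hold the allocated flag. The sketch below models the heap as an array of 4-byte words and walks it header by header with a first-fit search. All names are hypothetical, and the model ignores payload alignment and block splitting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a block size (multiple of 8 bytes) and an allocated bit into a header. */
uint32_t pack(uint32_t size, uint32_t alloc) {
    assert(size % 8 == 0 && alloc <= 1);
    return size | alloc;
}

uint32_t hdr_size(uint32_t h)  { return h & ~0x7u; }  /* clear the low 3 bits */
uint32_t hdr_alloc(uint32_t h) { return h & 0x1u; }   /* lowest bit */

/* First fit: return the word index of the header of the first free block
 * with size >= need bytes, or -1 if none exists. The scan visits every
 * block, allocated or not, until the zero-size terminating header. */
long first_fit(const uint32_t *heap, uint32_t need) {
    size_t i = 0;
    while (hdr_size(heap[i]) != 0) {
        if (!hdr_alloc(heap[i]) && hdr_size(heap[i]) >= need)
            return (long)i;
        i += hdr_size(heap[i]) / 4;  /* block size in bytes -> 4-byte words */
    }
    return -1;
}
```

The loop shows why the scan costs O(# blocks): allocated blocks cannot be skipped without reading their headers.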
