CS356: Discussion #10
Dynamic Memory and Cache Lab
Illustrations from CS:APP3e textbook

Cache Lab
Goal
● Write a small C simulator of caching strategies.
● Expect about 200-300 lines of code.
● The starting point is in your repository.
Traces
The traces directory contains program traces generated by valgrind.
● The format of each line is: <operation> <address>,<size>
● For example: "I 0400d7d4,8", "M 0421c7f0,4", "L 04f6b868,8"
● Operations
  ○ Instruction load: I (ignore these)
  ○ Data load: L (hit, miss, miss/eviction)
  ○ Data store: S (hit, miss, miss/eviction)
  ○ Data modify: M (load+store: hit/hit, miss/hit, miss/eviction/hit)
http://bytes.usc.edu/cs356/assignments/cachelab.pdf
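
The trace format above maps directly onto a single sscanf call. The following is a minimal sketch, not the lab's reference code: struct Access and parse_trace_line are hypothetical names, and the address is assumed to be hexadecimal (as in the examples).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one trace line (names are illustrative). */
struct Access {
    char op;                  /* 'I', 'L', 'S', or 'M' */
    unsigned long long addr;  /* address, parsed as hexadecimal */
    int size;                 /* number of bytes accessed */
};

/* Parse "<operation> <address>,<size>". Returns 1 on success, 0 otherwise. */
int parse_trace_line(const char *line, struct Access *out) {
    assert(line != NULL && out != NULL);
    /* The leading space in the format skips whitespace before the operation. */
    int n = sscanf(line, " %c %llx,%d", &out->op, &out->addr, &out->size);
    return n == 3;
}
```

A caller would typically read each line with fgets, call parse_trace_line, and skip records whose op is 'I'.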

Reference Cache Simulator
./csim-ref [-hv] -S <S> -K <K> -B <B> -p <P> -t <tracefile>
  -h              Optional help flag that prints usage information
  -v              Optional verbose flag that displays trace information
  -S <S>          Number of sets (s = log2(S) is the number of bits used for the set index)
  -K <K>          Number of lines per set (associativity)
  -B <B>          Block size in bytes (B = 2^b bytes per block)
  -p <P>          Selects a policy, either LRU or FIFO
  -t <tracefile>  Selects a trace
$ ./csim-ref -S 16 -K 1 -B 16 -p LRU -t traces/yi.trace
hits:4 misses:5 evictions:3
$ ./csim-ref -S 16 -K 1 -B 16 -p LRU -v -t traces/yi.trace
L 10,1 miss
M 20,1 miss hit
...
...
M 12,1 miss eviction hit
hits:4 misses:5 evictions:3
(See https://usc-cs356.github.io/assignments/cachelab.html)
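
One possible way to accept the same flags with getopt is sketched below. struct Options and parse_options are hypothetical names, and the handling of -h (returning -1 so that the caller prints usage) is an assumption, not the reference behavior.

```c
#include <assert.h>
#include <getopt.h>   /* declares getopt, optarg, optind */
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed options (names are illustrative). */
struct Options {
    int verbose;
    int S, K, B;            /* sets, lines per set, block size in bytes */
    int use_fifo;           /* 0 = LRU, 1 = FIFO */
    const char *tracefile;
};

/* Returns 0 on success, -1 on -h, an unknown flag, or a missing option. */
int parse_options(int argc, char *argv[], struct Options *o) {
    assert(o != NULL);
    memset(o, 0, sizeof(*o));
    optind = 1;  /* restart scanning in case getopt was used before */
    int c;
    while ((c = getopt(argc, argv, "hvS:K:B:p:t:")) != -1) {
        switch (c) {
        case 'v': o->verbose = 1; break;
        case 'S': o->S = atoi(optarg); break;
        case 'K': o->K = atoi(optarg); break;
        case 'B': o->B = atoi(optarg); break;
        case 'p': o->use_fifo = (strcmp(optarg, "FIFO") == 0); break;
        case 't': o->tracefile = optarg; break;
        default:  return -1;  /* -h or '?': the caller prints usage */
        }
    }
    return (o->S > 0 && o->K > 0 && o->B > 0 && o->tracefile) ? 0 : -1;
}
```

In the option string, a colon after a letter means that the flag takes an argument, which getopt exposes through optarg.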

Reference Cache Simulator
LRU and FIFO
$ ./csim-ref -S 2 -K 2 -B 2 -p LRU -v -t traces/simple_policy.trace
L 0,1 miss
L 1,1 hit
L 2,1 miss
L 6,1 miss
L 2,1 hit
L a,1 miss eviction
L 2,1 hit
hits:3 misses:4 evictions:1
$ ./csim-ref -S 2 -K 2 -B 2 -p FIFO -v -t traces/simple_policy.trace
L 0,1 miss
L 1,1 hit
L 2,1 miss
L 6,1 miss
L 2,1 hit
L a,1 miss eviction
L 2,1 miss eviction
hits:2 misses:5 evictions:2
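
The two runs differ only in what happens on a hit. One way to model this, sketched below under stated assumptions, is to give each line a timestamp: FIFO sets it only when the line is filled, while LRU also refreshes it on every hit, and the victim is the valid line with the smallest timestamp. The stamp field and the names access_block and run are hypothetical, and each access is assumed to stay within one block.

```c
#include <assert.h>
#include <stdlib.h>

struct Line { int valid; unsigned long long tag; unsigned long long stamp; };
struct Stats { int hits, misses, evictions; };

/* Access the block containing addr in a flat array of S sets x K lines. */
void access_block(struct Line *cache, int S, int K, int B, int fifo,
                  unsigned long long addr, unsigned long long now,
                  struct Stats *st) {
    unsigned long long block = addr / B;
    struct Line *set = cache + (block % S) * K;
    unsigned long long tag = block / S;
    int victim = 0;
    for (int i = 0; i < K; i++) {
        if (set[i].valid && set[i].tag == tag) {
            st->hits++;
            if (!fifo) set[i].stamp = now;  /* LRU refreshes on every hit */
            return;
        }
        /* Prefer an empty line; otherwise the line with the smallest stamp. */
        if (set[victim].valid &&
            (!set[i].valid || set[i].stamp < set[victim].stamp))
            victim = i;
    }
    st->misses++;
    if (set[victim].valid) st->evictions++;
    set[victim] = (struct Line){1, tag, now};
}

/* Replay n single-block accesses and return the counters. */
struct Stats run(int S, int K, int B, int fifo,
                 const unsigned long long *addrs, int n) {
    struct Line *cache = calloc((size_t)S * K, sizeof(struct Line));
    assert(cache != NULL);
    struct Stats st = {0, 0, 0};
    for (int i = 0; i < n; i++)
        access_block(cache, S, K, B, fifo, addrs[i],
                     (unsigned long long)i + 1, &st);
    free(cache);
    return st;
}
```

Replaying the addresses 0, 1, 2, 6, 2, a, 2 with S = 2, K = 2, B = 2 reproduces the counters printed above for both policies.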

Reference Cache Simulator
Memory accesses that cross a line boundary
$ ./csim-ref -S 2 -K 1 -B 4 -p LRU -v -t traces/simple_size.trace
L 0,2 miss
L 1,2 hit
L 3,1 hit
L 6,6 miss miss eviction
L 3,1 miss eviction
hits:2 misses:4 evictions:2

Your Simulator
[Figure: cache lines, annotated with "Not needed in this lab" and "?? bits per line to support eviction policy" (metadata).]

Your Simulator
    struct Line {
        // include valid bit, tag, and metadata
    };
Example 1: flat array of S × K struct Line
    struct Line *cache;
    cache = (struct Line *)malloc(S * K * sizeof(struct Line));
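
With the flat layout of Example 1, line `way` of set `set` lives at index set * K + way. A minimal sketch follows; the stamp field and the helper names are assumptions, and calloc is swapped in for malloc so that every valid bit starts at 0.

```c
#include <assert.h>
#include <stdlib.h>

struct Line {
    int valid;
    unsigned long long tag;
    unsigned long long stamp;  /* assumed metadata for the eviction policy */
};

/* Allocate S * K zero-initialized lines; returns NULL on failure. */
struct Line *cache_create(int S, int K) {
    assert(S > 0 && K > 0);
    return calloc((size_t)S * (size_t)K, sizeof(struct Line));
}

/* Address of line `way` within set `set` of a flat S x K array. */
struct Line *cache_line(struct Line *cache, int K, int set, int way) {
    return &cache[(size_t)set * K + way];
}
```

A single allocation needs a single free(cache) at the end of main.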

Your Simulator
    struct Line {
        // include valid bit, tag, and metadata
    };
Example 2: struct Line **
● Array of S struct Line * (one pointer per set)
● Array of K struct Line per set
    struct Line **cache;
    cache = (struct Line **)malloc(...);
    for i = 0, 1, ..., S-1 do
        cache[i] = (struct Line *)malloc(...);
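
One way to fill in the sizes elided in Example 2 is sketched below, together with the matching cleanup. The names cache_alloc and cache_free, the stamp field, and the use of calloc (so that lines start invalid) are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

struct Line {
    int valid;
    unsigned long long tag;
    unsigned long long stamp;  /* assumed metadata for the eviction policy */
};

/* Free every per-set array, then the array of pointers. */
void cache_free(struct Line **cache, int S) {
    if (cache == NULL) return;
    for (int i = 0; i < S; i++)
        free(cache[i]);        /* free(NULL) is a no-op */
    free(cache);
}

/* Allocate S pointers, each to an array of K lines; NULL on failure. */
struct Line **cache_alloc(int S, int K) {
    assert(S > 0 && K > 0);
    struct Line **cache = calloc((size_t)S, sizeof(struct Line *));
    if (cache == NULL) return NULL;
    for (int i = 0; i < S; i++) {
        cache[i] = calloc((size_t)K, sizeof(struct Line));
        if (cache[i] == NULL) {
            cache_free(cache, S);  /* release what was allocated so far */
            return NULL;
        }
    }
    return cache;
}
```

With this layout, cache[set][way] selects a line directly, at the cost of S + 1 allocations instead of one.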

Your Simulator
Fill in the csim.c file to:
● Accept the same command-line options.
● Produce identical output.
Rules
● Include name and username in the header.
● Use only C code (must compile with gcc -std=c11).
● Use malloc to allocate data structures for arbitrary S, K, B.
● Implement both LRU and FIFO policies.
● Ignore instruction cache accesses (starting with I).
● Memory accesses can cross block boundaries. ⇒ How to deal with this?
● At the end of your main function, call: printSummary(hit_count, miss_count, eviction_count)

Evaluation
3 test suites:
● Direct Mapped: K = 1; no need to implement an eviction policy
● Policy Tests: check that LRU and FIFO policies work correctly
● Size Tests: include memory accesses that cross a line boundary
You only need to output the correct number of cache hits, misses, and evictions.
● You can run csim-ref -v to check the expected behavior.
● Start from small traces such as traces/dave.trace
● Use the getopt library to parse command-line arguments.
  ○ int s = atoi(arg_str); int S = pow(2, s);
You must pass all tests in a test suite to receive its points.

Problems
● How to parse the input traces?
  ○ fopen (open a file), fgets (read a line), sscanf (parse a line), fclose
● How to represent the cache?
  ○ How to allocate memory for any S, K, B?
  ○ Cache = S sets; each set = K cache lines
  ○ Use malloc and free
● What needs to be stored in a cache line?
  ○ Valid bit, tag, and what else?
  ○ How to keep track of statistics for LRU and FIFO policies?
● How to retrieve data at a memory address?
  ○ How to extract tag / set / block bits from an input address?
  ○ How to select the correct set? And how to look for a hit?
  ○ What to update in case of hit (in addition to hit counter)?
  ○ What to do in case of miss?
● Useful: Print the content of the cache after each request in a trace
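
For the question on tag / set / block bits, a minimal sketch with shifts and masks is given below. It assumes that S and B are powers of two and that s + b < 64; log2_pow2, struct Fields, and split_address are hypothetical names.

```c
#include <assert.h>

/* log2 of a power of two, for example 16 -> 4. */
int log2_pow2(unsigned long long x) {
    assert(x != 0 && (x & (x - 1)) == 0);  /* exactly one bit set */
    int bits = 0;
    while (x > 1) {
        x >>= 1;
        bits++;
    }
    return bits;
}

struct Fields { unsigned long long tag, set, offset; };

/* Split addr into | tag | s set-index bits | b block-offset bits |. */
struct Fields split_address(unsigned long long addr, int S, int B) {
    int s = log2_pow2((unsigned long long)S);
    int b = log2_pow2((unsigned long long)B);
    struct Fields f;
    f.offset = addr & ((1ULL << b) - 1);      /* lowest b bits */
    f.set = (addr >> b) & ((1ULL << s) - 1);  /* next s bits */
    f.tag = addr >> (s + b);                  /* remaining high bits */
    return f;
}
```

A hit is then a line in set f.set that is valid and whose tag equals f.tag; the offset is not needed to count hits and misses.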

Dynamic Memory Allocation in C
Low-level memory allocation in Linux
    void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t off)
    int munmap(void *addr, size_t length)
    void *sbrk(intptr_t increment)
Portable alternatives in stdlib
    void *malloc(size_t size)
    void *calloc(size_t nmemb, size_t size)
    void *realloc(void *ptr, size_t size)
    void free(void *ptr)
Check man pages for these functions!
    $ man malloc

malloc and free in action
Each square represents 4 bytes.
malloc returns addresses that are multiples of 8 bytes (64 bits).
If there is a problem, malloc returns NULL and sets errno.
malloc does not initialize memory to 0; use calloc when zeroed memory is needed.
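
A minimal sketch of the last two points: check the returned pointer and rely on calloc for zero initialization. alloc_counters is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate n zero-initialized ints. Returns NULL, after reporting the
 * error recorded in errno, if the allocation fails. */
int *alloc_counters(size_t n) {
    assert(n > 0);
    int *p = calloc(n, sizeof(int));
    if (p == NULL)
        fprintf(stderr, "calloc failed: %s\n", strerror(errno));
    return p;
}
```

The caller owns the returned block and releases it with free.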

Using malloc and free
    #include <stdlib.h>
    #include <stdio.h>

    int compare(const void *x, const void *y) {
        int xval = *(int *)x;
        int yval = *(int *)y;
        if (xval < yval)
            return -1;
        else if (xval == yval)
            return 0;
        else
            return 1;
    }

    int main() {
        int count = 0;
        printf("How many numbers to sort? ");
        if (scanf("%d", &count) != 1) {
            fprintf(stderr, "Invalid input\n");
            return 1;
        }
        int *numbers = (int *)malloc(count * sizeof(int));
        printf("Please input %d numbers: ", count);
        for (int i = 0; i < count; i++) {
            if (scanf("%d", &numbers[i]) != 1) {
                fprintf(stderr, "Invalid input\n");
                return 1;
            }
        }
        qsort(numbers, count, sizeof(int), compare);
        printf("Sorted numbers:");
        for (int i = 0; i < count; i++) {
            printf(" %d", numbers[i]);
        }
        printf("\n");
        free(numbers);
        return 0;
    }

    $ gcc sort.c -o sort
    $ ./sort
    How many numbers to sort? 4
    Please input 4 numbers: 2 3 1 -1
    Sorted numbers: -1 1 2 3

Memory-Related Bugs
    /* Return y = Ax */
    int *matvec(int **A, int *x, int n) {
        int i, j;
        int *y = (int *)malloc(n * sizeof(int));
        /* should set y[i] = 0 (or use calloc) */
        for (i = 0; i < n; i++) {
            for (j = 0; j < n; j++) {
                y[i] += A[i][j] * x[j];
            }
        }
        return y;
    }

    void leak(int n) {
        int *x = (int *)malloc(n * sizeof(int));
        return;
        /* x is garbage at this point */
    }

    void buffer_overflow() {
        char buf[64];
        gets(buf); /* use fgets instead */
        return;
    }

    void bad_pointer() {
        int val;
        scanf("%d", val); /* use &val */
    }

    int *search(int *p, int val) {
        while (*p && *p != val)
            p += sizeof(int); /* should be p++ */
        return p;
    }

    int *stackref() {
        int val;
        return &val; /* val is a local variable */
    }
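
For comparison, here is a sketch of two of the functions above with the fixes named in their comments applied. The _fixed suffixes are only for illustration, and the NULL check is an addition.

```c
#include <assert.h>
#include <stdlib.h>

/* Return y = Ax, or NULL if allocation fails. The caller frees y. */
int *matvec_fixed(int **A, const int *x, int n) {
    assert(n > 0);
    int *y = calloc((size_t)n, sizeof(int));  /* y[i] starts at 0 */
    if (y == NULL) return NULL;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            y[i] += A[i][j] * x[j];
    return y;
}

/* Scan a zero-terminated array for val; returns a pointer to the match
 * or to the terminating zero. */
int *search_fixed(int *p, int val) {
    while (*p && *p != val)
        p++;  /* pointer arithmetic already scales by sizeof(int) */
    return p;
}
```

The remaining bugs have one-line fixes as well: pass &val to scanf, replace gets with fgets, free x before returning from leak, and never return the address of a local variable.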

Memory-Related Bugs
    /* Create an nxm array */
    int **makeArray1(int n, int m) {
        int i;
        /* should be sizeof(int*) */
        int **A = (int **)malloc(n * sizeof(int));
        for (i = 0; i < n; i++) {
            A[i] = (int *)malloc(m * sizeof(int));
        }
        return A;
    }

    /* Create an nxm array */
    int **makeArray2(int n, int m) {
        int i;
        int **A = (int **)malloc(n * sizeof(int *));
        /* should be .. i < n .. */
        for (i = 0; i <= n; i++) {
            A[i] = (int *)malloc(m * sizeof(int));
        }
        return A;
    }

    int *binheapDelete(int **binheap, int *size) {
        int *packet = binheap[0];
        binheap[0] = binheap[*size - 1];
        *size--; /* This should be (*size)-- */
        heapify(binheap, *size, 0);
        return (packet);
    }

    int *heapref(int n, int m) {
        int i;
        int *x, *y;
        x = (int *)malloc(n * sizeof(int));
        /* ... */
        /* Other calls to malloc and free here */
        free(x);
        y = (int *)malloc(m * sizeof(int));
        for (i = 0; i < m; i++) {
            y[i] = x[i]++; /* x[i] in freed block */
        }
        return y;
    }
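
A sketch of the array constructor with both fixes applied (sizeof(int *) for the row pointers, i < n as loop bound), plus the parenthesized decrement from binheapDelete. The names make_array, free_array, and decrement are hypothetical; zero initialization and the failure handling are additions.

```c
#include <assert.h>
#include <stdlib.h>

/* Release an array created by make_array. */
void free_array(int **A, int n) {
    if (A == NULL) return;
    for (int i = 0; i < n; i++)
        free(A[i]);
    free(A);
}

/* Create an n x m array of zeros; returns NULL on allocation failure. */
int **make_array(int n, int m) {
    assert(n > 0 && m > 0);
    int **A = calloc((size_t)n, sizeof(int *));  /* n pointers, not n ints */
    if (A == NULL) return NULL;
    for (int i = 0; i < n; i++) {                /* i < n, not i <= n */
        A[i] = calloc((size_t)m, sizeof(int));
        if (A[i] == NULL) {
            free_array(A, n);
            return NULL;
        }
    }
    return A;
}

/* *size-- would move the pointer; (*size)-- decrements the value. */
void decrement(int *size) {
    (*size)--;
}
```

For heapref, the fix is a matter of ordering: read from x before calling free(x).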

Explicit Heap Allocators
Explicit allocators like malloc
● Must handle arbitrary sequences of allocate/free requests
● Must respond immediately (no buffering of requests)
● Helper data structures must be stored in the heap itself
● Payloads must be aligned to 8-byte boundaries
● Allocated blocks cannot be moved/modified
Goal 1. Maximize throughput: (# completed requests) / second
● Simple to implement malloc with running time O(# free blocks) and free with running time O(1)
Goal 2. Maximize peak utilization: max{ allocated(t) : t ⩽ T } / heapsize(T)
● If the heap can shrink, take max{ heapsize(t) : t ⩽ T }
● Problem: fragmentation, internal (e.g., larger block allocated for alignment) or external (free space between allocated blocks)
● Severity of external fragmentation depends also on future requests

Implementation: Implicit Free Lists
Each square represents 4 bytes.
Header: (block size) / (allocated)
● Block size includes header/padding, always a multiple of 8 bytes.
● Can scan the list using headers, but in O(# blocks), not O(# free blocks)
● Special terminating header: zero size, allocated bit set (will not be merged)
● With 1-word header and 2-word alignment, minimum block size is 2 words
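
Because block sizes are multiples of 8, the three low bits of a size are always 0, so the allocated flag can share the header word with the size. The sketch below assumes this encoding (flag in bit 0) and models the heap as an array of 4-byte words; pack, block_size, is_alloc, and first_fit are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header word: block size in bytes, with the allocated flag in bit 0. */
uint32_t pack(uint32_t size, int alloc) {
    assert(size % 8 == 0);
    return size | (alloc ? 1u : 0u);
}
uint32_t block_size(uint32_t hdr) { return hdr & ~0x7u; }
int is_alloc(uint32_t hdr) { return (int)(hdr & 0x1u); }

/* First fit: walk the headers until the terminating zero-size header.
 * Returns the word index of the header of the first free block whose
 * size is at least `need` bytes, or -1 if there is none. */
long first_fit(const uint32_t *heap, uint32_t need) {
    size_t i = 0;
    while (block_size(heap[i]) != 0) {
        if (!is_alloc(heap[i]) && block_size(heap[i]) >= need)
            return (long)i;
        i += block_size(heap[i]) / 4;  /* jump to the next header */
    }
    return -1;
}
```

The walk visits allocated blocks too, which is why the scan costs O(# blocks) rather than O(# free blocks).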
