University of Washington
Today
HW3 extension
- Phew!
Lab 4? Finish up caches, exceptional control flow
Cache Associativity
- 1-way: 8 sets, 1 block each (direct mapped)
- 2-way: 4 sets, 2 blocks each
- 4-way: 2 sets, 4 blocks each
- 8-way: 1 set, 8 blocks (fully associative)
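As a rough illustration of the four configurations listed here, the sketch below computes how many sets a cache has for a given associativity and which set a memory block maps to. The function names `num_sets` and `set_index` are hypothetical, and the modulo placement rule is the usual textbook assumption rather than something the slide states.

```c
#include <assert.h>

/* Number of sets in a cache with `blocks` block slots and `ways` slots per set.
   Assumes `ways` divides `blocks` evenly. */
unsigned num_sets(unsigned blocks, unsigned ways) {
    assert(ways > 0 && blocks % ways == 0);
    return blocks / ways;
}

/* Set that memory block number `block_num` maps to, assuming the usual
   modulo placement (block number mod number of sets). */
unsigned set_index(unsigned block_num, unsigned blocks, unsigned ways) {
    return block_num % num_sets(blocks, ways);
}
```

For example, in an 8-block cache, block number 13 maps to set 5 when direct mapped, to set 1 when 2-way, and to set 0 (the only set) when fully associative.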
Cold (compulsory) miss
Conflict miss
Capacity miss
(Figure: processor package containing the L3 unified cache, shared by all cores, connected to main memory.)
- L1 i-cache and d-cache: 32 KB, 8-way, access: 4 cycles
- L2 unified cache: 256 KB, 8-way, access: 11 cycles
- L3 unified cache: 8 MB, 16-way, access: 30-40 cycles
- Block size: 64 bytes for all caches
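The sizes, associativities, and block size listed for this hierarchy determine how many sets each cache has. The sketch below works that out; `cache_sets` and `log2_pow2` are hypothetical helper names, and the formula (size divided by block size times associativity) is the standard one, not something stated on the slide.

```c
#include <assert.h>

/* Number of sets = cache size / (block size * associativity). */
unsigned long cache_sets(unsigned long size_bytes, unsigned long block_bytes,
                         unsigned long ways) {
    assert(block_bytes > 0 && ways > 0);
    return size_bytes / (block_bytes * ways);
}

/* log2 of a power of two, e.g. the number of block-offset or set-index bits. */
unsigned log2_pow2(unsigned long x) {
    unsigned bits = 0;
    assert(x != 0 && (x & (x - 1)) == 0);   /* must be a power of two */
    while (x > 1) {
        x >>= 1;
        bits++;
    }
    return bits;
}
```

With the numbers above, L1 has 64 sets, L2 has 512 sets, and L3 has 8192 sets; a 64-byte block needs 6 offset bits, and L1 needs 6 set-index bits.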
Multiple copies of data exist:
What is the main problem with that?
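What to do on a write-hit?
- Write-through: write immediately to memory
- Write-back: defer the write to memory until the line is replaced (needs a dirty bit)
What to do on a write-miss?
- Write-allocate: load the block into the cache, then update the line in the cache
- No-write-allocate: write immediately to memory, bypassing the cache
Typical caches:
- Write-back + write-allocate, usually
- Write-through + no-write-allocate, occasionally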
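To make the write-hit and write-miss questions concrete, here is a minimal sketch of two common policy pairings on a hypothetical one-line cache: write-through with no-write-allocate (every write goes straight to memory) and write-back with write-allocate (writes stay in the cache and reach memory only when a modified block is evicted). The `line_t` type and both function names are invented for illustration, and the model only counts writes that reach memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-line cache used to contrast the two policies. */
typedef struct {
    bool valid;
    bool dirty;          /* only meaningful for write-back */
    unsigned tag;        /* block number currently cached */
    unsigned mem_writes; /* number of writes that reached memory */
} line_t;

/* Write-through + no-write-allocate: every write goes to memory,
   and a write miss does not bring the block into the cache. */
void write_through(line_t *l, unsigned block) {
    assert(l);
    (void)block;
    l->mem_writes++;
}

/* Write-back + write-allocate: a miss loads the block (writing the old
   block back first if it is dirty); the write itself only sets the dirty bit. */
void write_back(line_t *l, unsigned block) {
    assert(l);
    if (!l->valid || l->tag != block) {   /* write miss */
        if (l->valid && l->dirty)
            l->mem_writes++;              /* evict the dirty block */
        l->valid = true;
        l->tag = block;
    }
    l->dirty = true;                      /* memory copy is now stale */
}
```

Ten writes to the same block cost ten memory writes under write-through but none under write-back; the single deferred write happens only when a different block replaces the dirty one.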
Examples
Some design differences
Write code that has locality
How to achieve?
c = (double *) calloc(sizeof(double), n*n);

/* Multiply n x n matrices a and b */
void mmm(double *a, double *b, double *c, int n) {
    int i, j, k;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                c[i*n + j] += a[i*n + k]*b[k*n + j];
}
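Assume:
- Matrix elements are doubles
- Cache block = 8 doubles
- Cache size C << n (much smaller than n)
First iteration:
- n/8 + n = 9n/8 misses (n/8 for the row of a, which is read sequentially; n for the column of b, where every element lies in a different cache block)
Other iterations:
- Again n/8 + n = 9n/8 misses each
Total misses:
- 9n/8 * n^2 = (9/8) * n^3
(Figure: a row of a times a column of b, each of length n; cache blocks are 8 elements wide.)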
c = (double *) calloc(sizeof(double), n*n);

/* Multiply n x n matrices a and b, one B x B block at a time
   (B is the block size; the loops assume B divides n) */
void mmm(double *a, double *b, double *c, int n) {
    int i, j, k, i1, j1, k1;
    for (i = 0; i < n; i+=B)
        for (j = 0; j < n; j+=B)
            for (k = 0; k < n; k+=B)
                /* B x B mini matrix multiplications */
                for (i1 = i; i1 < i+B; i1++)
                    for (j1 = j; j1 < j+B; j1++)
                        for (k1 = k; k1 < k+B; k1++)
                            c[i1*n + j1] += a[i1*n + k1]*b[k1*n + j1];
}
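Blocking only changes the order in which memory is visited, so the blocked loop nest should produce the same matrix as the plain triple loop. The sketch below checks that on small inputs. The names `mmm_naive`, `mmm_blocked`, and `mmm_versions_agree` are hypothetical, the block size is passed as a parameter `bs` rather than the constant B, and `bs` is assumed to divide n.

```c
#include <assert.h>
#include <stdlib.h>

/* Plain triple loop: c += a * b for n x n row-major matrices. */
void mmm_naive(const double *a, const double *b, double *c, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                c[i*n + j] += a[i*n + k] * b[k*n + j];
}

/* Blocked version; `bs` plays the role of B and must divide n. */
void mmm_blocked(const double *a, const double *b, double *c, int n, int bs) {
    assert(bs > 0 && n % bs == 0);
    for (int i = 0; i < n; i += bs)
        for (int j = 0; j < n; j += bs)
            for (int k = 0; k < n; k += bs)
                for (int i1 = i; i1 < i + bs; i1++)
                    for (int j1 = j; j1 < j + bs; j1++)
                        for (int k1 = k; k1 < k + bs; k1++)
                            c[i1*n + j1] += a[i1*n + k1] * b[k1*n + j1];
}

/* Returns 1 if both versions give identical results on small
   integer-valued inputs, 0 otherwise (including allocation failure). */
int mmm_versions_agree(int n, int bs) {
    double *a = malloc(sizeof(double) * n * n);
    double *b = malloc(sizeof(double) * n * n);
    double *c1 = calloc((size_t)n * n, sizeof(double));
    double *c2 = calloc((size_t)n * n, sizeof(double));
    int ok = (a && b && c1 && c2);
    if (ok) {
        for (int x = 0; x < n * n; x++) {
            a[x] = x % 7;
            b[x] = x % 5;
        }
        mmm_naive(a, b, c1, n);
        mmm_blocked(a, b, c2, n, bs);
        for (int x = 0; x < n * n; x++)
            if (c1[x] != c2[x])
                ok = 0;
    }
    free(a); free(b); free(c1); free(c2);
    return ok;
}
```

A call such as `mmm_versions_agree(16, 4)` compares the two versions on 16 x 16 matrices with 4 x 4 blocks.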
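(Figure: the matrices divided into n/B blocks per side, each block of size B x B; i1 and j1 index within the current block of c.)
Assume:
- Cache block = 8 doubles
- Cache size C << n (much smaller than n)
- Three blocks fit into the cache: 3B^2 < C
First (block) iteration:
- B^2/8 misses for each block
- 2n/B * B^2/8 = nB/4 misses (n/B blocks from a and n/B blocks from b, omitting matrix c)
Other (block) iterations:
- Same as the first iteration: nB/4 misses each
Total misses:
- nB/4 * (n/B)^2 = n^3/(4B)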
No blocking: (9/8) * n^3 misses
Blocking: 1/(4B) * n^3 misses
If B = 8, the difference is 4 * 8 * 9 / 8 = 36x
If B = 16, the difference is 4 * 16 * 9 / 8 = 72x
This suggests using the largest possible block size B, subject to the limit 3B^2 < C!
Reason for dramatic difference:
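The 36x and 72x figures can be reproduced with a few lines of arithmetic. The sketch below assumes the usual miss estimates for this example, (9/8) * n^3 without blocking and n^3 / (4B) with blocking, which are consistent with the ratio 4 * B * 9 / 8 quoted here; the function names are hypothetical.

```c
#include <assert.h>

/* Estimated misses without blocking: (9/8) * n^3.
   Assumes 8 doubles per cache block and a cache much smaller than n. */
double misses_unblocked(double n) {
    return 9.0 / 8.0 * n * n * n;
}

/* Estimated misses with B x B blocking: n^3 / (4B).
   Assumes three B x B blocks fit in the cache (3B^2 < C). */
double misses_blocked(double n, double B) {
    assert(B > 0);
    return n * n * n / (4.0 * B);
}

/* Ratio of the two estimates; n cancels, leaving 4 * B * 9 / 8. */
double blocking_ratio(double n, double B) {
    return misses_unblocked(n) / misses_blocked(n, B);
}
```

Because n cancels in the ratio, the improvement depends only on B under these assumptions, which is why the limit 3B^2 < C is what ultimately bounds it.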
Programmer can optimize for cache performance
All systems favor “cache-friendly code”
Intel Core i7
- 32 KB L1 i-cache
- 32 KB L1 d-cache
- 256 KB unified L2 cache
- 8 MB unified L3 cache
- All caches on-chip
(Figure: read throughput in MB/s, on an axis from 1000 to 7000, plotted against working set size, from 2K to 64M bytes, and stride, from s1 to s32 in units of 8 bytes. Regions of the surface are labeled L1, L2, L3, and Mem.)
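A plot of read throughput against working set size and stride is typically produced by timing a simple read loop for many (size, stride) pairs. The sketch below shows only the loop itself, not the timing; `stride_read` and `bytes_read` are hypothetical names, and elements are `long` on the assumption that a `long` is 8 bytes, matching the stride unit in the figure.

```c
#include <assert.h>
#include <stddef.h>

/* Read every `stride`-th element among the first `elems` elements of
   `data` and return their sum. Timing repeated calls for one choice of
   `elems` (working set size) and `stride` gives one point of the plot. */
long stride_read(const long *data, size_t elems, size_t stride) {
    assert(stride > 0);
    long sum = 0;
    for (size_t i = 0; i < elems; i += stride)
        sum += data[i];
    return sum;
}

/* Bytes read by one call, for converting a measured time into MB/s. */
size_t bytes_read(size_t elems, size_t stride) {
    assert(stride > 0);
    return ((elems + stride - 1) / stride) * sizeof(long);
}
```

Small working sets and small strides keep the reads in the fastest cache with good spatial locality; growing either one pushes the accesses toward the slower levels.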
C:
    car *c = malloc(sizeof(car));
    c->miles = 100;
    c->gals = 17;
    float mpg = get_mpg(c);
    free(c);

Java:
    Car c = new Car();
    c.setMiles(100);
    c.setGals(17);
    float mpg = c.getMPG();

Assembly language:
    get_mpg:
        pushq %rbp
        movq %rsp, %rbp
        ...
        popq %rbp
        ret

Machine code:
    0111010000011000
    100011010000010000000010
    1000100111000010
    110000011111101000011111

Topics:
- Data & addressing
- Integers & floats
- Machine code & C
- x86 assembly programming
- Procedures & stacks
- Arrays & structs
- Memory & caches
- Exceptions & processes
- Virtual memory
- Memory allocation
- Java vs. C
So far, we’ve seen how the flow of control changes as a single program executes.
A CPU executes more than one program at a time, though.
Exceptional control flow is the basic mechanism used for:
Processors do only one thing:
Up to now: two ways to change control flow (which ones?)
Processor also needs to react to changes in system state
Can jumps and procedure calls achieve this?
Exceptional control flow exists at all levels of a computer system
Low level mechanisms
Higher level mechanisms
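An exception is a transfer of control to the operating system (OS)
(Figure: while user code executes instruction I_current, an event occurs; the exception transfers control to the OS, where an exception handler processes it. I_next is the instruction that follows I_current in user code.)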
Each type of event has a unique exception number k
k = index into exception table (a.k.a. interrupt vector)
Handler k is called each time exception k occurs
(Figure: the exception table, indexed by exception numbers 0 through n-1; entry k holds the code for exception handler k.)
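The table lookup described here can be pictured as an array of function pointers indexed by exception number. The sketch below is a user-level analogy only, not how any real OS or processor implements it; the table size of 256, the handler signature, and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_EXCEPTIONS 256            /* assumed table size for this sketch */

typedef int (*handler_t)(int k);      /* hypothetical handler signature */

static handler_t exception_table[NUM_EXCEPTIONS];

/* Example handlers: each just reports which class of event it handled. */
int handle_page_fault(int k) { (void)k; return 'F'; }   /* fault */
int handle_syscall(int k)    { (void)k; return 'T'; }   /* trap  */

/* Install handler `h` as the table entry for exception number `k`. */
void install_handler(int k, handler_t h) {
    assert(k >= 0 && k < NUM_EXCEPTIONS);
    exception_table[k] = h;
}

/* On exception k: index the table and call handler k.
   Returns -1 if no handler is installed for k. */
int dispatch(int k) {
    assert(k >= 0 && k < NUM_EXCEPTIONS);
    return exception_table[k] ? exception_table[k](k) : -1;
}
```

For instance, after `install_handler(14, handle_page_fault)`, a call to `dispatch(14)` runs the page-fault handler, mirroring "handler k is called each time exception k occurs".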
Caused by events external to the processor
Examples:
Caused by events that occur as a result of executing an instruction
0804d070 <__libc_open>:
 . . .
 804d082:  cd 80    int  $0x80
 804d084:  5b       pop  %ebx
 . . .
(Figure: user code executes int, which raises an exception; when the OS returns, execution continues at pop.)
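From C, the same kind of trap can be triggered without writing assembly. The sketch below is Linux-specific and uses the generic `syscall` wrapper to request the process ID; it is an illustration under that assumption, not the code path of `__libc_open`. Note that the listing here shows 32-bit x86 code using `int $0x80`, while the wrapper on other architectures uses whatever trap instruction that platform defines. The function name `pid_via_syscall` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Ask the kernel for the process ID through the generic syscall wrapper.
   The wrapper passes the system call number to the kernel and executes
   the architecture's trap instruction; control returns here afterwards. */
long pid_via_syscall(void) {
    long pid = syscall(SYS_getpid);
    assert(pid > 0);
    return pid;
}
```

The value returned should match what the ordinary `getpid()` library function reports, since both end up in the same kernel handler.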
int a[1000];
main () {
    a[500] = 13;
}

 80483b7:  c7 05 10 9d 04 08 0d    movl $0xd,0x8049d10

(Figure: movl raises an exception: page fault. The OS creates the page and loads it into memory, then returns.)
int a[1000];
main () {
    a[5000] = 13;
}

 80483b7:  c7 05 60 e3 04 08 0d    movl $0xd,0x804e360

(Figure: movl raises an exception: page fault. The OS detects the invalid address and signals the process.)
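The "signal process" step can be observed from a parent process. The sketch below is a Linux-oriented illustration: instead of indexing past the end of an array (which is not guaranteed to hit an unmapped address), the child stores to a page deliberately mapped with no access permissions, and the parent reports which signal ended it. The function name and the 4096-byte mapping length are assumptions for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that stores to a page mapped with no access permissions.
   Returns the number of the signal that terminated the child, 0 if the
   child was not killed by a signal, or -1 on setup failure. */
int signal_from_invalid_store(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                                    /* child */
        volatile int *p = mmap(NULL, 4096, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            _exit(2);
        p[0] = 13;             /* invalid reference: fault, then signal */
        _exit(0);              /* not reached if the store faults */
    }
    int status = 0;                                    /* parent */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux the expected result is `SIGSEGV`, the signal commonly reported as a "segmentation fault".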
Exception Number   Description                Exception Class
0                  Divide error               Fault
13                 General protection fault   Fault
14                 Page fault                 Fault
18                 Machine check              Abort
32-127             OS-defined                 Interrupt or trap
128 (0x80)         System call                Trap
129-255            OS-defined                 Interrupt or trap

http://download.intel.com/design/processor/manuals/253665.pdf
Exceptions