Università degli studi di Udine Sistemi operativi – Operating Systems
Memory Consistency Models
Università degli studi di Udine Sistemi operativi – Operating Systems
Sources of out-of-order memory accesses
Compiler optimizations Store buffers FIFOs for uncommitted writes Invalidate queues (for cache coherency) Data prefetch Banked cache architectures Networked interconnect Non-uniform memory access (NUMA) architectures:
different accesses to memory have different latencies
... Università degli studi di Udine Sistemi operativi – Operating Systems
Compiler optimizations
Language semantic does not consider
- 1. Side-effects of memory accesses
- 2. Multi-threading
- 3. Asynchronous execution
Compiler can:
Reorder instructions Eliminate operations Some compiler optimization can be controlled by the
volatile qualifier
Università degli studi di Udine Sistemi operativi – Operating Systems
void waitval (int *ptr) { while (*ptr == 0) continue; } void waitval (int *ptr) { while (*ptr == 0) continue; } waitval: ldr r3, [r0] cmp r3, #0 movne pc, lr loop: b loop waitval: ldr r3, [r0] cmp r3, #0 movne pc, lr loop: b loop
Compiler does not need to consider that someone else can change *ptr
ARM assembly code C code
int add3 (int x) { int i; for (i=0; i<3; i++) x += x; return x; } int add3 (int x) { int i; for (i=0; i<3; i++) x += x; return x; } add3: mov r0, r0, asl #3 mov pc, lr add3: mov r0, r0, asl #3 mov pc, lr
This function always returns 8·x: compiler can optimize code
ARM assembly code C code
int add_vals (int *vec) { int y = vec[1]; y += vec[0]; return y; } int add_vals (int *vec) { int y = vec[1]; y += vec[0]; return y; } add_vals: ldr r3, [r0] ldr r0, [r0, #4] add r0, r0, r3 mov pc, lr add_vals: ldr r3, [r0] ldr r0, [r0, #4] add r0, r0, r3 mov pc, lr
Result does not depend on access order: compiler can change loads order
ARM assembly code C code