
Memory Consistency Models



  1. Sistemi operativi – Operating Systems, Università degli studi di Udine

     Memory Consistency Models

     Sources of out-of-order memory accesses
     - Compiler optimizations
     - Store buffers
       - FIFOs for uncommitted writes
     - Invalidate queues (for cache coherency)
     - Data prefetch
     - Banked cache architectures
     - Networked interconnect
     - Non-uniform memory access (NUMA) architectures: different accesses to memory have different latencies
     - ...

     Compiler optimizations
     - The language semantics do not consider:
       1. Side effects of memory accesses
       2. Multi-threading
       3. Asynchronous execution
     - The compiler can:
       - Reorder instructions
       - Eliminate operations
     - Some compiler optimizations can be controlled by the volatile qualifier

     Compiler optimizations: examples

     C code:

         int add3 (int x) {
             int i;
             for (i=0; i<3; i++) x += x;
             return x;
         }

     ARM assembly code:

         add3:
             mov r0, r0, asl #3
             mov pc, lr

     This function always returns 8·x: the compiler can optimize the code.

     C code:

         int add_vals (int *vec) {
             int y = vec[1];
             y += vec[0];
             return y;
         }

     ARM assembly code:

         add_vals:
             ldr r3, [r0]
             ldr r0, [r0, #4]
             add r0, r0, r3
             mov pc, lr

     The result does not depend on the access order: the compiler can change the order of the loads.

     C code:

         void waitval (int *ptr) {
             while (*ptr == 0)
                 continue;
         }

     ARM assembly code:

         waitval:
             ldr r3, [r0]
             cmp r3, #0
             movne pc, lr
         loop:
             b loop

     The compiler does not need to consider that someone else can change *ptr.
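The `waitval` example shows the compiler testing `*ptr` once and then looping forever. A minimal sketch of how the `volatile` qualifier changes this, alongside `add3` to check the 8·x claim. The name `waitval_volatile` is an assumption made for illustration and does not appear in the slides:

```c
/* add3 from the slides: x is doubled three times, so the result is 8*x. */
int add3(int x)
{
    int i;
    for (i = 0; i < 3; i++)
        x += x;
    return x;
}

/* Hypothetical variant of waitval (the name is not from the slides).
 * Because the pointed-to int is volatile-qualified, the compiler must
 * emit a real load on every iteration instead of testing the value once. */
void waitval_volatile(volatile int *ptr)
{
    while (*ptr == 0)
        continue;
}
```

The qualifier only constrains what the compiler emits. Whether and when a write made by another processor becomes visible to that load is a separate, hardware-level question.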

  2. Volatile
     - Semantics
       - Each read from a volatile variable requires an actual load and may return a different value
         - Compiler optimizations cannot merge reads from the same address
       - Each write to a volatile variable requires an actual store
         - Compiler optimizations cannot cancel stores
     - Required to access the I/O address space

     Note: these are the C/C++ semantics; the Java semantics differ (they also imply atomicity).

     Examples

         int *ptr;                                /* pointer to int */
         volatile int *ptr_to_vol;                /* pointer to volatile int */
         int * volatile vol_ptr;                  /* volatile pointer to int */
         volatile int * volatile vol_ptr_to_vol;  /* volatile pointer to volatile int */

     Beware the semantics:
     - a = *ptr_to_vol; is a volatile access
     - a = *vol_ptr; is not a volatile access to the int (only the pointer itself is volatile)

     Volatile
     - Inconsistent qualification causes errors
     - Volatile does not enforce ordering with non-volatile accesses
     - Volatile does not enforce the order in which accesses are actually performed
     - Volatile does not mean atomic

     Volatile: examples

         volatile int A;
         volatile int B;
         A=1;  /* these two lines won't be */
         B=1;  /* reordered by compiler */

         int A;
         volatile int B;
         A=1;  /* these two lines can be */
         B=1;  /* reordered by compiler */

         volatile int A;
         volatile int B;
         A=1;  /* these two lines won't be */
         B=1;  /* reordered by compiler but */
               /* accesses can be reordered */
               /* by HW */

         volatile int X;
         X=1;  /* this assignment can be interrupted or preempted */
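To illustrate "volatile does not mean atomic", here is a minimal sketch contrasting a volatile counter with a C11 atomic one. C11 `<stdatomic.h>` is not covered in the slides, and all variable and function names below are hypothetical:

```c
#include <stdatomic.h>

/* Hypothetical counters, for illustration only. */
static volatile int volatile_hits;  /* each access is a real load or store */
static atomic_int   atomic_hits;    /* C11 atomic, zero-initialized */

/* The increment may compile to a separate load, add, and store, so an
 * interrupt or another thread can run between those steps. */
void count_volatile(void)
{
    volatile_hits = volatile_hits + 1;
}

/* The increment is a single indivisible read-modify-write. */
void count_atomic(void)
{
    atomic_fetch_add(&atomic_hits, 1);
}

int read_atomic_hits(void)
{
    return atomic_load(&atomic_hits);
}
```

In a single-threaded run both counters behave the same; the difference only shows up when increments from several threads or from an interrupt handler overlap.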

  3. Memory barrier
     - Implementation on GCC:

           asm volatile ("" : : : "memory");

     - This inline assembly code:
       1. contains no instructions
       2. may read or write all of RAM
     - Hence: the compiler is not allowed to reorder memory accesses around the barrier, in either direction

     Store Buffer
     - Records the store in a buffer until it is actually performed
     - Hides memory latency
       - Cache latency
       - Cache miss on write
       - The processor can execute other instructions
     - Data dependency (RAW)
       - Wait until the write is actually performed in memory or in cache
       - Read the data from the store buffer (store forwarding)
     - Data dependency (WAW)
       - Add a new entry in the store buffer
       - Replace the previous write in the store buffer

     Example
     - Processor P1 executes:
       1) store A
       2) store B
     - A and B are shared with P2:
       - A is in P2's cache
       - B is in both caches

     [Figure: P1 and P2, each with a store buffer and a cache, connected by an interconnect; P1's cache holds B, P2's cache holds A and B]

     Example: execution
     - 1: store A: cache miss
       - write the updated value in the store buffer
       - send a read request (the data will come from P2's cache)
       - several clock cycles are needed
       - P1 can proceed (the new value is in the store buffer)
       - P2 does not see the write
     - 2: store B: cache hit
       - the data is written in the cache
       - a coherence message is sent to P2
       - P2 sees the write
     - 3: A is loaded in P1's cache
     - 4: A is updated in P1's cache
       - a coherence message is sent to P2
       - P2 sees the write
     - Result: P2 sees the store on B first, then the store on A
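The GCC barrier from the slides can be wrapped in a macro and placed between two stores. A minimal sketch, assuming GCC or Clang; the macro name `compiler_barrier` and the variables `payload` and `ready` are hypothetical:

```c
/* The GCC/Clang compiler barrier from the slides, wrapped in a macro.
 * The macro name is an assumption made for this sketch. */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

int payload;  /* hypothetical shared data */
int ready;    /* hypothetical flag */

void publish(int value)
{
    payload = value;     /* 1) write the data */
    compiler_barrier();  /* the compiler may not move memory accesses across this point */
    ready = 1;           /* 2) then raise the flag */
}
```

Since the barrier contains no instructions, it constrains only the compiler. The store buffer scenario in the example can still make another processor see `ready` before `payload`; preventing that requires a hardware fence, for instance `atomic_thread_fence` from C11 `<stdatomic.h>`.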

  4. Consequence

     Note: initially A=0 and B=0; A and B are volatile.

     P1:

         A = 1;
         B = 1;

     P2:

         while (B==0) continue;
         assert (A==1);  /* this can fail! */

     If P2 sees the stores performed by P1 in reverse order, the assertion fails.

     Cache coherency
     - Cache coherency can require cache line invalidation
       - A processor sends an invalidate message to another one
       - The target processor must invalidate the cache line
     - Invalidate queue
       - Stores invalidate requests while the cache is busy
       - Invalidates the line when the cache is ready

     Data prefetch
     - The processor can read data before the actual load instruction
       - Hides memory latency
       - Preloads data in cache
     - Speculative execution
       - Executes instructions after a branch before the branch is resolved

     Banked cache architectures
     - Caches are split into several banks
     - While accesses to busy banks must wait, accesses to idle banks can proceed

     [Figure: processor with a store buffer and a banked cache, connected to the interconnect]
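One way to make the P1/P2 assertion hold is to give the flag B release/acquire ordering. This is a minimal sketch using C11 `<stdatomic.h>`, which the slides do not cover; the function names `p1_publish` and `p2_consume` are hypothetical:

```c
#include <stdatomic.h>

static int A;         /* plain data, initially 0 as in the slide */
static atomic_int B;  /* flag, zero-initialized */

/* P1: the release store guarantees that the write to A is ordered
 * before the write to B, for both the compiler and the hardware. */
void p1_publish(void)
{
    A = 1;
    atomic_store_explicit(&B, 1, memory_order_release);
}

/* P2: once the acquire load reads B == 1, the earlier write A = 1
 * is guaranteed to be visible. */
int p2_consume(void)
{
    while (atomic_load_explicit(&B, memory_order_acquire) == 0)
        continue;
    return A;
}
```

With P1 running `p1_publish` and P2 running `p2_consume` on different threads, the value returned is 1, so the equivalent of `assert (A==1)` cannot fail.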

  5. Definitions
     - Program order
       - The order of operations as specified by software
     - Execution order
       - The order of operations as executed by a processor
     - Perceived order
       - The order of operations as seen by processors and memories
     - Memory consistency model
       - Rules that specify the allowed behavior of programs in terms of memory accesses
       - Rules: order restrictions

     Definitions
     - Performed
       - Write: a write by processor i is performed with respect to processor k when a read issued by k to the same address returns the value stored by i
       - Read: a read by processor i is performed with respect to processor k when a write issued by k to the same address cannot affect the value read by i
     - Globally performed: performed with respect to all processors
       - Write: a write is globally performed when its modification has been propagated to all processors
       - Read: a read is globally performed when the value it returns is bound and the write that wrote this value is globally performed

     Memory consistency models
     - Rules on access ordering can regard:
       - Location (address of access)
       - Direction
         - read, write, read-write
       - Value
       - Causality
         - the behavior of an access depends on the behavior of another one
       - Category
         - shared / private
         - synchronizing / not synchronizing

     Memory consistency models
     - Uniform consistency models
       - Rules do not concern the category of accesses
     - Hybrid consistency models
       - The category of accesses matters
