Memory FIFOs for uncommitted writes Consistency Invalidate queues - PowerPoint PPT Presentation

Sistemi operativi – Operating Systems Università degli studi di Udine Sistemi operativi – Operating Systems Università degli studi di Udine Sources of out-of-order memory accesses � Compiler optimizations � Store buffers Memory � FIFOs for uncommitted writes Consistency � Invalidate queues (for cache coherency) � Data prefetch Models � Banked cache architectures � Networked interconnect � Non-uniform memory access (NUMA) architectures: different accesses to memory have different latencies � ... Sistemi operativi – Operating Systems Università degli studi di Udine Sistemi operativi – Operating Systems Università degli studi di Udine int add3 (int x) { add3: int add3 (int x) { add3: int i; mov r0, r0, asl #3 int i; mov r0, r0, asl #3 Compiler optimizations for (i=0; i<3; i++) x += x; mov pc, lr for (i=0; i<3; i++) x += x; mov pc, lr return x; return x; } } C code ARM assembly code This function always returns 8·x: compiler can optimize code � Language semantic does not consider int add_vals (int *vec) { add_vals: int add_vals (int *vec) { add_vals: 1. Side-effects of memory accesses int y = vec[1]; ldr r3, [r0] int y = vec[1]; ldr r3, [r0] y += vec[0]; ldr r0, [r0, #4] y += vec[0]; ldr r0, [r0, #4] 2. Multi-threading return y; add r0, r0, r3 return y; add r0, r0, r3 } mov pc, lr } mov pc, lr 3. Asynchronous execution C code ARM assembly code Result does not depend on access order: compiler can change loads order � Compiler can: waitval: waitval: � Reorder instructions ldr r3, [r0] ldr r3, [r0] void waitval (int *ptr) { void waitval (int *ptr) { cmp r3, #0 cmp r3, #0 while (*ptr == 0) � Eliminate operations while (*ptr == 0) movne pc, lr movne pc, lr continue; continue; loop: loop: } } � Some compiler optimization can be controlled by the b loop b loop volatile qualifier C code ARM assembly code Compiler does not need to consider that someone else can change *ptr

Sistemi operativi – Operating Systems Università degli studi di Udine Sistemi operativi – Operating Systems Università degli studi di Udine Volatile Examples int *ptr; /* pointer to int */ int *ptr; /* pointer to int */ � Semantic volatile int *ptr_to_vol; /* pointer to volatile int */ volatile int *ptr_to_vol; /* pointer to volatile int */ � Each read from a volatile variable requires an actual load int * volatile vol_ptr; /* volatile pointer to int */ int * volatile vol_ptr; /* volatile pointer to int */ and may return a different value volatile int * volatile vol_ptr_to_vol; /* volatile pointer to volatile int */ volatile int * volatile vol_ptr_to_vol; /* volatile pointer to volatile int */ � Compiler optimization cannot merge reads from the same address � Beware the semantic: � Each write to a volatile variable requires an actual store � Compiler optimization cannot cancel stores � a = *ptr_to_vol; � is a volatile access � Required to access I/O address space � a = *vol_ptr; � is not a volatile access Note: this is the C/C++ semantic the Java semantic differs (it also implies atomicity) Sistemi operativi – Operating Systems Università degli studi di Udine Sistemi operativi – Operating Systems Università degli studi di Udine Volatile Volatile � Inconsistent qualification causes errors volatile int A; volatile int A; volatile int B; volatile int B; volatile int A; volatile int A; A=1; /* these two lines won't be */ A=1; /* these two lines won't be */ � Volatile does not enforce ordering with non-volatile volatile int B; volatile int B; B=1; /* reordered by compiler */ B=1; /* reordered by compiler */ accesses A=1; /* these two lines won't be */ A=1; /* these two lines won't be */ B=1; /* reordered by compiler but */ B=1; /* reordered by compiler but */ int A; int A; /* accesses can be reordered */ /* accesses can be reordered */ volatile int B; volatile int B; /* by HW */ /* by HW */ � Volatile does not enforce order on how access are A=1; /* these two lines can be */ A=1; /* these two lines can be */ actually performed B=1; /* reordered by compiler */ B=1; /* reordered by compiler */ � Volatile does not mean atomic volatile int X; volatile int X; X=1; /* this assignment can be interrupted or preempted */ X=1; /* this assignment can be interrupted or preempted */

Sistemi operativi – Operating Systems Università degli studi di Udine Sistemi operativi – Operating Systems Università degli studi di Udine Memory barrier Store Buffer � Implementation on GCC � Record the store in buffer until is actually performed asm volatile ("" : : : "memory"); asm volatile ("" : : : "memory"); � Hide memory latency � Cache latency � Cache-miss on write � This inline assembly code: � Processor can execute other instructions 1. contains no instructions � Data dependency (RAW) 2. may read or write all of RAM � Wait until the write is actually performed in memory or in cache � Read the data from the store buffer (store forwarding) � Hence: � Data dependency (WAW) compiler memory accesses reordering is not allowed � Add a new entry in the store buffer around the barrier in either direction � Replace the previous write in the store buffer Sistemi operativi – Operating Systems Università degli studi di Udine Sistemi operativi – Operating Systems Università degli studi di Udine Example Example � Execution: � Processor P1 executes � 1: store A : cache miss P1 P2 � write the updated value in store buffer � 1) store A � send a read request (data will come from P2 cache) � 2) store B � several clock cycles needed Store Store � P1 can proceed, (the new value is in the store buffer) buffer buffer � A and B are shared with P2: � P2 does not see the write Cache Cache � 2: store B : cache hit � A is in P2 cache B B A � data is written in cache � B is in both caches � a coherence message is sent to P2 � P2 sees the write Interconnect � 3: A is loaded in P1 cache � 4: A is updated in P1 cache � a coherence message is sent to P2 � P2 sees the write � � P2 sees the store on B first, then the store on A

Sistemi operativi – Operating Systems Università degli studi di Udine Sistemi operativi – Operating Systems Università degli studi di Udine Consequence Cache coherency � Cache coherency can require cache line invalidation Note : initially: A=0 and B=0 A and B are volatiles � A processor send an invalidate message to another one � Target processor must invalidate cache line A = 1 while (B==0) continue; � Invalidate Queue B = 1 assert (A==1); /* this can fail! */ � Store invalidate requests while the cache is busy � Invalidate the line when the cache is ready P1 P2 If P2 sees the stores performed by P1 in reverse order, the assertion fails Sistemi operativi – Operating Systems Università degli studi di Udine Sistemi operativi – Operating Systems Università degli studi di Udine Data prefetch Banked cache architectures � Processor can read data before the actual load � Caches split in several banks instruction � While accesses to busy banks must wait, accesses to idle banks can proceed � Hide memory latency � Preload data in cache Processor � Speculative execution Store � Execute instructions after a branch before the branch buffer Cache Cache \ Interconnect

Sistemi operativi – Operating Systems Università degli studi di Udine Sistemi operativi – Operating Systems Università degli studi di Udine Definitions Definitions � Performed � Program order � Write � a write by processor i is performed with respect to processor k when: � The order of operations as specified by software � a read issued by k to the same address returns the value stored by i � Execution order � Read � a read by processor i is performed with respect to processor k when: � The order of operations as executed by a processor � a write issued by k to the same address cannot affect the value read by i � Perceived order � Globally Performed � The order of operations as seen by processors and memories � globally performed: is performed with respect to all processors � Memory consistency model � Write � A write is globally performed when its modi cation has been � Rules that specify the allowed behavior of programs in terms fi propagated to all processors of memory accesses � Read � Rules: order restrictions � A read is globally performed when the value it returns is bound and the write that wrote this value is globally performed Sistemi operativi – Operating Systems Università degli studi di Udine Sistemi operativi – Operating Systems Università degli studi di Udine Memory consistency models Memory consistency models � Rules on access ordering can regard: � Uniform consistency models � Location (address of access) � Rules do no concern category of accesses � Direction � read, write, read-write � Value � Causality � Hybrid consistency models � behavior of an access depends on the behavior of another one � Category of accesses matters � Category � shared / private � synchronizing / not synchronizing

Memory FIFOs for uncommitted writes Consistency Invalidate queues - PowerPoint PPT Presentation

Sistemi operativi Operating Systems Universit degli studi di Udine Sistemi operativi Operating Systems Universit degli studi di Udine Sources of out-of-order memory accesses Compiler optimizations Store buffers Memory

Memory II. Memory improvement III. Problems with memory 3 systems/stages of Memory: memory

1 Memory SoC Persistent Memory-Driven Memory Memory Processor-Centric Memory SoC SoC

Networks Computer-Computer Comm CPU CPU CPU CPU Memory Device Device Memory Memory

Virtual Memory 1 Memory Hierarchy Memory 4GB Cache 1M Registers 1K Question: What if

Personal SE Computer Memory Addresses C Pointers Computer Memory Organization Memory is a

Memory Memory processing is the ability to: Acquire (Short term memory) Manipulate

Memory Management Memory Manager Requirements Minimize primary memory access time

UNIFIED MEMORY IN CUDA 6 MARK HARRIS NVIDIA CONFIDENTIAL Unified Memory Dramatically Lower

Virtual Memory and Virtual Memory and Demand Paging Demand Paging Virtual Memory Illustrated

Dynamic Memory Management 333 Dynamic Memory Management Process Memory Layout Process Memory

Lecture 11: Persistent Memory Databases 1 / 71 Persistent Memory Databases Recap

Memory Hierarchy: Caching CSE 141, S2'06 Jeff Brown The memory subsystem Computer Control

Memory Management Ideally programmers want memory that is large fast non

28.05.04 09:50 Memory Management The computer memory is a limited resource so the Memory

Welcome to The Memory Class An Introduction to Memory Problems and the Memory Center Agenda For

Long-Term Memory Introduction STM versus LTM Episodic Memory Semantic Memory

CSCI 446: Artificial Intelligence Probability Instructor: Michele Van Dyne [These slides were

DRY-SAS/DBMS UPDATE Executive Committee meeting 9 OCTOBER 2020 BACKGROUND DRY-SAS AND DBMS

Introduc)on to Bayesian methods Lecture 14 David Sontag New

CS 188: Artificial Intelligence Probability Pieter Abbeel UC Berkeley Many slides adapted

Dynamical Symmetry Breaking and Collider Physics (Review Talk) 201 .1 . at Osaka

Privacy Enhancing Technology and the Right to be Forgotten Michael Kolain Taking on the epic

Preference for Boys, Family Size and Educational Attainment in India Santosh Kumar 1 Adriana

HSM3D: Feature-Less Global 6DOF Scan-Matching in the Hough/Radon Domain Andrea Censi Stefano