Distributed Shared Memory
SLIDE 1

Operating Systems DSM

Distributed Shared Memory

  • Distributed Shared Memory Systems

– Page based
– Shared-variable based

  • Consistency Models

– Strict consistency
– Sequential consistency
– Release consistency

Distributed Shared Memory

  • Distributed Shared Memory (DSM): have a collection of workstations share a single, virtual address space.

  • The main objective of DSM:

– to alleviate the burden on the programmer
– by hiding the fact that physical memory is distributed and not accessible in its entirety to all processors.

  • DSM creates the illusion of a single shared memory.

– much like virtual memory creates the illusion of a memory that is larger than the available physical memory.

  • Vanilla implementation:

– references to local pages are handled in hardware.
– references to a remote page cause a hardware page fault; trap to the OS; load the page from the remote machine; restart the faulting instruction.

  • Optimizations:

– share only selected portions of memory.
– replicate shared variables on multiple machines.
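
The fault-driven loop described above can be sketched as a small simulation. This is an illustrative model, not real DSM code: the `Node` and `Cluster` classes and their method names are invented here, and a real system would use MMU page protection and network messages instead of Python dictionaries.

```python
# Minimal sketch of the "vanilla" page-based DSM loop (hypothetical names).
# A reference to a page not resident locally raises a "fault"; the handler
# fetches the page from its current holder and the access is retried.

PAGE_SIZE = 4096

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.resident = {}              # page number -> bytearray held locally

    def read(self, addr, cluster):
        page = addr // PAGE_SIZE
        if page not in self.resident:        # "page fault": trap to the OS
            self.fetch_page(page, cluster)   # load the page from remote
        return self.resident[page][addr % PAGE_SIZE]   # restart the access

    def fetch_page(self, page, cluster):
        owner = cluster.owner_of(page)
        self.resident[page] = owner.resident.pop(page)  # migrate the page here

class Cluster:
    def __init__(self, nodes):
        self.nodes = nodes

    def owner_of(self, page):
        for node in self.nodes:
            if page in node.resident:
                return node
        raise KeyError(page)

# Page 0 starts on node B; node A's first read faults and migrates it.
a, b = Node("A"), Node("B")
b.resident[0] = bytearray(PAGE_SIZE)
b.resident[0][10] = 42
cluster = Cluster([a, b])
print(a.read(10, cluster))   # prints 42; page 0 is now resident on node A
```

After the faulting read, the page has migrated, so subsequent local references on node A proceed without any remote traffic.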

SLIDE 2

Page-Based DSM

  • NUMA (Non-Uniform Memory Access)

– processor can directly reference local and remote memory locations
– no software intervention

  • Workstations on network

– can only reference local memory

  • Goal of DSM

– add software to allow NOWs (Networks of Workstations) to run multiprocessor code
– with the simplicity of programming a NUMA machine

[Figure: two NUMA nodes connected by an interconnect]

SLIDE 3

Shared-Variable DSM

  • Is it necessary to share the entire address space?
  • Share individual variables.
  • more variety in possible update algorithms for replicated variables
  • opportunity to eliminate false sharing

Design Issues

  • Replication

– replicate read-only portions
– replicate read and write portions

  • Granularity

– restriction: memory portions are multiples of pages
– pros of large portions:

  • amortize protocol overhead
  • locality of reference

– cons of large portions

  • false sharing!

[Figure: variables A and B on the same page; processor 1's code uses only A, processor 2's code uses only B]
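
The cost of false sharing in the figure can be made concrete with a toy simulation. The function and names below are invented for illustration; the model assumes a single writable copy of each page, so every access by a processor that does not currently hold the page forces a transfer.

```python
# Illustration of false sharing (hypothetical simulation, not real DSM code).
# A and B live on the same page, so accesses by two processors to *different*
# variables still bounce the single page copy back and forth.

def count_transfers(page_of, accesses):
    """accesses: list of (processor, variable); returns number of page moves."""
    holder = {}                      # page -> processor currently holding it
    transfers = 0
    for proc, var in accesses:
        page = page_of[var]
        if holder.get(page) not in (None, proc):
            transfers += 1           # page must move to the accessing processor
        holder[page] = proc
    return transfers

workload = [("P1", "A"), ("P2", "B")] * 4   # P1 touches only A, P2 only B

same_page  = {"A": 0, "B": 0}    # false sharing: A and B share page 0
split_pages = {"A": 0, "B": 1}   # A and B placed on separate pages

print(count_transfers(same_page, workload))    # prints 7: the page ping-pongs
print(count_transfers(split_pages, workload))  # prints 0: no sharing at all
```

Placing the two variables on separate pages (or sharing individual variables, as shared-variable DSM does) eliminates the transfers entirely.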

SLIDE 4

Basic Design

  • Emulate the cache of a multiprocessor using the MMU and system software

[Figure: pages 1–15 of the shared address space distributed among the local memories of four CPUs]

Design Issues (cont)

  • Update Options: Write-Update vs. Write-Invalidate
  • Write-Update:

– Writes made locally are multicast to all copies of the data item.
– Multiple writers can share the same data item.
– Consistency depends on the multicast protocol.

  • E.g. Sequential consistency achieved with totally ordered multicast.

– Reads are cheap

  • Write-Invalidate

– Distinguish between read-only (multiple copies possible) and writable (only one copy possible).
– When a process attempts to write to a remote or replicated item, it first sends a multicast message to invalidate all other copies; if necessary, it gets a copy of the item.
– Ensures sequential consistency.

SLIDE 5

Protocol for Handling DSM Pages (only one writable copy)

Page Status   Page Location   Operation   Actions Taken Before Local Read/Write
Read-only     Local           Read        (none)
Read-only     Local           Write       Invalidate remote copies; upgrade local copy to writable
Read-only     Remote          Read        Make local read-only copy
Read-only     Remote          Write       Invalidate remote objects; make local writable copy
Writable      Local           Read        (none)
Writable      Local           Write       (none)
Writable      Remote          Read        Downgrade page to read-only; make local read-only copy
Writable      Remote          Write       Transfer remote writable copy to local memory
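
The protocol table above is a pure lookup from page state to required actions, so it can be written directly as a table in code. The dictionary below is a sketch; the key tuples and action strings simply mirror the rows of the table.

```python
# The DSM page protocol as a lookup table (a sketch mirroring the table above).
# Key: (page status, page location, operation) -> actions before local access.

ACTIONS = {
    ("read-only", "local",  "read"):  [],
    ("read-only", "local",  "write"): ["invalidate remote copies",
                                       "upgrade local copy to writable"],
    ("read-only", "remote", "read"):  ["make local read-only copy"],
    ("read-only", "remote", "write"): ["invalidate remote objects",
                                       "make local writable copy"],
    ("writable",  "local",  "read"):  [],
    ("writable",  "local",  "write"): [],
    ("writable",  "remote", "read"):  ["downgrade page to read-only",
                                       "make local read-only copy"],
    ("writable",  "remote", "write"): ["transfer remote writable copy to local memory"],
}

# Local accesses to a page already in the right state need no protocol action.
print(ACTIONS[("writable", "local", "write")])    # prints []
print(ACTIONS[("writable", "remote", "read")])
```

Encoding the protocol as data rather than branching logic keeps the fault handler simple: it looks up the current state and executes the listed actions before retrying the access.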

Design Issues (cont)

  • Finding the Owner

– broadcast request for owner

  • combine request with requested operation
  • problem: broadcast affects all participants (interrupts all processors) and uses network bandwidth

– page manager

  • possible hot spot
  • multiple page managers; hash on page address

– probable owner

  • each process keeps track of probable owner
  • Update probable owner whenever

– Process transfers ownership of a page
– Process handles invalidation request for a page
– Process receives read access for a page from another process
– Process receives request for page it does not own (forwards request to probable owner and resets probable owner to requester)

  • periodically refresh information about current owners
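
The forwarding-and-reset rule above can be sketched as follows. The function and names are hypothetical, and message passing is collapsed into direct dictionary updates; the point is only to show how each forwarding hop shortens the chain for future requests.

```python
# Sketch of locating a page's owner via probable-owner pointers
# (hypothetical names; network messages replaced by direct updates).
# A request is forwarded along the chain of probable owners; each forwarder
# resets its probable owner to the requester, and the requester records the
# true owner when the reply arrives.

def find_owner(probable, owner, requester):
    """probable: process -> probable owner; owner: the real owner.
    Returns (owner, number of times the request was forwarded)."""
    forwards = 0
    current = probable[requester]        # requester sends to its probable owner
    while current != owner:
        nxt = probable[current]
        probable[current] = requester    # forwarder resets probable owner
        current = nxt
        forwards += 1
    probable[requester] = owner          # requester now knows the owner
    return current, forwards

# D owns the page; A's guess leads through B and C.
probable = {"A": "B", "B": "C", "C": "D"}
print(find_owner(probable, "D", "A"))   # ('D', 2): B and C each forwarded it
# The chain has collapsed: a second request from A reaches D directly.
print(find_owner(probable, "D", "A"))   # ('D', 0)
```

This is why long chains are self-correcting: each lookup flattens the pointers it traverses, and the periodic refresh mentioned above bounds how stale the remaining pointers can get.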
SLIDE 6

Probable Owner Chains

[Figure: probable-owner chains among processes A, B, C, D, E — the initial chain of owner pointers; the chain after A issues a write; the chain after A issues a read; and an alternative pointer-update scheme]

Design Issues (cont)

  • Finding the copies

– How to find the copies when they must be invalidated
– broadcast requests

  • what if broadcasts are not reliable?

– copysets

  • maintained by page manager or by owner

[Figure: copysets — four CPUs, each local page annotated with the copyset listing which CPUs hold a copy of that page]

SLIDE 7

Consistency Models

  • Single copy of writable page:

– simple, but expensive for heavily shared pages

  • Multiple copies of writable page:

– how to keep pages consistent?

  • Perfect consistency is expensive.
  • How to relax consistency requirements?
  • Consistency model

Contract between application and memory. If application agrees to obey certain rules, memory promises to work correctly.

Consistency Models

  • Strict consistency:

– A DSM is said to obey strict consistency if reading a variable x always returns the value written to x by the most recently executed write operation.

  • Sequential consistency:

– A DSM is said to obey sequential consistency if the sequence of values read by the different processes corresponds to some sequential interleaved execution of the same processes.

  • Release consistency
SLIDE 8

Strict Consistency

  • Most stringent consistency model:

Any read to a memory location x returns the value stored by the most recent write operation to x.

  • strict consistency observed in uni-processor systems.
  • has come to be expected by uni-processor programmers

– very unlikely to be supported by any multiprocessor

  • All writes are immediately visible to all processes
  • Requires that absolute global time order is maintained
  • Example of strict consistency:

Initial: x = 0

Real time:   t1        t2        t3        t4
P1:          x = 1;    a1 = x;   x = 2;    b1 = x;
P2:                    a2 = x;             b2 = x;

Sequential Consistency

  • Strict consistency is impossible to implement.
  • Programmers can manage with weaker models.
  • Sequential consistency [Lamport 79]

The result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.

  • Any valid interleaving is acceptable, but all processes must see the same sequence of memory references.

  • Sequential consistency does not guarantee that a read returns a value written by another process any time earlier.

  • Results are not deterministic.
SLIDE 9

Strict Versus Sequential Consistency

Initial: x = 0

Real time:   t1        t2        t3        t4
P1:          x = 1;    a1 = x;   x = 2;    b1 = x;
P2:                    a2 = x;             b2 = x;

  • Under strict consistency, both processes, P1 and P2, must see

1. x = 1 with their first read operation
2. x = 2 with the second

  • Under sequential consistency, the result is considered correct as long as:

1. The two read operations of P1 always return the values 1 and 2, respectively.
2. The two read operations of P2 return any of the following combinations of values: 0 0, 0 1, 0 2, 1 1, 1 2, 2 2 (0 2 arises when a2 executes before x = 1 and b2 after x = 2).
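
The set of observable (a2, b2) pairs can be checked mechanically by enumerating every interleaving that preserves each processor's program order. This is an illustrative sketch (the encoding of operations as tuples is invented here); note that the enumeration also produces the pair 0 2, where a2 runs before x = 1 and b2 after x = 2.

```python
from itertools import combinations

# Enumerate all sequentially consistent interleavings of the example
# (P1: x = 1; a1 = x; x = 2; b1 = x  and  P2: a2 = x; b2 = x, x initially 0)
# and collect every (a2, b2) pair P2 may observe.

P1 = [("w", 1), ("r", "a1"), ("w", 2), ("r", "b1")]
P2 = [("r", "a2"), ("r", "b2")]

def run(sequence):
    """Execute one interleaving; return the value observed by each read."""
    x, vals = 0, {}
    for op, arg in sequence:
        if op == "w":
            x = arg
        else:
            vals[arg] = x
    return vals

pairs = set()
n = len(P1) + len(P2)
for slots in combinations(range(n), len(P2)):   # positions given to P2's ops
    seq, k1, k2 = [], 0, 0
    for pos in range(n):                        # merge, preserving program order
        if pos in slots:
            seq.append(P2[k2]); k2 += 1
        else:
            seq.append(P1[k1]); k1 += 1
    vals = run(seq)
    assert vals["a1"] == 1 and vals["b1"] == 2  # P1 always reads 1 then 2
    pairs.add((vals["a2"], vals["b2"]))

print(sorted(pairs))   # [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
```

P1's reads are fixed at 1 and 2 in every interleaving because only P1 writes x and program order is preserved; only P2's observations vary.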

Release Consistency

  • Operations:

– acquire critical region: C.S. is about to be entered.

  • make sure that local copies of variables are made consistent with remote ones.

– release critical region: C.S. has just been exited.

  • propagate shared variables to other machines.

– Operations may apply to a subset of shared variables

P1: lock; x = 1; unlock;
P2: lock; a = x; unlock;

(propagation of x is delayed until P1 unlocks)

SLIDE 10

Release Consistency (cont)

  • Possible implementation

– Acquire:

  1. Send request for lock to synchronization processor; wait until granted.
  2. Issue arbitrary reads/writes to/from local copies.

– Release:

  1. Send modified data to other machines.
  2. Wait for acknowledgements.
  3. Inform synchronization processor about release.

– Operations on different locks happen independently.

  • Release consistency:

1. Before an ordinary access to a shared variable is performed, all previous acquires done by the process must have completed successfully.

2. Before a release is allowed to be performed, all previous reads and writes done by the process must have completed.
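
The acquire/release implementation above can be sketched as follows. All names are hypothetical, and the synchronization processor, lock queueing, and acknowledgement messages are elided into direct calls; the sketch shows only the essential behavior, namely that ordinary writes touch local copies and propagation happens at release.

```python
# Sketch of release consistency (hypothetical names; messaging elided).
# Ordinary writes go to local copies only; release pushes all modified
# shared variables to the other machines.

class Machine:
    def __init__(self):
        self.mem = {}            # local copies of shared variables
        self.dirty = set()       # variables modified since the last release

    def write(self, var, value):
        self.mem[var] = value    # ordinary access: local copy only
        self.dirty.add(var)

    def read(self, var):
        return self.mem.get(var, 0)

class Lock:
    def __init__(self, machines):
        self.machines = machines

    def acquire(self, machine):
        pass                     # request lock; wait until granted (elided)

    def release(self, machine):
        # send modified data to the other machines; wait for acks (elided)
        for var in machine.dirty:
            for other in self.machines:
                other.mem[var] = machine.mem[var]
        machine.dirty.clear()

p1, p2 = Machine(), Machine()
lock = Lock([p1, p2])

lock.acquire(p1)
p1.write("x", 1)
print(p2.read("x"))    # prints 0: propagation delayed until P1 releases
lock.release(p1)
print(p2.read("x"))    # prints 1: shared data made consistent at release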

Entry Consistency

  • Operations:

– acquire critical region: C.S. is about to be entered.

  • imports only those shared variables pertaining to the current lock
  • unnecessary overhead can be eliminated
  • whereas lazy release consistency imports all shared variables

– release critical region: C.S. has just been exited.

P1: lock; x = 1; unlock;
P2: lock; a = x; unlock;

(propagation of x is delayed until P1 unlocks)

SLIDE 11

Consistency Models: Summary

Consistency   Description
-----------   ---------------------------------------------------------------
Strict        Absolute time ordering of all shared accesses matters
Sequential    All processes see all shared accesses in the same order
Release       Shared data are made consistent when a critical region is exited