Distributed Shared Memory
SLIDE 1

Operating Systems DSM

Distributed Shared Memory

  • Distributed Shared Memory Systems

– Page based
– Shared-variable based

  • Consistency Models

– Strict consistency
– Sequential consistency
– Release consistency

Distributed Shared Memory

  • Distributed Shared Memory (DSM): have a collection of workstations share a single, virtual address space.

  • The main objective of DSM:

– to alleviate the burden on the programmer
– by hiding the fact that physical memory is distributed and not accessible in its entirety to all processors.

  • DSM creates the illusion of a single shared memory.

– much like virtual memory creates the illusion of a memory that is larger than the available physical memory.

  • Vanilla implementation:

– references to local pages are handled in hardware.
– references to a remote page cause a hardware page fault; trap to the OS; load the page from the remote machine; restart the faulting instruction.

  • Optimizations:

– share only selected portions of memory.
– replicate shared variables on multiple machines.
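
The fault-driven loop described above can be sketched as a small simulation. This is an illustrative model, not real DSM code: the `Node` and `Cluster` classes and their method names are invented here, and a real system would use MMU page protection and network messages instead of Python dictionaries.

```python
# Minimal sketch of the "vanilla" page-based DSM loop (hypothetical names).
# A reference to a page not resident locally raises a "fault"; the handler
# fetches the page from its current holder and the access is retried.

PAGE_SIZE = 4096

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.resident = {}              # page number -> bytearray held locally

    def read(self, addr, cluster):
        page = addr // PAGE_SIZE
        if page not in self.resident:        # "page fault": trap to the OS
            self.fetch_page(page, cluster)   # load the page from remote
        return self.resident[page][addr % PAGE_SIZE]   # restart the access

    def fetch_page(self, page, cluster):
        owner = cluster.owner_of(page)
        self.resident[page] = owner.resident.pop(page)  # migrate the page here

class Cluster:
    def __init__(self, nodes):
        self.nodes = nodes

    def owner_of(self, page):
        for node in self.nodes:
            if page in node.resident:
                return node
        raise KeyError(page)

# Page 0 starts on node B; node A's first read faults and migrates it.
a, b = Node("A"), Node("B")
b.resident[0] = bytearray(PAGE_SIZE)
b.resident[0][10] = 42
cluster = Cluster([a, b])
print(a.read(10, cluster))   # prints 42; page 0 is now resident on node A
```

After the faulting read, the page has migrated, so subsequent local references on node A proceed without any remote traffic.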

SLIDE 2

Page-Based DSM

  • NUMA (Non-Uniform Memory Access)

– processor can directly reference local and remote memory locations
– no software intervention

  • Workstations on network

– can only reference local memory

  • Goal of DSM

– add software to allow NOWs (Networks of Workstations) to run multiprocessor code
– with the simplicity of programming a NUMA machine

[Figure: two NUMA nodes connected by an interconnect]

SLIDE 3

Shared-Variable DSM

  • Is it necessary to share the entire address space?
  • Share individual variables.
  • more variety in possible update algorithms for replicated variables
  • opportunity to eliminate false sharing

Design Issues

  • Replication

– replicate read-only portions
– replicate read and write portions

  • Granularity

– restriction: memory portions are multiples of pages
– pros of large portions:

  • amortize protocol overhead
  • locality of reference

– cons of large portions

  • false sharing!

[Figure: variables A and B on the same page; processor 1's code uses only A, processor 2's code uses only B]
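
The cost of false sharing in the figure can be made concrete with a toy simulation. The function and names below are invented for illustration; the model assumes a single writable copy of each page, so every access by a processor that does not currently hold the page forces a transfer.

```python
# Illustration of false sharing (hypothetical simulation, not real DSM code).
# A and B live on the same page, so accesses by two processors to *different*
# variables still bounce the single page copy back and forth.

def count_transfers(page_of, accesses):
    """accesses: list of (processor, variable); returns number of page moves."""
    holder = {}                      # page -> processor currently holding it
    transfers = 0
    for proc, var in accesses:
        page = page_of[var]
        if holder.get(page) not in (None, proc):
            transfers += 1           # page must move to the accessing processor
        holder[page] = proc
    return transfers

workload = [("P1", "A"), ("P2", "B")] * 4   # P1 touches only A, P2 only B

same_page  = {"A": 0, "B": 0}    # false sharing: A and B share page 0
split_pages = {"A": 0, "B": 1}   # A and B placed on separate pages

print(count_transfers(same_page, workload))    # prints 7: the page ping-pongs
print(count_transfers(split_pages, workload))  # prints 0: no sharing at all
```

Placing the two variables on separate pages (or sharing individual variables, as shared-variable DSM does) eliminates the transfers entirely.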

SLIDE 4

Basic Design

  • Emulate the cache of a multiprocessor using the MMU and system software

[Figure: pages 1–15 of the shared address space distributed among the local memories of four CPUs]

Design Issues (cont)

  • Update Options: Write-Update vs. Write-Invalidate
  • Write-Update:

– Writes made locally are multicast to all copies of the data item.
– Multiple writers can share the same data item.
– Consistency depends on the multicast protocol.

  • E.g. Sequential consistency achieved with totally ordered multicast.

– Reads are cheap

  • Write-Invalidate

– Distinguish between read-only (multiple copies possible) and writable (only one copy possible).
– When a process attempts to write to a remote or replicated item, it first sends a multicast message to invalidate all other copies; if necessary, it gets a copy of the item.
– Ensures sequential consistency.

SLIDE 5

Protocol for Handling DSM Pages (only one writable copy)

Page Status   Page Location   Operation   Actions Taken Before Local Read/Write
Read-only     Local           Read        (none)
Read-only     Local           Write       Invalidate remote copies; upgrade local copy to writable
Read-only     Remote          Read        Make local read-only copy
Read-only     Remote          Write       Invalidate remote objects; make local writable copy
Writable      Local           Read        (none)
Writable      Local           Write       (none)
Writable      Remote          Read        Downgrade page to read-only; make local read-only copy
Writable      Remote          Write       Transfer remote writable copy to local memory
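
The protocol table above is a pure lookup from page state to required actions, so it can be written directly as a table in code. The dictionary below is a sketch; the key tuples and action strings simply mirror the rows of the table.

```python
# The DSM page protocol as a lookup table (a sketch mirroring the table above).
# Key: (page status, page location, operation) -> actions before local access.

ACTIONS = {
    ("read-only", "local",  "read"):  [],
    ("read-only", "local",  "write"): ["invalidate remote copies",
                                       "upgrade local copy to writable"],
    ("read-only", "remote", "read"):  ["make local read-only copy"],
    ("read-only", "remote", "write"): ["invalidate remote objects",
                                       "make local writable copy"],
    ("writable",  "local",  "read"):  [],
    ("writable",  "local",  "write"): [],
    ("writable",  "remote", "read"):  ["downgrade page to read-only",
                                       "make local read-only copy"],
    ("writable",  "remote", "write"): ["transfer remote writable copy to local memory"],
}

# Local accesses to a page already in the right state need no protocol action.
print(ACTIONS[("writable", "local", "write")])    # prints []
print(ACTIONS[("writable", "remote", "read")])
```

Encoding the protocol as data rather than branching logic keeps the fault handler simple: it looks up the current state and executes the listed actions before retrying the access.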

Design Issues (cont)

  • Finding the Owner

– broadcast request for owner

  • combine request with requested operation
  • problem: broadcast affects all participants (interrupts all processors) and uses network bandwidth

– page manager

  • possible hot spot
  • multiple page managers; hash on page address

– probable owner

  • each process keeps track of probable owner
  • Update probable owner whenever

– Process transfers ownership of a page
– Process handles invalidation request for a page
– Process receives read access for a page from another process
– Process receives request for page it does not own (forwards request to probable owner and resets probable owner to requester)

  • periodically refresh information about current owners
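
The forwarding-and-reset rule above can be sketched as follows. The function and names are hypothetical, and message passing is collapsed into direct dictionary updates; the point is only to show how each forwarding hop shortens the chain for future requests.

```python
# Sketch of locating a page's owner via probable-owner pointers
# (hypothetical names; network messages replaced by direct updates).
# A request is forwarded along the chain of probable owners; each forwarder
# resets its probable owner to the requester, and the requester records the
# true owner when the reply arrives.

def find_owner(probable, owner, requester):
    """probable: process -> probable owner; owner: the real owner.
    Returns (owner, number of times the request was forwarded)."""
    forwards = 0
    current = probable[requester]        # requester sends to its probable owner
    while current != owner:
        nxt = probable[current]
        probable[current] = requester    # forwarder resets probable owner
        current = nxt
        forwards += 1
    probable[requester] = owner          # requester now knows the owner
    return current, forwards

# D owns the page; A's guess leads through B and C.
probable = {"A": "B", "B": "C", "C": "D"}
print(find_owner(probable, "D", "A"))   # ('D', 2): B and C each forwarded it
# The chain has collapsed: a second request from A reaches D directly.
print(find_owner(probable, "D", "A"))   # ('D', 0)
```

This is why long chains are self-correcting: each lookup flattens the pointers it traverses, and the periodic refresh mentioned above bounds how stale the remaining pointers can get.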
SLIDE 6

Probable Owner Chains

[Figure: probable-owner chains among processes A, B, C, D, E — the initial chain of owner pointers; the chain after A issues a write; the chain after A issues a read; and an alternative pointer-update scheme]

Design Issues (cont)

  • Finding the copies

– How to find the copies when they must be invalidated
– broadcast requests

  • what if broadcasts are not reliable?

– copysets

  • maintained by page manager or by owner

[Figure: copysets — four CPUs, each local page annotated with the copyset listing which CPUs hold a copy of that page]

SLIDE 7

Consistency Models

  • Single copy of writable page:

– simple, but expensive for heavily shared pages

  • Multiple copies of writable page:

– how to keep pages consistent?

  • Perfect consistency is expensive.
  • How to relax consistency requirements?
  • Consistency model

Contract between application and memory. If application agrees to obey certain rules, memory promises to work correctly.

Consistency Models

  • Strict consistency:

– A DSM is said to obey strict consistency if reading a variable x always returns the value written to x by the most recently executed write operation.

  • Sequential consistency:

– A DSM is said to obey sequential consistency if the sequence of values read by the different processes corresponds to some sequential interleaved execution of the same processes.

  • Release consistency
SLIDE 8

Strict Consistency

  • Most stringent consistency model:

Any read to a memory location x returns the value stored by the most recent write operation to x.

  • strict consistency observed in uni-processor systems.
  • has come to be expected by uni-processor programmers

– very unlikely to be supported by any multiprocessor

  • All writes are immediately visible to all processes
  • Requires that absolute global time order is maintained
  • Example of strict consistency:

Initial: x = 0

Real time:   t1        t2        t3        t4
P1:          x = 1;    a1 = x;   x = 2;    b1 = x;
P2:                    a2 = x;             b2 = x;

Sequential Consistency

  • Strict consistency is impossible to implement.
  • Programmers can manage with weaker models.
  • Sequential consistency [Lamport 79]

The result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.

  • Any valid interleaving is acceptable, but all processes must see the same sequence of memory references.

  • Sequential consistency does not guarantee that a read returns a value written by another process any time earlier.

  • Results are not deterministic.
SLIDE 9

Strict Versus Sequential Consistency

Initial: x = 0

Real time:   t1        t2        t3        t4
P1:          x = 1;    a1 = x;   x = 2;    b1 = x;
P2:                    a2 = x;             b2 = x;

  • Under strict consistency, both processes, P1 and P2, must see

1. x = 1 with their first read operation
2. x = 2 with the second

  • Under sequential consistency, the result is considered correct as long as:

1. The two read operations of P1 always return the values 1 and 2, respectively.
2. The two read operations of P2 return any of the following combinations of values: 0 0, 0 1, 0 2, 1 1, 1 2, 2 2 (0 2 arises when a2 executes before x = 1 and b2 after x = 2).
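
The set of observable (a2, b2) pairs can be checked mechanically by enumerating every interleaving that preserves each processor's program order. This is an illustrative sketch (the encoding of operations as tuples is invented here); note that the enumeration also produces the pair 0 2, where a2 runs before x = 1 and b2 after x = 2.

```python
from itertools import combinations

# Enumerate all sequentially consistent interleavings of the example
# (P1: x = 1; a1 = x; x = 2; b1 = x  and  P2: a2 = x; b2 = x, x initially 0)
# and collect every (a2, b2) pair P2 may observe.

P1 = [("w", 1), ("r", "a1"), ("w", 2), ("r", "b1")]
P2 = [("r", "a2"), ("r", "b2")]

def run(sequence):
    """Execute one interleaving; return the value observed by each read."""
    x, vals = 0, {}
    for op, arg in sequence:
        if op == "w":
            x = arg
        else:
            vals[arg] = x
    return vals

pairs = set()
n = len(P1) + len(P2)
for slots in combinations(range(n), len(P2)):   # positions given to P2's ops
    seq, k1, k2 = [], 0, 0
    for pos in range(n):                        # merge, preserving program order
        if pos in slots:
            seq.append(P2[k2]); k2 += 1
        else:
            seq.append(P1[k1]); k1 += 1
    vals = run(seq)
    assert vals["a1"] == 1 and vals["b1"] == 2  # P1 always reads 1 then 2
    pairs.add((vals["a2"], vals["b2"]))

print(sorted(pairs))   # [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
```

P1's reads are fixed at 1 and 2 in every interleaving because only P1 writes x and program order is preserved; only P2's observations vary.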

Release Consistency

  • Operations:

– acquire critical region: C.S. is about to be entered.

  • make sure that local copies of variables are made consistent with remote ones.

– release critical region: C.S. has just been exited.

  • propagate shared variables to other machines.

– Operations may apply to a subset of shared variables

P1: lock; x = 1; unlock;
P2: lock; a = x; unlock;

(propagation of x is delayed until P1 unlocks)

SLIDE 10

Release Consistency (cont)

  • Possible implementation

– Acquire:

  1. Send request for lock to synchronization processor; wait until granted.
  2. Issue arbitrary reads/writes to/from local copies.

– Release:

  1. Send modified data to other machines.
  2. Wait for acknowledgements.
  3. Inform synchronization processor about release.

– Operations on different locks happen independently.

  • Release consistency:

1. Before an ordinary access to a shared variable is performed, all previous acquires done by the process must have completed successfully.

2. Before a release is allowed to be performed, all previous reads and writes done by the process must have completed.
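
The acquire/release implementation above can be sketched as follows. All names are hypothetical, and the synchronization processor, lock queueing, and acknowledgement messages are elided into direct calls; the sketch shows only the essential behavior, namely that ordinary writes touch local copies and propagation happens at release.

```python
# Sketch of release consistency (hypothetical names; messaging elided).
# Ordinary writes go to local copies only; release pushes all modified
# shared variables to the other machines.

class Machine:
    def __init__(self):
        self.mem = {}            # local copies of shared variables
        self.dirty = set()       # variables modified since the last release

    def write(self, var, value):
        self.mem[var] = value    # ordinary access: local copy only
        self.dirty.add(var)

    def read(self, var):
        return self.mem.get(var, 0)

class Lock:
    def __init__(self, machines):
        self.machines = machines

    def acquire(self, machine):
        pass                     # request lock; wait until granted (elided)

    def release(self, machine):
        # send modified data to the other machines; wait for acks (elided)
        for var in machine.dirty:
            for other in self.machines:
                other.mem[var] = machine.mem[var]
        machine.dirty.clear()

p1, p2 = Machine(), Machine()
lock = Lock([p1, p2])

lock.acquire(p1)
p1.write("x", 1)
print(p2.read("x"))    # prints 0: propagation delayed until P1 releases
lock.release(p1)
print(p2.read("x"))    # prints 1: shared data made consistent at release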

Entry Consistency

  • Operations:

– acquire critical region: C.S. is about to be entered.

  • imports only those shared variables pertaining to the current lock
  • unnecessary overhead can be eliminated
  • whereas lazy release consistency imports all shared variables

– release critical region: C.S. has just been exited.

P1: lock; x = 1; unlock;
P2: lock; a = x; unlock;

(propagation of x is delayed until P1 unlocks)

SLIDE 11

Consistency Models: Summary

Consistency   Description
-----------   ---------------------------------------------------------------
Strict        Absolute time ordering of all shared accesses matters
Sequential    All processes see all shared accesses in the same order
Release       Shared data are made consistent when a critical region is exited