Distributed Systems

Distributed Shared Memory

Paul Krzyzanowski
pxk@cs.rutgers.edu

Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.


Motivation

SMP systems

– Run parts of a program in parallel
– Share a single address space
  • Share data in that space
– Use threads for parallelism
– Use synchronization primitives to prevent race conditions

Can we achieve this with multicomputers?

– All communication and synchronization must be done with messages


Distributed Shared Memory (DSM)

Goal: allow networked computers to share a region of virtual memory

  • How do you make a distributed memory system appear local?
  • Physical memory on each node is used to hold pages of the shared virtual address space. Processes address it like local memory.


Take advantage of the MMU

  • The page table entry for a page is valid if the page is held (cached) locally
  • An attempt to access a non-local page leads to a page fault
  • Page fault handler
– Invokes the DSM protocol to handle the fault
– Fault handler brings the page in from the remote node
  • Operations are transparent to the programmer
– DSM looks like any other virtual memory system
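This fault-driven design can be sketched directly with POSIX primitives: map the shared region with no access permissions, catch the resulting SIGSEGV, run the DSM protocol, and re-enable access. A minimal sketch, assuming a hypothetical fetch_page_from_owner() helper for the remote fetch (real implementations must also worry about signal safety, which is glossed over here):

    #include <signal.h>
    #include <string.h>
    #include <stdint.h>
    #include <sys/mman.h>

    #define PAGE_SIZE 4096

    extern char *dsm_base;   /* shared region, initially mapped PROT_NONE */

    /* Hypothetical helper: runs the DSM protocol to fetch the page. */
    extern void fetch_page_from_owner(uintptr_t page_no, void *buf);

    static void dsm_fault_handler(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        uintptr_t off  = (uintptr_t)si->si_addr - (uintptr_t)dsm_base;
        uintptr_t page = off / PAGE_SIZE;
        char *start    = dsm_base + page * PAGE_SIZE;
        char buf[PAGE_SIZE];

        fetch_page_from_owner(page, buf);                  /* DSM protocol  */
        mprotect(start, PAGE_SIZE, PROT_READ | PROT_WRITE);
        memcpy(start, buf, PAGE_SIZE);                     /* install page  */
    }

    void dsm_init(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = dsm_fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
    }

From the program's point of view nothing special happened: the load or store that faulted simply completes once the handler returns, which is what makes DSM look like ordinary virtual memory.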


Simplest design

Each page of the virtual address space exists on only one machine at a time

  • No caching

Simplest design

On page fault:

– Consult a central server, the directory, to find which machine is currently holding the page

Request the page from the current owner:

– Current owner invalidates its PTE
– Sends the page contents
– Recipient allocates a frame, reads the page, sets the PTE
– Informs the directory of the new location
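The same sequence from the faulting node's point of view, as C-like pseudocode; every helper function and the message layout here are assumptions, not part of the slides:

    #include <stdint.h>

    #define PAGE_SIZE 4096

    /* All helpers are hypothetical stand-ins for the messaging layer. */
    extern int  directory_node(uintptr_t page_no);
    extern int  query_directory(int dir, uintptr_t page_no);
    extern void request_page(int owner, uintptr_t page_no, void *buf);
    extern void install_page(uintptr_t page_no, const void *buf);
    extern void update_directory(int dir, uintptr_t page_no, int new_owner);
    extern int  my_node_id;

    void handle_page_fault(uintptr_t page_no)
    {
        int dir   = directory_node(page_no);        /* who tracks this page   */
        int owner = query_directory(dir, page_no);  /* who holds it right now */
        char buf[PAGE_SIZE];

        request_page(owner, page_no, buf);   /* owner invalidates its PTE and
                                                sends the page contents      */
        install_page(page_no, buf);          /* allocate frame, set PTE      */
        update_directory(dir, page_no, my_node_id);  /* record new location  */
    }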


Problem

Directory becomes a bottleneck

– All page query requests must go to this server

Solution

– Distributed directory, spread among all processors
– Each node is responsible for a portion of the address space
– To find the responsible processor:

  • hash(page#) mod num_processors
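In code, this lookup is one line; here the identity function serves as the (simplest possible) hash, and num_processors is assumed to be a known global:

    #include <stdint.h>

    extern int num_processors;   /* assumption: known cluster size */

    /* The slide's hash(page#) mod num_processors, with the identity
     * function as the hash. */
    int directory_node(uintptr_t page_no)
    {
        return (int)(page_no % (uintptr_t)num_processors);
    }

With four processors this yields exactly the assignment on the next slide: page 0000 is managed by P0, 0001 by P1, 0002 by P2, 0003 by P3, and so on.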

Distributed Directory

Each processor manages the directory entries for the pages assigned to it (here, page# mod 4):

P0:
Page   Location
0000   P3
0004   P1
0008   P1
000C   P2
…      …

P1:
Page   Location
0001   P3
0005   P1
0009   P0
000D   P2
…      …

P2:
Page   Location
0002   P3
0006   P1
000A   P0
000E

P3:
Page   Location
0003   P3
0007   P1
000B   P2
000F


Design Considerations: granularity

  • Memory blocks are typically a multiple of a node’s page size, to integrate with the VM system
  • Large pages are good
– Cost of migration is amortized over many localized accesses
  • BUT
– Increases the chance that multiple objects reside in one page
  • Thrashing (page data ping-pongs between multiple machines)
  • False sharing (unrelated data happens to live on the same page, resulting in a need for the page to be shared; see the sketch below)
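False sharing is easy to produce by accident. A hypothetical fragment (variable names invented for illustration; 4096-byte pages assumed):

    /* Two counters that are never shared logically, but will usually
     * be placed in the same page by the compiler/linker. */
    long hits_from_node_a;   /* written only by node A */
    long hits_from_node_b;   /* written only by node B */

    /* Every write by A invalidates B's copy of the page and vice versa,
     * so the page ping-pongs even though no data is actually shared.
     * One common fix: give each write-heavy variable its own page. */
    long hits_a_alone __attribute__((aligned(4096)));
    long hits_b_alone __attribute__((aligned(4096)));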


Design Considerations: replication

What if we allow copies of shared pages on multiple nodes?

  • Replication (caching) reduces the average cost of read operations
– Simultaneous reads can be executed locally across hosts
  • Write operations become more expensive
– Cached copies need to be invalidated or updated
  • Worthwhile if the read/write ratio is high

Replication

Multiple readers, single writer

– One host can be granted a read-write copy
– Or multiple hosts can be granted read-only copies


Replication

Read operation:

If the page is not local:
  • Acquire a read-only copy of the page
  • Set access rights to read-only on any writable copy on other nodes

Write operation:

If the page is not local, or is local without write permission:
  • Revoke write permission from the other writable copy (if one exists)
  • Get a copy of the page from the owner (if needed)
  • Invalidate all copies of the page at other nodes
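Both fault paths, sketched in C; the helper functions are hypothetical stand-ins for the protocol messages described above:

    #include <stdint.h>
    #include <stdbool.h>
    #include <sys/mman.h>

    extern void downgrade_writer(uintptr_t page_no);  /* writable -> read-only */
    extern void revoke_write_permission(uintptr_t page_no);
    extern void fetch_copy(uintptr_t page_no);        /* get page from owner   */
    extern void invalidate_copies(uintptr_t page_no); /* all other copies      */
    extern bool page_is_local(uintptr_t page_no);
    extern void set_protection(uintptr_t page_no, int prot);

    void read_fault(uintptr_t page_no)
    {
        downgrade_writer(page_no);     /* any writable copy becomes read-only */
        fetch_copy(page_no);           /* acquire a read-only copy            */
        set_protection(page_no, PROT_READ);
    }

    void write_fault(uintptr_t page_no)
    {
        revoke_write_permission(page_no);  /* if another writer exists        */
        if (!page_is_local(page_no))
            fetch_copy(page_no);           /* get contents from the owner     */
        invalidate_copies(page_no);        /* no stale read-only copies remain */
        set_protection(page_no, PROT_READ | PROT_WRITE);
    }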

Full replication

Extend model

– Multiple hosts have read/write access
– Need a multiple-readers, multiple-writers protocol
– Access to shared data must be controlled to maintain consistency


Dealing with replication

  • Keep track of copies of the page
– A directory with a single node per page is not enough
– Keep track of the copyset: the set of all systems that requested copies
  • On getting a request for a copy of a page:
– Directory adds the requestor to the copyset
– Page owner sends the page contents to the requestor
  • On getting a request to invalidate a page:
– Directory issues invalidation requests to all nodes in the copyset and waits for acknowledgements
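One plausible directory-entry layout, with the copyset kept as a bitmask; the slides only require that the copyset be tracked, so everything else here is an assumption:

    #include <stdint.h>

    #define MAX_NODES 64   /* assumption: at most 64 nodes, so a bitmask fits */

    struct dir_entry {
        int      owner;     /* current owner of the page        */
        uint64_t copyset;   /* bit i set => node i holds a copy */
    };

    extern void send_invalidate(int node, uintptr_t page_no);  /* hypothetical */
    extern void wait_for_ack(int node);                        /* hypothetical */

    void add_to_copyset(struct dir_entry *e, int requestor)
    {
        e->copyset |= (uint64_t)1 << requestor;
    }

    void invalidate_copyset(struct dir_entry *e, uintptr_t page_no)
    {
        for (int i = 0; i < MAX_NODES; i++)
            if (e->copyset & ((uint64_t)1 << i)) {
                send_invalidate(i, page_no);
                wait_for_ack(i);   /* slide: wait for acknowledgements */
            }
        e->copyset = 0;            /* no node holds a copy anymore     */
    }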


How do you propagate changes?

  • Send the entire page
– Easiest, but may be a lot of data
  • Send differences
– The local system must save the original and compute the differences
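Diff-based propagation is usually implemented with a "twin": before the first write to a page, the system saves a pristine copy; at propagation time it compares page and twin word by word and ships only the changed words (the approach used by systems such as TreadMarks). A sketch, assuming the twin already exists:

    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_WORDS (4096 / sizeof(uint32_t))

    /* Writer side: collect (offset, value) pairs for changed words. */
    size_t compute_diff(const uint32_t *page, const uint32_t *twin,
                        uint32_t *offsets, uint32_t *values)
    {
        size_t n = 0;
        for (size_t i = 0; i < PAGE_WORDS; i++)
            if (page[i] != twin[i]) {
                offsets[n] = (uint32_t)i;   /* which word changed */
                values[n]  = page[i];       /* its new value      */
                n++;
            }
        return n;   /* number of pairs to transmit */
    }

    /* Receiver side: apply the pairs to the local copy of the page. */
    void apply_diff(uint32_t *page, const uint32_t *offsets,
                    const uint32_t *values, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            page[offsets[i]] = values[i];
    }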


Home-based algorithms

Home-based

– A node (usually the first writer) is chosen to be the home of the page
– On a write, a non-home node sends its changes to the home node
  • Other cached copies are invalidated
– On a read, a non-home node gets the changes (or the whole page) from the home node

Non-home-based

– Node will always contact the directory to find the current owner (latest copy) and obtain page from there


Consistency Model

Definition of when modifications to data may be seen at a given processor

Defines how memory will appear to a programmer

Places restrictions on what values can be returned by a read of a memory location


Consistency Model

Must be well understood

– Determines how a programmer reasons about the correctness of a program
– Determines what hardware and compiler optimizations may take place

Sequential Semantics

Provided by most (uniprocessor) programming languages/systems

Program order:

The result of any execution is the same as if the operations of all processors were executed in some sequential order and the operations of each individual processor appear in this sequence in the order specified by the program.

― Leslie Lamport


Sequential Semantics

Requirements:

– All memory operations must execute one at a time
– All operations of a single processor appear to execute in program order
– Interleaving among processors is OK (see the litmus test below)
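A classic two-thread litmus test makes these requirements concrete:

    int x = 0, y = 0;
    int r1, r2;

    void thread0(void) { x = 1; r1 = y; }
    void thread1(void) { y = 1; r2 = x; }

    /* Sequential consistency requires some interleaving of the four
     * operations with each thread's program order preserved. Whichever
     * write comes first in that order is seen by the other thread's
     * read, so (r1, r2) can be (0,1), (1,0), or (1,1) -- never (0,0).
     * Relaxed models (and real hardware) do allow (0,0). */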


Sequential semantics

[Diagram: processors P0 through P4 all reading and writing a single shared memory]


Achieving sequential semantics

Illusion is efficiently supported in uniprocessor systems

– Execute operations in program order when they are to the same location or when one controls the execution of another
– Otherwise, the compiler or hardware can reorder

Compiler:

– Register allocation, code motion, loop transformation, …

Hardware:

– Pipelining, multiple issue, VLIW, …


Achieving sequential consistency

Program order requirement:

The processor must ensure that the previous memory operation is complete before proceeding with the next one

– Determining completion of write operations
  • Get an acknowledgement from the memory system
– If caching is used
  • A write operation must send invalidate or update messages to all cached copies
  • ALL of these messages must be acknowledged

Achieving sequential consistency

Write atomicity requirement:

All writes to the same location must be visible in the same order to all processes

– The value of a write will not be returned by a read until all updates/invalidates are acknowledged
  • Hold off on read requests until the write is complete
– Totally ordered reliable multicast


Improving performance

Break rules to achieve better performance

– But compiler and/or programmer should know what’s going on!

Goals:

– Combat network latency
– Reduce the number of network messages

Relaxing sequential consistency

– Weak consistency models


Relaxed (weak) consistency

Relax program order between all operations to memory

– Reads and writes to different memory locations can be reordered

Consider:

– Operation in a critical section (shared data)
– One process reading/writing
– Nobody else accessing until the process leaves the critical section

No need to propagate writes sequentially or at all until process leaves critical section


Synchronization variable (barrier)

  • An operation for synchronizing memory
  • All local writes get propagated
  • All remote writes are brought in to the local processor
  • Block until memory is synchronized
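Read as code, the synchronization operation is just three steps; every helper in this sketch is hypothetical:

    extern void flush_local_writes(void);   /* propagate modified pages      */
    extern void pull_remote_writes(void);   /* bring in other nodes' writes  */
    extern void barrier_wait(void);         /* block until all nodes finish  */

    void sync_memory(void)
    {
        flush_local_writes();
        pull_remote_writes();
        barrier_wait();   /* memory is consistent when this returns */
    }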

Consistency guarantee

  • Accesses to synchronization variables are sequentially consistent
– All processes see them in the same order
  • No access to a synchronization variable can be performed until all previous writes have completed
  • No read or write is permitted until all previous accesses to synchronization variables have been performed
– Memory is updated during the sync


Problems with sync consistency

  • Inefficiency

– Are we synchronizing because the process finished memory accesses or is about to start?

  • On a sync, systems must make sure that:

– All locally-initiated writes have completed
– All remote writes have been acquired


Can we do better?

Separate synchronization into two stages:

1. Acquire access
– Obtain valid copies of pages

2. Release access
– Send invalidations or updates for shared pages that were modified locally to nodes that have copies

    acquire(R)    // start of critical section
    Do stuff
    release(R)    // end of critical section

This is Eager Release Consistency (ERC); a sketch follows.
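One way acquire and release might look under ERC; the lock API, NPAGES, and the per-page dirty tracking are all assumptions made for the sketch:

    #include <stdbool.h>
    #include <stdint.h>

    #define NPAGES 1024   /* assumption: size of shared region in pages */

    typedef struct lock lock_t;   /* opaque DSM lock */
    extern void lock_acquire(lock_t *R);
    extern void lock_release(lock_t *R);
    extern bool modified_since_acquire(uintptr_t page_no);
    extern void invalidate_copyset_of(uintptr_t page_no); /* waits for acks */

    void acquire(lock_t *R)
    {
        lock_acquire(R);   /* valid copies are obtained as we fault on them */
    }

    void release(lock_t *R)
    {
        /* Eager: before the lock is released, push invalidations (or
         * updates) for every page modified in the critical section to
         * all nodes in its copyset, and wait for their acks. */
        for (uintptr_t p = 0; p < NPAGES; p++)
            if (modified_since_acquire(p))
                invalidate_copyset_of(p);
        lock_release(R);
    }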


Getting Lazy

The release operation requires:

– Sending invalidations to copyset nodes
– And waiting for all of them to acknowledge

The lazy alternative: do not make modifications visible globally at release.

On release:
– Send an invalidation only to the directory
  • Or send updates to the home node (owner of the page)

On acquire (this is where modifications are propagated):
– Check with the directory to see whether a new copy is needed
  • Chances are that not every node will need to do an acquire

Reduces message traffic on releases

This is Lazy Release Consistency (LRC); a sketch follows.
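For contrast with the eager sketch above, a lazy version; the write-notice helpers are hypothetical:

    #include <stdint.h>

    typedef struct lock lock_t;   /* opaque DSM lock, as in the ERC sketch */
    extern void lock_acquire(lock_t *R);
    extern void lock_release(lock_t *R);
    extern void send_write_notices_to_directory(lock_t *R);  /* hypothetical */
    extern void fetch_write_notices(lock_t *R);              /* hypothetical */
    extern void invalidate_stale_pages(lock_t *R);           /* hypothetical */

    void release_lazy(lock_t *R)
    {
        /* Lazy: tell only the directory (or home node) what changed;
         * nothing is sent to the nodes holding copies. */
        send_write_notices_to_directory(R);
        lock_release(R);
    }

    void acquire_lazy(lock_t *R)
    {
        lock_acquire(R);
        fetch_write_notices(R);     /* learn which pages were modified  */
        invalidate_stale_pages(R);  /* invalidate only those, only here */
    }

The cost of invalidation is paid only by nodes that actually acquire the lock, rather than by every copy holder at every release.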


Finer granularity

Release consistency
– Synchronizes all data
– No relation between the lock and the data

Use object granularity instead of page granularity:
– Each variable or group of variables can have its own synchronization variable
– Propagate only writes performed in those sections
– Cannot rely on the OS and MMU anymore
  • Need smart compilers

This is Entry Consistency.


The end.