Multiprocessor OS 2003

Multiple processor systems

  • Why? Clock speed limit:
    – 10 GHz → 2 cm chip size
    – 100 GHz → 2 mm chip size
    – 1 THz → <100 µm chip size
  • In practice, we could put many processors together


Architectures

[Figure: three multiprocessor architectures, from tightly to loosely coupled: CPUs sharing one memory (shared memory, tightly coupled), CPUs with private memories on an interconnect, and CPUs with private memories communicating over the Internet (loosely coupled)]


What does each process see

  • A process running on a CPU sees:
    – The usual (paged) virtual memory
    – It can write to memory and read back a different value (another process changed it)
  • IPC
    – The OS organizes shared memory for it


Bus-based MP architecture

[Figure: CPUs and memory on a shared bus; the same with a cache per CPU; the same with a cache and private memory per CPU]

  • 2 CPUs is fine; 64 is not: bus contention


Cache

  • Try to keep the most used pages (usually cache lines) in cache
  • When memory is changed (written to), the other caches need to be notified of the change
  • There are specific cache-coherence protocols for this
  • If local memory is present, the compiler should do a good job of separating what goes in main memory from what stays in local memory
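The notification step above is what cache-coherence protocols do. A minimal, hypothetical write-invalidate sketch (the class names and the write-through choice are illustrative, not a real protocol):

```python
# Toy write-invalidate coherence: each cache holds a private copy per
# address; a write updates the writer's cache and memory, and invalidates
# every other cache's copy, so the next read there misses and refetches.

class Bus:
    """Broadcast medium: lets a writing cache invalidate its peers."""
    def __init__(self):
        self.caches = []

    def broadcast_invalidate(self, writer, addr):
        for c in self.caches:
            if c is not writer:
                c.lines.pop(addr, None)   # drop the stale copy

class Cache:
    def __init__(self, bus, memory):
        self.lines = {}                   # addr -> cached value
        self.bus = bus
        self.memory = memory
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:        # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value         # write-through, for simplicity
        self.bus.broadcast_invalidate(self, addr)

memory = {0x10: 1}
bus = Bus()
c0, c1 = Cache(bus, memory), Cache(bus, memory)

print(c0.read(0x10), c1.read(0x10))  # 1 1: both caches hold addr 0x10
c0.write(0x10, 99)                   # invalidates c1's copy
print(c1.read(0x10))                 # 99: miss, refetched from memory
```

A real protocol (e.g. MESI) tracks line states instead of blunt invalidation, but the notify-on-write idea is the same.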


Crossbar switches

[Figure: N processors (1..N) and Q memories (1..Q) connected through a crossbar of switches; processor n can connect to memory q]

  • No conflicts (if the memory is available)


UMA, NUMA classes

  • UMA (uniform memory access):
    – Uniform read/write access
    – All memory accesses have the same characteristics
  • NUMA (non-uniform memory access):
    – Single address space visible to all CPUs
    – Access to remote memory is slower than local
    – E.g. with 100 processors uniformity becomes difficult, so something has to give; in practice the uniform access time is the tradeoff
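The local/remote tradeoff can be put in numbers. A back-of-envelope sketch; the latencies and the hit fraction are made up for illustration:

```python
# NUMA back-of-envelope: the average access time is a weighted mix of
# local and remote latencies. All numbers here are illustrative.

def avg_access_ns(local_ns, remote_ns, local_fraction):
    """Expected access time given the fraction of accesses that are local."""
    return local_fraction * local_ns + (1 - local_fraction) * remote_ns

# e.g. remote memory 3x slower, 75% of accesses hit the local node:
print(avg_access_ns(local_ns=100, remote_ns=300, local_fraction=0.75))
# 0.75*100 + 0.25*300 = 150 ns
```

The point of NUMA-aware placement (and of the scheduling affinity discussed later) is to push `local_fraction` as close to 1 as possible.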


How it works

  • The memory is split between nodes
  • Clearly, access to a remote node’s local memory is slower
  • A request from one of the nodes either goes out on the bus (and is possibly served by a cache) or to the local memory
  • Caches need to be kept up to date at all times


OS types

  • One OS on each CPU: N CPUs operate as N independent computers
    – This is what happens in loosely coupled MP systems
    – Not much sharing of memory, CPU cycles, etc. between processes (e.g. one CPU may be loaded while the others sit idle)
  • Master-slave
    – A single OS, allocating CPUs and memory
    – Single data structures (memory page tables, process tables, etc.)
    – Only the master runs the OS
  • Symmetric multiprocessing (SMP; e.g. Windows, Linux)
    – Each CPU can run the OS
    – Updates of e.g. the page tables must be done consistently (mutexes; different parts of the OS form different critical regions)


MP synchronization

  • Appropriate synchronization procedures are needed
    – Disabling interrupts doesn’t work (it only affects the local CPU)
  • The TSL instruction locks the bus while reading and writing atomically:
    – Lock_bus
    – Read, Write
    – Unlock_bus
  • Otherwise, if the bus doesn’t support TSL, there’s always Peterson’s solution
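The TSL-based lock above can be sketched as a spinlock. Python has no bus-locking instruction, so in this sketch a `threading.Lock` stands in for the atomicity that the hardware TSL (lock bus, read+write, unlock bus) provides:

```python
import threading

class TestAndSetLock:
    """Spinlock built on an (emulated) atomic test-and-set."""
    def __init__(self):
        self._flag = False
        self._atomic = threading.Lock()   # emulates the hardware bus lock

    def _test_and_set(self):
        with self._atomic:                # "lock bus; read + write; unlock bus"
            old = self._flag
            self._flag = True
            return old

    def acquire(self):
        while self._test_and_set():       # spin until the old value was False
            pass

    def release(self):
        self._flag = False

# Two threads increment a shared counter under the spinlock.
counter = 0
lock = TestAndSetLock()

def worker(n):
    global counter
    for _ in range(n):
        lock.acquire()
        counter += 1
        lock.release()

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)   # 20000: no increments lost
```

On real hardware the spin loop burns a CPU, which is exactly why the scheduling slides below worry about processes holding locks.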


MP scheduling

  • A 2-D problem
    – Multiplexing in time (time sharing)
    – Multiplexing in space (space sharing)
  • Multiple threads run in parallel on different CPUs
  • Decisions should also consider how related processes and groups are
    – Different users might start different processes
    – The same user may start a group of related processes
  • The scheduler should avoid blocking CPUs simply because a process is holding a lock
  • Also, it might make sense to keep the same process recurrently running on the same processor


Scheduling (for time sharing)

  • Give additional quanta to processes holding (global) locks, to avoid blocking other CPUs
    – Smart scheduling
  • CPU affinity
    – Keep the same process on the same CPU to exploit the cache at its best
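The affinity rule can be sketched as a toy dispatch function (the function and its inputs are hypothetical, not any real scheduler's API):

```python
# Toy affinity-aware dispatch: prefer the CPU where the process last ran,
# since its cache lines may still be warm there; otherwise fall back to
# any idle CPU.

def pick_cpu(process, last_cpu, idle_cpus):
    """last_cpu: {pid: cpu id}; idle_cpus: set of currently free CPU ids."""
    preferred = last_cpu.get(process)
    if preferred in idle_cpus:
        return preferred            # warm cache: keep the process here
    return min(idle_cpus)           # any idle CPU (deterministic choice)

last_cpu = {"A": 2, "B": 0}
print(pick_cpu("A", last_cpu, {1, 2, 3}))  # 2: A's old CPU is free
print(pick_cpu("B", last_cpu, {1, 3}))     # 1: B's old CPU is busy
```

Real schedulers weigh affinity against load balance; migrating B to CPU 1 costs cache refills but avoids leaving CPUs idle.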


Scheduling (for space sharing)

  • Schedule multiple threads (of a single process) in parallel on many CPUs at once
  • In pure space sharing there is no multiprogramming on the CPUs
    – E.g. if we have 64K processors there is not much need for multiprogramming
  • Mix of space and time sharing

Hyper-threading

  • Introduced with the Pentium IV processors
  • Executes 2 threads at once
  • Since many instructions do different things, they also use different subsets of the CPU
  • Idea: why not keep most of the CPU busy at all times by allowing the execution of another thread?
  • This is clearly all done in hardware


Software

  • Send/receive model
    – Two blocking calls: send and receive messages
  • Asynchronous
    – The send returns immediately
    – The message buffer of course cannot be modified until the message is actually sent
    – Buffering issues (double, triple buffering)
  • Copy on write: only copy the buffer if the code tries to write to it
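Copy-on-write for a send buffer can be sketched as follows; the class and its bookkeeping are illustrative, not a real messaging API:

```python
# Copy-on-write send buffer: an asynchronous send is zero-copy (the
# "network" just keeps a reference); only a later application write forces
# a private copy, so the in-flight message is never corrupted, and no
# copy is ever made if the application never writes.

class CowBuffer:
    def __init__(self, data: bytes):
        self._data = bytearray(data)
        self._inflight = []        # references handed to the network on send
        self.copies_made = 0

    def send(self):
        """Asynchronous send: no copy, the network references our buffer."""
        self._inflight.append(self._data)
        return self._inflight[-1]

    def write(self, index, value):
        """Application write: copy first if a send still references us."""
        if self._inflight:
            self._data = bytearray(self._data)   # the one deferred copy
            self._inflight = []
            self.copies_made += 1
        self._data[index] = value

    def read(self, index):
        return self._data[index]

buf = CowBuffer(b"hello")
msg = buf.send()          # network holds a reference, no copy yet
buf.write(0, ord("H"))    # triggers the single copy; msg is untouched
print(bytes(msg))         # b'hello': the in-flight message is unchanged
print(buf.copies_made)    # 1
```

This is the same trick the slide names: the cost of the copy is paid only on the (often rare) write, not on every send.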


RPC

  • Remote procedure call

[Figure: client stub and server stub, each sitting on top of its OS, communicating through the network]


Complications

  • Based on RPC
    – DCOM (Microsoft)
    – CORBA (an open standard)
  • Same as RPC, but object oriented
  • A language is used to describe parameters, functions and objects
  • The marshaling of parameters is simpler
    – Parameters need to be packed in a uniform format to be shipped across the network and possibly between different architectures
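A minimal sketch of marshaling parameters into a uniform wire format, using Python's `struct` with network (big-endian) byte order; the message layout here is invented for illustration:

```python
import struct

# Marshal RPC parameters into a uniform wire format: a fixed field order
# and big-endian ("network") byte order, so both ends agree regardless of
# their native architecture. Illustrative layout: a 4-byte length-prefixed
# UTF-8 string followed by an unsigned 32-bit integer.

def marshal(name: str, count: int) -> bytes:
    raw = name.encode("utf-8")
    return struct.pack(f"!I{len(raw)}sI", len(raw), raw, count)

def unmarshal(payload: bytes):
    (n,) = struct.unpack_from("!I", payload, 0)   # string length
    name, count = struct.unpack_from(f"!{n}sI", payload, 4)
    return name.decode("utf-8"), count

wire = marshal("disk", 3)
print(unmarshal(wire))   # ('disk', 3): round trip through the wire format
```

In DCOM and CORBA this packing code is not written by hand: it is generated from the interface description, which is why marshaling becomes simpler.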