User-Level Interprocess Communication for Shared Memory Multiprocessors - PowerPoint PPT Presentation


SLIDE 1

User-Level Interprocess Communication for Shared Memory Multiprocessors

Brian N. Bershad Thomas E. Anderson Edward D. Lazowska Henry M. Levy Presented by: Dan Lake

SLIDE 2

Introduction

  • IPC is central to operating system design
  • Advantages of decomposed systems:
  • Failure isolation (address space boundaries)
  • Extensibility (add new modules)
  • Modularity (interfaces enforced)
  • Kernel traditionally responsible for IPC
  • Kernel-based IPC has problems:
  • Architectural performance barriers (LRPC 70%)
  • Interaction of kernel IPC and user-level threads
  • Strong interdependencies
  • Cost of partitioning these facilities is high

SLIDE 3

Solution For Shared Memory Multiprocessors

  • URPC (User-level Remote Procedure Call)
  • Separate the three components of IPC:
  a) Data transfer
  b) Thread management
  c) Processor reallocation
  • Goals:
  • Move (a) and (b) to user level
  • Limit the kernel to performing only (c)
  • Eliminate the kernel from cross-address space communication

SLIDE 4

Message Passing

  • Logical channels of pair-wise shared memory
  • Channels created and mapped once for every client/server pairing
  • Channels are bi-directional
  • A test-and-set lock (TSL) controls access in either direction
  • Just as secure as going through the kernel
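The channel on this slide can be sketched in C. The queue layout, sizes, and function names below are illustrative assumptions, not the paper's actual data structures; the point is that a test-and-set lock guards each direction of the pair-wise shared region without any kernel involvement.

```c
/* Sketch of a bidirectional URPC channel guarded by test-and-set
 * locks. All names and sizes are invented for illustration. */
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define QUEUE_SLOTS 8
#define MSG_BYTES   64

typedef struct {
    atomic_flag lock;                 /* test-and-set: no kernel trap */
    int head, tail;
    char msgs[QUEUE_SLOTS][MSG_BYTES];
} msg_queue;

typedef struct {                      /* one channel per client/server pair */
    msg_queue client_to_server;       /* bi-directional: one queue each way */
    msg_queue server_to_client;
} urpc_channel;

/* Enqueue a message; returns 0 on success, -1 if the queue is full. */
int channel_send(msg_queue *q, const char *msg)
{
    while (atomic_flag_test_and_set(&q->lock))
        ;                             /* spin: cheap on a multiprocessor */
    int full = ((q->tail + 1) % QUEUE_SLOTS) == q->head;
    if (!full) {
        strncpy(q->msgs[q->tail], msg, MSG_BYTES - 1);
        q->msgs[q->tail][MSG_BYTES - 1] = '\0';
        q->tail = (q->tail + 1) % QUEUE_SLOTS;
    }
    atomic_flag_clear(&q->lock);
    return full ? -1 : 0;
}

/* Dequeue a message; returns 0 on success, -1 if the queue is empty. */
int channel_recv(msg_queue *q, char *out)
{
    while (atomic_flag_test_and_set(&q->lock))
        ;
    int empty = q->head == q->tail;
    if (!empty) {
        memcpy(out, q->msgs[q->head], MSG_BYTES);
        q->head = (q->head + 1) % QUEUE_SLOTS;
    }
    atomic_flag_clear(&q->lock);
    return empty ? -1 : 0;
}
```

Because only the two bound address spaces have the region mapped, corrupting a queue can hurt only the parties to that channel, which is why this is no less secure than routing through the kernel.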

SLIDE 5

Data & Security

  • Applications access URPC procedures through a stubs layer
  • Stubs unmarshal data into procedure parameters
  • Stubs copy data in/out; applications make no direct use of shared memory
  • Arguments are passed in buffers that are allocated and pair-wise mapped during binding
  • Data queues are monitored by application-level thread management
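A minimal sketch of the copy-in/copy-out stub discipline this slide describes. The procedure (`add`), its wire format, and the function names are invented for illustration; the key property is that both sides copy through the pair-wise mapped buffer rather than handing out pointers into it.

```c
/* Sketch of client/server stubs for a hypothetical add(x, y) procedure
 * over a pair-wise mapped buffer. Layout and names are illustrative. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {          /* invented wire format for add(x, y) */
    int32_t proc_id;
    int32_t x, y;
    int32_t result;
} add_frame;

/* Client stub: marshal arguments into the shared buffer (copy in). */
void stub_marshal_add(void *shared_buf, int32_t x, int32_t y)
{
    add_frame f = { .proc_id = 1, .x = x, .y = y, .result = 0 };
    memcpy(shared_buf, &f, sizeof f);   /* no pointers cross the boundary */
}

/* Server stub: copy out, call the real procedure, marshal the reply. */
void stub_dispatch_add(void *shared_buf)
{
    add_frame f;
    memcpy(&f, shared_buf, sizeof f);   /* copy before touching the data */
    f.result = f.x + f.y;               /* the real procedure body */
    memcpy(shared_buf, &f, sizeof f);
}

/* Client stub: unmarshal the reply into the caller's result. */
int32_t stub_unmarshal_result(const void *shared_buf)
{
    add_frame f;
    memcpy(&f, shared_buf, sizeof f);
    return f.result;
}
```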

SLIDE 6

Thread Management

  • LRPC: client threads always cross address spaces to the server
  • URPC: always try to reschedule another thread within the address space
  • Switching threads within the same address space requires less overhead than processor reallocation
  • Synchronous from the programmer's point of view, but asynchronous at the thread management level
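The call path above can be sketched as follows. The scheduler structure and function names are invented; what matters is the decision: after sending, the processor picks another runnable thread in the same address space, and only when none exists does processor reallocation become a candidate.

```c
/* Sketch of URPC's send-then-reschedule path. The ready queue and
 * all names here are an invented model, not the paper's code. */
#include <assert.h>

#define MAX_READY 16
typedef struct {
    int ready[MAX_READY];   /* ids of runnable threads in this space */
    int n;
} sched;

/* Pick another runnable thread in this address space, or -1 if none;
 * only in the -1 case would URPC consider donating the processor. */
int pick_next_thread(sched *s)
{
    if (s->n == 0)
        return -1;
    return s->ready[--s->n];
}

/* Synchronous to the caller, asynchronous underneath: the calling
 * thread blocks on the reply, but its processor keeps doing work. */
int urpc_call(sched *s, int caller_id)
{
    (void)caller_id;        /* would be parked on the reply queue */
    /* 1. enqueue the message on the channel (elided)          */
    /* 2. mark the caller blocked until the reply arrives (elided) */
    /* 3. cheap context switch within the address space:       */
    return pick_next_thread(s);
}
```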

SLIDE 7

Processor Reallocation

  • Switching the processor between threads of different address spaces
  • Requires privileged kernel mode to access protected mapping registers
  • Carries significant overhead, as pointed out in the LRPC paper
  • URPC strives to avoid processor reallocation
  • This avoidance can lead to substantial performance gains

SLIDE 8

Optimistic Scheduling Policy

  • Assumptions:
  • The client has other work to do
  • The server will soon have a processor available to service a message

SLIDE 9

Sample Execution Timeline

Optimistic Reallocation Scheduling Policy

(Timeline figure: pending outgoing messages detected, processor donated, FCMgr address space deemed "underpowered")

SLIDE 10

Why the optimistic approach doesn’t always hold

  • This approach does not work as well when the application:
  • Runs as a single thread
  • Is real-time
  • Has high-latency I/O
  • Makes priority invocations
  • URPC addresses some of these problems by allowing forced processor reallocation even if there is still work to do

SLIDE 11

Kernel Handles Processor Reallocation

  • URPC handles this through a call named "Processor.Donate"
  • This passes control of an idle processor down to the kernel, and then back up to a specified address in the receiving address space
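Processor.Donate is the kernel call named on this slide; the idle-loop policy around it can be sketched as a predicate. The function and parameter names below are invented stand-ins for the decision the optimistic policy makes.

```c
/* Sketch of the decision to invoke Processor.Donate: a processor
 * with no runnable local threads but pending outgoing messages is
 * handed, via the kernel, to the underpowered receiving space.
 * Everything except the name Processor.Donate is an invented model. */
#include <assert.h>

typedef enum { KEEP, DONATE } idle_action;

idle_action idle_loop_policy(int runnable_local_threads,
                             int pending_outgoing_msgs)
{
    if (runnable_local_threads > 0)
        return KEEP;     /* optimism: keep doing local work */
    if (pending_outgoing_msgs > 0)
        return DONATE;   /* server lacks a processor: give it ours */
    return KEEP;         /* truly idle: keep polling the channels */
}
```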

SLIDE 12

Voluntary Return of Processors

  • The policy of a URPC server process: "…Upon receipt of a processor from a client address space, return the processor when all outstanding messages from the client have generated replies, or when the server determines that the client has become 'underpowered'…"
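The quoted policy can be sketched as a predicate. The "underpowered" test below (more runnable threads than processors) is an invented stand-in for whatever heuristic the server actually uses; the two return conditions come straight from the quote.

```c
/* Sketch of the voluntary-return policy quoted above. The
 * underpowered heuristic is an illustrative assumption. */
#include <assert.h>
#include <stdbool.h>

bool should_return_processor(int outstanding_msgs,
                             int client_processors,
                             int client_runnable_threads)
{
    if (outstanding_msgs == 0)
        return true;    /* all outstanding messages have replies */
    /* client has more runnable threads than processors: underpowered */
    return client_runnable_threads > client_processors;
}
```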

SLIDE 13

Parallels to User Threads Paper

  • Even though URPC implements a policy/protocol, there is no way to enforce it, which has the potential to lead to some interesting side effects.
  • This is similar to some of the problems discussed in the User Threads paper
  • For example, a server thread could conceivably continue to hold a donated processor and handle requests from other clients

SLIDE 14

What this leads to…

  • Starvation
  • URPC handles this by directly reallocating processors only to load balance.
  • The system also needs the notion of preemptive reallocation
  • Preemptive reallocation must also adhere to:
  • No higher-priority thread waits while a lower-priority thread runs
  • No processor idles when there is work for it to do (even if the work is in another address space)
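The two constraints above can be sketched as invariant checks over a toy system state. All names and the state representation are illustrative assumptions.

```c
/* Sketch of the two preemptive-reallocation invariants as checks.
 * The state model here is invented for illustration. */
#include <assert.h>
#include <stdbool.h>

/* Invariant 1: no waiting thread outranks any running thread. */
bool priority_invariant(const int *running_prio, int n_running,
                        const int *waiting_prio, int n_waiting)
{
    for (int w = 0; w < n_waiting; w++)
        for (int r = 0; r < n_running; r++)
            if (waiting_prio[w] > running_prio[r])
                return false;   /* higher-priority thread is waiting */
    return true;
}

/* Invariant 2: no processor idles while any address space has work. */
bool no_idle_invariant(int idle_processors, int total_runnable_threads)
{
    return idle_processors == 0 || total_runnable_threads == 0;
}
```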

SLIDE 15

Performance

Note: Table II results are independent of load

SLIDE 16

Performance

  • Latency is proportional to the number of threads per CPU
  • With T = C = S = 1, call latency is 93 microseconds
  • With T = 2, C = 1, S = 1, latency increases to 112 microseconds; however, throughput rises 75% (a benefit of parallelism)
  • Call latency is effectively reduced to 53 microseconds (93 µs ÷ 1.75 ≈ 53 µs)
  • C = 1, S = 0 gives the worst performance
  • In both cases, C = 2, S = 2 yields the best performance

SLIDE 17

Performance

  • Worst-case URPC latency for one thread is 375 µs
  • On similar hardware, LRPC call latency is 157 µs
  • Reasons:
  • URPC requires two-level scheduling
  • URPC's low-level scheduling is done by LRPC
  • A small price considering the possible gains; this is necessary to have high-level scheduling

SLIDE 18

Conclusions

  • Performance is gained by moving features out of the kernel, not vice versa
  • URPC represents an appropriate division of labor for operating system kernels of shared memory multiprocessors
  • URPC showcases a design specific to a multiprocessor, not just a uniprocessor design that runs on multiprocessor hardware