User-Level Interprocess Communication for Shared Memory Multiprocessors
Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, Henry M. Levy
Presented by: Dan Lake
Introduction
- IPC is central to operating system design
- Advantages of decomposed systems:
  - Failure isolation (address space boundaries)
  - Extensibility (new modules can be added)
  - Modularity (interfaces are enforced)
- The kernel is traditionally responsible for IPC
- Kernel-based IPC has problems:
  - Architectural performance barriers (LRPC: 70%)
  - Interaction between kernel-based IPC and user-level threads
    - Strong interdependencies
    - The cost of partitioning these facilities is high
Solution for Shared Memory Multiprocessors
- URPC (User-level Remote Procedure Call)
- Separate the three components of IPC:
  a) Data transfer
  b) Thread management
  c) Processor reallocation
- Goals:
  - Move (a) and (b) to user level
  - Limit the kernel to performing only (c)
  - Eliminate the kernel from cross-address-space communication
Message Passing
- Logical channels of pair-wise shared memory
- Channels are created and mapped once for every client/server pairing
- Channels are bi-directional
- A test-and-set lock (TSL) controls access in either direction
- Just as secure as going through the kernel
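The bullets above can be sketched in code. This is a hypothetical, single-process illustration of one direction of such a channel (names, queue size, and message size are invented, not from the paper): a fixed-size message queue in pair-wise shared memory, guarded by an atomic test-and-set lock.

```c
#include <stdatomic.h>
#include <string.h>

#define SLOTS 8
#define MSG_BYTES 64

/* One direction of a client/server channel in pair-wise shared memory. */
typedef struct {
    atomic_flag lock;             /* test-and-set lock for this direction */
    int head, tail;               /* FIFO indices */
    char msg[SLOTS][MSG_BYTES];   /* message slots */
} channel_dir;

/* Acquire the lock with atomic test-and-set (spin while held). */
static void tsl_acquire(channel_dir *c) {
    while (atomic_flag_test_and_set(&c->lock))
        ;  /* spin: the holder may be in the peer address space */
}

static void tsl_release(channel_dir *c) {
    atomic_flag_clear(&c->lock);
}

/* Enqueue a message; returns 0 on success, -1 if the queue is full. */
int channel_send(channel_dir *c, const char *data) {
    int rc = -1;
    tsl_acquire(c);
    if (c->tail - c->head < SLOTS) {
        memcpy(c->msg[c->tail % SLOTS], data, MSG_BYTES);
        c->tail++;
        rc = 0;
    }
    tsl_release(c);
    return rc;
}

/* Dequeue a message; returns 0 on success, -1 if the queue is empty. */
int channel_recv(channel_dir *c, char *out) {
    int rc = -1;
    tsl_acquire(c);
    if (c->head < c->tail) {
        memcpy(out, c->msg[c->head % SLOTS], MSG_BYTES);
        c->head++;
        rc = 0;
    }
    tsl_release(c);
    return rc;
}
```

Because both sides follow the same lock protocol on memory only they two share, a misbehaving peer can corrupt its own channel but not anyone else's, which is the sense in which this is as secure as going through the kernel.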
Data & Security
- Applications access URPC procedures through a stub layer
- Stubs unmarshal data into procedure parameters
- Stubs copy data in and out; applications never use shared memory directly
- Arguments are passed in buffers that are allocated and pair-wise mapped during binding
- Data queues are monitored by application-level thread management
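The copy-in/copy-out discipline above can be sketched as a pair of stubs. This is an illustrative sketch, not the paper's actual stub code; the struct and field names are invented. The client stub marshals arguments into a buffer that would live in memory pair-wise mapped at bind time, and the server stub copies them back out into local parameters, so neither application touches shared memory directly.

```c
#include <string.h>

/* Hypothetical wire format for one call's arguments. */
typedef struct {
    int proc_id;   /* which remote procedure to invoke */
    int arg_a;
    int arg_b;
} call_buffer;

/* Client-side stub: copy arguments into the shared buffer. */
void stub_marshal(call_buffer *shared, int proc_id, int a, int b) {
    call_buffer local = { proc_id, a, b };
    memcpy(shared, &local, sizeof local);   /* copy in, never alias */
}

/* Server-side stub: copy arguments out into procedure parameters. */
void stub_unmarshal(const call_buffer *shared, int *a, int *b) {
    call_buffer local;
    memcpy(&local, shared, sizeof local);   /* copy out before use */
    *a = local.arg_a;
    *b = local.arg_b;
}
```

Copying rather than aliasing means a peer that scribbles on the buffer mid-call can corrupt argument values but cannot plant pointers into the other side's address space.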
Thread Management
- LRPC: client threads always cross address spaces to the server
- URPC: always try to reschedule another thread within the same address space
- Switching threads within the same address space requires less overhead than processor reallocation
- Calls are synchronous from the programmer's point of view, but asynchronous at the thread-management level
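The scheduling decision described above can be reduced to a small sketch (the names and the queue model are invented for illustration): when a thread makes a URPC call, it blocks awaiting the reply, and the user-level scheduler first looks for another ready thread in the same address space; only if none exists need the expensive path, processor reallocation, be considered.

```c
/* What the user-level scheduler might decide at a URPC call. */
enum action { RUN_LOCAL_THREAD, CONSIDER_REALLOCATION };

/* Toy model of an address space's ready queue. */
typedef struct { int ready_count; } local_run_queue;

enum action on_urpc_call(local_run_queue *q) {
    if (q->ready_count > 0) {
        q->ready_count--;            /* pick a thread in this address space */
        return RUN_LOCAL_THREAD;     /* cheap context switch, no kernel */
    }
    return CONSIDER_REALLOCATION;    /* may donate the processor (kernel) */
}
```

This is why the call is synchronous to the programmer (the calling thread blocks) but asynchronous to thread management (the processor moves on to other local work).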
Processor Reallocation
- Switching a processor between threads of different address spaces
- Requires privileged kernel mode to access protected mapping registers
- Incurs significant overhead, as pointed out in the LRPC paper
- URPC strives to avoid processor reallocation
- This avoidance can lead to substantial performance gains
Optimistic Scheduling Policy
- Assumptions:
  - The client has other work to do
  - The server will soon have a processor available to service a message
Sample Execution Timeline
Optimistic Reallocation Scheduling Policy
[Timeline figure: pending outgoing messages are detected and a processor is donated once a space (FCMgr) becomes "underpowered"]
Why the optimistic approach doesn’t always hold
- This approach does not work as well when the application:
  - Runs as a single thread
  - Is real-time
  - Has high-latency I/O
  - Uses priority invocations
- URPC addresses some of these problems by allowing forced processor reallocation even if there is still work to do
Kernel Handles Processor Reallocation
- URPC handles this through a call named "Processor.Donate"
- This passes control of an idle processor down to the kernel, and then back up to a specified address in the receiving address space
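A hedged sketch of the Processor.Donate idea follows. The real operation is a kernel trap that switches the protected mapping registers and resumes at an entry point the receiving space registered; the sketch below only simulates the control transfer in one process, and the type and field names are invented.

```c
/* Entry point a receiving address space registers for donated processors. */
typedef void (*entry_fn)(void *arg);

typedef struct {
    entry_fn entry;   /* where the receiving space wants to start */
    void *arg;        /* e.g. a pointer to its message channels */
} address_space;

/* Simulated Processor.Donate: control goes "down to the kernel" and
 * comes back up at the receiver's registered entry address. */
void processor_donate(address_space *receiver) {
    receiver->entry(receiver->arg);
}

/* Example receiver: a server entry that drains pending messages. */
static int messages_served = 0;
static void server_entry(void *arg) {
    messages_served += *(int *)arg;   /* pretend to service the backlog */
}
```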
Voluntary Return of Processors
- The policy of a URPC server process: "…Upon receipt of a processor from a client address space, return the processor when all outstanding messages from the client have generated replies, or when the server determines that the client has become 'underpowered'…"
Parallels to User Threads Paper
- Even though URPC implements a policy/protocol, there is no way to enforce it. This has the potential to lead to some interesting side effects.
- This is similar to some of the problems discussed in the User Threads paper
- For example, a server thread could conceivably continue to hold a donated processor and handle requests from other clients
What this leads to…
- Starvation
- URPC handles this by directly reallocating processors only to balance load
- The system also needs a notion of preemptive reallocation
- Preemptive reallocation must also ensure that:
  - No higher-priority thread waits while a lower-priority thread runs
  - No processor idles when there is work for it to do (even if the work is in another address space)
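The two invariants above can be stated as checks. This is a toy, array-based model invented for illustration (the function names are not from the paper): one predicate per invariant, which a preemptive reallocation policy would have to keep true.

```c
static int max_of(const int *v, int n) {
    int m = v[0];
    for (int i = 1; i < n; i++) if (v[i] > m) m = v[i];
    return m;
}

static int min_of(const int *v, int n) {
    int m = v[0];
    for (int i = 1; i < n; i++) if (v[i] < m) m = v[i];
    return m;
}

/* Invariant 1: no higher-priority thread waits while a lower-priority
 * thread runs, i.e. every running priority >= every waiting priority. */
int no_priority_inversion(const int *running, int nrun,
                          const int *waiting, int nwait) {
    if (nrun == 0 || nwait == 0) return 1;
    return min_of(running, nrun) >= max_of(waiting, nwait);
}

/* Invariant 2: idle processors and ready work must not coexist,
 * even when the work is in another address space. */
int no_wasted_processor(int idle_processors, int ready_threads) {
    return idle_processors == 0 || ready_threads == 0;
}
```

Because URPC's scheduling is split between user level and kernel, neither level alone can see every priority and every ready queue, which is what makes these invariants hard to enforce without preemptive reallocation.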
Performance
Note: Table II results are independent of load
Performance
- Latency is proportional to the number of threads per CPU
- With T = C = S = 1, call latency is 93 microseconds
- With T = 2, C = 1, S = 1, latency increases to 112 microseconds, but throughput rises 75% (a benefit of parallelism)
  - Call latency is effectively reduced to 53 microseconds
- C = 1, S = 0 gives the worst performance
- In both cases, C = 2, S = 2 yields the best performance
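The "effective" 53 µs figure follows from the throughput gain rather than the raw latency: a 75% throughput improvement over the 93 µs single-thread case means each call costs 93/1.75 ≈ 53 µs of processor time on average. A one-line sketch of that arithmetic:

```c
/* Effective per-call latency implied by a throughput gain over a
 * single-threaded baseline (e.g. base 93 us, gain 0.75 -> ~53 us). */
double effective_latency_us(double base_us, double throughput_gain) {
    return base_us / (1.0 + throughput_gain);
}
```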
Performance
- Worst-case URPC latency for one thread is 375 µs
- On similar hardware, LRPC call latency is 157 µs
- Reasons:
  - URPC requires two-level scheduling
  - URPC's low-level scheduling is done by LRPC
- This is a small price considering the possible gains; it is necessary to have high-level scheduling
Conclusions
- Performance is gained by moving features out of the kernel, not vice-versa
- URPC represents an appropriate division of labor for operating system kernels of shared memory multiprocessors
- URPC showcases a design specific to a