User-Level Interprocess Communication for Shared Memory Multiprocessors
Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, Henry M. Levy
Presented by: Dan Lake
Introduction
- IPC is central to operating system design
- Advantages of decomposed systems:
  - Failure isolation (address space boundaries)
  - Extensibility (new modules can be added)
  - Modularity (interfaces are enforced)
- The kernel is traditionally responsible for IPC
- Kernel-based IPC has problems:
  - Architectural performance barriers (LRPC: 70%)
  - Interaction between kernel-based IPC and user-level threads
    - Strong interdependencies
    - The cost of partitioning these facilities is high
Solution for Shared Memory Multiprocessors
- URPC (User-level Remote Procedure Call)
- Separate the three components of IPC:
  a) Data transfer
  b) Thread management
  c) Processor reallocation
- Goals:
  - Move (a) and (b) to user level
  - Limit the kernel to performing only (c)
  - Eliminate the kernel from cross-address-space communication
Message Passing
- Logical channels of pair-wise shared memory
- Channels are created and mapped once for every client/server pairing
- Channels are bi-directional
- A test-and-set lock (TSL) controls access in either direction
- Just as secure as going through the kernel
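The bullets above can be sketched in code. This is a hypothetical, single-process illustration of one direction of such a channel (names, queue size, and message size are invented, not from the paper): a fixed-size message queue in pair-wise shared memory, guarded by an atomic test-and-set lock.

```c
#include <stdatomic.h>
#include <string.h>

#define SLOTS 8
#define MSG_BYTES 64

/* One direction of a client/server channel in pair-wise shared memory. */
typedef struct {
    atomic_flag lock;             /* test-and-set lock for this direction */
    int head, tail;               /* FIFO indices */
    char msg[SLOTS][MSG_BYTES];   /* message slots */
} channel_dir;

/* Acquire the lock with atomic test-and-set (spin while held). */
static void tsl_acquire(channel_dir *c) {
    while (atomic_flag_test_and_set(&c->lock))
        ;  /* spin: the holder may be in the peer address space */
}

static void tsl_release(channel_dir *c) {
    atomic_flag_clear(&c->lock);
}

/* Enqueue a message; returns 0 on success, -1 if the queue is full. */
int channel_send(channel_dir *c, const char *data) {
    int rc = -1;
    tsl_acquire(c);
    if (c->tail - c->head < SLOTS) {
        memcpy(c->msg[c->tail % SLOTS], data, MSG_BYTES);
        c->tail++;
        rc = 0;
    }
    tsl_release(c);
    return rc;
}

/* Dequeue a message; returns 0 on success, -1 if the queue is empty. */
int channel_recv(channel_dir *c, char *out) {
    int rc = -1;
    tsl_acquire(c);
    if (c->head < c->tail) {
        memcpy(out, c->msg[c->head % SLOTS], MSG_BYTES);
        c->head++;
        rc = 0;
    }
    tsl_release(c);
    return rc;
}
```

Because both sides follow the same lock protocol on memory only they two share, a misbehaving peer can corrupt its own channel but not anyone else's, which is the sense in which this is as secure as going through the kernel.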
Data & Security
- Applications access URPC procedures through a stub layer
- Stubs unmarshal data into procedure parameters
- Stubs copy data in and out; applications never use shared memory directly
- Arguments are passed in buffers that are allocated and pair-wise mapped during binding
- Data queues are monitored by application-level thread management
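The copy-in/copy-out discipline above can be sketched as a pair of stubs. This is an illustrative sketch, not the paper's actual stub code; the struct and field names are invented. The client stub marshals arguments into a buffer that would live in memory pair-wise mapped at bind time, and the server stub copies them back out into local parameters, so neither application touches shared memory directly.

```c
#include <string.h>

/* Hypothetical wire format for one call's arguments. */
typedef struct {
    int proc_id;   /* which remote procedure to invoke */
    int arg_a;
    int arg_b;
} call_buffer;

/* Client-side stub: copy arguments into the shared buffer. */
void stub_marshal(call_buffer *shared, int proc_id, int a, int b) {
    call_buffer local = { proc_id, a, b };
    memcpy(shared, &local, sizeof local);   /* copy in, never alias */
}

/* Server-side stub: copy arguments out into procedure parameters. */
void stub_unmarshal(const call_buffer *shared, int *a, int *b) {
    call_buffer local;
    memcpy(&local, shared, sizeof local);   /* copy out before use */
    *a = local.arg_a;
    *b = local.arg_b;
}
```

Copying rather than aliasing means a peer that scribbles on the buffer mid-call can corrupt argument values but cannot plant pointers into the other side's address space.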
Thread Management
- LRPC: client threads always cross address spaces to the server
- URPC: always try to reschedule another thread within the same address space
- Switching threads within the same address space requires less overhead than processor reallocation
- Calls are synchronous from the programmer's point of view, but asynchronous at the thread-management level
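The scheduling decision described above can be reduced to a small sketch (the names and the queue model are invented for illustration): when a thread makes a URPC call, it blocks awaiting the reply, and the user-level scheduler first looks for another ready thread in the same address space; only if none exists need the expensive path, processor reallocation, be considered.

```c
/* What the user-level scheduler might decide at a URPC call. */
enum action { RUN_LOCAL_THREAD, CONSIDER_REALLOCATION };

/* Toy model of an address space's ready queue. */
typedef struct { int ready_count; } local_run_queue;

enum action on_urpc_call(local_run_queue *q) {
    if (q->ready_count > 0) {
        q->ready_count--;            /* pick a thread in this address space */
        return RUN_LOCAL_THREAD;     /* cheap context switch, no kernel */
    }
    return CONSIDER_REALLOCATION;    /* may donate the processor (kernel) */
}
```

This is why the call is synchronous to the programmer (the calling thread blocks) but asynchronous to thread management (the processor moves on to other local work).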
Processor Reallocation
- Switching a processor between threads of different address spaces
- Requires privileged kernel mode to access protected mapping registers
- Incurs significant overhead, as pointed out in the LRPC paper
- URPC strives to avoid processor reallocation
- This avoidance can lead to substantial performance gains
Optimistic Scheduling Policy
- Assumptions:
  - The client has other work to do
  - The server will soon have a processor available to service a message
Sample Execution Timeline
Optimistic Reallocation Scheduling Policy
[Timeline figure: pending outgoing messages are detected and a processor is donated once a space (FCMgr) becomes "underpowered"]
Why the optimistic approach doesn’t always hold
- This approach does not work as well when the application:
  - Runs as a single thread
  - Is real-time
  - Has high-latency I/O
  - Uses priority invocations
- URPC addresses some of these problems by allowing forced processor reallocation even if there is still work to do
Kernel Handles Processor Reallocation
- URPC handles this through a call named "Processor.Donate"
- This passes control of an idle processor down to the kernel, and then back up to a specified address in the receiving address space
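A hedged sketch of the Processor.Donate idea follows. The real operation is a kernel trap that switches the protected mapping registers and resumes at an entry point the receiving space registered; the sketch below only simulates the control transfer in one process, and the type and field names are invented.

```c
/* Entry point a receiving address space registers for donated processors. */
typedef void (*entry_fn)(void *arg);

typedef struct {
    entry_fn entry;   /* where the receiving space wants to start */
    void *arg;        /* e.g. a pointer to its message channels */
} address_space;

/* Simulated Processor.Donate: control goes "down to the kernel" and
 * comes back up at the receiver's registered entry address. */
void processor_donate(address_space *receiver) {
    receiver->entry(receiver->arg);
}

/* Example receiver: a server entry that drains pending messages. */
static int messages_served = 0;
static void server_entry(void *arg) {
    messages_served += *(int *)arg;   /* pretend to service the backlog */
}
```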
Voluntary Return of Processors
- The policy of a URPC server process: "…Upon receipt of a processor from a client address space, return the processor when all outstanding messages from the client have generated replies, or when the server determines that the client has become 'underpowered'…"
Parallels to User Threads Paper
- Even though URPC implements a policy/protocol, there is no way to enforce it. This has the potential to lead to some interesting side effects.
- This is similar to some of the problems discussed in the User Threads paper
- For example, a server thread could conceivably continue to hold a donated processor and handle requests from other clients
What this leads to…
- Starvation
- URPC handles this by directly reallocating processors only to balance load
- The system also needs a notion of preemptive reallocation
- Preemptive reallocation must also ensure that:
  - No higher-priority thread waits while a lower-priority thread runs
  - No processor idles when there is work for it to do (even if the work is in another address space)
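The two invariants above can be stated as checks. This is a toy, array-based model invented for illustration (the function names are not from the paper): one predicate per invariant, which a preemptive reallocation policy would have to keep true.

```c
static int max_of(const int *v, int n) {
    int m = v[0];
    for (int i = 1; i < n; i++) if (v[i] > m) m = v[i];
    return m;
}

static int min_of(const int *v, int n) {
    int m = v[0];
    for (int i = 1; i < n; i++) if (v[i] < m) m = v[i];
    return m;
}

/* Invariant 1: no higher-priority thread waits while a lower-priority
 * thread runs, i.e. every running priority >= every waiting priority. */
int no_priority_inversion(const int *running, int nrun,
                          const int *waiting, int nwait) {
    if (nrun == 0 || nwait == 0) return 1;
    return min_of(running, nrun) >= max_of(waiting, nwait);
}

/* Invariant 2: idle processors and ready work must not coexist,
 * even when the work is in another address space. */
int no_wasted_processor(int idle_processors, int ready_threads) {
    return idle_processors == 0 || ready_threads == 0;
}
```

Because URPC's scheduling is split between user level and kernel, neither level alone can see every priority and every ready queue, which is what makes these invariants hard to enforce without preemptive reallocation.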
Performance
Note: Table II results are independent of load
Performance
- Latency is proportional to the number of threads per CPU
- With T = C = S = 1, call latency is 93 microseconds
- With T = 2, C = 1, S = 1, latency increases to 112 microseconds, but throughput rises 75% (a benefit of parallelism)
  - Call latency is effectively reduced to 53 microseconds
- C = 1, S = 0 gives the worst performance
- In both cases, C = 2, S = 2 yields the best performance
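The "effective" 53 µs figure follows from the throughput gain rather than the raw latency: a 75% throughput improvement over the 93 µs single-thread case means each call costs 93/1.75 ≈ 53 µs of processor time on average. A one-line sketch of that arithmetic:

```c
/* Effective per-call latency implied by a throughput gain over a
 * single-threaded baseline (e.g. base 93 us, gain 0.75 -> ~53 us). */
double effective_latency_us(double base_us, double throughput_gain) {
    return base_us / (1.0 + throughput_gain);
}
```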
Performance
- Worst-case URPC latency for one thread is 375 µs
- On similar hardware, LRPC call latency is 157 µs
- Reasons:
  - URPC requires two-level scheduling
  - URPC's low-level scheduling is done by LRPC
- This is a small price considering the possible gains; it is necessary to have high-level scheduling
Conclusions
- Performance is gained by moving features out of the kernel, not vice-versa
- URPC represents an appropriate division of labor for operating system kernels of shared memory multiprocessors
- URPC showcases a design specific to a