
Silberschatz and Galvin Chapter 16: Distributed System Structures

CPSC 410--Richard Furuta 4/13/99


Distributed System Structures
• General structure of distributed systems
  - Network Operating Systems
  - Distributed Operating Systems
  - Remote Services
• Robustness
• Design issues

Network Operating Systems
• Multiplicity of machines is visible to users
• Explicit access to remote resources; alternatives:
  - logging into the remote machine (examples: telnet, rlogin, rsh)
  - transferring data to the local machine (examples: ftp, rcp)

Distributed Operating Systems
• Multiplicity of machines is hidden from users
• Access to remote resources uses commands similar to those for access to local resources
• Data and processes may migrate between sites, under the control of the distributed operating system
  - Data migration: transfer data either by transferring the entire file or by transferring only those portions of the file necessary for the immediate task
  - Process migration (computation migration): transfer the computation, rather than the data

Distributed Operating Systems: Data Migration
• Accessing, from site A, data that resides on site B
• Transfer the complete file from B to A; transfer it back on modification
  - "automatic FTP"
  - older versions of the Andrew file system
  - inefficient
• Transfer only the portions of the file that are necessary for the immediate task
  - NFS
  - newer versions of the Andrew file system
• Issues: how much information must be transferred; concurrent access to the same portion of a file

Distributed Operating Systems: Process Migration
• Process migration: execute an entire process, or parts of it, at different sites
  - Load balancing: distribute processes across the network to even the workload
  - Computation speedup: subprocesses can run concurrently on different sites
  - Hardware preference: process execution may require a specialized processor
  - Software preference: required software may be available only at a particular site
  - Data access: run the process remotely, rather than transfer all the data locally
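The data-migration slide contrasts transferring the complete file ("automatic FTP", older Andrew file system) with transferring only the portions a task needs (NFS, newer Andrew file system). The following is a minimal Python sketch of that trade-off, and it rests on stated assumptions. The remote file is a hypothetical sequence of fixed 4 KB blocks, and an in-process method call stands in for the network. The names `RemoteFile`, `PartialCache`, and `BLOCK_SIZE` are illustrative and are not NFS or AFS interfaces.

```python
"""Sketch: whole-file vs. partial (block-level) data migration.

Hypothetical model: a file at site B is a sequence of fixed-size blocks;
site A either copies every block up front or fetches only blocks it reads.
"""
from dataclasses import dataclass, field
from typing import Dict

BLOCK_SIZE = 4096  # assumed block size in bytes, for illustration only


@dataclass
class RemoteFile:
    """A file stored at site B; fetch_block stands in for a network transfer."""
    data: bytes
    blocks_sent: int = 0  # how many blocks have crossed the "network"

    def num_blocks(self) -> int:
        return (len(self.data) + BLOCK_SIZE - 1) // BLOCK_SIZE

    def fetch_block(self, index: int) -> bytes:
        self.blocks_sent += 1
        start = index * BLOCK_SIZE
        return self.data[start:start + BLOCK_SIZE]


@dataclass
class PartialCache:
    """Site A's view: fetch a block from B only the first time it is read."""
    remote: RemoteFile
    cache: Dict[int, bytes] = field(default_factory=dict)

    def read(self, offset: int, length: int) -> bytes:
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        chunks = []
        for i in range(first, last + 1):
            if i not in self.cache:          # only missing blocks are transferred
                self.cache[i] = self.remote.fetch_block(i)
            chunks.append(self.cache[i])
        joined = b"".join(chunks)
        start = offset - first * BLOCK_SIZE  # position of offset inside block `first`
        return joined[start:start + length]


def whole_file_copy(remote: RemoteFile) -> bytes:
    """'Automatic FTP' style: transfer every block before any access."""
    return b"".join(remote.fetch_block(i) for i in range(remote.num_blocks()))
```

Take a 40-block file as an example. Reading 100 bytes at offset 5000 through `PartialCache` moves a single block, and a repeated read is served from the cache. `whole_file_copy` moves all 40 blocks. The sketch omits the concurrency issue the slide raises: two sites caching and modifying the same block.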

Distributed Operating Systems: Process Migration
• Migration without requiring user input
  - the program does not need to be coded for migration
  - often used for load balancing and computation speedup among homogeneous systems
• Migration specified by the user
  - movement to satisfy a hardware or software preference

Remote Services
• Requests for data at another site are expressed as requests for services from the remote site
• Requests are transferred to a remote server, which accesses the necessary data, computes the desired results, and transfers them back to the requester
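When migration happens without user input, typically for load balancing among homogeneous systems, the system itself decides which process to move and where. The following is a minimal Python sketch of one such policy, and it rests on hypothetical assumptions. A site's load is simply the number of processes assigned to it, and moving a process costs nothing. The greedy "busiest site to idlest site" rule in `balance` is an illustrative choice; the slides do not prescribe a particular policy.

```python
"""Sketch: transparent process migration for load balancing.

Hypothetical assumptions: homogeneous sites, a site's load is the number of
processes assigned to it, and moving a process costs nothing.
"""
from typing import Dict, List, Tuple


def balance(sites: Dict[str, List[str]], threshold: int = 1) -> List[Tuple[str, str, str]]:
    """Greedily migrate one process at a time from the busiest site to the
    idlest site until their loads differ by at most `threshold`.

    Mutates `sites` in place and returns the migrations performed as
    (process, from_site, to_site) tuples.
    """
    if threshold < 1:
        raise ValueError("threshold must be >= 1 or the loop may never end")
    migrations: List[Tuple[str, str, str]] = []
    while True:
        busiest = max(sites, key=lambda s: len(sites[s]))
        idlest = min(sites, key=lambda s: len(sites[s]))
        if len(sites[busiest]) - len(sites[idlest]) <= threshold:
            return migrations
        proc = sites[busiest].pop()   # any process will do in this sketch
        sites[idlest].append(proc)    # the user is never asked about the move
        migrations.append((proc, busiest, idlest))
```

Usage: starting from `{"A": ["p1", "p2", "p3", "p4", "p5"], "B": [], "C": ["p6"]}`, three migrations leave every site with two processes. A real system would also weigh the cost of moving a process's state, which this sketch ignores.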

Remote Services
• Remote Procedure Call (RPC)
  - One of the most common forms of remote service; abstracts the procedure-call mechanism
  - Messages are addressed to an RPC daemon listening to a port on the remote system.
  - Messages include the name of the process to run and the parameters to pass to that process.
  - The process is executed as requested, and any output is sent back to the requester in a separate message.
  - Port: a number included at the start of a message packet that is used to differentiate the services located at a system
  - Many ports, one network address

Remote Services: Remote Procedure Calls
• RPC semantics are not precisely the same as those of local procedure calls
  - Local calls fail rarely; remote calls can fail (or be duplicated) because of network errors
    • timestamps are needed to keep track of which messages have already been processed
  - Binding of formal and actual (i.e., client and server port) is not as simple as in a local call, because client and server do not share memory
    • fixed binding: port addresses are agreed on in advance, at RPC procedure compile time; the port number cannot change once compiled
    • dynamic binding: a rendezvous daemon listens on a well-known RPC port; the client sends it a message requesting the port address of the RPC it needs to execute
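The two RPC slides describe three mechanisms: port-addressed dispatch, dynamic binding through a rendezvous daemon on a well-known port, and timestamps that stop a duplicated message from being executed twice. The following is a minimal in-process Python sketch of all three, and it rests on hypothetical assumptions. "Ports" are dictionary keys, "messages" are method calls, and `MATCHMAKER_PORT` and the starting port 1024 are arbitrary illustrative numbers. This is not the Sun RPC or any other real RPC API.

```python
"""Sketch: RPC daemon with port dispatch, a rendezvous (matchmaker) port,
and at-most-once execution keyed on (client, timestamp).

Hypothetical and in-process: ports are dict keys, messages are method calls.
"""
from typing import Any, Callable, Dict, Tuple

MATCHMAKER_PORT = 111  # assumed well-known port number, for illustration


class RPCServer:
    def __init__(self) -> None:
        self.ports: Dict[int, Callable[..., Any]] = {}  # port -> procedure
        self.names: Dict[str, int] = {}                 # procedure name -> port
        self.done: Dict[Tuple[str, int], Any] = {}      # (client, timestamp) -> reply
        self.next_port = 1024                           # arbitrary first port

    def register(self, name: str, proc: Callable[..., Any]) -> int:
        """Attach a procedure to a fresh port and record it for the matchmaker."""
        port = self.next_port
        self.next_port += 1
        self.ports[port] = proc
        self.names[name] = port
        return port

    def handle(self, client: str, timestamp: int, port: int, args: tuple) -> Any:
        """Process one incoming message; a duplicate gets the saved reply."""
        key = (client, timestamp)
        if key in self.done:
            return self.done[key]          # already executed: do not run again
        if port == MATCHMAKER_PORT:
            reply = self.names[args[0]]    # dynamic binding: name -> port
        else:
            reply = self.ports[port](*args)
        self.done[key] = reply
        return reply
```

Usage: a client first asks the matchmaker port for the port of `"incr"`, then calls that port. Retransmitting the same `(client, timestamp)` pair returns the stored reply instead of incrementing again. A real server would also bound the `done` history, for example by discarding entries older than some window; this sketch keeps everything.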

Remote Services: Remote Procedure Calls
• A distributed file system (DFS) can be implemented as a set of RPC daemons and clients.
  - The messages are addressed to the DFS port on the server on which a file operation is to take place.
  - The message contains the disk operation to be performed (i.e., read, write, rename, delete, or status).
  - The return message contains any data resulting from that call, which is executed by the DFS daemon on behalf of the client.

Remote Services: Threads
• Threads can send and receive messages while other operations within the task continue asynchronously
• Pop-up thread: created on an "as needed" basis to respond to a new RPC
  - Cheaper to start a new thread than to restore an existing one
  - No threads block waiting for new work; no context has to be saved or restored
  - Incoming RPCs do not have to be copied to a buffer within a server thread
• RPCs to processes on the same machine as the caller can be made more lightweight via shared memory between threads in different processes running on that machine
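The two slides combine naturally: a DFS daemon receives messages naming one of the operations read, write, rename, delete, or status, and it can serve each one with a pop-up thread created for that request. The following is a minimal Python sketch, and it rests on hypothetical assumptions. An in-memory dict stands in for the server's disk, and a one-slot `queue.Queue` stands in for the return message. The class name `DFSDaemon` and the method name `deliver` are illustrative.

```python
"""Sketch: DFS daemon that serves file-operation messages with pop-up threads.

Hypothetical: a dict stands in for the server's disk and a one-slot
queue.Queue stands in for the return message to the client.
"""
import queue
import threading
from typing import Any, Dict


class DFSDaemon:
    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.lock = threading.Lock()  # pop-up threads share the file table

    def _execute(self, op: str, name: str, arg: Any) -> Any:
        with self.lock:
            if op == "read":
                return self.files[name]
            if op == "write":
                self.files[name] = arg
                return len(arg)
            if op == "rename":
                self.files[arg] = self.files.pop(name)
                return arg
            if op == "delete":
                del self.files[name]
                return None
            if op == "status":
                return {"size": len(self.files[name])}
            raise ValueError(f"unknown operation {op!r}")

    def deliver(self, op: str, name: str, arg: Any = None) -> queue.Queue:
        """Called when a request message arrives: start a fresh thread for it
        rather than waking a server thread that was blocked waiting for work."""
        reply: queue.Queue = queue.Queue(maxsize=1)

        def pop_up() -> None:
            try:
                reply.put(("ok", self._execute(op, name, arg)))
            except Exception as exc:  # failures travel back in the reply
                reply.put(("error", repr(exc)))

        threading.Thread(target=pop_up, daemon=True).start()
        return reply
```

Usage: `d.deliver("write", "a.txt", b"hello").get(timeout=5)` yields `("ok", 5)`, and a read of a deleted file yields an `("error", ...)` reply instead of crashing the daemon. Because every request gets its own thread, no thread sits blocked waiting for work, which is the property the pop-up-thread slide emphasizes.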

Robustness
• To ensure that the system is robust, we must:
  - Detect failures.
    • link
    • site
  - Reconfigure the system so that computation may continue.
  - Recover when a site or a link is repaired.

Failure Detection: Handshaking Procedure
• At fixed intervals, sites A and B send each other an I-am-up message. If site A does not receive this message within a predetermined time period, it can assume that site B has failed, that the link between A and B has failed, or that the message from B has been lost.
• A can try to differentiate between these cases by sending B an Are-you-up? message.
• When site A sends the Are-you-up? message, it specifies a time interval during which it is willing to wait for the reply from B. If A does not receive B's reply within that interval, A may conclude that one or more of the following situations has occurred:
  - Site B is down.
  - The direct link (if one exists) from A to B is down.
  - The alternative path from A to B is down.
  - The message has been lost.
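The handshaking procedure is a timeout-based failure detector followed by a probe. The following is a minimal Python sketch from site A's point of view, and it rests on hypothetical assumptions. Time is passed in explicitly as plain numbers so the logic can be checked without real clocks, and message transport is not modeled. The names `Handshake`, `probe_result`, and `POSSIBLE_CAUSES` are illustrative.

```python
"""Sketch: I-am-up handshaking and the Are-you-up? probe, as seen by site A.

Hypothetical: time is given explicitly (seconds) and transport is not modeled.
"""
from typing import Dict, List, Optional

POSSIBLE_CAUSES = (
    "site B is down",
    "direct link A-B is down",
    "alternative path A-B is down",
    "message was lost",
)


class Handshake:
    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                 # predetermined waiting period
        self.last_seen: Dict[str, float] = {}  # site -> time of last I-am-up

    def i_am_up(self, site: str, now: float) -> None:
        """Record an I-am-up message received from `site` at time `now`."""
        self.last_seen[site] = now

    def suspects(self, now: float) -> List[str]:
        """Sites silent for longer than the timeout; A cannot yet tell whether
        the site failed, the link failed, or a message was lost."""
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.timeout)


def probe_result(sent_at: float, reply_at: Optional[float], interval: float) -> List[str]:
    """Outcome of an Are-you-up? probe sent at `sent_at`; `reply_at` is None
    if no reply arrived. Returns [] if B answered within `interval`, otherwise
    every situation A must still consider possible."""
    if reply_at is not None and reply_at - sent_at <= interval:
        return []
    return list(POSSIBLE_CAUSES)
```

Usage: with a 3-second timeout, a site last heard from at time 0 is a suspect at time 5. A late or missing reply to the probe leaves all four explanations open, which matches the slide's point that A can only conclude "one or more" of them.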

Reconfiguration
• A procedure that allows the system to reconfigure and to continue its normal mode of operation.
• If a direct link from A to B has failed, this information must be broadcast to every site in the system, so that the various routing tables can be updated accordingly.
• If it is believed that a site has failed (because it can no longer be reached), then every site in the system must be so notified, so that they will no longer attempt to use the services of the failed site.

Recovery from Failure
• When a failed link or site is repaired, it must be integrated into the system gracefully and smoothly.
• Suppose that a link between A and B has failed. When it is repaired, both A and B must be notified. We can accomplish this notification by continuously repeating the handshaking procedure.
• Suppose that site B has failed. When it recovers, it must notify all other sites that it is up again. Site B then may have to receive various information from the other sites to update its local tables.
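Reconfiguration means every site learns of a link or site failure and updates its routing table; recovery reverses the change once the repair is announced. The following is a minimal Python sketch, and it rests on hypothetical assumptions. A single shared topology object stands in for the broadcast, whereas in a real system each site would update its own copy on receiving the notice. Routing picks the next hop on a fewest-hops path found by breadth-first search, which is an illustrative choice the slides do not specify.

```python
"""Sketch: reconfiguration after link/site failure and recovery after repair.

Hypothetical: one shared topology stands in for every site's copy after a
broadcast; routes are next hops on fewest-hop paths found by BFS.
"""
from collections import deque
from typing import Deque, Dict, FrozenSet, Iterator, Set


class Network:
    def __init__(self, links: Set[FrozenSet[str]]) -> None:
        self.links = set(links)   # each link is an unordered pair of sites
        self.down: Set[str] = set()

    def neighbors(self, site: str) -> Iterator[str]:
        """Sites directly linked to `site` that are not known to be down."""
        for link in self.links:
            if site in link:
                (other,) = link - {site}
                if other not in self.down:
                    yield other

    def routing_table(self, src: str) -> Dict[str, str]:
        """Map each reachable site to the first hop from `src` (BFS)."""
        table: Dict[str, str] = {}
        frontier: Deque[str] = deque()
        for n in sorted(self.neighbors(src)):
            table[n] = n
            frontier.append(n)
        while frontier:
            cur = frontier.popleft()
            for n in sorted(self.neighbors(cur)):
                if n != src and n not in table:
                    table[n] = table[cur]  # reached via cur: same first hop
                    frontier.append(n)
        return table

    # Notices that, in a real system, would be broadcast to every site.
    def link_failed(self, a: str, b: str) -> None:
        self.links.discard(frozenset((a, b)))

    def link_repaired(self, a: str, b: str) -> None:
        self.links.add(frozenset((a, b)))

    def site_failed(self, s: str) -> None:
        self.down.add(s)

    def site_recovered(self, s: str) -> None:
        self.down.discard(s)
```

Usage: in a triangle A-B-C, A routes to C directly. After `link_failed("A", "C")`, A routes to C through B. If B then fails, A can reach no one. Repairing the link and recovering B restores the original table.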

Design Issues
• Transparency and locality: a distributed system should look like a conventional, centralized system and should not distinguish between local and remote resources.
• User mobility: bring the user's environment (e.g., home directory) to wherever the user logs in.
• Fault tolerance: the system should continue functioning, perhaps in a degraded form, when faced with various types of failures.

Design Issues
• Scalability: the system should adapt to increased service load.
• Large-scale systems: service demand from any system component should be bounded by a constant that is independent of the number of nodes.
• Servers' process structure: servers should operate efficiently in peak periods; use lightweight processes or threads.
