Duke Systems
Servers
and A Little Bit of Networking
Jeff Chase Duke University
Process
Thread Program
Files
I/O channels (“file descriptors”) stdin stdout stderr pipe tty socket
A process has multiple channels for data movement in and out of the process (I/O). The parent process and parent program set up and control the channels for a child (until exec). The channels are typed. Each channel is named by a file descriptor.
Cloud and Software-as-a-Service (SaaS) Rapid evolution, no user upgrade, no user data management. Agile/elastic deployment on clusters and virtual cloud utility infrastructure. Where is your application? Where is your data? Where is your OS? networked server “cloud”
Internet “cloud” server hosts with server applications client applications NIC device kernel network software client host
Data is sent on the network as messages called packets.
A socket is a buffered channel for passing data between processes over a network. socket client int sd = socket(<internet stream>); gethostbyname(“www.cs.duke.edu”); <make a sockaddr_in struct> <install host IP address and port> connect(sd, <sockaddr_in>); write(sd, “abcdefg”, 7); read(sd, ….);
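The client-side pseudocode above can be sketched as real C. This is an illustration only: the helper names `fill_addr` and `client_connect` are invented for this sketch, and it uses the classic `gethostbyname` call shown on the slide (modern code would prefer `getaddrinfo`).

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill in a sockaddr_in for a server at (host, port).
   Returns 0 on success, -1 if the name lookup fails.
   (Hypothetical helper name; gethostbyname is the classic API
   from the slide, though getaddrinfo is preferred today.) */
int fill_addr(struct sockaddr_in *addr, const char *host, int port)
{
    struct hostent *hp = gethostbyname(host);
    if (hp == NULL)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);       /* port in network byte order */
    memcpy(&addr->sin_addr, hp->h_addr_list[0], hp->h_length);
    return 0;
}

/* Connect a TCP socket to the server described by addr.
   Returns a connected socket descriptor, or -1 on error. */
int client_connect(const struct sockaddr_in *addr)
{
    int sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0)
        return -1;
    if (connect(sd, (const struct sockaddr *)addr, sizeof *addr) < 0)
        return -1;
    return sd;
}
```

Once `client_connect` returns a descriptor, the process uses ordinary `write`/`read` on it, exactly as on the slide.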
“GET /images/fish.gif HTTP/1.1” sd = socket(…); connect(sd, name); write(sd, request…); read(sd, reply…); close(sd); s = socket(…); bind(s, name); listen(s, 10); sd = accept(s); read(sd, request…); write(sd, reply…); close(sd); request reply client (initiator) server
connect(csd, <IP address and port>). For a client: connect the socket named by descriptor csd to a server at the specified IP address and port. Block until the connection is established.

bind(sd, <…port>). For a server: associate the socket named by descriptor sd with a port number reachable at an IP address of the host machine. Does not block, but may fail, e.g., if some other process is already bound to the port.

listen(sd, qsize). For a server: indicate that the socket named by descriptor sd is a server socket. When a connect request arrives for its port, establish the connection and place it on the accept queue (unless the accept queue is full). Listen does not block: it merely sets some parameters on the socket.

accept(sd, …). For a server: accept a connection from the accept queue for the server socket named by descriptor sd. Block if the accept queue is empty. Returns the IP address and port of the client for this connection, and a new socket descriptor csd for the connection.

Given a socket descriptor csd for an established connection (from a completed connect or accept) a process may use write (or send) to send bytes to the connection peer, and may use read (or recv) to receive bytes sent by the peer.
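All four calls can be exercised on the loopback interface within a single process, which makes the bind/listen/connect/accept sequence concrete. This is a sketch for illustration (the function name `loopback_demo` is invented), not production code: error handling is minimal and it relies on the kernel completing a loopback connect against the listen backlog before accept is called.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Exercise bind/listen/connect/accept on loopback, all in one process.
   Binding to port 0 asks the kernel to pick a free port; getsockname
   reports which one. Returns the number of bytes the server read
   back, or -1 on any error. */
int loopback_demo(char *buf, int buflen)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);    /* server socket */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                   /* kernel chooses the port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0)
        return -1;
    if (listen(s, 10) < 0)                      /* advertise the port */
        return -1;

    socklen_t len = sizeof addr;
    getsockname(s, (struct sockaddr *)&addr, &len);  /* learn the port */

    int c = socket(AF_INET, SOCK_STREAM, 0);    /* client socket */
    if (connect(c, (struct sockaddr *)&addr, sizeof addr) < 0)
        return -1;
    write(c, "hello", 5);                       /* client sends request */

    int sd = accept(s, NULL, NULL);             /* server takes connection */
    int n = read(sd, buf, buflen);              /* server reads request */

    close(sd); close(c); close(s);
    return n;
}
```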
...
struct sockaddr_in socket_addr;
sock = socket(PF_INET, SOCK_STREAM, 0);
memset(&socket_addr, 0, sizeof socket_addr);
socket_addr.sin_family = PF_INET;
socket_addr.sin_port = htons(port);
socket_addr.sin_addr.s_addr = htonl(INADDR_ANY);
if (bind(sock, (struct sockaddr *) &socket_addr, sizeof socket_addr) < 0) {
    perror("bind failed");
    exit(1);
}
listen(sock, 10);
while (1) {
    int acceptsock = accept(sock, NULL, NULL);
    forkme(acceptsock, prog, argv); /* fork/exec cat */
    close(acceptsock);
}
[Diagram: request/reply I/O path through server hardware and kernel — an inbound packet arrives (DMA) at the Network Interface; the kernel copies the request buffer to user space to satisfy a socket read. A disk read at the Disk Interface delivers data (DMA); the kernel copies the reply buffer. A socket write copies from the user buffer into a network buffer, then the kernel builds a packet and DMAs it out.]
packet queues listen queue accept queue Server application (Apache, Tomcat/Java, etc)
Server operations:
– create socket(s)
– bind to port number(s)
– listen to advertise port
– wait for client to arrive on port (select/poll/epoll of ports)
– accept client connection
– read or recv request
– write or send response
– close client socket
disk queue
user space socket per-process descriptor table kernel space “open file table”
Disclaimer: this drawing is oversimplified
pointer
There’s no magic here: processes use read/write (and other syscalls) to move data through sockets, just as they do for files and pipes; a socket can even be mapped onto stdin or stdout. Deeper in the kernel, sockets are handled differently from files, pipes, etc. Sockets are the entry/exit point for the network protocol stack.
int fd
pipe file tty global port table
Inbound traffic
Each socket is bound to a port: a number (16-bit integer) that is unique on that host.
– Source/dest port is named in every IP packet. – Kernel looks at port to demultiplex incoming traffic.
– We have to agree on well-known ports for common services – Look at /etc/services – Ports 1023 and below are ‘reserved’ and privileged: generally you must be root/admin/superuser to bind to them.
A client’s socket is bound to an ephemeral port assigned dynamically by the kernel.
The IP network carries data packets addressed to a destination node (host named by IP address) and port. Kernel network stack demultiplexes incoming network traffic: choose process/socket to receive it based on destination port.
Network adapter hardware, aka network interface controller (“NIC”). Incoming network packets. Apps with sockets.
[Diagram: thread lifecycle — ready queue and sleep queue; transitions: trap or fault, interrupt, sleep, wakeup, switch, return to user mode.]
Example 1: NIC interrupt wakes thread to receive incoming packets. Example 2: disk interrupt wakes thread when disk I/O completes. Example 3: clock interrupt wakes thread after N ms have elapsed.
Note: it isn’t actually the interrupt itself that wakes the thread, but the interrupt handler (software). The awakened thread must have registered for the wakeup before sleeping (e.g., by placing its TCB on some sleep queue for the event).
TCP/IP Client Network adapter Global IP Internet TCP/IP Server Network adapter Internet client host Internet server host Sockets interface (system calls) Hardware interface (interrupts) User code Kernel code Hardware and firmware Note: the “protocol stack” should not be confused with a thread stack. It’s a layering of software modules that implement network protocols: standard formats and rules for communicating with peers over a network.
Insert “Power of TCP/IP” slide, /usr/net/87. (The poster in my office)
The Internet concept wasn’t an obvious winner — not to everyone. It had to be marketed, even within the tech community. In 1986, the US National Science Foundation (NSF) funded the backbone network that grew into the commercial Internet (then NSFNET). IP support in sockets (Berkeley Unix) was widely used among academics. The driving force for adopting TCP/IP was a collection of Unix-based systems arrayed against a few large companies with their own proprietary network standards.
[Diagram: TCP implementation across a network path — sender side: the TCP user (application) supplies data from user transmit buffers; the TCP/IP protocol sender (COMPLETE SEND, checksum) copies it into TCP send buffers (optional) and a transmit queue, emitting packets. Receiver side: inbound packets enter a receive queue; the TCP/IP protocol receiver (COMPLETE RECEIVE, checksum) fills TCP rcv buffers (optional) and user receive buffers. Per-connection state lives in the TCB; acks flow back to the sender, and the window governs the data flow.]
– Integrity: packets are covered by a checksum to detect errors.
– Reliability: receiver acks received data, sender retransmits if needed.
– Ordering: packets/bytes have sequence numbers, and receiver reassembles.
– Flow control: receiver tells sender how much / how fast to send (window).
– Congestion control: sender “guesses” current network capacity on path.
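The integrity checksum mentioned above is the standard Internet checksum (RFC 1071) used by IP, UDP, and TCP: sum the data as 16-bit words in one's-complement arithmetic and complement the result. This sketch shows just the core algorithm (real TCP also covers a pseudo-header, which is omitted here).

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum (RFC 1071). Sum the data as
   16-bit big-endian words, fold carries back into the low 16 bits,
   and complement. A receiver recomputes the sum over the received
   bytes; a mismatch means the packet was damaged in transit. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {                   /* sum 16-bit words */
        sum += (uint32_t)(data[0] << 8 | data[1]);
        data += 2;
        len -= 2;
    }
    if (len == 1)                       /* pad a trailing odd byte */
        sum += (uint32_t)(data[0] << 8);
    while (sum >> 16)                   /* fold carries into low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;              /* one's complement */
}
```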
Illustration only
ICANN: a US nonprofit organization that is responsible for the coordination of the namespaces of the Internet, and ensuring the network's stable and secure operation. IANA: a department of ICANN.
read(fd, buf, len): read len bytes from the I/O object named by the descriptor fd and store it in the process VAS at address buf. Block until the data is available:
– For a file: the calling thread blocks awaiting a notify triggered by the disk interrupt handler.
– For a socket: the calling thread blocks awaiting a notify triggered by the NIC interrupt handler after the packet arrives. (The recv syscall is equivalent.)
– For a pipe: the calling thread blocks awaiting a notify triggered by the write (like soda machine).
If the object named by fd will never produce the data, then read returns an EOF: end of file. A read returns the number of bytes transferred: zero → EOF.
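The EOF convention is easy to demonstrate with a pipe: once the write end is closed and the kernel buffer drains, read returns zero. A small self-contained sketch (the function name `drain_pipe_demo` is invented for illustration):

```c
#include <unistd.h>

/* Demonstrate read() semantics and EOF on a pipe. The writer's bytes
   are buffered in the kernel; read() drains them, and returns 0 (EOF)
   once the write end is closed and the buffer is empty.
   Returns total bytes read before EOF, or -1 on error. */
int drain_pipe_demo(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    write(fds[1], "abc", 3);    /* producer writes into the pipe */
    write(fds[1], "de", 2);
    close(fds[1]);              /* no more writers: reads will see EOF */

    char buf[8];
    int total = 0, n;
    /* read returns the number of bytes transferred: zero means EOF */
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    return (n == 0) ? total : -1;   /* n == 0 confirms clean EOF */
}
```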
write(fd, buf, len): write len bytes to the I/O object named by the descriptor fd, fetching them from the process VAS at address buf. Generally, the write call is asynchronous / nonblocking: it returns immediately, and completes when the data arrives at the destination some time later. But write blocks when some bound on the buffer memory is reached:
– For a file: the calling thread blocks until “enough” of those writes reach the disk.
– For a socket: flow control throttles the sender to the receiver; e.g., the calling thread blocks until the receiving process consumes “enough” of that data. (The send syscall is equivalent.)
– For a pipe: the calling thread blocks until a reader consumes “enough” bytes, freeing up sufficient kernel buffer space for the write to complete.
If bytes written to the object named by fd will never be consumed, then write returns an error. E.g., this may occur if the receiver(s) closed its end of the pipe or socket, or the network is unreachable.
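One consequence of these buffering rules: a write may transfer fewer bytes than requested (a "short write"), so robust code loops. A common sketch (the name `write_all` is conventional but not from the slides):

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Loop until all len bytes are accepted by the kernel or a real
   error occurs, e.g., when a socket's send buffer is nearly full and
   write() takes only part of the request. Returns len on success,
   -1 on error (e.g., EPIPE: the receiver closed its end). */
ssize_t write_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, buf + sent, len - sent);
        if (n < 0) {
            if (errno == EINTR)     /* interrupted by a signal: retry */
                continue;
            return -1;              /* e.g., EPIPE: receiver is gone */
        }
        sent += (size_t)n;
    }
    return (ssize_t)sent;
}
```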
– Synchronous I/O: calling process blocks until the operation is “complete”.
– Process or file descriptor (e.g., file or socket)
– Oops, that didn’t really help.
sets of sockets or other file descriptors.
– select was slow for large poll sets. Now we have various variants: poll, epoll, kqueue. None are ideal.
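A minimal sketch of the poll() interface these variants descend from: an array of pollfd structs, one per descriptor of interest. This helper (`first_ready` is an invented name for illustration) blocks until some descriptor is readable or a timeout expires — the core of a server waiting on many listener/client sockets at once.

```c
#include <poll.h>

/* Wait for readiness on a set of descriptors with poll(). Unlike
   select(), poll() takes an array of pollfd structs and has no
   FD_SETSIZE limit, though it still scans the whole set on each call
   (epoll/kqueue avoid that rescan). Returns the index of the first
   ready descriptor, or -1 on timeout/error. */
int first_ready(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd pfds[16];
    if (nfds > 16)
        return -1;
    for (int i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;    /* wake when data can be read */
        pfds[i].revents = 0;
    }
    int n = poll(pfds, (nfds_t)nfds, timeout_ms);  /* blocks up to timeout */
    if (n <= 0)
        return -1;
    for (int i = 0; i < nfds; i++)
        if (pfds[i].revents & POLLIN)
            return i;
    return -1;
}
```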
Network services run as processes that listen for requests arriving on ports. Each network service receives requests at a designated port number (assigned and standardized by IANA). See /etc/services Inetd forks service processes on demand (lazily) on incoming connect requests for their ports.
What if no process is listening on the destination port for a request?
Before After
inetd, the “internet daemon”, listens for connections on behalf of standard services.
– Standard services and ports listed in /etc/services.
– inetd listens on all the ports and accepts connections.
– On an incoming connection, inetd forks a child process.
– The child execs the service program configured for the port.
– The child serves the connection, then exits.
[Diagram: Master Server / inetd over the TCP/IP Comm. Service — Listen at Server Ports (listen(), select()); Receive Request at Server Port; create Server Process (fork(), exec()). The (Child) Server then Gets the Connection for the Request (accept()), establishes the connection at a new port, Communicates with the Client, and Terminates.]
[Apache Modeling Project: http://www.fmc-modeling.org/projects/apache]
request reply
Remote Procedure Call (RPC) is one common example of this pattern. The Web is another.
– */c-samples/buggyserver.c
This server is vulnerable to a stack smash attack (previously discussed). A buffer overflow lets an attacker inject code chosen by the attacker, to “own” the web server.
crack it you get the points.
world…and it never stops.
– NX: no-execute segments. The classic attack injects code onto the stack and overwrites a frame’s return address to branch to the injected code. NX makes this harder by disabling execute privilege on the stack segment. Any attempt to execute the attacker’s code (or any code) on the stack generates a fault.
– ASLR: address space layout randomization. The attacker guesses where the stack resides in order to overwrite a frame’s return address to branch to injected code. Randomizing the layout makes this harder.
struct sockaddr_in socket_addr;
sock = socket(PF_INET, SOCK_STREAM, 0);
int on = 1;
setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
memset(&socket_addr, 0, sizeof socket_addr);
socket_addr.sin_family = PF_INET;
socket_addr.sin_port = htons(port);
socket_addr.sin_addr.s_addr = htonl(INADDR_ANY);
if (bind(sock, (struct sockaddr *)&socket_addr, sizeof socket_addr) < 0) {
    perror("couldn't bind");
    exit(1);
}
listen(sock, 10);
Illustration only
while (1) {
    int acceptsock = accept(sock, NULL, NULL);
    char *input = (char *)malloc(1024*sizeof (char));
    recv(acceptsock, input, 1024, 0);
    int is_html = 0;
    char *contents = handle(input, &is_html);
    free(input);
    …send response…
    close(acceptsock);
}
If a server is listening on only one port/socket (“listener”), then it can skip the select/poll/epoll.
Illustration only
const char *resp_ok = "HTTP/1.1 200 OK\nServer: BuggyServer/1.0\n";
const char *content_html = "Content-type: text/html\n\n";
send(acceptsock, resp_ok, strlen(resp_ok), 0);
send(acceptsock, content_html, strlen(content_html), 0);
send(acceptsock, contents, strlen(contents), 0);
send(acceptsock, "\n", 1, 0);
free(contents);
Illustration only
– Many clients send requests “at the same time”.
– Don’t leave a server CPU idle if there is a request to work on.
– Unix had single-threaded processes and blocking syscalls. – If a process blocks it can’t do anything else until it wakes up.
Event-driven programming is a design pattern for a thread’s program.
The thread receives and handles a sequence of typed messages or events.
– Handle one event at a time, in order.
– Never block, except to get the next event.
– Blocks only if no events to handle (idle).
The program registers handler routines for the event types.
– The thread upcalls the handler to dispatch or “handle” each event.
If a handler blocks, the thread becomes unresponsive to events.
events
Dispatch events by invoking handlers (upcalls).
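The pattern of a handler table plus a dispatch loop can be sketched in a few lines. Everything here is hypothetical for illustration (the event types, `register_handler`, `run_event_loop`); a real loop would block on poll or a message queue instead of walking an array.

```c
#include <stddef.h>

/* A miniature event-driven dispatch loop. Events are typed; the
   program registers one handler routine per type, and the loop
   "upcalls" the matching handler for each event, one at a time,
   in order. The array of events stands in for a blocking
   get-next-event call. */
enum ev_type { EV_CONNECT, EV_REQUEST, EV_CLOSE, EV_NTYPES };

struct event {
    enum ev_type type;
    int arg;
};

typedef void (*handler_fn)(struct event *);

static handler_fn handlers[EV_NTYPES];  /* handler table, by type */

void register_handler(enum ev_type t, handler_fn fn)
{
    handlers[t] = fn;
}

/* Handle one event at a time; handlers must never block. */
void run_event_loop(struct event *queue, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (handlers[queue[i].type] != NULL)
            handlers[queue[i].type](queue + i);   /* upcall */
}

/* Sample handler: tallies request events it is upcalled with. */
static int requests_handled;
static void on_request(struct event *e) { requests_handled += e->arg; }
```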
Accept Client Connection Read HTTP Request Header Find File Send HTTP Response Header Read File Send Data
may block waiting on disk I/O We want to be able to process requests concurrently. may block waiting on network
[Timelines: R1 arrives → Receive R1 → Disk request 1a; R2 arrives; 1a completes → R1 completes → Receive R2. The variants contrast serial handling with overlapped handling, where the server starts and finishes 1a while also receiving R2.]
Systems combine events with processes/threads to build responsive/efficient systems.
Kernels are event-driven (interrupts, callbacks, event queues, etc.). Example: Windows I/O driver stack is a highly flexible event-driven system.
A thread can also start an operation and poll for its completion rather than blocking on I/O. E.g., polling APIs like waitpid() with WNOHANG. This is an event-driven model: notify thread by an event when op completes.
– Event-driven model is natural for GUIs, servers. – But to use multiple cores effectively, we need multiple threads. And every system today is a multicore system. – Multi-threading also enables use of blocking APIs without compromising responsiveness of other threads in the program.
Accept Conn Read Request Find File Send Header Read File Send Data Accept Conn Read Request Find File Send Header Read File Send Data Process 1 Process N
separate address spaces
Accept Conn Read Request Find File Send Header Read File Send Data Accept Conn Read Request Find File Send Header Read File Send Data Thread 1 Thread N
This structure might have lower cost than the multi-process architecture if threads are “cheaper” than processes.
Incoming request queue
worker loop
Handle one request, blocking as necessary. When request is complete, return to worker pool. Magic elastic worker pool Resize worker pool to match incoming request load: create/ destroy workers as needed. dispatch idle workers Workers wait here for next request dispatch. Workers could be processes or threads.
Poll()
Each event arrives (by notification), in its entirety, ready for service (dispatch).
Multiple events may be returned through a single call to poll.
Events may be dispatched to multiple threads (→ handlers must be thread-safe as well).
Incoming event queue
worker loop
Handle one event, blocking as necessary. When handler is complete, return to worker pool. We can synchronize an event queue with a monitor: a mutex/CV pair. Protect the event queue data structure itself with the mutex. dispatch threads waiting on CV Workers wait on the CV for next event if the event queue is empty. Signal the CV when a new event arrives. This is a producer/consumer problem.
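The monitor-synchronized event queue described above can be sketched with pthreads. This is a sketch under assumptions (names like `post_event`, `next_event`, and the -1 shutdown sentinel are invented for illustration; the fixed-size ring does not block producers when full):

```c
#include <pthread.h>

/* An event queue guarded by a monitor (mutex + condition variable):
   workers wait on the CV when the queue is empty; producers signal
   the CV when an event arrives. Events are just ints here. */
#define QCAP 64

static int q[QCAP];
static int q_head, q_tail, q_count;
static pthread_mutex_t q_mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cv = PTHREAD_COND_INITIALIZER;

void post_event(int ev)              /* producer */
{
    pthread_mutex_lock(&q_mx);
    q[q_tail] = ev;
    q_tail = (q_tail + 1) % QCAP;
    q_count++;
    pthread_cond_signal(&q_cv);      /* wake one waiting worker */
    pthread_mutex_unlock(&q_mx);
}

int next_event(void)                 /* consumer: blocks if queue empty */
{
    pthread_mutex_lock(&q_mx);
    while (q_count == 0)             /* loop: recheck after wakeup */
        pthread_cond_wait(&q_cv, &q_mx);
    int ev = q[q_head];
    q_head = (q_head + 1) % QCAP;
    q_count--;
    pthread_mutex_unlock(&q_mx);
    return ev;
}

/* Worker loop: handle one event at a time; -1 means "exit". */
static long handled_sum;             /* guarded by q_mx in handler */
void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        int ev = next_event();
        if (ev < 0)
            break;
        pthread_mutex_lock(&q_mx);
        handled_sum += ev;           /* "handle" the event */
        pthread_mutex_unlock(&q_mx);
    }
    return 0;
}
```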
handler handler handler
handle any kind of asynchronous event.
– Arriving input (e.g., GUI clicks/swipes, requests to a server) – Notify that an operation started earlier is complete
– Subscribe to events published/posted by other threads – Including status of children: stop/exit/wait, signals, etc.
drives any kind of action in the receiving thread. – In Android: intents, binder RPC, UI events
thread requests do not block; the request returns immediately (“asynchronous”) and delivers a completion event later.
Android starts a main thread for an application when any of its components is first instantiated.
The main thread is event-driven, e.g., by User Interface (UI) events.
– Also called the “UI thread”.
– UI toolkit code is not thread-safe, so it should execute only on the UI thread.
– UI thread should not block (except for next event), or app becomes unresponsive.
The main thread launches and tears down components, and receives various upcalls.
An app may create additional threads (workers) for other uses.
events
Three examples/models for use of threads in Android.
1. Main thread (UI thread): receives UI events and other upcall events on a single incoming message queue. Illustrates event-driven pattern: thread blocks only to wait for next event.
2. ThreadPool: an elastic pool of threads that handle incoming calls from clients: Android supports “binder” request/response calls from other processes. When a call arrives, a thread from the pool receives it, handles it, responds to it, and returns to the pool to wait for the next request.
3. AsyncTask: the main thread can create an AsyncTask thread to perform some long-running activity without blocking the UI thread. The AsyncTask thread sends progress updates to the main thread.
These patterns are common in many other systems as well.
Adapted from http://developer.android.com/guide/components/processes-and-threads.html Summary: By default, all components of the same application run in the same process and thread (called the "main" thread). The main thread controls the UI, so it is also called the UI thread. If the UI thread blocks then the application stops responding to the user. You can create additional background threads for operations that block, e.g., I/O, to avoid doing those operations on the UI thread. The background threads can interact with the UI by posting messages/tasks/events to the UI thread. Details: When an application component starts and the application does not have any other components running, the Android system starts a new Linux process for the application with a single thread of execution called the main thread. All components that run in the same process are initialized by its main thread, and system upcalls to those components (onCreate, onBind, onStart,…) run on the main thread. The main thread is also called the UI thread. It is in charge of dispatching events to user interface widgets and interacting with elements of the Android UI toolkit. For instance, when the user touches a button on the screen, your app's UI thread dispatches the touch event to the widget, to set its pressed state and redraw itself. If you have operations that might require blocking, e.g., to perform I/O like network communication or database access, you should run them in separate threads. A thread that is not the UI thread is called a background thread or "worker" thread.
Adapted from http://developer.android.com/guide/components/processes-and-threads.html Your app should never block the UI thread. When the UI thread is blocked, no events can be dispatched, including drawing events. From the user's perspective, the application appears to “hang” or “freeze”. Even worse, if the app blocks the UI thread for more than a few seconds, Android presents the user with the infamous "application not responding" (ANR) dialog. The user might then decide to quit your application and uninstall it. In a correct Android program the UI thread blocks only to wait for the next event, when it has nothing else to do (it is idle). If you have an operation to perform that might block for any other reason, then you should arrange for a background/worker thread to do it. Additionally, the Android UI toolkit is not thread-safe: if multiple threads call a module that is not thread-safe, then the process might crash. A correct app manipulates the user interface only from a single thread, the UI thread. So: your app must not call UI widgets from a worker thread. So how can a worker thread interact with the UI, e.g., to post status updates? Android provides mechanisms for worker threads to post messages/tasks/events to the UI thread.
Note: this concept of a single event-driven main/UI thread appears in other systems too.
[http://techtej.blogspot.com/2011/03/android-thread-constructs-part-3.html]
A common server structure is multi-threaded servers with a thread pool. We can implement it with our thread primitives.
For modern apps, byte streams just aren’t enough. Modern apps have GUIs and likely interact with one or more servers. And they need to be responsive.
Apps receive events from the GUI, from the OS, and from other apps. So their threading models are similar to servers.
modern distributed systems.
response services.”
networks (e.g., Network File System). client [sockets] server [sockets] “glue”
This code is “canned”, independent of the specific application. Auto-generate this “stub” code from API spec (IDL). Humans focus on getting this code right.
Remote Procedure Call (RPC) is request/response interaction through a published API, using IPC messaging to cross an inter- process boundary. API stubs
RPC is used in many standard Internet services. It is also the basis for component frameworks like DCOM, CORBA, and Android. Software is packaged into named “objects” or components. Components may publish interfaces and/or invoke published interfaces of other components. Components may execute in different processes and/or on different nodes. Stubs are generated from an Interface Description Language (IDL). Establishing an RPC connection to a named remote interface is called binding.
[OpenGroup, late 1980s] Q: How do we manage these “call threads”? A: Create them as needed, and keep idle threads in a thread pool. When an RPC call arrives, wake up an idle thread from the pool to handle it. On the client, the client thread blocks until the server thread returns a response.
Implementing RPC
Birrell/Nelson 1984
Stubs link with the client/server code to “hide” the boundary crossing. – They “marshal” args/results – i.e., translate to/from some standard network stream format – Also known as linearize, serialize – …or “flatten” – Propagate PL-level exceptions – Stubs are auto-generated from an Interface Description Language (IDL) file by a stub compiler tool at software build time, and linked in. – Client and server must agree on the protocol signatures in the IDL file.
– RPC stubs are similar to system call stubs, but they do more than just trap to the kernel. – The RPC stubs construct/deconstruct a message transmitted through a messaging system. – Binder is an example of such a messaging system, implemented as a Linux kernel plug-in module (a driver) and some user-space libraries.
application’s RPC API written in an Interface Description Language.
– Looks like any interface definition… – List of method names and argument/result types and signatures. – Stub code marshals arguments into request message, marshals results into a reply message.
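What the generated stub's "marshaling" amounts to can be shown concretely: flatten typed arguments into a byte stream in an agreed wire format, and rebuild them on the other side. The format and function names here are made up for illustration (a 32-bit big-endian integer, then a 16-bit length and the string bytes); a real stub compiler emits equivalent code from the IDL.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Client-side stub work: marshal (flatten/serialize) the arguments
   (arg, s) into buf in network byte order. Returns bytes written. */
size_t marshal_call(uint8_t *buf, int32_t arg, const char *s)
{
    uint32_t n32 = htonl((uint32_t)arg);        /* network byte order */
    uint16_t len = htons((uint16_t)strlen(s));
    memcpy(buf, &n32, 4);
    memcpy(buf + 4, &len, 2);
    memcpy(buf + 6, s, strlen(s));
    return 6 + strlen(s);
}

/* Server-side stub work: deconstruct the message back into typed
   arguments before upcalling the real procedure. */
void unmarshal_call(const uint8_t *buf, int32_t *arg, char *s, size_t smax)
{
    uint32_t n32;
    uint16_t len;
    memcpy(&n32, buf, 4);
    memcpy(&len, buf + 4, 2);
    *arg = (int32_t)ntohl(n32);
    size_t n = ntohs(len);
    if (n >= smax)                  /* don't overflow the caller's buffer */
        n = smax - 1;
    memcpy(s, buf + 6, n);
    s[n] = '\0';
}
```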
Android Architecture and Binder Dhinakaran Pandiyan Saketh Paranjape
Do RPC calls ever fail?
Full network transparency is neither achievable nor desirable. If the client cannot reach the server, you may want it to fail, so the app can decide if/how to report the problem to the user, as opposed to (say) hanging forever. Generally, the problem of stalled network communication is a major reason why thread systems provide thread alert APIs and timeouts on CV wait. RPC systems built over sockets (such as Java RMI) can be expected to reflect any socket-level errors back to the process. For example, if a server process has failed (its port is no longer bound), but its host is still up, then its host kernel rejects an incoming TCP connection request with "connection refused”. The RPC client should report this and other errors back to the app. For example, a socket connect request also fails if the network is unreachable. On such failures the RPC system may retry or it may report the error. Timeouts are a configuration issue. Defaults should be reasonable, but they vary.
JVM+lib
Linux kernel Activity Manager Service etc. Android services and libraries communicate by sending messages through shared-memory channels set up by binder.
JVM+lib
Android binder A client binds to a service.
Bindings are reference-counted.
Services register to advertise for clients.
an add-on kernel driver for /dev/binder object RPC
A tutorial on socket programming, covering most of the ground in this deck:
– http://www.cs.dartmouth.edu/~campbell/cs50/socketprogramming.html
– http://pages.cs.wisc.edu/~remzi/OSTEP/threads-events.pdf
Distributed Systems: Principles and Paradigms.
– http://www.cs.vu.nl/~ast/books/ds1/02.pdf
– Each server process runs concurrently with other processes.
– While one request waits (blocks), the server can make progress on other requests.
– A server can handle requests serially, or can fork a process per request.
– Tradeoffs?
– inetd “internet daemon” for standard /etc/services – Design pattern for (Web) servers: “prefork” a fixed number of worker processes.
models for concurrency.
require effective concurrent handling of requests.
event handling.
noise.
Shells face similar problems in tracking their children, which execute independently (asynchronously).
– Example: wait*(WNOHANG). But you have to keep asking to know when a child changes state. (polling) – What about starting asynchronous operations, like a read? How to know when it is done without blocking?
– Signals? E.g., SIGCHLD: “Your child has died.” – Interrupted syscalls w/ EINTR: “Look: something happened.”
but how to structure their interactions?