SLIDE 1

Servers: Concurrency and Performance

Jeff Chase, Duke University

HTTP Server

  • HTTP Server
    – Creates a socket (socket)
    – Binds to an address
    – Listens to set up the accept backlog
    – Can call accept to block waiting for connections
    – (Can call select to check for data on multiple sockets)

  • Handle request
    – GET /index.html HTTP/1.0\n <optional body, multiple lines>\n \n
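The socket/bind/listen/accept sequence above can be sketched with Python's socket module (an illustrative minimal setup, not code from the course; the host, port, and backlog values are arbitrary):

```python
import socket

# Create a TCP socket, bind it to an address, and set up the listen
# backlog, mirroring the socket()/bind()/listen() steps on the slide.
def make_listening_socket(host="127.0.0.1", port=0, backlog=16):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))   # bind to an address (port 0 = any free port)
    s.listen(backlog)      # set up the accept backlog
    return s

if __name__ == "__main__":
    srv = make_listening_socket()
    print("listening on", srv.getsockname())
    # accept() would now block until a client connects:
    #   conn, addr = srv.accept()
    srv.close()
```

From here the server can call accept() to block waiting for connections, or select() to check several sockets at once.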

Inside your server

Figure: packet queues → listen queue → accept queue → server application (Apache, Tomcat/Java, etc.)

Measures

  • Offered load
  • Response time
  • Throughput
  • Utilization

Example: Video On Demand

Client() {
    fd = connect("server");
    write(fd, "video.mpg");
    while (!eof(fd)) {
        read(fd, buf);
        display(buf);
    }
}

Server() {
    while (1) {
        cfd = accept();
        read(cfd, name);
        fd = open(name);
        while (!eof(fd)) {
            read(fd, block);
            write(cfd, block);
        }
        close(cfd);
        close(fd);
    }
}

[MIT/Morris]

How many clients can the server support? Suppose, say, 200 kb/s video on a 100 Mb/s network link?

Performance “analysis”

  • Server capacity:
    – Network (100 Mbit/s)
    – Disk (20 Mbyte/s)
  • Obtained performance: one client stream
  • Server is limited by software structure
  • If a video is 200 Kbit/s, the server should be able to support more than one client.

[MIT/Morris]

500?
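The "500?" comes from dividing the link rate by the per-stream rate (assuming the 100 Mb/s network link is the only constraint and ignoring protocol overhead):

```python
# Back-of-the-envelope capacity bound for the video-on-demand server.
link_bps = 100_000_000   # 100 Mb/s network link
stream_bps = 200_000     # 200 kb/s per video stream

max_clients = link_bps // stream_bps
print(max_clients)       # the network alone allows 500 concurrent streams
```

The single-process server above achieves far fewer, because its structure serves one stream at a time.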

WebServer Flow

Figure: TCP socket space on hosts 128.36.232.5 and 128.36.230.2, with three sockets:
  state: listening; address: {*.6789, *.*}; completed connection queue; sendbuf; recvbuf
  state: listening; address: {*.25, *.*}; completed connection queue; sendbuf; recvbuf
  state: established; address: {128.36.232.5:6789, 198.69.10.10:1500}; sendbuf; recvbuf

Create ServerSocket
connSocket = accept()
read request from connSocket
read local file
write file to connSocket
close connSocket

Discussion: what does each step do, and how long does it take?

SLIDE 2

Web Server Processing Steps

Accept Client Connection → Read HTTP Request Header → Find File → Send HTTP Response Header → Read File → Send Data

Some steps may block waiting on disk I/O, others waiting on the network. Want to be able to process requests concurrently.

Process States and Transitions

Figure: process states (running (user), running (kernel), ready, blocked) with transitions Run, Yield, Sleep, Wakeup; interrupt/exception; trap/return.

Server Blocking

  • accept() when no connect requests are waiting on the listen queue
    – What if the server has multiple ports to listen on?
      • E.g., 80 for HTTP, 443 for HTTPS
  • open/read/write on server files
  • read() on a socket, if the client is sending too slowly
  • write() on a socket, if the client is receiving too slowly
    – Yup, TCP has flow control like pipes

What if the server blocks while serving one client, and another client has work to do?

Under the Hood

Figure: queueing network: requests start (arrival rate λ), circulate between the CPU and I/O device (I/O request / I/O completion), and exit (throughput λ until some center saturates).

Concurrency and Pipelining

Figure: before, each request's CPU, disk, and network stages run serially; after, stages of different requests overlap in a pipeline.

Better single-server performance

  • Goal: run at the server's hardware speed
    – Disk or network should be the bottleneck
  • Method:
    – Pipeline blocks of each request
    – Multiplex requests from multiple clients
  • Two implementation approaches:
    – Multithreaded server
    – Asynchronous I/O

[MIT/Morris]

SLIDE 3

Concurrent threads or processes

  • Using multiple threads/processes
    – so that only the flow processing a particular request is blocked
    – Java: extends Thread or implements the Runnable interface

Example: a multithreaded web server, which creates a thread for each request

Multiple Process Architecture

  • Advantages
    – Simple programming while addressing the blocking issue
  • Disadvantages
    – Many processes; large context-switch overheads
    – Consumes much memory
    – Optimizations that share information among processes (e.g., caching) are harder

Figure: Process 1 … Process N, each in a separate address space, each running the full pipeline (Accept Conn → Read Request → Find File → Send Header → Read File → Send Data).

Using Threads

  • Advantages
    – Lower context-switch overheads
    – Shared address space simplifies optimizations (e.g., caches)
  • Disadvantages
    – Need kernel-level threads (why?)
    – Some extra memory needed to support multiple stacks
    – Need thread-safe programs, synchronization

Figure: Thread 1 … Thread N in one shared address space, each running the full pipeline (Accept Conn → Read Request → Find File → Send Header → Read File → Send Data).

Threads

  • A thread is a schedulable stream of control,
    – defined by CPU register values (PC, SP)
    – suspend: save register values in memory
    – resume: restore registers from memory
  • Multiple threads can execute independently:
    – They can run in parallel on multiple CPUs (physical concurrency)…
    – …or arbitrarily interleaved on a single CPU (logical concurrency).
  • Each thread must have its own stack.

Multithreaded server

server() {
    while (1) {
        cfd = accept();
        read(cfd, name);
        fd = open(name);
        while (!eof(fd)) {
            read(fd, block);
            write(cfd, block);
        }
        close(cfd);
        close(fd);
    }
}

for (i = 0; i < 10; i++)
    threadfork(server);

  • When a thread waits for I/O, the thread scheduler runs another thread
  • What about references to shared data?
    – Synchronization

[MIT/Morris]
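A rough Python equivalent of the threadfork loop above, as one possible sketch (the request format, a file name per connection, and the pool of 10 workers mirror the pseudocode; everything else is illustrative):

```python
import socket
import threading

def worker(srv):
    # Each worker blocks in accept(); while one thread waits on I/O,
    # the scheduler can run another.
    while True:
        cfd, _ = srv.accept()
        name = cfd.recv(1024).decode().strip()  # read requested file name
        try:
            with open(name, "rb") as f:
                while True:
                    block = f.read(4096)
                    if not block:
                        break
                    cfd.sendall(block)          # stream the file back
        except OSError:
            pass                                # bad file name: just close
        cfd.close()

def serve(srv, nthreads=10):
    # for (i = 0; i < 10; i++) threadfork(server);
    for _ in range(nthreads):
        threading.Thread(target=worker, args=(srv,), daemon=True).start()
```

Any data shared across workers (caches, counters) would need synchronization, e.g. a threading.Lock.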

Event-Driven Programming

  • One execution stream: no CPU concurrency.
  • Register interest in events (callbacks).
  • Event loop waits for events, invokes handlers.
  • No preemption of event handlers.
  • Handlers generally short-lived.

Figure: an event loop dispatching to event handlers.

[Ousterhout 1995]

SLIDE 4

Single Process Event Driven (SPED)

  • Single threaded
  • Asynchronous (non-blocking) I/O
  • Advantages

– Single address space
– No synchronization

  • Disadvantages

– In practice, disk reads still block

Figure: a single event dispatcher drives all stages: Accept Conn → Read Request → Find File → Send Header → Read File → Send Data.

Asynchronous Multi-Process Event Driven (AMPED)

  • Like SPED, but use helper processes/thread for disk I/O
  • Use IPC to communicate with helper process
  • Advantages

– Shared address space for most web server functions
– Concurrency for disk I/O

  • Disadvantages

– IPC between main thread and helper threads

Figure: an event dispatcher drives the request stages (Accept Conn → Read Request → Find File → Send Header → Read File → Send Data), with helper processes handling disk I/O.

This hybrid model is used by the "Flash" web server.

Event-Based Concurrent Servers Using I/O Multiplexing

  • Maintain a pool of connected descriptors.
  • Repeat the following forever:
    – Use the Unix select function to block until:
      • (a) a new connection request arrives on the listening descriptor, or
      • (b) new data arrives on an existing connected descriptor.
    – If (a), add the new connection to the pool of connections.
    – If (b), read any available data from the connection.
      • Close the connection on EOF and remove it from the pool.

[CMU 15-213]
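One way the select loop above might look in Python (an echo-style sketch for illustration; a real web server would parse HTTP requests instead of echoing):

```python
import select
import socket

def event_loop(listener):
    # Pool of connected descriptors, plus the listening socket.
    pool = [listener]
    while True:
        readable, _, _ = select.select(pool, [], [])
        for s in readable:
            if s is listener:
                # (a) new connection request on the listening descriptor
                conn, _ = s.accept()
                pool.append(conn)
            else:
                # (b) new data on an existing connected descriptor
                data = s.recv(4096)
                if data:
                    s.sendall(data)   # echo back whatever arrived
                else:
                    pool.remove(s)    # EOF: remove from the pool
                    s.close()
```

A single thread multiplexes all connections; no handler may block, or every client stalls.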

Select

  • If a server has many open sockets, how does it know when one of them is ready for I/O?

int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

  • select has scalability issues: alternative event interfaces have been offered.

Asynchronous I/O

struct callback {
    bool (*is_ready)();
    void (*cb)(arg);
    void *arg;
}

main() {
    while (1) {
        for (c = each callback) {
            if (c->is_ready())
                c->cb(c->arg);
        }
    }
}

  • Code is structured as a collection of handlers
  • Handlers are nonblocking
  • Create new handlers for blocking operations
  • When an operation completes, call its handler

[MIT/Morris]
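The same structure can be sketched in Python, with is_ready/cb/arg mirroring the struct fields (the one-shot removal of fired callbacks is an added assumption so the loop can terminate):

```python
# A callback pairs a readiness test with a handler, as in the C struct:
#   { bool (*is_ready)(); void (*cb)(arg); void *arg; }
callbacks = []

def register(is_ready, cb, arg):
    callbacks.append((is_ready, cb, arg))

def event_loop():
    # main(): repeatedly poll every callback; run the ready ones.
    while callbacks:
        for entry in list(callbacks):
            is_ready, cb, arg = entry
            if is_ready():
                callbacks.remove(entry)  # one-shot: fire, then deregister
                cb(arg)
```

A handler that needs to block instead registers a new callback for the pending operation and returns.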

Asynchronous server

init() {
    on_accept(accept_cb);
}
accept_cb(cfd) {
    on_readable(cfd, name_cb);
}
on_readable(fd, fn) {
    c = new callback(test_readable, fn, fd);
    add c to callback list;
}
name_cb(cfd) {
    read(cfd, name);
    fd = open(name);
    on_readable(fd, read_cb);
}
read_cb(cfd, fd) {
    read(fd, block);
    on_writeable(cfd, write_cb);
}
write_cb(cfd, fd) {
    write(cfd, block);
    on_readable(fd, read_cb);
}

[MIT/Morris]

SLIDE 5

Multithreaded vs. Async

Multithreaded:
  • Hard to program
    – Locking code
    – Need to know what blocks
  • Coordination explicit
  • State stored on thread's stack
    – Memory allocation implicit
  • Context switch may be expensive
  • Multiprocessors

Asynchronous:
  • Hard to program
    – Callback code
    – Need to know what blocks
  • Coordination implicit
  • State passed around explicitly
    – Memory allocation explicit
  • Lightweight context switch
  • Uniprocessors

[MIT/Morris]

Coordination example

  • Threaded server:
    – Thread for the network interface
    – Interrupt wakes up the network thread
    – Buffer shared between server threads and the network thread, protected by locks and condition variables
  • Asynchronous I/O:
    – Poll for packets
      • How often to poll?
    – Or, an interrupt generates an event
      • Be careful: disable interrupts when manipulating the callback queue.

[MIT/Morris]

Threads!

One view: Should you abandon threads?

  • No: important for high-end servers (e.g., databases).
  • But avoid threads wherever possible:
    – Use events, not threads, for GUIs, distributed systems, low-end servers.
    – Only use threads where true CPU concurrency is needed.
    – Where threads are needed, isolate their usage in a threaded application kernel: keep most of the code single-threaded.

Figure: event-driven handlers surrounding a threaded kernel.

[Ousterhout 1995]

Another view

  • Events obscure control flow
    – For programmers and tools

Events:
AcceptHandler(event e) {
    struct session *s = new_session(e);
    RequestHandler.enqueue(s);
}
RequestHandler(struct session *s) {
    …; CacheHandler.enqueue(s);
}
CacheHandler(struct session *s) {
    pin(s);
    if (!in_cache(s)) ReadFileHandler.enqueue(s);
    else ResponseHandler.enqueue(s);
}
. . .
ExitHandler(struct session *s) {
    …; unpin(&s); free_session(s);
}

Threads:
thread_main(int sock) {
    struct session s;
    accept_conn(sock, &s);
    read_request(&s);
    pin_cache(&s);
    write_response(&s);
    unpin(&s);
}
pin_cache(struct session *s) {
    pin(&s);
    if (!in_cache(&s)) read_file(&s);
}

Figure: web server flow: Accept Conn. → Read Request → Pin Cache → Read File → Write Response → Exit.

[von Behren]

Control Flow

  • Events obscure control flow
    – For programmers and tools

Figure: the same web server flow (Accept Conn. → Read Request → Pin Cache → Read File → Write Response → Exit), traced through the event handlers and the threaded code above.

[von Behren]

SLIDE 6

Exceptions

  • Exceptions complicate control flow
    – Harder to understand program flow
    – Cause bugs in cleanup code

Figure: the same events-vs-threads code, now with error handling: the event version's RequestHandler adds "if (error) return;", and the threaded version's thread_main adds "if (!read_request(&s)) return;".

[von Behren]

State Management

  • Events require manual state management
  • Hard to know when to free
    – Use GC or risk bugs

Figure: the same events-vs-threads code and web server flow as above.

[von Behren]


Internet Growth and Scale


How to handle all those client requests raining on your server?

Servers Under Stress

Figure: performance vs. load (concurrent requests, or arrival rate): an ideal linear rise, a peak where some resource is at max, then overload as some resource thrashes. [Von Behren]

Response Time

Components:
  • Wire time (request) +
  • Queuing time +
  • Service demand +
  • Wire time (response)

Depends on:
  • Cost/length of request
  • Load conditions at server

Figure: latency vs. offered load.
SLIDE 7

Queuing Theory for Busy People

  • Big Assumptions
    – Queue is first-come-first-served (FIFO, FCFS).
    – Request arrivals are independent (Poisson arrivals).
    – Requests have independent service demands.
    – i.e., arrival interval and service demand are exponentially distributed (denoted "M").

M/M/1 Service Center

Figure: M/M/1 service center: a request stream (offered load) arrives at rate λ, waits in the queue, and is processed with mean service demand D.

Utilization

  • What is the probability that the center is busy?
    – Answer: some number between 0 and 1.
  • What percentage of the time is the center busy?
    – Answer: some number between 0 and 100.
  • These are interchangeable: both are called utilization U.
  • If the center is not saturated, i.e., it completes all its requests in some bounded time, then:
    – U = λD (arrivals/T × service demand): the "Utilization Law".
    – The probability that the service center is idle is 1 - U.
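A quick check of the Utilization Law with invented numbers: a center with mean service demand D = 10 ms receiving λ = 50 requests/s is busy half the time:

```python
# Utilization Law: U = lambda * D, valid while the center is unsaturated.
arrival_rate = 50.0     # requests per second (lambda)
service_demand = 0.010  # mean service demand D = 10 ms, in seconds

U = arrival_rate * service_demand
print(U)       # 0.5: the center is busy half the time
print(1 - U)   # 0.5: probability the center is idle
```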

Little’s Law

  • For an unsaturated queue in steady state, mean response time R and mean queue length N are governed by Little's Law: N = λR.
  • Suppose a task T is in the system for R time units. During that time:
    – λR new tasks arrive.
    – N tasks depart (all tasks ahead of T).
  • But in steady state, flow in balances flow out.
    – Note: this means that throughput X = λ.
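A numeric check of Little's Law (illustrative numbers): at λ = 100 requests/s with mean response time R = 0.05 s, the mean number of tasks in the system is N = λR = 5:

```python
# Little's Law: N = lambda * R, for an unsaturated queue in steady state.
arrival_rate = 100.0   # lambda, requests per second
response_time = 0.05   # R, mean seconds a task spends in the system

N = arrival_rate * response_time
print(N)               # 5.0 tasks in the system on average
```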

Inverse Idle Time “Law”

Figure: R vs. U: response time grows without bound as utilization approaches 1 (100%).

The service center saturates as 1/λ approaches D: small increases in λ cause large increases in the expected response time R.

Little's Law gives response time R = D/(1 - U). Intuitively, each task T's response time is R = D + DN. Substituting λR for N: R = D + DλR. Substituting U for λD: R = D + UR, so R - UR = D, R(1 - U) = D, and thus R = D/(1 - U).

What does this tell us about server behavior at saturation?
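Tabulating R = D/(1 - U) shows the blow-up near saturation (D = 10 ms; the utilization values are illustrative):

```python
def response_time(D, U):
    # Inverse idle time "law": R = D / (1 - U), derived via Little's Law.
    assert 0 <= U < 1, "formula only holds below saturation"
    return D / (1 - U)

D = 0.010  # 10 ms mean service demand
for U in (0.0, 0.5, 0.9, 0.99):
    print(f"U={U:.2f}  R={response_time(D, U) * 1000:.1f} ms")
# R doubles at U=0.5 and grows ~100x at U=0.99: near saturation, small
# increases in load cause large increases in response time.
```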

Under the Hood

Figure: queueing network, as before: requests start (arrival rate λ), circulate between the CPU and I/O device (I/O request / I/O completion), and exit (throughput λ until some center saturates).

SLIDE 8

Common Bottlenecks

  • No more File Descriptors
  • Sockets stuck in TIME_WAIT
  • High Memory Use (swapping)
  • CPU Overload
  • Interrupt (IRQ) Overload

[Aaron Bannert]

Scaling Server Sites: Clustering

Figure: clients connect through a smart switch (L4: TCP; L7: HTTP, SSL, etc.) to a server array behind virtual IP addresses (VIPs).

Goals: server load balancing, failure detection, access control filtering, priorities/QoS, request locality, transparent caching.

What to switch/filter on?
  • L3: source IP and/or VIP
  • L4 (TCP): ports, etc.
  • L7: URLs and/or cookies
  • L7: SSL session IDs

Scaling Services: Replication

Figure: a client choosing between Site A and Site B across the Internet. Distribute service load across multiple sites. How to select a server site for each client or request? Is it scalable?

Extra Slides

(Any new information on the following slides will not be tested.)

Event-Based Concurrent Servers Using I/O Multiplexing

  • Maintain a pool of connected descriptors.
  • Repeat the following forever:
    – Use the Unix select function to block until:
      • (a) a new connection request arrives on the listening descriptor, or
      • (b) new data arrives on an existing connected descriptor.
    – If (a), add the new connection to the pool of connections.
    – If (b), read any available data from the connection.
      • Close the connection on EOF and remove it from the pool.

[CMU 15-213]

Problems of Multi-Thread Server

  • High resource usage, context-switch overhead, contended locks
  • Too many threads → throughput meltdown, response-time explosion
  • Solution: bound the total number of threads
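Bounding the thread count is what a fixed-size pool gives you; a sketch using Python's concurrent.futures (the pool size and the request handler are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

# A fixed-size pool bounds the total number of threads: excess requests
# queue inside the executor instead of spawning new threads.
def handle_request(req):
    return f"done: {req}"   # placeholder for real request processing

def serve_all(requests, max_threads=8):
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(handle_request, requests))
```

With at most max_threads workers, load beyond that waits in a queue rather than driving the server into thrashing.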
SLIDE 9

Event-Driven Programming

  • Event-driven programming, also called asynchronous I/O
  • Uses finite state machines (FSMs) to monitor the progress of requests
  • Yields efficient and scalable concurrency
  • Many examples: Click router, Flash web server, TP monitors, etc.
  • Java: asynchronous I/O
    – for an example, see: http://www.cafeaulait.org/books/jnp3/examples/12/

Traditional Processes

  • Expensive and “heavyweight”
  • One system call per process
  • Fork overhead
  • Coordination

Events

  • Need async I/O
  • Need select
  • Wasn’t originally available
  • Not standardized
  • Immature
  • But efficient
  • Code is distributed all through the program
  • Harder to debug and understand

Threads

  • Separate interface and implementation
  • Pthreads interface
  • Implementation is user-level or kernel (native)
  • If user-level, needs async I/O
  • But hide the abstraction behind the thread interface

Reference

The State of the Art in Locally Distributed Web-server Systems.
Valeria Cardellini, Emiliano Casalicchio, Michele Colajanni, and Philip S. Yu.