SLIDE 1

HTTP and the Dynamic Web

SLIDE 2

How does the Web work?

The canonical example in your Web browser

Click here

The link “here” points to a Uniform Resource Locator (URL):

http://www-cse.ucsd.edu

It names the location of an object on a server.

[courtesy of Geoff Voelker] voelker@cs.ucsd.edu

SLIDE 3

In Action…

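[Figure: client and server communicating via HTTP]

http://www-cse.ucsd.edu

Client uses DNS to resolve the name of the server (www-cse.ucsd.edu)
Establishes a TCP connection to the server (port 80 by default) and speaks HTTP over it
Sends the server the name of the object (here empty, so the server returns its default object for "/")
Server returns the object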

[Voelker]
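The four steps above can be sketched with plain Java sockets, without any HTTP library. This is a minimal illustration, assuming the server listens on the default HTTP port 80 and speaks HTTP/1.0 (so it closes the connection after the response). The class name `RawHttpGet` is hypothetical, and the `Host` header is added because many servers serve several names from one address.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the slide's four steps using raw sockets.
final class RawHttpGet {
    // Step 3: the request names the object; "/" stands in for the empty name.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n"; // blank line ends the request headers
    }

    static String fetch(String host, String path) throws IOException {
        // Step 1: resolve the server name with DNS.
        InetAddress addr = InetAddress.getByName(host);
        // Step 2: open a TCP connection to the default HTTP port.
        try (Socket socket = new Socket(addr, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Step 4: read status line, headers, and body until the server
            // closes the connection, as an HTTP/1.0 server does by default.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toString(StandardCharsets.ISO_8859_1.name());
        }
    }
}
```

Calling `RawHttpGet.fetch("www-cse.ucsd.edu", "/")` needs network access and returns the raw response text; the host named on the slide may no longer answer this way.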

SLIDE 4

Naming and URLs

How should objects be named?

URLs name objects and the virtual locations for those objects.

Location is a DNS name, so there are two more levels of naming and indirection beneath it. Before hypertext we used to worry about access transparency.

Object name interpretation is up to the server, but it’s often a location in the local file tree. If an object moves, the URL breaks (dangling reference).

Location-independent names seem like the obvious way to go

Why don’t we use them (e.g., URNs)?
How do we make them work, esp. in the face of mobility?

[from Voelker, with additions]
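One common server-side interpretation of the object name, mapping the URL path onto a file under a document root, can be sketched as follows. `DocRootMapper`, the `index.html` default, and the document root are illustrative assumptions, not anything HTTP mandates. Because the name is literally a file path here, moving the file breaks every URL that pointed at it.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical mapping from a URL's path to a file under a document root.
final class DocRootMapper {
    private final Path docRoot;

    DocRootMapper(String docRoot) {
        this.docRoot = Paths.get(docRoot).toAbsolutePath().normalize();
    }

    // Returns the file that the URL names, or null if the path escapes the root.
    Path resolve(URI url) {
        String path = url.getPath();
        if (path == null || path.isEmpty() || path.equals("/")) {
            path = "/index.html"; // assumed default object for an empty name
        }
        Path file = docRoot.resolve(path.substring(1)).normalize();
        // Reject names such as "/../etc/passwd" that would leave the tree.
        return file.startsWith(docRoot) ? file : null;
    }
}
```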

SLIDE 5

Protocols

What kind of transport protocol should the Web use?

HTTP 1.0

One TCP connection/object
Complaints: inefficient, slow, burdensome…

HTTP 1.1

One TCP connection/many objects (persistent connections)
Solves all problems, right? It also adds a huge amount of complexity.

Clients, proxies, servers

How do they compare?

Protocol differences [Krishnamurthy99], performance comparison [Nielsen97], effects on servers [Manley97], overhead of TCP connections [Caceres98]

HTTPS: HTTP over an encrypted (SSL/TLS) connection

[Voelker]

SLIDE 6

HTTP in a Nutshell

HTTP supports request/response message exchanges of arbitrary length. There is a small number of request types: basically GET and POST, with supplements. A request carries:

object name, + content for POST
optional query string
optional request headers

Responses are self-typed objects (documents) with various attributes and tags, plus:

optional cookies
optional response headers

[Figure: client exchanging requests and responses with server(s)]
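A request is just text on the connection: a request line naming the method and object (with an optional query string), optional headers, a blank line, and, for POST, the content. The helper below assembles such a message. `HttpRequestText.build` is a hypothetical name; the sketch computes `Content-Length` so the receiver can tell where the body ends.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper that formats an HTTP/1.1 request message.
final class HttpRequestText {
    static String build(String method, String path, String query,
                        Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        sb.append(method).append(' ').append(path);
        if (query != null && !query.isEmpty()) {
            sb.append('?').append(query); // optional query string
        }
        sb.append(" HTTP/1.1\r\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        if (body != null) {
            // The receiver needs the body length to find the end of the message.
            int len = body.getBytes(StandardCharsets.UTF_8).length;
            sb.append("Content-Length: ").append(len).append("\r\n");
        }
        sb.append("\r\n"); // blank line separates headers from content
        if (body != null) {
            sb.append(body);
        }
        return sb.toString();
    }
}
```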

SLIDE 7

Scalable Servers


Of course, you are not the only person accessing the server…

SLIDE 8

Web Caching

Gee, is there some way to offload those busy servers?
Use caches to exploit reference locality among clients

[Figure: clients → proxy cache → servers] [Voelker]
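A proxy cache exploits locality by keeping recently requested objects and answering repeat requests without contacting the server. As a minimal sketch, assuming least-recently-used (LRU) replacement and a fixed object count (real proxies also weigh object size and popularity), Java's `LinkedHashMap` in access order gives an LRU map almost for free. `LruWebCache` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory proxy cache keyed by URL, with LRU eviction.
final class LruWebCache<V> extends LinkedHashMap<String, V> {
    private final int capacity;

    LruWebCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > capacity; // evict the least recently used URL
    }
}
```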

SLIDE 9

Caching

How should we build caching systems for the Web?

Seminal paper [Chankhunthod96]
Proxy caches [Duska97]
Akamai hack [Karger99]
Cooperative caching [Tewari99, Fan98, Wolman99]
Popularity distributions [Breslau99]

[Voelker]

SLIDE 10

Issues for Web Caching

binding clients to proxies, handling failover

manual configuration, router-based “transparent caching”, WPAD (Web Proxy Auto-Discovery)

proxy may confuse/obscure interactions between server and client
consistency management

At first approximation the Web is a wide-area read-only file service... but it is much more than that.

caching responses vs. caching documents

deltas [Mogul+Bala/Douglis/Misha/others@research.att.com]

prefetching, scale, request routing, performance

Web caching vs. content distribution (e.g., Akamai): a few weeks from now...

SLIDE 11

HTTP 1.1

Specification effort started in W3C, finished in IETF... much later.

A number of research works influenced the specification. HTTP 1.0 shows the importance of careful specification.

performance

persistent connections with pipelining
range requests, incremental update, deltas

caching

cache control headers

negotiation of content attributes and encodings
content attributes vs. transport attributes

transport encodings for transmission through proxies

Trailer header and trailer headers (headers sent after a chunked message body)
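Chunked transfer encoding is the HTTP 1.1 transport encoding that lets a sender (or a proxy relaying a response) stream a body whose length is not known in advance, and it is what makes trailer headers possible. The sketch below encodes a list of chunks followed by trailers; the class name `Chunked` and the trailer field `X-Checksum` used in the example are illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Hypothetical encoder for "Transfer-Encoding: chunked" with trailer headers.
final class Chunked {
    static byte[] encode(List<byte[]> chunks, Map<String, String> trailers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] c : chunks) {
            if (c.length == 0) continue; // a zero-size chunk would end the body early
            write(out, Integer.toHexString(c.length) + "\r\n"); // chunk size in hex
            out.write(c, 0, c.length);
            write(out, "\r\n");
        }
        write(out, "0\r\n"); // last-chunk marker
        for (Map.Entry<String, String> t : trailers.entrySet()) {
            write(out, t.getKey() + ": " + t.getValue() + "\r\n"); // trailer headers
        }
        write(out, "\r\n"); // end of message
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }
}
```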

SLIDE 12

Persistent Connections

There are three key performance reasons for persistent connections:

connection setup overhead
TCP slow start: pay its cost once per connection and get it over with
pipelining as an alternative to multiple connections

And some new complexities resulting from their use, e.g.:

request/response framing and pairing
unexpected connection breakage

Just ask anyone from Akamai...

large numbers of active connections

How long to keep connections around?

These motivations and issues manifest in HTTP, but they are fundamental for request/response messaging over TCP.
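The first and third reasons can be made concrete with a back-of-envelope count of round-trip times (RTTs) to fetch a page that embeds other objects. The model below is a simplification under stated assumptions: connection setup takes one RTT, each request/response takes one RTT, HTTP 1.0 connections are opened one after another, and transmission time and slow start are ignored. The class and method names are hypothetical.

```java
// Hypothetical RTT model; "objects" counts the page plus its embedded objects.
final class FetchLatency {
    // HTTP 1.0, serial connections: each object pays a handshake plus a request.
    static int nonPersistent(int objects) {
        return 2 * objects;
    }

    // One persistent connection, no pipelining: one handshake, then one RTT per object.
    static int persistent(int objects) {
        return 1 + objects;
    }

    // Persistent with pipelining: handshake, the page itself (to learn what it
    // embeds), then all embedded objects requested back-to-back in one RTT.
    static int pipelined(int objects) {
        return objects <= 1 ? 1 + objects : 3;
    }
}
```

For a page with nine embedded images (ten objects), this model gives 20 RTTs for serial HTTP 1.0 connections, 11 with one persistent connection, and 3 with pipelining.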

SLIDE 13

Cookies

HTTP cookies (RFC2109) have brought us a better Web.

S optionally includes arbitrary state as a cookie in a response.
Cookie is opaque to C, but C saves the cookie.
C sends the saved cookie in future requests to S, and possibly to other servers as well.
Allows stateful servers for sessions, personalized content, etc.

But: cookies raise privacy and security issues.

What did S put in that cookie? Can anyone else see it? How much space does it take up on my disk that I paid soooo much for?

Cookies may allow third parties who are friends of S1,..., SN to observe C’s movements among S1,..., SN.

Unverifiable transactions, e.g., DoubleClick and other ad services.
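The cookie round trip (server sets, client stores, client replays) can be sketched as a tiny client-side cookie jar. This is an illustration only: `CookieJar` is hypothetical, it keys cookies by exact host name, and it ignores the `Path`, `Domain`, and expiration attributes a real browser must honor (the `Domain` attribute is what lets a cookie go to other servers).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal cookie store, keyed by server host.
final class CookieJar {
    private final Map<String, Map<String, String>> byHost = new LinkedHashMap<>();

    // Called when a response from host carries "Set-Cookie: <value>".
    void onSetCookie(String host, String setCookieValue) {
        String pair = setCookieValue.split(";", 2)[0].trim(); // drop attributes
        int eq = pair.indexOf('=');
        if (eq <= 0) return; // malformed; skip
        byHost.computeIfAbsent(host, h -> new LinkedHashMap<>())
              .put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
    }

    // Value for the Cookie header on the next request to host, or null if none.
    String cookieHeaderFor(String host) {
        Map<String, String> jar = byHost.get(host);
        if (jar == null || jar.isEmpty()) return null;
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> c : jar.entrySet()) {
            if (sb.length() > 0) sb.append("; ");
            sb.append(c.getKey()).append('=').append(c.getValue());
        }
        return sb.toString();
    }
}
```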

SLIDE 14

Unverifiable Transactions

Users may not know that they are interacting with DoubleClick.

Amazon and MyCFO trust DoubleClick, but the client is unaware of the relationship.

The user visits pages at many sites that reference DoubleClick.
DoubleClick’s cookie allows it to associate all the requests from a given user.
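If the browser sends Referer headers, DoubleClick may gather information about all the sites the user visits that reference DoubleClick.

[Figure: the client fetches pages (GET x, GET y) from sites such as mycfo.com and amazon.com; each page references an ad served by doubleclick (akamai, etc.). The ad response sets cookie c; later ad requests (GET ad) carry cookie c along with a Referer such as mycfo.com or amazon.com/x.]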
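The aggregation step is simple once the ad server holds its own cookie and sees Referer headers. The sketch below models the ad server's log as a map from cookie value to the list of referring pages; `AdServerLog` is a hypothetical name, and the example values are taken from the slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical view from the ad server: cookie -> pages that embedded its ads.
final class AdServerLog {
    private final Map<String, List<String>> trailByCookie = new LinkedHashMap<>();

    // Each ad request carries the ad server's cookie and (maybe) a Referer.
    void onAdRequest(String cookie, String referer) {
        if (referer == null) return; // browser withheld Referer: site unknown
        trailByCookie.computeIfAbsent(cookie, c -> new ArrayList<>()).add(referer);
    }

    // The browsing trail linked to one cookie, across unrelated sites.
    List<String> trail(String cookie) {
        return trailByCookie.getOrDefault(cookie, new ArrayList<>());
    }
}
```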

SLIDE 15

Web Cache Consistency

“Requirements of performance, availability, and disconnected operation require us to relax the goal of semantic transparency.”

HTTP 1.1 specification

Any caching/replication framework must take steps to ensure that the cache does not deliver old copies of modified objects. Issues for cache consistency in the Web:

large number of clients/proxies
most static objects don’t change very often
weaker consistency requirements

Stale information might be OK, as long as it is “not too stale”.

SLIDE 16

Cache Expiration and Validation

HTTP 1.0 cache control

Origin server may add a “freshness date” (Expires) response header.

...or the cache could determine expiration time heuristically.

Proxy must revalidate cache entry if it has expired.

Last-Modified and If-Modified-Since

Whose clock do we use for absolute expiration times?

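[Figure: clients, proxy, origin server. Clients send GET x to the proxy, which fetches x from the origin; the origin returns x with Last-Modified m and Expires t. Later requests are served from the cache; to revalidate, the proxy sends GET x with If-Modified-Since m and receives 304: Not Modified.]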
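The proxy's decision can be sketched as a freshness test followed, when the entry is stale, by a conditional GET. Assumptions: times are milliseconds since the epoch, and the heuristic lifetime is 10% of the interval between Last-Modified and the time the response was fetched (a fraction the HTTP 1.1 specification mentions as typical, not a requirement). The class name `Http10Freshness` is hypothetical.

```java
// Hypothetical HTTP 1.0-style freshness check and revalidation request.
final class Http10Freshness {
    // Heuristic lifetime = (fetch time - Last-Modified) / 10, i.e. 10%.
    static final long HEURISTIC_DIVISOR = 10;

    // expires and lastModified are null when the server did not send them.
    static boolean isFresh(long now, long fetchedAt, Long expires, Long lastModified) {
        if (expires != null) {
            // Absolute time: only meaningful if the proxy's clock agrees with the origin's.
            return now < expires;
        }
        if (lastModified != null) {
            long lifetime = (fetchedAt - lastModified) / HEURISTIC_DIVISOR;
            return now - fetchedAt < lifetime;
        }
        return false; // nothing to go on: revalidate
    }

    // Conditional GET sent to the origin when the cached copy is stale.
    static String conditionalGet(String path, String lastModifiedHttpDate) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "If-Modified-Since: " + lastModifiedHttpDate + "\r\n\r\n";
    }
}
```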

SLIDE 17

Expiration and Validation in HTTP 1.1

HTTP 1.1 cache control allows origin server to:

use relative instead of absolute expiration times (max-age);
issue opaque validators (ETag for entity tag) instead of timestamps;

Origin server may specify which of several cached entries to use.

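[Figure: clients, proxy, origin server. The origin returns x with ETag v and max-age t. The proxy serves cached x while Age < t (Age = 0 on a fresh fetch); to revalidate, it sends GET x with If-None-Match v and receives 304: Not Modified, ETag v.]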
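The HTTP 1.1 versions of both checks are sketched below: the proxy compares an age against `max-age` instead of comparing clocks, and the origin compares opaque entity tags instead of timestamps. This is simplified under stated assumptions: age is taken as the received `Age` value plus the time the entry has been held locally (the full HTTP 1.1 age calculation also corrects for clock skew and network delay), and tags are compared as exact strings, ignoring weak validators. `Http11Validation` is a hypothetical name.

```java
// Hypothetical, simplified HTTP 1.1 expiration and validation checks.
final class Http11Validation {
    // Proxy side: fresh while (Age received + time resident here) < max-age.
    static boolean isFresh(long maxAgeSeconds, long ageHeaderSeconds, long residentSeconds) {
        return ageHeaderSeconds + residentSeconds < maxAgeSeconds;
    }

    // Origin side: if If-None-Match lists the current ETag, reply 304 with no body.
    static int statusFor(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch != null) {
            for (String tag : ifNoneMatch.split(",")) {
                String t = tag.trim();
                if (t.equals("*") || t.equals(currentEtag)) {
                    return 304; // Not Modified
                }
            }
        }
        return 200; // send the full entity along with its ETag
    }
}
```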

SLIDE 18

Other 1.1 Cache Control Features

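Client or server may specify that no caching is to occur.

Cache-Control: no-store (a server may instead mark a response private, so that only the client's own cache may store it)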

The Vary header allows the server to specify request headers that must also match before the proxy may deem a cached response valid: language, character set, etc.

Server may specify that a response is not cacheable.

Pragma: no-cache header since HTTP 1.0

Client may explicitly request the proxy to validate the response.

Pragma: no-cache

Proxy may/should/must tell client the age of a cached response.

Age header

Proxy may/should/must tell client that it could not validate a stale cached response with the origin server.

Warning header
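Two of these proxy decisions are sketched below: whether a response may be stored in a shared cache at all, and how Vary turns the cache key into the URL plus the selected request headers. Assumptions: header names arrive lower-cased, only the `no-store` and `private` directives are checked, and `Vary: *` is not handled. `ProxyCachePolicy` is a hypothetical name.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical shared-proxy policy helpers (header names lower-cased).
final class ProxyCachePolicy {
    // May a shared proxy store this response?
    static boolean storable(Map<String, String> responseHeaders) {
        String cc = responseHeaders.getOrDefault("cache-control", "").toLowerCase(Locale.ROOT);
        for (String d : cc.split(",")) {
            String directive = d.trim();
            if (directive.equals("no-store") || directive.equals("private")) {
                return false;
            }
        }
        return true;
    }

    // Cache key = URL + values of the request headers named by Vary, so that
    // e.g. English and French variants of one URL are stored separately.
    static String cacheKey(String url, String vary, Map<String, String> requestHeaders) {
        if (vary == null || vary.trim().isEmpty()) return url;
        TreeMap<String, String> selected = new TreeMap<>(); // sorted for a stable key
        for (String name : vary.split(",")) {
            String n = name.trim().toLowerCase(Locale.ROOT);
            selected.put(n, requestHeaders.getOrDefault(n, ""));
        }
        return url + " " + selected;
    }
}
```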

SLIDE 19

The Dynamic Web

HTTP began as a souped-up FTP that supports hypertext URLs. Service builders rapidly began using it for dynamically generated content, and Web servers morphed into Web Application Servers:

Common Gateway Interface (CGI)
Java Servlets and JavaServer Pages (JSP)
Microsoft Active Server Pages (ASP)

Microsoft ASPs are not to be confused with Application Service Providers (ASPs).

[Figure: the client's request causes the server(s) to execute a program that produces the response]
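With CGI, the server starts a separate program for each request and passes the request to it through environment variables, with POST content on the program's standard input. The sketch builds a few of the standard CGI meta-variables and launches the program with `ProcessBuilder`. `CgiEnvironment` is a hypothetical name; a real server would set many more variables and relay the program's output back to the client.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of how a Web server hands one request to a CGI program.
final class CgiEnvironment {
    static Map<String, String> forRequest(String method, String scriptName,
                                          String query, int contentLength) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("GATEWAY_INTERFACE", "CGI/1.1");
        env.put("REQUEST_METHOD", method);
        env.put("SCRIPT_NAME", scriptName);
        env.put("QUERY_STRING", query == null ? "" : query);
        if (contentLength > 0) {
            env.put("CONTENT_LENGTH", Integer.toString(contentLength)); // POST body size
        }
        return env;
    }

    // Start the program with the request in its environment; the server would
    // write any POST body to its stdin and copy its stdout to the client.
    static Process launch(String program, Map<String, String> env) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(program);
        pb.environment().putAll(env);
        return pb.start();
    }
}
```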

SLIDE 20

Multi-tier Services

[Figure: clients (HTML+forms, applets, JavaScript, etc.) talk HTTP to a Web application server; middle tiers (e.g., component “middleware”, transaction monitors) connect via RPC, RMI, IIOP, DCOM, EJB, CORBA, etc. to back ends such as relational databases (JNDI, JDBC, SQL) and file servers.]

SLIDE 21

From Servers to Servlets

Servlets are dynamically loaded Java classes/objects invoked by a Web server to process requests.

Servlets are to servers as applets are to browsers.
Servlet support converts standard Web servers into extensible “Web application servers”.

designed as a Java-based replacement for CGI

Web server acts as a “connection manager” for the service body, which is specified as pluggable servlets.

The servlet interface is specified by JavaSoft and supported by major servers.

Servlets can be used in any kind of server (not just HTTP).

Invocation triggers are defined by server; the servlet does not know or care how it is invoked.
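The idea that the server loads servlets dynamically and drives their life cycle can be sketched without a servlet container. To keep the example self-contained, `MiniServlet`, `MiniContainer`, and `EchoServlet` below are hypothetical stand-ins, not the `javax.servlet` API: the container loads a class by name with reflection, calls `init` once, `service` per request, and `destroy` at shutdown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Servlet interface.
interface MiniServlet {
    void init(Map<String, String> config);
    String service(String request);
    void destroy();
}

// Hypothetical container: loads servlet classes by name on first use.
final class MiniContainer {
    private final Map<String, MiniServlet> loaded = new HashMap<>();

    String dispatch(String className, String request) {
        MiniServlet s = loaded.get(className);
        if (s == null) {
            try {
                // Dynamic loading: the container need not know the class at compile time.
                s = (MiniServlet) Class.forName(className).getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("cannot load " + className, e);
            }
            s.init(new HashMap<>()); // called once, before any request
            loaded.put(className, s);
        }
        return s.service(request);
    }

    void shutdown() {
        for (MiniServlet s : loaded.values()) s.destroy();
        loaded.clear();
    }
}

// Example pluggable component; it does not know how it was invoked.
final class EchoServlet implements MiniServlet {
    public void init(Map<String, String> config) { }
    public String service(String request) { return "echo: " + request; }
    public void destroy() { }
}
```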

SLIDE 22

Anatomy of a Servlet

Servlet (interface; GenericServlet implements it)
    init(ServletConfig config)
    String getServletInfo()
    service(....)
    destroy()

ServletConfig
    String getInitParameter(name)
    Enumeration getInitParameterNames()
    ServletContext getServletContext()

ServletContext: the network service (servlet container)
    String getServerInfo()
    Object getAttribute(name)
    String getMimeType(name)
    getResource*(name)
    log(string)

SLIDE 23

Invoking a Servlet

Servlet
    service(ServletRequest, ServletResponse)

ServletRequest (supplied by the network service)
    getContentLength, getContentType, getRemoteAddr, getRemoteHost,
    getInputStream, getParameter(name), getParameterValues(name)

ServletInputStream
    readLine(...)

ServletResponse
    setContentType(MIME type), getOutputStream()

ServletOutputStream
    print(...), println(...), ...

SLIDE 24

HTTP Servlets

HttpServlet (extends GenericServlet)
    service(...), doGet(), doHead(), doPost(), ...

HttpServletRequest (extends ServletRequest)
    getCookies(), getRemoteUser(), getAuthType(), getHeader(name),
    getHeaderNames(), HttpSession getSession()

HttpServletResponse (extends ServletResponse)
    addCookie(), setStatus(code, msg), setHeader(name, value),
    sendRedirect(), encodeUrl()
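`HttpServlet` specializes the generic `service(...)` entry point by dispatching on the HTTP method to `doGet`, `doHead`, `doPost`, and so on. The sketch below reproduces only that dispatch pattern, with hypothetical classes (`MiniHttpServlet`, `HelloHttp`) and plain string responses instead of the real servlet types; other methods are omitted.

```java
// Hypothetical reduction of HttpServlet's method dispatch.
abstract class MiniHttpServlet {
    final String service(String method, String path) {
        switch (method) {
            case "GET":  return doGet(path);
            case "HEAD": return doHead(path);
            case "POST": return doPost(path);
            default:     return "501 Not Implemented";
        }
    }

    // Subclasses override the methods they support; the rest are refused.
    protected String doGet(String path)  { return "405 Method Not Allowed"; }
    protected String doPost(String path) { return "405 Method Not Allowed"; }

    // HEAD is GET without the body; here, keep only the status line.
    protected String doHead(String path) {
        String full = doGet(path);
        int nl = full.indexOf('\n');
        return nl < 0 ? full : full.substring(0, nl);
    }
}

final class HelloHttp extends MiniHttpServlet {
    @Override
    protected String doGet(String path) {
        return "200 OK\n<p>Hello world!";
    }
}
```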

SLIDE 25

HelloWorld Servlet

import java.io.*;
import javax.servlet.*;

public class HelloWorld extends GenericServlet {
    public void service(ServletRequest request, ServletResponse response)
            throws ServletException, IOException {
        ...
    }

    public String getServletInfo() {
        return "Hello World Servlet";
    }
}

SLIDE 26

HelloWorld Servlet (continued)

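public void service(ServletRequest request, ServletResponse response)
        throws ServletException, IOException {
    // Set the MIME type before writing any of the body.
    response.setContentType("text/html");
    ServletOutputStream output = response.getOutputStream();
    // null if the request carried no "from" parameter
    String fromWho = request.getParameter("from");
    if (fromWho == null) {
        output.println("<p>Hello world!");
    } else {
        // Echoing the raw parameter into HTML permits script injection;
        // production code should HTML-escape fromWho first.
        output.println("<p>Hello world from <em>" + fromWho + "</em>");
    }
}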

SLIDE 27

Example 1: Invoking a Servlet by URL

Most servers allow a servlet to be invoked directly by URL.

client issues HTTP GET

e.g., http://www.yourhost/servlet/HelloWorld

client issues HTTP POST to the servlet's URL

e.g., with form data:

<FORM ACTION="http://yourhost/servlet/HelloWorld" METHOD="POST">
  From: <INPUT TYPE="TEXT" NAME="from" SIZE="20">
  <INPUT TYPE="SUBMIT" VALUE="Submit">
</FORM>

Submitting the form sends URL-encoded form data, e.g., from=me, in the POST request body. With METHOD="GET", the browser would instead append it to the URL as a query string: <servletURL>?from=me.
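Before `getParameter("from")` can answer, the container must decode the URL-encoded form data. The sketch below shows roughly what that involves: splitting on `&` and `=`, turning `+` into a space and `%XX` escapes into characters, and keeping repeated names, which is why `getParameterValues(name)` can return several values. `FormDecoder` is a hypothetical name; the sketch assumes UTF-8 and Java 10 or later for `URLDecoder.decode(String, Charset)`.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical decoder for application/x-www-form-urlencoded data.
final class FormDecoder {
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) return params;
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // A name may repeat, so each maps to a list of values.
            params.computeIfAbsent(decode(name), k -> new ArrayList<>()).add(decode(value));
        }
        return params;
    }

    // '+' becomes a space and %XX escapes become characters.
    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }
}
```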