Caching in with Resolvers

SLIDE 1: Caching in with Resolvers

Norman Walsh
http://www.sun.com/

XML Conference & Exposition 2003, 07-12 December 2003

Version 1.0

SLIDE 2: Introduction

  • Using URIs
  • The Problem
  • Solutions
  • Catalog-based Resolution
  • Proxy Caches
  • Parting Thoughts and Q&A

SLIDE 3: Using URIs

How do we use URIs to address resources?

  • Relative URIs
  • Absolute URIs on the Local File System
  • Absolute URIs on the Network

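SLIDE 4: Relative URIs

  • dbpoolx.rng
  • ../xml/docbookx.dtd
  • ../../xsl/html/docbook.xsl

These are context dependent: each is resolved against the base URI of the document that contains it, so the same reference can identify different resources in different documents.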
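
To make "context dependent" concrete, here is a small Python sketch using the standard library's `urllib.parse.urljoin`. The relative reference is one from the slide; the two document locations are hypothetical.

```python
from urllib.parse import urljoin

# A relative reference from the slide.
reference = "../xml/docbookx.dtd"

# Two hypothetical documents that both contain that reference.
base_a = "file:///home/john/docs/book.xml"
base_b = "file:///projects/manual/src/chapter.xml"

# Each reference is resolved against the base URI of the document containing it.
resolved_a = urljoin(base_a, reference)
resolved_b = urljoin(base_b, reference)

print(resolved_a)  # file:///home/john/xml/docbookx.dtd
print(resolved_b)  # file:///projects/manual/xml/docbookx.dtd
```

Move either document and the reference silently points somewhere else, which is why relative URIs only work when the surrounding directory layout travels with them.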

SLIDE 5: Absolute URIs on the Local File System

  • file:///c:/xml/docbook42/docbookx.dtd
  • file:///share/schemas/relax-ng/docbook/4.2/docbook.rng
  • file:///export/home/john/doctypes/xml/docbook/4.2/docbookx.dtd

These identifiers are only useful (in general) on the system where they were created.

SLIDE 6: Absolute URIs on the Network

  • http://docbook.org/rng/4.2/docbook.rng
  • http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd
  • urn:publicid:-:OASIS:DTD+DocBook+XML+V4.2:EN
  • http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl

These offer the least ambiguous identification.

SLIDE 7: The Problem

  • In Theory…
  • In Practice…
  • Demo
  • Therefore…

An old chestnut: in theory, theory and practice are the same, but in practice, they aren’t.

SLIDE 8: In Theory…

Standards encourage us to design for a perfect world. Namespace documents, schemas, stylesheets, and other files are identified by URIs on the global web, where:

  • The network is universally available, and
  • There is no latency

SLIDE 9: In Practice…

  • Networks go down
  • Firewalls and security measures interfere with access
  • Our machines are sometimes physically disconnected
  • Latency is sometimes significant

SLIDE 10: Demo

Demo 1

SLIDE 11: Therefore…

  • While it’s useful, interoperable, and important to identify documents with URIs on the global web,
  • It is convenient, sometimes necessary, to store representations locally and
  • To access them transparently instead of “hitting the web” for them each time

SLIDE 12: Brute Force

  • Brute Force
  • Example (1)
  • Example (2)

SLIDE 13: Brute Force

The brute force “solution” is to edit every document so that it explicitly references local resources:

  • Replace absolute URIs for resources on the global web with absolute or relative URIs on the local file system
  • Do this every time you exchange documents with colleagues
  • (Hopefully you don’t need any digital signatures)

SLIDE 14: Example (1)

    <!DOCTYPE book SYSTEM "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
    <?xml-stylesheet href="http://www.example.org/styles/style.xsl" type="text/xsl"?>
    <book>…</book>

Becomes

    <!DOCTYPE book SYSTEM "/path/to/docbookx.dtd">
    <?xml-stylesheet href="/local/copy/of/style.xsl" type="text/xsl"?>
    <book>…</book>

SLIDE 15: Example (2)

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:exsl="http://exslt.org/common"
                    version="1.0" exclude-result-prefixes="exsl">
      <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"/>
      <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/chunk-common.xsl"/>
      <xsl:include href="http://docbook.sourceforge.net/release/xsl/current/html/manifest.xsl"/>
      …

Becomes

SLIDE 16: Example (2) (Continued)

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:exsl="http://exslt.org/common"
                    version="1.0" exclude-result-prefixes="exsl">
      <xsl:import href="/local/copy/of/docbook.xsl"/>
      <xsl:import href="/local/copy/of/chunk-common.xsl"/>
      <xsl:include href="/local/copy/of/manifest.xsl"/>
      …
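
Mechanically, the brute-force edit is a search-and-replace applied to each document. This Python sketch uses a hypothetical replacement table built from the placeholder paths on the slides; the mapping lives in the tool and in every edited copy rather than in any shared configuration.

```python
# Hypothetical global-to-local replacement table (placeholder paths from the slides).
REPLACEMENTS = {
    "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd": "/path/to/docbookx.dtd",
    "http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl":
        "/local/copy/of/docbook.xsl",
}


def localize(text: str) -> str:
    """Rewrite every known global URI in one document's text."""
    for global_uri, local_path in REPLACEMENTS.items():
        text = text.replace(global_uri, local_path)
    return text


doc = '<!DOCTYPE book SYSTEM "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">'
print(localize(doc))  # <!DOCTYPE book SYSTEM "/path/to/docbookx.dtd">
```

Every edited file is now tied to one machine's layout, and because the bytes change, any digital signature computed over the original document no longer verifies.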

SLIDE 17: Catalog-based Resolution

  • What Is It?
  • Catalog History
  • An XML Catalog
  • Catalog Features
  • Mapping External Identifiers
  • Mapping URIs
  • Chaining Catalogs
  • Rewriting
  • Delegation
  • Extension
  • Miscellany
  • Catalog: Pro
  • …

SLIDE 18: What Is It?

  • Explicit mapping from global identifiers to local identifiers
  • Maintained by hand (or by local system configuration, e.g., Debian)
  • Relies on a resolver in the application
  • The focus of this presentation is XML Catalogs, developed by the Entity Resolution Technical Committee at OASIS

SLIDE 19: Catalog History

XML Catalogs…

  • Were developed by the Entity Resolution Technical Committee at OASIS
  • Have the same semantics as SGML Open Catalogs, where appropriate
  • Normatively support only the SGML Open Catalog entries relevant to XML

SLIDE 20: An XML Catalog

    <?xml version='1.0'?>
    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
      <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
              uri="/share/doctypes/docbook42/xml/docbookx.dtd"/>
      <system systemId="http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
              uri="/share/doctypes/docbook42/xml/docbookx.dtd"/>
      <uri name="http://docbook.org/rng/4.2/docbook.rng"
           uri="schema/relaxng/docbook.rng"/>
    </catalog>
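
The following Python sketch shows the kind of lookup a catalog resolver performs for this catalog, using only `xml.etree.ElementTree` and `urllib.parse`. It is a simplification rather than the full OASIS algorithm: it checks system entries before public entries and ignores `prefer`, rewriting, delegation, and chaining. The catalog's own location, `file:///etc/xml/catalog.xml`, is an assumption; relative `uri` values are resolved against it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"
KEY_ATTR = {"public": "publicId", "system": "systemId", "uri": "name"}

CATALOG = """<?xml version='1.0'?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
          uri="/share/doctypes/docbook42/xml/docbookx.dtd"/>
  <system systemId="http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
          uri="/share/doctypes/docbook42/xml/docbookx.dtd"/>
  <uri name="http://docbook.org/rng/4.2/docbook.rng"
       uri="schema/relaxng/docbook.rng"/>
</catalog>"""


def load_catalog(text, base):
    """Index public, system, and uri entries; relative targets use the catalog's base URI."""
    entries = {kind: {} for kind in KEY_ATTR}
    for entry in ET.fromstring(text):
        kind = entry.tag[len(NS):] if entry.tag.startswith(NS) else None
        if kind in KEY_ATTR:
            entries[kind][entry.get(KEY_ATTR[kind])] = urljoin(base, entry.get("uri"))
    return entries


def resolve_entity(catalog, public_id=None, system_id=None):
    """Simplified order: system entries first, then public entries."""
    if system_id in catalog["system"]:
        return catalog["system"][system_id]
    if public_id in catalog["public"]:
        return catalog["public"][public_id]
    return None  # no match: the application falls back to the original identifier


def resolve_uri(catalog, uri):
    """Look up a uri entry by its name attribute."""
    return catalog["uri"].get(uri)


cat = load_catalog(CATALOG, "file:///etc/xml/catalog.xml")
print(resolve_uri(cat, "http://docbook.org/rng/4.2/docbook.rng"))
# file:///etc/xml/schema/relaxng/docbook.rng
```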

SLIDE 21: Catalog Features

What do catalogs provide? They can…

  • Map external identifiers
  • Map URIs
  • Be chained together for modularity
  • Rewrite system identifiers and URIs
  • Delegate mapping to another catalog
  • Be extended

SLIDE 22: Mapping External Identifiers

The “public” and “system” entries map external identifiers to local resources based on public and/or system identifiers.

    <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
            uri="/share/doctypes/docbook42/xml/docbookx.dtd"/>
    <system systemId="http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
            uri="/share/doctypes/docbook42/xml/docbookx.dtd"/>

SLIDE 23: Mapping URIs

The “uri” entry maps one URI to another.

    <uri name="http://docbook.org/rng/4.2/docbook.rng"
         uri="schema/relaxng/docbook.rng"/>
    <uri name="http://docbook.sf.net/xsl/fo/docbook.xsl"
         uri="xsl/fo/docbook.xsl"/>
    <uri name="http://docbook.sf.net/bibl/bibl.xml"
         uri="file:/home/ndw/.bibliography.xml"/>
    <uri name="http://docbook.sf.net/images/draft.png"
         uri="xsl/images/draft.png"/>

SLIDE 24: Chaining Catalogs

Catalogs can be chained together. For example, consider /etc/catalog.xml:

    <?xml version='1.0'?>
    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
      <nextCatalog catalog="/usr/share/docbook/4.2/catalog.xml"/>
      <nextCatalog catalog="/usr/share/html/4.01/catalog.xml"/>
      <nextCatalog catalog="/usr/share/entities/8879.1986/catalog.xml"/>
    </catalog>
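
nextCatalog entries are consulted, in document order, only when the current catalog has no matching entry. The Python sketch below models that with hypothetical in-memory catalogs keyed by the file names on the slide (their contents are invented for illustration). It skips a chained catalog that cannot be loaded or has already been visited, which is a choice of the sketch.

```python
# Hypothetical in-memory catalogs keyed by the file names on the slide.
# "system" maps system identifiers to local URIs; "next" lists nextCatalog entries.
CATALOGS = {
    "/etc/catalog.xml": {
        "system": {},
        "next": [
            "/usr/share/docbook/4.2/catalog.xml",
            "/usr/share/html/4.01/catalog.xml",
            "/usr/share/entities/8879.1986/catalog.xml",  # absent below, so skipped
        ],
    },
    "/usr/share/docbook/4.2/catalog.xml": {
        "system": {
            "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd":
                "file:///usr/share/docbook/4.2/docbookx.dtd",
        },
        "next": [],
    },
    "/usr/share/html/4.01/catalog.xml": {
        "system": {
            "http://www.w3.org/TR/html4/strict.dtd":
                "file:///usr/share/html/4.01/strict.dtd",
        },
        "next": [],
    },
}


def resolve_system(path, system_id, seen=None):
    """Check one catalog, then its nextCatalog entries in document order."""
    seen = set() if seen is None else seen
    if path in seen or path not in CATALOGS:
        return None  # skip loops and catalogs that cannot be loaded
    seen.add(path)
    catalog = CATALOGS[path]
    if system_id in catalog["system"]:
        return catalog["system"][system_id]
    for next_path in catalog["next"]:
        result = resolve_system(next_path, system_id, seen)
        if result is not None:
            return result
    return None


print(resolve_system("/etc/catalog.xml", "http://www.w3.org/TR/html4/strict.dtd"))
# file:///usr/share/html/4.01/strict.dtd
```

The top-level catalog stays tiny; each package can install and remove its own catalog and touch only one line in /etc/catalog.xml.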

SLIDE 25: Rewriting

Rewriting is a convenient way to relocate an entire tree of resources:

    <rewriteSystem systemIdStartString="http://www.w3.org/"
                   rewritePrefix="/projects/w3c/WWW/"/>

This entry would map:

    http://www.w3.org/2002/xmlspec/dtd/2.3/xmlspec.dtd

to

    /projects/w3c/WWW/2002/xmlspec/dtd/2.3/xmlspec.dtd
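
A rewriteSystem entry replaces the matching start string with the rewrite prefix and keeps the rest of the identifier. When several entries match, the XML Catalogs specification uses the one with the longest start string. In this Python sketch the first entry is the slide's; the second, for `http://www.w3.org/TR/`, is hypothetical and only there to exercise the longest-match rule. Identifier normalization, which the specification also requires, is omitted.

```python
# (systemIdStartString, rewritePrefix) pairs. The first is the slide's entry;
# the second is hypothetical and exists only to exercise the longest-match rule.
REWRITES = [
    ("http://www.w3.org/", "/projects/w3c/WWW/"),
    ("http://www.w3.org/TR/", "/projects/w3c/TR/"),
]


def rewrite_system(system_id):
    """Swap the longest matching start string for its prefix; None if nothing matches."""
    matches = [(start, prefix) for start, prefix in REWRITES
               if system_id.startswith(start)]
    if not matches:
        return None
    start, prefix = max(matches, key=lambda m: len(m[0]))
    return prefix + system_id[len(start):]


print(rewrite_system("http://www.w3.org/2002/xmlspec/dtd/2.3/xmlspec.dtd"))
# /projects/w3c/WWW/2002/xmlspec/dtd/2.3/xmlspec.dtd
```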

SLIDE 26: Delegation

Delegation turns control over to another catalog.

    <delegatePublic publicIdStartString="-//Example//"
                    catalog="http://example.com/catalog.xml"/>

(We’ll come back to this example later because the “http” URI in the catalog attribute is interesting.)
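
As we read the XML Catalogs specification, when one or more delegatePublic entries match, the resolver builds a new list of catalogs from all matching entries, longest start string first, and resolves the identifier against only those catalogs. This Python sketch computes that list; the second delegate entry is hypothetical.

```python
# (publicIdStartString, catalog) pairs. The first is the slide's entry;
# the second is hypothetical.
DELEGATES = [
    ("-//Example//", "http://example.com/catalog.xml"),
    ("-//Example//DTD Widgets", "http://example.com/widgets/catalog.xml"),
]


def delegate_catalogs(public_id):
    """Catalogs to consult instead of the current one, longest start string first."""
    matches = [(start, catalog) for start, catalog in DELEGATES
               if public_id.startswith(start)]
    matches.sort(key=lambda m: len(m[0]), reverse=True)
    return [catalog for _, catalog in matches]


print(delegate_catalogs("-//Example//DTD Widgets V1.0//EN"))
```

An empty list means no delegation applies, and resolution continues in the current catalog as usual.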

SLIDE 27: Extension

The XML Catalog format is extensible. For example, a “suffix rewriting” extension:

    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
             xmlns:sfx="http://nwalsh.com/xcatalog/1.0">
      <sfx:systemSuffix suffix="docbookx.dtd"
                        uri="/share/doctypes/docbook42/xml/docbookx.dtd"/>
    </catalog>

This extension replaces any system identifier that ends in “docbookx.dtd” with the specified URI.
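
A resolver that understands the extension namespace can pick out the sfx:systemSuffix entries; one that does not is expected to ignore elements from other namespaces. This Python sketch parses the slide's catalog and matches by suffix. Choosing the longest suffix when several match is an assumption of the sketch, not something the slide specifies.

```python
import xml.etree.ElementTree as ET

SFX = "{http://nwalsh.com/xcatalog/1.0}"

CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
         xmlns:sfx="http://nwalsh.com/xcatalog/1.0">
  <sfx:systemSuffix suffix="docbookx.dtd"
                    uri="/share/doctypes/docbook42/xml/docbookx.dtd"/>
</catalog>"""


def load_suffixes(text):
    """Collect (suffix, uri) pairs from sfx:systemSuffix entries only."""
    return [(e.get("suffix"), e.get("uri"))
            for e in ET.fromstring(text)
            if e.tag == SFX + "systemSuffix"]


def match_suffix(suffixes, system_id):
    """Return the uri for the longest suffix that ends system_id, or None."""
    matches = [(s, uri) for s, uri in suffixes if system_id.endswith(s)]
    if not matches:
        return None
    return max(matches, key=lambda m: len(m[0]))[1]


suffixes = load_suffixes(CATALOG)
print(match_suffix(suffixes, "../../dtds/docbookx.dtd"))
# /share/doctypes/docbook42/xml/docbookx.dtd
```

Suffix matching catches relative and oddly located system identifiers that no exact system entry could anticipate.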

SLIDE 28: Miscellany

  • Relative URIs in the catalog are resolved against the current base URI
  • XML Catalogs support xml:base
  • Entries can be grouped for convenience
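
Here is a Python sketch of how base URIs flow through a catalog: a relative uri is resolved against the nearest xml:base, falling back to the catalog file's own location, and a group element can carry an xml:base for all of its entries. The catalog contents and the file location `file:///home/user/catalog.xml` are hypothetical; only uri entries and group elements are handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical catalog: the group's xml:base changes the base for its entries.
CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://docbook.sf.net/xsl/fo/docbook.xsl" uri="xsl/fo/docbook.xsl"/>
  <group xml:base="file:///opt/stylesheets/">
    <uri name="http://docbook.sf.net/images/draft.png" uri="xsl/images/draft.png"/>
  </group>
</catalog>"""


def collect_uris(element, base, result):
    """Walk the catalog, resolving each uri attribute against the in-scope base."""
    base = urljoin(base, element.get(XML_BASE, ""))  # "" leaves base unchanged
    for child in element:
        if child.tag == NS + "uri":
            entry_base = urljoin(base, child.get(XML_BASE, ""))
            result[child.get("name")] = urljoin(entry_base, child.get("uri"))
        elif child.tag == NS + "group":
            collect_uris(child, base, result)
    return result


mapping = collect_uris(ET.fromstring(CATALOG), "file:///home/user/catalog.xml", {})
for name, target in mapping.items():
    print(name, "->", target)
```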

SLIDE 29: Catalog: Pro

Catalogs can be:

  • Easily configured without privileged access to the machine
  • Changed on a per-application basis, if necessary
  • Managed automatically by install processes
  • Configured manually, without ever having network access to the resources they manage
  • Extended by resolvers

SLIDE 30: Catalog: Con

On the other hand, XML Catalogs:

  • must be explicitly maintained, either directly by the user or by processes the user runs
  • only function for applications that explicitly support XML Catalogs (or are built on top of libraries that explicitly support them)

SLIDE 31: Demo

Demo 2

SLIDE 32: Caching Proxies

  • What Are They?
  • Proxy Features
  • Configuration Example
  • Configuration Example
  • Cache: Pro
  • Cache: Con (1)
  • Cache: Con (2)

SLIDE 33: What Are They?

  • A system-level cache of recently accessed resources
  • A proxy cache sits between the system on which it is running and the rest of the network
  • The concrete examples in this presentation are from the World Wide Web Offline Explorer (WWWOffle)
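
From an application's point of view, using a caching proxy is just a matter of sending requests to it. This Python sketch uses the standard library's `urllib.request.ProxyHandler`; the address `http://localhost:8080/` is an assumption (8080 is WWWOffle's usual port, but check your configuration). Calling `ProxyHandler()` with no argument instead picks up proxy settings from environment variables such as `http_proxy`, which is how a system-wide setting reaches applications without per-application setup.

```python
import urllib.request

# Assumed proxy address; adjust to match the proxy's configuration.
PROXY = "http://localhost:8080/"

# Route http: requests through the proxy, which answers from its cache
# when it can and goes to the network only when it must.
proxy_handler = urllib.request.ProxyHandler({"http": PROXY})
opener = urllib.request.build_opener(proxy_handler)

# With a proxy running, this would fetch the schema via the cache:
# opener.open("http://docbook.org/rng/4.2/docbook.rng").read()
```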

SLIDE 34: Proxy Features

  • Configured on a system-wide basis
  • Stores anything that it doesn’t consider “local”
  • Sometimes offers features to control advertising and spam
  • Can sometimes be tuned for better XML support

SLIDE 35: Configuration Example

What’s local?

    LocalHost
    {
     localhost
     127.0.0.1
     mercury
    }

SLIDE 36: Configuration Example

What’s important?

    Purge
    {
     <http://www.w3.org/>          age = 2y
     <http://lists.w3.org/>        age = 3y
     <http://www.oasis-open.org/>  age = 2y
     <http://norman.walsh.name/>   age = 1
     <ftp://*>                     age = 7
     age = 4w
    }

Additional rules could be added to address specific URI patterns (e.g., URIs that end in .xsl).
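
To see what per-site ages buy you, here is a Python model of a purge policy with the same shape as the Purge section above. It is not WWWOffle's implementation: treating a bare number as days, approximating months and years as 30 and 365 days, letting the first matching pattern win, and the extra `*.xsl` rule are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Approximate unit lengths in days (an assumption of this model).
UNIT_DAYS = {"d": 1, "w": 7, "m": 30, "y": 365}

# (URL pattern, age) pairs in the order they are checked; the *.xsl rule
# is a hypothetical addition, the others mirror the configuration above.
RULES = [
    ("*.xsl", "1y"),
    ("http://www.w3.org/*", "2y"),
    ("http://lists.w3.org/*", "3y"),
    ("http://www.oasis-open.org/*", "2y"),
    ("http://norman.walsh.name/*", "1"),
    ("ftp://*", "7"),
]
DEFAULT_AGE = "4w"


def to_days(age):
    """Convert '2y', '4w', or a bare '7' (taken as days) to a number of days."""
    if age[-1] in UNIT_DAYS:
        return int(age[:-1]) * UNIT_DAYS[age[-1]]
    return int(age)


def keep_days(url):
    """How long a cached copy of url is kept: first matching rule, else the default."""
    for pattern, age in RULES:
        if fnmatchcase(url, pattern):
            return to_days(age)
    return to_days(DEFAULT_AGE)


print(keep_days("http://www.w3.org/TR/xmlschema-1/"))  # 730
```

Long ages for specification sites keep schemas and stylesheets available offline, while a short age for a frequently updated site keeps its cached pages fresh.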

SLIDE 37: Cache: Pro

Proxy caches:

  • Operate transparently, requiring no explicit setup by the user
  • Are applicable to almost every application that accesses the network

SLIDE 38: Cache: Con (1)

On the other hand, caches:

  • are substantially more complex to configure and may require privileged access to the machine
  • apply globally and cannot be configured on a per-application basis

SLIDE 39: Cache: Con (2)

  • only cache resources that have been accessed at least once. You can’t, for example, install a package that you received in email and expect the cache to serve its resources without fetching them over the network at least once.
  • may discard resources that have not been accessed for some period of time
  • are not easily extensible

SLIDE 40: Other Mechanisms

  • RDDL
  • DDDS

SLIDE 41: RDDL

  • Provides a level of indirection for locating representations associated with a URI.
  • Allows applications to say not just what they want, but why.
  • Solves a slightly different problem, operating conceptually above the resolver used in catalog systems.
  • With some improvements to existing APIs, would be very useful indeed.

SLIDE 42: DDDS

  • Designed as an extension of DNS for resolving URIs
  • Is really a system for binding strings to data
  • Not yet widely deployed
  • Is likely to depend on network services that may not be readily available on disconnected machines

SLIDE 43: Conclusions

  • Catalogs and Caches
  • Parting Thoughts
  • References
  • Q&A

SLIDE 44: Catalogs and Caches

Remember our delegation example? Suppose you’re offline.

    <delegatePublic publicIdStartString="-//Example//"
                    catalog="http://example.com/catalog.xml"/>

  • The application asks the catalog to do resolution…
  • The resolver asks for the delegated catalog… and the cache provides it
  • The resolver continues processing the new catalog…
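
The interplay can be sketched in Python. Here a dictionary stands in for the proxy's store and `fetch()` for an HTTP request made while disconnected: cached copies are served, anything else fails. The cached catalog's contents and the public identifier `-//Example//DTD Widget//EN` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"
DELEGATED = "http://example.com/catalog.xml"  # from the delegatePublic entry

# Hypothetical cache contents: the delegated catalog was fetched once while
# the machine was online, so the proxy still holds a copy.
CACHE = {
    DELEGATED: '<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">'
               '<public publicId="-//Example//DTD Widget//EN" uri="widget.dtd"/>'
               '</catalog>',
}


class Offline(Exception):
    """Raised when the cache cannot supply a resource and the network is down."""


def fetch(url):
    """Stand-in for an HTTP request through a caching proxy while offline."""
    if url in CACHE:
        return CACHE[url]
    raise Offline(url)


def resolve_public(catalog_url, public_id):
    """Load a delegated catalog through the cache and look up public_id."""
    root = ET.fromstring(fetch(catalog_url))
    for entry in root.iter(NS + "public"):
        if entry.get("publicId") == public_id:
            return urljoin(catalog_url, entry.get("uri"))
    return None


print(resolve_public(DELEGATED, "-//Example//DTD Widget//EN"))
# http://example.com/widget.dtd
```

The result is itself an http: URI, so the request for the DTD goes through the same cache; if that resource was never fetched while online, the Offline branch is where a real system would fail.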

SLIDE 45: Parting Thoughts

  • XML Catalogs are easy to use, easy to install, and easy to extend
  • Caches are transparent and work with almost any application
  • Best of all: they can work together
  • In point of fact, I use both every day and I wouldn’t want to give either up.

SLIDE 46: References

  • The OASIS Entity Resolution Technical Committee, http://www.oasis-open.org/committees/entity/
  • The World Wide Web Offline Explorer, http://www.gedanken.demon.co.uk/wwwoffle/
  • Apache XML Commons, http://xml.apache.org/commons/
  • libxml2, http://xmlsoft.org/

SLIDE 47: Q&A

I can be reached at <Norman.Walsh@Sun.COM>