The Essence of XML, by Jérôme Siméon (Bell Labs, Lucent) and Philip Wadler (Avaya Labs)

slide-1
SLIDE 1

The Essence of XML

Jérôme Siméon, Bell Labs, Lucent

Philip Wadler, Avaya Labs

slide-2
SLIDE 2

The Evolution of Language

slide-3
SLIDE 3

2x (Descartes)

slide-4
SLIDE 4

λx. 2x (Church)

slide-5
SLIDE 5

(LAMBDA (X) (* 2 X)) (McCarthy)

slide-6
SLIDE 6

<?xml version="1.0"?>
<LAMBDA-TERM>
  <VAR-LIST>
    <VAR>X</VAR>
  </VAR-LIST>
  <EXPR>
    <APPLICATION>
      <EXPR><CONST>*</CONST></EXPR>
      <ARGUMENT-LIST>
        <EXPR><CONST>2</CONST></EXPR>
        <EXPR><VAR>X</VAR></EXPR>
      </ARGUMENT-LIST>
    </APPLICATION>
  </EXPR>
</LAMBDA-TERM>

(W3C)

slide-7
SLIDE 7

XML everywhere!

slide-8
SLIDE 8
slide-9
SLIDE 9
slide-10
SLIDE 10
slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14

The Essence of XML

slide-15
SLIDE 15

XML vs. S-expressions

<foo>1 2 3</foo>
(foo "1 2 3")
(foo 1 2 3)

<bar>1 two 3</bar>
(bar 1 "two" 3)
(bar 1 "two" "3")

slide-16
SLIDE 16

XML Schema and Validation

<foo>1 2 3</foo>
⇓
element foo of type integer-list { 1, 2, 3 }
⇓
<foo>1 2 3</foo>

<xs:simpleType name="integer-list">
  <xs:list itemType="xs:integer"/>
</xs:simpleType>
<xs:element name="foo" type="integer-list"/>
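The validation step on this slide can be sketched in Python. This is a minimal illustration, assuming element content is a whitespace-separated list and that an optional sign followed by decimal digits is an adequate stand-in for the xs:integer lexical form; the name `validate_integer_list` is hypothetical and not part of any schema processor.

```python
import re

# Assumed approximation of the xs:integer lexical form: optional sign, then digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")

def validate_integer_list(text):
    """Validate element content against a list of xs:integer.

    Returns the typed value (a list of ints) or raises ValueError.
    """
    items = text.split()  # list items are separated by whitespace
    for item in items:
        if not _INTEGER.fullmatch(item):
            raise ValueError(f"not an xs:integer: {item!r}")
    return [int(item) for item in items]

print(validate_integer_list("1 2 3"))  # [1, 2, 3]
```

The untyped text "1 2 3" becomes the typed list 1, 2, 3, whereas content such as "1 two 3" is rejected under this type.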

slide-17
SLIDE 17

Mixing it up

<bar>1 two 3</bar>
⇓
element bar of type mixed-list { 1, "two", 3 }
⇓
<bar>1 two 3</bar>

<xs:simpleType name="mixed-list">
  <xs:list>
    <xs:union memberTypes="xs:integer xs:string"/>
  </xs:list>
</xs:simpleType>
<xs:element name="bar" type="mixed-list"/>

slide-18
SLIDE 18

Really mixing it up

element bar of type mixed-list { 1, "two", "3" }
⇓
<bar>1 two 3</bar>
⇓
element bar of type mixed-list { 1, "two", 3 }

<xs:simpleType name="mixed-list">
  <xs:list>
    <xs:union memberTypes="xs:integer xs:string"/>
  </xs:list>
</xs:simpleType>
<xs:element name="bar" type="mixed-list"/>
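The failed round trip on this slide can be reproduced with a small Python sketch. It assumes a validator that tries xs:integer before xs:string, following the order in `memberTypes`; the function names `validate_mixed_list` and `erase_list` are hypothetical.

```python
import re

_INTEGER = re.compile(r"[+-]?[0-9]+")

def validate_mixed_list(text):
    """Validate against a list of (xs:integer | xs:string), integer tried first."""
    typed = []
    for item in text.split():
        if _INTEGER.fullmatch(item):
            typed.append(int(item))  # the integer member accepts this item
        else:
            typed.append(item)       # otherwise fall back to the string member
    return typed

def erase_list(value):
    """Erase a typed list back to whitespace-separated text."""
    return " ".join(str(item) for item in value)

original = [1, "two", "3"]             # the string "3", not the integer 3
text = erase_list(original)            # "1 two 3"
roundtrip = validate_mixed_list(text)  # [1, "two", 3]
print(text, roundtrip, roundtrip == original)  # 1 two 3 [1, 'two', 3] False
```

Erasing and revalidating turns the string "3" into the integer 3, so the value that comes back differs from the one that went in.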

slide-19
SLIDE 19

The Essence of XML

  • The problem it solves is not hard.
  • It doesn’t solve it very well.
slide-20
SLIDE 20

The Essence of XML

  • The problem it solves is not hard.
  • It doesn’t solve it very well.
  • (Not entirely fair: XML is based on SGML, which was aimed at documents, not data.)
  • (NB. “Essence” is used in the same sense as Reynolds, “The Essence of Algol”; Harper and Mitchell, “The Essence of ML”; Wadler, “The Essence of Functional Programming”.)

slide-21
SLIDE 21

Our contribution

  • XML and Schema are in widespread use, so worth some effort to model.
  • We give a foundational theory.
  • Validation differs from matching.
  • We characterize validation with a theorem.
  • Simple version in paper, less simple in XQuery formal semantics.

slide-22
SLIDE 22

What’s in a name?

slide-23
SLIDE 23

Structural types vs. Named types

type Feet = Integer
type Miles = Integer

  • Structural: two names for the same thing
  • Named: two distinct types
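The two readings can be contrasted in Python. This is only an analogy under stated assumptions: plain aliases stand in for structural typing and frozen dataclasses for named typing; all four names are hypothetical.

```python
from dataclasses import dataclass

# Structural reading: both names are aliases for int, so they are interchangeable.
FeetAlias = int
MilesAlias = int

# Named reading: two distinct types that each happen to wrap an int.
@dataclass(frozen=True)
class Feet:
    value: int

@dataclass(frozen=True)
class Miles:
    value: int

print(FeetAlias is MilesAlias)      # True: two names for the same thing
print(Feet(10023) == Miles(10023))  # False: two distinct types
```

Under the named reading, a value tagged as feet is never silently accepted where miles are expected, even though the underlying integer is the same.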
slide-24
SLIDE 24

Named typing and strategic defense

enter height? 10023

slide-25
SLIDE 25

Named typing and strategic defense

enter height? 10023

slide-26
SLIDE 26

Named typing and strategic defense

enter height? 10023

slide-27
SLIDE 27

Schema and XQuery

slide-28
SLIDE 28

XML Schema

<xs:simpleType name="integer-list">
  <xs:list itemType="xs:integer"/>
</xs:simpleType>
<xs:element name="foo" type="integer-list"/>

<xs:simpleType name="mixed-list">
  <xs:list>
    <xs:union memberTypes="xs:integer xs:string"/>
  </xs:list>
</xs:simpleType>
<xs:element name="bar" type="mixed-list"/>

slide-29
SLIDE 29

XQuery

define type integer-list { xs:integer* }
define element foo of type integer-list

define type mixed-list { (xs:integer|xs:string)* }
define element bar of type mixed-list

slide-30
SLIDE 30

Schema

<xs:simpleType name="feet">
  <xs:restriction base="xs:integer"/>
</xs:simpleType>
<xs:simpleType name="miles">
  <xs:restriction base="xs:integer"/>
</xs:simpleType>
<xs:element name="configuration">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="shuttle" type="miles"/>
      <xs:element name="laser" type="feet"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

slide-31
SLIDE 31

XQuery

define type feet restricts xs:integer
define type miles restricts xs:integer
define element configuration of type configuration.type
define type configuration.type {
  element shuttle of type miles,
  element laser of type feet
}

slide-32
SLIDE 32

Validation, Matching, and Erasure

slide-33
SLIDE 33

Data model

<configuration>
  <shuttle>120</shuttle>
  <laser>10023</laser>
</configuration>
=
element configuration {
  element shuttle { "120" },
  element laser { "10023" }
}

slide-34
SLIDE 34

Validation

validate as Type { UntypedValue } ⇒ Value

validate as element configuration {
  element configuration {
    element shuttle { "120" },
    element laser { "10023" }
  }
}
⇒
element configuration of type configuration.type {
  element shuttle of type miles { 120 },
  element laser of type feet { 10023 }
}

slide-35
SLIDE 35

Matching

Value matches Type

element configuration of type configuration.type {
  element shuttle of type miles { 120 },
  element laser of type feet { 10023 }
}
matches
element configuration of type configuration.type

slide-36
SLIDE 36

Matching depends on type names

Value matches Type

element configuration of type configuration.type {
  element shuttle of type miles { 120 },
  element laser of type miles { 10023 }
}
matches
element configuration of type configuration.type

(not!)

slide-37
SLIDE 37

Unvalidated data does not match

element configuration {
  element shuttle { "120" },
  element laser { "10023" }
}
matches
element configuration of type configuration.type

(not!)
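The three matching examples can be checked with a small Python sketch. It is deliberately simplified: the `Element` class, the `CONFIGURATION_TYPE` table and `matches_configuration` are hypothetical, type names must agree exactly (derivation and substitution are ignored), and only this one element type is handled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

@dataclass(frozen=True)
class Element:
    name: str
    type_name: Optional[str]  # None means the element carries no type annotation
    content: Tuple[Union["Element", int, str], ...]

# Hypothetical schema table: expected (element name, type name) of each child.
CONFIGURATION_TYPE = (("shuttle", "miles"), ("laser", "feet"))

def matches_configuration(value):
    """Check: value matches element configuration of type configuration.type."""
    if value.name != "configuration" or value.type_name != "configuration.type":
        return False
    if len(value.content) != len(CONFIGURATION_TYPE):
        return False
    for child, (name, type_name) in zip(value.content, CONFIGURATION_TYPE):
        if not isinstance(child, Element):
            return False
        # The annotation must agree by name; an integer payload alone is not enough.
        if child.name != name or child.type_name != type_name:
            return False
        if len(child.content) != 1 or not isinstance(child.content[0], int):
            return False
    return True

good = Element("configuration", "configuration.type", (
    Element("shuttle", "miles", (120,)),
    Element("laser", "feet", (10023,)),
))
wrong_names = Element("configuration", "configuration.type", (
    Element("shuttle", "miles", (120,)),
    Element("laser", "miles", (10023,)),
))
untyped = Element("configuration", None, (
    Element("shuttle", None, ("120",)),
    Element("laser", None, ("10023",)),
))
print(matches_configuration(good),
      matches_configuration(wrong_names),
      matches_configuration(untyped))  # True False False
```

The second value fails only because laser is annotated as miles rather than feet, and the third fails because it carries no annotations at all.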

slide-38
SLIDE 38

Erasure

Value erases to UntypedValue

element configuration of type configuration.type {
  element shuttle of type miles { 120 },
  element laser of type feet { 10023 }
}
erases to
element configuration {
  element shuttle { "120" },
  element laser { "10023" }
}

slide-39
SLIDE 39

Erasure is a relation

validate as xs:integer ( "7" ) ⇒ 7
validate as xs:integer ( "007" ) ⇒ 7

7 erases to "7"
7 erases to "007"
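Why erasure has to be a relation rather than a function can be sketched in Python. The sketch assumes that an optional sign followed by decimal digits approximates the xs:integer lexical form; `validate_integer` and `erases_to` are hypothetical names.

```python
import re

_INTEGER = re.compile(r"[+-]?[0-9]+")

def validate_integer(text):
    """validate as xs:integer ( text ) => int, or ValueError if text is not an integer."""
    if not _INTEGER.fullmatch(text):
        raise ValueError(f"not an xs:integer: {text!r}")
    return int(text)

def erases_to(value, text):
    """Relation: the integer `value` erases to the string `text`.

    Written as a predicate because one integer has many lexical forms.
    """
    return bool(_INTEGER.fullmatch(text)) and int(text) == value

print(validate_integer("7"), validate_integer("007"))  # 7 7
print(erases_to(7, "7"), erases_to(7, "007"))          # True True
```

A function from 7 back to text would have to pick a single string, which would lose the fact that both "7" and "007" validate to 7.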

slide-40
SLIDE 40

Inference rules

slide-41
SLIDE 41

Matching: Sequence and choice

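$$
\frac{}{\text{() matches ()}}
$$

$$
\frac{\text{Value}_1 \text{ matches } \text{Type}_1 \qquad \text{Value}_2 \text{ matches } \text{Type}_2}
     {\text{Value}_1 \text{ , } \text{Value}_2 \text{ matches } \text{Type}_1 \text{ , } \text{Type}_2}
$$

$$
\frac{\text{Value matches } \text{Type}_1}
     {\text{Value matches } \text{Type}_1 \mid \text{Type}_2}
\qquad
\frac{\text{Value matches } \text{Type}_2}
     {\text{Value matches } \text{Type}_1 \mid \text{Type}_2}
$$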

slide-42
SLIDE 42

Matching: Occurrence and base types

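$$
\frac{\text{Value matches () | Type}}
     {\text{Value matches Type?}}
\qquad
\frac{\text{Value matches Type , Type*}}
     {\text{Value matches Type+}}
\qquad
\frac{\text{Value matches Type+?}}
     {\text{Value matches Type*}}
$$

$$
\frac{\text{AtomicTypeName derives from xs:string}}
     {\text{String matches AtomicTypeName}}
\qquad
\frac{\text{AtomicTypeName derives from xs:integer}}
     {\text{Integer matches AtomicTypeName}}
$$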
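The sequence, choice, occurrence and base-type rules can be turned into a small matcher. This Python sketch makes several assumptions: types are encoded as tuples, values are tuples of Python ints and strs, only the two base types xs:integer and xs:string exist (type derivation is ignored), and `Type*` is computed by iteration rather than by unfolding `Type+?`, so that the search always terminates. The names `ends` and `matches` are hypothetical.

```python
# Assumed encoding of base types as Python classes.
ATOMS = {"xs:integer": int, "xs:string": str}
INT = ("atom", "xs:integer")
STR = ("atom", "xs:string")

def ends(value, start, ty):
    """Return every index `end` such that value[start:end] matches ty."""
    kind = ty[0]
    if kind == "empty":   # () matches ()
        return {start}
    if kind == "atom":    # base types consume exactly one item
        if start < len(value) and isinstance(value[start], ATOMS[ty[1]]):
            return {start + 1}
        return set()
    if kind == "seq":     # Type1 , Type2
        return {e2 for e1 in ends(value, start, ty[1])
                   for e2 in ends(value, e1, ty[2])}
    if kind == "choice":  # Type1 | Type2
        return ends(value, start, ty[1]) | ends(value, start, ty[2])
    if kind == "opt":     # Type? is () | Type
        return {start} | ends(value, start, ty[1])
    if kind == "plus":    # Type+ is Type , Type*
        return ends(value, start, ("seq", ty[1], ("star", ty[1])))
    if kind == "star":    # Type* is Type+?, computed here as a fixed point
        reached, frontier = {start}, {start}
        while frontier:
            frontier = {e for s in frontier for e in ends(value, s, ty[1])} - reached
            reached |= frontier
        return reached
    raise ValueError(f"unknown type constructor: {kind!r}")

def matches(value, ty):
    """Value matches Type: the whole sequence must be consumed."""
    return len(value) in ends(value, 0, ty)

mixed_list = ("star", ("choice", INT, STR))
print(matches((1, "two", 3), mixed_list))       # True
print(matches((1, "two", 3), ("star", INT)))    # False
```

Tracking a set of end positions keeps the matcher faithful to the nondeterminism of the choice rules: both alternatives are explored rather than committing to the first that succeeds.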

slide-43
SLIDE 43

Matching: Element

$$
\frac{\begin{array}{c}
\text{ElementType yields BaseElementName of type BaseTypeName} \\
\text{BaseTypeName resolves to Type} \\
\text{ElementName substitutes for BaseElementName} \\
\text{TypeName derives from BaseTypeName} \\
\text{Value matches Type}
\end{array}}
{\text{element ElementName of type TypeName \{ Value \} matches ElementType}}
$$

slide-44
SLIDE 44

Validation: Element

$$
\frac{\begin{array}{c}
\text{ElementType yields BaseElementName of type BaseTypeName} \\
\text{BaseTypeName resolves to Type} \\
\text{ElementName substitutes for BaseElementName} \\
\text{validate as Type \{ UntypedValue \}} \Rightarrow \text{Value}
\end{array}}
{\text{validate as ElementType \{ element ElementName \{ UntypedValue \} \}} \Rightarrow \text{element ElementName of type TypeName \{ Value \}}}
$$

slide-45
SLIDE 45

The validation theorem

slide-46
SLIDE 46

The validation theorem

Theorem. We have that

validate as Type { UntypedValue } ⇒ Value

if and only if

Value matches Type and
Value erases to UntypedValue.

  • Obvious in retrospect, not so obvious in prospect.
  • Trick is to make validation and erasure into relations.
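The statement of the theorem can be sanity-checked by brute force on a tiny universe. This Python sketch is not a proof and covers only one type, a list of (xs:integer | xs:string). It assumes untyped values are tuples of tokens, that validation is a relation in which an integer-looking token may validate either as an integer or as a string, and that all function names are hypothetical.

```python
import re
from itertools import product

_INTEGER = re.compile(r"[+-]?[0-9]+")

def item_validations(token):
    """Every typed item that one token may validate to under integer | string."""
    results = {token}                # every token is a valid string
    if _INTEGER.fullmatch(token):
        results.add(int(token))      # and possibly a valid integer
    return results

def validations(tokens):
    """Relation: all Values with validate as mixed-list { tokens } => Value."""
    return set(product(*(item_validations(t) for t in tokens)))

def matches(value):
    """Value matches (xs:integer | xs:string)*."""
    return all(isinstance(item, (int, str)) for item in value)

def item_erases_to(item, token):
    if isinstance(item, int):
        return bool(_INTEGER.fullmatch(token)) and int(token) == item
    return item == token

def erases_to(value, tokens):
    return len(value) == len(tokens) and all(map(item_erases_to, value, tokens))

# Small universes of untyped and typed values; None is a deliberately ill-typed item.
tokens_universe = [t for n in range(3)
                   for t in product(["1", "007", "two"], repeat=n)]
values_universe = [v for n in range(3)
                   for v in product([1, 7, "1", "007", "two", None], repeat=n)]

theorem_holds = all(
    (value in validations(tokens)) == (matches(value) and erases_to(value, tokens))
    for tokens in tokens_universe
    for value in values_universe
)
print(theorem_holds)  # True
```

Because validation and erasure are both modelled as relations, the two sides of the equivalence can be compared value by value without choosing a preferred result.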
slide-47
SLIDE 47

Ambiguity and Roundtripping

Definition. The type Type is unambiguous for validation if for every UntypedValue there is at most one Value such that

validate as Type { UntypedValue } ⇒ Value.

Corollary (Roundtripping). If

Value matches Type
Value erases to UntypedValue
validate as Type { UntypedValue } ⇒ Value′
Type is unambiguous for validation

then Value = Value′.

slide-48
SLIDE 48

Example: An unambiguous type

element foo of type integer-list { 1, 2, 3 }
erases to
<foo>1 2 3</foo>

validate as element foo { <foo>1 2 3</foo> }
⇒
element foo of type integer-list { 1, 2, 3 }

slide-49
SLIDE 49

Example: An ambiguous type

element bar of type mixed-list { "1", "two", "3" }
erases to
<bar>1 two 3</bar>

validate as element bar { <bar>1 two 3</bar> }
⇒
element bar of type mixed-list { 1, "two", 3 }
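The difference between the two examples can be made concrete by counting how many typed values a given untyped value validates to. This Python sketch assumes validation is modelled as a relation that returns a set of results and that content is already split into tokens; both function names are hypothetical.

```python
import re
from itertools import product

_INTEGER = re.compile(r"[+-]?[0-9]+")

def validations_integer_list(tokens):
    """All Values that tokens validate to under xs:integer* (at most one)."""
    if all(_INTEGER.fullmatch(t) for t in tokens):
        return {tuple(int(t) for t in tokens)}
    return set()

def validations_mixed_list(tokens):
    """All Values that tokens validate to under (xs:integer | xs:string)*."""
    per_item = [({t, int(t)} if _INTEGER.fullmatch(t) else {t}) for t in tokens]
    return set(product(*per_item))

print(len(validations_integer_list(("1", "2", "3"))))  # 1
print(len(validations_mixed_list(("1", "two", "3"))))  # 4
```

With a single result the round trip is forced to return the original value; with four candidates, nothing guarantees that validation picks the one that was erased.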

slide-50
SLIDE 50

Conclusions

slide-51
SLIDE 51
slide-52
SLIDE 52
slide-53
SLIDE 53
slide-54
SLIDE 54

The Essence of XML

  • Validation

validate as Type { UntypedValue } ⇒ Value

  • Matching

Value matches Type

  • Erasure

Value erases to UntypedValue

  • Validation Theorem

Theorem. We have that

validate as Type { UntypedValue } ⇒ Value

if and only if

Value matches Type and
Value erases to UntypedValue.

slide-55
SLIDE 55

XQuery formal semantics (not in paper)

  • Dynamic Semantics

DynEnv ⊢ Expr ⇒ Value

  • Static Semantics

StatEnv ⊢ Expr : Type

  • Type Soundness

Theorem. If

DynEnv ⊢ Expr ⇒ Value
StatEnv ⊢ Expr : Type

then Value matches Type.

slide-56
SLIDE 56

Success stories

  • XQuery has two specifications, one in prose and one using formal methods; this is one of the first uses of formal methods in an industrial standard.
  • Formalization of named typing raised ten issues not resolved in the prose specification.
  • XQuery face-to-face, Chapel Hill, NC, 17–18 October 2002: After presentation of formal semantics of pure named typing, it was accepted without dissent. In the two-day meeting, this was the only decision adopted without dissent.
  • Our techniques were also adopted by James Clark and Makoto Murata to formalize Relax NG, another industrial standard.

slide-57
SLIDE 57
slide-58
SLIDE 58
slide-59
SLIDE 59

Action items

  • Paper in POPL proceedings misprinted; get it from the web.
  • Review XQuery and send us your comments!