Constraining Content XCSL XML Constraint Specification Language Jos - - PDF document

constraining content
SMART_READER_LITE
LIVE PREVIEW

Constraining Content XCSL XML Constraint Specification Language Jos - - PDF document

J R o o h l a m a s o l r a C s 2 4 0 9 / 8 2 / Constraining Content XCSL XML Constraint Specification Language Jos Carlos Ramalho jcr@di.uminho.pt Contents Behind the scene Examples CODE


slide-1
SLIDE 1 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
1

Constraining Content

José Carlos Ramalho jcr@di.uminho.pt

XCSL –XML Constraint Specification Language

XML Europe 2001 - 22/25 May - Berlin - Germany 2

Contents

  • Behind the scene
  • Examples
  • Constraint Specification Language
  • System Arquitecture
  • Example solutions
  • Future work

CODE +

  • ther inner parts
slide-2
SLIDE 2 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
2

XML Europe 2001 - 22/25 May - Berlin - Germany 3

Behind the scene

  • SGML96 (Boston)

– Semantic Validation: two approaches

  • SGML97 (Washington)

– Where does quality go?

  • Markup Languages Journal

– Processing constraints

  • Markup Technologies 98 (Chicago)

– Attribute Grammar approach

XML Europe 2001 - 22/25 May - Berlin - Germany 4

Behind the scene (2)

  • Phd thesis – 2000

– Low-level functional approach – Formal specification of XCSL:

  • Attribute grammar
  • XAM specification

– Processor: had to be written in the host functional language (SSL, Haskell)

  • Desired goal

– To have everything in the same declarative paradigm

slide-3
SLIDE 3 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
3

XML Europe 2001 - 22/25 May - Berlin - Germany 5

Motivation

  • To meet some particular project needs
  • Sometimes we do not want a Schema or a

DTD

– We just want to constraint certain document parts

  • Requirements:

– The solution should work with existing tools and standards – The solution should work across every platform

XML Europe 2001 - 22/25 May - Berlin - Germany 6

Type inference

Problem: how to process ... ?

Constraint: latitude > 39 and latitude < 42.5 Document: ...<latitude>41.32</latitude>...

Answer:

...<latitude type=“float”>41.32</latitude>...

Can be transparent to users: #FIXED attributes

slide-4
SLIDE 4 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
4

XML Europe 2001 - 22/25 May - Berlin - Germany 7

Value normalization

Problem: How can I identify ...?

... King <name>Affonso</name> proclaimed several ... ... And his soldiers battled against <name>Afonso</name>. ...and that church was built in the <date>XVIII century</date>. ...it all happened on <date> the fifth October</date>...

Answer

...King <name value=“Afonso”>Affonso</name>... ...it all happened on <date value=“xxxx.10.05”>the fifth... User awareness is needed!

XML Europe 2001 - 22/25 May - Berlin - Germany 8

Programs ⇔ XML Documents

  • Have a support

language formally defined

  • Processing - compiler

– lexical analysis – syntactic analysis – semantic analysis

  • complex: type checking;

type inference, ...

  • Can be formally

specified: Attribute Grammars

  • Have a support markup

language defined in XML

  • Processing - parser

– lexical analysis – syntactic analysis – semantic analysis

  • very simple: ID - IDREF

coupling

slide-5
SLIDE 5 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
5

XML Europe 2001 - 22/25 May - Berlin - Germany 9

Constraints: what type? i. Domain range checking

  • Normally data is of type numeric or date

ii. Dependencies between two elements or attributes

  • iii. Pattern matching against a Regular

Expression

  • Enforcing content to follow a certain format
  • Example: [0-9]{4}\.[0-9]{2}\.[0-9]{2}
  • iv. Quantified constraints … (???)

XML Europe 2001 - 22/25 May - Berlin - Germany 10

Example1: students

<?xml version="1.0"?> <students> <student> <name>Peter Weird</name> <grades> <grade1>12</grade1> <grade2>8</grade2> <grade3>15</grade3> </grades> </student> <student> <name>Jose Almeida</name> <grades> <grade1>9</grade1> <grade2>18</grade2> <grade3>7</grade3> </grades> </student> </students>

We want:

  • to ensure that each grade is greater

than 10

  • to give a warning message for each

grade below 10

domain range (i)

slide-6
SLIDE 6 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
6

XML Europe 2001 - 22/25 May - Berlin - Germany 11

Example2: linguistics

<?xml version="1.0"?> <doc> <sentence> <noun number="s" genre="f">Alice</noun> <verb time="present" number="s“ person="3">drinks</verb> </sentence> <sentence> <noun number="p">Dogs</noun> <verb time="present" number="s" person="3">barks</verb> </sentence> <sentence> <noun genre="f" number="s">Diana</noun> <verb number="s">are</verb> <adj genre="m" number="s">beautiful</adj> </sentence> </doc>

In portuguese:

  • verbs must agree with correspondent nouns in

person and number

  • adjectives must agree with correspondent nouns

in genre and number

XML Europe 2001 - 22/25 May - Berlin - Germany 12

Example3: XDBML

<?xml version="1.0"?> <DB> <STRUCTURE> <TABLE NAME="items"> <COLUMNS> <COLUMN NAME="code" TYPE="nvarchar" SIZE="10" NULL="no"/> <COLUMN NAME="description" TYPE="nvarchar" SIZE="50" NULL="no"/> ... </COLUMNS> <KEYS> <PKEYS TYPE="simple"> <PKEY NAME="code"/> </PKEYS> </KEYS> </TABLE> </STRUCTURE> <DATA> <items> <items-REG> <code>a111</code> <description>leite agros meio-gordo</description> </items-REG> <items-REG> <code>a115</code> <description>leite agros chocolatado</description> </items-REG> <items-REG> <code>a112</code> <description>leite agros meio-gordo</description> </items-REG> <items-REG> <code>a115</code> <description>leite agros chocolatado</description> </items-REG> ... </DATA> </DB>

slide-7
SLIDE 7 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
7

XML Europe 2001 - 22/25 May - Berlin - Germany 13

RDB XML document XML’ document RDB’ f

XDBML(2): the problem

XML Europe 2001 - 22/25 May - Berlin - Germany 14

XDBML(2): the problem

  • We want to transform the database in the XML axis

(with XSLT).

  • We want to upload the new database into a DBMS
  • We must ensure:

– Primary keys are still primary keys (challenging) – ID/IDREF attributes are globally unique – Other constraints are easy to deal with

  • Do we need quantifiers?

– For each key, test if its value is unique. – Are quantifiers already there?

slide-8
SLIDE 8 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
8

XML Europe 2001 - 22/25 May - Berlin - Germany 15

Constraint Spec. Lang.

  • From author’s Phd

– Subset of XSLT – Composition: set of triples

ConstraintSpec = Constraint+ Constraint = (ContextSelector,ContextCondition,Action)

XPath Xpath predicates ?? code We can use a XSLT processor to handle Const.Processing

XML Europe 2001 - 22/25 May - Berlin - Germany 16

Architecture

XML instance Constraint Spec. XSL stylesheet Constraint Processor Generator Standard XSL Processor Error Messages...? XML XML XML

slide-9
SLIDE 9 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
9

XML Europe 2001 - 22/25 May - Berlin - Germany 17

CSL with a XML wrapper

DTD: V1.0

<!ELEMENT cs (constraint)+> <!ELEMENT constraint (selector,cc,action)> <!ELEMENT selector EMPTY> <!ELEMENT cc EMPTY> <!ELEMENT action (message*)> <!ELEMENT message (#PCDATA|value)*> <!ELEMENT value EMPTY> <!ATTLIST cs dtd CDATA #IMPLIED date CDATA #IMPLIED version CDATA #IMPLIED > <!ATTLIST selector selexp CDATA #REQUIRED> <!ATTLIST cc cond CDATA #REQUIRED> <!ATTLIST value selexp CDATA #REQUIRED>

  • action is a message list
  • value is used to refer document

nodes inside messages

XML Europe 2001 - 22/25 May - Berlin - Germany 18

XCSL1: Students

<?xml version="1.0"?> <CS> <CONSTRAINT> <SELECTOR SELEXP="//student/grades/*"/> <CC>.&gt;10</CC> <ACTION> <MESSAGE>WARNING: <VALUE SELEXP="name(.)"/> of <VALUE SELEXP="../../name"/> is below minimum!</MESSAGE> </ACTION> </CONSTRAINT> </CS>

Spotting grades lower than 10.

slide-10
SLIDE 10 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
1

XML Europe 2001 - 22/25 May - Berlin - Germany 19

XCSL2: linguistics

<?xml version="1.0"?> <!DOCTYPE CS SYSTEM "csl.dtd"> <CS> <CONSTRAINT> <SELECTOR SELEXP="//sentence"/> <CC>noun/@number=verb/@number</CC> <ACTION> <MESSAGE>Attribute number of "<VALUE SELEXP="noun"/>" does not agree with attribute number of <VALUE SELEXP="verb"/>"!!!</MESSAGE> </ACTION> </CONSTRAINT> <CONSTRAINT> <SELECTOR SELEXP="//sentence"/> <CC>noun/@genre=adj/@genre</CC> <ACTION> <MESSAGE>ERROR:<VALUE SELEXP="noun"/> e <VALUE SELEXP="adj"/> do not agree in genre!!!</MESSAGE> </ACTION> </CONSTRAINT></CS>

Testing agreements...

XML Europe 2001 - 22/25 May - Berlin - Germany 20

XCSL processors

  • Implemented with XML::DT:

– Perl module on top of XML::Parser – Presented at XML Europe’99 – It can be found on:

  • http://natura.di.uminho.pt
  • Comphreensive Perl Archive Network
  • XSL processing being done with:

– Saxon – Windows platforms – Xalan – Linux like platforms

slide-11
SLIDE 11 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
1

XML Europe 2001 - 22/25 May - Berlin - Germany 21

CSL processor: V1.0

... %handler=( '-outputenc' => 'ISO-8859-1', '-default' => sub{"<$q>$c</$q>"}, 'SELECTOR' => sub{"$v{SELEXP}"}, 'CONSTRAINT' => sub{"<xsl:template match=\"$c\n</xsl:template> \n"}, 'MESSAGE' => sub{"\n ERROR: $c\n"}, 'VARIABLE' => sub{"<xsl:value-of select=\'$v{SELEXP}\'/>"}, 'CS' => sub{"$c"}, 'ACTION' => sub{"$c"}, 'CC' => sub{"[not($v{COND})]\">"}, ); ...

Algorithm:

  • convert each constraint into a template
  • use SELEXP for MATCH attribute
  • convert CC into a predicate: [...]
  • put MESSAGE contents inside template body
  • filter text nodes

XML Europe 2001 - 22/25 May - Berlin - Germany 22

Example1: students

<?xml version="1.0"?> <CS> <CONSTRAINT> <SELECTOR SELEXP="//student/grades/*"/> <CC>.&gt;10</CC> <ACTION> <MESSAGE>WARNING: <VALUE SELEXP="name(.)"/> of <VALUE SELEXP="../../name"/> is below minimum!</MESSAGE> </ACTION> </CONSTRAINT> </CS>

XCSL document

slide-12
SLIDE 12 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
1 2

XML Europe 2001 - 22/25 May - Berlin - Germany 23

Example1: students

<xsl:template match="/"> <xsl:apply-templates/> </xsl:template> <!--########## NEW CONSTRAINT ##########--> <xsl:template match="//student/grades/*[not(.&gt;10)]"> WARNING: <xsl:value-of select='name(.)'/> of <xsl:value-of select='../../name'/> is below minimum! <xsl:apply-templates/> </xsl:template> <xsl:template match="text()" priority="-1"/> <xsl:template match="//text()" priority="-1"> </xsl:template>

Generated XSL stylesheet

XML Europe 2001 - 22/25 May - Berlin - Germany 24

Problems with V1.0

  • Optional elements
  • Non-filled elements

– Will cause errors – “You will catch more than you want, ...” similar to DB when we have SQL queries over Attributes that can be empty

slide-13
SLIDE 13 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
1 3

XML Europe 2001 - 22/25 May - Berlin - Germany 25

Solution: CSL V2.0

<!ELEMENT cs (constraint)+> <!ELEMENT constraint (selector,cc,action)> <!ELEMENT selector EMPTY> <!ELEMENT cc (#PCDATA|variable)*> <!ELEMENT action (message*)> <!ELEMENT message (#PCDATA|value)*> <!ELEMENT variable EMPTY> <!ELEMENT value EMPTY> <!ATTLIST cs dtd CDATA #IMPLIED date CDATA #IMPLIED version CDATA #IMPLIED > <!ATTLIST selector selexp CDATA #REQUIRED> <!ATTLIST variable selexp CDATA #REQUIRED> <!ATTLIST value selexp CDATA #REQUIRED>

Changes:

  • CC has a mixed content
  • user has a stamp to mark suspicious

paths: VARIABLE

  • “user must be aware ...”

XML Europe 2001 - 22/25 May - Berlin - Germany 26

CSL processor: V2.0

%handler=( '-outputenc' => 'ISO-8859-1', '-default' => sub{"<$q>$c</$q>"}, 'SELECTOR' => sub{"$v{SELEXP}"}, 'CONSTRAINT' => sub{@ccvars=(); $predicates=""; "<xsl:template match=\"$c</xsl:template>";}, 'MESSAGE' => sub{" ERROR: $c"}, 'VALUE' => sub{"<xsl:value-of select=\'$v{SELEXP}\'/>"}, 'CS' => sub{"$c"}, 'ACTION' => sub{"$c"}, 'CC' => sub{$predicates = "[not($c)]"; foreach $ccvar (@ccvars) { $predicates .= "[$ccvar]"; } $predicates . "\">";}, 'VARIABLE' => sub{push(@ccvars,$v{SELEXP}); "$v{SELEXP}";}, );

Algorithm:

  • convert each constraint into a template
  • use SELEXP for MATCH attribute
  • convert CC into a predicate: [...]
  • each “stamped” path will be converted into a predicate
  • put MESSAGE contents inside template body
  • filter text nodes
slide-14
SLIDE 14 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
1 4

XML Europe 2001 - 22/25 May - Berlin - Germany 27

Example2: linguistics

<CS> <CONSTRAINT> <SELECTOR SELEXP="//sentence"/> <CC><VARIABLE SELEXP="noun/@number"/> =<VARIABLE SELEXP="verb/@number"/> </CC> <ACTION> <MESSAGE>Attribute number of "<VALUE SELEXP="noun"/>" does not agree with attribute number of <VALUE SELEXP="verb"/>"!!!</MESSAGE> </ACTION> </CONSTRAINT> <CONSTRAINT> <SELECTOR SELEXP="//sentence"/> <CC><VARIABLE SELEXP="noun/@genre"/>=<VARIABLE SELEXP="adj/@genre"/></CC> <ACTION> <MESSAGE>:<VALUE SELEXP="noun"/> e <VALUE SELEXP="adj"/> do not agree in genre!!!</MESSAGE> </ACTION> </CONSTRAINT> </CS> XML Europe 2001 - 22/25 May - Berlin - Germany 28

... <xsl:template match="//sentence [not(noun/@number=verb/@number)][noun/@number][verb/@number]"> ERROR: Attribute number of "<xsl:value-of select='noun'/>" does not agree with attribute number of <xsl:value-of select='verb'/>"! </xsl:template> <xsl:template match="//sentence [not(noun/@genre=adj/@genre)][noun/@genre][adj/@genre]"> ERROR: :<xsl:value-of select='noun'/> e <xsl:value-of select='adj'/> do not agree in genre!!! </xsl:template> <xsl:template match="//text()" priority="-1"> </xsl:template> ...

Example2: linguistics

Generated XSL stylesheet //sentence //sentence Ambiguity problem

slide-15
SLIDE 15 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
1 5

XML Europe 2001 - 22/25 May - Berlin - Germany 29

XCSL: versions 2.1 and 2.2

  • Version 2.1

– Each CONSTRAINT is converted into a XSL template – Templates will run in different “modes”

  • Version 2.2

– Added support for “let expressions” – Enables two dimensional traversals

<xsl:template match="/"> <xsl:apply-templates mode="const1"/> <xsl:apply-templates mode="const2"/> <xsl:apply-templates mode="const3"/> <xsl:apply-templates mode="const4"/> </xsl:template> Version 2.1 main skeleton

XML Europe 2001 - 22/25 May - Berlin - Germany 30

Example3: XDBML

<?xml version="1.0"?> <CS> <CONSTRAINT> <SELECTOR SELEXP="//items/items-REG"/> <LET NAME="key" VALUE="code"/> <CC>count(//items/items-REG[code = $key]) = 1</CC> <ACTION> <MESSAGE>WARNING: code: <VALUE SELEXP="code"/> is not unique!</MESSAGE> </ACTION> </CONSTRAINT> </CS>

XCSL v2.2 document Maintaining primary key invariant

slide-16
SLIDE 16 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
1 6

XML Europe 2001 - 22/25 May - Berlin - Germany 31

Example3: XDBML

<xsl:template mode="const1" match="//items/items-REG"> <xsl:variable name="key"> <xsl:value-of select="code"/> </xsl:variable> <xsl:if test="not(count(//items/items-REG [code = $key]) = 1)"> WARNING: code: <xsl:value-of select='code'/> is not unique! </xsl:if> <xsl:apply-templates mode="const1"/> </xsl:template>

Generated XSL stylesheet: main template

XML Europe 2001 - 22/25 May - Berlin - Germany 32

Conclusions

  • XML data and XCSL constraints can be

joined in the same document

  • Schemas or DTDs are not always necessary
  • Other ideas coming from this presentation:

– Do it simple – With existing technology

slide-17
SLIDE 17 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
1 7

XML Europe 2001 - 22/25 May - Berlin - Germany 33

Future Work

  • To create a XSL processor generator
  • To generate Perl (XML::DT) instead of XSL

– Would allow to have Perl actions – Would allow to specify pattern matching constraints

  • To do some reverse engineering on top of

all:

– Finding a suitable abstract representation for these contraints and the whole model (HOAG)

XML Europe 2001 - 22/25 May - Berlin - Germany 34

Other resources

  • My personnal webpage:

– http://www.di.uminho.pt/~jcr

  • My XML page with drafts, thesis,

presentations, software, ... (always under construction!):

– http://orunner.di.uminho.pt/PED/ped.html

slide-18
SLIDE 18 J
  • s
é C a r l
  • s
R a m a l h
  • 9
/ 2 8 / 2 4 S G M L D
  • c
u m e n t s : W h e r e D
  • e
s Q u a l i t y G
  • ?
1 8

XML Europe 2001 - 22/25 May - Berlin - Germany 35

Q&A?

jcr@di.uminho.pt