SGML Documents: SGML Documents: Where Does Quality Go? Where Does - - PowerPoint PPT Presentation

sgml documents sgml documents where does quality go where
SMART_READER_LITE
LIVE PREVIEW

SGML Documents: SGML Documents: Where Does Quality Go? Where Does - - PowerPoint PPT Presentation

SGML Documents: SGML Documents: Where Does Quality Go? Where Does Quality Go? Jos Carlos Ramalho Jorge Gustavo Rocha Jos Joo Almeida Pedro Rangel Henriques Language Processing and Specification Group Computer Science Department


slide-1
SLIDE 1

SGML Documents: Where Does Quality Go?

José Carlos Ramalho, Jorge Gustavo Rocha, José João Almeida, Pedro Rangel Henriques

Language Processing and Specification Group, Computer Science Department, University of Minho, Portugal

SLIDE 2

SGML/XML’97 - 8..11 Dec - Washington - José Carlos Ramalho

What will we discuss?

  • When information increases, when information sources increase and vary, what happens to quality?
  • How can we ensure/preserve quality?
  • What is quality (what are we talking about)?
  • In what contexts is quality more relevant?
  • Can we measure it?
  • ...

SLIDE 3

What are we doing with SGML?

  • Constructing document DBs
  • Publishing books on the Internet
  • Converting parish registers (XIII and XIV century) to SGML
  • Publishing from SGML DBs: Internet, CD-ROM, paper, …
  • Connecting SGML documents to GIS

SLIDE 4

Quality?

  • Quality is good.
  • Quality is important.
  • Quality is when something is good and manages to remain good for a period of time.
  • Attribute, class, category (from the dictionary).
  • Specific attribute that distinguishes a person, a thing or an entity (from the encyclopedia).

Lots of subjectivity

SLIDE 5

Quality (in our context)?

  → Interface
  → …
  → Data relevance
  → …
  → Data correctness

There is a lot less subjectivity in this item

SLIDE 6

Aims of this work

  • We want to minimize data incorrectness
  • We don’t want to change existing models
  • We want to extend them
  • In the end we want to eliminate information revision cycles

SLIDE 7

SGML authoring and processing model

[Diagram. Tools: Editor, Editor, Parser, Formatter. Processes: DTD Design Process, Authoring Process, Validation Process (OK / errors), Formatting Process. Artifacts: SGML Doc., Valid SGML Doc., Style Specification, OUTPUT.]

SLIDE 8

Data (in)correctness

Example 1: Portuguese History CD-ROM

Kings, Kingdoms, Wars, … ???

SLIDE 9

Data (in)correctness

Example 1: Portuguese History CD-ROM

Kings, Kingdoms, Wars, … ???

What went wrong?

  • Kings with nonexistent kingdoms
  • Wars happening in the wrong era
  • Characters that died before they were born
  • ...
SLIDE 10

Data (in)correctness

Example 2: Parish register (XIII and XIV century)

Source documents: Death certificate, Marriage articles, Baptism certificate

Family Database … ???

SLIDE 11

Data (in)correctness

Example 2: Parish register (XIII and XIV century)

Source documents: Birth certificate, Death certificate, Marriage articles, Baptism certificate

Family Database

Problems:

  • negative ages
  • death before baptism
  • marriages between people with age differences higher than 100
  • ...
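
The problems listed above are all checks over data contents, not over document structure. A minimal Python sketch of such checks follows; the `Person` record, its field names and the 100-year threshold parameter are assumptions made for illustration and are not part of the original system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Person:
    # Hypothetical record merged from one person's certificates.
    name: str
    baptism: date
    death: Optional[date] = None


def check_person(p: Person) -> List[str]:
    """Return error messages for one person; an empty list means OK."""
    errors = []
    if p.death is not None and p.death < p.baptism:
        errors.append(f"{p.name}: death before baptism")
    return errors


def check_marriage(a: Person, b: Person, when: date,
                   max_gap_years: int = 100) -> List[str]:
    """Check one marriage: no negative ages, no implausible age gap."""
    errors = []
    for p in (a, b):
        if when < p.baptism:
            errors.append(f"{p.name}: negative age at marriage")
    # Age difference approximated by the difference in baptism years.
    gap = abs(a.baptism.year - b.baptism.year)
    if gap > max_gap_years:
        errors.append(f"{a.name} / {b.name}: age difference higher than {max_gap_years}")
    return errors
```

Each function returns a list of messages rather than raising, so a whole register can be checked in one pass and every problem reported together.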
SLIDE 12

What do we propose?

  • An extra validation task:
    – we need an additional level of abstraction separating information content from document structure.
  • Implemented over an external functional system (at the moment …)
  • Capable of expressing invariants and pre-conditions over data contents
  • Invisible from the user's point of view

SLIDE 13

How?

  • Special comment sections: embedding code in DTDs

    <!DOCTYPE king [
    <!ELEMENT king - - (name, coname, bdate, …)>
    <!-- INV inv_king(k) = …
    -->

  • Through an anchor to an external file

    <!-- INV: king.cam -->
    <!DOCTYPE king [ … ]>

SLIDE 14

Example: kings and decrees

king.dtd

    <!-- INV: king.cam -->
    <!DOCTYPE king [
    <!ELEMENT king - - (name, coname, bdate, ddate, decree+)>
    <!ELEMENT decree - - (date, body)>
    <!ELEMENT (name,coname,bdate,ddate,date) - - (#PCDATA)>
    <!ELEMENT body - - (#PCDATA)>
    ]>

king.cam

    inv_king(k) = {
      if( k notin famous_personsDB → k ++ “ not in FPDB”),
      if( bdate_(k) > ddate_(k) → k ++ “ died before he was born”),
      if( ddate_(k) - bdate_(k) > 120 → k ++ “ lived more than 120”),
      if( !all( x ← decree_l(k) : bdate_(k) < date_(x) /\ date_(x) < ddate_(k) )
          → k ++ “ made a decree outside his life” )
    };

SLIDE 15

Example: kings and decrees

    <king>
      <name>D.Dinis</name>
      <coname>Farmer</coname>
      <bdate>1270.09.23</bdate>
      <ddate>1370.09.23</ddate>
      <decree>
        <date>1300.07.15</date>
        <body>From this day only bicycles are allowed to circulate.</body>
      </decree>
      <decree>
        <date>1389.11.03</date>
        <body>McDonald’s will sell green wine instead of COCA-COLA.</body>
      </decree>
    </king>

ERRORS:
    D.Dinis must be inserted in FPDB.
    D.Dinis made a decree outside his life.
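
To make the example concrete, here is a rough Python transcription of the four conditions of the king invariant, applied to the D.Dinis document. It is only a sketch: the original runs in CAMILA, and the `King` and `Decree` types, the `parse_date` helper and the empty famous-persons set are assumptions introduced for illustration. Ages are approximated by the difference in years.

```python
from datetime import date
from typing import List, NamedTuple, Set


class Decree(NamedTuple):
    when: date
    body: str


class King(NamedTuple):
    name: str
    coname: str
    bdate: date
    ddate: date
    decrees: List[Decree]


def parse_date(text: str) -> date:
    """Parse the 'YYYY.MM.DD' notation used in the example document."""
    year, month, day = (int(part) for part in text.split("."))
    return date(year, month, day)


def inv_king(k: King, famous_persons_db: Set[str]) -> List[str]:
    """Return one message per violated condition; empty list means valid."""
    errors = []
    if k.name not in famous_persons_db:
        errors.append(f"{k.name} not in FPDB")
    if k.bdate > k.ddate:
        errors.append(f"{k.name} died before he was born")
    if k.ddate.year - k.bdate.year > 120:
        errors.append(f"{k.name} lived more than 120")
    if not all(k.bdate < d.when < k.ddate for d in k.decrees):
        errors.append(f"{k.name} made a decree outside his life")
    return errors


dinis = King(
    name="D.Dinis",
    coname="Farmer",
    bdate=parse_date("1270.09.23"),
    ddate=parse_date("1370.09.23"),
    decrees=[
        Decree(parse_date("1300.07.15"), "From this day only bicycles ..."),
        Decree(parse_date("1389.11.03"), "McDonald's will sell green wine ..."),
    ],
)

# Assumption: the famous-persons database does not contain D.Dinis.
errors = inv_king(dinis, famous_persons_db=set())
for message in errors:
    print(message)
```

The two violated conditions are the same two reported on the slide: the king is missing from the database, and the 1389 decree falls after the death date of 1370.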

SLIDE 16

Other Examples

  • Tying an Archaeological Database to a GIS:
    – archaeological SGML documents have geographical coordinates.
    – we must ensure that every one of those coordinates is within a certain range.
  • City Council Elections
    – each voting section produces a final report with the results (an SGML document).
    – we must ensure that the number of votes matches the number of registered voters minus the absent ones.
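
Both constraints are simple enough to sketch in a few lines of Python. The function names and the bounding box used in the usage lines are hypothetical; the slides do not state the actual coordinate range or the report layout.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude), an assumed representation


def out_of_range(points: List[Point],
                 lat_range: Tuple[float, float],
                 lon_range: Tuple[float, float]) -> List[Point]:
    """Return the coordinates that fall outside the allowed range."""
    (lat_min, lat_max), (lon_min, lon_max) = lat_range, lon_range
    return [
        (lat, lon)
        for lat, lon in points
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max)
    ]


def votes_match(votes_cast: int, registered: int, absent: int) -> bool:
    """Votes cast must equal registered voters minus the absent ones."""
    return votes_cast == registered - absent


# Hypothetical bounding box, for illustration only.
bad = out_of_range([(41.55, -8.42), (48.85, 2.35)],
                   lat_range=(36.9, 42.2), lon_range=(-9.6, -6.2))
report_ok = votes_match(votes_cast=850, registered=1000, absent=150)
```

Returning the offending points, instead of a single boolean, lets the validator name the exact coordinates that need revision.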

SLIDE 17

New SGML authoring and processing model

[Diagram. The original model (tools: Editor, Editor, Parser, Formatter; processes: DTD Design Process, Authoring Process, Validation Process 1/2 with OK / errors, Formatting Process; artifacts: DTD, SGML Doc., Valid SGML Doc., Style Specification, OUTPUT) extended with Validation Process 2/2: DTD2CAM, ESIS, CAMILA, OK / errors.]

SLIDE 18

Camila Validation Process

[Diagram. Actors: Designer, User. Inputs: DTD, document (<king> … </king>). Tools: nsgmls, dtd2cam, esis2cam, validate. Intermediate data: ESIS; Types, Invariants, aux. func. (LOAD). Result: OK / errors. Legend: Data flow, Control flow.]
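
For readers unfamiliar with ESIS: nsgmls prints one event per line, where the first character is a command (`(` opens an element, `)` closes it, `-` carries character data, `C` marks a conforming document). The Python sketch below reads such a stream into a tree. It is not esis2cam, whose output is not shown in the slides; the `Node` class is an assumption, and the sample stream is hand-written for illustration rather than captured from nsgmls.

```python
from typing import Iterable, List, Optional, Union


class Node:
    """An element with its children (sub-elements or text chunks)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List[Union["Node", str]] = []


def parse_esis(lines: Iterable[str]) -> Optional[Node]:
    """Build an element tree from ESIS lines; other commands are ignored."""
    root: Optional[Node] = None
    stack: List[Node] = []
    for line in lines:
        if not line:
            continue
        command, argument = line[0], line[1:]
        if command == "(":          # start of element
            node = Node(argument)
            if stack:
                stack[-1].children.append(node)
            else:
                root = node
            stack.append(node)
        elif command == ")":        # end of element
            stack.pop()
        elif command == "-" and stack:  # character data
            stack[-1].children.append(argument)
    return root


# Hand-written sample in ESIS style (illustrative, not real tool output).
SAMPLE_ESIS = """(KING
(NAME
-D.Dinis
)NAME
(BDATE
-1270.09.23
)BDATE
)KING
C"""

tree = parse_esis(SAMPLE_ESIS.splitlines())
```

Attribute lines and escape sequences in data are left out to keep the sketch short; a real converter would need to handle them.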

SLIDE 19

Camila Validation Process

[Same diagram as on slide 18, with the dtd2cam step illustrated: an element declaration is translated into a type and a default invariant.]

    <!ELEMENT king - - (name, coname, bdate, ddate, decree+)>

dtd2cam

    TYPE
      king = name_ : name
             coname_ : coname
             bdate_ : bdate
             ddate_ : ddate
             decree_l : decree-seq ;
    ENDTYPE
    inv_king( k ) = true;
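
The mapping shown on this slide can be approximated in Python as follows. This is a sketch of the idea only, not the real dtd2cam: it assumes a flat, comma-separated content model, handles only plain names and the `+` occurrence indicator (the two cases visible on the slide), and rejects everything else. The function name `element_to_type` is hypothetical.

```python
import re

# Matches e.g. <!ELEMENT king - - (name, coname, decree+)>
ELEMENT_RE = re.compile(
    r"<!ELEMENT\s+(\w+)\s+[-O]\s+[-O]\s+\(([^)]*)\)\s*>", re.IGNORECASE
)


def element_to_type(declaration: str) -> str:
    """Translate one element declaration into a type plus default invariant."""
    match = ELEMENT_RE.search(declaration)
    if match is None:
        raise ValueError("unsupported element declaration")
    name, model = match.group(1), match.group(2)

    fields = []
    for item in model.split(","):
        item = item.strip()
        if item.endswith("+") and re.fullmatch(r"\w+", item[:-1]):
            child = item[:-1]
            fields.append(f"{child}_l : {child}-seq")   # repeated child
        elif re.fullmatch(r"\w+", item):
            fields.append(f"{item}_ : {item}")          # single child
        else:
            raise ValueError(f"unsupported content item: {item!r}")

    lines = ["TYPE", f"  {name} ="]
    lines += [f"    {field}" for field in fields]
    lines[-1] += " ;"
    lines += ["ENDTYPE", f"inv_{name}( k ) = true;"]
    return "\n".join(lines)


generated = element_to_type(
    "<!ELEMENT king - - (name, coname, bdate, ddate, decree+)>"
)
print(generated)
```

The generated invariant is always `true`, matching the slide: the designer is expected to replace it with real conditions such as those in king.cam.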

SLIDE 20

Conclusion

  • The proposed model lets us attach data constraints to DTD element contents.
  • We can avoid many errors introduced by a distracted user.
  • We can improve information quality and reduce the information revision cycle.

SLIDE 21

Conclusion (cont.)

  • In the case studies we have dealt with so far, we didn’t find complex invariants.
  • Structural correctness imposed by SGML already enforces some validation over element contents.
  • Most of the needed invariants are very simple: domain range validation, relationship validation, ...

SLIDE 22

Future Work

  • A simple constraint language is being studied/created to optimize the proposed system.
  • We are going to implement this validation scheme (with the new language) in our prototype INES (“A Document Programming Environment”).

SLIDE 23

INES: Document Programming Env

[Diagram. Designer inputs to INES: DTD “X”, Context Rules, Style Specification. Users: User A, User B, User C, entering Text “Y”, Text “X”, Text “Z”. Outputs: Doc X, Doc Y, Doc Z.]

SLIDE 24

INES: inside

[Diagram labels. Actors: Designer, User. Components: DTD Editor, Editor Generator SGEN, “X” Editor, Context Editor (Context Conditions; Invariants), DSSSL Editor (Style Specification). Data: DTD, SGML text, Text, Errors, Scheme code, Doc X. Output formats: RTF, PostScript.]