The TEI Workflow: from design to dissemination (Lou Burnard)

SLIDE 1
  • 1. Organization
  • 2. Conservation
  • 3. Dissemination
  • 4. Data Modelling
  • 5. Implementation

The TEI Workflow: from design to dissemination

Lou Burnard Consulting

SLIDE 2

Stages in managing a digital project

  1. organisation: in which we get the funding, decide what we're trying to do, and recruit someone to do it
  2. conservation: in which we think about how we will ensure our work is not lost when the money runs out
  3. dissemination: in which we consider how to make our work available to people
  4. data modelling: in which we argue about the structure and essence of our materials
  5. implementation: in which we actually build something

The TEI has most to contribute to aspects 4 and 5.

SLIDE 3

An (imaginary) case study

The Virgolos Archive holds one of the largest and most varied imaginary collections of historic picture postcards in the world. Its founder, recently-deceased eccentric Belgian millionaire Marcel Virgolos, left substantial funding in his will for the digitization and dissemination of the archive. Proposals are now invited ...

SLIDE 4

1. Organization

You will need to draw up a persuasive project plan...
  • Does your institution provide a research support officer?
  • What will be done when and by whom?
  • What deliverables are expected at each stage of the project?
  • What dependencies are there between the different stages of the project plan?
  • How will each stage be validated?
  • What will you do if targets are not achieved?
  • A Gantt chart might help

You will need to select suppliers...
  • Digitization in house or by a vendor?
  • Get advice from professionals: not all digitization is the same
  • E.g. (in UK) http://www.jiscdigitalmedia.ac.uk/digitisation
  • TEI Consortium members may get special rates; see http://www.tei-c.org/AccessTEI/

SLIDE 5

Organizing Virgolos : shopping list

  • Make a representative sample from the archive
  • Get quotes from prospective digitization suppliers
  • Experiment with transcribers
  • Calculate digitization/transcription workflow
  • Test workflow
  • Revise workflow till it works...
  • Award contracts

SLIDE 6

2. Conservation

This may seem obvious, but...
  • Digital media are fragile and must be properly maintained
  • This applies (obviously) to their physical storage: is the Cloud going to last forever?
  • Think also about the format of your data: can it be migrated without effort?
  • Take steps to preserve your data: it is far more valuable than the interface you provide to it
  • Again there are specialist national and international agencies out there ready to help you, e.g. the Digital Preservation Coalition... but start with your institutional librarian!

SLIDE 7

Conserving Virgolos

  • Our digital data is all in standard formats (TIFF, PNG, TEI-XML...)
  • Our TEI-XML is formally documented in an ODD
  • We control dissemination via our own server
  • We are negotiating a deposit arrangement with a national or institutional library

SLIDE 8

3. Dissemination

What will people want to do with our digital archive?

  1. Browse it on the web
  2. Extract (parts of) it as an e-book
  3. Analyse its data content
  4. Analyse its linguistic content
  5. Search by
      • topics represented visually, or discussed
      • sender, recipient, location, date, time...
  6. Visualise patterns in distribution of data or content

SLIDE 9

And what kind of people will they be?

  • Are there any legal and IPR issues to resolve?
  • Is this a project for the general public or for specialists only?
  • What can we do to address concerns about linguistic or cultural sensitivities?
  • How about accessibility issues?

SLIDE 10

The Virgolos Digital Archive : a public resource

Our audience is potentially large and diverse:
  • cultural heritage and tourism
  • geographers and social historians
  • historical linguists
  • bibliographers and librarians
  • the general public

We will make our digital versions available under a Creative Commons licence.

We will set up an outreach programme:
  • social media for promotional activity
  • academic publications in specialist journals
  • attractive and easy to use website

And we will welcome other agencies trying to integrate our resources with theirs.

Some good role models: Gallica, Old Bailey

SLIDE 11

Access methods

  • Print: produce a high-quality, readable printed version of selected items
  • Display: render TEI XML in a web browser
  • Transform: turn the TEI XML into (e.g.) HTML5 for an eBook
  • Index: provide human-readable entry points into the text
  • Expose the data: provide machine-readable entry points into the text

And how about making the TEI XML source available too?

SLIDE 12

4. Data Modelling

'Conceptual analysis is the work of philosophers, lawyers, lexicographers, systems analysts and database administrators.' [Sowa, 1984]

  • Several formal methods have been developed to assist in the task of data modelling
  • Their application should always be informed by domain-specific knowledge
  • In other words, the job should not be left to the information scientists!
  • Information (concept) modelling is a necessary preliminary to data modelling

SLIDE 13

A traditional data analysis

Seeks to identify...
  • the "objects of interest"
  • their attributes or properties
  • relationships amongst those properties
  • the current and anticipated processing of those objects

SLIDE 14

For example...

SLIDE 15

Is the TEI data model suitable for our postcards?

TEI out of the box is designed to work with traditionally organised books and manuscripts. But suppose we want to work on a slightly different kind of object... a postcard collection, or a monumental inscription? How do we make a TEI schema to handle hundreds or thousands of things like this:

SLIDE 16

A postcard (front)

SLIDE 17

A postcard (back)

SLIDE 18

Another postcard

Not all cards are organized the same way...

SLIDE 19

Which are the most significant components of these texts?

  • the picture
  • the postmark
  • the printed part
  • the message(s) written on them
  • the addressee(s)
  • subject matter of the picture
  • information about the publishing, printing, circulation of the card, or other metadata...

SLIDE 27

Data and text

The data view is most concerned with the production and transmission of the card:
  • Who produced it and where?
  • Who sent it to whom, when and where?

The textual view is most concerned with the content of the card:
  • What does it represent?
  • What is the message and how is it expressed?

Our challenge: combine the two in a single TEI structure

SLIDE 28

Suggestion

We could begin by structuring the content of the card as divisions of various types.

Physically:
  • recto: one side, usually the one with the picture
  • verso: the other side, usually the one with the message

On these two surfaces, we expect to find various other subsections, such as:
  • the message
  • information about the sending of the card, notably:
      • the addressee
      • the postmark, stamp, etc.
  • data about the publication, sale, collection, etc. of this card
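
Before any tags are chosen, the structure suggested above can be sketched as plain data types. This is only an illustration in Python, not part of the original deck: the class and field names (Card, Surface, Sending) are invented for the sketch, and the sample values are borrowed from the example card used later in the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sending:
    """Information about the sending of the card."""
    addressee: Optional[str] = None
    postmark: Optional[str] = None
    stamp: Optional[str] = None


@dataclass
class Surface:
    """One physical side of the card."""
    side: str                       # "recto" or "verso"
    image: Optional[str] = None     # path to the digital image, if any
    message: Optional[str] = None
    sending: Optional[Sending] = None


@dataclass
class Card:
    """A postcard: its surfaces plus data about publication, sale, etc."""
    identifier: str
    surfaces: List[Surface] = field(default_factory=list)
    publication: Optional[str] = None

    def surface(self, side: str) -> Optional[Surface]:
        """Return the surface with the given side name, or None."""
        for candidate in self.surfaces:
            if candidate.side == side:
                return candidate
        return None


card = Card(
    identifier="0010",
    surfaces=[
        Surface(side="recto", image="cartes/19800726_001r.jpg"),
        Surface(
            side="verso",
            image="cartes/19800726_001v.jpg",
            sending=Sending(addressee="Madame Lefrère",
                            postmark="EL PASO. TX 799"),
        ),
    ],
)
print(card.surface("verso").sending.postmark)
```

Writing the model down this way makes the open questions visible early: for instance, whether a message can only ever appear on the verso, or whether one card can carry several.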

SLIDE 29

First try at encoding a postcard

    <carte n="0010">
      <recto url="cartes/19800726_001r.jpg"/>
      <verso url="cartes/19800726_001v.jpg">
        <obliteration>
          <date>PM ?? Jul ???</date>
          <lieu>EL PASO. TX 799</lieu>
        </obliteration>
        <message>
          <p>26 juill 80</p>
          <p>Chère Madame, après New-York et Washington dont le gigantisme
            m'a beaucoup séduite, nous avons commencé notre conquête de
            l'Ouest par New Orleans, ville folle en fête perpétuelle. Il fait
            une chaleur torride au Texas mais le coca-cola permet de
            résister – l'Amérique m'enchante ! Bientôt, le grand Canyon, le
            Colorado et San Francisco... En espérant que vous passez de
            bonnes vacances, affectueusement</p>
          <p>Sylvie</p>
          <p>François.</p>
        </message>
        <destinataire>Madame Lefrère 4, allée George Rouault 75020 Paris France</destinataire>
      </verso>
    </carte>

SLIDE 30

Commentary

  • We didn't use the TEI vocabulary. This means we may have trouble sharing our data with non-French speakers, explaining it to them, or benefiting from their work.
  • We haven't included all the things that might be encoded: for example, corrections in the text, layout of the components, names of people or places referred to, linguistic or historical features, bibliographic data about where the card was printed...
  • We haven't structured (for example) the address, which will make intelligent searching difficult.

Of course, we can always invent more tags for these things. But isn't it rather a waste of our time if the TEI has already done the job?

SLIDE 31

TEI version

  • We regard each card as a <text> containing two <div> elements, one for the recto and one for the verso of each card.
  • We mark up each functional division of the card as a <div type="[function]">.
  • Metadata, e.g. about the publisher of the card, and data about its transmission will go in the TEI Header.
  • We mark up names of people and places with <name> and dates with <date>.
  • We use the attribute @facs to associate parts of transcribed text with their digital image, indicated by a <graphic> element.
  • We use <address> for the address, and <stamp> for stamps, postmarks, and similar things.
  • We may also need <del> (for deletions), <add> (for additions), <reg> (for regularized spellings), <unclear> (for things we cannot read), <lb> (for line breaks)...

Will that be enough?

SLIDE 32

First try at a TEI version : the header

    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title>San Antonio River : digital edition of card 19800726_001 from the Virgolos collection</title>
        </titleStmt>
        <publicationStmt>
          <p>Demonstration at DH OXSS 2013</p>
        </publicationStmt>
        <sourceDesc>
          <bibl>
            <title level="m">San Antonio River (postcard)</title>
            <publisher>School Mart</publisher>
            <pubPlace>1812 South Press, San Antonio, Texas 70210</pubPlace>
            <idno>SA-146-C</idno>
            <note resp="#ed">The San Antonio river, often called the Venice of
              Texas, winds its way through the business section of San Antonio.
              It is very picturesque with its many bridges and beautifully
              landscaped banks.</note>
          </bibl>
          <listPerson>
            <person xml:id="MLF">
              <persName>Mme Lefrere</persName>
            </person>
          </listPerson>
        </sourceDesc>
      </fileDesc>
    </teiHeader>

SLIDE 33

First try at a TEI version : the text

    <text>
      <body>
        <div type="recto">
          <figure>
            <graphic url="../../Graphics/Cartes/19800726_001r.jpg"/>
            <figDesc>View showing a river with a stone bridge and small
              mexican-style houses. In the foreground a man and a woman in a
              pedalo boat.</figDesc>
            <head>San Antonio River</head>
          </figure>
        </div>
        <div facs="19800726_001v.jpg" type="verso">
          <div type="message">
            <!-- ... -->
          </div>
          <div type="destination">
            <p>
              <!-- stamps -->
            </p>
            <p>
              <address>
                <!-- ... -->
              </address>
            </p>
          </div>
        </div>
      </body>
    </text>

SLIDE 34

First try at a TEI version : the message

    <div type="message" xml:lang="fr">
      <p>
        <date when="1980-07-26">26 juill 80</date>
      </p>
      <p>Chère Madame, après New-York et Washington dont le gigantisme m'a
        beaucoup séduite, nous avons commencé notre conquête de l'Ouest par
        New Orleans, ville folle en fête perpétuelle. Il fait une chaleur
        torride au Texas mais le coca-cola permet de résister – l'Amérique
        m'enchante ! Bientôt, le grand Canyon, le Colorado et San
        Francisco...</p>
      <p>En espérant que vous passez de bonnes vacances, affectueusement.</p>
      <signed>Sylvie</signed>
      <signed>François</signed>
    </div>

SLIDE 35

First try at a TEI version : the destination

    <div type="destination">
      <p>
        <stamp type="postmark">
          <placeName>El Paso</placeName> - TX 799 -
          <date notBefore="1980-07-26">
            <unclear>PM JUL</unclear>
          </date>
        </stamp>
        <stamp type="postage">Male head in profile, with airplane and radar
          tower in background <mentioned>US Airmail 21 c.</mentioned>
        </stamp>
      </p>
      <p>
        <address>
          <addrLine>
            <name ref="#MLF" type="person">Madame Lefrère</name>
          </addrLine>
          <addrLine>4, allée George Rouault</addrLine>
          <addrLine>75020 Paris</addrLine>
          <addrLine>France</addrLine>
        </address>
      </p>
    </div>
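
One payoff of structuring the destination this way is that a program can pick out the postmark and the address without guessing. The following is a minimal sketch in Python using only the standard library; it is not from the original deck. It assumes the fragment sits in the TEI namespace (the slide shows no namespace declaration), omits the postage stamp for brevity, and the function name describe_destination is invented for the example.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# The fragment from the slide, with the TEI namespace added (an assumption)
# and the postage stamp left out to keep the sample short.
DESTINATION = """
<div xmlns="http://www.tei-c.org/ns/1.0" type="destination">
  <p>
    <stamp type="postmark">
      <placeName>El Paso</placeName> - TX 799 -
      <date notBefore="1980-07-26"><unclear>PM JUL</unclear></date>
    </stamp>
  </p>
  <p>
    <address>
      <addrLine><name ref="#MLF" type="person">Madame Lefrère</name></addrLine>
      <addrLine>4, allée George Rouault</addrLine>
      <addrLine>75020 Paris</addrLine>
      <addrLine>France</addrLine>
    </address>
  </p>
</div>
"""


def describe_destination(xml_text):
    """Return the postmark place, its earliest date, and the address lines."""
    div = ET.fromstring(xml_text)
    postmark = div.find(".//tei:stamp[@type='postmark']", NS)
    place = postmark.find("tei:placeName", NS).text
    not_before = postmark.find("tei:date", NS).get("notBefore")
    # itertext() gathers text from nested elements such as <name>.
    lines = ["".join(line.itertext()).strip()
             for line in div.findall(".//tei:addrLine", NS)]
    return {"place": place, "not_before": not_before, "address": lines}


info = describe_destination(DESTINATION)
print(info["place"], info["not_before"], info["address"][-1])
```

The same query against the first, unstructured encoding would have to parse a single run of text for the address, which is the difficulty the commentary slide pointed out.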

SLIDE 36

Why use TEI (or any other common framework)

  • re-usability and repurposing of resources
  • modular software development
  • lower training costs
  • ‘frequently answered questions’: common technical solutions for different application areas

The TEI was designed to support multiple views of the same resource.

The TEI is an evolving model of the concerns of Digital Humanities.

SLIDE 37

A word on TEI Conformance

A document is TEI Conformant if and only if it:
  • is a well-formed XML document
  • can be validated against a TEI Schema, that is, a schema derived from the TEI Guidelines
  • conforms to the TEI Abstract Model
  • uses the TEI Namespace (and other namespaces where relevant) correctly
  • is documented by means of a TEI Conformant specification (an ODD file) which refers to the TEI Guidelines

Standardization should not mean ‘Do what I do’, but rather ‘Explain what you do in terms I can understand’
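
Only some of these criteria can be checked cheaply by a script. As a rough illustration (not from the original deck), the Python sketch below tests just two of them: well-formedness, and whether the root element is a TEI element in the TEI namespace. Passing both says nothing about schema validity, the Abstract Model, or the ODD; the function name quick_checks and the sample strings are invented for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def quick_checks(xml_text):
    """Run two cheap checks; passing them does NOT establish TEI conformance."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return {"well_formed": False, "tei_namespace": False}
    # ElementTree spells a namespaced tag as "{namespace}localname".
    return {"well_formed": True,
            "tei_namespace": root.tag == "{%s}TEI" % TEI_NS}


good = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader/><text/></TEI>'
no_ns = "<TEI><teiHeader/><text/></TEI>"
broken = "<TEI><teiHeader></TEI>"

for sample in (good, no_ns, broken):
    print(quick_checks(sample))
```

Validation against a schema generated from the project's ODD needs a schema-aware tool and is outside what the standard library offers.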

SLIDE 38

5. Implementation

Our project requires many technologies and skills apart from TEI XML! For example:
  • for transformation: XSLT
  • for print: page layout skills
  • for web display: XHTML, CSS, JavaScript, JSON... and interface design skills
  • for searching and indexing: XPath, XQuery, XForms... RDF... SPARQL...

A lot can be achieved by "non-programming" interfaces, but you still need an understanding of these tools

SLIDE 39

Some commonly-used TEI-friendly tools

  • oXygen: includes many built-in transformations to make different outputs from a TEI file, and also many ways of ‘intelligently’ searching TEI files
  • TEI Boilerplate: browser addons for display of TEI-XML, using CSS to control its rendering
  • Omeka: full content management system (CMS) which knows about TEI XML, much used in the digital historian community
  • OxGarage: a web-based tool for performing transformations between TEI and many other formats
  • eXist-db: a powerful XML database system; supports XQuery and underlies many complex document archives (see also BaseX)

Wiki page on tools

SLIDE 40

A web service: OxGarage

OxGarage provides a web interface to the TEI-C XSL stylesheets and their profiles: http://www.tei-c.org/oxgarage

It can (amongst other things):
  • generate schemas like Roma
  • generate documentation in HTML, ePub, DOCX, ODT...
  • convert between TEI XML and Word DOCX (or ODT)
  • chain sets of transformations together

It can operate as a scripted web service, or online.

SLIDE 41

Matrix of OxGarage conversions

SLIDE 42

http://www.tei-c.org/ege-webclient/

SLIDE 45

... or at the command line

ODD to HTML, in French

    curl -s \
      -F upload=@test.odd \
      -o test.html \
      "http://oxgarage.oucs.ox.ac.uk:8080/ege-webservice/Conversions/ODD%3Atext%3Axml/ODDC%3Atext%3Axml/oddhtml%3Aapplication%3Axhtml%2Bxml/?properties=<conversions><conversion%20index='1'><property%20id='oxgarage.lang'>fr</property></conversion></conversions>"
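
The long URL in the command above is easier to read once it is assembled from its parts: a chain of format identifiers, each percent-encoded, followed by a properties parameter. The Python sketch below (not from the original deck) only builds the string and sends no request; the host and identifiers are copied from the slide, and the function name conversion_url is invented for the example.

```python
from urllib.parse import quote

BASE = "http://oxgarage.oucs.ox.ac.uk:8080/ege-webservice/Conversions/"


def conversion_url(steps, lang):
    """Build a conversion URL from a chain of format identifiers."""
    # Each step is percent-encoded in full: ":" becomes %3A, "+" becomes %2B.
    path = "".join(quote(step, safe="") + "/" for step in steps)
    properties = (
        "<conversions><conversion index='1'>"
        "<property id='oxgarage.lang'>%s</property>"
        "</conversion></conversions>" % lang
    )
    # Only spaces are encoded here, matching the form shown on the slide.
    return BASE + path + "?properties=" + quote(properties, safe="<>='/")


url = conversion_url(
    ["ODD:text:xml", "ODDC:text:xml", "oddhtml:application:xhtml+xml"],
    "fr",
)
print(url)
```

Read this way, the command asks for a two-step chain (ODD to compiled ODD, then to HTML documentation) with the output language set to French.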

SLIDE 46

... or you can roll your own

Key W3C standards:
  • XSLT: a language for transforming XML
  • XPath: a language for expressing paths through XML trees
  • XSL FO: an XML vocabulary for describing formatted pages
  • CSS: a (non-XML!) language for specifying how HTML/XML documents should be rendered on screen or paper

These languages provide all the functionality you need for processing any XML document, including TEI

SLIDE 47

XSLT

The XSLT language is a language for defining transformations:
  • expressed in XML
  • uses namespaces to distinguish output from instructions
  • purely functional
  • reads and writes XML trees

SLIDE 48

What is a transformation?

Take this

    <div type="recipe" n="34">
      <head>Student Pasta</head>
      <list>
        <item>Pasta</item>
        <item>Grated cheese</item>
      </list>
      <p>Cook the pasta and mix with the cheese</p>
    </div>

and make this

    <html>
      <h1>34: Student Pasta</h1>
      <p>Ingredients: Pasta Grated cheese</p>
      <p>Cook the pasta and mix with the cheese</p>
    </html>

SLIDE 49

Structure of an XSL file

An XSLT file (a stylesheet) contains a number of templates, each of which specifies actions to be taken when some part of an XML tree is processed

    <xsl:stylesheet xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0">
      <xsl:template match="div">
        <!-- .... do something with div elements....-->
      </xsl:template>
      <xsl:template match="p">
        <!-- .... do something with p elements....-->
      </xsl:template>
    </xsl:stylesheet>

The div and p are patterns which specify which bit of the document is matched by the template. Any element not starting with xsl: in a template body is put into the output.

SLIDE 50

How do you express that in XSLT?

    <xsl:stylesheet xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0">
      <xsl:template match="div">
        <html>
          <h1>
            <xsl:value-of select="@n"/>: <xsl:value-of select="head"/>
          </h1>
          <p>Ingredients: <xsl:apply-templates select="list/item"/>
          </p>
          <p>
            <xsl:value-of select="p"/>
          </p>
        </html>
      </xsl:template>
    </xsl:stylesheet>

Note: the namespace declaration linking xsl: to http://www.w3.org/1999/XSL/Transform is not shown in these examples.
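
For readers who have no XSLT processor to hand, the same mapping can be followed step by step in ordinary code. The sketch below is a Python approximation using the standard library, not XSLT and not part of the original deck. It assumes the recipe is in no namespace, and it joins the ingredients with a space so that the result matches the target shown on the earlier slide; the function name recipe_to_html is invented for the example.

```python
import xml.etree.ElementTree as ET

RECIPE = """
<div type="recipe" n="34">
  <head>Student Pasta</head>
  <list>
    <item>Pasta</item>
    <item>Grated cheese</item>
  </list>
  <p>Cook the pasta and mix with the cheese</p>
</div>
"""


def recipe_to_html(xml_text):
    """Mirror the single div template, one output element at a time."""
    div = ET.fromstring(xml_text)
    html = ET.Element("html")
    # <h1>: the n attribute, a colon, then the text of <head>
    h1 = ET.SubElement(html, "h1")
    h1.text = "%s: %s" % (div.get("n"), div.findtext("head"))
    # first <p>: the text of every list/item
    ingredients = ET.SubElement(html, "p")
    items = [item.text for item in div.findall("list/item")]
    ingredients.text = "Ingredients: " + " ".join(items)
    # second <p>: the text of the recipe's own <p>
    method = ET.SubElement(html, "p")
    method.text = div.findtext("p")
    return ET.tostring(html, encoding="unicode")


print(recipe_to_html(RECIPE))
```

The XSLT version is declarative where this one is procedural, but the paths it uses (@n, head, list/item, p) are the same ones that appear in the findtext and findall calls.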

SLIDE 51

Processing an XML document with XSLT

  1. Starting at the root node of the document...
  2. If the stylesheet has a template matching the element you are looking at, apply it; if not, continue with the element's children
  3. If the element you are processing contains only text nodes, output them
  4. A template may access any part of the tree, extracting data from any element or attribute
  5. The order of templates in your program file is immaterial
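
The processing model in the list above can be imitated in a few lines of ordinary code. This Python sketch is a simplification for illustration only, not how any real XSLT processor is implemented: templates are plain functions looked up by element name, and the names process, head_template and item_template are invented for the example.

```python
import xml.etree.ElementTree as ET


def process(node, templates, out):
    """Apply a matching template if there is one; otherwise descend."""
    template = templates.get(node.tag)
    if template is not None:
        template(node, templates, out)
        return
    # No template: output the text and carry on with the children.
    if node.text:
        out.append(node.text)
    for child in node:
        process(child, templates, out)
        if child.tail:
            out.append(child.tail)


def head_template(node, templates, out):
    out.append("<h1>%s</h1>" % node.text)


def item_template(node, templates, out):
    out.append("<li>%s</li>" % node.text)


# Lookup is by name, so the order in which templates are listed is irrelevant.
TEMPLATES = {"item": item_template, "head": head_template}

doc = ET.fromstring(
    "<div><head>Student Pasta</head>"
    "<list><item>Pasta</item><item>Grated cheese</item></list>"
    "<p>Cook the pasta</p></div>"
)
out = []
process(doc, TEMPLATES, out)
result = "".join(out)
print(result)
```

Here div, list and p have no template, so processing falls through to their children or text, while head and item are rewritten by their templates.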

SLIDE 52

The importance of XPath

XPath is the language we use to define which part or parts of an XML tree we are talking about. It defines a path through the tree starting from a given position (the context):
  • foo: an element called <foo>
  • foo/bar: a <bar> contained directly by a <foo>
  • foo/bar/@blort: an attribute @blort on a <bar> contained directly by a <foo>
  • foo/bar[@blort]: a <bar> which has an attribute @blort and is contained directly by a <foo>
  • foo/bar[@blort='yes']: a <bar> which has an attribute @blort with the value 'yes' and is contained directly by a <foo>

The context can also be defined:
  • //foo: an element called <foo> anywhere in the document
  • ancestor::foo/bar: a <bar> contained directly by a <foo> which is an ancestor of the current position
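
Several of these paths can be tried out without an XSLT processor. The sketch below uses Python's standard xml.etree.ElementTree, which supports only a subset of XPath: it has no ancestor:: axis, cannot return attributes as path results, and needs .// rather than // when searching from an element. The sample document and the helper texts are invented for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc>"
    "<foo><bar blort='yes'>one</bar><bar>two</bar></foo>"
    "<section><foo><bar blort='no'>three</bar></foo></section>"
    "</doc>"
)


def texts(path):
    """Evaluate a path with <doc> as the context; return the matched text."""
    return [element.text for element in doc.findall(path)]


direct = texts("foo/bar")                # <bar> directly inside a child <foo>
with_attr = texts("foo/bar[@blort]")     # ...which has a blort attribute
with_yes = texts("foo/bar[@blort='yes']")  # ...whose blort is 'yes'
anywhere = texts(".//foo/bar")           # <bar> in a <foo> at any depth

print(direct, with_attr, with_yes, anywhere)
```

The difference between the first and last results shows the role of the context: foo/bar only sees the <foo> that is a child of the starting point, while .//foo/bar also reaches the one nested inside <section>.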

SLIDE 53

CSS : Cascading Stylesheets

  • a simple language for specifying how documents should be formatted
  • the specifications are attached to nodes in a document
  • each specification sets one or more properties, such as colour, font, size etc. for an element or a class of elements
  • values of properties not specified are inherited (cascaded) from a parent

    body {
      background: white;
      font-family: Helvetica;
      font-size: 24pt;
      color: black;
    }
    p { margin-top: 6px; }
    p.it { font-style: italic; }

  • processed directly and (fairly) consistently by current web browsers
  • a CSS stylesheet is usually stored on a server and linked to HTML or XML documents being served from it
  • displays the whole of a document, in the order in which it is stored

SLIDE 54

XSLFO : XSL Formatting Objects

A language for describing the layout of paged documents
  • an XML document is converted (‘de-serialised’) to an XML document containing formatting objects by an XSLT transformation
  • these objects are then converted to a printable format such as PDF or Postscript by a rendering agent
  • implementations are (mostly) expensive page layout software

Good for top-quality typographic output; see also LaTeX

SLIDE 55

XQUERY : a programming language for XML

  • To handle large quantities of XML documents some sort of database system is essential
  • An XML database system such as eXist, BaseX, or MarkLogic is a powerful piece of software which handles XML data in the same way as a relational database handles relational data
  • XQUERY is the standard query language for such databases (analogous to SQL for RDBMS)

SLIDE 56

XQUERY = XPATH + FLWOR

A sample FLWOR expression:

    for $act in doc("hamlet.xml")//ACT
    let $speakers := distinct-values($act//SPEAKER)
    return
      <div>
        <h1>{ string($act/TITLE) }</h1>
        <ul>
        {
          for $speaker in $speakers
          return <li>{ $speaker }</li>
        }
        </ul>
      </div>
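
The for / let / return shape of the expression above corresponds closely to a loop in a general-purpose language. The Python sketch below (not from the original deck) follows the same three steps on a tiny made-up stand-in for hamlet.xml; the real file is not reproduced here, and the function name speakers_by_act is invented for the example.

```python
import xml.etree.ElementTree as ET

# A tiny invented stand-in for hamlet.xml, using the same element names.
PLAY = """
<PLAY>
  <ACT><TITLE>ACT I</TITLE>
    <SPEECH><SPEAKER>BERNARDO</SPEAKER></SPEECH>
    <SPEECH><SPEAKER>FRANCISCO</SPEAKER></SPEECH>
    <SPEECH><SPEAKER>BERNARDO</SPEAKER></SPEECH>
  </ACT>
  <ACT><TITLE>ACT II</TITLE>
    <SPEECH><SPEAKER>LORD POLONIUS</SPEAKER></SPEECH>
  </ACT>
</PLAY>
"""


def speakers_by_act(xml_text):
    """Return (act title, distinct speakers in order of appearance) pairs."""
    play = ET.fromstring(xml_text)
    result = []
    for act in play.iter("ACT"):                 # for $act in ...//ACT
        speakers = []                            # let $speakers := distinct-values(...)
        for speaker in act.iter("SPEAKER"):
            if speaker.text not in speakers:
                speakers.append(speaker.text)
        result.append((act.findtext("TITLE"), speakers))  # return ...
    return result


acts = speakers_by_act(PLAY)
print(acts)
```

The XQuery version additionally wraps each result in HTML and, in a database such as those named earlier, can run against stored documents rather than a string in memory.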

SLIDE 57

Typical XQUERY uses

  • Extracting information from a database for use in a web service
  • Generating summary reports on data stored in an XML database
  • Searching textual documents on the Web for relevant information and compiling the results
  • Selecting and transforming XML data to XHTML to be published on the Web
  • Pulling data from databases to be used for application integration
  • Splitting up an XML document that represents multiple transactions into multiple XML documents

There is considerable functional overlap with XSLT...

SLIDE 58

XSLT and XQUERY

  • The XPATH specification is used by both
  • XSLT is (historically) concerned with transformations of documents to some other form
  • XQUERY is (historically) concerned with manipulation of data
  • The two are often used in a complementary way: documents are stored in, and retrieved from, an XML database, before being processed by an XSLT stylesheet to render them (or parts of them) as web pages, epub, etc.

See for example http://history.state.gov

SLIDE 59

Implementation of the Virgolos Archive

Alas, the money ran out while we were still arguing about our data model...
  • As the cards are digitized and archived, we create metadata for them: minimal TEI Headers
  • These are stored in our eXist database
  • Our clever programmers are developing a crowd-sourcing application to produce transcriptions of the cards, which will be stored in the same place
  • They will also develop a friendly web site to permit searching and display of individual items, or a series...

For a (real) example, see http://graves.uvic.ca/graves/site/index.xml
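
Creating a minimal header for each card as it is digitized is the kind of step that is easily scripted. The Python sketch below is one possible shape, not the project's actual tooling: it builds a teiHeader with the three children of fileDesc used in the header example earlier in the deck. The function name minimal_header and the publication statement text are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialise TEI elements with a default namespace rather than a prefix.
ET.register_namespace("", TEI_NS)


def tei(name):
    """Return the ElementTree spelling of a TEI element name."""
    return "{%s}%s" % (TEI_NS, name)


def minimal_header(card_id, title):
    """Build a teiHeader holding a title, a publication note and an idno."""
    header = ET.Element(tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    publication = ET.SubElement(file_desc, tei("publicationStmt"))
    # Placeholder wording: an assumption, not taken from the slides.
    ET.SubElement(publication, tei("p")).text = "Virgolos Digital Archive"
    source = ET.SubElement(file_desc, tei("sourceDesc"))
    bibl = ET.SubElement(source, tei("bibl"))
    ET.SubElement(bibl, tei("idno")).text = card_id
    return header


header = minimal_header("19800726_001", "San Antonio River")
print(ET.tostring(header, encoding="unicode"))
```

Headers produced this way can be enriched later, for example with publisher details or a list of persons, once transcription begins.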
