XML Technologies and Applications


SLIDE 1

XML-1 J. Teuhola 2013 1

XML Technologies and Applications

Jukka Teuhola
University of Turku, Dept. of Information Technology
Computer Science, Spring 2013

SLIDE 2

1. General

  • Extent: 5 study points
  • Level: Advanced (syventävä)
  • Form: Self-study course
  • Components: Written material, exercise project, exam
  • Starting lecture (2 h): Tue 15.1.2013 at 8:15-10 in Beta
  • Exercise project:
    – See instructions in: http://staff.cs.utu.fi/kurssit/XML_technologies_and_applications/spring_2013/project/
    – Must be finished before the examination
  • Exam dates: March 11th, 2013; two others to be announced
  • Preliminary knowledge (recommended):
    – HTML
    – Programming in Java
    – Principles of Databases

SLIDE 3

Course material

  • PowerPoint slides: http://staff.cs.utu.fi/kurssit/XML_technologies_and_applications/spring_2013/slides

The slides are in principle sufficient for passing the course, but for a more detailed presentation an XML textbook can be useful, such as:

– Elliotte Rusty Harold, W. Scott Means: "XML in a Nutshell", O'Reilly, 2nd ed., 2002
– Neil Bradley: "The XML Companion", Addison-Wesley, 2002
– Anders Møller, Michael Schwartzbach: "An Introduction to XML and Web Technologies", Addison-Wesley, 2006
– P. J. Deitel, H. M. Deitel: "Internet and World Wide Web: How to Program", Prentice Hall, 2008
– Ossi Nykänen: "XML", Docendo, 2001 (in Finnish)

SLIDE 4

Useful links – mainly recommendations by WWW Consortium (W3C XML)

  • XML 1.0, XML 1.1
  • Namespaces in XML
  • XML Schema
  • Extensible Stylesheet Language: XSL 1.0, XSL 1.1
  • XSL Transformations: XSLT 1.0, XSLT 2.0
  • XML Path Language: XPath 1.0, XPath 2.0, XPath 3.0
  • XML Linking
  • Cascading Style Sheets (CSS)
  • Document Object Model (DOM)
  • Simple API for XML (SAX)
  • XML Query (XQuery)
  • HTML5

SLIDE 5

Other useful web sources

  • Tutorials: W3Schools, Møller & Schwartzbach, OASIS Cover Pages
  • Frequently Asked Questions: The XML FAQ
  • Java & XML: Oracle's pages
  • Software: Apache XML
  • Tools: XMLSpy (free trial), Cooktop (free), XMLFox (free), and many others

  • XML News and Resources: Cafe con Leche

SLIDE 6

Contents

1. General
2. XML syntax
3. Defining the document structure: DTD
4. Designing the document structure
5. Namespaces
6. Defining the document structure: XML Schema
7. Character sets
8. Transformations (XSLT)
9. Selecting parts: XPath
10. XML links and pointers
11. Formatting documents: CSS and XSL-FO
12. Application programming interfaces (APIs) for XML
13. XML databases and querying
14. Application areas

SLIDE 7

What is XML?

  • "Extensible Markup Language"
  • Generalized way of representing the structure of documents
  • Actually a meta-language, a formalism enabling the definition of application-specific markup
  • Simplification of SGML (Standard Generalized Markup Language, ISO 1986)
  • Developed by World Wide Web Consortium (W3C, recommendation 1998)
  • Open standard, independent of vendors and operating systems

SLIDE 8

What is XML? (Cont.)

  • XML was originally developed to enhance HTML
  • It soon turned out to have much wider use in document processing
  • Numerous application areas
  • XML has been extended by several attached technologies
  • What is markup? Tags and other additional descriptions of structure/content/layout/etc.

  • XML is based on a strict grammar of tags
  • Different tag sets for different applications

SLIDE 9

What XML is NOT: misconceptions corrected

  • XML is not a programming language (but could be used to mark up program code)
  • XML is not a transport protocol (but is commonly used to mark up documents transferred in computer networks)
  • XML is not a database structure (but XML documents may be stored in databases and queried from there)

SLIDE 10

What is an XML document?

  • Contains only text, not binary data
  • Roughly divided into tags and character data
  • Rules for well-formedness, e.g. start and end tags must match
  • Tags can be chosen freely, but the XML application program must know the tags
  • Nesting of tag pairs defines hierarchical documents (tree structures)
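
The well-formedness rule and the tree structure can be tried out with a few lines of code. The following is a minimal sketch in Python, assuming the standard-library module xml.etree.ElementTree as the parser; the helper names is_well_formed and tree_lines and the two sample strings are made up for illustration and are not part of the course material.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def tree_lines(element, depth=0):
    """Return one indented line per element, showing the nesting as a tree."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(tree_lines(child, depth + 1))
    return lines


# Hypothetical sample documents
good = "<article><title>XML</title><text>Some <em>marked-up</em> text.</text></article>"
bad = "<article><title>XML</article></title>"  # start and end tags do not match

print(is_well_formed(good))
print(is_well_formed(bad))
print("\n".join(tree_lines(ET.fromstring(good))))
```

The parser rejects the second string because the tags are closed in the wrong order; the first one yields a tree with article at the root, title and text below it, and em nested inside text.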

SLIDE 11

Kinds of XML documents

  • Narrative documents:
    – Long text paragraphs
    – "Semi-structured data": flexible component structure
    – E.g. books, articles, web pages, mail, etc.
  • Data-oriented documents:
    – Shorter data units
    – More uniform structuring
    – Resembles formatted databases (though in a textual representation)
  • XML was originally planned for narrative documents.

SLIDE 12

Example document (narrative)

<?xml version="1.0"?>
<article>
  <title>Adaptive Text Compression</title>
  <author>John W. Smith</author>
  <text>Due to correlations between subsequent characters in natural
  language texts, it is possible to predict the next character on the
  basis of predecessors, which enables efficient compression. An adaptive
  compression method learns the correlations gradually, so that the
  properties of the text already processed are utilized when making
  predictions of the followers.
  </text>
</article>

SLIDE 13

Example document (data-oriented)

<?xml version="1.0"?>
<course name="Advanced databases">
  <teacher>Jukka</teacher>
  <semester>Spring 2013</semester>
  <audience>
    <student>Pekka</student>
    <student>Pirkko</student>
  </audience>
</course>
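
A data-oriented document like this one is easy to read from a program. The sketch below is an illustration only, assuming Python's standard-library xml.etree.ElementTree; the variable names are chosen for this example. It picks out the attribute, a single child element, and the repeated student elements.

```python
import xml.etree.ElementTree as ET

# The data-oriented example document, with plain ASCII quotes
COURSE_XML = """<?xml version="1.0"?>
<course name="Advanced databases">
  <teacher>Jukka</teacher>
  <semester>Spring 2013</semester>
  <audience>
    <student>Pekka</student>
    <student>Pirkko</student>
  </audience>
</course>"""

course = ET.fromstring(COURSE_XML)

name = course.get("name")                 # attribute of the root element
teacher = course.findtext("teacher")      # text of a direct child element
students = [s.text for s in course.findall("audience/student")]

print(name, teacher, students)
```

Note that attribute values must be enclosed in straight quotes (" or '); typographic quotes pasted from a word processor make the document ill-formed.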

SLIDE 14

XML goals

  • Interoperability among users in the same field
  • Portability of data objects between applications
  • Flexibility in transforming one XML representation to another
  • Customizability of the tag sets
  • The markup should also carry the semantics of documents
  • The markup should not define how the document is displayed (but special tools are defined for that as well).

SLIDE 15

Comparison: HTML

<html>
  <head><title>Advanced databases</title></head>
  <body>
    <h1>Advanced databases</h1>
    <p>Teacher: Jukka</p>
    <p>Semester: Fall 2011</p>
    <p>Students:</p>
    <ul>
      <li>Pekka</li>
      <li>Pirkko</li>
    </ul>
  </body>
</html>
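
The HTML page above mixes the course data with its presentation, whereas the XML version carries only the data. As a rough sketch of transforming one representation into another, the Python code below builds a page of the same shape from the data-oriented XML example. It assumes the standard-library modules xml.etree.ElementTree and html; the function name course_to_html is made up for this illustration, and a real application would more likely use XSLT for this task.

```python
import xml.etree.ElementTree as ET
from html import escape

COURSE_XML = """<?xml version="1.0"?>
<course name="Advanced databases">
  <teacher>Jukka</teacher>
  <semester>Spring 2013</semester>
  <audience>
    <student>Pekka</student>
    <student>Pirkko</student>
  </audience>
</course>"""


def course_to_html(xml_text):
    """Build an HTML page (as a string) from a <course> document."""
    course = ET.fromstring(xml_text)
    name = escape(course.get("name", ""))
    # One list item per <student> element, in document order
    items = "".join(
        f"<li>{escape(s.text or '')}</li>" for s in course.iter("student")
    )
    return (
        f"<html><head><title>{name}</title></head><body>"
        f"<h1>{name}</h1>"
        f"<p>Teacher: {escape(course.findtext('teacher', ''))}</p>"
        f"<p>Semester: {escape(course.findtext('semester', ''))}</p>"
        f"<p>Students:</p><ul>{items}</ul>"
        "</body></html>"
    )


print(course_to_html(COURSE_XML))
```

The same XML data could be rendered in a quite different way (a table, plain text, another XML vocabulary) without touching the source document.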

SLIDE 16

Writing XML applications

Alternatives:

  • Self-made programs, e.g. in Java, C++, Python, Perl, etc., using ready-made (often free) libraries for auxiliary tasks
  • Off-the-shelf software:
    – General: editors, validators, transformers (note: any text editor can be used for editing)
    – Application-specific (for specialized tag sets)

SLIDE 17

Parsing and validation

  • No fixed tag set, yet a strict syntax: well-formedness must be verified. This is usually done by an XML parser.
  • Application-specific markup is defined in a schema; the document is valid if it matches the schema, otherwise invalid.
  • Two ways of defining the schema (by W3C):
    – Document Type Definition (DTD)
    – XML Schema Language
  • Not all constraints can be specified in a declarative language; the rest must be handled in application software.
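
The division of labour between the parser and the application can be sketched as follows. This is an illustration under stated assumptions: it uses Python's standard-library xml.etree.ElementTree, which checks well-formedness only and does not validate against a DTD or XML Schema, and the function check_course with its three rules is hypothetical. The first two rules are of the kind a schema would normally express; the duplicate check stands for a rule the application chooses to enforce itself.

```python
import xml.etree.ElementTree as ET


def check_course(xml_text):
    """Return a list of problems; an empty list means all checks passed."""
    try:
        course = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    problems = []
    # Structural rules (hypothetical; normally stated in a DTD or schema)
    if course.tag != "course":
        problems.append("root element must be <course>")
    if len(course.findall("teacher")) != 1:
        problems.append("exactly one <teacher> expected")
    # Application-level rule (hypothetical): no student listed twice
    names = [s.text for s in course.findall("audience/student")]
    if len(names) != len(set(names)):
        problems.append("the same student is listed twice")
    return problems


ok = ("<course name='X'><teacher>Jukka</teacher><audience>"
      "<student>Pekka</student><student>Pirkko</student></audience></course>")
duplicate = ok.replace("Pirkko", "Pekka")
broken = "<course><teacher>Jukka</course>"

for doc in (ok, duplicate, broken):
    print(check_course(doc))
```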

SLIDE 18

Development of XML

1. Starting point: SGML (1986); very complicated, long specification with many special cases, no widespread use
2. XML 1.0 (1998) simplified from SGML, immediate success
3. Namespaces (1999) extended generality and cross-application usage
4. Transformations (XSLT, 1999) were defined for portability and output (XSL-FO, 2001)
5. Addressing elements of docs: XPath (1999)

SLIDE 19

Development of XML (cont.)

6. Definition of Application Program Interfaces:
   – DOM (2000): Document Object Model, objects within objects (OO view)
   – SAX (2000): Simple API for XML (sequential processing, developed outside W3C)
7. Pointers for hypertext (2001): XLink (between docs) and XPointer (within docs)
8. XML Schema (2001): complex, not very successful; external parties developed their own schema languages
9. Later extensions: XQuery, XInclude, RDF, Signatures, ...
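
The difference between the two API styles in item 6 can be seen in a small sketch. It assumes Python's standard-library implementations (xml.dom.minidom for DOM, xml.sax for SAX); the sample document and the class name StudentHandler are invented for this illustration. With DOM the whole document is first built into a tree of objects that can then be navigated; with SAX the application receives events while the document is read sequentially and must keep track of its own state.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document
DOC = "<audience><student>Pekka</student><student>Pirkko</student></audience>"

# DOM style: parse everything into a tree, then navigate the objects
dom = minidom.parseString(DOC)
dom_names = [node.firstChild.data for node in dom.getElementsByTagName("student")]


# SAX style: react to events as the parser reads the document sequentially
class StudentHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []
        self._inside = False   # are we currently inside a <student> element?
        self._buffer = []      # character data collected for that element

    def startElement(self, name, attrs):
        if name == "student":
            self._inside = True
            self._buffer = []

    def characters(self, content):
        if self._inside:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "student":
            self.names.append("".join(self._buffer))
            self._inside = False


handler = StudentHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)

print(dom_names, handler.names)
```

Both approaches produce the same list of names; the SAX version needs more code but does not have to hold the whole document in memory.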

SLIDE 20

Example application domains for XML

  • MathML: Mathematical Markup Language
  • SVG: Scalable Vector Graphics
  • MusicXML: Music notation format
  • SMIL: Synchronized Multimedia Integration Language
  • CML: Chemical Markup Language
  • X3D: Extensible 3D (successor of VRML, Virtual Reality Modeling Language)
  • GML: Geography Markup Language
  • Office Open XML: Zipped XML, e.g. docx (Word 2007)
  • Web Services: Simple Object Access Protocol (SOAP), Web Service Description Language (WSDL)