SLIDE 1

XML and XQuery

5DV120: Database System Principles
Umeå University
Department of Computing Science
Stephen J. Hegner
hegner@cs.umu.se
http://www.cs.umu.se/~hegner

XML and XQuery 20130422 Slide 1 of 46

SLIDE 2

The XML Context

  • XML was designed for data exchange, not for large, homogeneous databases.
  • Its primary usage is thus quite different from that of the relational model.
  • In terms of potential application, XML and the relational model are, for the most part, complementary rather than competitive frameworks.
  • Nevertheless, there are similarities.
  • In particular, the XML query language XQuery has many similarities with SQL.
  • In these slides, some basic ideas surrounding XML and XQuery are presented.
  • The goal is to provide a brief introduction, not a comprehensive presentation.

SLIDE 3

The Components Surrounding XML

Data: XML is the language for representing data.

DML: There are many query languages for XML, among them:

  • XPath: A language for expressing queries as operations on paths.
  • XQuery: A more comprehensive query language, in many ways similar to SQL, containing XPath as a subset.
  • XSLT: A declarative, template-based document transformation language which may also be used to express queries.
  • SQL/XML: Used for data exchange and storage between relational and XML databases.

DDL: There is no true DDL for XML; any well-formed XML expression qualifies as data.

Constraints: However, there are at least two languages for expressing constraints on XML data.

  • DTD: An old language, inherited from SGML, with limited expressive power.
  • XML Schema: A newer and very comprehensive language, but relatively complex.

SLIDE 4

The Family Tree of XML

  • XML stands for eXtensible Markup Language.
  • XML is a descendant of SGML (Standard Generalized Markup Language).
  • In this sense, it is a cousin of HTML.
  • All of these languages are characterized by nested blocks with tag pairs of the form <foo> and </foo>.
  • In HTML, the tags and their semantics are fixed in the definition of the language.
  • In XML, on the other hand, even the tags themselves may be freely chosen, and are limited only by constraints expressed in an appropriate language such as DTD or XML Schema.

SLIDE 5

A Simple Example

  • Here is a simple way (but not the only way) to represent the tuple ('Biology','Watson','90000') of the department relation of the university schema.

<department dept_name="Biology">
    <building>Watson</building>
    <budget>90000</budget>
</department>

  • It may be regarded as a simple tree representation.

department
    dept_name = Biology    (attribute)
    building
        Watson
    budget
        90000

SLIDE 6

Types of Vertices in the Tree Representation

<department dept_name="Biology">
    <building>Watson</building>
    <budget>90000</budget>
</department>

  • Each yellow box in the tree diagram (department, building, budget) is a tag vertex.
  • Each green box (dept_name) is an attribute vertex, and dept_name is called an attribute of the tag department with value Biology.
  • Each blue box (Watson, 90000) is a text vertex.

Warning: XPath uses a somewhat different and more complex classification (not covered in detail here).

  • For now, this one will suffice.
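
As a rough illustration of the three kinds of vertices, the sketch below parses the department example with Python's standard xml.etree.ElementTree module. Python is not part of the course material; it is used here only because it ships with an XML parser. Note that ElementTree keeps attributes in a dictionary and text as a property of the element, so the mapping to the classification above is only approximate.

```python
import xml.etree.ElementTree as ET

DEPARTMENT_XML = """
<department dept_name="Biology">
  <building>Watson</building>
  <budget>90000</budget>
</department>
"""

root = ET.fromstring(DEPARTMENT_XML)

# Tag vertices: the element itself and its element children.
tag_vertices = [root.tag] + [child.tag for child in root]

# Attribute vertices: name/value pairs attached to a tag.
attribute_vertices = dict(root.attrib)

# Text vertices: the character data enclosed by the leaf tags.
text_vertices = [child.text for child in root]

print(tag_vertices)
print(attribute_vertices)
print(text_vertices)
```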

SLIDE 7

Attributes versus Elements with Text Values

Question: Why not represent dept_name as the value enclosed in tags, as in the second form below, as opposed to an attribute of department?

<department dept_name="Biology">
    <building>Watson</building>
    <budget>90000</budget>
</department>

<department>
    <dept_name>Biology</dept_name>
    <building>Watson</building>
    <budget>90000</budget>
</department>

Answer: It is a design decision, and both work.

  • There is an advantage to attributes for representing key constraints in DTD.
  • More later on this.
  • It is also possible to have several attributes:

<department dept_name="Biology" budget="90000">
    <building>Watson</building>
</department>

SLIDE 8

Order of Children

  • The order of tag children is significant.
  • Thus, the following two expressions are different.

<department dept_name="Biology">
    <building>Watson</building>
    <budget>90000</budget>
</department>

<department dept_name="Biology">
    <budget>90000</budget>
    <building>Watson</building>
</department>

  • However, the order of attributes is not significant.
  • Thus, the following two expressions are equivalent.

<department dept_name="Biology" budget="90000">
    <building>Watson</building>
</department>

<department budget="90000" dept_name="Biology">
    <building>Watson</building>
</department>

  • Hence, in the tree representation, the order of tag vertices matters, but the order of attribute vertices does not.
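
The difference can be checked mechanically. The following is a minimal sketch in Python (standard library only, not part of the course material): child elements come back as an ordered list, while attributes come back as a dictionary whose equality ignores order. The helper names child_tags and attributes are introduced here for illustration.

```python
import xml.etree.ElementTree as ET

def child_tags(xml_text):
    """Return the tags of the element children, in document order."""
    return [child.tag for child in ET.fromstring(xml_text)]

def attributes(xml_text):
    """Return the attributes of the root element as a dictionary."""
    return dict(ET.fromstring(xml_text).attrib)

a = ('<department dept_name="Biology">'
     '<building>Watson</building><budget>90000</budget></department>')
b = ('<department dept_name="Biology">'
     '<budget>90000</budget><building>Watson</building></department>')
c = ('<department dept_name="Biology" budget="90000">'
     '<building>Watson</building></department>')
d = ('<department budget="90000" dept_name="Biology">'
     '<building>Watson</building></department>')

# Swapping the children gives a different sequence of tag vertices.
children_differ = child_tags(a) != child_tags(b)

# Swapping the attributes gives the same set of attribute vertices.
attributes_equal = attributes(c) == attributes(d)

print(children_differ, attributes_equal)
```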

SLIDE 9

Deeper Nesting and Multiple Occurrences

  • There is no limit to the depth of nesting, and tags may be repeated.
  • Here is part of an example from a nested representation of the university database.

<department dept_name="Biology">
    <building>Watson</building>
    <budget>90000</budget>
    <instructor iid="76766">
        <name>Crick</name>
        <salary>72000</salary>
        <teaches> ... </teaches>
        <teaches> ... </teaches>
    </instructor>
    <course cid="BIO-101">
        <title>Intro. to Biology</title>
        <credits>4</credits>
        <section> ... </section>
        <section> ... </section>
    </course>
    <course cid="BIO-301"> ... </course>
    <student>
        <sid>98988</sid>
        <name>Tanaka</name>
        <tot_cred>120</tot_cred>
        <takes> ... </takes>
        <takes> ... </takes>
        <advisor>
            <iid>76766</iid>
        </advisor>
    </student>
</department>

  • The only requirement is that the tags be nested properly.
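
Proper nesting is exactly what an XML parser checks first. As a small sketch (Python standard library, not part of the course material; the helper name is_well_formed is introduced here), a parser accepts a properly nested expression and rejects one whose end tags are crossed or missing.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML expression."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

properly_nested = "<department><building>Watson</building></department>"
crossed_tags = "<department><building>Watson</department></building>"
missing_end_tag = "<department><building>Watson<building></department>"

print(is_well_formed(properly_nested))
print(is_well_formed(crossed_tags))
print(is_well_formed(missing_end_tag))
```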

SLIDE 10

Document Structure

  • An entire document must be a well-formed XML expression.
  • Thus, there must be an encompassing pair of begin-end block markers.

<university_flat>
    <!-- The university database of the textbook in XML -->
    <department dept_name="Biology"> ... </department>
    ...
    <department dept_name="Basketweaving"> ... </department>
    <instructor> ... </instructor>
    ...
    <instructor> ... </instructor>
    ...
</university_flat>

  • Note that linebreaks are just whitespace.
  • Layout is as freeform as HTML.
  • Comments are represented just as in HTML.

SLIDE 11

Accessing and Querying XML Databases

  • Since an XML document is just a text file, it may be created and accessed with any text editor.
  • However, to query it properly, to check it against a DTD or XML Schema specification, and even to access it at all if it is very large, an XML DBMS or a relational DBMS with XML support is necessary.
  • In this course, the XML DBMS eXist-db will be used for that purpose.
  • It uses a Web interface.
  • However, it is also very easy to install on your own computer.
  • It is written in Java, and requires only the JDK.

SLIDE 12

The Departmental Installation of eXist-db

URL: http://exist-db.cs.umu.se

  • You should have received a user-ID and password.
  • The system will be demonstrated during a lecture.
  • This slide provides just some basic points.
  • Login is via the Admin tab under Administration.
  • Under Browse->Collections, the main directory /db may be found.
  • Some system-wide databases may be found there, as may /db/home.
  • Under /db/home is the private directory of each user.
  • Under that directory, the possibility of downloading and deleting files, as well as creating subdirectories, is provided.
  • There is also the Webstart client, which provides browsing, change of access rights, etc.
  • Access rights are in Unix/Linux style.

SLIDE 13

The Departmental Installation of eXist-db — 2

  • There are two interfaces for a query client, accessible from the home page.

XQuery Sandbox: A simple interface which allows the user to paste in a query to a database and have it evaluated.

XQuery IDE (eXide): A newer and fancier interface, but it does not communicate with some clipboards.

  • It does not talk to Emacs.

  • Only results which are themselves well-formed XML expressions are displayed.
  • Results which are lists of values, for example, are not displayed.

SLIDE 14

XQuery

  • As noted previously, there are several possible languages for querying XML databases.
  • The two "pure" languages are XPath and XQuery.
  • XPath is a relatively simple language, based upon paths through the tree representing an XML expression.
  • Joins cannot be expressed in XPath, for example.
  • XQuery is a more comprehensive query language with a flavor similar to that of SQL.
  • XPath 2.0 is essentially a subset of XQuery 1.0.
  • Thus, while only XQuery will be considered here, the presentation will include aspects of XPath.

SLIDE 15

XQuery: Selecting the Document and Simple Paths

  • Every query must give the path to the document.
  • This one finds the entire university_flat database:

doc("/db/university/university_flat.xml")

  • The answer to this query is the whole document.
  • Items in paths are separated by forward slashes, as in Unix.
  • This query returns the whole document as well, since the <university_flat> ... </university_flat> block is at the top level.

doc("/db/university/university_flat.xml")/university_flat

  • This query returns just the departments.

doc("/db/university/university_flat.xml")/university_flat/department

SLIDE 16

XQuery: Simple Paths 2

  • /*/ acts as a wildcard for one level.
  • The following two queries are equivalent, since there is only one university_flat element.

doc("/db/university/university_flat.xml")/university_flat/department

doc("/db/university/university_flat.xml")/*/department

  • The following query returns all department names.

doc("/db/university/university_flat.xml")/*/department/@dept_name

  • However, the eXist-db query evaluator displays only well-formed XML expressions, and since the result is a set of strings, it will show an empty answer, although it will provide a correct count of the number of values.
  • It will be shown how to overcome this shortly.

SLIDE 17

XQuery: Conditions in Paths

  • To select specific values, use the square condition brackets.

doc("/db/university/university_flat.xml")
    /*/department[@dept_name='Biology']/building

doc("/db/university/university_flat.xml")
    /*/department[@dept_name='Biology' or @dept_name='Music']

  • The following query finds the buildings of all departments which have a dept_name attribute.

doc("/db/university/university_flat.xml")/*/department[@dept_name]/building

  • The following query finds all departments which have a budget of less than 90000.

doc("/db/university/university_flat.xml")/*/department[budget<90000]

  • Conditions need not be at the end of the path.

doc("/db/university/university_flat.xml")
    /*/department[budget<90000]/building

SLIDE 18

XQuery: Going Uphill and Way Downhill in Paths

  • It is even possible to go backwards in paths to the parent.
  • The following query finds all departments which have a budget of less than 90000 and which have a building.

doc("/db/university/university_flat.xml")
    /*/department[budget<90000]/building/..

  • The pair // goes as far down as necessary in a path.
  • The following query finds all buildings, regardless of how deeply buried by nesting.

doc("/db/university/university_flat.xml")//building
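
For readers without access to eXist-db, the path expressions of the last few slides can be tried out approximately in Python. The sketch below uses the standard xml.etree.ElementTree module, which supports only a small subset of XPath (for example, it has no numeric comparisons, so the budget condition is filtered in Python instead). The XML text is a small hypothetical excerpt, not the actual course database.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of university_flat.xml, for illustration only.
UNIVERSITY_XML = """
<university_flat>
  <department dept_name="Biology">
    <building>Watson</building>
    <budget>90000</budget>
  </department>
  <department dept_name="Music">
    <building>Packard</building>
    <budget>80000</budget>
  </department>
</university_flat>
"""

# fromstring returns the top-level university_flat element, so the
# leading /*/ of the XQuery paths is already consumed.
root = ET.fromstring(UNIVERSITY_XML)

# /*/department/@dept_name
dept_names = [d.get("dept_name") for d in root.findall("department")]

# /*/department[@dept_name='Biology']/building
biology_buildings = [
    b.text for b in root.findall("department[@dept_name='Biology']/building")
]

# /*/department[budget<90000]/building, with the condition done in Python
cheap_buildings = [
    d.findtext("building")
    for d in root.findall("department")
    if int(d.findtext("budget")) < 90000
]

# /*/department/building/..  (back up to the parent)
parents = [p.get("dept_name") for p in root.findall("department/building/..")]

# //building  (any depth)
all_buildings = [b.text for b in root.iter("building")]

print(dept_names, biology_buildings, cheap_buildings, parents, all_buildings)
```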

SLIDE 19

XQuery: FLWOR Expressions

  • Path expressions, by themselves, are limited in power.
  • FLWOR expressions are more powerful, and often much easier to use even when a solution using a path expression is possible.

FLWOR = For Let Where Order Return, pronounced "flower".

  • FLWOR expressions have a flavor similar in some ways to SQL.
  • The following two queries are equivalent.

doc("/db/university/university_flat.xml")/*/department

for $x in doc("/db/university/university_flat.xml")/*/department
return $x

  • But much more can be done.

for $dept in doc("/db/university/university_flat.xml")/*/department
order by $dept/@dept_name
return $dept

SLIDE 20

XQuery: Where Clauses and Evaluation

  • The following two queries do the same thing:

doc("/db/university/university_flat.xml")
    /*/department[@dept_name='Biology']/building

for $dept in doc("/db/university/university_flat.xml")/*/department
where $dept[@dept_name="Biology"]
return $dept/building

  • To view results which are strings in the eXist-db result window, wrap them in tags using the element directive.

for $dept in doc("/db/university/university_flat.xml")/*/department
where $dept/building = <building>Watson</building>
return element foo { $dept/@dept_name }

  • The following does the same thing.

for $dept in doc("/db/university/university_flat.xml")
    /*/department[building='Watson']
return <foo> {$dept/@dept_name} </foo>

  • In the above, the set brackets {...} indicate that the enclosed expression should be evaluated and not returned literally.

SLIDE 21

XQuery: Enclosing Several Elements in Tags

  • Wrapping in tags is also useful for returning several items as a single expression.

for $dept in doc("/db/university/university_flat.xml")/*/department
where $dept[@dept_name="Biology"]
return element foo { $dept/building, $dept/budget }

for $dept in doc("/db/university/university_flat.xml")/*/department
where $dept[@dept_name="Biology"]
return <foo> {$dept/building} {$dept/budget} </foo>

  • Attributes may also be included in this fashion.

for $dept in doc("/db/university/university_flat.xml")/*/department
where $dept[@dept_name="Biology"]
return element foo { attribute greeting {"hello"},
                     $dept/building, $dept/budget }

  • As may nested element directives.

for $dept in doc("/db/university/university_flat.xml")/*/department
where $dept[@dept_name="Biology"]
return element foo { attribute greeting {"hello"},
                     element Gruss {"Hallo"},
                     $dept/building, $dept/budget }

SLIDE 22

XQuery: Joins

  • Joins may be realized via a double for loop.
  • The following query finds the building of the department of each instructor.

for $dept in doc("/db/university/university_flat.xml")/*/department
for $instr in doc("/db/university/university_flat.xml")/*/instructor
where $dept/@dept_name = $instr/dept_name
return element instr_bldg { $instr/name, $dept/building }

  • This may also be written as:

for $dept in doc("/db/university/university_flat.xml")/*/department
for $instr in doc("/db/university/university_flat.xml")/*/instructor
where $dept/@dept_name = $instr/dept_name
return <instr_bldg> {$instr/name} {$dept/building} </instr_bldg>
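
The double for loop has the same shape as a nested-loop join in any programming language. The following Python sketch (standard library only; the XML is a small hypothetical excerpt, not the course database) mirrors the query above, with the where clause becoming the if condition.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of university_flat.xml, for illustration only.
UNIVERSITY_XML = """
<university_flat>
  <department dept_name="Biology">
    <building>Watson</building><budget>90000</budget>
  </department>
  <department dept_name="Music">
    <building>Packard</building><budget>80000</budget>
  </department>
  <instructor iid="76766">
    <name>Crick</name><dept_name>Biology</dept_name><salary>72000</salary>
  </instructor>
  <instructor iid="15151">
    <name>Mozart</name><dept_name>Music</dept_name><salary>40000</salary>
  </instructor>
</university_flat>
"""

root = ET.fromstring(UNIVERSITY_XML)

# for $dept ... for $instr ... where ... return (name, building)
instr_bldg = [
    (instr.findtext("name"), dept.findtext("building"))
    for dept in root.findall("department")
    for instr in root.findall("instructor")
    if dept.get("dept_name") == instr.findtext("dept_name")
]

print(instr_bldg)
```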

SLIDE 23

XQuery: Nested Queries

  • Find all colleagues of Srinivasan; that is, all instructors who work in the same department as Srinivasan.

for $dept in doc("/db/university/university_flat.xml")
    //instructor[name="Srinivasan"]/dept_name
return element colleagues {
    doc("/db/university/university_flat.xml")//instructor[dept_name=$dept]
}

  • Exclude Srinivasan.

for $dept in doc("/db/university/university_flat.xml")
    //instructor[name="Srinivasan"]/dept_name
return element colleagues {
    doc("/db/university/university_flat.xml")//instructor[dept_name=$dept]
    except
    doc("/db/university/university_flat.xml")//instructor[name="Srinivasan"]
}

SLIDE 24

XQuery: Let and Aggregation

  • The let clause binds a variable to the entire list of matched values at once, rather than iterating over them one at a time as for does.
  • It is particularly useful in aggregation queries, since the aggregation functions operate on lists.
  • Here is a simple query which counts the departments.

let $depts := doc("/db/university/university_flat.xml")/university_flat/department
return element answer { attribute n_depts {count($depts)} }

SLIDE 25

XQuery: Let and Aggregation 2

  • Here is a query which returns, for each department, an element with attributes giving its name, the number of instructors, and the maximum, minimum, average, and total salary of all of its instructors.

for $dept in doc("/db/university/university_flat.xml")//department
let $instrs := doc("/db/university/university_flat.xml")
    //instructor[dept_name=$dept/@dept_name]
order by count($instrs) descending, sum($instrs/salary) descending
return element dept_summary {
    attribute dept_name {$dept/@dept_name},
    attribute instr_count {count($instrs)},
    attribute salary_max {max($instrs/salary)},
    attribute salary_min {min($instrs/salary)},
    attribute salary_avg {avg($instrs/salary)},
    attribute salary_sum {sum($instrs/salary)}
}
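
To make the roles of for, let, and order by concrete, here is a rough Python counterpart (standard library only, on a small hypothetical excerpt rather than the course database). The list salaries plays the part of the let variable: it is bound once per department and then handed as a whole to the aggregation functions. The function name summarize is introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of university_flat.xml, for illustration only.
UNIVERSITY_XML = """
<university_flat>
  <department dept_name="Biology"/>
  <department dept_name="Comp. Sci."/>
  <instructor><name>Crick</name>
    <dept_name>Biology</dept_name><salary>72000</salary></instructor>
  <instructor><name>Srinivasan</name>
    <dept_name>Comp. Sci.</dept_name><salary>65000</salary></instructor>
  <instructor><name>Katz</name>
    <dept_name>Comp. Sci.</dept_name><salary>75000</salary></instructor>
</university_flat>
"""

def summarize(root):
    """Return one summary dictionary per department, sorted as in the query."""
    summaries = []
    for dept in root.findall("department"):            # for $dept in ...
        name = dept.get("dept_name")
        salaries = [                                    # let $instrs := ...
            int(i.findtext("salary"))
            for i in root.findall("instructor")
            if i.findtext("dept_name") == name
        ]
        summaries.append({
            "dept_name": name,
            "instr_count": len(salaries),
            "salary_max": max(salaries, default=None),
            "salary_min": min(salaries, default=None),
            "salary_avg": sum(salaries) / len(salaries) if salaries else None,
            "salary_sum": sum(salaries),
        })
    # order by count descending, then total salary descending
    summaries.sort(key=lambda s: (-s["instr_count"], -s["salary_sum"]))
    return summaries

dept_summaries = summarize(ET.fromstring(UNIVERSITY_XML))
for summary in dept_summaries:
    print(summary)
```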

SLIDE 26

XQuery Updates: Replace

  • XQuery has recently added support for updates; the syntax shown here is that of the eXist-db update extension.
  • A few of the most basic operations are illustrated here.

Reminder: You do not have update privileges on doc("/db/university/university_flat.xml"), so try these on your own copy.

Replace: Change the building of the Comp. Sci. department to MIT-huset.

for $bldg in doc("/db/university/university_flat.xml")/*/department/building
where $bldg/../@dept_name = "Comp. Sci."
return update replace $bldg with <building>MIT-huset</building>

  • It is possible to achieve the same result by replacing the value while retaining the tags.

for $bldg in doc("/db/university/university_flat.xml")/*/department/building
where $bldg/../@dept_name = "Comp. Sci."
return update value $bldg with "MIT-huset"

SLIDE 27

XQuery: If-Then-Else

  • This is a convenient point to sneak in an example with if-then-else.

Replace: Toggle the building of the Comp. Sci. department between Taylor and MIT-huset.

for $bldg in doc("/db/university/university_flat.xml")/*/department/building
where $bldg/../@dept_name = "Comp. Sci."
return
    if ($bldg = element building {"Taylor"})
    then update value $bldg with "MIT-huset"
    else if ($bldg = element building {"MIT-huset"})
    then update replace $bldg with element building {"Taylor"}
    else ()

  • The else clause is mandatory, but it may be empty, as shown.

SLIDE 28

XQuery Updates: Insert

Insert: Add <foo>bar</foo> to the top-level database.

for $top in doc("/db/university/university_flat.xml")/university_flat
return update insert <foo>bar</foo> into $top

  • Of course, think of adding something more interesting, such as a new department, instructor, student, or course.
  • Insert <foo>bar</foo> just after the Comp. Sci. department.

for $cs in doc("/db/university/university_flat.xml")
    /*/department[@dept_name="Comp. Sci."]
return update insert <foo>bar</foo> following $cs

  • To insert it just before instead of just after, replace following with preceding in the above.
  • Add the attribute language to the Comp. Sci. department with value Swedish.

for $cs in doc("/db/university/university_flat.xml")
    /*/department[@dept_name="Comp. Sci."]
return update insert attribute language {"Swedish"} into $cs

SLIDE 29

XQuery Updates: Delete

Delete: Delete all occurrences of <foo>bar</foo> from the top-level database.

for $foo in doc("/db/university/university_flat.xml")/university_flat/foo
return update delete $foo

  • Remove the language attribute from the Comp. Sci. department.

for $cs in doc("/db/university/university_flat.xml")
    /*/department[@dept_name="Comp. Sci."]
return update delete $cs/@language

SLIDE 30

XQuery: Tokenization of Strings

  • To extract the "words" of a string, with "words" separated by whitespace, use the tokenize operation.

tokenize("a b1 c2 drm","\s+")

  • The above call returns the sequence ("a","b1","c2","drm").
  • The function tokenize takes two arguments, a string and a regular expression (RE).
  • The string is broken into "words" separated by strings matching the RE.
  • \s matches any whitespace character; \s+ is a nonempty sequence of such characters.

SLIDE 31

XQuery: Normalizing Whitespace of Strings

  • If there is whitespace at the beginning and/or end of the string, the tokenize function will return an empty string as a token at that end.
  • The following returns a sequence of six tokens: ("","a","b1","c2","drm","")

tokenize(" a b1 c2 drm ","\s+")

  • To remove these, use the function normalize-space.

tokenize(normalize-space(" a b1 c2 drm "),"\s+")

  • The above call returns ("a","b1","c2","drm").
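
The same behavior can be observed with regular-expression splitting in most languages. The following Python sketch uses re.split as a stand-in for tokenize and a split/join idiom as a stand-in for normalize-space. Both stand-ins are approximations introduced here (for example, Python recognizes more Unicode whitespace characters than XQuery does), and the function names merely echo the XQuery ones.

```python
import re

def tokenize(text, pattern):
    """Split text at every match of the regular expression pattern."""
    return re.split(pattern, text)

def normalize_space(text):
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return " ".join(text.split())

padded = "  a b1   c2 drm "

# Leading and trailing whitespace each produce an empty token.
raw_tokens = tokenize(padded, r"\s+")

# Normalizing first removes them.
clean_tokens = tokenize(normalize_space(padded), r"\s+")

print(raw_tokens)
print(clean_tokens)
```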

SLIDE 32

XQuery: Set Operations on Sequences of Atoms

  • The XQuery operations union, intersect, and except operate only on sequences of nodes, not sequences of atoms.
  • Set operations on sequences of atoms may be realized using the distinct-values operator.

Union:

let $x := tokenize(normalize-space("a b1 c2 drm"),"\s+")
let $y := tokenize(normalize-space("c2 drm ef"),"\s+")
return distinct-values(($x,$y))

Intersection:

let $x := tokenize(normalize-space("a b1 c2 drm"),"\s+")
let $y := tokenize(normalize-space("c2 drm ef"),"\s+")
return distinct-values($x[.=$y])

Set difference:

let $x := tokenize(normalize-space("a b1 c2 drm"),"\s+")
let $y := tokenize(normalize-space("c2 drm ef"),"\s+")
return distinct-values($x[not(.=$y)])
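
For comparison, the three operations can be sketched on plain Python lists. The helper distinct_values is a stand-in introduced here; it keeps the first occurrence of each value in order, whereas XQuery leaves the order of distinct-values up to the implementation. The input strings are assumed sample data.

```python
def distinct_values(seq):
    """Remove duplicates, keeping the first occurrence of each value."""
    return list(dict.fromkeys(seq))

x = "a b1 c2 drm".split()
y = "c2 drm ef".split()

# distinct-values(($x,$y))
union = distinct_values(x + y)

# distinct-values($x[.=$y])
intersection = distinct_values([v for v in x if v in y])

# distinct-values($x[not(.=$y)])
difference = distinct_values([v for v in x if v not in y])

print(union, intersection, difference)
```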

SLIDE 33

XQuery: Set Operations on Sequences of Atoms 2

  • Set operations on sequences of atoms may be realized using more basic functions.

Union (with duplicates not removed):

let $x := tokenize(normalize-space("a b1 c2 drm"),"\s+")
let $y := tokenize(normalize-space("c2 drm ef"),"\s+")
return ($x,$y)

Union (with duplicates removed):

let $x := tokenize(normalize-space("a b1 c2 drm"),"\s+")
let $y := tokenize(normalize-space("c2 drm ef"),"\s+")
let $u := ( for $z in $y
            where (every $w in $x satisfies ($z != $w))
            return $z )
return ($x,$u)

SLIDE 34

XQuery: Set Operations on Sequences of Atoms 3

  • Set operations on sequences of atoms may be realized using more basic functions (continued).

Intersection:

let $x := tokenize(normalize-space("a b1 c2 drm"),"\s+")
let $y := tokenize(normalize-space("c2 drm ef"),"\s+")
for $z in $x
where (some $w in $y satisfies ($z = $w))
return $z

Set difference:

let $x := tokenize(normalize-space("a b1 c2 drm"),"\s+")
let $y := tokenize(normalize-space("c2 drm ef"),"\s+")
for $z in $x
where (every $w in $y satisfies ($z != $w))
return $z

SLIDE 35

XQuery: Quantification Operations on Sequences of Atoms

Universal quantification: The following expression returns the members of $x which are smaller than every element of $y: (1,2).

let $x := (1, 2, 3, 4, 5, 6, 7, 8)
let $y := (3, 4, 5)
for $z in $x
where (every $w in $y satisfies ($z < $w))
return $z

  • The members of the result are returned one at a time.

Existential quantification: The following expression returns the members of $x which are smaller than some element of $y: (1,2,3,4).

let $x := (1, 2, 3, 4, 5, 6, 7, 8)
let $y := (3, 4, 5)
for $z in $x
where (some $w in $y satisfies ($z < $w))
return $z
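
The quantifiers every and some correspond closely to the built-in functions all and any in Python. A minimal sketch of the two queries above, using the same data:

```python
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 4, 5]

# every $w in $y satisfies ($z < $w)
smaller_than_every = [z for z in x if all(z < w for w in y)]

# some $w in $y satisfies ($z < $w)
smaller_than_some = [z for z in x if any(z < w for w in y)]

print(smaller_than_every)
print(smaller_than_some)
```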

SLIDE 36

XQuery: Comments

  • Comments in XQuery are delimited by (: and :).

(: This is a comment which is split
   over two lines. :)
element bar {
    let $x := (1, 2, 3, 4, 5, 6, 7, 8)
    let $y := (3, 4, 5)
    for $z in $x
    where (some $w in $y satisfies ($z < $w))
    return element foo {$z}
}

SLIDE 37

Namespaces

  • A primary use of XML is data exchange.
  • The problems associated with having a common, global namespace for all sources of data are obvious.
  • For example, an XML database for Umeå University might very well have many names in common with one for Yale University, although the substructure and hence the semantics might well be different.
  • This would make meaningful data exchange difficult at best.
  • For this reason, there is a way in XML to identify namespaces and to qualify a name with a namespace.
  • The general format is shown on the next slide.

SLIDE 38

Namespaces 2

<university_flat xmlns="http://www.yale.edu"
                 xmlns:umu="http://www.umu.se"
                 xmlns:hsv="http://www.hsv.se">
    <department>
        <!-- This department name belongs to the default namespace -->
        <!-- XML data statements -->
    </department>
    <umu:department>
        <!-- This department name belongs to the umu namespace -->
        <!-- XML data statements -->
    </umu:department>
    <hsv:department>
        <!-- This department name belongs to the hsv namespace -->
        <!-- XML data statements -->
    </hsv:department>
</university_flat>

Default namespace: The attribute xmlns identifies the default namespace.

  • Names in the default namespace do not need to be qualified.

Other namespaces: These need to be declared with a prefix (which is actually the suffix following xmlns:), and that prefix must be used with all instances of names from that space.

  • Namespace declarations may be nested.

All names beginning with xml are potentially reserved and should not be user declared.

SLIDE 39

Namespaces 3 — the Values of Namespaces

  • The URLs in the namespace example on the previous slide are just strings.
  • There is no requirement that they point to any useful information about the document structure.
  • There is no requirement that they point to anything at all.
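
One way to see that a namespace is nothing more than a string attached to a name is to look at how a parser stores qualified names. In the Python sketch below (standard library only, not part of the course material), ElementTree expands each tag to the form {namespace-string}local-name. The prefix map passed to findall is a local choice made in this sketch and need not match the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

DOC = """
<university_flat xmlns="http://www.yale.edu"
                 xmlns:umu="http://www.umu.se"
                 xmlns:hsv="http://www.hsv.se">
  <department/>
  <umu:department/>
  <hsv:department/>
</university_flat>
"""

root = ET.fromstring(DOC)

# Each tag carries its namespace string; the prefixes themselves are gone.
qualified_tags = [child.tag for child in root]

# To search, map a prefix of our own choosing to the namespace string.
prefixes = {"u": "http://www.umu.se"}
umu_departments = root.findall("u:department", prefixes)

print(root.tag)
print(qualified_tags)
print(len(umu_departments))
```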

SLIDE 40

DTD — Document Type Definition

DTDs (Document Type Definitions) are inherited from SGML and predate XML.

  • They are written in SGML, not XML.
  • They may be used to specify some basic constraints, but many useful types of constraints cannot be expressed.
  • They are nevertheless still in fairly wide use because they are easy to write and understand.
  • They use a form of regular expression to express certain things.
  • The general format is as follows.

<!DOCTYPE name [
    ... declarations ...
]>

SLIDE 41

DTD 2

<!DOCTYPE university [
    ...
    <!ELEMENT university_flat (department*,instructor*,course*,classroom*,
                               section*,teaches*,student*,takes*,advisor*,
                               time_slot*,prereq*)>
    <!ELEMENT department (building,budget)>
    <!ELEMENT building (#PCDATA)>
    <!ELEMENT budget (#PCDATA)>
    <!ATTLIST department dept_name ID #REQUIRED>
    <!ELEMENT instructor (name,dept_name,salary)>
    <!ATTLIST instructor iid ID #REQUIRED>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT salary (#PCDATA)>
    ...
]>

  • Above is a (partial) example of a DTD for university_flat.
  • An !ELEMENT declaration gives the required list of tagged elements under the identified tag.
  • * means zero or more occurrences.
  • + means one or more occurrences.
  • Order is significant.
  • #PCDATA means that it is a string (Parsed Character DATA).
  • An element may also be declared ANY, meaning that there are no restrictions on its content.

SLIDE 42

DTD 3

  • Continuing with the same DTD as on the previous slide:
  • An !ATTLIST declaration gives the attributes under the identified tag.
  • ID means that the value is unique in the entire document.
  • #REQUIRED means that it is mandatory to have the attribute.
  • Order is not significant.

SLIDE 43

DTD 4

<!DOCTYPE university2 [
    ...
    <!ELEMENT university_flat (department*,instructor*,course*,classroom*,
                               section*,teaches*,student*,takes*,advisor*,
                               time_slot*,prereq*)>
    <!ELEMENT department (building,budget)>
    <!ATTLIST department dept_name ID #REQUIRED>
    <!ELEMENT instructor (name,salary)>
    <!ATTLIST instructor iid ID #REQUIRED
                         dept_name IDREF #REQUIRED>
    ...
]>

  • The above DTD differs slightly from the previous one in that dept_name is now an attribute of instructor, instead of a subelement.
  • IDREF means that the value must occur as the ID somewhere else in the document.
      • A weak form of referential integrity (foreign key).
  • An attribute which is neither ID nor IDREF may be declared to be CDATA, a character string.
  • Attributes other than IDs may be declared to be #IMPLIED instead of #REQUIRED, which means that they are optional.
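
To show what the ID and IDREF rules amount to, the sketch below checks them by hand in Python, since the standard library parser does not validate against a DTD. The sets ID_ATTRS and IDREF_ATTRS transcribe the !ATTLIST declarations above; the function name and the sample documents are hypothetical. Note that all ID values share a single document-wide pool, so an instructor's dept_name that happened to equal some iid would also pass, which is one reason this form of referential integrity is weak.

```python
import xml.etree.ElementTree as ET

# (tag, attribute) pairs declared as ID and IDREF in the DTD above.
ID_ATTRS = {("department", "dept_name"), ("instructor", "iid")}
IDREF_ATTRS = {("instructor", "dept_name")}

def check_id_constraints(xml_text):
    """Return a list of violations of the ID and IDREF rules."""
    root = ET.fromstring(xml_text)
    ids, problems = set(), []
    # ID: values must be unique over the whole document.
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if (elem.tag, name) in ID_ATTRS:
                if value in ids:
                    problems.append("duplicate ID: " + value)
                ids.add(value)
    # IDREF: the value must occur as some ID somewhere in the document.
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if (elem.tag, name) in IDREF_ATTRS and value not in ids:
                problems.append("dangling IDREF: " + value)
    return problems

# Hypothetical sample documents, for illustration only.
GOOD = """<university_flat>
  <department dept_name="Biology">
    <building>Watson</building><budget>90000</budget>
  </department>
  <instructor iid="i76766" dept_name="Biology">
    <name>Crick</name><salary>72000</salary>
  </instructor>
</university_flat>"""

# Same document, but the instructor refers to a nonexistent department.
BAD = GOOD.replace('iid="i76766" dept_name="Biology"',
                   'iid="i76766" dept_name="History"')

print(check_id_constraints(GOOD))
print(check_id_constraints(BAD))
```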

SLIDE 44

Shortcomings of DTDs

  • DTDs have the following significant limitations:
      • They are not aware of namespaces.
      • The only basic types are (glorified) strings.
      • Support for database constraints is very limited.
          • Keys must be global over all attributes.
          • Referential integrity is existential.
      • They impose order when none is wanted.
      • All element definitions are global; there is no scoping.
  • Nevertheless, they are simple to write and better than nothing, so they are still used.

SLIDE 45

XML Schema

  • Many of the shortcomings of DTDs are addressed in XML Schema, a language which is explicitly designed to express complex constraints on XML documents.
  • Key and referential integrity constraints may be expressed, even when the attributes used form a complex structure.
  • Unfortunately, the language is also relatively complex.
  • For reasons of limited time, it will not be considered further in this course.

SLIDE 46

XQuery Programming

XQJ: A standard API for issuing XQuery queries to an XML database from Java.

  • Somewhat analogous to JDBC for SQL.
  • Many native XML database systems, including eXist-db, support the XQJ API.

XQC: A developing standard API for C.

  • Newer than XQJ and still very much under development.
  • Current implementations include Zorba and XQilla.

  • Remember that XML is used much more for data exchange and much less for database management than the relational model.
  • Much of the use of XML is not primarily focused upon formal querying in general or XQuery in particular.
  • Therefore, programming surrounding XML has a very different flavor, which is very important and useful, but beyond the scope of this course.
