
Overview Database Management Systems Semi-Structured Data



  1. Lecture 12
     Database Management Systems, Winter 2004
     CMPUT 391: XML and Querying XML
     Dr. Osmar R. Zaïane, University of Alberta
     Dr. Osmar Zaïane, 2001-2004 CMPUT 391 – Database Management Systems University of Alberta

     Overview
     • Semi-Structured Data
     • Introduction to XML
     • Querying XML Documents
     Chapter 17 of Textbook

     The Structure of Data
     • In the real world, data can be of any type and does not necessarily follow any organized format or sequence.
     • Such data is said to be unstructured. Unstructured data is chaotic because it doesn’t follow any rule and is not predictable.
     • Text data is usually unstructured. Much of the data on the Internet is unstructured (video streams, sound streams, images, etc.).

     Structured Data
     • For applications manipulating data, the structure of data is very important to ensure efficiency and effectiveness.
     • The data is structured when:
       – Data is organized in semantic chunks (entities).
       – Similar entities are grouped together (relations or classes).
       – Entities in the same group have the same descriptions (attributes).
       – Entity descriptions for all entities in a group have the same defined format, a predefined length, are all present, and follow the same order (schema).
     • This structure is sometimes too rigid for some applications.
     • For many applications, data is neither completely unstructured nor completely structured.
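The conditions listed under "Structured Data" can be made concrete with a small check. This is only a sketch: the schema, the record layout (ordered name/value pairs), and the function name `conforms` are assumptions made for illustration, and "predefined length" is approximated here as a maximum length.

```python
# Hypothetical schema: ordered (attribute name, type, maximum length) triples.
STUDENT_SCHEMA = [("Name", str, 20), ("Id", str, 9)]


def conforms(record, schema):
    """Return True if the record has exactly the schema's attributes, in the
    schema's order, each with the declared type and within the declared length."""
    record_names = [name for name, _ in record]
    schema_names = [name for name, _, _ in schema]
    if record_names != schema_names:
        return False  # an attribute is missing, extra, or out of order
    return all(
        isinstance(value, expected_type) and len(value) <= max_length
        for (_, value), (_, expected_type, max_length) in zip(record, schema)
    )


structured = [("Name", "John"), ("Id", "111111111")]
missing_attribute = [("Name", "Joe")]
wrong_order = [("Id", "222222222"), ("Name", "Joe")]

for candidate in (structured, missing_attribute, wrong_order):
    print(candidate, conforms(candidate, STUDENT_SCHEMA))
```

Only the first record satisfies every condition; the other two are the kind of data a rigid schema rejects.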

  2. Semi-Structured Data
     • Data is organized in semantic entities
     • Similar entities are grouped together
     • But
       – Entities in the same group may not have the same attributes
       – The presence of some attributes may not always be required
       – The size of the same attributes of entities in the same group may not be the same
       – The type of the same attributes of entities in the same group may not be the same

     Semi-Structured Data (Cont.)
     • To make it suitable for machine processing it should have these characteristics
       – Be object-like
       – Be schemaless (doesn’t guarantee to conform exactly to any schema, but different objects have some commonality among themselves)
       – Be self-describing (some schema-like information, like attribute names, is part of the data itself)

     Non-Self-Describing Data
     Relational or Object-Oriented:
     Data part:
       (#123, ["Students",
               { ["John", 111111111, [123, "Main St"]],
                 ["Joe", 222222222, [321, "Pine St"]] }
              ] )
     Schema part:
       PersonList[ ListName : String,
                   Contents : [ Name : String,
                                Id : String,
                                Address : [ Number : Integer, Street : String ] ]
                 ]

     Self-Describing Data
     • Attribute names embedded in the data itself
     • Doesn’t need schema to figure out what is what
     • (but schema might be useful nonetheless)
       (#12345,
         [ ListName : "Students",
           Contents : { [ Name : "John Doe",
                          Id : "111111111",
                          Address : [ Number : 123, Street : "Main St." ] ],
                        [ Name : "Joe Public",
                          Id : "222222222",
                          Address : [ Number : 321, Street : "Pine St." ] ] }
         ] )
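The contrast between the non-self-describing and self-describing forms above can be sketched in Python. The variable and function names (`SCHEMA`, `ADDRESS_SCHEMA`, `attach_schema`) are hypothetical and only illustrate the idea: positional data needs a separate schema to be interpreted, while self-describing data carries its attribute names with it.

```python
# Non-self-describing: values are positional; their meaning lives in a
# separate schema (names chosen here for illustration).
SCHEMA = ("Name", "Id", "Address")
ADDRESS_SCHEMA = ("Number", "Street")
positional_record = ("John", 111111111, (123, "Main St"))

# Self-describing: attribute names are embedded in the data itself.
self_describing_record = {
    "Name": "John Doe",
    "Id": "111111111",
    "Address": {"Number": 123, "Street": "Main St."},
}


def attach_schema(record, schema, address_schema):
    """Pair positional values with schema names so they can be read by name."""
    named = dict(zip(schema, record))
    named["Address"] = dict(zip(address_schema, named["Address"]))
    return named


described = attach_schema(positional_record, SCHEMA, ADDRESS_SCHEMA)

# The positional record is readable by name only after the schema is attached.
print(described["Address"]["Street"])
# The self-describing record needs no schema to figure out what is what.
print(self_describing_record["Address"]["Street"])
```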

  3. Data Model for Semi-Structured Data
     • Semi-structured data doesn’t have a schema.
     • There are many data models to represent semi-structured data. Most of them use the notion of labeled graphs.
       – Nodes in the graph correspond to compound objects or atomic values.
       – Edges in the graph correspond to attributes.
       – The graph is self-describing (no need for a schema).
       – Object Exchange Model (OEM): each object is described by a triplet <label, type, value>.
       – Complex objects are decomposed hierarchically into smaller objects.

     Example: Booklist Data in OEM
     [Figure: OEM graph of BOOK objects with edges labeled AUTHOR, TITLE, PUBLISHED, and FORMAT, and leaf values "Identity", "Milan Kundera", "1998", "The character of physical law", "Richard Feynman", and "Hardcover".]

     Overview
     • Semi-Structured Data
     • Introduction to XML
     • Querying XML Documents

     Introduction to XML
     • XML: eXtensible Markup Language
     • Suitable for semi-structured data
       – Easy to describe object-like data
       – Self-describing
       – Doesn’t require a schema (but one can be provided optionally)
     • Standard of the World-Wide Web Consortium for data exchange
     • All major database products have been extended to store and construct XML documents
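To connect the OEM triplet <label, type, value> with the claim that XML suits object-like, self-describing data, here is a minimal sketch. The class `OEMObject`, the type names "complex" and "string", and the function `to_xml` are assumptions made for this illustration, not part of any OEM specification; the book uses only the AUTHOR and TITLE values from the booklist example.

```python
from dataclasses import dataclass
from typing import List, Union
import xml.etree.ElementTree as ET


@dataclass
class OEMObject:
    """One OEM object: a <label, type, value> triplet (illustrative encoding)."""
    label: str
    type: str  # "complex" for compound objects, otherwise an atomic type name
    value: Union[str, List["OEMObject"]]


def to_xml(obj: OEMObject) -> ET.Element:
    """Map each labeled object to an XML element, recursing into complex objects."""
    element = ET.Element(obj.label)
    if obj.type == "complex":
        for child in obj.value:
            element.append(to_xml(child))
    else:
        element.text = obj.value
    return element


# A complex object decomposed hierarchically into smaller atomic objects.
book = OEMObject("BOOK", "complex", [
    OEMObject("AUTHOR", "string", "Milan Kundera"),
    OEMObject("TITLE", "string", "Identity"),
])

xml_text = ET.tostring(to_xml(book), encoding="unicode")
print(xml_text)
```

The edge labels of the graph become element names, so the resulting XML needs no separate schema to be read.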

  4. What is Special with XML
     • It is a language to mark up data
     • There are no predefined tags like in HTML
     • Extensible: tags can be defined and extended based on applications and needs
       – Elements / Tags
       – Attributes
       – (E.g.: <BOOK page="453"> … </BOOK>)
     • Elements are nested
     • Root element contains all others

     Example
       <?xml version="1.0" ?>
       <PersonList Type="Student" Date="2002-02-02">
         <Title Value="Student List" />
         <Person> … … … </Person>
         <Person> … … … </Person>
       </PersonList>
     • Root element: PersonList
     • Attributes: Type and Date
     • Empty element: Title
     • Element (or tag) names: PersonList, Title, Person

     More Terminology
       <Person Name="John" Id="111111111">
         John is a nice fellow
         <Address>
           <Number>21</Number>
           <Street>Main St.</Street>
         </Address>
         … … …
       </Person>
     • <Person …> is the opening tag and </Person> is the closing tag; everything between them is the content of Person.
     • "John is a nice fellow" is "standalone" text, not useful as data.
     • Address is a nested element, child of Person.
     • Person is the parent of Address and an ancestor of Number.
     • Number and Street are children of Address and descendants of Person.

     Rules for Creating XML Documents
     • Rule 1: All terminating tags shall be closed
       – Omitting a closing XML tag is an error.
         Example: <FirstName> Joerg </FirstName>
     • Rule 2: All non-terminating tags shall be closed
       – Omitting a forward slash for non-terminating (empty) tags is an error.
         Example: <Available answer="yes" />
     • Rule 3: XML shall be case sensitive
       – Using the wrong case is an error.
         Example of the error: <FirstName> Osmar </firstname>
       – It is OK in HTML: <H1>my header</h1>
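The terminology and the three rules above can be tried out with Python's standard `xml.etree.ElementTree` parser. This is a sketch under assumptions: the document below is a hypothetical combination of the PersonList and Person fragments from the slides (the elided "… … …" parts are simply left out), and `is_well_formed` is a helper name introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document combining the slide fragments; elided parts are omitted.
DOCUMENT = """<?xml version="1.0" ?>
<PersonList Type="Student" Date="2002-02-02">
  <Title Value="Student List" />
  <Person Name="John" Id="111111111">
    <Address>
      <Number>21</Number>
      <Street>Main St.</Street>
    </Address>
  </Person>
</PersonList>"""

root = ET.fromstring(DOCUMENT)
print(root.tag)      # name of the root element
print(root.attrib)   # attributes of the root element

title = root.find("Title")
print(len(title), title.text)   # an empty element has no children and no text

person = root.find("Person")             # child of the root
street = person.find("Address/Street")   # descendant of Person, child of Address
print(street.text)


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Rule 1: a terminating tag must be closed.
print(is_well_formed("<FirstName> Joerg </FirstName>"))
print(is_well_formed("<FirstName> Joerg "))
# Rule 2: an empty tag needs its forward slash.
print(is_well_formed('<Available answer="yes" />'))
print(is_well_formed('<Person><Available answer="yes"></Person>'))
# Rule 3: tag names are case sensitive.
print(is_well_formed("<FirstName> Osmar </firstname>"))
```

A conforming parser rejects each rule violation outright rather than guessing at the intent, which is the practical difference from HTML noted under Rule 3.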
