
Internet Databases (Chapter 22, Database Management Systems, 2nd Edition, R. Ramakrishnan and Johannes Gehrke)



  1. Internet Databases
     Chapter 22. Database Management Systems, 2nd Edition. R. Ramakrishnan and Johannes Gehrke

     HTML
     ❖ Simple markup language
     ❖ Text is annotated with language commands called tags, usually consisting of a start tag and an end tag

     HTML Example: Book Listing
         <HTML><BODY>
         Fiction:
         <UL><LI>Author: Milan Kundera</LI>
         <LI>Title: Identity</LI>
         <LI>Published: 1998</LI>
         </UL>
         Science:
         <UL><LI>Author: Richard Feynman</LI>
         <LI>Title: The Character of Physical Law</LI>
         <LI>Hardcover</LI>
         </UL></BODY></HTML>

     Web Pages with Database Contents
     ❖ Web pages contain the results of database queries. How do we generate such pages?
       – Web server creates a new process for a program that interacts with the database.
       – Web server communicates with this program via CGI (Common Gateway Interface)
       – Program generates result page with content from the database
       – Other protocols: ISAPI (Microsoft Internet Server API), NSAPI (Netscape Server API)

     Application Servers
     ❖ In CGI, each page request results in the creation of a new process: very inefficient
     ❖ Application server: Piece of software between the web server and the applications
     ❖ Functionality:
       – Hold a set of pre-forked threads or processes for performance
       – Database connection pooling (reuse a set of existing connections)
       – Integration of heterogeneous data sources
       – Transaction management involving several data sources
       – Session management

     Other Server-Side Processing
     ❖ Java Servlets: Java programs that run on the server and interact with the server through a well-defined API.
     ❖ JavaBeans: Reusable software components written in Java.
     ❖ Java Server Pages and Active Server Pages: Code inside a web page that is interpreted by the web server
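The CGI flow described on this page (a program queries the database and generates the result page) can be sketched roughly as follows. This is a minimal illustration, not the slides' own code: the `books` table, its columns, and the function names are assumptions made up for the example, and an in-memory SQLite database stands in for the real DBMS.

```python
import html
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database (hypothetical schema) with two books."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE books (genre TEXT, author TEXT, title TEXT, published INTEGER)"
    )
    conn.executemany(
        "INSERT INTO books VALUES (?, ?, ?, ?)",
        [
            ("Fiction", "Milan Kundera", "Identity", 1998),
            ("Science", "Richard Feynman", "The Character of Physical Law", None),
        ],
    )
    return conn


def render_book_page(conn: sqlite3.Connection) -> str:
    """Run the query and wrap each row in HTML list markup."""
    parts = ["<HTML><BODY>"]
    rows = conn.execute(
        "SELECT genre, author, title, published FROM books ORDER BY genre"
    )
    for genre, author, title, published in rows:
        parts.append(f"{html.escape(genre)}:")
        parts.append("<UL>")
        parts.append(f"<LI>Author: {html.escape(author)}</LI>")
        parts.append(f"<LI>Title: {html.escape(title)}</LI>")
        if published is not None:  # PUBLISHED is optional in this sketch
            parts.append(f"<LI>Published: {published}</LI>")
        parts.append("</UL>")
    parts.append("</BODY></HTML>")
    return "\n".join(parts)


def cgi_response(conn: sqlite3.Connection) -> str:
    """A CGI program writes a header, a blank line, then the page body to stdout."""
    return "Content-Type: text/html\n\n" + render_book_page(conn)
```

Calling `print(cgi_response(build_demo_db()))` emits what the web server would relay to the browser. Under plain CGI this program would be started as a fresh process for every request, which is the overhead that application servers avoid with pre-forked processes and connection pooling.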

  2. Beyond HTML: XML
     ❖ Extensible Markup Language (XML): "Extensible HTML"
     ❖ Confluence of SGML and HTML: The power of SGML with the simplicity of HTML
     ❖ Allows definition of new markup languages, called document type declarations (DTDs)

     XML: Language Constructs
     ❖ Elements
       – Main structural building blocks of XML
       – Start and end tag
       – Must be properly nested
     ❖ Element can have attributes that provide additional information about the element
     ❖ Entities: like macros, represent common text.
     ❖ Comments
     ❖ Document type declarations (DTDs)

     Booklist Example in XML
         <?xml version="1.0" standalone="yes"?>
         <!DOCTYPE BOOKLIST SYSTEM "booklist.dtd">
         <BOOKLIST>
         <BOOK genre="Fiction">
           <AUTHOR>
             <FIRST>Milan</FIRST><LAST>Kundera</LAST>
           </AUTHOR>
           <TITLE>Identity</TITLE>
           <PUBLISHED>1998</PUBLISHED>
         </BOOK>
         <BOOK genre="Science" format="Hardcover">
           <AUTHOR>
             <FIRST>Richard</FIRST><LAST>Feynman</LAST>
           </AUTHOR>
           <TITLE>The Character of Physical Law</TITLE>
         </BOOK></BOOKLIST>

     XML: DTDs
     ❖ A DTD is a set of rules that defines the elements, attributes, and entities that are allowed in the document.
     ❖ An XML document is well-formed if it is properly nested; it need not have an associated DTD.
     ❖ An XML document is valid if it has a DTD and the document follows the rules in the DTD.

     An Example DTD
         <!DOCTYPE BOOKLIST [
         <!ELEMENT BOOKLIST (BOOK)*>
         <!ELEMENT BOOK (AUTHOR, TITLE, PUBLISHED?)>
         <!ELEMENT AUTHOR (FIRST, LAST)>
         <!ELEMENT FIRST (#PCDATA)>
         <!ELEMENT LAST (#PCDATA)>
         <!ELEMENT TITLE (#PCDATA)>
         <!ELEMENT PUBLISHED (#PCDATA)>
         <!ATTLIST BOOK genre (Science|Fiction) #REQUIRED>
         <!ATTLIST BOOK format (Paperback|Hardcover) "Paperback">
         ]>

     Domain-Specific DTDs
     ❖ Development of standardized DTDs for specialized domains enables data exchange between heterogeneous sources
     ❖ Example: Mathematical Markup Language (MathML)
       – Encodes mathematical material on the web
       – In HTML: <IMG SRC="xysq.gif" ALT="(x+y)^2">
       – In MathML:
             <apply> <power/>
               <apply> <plus/> <ci>x</ci> <ci>y</ci> </apply>
               <cn>2</cn>
             </apply>
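The difference between a well-formed and a valid document can be illustrated with a short sketch. Python's standard `xml.etree.ElementTree` parser checks well-formedness only and does not validate against a DTD, so the second function below hand-codes the rules of the example DTD as an approximation. The function names are hypothetical, and this is not a general DTD validator.

```python
import xml.etree.ElementTree as ET

BOOKLIST_XML = """<?xml version="1.0"?>
<BOOKLIST>
  <BOOK genre="Fiction">
    <AUTHOR><FIRST>Milan</FIRST><LAST>Kundera</LAST></AUTHOR>
    <TITLE>Identity</TITLE>
    <PUBLISHED>1998</PUBLISHED>
  </BOOK>
  <BOOK genre="Science" format="Hardcover">
    <AUTHOR><FIRST>Richard</FIRST><LAST>Feynman</LAST></AUTHOR>
    <TITLE>The Character of Physical Law</TITLE>
  </BOOK>
</BOOKLIST>"""


def is_well_formed(text: str) -> bool:
    """True if the parser accepts the text (tags are properly nested)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def follows_booklist_rules(text: str) -> bool:
    """Hand-coded check mirroring the example DTD's element and attribute rules."""
    root = ET.fromstring(text)
    if root.tag != "BOOKLIST":
        return False
    for book in root:
        if book.tag != "BOOK":
            return False
        # genre is #REQUIRED and restricted to two values.
        if book.get("genre") not in ("Science", "Fiction"):
            return False
        # format defaults to "Paperback" when absent.
        if book.get("format", "Paperback") not in ("Paperback", "Hardcover"):
            return False
        # BOOK content model: (AUTHOR, TITLE, PUBLISHED?)
        children = [child.tag for child in book]
        if children not in (["AUTHOR", "TITLE"], ["AUTHOR", "TITLE", "PUBLISHED"]):
            return False
        # AUTHOR content model: (FIRST, LAST)
        if [child.tag for child in book.find("AUTHOR")] != ["FIRST", "LAST"]:
            return False
    return True
```

A document such as `<BOOK><TITLE>Identity</BOOK></TITLE>` fails the first check because its tags overlap. A booklist whose `BOOK` lacks the `genre` attribute passes the first check but fails the second.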

  3. XML-QL: Querying XML Data
     ❖ Goal: High-level, declarative language that allows manipulation of XML documents
     ❖ No standard yet
     ❖ Example query in XML-QL:
         WHERE <BOOK>
                 <NAME><LAST> $l </LAST></NAME>
               </BOOK> IN "www.booklist.com/books.xml"
         CONSTRUCT <RESULT> $l </RESULT>

     XML-QL (Contd.)
     A more complicated example:
         WHERE <BOOK> $e </BOOK> IN "www.booklist.com/books.xml",
               <AUTHOR> $n </AUTHOR>
               <PUBLISHED> $p </PUBLISHED> IN $e
         CONSTRUCT <RESULT>
                     <PUBLISHED> $p </PUBLISHED>
                     WHERE <LAST> $l </LAST> IN $n
                     CONSTRUCT <LAST> $l </LAST>
                   </RESULT>

     Semi-structured Data
     ❖ Data with partial structure
     ❖ All data models for semi-structured data use some type of labeled graph
     ❖ We introduce the object exchange model (OEM):
       – Object is triple (label, type, value)
       – Complex objects are decomposed hierarchically into smaller objects

     Example: Booklist Data in OEM
     [Figure: labeled graph of the booklist data. Labels: BOOK, AUTHOR, TITLE, PUBLISHED, FORMAT. Leaf values: Milan Kundera, Identity, 1998; Richard Feynman, The Character of Physical Law, Hardcover.]

     Indexing for Text Search
     ❖ Text database: Collection of text documents
     ❖ Important class of queries: Keyword searches
       – Boolean queries: Query terms connected with AND, OR and NOT. Result is list of documents that satisfy the boolean expression.
       – Ranked queries: Result is list of documents ranked by their "relevance".
       – IR: Precision (percentage of retrieved documents that are relevant) and recall (percentage of relevant objects that are retrieved)

     Inverted Files
     ❖ For each possible query term, store an ordered list (the inverted list) of the identifiers of the documents that contain the term.
     ❖ Query evaluation: Intersection or Union of inverted lists.
     ❖ Example: Agent AND James (intersecting <1,2> with <1> yields <1>)

         RID  Document
         1    Agent James
         2    Mobile agent

         Word    Inverted List
         Agent   <1,2>
         James   <1>
         Mobile  <2>
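The inverted-file idea can be sketched in a few lines, using the two example documents from the slide. The function names are hypothetical, and lowercasing terms and splitting on whitespace are simplifying assumptions; the slides do not specify a tokenizer.

```python
from collections import defaultdict

DOCS = {1: "Agent James", 2: "Mobile agent"}


def build_inverted_index(docs):
    """Map each term to the ordered list of RIDs whose document contains it."""
    index = defaultdict(list)
    for rid in sorted(docs):  # visiting RIDs in order keeps each list sorted
        for term in set(docs[rid].lower().split()):
            index[term].append(rid)
    return dict(index)


def intersect(a, b):
    """Merge-style intersection of two sorted RID lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def query_and(index, terms):
    """Documents containing every term (expects at least one term)."""
    lists = [index.get(term.lower(), []) for term in terms]
    result = list(lists[0])
    for lst in lists[1:]:
        result = intersect(result, lst)
    return result


def query_or(index, terms):
    """Documents containing at least one term: union of the inverted lists."""
    rids = set()
    for term in terms:
        rids.update(index.get(term.lower(), []))
    return sorted(rids)
```

With `index = build_inverted_index(DOCS)`, the query `query_and(index, ["Agent", "James"])` intersects the lists for the two terms, matching the slide's example. Keeping the lists ordered is what makes the linear merge in `intersect` possible.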

  4. Signature Files
     ❖ Index structure (the signature file) with one data entry for each document
     ❖ Hash function hashes words to bit-vector.
     ❖ Data entry for a document (the signature of the document) is the OR of all hashed words.
     ❖ Signature S1 matches signature S2 if S2 & S1 = S2

     Signature Files: Query Evaluation
     ❖ Boolean query consisting of conjunction of words:
       – Generate query signature Sq
       – Scan signatures of all documents.
       – If signature S matches Sq, then retrieve document and check for false positives.
     ❖ Boolean query consisting of disjunction of k words:
       – Generate k query signatures S1, …, Sk
       – Scan signature file to find documents whose signature matches any of S1, …, Sk
       – Check for false positives

     Signature Files: Example

         Word    Hash
         Agent   1010
         James   1100
         Mobile  0001

         RID  Document      Signature
         1    Agent James   1110
         2    Mobile agent  1011

     Summary
     ❖ Publishing databases on the web requires server-side processing such as CGI-scripts, Servlets, ASP, or JSP
     ❖ XML is an emerging document description standard that allows the definition of new DTDs. Query languages for XML documents such as XQL are emerging.
     ❖ Text databases have gained importance with the proliferation of text data on the web. Boolean queries can be efficiently evaluated using an inverted index or a signature file. Evaluation of ranked queries is a more difficult problem.
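The signature-file scheme can be sketched as follows. Because the slides give the word hashes as a table rather than a hash function, the sketch uses that table directly as a lookup. The function names and the extra word "smith" in the usage note are assumptions introduced only for illustration.

```python
# Word hashes copied from the example table; a real system would compute them.
WORD_HASH = {"agent": 0b1010, "james": 0b1100, "mobile": 0b0001}
DOCS = {1: "Agent James", 2: "Mobile agent"}


def signature(words, word_hash=WORD_HASH):
    """OR together the bit-vectors of all words."""
    sig = 0
    for word in words:
        sig |= word_hash[word.lower()]
    return sig


def matches(doc_sig, query_sig):
    """A document signature matches if it has every bit of the query signature."""
    return (doc_sig & query_sig) == query_sig


def build_signature_file(docs, word_hash=WORD_HASH):
    """One data entry (signature) per document."""
    return {rid: signature(text.split(), word_hash) for rid, text in docs.items()}


def doc_words(text):
    return {word.lower() for word in text.split()}


def conjunctive_query(words, docs, sig_file, word_hash=WORD_HASH):
    """AND query: one query signature, scan, then discard false positives."""
    sq = signature(words, word_hash)
    candidates = [rid for rid in sorted(sig_file) if matches(sig_file[rid], sq)]
    return [
        rid for rid in candidates
        if all(word.lower() in doc_words(docs[rid]) for word in words)
    ]


def disjunctive_query(words, docs, sig_file, word_hash=WORD_HASH):
    """OR query: k query signatures, a document qualifies if it matches any."""
    query_sigs = [signature([word], word_hash) for word in words]
    candidates = [
        rid for rid in sorted(sig_file)
        if any(matches(sig_file[rid], sq) for sq in query_sigs)
    ]
    return [
        rid for rid in candidates
        if any(word.lower() in doc_words(docs[rid]) for word in words)
    ]
```

False positives arise when different words share bits. If a hypothetical word "smith" hashed to 1000, both document signatures (1110 and 1011) would match it even though neither document contains the word, so the final text check removes both candidates.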
