
CUGS Core - Databases: Outline and Introduction (storing and accessing data)

  1. CUGS Core - Databases
     Patrick Lambrix, Linköpings universitet

     Outline
     • Introduction: storing and accessing data
     • Semi-structured data
     • Information integration
     • Object-oriented and object-relational databases

     Work method
     For each topic:
     • introductory presentation by topic responsible
     • in smaller groups: reading papers, discussion guided by predefined questions, summary
     • each smaller group presents their summary, final discussion moderated by topic responsible

     Requirements
     • Responsible for a topic (presentation + questions) (ca 60 hours)
     • Participation in smaller discussion groups
     • Take-home exam (ca 40 hours)

     Databanks/Databases
     • One of many ways to store data in electronic form
     • used in every-day life: bank, reservation of hotel or travel, library search, bar codes
     • new applications: multimedia databases, geographic information systems, real-time databases

     Databank
     • DataBank Management System (DBMS): a collection of programs that allows a user to create and maintain a databank
     • databank system = physical databank + DBMS

  2. Issues
     • What information is stored?
     • How is the information stored? (high and low level)
     • How is the information accessed? (user level, system level)
     • How is a databank recovered after a crash?

     Databanks
     [Figure: real life is described by a model and stored in a databank. Queries/updates go in and information/answers come out through the Databank Management System (processing of queries and updates, access to stored data), which sits on top of the physical databank.]

     Issues
     • How to keep track of changes of the data over time?
     • How can several users access and update information in a databank at the same time?
     • How can a user access information in several databanks at the same time?

     Persons
     • databank administrator
     • databank designer
     • 'end user'
     • application programmer
     • DBMS designer
     • developer of tools
     • operator, maintenance

     Example record
     DEFINITION  Homo sapiens adrenergic, beta-1-, receptor
     ACCESSION   NM_000684
     SOURCE ORGANISM human
     REFERENCE   1
       AUTHORS   Frielle, Collins, Daniel, Caron, Lefkowitz, Kobilka
       TITLE     Cloning of the cDNA for the human beta 1-adrenergic receptor
     REFERENCE   2
       AUTHORS   Frielle, Kobilka, Lefkowitz, Caron
       TITLE     Human beta 1- and beta 2-adrenergic receptors: structurally and functionally related receptors derived from distinct genes

     What information is stored?
     • Model of reality
       - Entity-Relationship model (ER)
       - Unified Modeling Language (UML)

  3. Entity-relationship
     • entities and attributes
     • entity types
     • key attributes
     • relations
     • cardinality constraints

     Entity-Relationship
     [Figure: ER diagram. Entity type PROTEIN (attributes protein-id, accession, source, definition) is connected through the relation Reference, with cardinalities m and n, to entity type ARTICLE (attributes article-id, title, author).]

     How is the information stored? (high level)
     How is the information accessed? (user level)
     • Text (IR)
     • Semi-structured data
     • Data models (DB)
     • Rules + Facts (KB)
     [Figure: an arrow labelled "structure, precision" runs alongside the list.]

     Text - Information Retrieval
     • Search based on words
     • conceptual models: boolean, vector, probabilistic, …
     • file models: flat files, inverted files, ...

     IR - File model: inverted file
     [Figure: an inverted file (columns WORD, HITS, LINK; entries adrenergic 32, cloning 53, receptor 22) links to a postings file (columns DOC #, LINK), which in turn links to a document file (DOCUMENTS: Doc1, Doc2, …).]

     Vector model (simplified)
     [Figure: vectors Doc1 (1,1,0), Doc2 (0,1,0) and query Q (1,1,1) drawn in a space with axes cloning, adrenergic and receptor.]
     $\mathrm{sim}(d, q) = \dfrac{d \cdot q}{|d| \times |q|}$
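The inverted file and the simplified vector model above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the two toy document texts are hypothetical, and the postings structure is reduced to a word-to-document-ids mapping. The three vectors are the ones shown on the slide.

```python
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def cosine_similarity(d, q):
    """sim(d, q) = (d . q) / (|d| * |q|), as on the vector model slide."""
    dot = sum(x * y for x, y in zip(d, q))
    norm_d = math.sqrt(sum(x * x for x in d))
    norm_q = math.sqrt(sum(y * y for y in q))
    return dot / (norm_d * norm_q)


# Hypothetical toy documents (assumption, not taken from the slides).
docs = {1: "cloning adrenergic", 2: "adrenergic"}
index = build_inverted_index(docs)

# Vectors from the slide: Doc1, Doc2 and the query Q.
doc1, doc2, query = (1, 1, 0), (0, 1, 0), (1, 1, 1)
sim1 = cosine_similarity(doc1, query)
sim2 = cosine_similarity(doc2, query)
print(index)
print(sim1, sim2)
```

With these vectors Doc1 scores higher than Doc2, because it shares two of the three query terms while Doc2 shares only one.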

  4. Databases
     • Relational databases:
       - model: tables + relational algebra
       - query language (SQL)
     • Object-oriented databases:
       - model: persistent objects, messages, encapsulation, inheritance
       - query language (e.g. OQL)

     Relational databases
     PROTEIN
     PROTEIN-ID | ACCESSION | DEFINITION                                 | SOURCE
     1          | NM_000684 | Homo sapiens adrenergic, beta-1-, receptor | human

     REFERENCE
     PROTEIN-ID | ARTICLE-ID
     1          | 1
     1          | 2

     ARTICLE
     ARTICLE-ID | AUTHOR    | TITLE
     1          | Frielle   | Cloning of the cDNA for the human ….
     1          | Collins   | Cloning of the cDNA for the human ….
     1          | Daniel    | Cloning of the cDNA for the human ….
     1          | Caron     | Cloning of the cDNA for the human ….
     1          | Lefkowitz | Cloning of the cDNA for the human ….
     1          | Kobilka   | Cloning of the cDNA for the human ….
     2          | Frielle   | Human beta 1- and beta 2-adrenergic receptors
     2          | Kobilka   | Human beta 1- and beta 2-adrenergic receptors
     2          | Lefkowitz | Human beta 1- and beta 2-adrenergic receptors
     2          | Caron     | Human beta 1- and beta 2-adrenergic receptors

     Relational databases
     The PROTEIN and REFERENCE tables are as above; ARTICLE is split into two tables so that each title is stored only once.

     ARTICLE-AUTHOR
     ARTICLE-ID | AUTHOR
     1          | Frielle
     1          | Collins
     1          | Daniel
     1          | Caron
     1          | Lefkowitz
     1          | Kobilka
     2          | Frielle
     2          | Kobilka
     2          | Lefkowitz
     2          | Caron

     ARTICLE-TITLE
     ARTICLE-ID | TITLE
     1          | Cloning of the cDNA for the human beta 1-adrenergic receptor
     2          | Human beta 1- and beta 2-adrenergic receptors: structurally and functionally related receptors derived from distinct genes

     SQL
         select source
         from protein
         where accession = 'NM_000684';
     Evaluated on the PROTEIN table above.

     SQL
         select title
         from protein, article-title, reference
         where protein.accession = 'NM_000684'
           and protein.protein-id = reference.protein-id
           and reference.article-id = article-title.article-id;
     Evaluated on the PROTEIN, REFERENCE and ARTICLE-TITLE tables above.

     From relational to object model
     • CASE
     • CAD
     • office automation
     • multimedia applications
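The two SQL queries from the slides can be tried out with Python's built-in sqlite3 module. The sketch below is one possible setup, not the course's own: it assumes an in-memory database, and it writes the hyphenated table and column names from the slides with underscores (for example `article_title`, `protein_id`), because hyphens are not valid in unquoted SQL identifiers. The rows are the ones shown in the tables.

```python
import sqlite3

# In-memory database; the schema follows the slide tables, with hyphens
# replaced by underscores (an assumption made for valid SQL identifiers).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE protein (
        protein_id INTEGER PRIMARY KEY,
        accession  TEXT,
        definition TEXT,
        source     TEXT
    );
    CREATE TABLE reference (protein_id INTEGER, article_id INTEGER);
    CREATE TABLE article_title (article_id INTEGER PRIMARY KEY, title TEXT);
    """
)
conn.execute(
    "INSERT INTO protein VALUES (?, ?, ?, ?)",
    (1, "NM_000684", "Homo sapiens adrenergic, beta-1-, receptor", "human"),
)
conn.executemany("INSERT INTO reference VALUES (?, ?)", [(1, 1), (1, 2)])
conn.executemany(
    "INSERT INTO article_title VALUES (?, ?)",
    [
        (1, "Cloning of the cDNA for the human beta 1-adrenergic receptor"),
        (2, "Human beta 1- and beta 2-adrenergic receptors: structurally and "
            "functionally related receptors derived from distinct genes"),
    ],
)

# First query: the source of the protein with a given accession number.
sources = [
    row[0]
    for row in conn.execute(
        "SELECT source FROM protein WHERE accession = ?", ("NM_000684",)
    )
]

# Second query: titles of all articles that reference that protein.
titles = [
    row[0]
    for row in conn.execute(
        """
        SELECT article_title.title
        FROM protein, reference, article_title
        WHERE protein.accession = ?
          AND protein.protein_id = reference.protein_id
          AND reference.article_id = article_title.article_id
        ORDER BY article_title.article_id
        """,
        ("NM_000684",),
    )
]
print(sources)
print(titles)
```

The accession number is passed as a query parameter rather than written into the SQL text, which also takes care of quoting the string literal.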

  5. Object-Oriented Databases (OODB)
     • World is modeled using objects.
     • An object has a state (value) and a behavior (operations).
     • Persistent objects - permanent storage (sometimes transient objects are allowed)

     Object
     • An object has an object identifier (OID) that is not visible to the user.
     • OID cannot be changed.
     • object versus value (a value has no OID)
     • object structure can be arbitrarily complex (atom, tuple, set, list, bag, array)

     Example - object state
     • o1(id1, tuple, <accession: NM_000684, source: human, definition: 'Homo sapiens adrenergic …', reference: o2>)
     • o2(id2, set, {o3, o4})
     • o3(id3, tuple, <title: 'Cloning of …', author: o5>)
     • o4(id4, tuple, <title: 'Human beta-1 …', author: o6>)
     • o5(id5, list, [Frielle, Collins, Daniel, Caron, Lefkowitz, Kobilka])
     • o6(id6, list, [Frielle, Kobilka, Lefkowitz, Caron])
     Remark: These examples do not use a standard syntax

     [Figure: the same objects drawn as a graph. The protein object has SOURCE "human", ACCESSION NM_000684 and DEFINITION "Homo sapiens adrenergic, beta-1-, receptor"; its REFERENCE points to a set of two article objects, each with a TITLE ("Cloning of …", "Human beta-1 …") and an AUTHOR list (Frielle, Collins, Daniel, Caron, Lefkowitz, Kobilka; Frielle, Kobilka, Lefkowitz, Caron).]

     Classes
         define class protein
           type tuple ( accession: string;
                        source: string;
                        definition: string;
                        reference: set(article); );
           operations
             create-protein(string, string, string, set(article)): protein;
             get-accession: string;
             get-source: string;
             get-definition: string;
             get-references: set(article);
             add-reference(article): void;
         end protein;
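The distinction between an object (which has an identity) and a value (which does not) can be illustrated in Python, where object identity plays roughly the role of the OID. This is a loose analogy and a sketch only: the objects are not persistent, and the class and method names are Python renderings of the slide's `protein` and `article` classes, not a standard OODB syntax.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(eq=False)  # eq=False: instances compare and hash by identity
class Article:
    title: str
    authors: List[str]


@dataclass(eq=False)
class Protein:
    accession: str
    source: str
    definition: str
    references: Set[Article] = field(default_factory=set)

    def get_references(self) -> Set[Article]:
        return self.references

    def add_reference(self, article: Article) -> None:
        self.references.add(article)


# Object states from the slide (o5 and o6 are the author lists).
o5 = ["Frielle", "Collins", "Daniel", "Caron", "Lefkowitz", "Kobilka"]
o6 = ["Frielle", "Kobilka", "Lefkowitz", "Caron"]
o3 = Article("Cloning of …", o5)
o4 = Article("Human beta-1 …", o6)
o1 = Protein("NM_000684", "human", "Homo sapiens adrenergic …", {o3})
o1.add_reference(o4)

# A second object with the same state as o3 is still a different object.
twin = Article("Cloning of …", list(o5))
same_state = (twin.title, twin.authors) == (o3.title, o3.authors)
same_object = twin is o3
print(len(o1.get_references()), same_state, same_object)
```

Because the classes compare by identity, `twin` is not a member of `o1.references` even though its state equals that of `o3`.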

  6. Classes
         define class article
           type tuple ( title: string;
                        author: list(string); );
           operations
             create-article(string, list(string)): article;
             get-title: string;
             get-authors: list(string);
             print-article-info: string;
         end article;

     Example program
         program
           variables: article1, article2, protein1;
         begin
           article1 := create-article('Cloning….', list(Frielle, Collins, Daniel, Caron, Lefkowitz, Kobilka));
           protein1 := create-protein(NM_000684, human, 'Homo sapiens adrenergic …', set(article1));
           article2 := create-article('Human beta-1….', list(Frielle, Kobilka, Lefkowitz, Caron));
           protein1.add-reference(article2);
         end;

     Operations
     • encapsulation: operation = interface + body
       - interface: how is the operation called? What is the result of the operation?
         > visible to user, used in programs
       - body: how is the operation implemented?
         > invisible for user
     • program is based on message passing

     Inheritance
     • journal-article subtype-of article:
         journal-name
         journal-volume
         page-numbers
       journal-article inherits all attributes and operations from article and in addition has journal-name, journal-volume and page-numbers as attributes
     • human-protein subtype-of protein (source = 'human')

     Operator overloading
     • The same operator name can be used for different implementations
     • example:
       print-article-info for article prints information on title and author.
       print-article-info for journal-article prints information on title, author and also on the journal's name, volume and page numbers.

     Query language OQL
     • select … from … where
       select distinct … from … where
     • iterator variables
     • path expressions
     • struct
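The inheritance and overloading slides can be mirrored in Python, where the mechanism is usually called method overriding. The sketch below assumes Python renderings of the slide's names (`print_article_info`, `JournalArticle`); the journal name, volume and page numbers are hypothetical placeholder values, since the slides give none.

```python
class Article:
    """Interface: get_title, get_authors, print_article_info."""

    def __init__(self, title, authors):
        # The leading underscore marks state that callers should reach
        # only through the operations (encapsulation by convention).
        self._title = title
        self._authors = list(authors)

    def get_title(self):
        return self._title

    def get_authors(self):
        return list(self._authors)

    def print_article_info(self):
        return f"{self._title} ({', '.join(self._authors)})"


class JournalArticle(Article):
    """Subtype of Article: inherits its attributes and operations."""

    def __init__(self, title, authors, journal_name, journal_volume, page_numbers):
        super().__init__(title, authors)
        self._journal_name = journal_name
        self._journal_volume = journal_volume
        self._page_numbers = page_numbers

    def print_article_info(self):
        # Same operation name, different body: adds the journal details.
        base = super().print_article_info()
        return (f"{base}, {self._journal_name} "
                f"{self._journal_volume}, pp. {self._page_numbers}")


plain = Article("Cloning of …", ["Frielle", "Collins"])
# Journal details below are hypothetical placeholders.
journal = JournalArticle("Human beta-1 …", ["Frielle", "Kobilka"],
                         "Example Journal", "1", "1-10")
infos = [a.print_article_info() for a in (plain, journal)]
print(infos)
```

The loop calls the same operation on both objects; which body runs depends on the object's class, which is the behaviour the operator overloading slide describes.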
