Semi-structured data



  1. Semi-structured data
     • Data is not just text, but is not as well structured as data in databases
     • Occurs often in web databanks
     • Occurs often in integration of databanks

     Semi-structured data - properties
     • irregular structure
     • implicit structure
     • partial structure
     • a posteriori 'data guide' versus a priori schema
     • large data guides
     • It should be possible to ignore the data guide upon querying
     • Data guide changes fast
     • object can change type/class
     • difference between data guide and data is blurred

     OEM (Object Exchange Model)
     • Graph
     • Nodes: objects
       - oid
       - atomic or complex
       - atoms: integer, string, gif, html, …
       - value of a complex object is a set of object references (label, oid)
     • Edges have labels
     • OEM is used by a number of systems (e.g. Lorel)

     Semi-structured data - model
     • network of nodes
     • object model (oid)
     • query: path search in the network
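The OEM description above (objects with oids, atomic or complex values, labeled edges, queries as path search) can be sketched in a few lines of Python. This is only an illustrative encoding, not how Lore or any other OEM system stores data: the dictionary layout, the function names `is_atomic`, `children` and `follow`, and the toy oids are all assumptions made for this example, and a list stands in for the set of references so that results come out in a predictable order.

```python
# Hypothetical in-memory encoding of an OEM graph (illustrative names only).
# Each object is identified by an oid. An atomic object stores a plain value;
# a complex object stores a list of (label, oid) object references.
oem = {
    1: [("restaurant", 2), ("cafe", 3)],   # complex root object
    2: [("name", 4), ("category", 5)],     # complex object
    3: [("name", 6)],                      # complex object
    4: "Chef Chu",                         # atomic objects
    5: "gourmet",
    6: "Sandra",
}


def is_atomic(db, oid):
    """An object is atomic when its value is not a list of references."""
    return not isinstance(db[oid], list)


def children(db, oid, label):
    """Return the oids referenced from `oid` through edges labeled `label`."""
    if is_atomic(db, oid):
        return []
    return [target for (edge_label, target) in db[oid] if edge_label == label]


def follow(db, root, label_path):
    """Path search: follow a sequence of labels from `root`, return reached oids."""
    current = [root]
    for label in label_path:
        reached = []
        for oid in current:
            for child in children(db, oid, label):
                if child not in reached:
                    reached.append(child)
        current = reached
    return current


# Names of all restaurants in the toy graph.
names = [oem[oid] for oid in follow(oem, 1, ["restaurant", "name"])]
print(names)  # ['Chef Chu']
```

Keeping references as (label, oid) pairs rather than nesting values directly means the same object can be shared by several parents, which is what makes the structure a graph rather than a tree.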

  2. OEM example
     [Figure: OEM graph of a restaurant guide. The root object Guide (oid 12) has edges labeled restaurant, restaurant and cafe. Further edge labels: category, name, address, price, zipcode, nearby, street, city. Atomic values: gourmet, Chef Chu, Vietnamese, Saigon, Mountain View, Menlo Park, cheap, fast food, Sandra, El Camino Real, Palo Alto, 92310.]

     Lorel query language
     1. Find all places to eat Vietnamese food
          select P
          from RestaurantGuide.% P
          where P.category grep "ietnamese"
     2. Find the names and streets of all restaurants in Palo Alto
          select R.name, A.street
          from RestaurantGuide.restaurant{R}.address A
          where A.city = "Palo Alto"

     Data Guides
     • A structural summary over a databank that is used as a dynamic schema
     • Is used in query formulation and optimization
     • Is often created a posteriori
     • Properties:
       – concise
       – accurate
       – convenient

     Lorel query language
     3. Find all restaurants with zipcode 92310
          select RestaurantGuide.restaurant
          where RestaurantGuide.restaurant(.address)?.zipcode = 92310

     Wildcards and variables
       ?  0 or 1 path
       +  1 or more paths
       *  0 or more paths
       #  any path
       %  0 or more chars
     - object variables
          select P from Guide.% P
          select A from #.address{A}
     - path variables
          select Guide.#@P.name

     Data Guides - definitions
     • Label path: sequence of labels
       L1.L2. … .Ln
     • Data path: alternating sequence of labels and oids
       L1.o1.L2.o2. … .Ln.on
     • Data path d is an instance of label path l if the sequences of labels are identical in l and d.
     • A data guide for object s is an object d such that every label path of s has exactly one data path instance in d, and each label path in d is a label path of s.
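The label path and data path definitions above lend themselves to a small check. The sketch below assumes a data path is given as a flat Python list alternating labels and oids (L1, o1, L2, o2, ..., Ln, on); the helper names `labels_of` and `is_instance` and the oids in the sample path are hypothetical and chosen only for illustration.

```python
def labels_of(data_path):
    """Pick out the labels L1..Ln from an alternating label/oid sequence."""
    return list(data_path[0::2])


def is_instance(data_path, label_path):
    """A data path is an instance of a label path if their label sequences match."""
    # A well-formed data path has one oid per label, hence an even length.
    if len(data_path) % 2 != 0:
        return False
    return labels_of(data_path) == list(label_path)


# Hypothetical data path: restaurant.o19.address.o44.city.o15
data_path = ["restaurant", 19, "address", 44, "city", 15]

print(is_instance(data_path, ["restaurant", "address", "city"]))  # True
print(is_instance(data_path, ["restaurant", "name"]))             # False
```

Many data paths can be instances of the same label path, which is why a data guide, holding exactly one instance per label path, can be much smaller than the data it summarizes.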

  3. Data Guides - example
     [Figure: (a) Data model: root object 1; edges labeled A, B, B lead to objects 2, 3, 4; edges labeled C lead to objects 5, 6, 7; edges labeled D lead to objects 8, 9, 10. (c) Minimal Data Guide: root object 18; edges labeled A and B lead to object 19; an edge labeled C leads to object 20; an edge labeled D leads to object 21.]

     Data Guides
     • A databank can have several data guides
     • Minimal data guides: the smallest data guides

     Minimal Data Guides
     • Concise
     • May be hard to maintain
       Example: child node for 10 with label E

     Strong Data Guides
     Intuitively: "label paths that reach the same set of objects in the data model = label paths that reach the same objects in the data guide"

     Strong Data Guides - definitions
     • An object o can be reached from s via l if there is a data path of s that is an instance of l and that has o as last oid
       (L1.o1.L2.o2. … Ln.o)
     • The target set for label path l in object s is the set of objects that can be reached from s via l. Notation: T(s,l)
     • L(s,l): set of label paths of s that have the same target set in s as l.

     Strong Data Guides - definitions
     • Definition: d is a strong data guide for s if for all label paths l of s it holds that
       L(s,l) = L(d,l)
     • There is a 1-1-mapping between target sets in the data model and nodes in a strong data guide.
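The definitions of T(s,l) and L(s,l) can be tried out on the example graphs. The sketch below is one possible reading, under two assumptions that are not stated in the slides: the edges of data model (a) and minimal Data Guide (c) are reconstructed from the figure layout (each object on one level points to the object directly below it), and the graphs are acyclic so that all label paths can be enumerated. The function names are hypothetical.

```python
# Edges assumed from the figure: oid -> list of (label, oid) references.
data_model = {
    1: [("A", 2), ("B", 3), ("B", 4)],
    2: [("C", 5)], 3: [("C", 6)], 4: [("C", 7)],
    5: [("D", 8)], 6: [("D", 9)], 7: [("D", 10)],
}
minimal_guide = {
    18: [("A", 19), ("B", 19)],
    19: [("C", 20)],
    20: [("D", 21)],
}


def target_set(db, s, label_path):
    """T(s, l): the objects reachable from s via label path l."""
    current = {s}
    for label in label_path:
        current = {t for oid in current
                   for (lab, t) in db.get(oid, []) if lab == label}
    return frozenset(current)


def label_paths(db, s):
    """All non-empty label paths of s (assumes the graph is acyclic)."""
    paths = set()

    def walk(oid, prefix):
        for (lab, t) in db.get(oid, []):
            path = prefix + (lab,)
            paths.add(path)
            walk(t, path)

    walk(s, ())
    return paths


def same_target_paths(db, s, l):
    """L(s, l): the label paths of s that have the same target set as l."""
    wanted = target_set(db, s, l)
    return {p for p in label_paths(db, s) if target_set(db, s, p) == wanted}


print(sorted(target_set(data_model, 1, ("B", "C"))))       # [6, 7]
print(same_target_paths(data_model, 1, ("A",)))            # {('A',)}
print(sorted(same_target_paths(minimal_guide, 18, ("A",))))  # [('A',), ('B',)]
```

Under these assumptions L(s,A) contains only A in the data model but contains both A and B in the minimal Data Guide, so L(s,l) = L(d,l) fails and the minimal guide is not a strong one.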

  4. Data Guides - example
     [Figure: (a) Data model: root object 1; edges labeled A, B, B lead to objects 2, 3, 4; edges labeled C lead to objects 5, 6, 7; edges labeled D lead to objects 8, 9, 10. (b) Strong Data Guide: root object 11; edges labeled A, B lead to objects 12, 13; edges labeled C lead to objects 14, 15; edges labeled D lead to objects 16, 17. (c) Minimal Data Guide: root object 18; edges labeled A and B lead to object 19; an edge labeled C leads to object 20; an edge labeled D leads to object 21.]

     Strong Data Guides - algorithm
     Implementation:
     - Traverse data model depth-first.
     - Each time you find a new target set for label path l, create a new object in the data guide. If the target set is already represented in the data guide, do not create a new object, but link to the existing object.

     Strong Data Guides - use
     – Easier to maintain
     – Used as path index for query optimization

     Semi-structured data - exercises

     Exercise 1
     • Represent the relations below using the OEM data model.

       Cities
       c_id   name
       c1     Linkoping
       c2     Norkoping

       Restaurants
       r_id   name
       r1     Hamlet
       r2     Normandie
       r3     McDonald's

       Restaurants&Cities
       r_id   c_id   street
       r1     c1     Storgatan
       r2     c1     St.Larsgatan
       r3     c2     Kungsgatan

     Exercise 2
     • Using the data model from the previous question, formulate the following queries using Lorel:
       – find all the restaurants that are located in Linkoping
       – find the address (city and street) of the "Hamlet" restaurant
       – list the restaurants by city (equivalent of GROUP BY)
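The strong Data Guide construction described on the algorithm slide (depth-first traversal, one guide object per distinct target set, linking to an existing object when the target set has been seen before) can be sketched as follows. This is a minimal illustration rather than the implementation used in any particular system: the function name `build_strong_dataguide`, the guide oids g1, g2, ..., and the edges of the example data model (read off the figure layout) are assumptions.

```python
def build_strong_dataguide(db, root):
    """Build a strong data guide: one guide object per distinct target set."""
    guide = {}       # guide oid -> list of (label, guide oid)
    node_for = {}    # target set (frozenset of oids) -> guide oid

    def new_node(tset):
        gid = "g%d" % (len(guide) + 1)
        node_for[tset] = gid
        guide[gid] = []
        return gid

    def expand(tset, gid):
        # Group the children of every object in the target set by edge label.
        by_label = {}
        for oid in sorted(tset):
            for (label, target) in db.get(oid, []):
                by_label.setdefault(label, set()).add(target)
        for label in sorted(by_label):
            child_set = frozenset(by_label[label])
            if child_set in node_for:
                # Target set already represented: link to the existing object.
                guide[gid].append((label, node_for[child_set]))
            else:
                # New target set: create a guide object and continue depth-first.
                child = new_node(child_set)
                guide[gid].append((label, child))
                expand(child_set, child)

    root_set = frozenset([root])
    expand(root_set, new_node(root_set))
    return guide


# Data model (a); edges assumed from the figure layout.
data_model = {
    1: [("A", 2), ("B", 3), ("B", 4)],
    2: [("C", 5)], 3: [("C", 6)], 4: [("C", 7)],
    5: [("D", 8)], 6: [("D", 9)], 7: [("D", 10)],
}

strong = build_strong_dataguide(data_model, 1)
print(len(strong))   # 7
print(strong["g1"])  # [('A', 'g2'), ('B', 'g5')]
```

The result has seven objects, the same count as strong Data Guide (b) with its objects 11 to 17. Because a guide object is registered before its children are expanded, the traversal also terminates on cyclic data: a target set reached a second time is linked rather than expanded again.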

  5. Exercise 3
     • Draw the strong Data Guide for the restaurant guide data model below.
     [Figure: OEM graph rooted at RestaurantGuide (oid 1). Edges labeled restaurant, restaurant and cafe lead to objects 2, 3, 4, which are also connected by nearby edges. Edges labeled category, name, address and contact lead to objects 5 to 15. Edges labeled street, city, zipcode, manager and reservation lead to objects 16 to 25. Edges labeled phone lead to objects 26 to 29. Atomic values: gourmet, Chef Chu, Saigon, Menlo Park, fast food, Sandra, El Camino Real, Palo Alto, 92310, Rydsvagen, Linkoping, 58435, 71-72-73, 11-12-13, 31-32-33, 34-35-36.]

     Exercise 4
     • Write 4 simple queries in Lorel that illustrate the use of coercion in the following types of comparison:
       – string type against integer type
       – value against atomic object
       – value against complex object
       – value against set of objects
     Explain how coercion works in each case.
