Probabilistic XML: Benny Kimelfeld & Yehoshua - PowerPoint PPT Presentation

VLDB 2007, Vienna, Austria. Matching Twigs in Probabilistic XML. Benny Kimelfeld & Yehoshua Sagiv


  1. Matching Twigs in Probabilistic XML. Benny Kimelfeld & Yehoshua Sagiv. VLDB 2007, Vienna, Austria.

  2. Example: Scanning Aerial Photography. Find regions that include a factory building and a road … with a high probability.

  3. Analyzing a Region. What is the probability that this region is an answer (i.e., includes a factory building and a road)? The probability of each match can be significantly smaller than the probability that there is any match. But specifying the probability of each match does not answer the question! [Figure: aerial photograph of a region with candidate objects labeled factory bldg. & wall (40%) / house & road (30%), road (60%), road (90%), factory bldg. (40%) / house (50%), and apt. building (50%) / factory bldg. (50%); the individual matches shown have probabilities 36%, 24%, 45%, and 36%.]
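The gap between the probability of each match and the probability that some match exists can be checked by brute force. The following is a minimal sketch, assuming four independently detected objects with illustrative probabilities; the object names, the numbers, and the independence assumption are not taken from the talk. It enumerates all possible worlds and sums the probability of those that contain at least one road and at least one factory building.

```python
from itertools import product

# Hypothetical, independently detected objects in one region (illustrative numbers).
roads = {"road_a": 0.9, "road_b": 0.6}
factories = {"factory_a": 0.4, "factory_b": 0.5}


def world_probability(present, probs):
    """Probability of one possible world, given which objects are present."""
    p = 1.0
    for name, prob in probs.items():
        p *= prob if present[name] else 1.0 - prob
    return p


def probability_of_some_match():
    """Sum the probabilities of all worlds with at least one road and one factory."""
    objects = {**roads, **factories}
    names = list(objects)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        present = dict(zip(names, bits))
        has_road = any(present[r] for r in roads)
        has_factory = any(present[f] for f in factories)
        if has_road and has_factory:
            total += world_probability(present, objects)
    return total


# Probability of each specific (road, factory) match under independence.
match_probabilities = {
    (r, f): roads[r] * factories[f] for r in roads for f in factories
}

if __name__ == "__main__":
    for pair, p in match_probabilities.items():
        print(pair, round(p, 3))
    print("some match:", round(probability_of_some_match(), 3))
```

With these assumed numbers, the most likely single match has probability 0.45, while the probability that the region contains some match is about 0.672, so ranking regions by their best match would understate their relevance.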

  4. A Database Point of View. Probabilistic data: a probabilistic process for generating random data. Querying probabilistic data: each answer has an amount of certainty, namely the probability of being obtained when querying a random database. [Figure: a query (a twig with root *, region, road, and factory building) posed over probabilistic data.]

  5. What Query Should We Pose?
     • A pattern (twig with *, region, road, factory building)
       – An answer is a match
       – What is the probability of each specific match?
       – What is the probability of each pair of road & factory building?
     • A pattern w/ projection (project on region)
       – An answer is a projection of one or more matches
       – What is the prob. of each answer after the projection?
       – For each region, what is the prob. that it has some pair of road & factory building? This is what we need!

  6. Another Example. Find the following objects in one region: a factory building, a road, an antenna, a heliport, a track.

  7. Finding a Partial Match. Find the following objects in one region: a factory building, a road, an antenna, a heliport, a track. [Figure: a partial match (36%) consisting of heliport (80%), road (90%), and factory bldg. w/ antennas (50%) / apt. building w/ water tanks (30%). No track!] For many applications, that's good enough …

  8. What If … [Figure: a match (7.2%) consisting of heliport (80%), track (20%), road (90%), and factory bldg. w/ antennas (50%) / apt. building w/ water tanks (30%).] The probability may be too low to be of any interest! Should we just filter out the whole match? Does not make sense! What about the previous partial match?

  9. Finding Maximal Matches. The goal is to find the maximal among the partial matches with a sufficient probability. [Figure: a pattern posed over probabilistic data.]

  10. Querying Prob. Data: Earlier Work
     • Projection and incomplete semantics were explored for relational models
       – Projection: very simple queries can be highly intractable (data complexity) [Dalvi & Suciu, VLDB 04]
       – Maximally joining relations: tractable under data complexity, generally intractable under query-and-data complexity [Kimelfeld & Sagiv, PODS 07]
         • Yet tractable for important classes of schemas
     • None of these paradigms was studied in the context of prob. XML (only complete matches w/o projection)
     But they are more relevant to prob. XML since, as the paper shows, they become tractable.

  11. The Content of the Paper. Query evaluation over probabilistic XML: efficient algorithms and complexity analysis for various paradigms of querying.
     • Evaluating twig queries with projection
     • Evaluating Boolean twig queries
     • Finding maximal matches of twigs
     In the paper, we also have some preliminary results on the combination of maximal matches and projection. In the paper, we explain in detail why our results do not follow from previous results on XML/relational models.

  12. Talk Overview
     • Introduction
     • Twig Queries over Probabilistic XML
       − XML and Twig Queries
       − Probabilistic XML
       − Querying Probabilistic XML
     • Query Evaluation (Complete Semantics)
     • Finding Maximal Matches
     • Conclusion, Related and Future Work

  13. (Ordinary) XML Documents. A rooted tree. Each node has a tag, a value, or both.

  14. Twig Queries. A rooted tree with:
     • Node predicates over the tag and value
     • Child edges
     • Descendant edges
     • Output nodes (projection), possibly more than one
     [Figure: a twig with root *, nodes region, heliport, factory, and park. lot, and the predicate @area ≥ 10 km².]

  15. Matches and Answers. A match of a twig T in a document d is a mapping from the nodes of T to those of d such that:
     • root(T) → root(d)
     • node predicates are satisfied
     • child edge → edge
     • desc. edge → path
     An answer is obtained from a match by listing the images of the output nodes, that is, by applying projection to the match. [Figure: the twig T (root *, region, heliport, factory, park. lot, @area ≥ 10 km²) matched in a document d.]

  16. Boolean Queries. A twig without output nodes is a Boolean twig. The answer is either true or false: B(d) = true means that there is a match of B in d. [Figure: a Boolean twig B (root *, region, heliport, factory, park. lot, @area ≥ 10 km²) and a document d.]
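The definitions of a match and of a Boolean twig can be sketched in code. This is a minimal illustration, not the evaluation algorithm of the paper: the class names `Node` and `TwigNode`, the helper `tag`, and the sample document are hypothetical, and node predicates are restricted to tags for brevity (a value predicate such as the one on @area could be passed as any other callable).

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple


@dataclass
class Node:
    """A node of an ordinary XML document (tag only, for brevity)."""
    tag: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class TwigNode:
    """A twig node: a predicate plus outgoing (edge kind, child) pairs."""
    predicate: Callable[[Node], bool]
    # Edge kind is "child" or "desc" (descendant).
    edges: List[Tuple[str, "TwigNode"]] = field(default_factory=list)


def tag(name: str) -> Callable[[Node], bool]:
    """Predicate that accepts nodes with the given tag; '*' accepts any node."""
    return lambda n: name == "*" or n.tag == name


def descendants(n: Node) -> Iterator[Node]:
    """Yield all proper descendants of n."""
    for c in n.children:
        yield c
        yield from descendants(c)


def matches_at(t: TwigNode, n: Node) -> bool:
    """True if the sub-twig rooted at t has a match that maps t to n."""
    if not t.predicate(n):
        return False
    for kind, child in t.edges:
        candidates = n.children if kind == "child" else descendants(n)
        if not any(matches_at(child, c) for c in candidates):
            return False
    return True


def boolean_twig(t: TwigNode, doc_root: Node) -> bool:
    """B(d): the twig root must be mapped to the document root."""
    return matches_at(t, doc_root)


# Hypothetical document: aerial-photo / region / {heliport, factory / park.lot}
doc = Node("aerial-photo", [
    Node("region", [Node("heliport"), Node("factory", [Node("park.lot")])]),
])

# Twig: * //region [ /heliport ] [ /factory //park.lot ]
twig = TwigNode(tag("*"), [("desc", TwigNode(tag("region"), [
    ("child", TwigNode(tag("heliport"))),
    ("child", TwigNode(tag("factory"), [("desc", TwigNode(tag("park.lot")))])),
]))])

if __name__ == "__main__":
    print(boolean_twig(twig, doc))
```

Because a match is a mapping that need not be injective, each branch of the twig can be checked independently, which is what keeps this recursive test simple.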

  17. Talk Overview
     • Introduction
     • Twig Queries over Probabilistic XML
       − XML and Twig Queries
       − Probabilistic XML
       − Querying Probabilistic XML
     • Query Evaluation (Complete Semantics)
     • Finding Maximal Matches
     • Conclusion, Related and Future Work

  18. Probabilistic XML. A probabilistic XML document is a probabilistic process of generating ordinary XML documents. A random instance is an ordinary XML document d, generated with probability Pr(d), where ∑_d Pr(d) = 1.

  19. Implicit Representations. In practice, the probability space may be huge (e.g., uncertainty in many small pieces of data). It is unrealistic to represent the probabilistic document by explicitly specifying the entire space. We usually explore implicit representations, such as the following one that we consider:

  20. A ProTDB Document [Nierman & Jagadish 02]
     • Rooted tree
     • 2 types of nodes: ordinary and distributional
     • 2 types of distributions: independent and mutually exclusive
     [Figure: a ProTDB tree rooted at aerial-photo, with nodes including region, neighborhood, factory, house, vehicle, building, size, type, park.lot, heliport, track, and private, and edge probabilities including 0.75, 0.4, 0.8, 0.3, and 0.5.]
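How such a tree acts as a generator of ordinary documents can be sketched as a sampler. This is a minimal sketch under stated assumptions: the class names `Ordinary` and `Dist`, the kind labels `"ind"` and `"mux"`, and the tuple output format are all hypothetical, and every child of a distributional node is assumed to be an ordinary node. An independent node keeps each child with its own probability; a mutually exclusive node keeps at most one child.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Ordinary:
    """An ordinary node; its children may be ordinary or distributional."""
    tag: str
    children: List[Union["Ordinary", "Dist"]] = field(default_factory=list)


@dataclass
class Dist:
    """A distributional node: 'ind' (independent) or 'mux' (mutually exclusive)."""
    kind: str
    choices: List[Tuple[float, Ordinary]] = field(default_factory=list)


def sample(node: Ordinary, rng: random.Random):
    """Generate one random ordinary document as nested (tag, [children]) tuples."""
    kids = []
    for child in node.children:
        if isinstance(child, Ordinary):
            kids.append(sample(child, rng))
        elif child.kind == "ind":
            # Each child is chosen independently with its own probability.
            for p, sub in child.choices:
                if rng.random() < p:
                    kids.append(sample(sub, rng))
        else:
            # Mutually exclusive: at most one child is chosen; the leftover
            # probability mass (if any) means no child at all.
            r = rng.random()
            acc = 0.0
            for p, sub in child.choices:
                acc += p
                if r < acc:
                    kids.append(sample(sub, rng))
                    break
    return (node.tag, kids)


if __name__ == "__main__":
    # Illustrative tree loosely modeled on the slide (probabilities are assumed).
    photo = Ordinary("aerial-photo", [
        Dist("ind", [(0.75, Ordinary("region", [
            Dist("mux", [(0.4, Ordinary("factory")), (0.5, Ordinary("house"))]),
        ]))]),
    ])
    print(sample(photo, random.Random(7)))
```

Distributional nodes never appear in the generated document; their chosen children are attached directly to the nearest ordinary ancestor.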
