Xcerpt and visXcerpt: Integrating Web Querying
http://www.pms.ifi.lmu.de/
Institute for Informatics, University of Munich
Sacha Berger, François Bry, Tim Furche, Benedikt Linse, Andreas Schroeder
Data: Semi-structured Trees & Graphs
Overview Data Patterns Rules
Graph data model for Xcerpt and visXcerpt
— as in RDF and semi-structured DBs like Lore
— great attention to XML specificities such as attributes and namespaces
Consistent Extension of XML
— children order may be irrelevant
— possible transparent resolution of non-hierarchical relations
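The data model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Xcerpt's own syntax or implementation: the names `Term`, `Ref`, `index_ids`, and `resolve`, and the identifiers `j1` and `a1`, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Ref:
    """Non-hierarchical relation: a pointer to another term by identifier."""
    target: str


@dataclass
class Term:
    label: str                        # element name, may carry a namespace prefix
    children: List[Union["Term", Ref, str]] = field(default_factory=list)
    ordered: bool = True              # False: child order is irrelevant
    attributes: Dict[str, str] = field(default_factory=dict)
    ident: Optional[str] = None       # identifier that a Ref can point at


def index_ids(term, table=None):
    """Collect every identified term so that references can be resolved."""
    table = {} if table is None else table
    if term.ident is not None:
        table[term.ident] = term
    for child in term.children:
        if isinstance(child, Term):
            index_ids(child, table)
    return table


def resolve(child, table):
    """Follow a reference transparently; other children are returned as-is."""
    return table[child.target] if isinstance(child, Ref) else child


# Hypothetical bibliography fragment: the article points at its journal
# through a reference instead of nesting it.
journal = Term("journal", ["Applied Data Management"], ident="j1")
article = Term(
    "article",
    [Term("authors", [Term("author", ["Marcus Tullius Cicero"])]), Ref("j1")],
    ordered=False,
    ident="a1",
)
bib = Term("bib", [journal, article], ordered=False)

table = index_ids(bib)
published_in = resolve(article.children[1], table)
```

Storing the reference as a child keeps the tree shape of XML while still allowing the graph-shaped links that RDF-style data needs.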
Bibliography Entries
— rather regular schema with optionals
— several ordered lists, otherwise keyed attributes

visXcerpt rendering (figure callouts):
Identifier and label of elements
Context-Menu: Interactive Features
Folding elements for information focus
Element nesting (child relation) becomes box nesting and colors
Non-hierarchical relations as hyperlinks
Ordered vs. unordered children list
[Figure: example data graph combining a concept hierarchy and bibliography entries.
Concept nodes: acm98:CCS, acm98:D, acm98:D_1, acm98:D_1_6, acm98:D_1_7, acm98:D_4, acm98:D_4_2, acm98:D_4_2_e, acm98:D_4_2_e_i, acm98:D_4_2_e_ii, acm98:E, acm98:E_1, acm98:E_1_c, acm98:E_1_d, acm98:H, acm98:H_2, acm98:H_2_1, acm98:H_2_1_a, acm98:H_2_2, acm98:H_3, acm98:H_3_2, acm98:H_3_4, acm98:H_3_4_d.
Concept labels: ‘Computing Classification System’, ‘Software’, ‘Programming Techniques’, ‘Logic Programming’, ‘Visual Programming’, ‘Operating Systems’, ‘Storage Management’, ‘Secondary Storage’, ‘Papyri’, ‘Wax Tablets’, ‘Data’, ‘Data Structures’, ‘Graphs and Networks’, ‘Trees’, ‘Information Systems’, ‘Database Management’, ‘Logical Design’, ‘Data Models’, ‘Physical Design’, ‘Information Storage and Retrieval’, ‘Information Storage’, ‘Systems and Software’, ‘Performance evaluation (efficiency and effectiveness)’.
Bibliography nodes: mybib:journal_adm, mybib:conf_dmc, mybib:article_66_scaurus_qumran, mybib:article_66_wax_cicero, mybib:inproc_44_brutus.
Bibliography labels: ‘Advancements in Data Management for Military and Civil Application’, ‘Applied Data Management’, ‘From Wax Tablets to Papyri: The Qumran Case Study’, ‘Space- and Time-Optimal Data Storage on Wax Tablets’, ‘Efficient Management of Rapidly Changing Personal Records’.
Edge labels: hasTopConcept and narrower between concepts; primarySubject, subject, and related from bibliography nodes.]
Query-by-Example paradigm
— queries just like data plus variables, incompleteness, optionality, negation
— patterns plus variables instead of navigation
Logical Variables in Patterns
— select relevant data (n-ary queries)
— group and aggregate data
— join different data items
Basic Pattern
“return the titles of all top-level sections in articles by Marcus Tullius Cicero and published in ‘Applied Data Management’.”

Accessing Web resources: arbitrary XML documents can be accessed using their URL
Incomplete patterns in depth: descendant allows additional intermediary elements
Grouping collects alternative bindings for variables: essential for structural assembly
Incomplete patterns in breadth: partial patterns allow additional child elements
Variables are used in lieu of data: express selection, joins, or arithmetic conditions
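The matching behaviour described by these callouts can be approximated by a small backtracking matcher. The sketch below is an assumption-laden illustration in Python, not Xcerpt syntax or its evaluation algorithm: `Var`, `Desc`, `Elem`, and `match` are hypothetical names, children are treated as unordered, the document structure and section titles are invented, and fetching the document from a URL is left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Var:
    name: str                 # logical variable, used in lieu of data


@dataclass(frozen=True)
class Desc:
    inner: object             # incomplete in depth: match at any nesting level


@dataclass(frozen=True)
class Elem:
    label: str
    children: Tuple = ()
    partial: bool = True      # incomplete in breadth: extra children allowed


def match(p, d, env):
    """Yield every variable binding under which pattern p matches data d.

    Data is either a string or a (label, [children]) pair.
    """
    if isinstance(p, Var):
        if p.name not in env:
            yield {**env, p.name: d}
        elif env[p.name] == d:          # repeated variable: value join
            yield env
    elif isinstance(p, Desc):
        yield from match(p.inner, d, env)
        if not isinstance(d, str):
            for child in d[1]:
                yield from match(p, child, env)
    elif isinstance(p, str):
        if p == d:
            yield env
    elif isinstance(p, Elem):
        if isinstance(d, str) or d[0] != p.label:
            return
        if not p.partial and len(p.children) != len(d[1]):
            return
        yield from _match_children(p.children, d[1], frozenset(), env)


def _match_children(pats, kids, used, env):
    """Map each pattern child to a distinct data child, ignoring order."""
    if not pats:
        yield env
        return
    for i, kid in enumerate(kids):
        if i not in used:
            for env2 in match(pats[0], kid, env):
                yield from _match_children(pats[1:], kids, used | {i}, env2)


# Invented data: only the first article is by Cicero.
bib = ("bib", [
    ("article", [
        ("authors", [("author", ["Marcus Tullius Cicero"])]),
        ("journal", ["Applied Data Management"]),
        ("section", [("title", ["Introduction"]),
                     ("section", [("title", ["Background"])])]),
        ("section", [("title", ["Conclusion"])]),
    ]),
    ("article", [
        ("authors", [("author", ["Marcus Aemilius Scaurus"])]),
        ("journal", ["Applied Data Management"]),
        ("section", [("title", ["Excavation"])]),
    ]),
])

query = Elem("article", (
    Desc(Elem("author", ("Marcus Tullius Cicero",))),   # author at any depth
    Elem("journal", ("Applied Data Management",)),
    Elem("section", (Elem("title", (Var("T"),)),)),     # top-level sections only
))

# Grouping: collect all alternative bindings of T.
titles = [env["T"] for env in match(Desc(query), bib, {})]
```

Because the section pattern is not wrapped in `Desc`, the nested section is skipped, which is how the sketch restricts the result to top-level sections.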
Complex Pattern
“return titles and optionally paragraphs of all top-level sections without figures in articles on the topic ‘Wax Tablets’.”

Terms as formulas: Terms may contain boolean connectives, variables, negation, etc.
Subterm negation: Some subterms may be required not to occur in matching data
Optional subterms: Local form of disjunction essential for variable schema data
Value Joins: Expressed through multiple variable occurrences
Optional construction: Limited form of conditional construction based on variable bindings
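To make these four features concrete, the query above can be spelled out procedurally. This is a hand-written Python approximation under assumptions, not how Xcerpt evaluates the pattern: the element names (`subject`, `section`, `figure`, `para`), the concept identifiers `ex:c1` and `ex:c2`, and all helper names are hypothetical.

```python
def kids(term, label):
    """Child elements of a (label, [children]) term that carry the given label."""
    return [c for c in term[1] if not isinstance(c, str) and c[0] == label]


def text(term):
    """Concatenated text children of a term."""
    return "".join(c for c in term[1] if isinstance(c, str))


def query(articles, concepts, topic):
    """Bind a title, and paragraphs when present, for each qualifying section."""
    bindings = []
    for art in articles:
        # Value join: the subject identifier in the article must equal the
        # identifier of a concept whose label is the requested topic.
        subject_ids = {text(s) for s in kids(art, "subject")}
        if not any(concepts.get(i) == topic for i in subject_ids):
            continue
        for sec in kids(art, "section"):
            if kids(sec, "figure"):        # subterm negation: no figure allowed
                continue
            titles = kids(sec, "title")
            if not titles:
                continue
            binding = {"title": text(titles[0])}
            paras = [text(p) for p in kids(sec, "para")]
            if paras:                      # optional subterm: bind only if present
                binding["paras"] = paras
            bindings.append(binding)
    return bindings


def construct(binding):
    """Optional construction: emit paragraphs only when the variable is bound."""
    children = [("title", [binding["title"]])]
    for p in binding.get("paras", []):
        children.append(("para", [p]))
    return ("section", children)


# Invented data for illustration.
concepts = {"ex:c1": "Wax Tablets", "ex:c2": "Papyri"}
articles = [
    ("article", [
        ("subject", ["ex:c1"]),
        ("section", [("title", ["Materials"]), ("para", ["Wax is soft."])]),
        ("section", [("title", ["Outlook"])]),
        ("section", [("title", ["Layout"]), ("figure", []), ("para", ["See figure."])]),
    ]),
    ("article", [
        ("subject", ["ex:c2"]),
        ("section", [("title", ["Scrolls"])]),
    ]),
]

results = [construct(b) for b in query(articles, concepts, "Wax Tablets")]
```

The section with a figure is dropped, and the section without paragraphs still produces a result, which is the behaviour that optional subterms are meant to provide for data with a variable schema.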
Separation of Query and Construction
— two separate parts in rules
— no mixing of construction and querying
— instead chaining where necessary
Separation of Concerns by Views
— separate tasks of a query in rules
— efficient evaluation of chained queries
— memoization and unfolding
Rules and Chaining
“close the skos:related relation on the provided data by adding skos:subject and traversing the closure of skos:narrower”
Terms as formulas: Terms may contain boolean connectives, including disjunctions
Rules separate construction from querying and allow for procedural abstraction in query programs
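Rule chaining of this kind can be illustrated with two small functions, one feeding the other. The Python sketch below encodes one plausible reading of the query and is an assumption throughout: a resource counts as related to a concept if the data states `related` or `subject` for it, or if the concept is reachable from such a concept over `narrower` edges. The function names and the sample triples are hypothetical, and the edge directions are not confirmed by the slides.

```python
def narrower_closure(triples):
    """Rule 1: transitive closure of the narrower relation, computed to a fixpoint."""
    closure = {(s, o) for s, p, o in triples if p == "narrower"}
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def close_related(triples):
    """Rule 2: queries the result of rule 1 (chaining) to extend related."""
    closure = narrower_closure(triples)
    # Disjunction in the query part: either related or subject qualifies.
    stated = {(s, o) for s, p, o in triples if p in ("related", "subject")}
    derived = set(stated)
    for resource, concept in stated:
        for broader, narrow in closure:
            if broader == concept:
                derived.add((resource, narrow))
    return derived


# Illustrative triples; the subject assignment is made up for this sketch.
triples = [
    ("acm98:E", "narrower", "acm98:E_1"),
    ("acm98:E_1", "narrower", "acm98:E_1_c"),
    ("mybib:article_66_wax_cicero", "subject", "acm98:E"),
]

related = close_related(triples)
```

Keeping the closure in its own function mirrors the separation of concerns by views: the second rule only consumes the view and never repeats the traversal logic.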