RDF Topics Finish up XML. What is RDF? Why is it interesting? - - PowerPoint PPT Presentation


SLIDE 1

4/22/10

RDF

  • Topics
    • Finish up XML.
    • What is RDF?
    • Why is it interesting?
    • SPARQL: the query language for RDF.
  • Learning Objectives:
    • Identify data management problems for which RDF is a desirable representation.
    • Design RDF stores for a given data management problem.
    • Write queries in SPARQL.

SLIDE 2

XQuery Evaluation Model

  • The FOR clause acts more or less like a FROM clause in SQL.
  • You can have multiple variables and paths:

for $v1 in path1, $v2 in path2

  • The query iterates over all combinations of values from the component XPath expressions.
  • Variables $v1 and $v2 are assigned one value at a time.
  • The WHERE and RETURN clauses are applied to each combination.
  • The LET clause is applied to each combination of values produced by the FOR clause.
    • It assigns to a variable the complete set of values produced by its path expression.
  • The RETURN clause is similar to a SELECT clause.
    • It identifies the pieces that you actually want to return.
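
The evaluation model above can be sketched in Python. This is only an analogy, not how an XQuery engine is implemented; the lists path1 and path2, their values, and the WHERE condition are made up for illustration.

```python
from itertools import product

# Hypothetical values produced by two XPath expressions.
path1 = [1, 2, 3]
path2 = [10, 20]

results = []
# FOR: iterate over every combination, binding one value at a time.
for v1, v2 in product(path1, path2):
    # LET: evaluated once per combination, bound to a whole sequence.
    others = [x for x in path1 if x != v1]
    # WHERE: keep or discard this combination.
    if v1 * 10 == v2:
        # RETURN: pick the pieces to output.
        results.append((v1, v2, others))

print(results)
```

Note that `others` holds a complete list for each combination, while `v1` and `v2` each hold a single value.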

SLIDE 3

Berkeley DB XML

  • What is it?
    • A native XML database built atop Berkeley DB.
    • Think of BDB as the storage layer and XML/XQuery as the schema and query layers.
  • Concepts
    • Document: a well-formed XML document, treated like a row/tuple in an RDBMS.
    • Container: a collection of related documents (kind of like a table).
    • A container can combine different document types (unlike a relational table).

SLIDE 4

Containers

  • Include documents and meta-data:
    • Indices
    • Dictionaries mapping element and attribute names to ID numbers
    • Statistics to aid in optimization
    • Document meta-data
  • Two flavors of containers:
    • Whole-document containers: each key/data pair contains an entire document.
    • Node containers: key/data pairs correspond to elements.

SLIDE 5

Indices

  • Users can specify indices on containers.
  • Three types of indices:
    • Presence: keeps track of the locations of elements or attributes with a specified name.
      • Useful for queries that access elements or attributes of a particular name, regardless of value.
    • Equality: indexes the values of all elements or attributes with a specified name.
      • Useful for queries on value equality or range.
    • Substring: indexes substrings of the values of all elements or attributes with a specified name.
      • Useful for queries using the XQuery contains() function (comparable to LIKE in SQL).
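
To make the three index types concrete, here is a toy sketch using Python dictionaries. It is not how Berkeley DB XML stores its indices: the sample documents, the choice of 3-character substrings, and the use of hash maps are all assumptions for illustration. In particular, a hash map only answers equality lookups; range queries would need a sorted structure.

```python
from collections import defaultdict

# Hypothetical documents: doc id -> list of (element name, text value).
docs = {
    1: [("title", "Alien"), ("year", "1979")],
    2: [("title", "Aliens"), ("year", "1986")],
    3: [("title", "Brazil")],
}

presence = defaultdict(set)   # name -> docs containing that element
equality = defaultdict(set)   # (name, value) -> docs with that exact value
substring = defaultdict(set)  # (name, 3-char substring) -> docs

for doc_id, elements in docs.items():
    for name, value in elements:
        presence[name].add(doc_id)
        equality[(name, value)].add(doc_id)
        for i in range(len(value) - 2):
            substring[(name, value[i:i + 3])].add(doc_id)

has_year = sorted(presence["year"])                  # docs that have a year element
title_alien = sorted(equality[("title", "Alien")])   # title equals "Alien"
title_lie = sorted(substring[("title", "lie")])      # title contains "lie"
print(has_year, title_alien, title_lie)
```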

SLIDE 6

Using DBXML to Query

  • DBXML is case sensitive and wants keywords in lower case (select, not SELECT).
  • Getting Started:
    • Copy imdb.dbxml into your current directory.
    • Start dbxml:

% dbxml

    • Open the container:

dbxml> openContainer imdb.dbxml

    • Specify the collection to use with:
      • collection("imdb.dbxml")/
    • You must use the query command to issue an XQuery:

dbxml> query 'for $m in collection("imdb.dbxml")/movie return $m/title'

    • Use print to display results:

dbxml> print

SLIDE 7

What is RDF?

  • Resource Description Framework
  • Grew out of the Semantic Web; a way of attaching meta-data to the web.
  • The fundamental idea is to attach labels to the edges between nodes.
  • Structure is represented as a collection of triples: node, edge, node.
  • Nodes are represented by URIs (uniform resource identifiers).
  • Can be used to represent both structured and semi-structured data.
  • Perhaps more importantly (and interestingly), it is a natural representation for graphs.

SLIDE 8

RDF Concepts

  • Things are called resources.
  • Resources have properties.
  • Properties have values.
  • You can describe a resource by making statements about it; these statements define its properties.
  • In RDF, we use the terms subject, predicate, and object to describe the pieces of these statements.
  • The statement “Margo has brown hair.” has:
    • Subject = resource: Margo
    • Predicate = property: hair
    • Object = value: brown

SLIDE 9

Naming

  • RDF uses URI references (URIrefs) to identify any of subject, predicate, or object.
  • A URIref is a URI with an optional fragment identifier (signified by the # symbol).
  • Examples:
    • URI: http://www.eecs.harvard.edu/~margo/somepage
    • URIref: http://www.eecs.harvard.edu/~margo/somepage#foo
  • Although RDF was invented for the web and is specified in terms of URIs, you can use it to store any data and refer to subjects, predicates, and objects by any ID of your choosing.
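
As a small illustration of the URI versus URIref distinction, Python's standard library can split a URIref into its URI and fragment identifier. This sketch uses the example address from the slide; nothing here is specific to RDF tooling.

```python
from urllib.parse import urldefrag

uriref = "http://www.eecs.harvard.edu/~margo/somepage#foo"

# urldefrag separates the part before '#' from the fragment after it.
uri, fragment = urldefrag(uriref)
print(uri)       # the URI without the fragment
print(fragment)  # the fragment identifier

# A URI with no '#' simply has an empty fragment.
plain_uri, no_fragment = urldefrag("http://www.eecs.harvard.edu/~margo/somepage")
```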

SLIDE 10

The Model

  • Triples
    • That’s right: the entire data model is described by triples of subject, predicate, object, and by that mechanism, you can build up arbitrarily complex objects and relationships.
  • Think of RDF as describing a graph where the subject and object are nodes and the predicates are edges.
  • So our “Margo has brown hair” statement would be represented as (margo, hair, brown) and look like:

Margo --hair--> “brown”
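
One way to see triples as a graph is to load them into an adjacency structure. The sketch below is a plain-Python illustration, not an RDF library; the second triple is a made-up addition so that the node has more than one outgoing edge.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
triples = [
    ("margo", "hair", "brown"),
    ("margo", "name", "Margo"),  # hypothetical extra statement
]

# Subjects and objects become nodes; predicates label the edges.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

for predicate, obj in graph["margo"]:
    print(f"margo --{predicate}--> {obj}")
```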

SLIDE 11

Drawing Pictures of Data

  • Although there are XML representations of RDF, it is by far most natural to think of RDF visually as a graph.
  • When we think about the graph structure, we make URIs circular nodes and literal values rectangles.
  • So our previous example might look like:

(MyURI) --name--> [“Margo”]
(MyURI) --hair--> [“brown”]

SLIDE 12

Let’s Draw our (tired) Products

[Figure: graph drawing of the product data. Labels in the drawing: Product, itemname, Vendor, name, sku-value, #, $$$, item, inventory, vendor, sku, instock, val, curr, currency.]

SLIDE 13

Making this more RDF-like

[Figure: the product data redrawn in a more RDF-like form. Labels in the drawing: SKU, itemname, #, $$$, name, inventory, vendor, instock, val, curr, currency, price, address, products.]

SLIDE 14

Queries over RDF: SPARQL

  • Yet another query language.
  • Like many others, it looks a lot like SQL.
  • Basic structure:

SELECT <what to return>
FROM <what collection>
WHERE <what patterns to match>

  • Variables
    • A name is preceded by a ? (or a $ in Jena).
  • Predicates are expressed as pattern-matching on triples.
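
Pattern-matching on triples can be sketched in a few lines of Python. This is a toy model, not how a SPARQL engine works internally: the function name match, the sample triples, and the convention that any term starting with ? is a variable are assumptions for illustration.

```python
def match(pattern, triple):
    """Return variable bindings if the pattern matches the triple, else None."""
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable matches anything, but must stay consistent.
            if bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant must match exactly.
            return None
    return bindings


triples = [
    ("margo", "hair", "brown"),
    ("margo", "name", "Margo"),
    ("sam", "hair", "black"),  # hypothetical extra data
]

# Roughly: SELECT ?who ?color WHERE { ?who <hair> ?color }
rows = []
for triple in triples:
    bindings = match(("?who", "hair", "?color"), triple)
    if bindings is not None:
        rows.append(bindings)

print(rows)
```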

SLIDE 15

WHERE Clauses

  • WHERE { triple . triple . … };
  • A triple is a combination of variables, URIs, and constants.
  • The “.” means AND.
  • Example: find all products sold by the vendor whose ID is vend001:

SELECT ?name WHERE {
  <vend001> <products> ?p .
  ?p <name> ?name
};
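
The AND semantics of “.” can be modeled as a join: each pattern extends the bindings produced by the patterns before it. The following Python sketch is a toy model under stated assumptions; the functions match and solve and the sample vendor and product IDs other than vend001 are invented for illustration.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def solve(patterns, triples):
    """Match the patterns in order; every pattern must match (AND)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


triples = [
    ("vend001", "products", "sku1"),
    ("vend001", "products", "sku2"),
    ("vend002", "products", "sku3"),
    ("sku1", "name", "widget"),
    ("sku2", "name", "gadget"),
    ("sku3", "name", "gizmo"),
]

# The two patterns share ?p, so only vend001's products contribute names.
query = [("vend001", "products", "?p"), ("?p", "name", "?name")]
names = [s["?name"] for s in solve(query, triples)]
print(names)
```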

SLIDE 16

UNION

  • If the “.” operator is AND, how do you specify OR?
  • Find all products priced in either US or Canadian dollars:

SELECT ?name WHERE {
  ?sku <inventory> ?inv .
  ?inv <price> ?price .
  { { ?price <curr> "USD" } UNION { ?price <curr> "CAD" } } .
  ?sku <name> ?name
};
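
UNION can be modeled as evaluating each alternative separately and concatenating the resulting solutions. The Python sketch below is a toy model, not a SPARQL implementation; match, solve, and the generated sample data (SKU names, node IDs, the EUR product) are assumptions for illustration.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def solve(patterns, triples, solutions=None):
    """Match patterns in order (AND), starting from the given solutions."""
    solutions = [{}] if solutions is None else solutions
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical data shaped like the query: sku -> inventory -> price -> curr.
triples = []
for sku, name, curr in [("sku1", "widget", "USD"),
                        ("sku2", "gadget", "CAD"),
                        ("sku3", "gizmo", "EUR")]:
    triples += [
        (sku, "name", name),
        (sku, "inventory", sku + "-inv"),
        (sku + "-inv", "price", sku + "-price"),
        (sku + "-price", "curr", curr),
    ]

base = solve([("?sku", "inventory", "?inv"), ("?inv", "price", "?price")], triples)
# UNION: evaluate each alternative and concatenate the solutions.
either = (solve([("?price", "curr", "USD")], triples, base)
          + solve([("?price", "curr", "CAD")], triples, base))
names = [s["?name"] for s in solve([("?sku", "name", "?name")], triples, either)]
print(names)
```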

SLIDE 17

Optional Matches

  • Let’s say that we want all items and their vendors, but if no vendor is specified, we want to see the item anyway.
  • The obvious query doesn’t quite work, because it only returns items that have a vendor:

select ?name ?vendor where {
  ?product <vendor> ?vendor .
  ?product <name> ?name
};

  • Use OPTIONAL to return items without a vendor:

select ?name ?vendor where {
  ?product <name> ?name .
  OPTIONAL { ?product <vendor> ?vendor }
};
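
OPTIONAL behaves like a left outer join: a solution is extended when the optional pattern matches and kept unchanged when it does not. The Python sketch below is a toy model under stated assumptions; match, solve, optional, and the sample products are invented for illustration.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def solve(patterns, triples, solutions=None):
    """Match patterns in order (AND), starting from the given solutions."""
    solutions = [{}] if solutions is None else solutions
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


def optional(patterns, triples, solutions):
    """Left join: extend each solution if possible, otherwise keep it as is."""
    result = []
    for bindings in solutions:
        extended = solve(patterns, triples, [bindings])
        result.extend(extended if extended else [bindings])
    return result


triples = [
    ("p1", "name", "widget"),
    ("p1", "vendor", "vend001"),
    ("p2", "name", "gadget"),  # no vendor recorded
]

# The "obvious" query requires both patterns, so p2 disappears.
strict = solve([("?product", "vendor", "?vendor"), ("?product", "name", "?name")], triples)
strict_names = [s["?name"] for s in strict]

# With OPTIONAL, p2 is kept and ?vendor is simply unbound.
required = solve([("?product", "name", "?name")], triples)
joined = optional([("?product", "vendor", "?vendor")], triples, required)
rows = [(s["?name"], s.get("?vendor")) for s in joined]
print(strict_names, rows)
```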

SLIDE 18

Filters

  • Lets you impose constraints on values that match a pattern.
  • Syntax: part of the WHERE clause (add it as you would add a pattern):
    • FILTER expression
  • Example: find all products priced over $1.

select ?name ?val where {
  ?product <name> ?name .
  ?product <inventory> ?inv .
  ?inv <price> ?price .
  ?price <val> ?val .
  FILTER (?val > "1.0")
};
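
A filter can be thought of as a predicate applied to each solution after the patterns have matched. The Python sketch below is a toy illustration; the sample solutions are made up, and it assumes a numeric comparison is intended. Whether the quoted "1.0" in the query compares numerically or as a string depends on how the values are typed in the store.

```python
# Hypothetical solutions produced by the triple patterns; values stored as strings.
solutions = [
    {"?name": "widget", "?val": "0.50"},
    {"?name": "gadget", "?val": "2.25"},
    {"?name": "gizmo", "?val": "10.00"},
]


def over_one_dollar(solution):
    # Assumption: convert the literal to a number before comparing.
    return float(solution["?val"]) > 1.0


# FILTER keeps only the solutions for which the expression is true.
kept = [s["?name"] for s in solutions if over_one_dollar(s)]
print(kept)
```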

SLIDE 19

Functions on which you can filter

  • There are many functions on which you can filter:
    • Regular binary and unary operators:
      • ! (not), || (or), && (and)
      • Comparators (numeric and string): <, >, =, !=, <=, >=
    • REGEX: regular expressions
    • BOUND: tells you if a variable is bound to a value.
  • Regular Expressions
    • regex(string, pattern)
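
The regex and bound tests can be imitated in Python to show how they act on solutions. This is a rough analogy only: the helper names and sample solutions are invented, and Python's regular expression syntax is not identical to the one SPARQL uses.

```python
import re

# Hypothetical solutions; "?vendor" is unbound in the second one.
solutions = [
    {"?name": "widget", "?vendor": "vend001"},
    {"?name": "gadget"},
    {"?name": "gizmo", "?vendor": "vend002"},
]


def regex(string, pattern):
    # True if the pattern matches anywhere in the string.
    return re.search(pattern, string) is not None


def bound(solution, variable):
    # True if the variable has a value in this solution.
    return variable in solution


ends_in_get = [s["?name"] for s in solutions if regex(s["?name"], "get$")]
no_vendor = [s["?name"] for s in solutions if not bound(s, "?vendor")]
print(ends_in_get, no_vendor)
```

Combining `!` with a bound test, as in the second list, is a common way to find solutions where an OPTIONAL pattern did not match.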