Big (Linked) Semantic Data Compression: Motivation & Challenges



slide-1
SLIDE 1

Big (Linked) Semantic Data Compression

Motivation & Challenges

Antonio Fariña, Javier D. Fernández and Miguel A. Martínez-Prieto

23RD AUGUST 2017

3rd KEYSTONE Training School Keyword search in Big Linked Data

Image: ROMAN AQUEDUCT (SEGOVIA, SPAIN)

slide-2
SLIDE 2
  • Linked Data & Semantic Technologies
  • Foundations
  • RDF
  • SPARQL
  • (Some) Open Issues
  • Linked Data Workflow
  • Big Linked Data Challenges
  • Semantic Data Compression
  • Why is Semantic Data Redundant?
  • Compression Approaches
  • Achievements & Challenges


Agenda

images: zurb.com

slide-3
SLIDE 3
  • Foundations
  • RDF
  • SPARQL

Linked Data & Semantic Technologies

Big (Linked) Semantic Data Compression

slide-4
SLIDE 4

Linked Data is simply about using the Web to create typed links between data from different sources.

Linked Data Foundations

BIG (LINKED) SEMANTIC DATA COMPRESSION PAGE 4

slide-5
SLIDE 5


  • Linked Data refers to a set of best practices for publishing and connecting data on the Web.
  • These best practices have been adopted by an increasing number of data providers, leading to the creation of a global data space:
  • Data are machine-readable.
  • Data meaning is explicitly defined.
  • Data are linked from/to external datasets.
  • The resulting data network connects data from different domains:
  • Publications, movies, multimedia, government data, statistical data, etc.

Linked Data


slide-6
SLIDE 6


Linked Data Principles

  • 1. Use URIs as names for entities.
  • 2. Use HTTP URIs so that people can look up those names.
  • 3. When someone looks up a URI, provide useful information, using standards (e.g. RDF, SPARQL).
  • 4. Include links to other URIs, so that they can discover more things.

slide-7
SLIDE 7


#1 URIs as names

What is his name?

“Homer Simpson” “Homer Simpson”

  • Human conventions are not effective for naming data entities:
  • They are not universal → ambiguity.
  • The use of URIs (Uniform Resource Identifiers) enables any real-world entity to be identified at universal scale:

What is his name?

http://example.org/person/homer-simpson http://example.org/person/homer-simpson-guy

Names must ensure that any data entity has its own identity in the global Linked Data space.
slide-8
SLIDE 8


#2 HTTP URIs

  • Dereferenceable URIs ensure that the corresponding entity description is retrieved when an HTTP URI is accessed (via an HTTP client).

http://example.org/person/homer-simpson

http://example.org/property/name "Homer Simpson"
http://example.org/property/address "742 Evergreen Terrace"
http://example.org/property/location http://example.org/place/springfield
http://example.org/property/father http://example.org/person/abe-simpson
...

Entity names must be searchable (via HTTP).
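A minimal sketch of what dereferencing looks like from the client side. It assumes the publisher supports HTTP content negotiation and can return Turtle (`text/turtle`); neither detail is stated on the slide. The function name `build_dereference_request` is hypothetical, and the request is only built, not sent, because `example.org` is a placeholder host.

```python
from urllib.request import Request


def build_dereference_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) an HTTP GET asking for an RDF description of `uri`.

    The Accept header is an assumption: it asks the server for an RDF
    serialization instead of an HTML page.
    """
    return Request(uri, headers={"Accept": media_type}, method="GET")


request = build_dereference_request("http://example.org/person/homer-simpson")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Against a real Linked Data server, passing `request` to `urllib.request.urlopen` would return the entity description in the requested format.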

slide-9
SLIDE 9


#3 Standards

  • Many and varied stakeholders coexist within the Linked Data ecosystem…
  • Data providers from diverse domains (economy, bioinformatics, multimedia…).
  • Application developers.
  • End-users…
  • … but all of them “must speak the same languages” for effective understanding.
  • Standardized semantic technologies:
  • URIs for naming.
  • Serialization formats (XML, N3, Turtle, HDT…) for data storage.
  • RDF for data modelling and exchange.
  • SPARQL for RDF querying.
slide-10
SLIDE 10
  • Data entities are individually described:
  • A particular HTTP URI is assigned as name.
  • Its features are stated.
  • Linking two URIs establishes a particular type of connection between two existing entities:

  • This principle materializes the aim of data integration in Linked Data.


#4 Links to Other URIs

[Figure: RDF graph. Nodes: person:homer-simpson, person:marge-simpson. Literals: "Homer Simpson", "Marge Simpson", "742 Evergreen Terrace". Edge labels: property:name, property:address.]

@prefix person: <http://example.org/person/> .
@prefix property: <http://example.org/property/> .

slide-11
SLIDE 11


#4 Links to Other URIs

[Figure: extended RDF graph. Nodes: person:homer-simpson, person:marge-simpson, person:abe-simpson, person:bart-simpson, location:Springfield. Literals: "Homer Simpson", "Marge Simpson", "Abe Simpson", "Bart Simpson", "Springfield", "742 Evergreen Terrace", 83, 10. Edge labels: property:name, property:address, property:location, property:father, property:mother, property:age.]

@prefix person: <http://example.org/person/> .
@prefix property: <http://example.org/property/> .

slide-12
SLIDE 12

The Web of Linked Data revisits WWW foundations to build a cloud of data-to-data labelled hyperlinks.

  • The Web of Linked Data converts raw data into a first-class citizen of the Web:
  • Data entities are the atoms of the Web of Linked Data.
  • Each entity has its own identity.
  • Relies on the WWW infrastructure:
  • It uses HTTP as communication protocol.
  • Entities are named using URIs.

Web of Linked Data


The Web of Linked Data

Knowledge from different fields can be easily integrated and universally shared/exploited using WWW infrastructure.

slide-13
SLIDE 13

The Web of Linked Data (2007 – 2011)


http://lod-cloud.net/

slide-14
SLIDE 14

The Web of Linked Data (2014)


http://lod-cloud.net/

slide-15
SLIDE 15

The Web of Linked Data (2017)

  • ~10K datasets organized into 9 domains which include many and varied knowledge fields.
  • 150B statements, including entity descriptions and (inter/intra-dataset) links between them.
  • >500 live endpoints serving this data.

http://lod-cloud.net/
http://stats.lod2.eu/
http://sparqles.ai.wu.ac.at/

slide-16
SLIDE 16

RDF is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ…

RDF

slide-17
SLIDE 17
  • RDF is a standard model for data publication, interchange, and consumption on the Web of Linked Data.
  • RDF allows any class of data to be described using a simple triple structure:
  • Subject: the resource being described.
  • Predicate: a property of that resource.
  • Object: the value for the corresponding property.

RDF Basics

http://example.org/person/homer-simpson http://example.org/property/name "Homer Simpson"
http://example.org/person/homer-simpson http://example.org/property/father http://example.org/person/abe-simpson
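The subject/predicate/object structure can be modelled in a few lines. The sketch below is only illustrative: the `Triple` type, the `is_literal` helper, and the convention of writing literals with surrounding double quotes are assumptions made for this example, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    obj: str        # a URI, or a literal written in double quotes


PERSON = "http://example.org/person/"
PROPERTY = "http://example.org/property/"

triples = [
    Triple(PERSON + "homer-simpson", PROPERTY + "name", '"Homer Simpson"'),
    Triple(PERSON + "homer-simpson", PROPERTY + "father", PERSON + "abe-simpson"),
]


def is_literal(term: str) -> bool:
    """Hypothetical convention for this sketch: literals start with a double quote."""
    return term.startswith('"')


for t in triples:
    kind = "literal" if is_literal(t.obj) else "URI"
    print(f"{t.subject} --{t.predicate}--> {t.obj} ({kind})")
```

The second triple is an RDF link: its object is another entity's URI rather than a literal value.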

slide-18
SLIDE 18
  • An RDF triple can be seen as a labelled directed subgraph in which subject and object nodes are linked by a particular (predicate) edge:
  • The subject node contains the URI which names the resource.
  • The predicate edge labels the relationship using a URI whose semantics are described by some vocabulary/ontology.
  • The object node may contain a URI or a Literal value.
  • RDF links (between entities) also take the form of RDF triples.

RDF Triples

@prefix person: <http://example.org/person/> .
@prefix property: <http://example.org/property/> .

[Figure: RDF graph. Nodes: person:homer-simpson, person:abe-simpson. Literals: "Homer Simpson", "742 Evergreen Terrace". Edge labels: property:name, property:address, property:father.]

slide-19
SLIDE 19

RDF Graphs

@prefix person: <http://example.org/person/> .
@prefix property: <http://example.org/property/> .

[Figure: RDF graph. Nodes: person:homer-simpson, person:abe-simpson, person:marge-simpson, person:bart-simpson, location:springfield. Literals: "Homer Simpson", "Marge Simpson", "Bart Simpson", "Springfield", "742 Evergreen Terrace", 10, 83. Edge labels: property:name, property:address, property:location, property:father, property:mother, property:age.]

slide-20
SLIDE 20

RDF Graphs

  • An RDF graph is only a mental model which must be serialized for effective storage:
  • Choosing a particular serialization format is an important decision for the most relevant tasks in the Web of Linked Data.

slide-21
SLIDE 21

RDF Serialization Formats

[Figure: example serializations in N-Triples, RDF/XML, N3 and JSON-LD.]

http://www.easyrdf.org/converter
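As an illustration of what serialization involves, the sketch below writes triples in N-Triples, the simplest of the listed formats: one triple per line, URIs in angle brackets, literals in double quotes, and a closing dot. The `Literal` class and the helper names are hypothetical, and only plain string literals are handled (no language tags, datatypes or blank nodes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper marking a term as a plain string literal."""
    value: str


def format_term(term) -> str:
    """Render one term: literals are quoted and escaped, anything else is a URI."""
    if isinstance(term, Literal):
        escaped = (
            term.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        return f'"{escaped}"'
    return f"<{term}>"


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "".join(
        f"{format_term(s)} {format_term(p)} {format_term(o)} .\n" for s, p, o in triples
    )


PERSON = "http://example.org/person/"
PROPERTY = "http://example.org/property/"
triples = [
    (PERSON + "homer-simpson", PROPERTY + "name", Literal("Homer Simpson")),
    (PERSON + "homer-simpson", PROPERTY + "father", PERSON + "abe-simpson"),
]
ntriples = to_ntriples(triples)
print(ntriples, end="")
```

The output repeats every URI in full on every line, which is the verbosity that more compact formats try to avoid.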

slide-22
SLIDE 22

SPARQL is a semantic query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.

SPARQL


slide-23
SLIDE 23
  • SPARQL is a semantic query language, able to retrieve and manipulate RDF data.
  • SPARQL query resolution performs graph pattern matching:
  • The objective is to determine all the candidates matching a query pattern in a large data graph.
  • Triple Patterns are the most basic SPARQL queries:
  • Triple patterns are like RDF triples except that each of the subject, predicate and object may be a variable.
  • More complex Graph Patterns comprise joins or unions of multiple triple patterns.

SPARQL Basics

slide-24
SLIDE 24
  • The semantics of “lives in” is captured by the property http://example.org/property/location.
  • The entity describing “Springfield” is named by the URI http://example.org/location/springfield.

SPARQL Querying

Who lives in Springfield?

[Figure: the triple pattern ?who property:location location:springfield drawn as a graph edge.]

PREFIX location: <http://example.org/location/>
PREFIX property: <http://example.org/property/>
SELECT ?who WHERE { ?who property:location location:springfield }

slide-25
SLIDE 25


SPARQL Querying

[Figure: the example RDF graph (nodes person:homer-simpson, person:abe-simpson, person:marge-simpson, person:bart-simpson, location:springfield) with the triple pattern ?who property:location location:springfield overlaid; ?who is bound at each matching edge.]
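Triple pattern resolution can be sketched as a linear scan over the graph. This is a naive illustration, not how a SPARQL engine is implemented. Terms are written as prefixed names for brevity, and the small `graph` below is an assumption, since the exact edges of the figure cannot be recovered from the transcript.

```python
def is_variable(term: str) -> bool:
    """SPARQL variables start with '?'."""
    return term.startswith("?")


def match_pattern(pattern, triples):
    """Yield one {variable: value} binding for each triple matching `pattern`."""
    for triple in triples:
        binding = {}
        for pattern_term, triple_term in zip(pattern, triple):
            if is_variable(pattern_term):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(pattern_term, triple_term) != triple_term:
                    break
            elif pattern_term != triple_term:
                break
        else:
            yield binding


# Illustrative data only (assumed, not taken from the figure).
graph = [
    ("person:homer-simpson", "property:name", '"Homer Simpson"'),
    ("person:homer-simpson", "property:location", "location:springfield"),
    ("person:marge-simpson", "property:location", "location:springfield"),
    ("person:bart-simpson", "property:father", "person:homer-simpson"),
]

pattern = ("?who", "property:location", "location:springfield")
residents = [binding["?who"] for binding in match_pattern(pattern, graph)]
print(residents)
```

A scan like this touches every triple for every pattern, which is why the indexes and self-indexes discussed later in the lecture matter for large graphs.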

slide-26
SLIDE 26
  • Linked Data Workflow
  • Big Linked Data Challenges

(Some) Open Issues

Big (Linked) Semantic Data Compression

slide-27
SLIDE 27

Data moves from data providers to end users within the Linked Data ecosystem. It evolves along many stages to consolidate effective results which satisfy end-user requirements.

Linked Data Workflow


slide-28
SLIDE 28
  • Generation: data is ingested from real-world systems and transformed into an RDF dataset.
  • Publication: the RDF dataset is exposed using one or more Linked Data compliant services.
  • Consumption: data is retrieved by consumers using well-defined interfaces.

Linked Data Workflow

[Figure: workflow diagram. Labels: Real-world Facts, Data Creator, RDF Dataset, Data Provider, HTTP URIs, Dump, Endpoint, LD Fragment, Data Consumers. Stages: GENERATION, PUBLICATION, CONSUMPTION.]

slide-29
SLIDE 29
  • Data is extracted from real-world sources:
  • Real-time sources require streams of raw data to be continuously ingested.
  • Data from other sources are ingested in batch.
  • Data cleansing deals with the transformation of raw data into high-quality data which is finally modeled using RDF:
  • Streaming or batch processing workflows are deployed for data transformation.
  • Data integration enriches the current data with links from/to other relevant entities in the Linked Data Web:
  • Where is the entity published? What is its URI?
  • A new RDF dataset (or an RDF stream) is generated as a result.

RDF Data Generation

slide-30
SLIDE 30

The new RDF dataset must be exposed in the Linked Data Web.

  • a. RDF Dump:
I. Triples are serialized (and possibly compressed) into a valid RDF format.
II. The resulting dump file is hosted on a web server and registered in a central catalog (e.g. datahub.io) for discovery purposes.
  • b. HTTP URIs:
I. Triples are stored and indexed (possibly using a semantic database).
II. An interface is exposed for dereferencing URIs.
  • c. SPARQL endpoint:
I. Triples are stored and indexed using a semantic database.
II. A SPARQL interface is exposed for querying.
  • d. Linked Data fragment:
I. Triples are self-indexed using an in-memory RDF engine (HDT).
II. An LD interface is exposed for (federated) querying.

RDF Data Publication

slide-31
SLIDE 31


RDF Data Consumption

Consumers retrieve RDF data to meet their information requirements.

  • Consumers can download RDF dumps for local use:
  • A (possibly) voluminous file is retrieved and stored locally.
  • Data are usually indexed using a semantic database.
  • The remaining mechanisms are used to retrieve pieces of RDF data matching particular needs:
  • Dereferencing HTTP URIs allows individual entity descriptions to be retrieved from the corresponding publisher.
  • SPARQL endpoints are used to retrieve RDF data which matches complex SPARQL queries in a particular dataset.
  • Finally, Linked Data Fragments are accessed to resolve federated queries across the Web of Linked Data.

slide-32
SLIDE 32

This processing workflow suffers when Big Linked Data must be managed along it.

Linked Data Workflow

[Figure: workflow diagram. Labels: Real-world Facts, Data Creator, RDF Dataset, Data Provider, HTTP URIs, LD Fragment, Endpoint, Dump, Data Consumers. Stage label: CONSUMPTION.]

slide-33
SLIDE 33

Big is not a matter of size... it is a matter of representativity & consumption capacity.

Big Linked Data Challenges

slide-34
SLIDE 34

We refer to an RDF dataset as Big Linked Data (BLD) when it exceeds the capacity of the conventional tools used to implement the processing workflow.

  • BLD management is a challenge because of its volume:
  • Individual datasets are not typically considered BLD [+GB, ++GB].
  • Integrated RDF datasets (mashups) can be considered BLD [++GB, +TB].
  • The whole Web of Linked Data is obviously BLD [++TB].
  • BLD management is a challenge because of its velocity:
  • Data is generated continuously in many scenarios.
  • Data is queried continuously by many users.
  • BLD management is a challenge because of other Vs (variability, veracity, validity, vulnerability…).

Big Linked Data

slide-35
SLIDE 35
  • Generation Challenges:
  • Efficient streaming solutions are required for processing continuous RDF sources (e.g. meteorology or traffic sensors, social networks…).
  • Scalable batch-processing tools are required to store and manage large amounts of RDF data along the cleansing/integration stages.
  • Publication Challenges:
  • Compact serialization formats for saving storage space (RDF dumps).
  • Efficient indexes for performing fast URI lookups and retrieving the corresponding triples (HTTP URIs).
  • Lightweight indexes and efficient in-memory algorithms for fast query resolution (SPARQL endpoints).
  • Linked Data Fragments is a good example of a scalable solution.
  • Consumption Challenges:
  • Compact serialization formats for reducing network latencies.

Big Linked Data Challenges

slide-36
SLIDE 36
  • Why is Semantic Data Redundant?
  • Compression Approaches
  • Achievements & Challenges

Semantic Data Compression

Big (Linked) Semantic Data Compression

slide-37
SLIDE 37
  • A semantic data compressor provides an alternative method for serializing RDF triples:
  • It detects redundant information within the dataset and removes it.
  • In practice, the resulting encoding uses (far) fewer bits than traditional formats require.
  • This lecture deals with the use of semantic data compression to address some of the Big Linked Data challenges:
  • Compressed RDF saves storage space.
  • Compressed RDF requires less bandwidth for exchange.
  • Compressed RDF is loaded faster than traditional formats, and (possibly) requires less memory for parsing triples.
  • Some classes of compressed RDF allow fast lookups to be performed with no prior decoding of triples.

Semantic Data Compression

slide-38
SLIDE 38

Data redundancy means that the same information can be encoded using fewer bits.

Why is Semantic Data Redundant?

slide-39
SLIDE 39
  • Data is redundant when the same information can be encoded using fewer bits:
  • Compression is achieved when redundant data is removed from the original dataset.
  • Why is semantic data redundant?
  • Some facts can be inferred from other ones → semantic redundancy.
  • Sequences of symbols are repeated along the dataset → symbolic redundancy.
  • The underlying RDF graph structure is redundant by itself → syntactic (structural) redundancy.

Semantic Data Redundancy

slide-40
SLIDE 40
  • Semantic redundancy occurs when the same meaning can be expressed using fewer triples:

Semantic Redundancy

http://example.org/property/age http://www.w3.org/2000/01/rdf-schema#domain http://example.org/classes/person
http://example.org/person/bart-simpson http://example.org/property/age 10
http://example.org/person/bart-simpson http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://example.org/classes/person

  • The third triple is redundant: the first triple states that the domain of http://example.org/property/age is http://example.org/classes/person, so the second triple (which provides the age of http://example.org/person/bart-simpson) already implies that this entity is of type http://example.org/classes/person.
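The reasoning on this slide can be checked mechanically. The sketch below applies a single inference rule (rdfs:domain) in a single pass; it is an illustration under those assumptions, not the algorithm of any particular logical compressor, and the function names are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
EX = "http://example.org/"


def infer_types(triples):
    """Return the rdf:type triples implied by rdfs:domain declarations."""
    domains = {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
    # Any subject using property p belongs to every declared domain of p.
    return {(s, RDF_TYPE, cls) for s, p, o in triples for cls in domains.get(p, ())}


def split_redundant(triples):
    """Split triples into those that must be kept and those that can be inferred."""
    implied = infer_types(triples)
    canonical = [t for t in triples if t not in implied]
    redundant = [t for t in triples if t in implied]
    return canonical, redundant


triples = [
    (EX + "property/age", RDFS_DOMAIN, EX + "classes/person"),
    (EX + "person/bart-simpson", EX + "property/age", "10"),
    (EX + "person/bart-simpson", RDF_TYPE, EX + "classes/person"),
]

canonical, redundant = split_redundant(triples)
restored = set(canonical) | infer_types(canonical)
print(len(canonical), "kept,", len(redundant), "inferable")
```

Storing only `canonical` and re-applying the rule recovers the original three triples, so nothing is lost in this example.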

slide-41
SLIDE 41
  • Symbolic redundancy is due to symbol repetitions in triples:
  • This is the “traditional” source of redundancy removed by universal compressors.
  • Symbolic redundancy in RDF is mainly due to URIs:
  • URIs tend to be very long strings which share long prefixes, but also have other common infixes or suffixes.

Symbolic Redundancy

  • Literals also contribute to this redundancy.

http://example.org/class/person
http://example.org/property/address
http://example.org/property/age
http://example.org/property/location
http://example.org/person/abe-simpson
http://example.org/person/bart-simpson
http://example.org/person/homer-simpson
http://example.org/person/marge-simpson
Abe Simpson
Marge Simpson
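One simple way to exploit shared URI prefixes is front coding: sort the strings and store, for each one, only the length of the prefix it shares with its predecessor plus the remaining suffix. Front coding is chosen here as an assumption for illustration; the slide does not name a specific technique. The sketch uses the URIs listed above.

```python
import os.path


def front_code(sorted_strings):
    """Encode each string as (shared prefix length with previous, remaining suffix)."""
    encoded, previous = [], ""
    for s in sorted_strings:
        shared = len(os.path.commonprefix([previous, s]))
        encoded.append((shared, s[shared:]))
        previous = s
    return encoded


def front_decode(encoded):
    """Rebuild the original strings from their (shared length, suffix) pairs."""
    decoded, previous = [], ""
    for shared, suffix in encoded:
        previous = previous[:shared] + suffix
        decoded.append(previous)
    return decoded


uris = sorted([
    "http://example.org/class/person",
    "http://example.org/property/address",
    "http://example.org/property/age",
    "http://example.org/property/location",
    "http://example.org/person/abe-simpson",
    "http://example.org/person/bart-simpson",
    "http://example.org/person/homer-simpson",
    "http://example.org/person/marge-simpson",
])

encoded = front_code(uris)
original_chars = sum(len(u) for u in uris)
suffix_chars = sum(len(suffix) for _, suffix in encoded)
print(original_chars, "characters before,", suffix_chars, "suffix characters after")
```

Only the first URI is stored in full; every later entry drops at least the common `http://example.org/` prefix.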

slide-42
SLIDE 42
  • Syntactic redundancy depends on how the RDF graph structure is serialized:
  • For instance, a serialized subset of n triples (which describes the same resource) writes the subject value n times. It can be abbreviated.
  • ... and also on the underlying graph structure itself:
  • For instance, resources of the same class are described using (almost) the same sub-graph structure.
  • Syntactic compression still has (much) room for optimization:
  • Structural RDF features must be better understood.

Syntactic Redundancy
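The subject repetition mentioned above can be removed by grouping triples into per-subject lists, which is also what Turtle's `;` abbreviation does at the text level. The sketch below is a minimal illustration; the data and the helper names are assumptions, and the term count is only a rough proxy for serialized size.

```python
def group_by_subject(triples):
    """Group triples as {subject: {predicate: [objects]}} so each subject is written once."""
    grouped = {}
    for s, p, o in triples:
        grouped.setdefault(s, {}).setdefault(p, []).append(o)
    return grouped


def count_written_terms(grouped):
    """Count the terms written when subjects (and predicates per subject) appear once."""
    subjects = len(grouped)
    predicates = sum(len(by_pred) for by_pred in grouped.values())
    objects = sum(len(objs) for by_pred in grouped.values() for objs in by_pred.values())
    return subjects + predicates + objects


# Illustrative data (assumed), written with prefixed names for brevity.
triples = [
    ("person:homer-simpson", "property:name", '"Homer Simpson"'),
    ("person:homer-simpson", "property:address", '"742 Evergreen Terrace"'),
    ("person:homer-simpson", "property:father", "person:abe-simpson"),
    ("person:marge-simpson", "property:name", '"Marge Simpson"'),
]

grouped = group_by_subject(triples)
flat_terms = 3 * len(triples)
grouped_terms = count_written_terms(grouped)
print(flat_terms, "terms flat,", grouped_terms, "terms grouped")
```

The saving grows with the number of triples per subject, since each additional triple about the same resource no longer repeats the subject.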

slide-43
SLIDE 43

The current state of the art comprises a rich and varied set of compressors for RDF data. These are mainly lossless compressors (because they preserve the original knowledge in the dataset), yet lossy compressors are also emerging.

Compression Approaches

slide-44
SLIDE 44
  • Semantic compressors propose many and varied techniques for lossless compression:
  • They preserve the original knowledge in the dataset.
  • Lossy compressors are not in the scope of this lecture because losing knowledge is not acceptable for the Linked Data workflow.
  • Three classes of lossless compressors:
  • Physical compressors detect and remove symbolic and/or syntactic redundancy from the original dataset.
  • Logical compressors detect and remove semantic redundancy from the original dataset.
  • Hybrid compressors operate at both the physical and logical levels.

Compression Approaches

slide-45
SLIDE 45
  • Physical compressors adapt traditional concepts from data compression to the particular case of RDF compression:
  • Dictionary compression removes symbolic redundancy and allows triples to be rewritten as tuples of three IDs (ID graph).
  • Graph compression is applied to the ID graph representation and removes different kinds of syntactic redundancy.
  • Many and varied physical compressors have been proposed:
  • They report good space savings → compressed datasets take (much) less space than their uncompressed counterparts.

Physical Compressors

  • A subset of such compressors (e.g. HDT, k2triples, or RDFCSA) are really self-indexes:
  • Datasets are compressed and indexed in a single encoding.
  • Triples can be searched and accessed with no prior decompression.
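Dictionary compression, the first step described on this slide, can be sketched as follows. Using one shared, sorted dictionary for all terms is a simplification assumed for brevity, and the function names are hypothetical; the sketch does not reproduce the layout of HDT, k2triples or RDFCSA.

```python
def build_dictionary(triples):
    """Assign a 1-based integer ID to every distinct term, in sorted order."""
    terms = sorted({term for triple in triples for term in triple})
    term_to_id = {term: i for i, term in enumerate(terms, start=1)}
    return terms, term_to_id


def encode(triples, term_to_id):
    """Rewrite each triple as a tuple of three IDs (the ID graph)."""
    return [tuple(term_to_id[term] for term in triple) for triple in triples]


def decode(id_triples, terms):
    """Map ID triples back to their original terms."""
    return [tuple(terms[i - 1] for i in ids) for ids in id_triples]


# Illustrative data (assumed), written with prefixed names for brevity.
triples = [
    ("person:homer-simpson", "property:name", '"Homer Simpson"'),
    ("person:homer-simpson", "property:father", "person:abe-simpson"),
    ("person:abe-simpson", "property:name", '"Abe Simpson"'),
]

terms, term_to_id = build_dictionary(triples)
id_triples = encode(triples, term_to_id)
bits_per_id = len(terms).bit_length()  # bits needed for the largest ID
print(id_triples, f"({bits_per_id} bits per ID)")
```

Each long string is now stored once in the dictionary, and the graph itself becomes a list of small integers that graph compression techniques can process further.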

slide-46
SLIDE 46
  • Logical compressors look for redundant triples (those that can be inferred), and remove them from the dataset.
  • The resulting dataset encoding includes two components:
  • The canonical subgraph which organizes all canonical triples.
  • The set of inference rules which must be applied to recover “redundant triples”.
  • Different rule-based algorithms have been proposed to obtain canonical subgraphs.
  • Compression effectiveness is less competitive than for physical compressors.

Logical Compressors

slide-47
SLIDE 47
  • Hybrid compressors combine the best of both worlds:
  • Consider different strategies to compact the graph by deleting semantic redundancy at the logical level.
  • Detect and remove syntactic/symbolic redundancy at the physical level (dictionary + graph compression).
  • This topic has not been explored much yet… but it is (maybe) the most promising:
  • Canonical subgraphs are smaller than their original counterparts → lower space requirements.
  • Implicit triples can be efficiently accessed by applying inference rules → better overall search performance.
  • HDT++ or k2triples++ report competitive (preliminary) space/time tradeoffs.

Hybrid Compressors

slide-48
SLIDE 48

RDF compression is a mature field of research, but the current state of the art still has much room for optimization.

Achievements & Challenges

slide-49
SLIDE 49
  • RDF compression is a mature field of research:
  • We have been working in this field since 2009.
  • HDT has been a W3C Member Submission since 2011:
  • It is a de facto standard for publishing compressed RDF datasets (e.g. LOD Laundromat).
  • It is the RDF engine of Linked Data Fragments and has been adopted by many other tools in the Semantic Web community.
  • HDT+Jena allows in-memory SPARQL resolution to be efficiently performed.
  • K2triples and RDFCSA are the state of the art for RDF compression and SPARQL triple pattern resolution.
  • We have recently compressed the Linked Data Web in a single dataset: LOD-a-lot.

RDF Compression Achievements

[Team photos: Sandra, Mario, Nieves, Ana, Antonio, Javier D., José M., Claudio, Antonio, Miguel A., Gonzalo, Axel.]

slide-50
SLIDE 50
  • RDF compression is still underexploited:
  • Managing compressed RDF allows the memory footprint to be reduced, improving scalability.
  • Self-indexes can be adopted by semantic databases to improve query performance.
  • A fully compressed triplestore has not yet been released, but we are currently working on it.
  • Our techniques still have room for optimization!
  • Some other semantic applications can be improved using compression:
  • RDF archives, Linked Closed Data…

RDF Compression Challenges

slide-51
SLIDE 51

Bibliography

1. Sandra Álvarez-García, Nieves Brisaboa, Javier D. Fernández, Miguel A. Martínez-Prieto, and Gonzalo Navarro. Compressed Vertical Partitioning for Efficient RDF Management. Knowledge and Information Systems (KAIS), 44(2):439–474, 2015.
2. Tim Berners-Lee. Linked Data, 2006. http://www.w3.org/DesignIssues/LinkedData.html.
3. Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked Data - The Story So Far. International Journal of Semantic Web and Information Systems, 5(3):1–22, 2009.
4. Nieves Brisaboa, Ana Cerdeira, Antonio Fariña, and Gonzalo Navarro. A Compact RDF Store using Suffix Arrays. In Proceedings of SPIRE, pages 103–115, 2015.
5. Javier D. Fernández, Mario Arias, Miguel A. Martínez-Prieto, and Claudio Gutiérrez. Management of Big Semantic Data. In Big Data Computing, chapter 4. Taylor and Francis/CRC, 2013.
6. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutiérrez, and Axel Polleres. Binary RDF Representation for Publication and Exchange. W3C Member Submission, 2011. www.w3.org/Submission/HDT/.
7. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutiérrez, Axel Polleres, and Mario Arias. Binary RDF Representation for Publication and Exchange. Journal of Web Semantics, 19:22–41, 2013.

slide-52
SLIDE 52

Bibliography

8. Tom Heath and Christian Bizer. Linked Data: Evolving the Web into a Global Data Space. Morgan & Claypool, 1 edition, 2011. http://linkeddatabook.com/.
9. Amit K. Joshi, Pascal Hitzler, and Guozhu Dong. Logical Linked Data Compression. In Proceedings of ESWC, pages 170–184, 2013.
10. Frank Manola and Eric Miller. RDF Primer. W3C Recommendation, 2004. www.w3.org/TR/rdf-primer/.
11. Jeff Z. Pan, José Manuel Gómez-Pérez, Yuan Ren, Honghan Wu, and Man Zhu. SSP: Compressing RDF data by Summarisation, Serialisation and Predictive Encoding. Technical report, 2014. Available at http://www.kdrive-project.eu/wp-content/uploads/2014/06/WP3-TR2-2014 SSP.pdf.
12. Eric Prud’hommeaux and Andy Seaborne. SPARQL Query Language for RDF. W3C Recommendation, 2008. http://www.w3.org/TR/rdf-sparql-query/.

slide-53
SLIDE 53

Basics of Data Compression

Let the lecture continue…

Image: MAIN SQUARE & CATHEDRAL (SEGOVIA, SPAIN)