Big (Linked) Semantic Data Compression: Motivation & Challenges

Antonio Fariña, Javier D. Fernández and Miguel A. Martinez-Prieto
3rd KEYSTONE Training School: Keyword search in Big Linked Data
23rd August 2017
Image: Roman Aqueduct (Segovia, Spain)

Agenda
• Linked Data & Semantic Technologies
• Foundations
• RDF
• SPARQL
• (Some) Open Issues
• Linked Data Workflow
• Big Linked Data Challenges
• Semantic Data Compression
• Why is Semantic Data Redundant?
• Compression Approaches
• Achievements & Challenges

Big (Linked) Semantic Data Compression Linked Data & Semantic Technologies • Foundations • RDF • SPARQL

Linked Data Foundations
“Linked Data is simply about using the Web to create typed links between data from different sources.”
PAGE 4 BIG (LINKED) SEMANTIC DATA COMPRESSION

Linked Data
Linked Data is simply about using the Web to create typed links between data from different sources.
• Linked Data refers to a set of best practices for publishing and connecting data on the Web.
• These best practices have been adopted by an increasing number of data providers, leading to the creation of a global data space:
  • Data are machine-readable.
  • Data meaning is explicitly defined.
  • Data are linked from/to external datasets.
• The resulting data network connects data from different domains: publications, movies, multimedia, government data, statistical data, etc.

Linked Data Principles
1. Use URIs as names for entities.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using standards (e.g. RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.

#1 URIs as names
• Names must ensure that any data entity has its own identity in the global Linked Data space.
• Human conventions are not effective for naming data entities:
  • They are not universal → ambiguity.
• The use of URIs (Uniform Resource Identifiers) enables any real-world entity to be identified at universal scale: http://example.org/person/homer-simpson
Figure: the question "What is his name?" is answered "Homer Simpson" for two entities, which are named http://example.org/person/homer-simpson and http://example.org/person/homer-simpson-guy.

#2 HTTP URIs
• Entity names must be searchable (via HTTP).
• Dereferenceable URIs ensure that the corresponding entity description is retrieved when an HTTP URI is accessed (via an HTTP client).
Example: the description retrieved for http://example.org/person/homer-simpson
  http://example.org/property/name "Homer Simpson"
  http://example.org/property/address "742 Evergreen Terrace"
  http://example.org/property/location http://example.org/place/springfield
  http://example.org/property/father http://example.org/person/abe-simpson
  ...
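
A minimal sketch of what "accessing an HTTP URI via an HTTP client" could look like, using only Python's standard library. The helper name `build_dereference_request` and the use of an `Accept: text/turtle` header to ask for an RDF serialization are assumptions made for illustration; the slides do not say how a server chooses the representation it returns. The request is only built, not sent, because http://example.org/ is a placeholder domain rather than a real Linked Data server.

```python
from urllib.request import Request


def build_dereference_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) an HTTP GET request for an entity URI.

    The Accept header is an assumed way of asking for an RDF serialization.
    """
    return Request(uri, headers={"Accept": media_type}, method="GET")


request = build_dereference_request("http://example.org/person/homer-simpson")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Against a real Linked Data server, passing this request to `urllib.request.urlopen` would perform the lookup and return the entity description.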

#3 Standards
• Many and varied stakeholders coexist within the Linked Data ecosystem…
  • Data providers from diverse domains (economy, bioinformatics, multimedia…).
  • Application developers.
  • End-users…
• … but all of them “must speak the same languages” for effective understanding.
• Standardized semantic technologies:
  • URIs for naming.
  • Serialization formats (RDF/XML, N3, Turtle, HDT…) for data storage.
  • RDF for data modelling and exchange.
  • SPARQL for RDF querying.
  • …

#4 Links to Other URIs
• Data entities are individually described:
  • A particular HTTP URI is assigned as name.
  • Its features are stated.
• Linking two URIs establishes a particular type of connection between two existing entities:
  • This principle materializes the aim of data integration in Linked Data.
Figure: two individually described entities.
  @prefix person: <http://example.org/person/> .
  @prefix property: <http://example.org/property/> .
  person:homer-simpson property:name "Homer Simpson" ; property:address "742 Evergreen Terrace" .
  person:marge-simpson property:name "Marge Simpson" ; property:address "742 Evergreen Terrace" .
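
The `@prefix` declarations on this slide abbreviate full HTTP URIs. The following sketch shows how a prefixed name such as `person:homer-simpson` can be expanded back into the URI it stands for. The function name `expand` and the `PREFIXES` table are hypothetical helpers introduced for illustration; only the two prefix-to-namespace mappings come from the slide.

```python
# Prefix table taken from the slide's @prefix declarations.
PREFIXES = {
    "person": "http://example.org/person/",
    "property": "http://example.org/property/",
}


def expand(prefixed_name: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand 'prefix:local' into a full URI using the prefix table."""
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return prefixes[prefix] + local_name


print(expand("person:homer-simpson"))
print(expand("property:address"))
```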

#4 Links to Other URIs
Figure: the individual descriptions, linked into a single graph.
  @prefix person: <http://example.org/person/> .
  @prefix property: <http://example.org/property/> .
  person:bart-simpson property:name "Bart Simpson" ; property:age 10 ; property:mother person:marge-simpson ; property:father person:homer-simpson .
  person:homer-simpson property:name "Homer Simpson" ; property:address "742 Evergreen Terrace" ; property:location location:Springfield ; property:father person:abe-simpson .
  person:marge-simpson property:name "Marge Simpson" ; property:address "742 Evergreen Terrace" ; property:location location:Springfield .
  person:abe-simpson property:name "Abe Simpson" ; property:age 83 .
  location:Springfield property:name "Springfield" .

The Web of Linked Data
The Web of Linked Data revisits WWW foundations to build a cloud of data-to-data labelled hyperlinks.
• The Web of Linked Data converts raw data into a first-class citizen of the Web:
  • Data entities are the atoms of the Web of Linked Data.
  • Each entity has its own identity.
• It relies on the WWW infrastructure:
  • It uses HTTP as communication protocol.
  • Entities are named using URIs.
• Knowledge from different fields can be easily integrated and universally shared/exploited using WWW infrastructure.

The Web of Linked Data (2007 – 2011)
Figure source: http://lod-cloud.net/

The Web of Linked Data (2014)
Figure source: http://lod-cloud.net/

The Web of Linked Data (2017)
• ~10K datasets organized into 9 domains which include many and varied knowledge fields.
• 150B statements, including entity descriptions and (inter/intra-dataset) links between them.
• >500 live endpoints serving this data.
Sources: http://stats.lod2.eu/, http://sparqles.ai.wu.ac.at/, http://lod-cloud.net/

RDF
“RDF is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ…”

RDF Basics
• RDF is a standard model for data publication, interchange, and consumption on the Web of Linked Data.
• RDF allows any class of data to be described using a simple triple structure:
  • Subject: the resource being described.
  • Predicate: a property of that resource.
  • Object: the value for the corresponding property.
Example triples:
  http://example.org/person/homer-simpson http://example.org/property/name "Homer Simpson"
  http://example.org/person/homer-simpson http://example.org/property/father http://example.org/person/abe-simpson
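
The subject/predicate/object structure can be mirrored directly in code. This is a sketch only: the `Triple` type and the `describe` helper are hypothetical names, not part of any RDF library, and the object is stored as a plain string for brevity. The data are the two example triples from this slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # a property of that resource
    obj: str        # the value for that property


EX = "http://example.org/"
triples = [
    Triple(EX + "person/homer-simpson", EX + "property/name", "Homer Simpson"),
    Triple(EX + "person/homer-simpson", EX + "property/father", EX + "person/abe-simpson"),
]


def describe(subject: str, data: list[Triple]) -> list[tuple[str, str]]:
    """Return every (predicate, object) pair stated about a subject."""
    return [(t.predicate, t.obj) for t in data if t.subject == subject]


for predicate, obj in describe(EX + "person/homer-simpson", triples):
    print(predicate, "->", obj)
```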

RDF Triples
• An RDF triple can be seen as a labelled directed subgraph in which subject and object nodes are linked by a particular (predicate) edge:
  • The subject node contains the URI which names the resource.
  • The predicate edge labels the relationship using a URI whose semantics is described by a vocabulary/ontology.
  • The object node may contain a URI or a Literal value.
• RDF links (between entities) also take the form of RDF triples.
Figure: three triples about person:homer-simpson.
  @prefix person: <http://example.org/person/> .
  @prefix property: <http://example.org/property/> .
  person:homer-simpson property:name "Homer Simpson" .
  person:homer-simpson property:address "742 Evergreen Terrace" .
  person:homer-simpson property:father person:abe-simpson .
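
To make the "labelled directed subgraph" view concrete, the sketch below stores the three triples from the figure as an adjacency list: each subject node maps to its outgoing (predicate, object) edges. The `URI` and `Literal` classes and the `build_graph` function are hypothetical names chosen for illustration. They only serve to show that an object may be either kind of node, and that RDF links are the edges whose object is a URI.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


PERSON = "http://example.org/person/"
PROPERTY = "http://example.org/property/"

homer = URI(PERSON + "homer-simpson")
abe = URI(PERSON + "abe-simpson")

triples = [
    (homer, URI(PROPERTY + "name"), Literal("Homer Simpson")),
    (homer, URI(PROPERTY + "address"), Literal("742 Evergreen Terrace")),
    (homer, URI(PROPERTY + "father"), abe),
]


def build_graph(data):
    """Map each subject node to its list of outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in data:
        graph[subject].append((predicate, obj))
    return graph


graph = build_graph(triples)
# RDF links are the edges whose object is itself a URI node.
linked_entities = [obj for _, obj in graph[homer] if isinstance(obj, URI)]
print(linked_entities)
```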

RDF Graphs
Figure: an RDF graph connecting person:bart-simpson, person:homer-simpson, person:marge-simpson, person:abe-simpson and location:springfield through property:name, property:age, property:mother, property:father, property:address and property:location edges, with "742 Evergreen Terrace", "Springfield", 10 and 83 among the literal values.
  @prefix person: <http://example.org/person/> .
  @prefix property: <http://example.org/property/> .

RDF Graphs
• An RDF graph is only a mental model which must be serialized for effective storage:
  • Choosing a particular serialization format is an important decision for the most relevant tasks in the Web of Linked Data.

RDF Serialization Formats
• N3
• RDF/XML
• N-Triples
• JSON-LD
Format converter: http://www.easyrdf.org/converter
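
As an illustration of what serializing a graph involves, here is a minimal sketch that writes triples in the line-based N-Triples style: one triple per line, URIs in angle brackets, literals in double quotes, and a closing dot. The function names are hypothetical. The sketch handles only URIs and plain string literals (no language tags, datatypes or blank nodes), so it should not be read as a complete N-Triples writer.

```python
def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes and line breaks inside a literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples_line(subject: str, predicate: str, obj: str, obj_is_literal: bool) -> str:
    """Serialize one triple as a single N-Triples-style line."""
    obj_term = f'"{escape_literal(obj)}"' if obj_is_literal else f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


EX = "http://example.org/"
print(to_ntriples_line(EX + "person/homer-simpson", EX + "property/name", "Homer Simpson", True))
print(to_ntriples_line(EX + "person/homer-simpson", EX + "property/father", EX + "person/abe-simpson", False))
```

Full URIs are repeated on every line here, whereas the prefixed notation used on the earlier slides abbreviates them.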

SPARQL
“SPARQL is a semantic query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.”
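
SPARQL queries are built from triple patterns, which are triples in which some positions hold variables. The sketch below is not a SPARQL engine; it only illustrates, under simplifying assumptions, how a single pattern such as `person:homer-simpson property:name ?name` can be matched against a small set of triples. The function `match_pattern` is a hypothetical helper, and terms are kept as plain prefixed strings for brevity.

```python
triples = [
    ("person:homer-simpson", "property:name", '"Homer Simpson"'),
    ("person:homer-simpson", "property:father", "person:abe-simpson"),
    ("person:bart-simpson", "property:father", "person:homer-simpson"),
]


def match_pattern(pattern, data):
    """Yield one dict of variable bindings per triple that matches the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in data:
        bindings = {}
        for pattern_term, triple_term in zip(pattern, triple):
            if pattern_term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if bindings.setdefault(pattern_term, triple_term) != triple_term:
                    break
            elif pattern_term != triple_term:
                break
        else:
            yield bindings


names = list(match_pattern(("person:homer-simpson", "property:name", "?name"), triples))
children = list(match_pattern(("?child", "property:father", "person:homer-simpson"), triples))
print(names)
print(children)
```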

