Semantic Web: a short introduction
Ivan Herman, Semantic Web Activity Lead, W3C
Webelopers Day, Internet NG Conference, Isabel Plaza (Madrid), October 17, 2007
Ivan Herman, Semantic Web: a Short Introduction. “Webelopers day”, Isabel Plaza, 17.10.’07 (2)
> Towards a Semantic Web
The current Web represents information using
− natural language (English, Hungarian, Spanish,…)
− graphics, multimedia, page layout
Humans can process this easily
− can deduce facts from partial information
− can create mental associations
− are used to various sensory information
(well, sort of… people with disabilities may have serious problems on the Web with rich media!)
> Towards a Semantic Web
Tasks often require combining data on the Web:
− hotel and travel information may come from different sites
− searches in different digital libraries
− etc.
Again, humans combine this information easily
− even if different terminologies are used!
> However…
However: machines are ignorant!
− partial information is unusable
− difficult to make sense of, e.g., an image
− drawing analogies automatically is difficult
− difficult to combine information automatically
  − is <foo:creator> same as <bar:author>?
  − how to combine different XML hierarchies?
− …
> Example: automatic airline reservation
Your automatic airline reservation
− knows about your preferences
− builds up knowledge base using your past
− can combine the local knowledge with remote services:
  − airline preferences
  − dietary requirements
  − calendaring
  − etc.
It communicates with remote information (i.e., on the Web!)
− (M. Dertouzos: The Unfinished Revolution)
> Example: data(base) integration
Databases are very different in structure, in content
Lots of applications require managing several databases
− after company mergers
− combination of administrative data for e-Government
− biochemical, genetic, pharmaceutical research
− etc.
Most of these data are accessible from the Web (though not necessarily public yet)
> And the problem is real…
> Example: change of address & the authorities
It means change of address at “official” places
so that you still receive official mail: notices, tax information, certificates, etc.
… but you never know whether you notified the right local, regional, national, etc., authorities, so that they all have your new mailing address
i.e., you still get some mail from some agency at your old address
It should be possible to change the address in one official place only
− the administration should be smart enough to propagate the change to authorities that need to know about it
− this means that various authorities should be able to merge their data…
> Example: “smart” portal
Various types of “portals” are created (for an online journal, for a specific area of knowledge, for specific communities, etc.)
The portals may:
− integrate lots of different data sources
− have access to specialized domain knowledge
The goal is to provide better local access and search on the integrated data, and to reveal new relationships among the data
> What is needed?
(Some) data should be available for machines for further processing
It should be possible to combine and merge data on a Web scale
Sometimes, data may describe other data (like the library example, using metadata)…
… but sometimes the data is to be exchanged by itself, like my calendar or my travel preferences
Machines may also need to reason about that data
> In what follows…
We will use a simplistic example to introduce the main Semantic Web concepts
We take, as an example area, data integration
> The rough structure of data integration
1. Map the various data onto an abstract data representation
− make the data independent of its internal representation…
2. Merge the resulting representations
3. Start making queries on the whole!
− queries that could not have been done on the individual data sets
> A simplified bookstore data (dataset “A”)
ID                 | Author | Title            | Publisher | Year
ISBN 0-00-651409-X | id_xyz | The Glass Palace | id_qpr    | 2000

ID     | Name          | Home page
id_xyz | Ghosh, Amitav | http://www.amitavghosh.com/

ID     | Publ. Name     | City
id_qpr | Harper Collins | London
> 1st: export your data as a set of relations
> Some notes on exporting the data
Relations form a graph
− the nodes refer to the “real” data or contain some literal
− how the graph is represented in the machine is immaterial for now
Data export does not necessarily mean physical conversion of the data
− relations can be generated on-the-fly at query time
  − via SQL “bridges”
  − scraping HTML pages
  − extracting data from Excel sheets
  − etc.
One can export part of the data
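As a rough sketch of the on-the-fly export mentioned above, the following Python snippet reads rows from a relational table at call time and yields them as (subject, predicate, object) relations. The table layout, the `a:` prefix convention for property names, and the function name `sql_bridge` are assumptions made for illustration; a real bridge would map columns to agreed-upon URIs.

```python
import sqlite3


def sql_bridge(connection, table, id_column, prefix):
    """Yield (subject, predicate, object) relations for each row of a table.

    Nothing is converted ahead of time: rows are read when the generator runs.
    `table` is interpolated into the SQL text, so it must be a trusted name.
    """
    cursor = connection.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = record[id_column]
        for column, value in record.items():
            if column != id_column and value is not None:
                yield (subject, f"{prefix}:{column}", value)


# Hypothetical in-memory table holding part of dataset "A".
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE books (id TEXT, title TEXT, year TEXT)")
connection.execute(
    "INSERT INTO books VALUES (?, ?, ?)",
    ("ISBN 0-00-651409-X", "The Glass Palace", "2000"),
)

relations = set(sql_bridge(connection, "books", "id", "a"))
```

Exporting only part of the data would, in this sketch, amount to selecting fewer columns or rows in the SQL query.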
> Another bookshop data (dataset “F”)
ID              | Titre                 | Auteur | Traducteur | Original
ISBN 2020386682 | Le Palais des miroirs | i_abc  | i_qrs      | ISBN 0-00-651409-X

ID    | Nom
i_abc | Ghosh, Amitav
i_qrs | Besse, Christiane
> 2nd: export your second set of data
> 3rd: start merging your data
> 3rd: start merging your data (cont.)
> 3rd: merge identical resources
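The merge step can be sketched in Python by treating each dataset as a set of (subject, predicate, object) relations. This is only an illustration: the `urn:isbn:` form of the identifiers, the node names, and every property name other than a:author are assumptions, not taken from the slides.

```python
# Assumed URI form for the two ISBNs; the slides only say the URIs are identical.
ORIGINAL = "urn:isbn:0-00-651409-X"
TRANSLATION = "urn:isbn:2020386682"

# Hypothetical relations exported from dataset "A".
dataset_a = {
    (ORIGINAL, "a:title", "The Glass Palace"),
    (ORIGINAL, "a:year", "2000"),
    (ORIGINAL, "a:author", "a:id_xyz"),
    ("a:id_xyz", "a:name", "Ghosh, Amitav"),
    ("a:id_xyz", "a:homepage", "http://www.amitavghosh.com/"),
}

# Hypothetical relations exported from dataset "F".
dataset_f = {
    (TRANSLATION, "f:titre", "Le Palais des miroirs"),
    (TRANSLATION, "f:original", ORIGINAL),
    (TRANSLATION, "f:traducteur", "f:i_qrs"),
    ("f:i_qrs", "f:nom", "Besse, Christiane"),
}

# Merging is a set union: relations that mention the same URI
# now meet at a single node.
merged = dataset_a | dataset_f


def relations_touching(graph, node):
    """Return every relation that starts or ends at the given node."""
    return {(s, p, o) for s, p, o in graph if s == node or o == node}


around_original = relations_touching(merged, ORIGINAL)
prefixes = {p.split(":")[0] for _, p, _ in around_original}
```

After the union, the node for the original book carries relations from both vocabularies (`a:` and `f:`), which is what makes cross-dataset queries possible.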
> Start making queries…
User of dataset “F” can now ask queries like:
− « donne-moi le titre de l’original »
− (i.e., “give me the title of the original”)
This information is not in the dataset “F”…
…but can be retrieved by merging with dataset “A”!
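A minimal Python sketch of such a query follows, with the graph held as a set of (subject, predicate, object) relations. The URIs and the property names `f:original`, `f:titre`, and `a:title` are hypothetical; the point is only that the same query returns nothing on dataset “F” alone and an answer on the merged graph.

```python
ORIGINAL = "urn:isbn:0-00-651409-X"   # assumed URI form
TRANSLATION = "urn:isbn:2020386682"   # assumed URI form

dataset_a = {(ORIGINAL, "a:title", "The Glass Palace")}
dataset_f = {
    (TRANSLATION, "f:titre", "Le Palais des miroirs"),
    (TRANSLATION, "f:original", ORIGINAL),
}


def objects(graph, subject, predicate):
    """Return all objects reachable from subject via predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def title_of_original(graph, book):
    """Follow f:original from the book, then read a:title of what is found."""
    return {
        title
        for original in objects(graph, book, "f:original")
        for title in objects(graph, original, "a:title")
    }


from_f_only = title_of_original(dataset_f, TRANSLATION)
from_merged = title_of_original(dataset_a | dataset_f, TRANSLATION)
```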
> However, more can be achieved…
We “feel” that a:author and f:auteur should be the same
But an automatic merge does not know that!
Let us add some extra information to the merged data:
− a:author same as f:auteur
− both identify a “Person”
− a term that a community may have already defined:
  − a “Person” is uniquely identified by his/her name and, say, homepage
  − it can be used as a “category” for certain types of resources
> 3rd revisited: use the extra knowledge
> Start making richer queries!
User of dataset “F” can now query:
− « donne-moi la page d’accueil de l’auteur de l’original »
− (i.e., “give me the home page of the original’s author”)
The information is not in datasets “F” or “A”…
…but was made available by:
− merging dataset “A” and dataset “F”
− adding three simple extra statements as an extra “glue”
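The effect of the “glue” can be sketched in Python as follows. This is a simplified illustration under several assumptions: only the first extra statement (a:author same as f:auteur) is implemented, as a plain copy of relations under both names; the “Person” statements are left out; the URIs and all property names other than a:author and f:auteur are hypothetical, as is the choice of where f:auteur is attached in dataset “F”.

```python
ORIGINAL = "urn:isbn:0-00-651409-X"   # assumed URI form
TRANSLATION = "urn:isbn:2020386682"   # assumed URI form

dataset_a = {
    (ORIGINAL, "a:author", "a:id_xyz"),
    ("a:id_xyz", "a:homepage", "http://www.amitavghosh.com/"),
}
dataset_f = {
    (TRANSLATION, "f:original", ORIGINAL),
    (TRANSLATION, "f:auteur", "f:i_abc"),
    ("f:i_abc", "f:nom", "Ghosh, Amitav"),
}
merged = dataset_a | dataset_f


def apply_same_property(graph, prop_a, prop_b):
    """Return the graph plus a copy of each prop_a/prop_b relation under the other name."""
    extra = set()
    for s, p, o in graph:
        if p == prop_a:
            extra.add((s, prop_b, o))
        elif p == prop_b:
            extra.add((s, prop_a, o))
    return graph | extra


def objects(graph, subject, predicate):
    """Return all objects reachable from subject via predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def homepage_of_original_author(graph, book):
    """Ask in the vocabulary of "F": original, then auteur, then home page."""
    return {
        page
        for original in objects(graph, book, "f:original")
        for person in objects(graph, original, "f:auteur")
        for page in objects(graph, person, "a:homepage")
    }


without_glue = homepage_of_original_author(merged, TRANSLATION)
enriched = apply_same_property(merged, "a:author", "f:auteur")
with_glue = homepage_of_original_author(enriched, TRANSLATION)
```

A real system would more likely leave the equivalence as a statement in the graph and let a reasoner or query engine apply it, rather than copying relations eagerly as done here.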
> Combine with different datasets
Using, e.g., the “Person”, the dataset can be combined with other sources
For example, data in Wikipedia can be extracted using dedicated tools
− there is active development to add simple semantic “tags” to Wikipedia entries (so-called “Semantic Wiki”-s)
− the “dbpedia” project can already extract the “infobox” information from Wikipedia…
> Merge with Wikipedia data
> Is that surprising?
Maybe but, in fact, no…
What happened via automatic means is done all the time, every day by the users of the Web!
The difference: a bit of extra rigor (e.g., naming the relationships) is necessary so that machines could do this, too
> What did we do?
We combined different datasets
− all may be of different origin somewhere on the web
− all may have different formats (MySQL, Excel sheet, XHTML, etc.)
− all may have different names for relations (e.g., multilingual)
We could combine the data because some URI-s were identical (the ISBN-s in this case)
We could add some simple additional information (the “glue”), also using common terminologies that a community has produced
As a result, new relations could be found and retrieved
> It could become even more powerful
We could add extra knowledge to the merged datasets
− e.g., a full classification of various types of library data
− geographical information
− etc.
This is where ontologies, extra rules, etc., may come in
Even more powerful queries can be asked as a result
> What did we do? (cont)
> The abstraction pays off because…
… the graph representation is independent of the exact structures in, say, a relational database
… a change in local database schemas, XHTML structures, etc., does not affect the whole, only the “export” step
− “schema independence”
… new data, new connections can be added seamlessly, regardless of the structure of other data sources
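Schema independence can be illustrated with a small Python sketch: two hypothetical versions of a local schema each get their own export function, while the resulting graph and the query over it stay the same. The column names, the `urn:isbn:` URI form, and the property name `a:title` are all assumptions for illustration.

```python
def export_old_schema(rows):
    """Export step for a hypothetical schema with columns isbn and title."""
    return {(f"urn:isbn:{row['isbn']}", "a:title", row["title"]) for row in rows}


def export_new_schema(rows):
    """Export step after a hypothetical schema change that renamed both columns."""
    return {(f"urn:isbn:{row['id']}", "a:title", row["book_title"]) for row in rows}


def titles(graph):
    """A query over the graph; it never looks at the local table layout."""
    return {o for _, p, o in graph if p == "a:title"}


old_rows = [{"isbn": "0-00-651409-X", "title": "The Glass Palace"}]
new_rows = [{"id": "0-00-651409-X", "book_title": "The Glass Palace"}]

graph_before = export_old_schema(old_rows)
graph_after = export_new_schema(new_rows)
```

Only the export function changed between the two versions; `titles` and anything else that works on the graph is untouched.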
> So where is the Semantic Web?
The Semantic Web provides technologies to make such integration possible!
For example:
− an abstract model for the relational graphs: RDF
− extract RDF information from XML (e.g., XHTML) pages: GRDDL
− add structured information to XHTML pages: RDFa
− a query language adapted for the relational graphs: SPARQL
− characterize the relationships, categorize resources: RDFS, OWL, SKOS, Rules
  − applications may choose among the different technologies
  − some of them may be relatively simple with simple tools (RDFS), whereas some require sophisticated systems (OWL, Rules)
− reuse of existing “ontologies” that others have produced (FOAF in our case)
> So where is the Semantic Web? (cont)
> SW data begins to accumulate on the Web
IngentaConnect bibliographic metadata storage: over 200 million triples
Tracking the US Congress: data stored in RDF (around 25 million triples)
RDFS/OWL Representation of WordNet: also downloadable as 150MB of RDF/XML
“Département/canton/commune” structure of France published by the French Statistical Institute
Geonames Ontology and associated RDF data: 6 million (and growing) geographical features
RDF Book Mashup, integrating book data from, e.g., Amazon
“dbpedia”: get infobox data of Wikipedia into RDF
See, for example, the linked data index
> And what about applications?
> A number of projects in data integration
Developments are under way at various companies, institutions
− not always easy to find out the details…
Data integration comes to the fore as one of the SW application areas
> Integrate knowledge for Chinese Medicine
Integration of a large number of relational databases (on traditional Chinese medicine) using a Semantic Layer
− around 80 databases, around 200,000 records each
A visual tool to map databases to the semantic layer using a specialized ontology
Form-based query interface for end users
Courtesy of Huajun Chen, Zhejiang University (SWEO Case Study)
> Find the right experts at NASA
Expertise locator for nearly 20,000 NASA civil servants using RDF integration techniques over 6 or 7 geographically distributed databases, data sources, and web services…
Courtesy of Kendall Clark, Clark & Parsia, LLC
> Ontology controlled annotation
Annotation of different data formats all along the full drug discovery process…
[Figure, diagram labels only: RDF Triple Store; Web API; Acrobat; Semantic Agents; Automatic Email Alerts; Project Portals; Wikis. Annotated items: Chemical Series, Compounds, Assay Data Points, Scientific Papers, Any PDF, Pathways, Lab data, Collaborations, Targets, BioMarkers, Spreadsheets, Powerpoints, Word…, Websites/Pages, Views of exp data, BrainStorming, Meeting Notes]
Courtesy of Giles Day, Pfizer
> Public health surveillance
Integrated biosurveillance system (biohazards, bioterrorism, disease control, etc)
Courtesy of Parsa Mirhaji, School of Health Information Sciences, University of Texas (SWEO Case Study)
> Help in choosing the right drug regimen
Help in finding the best drug regimen for a specific case
− find the best trade-off for a patient
Use an ontology for medical conditions, signs, symptoms
Integrate data from various sources (patients, physicians, Pharma, researchers, etc.)
Courtesy of Erick Von Schweber, PharmaSURVEYOR Inc., (SWEO Use Case)
> Some other names…
Pfizer, NASA, Eli Lilly, MITRE Corp., Elsevier, …
EU R&D Projects like Sculpteur and Artiste
UN FAO’s MeteoBroker, …
Semantic Digital Library projects (JeromeDL, Simile, Fedora,…)
> Web sites, portals, local site search
Portal’s internal organization makes use of semantic data, ontologies
− integration with external and internal data
  − there is a clear overlap here with data integration applications!
− better queries, often based on controlled vocabularies or ontologies…
> Semantic portal for art collections
Courtesy of Jacco van Ossenbruggen, CWI, and Guus Schreiber, VU Amsterdam
> Semantic portal for cultural heritage
Courtesy of Francisca Hernández, Fundación Marcelino Botín, and Richard Benjamins, iSOCO, (SWEO Case Study)
> Help for deep sea drilling operations
Integration of experience and data in the planning and operation of deep sea drilling processes
Discover relevant experiences that could affect current or planned drilling operations
− uses an ontology-backed search engine
Courtesy of David Norheim and Roar Fjellheim, Computas AS (SWEO Use Case)
> Portal to Principality of Asturias’ documents
Search through governmental documents
A “bridge” is created between the users and the juridical jargon using SW vocabularies and tools
Courtesy of Diego Berrueta and Luis Polo, CTIC, U. of Oviedo, and the Principality of Asturias, (SWEO Case Study)
> Digital music asset portal at NRK
Used by program production to find the right music in the archive for a specific show
Courtesy of Robert Engels, ESIS, and Jon Roar Tønnesen, NRK (SWEO Case Study)
> Elsevier’s DOPE browser
Single interface to multiple data sources (in life sciences)
Integration, search, etc., via thesauri and metadata in RDF(S)
Courtesy of Anita de Waard, Elsevier, Christiaan Fluit, Aduna, and Frank van Harmelen, VU Amsterdam (SWEO Use Case)
> Intelligent search for public services
Semantic Web based search engine for public services at the municipality of Zaragoza (Spain)
The search is based on a local ontology, natural language processing and ontological reasoning
Courtesy of Jesús Fernando Ruíz, Municipality of Zaragoza (SWEO Use Case)
> Vodafone live!
Integrate various vendors’ product descriptions via RDF
− ring tones, games, wallpapers
− manage complexity of handsets, binary formats
A portal is created to offer appropriate content
Significant increase in content download after the introduction
Courtesy of Kevin Smith, Vodafone Group R&D (SWEO Case Study)
> Other examples…
Sun’s White Paper and System Handbook collections
Nokia’s S60 support portal
Harper’s Online Magazine
Oracle’s virtual pressroom
Opera’s community site
Dow Jones’ Synaptica
> All kinds of other types of applications…
> Adobe’s XMP
Metadata is added by, e.g., Photoshop into files in RDF
XMP is a way of embedding + vocabulary + a set of (public) tools (there are also 3rd party tools to extract the RDF content)
Used by a number of platform solutions
> Natural interface to business applications
Courtesy of C. Anantaram, Tata Consultancy Services Limited (SWEO Case Study)
Users interact with a business application (eg, via email) in natural language; OWL helps in the retrieval of relevant concepts
> Suggestions’ database…
Employees of the bank can submit new ideas for innovation, improving the business process, reducing costs, etc.
The entry system analyses the entry, shows similar ideas already in the system based on the concepts (not words)
User gets immediate feedback, system gets better search, analysis, etc.
Courtesy of José Luís Bas Uribe, Bankinter, and Richard Benjamins, iSOCO, (SWEO Case Study)
> Other application areas come to the fore
Content management
Business intelligence
Collaborative user interfaces
Sensor-based services
Linking virtual communities
Grid infrastructure
Multimedia data management
Etc.
> Conclusions
The Semantic Web is there to integrate data on the Web
The goal is the creation of a Web of Data