Jorge Gracia, Jose Labra (Ontology Engineering Group, UPM; Web Semantics Oviedo, University of Oviedo)

SLIDE 1

Jorge Gracia

Ontology Engineering Group (OEG), Universidad Politécnica de Madrid (UPM), jgracia@fi.upm.es

Jose Labra

Web Semantics Oviedo (WESO), University of Oviedo, labra@uniovi.es

Multilingual Web Workshop, Madrid (Spain), 7-8 May 2014

SLIDE 2

  • Motivation
  • The group
  • Main goals
  • Activities
  • Where are we now?

SLIDE 3

Motivation

SLIDE 4

[Chart: number of monolingual vs. multilingual datasets, January 2012, June 2012 and December 2012]

  • A. Gómez-Pérez, D. Vila-Suero, E. Montiel-Ponsoda, J. Gracia, and G. Aguado-de Cea, "Guidelines for multilingual linked data," in Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics (WIMS '13), New York, NY, USA: ACM, Jun. 2013.

SLIDE 5

[Chart: number of RDF literals without vs. with a language tag, January 2012, June 2012 and December 2012]

  • Source: A. Gómez-Pérez et al., "Guidelines for multilingual linked data," WIMS '13 (see slide 4).
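
The slide-5 measurements distinguish RDF literals with and without a language tag. As a rough illustration of how such a count might be taken (not the method used in the cited study), the sketch below classifies the object literals of an N-Triples string with the Python standard library. It is a simplified line-based regex rather than a full parser; `count_language_tags` and the sample triples are hypothetical, and counting typed literals (`^^<datatype>`) as untagged is an assumption.

```python
import re

# Object literal at the end of an N-Triples statement:
# "lexical form", optionally @lang-tag or ^^<datatype>, then the final " ."
LITERAL = re.compile(
    r'"(?:[^"\\]|\\.)*"'                                       # quoted lexical form
    r'(?P<suffix>@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?'   # tag or datatype
    r'\s*\.\s*$'                                               # statement terminator
)


def count_language_tags(ntriples: str) -> tuple[int, int]:
    """Return (literals with a language tag, literals without one).

    Simplifications (assumptions): one statement per line; comment lines and
    IRI/blank-node objects are skipped; typed literals count as untagged.
    """
    with_tag = without_tag = 0
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LITERAL.search(line)
        if match is None:
            continue  # object is an IRI or a blank node, not a literal
        suffix = match.group("suffix") or ""
        if suffix.startswith("@"):
            with_tag += 1
        else:
            without_tag += 1
    return with_tag, without_tag


# Hypothetical sample data (example.org IRIs)
SAMPLE = """\
<http://example.org/Armenia> <http://www.w3.org/2000/01/rdf-schema#label> "Armenia"@en .
<http://example.org/Armenia> <http://www.w3.org/2000/01/rdf-schema#label> "Հայաստան"@hy .
<http://example.org/Armenia> <http://www.w3.org/2000/01/rdf-schema#comment> "A country" .
<http://example.org/Armenia> <http://example.org/code> "AM"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://example.org/Armenia> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/I23AX45> .
"""

if __name__ == "__main__":
    print(count_language_tags(SAMPLE))  # (2, 2)
```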

SLIDE 6

SLIDE 7

  • Vocabulary selection
  • RDF generation
  • Data interlinking
  • Web publishing
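
For the RDF generation step, the motivation slides suggest that natural-language literals should carry a language tag. A minimal sketch, assuming N-Triples output and a hypothetical helper `label_triples` (not part of any tool mentioned in the talk): it escapes the lexical form and rejects tags that do not fit the N-Triples `LANGTAG` shape. It does not check full BCP 47 validity.

```python
import re

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# LANGTAG shape in the N-Triples grammar: letters, then "-"-separated alphanumerics
LANGTAG = re.compile(r"[A-Za-z]+(?:-[A-Za-z0-9]+)*")


def escape_literal(text: str) -> str:
    """Escape characters that may not appear raw inside an N-Triples string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def label_triples(subject: str, labels: dict[str, str]) -> list[str]:
    """Emit one language-tagged rdfs:label statement per (tag, text) pair."""
    triples = []
    for tag, text in labels.items():
        if not LANGTAG.fullmatch(tag):
            raise ValueError(f"malformed language tag: {tag!r}")
        triples.append(f'<{subject}> <{RDFS_LABEL}> "{escape_literal(text)}"@{tag} .')
    return triples


if __name__ == "__main__":
    for line in label_triples("http://example.org/Armenia",
                              {"en": "Armenia", "hy": "Հայաստան"}):
        print(line)
```

Raw UTF-8 such as "Հայաստան" is allowed inside N-Triples strings, so only quotes, backslashes and line breaks are escaped here.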

SLIDE 8

  • http://example.org/Spain
  • http://example.org/I23AX45
  • http://example.org/España

SLIDE 9

SLIDE 10

SLIDE 11

The group

SLIDE 12

W3C community group on Best Practices for Multilingual Linked (Open) Data

https://www.w3.org/community/bpmlod


  • Started in June 2013
  • Bi-weekly telcos
  • 3 chairs: José Labra, Jorge Gracia, John McCrae
  • Currently 67 members from academia and industry

SLIDE 13

and many others…

SLIDE 14

Main goals

SLIDE 15

  • Crowdsourcing ideas from the community regarding best practices to produce multilingual linked (open) data.
  • Documenting patterns and best practices for the creation, linking, and use of multilingual linked data.

SLIDE 16

Groups: Linked Data for Language Technologies (LD4LT), BPMLOD, Ontology lexica (Ontolex), Data on the Web Best Practices

Outputs (BP = best practices): use cases, BP for LD in LT, lemon specification, BP for using lemon, BP for Multilingual Data on the Web, BP for Data on the Web

SLIDE 17

Activities

SLIDE 18

Topic classification → Use cases → Patterns → Best practices & guidelines

SLIDE 19

Topic classification

NAMING

SLIDE 20

  • Naming: opaque URIs, descriptive URIs, IRIs, …
  • Textual information: language tags, linguistic information, …
  • Linking: interlanguage links, owl:sameAs, …
  • Ontologies and vocabularies: mono/multilingual vocabularies, ontology localisation, …
  • Quality of MLOD
  • Tools and examples of MLOD
  • Other related aspects: licensing, legal aspects, …

https://www.w3.org/community/bpmlod/wiki/Topic_classification

SLIDE 21

Topic classification → Use cases

SLIDE 22

USE CASES

  • 1. Localization workflow [D. Lewis]
  • 2. Lexicalisation of RDF Datasets [E. Montiel, G. Dunsire]
  • 3. Ontology localisation [E. Montiel, L. Aguado, G. Dunsire]
  • 4. Crosslingual linked data matching [J. Gracia]
  • 5. Machine translation [T. Heuss]
  • 6. Application localization [J. McCrae]

CASE STUDIES

  • 1. Translations of multilingual terminologies for libraries [G. Dunsire]

https://www.w3.org/community/bpmlod/wiki/Use_cases_definition

SLIDE 23

Topic classification → Use cases → Patterns

SLIDE 24

It is difficult to draw a boundary between patterns, best practices, and bad smells.

For now, we identify the main practices; whether a practice is good or bad may depend on the context or use case.

Examples:

  • Patterns for naming and dereferencing

SLIDE 25

Example: which URI should identify Armenia?

  • Descriptive URIs: http://example.org/Armenia
    Pros: human-readable; good tool support.
    Cons: may be unreadable for users of non-Latin alphabets; difficult to be descriptive enough in some contexts; non-ASCII characters must be %-encoded (e.g. http://example.org/Espa%C3%B1a for "España").
  • Opaque URIs: http://example.org/I23AX45
    Pros: independence between concept and language; easier maintenance (changes in text don't affect the URI); suitable for LD generation.
    Cons: not human-readable; difficult for developers to handle.
  • Full IRIs: http://օրինակ.օրգ#Հայաստան
    Pros: readable (for one language).
    Cons: security issues (spoofing); unreadable for speakers of other languages; tool support.
  • Internationalized paths only: http://example.org#Հայաստան
    Pros: path readable (for one language); fewer security issues.
    Cons: unreadable for speakers of other languages.
  • Language in host name: http://hy.example.org#Հայաստան, http://en.example.org#Armenia
    Pros: practical reasons; independent development of datasets by language.
  • Language in path: http://example.org/Armenia.en, http://example.org/en/Armenia, http://example.org/Armenia?lang=en
    Pros: compatible with content negotiation.

  Open question for the language-based options: where should the language tag go? Dialects can become unwieldy, e.g. languages and sublanguages such as hy-Latn-IT-arevela.
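
Several of the options above come down to how non-ASCII names travel over HTTP, where only ASCII URIs are safe. A minimal sketch, assuming the Python standard library only: non-ASCII path, query and fragment characters are %-encoded as UTF-8 (the España example), and a non-ASCII host name is converted with the built-in `idna` codec, which implements IDNA 2003 rather than the newer IDNA 2008. The helper name `iri_to_uri` is hypothetical; for brevity it drops any user-info part, and it keeps `%` as a safe character so already-encoded sequences are not encoded twice.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (simplified; user-info is dropped)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # Non-ASCII host labels become Punycode "xn--" labels via the idna codec
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    # Everything else is %-encoded as UTF-8; "%" stays safe to avoid
    # double-encoding, and reserved characters legal in each part are kept
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe="/%"),
        quote(parts.query, safe="/?=&%"),
        quote(parts.fragment, safe="/?%"),
    ))


if __name__ == "__main__":
    print(iri_to_uri("http://example.org/España"))  # http://example.org/Espa%C3%B1a
    print(iri_to_uri("http://example.org#Հայաստան"))
    print(iri_to_uri("http://օրինակ.օրգ#Հայաստան"))
```

The Punycode host produced for the full-IRI case is what a browser would actually resolve; it is also why homograph spoofing is a concern, since visually similar Unicode hosts map to different ASCII names.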

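
The dialect example hy-Latn-IT-arevela follows the BCP 47 subtag order language, script, region, variant. The sketch below splits such a tag with a deliberately simplified pattern (it ignores extended-language, extension, private-use and grandfathered tags, and does not consult the IANA subtag registry) and shows how the same tag would surface in the language-in-host and language-in-path URI patterns from slide 25. `LanguageTag`, `parse_tag` and `localized_uris` are hypothetical names, and example.org stands in for a real domain.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Simplified BCP 47 shape: language[-script][-region](-variant)*
TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)"
)


@dataclass
class LanguageTag:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: list[str] = field(default_factory=list)


def parse_tag(tag: str) -> LanguageTag:
    """Split a tag into subtags, normalising case as BCP 47 recommends."""
    m = TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    script, region = m.group("script"), m.group("region")
    return LanguageTag(
        language=m.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
        variants=[v.lower() for v in m.group("variants").split("-") if v],
    )


def localized_uris(host: str, name: str, tag: str) -> list[str]:
    """Language-in-host-name and language-in-path variants for one resource."""
    parse_tag(tag)  # reject malformed tags before building URIs
    return [
        f"http://{tag.lower()}.{host}#{name}",  # language in host name
        f"http://{host}/{name}.{tag}",          # language in path (suffix)
        f"http://{host}/{tag}/{name}",          # language in path (segment)
        f"http://{host}/{name}?lang={tag}",     # language as query parameter
    ]


if __name__ == "__main__":
    print(parse_tag("hy-Latn-IT-arevela"))
    for uri in localized_uris("example.org", "Armenia", "hy-Latn-IT-arevela"):
        print(uri)
```

Running it for the dialect tag makes the "unwieldy" concern visible: the host-name variant becomes hy-latn-it-arevela.example.org, while the path variants keep the host stable.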

SLIDE 26

Which data should I return when accessing a URI such as http://example.org/Armenia?

  • No language content negotiation: ignore Accept-Language and return all the data:
    <> rdfs:label "Armenia"@en, "Հայաստան"@hy .
    Pros: easy to develop; consistency of data.
    Cons: clients have to filter out triples in other languages; bandwidth overhead.
  • Language content negotiation:
    Accept-Language: en → <> rdfs:label "Armenia"@en .
    Accept-Language: hy → <> rdfs:label "Հայաստան"@hy .
    Pros: less network overhead.
    Cons: difficult to implement; loses data.
  • Language content redirection:
    Accept-Language: en → 303 See Other, Location: http://example.org/Armenia.en
    Accept-Language: hy → 303 See Other, Location: http://example.org/Armenia.hy
    Pros: keeps the difference between the concept and its language representation.
    Cons: more difficult to implement; not always feasible.
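
To make the trade-offs concrete, the sketch below implements the second and third options for a single resource in standard-library Python. It parses an Accept-Language header (q-values, with header order breaking ties), matches ranges by RFC 4647 basic filtering (a range matches a tag that equals it or extends it with "-"), and either filters the labels or computes a 303 redirect target. The function names, the `.en`/`.hy` suffix convention taken from the slide, and the fallbacks when nothing matches are assumptions for illustration, not behaviour prescribed by the group.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (language-range, q) pairs, highest q first (stable for ties)."""
    ranges = []
    for item in header.split(","):
        lang_range, _, params = item.strip().partition(";")
        if not lang_range.strip():
            continue
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((lang_range.strip().lower(), q))
    return sorted(ranges, key=lambda pair: -pair[1])


def basic_match(tag: str, lang_range: str) -> bool:
    """RFC 4647 basic filtering: equal, or the tag extends the range with '-'."""
    tag = tag.lower()
    return lang_range == "*" or tag == lang_range or tag.startswith(lang_range + "-")


def negotiate_labels(labels: dict[str, str], header: str) -> dict[str, str]:
    """Language content negotiation: keep only the most preferred language."""
    for lang_range, q in parse_accept_language(header):
        if q <= 0:
            continue
        chosen = {t: v for t, v in labels.items() if basic_match(t, lang_range)}
        if chosen:
            return chosen
    return dict(labels)  # assumption: fall back to all the data


def redirect_target(resource: str, header: str, available: list[str]) -> tuple[int, str]:
    """Language content redirection: 303 to a document such as <resource>.en."""
    for lang_range, q in parse_accept_language(header):
        if q <= 0:
            continue
        for tag in available:
            if basic_match(tag, lang_range):
                return 303, f"{resource}.{tag}"
    return 200, resource  # assumption: serve the language-neutral description


if __name__ == "__main__":
    labels = {"en": "Armenia", "hy": "Հայաստան"}
    print(negotiate_labels(labels, "hy"))
    print(redirect_target("http://example.org/Armenia", "fr, en;q=0.8", ["en", "hy"]))
```

Under basic filtering a request for `en-GB` does not match a resource that only offers `en`; the RFC 4647 "lookup" scheme, which progressively truncates the range, is the usual alternative when that fallback is wanted.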

SLIDE 27

Topic classification → Use cases → Patterns → Best practices & guidelines

SLIDE 28

Some (future) examples. Guidelines for:

  • Linguistic Linked Data generation
  • RDF and ontology translation
  • Multilingual Linked Data generation, publication and exploitation
  • ...

SLIDE 29

Where are we now?

SLIDE 30

Topic classification → Use cases → Patterns → Best practices & guidelines

We are here: patterns for textual information

SLIDE 31

Thanks… and get involved!

https://www.w3.org/community/bpmlod


Next telco: Thursday 22nd May 10:00 CEST