A FRAMEWORK FOR MULTILINGUAL AND SEMANTIC ENRICHMENT OF DIGITAL CONTENT


slide-1
SLIDE 1 FREME Webinar for GALA – April 2016 WWW.FREME-PROJECT.EU 1 Co-funded by the Horizon 2020 Framework Programme of the European Union Grant Agreement Number 644771

FREME WEBINAR HELD FOR GALA, 28 APRIL 2016

A FRAMEWORK FOR MULTILINGUAL AND SEMANTIC ENRICHMENT OF DIGITAL CONTENT (NEW L10N BUSINESS OPPORTUNITIES)

www.freme-project.eu

Presented by Tatjana Gornostaja (Tilde) and Felix Sasaki (DFKI / W3C Fellow)

slide-2

OVERVIEW

  • Introduction
  • Technological aspects of the framework
  • Localization and other FREME business cases
  • Q&A
slide-3

Coupling Knowledge and Language via e-Service Ecosystem

slide-4

Knowledge Language

slide-5

Knowledge Language

slide-6

Knowledge Language

slide-7

FREME

Picture: coloringpageswallpaper.com

slide-8

THE FREME PROJECT

  • Two-year H2020 Innovation Action; start February 2015
  • Industry partners leading four business cases around digital content and (linked) data
  • Technology development bridging language and data
  • Outreach and business modelling demonstrating monetization of the multilingual data value chain

slide-9

CURRENT STATE OF SOLUTIONS

Machine translation, terminology annotation, ... Linked data creation & processing

GAPS THAT HINDER BUSINESS:

  • Plethora of formats
  • Adaptability and platform dependency
  • Language coverage
  • Usability

“The right tool for the right person in given and new enterprises”: technology influences job profiles

slide-10

FREME TO THE RESCUE: ENRICHING DIGITAL CONTENT

Machine translation, terminology annotation, ... Linked data creation & processing

Language technology (LT) and linked data (LD) as first-class citizens on the Web

A SET OF INTERFACES* - DESIGN DRIVEN BY BUSINESS CASES

LT and LD for various user types: (application) developer, content architect, content author, …

* Graphical interfaces and software interfaces

slide-11
slide-12

OVERVIEW

  • Introduction
  • Technological aspects of the framework
  • Localization and other FREME business cases
  • Q&A
slide-13

FREME FROM A TECHNICAL PERSPECTIVE

A framework for multilingual and semantic enrichment of digital content that provides access via a set of APIs and GUIs to six e-Services:

  • e-Entity for enriching content with information on named entities;
  • e-Link for enrichment with linked data sources;
  • e-Terminology for detecting terms and enriching them with term-related information;
  • e-Translation for providing custom machine translation systems;
  • e-Internationalisation for processing a variety of digital content formats; and
  • e-Publishing for exporting the outcome of enrichment processes in the ePub format.

slide-14

FREME FROM A TECHNICAL PERSPECTIVE

How to access FREME – several options:

  • A live version 0.5 (0.6 soon to be released!) including documentation at http://api.freme-project.eu/doc/current/
  • A development version at http://api-dev.freme-project.eu/doc/
  • A Java / Maven software package; see the documentation for installation instructions
  • Source code in a GitHub project: https://github.com/freme-project/
  • The framework is available under the Apache 2.0 license to ease commercial use
  • Underlying services have various licensing conditions
slide-15

LINGUISTIC LINKED DATA AND OTHER STANDARDS PUT IN ACTION VIA FREME

  • NIF (Natural Language Processing Interchange Format) for representing digital content and enrichment information in a format-agnostic manner, based on the linked data stack;
  • OntoLex lemon for representing lexical information, to be used e.g. for improving machine translation output;
  • Internationalization Tag Set 2.0 for representing various types of enrichment information in a standardized manner, related e.g. to terminology and named entities; and
  • The general linked data technology stack (RDF, SPARQL etc.)

FREME is built on outcomes of standards-driving FP7 projects in the area of linguistic linked data: LIDER and FALCON

  • Cf. http://lider-project.eu/ and http://falcon-project.eu/
slide-16

EXAMPLE API CALL

  • The request is made to the API for the e-Entity service, a service that enriches content with named entities.
  • The input format of the content is plain text; the output format is Turtle.
  • The content to enrich is “Welcome to the city of Prague”.
  • The language of the content is English.
  • The dataset used for the enrichment is DBpedia.
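The request described above could be assembled roughly as follows. This is a minimal sketch, not the documented call: the endpoint path and the parameter names (language, dataset, informat, outformat) are assumptions made for illustration and should be checked against the FREME API documentation. The code only builds the request; it does not send it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint for the e-Entity service (hypothetical; verify in the FREME docs).
E_ENTITY_ENDPOINT = "http://api.freme-project.eu/current/e-entity/freme-ner/documents"


def build_entity_request(text, language="en", dataset="dbpedia",
                         informat="text", outformat="turtle"):
    """Return (url, body) for a POST request; parameter names are assumed."""
    query = urlencode({
        "language": language,    # language of the content
        "dataset": dataset,      # dataset used for the enrichment
        "informat": informat,    # input format: plain text
        "outformat": outformat,  # output format: Turtle
    })
    return f"{E_ENTITY_ENDPOINT}?{query}", text.encode("utf-8")


url, body = build_entity_request("Welcome to the city of Prague")
print(url)
```

The content travels in the request body and the four settings from the slide travel as query parameters, so switching dataset or output format only changes one argument.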
slide-17

EXAMPLE OUTPUT: USING NIF TO STORE CONTENT …

(1) <http://freme-project.eu/#char=0,29>
(2)     a nif:String , nif:Context , nif:RFC5147String ;
(3)     nif:beginIndex "0"^^xsd:int ;
(4)     nif:endIndex "29"^^xsd:int ;
(5)     nif:isString "Welcome to the city of Prague"^^xsd:string .

  1) Identifying the content via a URI
  2) Adding certain types from NIF*
  3) Identifying the start offset of the content
  4) Identifying the end offset of the content
  5) Providing the string content itself

* For more on NIF, see a dedicated tutorial: http://de.slideshare.net/m1ci/nif-tutorial

slide-18

… AND ENRICHMENT INFORMATION

(1) <http://freme-project.eu/#char=23,29> …
(2)     nif:anchorOf "Prague"^^xsd:string ;
(3)     nif:beginIndex "23"^^xsd:int ;
(4)     nif:endIndex "29"^^xsd:int ;
(5)     nif:referenceContext <http://freme-project.eu/#char=0,29> ;
(6)     itsrdf:taClassRef <http://dbpedia.org/ontology/City> .

  1) Identifying the annotation via a URI
  2) Providing the string content of the annotation
  3) Identifying the start offset of the content
  4) Identifying the end offset of the content
  5) Relating the content to annotations
  6) Enrichment with ITS 2.0 class information (“Prague” = a city)
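The offsets and URIs in the NIF output follow directly from character positions in the content string. The sketch below recomputes them for the example; the function names are illustrative and not part of any FREME or NIF library, and it assumes the annotated text occurs once in the content.

```python
def nif_uri(base, begin, end):
    """Build a character-offset URI of the form <base>#char=begin,end."""
    return f"{base}#char={begin},{end}"


def annotate(context, anchor, base="http://freme-project.eu/"):
    """Locate anchor in context and return the offset-based NIF properties."""
    begin = context.index(anchor)  # raises ValueError if the anchor is absent
    end = begin + len(anchor)      # the end offset is exclusive
    return {
        "uri": nif_uri(base, begin, end),
        "anchorOf": anchor,
        "beginIndex": begin,
        "endIndex": end,
        "referenceContext": nif_uri(base, 0, len(context)),
    }


context = "Welcome to the city of Prague"
annotation = annotate(context, "Prague")
print(annotation["uri"])               # http://freme-project.eu/#char=23,29
print(annotation["referenceContext"])  # http://freme-project.eu/#char=0,29
```

Because the annotation URI encodes its own offsets, a consumer can recover the annotated substring from the context alone: context[23:29] is "Prague".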

slide-19

SIMPLIFIED OUTPUT HELPS API DEVELOPERS TO CONSUME LINKED DATA

  • FREME provides a user-specified filter mechanism to simplify the output
  • Supports CSV, XML or JSON
  • Example output as CSV:

http://dbpedia.org/resource/Prague,50.0878367932108,14.4241322001241

For more information on filtering, see http://api.freme-project.eu/doc/current/knowledge-base/filtering.html
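A filtered CSV result of this shape can be consumed without any RDF tooling. The sketch below parses the example row with the standard csv module. Reading the three columns as resource URI, latitude and longitude is an assumption (the slide does not label them), and the sample value is written without the gap in the last number.

```python
import csv
import io

# Example row from the slide; the column meanings below are assumed, not documented.
sample = "http://dbpedia.org/resource/Prague,50.0878367932108,14.4241322001241\n"


def parse_filtered_rows(csv_text):
    """Parse rows of (resource, latitude, longitude) into dictionaries."""
    rows = []
    for resource, lat, lon in csv.reader(io.StringIO(csv_text)):
        rows.append({"resource": resource, "lat": float(lat), "long": float(lon)})
    return rows


rows = parse_filtered_rows(sample)
print(rows[0]["resource"], rows[0]["lat"], rows[0]["long"])
```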

slide-20

FORMAT COVERAGE

  • Processing of various content formats
  • NIF, RDF, Text, HTML, OpenOffice, XLIFF 1.2, various XML formats, …
  • Many formats are processed via e-Internationalisation services
  • Format specified in API call as input and (partially supported) output

slide-21

USING E-TERMINOLOGY WITH HTML OUTPUT

Input:

<!DOCTYPE html> … <body> <p>Welcome to the city of Prague.</p> </body> … </html>

Call of e-Terminology

Output:

<!DOCTYPE html> … <p>Welcome to the <span its-term="yes">city</span> of Prague. …</html>
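On the consuming side, the terms marked in such HTML output can be collected with a few lines of standard-library code. The sketch below is illustrative only: the TermCollector class is not part of FREME, and it assumes that each element carrying its-term="yes" contains plain text with no nested markup.

```python
from html.parser import HTMLParser


class TermCollector(HTMLParser):
    """Collect the text of elements marked with its-term="yes"."""

    def __init__(self):
        super().__init__()
        self.terms = []
        self._open_tag = None  # tag name of the term element being read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("its-term") == "yes":
            self._open_tag = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if self._open_tag == tag:
            self.terms.append("".join(self._buffer))
            self._open_tag = None

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)


enriched = ('<!DOCTYPE html><html><body><p>Welcome to the '
            '<span its-term="yes">city</span> of Prague.</p></body></html>')
collector = TermCollector()
collector.feed(enriched)
print(collector.terms)  # ['city']
```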

slide-22

TRANSLATING XLIFF CONTENT WITH E-TRANSLATION

...<trans-unit> <source>This is a car</source> </trans-unit> ...

Call of e-Translation

<http://freme-project.eu/#char=0,13>
    nif:isString "This is a car"@en ;
    itsrdf:target "Dies ist ein Auto"@de .
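A client that receives such a translation may want to write it back into the XLIFF file. The sketch below shows one way to do that with the standard library. It is a client-side illustration, not something the FREME API is stated to do: the sample document, the add_targets helper and the fixed lookup standing in for the e-Translation call are all hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize without a namespace prefix

SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en" target-language="de" datatype="plaintext" original="demo.txt">
    <body>
      <trans-unit id="1">
        <source>This is a car</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def add_targets(xliff_text, translate):
    """Append a <target> to every trans-unit, using translate(source_text)."""
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        source = unit.find(f"{{{XLIFF_NS}}}source")
        target = ET.SubElement(unit, f"{{{XLIFF_NS}}}target")
        target.text = translate(source.text)
    return ET.tostring(root, encoding="unicode")


# Stand-in for the e-Translation call: a fixed lookup, not a real MT system.
fake_mt = {"This is a car": "Dies ist ein Auto"}.get
result = add_targets(SAMPLE, fake_mt)
print(result)
```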

slide-23

IMPROVING E-TRANSLATION OUTPUT VIA E-TERMINOLOGY

“The EU in brief. The EU is a unique economic and political partnership between 28 European countries that together cover much of the continent.”

continent, partnership, briefing, economics, covering

Call of e-Terminology: detection of translation suggestions

De voorschriften in DE EU. De EU is een uniek partnerschap tussen politiek en economie in de Europese landen, die gezamenlijk 28 verpakking van het continent.

Call of e-Translation: improved output!

slide-24

OVERVIEW

  • Introduction
  • Technological aspects of the framework
  • Localization and other FREME business cases
  • Q&A
slide-25

MOTIVATION

  • Aid translators
  • Supplement typical linguistic support tools like glossary look-up with entity recognition and term disambiguation
  • Possibility to introduce proprietary and domain-specific semantic datasets
  • Provide “Value-Add” to customers
  • Make their content more interactive, compelling and discoverable
  • Open up service offerings to new customers from existing and new channels

slide-26

TRANSLATOR SUPPORT

  • Automatic machine translation suggestions
  • Automatic terminology look-up
  • Includes definitions
  • Automatic Entity Recognition
  • Includes many textual and visual contextual properties: descriptions, images, links to other resources…

slide-27

CUSTOMER VALUE-ADD

  • Relationships can be formed between new content and existing knowledge resources
  • Utilize open and private Multilingual Linked Data Cloud

DBpedia

Proprietary dataset Translated Content

slide-28

BUSINESS BENEFITS

  • Technological Support to Content Authors and Localizers
  • Aid with the cognitive and physical tasks of finding and employing the most appropriate terminology
  • Opens up Conversations with New Customers
  • Deliver semantically richer, more interactive, highly sociable and discoverable content
  • Through integration, enrichment added automatically can be validated by a human and saved with the content
  • Demonstrates Vistatec thought leadership to customers looking for service differentiators and value add

slide-29

CHALLENGE AND OPPORTUNITY: BIG DATA IS GROWING ACROSS LANGUAGES, SECTORS AND DOMAINS

  • BC: Digital publishing
  • BC: Translation and localisation
  • BC: Agriculture and food domain data
  • BC: Web site personalisation

Agriculture metadata, user content, news content, …

WHAT LIES AHEAD FOR SEVERAL INDUSTRIES? SEE THE FREME BUSINESS CASES

EN ES JA, ZH, ... AR

slide-30

DIGITAL PUBLISHING

With a simple click you can fetch extra information from a dataset and use it to annotate content.

slide-31

AGRICULTURE AND FOOD DATA

Domain experts can automatically extract terms from title, description, abstracts and full text.

slide-32

PERSONALISATION OF WEB CONTENT

Businesses can identify the topics their customers are engaging with, focusing their global content strategy.

slide-33

CONTACTS

E-mail: info@freme-project.eu felix.sasaki@dfki.de tatjana.gornostaja@tilde.com

CONSORTIUM

slide-34

OVERVIEW

  • Introduction
  • Technological aspects of the framework
  • Localization and other FREME business cases
  • Q&A