GALATEAS - EU Project: Part of the European Commission's Information and Communication Technologies Policy Support Programme

SLIDE 1

Generalized Analysis of Logs for Automatic Translation and Episodic Analysis of Searches

http://www.galateas.eu

SLIDE 2

GALATEAS - EU Project

Part of the European Commission's Information and Communication Technologies Policy Support Programme

(Co-funded by the European Commission for an overall budget of 3.7M Euros)

01/04/2010 to 31/03/2013

Many vendors offer engines that let end users retrieve content and metadata through search requests. These queries are a precious resource for understanding user behaviour. GALATEAS will greatly help to customize these search requests and enable content providers to understand what information users are really looking for. GALATEAS will address two important challenges:

  • Making sense of short queries in any language, and
  • Translating them.

This will help content administrators to answer questions that are crucially important to them, such as:

  • Which topics are most commonly searched in my collection, for a given language?
  • How do these topics relate to my catalogue?
  • Which named entities (people, places) are most popular among my users?

SLIDE 3

GALATEAS - EU Project

Today, content providers cannot customize content and indexing because they do not know their users.

The GALATEAS project offers digital content providers:

– an innovative approach to understanding users' behaviour by analysing language-based information from transaction logs
– technologies facilitating improved navigation and search for multilingual content access

SLIDE 4

GALATEAS - web services

GALATEAS develops two web services:

  • LangLog: It will analyze transaction logs containing queries to search engines for a given content provider. By applying statistical technologies coupled with language-oriented services, it will produce reports concerning the informational needs of the users accessing that particular aggregation. LangLog will provide generalizations of the actions that information seekers perform in order to find contents inside a searchable collection of digital objects.
  • QueryTrans: It will translate queries coming from an external search engine into several target languages, so that the external search engine can return to the user results in languages different from the one in which the query was formulated.
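To make LangLog's reporting step concrete, here is a minimal sketch of the simplest such report: the most frequent query terms per language, i.e. a crude answer to "which topics are most commonly searched, for a given language?". The tab-separated log format, the sample entries and the function name `top_terms_by_language` are hypothetical; LangLog's actual log format and statistics are not described in these slides.

```python
from collections import Counter, defaultdict

# Hypothetical log format: "timestamp<TAB>language<TAB>query" per line.
SAMPLE_LOG = """\
2011-03-01T10:00:00\tfr\tTableau Mona Lisa
2011-03-01T10:02:10\ten\tOil painting la Gioconda
2011-03-01T10:05:42\tit\tLa Gioconda pitturi da Vinci
2011-03-01T10:07:03\ten\tLeonardo da Vinci hydraulics
2011-03-01T10:09:15\ten\tLa Gioconda
"""


def top_terms_by_language(log_text, n=3):
    """Return {language: [(term, count), ...]} with the n most frequent terms."""
    counts = defaultdict(Counter)
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _timestamp, lang, query = line.split("\t", 2)
        counts[lang].update(query.lower().split())
    return {lang: c.most_common(n) for lang, c in counts.items()}


for lang, terms in sorted(top_terms_by_language(SAMPLE_LOG, n=2).items()):
    print(lang, terms)
```

Whitespace tokenization is deliberately naive; a real pipeline would apply the language-oriented processing mentioned above (for example named entity recognition) before counting, so that "la gioconda" is counted as one topic rather than two words.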

SLIDE 5

LangLog - Understand user needs

Challenge - Recognise named entities and deal with multilingual terms in very short texts

  • Query 1: Tableau Mona Lisa (F)
  • Query 2: Oil painting la Gioconda (EN)
  • Query 3: La Gioconda pitturi da Vinci (IT)

Index terms shown in the example diagram include La Gioconda and Oil Painting.

GALATEAS: Identify appropriate index terms according to what the user is looking for
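One simple way to picture the mapping from multilingual queries to index terms is a greedy longest-match lookup against a lexicon that maps surface forms in several languages to a single index term. The lexicon entries (`LEXICON`) and the `index_terms` function are hypothetical; this is a sketch of the idea, not GALATEAS's named entity recognizer.

```python
# Hypothetical multilingual lexicon: lower-cased surface forms (as token
# tuples) in French, English and Italian, each mapped to one index term.
LEXICON = {
    ("mona", "lisa"): "La Gioconda",
    ("la", "gioconda"): "La Gioconda",
    ("oil", "painting"): "Oil Painting",
    ("tableau",): "Painting",
    ("pitturi",): "Painting",
    ("da", "vinci"): "Leonardo da Vinci",
}
MAX_LEN = max(len(phrase) for phrase in LEXICON)


def index_terms(query):
    """Map a short query to index terms by greedy longest-match lookup."""
    tokens = query.lower().split()
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for size in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            term = LEXICON.get(tuple(tokens[i:i + size]))
            if term is not None:
                if term not in found:
                    found.append(term)
                i += size
                break
        else:
            i += 1  # no lexicon entry starts here; skip this token
    return found


print(index_terms("Tableau Mona Lisa"))  # ['Painting', 'La Gioconda']
```

Trying longer phrases first keeps "mona lisa" from being split into two unrelated tokens, which matters for very short queries where a single multi-word entity may be the whole query.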

SLIDE 6

LangLog - Customise according to user needs

Example queries with their classes (Query ID: query, class):

  • Query 1: Leonardo da Vinci, La Gioconda (Art)
  • Query 2: Oil painting, la Gioconda (Art)
  • Query 3: La Gioconda, pitturi, da Vinci (Art)
  • Leonardo da Vinci, Vitruvian Man (Science)
  • hydraulics, hydrometer (Science)
  • Query 5: Leonardo da Vinci, meteorology (Science)

Previously unseen query:

  • Query X: Leonardo da Vinci, hydraulics (Science)

GALATEAS Challenge – Perform classification and clustering with short query texts: assign to previously unseen queries a class from your indexing hierarchy
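As an illustration of assigning a class to a previously unseen short query, here is a nearest-centroid classifier over bag-of-words vectors with cosine similarity, trained on the example queries and classes from this slide. It is a deliberately simple stand-in chosen for the sketch, not the classification method GALATEAS uses; `train`, `classify` and the other names are hypothetical.

```python
import math
import re
from collections import Counter

# Example queries and classes from the slide, used here as training data.
TRAINING = [
    ("Leonardo da Vinci, La Gioconda", "Art"),
    ("Oil painting, la Gioconda", "Art"),
    ("La Gioconda, pitturi, da Vinci", "Art"),
    ("Leonardo da Vinci, Vitruvian Man", "Science"),
    ("hydraulics, hydrometer", "Science"),
    ("Leonardo da Vinci, meteorology", "Science"),
]


def vectorize(text):
    """Bag-of-words term counts for a short query."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(examples):
    """Sum term counts per class (same direction as the class mean vector)."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids


def classify(query, centroids):
    """Return the class whose centroid is most similar to the query."""
    vec = vectorize(query)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


centroids = train(TRAINING)
print(classify("Leonardo da Vinci, hydraulics", centroids))  # Science
```

With only a handful of words per query, term overlap is sparse: a query sharing no terms with any class scores 0 everywhere and `classify` simply returns the first class. Real systems would typically enrich queries (for example with synonyms) before comparing them.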

SLIDE 7

QueryTrans – Query in multiple languages

  • User query: I want a picture about “le radeau de la Méduse” (F)
  • Direct search: no answer
  • GALATEAS query translation, using machine translation resources (“the raft of the Medusa?”): more answers
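The QueryTrans scenario can be sketched end to end: a search over a hypothetical English-only catalogue finds nothing for the French query, while the same search on the translated query finds a match. A tiny hand-written French-to-English phrase table with greedy longest-match lookup stands in here for real machine translation resources (the project lists MOSES); `CATALOGUE`, `FR_EN`, `translate` and `search` are all hypothetical.

```python
import re

# Hypothetical English-language catalogue: document id -> description.
CATALOGUE = {
    "img-001": "The Raft of the Medusa, oil painting by Théodore Géricault",
    "img-002": "Mona Lisa (La Gioconda), Leonardo da Vinci",
}

# Hypothetical French -> English phrase table standing in for MT resources.
FR_EN = {
    ("le", "radeau", "de", "la", "méduse"): "the raft of the medusa",
    ("tableau",): "painting",
}


def words(text):
    """Lower-cased word tokens (\\w is Unicode-aware, so accents are kept)."""
    return re.findall(r"\w+", text.lower())


def search(query, catalogue):
    """Return ids of documents containing every word of the query."""
    terms = set(words(query))
    return [doc_id for doc_id, text in catalogue.items()
            if terms <= set(words(text))]


def translate(query, table):
    """Greedy longest-match phrase lookup; unknown words pass through."""
    tokens = query.lower().split()
    max_len = max(len(phrase) for phrase in table)
    out, i = [], 0
    while i < len(tokens):
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + size])
            if phrase in table:
                out.append(table[phrase])
                i += size
                break
        else:
            out.append(tokens[i])  # no entry: keep the word untranslated
            i += 1
    return " ".join(out)


query = "le radeau de la Méduse"
print(search(query, CATALOGUE))                    # []
print(search(translate(query, FR_EN), CATALOGUE))  # ['img-001']
```

Unknown words are passed through untranslated, so a partially covered query still reaches the search engine; translating into several target languages would repeat the same step with one table (or MT model) per language.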

SLIDE 8

Sources

  • Our sources are the transaction logs of specialised content providers, which contain:

– Information that is already structured, such as:

  • Session data
  • Clickthrough data
  • Digital content providers' structured information hierarchies

– Unstructured information:

  • The queries themselves
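One common way to derive session data from raw log lines is to group each user's queries in time order and start a new session after a period of inactivity. The sketch assumes records of the form (user, ISO timestamp, query) and a 30-minute timeout, a widespread heuristic rather than a value taken from GALATEAS; `LOG` and `sessionize` are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical log records: (user id, ISO 8601 timestamp, query).
LOG = [
    ("u1", "2011-03-01T10:00:00", "Tableau Mona Lisa"),
    ("u1", "2011-03-01T10:04:00", "La Gioconda"),
    ("u2", "2011-03-01T10:05:00", "radeau de la Méduse"),
    ("u1", "2011-03-01T11:30:00", "Leonardo da Vinci hydraulics"),
]


def sessionize(records, timeout=timedelta(minutes=30)):
    """Group each user's queries into sessions split by gaps > timeout."""
    sessions = []
    prev_user, prev_time = None, None
    # Same-format ISO timestamps sort correctly as strings.
    for user, ts, query in sorted(records, key=lambda r: (r[0], r[1])):
        t = datetime.fromisoformat(ts)
        if user != prev_user or t - prev_time > timeout:
            sessions.append((user, []))  # start a new session
        sessions[-1][1].append(query)
        prev_user, prev_time = user, t
    return sessions


for user, queries in sessionize(LOG):
    print(user, queries)
```

Reformulations within one session (here "Tableau Mona Lisa" followed by "La Gioconda") are exactly the kind of signal that makes session data useful for understanding what a user was really looking for.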

SLIDE 9

Technologies

GALATEAS will uniquely combine:

  • Language resources

– Bilingual dictionaries, word lists, synonyms

  • State-of-the-art natural language processing tools

– E.g. Xerox Incremental Parser (XIP)

  • Data mining and log management components

– Extract, Transform, Load (ETL) tools

  • Query expansion, classification and clustering systems

– E.g. CLUTO

  • Machine translation software

– MOSES

SLIDE 10

Technologies

All technologies are incorporated in a web services framework that allows easy integration of third-party technologies and great extensibility.

Architecture diagram (labels only): Customer; Web services; GALATEAS core; Query logs; Original query; Translated query; Customised reports; Semantic services (semantic similarity); Natural Language Processing services (Named Entity Recognition / Part of Speech Tagging).
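The extensibility idea can be illustrated with a minimal plug-in pattern: each service is a function registered under a name, and a pipeline runs named services in order, so a third-party component is added by registering one more function. This is a hypothetical in-process sketch (`register`, `run_pipeline`, the gazetteer-based `toy_ner`), not the GALATEAS web services API, which would expose such services over the network.

```python
from typing import Callable, Dict, List

# A service takes a document (dict) and returns an enriched copy.
Service = Callable[[dict], dict]
REGISTRY: Dict[str, Service] = {}


def register(name: str) -> Callable[[Service], Service]:
    """Decorator that makes a service available under `name`."""
    def wrap(fn: Service) -> Service:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("tokenize")
def tokenize(doc: dict) -> dict:
    return {**doc, "tokens": doc["query"].split()}


# Toy stand-in for a Named Entity Recognition service: gazetteer lookup.
GAZETTEER = ("Leonardo da Vinci", "La Gioconda", "Vitruvian Man")


@register("toy_ner")
def toy_ner(doc: dict) -> dict:
    text = doc["query"].lower()
    return {**doc, "entities": [e for e in GAZETTEER if e.lower() in text]}


def run_pipeline(query: str, steps: List[str]) -> dict:
    """Run the named services in order, each one enriching the document."""
    doc: dict = {"query": query}
    for name in steps:
        doc = REGISTRY[name](doc)
    return doc


print(run_pipeline("Leonardo da Vinci, La Gioconda", ["tokenize", "toy_ner"]))
```

Because services only agree on the document-in, document-out contract, replacing `toy_ner` with a call to a real NER or part-of-speech service would not change the pipeline code.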

SLIDE 11

GALATEAS partners

  • Project coordinator: Xerox Research Centre Europe (France)
  • Objet Direct (France)
  • CELI (Italy)
  • University of Trento (Italy)
  • Gonetwork (Italy)
  • Bridgeman Art (England)
  • Humboldt University (Germany)
  • University of Amsterdam (Netherlands)

http://www.galateas.eu

SLIDE 12

Questions?

http://www.galateas.eu

Contact: Frédérique Segond, Xerox Research Centre Europe, 6 chemin de Maupertuis, 38240 Meylan, France
frederique.segond@xrce.xerox.com