Generalized Analysis of Logs for Automatic Translation and Episodic Analysis of Searches (GALATEAS)
http://www.galateas.eu
GALATEAS - EU Project
Part of the European Commission's Information and Communication Technologies Policy Support Programme
(Co-funded by the European Commission for an overall budget of 3.7M Euros)
01/04/2010 to 31/03/2013
Many vendors offer engines that let end users retrieve content and metadata through search requests. These queries are a valuable resource for understanding user behaviour. GALATEAS will help customize these search requests and enable content providers to understand what information users are really looking for. GALATEAS will address two important challenges:
- Making sense of short queries in any language, and
- Translating them.
This will help content administrators to answer questions that are crucially important to them, such as:
- Which topics are most commonly searched in my collection, in a given language?
- How do these topics relate to my catalogue?
- Which named entities (people, places) are most popular among my users?
Today, content providers cannot customize their content and indexing because they do not know their users.
The GALATEAS project offers digital content providers:
– an innovative approach to understanding users' behaviour by analysing language-based information from transaction logs
– technologies facilitating improved navigation and search for multilingual content access
GALATEAS - web services
GALATEAS develops two web services:
- LangLog: analyses transaction logs containing the queries submitted to the search engine of a given content provider. By applying statistical technologies coupled with language-oriented services, it will produce reports on the informational needs of the users accessing that particular aggregation. LangLog will provide generalizations of the actions that information seekers perform in order to find content inside a searchable collection of digital objects.
- QueryTrans: translates queries coming from an external search engine into several target languages, so that the external search engine can return results in languages other than the one in which the query was formulated.
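As an illustration of the kind of statistic a LangLog report might contain (for example, "which queries are most common in a given language?"), here is a minimal Python sketch that counts normalised queries per language. It assumes the log has already been reduced to `(language, query)` pairs, i.e. that language identification happened upstream; the function name `top_queries_by_language` is hypothetical and not part of LangLog.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_queries_by_language(
    log: Iterable[Tuple[str, str]], k: int = 3
) -> Dict[str, List[Tuple[str, int]]]:
    """Return the k most frequent queries for each language.

    Queries are normalised by lower-casing and collapsing whitespace,
    so "Mona Lisa" and "mona  lisa" are counted together.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for lang, query in log:
        counts[lang][" ".join(query.lower().split())] += 1
    return {lang: c.most_common(k) for lang, c in counts.items()}
```

Counting exact normalised strings is the crudest possible aggregation; grouping by topic rather than by string would need the index-term mapping and classification steps discussed on the following slides.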
LangLog - Understand user needs
Challenge - Recognise named entities and deal with multilingual terms in very short texts
Example queries:
- Query 1: Tableau Mona Lisa (F)
- Query 2: Oil painting la Gioconda (EN)
- Query 3: La Gioconda pitturi da Vinci (IT)
Candidate index terms: La Gioconda, Oil Painting, Painting
GALATEAS: identify appropriate index terms according to what the user is looking for
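In its simplest form, mapping "Mona Lisa", "la Gioconda" or "Tableau" to a shared index term is a multilingual lexicon lookup. The Python sketch below assumes a hypothetical lexicon (`LEXICON`) and applies greedy longest-match over the query tokens, so multi-word names such as "mona lisa" win over single words. It only illustrates the mapping problem; it does not reproduce the project's named entity recognition services.

```python
from typing import Dict, List

# Hypothetical multilingual lexicon: surface form (any language) -> index term
LEXICON: Dict[str, str] = {
    "mona lisa": "La Gioconda",
    "la gioconda": "La Gioconda",
    "gioconda": "La Gioconda",
    "tableau": "Painting",
    "pitturi": "Painting",
    "oil painting": "Oil Painting",
    "da vinci": "Leonardo da Vinci",
}


def index_terms(query: str, lexicon: Dict[str, str] = LEXICON) -> List[str]:
    """Map a short query to index terms by greedy longest-match lookup."""
    tokens = query.lower().split()
    max_len = max(len(k.split()) for k in lexicon)
    found: List[str] = []
    i = 0
    while i < len(tokens):
        # try the longest phrase starting at token i first
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                if lexicon[phrase] not in found:
                    found.append(lexicon[phrase])
                i += n
                break
        else:
            i += 1  # no lexicon entry starts here: skip the token
    return found
```

For example, `index_terms("Tableau Mona Lisa")` yields `["Painting", "La Gioconda"]`, and the Italian query "La Gioconda pitturi da Vinci" also resolves to "La Gioconda".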
LangLog - Customise according to user needs
Query ID | Query | Query Class
Query 1 | Leonardo da Vinci, La Gioconda | Art
Query 2 | Oil painting, la Gioconda | Art
Query 3 | La Gioconda, pitturi, da Vinci | Art
? | Leonardo da Vinci, Vitruvian Man | Science
? | hydraulics, hydrometer | ?
Query 5 | Leonardo da Vinci, meteorology | Science
Query X (unseen) | Leonardo da Vinci, hydraulics | Science
GALATEAS Challenge – Perform classification and clustering with short query texts: assign to previously unseen queries a class from your indexing hierarchy
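Assigning a class to an unseen short query such as "Leonardo da Vinci, hydraulics" can be approached with many classifiers. The Python sketch below uses multinomial Naive Bayes over comma-separated query terms, trained on the example queries from this slide. It is a stand-in technique chosen for brevity, not the classifier GALATEAS uses, and the class of "hydraulics, hydrometer" is assumed to be Science for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def terms(query: str) -> List[str]:
    """Split a comma-separated query into lower-cased terms."""
    return [t.strip().lower() for t in query.split(",") if t.strip()]


class QueryClassifier:
    """Multinomial Naive Bayes over query terms, with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        self.term_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, labelled: Iterable[Tuple[str, str]]) -> "QueryClassifier":
        for query, label in labelled:
            ts = terms(query)
            self.class_counts[label] += 1
            self.term_counts[label].update(ts)
            self.vocab.update(ts)
        return self

    def log_scores(self, query: str) -> Dict[str, float]:
        n_queries = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, n in self.class_counts.items():
            total = sum(self.term_counts[label].values())
            score = math.log(n / n_queries)  # log prior of the class
            for t in terms(query):
                # smoothing gives unseen terms a small non-zero probability
                p = (self.term_counts[label][t] + self.alpha) / (total + self.alpha * v)
                score += math.log(p)
            scores[label] = score
        return scores

    def predict(self, query: str) -> str:
        scores = self.log_scores(query)
        return max(scores, key=scores.get)


TRAINING = [
    ("Leonardo da Vinci, La Gioconda", "Art"),
    ("Oil painting, la Gioconda", "Art"),
    ("La Gioconda, pitturi, da Vinci", "Art"),
    ("Leonardo da Vinci, Vitruvian Man", "Science"),
    ("Leonardo da Vinci, meteorology", "Science"),
    ("hydraulics, hydrometer", "Science"),  # class assumed for illustration
]
```

With `clf = QueryClassifier().fit(TRAINING)`, the shared term "Leonardo da Vinci" appears in both classes, so it is "hydraulics" that tips `clf.predict("Leonardo da Vinci, hydraulics")` towards Science.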
QueryTrans – Query in multiple languages
User query: I want a picture of "le radeau de la Méduse" (F)
- Direct search: no answer
- After query translation ("the raft of the Medusa"): more answers
GALATEAS: machine translation resources
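The QueryTrans idea (translate the query, then search again in the target languages) can be sketched with a toy phrase table and greedy longest-match translation. Everything here is a hypothetical placeholder: the phrase table `PHRASES`, the two-document `INDEX`, and the substring search. A real deployment would call a statistical machine translation system such as MOSES rather than a lookup table.

```python
import unicodedata
from typing import Dict, List, Tuple

# Hypothetical phrase table: (source lang, target lang) -> {source phrase: target phrase}
PHRASES: Dict[Tuple[str, str], Dict[str, str]] = {
    ("fr", "en"): {"le radeau de la méduse": "the raft of the medusa", "tableau": "painting"},
    ("fr", "it"): {"le radeau de la méduse": "la zattera della medusa", "tableau": "quadro"},
}

# Hypothetical two-document index (English and Italian titles only)
INDEX: Dict[str, str] = {
    "doc-1": "The Raft of the Medusa (Géricault)",
    "doc-2": "La zattera della Medusa (Géricault)",
}


def normalise(text: str) -> str:
    """Unicode-normalise (NFC) and lower-case, so accented forms compare equal."""
    return unicodedata.normalize("NFC", text).lower()


def translate(query: str, src: str, tgt: str) -> str:
    """Greedy longest-match phrase translation; unknown words are copied through."""
    phrases = PHRASES.get((src, tgt), {})
    max_len = max((len(p.split()) for p in phrases), default=1)
    tokens = normalise(query).split()
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            chunk = " ".join(tokens[i:i + n])
            if chunk in phrases:
                out.append(phrases[chunk])
                i += n
                break
        else:
            out.append(tokens[i])  # no phrase starts here: keep the word as-is
            i += 1
    return " ".join(out)


def search(query: str) -> List[str]:
    """Toy search: ids of documents whose text contains the whole query."""
    q = normalise(query)
    return sorted(doc_id for doc_id, text in INDEX.items() if q in normalise(text))


def multilingual_search(query: str, src: str, targets: List[str]) -> Dict[str, List[str]]:
    """Search with the original query, then with its translation into each target language."""
    results = {src: search(query)}
    for tgt in targets:
        results[tgt] = search(translate(query, src, tgt))
    return results
```

On this toy index, `multilingual_search("Le radeau de la Méduse", "fr", ["en", "it"])` finds nothing for the French query and one document for each translation, mirroring the "no answer / more answers" contrast on the slide.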
Sources
- Our sources are the transaction logs of specialised content providers, which contain:
  – Information that is already structured, such as:
    - Session data
    - Clickthrough data
    - Digital content providers’ structured information hierarchies
  – Unstructured information:
    - The queries themselves
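When a log does not already carry explicit session identifiers, a common heuristic is to split each user's activity wherever there is a long pause between requests. The Python sketch below assumes a hypothetical record layout (`LogRecord` with user id, timestamp, query and an optional clicked result for clickthrough data) and a 30-minute inactivity timeout, which is a conventional choice rather than a GALATEAS parameter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby
from typing import List, Optional


@dataclass(frozen=True)
class LogRecord:
    """One line of a (hypothetical) search transaction log."""
    user_id: str
    timestamp: datetime
    query: str
    clicked: Optional[str] = None  # clickthrough: id of the result opened, if any


def sessionize(
    records: List[LogRecord], timeout: timedelta = timedelta(minutes=30)
) -> List[List[LogRecord]]:
    """Group each user's records into sessions, starting a new session
    whenever the gap between consecutive records exceeds `timeout`."""
    sessions: List[List[LogRecord]] = []
    ordered = sorted(records, key=lambda r: (r.user_id, r.timestamp))
    for _, user_records in groupby(ordered, key=lambda r: r.user_id):
        current: List[LogRecord] = []
        for rec in user_records:
            if current and rec.timestamp - current[-1].timestamp > timeout:
                sessions.append(current)
                current = []
            current.append(rec)
        if current:
            sessions.append(current)
    return sessions
```

`groupby` only groups adjacent items, which is why the records are sorted by user and time first.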
Technologies
GALATEAS will uniquely combine:
- Language resources
  – Bilingual dictionaries, word lists, synonyms
- State-of-the-art natural language processing tools
  – E.g. Xerox Incremental Parser (XIP)
- Data mining and log management components
  – Extract, Transform, Load (ETL) tools
- Query expansion, classification and clustering systems
  – E.g. CLUTO
- Machine translation software
  – MOSES
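Combining a language resource (a synonym list) with query expansion can be pictured as rewriting each query term into a disjunction of its known variants. A minimal Python sketch, assuming a hypothetical multilingual synonym table and a generic Boolean syntax (`OR`, `AND`, quoted phrases); real search engines differ in syntax, and this is not the project's expansion system.

```python
from typing import Dict, List


def expand_query(query: str, synonyms: Dict[str, List[str]]) -> str:
    """Rewrite each whitespace token as (token OR synonym1 OR ...), joined by AND."""
    clauses = []
    for token in query.lower().split():
        variants = [token] + [s for s in synonyms.get(token, []) if s != token]
        # quote multi-word variants so an engine treats them as phrases
        quoted = [f'"{v}"' if " " in v else v for v in variants]
        clauses.append(quoted[0] if len(quoted) == 1 else "(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Hypothetical multilingual synonym table
SYNONYMS: Dict[str, List[str]] = {
    "painting": ["tableau", "dipinto"],
    "gioconda": ["mona lisa"],
}
```

For instance, `expand_query("Gioconda painting", SYNONYMS)` produces `(gioconda OR "mona lisa") AND (painting OR tableau OR dipinto)`; terms without synonyms are left unchanged.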
Technologies
All technologies are incorporated in a web services framework that allows easy integration of third-party technologies and great extensibility.
[Architecture diagram: Customer; GALATEAS core; web services; semantic services (semantic similarity); natural language processing services (named entity recognition, part-of-speech tagging). Data exchanged: query logs, original query, translated query, customised reports.]
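One way to picture a framework that lets third-party technologies be plugged in is a registry mapping service names to handler functions behind a single HTTP entry point. The sketch below uses Python's standard WSGI interface; the `register` decorator, the `/echo` service and the JSON payload shape are hypothetical placeholders, not the GALATEAS API.

```python
import json
from typing import Callable, Dict

# Hypothetical registry: service name -> handler taking and returning a JSON-like dict
SERVICES: Dict[str, Callable[[dict], dict]] = {}


def register(name: str):
    """Decorator that plugs a new (possibly third-party) service into the framework."""
    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        SERVICES[name] = fn
        return fn
    return decorator


@register("echo")
def echo_service(payload: dict) -> dict:
    # Placeholder standing in for a real service such as NER or query translation
    return {"echo": payload.get("text", "")}


def app(environ, start_response):
    """WSGI entry point: POST /<service-name> with a JSON body."""
    name = environ.get("PATH_INFO", "").strip("/")
    service = SERVICES.get(name)
    headers = [("Content-Type", "application/json")]
    if service is None or environ.get("REQUEST_METHOD") != "POST":
        start_response("404 Not Found", headers)
        return [b'{"error": "unknown service"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    body = json.dumps(service(payload)).encode("utf-8")
    start_response("200 OK", headers + [("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the app; decorating another handler with `@register("name")` exposes it at `/name` without modifying `app`.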
GALATEAS partners
- Project coordinator: Xerox Research Centre Europe (France)
- Objet Direct (France)
- CELI (Italy)
- University of Trento (Italy)
- Gonetwork (Italy)
- Bridgeman Art (England)
- Humboldt University (Germany)
- University of Amsterdam (Netherlands)
http://www.galateas.eu
Questions?
Contact: Frédérique Segond, Xerox Research Centre Europe, 6 chemin de Maupertuis, 38240 Meylan, France
frederique.segond@xrce.xerox.com