LogCLEF 2011 Overview: Giorgio di Nunzio, Thomas Mandl, Johannes Leveling



SLIDE 1

LogCLEF 2011 Overview

Amsterdam, 19 Sept. 2011

Giorgio di Nunzio

University of Padua dinunzio@dei.unipd.it

Thomas Mandl

Universität Hildesheim mandl@uni-hildesheim.de

Johannes Leveling

Dublin City University Johannes.leveling@computing.dcu.ie

SLIDE 2

Overview

  • Objective and Tasks
  • Datasets
  • Participation
  • Thanks to everyone involved!!!
SLIDE 3

Objective and Tasks 1/2

  • Lack of availability and use of log data for research experiments
  • True!
  • goal is not the production of a gold standard
  • Not true anymore!
  • but a forum for the creative exploration of logs
  • Not true anymore!
SLIDE 4

Objective and Tasks 2/2

  • Language identification task
  • participants are required to recognize the actual language of the query.
  • Query classification
  • participants are required to annotate each query with a label which represents a category of interest.
  • Success of a query
  • The success can be defined in terms of time spent on a page, number of clicked items, actions performed during the browsing of the result list.
  • Query refinement
  • following queries in the same session are a generalization, specification, or shift of the original one.
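The three refinement types can be illustrated with a simple term-overlap heuristic. This is only a sketch under a strong assumption: queries are compared as sets of whitespace-separated terms. The function name `classify_refinement` and the extra `same` label are hypothetical and are not part of the task definition.

```python
def classify_refinement(previous: str, current: str) -> str:
    """Label how `current` relates to `previous` within one session.

    Assumption: a query with fewer terms is broader (generalization) and a
    query with additional terms is narrower (specification).
    """
    prev_terms = set(previous.lower().split())
    curr_terms = set(current.lower().split())

    if curr_terms == prev_terms:
        return "same"
    if curr_terms < prev_terms:   # proper subset: terms were dropped
        return "generalization"
    if curr_terms > prev_terms:   # proper superset: terms were added
        return "specification"
    return "shift"                # terms were replaced


if __name__ == "__main__":
    print(classify_refinement("egypt france britain", "egypt france"))
    print(classify_refinement("magna carta", "magna carta text"))
    print(classify_refinement("spengemann", "plastics mould"))
```

A term-set comparison ignores spelling variants, translations, and word order, so it should be read as a starting point rather than a usable classifier for multilingual logs.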

SLIDE 5

Log Resources at LogCLEF

Year  Origin  Size               Type
2009  Tumba!  350,000 queries    Query log
2009  TEL     1,900,000 records  Query and activity log
2010  TEL     760,000 records    Query and activity log
2010  TEL     1.5 GB (zipped)    Web server log
2010  DBS     5 GB               Web server log
2011  TEL     950,000 records    Query and activity log
2011  Sogou   1.9 GB (zipped)    Query log

SLIDE 6

Log Data Sets 2011 Overview

  • The European Library (TEL)
  • Deutscher Bildungsserver (DBS)
  • Sogou
SLIDE 7

Log Data Set: TEL

  • The European Library (TEL) logs
  • national libraries of Europe
  • users and content span many languages.

Query/search logs

  • 18 months (Jan 2007 – Jun 2008) [train]
  • 12 months (Jan 2009 – Dec 2009) [train]
  • 12 months (Jan 2010 – Dec 2010) [test]
SLIDE 8

SLIDE 9

SLIDE 10

TEL examples

892989;guest;62.121.xxx.xxx;btprfui7keanue1u0nanhte5j0;en;("plastics mould");view_brief;a0037;31;;;
893209;guest;213.149.xxx.xxx;o270cev7upbblmqja30rdeo3p4;en;("penser leurope");search_sim;;0;-;;;
893261;guest;194.171.xxx.xxx;null;en;(“magna carta”);search_url;;0;-;;;
893487;guest;81.179.xxx.xxx;9rrrtrdp2kqrtd706pha470486;en;("spengemann");view_brief;a0067;1;-;;;
893488;guest;81.179.xxx.xxx;9rrrtrdp2kqrtd706pha470486;en;("spengemann");view_brief;a0000;0;-;;;
893533;guest;85.192.xxx.xxx;ckujekqff2et6r9p27h8r89le6;fr;("egypt france britain");search_sim;;0;-;;;
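The TEL records above are semicolon-separated, so a first pass at reading them can be sketched as follows. The slide does not name the columns, so every field name in `TelRecord` is an assumption inferred from the sample values, and the parser name `parse_tel_line` is hypothetical. Only the first nine fields are read, because the trailing fields vary between the sample records.

```python
from typing import NamedTuple, Optional


class TelRecord(NamedTuple):
    # All field names are assumptions inferred from the sample records.
    record_id: int
    user: str
    ip: str
    session: str
    lang: str
    query: str
    action: str
    collection: str
    count: Optional[int]


def parse_tel_line(line: str) -> TelRecord:
    """Parse one semicolon-separated TEL record.

    Assumption: the query itself contains no semicolon.
    """
    parts = [p.strip() for p in line.strip().split(";")]
    if len(parts) < 9:
        raise ValueError(f"expected at least 9 fields, got {len(parts)}")

    # Queries appear as ("...") in the samples; drop the wrapping
    # parentheses and straight or curly double quotes.
    query = parts[5].strip("()").strip('"“”')
    count = int(parts[8]) if parts[8].isdigit() else None

    return TelRecord(
        record_id=int(parts[0]),
        user=parts[1],
        ip=parts[2],
        session=parts[3],
        lang=parts[4],
        query=query,
        action=parts[6],
        collection=parts[7],
        count=count,
    )


if __name__ == "__main__":
    sample = ('892989;guest;62.121.xxx.xxx;btprfui7keanue1u0nanhte5j0;en;'
              '("plastics mould");view_brief;a0037;31;;;')
    print(parse_tel_line(sample))
```

Grouping the parsed records by the `session` field would be a natural next step for the query refinement and query success tasks.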

SLIDE 11

Log Data Set: DBS

  • Deutscher Bildungsserver (DBS)
  • a quality-controlled internet directory for educational resources

  • raw server log
  • three months of activities on the portal
  • 5 GB
SLIDE 12

TEL

SLIDE 13

DBS examples

f64.alicedsl.de - - [09/Nov/2009:00:23:09 +0100] "GET /zeigen.html?seite=5892 HTTP/1.1" 200 22436 http://www.bildungsserver.de/zeigen.html?seite=2521 "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.5) Gecko/20091102 …
80d.superkabel.de - - [09/Nov/2009:00:26:28 +0100] "GET /db/fwulesen.html?Id=200006289 HTTP/1.1" 200 16301 http://www.google.de/search?hl=de&source=hp&q=+landes+filmstelle&btnG=Google-Suche&meta=&aq=f&oq=&fp=6013614429992176 "Mozilla/5.0 (Windows; U; Windows NT 6.0; de; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4 (.NET CLR 3.5.30729)"
937.googlebot.com - - [09/Nov/2009:00:27:09 +0100] "GET /db/ffach2.html?fach=2&Rnum=12&Snum=3 HTTP/1.1" 200 16019 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
5bd.ono.com - - [09/Nov/2009:00:30:46 +0100] "GET /db/mlesen.html?Id=42021 HTTP/1.1" 200 180746 - "Java/1.6.0_13"
8f4.primacom.net - - [09/Nov/2009:00:30:45 +0100] "GET /zeigen.html?seite=771 HTTP/1.1" 200 45871 http://www.bildungsserver.de/metasuche/qsuche.html?feldinhalt1=aktive+medienarbeit&bool1=AND&finden=finden&searchall=ja&datenbanken%5B%5D=dbs_seiten&DBS=1&art=einfach "Mozilla/5.0 (Windows; U; Windows NT 6.0; de; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4 (.NET CLR 3.5.30729)
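The DBS entries above resemble a common web server access-log layout (host, timestamp, request, status, size, referrer, user agent). A minimal sketch for splitting such a line with a regular expression follows. The pattern is written against the sample lines only, where the referrer is unquoted and some user-agent strings are truncated, and the names `LOG_PATTERN`, `parse_dbs_line`, and `referer_search_terms` are hypothetical. The parameter names `q` and `feldinhalt1` are taken from the referrer URLs in the samples.

```python
import re
from datetime import datetime
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Pattern fitted to the sample lines; real logs may need adjustments.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"?(?P<referer>[^"\s]+)"? '
    r'"(?P<agent>[^"]*)"?$'
)


def parse_dbs_line(line: str) -> Optional[dict]:
    """Return the fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry


def referer_search_terms(referer: str) -> Optional[str]:
    """Extract the search string from a referrer URL, if one is present."""
    params = parse_qs(urlsplit(referer).query)
    for key in ("q", "feldinhalt1"):
        if key in params:
            return params[key][0].strip()
    return None


if __name__ == "__main__":
    sample = ('5bd.ono.com - - [09/Nov/2009:00:30:46 +0100] '
              '"GET /db/mlesen.html?Id=42021 HTTP/1.1" 200 180746 - '
              '"Java/1.6.0_13"')
    print(parse_dbs_line(sample))
```

Since this is a raw server log, requests from crawlers (such as the Googlebot entry) and scripted clients would probably need to be filtered on the user-agent field before any query analysis.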

SLIDE 14

Log Data Set: Sogou

  • The Sogou query logs (SogouQ) contain queries to the Chinese search engine Sogou (provided by the Tsinghua-Sogou Joint Lab of Search Technology).
  • one month of web search logs from June 2008.
SLIDE 15

SLIDE 16

Sogou examples

00:00:12 5439871361994184 [思域+价格] 4 2 mall.ieche.com/newcar/…
00:00:12 8938876633535969 [连胜文+蔡依珊] 3 2 news.sohu.com/20061231…
00:00:12 09982882549828048 [制丸机] 4 3 www.chem17.com/products/mulu/1539.asp
00:00:13 9613878418955091 [搜狐] 3 1 news.sohu.com/
00:00:13 28711258103341036 [宋明+高洁+高芳] 1 6 202.181.214.125/archiver/…
00:00:13 6501698160797595 [电视剧笨小孩下载] 13 7 www.fomao.com/film/21/…
00:00:13 029345760462857306 [书连] 1 1 www.shulink.com/…
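The Sogou records above are whitespace-separated with the query in square brackets, which suggests a parser along the following lines. The slide does not label the columns. Reading the two integers as the rank of the clicked URL and the order of the click, and treating `+` as a term separator, are assumptions, as are the names `SogouRecord` and `parse_sogou_line`.

```python
import re
from typing import List, NamedTuple


class SogouRecord(NamedTuple):
    # Field meanings are assumptions; the slide shows only raw records.
    time: str
    user_id: str        # kept as text because some ids start with 0
    terms: List[str]
    rank: int           # assumed: rank of the clicked URL in the result list
    click_order: int    # assumed: sequence number of the click
    url: str


SOGOU_PATTERN = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\s+(\d+)\s+\[(.*)\]\s+(\d+)\s+(\d+)\s+(\S+)$"
)


def parse_sogou_line(line: str) -> SogouRecord:
    """Parse one Sogou query-log record in the layout shown on the slide."""
    match = SOGOU_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized record: {line!r}")
    time, user_id, query, rank, click_order, url = match.groups()
    return SogouRecord(time, user_id, query.split("+"),
                       int(rank), int(click_order), url)


if __name__ == "__main__":
    print(parse_sogou_line("00:00:13 9613878418955091 [搜狐] 3 1 news.sohu.com/"))
```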

SLIDE 17

Creation of ground truth

  • Design and implementation of an interface for log annotation

  • Thanks to
  • Humboldt University of Berlin
  • University of Amsterdam
  • Dublin City University
  • University of Hildesheim
  • University of Padua
  • Marco Collautti (B.Eng Thesis)
SLIDE 18

LogCLEF Query annotation interface

SLIDE 19

LogCLEF Query annotation interface

SLIDE 20

LogCLEF Query annotation interface

SLIDE 21

Ground truth produced

  • 24 users
  • (At least) 50 queries per user
  • 723 annotated query records with language, query session, and query category.
  • A baseline was generated
  • 940,957 annotated query records with languages
  • Training set created by LogCLEF 2010
  • manual annotations for 510 query records about query language and category of the query;
  • automatic annotations for 100 query records about query language.

SLIDE 22

Participation

  • 17 registered
  • 4 groups submitted results (7 in 2010, 5 in 2009)
  • 7 institutions
  • None for Sogou

Participant  Institution                               Country
DAEDALUS     Universidad Politécnica de Madrid &       Spain
             Universidad Carlos III de Madrid
             DAEDALUS - Data, Decisions and Language
UBER-UvA     Humboldt University Berlin                Germany
             University of Amsterdam                   The Netherlands
CUZA         "Alexandru Ioan Cuza" University          Romania
ESSEX        University of Essex                       United Kingdom

SLIDE 23

Approaches and Results

  • …come to the LogCLEF Lab!
SLIDE 24

Thank you for your Attention