CLEF-IP: Information Retrieval in Intellectual Property Domain

SLIDE 1

CLEF-IP: Information Retrieval in Intellectual Property Domain

Florina Piroi & Mihai Lupu & Allan Hanbury & Veronika Zenz

September 19, 2010

Piroi, Lupu, Hanbury, Zenz (TU,IRF, MR) CLEF-IP 2011 September 19, 2010 1 / 12

SLIDE 2

Lab Description

CLEF-IP at CLEF

  • Launched in 2009
  • Aim: investigate IR methods in patent retrieval
  • Focus: cross-language retrieval for European languages

SLIDE 3

Lab Description

The CLEF-IP Collection

  • Target data: 3.5 million patent documents, representing 1.5 million patents (EPO, WIPO, Marec format); almost 300K image files
  • Tasks:
      • Prior Art Candidate Search Task (PAC)
      • Classification Task (CLS1) and Refined Classification Task (CLS2)
      • Patent Image-based Document Retrieval (IMG-PAC)
      • Patent Image-based Classification (IMG-CLS)
  • Relevance assessments: based on patent citations and existing patent classification, manual
  • Target data: standardized XML format for patent data (Marec Scheme), with multilingual content in English, French and German
  • Image tasks organized with help from ImageCLEF

SLIDE 4

Lab Description

Previous Work

NTCIR

  • From 2003
  • Japanese, English, Chinese
  • Ad-hoc, Prior Art, Classification (F-term), Machine Translation

TREC-Chem

  • Organized in collaboration with NIST, Univ. College London, York Univ. Canada
  • Retrieval in chemistry domains
  • English only

SLIDE 5

Submissions and Measures

Participants and Runs

ID | Institution | Country | PAC CLS1 CLS2 IMG-PAC IMG-CLS | Short ID
chemnitz | Chemnitz University of Technology, Retrieval Group | DE | x | Ch
hildesheim | Hildesheim Univ. - Information Science | DE | x | Ch
hprussia | Hewlett-Packard Labs, Russia | RU | x | Hp
hyderabad | International Institute of Information Technology - SIEL | IN | x | Hy
joanneum | Joanneum Research, Institute for Information and Communication Technologies | AT | x | Jo
lugano | University of Lugano | CH | x | Lu
nijgmenen | Radboud University Nijmegen, Information Foraging Lab | NL | x x x | Ni
spinque | Spinque | NL | x | Ni
tuwien-1 | Vienna University of Technology, Inst. for Computer-Aided Automation | AT | x | Jo
tuwien-2 | Vienna University of Technology, Inst. for Software Technology and Interactive Systems | AT | x x | Lu Jo
wisenut | WISEnut Inc. | KR | x x x | Wi
xerox-sas | Xerox Research Centre Europe | FR | x x | Xe

Total: PAC 30, CLS1 16, CLS2 9, IMG-PAC 10, IMG-CLS 12

Same Short IDs in the participant list indicate group collaborations.

SLIDE 6

Submissions and Measures

Measures

PAC and IMG-PAC Tasks

  • Precision, Precision@5, Precision@10, Precision@50, Precision@100
  • Recall, Recall@5, Recall@10, Recall@50, Recall@100
  • MAP, nDCG
  • trec_eval
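As a rough illustration of the ranking measures listed above, the following Python sketch computes them for a single topic. It is not the lab's evaluation code (the slides name trec_eval for that); the function names are made up for this example, relevance is assumed to be binary, and nDCG uses a gain of 1 with a log2 discount, which is one common formulation.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Share of the top-k ranked documents that are relevant (divides by k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Share of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)

def average_precision(ranked, relevant):
    """Sum of precision at each relevant hit, divided by the number of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs, qrels):
    """MAP: average precision averaged over all topics in qrels."""
    if not qrels:
        return 0.0
    return sum(average_precision(runs.get(t, []), rel) for t, rel in qrels.items()) / len(qrels)

def ndcg(ranked, relevant, k=None):
    """Binary-gain nDCG with a log2(rank + 1) discount (an assumed variant)."""
    k = len(ranked) if k is None else k
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical topic: five retrieved documents, three relevant ones (one never retrieved).
example_ranked = ["EP-1", "EP-2", "EP-3", "EP-4", "EP-5"]
example_relevant = {"EP-2", "EP-5", "EP-9"}
```

With the hypothetical data, `precision_at_k(example_ranked, example_relevant, 5)` counts two hits in five results, and `recall_at_k` counts the same two hits against three relevant documents.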

CLS1 and CLS2 Tasks

  • Precision@1, Precision@5
  • Recall@1, Recall@5
  • MAP, F1 at 1, 5
  • trec_eval
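For the classification tasks, F1 at a cutoff combines precision and recall over the top-ranked classification codes. The sketch below assumes a system returns a ranked list of IPC codes per topic and that the gold standard is a set of codes; the function name and the sample codes are illustrative only.

```python
def prf_at_k(predicted, gold, k):
    """Return (precision, recall, F1) over the top-k predicted codes."""
    hits = sum(1 for code in predicted[:k] if code in gold)
    precision = hits / k
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical topic with two correct subclasses.
example_predicted = ["A61B", "H01L", "A43B", "G06F", "C07D"]
example_gold = {"A61B", "G06F"}
```

At cutoff 1 the sample topic has perfect precision but only half the recall; at cutoff 5 the recall reaches 1 while precision drops, which is the trade-off F1 summarizes.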

SLIDE 7

Submissions and Measures

Measures Ctnd.

IMG-CLS Task

  • Equal Error Rate (EER)
  • Area Under Curve (AUC) of a ROC curve
  • True Positive Rate (TPR)
  • Octave
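The slides name Octave as the tool for these measures; the Python sketch below only illustrates what the three numbers mean for a single binary decision (one class against the rest). The function names are hypothetical, the EER is approximated by the ROC point where the false positive and false negative rates are closest, and how scores were aggregated over the nine classes is not stated in the slides.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) obtained by lowering the threshold over the scores.

    Assumes labels are 0/1 and that both classes occur at least once.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        score = order[i][0]
        # Consume all items sharing this score so ties yield a single point.
        while i < len(order) and order[i][0] == score:
            if order[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def equal_error_rate(points):
    """Approximate EER: the ROC point where FPR and FNR (1 - TPR) are closest."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1 - p[1])))
    return (fpr + (1 - tpr)) / 2

def tpr_at_threshold(scores, labels, threshold):
    """True positive rate when everything scoring >= threshold is accepted."""
    pos = sum(labels)
    tp = sum(1 for s, l in zip(scores, labels) if l and s >= threshold)
    return tp / pos

# Hypothetical classifier scores for one class; label 1 marks images of that class.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
example_labels = [1, 1, 0, 1, 0, 0]
```

A lower EER and a higher AUC both indicate better separation of the classes, which matches the direction of the values in the results table for this task.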

SLIDE 8

CLEF-IP Tasks

Prior Art Candidates Search Task

  • 3973 topics; topic documents are patent application documents, 1/3 in English, 1/3 in German, and 1/3 in French; 300-topic training set
  • Task: find documents in the corpus that potentially invalidate the given patent document
  • Relevance judgements are automatically extracted from patent citations stored in patent search reports, extended to include patent family members
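The extension of citations to patent family members can be pictured roughly as follows. This is a simplified sketch, not the organizers' extraction procedure: the input mappings, document identifiers, and function names are all hypothetical, and the only external convention used is the four-column qrels line (topic, iteration, document, relevance) that trec_eval reads.

```python
def extend_with_families(citations, family_of):
    """Expand each topic's cited documents with all members of their patent families.

    citations: topic id -> set of cited document ids (assumed input shape)
    family_of: document id -> family id (assumed input shape)
    """
    members = {}
    for doc, family in family_of.items():
        members.setdefault(family, set()).add(doc)

    qrels = {}
    for topic, cited in citations.items():
        relevant = set()
        for doc in cited:
            family = family_of.get(doc)
            # Documents without a known family are kept on their own.
            relevant |= members[family] if family is not None else {doc}
        qrels[topic] = relevant
    return qrels

def to_qrels_lines(qrels):
    """Format judgements as 'topic 0 document 1' lines, sorted for stable output."""
    return [f"{topic} 0 {doc} 1"
            for topic in sorted(qrels) for doc in sorted(qrels[topic])]

# Hypothetical data: EP-100 and WO-200 belong to the same family.
example_citations = {"PAC-1": {"EP-100"}}
example_families = {"EP-100": "F1", "WO-200": "F1", "EP-300": "F2"}
```

In the hypothetical data, the topic cites only EP-100, but WO-200 also becomes relevant because it shares a family with the cited document.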

[Figure: PAC task results per run, two panels. Panel 1: MAP, P, P at 100. Panel 2: nDCG, R, R at 100. Runs on the x-axis of both panels: Hy.5, Ch.3, Ch.2, Lu.1, Lu.2, Lu.3, Lu.4, Ch.7, Ch.1, Ni.3, Ni.1, Ch.4, Hy.3, Wi.7, Wi.6, Wi.5, Wi.8, Wi.4, Wi.3, Wi.2, Wi.1, Hy.4, Hy.6, Hy.2, Ni.2, Hy.1, Ch.6, Ni.4, Ch.5, Hp.1]

SLIDE 9

CLEF-IP Tasks

Classification Tasks

  • CLS1: 3000 topics; CLS2: 4934 topics; topic documents are patent application documents
  • CLS1 Task: classify a patent document according to the IPC (International Patent Classification) system, up to the subclass level
  • CLS2 Task: classify a patent document according to the IPC system, up to the group/subgroup level when the subclass level is given
  • Relevance judgements are the classifications recorded in the patent documents on which the topics were based
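The difference between the two classification levels can be seen by splitting an IPC symbol into its hierarchy: section, class, subclass, main group, subgroup. The sketch below assumes symbols written like "A61B 5/02"; the regular expression and function name are illustrative, and symbols stored in the corpus may use a different layout.

```python
import re

# Section letter, two-digit class, subclass letter, then main group / subgroup.
_IPC_SYMBOL = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")

def ipc_levels(symbol):
    """Split an IPC symbol such as 'A61B 5/02' into its hierarchy levels."""
    match = _IPC_SYMBOL.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a recognized IPC symbol: {symbol!r}")
    section, cls, sub, main, subgroup = match.groups()
    subclass = f"{section}{cls}{sub}"
    return {
        "section": section,
        "class": f"{section}{cls}",
        "subclass": subclass,                    # level targeted by CLS1
        "main_group": f"{subclass} {main}/00",   # main groups end in /00
        "subgroup": f"{subclass} {main}/{subgroup}",  # level targeted by CLS2
    }
```

For example, `ipc_levels("A61B 5/02")["subclass"]` yields the CLS1-level label, while the `"subgroup"` entry keeps the full symbol that CLS2 asks for once the subclass is given.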

[Figure: classification task results per run, two panels, each showing P at 1, P at 5, R at 5, F1 at 5. Panel 1 runs: Ni.1, Ni.5, Ni.2, Ni.4, Ni.6, Ni.7, Ni.3, Ni.8, Wi.8, Wi.6, Wi.5, Wi.7, Wi.1, Wi.2, Wi.3, Wi.4. Panel 2 runs: Wi.5, Wi.8, Wi.6, Wi.7, Wi.2, Wi.1, Wi.4, Wi.3, Ni.1]

SLIDE 10

CLEF-IP Tasks

Patent Image-based Classification

  • A pilot task with 1000 topics. A topic is a b/w patent image file
  • Task: automatically classify the images into 9 classes: abstract drawing, graph, flow chart, gene sequence, program listing, symbol, chemical structure, table, mathematics
  • Training data for each of the classes was provided

run | EER | AUC | TPR
Jo.alphacentauri | 0.15 | 0.91 | 0.66
Jo.arcturus | 0.24 | 0.81 | 0.50
Jo.betelgeuse | 0.18 | 0.90 | 0.62
Jo.canopus | 0.16 | 0.91 | 0.65
Jo.procyon | 0.37 | 0.67 | 0.27
Jo.rigel | 0.16 | 0.90 | 0.63
Jo.sirius | 0.16 | 0.91 | 0.64
Jo.vega | 0.32 | 0.72 | 0.28
Xe.RUNORH | 0.06 | 0.98 | 0.85
Xe.RUNORH ROTRAIN | 0.04 | 0.99 | 0.91
Xe.FV ORH SP | 0.08 | 0.92 | 0.85
Xe.MEAN ALL | 0.08 | 0.91 | 0.85

SLIDE 11

CLEF-IP Tasks

Patent Image-based Document Retrieval

  • A pilot task with 211 topics. A topic is a patent document and its attached image files
  • Task: find documents in the corpus that potentially invalidate the given patent document using the available patent images
  • Corpus is reduced to patents in 3 IPC subclasses: A43B, A61B, H01L. Patent images were available

[Figure: IMG-PAC task results per run: MAP, P at 5, R at 5, R. Runs on the x-axis: Xe.1, Xe.9, Xe.5, Xe.2, Xe.4, Xe.3, Xe.10, Xe.6, Xe.8, Xe.7]

SLIDE 12

Thank You

CLEF-IP Workshop, Tuesday

Our ‘Supporters’

  • EU Network of Excellence PROMISE (FP7-258191)
  • Austrian Research Promotion Agency (FFG) FIT-IT project IMPEX (No. 825846)

www.ir-facility.org/clef-ip
www.ifs.tuwien.ac.at/~clef-ip/

{piroi, lupu, hanbury}@ifs.tuwien.ac.at
m.lupu@ir-facility.org
veronika.zenz@max-recall.com
