
CLEF-IP: Information Retrieval in Intellectual Property Domain



  1. CLEF-IP: Information Retrieval in Intellectual Property Domain
     - Florina Piroi, Mihai Lupu, Allan Hanbury, Veronika Zenz
     - September 19, 2010
     - Piroi, Lupu, Hanbury, Zenz (TU, IRF, MR), CLEF-IP 2011

  2. Lab Description: CLEF-IP at CLEF
     - Launched in 2009
     - Aim: investigate IR methods in patent retrieval
     - Focus: cross-language retrieval for European languages

  3. Lab Description: The CLEF-IP Collection
     - Target data:
       - 3.5 million patent documents, representing 1.5 million patents (EPO, WIPO, Marec format); almost 300K image files
       - Standardized XML format for patent data (Marec Scheme), with multilingual content in English, French and German
     - Tasks:
       - Prior Art Candidate Search Task (PAC)
       - Classification Task (CLS1) and Refined Classification Task (CLS2)
       - Patent Image-based Document Retrieval (IMG-PAC)
       - Patent Image-based Classification (IMG-CLS)
     - Relevance assessments: based on patent citations and existing patent classification, manual
     - Image tasks organized with help from ImageCLEF.

  4. Lab Description: Previous Work
     - NTCIR
       - From 2003
       - Japanese, English, Chinese
       - Ad-hoc, Prior Art, Classification (F-term), Machine Translation
     - TREC-Chem
       - Organized in collaboration with NIST, Univ. College London, York Univ. Canada
       - Retrieval in chemistry domains
       - English only

  5. Submissions and Measures: Participants and Runs

     Task columns: PAC, CLS1, CLS2, IMG-PAC, IMG-CLS. Each x marks one task entered by the participant.

     | ID | Institution | Country | Tasks entered | Short ID |
     |---|---|---|---|---|
     | chemnitz | Chemnitz University of Technology, Retrieval Group | DE | x | Ch |
     | hildesheim | Hildesheim Univ. - Information Science | DE | x | Ch |
     | hprussia | Hewlett-Packard Labs, Russia | RU | x | Hp |
     | hyderabad | International Institute of Information Technology - SIEL | IN | x | Hy |
     | joanneum | Joanneum Research, Institute for Information and Communication Technologies | AT | x | Jo |
     | lugano | University of Lugano | CH | x | Lu |
     | nijgmenen | Radboud University Nijmegen, Information Foraging Lab | NL | x x x | Ni |
     | spinque | Spinque | NL | x | Ni |
     | tuwien-1 | Vienna University of Technology, Inst. for Computer-Aided Automation | AT | x | Jo |
     | tuwien-2 | Vienna University of Technology, Inst. for Software Technology and Interactive Systems | AT | x x | Lu, Jo |
     | wisenut | WISEnut Inc. | KR | x x x | Wi |
     | xerox-sas | Xerox Research Centre Europe | FR | x x | Xe |

     Total runs per task: PAC 30, CLS1 16, CLS2 9, IMG-PAC 10, IMG-CLS 12.

     Same Short IDs in the participant list indicate group collaborations.

  6. Submissions and Measures: Measures
     - PAC and IMG-PAC Tasks
       - Precision, Precision@5, Precision@10, Precision@50, Precision@100
       - Recall, Recall@5, Recall@10, Recall@50, Recall@100
       - MAP, nDCG
       - Tool: trec_eval
     - CLS1 and CLS2 Tasks
       - Precision@1, Precision@5
       - Recall@1, Recall@5
       - MAP, F1 at 1, 5
       - Tool: trec_eval
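The rank-based measures listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the trec_eval implementation: it assumes binary relevance, a log2 discount for nDCG, and made-up document IDs (`EP-1`, ...) and topic IDs (`t1`, `t2`). trec_eval may differ in details such as tie handling and gain values.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents found in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def mean_average_precision(run, qrels):
    """MAP: average precision averaged over all topics in qrels."""
    return sum(average_precision(run.get(t, []), qrels[t]) for t in qrels) / len(qrels)

def ndcg(ranked, relevant):
    """nDCG with binary gains and a log2 rank discount (an assumption of this sketch)."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked, start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, len(relevant) + 1))
    return dcg / ideal if ideal else 0.0

# Hypothetical topic: five retrieved documents, three relevant ones (one never retrieved).
ranked = ["EP-1", "EP-2", "EP-3", "EP-4", "EP-5"]
relevant = {"EP-1", "EP-4", "EP-9"}
print(precision_at_k(ranked, relevant, 5))
print(recall_at_k(ranked, relevant, 5))
print(average_precision(ranked, relevant))
print(ndcg(ranked, relevant))
```

F1 at k, used for the classification tasks, is the harmonic mean of Precision@k and Recall@k computed the same way.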

  7. Submissions and Measures: Measures (continued)
     - IMG-CLS Task
       - Equal Error Rate (EER)
       - Area Under Curve (AUC) of a ROC curve
       - True Positive Rate (TPR)
       - Tool: Octave
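For a single class treated as a one-vs-rest problem, the three IMG-CLS measures can be illustrated as follows. The lab used Octave; this sketch is in Python and rests on assumptions the slide does not state: scores are classifier confidences, labels are 1 for the target class and 0 otherwise, the EER is approximated at the observed threshold where the false positive and false negative rates are closest, and the operating point for TPR is chosen by the caller. All scores and labels are invented.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs, one per distinct score used as a decision threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def equal_error_rate(scores, labels):
    """Approximate EER: the operating point where FPR and FNR (1 - TPR) are closest."""
    fpr, tpr = min(roc_points(scores, labels), key=lambda p: abs(p[0] - (1 - p[1])))
    return (fpr + (1 - tpr)) / 2

def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def tpr_at_threshold(scores, labels, threshold):
    """True positive rate when scores >= threshold are accepted."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    return sum(1 for s in pos if s >= threshold) / len(pos)

# Hypothetical scores for one class, e.g. "flow chart" vs. everything else.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(equal_error_rate(scores, labels))
print(auc(scores, labels))
print(tpr_at_threshold(scores, labels, 0.7))
```

A lower EER and a higher AUC and TPR indicate a better classifier, which is how the results table for the image classification task reads.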

  8. CLEF-IP Tasks: Prior Art Candidates Search Task
     - 3973 topics; topic documents are patent application documents, 1/3 in English, 1/3 in German, and 1/3 in French; 300-topic training set
     - Task: find documents in the corpus that potentially invalidate the given patent document
     - Relevance judgements are automatically extracted from patent citations stored in patent search reports, extended to include patent family members.
     - [Figure: two charts of per-run results (y-axis ranges 0 to 0.1 and 0 to 0.7); series: MAP, nDCG, P, R, P at 100, R at 100; runs on the x-axis: Hy.5, Ch.3, Ch.2, Lu.1, Lu.2, Lu.3, Lu.4, Ch.7, Ch.1, Ni.3, Ni.1, Ch.4, Hy.3, Wi.7, Wi.6, Wi.5, Wi.8, Wi.4, Wi.3, Wi.2, Wi.1, Hy.4, Hy.6, Hy.2, Ni.2, Hy.1, Ch.6, Ni.4, Ch.5, Hp.1]
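The way relevance judgements are derived for this task (search-report citations, extended with patent family members) can be sketched as below. The data structures, document IDs and family IDs are hypothetical, and dropping the topic document from its own judgement set is an assumption of this sketch. The output lines follow the four-column qrels layout read by trec_eval (topic, iteration, document, relevance).

```python
def build_qrels(citations, family_of, families):
    """Expand cited documents with their patent family members.

    citations: topic id -> iterable of cited document ids (from search reports)
    family_of: document id -> family id
    families:  family id -> set of member document ids
    """
    qrels = {}
    for topic, cited in citations.items():
        relevant = set()
        for doc in cited:
            relevant.add(doc)
            family = family_of.get(doc)
            if family is not None:
                # Every member of the cited document's family counts as relevant.
                relevant |= families.get(family, set())
        relevant.discard(topic)  # assumption: a topic is never its own prior art
        qrels[topic] = relevant
    return qrels

def to_trec_qrels(qrels):
    """Serialize as 'topic 0 doc 1' lines, sorted for reproducibility."""
    return [f"{topic} 0 {doc} 1"
            for topic in sorted(qrels)
            for doc in sorted(qrels[topic])]

# Hypothetical data: one topic citing two documents, one of which has a family.
citations = {"PAC-1": ["EP-100", "EP-200"]}
family_of = {"EP-100": "F1", "WO-101": "F1"}
families = {"F1": {"EP-100", "WO-101"}}

for line in to_trec_qrels(build_qrels(citations, family_of, families)):
    print(line)
```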

  9. CLEF-IP Tasks: Classification Tasks
     - CLS1: 3000 topics; CLS2: 4934 topics; topic documents are patent application documents
     - CLS1 Task: classify a patent document according to the IPC (International Patent Classification) system, up to the subclass level.
     - CLS2 Task: classify a patent document according to the IPC system, up to the group/subgroup level when the subclass level is given.
     - Relevance judgements are classifications recorded in the patent documents on which the topics were based.
     - [Figure: two charts of per-run results (y-axis ranges up to 0.9 and up to 0.55); series: P at 1, P at 5, R at 5, F1 at 5; runs on the x-axis, first chart: Ni.1, Ni.5, Ni.2, Ni.4, Ni.6, Ni.7, Ni.3, Ni.8, Wi.8, Wi.6, Wi.5, Wi.7, Wi.1, Wi.2, Wi.3, Wi.4; second chart: Wi.5, Wi.8, Wi.6, Wi.7, Wi.2, Wi.1, Wi.4, Wi.3, Ni.1]
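The IPC levels that separate CLS1 (subclass) from CLS2 (group/subgroup) can be made concrete with a small parser, together with F1 at k for a ranked list of predicted classes. This is a sketch under assumptions: symbols are written like `A61B 5/02`, the regular expression is a simplification of the full IPC symbol syntax, and the example symbols and predictions are invented.

```python
import re

# Section letter, two-digit class, subclass letter, main group, "/", subgroup.
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")

def ipc_levels(symbol):
    """Split an IPC symbol such as 'A61B 5/02' into its hierarchy levels."""
    match = IPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a full IPC symbol: {symbol!r}")
    section, cls, subclass, group, subgroup = match.groups()
    prefix = section + cls + subclass
    return {
        "section": section,
        "class": section + cls,
        "subclass": prefix,                       # CLS1 target level
        "main_group": f"{prefix} {group}/00",
        "subgroup": f"{prefix} {group}/{subgroup}",  # CLS2 target level
    }

def f1_at_k(predicted, gold, k):
    """Harmonic mean of Precision@k and Recall@k for a ranked class list."""
    hits = sum(1 for c in predicted[:k] if c in gold)
    if hits == 0:
        return 0.0
    precision = hits / k
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical topic with two recorded classifications, judged at subclass level.
gold = {ipc_levels(s)["subclass"] for s in ["A61B 5/02", "H01L 21/00"]}
predicted = ["A61B", "A43B", "H01L", "G06F", "C07D"]
print(ipc_levels("A61B 5/02"))
print(f1_at_k(predicted, gold, 5))
```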

  10. CLEF-IP Tasks: Patent Image-based Classification
      - A pilot task with 1000 topics. A topic is a b/w patent image file.
      - Task: automatically classify the images into 9 classes: abstract drawing, graph, flow chart, gene sequence, program listing, symbol, chemical structure, table, mathematics
      - Training data for each of the classes was provided.

      | Run | EER | AUC | TPR |
      |---|---|---|---|
      | Jo.alphacentauri | 0.15 | 0.91 | 0.66 |
      | Jo.arcturus | 0.24 | 0.81 | 0.50 |
      | Jo.betelgeuse | 0.18 | 0.90 | 0.62 |
      | Jo.canopus | 0.16 | 0.91 | 0.65 |
      | Jo.procyon | 0.37 | 0.67 | 0.27 |
      | Jo.rigel | 0.16 | 0.90 | 0.63 |
      | Jo.sirius | 0.16 | 0.91 | 0.64 |
      | Jo.vega | 0.32 | 0.72 | 0.28 |
      | Xe.RUNORH | 0.06 | 0.98 | 0.85 |
      | Xe.RUNORH ROTRAIN | 0.04 | 0.99 | 0.91 |
      | Xe.FV ORH SP | 0.08 | 0.92 | 0.85 |
      | Xe.MEAN ALL | 0.08 | 0.91 | 0.85 |

  11. CLEF-IP Tasks: Patent Image-based Document Retrieval
      - A pilot task with 211 topics. A topic is a patent document and its attached image files.
      - Task: find documents in the corpus that potentially invalidate the given patent document using the available patent images
      - Corpus is reduced to patents in 3 IPC subclasses: A43B, A61B, H01L.
      - Patent images were available
      - [Figure: chart of per-run results (y-axis range 0 to 1); series: MAP, P at 5, R at 5, R; runs on the x-axis: Xe.1, Xe.9, Xe.5, Xe.2, Xe.4, Xe.3, Xe.10, Xe.6, Xe.8, Xe.7]
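Reducing a corpus to the three IPC subclasses named on this slide amounts to a prefix filter on each patent's classification symbols. The sketch below assumes a corpus given as a mapping from document ID to a list of IPC symbols, and assumes a patent is kept when any of its symbols falls in one of the three subclasses; neither detail is stated on the slide, and the document IDs are invented.

```python
IMG_PAC_SUBCLASSES = ("A43B", "A61B", "H01L")

def in_reduced_corpus(ipc_symbols, subclasses=IMG_PAC_SUBCLASSES):
    """True if at least one IPC symbol starts with one of the given subclasses."""
    return any(s.replace(" ", "").upper().startswith(subclasses) for s in ipc_symbols)

def reduce_corpus(corpus, subclasses=IMG_PAC_SUBCLASSES):
    """Keep only the documents classified in one of the given subclasses."""
    return {doc: symbols for doc, symbols in corpus.items()
            if in_reduced_corpus(symbols, subclasses)}

# Hypothetical corpus: document id -> IPC symbols recorded for that patent.
corpus = {
    "EP-A": ["A61B 5/02"],
    "EP-B": ["G06F 17/30"],
    "EP-C": ["G06F 3/00", "H01L 21/00"],
}
print(sorted(reduce_corpus(corpus)))
```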

  12. Thank You
      - CLEF-IP Workshop, Tuesday
      - Our ‘Supporters’:
        - EU Network of Excellence PROMISE (FP7-258191)
        - Austrian Research Promotion Agency (FFG) FIT-IT project IMPEX (No. 825846)
      - www.ir-facility.org/clef-ip
      - www.ifs.tuwien.ac.at/~clef-ip/
      - {piroi, lupu, hanbury}@ifs.tuwien.ac.at, m.lupu@ir-facility.org, veronika.zenz@max-recall.com
