Towards Employing Multilingual Term Resources for Intelligent Patents Search



SLIDE 1

Galia Angelova and Irina Temnikova Institute of Information and Communication Technologies (IICT) Bulgarian Academy of Sciences

Towards Employing Multilingual Term Resources for Intelligent Patents Search

SLIDE 2

Presentation Structure

  • Introduction/Motivations
  • Related Work
  • Our Approach
SLIDE 3

Work Motivations

SLIDE 4

Introduction/Motivations 1

  • 1. 40 million patents available electronically
  • 2. Search mainly limited to:

– keywords
– Boolean operators
– proximity
– truncation/wildcards

  • 3. Manual query formulation (on average 5 minutes per query)

SLIDE 5

Introduction/Motivations 2

  • 4. Up to 40 hours for 15 queries in 100 documents (12 hours on average) [Joho, H., 2010]

  • 5. Limited contribution of Natural Language Processing (NLP)

  • 6. Multilingual search restricted to classification category numbers

SLIDE 6

Related Work

SLIDE 7

Related Work 1

  • 1. NLP for Patents – Various applications:

– Patent language analysis [Sheremetyeva, S., 2003; Lamirel, J.-Ch. et al., 2003; Hsin-hung Lin, D. et al., 2010]

– Patent readability improvement [Shinmori, A. et al., 2010]

– Patent text generation [Sheremetyeva, S. et al., 1996]
– Text classification [Nanba, H. et al., 2009]
– Patent translation [Waeschle, K., et al., 2011; Orsnes, B. et al., 1996; Choi, S.-K., et al., 2007]

SLIDE 8

Related Work 2

  • 1. NLP for Patents – Patent search:

– Document retrieval using Differential Latent Semantic Index and Template Matching Technique [Chen, L. et al., 2001]

– Detection of networks of patents, publications and persons [Li, H., et al., 2011]

– The role of spelling errors in patent search [Stein, B., et al., 2012]

– Query term distillation [Itoh, H., et al., 2005]
– Transforming a patent application into a search query [Xue, X., et al., 2009]

– Query expansion with synonyms using WordNet [Magdy, W., et al., 2011]

SLIDE 9

Related Work 3

  • Use of Wikipedia in NLP

– Text Simplification and Machine Translation [Coster, W. et al., 2011; Wubben, S., 2012]

– Question Answering [Dornescu, I., 2012]
– Word Sense Disambiguation in Wikipedia [Ratinov, L., 2011]

– Untangling Wikipedia cross-lingual links [De Melo, G., et al., 2010]

– Relation extraction [Yan, Y. et al., 2009]
– Named Entity disambiguation using Wikipedia [Cucerzan, S., 2007]

SLIDE 10

Our Approach

SLIDE 11

Our Approach 1

  • Aims:

– Improve patent search
– Enable multilingual search

  • How:

– By applying NLP techniques
– With the help of a large external multilingual terminological resource (extracted from Wikipedia)

SLIDE 12

Our Approach 2

  • Expanding search by:

– Annotation/indexing of patent applications
– Adding term equivalents to the search query itself

  • Types of terms added:

– Synonyms
– Paraphrases
– Multilingual equivalents
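A minimal sketch of adding term equivalents to a query, assuming the resource is available as a plain dictionary (`TERM_RESOURCE` is a hypothetical stand-in, filled with examples from these slides) and that the search engine accepts Boolean syntax with quoted phrases. The variants of each term are ORed together and the terms are ANDed; the function names are illustrative, not the authors' implementation.

```python
# Hypothetical stand-in for the Wikipedia-derived term resource;
# the entries reuse examples from these slides.
TERM_RESOURCE = {
    "childbirth": {
        "synonyms": [],
        "paraphrases": [],
        "multilingual": {
            "fr": ["accouchement", "travail", "naissance", "parturition"],
            "it": ["parto"],
        },
    },
    "date of birth": {"synonyms": [], "paraphrases": ["birthdate"], "multilingual": {}},
}


def quote(term: str) -> str:
    """Multi-word terms become phrase queries."""
    return f'"{term}"' if " " in term else term


def expand_term(term: str, resource: dict) -> list[str]:
    """The term itself followed by its known variants, without duplicates."""
    variants = [term]
    entry = resource.get(term.lower())
    if entry is not None:
        variants += entry["synonyms"] + entry["paraphrases"]
        for equivalents in entry["multilingual"].values():
            variants += equivalents
    seen: set[str] = set()
    unique = []
    for variant in variants:
        if variant.lower() not in seen:
            seen.add(variant.lower())
            unique.append(variant)
    return unique


def expand_query(terms: list[str], resource: dict) -> str:
    """OR the variants of each term together, AND across terms."""
    groups = ["(" + " OR ".join(quote(v) for v in expand_term(t, resource)) + ")" for t in terms]
    return " AND ".join(groups)


print(expand_query(["childbirth", "date of birth"], TERM_RESOURCE))
# (childbirth OR accouchement OR travail OR naissance OR parturition OR parto) AND ("date of birth" OR birthdate)
```

Terms missing from the resource pass through unchanged, so expansion can only widen a query, never drop a term.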

SLIDE 13

Our Approach 3

  • Recognition of the correct sense of the term:

– Patent classification labels

+

– Wikipedia page titles, which are controlled by experts

+

– The long texts of Wikipedia articles
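One simple way to combine these signals is a Lesk-style word overlap: among the Wikipedia pages whose titles match an ambiguous term, pick the one whose title and text share the most content words with the patent's classification label. This is a sketch of that swapped-in technique, not the authors' method; the stopword list, the candidate texts and the label below are hypothetical.

```python
import re

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "the", "of", "or", "for", "in", "on", "to", "is", "by", "with", "from"}


def content_words(text: str) -> set[str]:
    """Lower-cased alphabetic tokens minus stopwords (no stemming)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(classification_label: str, candidates: dict[str, str]) -> str:
    """Pick the page whose title + text overlaps most with the label.

    candidates maps Wikipedia page title -> article text.
    Ties go to the first candidate in dictionary order.
    """
    context = content_words(classification_label)

    def overlap(title: str) -> int:
        return len(context & content_words(title + " " + candidates[title]))

    return max(candidates, key=overlap)


# Hypothetical candidate senses for the ambiguous term "file".
CANDIDATES = {
    "File (tool)": "A file is a tool used to remove fine amounts of material "
                   "from a workpiece in woodworking and metalworking.",
    "Computer file": "A computer file is a resource for storing data on a "
                     "digital storage device; files are organised by file systems.",
}

print(disambiguate("Electric digital data processing; file systems", CANDIDATES))
# Computer file
```

Raw overlap counts favour long articles; normalising by text length or weighting words by rarity would be natural refinements.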

SLIDE 14

Our Approach 5

Patent term disambiguation using Wikipedia

SLIDE 15

Our Approach 4

International Patent Classification (IPC) terms
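To use IPC terms as context, a patent's IPC symbols first have to be related to the levels of the classification hierarchy (section, class, subclass, main group, subgroup). The sketch below is an assumption, not the authors' implementation: `ipc_hierarchy` is a hypothetical helper that expands a full IPC symbol into the symbols of all its levels. Looking up the label of each level in the IPC scheme is not shown.

```python
import re

# Full IPC symbol: section letter (A-H), two-digit class, subclass letter,
# then main group / subgroup numbers, e.g. "G06F 17/30" (pattern kept permissive).
IPC_SYMBOL = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def ipc_hierarchy(symbol: str) -> list[str]:
    """Return the symbol of each IPC level, from section down to the given group."""
    match = IPC_SYMBOL.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a full IPC symbol: {symbol!r}")
    section, ipc_class, subclass, group, subgroup = match.groups()
    subclass_symbol = section + ipc_class + subclass
    # Main groups are written with the subgroup part "/00".
    levels = [section, section + ipc_class, subclass_symbol, f"{subclass_symbol} {group}/00"]
    full_symbol = f"{subclass_symbol} {group}/{subgroup}"
    if full_symbol != levels[-1]:  # the symbol is a subgroup, not a main group
        levels.append(full_symbol)
    return levels


print(ipc_hierarchy("G06F 17/30"))
# ['G', 'G06', 'G06F', 'G06F 17/00', 'G06F 17/30']
```

Collecting the labels of all levels gives progressively broader context words, which can help when the label of the most specific group is short.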

SLIDE 16

Our Approach 5

The Wikipedia “Computer file” article

SLIDE 17

Our Approach 6

NLP term enrichment details:

1. Extraction of synonyms

  • textual markers, e.g. “also known as”
  • Wikipedia “redirect” pages ({{Redirect|term}})

2. Noun phrase paraphrases

  • “Date of birth” = “Birthdate”
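Both enrichment steps can be sketched with simple patterns. The assumptions here: synonyms are read from “also known as” phrases in article text and from the first parameter of the {{Redirect|…}} hatnote in wikitext; “X of Y” noun phrases are rewritten as compounds and kept only if the result is a known term. The regular expressions, the `vocabulary` set (standing in for, e.g., a list of Wikipedia page titles) and the sample texts are hypothetical; real article text varies far more.

```python
import re

# "X, also known as Y, ..." -> Y (a leading article a/an/the is dropped).
AKA_PATTERN = re.compile(r"also known as (?:an? |the )?([^,.;()]+)")
# Hatnote template {{Redirect|term|...}}: its first parameter names a redirect title.
REDIRECT_PATTERN = re.compile(r"\{\{\s*Redirect\s*\|\s*([^|}]+)", re.IGNORECASE)


def synonyms_from_article(text: str, wikitext: str) -> list[str]:
    """Collect synonym candidates from textual markers and Redirect hatnotes."""
    found = [m.strip() for m in AKA_PATTERN.findall(text)]
    found += [m.strip() for m in REDIRECT_PATTERN.findall(wikitext)]
    return found


def of_paraphrases(noun_phrase: str, vocabulary: set[str]) -> list[str]:
    """Rewrite 'HEAD of MODIFIER' as a compound; keep only attested forms.

    'date of birth' -> candidates 'birth date' and 'birthdate'.
    """
    words = noun_phrase.lower().split()
    if len(words) != 3 or words[1] != "of":
        return []
    head, _, modifier = words
    candidates = [f"{modifier} {head}", f"{modifier}{head}"]
    return [c for c in candidates if c in vocabulary]


print(synonyms_from_article(
    "A computer file, also known as a digital file, is a resource for storing data.", ""))
# ['digital file']
print(synonyms_from_article("", "{{Redirect|Birthdate|the film|Birthdate (film)}}"))
# ['Birthdate']
print(of_paraphrases("Date of birth", {"birthdate"}))
# ['birthdate']
```

Filtering paraphrase candidates against known titles keeps the rewrite rule from inventing compounds that nobody uses.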
SLIDE 18

Our Approach 7

NLP term enrichment details:

3. Multilingual equivalents

English: Childbirth

  • French: accouchement, travail, naissance, parturition
  • Italian: parto
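A sketch of one source of such equivalents: interlanguage links of the form [[fr:Accouchement]], which older Wikipedia wikitext listed at the end of an article (many of these links are now kept in Wikidata instead). The function name and the two-or-three-letter language-code pattern are simplifying assumptions. Such a link yields only the target page title; further variants such as “travail” or “naissance” would have to come from other sources, presumably redirects or article text in the target language.

```python
import re
from collections import defaultdict

# Interlanguage links in wikitext, e.g. [[fr:Accouchement]].
# Only 2-3 letter lowercase prefixes are treated as language codes (a simplification).
INTERLANGUAGE_LINK = re.compile(r"\[\[([a-z]{2,3}):([^\]|]+)\]\]")


def multilingual_equivalents(wikitext: str, languages: set[str]) -> dict[str, list[str]]:
    """Map language code -> linked page titles, keeping only the requested languages."""
    equivalents: dict[str, list[str]] = defaultdict(list)
    for lang, title in INTERLANGUAGE_LINK.findall(wikitext):
        if lang in languages:
            equivalents[lang].append(title.strip())
    return dict(equivalents)


wikitext = "'''Childbirth''' is ...\n[[fr:Accouchement]]\n[[it:Parto]]"
print(multilingual_equivalents(wikitext, {"fr", "it"}))
# {'fr': ['Accouchement'], 'it': ['Parto']}
```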
SLIDE 19

Our Approach 8

Multilingual indexing of patent terms.
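A sketch of multilingual indexing under one assumption: every disambiguated sense gets a language-independent concept ID, and patents are indexed by concept rather than by surface string, so a query in any language reaches the same postings. The concept table reuses the Childbirth example from the previous slide; the concept ID, the patent IDs and all function names are hypothetical. Ambiguous surface forms would need to be disambiguated before they are mapped to a single concept.

```python
from collections import defaultdict

# Hypothetical concept table: one ID per sense, with its surface forms per language.
CONCEPTS = {
    "C_CHILDBIRTH": {
        "en": ["childbirth"],
        "fr": ["accouchement", "travail", "naissance", "parturition"],
        "it": ["parto"],
    },
}


def build_surface_map(concepts: dict) -> dict[str, str]:
    """Map every lower-cased surface form to its concept ID."""
    return {
        term.lower(): cid
        for cid, forms in concepts.items()
        for terms in forms.values()
        for term in terms
    }


def index_patents(patents: dict[str, list[str]], surface_map: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: concept ID -> IDs of patents annotated with any of its forms."""
    index: dict[str, set[str]] = defaultdict(set)
    for patent_id, terms in patents.items():
        for term in terms:
            cid = surface_map.get(term.lower())
            if cid is not None:
                index[cid].add(patent_id)
    return index


def search(term: str, index: dict[str, set[str]], surface_map: dict[str, str]) -> list[str]:
    """Look up a query term in any language via its concept ID."""
    cid = surface_map.get(term.lower())
    if cid is None:
        return []
    return sorted(index.get(cid, set()))


SURFACE = build_surface_map(CONCEPTS)
INDEX = index_patents({"patent-1": ["childbirth"], "patent-2": ["computer file"]}, SURFACE)
print(search("parto", INDEX, SURFACE))
# ['patent-1']
```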

SLIDE 20

Summary 1

  • Improving patent search with NLP techniques
  • Using a large, controlled and regularly updated terminological resource (Wikipedia)
  • Challenges: term sense disambiguation

– Solutions:

  • Patent classification categories
  • Long texts of Wikipedia articles
SLIDE 21

Summary 2

  • Annotation of patent applications + search expansion with:

– Synonyms
– NP paraphrases
– Multilingual equivalents

SLIDE 22

Main References

  • Hunt, D., L. Nguyen, and M. Rodgers (Eds.). Patent searching: tools and techniques. Wiley, 2007.
  • Joho, H., L. A. Azzopardi, and W. Vanderbauwhede. A survey of patent users: an analysis of tasks, behavior, search functionality and system requirements. In Proceedings of the third symposium on Information interaction in context, ACM, 2010, pp. 13–24.
  • Lupu, M., K. Mayer, J. Tait, and A. J. Trippe (Eds.). Current challenges in patent information retrieval. The Information Retrieval Series, Vol. 29, Springer, 2011.

SLIDE 23

Thank you! Any comments or advice?