Towards Employing Multilingual Term Resources for Intelligent Patents Search
Galia Angelova and Irina Temnikova
Institute of Information and Communication Technologies (IICT), Bulgarian Academy of Sciences
2
Presentation Structure
- Introduction/Motivations
- Related Work
- Our Approach
3
Work Motivations
4
Introduction/Motivations 1
- 1. 40 million patents available electronically
- 2. Search mainly limited to:
– keywords
– Boolean operators
– proximity
– truncation/wildcards
- 3. Manual query formulation
(average of 5 minutes per query)
5
Introduction/Motivations 2
- 4. Up to 40 hours for 15 queries in 100 documents
(average 12 hours) [Joho, H., 2010]
- 5. Limited contribution of Natural Language Processing (NLP)
- 6. Multilingual search restricted to
category numbers
6
Related Work
7
Related Work 1
- 1. NLP for Patents – Various applications:
– Patent language analysis [Sheremetyeva, S., 2003; Lamirel, J.-Ch. et al., 2003; Hsin-hung Lin, D. et al., 2010]
– Patent readability improvement [Shinmori, A. et al., 2010]
– Patent text generation [Sheremetyeva, S. et al., 1996]
– Text classification [Nanba, H. et al., 2009]
– Patent translation [Waeschle, K. et al., 2011; Orsnes, B. et al., 1996; Choi, S.-K. et al., 2007]
8
Related Work 2
- 1. NLP for Patents – Patent search:
– Document retrieval using Differential Latent Semantic Index and Template Matching Technique [Chen, L. et al., 2001]
– Detection of networks of patents, publications, and persons [Li, H. et al., 2011]
– The role of spelling errors in patent search [Stein, B. et al., 2012]
– Query term distillation [Itoh, H. et al., 2005]
– Transforming a patent application into a search query [Xue, X. et al., 2009]
– Query expansion with synonyms using WordNet [Magdy, W. et al., 2011]
9
Related Work 3
- Use of Wikipedia in NLP
– Text Simplification and Machine Translation [Coster, W. et al., 2011; Wubben, S., 2012]
– Question Answering [Dornescu, I., 2012]
– Word Sense Disambiguation in Wikipedia [Ratinov, L., 2011]
– Untangling Wikipedia cross-lingual links [De Melo, G. et al., 2010]
– Relation extraction [Yan, Y. et al., 2009]
– Named Entity disambiguation using Wikipedia [Cucerzan, S., 2007]
10
Our Approach
11
Our Approach 1
- Aims:
– Improve patent search
– Make multilingual search possible
- How:
– By applying NLP techniques
– With the help of a large external multilingual terminological resource (extracted from Wikipedia)
12
Our Approach 2
- Expanding search by:
– Annotation/indexing of patent applications
– Adding term equivalents to the search query itself
- Types of terms added:
– Synonyms
– Paraphrases
– Multilingual equivalents
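The query-side expansion listed above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the `TermEntry` structure, the `expand_query` function, and the Boolean output syntax (quoted phrases joined with OR/AND) are assumptions, and the sample entries reuse only the term examples given later in this presentation.

```python
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    """Hypothetical record for one term in the terminological resource."""
    synonyms: list = field(default_factory=list)
    paraphrases: list = field(default_factory=list)
    translations: dict = field(default_factory=dict)  # language code -> terms


# Toy resource built from the examples on the later slides.
RESOURCE = {
    "date of birth": TermEntry(paraphrases=["birthdate"]),
    "childbirth": TermEntry(
        translations={
            "fr": ["accouchement", "travail", "naissance", "parturition"],
            "it": ["parto"],
        }
    ),
}


def expand_query(terms, resource, languages=()):
    """Build a Boolean query: variants of one term are OR-ed, terms are AND-ed."""
    groups = []
    for term in terms:
        variants = [term]
        entry = resource.get(term.lower())
        if entry is not None:
            variants += entry.synonyms + entry.paraphrases
            for lang in languages:
                variants += entry.translations.get(lang, [])
        # Drop duplicates case-insensitively while keeping the original order.
        seen, unique = set(), []
        for variant in variants:
            if variant.lower() not in seen:
                seen.add(variant.lower())
                unique.append(variant)
        groups.append("(" + " OR ".join(f'"{v}"' for v in unique) + ")")
    return " AND ".join(groups)


print(expand_query(["date of birth", "childbirth"], RESOURCE, languages=("it",)))
```

Keeping the variants of each original term in their own OR group preserves the structure of the user's query, so the expansion widens recall without merging unrelated terms.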
13
Our Approach 3
- Recognition of the correct sense of the term:
– Patent classification labels
+
– Wikipedia title pages controlled by experts
+
– Long texts of Wikipedia articles
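One simple way to combine a classification label with article text is to pick the candidate article whose text shares the most words with the label. The sketch below assumes this word-overlap scoring, which the slides do not specify; the function names, the stopword list, the label, and the shortened article texts are all made up for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"a", "an", "the", "is", "of", "for", "on", "to", "from", "used"}


def tokens(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def overlap_score(label, article_text):
    """Count the word occurrences shared by the label and the article text."""
    label_counts = Counter(tokens(label))
    article_counts = Counter(tokens(article_text))
    return sum(min(count, article_counts[word]) for word, count in label_counts.items())


def disambiguate(label, candidates):
    """Return the title of the candidate article that best matches the label."""
    return max(candidates, key=lambda title: overlap_score(label, candidates[title]))


# Hypothetical candidate senses for the ambiguous term "file".
CANDIDATES = {
    "Computer file": "A computer file is a resource for recording data on a computer storage device",
    "File (tool)": "A file is a tool used to remove fine amounts of material from a workpiece",
}

print(disambiguate("electric digital data processing", CANDIDATES))
```

A real system would need longer article texts and a weighting scheme such as TF-IDF; plain overlap is used here only to keep the idea visible.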
14
Our Approach 5
Patent terms disambiguation using Wikipedia
15
Our Approach 4
International Patent Classification (IPC) terms
16
Our Approach 5
The Wikipedia “Computer file” article
17
Our Approach 6
NLP term enrichment details:
1. Extraction of synonyms
- textual markers “also known as”
- Wikipedia “redirect” pages {{Redirect|term}}
2. Noun phrase paraphrases
- “Date of birth” = “Birthdate”
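The two synonym cues above can be sketched with regular expressions. This is a rough illustration under stated assumptions: the patterns, the function name `extract_synonyms`, and the sample text are made up, and real article markup would need more robust parsing.

```python
import re

# "also known as X" marker; stops at punctuation that usually ends the phrase.
AKA_PATTERN = re.compile(
    r"also known as\s+(?:an?\s+|the\s+)?([^,.;()\n]+)", re.IGNORECASE
)
# {{Redirect|term}} marker as written on the slide.
REDIRECT_PATTERN = re.compile(r"\{\{\s*Redirect\s*\|\s*([^|}]+)", re.IGNORECASE)


def extract_synonyms(wikitext):
    """Collect synonym candidates from both cues, without duplicates."""
    candidates = []
    for match in AKA_PATTERN.finditer(wikitext):
        # "X or Y" after the marker yields two candidates.
        candidates.extend(re.split(r"\s+or\s+", match.group(1)))
    candidates.extend(m.group(1) for m in REDIRECT_PATTERN.finditer(wikitext))

    synonyms, seen = [], set()
    for candidate in candidates:
        cleaned = candidate.strip()
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            synonyms.append(cleaned)
    return synonyms


# Made-up sample text, not a quotation from Wikipedia.
SAMPLE = (
    "{{Redirect|Parturition}}\n"
    "Childbirth, also known as labour or delivery, is the process of giving birth."
)

print(extract_synonyms(SAMPLE))
```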
18
Our Approach 7
NLP term enrichment details:
3. Multilingual equivalents
En: Childbirth
- French: accouchement, travail, naissance, parturition
- Italian: parto
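The index-side use of such equivalents can be sketched as a lookup from every language variant to one shared concept, so that a query in one language retrieves documents written in another. The data structures, function names, and the three toy documents below are assumptions for illustration; only the Childbirth equivalents come from the slide.

```python
import re
from collections import defaultdict

# Equivalents from the slide, keyed by an English concept label.
EQUIVALENTS = {
    "Childbirth": {
        "en": ["childbirth"],
        "fr": ["accouchement", "travail", "naissance", "parturition"],
        "it": ["parto"],
    }
}


def build_lookup(equivalents):
    """Map (language, term) pairs to their shared concept."""
    lookup = {}
    for concept, by_language in equivalents.items():
        for language, variants in by_language.items():
            for variant in variants:
                lookup[(language, variant.lower())] = concept
    return lookup


def index_documents(documents, lookup):
    """Annotate documents with concepts; documents maps id -> (language, text)."""
    index = defaultdict(set)
    for doc_id, (language, text) in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            concept = lookup.get((language, word))
            if concept is not None:
                index[concept].add(doc_id)
    return index


def search(term, language, lookup, index):
    """Return ids of documents annotated with the concept behind the term."""
    concept = lookup.get((language, term.lower()))
    return sorted(index.get(concept, set())) if concept is not None else []


# Hypothetical one-line documents in three languages.
DOCUMENTS = {
    "FR-1": ("fr", "Dispositif d'aide à l'accouchement"),
    "IT-1": ("it", "Dispositivo per il parto"),
    "EN-1": ("en", "A bicycle brake"),
}

LOOKUP = build_lookup(EQUIVALENTS)
INDEX = index_documents(DOCUMENTS, LOOKUP)
print(search("childbirth", "en", LOOKUP, INDEX))
```

Single-word matching of this kind ignores sense: the French "travail" also means "work", which is why the sense recognition step matters before annotation.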
19
Our Approach 8
Multilingual indexing of patent terms.
20
Summary 1
- Improving patent search with NLP techniques
- Using a large, controlled, and regularly updated terminological resource (Wikipedia)
- Challenges: term sense disambiguation
– Solutions:
- Patent classification categories
- Long texts of Wikipedia articles
21
Summary 2
- Patent application annotation + search expansion with:
– Synonyms
– NP paraphrases
– Multilingual equivalents
22
Main References
- Hunt, D., L. Nguyen, and M. Rodgers (Eds.). Patent searching: tools and techniques. Wiley, 2007.
- Joho, H., L. A. Azzopardi, and W. Vanderbauwhede. A survey of patent users: an analysis of tasks, behavior, search functionality and system requirements. In Proceedings of the third symposium on Information interaction in context, ACM, 2010, pp. 13-24.
- Lupu, M., K. Mayer, J. Tait, and A. J. Trippe (Eds.). Current challenges in patent information retrieval. The Information Retrieval Series, Vol. 29, Springer, 2011.
23