
slide-1
SLIDE 1

Background Framework Implementation Demo Evaluation Conclusion

Extraction of Author’s Definitions Using Indexed Reference Identification

Marc Bertin, Iana Atanassova and Jean-Pierre Desclés

Paris-Sorbonne University, LaLIC Laboratory

18 September 2009, RANLP 2009

Marc Bertin, Iana Atanassova and Jean-Pierre Desclés Paris-Sorbonne University, LaLIC Laboratory RANLP 2009

slide-2
SLIDE 2


Outline

1 Background
2 Framework
3 Implementation
4 Demo
5 Evaluation
6 Conclusion

slide-3
SLIDE 3

1 Background

slide-4
SLIDE 4

Studies on definition in the LaLIC laboratory (E. Cartier, 2004; T. Hacene, 2008; C. Teissedre, 2008).

Implementation: several tools for segmentation and semantic annotation:

  • SegaTex: G. Mourad, 2001; B. Djioua, 2006
  • Excom annotation platform: B. Djioua and J.-P. Desclés, 2006; M. Alrahabi, 2008

Work in the field of Bibliosemantics (M. Bertin, 2006-2009): identification and annotation of relations between authors based on bibliographic links.

slide-5
SLIDE 5

2 Framework

slide-6
SLIDE 6

Our aim is to establish links between authors by using the indexed references in the text, and then to identify definitions and relate them to those authors. The method we propose relies on the indexed references: when a definition is identified within the search scope determined by the segmentation, the indexed reference allows us to link that definition to the author cited in the text.

slide-7
SLIDE 7

Two relations:

1 the relation between the definiendum (what is to be defined) and the definiens (what defines it);

2 the relation between the definition itself and the author.

We can associate a definition with an author; in this case we can speak of signed definitions. The bibliographic links give us a starting context, or scope, for the search for definitions.
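
A minimal sketch of how the two relations could be represented, assuming a simple Python data model that is not part of the original system. The names `SignedDefinition` and `link_definitions`, and the input format, are hypothetical. A definition found in a segment is attached to every author cited by an indexed reference in the same segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedDefinition:
    """A definition linked to the author cited in its scope (hypothetical model)."""
    definiendum: str  # what is to be defined
    definiens: str    # what defines it
    author: str       # author taken from the indexed reference


def link_definitions(segments):
    """Attach each definition to the authors cited in the same segment.

    `segments` is an assumed input format: a list of dicts with the keys
    "definitions" (list of (definiendum, definiens) pairs) and
    "references" (list of author names found in that segment).
    """
    signed = []
    for segment in segments:
        for definiendum, definiens in segment["definitions"]:
            for author in segment["references"]:
                signed.append(SignedDefinition(definiendum, definiens, author))
    return signed


# Illustrative placeholder data only, not taken from the corpus.
example = [
    {
        "definitions": [("term X", "a property that characterises X")],
        "references": ["Author A"],
    },
    {"definitions": [], "references": ["Author B"]},
]
signed_definitions = link_definitions(example)
```

Only the first segment contains a definition, so only one signed definition is produced; the second segment cites an author but defines nothing.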

slide-8
SLIDE 8

Linguistic Study of the Definition

We have used the semantic map proposed by T. Hacene (2008). The implementation covers the part of this semantic map that is relevant to our purpose. The linguistic study of our corpus has led us to a better understanding of the distinction between a definition and a definitory characteristic, which has been taken into consideration in the construction of our linguistic resources. We define a definitory characteristic as a sentence that gives only some essential properties of the defined object.

slide-9
SLIDE 9

Linguistic Study of the Definition

We have distinguished three sub-categories of definitory characteristics:

1 identification
2 determined categorization
3 pseudo-definition

Two sub-categories of definitions:

1 general definitions
2 axiomatic definitions

slide-10
SLIDE 10

(Taouise Hacene, 2008)

slide-11
SLIDE 11

3 Implementation

slide-12
SLIDE 12

Processing Overview

slide-13
SLIDE 13

Segmentation

  • Segmentation tools: SegaTex (G. Mourad, 2001; B. Djioua, 2006), Excom-2 (M. Alrahabi, 2008)
  • Segmentation into sentences, paragraphs and sections
  • Segmentation rules based on punctuation and capitalisation
  • Different languages (French, English, Bulgarian, Arabic, ...)
  • Input: text files
  • Output: DocBook format, UTF-8 encoding
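
To illustrate segmentation rules based on punctuation and capitalisation, here is a rough sketch in Python. It is not the SegaTex or Excom-2 rule set: the single rule below (a sentence ends at ".", "!" or "?" when the next token starts with an ASCII upper-case letter) and the function name `split_sentences` are assumptions made for the example.

```python
import re

# Assumed rule: a sentence ends at ".", "!" or "?" when the next token,
# after whitespace, starts with an upper-case ASCII letter.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Split text into sentences using punctuation and capitalisation."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text) if s.strip()]


sample = "We segment the text. Each sentence is kept! Is it enough? yes."
sentences = split_sentences(sample)
```

The last question mark is not treated as a boundary because the following word is in lower case. A rule this simple would wrongly split after abbreviations such as "e.g." followed by a capitalised name, which is one reason real segmenters need richer rules.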

slide-14
SLIDE 14

Segmentation Output

slide-15
SLIDE 15

Processing Overview

slide-16
SLIDE 16

Processing Overview

slide-17
SLIDE 17

Indexed Reference Identification - 1

  • Norms: ISO 690, ISO 690-2, AFNOR NF Z 44-005, AFNOR NF Z 44-005-2
  • Examples: (Hoc, 1990a), (Thom, 1970), (Dingwall et al., 1995; Hartmann and Görlich, 1995), [24], Pickett-Heaps et al. (1990), (like other authors e.g. Raven, 1983), (Cwuc and SPRAGUE 1989), (18, 53, 56)
  • Finite state automata and identification of known named entities
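
The reference forms listed above can be approximated with regular expressions, which stand in here for the finite state automata mentioned on the slide. This is only a sketch: the three patterns and the name `find_references` are assumptions, not the authors' rules, and the sketch does not recover an author name that precedes a bare "(1990)".

```python
import re

# Hypothetical patterns covering the three families of examples:
# author-year in parentheses, numbers in brackets, numbers in parentheses.
REFERENCE = re.compile(
    r"""
    \((?P<author_year>[^()]*\b(?:19|20)\d{2}[a-z]?)\)   # (Hoc, 1990a), (1990)
    | \[(?P<bracket>\d+(?:\s*[,-]\s*\d+)*)\]            # [24]
    | \((?P<numeric>\d{1,3}(?:\s*,\s*\d{1,3})*)\)       # (18, 53, 56)
    """,
    re.VERBOSE,
)


def find_references(sentence):
    """Return the text of every indexed reference found in the sentence."""
    return [match.group(0) for match in REFERENCE.finditer(sentence)]


sample = (
    "This was shown by Pickett-Heaps et al. (1990) and in later work "
    "(Dingwall et al., 1995; Hartmann and Görlich, 1995), see also [24] "
    "and (18, 53, 56)."
)
references = find_references(sample)
```

The author-year pattern accepts any parenthesised text that ends in a year, so it also covers cases such as "(like other authors e.g. Raven, 1983)". It would equally accept non-bibliographic parentheses ending in a year, which a production system would have to filter out.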

slide-18
SLIDE 18

Indexed Reference Identification - 2

slide-19
SLIDE 19

Annotation

  • Automatic annotation through exploration of the context: the Contextual Exploration Method (Desclés, 1997, 2006)
  • Based on linguistic resources, which are manually constructed
  • Resources: surface linguistic markers (indicators and clues) and contextual exploration rules
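
A toy sketch of how an indicator, clues and a rule could interact, written in Python for illustration only. The markers ("is defined as", "term", "notion", "concept"), the rule layout and the function `annotate` are hypothetical and do not reproduce the Excom resources: the indicator triggers the rule, and the annotation is assigned only if a clue is found in the left context.

```python
import re

# Hypothetical rule: the indicator triggers the rule, and the annotation is
# assigned only if a clue is found in the text to the left of the indicator.
RULE = {
    "annotation": "definition",
    "indicator": re.compile(r"\bis defined as\b", re.IGNORECASE),
    "left_clues": re.compile(r"\b(?:term|notion|concept)\b", re.IGNORECASE),
}


def annotate(sentence, rule=RULE):
    """Return the rule's annotation if indicator and clue are both present."""
    indicator = rule["indicator"].search(sentence)
    if indicator is None:
        return None
    left_context = sentence[: indicator.start()]
    if rule["left_clues"].search(left_context):
        return rule["annotation"]
    return None


labels = [
    annotate("The notion of scope is defined as the segment around a reference."),
    annotate("The scope is defined as above."),
    annotate("We segment the text into sentences."),
]
```

Only the first sentence is annotated: the second contains the indicator but no clue in its left context, and the third contains no indicator at all.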

slide-20
SLIDE 20

Annotation

  • Excom annotation system (B. Djioua, 2006; M. Alrahabi, 2008)
  • Available online: www.excom.fr
  • Input: segmented XML files
  • Output: annotated XML files

slide-21
SLIDE 21

Contextual Exploration Rule: Example

slide-22
SLIDE 22

Annotated sentence: Example

slide-23
SLIDE 23

What can we do with the annotations?

  • Information retrieval of definitions.
  • Identify the definitions of a given notion. The same notion sometimes has several different definitions, especially in the humanities.
  • For a given keyword, identify the domains in which it is used.
  • Find the definitions related to an author.
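
As a sketch of these uses, annotated sentences could be indexed by notion and by author. The record layout, the placeholder values and the function `build_index` are assumptions made for this example; they do not describe the actual interface.

```python
from collections import defaultdict

# Illustrative annotated sentences (hypothetical records, not corpus data).
annotated = [
    {"notion": "notion X", "author": "Author A", "sentence": "S1"},
    {"notion": "notion X", "author": "Author B", "sentence": "S2"},
    {"notion": "notion Y", "author": "Author A", "sentence": "S3"},
]


def build_index(records, key):
    """Group sentence identifiers by the given field ("notion" or "author")."""
    index = defaultdict(list)
    for record in records:
        index[record[key]].append(record["sentence"])
    return dict(index)


by_notion = build_index(annotated, "notion")
by_author = build_index(annotated, "author")
```

Looking up `by_notion["notion X"]` returns the competing definitions of one notion, and `by_author["Author A"]` returns the definitions related to one author.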

slide-24
SLIDE 24

System Overview: Interface and Exploitation

slide-25
SLIDE 25

4 Demo

slide-26
SLIDE 26

5 Evaluation

slide-27
SLIDE 27

Multilingual Corpora

Scientific texts and articles available online.

French corpus: Intellectica (1991-2002), ALSIC, TALN, IRISA, 6 PhD theses.

English corpus: 116 articles from Nature, Journal of Cell Science, Biophysical Journal, Proceedings of the National Academy of Sciences, The Journal of Cell Biology, and others (J. Desclés, 2008).

Corpus     Sentences   Annotated Sentences   Percentage
French     119410      5976                  5.00 %
English    38378       1743                  4.54 %
Total      157788      7719                  4.89 %

slide-28
SLIDE 28

Precision and Recall

Evaluation set: random sample of 500 sentences from our corpora.

Results for the identification of indexed references:

Recall   Precision   F-measure
0.911    0.989       0.9483
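
For reference, the F-measure can be recomputed from the reported recall and precision, assuming the usual balanced definition (the harmonic mean of the two); the slide does not state which variant was used.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported for the identification of indexed references.
score = f_measure(precision=0.989, recall=0.911)
```

Under that assumption the result is roughly 0.948, in line with the reported F-measure.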

slide-29
SLIDE 29

Cohen’s Weighted Kappa

The Kappa test, proposed by Cohen, provides a method for numerically measuring the agreement between two or more observers or methods whose judgments are qualitative in nature. The test consists in carrying out a "concordance" session between the judges in order to evaluate their rate of agreement. Agreement between the judges is defined as the conformity of two or more pieces of information concerning the same object.

slide-30
SLIDE 30

We have built a set of annotated text segments, which were evaluated independently by two human judges. The judges had to classify each segment as either correct or incorrect. Evaluation set: 50 sentences.

slide-31
SLIDE 31

Judge A      Judge B
Answers      Correct   Incorrect   Total
Correct      33        5           38
Incorrect    3         9           12
Total        36        14          50

κ = 0.6515 (substantial agreement)
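
As a cross-check, the standard unweighted Cohen's kappa can be computed from the counts in the table above. This sketch uses the textbook formula (observed agreement against the agreement expected by chance); the function name `cohen_kappa` is an assumption, and the slide does not say how the reported value was derived.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table (list of rows)."""
    total = sum(sum(row) for row in table)
    # Observed agreement: proportion of items on the diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / total
    # Chance agreement: product of the marginal proportions per category.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Counts from the table above: rows and columns are Correct, Incorrect.
kappa = cohen_kappa([[33, 5], [3, 9]])
```

With these counts the observed agreement is 0.84, the chance agreement is 0.6144, and the formula gives about 0.585. The reported κ = 0.6515 was therefore presumably obtained with a different procedure.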

slide-32
SLIDE 32

6 Conclusion

slide-33
SLIDE 33

Future Work

  • Construct the linguistic resources needed to cover the rest of the semantic map.
  • Extend the corpora and include articles from different domains.
  • Transfer the linguistic resources to other languages and test the annotation.

slide-34
SLIDE 34

Related Work

  • Work on semantic annotation for different tasks: events, citations, localisation, hypotheses, ... (Fl. le Priol, M. Alrahabi, B. Djioua)
  • Bibliosemantics (M. Bertin, 2008)
  • Automatic summarisation (A. Blais, 2007)
  • Information retrieval (I. Atanassova, 2008)
  • Processing of scientific texts in biology (BioExcom) (J. Desclés, 2008)

slide-35
SLIDE 35

Bibliography

  • J.-P. Desclés, 2006, Contextual exploration processing for discourse automatic annotations of texts, FLAIRS-19, Invited Speaker

  • E. Cartier, 2004, Repérage automatique des expressions définitoires : modélisation de l’information définitoire, Méthode d’exploration contextuelle, Méthodologie de développement des ressources linguistiques, Description des expressions du français contemporain, PhD thesis, under the direction of J.-P. Desclés, Paris-Sorbonne University

  • T. Hacene, 2008, Comment extraire des définitions des textes ?, Master’s thesis, Paris-Sorbonne University

  • C. Teissedre, B. Djioua and J.-P. Desclés, 2008, Automatic Retrieval of Definitions in Texts, in Accordance with a General Linguistic Ontology, FLAIRS-21, Florida, pp. 518-523


slide-36
SLIDE 36

Thank you for your attention!

Further information:

marc.bertin@paris-sorbonne.fr
iana.atanassova@paris-sorbonne.fr
jean-pierre.descles@paris-sorbonne.fr