
Processing the Scope of Negation and Modality Cues in Biomedical Texts
Roser Morante, Walter Daelemans
CNTS-Language Technology Group, University of Antwerp

Framework
• The BIOGRAPH project (www.biograph.be), University of Antwerp:
- Text Mining: CNTS, Department of Linguistics, Walter Daelemans
- Data Mining: ADReM, Department of Mathematics and Computer Science, Bart Goethals
- Genetics: AMG, Department of Molecular Genetics, Jurgen Del-Favero

Framework
• The BIOGRAPH project aims at:
- Assisting researchers in ranking candidate disease-causing genes by putting forward a new methodology for combined text analysis and data mining from heterogeneous information sources
- Mining biomedical texts: providing accurate relations automatically extracted from text and weighted according to their reliability
• Treatment of negation, modality and quantification

Framework
The BIOGRAPH flow

Gene Prioritization
• Candidate region
- The gene responsible for a disease (e.g. schizophrenia or Alzheimer's) is in a known area of the genome
- Many genes (> 200) are in this candidate region
• Experimental validation is needed
- Very expensive in time and cost
• Combine information in literature and in databases!
- Which genes in the candidate region could be most relevant for the disease, and why?
- Provide a prioritization (ranking problem)

Event Extraction
MEDLINE:7747440
Epstein-Barr virus replicative gene transcription during de novo infection of human thymocytes: simultaneous early expression of BZLF-1 and its repressor RAZ.
Epstein-Barr virus (EBV) is known to infect B cells and epithelial cells. We and others have shown that EBV can also infect a subset of thymocytes. Infection of thymocytes was accompanied by the appearance of linear EBV genome within 8 hr of infection. Circularization of the EBV genome was not detected. This is in contrast to the infection in B cells where the genome can circularize within 24 hr of infection. The appearance of the BamHI ZLF-1 gene product, ZEBRA, by RT-PCR, was observed within 8 hr of infection. The appearance of a novel fusion transcript (RAZ), which comprised regions of the BZLF-1 locus and the adjacent BRLF-1 locus, was detected by RT-PCR. ZEBRA protein was also identified in infected thymocytes by immunoprecipitation. In addition, we demonstrated that the EBNA-1 gene in infected thymocytes was transcribed from the Fp promoter, rather than from the Cp/Wp promoter which is used in latently infected B cells. Transcripts encoding gp350/220, the major coat protein of EBV, were identified, but we did not find any evidence of transcription from the LMP-2A or EBER-1 loci in infected thymocytes. These observations suggest that de novo EBV infection of thymocytes differs from infection of B cells. The main difference is that with thymocytes, no evidence could be found that the virus ever circularizes. Rather, EBV remains in a linear configuration from which replicative genes are transcribed.

Event Extraction
MEDLINE:7747440
... In addition, we demonstrated that the EBNA-1 gene in infected thymocytes was transcribed from the Fp promoter, rather than from the Cp/Wp promoter which is used in latently infected B cells. Transcripts encoding gp350/220, the major coat protein of EBV, were identified, but we did not find any evidence of transcription from the LMP-2A or EBER-1 loci in infected thymocytes. These observations suggest that de novo EBV infection of thymocytes differs from infection of B cells.
<event id="E10" source="7747440" neg="1" spec="1">
  <predicate type="Transcription" begin="1216" end="1229"> transcription </predicate>
  <patient type="Theme" begin="1239" end="1245"> LMP-2A </patient>
</event>

Contents
• Motivation
• Negation
- Task description
- Related work
- Corpus
- System description
- Results
• Modality
- Related work
- Results
• Negation vs. modality
• Conclusions
• Further Research

Motivation
• Extracted information that falls in the scope of hedge or negation cues cannot be presented as factual information
• Vincze et al. (2008) report that 17.70% of the sentences in the BioScope corpus contain hedge cues and 13% contain negation cues
• Light et al. (2004) estimate that 11% of sentences in MEDLINE abstracts contain speculative fragments

Finding the scope of negation
• Finding the scope of a negation cue means determining, at the sentence level, which words in the sentence are affected by the negation(s)
Example: Analysis at the phenotype and genetic level showed that lack of CD5 expression was due neither to segregation of human autosome 11, on which the CD5 gene has been mapped, nor to deletion of the CD5 structural gene.

Related work
• Most of the related work focuses on detecting whether a term is negated or not
- Rule-based or regular-expression-based systems like NegEx (Chapman et al. 2001) and NegFinder (Mutalik et al. 2001)
- Machine learning systems like Averbuch et al. (2004)
- Huang and Lowe (2007) develop a hybrid system that combines regular expression matching with parsing in order to locate negated concepts

Corpus

Corpus
• Medical and biological texts annotated with information about negation and speculation
PMA treatment, and <xcope id="X1.4.1"><cue type="negation" ref="X1.4.1">not</cue> retinoic acid treatment of the U937 cells</xcope> acts in inducing NF-KB expression in the nuclei.
• Corpora

          Clinical   Papers   Abstracts
#Docs.    1954       9        1273
#Sent.    6383       2670     11871
#Words    41985      60935    282243
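The annotation above can be read programmatically. The following is a minimal sketch, assuming the annotated sentence is wrapped in a `<sentence>` element (the wrapper and the function name `cues_and_scopes` are illustrative, not taken from the slides). It pairs each cue with the text of the scope it refers to.

```python
import xml.etree.ElementTree as ET

# Assumption: a <sentence> wrapper is added so the fragment is well-formed XML.
SENTENCE = (
    '<sentence>PMA treatment, and <xcope id="X1.4.1">'
    '<cue type="negation" ref="X1.4.1">not</cue> retinoic acid treatment '
    'of the U937 cells</xcope> acts in inducing NF-KB expression in the '
    'nuclei.</sentence>'
)


def cues_and_scopes(sentence_xml):
    """Return (cue type, cue text, scope text) for every cue in the sentence."""
    root = ET.fromstring(sentence_xml)
    # Map each scope id to its full text, including the text of nested elements.
    scopes = {
        x.get("id"): " ".join("".join(x.itertext()).split())
        for x in root.iter("xcope")
    }
    return [
        (cue.get("type"), cue.text, scopes.get(cue.get("ref")))
        for cue in root.iter("cue")
    ]


print(cues_and_scopes(SENTENCE))
```

In this example the scope text includes the cue itself, which matches the way the cue is nested inside the `xcope` element.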

Experimental Setting
• Abstracts corpus:
- 10-fold cross-validation experiments
• Clinical and papers corpora: robustness test
- Training on abstracts
- Testing on clinical and papers
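As an illustration of the 10-fold setup, here is a minimal sketch of a cross-validation split. The slides do not say how the folds were built; splitting at document level and assigning documents round-robin are assumptions, and `k_fold` is a hypothetical helper.

```python
def k_fold(items, k=10):
    """Yield (train, test) pairs; every item is in the test set exactly once."""
    # Assumption: items are documents, assigned to folds round-robin.
    folds = [items[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


docs = [f"doc{n}" for n in range(25)]
for train, test in k_fold(docs):
    # Train and test never share a document within a fold.
    assert not set(train) & set(test)
```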

System Description
• We model the scope finding task as two consecutive classification tasks:
- Finding negation cues: a token is classified as being at the beginning of a negation signal, inside or outside
- Finding the scope: a token is classified as being the first element of a scope sequence, the last, or neither
• Supervised machine learning approach
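The two label schemes can be sketched as follows. This is an illustration only: the label names `B`/`I`/`O` and `F`/`L`/`O`, the inclusive token-index spans, and the function names are assumptions chosen for the example.

```python
def cue_labels(tokens, cue_spans):
    """B = first token of a negation signal, I = inside it, O = outside."""
    labels = ["O"] * len(tokens)
    for start, end in cue_spans:  # inclusive token indices (assumption)
        labels[start] = "B"
        for i in range(start + 1, end + 1):
            labels[i] = "I"
    return labels


def scope_labels(tokens, scope_span):
    """F = first token of the scope, L = last token, O = neither."""
    first, last = scope_span
    labels = ["O"] * len(tokens)
    labels[first] = "F"
    # Assumption: for a one-token scope, L overwrites F.
    labels[last] = "L"
    return labels


tokens = ("PMA treatment , and not retinoic acid treatment of the U937 "
          "cells acts in inducing NF-KB expression in the nuclei .").split()
print(list(zip(tokens, cue_labels(tokens, [(4, 4)]), scope_labels(tokens, (4, 11)))))
```

Here the cue "not" is token 4 and the scope runs from "not" to "cells" (token 11), following the corpus example.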

System Architecture

Preprocessing

Finding Negation Cues
• We filter out negation cues that are unambiguous in the training corpus (17 out of 30)
• For the rest, a classifier predicts whether a token is the first token of a negation signal, inside or outside of it
- Algorithm: IGTREE as implemented in TiMBL (Daelemans et al. 2007)
- Instances represent all tokens in a sentence
- Features about the token in focus and its context

Features for negation cue finding
• Of the token
- Lemma, word, POS and IOB chunk tag
• Of the token context
- Word, POS and IOB chunk tag of 3 tokens to the right and 3 to the left
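The feature set above can be sketched as a per-token feature extractor. The input representation (a list of dicts with `word`, `lemma`, `pos` and `chunk` keys), the feature names, and the `_` padding value for positions outside the sentence are assumptions made for this example.

```python
def token_features(sentence, i, window=3):
    """Features of token i plus word/POS/chunk of `window` tokens on each side."""
    feats = {key: sentence[i][key] for key in ("lemma", "word", "pos", "chunk")}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        neighbour = sentence[j] if 0 <= j < len(sentence) else None
        for key in ("word", "pos", "chunk"):
            # "_" pads positions before the start or after the end (assumption).
            feats[f"{key}[{offset:+d}]"] = neighbour[key] if neighbour else "_"
    return feats


sentence = [
    {"word": "did", "lemma": "do", "pos": "VBD", "chunk": "B-VP"},
    {"word": "not", "lemma": "not", "pos": "RB", "chunk": "I-VP"},
    {"word": "find", "lemma": "find", "pos": "VB", "chunk": "I-VP"},
]
print(token_features(sentence, 1))
```

Each instance then has 4 features for the token in focus and 3 features for each of the 6 context positions.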

Ambiguous Negation Cues in Abstracts Corpus

Results
• Baseline: tagging as negation signals tokens that are negation signals in at least 50% of the occurrences in the training corpus
• Baseline tokens: absence, absent, cannot, could not, fail, failure, impossible, instead of, lack, miss, neither, never, no, none, nor, not, rather than, unable, with the exception of, without

BASELINE    PREC    RECALL   F1      IAA
Abstracts   82.00   95.17    88.09   94.46
Papers      84.01   92.46    88.03   79.42
Clinical    97.31   97.53    97.42   90.70

SYSTEM      PREC    RECALL   F1
Abstracts   84.72   98.75    91.20 (+3.11)
Papers      87.18   95.72    91.25 (+3.22)
Clinical    97.33   98.09    97.71 (+0.29)
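A minimal sketch of the baseline and of the F1 measure, for illustration. The training data format (pairs of token and a flag saying whether that occurrence is a negation signal), the lowercasing, and the single-token treatment are assumptions; multiword cues such as "could not" would need extra handling.

```python
from collections import Counter


def train_baseline(training, threshold=0.5):
    """Return tokens that are negation signals in >= threshold of occurrences."""
    total, as_cue = Counter(), Counter()
    for token, is_cue in training:
        key = token.lower()  # lowercasing is an assumption
        total[key] += 1
        as_cue[key] += int(is_cue)
    return {tok for tok in total if as_cue[tok] / total[tok] >= threshold}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


training = [("not", True), ("not", True), ("not", False),
            ("fail", True), ("rather", False), ("rather", False), ("rather", True)]
print(train_baseline(training))
print(f1(84.72, 98.75))
```

Recomputing F1 from the rounded precision and recall in the tables gives values close to the reported ones; small differences in the last digit are expected because the inputs are already rounded.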

Results: system vs. baseline in abstracts corpus
• The system performs better

Results in the three corpora
• The system is portable

Discussion
• Cause of lower recall on papers corpus: not

NOT         % negation signals   % classified correctly
Abstracts   58.89                98.25
Papers      53.22                93.68
Clinical    6.72                 91.22

• Errors: not is classified as negation signal
- However, programs for tRNA identification [...] do not necessarily perform well on unknown ones
- The evaluation of this ratio is difficult because not all true interactions are known

Finding Scopes
• Three classifiers predict whether a token is the first token in the scope sequence, the last or neither
- MBL (Daelemans et al. 2007)
- SVM light (Joachims 1999)
- CRF++ (Lafferty et al. 2001)
• A fourth classifier predicts the same, taking as input the output of the previous classifiers
- CRF++
• The features used by the object classifiers and the metalearner are different
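The stacking arrangement above can be sketched as follows. The slides do not list the metalearner's features, so the instance layout here (token, one prediction per base classifier, and a hypothetical `agree` flag) is an assumption, as is the `decode_scope` helper that turns final F/L/O labels into a token span.

```python
def meta_instances(tokens, mbl, svm, crf):
    """Build one metalearner instance per token from three base predictions."""
    if not (len(tokens) == len(mbl) == len(svm) == len(crf)):
        raise ValueError("all sequences must have the same length")
    return [
        # "agree" is a hypothetical extra feature, not taken from the slides.
        {"token": t, "mbl": a, "svm": b, "crf": c, "agree": a == b == c}
        for t, a, b, c in zip(tokens, mbl, svm, crf)
    ]


def decode_scope(labels):
    """Return (first, last) token indices of the scope, or None if incomplete."""
    if "F" not in labels:
        return None
    first = labels.index("F")
    lasts = [i for i in range(first, len(labels)) if labels[i] == "L"]
    # Assumption: when several L labels follow F, the rightmost one is used.
    return (first, lasts[-1]) if lasts else None


tokens = ["did", "not", "find", "evidence", "."]
instances = meta_instances(
    tokens,
    mbl=["O", "F", "O", "L", "O"],
    svm=["O", "F", "O", "O", "O"],
    crf=["O", "F", "O", "L", "O"],
)
print(instances[3])
print(decode_scope(["O", "F", "O", "L", "O"]))
```

The instances would then be passed to a sequence learner such as CRF++, which makes the final F/L/O decision per token.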

Finding Scopes

Finding Scopes
• Previous attempts: lower results
- Chunk-based classification, instead of word-based
- BIO classification of tokens (EMNLP’08) instead of FOL (First, Other, Last)
- Single classifier approach, instead of metalearner
