Institut für Maschinelle Sprachverarbeitung
Text Mining on Clinical Data
Robert McHardy

Outline
• Motivation
• Medical Entity Recognition
• Anonymization of Medical Reports
• Knowledge-based Biomedical Word Sense Disambiguation
• Extraction of Potential Adverse Drug Events
• Resources
Universität Stuttgart 5.12.2017

Motivation: Different Users

Motivation: Why do we need Text Mining on Clinical Data?
• Doctors need to know whether a drug is safe to use
• As fast as possible
• We don't want to suffer from unsafe drugs
• Researchers want to use the data
• It has to be anonymized

Motivation: PubMed, again!

Unified Medical Language System Metathesaurus (UMLS)

Medical Entity Recognition: Overview
• According to Abacha and Zweigenbaum, the task consists of two parts:
  • Detecting phrases that refer to medical entities
  • Assigning semantic categories to the entities found

Medical Entity Recognition: Overview
• Type 1 diabetes
• T1D
• Diabetes type 1
• IDDM
• Juvenile diabetes

Medical Entity Recognition: Noun Phrase Chunking
Pharmacodynamic studies, including positron-emission tomography (PET) and computed tomography (CT) […]
• Many tools for NP chunking are available
• Maximum recall is desired
• Open-domain tools like IMS's TreeTagger are suitable
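To make the chunking step concrete, here is a minimal sketch of a rule-based noun phrase chunker. It is not TreeTagger: it assumes the tokens already carry Penn Treebank-style POS tags (assigned by hand below), and the function name `chunk_noun_phrases` is made up for this illustration.

```python
# Toy NP chunker: assumes tokens are already POS-tagged (Penn Treebank-style tags).
NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def chunk_noun_phrases(tagged_tokens):
    """Return maximal runs of determiner/adjective/noun tokens that contain a noun."""
    phrases, current = [], []
    # The sentinel token forces the last run to be flushed.
    for word, tag in tagged_tokens + [("", "END")]:
        if tag in NP_TAGS:
            current.append((word, tag))
            continue
        if any(t in NOUN_TAGS for _, t in current):
            phrases.append(" ".join(w for w, _ in current))
        current = []
    return phrases


# Hand-tagged fragment of the example sentence (the tags are an assumption).
tagged = [
    ("Pharmacodynamic", "JJ"), ("studies", "NNS"), (",", ","),
    ("including", "VBG"), ("positron-emission", "NN"), ("tomography", "NN"),
    ("and", "CC"), ("computed", "JJ"), ("tomography", "NN"),
]
noun_phrases = chunk_noun_phrases(tagged)
print(noun_phrases)
```

Taking maximal runs errs on the side of long candidate phrases, which fits the slide's preference for recall over precision at this stage.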

Medical Entity Recognition: MetaMap and the UMLS
• MetaMap is a tool that maps noun phrases in raw text to UMLS concepts
• Candidate concepts are ranked by a matching score

Medical Entity Recognition: MetaMap and the UMLS
• Three problems with MetaMap
  • Noun chunking performance is worse than with specialized NLP tools
  • Medical entity detection often finds verbs and general words which aren't MEs
  • Some ambiguity is left
    • UMLS can provide several concepts for a term
    • and several semantic categories for a concept

Medical Entity Recognition: MetaMap and the UMLS
Pharmacodynamic studies, including positron-emission tomography (PET) and computed tomography (CT) […]
• Cold temperature
• Common cold
• Cold ( term)
• Cold storage ( term)
• Cold storage
• Chronic obstructive lung disease

Medical Entity Recognition: MetaMap+
• Use tools like TreeTagger for the NP chunking
• Filter NPs with a stop-word list
• Search in specialized lists for candidate terms
• Annotate entities with MetaMap
• Filter frequent errors and overly broad semantic types
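The two filtering steps in this pipeline can be sketched as follows. This is only an illustration: the stop-word list, the set of "too broad" semantic types, and the sample annotations are hypothetical stand-ins, and the MetaMap annotation step itself is not reproduced here.

```python
# Hypothetical resources; the lists actually used by MetaMap+ are not given on the slide.
STOP_WORDS = {"the", "a", "an", "this", "patient"}
BROAD_TYPES = {"Qualitative Concept", "Functional Concept"}


def filter_noun_phrases(noun_phrases):
    """Strip stop words from each NP and drop phrases with nothing left."""
    kept = []
    for phrase in noun_phrases:
        words = [w for w in phrase.split() if w.lower() not in STOP_WORDS]
        if words:
            kept.append(" ".join(words))
    return kept


def filter_annotations(annotations):
    """Drop (phrase, semantic type) pairs whose type is considered too broad."""
    return [(p, t) for p, t in annotations if t not in BROAD_TYPES]


candidates = filter_noun_phrases(["the patient", "a chest pain", "this CT scan"])

# Stand-in for the annotator's output: (phrase, semantic type) pairs.
annotated = [
    ("chest pain", "Sign or Symptom"),
    ("CT scan", "Diagnostic Procedure"),
    ("severe", "Qualitative Concept"),
]
entities = filter_annotations(annotated)
print(candidates)
print(entities)
```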

Medical Entity Recognition: MetaMap+
• Voting mechanism to disambiguate semantic categories
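The slide does not say who casts the votes or how ties are broken. As one possible reading, the sketch below assumes that every candidate concept for a phrase contributes one vote for its category and that a plain majority wins; `vote_category` is a hypothetical helper, not part of MetaMap.

```python
from collections import Counter


def vote_category(candidate_categories):
    """Return the most frequent category, or None if there are no candidates.

    Ties go to the category that was seen first (Counter preserves insertion order).
    """
    if not candidate_categories:
        return None
    return Counter(candidate_categories).most_common(1)[0][0]


# Categories proposed by three hypothetical candidate concepts for one phrase.
winner = vote_category(["Problem", "Test", "Problem"])
print(winner)
```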

Medical Entity Recognition: Support Vector Machines (SVMs)
• Word level features:
  • words of the NP
  • number of words of the NP
  • window of words around the NP
• Orthographical features:
  • first letter capitalized
  • all letters upper-/lowercase
  • contains abbreviation(s)
• POS tags
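A feature extractor along the lines of this list might look like the sketch below. The function name, the window size, and the abbreviation heuristic (an all-caps token of at least two characters) are assumptions made for illustration; the features would still need to be vectorized before being passed to an SVM.

```python
def np_features(tokens, start, end, pos_tags, window=2):
    """Features for the noun phrase tokens[start:end] (end exclusive)."""
    np_words = tokens[start:end]
    return {
        # Word level features
        "words": [w.lower() for w in np_words],
        "num_words": len(np_words),
        "left_context": tokens[max(0, start - window):start],
        "right_context": tokens[end:end + window],
        # Orthographical features
        "first_capitalized": np_words[0][0].isupper(),
        "all_upper": all(w.isupper() for w in np_words),
        "all_lower": all(w.islower() for w in np_words),
        "has_abbreviation": any(len(w) > 1 and w.isupper() for w in np_words),
        # POS tags of the phrase
        "pos_tags": pos_tags[start:end],
    }


tokens = ["The", "patient", "had", "a", "CT", "scan", "yesterday"]
pos_tags = ["DT", "NN", "VBD", "DT", "NN", "NN", "NN"]
features = np_features(tokens, 4, 6, pos_tags)  # the NP "CT scan"
print(features)
```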

Medical Entity Recognition: BIO-CRFs
Pharmacodynamic studies, including positron-emission tomography (PET) and computed tomography (CT) […]
• Words are annotated with the tags B, I and O
  • B-x: Beginning of a phrase of class x
  • I-x: Inside (continuation) of a phrase of class x
  • O: Outside any entity
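As a small illustration of the tagging scheme, the sketch below converts entity spans into BIO tags. The spans and the class label "Test" for the two imaging procedures are my own annotation of the example sentence, not taken from the corpus.

```python
def bio_tags(tokens, entities):
    """Convert (start, end, label) token spans (end exclusive) into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


tokens = [
    "Pharmacodynamic", "studies", ",", "including",
    "positron-emission", "tomography", "and", "computed", "tomography",
]
# Assumed annotation: both imaging procedures belong to the class "Test".
entities = [(4, 6, "Test"), (7, 9, "Test")]
tags = bio_tags(tokens, entities)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```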

Medical Entity Recognition: BIO-CRFs
• Word level features:
  • The word itself
  • Window of words
  • Lemmas
• Orthographical features:
  • Upper/lowercase
  • Contains a digit
  • Prefixes and suffixes
• POS tags
• (Semantic category of the word, provided by MetaMap+)
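The per-token features listed here could be collected as in the following sketch. The feature names, the window size of one token, and the affix length of three characters are assumptions; lemmas and POS tags are supplied by hand, and the optional MetaMap+ category is left out.

```python
def token_features(tokens, pos_tags, lemmas, i, window=1, affix_len=3):
    """Feature dictionary for the token at position i."""
    word = tokens[i]
    feats = {
        "word": word.lower(),
        "lemma": lemmas[i],
        "pos": pos_tags[i],
        "is_upper": word.isupper(),
        "is_lower": word.islower(),
        "has_digit": any(c.isdigit() for c in word),
        "prefix": word[:affix_len].lower(),
        "suffix": word[-affix_len:].lower(),
    }
    # Neighbouring words inside the window, keyed by their relative offset.
    for offset in range(-window, window + 1):
        j = i + offset
        if offset != 0 and 0 <= j < len(tokens):
            feats[f"word[{offset:+d}]"] = tokens[j].lower()
    return feats


tokens = ["computed", "tomography", "(", "CT", ")"]
pos_tags = ["JJ", "NN", "(", "NN", ")"]
lemmas = ["computed", "tomography", "(", "CT", ")"]
feats = token_features(tokens, pos_tags, lemmas, 3)
print(feats)
```

Feature dictionaries of this shape, one per token, are what common CRF toolkits expect as input for a sentence.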

Medical Entity Recognition: Evaluation
• Corpus contains discharge summaries and progress notes
• De-identified and annotated by hand
• Entities: Problem, Treatment and Test
• 76,665 sentences overall

Medical Entity Recognition: Evaluation
Setting           Precision   Recall   F-Score
MetaMap           15.52       16.10    15.80
MetaMap+          48.68       56.46    52.28
SVM               43.65       47.16    45.33
BIO-CRF           70.15       83.31    76.17
BIO-CRF-Hybrid    72.18       83.78    77.55
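Assuming the F-Score column is the balanced F1 measure, i.e. the harmonic mean of precision and recall, the reported values can be recomputed from the other two columns. The sketch below does this for every row; the 0.01 tolerance allows for the inputs already being rounded to two decimals.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) per setting, copied from the table.
results = {
    "MetaMap": (15.52, 16.10, 15.80),
    "MetaMap+": (48.68, 56.46, 52.28),
    "SVM": (43.65, 47.16, 45.33),
    "BIO-CRF": (70.15, 83.31, 76.17),
    "BIO-CRF-Hybrid": (72.18, 83.78, 77.55),
}

for name, (p, r, reported) in results.items():
    computed = f_score(p, r)
    status = "ok" if abs(computed - reported) < 0.01 else "mismatch"
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f} ({status})")
```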

Anonymization of Medical Reports
Universität Stuttgart 20.01.2016

Anonymization of Medical Reports: What is anonymization?
• De-identification
• Completely remove all personal health information
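To give a rough idea of what de-identification involves, the sketch below masks dates in the `d.m.yyyy` format and names from a supplied list. It is a deliberately naive illustration and not the method from the talk: the patterns, the placeholder tokens, and the sample report are assumptions, and removing all personal health information in practice takes far more than two rules.

```python
import re

# Matches dates such as 5.12.2017 (day.month.year); other formats are not covered.
DATE_PATTERN = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")


def deidentify(text, known_names):
    """Replace dates and the given names with placeholder tokens."""
    text = DATE_PATTERN.sub("[DATE]", text)
    for name in known_names:
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text)
    return text


# Invented sample sentence, not from any real report.
report = "Mr. Smith was admitted on 5.12.2017 with chest pain."
clean = deidentify(report, ["Smith"])
print(clean)
```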
