Text Mining on Clinical Data

  1. Text Mining on Clinical Data
     Robert McHardy
     Institut für Maschinelle Sprachverarbeitung

  2. Outline
     • Motivation
     • Medical Entity Recognition
     • Anonymization of Medical Reports
     • Knowledge-based Biomedical Word Sense Disambiguation
     • Extraction of Potential Adverse Drug Events
     • Resources
     Universität Stuttgart, 5.12.2017

  3. Motivation — Different Users

  8. Motivation — Why do we need Text Mining on Clinical Data?
     • Doctors need to know whether a drug is safe to use
     • As fast as possible
     • Patients should not suffer from unsafe drugs
     • Researchers want to use the data
     • The data has to be anonymized first

  9. Motivation — PubMed, again!

  10. Unified Medical Language System Metathesaurus (UMLS)

  11. Medical Entity Recognition — Overview
      • Abacha and Zweigenbaum: medical entity recognition consists of two parts
        • Detecting phrases that refer to medical entities
        • Assigning semantic categories to the detected entities

  13. Medical Entity Recognition — Overview
      Variants of the same entity: Type 1 diabetes, T1D, Diabetes type 1, IDDM, Juvenile diabetes
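The slide lists five surface forms of one disease. A minimal sketch of why this matters for recognition, assuming a tiny hand-written synonym table: the table and the placeholder id `CONCEPT_T1D` are hypothetical and not taken from the UMLS. Every variant should resolve to the same concept.

```python
from typing import Dict, Optional

# Hypothetical mini-lexicon: every surface form points to ONE concept id.
# "CONCEPT_T1D" is a placeholder, not a real UMLS concept identifier.
SYNONYMS: Dict[str, str] = {
    "type 1 diabetes": "CONCEPT_T1D",
    "t1d": "CONCEPT_T1D",
    "diabetes type 1": "CONCEPT_T1D",
    "iddm": "CONCEPT_T1D",
    "juvenile diabetes": "CONCEPT_T1D",
}


def normalize(mention: str) -> Optional[str]:
    """Map a mention to its concept id, ignoring case and repeated whitespace."""
    key = " ".join(mention.lower().split())
    return SYNONYMS.get(key)


if __name__ == "__main__":
    for m in ["Type 1 diabetes", "T1D", "Diabetes type 1", "IDDM", "Juvenile  diabetes"]:
        print(m, "->", normalize(m))
```

A real system would look mentions up in the UMLS instead of a hand-made table, but the many-to-one shape of the mapping is the same.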

  17. Medical Entity Recognition — Noun Phrase Chunking
      Example: "Pharmacodynamic studies, including positron-emission tomography (PET) and computed tomography (CT) […]"
      • Many tools for NP chunking are available
      • Maximum recall is desired
      • Open-domain tools such as the IMS TreeTagger are suitable
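A minimal sketch of rule-based NP chunking over the slide's example sentence. It assumes hand-assigned Penn Treebank-style POS tags (not real TreeTagger output), and the helper `chunk_noun_phrases` with its greedy rule is a hypothetical illustration, not how TreeTagger or any particular chunker works.

```python
from typing import List, Tuple

# Hand-assigned Penn-Treebank-style tags for the slide's example sentence.
# (An assumption for illustration; not actual TreeTagger output.)
SENTENCE: List[Tuple[str, str]] = [
    ("Pharmacodynamic", "JJ"), ("studies", "NNS"), (",", ","),
    ("including", "VBG"), ("positron-emission", "JJ"), ("tomography", "NN"),
    ("(", "("), ("PET", "NNP"), (")", ")"), ("and", "CC"),
    ("computed", "VBN"), ("tomography", "NN"), ("(", "("), ("CT", "NNP"), (")", ")"),
]


def chunk_noun_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Greedy chunker: optional determiner, then a run of adjectives/nouns
    that must end on a noun tag (NN, NNS, NNP, NNPS)."""
    phrases: List[str] = []
    i = 0
    while i < len(tagged):
        j = i + 1 if tagged[i][1] == "DT" else i  # skip an optional determiner
        k, last_noun = j, -1
        while k < len(tagged) and (tagged[k][1] == "JJ" or tagged[k][1].startswith("NN")):
            if tagged[k][1].startswith("NN"):
                last_noun = k  # remember where the run can legally end
            k += 1
        if last_noun >= j:
            phrases.append(" ".join(word for word, _ in tagged[i:last_noun + 1]))
            i = last_noun + 1
        else:
            i += 1
    return phrases


print(chunk_noun_phrases(SENTENCE))
```

On this input the sketch returns "Pharmacodynamic studies", "positron-emission tomography", "PET", "tomography" and "CT". Because "computed" is tagged as a past participle, the greedy rule keeps only "tomography" from "computed tomography". This is a small example of why the pipeline asks its chunker for maximum recall.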

  18. Medical Entity Recognition — MetaMap and the UMLS
      • MetaMap is a tool that maps noun phrases in raw text to UMLS concepts
      • Candidate concepts are selected according to a matching score
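MetaMap's actual scoring is considerably more elaborate than anything shown here. As a rough sketch of mapping a phrase to concepts according to a matching score, the code below assumes a hypothetical three-entry concept inventory with placeholder ids. It uses a simple Dice coefficient over tokens as a stand-in score, not MetaMap's own function.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical concept inventory: preferred name -> placeholder id (not real UMLS CUIs).
CONCEPTS: Dict[str, str] = {
    "positron emission tomography": "C_PET",
    "computed tomography": "C_CT",
    "tomography": "C_TOMO",
}


def tokenize(text: str) -> Set[str]:
    """Lowercase, split hyphenated words, and return the set of tokens."""
    return set(text.lower().replace("-", " ").split())


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient between two token sets (0.0 to 1.0); a stand-in score."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def map_phrase(phrase: str, threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (concept id, score) pairs at or above the threshold, best first."""
    phrase_tokens = tokenize(phrase)
    scored = [(cid, round(dice(phrase_tokens, tokenize(name)), 3))
              for name, cid in CONCEPTS.items()]
    hits = [(cid, score) for cid, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


print(map_phrase("positron-emission tomography"))
print(map_phrase("computed tomography"))
```

Here "positron-emission tomography" maps best to `C_PET` (1.0) and weakly to the more general `C_TOMO` (0.5). This also hints at how a single term can end up with several candidate concepts.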

  24. Medical Entity Recognition — MetaMap and the UMLS
      • Three problems with MetaMap
        • Noun phrase chunking performance is worse than that of specialized NLP tools
        • Medical entity detection often returns verbs and general words that are not medical entities (MEs)
        • Some ambiguity remains
          • The UMLS can provide several concepts for a term
          • and several semantic categories for a single concept

  25. Medical Entity Recognition — MetaMap and the UMLS
      Example: "Pharmacodynamic studies, including positron-emission tomography (PET) and computed tomography (CT) […]"
      Candidate concepts: Cold temperature; Common cold; Cold (term); Cold storage (term); Cold storage; Chronic obstructive lung disease

  30. Medical Entity Recognition — MetaMap+
      • Use tools such as TreeTagger for NP chunking
      • Filter NPs with a stop-word list
      • Search specialized lists for candidate terms
      • Annotate entities with MetaMap
      • Filter frequent errors and overly broad semantic types
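The five MetaMap+ steps can be sketched as a small pipeline. Everything concrete in it is an assumption: the stop-word list, the specialized term list, the known-error list, the broad-type list, and `FAKE_METAMAP`, a dictionary that only stands in for a real MetaMap call. The slide does not say whether stop words are stripped from an NP or whether whole NPs are discarded. This sketch strips them and drops NPs that end up empty.

```python
from typing import Dict, List, Tuple

# All resources below are hypothetical stand-ins, not the lists MetaMap+ actually uses.
STOP_WORDS = {"the", "a", "patient", "day"}
SPECIALIZED_TERMS: Dict[str, str] = {"aspirin": "Treatment", "ct": "Test"}
KNOWN_ERRORS = {"history"}  # frequent false positives (assumed)
# Stand-in for a MetaMap call: phrase -> UMLS semantic type of the best mapping.
FAKE_METAMAP: Dict[str, str] = {
    "chest pain": "Sign or Symptom",
    "left side": "Spatial Concept",
    "history": "Finding",
}
BROAD_TYPES = {"Spatial Concept", "Qualitative Concept"}
TYPE_TO_CATEGORY = {"Sign or Symptom": "Problem", "Finding": "Problem"}


def metamap_plus(noun_phrases: List[str]) -> List[Tuple[str, str]]:
    """Return (noun phrase, category) pairs following the slide's five steps."""
    results: List[Tuple[str, str]] = []
    for np_text in noun_phrases:  # step 1: NPs come from an external chunker
        words = [w for w in np_text.lower().split() if w not in STOP_WORDS]
        if not words:  # step 2: stop-word filter
            continue
        key = " ".join(words)
        if key in SPECIALIZED_TERMS:  # step 3: specialized term lists
            results.append((np_text, SPECIALIZED_TERMS[key]))
            continue
        sem_type = FAKE_METAMAP.get(key)  # step 4: stand-in "MetaMap" annotation
        if sem_type is None or key in KNOWN_ERRORS or sem_type in BROAD_TYPES:
            continue  # step 5: drop frequent errors and overly broad types
        category = TYPE_TO_CATEGORY.get(sem_type)
        if category is not None:
            results.append((np_text, category))
    return results


print(metamap_plus(["the patient", "Aspirin", "chest pain", "left side", "history", "CT"]))
```

The call keeps "Aspirin" (Treatment), "chest pain" (Problem) and "CT" (Test). "the patient" is emptied by the stop-word filter, "left side" is dropped as too broad, and "history" is dropped as a known error.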

  31. Medical Entity Recognition — MetaMap+
      • Voting mechanism to disambiguate semantic categories
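The slide does not specify how the votes are cast. One plausible reading, sketched under that assumption: each candidate semantic type returned for a phrase votes for the target category it maps to, and the majority wins. The `TYPE_TO_CATEGORY` mapping and the tie rule (return nothing) are hypothetical choices. The semantic type names are UMLS-style labels used for illustration.

```python
from collections import Counter
from typing import List, Optional

# Assumed mapping from UMLS semantic types to the three target categories.
TYPE_TO_CATEGORY = {
    "Disease or Syndrome": "Problem",
    "Sign or Symptom": "Problem",
    "Pharmacologic Substance": "Treatment",
    "Therapeutic or Preventive Procedure": "Treatment",
    "Diagnostic Procedure": "Test",
    "Laboratory Procedure": "Test",
}


def vote(semantic_types: List[str]) -> Optional[str]:
    """Majority vote over the categories of all candidate semantic types.

    Types without a mapping are ignored; a tie (or no votes at all) yields None.
    """
    votes = Counter(TYPE_TO_CATEGORY[t] for t in semantic_types if t in TYPE_TO_CATEGORY)
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the top two categories
    return ranked[0][0]


print(vote(["Disease or Syndrome", "Sign or Symptom", "Pharmacologic Substance"]))
```

With two Problem-type votes against one Treatment-type vote, the example returns "Problem". A one-to-one tie returns `None`, leaving the phrase for another disambiguation step.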

  32. Medical Entity Recognition — Support Vector Machines (SVMs)
      • Word-level features:
        • words of the NP
        • number of words in the NP
        • window of words around the NP
      • Orthographic features:
        • first letter capitalized
        • all letters upper-/lowercase
        • contains abbreviation(s)
      • POS tags
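A sketch of the listed feature groups as a sparse feature dictionary for one candidate NP. The function name `np_features`, the window size of 2 and the abbreviation heuristic (an all-caps token of two or more characters) are assumptions for illustration.

```python
from typing import Dict, List, Optional


def np_features(tokens: List[str], start: int, end: int,
                pos_tags: Optional[List[str]] = None, window: int = 2) -> Dict[str, int]:
    """Sparse binary/count features for the noun phrase tokens[start:end]."""
    words = tokens[start:end]
    feats: Dict[str, int] = {"n_words": len(words)}  # number of words of the NP
    for w in words:  # words of the NP
        feats[f"w={w.lower()}"] = 1
    for i in range(max(0, start - window), start):  # left context window
        feats[f"left={tokens[i].lower()}"] = 1
    for i in range(end, min(len(tokens), end + window)):  # right context window
        feats[f"right={tokens[i].lower()}"] = 1
    feats["first_cap"] = int(words[0][:1].isupper())
    feats["all_upper"] = int(all(w.isupper() for w in words))
    feats["all_lower"] = int(all(w.islower() for w in words))
    # Heuristic (assumption): an all-caps token of length >= 2 counts as an abbreviation.
    feats["has_abbrev"] = int(any(len(w) >= 2 and w.isupper() for w in words))
    if pos_tags is not None:  # POS tags of the NP tokens
        for tag in pos_tags[start:end]:
            feats[f"pos={tag}"] = 1
    return feats


TOKENS = ["including", "positron-emission", "tomography", "(", "PET", ")", "and"]
print(np_features(TOKENS, 4, 5))
```

Dictionaries like this can, for example, be turned into vectors with scikit-learn's `DictVectorizer` and passed to a linear SVM such as `LinearSVC`. The slide does not say which tools the original work used.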

  36. Medical Entity Recognition — BIO-CRFs
      Example: "Pharmacodynamic studies, including positron-emission tomography (PET) and computed tomography (CT) […]"
      • Words are annotated with the tags B, I and O
        • B-x: beginning of a phrase of class x
        • I-x: inside (any non-initial token of) a phrase of class x
        • O: outside any entity
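A minimal sketch of BIO encoding and decoding over the slide's example sentence. The span offsets and the choice of labelling PET, CT and the two tomography phrases as `Test` entities are illustrative assumptions, not annotations from the evaluated corpus. `spans_to_bio` and `bio_to_spans` are hypothetical helper names.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, class)


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode entity spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into spans; a stray I-x is treated like B-x."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # still inside the current span
        if label is not None:  # close the open span
            spans.append((start, i, label))
            start, label = None, None
        if tag != "O":  # B-x, or I-x without a matching open span
            start, label = i, tag[2:]
    return spans


TOKENS = ["Pharmacodynamic", "studies", ",", "including", "positron-emission",
          "tomography", "(", "PET", ")", "and", "computed", "tomography", "(", "CT", ")"]
# Illustrative annotation (assumption): imaging procedures labelled as Test.
SPANS = [(4, 6, "Test"), (7, 8, "Test"), (10, 12, "Test"), (13, 14, "Test")]

for token, tag in zip(TOKENS, spans_to_bio(len(TOKENS), SPANS)):
    print(f"{token}\t{tag}")
```

Decoding the generated tags with `bio_to_spans` gives back the original spans, which is a quick sanity check for any BIO conversion.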

  37. Medical Entity Recognition — BIO-CRFs
      • Word-level features:
        • the word itself
        • window of surrounding words
        • lemmas
      • Orthographic features:
        • upper-/lowercase
        • contains a digit
        • prefixes and suffixes
      • POS tags
      • (Semantic category of the word, provided by MetaMap+)
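The CRF features can be sketched as one dictionary per token. The helper names, the three-character prefix/suffix length and the one-word context window are assumptions. Lemmas and the MetaMap+ semantic category are passed in precomputed, because the sketch has no lemmatizer or MetaMap access.

```python
from typing import Dict, List, Optional, Union

Feature = Union[str, bool]


def token_features(tokens: List[str], i: int, pos_tags: List[str],
                   lemmas: Optional[List[str]] = None,
                   sem_cats: Optional[List[Optional[str]]] = None,
                   window: int = 1) -> Dict[str, Feature]:
    """Feature dict for token i, mirroring the slide's feature groups."""
    w = tokens[i]
    feats: Dict[str, Feature] = {
        "word": w.lower(),                        # the word itself
        "is_upper": w.isupper(),                  # orthographic features
        "is_title": w.istitle(),
        "has_digit": any(c.isdigit() for c in w),
        "prefix3": w[:3],
        "suffix3": w[-3:],
        "pos": pos_tags[i],                       # POS tag
    }
    if lemmas is not None:
        feats["lemma"] = lemmas[i]
    if sem_cats is not None and sem_cats[i] is not None:  # optional MetaMap+ category
        feats["sem_cat"] = sem_cats[i]
    for d in range(1, window + 1):                # window of surrounding words
        feats[f"word[-{d}]"] = tokens[i - d].lower() if i - d >= 0 else "<S>"
        feats[f"word[+{d}]"] = tokens[i + d].lower() if i + d < len(tokens) else "</S>"
    return feats


def sentence_features(tokens: List[str], pos_tags: List[str], **kwargs) -> List[Dict[str, Feature]]:
    """One feature dict per token of the sentence."""
    return [token_features(tokens, i, pos_tags, **kwargs) for i in range(len(tokens))]


TOKENS = ["HbA1c", "was", "elevated"]
POS = ["NN", "VBD", "VBN"]
print(token_features(TOKENS, 0, POS, lemmas=["hba1c", "be", "elevate"], sem_cats=["Test", None, None]))
```

CRF toolkits such as sklearn-crfsuite accept input in this shape: one list of feature dicts per sentence, paired with one list of BIO tags.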

  38. Medical Entity Recognition — Evaluation
      • Corpus contains discharge summaries and progress notes
      • De-identified and annotated by hand
      • Entities: Problem, Treatment and Test
      • 76,665 sentences overall

  39. Medical Entity Recognition — Evaluation (all values in %)

      Setting          Precision   Recall   F-Score
      MetaMap              15.52    16.10     15.80
      MetaMap+             48.68    56.46     52.28
      SVM                  43.65    47.16     45.33
      BIO-CRF              70.15    83.31     76.17
      BIO-CRF-Hybrid       72.18    83.78     77.55
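The reported F-scores are the balanced harmonic mean F = 2PR / (P + R) of precision P and recall R. The short check below recomputes them from the precision and recall columns. The `RESULTS` dictionary simply copies the numbers from the evaluation slide.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) in percent, copied from the evaluation slide.
RESULTS = {
    "MetaMap": (15.52, 16.10, 15.80),
    "MetaMap+": (48.68, 56.46, 52.28),
    "SVM": (43.65, 47.16, 45.33),
    "BIO-CRF": (70.15, 83.31, 76.17),
    "BIO-CRF-Hybrid": (72.18, 83.78, 77.55),
}

for setting, (p, r, reported) in RESULTS.items():
    print(f"{setting:<15} reported {reported:6.2f}  recomputed {f_score(p, r):6.2f}")
```

All five recomputed values agree with the reported F-scores to within 0.01, so the table is internally consistent.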

  40. Anonymization of Medical Reports

  42. Anonymization of Medical Reports — What is anonymization?
      • De-identification
        • Completely remove all personal health information
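A deliberately minimal sketch of rule-based de-identification: known names and two simple date and phone-number patterns are replaced by placeholder tags. The patterns, the placeholder format and the helper `deidentify` are assumptions. Real de-identification has to cover many more categories of personal health information (addresses, record numbers, ages and so on), and hand-written patterns alone will miss many of them.

```python
import re
from typing import Iterable

# Illustrative patterns only (assumption): real PHI covers far more categories.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}[./]\d{1,2}[./]\d{2,4}\b")),   # e.g. 12/05/2017, 3.1.2016
    ("PHONE", re.compile(r"\b\d{3}[-\s]\d{3}[-\s]\d{4}\b")),      # e.g. 555-123-4567
]


def deidentify(text: str, known_names: Iterable[str] = ()) -> str:
    """Replace known names and simple date/phone patterns with placeholder tags."""
    for name in known_names:
        # Whole-word match only, so "Smith" does not touch "Smithson".
        text = re.sub(r"\b" + re.escape(name) + r"\b", "[NAME]", text)
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


print(deidentify("Mr. Smith was admitted on 12/05/2017, call 555-123-4567.", ["Smith"]))
```

Replacing identifiers with typed placeholders rather than deleting them keeps a report readable for researchers while still removing the identifying strings.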
