Constraints Driven Information Extraction in the Medical Domain

SLIDE 1

Constraints Driven Information Extraction in the Medical Domain

Dan Roth

Department of Computer Science, University of Illinois at Urbana-Champaign

HITC Workshop 2012 @ University of Illinois, November 2012

With thanks to:
Collaborators: Ming-Wei Chang, Prateek Jindal, Lev Ratinov, and many others
Funding: NSF; DHS; NIH; DARPA; ONC

SLIDE 2

Medical Informatics

• An electronic health record (EHR) is a personal health record in digital format.
• Patient-centric information that should aid clinical decision-making.
• Includes information relating to the current and historical health, medical conditions and medical tests of its subject.
• Data about medical referrals, treatments, medications, demographic information and other non-clinical administrative information.

A narrative with embedded database elements

Potential Benefits
• Health
  - Use in medical advice systems
  - Medication selection and tracking (Vioxx…)
  - Disease outbreak and control
• Science
  - Correlating response to drugs with other conditions

Technological Challenges
Privacy Challenges

Needs
• Enable information extraction & information integration across various projections of the data and across systems

SLIDE 3

Analyzing Electronic Health Records

The patient is a 65 year old female with post thoracotomy syndrome that occurred on the site of her thoracotomy incision .

She had a thoracic aortic aneurysm repaired in the past and subsequently developed neuropathic pain at the incision site . She is currently on Vicodin , one to two tablets every four hours p.r.n. , Fentanyl patch 25 mcg an hour , change of patch every 72 hours , Elavil 50 mgq .h.s. , Neurontin 600 mg p.o. t.i.d. with still what she reports as stabbing left-sided chest pain that can be as severe as a 7/10. She has failed conservative therapy and is admitted for a spinal cord stimulator trial .

[The patient] is a 65 year old female with [post thoracotomy syndrome] [that] occurred on the site of [[her] thoracotomy incision] . [She] had [a thoracic aortic aneurysm] repaired in the past and subsequently developed [neuropathic pain] at [the incision site] . [She] is currently on [Vicodin] , one to two tablets every four hours p.r.n. , [Fentanyl patch] 25 mcg an hour , change of patch every 72 hours , [Elavil] 50 mgq .h.s. , [Neurontin] 600 mg p.o. t.i.d. with still what [she] reports as [stabbing left-sided chest pain] [that] can be as severe as a 7/10. [She] has failed [conservative therapy] and is admitted for [a spinal cord stimulator trial] .

Identify Important Mentions
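The bracket notation on this slide marks mention boundaries, including nested mentions such as [[her] thoracotomy incision]. As a minimal sketch (not the system described in this talk), the Python below recovers the plain text and character-offset spans from such an annotated string; the `Mention` class and `parse_bracketed` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Mention:
    start: int  # character offset into the plain (unbracketed) text
    end: int    # exclusive end offset
    text: str


def parse_bracketed(annotated: str) -> Tuple[str, List[Mention]]:
    """Strip [ ] markers, returning the plain text and the mention spans.

    Nested markup such as "[[her] thoracotomy incision]" yields two
    mentions: "her" and "her thoracotomy incision".
    """
    plain: List[str] = []
    open_starts: List[int] = []  # stack of offsets where "[" was seen
    mentions: List[Mention] = []
    for ch in annotated:
        if ch == "[":
            open_starts.append(len(plain))
        elif ch == "]":
            if not open_starts:
                raise ValueError("unbalanced ']' in annotation")
            start = open_starts.pop()
            mentions.append(Mention(start, len(plain), "".join(plain[start:])))
        else:
            plain.append(ch)
    if open_starts:
        raise ValueError("unbalanced '[' in annotation")
    # Document order; an enclosing mention precedes the mentions nested in it.
    mentions.sort(key=lambda m: (m.start, -m.end))
    return "".join(plain), mentions


if __name__ == "__main__":
    text, mentions = parse_bracketed("on the site of [[her] thoracotomy incision] .")
    print(text)
    for m in mentions:
        print(m.start, m.end, m.text)
```

A stack is used because brackets nest: each "]" closes the most recent unmatched "[", so inner and outer mentions get independent spans over the same plain text.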

SLIDE 4

Analyzing Electronic Health Records

(The annotated passage from Slide 3, with each mention color-coded by concept type.)

Red: Problems; Green: Treatments; Purple: Tests; Blue: People

Identify Concept Types

SLIDE 5

Analyzing Electronic Health Records

(The annotated passage from Slide 3, used here to illustrate coreference resolution.)

Coreference Resolution

Other needs: temporal recognition & reasoning, relations, quantities, etc.
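Coreference is often modeled with a pairwise classifier between mentions. One common way to turn pairwise "same entity" decisions into entities is to take their transitive closure; the sketch below does this with union-find. This is an assumed, illustrative step rather than the method described in the talk, and the mention list and positive pairs are hypothetical.

```python
from typing import Dict, List, Tuple


def cluster_mentions(num_mentions: int, coreferent_pairs: List[Tuple[int, int]]) -> List[List[int]]:
    """Group mentions into entities via the transitive closure of pairwise
    "same entity" decisions, using union-find with path halving."""
    parent = list(range(num_mentions))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in coreferent_pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    clusters: Dict[int, List[int]] = {}
    for m in range(num_mentions):
        clusters.setdefault(find(m), []).append(m)
    return sorted(clusters.values())


if __name__ == "__main__":
    mentions = ["The patient", "her", "She", "Vicodin", "she"]
    # Hypothetical positive decisions from a pairwise classifier.
    positive_pairs = [(0, 1), (1, 2), (2, 4)]
    for cluster in cluster_mentions(len(mentions), positive_pairs):
        print([mentions[m] for m in cluster])
```

Because the closure is transitive, a single wrong pairwise link can merge two distinct entities (for example, a doctor and the patient); enforcing domain constraints during inference is one way to block such merges.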

SLIDE 6

Multiple Applications

• Clinical decisions:
  - “Please show me the reports of all patients who had a headache that was not cured by Aspirin.”
    - Concept recognition; relation identification (Problem, Treatment)
  - “Please show me the reports of all patients who have had myocardial infarction (heart attack) more than once.”
    - Coreference resolution
• Identification of sensitive data (for privacy reasons)
  - HIV data, drug abuse, family abuse, genetic information
    - Concept recognition, relation recognition (drug, drug abuse), coreference resolution (multiple incidents, same people)
• Generating summaries for patients
• Creating automatic reminders of medications

SLIDE 7

Machine Learning + Inference based NLP

• All these information extraction problems are being studied in Natural Language Processing, typically on newswire data.
• These problems are extremely difficult due to:
  - Ambiguity (everything has multiple meanings)
  - Variability (everything you want to say, you can say in many ways)
• It is impossible to reliably “program” the predicates of interest.
• Models are based on Statistical Machine Learning & Inference.
• The focus is on modeling and learning algorithms for different phenomena:
  - Classification models (well understood; easy to build black-box categorizers)
  - Structured models
  - Learning protocols that exploit minimal supervision
• Inference as a way to introduce domain- and task-specific constraints

SLIDE 8

Information Extraction in the Medical Domain

• Models learned on newswire data do not adapt well to the medical domain:
  - Different vocabulary, sentence structure and document structure.
• More importantly, the medical domain offers a chance to do better than the general newswire domain:
  - Background knowledge: a narrow domain with many manually curated knowledge-base (KB) resources that can help with identification and disambiguation.
    - UMLS: a large biomedical KB, with semantic types and relationships between concepts.
    - MeSH: a large thesaurus of medical vocabulary.
    - SNOMED CT: a comprehensive clinical terminology.
  - Structure: medical text has more structure that can be exploited.
    - Discourse structure: concepts in the section “Principal Diagnosis” are more likely to be “medical problems”.
    - EHRs have some internal structure: doctors, one patient, family members.
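One simple way to exploit discourse structure, sketched here under stated assumptions, is to treat the section header as a soft prior added to a local classifier's log-scores, so that a mention under "Principal Diagnosis" leans toward the problem type. The `SECTION_BONUS` table, its value of 1.0, and the local scores in the example are hypothetical placeholders, not numbers from the talk.

```python
import math
from typing import Dict, Optional

CONCEPT_TYPES = ("problem", "treatment", "test")

# Hypothetical log-prior bonuses keyed by normalized section header.
# The header list and the value 1.0 are illustrative, not from the talk.
SECTION_BONUS: Dict[str, Dict[str, float]] = {
    "principal diagnosis": {"problem": 1.0},
}


def classify_concept(local_log_scores: Dict[str, float], section: Optional[str] = None) -> str:
    """Return the concept type maximizing local log-score + section bonus."""
    bonus = SECTION_BONUS.get((section or "").strip().lower(), {})

    def total(concept_type: str) -> float:
        # Types the local model did not score get -inf, so they never win.
        return local_log_scores.get(concept_type, -math.inf) + bonus.get(concept_type, 0.0)

    return max(CONCEPT_TYPES, key=total)


if __name__ == "__main__":
    # Hypothetical local scores for an ambiguous mention.
    scores = {"problem": -1.2, "treatment": -0.9, "test": -3.0}
    print(classify_concept(scores))                         # treatment
    print(classify_concept(scores, "Principal Diagnosis"))  # problem
```

Adding the prior in log space keeps it soft: a sufficiently confident local model can still override the section's preference.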

SLIDE 9

Learning and Inference for Medical Information Extraction

• We develop models that make global decisions composed of several local decisions: on identified concepts, on relations between identified concepts, etc.
• Our models are designed to exploit external knowledge that relates sub-problems.
  - This allows local models to retract or modify decisions that do not cohere with decisions made by other local models.
• Goal: incorporate the local models' information, along with prior knowledge (constraints), in making coherent decisions.
  - Decisions that respect the local models as well as domain- and context-specific knowledge/constraints.

SLIDE 10

Constrained Conditional Models

A collection of local models:
• Coreference: a pairwise classifier between mentions
• Concepts: a model that determines boundaries for important phrases
• Relations: a per-relation classifier

Terms of the objective:
• Weight vector for “local” models
• Features, classifiers; log-linear models (HMM, CRF) or a combination
• (Soft) constraints component
• Penalty for violating the constraint
• How far y is from a “legal” assignment
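The labels above describe the terms of a Constrained Conditional Model (CCM) objective. In the standard CCM formulation from the published CCM literature (notation assumed here rather than copied from this slide), the prediction is:

```latex
y^{*} = \arg\max_{y} \; \sum_{i} w_i \, \phi_i(x, y) \;-\; \sum_{k} \rho_k \, d_{C_k}(x, y)
```

Here w is the weight vector for the local models, \phi_i(x, y) are the features or classifier outputs (log-linear models such as an HMM or CRF, or a combination), \rho_k is the penalty for violating constraint C_k, and d_{C_k}(x, y) measures how far y is from a legal assignment under C_k. Letting \rho_k grow without bound turns a soft constraint into a hard one.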

Knowledge as constraints:
• A doctor cannot corefer with a patient.
• Consistency with KB resources
• Consistency of anatomical terms
• Legitimacy of relations
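To make the doctor/patient constraint concrete, here is a brute-force sketch of constrained inference for coreference. It enumerates every clustering of a handful of mentions and maximizes the sum of pairwise classifier scores minus a penalty `rho` for each doctor-patient pair placed in the same cluster. Exhaustive enumeration only works for tiny inputs (the number of clusterings grows as the Bell numbers), so it stands in for an ILP solver; the mentions, roles and scores are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

Partition = List[List[int]]


def set_partitions(items: List[int]) -> Iterator[Partition]:
    """Yield every way to split `items` into non-empty clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing cluster in turn...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ...or into a new singleton cluster.
        yield [[first]] + partition


def ccm_score(partition: Partition, pair_scores: Dict[Tuple[int, int], float],
              roles: List[str], rho: float) -> float:
    """Local pairwise score of the clustering minus rho per violated constraint."""
    local, violations = 0.0, 0
    for cluster in partition:
        for i, j in combinations(sorted(cluster), 2):
            local += pair_scores.get((i, j), 0.0)
            # Constraint: a doctor mention cannot corefer with a patient mention.
            if {roles[i], roles[j]} == {"doctor", "patient"}:
                violations += 1
    return local - rho * violations


def ccm_inference(n: int, pair_scores: Dict[Tuple[int, int], float],
                  roles: List[str], rho: float) -> Partition:
    """Exhaustive argmax over all clusterings of n mentions (tiny n only)."""
    return max(set_partitions(list(range(n))),
               key=lambda p: ccm_score(p, pair_scores, roles, rho))


if __name__ == "__main__":
    mentions = ["The patient", "Dr. Smith", "She"]  # hypothetical mentions
    roles = ["patient", "doctor", "other"]
    pair_scores = {(0, 1): -0.2, (0, 2): 1.5, (1, 2): 1.2}  # hypothetical
    for rho in (0.0, 10.0):
        best = ccm_inference(len(mentions), pair_scores, roles, rho)
        print(rho, [[mentions[m] for m in c] for c in best])
```

With rho = 0 the best clustering merges all three mentions (score 2.5), so the doctor ends up coreferent with the patient. With rho = 10 that clustering drops to -7.5, and the best legal clustering links "She" to "The patient" (score 1.5): the constraint lets inference retract a locally preferred but incoherent decision.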

SLIDE 11

Constrained Conditional Models

How to solve? This is an Integer Linear Program. Solving it using ILP packages gives an exact solution; other methods, such as search techniques, are also possible.
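As an illustration of the ILP view, here is a standard formulation for pairwise coreference (not necessarily the exact program used in this work). Each binary variable y_{ij} marks mentions i < j as coreferent, and s_{ij} is a hypothetical local score from the pairwise classifier:

```latex
\max_{y} \; \sum_{i<j} s_{ij}\, y_{ij}
\quad \text{s.t.} \quad
\begin{aligned}
  & y_{ij} + y_{jk} - y_{ik} \le 1,\;\; y_{ij} + y_{ik} - y_{jk} \le 1,\;\; y_{ik} + y_{jk} - y_{ij} \le 1 && \forall\, i<j<k \\
  & y_{ij} = 0 && \forall\, (i,j)\ \text{pairing a doctor with a patient} \\
  & y_{ij} \in \{0, 1\} && \forall\, i<j
\end{aligned}
```

The three inequalities per triple enforce transitivity, so the pairwise decisions always form a consistent clustering; the equality encodes "a doctor cannot corefer with a patient" as a hard constraint. A soft version would instead subtract a penalty term from the objective for each violated pair.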

How to train? Training is learning the objective function. We decompose the objective and learn its components. ½ is can be learned jointly

SLIDE 12

Current Status

• State-of-the-art coreference resolution system for clinical narratives (JAMIA’12, COLING’12)
• State-of-the-art concept and relation extraction (i2b2 workshop’12)
• Current work:
  - Continuing work on concept identification and relations
  - End-to-end coreference resolution system
  - Sensitive concepts

SLIDE 13

Mapping to Encyclopedic Resources (Demo)

Beyond supporting better Natural Language Processing, Wikification could allow people to read and understand these documents and to access them more easily.

Hydrocodone/paracetamol: http://en.wikipedia.org/wiki/Vicodin
http://en.wikipedia.org/wiki/Amitriptyline

Thanks!