Using ontologies to mine unstructured data in medicine Nigam Shah, - - PowerPoint PPT Presentation


SLIDE 1

Using ontologies to mine unstructured data in medicine

Nigam Shah, MBBS, PhD nigam@stanford.edu

SLIDE 2

Profiling a patient set

Disease Ontology

Appropriate control; Patients with some diagnosis; All patients

SLIDE 3

Profiling patient sets

86k patient reports
ICD9 789.00 (Abdominal pain, unspecified site)

Patient records processed from the University of Pittsburgh NLP Repository with IRB approval.

         X (+)   X (-)
Y (+)    a       b
Y (-)    c       d
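
The four cells of the 2x2 table on this slide can be filled by counting set overlaps. The following is a minimal sketch, assuming that each patient is identified by an ID, that X is membership in the diagnosis cohort, and that Y is the presence of some term of interest. The function name and the toy patient IDs are hypothetical and do not come from the slides.

```python
def contingency_counts(all_patients, x_positive, y_positive):
    """Return (a, b, c, d) in the layout of the slide's 2x2 table.

    a = Y(+) and X(+), b = Y(+) and X(-),
    c = Y(-) and X(+), d = Y(-) and X(-).
    """
    all_patients = set(all_patients)
    # Ignore IDs that are not part of the overall patient population.
    x_pos = set(x_positive) & all_patients
    y_pos = set(y_positive) & all_patients
    a = len(x_pos & y_pos)
    b = len(y_pos - x_pos)
    c = len(x_pos - y_pos)
    d = len(all_patients - x_pos - y_pos)
    return a, b, c, d


# Toy data (hypothetical): 10 patients, 4 in the cohort, 3 with the term.
patients = range(1, 11)
cohort = {1, 2, 3, 4}
with_term = {2, 3, 7}
counts = contingency_counts(patients, cohort, with_term)
print(counts)
```

The four counts always sum to the size of the overall population, which is a quick sanity check before computing any association statistic.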

SLIDE 4

Associations and outcomes

Gene, Disease, Drug, Device, Procedure, Environment (on both axes of the association matrix)

Side effects Off-label Indications

Gene Enrichment

What associations can we find?

SLIDE 5

Term 1 … Term n: syntactic types, frequency

Term recognition tool: NCBO Annotator

Negation detection: NegEx patterns, NegEx rules

[Figure: one timeline per patient (P1, P2, P3, …, Pn); each note carries an ICD9 code and its recognized terms, e.g. "T1, T2, no T4", "T5, T4, T3", "T4, T3, T1", "T8, T9, T4", "T6, T8, T10"]

Terms form a temporal series of tags 

Cohort of Interest

Diseases Procedures Drugs

Annotation Workflow

BioPortal – knowledge graph; Creating clean lexicons

Further Analysis

Text clinical note; Terms recognized; Negation detection

Generation of annotated data at scale
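
The recognition and negation steps of this workflow can be illustrated with a toy annotator. This is a minimal sketch under loud assumptions: it is not the NCBO Annotator API and not the NegEx rule set. It matches single-token terms from a hypothetical lowercase lexicon and flags a term as negated when one of a few illustrative trigger words appears shortly before it in the same sentence.

```python
import re

# Illustrative subset of trigger words; an assumption, not the NegEx rules.
NEGATION_TRIGGERS = ("no", "denies", "without")


def annotate(note, lexicon, window=3):
    """Return (term, negated) pairs in order of appearance.

    `lexicon` is a set of lowercase single-token terms. A term counts as
    negated when a trigger occurs within `window` tokens before it in the
    same sentence.
    """
    annotations = []
    for sentence in re.split(r"[.;\n]+", note.lower()):
        tokens = re.findall(r"[a-z0-9]+", sentence)
        for i, token in enumerate(tokens):
            if token in lexicon:
                preceding = tokens[max(0, i - window):i]
                negated = any(t in NEGATION_TRIGGERS for t in preceding)
                annotations.append((token, negated))
    return annotations


# The slide's own example note "T1, T2, no T4", with a hypothetical lexicon.
tags = annotate("T1, T2, no T4", {"t1", "t2", "t4"})
print(tags)
```

Running the same function over every note of a patient, in date order, yields the temporal series of tags that the slide describes.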

SLIDE 6

Detecting the Vioxx Risk Signal

RA Patients (14,079); Vioxx Patients (1,560); MI Patients (1,827); Vioxx and MI (339)

            MI          No MI
Vioxx       a = 339     b = 1221
No Vioxx    c = 1488    d = 11031

ROR = 2.058, CI = [1.804, 2.349]; the χ² statistic has p-value < 10⁻⁷
ROR = 1.524, CI = [0.872, 2.666]; χ² p-value = 0.06816
p-value < 1.3 × 10⁻²⁴
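
The reporting odds ratio (ROR) on this slide can be recomputed from the four cell counts a, b, c, and d. The sketch below assumes the standard odds-ratio formula and a Wald interval on the log scale with z = 1.96; the slide does not say how its interval was obtained, so that choice is an assumption, and the function name is hypothetical.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """Return (ROR, lower, upper) for a 2x2 table.

    a = exposed with outcome, b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    The interval is a Wald interval computed on the log-odds scale.
    """
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper


# Cell counts from the slide: Vioxx vs. no Vioxx, MI vs. no MI.
ror, lower, upper = reporting_odds_ratio(339, 1221, 1488, 11031)
print(f"ROR = {ror:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

With the slide's counts this gives an ROR of about 2.058 and an interval close to the reported [1.804, 2.349]. The row and column totals also agree with the slide: 339 + 1221 = 1,560 Vioxx patients, 339 + 1488 = 1,827 MI patients, and all four cells sum to 14,079 RA patients.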

SLIDE 7

We should stop acting as if our goal is to author extremely elegant theories, […] and make use of the best ally we have: the unreasonable effectiveness of data.

SLIDE 8

Big Data in biomedicine

[Chart: data size (small to big) against number of samples (small to large). Labeled regions: Next-gen sequencing; EMR, clinical notes; "?"]

SLIDE 9

The problem

                               On-label                                  Off-label
Indication                     What pharma companies get approval for    Whatever else the doctor prescribes for
Side effect / Adverse effect   Found during the pre-marketing phase      Goal of drug-safety surveillance

  • 21% of prescriptions
  • 73% with very little evidence
  • Ambulatory: 100,000 deaths and $177 billion annually
  • Inpatient: estimated that roughly 30% of hospital stays have an adverse drug event

SLIDE 10

Detecting Off-label use

SLIDE 11

Detecting Adverse Events

SLIDE 12

Patterns worth testing (off-label usage, which is risky)

• Identify off-label use
  • Find drug-indication pairs that “look like” indications
• Identify which use “may be risky”
  • Use existing, known side effect databases
  • Learn drug-disease associations that look like side effects
• Assemble I-D-A triplets
  • Indication – Drug – Adverse effect, e.g. RA – Vioxx – MI
• Test on unstructured data
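
Assembling I-D-A triplets amounts to joining drug-indication pairs with drug-adverse-effect pairs on the shared drug. The following is a minimal sketch, assuming both inputs are plain lists of pairs. The function name is hypothetical, the only real example is the slide's RA – Vioxx – MI, and "DrugX" is a made-up placeholder.

```python
def assemble_triplets(drug_indications, drug_adverse_effects):
    """Join (drug, indication) and (drug, adverse effect) pairs on the drug.

    Returns (indication, drug, adverse effect) triplets. Drugs with no
    known adverse effect produce no triplet.
    """
    effects_by_drug = {}
    for drug, effect in drug_adverse_effects:
        effects_by_drug.setdefault(drug, []).append(effect)
    return [
        (indication, drug, effect)
        for drug, indication in drug_indications
        for effect in effects_by_drug.get(drug, [])
    ]


# "DrugX" is a hypothetical placeholder with no listed adverse effect.
triplets = assemble_triplets(
    [("Vioxx", "RA"), ("DrugX", "RA")],
    [("Vioxx", "MI")],
)
print(triplets)
```

Each resulting triplet defines one 2x2 table to test on the annotated notes: patients with the indication, split by drug exposure and by the adverse effect.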

SLIDE 13

Testing ‘interesting patterns’

SLIDE 14
SLIDE 15

The team @ www.bioontology.org/project-team

NIH Roadmap grant U54 HG004028