
Using ontologies to mine unstructured data in medicine, Nigam Shah



  1. Using ontologies to mine unstructured data in medicine. Nigam Shah, MBBS, PhD, nigam@stanford.edu

  2. Profiling a patient set. Diagram labels: patients with some diagnosis; all patients; Disease Ontology; appropriate control.

  3. Profiling patient sets. ICD9 789.00 (Abdominal pain, unspecified site). 2×2 contingency table of X (+) / X (-) against Y (+) / Y (-), with cell counts a, b, c, d. 86k patient reports. Patient records processed from the U. Pittsburgh NLP Repository with IRB approval.

  4. Associations and outcomes. Matrix of entity types against each other (rows and columns: Gene, Disease, Drug, Device, Procedure, Environment). Labeled cells include Enrichment, Off-label, Indications, and Side effects. What associations can we find?

  5. Generation of annotated data at scale. Annotation workflow: text of a clinical note; BioPortal knowledge graph; creating clean lexicons (Term 1 … Term n, using frequency and syntactic types) of diseases, procedures, and drugs; term recognition tool: NCBO Annotator; negation detection: NegEx patterns and rules (e.g. "no T3", "no T4"). Output: for each patient P1 … Pn, the terms recognized (T1, T2, …) alongside ICD9 codes; terms form a temporal series of tags. A cohort of interest is then selected for further analysis.
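
The workflow on slide 5 (term recognition against a lexicon, then negation detection) can be illustrated with a toy sketch. This is not the NCBO Annotator or NegEx; the lexicon, the trigger words, the window size, and the `annotate` function are all hypothetical, chosen only to show the shape of the two steps.

```python
import re

# Hypothetical toy lexicon (term -> category); not taken from BioPortal.
LEXICON = {
    "abdominal pain": "disease",
    "myocardial infarction": "disease",
    "vioxx": "drug",
}
# Hypothetical negation triggers; NegEx itself uses a much larger pattern set.
NEGATION_TRIGGERS = {"no", "denies", "without"}
WINDOW = 5  # assumed number of tokens before a term to scan for a trigger


def annotate(note):
    """Return (term, category, negated) tuples found in a clinical note."""
    annotations = []
    # Handle each sentence separately so a trigger cannot negate a later sentence.
    for sentence in re.split(r"[.!?]", note.lower()):
        tokens = re.findall(r"[a-z0-9]+", sentence)
        for term, category in LEXICON.items():
            term_tokens = term.split()
            n = len(term_tokens)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == term_tokens:
                    preceding = tokens[max(0, i - WINDOW):i]
                    negated = any(t in NEGATION_TRIGGERS for t in preceding)
                    annotations.append((term, category, negated))
    return annotations


result = annotate("Patient denies abdominal pain. Started Vioxx last year.")
for term, category, negated in result:
    print(term, category, "negated" if negated else "affirmed")
```

Running this over each note of each patient, in date order, would give the per-patient series of tags that the slide describes.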

  6. Detecting the Vioxx Risk Signal. RA patients (14,079); Vioxx patients (1,560); MI patients (1,827); Vioxx → MI (339). 2×2 table: Vioxx and MI, a = 339; Vioxx and no MI, b = 1221; no Vioxx and MI, c = 1488; no Vioxx and no MI, d = 11031. ROR of 2.058, CI of [1.804, 2.349]; the X² statistic has p-value < 10⁻⁷. ROR = 1.524, CI = [0.872, 2.666]; X² p-value = 0.06816. p-value < 1.3×10⁻²⁴.
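
As a worked check of slide 6, the reporting odds ratio (ROR) can be recomputed from the 2×2 table. The slide does not say how its confidence interval was obtained; the sketch below assumes the standard 95% Wald interval on the log scale (z = 1.96), and the function name `reporting_odds_ratio` is illustrative only.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """ROR = (a*d)/(b*c) with a Wald interval on the log scale.

    a: exposed with outcome, b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    z = 1.96 (a 95% interval) is an assumption, not stated on the slide.
    """
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(ROR)
    low = math.exp(math.log(ror) - z * se)
    high = math.exp(math.log(ror) + z * se)
    return ror, low, high


# Counts from the slide: Vioxx/MI = 339, Vioxx/no MI = 1221,
# no Vioxx/MI = 1488, no Vioxx/no MI = 11031.
ror, ci_low, ci_high = reporting_odds_ratio(339, 1221, 1488, 11031)
print(f"ROR = {ror:.3f}, CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Under these assumptions the result agrees with the ROR of 2.058 and CI of [1.804, 2.349] reported on the slide.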

  7. We should stop acting as if our goal is to author extremely elegant theories, […] and make use of the best ally we have: the unreasonable effectiveness of data.

  8. Big Data in biomedicine? Chart of data size (small to big) against number of samples (small to large). Next-gen sequencing sits at big data size; EMR and clinical notes sit at small data size.

  9. The problem. Indication, on-label: what Pharma companies get approval for. Indication, off-label: whatever else the doctor prescribes for. Side effect / adverse effect, on-label: found during the pre-marketing phase. Side effect / adverse effect, off-label: goal of drug-safety surveillance. • Ambulatory: 100,000 deaths and $177 billion annually • In patient: estimated that roughly 30% of hospital stays have an adverse drug event • 21% of prescriptions • 73% with very little evidence

  10. Detecting Off-label use

  11. Detecting Adverse Events

  12. Patterns worth testing (off-label usage, which is risky). Identify off-label use • Find drug-indication pairs that “look like” indications. Identify which use “may be risky” • Use existing, known side effect databases • Learn drug-disease associations that look like side effects. Assemble I-D-A triplets • Indication – Drug – Adverse effect, e.g. RA – Vioxx – MI. Test on unstructured data.
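
The triplet-assembly step on slide 12 can be sketched as a join on the drug. This is a minimal illustration assuming the off-label indications and the candidate adverse effects are already available as (drug, term) pairs; `assemble_triplets` and the input shape are hypothetical, and the only data used is the slide's own RA – Vioxx – MI example.

```python
def assemble_triplets(indications, adverse_effects):
    """Join (drug, indication) and (drug, adverse effect) pairs on the drug.

    Returns (indication, drug, adverse effect) triplets, the I-D-A order
    used on the slide.
    """
    effects_by_drug = {}
    for drug, effect in adverse_effects:
        effects_by_drug.setdefault(drug, []).append(effect)
    return [
        (indication, drug, effect)
        for drug, indication in indications
        for effect in effects_by_drug.get(drug, [])
    ]


triplets = assemble_triplets([("Vioxx", "RA")], [("Vioxx", "MI")])
print(triplets)
```

Each resulting triplet defines one 2×2 table to test on the annotated notes: within patients who have the indication, drug exposure against the adverse effect.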

  13. Testing ‘interesting patterns’

  14. The team @ www.bioontology.org/project-team. NIH Roadmap grant U54 HG004028
