Clinical NLP, PubGene: Clinical trials in Coremine Oncology

SLIDE 1

Clinical NLP, PubGene

Clinical trials in Coremine Oncology

Text processing and information extraction for surgery planning form

November 2017 Dag Are Steenhoff Hov, PubGene AS

SLIDE 2

PubGene, founded 2001

ArrayIt H25K microarray, Scientific Literature, Coremine Networks

COREMINE Oncology, COREMINE Medical, COREMINE Platform

Integration of structured and unstructured information

  • Interpretation of biomedical analysis data
  • General information
  • Specialized information analysis
SLIDE 3

Clinical NLP in PubGene - examples

  • Clinical trials in Coremine Oncology
  • PubGene in Ahus Optique

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 4

Coremine Oncology

AIM: To enable oncologists to make better treatment decisions

HOW: Combine data from relevant sources to aid interpretation of oncogenomics data from NGS and other platforms

  • Input: Somatic mutations, copy number changes, gene expression, or similar quantity
  • Output: Gene/biomarker annotations, related drugs and drug sensitivity, pathways, clinical trials, etc.

SLIDE 5

Coremine Oncology – Our Scope

We focus on:

  • Analysis of “called events”; it is assumed that normalization and data quality considerations have been taken care of

  • Collecting and integrating information for interpretation
  • Linking to potentially relevant treatments
  • Linking to clinical trials related to the input data

SLIDE 6

Coremine Oncology

  • Currently three types of input data:
    – (Somatic) mutations
    – Copy number changes
    – Gene expression
  • Analysis/Interpretation module to display information (annotations) about:
    – Mutation
    – Gene/Protein
    – Protein domains
  • Summary module to show patient-level information with respect to:
    – Statistics on mutations
    – Related drugs for targets with change (in progress: also biomarker and sensitivity info)
    – Pathways for targets with change
    – Relevant clinical trials for aberrations

SLIDE 7

Example Somatic mutations input data

  • Input for Coremine Oncology, case from lung cancer:
    – Chromosome number
    – Position
    – Reference nucleotide
    – Alternate nucleotide
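
The four input columns listed above could be read with something like the following minimal sketch. The slide does not show the file format, so the tab-separated layout, the column order, the `SomaticMutation` class, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass

VALID_BASES = set("ACGT")


@dataclass(frozen=True)
class SomaticMutation:
    chromosome: str  # kept as text so that "X" and "Y" are allowed
    position: int
    ref: str         # reference nucleotide(s)
    alt: str         # alternate nucleotide(s)


def parse_mutation_line(line: str) -> SomaticMutation:
    """Parse one line: chromosome, position, ref, alt (assumed tab-separated)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    chromosome, position, ref, alt = (f.strip() for f in fields)
    ref, alt = ref.upper(), alt.upper()
    for allele in (ref, alt):
        if not allele or not set(allele) <= VALID_BASES:
            raise ValueError(f"invalid nucleotide(s): {allele!r}")
    return SomaticMutation(chromosome, int(position), ref, alt)


def parse_mutations(text: str) -> list[SomaticMutation]:
    """Parse a whole file, skipping blank lines and '#' comment lines."""
    return [
        parse_mutation_line(line)
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]


# Made-up example record, not taken from the lung cancer case on the slide.
print(parse_mutation_line("7\t12345\tC\tT"))
```

Validation is deliberately strict (only A, C, G, T), so insertions or deletions encoded with other symbols would need an extended rule.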

SLIDE 8

View of imported data file

SLIDE 9

Mutation annotation – 1 patient - 1 missense mutation

SLIDE 10

SLIDE 11

Clinical Trials for Cetuximab

SLIDE 12

Clinical Trials for biomarkers

AIM:

  • To map biomarkers from patient data to relevant clinical trials

METHOD:

  • Identify how biomarkers are mentioned (referred to) in clinical trials
  • Download and index data from clinicaltrials.gov
  • Develop dictionaries of biomarkers and methods for detecting these in trial descriptions
  • Focus on eligibility

CHALLENGES:

  • Text mining is difficult!
  • Biomarkers are described, or referred to, in many ways
  • Ultimately, we want to identify biomarkers related to eligibility, but this is not straightforward
  • Complicated logic in inclusion/exclusion criteria, e.g., negation
  • Also need to check title, description, and condition for biomarkers
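
To make the eligibility challenge concrete, here is a minimal sketch of dictionary lookup combined with a naive negation check. It is not the method used in Coremine Oncology: the section headers ("Inclusion Criteria", "Exclusion Criteria"), the negation cue list, the function names, and the trial text are all assumptions chosen for illustration.

```python
import re

# Hypothetical negation cues; real criteria use far more varied wording.
NEGATION_CUES = re.compile(r"\b(?:no|not|without|absence of)\b", re.IGNORECASE)


def split_criteria(text):
    """Group criteria lines under 'inclusion' or 'exclusion' headers."""
    sections = {"inclusion": [], "exclusion": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip(" -*\t")
        if not line:
            continue
        header = re.match(r"(inclusion|exclusion) criteria", line, re.IGNORECASE)
        if header:
            current = header.group(1).lower()
            continue
        if current:
            sections[current].append(line)
    return sections


def find_mentions(text, biomarkers):
    """Return (biomarker, section, negated) for each dictionary hit."""
    hits = []
    for section, lines in split_criteria(text).items():
        for line in lines:
            for marker in biomarkers:
                match = re.search(re.escape(marker), line, re.IGNORECASE)
                if match:
                    # Negated if a cue appears earlier on the same line.
                    negated = bool(NEGATION_CUES.search(line[: match.start()]))
                    hits.append((marker, section, negated))
    return hits


# Made-up eligibility text for illustration only.
criteria = """Inclusion Criteria:
- Documented EGFR T790M mutation
- No prior ALK fusion
Exclusion Criteria:
- Known KRAS mutation
"""
for hit in find_mentions(criteria, ["EGFR T790M", "ALK fusion", "KRAS mutation"]):
    print(hit)
```

A biomarker that is negated inside an inclusion criterion, or mentioned inside an exclusion criterion, generally should not count as a positive eligibility match, which is why both flags are kept.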

SLIDE 13

Clinical Trials text data mining

  • Compiled several lists of biomarkers of different types:
    – Single-nucleotide mutations (COSMIC)
    – Polymorphisms
    – Fusion genes
    – Gene regulation (Exp-up/down)
    – Copy number changes
  • Several strategies for finding these in text:
    – Detect explicit mentions
    – Detect patterns based on gene name and ‘marker’ type, e.g., “GENE amplification”, “GENE activating mutation”
  • Curated list of cancer types matched with conditions

Statistics for patterns

  • Expression: 135
  • CNV: 32
  • Other (positive/negative): 20/10
  • Mutation: 37
  • Fusion/rearrangement/translocation: 10

Indexing statistics

  • 5350 trials with at least one biomarker
  • 855 different biomarkers with hits
  • Top markers: BCR/ABL1 (907), ERBB2 positive (725), ERBB2 negative (603), ESR1 positive (467), ERBB2 exp-up (403)
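
The pattern-based strategy ("gene name plus marker type") could be sketched roughly as follows. Only “GENE amplification” and “GENE activating mutation” come from the slide; the gene subset, the other surface patterns, the marker-type labels, and the function name are assumptions for illustration.

```python
import re

# Illustrative gene subset; a real dictionary would also cover synonyms.
GENES = ["ERBB2", "EGFR", "BRAF"]

# Marker type -> surface pattern expected after the gene name (assumed).
MARKER_PATTERNS = {
    "CNV gain": r"amplification|amplified",
    "mutation": r"(?:activating\s+)?mutation",
    "Exp up": r"overexpression|over-expression",
    "positive": r"positive",
    "negative": r"negative",
}

_GENE_ALT = "|".join(re.escape(g) for g in GENES)
COMPILED = {
    marker_type: re.compile(
        rf"\b(?P<gene>{_GENE_ALT})[\s-]+(?:{pattern})\b", re.IGNORECASE
    )
    for marker_type, pattern in MARKER_PATTERNS.items()
}


def detect_biomarkers(text):
    """Return (gene, marker_type) pairs in order of appearance in the text."""
    hits = []
    for marker_type, regex in COMPILED.items():
        for match in regex.finditer(text):
            hits.append((match.start(), match.group("gene").upper(), marker_type))
    return [(gene, marker_type) for _, gene, marker_type in sorted(hits)]


# Made-up sentence for illustration only.
sentence = (
    "ERBB2 amplification or EGFR activating mutation is required; "
    "ERBB2-negative tumors are excluded."
)
print(detect_biomarkers(sentence))
```

Keeping one regular expression per marker type makes it easy to count hits per pattern class, in the spirit of the pattern statistics listed above.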

SLIDE 14

Clinical Trials for example case – NSCLC and Erlotinib

SLIDE 15

Clinical Trials for copy number data (CNV)

SLIDE 16

Trials matching patient biomarkers and disease

Diagram labels: Cancer type, e.g., NSCLC; CNA; EXP; SNA; INDEL; FUSION; SNP; Domain knowledge; Manual curation; Filter; GUI or command line; Clinical Trials

SLIDE 17

Clinical Trials for combined data – NSCLC

BRAF G469A, BRAF D594G, BRAF V600E, EGFR T790M, KIF5B/RET, CD74/ROS1, KIF5B/ALK, BCR/ABL1

SLIDE 18

Details from Clinical Trial information – NCT01922583

SLIDE 19

Clinical Trials matching to patient data

  • Various levels of stringency for matching trial to patient
  • Perfect match
  • Other alteration (incl. same effect)
  • Same gene (other biomarker)
  • Related gene
  • S = weighted sum of scores
  • Biomarker-specific scoring models due to different prioritization of relevance of other alterations
  • AIM: To better map/identify other alterations with same/similar effect, e.g., amplification/up-regulation with activating mutation

Example: Patient: ERBB2 Exp up. Trial:

  • 1. Perfect match: ERBB2 Exp up
  • 2. Same effect: ERBB2 CNV gain
  • 3. Similar effect: ERBB2 Positive
  • 4. Other alteration: ERBB2 mutation
  • 5. Likely opposite effect: ERBB2 Neg.
  • 6. Opposite effect: ERBB2 Exp down or ERBB2 CNV loss
  • 7. Gene only: ERBB2
  • 8. Related gene: EGFR
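
One way to read "S = weighted sum of scores" together with the ERBB2 example is sketched below. The level names follow the example, but every numeric score, the default weight, the choice of taking the best level per patient biomarker, and all function names are hypothetical; the slides do not give the actual scoring model.

```python
# Hypothetical score per match level (not taken from the slides).
LEVEL_SCORES = {
    "perfect": 1.0,
    "same_effect": 0.75,
    "similar_effect": 0.5,
    "other_alteration": 0.25,
    "gene_only": 0.125,
    "related_gene": 0.0625,
    "likely_opposite": -0.5,
    "opposite": -1.0,
}

# Biomarker-specific model for a patient with "ERBB2 Exp up":
# trial biomarker -> match level, following the example above.
MODELS = {
    "ERBB2 Exp up": {
        "ERBB2 Exp up": "perfect",
        "ERBB2 CNV gain": "same_effect",
        "ERBB2 Positive": "similar_effect",
        "ERBB2 mutation": "other_alteration",
        "ERBB2 Neg.": "likely_opposite",
        "ERBB2 Exp down": "opposite",
        "ERBB2 CNV loss": "opposite",
        "ERBB2": "gene_only",
        "EGFR": "related_gene",
    }
}


def biomarker_score(patient_marker, trial_markers):
    """Best level score for one patient biomarker against a trial's biomarkers."""
    # Without a specific model, only an identical biomarker counts as a match.
    model = MODELS.get(patient_marker, {patient_marker: "perfect"})
    scores = [LEVEL_SCORES[model[t]] for t in trial_markers if t in model]
    return max(scores, default=0.0)


def trial_score(patient_markers, trial_markers, weights=None):
    """S = weighted sum of the per-biomarker scores (default weight 1.0)."""
    weights = weights or {}
    return sum(
        weights.get(marker, 1.0) * biomarker_score(marker, trial_markers)
        for marker in patient_markers
    )


print(trial_score(["ERBB2 Exp up"], ["ERBB2 CNV gain"]))
```

Giving each patient biomarker its own model dictionary mirrors the point that the relevance of other alterations is prioritized differently per biomarker.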

SLIDE 20

Clinical NLP in PubGene - examples

  • PubGene in Ahus Optique

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 21

Increase patient security by providing easier access to existing information

Human touch and empathy – with professional skill

Akershus University Hospital (Ahus) Optique project.

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 22

The Surgery Planning Form is completed in 3 Stages

Surgery Planning Form (“The Green Form”)

Stage 1: Examination
Stage 2: Preparations
Stage 3: Check/QA

Diagram labels: Structured data; Text; DIPS Ahus; Metavision Ahus; Metavision O; Metavision I; Metavision DKS; Additional systems

To complete the form, data must be collected from a number of systems! Today this is done manually.

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 23

Leave the data in the source systems!

Diagram labels: Ahus production databases; Ahus research databases; DIPS (EPJ); Metavision O; Metavision I; Metavision DKS; Researchers/Analysts; Expert users; «Ordinary» users

Data warehousing is an option

A semantic IT solution and ontology for clinical use in Health Care

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 24

We want to «lift» the data out of the silos!

Text mining

Solutions provided by the Optique project

A semantic IT solution and ontology for clinical use in Health Care

Diagram labels: Structured data; Unstructured data (text); Expert users; «Ordinary» users

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 25

PubGene in Ahus Optique, information extraction

Unstructured information

  • Height 1,83 m

Structured information

  • name=height, type=int, unit=cm, value=183
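
The height example above can be reproduced with a small regular expression. This is a sketch under assumptions, not the Optique implementation: the English label "Height", the accepted units (m and cm), and the function name are chosen for illustration, and real clinical notes will phrase this in many more ways.

```python
import re

# Matches e.g. "Height 1,83 m", "Height: 1.83 m", "height 183 cm".
HEIGHT_RE = re.compile(
    r"\bheight\b\s*:?\s*(\d+(?:[.,]\d+)?)\s*(cm|m)\b", re.IGNORECASE
)


def extract_height(text):
    """Return a structured height record in whole centimetres, or None."""
    match = HEIGHT_RE.search(text)
    if match is None:
        return None
    number = float(match.group(1).replace(",", "."))  # accept decimal comma
    if match.group(2).lower() == "m":
        number *= 100  # normalise metres to centimetres
    return {"name": "height", "type": "int", "unit": "cm", "value": round(number)}


print(extract_height("Height 1,83 m"))
```

Normalising to a single unit at extraction time keeps the structured field comparable with values that already exist as structured data.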

Fields

  • ASA
  • BMI
  • Height
  • Weight
  • Pulse
  • Blood pressure
  • Temperature
  • Diagnosis codes
  • Treatment codes
SLIDE 26

PubGene in Ahus Optique, allergy information

SLIDE 27

PubGene in Ahus Optique, status on smoking

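Sentence (English gloss) | Status
Røyker. (Smokes.) | Yes
Røyker 15-20 om dagen. (Smokes 15-20 a day.) | Yes
Ifølge datter er han også storrøyker, 40/ dag siste 50 år. (According to his daughter he is also a heavy smoker, 40/day for the last 50 years.) | Yes
Røykeplaster? (Nicotine patch?) | Uncertain
Tidligere storrøyker. (Former heavy smoker.) | Stopped
Ikke røyker og drikker ikke alkohol, tidligere, måteholdent alkoholbruk. (Non-smoker and does not drink alcohol; previously moderate alcohol use.) | No
Eks-røyker, lite alkohol. (Ex-smoker, little alcohol.) | Stopped

Text analysis

  • Split the text into sentences and detect sentences containing “røyke…”, “røyki…”, “røykt…”
  • Classify sentences based on recognition of keywords and word or sentence patterns
  • NB: Based on a small database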
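
The two analysis steps can be sketched as a tiny rule-based classifier that reproduces the statuses in the table. The specific rules (question mark, "ikke røyk…", "tidligere"/"eks-") and their order are assumptions inferred from the seven example sentences, not the rules actually used in the project.

```python
import re

# Stems from the slide: "røyke…", "røyki…", "røykt…".
SMOKING_STEM = re.compile(r"røyk[eit]", re.IGNORECASE)
# Assumed cue patterns, derived only from the example sentences.
NEGATED = re.compile(r"\bikke\s+røyk", re.IGNORECASE)
FORMER = re.compile(r"\b(?:tidligere|eks-?)\s*\w*røyk", re.IGNORECASE)


def split_sentences(text):
    """Naive split on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify(sentence):
    """Return Yes / No / Stopped / Uncertain, or None if smoking is not mentioned."""
    if not SMOKING_STEM.search(sentence):
        return None
    if sentence.rstrip().endswith("?"):
        return "Uncertain"
    if NEGATED.search(sentence):
        return "No"
    if FORMER.search(sentence):
        return "Stopped"
    return "Yes"


note = "Røyker 15-20 om dagen. Tidligere storrøyker. Røykeplaster?"
for sentence in split_sentences(note):
    print(sentence, "->", classify(sentence))
```

The negation rule is checked before the "former smoker" rule so that a sentence such as "Ikke røyker ..., tidligere, ..." is labelled No rather than Stopped; with such a small set of examples, rules like these would need validation on more data.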
SLIDE 28

Ahus Optique

  • Screenshots

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 29

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 30

Courtesy of DNV-GL (Tore Hartvigsen)

Page for surgery planning form

SLIDE 31

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 32

BMI

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 33

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 34

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 35

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 36

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 37

Courtesy of DNV-GL (Tore Hartvigsen)

Allergy

SLIDE 38
SLIDE 39

Courtesy of DNV-GL (Tore Hartvigsen)

Smoking

SLIDE 40

Courtesy of DNV-GL (Tore Hartvigsen)

SLIDE 41

Courtesy of DNV-GL (Tore Hartvigsen)

Surgery planning form

SLIDE 42

Further development, text processing/analysis

  • A large set of options and potential
    – Far more effective collection of more relevant information, e.g., by filling surgery forms (“The green form”)
    – Improved quality through automatic detection of errors in documents and control of consistency with structured data
  • Further steps for Ahus Optique
    – Simple: Extraction of more “static” fields, like lab results
    – Information about medication
    – Information on heart function, lung function
    – Exploit document structure and information on document types
