Clinical NLP, PubGene Clinical trials in Coremine Oncology Text - PowerPoint PPT Presentation

Clinical NLP, PubGene
• Clinical trials in Coremine Oncology
• Text processing and information extraction for surgery planning form
November 2017
Dag Are Steenhoff Hov, PubGene AS

PubGene, founded 2001
• ArrayIt H25K microarray
• Scientific Literature
• Coremine Networks
Integration of structured and unstructured information
• Interpretation of biomedical analysis data
• General information
• Specialized information analysis
COREMINE Oncology, COREMINE Medical, COREMINE Platform

Clinical NLP in PubGene - examples
• Clinical trials in Coremine Oncology
• PubGene in Ahus Optique
Courtesy of DNV-GL (Tore Hartvigsen)

Coremine Oncology
AIM: To enable oncologists to make better treatment decisions
HOW: Combine data from relevant sources to aid interpretation of oncogenomics data from NGS and other platforms
• Input: Somatic mutations, copy number changes, gene expression, or similar quantity
• Output: Gene/biomarker annotations, related drugs and drug sensitivity, pathways, clinical trials, etc.

Coremine Oncology – Our Scope
We focus on:
• Analysis of “called events”; normalization and data quality considerations are assumed to have been taken care of
• Collecting and integrating information for interpretation
• Linking to potentially relevant treatments
• Linking to clinical trials related to the input data

Coremine Oncology
• Currently three types of input data:
– (Somatic) mutations
– Copy number changes
– Gene expression
• Analysis/Interpretation module to display information (annotations) about
– Mutation
– Gene/Protein
– Protein Domains
• Summary module to show patient level information with respect to:
– Statistics on mutations
– Related drugs for targets with change (in progress: also biomarker and sensitivity info)
– Pathways for targets with change
– Relevant clinical trials for aberrations

Example: somatic mutations input data
• Input for Coremine Oncology, case from lung cancer
– Chromosome number
– Position
– Reference nucleotide
– Alternate nucleotide
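A minimal sketch of reading input with these four fields. The slides do not specify the actual file format, so the tab-separated layout, the column order, the `SomaticMutation` type, and the example values are all assumptions for illustration.

```python
import csv
import io
from typing import NamedTuple


class SomaticMutation(NamedTuple):
    """One called somatic mutation (hypothetical record type)."""
    chromosome: str
    position: int
    ref: str
    alt: str


def parse_mutations(text: str) -> list[SomaticMutation]:
    """Parse tab-separated lines: chromosome, position, reference, alternate."""
    mutations = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        chrom, pos, ref, alt = row[:4]
        mutations.append(SomaticMutation(chrom, int(pos), ref.upper(), alt.upper()))
    return mutations


# Illustrative placeholder values, not taken from the presentation.
example = "#chrom\tpos\tref\talt\n7\t12345\tC\tT\n"
print(parse_mutations(example))
```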

View of imported data file

Mutation annotation – 1 patient - 1 missense mutation

Clinical Trials for Cetuximab

Clinical Trials for biomarkers
AIM:
• To map biomarkers from patient data to relevant clinical trials
METHOD:
• Identify how biomarkers are mentioned (referred to) in clinical trials
• Download and index data from clinicaltrials.gov
• Develop dictionaries of biomarkers and methods for detecting these in trial descriptions
• Focus on eligibility
CHALLENGES:
• Text mining is difficult!
• Biomarkers are described, or referred to, in many ways
• Ultimately, we want to identify biomarkers related to eligibility, but this is not straightforward
• Complicated logic in inclusion/exclusion criteria, e.g., negation
• Also need to check title, description, and condition for biomarkers
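One reason eligibility text is hard to use directly is that the same biomarker can appear as a requirement or as a reason for exclusion. The sketch below separates the two parts of a free-text eligibility field. It assumes the text carries "Inclusion Criteria" and "Exclusion Criteria" headings, which is common but not guaranteed; `split_eligibility` is a hypothetical helper, not PubGene's method, and it does not handle negation inside a sentence.

```python
import re


def split_eligibility(criteria: str) -> dict[str, str]:
    """Split free-text eligibility criteria into inclusion and exclusion parts."""
    # Everything after the first "Exclusion Criteria" heading counts as exclusion.
    parts = re.split(r"(?i)\bexclusion criteria:?", criteria, maxsplit=1)
    inclusion = re.sub(r"(?i)\binclusion criteria:?", "", parts[0]).strip()
    exclusion = parts[1].strip() if len(parts) > 1 else ""
    return {"inclusion": inclusion, "exclusion": exclusion}


# Made-up criteria text for illustration.
text = (
    "Inclusion Criteria: EGFR mutation confirmed. "
    "Exclusion Criteria: Known T790M mutation."
)
print(split_eligibility(text))
```

A biomarker found in the exclusion part would then be treated differently from one found in the inclusion part when matching a trial to a patient.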

Clinical Trials text data mining
• Compiled several lists of biomarkers of different types:
– Single-Nucleotide mutations (Cosmic)
– Polymorphisms
– Fusion genes
– Gene regulation (Exp-up/down)
– Copy number changes
• Several strategies for finding these in text:
– Detect explicit mentions
– Detect patterns based on gene name and ‘marker’ type, e.g., “GENE amplification”, “GENE activating mutation”
• Curated list of cancer types matched with conditions
Statistics for patterns
• Expression: 135
• CNV: 32
• Other (positive/negative): 20/10
• Mutation: 37
• Fusion/rearrangement/translocation: 10
Indexing statistics
• 5350 trials with at least one biomarker
• 855 different biomarkers with hits
• Top markers: BCR/ABL1 (907), ERBB2 positive (725), ERBB2 negative (603), ESR1 positive (467), ERBB2 exp-up (403)
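The pattern strategy ("GENE amplification", "GENE activating mutation") can be sketched as a gene dictionary combined with marker-type patterns. The gene list, the marker labels, and the regular expressions below are small hypothetical stand-ins; the actual dictionaries and the 234 patterns counted above are not given in the slides.

```python
import re

# Hypothetical mini-dictionaries; the real lists are far larger.
GENES = ["ERBB2", "EGFR", "BRAF"]
MARKER_PATTERNS = {
    "CNV gain": r"amplification|amplified",
    "Mutation": r"(?:activating )?mutations?",
    "Exp-up": r"over-?expression|overexpressing",
    "Positive": r"positive",
    "Negative": r"negative",
}


def find_biomarkers(text: str) -> list[tuple[str, str]]:
    """Return sorted (gene, marker type) pairs matched as 'GENE <pattern>'."""
    hits = set()
    for gene in GENES:
        for marker, pattern in MARKER_PATTERNS.items():
            # Gene name, then whitespace or a hyphen, then the marker phrase.
            regex = rf"\b{re.escape(gene)}[\s-]+(?:{pattern})\b"
            if re.search(regex, text, flags=re.IGNORECASE):
                hits.add((gene, marker))
    return sorted(hits)


print(find_biomarkers("Patients with ERBB2 amplification or EGFR activating mutation"))
```

Gene synonyms (for example HER2 for ERBB2) would need their own dictionary entries; this sketch matches only the names listed in `GENES`.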

Clinical Trials for example case – NSCLC and Erlotinib

Clinical Trials for copy number data (CNV)

Trials matching patient biomarkers and disease
• Cancer type, e.g., NSCLC
• Filter
• Manual curation
• Domain knowledge
• Clinical Trials
• GUI or command line
• CNA, SNP, EXP, FUSION, INDEL, SNA

Clinical Trials for combined data – NSCLC
• BRAF G469A
• BRAF D594G
• BRAF V600E
• EGFR T790M
• KIF5B/RET
• CD74/ROS1
• KIF5B/ALK
• BCR/ABL1

Details from Clinical Trial information – NCT01922583

Clinical Trials matching to patient data
• Various levels of stringency for matching trial to patient
• Perfect match
• Other alteration (incl. same effect)
• Same gene (other biomarker)
• Related gene
• S = weighted sum of scores
• Biomarker specific scoring models due to different prioritization of relevance of other alterations
• AIM: To better map/identify other alterations with same/similar effect, e.g., amplification/up-regulation with activating mutation
Example: Patient ERBB2 Exp up
Trial:
1. Perfect match: ERBB2 Exp up
2. Same effect: ERBB2 CNV gain
3. Similar effect: ERBB2 Positive
4. Other alteration: ERBB2 mutation
5. Likely opposite effect: ERBB2 Neg.
6. Opposite effect: ERBB2 Exp down or ERBB2 CNV loss
7. Gene Only: ERBB2
8. Related Gene: EGFR
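The weighted sum S can be illustrated with the ERBB2 example. The match levels follow the slide, but every weight below is a made-up value (including the choice to make opposite effects negative), since the presentation does not give the actual scoring model; `trial_score` is a hypothetical helper.

```python
# Hypothetical weights per match level; the slides do not give actual values.
LEVEL_WEIGHTS = {
    "perfect_match": 1.0,
    "same_effect": 0.8,
    "similar_effect": 0.7,
    "other_alteration": 0.4,
    "likely_opposite_effect": -0.3,
    "opposite_effect": -0.5,
    "gene_only": 0.3,
    "related_gene": 0.1,
}

# Match levels for a patient with ERBB2 Exp up, following the slide's example.
# A marker of None means the trial mentions only the gene.
ERBB2_EXP_UP_LEVELS = {
    ("ERBB2", "Exp up"): "perfect_match",
    ("ERBB2", "CNV gain"): "same_effect",
    ("ERBB2", "Positive"): "similar_effect",
    ("ERBB2", "Mutation"): "other_alteration",
    ("ERBB2", "Negative"): "likely_opposite_effect",
    ("ERBB2", "Exp down"): "opposite_effect",
    ("ERBB2", "CNV loss"): "opposite_effect",
    ("ERBB2", None): "gene_only",
    ("EGFR", None): "related_gene",
}


def trial_score(trial_biomarkers, levels=ERBB2_EXP_UP_LEVELS, weights=LEVEL_WEIGHTS):
    """S = sum of the weights of the match levels of a trial's biomarkers."""
    return sum(weights[levels[b]] for b in trial_biomarkers if b in levels)


trials = {
    "trial_A": [("ERBB2", "Exp up"), ("EGFR", None)],
    "trial_B": [("ERBB2", "Negative")],
}
# Rank trials from most to least relevant for this patient.
ranked = sorted(trials, key=lambda name: trial_score(trials[name]), reverse=True)
print(ranked)
```

A biomarker-specific scoring model, as mentioned on the slide, would correspond to supplying a different `levels` and `weights` pair per patient biomarker.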

Clinical NLP in PubGene - examples
• PubGene in Ahus Optique
Courtesy of DNV-GL (Tore Hartvigsen)

Akershus University Hospital (Ahus): “Human touch and empathy – with professional skill”
Optique project: increase patient safety by providing easier access to existing information
Courtesy of DNV-GL (Tore Hartvigsen)

The Surgery Planning Form is completed in 3 stages
Surgery Planning Form (“The Green Form”)
• Stage 1: Examination. DIPS Ahus (structured data, text)
• Stage 2: Preparations. Metavision Ahus (Metavision O, Metavision I, Metavision DKS), additional systems
• Stage 3: Check/QA. System
To complete the form, data must be collected from a number of systems! Today this is done manually.
Courtesy of DNV-GL (Tore Hartvigsen)

Leave the data in the source systems!
• Users: expert users, «ordinary» users, researchers/analysts
• A semantic IT solution and ontology for clinical use in Health Care
• Data warehousing is an option
• Ahus research databases
• Ahus production databases: DIPS (EPJ, electronic patient record), Metavision I, Metavision O, Metavision DKS
Courtesy of DNV-GL (Tore Hartvigsen)

We want to «lift» the data out of the silos!
• «Ordinary» users, expert users
• A semantic IT solution and ontology for clinical use in Health Care
• Structured data
• Unstructured data (text)
• Solutions provided by the Optique project
• Text mining
Courtesy of DNV-GL (Tore Hartvigsen)

PubGene in Ahus Optique, information extraction
Unstructured information → structured information
• “Height 1,83 m” → name=height, type=int, unit=cm, value=183
Fields
• ASA
• BMI
• Height
• Weight
• Pulse
• Blood pressure
• Temperature
• Diagnosis codes
• Treatment codes
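The height example can be sketched as a regular expression plus unit normalization. The field labels accepted here ("Height" and the Norwegian "Høyde"), the accepted units, and the function name `extract_height` are assumptions; the slides show only the input and the resulting structured record.

```python
import re

# Label, optional colon, a number with decimal comma or point, then a unit.
HEIGHT_RE = re.compile(
    r"\b(?:height|høyde)\s*:?\s*(\d+(?:[.,]\d+)?)\s*(cm|m)\b",
    re.IGNORECASE,
)


def extract_height(text: str):
    """Return a structured height record in whole centimetres, or None."""
    match = HEIGHT_RE.search(text)
    if match is None:
        return None
    number = float(match.group(1).replace(",", "."))  # "1,83" -> 1.83
    unit = match.group(2).lower()
    value_cm = round(number * 100) if unit == "m" else round(number)
    return {"name": "height", "type": "int", "unit": "cm", "value": value_cm}


print(extract_height("Height 1,83 m"))
```

The other numeric fields in the list (weight, pulse, temperature) could follow the same label-number-unit shape, each with its own labels and units.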

PubGene in Ahus Optique, allergy information

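PubGene in Ahus Optique, smoking status
Sentence (English translation): Status
• “Røyker.” (Smokes.): Yes
• “Røyker 15-20 om dagen.” (Smokes 15-20 a day.): Yes
• “Ifølge datter er han også storrøyker, 40/ dag siste 50 år.” (According to his daughter he is also a heavy smoker, 40/day for the last 50 years.): Yes
• “Røykeplaster?” (Nicotine patch?): Uncertain
• “Tidligere storrøyker.” (Former heavy smoker.): Stopped
• “Ikke røyker og drikker ikke alkohol, tidligere, måteholdent alkoholbruk.” (Non-smoker and does not drink alcohol; previously moderate alcohol use.): No
• “Eks-røyker, lite alkohol.” (Ex-smoker, little alcohol.): Stopped
Text analysis
• Split the text into sentences and detect sentences containing “røyke…”, “røyki…”, “røykt…”
• Classify sentences based on recognition of keywords and word or sentence patterns
• NB: Based on a small database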
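The two text-analysis steps can be sketched with a few keyword rules. The rules and their order below are assumptions chosen to reproduce the seven example sentences, not the rules actually used in the project; a rule set this small would need many more patterns on real clinical notes.

```python
import re

# Sentences of interest contain one of the stems "røyke", "røyki", "røykt".
SMOKING_RE = re.compile(r"røyk[eit]")


def classify_sentence(sentence: str):
    """Return 'Yes', 'No', 'Stopped', 'Uncertain', or None if not about smoking."""
    s = sentence.strip().lower()
    if not SMOKING_RE.search(s):
        return None
    if re.search(r"\bikke\s+røyk", s):  # "ikke røyker" = does not smoke
        return "No"
    if re.search(r"\b(?:tidligere|eks)[\s-]+\w*røyk", s):  # former / ex-smoker
        return "Stopped"
    if s.endswith("?"):  # a question in the note is treated as uncertain
        return "Uncertain"
    return "Yes"


def classify_text(text: str):
    """Split text into sentences and classify the smoking-related ones."""
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    results = []
    for sentence in sentences:
        status = classify_sentence(sentence)
        if status is not None:
            results.append((sentence, status))
    return results


print(classify_text("Pasienten er 70 år. Tidligere storrøyker. Røykeplaster?"))
```

The negation rule is checked first so that a sentence containing both "ikke røyker" and a later "tidligere" is classified as No rather than Stopped.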

Ahus Optique
• Screenshots
Courtesy of DNV-GL (Tore Hartvigsen)

Page for surgery planning form Courtesy of DNV-GL (Tore Hartvigsen)

BMI Courtesy of DNV-GL (Tore Hartvigsen)

Allergy Courtesy of DNV-GL (Tore Hartvigsen)

Smoking Courtesy of DNV-GL (Tore Hartvigsen)

Surgery planning form Courtesy of DNV-GL (Tore Hartvigsen)

Further development, text processing/analysis
• A large set of options and potential
– Far more effective collection of more relevant information, e.g., by filling surgery forms (“The green form”)
– Improved quality through automatic detection of errors in documents and control of consistency with structured data
• Further steps for Ahus Optique
– Simple: Extraction of more “static” fields, like lab results
– Information about medication
– Information on heart function, lung function
– Exploit document structure and information on document types
