High-Performance Data Mining for Drug Effect Detection - a project - - PowerPoint PPT Presentation

SLIDE 1

High-Performance Data Mining for Drug Effect Detection

  • a project funded by the Swedish Foundation for Strategic Research during 2012-2017

Henrik Boström*, Lars Asker, Hercules Dalianis, Mia Kvist, Aron Henriksson, Isak Karlsson, Jing Zhao
Department of Computer and Systems Sciences, Stockholm University

Ulf Johansson, Håkan Sundell, Karl Jansson, Henrik Linusson, Tuve Löfström
Department of Information Technology, University of Borås

  • Project focus
  • Organization
  • Results
  • Continuation

*At KTH Royal Institute of Technology since Oct. 2017

SLIDE 2

Project focus

To develop techniques and tools to support decision making and discovery of drug effects by analyzing electronic health records and chemical compound data

Data sources: electronic health records (EHRs) and chemical compound data

Technical challenges

  • heterogeneous data
  • high-dimensional, sparse and incomplete data
  • temporal dependencies
  • guarantees needed for predictions
SLIDE 3

Project organization

  • Adverse drug event (ADE) detection: PhD students Isak Karlsson and Jing Zhao; supervisors Lars Asker, Henrik Boström and Panos Papapetrou
  • Clinical text mining: PhD students Aron Henriksson and Maria Skeppstedt; supervisors Hercules Dalianis, Martin Duneld and Mia Kvist
  • Conformal prediction: PhD students Henrik Linusson and Tuve Löfström; supervisors Ulf Johansson and Henrik Boström
  • Parallel data mining: PhD student Karl Jansson; supervisors Håkan Sundell and Henrik Boström

SLIDE 4

Scientific output

  • Publications
    – 21 journal papers
    – 45 conference papers
    – 12 workshop papers
    – 4 PhD theses
    – 1 licentiate thesis
    – 1 forthcoming book
  • Awards
    – The Börje Langefors Prize, awarded by SISA to Aron Henriksson for the best PhD thesis in informatics at a Swedish university in 2016
    – Carl H. Smith Award for best paper to I. Karlsson et al., 2016, "Early Random Shapelet Forest", Proc. of the 19th International Conference on Discovery Science
    – Distinguished Paper Award to Jing Zhao et al., 2015, American Medical Informatics Association (AMIA) Annual Symposium

SLIDE 5

Main results on ADE detection

  • Random forests are capable of screening EHRs that should be assigned ADE codes
  • High dimensionality can be handled by using random indexing to reduce the dimensionality of EHRs
  • Sparsity can be handled in random forests by resampling when predicting ADEs
  • The choice of EHR representation has a high impact on the result, e.g., using concept hierarchies of clinical codes, patient-level vs. visit-level analysis, and how time dependencies are encoded
  • The random forest algorithm was extended to handle heterogeneous, time-evolving data by sampling temporal patterns
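Random indexing, as referred to above, builds low-dimensional vectors incrementally instead of materialising a full code-by-code co-occurrence matrix: each clinical code is given a fixed, sparse random index vector, and a code's context vector is the sum of the index vectors of the codes that occur near it. The Python sketch below shows one generic way to apply this to sequences of diagnosis (ICD) and drug (ATC) codes. It is not the project's implementation, and the dimension `DIM`, the sparsity `NONZERO`, the window size `WINDOW` and the toy patient sequences are all hypothetical choices.

```python
import random
from collections import defaultdict

DIM = 512      # hypothetical reduced dimensionality (fixed, whatever the vocabulary size)
NONZERO = 8    # hypothetical number of non-zero (+1/-1) entries per index vector
WINDOW = 2     # hypothetical context window: codes on each side that count as context


def make_index_vector(rng: random.Random) -> dict[int, int]:
    """Sparse ternary index vector stored as {position: +1 or -1}."""
    positions = rng.sample(range(DIM), NONZERO)
    return {p: rng.choice((1, -1)) for p in positions}


def random_indexing(sequences: list[list[str]], seed: int = 0) -> dict[str, list[int]]:
    """Accumulate a DIM-dimensional context vector for every code by adding
    the index vectors of the codes that occur within WINDOW positions of it."""
    rng = random.Random(seed)
    index_vectors: dict[str, dict[int, int]] = {}
    context: dict[str, list[int]] = defaultdict(lambda: [0] * DIM)
    for seq in sequences:
        for i, code in enumerate(seq):
            for j in range(max(0, i - WINDOW), min(len(seq), i + WINDOW + 1)):
                if j == i:
                    continue
                neighbour = seq[j]
                if neighbour not in index_vectors:
                    index_vectors[neighbour] = make_index_vector(rng)
                for pos, sign in index_vectors[neighbour].items():
                    context[code][pos] += sign
    return dict(context)


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Toy patient histories (hypothetical): ICD-10 and ATC codes in time order.
    patients = [
        ["I50", "C03CA01", "I10", "C09AA02"],
        ["I50", "C03CA01", "I10", "C07AB02"],
        ["J45", "R03AC02", "J30", "R03BA02"],
    ]
    vectors = random_indexing(patients)
    # C09AA02 and C07AB02 occur in identical contexts, so their vectors coincide.
    print(cosine(vectors["C09AA02"], vectors["C07AB02"]))
    print(cosine(vectors["C09AA02"], vectors["R03BA02"]))
```

Because `DIM` is fixed in advance, every context vector has the same length however many distinct codes the records contain, which is where the dimensionality reduction comes from. In the toy data, the two codes that always follow the same pair of codes end up with identical vectors.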

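The temporal patterns sampled by the extended random forest can be illustrated with shapelets, the subsequence patterns behind the Early Random Shapelet Forest listed among the awards: a candidate shapelet is drawn at random from the training series, each series is summarised by its minimum distance to that shapelet, and a tree node thresholds that distance. The sketch below is a minimal, hypothetical illustration of a single node, not a whole forest and not the project's code. It assumes univariate numeric series, each at least `min_len` points long, and class labels scored with Gini impurity; it does not cover the heterogeneous data the slide refers to.

```python
import math
import random


def sample_shapelet(series_list, min_len, max_len, rng):
    """Draw a random subsequence (candidate shapelet) from a randomly chosen series."""
    series = rng.choice(series_list)
    length = rng.randint(min_len, min(max_len, len(series)))
    start = rng.randint(0, len(series) - length)
    return series[start:start + length]


def shapelet_distance(shapelet, series):
    """Minimum Euclidean distance between the shapelet and any equally long window."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        best = min(best, math.dist(shapelet, series[start:start + m]))
    return best


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(distances, labels):
    """Distance threshold minimising weighted Gini impurity; (inf, None) if none exists."""
    pairs = sorted(zip(distances, labels))
    best = (math.inf, None)
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate equal distances
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[0]:
            best = (score, (pairs[k - 1][0] + pairs[k][0]) / 2)
    return best


def random_shapelet_split(series_list, labels, n_candidates=10, min_len=2, max_len=4, seed=0):
    """One tree node: sample candidate shapelets at random and keep the best split."""
    rng = random.Random(seed)
    best = (math.inf, None, None)  # (score, shapelet, threshold)
    for _ in range(n_candidates):
        shapelet = sample_shapelet(series_list, min_len, max_len, rng)
        distances = [shapelet_distance(shapelet, s) for s in series_list]
        score, threshold = best_split(distances, labels)
        if threshold is not None and score < best[0]:
            best = (score, shapelet, threshold)
    return best
```

Sampling shapelets at random, rather than searching all subsequences exhaustively, keeps the cost of each node low, which is what makes growing many such trees practical.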
SLIDE 6

Main results on clinical text mining

  • Ensembles of semantic spaces, created by manipulating the underlying data and model hyperparameters, lead to improved performance on terminology development, named entity recognition (NER), relation extraction and ADE detection
  • Distributional semantics extended to non-linguistic sequence data:
    – diagnosis codes (ICD)
    – drug codes (ATC)
    – clinical measurements
  • Combining heterogeneous data from EHRs shown to lead to increased predictive performance, with early fusion outperforming late fusion strategies
  • Annotated corpus for learning to identify drugs and symptoms/disorders in clinical notes
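Early fusion joins the feature vectors from every EHR source (for example diagnosis codes and clinical measurements) into one vector per patient before training a single model. Late fusion trains one model per source and combines their outputs afterwards. The sketch below contrasts the two under loud assumptions: the learner is a deliberately simple, hypothetical nearest-centroid classifier standing in for a real model such as a random forest, late fusion averages class probabilities (one of several possible combination rules), and the data are toy values.

```python
import math


def fit_centroids(X, y):
    """Stand-in learner (hypothetical): one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def class_probabilities(model, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in model.items()}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exp.values())
    return {c: e / total for c, e in exp.items()}


def concat_sources(sources):
    """sources[k][i] is patient i's feature list from source k; join them per patient."""
    return [sum(parts, []) for parts in zip(*sources)]


def early_fusion(train_sources, y, test_sources):
    """Fuse the features first, then train a single model."""
    model = fit_centroids(concat_sources(train_sources), y)
    predictions = []
    for x in concat_sources(test_sources):
        probs = class_probabilities(model, x)
        predictions.append(max(probs, key=probs.get))
    return predictions


def late_fusion(train_sources, y, test_sources):
    """Train one model per source, then average their class probabilities."""
    models = [fit_centroids(X, y) for X in train_sources]
    predictions = []
    for parts in zip(*test_sources):
        avg = {}
        for model, x in zip(models, parts):
            for c, p in class_probabilities(model, x).items():
                avg[c] = avg.get(c, 0.0) + p / len(models)
        predictions.append(max(avg, key=avg.get))
    return predictions
```

The two functions differ only in where the sources are combined. The slide's result (early fusion outperforming late fusion) is an empirical finding on the project's EHR data; this toy example does not reproduce it.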

SLIDE 7

Main results on conformal prediction

  • The framework was adapted to specific machine learning techniques:
    – decision trees
    – random forests
    – ensembles of neural networks
  • The conformal prediction framework was improved:
    – tighter one-tailed predictions
    – a sound procedure for using out-of-bag instances for calibration
  • Application to specific learning situations:
    – handling imbalanced data
    – streaming data

[Figure: Database → Model]
Standard prediction: Toxicity = 5.2, P(correct) = ?
Conformal prediction: Toxicity = (4.5, 5.9), P(correct) = 95%
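The toxicity example above matches the output of inductive (split) conformal regression: a model is trained on one part of the data, its absolute residuals on a separate calibration set act as nonconformity scores, and the interval for a new compound is the point prediction plus or minus a quantile of those scores. The sketch below assumes a hypothetical, already trained `predict` function and uses toy numbers. It shows the standard two-tailed procedure and a basic one-tailed variant, not the project's improved one-tailed method or its out-of-bag calibration.

```python
import math


def _calibration_rank(n, epsilon):
    """1-based rank of the calibration score used as the (1 - epsilon) quantile."""
    return math.ceil((n + 1) * (1 - epsilon))


def conformal_interval(predict, X_cal, y_cal, x_new, epsilon=0.05):
    """Two-tailed split conformal interval at confidence 1 - epsilon,
    using absolute residuals |y - predict(x)| as nonconformity scores."""
    scores = sorted(abs(y - predict(x)) for x, y in zip(X_cal, y_cal))
    k = _calibration_rank(len(scores), epsilon)
    if k > len(scores):
        return (-math.inf, math.inf)  # too few calibration examples for this confidence
    y_hat = predict(x_new)
    q = scores[k - 1]
    return (y_hat - q, y_hat + q)


def conformal_upper_bound(predict, X_cal, y_cal, x_new, epsilon=0.05):
    """Basic one-tailed variant: (-inf, upper] from signed residuals y - predict(x)."""
    scores = sorted(y - predict(x) for x, y in zip(X_cal, y_cal))
    k = _calibration_rank(len(scores), epsilon)
    if k > len(scores):
        return (-math.inf, math.inf)
    return (-math.inf, predict(x_new) + scores[k - 1])


if __name__ == "__main__":
    def predict(x):
        return 2.0 * x  # hypothetical model fitted on the proper training set

    X_cal = [1, 2, 3, 4, 5, 6, 7]
    y_cal = [2 * x + r for x, r in zip(X_cal, [1, -2, 3, -4, 5, -6, 7])]
    print(conformal_interval(predict, X_cal, y_cal, 10, epsilon=0.25))  # (14.0, 26.0)
```

With epsilon = 0.05 (95% confidence, as on the slide) the two-tailed interval is only finite once there are at least 19 calibration examples, because ceil((n + 1) · 0.95) ≤ n requires n ≥ 19.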

SLIDE 8

Main results on parallel data mining

  • Parallel implementations of Random Forests and Extremely Randomized Trees for GPU and CPU, for both classification and regression tasks
  • Streaming solutions for GPUs to support datasets larger than the memory available on the GPU
  • GPU solutions were found to outperform state-of-the-art implementations
  • GPU solutions were found to scale well (almost linearly) with multiple GPUs

[Figure: run time in seconds vs. number of trees for gpuERT, cpuERT, gpuRF, FastRF and WekaRF]
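One reason random forests and extremely randomized trees parallelize well is that each tree is grown independently of the others. The Python sketch below only illustrates that per-tree task decomposition, using a simplified version of the Extra-Trees split rule (random feature, random cut point, best of `k` candidates by squared error) for regression. It is a hypothetical toy, not the project's GPU or CPU code; the parameters `k`, `min_split` and `max_depth` are illustrative, and in CPython a thread pool will not speed up pure-Python work because of the global interpreter lock, so real speedups need processes, native code or GPUs.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean (regression impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, rng, k=3, min_split=4, depth=0, max_depth=6):
    """Simplified extremely randomized regression tree: draw k (feature,
    random threshold) candidates, keep the lowest child SSE, and recurse."""
    if len(y) < min_split or depth == max_depth:
        return mean(y)
    best = None
    for _ in range(k):
        f = rng.randrange(len(X[0]))
        lo, hi = min(row[f] for row in X), max(row[f] for row in X)
        if lo == hi:
            continue  # feature is constant in this node
        t = rng.uniform(lo, hi)
        left = [i for i, row in enumerate(X) if row[f] < t]
        right = [i for i, row in enumerate(X) if row[f] >= t]
        if not left or not right:
            continue
        score = sse([y[i] for i in left]) + sse([y[i] for i in right])
        if best is None or score < best[0]:
            best = (score, f, t, left, right)
    if best is None:
        return mean(y)
    _, f, t, left, right = best

    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], rng,
                          k, min_split, depth + 1, max_depth)

    return (f, t, child(left), child(right))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] < t else right
    return node


def fit_forest(X, y, n_trees=20, seed=0, workers=4):
    """Grow every tree as an independent task. Each task has its own seeded RNG,
    so the result does not depend on thread scheduling. As in Extra-Trees, each
    tree sees the full training set (no bootstrap)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: build_tree(X, y, random.Random(s)),
                             range(seed, seed + n_trees)))


def predict_forest(trees, x):
    return mean([predict_tree(tree, x) for tree in trees])
```

Giving each task its own random generator, instead of sharing one across workers, keeps the forest reproducible and avoids any synchronization between trees.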

SLIDE 9

Additional results

  • Ethical permissions and access to EHRs:
    – 2 million electronic patient records from Karolinska University Hospital during 2007-2014
    – HEALTH BANK infrastructure
  • Software packages:
    – random forests (Julia, GPU, Java, Erlang)
    – text mining tools
    – conformal prediction (Python, Julia)
    – adverse event exploration and detection tools
  • A critical mass of expertise in the area and established connections with stakeholders:
    – Karolinska University Hospital
    – Centre for Pharmacoepidemiology at Karolinska Institutet
    – AstraZeneca
    – Swedish Toxicology Sciences Research Center

SLIDE 10

Continuation

  • Nordic Center of Excellence in Health-Related e-Sciences (NIASC), NordForsk, 44 MSEK, 2014-2018; partners: Karolinska Institutet, CBS, University of Copenhagen and Cancer Registry of Norway
  • Analyzing registry data to find ways to improve treatment of heart failure patients, Stockholm County Council, 4.6 MSEK, 2017-2019
  • Data Analytics for Research and Development, Knowledge Foundation, 4 MSEK, 2016-2018; partners: AstraZeneca and Scania R&D
  • Data Driven Innovation – Algorithms, Platforms and Ecosystems, Knowledge Foundation, 12 MSEK, 2016-2020; partners: Ellos, Eton and Vinga of Sweden
  • Temporal Data Mining for Detecting Adverse Events in Healthcare, Swedish Research Council, 3.4 MSEK, 2017-2020