Robert Esnouf, University of Oxford: GTC-Europe 11 October 2017
23451: Developing a deep learning and AI platform for life science research
Robert Esnouf, robert@well.ox.ac.uk
Head of Research Computing Core, Wellcome Centre for Human Genetics
Director of Research Computing, Big Data Institute
Overview of talk
- The WCHG, the BDI and the Old Road Campus
- Areas of interest for applying DL techniques in the clinical/life sciences
- Early promising results
- Expanding provision for DL/AI and general-purpose GPU computing
- Acknowledgments
The Wellcome Centre for Human Genetics
About 500 researchers in a purpose-built institute
- “to advance the understanding of genetically-related conditions through multi-disciplinary research”
- Sequencing, statistical genetics, disease-focused research (diabetes, obesity, heart disease, malaria), optical microscopy, MRI, functional genetics, crystallography & electron microscopy
- Opened in 1999, the first building on the “Old Road Campus”, surrounded by five hospitals in Headington, east Oxford
Computing growth in the WCHG
- Genetics was largely a lab-based science, with small separate servers for each research group
- Next-generation sequencing (~2007) changed all that, and in 2009 I started to build a shared infrastructure for the whole of the WCHG
- The WCHG now has an HPC cluster of ~4,200 CPU cores; 5x Tesla K80, 8x Tesla P100 and consumer cards; ~6.7 PB raw GPFS and ~5 PB other storage
The Oxford Big Data Institute: The Li Ka Shing Centre for Health Information and Discovery
Interdisciplinary and problem-focused research institute of 350 researchers working on the acquisition and analysis of population-scale data resources linking detailed biological measurement with longitudinal information on health, treatment and outcome.
Big data is transforming the study of human biology and disease
- Data sources: death registries, cancer registries, hospital records, primary care data, pharmacy records, pathology records, screening programmes, environmental data, employment records, built environment, genetic data, imaging
- Cohorts: prospective cohorts (UKB, China, Mexico); disease-focused cohorts; partnerships with NHS / NIHR; Tropical Medicine overseas centres; WHO & national ID surveillance
- Measurement technologies: imaging; genomics and other ‘omics; sensors; electronic healthcare records; patient-interactive systems
- Integrative analysis methods: statistics; epidemiology; machine learning; software development; computational ecosystem
- Data access and sharing: consent; privacy and security; information governance; intellectual property; standards and protocols
A research computing infrastructure for the WCHG and BDI
- Linking with dark fibre and quad EDR InfiniBand
- Expanding shared HPC and high-performance storage
- Creating a scalable virtualization platform on OpenStack
- Secure multisite scalable S3 object store
- GPU-accelerated virtual desktop infrastructure and independent identity management and authorization
- Opening the facility across Oxford departments to drive efficient collaboration
UK Biobank: wearable accelerometer data
Aiden Doherty
- 103,712 participants with 7 days of data per participant
- 100 Hz tri-axial acceleration data and 0.2 Hz temperature/light information
[Figure: self-reported vs. accelerometer-measured physical activity; panel labels: Self-reporting: 50%, Accelerometer: 5%; Self-reporting: 38%, Accelerometer: 5%]
- Accelerometers outperform self-reporting (R = 0.48–0.60 vs. R = 0.07–0.28)
- Objective measures of physical activity are more strongly associated with mortality
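To give a sense of scale, the figures above imply a very large raw signal. This Python sketch only does the arithmetic (participants × days × seconds × 100 Hz × 3 axes); it ignores the 0.2 Hz temperature/light channel and makes no assumption about storage format.

```python
# Back-of-envelope count of raw accelerometer values implied by the slide:
# 103,712 participants x 7 days x 100 Hz x 3 axes.
SECONDS_PER_DAY = 24 * 60 * 60

def raw_values(participants, days, rate_hz, axes):
    """Number of raw acceleration values recorded across all participants."""
    return participants * days * SECONDS_PER_DAY * rate_hz * axes

per_participant = raw_values(1, 7, 100, 3)
total = raw_values(103_712, 7, 100, 3)
print(f"{per_participant:,} values per participant")  # 181,440,000
print(f"{total:.3e} values in total")                 # 1.882e+13
```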
Predicting malaria risk
- Relate environmental factors (temperature, rainfall etc.) to malaria prevalence
- Using point surveys and environmental data from annual 5 km × 5 km raster pixels
- We already use stacking: train a number of machine learning models and feed their predictions into a meta-learner. We use geostatistical models as our meta-learners.
Tim Lucas and Pete Gething
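The stacking idea can be sketched in plain Python. Note the swap: the slide uses geostatistical models as meta-learners, whereas this sketch uses a simple least-squares weighting of base predictions, which ignores spatial correlation. The two base learners, the single temperature covariate and the synthetic data are all hypothetical. The key step is that the meta-learner is fitted on out-of-fold predictions, so it never sees a base model's prediction on that model's own training points.

```python
import math
import random

def fit_line(xs, ys):
    """Base learner 1: ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(xs, ys, k=5):
    """Base learner 2: k-nearest-neighbour regression on a single covariate."""
    pts = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

BASE_LEARNERS = (fit_line, fit_knn)

def out_of_fold(xs, ys, n_folds=5):
    """Base-model predictions for each point from models not trained on it."""
    preds = [None] * len(xs)
    for f in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != f]
        models = [fit([xs[i] for i in train], [ys[i] for i in train])
                  for fit in BASE_LEARNERS]
        for i in range(f, len(xs), n_folds):  # held-out points of fold f
            preds[i] = [m(xs[i]) for m in models]
    return preds

def fit_meta(preds, ys):
    """Meta-learner: least-squares weights (no intercept) for the two base models."""
    s11 = sum(p[0] * p[0] for p in preds)
    s12 = sum(p[0] * p[1] for p in preds)
    s22 = sum(p[1] * p[1] for p in preds)
    t1 = sum(p[0] * y for p, y in zip(preds, ys))
    t2 = sum(p[1] * y for p, y in zip(preds, ys))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det

def fit_stack(xs, ys):
    w1, w2 = fit_meta(out_of_fold(xs, ys), ys)
    line, knn = (fit(xs, ys) for fit in BASE_LEARNERS)  # refit on all data
    return lambda x: w1 * line(x) + w2 * knn(x)

# Synthetic data (hypothetical): prevalence peaks at an intermediate temperature.
random.seed(0)
temps = [random.uniform(15, 35) for _ in range(200)]
prev = [max(0.0, 0.6 * math.exp(-((t - 25) / 5) ** 2) + random.gauss(0, 0.05))
        for t in temps]
model = fit_stack(temps, prev)
print(round(model(25.0), 2), round(model(15.0), 2))
```

Here the straight line alone cannot capture the hump-shaped response, so the meta-learner should put most of its weight on the nearest-neighbour model; with real base learners the same mechanism lets the data decide how much to trust each one.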
Base-calling data from Oxford Nanopore Technologies sequencers
Hannah Roberts and Gerton Lunter
- As DNA or RNA passes through the pore, current traces are recorded from which the sequence of bases can be inferred (‘base-calling’)
- State-of-the-art base-callers use deep neural networks to interpret the current signals, improving accuracy from 71% with older methods (HMM; R7.3 chemistry) to 90% (DNN; R9 chemistry)
- With more training, DNNs may be able to detect modified bases (e.g. methylation patterns)
[Figure: current trace annotated with the called bases GTTCTGTATATCTT]
Detecting rare genetic conditions from craniofacial features
- There are many, many rare genetic conditions that often go undiagnosed
- Roughly 1 in 12 people has one of these conditions
- These conditions often also manifest in craniofacial features
- www.minervaandme.com does image analysis on faces to predict genetic conditions
- With better feature recognition and DL techniques, researchers expect to be able to detect more conditions more reliably
Michael Ferlaino and Chris Nellåker
Deep neural networks as models for the brain
Jessie Liu and Tom Nichols
- Deep neural networks (DNNs) were inspired by the brain and learn similar features
- DNNs could take further inspiration from the brain
- Can we build more sophisticated or cognitive neural representations into DNNs?
- Such as the brain’s GPS system
This approach will offer:
- Insights into the principles underlying neural representations in the brain
- New DNN architectures capable of powerful, brain-like computations
[Figure: firing field of an artificial neuron alongside the firing field of a real neuron]
Deep learning of chromatin features to predict islet-specific SNP effects
Agata Wesolowska-Andersen, Chris Holmes and Mark McCarthy
CNNs capture motifs of input ChIP-seq and known islet transcription factors
Agata Wesolowska-Andersen, Chris Holmes and Mark McCarthy
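Why first-layer CNN filters end up resembling sequence motifs can be seen in a small Python sketch: sliding a filter over one-hot encoded DNA is the same operation as scanning a position weight matrix. The motif and filter weights below are arbitrary examples, not filters learned in the islet study.

```python
# Illustrative sketch: a first-layer CNN filter scanning one-hot DNA behaves
# like a position weight matrix (PWM) scan. The motif is an arbitrary example.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element indicator vectors (A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def motif_filter(motif):
    """Hypothetical filter: +1 for the motif base at each position, -0.25 elsewhere."""
    return [[1.0 if b == base else -0.25 for base in BASES] for b in motif]

def conv1d(x, w):
    """Valid 1-D cross-correlation of a (length x 4) input with a (width x 4) filter."""
    width = len(w)
    return [
        sum(x[i + j][c] * w[j][c] for j in range(width) for c in range(4))
        for i in range(len(x) - width + 1)
    ]

def relu(values):
    return [max(0.0, v) for v in values]

seq = "CCGATGTTTACGGA"
acts = relu(conv1d(one_hot(seq), motif_filter("TGTTTAC")))
best = max(range(len(acts)), key=acts.__getitem__)
print(best, acts[best])  # 4 7.0: strongest activation where the motif occurs
```

In a trained network the filter weights are learned rather than hand-set, which is why they can be compared against ChIP-seq motifs and known transcription-factor motifs.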
Deep learning predicts regulatory effects for high PPA SNPs
Agata Wesolowska-Andersen, Chris Holmes and Mark McCarthy
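A common way to use a trained sequence model to predict the regulatory effect of a SNP is to score the sequence with the reference allele and with the alternative allele and take the difference. The slide does not spell out the exact procedure, so this Python sketch is an assumption; the "model" is a toy max-pooled motif matcher with a hypothetical motif, standing in for a trained CNN.

```python
# Sketch: score a SNP as the change in a sequence model's output between the
# reference and alternative alleles. The toy scorer stands in for a trained CNN.
BASES = "ACGT"
MOTIF = "TGTTTAC"  # hypothetical motif for illustration

def model_score(seq):
    """Toy model: best number of motif matches over all windows (max-pooled)."""
    w = len(MOTIF)
    return max(
        sum(a == b for a, b in zip(seq[i:i + w], MOTIF))
        for i in range(len(seq) - w + 1)
    )

def snp_effect(ref_seq, pos, alt):
    """Predicted effect = score(alt) - score(ref) for a SNP at 0-based position pos."""
    if alt not in BASES:
        raise ValueError(f"unknown base {alt!r}")
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return model_score(alt_seq) - model_score(ref_seq)

ref = "CCGATGTTTACGGA"
print(snp_effect(ref, 7, "G"))  # -1: the alternative allele disrupts the motif
print(snp_effect(ref, 0, "T"))  # 0: a change outside the motif has no effect
```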
Predicting Expression using Convolutional Neural Networks (CNNs)
Moustafa Abdalla, Chris Holmes and Mark McCarthy
peaBrain
- a promoter-derived embedding and abundance (pea) model
- a convolutional neural network that leverages DNA sequence to predict expression
- can be used to predict both average gene expression and variation in expression (between individuals)
Dataset: 19k genes × 4 kb × 32 channels (18.47 GB), representing the “core” promoter sequence of all protein-coding genes in the human reference genome
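A quick size check of that input tensor in Python. The slide does not state the storage data type, and "19k" is rounded, so the result is only expected to be of the same order as the quoted 18.47 GB; the float32/float64 comparison is an assumption for illustration.

```python
# Rough size check of the peaBrain input tensor: genes x positions x channels.
GENES = 19_000      # rounded gene count from the slide
POSITIONS = 4_000   # 4 kb of promoter sequence per gene
CHANNELS = 32       # per-position channels, as quoted on the slide

def tensor_bytes(genes, positions, channels, bytes_per_value):
    """Dense tensor size in bytes for a given element width."""
    return genes * positions * channels * bytes_per_value

values = GENES * POSITIONS * CHANNELS
print(f"{values:,} values")  # 2,432,000,000 values
for name, size in (("float32", 4), ("float64", 8)):
    gb = tensor_bytes(GENES, POSITIONS, CHANNELS, size) / 1e9
    print(f"{name}: {gb:.2f} GB")  # float32: 9.73 GB, float64: 19.46 GB
```

The quoted 18.47 GB sits closest to the 8-byte estimate, but that is only a guess about how the data were stored.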
Moustafa Abdalla, Chris Holmes and Mark McCarthy
GPUs are necessary for computational tractability
[Figure: neural network vs. regularized linear regression, run on a quad Xeon E5-4640 (64 threads), a single Tesla K80 (two entries) and a single Tesla P100. Green: fraction of genes whose expression can be predicted using the model; R² is the average over repeated out-of-sample (test) sets]
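The legend's “R² is the average over repeated out-of-sample (test) sets” can be read as follows, assuming the standard coefficient of determination computed on each held-out set $T_k$ (with $\hat{y}_i$ predicted by a model fitted without $T_k$) and then averaged over $K$ repeats. The slide does not say whether the test-set or training-set mean is used for $\bar{y}$; the test-set mean below is an assumption.

```latex
R^2_k = 1 - \frac{\sum_{i \in T_k} \left(y_i - \hat{y}_i\right)^2}{\sum_{i \in T_k} \left(y_i - \bar{y}_{T_k}\right)^2},
\qquad
\overline{R^2} = \frac{1}{K} \sum_{k=1}^{K} R^2_k
```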
CNNs already outperform previous computational and experimental methods
Moustafa Abdalla, Chris Holmes and Mark McCarthy
Provision for DL/AI in WCHG/BDI
- Until Easter 2017, GPUs were mainly used for electron tomography (“dynamo”) and single-particle electron microscopy (“relion”)
  - Dell C4130 with 4x K80; Dell R730 with 1x K80; Scan workstation with Titan Xp
  - Free-for-all access
- Adding a tiered set of local and shared resources:
  - Initial exploration and testing
    - 3x Gigabyte servers, each with 4x GTX 1080 Ti
    - 1x SuperMicro workstation with 1x GTX 1080 Ti
    - 1x Scan workstation with 1x Titan Xp
  - Mid-scale training and inference, along with image analysis
    - 1x Dell R730 with 1x K80
    - 1x Dell C4130 with 4x K80
    - 2x Dell C4130, each with 4x P100 (SXM2)
    - 1x Scan workstation with 1x V100 (PCIe)
- Controlling access within Univa Grid Engine
Provision for DL/AI with central Oxford IT
- Oxford IT Services Advanced Research Computing (ARC) mainly supports maths, physical and life sciences across Oxford
- ARC has a ~7,500-core cluster and ~40 K40 and K80 GPUs
- Shared purchase of an NVIDIA DGX-1V by ARC, WCHG, BDI and WIMM, to be housed by ARC
  - First Volta system in the UK academic sector
  - Will be mostly devoted to life-science and clinical research
  - Delivery before the end of October – thanks, NVIDIA!
- Oxford ARC manages the new JADE cluster
  - UK national GPU cluster
  - 22x NVIDIA DGX-1 Pascal systems
  - Some access for life-science projects
- There is vast enthusiasm for DL/AI in life-science and clinical research. Early results are promising. The WCHG and BDI will grow their DL/AI hardware capability to meet this researcher demand
Acknowledgments
- Members of the research computing teams
  - Jon Diprose, Colin Freeman, Callum Smith and Adam Huffman
- The researchers across Oxford who have provided me with descriptions of their research
  - Moustafa Abdalla, Gavin Band, Adrian Cortes, Aiden Doherty, Michael Ferlaino, Jessie Liu, Tim Lucas, Hannah Roberts, Agata Wesolowska-Andersen and Joe Zhu
- My bosses and others who have helped us to grow
  - Profs. Peter Donnelly (WCHG) and Gil McVean (BDI)
- Funding agencies, especially the Wellcome Trust, the Li Ka Shing and Robertson Foundations, and the Medical Research Council
- And to you for your attention…
- Enjoy the rest of the conference!