HEALTH AND BIOSECURITY
Aidan O’Brien Health Data Analytics 2018 aydun1
Large-scale machine learning for genotype / phenotype association
By 2025 it is estimated that 50% of the world population will have been sequenced. (Frost&Sullivan)
Figure: Data acquisition of Big Data disciplines in 2025 (Genomics, YouTube, Astronomy, Twitter). Labelled value: 20 EB storage / year.
Source: Stephens et al., Big Data: Astronomical or Genomical? (2015)
Frost&Sullivan
Large-scale Machine Learning for Gen-Phen Association | Aidan O’Brien | @aydun1 2 |
https://www.projectmine.com/about/
cases controls Gene1 Gene2
cases controls
Individuals: 22,500 samples
Genomic profile: 80 million features
Disease status
Disease genes
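A rough sense of scale for the numbers on this slide: 22,500 samples by 80 million features is about 1.8 trillion genotype calls. The sketch below only does that arithmetic. The one-byte-per-call storage figure is an assumption for illustration, not something stated in the slides.

```python
# Back-of-envelope size of the genotype matrix described on this slide.
N_SAMPLES = 22_500          # from the slide
N_FEATURES = 80_000_000     # from the slide

n_cells = N_SAMPLES * N_FEATURES

# Assumption (not from the slides): one byte per genotype call, uncompressed.
BYTES_PER_CELL = 1
size_tb = n_cells * BYTES_PER_CELL / 1e12

print(f"{n_cells:.2e} genotype calls, about {size_tb:.1f} TB at 1 byte each")
```

Real genotype stores are usually compressed or bit-packed, so the actual footprint would likely differ.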
Figure (panels A, B, C): individuals, genomic profile, predictive variants
 | High-performance compute cluster | Hadoop/Spark compute cluster
Focus | Compute-intensive | Data-intensive
Fault tolerant | No | Yes
Node-bound | Yes | No
Parallelization | 100+ CPU | 1000+ CPU
Parallelization procedure | bespoke | standardized
CSIRO solution
Software stack: Spark Core; SparkML / MLlib; VariantSpark
Figure axes: accuracy (low to high), speed (low to high)
“Analyzes 3000 individuals with 80M features in 30 minutes”
BMC Genomics 2015, 16:1052 PMID: 26651996 (citation=16)
Phenotype: 1,936 individuals with 7.2 million variants (imputed from array)
Traditional GWAS (single-locus regression).
Smaller cohorts give robust insights
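For readers unfamiliar with the single-locus approach mentioned above, here is a minimal sketch: each variant is regressed against the phenotype on its own, and variants are ranked by effect size. The function names, the 0/1/2 genotype coding, and the toy data are illustrative assumptions, and no significance testing or covariate correction is attempted.

```python
from typing import List, Tuple

def single_locus_slope(genotypes: List[int], phenotype: List[float]) -> float:
    """Least-squares slope of the phenotype on one variant's genotype."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotype) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotype))
    var = sum((g - mean_g) ** 2 for g in genotypes)
    # A monomorphic variant has zero variance and carries no signal.
    return cov / var if var > 0 else 0.0

def scan(variants: List[List[int]], phenotype: List[float]) -> List[Tuple[int, float]]:
    """Fit every variant independently and rank by absolute slope."""
    slopes = [(i, single_locus_slope(v, phenotype)) for i, v in enumerate(variants)]
    return sorted(slopes, key=lambda t: abs(t[1]), reverse=True)

# Toy data (made up): 6 individuals, 3 variants coded 0/1/2.
phenotype = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
variants = [
    [0, 1, 2, 0, 1, 2],   # tracks the phenotype
    [1, 1, 1, 1, 1, 1],   # monomorphic
    [2, 1, 0, 2, 1, 0],   # inversely related
]
ranked = scan(variants, phenotype)
```

Because each variant is fitted in isolation, a scan like this cannot pick up effects that only appear when two variants are considered together.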
More accurate biomarker discovery
HipsterScore =
(2 * B6) + (0.2 * B2) + (1.5 * R1) + (0.1 * C2) + (3 * B6 * B2) + (2.5 * R1 * C1) + noise
Term labels: independent (the four single-variant terms), interacting (the two product terms)
Figure: genomes with a "Hipster?" label (Y / N) per individual
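The HipsterScore formula can be evaluated directly. The sketch below implements it term by term as written on the slide (note that C2 appears in the additive part and C1 in the interaction). The 0/1/2 genotype coding, the uniform sampling, and the Gaussian noise are assumptions made for illustration; the slides do not specify them.

```python
import random

def hipster_score(b6, b2, r1, c2, c1, noise=0.0):
    """Evaluate the HipsterScore formula from the slide, term by term."""
    independent = (2 * b6) + (0.2 * b2) + (1.5 * r1) + (0.1 * c2)
    interacting = (3 * b6 * b2) + (2.5 * r1 * c1)
    return independent + interacting + noise

def simulate(n, seed=0, noise_sd=0.5):
    """Draw n synthetic individuals and score them.

    Assumptions (not from the slides): genotypes are coded 0/1/2 and drawn
    uniformly; noise is Gaussian with standard deviation noise_sd.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = {name: rng.choice([0, 1, 2]) for name in ("B6", "B2", "R1", "C2", "C1")}
        score = hipster_score(g["B6"], g["B2"], g["R1"], g["C2"], g["C1"],
                              noise=rng.gauss(0.0, noise_sd))
        rows.append((g, score))
    return rows

cohort = simulate(5)
```

A synthetic phenotype like this is useful because the contributing variants are known in advance, so a method's ability to recover both the independent and the interacting terms can be checked.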
https://docs.databricks.com/applications/genomics/variant-spark.html
CSIRO’s cloud-based solutions
Innovation In Digital Health - Open Floor Forum | Denis C. Bauer | @allPowerde 14 |
Finding Disease Genes Correcting Genomes Treating Individuals
Team and collaborators: Denis Bauer, PhD; Oscar Luo, PhD; Rob Dunne, PhD; Piotr Szul; Aidan O’Brien; Laurence Wilson, PhD; Arash Bayat; Lynn Langit; Natalie Twine, PhD; Brendan Hosking
News: Top 10 Australian IT stories of 2017
Software
You? We are hiring… email Denis
Keynote Aidan O’Brien, CSIRO