
Are Next-Generation HPC Systems Ready for Population-level Genomics Data Analytics?
Calvin Bulla, Lluc Alvarez and Miquel Moretó
AACBB Workshop, 24/02/2018
www.bsc.es

Genome Sequencing Explosion
Faster-than-Moore's-Law growth!
– Whole Human Genome (WHG) sequencing cost: <1K$
– 10x increase per year in genomics data
Source (left): National Human Genome Research Institute. Source (right): B. Berger et al., CACM 2016

Genomics Data Analytics
Typical workflow for WHG sequencing analytics
Main challenge: the performance bottleneck in these applications is moving from the sequencing side (where it sat during the last decade) to the computing side.

Barcelona Supercomputing Center (BSC)
BSC is a consortium that includes:
– Spanish Government: 60%
– Catalan Government: 30%
– Univ. Politècnica de Catalunya (UPC): 10%
BSC objectives:
• Supercomputing services to Spanish and EU researchers
• R&D in Computer, Life, Earth and Engineering Sciences
• PhD programme, technology transfer, public engagement
447 people from 44 countries (as of 31 December 2015)

The MareNostrum 4 Supercomputer
– Over 10^16 floating-point operations per second
– Nearly 150,000 cores
– 331.8 TB of main memory
– 14 PB of disk storage

Mission of BSC Scientific Departments
– Computer Sciences: To influence the way machines are built, programmed and used: programming models, performance tools, Big Data, computer architecture, energy efficiency
– Earth Sciences: To develop and implement global and regional state-of-the-art models for short-term air quality forecast and long-term climate applications
– Life Sciences: To understand living organisms by means of theoretical and computational methods (molecular modeling, genomics, proteomics)
– CASE: To develop scientific and engineering software to efficiently exploit supercomputing capabilities (biomedical, geophysics, atmospheric, energy, social and economic simulations)

BSC: A National Lab for Precision Medicine
Development and application of computational solutions for Genome Analysis in Biomedicine
Publications: Nature 2011; Nature Gen. 2012; Hum. Mol. Gen. 2012; PLoS Genetics 2012; Gut 2013; Gastroenterology 2015; Nature Biotech. 2014; Human Mol. Gen. 2014; Nature Genetics 2014; Nature 2015; Nature 2016
Projects: ICGC-PanCancer, SMUFIN
BSC in the Health Care system:
– Technology Transfer
– Alliances with Hospitals and health research foundations
– Involved in international consortia for genomics and disease
– Pilot phase Prec. Med.
National Supercomputing Platform for Clinical Genomics
Research Lab. for Precision Medicine
[Figure: pipeline from Management of primary data (Storage / Data Base, Genome Sequence) through Genome Analysis (Program 2, Program 3, Program 4, Filtering; identification of variants: SNVs, Indels, SVs, CNV; labels indel 1, indel 2, large SV) and Data Analytics (Relational DataBase, Functional Interpretation) to Patient Care]

Virtuous Circle for Precision Medicine
[Figure: cycle linking the Hospital and the Patient with Genome Sequencing, Genomic Data Management, Genome Data Analysis, Clinical and Functional Interpretation, and Decision]

Smufin
Somatic Mutation Finder
– Identification and analysis of somatic mutations related to different diseases
– Identifies mutations in tumour genomes by comparing them against the corresponding normal genome of the same patient

Smufin steps
Identify tumor-specific reads
– Build sequence tree using tumor and normal reads
– Extract unbalanced branches
– Group into read blocks; expanded by aligning corresponding normal reads
Define and classify potential tumor variants
– Small variants: SNVs and SVs within read length
– Characterization of large structural rearrangements
[Figure: pipeline from the Normal Genome (+180GB) and Tumor Genome (+180GB) inputs through Count, Filter and Group stages; other labels: Freq Tables (+100GBs), Dict. to check (+MBs), Group]
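The idea of an "unbalanced branch" can be illustrated with a much simpler stand-in. The sketch below is not the Smufin implementation: it replaces the sequence tree with plain k-mer counting in a hash map, and the function names, the value of `k`, the thresholds and the toy reads are all assumptions made for illustration. A k-mer counts as unbalanced here when it appears at least `min_tumor` times in the tumor reads and at most `max_normal` times in the normal reads.

```python
from collections import Counter


def kmers(read, k):
    """Yield every substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across a collection of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def tumor_specific_reads(tumor_reads, normal_reads, k=5, min_tumor=2, max_normal=0):
    """Return tumor reads containing at least one 'unbalanced' k-mer.

    Hypothetical simplification: a hash map of k-mer counts stands in
    for the sequence tree described on the slide.
    """
    tumor_counts = count_kmers(tumor_reads, k)
    normal_counts = count_kmers(normal_reads, k)
    unbalanced = {
        km for km, c in tumor_counts.items()
        if c >= min_tumor and normal_counts[km] <= max_normal
    }
    return [r for r in tumor_reads if any(km in unbalanced for km in kmers(r, k))]


# Toy data (invented): the last two tumor reads carry an A>T change.
normal = ["ACGTACGTAC", "CGTACGTACG"]
tumor = ["ACGTACGTAC", "ACGTTCGTAC", "CGTTCGTACG"]
candidates = tumor_specific_reads(tumor, normal)
print(candidates)
```

Only the two reads that contain k-mers absent from the normal sample are kept; the read shared with the normal genome is filtered out.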

Smufin in numbers
Inefficient execution on current processors:
– 6 hours run on 16 Intel Xeon nodes (total of 256 cores)
– Huge memory and I/O constraints
  • Input: 375 GB gzipped data
  • Reads: 4,288 million strings of length 80
  • Substrings of length 30 (in billions): 218 (potential), 76 (actual), 14 (interesting)
  • Over 2 TB of main memory requirements
– Streaming pattern
  • 5-10x more loads than stores
– Poor LLC locality
  • ~15% hit rate; ~5 MPKI
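The figures above can be cross-checked with a little arithmetic. A read of length 80 contains 80 - 30 + 1 = 51 substrings of length 30, so 4,288 million reads give roughly 218.7 billion potential substrings, which is consistent with the 218 billion quoted. The sketch below only restates the slide's numbers; the variable names are illustrative.

```python
# Figures taken from the slide.
READS = 4_288_000_000      # 4,288 million reads
READ_LEN = 80              # characters per read
K = 30                     # substring length
NODES, CORES, HOURS = 16, 256, 6

# Each read of length L contains L - K + 1 substrings of length K.
substrings_per_read = READ_LEN - K + 1
potential = READS * substrings_per_read

# Shares of the potential substrings, using the slide's rounded billions.
actual_share = 76 / 218
interesting_share = 14 / 218

# Cost of one run in core-hours, and cores per node.
core_hours = CORES * HOURS
cores_per_node = CORES // NODES

print(f"substrings per read: {substrings_per_read}")
print(f"potential substrings: {potential / 1e9:.1f} billion")
print(f"actual share: {actual_share:.1%}, interesting share: {interesting_share:.1%}")
print(f"core-hours per run: {core_hours}, cores per node: {cores_per_node}")
```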

HPC Requirements of Genomics Data Analytics
Estimate the compute power required to analyze generated genomics data
Assumptions:
– Moore's Law and Genomics Data Explosion trends
– Same compute efficiency for Smufin @ MN3
Significant improvements (several orders of magnitude) are needed to enable population-wise genomics data analytics:
– Better algorithms and HPC architectures
[Figure: projected compute power versus the requirement for population-wise analytics]
Source: www.top500.org and B. Berger et al., CACM'16
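The shape of this estimate can be sketched as two exponentials. The 10x-per-year data growth comes from the earlier slide; the compute doubling period of two years is an assumption chosen here as a common reading of Moore's Law, not a figure from the talk, and the function name is hypothetical. The gap is the factor by which data growth outpaces compute growth, assuming unchanged compute efficiency.

```python
DATA_GROWTH_PER_YEAR = 10.0      # from the slide: 10x more genomics data per year
COMPUTE_DOUBLING_YEARS = 2.0     # assumption: compute doubles every two years


def compute_gap(years):
    """Factor by which data growth outpaces compute growth after `years`."""
    data = DATA_GROWTH_PER_YEAR ** years
    compute = 2.0 ** (years / COMPUTE_DOUBLING_YEARS)
    return data / compute


for y in (0, 2, 5, 10):
    print(f"after {y:2d} years the gap is about {compute_gap(y):.3g}x")
```

Under these assumptions the gap widens by more than an order of magnitude every two years, which is why the slide calls for improvements in both algorithms and architectures.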

HPC Architectures for Genomics
Data-centric architectures for genomics
– Near-Memory or Near-Storage Computation
  • Pattern matching small reads on a huge data set in memory
  • Computation on very small integer data types (8 bits or less)
  • Embarrassingly parallel + data set distributed across nodes
  • Micron's Automata; on-board FPGA; active storage technology
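To make the point about very small data types concrete: each of the four nucleotides fits in 2 bits, so a 30-character substring fits in 60 bits, that is, a single 64-bit word. The sketch below shows one such packing; the particular code assignment (A=0, C=1, G=2, T=3) and the function names are assumptions for illustration and are not taken from the talk.

```python
# Assumed 2-bit code for the four nucleotides.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into an integer, 2 bits per base (first base most significant)."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack(value, length):
    """Recover the DNA string of the given length from its packed form."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 3])
        value >>= 2
    return "".join(reversed(bases))


packed = pack("ACGT")
print(bin(packed), unpack(packed, 4))

# Even the largest 30-mer needs only 60 bits.
print(pack("T" * 30).bit_length())
```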

HPC Architectures for Genomics
Domain-specific Accelerators
– GPGPUs to exploit data-level parallelism and high bandwidth
– Vector processors
  • ISA extensions that fit genomics workloads well (AVX512, SVE, ...)
  • Explore long vectors for energy efficiency
– Devise new accelerators for genomics workloads
  • Exploit on-chip FPGAs and build custom accelerators
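Data-level parallelism in this setting means comparing many bases with one operation. The sketch below is a software analogy only, not AVX512 or SVE code: it packs two sequences at 2 bits per base (an assumed encoding) and counts mismatches with a single XOR plus a mask, which is the kind of regular, narrow-integer work that vector units and GPGPUs handle well. All names are hypothetical.

```python
# Assumed 2-bit code for the four nucleotides.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq):
    """Pack a DNA string into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def mismatches(a, b):
    """Count differing positions between two equal-length DNA strings.

    All positions are compared at once: XOR leaves a non-zero 2-bit pair
    wherever the bases differ, and OR-ing each pair's high bit into its
    low bit reduces every pair to a single flag that can be counted.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    low_bits = (4 ** len(a) - 1) // 3        # bit pattern 0101...01, one flag per base
    diff = pack(a) ^ pack(b)
    flags = (diff | (diff >> 1)) & low_bits
    return bin(flags).count("1")


print(mismatches("GATTACA", "GACTATA"))
```

On real hardware the same pattern maps to a vector XOR followed by a population count over wide registers.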

Conclusions
Genome sequencing is becoming faster and cheaper at an exponential rate
– Population-wise sequencing will be a reality in the next 5-10 years
Data analytics on sequenced human genomes requires significant compute power and executes inefficiently (memory- and I/O-bound)
– Relying on Moore's Law alone won't provide enough compute power to perform genomic data analytics at a population level
Novel algorithms, HPC architectures and accelerators will be required to meet this challenge

Thanks to…
Computational Genomics research group at BSC
– David Torrents (group leader)
– Romina Royo
Data-Centric Computing research group at BSC
– David Carrera (group leader)
– Jordà Polo

