Using a CUDA-Accelerated PGAS Model on a GPU Cluster for - PowerPoint PPT Presentation

Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics
Jorge González-Domínguez
Parallel and Distributed Architectures Group, Johannes Gutenberg University of Mainz, Germany
j.gonzalez@uni-mainz.de
GTC 2015

Outline
1. Overview of the Problem
2. Intra-GPU Parallelization with CUDA
3. Inter-GPU Parallelization with UPC++
4. Experimental Evaluation
5. Conclusions

1. Overview of the Problem

Genome-Wide Association Studies (I)
Analyses of the genetic influence on diseases
- M individuals: K cases and C controls
- N genetic markers, Single Nucleotide Polymorphisms (SNPs)
- 3 genotypes per SNP:
  - Homozygous wild (w, AA, 0)
  - Heterozygous (h, Aa, 1)
  - Homozygous variant (v, aa, 2)

Genome-Wide Association Studies (II)
Example genotype matrix with 8 cases and 8 controls:

         Cases              Controls
SNP 1    0 1 2 0 1 2 0 1    2 0 1 2 0 1 2 1
SNP 2    0 1 1 0 2 0 0 0    1 2 2 1 0 1 1 2
SNP 3    0 0 0 0 0 0 0 0    1 2 1 1 1 2 1 1
SNP 4    0 1 0 1 0 1 0 1    2 2 2 2 1 1 1 1
SNP 5    0 2 2 2 0 1 1 1    1 0 0 1 1 0 2 2
SNP 6    1 0 1 0 1 0 1 0    1 2 1 2 1 2 2 1

Genome-Wide Association Studies (III)
Definition: two SNPs present epistasis (interaction) if:
- their joint genotype frequencies show a statistically significant difference between cases and controls, which potentially explains the effect of the genetic variation leading to the disease; and
- the difference between cases and controls shown by the joint values is significantly larger than the one obtained using only the individual SNP values.

BOOST
BOolean Operation-based Screening and Testing
- Binary traits
- Exhaustive search
- Statistical regression
- Good accuracy (used by biologists)
- Returns a list of SNP pairs with high interaction probability
- Fastest available tool
Runtimes on an Intel Core i7 at 3.20 GHz:
- 40,000 SNPs and 3,200 individuals (about 800 million pairs): 51 minutes
- 500,000 SNPs and 5,000 individuals (about 125 billion pairs, a moderate size): estimated 7 days

GBOOST
CUDA version of BOOST for GPUs, with the same accuracy as BOOST. Runtimes on a GTX Titan:
- 40,000 SNPs and 6,400 individuals (about 800 million pairs): 28 seconds
- 500,000 SNPs and 5,000 individuals (about 125 billion pairs, a moderate size): 1 hour
High-throughput genotyping technologies collect a few million SNPs of an individual within a few minutes, so datasets with 5M SNPs and 10,000 individuals are expected.
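The pair counts quoted above follow from the number of unordered SNP pairs, N(N-1)/2, which is what an exhaustive search must analyze. A minimal C++ sketch of that arithmetic (the function name numPairs is illustrative, not taken from BOOST or GBOOST):

```cpp
#include <cstdint>

// Number of distinct unordered SNP pairs {a, b}, a != b, among n SNPs.
// n * (n - 1) is always even, so the division is exact; 64-bit
// integers avoid overflow for datasets with several million SNPs.
std::uint64_t numPairs(std::uint64_t n) {
    return n < 2 ? 0 : n * (n - 1) / 2;
}
```

For example, numPairs(40000) gives 799,980,000 (about 800 million), numPairs(500000) gives 124,999,750,000 (about 125 billion), and numPairs(5000000) gives 12,499,997,500,000, roughly 100 times more pairs than the moderate-size dataset.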

2. Intra-GPU Parallelization with CUDA

Calculation of Contingency Tables (I)
For each SNP pair, count the number of occurrences of each combination of genotypes. In n_ijk, i is the genotype of SNP1, j is the genotype of SNP2, and k is 0 for cases and 1 for controls.

Cases      SNP2=0  SNP2=1  SNP2=2
SNP1=0     n000    n010    n020
SNP1=1     n100    n110    n120
SNP1=2     n200    n210    n220

Controls   SNP2=0  SNP2=1  SNP2=2
SNP1=0     n001    n011    n021
SNP1=1     n101    n111    n121
SNP1=2     n201    n211    n221
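Each entry of these tables is a count over individuals. In the following notation, introduced here only for illustration, g_1(m) and g_2(m) are the genotypes of individual m at the two SNPs and y(m) is 0 for a case and 1 for a control; the nine case entries then add up to K and the nine control entries add up to C:

```latex
n_{ijk} = \left|\left\{\, m \in \{1,\dots,M\} \;:\; g_1(m) = i,\; g_2(m) = j,\; y(m) = k \,\right\}\right|,
\qquad i, j \in \{0,1,2\},\; k \in \{0,1\}

\sum_{i=0}^{2}\sum_{j=0}^{2} n_{ij0} = K, \qquad \sum_{i=0}^{2}\sum_{j=0}^{2} n_{ij1} = C, \qquad K + C = M
```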

Calculation of Contingency Tables (II)
Example with SNP 4 and SNP 6 from the genotype matrix (first 8 values: cases, last 8: controls):
SNP 4    0 1 0 1 0 1 0 1    2 2 2 2 1 1 1 1
SNP 6    1 0 1 0 1 0 1 0    1 2 1 2 1 2 2 1

Cases      SNP6=0  SNP6=1  SNP6=2
SNP4=0     0       4       0
SNP4=1     4       0       0
SNP4=2     0       0       0

Controls   SNP6=0  SNP6=1  SNP6=2
SNP4=0     0       0       0
SNP4=1     0       2       2
SNP4=2     0       2       2
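A straightforward way to build both tables is to scan all individuals once. The C++ sketch below is a simple reference version, not the optimized GPU code described in these slides; it assumes one genotype value (0, 1 or 2) per individual and that the first numCases individuals are the cases, as in the example. The names ContingencyTable and contingencyTable are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// table[k][i][j] holds n_ijk: k = 0 for cases, 1 for controls;
// i and j are the genotypes (0, 1, 2) of the first and second SNP.
using ContingencyTable = std::array<std::array<std::array<int, 3>, 3>, 2>;

// Assumed layout: individuals [0, numCases) are cases, the rest controls.
ContingencyTable contingencyTable(const std::vector<int>& snp1,
                                  const std::vector<int>& snp2,
                                  std::size_t numCases) {
    ContingencyTable table{};  // value-initialized: all counts start at 0
    for (std::size_t m = 0; m < snp1.size(); ++m) {
        const int k = (m < numCases) ? 0 : 1;
        ++table[k][snp1[m]][snp2[m]];
    }
    return table;
}
```

Called with the SNP 4 and SNP 6 rows and numCases = 8, it reproduces the counts of the example: 4 at (SNP4=0, SNP6=1) and (SNP4=1, SNP6=0) for cases, and 2 in each cell with SNP4 and SNP6 in {1, 2} for controls.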

Filtering Stage
- Epistatic interaction is measured via log-linear models
- All SNP pairs are analyzed
- The measure is obtained with numerical calculations from the values of the contingency table
- Pairs with a measure higher than a threshold pass the filter and are included in the output file
- multiEpistSearch uses a faster filter than GBOOST (out of the scope of this talk)
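The control flow of this stage (score every pair, keep those above the threshold) can be sketched in C++ as follows. The actual log-linear measure is not given in the slides, so it is passed in as a caller-supplied function; SnpPair, filterPairs and measure are hypothetical names, and the sketch is sequential rather than GPU code.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// A SNP pair that passed the filter, together with its interaction measure.
struct SnpPair {
    std::size_t first;
    std::size_t second;
    double measure;
};

// Exhaustive filtering sketch: every pair (a, b) with a < b is scored and
// kept only if its measure exceeds the threshold. `measure` is a stand-in
// for the log-linear measure computed from the pair's contingency table.
std::vector<SnpPair> filterPairs(
    std::size_t numSnps, double threshold,
    const std::function<double(std::size_t, std::size_t)>& measure) {
    std::vector<SnpPair> passed;
    for (std::size_t a = 0; a < numSnps; ++a) {
        for (std::size_t b = a + 1; b < numSnps; ++b) {
            const double value = measure(a, b);
            if (value > threshold) passed.push_back({a, b, value});
        }
    }
    return passed;
}
```

The returned pairs correspond to what would be written to the output file.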

CUDA Implementation
CUDA kernel:
- Genotyping information is loaded into device memory through pinned (page-locked) copies
- Each thread performs the whole calculation for independent SNP pairs
- Only one kernel for the whole computation
- Each call to the kernel analyzes a batch of SNP pairs
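The slides state that each thread processes whole SNP pairs and that each kernel call covers a batch, but not how a thread finds its pair. One common approach, assumed here, is to number the pairs (0,1), (0,2), ..., (N-2,N-1) and let thread t of a batch starting at index batchStart decode p = batchStart + t into (i, j). The decoding is shown as host C++ so it can be tested; in CUDA the same arithmetic could be marked __host__ __device__. rowOffset and pairFromIndex are hypothetical names.

```cpp
#include <cmath>
#include <cstdint>
#include <utility>

// Linear index of the first pair whose first SNP is i, in the order
// (0,1), (0,2), ..., (0,n-1), (1,2), ..., (n-2,n-1).
// i * (2n - i - 1) is always even, so the division is exact.
std::uint64_t rowOffset(std::uint64_t i, std::uint64_t n) {
    return i * (2 * n - i - 1) / 2;
}

// Maps a linear pair index p in [0, n(n-1)/2) to the SNP pair (i, j), i < j.
std::pair<std::uint64_t, std::uint64_t> pairFromIndex(std::uint64_t p,
                                                      std::uint64_t n) {
    // Closed form: largest i with rowOffset(i, n) <= p.
    const double b = 2.0 * static_cast<double>(n) - 1.0;
    auto i = static_cast<std::uint64_t>(
        (b - std::sqrt(b * b - 8.0 * static_cast<double>(p))) / 2.0);
    // Correct possible floating-point rounding of the closed form.
    while (i > 0 && rowOffset(i, n) > p) --i;
    while (i + 1 < n - 1 && rowOffset(i + 1, n) <= p) ++i;
    const std::uint64_t j = p - rowOffset(i, n) + i + 1;
    return {i, j};
}
```

With this mapping every thread needs only its own index and N, so the threads of a batch need no coordination to find their pairs.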

Optimization techniques:
- Boolean representation of genotyping information
- Increased coalescing of memory accesses
- Exploitation of shared memory
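The Boolean representation originates in BOOST: for every SNP, each genotype value is stored as a bit string over the individuals, so a contingency-table entry becomes a bitwise AND followed by a population count. A minimal C++ sketch for one group of individuals (cases or controls), assuming 64-bit words; on the GPU, std::bitset::count would typically be replaced by the __popcll intrinsic. BitSnp, toBits and jointCount are illustrative names, not those of multiEpistSearch.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit-packed genotypes of one SNP for one group of individuals:
// bit m of bits[g] is set if individual m has genotype g (0, 1 or 2).
struct BitSnp {
    std::array<std::vector<std::uint64_t>, 3> bits;
};

BitSnp toBits(const std::vector<int>& genotypes) {
    const std::size_t numWords = (genotypes.size() + 63) / 64;
    BitSnp snp;
    for (auto& plane : snp.bits) plane.assign(numWords, 0);
    for (std::size_t m = 0; m < genotypes.size(); ++m)
        snp.bits[genotypes[m]][m / 64] |= std::uint64_t{1} << (m % 64);
    return snp;
}

// Entry n_ij of the group's contingency table: the number of individuals
// with genotype i at the first SNP and genotype j at the second SNP.
int jointCount(const BitSnp& a, const BitSnp& b, int i, int j) {
    int count = 0;
    for (std::size_t w = 0; w < a.bits[i].size(); ++w)
        count += static_cast<int>(
            std::bitset<64>(a.bits[i][w] & b.bits[j][w]).count());
    return count;
}
```

Each 64-bit AND processes 64 individuals at once, which is why this layout reduces both memory traffic and arithmetic compared with one byte per genotype.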

3. Inter-GPU Parallelization with UPC++

UPC++ (I)
- Unified Parallel C++ (UPC++): a novel extension of ANSI C++
  Y. Zheng, A. Kamil, M. Driscoll, H. Shan, and K. Yelick. "UPC++: a PGAS Extension for C++". In Proc. 28th IEEE Intl. Parallel and Distributed Processing Symp. (IPDPS'14), Phoenix, AZ, USA, 2014.
- Follows the Partitioned Global Address Space (PGAS) programming model
- Single Program Multiple Data (SPMD) execution model
- Works on shared and distributed memory systems
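The slides do not detail how the SNP pairs are divided among the GPUs. Purely as an illustration of the SPMD style, every process could run the same code and select its own slice of the linear pair indices from its rank. The sketch below is plain C++ under that assumption (pairRangeForRank is a hypothetical name); in a program using the UPC++ v1.0 API, which postdates the 2014 version cited above, the rank and the number of processes would come from upcxx::rank_me() and upcxx::rank_n(). It is not claimed to be how multiEpistSearch distributes its work.

```cpp
#include <cstdint>
#include <utility>

// Static block distribution of the linear pair indices [0, totalPairs)
// over nRanks SPMD processes. Rank r gets the half-open range [begin, end);
// the first (totalPairs % nRanks) ranks receive one extra pair.
std::pair<std::uint64_t, std::uint64_t> pairRangeForRank(
    std::uint64_t totalPairs, std::uint64_t rank, std::uint64_t nRanks) {
    const std::uint64_t base = totalPairs / nRanks;
    const std::uint64_t extra = totalPairs % nRanks;
    const std::uint64_t begin = rank * base + (rank < extra ? rank : extra);
    const std::uint64_t end = begin + base + (rank < extra ? 1 : 0);
    return {begin, end};
}
```

For 10 pairs and 3 processes this yields [0, 4), [4, 7) and [7, 10); each process would then feed its range to its GPU in kernel-sized batches.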
