Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs


Nov 07, 2022


SLIDE 1

Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs

Bertil Schmidt, Christian Hundt

SLIDE 2

Contents

Gene Set Enrichment Analysis (GSEA)

– Background
– Algorithmic details

cudaGSEA
Performance evaluation

SLIDE 3

GSEA and Bioinformatics

High-throughput technologies generate large-scale gene expression data sets

– RNA-Seq
– Microarrays

GSEA uses annotated gene sets to mine a given gene expression matrix

– MSigDB contains over 10K signatures each containing around 100 gene identifiers on average

Typical GSEA study:

– identify metabolic pathways that are differentially changed in human type-2 diabetes

SLIDE 4

Gene Set Enrichment Analysis

Reveals correlation between gene sets and diseases using gene expression data

State-of-the-art tool with over 10,000 citations

Written in (multi-threaded) Java
Highly time consuming

– analyzing 20,639 genes measured in 200 patients with 4,726 pathways and 1M permutations takes around 1 week with GSEA 2.2.2 software on a CPU

We present

– GSEA parallelization on a GPU using CUDA (cudaGSEA)
– cudaGSEA is around two orders of magnitude faster than BroadGSEA

SLIDE 5

GSEA Algorithm – Gene Ranking

Gene expression matrix D obtained from RNA-Seq or microarray experiments
For each gene i and patient j, the expression value D[i,j] is stored; each patient has an associated (binary) phenotype C

Diseases are driven by complex gene interactions → simply reporting top-ranked genes produces many false positives

Domain experts provide sets of genes that might explain the observed phenotypes

SLIDE 6

GSEA Algorithm – Enrichment score

Enrichment score (ES) measures the correlation between a given gene set S and the calculated gene ranking g(i)

– Report the maximum deviation of a running sum (k)
– Sum increases if we hit a member of S and decreases otherwise

How significant is ES = 0.857? → p-value calculation using permutation testing
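
The running-sum idea on this slide can be sketched in a few lines of Python. This is a minimal, unweighted sketch and not the cudaGSEA kernel: the slides do not give the step sizes, so it assumes each hit adds 1/(number of hits) and each miss subtracts 1/(number of misses). The function name `enrichment_score` is hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Return the maximum deviation from zero of a running sum.

    ranked_genes: gene identifiers ordered by the ranking g(i).
    gene_set:     the gene set S.
    Assumption: unweighted steps (+1/n_hits on a hit, -1/n_miss on a miss).
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(hits) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must hit some, but not all, ranked genes")

    running = 0.0
    best = 0.0  # signed value with the largest absolute deviation so far
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
es_top = enrichment_score(ranking, {"g1", "g2"})     # set sits at the top
es_bottom = enrichment_score(ranking, {"g5", "g6"})  # set sits at the bottom
```

A gene set concentrated at the top of the ranking yields a positive score, one concentrated at the bottom a negative score.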

SLIDE 7

GSEA Algorithm – Permutation testing

SLIDE 8

GSEA Algorithm – Permutation testing

SLIDE 9

GSEA Algorithm

[Figure: histogram of 1,000,000 enrichment scores gained by permuting patient phenotypes; axis label: |ES|]

Estimate p-value by counting events in both tails
Why so many permutations?

– When testing 1,000 gene sets at significance level 0.001, the Bonferroni-corrected condition 1,000·p < 0.001 requires p < 0.000001 for each gene set, so more than 1,000,000 permutations are needed to reject the null hypothesis
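
The counting argument above can be checked with a small Python sketch. It assumes the two-tailed p-value is the plain fraction of permuted scores at least as extreme as the observed one (no add-one smoothing), and that the smallest nonzero p-value resolvable with n permutations is 1/n. Both function names are hypothetical.

```python
import math
from fractions import Fraction


def empirical_p_value(observed_es, null_scores):
    """Two-tailed p-value: fraction of permuted scores with |ES| >= |observed|."""
    if not null_scores:
        raise ValueError("need at least one permuted score")
    extreme = sum(1 for es in null_scores if abs(es) >= abs(observed_es))
    return extreme / len(null_scores)


def min_permutations(n_gene_sets, alpha):
    """Smallest n such that 1/n lies below the Bonferroni threshold alpha/n_gene_sets."""
    # Exact rational arithmetic avoids floating-point rounding at the boundary.
    threshold = Fraction(str(alpha)) / n_gene_sets
    return math.floor(1 / threshold) + 1


# Toy null distribution; real runs use up to 1,000,000 permutations.
p = empirical_p_value(0.857, [-0.9, -0.5, 0.1, 0.3, 0.95])
needed = min_permutations(1000, 0.001)
```

For 1,000 gene sets at level 0.001 this gives a requirement just above 1,000,000 permutations, matching the slide.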

SLIDE 10

CUDA Parallelization

Transpose D to ensure coalesced memory accesses

SLIDE 11

CUDA Parallelization

SLIDE 12

CUDA Parallelization

SLIDE 13

CUDA Implementation Details

Support for single-precision and double-precision
Resulting matrix of enrichment scores (#gene sets x #permutations) can be large

– e.g. 5K x 1M x 8B = 40GB

p-value estimation, family-wise error rate (FWER), and normalized enrichment score (NES) computation can be accomplished on the GPU with (sum/max) reduction kernels without the need for storing this matrix

For false discovery rate (FDR) computation, this matrix is transferred to the CPU for post-processing
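
To illustrate why reductions avoid storing the full matrix, here is a CPU-side Python analogue, not the CUDA kernels themselves. It assumes the permuted scores arrive in batches and that only one running count per gene set is kept (a sum reduction). The names `matrix_bytes` and `streaming_p_values` are hypothetical.

```python
def matrix_bytes(n_gene_sets, n_permutations, bytes_per_value):
    """Memory needed to store every enrichment score explicitly."""
    return n_gene_sets * n_permutations * bytes_per_value


def streaming_p_values(observed, batches):
    """Accumulate per-gene-set tail counts batch by batch.

    observed: one observed ES per gene set.
    batches:  iterable of batches; each batch is a list of permutations,
              each permutation a list with one permuted ES per gene set.
    Only len(observed) counters are kept, never the whole score matrix.
    """
    counts = [0] * len(observed)
    total = 0
    for batch in batches:
        for permuted in batch:
            total += 1
            for s, (obs, es) in enumerate(zip(observed, permuted)):
                if abs(es) >= abs(obs):
                    counts[s] += 1
    if total == 0:
        raise ValueError("no permutations supplied")
    return [c / total for c in counts]


full_size = matrix_bytes(5_000, 1_000_000, 8)  # the 5K x 1M x 8B example
p_values = streaming_p_values(
    [0.8, 0.2],
    [[[0.9, 0.1], [0.5, 0.3]], [[-0.85, 0.0], [0.1, -0.25]]],
)
```

The memory held by the accumulator grows with the number of gene sets only, independent of the number of permutations.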

SLIDE 14

cudaGSEA Features

Reading data sets directly in Broad Institute-compatible file formats

Supporting several local deviation measures

– Mean-based measures (difference/quotient/log-quotient of means)
– Mean and standard deviation-based measures (signal-to-noise ratio, t-tests, one/two-pass estimation)
– Numerically stable summation schemes for local measures and ES (Kahan etc.)

Package for the R framework and standalone application
Multi-threaded CPU version in C++ using OpenMP
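
One of the local deviation measures listed above, combined with Kahan summation, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the cudaGSEA code: it uses the two-pass sample standard deviation (n - 1 denominator), labels 0 and 1 for the two phenotypes, and the form (mean_0 - mean_1) / (std_0 + std_1) without any adjustment of small standard deviations. Both function names are hypothetical.

```python
import math


def kahan_sum(values):
    """Compensated summation: carries the rounding error of each addition."""
    total = 0.0
    compensation = 0.0
    for value in values:
        corrected = value - compensation
        new_total = total + corrected
        # (new_total - total) is what was actually added; the rest was lost.
        compensation = (new_total - total) - corrected
        total = new_total
    return total


def signal_to_noise(expression_row, labels):
    """Signal-to-noise ratio of one gene across two phenotype classes."""
    def mean_and_std(xs):
        if len(xs) < 2:
            raise ValueError("each class needs at least two samples")
        mean = kahan_sum(xs) / len(xs)                      # first pass
        squares = kahan_sum([(x - mean) ** 2 for x in xs])  # second pass
        return mean, math.sqrt(squares / (len(xs) - 1))

    class0 = [x for x, c in zip(expression_row, labels) if c == 0]
    class1 = [x for x, c in zip(expression_row, labels) if c == 1]
    mean0, std0 = mean_and_std(class0)
    mean1, std1 = mean_and_std(class1)
    if std0 + std1 == 0.0:
        raise ValueError("zero variance in both classes")
    return (mean0 - mean1) / (std0 + std1)


score = signal_to_noise([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0, 0, 0, 1, 1, 1])
```

Sorting all genes by such a score gives a ranking of the kind the enrichment score operates on.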

SLIDE 15

Performance Evaluation

GSE19429 dataset

– collapsed to 20,639 gene symbols; 200 patients (183 cases + 17 controls)

Hallmark: 50 gene sets

– MSigDB 5.1 smallest gene set collection

GeForce Titan X (single precision) / Tesla K40c (double precision, ECC off), CUDA 7.5
10-core Xeon E5-2660v3@2.60GHz, 20 threads, Ubuntu 14.04, gcc 4.8.4, 64-bit OpenJDK
BroadGSEA v.2.2.2

SLIDE 16

Performance Evaluation

GSE19429 dataset

– collapsed to 20,639 gene symbols; 200 patients (183 cases + 17 controls)

C2: 4,726 gene sets

– MSigDB 5.1 largest gene set collection

GeForce Titan X (single precision) / Tesla K40c (double precision, ECC off), CUDA 7.5
10-core Xeon E5-2660v3@2.60GHz, 20 threads, Ubuntu 14.04, gcc 4.8.4, 64-bit OpenJDK
BroadGSEA v.2.2.2

SLIDE 17

Conclusion

High-throughput technologies establish the need for scalable bioinformatics tools that can process large-scale gene expression data sets

CUDA is a suitable technology to address this need
cudaGSEA on one GPU achieves around two orders of magnitude speedup versus BroadGSEA on a CPU

– analyzing 20,639 genes measured in 200 patients with 4,726 pathways and 1M permutations takes around 1 week with GSEA 2.2.2 on a Xeon E5-2660v3 CPU but less than 1 hour on a GeForce Titan X

Source code available at:

– https://github.com/gravitino/cudaGSEA

Group Website:

– https://www.hpc.informatik.uni-mainz.de/

SLIDE 18

Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs

Bertil Schmidt, Christian Hundt
Institute of Computer Science
Johannes Gutenberg University Mainz
{bertil.schmidt, hundt}@uni-mainz.de

Thank you!