Fast Binding Site Mapping using GPUs and CUDA

SLIDE 1

Fast Binding Site Mapping using GPUs and CUDA

Bharat Sukhwani Martin C. Herbordt

Computer Architecture and Automated Design Laboratory
Department of Electrical and Computer Engineering, Boston University
http://www.bu.edu/caadlab

* This work supported, in part, by the U.S. NIH/NCRR

SLIDE 2

Why Bother?

Problem: Combat the bird flu virus
Method: Inhibit its function by “gumming up” neuraminidase, a surface protein, with an inhibitor

  • Neuraminidase helps release progeny viruses from the cell.

Procedure*:

  • Search protein surface for likely sites
  • Find a molecule that binds there (and only there)

* Landon et al., Chem. Biol. Drug Des., 2008

Image: New Scientist, www.newscientist.com/channel/health/bird-flu

Binding site mapping:

  • Very compute intensive: usually run on clusters
  • GPU-based desktop alternative

SLIDE 3

Outline

  • Overview of Binding Site Mapping
      - Rigid Docking
      - Energy Minimization
  • Overview of NVIDIA GPUs / CUDA
  • Rigid Docking on GPU
  • Energy Minimization on GPU
  • Results

SLIDE 4

Binding Site Mapping

Purpose: Identification of hot spots
Process: Docking small probes
  - Rigid docking
  - Energy minimization

Rationale:
  - Hot spots are major contributors to the binding energy
  - They bind a large variety of small molecules

Significance: Very effective for drug discovery

SLIDE 5

Mapping: Two-Step Process

1. Rigid docking of probes into the protein
  - Grid-based computation
  - Exhaustive 6D search
  - Finds an approximate conformation

2. Local refinement – energy minimization
  - Models the flexibility in the side chains

[Figure: probe poses showing a good fit, a collision, and a poor fit]

SLIDE 6

FTMap*

  • 16 small-molecule probes
  • Dock each probe into the protein
      - 500 rotations, ~10⁶ translations per rotation
      - ~30 minutes on a single CPU
  • Energy-minimize 2000 conformations per protein–probe complex
      - Up to 30 seconds per conformation: roughly 16 hours per probe!

* Brenke R, Kozakov D, Chuang G-Y, Beglov D, Mattos C, and Vajda S. Fragment-based identification of druggable "hot spots" of proteins using Fourier domain correlation, Bioinformatics.

SLIDE 7

Outline

  • Overview of Binding Site Mapping
      - Rigid Docking
      - Energy Minimization
  • Overview of NVIDIA GPUs / CUDA
  • Rigid Docking on GPU
  • Energy Minimization on GPU
  • Results

SLIDE 8

NVIDIA GPU Architecture

[Figure: NVIDIA Tesla C1060 architecture]
  • Streaming Processors (SPs) grouped into Streaming Multiprocessors (SMs)
  • 4 GB device memory

* Source: NVIDIA Corporation

SLIDE 9

Memory Hierarchy

[Figure: memory hierarchy]
  • CPU main memory → GPU device memory: ~3 GB/s
  • Device memory (on-board): ~100 GB/s
  • Shared memory, constant cache, registers (on-chip): ~1000 GB/s
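
To make the three levels concrete, here is a minimal CUDA sketch (illustrative names and sizes, not from the talk): data is copied from host memory to device memory once, and a kernel then stages values through on-chip shared memory.

```cuda
#include <cuda_runtime.h>

__global__ void stage_tile(const float *g_in, float *g_out, int n)
{
    __shared__ float tile[256];               // on-chip shared memory (~1000 GB/s)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        tile[threadIdx.x] = g_in[i];          // read from on-board device memory (~100 GB/s)
        g_out[i] = tile[threadIdx.x] * 2.0f;  // trivial work, just to touch each level
    }
}

int main()
{
    const int n = 1 << 20;
    float *h_in = new float[n]();
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    // Host -> device copy crosses the PCIe link (the ~3 GB/s hop above)
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    stage_tile<<<n / 256, 256>>>(d_in, d_out, n);
    cudaDeviceSynchronize();
    cudaFree(d_in); cudaFree(d_out); delete[] h_in;
    return 0;
}
```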

* Source: NVIDIA Corporation

SLIDE 10

CUDA Programming Model

[Figure: thread → block of threads → grid of blocks]
  • Different blocks must be independent
  • Threads within a block can be synchronized
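
A minimal sketch of this model (illustrative, not from the talk): the kernel runs as a grid of independent blocks; threads within one block share memory and can be synchronized with __syncthreads(), while different blocks cannot.

```cuda
__global__ void block_sum(const float *in, float *block_sums)
{
    __shared__ float partial[128];                     // visible only within this block
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;           // unique global thread index

    partial[tid] = in[idx];
    __syncthreads();                                   // legal: all threads of one block

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0) block_sums[blockIdx.x] = partial[0]; // blocks write independent results
}
// Launch as a grid of blocks: block_sum<<<numBlocks, 128>>>(d_in, d_sums);
```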

* Source: NVIDIA Corporation

SLIDE 11

Outline

  • Overview of Binding Site Mapping
      - Rigid Docking
      - Energy Minimization
  • Overview of NVIDIA GPUs / CUDA
  • Rigid Docking on GPU
  • Energy Minimization on GPU
  • Results

SLIDE 12

Rigid Docking: Procedure

[Figure: Protein + Probe → Rotation → Grid Assignment → 3D FFT Correlation → Pose Score → Scoring and Filtering]

SLIDE 13

PIPER Rigid Docking Program

  • From the Structural Bioinformatics Lab at BU
  • Complex energy functions
  • Top scorer in the CAPRI* challenge

Energy function:

$E = E_{shape} + w_2 E_{elec} + w_3 E_{desol}$
$E_{shape} = E_{attr} + w_1 E_{repul}$
$E_{elec} = E_{coulomb} + E_{born}$
$E_{desol} = \sum_{k=1}^{P} E_{k}^{pairpot}$

Up to 22 FFT correlations are required.

Performed once:
  • Read receptor and ligand files
  • Create receptor grids for the different energy functions
  • Read parameter, rotation, and coefficient files
  • Compute the FFT size
  • Perform (P + 4) forward FFTs
  • Compute the complex conjugates of the FFT grids

Repeated for each rotation:
  • Rotate the ligand grid by the next incremental angle
  • Create ligand grids for the different energy functions
  • For each of the (P + 4) grids: perform a forward FFT, modulate the transformed receptor and ligand grids, and perform an inverse FFT
  • Accumulate the pairwise potential product grids
  • Perform weighted scoring and filtering to find the best fit

Runtime profile: Rotation + Grid Assignment 2.4%, FFT Correlation 93%, Accumulation 2.3%, Scoring and Filtering 2.3%

* Janin, J., Henrick, K., Moult, J., Eyck, L., Sternberg, M., Vajda, S., Vakser, I., and Wodak, S. CAPRI: A critical assessment of predicted interactions. Proteins, 52 (2003), 2-9

SLIDE 14

Rigid Docking on GPUs - Correlation

[Figure: protein grid in global memory; probe grid duplicated in each SM's shared memory]

Direct correlation (better than FFT!)
  • For small grid sizes
  • Replaces the FFT, voxel–voxel summation, and IFFT

Each multiprocessor accesses both grids
  • Protein grid in global memory
  • Probe grid duplicated in shared memories

Multiple correlations together
  • Each voxel represents multiple energy functions
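
The slides describe the scheme but not the kernel; the sketch below is a minimal single-energy-function, single-rotation version with illustrative names (PROBE = 4 follows the typical probe size mentioned later). The probe grid is duplicated into each block's shared memory, the protein grid stays in global memory, and each thread produces the correlation score of one translation.

```cuda
#define PROBE 4   // typical probe grid edge length

__global__ void direct_corr(const float *protein, int N,   // N^3 protein grid (global memory)
                            const float *probe,            // PROBE^3 probe grid
                            float *score)                  // N^3 output scores
{
    __shared__ float s_probe[PROBE * PROBE * PROBE];
    // Duplicate the small probe grid into this block's shared memory.
    for (int i = threadIdx.x; i < PROBE * PROBE * PROBE; i += blockDim.x)
        s_probe[i] = probe[i];
    __syncthreads();

    int v = blockIdx.x * blockDim.x + threadIdx.x;          // one thread per translation
    if (v >= N * N * N) return;
    int x = v % N, y = (v / N) % N, z = v / (N * N);

    float sum = 0.0f;
    for (int dz = 0; dz < PROBE; dz++)
        for (int dy = 0; dy < PROBE; dy++)
            for (int dx = 0; dx < PROBE; dx++) {
                int px = (x + dx) % N, py = (y + dy) % N, pz = (z + dz) % N;  // wraps like circular (FFT) correlation
                sum += protein[(pz * N + py) * N + px] *
                       s_probe[(dz * PROBE + dy) * PROBE + dx];
            }
    score[v] = sum;
}
```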

SLIDE 15

Direct Correlation on GPUs

Multiple rotations together
  • 8 rotations: effectively loop unrolling
  • Multiple computations per global-memory fetch
  • 2.7x additional performance improvement

Shared memory limits the probe size
  • With 8 correlations: up to 8³ voxels per probe grid
  • Probe grids are typically 4³
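
Extending the previous sketch to the multi-rotation scheme (again illustrative, with hypothetical names): the probe grids of NROT = 8 rotations sit in shared memory, and every protein voxel fetched from global memory is reused for all eight partial sums.

```cuda
#define PROBE 4
#define NROT  8   // rotations processed together

__global__ void direct_corr_multi(const float *protein, int N,
                                  const float *probes,   // NROT contiguous PROBE^3 probe grids
                                  float *scores)         // NROT contiguous N^3 score grids
{
    __shared__ float s_probe[NROT][PROBE * PROBE * PROBE];
    for (int i = threadIdx.x; i < NROT * PROBE * PROBE * PROBE; i += blockDim.x)
        s_probe[i / (PROBE * PROBE * PROBE)][i % (PROBE * PROBE * PROBE)] = probes[i];
    __syncthreads();

    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= N * N * N) return;
    int x = v % N, y = (v / N) % N, z = v / (N * N);

    float sum[NROT] = {0.0f};
    for (int dz = 0; dz < PROBE; dz++)
        for (int dy = 0; dy < PROBE; dy++)
            for (int dx = 0; dx < PROBE; dx++) {
                int px = (x + dx) % N, py = (y + dy) % N, pz = (z + dz) % N;
                float p = protein[(pz * N + py) * N + px];   // one global-memory fetch ...
                #pragma unroll
                for (int r = 0; r < NROT; r++)               // ... reused for all NROT rotations
                    sum[r] += p * s_probe[r][(dz * PROBE + dy) * PROBE + dx];
            }
    for (int r = 0; r < NROT; r++)
        scores[r * N * N * N + v] = sum[r];
}
```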

[Figure: probe grids for several rotations replicated in each SM's shared memory; protein grid in global memory]

SLIDE 16

Direct Correlation on GPUs

Distribution of work among threads / blocks
  • Scheme 1: Entire 2D plane of the result grid to a thread block
  • Scheme 2: Part of the 2D plane to a thread block
  • Both yield similar results

[Figure: planes of the result grid distributed across SMs]

SLIDE 17

Scoring and Filtering on GPUs

Score Computation

  • Divide work among different threads
  • Sync and serialize to find the best-of-the-best
  • Only one multiprocessor utilized

[Figure: M threads reduce the N³ scores in shared memory to a single best score]
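
The per-block reduction is not shown on the slide; a minimal sketch (illustrative names, assuming lower scores are better since PIPER minimizes energy) looks like this, with a second pass or the host picking the best-of-the-best among the per-block winners:

```cuda
__global__ void best_score_block(const float *scores, int n,
                                 float *best_val, int *best_idx)
{
    __shared__ float s_val[256];
    __shared__ int   s_idx[256];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    s_val[tid] = (i < n) ? scores[i] : 1e30f;   // pad out-of-range threads with a huge score
    s_idx[tid] = i;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride && s_val[tid + stride] < s_val[tid]) {
            s_val[tid] = s_val[tid + stride];
            s_idx[tid] = s_idx[tid + stride];
        }
        __syncthreads();
    }
    if (tid == 0) {                              // one winner per block
        best_val[blockIdx.x] = s_val[0];
        best_idx[blockIdx.x] = s_idx[0];
    }
}
// Launch with 256 threads per block: best_score_block<<<(n + 255) / 256, 256>>>(...);
```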

Flagging for exclusion

  • Serial code – exclusion bit-vector
  • GPU Solution 1 – exclusion index array
  • GPU Solution 2 – exclusion bit-vector in GPU global memory

[Figure: N³-entry bit-vector vs. a compact index array (~100 entries) vs. an N³-entry bit-vector in global memory]

SLIDE 18

Outline

  • Overview of Binding Site Mapping
      - Rigid Docking
      - Energy Minimization
  • Overview of NVIDIA GPUs / CUDA
  • Rigid Docking on GPU
  • Energy Minimization on GPU
  • Results

SLIDE 19

Energy Minimization

  • Minimizing the energy between two molecules
  • Iterative process: energy evaluation, then optimization moves, repeated until convergence
  • Used to model flexible side chains
  • N-body problem with a cut-off

SLIDE 20

Looks like MD, but it’s not

Different geometry:
  • Performed on a local region
  • Many fewer atoms, typically a few thousand
  • Much smaller atom neighborhoods; very small cut-off radius
  • Neighbor lists are very sparse, with a non-uniform distribution

Different computations:
  • Coordinate adjustments only – no motion / velocity updates
  • Refinement step, already close to the destination, so motions are small
  • No cell lists / efficient filtering

SLIDE 21

Energy Minimization Step of FTMap

[Figure: minimization loop; the energy evaluation phase]

  • Absolute time: ~10 ms per iteration (on a single core)

SLIDE 22

FTMap Electrostatics Model

Analytic Continuum Electrostatics (ACE)

Atom self energy – electrostatic energy due to the charge itself:

$E_i^{self} = \frac{q_i^2}{2\,\varepsilon_s R_i} + \sum_{k \neq i} E_{ik}^{self}$

$E_{ik}^{self} = \tau\, q_i^2 \left[ \frac{e^{-r_{ik}^2/\sigma_{ik}^2}}{\omega_{ik}} + \frac{\tilde{V}_k}{8\pi} \left( \frac{r_{ik}^3}{r_{ik}^4 + \mu_{ik}^4} \right)^{4} \right]$

Pairwise interaction – electrostatic energy due to the presence of other charges (generalized Born equation; the Born radii $\alpha$ depend on $E^{self}$):

$E_{ij}^{int} = \frac{332\, q_i q_j}{r_{ij}} - \frac{166\, \tau\, q_i q_j}{\sqrt{r_{ij}^2 + \alpha_i \alpha_j\, e^{-r_{ij}^2 / (4\,\alpha_i \alpha_j)}}}$
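
As a worked illustration only: a device function evaluating the generalized Born pair term as reconstructed above. The constants (332 and 166 kcal·Å/(mol·e²)) and the exact functional form should be checked against the FTMap/ACE sources before reuse.

```cuda
// Illustrative sketch of the pairwise GB interaction shown above (not FTMap's actual code).
// qi, qj: charges (e); rij: distance (Angstrom); ai, aj: Born radii (Angstrom);
// tau = 1/eps_interior - 1/eps_solvent.
__device__ float gb_pair_energy(float qi, float qj, float rij,
                                float ai, float aj, float tau)
{
    float coulomb = 332.0f * qi * qj / rij;
    float f_gb    = sqrtf(rij * rij + ai * aj * expf(-rij * rij / (4.0f * ai * aj)));
    float screen  = 166.0f * tau * qi * qj / f_gb;
    return coulomb - screen;
}
```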

SLIDE 23

FTMap Data Structure - Neighbor Lists

[Figure: neighbor lists – for each “first atom”, a list of “second atoms”, plus a self-energy entry per atom]

  • Serial code cycles through the first atoms and updates the partial energies of both atoms in each pair
  • A second atom might appear in multiple lists
      - Random updates for the second atoms
      - Write conflicts; memory conflicts during updates
      - Serialization during accumulation
  • Can’t distribute the atoms list across multiprocessors
  • Not suitable for parallel implementations

SLIDE 24

Energy Minimization on GPU – Challenges

  • Little to no data reuse (within and across iterations)
  • Small computation per iteration
  • Multiple accumulations – the self energy of each atom must be computed
  • Large data transfer time
  • Random updates – write conflicts
  • Accumulation requires serialization

[Figure: the neighbor-list layout and the ACE self-energy / generalized Born equations from the preceding slides]

SLIDE 25

Neighbor Lists on GPUs

  • Separate energy arrays for first and second atoms
      - Allows parallel updates by multiple threads
  • Multiple copies of the second-atom arrays, one in each thread block
      - Parallel updates – no conflicts
  • First-atom arrays reduced to single values within shared memory
  • Second-atom arrays merged by moving them to global memory
      - Large copy and accumulation time
      - Slow

[Figure: per-block shared-memory arrays for first atoms and second atoms]

SLIDE 26

Modified Data Structure - Pairs List

  • 2D neighbor lists flattened into a 1D pair list
      - Each pair contains the atom indices and types
  • Compute partial energies in parallel
      - Distribute pairs across multiple threads
      - More uniform work distribution
  • Perform accumulations serially

[Figure: the neighbor lists rewritten as a flat table of (pair #, atom 1, atom 2, atom types)]
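
A minimal sketch of the pair-list layout and the per-pair kernel (hypothetical names; the placeholder energy stands in for the ACE/GB terms above):

```cuda
// Sketch: flat 1D pairs list, one thread per pair, one partial energy written per pair.
struct Pair {
    int   atom1, atom2;    // atom indices
    short type1, type2;    // atom types (indices into parameter tables)
};

__global__ void pair_partial_energy(const Pair *pairs, int n_pairs,
                                    const float4 *pos_q,   // per atom: x, y, z and charge in .w
                                    float *partial)        // one partial energy per pair
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= n_pairs) return;

    Pair   pr = pairs[p];
    float4 a  = pos_q[pr.atom1];
    float4 b  = pos_q[pr.atom2];
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    float r  = sqrtf(dx * dx + dy * dy + dz * dz);

    partial[p] = 332.0f * a.w * b.w / r;   // placeholder for the real energy terms
    // Per-atom energies are accumulated from 'partial' in a separate (serial or grouped) step.
}
```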

SLIDE 27

Pairs List on GPUs – Initial Attempts

  • Pairs distributed across different threads
      - The energy of an atom is computed on different multiprocessors
      - Serialization during accumulation
  • Accumulation on the GPU: from global memory – slow
  • Accumulation on the host: fast, but requires the energy arrays to be transferred every iteration
  • 2x–3x speedup

[Figure: pairs list with per-pair atom indices and per-atom self energies]

SLIDE 28

Pairs List on GPUs – Improved Scheme

Pairs list with two changes:

  • Conflicts due to the random occurrence of second atoms
      - Split into forward and reverse pair lists
      - Process only the first atom of each list
  • Indeterminate distribution requires serialization during accumulation
      - Statically map the pairs onto GPU threads
      - New data structure: assignment tables

[Figure: pairs list with per-pair atom indices and per-atom self energies]

SLIDE 29

Split Pairs List

[Figure: original (forward) pairs list]

  • Forward list: same as before
  • Reverse list: treat every second atom as a first atom
  • Process only the first atoms of each list
  • Adds determinism => better distribution

[Figure: reverse pairs list, re-keyed by the second atoms]
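
A small host-side sketch of the split, under the assumption that pairs are simple index pairs (the actual FTMap structures carry more fields): the forward list keeps each pair keyed by its first atom, the reverse list re-keys every pair by its second atom, and both are grouped by that key so each list is only ever accumulated through its own first-atom column.

```cuda
#include <algorithm>
#include <vector>

struct Pair { int atom1, atom2; };

// Build forward and reverse pair lists, each grouped (sorted) by its first atom.
void split_pairs(const std::vector<Pair> &pairs,
                 std::vector<Pair> &forward,
                 std::vector<Pair> &reverse)
{
    forward = pairs;                             // forward list: same as before
    reverse.clear();
    reverse.reserve(pairs.size());
    for (const Pair &p : pairs)                  // reverse list: swap the roles of the atoms
        reverse.push_back({p.atom2, p.atom1});

    auto by_first = [](const Pair &a, const Pair &b) { return a.atom1 < b.atom1; };
    std::sort(forward.begin(), forward.end(), by_first);
    std::sort(reverse.begin(), reverse.end(), by_first);
}
```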

SLIDE 30

Static Mapping - Assignment Table

  • Pairs can be grouped by first atom
  • Groups are mapped to different thread blocks
      - Look for the next block with enough free threads
  • One pair per thread (multiple if Npairs > Nthreads)
  • A reverse assignment table is built for the second atoms

[Figure: example assignment table – groups of pairs mapped to thread blocks and thread ids, with a master thread per group]

SLIDE 31

Computing and Accumulating Energies

  • Threads store partial energies in shared memory
  • Address = Local Thread Id

[Figure: master threads (e.g. Tid = 0, 5, 12) accumulate their groups and write the results to global memory]

  • The master thread performs the accumulation
      - Sums ‘N’ locations starting from its own thread id
  • Multiple parallel accumulations per thread block (from shared memory)

[Figure: per-block shared memory laid out as consecutive groups (Group 0, Group 1, Group 2), indexed 0 .. Num_Thr - 1]
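
A sketch of the accumulation step (illustrative names; assumes one pair per thread and that the assignment table above has been flattened into per-thread arrays): each thread writes its pair's partial energy into shared memory at its local thread id, and after a synchronization each master thread sums the consecutive entries of its group.

```cuda
__global__ void accumulate_groups(const float *pair_energy,   // per-pair partial energies
                                  const int   *first_atom,    // per-thread: the group's first atom
                                  const int   *group_len,     // per-thread: group length if master, else 0
                                  float *atom_energy)         // per-atom energies in global memory
{
    extern __shared__ float s_partial[];            // one slot per thread in the block
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    s_partial[tid] = pair_energy[gid];              // address = local thread id
    __syncthreads();

    int len = group_len[gid];
    if (len > 0) {                                  // master thread of a group
        float sum = 0.0f;
        for (int i = 0; i < len; i++)               // 'len' consecutive locations from its own id
            sum += s_partial[tid + i];
        atom_energy[first_atom[gid]] += sum;        // one writer per group, so no conflicts
    }
}
// Launch with dynamic shared memory sized to the block:
// accumulate_groups<<<blocks, threads, threads * sizeof(float)>>>(...);
```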

SLIDE 32

Outline

  • Overview of Binding Site Mapping
      - Rigid Docking
      - Energy Minimization
  • Overview of NVIDIA GPUs / CUDA
  • Rigid Docking on GPUs – PIPER
  • Energy Minimization on GPUs – FTMap
  • Results

SLIDE 33

Results - Speedups

Speedups for the Rigid Docking Step

| Computation (per rotation)  | Serial runtime (ms) | GPU runtime (ms) | Speedup vs. 1 core | Speedup vs. 4 cores* |
|-----------------------------|---------------------|------------------|--------------------|----------------------|
| Rotation + grid assignment  | 80                  | 80               | 1x                 |                      |
| Correlations                | 3600                | 13.5             | 267x               | 70x                  |
| Accum. of desolvation terms | 180                 | 1                | 180x               |                      |
| Scoring and filtering       | 200                 | 30               | 6.67x              |                      |
| Total time per rotation     | 4060                | 125.5            | 32.6x              | 11x                  |

SLIDE 34

Results - Speedups

Speedups for the Energy Minimization Step (2260 atoms, 9780 atom-pairs)

| Computation          | Serial runtime (ms) | GPU runtime* (ms) | Speedup |
|----------------------|---------------------|-------------------|---------|
| Self energy          | 6.15                | 0.23              | 26.7x   |
| Pairwise interaction | 2.75                | 0.19              | 17x     |
| van der Waals        | 0.5                 |                   |         |
| Force updates        | 0.95                | 0.14              | 6.7x    |
| Optimization move    | 0.005               | 0.005             | 1x      |

* GPU runtimes include data transfer time

Overall speedup on EM computations: 18.5x
Overall FTMap speedup (including overhead): 15x

SLIDE 35

Results – Precision Analysis

Single vs. Double Precision

  • RMSD error on force values for the first iteration: 10⁻⁶
  • Convergence in 50 iterations (as opposed to 600)
  • Error on final energy and force values
      - Energy: 10⁻³
      - Forces: 10⁻⁵
  • Error on atom coordinates after minimization: 0.5 Å

Exact match for double precision:
  • Atom coordinates within 10⁻⁵ Å
  • More complex mapping on the GPU – similar speedup numbers

SLIDE 36

Conclusion

GPUs can deliver high performance
  • Even for double-precision computations

To obtain good performance:
  • Alternate algorithms are often needed
  • Restructuring of data structures is crucial
  • Efficient use of the memory hierarchy is essential

Getting it right on the GPU is easy …
… getting good performance is not!

SLIDE 37

Thank You!