GPU Accelerated Virtual Cell Biology and SIMD Enhanced High - PowerPoint PPT Presentation

GPU Accelerated Virtual Cell Biology and SIMD Enhanced High Throughput Computational Biology
Narayan Ganesan, Assistant Professor, Department of Electrical and Computer Engineering
Hanyu Jiang, Ph.D. student and research assistant, Department of Electrical and Computer Engineering

PART I: Agent Based Virtual Cell Biology

Advantages of Process Simulation in 3D Space
• Serves as a computational microscope into the behavior of the cell.
• Helps observe behavioral patterns, such as expected time to DNA transcription and induced variance in protein translation and decay.
• Helps model and study noise in biological processes.
• Helps study cross talk between several large and complex pathways.

Computational Tasks Involved
• Each particle maintains its own identity and attributes.
• The list of interactions between the different chemical species is specified at the start of the simulation.
• The particles diffuse independently, at the given rate of diffusion and with the given variance in their velocities (a random walk in 3D space).
• When two particles that can react with each other come within each other's vicinity (the radius of reaction), the scheduler schedules a reaction between them and marks the particles as inactive.
• The reaction is then executed: new particles (the products of the reaction) are added to the system, along with their new identity and attributes.
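The diffusion and vicinity check described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a Gaussian displacement per time step and a brute-force pair search, and the names `diffuse`, `pairs_within_radius` and `sigma` are hypothetical.

```python
import math
import random

def diffuse(positions, sigma, rng):
    """Move every particle by an independent Gaussian step in x, y and z."""
    return [
        (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma), z + rng.gauss(0.0, sigma))
        for (x, y, z) in positions
    ]

def pairs_within_radius(positions, radius):
    """Return index pairs (i, j), i < j, that are closer than the reaction radius."""
    close = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < radius:
                close.append((i, j))
    return close

rng = random.Random(0)  # fixed seed so the run is repeatable
particles = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (10.0, 10.0, 10.0)]
moved = diffuse(particles, sigma=0.1, rng=rng)
print(pairs_within_radius(particles, radius=1.0))  # [(0, 1)]
```

A real simulation would replace the quadratic pair search with a spatial grid or neighbor list, and would also check that the two species can actually react.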

Computational Challenge: Parallel Selection
• Each thread is assigned to a particle along with its identity and attributes.
• Thus each particle is an independent and autonomous agent within the 3D space.
• The feasible list returns the set of neighboring particles that the particle can react with, based on all the reactions within the system.
[Figure: example reactions. The feasible list generated will include particles feasible for all reactions.]

Computational Challenge: Parallel Selection
[Figure: inconsistent reaction selection vs. consistent reaction selection; feasible list generation within a computational workgroup of threads.]

Algorithm For Consistent Parallel Selection
1) Build the feasibleList for each particle, which is a subset of neighborList and contains the set of particles capable of reacting with the current particle.
2) Sort the feasibleList by Euclidean distance in order to set the reaction priority.
3) Each particle selects the first available particle in its sorted feasibleList for reaction.
4) If the selection is mutual, schedule the corresponding reaction in the reaction pipeline and mark the particle as not available for any more selections; else mark the particle as still available for selection by other particles.
5) Repeat steps 3) and 4) until converged or until no more available particles remain in the feasibleList.
The algorithm converges within 6 iterations.
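The mutual-selection loop can be sketched sequentially as below. On the GPU each particle would be handled by its own thread; here a plain loop stands in for the threads, and all choices in one iteration are made against the same snapshot of availability. The function name `consistent_selection` and the default cap of 6 iterations (taken from the convergence remark above) are assumptions for illustration.

```python
def consistent_selection(feasible, max_iters=6):
    """Pair up particles by mutual first choice.

    feasible[p] is p's feasibleList, already sorted nearest first.
    Returns the scheduled pairs (p, q) with p < q.
    """
    available = set(feasible)
    scheduled = []
    for _ in range(max_iters):
        # Step 3: every available particle picks its first available partner.
        choice = {}
        for p in available:
            for q in feasible[p]:
                if q in available:
                    choice[p] = q
                    break
        # Step 4: keep only the mutual selections.
        mutual = sorted(
            (p, q) for p, q in choice.items() if p < q and choice.get(q) == p
        )
        if not mutual:
            break  # nothing changed, so further iterations would not either
        for p, q in mutual:
            scheduled.append((p, q))
            available.discard(p)
            available.discard(q)
    return scheduled

# Particle 0 is closest to 1; particle 2 prefers 0 but falls back to 3.
feasible = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
print(consistent_selection(feasible))  # [(0, 1), (2, 3)]
```

Because a pair is scheduled only when both sides chose each other, no particle can end up in two reactions, which is the inconsistency the algorithm is meant to avoid.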

Example: JAK-STAT Signaling Mechanism
1. JAK binds to the IFN-γ receptor and forms the IFNR-JAK complex (RJ).
2. IFN-γ binds to the extracellular domain of the RJ complex and forms the IFNRJ complex.
3. Dimerization of IFNRJ leads to the formation of IFNRJ2.
4. IFNRJ2 is phosphorylated and IFNRJ2* is formed.
5. STAT1c binds to IFNRJ2* and is phosphorylated (STAT1c*).
6. Phosphorylated STAT1c (STAT1c*) forms a homo-dimer (STAT1c*-STAT1c*).
7. The homo-dimer (STAT1c*-STAT1c*) is translocated to the nucleus (STAT1n*-STAT1n*).
8. STAT1n*-STAT1n* works as a transcription factor.
9. SOCS1 is induced by the JAK/STAT pathway.
10. SOCS1 binds to the activated receptor (IFNRJ2*) and inhibits its activity.

ODEs for JAK-STAT Signaling Pathway

Computing Framework – Input Config. File
#----------------------------------------------------------------
Regions
# Regionid, x_orig, y_orig, z_orig, x_length, y_length, z_length
#----------------------------------------------------------------
4 0.0 0.0 57.0 60.0 60.0 3.0 #extracellular medium
3 0.0 0.0 54.0 60.0 60.0 3.0 #cellplasma membrane
2 0.0 0.0 8.0 60.0 60.0 46.0 #cytoplasm
1 0.0 0.0 5.0 60.0 60.0 3.0 #nuclear membrane
0 0.0 0.0 0.0 60.0 60.0 5.0 #nucleus
# all concentrations are in nM/L. 1nM/L = 602.3*VOL*conc parts in Cell, VOL = 3.3 ncc.
#----------------------------------------
Reagents
# Reagent inertia, init_cond, region_id
#----------------------------------------
R, 0.5, 12.0
JAK, 0.5, 12.0
RJ, 0.5, 0.0
IFN, 0.5, 15.0
IFNRJ, 1.0, 0.0
IFNRJ2, 1.0, 0.0
IFNRJ2x, 1.0, 0.0
STAT1c, 1.0, 300.0
#-----------------------------------------------
Reactions
# reaction, forward_rate, reverse_rate
#-----------------------------------------------
IFNRJ2 = IFNRJ2x, 0.005, 0.0,
IFNRJ2x + STAT1c = IFNRJ2x-STAT1c, 1.0, 0.1,
IFNRJ2x-STAT1c = IFNRJ2x + STAT1cx, 0.4, 0.0,
IFNRJ2x + STAT1cx = IFNRJ2x-STAT1cx, 1.0, 0.1,
STAT1cx + STAT1cx = STAT1cx2, 1.0, 0.005,
….
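The concentration comment in the config file gives a rule for turning an initial concentration into a particle count. A small sketch of that rule, together with a parser for one reagent line, might look like this. The helper names `particle_count` and `parse_reagent` are hypothetical, and the three-field split assumes reagent lines look like the ones shown in the file. The initial counts in the sample output (for example 24140 particles for 12.0 nM) correspond to a volume nearer 3.34 than the stated 3.3, so `vol` is left as a parameter.

```python
def particle_count(conc_nM, vol=3.3):
    """Particles for a concentration in nM, using the rule 602.3 * VOL * conc."""
    return round(602.3 * vol * conc_nM)

def parse_reagent(line):
    """Split a reagent line 'name, inertia, init_cond' into typed fields."""
    name, inertia, init_cond = [field.strip() for field in line.split(",")]
    return name, float(inertia), float(init_cond)

name, inertia, init_cond = parse_reagent("STAT1c, 1.0, 300.0")
print(name, inertia, particle_count(init_cond))
```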

Process Simulation Framework: Workflow
Input: configuration; time to simulate.
Output: 3D trajectory and snapshots of particles within the biological cell; particle concentrations.
Sample Output:
Step  R      JAK    RJ   IFN    IFNRJ  IFNRJ2  IFNRJ2x  STAT1c
1     2      3      4    5      6      7       8        9
0     24140  24140  0    60350  0      0       0        603504
. . .
100   15913  15913  153  52276  584    3649    15       603372
. . .
200   8855   8855   69   45134  666    6928    28       602761
. . .
300   5963   5963   25   42198  768    7967    69       601335
. . .
400   4485   4485   25   40720  700    8351    72       599184
. . .

GPU Enabled Virtual Cell Biology Simulation The particle concentration is output as a function of time.

Performance and Scalability
[Figure: weak scaling w.r.t. number of processors; strong linear scaling w.r.t. number of agents.]

PART II: SIMD Enhanced Protein Motif Detection

Hidden Markov Model and hmmsearch of HMMER
Each sample path follows a set of predefined transition probabilities between the states.

HMM Model & Sequence Database
[Figure: HMM model; protein sequence database.]

Dependencies and Computational Hotspot
Here i indexes the residues of the sequence and j the positions of the model.
• Match states:
$$V_M(i, j) = \varepsilon(R_i, M_j) + \max \{ V_M(i-1, j-1) + T_{MM}(j-1, j),\; V_I(i-1, j-1) + T_{IM}(j-1, j),\; V_D(i-1, j-1) + T_{DM}(j-1, j),\; B + T_{BM}(M_j) \}$$
• Insert states:
$$V_I(i, j) = \max \{ V_M(i-1, j) + T_{MI}(j, j),\; V_I(i-1, j) + T_{II}(j, j) \}$$
• Delete states:
$$V_D(i, j) = \max \{ V_M(i, j-1) + T_{MD}(j-1, j),\; V_D(i, j-1) + T_{DD}(j-1, j) \}$$
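The match, insert and delete recurrences can be written out as a plain, unoptimized reference loop. This is a sketch under simplifying assumptions that are not in the slides: transition scores are position-independent, the begin score is a constant, and the final score is taken as the best match cell anywhere in the matrix. The name `viterbi_score` and the toy model are illustrative only.

```python
NEG_INF = float("-inf")

def viterbi_score(seq, emit, t, begin=0.0):
    """Best Viterbi score of `seq` against a profile with len(emit) match states.

    emit[j-1][residue] is the match emission score at model position j.
    t maps 'MM', 'IM', 'DM', 'BM', 'MI', 'II', 'MD', 'DD' to transition scores.
    """
    n, m = len(seq), len(emit)
    vm = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    vi = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    vd = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = NEG_INF
    for i in range(1, n + 1):        # rows: residues, processed in order
        for j in range(1, m + 1):    # columns: model positions
            vm[i][j] = emit[j - 1][seq[i - 1]] + max(
                vm[i - 1][j - 1] + t["MM"],
                vi[i - 1][j - 1] + t["IM"],
                vd[i - 1][j - 1] + t["DM"],
                begin + t["BM"],
            )
            vi[i][j] = max(vm[i - 1][j] + t["MI"], vi[i - 1][j] + t["II"])
            # Delete reads the current row: this is the serial chain along j.
            vd[i][j] = max(vm[i][j - 1] + t["MD"], vd[i][j - 1] + t["DD"])
            best = max(best, vm[i][j])
    return best

emit = [{"A": 2.0, "C": -1.0}, {"A": -1.0, "C": 2.0}]
t = {"MM": 0.0, "IM": -1.0, "DM": -1.0, "BM": 0.0,
     "MI": -2.0, "II": -1.0, "MD": -2.0, "DD": -1.0}
print(viterbi_score("AC", emit, t))  # 4.0
```

Match and insert cells read only the previous row, so a whole row of them can be computed in parallel; the delete cells depend on their left neighbor in the same row, which is the part that resists vectorization.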

What the computational kernel looks like…
[Figure: DP matrix with the HMM states (Match, Insert, Delete) for model positions 1 to M along one axis and the residues 1 to N of one sequence along the other.]
• MSV needs the Match score and X_E of the previous row.
• Viterbi needs the adjacent Delete score in the current row.
• The dependence on X_E imposes a row-major order of computation.
• Maximum probability that the sequence was generated by the model: O(M×N).

Multi-tiered Parallel Framework for Acceleration

Detail #1: Synchronize-free Execution
[Figure: warps #1, #2 and #3 each pull sequences from the sequence database; a warp that is done takes the next one.]
• One warp picks up one sequence.
• Once done, it moves on to the next; scheduling is automatic.
• Eliminates block-scoped __syncthreads() caused by:
  – intra-state dependency of the HMM model
  – unbalanced sequence data
• Keeps threads active.
• High throughput.
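The effect of letting each warp fetch its next sequence as soon as it finishes can be illustrated with a small scheduling model. This is a CPU-side sketch, not GPU code: it assumes the cost of a sequence is proportional to its length, and the function names are hypothetical. On a GPU the shared hand-out would typically be an atomic counter, although the slides do not say how the authors implement it.

```python
import heapq

def dynamic_makespan(lengths, n_warps):
    """Each warp takes the next unprocessed sequence as soon as it is free."""
    finish = [0] * n_warps          # time at which each warp becomes free
    heapq.heapify(finish)
    for length in lengths:          # sequences are handed out in database order
        start = heapq.heappop(finish)
        heapq.heappush(finish, start + length)
    return max(finish)

def static_makespan(lengths, n_warps):
    """Sequences are pre-assigned round-robin; everyone waits for the slowest warp."""
    return max(sum(lengths[w::n_warps]) for w in range(n_warps))

lengths = [900, 100, 100, 100, 100, 100, 100, 100]  # one long, seven short
print(static_makespan(lengths, 2), dynamic_makespan(lengths, 2))  # 1200 900
```

With unbalanced sequence lengths the static assignment leaves one warp idle while the other works through the long sequence plus its share of short ones; the dynamic hand-out keeps both busy.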

Detail #2: Striped Layout vs. Sequential Layout
Sequential Layout
• Straightforward
• Private data dependence across adjacent threads
• More sequential overhead and thread idling
Striped Layout
• Only one reordering request per DP row
• All parallel execution
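The claim of one reordering request per DP row follows from how a striped layout places model positions. The sketch below assumes the usual striping convention, in which vector q holds positions q, q+Q, q+2Q and so on, with Q the number of vectors per row; the slides do not spell out the exact mapping, and the function names are illustrative.

```python
def vectors_per_row(m, lanes):
    """Number of vectors needed for m model positions (ceiling division)."""
    return -(-m // lanes)

def striped_index(j, m, lanes):
    """Map model position j (0-based) to (vector, lane)."""
    q = vectors_per_row(m, lanes)
    return j % q, j // q

def predecessor(vector, lane, m, lanes):
    """Location of position j-1, given the location of position j."""
    if vector > 0:
        return vector - 1, lane      # same lane: no exchange between threads
    if lane == 0:
        return None                  # position 0 has no predecessor
    return vectors_per_row(m, lanes) - 1, lane - 1   # wrap: one lane shift

m, lanes = 8, 4
for j in range(m):
    print(j, striped_index(j, m, lanes))
```

Only vector 0 needs data from a neighboring lane, so a single shifted copy of the last vector of the previous row satisfies every diagonal dependency in the row. In a sequential layout every vector would need its neighbor's boundary value.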

Detail #3: PTX assembly for Reordering
Reorder 128 scores within one warp:
• Shifting
• Exchange (intra-warp shuffle)
• Merge
• Ready to go next!
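One way to read the three steps is as a shift of 128 packed 8-bit scores by one position across a warp of 32 threads, four scores per 32-bit register. The simulation below is only an interpretation of the slide: the byte packing, the fill value and the function names are assumptions, and the actual PTX sequence is not shown in the slides.

```python
def pack(scores):
    """Pack 8-bit scores four to a register, lowest position in the low byte."""
    return [
        scores[i] | scores[i + 1] << 8 | scores[i + 2] << 16 | scores[i + 3] << 24
        for i in range(0, len(scores), 4)
    ]

def unpack(regs):
    """Inverse of pack: list the four bytes of every register in order."""
    return [(r >> (8 * b)) & 0xFF for r in regs for b in range(4)]

def shift_scores(regs, fill=0):
    """Move every packed score up by one position across the whole warp."""
    shifted = [(r << 8) & 0xFFFFFFFF for r in regs]    # shifting, inside each lane
    carry = [r >> 24 for r in regs]                    # byte pushed out of each lane
    received = [fill] + carry[:-1]                     # exchange: lane k reads lane k-1
    return [s | c for s, c in zip(shifted, received)]  # merge

scores = list(range(1, 129))             # 128 scores for 32 lanes
result = unpack(shift_scores(pack(scores)))
print(result[:6])  # [0, 1, 2, 3, 4, 5]
```

On hardware the exchange step would map to a warp shuffle that reads from the lane below, while the shift and merge are ordinary register operations.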

Detail #4: PTX assembly for Max-Reduction
Max-reduction:
• SIMD max
• Intra-warp shuffle
• Broadcast
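The reduction pattern named on the slide can be simulated on the CPU as follows. The sketch assumes each lane holds four packed scores, that the SIMD max is an element-wise maximum of two such registers, and that the shuffle reads from the lane `offset` positions higher; these details and the name `warp_max` are assumptions rather than the authors' PTX.

```python
def warp_max(regs):
    """Max-reduce the scores of one warp; every lane receives the result.

    regs[lane] is the tuple of scores held by that lane; len(regs) is a power of two.
    """
    lanes = [tuple(r) for r in regs]
    width = len(lanes)
    offset = width // 2
    while offset:
        step = []
        for k in range(width):
            # Shuffle-down: lane k reads lane k + offset, or keeps its own value.
            partner = lanes[k + offset] if k + offset < width else lanes[k]
            # SIMD max: element-wise maximum of the two registers.
            step.append(tuple(max(a, b) for a, b in zip(lanes[k], partner)))
        lanes = step
        offset //= 2
    best = max(lanes[0])       # lane 0 now holds the per-slot maxima
    return [best] * width      # broadcast lane 0's result to all lanes

regs = [tuple((lane * 4 + b) * 37 % 101 for b in range(4)) for lane in range(32)]
print(warp_max(regs)[0] == max(max(r) for r in regs))  # True
```

The loop runs log2(32) = 5 shuffle steps for a full warp, and no shared memory or block-level barrier is involved.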

Benchmark Performance
• Overused shared memory hurts occupancy and overall performance.
• A larger capacity of local memory available to each thread is good news.
• Given factors such as pipeline usage and available registers, more threads/warps may not result in further speedup. On the contrary, they may cause stalling and register spills.

Benchmark Performance – cont.
GCUPS = Giga Cell Updates Per Second
• Complex algorithms bring intensive register pressure and off-chip data transfer.
• A lower hit ratio on the L1, L2 and read-only caches is the performance killer.
• Larger model, better performance.
• About 5x faster than highly optimized CPU implementations.
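GCUPS counts how many dynamic-programming cells are filled per second, in units of 10^9. A short sketch of the calculation, assuming one cell per (model position, residue) pair as in the O(M×N) matrix; the function name and the numbers in the example are hypothetical and are not benchmark results from the slides.

```python
def gcups(model_length, total_residues, seconds):
    """Giga cell updates per second for one model searched against a database."""
    cells = model_length * total_residues   # one DP cell per (position, residue)
    return cells / seconds / 1e9

# Hypothetical run: a 400-position model, 50 million residues, 2 seconds.
print(gcups(400, 50_000_000, 2.0))  # 10.0
```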

Acknowledgements
• NVIDIA-Professor Partnership
• Xilinx University Program (XUP)
• Stevens Institute of Technology Start-up Foundation
Contact Information
• Narayan Ganesan, Email: nganesan@stevens.edu
• Hanyu Jiang, Email: hjiang5@stevens.edu
