Toward Real-Time Simulation of Cardiac Dynamics
Ezio Bartocci, SUNY Stony Brook
Joint work with E. Cherry, J. Glimm, R. Grosu, S. A. Smolka, F. Fenton

Outline
- Motivation
- Cardiac Models as Reaction-Diffusion Systems
- CUDA Programming
[Figure: schematic action potential — membrane voltage vs. time, showing the stimulus, a failed initiation below threshold, the threshold, and the resting potential.]

Behavior in time: reaction + diffusion.
The GPU devotes more transistors to data processing than to caching and control flow (image from the NVIDIA CUDA Programming Guide).
[Figure: GPU block diagram — several multiprocessors, each containing stream processors (SP), registers, and shared memory, sitting above global memory, the texture cache, and the constant cache.]

MULTIPROCESSORS: Each GPU consists of a set of multiprocessors.
MULTIPROCESSORS: Each multiprocessor can have 8/32 stream processors (SP), which NVIDIA also calls cores; they share access to local memory.
MULTIPROCESSORS: Each stream processor (core) contains a fused multiply-adder, capable of three floating-point operations per cycle: a fused MADD plus a MUL.
MULTIPROCESSORS: Each multiprocessor can contain one or more 64-bit fused multiply-adders for double precision.
MULTIPROCESSORS: The fastest memory available for GPU computation is the device registers. Each multiprocessor contains 16KB of registers, partitioned among the threads resident on that multiprocessor.
MULTIPROCESSORS: Shared memory (16KB) is primarily intended as a means of fast communication between threads executed by the same multiprocessor, although, due to its speed, it can also be used as a programmer-controlled memory cache.
MULTIPROCESSORS: GPUs also have DRAM (global memory); its latency is roughly 150x that of registers and shared memory.
MULTIPROCESSORS: Constant memory, as the name implies, is a read-only region, which also has a small cache.
MULTIPROCESSORS: Texture memory is read-only, with a small cache optimized for manipulating textures. It also provides built-in linear interpolation of the data.
MULTIPROCESSORS: Global memory is available to all threads and is persistent between GPU calls.
[Figure: CUDA execution model — the CPU (host) runs serial code and invokes a kernel on the GPU (device), which executes a grid of thread blocks, e.g. Block (0,0) through Block (5,1).]
Single Instruction, Multiple Threads (SIMT) is similar to Single Instruction, Multiple Data (SIMD). Threads 0, 1, 2, and 3 each execute the same code on a vector A:

A[ID] = ID;
if (ID % 2) {
    A[ID] += 2;
    A[ID] *= 2;
} else {
    A[ID] = 0;
}
A[ID] += 2;

When branches occur in the code (e.g., due to if statements), the divergent threads become inactive until the conforming threads complete their separate execution. When execution merges, the threads can continue to operate in parallel.
GPU DEVICE > GRID > BLOCK > THREAD: each thread block has its own shared memory and registers, and all blocks access global memory.

The maximum number of threads per thread block is 512, and the actual limit also depends on the number of registers each thread needs. Each thread block is executed by one multiprocessor. Different threads are multiplexed onto the same core in order to hide the latency of memory accesses.
                      First GPU          Second GPU
Multiprocessors       30                 14
Cores                 240                448
Core clock            1.296 GHz          1.15 GHz
Single precision      933 Gigaflops      1030 Gigaflops
Double precision      78 Gigaflops       515 Gigaflops
Max bandwidth         102 GBytes/sec     144 GBytes/sec
DRAM                  4 GB               6 GB
Cost                  $1000              $3200
For each time step, a set of Ordinary Differential Equations (ODEs) and Partial Differential Equations (PDEs) must be solved:

for (timestep = 1; timestep < nend; timestep++) {
    solveODEs <<<grid, block>>> (….);
    calcLaplacian <<<grid, block>>> (….);
}

The ODEs are solved using different methods depending on the model.

Solving PDEs (Calc the Laplacian)
∇·(D∇u)_{i,j} = (D·dt/dx²)(u_{i−1,j} + u_{i,j−1} + u_{i+1,j} + u_{i,j+1} − 4u_{i,j})
Branches can be removed by rewriting

if (x > a) { y = b; } else { y = c; }

as

y = c + (x > a) * b_c;

where a, c, and b_c = (b − c) are constants.
Solving PDEs (Calc the Laplacian)

∇·(D∇u)_{i,j} = (D·dt/dx²)(u_{i−1,j} + u_{i,j−1} + u_{i+1,j} + u_{i,j+1} − 4u_{i,j})
Each grid location is a float (4 bytes). Global-memory latency is very high, and memory is accessed in multiples of 64 bytes. Using texture memory we can reduce the latency: texture data is cached (optimized for 2D locality). Drawback: it supports only single precision.
∇·(D∇u)_{i,j} = (D·dt/dx²)(u_{i−1,j} + u_{i,j−1} + u_{i+1,j} + u_{i,j+1} − 4u_{i,j})
Drawback: the number of threads needed is greater than the number of elements. Another technique is to use SHARED MEMORY.

THREAD BLOCK: shared memory and registers.

Step 1: The yellow and red threads read their locations from global memory into shared memory.
Step 2: The threads synchronize (SYNCH).
Step 3: The red threads calculate the Laplacian using the values in shared memory.
This technique supports both single and double precision.
Speedup of GPU computation over CPU computation, by grid size:

512x512: 5.4x      2048x2048: 82.92x
520x520: 1.95x     2074x2074: 27.71x
512x512: 1.7x      2048x2048: 22.46x
520x520: 12.87x    2074x2074: 165.46x
520x520: 11.34x    2074x2074: 125.62x
520x520: 47x       2074x2074: 815x
512x512: 36x       2048x2048: 515x
After 10 minutes of simulation: Na_i (intracellular sodium) and K_i (intracellular potassium) concentrations.