
SLIDE 1

PORTABLE PERFORMANCE FOR MONTE CARLO SIMULATION OF PHOTON MIGRATION IN 3D TURBID MEDIA FOR SINGLE AND MULTIPLE GPUS

Fanny Nina-Paravecino Leiming Yu Qianqian Fang* David Kaeli

Department of Electrical and Computer Engineering Department of Bioengineering* Northeastern University Boston, MA

SLIDE 2

Outline

  • Portable Performance Monte Carlo Extreme (MCX)
  • MCX in CUDA
  • Persistent Threads in MCX
  • Portable Performance MCX
  • MCX on multiple GPUs
  • Linear Performance
  • Linear Programming Model
  • Performance Results
SLIDE 3

PORTABLE PERFORMANCE MCX

[Figure: photon initialization in a 3D voxelated medium]

SLIDE 4

Monte Carlo Extreme (MCX) in CUDA

  • Estimates the 3D light (fluence) distribution by simulating a large number of independent photons
  • The most accurate algorithm for a wide range of optical properties, including low-scattering/voids, high absorption, and short source-detector separations
  • Computationally intensive, so a great target for GPU acceleration
  • Widely adopted for bio-optical imaging applications:
  • Optical brain functional imaging
  • Fluorescence imaging of small animals for drug development
  • Gold standard for validating new optical imaging instrumentation designs and algorithms

SLIDE 5

MCX in CUDA

[Figures: simulation of photon transport inside a human brain; imaging of bone marrow in the tibia; imaging of a complex mouse model using Monte Carlo simulations]

SLIDE 6

MCX in CUDA [1]

[Flowchart: each GPU thread launches a photon, computes the scattering length, moves the photon one voxel at a time, computes attenuation based on absorption, and accumulates the probability into the volume in global memory; when a scattering event ends, it computes a new scattering direction vector; the thread terminates when the target photon count is reached or the photon exceeds the time gate. On the CPU side, the host seeds the GPU RNG with the CPU RNG, loops over repetitions until complete, then retrieves, normalizes, and saves the solution.]

[1] Q. Fang and D. A. Boas. "Monte Carlo simulation of photon migration in 3D turbid media accelerated by graphics processing units." Optics express 17.22 (2009): 20178-20190.
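As a rough illustration, the per-thread loop above can be condensed into a single-threaded C++ sketch. This is a sketch only: the homogeneous medium, the coefficients, and the termination threshold below are illustrative assumptions, not MCX's actual data structures, and the real kernel runs this loop per CUDA thread against a 3D voxelated volume.

```cpp
#include <cmath>
#include <random>

// Simplified single-photon random walk in a homogeneous medium (assumed).
// mu_s: scattering coefficient; mu_a: absorption coefficient.
// Returns the total weight absorbed into the volume (a fluence proxy).
double simulate_photons(int n_photons, double mu_s, double mu_a,
                        double time_gate, unsigned seed) {
    std::mt19937 rng(seed);  // host RNG seeding the walk, as in the flowchart
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    double absorbed = 0.0;
    for (int i = 0; i < n_photons; ++i) {
        double weight = 1.0, path = 0.0;           // launch a photon
        while (path < time_gate) {                 // exceeds time gate? terminate
            // compute the scattering length (exponentially distributed)
            double step = -std::log(1.0 - uni(rng)) / mu_s;
            path += step;                          // move the photon
            // attenuate based on absorption; accumulate into the volume
            double loss = weight * (1.0 - std::exp(-mu_a * step));
            absorbed += loss;
            weight -= loss;
            if (weight < 1e-4) break;              // photon effectively dead
            // (a full simulation would now sample a new scattering direction)
        }
    }
    return absorbed;
}
```

Each photon contributes at most its launch weight of 1.0, so the absorbed total is bounded by the photon count.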

SLIDE 7

Persistent Threads (PT) in MCX

  • PT kernels alter the notion of a virtual thread's lifetime, treating those threads as physical hardware threads
  • PT kernels provide a view that threads are active for the entire duration of the kernel
  • We schedule only as many threads as the GPU SMs can concurrently run
  • The threads remain active until the end of kernel execution

[Diagram: CUDA grid structure — threads grouped into blocks, blocks into a grid]

SLIDE 8

Persistent Threads (PT) in MCX

  • A PT kernel bypasses the hardware scheduler, relying on a work queue to schedule blocks
  • A PT kernel checks the queue for more work and continues doing so until no work is left
  • PT MCX works on a FIFO blocking queue

[Diagram: blocks are enqueued at the back of a FIFO queue and dispatched from the front to the streaming multiprocessors]
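The work-queue pattern can be illustrated with a CPU-side analogue. This is a sketch only, using an atomic counter as the queue; MCX's actual GPU implementation differs, but the structure is the same: a fixed pool of persistent workers pulls block indices until the queue is drained.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Persistent-worker analogue: a fixed pool of workers (like persistent GPU
// threads) dequeues block indices from a shared queue until no work is left.
long long run_persistent(int num_blocks, int num_workers) {
    std::atomic<int> next_block{0};               // the shared work queue
    std::vector<long long> done(num_workers, 0);  // per-worker tally
    std::vector<std::thread> pool;
    for (int w = 0; w < num_workers; ++w) {
        pool.emplace_back([&, w] {
            for (;;) {                            // worker stays alive...
                int b = next_block.fetch_add(1);  // ...dequeues the next block
                if (b >= num_blocks) break;       // queue empty: terminate
                done[w] += 1;                     // (process block b here)
            }
        });
    }
    for (auto& t : pool) t.join();
    long long total = 0;
    for (long long d : done) total += d;
    return total;                                 // blocks processed in total
}
```

Every block index is handed out exactly once by the atomic fetch-and-add, so the workers together process exactly `num_blocks` blocks regardless of pool size.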

SLIDE 9

Portable Performance for MCX

                         Fermi   Kepler   Maxwell
  MaxThreadBlocks/MP       8       16       32
  MaxThreads/MP           1536    2048     2048
  Multiprocessors (MP)     16      14       22
  CUDA cores / MP          32     192      128

# threadsPerBlock = (MaxThreads/MP) / (MaxThreadBlocks/MP)
# totalThreads = # threadsPerBlock * (MaxThreadBlocks/MP) * MP
# blocks = # totalThreads / # threadsPerBlock
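These limits can be plugged into a small sketch. Note that the product threadsPerBlock × (MaxThreadBlocks/MP) × MP is the total thread count; dividing that total by threadsPerBlock gives the block count, which for Kepler reproduces the improved configuration of 128 threads/block, 28,672 total threads, and 224 blocks reported in the results.

```cpp
struct GpuConfig {
    int threads_per_block;
    int total_threads;
    int blocks;
};

// Derive the persistent-thread launch configuration from the
// per-architecture hardware limits listed in the table above.
GpuConfig configure(int max_blocks_per_mp, int max_threads_per_mp, int num_mp) {
    GpuConfig c;
    // threadsPerBlock = (MaxThreads/MP) / (MaxThreadBlocks/MP)
    c.threads_per_block = max_threads_per_mp / max_blocks_per_mp;
    // total threads = threadsPerBlock * (MaxThreadBlocks/MP) * MP
    c.total_threads = c.threads_per_block * max_blocks_per_mp * num_mp;
    // blocks = total threads / threadsPerBlock
    c.blocks = c.total_threads / c.threads_per_block;
    return c;
}
```

For example, Kepler (16 blocks/MP, 2048 threads/MP, 14 MPs) yields 2048/16 = 128 threads per block and 16 × 14 = 224 resident blocks.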

SLIDE 10

Portable Performance MCX - Results

[Chart: speedup of improved code over baseline on Kepler GK110 and Maxwell 980Ti]

Kepler GK110               Baseline   Improved Code
  ThreadsPerBlock             32          128
  # Total Threads           86,016      28,672
  # Blocks                   2688         224
  Performance (Photons/ms)   2383        2887
  Speedup                     1.0        1.21

Maxwell 980Ti              Baseline   Improved Code
  ThreadsPerBlock             32          128
  # Total Threads           90,112      45,056
  # Blocks                   2816         352
  Performance (Photons/ms)  13,369      15,015
  Speedup                     1.0        1.12

SLIDE 11

MCX ON MULTIPLE GPUS

SLIDE 12

Linear Programming Model

  • Given n devices: D1, D2, …, Dn
  • Given linear performance for each device
  • Given the performance at 10 million and 100 million photons for each device
  • We can obtain a linear equation for each device as follows:

Device 1, f1: y1 = b1 + (x1 - 1)·a1 + C1
Device 2, f2: y2 = b2 + (x2 - 1)·a2 + C2
. . .
Device n, fn: yn = bn + (xn - 1)·an + Cn

SLIDE 13

Performance Results

  • We evaluated our Linear Programming on Linear Model (LPLM) scheme for two different configurations of NVIDIA devices
  • The resulting partition of the workload achieves an average 8% speedup over the baseline

[Chart: Photons/ms vs. # photons (10M, 50M, 100M) for GTX980+GT730 and GTX980+GT730+GTX580, Baseline vs. LPLM]

SLIDE 14

Summary

  • We have improved the performance of MCX across a range of NVIDIA GPU architectures
  • We have shown how to exploit Persistent Thread kernels to automatically tune the MCX kernel
  • We developed a linear programming model to find the best partition to run MCX on multiple GPUs
  • We improved the performance of MCX run on multiple NVIDIA GPUs, including Kepler and Maxwell
  • We obtained an 8% speedup when using automatic partitioning

SLIDE 15

Future Work

  • PT MCX
  • The queue of blocks can either be static (known at compile time) or dynamic (generated at runtime), and can be used to control the order, location, and timing of each block
  • Instrumentation of MCX
  • Leverage SASSI to instrument MCX and better characterize the behavior of a kernel to guide auto-tuning
  • MCX on Multiple GPUs
  • Evaluate our partitioning optimization for multiple devices
SLIDE 16

THANK YOU!

Questions?