A GPU-Accelerated 3D Kinematic Modeling Platform for Behavioral Neuroscience John Long, PhD Buzsáki Laboratory Neuroscience Institute New York University Langone Medical Center 03.17.2015

A little about me… Jose Carmena György Buzsáki

…and my previous work. Venkatraman et al. 2009 Koralek et al. 2012 Long and Carmena 2011 Long and Carmena 2013

Why am I at a GPU conference? • As a neuroscientist, I find small form factor, massively parallel computing machines intriguing. • For ease of interface and visualization, I often program in Matlab or Python, and I suffer agonizing computational bottlenecks, which has led me to GPUs. • I’ve had a fair amount of success applying GPU computing to my scientific work.

What I have in store for you… • An introduction to the work I do in behavioral neuroscience that led me to GPU computing. • A detailed description of one of the CUDA programs I have implemented in the context of my research. • Throughout, I’ll mention a workflow I’ve found useful for porting CUDA code into Matlab and Python.

Who reads the maps in the brain? Hippocampal “ place ” cells Sensory receptive fields (Hubel and Wiesel 1959) (O ’ Keefe and Nadel, 1978; O ’ Keefe and Recce, 1993) Geisler, Sirota, Zugaro, Robbe, Buzsaki, PNAS 2007 Lurilli et al. 2012

The State of the Art in Behavioral Neuroscience More and more neural data!

The State of the Art in Behavioral Neuroscience The behaving rat…

Advances in Motion Capture Corazza et al. 2006

Environment Construction 6 5 4 1 2 3

Multiple Camera Synchronization Lines to cameras Line to Amplipex system

Image Segmentation

Camera Calibration 3D to 2D perspective transformation Svoboda et al. 2005

Visual Hull Construction Visual Hull Algorithm modified from Forbes et al. 2006

Kinematic Model Design

Kinematic Model Manipulation Murray et al. 1994

Kinematic Model Fitting Generate Candidate Poses Compute Cost Components n j n j n i D j d ij M i d ij = ||M i – D j || 2 α ij = dot(n i ,n j ) Update Posterior Estimate Score Each Pose

Kinematic Model Fitting Generate Candidate Poses Compute Cost Components n j n j n i D j d ij M i d ij = ||M i – D j || 2 α ij = dot(n i ,n j ) Update Posterior Estimate Score Each Pose

Open Chain Kinematics P1 = t1a * t1b * t1c * t1d * t1e * P1; N1 = t1a * t1b * t1c * t1d * t1e * N1; N1 = N1-P1; P4 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * P4; N4 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * N4; N4 = N4-P4; P5 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * P5; N5 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * N5; N5 = N5-P5; P7 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * t6a * t6b * t6c * t7a * t7b * t7c * P7; N7 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * t6a * t6b * t6c * t7a * t7b * t7c * N7; N7 = N7-P7; “A mathematical introduction to robotic manipulation” by Murray, Li, and Sastry 1994

Open Chain Kinematics: On the GPU Exposing Parallelism //Thread parameters unsigned int hWID = threadIdx.x / halfWarpSz; unsigned int hWoff = threadIdx.x % halfWarpSz; • All 4x4 transformation matrices t i can be unsigned int x = hWoff % DimXY; unsigned int y = hWoff / DimXY; computed in parallel. • There are many shared computational //MATRIX REDUCTION: across temporary variables over twists float sum[2]; //1st reduction from 16, 4x4 matrices to 8, 4x4 matrices blocks. if(hWID < 8) { sum[0] = 0.0f; #pragma unroll for(int k = 0; k < 4; k++) { P1 = t1a * t1b * t1c * t1d * t1e * P1; sum[0] += Stwists[4*(2*hWID) + y][k]* N1 = t1a * t1b * t1c * t1d * t1e * N1; Stwists[4*(2*hWID+1) + k][x]; } N1 = N1-P1; Transtmp0[4*hWID + y][x] = sum[0]; }; P4 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * P4; __syncthreads(); N4 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * N4; N4 = N4-P4; P5 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * P5; N5 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * N5; N5 = N5-P5; P7 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * t6a * t6b * t6c * t7a * t7b * t7c * P7; N7 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * t6a * t6b * t6c * t7a * t7b * t7c * N7; N7 = N7-P7;

Open Chain Kinematics: On the GPU: Results Compute Time Comparison CUDA ported into Matlab via Mex single Matlab: mean = 12.6 sec Per frame compute time (seconds) • x22.5 speedup relative to single Matlab process parfor Matlab: mean = 8.2 sec • x14.6 speedup relative to parallel Matlab process (6 CPUs) • Qualitative speedup allowed for parameter tuning resulting in an average 50% reduction in per frame model fit error i.e. better model CUDA in Matlab: mean = 0.55 sec fits! • Promising approach to open chain kinematic Model Fit Comparison CUDA where you need it Per frame model fit error (a.u.) errors prior to tuning • Qualitative speedups mean more efficient science. • Work where you need to and let user friendly languages like Matlab and Python do the rest. errors after tuning Frame number

Putting it all together

Promising Directions Berman et al. 2014

Label Clusters Kinematic Modeling Behavioral Classification forward gaze Parameterize rearing Dynamics Wavelet Analysis tight scan 1 st principal component Cluster Embedded Dynamics T-SNE 2 nd principal component Map Dynamics 3 rd principal component Time (seconds)

Conclusion • Fitting kinematic models to 3D visual hull data is greatly accelerated by GPUs. • The framework I’ve presented can be applied to open chain kinematics models in general. • In science, Big Data too often means sitting around waiting to find out you need to run your analysis again. GPUs are a game changer. • Interfaces like mex (Matlab) and ctypes (Python) allow you to tackle the hard parts and be lazy about the easy parts. – You can incrementally deal with bottlenecks of decreasing priority.

Acknowledgements György Buzsáki The entire Buzsáki lab Andres Grosmark Antal Berenyi Thank you!

Kinematic Model Design

Download Presentation

Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend

More recommend