A GPU-Accelerated 3D Kinematic Modeling Platform for Behavioral - - PowerPoint PPT Presentation



slide-1
SLIDE 1

A GPU-Accelerated 3D Kinematic Modeling Platform for Behavioral Neuroscience

John Long, PhD Buzsáki Laboratory Neuroscience Institute New York University Langone Medical Center 03.17.2015

slide-2
SLIDE 2

A little about me…

György Buzsáki, Jose Carmena

slide-3
SLIDE 3

…and my previous work.

Venkatraman et al. 2009; Long and Carmena 2011; Koralek et al. 2012; Long and Carmena 2013

slide-4
SLIDE 4
Why am I at a GPU conference?

  • As a neuroscientist, I find small form factor, massively parallel computing machines intriguing.

  • For ease of interface and visualization, I often program in Matlab or Python, and I suffer agonizing computational bottlenecks, which has led me to GPUs.

  • I’ve had a fair amount of success applying GPU computing to my scientific work.

slide-5
SLIDE 5
What I have in store for you…

  • An introduction to the work I do in behavioral neuroscience that led me to GPU computing.

  • A detailed description of one of the CUDA programs I have implemented in the context of my research.

  • Throughout, I’ll mention a workflow I’ve found useful for porting CUDA code into Matlab and Python.

slide-6
SLIDE 6

Who reads the maps in the brain?

Iurilli et al. 2012; Geisler, Sirota, Zugaro, Robbe, Buzsáki, PNAS 2007

Hippocampal “place” cells

(O’Keefe and Nadel, 1978; O’Keefe and Recce, 1993)

Sensory receptive fields

(Hubel and Wiesel 1959)

slide-7
SLIDE 7

The State of the Art in Behavioral Neuroscience

More and more neural data!

slide-8
SLIDE 8

The behaving rat…

The State of the Art in Behavioral Neuroscience

slide-9
SLIDE 9

Advances in Motion Capture

Corazza et al. 2006

slide-10
SLIDE 10

Environment Construction

[Figure: recording environment, with positions labeled 1 to 6]

slide-11
SLIDE 11

[Figure labels: lines to cameras; line to Amplipex system]

Multiple Camera Synchronization

slide-12
SLIDE 12

Image Segmentation

slide-13
SLIDE 13

Svoboda et al. 2005

3D to 2D perspective transformation

Camera Calibration
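
A minimal sketch of the 3D to 2D perspective transformation named on this slide, assuming a standard pinhole model in which a 3x4 projection matrix maps homogeneous 3D points to pixels. The function name `project_point` and the matrix `P_EXAMPLE` are illustrative assumptions, not calibration output from the talk.

```python
def project_point(P, X):
    """Project a 3D point X = (x, y, z) to pixel coordinates using a 3x4 matrix P."""
    Xh = (X[0], X[1], X[2], 1.0)  # homogeneous coordinates
    u, v, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    if w == 0:
        raise ValueError("point lies on the camera's principal plane")
    return (u / w, v / w)  # perspective divide


# Hypothetical camera: focal length 500 px, principal point (320, 240),
# placed at the origin and looking down +z.
P_EXAMPLE = [
    [500.0, 0.0, 320.0, 0.0],
    [0.0, 500.0, 240.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
```

A point on the optical axis, such as (0, 0, 2), lands on the principal point; moving it sideways shifts the pixel in proportion to focal length divided by depth.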

slide-14
SLIDE 14

Visual Hull Construction

Visual Hull Algorithm modified from Forbes et al. 2006
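
For orientation, here is a sketch of plain silhouette carving over a voxel grid. This is a simpler, swapped-in technique, not the modified Forbes et al. algorithm used in the talk: a voxel is kept only if it projects onto foreground pixels in every camera. The name `carve_visual_hull` and the `(project, silhouette)` camera representation are assumptions made for illustration.

```python
def carve_visual_hull(voxels, cameras):
    """Keep the voxels whose projection is foreground in every camera.

    cameras: list of (project, silhouette) pairs, where project maps a 3D
    point to integer (col, row) pixel coordinates and silhouette is a 2D
    list of 0/1 values from image segmentation.
    """
    hull = []
    for voxel in voxels:
        inside_all = True
        for project, silhouette in cameras:
            col, row = project(voxel)
            in_image = 0 <= row < len(silhouette) and 0 <= col < len(silhouette[0])
            if not in_image or not silhouette[row][col]:
                inside_all = False  # carved away by this view
                break
        if inside_all:
            hull.append(voxel)
    return hull
```

Each voxel is tested independently of the others, which is the property that makes this kind of step a natural fit for parallel hardware.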

slide-15
SLIDE 15

Kinematic Model Design

slide-16
SLIDE 16

Kinematic Model Manipulation

Murray et al. 1994
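
In the formulation of Murray et al., each revolute joint contributes a 4x4 rigid transform obtained from a twist. The sketch below builds that transform for a rotation by `theta` about a unit axis passing through a given point, using Rodrigues' formula. The function name and argument layout are assumptions for illustration, not the talk's code.

```python
import math


def revolute_twist_transform(axis, point_on_axis, theta):
    """4x4 transform for rotating by theta about a unit axis through point_on_axis."""
    wx, wy, wz = axis  # assumed to be unit length
    K = [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]  # cross-product matrix
    K2 = [[sum(K[r][k] * K[k][c] for k in range(3)) for c in range(3)]
          for r in range(3)]
    s, c1 = math.sin(theta), 1.0 - math.cos(theta)
    # Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    R = [[(1.0 if r == c else 0.0) + s * K[r][c] + c1 * K2[r][c]
          for c in range(3)] for r in range(3)]
    q = point_on_axis
    # Translation that keeps points on the axis fixed: t = q - R q
    t = [q[r] - sum(R[r][k] * q[k] for k in range(3)) for r in range(3)]
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]
```

Rotating the point (2, 0, 0) by 90 degrees about a z-directed axis through (1, 0, 0) moves it to (1, 1, 0), up to floating-point rounding.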

slide-17
SLIDE 17

Generate Candidate Poses

Score Each Pose

Compute Cost Components

Update Posterior Estimate

[Figure labels: M_i, D_j, n_i, n_j, d_ij]

d_ij = ||M_i - D_j||_2

α_ij = dot(n_i, n_j)

Kinematic Model Fitting
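
The two cost components on this slide can be sketched as follows, assuming that M_i and D_j are 3D points with associated normals n_i and n_j, and that the norm is Euclidean. How the talk combines these components into a pose score is not stated, so the sketch stops at computing them; the function name is an assumption.

```python
import math


def cost_components(pts_m, normals_m, pts_d, normals_d):
    """Return (d, alpha) with d[i][j] = ||M_i - D_j|| and alpha[i][j] = dot(n_i, n_j)."""
    d = [[math.dist(m, q) for q in pts_d] for m in pts_m]
    alpha = [[sum(a * b for a, b in zip(ni, nj)) for nj in normals_d]
             for ni in normals_m]
    return d, alpha
```

Every (i, j) entry is independent of the others, so the full distance and alignment tables can be filled in parallel for each candidate pose.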

slide-18
SLIDE 18

Generate Candidate Poses

Score Each Pose

Compute Cost Components

Update Posterior Estimate

[Figure labels: M_i, D_j, n_i, n_j, d_ij]

d_ij = ||M_i - D_j||_2

α_ij = dot(n_i, n_j)

Kinematic Model Fitting

slide-19
SLIDE 19

Open Chain Kinematics

P1 = t1a * t1b * t1c * t1d * t1e * P1;
N1 = t1a * t1b * t1c * t1d * t1e * N1;
N1 = N1-P1;

P4 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * P4;
N4 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * N4;
N4 = N4-P4;

P5 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * P5;
N5 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * N5;
N5 = N5-P5;

P7 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * t6a * t6b * t6c * t7a * t7b * t7c * P7;
N7 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * t6a * t6b * t6c * t7a * t7b * t7c * N7;
N7 = N7-P7;

“A Mathematical Introduction to Robotic Manipulation” by Murray, Li, and Sastry, 1994
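
Each product in the listing above begins with the full product of its parent segment, so the shared prefix can be accumulated once and reused. A minimal sketch of that idea, assuming a simple serial chain; the names `matmul4`, `apply4`, and `forward_chain` and the `(transforms, point, normal_tip)` layout are illustrative, not taken from the talk's code.

```python
def matmul4(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def apply4(t, p):
    """Apply a 4x4 transform to a homogeneous point of length 4."""
    return [sum(t[r][k] * p[k] for k in range(4)) for r in range(4)]


def forward_chain(segments):
    """segments: (transforms, point, normal_tip) tuples ordered from root to tip.

    Returns one (point, normal) pair per segment in the world frame.
    """
    prefix = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    out = []
    for transforms, point, normal_tip in segments:
        for t in transforms:
            prefix = matmul4(prefix, t)  # extends the product of all parent segments
        p = apply4(prefix, point)
        n = apply4(prefix, normal_tip)
        out.append((p, [n[k] - p[k] for k in range(4)]))  # N = N - P, as in the listing
    return out
```

A segment with no output point of its own, like segment 6 in the listing, can be folded into the transform list of the next segment.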

slide-20
SLIDE 20

Open Chain Kinematics: On the GPU

P1 = t1a * t1b * t1c * t1d * t1e * P1;
N1 = t1a * t1b * t1c * t1d * t1e * N1;
N1 = N1-P1;

P4 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * P4;
N4 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * N4;
N4 = N4-P4;

P5 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * P5;
N5 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * N5;
N5 = N5-P5;

P7 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * t6a * t6b * t6c * t7a * t7b * t7c * P7;
N7 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * t6a * t6b * t6c * t7a * t7b * t7c * N7;
N7 = N7-P7;

Exposing Parallelism

//Thread parameters: one half-warp per 4x4 matrix product, one thread per output element
unsigned int hWID = threadIdx.x / halfWarpSz;
unsigned int hWoff = threadIdx.x % halfWarpSz;
unsigned int x = hWoff % DimXY;
unsigned int y = hWoff / DimXY;

//MATRIX REDUCTION: across temporary variables over twists
float sum[2];

//1st reduction from 16, 4x4 matrices to 8, 4x4 matrices
if(hWID < 8)
{
    sum[0] = 0.0f;
    #pragma unroll
    for(int k = 0; k < 4; k++)
    {
        sum[0] += Stwists[4*(2*hWID) + y][k] * Stwists[4*(2*hWID+1) + k][x];
    }
    Transtmp0[4*hWID + y][x] = sum[0];
}
__syncthreads();

  • All 4x4 transformation matrices ti can be computed in parallel.

  • There are many shared computational blocks.
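
The reduction pattern in the kernel excerpt (16 matrices to 8, and so on) can be emulated on the CPU to check the logic. This is a sketch only: it assumes the number of matrices is a power of two, and the function names are illustrative. Neighbouring pairs are multiplied in order, since matrix multiplication is associative but not commutative.

```python
def matmul4(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def pairwise_reduce(mats):
    """Multiply an ordered list of 4x4 matrices using halving passes."""
    n = len(mats)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("number of matrices must be a power of two")
    level = list(mats)
    while len(level) > 1:
        # Every product within a pass is independent of the others; in the
        # kernel excerpt each one is handled by its own half-warp.
        level = [matmul4(level[2 * i], level[2 * i + 1])
                 for i in range(len(level) // 2)]
    return level[0]
```

Sixteen matrices need four passes instead of fifteen sequential products, and each pass ends at a synchronization point, which is the role `__syncthreads()` plays in the excerpt.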

slide-21
SLIDE 21

Open Chain Kinematics: On the GPU: Results

  • 22.5x speedup relative to a single Matlab process

  • 14.6x speedup relative to a parallel Matlab process (6 CPUs)

  • Qualitative speedup allowed for parameter tuning, resulting in an average 50% reduction in per-frame model fit error, i.e. better model fits!

  • Promising approach to open chain kinematic

CUDA ported into Matlab via Mex

[Figure: Compute Time Comparison. Per frame compute time (seconds) vs. frame number. Legend: single Matlab: mean = 12.6 sec; parfor Matlab: mean = 8.2 sec; CUDA in Matlab: mean = 0.55 sec]

[Figure: Model Fit Comparison. Per frame model fit error (a.u.) vs. frame number. Legend: errors prior to tuning; errors after tuning]

CUDA where you need it

  • Qualitative speedups mean more efficient science.

  • Work where you need to and let user-friendly languages like Matlab and Python do the rest.

slide-22
SLIDE 22

Putting it all together

slide-23
SLIDE 23

Promising Directions

Berman et al. 2014

slide-24
SLIDE 24

Kinematic Modeling

Wavelet Analysis

[Figure: 1st, 2nd, and 3rd principal components vs. time (seconds)]

Behavioral Classification

  • Parameterize Dynamics

  • Cluster Embedded Dynamics

  • t-SNE Map Dynamics

  • Label Clusters

[Figure labels: rearing, forward gaze, tight scan]

slide-25
SLIDE 25

Conclusion

  • Fitting kinematic models to 3D visual hull data is greatly accelerated by GPUs.

  • The framework I’ve presented can be applied to open chain kinematics models in general.

  • In science, Big Data too often means sitting around waiting to find out you need to run your analysis again. GPUs are a game changer.

  • Interfaces like mex (Matlab) and ctypes (Python) allow you to tackle the hard parts and be lazy about the easy parts.

– You can incrementally deal with bottlenecks of decreasing priority.
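
On the Python side of the workflow mentioned in the conclusion, the main chore is handing flat buffers to compiled code. The sketch below packs 4x4 matrices into a ctypes float array. The library name `libkinematics.so` and the function `fit_frame` in the comments are hypothetical and stand in for whatever compiled CUDA wrapper is available.

```python
import ctypes


def pack_matrices(mats):
    """Flatten a list of 4x4 matrices (row-major) into a ctypes float array."""
    flat = [float(v) for m in mats for row in m for v in row]
    return (ctypes.c_float * len(flat))(*flat)


# With a compiled shared library, the call could look like this
# (every name below is hypothetical):
#   lib = ctypes.CDLL("./libkinematics.so")
#   lib.fit_frame.argtypes = [ctypes.POINTER(ctypes.c_float), ctypes.c_int]
#   lib.fit_frame.restype = ctypes.c_int
#   status = lib.fit_frame(pack_matrices(mats), len(mats))
```

Keeping the packing step in Python means only the bottleneck itself has to live in compiled code, which matches the incremental approach described above.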

slide-26
SLIDE 26

Acknowledgements

György Buzsáki, Antal Berenyi, Andres Grosmark

The entire Buzsáki lab

Thank you!

slide-27
SLIDE 27

Kinematic Model Design