GPU TECHNOLOGY CONFERENCE: S5400: Chrono::SPIKE – A Nonsmooth Contact Dynamics Framework on the GPU
Daniel Melanz, Luning Fang, Ang Li, Hammad Mazhar, Radu Serban, Dan Negrut
Simulation-Based Engineering Laboratory
University of Wisconsin - Madison

Overview
1) Nonsmooth Contact Dynamics
2) Quadratic Optimization w/ Conic Constraints
3) Preconditioning with SPIKE
4) Numerical Results
5) Conclusions & Future Work
3/19/2015 University of Wisconsin

Nonsmooth Contact Dynamics

Nonsmooth Dynamics

Nonsmooth Dynamics: Frictionless Case
The Signorini Conditions:
• Every relative velocity should be zero or separating
• Every contact impulse should be non-attractive
• No impulse at separating contacts
[Image: Antonio Signorini] (Tonge, 2012)

Nonsmooth Dynamics: Frictionless Case
The Signorini Conditions:
• This is a compact way to write the three conditions in one line of math
[Image: Antonio Signorini] (Tonge, 2012)
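The formulas for the Signorini conditions did not survive in this transcript. A standard velocity-level statement of the three conditions, and of their compact one-line form, is sketched below. The symbols are assumptions made here rather than the slide's own notation: $v_{n,i}$ is the normal relative velocity at contact $i$ and $\gamma_{n,i}$ is the normal contact impulse.

```latex
\begin{align*}
v_{n,i} &\ge 0 && \text{(relative velocity is zero or separating)} \\
\gamma_{n,i} &\ge 0 && \text{(contact impulse is non-attractive)} \\
v_{n,i}\,\gamma_{n,i} &= 0 && \text{(no impulse at separating contacts)}
\end{align*}
% Compact complementarity form of the three conditions above
\[
0 \le v_{n,i} \;\perp\; \gamma_{n,i} \ge 0
\]
```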

Nonsmooth Dynamics: Frictionless Case
The final model can be expressed by these equations: (Tonge, 2012)

Nonsmooth Dynamics: Friction Case
(Stewart and Trinkle, 1996)

Nonsmooth Dynamics: Friction Case
(Anitescu and Hart, 2004)

Nonsmooth Dynamics: The Cone Complementarity Problem (CCP)

Nonsmooth Dynamics: The Quadratic Programming Angle…
• The CCP captures the first-order optimality condition for a quadratic optimization problem with conic constraints:
• Notation used:
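A hedged sketch of the two problems this slide relates is given below. The notation is common for this kind of formulation but is assumed here, not taken from the slide: $\gamma$ stacks the contact impulses of all $N_c$ contacts, $N$ is a symmetric positive semidefinite matrix, $r$ is a vector, $\Upsilon_i$ is the friction cone of contact $i$ with friction coefficient $\mu_i$, and $\Upsilon_i^{\circ} = \{ y : y^T x \le 0 \ \forall x \in \Upsilon_i \}$ is its polar cone.

```latex
% Quadratic optimization problem with conic constraints
\[
\min_{\gamma}\; \tfrac{1}{2}\,\gamma^T N \gamma + r^T \gamma
\quad \text{subject to} \quad \gamma_i \in \Upsilon_i, \quad i = 1, \dots, N_c
\]
% Friction cone of one contact (normal component first)
\[
\Upsilon_i = \left\{ (\gamma_n, \gamma_u, \gamma_w) \in \mathbb{R}^3 :
\sqrt{\gamma_u^2 + \gamma_w^2} \le \mu_i\,\gamma_n \right\}
\]
% First-order optimality condition of the problem above: the CCP
\[
\gamma_i \in \Upsilon_i \;\perp\; -\left( N\gamma + r \right)_i \in \Upsilon_i^{\circ},
\quad i = 1, \dots, N_c
\]
```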

Quadratic Optimization w/ Conic Constraints (CCQO’s)

CCQO’s: First Order Methods
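The content of this slide was not preserved. As an illustration of what a first-order method for a cone-constrained quadratic problem can look like, here is a minimal sketch of plain projected gradient descent (not the accelerated APGD variant used in the results). It assumes the objective 1/2 γᵀNγ + rᵀγ with one 3-component friction cone per contact and a single friction coefficient `mu`; the function names are hypothetical.

```python
import numpy as np

def project_friction_cone(gamma, mu):
    """Euclidean projection of gamma = (normal, tangent_u, tangent_w)
    onto the cone sqrt(u^2 + w^2) <= mu * normal."""
    g_n = gamma[0]
    g_t = np.hypot(gamma[1], gamma[2])
    if g_t <= mu * g_n:            # already inside the cone
        return gamma.copy()
    if mu * g_t <= -g_n:           # inside the polar cone: nearest point is the apex
        return np.zeros(3)
    # Otherwise the nearest point lies on the cone surface.
    n_new = (g_n + mu * g_t) / (1.0 + mu * mu)
    scale = mu * n_new / g_t
    return np.array([n_new, scale * gamma[1], scale * gamma[2]])

def projected_gradient(N, r, mu, iters=500):
    """Minimize 0.5 * g^T N g + r^T g subject to each 3-block of g lying
    in the friction cone. N is assumed symmetric with a positive largest
    eigenvalue; the fixed step is 1 / (largest eigenvalue of N)."""
    gamma = np.zeros(r.size)
    step = 1.0 / np.linalg.eigvalsh(N)[-1]
    for _ in range(iters):
        trial = gamma - step * (N @ gamma + r)   # gradient step
        for i in range(0, r.size, 3):            # project contact by contact
            gamma[i:i + 3] = project_friction_cone(trial[i:i + 3], mu)
    return gamma
```

Each iteration costs one matrix-vector product plus independent per-contact projections, which is why methods of this family are cheap per iteration but may need many iterations.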

CCQO’s: Second Order Methods
• Original problem:
• Reformulation via an indicator function:
• Approximation via logarithmic barrier:
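The three formulas on this slide were lost in extraction. The generic textbook form of the same three steps is sketched below as a reading aid. The symbols $f_0$, $f_i$, $I_-$ and $t$ are assumptions; for the conic problem the $f_i$ would encode the cone constraints, and the slide's own formulation may differ.

```latex
% Original problem
\[
\min_x\; f_0(x) \quad \text{subject to} \quad f_i(x) \le 0, \quad i = 1, \dots, m
\]
% Reformulation via an indicator function
\[
\min_x\; f_0(x) + \sum_{i=1}^{m} I_-\!\left(f_i(x)\right), \qquad
I_-(u) = \begin{cases} 0 & \text{if } u \le 0 \\ \infty & \text{otherwise} \end{cases}
\]
% Approximation via logarithmic barrier, with barrier parameter t > 0
\[
\min_x\; f_0(x) - \frac{1}{t} \sum_{i=1}^{m} \log\!\left(-f_i(x)\right)
\]
```

The barrier term is smooth, so Newton-type (second-order) steps can be applied, and the approximation of the indicator function tightens as $t$ grows.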

Interior Point

Numerical Results

Results: Physical Model
• Several numerical experiments were performed using a model of spheres falling into a bucket

Results: Comparison of Solver Results
• The filling model was simulated for 3 seconds with a step size of h = 10^-3 seconds using the APGD and PDIP solvers
[Figure: two panels (PDIP, APGD) plotting Weight [N] against Time [s], one curve per legend entry from 1e-1 to 1e-5]

Results: Comparison of Solver Iterations
• The filling model was simulated for 3 seconds with a step size of h = 10^-3 seconds using the APGD and PDIP solvers
[Figure: two panels (PDIP, APGD) plotting Iterations [#] against Time [s], one curve per legend entry from 1e-1 to 1e-5]

Results: Comparison of Solver Execution Time
• The filling model was simulated for 3 seconds with a step size of h = 10^-3 seconds using the APGD and PDIP solvers
[Figure: two panels (PDIP, APGD)]

Results: Comparison of Solvers
• The filling model was simulated for 3 seconds with a step size of h = 10^-3 seconds using the APGD and PDIP solvers
[Figure: Iterations [#] against Time [s] for PDIP and APGD, legend entries from 1e-1 to 1e-5]

Preconditioning with SPIKE

The SPIKE algorithm
• SPIKE: a divide-and-conquer approach to solving dense banded systems.
• Proposed by A. H. Sameh and D. J. Kuck in 1978 (see also E. Polizzi and A. H. Sameh, Parallel Computing 32(2), 2006).
• Basic idea:
  • Partition the matrix A.
  • Factorize A to isolate independent blocks.
  • Solve a reduced system to account for coupling information.
  • Recover the solution of the original system.
• SPIKE comes in two main flavors:
  • Full-SPIKE: recursively solve an exact reduced system (direct solver for banded matrices).
  • Truncated-SPIKE: solve an approximate reduced system in one step (needs iterative refinement).

SPIKE: algorithmic details
Partitioning and Factorization
• Partition and factorize A into a block-diagonal matrix D and a spike matrix S, so that A = DS.

SPIKE: algorithmic details
Solving Dg=b
• Reduced to solving P independent (dense banded) linear systems.
• Map these systems to P blocks on the GPU.
• Apply classical LU (or UL) methods to each sub-system.

SPIKE factorization in plain math
• The right (V_i) and left (W_i) spike blocks can be obtained through the solution of P independent multiple-RHS banded linear systems.

SPIKE: algorithmic details
Solving Sx=g (full SPIKE)
• Combine all coupling blocks into a reduced matrix
• (Recursively) solve the reduced system
• Recover solution from reduced solution
[Figure: combine coupling blocks]
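The partition, factorize, reduce and recover steps can be illustrated with a small CPU sketch for the simplest case of two partitions, where the reduced system is solved directly and no recursion is needed. This is an assumption-laden illustration, not the Chrono::SPIKE implementation: `spike_solve_two_partitions` is a hypothetical name, `K` is the half-bandwidth, and dense `numpy.linalg.solve` calls stand in for the banded LU/UL factorizations that the slides map to GPU blocks.

```python
import numpy as np

def spike_solve_two_partitions(A, b, K):
    """Solve A x = b for a banded A (A[i, j] == 0 when |i - j| > K)
    using SPIKE with two partitions. Each partition needs >= K rows."""
    n = A.shape[0]
    m = n // 2                          # split point between the partitions
    A1, A2 = A[:m, :m], A[m:, m:]       # diagonal blocks, i.e. the matrix D
    B = A[m - K:m, m:m + K]             # K x K coupling block above the diagonal
    C = A[m:m + K, m - K:m]             # K x K coupling block below the diagonal

    # Step 1: D g = b, two independent solves.
    g1 = np.linalg.solve(A1, b[:m])
    g2 = np.linalg.solve(A2, b[m:])

    # Step 2: spike blocks V = A1^{-1} [0; B] and W = A2^{-1} [C; 0].
    rhs_v = np.zeros((m, K))
    rhs_v[-K:] = B
    rhs_w = np.zeros((n - m, K))
    rhs_w[:K] = C
    V = np.linalg.solve(A1, rhs_v)
    W = np.linalg.solve(A2, rhs_w)

    # Step 3: 2K x 2K reduced system coupling the bottom of x1 and the top of x2.
    R = np.block([[np.eye(K), V[-K:]], [W[:K], np.eye(K)]])
    y = np.linalg.solve(R, np.concatenate([g1[-K:], g2[:K]]))
    x1_bottom, x2_top = y[:K], y[K:]

    # Step 4: recover the full solution from the reduced solution.
    x1 = g1 - V @ x2_top
    x2 = g2 - W @ x1_bottom
    return np.concatenate([x1, x2])
```

With more than two partitions the reduced matrix couples neighbouring partitions through both of their spikes, which is where the recursive solve of full SPIKE, or the truncation of the spikes, comes in.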

SPIKE: algorithmic details
Solving Sx=g (truncated SPIKE)
• Justified for diagonally dominant systems only.
• All spike blocks W and V are approximated by their top and bottom parts, respectively.
• Results in a decoupling of the reduced matrix into (P - 1) small independent systems of size 2K x 2K.
[Figure: truncate spike blocks]

Truncated SPIKE as a preconditioner
• Fundamental idea:
  • Reorder a sparse matrix to obtain a banded matrix with as “heavy” a diagonal as possible.
  • Drop small entries far from the main diagonal in an attempt to produce an even narrower band.
  • Use truncated SPIKE on the resulting banded matrix.
• Sparse matrix reordering
  • Reordering is critical.
  • Nonzeros can spread out, whereas we prefer them clustered around the main diagonal.
  • Both truncated SPIKE and BiCGStab(2) prefer diagonal elements with large absolute values.
• Reordering strategies
  • Use row permutations to maximize the product of absolute diagonal values: A → QA
  • Apply symmetric RCM (reverse Cuthill-McKee) for bandwidth reduction: QA + A^T Q^T → P (QA + A^T Q^T) P^T
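To make the bandwidth-reduction step concrete, here is a minimal pure-Python sketch of reverse Cuthill-McKee on a symmetric sparsity pattern. It covers only the second reordering strategy (the diagonal-boosting row permutation Q is not shown), and it assumes the pattern is given as a list of neighbour sets; `rcm_order` and `bandwidth` are hypothetical helper names, not functions from the presented library.

```python
from collections import deque

def rcm_order(adj):
    """Reverse Cuthill-McKee ordering.
    adj[v] is the set of columns with a nonzero in row v of a
    symmetric pattern (diagonal entries may be omitted)."""
    n = len(adj)
    visited = [False] * n
    order = []
    # Start every connected component from a node of low degree.
    for start in sorted(range(n), key=lambda v: len(adj[v])):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:                     # breadth-first traversal
            v = queue.popleft()
            order.append(v)
            # Visit unvisited neighbours in order of increasing degree.
            for w in sorted(adj[v], key=lambda u: len(adj[u])):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order[::-1]                   # reversing gives the "R" in RCM

def bandwidth(adj, order):
    """Largest |new_row - new_col| over all nonzeros under the given ordering."""
    pos = {v: k for k, v in enumerate(order)}
    return max((abs(pos[v] - pos[w]) for v in range(len(adj)) for w in adj[v]),
               default=0)

# A path graph 0-3-1-4-2 whose labels are scrambled.
pattern = [{3}, {3, 4}, {4}, {0, 1}, {1, 2}]
new_order = rcm_order(pattern)
print(bandwidth(pattern, list(range(5))), bandwidth(pattern, new_order))
```

A narrower band means smaller coupling blocks and cheaper banded factorizations per partition, which is the motivation given on the slide for reordering before applying truncated SPIKE.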

Numerical Results

Results: Preconditioned PDIP (P-PDIP)
• Adding preconditioning to the search direction computation drastically improves computation time

Results: Effect of Problem Size
• A series of simulations on filling models of increasing size was performed to estimate how solver performance scales with problem dimension

Conclusions & Future Work

Conclusions
• Interior point methods require far fewer iterations than gradient descent methods, but each iteration is much more computationally expensive
• Preconditioning is responsible for a four-fold reduction in run times when simulating nonsmooth contact problems
• Although used here with nonsmooth dynamics, this speed-up is independent of the specific formalism adopted for formulating the equations of motion

Future Work
• Investigate improvements to the interior point algorithm
• Investigate SPIKE update strategies and preconditioner re-use
• Investigate the effectiveness of spectral reordering methods
• Understand and gauge the software implementation effort and simulation efficiency trade-offs related to moving from the GPU to parallel multi-core CPU architectures

Thank you.
• Source available for download under BSD-3: http://spikegpu.sbel.org/
• For all of our animations, please visit https://vimeo.com/uwsbel
• For more information about the Simulation-Based Engineering Laboratory, please visit http://sbel.wisc.edu/

Thank You.
melanz@wisc.edu
Simulation Based Engineering Lab
Wisconsin Applied Computing Center
