Complex Systems Simulations with CUDA (S5133), Dr Paul Richmond

From Biological Cells to Populations of Individuals: Complex Systems Simulations with CUDA (S5133)
Dr Paul Richmond, Research Fellow, University of Sheffield (NVIDIA CUDA Research Centre)
GTC 2015

Overview
• Complex Systems
• A Framework for Modelling Agents
• Degrees of Parallelisation
• Agent Communication
• Putting it all together

Complex Systems
• Many individuals
• Interact and behave according to simple rules
• System-level behaviour emerges

Agent Based Modelling
• A method for the specification and simulation of a complex system
• The model is a set of autonomous, communicating agents
• Simulation helps us to understand complex systems
  • Interventions and prediction
• Presents a computational challenge!
  • Especially for real time or faster

Difficulties in Applying GPUs
• Agents are heterogeneous, i.e. they diverge
• Agents are born and agents die
  • Leads to sparse populations and non-coalesced memory access
• Agents communicate
  • No global mechanism for communication between GPU threads
• Agents don't stay still
  • Acceleration structures used for simulation need to be rebuilt

A Formal Model of an Agent
• Abstract the underlying architecture
  • Let modellers write models, not parallel programs
• Describe agents as a form of state machine (X-Machine)
  • Minimises divergence
• Describe state transition functions (agent functions) using a high-level script
• Describe communication as message dependencies between agent functions
  • Results in a Directed Acyclic Graph
  • Identifies synchronisation points for scheduling
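The state machine view can be illustrated with a minimal serial C++ sketch, assuming agents are stored in one list per state. `Agent`, `AgentFunction`, `StateLists` and `apply_function` are hypothetical names, not FLAME GPU's generated API. Each agent function moves agents from an input state to an output state, optionally filtered by a condition.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical agent memory, for illustration only.
struct Agent {
    int id;
    float energy;
};

// One agent function (state transition) in X-machine style.
struct AgentFunction {
    std::string name;
    std::string input_state;
    std::string output_state;
    std::function<bool(const Agent&)> condition;  // empty: applies to every agent
    std::function<void(Agent&)> body;             // updates agent memory
};

// Agents are stored in one list per state, so every agent processed by a
// function runs the same code (the GPU analogue is one kernel per function).
using StateLists = std::map<std::string, std::vector<Agent>>;

void apply_function(const AgentFunction& fn, StateLists& states) {
    // Take the input list first so input_state == output_state is also safe.
    std::vector<Agent> input = std::move(states[fn.input_state]);
    states[fn.input_state].clear();
    for (Agent& a : input) {
        if (!fn.condition || fn.condition(a)) {
            fn.body(a);                                // transition fires
            states[fn.output_state].push_back(a);
        } else {
            states[fn.input_state].push_back(a);       // stays in its state
        }
    }
}
```

Because every agent in a state list executes the same transition, the condition is the only per-agent branch, which is how the state-based description keeps divergence low.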

FLAME GPU: A Code Generation Framework
• XML Model File
  • Describe agents and communication (messages) as a model in XML
• XSLT Templates
  • Generate a simulation API from the agent descriptions
• Scripted Behaviour
  • Scripted behaviour links with the dynamically generated simulation API
• Simulation Program
  • Loads initial data and provides I/O or interactive visualisation

Code Generation using XSLT
• Powerful technique for code generation from a declarative XML model
• XSLT is a full functional programming language

XML model:

    <xagents>
      <gpu:xagent>
        <name>Circle</name>
        <memory>
          <gpu:variable>
            <type>int</type>
            <name>id</name>
          </gpu:variable>
          <gpu:variable>
            <type>float</type>
            <name>x</name>
          </gpu:variable>
          <gpu:variable>
            <type>float</type>
            <name>y</name>
          </gpu:variable>
          <gpu:variable>
            <type>float</type>
            <name>z</name>
          </gpu:variable>
          <gpu:variable>
            <type>float</type>
            <name>fx</name>
          </gpu:variable>
          <gpu:variable>
            <type>float</type>
            <name>fy</name>
          </gpu:variable>
        </memory>
        ...

XSLT template:

    <xsl:for-each select="xagents/gpu:xagent">
    struct __align__(16) xmachine_memory_<xsl:value-of select="name"/>
    {<xsl:for-each select="memory/gpu:variable">
        <xsl:value-of select="type"/><xsl:text> </xsl:text><xsl:if test="arrayLength">*</xsl:if><xsl:value-of select="name"/>;
    </xsl:for-each>
    };
    </xsl:for-each>

Generated code:

    struct __align__(16) xmachine_memory_Circle
    {
        int id;
        float x;
        float y;
        float z;
        float fx;
        float fy;
    };
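To make the template's effect concrete, here is a hedged C++ sketch that emits roughly the same text as the XSLT template from an in-memory agent description. `VariableDesc`, `AgentDesc` and `generate_agent_struct` are hypothetical names; the whitespace of the real generated output may differ.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory form of one <gpu:variable> element.
struct VariableDesc {
    std::string type;
    std::string name;
    bool is_array;  // mirrors <xsl:if test="arrayLength">; set explicitly
};

// Hypothetical in-memory form of one <gpu:xagent> element.
struct AgentDesc {
    std::string name;
    std::vector<VariableDesc> memory;
};

// Outer loop of the template: one aligned struct per agent type.
// Inner loop: one "type [*]name;" member per agent variable.
std::string generate_agent_struct(const AgentDesc& agent) {
    std::string out = "struct __align__(16) xmachine_memory_" + agent.name + "\n{\n";
    for (const VariableDesc& v : agent.memory) {
        out += "    " + v.type + " " + (v.is_array ? "*" : "") + v.name + ";\n";
    }
    out += "};\n";
    return out;
}
```

Keeping the model declarative means the same description can drive several templates (agent structures, list types, kernel wrappers) without the modeller writing any of them by hand.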

Mapping an Agent to the GPU
• Each agent function corresponds to a single GPU kernel
• Each CUDA thread represents a single agent instance
• Agent functions use a dynamically generated API
• Agent data is transparently loaded from structures of arrays

    __FLAME_GPU_FUNC__ int read_locations(xmachine_memory_bird* xmemory,
                                          xmachine_message_location_list* location_messages)
    {
        /* Get the first message */
        xmachine_message_location* location_message =
            get_first_location_message(location_messages);

        /* Repeat until there are no more messages */
        while (location_message)
        {
            /* Process the message */
            if (distance_check(xmemory, location_message))
            {
                updateSteerVelocity(xmemory, location_message);
            }

            /* Get the next message */
            location_message = get_next_location_message(location_message, location_messages);
        }

        /* Update any other xmemory variables */
        xmemory->x += xmemory->vel_x * TIME_STEP;
        /* ... */

        return 0;
    }

Array of structures:

    typedef struct agent {
        float x;
        float y;
    } xm_memory_agent_list[N];

Structure of arrays:

    typedef struct agent_list {
        float x[N];
        float y[N];
    } xm_memory_agent_list;

(Figure: memory layout of agents 0 to N under each scheme.)
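The structure-of-arrays mapping can be sketched serially in C++, assuming a hypothetical `BirdListSoA` that holds only `x` and `vel_x`. The loop index stands in for the CUDA thread index, and each iteration loads one agent's variables, runs the agent function body and stores the result back, roughly what the generated API does transparently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical structure of arrays: one contiguous array per agent variable.
// Further variables (y, vel_y, ...) would simply be further arrays.
struct BirdListSoA {
    std::vector<float> x;
    std::vector<float> vel_x;
};

// Serial stand-in for one kernel launch: 'tid' plays the role of
// blockIdx.x * blockDim.x + threadIdx.x, one agent per thread.
void move_kernel_emulated(BirdListSoA& agents, float time_step) {
    for (std::size_t tid = 0; tid < agents.x.size(); ++tid) {
        float x = agents.x[tid];              // load this agent's memory
        const float vel_x = agents.vel_x[tid];
        x += vel_x * time_step;               // agent function body
        agents.x[tid] = x;                    // store back to the list
    }
}
```

With this layout, neighbouring threads read `x[tid]` and `x[tid + 1]`, which are adjacent in memory and can be coalesced; with the array-of-structures layout the same reads are strided by the size of a whole agent structure.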

Agent Divergence and Sparsity
• Divergence: agents (threads) must be grouped
  • Good news: agents are already grouped by agent function state
  • Bad news: agents change state, so we are left with sparse lists
• Avoid sparse lists by using parallel prefix sum compaction
  • Thrust C++ library
(Figure: an agent list with flags 1 0 1 1 0 1 0 1 is compacted into a new agent list.)
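Prefix sum compaction can be sketched serially in C++: an exclusive scan over the 0/1 flags gives each surviving agent its index in the new list, and a scatter writes it there. `compact_by_flags` is a hypothetical helper; on the GPU the scan would be done with a parallel primitive, for example Thrust's `thrust::exclusive_scan` (or the whole step with `thrust::copy_if`), which this sketch does not use.

```cpp
#include <cstddef>
#include <vector>

// Stream compaction: keep the agents whose flag is 1 (e.g. still in this state).
// An exclusive prefix sum of the flags gives each kept agent its index in the
// new, dense list; a scatter then writes it there.
// Assumes agents.size() == flags.size() and that flags hold only 0 or 1.
template <typename T>
std::vector<T> compact_by_flags(const std::vector<T>& agents, const std::vector<int>& flags) {
    std::vector<int> scan(flags.size());
    int running = 0;
    for (std::size_t i = 0; i < flags.size(); ++i) {   // exclusive scan
        scan[i] = running;
        running += flags[i];
    }
    std::vector<T> compacted(static_cast<std::size_t>(running));
    for (std::size_t i = 0; i < agents.size(); ++i) {  // scatter
        if (flags[i] == 1) compacted[static_cast<std::size_t>(scan[i])] = agents[i];
    }
    return compacted;
}
```

For the flags 1 0 1 1 0 1 0 1, the exclusive scan is 0 1 1 2 3 3 4 4, so agents 0, 2, 3, 5 and 7 land in slots 0 to 4 of the new list.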

Parallelism within the model
• Behaviour consists of function layers
• Each layer is a synchronisation barrier
• Synchronisation between agents is only required when a dependency exists (communication or agent memory)
• This creates parallelism within the function layers of the model
• CUDA streams can be used to execute independent functions concurrently
(Figure: function layers, Layer 1 to Layer 3.)
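One way to derive function layers is to treat agent functions as nodes of the dependency DAG and group them level by level (Kahn's topological sort). This is a hedged C++ sketch, not FLAME GPU's scheduler: `build_function_layers` is hypothetical, functions are numbered 0 to n-1, and each dependency is a (producer, consumer) pair, such as a function that outputs a message and a function that reads it.

```cpp
#include <algorithm>
#include <stdexcept>
#include <utility>
#include <vector>

// Group agent functions into layers. Every function in a layer depends only on
// functions in earlier layers, so each layer boundary is a synchronisation
// barrier, and functions inside one layer may run concurrently.
std::vector<std::vector<int>> build_function_layers(
    int num_functions, const std::vector<std::pair<int, int>>& deps) {
    std::vector<int> indegree(num_functions, 0);
    std::vector<std::vector<int>> consumers(num_functions);
    for (const auto& d : deps) {
        consumers[d.first].push_back(d.second);
        ++indegree[d.second];
    }
    std::vector<std::vector<int>> layers;
    std::vector<int> frontier;
    for (int f = 0; f < num_functions; ++f)
        if (indegree[f] == 0) frontier.push_back(f);  // no unmet dependencies
    int placed = 0;
    while (!frontier.empty()) {
        std::sort(frontier.begin(), frontier.end());
        layers.push_back(frontier);
        placed += static_cast<int>(frontier.size());
        std::vector<int> next;
        for (int f : frontier)
            for (int c : consumers[f])
                if (--indegree[c] == 0) next.push_back(c);  // all producers done
        frontier = std::move(next);
    }
    if (placed != num_functions) throw std::runtime_error("dependency cycle: not a DAG");
    return layers;
}
```

For example, numbering `send_locations` as 0, `read_locations` as 1, `move` as 2 plus an unrelated function 3, the dependencies (0, 1) and (1, 2) give the layers {0, 3}, {1}, {2}; functions 0 and 3 share a layer and could be issued in separate CUDA streams.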

High Divergence Example
• Single agent 'cell' type
  • 5 types of cell within it
• Single message type
• Advantages
  • Large population counts (good utilisation)
  • Simple modelling (but complicated agent transition functions)
• Disadvantages
  • Lots of code divergence
  • Unnecessary message reading

Low Divergence Example
• Multiple agent types
  • A different agent type for each cell type
  • Distinction between message types
• Advantages
  • Less divergent code
  • More parallelism within the model
  • Less message reading
• Disadvantages
  • Complex dependencies
  • More complex (looking) model
  • Smaller population sizes

Parallelism within the model - performance
(Figure, two panels plotted against population size from 500 to 512,000 agents: simulation speedup of cell behaviour, log scale, and average iteration time of cell behaviour in ms, comparing the high divergence and low divergence models.)

Agent Communication
• Brute-force messaging (N-body problem)
  • Tile messages into shared memory
• Spatially distributed agents
  • Build a data structure to bin agents
    • CUDA Particles
  • Use counting sort to improve performance
• Discrete space, limited range (cellular automaton)
  • Cache results via the texture cache (good locality)
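The shared-memory tiling used for brute-force messaging can be emulated serially in C++. In this hedged sketch, `tile` stands in for the `__shared__` buffer a thread block would fill cooperatively before `__syncthreads()`; `LocationMessage` and `count_neighbours_tiled` are hypothetical names, and counting neighbours is only an example of per-message work.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical location message (also used here as an agent position).
struct LocationMessage {
    float x;
    float y;
};

// For every agent, count the messages within 'radius', reading the message
// list one tile at a time. 'tile' stands in for a CUDA __shared__ buffer.
// Assumes tile_size > 0.
std::vector<int> count_neighbours_tiled(const std::vector<LocationMessage>& agents,
                                        const std::vector<LocationMessage>& messages,
                                        float radius, std::size_t tile_size) {
    std::vector<int> counts(agents.size(), 0);
    std::vector<LocationMessage> tile;
    const float r2 = radius * radius;
    for (std::size_t start = 0; start < messages.size(); start += tile_size) {
        const std::size_t end = std::min(start + tile_size, messages.size());
        // Load one tile (on a GPU: each thread of the block loads one message).
        tile.assign(messages.begin() + static_cast<std::ptrdiff_t>(start),
                    messages.begin() + static_cast<std::ptrdiff_t>(end));
        // Every agent then reads the whole tile from fast memory.
        for (std::size_t a = 0; a < agents.size(); ++a) {
            for (const LocationMessage& m : tile) {
                const float dx = m.x - agents[a].x;
                const float dy = m.y - agents[a].y;
                if (dx * dx + dy * dy <= r2) ++counts[a];
            }
        }
    }
    return counts;
}
```

The result does not depend on `tile_size`; on a GPU the benefit is that each message is read from global memory once per block rather than once per thread.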

Spatially Distributed Communication
• Radix sorting: hash messages → sort keys using Thrust (sort by key) → reorder (scatter messages) and build the partition matrix
• Count sort: hash messages and atomic add to bin → prefix sum (global index of bins) → reorder (scatter messages)
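The counting sort path can be sketched serially in C++ under stated assumptions: messages are 2D points in a square grid of `grid_dim` by `grid_dim` cells of width `cell_size`, and `Message`, `BinnedMessages`, `hash_cell` and `count_sort_messages` are hypothetical names. On a GPU, the per-bin increment in step 1 would typically be an `atomicAdd`, whose return value (the old count) gives each message its slot within the bin; the order within a bin is then nondeterministic, unlike this serial version.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical location message: a 2D position.
struct Message {
    float x;
    float y;
};

struct BinnedMessages {
    std::vector<Message> sorted;  // messages reordered by bin
    std::vector<int> bin_start;   // bin b is sorted[bin_start[b]] .. sorted[bin_start[b + 1] - 1]
};

// Hash a position to a row-major cell index, clamping to the grid.
int hash_cell(const Message& m, float cell_size, int grid_dim) {
    const int cx = std::min(std::max(static_cast<int>(m.x / cell_size), 0), grid_dim - 1);
    const int cy = std::min(std::max(static_cast<int>(m.y / cell_size), 0), grid_dim - 1);
    return cy * grid_dim + cx;
}

BinnedMessages count_sort_messages(const std::vector<Message>& msgs, float cell_size, int grid_dim) {
    const int num_bins = grid_dim * grid_dim;
    std::vector<int> bin_of(msgs.size());
    std::vector<int> slot_in_bin(msgs.size());
    std::vector<int> count(static_cast<std::size_t>(num_bins), 0);

    // 1. Hash each message and add it to its bin (atomicAdd on a GPU).
    for (std::size_t i = 0; i < msgs.size(); ++i) {
        bin_of[i] = hash_cell(msgs[i], cell_size, grid_dim);
        slot_in_bin[i] = count[bin_of[i]]++;
    }

    // 2. Exclusive prefix sum of the counts: global start index of each bin.
    BinnedMessages out;
    out.bin_start.assign(static_cast<std::size_t>(num_bins) + 1, 0);
    for (int b = 0; b < num_bins; ++b)
        out.bin_start[b + 1] = out.bin_start[b] + count[b];

    // 3. Reorder: scatter each message to its final position.
    out.sorted.resize(msgs.size());
    for (std::size_t i = 0; i < msgs.size(); ++i)
        out.sorted[out.bin_start[bin_of[i]] + slot_in_bin[i]] = msgs[i];
    return out;
}
```

To read nearby messages, an agent would hash its own position and iterate over the `sorted` ranges given by `bin_start` for its own cell and the neighbouring cells.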

Counting Sort Performance Study
(Figure: sorting performance for 1M elements on Tesla K20, Tesla K40 and GTX 980; time (ms) against element range for Thrust sort and counting sort.)

(Figure, three panels: performance breakdown for 16k agents, time (ms) per function (send_locations, read_locations, move) with count sort and Thrust sort; performance improvement using count sort on the GTX 980, speedup against population size; performance breakdown for 4M agents, time (ms) per function with count sort and Thrust sort.)
• Counting sort is best suited to smaller population sizes
• Message reading is the bottleneck

Spatially Distributed Communication Benchmark
(Figure: time (ms, log scale) against population size from 4,096 to 4,194,304 agents for GTX 980, K40 and FLAME on the CPU.)
• 27k times faster than FLAME on the CPU with 50k agents (apples != oranges)
• 700x faster than FLAME II with 50k agents on 16 cores (using MPI, vector splitting)

Pedestrian Dynamics
• Pedestrian agents
  • Social repulsion (social forces)
  • Reynolds steering forces
  • Reciprocal Velocity Obstacles
• Navigation agents
  • Global vector field
  • Navigation graph
• Environment and goals are calculated as a weighted influence
• An extension: navigation graphs
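A weighted influence of this kind can be sketched in C++ as a weighted sum of per-pedestrian force vectors plus a goal direction sampled from a global vector field stored per grid cell. All names (`Vec2`, `InfluenceWeights`, `sample_vector_field`, `combine_influences`) and the weights are hypothetical; the talk does not specify how the influences are weighted.

```cpp
#include <algorithm>
#include <vector>

struct Vec2 {
    float x;
    float y;
};

// Hypothetical weights; the talk does not give values.
struct InfluenceWeights {
    float social;      // social repulsion from nearby pedestrians
    float steering;    // Reynolds-style steering force
    float navigation;  // goal direction from the navigation layer
};

// Look up the direction stored for the grid cell containing pos
// (a global vector field of grid_dim by grid_dim cells, row-major).
Vec2 sample_vector_field(const std::vector<Vec2>& field, int grid_dim, float cell_size, Vec2 pos) {
    const int cx = std::min(std::max(static_cast<int>(pos.x / cell_size), 0), grid_dim - 1);
    const int cy = std::min(std::max(static_cast<int>(pos.y / cell_size), 0), grid_dim - 1);
    return field[cy * grid_dim + cx];
}

// Combine the individual influences into one steering force.
Vec2 combine_influences(Vec2 social, Vec2 steering, Vec2 navigation, const InfluenceWeights& w) {
    return {w.social * social.x + w.steering * steering.x + w.navigation * navigation.x,
            w.social * social.y + w.steering * steering.y + w.navigation * navigation.y};
}
```

Keeping the navigation field separate from the pedestrians means the field can be recomputed (or swapped for a navigation graph) without changing the pedestrian behaviour.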

Conclusions
• Agent-based modelling can be used to represent complex systems at differing biological scales
• FLAME GPU is a framework for model description and CUDA code generation
• Using a state-based representation avoids divergence and allows parallelism within a model to be exploited
• Counting sort is helpful for highly divergent populations
• Visualisation is extremely cheap
