NestMC: A new multi-compartment neuronal network simulator



SLIDE 1

NestMC

A new multi-compartment neuronal network simulator Alexander Peyser (FZ-J) & Sam Yates (CSCS) November 3, 2016

SLIDE 2

NestMC

NestMC is a project to develop: a new multi-compartmental neuronal network simulator, that is optimized for HPC systems, and is easy to integrate into existing workflows. See current development at

https://github.com/eth-cscs/nestmc-proto

NestMC | 2

SLIDE 3

Who are we?

Cross-institutional collaboration As part of

SLIDE 4

Why?

Why develop a new simulator?

There are problems and models that we can’t explore with current software and systems. New HPC architectures. Adapting existing simulators to new architectures is hard.

SLIDE 5

Hard problems

Examples

Near real-time multi-compartment simulations.
‘Large’ networks: long simulations, parameter search, statistical validation.
Field potential calculations: large networks, volume visualization.

SLIDE 6

New architectures

Processor clock speed growth suddenly slowed around 2004.

[Plot: processor clock frequency (MHz, log scale from 0.1 to 10'000) against year, 1970 to 2015.]

Problem: power ∝ frequency³
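The cubic relation is the usual idealisation in which dynamic power scales as capacitance × voltage² × frequency and the supply voltage scales roughly with frequency. A minimal sketch of the resulting trade-off, assuming exactly that cubic law and perfect parallel scaling (the function names and the 20% clock reduction are illustrative, not taken from the slides):

```python
def relative_power(freq_scale, cores=1):
    """Power relative to one core at nominal clock, assuming power ∝ frequency³."""
    return cores * freq_scale ** 3


def relative_throughput(freq_scale, cores=1):
    """Ideal throughput relative to one core at nominal clock (perfect scaling assumed)."""
    return cores * freq_scale


# One core at nominal clock versus two cores clocked 20% lower.
for label, scale, cores in (("1 core, nominal", 1.0, 1), ("2 cores, 0.8x clock", 0.8, 2)):
    print(label, relative_power(scale, cores), relative_throughput(scale, cores))
```

Under these assumptions the two slower cores draw about the same power as the single fast core while offering about 1.6 times the throughput, which is why recent gains come from parallelism rather than clock speed.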

SLIDE 7

New HPC architectures

New performance gains primarily from: Highly parallel architectures (e.g. Intel KNL). Wider vector operations (e.g. AVX512). Specialized accelerator hardware: GPU, FPGA.

SLIDE 8

New HPC architectures

Prototype Human Brain Project HPC systems at Jülich

Julia: Intel many-core KNL blade.
Juron: IBM Power8 + GPU ‘fat’ node.

Efficient use demands new approaches. We no longer get good performance improvement ‘for free’.

SLIDE 9

Prototype design

Modular: components can be substituted according to internal API. Internal API: ‘thin’ API; type parameterization allows components to determine low-overhead API data structures.

[Diagram boxes: model description (NMODL & recipes); model execution loop; cell simulation; spike exchange; CPU implementation; GPU implementation; MPI implementation; thread parallel implementation. The boxes are connected through APIs.]
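One way to picture a ‘thin’ type-parameterized API is an execution loop that is generic over its two components. The sketch below is illustrative only: `CellGroup`, `SpikeExchange`, `Model`, `RegularSpiker` and `LocalExchange` are hypothetical names, not NestMC's interface, and Python type parameters merely document the contract, whereas a compiled language could resolve them at compile time.

```python
from dataclasses import dataclass
from typing import Generic, List, Protocol, Tuple, TypeVar

Spike = Tuple[int, float]  # (source cell id, spike time in ms)


class CellGroup(Protocol):
    def advance(self, t_end: float) -> List[Spike]: ...


class SpikeExchange(Protocol):
    def gather(self, local: List[Spike]) -> List[Spike]: ...


C = TypeVar("C", bound=CellGroup)
X = TypeVar("X", bound=SpikeExchange)


@dataclass
class Model(Generic[C, X]):
    """Execution loop that only knows the two thin interfaces above."""
    cells: C
    exchange: X

    def run(self, t_final: float, dt_exchange: float) -> List[Spike]:
        spikes: List[Spike] = []
        t = 0.0
        while t < t_final:
            t = min(t + dt_exchange, t_final)
            spikes.extend(self.exchange.gather(self.cells.advance(t)))
        return spikes


class RegularSpiker:
    """Toy stand-in for a cell simulation component: one cell firing at a fixed period."""
    def __init__(self, gid: int, period: float):
        self.gid, self.period, self.next_spike = gid, period, period

    def advance(self, t_end: float) -> List[Spike]:
        out = []
        while self.next_spike <= t_end:
            out.append((self.gid, self.next_spike))
            self.next_spike += self.period
        return out


class LocalExchange:
    """Toy stand-in for spike exchange on a single process (no MPI involved)."""
    def gather(self, local: List[Spike]) -> List[Spike]:
        return list(local)


model = Model(RegularSpiker(gid=0, period=2.5), LocalExchange())
print(model.run(t_final=10.0, dt_exchange=1.0))
```

Swapping `LocalExchange` for a distributed implementation, or `RegularSpiker` for a real cell solver, would leave `Model.run` untouched, which is the substitution property the slide describes.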

SLIDE 10

Prototype design — backends

Cell simulation modules share computational backends for channel and synapse state evolution. CPU-hosted finite volume cell simulation

[Diagram boxes: F.V.M. solver; F.D. solver; CPU scalar kernels; CPU vector kernels; GPU kernels; NMODL specifications. The boxes are connected through APIs.]

SLIDE 11

Prototype benchmarks

Test case

500 ms simulation.
Each cell has 350 compartments and 2000 exponential excitatory synapses.
Hodgkin–Huxley (H–H) mechanism on cell somas, passive dendrites.
Random network.
Approximately 50 Hz spiking rate.
Benchmarks run on Piz Dora, a Cray XC40 system with 36 Broadwell cores per node.
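To give a feel for the problem sizes, the per-cell figures above can be multiplied out for the two cell counts used in the strong scaling benchmark (18,432 and 147,456 cells). This is plain arithmetic on the stated numbers; the spike count treats the approximate 50 Hz rate as exact, and the function name is illustrative.

```python
COMPARTMENTS_PER_CELL = 350
SYNAPSES_PER_CELL = 2000
SIM_TIME_MS = 500
SPIKE_RATE_HZ = 50  # "approximately 50 Hz", treated as exact here


def problem_size(n_cells):
    """Total compartments, synapses and (approximate) spikes for a network of n_cells."""
    return {
        "compartments": n_cells * COMPARTMENTS_PER_CELL,
        "synapses": n_cells * SYNAPSES_PER_CELL,
        "expected_spikes": n_cells * SPIKE_RATE_HZ * SIM_TIME_MS // 1000,
    }


for n_cells in (18_432, 147_456):
    print(n_cells, problem_size(n_cells))
```

The larger case comes to roughly 52 million compartments and 295 million synapses.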

SLIDE 12

Prototype benchmarks — strong scaling

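[Plot: wall time (s, log scale from 1 to 10000) against number of nodes (1 to 256) for two problem sizes. Annotated wall times and node-hours (nh) at the ends of the curves: 18,432 cells: 8'45" (0.146 nh) and 5" (0.158 nh); 147,456 cells: 70'03" (1.17 nh) and 18" (1.25 nh).]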
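The node-hour (nh) annotations allow a quick estimate of strong scaling efficiency: with perfect scaling the same simulation costs the same node-hours at any node count. The sketch below assumes that 8'45" and 70'03" are the single-node wall times for 18,432 and 147,456 cells respectively (their ratio is almost exactly 8, matching the ratio of cell counts) and that the other two annotations are the largest runs; the function names are illustrative.

```python
def node_hours(nodes, wall_seconds):
    """Cost of a run in node-hours."""
    return nodes * wall_seconds / 3600.0


def efficiency(base_node_hours, scaled_node_hours):
    """Parallel efficiency: ideal scaling keeps node-hours constant."""
    return base_node_hours / scaled_node_hours


# Check the assumed single-node annotations against their node-hour labels.
print(round(node_hours(1, 8 * 60 + 45), 3))   # 18,432 cells, labelled 0.146 nh
print(round(node_hours(1, 70 * 60 + 3), 2))   # 147,456 cells, labelled 1.17 nh

# Efficiency from the annotated node-hours at the two ends of each curve.
print("18,432 cells:", round(efficiency(0.146, 0.158), 3))
print("147,456 cells:", round(efficiency(1.17, 1.25), 3))
```

On these numbers both problem sizes retain a little over 92% efficiency at the largest annotated node count.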

SLIDE 13

Prototype benchmarks — weak scaling

[Plot: wall time (s, axis from 255 to 275) against number of nodes (1 to 256), with the network growing from 9,216 cells to 2,359,296 cells.]
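In a weak scaling test the work per node is held fixed, so the ideal wall time is flat. The sketch below checks that the two cell counts give the same load per node at 1 and 256 nodes, and derives a rough lower bound on efficiency from the plotted axis range. The 255 s and 275 s values are axis limits, not measurements, so the bound only holds if every data point lies inside the axes; the function names are illustrative.

```python
def cells_per_node(total_cells, nodes):
    """Per-node workload; constant in a weak scaling test."""
    return total_cells / nodes


def weak_scaling_efficiency(t_base, t_scaled):
    """Ideal weak scaling keeps wall time constant, so efficiency is t_base / t_scaled."""
    return t_base / t_scaled


print(cells_per_node(9_216, 1), cells_per_node(2_359_296, 256))

# Worst case permitted by the axis range (assumption: all points lie within 255-275 s).
print(round(weak_scaling_efficiency(255.0, 275.0), 3))
```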

SLIDE 14

Prototype status

Currently implemented

Finite-volume based discretization.
Distributed model instantiation.
Spike and voltage trace output.
x86 multi-core and Intel KNL support.
Synapse and ion-channel descriptions in NMODL.
Unit and validation testing suite.
GPU support.
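As an illustration of what a finite-volume discretization involves, the sketch below advances an unbranched passive cable by implicit Euler steps. Each control volume balances capacitive, leak, axial and injected currents, which gives a tridiagonal system solved by the Thomas algorithm. This is a simplified textbook formulation under stated assumptions (no branching, no active channels, arbitrary consistent units), not NestMC's solver; all names and parameter values are hypothetical.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    d, r = list(diag), list(rhs)
    for i in range(1, n):                      # forward elimination
        w = lower[i] / d[i - 1]
        d[i] -= w * upper[i - 1]
        r[i] -= w * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x


def step(v, dt, cap, g_leak, e_leak, g_axial, i_inj):
    """One implicit Euler step for the control-volume voltages v.

    cap, g_leak, i_inj are per-volume lists; g_axial[i] couples volumes i and i+1.
    Discretizes  C dV/dt = -g_leak (V - e_leak) + axial currents + i_inj.
    """
    n = len(v)
    lower, diag, upper, rhs = ([0.0] * n for _ in range(4))
    for i in range(n):
        left = g_axial[i - 1] if i > 0 else 0.0
        right = g_axial[i] if i < n - 1 else 0.0
        lower[i], upper[i] = -left, -right
        diag[i] = cap[i] / dt + g_leak[i] + left + right
        rhs[i] = cap[i] / dt * v[i] + g_leak[i] * e_leak + i_inj[i]
    return thomas_solve(lower, diag, upper, rhs)


# Five control volumes at rest, with current injected into the first one.
n = 5
v = [-65.0] * n
for _ in range(200):
    v = step(v, dt=1.0, cap=[1.0] * n, g_leak=[0.1] * n, e_leak=-65.0,
             g_axial=[0.5] * (n - 1), i_inj=[1.0] + [0.0] * (n - 1))
print([round(x, 2) for x in v])
```

The implicit step is unconditionally stable, which is why large time steps remain usable. Branched morphologies lead to a tree-structured rather than strictly tridiagonal matrix, which this sketch does not cover.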

SLIDE 15

How does this relate to NEST?

Expertise: developers and experience from NEST go into NestMC, and what we learn from NestMC feeds back to NEST.
Infrastructure: community and legal infrastructure can be leveraged for multiple products.
Interface: similar Python interfaces can reduce the time users need to adopt multiple tools.

SLIDE 16

How does this relate to NEST?

Components: libraries, such as those for connectivity, can be shared between projects.
Formats: common formats and communication, such as NESTML and I/O formats.
Multiscaling: hybrid simulations may be built across the spectrum from neural mass models to point models to compartment models.

SLIDE 17

How is this different from NEST?

Models: solving point model ODEs is very distinct from solving large coupled compartment models.
Performance: NEST’s point neurons are memory bound, while NestMC is compute bound.
Flexibility: NEST commits to maximum flexibility across platforms; NestMC is HPC focused and targets a subset, the “easy” 90% of cases.
Science: NEST uses highly reduced models to maximize the size of simulations, which are most amenable to mathematical analysis, while NestMC will be useful for morphologically detailed simulations.

SLIDE 18

Thank you!

CSCS: Ben Cumming, Vasileios Karakasis, Stuart Yates
FZ-J: Wouter Klijn
BSC: Ivan Martinez

Contact

bcumming@cscs.ch
a.peyser@fz-juelich.de
https://eth-cscs.github.io/nestmc
https://github.com/eth-cscs/nestmc-proto

SLIDE 19

Participate

Success depends on facilitating use cases! Do you have a use case which is hard to simulate with current tools? Are there computational experiments that you wish to run, but currently cannot? We want to work closely with researchers and research groups to ensure that our designs meet real needs in the community.

SLIDE 20

The limits of frequency

Microprocessor transistor counts have grown exponentially over a very long time frame.

[Plot: microprocessor transistor count (log scale from 10'000 to 10'000'000'000) against year, 1970 to 2015.]

SLIDE 21

The limits of frequency

Processor clock speed growth suddenly slowed around 2004.

[Plot: processor clock frequency (MHz, log scale from 0.1 to 10'000) against year, 1970 to 2015.]

Problem: power ∝ frequency³

SLIDE 22

The limits of frequency

Single-thread performance growth also slows.

[Plot: normalized single-thread SPECfp (log scale from 0.03 to 300) against year, 1995 to 2015.]

Analysis thanks to Jeff Preshing, http://preshing.com

SLIDE 23

Why?

Why develop a new simulator?

There are problems and models that we can’t explore with current software and systems. New HPC architectures. Adapting existing simulators to new architectures is hard.

SLIDE 24

Existing simulators

NEURON and GENESIS have had a very long development. NEURON in particular is very large with many features. Newer simulators such as MOOSE and Brian still primarily target the workstation. GPU support for these simulators is still under development. Adapting existing large applications to highly parallel and hardware-accelerated architectures is non-trivial.

SLIDE 26

Opportunity

A new development project can: target contemporary and future HPC architectures, be co-designed to facilitate new and difficult use cases. In addition, it is much easier to adopt modern software development processes from the start.

SLIDE 27

NestMC

Two-year initial project to design and develop a multi-compartmental simulator for HPC systems.

Goals

Interoperability: simulator as library, visualization, multi-physics, multi-scale.

Extensibility: modular internal API, new integration schemes, custom spike communication, specialized cells.

Performance: HPC targeted, highly parallel, GPU and vector targets, design for scalability.

SLIDE 28

Development timeline

Now: Prototype development.
12/2016: Wind up prototype. Finalize design of initial release.
4/2017: First public release of simulator. Move to open development model.

SLIDE 29

Prototype

The prototype development allows us to explore the design space.
“Plan to throw one away” (Fred Brooks)
Currently implementing features and use cases to refine our design:
LFP live visualization → interoperability features.
Gap junctions → extensibility, internal API design.
GPU execution → performance, modularity.

SLIDE 30

Prototype design

Modular: components can be substituted according to internal API. Internal API: ‘thin’ API; type parameterization allows components to determine low-overhead API data structures.

[Diagram boxes: model description (NMODL & recipes); model execution loop; cell simulation; spike exchange; CPU implementation; GPU implementation; MPI implementation; thread parallel implementation. The boxes are connected through APIs.]

SLIDE 31

Prototype design — backends

Cell simulation modules share computational backends for channel and synapse state evolution. CPU-hosted finite volume cell simulation

[Diagram boxes: F.V.M. solver; F.D. solver; CPU scalar kernels; CPU vector kernels; GPU kernels; NMODL specifications. The boxes are connected through APIs.]

SLIDE 32

Prototype observations

Design

Abstractions over communication and threading

→ greatly simplified testing and validation.

Component architecture

→ rapid prototyping, → use-case driven API changes are limited in scope.

Functional model description

→ reproducible across different systems, → distributed instantiation.
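A functional model description can be pictured as a recipe whose answers depend only on the global cell id (gid). Each process then builds just the cells it owns, and the resulting network is the same however the cells are partitioned. The sketch below illustrates the idea with a random network; `RandomNetworkRecipe`, `connections_on` and `instantiate` are hypothetical names, and the round-robin partition and seeding scheme are assumptions made for the example.

```python
import random


class RandomNetworkRecipe:
    """Hypothetical recipe: every query is a pure function of the global cell id."""

    def __init__(self, n_cells, fan_in, seed=42):
        self.n_cells, self.fan_in, self.seed = n_cells, fan_in, seed

    def connections_on(self, gid):
        """Return (source gid, weight) pairs for the synapses onto cell `gid`."""
        # A separate, reproducible random stream per gid (illustrative seeding scheme).
        rng = random.Random(self.seed * 1_000_003 + gid)
        return [(rng.randrange(self.n_cells), rng.uniform(0.0, 1.0))
                for _ in range(self.fan_in)]


def instantiate(recipe, rank, n_ranks):
    """Build only the cells owned by `rank` under a round-robin partition."""
    return {gid: recipe.connections_on(gid)
            for gid in range(rank, recipe.n_cells, n_ranks)}


def whole_network(recipe, n_ranks):
    """Collect what every rank would build; used here only to compare partitions."""
    network = {}
    for rank in range(n_ranks):
        network.update(instantiate(recipe, rank, n_ranks))
    return network


recipe = RandomNetworkRecipe(n_cells=100, fan_in=5)
print(whole_network(recipe, 2) == whole_network(recipe, 7))
```

Because no rank needs a global view of the network, instantiation distributes naturally, and rerunning on a different number of processes reproduces the same model.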

SLIDE 33

Prototype observations

Development practices

Unit and validation testing

→ limits exposure to bugs and design errors.

Large matrix of compilers and hardware targets

→ continuous integration a necessity

‘Agile’ iterative refinement of development practices allows us to adapt our processes to our team distributed across multiple institutions and countries.
