EXADAPT Mar 2012 1
Self-Tuning Bio-Inspired Massively-Parallel Computing
Steve Furber The University of Manchester
steve.furber@manchester.ac.uk
Outline
– 63 years of progress – Many cores make light work – Building brains – The …
– filled a medium-sized room – used 3.5 kW of electrical power – executed 700 instructions per second
– fills ~3.5 mm² of silicon (130 nm) – uses 40 mW of electrical power – executes 200,000,000 instructions per second
– 5 Joules per instruction
– 0.000 000 000 2 Joules per instruction
25,000,000,000 times better than Baby!
(James Prescott Joule born Salford, 1818)
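The two energy figures follow directly from the power and instruction-rate numbers quoted for each machine. A minimal sketch of that arithmetic is given here; the function and variable names are illustrative only and do not come from the talk.

```python
def joules_per_instruction(power_watts: float, instructions_per_second: float) -> float:
    """Energy per instruction = power (W) divided by instruction rate (1/s)."""
    return power_watts / instructions_per_second


# Figures quoted on the slides: 3.5 kW at 700 instructions/s,
# and 40 mW at 200,000,000 instructions/s.
baby = joules_per_instruction(3.5e3, 700)
modern_core = joules_per_instruction(40e-3, 200e6)

# Ratio of the two energies: how much less energy each instruction now costs.
improvement = baby / modern_core

print(f"Baby:        {baby:g} J per instruction")
print(f"modern core: {modern_core:g} J per instruction")
print(f"improvement: {improvement:.3g} times")
```

The ratio is an energy-efficiency gain only; it says nothing about how much work one instruction does on either machine.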
[Figure: millions of transistors per chip (log scale, 0.001 to 100) against year (1970 to 2000), for the 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, Pentium II, Pentium III and Pentium 4]
– diminishing returns from complexity – wire vs transistor delays
– cut-and-paste – simple way to deliver more MIPS
– more transistors – more cores
… but what about the software?
– an unsolved problem – the ‘Holy Grail’ of computer science for half a century? – but imperative in the many-core world
– few complex cores, or many simple cores? – simple cores win hands-down on power-efficiency!
– a limitless supply of (free) processors – load-balancing is irrelevant – all that matters is:
– massive parallelism (10^11 neurons) – massive connectivity (10^15 synapses) – excellent power-efficiency
– low-performance components (~ 100 Hz) – low-speed communication (~ metres/sec) – adaptivity – tolerant of component failure – autonomous learning
(cf. logic gate)
(10^2 to 10^11)
‘microarchitecture’
[Figure: neuron ID plotted against simulation time (ms)]
[Figure: neuron ID plotted against simulation time (ms)]
[Figure: delay after pattern input (ms) plotted against simulation time]
Masquelier & Thorpe, 2007
– 18 ARM968 processors – to model large-scale systems of spiking neurons
with 10,000s of nodes
– over a million processors – >10^8 MIPS total
– physical and logical connectivity are decoupled
– time models itself
– processors are free – the real cost of computation is energy
Mobile DDR SDRAM interface
Multi-chip packaging by UNISEM Europe
– fault insertion – how do we test the FT feature? – fault detection – we have a problem! – fault isolation – contain the damage – reconfiguration – repair the damage
– real-time system, so checkpoint & restart inapplicable
– 3-of-6 RTZ on chip – 2-of-7 NRZ off chip
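An m-of-n code signals one symbol by activating exactly m of n wires, so a receiver can tell a symbol is complete by counting active wires. The sketch below enumerates the codewords of the two codes named above and checks how many symbols each can carry. The `symbol_table` mapping from 4-bit values to codewords is a hypothetical assignment for illustration, not the mapping used by the actual links.

```python
from itertools import combinations
from math import comb


def m_of_n_codewords(m: int, n: int) -> list:
    """Return every n-wire codeword (as an integer bit mask) with exactly m wires active."""
    words = []
    for wires in combinations(range(n), m):
        word = 0
        for wire in wires:
            word |= 1 << wire
        words.append(word)
    return sorted(words)


def is_complete(word: int, m: int) -> bool:
    """A receiver sees a complete symbol once exactly m wires are active."""
    return bin(word).count("1") == m


on_chip = m_of_n_codewords(3, 6)    # 3-of-6 code
off_chip = m_of_n_codewords(2, 7)   # 2-of-7 code

# Hypothetical assignment: use the first 16 codewords for the 16 values of a 4-bit symbol.
symbol_table = {value: word for value, word in enumerate(off_chip[:16])}

print(len(on_chip), comb(6, 3))     # number of 3-of-6 codewords
print(len(off_chip), comb(7, 2))    # number of 2-of-7 codewords
```

Both codes have more than 16 codewords, so either can carry a 4-bit value per symbol with a few codewords left over for control purposes.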
– Tx & Rx circuits have high deadlock immunity – Tx & Rx can be reset independently
[Figure: Tx and Rx link circuits, with signals din (2 phase), dout (4 phase), data, ack, ¬reset and ¬ack, and a surplus token marked]
– any processor can be Monitor Processor
– all nodes are identical at start-up
connection (0,0)
– system initialised using flood-fill
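Flood-fill start-up can be pictured as each node passing what it receives to its neighbours the first time it sees it, starting from a single origin. The sketch below is a simplified model under stated assumptions: a rectangular grid with four neighbours per node, origin `(0, 0)`, and a `faulty` set of nodes that never respond. None of these details are taken from the talk, and the real machine's link topology may differ.

```python
from collections import deque


def flood_fill(width, height, origin=(0, 0), faulty=frozenset()):
    """Return {node: hop count at which the flood first reaches it}."""
    reached = {origin: 0}
    queue = deque([origin])
    while queue:
        x, y = queue.popleft()
        # Assumed neighbourhood: east, west, north, south.
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue  # off the edge of the grid
            if nxt in faulty or nxt in reached:
                continue  # dead node, or already holds the data
            reached[nxt] = reached[(x, y)] + 1
            queue.append(nxt)
    return reached


hops = flood_fill(4, 4, faulty={(1, 1)})
print(len(hops), "nodes reached; farthest is", max(hops.values()), "hops away")
```

Because every node forwards to all of its neighbours, a single failed node is simply routed around, which is why the approach suits a machine where all nodes are identical at start-up.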
– hardware routing – ‘emergency’ routing
– permanent fault
conveyed efficiently to >1,000 inputs
use the same delivery mechanism
– MC (multicast): source routed; carry events (spikes)
– P2P (point-to-point): used for bootstrap, debug, monitoring, etc.
– NN (nearest neighbour): build address map, flood-fill code
– FR (fixed route): carry 64-bit debug data to host
– which could otherwise circulate forever
[Figure: packet formats]
MC packet: Header (8 bits; fields P, ER, TS, T, 0, -), Event ID (32 bits), Payload (32 bits)
P2P packet: Header (8 bits; fields P, SQ, TS, T, 1, -), Address (16+16 bits; Srce, Dest)
(but careful network mapping also essential)
– automatic multicasting
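[Figure: MC routing-table lookup. The Event ID is compared with a ternary pattern (0 0 1 0 X 1 1 X); the matching entry gives output bit vectors 000000010000010000 and 001001, labelled On-chip and Inter-chip.]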
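The lookup in the figure can be read as a ternary match: each table entry holds a pattern of 0, 1 and X (don't care) bits, and a matching entry supplies a bit vector saying which outputs receive a copy of the packet. The sketch below is a minimal software model of that idea, using the 8-bit pattern and the two bit strings from the figure. The names `Entry`, `parse_ternary`, `route` and `fan_out`, and the reading of the 6-bit string as link outputs and the 18-bit string as core outputs, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    key: int     # required bit values (only meaningful where mask is 1)
    mask: int    # 1 = bit must match, 0 = don't care (X)
    links: int   # assumed: one bit per inter-chip output
    cores: int   # assumed: one bit per on-chip output


def parse_ternary(pattern: str):
    """Turn a pattern such as '0 0 1 0 X 1 1 X' into a (key, mask) pair."""
    key = mask = 0
    for ch in pattern.replace(" ", ""):
        key <<= 1
        mask <<= 1
        if ch in "01":
            mask |= 1
            key |= int(ch)
        elif ch.upper() != "X":
            raise ValueError(f"unexpected character {ch!r}")
    return key, mask


def route(table, event_id: int):
    """Return the first entry whose pattern matches, or None if nothing matches."""
    for entry in table:
        if (event_id & entry.mask) == entry.key:
            return entry
    return None


def fan_out(entry: Entry) -> int:
    """Number of copies made of a matching packet (set bits in both vectors)."""
    return bin(entry.links).count("1") + bin(entry.cores).count("1")


table = [Entry(*parse_ternary("0 0 1 0 X 1 1 X"),
               links=0b001001, cores=0b000000010000010000)]

hit = route(table, 0b00101111)    # matches: the X positions may hold anything
miss = route(table, 0b10101111)   # first bit differs, so no entry matches
print(hit is not None, fan_out(hit), miss)
```

One entry with several output bits set is what makes the multicast automatic: the sender emits a single packet and the table duplicates it. What happens to a packet that matches no entry is left to the caller in this sketch.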
[Figure: mapping a problem graph (circuit) onto the machine, showing nodes, cores, synapses, the topology, and a fragment of the MC table]
SpiNNaker:
– Problem: represented as a network of nodes with a certain behaviour...
– ...problem is split into two parts...
– ...behaviour of each node embodied as an interrupt handler in code... ...compile, link... ...binary files loaded into core instruction memory...
– ...abstract problem topology... ...problem topology loaded into firmware routing tables...
– Our job is to make the model behaviour reflect reality
The code says "send message" but has no control over where the output message goes
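The split described above, with node behaviour in handler code and topology in routing tables, can be sketched in a few lines. Everything here is a hypothetical stand-in rather than the machine's software interface: `Fabric` plays the part of the routing tables, `Node.on_packet` plays the part of an interrupt handler, and the threshold behaviour is an arbitrary example.

```python
from collections import defaultdict, deque


class Fabric:
    """Stand-in for the routing tables: the topology lives here, not in node code."""

    def __init__(self):
        self.routes = defaultdict(list)   # source id -> list of destination ids
        self.nodes = {}
        self.pending = deque()

    def connect(self, src, dst):
        self.routes[src].append(dst)

    def send(self, src):
        # The sender identifies only itself; the table decides who receives the event.
        for dst in self.routes[src]:
            self.pending.append((src, dst))

    def run(self, max_events=1000):
        delivered = 0
        while self.pending and delivered < max_events:
            src, dst = self.pending.popleft()
            self.nodes[dst].on_packet(src)
            delivered += 1
        return delivered


class Node:
    """Node behaviour written as an event handler, with no knowledge of the topology."""

    def __init__(self, node_id, fabric, threshold):
        self.node_id = node_id
        self.fabric = fabric
        self.threshold = threshold
        self.count = 0
        self.fired = 0
        fabric.nodes[node_id] = self

    def on_packet(self, src):
        self.count += 1
        if self.count >= self.threshold:
            self.count = 0
            self.fired += 1
            self.fabric.send(self.node_id)   # "send message", destination unknown here


fabric = Fabric()
a = Node("a", fabric, threshold=1)
b = Node("b", fabric, threshold=2)
c = Node("c", fabric, threshold=1)
fabric.connect("a", "b")
fabric.connect("a", "c")
fabric.connect("b", "c")

fabric.send("a")          # inject two events from node a
fabric.send("a")
delivered = fabric.run()
print(delivered, b.fired, c.fired)
```

Changing the network means editing only the `connect` calls; the `Node` class is untouched, which mirrors the separation between compiled node code and loaded routing tables.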
– in each direction
bandwidth
We don’t know how to do this!
– 500 LIF neurons
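For readers unfamiliar with the term, a leaky integrate-and-fire (LIF) neuron integrates its input, leaks back towards a resting value, and emits a spike and resets when it crosses a threshold. The sketch below simulates one such neuron with a forward-Euler step. All parameter values (`tau`, `v_thresh` and so on) are hypothetical and chosen only to make the behaviour visible; they are not the parameters of the 500-neuron model referred to above.

```python
def simulate_lif(input_current, steps, dt=1.0, tau=20.0,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Simulate one LIF neuron with constant input; return the spike times."""
    v = v_rest
    spike_times = []
    for step in range(steps):
        # Forward-Euler step of dv/dt = (-(v - v_rest) + input_current) / tau
        v += dt * (-(v - v_rest) + input_current) / tau
        if v >= v_thresh:
            spike_times.append(step * dt)
            v = v_reset          # reset after the spike
    return spike_times


strong = simulate_lif(input_current=2.0, steps=100)   # settles above threshold: fires
weak = simulate_lif(input_current=0.5, steps=100)     # settles below threshold: silent
print(len(strong), "spikes with strong input;", len(weak), "with weak input")
```

With constant input the neuron fires regularly or not at all, depending on whether the input would drive the membrane value past the threshold.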
[Figure: 3-board basic unit, with FPGAs and 2x 3.1 Gbps SATA links]
– 10^3 machine: 864 cores, 1 PCB, 75 W
– 10^4 machine: 10,368 cores, 1 rack, 900 W (NB 12 PCBs for operation without aircon)
– 10^5 machine: 103,680 cores, 1 cabinet, 9 kW
– 10^6 machine: 1M cores, 10 cabinets, 90 kW
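The machine sizes above scale almost linearly, which a short calculation makes explicit. The sketch below derives chips per machine and power per core from the quoted core counts and power budgets; it assumes the 18 processors mentioned earlier are the per-chip count, and the variable names are illustrative.

```python
CORES_PER_CHIP = 18   # assumption: the "18 ARM968 processors" figure is per chip

# (label, cores, power in watts) as quoted for each machine size
machines = [
    ("10^3", 864, 75.0),
    ("10^4", 10_368, 900.0),
    ("10^5", 103_680, 9_000.0),
    ("10^6", 1_000_000, 90_000.0),
]

rows = []
for label, cores, watts in machines:
    chips = cores / CORES_PER_CHIP
    watts_per_core = watts / cores
    rows.append((label, chips, watts_per_core))
    print(f"{label} machine: {chips:,.0f} chips, {watts_per_core * 1000:.1f} mW per core")
```

The per-core figure stays close to 90 mW at every scale. It is higher than the 40 mW quoted for the processor alone because these are whole-machine power budgets.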
– Cards can be linked together
build Router tables, etc …and the next steps:
500-chip 10^4 machine (Q2 2012), 5,000-chip 10^5 machine (H2 2012), 50,000-chip 10^6 machine (end H2 2012).
challenge
multicast communications infrastructure
effectively, in the neurons themselves!