Student Instrumentation Weekly, August 2016 update: TrueNorth Kalman filter


slide-1
SLIDE 1

26/08/16

rcarney@lbl.gov

Student Instrumentation Weekly

Rebecca Carney

August 2016 update

1

TrueNorth Kalman filter

slide-2
SLIDE 2

Overview

RMD Carney 29/04/15

Recap/Intro: TrueNorth & Kalman filters
Neuron parameters and default properties
Quantifying our implementation diffs
Encoding window
Spiking precision
Understanding the residuals
Next steps

(slides 3, 6, 9, 13, 23, 39)

2

slide-3
SLIDE 3

RMD Carney 26/08/16

3

Kalman filter and simple dataset

A linear update algorithm that uses noisy measurements and previous states of a system to predict new states. The technique of combining previous state estimates and refining them with measurements produces a more accurate estimate of the state than using measurements alone.

Kalman Filter

So far, we’re only considering steady-state Kalman filters, as the covariance for the state prediction converges and can be calculated ahead of time. Our test dataset is constant-acceleration 1-dim projectile motion: given a set of noisy measurements of position, velocity, and acceleration, the Kalman filter predicts the next state.

Test dataset

From last time…

slide-4
SLIDE 4

RMD Carney 26/08/16

KF as linear update

4

Blocks of linear algebra in spiking NN

x(t + ∆t) = Ax(t) + By(t)

[Block diagram: y(t) is multiplied by B off-chip; on-chip, By(t) is summed with the recurrent term Ax(t) to produce x(t + ∆t)]

The steady-state KF has a very simple update equation for the state prediction. If we start thinking about how we would implement this algorithm on a low-power multiprocessor, it’s clear that we can pre-multiply the measurement off-chip to save space. If we do this then there is only one loop and one multiplication step in the system!

x = state prediction (position, velocity, accn)
y = measurement (position, velocity, accn)
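The update above can be sketched numerically. In the steady state A = (I - KH)F and B = K can be precomputed off-chip; the dynamics F, measurement model H, and gain K below are illustrative placeholders, not the actual TrueNorth weights.

```python
import numpy as np

# Steady-state KF: the whole filter collapses to x(t+dt) = A x(t) + B y(t),
# with A = (I - K H) F and B = K precomputed off-chip.
# F, H, K are illustrative placeholders, NOT the deck's TrueNorth weights.
dt = 1.0
F = np.array([[1, dt, 0.5 * dt**2],   # constant-acceleration dynamics
              [0, 1, dt],
              [0, 0, 1]])
H = np.eye(3)                         # measure pos, vel, accn directly
K = 0.2 * np.eye(3)                   # assumed steady-state Kalman gain

A = (np.eye(3) - K @ H) @ F           # recurrent (on-chip) weights
B = K                                 # measurement (off-chip) weights

rng = np.random.default_rng(0)
truth = np.array([0.0, 1.0, -0.01])   # true pos, vel, accn
x = np.zeros(3)                       # state estimate
for _ in range(300):
    truth = F @ truth                             # evolve the projectile
    y = truth + rng.normal(scale=0.1, size=3)     # noisy measurement
    x = A @ x + B @ y                 # the single on-chip loop
```

The single `x = A @ x + B @ y` line is the only computation that has to live on the chip; everything else is set-up.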

slide-5
SLIDE 5

RMD Carney 01/05/16

TrueNorth architecture

5

Inside TrueNorth

TrueNorth numbers

1 million programmable neurons
256 million synapses
4096 neurosynaptic cores
70 mW per chip*
Spiking rate = 1 kHz

Each core contains its own memory, connectivity map, and processing unit with ‘neuron’ capabilities. The neuron computations are carried out in synchronous logic but wait for a ‘tick’ to start. Here is a schematic representation of a core. For the purposes of this update, all you need to know is that the computation takes place in the neuron.

slide-6
SLIDE 6

RMD Carney 26/08/15

Some other things…

6

There are 3 versions of the KF I will talk about in this presentation:

  • A spiking simulation of the Kalman filter, using temporal encoding, written in python (custom)
  • A spiking simulation of the chip, supplied by IBM
  • The chip itself.

This neuromorphic setup:

  • Communicates with ‘spikes’, single bits of information
  • The chip has limited functionality and parameters; you cannot program it or implement digital logic
  • We encode floating-point values using temporal encoding: we count how many spikes occur in a given window.
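A minimal sketch of this temporal encoding (the 10-tick window is illustrative; function names are mine, not IBM's API):

```python
# Temporal encoding sketch: a value in [0, 1] becomes the number of
# spikes inside a fixed encoding window; decoding just counts them.
def encode(value, window=10):
    """Spike train of `window` ticks representing `value` in [0, 1]."""
    n_spikes = round(value * window)
    return [1] * n_spikes + [0] * (window - n_spikes)

def decode(spikes, window=10):
    return sum(spikes) / window

train = encode(0.7)
print(train)           # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(decode(train))   # 0.7
```

Note the resolution is 1/window: a 10-tick window can only distinguish values 0.1 apart, which is where the representation error discussed later comes from.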

slide-7
SLIDE 7

RMD Carney 26/08/15

From last time…

7

Major issue!

Positive to negative transition is screwed up somehow… Last time I showed that the KF implementation works on NSCS and NS1e (IBM’s chip simulator and the physical board) except for the transition from positive-to-negative (and vice versa!) values.

slide-8
SLIDE 8

RMD Carney 26/08/15

Kappa and Gamma

8

Neuron threshold reset parameters: positive and negative thresholds

The problem turned out to be the default negative reset mode. Because we encode negative values in a separate channel that gets combined with the positive channel in the addition block, we want to reset the neuron potential to zero immediately whenever it dips below zero. The default mode, however, just adds Beta (== 0 in our setup), so the potential kept decreasing and no spikes were ever fired. Once we set Kappa == 1…
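The two reset behaviours can be sketched with a toy fire-and-subtract neuron (a deliberate simplification, not the full TrueNorth neuron equation; only the parameter names follow the slide):

```python
# Toy neuron illustrating the negative-reset modes above. With Beta == 0,
# the default mode (kappa == 0) leaves a negative potential untouched, so
# it just keeps sinking; kappa == 1 clamps it back toward zero.
def run(inputs, alpha=1.0, beta=0.0, kappa=0):
    V, spikes = 0.0, 0
    for i in inputs:
        V += i
        if V >= alpha:                # positive threshold: fire and subtract
            V -= alpha
            spikes += 1
        elif V < 0:                   # negative-reset handling
            V = -beta if kappa == 1 else V + beta
    return spikes, V

inputs = [-0.5] * 4 + [0.6] * 4       # dip negative, then come back positive
print(run(inputs, kappa=0)[0])  # 0 spikes: the potential never climbs back
print(run(inputs, kappa=1)[0])  # 2 spikes: reset-to-zero recovers immediately
```

This is exactly the positive-to-negative transition symptom: with the default mode the neuron silently digs itself a hole it cannot spike out of.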

slide-9
SLIDE 9

RMD Carney 26/08/16

Positive to negative transition… resolved!

9

NSCS formulation with Kappa/Gamma == 1

… it resolves the issue! But do the chip simulator and the chip still agree? They have to agree EXACTLY if we want to trust the NSCS.

slide-10
SLIDE 10

RMD Carney 26/08/15

Quantifying the KF: NSCS vs TN

10

Chip simulator response vs. chip response

[Plot: projectile position [arb] with input measurements, NSCS state estimate, and output NS1e state estimate (top); residuals, axis ±1e-10, vs time 50–300 ms (bottom)]

They agree exactly (diff of two output files is empty)!

slide-11
SLIDE 11

RMD Carney 26/08/15

11

Quantifying the KF: NSCS vs pySpiking

Chip simulator response vs. spiking python response: non-identical!

[Plot: projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.01, vs time 50–250 ms (bottom)]

BUT! The NSCS and pySpiking responses are NOT identical. There are no random number throws in this formulation and no noise, so why are these different? This is very important: if these are different it means we’ve misunderstood something about how the chip works!

slide-12
SLIDE 12

RMD Carney 26/08/15

12

Quantifying the KF: NSCS vs pySpiking

The problem with time-encoded spiking networks…

[Cartoon: successive encoding windows of t=10 ticks, with spike trains overlapping the window boundaries]

It turns out that this stark difference is due to how the input data is encoded into spike trains. For an encoding window of 10 ticks, the value 0.7 would be expressed as 7 ticks at the end of the window in NSCS, and 7 ticks at the beginning of the window in the Python simulation.
This matters because there is no way to explicitly reset the neuron potentials at the end of the encoding window, and in the addition unit two values added together will likely overlap into the next window (see cartoon). In general this is a problem even if we prepend the spikes: large values that do not finish spiking by the end of the encoding frame will fall into the next time window and a smearing effect happens. Also, because the neuron potentials are likely not multiples of the threshold, there will be some remainder stored in the neuron potential. This will build up over time, introducing additional spikes in the wrong window. Because of the recurrent connection there is no way to avoid this.
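The spill-over effect can be reproduced with a toy adder neuron (an assumed simplification: one potential, fire-and-subtract, at most one output spike per tick; this is not the exact TrueNorth neuron model):

```python
# Toy demo of encoding-window spill-over: adding 0.6 and 0.7 (sum 1.3)
# needs 13 output spikes, but only 10 fit in a 10-tick window, so the
# excess 0.3 leaks into the next window and corrupts whatever follows.
def encode(value, window=10):
    n = round(value * window)
    return [1] * n + [0] * (window - n)      # spikes prepended to window

def adder(a, b, threshold=0.5, weight=0.5):
    V, out = 0.0, []
    for sa, sb in zip(a, b):
        V += weight * (sa + sb)              # integrate both inputs
        if V >= threshold:                   # at most one spike per tick,
            V -= threshold                   # excess stays in the potential
            out.append(1)
        else:
            out.append(0)
    return out

window = 10
a = encode(0.6, window) + [0] * window       # value only in window 1
b = encode(0.7, window) + [0] * window
out = adder(a, b)
print(sum(out[:window]) / window)   # 1.0 -> window 1 saturates
print(sum(out[window:]) / window)   # 0.3 -> the rest smears into window 2
```

With weights that are not exact multiples of the threshold, a sub-threshold remainder would also be left in `V` at the window boundary, which is the second effect described above.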

But does using the same encoding scheme make python and NSCS identical?

slide-13
SLIDE 13

RMD Carney 26/08/16

13

No difference!

Quantifying the KF: NSCS vs pySpiking

[Plot: projectile position [arb] with KF position, velocity, and accn, measurements, and NSCS estimate (top); residuals, axis ±0.01, vs time 50–300 ms (bottom)]

Yes! (A diff on the two outputs was empty).

slide-14
SLIDE 14

RMD Carney 26/08/16

Representation error

14

How does the encoding window change the magnitude of the representation error?

[Plot: NSCS KF vs numerical KF for 1D projectile - 10k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.002, vs time 50–300 ms (bottom)]

Now that we understand the difficulties with the encoding scheme, it would be interesting to see how the spiking simulations compare to the numerical KF with increasing encoding windows.

As the encoding window —> infinity, the residuals should approach the plot above.
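The expected scaling can be checked numerically before the scan (assuming idealized rounding and ignoring the overlap effects above): temporal encoding quantizes each value to the nearest 1/window, so the worst-case representation error should shrink as 0.5/window.

```python
import random

# Worst-case quantization error of temporal encoding vs window length:
# a value is represented as round(value * window) / window, so the
# error is bounded by 0.5 / window and shrinks as the window grows.
def represent(value, window):
    return round(value * window) / window

random.seed(0)
values = [random.random() for _ in range(1000)]   # sweep of values in [0, 1]
for window in (10, 20, 50, 100, 200, 1000):
    worst = max(abs(v - represent(v, window)) for v in values)
    print(window, worst)
```

Each printed bound is at most 0.5/window, which mirrors the shrinking residual axes in the scan that follows.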

slide-15
SLIDE 15

RMD Carney 26/08/15

15

Scan encoding window

[Plot: NSCS KF vs numerical KF for 1D projectile - 10 ms encoding window. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.3, vs time 50–300 ms (bottom)]

10 spikes, TN weights, python

Can clearly see the max. transition in units of ticks

slide-16
SLIDE 16

RMD Carney 26/08/15

16

[Plot: NSCS KF vs numerical KF for 1D projectile - 20 tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.1, vs time 50–300 ms (bottom)]

Scan encoding window

20 spikes, TN weights, python

NB: The residuals scale changing

slide-17
SLIDE 17

RMD Carney 26/08/15

17

[Plot: NSCS KF vs numerical KF for 1D projectile - 50 tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.05, vs time 50–300 ms (bottom)]

Scan encoding window

50 spikes, TN weights, python

slide-18
SLIDE 18

RMD Carney 26/08/15

18

[Plot: NSCS KF vs numerical KF for 1D projectile - 100 tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.02, vs time 50–300 ms (bottom)]

Scan encoding window

100 spikes, TN weights, python

slide-19
SLIDE 19

RMD Carney 26/08/15

19

[Plot: NSCS KF vs numerical KF for 1D projectile - 200 tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.01, vs time 50–300 ms (bottom)]

Scan encoding window

200 spikes, TN weights, python

slide-20
SLIDE 20

RMD Carney 26/08/15

20

[Plot: NSCS KF vs numerical KF for 1D projectile - 1k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.005, vs time 50–300 ms (bottom)]

Scan encoding window

1k spikes, TN weights, python

slide-21
SLIDE 21

RMD Carney 26/08/15

21

[Plot: NSCS KF vs numerical KF for 1D projectile - 2k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.002, vs time 50–300 ms (bottom)]

Scan encoding window

2k spikes, TN weights, python

A trend emerging?

slide-22
SLIDE 22

RMD Carney 26/08/15

22

[Plot: NSCS KF vs numerical KF for 1D projectile - 5k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.002, vs time 50–300 ms (bottom)]

5k spikes, TN weights, python

Scan encoding window

slide-23
SLIDE 23

RMD Carney 26/08/15

10k spikes, TN weights, python

23

[Plot: NSCS KF vs numerical KF for 1D projectile - 10k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.002, vs time 50–300 ms (bottom)]

Scan encoding window

There is clearly a trend! Can we get rid of it with full precision weights?

slide-24
SLIDE 24

RMD Carney 26/08/15

With full precision weights: spiking

24

[Plot: NSCS KF vs numerical KF for 1D projectile - 10k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.002, vs time 50–300 ms (bottom)]

Improvement but not identical

Situation is improved somewhat, but the systematic deviation is still present. What happens if we flip the dynamics sign?

Acc = -0.7, Vel = +1

slide-25
SLIDE 25

RMD Carney 26/08/15

25

[Plot: NSCS KF vs numerical KF for 1D projectile - 10k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.002, vs time 50–300 ms (bottom)]

Same dynamics, reverse accn and vel sign

Different dynamics

Acc = +0.7, Vel = -1

The residuals pattern is also flipped over the x-axis! What happens if we make the magnitude smaller?

slide-26
SLIDE 26

RMD Carney 26/08/15

26

[Plot: NSCS KF vs numerical KF for 1D projectile - 10k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.002, vs time 50–300 ms (bottom)]

Different dynamics

Acc = +0.4, Vel = -0.5 Smaller magnitude dynamics, reverse accn and vel sign

The residuals are also smaller. Is the same true for the positive dynamics? What if the accn is 0?

slide-27
SLIDE 27

RMD Carney 26/08/15

27

[Plot: NSCS KF vs numerical KF for 1D projectile - 10k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.002, vs time 50–300 ms (bottom)]

Different dynamics

Acc = 0, Vel = -0.8 All values negative or 0: no transitions

Errors look bad again! Try all positive…

slide-28
SLIDE 28

RMD Carney 26/08/15

Different dynamics

28

[Plot: NSCS KF vs numerical KF for 1D projectile - 10k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.002, vs time 50–300 ms (bottom)]

Acc = +0.2, Vel = -0.1 Small magnitude values, all positive

Situation looks good and starts deviating near the end. Up till now we’ve had an ideal (and unrealistic) KF. What happens when we add noise back in? Will the deviation scale linearly with noise?

slide-29
SLIDE 29

RMD Carney 26/08/16

Reintroducing noise

29

Process noise = 1E-3, measurement noise = 1E-2

[Plot: NSCS KF vs numerical KF for 1D projectile - 10k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.02, vs time 50–300 ms (bottom)]

Adding process and measurement noise back into the system increases the deviations by 10%! There appears to be some correlation in the way the accn/pos residuals react. If we remove their correlation from the weight matrix, what happens?

slide-30
SLIDE 30

RMD Carney 26/08/16

30

Removing the accn/pos correlation actually makes things worse

[Plot: NSCS KF vs numerical KF for 1D projectile - 10k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.02, vs time 50–300 ms (bottom)]

No acc/pos correlations?

This makes the situation worse! If it’s the remainder effect in the spiking neurons again, perhaps we can see that by plotting the neuron potential?

slide-31
SLIDE 31

RMD Carney 26/08/16

What do the neuron potentials look like?

31

Nothing obvious to discern from the neuron potentials

[Plots: neuron potential [arb] vs ticks (up to 1e6) for the position, velocity, and acceleration Ax neurons, one panel per weight (a10, a11, a12)]

These plots show the neuron potentials for all 3 weights for the pos/vel/accn inputs (Ax in the KF equation). I am only plotting the neuron potential at the beginning of each encoding window, i.e. every 10k-th point (otherwise it is very hard to see what is happening). This introduces some aliasing, so don’t read too much into the pattern.
The point of these plots is that there are INDEED remainders in the neuron potentials at the start of a new encoding window.

slide-32
SLIDE 32

RMD Carney 26/08/16

Vary the noise

32

What to do next?

Taking a top-down approach: varying the dynamics hasn’t told us much about how the behaviour of the dispersion changes, so perhaps varying the process and measurement noise will give us an idea?

slide-33
SLIDE 33

RMD Carney 26/08/16

O(1E-10) process & meas. noise

33

Very small noise: still a trend

[Plot: NSCS KF vs numerical KF for 1D projectile - 10k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.002, vs time 50–300 ms (bottom)]

For very small, identical noise, we still see a trend in the residuals.

slide-34
SLIDE 34

RMD Carney 26/08/16

34

Larger noise

O(1E-3) process & meas. noise

[Plot: NSCS KF vs numerical KF for 1D projectile - 10k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.002, vs time 50–300 ms (bottom)]

Increasing the magnitude of the noise doesn’t appear to change the shape of the residuals.

slide-35
SLIDE 35

RMD Carney 26/08/16

35

Large noise

[Plot: NSCS KF vs numerical KF for 1D projectile - 10k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.002, vs time 50–300 ms (bottom)]

O(1E-1) process & meas. noise

Even with noise ~10% of the signal, the deviations in the residuals are about the same! What happens if we don’t use identical noise?

slide-36
SLIDE 36

RMD Carney 26/08/16

36

Don’t be tempted to think this is spiking vs. truth!

[Plot: NSCS KF vs numerical KF for 1D projectile - 10k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.002, vs time 50–300 ms (bottom)]

O(1E-3) process & O(1E-1) meas. noise

A massive difference! Here we are saying that our process is deviating slightly from what we expect it to (crosswinds buffeting the ball?) but that we’re using a very poor measurement device. What happens if we reverse the trend?
slide-37
SLIDE 37

RMD Carney 26/08/16

37

The systematic variation has almost entirely disappeared!

[Plot: NSCS KF vs numerical KF for 1D projectile - 10k tick encoding window, 1ms tick. Projectile data [arb] (position, velocity, accn input, NSCS) (top); unscaled residuals [arb], axis ±0.002, vs time 50–300 ms (bottom)]

O(1E-1) process & O(1E-3) meas. noise

The trend has almost disappeared! Here we are saying that our process is deviating a lot from what we expect (hurricanes buffeting the ball?) but that we’re using a really accurate measurement device. What is going on??

slide-38
SLIDE 38

RMD Carney 26/08/16

Weighted inputs

38

How do the process/meas. noise affect the Kalman gain?

[Plot: effect of dynamics variance and meas. noise - 10k tick encoding window, 1ms tick. Comparison of weighted inputs [arb] for Dy=1e-3, meas=1e-1 and Dy=1e-1, meas=1e-3 (top); residuals [arb], axis ±0.01, vs ticks 50–300 ms (bottom)]

Process noise and measurement noise are combined in the Kalman filter in a non-linear way to produce the Kalman gain term. The Kalman gain is multiplied by the input measurement. So what does ‘By’ look like in the two cases?
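The two noise settings can be compared concretely by iterating the discrete Riccati recursion to a steady-state gain (a hedged sketch: the 1-D constant-acceleration model and diagonal noise covariances are assumptions, not the deck's exact setup):

```python
import numpy as np

# Iterate the predict/update covariance recursion until the Kalman gain
# settles, then compare the recurrent weights A = (I - K H) F for the
# two noise settings on this slide.
def steady_gain(F, H, Q, R, iters=500):
    n = F.shape[0]
    P = np.eye(n)
    K = np.zeros((n, n))
    for _ in range(iters):
        P = F @ P @ F.T + Q                            # predict covariance
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # gain
        P = (np.eye(n) - K @ H) @ P                    # update covariance
    return K

dt = 1.0
F = np.array([[1, dt, 0.5 * dt**2],
              [0, 1, dt],
              [0, 0, 1]])
H = np.eye(3)
for q, r in ((1e-3, 1e-1), (1e-1, 1e-3)):
    K = steady_gain(F, H, q * np.eye(3), r * np.eye(3))
    A = (np.eye(3) - K @ H) @ F                        # recurrent weights
    print(f"Q={q}, R={r}: max |A| = {np.abs(A).max():.3f}")
```

Large process noise relative to measurement noise drives K toward the identity and the recurrent weights A toward zero, which is exactly the regime where the memory effect is small.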

slide-39
SLIDE 39

RMD Carney 26/08/16


39

[Plot: effect of dynamics variance and meas. noise - 10k tick encoding window, 1ms tick. Comparison of weighted inputs [arb] (recurrent term Ax) for Dy=1e-3, meas=1e-1 and Dy=1e-1, meas=1e-3, vs ticks [ms]]

How do the process/meas. noise affect the recurrent term?

The Kalman gain term is subtracted from the dynamics matrix to produce the weight matrix for the recurrent connections. So what does ‘Ax’ look like in the two cases?

slide-40
SLIDE 40

RMD Carney 26/08/16

Moving forward…

40

We know that the memory effect could be a major problem, but we need to quantify it further

The Kalman filter is just showing us that large values inside the chip begin to introduce a systematic error due to the memory effect. When the process noise is large wrt the measurement noise, the recurrent matrix weights are small and so the memory effect is small. However, it is not likely this will be a common situation in real life.

What does this tell us?

Whilst it would be good to further probe this divergence from the performance of the numerical KF, we cannot solve it. I previously mentioned that we have formulated a ‘parallel’ version of the KF for TN. Instead of encoding values using time, it encodes them using neuron populations, very similar to a flash ADC. I will talk about this in detail next month. This formulation should remove the encoding-window overlap problem.
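By analogy with a flash ADC, the population version might look like this (a toy sketch; the neuron count and thresholds are illustrative, not the actual TN formulation):

```python
# Population ('parallel') encoding sketch: one neuron per quantization
# level, each with a successively higher threshold, so a value is read
# out in a single tick as the count of neurons that fire, like a flash ADC.
def population_encode(value, n_neurons=8):
    thresholds = [(i + 1) / n_neurons for i in range(n_neurons)]
    return [1 if value >= t else 0 for t in thresholds]

def population_decode(spikes, n_neurons=8):
    return sum(spikes) / n_neurons

code = population_encode(0.7)
print(code)                      # [1, 1, 1, 1, 1, 0, 0, 0]
print(population_decode(code))   # 0.625 (resolution set by neuron count)
```

Because every value is delivered within one tick, there is no encoding window for spikes to spill out of, which is why this formulation should sidestep the overlap problem.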

What to do next?

IBM have given me the name of the parameter that allows us to set the tick rate: tickPeriod. Currently my edits are getting overwritten, but hopefully that will be resolved shortly.

Questions?