Topics in Brain Computer Interfaces: Automated Spike Sorting


SLIDE 1

Topics in Brain Computer Interfaces
CS295-7
Professor: Michael Black
TA: Frank Wood
Spring 2005

Automated Spike Sorting

SLIDE 2

Today

  • Particle Filter Homework Discussion and Review
  • Kalman Filter Review
  • PCA Introduction
  • EM Review
  • Spike Sorting
SLIDE 3

Particle Filtering Movies

SLIDE 4

Homework Results?

  • Better than CC_x = 0.5, CC_y = 0.8? How?
  • What state estimator did you use (ML/E[])? Why?
  • When did you estimate the state?
  • Particle re-sampling schedule?
  • Remaining questions?
  • Initial state estimate?
  • How did the homework synthesize with the lecture notes and readings?

SLIDE 5

Viewing the Bayesian Recursion after implementing Particle Filtering

$P(\vec{x}_k \mid \vec{x}_{k-1})$ and $P(\vec{z}_k \mid \vec{x}_k)$ model the system and the observations: $\vec{x}_k$ given $\vec{x}_{k-1}$, and $\vec{z}_k$ given $\vec{x}_k$.

Start here with particles representing the posterior:
$$P(\vec{x} \mid \vec{z}) = \frac{P(\vec{z} \mid \vec{x})\, P(\vec{x})}{P(\vec{z})}$$

Predict with the system model:
$$P(\vec{x}_1 \mid \vec{z}) = \int P(\vec{x}_1 \mid \vec{x})\, P(\vec{x} \mid \vec{z})\, \partial\vec{x}$$

Update with the observation model:
$$P(\vec{x}_1 \mid \vec{z}, \vec{z}_1) = \frac{P(\vec{z}_1 \mid \vec{x}_1)\, P(\vec{x}_1 \mid \vec{z})}{P(\vec{z}_1 \mid \vec{z})}$$

Predict again (system model):
$$P(\vec{x}_2 \mid \vec{z}, \vec{z}_1) = \int P(\vec{x}_2 \mid \vec{x}_1)\, P(\vec{x}_1 \mid \vec{z}, \vec{z}_1)\, \partial\vec{x}_1$$

Update again (observation model):
$$P(\vec{x}_2 \mid \vec{z}, \vec{z}_1, \vec{z}_2) = \frac{P(\vec{z}_2 \mid \vec{x}_2)\, P(\vec{x}_2 \mid \vec{z}, \vec{z}_1)}{P(\vec{z}_2 \mid \vec{z}, \vec{z}_1)}$$

$$\vdots$$
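To make the recursion concrete, here is a minimal bootstrap particle filter in MATLAB. This is a sketch, not the homework solution: the 1-D random-walk system model, the Gaussian observation model, and the noise levels sw and sq are all assumptions for illustration, and z is taken to be a pre-existing vector of observations.

    n = 1000; sw = 0.5; sq = 1.0;            % particle count, assumed noise std devs
    x = randn(n,1);                          % particles representing the posterior
    xhat = zeros(1,numel(z));
    for k = 1:numel(z)
        x = x + sw*randn(n,1);               % predict: sample from P(x_k | x_{k-1})
        w = exp(-(z(k) - x).^2/(2*sq^2));    % weight: observation model P(z_k | x_k)
        w = w / sum(w);                      % normalize weights
        xhat(k) = w' * x;                    % E[x_k | z_1..z_k] state estimate
        x = x(randsample(n, n, true, w));    % resample (Statistics Toolbox)
    end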

SLIDE 6

Next Homework: The Kalman Filter

  • Closed form solution to recursive Bayesian estimation where the observation and state models are linear + Gaussian noise.
  • Seminal paper published in 1960: R.E. Kalman, "A New Approach to Linear Filtering and Prediction Problems"

State model: $\vec{x}_k = A\vec{x}_{k-1} + \vec{w}_k$, with $\vec{w}_k \sim N(0, W)$
Observation model: $\vec{z}_k = H\vec{x}_k + \vec{q}_k$, with $\vec{q}_k \sim N(0, Q)$
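As a quick illustration, data can be simulated directly from these two models; a minimal sketch with made-up scalar values for A, H, W, and Q:

    A = 1; H = 1; W = 0.01; Q = 0.25; T = 100;   % assumed 1-D parameters
    x = zeros(1,T); z = zeros(1,T);
    z(1) = H*x(1) + sqrt(Q)*randn;
    for k = 2:T
        x(k) = A*x(k-1) + sqrt(W)*randn;         % state model
        z(k) = H*x(k) + sqrt(Q)*randn;           % observation model
    end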

SLIDE 7

The Kalman Filter Algorithm

Initial estimates of $\hat{x}_{k-1}$ and $P_{k-1}$, then iterate:

Time Update:
$$\hat{x}_k^- = A\hat{x}_{k-1} \qquad \text{(prior estimate)}$$
$$P_k^- = A P_{k-1} A^T + W \qquad \text{(error covariance)}$$

Measurement Update:
$$K_k = P_k^- H^T (H P_k^- H^T + Q)^{-1} \qquad \text{(Kalman gain)}$$
$$\hat{x}_k = \hat{x}_k^- + K_k(z_k - H\hat{x}_k^-) \qquad \text{(posterior estimate)}$$
$$P_k = (I - K_k H) P_k^- \qquad \text{(error covariance)}$$

Welch and Bishop 2002
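These updates translate directly into code. A minimal sketch of one filter step in MATLAB, using the notation above (A, H, W, Q as on Slide 6; the function name kalman_step is just illustrative):

    function [xhat, P] = kalman_step(xhat, P, z, A, H, W, Q)
        % Time update
        xhat_m = A * xhat;                       % prior estimate
        P_m = A * P * A' + W;                    % prior error covariance
        % Measurement update
        K = (P_m * H') / (H * P_m * H' + Q);     % Kalman gain
        xhat = xhat_m + K * (z - H * xhat_m);    % posterior estimate
        P = (eye(size(P,1)) - K * H) * P_m;      % posterior error covariance
    end

Running this over the z simulated on Slide 6, starting from xhat = 0 and P = 1, recovers the state trajectory from the noisy observations.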

SLIDE 8

Where do these equations come from?

  • Find an unbiased minimum variance estimator of the state at time k+1 of the form:
$$\hat{x}_{k+1} = K'_{k+1}\hat{x}_k + K_{k+1}\vec{z}_{k+1}$$
  • For $\hat{x}_{k+1}$ to be unbiased means:
$$E[\vec{x}_{k+1} - \hat{x}_{k+1}] = 0$$

We'll look at this (the unbiasedness condition) today. The minimum variance part is much trickier; a link to a full derivation is on the web.

Excerpted and modified from aticourses.com

SLIDE 9

Remember from the previous slide

$$\hat{x}_{k+1} = K'_{k+1}\hat{x}_k + K_{k+1}\vec{z}_{k+1}, \qquad E[\vec{x}_{k+1} - \hat{x}_{k+1}] = 0$$

Unbiased estimate: substitute the observation model $\vec{z}_{k+1} = H\vec{x}_{k+1} + \vec{q}_{k+1}$ and then the state model $\vec{x}_{k+1} = A\vec{x}_k + \vec{w}_k$:

$$E[K'_{k+1}\hat{x}_k + K_{k+1}\vec{z}_{k+1} - \vec{x}_{k+1}] = 0$$
$$E[K'_{k+1}\hat{x}_k + K_{k+1}(H\vec{x}_{k+1} + \vec{q}_{k+1}) - \vec{x}_{k+1}] = 0$$
$$E[K'_{k+1}(\hat{x}_k - \vec{x}_k) + (K'_{k+1} - A + K_{k+1}HA)\vec{x}_k + (K_{k+1}H - I)\vec{w}_k + K_{k+1}\vec{q}_{k+1}] = 0$$

Trick alert! $E[\hat{x}_k - \vec{x}_k] = 0$ because the previous estimate is unbiased, and the noise terms have zero mean, so:

$$(K'_{k+1} - A + K_{k+1}HA)\, E[\vec{x}_k] = 0$$
$$\Rightarrow K'_{k+1} - A + K_{k+1}HA = 0$$
$$\Rightarrow K'_{k+1} = (I - K_{k+1}H)A$$

Excerpted and modified from aticourses.com

SLIDE 10

Pulling it together (a bit)

Remember from the previous slide:
$$K'_{k+1} = (I - K_{k+1}H)A, \qquad \hat{x}_{k+1} = K'_{k+1}\hat{x}_k + K_{k+1}\vec{z}_{k+1}$$

Substituting the first into the second:
$$\hat{x}_{k+1} = (I - K_{k+1}H)A\hat{x}_k + K_{k+1}\vec{z}_{k+1}$$
$$\hat{x}_{k+1} = A\hat{x}_k + K_{k+1}(\vec{z}_{k+1} - HA\hat{x}_k)$$

Can get the Kalman gain by minimizing the variance of the estimation error:
$$K_k = P_k^- H^T (H P_k^- H^T + Q)^{-1} \qquad \text{(Kalman gain)}$$
$$\hat{x}_k = \hat{x}_k^- + K_k(z_k - H\hat{x}_k^-) \qquad \text{(posterior estimate)}$$
$$P_k = (I - K_k H) P_k^- \qquad \text{(error covariance)}$$

Excerpted and modified from aticourses.com

SLIDE 11

The Kalman Filter Algorithm (a repeat of Slide 7): initial estimates of $\hat{x}_{k-1}$ and $P_{k-1}$, then iterate the time update and measurement update. Welch and Bishop 2002.

SLIDE 12

Good time for a Break

  • Changing gears to PCA/EM/Mixture Modeling
SLIDE 13

Principal Component Analysis (PCA)

"The central idea of [PCA] is to reduce the dimensionality of a data set consisting of a large number of interrelated variables, while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components (PCs), which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables." - I.T. Jolliffe

  • Example applications
    – Compression
    – Noise Reduction
    – Dimensionality Reduction
      • Eigenfaces, etc.
SLIDE 14

The Gist of PCA

[Figure: 2-D Gaussian cloud of data points, and a histogram of the data projected onto the first PC]

num_points = 500;
angle = pi/4;
variances = [5 0; 0 .5]'
rotation = [cos(angle) -sin(angle); sin(angle) cos(angle)]
data = rotation*(variances*randn(2,num_points));
[pcadata,eigenvectors,eigenvalues] = pca(data,2);
recovered_rotation = eigenvectors
recovered_variances = sqrt(eigenvalues)

variances =
    5.0000         0
         0    0.5000
rotation =
    0.7071   -0.7071
    0.7071    0.7071
recovered_rotation =
   -0.7042   -0.7100
   -0.7100    0.7042
recovered_variances =
    5.1584         0
         0    0.4934
SLIDE 15

The Math of PCA

  • First step: Find a linear function (a projection) $\vec{\alpha}_1^T\vec{x}$ of a R.V. $\vec{x}$ that has maximum variance, i.e.
$$\vec{\alpha}_1^T\vec{x} = \alpha_{11}x_1 + \alpha_{12}x_2 + \cdots + \alpha_{1p}x_p = \sum_{j=1}^{p}\alpha_{1j}x_j$$
  • Second through $k$-th step: Find the subsequent uncorrelated projections $\vec{\alpha}_2, \vec{\alpha}_3, \ldots, \vec{\alpha}_k$ with maximum variance, etc.
  • Continue until "enough variance" is accounted for, or up to $k = p$, the dimensionality of $\vec{x}$.

SLIDE 16

Finding a principal component (PC)

  • Maximize the variance of the projection:
$$\arg\max_{\vec{\alpha}_1} E\big[(\vec{\alpha}_1^T\vec{x})(\vec{\alpha}_1^T\vec{x})^T\big] = \arg\max_{\vec{\alpha}_1}\, \vec{\alpha}_1^T E[\vec{x}\vec{x}^T]\vec{\alpha}_1 = \arg\max_{\vec{\alpha}_1}\, \vec{\alpha}_1^T \Sigma\, \vec{\alpha}_1$$
  • Easy to do! Set $\vec{\alpha}_1 = \infty$.
  • Solution: constrain $\vec{\alpha}_1^T\vec{\alpha}_1 = 1$.

SLIDE 17

Constrained Optimization

  • Use a Lagrange multiplier and differentiate:
$$\frac{\partial}{\partial\vec{\alpha}_1}\big(\vec{\alpha}_1^T\Sigma\vec{\alpha}_1 - \lambda(\vec{\alpha}_1^T\vec{\alpha}_1 - 1)\big) = 0$$
$$\Sigma\vec{\alpha}_1 - \lambda\vec{\alpha}_1 = 0$$
$$(\Sigma - \lambda I)\vec{\alpha}_1 = 0, \quad \text{or} \quad \Sigma\vec{\alpha}_1 = \lambda\vec{\alpha}_1$$
  • So $\vec{\alpha}_1$ is an eigenvector of $\Sigma$ and $\lambda$ is the corresponding eigenvalue.
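The pca(data,2) call in the Slide 14 demo is not a MATLAB built-in; a minimal implementation consistent with this derivation would eigendecompose the sample covariance. This is an illustrative sketch whose signature simply mirrors the demo's usage:

    function [pcadata, eigenvectors, eigenvalues] = pca(data, q)
        % data: d x n matrix of n observations; q: number of PCs to keep
        centered = data - mean(data, 2);                    % remove the mean
        Sigma = centered * centered' / (size(data,2) - 1);  % sample covariance
        [V, D] = eig(Sigma);                                % Sigma*v = lambda*v
        [~, order] = sort(diag(D), 'descend');              % order PCs by variance
        eigenvectors = V(:, order(1:q));                    % the alpha_k's
        eigenvalues = D(order(1:q), order(1:q));            % the lambda_k's
        pcadata = eigenvectors' * centered;                 % project onto the PCs
    end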

SLIDE 18

Optimal Properties of PC's

  • The second, third, etc. PC's can be found using a similar derivation, subject of course to additional constraints.
  • It can be shown that choosing B' to be the first q eigenvectors of the covariance matrix $\Sigma$ of $\vec{x}$ gives the orthonormal linear transformation
$$\vec{y} = B'\vec{x}$$
that maximizes the covariance of $\vec{y}$.

SLIDE 19

The Gist of PCA (the demo from Slide 14, repeated).
SLIDE 20

EM for Gaussian Mixture Models

  • Expectation Maximization is an iterative method for estimating the parameters of data distributions with missing or unobserved data.
  • In our case, the "missing data" is the data points' class memberships.
  • This represents a generative view with latent structure.

$$\log(P(\mathcal{X}, \mathcal{Y} \mid \Theta)) = \sum_{i=1}^{n} \log\big(P(y_i)\, p(x_i \mid \theta_{y_i})\big)$$

where $P(y_i)$ is shorthand for "the prior probability of the class labeled $y_i$" and $p(x_i \mid \theta_{y_i})$ is the probability of $x_i$ assuming that it came from that class.
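A quick numeric illustration of this complete-data log-likelihood in MATLAB. This is a sketch: the priors, means, covariances, points, and labels are all made up, and mvnpdf comes from the Statistics Toolbox:

    prior = [0.6 0.4];                         % P(y = 1), P(y = 2)
    mu = [0 0; 3 3];                           % class means (rows)
    Sig = cat(3, eye(2), eye(2));              % class covariances
    X = [0.1 0.2; 2.9 3.1; -0.3 0.4];          % n = 3 points, d = 2
    y = [1 2 1];                               % assumed class memberships
    logL = 0;
    for i = 1:numel(y)                         % sum_i log( P(y_i) p(x_i | theta_{y_i}) )
        logL = logL + log(prior(y(i)) * mvnpdf(X(i,:), mu(y(i),:), Sig(:,:,y(i))));
    end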

SLIDE 21

Closed form E & M steps for GMM

$$\alpha_l^{new} = \frac{1}{N}\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)$$

$$\mu_l^{new} = \frac{\sum_{i=1}^{N} x_i\, p(l \mid x_i, \Theta^g)}{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)}$$

$$\Sigma_l^{new} = \frac{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)\,(x_i - \mu_l^{new})(x_i - \mu_l^{new})^T}{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)}$$

From "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", Jeff A. Bilmes
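One full EM iteration implementing these updates, as a minimal MATLAB sketch (the function name em_step and the data layout, a d x N data matrix with L components, are illustrative assumptions; mvnpdf is from the Statistics Toolbox):

    function [alpha, mu, Sig] = em_step(data, alpha, mu, Sig)
        [~, N] = size(data); L = numel(alpha);
        p = zeros(N, L);
        for l = 1:L                            % E step: p(l | x_i, Theta^g)
            p(:, l) = alpha(l) * mvnpdf(data', mu(:, l)', Sig(:, :, l));
        end
        p = p ./ sum(p, 2);                    % normalize responsibilities
        for l = 1:L                            % M step: closed-form updates
            w = p(:, l);
            alpha(l) = sum(w) / N;             % new mixing weight
            mu(:, l) = data * w / sum(w);      % new mean
            centered = data - mu(:, l);        % implicit expansion over columns
            Sig(:, :, l) = centered * (centered' .* w) / sum(w);  % new covariance
        end
    end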

SLIDE 22

Example Application – Spike Sorting

  • Previous Solutions
    – Non-parametric template matching
    – Various clusterings of principal components (Lewicki)
    – EM on mixtures of multivariate t-distributions (Shoham, et al.)
    – Wavelet packets (Hulata, et al.)
  • Problems
    – Manually determine waveform templates
    – Manually determine number of clusters
    – Manually identify noise
    – Waveform variability
    – Inter-spike interval
    – Off-line vs. on-line

SLIDE 23

Neural Decoding / Prostheses

[Diagram of the decoding pipeline:]
Signal (multi-unit activity, single unit activity, EEG, LFP, etc.) → Detection → Spike Sorting → Rate Estimation → Decoder (Kalman filter, linear filter, etc.) → voluntary control signal driving computer cursor and keyboard entry, a robotic arm, or stimulation of muscles, spinal cord, and brain.

SLIDE 24

A Slight Untruth has been Implied

  • The data you have is sorted horribly.
  • Most of the 42 channels you have are "multi-units", not actually single neurons.
  • It is virtually impossible to isolate and record a single neuron with perfect certainty with any recording technology, and it is even more difficult with an array because of its random insertion.

SLIDE 25

Spike Sorting.

  • Our definition: waveforms captured at threshold crossings are "sorted" by deciding:
    – which are "spikes"
    – how many neurons there are
    – which neuron each came from.
    – Not detection!
  • Results from Bionic microelectrode array. [Figure]

SLIDE 26

SLIDE 27

Off-line Sorter Screen-Capture

SLIDE 28

Spike Sorting's dirty little secret.

  • Inspired by Harris et al. (2000) we conducted a study of spike sorting subjectivity.
    – Real data
    – 5 expert sorters
    – 20 representative channels

Subject    Spikes   Units
A           99160      28
B           50796      32
C          150917      27
D           77194      18
E          202351      35

SLIDE 29

Two people sorting the same channel.

[Figure: spike trains over 0-2 seconds and waveforms over 100-1000 µsec for two sorters of the same channel]

Subject D: Noise: 19484, Unit A: 3474
Subject E: Noise: 15074, Unit A: 4013, Unit B: 3539, Unit C: 332

SLIDE 30

Our Goal

  • Better decoding accuracy by way of improved spike sorting.
  • Better spike sorting for neuroscience would be great to achieve as well, but is a slightly different goal.

SLIDE 31

A Greedy Automatic Spike Sorting Algorithm

[Flowchart, roughly:]
1. Assign every channel to have 1 unit and decode using a Kalman filter. Record the MSE of the reconstruction.
2. For every channel in the recording:
   – Propose 0, 2, 3, ... units on the current channel.
   – Initialize the mixture model with n mixture components by spectral clustering a la Ng and Jordan (2001).
   – Optimize model parameters (EM).
   – Decode using all channels using a Kalman filter a la Wu et al. (2003).
   – Best decoding MSE? If yes, keep the proposal; if no, keep the previous sorting.

A code skeleton of this loop is sketched below. For full details see the paper.
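A structural sketch of that loop in MATLAB-style pseudocode. The helper names here (default_one_unit_models, decode_mse, init_spectral, em_gmm) and the channels cell array are hypothetical stand-ins for the components named in the flowchart, not functions from the paper's code:

    % Hypothetical skeleton of the greedy search (illustrative only).
    models = default_one_unit_models(channels);          % 1 unit per channel
    mse = decode_mse(models, channels);                  % Kalman decode, record MSE
    for c = 1:numel(channels)
        for n = [0 2 3 4]                                % propose 0,2,3,... units
            trial = models;
            trial{c} = init_spectral(channels{c}, n);    % Ng & Jordan (2001)
            trial{c} = em_gmm(channels{c}, trial{c});    % optimize parameters (EM)
            trial_mse = decode_mse(trial, channels);     % decode with all channels
            if trial_mse < mse                           % best decoding MSE?
                models = trial; mse = trial_mse;         % yes: keep the proposal
            end                                          % no: keep previous sorting
        end
    end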

SLIDE 32

Why did we talk about PCA?

  • The waveforms are largely similar (even between two different neurons).
  • The intrinsic dimensionality of a waveform is probably much lower than the 48 samples we had for each.
  • Speeds computation considerably and makes estimates of mixture model parameters more robust.

SLIDE 33

Automatic Spike Sorting Visual Results

[Figure: four example channels, showing waveforms and the corresponding 2 largest PCA coefficients]
Per-channel counts: (1) Noise: 16, Unit A: 387, Unit B: 84, Unit C: 325; (2) Noise: 42, Units A-C: 0; (3) Noise: 7, Unit A: 544, Units B-C: 0; (4) Noise: 75, Unit A: 556, Unit B: 578, Unit C: 0.

SLIDE 34

Decoding Results

  • Kalman filter, trained on 3 minutes of pinball data; average results for 5 one-minute decoding segments.

Subject        Neurons   Spikes   MSE (cm^2)
A                  107   757674   11.45 +/- 1.39
B                   96   335656   16.16 +/- 2.38
C                   78   456221   13.37 +/- 1.52
D                   88   642422   12.37 +/- 1.22
Ave. Human          92   547993   13.46 +/- 2.54
Random             288   860261   13.28 +/- 1.54
None                96   860261   12.78 +/- 1.89
Auto Max           114   625861   11.31 +/- 1.33
Auto Weighted      114   625861   11.30 +/- 1.15

Rank: Auto Sorted → No Sorting → Randomly Sorted → Human Sorted!

[Figure: actual monkey hand position vs. neural reconstruction]

SLIDE 35

Conclusions and Discussion

  • This automatic sorting algorithm produces better spike trains for neural decoding.
  • Maybe spike sorting isn't necessary for good decoding?
    – Hints at using a different signal instead?
  • Linking decoding to sorting may not identify physiological neurons.
  • Next Steps
    – Fully leverage probabilistic interpretation for enhanced rate estimation.
    – Different cost function.
    – Extend to continuous signal.

SLIDE 36

Next Week

  • Fun!
  • Crazy papers: a thought-provoking ethics article.
  • Get the Kalman filter assignment out of the way quickly. Given what you know now it should be quite easy!