Topics in Brain Computer Interfaces: Automated Spike Sorting


SLIDE 1

Topics in Brain Computer Interfaces
CS295-7
Professor: Michael Black
TA: Frank Wood
Spring 2005

Automated Spike Sorting

SLIDE 2

Today

  • Particle Filter Homework Discussion and Review
  • Kalman Filter Review
  • PCA Introduction
  • EM Review
  • Spike Sorting
SLIDE 3

Particle Filtering Movies

SLIDE 4

Homework Results?

  • Better than CC_x = 0.5, CC_y = 0.8? How?
  • What state estimator did you use (ML/E[])? Why?
  • When did you estimate the state?
  • Particle re-sampling schedule?
  • Remaining questions?
  • Initial state estimate?
  • How did the homework synthesize with the lecture notes and readings?

SLIDE 5

Viewing the Bayesian Recursion after implementing Particle Filtering

$P(\vec{x}_k \mid \vec{x}_{k-1})$ and $P(\vec{z}_k \mid \vec{x}_k)$ model the system and the observations: $\vec{x}_k$ given $\vec{x}_{k-1}$, and $\vec{z}_k$ given $\vec{x}_k$.

Start here with particles representing the posterior:
$$P(\vec{x} \mid \vec{z}) = \frac{P(\vec{z} \mid \vec{x})\, P(\vec{x})}{P(\vec{z})}$$

Predict with the system model:
$$P(\vec{x}_1 \mid \vec{z}) = \int P(\vec{x}_1 \mid \vec{x})\, P(\vec{x} \mid \vec{z})\, \partial\vec{x}$$

Update with the observation model:
$$P(\vec{x}_1 \mid \vec{z}, \vec{z}_1) = \frac{P(\vec{z}_1 \mid \vec{x}_1)\, P(\vec{x}_1 \mid \vec{z})}{P(\vec{z}_1 \mid \vec{z})}$$

Predict again (system model):
$$P(\vec{x}_2 \mid \vec{z}, \vec{z}_1) = \int P(\vec{x}_2 \mid \vec{x}_1)\, P(\vec{x}_1 \mid \vec{z}, \vec{z}_1)\, \partial\vec{x}_1$$

Update again (observation model):
$$P(\vec{x}_2 \mid \vec{z}, \vec{z}_1, \vec{z}_2) = \frac{P(\vec{z}_2 \mid \vec{x}_2)\, P(\vec{x}_2 \mid \vec{z}, \vec{z}_1)}{P(\vec{z}_2 \mid \vec{z}, \vec{z}_1)}$$

$$\vdots$$
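To make the recursion concrete, here is a minimal bootstrap particle filter in MATLAB. This is a sketch, not the homework solution: the 1-D random-walk system model, the Gaussian observation model, and the noise levels sw and sq are all assumptions for illustration, and z is taken to be a pre-existing vector of observations.

    n = 1000; sw = 0.5; sq = 1.0;            % particle count, assumed noise std devs
    x = randn(n,1);                          % particles representing the posterior
    xhat = zeros(1,numel(z));
    for k = 1:numel(z)
        x = x + sw*randn(n,1);               % predict: sample from P(x_k | x_{k-1})
        w = exp(-(z(k) - x).^2/(2*sq^2));    % weight: observation model P(z_k | x_k)
        w = w / sum(w);                      % normalize weights
        xhat(k) = w' * x;                    % E[x_k | z_1..z_k] state estimate
        x = x(randsample(n, n, true, w));    % resample (Statistics Toolbox)
    end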

SLIDE 6

Next Homework: The Kalman Filter

  • Closed form solution to recursive Bayesian estimation where the observation and state models are linear + Gaussian noise.
  • Seminal paper published in 1960: R.E. Kalman, "A New Approach to Linear Filtering and Prediction Problems"

State model: $\vec{x}_k = A\vec{x}_{k-1} + \vec{w}_k$, with $\vec{w}_k \sim N(0, W)$
Observation model: $\vec{z}_k = H\vec{x}_k + \vec{q}_k$, with $\vec{q}_k \sim N(0, Q)$
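As a quick illustration, data can be simulated directly from these two models; a minimal sketch with made-up scalar values for A, H, W, and Q:

    A = 1; H = 1; W = 0.01; Q = 0.25; T = 100;   % assumed 1-D parameters
    x = zeros(1,T); z = zeros(1,T);
    z(1) = H*x(1) + sqrt(Q)*randn;
    for k = 2:T
        x(k) = A*x(k-1) + sqrt(W)*randn;         % state model
        z(k) = H*x(k) + sqrt(Q)*randn;           % observation model
    end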

SLIDE 7

The Kalman Filter Algorithm

Initial estimates of $\hat{x}_{k-1}$ and $P_{k-1}$, then iterate:

Time Update:
$$\hat{x}_k^- = A\hat{x}_{k-1} \qquad \text{(prior estimate)}$$
$$P_k^- = A P_{k-1} A^T + W \qquad \text{(error covariance)}$$

Measurement Update:
$$K_k = P_k^- H^T (H P_k^- H^T + Q)^{-1} \qquad \text{(Kalman gain)}$$
$$\hat{x}_k = \hat{x}_k^- + K_k(z_k - H\hat{x}_k^-) \qquad \text{(posterior estimate)}$$
$$P_k = (I - K_k H) P_k^- \qquad \text{(error covariance)}$$

Welch and Bishop 2002
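These updates translate directly into code. A minimal sketch of one filter step in MATLAB, using the notation above (A, H, W, Q as on Slide 6; the function name kalman_step is just illustrative):

    function [xhat, P] = kalman_step(xhat, P, z, A, H, W, Q)
        % Time update
        xhat_m = A * xhat;                       % prior estimate
        P_m = A * P * A' + W;                    % prior error covariance
        % Measurement update
        K = (P_m * H') / (H * P_m * H' + Q);     % Kalman gain
        xhat = xhat_m + K * (z - H * xhat_m);    % posterior estimate
        P = (eye(size(P,1)) - K * H) * P_m;      % posterior error covariance
    end

Running this over the z simulated on Slide 6, starting from xhat = 0 and P = 1, recovers the state trajectory from the noisy observations.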

SLIDE 8

Where do these equations come from?

  • Find an unbiased minimum variance estimator of the state at time k+1 of the form:
$$\hat{x}_{k+1} = K'_{k+1}\hat{x}_k + K_{k+1}\vec{z}_{k+1}$$
  • For $\hat{x}_{k+1}$ to be unbiased means:
$$E[\vec{x}_{k+1} - \hat{x}_{k+1}] = 0$$

We'll look at this (the unbiasedness condition) today. The minimum variance part is much trickier; a link to a full derivation is on the web.

Excerpted and modified from aticourses.com

SLIDE 9

Remember from the previous slide

$$\hat{x}_{k+1} = K'_{k+1}\hat{x}_k + K_{k+1}\vec{z}_{k+1}, \qquad E[\vec{x}_{k+1} - \hat{x}_{k+1}] = 0$$

Unbiased estimate: substitute the observation model $\vec{z}_{k+1} = H\vec{x}_{k+1} + \vec{q}_{k+1}$ and then the state model $\vec{x}_{k+1} = A\vec{x}_k + \vec{w}_k$:

$$E[K'_{k+1}\hat{x}_k + K_{k+1}\vec{z}_{k+1} - \vec{x}_{k+1}] = 0$$
$$E[K'_{k+1}\hat{x}_k + K_{k+1}(H\vec{x}_{k+1} + \vec{q}_{k+1}) - \vec{x}_{k+1}] = 0$$
$$E[K'_{k+1}(\hat{x}_k - \vec{x}_k) + (K'_{k+1} - A + K_{k+1}HA)\vec{x}_k + (K_{k+1}H - I)\vec{w}_k + K_{k+1}\vec{q}_{k+1}] = 0$$

Trick alert! $E[\hat{x}_k - \vec{x}_k] = 0$ because the previous estimate is unbiased, and the noise terms have zero mean, so:

$$(K'_{k+1} - A + K_{k+1}HA)\, E[\vec{x}_k] = 0$$
$$\Rightarrow K'_{k+1} - A + K_{k+1}HA = 0$$
$$\Rightarrow K'_{k+1} = (I - K_{k+1}H)A$$

Excerpted and modified from aticourses.com

SLIDE 10

Pulling it together (a bit)

Remember from the previous slide:
$$K'_{k+1} = (I - K_{k+1}H)A, \qquad \hat{x}_{k+1} = K'_{k+1}\hat{x}_k + K_{k+1}\vec{z}_{k+1}$$

Substituting the first into the second:
$$\hat{x}_{k+1} = (I - K_{k+1}H)A\hat{x}_k + K_{k+1}\vec{z}_{k+1}$$
$$\hat{x}_{k+1} = A\hat{x}_k + K_{k+1}(\vec{z}_{k+1} - HA\hat{x}_k)$$

Can get the Kalman gain by minimizing the variance of the estimation error:
$$K_k = P_k^- H^T (H P_k^- H^T + Q)^{-1} \qquad \text{(Kalman gain)}$$
$$\hat{x}_k = \hat{x}_k^- + K_k(z_k - H\hat{x}_k^-) \qquad \text{(posterior estimate)}$$
$$P_k = (I - K_k H) P_k^- \qquad \text{(error covariance)}$$

Excerpted and modified from aticourses.com

SLIDE 11

The Kalman Filter Algorithm (a repeat of Slide 7): initial estimates of $\hat{x}_{k-1}$ and $P_{k-1}$, then iterate the time update and measurement update. Welch and Bishop 2002.

SLIDE 12

Good time for a Break

  • Changing gears to PCA/EM/Mixture Modeling
SLIDE 13

Principal Component Analysis (PCA)

"The central idea of [PCA] is to reduce the dimensionality of a data set consisting of a large number of interrelated variables, while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components (PCs), which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables." - I.T. Jolliffe

  • Example applications
    – Compression
    – Noise Reduction
    – Dimensionality Reduction
      • Eigenfaces, etc.
SLIDE 14

The Gist of PCA

[Figure: 2-D Gaussian cloud of data points, and a histogram of the data projected onto the first PC]

num_points = 500;
angle = pi/4;
variances = [5 0; 0 .5]'
rotation = [cos(angle) -sin(angle); sin(angle) cos(angle)]
data = rotation*(variances*randn(2,num_points));
[pcadata,eigenvectors,eigenvalues] = pca(data,2);
recovered_rotation = eigenvectors
recovered_variances = sqrt(eigenvalues)

variances =
    5.0000         0
         0    0.5000
rotation =
    0.7071   -0.7071
    0.7071    0.7071
recovered_rotation =
   -0.7042   -0.7100
   -0.7100    0.7042
recovered_variances =
    5.1584         0
         0    0.4934
SLIDE 15

The Math of PCA

  • First step: Find a linear function (a projection) $\vec{\alpha}_1^T\vec{x}$ of a R.V. $\vec{x}$ that has maximum variance, i.e.
$$\vec{\alpha}_1^T\vec{x} = \alpha_{11}x_1 + \alpha_{12}x_2 + \cdots + \alpha_{1p}x_p = \sum_{j=1}^{p}\alpha_{1j}x_j$$
  • Second through $k$-th step: Find the subsequent uncorrelated projections $\vec{\alpha}_2, \vec{\alpha}_3, \ldots, \vec{\alpha}_k$ with maximum variance, etc.
  • Continue until "enough variance" is accounted for, or up to $k = p$, the dimensionality of $\vec{x}$.

SLIDE 16

Finding a principal component (PC)

  • Maximize the variance of the projection:
$$\arg\max_{\vec{\alpha}_1} E\big[(\vec{\alpha}_1^T\vec{x})(\vec{\alpha}_1^T\vec{x})^T\big] = \arg\max_{\vec{\alpha}_1}\, \vec{\alpha}_1^T E[\vec{x}\vec{x}^T]\vec{\alpha}_1 = \arg\max_{\vec{\alpha}_1}\, \vec{\alpha}_1^T \Sigma\, \vec{\alpha}_1$$
  • Easy to do! Set $\vec{\alpha}_1 = \infty$.
  • Solution: constrain $\vec{\alpha}_1^T\vec{\alpha}_1 = 1$.

SLIDE 17

Constrained Optimization

  • Use a Lagrange multiplier and differentiate:
$$\frac{\partial}{\partial\vec{\alpha}_1}\big(\vec{\alpha}_1^T\Sigma\vec{\alpha}_1 - \lambda(\vec{\alpha}_1^T\vec{\alpha}_1 - 1)\big) = 0$$
$$\Sigma\vec{\alpha}_1 - \lambda\vec{\alpha}_1 = 0$$
$$(\Sigma - \lambda I)\vec{\alpha}_1 = 0, \quad \text{or} \quad \Sigma\vec{\alpha}_1 = \lambda\vec{\alpha}_1$$
  • So $\vec{\alpha}_1$ is an eigenvector of $\Sigma$ and $\lambda$ is the corresponding eigenvalue.
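The pca(data,2) call in the Slide 14 demo is not a MATLAB built-in; a minimal implementation consistent with this derivation would eigendecompose the sample covariance. This is an illustrative sketch whose signature simply mirrors the demo's usage:

    function [pcadata, eigenvectors, eigenvalues] = pca(data, q)
        % data: d x n matrix of n observations; q: number of PCs to keep
        centered = data - mean(data, 2);                    % remove the mean
        Sigma = centered * centered' / (size(data,2) - 1);  % sample covariance
        [V, D] = eig(Sigma);                                % Sigma*v = lambda*v
        [~, order] = sort(diag(D), 'descend');              % order PCs by variance
        eigenvectors = V(:, order(1:q));                    % the alpha_k's
        eigenvalues = D(order(1:q), order(1:q));            % the lambda_k's
        pcadata = eigenvectors' * centered;                 % project onto the PCs
    end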

SLIDE 18

Optimal Properties of PC's

  • The second, third, etc. PC's can be found using a similar derivation, subject of course to additional constraints.
  • It can be shown that choosing B' to be the first q eigenvectors of the covariance matrix $\Sigma$ of $\vec{x}$ gives the orthonormal linear transformation
$$\vec{y} = B'\vec{x}$$
that maximizes the covariance of $\vec{y}$.

SLIDE 19

The Gist of PCA (the demo from Slide 14, repeated).
SLIDE 20

EM for Gaussian Mixture Models

  • Expectation Maximization is an iterative method for estimating the parameters of data distributions with missing or unobserved data.
  • In our case, the "missing data" is the data points' class memberships.
  • This represents a generative view with latent structure.

$$\log(P(\mathcal{X}, \mathcal{Y} \mid \Theta)) = \sum_{i=1}^{n} \log\big(P(y_i)\, p(x_i \mid \theta_{y_i})\big)$$

where $P(y_i)$ is shorthand for "the prior probability of the class labeled $y_i$" and $p(x_i \mid \theta_{y_i})$ is the probability of $x_i$ assuming that it came from that class.
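A quick numeric illustration of this complete-data log-likelihood in MATLAB. This is a sketch: the priors, means, covariances, points, and labels are all made up, and mvnpdf comes from the Statistics Toolbox:

    prior = [0.6 0.4];                         % P(y = 1), P(y = 2)
    mu = [0 0; 3 3];                           % class means (rows)
    Sig = cat(3, eye(2), eye(2));              % class covariances
    X = [0.1 0.2; 2.9 3.1; -0.3 0.4];          % n = 3 points, d = 2
    y = [1 2 1];                               % assumed class memberships
    logL = 0;
    for i = 1:numel(y)                         % sum_i log( P(y_i) p(x_i | theta_{y_i}) )
        logL = logL + log(prior(y(i)) * mvnpdf(X(i,:), mu(y(i),:), Sig(:,:,y(i))));
    end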

SLIDE 21

Closed form E & M steps for GMM

$$\alpha_l^{new} = \frac{1}{N}\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)$$

$$\mu_l^{new} = \frac{\sum_{i=1}^{N} x_i\, p(l \mid x_i, \Theta^g)}{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)}$$

$$\Sigma_l^{new} = \frac{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)\,(x_i - \mu_l^{new})(x_i - \mu_l^{new})^T}{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)}$$

From "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", Jeff A. Bilmes
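One full EM iteration implementing these updates, as a minimal MATLAB sketch (the function name em_step and the data layout, a d x N data matrix with L components, are illustrative assumptions; mvnpdf is from the Statistics Toolbox):

    function [alpha, mu, Sig] = em_step(data, alpha, mu, Sig)
        [~, N] = size(data); L = numel(alpha);
        p = zeros(N, L);
        for l = 1:L                            % E step: p(l | x_i, Theta^g)
            p(:, l) = alpha(l) * mvnpdf(data', mu(:, l)', Sig(:, :, l));
        end
        p = p ./ sum(p, 2);                    % normalize responsibilities
        for l = 1:L                            % M step: closed-form updates
            w = p(:, l);
            alpha(l) = sum(w) / N;             % new mixing weight
            mu(:, l) = data * w / sum(w);      % new mean
            centered = data - mu(:, l);        % implicit expansion over columns
            Sig(:, :, l) = centered * (centered' .* w) / sum(w);  % new covariance
        end
    end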

SLIDE 22

Example Application – Spike Sorting

  • Previous Solutions
    – Non-parametric template matching
    – Various clusterings of principal components (Lewicki)
    – EM on mixtures of multivariate t-distributions (Shoham, et al.)
    – Wavelet packets (Hulata, et al.)
  • Problems
    – Manually determine waveform templates
    – Manually determine number of clusters
    – Manually identify noise
    – Waveform variability
    – Inter-spike interval
    – Off-line vs. on-line

SLIDE 23

Neural Decoding / Prostheses

[Diagram of the decoding pipeline:]
Signal (multi-unit activity, single unit activity, EEG, LFP, etc.) → Detection → Spike Sorting → Rate Estimation → Decoder (Kalman filter, linear filter, etc.) → voluntary control signal driving computer cursor and keyboard entry, a robotic arm, or stimulation of muscles, spinal cord, and brain.

SLIDE 24

A Slight Untruth has been Implied

  • The data you have is sorted horribly.
  • Most of the 42 channels you have are "multi-units", not actually single neurons.
  • It is virtually impossible to isolate and record a single neuron with perfect certainty with any recording technology, and it is even more difficult with an array because of its random insertion.

SLIDE 25

Spike Sorting.

  • Our definition: waveforms captured at threshold crossings are "sorted" by deciding:
    – which are "spikes"
    – how many neurons there are
    – which neuron each came from.
    – Not detection!
  • Results from Bionic microelectrode array. [Figure]

SLIDE 26

SLIDE 27

Off-line Sorter Screen-Capture

SLIDE 28

Spike Sorting's dirty little secret.

  • Inspired by Harris et al. (2000) we conducted a study of spike sorting subjectivity.
    – Real data
    – 5 expert sorters
    – 20 representative channels

Subject    Spikes   Units
A           99160      28
B           50796      32
C          150917      27
D           77194      18
E          202351      35

SLIDE 29

Two people sorting the same channel.

[Figure: spike trains over 0-2 seconds and waveforms over 100-1000 µsec for two sorters of the same channel]

Subject D: Noise: 19484, Unit A: 3474
Subject E: Noise: 15074, Unit A: 4013, Unit B: 3539, Unit C: 332

SLIDE 30

Our Goal

  • Better decoding accuracy by way of improved spike sorting.
  • Better spike sorting for neuroscience would be great to achieve as well, but is a slightly different goal.

SLIDE 31

A Greedy Automatic Spike Sorting Algorithm

[Flowchart, roughly:]
1. Assign every channel to have 1 unit and decode using a Kalman filter. Record the MSE of the reconstruction.
2. For every channel in the recording:
   – Propose 0, 2, 3, ... units on the current channel.
   – Initialize the mixture model with n mixture components by spectral clustering a la Ng and Jordan (2001).
   – Optimize model parameters (EM).
   – Decode using all channels using a Kalman filter a la Wu et al. (2003).
   – Best decoding MSE? If yes, keep the proposal; if no, keep the previous sorting.

A code skeleton of this loop is sketched below. For full details see the paper.
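A structural sketch of that loop in MATLAB-style pseudocode. The helper names here (default_one_unit_models, decode_mse, init_spectral, em_gmm) and the channels cell array are hypothetical stand-ins for the components named in the flowchart, not functions from the paper's code:

    % Hypothetical skeleton of the greedy search (illustrative only).
    models = default_one_unit_models(channels);          % 1 unit per channel
    mse = decode_mse(models, channels);                  % Kalman decode, record MSE
    for c = 1:numel(channels)
        for n = [0 2 3 4]                                % propose 0,2,3,... units
            trial = models;
            trial{c} = init_spectral(channels{c}, n);    % Ng & Jordan (2001)
            trial{c} = em_gmm(channels{c}, trial{c});    % optimize parameters (EM)
            trial_mse = decode_mse(trial, channels);     % decode with all channels
            if trial_mse < mse                           % best decoding MSE?
                models = trial; mse = trial_mse;         % yes: keep the proposal
            end                                          % no: keep previous sorting
        end
    end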

SLIDE 32

Why did we talk about PCA?

  • The waveforms are largely similar (even between two different neurons).
  • The intrinsic dimensionality of a waveform is probably much lower than the 48 samples we had for each.
  • Speeds computation considerably and makes estimates of mixture model parameters more robust.

SLIDE 33

Automatic Spike Sorting Visual Results

[Figure: four example channels, showing waveforms and the corresponding 2 largest PCA coefficients]
Per-channel counts: (1) Noise: 16, Unit A: 387, Unit B: 84, Unit C: 325; (2) Noise: 42, Units A-C: 0; (3) Noise: 7, Unit A: 544, Units B-C: 0; (4) Noise: 75, Unit A: 556, Unit B: 578, Unit C: 0.

SLIDE 34

Decoding Results

  • Kalman filter, trained on 3 minutes of pinball data; average results for 5 one-minute decoding segments.

Subject        Neurons   Spikes   MSE (cm^2)
A                  107   757674   11.45 +/- 1.39
B                   96   335656   16.16 +/- 2.38
C                   78   456221   13.37 +/- 1.52
D                   88   642422   12.37 +/- 1.22
Ave. Human          92   547993   13.46 +/- 2.54
Random             288   860261   13.28 +/- 1.54
None                96   860261   12.78 +/- 1.89
Auto Max           114   625861   11.31 +/- 1.33
Auto Weighted      114   625861   11.30 +/- 1.15

Rank: Auto Sorted → No Sorting → Randomly Sorted → Human Sorted!

[Figure: actual monkey hand position vs. neural reconstruction]

SLIDE 35

Conclusions and Discussion

  • This automatic sorting algorithm produces better spike trains for neural decoding.
  • Maybe spike sorting isn't necessary for good decoding?
    – Hints at using a different signal instead?
  • Linking decoding to sorting may not identify physiological neurons.
  • Next Steps
    – Fully leverage probabilistic interpretation for enhanced rate estimation.
    – Different cost function.
    – Extend to continuous signal.

SLIDE 36

Next Week

  • Fun!
  • Crazy papers: a thought-provoking ethics article.
  • Get the Kalman filter assignment out of the way quickly. Given what you know now it should be quite easy!