Topics in Brain Computer Interfaces (CS295-7)
Professor: Michael Black
TA: Frank Wood
Spring 2005

Bayesian Inference through Particle Filtering
A dynamical system, deterministic version:
process: $x_k = f_k(x_{k-1})$, e.g. constant-velocity kinematics $x_k = x_{k-1} + v_{k-1}\,dT$
measurement: $z_k = h_k(x_k)$
Every real process has process and measurement noise
process: $x_k = f_k(x_{k-1}, v_{k-1})$, where $v_{k-1}$ is the process noise
measurement: $z_k = h_k(x_k, n_k)$, where $n_k$ is the measurement noise
A probabilistic process model accounts for process and measurement noise
process: $p(x_k \mid x_{k-1})$, induced by $x_k = f_k(x_{k-1}, v_{k-1})$
Example: missile interceptor system. The missile propulsion system is noisy and radar observations are noisy. Even if we are given exact process and observation models, our estimate of the missile's position may diverge if we don't account for uncertainty.
measurement: $p(z_k \mid x_k)$
model: $p(x_k \mid x_{k-1})$
Bayes' rule:
$p(\text{state} \mid \text{evidence}) = \dfrac{p(\text{evidence} \mid \text{state})\; p(\text{state})}{p(\text{evidence})}$
Posterior: the a posteriori probability (after the evidence). Likelihood: the evidence term. Prior: the a priori probability (before the evidence). The denominator is a normalization constant (independent of the state).
We infer system state from uncertain observations and our prior knowledge (model) of system state.
Notation:
Observations: $z_k = (z_{k,1}, z_{k,2}, \ldots, z_{k,n})$, e.g. the firing rates of all $n$ cells at time $k$; $z_{1:k} = (z_1, z_2, \ldots, z_k)$ is the observation history.
System state: $x_k = (x_k, y_k, v_{x,k}, v_{y,k}, a_{x,k}, a_{y,k})$, e.g. hand kinematics at time $k$; $x_{1:k} = (x_1, x_2, \ldots, x_k)$ is the state history.
The encoding question: the neural firing rates $z_k$ are some function of the state $x_k$ (e.g. hand position, velocity, acceleration) plus noise (e.g. Normal or Poisson):
$z_k = f(x_k) + \text{noise}$
Is $f(\cdot)$ linear or non-linear? Are the $f(\cdot)$'s Markov?
Goal: estimate the posterior $p(x_k \mid z_{1:k})$ recursively from the previous posterior $p(x_{k-1} \mid z_{1:k-1})$. Recursion!
From the posterior we can decode, e.g. with the MAP estimate
$\hat{x}_k = \operatorname{argmax}_{x_k} p(x_k \mid z_{1:k})$
or the expected value
$\hat{x}_k = \int x_k \, p(x_k \mid z_{1:k}) \, dx_k$.
Graphical models are a way of systematically diagramming the dependencies amongst groups of random variables. Graphical models can help elucidate assumptions and modeling choices that would otherwise remain implicit. Using a graphical model will help us design our model!
Generative model: the state $x_k$ generates the observation $z_k$, i.e. $x_k \to z_k$.
The full graphical model is a hidden Markov chain: states $x_{k-1} \to x_k \to x_{k+1}$, each emitting its own observation ($x_{k-1} \to z_{k-1}$, $x_k \to z_k$, $x_{k+1} \to z_{k+1}$).
The graph encodes two conditional independence assumptions:
First-order Markov dynamics: $p(x_k \mid x_{1:k-1}, z_{1:k-1}) = p(x_k \mid x_{k-1})$
Conditionally independent observations: $p(z_k \mid x_{1:k}, z_{1:k-1}) = p(z_k \mid x_k)$
From these modeling choices, all that remains to choose is:
The likelihood (encoding) model: $p(z_k \mid x_k)$
The temporal prior model: $p(x_k \mid x_{k-1})$
How to compute the posterior (decoding): $p(x_k \mid z_{1:k})$ from $p(x_{k-1} \mid z_{1:k-1})$
The initial distributions
A linear Gaussian model.
Likelihood (encoding): $z_k = H x_k + q_k$, $k = 1, 2, \ldots$
$z_k = (z_{k,1}, z_{k,2}, \ldots, z_{k,42})^T$: firing rate vector (zero mean, square-root transformed)
$H$: 42 x 6 matrix
$q_k \sim N(0, Q)$, with $Q$ a 42 x 42 matrix
Temporal prior: $x_k = A x_{k-1} + w_k$, $k = 1, 2, \ldots$
$x_k = (x_k, y_k, v_{x,k}, v_{y,k}, a_{x,k}, a_{y,k})^T$: system state vector (zero mean)
$A$: 6 x 6 matrix
$w_k \sim N(0, W)$, with $W$ a 6 x 6 matrix
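To make this generative model concrete, here is a minimal numpy sketch that simulates it. The dimensions follow the slide (6-dimensional state, 42 cells), but the particular A, H, W, and Q below are placeholder values for illustration, not parameters fit to data.

import numpy as np

rng = np.random.default_rng(0)

# Placeholder parameters; in practice A, H, W, Q are fit to training data.
A = 0.99 * np.eye(6)            # 6 x 6 state transition matrix
H = rng.normal(size=(42, 6))    # 42 x 6 observation (encoding) matrix
W = 0.01 * np.eye(6)            # 6 x 6 process noise covariance
Q = 0.10 * np.eye(42)           # 42 x 42 measurement noise covariance

def simulate(T):
    """Simulate x_k = A x_{k-1} + w_k and z_k = H x_k + q_k."""
    x = np.zeros(6)
    xs, zs = [], []
    for _ in range(T):
        x = A @ x + rng.multivariate_normal(np.zeros(6), W)
        z = H @ x + rng.multivariate_normal(np.zeros(42), Q)
        xs.append(x)
        zs.append(z)
    return np.array(xs), np.array(zs)

states, rates = simulate(100)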
Recall the Gaussian density, $p(z \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\big(-(z-\mu)^2 / 2\sigma^2\big)$; with the linear model above the likelihood is $p(z_k \mid x_k) = N(H x_k, Q)$.
The joint distribution factors according to the graphical model:
$p(X, Z) = p(Z \mid X)\, p(X) = \Big[\prod_{k=1}^{M} p(z_k \mid x_k)\Big] \Big[\prod_{k=2}^{M} p(x_k \mid x_{k-1})\Big]\, p(x_1)$
with the linear Gaussian choices $x_k = A x_{k-1} + w_k$ (temporal prior) and $z_k = H x_k + q_k$ (likelihood).
Bayes' rule for decoding:
$p(\text{kinematics}_t \mid \text{firing}_{1:t}) = \dfrac{p(\text{firing}_t \mid \text{kinematics}_t)\; p(\text{kinematics}_t)}{p(\text{firing}_t)}$
Posterior: the a posteriori probability (after the evidence). Likelihood: the evidence term. Prior: the a priori probability (before the evidence). The denominator is a normalization constant (independent of kinematics).
We sequentially infer hand kinematics from uncertain evidence and our prior knowledge of how hands move.
Deriving the recursion. The posterior:
$p(x_k \mid z_{1:k}) = \dfrac{p(z_{1:k} \mid x_k)\, p(x_k)}{p(z_{1:k})}$   (Bayes' rule)
$= \dfrac{p(z_k \mid x_k, z_{1:k-1})\, p(x_k \mid z_{1:k-1})}{p(z_k \mid z_{1:k-1})}$   (Bayes' rule again; the $p(z_{1:k-1})$ terms cancel)
$= \dfrac{p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})}{p(z_k \mid z_{1:k-1})}$   (independence)
New observation: $p(z_k \mid x_k)$. Prior: $p(x_k \mid z_{1:k-1})$.
The prior (prediction) comes from the previous posterior:
$p(x_k \mid z_{1:k-1}) = \int p(x_k \mid x_{k-1}, z_{1:k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1}$   (law of total probability: $\Pr(B \mid C) = \int \Pr(B \mid A, C)\, \Pr(A \mid C)\, dA$)
$= \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1}$   (independence)
where $p(x_{k-1} \mid z_{1:k-1})$ is the posterior from the previous time step.
Exercise: run the Bayesian recursion to depth 2, i.e. expand $p(x_2 \mid z_{1:2})$ until only $p(x_1)$, $p(z_1 \mid x_1)$, $p(z_2 \mid x_2)$, and $p(x_2 \mid x_1)$ remain.
The recursion in summary:
$p(x_k \mid z_{1:k}) \propto p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})$
$p(x_k \mid z_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1}$
Built from:
Bayes' rule: $p(a \mid b) = p(b \mid a)\, p(a) / p(b)$
Independence assumption: $p(x_k \mid x_{k-1}, z_{1:k-1}) = p(x_k \mid x_{k-1})$
Independence assumption: $p(z_k \mid x_k, z_{1:k-1}) = p(z_k \mid x_k)$
Law of total probability: $p(a \mid c) = \int p(a \mid b, c)\, p(b \mid c)\, db$
What's missing?
Putting the two together gives a single recursive update:
$p(x_k \mid z_{1:k}) \propto p(z_k \mid x_k) \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1}$
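This recursion can be run exactly when the state space is discretized. The following sketch is an illustration, not from the slides: a 1-D grid with Gaussian random-walk dynamics and a Gaussian likelihood, both chosen arbitrarily for the example.

import numpy as np

grid = np.linspace(-5, 5, 201)          # discretized 1-D state space

def gauss(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var))

# p(x_k | x_{k-1}) as a matrix: Gaussian random-walk dynamics.
trans = gauss(grid[:, None], grid[None, :], 0.1)
trans /= trans.sum(axis=0, keepdims=True)

posterior = np.ones_like(grid) / grid.size   # initial belief: uniform

def bayes_step(posterior, z, obs_var=0.5):
    prediction = trans @ posterior                        # law of total probability
    unnormalized = gauss(z, grid, obs_var) * prediction   # Bayes' rule
    return unnormalized / unnormalized.sum()              # normalize

for z in [0.2, 0.5, 0.4]:                # made-up observations
    posterior = bayes_step(posterior, z)

Each step is one prediction (law of total probability) followed by one Bayes-rule update and a normalization, exactly the recursion above.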
How do we sample from distributions?
– Uniform: rand(), e.g. a linear congruential generator. Sample output: 0.2311 0.6068 0.4860 0.8913 0.7621 0.4565 0.0185
– Normal: randn(), via the Box-Muller transform (see the sketch below):
  y1 = sqrt( -2 ln(x1) ) cos( 2 pi x2 )
  y2 = sqrt( -2 ln(x1) ) sin( 2 pi x2 )
– Discrete events with probability p: if( rand() < p )
– General distributions: Metropolis-Hastings / Gibbs sampling
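A minimal Python/numpy version of the Box-Muller recipe above, building normal samples out of uniform draws:

import numpy as np

def box_muller(n, rng=np.random.default_rng()):
    """Turn 2n uniform draws into 2n independent standard normal samples."""
    x1 = rng.random(n)                      # uniform on [0, 1)
    x2 = rng.random(n)
    r = np.sqrt(-2.0 * np.log1p(-x1))       # log1p(-x1) = ln(1 - x1), never ln(0)
    y1 = r * np.cos(2.0 * np.pi * x2)
    y2 = r * np.sin(2.0 * np.pi * x2)
    return np.concatenate([y1, y2])

samples = box_muller(5000)
print(samples.mean(), samples.var())        # close to 0 and 1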
(Figure: histograms of samples from the standard normal density $(2\pi)^{-1/2} e^{-x^2/2}$ for increasing sample sizes; the three panels report sample statistics µ = .0013, var = .8162; µ = .0012, var = .7805; µ = -.043, var = .9547.)
Monte Carlo integration: the empirical average of a function over the samples converges to the expected value of the function as the number of samples grows. For samples $x_{1:n}$,
$\frac{1}{n} \sum_{i=1}^{n} f(x_i) \xrightarrow{P} \mathrm{E}[f(X)]$ as $n \to \infty$.
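A quick numpy check of this convergence, estimating $\mathrm{E}[f(X)]$ for $f(x) = x^2$ under a standard normal (true value 1):

import numpy as np

rng = np.random.default_rng(1)
for n in [10, 1000, 100000]:
    x = rng.standard_normal(n)              # fair samples from N(0, 1)
    print(n, np.mean(x ** 2))               # -> E[X^2] = 1 as n grows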
Monte Carlo integration over the posterior at time $k-1$ gives a weighted sample (particle) representation:
$p(x_{k-1} \mid z_{1:k-1}) \approx \sum_{i=1}^{N} \pi_{k-1}^{(i)}\, \delta\big(x_{k-1} - x_{k-1}^{(i)}\big)$
Substituting this into the prediction integral turns it into a finite sum:
$p(x_k \mid z_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1} \approx \sum_{i=1}^{N} \pi_{k-1}^{(i)}\, p\big(x_k \mid x_{k-1}^{(i)}\big)$
Given: the particles/samples $x_{k-1}^{(i)}$ and their weights $\pi_{k-1}^{(i)}$. Evaluate: $p(x_k \mid x_{k-1}^{(i)})$.
– Simplified from Arulampalam et al. and de Freitas et al.
– Largely overlooks how the importance weights are maintained and updated
– Doesn't touch on particle degeneracy and replacement
We use a set of random samples from the posterior distribution to represent the posterior. Then we can use sample statistics to approximate expectations over the posterior. Problem: we need to update the samples such that they still accurately represent the posterior after the next observation. Let $\{x^{(i)}\}_{i=1}^{N}$ be a set of fair samples from a distribution $p(x)$; then for functions $f$, $\frac{1}{N}\sum_{i=1}^{N} f(x^{(i)}) \to \mathrm{E}[f(x)]$.
Assume we have a weighted sample set $S_{k-1} = \{(x_{k-1}^{(i)}, \pi_{k-1}^{(i)})\}$, $1 \le i \le N$, from Monte Carlo integration over the posterior at time $k-1$. The prediction distribution becomes a linear mixture model,
$p(x_k \mid z_{1:k-1}) \approx \sum_{i=1}^{N} \pi_{k-1}^{(i)}\, p\big(x_k \mid x_{k-1}^{(i)}\big)$,
with the importance weights $\pi_{k-1}^{(i)}$ as mixing probabilities. To sample from it, first select a component $i$ by inverting the cumulative distribution of the weights, and then sample from the model (proposal) pdf $p(x_k \mid x_{k-1}^{(i)})$, which is specified as part of the model.
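A sketch of that two-stage draw under the linear Gaussian dynamics assumed earlier (A and W as before); the function name is mine:

import numpy as np

def sample_prediction(particles, weights, A, W, rng):
    """One draw from the mixture sum_i pi_i p(x_k | x_{k-1}^(i))."""
    cdf = np.cumsum(weights)                 # cumulative distribution of weights
    i = np.searchsorted(cdf, rng.random())   # select the mixing component
    mean = A @ particles[i]                  # push particle i through the dynamics
    return rng.multivariate_normal(mean, W)  # sample the model proposal pdf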
Importance sampling: draw samples from a proposal distribution and attach weights to obtain weighted samples, i.e. find weights so that the linearly weighted sample statistics approximate expectations under the desired distribution.
The samples from the prediction distribution need to be re-weighted such that they still represent the posterior distribution well after a new observation:
$p(x_k \mid z_{1:k}) \propto p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})$
We have a sample representation for the prediction $p(x_k \mid z_{1:k-1})$: samples $x_k^{(j)}$, $1 \le j \le N$. The term $p(z_k \mid x_k)$ is our model likelihood, and the normalization constant goes away after re-normalization. So we weight each sample by its likelihood:
$\pi_k^{(j)} \propto p\big(z_k \mid x_k^{(j)}\big)$
Major hand waving here! Inaccuracies abound!
Simple particle filter: draw samples from the prediction distribution; the weights are proportional to the ratio of the posterior and prediction distributions, i.e. the normalized likelihood [Gordon et al. '93; Isard & Blake '98; Liu & Chen '98, ...]. In a diagram: posterior → temporal dynamics → sample → likelihood → normalize → posterior.
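The whole loop fits in a few lines of numpy. This is a minimal sketch of the bootstrap filter for the linear Gaussian model assumed throughout (A, H, W, Q as before); a real decoder would plug in the fitted encoding model as the likelihood.

import numpy as np

def particle_filter(zs, A, H, W, Q, n=1000, seed=0):
    """Bootstrap particle filter: sample dynamics, weight by likelihood, resample."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    x = rng.multivariate_normal(np.zeros(d), np.eye(d), n)   # initial particles
    Qinv = np.linalg.inv(Q)
    means = []
    for z in zs:
        # Prediction: sample from the temporal dynamics p(x_k | x_{k-1}).
        x = x @ A.T + rng.multivariate_normal(np.zeros(d), W, n)
        # Weights: proportional to the likelihood p(z_k | x_k).
        r = z - x @ H.T
        logw = -0.5 * np.einsum('ij,jk,ik->i', r, Qinv, r)
        w = np.exp(logw - logw.max())
        w /= w.sum()                                         # normalize
        means.append(w @ x)                                  # posterior mean estimate
        x = x[rng.choice(n, n, p=w)]                         # resample
    return np.array(means)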
(Figure from Isard & Blake '96, built up over several slides: starting from the posterior $p(x_{k-1} \mid z_{1:k-1})$, sample; push the samples through the temporal dynamics $p(x_k \mid x_{k-1})$ and sample again; weight by the likelihood $p(z_k \mid x_k)$; normalize to obtain the new posterior $p(x_k \mid z_{1:k})$.)
Exercise: given a Gaussian distribution $N(\mu, Q)$, a Cholesky decomposition $Q = LL^T$, and a random vector $\vec{\tau} = [\tau_1, \tau_2, \ldots, \tau_n]^T$ where each element is normal with zero mean and unit variance, show that
$\mu + L\vec{\tau} \sim N(\mu, Q)$
and explain how this fact can be used to sample from a Gaussian.
Sketch: $\mathrm{E}[L\vec{\tau}] = L\,\mathrm{E}[\vec{\tau}] = 0$ and $\mathrm{Var}[L\vec{\tau}] = \mathrm{E}[L\vec{\tau}\vec{\tau}^T L^T] = L\,\mathrm{E}[\vec{\tau}\vec{\tau}^T]\,L^T = LL^T = Q$.
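A small numpy sketch of exactly this sampling trick:

import numpy as np

def sample_gaussian(mu, Q, n, rng=np.random.default_rng()):
    """Sample n points from N(mu, Q) using the Cholesky factor Q = L L^T."""
    L = np.linalg.cholesky(Q)                    # lower-triangular L
    tau = rng.standard_normal((n, len(mu)))      # zero mean, unit variance
    return mu + tau @ L.T                        # each row is mu + L tau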
Given a weighted sample set, we select samples by inverting the cumulative distribution of the weights: draw $u$ uniformly on $(0, 1]$ and pick the sample $i$ whose cumulative weight interval contains $u$.
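One possible numpy implementation of this selection (the helper name is mine; samples is assumed to be an array):

import numpy as np

def resample(samples, weights, rng=np.random.default_rng()):
    """Select N samples by inverting the cumulative distribution of the weights."""
    cdf = np.cumsum(weights)
    cdf[-1] = 1.0                                # guard against rounding error
    idx = np.searchsorted(cdf, rng.random(len(samples)))
    return samples[idx]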
The particle filter, assembled from its pieces:
likelihood: $p(z_k \mid x_k)$
prior: $p(x_k \mid z_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1}$
system model: $p(x_k \mid x_{k-1})$
When the likelihood and the temporal prior are linear and Gaussian,
likelihood: $p(z_t \mid x_t) = N(H x_t, Q)$
temporal prior (system model): $p(x_t \mid x_{t-1}) = N(A x_{t-1}, W)$
the posterior is also Gaussian and the recursion has a closed form: the Kalman filter. Real-time, recursive decoding.
Kalman filter equations (Welch and Bishop 2002). Initial estimates: $\hat{x}_{k-1}$ and $P_{k-1}$.
Time update:
Prior estimate: $\hat{x}_k^- = A\,\hat{x}_{k-1}$
Error covariance: $P_k^- = A\,P_{k-1}\,A^T + W$
Measurement update:
Kalman gain: $K_k = P_k^- H^T \big(H P_k^- H^T + Q\big)^{-1}$
Posterior estimate: $\hat{x}_k = \hat{x}_k^- + K_k\,(z_k - H\hat{x}_k^-)$
Error covariance: $P_k = (I - K_k H)\,P_k^-$
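These update equations transcribe almost directly into numpy; a minimal sketch of one predict/correct cycle, with variable names following the slide:

import numpy as np

def kalman_step(x_hat, P, z, A, H, W, Q):
    """One predict/correct cycle of the Kalman filter."""
    # Time update (predict).
    x_prior = A @ x_hat                          # prior estimate
    P_prior = A @ P @ A.T + W                    # prior error covariance
    # Measurement update (correct).
    K = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + Q)  # Kalman gain
    x_hat = x_prior + K @ (z - H @ x_prior)      # posterior estimate
    P = (np.eye(len(x_hat)) - K @ H) @ P_prior   # posterior error covariance
    return x_hat, P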
Weighted samples: expectations under the posterior are approximated by linearly weighted sample statistics,
$\mathrm{E}[f(x_t)] \approx \sum_{i=1}^{N} \pi_t^{(i)}\, f\big(x_t^{(i)}\big)$.