SLIDE 1
Real-time adaptive information-theoretic optimization of neurophysiology experiments
Presented by Alex Roper, March 5, 2009
SLIDE 2
Goals
◮ How do neurons react to stimuli?
◮ What is a neuron’s preferred stimulus?
SLIDE 3
Goals
◮ How do neurons react to stimuli?
◮ What is a neuron’s preferred stimulus?
◮ Minimize number of trials.
◮ Speed - must run in real time.
SLIDE 4
Goals
◮ How do neurons react to stimuli?
◮ What is a neuron’s preferred stimulus?
◮ Minimize number of trials.
◮ Speed - must run in real time.
◮ Emphasis on dimensional scalability (vision)
SLIDE 5 Challenges
◮ Typically high dimension
◮ Model complexity - memory
◮ Stimulus complexity - visual bitmap
SLIDE 6 Challenges
◮ Typically high dimension
◮ Model complexity - memory
◮ Stimulus complexity - visual bitmap
◮ Bayesian approach expensive
◮ Estimation
◮ Integration
◮ Multivariate optimization
SLIDE 7 Challenges
◮ Typically high dimension
◮ Model complexity - memory
◮ Stimulus complexity - visual bitmap
◮ Bayesian approach expensive
◮ Estimation
◮ Integration
◮ Multivariate optimization
◮ Limited firing capacity of a neuron (exhaustion)
SLIDE 8 Challenges
◮ Typically high dimension
◮ Model complexity - memory
◮ Stimulus complexity - visual bitmap
◮ Bayesian approach expensive
◮ Estimation
◮ Integration
◮ Multivariate optimization
◮ Limited firing capacity of a neuron (exhaustion)
◮ Essential issues
◮ Update a posteriori beliefs quickly given new data
◮ Find optimal stimulus quickly
SLIDE 9
Neuron Model
p(rt|{xt, xt−1, ..., xt−tk}, {rt−1, ..., rt−tk})
SLIDE 10
Neuron Model
p(rt|{xt, xt−1, ..., xt−tk}, {rt−1, ..., rt−tk})
◮ The response rt to stimulus xt is dependent on xt itself, as
well as the history of stimuli and responses for a constant sliding window.
SLIDE 11
Neuron Model
p(rt|{xt, xt−1, ..., xt−tk}, {rt−1, ..., rt−tk})
◮ The response rt to stimulus xt is dependent on xt itself, as
well as the history of stimuli and responses for a constant sliding window.
◮ This is needed to measure exhaustion, depletion, etc.
SLIDE 12 Neuron Model
p(rt|{xt, xt−1, ..., xt−tk}, {rt−1, ..., rt−tk})
◮ The response rt to stimulus xt is dependent on xt itself, as
well as the history of stimuli and responses for a constant sliding window.
◮ This is needed to measure exhaustion, depletion, etc.
λt = E(rt) = f( Σi,l ki,t−l xi,t−l + Σj=1..tk aj rt−j )
SLIDE 13 Neuron Model
p(rt|{xt, xt−1, ..., xt−tk}, {rt−1, ..., rt−tk})
◮ The response rt to stimulus xt is dependent on xt itself, as
well as the history of stimuli and responses for a constant sliding window.
◮ This is needed to measure exhaustion, depletion, etc.
λt = E(rt) = f( Σi,l ki,t−l xi,t−l + Σj=1..tk aj rt−j )
◮ Filter coefficients ki,t−l represent dependence on the input itself.
SLIDE 14 Neuron Model
p(rt|{xt, xt−1, ..., xt−tk}, {rt−1, ..., rt−tk})
◮ The response rt to stimulus xt is dependent on xt itself, as
well as the history of stimuli and responses for a constant sliding window.
◮ This is needed to measure exhaustion, depletion, etc.
λt = E(rt) = f( Σi,l ki,t−l xi,t−l + Σj=1..tk aj rt−j )
◮ Filter coefficients ki,t−l represent dependence on the input itself.
◮ aj models dependence on observed recent activity.
SLIDE 15 Neuron Model
p(rt|{xt, xt−1, ..., xt−tk}, {rt−1, ..., rt−tk})
◮ The response rt to stimulus xt is dependent on xt itself, as
well as the history of stimuli and responses for a constant sliding window.
◮ This is needed to measure exhaustion, depletion, etc.
λt = E(rt) = f( Σi,l ki,t−l xi,t−l + Σj=1..tk aj rt−j )
◮ Filter coefficients ki,t−l represent dependence on the input itself.
◮ aj models dependence on observed recent activity.
◮ We summarize all unknown parameters as θ. This is what we’re trying to learn.
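To make the model concrete, here is a minimal sketch of evaluating the conditional intensity λt, assuming Poisson spike counts, an exponential link, and flattened bitmap stimuli; the array shapes, coefficient scales, and the choice f = exp are illustrative assumptions, not specified by the slides.

    import numpy as np

    def conditional_intensity(k, a, x_hist, r_hist):
        """lambda_t = E(r_t) = f( sum_{i,l} k[l,i] x[t-l,i] + sum_j a[j] r[t-j] ),
        with f = exp as an illustrative link function."""
        # k, x_hist: (t_k, d) stimulus filter and stimulus history (lag 0 first)
        # a, r_hist: (t_k,)   spike-history weights and response history
        return np.exp(np.sum(k * x_hist) + a @ r_hist)

    # Example: 25x33 bitmap stimuli flattened to d = 825, window of 5 bins.
    rng = np.random.default_rng(0)
    t_k, d = 5, 25 * 33
    k = rng.normal(scale=0.01, size=(t_k, d))
    a = rng.normal(scale=0.1, size=t_k)
    x_hist = rng.normal(size=(t_k, d))
    r_hist = rng.poisson(1.0, size=t_k).astype(float)
    lam_t = conditional_intensity(k, a, x_hist, r_hist)
    r_t = rng.poisson(lam_t)          # simulated spike count for this bin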
SLIDE 16
Generalized Linear Models
◮ Distribution function (multivariate Gaussian).
◮ Linear predictor, θ.
◮ Link function (exponential).
SLIDE 17
Updating the Posterior
◮ Ideally, this runs in real time.
◮ Approximate the posterior as Gaussian.
SLIDE 18 Updating the Posterior
◮ Ideally, this runs in real time.
◮ Approximate the posterior as Gaussian.
◮ The posterior is the product of two smooth, log-concave
terms.
◮ (The GLM likelihood function and the Gaussian prior)
SLIDE 19 Updating the Posterior
◮ Ideally, this runs in real time.
◮ Approximate the posterior as Gaussian.
◮ The posterior is the product of two smooth, log-concave
terms.
◮ (The GLM likelihood function and the Gaussian prior)
◮ Laplace approximation to construct a Gaussian
approximation of the posterior.
SLIDE 20 Updating the Posterior
◮ Ideally, this runs in real time.
◮ Approximate the posterior as Gaussian.
◮ The posterior is the product of two smooth, log-concave
terms.
◮ (The GLM likelihood function and the Gaussian prior)
◮ Laplace approximation to construct a Gaussian
approximation of the posterior.
◮ Set µt to the peak of the posterior.
◮ Set covariance matrix Ct to negative inverse of Hessian of
log posterior at µt.
SLIDE 21 Updating the Posterior
◮ Ideally, this runs in real time.
◮ Approximate the posterior as Gaussian.
◮ The posterior is the product of two smooth, log-concave
terms.
◮ (The GLM likelihood function and the Gaussian prior)
◮ Laplace approximation to construct a Gaussian
approximation of the posterior.
◮ Set µt to the peak of the posterior.
◮ Set covariance matrix Ct to negative inverse of Hessian of
log posterior at µt.
◮ Compute directly?
SLIDE 22 Updating the Posterior
◮ Ideally, this runs in real time.
◮ Approximate the posterior as Gaussian.
◮ The posterior is the product of two smooth, log-concave
terms.
◮ (The GLM likelihood function and the Gaussian prior)
◮ Laplace approximation to construct a Gaussian
approximation of the posterior.
◮ Set µt to the peak of the posterior.
◮ Set covariance matrix Ct to negative inverse of Hessian of
log posterior at µt.
◮ Compute directly?
◮ Complexity is O(td² + d³)
SLIDE 23 Updating the Posterior
◮ Ideally, this runs in real time.
◮ Approximate the posterior as Gaussian.
◮ The posterior is the product of two smooth, log-concave
terms.
◮ (The GLM likelihood function and the Gaussian prior)
◮ Laplace approximation to construct a Gaussian
approximation of the posterior.
◮ Set µt to the peak of the posterior.
◮ Set covariance matrix Ct to negative inverse of Hessian of
log posterior at µt.
◮ Compute directly?
◮ Complexity is O(td² + d³)
◮ O(td²) for product of t likelihood terms.
◮ O(d³) for inverting the Hessian.
◮ Approximate p(θt−1|xt−1, rt−1) as Gaussian.
SLIDE 24 Updating the Posterior
◮ Ideally, this runs in real time.
◮ Approximate the posterior as Gaussian.
◮ The posterior is the product of two smooth, log-concave
terms.
◮ (The GLM likelihood function and the Gaussian prior)
◮ Laplace approximation to construct a Gaussian
approximation of the posterior.
◮ Set µt to the peak of the posterior.
◮ Set covariance matrix Ct to negative inverse of Hessian of
log posterior at µt.
◮ Compute directly?
◮ Complexity is O(td² + d³)
◮ O(td²) for product of t likelihood terms.
◮ O(d³) for inverting the Hessian.
◮ Approximate p(θt−1|xt−1, rt−1) as Gaussian.
◮ Now we can use Bayes’ rule to find the posterior in one dimension.
SLIDE 25 Updating the Posterior
◮ Ideally, this runs in real time.
◮ Approximate the posterior as Gaussian.
◮ The posterior is the product of two smooth, log-concave
terms.
◮ (The GLM likelihood function and the Gaussian prior)
◮ Laplace approximation to construct a Gaussian
approximation of the posterior.
◮ Set µt to the peak of the posterior.
◮ Set covariance matrix Ct to negative inverse of Hessian of
log posterior at µt.
◮ Compute directly?
◮ Complexity is O(td² + d³)
◮ O(td²) for product of t likelihood terms.
◮ O(d³) for inverting the Hessian.
◮ Approximate p(θt−1|xt−1, rt−1) as Gaussian.
◮ Now we can use Bayes’ rule to find the posterior in one dimension.
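This one-trial update can be sketched compactly: approximate the previous posterior as N(µ, C), find the new mode with a one-dimensional Newton search (possible because the likelihood depends on θ only through θ·x), and take the covariance from the Hessian at the mode. The sketch below is a simplified, illustrative version for a Poisson model with exponential link and no spike-history terms; the names are placeholders, not the paper's notation.

    import numpy as np

    def laplace_update(mu, C, x, r):
        """One-trial Laplace update of the Gaussian posterior N(mu, C) over theta.

        Illustrative simplification: r ~ Poisson(exp(theta . x)), so the
        likelihood depends on theta only through u = theta . x and the new
        mode lies along C x, reducing mode-finding to a 1-D Newton search."""
        Cx = C @ x                      # O(d^2)
        s = x @ Cx                      # x^T C x (scalar)
        b = mu @ x                      # prior mean projected onto x
        u = b                           # initial guess for the 1-D search
        for _ in range(100):
            phi = u - b - (r - np.exp(u)) * s    # stationarity condition in u
            dphi = 1.0 + s * np.exp(u)
            u_new = u - phi / dphi               # Newton step
            if abs(u_new - u) < 1e-10:
                u = u_new
                break
            u = u_new
        mu_new = mu + (r - np.exp(u)) * Cx       # peak of the posterior
        # Laplace covariance: (C^{-1} + exp(u) x x^T)^{-1}, computed with the
        # Sherman-Morrison identity so no O(d^3) inversion is needed.
        w = np.exp(u)
        C_new = C - np.outer(Cx, Cx) * (w / (1.0 + w * s))
        return mu_new, C_new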
SLIDE 26
Deriving the optimal stimulus
◮ Main idea: maximize conditional mutual information:
SLIDE 27
Deriving the optimal stimulus
◮ Main idea: maximize conditional mutual information:
◮ I(θ; rt+1|xt+1, xt, rt) = H(θ|xt, rt) − H(θ|xt+1, rt+1).
SLIDE 28
Deriving the optimal stimulus
◮ Main idea: maximize conditional mutual information:
◮ I(θ; rt+1|xt+1, xt, rt) = H(θ|xt, rt) − H(θ|xt+1, rt+1).
◮ This ends up being equivalent to minimizing the conditional
entropy H(θ|xt+1, rt+1).
SLIDE 29
Deriving the optimal stimulus
◮ Main idea: maximize conditional mutual information:
◮ I(θ; rt+1|xt+1, xt, rt) = H(θ|xt, rt) − H(θ|xt+1, rt+1).
◮ This ends up being equivalent to minimizing the conditional
entropy H(θ|xt+1, rt+1).
◮ End up with equation for covariance in terms of Fisher information, J_obs.
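For intuition (a standard fact, not spelled out on the slides): under the Gaussian approximation, the entropy of the posterior after the next trial depends only on its covariance,

    H(θ|xt+1, rt+1) ≈ ½ log det(2πe Ct+1),

so minimizing the conditional entropy amounts to choosing the stimulus that shrinks log det Ct+1 the most.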
SLIDE 30
Deriving the optimal stimulus
◮ Main idea: maximize conditional mutual information:
◮ I(θ; rt+1|xt+1, xt, rt) = H(θ|xt, rt) − H(θ|xt+1, rt+1).
◮ This ends up being equivalent to minimizing the conditional
entropy H(θ|xt+1, rt+1).
◮ End up with equation for covariance in terms of Fisher information, J_obs.
◮ We are able to solve for optimal stimulus using the
Lagrange method for constrained optimization
SLIDE 31
Deriving the optimal stimulus
◮ Main idea: maximize conditional mutual information:
◮ I(θ; rt+1|xt+1, xt, rt) = H(θ|xt, rt) − H(θ|xt+1, rt+1).
◮ This ends up being equivalent to minimizing the conditional
entropy H(θ|xt+1, rt+1).
◮ End up with equation for covariance in terms of Fisher information, J_obs.
◮ We are able to solve for optimal stimulus using the
Lagrange method for constrained optimization
◮ Thus, we have a system of equations in the Lagrange
multiplier, and we can simply line search over it.
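The slides state the result without the underlying equations, so the following is only a generic sketch of the computational pattern being described: maximizing a quadratic surrogate of the information gain under a stimulus power constraint, with a one-dimensional search over the Lagrange multiplier. A, b, and m are illustrative placeholders, not the paper's exact quantities.

    import numpy as np

    def constrained_stimulus(A, b, m, bisect_iters=100):
        """Maximize (1/2) x^T A x + b^T x subject to ||x|| = m.

        Stationarity of the Lagrangian gives (lam*I - A) x = b, so every
        admissible lam > lambda_max(A) determines a candidate x(lam); a 1-D
        search over lam picks the one satisfying the norm constraint."""
        d = A.shape[0]
        lam_floor = np.linalg.eigvalsh(A).max()

        def x_of(lam):
            return np.linalg.solve(lam * np.eye(d) - A, b)

        # Near lam_floor, ||x(lam)|| is very large (assuming b has a component
        # along A's top eigenvector); as lam grows, ||x(lam)|| shrinks to zero.
        lo = lam_floor + 1e-8
        hi = lam_floor + 1.0
        while np.linalg.norm(x_of(hi)) > m:      # grow hi until ||x|| <= m
            hi *= 2.0
        for _ in range(bisect_iters):            # bisection on ||x(lam)|| - m
            mid = 0.5 * (lo + hi)
            if np.linalg.norm(x_of(mid)) > m:
                lo = mid
            else:
                hi = mid
        return x_of(hi)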
SLIDE 32
Deriving the optimal stimulus
◮ Complexity?
SLIDE 33 Deriving the optimal stimulus
◮ Complexity?
◮ Rank-one matrix update and line search to compute µt and
Ct.
SLIDE 34 Deriving the optimal stimulus
◮ Complexity?
◮ Rank-one matrix update and line search to compute µt and Ct. O(d²).
◮ Eigendecomposition of Ct.
SLIDE 35 Deriving the optimal stimulus
◮ Complexity?
◮ Rank-one matrix update and line search to compute µt and Ct. O(d²).
◮ Eigendecomposition of Ct. O(d³).
◮ Line search over Lagrange multiplier to compute optimal stimulus.
◮ O(d³) for the eigendecomposition isn’t great...
SLIDE 36 Deriving the optimal stimulus
◮ Complexity?
◮ Rank-one matrix update and line search to compute µt and Ct. O(d²).
◮ Eigendecomposition of Ct. O(d³).
◮ Line search over Lagrange multiplier to compute optimal stimulus.
◮ O(d³) for the eigendecomposition isn’t great...
◮ ...but because of our Gaussian approximation of θ, we can obtain Ct from Ct−1 with a rank-one modification...
SLIDE 37 Deriving the optimal stimulus
◮ Complexity?
◮ Rank-one matrix update and line search to compute µt and Ct. O(d²).
◮ Eigendecomposition of Ct. O(d³).
◮ Line search over Lagrange multiplier to compute optimal stimulus.
◮ O(d³) for the eigendecomposition isn’t great...
◮ ...but because of our Gaussian approximation of θ, we can obtain Ct from Ct−1 with a rank-one modification...
◮ ...and there are eigendecomposition algorithms that can
take advantage of this.
SLIDE 38 Deriving the optimal stimulus
◮ Complexity?
◮ Rank-one matrix update and line search to compute µt and Ct. O(d²).
◮ Eigendecomposition of Ct. O(d³).
◮ Line search over Lagrange multiplier to compute optimal stimulus.
◮ O(d³) for the eigendecomposition isn’t great...
◮ ...but because of our Gaussian approximation of θ, we can obtain Ct from Ct−1 with a rank-one modification...
◮ ...and there are eigendecomposition algorithms that can
take advantage of this.
◮ This provides an average case runtime of O(d²) for the data considered, though the complexity is still O(d³) in the worst case.
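The rank-one claim is easy to check numerically. Below is a small illustrative check (the curvature w and the dimensions are placeholders, and the paper additionally updates the eigendecomposition of Ct rather than Ct itself):

    import numpy as np

    rng = np.random.default_rng(1)
    d = 50
    A = rng.normal(size=(d, d))
    C_prev = A @ A.T + d * np.eye(d)        # previous posterior covariance (SPD)
    x = rng.normal(size=d)                  # stimulus direction for the new trial
    w = 0.7                                 # curvature of the new likelihood term

    # Direct route: invert the updated precision matrix, O(d^3).
    C_direct = np.linalg.inv(np.linalg.inv(C_prev) + w * np.outer(x, x))

    # Rank-one route (Sherman-Morrison), O(d^2) once C_prev @ x is available.
    Cx = C_prev @ x
    C_rank1 = C_prev - np.outer(Cx, Cx) * (w / (1.0 + w * x @ Cx))

    print(np.allclose(C_direct, C_rank1))   # True (up to floating-point error)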
SLIDE 39
What if θ is dynamic?
◮ Spike history terms
SLIDE 40 What if θ is dynamic?
◮ Spike history terms
◮ Adds a linear term to a quadratic minimization problem for
maximizing entropy.
SLIDE 41 What if θ is dynamic?
◮ Spike history terms
◮ Adds a linear term to a quadratic minimization problem for
maximizing entropy.
◮ Systematic trends in θ.
SLIDE 42 What if θ is dynamic?
◮ Spike history terms
◮ Adds a linear term to a quadratic minimization problem for
maximizing entropy.
◮ Systematic trends in θ.
◮ Just add a Gaussian noise term ωt ~ N(0, Q) for known Q; the prior covariance becomes Ct + Q.
SLIDE 43 What if θ is dynamic?
◮ Spike history terms
◮ Adds a linear term to a quadratic minimization problem for
maximizing entropy.
◮ Systematic trends in θ.
◮ Just add a Gaussian noise term ωt ~ N(0, Q) for known Q; the prior covariance becomes Ct + Q.
◮ θt+1 = θt + ωt.
SLIDE 44 What if θ is dynamic?
◮ Spike history terms
◮ Adds a linear term to a quadratic minimization problem for
maximizing entropy.
◮ Systematic trends in θ.
◮ Just add a Gaussian noise term ωt ~ N(0, Q) for known Q; the prior covariance becomes Ct + Q.
◮ θt+1 = θt + ωt.
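A minimal sketch of where this enters the recursive update, assuming the random-walk model above (the variable names and the choice of Q are illustrative):

    import numpy as np

    def diffuse_prior(mu, C, Q):
        """Random-walk drift theta_{t+1} = theta_t + omega_t, omega_t ~ N(0, Q):
        the predicted mean is unchanged and the covariance inflates to C + Q."""
        return mu, C + Q

    # e.g. isotropic drift with a small known variance, applied before each
    # posterior update:
    d = 4
    mu, C = np.zeros(d), np.eye(d)
    Q = 0.01 * np.eye(d)
    mu, C = diffuse_prior(mu, C, Q)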
SLIDE 45 Results
◮ Simple, memoryless, visual cell
◮ 25x33 bitmaps.
◮ Results on average much better, and never worse, than random.
SLIDE 46 Results
◮ Simple, memoryless, visual cell
◮ 25x33 bitmaps.
◮ Results on average much better, and never worse, than random.
◮ Memoryful neuron (simple sine wave)
◮ Outperformed random sampling for estimating spike history
and stimulus coefficients.
SLIDE 47 Results
◮ Simple, memoryless, visual cell
◮ 25x33 bitmaps.
◮ Results on average much better, and never worse, than random.
◮ Memoryful neuron (simple sine wave)
◮ Outperformed random sampling for estimating spike history
and stimulus coefficients.
◮ Non-systematic time drift
◮ Analogous to eye fatigue/exhaustion.
◮ Outperformed random sampling for estimating spike history
and stimulus coefficients.
SLIDE 48
Conclusion
◮ Approximations based on GLMs allow dramatically faster
algorithm.
SLIDE 49
Conclusion
◮ Approximations based on GLMs allow dramatically faster
algorithm.
◮ At worst, O(d³); on average, O(d²).
SLIDE 50
Conclusion
◮ Approximations based on GLMs allow dramatically faster
algorithm.
◮ At worst, O(d³); on average, O(d²).
◮ Fast enough to run in real time even for high-dimensional
problems.