Neural Encoding Models
Maneesh Sahani
maneesh@gatsby.ucl.ac.uk
Gatsby Computational Neuroscience Unit, University College London
Term 1, Autumn 2011
Studying sensory systems

A stimulus x(t) drives a neural response y(t).

Decoding (reconstruction): x̂(t) = G[y(t)]
Encoding (systems identification): ŷ(t) = F[x(t)]
General approach
Goal: Estimate p(spike|x, H) [or λ(t|x[0, t), H(t))] from data.
- Naive approach: measure p(spike|x, H) directly for every setting of x.
– too hard: too little data and too many potential inputs.
- Estimate some functional f(p) instead (e.g. mutual information)
- Select stimuli efficiently
- Fit models with smaller numbers of parameters
Spikes, or rate?
Most neurons communicate using action potentials, statistically described by a point process:

P[spike ∈ [t, t + dt)] = λ(t | H(t), stimulus, network activity) dt
To fully model the response we need to identify λ. In general this depends on the spike history H(t) and on network activity. Three options:
- Ignore the history dependence, and take network activity as a source of “noise” (i.e. assume firing is an inhomogeneous Poisson or Cox process, conditioned on the stimulus).
- Average multiple trials to estimate the mean intensity (or PSTH),

  λ(t, stimulus) = lim_{N→∞} (1/N) Σ_n λ(t | H_n(t), stimulus, network_n),

  and try to fit this.
- Attempt to capture history and network effects in simple models.
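A minimal Python sketch of the trial-averaging estimate above: it bins spike times from repeated trials and averages to approximate the mean intensity. The bin width and the spike-time format are assumptions of this illustration, not part of the slides.

```python
import numpy as np

def psth(spike_trains, t_max, dt=0.01):
    """Estimate the mean intensity (PSTH) by averaging binned spike
    counts over trials.

    spike_trains : list of arrays of spike times (s), one per trial
    t_max        : trial duration (s)
    dt           : bin width (s), an assumed analysis choice
    """
    edges = np.arange(0.0, t_max + dt, dt)
    counts = np.stack([np.histogram(st, bins=edges)[0]
                       for st in spike_trains])
    # mean count per bin, divided by bin width -> rate in spikes/s
    return counts.mean(axis=0) / dt, edges[:-1]
```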
Spike-triggered average
Decoding view: the STA is the mean of P(x | y = 1).
Encoding view: the STA provides a predictive filter.
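A minimal sketch of the spike-triggered average in Python, assuming a one-dimensional binned stimulus and per-bin spike counts (both assumptions of this illustration): it averages the stimulus window preceding each spike, weighted by the spike count.

```python
import numpy as np

def spike_triggered_average(stim, spikes, n_lags):
    """Average the stimulus segments preceding each spike.

    stim   : (T,) stimulus samples
    spikes : (T,) spike counts per time bin
    n_lags : number of stimulus samples preceding the spike to keep
    """
    sta = np.zeros(n_lags)
    n_spikes = 0
    for t in np.nonzero(spikes)[0]:
        if t >= n_lags:
            sta += spikes[t] * stim[t - n_lags:t]
            n_spikes += spikes[t]
    return sta / max(n_spikes, 1)
```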
Linear regression
ŷ(t) = ∫₀ᵀ x(t − τ) w(τ) dτ        W(ω) = X(ω)* Y(ω) / |X(ω)|²

In discrete time, collect lagged stimulus vectors as the rows of a design matrix X (row t is (x_{t−T+1}, …, x_t)) and the filter taps as a vector W = (w_T, …, w_1)ᵀ, so that XW = Y. Then

W = (XᵀX)⁻¹ (XᵀY),

where XᵀX is the stimulus autocovariance Σ_SS and XᵀY is the (unnormalised) spike-triggered average (STA).
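A sketch of the discrete-time estimate above, in Python with numpy. The small ridge term added to XᵀX is my assumption for numerical stability, not part of the slide's formula.

```python
import numpy as np

def fit_linear_filter(stim, y, n_lags, ridge=1e-3):
    """Least-squares filter W = (X^T X + ridge*I)^{-1} X^T Y, i.e. the
    whitened STA with a small ridge penalty (an added assumption)."""
    T = len(stim)
    # Rows are lagged stimulus windows: X[t] = stim[t-n_lags+1 : t+1]
    X = np.stack([stim[t - n_lags + 1: t + 1]
                  for t in range(n_lags - 1, T)])
    yv = y[n_lags - 1:]
    XtX = X.T @ X + ridge * np.eye(n_lags)  # ~ stimulus autocovariance
    Xty = X.T @ yv                          # ~ (unnormalised) STA
    return np.linalg.solve(XtX, Xty)
```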
Linear models
So the (whitened) spike-triggered average gives the minimum-squared-error linear model. Issues:
- overfitting and regularisation
  – standard methods for regression
- negative predicted rates
  – can model deviations from background
- real neurons aren't linear
  – models are still used extensively
  – interpretable suggestions of underlying sensitivity
  – may provide unbiased estimates of cascade filters (see later)
How good are linear predictions?
We would like an absolute measure of model performance. Measured responses can never be predicted perfectly:
- The measurements themselves are noisy.
Models may fail to predict because:
- They are the wrong model.
- Their parameters are mis-estimated due to noise.
Estimating predictable power
Write each trial's response as r⁽ⁿ⁾ = signal + noise, with powers P_signal and P_noise. Then

P(r⁽ⁿ⁾) = P_signal + P_noise            (single-trial power)
P(r̄) = P_signal + (1/N) P_noise         (power of the N-trial average r̄)

⇒ P_signal = (1/(N − 1)) [ N P(r̄) − ⟨P(r⁽ⁿ⁾)⟩ ]
  P_noise = ⟨P(r⁽ⁿ⁾)⟩ − P_signal
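A sketch of these estimators in Python, taking the “power” of a response to be its variance across time bins; that convention is an assumption of this illustration.

```python
import numpy as np

def signal_noise_power(R):
    """Estimate signal and noise power from N repeated trials.

    R : (N, T) array of binned responses to the same stimulus.
    Power is taken as variance across time bins (an assumption).
    """
    N = R.shape[0]
    p_avg = np.var(R.mean(axis=0))              # power of the N-trial average
    p_single = np.mean([np.var(r) for r in R])  # mean single-trial power
    p_signal = (N * p_avg - p_single) / (N - 1)
    p_noise = p_single - p_signal
    return p_signal, p_noise
```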
Signal power in A1 responses
[Figure: signal power vs. noise power (spikes²/bin) across A1 recordings, with a histogram over the number of recordings]
Testing a model
For a perfect prediction:

P(trial) − P(residual) = P(signal)

Thus, we can judge the performance of a model by the normalised predictive power

(P(trial) − P(residual)) / P(signal)

Similar to the coefficient of determination (r²), but the denominator is the predictable variance.
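A corresponding self-contained sketch of the normalised predictive power, again taking power as variance across time bins (the same assumption as in the previous sketch):

```python
import numpy as np

def normalized_predictive_power(R, prediction):
    """(P(trial) - P(residual)) / P(signal).

    R          : (N, T) responses to repeated presentations
    prediction : (T,) model prediction of the response
    """
    N = R.shape[0]
    p_trial = np.mean([np.var(r) for r in R])
    p_avg = np.var(R.mean(axis=0))
    p_signal = (N * p_avg - p_trial) / (N - 1)
    p_resid = np.mean([np.var(r - prediction) for r in R])
    return (p_trial - p_resid) / p_signal
```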
Predictive performance
[Figure: normalised STA predictive power vs. normalised Bayes predictive power, comparing training error and cross-validation error]
Extrapolating the model performance
Jackknifed estimates
[Figure: normalized linearly predictive power vs. normalized noise power across recordings]
Extrapolated linearity
[Figure: normalized linearly predictive power vs. normalized noise power, extrapolated to zero noise]
[extrapolated range: (0.19,0.39); mean Jackknife estimate: 0.29]
Simulated (almost) linear data
[Figure: normalized linearly predictive power vs. normalized noise power for simulated data, extrapolated to zero noise]
[extrapolated range: (0.95,0.97); mean Jackknife estimate: 0.97]
Linear fits to non-linear functions

Approximations are stimulus dependent
(Stimulus dependence does not always signal response adaptation)
Consequences
Local fitting can have counterintuitive consequences for the interpretation of a “receptive field”.
“Independently distributed” stimuli
Knowing stimulus power at any set of points in the analysis space provides no information about stimulus power at any other point.

[Figure: DRC and ripple stimuli in spectrotemporal space]

Independence is a property of both the stimulus and the analysis space.
Christianson, Sahani, and Linden (2008)
Nonlinearity & non-independence distort RF estimates
Stimulus may have higher-order correlations in other analysis spaces — interaction with nonlinearities can produce misleading “receptive fields.”
Christianson, Sahani, and Linden (2008)
What about natural sounds?
[Figures: “Multiplicative RF” and “Finch Song” spectrogram panels, frequency (kHz) vs. time (ms)]
Usually not independent in any space, so STRFs may not be conservative estimates of receptive fields.
Christianson, Sahani, and Linden (2008)
Beyond linearity
Linear models often fail to predict well. Alternatives?
- Wiener/Volterra functional expansions
  – M-series
  – Linearised estimation
  – Kernel formulations
- LN (Wiener) cascades
  – Spike-triggered covariance (STC) methods
  – “Maximally informative” dimensions (MID) ⇔ ML non-parametric LNP models
  – ML parametric GLM models
- NL (Hammerstein) cascades
  – Multilinear formulations
Non-linear models
The LNP (Wiener) cascade
[Diagram: linear filter k → point nonlinearity → Poisson spiking]

Rectification addresses negative firing rates, with a possible biophysical justification.
LNP estimation – the Spike-triggered ensemble
Single linear filter

Estimate by the STA and fit the non-linearity along that direction. The STA is unbiased for spherically (or elliptically) symmetric input distributions (Bussgang's theorem); non-spherical inputs introduce biases.
Multiple filters
Distribution changes along relevant directions (and, usually, along all linear combinations of relevant directions). Proxies for distribution:
- mean: STA (can only reveal a single direction)
- variance: STC
- binned (or kernel) KL: MID “maximally informative directions” (equivalent to ML in an LNP model with a binned nonlinearity)
STC
Project out the STA:

X̃ = X − (X k_sta) k_staᵀ

C_prior = X̃ᵀ X̃ / N        C_spike = X̃ᵀ diag(Y) X̃ / N_spike

Choose directions with the greatest change in variance:

k = argmax_{‖v‖=1} vᵀ (C_prior − C_spike) v

⇒ find the eigenvectors of (C_prior − C_spike) with large (absolute) eigenvalues.
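A compact numpy sketch of this procedure, assuming stimulus rows X and per-row spike counts y; the number of filters returned is a free choice of the illustration.

```python
import numpy as np

def stc_filters(X, y, n_filters=2):
    """Spike-triggered covariance: project out the STA, then take the
    eigenvectors of (C_prior - C_spike) with largest |eigenvalue|.

    X : (T, D) stimulus rows;  y : (T,) spike counts
    """
    sta = X.T @ y / y.sum()
    k = sta / np.linalg.norm(sta)
    Xp = X - np.outer(X @ k, k)                 # project out the STA
    C_prior = Xp.T @ Xp / X.shape[0]
    C_spike = Xp.T @ (Xp * y[:, None]) / y.sum()
    evals, evecs = np.linalg.eigh(C_prior - C_spike)
    order = np.argsort(-np.abs(evals))          # largest |eigenvalue| first
    return evecs[:, order[:n_filters]], evals[order]
```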
STC
Reconstruct nonlinearity (may assume separability)
Biases
STC (obviously) requires that the nonlinearity alter the variance. If it does, the recovered subspace is unbiased provided the stimulus distribution is
- radially (elliptically) symmetric
- AND independent
⇒ Gaussian.
May be possible to correct by transformation, subsampling or weighting (latter two at cost of variance).
More LNP methods
- Non-parametric non-linearities: “Maximally informative dimensions” (MID) ⇔ “non-parametric” maximum likelihood.
  – Intuitively, extends the variance-difference idea to arbitrary differences between the marginal and spike-conditioned stimulus distributions:

    k_MID = argmax_k KL[ P(k · x) ‖ P(k · x | spike) ]

  – Measuring KL requires binning or smoothing, which turns out to be equivalent to fitting a non-parametric nonlinearity by binning or smoothing.
  – Difficult to use for high-dimensional LNP models.
- Parametric non-linearities: the “generalised linear model” (GLM).
Generalised linear models
LN models with specified nonlinearities and exponential-family noise. In general (for monotonic g):

y ∼ ExpFamily[µ(x)];   g(µ) = βx

For our purposes it is easier to write

y ∼ ExpFamily[f(βx)]

The (continuous-time) point-process likelihood with GLM-like dependence of λ on covariates is approached, in the limit of bin width → 0, by either a Poisson or a Bernoulli GLM.
Berman, M. and Turner, T. R. (1992). Approximating point process likelihoods with GLIM. Journal of the Royal Statistical Society, Series C (Applied Statistics), 41(1):31–38.
Generalised linear models
Poisson distribution ⇒ f = exp(·) is the canonical choice (natural parameter = βx). Canonical link functions give concave likelihoods ⇒ unique maxima.

This generalises (for Poisson) to any f which is convex and log-concave:

log-likelihood = c − f(βx) + y log f(βx)

Includes:
- threshold-linear
- threshold-polynomial
- “soft-threshold” f(z) = α⁻¹ log(1 + e^{αz})
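A sketch of the soft-threshold Poisson log-likelihood in Python; the bin width dt and the small constant guarding log(0) are assumptions of the illustration.

```python
import numpy as np

def soft_threshold(z, alpha=1.0):
    """f(z) = alpha^{-1} log(1 + exp(alpha z)): convex and log-concave,
    so the Poisson GLM log-likelihood below remains concave in beta."""
    return np.logaddexp(0.0, alpha * z) / alpha

def poisson_loglik(beta, X, y, dt=0.01):
    """Per-bin Poisson log-likelihood sum_t [ y_t log f(x_t.beta)
    - f(x_t.beta) dt ], up to constants; dt is an assumed bin width."""
    rate = soft_threshold(X @ beta)
    return np.sum(y * np.log(rate * dt + 1e-12) - rate * dt)
```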
Generalised linear models
ML parameters found by
- gradient ascent
- IRLS
Regularisation by L2 (quadratic) or L1 (absolute-value, giving sparsity) penalties (MAP with Gaussian/Laplacian priors) preserves concavity.
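A minimal MAP-fitting sketch for the canonical (exponential) case with an L2 penalty, using scipy's L-BFGS in place of IRLS; the penalty weight lam is a hypothetical choice, and the bin width is assumed absorbed into an offset column of X.

```python
import numpy as np
from scipy.optimize import minimize

def fit_poisson_glm(X, y, lam=1.0):
    """MAP fit of an exponential-nonlinearity Poisson GLM with an L2
    penalty (Gaussian prior); concavity makes the optimum unique."""
    def neg_log_post(beta):
        z = X @ beta
        return np.sum(np.exp(z) - y * z) + 0.5 * lam * beta @ beta

    def grad(beta):
        z = X @ beta
        return X.T @ (np.exp(z) - y) + lam * beta

    res = minimize(neg_log_post, np.zeros(X.shape[1]),
                   jac=grad, method="L-BFGS-B")
    return res.x
```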
Linear-Nonlinear-Poisson (GLM)
[Diagram: stimulus → stimulus filter k → point nonlinearity → Poisson spiking, with conditional intensity λ(t)]
GLM with history-dependence
- rate is a product of stim- and spike-history dependent terms
- output no longer a Poisson process
- also known as “soft-threshold” Integrate-and-Fire model
[Diagram: stimulus → stimulus filter k, plus post-spike filter h(t) fed back from the spike train; the summed filter outputs pass through an exponential nonlinearity to give the conditional intensity (spike rate), which drives Poisson spiking (Truccolo et al. 2004). A traditional IF model applies a hard threshold to the filter output; the GLM is a “soft-threshold” IF with a graded spike rate.]
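A simulation sketch of this history-dependent GLM in Python: the log-rate is the sum of the stimulus-filter output and the post-spike filter convolved with past spikes, so the rate is a product of stimulus- and history-dependent terms. The filters, bin size, and seed are placeholders, not values from the slides.

```python
import numpy as np

def simulate_glm(stim, k, h, dt=0.001, rng=None):
    """Simulate a Poisson GLM with stimulus filter k and post-spike
    filter h: rate(t) = exp(k*stim + h*spike history)."""
    if rng is None:
        rng = np.random.default_rng(0)
    T, L = len(stim), len(h)
    drive = np.convolve(stim, k)[:T]        # stimulus-filter output
    spikes = np.zeros(T)
    hist = np.zeros(T)                      # spike-history filter output
    for t in range(T):
        rate = np.exp(drive[t] + hist[t])
        spikes[t] = rng.poisson(rate * dt)
        if spikes[t] and t + 1 < T:         # add post-spike waveform
            end = min(T, t + 1 + L)
            hist[t + 1:end] += spikes[t] * h[:end - t - 1]
    return spikes
```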
GLM dynamic behaviors

[Figures: stimulus x(t), post-spike waveform, and the stimulus-filter and spike-history-filter outputs, illustrating regular spiking, irregular spiking, bursting, and adaptation]
Generalized Linear Model (GLM)

[Diagram: stimulus → stimulus filter, with post-spike filter feedback; summed → exponential nonlinearity → probabilistic spiking]
multi-neuron GLM

[Diagram: two neurons, each with a stimulus filter and a post-spike filter, plus coupling filters carrying each cell's spikes into the other's summed input; an exponential nonlinearity gives each cell's conditional intensity (spike rate), which drives probabilistic spiking]
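A sketch extending the simulation to two coupled cells; the self (post-spike) and coupling filters h_self and h_cross, the precomputed stimulus drive, and the bin size are hypothetical placeholders for illustration.

```python
import numpy as np

def simulate_coupled_glm(drive, h_self, h_cross, dt=0.001, seed=0):
    """Two-neuron GLM: each cell's log-rate is its stimulus drive plus
    self- and cross-filtered spike histories.

    drive : (T, 2) stimulus-filter outputs for the two cells
    h_self, h_cross : filters of equal length L (assumed)
    """
    rng = np.random.default_rng(seed)
    T, L = drive.shape[0], len(h_self)
    spikes = np.zeros((T, 2))
    hist = np.zeros((T, 2))
    for t in range(T):
        rate = np.exp(drive[t] + hist[t])
        spikes[t] = rng.poisson(rate * dt)
        for i in (0, 1):
            if spikes[t, i]:
                end = min(T, t + 1 + L)
                n = end - t - 1
                hist[t + 1:end, i] += spikes[t, i] * h_self[:n]       # self
                hist[t + 1:end, 1 - i] += spikes[t, i] * h_cross[:n]  # coupling
    return spikes
```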
Multilinear models
Input nonlinearities (Hammerstein cascades) can be identified in a multilinear (Cartesian tensor) framework.
Input nonlinearities
The basic linear model (for sounds):

r̂(i) = Σ_{jk} w^{tf}_{jk} s(i − j, k)

with STRF weights w^{tf} and stimulus power s. How should we measure s? (pressure, intensity, dB, thresholded, . . . ) We can learn an optimal representation g(·):

r̂(i) = Σ_{jk} w^{tf}_{jk} g(s(i − j, k)).

Define basis functions {g_l} such that g(s) = Σ_l w^l_l g_l(s), and a stimulus array M_{ijkl} = g_l(s(i − j, k)). Now the model is

r̂(i) = Σ_{jkl} w^{tf}_{jk} w^l_l M_{ijkl}    i.e.    r̂ = (w^{tf} ⊗ w^l) • M.
Multilinear models
Multilinear forms are straightforward to optimise by alternating least squares. Cost function:

E = ‖ r − (w^{tf} ⊗ w^l) • M ‖²

Minimise iteratively, defining matrices B = w^l • M and A = w^{tf} • M and updating w^{tf} = (BᵀB)⁻¹Bᵀr and w^l = (AᵀA)⁻¹Aᵀr. Each linear-regression step can be regularised by evidence optimisation (suboptimal), with uncertainty propagated approximately using variational methods.
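A minimal unregularised ALS sketch in Python. The tensor layout (time-frequency lags flattened into one axis) and the initialisation are my choices, not from the slides; note that the overall scale is shared arbitrarily between the two weight vectors.

```python
import numpy as np

def fit_bilinear(M, r, n_iter=50):
    """Alternating least squares for r ≈ (w_tf ⊗ w_l) • M.

    M : (I, JK, L) tensor with M[i, jk, l] = g_l(s(i-j, k)), the
        time-frequency lags flattened into one axis
    r : (I,) observed rates
    """
    I, JK, L = M.shape
    wl = np.ones(L)                          # init: identity-like nonlinearity
    for _ in range(n_iter):
        B = M @ wl                           # (I, JK): contract out level axis
        wtf, *_ = np.linalg.lstsq(B, r, rcond=None)
        A = np.einsum('ijl,j->il', M, wtf)   # (I, L): contract out tf axis
        wl, *_ = np.linalg.lstsq(A, r, rcond=None)
    return wtf, wl
```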
Some input non-linearities
[Figure: learned input nonlinearity w^l as a function of sound level l (dB SPL)]
Parameter grouping
Separable STRF: (time) ⊗ (frequency). The input nonlinearity model is separable in another sense: (time, frequency) ⊗ (sound level).
[Diagram: (time, frequency) weights ⊗ (intensity) weights]
Other separations:
- (time, sound level) ⊗ (frequency): r̂ = (w^{tl} ⊗ w^f) • M
- (frequency, sound level) ⊗ (time): r̂ = (w^{fl} ⊗ w^t) • M
- (time) ⊗ (frequency) ⊗ (sound level): r̂ = (w^t ⊗ w^f ⊗ w^l) • M
(time, frequency) ⊗ (sound level)
[Figure: STRF over t (ms) × f (kHz), with level weights w^l vs. l (dB SPL)]

(time, sound level) ⊗ (frequency)
[Figure: weights over t (ms) × l (dB SPL), with frequency weights w^f vs. f (kHz)]

(frequency, sound level) ⊗ (time)
[Figure: weights over f (kHz) × l (dB SPL), with temporal weights w^t vs. t (ms)]
Contextual influences
[Diagram: stimulus passed through the principal field w^{tf} and the contextual field w^{τφ} to predict the PSTH]

rate(i) = c + Σ_{jkl} w^t_j w^f_k w^l_l g_l(sound_{i−j,k}) · [ c₂ + Σ_{mnp} w^τ_m w^φ_n w^λ_p g_p(sound_{i−j−m,k+n}) ]
Contextual influences
Introduce extra dimensions:
- τ: time difference between contextual and primary tone,
- φ: frequency difference between contextual and primary tone,
- λ: sound level of the contextual tone.
Model the effective sound level of a primary tone by

Level(i, j) → Level(i, j) · (const + Context(i, j)),

with the context given by

Context(i, j) = Σ_{m,n,p} w^τ_m w^φ_n w^λ_p h_p(s(i − m, j + n)).

This leads to a multilinear model:

r̂ = (w^t ⊗ w^f ⊗ w^l ⊗ w^τ ⊗ w^φ ⊗ w^λ) • M.
Inseparable contexts
We can also allow inseparable contexts (and principal fields), dropping the level nonlinearity to reduce the parameter count:

r̂(i) = c + Σ_{jk} w^{tf}_{jk} sound_{i−j,k} · [ 1 + Σ_{mn} w^{τφ}_{mn} sound_{i−j−m,k+n} ]

[Diagram: stimulus passed through the contextual field w^{τφ} and the principal field w^{tf} to predict the PSTH]