
Neural Encoding Models
Maneesh Sahani, Gatsby Computational Neuroscience Unit, University College London, February 2019

Neural coding: The brain appears to process sensory information in a modular way. Different structures and …


  1–4. Spikes, or rate? Most neurons communicate using action potentials, statistically described by a point process:

  P(spike ∈ [t, t+dt)) = λ(t | H(t), stimulus, network activity) dt

  To fully model the response we need to identify λ. In general this depends on the spike history H(t) and on network activity. Three options:
  ◮ Ignore the history dependence and take network activity as a source of "noise" (i.e. assume firing is an inhomogeneous Poisson or Cox process, conditioned on the stimulus).
  ◮ Average multiple trials to estimate the mean intensity (or PSTH)

  λ(t, stimulus) = lim_{N→∞} (1/N) Σ_n λ(t | H_n(t), stimulus, network_n),

  and try to fit this.
  ◮ Attempt to capture history and network effects in simple models.
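As a concrete illustration of the second option, here is a minimal numpy sketch (not from the slides) that estimates the trial-averaged intensity λ(t), i.e. the PSTH, by binning and averaging spike times across repeated presentations of the same stimulus; the spike-time arrays and bin width are hypothetical.

```python
import numpy as np

def psth(spike_times_per_trial, t_max, bin_width=0.01):
    """Estimate the mean intensity lambda(t) (PSTH) from repeated trials.

    spike_times_per_trial : list of 1-D arrays of spike times (s), one per trial
    t_max                 : duration of each trial (s)
    bin_width             : histogram bin width (s)
    Returns bin centres and the estimated rate in spikes/s.
    """
    edges = np.arange(0.0, t_max + bin_width, bin_width)
    counts = np.zeros(len(edges) - 1)
    for spikes in spike_times_per_trial:
        counts += np.histogram(spikes, bins=edges)[0]
    n_trials = len(spike_times_per_trial)
    rate = counts / (n_trials * bin_width)   # mean count per bin -> spikes/s
    centres = edges[:-1] + bin_width / 2
    return centres, rate

# Hypothetical usage: 50 trials of a 2 s stimulus
# t, lam = psth(trials, t_max=2.0, bin_width=0.005)
```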

  5. Tuning – stationary stimuli

  6. (Nonlinear) filtering – dynamic stimuli. [Figure: the stimulus history leading up to a target time bin (a time signal, a spectrogram, or an image sequence) is rearranged as a vector; a temporal, spectro-temporal, or spatio-temporal filter forms the neural encoding model relating stimulus and response through linear filters; an estimation method finds the model parameters from the recorded spikes; model validation quantifies how well the model captures the neural response.]

  7–8. Spike-triggered average. Decoding: mean of P(s | r = 1). Encoding: predictive filter.

  9–13. Linear regression. The linear model

  r(t) = ∫₀^T s(t − τ) w(τ) dτ

  can be written in discrete time by collecting lagged copies of the stimulus s₁, s₂, s₃, …, s_T, s_{T+1}, … into the rows of a design matrix S (row t holds s_t, s_{t−1}, …) and the responses r_T, r_{T+1}, … into a vector R, so that

  S W = R.

  The least-squares solution is

  W = (SᵀS)⁻¹ (SᵀR),

  i.e. the stimulus autocovariance Σ_SS inverted against the spike-triggered average (STA); in the frequency domain, W(ω) = S(ω)* R(ω) / |S(ω)|².
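A minimal numpy sketch of this construction (not from the slides; names are hypothetical): build the lagged design matrix from a stimulus time series, then solve the normal equations to obtain the whitened-STA filter.

```python
import numpy as np

def lagged_design(stimulus, n_lags):
    """Design matrix whose row t holds [s_t, s_{t-1}, ..., s_{t-n_lags+1}]."""
    T = len(stimulus)
    S = np.zeros((T, n_lags))
    for lag in range(n_lags):
        S[lag:, lag] = stimulus[:T - lag]
    return S

def ls_filter(stimulus, rate, n_lags):
    """Least-squares (whitened-STA) temporal filter: W = (S^T S)^{-1} S^T R."""
    S = lagged_design(stimulus, n_lags)
    return np.linalg.solve(S.T @ S, S.T @ rate)

# Hypothetical usage with a white-noise stimulus and binned spike counts:
# w_hat = ls_filter(stim, counts, n_lags=30)
```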

  14–22. Linear models. So the (whitened) spike-triggered average gives the minimum-squared-error linear model. Issues:
  ◮ overfitting and regularisation
  ◮ standard methods for regression
  ◮ negative predicted rates
  ◮ can model deviations from background
  ◮ real neurons aren't linear
  ◮ models are still used extensively
  ◮ interpretable suggestions of underlying sensitivity (but see later)
  ◮ may provide unbiased estimates of cascade filters (see later)

  23. Likelihood penalties for regularisation:

  ŵ = argmax_w [ L(w; Data) − R(w) ],

  where L is the likelihood and R the regulariser. R may penalise large values of w (e.g. ‖w‖² or Σᵢ |wᵢ|) or may promote smoothness or other properties.
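For the quadratic (‖w‖²) penalty with a Gaussian likelihood, the penalised fit has a closed form; a minimal sketch, assuming a lagged design matrix S as above and a hypothetical penalty weight lam:

```python
import numpy as np

def ridge_filter(S, r, lam):
    """Penalised least-squares filter:
    argmax_w [ -||r - S w||^2 - lam * ||w||^2 ]  =>  w = (S^T S + lam I)^{-1} S^T r."""
    n = S.shape[1]
    return np.linalg.solve(S.T @ S + lam * np.eye(n), S.T @ r)

# Hypothetical usage: choose lam by cross-validation on held-out prediction error.
# w_hat = ridge_filter(S, counts, lam=10.0)
```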

  24–27. Appropriate priors. [Figure: an example spectro-temporal receptive field, frequency 25–100 kHz against time −240 to 0 ms.] A Gaussian prior on the weights with covariance C can encode:
  ◮ sparsity [Cᵢᵢ zero for many i] – ARD
  ◮ smoothness [Cᵢⱼ high for close i and j] – ASD
  ◮ locality [Cᵢᵢ high in a single region] – ALD
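As an illustration of the smoothness case (not the slides' exact algorithm: ASD additionally optimises the prior hyperparameters by evidence maximisation), here is a hedged sketch of a MAP estimate under a fixed squared-exponential prior covariance over filter coefficients; the length-scale and noise variance are hypothetical.

```python
import numpy as np

def smooth_prior_map(S, r, ell=2.0, rho=1.0, noise_var=1.0):
    """MAP filter under a fixed Gaussian prior w ~ N(0, C) with
    C_ij = rho * exp(-(i - j)^2 / (2 ell^2))   (smoothness over coefficient index).
    ASD itself would also optimise ell, rho and noise_var by maximising the
    marginal likelihood (evidence); that step is omitted in this sketch."""
    n = S.shape[1]
    idx = np.arange(n)
    C = rho * np.exp(-0.5 * (idx[:, None] - idx[None, :]) ** 2 / ell ** 2)
    A = S.T @ S / noise_var + np.linalg.inv(C + 1e-8 * np.eye(n))
    return np.linalg.solve(A, S.T @ r / noise_var)
```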

  28. Smoothness and sparsity (ASD/RD). [Figure: estimated spectro-temporal receptive fields for the same neuron (frequency 25–100 kHz against time −240 to 0 ms) under ML, ARD, ASD and ASD/RD, with bar plots comparing the corresponding predictive performance.]

  29–30. Beyond linearity. Linear models often fail to predict well. Alternatives?
  ◮ Wiener/Volterra functional expansions
  ◮ M-series
  ◮ Linearised estimation
  ◮ Kernel formulations
  ◮ LN (Wiener) cascades
  ◮ Spike-triggered covariance (STC) methods
  ◮ "Maximally informative" dimensions (MID) ⇔ ML nonparametric LNP models
  ◮ ML parametric GLM models
  ◮ NL (Hammerstein) cascades
  ◮ Multilinear formulations
  ◮ LNLN and more …

  31. The Volterra functional expansion. A polynomial-like expansion for functionals (or operators). Let y(t) = F[x(t)]. Then:

  y(t) ≈ k⁽⁰⁾ + ∫ dτ k⁽¹⁾(τ) x(t−τ) + ∫∫ dτ₁ dτ₂ k⁽²⁾(τ₁, τ₂) x(t−τ₁) x(t−τ₂) + ∫∫∫ dτ₁ dτ₂ dτ₃ k⁽³⁾(τ₁, τ₂, τ₃) x(t−τ₁) x(t−τ₂) x(t−τ₃) + …

  or (in discretised time)

  y_t = K⁽⁰⁾ + Σᵢ K⁽¹⁾ᵢ x_{t−i} + Σᵢⱼ K⁽²⁾ᵢⱼ x_{t−i} x_{t−j} + Σᵢⱼₖ K⁽³⁾ᵢⱼₖ x_{t−i} x_{t−j} x_{t−k} + …

  For a finite expansion, the kernels k⁽⁰⁾, k⁽¹⁾(·), k⁽²⁾(·,·), k⁽³⁾(·,·,·), … are not straightforwardly related to the functional F. Indeed, the values of lower-order kernels change as the maximum order of the expansion is increased. Estimation: the model is linear in the kernels, so it can be estimated just like a linear (first-order) model with an expanded "input".
  ◮ Kernel trick: polynomial kernel K(x₁, x₂) = (1 + x₁·x₂)ⁿ.
  ◮ M-series.
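Because the expansion is linear in the kernels, a truncated (here second-order) Volterra model can be fit by ordinary least squares on an expanded design matrix; a minimal sketch under that assumption (names hypothetical):

```python
import numpy as np
from itertools import combinations_with_replacement

def volterra2_design(stimulus, n_lags):
    """Columns: constant, all lags x_{t-i}, and all products x_{t-i} x_{t-j} (i <= j)."""
    T = len(stimulus)
    X = np.zeros((T, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = stimulus[:T - lag]
    quad = [X[:, i] * X[:, j]
            for i, j in combinations_with_replacement(range(n_lags), 2)]
    return np.column_stack([np.ones(T), X] + quad)

def fit_volterra2(stimulus, y, n_lags):
    """Least-squares estimate of the zeroth-, first- and second-order kernels."""
    D = volterra2_design(stimulus, n_lags)
    coeffs, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coeffs  # ordering matches the columns built above
```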

  32. Wiener expansion. The Wiener expansion gives functionals of different orders that are orthogonal for white-noise input x(t) (with noise power P):

  G₀[x(t); h⁽⁰⁾] = h⁽⁰⁾
  G₁[x(t); h⁽¹⁾] = ∫ dτ h⁽¹⁾(τ) x(t−τ)
  G₂[x(t); h⁽²⁾] = ∫∫ dτ₁ dτ₂ h⁽²⁾(τ₁, τ₂) x(t−τ₁) x(t−τ₂) − P ∫ dτ₁ h⁽²⁾(τ₁, τ₁)
  G₃[x(t); h⁽³⁾] = ∫∫∫ dτ₁ dτ₂ dτ₃ h⁽³⁾(τ₁, τ₂, τ₃) x(t−τ₁) x(t−τ₂) x(t−τ₃) − 3P ∫∫ dτ₁ dτ₂ h⁽³⁾(τ₁, τ₂, τ₂) x(t−τ₁)

  It is easy to verify that E[Gᵢ[x(t)] Gⱼ[x(t)]] = 0 for i ≠ j. Thus these kernels can be estimated independently. But they depend on the stimulus.

  33. Cascade models. The LNP (Wiener) cascade: a linear filter k, a static nonlinearity, and Poisson spike generation.
  ◮ Rectification addresses negative firing rates.
  ◮ Loose biophysical correspondence.

  34. LNP cascades and noise. [Figure: the stimulus s is filtered by a filter k to give the filtered stimulus x, which drives three alternative output models: a linear-Gaussian model, a linear-nonlinear Poisson model, and a linear-nonlinear Bernoulli model with spike-history weights.]

  35. LNP estimation – the Spike-triggered ensemble

  36. Single linear filter (LNP cascade with a single filter k).
  ◮ The STA is an unbiased estimate of the filter for a spherical input distribution (Bussgang's theorem).
  ◮ Elliptically-distributed data can be whitened ⇒ linear regression weights are unbiased.
  ◮ Linear weights are not necessarily maximum-likelihood (or otherwise optimal), even for spherical/elliptical stimulus distributions.
  ◮ Linear weights may be biased for general stimuli (binary/uniform or natural).

  37. Multiple filters. [Figure: stimuli x₁…x₄ pass through linear filters k₁…k₄, a nonlinearity, and spike generation.] The stimulus distribution changes along the relevant directions (and, usually, along all linear combinations of relevant directions). Proxies to measure the change in distribution:
  ◮ mean: STA (can only reveal a single direction)
  ◮ variance: STC
  ◮ binned (or kernel) KL divergence: MID, "maximally informative dimensions" (equivalent to ML in an LNP model with a binned nonlinearity)

  38. STC. Project out the STA:

  S̃ = S − (S k_sta) k_staᵀ;   C_prior = S̃ᵀ S̃ / N;   C_spike = S̃ᵀ diag(R) S̃ / N_spike.

  Choose directions with the greatest change in variance:

  argmax_{‖v‖=1} vᵀ (C_prior − C_spike) v

  ⇒ find eigenvectors of (C_prior − C_spike) with large (absolute) eigenvalues.
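A minimal numpy sketch of this recipe (not from the slides; it assumes rows of S are lagged stimulus vectors and R holds spike counts per row):

```python
import numpy as np

def stc_directions(S, R, n_dirs=2):
    """Spike-triggered covariance analysis.

    S : (T, d) design matrix, one (possibly whitened) stimulus vector per time bin
    R : (T,)  spike counts per bin
    Returns the STA and the n_dirs eigenvectors of (C_prior - C_spike)
    with the largest-magnitude eigenvalues."""
    n_spikes = R.sum()
    sta = S.T @ R / n_spikes
    k = sta / np.linalg.norm(sta)
    S_perp = S - np.outer(S @ k, k)                 # project out the STA direction
    C_prior = S_perp.T @ S_perp / S.shape[0]
    C_spike = S_perp.T @ (R[:, None] * S_perp) / n_spikes
    evals, evecs = np.linalg.eigh(C_prior - C_spike)
    order = np.argsort(-np.abs(evals))
    return sta, evecs[:, order[:n_dirs]]
```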

  39. STC: reconstruct the nonlinearity (may assume separability).

  40. Biases. STC (obviously) requires that the nonlinearity alter the variance. If so, the recovered subspace is unbiased provided the stimulus distribution is
  ◮ radially (elliptically) symmetric
  ◮ AND independent ⇒ Gaussian.
  It may be possible to correct for a non-Gaussian stimulus by transformation, subsampling or weighting (the latter two at a cost in variance).

  41. More LNP methods.
  ◮ Non-parametric nonlinearities: "maximally informative dimensions" (MID) ⇔ "non-parametric" maximum likelihood.
  ◮ Intuitively, this extends the variance-difference idea to arbitrary differences between the marginal and spike-conditioned stimulus distributions:

  k_MID = argmax_k KL[ P(k · x) ‖ P(k · x | spike) ]

  ◮ Measuring the KL divergence requires binning or smoothing; this turns out to be equivalent to fitting a non-parametric nonlinearity by binning or smoothing (Williamson, Sahani & Pillow, PLoS Comput Biol 2015).
  ◮ Difficult to use for high-dimensional LNP models (but the ML viewpoint suggests separable or "cylindrical" basis functions – see Williamson et al.).
  ◮ Parametric nonlinearities: the "generalised linear model" (GLM).

  42. Generalised linear models. LN models with specified nonlinearities and exponential-family noise. In general (for monotonic g):

  y ∼ ExpFamily[μ(x)];   g(μ) = βx

  For our purposes it is easier to write y ∼ ExpFamily[f(βx)].
  The (continuous-time) point-process likelihood with GLM-like dependence of λ on covariates is approached, in the limit of bin width → 0, by either the Poisson or the Bernoulli GLM.
  Mark Berman and T. Rolf Turner (1992). Approximating point process likelihoods with GLIM. Journal of the Royal Statistical Society, Series C (Applied Statistics), 41(1):31–38.

  43. Generalised linear models. For the Poisson distribution, f = exp(·) is canonical (natural parameter = βx). Canonical link functions give concave likelihoods ⇒ unique maxima. This generalises (for Poisson) to any f which is convex and log-concave:

  log-likelihood = c − f(βx) + y log f(βx)

  Includes:
  ◮ threshold-linear
  ◮ threshold-polynomial f(z) = [z³]₊
  ◮ "soft-threshold" f(z) = α⁻¹ log(1 + e^{αz}).
  [Figure: plots of f(z) = log(1 + e^z), f(z) = ⅓ log(1 + e^{3z}), and f(z) = [z]₊ against z.]

  44. Generalised linear models. ML parameters are found by
  ◮ gradient ascent
  ◮ IRLS
  Regularisation by L2 (quadratic) or L1 (absolute value – sparse) penalties (MAP with Gaussian/Laplacian priors) preserves concavity.
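As a hedged illustration (not the slides' code), a Poisson GLM with the soft-threshold nonlinearity can be fit by penalised maximum likelihood using generic gradient-based optimisation; scipy's minimize is used here and all names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def soft_threshold(z, alpha=1.0):
    """f(z) = (1/alpha) * log(1 + exp(alpha z)), computed stably."""
    return np.logaddexp(0.0, alpha * z) / alpha

def neg_log_lik(beta, X, y, dt=0.001, l2=0.0):
    """Penalised negative Poisson log-likelihood for rate f(X beta)."""
    rate = soft_threshold(X @ beta)
    # per-bin Poisson log-likelihood, dropping terms constant in beta
    ll = np.sum(y * np.log(rate * dt + 1e-12) - rate * dt)
    return -ll + l2 * np.sum(beta ** 2)

def fit_glm(X, y, dt=0.001, l2=1.0):
    beta0 = np.zeros(X.shape[1])
    res = minimize(neg_log_lik, beta0, args=(X, y, dt, l2), method="L-BFGS-B")
    return res.x
```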

  45. Linear-Nonlinear-Poisson (GLM). [Diagram: stimulus → stimulus filter k → nonlinearity → Poisson point-process spiking.]

  46. GLM with history-dependence (Truccolo et al. 2004). [Diagram: stimulus → stimulus filter k, summed with a post-spike filter h fed back from the output, → exponential nonlinearity → Poisson spiking; the summed drive defines the conditional intensity (spike rate).]
  • The rate is a product of stimulus-dependent and spike-history-dependent terms.
  • The output is no longer a Poisson process.
  • Also known as a "soft-threshold" integrate-and-fire model.
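A minimal simulation sketch of this conditional-intensity model (an assumption-laden illustration, not the authors' code): at each time bin the rate is exp(stimulus drive + post-spike-filter drive), and a spike is drawn per bin.

```python
import numpy as np

def simulate_glm(stim_drive, h, dt=0.001, rng=None):
    """Simulate a GLM with history dependence.

    stim_drive : (T,) precomputed k . x(t) for each bin
    h          : (L,) post-spike filter (applied to the preceding L bins of spiking)
    Returns a binary spike train of length T."""
    rng = np.random.default_rng() if rng is None else rng
    T, L = len(stim_drive), len(h)
    spikes = np.zeros(T)
    for t in range(T):
        hist = spikes[max(0, t - L):t][::-1]                # most recent bin first
        drive = stim_drive[t] + hist @ h[:len(hist)]
        rate = np.exp(drive)                                # conditional intensity
        spikes[t] = rng.random() < 1 - np.exp(-rate * dt)   # P(spike in this bin)
    return spikes
```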

  47. GLM with history-dependence. [Diagram as before, comparing a traditional integrate-and-fire neuron (a "hard threshold" on the filter output) with the GLM's "soft-threshold" spike rate as a function of filter output.]
  • A "soft-threshold" approximation to the integrate-and-fire model.

  48–50. GLM dynamic behaviors. [Figures: for a common stimulus x(t), different post-spike waveforms produce regular spiking, irregular spiking, bursting and adaptation; each panel shows the post-spike waveform and the stimulus-filter and spike-history-filter contributions alongside the resulting spike trains over 0–500 ms.]

  51. Generalized Linear Model (GLM). [Diagram: stimulus → stimulus filter, summed with a post-spike filter, → exponential nonlinearity → probabilistic spiking.]

  52–53. Multi-neuron GLM. [Diagram: each neuron has its own stimulus filter, post-spike filter, exponential nonlinearity and probabilistic spiking; coupling filters carry each neuron's spikes into the other neurons' inputs (shown for neuron 1 and neuron 2).]

  54. GLM equivalent diagram. [Diagram: the covariates at time t combine to give the conditional intensity (spike rate).]

  55. Non-LN models? The idea of responses depending on one or a few linear stimulus projections has been dominant, but it cannot capture all nonlinearities.
  ◮ Contrast sensitivity might require normalisation by ‖s‖.
  ◮ Linear weighting may depend on the units of stimulus measurement: amplitude? energy? logarithms? thresholds? (NL models – Hammerstein cascades)
  ◮ Neurons, particularly in the auditory system, are known to be sensitive to combinations of inputs: forward suppression; spectral patterns (Young); time-frequency interactions (Sadagopan and Wang).
  ◮ Experiments with realistic stimuli reveal nonlinear sensitivity to parts/whole (Bar-Yosef and Nelken).
  Many of these questions can be tackled using a multilinear (Cartesian tensor) framework.

  56–60. Input nonlinearities. The basic linear model (for sounds):

  r̂(i) = Σ_{jk} w^{tf}_{jk} s(i−j, k),

  where r̂ is the predicted rate, w^{tf} the STRF weights, and s the stimulus power.
  How should we measure s? (pressure, intensity, dB, thresholded, …)
  We can learn an optimal representation g(·):

  r̂(i) = Σ_{jk} w^{tf}_{jk} g(s(i−j, k)).

  Define basis functions {g_l} such that g(s) = Σ_l w^l_l g_l(s), and a stimulus array M_{ijkl} = g_l(s(i−j, k)). Now the model is

  r̂(i) = Σ_{jkl} w^{tf}_{jk} w^l_l M_{ijkl}   or   r̂ = (w^{tf} ⊗ w^l) • M.

  61. Multilinear models. Multilinear forms are straightforward to optimise by alternating least squares. Cost function:

  E = ‖ r − (w^{tf} ⊗ w^l) • M ‖².

  Minimise iteratively, defining the matrices B = w^l • M and A = w^{tf} • M, and updating

  w^{tf} = (Bᵀ B)⁻¹ Bᵀ r   and   w^l = (Aᵀ A)⁻¹ Aᵀ r.

  Each linear regression step can be regularised by evidence optimisation (suboptimal), with uncertainty propagated approximately using variational methods.
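A minimal numpy sketch of the alternating-least-squares loop for this bilinear model (an illustration under assumptions, not the authors' implementation; M is the precomputed stimulus array with the (j, k) index flattened):

```python
import numpy as np

def fit_bilinear_als(M, r, n_iter=50):
    """Alternating least squares for r_hat = (w_tf (x) w_l) . M.

    M : (T, JK, L) stimulus array, M[i, jk, l] = g_l(s(i-j, k)) with (j,k) flattened
    r : (T,) observed rate
    Returns the time-frequency weights w_tf (JK,) and input-nonlinearity weights w_l (L,).
    Note: the overall scale is shared between w_tf and w_l (a known degeneracy)."""
    T, JK, L = M.shape
    w_tf = np.random.randn(JK) * 0.01
    w_l = np.ones(L) / L
    for _ in range(n_iter):
        B = M @ w_l                              # (T, JK): contract over l
        w_tf = np.linalg.lstsq(B, r, rcond=None)[0]
        A = np.einsum('ijl,j->il', M, w_tf)      # (T, L): contract over (j, k)
        w_l = np.linalg.lstsq(A, r, rcond=None)[0]
    return w_tf, w_l
```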

  62. Some input non-linearities. [Figure: learned input-nonlinearity weights w^l plotted against sound level l (dB SPL).]

  63–65. Variable (combination-dependent) input gain.
  ◮ Sensitivities to different points in sensory space are not independent.
  ◮ Rather, the sensitivity at one point depends on other elements of the stimulus that create a local sensory context.
  ◮ This context adjusts the input gain of the cell from moment to moment, dynamically refining the shape of the weighted receptive field.

  66–68. Context-sensitive gain. [Diagram: stimulus s(i, k) → response r(i).] Starting from the linear STRF model,

  r̂(i) = c + Σ_{j=0}^{J} Σ_{k=1}^{K} w^{tf}_{j+1, k} s(i−j, k),

  each stimulus term is modulated by a local context field:

  r̂(i) = c + Σ_{j=0}^{J} Σ_{k=1}^{K} w^{tf}_{j+1, k} s(i−j, k) [ 1 + Σ_{m=0}^{M} Σ_{n=−N}^{N} w^{τφ}_{m+1, n+N+1} s(i−j−m, k+n) ].

  69. LNLN cascades.
  ◮ A limited description of the 'layered' structure of sensory pathways:

  r̂(t) = f( Σ_{n=1}^{N} w_n g_n( k_nᵀ s(t) ) )

  ◮ k_n describes the linear filter and g_n the output nonlinearity of each of N input subunits. The g_n are usually fixed half-wave rectifiers.
  ◮ Called a generalised nonlinear model (GNM; Butts et al. 2007, 2011; Schinkel-Bielefeld et al. 2012)
  ◮ or a nonlinear input model (NIM; McFarland et al. 2013).
  ◮ Parameters estimated by maximum likelihood using inhomogeneous Poisson noise – often by alternation (following Ahrens et al. 2008).
  ◮ Resembles a (perceptron) "neural network".
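For concreteness, a hedged sketch of the forward pass of such a subunit model (not the GNM/NIM code itself; the filter matrix K, the weights w and the soft output nonlinearity are illustrative assumptions):

```python
import numpy as np

def lnln_rate(X, K, w):
    """Predicted rate of an LNLN cascade.

    X : (T, d) lagged stimulus vectors
    K : (d, N) one linear subunit filter per column
    w : (N,)   weights on the rectified subunit outputs
    Subunit nonlinearities are half-wave rectifiers; the output nonlinearity
    is taken to be a soft-rectifier (an assumption)."""
    subunit_drive = X @ K                            # (T, N): k_n . s(t)
    subunit_out = np.maximum(subunit_drive, 0.0)     # g_n: half-wave rectification
    return np.logaddexp(0.0, subunit_out @ w)        # f: log(1 + exp(.))
```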

  70. Convolutional LNLN.

  r̂(t) = f( Σ_{c=1}^{C} Σ_{n=1}^{N} w_{c,n} Σ_{i=1}^{B} b_{c,i} g_i( k_{c,n}ᵀ s(t) ) )

  ◮ C "channels" – each uses the same kernel k_c translated to a different location (convolution).
  ◮ Input nonlinearities learned using basis expansion and alternation (Ahrens et al. 2008).
  ◮ Output nonlinearity f fixed.

  71. Limitations of linear approximations. What are the consequences of nonlinearities in the stimulus-response function for the interpretation of structure in linear models like STRFs?

  72–73. Linear fits to non-linear functions. (Stimulus dependence does not always signal response adaptation.)

  74–76. Approximations are stimulus dependent.
