Neural Encoding Models - PowerPoint PPT Presentation

SLIDE 1

Neural Encoding Models

Maneesh Sahani

maneesh@gatsby.ucl.ac.uk

Gatsby Computational Neuroscience Unit, University College London. Term 1, Autumn 2011

SLIDE 2

Studying sensory systems

[Diagram: stimulus x(t) → sensory system → response y(t).]

Decoding (reconstruction): x̂(t) = G[y(t)]

Encoding (systems identification): ŷ(t) = F[x(t)]

SLIDE 3

General approach

Goal: Estimate p(spike|x, H) [or λ(t|x[0, t), H(t))] from data.

  • Naive approach: measure p(spike|x, H) directly for every setting of x.

    – too hard: too little data and too many potential inputs.

  • Estimate some functional f(p) instead (e.g. mutual information)
  • Select stimuli efficiently
  • Fit models with smaller numbers of parameters

SLIDE 4

Spikes, or rate?

Most neurons communicate using action potentials — statistically described by a point process:

    P[spike ∈ [t, t + dt)] = λ(t | H(t), stimulus, network activity) dt

To fully model the response we need to identify λ. In general this depends on spike history H(t) and network activity. Three options:

  • Ignore the history dependence, take network activity as a source of “noise” (i.e. assume firing is an inhomogeneous Poisson or Cox process, conditioned on the stimulus).

  • Average multiple trials to estimate

        λ(t, stimulus) = lim_{N→∞} (1/N) Σ_n λ(t | H_n(t), stimulus, network_n),

    the mean intensity (or PSTH), and try to fit this (a simulation sketch follows this list).

  • Attempt to capture history and network effects in simple models.
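
To make the trial-averaging option concrete, here is a minimal NumPy sketch (the rate profile, bin size, and trial count are all invented for illustration) that simulates an inhomogeneous Poisson neuron and estimates λ(t, stimulus) from the trial average:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented rate profile: a 1 s trial in 1 ms bins, rate in spikes/s.
dt = 0.001
t = np.arange(0, 1, dt)
lam = 20 + 15 * np.sin(2 * np.pi * 3 * t)

# Simulate N trials: P[spike in [t, t+dt)] ≈ λ(t)·dt for small dt.
N = 200
spikes = rng.random((N, t.size)) < lam * dt

# The trial average (PSTH) estimates the mean intensity λ(t, stimulus).
psth = spikes.mean(axis=0) / dt
```
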
SLIDE 5

Spike-triggered average

Decoding: the mean of P(x | y = 1). Encoding: a predictive filter.

SLIDE 6

Linear regression

    y(t) = ∫₀ᵀ x(t − τ) w(τ) dτ        W(ω) = X*(ω) Y(ω) / |X(ω)|²

In discrete time, stack lagged copies of the stimulus into a design matrix and the filter taps into a vector:

    [ x₁ x₂ x₃ … x_T   ]   [ w_T ]   [ y_T   ]
    [ x₂ x₃ x₄ … x_T+1 ] × [  ⋮  ] = [ y_T+1 ]
    [        ⋮         ]   [ w₁  ]   [   ⋮   ]

i.e. XW = Y, with least-squares solution

    W = (XᵀX)⁻¹ (XᵀY)

where XᵀX is the stimulus autocovariance Σ_SS and XᵀY is (proportional to) the STA.
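
A minimal NumPy sketch of this estimator (the function name and the small ridge term are my additions, not part of the slides):

```python
import numpy as np

def whitened_sta(x, y, n_lags, ridge=1e-3):
    """Least-squares filter W = (XᵀX + ridge·I)⁻¹ (XᵀY).

    x: 1-D stimulus; y: spike counts per bin (same length);
    n_lags: filter length. The ridge term just stabilises the inversion.
    """
    # Design matrix of lagged stimuli: row t = [x[t], x[t-1], ..., x[t-n_lags+1]]
    X = np.stack([np.roll(x, k) for k in range(n_lags)], axis=1)[n_lags - 1:]
    Y = y[n_lags - 1:]
    sta = X.T @ Y                      # (unnormalised) spike-triggered average
    cov = X.T @ X                      # stimulus autocovariance Σ_SS
    return np.linalg.solve(cov + ridge * np.eye(n_lags), sta)
```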

SLIDE 7

Linear models

So the (whitened) spike-triggered average gives the minimum-squared-error linear model. Issues:

  • overfitting and regularisation

– standard methods for regression

  • negative predicted rates

– can model deviations from background

  • real neurons aren’t linear

    – models are still used extensively
    – interpretable suggestions of underlying sensitivity
    – may provide unbiased estimates of cascade filters (see later)

SLIDE 8

How good are linear predictions?

We would like an absolute measure of model performance. Measured responses can never be predicted perfectly:

  • The measurements themselves are noisy.

Models may fail to predict because:

  • They are the wrong model.
  • Their parameters are mis-estimated due to noise.

SLIDE 9

Estimating predictable power

Write each trial’s response as signal plus noise:

    r(n) = signal + noise

    ⟨P(r(n))⟩ = P_signal + P_noise         (mean single-trial power)
    P(r̄)     = P_signal + (1/N) P_noise    (power of the N-trial average r̄)

    ⇒   P̂_signal = (1/(N − 1)) · (N · P(r̄) − ⟨P(r(n))⟩)
        P̂_noise  = ⟨P(r(n))⟩ − P̂_signal
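
A minimal sketch of these estimators, assuming “power” means the variance of the response about its mean (one common convention; the function name is mine):

```python
import numpy as np

def signal_noise_power(R):
    """Signal/noise power estimates from a trial matrix R of shape (N trials, T bins)."""
    N = R.shape[0]
    mean_trial_power = np.mean([np.var(r) for r in R])   # ⟨P(r(n))⟩
    avg_response_power = np.var(R.mean(axis=0))          # P(r̄)
    p_signal = (N * avg_response_power - mean_trial_power) / (N - 1)
    p_noise = mean_trial_power - p_signal
    return p_signal, p_noise
```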

SLIDE 10

Signal power in A1 responses

[Figure: signal power vs. noise power (spikes²/bin) across recordings.]

SLIDE 11

Testing a model

For a perfect prediction:

    P(trial) − P(residual) = P(signal)

Thus, we can judge the performance of a model by the normalized predictive power

    (P(trial) − P(residual)) / P(signal)

Similar to the coefficient of determination (r²), but the denominator is the predictable variance.
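
A matching sketch of the normalized predictive power, under the same variance-as-power convention (names are mine):

```python
import numpy as np

def normalized_predictive_power(R, prediction):
    """(P(trial) − P(residual)) / P(signal) for trials R (N, T) and a model prediction (T,)."""
    N = R.shape[0]
    p_trial = np.mean([np.var(r) for r in R])               # average single-trial power
    p_resid = np.mean([np.var(r - prediction) for r in R])  # power left unexplained
    p_signal = (N * np.var(R.mean(axis=0)) - p_trial) / (N - 1)
    return (p_trial - p_resid) / p_signal
```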

SLIDE 12

Predictive performance

[Figure: normalised STA predictive power vs. normalised Bayes predictive power, for training error and cross-validation error.]

SLIDE 13

Extrapolating the model performance

SLIDE 14

Jackknifed estimates

[Figure: normalized linearly predictive power vs. normalized noise power, jackknifed estimates.]

SLIDE 15

Extrapolated linearity

[Figure: normalized linearly predictive power vs. normalized noise power, extrapolated to zero noise.]

[extrapolated range: (0.19,0.39); mean Jackknife estimate: 0.29]

SLIDE 16

Simulated (almost) linear data

[Figure: normalized linearly predictive power vs. normalized noise power for simulated data, extrapolated to zero noise.]

[extrapolated range: (0.95,0.97); mean Jackknife estimate: 0.97]

SLIDE 17

Linear fits to non-linear functions

SLIDE 18

Linear fits to non-linear functions

(Stimulus dependence does not always signal response adaptation)

SLIDE 19

Approximations are stimulus dependent

(Stimulus dependence does not always signal response adaptation)

SLIDE 20

Consequences

Local fitting can have counterintuitive consequences for the interpretation of a “receptive field”.

SLIDE 21

“Independently distributed” stimuli

Knowing stimulus power at any set of points in analysis space provides no information about stimulus power at any other point.

[Figure: DRC and ripple stimuli in spectrotemporal space.]

Independence is a property of the stimulus and the analysis space.

Christianson, Sahani, and Linden (2008)

SLIDE 22

Nonlinearity & non-independence distort RF estimates

Stimulus may have higher-order correlations in other analysis spaces — interaction with nonlinearities can produce misleading “receptive fields.”

Christianson, Sahani, and Linden (2008)

SLIDE 23

What about natural sounds?

[Figure: multiplicative RFs and finch song spectrograms; freq. (kHz) vs. time (ms).]

Usually not independent in any space — so STRFs may not be conservative estimates of receptive fields.

Christianson, Sahani, and Linden (2008)

SLIDE 24

Beyond linearity

SLIDE 25

Beyond linearity

Linear models often fail to predict well. Alternatives?

  • Wiener/Volterra functional expansions

    – M-series
    – Linearised estimation
    – Kernel formulations

  • LN (Wiener) cascades

    – Spike-triggered covariance (STC) methods
    – “Maximally informative” dimensions (MID) ⇔ ML nonparametric LNP models
    – ML parametric GLM models

  • NL (Hammerstein) cascades

    – Multilinear formulations

SLIDE 26

Non-linear models

The LNP (Wiener) cascade

[Diagram: stimulus → linear filter k → static nonlinearity → Poisson spiking.]

Rectification addresses negative firing rates. Possible biophysical justification.

SLIDE 27

LNP estimation – the Spike-triggered ensemble

SLIDE 28

Single linear filter

The STA estimates the filter; the non-linearity is then fit to the projected data. The STA is unbiased for spherically (elliptically) symmetric inputs (cf. Bussgang’s theorem); non-spherical inputs introduce biases.

SLIDE 29

Multiple filters

Distribution changes along relevant directions (and, usually, along all linear combinations of relevant directions). Proxies for distribution:

  • mean: STA (can only reveal a single direction)
  • variance: STC
  • binned (or kernel) KL: MID, “maximally informative dimensions” (equivalent to ML in an LNP model with a binned nonlinearity)

SLIDE 30

STC

Project out the STA:

    X̃ = X − (X k_sta) k_staᵀ

Compare stimulus covariances with and without spikes:

    C_prior = X̃ᵀX̃ / N ;        C_spike = X̃ᵀ diag(Y) X̃ / N_spike

Choose directions with the greatest change in variance:

    argmax_{‖v‖=1} vᵀ(C_prior − C_spike) v

⇒ find eigenvectors of (C_prior − C_spike) with large (absolute) eigenvalues.
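
A minimal sketch of the STC computation as just described (function and variable names are mine; X holds one stimulus vector per row, y the spike count per row):

```python
import numpy as np

def stc_directions(X, y, n_dirs=2):
    """Eigenvectors of C_prior − C_spike with the largest |eigenvalues|."""
    sta = X.T @ y / y.sum()
    k = sta / np.linalg.norm(sta)
    Xp = X - np.outer(X @ k, k)                    # project out the STA direction
    c_prior = Xp.T @ Xp / X.shape[0]
    c_spike = (Xp * y[:, None]).T @ Xp / y.sum()   # X̃ᵀ diag(Y) X̃ / N_spike
    evals, evecs = np.linalg.eigh(c_prior - c_spike)
    order = np.argsort(-np.abs(evals))             # largest |eigenvalue| first
    return evecs[:, order[:n_dirs]], evals[order[:n_dirs]]
```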

SLIDE 31

STC

Reconstruct nonlinearity (may assume separability)

SLIDE 32

Biases

STC (obviously) requires that the nonlinearity alter the variance. If so, the subspace estimate is unbiased if the stimulus distribution is

  • radially (elliptically) symmetric
  • AND independent

⇒ Gaussian.

May be possible to correct by transformation, subsampling or weighting (latter two at cost of variance).

SLIDE 33

More LNP methods

  • Non-parametric non-linearities: “Maximally informative dimensions” (MID) ⇔ “non-parametric” maximum likelihood.

    – Intuitively, extends the variance-difference idea to arbitrary differences between marginal and spike-conditioned stimulus distributions:

          k_MID = argmax_k KL[ P(k · x) ‖ P(k · x | spike) ]

      (a binned sketch follows this list)
    – Measuring KL requires binning or smoothing — turns out to be equivalent to fitting a non-parametric nonlinearity by binning or smoothing.
    – Difficult to use for high-dimensional LNP models.

  • Parametric non-linearities: the “generalised linear model” (GLM).
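
The binned sketch promised above: a direct histogram estimate of the MID objective for a candidate direction k (names, bin count, and the orientation of the KL divergence are my choices):

```python
import numpy as np

def mid_objective(k, X, y, n_bins=25):
    """Binned KL between the marginal and spike-conditioned projections."""
    proj = X @ k
    edges = np.histogram_bin_edges(proj, bins=n_bins)
    p_all, _ = np.histogram(proj, bins=edges)
    p_spk, _ = np.histogram(proj[y > 0], bins=edges)
    p_all = p_all / p_all.sum()
    p_spk = p_spk / p_spk.sum()
    keep = (p_all > 0) & (p_spk > 0)   # skip empty bins to avoid log(0)
    return np.sum(p_spk[keep] * np.log(p_spk[keep] / p_all[keep]))
```
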
SLIDE 34

Generalised linear models

LN models with specified nonlinearities and exponential family noise. In general (for monotonic g):

y ∼ ExpFamily[µ(x)]; g(µ) = βx

For our purposes easier to write

y ∼ ExpFamily[f(βx)]

A continuous-time point-process likelihood with GLM-like dependence of λ on covariates is approached, in the limit of bin width → 0, by either a Poisson or a Bernoulli GLM.

Mark Berman and T. Rolf Turner (1992). Approximating point process likelihoods with GLIM. Journal of the Royal Statistical Society, Series C (Applied Statistics), 41(1):31–38.

SLIDE 35

Generalised linear models

Poisson distribution ⇒ f = exp(·) is canonical (natural parameter = βx). Canonical link functions give concave likelihoods ⇒ unique maxima. Generalises (for Poisson) to any f which is convex and log-concave:

    log-likelihood = c − f(βx) + y log f(βx)

(A fitting sketch follows the list below.) Includes:

  • threshold-linear
  • threshold-polynomial
  • “soft-threshold” f(z) = α⁻¹ log(1 + exp(αz)).
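
As promised above, a minimal fitting sketch for the canonical case f = exp(·), where the negative log-likelihood Σ exp(Xβ) − yᵀXβ is convex (function names are mine):

```python
import numpy as np
from scipy.optimize import minimize

def fit_poisson_glm(X, y):
    """ML fit of a Poisson GLM with rate = exp(X @ beta)."""
    def nll(beta):
        eta = X @ beta
        return np.sum(np.exp(eta)) - y @ eta   # −log-likelihood up to a constant

    def grad(beta):
        return X.T @ (np.exp(X @ beta) - y)

    beta0 = np.zeros(X.shape[1])
    return minimize(nll, beta0, jac=grad, method="L-BFGS-B").x
```
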
SLIDE 36

Generalised linear models

ML parameters found by

  • gradient ascent
  • IRLS

Regularisation by L2 (quadratic) or L1 (absolute value – sparse) penalties (MAP with Gaussian/Laplacian priors) preserves concavity.

SLIDE 37

Linear-Nonlinear-Poisson (GLM)

[Diagram: stimulus → stimulus filter k → point nonlinearity → Poisson spiking, rate λ(t).]

SLIDE 38

GLM with history-dependence

  • rate is a product of stim- and spike-history dependent terms
  • output no longer a Poisson process
  • also known as “soft-threshold” Integrate-and-Fire model

[Diagram: stimulus → stimulus filter k; spike train → post-spike filter h; the two are summed and passed through an exponential nonlinearity to give the conditional intensity λ(t) (spike rate), which drives Poisson spiking. (Truccolo et al. 2004)]
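
A toy simulation of this architecture (filter shapes, baseline, and constants are invented; the per-bin Bernoulli draw approximates the Poisson process for small dt):

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.001
k = np.exp(-np.arange(30) / 10.0)          # invented 30 ms stimulus filter
h = -5.0 * np.exp(-np.arange(20) / 5.0)    # invented post-spike (refractory) filter

T = 2000
x = rng.standard_normal(T)
drive = np.convolve(x, k)[:T]              # filtered stimulus
spikes = np.zeros(T)

for t in range(T):
    # conditional intensity: exp(stimulus drive + spike-history drive + baseline)
    hist = spikes[max(0, t - h.size):t][::-1]      # most recent spike first
    lam = np.exp(drive[t] + hist @ h[:hist.size] - 3.0)
    spikes[t] = rng.random() < lam * dt
```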

SLIDE 39

GLM with history-dependence

[Diagram: a traditional integrate-and-fire neuron converts filter output to spikes via a “hard threshold”; the “soft-threshold” IF maps filter output to a spike rate instead. Architecture as above: stimulus filter k + post-spike filter h → exponential nonlinearity → Poisson spiking.]

  • “soft-threshold” approximation to the Integrate-and-Fire model

SLIDE 40

GLM dynamic behaviors

[Figure: stimulus x(t), post-spike waveform, and stimulus-induced vs. spike-history-induced drive; regular spiking.]

SLIDE 41

GLM dynamic behaviors

[Figure: stimulus x(t), post-spike waveform, stim-filter output, and spike-history filter output; regular vs. irregular spiking.]

SLIDE 42

GLM dynamic behaviors

[Figure: stimulus x(t) and post-spike waveforms producing bursting and adaptation.]

SLIDE 43

Generalized Linear Model (GLM)

[Diagram: stimulus → stimulus filter; + post-spike filter → exponential nonlinearity → probabilistic spiking.]

SLIDE 44

multi-neuron GLM

[Diagram: neurons 1 and 2, each with its own stimulus filter and post-spike filter → exponential nonlinearity → probabilistic spiking.]

SLIDE 45

multi-neuron GLM

[Diagram: as above, with coupling filters linking neuron 1 and neuron 2.]

SLIDE 46

GLM equivalent diagram:

[Diagram: at each time t, the conditional intensity (spike rate) depends on the stimulus history and past spikes.]

SLIDE 47

Multilinear models

Input nonlinearities (Hammerstein cascades) can be identified in a multilinear (cartesian tensor) framework.

SLIDE 48

Input nonlinearities

The basic linear model (for sounds):

    r̂(i) = Σ_jk w^tf_jk · s(i − j, k)

where r̂ is the predicted rate, w^tf are the STRF weights, and s is the stimulus power.

How to measure s? (pressure, intensity, dB, thresholded, . . . ) We can learn an optimal representation g(·):

    r̂(i) = Σ_jk w^tf_jk · g(s(i − j, k)).

Define basis functions {g_l} such that g(s) = Σ_l w^l_l g_l(s), and a stimulus array M_ijkl = g_l(s(i − j, k)). Now the model is

    r̂(i) = Σ_jkl w^tf_jk w^l_l M_ijkl

or, compactly, r̂ = (w^tf ⊗ w^l) • M.
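
A sketch of how the stimulus array M and the prediction can be formed (dimensions, the triangular level basis, and random weights are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
J, K, L, T = 10, 8, 5, 500              # lags, channels, level basis fns, time bins
s = rng.uniform(0, 70, size=(T, K))     # invented sound-level stimulus

# Level basis: overlapping triangular bumps g_l(s) tiling the level axis.
centers = np.linspace(0, 70, L)
g = lambda v: np.clip(1 - np.abs(v[..., None] - centers) / 20, 0, None)

# Stimulus array M[i, j, k, l] = g_l(s(i − j, k)).
M = np.stack([g(np.roll(s, j, axis=0)) for j in range(J)], axis=1)

# Prediction r̂ = (w^tf ⊗ w^l) • M as a tensor contraction.
w_tf = rng.standard_normal((J, K))
w_l = rng.standard_normal(L)
r_hat = np.einsum('ijkl,jk,l->i', M, w_tf, w_l)
```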

SLIDE 49

Multilinear models

Multilinear forms are straightforward to optimise by alternating least squares. Cost function:

    E = ‖ r − (w^tf ⊗ w^l) • M ‖²

Minimise iteratively: defining matrices B = w^l • M and A = w^tf • M, update w^tf = (BᵀB)⁻¹Bᵀr and w^l = (AᵀA)⁻¹Aᵀr. Each linear regression step can be regularised by evidence optimisation (suboptimal), with uncertainty propagated approximately using variational methods.
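
A minimal alternating-least-squares loop for this cost (names are mine; np.linalg.lstsq stands in for the explicit normal equations (BᵀB)⁻¹Bᵀr, which it solves more stably):

```python
import numpy as np

def als_fit(M, r, n_iter=50):
    """Alternate regressions for r ≈ (w^tf ⊗ w^l) • M, with M shaped (T, J, K, L)."""
    T, J, K, L = M.shape
    w_l = np.ones(L)                                 # arbitrary nonzero start
    for _ in range(n_iter):
        # Hold w_l fixed: B[i,(j,k)] = Σ_l M[i,j,k,l]·w_l[l]; regress r on B.
        B = np.einsum('ijkl,l->ijk', M, w_l).reshape(T, J * K)
        w_tf = np.linalg.lstsq(B, r, rcond=None)[0].reshape(J, K)
        # Hold w_tf fixed: A[i,l] = Σ_jk M[i,j,k,l]·w_tf[j,k]; regress r on A.
        A = np.einsum('ijkl,jk->il', M, w_tf)
        w_l = np.linalg.lstsq(A, r, rcond=None)[0]
    return w_tf, w_l
```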

SLIDE 50

Some input non-linearities

[Figure: input-nonlinearity weights w^l vs. sound level l (dB SPL).]

SLIDE 51

Parameter grouping

Separable STRF: (time) ⊗ (frequency). The input nonlinearity model is separable in another sense: (time, frequency) ⊗ (sound level).

[Figure: weights over (time, frequency) and over intensity.]

Other separations:

  • (time, sound level) ⊗ (frequency):  r̂ = (w^tl ⊗ w^f) • M,
  • (frequency, sound level) ⊗ (time):  r̂ = (w^fl ⊗ w^t) • M,
  • (time) ⊗ (frequency) ⊗ (sound level):  r̂ = (w^t ⊗ w^f ⊗ w^l) • M.

SLIDE 52

(time, frequency) ⊗ (sound level)

[Figure: (time, frequency) kernel, t (ms) × f (kHz), with sound-level weights w^l over l (dB SPL).]

SLIDE 53

(time, sound level) ⊗ (frequency)

[Figure: (time, sound level) kernel, t (ms) × l (dB SPL), with frequency weights w^f over f (kHz).]

SLIDE 54

(frequency, sound level) ⊗ (time)

[Figure: (frequency, sound level) kernel, f (kHz) × l (dB SPL), with time weights w^t over t (ms).]

SLIDE 55

Contextual influences

[Figure: stimulus passed through a context kernel w^τφ and a principal kernel w^tf to predict the PSTH over time and frequency.]

    rate(i) = c + Σ_jkl w^t_j w^f_k w^l_l g_l(sound_{i−j,k}) · ( c₂ + Σ_mnp w^τ_m w^φ_n w^λ_p g_p(sound_{i−j−m,k+n}) )

SLIDE 56

Contextual influences

Introduce extra dimensions:

  • τ: time difference between contextual and primary tone,
  • φ: frequency difference between contextual and primary tone,
  • λ: sound level of the contextual tone.

Model the effective sound level of a primary tone by

    Level(i, j) → Level(i, j) · (const + Context(i, j)),

and the context by

    Context(i, j) = Σ_{m,n,p} w^τ_m w^φ_n w^λ_p h_p(s(i − m, j + n)).

This leads to a multilinear model

    r̂ = (w^t ⊗ w^f ⊗ w^l ⊗ w^τ ⊗ w^φ ⊗ w^λ) • M.

SLIDE 57

Inseparable contexts

We can also allow inseparable contexts (and principal fields), dropping the level-nonlinearity to reduce parameters.

    r̂(i) = c + Σ_jk w^tf_jk sound_{i−j,k} · ( 1 + Σ_mn w^τφ_mn sound_{i−j−m,k+n} )

[Figure: stimulus passed through the context kernel w^τφ and principal kernel w^tf to predict the PSTH over time and frequency.]

SLIDE 58

Performance

[Figure: extrapolated normalised predictive power (training and test sets) for STRF, separable-context, and inseparable-context models.]