

SLIDE 1

Population Coding

Peter Latham / Maneesh Sahani, Gatsby Computational Neuroscience Unit, University College London. Term 1, Autumn 2013.

Population codes

  • High dimensionality (cells × stimulus × time).

– usually limited to simple rate codes.
– even prosthetic work assumes instantaneous (lagged) coding.

  • Limited empirical data.

– can record 10s–100s of neurons.
– population size more like 10⁴–10⁶.
– theoretical inferences based on single-cell and aggregate (fMRI, LFP, optical) measurements.

Common approach

The most common sort of questions asked of population codes:

  • given assumed encoding functions, how well can we (or downstream areas) decode the encoded stimulus value?

  • what encoding schemes would be optimal, in the sense of allowing decoders to estimate stimulus values as well as possible?

Before considering populations, we need to formulate some ideas about rate coding in the context of single cells.

Rate coding

In the rate coding context, we imagine that the firing rate of a cell, r, represents a single (possibly multidimensional) stimulus value s at any one time: r = f(s). Even if s and r are embedded in time-series, we assume:

  1. that coding is instantaneous (with a fixed lag),
  2. that r (and therefore s) is constant over a short time ∆.

The actual number of spikes n produced in ∆ is then taken to be distributed around r∆, often according to a Poisson distribution.
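A minimal sketch of this generative assumption in Python (the rate, window, and sample count here are illustrative choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

delta = 0.1   # coding window Delta in seconds; r and s are assumed constant over it
r = 40.0      # firing rate r = f(s) implied by the (fixed) stimulus, in Hz

# The spike count n in the window is distributed around r*Delta, here Poisson:
n = rng.poisson(r * delta, size=10_000)

print(f"E[n] = {n.mean():.2f}, Var[n] = {n.var():.2f} "
      f"(both should be near r*Delta = {r * delta:.1f})")
```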

SLIDE 2

Tuning curves

The function f(s) is known as a tuning curve. Commonly assumed forms:

  • Gaussian:
$$f(x) = r_0 + r_{\max} \exp\!\left(-\frac{1}{2\sigma^2}(x - x_{\text{pref}})^2\right)$$

  • Cosine:
$$f(\theta) = r_0 + r_{\max} \cos(\theta - \theta_{\text{pref}})$$

  • Wrapped Gaussian:
$$f(\theta) = r_0 + r_{\max} \sum_n \exp\!\left(-\frac{1}{2\sigma^2}(\theta - \theta_{\text{pref}} - 2\pi n)^2\right)$$

  • von Mises ("circular Gaussian"):
$$f(\theta) = r_0 + r_{\max} \exp\!\left(\kappa \cos(\theta - \theta_{\text{pref}})\right)$$
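A sketch of these four forms in Python (all parameter values are illustrative assumptions):

```python
import numpy as np

def gaussian(x, r0=2.0, rmax=40.0, x_pref=0.0, sigma=1.0):
    """Gaussian: r0 + rmax * exp(-(x - x_pref)^2 / (2 sigma^2))."""
    return r0 + rmax * np.exp(-(x - x_pref) ** 2 / (2 * sigma ** 2))

def cosine(theta, r0=20.0, rmax=15.0, theta_pref=0.0):
    """Cosine: r0 + rmax * cos(theta - theta_pref)."""
    return r0 + rmax * np.cos(theta - theta_pref)

def wrapped_gaussian(theta, r0=2.0, rmax=40.0, theta_pref=0.0, sigma=0.5, n_max=5):
    """Wrapped Gaussian: Gaussian bumps summed over 2*pi*n shifts."""
    theta = np.asarray(theta, dtype=float)
    ns = np.arange(-n_max, n_max + 1)
    bumps = np.exp(-(theta[..., None] - theta_pref - 2 * np.pi * ns) ** 2
                   / (2 * sigma ** 2))
    return r0 + rmax * bumps.sum(axis=-1)

def von_mises(theta, r0=2.0, rmax=40.0, theta_pref=0.0, kappa=2.0):
    """von Mises ('circular Gaussian'): r0 + rmax * exp(kappa*cos(theta - theta_pref))."""
    return r0 + rmax * np.exp(kappa * np.cos(theta - theta_pref))

theta = np.linspace(-np.pi, np.pi, 5)
print(von_mises(theta).round(1))   # peaks at theta = theta_pref = 0
```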
Measuring the performance of rate codes: Discrete choice

Suppose we want to make a binary choice based on firing rate:

  • present / absent (signal detection)
  • up / down
  • horizontal / vertical

Call one potential stimulus s0, the other s1. The choice must be based on the response distributions P(n|s):

[Figure: response probability densities P(n|s0) and P(n|s1).]

ROC curves

[Figure: response probability densities P(n|s0) and P(n|s1), and the resulting ROC curve: hit rate plotted against false alarm rate.]

SLIDE 3

Summary measures

  • area under the ROC curve

– given n1 ∼ P(n|s1) and n0 ∼ P(n|s0), this equals P(n1 > n0).

  • discriminability d′

– for equal-variance Gaussians, d′ = (µ1 − µ0)/σ.
– for any threshold, d′ = Φ⁻¹(1 − FA) − Φ⁻¹(1 − HR), where Φ is the standard normal cdf.
– the definition is unclear for non-Gaussian distributions.
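Both measures are easy to compute numerically. A sketch with equal-variance Gaussian response distributions (means, σ, and the threshold are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

mu0, mu1, sigma = 8.0, 12.0, 3.0        # P(n|s0), P(n|s1): equal-variance Gaussians
n0 = rng.normal(mu0, sigma, 100_000)
n1 = rng.normal(mu1, sigma, 100_000)

# ROC: sweep a decision threshold, collecting (false alarm rate, hit rate) pairs
ts = np.linspace(-5, 25, 200)
fa = np.array([(n0 > t).mean() for t in ts])
hr = np.array([(n1 > t).mean() for t in ts])

auc_pairs = (n1 > n0).mean()            # area under ROC = P(n1 > n0)
# same area from the empirical curve itself (trapezoid rule; fa decreases with t)
auc_curve = np.sum((fa[:-1] - fa[1:]) * (hr[:-1] + hr[1:]) / 2)

dprime = (mu1 - mu0) / sigma            # d' for equal-variance Gaussians
t = 10.0                                # d' recovered from any single threshold
dprime_t = norm.ppf(1 - (n0 > t).mean()) - norm.ppf(1 - (n1 > t).mean())

print(f"AUC = {auc_pairs:.3f} (from curve: {auc_curve:.3f})")
print(f"d' = {dprime:.2f}, from threshold: {dprime_t:.2f}")
```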

Continuous estimation

Now consider a (one dimensional) stimulus that takes on continuous values (e.g. angle).

  • contrast
  • orientation
  • motion direction
  • movement speed

Suppose a neuron fires n spikes in response to stimulus s according to some distribution P(n | f(s)∆). Given an observation of n, how well can we estimate s?

Continuous estimation

It is useful to consider the limit of N → ∞ measurements ni, all generated by the same stimulus s*. The posterior over s is

$$\log P(s \mid \{n_i\}) = \sum_i \log P(n_i \mid s) + \log P(s) - \log Z(\{n_i\})$$

Taking N → ∞ (so the average log-likelihood converges to its expectation under P(n|s*)) we have

$$\frac{1}{N} \log P(s \mid \{n_i\}) \to \big\langle \log P(n \mid s) \big\rangle_{n \mid s^*} + 0 - \log Z(s^*)$$

and so

$$P(s \mid \{n_i\}) \to e^{N \langle \log P(n \mid s) \rangle_{n \mid s^*}} / Z = e^{-N\,\mathrm{KL}\left[P(n \mid s^*) \,\|\, P(n \mid s)\right]} / Z$$

(Note: Z is being redefined as we go, but never depends on s.)

Continuous estimation

Now, Taylor expand the KL divergence in s around s*:

$$\mathrm{KL}\big[P(n|s^*) \,\|\, P(n|s)\big] = -\big\langle \log P(n|s) \big\rangle_{n|s^*} + \big\langle \log P(n|s^*) \big\rangle_{n|s^*}$$

$$= -\big\langle \log P(n|s^*) \big\rangle_{n|s^*} - (s - s^*) \left\langle \frac{d \log P(n|s)}{ds}\bigg|_{s^*} \right\rangle_{n|s^*} - \frac{1}{2}(s - s^*)^2 \left\langle \frac{d^2 \log P(n|s)}{ds^2}\bigg|_{s^*} \right\rangle_{n|s^*} + \dots + \big\langle \log P(n|s^*) \big\rangle_{n|s^*}$$

The zeroth-order terms cancel, and the linear term vanishes because the expected score at s* is zero, leaving

$$= -\frac{1}{2}(s - s^*)^2 \left\langle \frac{d^2 \log P(n|s)}{ds^2}\bigg|_{s^*} \right\rangle_{n|s^*} + \dots = \frac{1}{2}(s - s^*)^2 J(s^*) + \dots$$

So in asymptopia, the posterior → N(s*, 1/J(s*)). J(s*) is called the Fisher information:

$$J(s^*) = \left\langle -\frac{d^2 \log P(n|s)}{ds^2}\bigg|_{s^*} \right\rangle_{n|s^*} = \left\langle \left( \frac{d \log P(n|s)}{ds}\bigg|_{s^*} \right)^2 \right\rangle_{n|s^*}$$

(You will show that these are identical in the homework.)
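The homework asks for the analytic proof; as a quick numerical sanity check, here are both forms evaluated for a Poisson count model (the rate is an arbitrary choice):

```python
import numpy as np
from scipy.stats import poisson

lam = 4.0                       # lam plays the role of f(s)*Delta
n = np.arange(200)              # counts large enough to capture essentially all mass
p = poisson.pmf(n, lam)

score = n / lam - 1.0           # d/dlam log P(n|lam), with log P = -lam + n log lam - log n!
curv = -n / lam ** 2            # d^2/dlam^2 log P(n|lam)

print(np.sum(p * score ** 2))   # < (d log P / dlam)^2 >   -> 0.25
print(-np.sum(p * curv))        # < -d^2 log P / dlam^2 >  -> 0.25 = 1/lam
```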

SLIDE 4

Cramér-Rao bound

The Fisher information is important even outside the large-data limit, due to a deeper result due to Cramér and Rao. This states that, for any N, any unbiased estimator ŝ({ni}) of s will have the property that

$$\big\langle (\hat s(\{n_i\}) - s^*)^2 \big\rangle_{\{n_i\}|s^*} \;\ge\; \frac{1}{J(s^*)}$$

Thus, the Fisher information gives a lower bound on the variance of any unbiased estimator. This is called the Cramér-Rao bound.

[For estimators with bias b(s*) = ⟨ŝ({ni})⟩ − s*, the bound is

$$\big\langle (\hat s(\{n_i\}) - s^*)^2 \big\rangle_{\{n_i\}|s^*} \;\ge\; \frac{(1 + b'(s^*))^2}{J(s^*)} + b^2(s^*)\;]$$

The Fisher Information will be our primary tool to quantify the performance of a population code.
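A Monte Carlo sketch of the bound, using N i.i.d. Poisson counts with mean λ: the sample mean is an unbiased estimator of λ, and the total Fisher information is N/λ (the per-count value 1/λ is derived in the Poisson calculation below):

```python
import numpy as np

rng = np.random.default_rng(2)

lam, N, trials = 4.0, 20, 100_000
counts = rng.poisson(lam, size=(trials, N))
lam_hat = counts.mean(axis=1)           # unbiased estimator of lam

print(f"estimator variance = {lam_hat.var():.4f}")
print(f"Cramer-Rao bound 1/J = lam/N = {lam / N:.4f}")
# The sample mean attains the bound here; a general unbiased estimator can only do worse.
```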

Fisher Info and tuning curves

Write n = r∆ + noise, with r = f(s). Then

$$J(s^*) = \left\langle \left( \frac{d}{ds}\bigg|_{s^*} \log P(n|s) \right)^2 \right\rangle_{s^*} = \left\langle \left( \frac{d}{d(r\Delta)}\bigg|_{f(s^*)\Delta} \log P(n|r\Delta) \cdot \Delta f'(s^*) \right)^2 \right\rangle_{s^*} = J_{\text{noise}}(r^*\Delta)\, \Delta^2 f'(s^*)^2$$

[Figure: tuning curve f(s) and Fisher information J(s) plotted against s.]

Fisher info for Poisson neurons

For Poisson neurons, P(n|r∆) = e^{−r∆}(r∆)ⁿ/n!, so

$$J_{\text{noise}}[r^*\Delta] = \left\langle \left( \frac{d}{d(r\Delta)}\bigg|_{r^*\Delta} \log P(n|r\Delta) \right)^2 \right\rangle_{s^*} = \left\langle \left( \frac{d}{d(r\Delta)}\bigg|_{r^*\Delta} \big( -r\Delta + n \log(r\Delta) - \log n! \big) \right)^2 \right\rangle_{s^*}$$

$$= \left\langle \big( -1 + n/(r^*\Delta) \big)^2 \right\rangle_{s^*} = \frac{\big\langle (n - r^*\Delta)^2 \big\rangle_{s^*}}{(r^*\Delta)^2} = \frac{r^*\Delta}{(r^*\Delta)^2} = \frac{1}{r^*\Delta}$$

[not surprising! ⟨n⟩ = r*∆ and Var[n] = r*∆]

and, referred back to the stimulus value:

$$J[s^*] = \frac{f'(s^*)^2\, \Delta}{f(s^*)}$$
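A sketch of this stimulus-referred Fisher information along an illustrative Gaussian tuning curve (all parameters assumed for the example). Since J ∝ f′²/f, J(s) vanishes at the tuning-curve peak, where f′(s) = 0, and is largest on the flanks:

```python
import numpy as np

delta, r0, rmax, s_pref, sig = 0.1, 1.0, 40.0, 0.0, 1.0   # illustrative values

def f(s):
    """Gaussian tuning curve (the r0 baseline keeps rates, and J, finite everywhere)."""
    return r0 + rmax * np.exp(-(s - s_pref) ** 2 / (2 * sig ** 2))

def fprime(s):
    """Derivative of the Gaussian bump (the r0 baseline drops out)."""
    return -(s - s_pref) / sig ** 2 * (f(s) - r0)

for s in np.linspace(-3, 3, 13):
    J = fprime(s) ** 2 * delta / f(s)    # J[s] = f'(s)^2 * Delta / f(s)
    print(f"s = {s:+.1f}   f = {f(s):6.2f} Hz   J = {J:7.3f}")
```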

Coding a continuous variable

Scalar coding

[Figure: firing rate as a function of s.]

Labelled line

[Figure: firing rate as a function of s.]

Distributed encoding

[Figures: firing rate as a function of s (two panels).]

slide-5
SLIDE 5

Coding a continuous variable

All of these schemes have been found in biological systems. Issues:

  1. redundancy and robustness (not scalar)
  2. efficiency/resolution (not labelled line)
  3. local computation (not scalar or scalar distributed)
  4. multiple values (not scalar)

Coding in multiple dimensions

Cartesian

[Figure: tuning in the (s1, s2) plane.]

  • efficient
  • problems with multiple values

Multi-D distributed

[Figure: tuning in the (s1, s2) plane.]

  • represent multiple values
  • may require more neurons

Cricket cercal system

$$r_a(s) = r^a_{\max}\, [\cos(\theta - \theta_a)]_+ = r^a_{\max}\, [c_a^{\mathsf T} v]_+$$

where v is the unit vector along the stimulus direction θ, and the four preferred directions satisfy

$$c_1^{\mathsf T} c_2 = 0, \qquad c_3 = -c_1, \qquad c_4 = -c_2.$$

So, writing r̃_a = r_a / r^a_max:

$$\begin{pmatrix} \tilde r_1 - \tilde r_3 \\ \tilde r_2 - \tilde r_4 \end{pmatrix} = \begin{pmatrix} c_1^{\mathsf T} \\ c_2^{\mathsf T} \end{pmatrix} v$$

$$v = \begin{pmatrix} c_1 & c_2 \end{pmatrix} \begin{pmatrix} \tilde r_1 - \tilde r_3 \\ \tilde r_2 - \tilde r_4 \end{pmatrix} = \tilde r_1 c_1 + \tilde r_3 c_3 + \tilde r_2 c_2 + \tilde r_4 c_4 = \sum_a \tilde r_a c_a$$

This is called population vector decoding.
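A sketch of this four-cell decoder in Python (the stimulus angle is an arbitrary test value):

```python
import numpy as np

# Four preferred directions at 0, 90, 180, 270 degrees: c2 orthogonal to c1,
# c3 = -c1, c4 = -c2.
angles = np.deg2rad([0.0, 90.0, 180.0, 270.0])
C = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # row a is c_a

def encode(theta):
    """Normalized rates r~_a = [cos(theta - theta_a)]_+ = [c_a^T v]_+."""
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.maximum(C @ v, 0.0)

def decode(r_tilde):
    """Population vector: v_hat = sum_a r~_a c_a."""
    return r_tilde @ C

theta = np.deg2rad(25.0)
v_hat = decode(encode(theta))
print(np.rad2deg(np.arctan2(v_hat[1], v_hat[0])))        # recovers ~25 degrees
```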

Motor cortex (simplified)

Cosine tuning, randomly distributed preferred directions. In general, population vector decoding works for

  • cosine tuning
  • cartesian or dense (tightly spaced) preferred directions

But:

  • is it optimal?
  • does it generalise (e.g. to Gaussian tuning curves)?
  • how accurate is it?
SLIDE 6

Bayesian decoding

Take na ∼ Poisson[fa(s)∆], independently for different cells. Then

$$P(\mathbf n \mid s) = \prod_a \frac{e^{-f_a(s)\Delta}\,(f_a(s)\Delta)^{n_a}}{n_a!}$$

and

$$\log P(s \mid \mathbf n) = \sum_a \big[ -f_a(s)\Delta + n_a \log(f_a(s)\Delta) - \log n_a! \big] + \log P(s)$$

Assume Σ_a f_a(s) is independent of s (a homogeneous population), and that the prior is flat. Then

$$\frac{d}{ds} \log P(s \mid \mathbf n) = \frac{d}{ds} \sum_a n_a \log(f_a(s)\Delta) = \sum_a \frac{n_a}{f_a(s)\Delta}\, f_a'(s)\Delta = \sum_a n_a \frac{f_a'(s)}{f_a(s)}$$

Bayesian decoding

Now consider f_a(s) = e^{−(s−s_a)²/2σ²}, so

$$f_a'(s) = -\frac{(s - s_a)}{\sigma^2}\, e^{-(s - s_a)^2/2\sigma^2},$$

and set the derivative of the log posterior to zero:

$$\sum_a n_a (s - s_a)/\sigma^2 = 0 \quad\Rightarrow\quad \hat s_{\text{MAP}} = \frac{\sum_a n_a s_a}{\sum_a n_a}$$

So the MAP estimate is a population average of preferred directions. Not exactly a population vector.
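A noisy sketch of this decoder (tuning-curve spacing, width, rate scale, and ∆ are all assumptions, chosen so that Σ_a f_a(s) is roughly constant over the decoded range):

```python
import numpy as np

rng = np.random.default_rng(3)

s_a = np.linspace(-5, 5, 41)       # preferred values, densely and evenly spaced
sigma, rmax, delta = 1.0, 30.0, 0.2

s_true = 1.3
rates = rmax * np.exp(-(s_true - s_a) ** 2 / (2 * sigma ** 2))   # f_a(s_true)
n = rng.poisson(rates * delta)     # independent Poisson counts

s_map = (n * s_a).sum() / n.sum()  # spike-count-weighted mean of preferred values
print(f"s_true = {s_true}, s_MAP = {s_map:.3f}")
```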

Population Fisher Info

Fisher informations for independent random variates add:

$$J_{\mathbf n}(s) = \left\langle -\frac{d^2}{ds^2} \log P(\mathbf n \mid s) \right\rangle = \left\langle -\frac{d^2}{ds^2} \sum_a \log P(n_a \mid s) \right\rangle = \sum_a \left\langle -\frac{d^2}{ds^2} \log P(n_a \mid s) \right\rangle = \sum_a J_{n_a}(s)$$

$$= \Delta \sum_a \frac{f_a'(s)^2}{f_a(s)} \qquad \text{[for Poisson cells]}$$
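Continuing the sketch above, this population Fisher information can be compared with the spread of the MAP decoder; for this (assumed) homogeneous Gaussian population, the decoder should come out near the Cramér-Rao limit 1/J:

```python
import numpy as np

rng = np.random.default_rng(4)

s_a = np.linspace(-5, 5, 41)                   # same illustrative population as above
sigma, rmax, delta = 1.0, 30.0, 0.2
s_true = 1.3

f = rmax * np.exp(-(s_true - s_a) ** 2 / (2 * sigma ** 2))   # f_a(s*)
fp = -(s_true - s_a) / sigma ** 2 * f                        # f_a'(s*)
J = delta * (fp ** 2 / f).sum()                # J(s) = Delta * sum_a f_a'^2 / f_a

est = np.empty(20_000)
for t in range(est.size):
    n = rng.poisson(f * delta)
    est[t] = (n * s_a).sum() / n.sum()         # MAP decode from the previous sketch

print(f"1/J = {1 / J:.5f}, empirical MAP-estimator variance = {est.var():.5f}")
```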

Optimal tuning properties

A considerable amount of work has been done in recent years on finding optimal properties of tuning curves for rate-based population codes. Here, we reproduce one such argument (from Zhang and Sejnowski, 1999).

Consider a population of cells that codes the value of a D-dimensional stimulus s. Let the ath cell emit r spikes in an interval τ, with a probability distribution that is conditionally independent of the other cells (given s) and has the form

$$P_a(r \mid s, \tau) = S\big(r, f^a(s), \tau\big).$$

The tuning curve of the ath cell, f^a(s), has the form

$$f^a(s) = F \cdot \phi\big((\xi^a)^2\big); \qquad (\xi^a)^2 = \sum_{i=1}^{D} (\xi^a_i)^2; \qquad \xi^a_i = \frac{s_i - c^a_i}{\sigma},$$

where F is a maximal rate and the function φ is monotonically decreasing. The parameters c^a and σ give the centre of the ath tuning curve and the (common) width.

SLIDE 7

Optimal tuning properties

Now, the (ij)th term in the Fisher information matrix for the ath cell is (by definition)

$$J^a_{ij}(s) = E\left[ \frac{\partial}{\partial s_i} \log P_a(r \mid s, \tau)\; \frac{\partial}{\partial s_j} \log P_a(r \mid s, \tau) \right]$$

Applying the chain rule repeatedly, we find that

$$\frac{\partial}{\partial s_i} \log P_a(r \mid s, \tau) = \frac{1}{S(r, f^a(s), \tau)} \frac{\partial}{\partial s_i} S(r, f^a(s), \tau) = \frac{S^{(2)}(r, f^a(s), \tau)}{S(r, f^a(s), \tau)} \frac{\partial}{\partial s_i} f^a(s)$$

(where S^{(2)} indicates differentiation with respect to the second argument)

$$= \frac{S^{(2)}(r, f^a(s), \tau)}{S(r, f^a(s), \tau)}\, F \phi'\big((\xi^a)^2\big) \frac{\partial}{\partial s_i} \sum_{i'=1}^{D} (\xi^a_{i'})^2 = \frac{S^{(2)}(r, f^a(s), \tau)}{S(r, f^a(s), \tau)}\, F \phi'\big((\xi^a)^2\big)\, \frac{2(s_i - c^a_i)}{\sigma^2}$$

Optimal tuning properties

So,

$$J^a_{ij}(s) = E\left[ \left( \frac{S^{(2)}(r, f^a(s), \tau)}{S(r, f^a(s), \tau)} \right)^2 \right] 4F^2\, \phi'\big((\xi^a)^2\big)^2\, \frac{(s_i - c^a_i)(s_j - c^a_j)}{\sigma^4} = A_\phi\big((\xi^a)^2, F, \tau\big)\, \frac{(s_i - c^a_i)(s_j - c^a_j)}{\sigma^4}$$

where the function A_φ does not depend explicitly on σ.

Optimal tuning properties

We assumed neurons were independent, so Fisher information adds. Approximate the sum by an integral over the tuning-curve centres, assuming a uniform density η of neurons:

$$J_{ij}(s) = \sum_a J^a_{ij}(s) \approx \int_{-\infty}^{+\infty}\! dc^a_1 \cdots \int_{-\infty}^{+\infty}\! dc^a_D\; \eta\, J^a_{ij}(s) = \int_{-\infty}^{+\infty}\! dc^a_1 \cdots \int_{-\infty}^{+\infty}\! dc^a_D\; \eta\, A_\phi\big((\xi^a)^2, F, \tau\big)\, \frac{(s_i - c^a_i)(s_j - c^a_j)}{\sigma^4}$$

Change variables, c^a_i → ξ^a_i:

$$= \int_{-\infty}^{+\infty}\! \sigma\, d\xi^a_1 \cdots \int_{-\infty}^{+\infty}\! \sigma\, d\xi^a_D\; \eta\, A_\phi\big((\xi^a)^2, F, \tau\big)\, \frac{\xi^a_i \xi^a_j}{\sigma^2} = \frac{\sigma^D}{\sigma^2}\, \eta \int_{-\infty}^{+\infty}\! d\xi^a_1 \cdots \int_{-\infty}^{+\infty}\! d\xi^a_D\; A_\phi\big((\xi^a)^2, F, \tau\big)\, \xi^a_i \xi^a_j$$

Now, if i ≠ j, the integrand is odd in both ξ^a_i and ξ^a_j, and the integral thus vanishes. If i = j, then the integral has some value D · K_φ(F, τ, D), independent of σ. Thus

$$J_{ii} = \sigma^{D-2}\, \eta\, D\, K_\phi(F, \tau, D)$$

and the total Fisher information is proportional to σ^{D−2}.

Optimal tuning properties

Thus the optimal tuning width depends on the stimulus dimension (a numerical check of this scaling is sketched below):

  • D = 1 ⇒ σ → 0 (although a lower limit is encountered when the tuning width falls below the inter-cell spacing).
  • D = 2 ⇒ J independent of σ.
  • D > 2 ⇒ σ → ∞ (actual limit set by the range of valid stimuli).
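A numerical sketch of the σ^{D−2} scaling, using Gaussian tuning with Poisson noise on a lattice of centres, so that J_ii(s) = ∆ Σ_a (∂f_a/∂s_i)²/f_a (peak rate, lattice spacing, and the two widths compared are all illustrative assumptions):

```python
import numpy as np
from itertools import product

F, delta, h = 10.0, 1.0, 0.5        # peak rate, time window, lattice spacing

def J11(D, sigma, L=10.0):
    """J_11 at s = 0: Delta * sum_a (df_a/ds_1)^2 / f_a over a lattice of centres."""
    grid = np.arange(-L, L + h / 2, h)
    c = np.array(list(product(grid, repeat=D)))               # centres c_a
    f = F * np.exp(-(c ** 2).sum(axis=1) / (2 * sigma ** 2))  # f_a(0), Gaussian tuning
    return delta * ((c[:, 0] / sigma ** 2) ** 2 * f).sum()    # (df/ds_1)^2 / f at s = 0

for D in (1, 2, 3):
    ratio = J11(D, sigma=2.0) / J11(D, sigma=1.0)
    print(f"D={D}: J(sigma=2)/J(sigma=1) = {ratio:.2f}, "
          f"predicted 2^(D-2) = {2.0 ** (D - 2):.2f}")
```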

SLIDE 8

More ...

  • Correlated noise
  • Extended s (feature maps etc.)
  • Uncertainty