Population Coding


1. Population codes

Peter Latham / Maneesh Sahani
Gatsby Computational Neuroscience Unit, University College London
Term 1, Autumn 2013

Population coding
• High dimensionality (cells × stimulus × time).
  – usually limited to simple rate codes.
  – even prosthetic work assumes instantaneous (lagged) coding.
• Limited empirical data.
  – we can record 10s - 100s of neurons, but population sizes are more like 10^4 - 10^6.
  – so inferences are largely theoretical, based on single-cell and aggregate (fMRI, LFP, optical) measurements.

Common approach
The most common sorts of questions asked of population codes:
• Given assumed encoding functions, how well can we (or downstream areas) decode the encoded stimulus value?
• What encoding schemes would be optimal, in the sense of allowing decoders to estimate stimulus values as well as possible?
Before considering populations, we need to formulate some ideas about rate coding in the context of single cells.

Rate coding
In the rate-coding context, we imagine that the firing rate of a cell, r, represents a single (possibly multidimensional) stimulus value s at any one time:

$$r = f(s).$$

Even if s and r are embedded in time series, we assume:
1. that coding is instantaneous (with a fixed lag), and
2. that r (and therefore s) is constant over a short time Δ.

The actual number of spikes n produced in Δ is then taken to be distributed around rΔ, often according to a Poisson distribution; a minimal simulation of this generative model is sketched below.
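
The following Python sketch makes the rate-coding model concrete. The tuning curve, its parameters, and the window Δ are all illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(s, r0=5.0, r_max=50.0, s_pref=0.0, sigma=0.5):
    """Hypothetical Gaussian tuning curve: firing rate (Hz) as a function of s."""
    return r0 + r_max * np.exp(-(s - s_pref) ** 2 / (2 * sigma ** 2))

delta = 0.1                   # counting window Delta, in seconds (assumed)
s = 0.2                       # a fixed stimulus value
r = f(s)                      # instantaneous rate: r = f(s)
n = rng.poisson(r * delta)    # spike count: n ~ Poisson(r * Delta)
print(f"rate = {r:.1f} Hz, spikes in window: {n}")
```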

2. Tuning curves

The function f(s) is known as a tuning curve. Commonly assumed forms:
• Gaussian: $r_0 + r_{\max} \exp\left(-\frac{1}{2\sigma^2}(x - x_{\text{pref}})^2\right)$
• Cosine: $r_0 + r_{\max} \cos(\theta - \theta_{\text{pref}})$
• Wrapped Gaussian: $r_0 + r_{\max} \sum_n \exp\left(-\frac{1}{2\sigma^2}(\theta - \theta_{\text{pref}} - 2\pi n)^2\right)$
• von Mises ("circular Gaussian"): $r_0 + r_{\max} \exp\left(\kappa \cos(\theta - \theta_{\text{pref}})\right)$

Measuring the performance of rate codes: Discrete choice

Suppose we want to make a binary choice based on firing rate:
• present / absent (signal detection)
• up / down
• horizontal / vertical
Call one potential stimulus s_0 and the other s_1, with spike-count distributions P(n|s_0) and P(n|s_1).

[Figure: overlapping probability densities P(n|s_0) and P(n|s_1), plotted against the response n.]

ROC curves

Sweeping a decision threshold along the response axis traces out an ROC curve: the hit rate (probability of correctly reporting s_1) plotted against the false-alarm rate (probability of reporting s_1 when s_0 was shown). The less the two densities overlap, the more the curve bows toward the upper-left corner.

[Figure: two pairs of response densities P(n|s_0), P(n|s_1) with their corresponding ROC curves, hit rate against false-alarm rate.]
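
A short sketch of the ROC construction (mean counts are illustrative assumptions; scipy supplies the Poisson tail probabilities): respond "s_1" whenever n ≥ threshold, and sweep the threshold to trace out hit and false-alarm rates.

```python
import numpy as np
from scipy.stats import poisson

mu0, mu1 = 4.0, 8.0                    # assumed mean counts f(s) * Delta under s0, s1

# Respond "s1" whenever n >= threshold; sweep the threshold to trace the ROC.
thresholds = np.arange(0, 31)
fa = poisson.sf(thresholds - 1, mu0)   # false-alarm rate: P(n >= thr | s0)
hr = poisson.sf(thresholds - 1, mu1)   # hit rate:         P(n >= thr | s1)

# Area under the ROC curve: for discrete counts this equals
# P(n1 > n0) + 0.5 * P(n1 = n0), estimated here by Monte Carlo.
rng = np.random.default_rng(0)
n0 = rng.poisson(mu0, 100_000)
n1 = rng.poisson(mu1, 100_000)
auc = np.mean(n1 > n0) + 0.5 * np.mean(n1 == n0)
print(f"AUC ~= {auc:.3f}")
```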

3. Summary measures

• Area under the ROC curve:
  – given n_1 ∼ P(n|s_1) and n_0 ∼ P(n|s_0), this equals P(n_1 > n_0).
• Discriminability d′:
  – for equal-variance Gaussians, d′ = (μ_1 − μ_0)/σ;
  – for any threshold, d′ = Φ^{−1}(1 − FA) − Φ^{−1}(1 − HR), where Φ is the standard normal cdf;
  – the definition is unclear for non-Gaussian distributions.

Continuous estimation

Now consider a (one-dimensional) stimulus that takes on continuous values, e.g.:
• contrast
• orientation
• motion direction
• movement speed
Suppose a neuron fires n spikes in response to stimulus s according to some distribution P(n | f(s)Δ). Given an observation of n, how well can we estimate s?

It is useful to consider the limit of N → ∞ measurements n_i, all generated by the same stimulus s*. The posterior over s is

$$\log P(s|\{n_i\}) = \sum_i \log P(n_i|s) + \log P(s) - \log Z(\{n_i\}).$$

Taking N → ∞ we have

$$\frac{1}{N}\log P(s|\{n_i\}) \to \big\langle \log P(n|s) \big\rangle_{n|s^*} + 0 - \log Z$$

and so

$$P(s|\{n_i\}) \to e^{N \langle \log P(n|s) \rangle_{n|s^*}}/Z = e^{-N\,\mathrm{KL}\left[P(n|s^*)\,\|\,P(n|s)\right]}/Z.$$

(Note: Z is being redefined as we go, but never depends on s.)

Now, Taylor expand the KL divergence in s around s*:

$$\mathrm{KL}\left[P(n|s^*)\,\|\,P(n|s)\right] = \big\langle \log P(n|s^*) \big\rangle_{n|s^*} - \big\langle \log P(n|s) \big\rangle_{n|s^*}$$

with

$$\big\langle \log P(n|s) \big\rangle_{n|s^*} = \big\langle \log P(n|s^*) \big\rangle_{n|s^*} + (s - s^*)\left\langle \frac{d \log P(n|s)}{ds}\bigg|_{s^*} \right\rangle_{n|s^*} + \frac{1}{2}(s - s^*)^2 \left\langle \frac{d^2 \log P(n|s)}{ds^2}\bigg|_{s^*} \right\rangle_{n|s^*} + \dots$$

The zeroth-order terms cancel and the first-order term vanishes (the expected score at s* is zero), leaving

$$\mathrm{KL}\left[P(n|s^*)\,\|\,P(n|s)\right] = \frac{1}{2}(s - s^*)^2 J(s^*) + \dots$$

So in asymptopia the posterior tends to N(s*, 1/(N J(s*))). J(s*) is called the Fisher Information:

$$J(s^*) = -\left\langle \frac{d^2 \log P(n|s)}{ds^2}\bigg|_{s^*} \right\rangle_{n|s^*} = \left\langle \left(\frac{d \log P(n|s)}{ds}\bigg|_{s^*}\right)^2 \right\rangle_{n|s^*}$$

(You will show that these two forms are identical in the homework.)
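
The two forms of J(s*) can be checked numerically for a Poisson spike-count model. This is an illustrative sketch with assumed tuning parameters; the analytic Poisson result J = f′(s*)² Δ / f(s*) used for comparison is derived on the next slide.

```python
import numpy as np

rng = np.random.default_rng(0)
delta, s_star = 0.1, 0.3

def f(s):
    # assumed Gaussian tuning curve
    return 5.0 + 50.0 * np.exp(-s ** 2 / (2 * 0.5 ** 2))

def score(n, s, eps=1e-5):
    # d/ds log P(n|s) for Poisson counts, by central finite differences
    # (the log n! term does not depend on s and drops out)
    logp = lambda s_: n * np.log(f(s_) * delta) - f(s_) * delta
    return (logp(s + eps) - logp(s - eps)) / (2 * eps)

n = rng.poisson(f(s_star) * delta, size=200_000)
J_mc = np.mean(score(n, s_star) ** 2)      # <(d log P / ds)^2> at s*

f_prime = (f(s_star + 1e-5) - f(s_star - 1e-5)) / 2e-5
J_analytic = f_prime ** 2 * delta / f(s_star)
print(J_mc, J_analytic)                    # agree up to Monte Carlo error
```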

4. Cramér-Rao bound

The Fisher Information is important even outside the large-data limit, due to a deeper result due to Cramér and Rao. This states that for any N, any unbiased estimator ŝ({n_i}) of s will have the property that

$$\big\langle (\hat{s}(\{n_i\}) - s^*)^2 \big\rangle_{n_i|s^*} \ge \frac{1}{J(s^*)}.$$

Thus, the Fisher Information gives a lower bound on the variance of any unbiased estimator. This is called the Cramér-Rao bound.

[For estimators with bias b(s*) = ⟨ŝ({n_i})⟩ − s*, the bound is
$$\big\langle (\hat{s}(\{n_i\}) - s^*)^2 \big\rangle_{n_i|s^*} \ge \frac{(1 + b'(s^*))^2}{J(s^*)} + b^2(s^*).$$]

The Fisher Information will be our primary tool to quantify the performance of a population code.

Fisher info and tuning curves

With n = rΔ + noise and r = f(s), the chain rule gives

$$J(s^*) = \left\langle \left(\frac{d}{ds}\log P(n|s)\bigg|_{s^*}\right)^2 \right\rangle_{s^*} = \left\langle \left(\frac{d}{d(r\Delta)}\log P(n|r\Delta)\bigg|_{f(s^*)\Delta} \cdot \Delta f'(s^*)\right)^2 \right\rangle_{s^*} = J_{\text{noise}}(r\Delta)\,\Delta^2\,f'(s^*)^2.$$

[Figure: a tuning curve f(s) and the corresponding Fisher information J(s), plotted against s; J(s) is largest on the steep flanks of the tuning curve rather than at its peak.]

Fisher info for Poisson neurons

For Poisson neurons

$$P(n|r\Delta) = \frac{e^{-r\Delta}(r\Delta)^n}{n!}$$

so

$$J_{\text{noise}}[r\Delta] = \left\langle \left(\frac{d}{d(r\Delta)}\log P(n|r\Delta)\right)^2 \right\rangle_{r^*\Delta} = \left\langle \left(\frac{d}{d(r\Delta)}\big(-r\Delta + n\log r\Delta - \log n!\big)\right)^2 \right\rangle_{r^*\Delta}$$
$$= \left\langle \left(-1 + \frac{n}{r^*\Delta}\right)^2 \right\rangle_{r^*\Delta} = \frac{\big\langle (n - r^*\Delta)^2 \big\rangle_{r^*\Delta}}{(r^*\Delta)^2} = \frac{r^*\Delta}{(r^*\Delta)^2} = \frac{1}{r^*\Delta}.$$

[Not surprising: ⟨n⟩ = r*Δ and Var[n] = r*Δ.]

Referred back to the stimulus value:

$$J[s^*] = f'(s^*)^2\,\Delta / f(s^*).$$

Coding a continuous variable

[Figure: schematic tuning curves, firing rate plotted against s, for different encoding schemes: scalar coding (a single monotonic rate), labelled line (narrow, non-overlapping tuning curves), and distributed encoding (broad, overlapping tuning curves).]
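
The Cramér-Rao bound can be illustrated by simulation: the variance of a maximum-likelihood estimate from N independent Poisson counts approaches 1/(N J(s*)) for large N. This sketch uses assumed parameters and a simple grid search for the ML estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
delta, N, s_star = 0.1, 50, 0.5

def f(s):
    # assumed Gaussian tuning curve; s* = 0.5 lies on its steep flank
    return 5.0 + 50.0 * np.exp(-np.asarray(s) ** 2 / (2 * 0.5 ** 2))

grid = np.linspace(-2.0, 2.0, 2001)
rates = f(grid) * delta

estimates = []
for _ in range(2000):
    n = rng.poisson(f(s_star) * delta, size=N)
    # joint Poisson log likelihood on the grid (log n! terms are constant in s)
    ll = n.sum() * np.log(rates) - N * rates
    estimates.append(grid[np.argmax(ll)])

f_prime = (f(s_star + 1e-5) - f(s_star - 1e-5)) / 2e-5
crb = 1.0 / (N * f_prime ** 2 * delta / f(s_star))    # 1 / (N J(s*))
print(np.var(estimates), crb)    # ML variance should approach the bound
```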

5. Coding a continuous variable

All of these schemes have been found in biological systems. Issues:
1. redundancy and robustness (not scalar)
2. efficiency/resolution (not labelled line)
3. local computation (not scalar or scalar distributed)
4. multiple values (not scalar)

Coding in multiple dimensions

[Figure: tuning-curve centres in a two-dimensional stimulus space (s_1, s_2), arranged either Cartesian-style (along each axis separately) or distributed over the whole space.]
• Cartesian: efficient, but has problems with multiple simultaneous values.
• Multi-D distributed: can represent multiple values, but may require more neurons.

Cricket cercal system

Four interneurons code wind direction v, with preferred directions satisfying c_1^T c_2 = 0, c_3 = −c_1, c_4 = −c_2, and half-wave-rectified cosine tuning:

$$r_a(s) = r_{\max}\,[\cos(\theta - \theta_a)]_+ = r_{\max}\,[c_a^T v]_+.$$

So, writing r̃_a = r_a / r_max:

$$\begin{pmatrix} c_1^T \\ c_2^T \end{pmatrix} v = \begin{pmatrix} \tilde{r}_1 - \tilde{r}_3 \\ \tilde{r}_2 - \tilde{r}_4 \end{pmatrix}
\quad\Rightarrow\quad
v = (c_1 \;\; c_2) \begin{pmatrix} \tilde{r}_1 - \tilde{r}_3 \\ \tilde{r}_2 - \tilde{r}_4 \end{pmatrix} = \tilde{r}_1 c_1 + \tilde{r}_2 c_2 + \tilde{r}_3 c_3 + \tilde{r}_4 c_4 = \sum_a \tilde{r}_a c_a$$

(using c_3 = −c_1 and c_4 = −c_2). This is called population vector decoding.

Motor cortex (simplified)

Cosine tuning, with randomly distributed preferred directions. In general, population vector decoding works for:
• cosine tuning
• Cartesian or dense (tight) preferred directions
But:
• is it optimal?
• does it generalise (e.g. to Gaussian tuning curves)?
• how accurate is it?
A small simulation of population vector decoding follows.
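
This sketch decodes a 2-D direction by population vector from rectified-cosine responses with random preferred directions, as in the simplified motor-cortex picture. All parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A, r_max, delta = 200, 40.0, 0.1

# random preferred directions c_a on the unit circle
theta_pref = rng.uniform(0.0, 2.0 * np.pi, A)
C = np.stack([np.cos(theta_pref), np.sin(theta_pref)], axis=1)

theta = 1.0                              # true movement direction (radians)
v = np.array([np.cos(theta), np.sin(theta)])
r = r_max * np.clip(C @ v, 0.0, None)    # r_a = r_max [c_a^T v]_+
n = rng.poisson(r * delta)               # noisy spike counts

v_hat = n @ C                            # population vector: sum_a n_a c_a
theta_hat = np.arctan2(v_hat[1], v_hat[0])
print(theta, theta_hat)                  # direction recovered up to noise
```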

6. Bayesian decoding

Take n_a ∼ Poisson[f_a(s)Δ], independently for different cells. Then

$$P(\mathbf{n}|s) = \prod_a \frac{e^{-f_a(s)\Delta}\,(f_a(s)\Delta)^{n_a}}{n_a!}$$

and

$$\log P(s|\mathbf{n}) = \sum_a \big[-f_a(s)\Delta + n_a \log(f_a(s)\Delta) - \log n_a!\big] + \log P(s) + \text{const}.$$

Assume Σ_a f_a(s) is independent of s (a homogeneous population) and that the prior is flat. Then

$$\frac{d}{ds}\log P(s|\mathbf{n}) = \frac{d}{ds}\sum_a n_a \log(f_a(s)\Delta) = \sum_a n_a \frac{f'_a(s)}{f_a(s)}.$$

Now consider f_a(s) = e^{−(s − s_a)²/2σ²}, so f′_a(s)/f_a(s) = −(s − s_a)/σ², and set the derivative to 0:

$$\sum_a n_a (s - s_a)/\sigma^2 = 0 \quad\Rightarrow\quad \hat{s}_{\text{MAP}} = \frac{\sum_a n_a s_a}{\sum_a n_a}.$$

So the MAP estimate is a population average of preferred stimulus values, not exactly a population vector.

Population Fisher Info

Fisher Informations for independent random variates add:

$$J_{\mathbf{n}}(s) = \left\langle -\frac{d^2}{ds^2}\log P(\mathbf{n}|s) \right\rangle = \sum_a \left\langle -\frac{d^2}{ds^2}\log P(n_a|s) \right\rangle = \sum_a J_{n_a}(s) = \Delta \sum_a \frac{f'_a(s)^2}{f_a(s)} \quad \text{[for Poisson cells]}.$$

Optimal tuning properties

A considerable amount of work has been done in recent years on finding optimal properties of tuning curves for rate-based population codes. Here, we reproduce one such argument (from Zhang and Sejnowski, 1999).

Consider a population of cells that codes the value of a D-dimensional stimulus s. Let the a-th cell emit r spikes in an interval τ with a probability distribution that is conditionally independent of the other cells (given s) and has the form

$$P_a(r|\mathbf{s}, \tau) = S(r, f_a(\mathbf{s}), \tau).$$

The tuning curve of the a-th cell, f_a(s), has the form

$$f_a(\mathbf{s}) = F \cdot \phi\big((\xi^a)^2\big), \qquad (\xi^a)^2 = \sum_{i=1}^{D} (\xi^a_i)^2, \qquad \xi^a_i = \frac{s_i - c^a_i}{\sigma},$$

where F is a maximal rate and the function φ is monotonically decreasing. The parameters c^a and σ give the centre of the a-th tuning curve and the (common) width.
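
A sketch of the MAP decoder just derived, under the same assumptions (homogeneous Gaussian population, flat prior, independent Poisson counts); the estimate reduces to the count-weighted mean of preferred values. Parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
delta, sigma, F = 0.1, 0.4, 40.0
s_a = np.linspace(-3.0, 3.0, 61)       # evenly spaced preferred values (homogeneous)

def f(s):
    # Gaussian tuning curves f_a(s), one entry per cell
    return F * np.exp(-(s - s_a) ** 2 / (2 * sigma ** 2))

s_true = 0.7
n = rng.poisson(f(s_true) * delta)     # n_a ~ Poisson(f_a(s) * Delta)

s_map = (n * s_a).sum() / n.sum()      # s_MAP = sum_a n_a s_a / sum_a n_a
print(s_true, s_map)
```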
