Information Theory Slides (Jonathan Pillow): Barlow's Efficient Coding Hypothesis
Barlow's "Efficient Coding Hypothesis"

Efficient Coding Hypothesis:
- goal of nervous system: maximize information about environment
  (one of the core "big ideas" in theoretical neuroscience)
- equivalently, minimize the redundancy  R = 1 - I(X;Y)/C,
  where I(X;Y) is the mutual information between stimulus X and response Y, and C is the channel capacity

Barlow 1961; Atick & Redlich 1990
Definitions:

channel capacity C:
- upper bound on the mutual information
- determined by physical properties of the encoder

mutual information:
- avg # of yes/no questions you can answer about x given y ("bits")
- I(X;Y) = H(Y) - H(Y|X), i.e., response entropy minus "noise" entropy
- entropy:  H(Y) = -∑ P(y) log2 P(y)

Barlow 1961; Atick & Redlich 1990
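These definitions are easy to check numerically. Below is a minimal numpy sketch (the joint distribution P_xy is made up for illustration, not from the slides) that computes the response entropy H(Y), the "noise" entropy H(Y|X), and their difference, the mutual information.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution p (zero entries are ignored)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution P(x, y): rows = stimuli x, columns = responses y
P_xy = np.array([[0.30, 0.10, 0.00],
                 [0.05, 0.25, 0.10],
                 [0.00, 0.05, 0.15]])

P_x = P_xy.sum(axis=1)           # stimulus prior P(x)
P_y = P_xy.sum(axis=0)           # response marginal P(y)

H_resp = entropy(P_y)            # response entropy H(Y)

# "noise" entropy H(Y|X) = sum_x P(x) * H(Y | X=x)
H_noise = sum(P_x[i] * entropy(P_xy[i] / P_x[i])
              for i in range(len(P_x)) if P_x[i] > 0)

I_xy = H_resp - H_noise          # mutual information I(X;Y) = H(Y) - H(Y|X)
print(f"H(Y) = {H_resp:.3f} bits, H(Y|X) = {H_noise:.3f} bits, I(X;Y) = {I_xy:.3f} bits")
```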
Barlow's original version:

- if responses are noiseless, the "noise" entropy H(Y|X) = 0, so the mutual information equals the response entropy:
  I(X;Y) = H(Y) - H(Y|X) = H(Y)
- redundancy then becomes R = 1 - H(Y)/C

noiseless system ⇒ brain should maximize response entropy:
- use full dynamic range
- decorrelate ("reduce redundancy")

- mega impact: huge number of theory and experimental papers focused on decorrelation / information-maximizing codes in the brain

Barlow 1961; Atick & Redlich 1990
basic intuition

[Figure: in a natural image, nearby pixels (pixel i vs. pixel i+1) exhibit strong dependencies; the desired encoding maps them to a neural representation in which neural response i and neural response i+1 are decorrelated, filling the response space and maximizing response entropy.]
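As a toy illustration of "decorrelate", here is a small numpy sketch (assumptions: the correlated Gaussian "pixel" pairs and the correlation value 0.9 are made up, not actual image statistics). It applies a symmetric whitening transform built from the sample covariance, so the transformed "responses" are uncorrelated with unit variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for neighboring pixels: zero-mean Gaussian pairs with correlation 0.9
cov_pixels = np.array([[1.0, 0.9],
                       [0.9, 1.0]])
pixels = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_pixels, size=10000)

# Whitening (decorrelating) transform from the eigendecomposition of the sample covariance
C = np.cov(pixels, rowvar=False)
evals, evecs = np.linalg.eigh(C)
W = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T   # symmetric whitening matrix

responses = pixels @ W.T   # decorrelated "neural responses"

print("pixel covariance:\n", np.round(C, 2))
print("response covariance:\n", np.round(np.cov(responses, rowvar=False), 2))
```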
Application example: single neuron encoding stimuli from a distribution P(x)

- stimulus prior: Gaussian P(x)
- noiseless, discrete encoding (finite set of response levels)
- Q: what solution for infomax?
- A: histogram equalization: set the encoding nonlinearity to the cdf of the stimulus prior, so every response level is used equally often and the response entropy is maximized
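A minimal sketch of histogram equalization for this example (the Gaussian parameters and the number of levels K are assumed): pass each stimulus through the prior's cdf and quantize into K equal-probability levels, so the discrete response distribution is approximately uniform and its entropy approaches the maximum of log2 K bits.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Gaussian stimulus prior (assumed mean/sd) and a noiseless encoder with K discrete levels
mu, sigma, K = 0.0, 1.0, 8
x = rng.normal(mu, sigma, size=100000)

# Infomax (histogram-equalizing) nonlinearity: the cdf of the stimulus prior,
# quantized into K equal-probability response levels
u = norm.cdf(x, loc=mu, scale=sigma)              # maps stimuli to ~Unif(0, 1)
response = np.minimum((u * K).astype(int), K - 1)

# Each level is used ~1/K of the time, so the response entropy is ~log2(K) bits
counts = np.bincount(response, minlength=K) / len(response)
H_resp = -np.sum(counts[counts > 0] * np.log2(counts[counts > 0]))
print("level usage:", np.round(counts, 3))
print(f"response entropy = {H_resp:.3f} bits (max possible = {np.log2(K):.1f} bits)")
```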
Laughlin 1981: blowfly light response
- measured response data closely follow the cdf of the light-level distribution
- first major validation of Barlow's theory

Atick & Redlich 1990: luminance-dependent receptive fields
- extended theory to noisy responses
- optimal spatial weighting depends on SNR:
  high SNR: "whitening" / decorrelating
  middle SNR: partial whitening
  low SNR: averaging / correlating
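One way to illustrate the SNR dependence qualitatively (a toy sketch, not the actual Atick & Redlich derivation; the 1/f^2 spectrum and noise levels are assumed) is to combine a whitening filter 1/sqrt(S(f)) with a Wiener-style noise-suppression factor S(f)/(S(f)+N): at high SNR the product approaches the whitening filter, while at low SNR it emphasizes low frequencies (averaging).

```python
import numpy as np

# Qualitative sketch only: natural-image-like power spectrum S(f) ~ 1/f^2 plus flat noise
f = np.linspace(0.01, 1.0, 200)       # spatial frequency (arbitrary units)
S = 1.0 / f**2                        # assumed signal power spectrum

def filter_gain(S, noise_power):
    """Whitening gain 1/sqrt(S) tempered by a Wiener-like factor S/(S + N)."""
    return (S / (S + noise_power)) / np.sqrt(S)

for N, label in [(1e-4, "high SNR"), (1e2, "middle SNR"), (1e6, "low SNR")]:
    g = filter_gain(S, N)
    print(f"{label:>10}: gain peaks at f = {f[np.argmax(g)]:.2f}")
# High SNR: gain grows with f (whitening); low SNR: gain peaks at low f (averaging).
```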
estimating entropy and MI from data

1. the "direct method" (Strong et al 1998)

- record a raster of responses to a repeated stimulus
- fix a bin size Δ and a word length N, and discretize the response into binary "words" of N bins
  (e.g., Δ = 10 ms, N = 3 ⇒ 2^3 = 8 possible words; samples from the raster: 001, 010, 010, 110, ...)
- estimate word probabilities from histograms, i.e., histogram-based estimates of p(R|S_j), then H = -∑ P log P
- total entropy: computed from the distribution over all words, pooled across time
- noise entropy: word entropy across repeats at each time, averaged over all blocks of size N
- mutual information: I = H_total - H_noise
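A minimal sketch of the direct-method calculation on synthetic data (the raster, bin counts, and word length below are made up; real applications also require finite-sampling bias corrections, e.g., the extrapolations used by Strong et al, which this sketch omits):

```python
import numpy as np

def word_entropy(words):
    """Entropy (bits) of the empirical distribution over words (rows of a 2D array)."""
    _, counts = np.unique(words, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(2)

# Synthetic repeated-stimulus raster: n_repeats x n_bins binary spike counts,
# with a time-varying (stimulus-locked) spike probability shared across repeats
n_repeats, n_bins, N = 100, 300, 3                 # N = word length (e.g., 3 bins of 10 ms)
p_spike = 0.05 + 0.4 * (rng.random(n_bins) < 0.2)
raster = (rng.random((n_repeats, n_bins)) < p_spike).astype(int)

# Cut the raster into non-overlapping words of N bins: shape (n_repeats, n_words, N)
n_words = n_bins // N
words = raster[:, :n_words * N].reshape(n_repeats, n_words, N)

# Total entropy: histogram over all words pooled across time and repeats
H_total = word_entropy(words.reshape(-1, N))

# Noise entropy: entropy of the word distribution across repeats at each time, averaged over time
H_noise = np.mean([word_entropy(words[:, t, :]) for t in range(n_words)])

print(f"H_total = {H_total:.3f} bits, H_noise = {H_noise:.3f} bits, "
      f"I ~ {H_total - H_noise:.3f} bits per word")
```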
2. "single-spike information" (Brenner et al 2000)

- from a repeated-stimulus raster, compute the PSTH r(t) and the mean rate r̄
- information per spike:
  I = (1/T) ∫_0^T (r(t)/r̄) log2 (r(t)/r̄) dt
- equal to the information carried by an inhomogeneous Poisson process

derivation of single-spike information:
- given the stimulus, a single spike time has density p(t_sp | stim) = r(t)/(r̄ T)  (the normalized PSTH)
- with no knowledge of the stimulus, t_sp ~ Unif([0, T]), with entropy log2 T
- information per spike = entropy of Unif([0, T]) - entropy of p(t_sp | stim)
  = log2 T - H[r(t)/(r̄ T)]
  = (1/T) ∫_0^T (r(t)/r̄) log2 (r(t)/r̄) dt
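A short numerical sketch of the single-spike information formula applied to a synthetic PSTH (the rate profile and trial length are assumed): discretize the integral I = (1/T) ∫ (r/r̄) log2(r/r̄) dt.

```python
import numpy as np

# Synthetic PSTH: firing rate r(t) over a repeated-stimulus trial of length T
dt, T = 0.001, 2.0                                    # 1 ms bins, 2 s trial
t = np.arange(0, T, dt)
r = 5.0 + 40.0 * np.exp(-((t - 0.5) / 0.05) ** 2)     # baseline + stimulus-locked bump (Hz)

rbar = np.mean(r)                                     # mean rate
ratio = r / rbar

# Single-spike information (bits per spike): (1/T) * integral of (r/rbar) log2(r/rbar) dt
I_single = np.sum(ratio * np.log2(ratio)) * dt / T
print(f"mean rate = {rbar:.2f} Hz, information per spike ~ {I_single:.3f} bits")
```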
3. decoding-based methods

So far we have focused on the formulation  I(X;Y) = H(Y) - H(Y|X).
Decoding-based approaches focus on the alternative version:  I(X;Y) = H(X) - H(X|Y).

Suppose we have a decoder that estimates the stimulus from spikes (e.g., MAP, or the Optimal Linear Estimator):
  stimulus X → response Y → decoder → estimate X̂

- Bound #1 (Data Processing Inequality): I(X;Y) ≥ I(X;X̂)
- Bound #2: H(X|X̂) is at most the entropy of a Gaussian with covariance equal to the covariance of the residual errors X - X̂ (the maximum-entropy distribution with this covariance), giving the lower bound
  I(X;Y) ≥ H(X) - (1/2) log2 |2πe Σ_err|
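A minimal sketch of the decoding-based bound on synthetic data (the linear-Gaussian stimulus/response model and the optimal linear decoder are assumed for illustration): estimate the covariance of the residual errors and compute I(X;Y) ≥ H(X) - (1/2) log2 |2πe Σ_err|. For the Gaussian stimulus used here, H(X) is the Gaussian entropy; for general stimuli it would have to be estimated separately.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic setup (assumed): Gaussian stimulus x, noisy linear "responses" y = Ax + noise
d, n = 3, 50000
x = rng.normal(size=(n, d))                        # stimulus, unit-variance Gaussian
A = rng.normal(size=(d, d))
y = x @ A.T + 0.5 * rng.normal(size=(n, d))        # responses

# Optimal linear estimator (least squares) of x from y
W, *_ = np.linalg.lstsq(y, x, rcond=None)
x_hat = y @ W

def gauss_entropy(Sigma):
    """Entropy (bits) of a Gaussian with covariance Sigma: 0.5 * log2 |2*pi*e*Sigma|."""
    return 0.5 * np.log2(np.linalg.det(2 * np.pi * np.e * Sigma))

Sigma_x = np.cov(x, rowvar=False)                  # stimulus covariance (Gaussian, so H(X) is exact)
Sigma_err = np.cov(x - x_hat, rowvar=False)        # covariance of residual errors

# Lower bound: I(X;Y) >= H(X) - H(Gaussian with cov = Sigma_err)
I_lower = gauss_entropy(Sigma_x) - gauss_entropy(Sigma_err)
print(f"decoding-based lower bound on information: {I_lower:.2f} bits")
```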