Information Theory & the Efficient Coding Hypothesis


  1. Information Theory & the Efficient Coding Hypothesis
Jonathan Pillow
Mathematical Tools for Neuroscience (NEU 314), Spring 2016, lecture 19

  2. Information Theory
"A Mathematical Theory of Communication," Claude Shannon, 1948
 - Entropy
 - Conditional Entropy
 - Mutual Information
 - Data Processing Inequality
 - Efficient Coding Hypothesis (Barlow 1961)

  3. Entropy
H(x) = -\sum_x p(x) \log p(x)
the "surprise" -\log p(x), averaged over p(x)
 - average "surprise" of viewing a sample from p(x)
 - number of "yes/no" questions needed to identify x (on average)
For a distribution on K bins:
 - maximum entropy = log K (achieved by the uniform distribution)
 - minimum entropy = 0 (achieved by all probability in 1 bin)
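To make the definition concrete, here is a minimal sketch (not from the slides; it assumes NumPy) that computes the entropy of a discrete distribution in bits and checks the maximum/minimum cases listed above.

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_x p(x) log2 p(x), in bits; zero-probability bins contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

K = 8
print(entropy(np.ones(K) / K))           # uniform over K bins: log2(8) = 3 bits (maximum)
print(entropy([1.0] + [0.0] * (K - 1)))  # all mass in one bin: 0 bits (minimum)
```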

  4. Entropy

  5. aside: log-likelihood and entropy
model: x \sim P(x \mid \theta)
entropy: H = -E_{P(x \mid \theta)}[\log P(x \mid \theta)]
How would we compute a Monte Carlo estimate of this?
 - draw samples: x_i \sim P(x \mid \theta) for i = 1, ..., N
 - compute average: \hat{H} = -\frac{1}{N} \sum_{i=1}^N \log P(x_i \mid \theta)
 - negative log-likelihood / N = Monte Carlo estimate of the entropy!
 - maximizing likelihood ⇒ minimizing entropy of P(x|θ)
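A minimal sketch of the Monte Carlo recipe above. The standard normal model for P(x | θ) is my choice (not from the slide), picked so the estimate can be checked against the known analytic entropy.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N = 100_000
x = rng.standard_normal(N)               # draw samples x_i ~ P(x|theta) = N(0, 1)
H_mc = -np.mean(norm.logpdf(x))          # average of -log P(x_i|theta): neg. log-likelihood / N (nats)
H_true = 0.5 * np.log(2 * np.pi * np.e)  # analytic entropy of a standard normal
print(H_mc, H_true)                      # the two should agree closely
```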

  6. Conditional Entropy
H(x \mid y) = -\sum_y p(y) \sum_x p(x \mid y) \log p(x \mid y)
the entropy of x given some fixed value of y, averaged over p(y)

  7. Conditional Entropy
H(x \mid y) = -\sum_y p(y) \sum_x p(x \mid y) \log p(x \mid y)
the entropy of x given some fixed value of y, averaged over p(y)
            = -\sum_{x,y} p(x, y) \log p(x \mid y)
"On average, how uncertain are you about x if you know y?"
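A small sketch evaluating the second form, H(x|y) = -\sum_{x,y} p(x,y) \log p(x|y), on a made-up 2×2 joint table (the numbers are hypothetical):

```python
import numpy as np

p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])   # hypothetical joint p(x, y); rows index x, columns index y
p_y = p_xy.sum(axis=0)          # marginal p(y)
p_x_given_y = p_xy / p_y        # p(x|y): normalize each column by p(y)
H_x_given_y = -np.sum(p_xy * np.log2(p_x_given_y))
print(H_x_given_y)              # bits of uncertainty about x remaining once y is known
```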

  8. Mutual Information
I(x; y) = H(x) - H(x \mid y)   (total entropy of x minus conditional entropy of x given y)
        = H(y) - H(y \mid x)   (total entropy of y minus conditional entropy of y given x)
        = H(x) + H(y) - H(x, y)   (sum of the entropies minus the joint entropy)
"How much does X tell me about Y (or vice versa)?"
"How much is your uncertainty about X reduced by knowing Y?"
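Continuing with the same kind of hypothetical joint table, a sketch of the third identity, I(x;y) = H(x) + H(y) - H(x,y):

```python
import numpy as np

def H(p):
    """Entropy in bits of any probability table (marginal or joint)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])   # hypothetical joint p(x, y)
I = H(p_xy.sum(axis=1)) + H(p_xy.sum(axis=0)) - H(p_xy)
print(I)                        # bits of information shared between x and y
```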

  9. Venn diagram of entropy and information

  10. Data Processing Inequality
Suppose x → y → z form a Markov chain, that is, p(z \mid x, y) = p(z \mid y).
Then necessarily: I(x; y) ≥ I(x; z)
 - in other words, we can only lose information during processing
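A quick empirical check of the inequality on a toy chain of my own construction (two binary symmetric channels, each flipping its input 10% of the time, so x → y → z):

```python
import numpy as np

def mutual_info(joint):
    """Mutual information in bits from a joint probability table."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return np.sum(joint[nz] * np.log2(joint[nz] / (px * py)[nz]))

def joint_table(a, b):
    """Empirical joint distribution of two binary sequences."""
    J = np.zeros((2, 2))
    np.add.at(J, (a, b), 1)
    return J / len(a)

rng = np.random.default_rng(1)
n = 200_000
x = rng.integers(0, 2, n)                       # source bits
y = np.where(rng.random(n) < 0.9, x, 1 - x)     # y: noisy copy of x (10% flips)
z = np.where(rng.random(n) < 0.9, y, 1 - y)     # z: noisy copy of y, so x -> y -> z
print(mutual_info(joint_table(x, y)), ">=", mutual_info(joint_table(x, z)))
```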

  11. Efficient Coding Hypothesis (Barlow 1961; Atick & Redlich 1990)
 - goal of the nervous system: maximize information about the environment
   (one of the core "big ideas" in theoretical neuroscience)
mutual information: I(x; y)
channel capacity: C
redundancy: R = 1 - I(x; y) / C

  12. Efficient Coding Hypothesis (Barlow 1961; Atick & Redlich 1990)
 - goal of the nervous system: maximize information about the environment
   (one of the core "big ideas" in theoretical neuroscience)
mutual information: I(x; y) = H(y) - H(y \mid x)   (response entropy minus "noise" entropy)
 - avg # of yes/no questions you can answer about x given y ("bits")
channel capacity: C
 - upper bound on the mutual information
 - determined by physical properties of the encoder
redundancy: R = 1 - I(x; y) / C
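As a concrete (if artificial) illustration of these quantities, a sketch that assumes a binary symmetric channel, whose capacity C = 1 - H(f) bits per use is a standard result; the mutual-information value plugged in is hypothetical:

```python
import numpy as np

def H_bernoulli(f):
    """Entropy in bits of a binary variable with probability f."""
    return -(f * np.log2(f) + (1 - f) * np.log2(1 - f))

f = 0.1                     # assumed flip probability of the channel
C = 1 - H_bernoulli(f)      # capacity of a binary symmetric channel (bits per use)
I = 0.35                    # hypothetical measured mutual information I(x; y)
print("capacity:", C, "redundancy:", 1 - I / C)   # redundancy R = 1 - I/C
```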

  13. Barlow's original version (Barlow 1961; Atick & Redlich 1990)
redundancy: R = 1 - I(x; y) / C
mutual information: if responses are noiseless, the "noise" entropy H(y \mid x) = 0, so
 I(x; y) = H(y)   (the response entropy)

  14. Barlow's original version (Barlow 1961; Atick & Redlich 1990)
redundancy: R = 1 - I(x; y) / C
mutual information: for a noiseless system the "noise" entropy H(y \mid x) = 0, so
 I(x; y) = H(y)   (the response entropy)
⇒ the brain should maximize response entropy
 - use the full dynamic range
 - decorrelate ("reduce redundancy")
 - mega impact: a huge number of theory and experimental papers have focused on decorrelation / information-maximizing codes in the brain

  15. basic intuition
[Figure: natural image, where nearby pixels exhibit strong dependencies (scatter of pixel i vs. pixel i+1, values 0–256), alongside the desired neural representation after encoding (neural response i vs. neural response i+1, values 0–100).]
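One simple reading of "decorrelate" is a linear whitening transform. Here is a small sketch on synthetic two-pixel data with strong neighbor correlation; the covariance values and the ZCA form of the whitening matrix are my choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)
C_pix = np.array([[1.0, 0.8],
                  [0.8, 1.0]])                   # assumed covariance: neighboring pixels correlate
X = rng.multivariate_normal([0.0, 0.0], C_pix, size=10_000)

evals, evecs = np.linalg.eigh(np.cov(X.T))
W = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T   # symmetric (ZCA) whitening matrix
Y = X @ W.T                                           # "neural responses": decorrelated pixels
print(np.cov(Y.T).round(2))                           # approximately the identity matrix
```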

  16. Example: single neuron encoding stimuli from a distribution P(x)
 - stimulus prior: P(x)
 - encoding: noiseless, discrete (with a constraint on the range of y values)

  17. Application Example: single neuron encoding stimuli from a distribution P(x)
 - stimulus prior: P(x), a Gaussian
 - encoding: noiseless, discrete (with a constraint on the range of y values)
[Figure: Gaussian prior p(x) for x in (−3, 3); the encoding maps x through the cdf to output levels y in (0, 20); the resulting response distribution p(y) over output levels is shown.]
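A sketch of this noiseless example: with a Gaussian prior and a fixed number of output levels, mapping x through the prior's cdf (histogram equalization) maximizes response entropy by making p(y) approximately uniform. The 20 output levels match the figure; the other choices are mine.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.standard_normal(100_000)                   # stimuli drawn from the Gaussian prior
n_levels = 20                                      # constrained range of output values y
y = np.floor(norm.cdf(x) * n_levels).astype(int)   # encode: pass x through the cdf, then quantize
y = np.clip(y, 0, n_levels - 1)                    # guard the (zero-probability) upper edge

p_y = np.bincount(y, minlength=n_levels) / len(y)
print(p_y.round(3))                                # approximately uniform over output levels
```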

  18. Laughlin 1981: blowfly light response
 - first major validation of Barlow's theory
[Figure: cdf of light level overlaid on the measured response data.]

  19. summary
 - entropy
 - negative log-likelihood / N
 - conditional entropy
 - mutual information
 - data processing inequality
 - efficient coding hypothesis (Barlow)
   - neurons should "maximize their dynamic range"
   - multiple neurons: marginally independent responses
 - direct method for estimating mutual information from data
