Information Theory & the Efficient Coding Hypothesis
Jonathan Pillow
Mathematical Tools for Neuroscience (NEU 314), Spring 2016, lecture 19
Information Theory
- Entropy
- Conditional Entropy
- Mutual Information
- Data Processing Inequality
- Efficient Coding Hypothesis (Barlow 1961)
A mathematical theory of communication, Claude Shannon 1948
Entropy
H(x) = − Σ_x p(x) log p(x)

- "surprise" of x, −log p(x), averaged over p(x)
- average “surprise” of viewing a sample from p(x)
- number of “yes/no” questions needed to identify x (on average)
for distribution on K bins,
- maximum entropy = log K (achieved by uniform dist)
- minimum entropy = 0 (achieved by all probability in 1 bin)
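These bounds are easy to check numerically. A minimal numpy sketch (the `entropy` helper is my own naming, not from the lecture):

```python
import numpy as np

def entropy(p):
    """Entropy in bits: H = -sum_x p(x) log2 p(x), with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

K = 8
uniform = np.ones(K) / K     # maximum entropy: log2(8) = 3 bits
delta = np.zeros(K)
delta[0] = 1.0               # minimum entropy: all mass in one bin, 0 bits

print(entropy(uniform))      # 3.0
print(entropy(delta))
```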
aside: log-likelihood and entropy
How would we compute a Monte Carlo estimate of this?
model: p(x|θ), entropy H = − Σ_x p(x|θ) log p(x|θ)
for i = 1,…,N, draw samples x_i ∼ p(x|θ)
compute average: Ĥ = − (1/N) Σ_i log p(x_i|θ)
- negative log-likelihood / N = Monte Carlo estimate of entropy!
- maximizing likelihood ⇒ minimizing entropy of p(x|θ)
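A quick numerical check of this identity, with a toy categorical model (the distribution `theta` is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy model p(x | theta): categorical over 4 outcomes
theta = np.array([0.1, 0.2, 0.3, 0.4])

# exact entropy in bits
H_exact = -np.sum(theta * np.log2(theta))

# Monte Carlo estimate: average "surprise" -log2 p(x_i) over samples,
# i.e. the negative log-likelihood divided by N
N = 100_000
samples = rng.choice(len(theta), size=N, p=theta)
H_mc = -np.mean(np.log2(theta[samples]))

print(H_exact, H_mc)  # agree to a few hundredths of a bit
```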
Conditional Entropy
H(x|y) = − Σ_y p(y) Σ_x p(x|y) log p(x|y)

- entropy of x given some fixed value of y, averaged over p(y)
equivalently,

H(x|y) = − Σ_{x,y} p(x, y) log p(x|y)
“On average, how uncertain are you about x if you know y?”
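Both forms give the same number; a small sketch with a made-up 2×2 joint distribution:

```python
import numpy as np

# made-up joint distribution p(x, y); rows index x, columns index y
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])

py = pxy.sum(axis=0)          # marginal p(y)
px_given_y = pxy / py         # each column is p(x | y)

# form 1: entropy of p(x|y) for each fixed y, averaged over p(y)
H_per_y = -np.sum(px_given_y * np.log2(px_given_y), axis=0)
H_form1 = np.sum(py * H_per_y)

# form 2: -sum over (x, y) of p(x, y) log p(x|y)
H_form2 = -np.sum(pxy * np.log2(px_given_y))

print(H_form1, H_form2)  # identical
```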
Mutual Information
I(x;y) = H(x) + H(y) − H(x,y)   (sum of entropies minus joint entropy)
       = H(x) − H(x|y)          (total entropy in X minus conditional entropy of X given Y)
       = H(y) − H(y|x)          (total entropy in Y minus conditional entropy of Y given X)
“How much does X tell me about Y (or vice versa)?” “How much is your uncertainty about X reduced from knowing Y?”
Venn diagram of entropy and information
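The three expressions above agree; a sketch using the same kind of toy joint distribution (the `H` helper is my own naming):

```python
import numpy as np

def H(p):
    """Entropy in bits of any array of probabilities."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

pxy = np.array([[0.4, 0.1],   # toy joint p(x, y)
                [0.1, 0.4]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

I1 = H(px) + H(py) - H(pxy)        # sum of entropies minus joint entropy
I2 = H(px) - (H(pxy) - H(py))      # H(x) - H(x|y), using H(x|y) = H(x,y) - H(y)
I3 = H(py) - (H(pxy) - H(px))      # H(y) - H(y|x)

print(I1, I2, I3)  # all equal
```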
Data Processing Inequality
Suppose x → y → z form a Markov chain, that is, p(z | x, y) = p(z | y). Then necessarily:

I(x; y) ≥ I(x; z)
- in other words, we can only lose information during processing
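A numerical illustration (not in the slides): pass a fair bit through two cascaded binary symmetric channels and compare I(x;y) with I(x;z). The channel and flip probability are made-up choices.

```python
import numpy as np

def H(p):
    """Entropy in bits of any array of probabilities."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

def mi(pab):
    """I(a;b) = H(a) + H(b) - H(a,b) for a joint table p(a, b)."""
    return H(pab.sum(axis=1)) + H(pab.sum(axis=0)) - H(pab)

# Markov chain x -> y -> z: each arrow is a binary symmetric channel
eps = 0.1
C = np.array([[1 - eps, eps],
              [eps, 1 - eps]])      # C[a, b] = p(next = b | current = a)
px = np.array([0.5, 0.5])

pxy = px[:, None] * C               # joint p(x, y)
pxz = pxy @ C                       # joint p(x, z): marginalize out y

print(mi(pxy), mi(pxz))             # information shrinks: I(x;y) > I(x;z)
```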
Efficient Coding Hypothesis (Barlow 1961; Atick & Redlich 1990)

- goal of nervous system: maximize information about environment

(one of the core "big ideas" in theoretical neuroscience)
Efficient Coding Hypothesis (Barlow 1961; Atick & Redlich 1990)

mutual information I(x; y):
- avg # of yes/no questions you can answer about x given y ("bits")
- I(x; y) = H(y) − H(y|x): response entropy minus "noise" entropy

channel capacity C:
- upper bound on mutual information
- determined by physical properties of encoder

redundancy: 1 − I(x; y) / C
Barlow's original version:

mutual information: I(x; y) = H(y) − H(y|x) (response entropy minus "noise" entropy)
- if responses are noiseless, H(y|x) = 0, so I(x; y) = H(y)

Barlow 1961; Atick & Redlich 1990
Barlow's original version:

noiseless system ⇒ brain should maximize response entropy H(y)
- use full dynamic range
- decorrelate ("reduce redundancy")
- mega impact: a huge number of theory and experimental papers have focused on decorrelation / information-maximizing codes in the brain

Barlow 1961; Atick & Redlich 1990
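"Decorrelate" can be made concrete with a whitening transform; a sketch on simulated correlated responses (the covariance values are made up):

```python
import numpy as np

rng = np.random.default_rng(2)

# correlated responses, like two neurons encoding adjacent pixels
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=50_000)

# whitening transform W = C^{-1/2}, from the eigendecomposition of the
# sample covariance; Z = W X has identity covariance (decorrelated)
C = np.cov(X.T)
evals, evecs = np.linalg.eigh(C)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T
Z = X @ W.T

print(np.cov(Z.T))  # approximately the identity matrix
```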
basic intuition

[figure: nearby pixels of a natural image (pixel i vs. pixel i+1) exhibit strong dependencies; the desired encoding gives decorrelated neural responses (response i vs. response i+1)]
Example: single neuron encoding stimuli from a distribution P(x)

noiseless, discrete encoding (with constraint on range of output levels y)

[figure: stimulus prior p(x); encoding y(x) follows the cdf of p(x), so the response distribution p(y) is uniform across output levels]
Application: single neuron encoding stimuli from a Gaussian prior P(x) (with constraint on range of y values)

[figure: Laughlin 1981, blowfly light response; measured response data overlaid on the cdf of the light level]

- first major validation of Barlow's theory
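The cdf trick in this example amounts to histogram equalization; a sketch using the empirical cdf of Gaussian draws (K and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# stimuli from a Gaussian prior, as in the example above
x = rng.normal(0.0, 1.0, size=100_000)

# noiseless discrete encoding with K output levels: map each stimulus
# through the (empirical) cdf of the prior, then quantize
K = 20
ranks = np.argsort(np.argsort(x))       # rank of each x = empirical cdf * N
y = (ranks * K) // x.size               # output level in {0, ..., K-1}

# response distribution is uniform: every output level used equally often
p_y = np.bincount(y, minlength=K) / x.size
print(p_y.min(), p_y.max())  # both 1/K = 0.05
```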
Summary

- entropy
- negative log-likelihood / N
- conditional entropy
- mutual information
- data processing inequality
- efficient coding hypothesis (Barlow)
- neurons should “maximize their dynamic range”
- multiple neurons: marginally independent responses
- direct method for estimating mutual information from