Computing and Communications: 2. Information Theory, Entropy
SLIDE 1


Computing and Communications

  • 2. Information Theory
  • Entropy

Ying Cui, Department of Electronic Engineering, Shanghai Jiao Tong University, China. 2017, Autumn

SLIDE 2

Outline

  • Entropy
  • Joint entropy and conditional entropy
  • Relative entropy and mutual information
  • Relationship between entropy and mutual information
  • Chain rules for entropy, relative entropy and mutual information

  • Jensen’s inequality and its consequences

SLIDE 3

Reference

  • T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley

SLIDE 4

OVERVIEW

SLIDE 5

Information Theory

  • Information theory answers two fundamental questions in communication theory
– what is the ultimate data compression?
  • entropy H
– what is the ultimate transmission rate of communication?
  • channel capacity C
  • Information theory is considered a subset of communication theory

SLIDE 6

Information Theory

  • Information theory has made fundamental contributions to other fields

SLIDE 7

A Mathematical Theory of Commun.

  • In 1948, Shannon published “A Mathematical Theory of Communication”, founding Information Theory
  • Shannon made two major modifications with huge impact on communication design
– the source and channel are modeled probabilistically
– bits became the common currency of communication

SLIDE 8

A Mathematical Theory of Commun.

  • Shannon proved the following three theorems
– Theorem 1. The minimum compression rate of the source is its entropy rate H
– Theorem 2. The maximum reliable rate over the channel is its mutual information I
– Theorem 3. End-to-end reliable communication happens if and only if H < I, i.e., there is no loss in performance by using a digital interface between source and channel coding
  • Impacts of Shannon’s results
– after almost 70 years, all communication systems are designed based on the principles of information theory
– the limits not only serve as benchmarks for evaluating communication schemes, but also provide insights on designing good ones
– the basic information-theoretic limits in Shannon’s theorems have now been successfully achieved using efficient algorithms and codes

SLIDE 9

ENTROPY

SLIDE 10

Definition

  • Entropy is a measure of the uncertainty of a r.v.
  • Consider a discrete r.v. X with alphabet \mathcal{X} and p.m.f. p(x) = \Pr[X = x], \; x \in \mathcal{X}

– log is to the base 2, and entropy is expressed in bits

  • e.g., the entropy of a fair coin toss is 1 bit

– define 0 \log 0 = 0, since x \log x \to 0 as x \to 0

  • adding terms of zero probability does not change the entropy
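As a concrete check of the definition H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x), here is a minimal Python sketch (the helper name entropy is ours, not from the slides):

```python
import math

def entropy(pmf):
    """Entropy in bits: H(X) = sum_x p(x) * log2(1 / p(x)).
    Zero-probability terms are skipped, per the convention 0 log 0 = 0."""
    return sum(p * math.log2(1 / p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))  # fair coin toss: 1.0 bit
print(entropy([1.0, 0.0]))  # deterministic outcome: 0.0 bits (zero term dropped)
```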

SLIDE 11

Properties

– entropy is nonnegative
– the base of the log can be changed
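Stated precisely (standard results, Cover and Thomas, Ch. 2):

```latex
% since 0 <= p(x) <= 1, each term -p(x) log p(x) is nonnegative, so
H(X) \ge 0
% change of base: \log_b p = (\log_b a) \log_a p, hence
H_b(X) = (\log_b a) \, H_a(X)
```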

SLIDE 12

Example

– consider a binary r.v. X with \Pr[X = 1] = p, so H(X) = -p \log p - (1 - p) \log(1 - p)
– H(X) = 1 bit when p = 0.5
  • maximum uncertainty
– H(X) = 0 bits when p = 0 or 1
  • minimum uncertainty
– H(X) is a concave function of p
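A small Python sketch of this binary entropy function (the helper name binary_entropy is ours), confirming the bullets above:

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))  # 1.0 bit: maximum uncertainty
print(binary_entropy(0.0))  # 0.0 bits: minimum uncertainty
print(binary_entropy(0.1))  # ~0.47 bits: between the extremes
```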

SLIDE 13

Example

SLIDE 14

JOINT ENTROPY AND CONDITIONAL ENTROPY

SLIDE 15

Joint Entropy

  • Joint entropy is a measure of the uncertainty of a pair of r.v.s
  • Consider a pair of discrete r.v.s (X, Y) with alphabets \mathcal{X}, \mathcal{Y} and p.m.f.s p(x) = \Pr[X = x], \; x \in \mathcal{X} and p(y) = \Pr[Y = y], \; y \in \mathcal{Y}
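For reference, the standard definition over the joint p.m.f. p(x, y) = \Pr[X = x, Y = y] (Cover and Thomas, Ch. 2):

```latex
H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y) = -\mathbb{E} \log p(X, Y)
```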

SLIDE 16

Conditional Entropy

  • Conditional entropy of a r.v. Y given another r.v. X
– the expected value of the entropies of the conditional distributions, averaged over the conditioning r.v.
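In symbols (standard definition, Cover and Thomas, Ch. 2):

```latex
H(Y \mid X) = \sum_{x \in \mathcal{X}} p(x) \, H(Y \mid X = x)
            = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x)
```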

SLIDE 17

Chain Rule
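Presumably the statement here is the standard two-variable identity (Cover and Thomas, Ch. 2): the uncertainty of a pair decomposes into the uncertainty of one variable plus the conditional uncertainty of the other,

```latex
H(X, Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y)
```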

SLIDE 18

Chain Rule

SLIDE 19

Example

SLIDE 20

Example

SLIDE 21

RELATIVE ENTROPY AND MUTUAL INFORMATION

SLIDE 22

Relative Entropy

  • Relative entropy is a measure of the “distance” between two distributions
– conventions: 0 \log \frac{0}{0} = 0, \; 0 \log \frac{0}{q} = 0, \; and \; p \log \frac{p}{0} = \infty
– if there is any x such that p(x) > 0 and q(x) = 0, then D(p \| q) = \infty
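For reference, the definition is D(p \| q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}. A minimal Python sketch (the helper name kl_divergence is ours) that honors the conventions above:

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits, for p.m.f.s given as aligned lists."""
    d = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue         # convention: 0 log (0/q) = 0
        if qx == 0:
            return math.inf  # convention: p log (p/0) = infinity
        d += px * math.log2(px / qx)
    return d

p, q = [0.5, 0.5], [0.25, 0.75]
print(kl_divergence(p, q))  # ~0.2075 bits
print(kl_divergence(q, p))  # ~0.1887 bits: not symmetric, hence "distance" in quotes
```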

SLIDE 23

Example

SLIDE 24

Mutual Information

  • Mutual information is a measure of the amount of information that one r.v. contains about another r.v.
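In symbols (standard definition, Cover and Thomas, Ch. 2): mutual information is the relative entropy between the joint distribution and the product of the marginals,

```latex
I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x) \, p(y)} = D\big(p(x, y) \,\|\, p(x) \, p(y)\big)
```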

SLIDE 25

RELATIONSHIP BETWEEN ENTROPY AND MUTUAL INFORMATION

SLIDE 26

Relation
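The standard relations between entropy and mutual information (Cover and Thomas, Ch. 2), which this slide presumably presents:

```latex
I(X; Y) = H(X) - H(X \mid Y)
        = H(Y) - H(Y \mid X)
        = H(X) + H(Y) - H(X, Y)
I(X; X) = H(X)
```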

SLIDE 27

Proof

SLIDE 28

Illustration

SLIDE 29

CHAIN RULES FOR ENTROPY, RELATIVE ENTROPY, AND MUTUAL INFORMATION

SLIDE 30

Chain Rule for Entropy
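For reference, the general statement (Cover and Thomas, Ch. 2): for r.v.s X_1, \ldots, X_n drawn according to p(x_1, \ldots, x_n),

```latex
H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1)
```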

SLIDE 31

Proof

SLIDE 32

Alternative Proof

SLIDE 33

Chain Rule for Information
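For reference, the standard statement (Cover and Thomas, Ch. 2), using the conditional mutual information I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z):

```latex
I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, \ldots, X_1)
```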

SLIDE 34

Proof

SLIDE 35

Chain Rule for Relative Entropy
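For reference, the standard statement (Cover and Thomas, Ch. 2), where the conditional relative entropy D(p(y \mid x) \| q(y \mid x)) averages the divergence of the conditionals over p(x):

```latex
D\big(p(x, y) \,\|\, q(x, y)\big) = D\big(p(x) \,\|\, q(x)\big) + D\big(p(y \mid x) \,\|\, q(y \mid x)\big)
```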

SLIDE 36

Proof

SLIDE 37

JENSEN'S INEQUALITY AND ITS CONSEQUENCES

SLIDE 38

Convex & Concave Functions

  • Examples:
– convex functions: x^2, \; |x|, \; e^x, \; x \log x \; (for x \ge 0)
– concave functions: \log x and \sqrt{x} \; (for x \ge 0)
– linear functions ax + b are both convex and concave
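For reference, the underlying definition: a function f is convex over an interval (a, b) if for all x_1, x_2 \in (a, b) and 0 \le \lambda \le 1,

```latex
f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)
```

f is strictly convex if equality holds only at \lambda = 0 or \lambda = 1, and f is concave if -f is convex.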

SLIDE 39

Convex & Concave Functions

SLIDE 40

Jensen’s Inequality
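For reference, the standard statement (Cover and Thomas, Ch. 2): if f is a convex function and X is a random variable,

```latex
\mathbb{E} f(X) \ge f(\mathbb{E} X)
```

with equality, when f is strictly convex, if and only if X is constant.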

SLIDE 41

Information Inequality
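For reference, the standard statement (Cover and Thomas, Ch. 2): for any two p.m.f.s p and q,

```latex
D(p \| q) \ge 0, \quad \text{with equality iff } p(x) = q(x) \text{ for all } x
```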

SLIDE 42

Proof

SLIDE 43

Nonnegativity of Mutual Information
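This follows directly from the information inequality, since mutual information is a relative entropy:

```latex
I(X; Y) = D\big(p(x, y) \,\|\, p(x) \, p(y)\big) \ge 0, \quad \text{with equality iff } X \text{ and } Y \text{ are independent}
```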

SLIDE 44
Maximum Entropy Distribution: the Uniform Distribution
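For reference, the standard statement (Cover and Thomas, Ch. 2): for X with alphabet \mathcal{X},

```latex
H(X) \le \log |\mathcal{X}|, \quad \text{with equality iff } X \text{ is uniform over } \mathcal{X}
```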

SLIDE 45

Conditioning Reduces Entropy
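For reference, the standard statement (Cover and Thomas, Ch. 2): knowing another r.v. Y can only reduce the uncertainty in X on average,

```latex
H(X \mid Y) \le H(X), \quad \text{with equality iff } X \text{ and } Y \text{ are independent}
```

Note this holds only on average: H(X \mid Y = y) may exceed H(X) for a particular y.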

SLIDE 46

Independence Bound on Entropy
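For reference, the standard statement (Cover and Thomas, Ch. 2), which follows from the chain rule for entropy together with conditioning reduces entropy:

```latex
H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i), \quad \text{with equality iff the } X_i \text{ are independent}
```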

SLIDE 47

Summary

SLIDE 48

Summary

SLIDE 49

Summary

SLIDE 50

cuiying@sjtu.edu.cn
iwct.sjtu.edu.cn/Personal/yingcui
