Computing and Communications: 2. Information Theory, Entropy
SLIDE 1


Computing and Communications

  • 2. Information Theory
  • Entropy

Ying Cui, Department of Electronic Engineering, Shanghai Jiao Tong University, China. 2017, Autumn

SLIDE 2

Outline

  • Entropy
  • Joint entropy and conditional entropy
  • Relative entropy and mutual information
  • Relationship between entropy and mutual information
  • Chain rules for entropy, relative entropy and mutual information

  • Jensen’s inequality and its consequences

SLIDE 3

Reference

  • T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley

SLIDE 4

OVERVIEW

SLIDE 5

Information Theory

  • Information theory answers two fundamental questions in communication theory
– what is the ultimate data compression?
  • entropy H
– what is the ultimate transmission rate of communication?
  • channel capacity C
  • Information theory is considered a subset of communication theory

SLIDE 6

Information Theory

  • Information theory has made fundamental contributions to other fields

SLIDE 7

A Mathematical Theory of Commun.

  • In 1948, Shannon published “A Mathematical Theory of Communication”, founding Information Theory
  • Shannon made two major modifications with huge impact on communication design
– the source and channel are modeled probabilistically
– bits became the common currency of communication

SLIDE 8

A Mathematical Theory of Commun.

  • Shannon proved the following three theorems
– Theorem 1. The minimum compression rate of the source is its entropy rate H
– Theorem 2. The maximum reliable rate over the channel is its mutual information I
– Theorem 3. End-to-end reliable communication happens if and only if H < I, i.e., there is no loss in performance by using a digital interface between source and channel coding
  • Impacts of Shannon’s results
– after almost 70 years, all communication systems are designed based on the principles of information theory
– the limits not only serve as benchmarks for evaluating communication schemes, but also provide insights on designing good ones
– the basic information-theoretic limits in Shannon’s theorems have now been successfully achieved using efficient algorithms and codes

SLIDE 9

ENTROPY

SLIDE 10

Definition

  • Entropy is a measure of the uncertainty of a r.v.
  • Consider a discrete r.v. X with alphabet \mathcal{X} and p.m.f. p(x) = \Pr[X = x], \; x \in \mathcal{X}

– log is to the base 2, and entropy is expressed in bits

  • e.g., the entropy of a fair coin toss is 1 bit

– define 0 \log 0 = 0, since x \log x \to 0 as x \to 0

  • adding terms of zero probability does not change the entropy
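As a concrete check of the definition H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x), here is a minimal Python sketch (the helper name entropy is ours, not from the slides):

```python
import math

def entropy(pmf):
    """Entropy in bits: H(X) = sum_x p(x) * log2(1 / p(x)).
    Zero-probability terms are skipped, per the convention 0 log 0 = 0."""
    return sum(p * math.log2(1 / p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))  # fair coin toss: 1.0 bit
print(entropy([1.0, 0.0]))  # deterministic outcome: 0.0 bits (zero term dropped)
```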

SLIDE 11

Properties

– entropy is nonnegative
– the base of the log can be changed
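Stated precisely (standard results, Cover and Thomas, Ch. 2):

```latex
% since 0 <= p(x) <= 1, each term -p(x) log p(x) is nonnegative, so
H(X) \ge 0
% change of base: \log_b p = (\log_b a) \log_a p, hence
H_b(X) = (\log_b a) \, H_a(X)
```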

SLIDE 12

Example

– consider a binary r.v. X with \Pr[X = 1] = p, so H(X) = -p \log p - (1 - p) \log(1 - p)
– H(X) = 1 bit when p = 0.5
  • maximum uncertainty
– H(X) = 0 bits when p = 0 or 1
  • minimum uncertainty
– H(X) is a concave function of p
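A small Python sketch of this binary entropy function (the helper name binary_entropy is ours), confirming the bullets above:

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))  # 1.0 bit: maximum uncertainty
print(binary_entropy(0.0))  # 0.0 bits: minimum uncertainty
print(binary_entropy(0.1))  # ~0.47 bits: between the extremes
```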

SLIDE 13

Example

SLIDE 14

JOINT ENTROPY AND CONDITIONAL ENTROPY

SLIDE 15

Joint Entropy

  • Joint entropy is a measure of the uncertainty of a pair of r.v.s
  • Consider a pair of discrete r.v.s (X, Y) with alphabets \mathcal{X}, \mathcal{Y} and p.m.f.s p(x) = \Pr[X = x], \; x \in \mathcal{X} and p(y) = \Pr[Y = y], \; y \in \mathcal{Y}
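For reference, the standard definition over the joint p.m.f. p(x, y) = \Pr[X = x, Y = y] (Cover and Thomas, Ch. 2):

```latex
H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y) = -\mathbb{E} \log p(X, Y)
```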

SLIDE 16

Conditional Entropy

  • Conditional entropy of a r.v. Y given another r.v. X
– the expected value of the entropies of the conditional distributions, averaged over the conditioning r.v.
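In symbols (standard definition, Cover and Thomas, Ch. 2):

```latex
H(Y \mid X) = \sum_{x \in \mathcal{X}} p(x) \, H(Y \mid X = x)
            = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x)
```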

SLIDE 17

Chain Rule
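Presumably the statement here is the standard two-variable identity (Cover and Thomas, Ch. 2): the uncertainty of a pair decomposes into the uncertainty of one variable plus the conditional uncertainty of the other,

```latex
H(X, Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y)
```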

SLIDE 18

Chain Rule

SLIDE 19

Example

SLIDE 20

Example

SLIDE 21

RELATIVE ENTROPY AND MUTUAL INFORMATION

SLIDE 22

Relative Entropy

  • Relative entropy is a measure of the “distance” between two distributions
– conventions: 0 \log \frac{0}{0} = 0, \; 0 \log \frac{0}{q} = 0, \; and \; p \log \frac{p}{0} = \infty
– if there is any x such that p(x) > 0 and q(x) = 0, then D(p \| q) = \infty
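For reference, the definition is D(p \| q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}. A minimal Python sketch (the helper name kl_divergence is ours) that honors the conventions above:

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits, for p.m.f.s given as aligned lists."""
    d = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue         # convention: 0 log (0/q) = 0
        if qx == 0:
            return math.inf  # convention: p log (p/0) = infinity
        d += px * math.log2(px / qx)
    return d

p, q = [0.5, 0.5], [0.25, 0.75]
print(kl_divergence(p, q))  # ~0.2075 bits
print(kl_divergence(q, p))  # ~0.1887 bits: not symmetric, hence "distance" in quotes
```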

SLIDE 23

Example

SLIDE 24

Mutual Information

  • Mutual information is a measure of the amount of information that one r.v. contains about another r.v.
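In symbols (standard definition, Cover and Thomas, Ch. 2): mutual information is the relative entropy between the joint distribution and the product of the marginals,

```latex
I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x) \, p(y)} = D\big(p(x, y) \,\|\, p(x) \, p(y)\big)
```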

SLIDE 25

RELATIONSHIP BETWEEN ENTROPY AND MUTUAL INFORMATION

SLIDE 26

Relation
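The standard relations between entropy and mutual information (Cover and Thomas, Ch. 2), which this slide presumably presents:

```latex
I(X; Y) = H(X) - H(X \mid Y)
        = H(Y) - H(Y \mid X)
        = H(X) + H(Y) - H(X, Y)
I(X; X) = H(X)
```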

SLIDE 27

Proof

SLIDE 28

Illustration

SLIDE 29

CHAIN RULES FOR ENTROPY, RELATIVE ENTROPY, AND MUTUAL INFORMATION

SLIDE 30

Chain Rule for Entropy
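For reference, the general statement (Cover and Thomas, Ch. 2): for r.v.s X_1, \ldots, X_n drawn according to p(x_1, \ldots, x_n),

```latex
H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1)
```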

SLIDE 31

Proof

SLIDE 32

Alternative Proof

SLIDE 33

Chain Rule for Information
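For reference, the standard statement (Cover and Thomas, Ch. 2), using the conditional mutual information I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z):

```latex
I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, \ldots, X_1)
```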

SLIDE 34

Proof

SLIDE 35

Chain Rule for Relative Entropy
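For reference, the standard statement (Cover and Thomas, Ch. 2), where the conditional relative entropy D(p(y \mid x) \| q(y \mid x)) averages the divergence of the conditionals over p(x):

```latex
D\big(p(x, y) \,\|\, q(x, y)\big) = D\big(p(x) \,\|\, q(x)\big) + D\big(p(y \mid x) \,\|\, q(y \mid x)\big)
```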

SLIDE 36

Proof

SLIDE 37

JENSEN'S INEQUALITY AND ITS CONSEQUENCES

SLIDE 38

Convex & Concave Functions

  • Examples:
– convex functions: x^2, \; |x|, \; e^x, \; x \log x \; (for x \ge 0)
– concave functions: \log x and \sqrt{x} \; (for x \ge 0)
– linear functions ax + b are both convex and concave
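For reference, the underlying definition: a function f is convex over an interval (a, b) if for all x_1, x_2 \in (a, b) and 0 \le \lambda \le 1,

```latex
f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)
```

f is strictly convex if equality holds only at \lambda = 0 or \lambda = 1, and f is concave if -f is convex.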

SLIDE 39

Convex & Concave Functions

SLIDE 40

Jensen’s Inequality
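For reference, the standard statement (Cover and Thomas, Ch. 2): if f is a convex function and X is a random variable,

```latex
\mathbb{E} f(X) \ge f(\mathbb{E} X)
```

with equality, when f is strictly convex, if and only if X is constant.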

SLIDE 41

Information Inequality
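For reference, the standard statement (Cover and Thomas, Ch. 2): for any two p.m.f.s p and q,

```latex
D(p \| q) \ge 0, \quad \text{with equality iff } p(x) = q(x) \text{ for all } x
```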

SLIDE 42

Proof

SLIDE 43

Nonnegativity of Mutual Information
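This follows directly from the information inequality, since mutual information is a relative entropy:

```latex
I(X; Y) = D\big(p(x, y) \,\|\, p(x) \, p(y)\big) \ge 0, \quad \text{with equality iff } X \text{ and } Y \text{ are independent}
```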

SLIDE 44
Maximum Entropy Distribution: the Uniform Distribution
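For reference, the standard statement (Cover and Thomas, Ch. 2): for X with alphabet \mathcal{X},

```latex
H(X) \le \log |\mathcal{X}|, \quad \text{with equality iff } X \text{ is uniform over } \mathcal{X}
```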

SLIDE 45

Conditioning Reduces Entropy
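For reference, the standard statement (Cover and Thomas, Ch. 2): knowing another r.v. Y can only reduce the uncertainty in X on average,

```latex
H(X \mid Y) \le H(X), \quad \text{with equality iff } X \text{ and } Y \text{ are independent}
```

Note this holds only on average: H(X \mid Y = y) may exceed H(X) for a particular y.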

SLIDE 46

Independence Bound on Entropy
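For reference, the standard statement (Cover and Thomas, Ch. 2), which follows from the chain rule for entropy together with conditioning reduces entropy:

```latex
H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i), \quad \text{with equality iff the } X_i \text{ are independent}
```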

SLIDE 47

Summary

SLIDE 48

Summary

SLIDE 49

Summary

SLIDE 50

cuiying@sjtu.edu.cn
iwct.sjtu.edu.cn/Personal/yingcui
