

SLIDE 1

Lecture 1 Introduction

I-Hsiang Wang

Department of Electrical Engineering, National Taiwan University
ihwang@ntu.edu.tw

September 22, 2015



SLIDE 3

Information Theory

Information theory is a mathematical theory of information. Information is usually obtained by receiving some “messages” (speech, text, images, etc.) from others. When obtaining information from a message, you may care about:
• What is the meaning of the message?
• How important is the message?
• How much information can I get from the message?
Information theory is about the quantification of information.


SLIDE 4

Information Theory

Information theory is a mathematical theory of information (primarily for communication systems) that:
• Establishes the fundamental limits of communication systems (quantifies the amount of information that can be delivered from one party to another)
• Is built upon probability theory and statistics
Main concern: the ultimate performance limit (usually the rate of information processing) as certain resources (usually the total amount of time) scale to the asymptotic regime, given that the desired information is delivered “reliably”.


SLIDE 5

In this course, we will

1 Establish solid foundations and intuitions of information theory,
2 Introduce explicit methods to achieve information-theoretic limits,
3 Demonstrate further applications of information theory beyond communications.

In the rest of this lecture, we give a brief overview of information theory and of the materials to be covered in this course.


SLIDE 6

1 Course Information
2 Overview


SLIDE 7

Logistics

1 Instructor: I-Hsiang Wang 王奕翔

Email: ihwang@ntu.edu.tw
Office: MD-524 明達館 524 室
Office Hours: 17:00 – 18:00, Monday and Tuesday

2 Lecture Time: 13:20 – 14:10 (6) Tuesday, and 10:20 – 12:10 (34) Wednesday

3 Lecture Location: EE2-225 電機二館 225 室

4 Course Website: http://homepage.ntu.edu.tw/~ihwang/Teaching/Fa15/IT.html

5 Prerequisites: Probability, Linear Algebra.


SLIDE 8

Logistics

6 Grading: Homework (35%), Midterm (30%), Final (35%)

7 References
• T. Cover and J. Thomas, Elements of Information Theory, 2nd Edition, Wiley-Interscience, 2006.
• R. Gallager, Information Theory and Reliable Communication, Wiley, 1968.
• I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd Edition, Cambridge University Press, 2011.
• S. M. Moser, Information Theory (Lecture Notes), 4th edition, ISI Lab, ETH Zürich, Switzerland, 2014.
• R. Yeung, Information Theory and Network Coding, Springer, 2008.
• A. El Gamal and Y.-H. Kim, Network Information Theory, Cambridge University Press, 2011.


SLIDE 9

Homework

1 Roughly 5 to 6 problems every two weeks, 7 problem sets in total.

2 Homework (HW) is usually released on Monday. The submission deadline is usually the next Wednesday, in class.

3 Late homework = 0 points. (Let me know in advance if you have difficulties.)

4 Everyone has to develop a detailed solution for one HW problem, documented in LaTeX and submitted 1 week after the HW is due. We will provide LaTeX templates, and you should discuss the problem you are in charge of with the instructor, to make sure the solution is correct.

5 This additional effort accounts for part of your homework grade.


SLIDE 10

Reading and Lecture Notes

1 Slides: Slides are usually released/updated every Sunday evening.

2 Readings: Each lecture has assigned readings. Reading is required: it is not enough to learn from the slides alone!

3 Go through the slides and the assigned readings before our lectures. It helps you learn better.

4 I recommend you get a copy of the textbook by Cover and Thomas. It is a good reference, and we will often assign readings from the book.

5 Other assigned readings may include Moser’s lecture notes (which can be obtained online) and relevant papers.


SLIDE 11

Interaction

1 In-class:
• Language: This class is taught in English. However, to encourage interaction, feel free to ask questions in Mandarin. I will repeat your question in English (if necessary) and answer it in English.
• Exercises: We put some exercises on the slides to help you learn and understand. Occasionally, I will call for volunteers to solve the exercises in class. Volunteers get bonus points.

2 Out-of-class:
• Office Hours: Both the TA and I hold 2 hours of office hours per week. You are more than welcome to visit us to ask questions, discuss research, chat, complain, etc. If you cannot make it to the regular office hours, email us to schedule a time slot. My schedule can be found on my website.
• Email: Send us emails with a subject line starting with “[NTU Fall15 IT]”.
• Feedback: There will be online polls during the semester to collect your feedback anonymously.


SLIDE 12

Course Outline

• Measures of Information: entropy, conditional entropy, relative entropy (KL divergence), mutual information.
• Lossless Source Coding: lossless source coding theorem, discrete memoryless sources, asymptotic equipartition property, typical sequences, Fano’s inequality, converse proof, ergodic sources, entropy rate.
• Noisy Channel Coding: noisy channel coding theorem, discrete memoryless channels, random coding, typicality decoder, threshold decoder, error probability analysis, converse proof, channels with feedback.
• Channel Coding over Continuous-Valued Channels: channel coding with cost constraints, discretization technique, differential entropy, Gaussian channel capacity.


SLIDE 13

Course Outline

• Lossy Source Coding (Rate-Distortion Theory): distortion, rate-distortion tradeoff, typicality encoder, converse proof.
• Source-Channel Separation and Joint Source-Channel Coding.
• Information Theory and Statistics: method of types, Sanov’s theorem, large deviations, hypothesis testing, estimation, Cramér-Rao lower bound, non-parametric estimation.
• Data Compression: prefix-free codes, Kraft’s inequality, Huffman codes, Lempel-Ziv compression.
• Capacity-Achieving Channel Codes: polar codes, LDPC codes.
• Selected Advanced Topics: network coding, compressed sensing, community detection, non-asymptotic information theory, etc.


SLIDE 14

Tentative Schedule

Week 1 (09/15, 16): Introduction; Measures of Information
Week 2 (09/22, 23): Measures of Information
Week 3 (09/29, 30): Lossless Source Coding (HW1 out)
Week 4 (10/06, 07): Lossless Source Coding (HW1 due)
Week 5 (10/13, 14): Noisy Channel Coding (HW2 out)
Week 6 (10/20, 21): Noisy Channel Coding (HW2 due)
Week 7 (10/27, 28): Continuous-Valued Channel Coding (HW3 out)
Week 8 (11/03, 04): Continuous-Valued Channel Coding (HW3 due)
Week 9 (11/10, 11): Midterm Exam


SLIDE 15

Tentative Schedule

Week 10 (11/17, 18): Lossy Source Coding (HW4 out)
Week 11 (11/24, 25): Joint Source-Channel Coding (HW4 due)
Week 12 (12/01, 02): Information Theory and Statistics (HW5 out)
Week 13 (12/08, 09): Information Theory and Statistics (HW5 due)
Week 14 (12/15, 16): Data Compression (HW6 out)
Week 15 (12/22, 23): Data Compression; Polar Code (HW6 due)
Week 16 (12/29, 30): Polar Code (HW7 out)
Week 17 (01/05, 06): Advanced Topics (HW7 due)
Week 18 (01/12, 13): Final Exam


SLIDE 16

1 Course Information
2 Overview



SLIDE 18

Claude E. Shannon (1916 – 2001)


SLIDE 19

Information theory is a mathematical theory of communication.



SLIDE 21

Information theory is the mathematical theory of communication.



SLIDE 24

Origin of Information Theory

Shannon’s landmark paper in 1948 is generally considered the “birth” of information theory. In the paper, Shannon made it clear that information theory is about the quantification of information in a communication system. In particular, it focuses on characterizing the necessary and sufficient condition for a destination terminal to be able to reproduce a message generated by a source terminal.


SLIDE 25

What is Information Theory about?


SLIDE 26

It is about the analysis of fundamental limits

1 Stochastic modeling
It is a unified theory based on stochastic modeling (information source, noisy channel, etc.).

2 Theorems, not only definitions
It provides mathematical theorems on the optimal performance of algorithms (coding schemes), rather than merely definitions.

3 Sharp phase transition
It draws the boundary between what is possible to achieve and what is impossible, leading to math-driven system design.
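To make the third point concrete, here is an informal statement of the phase transition for channel coding, a sketch using notation defined only later in the course: R is the code rate, C the channel capacity, and P_e^(N) the error probability of the best length-N code.

```latex
% Sharp phase transition (informal): the capacity C separates the
% achievable regime from the impossible one.
\[
  R < C \;\Longrightarrow\; P_e^{(N)} \to 0 \text{ as } N \to \infty,
  \qquad
  R > C \;\Longrightarrow\; \liminf_{N \to \infty} P_e^{(N)} > 0.
\]
```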


SLIDE 27

It is about design driven by theory

Applications of information theory include:
• Universal data compression (lossless)
• Error-correcting codes
• DSL modems
• Cellular systems
• Wireless access networks
• Cryptography

…and much more!


SLIDE 28

Communication System

[Block diagram: Source → Encoder → Channel (with Noise) → Decoder → Destination]

Above is an abstract model of a communication system:

1 The source would like to deliver some message to the destination, where the message can be speech, image, video, audio, text, etc.

2 The channel is the physical medium that connects the source and the destination, such as a cable, optical fiber, or EM radiation, and is usually subject to certain noise disturbances.

3 The encoder can carry out any processing of the source output, including compression, modulation, insertion of redundancy, etc.

4 The decoder can carry out any processing of the channel output to reproduce the source message.


SLIDE 29

A primary concern of information theory is the encoder and the decoder, both in terms of:
• How the encoder and the decoder function, and
• The existence or nonexistence of encoders and decoders that achieve a given level of performance.


SLIDE 30

Prior to the 1948 paper, the design of communication systems followed the analog paradigm: if the source produces an electromagnetic waveform, the destination should try its best to reconstruct this waveform, in order to extract the useful information (usually, voice). This line of research was based on Fourier analysis and gave birth to sampling theory.

Shannon asked: If the receiver knows that a sine wave of unknown frequency is to be communicated, why not simply send the frequency rather than the entire waveform?

Prior to Shannon, theorists and engineers were able to analyze the performance of certain choices of encoders/decoders, but had little knowledge about the ultimate limit.

Shannon asked: Considering all possible encoders/decoders, what is the necessary and sufficient condition for the destination to be able to reconstruct the message sent by the source?


SLIDE 31

Shannon’s View

[Block diagram: Source → Encoder → Channel (with Noise) → Decoder → Destination]

Key new insights due to Shannon’s work:

• Shannon: “Information is the resolution of uncertainty.” Indeed, the set of possible source outputs, rather than any particular output, is of primary interest.

• Introduction of an abstract mathematical model of the communication system based on random processes (hence, a stochastic model).

• Creation of the digital paradigm of communication system design, with the bit as the universal currency of information, by proposing and proving the source-channel separation theorem.


SLIDE 32

Stochastic Modeling

[Block diagram: Source → Encoder → Channel (with Noise) → Decoder → Destination]

The stochastic modeling of a communication system comprises:

• Source: model the information source by a random process, where the data to be conveyed is drawn randomly from a given distribution.

• Channel: model the noisy channel by a random process, where the impact of noise is drawn randomly from a given distribution.

Why use random processes to model a communication system? Shannon: “The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.”
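As a minimal illustration of this stochastic modeling (my own sketch, not from the slides; the Bernoulli(0.1) source and crossover probability 0.2 are arbitrary choices), the following Python snippet draws a source sequence from a given distribution and corrupts it with random channel noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_source(n, p=0.1):
    """Model the source as an i.i.d. Bernoulli(p) random process."""
    return rng.random(n) < p

def bsc(x, eps=0.2):
    """Binary symmetric channel: each input bit is flipped
    independently with probability eps (the random noise)."""
    return x ^ (rng.random(x.shape) < eps)

s = sample_source(20)   # one realization of the source process
y = bsc(s)              # the corresponding noisy channel output
print(s.astype(int))
print(y.astype(int))
```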


SLIDE 33

Source-Channel Separation

[Block diagram: Source → Source Encoder → (bits) → Channel Encoder → Noisy Channel → Channel Decoder → (bits) → Source Decoder → Destination; the binary interface between source coding and channel coding carries bits]

Shannon showed that by splitting the coder into a source coder and a channel coder, the fundamental limit of the system remains the same. In other words, introducing a digital (binary) interface does not incur any loss of optimality, in terms of whether or not the destination can reproduce the source data. Separating source coding from channel coding simplifies engineering design into source coder design (data compression) and channel coder design (data transmission).
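Stated informally (my paraphrase, assuming one source symbol per channel use; the entropy rate H(S) and capacity C are defined in later lectures):

```latex
% Source-channel separation (informal): comparing the source entropy
% rate H(S) with the channel capacity C settles reliability either way.
\[
  H(S) < C \;\Longrightarrow\; \text{reliable reproduction is possible, even with separate source/channel codes;}
\]
\[
  H(S) > C \;\Longrightarrow\; \text{reliable reproduction is impossible, even with jointly designed codes.}
\]
```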


SLIDE 34

I have always wondered how on earth Shannon came up with the brilliant idea of separating source coding and channel coding. A very likely answer: “Shannon is simply a genius.” A more down-to-earth one: “Shannon saw the essence of research: seek simplification first.”


SLIDE 35

Original Block Diagram

[Block diagram: Source → Encoder → Channel (with Noise) → Decoder → Destination]

Simplification: Remove the Channel Noise!

[Block diagram: Source → Encoder → noiseless Channel → Decoder → Destination]

This step makes life much easier. Yet, it is still a non-trivial problem.


SLIDE 36

Source Coding (Data Compression)

[Block diagram: Source → Source Encoder → Channel Encoder → Noisy Channel → Channel Decoder → Source Decoder → Destination]

Features of source messages:

• Uncertainty: the destination has no idea a priori which message is chosen by the source.

• Redundancy: though randomly chosen, some choices are more likely, while others are less likely.

Goal: remove the redundancy of the source message and represent it by a bit sequence, so that it can be delivered to the destination reliably.
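To see redundancy concretely, here is a small sketch of my own (the text string is an arbitrary example): it estimates the empirical first-order entropy of a text’s letter distribution and compares it with the log2 26 ≈ 4.70 bits per letter needed if all letters were equally likely.

```python
from collections import Counter
from math import log2

text = "information theory is the mathematical theory of communication"
letters = [c for c in text if c.isalpha()]

counts = Counter(letters)   # empirical letter distribution
n = len(letters)

# Empirical (first-order) entropy in bits per letter.
h = -sum((k / n) * log2(k / n) for k in counts.values())

print(f"empirical entropy: {h:.2f} bits/letter")
print(f"uniform over 26 letters: {log2(26):.2f} bits/letter")
```

The gap between the two numbers is (first-order) redundancy that a source coder can remove.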


SLIDE 37

Source Coding (Data Compression)

[Block diagram: Source → Source Encoder → Channel Encoder → Noisy Channel → Channel Decoder → Source Decoder → Destination, with s[1], …, s[N] entering the source encoder, b[1], …, b[K] at the binary interface, and ŝ[1], …, ŝ[N] leaving the source decoder]

Notation:

• {s[1], …, s[N]} represents the source message; each s[t] is called a “source symbol”.

• {b[1], …, b[K]} represents the codeword generated by the source encoder; each bit b[t] is called a “source codeword symbol (bit)”.

• {ŝ[1], …, ŝ[N]} represents the reproduced source message at the destination.


SLIDE 38

Source Coding (Data Compression)

[Block diagram as on the previous slide: s[1], …, s[N] → b[1], …, b[K] → ŝ[1], …, ŝ[N]]

Question: For a given N (# of source symbols), what is the minimum K (# of bits) needed to recover s[1], …, s[N] at the decoder?

It is not hard to show that the smallest K = Θ(N). (Check! See the sketch below.)

The right (non-trivial) question to ask is: what is the minimum value of K/N?
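For the achievability half of the “check”: a fixed-length code already gives K = O(N), since N symbols from an alphabet of size |S| can be indexed with ⌈N log2 |S|⌉ bits. A minimal sketch of my own (the alphabet and message are arbitrary):

```python
from math import ceil, log2

alphabet = ['a', 'b', 'c', 'd', 'e']   # |S| = 5
idx = {s: i for i, s in enumerate(alphabet)}

def encode_fixed(msg):
    """Fixed-length source code: interpret the message as a base-|S|
    number and write it with ceil(N * log2 |S|) bits."""
    base = len(alphabet)
    value = 0
    for s in msg:
        value = value * base + idx[s]
    k = ceil(len(msg) * log2(base))    # K = O(N) bits always suffice
    return format(value, f'0{k}b')

bits = encode_fixed("beadcab")         # N = 7 symbols
print(len(bits), bits)                 # K = 17 bits
```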


SLIDE 39

Source Coding (Data Compression)

[Block diagram as above: s[1], …, s[N] → b[1], …, b[K] → ŝ[1], …, ŝ[N]]

Shannon answered the above question and characterized the necessary and sufficient condition for (lossless) source coding:

A Source Coding Theorem
The destination can reconstruct the source message losslessly ⇔ code rate R := K/N > the entropy rate of the source, H(S).

We will define entropy in Lecture 2; it is a quantity that can be computed from the distribution of the source random process {S[t] | t ∈ N}.
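As a preview of Lecture 2, a sketch of my own (the Bernoulli(0.1) source is an arbitrary example): for an i.i.d. source, the entropy rate reduces to the entropy of the per-symbol distribution, which gives the best achievable compression rate in bits per symbol.

```python
from math import log2

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution,
    given as a list of probabilities summing to 1."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

# i.i.d. Bernoulli(0.1) source:
h = entropy([0.1, 0.9])
print(f"H(S) = {h:.3f} bits/symbol")   # ~0.469
# By the theorem, ~0.469*N bits suffice (and are necessary) to
# describe N source symbols losslessly, e.g. ~469 bits for N = 1000.
```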


SLIDE 40

Original Block Diagram

[Block diagram: Source → Encoder → Channel (with Noise) → Decoder → Destination]

Simplification’: Remove the Source Redundancy!

[Block diagram: Source emitting i.i.d. Bernoulli(1/2) bits, i.e., random bits → Encoder → Channel (with Noise) → Decoder → Destination]

It remains a highly non-trivial problem.


SLIDE 41

Channel Coding (Data Transmission)

[Block diagram: Source → Source Encoder → Channel Encoder → Noisy Channel → Channel Decoder → Source Decoder → Destination]

Features of the noisy channel:

• Noise: the channel input sent by the channel encoder is corrupted by the noise randomly, producing the channel output.

• Uniform messages: the input of the channel encoder is assumed WLOG to be a bit sequence with no redundancy, since source coding has already removed all redundancy and converted the message into a bit sequence.

Goal: add the minimum redundancy needed so that messages (bit sequences) can be communicated over the noisy channel and decoded reliably.
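To illustrate adding redundancy, here is a sketch of my own, not a scheme from the course: a rate-1/3 repetition code over a binary symmetric channel with crossover probability 0.1 lowers the bit error probability from 0.1 to 3(0.1)²(0.9) + (0.1)³ = 0.028.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.1                      # BSC crossover probability

def encode(bits):
    """Repetition code: send each data bit three times (rate K/N = 1/3)."""
    return np.repeat(bits, 3)

def channel(x):
    """Binary symmetric channel: flip each bit independently w.p. eps."""
    return x ^ (rng.random(x.shape) < eps)

def decode(y):
    """Majority vote over each block of three received bits."""
    return y.reshape(-1, 3).sum(axis=1) >= 2

data = rng.integers(0, 2, 100_000).astype(bool)
ber = np.mean(data != decode(channel(encode(data))))
print(f"bit error rate ~ {ber:.3f} (vs. {eps} uncoded)")   # ~0.028
```

Repetition trades rate for reliability very inefficiently; the channel coding theorem on the following slides shows how much better one can do.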


SLIDE 42

Channel Coding (Data Transmission)

[Block diagram: b[1], …, b[K] → Channel Encoder → x[1], …, x[N] → Noisy Channel p(y|x) → y[1], …, y[N] → Channel Decoder → b̂[1], …, b̂[K]]

Notation:

• {b[1], …, b[K]} represents the message; each bit b[t] is called a “data symbol (bit)”.

• {x[1], …, x[N]} represents the codeword; each x[t] is called a “coded symbol”.

• {y[1], …, y[N]} represents the channel output.


SLIDE 43

Channel Coding (Data Transmission)

[Block diagram as on the previous slide: b[1], …, b[K] → x[1], …, x[N] → Noisy Channel p(y|x) → y[1], …, y[N] → b̂[1], …, b̂[K]]

Question: For a given K (# of input bits), what is the minimum N (# of coded symbols) needed to recover b[1], …, b[K] at the decoder?

It turns out that N = Θ(K). However, proving this is already non-trivial.

Shannon further asked: what is the maximum value of K/N?


SLIDE 44

Channel Coding (Data Transmission)

[Block diagram as above: b[1], …, b[K] → x[1], …, x[N] → Noisy Channel p(y|x) → y[1], …, y[N] → b̂[1], …, b̂[K]]

Shannon gave the necessary and sufficient condition for channel coding:

A Channel Coding Theorem
The (channel) decoder can decode the message reliably ⇔ code rate R := K/N < the channel capacity of the channel, C.

We will define channel capacity later; it is the maximum of the “mutual information” between the channel input X and output Y over input distributions, and it can be computed from the conditional distribution p(y|x) of the channel.
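As a concrete preview, a sketch of my own (the crossover probability 0.1 is an arbitrary example): for the binary symmetric channel the maximization has the closed form C = 1 - H2(eps), attained by a uniform input distribution.

```python
from math import log2

def h2(p):
    """Binary entropy function H2(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(eps):
    """Capacity of the binary symmetric channel: C = 1 - H2(eps) bits
    per channel use, achieved by a uniform input distribution."""
    return 1.0 - h2(eps)

print(f"C = {bsc_capacity(0.1):.3f} bits/use")   # ~0.531
# By the theorem, rates K/N below ~0.531 are reliably achievable
# over this channel, and rates above are not.
```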



SLIDE 46

Summary

• Information theory focuses on the quantitative aspects of information, not the qualitative aspects.
• Information theory is mainly about what is possible and what is impossible in communication systems.
• In information theory, one investigates problems in communication systems through the lens of probability theory and statistics.
• In this course, we mainly focus on discrete-time signals, not on continuous-time signals.
• Source-channel separation forms the basis of digital communication, where binary digits (bits) become the universal currency of information.
