Course Information Overview
Lecture 1 Introduction
I-Hsiang Wang
Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw
September 22, 2015
1 / 46 I-Hsiang Wang IT Lecture 1
Information Theory is a mathematical theory of information.

Information is usually obtained by getting some “messages” (speech, text, images, etc.) from others. When obtaining information from a message, you may care about:
- What is the meaning of the message?
- How important is the message?
- How much information can I get from the message?

Information theory is about the quantification of information.
Information Theory is a mathematical theory of information (primarily for communication systems) that:
- Establishes the fundamental limits of communication systems, i.e., quantifies the amount of information that can be delivered from one party to another.
- Is built upon probability theory and statistics.
- Has as its main concern the ultimate performance limit (usually the rate of information processing) as certain resources (usually the total amount of time) scale to the asymptotic regime, given that the desired information is delivered “reliably”.
Goals of this course:
1. Establish solid foundations and intuitions of information theory.
2. Introduce explicit methods to achieve information-theoretic limits.
3. Demonstrate further applications of information theory beyond communications.

We then give a brief overview of information theory and of the material to be covered in this course.
1. Course Information
2. Overview
1. Instructor: I-Hsiang Wang 王奕翔
   Email: ihwang@ntu.edu.tw
   Office: MD-524 明達館 524 室
   Office Hours: 17:00 – 18:00, Monday and Tuesday
2. Lecture Time: 13:20 – 14:10 (6) Tuesday, and 10:20 – 12:10 (34) Wednesday
3. Lecture Location: EE2-225 電機二館 225 室
4. Course Website: http://homepage.ntu.edu.tw/~ihwang/Teaching/Fa15/IT.html
5. Prerequisites: Probability, Linear Algebra.
6. Grading: Homework (35%), Midterm (30%), Final (35%)
7. References:
   - T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd Edition, Wiley-Interscience, 2006.
   - R. G. Gallager, Information Theory and Reliable Communication, Wiley, 1968.
   - I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd Edition, Cambridge University Press, 2011.
   - S. M. Moser, Information Theory (Lecture Notes), Signal and Information Processing Lab, ETH Zürich, Switzerland, 2014.
   - A. El Gamal and Y.-H. Kim, Network Information Theory, Cambridge University Press, 2011.
Homework:
1. Roughly 5 ∼ 6 problems every two weeks, 7 times in total.
2. Homework (HW) is usually released on Monday. The submission deadline is usually the next Wednesday in class.
3. Late homework = 0 points. (Let me know in advance if you have difficulties.)
4. Everyone has to develop a detailed solution for one HW problem, documented in LaTeX and submitted 1 week after the HW is due. We will provide LaTeX templates, and you should discuss with the instructor the homework problem you are in charge of, to make sure the solution is correct.
5. This additional effort accounts for part of your homework grade.
Lecture materials:
1. Slides: Slides are usually released/updated every Sunday evening.
2. Readings: Each lecture has assigned readings. Reading is required: it is not enough to learn from the slides!
3. Go through the slides and the assigned readings before our lectures. It helps you learn better.
4. I recommend you get a copy of the textbook by Cover and Thomas. It is a good reference, and we will often assign readings in the book.
5. Other assigned readings could be from Moser’s lecture notes (can be downloaded online).
1. In-class:
   - Language: This class is taught in English. However, to encourage interaction, feel free to ask questions in Mandarin. I will repeat your question in English (if necessary), and answer it in English.
   - Exercises: We put some exercises on the slides to help you learn, and solve some exercises in class. Volunteers get bonus points.
2. Out-of-class:
   - Office Hours: Both the TA and I hold 2-hour office hours per week, where you can ask questions, discuss research, chat, complain, etc. If you cannot make it to the regular office hours, send us emails to schedule a time slot. My schedule can be found on my website.
   - Email: Send us emails with a subject starting with “[NTU Fall15 IT]”.
   - Feedback: There will be online polls during the semester to collect your feedback anonymously.
- Measures of Information: entropy, conditional entropy, relative entropy (KL divergence), mutual information.
- Lossless Source Coding: lossless source coding theorem, discrete memoryless sources, asymptotic equipartition property, typical sequences, Fano’s inequality, converse proof, ergodic sources, entropy rate.
- Noisy Channel Coding: noisy channel coding theorem, discrete memoryless channels, random coding, typicality decoder, threshold decoder, error probability analysis, converse proof, channel with feedback.
- Channel Coding over Continuous-Valued Channels: channel coding with cost constraints, discretization technique, differential entropy, Gaussian channel capacity.
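As a warm-up for these definitions, the sketch below computes entropy, conditional entropy, KL divergence, and mutual information for a small joint pmf. The joint pmf and the reference distribution are made-up numbers for illustration, not from the course.

```python
import math

# Hypothetical joint pmf p(x, y) over X, Y in {0, 1} (illustrative numbers)
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def entropy(pmf):
    """H(p) = -sum p log2 p, in bits."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Marginals p(x) and p(y)
p_x = [sum(p for (x, _), p in p_xy.items() if x == b) for b in (0, 1)]
p_y = [sum(p for (_, y), p in p_xy.items() if y == b) for b in (0, 1)]

H_X = entropy(p_x)
H_XY = entropy(p_xy.values())
H_X_given_Y = H_XY - entropy(p_y)   # chain rule: H(X|Y) = H(X,Y) - H(Y)
I_XY = H_X - H_X_given_Y            # I(X;Y) = H(X) - H(X|Y)

def kl(p, q):
    """KL divergence D(p||q) in bits (assumes q > 0 wherever p > 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(H_X, H_XY, I_XY, kl(p_x, [0.5, 0.5]))
```

Here the marginal of X is uniform, so H(X) = 1 bit and D(p_x || uniform) = 0, while the dependence between X and Y makes I(X;Y) strictly positive.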
- Lossy Source Coding (Rate Distortion Theory): distortion, rate-distortion tradeoff, typicality encoder, converse proof.
- Source-Channel Separation and Joint Source-Channel Coding.
- Information Theory and Statistics: method of types, Sanov’s theorem, large deviation, hypothesis testing, estimation, Cramér-Rao lower bound, non-parametric estimation.
- Data Compression: prefix-free code, Kraft’s inequality, Huffman code, Lempel-Ziv compression.
- Capacity Achieving Channel Codes: polar codes, LDPC codes.
- Selected Advanced Topics: network coding, compressed sensing, community detection, non-asymptotic information theory, etc.
Week  Date       Content                                Remark
1     09/15, 16  Introduction; Measures of Information
2     09/22, 23  Measures of Information
3     09/29, 30  Lossless Source Coding                 HW1 out
4     10/06, 07  Lossless Source Coding                 HW1 due
5     10/13, 14  Noisy Channel Coding                   HW2 out
6     10/20, 21  Noisy Channel Coding                   HW2 due
7     10/27, 28  Continuous-Valued Channel Coding       HW3 out
8     11/03, 04  Continuous-Valued Channel Coding       HW3 due
9     11/10, 11  Midterm Exam
Week  Date       Content                                Remark
10    11/17, 18  Lossy Source Coding                    HW4 out
11    11/24, 25  Joint Source-Channel Coding            HW4 due
12    12/01, 02  Information Theory and Statistics      HW5 out
13    12/08, 09  Information Theory and Statistics      HW5 due
14    12/15, 16  Data Compression                       HW6 out
15    12/22, 23  Data Compression; Polar Code           HW6 due
16    12/29, 30  Polar Code                             HW7 out
17    01/05, 06  Advanced Topics                        HW7 due
18    01/12, 13  Final Exam
1. Course Information
2. Overview
[Photo: Claude E. Shannon (1916 – 2001)]
Shannon’s landmark paper in 1948 is generally considered the “birth” of information theory. In the paper, Shannon made it clear that information theory is about the quantification of information in a communication system. In particular, it focuses on characterizing the necessary and sufficient condition for a destination terminal to be able to reproduce a message generated by a source terminal.
1. Stochastic modeling: It is a unified theory based on stochastic modeling (information source, noisy channel, etc.).
2. Theorems, not only definitions: It provides mathematical theorems on the optimal performance of algorithms (coding schemes), rather than merely definitions.
3. Sharp phase transition: It draws the boundary between what is possible to achieve and what is impossible, leading to math-driven system design.
Applications of information theory include: universal (lossless) data compression, error-correcting codes, DSL modems, cellular systems, wireless access networks, and cryptography.
[Block diagram: Source → Encoder → Channel → Decoder → Destination, with Noise entering the Channel]

Above is an abstract model of a communication system:
1. The source would like to deliver some message to the destination, where the message may be speech, image, video, audio, text, etc.
2. The channel is the physical medium that connects the source and the destination, such as cable, optical fiber, EM radiation, etc., and is usually subject to certain noise disturbances.
3. The encoder can carry out any processing of the source output, including compression, modulation, insertion of redundancy, etc.
4. The decoder can carry out any processing of the channel output to reproduce the source message.
A primary concern of information theory is the encoder and the decoder, both in terms of how they function, and of the existence or nonexistence of encoders and decoders that achieve a given level of performance.
Prior to the 1948 paper, the design of communication systems followed the analog paradigm: if the source produces an electromagnetic waveform, the destination should try its best to reconstruct this waveform in order to extract useful information (usually, voice). This line of research was based on Fourier analysis and gave birth to sampling theory.

Shannon asked: If the receiver knows that a sine wave of unknown frequency is to be communicated, why not simply send the frequency rather than the entire waveform?

Prior to Shannon, theorists and engineers were able to analyze the performance of particular choices of encoders/decoders, but had little knowledge about the ultimate limit.

Shannon asked: Over all possible encoders/decoders, what is the necessary and sufficient condition for the destination to be able to reconstruct the message sent by the source?
Key new insights due to Shannon’s work:
- Shannon: “Information is the resolution of uncertainty.” Indeed, the set of possible source outputs, rather than any particular output, is what matters in the design.
- Introduction of an abstract mathematical model of a communication system based on random processes (hence, a stochastic model).
- Creation of the digital paradigm of communication system design, with the bit as the universal currency of information, by proposing and proving the source-channel separation theorem.
The stochastic modeling of a communication system comprises:
- Source: model the information source by a random process, where the data to be conveyed is drawn randomly from a given distribution.
- Channel: model the noisy channel by a random process, where the impact of noise is drawn randomly from a given distribution.

Why use random processes to model communication systems? Shannon: “The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.”
[Block diagram: Source → Source Encoder → Channel Encoder → Noisy Channel → Channel Decoder → Source Decoder → Destination; a binary interface (bits) sits between the source coders and the channel coders]
Shannon showed that by splitting the coders into source coders and channel coders, the fundamental limit of the system remains the same. In other words, introducing a digital (binary) interface incurs no loss of optimality, in terms of whether or not the destination can reproduce the source data. Separation of source coding and channel coding simplifies engineering design: source coder design (data compression) and channel coder design (data transmission) can be carried out separately.
I have always wondered how on earth Shannon came up with the brilliant idea of separating source coding and channel coding. A very likely answer: “Shannon is simply a genius.” A more down-to-earth one: “Shannon saw the essence of research: seek simplification first.”
Original block diagram:
[Source → Encoder → Channel → Decoder → Destination, with Noise entering the Channel]

Simplification: remove the channel noise!
[Source → Encoder → Noiseless Channel → Decoder → Destination]

This step makes life much easier. Yet, it is still a non-trivial problem.
Features of source messages:
- Uncertainty: the destination has no idea a priori what message is chosen by the source.
- Redundancy: though randomly chosen, some choices are more likely, while others are less likely.

Goal: remove the redundancy of the source message and represent it by a bit sequence, so that it can be delivered to the destination reliably.
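To see redundancy removal in action, here is a small sketch using Python’s general-purpose zlib compressor (not a course algorithm) on a heavily biased i.i.d. source; the source parameters are illustrative.

```python
import random
import zlib

random.seed(0)
N = 100_000
# i.i.d. Bernoulli(0.05) source, stored naively as one byte per symbol:
# highly redundant, since its entropy is only about 0.29 bits per symbol
bits = bytes(1 if random.random() < 0.05 else 0 for _ in range(N))

compressed = zlib.compress(bits, 9)
assert zlib.decompress(compressed) == bits  # lossless: the message is fully recovered
print(len(bits), len(compressed))           # the compressor strips most of the redundancy
```

A generic compressor already shrinks this source far below its raw size; the lossless source coding theorem (below) tells us the best any compressor can do on average is the entropy rate of the source.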
s[1], …, s[N] → source encoder → b[1], …, b[K] → … → ŝ[1], …, ŝ[N]

Notation:
- s[1], …, s[N] represent the source message; each s[t] is called a “source symbol”.
- b[1], …, b[K] represent the codeword generated by the source encoder; each bit b[t] is called a “source codeword symbol (bit)”.
- ŝ[1], …, ŝ[N] represent the reproduced source message at the destination.
Question: for a given N (number of source symbols), what is the minimum K (number of bits) needed to recover s[1], …, s[N] at the decoder? It is not hard to show that the smallest K = Θ(N). (Check!)

The right (non-trivial) question to ask is: what is the minimum value of K/N?
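The Θ(N) claim can be illustrated by a counting sketch for the special case of a uniform source (the alphabet size and block length are illustrative numbers, not from the slides):

```python
import math

# A fixed-length lossless code for N symbols drawn uniformly from an alphabet of
# size M must assign each of the M**N possible source sequences a distinct K-bit
# codeword, so 2**K >= M**N, i.e. K >= N * log2(M): K grows linearly in N.
M, N = 4, 10
K_min = math.ceil(N * math.log2(M))
assert 2 ** K_min >= M ** N
print(K_min)  # → 20 bits for 10 quaternary symbols, so K = Θ(N)
```

For non-uniform sources the same linear growth holds, but with a smaller constant, which is exactly what the source coding theorem below pins down.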
Shannon answered the above question and characterized the necessary and sufficient condition for (lossless) source coding:

A Source Coding Theorem: the destination can reconstruct the source message losslessly ⟺ code rate R := K/N > the entropy rate of the source, H(S).

We will define entropy in Lecture 2; it is a quantity that can be computed from the distribution of the source random process {S[t] | t ∈ N}.
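As a preview of entropy rate (formally defined in Lecture 2), the sketch below evaluates it for two made-up sources: an i.i.d. source, whose entropy rate is just the per-symbol entropy, and a stationary two-state Markov source, whose entropy rate is H(X₂|X₁); all distributions here are illustrative assumptions.

```python
import math

def H(pmf):
    """Entropy in bits of a pmf given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# i.i.d. Bernoulli(0.1) source: entropy rate = per-symbol entropy H(0.1)
rate_iid = H([0.1, 0.9])

# Stationary two-state Markov source with transition matrix P (illustrative);
# entropy rate = H(X2|X1) = sum_i pi_i * H(P[i]), pi the stationary distribution
P = [[0.9, 0.1],
     [0.3, 0.7]]
pi = [0.75, 0.25]  # solves pi = pi * P for this particular P
rate_markov = sum(pi[i] * H(P[i]) for i in range(2))

print(rate_iid, rate_markov)  # both well below 1 bit per symbol
```

By the theorem above, these numbers are exactly the minimum achievable K/N for the corresponding sources.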
Original block diagram:
[Source → Encoder → Channel → Decoder → Destination, with Noise entering the Channel]

Simplification': remove the source redundancy!
[Source emitting i.i.d. Bernoulli(1/2) bits, i.e., random bits → Encoder → Channel → Decoder → Destination, with Noise entering the Channel]

It remains a highly non-trivial problem.
Features of the noisy channel:
- Noise: the channel input sent by the channel encoder is corrupted by the noise randomly, producing the channel output.
- Uniform messages: the input of the channel encoder is assumed WLOG to be a bit sequence with no redundancy, since source coding has already removed all redundancy and converted the message to bits.

Goal: add the minimum redundancy needed so that messages (bit sequences) can be communicated over the noisy channel and decoded reliably.
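To see how added redundancy buys reliability, here is a sketch of a length-5 repetition code over a binary symmetric channel with majority-vote decoding; the crossover probability and code length are illustrative, and repetition coding is far from what Shannon’s theorem promises.

```python
import random

random.seed(0)
p = 0.1       # BSC crossover probability (illustrative)
n_rep = 5     # repeat each bit 5 times: rate 1/5
K = 10_000    # number of message bits

def bsc(bit):
    """Send one bit through a BSC(p): flip it with probability p."""
    return bit ^ (random.random() < p)

msg = [random.randint(0, 1) for _ in range(K)]

# Uncoded: send each bit once
err_uncoded = sum(bsc(b) != b for b in msg) / K

def send_coded(b):
    """Send b five times; decode by majority vote over the received bits."""
    votes = sum(bsc(b) for _ in range(n_rep))
    return 1 if votes > n_rep // 2 else 0

err_coded = sum(send_coded(b) != b for b in msg) / K
print(err_uncoded, err_coded)  # error rate drops from roughly 0.1 to roughly 0.01
```

Repetition trades rate for reliability crudely (driving errors to zero this way drives the rate to zero); the channel coding theorem below says a fixed positive rate below capacity already suffices.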
b[1], …, b[K] → channel encoder → x[1], …, x[N] → noisy channel p(y|x) → y[1], …, y[N] → channel decoder → b̂[1], …, b̂[K]

Notation:
- b[1], …, b[K] represent the message; each bit b[t] is called a “data symbol (bit)”.
- x[1], …, x[N] represent the codeword; each x[t] is called a “coded symbol”.
- y[1], …, y[N] represent the channel output; the channel is described by the conditional distribution p(y|x).
Question: for a given K (number of input bits), what is the minimum N (number of coded symbols) needed to recover b[1], …, b[K] at the decoder? It turns out that N = Θ(K). However, proving this is already non-trivial.

Shannon further asked: what is the maximum value of K/N?
Shannon gave the necessary and sufficient condition for channel coding:

A Channel Coding Theorem: the (channel) decoder can decode the message reliably ⟺ code rate R := K/N < the channel capacity of the channel, C.

We will define channel capacity later; it is a quantity obtained by maximizing the “mutual information” between X and Y, which can be computed from the conditional distribution of the channel.
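As a preview, the sketch below brute-forces this maximization for a binary symmetric channel by a grid search over the input distribution; the crossover probability is an illustrative assumption, and the closed form C = 1 − h₂(ε) is the classical result the search should recover.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

eps = 0.1  # BSC crossover probability (illustrative)

def mutual_information(px0):
    """I(X;Y) = H(Y) - H(Y|X) for the BSC with input P(X=0) = px0."""
    py0 = px0 * (1 - eps) + (1 - px0) * eps
    return h2(py0) - h2(eps)  # H(Y|X) = h2(eps) regardless of the input bit

# Capacity = max over input distributions, here found by a fine grid search
C = max(mutual_information(q / 1000) for q in range(1001))
print(C, 1 - h2(eps))  # the search recovers the closed form 1 - h2(eps)
```

The maximum is attained at the uniform input P(X=0) = 1/2, matching the symmetry of the channel; for general channels the same maximization is done with algorithms such as Blahut-Arimoto.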
- Information theory focuses on the quantitative aspects of information, not the qualitative aspects.
- Information theory is mainly about what is possible and what is impossible in communication systems.
- In information theory, one investigates problems in communication systems through the lens of probability theory and statistics.
- In this course, we mainly focus on discrete-time signals, not on continuous-time signals.
- Source-channel separation forms the basis of digital communication, where binary digits (bits) become the universal currency of information.